# Welcome to nxneo4j!
#### nxneo4j is a library that enables you to use NetworkX-style commands to interact with Neo4j.

Check out the following Medium article and GitHub repository before you begin:

- https://medium.com/neo4j/nxneo4j-networkx-api-for-neo4j-a-new-chapter-9fc65ddab222
- https://github.com/ybaktir/networkx-neo4j

### _Latest version is 0.0.3_
If not already installed, install the latest version like this:

In [43]:
# ! pip uninstall -y networkx-neo4j #remove the old installation

In [44]:
# ! pip install git+https://github.com/ybaktir/networkx-neo4j

In [45]:
import datetime, time
print ('Last run on: ' + datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S") + ' ' + repr(time.tzname))

Last run on: 2021-12-10 19:25:24 ('IST', 'IST')


## Connect to Neo4j

Open a free Sandbox session at sandbox.neo4j.com:

https://sandbox.neo4j.com/

Get the connection details and set them as in the following example:

In [46]:
user = 'neo4j'
# password = '<your-aura-password>'  # Aura Free password
# uri = 'neo4j+s://01141c89.databases.neo4j.io'  # Aura Free URI
password = "unonothing"
uri = "bolt://localhost:7687"  # local Neo4j or Neo4j Desktop (default Bolt port), OR the cloud URL

from neo4j import GraphDatabase
import nxneo4j as nx

driver = GraphDatabase.driver(uri=uri, auth=(user, password))
G = nx.Graph(driver)
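The cell above hard-codes the password. A minimal sketch of reading the connection details from environment variables instead; the variable names `NEO4J_URI`, `NEO4J_USER` and `NEO4J_PASSWORD` and the helper `neo4j_settings` are assumptions for illustration, not something nxneo4j reads itself.

```python
import os


def neo4j_settings(env=None):
    """Return (uri, (user, password)) from environment variables.

    Hypothetical helper: the variable names are a convention chosen here,
    not part of nxneo4j or the Neo4j driver.
    """
    env = os.environ if env is None else env
    uri = env.get("NEO4J_URI", "bolt://localhost:7687")  # default Bolt port
    user = env.get("NEO4J_USER", "neo4j")
    password = env.get("NEO4J_PASSWORD")
    if not password:
        raise RuntimeError("Set NEO4J_PASSWORD before connecting")
    return uri, (user, password)
```

With the variables exported, the connection becomes `uri, auth = neo4j_settings()` followed by `driver = GraphDatabase.driver(uri, auth=auth)`, so no secret ends up in the notebook.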

## Add Nodes

```
#Add a node
G.add_node("Yusuf")

#Add a node with a property
G.add_node("Nurgul",gender='F')

#Add multiple properties at once
G.add_node("Betul",age=4,gender='F')

#Check nodes
for node in G.nodes():   #Unlike networkX, nxneo4j returns a generator
    print(node)

#Or simply
list(G.nodes())

#Get the data associated with each node
list(G.nodes(data=True))

#number of nodes
len(G)

#Check a particular node feature
G.nodes['Betul']

#You can be more specific
G.nodes['Betul']['age']
```
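Because `G.nodes()` returns a generator (as the comment above notes), it can be consumed only once. A plain-Python sketch of the consequence; `fake_nodes` is a hypothetical stand-in for `G.nodes()` so the snippet runs without a database.

```python
def fake_nodes():
    """Hypothetical stand-in for G.nodes(): yields node identifiers lazily."""
    yield from ["Yusuf", "Nurgul", "Betul"]


nodes = fake_nodes()
first_pass = list(nodes)   # consumes the generator
second_pass = list(nodes)  # already exhausted, so this is empty

# Materialize once if you need to iterate over the nodes more than once
node_list = list(fake_nodes())
print(first_pass, second_pass, len(node_list))
```

This is why the examples wrap the call in `list(...)` when they want to inspect the nodes.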

## Add Edges

```
#Add one edge
G.add_edge('Yusuf','Betul')

#You can change the default relationship type like this
G.relationship_type = 'LOVES'
G.add_edge('Yusuf','Nurgul')
G.add_edge('Nurgul','Yusuf')

#You can add properties as well
G.add_edge('Betul','Nurgul',how_much='More than Dad')

#display the values
list(G.edges(data=True))

#Set the relationship type to CONNECTED and add several edges at once
G.relationship_type = 'CONNECTED'

G.add_edges_from([(1,2),(3,4)])
```
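Switching `G.relationship_type` back and forth by hand is easy to forget. A minimal sketch of a context manager that sets it temporarily and restores the previous value; `using_relationship_type` and `FakeGraph` are hypothetical names, and `FakeGraph` only mimics the `relationship_type` attribute and `add_edge` call used above so the example runs without Neo4j.

```python
from contextlib import contextmanager


@contextmanager
def using_relationship_type(graph, rel_type):
    """Temporarily set graph.relationship_type, restoring the old value on exit."""
    previous = graph.relationship_type
    graph.relationship_type = rel_type
    try:
        yield graph
    finally:
        graph.relationship_type = previous


class FakeGraph:
    """Stand-in for nx.Graph(driver) that just records edges with their type."""

    def __init__(self):
        self.relationship_type = "CONNECTED"
        self.edges = []

    def add_edge(self, u, v):
        self.edges.append((u, self.relationship_type, v))


g = FakeGraph()
with using_relationship_type(g, "LOVES"):
    g.add_edge("Yusuf", "Nurgul")
g.add_edge("Yusuf", "Betul")  # back to the previous type
print(g.edges)
```

With a real graph the usage would be `with using_relationship_type(G, 'LOVES'): G.add_edge('Yusuf', 'Nurgul')`.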

## Remove Nodes

```
G.remove_node('Yusuf')

list(G.nodes())
```

## Graph Data Science

There are several built-in graph algorithms in Neo4j. nxneo4j will expand to cover all of them in future versions. For now, the following NetworkX algorithms are supported:
- pagerank
- betweenness_centrality
- closeness_centrality
- label_propagation
- connected_components
- clustering 
- triangles
- shortest_path
- shortest_weighted_path

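Let's delete all data and load the Game of Thrones (GOT) dataset. In this run the lines below are commented out, so the results that follow come from the data already in the database rather than from the GOT dataset: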

In [47]:
# G.delete_all()
# G.load_got()

#You can change the default parameters like the following:

# G.identifier_property = 'name'
# G.relationship_type = '*'
# G.node_label = 'Character'

In [48]:
len(G) #796 nodes with the GOT dataset

359

In [49]:
# nx.draw(G)

## 1. Centrality Algorithms

### PageRank

We’ll start with the famous PageRank algorithm. Let’s find out who the most influential characters in Game of Thrones are:

In [50]:
nx.pagerank(G) #RAW OUTPUT

{'Baby One More Time': 3.1819176563922804,
 'a song': 0.8795088963450438,
 'Britney Spears': 1.79201436124696,
 'single': 0.4204630007933438,
 'song': 3.068471761156469,
 'release': 0.4204630007933438,
 'Work': 1.2509600764878221,
 'WrittenWork': 0.920305566416573,
 'PeriodicalLiterature': 0.920305566416573,
 'human': 2.7896433304770185,
 'person': 2.383359269447412,
 'natural person': 2.383359269447412,
 'Person': 2.3064925553929085,
 'Agent': 2.5669540475390162,
 'Artist': 0.8512196487066803,
 'Baby': 0.8578674141585692,
 'demographic profile': 0.5145935844943342,
 'One': 1.7527149041232508,
 'power of 10': 0.5224518838646621,
 'centered triangular number': 0.5224518838646621,
 'centered pentagonal number': 0.5224518838646621,
 'Time': 2.6868284202716843,
 'online magazine': 0.6742390932198294,
 'print news magazine': 0.6742390932198294,
 'magazine': 0.6742390932198294,
 'periodical': 0.6498425656232297,
 'communication medium': 0.40375600735369294,
 'literary form': 0.43980010803444

In [51]:
# the most influential characters
response = nx.pagerank(G)
sorted_pagerank = sorted(response.items(), key=lambda x: x[1], reverse=True)
for character, score in sorted_pagerank[:10]:
    print(character, score)

hit 4.547836915564434
bears 4.298359231029326
People 4.053190371498244
time 3.81858483217766
brand 3.6860308525134053
generally a time 3.6175016840413816
USA 3.535534185701005
fight 3.4124380920571245
people 3.40974057616403
IBM 3.370678279460246
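To show what these scores mean, here is a minimal power-iteration sketch of PageRank on an undirected graph stored as `{node: set(neighbors)}`: each node repeatedly receives a damped share of its neighbours' scores. It follows the textbook formulation normalized to sum to 1 (as NetworkX's `pagerank` does); the raw scores above are clearly on a different scale, so compare rankings rather than absolute values. The function name, adjacency format and convergence settings are assumptions, not nxneo4j's implementation.

```python
def pagerank(adj, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank for an undirected graph {node: set(neighbors)}."""
    nodes = list(adj)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Nodes without neighbours spread their score evenly over all nodes
        dangling = sum(rank[v] for v in nodes if not adj[v])
        new_rank = {}
        for v in nodes:
            incoming = sum(rank[u] / len(adj[u]) for u in adj[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
scores = pagerank(star)
ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)
print(ranked[0][0])  # hub
```

The hub of the star collects score from every leaf, so it ranks first, in the same way that well-connected characters rise to the top of the list above.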


### Betweenness centrality

We can also run betweenness centrality over the dataset. This algorithm tells us which nodes are the most 'pivotal', i.e. how many of the shortest paths between pairs of characters pass through them.

In [52]:
# Betweenness centrality
nx.betweenness_centrality(G) #RAW OUTPUT

{' unnecessarily': 0.0,
 ' were married': 0.0,
 '(meta)class': 0.0,
 '1984': 1068.0,
 'Agent': 7551.820161206322,
 'American football position': 0.0,
 'American football team': 0.0,
 'Animal': 0.0,
 'Artist': 674.2339757018091,
 'Baby': 357.0,
 'Baby One More Time': 4814.8920033820805,
 'Biden': 1634.72012605214,
 'Britney': 1529.5264813254832,
 'Britney Spears': 3607.6006543376,
 'City': 0.0,
 'Clinton': 0.0,
 'Company': 0.0,
 'Country': 104.21972037937964,
 'Das': 240.0698273432797,
 'Devi': 0.0,
 'Diwali': 21063.220192717505,
 'Donald': 586.1402255639094,
 'Donald Trump': 1915.4397583657908,
 'EthnicGroup': 0.0,
 'Eukaryote': 0.0,
 'Festival': 16.84825174825174,
 'Godess': 357.0,
 'Gujratis': 4408.565053837859,
 'Han surname': 0.0,
 'Hillary': 0.23076923076923078,
 'Hillary Clinton': 430.46953969840735,
 'Holiday': 437.58009721272924,
 'IBM': 5738.205459496755,
 'India': 5638.34270311193,
 'It': 7957.511940262792,
 'Jackson': 1592.7166048657236,
 'Joe': 625.1376053985568,
 'Joe Bide

In [53]:
# RANKED OUTPUT
response = nx.betweenness_centrality(G)

sorted_bw = sorted(response.items(), key=lambda x: x[1], reverse=True)
for character, score in sorted_bw[:10]:
    print(character, score)

Diwali 21063.2201927175
people 10664.962564108222
Lights 9341.867883811612
time 8565.207622702375
end 8339.541986155236
Michael Jackson 8126.991797600441
It 7957.5119402627915
Agent 7551.820161206323
a shopping spree 6939.3780609395
USA 6894.67464425404
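Betweenness scores are usually computed with Brandes' algorithm: one breadth-first search per source node counts shortest paths, then path "dependencies" are accumulated backwards. A minimal sketch for an unweighted, undirected graph stored as `{node: set(neighbors)}`; it returns unnormalized scores with each unordered pair counted once (matching NetworkX's `betweenness_centrality(G, normalized=False)` on undirected graphs). Whether the Neo4j backend uses the same scaling is not shown in this notebook, so treat this as an illustration of the technique only.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph {node: set(neighbors)}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of non-increasing distance from s
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint
    return {v: score / 2 for v, score in bc.items()}


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(betweenness_centrality(path))
```

On the path a-b-c-d, node b lies on the shortest paths a-c and a-d, so its score is 2, while the end points score 0.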


### Closeness centrality

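Closeness centrality measures how close each character is, on average, to every other character: it is based on the average number of hops (shortest-path length) from that character to all others, and a higher score means fewer hops.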

In [54]:
# Closeness centrality
nx.closeness_centrality(G) #RAW OUTPUT

{' unnecessarily': 0.12574639971900245,
 ' were married': 0.12891609650702196,
 '(meta)class': 0.13137614678899082,
 '1984': 0.15674255691768826,
 'Agent': 0.24026845637583893,
 'American football position': 0.13570887035633056,
 'American football team': 0.13166605369621184,
 'Animal': 0.13166605369621184,
 'Artist': 0.20801859384079024,
 'Baby': 0.15279556124626548,
 'Baby One More Time': 0.18008048289738432,
 'Biden': 0.20562894887995406,
 'Britney': 0.20515759312320916,
 'Britney Spears': 0.20850320326150262,
 'City': 0.1512463033375581,
 'Clinton': 0.1881240147136101,
 'Company': 0.17971887550200802,
 'Country': 0.1967032967032967,
 'Das': 0.1725301204819277,
 'Devi': 0.17236398651901783,
 'Diwali': 0.24470266575529734,
 'Donald': 0.17344961240310078,
 'Donald Trump': 0.20681686886192951,
 'EthnicGroup': 0.17549019607843136,
 'Eukaryote': 0.13166605369621184,
 'Festival': 0.17063870352716873,
 'Godess': 0.17523250122369066,
 'Gujratis': 0.21271538918597743,
 'Han surname': 0.15131

In [55]:
# RANKED
response = nx.closeness_centrality(G)

sorted_cc = sorted(response.items(), key=lambda x: x[1], reverse=True)
for character, score in sorted_cc[:10]:
    print(character, score)

Diwali 0.24470266575529734
Agent 0.24026845637583893
Lights 0.23645970937912814
It 0.23368146214099217
human 0.2308188265635074
Person 0.22992935131663456
natural person 0.22992935131663456
person 0.22992935131663456
people 0.22417031934877896
IBM 0.21896024464831804
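A minimal breadth-first-search sketch of closeness centrality for an unweighted, undirected graph `{node: set(neighbors)}`: for each node it sums the hop distances to every reachable node and returns (nodes reached) / (sum of distances), which on a connected graph is the reciprocal of the average distance. How unreachable nodes are handled is an assumption here (they are ignored); implementations differ on this, for example NetworkX applies an extra Wasserman-Faust scaling by default.

```python
from collections import deque


def closeness_centrality(adj):
    """Closeness = (reachable nodes - 1) / (sum of hop distances to them)."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        # Isolated nodes have no distances to average over
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


line = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(closeness_centrality(line))
```

On the line a-b-c, the middle node is one hop from both others and scores 1.0, while each end node scores 2/3.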


## 2. Community Detection Algorithms

### Label Propagation
We can also partition the characters into communities using the label propagation algorithm.

In [56]:
# Label propagation
nx.label_propagation_communities(G) #RAW OUTPUT is a generator

<generator object label_propagation_communities at 0x104d87820>

In [57]:
communities = nx.label_propagation_communities(G)
sorted_communities = sorted(communities, key=lambda x: len(x), reverse=True)
for community in sorted_communities[:10]:
    print(list(community)[:10])

['collection', 'Country', 'marker', 'composition_of_music', 'goat_sucker', 'shore', 'social relation', 'castle', 'model', 'irreflexive_action']
['alergic_to_fur', 'time_period', 'merchant', 'It', 'electrical_device', 'side', 'thing', 'river', 'musical ensemble', 'people']
['process', 'series_of_steps_taken', 'secured_loan', 'written_loan_contract', 'quantity', 'Mortgage renewal process', 'repetition', 'construction', 'unit_of_time', 'authorized_agreement']
['a rocket', 'rocket', 'fervour', 'Mitzvah', 'machine', 'great fervour', 'We', 'faster_than_aeroplane', 'artifact']
[' were married', 'who', 'three bears', 'there']


Characters end up in the same community as the characters they interact with most frequently. The idea is that characters have closer ties to those in their community than to those outside it.
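The idea behind label propagation fits in a few lines: every node starts with its own label and repeatedly adopts the label that is most frequent among its neighbours, until no label changes. This sketch is an asynchronous variant with deterministic tie-breaking (keep the current label if it is among the most frequent, otherwise take the smallest), chosen so the example is reproducible and assuming node identifiers are mutually comparable; real implementations usually break ties randomly, so their communities can differ between runs.

```python
from collections import Counter


def label_propagation_communities(adj, max_iter=100):
    """Asynchronous label propagation on {node: set(neighbors)}; returns a list of sets."""
    labels = {v: v for v in adj}
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[u] for u in adj[v])
            top = max(counts.values())
            best = [label for label, c in counts.items() if c == top]
            new_label = labels[v] if labels[v] in best else min(best)
            if new_label != labels[v]:
                labels[v] = new_label
                changed = True
        if not changed:
            break
    communities = {}
    for node, label in labels.items():
        communities.setdefault(label, set()).add(node)
    return list(communities.values())


two_triangles = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
    "d": {"e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(sorted(sorted(c) for c in label_propagation_communities(two_triangles)))
```

Each triangle settles on a single shared label, so the two groups come out as two separate communities.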



### Clustering
We can calculate the clustering coefficient for each character. A clustering coefficient of 1 means that all the characters who interact with that character also interact with each other:

In [58]:
# Clustering
nx.clustering(G) #RAW OUTPUT

{'exchanging gifts': 1.0,
 'Clinton': 1.0,
 'headquarters': 1.0,
 'who': 1.0,
 'attending feasts': 0.6666666666666666,
 'big brother': 0.6666666666666666,
 'is better than Donald Trump': 0.5,
 'Hillary': 0.5,
 'a very large company': 0.3333333333333333,
 'there': 0.3333333333333333,
 'The US President': 0.3333333333333333,
 'President': 0.3333333333333333,
 'a song': 0.3333333333333333,
 'WrittenWork': 0.3333333333333333,
 'PeriodicalLiterature': 0.3333333333333333,
 'feasts': 0.3333333333333333,
 'the Festival': 0.3333333333333333,
 'Donald': 0.3333333333333333,
 'Manoj': 0.3,
 'Trump': 0.2857142857142857,
 'Hillary Clinton': 0.2857142857142857,
 'Biden': 0.2857142857142857,
 'Donald Trump': 0.24444444444444444,
 'Britney': 0.21428571428571427,
 'also the New Year': 0.2,
 'Joe Biden': 0.19444444444444445,
 'three bears': 0.16666666666666666,
 'Britney Spears': 0.16666666666666666,
 'Work': 0.16666666666666666,
 'Artist': 0.16666666666666666,
 'Rinku Das': 0.16666666666666666,
 'Rinku'

In [59]:
response = nx.clustering(G)

biggest_coefficient = sorted(response.items(), key=lambda x: x[1], reverse=True)
for character, coefficient in biggest_coefficient[:10]:
    print(character, coefficient)

exchanging gifts 1.0
Clinton 1.0
headquarters 1.0
who 1.0
attending feasts 0.6666666666666666
big brother 0.6666666666666666
is better than Donald Trump 0.5
Hillary 0.5
a very large company 0.3333333333333333
there 0.3333333333333333
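The local clustering coefficient of a node with degree k that belongs to T triangles is 2T / (k(k - 1)), i.e. the fraction of pairs of its neighbours that are themselves connected; nodes with fewer than two neighbours get 0. A minimal sketch that computes both the per-node triangle counts and the coefficients, assuming an undirected graph `{node: set(neighbors)}` without self-loops.

```python
from itertools import combinations


def triangles(adj):
    """Number of triangles each node belongs to, for {node: set(neighbors)}."""
    return {
        v: sum(1 for u, w in combinations(adj[v], 2) if w in adj[u])
        for v in adj
    }


def clustering(adj):
    """Local clustering coefficient: 2T / (k * (k - 1)), or 0 when k < 2."""
    tri = triangles(adj)
    result = {}
    for v, neighbors in adj.items():
        k = len(neighbors)
        result[v] = 2 * tri[v] / (k * (k - 1)) if k >= 2 else 0.0
    return result


# A triangle a-b-c with an extra node d attached to a
g = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(triangles(g), clustering(g))
```

Here b and c have a coefficient of 1.0 (their two neighbours know each other), while a has 1/3 because only one of its three neighbour pairs is connected.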


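### Connected components

A connected component is a group of characters that can all reach each other through some path. Let's list the components and count them:

In [60]: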
list(nx.connected_components(G))

[{' unnecessarily',
  ' were married',
  '(meta)class',
  '1984',
  'Agent',
  'American football position',
  'American football team',
  'Animal',
  'Artist',
  'Baby',
  'Baby One More Time',
  'Biden',
  'Britney',
  'Britney Spears',
  'City',
  'Clinton',
  'Company',
  'Country',
  'Das',
  'Devi',
  'Diwali',
  'Donald',
  'Donald Trump',
  'EthnicGroup',
  'Eukaryote',
  'Festival',
  'Godess',
  'Gujratis',
  'Han surname',
  'Hillary',
  'Hillary Clinton',
  'Holiday',
  'IBM',
  'India',
  'It',
  'Jackson',
  'Joe',
  'Joe Biden',
  'Justin',
  'Lakshmi',
  'Lights',
  'Maharashtra',
  'Manoj',
  'Manoj Das',
  'Michael',
  'Michael Jackson',
  'Mitzvah',
  'Mortgage',
  'Mortgage renewal process',
  'Mukarrabun',
  'Mumbai',
  'MusicalWork',
  'New',
  'Once upon a time',
  'One',
  'Organisation',
  'People',
  'PeriodicalLiterature',
  'Person',
  'Place',
  'PopulatedPlace',
  'President',
  'RecordLabel',
  'Retailers',
  'Rinku',
  'Rinku Das',
  'Settlement',
  'Spe

In [61]:
nx.number_connected_components(G)

1
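`nx.number_connected_components(G)` returns 1 here, so every node can reach every other node. A minimal breadth-first-search sketch of how connected components are found in an undirected graph `{node: set(neighbors)}`: each node not yet seen seeds a new component, which is grown until it stops expanding. Like the nxneo4j call above, it yields components lazily as a generator; the function names are illustrative, not nxneo4j's implementation.

```python
from collections import deque


def connected_components(adj):
    """Yield each connected component of {node: set(neighbors)} as a set of nodes."""
    seen = set()
    for start in adj:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in component:
                    component.add(w)
                    queue.append(w)
        seen |= component
        yield component


def number_connected_components(adj):
    return sum(1 for _ in connected_components(adj))


g = {1: {2}, 2: {1}, 3: {4}, 4: {3}, 5: set()}
print(list(connected_components(g)), number_connected_components(g))
```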

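### Triangles

The triangle count is the number of triangles each character belongs to, i.e. the number of pairs of its neighbours that are also connected to each other:

In [62]: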
nx.triangles(G) #RAW OUTPUT

{' unnecessarily': 0,
 ' were married': 0,
 '(meta)class': 0,
 '1984': 0,
 'Agent': 4,
 'American football position': 0,
 'American football team': 0,
 'Animal': 0,
 'Artist': 1,
 'Baby': 0,
 'Baby One More Time': 4,
 'Biden': 6,
 'Britney': 6,
 'Britney Spears': 6,
 'City': 0,
 'Clinton': 1,
 'Company': 0,
 'Country': 0,
 'Das': 0,
 'Devi': 0,
 'Diwali': 4,
 'Donald': 1,
 'Donald Trump': 11,
 'EthnicGroup': 0,
 'Eukaryote': 0,
 'Festival': 0,
 'Godess': 0,
 'Gujratis': 0,
 'Han surname': 0,
 'Hillary': 3,
 'Hillary Clinton': 8,
 'Holiday': 0,
 'IBM': 3,
 'India': 0,
 'It': 3,
 'Jackson': 0,
 'Joe': 0,
 'Joe Biden': 7,
 'Justin': 0,
 'Lakshmi': 0,
 'Lights': 1,
 'Maharashtra': 0,
 'Manoj': 3,
 'Manoj Das': 7,
 'Michael': 0,
 'Michael Jackson': 0,
 'Mitzvah': 0,
 'Mortgage': 0,
 'Mortgage renewal process': 0,
 'Mukarrabun': 0,
 'Mumbai': 0,
 'MusicalWork': 0,
 'New': 1,
 'Once upon a time': 0,
 'One': 0,
 'Organisation': 0,
 'People': 0,
 'PeriodicalLiterature': 1,
 'Person': 4,
 'Place

## 3. Path Finding Algorithms

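Let's find the shortest path between two characters. In this run both calls fail with `ProcedureNotFound`: the database has no `gds.alpha.shortestPath.stream` procedure registered, and that Graph Data Science procedure is what these nxneo4j functions call. Note also that `Tyrion-Lannister` and `Hodor` are GOT characters, so they would only be found after loading the GOT dataset.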

In [63]:
# Shortest path
nx.shortest_path(G, source="Tyrion-Lannister", target="Hodor")

ClientError: {code: Neo.ClientError.Procedure.ProcedureNotFound} {message: There is no procedure with the name `gds.alpha.shortestPath.stream` registered for this database instance. Please ensure you've spelled the procedure name correctly and that the procedure is properly deployed.}

In [64]:
# Shortest weighted path
nx.shortest_weighted_path(G, source="Tyrion-Lannister", target="Hodor",weight='weight')

ClientError: {code: Neo.ClientError.Procedure.ProcedureNotFound} {message: There is no procedure with the name `gds.alpha.shortestPath.stream` registered for this database instance. Please ensure you've spelled the procedure name correctly and that the procedure is properly deployed.}
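Since the Graph Data Science procedure is unavailable in this run, here is a minimal pure-Python sketch of what the two calls compute: breadth-first search for the unweighted shortest path and Dijkstra's algorithm for the weighted one. The adjacency formats (`{node: set(neighbors)}` and `{node: {neighbor: weight}}`) and function names are assumptions for illustration, not nxneo4j's implementation; Dijkstra also assumes non-negative weights.

```python
import heapq
from collections import deque
from itertools import count


def _walk_back(parent, node):
    """Rebuild a path by following parent links back to the source."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def shortest_path(adj, source, target):
    """Unweighted shortest path via BFS; returns a list of nodes or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return _walk_back(parent, v)
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                queue.append(w)
    return None


def shortest_weighted_path(adj, source, target):
    """Dijkstra on {node: {neighbor: weight}}; returns (total_weight, path)."""
    tie = count()  # tie-breaker so nodes never need to be compared
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, next(tie), source)]
    done = set()
    while heap:
        d, _, v = heapq.heappop(heap)
        if v in done:
            continue
        done.add(v)
        if v == target:
            return d, _walk_back(parent, v)
        for w, weight in adj[v].items():
            nd = d + weight
            if w not in dist or nd < dist[w]:
                dist[w] = nd
                parent[w] = v
                heapq.heappush(heap, (nd, next(tie), w))
    return float("inf"), None


weighted = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 1, "D": 5},
    "C": {"A": 4, "B": 1, "D": 1},
    "D": {"B": 5, "C": 1},
}
unweighted = {v: set(nbrs) for v, nbrs in weighted.items()}
print(shortest_path(unweighted, "A", "D"), shortest_weighted_path(weighted, "A", "D"))
```

With hops only, A reaches D in two steps; once weights count, the cheaper route is A-B-C-D with total weight 3.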

In [None]:
G.identifier_property = 'name'