# Welcome to nxneo4j!
#### nxneo4j is a library that lets you use NetworkX-style commands to interact with Neo4j.

### _Latest version is 0.0.3_
If not already installed, install the latest version like this:

In [None]:
! pip uninstall -y networkx-neo4j #remove the old installation

In [None]:
! pip install git+https://github.com/ybaktir/networkx-neo4j #install the latest one

In [34]:
import datetime, time
print ('Last run on: ' + datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S") + ' ' + repr(time.tzname))

Last run on: 2020-08-29 05:08:27 ('CST', 'CDT')


## Connect to Neo4j

In [35]:
from neo4j import GraphDatabase

In [36]:
driver = GraphDatabase.driver(uri="bolt://localhost:11003",auth=("neo4j","your_password"))
                              #OR "bolt://localhost:7673"
                              #OR the cloud url

In [37]:
import nxneo4j as nx

In [38]:
G = nx.Graph(driver)

In [39]:
G.delete_all()  #This will delete all the data, be careful
                #Just making sure that the results are reproducible.

## Add Nodes

In [40]:
#Add a node
G.add_node("Yusuf")

In [41]:
#Add node with features
G.add_node("Nurgul",gender='F')

In [42]:
#Add multiple properties at once
G.add_node("Betul",age=4,gender='F')

In [43]:
#Check nodes
for node in G.nodes():   #Unlike networkX, nxneo4j returns a generator
    print(node)

Yusuf
Nurgul
Betul


In [44]:
#Or simply
list(G.nodes())

['Yusuf', 'Nurgul', 'Betul']
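
A quick aside on what "returns a generator" means in practice. The sketch below does not touch Neo4j at all; `fake_nodes` is a hypothetical stand-in that simply yields the three names above. It shows that a single generator object can only be walked once, which is why wrapping the call in `list(...)` is handy when you need the result more than once.

```python
def fake_nodes():
    """Hypothetical stand-in for a call that returns a generator of node ids."""
    for name in ["Yusuf", "Nurgul", "Betul"]:
        yield name


nodes = fake_nodes()
first_pass = list(nodes)   # walks the generator to the end
second_pass = list(nodes)  # the same generator object is now exhausted

snapshot = list(fake_nodes())  # materialise once, then reuse as often as needed

print(first_pass)
print(second_pass)
print(len(snapshot))
```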

In [45]:
#Get the data associated with each node
list(G.nodes(data=True))

[('Yusuf', {}),
 ('Nurgul', {'gender': 'F'}),
 ('Betul', {'gender': 'F', 'age': 4})]

In [46]:
#number of nodes
len(G)

3

In [47]:
#Display
nx.draw(G) #It is interactive, drag the nodes!

In [48]:
#Check a particular node feature
G.nodes['Betul']

{'gender': 'F', 'age': 4}

In [49]:
#You can be more specific
G.nodes['Betul']['age']

4

In [50]:
G.add_nodes_from([1,2,3,4])

In [51]:
list(G.nodes())

['Yusuf', 'Nurgul', 'Betul', 1, 2, 3, 4]

## Add Edges

In [52]:
#Add one edge
G.add_edge('Yusuf','Max')

In [53]:
nx.draw(G) #default relationship label is "CONNECTED"

In [54]:
#You can change the default connection label like the following
G.relationship_type = 'LOVES'

In [55]:
G.add_edge('Yusuf','Nurgul')
G.add_edge('Nurgul','Yusuf')

In [56]:
nx.draw(G)

In [57]:
#You can add properties as well
G.add_edge('Betul','Nurgul',how_much='More than Dad')

In [58]:
#display the values
list(G.edges(data=True))

[('Yusuf', 'Nurgul', {}),
 ('Nurgul', 'Yusuf', {}),
 ('Betul', 'Nurgul', {'how_much': 'More than Dad'})]

In [59]:
G.relationship_type = 'CONNECTED'

In [60]:
G.add_edges_from([(1,2),(3,4)])

In [61]:
nx.draw(G)

## Remove Nodes

In [62]:
G.remove_node('Yusuf')

In [63]:
list(G.nodes())

['Nurgul', 'Betul', 1, 2, 3, 4, 'Max']

## Graph Data Science

Neo4j ships with several built-in graph algorithms. nxneo4j aims to cover all of them in future versions. For now, the following NetworkX algorithms are supported:
- pagerank
- betweenness_centrality
- closeness_centrality
- label_propagation
- connected_components
- clustering 
- triangles
- shortest_path
- shortest_weighted_path

Let's delete all data and load the Game of Thrones (GOT) dataset:

In [64]:
G.delete_all()
G.load_got()

In [65]:
#You can change the default parameters like the following:
G.identifier_property = 'name'
G.relationship_type = '*'
G.node_label = 'Character'

In [66]:
nx.draw(G) #Zoom in to see the names :)

In [67]:
len(G) #796 nodes

796

## 1. Centrality Algorithms

### PageRank

We’ll start with the famous PageRank algorithm. Let’s find out who the most influential characters in Game of Thrones are:

In [68]:
nx.pagerank(G) #RAW OUTPUT

{'Addam-Marbrand': 1.1530906738907205,
 'Aegon-I-Targaryen': 1.340136593459244,
 'Aemon-Targaryen-(Maester-Aemon)': 3.682016526121894,
 'Aerys-II-Targaryen': 2.8191940473302197,
 'Aggo': 1.6893371647366078,
 'Albett': 0.4097704635215549,
 'Alliser-Thorne': 1.8608571494019561,
 'Alyn': 0.530801021036803,
 'Arthur-Dayne': 0.9644833384095991,
 'Arya-Stark': 11.692111895814374,
 'Arys-Oakheart': 0.9630584521588099,
 'Balon-Greyjoy': 2.188368580915064,
 'Balon-Swann': 1.6930206156822265,
 'Barristan-Selmy': 4.549115194358746,
 'Benjen-Stark': 1.7986023397149324,
 'Beric-Dondarrion': 1.981855961227795,
 'Boros-Blount': 1.8226806988970878,
 'Bowen-Marsh': 2.1303404675966933,
 'Bran-Stark': 8.087343455757251,
 'Brandon-Stark': 0.8709915236288859,
 'Bronn': 2.959527472498035,
 'Brynden-Tully': 2.4279120011438535,
 'Catelyn-Stark': 10.61921863453427,
 'Cayn': 0.3781995114182133,
 'Cersei-Lannister': 13.402380343756127,
 'Chella': 0.42373295618974444,
 'Chett': 0.9191578976190683,
 'Chiggen': 0.3

In [69]:
# the most influential characters
response = nx.pagerank(G)
sorted_pagerank = sorted(response.items(), key=lambda x: x[1], reverse=True)
for character, score in sorted_pagerank[:10]:
    print(character, score)

Jon-Snow 17.596909502152542
Tyrion-Lannister 17.5681362401036
Jaime-Lannister 13.925499363526765
Cersei-Lannister 13.402380343756127
Daenerys-Targaryen 12.499217151004377
Stannis-Baratheon 12.150398137088176
Arya-Stark 11.692111895814374
Robb-Stark 11.27772586147388
Eddard-Stark 10.683881511879148
Catelyn-Stark 10.61921863453427
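
To get a feel for what PageRank computes, here is a minimal pure-Python sketch of the textbook power iteration on a tiny toy graph. It is not how nxneo4j or Neo4j implement it: the function name, the `damping` and `iterations` parameters, and the toy graph are all assumptions made for illustration. This version normalises scores to sum to 1, whereas the scores above clearly are not normalised that way, so only the ranking idea carries over, not the absolute values.

```python
def pagerank(adjacency, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank on an undirected graph.

    `adjacency` maps each node to the set of its neighbours.
    Every node is assumed to have at least one neighbour.
    """
    nodes = list(adjacency)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbour shares its current rank equally among its own links.
            incoming = sum(rank[nbr] / len(adjacency[nbr]) for nbr in adjacency[node])
            new_rank[node] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


# Toy star graph (illustrative only): "A" is linked to everyone else.
toy = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A"},
}
scores = pagerank(toy)
for node, score in sorted(scores.items(), key=lambda x: x[1], reverse=True):
    print(node, round(score, 3))
```

The hub "A" comes out on top, for the same reason heavily connected characters dominate the list above.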


### Betweenness centrality

We can also run betweenness centrality over the dataset. This algorithm tells us which nodes are the most 'pivotal', i.e. how many of the shortest paths between pairs of characters pass through them.

In [70]:
# Betweenness centrality
nx.betweenness_centrality(G) #RAW OUTPUT

{'Addam-Marbrand': 89.57559107884312,
 'Aegon-Frey-(son-of-Stevron)': 0.0,
 'Aegon-I-Targaryen': 2070.1848929545354,
 'Aegon-Targaryen-(son-of-Rhaegar)': 1835.9956036694534,
 'Aegon-V-Targaryen': 0.0,
 'Aemon-Targaryen-(Dragonknight)': 0.0,
 'Aemon-Targaryen-(Maester-Aemon)': 5819.585548671321,
 'Aenys-Frey': 235.23170117059948,
 'Aeron-Greyjoy': 4261.08881136204,
 'Aerys-I-Targaryen': 0.0,
 'Aerys-II-Targaryen': 4417.767685936323,
 'Aggar': 2453.780691223792,
 'Aggo': 15.923235574445172,
 'Alayaya': 793.9999999999998,
 'Albett': 0.0,
 'Alebelly': 1.1950938359257544,
 'Alerie-Hightower': 0.0,
 'Alester-Florent': 94.76581230857614,
 'Alla-Tyrell': 2.3027411155635207,
 'Allar-Deem': 0.0,
 'Allard-Seaworth': 2.6523809523809523,
 'Alleras': 4477.456923010096,
 'Alliser-Thorne': 490.7662109802396,
 'Alyn': 3.2203780535884383,
 'Alys-Arryn': 794.0,
 'Alys-Karstark': 0.0,
 'Alysane-Mormont': 0.0,
 'Amabel': 0.0,
 'Amerei-Frey': 0.6072261072261071,
 'Amory-Lorch': 191.22169415616713,
 'Anders-

In [71]:
# RANKED OUTPUT
response = nx.betweenness_centrality(G)

sorted_bw = sorted(response.items(), key=lambda x: x[1], reverse=True)
for character, score in sorted_bw[:10]:
    print(character, score)

Jon-Snow 65395.26787165435
Tyrion-Lannister 50202.17398521848
Daenerys-Targaryen 39636.77718662115
Stannis-Baratheon 35984.21182863313
Theon-Greyjoy 35436.85268519103
Jaime-Lannister 32122.976615424588
Robert-Baratheon 31391.065251945023
Arya-Stark 29342.15853062157
Cersei-Lannister 28274.91542663558
Eddard-Stark 26470.250249098248


### Closeness centrality

Closeness centrality measures how close each character is, on average, to every other character. The score is the inverse of the average number of hops, so a higher score means fewer hops.

In [72]:
# Closeness centrality
nx.closeness_centrality(G) #RAW OUTPUT

{'Addam-Marbrand': 0.34580252283601565,
 'Aegon-Frey-(son-of-Stevron)': 0.3136094674556213,
 'Aegon-I-Targaryen': 0.3616924476797088,
 'Aegon-Targaryen-(son-of-Rhaegar)': 0.34222987516142916,
 'Aegon-V-Targaryen': 0.2626362735381566,
 'Aemon-Targaryen-(Dragonknight)': 0.31139835487661577,
 'Aemon-Targaryen-(Maester-Aemon)': 0.3560232870577698,
 'Aenys-Frey': 0.31812725090036015,
 'Aeron-Greyjoy': 0.31299212598425197,
 'Aerys-I-Targaryen': 0.31485148514851485,
 'Aerys-II-Targaryen': 0.3663594470046083,
 'Aggar': 0.26903553299492383,
 'Aggo': 0.2951002227171492,
 'Alayaya': 0.3294653957728968,
 'Albett': 0.3084982537834691,
 'Alebelly': 0.29597915115413254,
 'Alerie-Hightower': 0.28191489361702127,
 'Alester-Florent': 0.31812725090036015,
 'Alla-Tyrell': 0.30424799081515497,
 'Allar-Deem': 0.3272951832029642,
 'Allard-Seaworth': 0.30970003895597975,
 'Alleras': 0.2729145211122554,
 'Alliser-Thorne': 0.363013698630137,
 'Alyn': 0.3270259152612094,
 'Alys-Arryn': 0.2771000348553503,
 'Alys

In [73]:
# RANKED
response = nx.closeness_centrality(G)

sorted_cc = sorted(response.items(), key=lambda x: x[1], reverse=True)
for character, score in sorted_cc[:10]:
    print(character, score)

Tyrion-Lannister 0.4763331336129419
Robert-Baratheon 0.4592720970537262
Eddard-Stark 0.455848623853211
Cersei-Lannister 0.45454545454545453
Jaime-Lannister 0.4519613416714042
Jon-Snow 0.44537815126050423
Stannis-Baratheon 0.4446308724832215
Robb-Stark 0.4441340782122905
Joffrey-Baratheon 0.4339519650655022
Catelyn-Stark 0.4334787350054526


## 2. Community Detection Algorithms

### Label Propagation
We can also partition the characters into communities using the label propagation algorithm:

In [74]:
# Label propagation
nx.label_propagation_communities(G) #RAW OUTPUT is a generator

<generator object label_propagation_communities at 0x110d3f550>

In [75]:
communities = nx.label_propagation_communities(G)
sorted_communities = sorted(communities, key=lambda x: len(x), reverse=True)
for community in sorted_communities[:10]:
    print(list(community)[:10])

['Jared-Frey', 'Walder-Frey-(son-of-Merrett)', 'Bellegere-Otherys', 'Cressen', 'Stafford-Lannister', 'Mya-Stone', 'Moelle', 'Morgo', 'Rolph-Spicer', 'Jack-Be-Lucky']
['Raymun-Redbeard', 'Grigg', 'Cotter-Pyke', 'Errok', 'Val', 'Ottyn-Wythers', 'Marwyn', 'Lark', 'Aegon-V-Targaryen', 'Dalbridge']
['Marselen', 'Old-Bill-Bone', 'Barristan-Selmy', 'Qotho', 'Rhaenys-Targaryen-(daughter-of-Rhaegar)', 'Pono', 'Jhiqui', 'Pyat-Pree', 'Caggo', 'Hizdahr-zo-Loraq']


Characters are in the same community as those other characters with whom they frequently interact. The idea is that characters have closer ties to those in their community than to those outside.
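
The idea behind label propagation can be sketched in a few lines of plain Python: every node starts with its own label and repeatedly adopts the label most common among its neighbours. This is a simplified sketch, not the Neo4j implementation. In particular, ties are broken here by taking the largest label purely to keep the example deterministic (real implementations usually break ties at random), and the function name, `max_rounds`, and the toy graph are assumptions.

```python
from collections import Counter


def label_propagation(adjacency, max_rounds=20):
    """Simplified label propagation on {node: set(neighbours)}; returns a list of sets."""
    labels = {node: node for node in adjacency}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adjacency):
            counts = Counter(labels[nbr] for nbr in adjacency[node])
            if not counts:
                continue  # isolated node keeps its own label
            best = max(counts.values())
            # Deterministic tie-break for the sketch: largest label wins.
            new_label = max(label for label, count in counts.items() if count == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = {}
    for node, label in labels.items():
        communities.setdefault(label, set()).add(node)
    return list(communities.values())


# Two triangles joined by a single edge c - d (illustrative only).
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}
toy_communities = label_propagation(toy)
print(sorted(sorted(community) for community in toy_communities))
```

Each triangle ends up as its own community, because its members are more tightly tied to each other than to the other triangle.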



### Clustering
We can calculate the clustering coefficient for each character. A clustering coefficient of '1' means that all characters that interact with that character also interact with each other:

In [76]:
# Clustering
nx.clustering(G) #RAW OUTPUT

{'Desmond': 1.0,
 'Jommo': 1.0,
 'Jonos-Bracken': 1.0,
 'Joseth': 1.0,
 'Jyck': 1.0,
 'Morrec': 1.0,
 'Raymun-Darry': 1.0,
 'Will-(prologue)': 1.0,
 'Todder': 1.0,
 'Allar-Deem': 1.0,
 'Amabel': 1.0,
 'Beth-Cassel': 1.0,
 'Chayle': 1.0,
 'Cley-Cerwyn': 1.0,
 'Colen-of-Greenpools': 1.0,
 'Dalbridge': 1.0,
 'Drennan': 1.0,
 'Guyard-Morrigen': 1.0,
 'Jarman-Buckwell': 1.0,
 'Koss': 1.0,
 'Olyvar-Frey': 1.0,
 'Palla': 1.0,
 'Patchface': 1.0,
 'Perwyn-Frey': 1.0,
 'Pyat-Pree': 1.0,
 'Rhaenys-Targaryen': 1.0,
 'Walder-Frey-(son-of-Jammos)': 1.0,
 'Visenya-Targaryen': 1.0,
 'Hayhead': 1.0,
 'Marya-Seaworth': 1.0,
 'Harra': 1.0,
 'Osfryd-Kettleblack': 1.0,
 'Urzen': 1.0,
 'Steffon-Baratheon': 1.0,
 'Werlag': 1.0,
 'Bannen': 1.0,
 'Big-Boil': 1.0,
 'Brella': 1.0,
 'Butterbumps': 1.0,
 'Dick-Follard': 1.0,
 'Dirk': 1.0,
 'Gendel': 1.0,
 'Gorne': 1.0,
 'Husband': 1.0,
 'Jon-Umber-(Smalljon)': 1.0,
 'Jonothor-Darry': 1.0,
 'Joramun': 1.0,
 'Orell': 1.0,
 'Oswell-Kettleblack': 1.0,
 'Prendahl-na-Gh

In [77]:
response = nx.clustering(G)

biggest_coefficient = sorted(response.items(), key=lambda x: x[1], reverse=True)
for character, coefficient in biggest_coefficient[:10]:
    print(character, coefficient)

Desmond 1.0
Jommo 1.0
Jonos-Bracken 1.0
Joseth 1.0
Jyck 1.0
Morrec 1.0
Raymun-Darry 1.0
Will-(prologue) 1.0
Todder 1.0
Allar-Deem 1.0
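
For intuition, the clustering coefficient of a node is the number of triangles it sits in divided by the number of neighbour pairs it has. The sketch below computes both quantities in plain Python on a toy graph; it is an illustration of the definition, not the library's code, and the function names simply mirror the ones used in this notebook.

```python
from itertools import combinations


def triangles(adjacency):
    """Count, for each node, the pairs of its neighbours that are themselves connected."""
    return {
        node: sum(1 for u, v in combinations(nbrs, 2) if v in adjacency[u])
        for node, nbrs in adjacency.items()
    }


def clustering(adjacency):
    """Local clustering coefficient: triangles / possible neighbour pairs."""
    tri = triangles(adjacency)
    result = {}
    for node, nbrs in adjacency.items():
        k = len(nbrs)
        possible = k * (k - 1) / 2
        # Nodes with fewer than two neighbours cannot form a triangle.
        result[node] = tri[node] / possible if possible else 0.0
    return result


# Toy graph: triangle A-B-C plus a pendant node D attached to C (illustrative only).
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
tri_counts = triangles(toy)
coefficients = clustering(toy)
print(tri_counts)
print(coefficients)
```

"A" and "B" score 1.0 because their only two neighbours know each other, which is the same situation as the minor characters at the top of the list above.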


In [78]:
# Connected components
list(nx.connected_components(G))

[{'Addam-Marbrand',
  'Aegon-Frey-(son-of-Stevron)',
  'Aegon-I-Targaryen',
  'Aegon-Targaryen-(son-of-Rhaegar)',
  'Aegon-V-Targaryen',
  'Aemon-Targaryen-(Dragonknight)',
  'Aemon-Targaryen-(Maester-Aemon)',
  'Aenys-Frey',
  'Aeron-Greyjoy',
  'Aerys-I-Targaryen',
  'Aerys-II-Targaryen',
  'Aggar',
  'Aggo',
  'Alayaya',
  'Albett',
  'Alebelly',
  'Alerie-Hightower',
  'Alester-Florent',
  'Alla-Tyrell',
  'Allar-Deem',
  'Allard-Seaworth',
  'Alleras',
  'Alliser-Thorne',
  'Alyn',
  'Alys-Arryn',
  'Alys-Karstark',
  'Alysane-Mormont',
  'Amabel',
  'Amerei-Frey',
  'Amory-Lorch',
  'Anders-Yronwood',
  'Andrew-Estermont',
  'Andrey-Dalt',
  'Anguy',
  'Anya-Waynwood',
  'Archibald-Yronwood',
  'Ardrian-Celtigar',
  'Areo-Hotah',
  'Arianne-Martell',
  'Armen',
  'Arnolf-Karstark',
  'Aron-Santagar',
  'Arron',
  'Arson',
  'Arstan',
  'Arthor-Karstark',
  'Arthur-Dayne',
  'Arwyn-Oakheart',
  'Arya-Stark',
  'Arys-Oakheart',
  'Asha-Greyjoy',
  'Ashara-Dayne',
  'Aurane-Waters',

In [79]:
nx.number_connected_components(G)

1
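
A result of 1 means every character can reach every other character through some chain of interactions. As a sketch of what "connected component" means, the plain-Python generator below groups nodes by reachability using a breadth-first search. The toy graph is an assumption, and this is not the Neo4j implementation.

```python
from collections import deque


def connected_components(adjacency):
    """Yield each connected component of {node: set(neighbours)} as a set of nodes."""
    seen = set()
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in component:
                    component.add(nbr)
                    queue.append(nbr)
        seen |= component
        yield component


# Toy graph with two linked pairs and one isolated node (illustrative only).
toy = {1: {2}, 2: {1}, 3: {4}, 4: {3}, 5: set()}
components = list(connected_components(toy))
print(components)
print(len(components))
```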

In [80]:
# Number of triangles each character belongs to
nx.triangles(G) #RAW OUTPUT

{'Addam-Marbrand': 34,
 'Aegon-Frey-(son-of-Stevron)': 5,
 'Aegon-I-Targaryen': 5,
 'Aegon-Targaryen-(son-of-Rhaegar)': 24,
 'Aegon-V-Targaryen': 0,
 'Aemon-Targaryen-(Dragonknight)': 0,
 'Aemon-Targaryen-(Maester-Aemon)': 66,
 'Aenys-Frey': 3,
 'Aeron-Greyjoy': 18,
 'Aerys-I-Targaryen': 0,
 'Aerys-II-Targaryen': 47,
 'Aggar': 0,
 'Aggo': 26,
 'Alayaya': 2,
 'Albett': 2,
 'Alebelly': 3,
 'Alerie-Hightower': 0,
 'Alester-Florent': 12,
 'Alla-Tyrell': 5,
 'Allar-Deem': 1,
 'Allard-Seaworth': 2,
 'Alleras': 10,
 'Alliser-Thorne': 43,
 'Alyn': 4,
 'Alys-Arryn': 0,
 'Alys-Karstark': 3,
 'Alysane-Mormont': 3,
 'Amabel': 1,
 'Amerei-Frey': 2,
 'Amory-Lorch': 84,
 'Anders-Yronwood': 0,
 'Andrew-Estermont': 1,
 'Andrey-Dalt': 9,
 'Anguy': 17,
 'Anya-Waynwood': 10,
 'Archibald-Yronwood': 4,
 'Ardrian-Celtigar': 0,
 'Areo-Hotah': 11,
 'Arianne-Martell': 24,
 'Armen': 9,
 'Arnolf-Karstark': 2,
 'Aron-Santagar': 0,
 'Arron': 3,
 'Arson': 0,
 'Arstan': 12,
 'Arthor-Karstark': 0,
 'Arthur-Dayne': 3,


## 3. Path Finding Algorithms

Let's find the shortest path between two characters:

In [81]:
# Shortest path
nx.shortest_path(G, source="Tyrion-Lannister", target="Hodor")

['Tyrion-Lannister', 'Luwin', 'Hodor']

In [82]:
# Shortest weighted path
nx.shortest_weighted_path(G, source="Tyrion-Lannister", target="Hodor", weight='weight')

['Tyrion-Lannister', 'Theon-Greyjoy', 'Wyman-Manderly', 'Hodor']
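
The two calls above return different routes because one minimises the number of hops and the other minimises the summed edge weights. Here is a small self-contained sketch of that difference, using breadth-first search for the unweighted case and Dijkstra's algorithm for the weighted one. The toy graph, its weights, and the `_walk_back` helper are hypothetical, and this is not a claim about how Neo4j computes its paths.

```python
import heapq
from collections import deque


def _walk_back(previous, target):
    """Rebuild the path from the `previous` links; None if the target was never reached."""
    if target not in previous:
        return None
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


def shortest_path(graph, source, target):
    """Fewest-hops path via breadth-first search (edge weights are ignored)."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nbr in graph[node]:
            if nbr not in previous:
                previous[nbr] = node
                queue.append(nbr)
    return _walk_back(previous, target)


def shortest_weighted_path(graph, source, target):
    """Lowest-total-weight path via Dijkstra's algorithm (weights must be non-negative)."""
    best = {source: 0}
    previous = {source: None}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best[node]:
            continue  # stale heap entry
        for nbr, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                previous[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    return _walk_back(previous, target)


# Toy weighted graph {node: {neighbour: weight}} (illustrative only).
toy = {
    "A": {"B": 10, "C": 1},
    "B": {"A": 10, "D": 10},
    "C": {"A": 1, "E": 1},
    "D": {"B": 10, "E": 1},
    "E": {"C": 1, "D": 1},
}
hops_path = shortest_path(toy, "A", "D")
weighted_path = shortest_weighted_path(toy, "A", "D")
print(hops_path)
print(weighted_path)
```

The hop-count route takes the two expensive edges through "B", while the weighted route accepts one extra hop to go through the cheap edges.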