# Examples

In this notebook we'll look at some examples of how to use networkx-neo4j.

First let's import the required libraries:

In [2]:
# pip install git+https://github.com/neo4j-graph-analytics/networkx-neo4j.git#egg=networkx-neo4j

from neo4j import GraphDatabase
import nxneo4j

Next we'll create an instance of the Neo4j driver:

In [3]:
# You'll need to change these credentials to point to your own Neo4j Server
driver = GraphDatabase.driver("bolt://localhost", auth=("neo4j", "neo"))

## Importing the Graph of Thrones

Before we run any algorithms we'll first import a Game of Thrones dataset. 
This dataset was curated by [Dr Andrew Beveridge](https://twitter.com/mathbeveridge?lang=en).

In [4]:
with driver.session() as session:
    session.run("""\
    CREATE CONSTRAINT ON (c:Character)
    ASSERT c.name IS UNIQUE
    """)
    
    session.run("""\
    LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/mathbeveridge/asoiaf/master/data/asoiaf-book1-edges.csv" AS row
    MERGE (src:Character {name: row.Source})
    MERGE (tgt:Character {name: row.Target})
    // relationship for the book
    MERGE (src)-[r:INTERACTS1]->(tgt)
    ON CREATE SET r.weight = toInteger(row.weight), r.book=1
    """)
    
    session.run("""\
    LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/mathbeveridge/asoiaf/master/data/asoiaf-book2-edges.csv" AS row
    MERGE (src:Character {name: row.Source})
    MERGE (tgt:Character {name: row.Target})
    // relationship for the book
    MERGE (src)-[r:INTERACTS2]->(tgt)
    ON CREATE SET r.weight = toInteger(row.weight), r.book=2
    """)  
    
    session.run("""\
    LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/mathbeveridge/asoiaf/master/data/asoiaf-book3-edges.csv" AS row
    MERGE (src:Character {name: row.Source})
    MERGE (tgt:Character {name: row.Target})
    // relationship for the book
    MERGE (src)-[r:INTERACTS3]->(tgt)
    ON CREATE SET r.weight = toInteger(row.weight), r.book=3
    """)       
    
    session.run("""\
    LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/mathbeveridge/asoiaf/master/data/asoiaf-book45-edges.csv" AS row
    MERGE (src:Character {name: row.Source})
    MERGE (tgt:Character {name: row.Target})
    // relationship for the book
    MERGE (src)-[r:INTERACTS45]->(tgt)
    ON CREATE SET r.weight = toInteger(row.weight), r.book=45
    """)           

## Configuring our graph

Next we’re going to create a map explaining the node labels, relationship types, and properties used in the Graph of Thrones.

In [5]:
config = {
    "node_label": "Character",
    "relationship_type": '*',
    "identifier_property": "name"
}
G = nxneo4j.Graph(driver, config)

We set:

* `node_label` to `Character` so that we’ll only consider nodes with that label
* `relationship_type` to `*` so that we’ll consider all relationship types in the graph
* `identifier_property` to `name`, the node property that we’ll use to identify each node from the networkx-neo4j API

## Centrality

Let's take a look at the centrality algorithms we have available to us.

### PageRank

We’ll start with the famous PageRank algorithm. Let’s find out who the most influential characters in Game of Thrones are:

In [6]:
sorted_pagerank = sorted(nxneo4j.centrality.pagerank(G).items(), key=lambda x: x[1], reverse=True)
for character, score in sorted_pagerank[:10]:
    print(character, score)

Jon-Snow 17.596909995401738
Tyrion-Lannister 17.568136318535014
Jaime-Lannister 13.925499774190643
Cersei-Lannister 13.402380328492281
Daenerys-Targaryen 12.499216940395238
Stannis-Baratheon 12.15039828506236
Arya-Stark 11.692111871446901
Robb-Stark 11.277726159237629
Eddard-Stark 10.683881524590705
Catelyn-Stark 10.619218655788481


Hopefully there aren’t too many surprises there!
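
To give a feel for what PageRank computes, here is a minimal pure-Python sketch of the textbook power-iteration formulation on a hand-made toy graph. It is not the code that nxneo4j or Neo4j runs, and the damping factor and iteration count are assumed defaults. The scores here sum to 1, whereas the output above is on a different scale, so only the ranking is comparable.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict mapping node -> list of out-neighbours."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every node shares its current rank equally among its out-neighbours.
        for node in nodes:
            for neighbour in graph[node]:
                new_rank[neighbour] += damping * rank[node] / len(graph[node])
        rank = new_rank
    return rank


# Toy directed graph (made up for illustration): everyone points at "Hub".
toy = {"Hub": ["A"], "A": ["Hub"], "B": ["Hub"], "C": ["Hub"]}
scores = pagerank(toy)
for name, score in sorted(scores.items(), key=lambda x: x[1], reverse=True):
    print(name, round(score, 3))
```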

### Betweenness centrality

We can also run betweenness centrality over the dataset. This algorithm tells us which nodes are the most 'pivotal', i.e. how many of the shortest paths between pairs of characters pass through them.

In [32]:
sorted_bw = sorted(nxneo4j.centrality.betweenness_centrality(G).items(), key=lambda x: x[1], reverse=True)
for character, score in sorted_bw[:10]:
    print(character, score)

Jon-Snow 65578.93481374449
Tyrion-Lannister 50138.40835951188
Daenerys-Targaryen 39552.37894787023
Stannis-Baratheon 35867.917597987005
Theon-Greyjoy 35456.444688703545
Jaime-Lannister 32234.162314740188
Robert-Baratheon 31530.934471037195
Arya-Stark 29239.81618950015
Cersei-Lannister 28193.900581570768
Eddard-Stark 26445.10175634546
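
As an illustration of how such scores can be computed, the following sketch implements Brandes' algorithm in plain Python, assuming an unweighted, undirected graph and no normalisation. It runs on a made-up toy graph and is not the implementation that nxneo4j calls.

```python
from collections import deque


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph (node -> set of neighbours)."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        predecessors = {node: [] for node in graph}
        paths = {node: 0 for node in graph}
        paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in graph[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    paths[neighbour] += paths[node]
                    predecessors[neighbour].append(node)
        # Walk back from the farthest nodes, accumulating dependencies.
        dependency = {node: 0.0 for node in graph}
        for node in reversed(order):
            for pred in predecessors[node]:
                dependency[pred] += paths[pred] / paths[node] * (1 + dependency[node])
            if node != source:
                score[node] += dependency[node]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}


# Toy path graph A - B - C - D: only the two middle nodes lie between other nodes.
toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
result = betweenness(toy)
print(result)
```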


### Closeness centrality

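Closeness centrality measures how close each character is, on average, to every other character. The score is the inverse of the average number of hops to the other characters, so a higher score means a shorter average distance.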

In [37]:
sorted_cc = sorted(nxneo4j.centrality.closeness_centrality(G, wf_improved=False).items(), key=lambda x: x[1], reverse=True)
for character, score in sorted_cc[:10]:
    print(character, score)

Tyrion-Lannister 0.4763331336129419
Robert-Baratheon 0.4592720970537262
Eddard-Stark 0.455848623853211
Cersei-Lannister 0.45454545454545453
Jaime-Lannister 0.4519613416714042
Jon-Snow 0.44537815126050423
Stannis-Baratheon 0.4446308724832215
Robb-Stark 0.4441340782122905
Joffrey-Baratheon 0.4339519650655022
Catelyn-Stark 0.4334787350054526


Again we see the usual suspects ranking highly on this metric of centrality.
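
A small sketch of the calculation may help. It assumes the common definition without the Wasserman-Faust scaling: the number of reachable nodes divided by the sum of hop counts to them. The toy graph is made up, and this is not nxneo4j's own code.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop counts from source to every reachable node (including itself, at 0)."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def closeness(graph, node):
    """(number of other reachable nodes) / (sum of hop counts to them); 0.0 if isolated."""
    distance = bfs_distances(graph, node)
    total = sum(distance.values())
    return (len(distance) - 1) / total if total else 0.0


# Toy path graph A - B - C - D: the middle nodes are closer to everyone else.
toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
for name in sorted(toy):
    print(name, closeness(toy, name))
```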

### Harmonic centrality

Harmonic centrality is similar to closeness centrality but uses a slightly different scoring mechanism designed to handle disconnected components more cleanly.

In [42]:
sorted_hc = sorted(nxneo4j.centrality.harmonic_centrality(G).items(), key=lambda x: x[1], reverse=True)
for character, score in sorted_hc[:10]:
    print(character, score)

Tyrion-Lannister 0.5383647798742138
Jon-Snow 0.5130817610062893
Cersei-Lannister 0.5121593291404611
Jaime-Lannister 0.5115723270440251
Eddard-Stark 0.5029350104821804
Stannis-Baratheon 0.5000628930817609
Robert-Baratheon 0.49976939203354304
Robb-Stark 0.49392033542976943
Arya-Stark 0.4883857442348009
Catelyn-Stark 0.48727463312368974
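
The sketch below shows one way harmonic centrality can be computed: sum the reciprocals of the hop counts to every other reachable node, so unreachable nodes simply contribute nothing. Dividing by the number of other nodes is an assumption made here to keep scores between 0 and 1; the toy graph is made up and this is not nxneo4j's own code.

```python
from collections import deque


def harmonic(graph, node):
    """Sum of 1/distance to every other reachable node, divided by (n - 1)."""
    if len(graph) < 2:
        return 0.0
    distance = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    # Nodes in other components never appear in `distance`, so they add 0.
    total = sum(1.0 / d for d in distance.values() if d > 0)
    return total / (len(graph) - 1)


# Two components: the path A - B - C and the separate pair D - E.
toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": {"E"}, "E": {"D"}}
for name in sorted(toy):
    print(name, harmonic(toy, name))
```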


## Pathfinding 

Our next category of algorithms is used for pathfinding.

### Shortest Path

What if we want to find the shortest path between two characters?

In [27]:
nxneo4j.path_finding.shortest_path(G, "Tyrion-Lannister", "Hodor")

['Tyrion-Lannister', 'Bran-Stark', 'Hodor']

Notice that we refer to nodes by their `name` property. This is where the `identifier_property` that we defined in our config map is used.
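
For intuition, a fewest-hops path in an unweighted graph can be found with a breadth-first search that remembers each node's parent. The sketch below assumes hop count is the only cost and uses a made-up toy graph; the algorithm Neo4j runs may differ, for example by taking weights into account.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Return one fewest-hops path from source to target, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Follow parent pointers back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


# Toy graph with two routes from A to C: A-B-C (2 hops) and A-D-E-C (3 hops).
toy = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B", "E"],
    "D": ["A", "E"],
    "E": ["D", "C"],
    "F": [],
}
print(shortest_path(toy, "A", "C"))
```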

## Community Detection

### Label Propagation

We can also partition the characters into communities using the label propagation algorithm:

In [28]:
communities = nxneo4j.community.label_propagation_communities(G)
sorted_communities = sorted(communities, key=lambda x: len(x), reverse=True)
for community in sorted_communities[:10]:
    print(list(community)[:10])

['Varys', 'Shagga', 'Orton-Merryweather', 'Rolph-Spicer', 'Aemon-Targaryen-(Dragonknight)', 'Garrett-Paege', 'Shitmouth', 'Daven-Lannister', 'Garth-(Wolfs-Den)', 'Leona-Woolfield']
['Gerold-Dayne', 'Quhuru-Mo', 'Godric-Borrell', 'Devan-Seaworth', 'Marwyn', 'Gilly', 'Khorane-Sathmantes', 'Xhondo', 'Alester-Florent', 'Arthor-Karstark']
['Tycho-Nestoris', 'Arron', 'Maekar-I-Targaryen', 'Satin', 'Ulmer', 'Brandon-Norrey', 'Raymun-Redbeard', 'Craster', 'Bedwyck', 'Wynton-Stout']
['Alayaya', 'Allar-Deem', 'Cedric-Payne', 'Morgo', 'Eustace-Brune', 'Ballabar', 'Benerro', 'Vylarr', 'Jon-Connington', 'Podrick-Payne']
['Walder-Frey-(son-of-Merrett)', 'Squirrel', 'Barbrey-Dustin', 'Kyra', 'Sybelle-Glover', 'Sour-Alyn', 'Dagmer', 'Tristifer-Botley', 'Mikken', 'Stygg']
['Rodrik-Harlaw', 'Rodrik-Sparr', 'Talbert-Serry', 'Gormond-Goodbrother', 'Moqorro', 'Balon-Greyjoy', 'Victarion-Greyjoy', 'Gorold-Goodbrother', 'Baelor-Blacktyde', 'Hotho-Harlaw']
['Tomard', 'Mya-Stone', 'Petyr-Baelish', 'Yohn-Royce'

Label propagation places characters in the same community as the characters they interact with most often. The idea is that characters have closer ties to those in their community than to those outside it.
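
The core loop of label propagation is short enough to sketch. In this simplified version every node starts with its own label and repeatedly adopts the most common label among its neighbours. Ties are broken by picking the smallest label so that the result is reproducible; that choice is an assumption made for this sketch, since practical implementations usually break ties randomly and can return different communities on each run.

```python
from collections import Counter


def label_propagation(graph, max_rounds=20):
    """Each node repeatedly adopts the most common label among its neighbours."""
    labels = {node: node for node in graph}  # every node starts in its own community
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[neighbour] for neighbour in graph[node])
            best = max(counts.values())
            # Deterministic tie-break: smallest label among the most common ones.
            new_label = min(label for label, count in counts.items() if count == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = {}
    for node, label in labels.items():
        communities.setdefault(label, set()).add(node)
    return list(communities.values())


# Toy graph (made up): two separate triangles.
toy = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"},
    "D": {"E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
communities = label_propagation(toy)
print(sorted(sorted(c) for c in communities))
```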

### Number of connected components

We can work out the number of connected components (via the Union Find algorithm):

In [44]:
nxneo4j.community.number_connected_components(G)

149
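
The Union Find idea mentioned above can be sketched in a few lines: every node starts as its own root, and each edge merges the two roots it connects. This is a generic illustration on a made-up toy graph, not the code Neo4j runs.

```python
def connected_components(nodes, edges):
    """Group nodes into connected components with a union-find structure."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union: attach one root under the other

    components = {}
    for node in nodes:
        components.setdefault(find(node), set()).add(node)
    return list(components.values())


nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("D", "E")]
components = connected_components(nodes, edges)
print(len(components))
```

The number of connected components is then just the length of the returned list.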

### Connected Components

And we can find the characters in each of those components:

In [49]:
components = nxneo4j.community.connected_components(G)
sorted_components = sorted(components, key=lambda x: len(x), reverse=True)
for component in sorted_components[:10]:
    print(list(component)[:10])

['Varys', 'Shagga', 'Orton-Merryweather', 'Rolph-Spicer', 'Aemon-Targaryen-(Dragonknight)', 'Garrett-Paege', 'Shitmouth', 'Daven-Lannister', 'Rafford', 'Garth-(Wolfs-Den)']
['Gerold-Dayne', 'Quhuru-Mo', 'Godric-Borrell', 'Devan-Seaworth', 'Marwyn', 'Gilly', 'Khorane-Sathmantes', 'Xhondo', 'Alester-Florent', 'Arthor-Karstark']
['Alayaya', 'Allar-Deem', 'Cedric-Payne', 'Morgo', 'Eustace-Brune', 'Ballabar', 'Benerro', 'Vylarr', 'Jon-Connington', 'Podrick-Payne']
['Tycho-Nestoris', 'Arron', 'Maekar-I-Targaryen', 'Satin', 'Ulmer', 'Brandon-Norrey', 'Raymun-Redbeard', 'Craster', 'Bedwyck', 'Wynton-Stout']
['Walder-Frey-(son-of-Merrett)', 'Squirrel', 'Barbrey-Dustin', 'Kyra', 'Sybelle-Glover', 'Sour-Alyn', 'Dagmer', 'Tristifer-Botley', 'Mikken', 'Stygg']
['Rodrik-Harlaw', 'Rodrik-Sparr', 'Talbert-Serry', 'Gormond-Goodbrother', 'Moqorro', 'Balon-Greyjoy', 'Victarion-Greyjoy', 'Gorold-Goodbrother', 'Baelor-Blacktyde', 'Hotho-Harlaw']
['Tomard', 'Benedar-Belmore', 'Mya-Stone', 'Petyr-Baelish', '

### Clustering

We can calculate the clustering coefficient for each character.
A clustering coefficient of 1 means that all of the characters that interact with a given character also interact with each other:

In [59]:
biggest_coefficient = sorted(nxneo4j.community.clustering(G).items(), key=lambda x: x[1], reverse=True)
for character in biggest_coefficient[:10]:
    print(list(character)[:10])

['Albett', 1.0]
['Chella', 1.0]
['Chiggen', 1.0]
['Clement-Piper', 1.0]
['Cohollo', 1.0]
['Daryn-Hornwood', 1.0]
['Desmond', 1.0]
['Eon-Hunter', 1.0]
['Heward', 1.0]
['High-Septon-(fat_one)', 1.0]


In [58]:
smallest_coefficient = sorted(nxneo4j.community.clustering(G).items(), key=lambda x: x[1])
for character in smallest_coefficient[:10]:
    print(list(character)[:10])

['Dolf', 0.0]
['Fogo', 0.0]
['Gunthor-son-of-Gurn', 0.0]
['Hugh', 0.0]
['Jafer-Flowers', 0.0]
['Kurleket', 0.0]
['Leo-Lefford', 0.0]
['Mord', 0.0]
['Hali', 0.0]
['Rickard-Stark', 0.0]


### Average Clustering Coefficient

We can also work out the average clustering coefficient across all characters:

In [63]:
nxneo4j.community.average_clustering(G)

0.4858622073350485
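
To make the definition concrete, the sketch below computes the local clustering coefficient as the fraction of pairs of a node's neighbours that are themselves connected, and then averages it over all nodes. It assumes an undirected graph and assigns 0 to nodes with fewer than two neighbours; the toy graph is made up for illustration.

```python
from itertools import combinations


def clustering(graph, node):
    """Fraction of pairs of a node's neighbours that are themselves connected."""
    neighbours = graph[node]
    if len(neighbours) < 2:
        return 0.0  # no pairs of neighbours to compare
    pairs = list(combinations(neighbours, 2))
    linked = sum(1 for a, b in pairs if b in graph[a])
    return linked / len(pairs)


def average_clustering(graph):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(clustering(graph, node) for node in graph) / len(graph)


# Triangle A-B-C with a pendant node D attached to C.
toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
for name in sorted(toy):
    print(name, clustering(toy, name))
print(average_clustering(toy))
```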