# Applied Graph Algorithms

In [1]:
from neo4j.v1 import GraphDatabase, basic_auth
import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
import pandas as pd

In [2]:
driver = GraphDatabase.driver("bolt://localhost:7687", auth=basic_auth("neo4j", "neo"))

## Betweenness Centrality

Betweenness centrality identifies nodes that are strategically positioned in the network: nodes through which information often has to travel. In a social network, such an intermediary position gives a person power and influence.

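A node's betweenness centrality is the sum, over every pair of other nodes, of the fraction of shortest paths between that pair that pass through the node. When a pair is connected by several equally short paths, each path gets a share of the credit, which is why the scores below are not whole numbers. For example, if a node sits on a bottleneck between two large communities, most shortest paths between them run through it, so it will have high betweenness.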
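To make the definition concrete, here is a minimal pure-Python sketch of Brandes' algorithm, a standard way to compute betweenness for every node at once. It assumes an unweighted, undirected graph stored as a plain adjacency dict and counts each unordered pair of endpoints once; the Neo4j procedure may normalise or treat direction differently, so the numbers are illustrative rather than a reproduction of `algo.betweenness`. The helper names `undirected` and `betweenness` are our own.

```python
from collections import deque


def undirected(edges):
    """Build a symmetric adjacency dict from (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, []).append(u)
    return graph


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    Returns raw (unnormalised) scores, counting each unordered pair once.
    """
    scores = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and remembering each node's predecessors on those paths.
        stack = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        dist = dict.fromkeys(graph, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, splitting credit between
        # predecessors in proportion to their share of shortest paths.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                scores[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: score / 2 for v, score in scores.items()}


# Two triangles (a, b, c) and (d, e, f) joined only through node x.
graph = undirected([
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("c", "x"), ("x", "d"),
    ("d", "e"), ("e", "f"), ("d", "f"),
])
print(betweenness(graph))  # x: 9.0, c and d: 8.0, every other node: 0.0
```

Node `x` is the only bridge between the two triangles, so it lies on the shortest path of all 9 cross-triangle pairs. In a four-node cycle, by contrast, each pair of opposite corners has two equally short paths, and the credit is split 0.5 / 0.5, which is the same effect that produces the fractional scores in the Neo4j output.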

In [14]:
query = """\
CALL algo.betweenness.stream("Character", "INTERACTS1")
YIELD nodeId, centrality
MATCH (c:Character) WHERE ID(c) = nodeId
RETURN c.name, centrality
"""

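with driver.session() as session:
    result = session.run(query)
    # Build the DataFrame while the session is still open so that
    # all records are consumed before the session closes.
    df = pd.DataFrame([dict(record) for record in result])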

In [16]:
df.sort_values(by=["centrality"], ascending=False).head()

| | c.name | centrality |
|---|---|---|
| 750 | Tyrion-Lannister | 2682.658333 |
| 742 | Sansa-Stark | 2495.428571 |
| 736 | Robb-Stark | 1296.169048 |
| 656 | Eddard-Stark | 1058.45 |
| 751 | Tywin-Lannister | 929.833333 |


In [23]:
query = """\
CALL algo.betweenness.stream("Character", "INTERACTS1")
YIELD nodeId, centrality
MATCH (c:Character) WHERE ID(c) = nodeId
WITH c, centrality, [(c)-[r:INTERACTS1]-(other) | {character: other.name, weight: r.weight}] AS interactions
RETURN c.name, centrality, apoc.coll.sum([i in interactions | i.weight]) AS totalInteractions
"""

with driver.session() as session:
    df = pd.DataFrame([dict(record) for record in session.run(query)])  

In [27]:
df.sort_values(by=["centrality"], ascending=False).head(10)

| | c.name | centrality | totalInteractions |
|---|---|---|---|
| 750 | Tyrion-Lannister | 2682.658333 | 650.0 |
| 742 | Sansa-Stark | 2495.428571 | 545.0 |
| 736 | Robb-Stark | 1296.169048 | 516.0 |
| 656 | Eddard-Stark | 1058.45 | 1284.0 |
| 751 | Tywin-Lannister | 929.833333 | 181.0 |
| 738 | Robert-Baratheon | 792.791667 | 941.0 |
| 693 | Jon-Snow | 790.166667 | 784.0 |
| 690 | Joffrey-Baratheon | 537.45 | 422.0 |
| 732 | Renly-Baratheon | 446.308333 | 186.0 |
| 683 | Jaime-Lannister | 389.119048 | 241.0 |


In [26]:
df.sort_values(by=["totalInteractions"], ascending=False).head(10)

| | c.name | centrality | totalInteractions |
|---|---|---|---|
| 656 | Eddard-Stark | 1058.45 | 1284.0 |
| 738 | Robert-Baratheon | 792.791667 | 941.0 |
| 693 | Jon-Snow | 790.166667 | 784.0 |
| 750 | Tyrion-Lannister | 2682.658333 | 650.0 |
| 742 | Sansa-Stark | 2495.428571 | 545.0 |
| 632 | Bran-Stark | 22.5 | 531.0 |
| 636 | Catelyn-Stark | 172.042857 | 520.0 |
| 736 | Robb-Stark | 1296.169048 | 516.0 |
| 646 | Daenerys-Targaryen | 160.005952 | 443.0 |
| 623 | Arya-Stark | 0.0 | 430.0 |

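The two orderings disagree noticeably: Eddard Stark has the largest total interaction weight but ranks fourth by betweenness, while Arya Stark has an interaction weight of 430 and a betweenness of 0. One rough way to surface that gap is to divide betweenness by interaction weight. The sketch below does this in plain Python for the ten rows of the table above; this "brokerage ratio" is an ad hoc measure introduced here for illustration, not something the Graph Algorithms library computes.

```python
# The ten rows from the table above:
# (name, betweenness centrality, total interaction weight).
rows = [
    ("Eddard-Stark", 1058.45, 1284.0),
    ("Robert-Baratheon", 792.791667, 941.0),
    ("Jon-Snow", 790.166667, 784.0),
    ("Tyrion-Lannister", 2682.658333, 650.0),
    ("Sansa-Stark", 2495.428571, 545.0),
    ("Bran-Stark", 22.5, 531.0),
    ("Catelyn-Stark", 172.042857, 520.0),
    ("Robb-Stark", 1296.169048, 516.0),
    ("Daenerys-Targaryen", 160.005952, 443.0),
    ("Arya-Stark", 0.0, 430.0),
]

# Ad hoc ratio: betweenness per unit of interaction weight. A high value
# hints that a character bridges groups relative to how much they interact.
ranked = sorted(
    ((name, centrality / total) for name, centrality, total in rows),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, ratio in ranked:
    print(f"{name:<20} {ratio:.3f}")
```

Sansa Stark comes out on top (about 4.6) and Arya Stark at the bottom (0.000), while Eddard Stark, despite having the highest interaction weight, lands sixth of ten.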

## Storing betweenness centrality

Although the betweenness centrality algorithm runs very quickly on this dataset, we wouldn’t usually run this kind of algorithm in the normal request/response flow of a web or mobile app. Instead, we can store the result of the calculation as a property on each node and refer to it in later queries.

Each of the algorithms has a variant that saves its output to the database rather than returning a stream. Let’s run the betweenness centrality algorithm and store the result as a property named `book1BetweennessCentrality`:

In [30]:
query = """\
CALL algo.betweenness("Character", "INTERACTS1", {writeProperty: "book1BetweennessCentrality"})
"""
with driver.session() as session:
    session.run(query)

We can write the following query to find the most influential characters:

In [31]:
query = """\
MATCH (c:Character)
RETURN c.name, c.book1BetweennessCentrality AS centrality
"""

with driver.session() as session:
    df = pd.DataFrame([dict(record) for record in session.run(query)])  

In [33]:
df.sort_values(by=["centrality"], ascending=False).head()

Unnamed: 0,c.name,centrality
750,Tyrion-Lannister,2682.658333
742,Sansa-Stark,2495.428571
736,Robb-Stark,1296.169048
656,Eddard-Stark,1058.45
751,Tywin-Lannister,929.833333


## Exercise: Betweenness Centrality for books 2-5

Now we want to calculate the betweenness centrality for the other books in the series and store the results in the database.

* Write queries that call `algo.betweenness` for the `INTERACTS2`, `INTERACTS3`, and `INTERACTS45` relationship types.
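Because the three calls differ only in the relationship type and the property name, one option is to pass those values as Cypher parameters and loop over them. This is a sketch that assumes the `driver` created at the top of the notebook; the property names `book2BetweennessCentrality`, `book3BetweennessCentrality` and `book45BetweennessCentrality` are our own choice, extending the `book1BetweennessCentrality` convention, and `write_jobs` and `store_all` are hypothetical helper names.

```python
# Relationship type and write property are supplied as query parameters.
WRITE_QUERY = """\
CALL algo.betweenness("Character", $relType, {writeProperty: $writeProperty})
"""

# (relationship type, property to write). The property names are an
# assumption that extends the book1BetweennessCentrality naming scheme.
BOOKS = [
    ("INTERACTS2", "book2BetweennessCentrality"),
    ("INTERACTS3", "book3BetweennessCentrality"),
    ("INTERACTS45", "book45BetweennessCentrality"),
]


def write_jobs():
    """Return one (query, parameters) pair per book."""
    return [(WRITE_QUERY, {"relType": rel, "writeProperty": prop}) for rel, prop in BOOKS]


def store_all(driver):
    """Run every write query in a single session."""
    with driver.session() as session:
        for query, params in write_jobs():
            # consume() drains the result, so each write has finished
            # before the next procedure call is sent.
            session.run(query, params).consume()
```

Calling `store_all(driver)` once should leave each character node with one betweenness property per book, which the questions below can compare.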

After you’ve done that, see if you can write queries to answer the following questions:

* Which character had the biggest increase in influence from book 1 to 5?
* Which character had the biggest decrease?

Bonus question:

* Which characters who were in the top 10 influencers in book 1 are also in the top 10 influencers in book 5?
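To check your Cypher answers, the same comparisons can also be done client-side once the scores are in Python. This is a minimal sketch, assuming you have fetched two dicts mapping character name to score (for example the book 1 scores and the scores you stored for `INTERACTS45`); characters with no score in one of the books are treated as 0.0, which is an assumption you may want to revisit. All function names here are hypothetical.

```python
import math


def _score(value):
    """Treat a missing (None) or NaN score as 0.0."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return 0.0
    return value


def score_changes(before, after):
    """Change in score for every character that appears in either dict."""
    names = set(before) | set(after)
    return {name: _score(after.get(name)) - _score(before.get(name)) for name in names}


def biggest_increase_and_decrease(before, after):
    """Return ((name, change), (name, change)) for the largest rise and fall."""
    changes = score_changes(before, after)
    rise = max(changes.items(), key=lambda item: item[1])
    fall = min(changes.items(), key=lambda item: item[1])
    return rise, fall


def top_n(scores, n=10):
    """Names of the n highest-scoring characters."""
    ordered = sorted(scores.items(), key=lambda item: _score(item[1]), reverse=True)
    return {name for name, _ in ordered[:n]}


def shared_top_n(before, after, n=10):
    """Characters in the top n of both rankings, in alphabetical order."""
    return sorted(top_n(before, n) & top_n(after, n))
```

With a DataFrame like the one returned by the `In [31]` query, `dict(zip(df["c.name"], df["centrality"]))` gives the expected name-to-score mapping.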