# GDMA - Assignment 3

Author: Julian Schelb (1069967)

```python
from neo4j import GraphDatabase
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
```

### Connection to the database instance

```python
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "subatomic-shrank-Respond"))
session = driver.session()
```

### Question 1

Import the graph from the attached karate.csv file. Then, using functions from the Graph Data Science (GDS) library, write Cypher queries that compute the betweenness, closeness, and eigenvector centrality of each node. The full description of these functions is available at:

https://neo4j.com/docs/graph-data-science/current/algorithms/centrality/

**Import Data:**

```python
query = """
LOAD CSV FROM 'file:///karate.csv' AS row FIELDTERMINATOR ';'
WITH row[0] AS sourceId, row[1] AS targetId
MERGE (s:Node {id: sourceId})
MERGE (t:Node {id: targetId})
MERGE (s)-[:RELATED_TO]->(t)
"""

with driver.session() as session:
    # consume() waits for the write to finish and surfaces any errors here
    session.run(query).consume()
```
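
`MERGE` only creates a node or relationship when an identical pattern does not already exist, so repeated CSV rows are harmless. Because the relationship pattern is directed, however, a row `2;1` following `1;2` still creates a second relationship in the opposite direction. A minimal plain-Python sketch of that deduplication, assuming each line of karate.csv holds one `source;target` pair; `merge_edges` and the sample rows are hypothetical:

```python
def merge_edges(lines: list[str], directed: bool = True) -> set[tuple[str, ...]]:
    """Mimic repeated MERGE of (s)-[:RELATED_TO]->(t): each distinct edge is kept once."""
    edges: set[tuple[str, ...]] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        source, target = line.split(";")[:2]
        if directed:
            edges.add((source, target))
        else:
            # Treat (a, b) and (b, a) as the same undirected edge.
            edges.add(tuple(sorted((source, target))))
    return edges


rows = ["1;2", "1;2", "2;1", "1;3"]  # hypothetical sample rows
print(len(merge_edges(rows)))                  # 3 directed edges
print(len(merge_edges(rows, directed=False)))  # 2 undirected edges
```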

**Create In-Memory Graph Projection:**

```python
query = """
CALL gds.graph.project('tmpGraph', 'Node', 'RELATED_TO')
"""

with driver.session() as session:
    session.run(query).consume()
```
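
The projection above loads `RELATED_TO` with its stored direction (GDS's default `NATURAL` orientation), whereas Zachary's karate club is usually treated as an undirected network. On a directed graph, a node that only appears as a relationship target cannot reach any other node, which may explain the zero betweenness and closeness reported for node 1 in the results further below. GDS accepts a per-relationship orientation in the projection, for example `gds.graph.project('tmpGraph', 'Node', {RELATED_TO: {orientation: 'UNDIRECTED'}})`. The plain-Python sketch below only illustrates the reachability difference on a hypothetical three-node edge list; `reachable` is an illustrative helper, not part of GDS:

```python
from collections import deque


def reachable(edges: list[tuple[str, str]], start: str, directed: bool = True) -> set[str]:
    """Return the nodes reachable from `start` (excluding it) by breadth-first search."""
    neighbors: dict[str, set[str]] = {}
    for source, target in edges:
        neighbors.setdefault(source, set()).add(target)
        if not directed:
            neighbors.setdefault(target, set()).add(source)  # also walk edges backwards
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


edges = [("2", "1"), ("3", "1"), ("3", "2")]  # hypothetical edges
print(reachable(edges, "1"))                          # set(): node 1 is a sink
print(sorted(reachable(edges, "1", directed=False)))  # ['2', '3']
```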

**Betweenness Centrality:**

```python
query = """
CALL gds.betweenness.stream('tmpGraph')
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS n, score
SET n.betweenness = score
RETURN n.id, score
ORDER BY score DESC
"""

# Open a fresh session: earlier `with` blocks close the one bound to `session`
with driver.session() as session:
    dtf_data = pd.DataFrame([dict(record) for record in session.run(query)])
dtf_data.head()
```

|   | n.id | score |
|---|---|---|
| 0 | 3 | 8.833333 |
| 1 | 32 | 5.083333 |
| 2 | 9 | 2.25 |
| 3 | 29 | 2.166667 |
| 4 | 4 | 2.0 |
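
Betweenness sums, over every ordered pair of nodes, the fraction of shortest paths between them that pass through a given node. A compact plain-Python version of Brandes' algorithm for unweighted, directed graphs (no normalization) can be used to cross-check stream results by hand on small inputs. It is a sketch, not GDS's implementation, and whether its numbers agree with GDS depends on using the same orientation as the projection; `betweenness` and the diamond graph are hypothetical:

```python
from collections import deque


def betweenness(nodes: list[str], edges: list[tuple[str, str]]) -> dict[str, float]:
    """Brandes' algorithm for unweighted, directed graphs without parallel edges."""
    out: dict[str, list[str]] = {v: [] for v in nodes}
    for source, target in edges:
        out[source].append(target)
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack: list[str] = []
        preds: dict[str, list[str]] = {v: [] for v in nodes}
        sigma = {v: 0 for v in nodes}
        dist = {v: -1 for v in nodes}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in out[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in nodes}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Diamond 1->2->4 and 1->3->4: the two shortest 1->4 paths split the credit.
print(betweenness(["1", "2", "3", "4"], [("1", "2"), ("1", "3"), ("2", "4"), ("3", "4")]))
# {'1': 0.0, '2': 0.5, '3': 0.5, '4': 0.0}
```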


**Eigenvector Centrality:**

```python
query = """
CALL gds.eigenvector.stream('tmpGraph')
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS n, score
SET n.eigenvector = score
RETURN n.id, score
ORDER BY score DESC
"""

with driver.session() as session:
    dtf_data = pd.DataFrame([dict(record) for record in session.run(query)])
dtf_data.head()
```

|   | n.id | score |
|---|---|---|
| 0 | 34 | 0.993455 |
| 1 | 33 | 0.113378 |
| 2 | 14 | 0.006594 |
| 3 | 8 | 0.006594 |
| 4 | 13 | 0.006138 |
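
Eigenvector centrality gives each node a score proportional to the sum of its neighbors' scores, i.e. the leading eigenvector of the adjacency matrix A. A minimal power-iteration sketch, under two assumptions that differ from the directed projection above: the graph is treated as undirected, and the iteration uses A + I instead of A (a standard shift with the same leading eigenvector that avoids oscillation on bipartite graphs such as stars). The numbers therefore need not match GDS; `eigenvector_centrality` is a hypothetical helper:

```python
import math


def eigenvector_centrality(nodes: list[str], edges: list[tuple[str, str]],
                           iterations: int = 100) -> dict[str, float]:
    """Power iteration on A + I for an undirected graph, L2-normalized each step."""
    neighbors: dict[str, list[str]] = {v: [] for v in nodes}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    x = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # (A + I) x: each node keeps its own score plus the sum of its neighbors' scores.
        new = {v: x[v] + sum(x[u] for u in neighbors[v]) for v in nodes}
        norm = math.sqrt(sum(s * s for s in new.values()))
        x = {v: s / norm for v, s in new.items()}
    return x


# Star with center "0": the center scores 1/sqrt(2), each leaf 1/sqrt(6).
x = eigenvector_centrality(["0", "1", "2", "3"], [("0", "1"), ("0", "2"), ("0", "3")])
print({v: round(s, 4) for v, s in x.items()})
# {'0': 0.7071, '1': 0.4082, '2': 0.4082, '3': 0.4082}
```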


**Closeness Centrality:**

```python
query = """
CALL gds.beta.closeness.stream('tmpGraph')
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS n, score
SET n.closeness = score
RETURN n.id, score
ORDER BY score DESC
"""

with driver.session() as session:
    dtf_data = pd.DataFrame([dict(record) for record in session.run(query)])
dtf_data.head(20)
```

|   | n.id | score |
|---|---|---|
| 0 | 2 | 1.0 |
| 1 | 3 | 1.0 |
| 2 | 5 | 1.0 |
| 3 | 6 | 1.0 |
| 4 | 4 | 1.0 |
| 5 | 7 | 1.0 |
| 6 | 26 | 1.0 |
| 7 | 30 | 1.0 |
| 8 | 14 | 1.0 |
| 9 | 20 | 1.0 |
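
Many nodes in the table above share a score of exactly 1.0. One common normalization of closeness divides the number of nodes a node can reach by the sum of the shortest-path distances to them; under that definition, a node whose reachable nodes are all exactly one hop away scores 1.0, and a node that reaches nothing scores 0. The sketch below implements that definition with breadth-first search along edge direction; treating it as identical to GDS's formula is an assumption worth checking against the GDS closeness documentation. `closeness` and the edge list are hypothetical:

```python
from collections import deque


def closeness(nodes: list[str], edges: list[tuple[str, str]]) -> dict[str, float]:
    """(#nodes reachable) / (sum of their BFS distances); 0.0 if nothing is reachable."""
    out: dict[str, list[str]] = {v: [] for v in nodes}
    for source, target in edges:
        out[source].append(target)  # follow edges in their stored direction only
    scores: dict[str, float] = {}
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in out[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        reached = len(dist) - 1     # exclude s itself
        total = sum(dist.values())  # dist[s] == 0 adds nothing
        scores[s] = reached / total if total else 0.0
    return scores


edges = [("2", "1"), ("3", "1"), ("3", "2")]  # hypothetical edges
print(closeness(["1", "2", "3"], edges))  # {'1': 0.0, '2': 1.0, '3': 1.0}
```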


**Result:**

```python
query = """
MATCH (a:Node)
RETURN a.id, a.betweenness, a.eigenvector, a.closeness
"""

with driver.session() as session:
    dtf_data = pd.DataFrame([dict(record) for record in session.run(query)])
dtf_data
```

|   | a.id | a.betweenness | a.eigenvector | a.closeness |
|---|---|---|---|---|
| 0 | 1 | 0.0 | 8.660452e-09 | 0.0 |
| 1 | 2 | 0.5 | 6.456418e-07 | 1.0 |
| 2 | 3 | 8.833333 | 2.195913e-05 | 1.0 |
| 3 | 5 | 0.0 | 6.456418e-07 | 1.0 |
| 4 | 6 | 0.5 | 6.456418e-07 | 1.0 |
| 5 | 4 | 2.0 | 0.000456823 | 1.0 |
| 6 | 7 | 1.5 | 4.327262e-05 | 1.0 |
| 7 | 24 | 0.0 | 8.660452e-09 | 0.0 |
| 8 | 25 | 0.0 | 8.660452e-09 | 0.0 |
| 9 | 27 | 0.0 | 8.660452e-09 | 0.0 |


***

Finally, write a Cypher query that removes the node with the highest
centrality from the graph, but only if the graph is weakly connected. The
weak-connectivity check must also be done in Cypher. Also, write a Python
function that executes this Cypher query repeatedly until the graph becomes
disconnected. Ideally, the Python code should execute only a single Cypher
query per iteration.
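
Before translating the task into Cypher, the intended loop can be prototyped in plain Python on an edge list. This sketch assumes the centrality scores are computed once up front (as a stored node property would be) rather than after every removal, and that an empty graph counts as disconnected so the loop terminates; `is_weakly_connected` and `simulate_removals` are hypothetical helpers:

```python
from collections import deque


def is_weakly_connected(nodes: set[str], edges: set[tuple[str, str]]) -> bool:
    """BFS ignoring edge direction; an empty graph counts as not connected here."""
    if not nodes:
        return False
    neighbors: dict[str, set[str]] = {v: set() for v in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        for w in neighbors[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == nodes


def simulate_removals(nodes, edges, centrality: dict[str, float]) -> list[str]:
    """Drop the highest-centrality node while the graph is weakly connected."""
    nodes, edges = set(nodes), set(edges)
    removed: list[str] = []
    while is_weakly_connected(nodes, edges):
        # sorted() makes ties deterministic: the smallest id wins among equal scores.
        top = max(sorted(nodes), key=lambda v: centrality[v])
        nodes.discard(top)
        edges = {(a, b) for a, b in edges if top not in (a, b)}
        removed.append(top)
    return removed


# Path a-b-c: removing the middle node b disconnects the graph after one step.
print(simulate_removals({"a", "b", "c"}, {("a", "b"), ("b", "c")},
                        {"a": 1.0, "b": 5.0, "c": 2.0}))  # ['b']
```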

```python
query = """
MATCH (node:Node)
RETURN node, node.eigenvector AS centrality
ORDER BY centrality DESC
"""

with driver.session() as session:
    dtf_data = pd.DataFrame([dict(record) for record in session.run(query)])
dtf_data
```

|   | node | centrality |
|---|---|---|
| 0 | (closeness, betweenness, id, betw_centr, eigen... | 0.9934551 |
| 1 | (closeness, betweenness, id, betw_centr, eigen... | 0.1133783 |
| 2 | (closeness, betweenness, id, betw_centr, eigen... | 0.006593799 |
| 3 | (closeness, betweenness, id, betw_centr, eigen... | 0.006593799 |
| 4 | (closeness, betweenness, id, betw_centr, eigen... | 0.006137621 |
| 5 | (closeness, betweenness, id, betw_centr, eigen... | 0.005746022 |
| 6 | (closeness, betweenness, id, betw_centr, eigen... | 0.005744748 |
| 7 | (closeness, betweenness, id, betw_centr, eigen... | 0.0008697364 |
| 8 | (closeness, betweenness, id, betw_centr, eigen... | 0.000456823 |
| 9 | (closeness, betweenness, id, betw_centr, eigen... | 0.0004361465 |
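
The query above ranks the nodes but does not yet perform the conditional removal the task asks for. One possible sketch, under several assumptions: the APOC plugin is installed (`apoc.path.subgraphNodes` performs the weak-connectivity check by following `RELATED_TO` in both directions, and its result is assumed to include the start node), the stored `eigenvector` property serves as "the" centrality and is not recomputed after each deletion, and an empty graph counts as disconnected. The query deletes the top node only when the number of nodes reachable from an arbitrary start node equals the total node count, and it always returns a single `removed` count, so the Python loop runs exactly one query per iteration. `remove_until_disconnected` is a hypothetical helper that takes a zero-argument callable so it can be exercised without a database:

```python
from typing import Callable

# Delete the node with the highest stored eigenvector score, but only if every
# node is reachable from one start node when directions are ignored.
# Assumes the APOC plugin (apoc.path.subgraphNodes) is installed.
REMOVE_IF_WEAKLY_CONNECTED = """
MATCH (n:Node)
WITH count(n) AS total
MATCH (start:Node)
WITH total, start
LIMIT 1
CALL apoc.path.subgraphNodes(start, {relationshipFilter: 'RELATED_TO'})
YIELD node
WITH total, count(node) AS reachable
WHERE reachable = total
MATCH (top:Node)
WITH top
ORDER BY top.eigenvector DESC
LIMIT 1
DETACH DELETE top
RETURN count(*) AS removed
"""


def remove_until_disconnected(run_removal: Callable[[], int], max_iterations: int = 1000) -> int:
    """Call run_removal (one Cypher query) until it removes nothing; return the total removed."""
    total_removed = 0
    for _ in range(max_iterations):  # safety cap against an unexpected endless loop
        removed = run_removal()
        if removed == 0:
            break  # check failed: the graph is no longer weakly connected (or is empty)
        total_removed += removed
    return total_removed


# Example usage with the notebook's driver (not executed here):
# with driver.session() as session:
#     n_removed = remove_until_disconnected(
#         lambda: session.run(REMOVE_IF_WEAKLY_CONNECTED).single()["removed"]
#     )
```

Returning `count(*)` after the aggregation-free tail means the query yields a row with `removed = 0` when the connectivity check filters everything out, which is what lets the loop stop without a second query.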
