***D  GRAPH ALGORITHMS***\
To run this section, the "Graph Data Science Library" plugin must be installed. If you are using Neo4j Desktop, open it and click "+Add", then click "Plugins" and install "Graph Data Science Library". Once it is installed you are ready to go: just follow this notebook.

In [6]:
from neo4j import GraphDatabase
from dotenv import load_dotenv
import os
import pandas as pd

In [7]:
load_dotenv()
URI = os.environ.get('NEO4J_URI')
USERNAME = os.environ.get('NEO4J_USERNAME')
PASSWORD = os.environ.get('NEO4J_PASSWORD')
DB_PATH= os.environ.get('NEO4J_DB_PATH')
driver = GraphDatabase.driver(URI, auth=(USERNAME, PASSWORD))

In [8]:
def run_query(query, params=None):
    # Use None instead of a mutable default argument; fall back to an empty dict
    with driver.session() as session:
        result = session.run(query, params or {})
        return result.data()  # Returns list of dictionaries


We will work with the PageRank and Louvain algorithms. Let's list the procedures that the library offers for them.

In [9]:
algs = ["pageRank", "louvain"]
types = ["stream", "stats"]
query = """
CALL gds.list();
"""
result = run_query(query)
for row in result:
    # Procedure names look like gds.<algorithm>.<mode>, optionally followed by .estimate
    parts = row["name"].split(".")
    if parts[1] in algs and parts[2] in types:
        print(row)

{'name': 'gds.louvain.stats', 'description': 'The Louvain method for community detection is an algorithm for detecting communities in networks.', 'signature': 'gds.louvain.stats(graphName :: STRING, configuration = {} :: MAP) :: (modularity :: FLOAT, modularities :: LIST<FLOAT>, ranLevels :: INTEGER, communityCount :: INTEGER, communityDistribution :: MAP, postProcessingMillis :: INTEGER, preProcessingMillis :: INTEGER, computeMillis :: INTEGER, configuration :: MAP)', 'type': 'procedure'}
{'name': 'gds.louvain.stats.estimate', 'description': 'Returns an estimation of the memory consumption for that procedure.', 'signature': 'gds.louvain.stats.estimate(graphNameOrConfiguration :: ANY, algoConfiguration :: MAP) :: (requiredMemory :: STRING, treeView :: STRING, mapView :: MAP, bytesMin :: INTEGER, bytesMax :: INTEGER, nodeCount :: INTEGER, relationshipCount :: INTEGER, heapPercentageMin :: FLOAT, heapPercentageMax :: FLOAT)', 'type': 'procedure'}
{'name': 'gds.louvain.stream', 'descripti
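The filter above also lets the `.estimate` variants through, because it only looks at the second and third parts of the name. If only the plain `stream` and `stats` procedures are wanted, a stricter filter could look like the sketch below. The function name `select_procedures` and the sample rows are hypothetical and hand-written for illustration; real rows would come from `run_query("CALL gds.list()")`.

```python
def select_procedures(rows, algorithms, modes):
    """Keep rows named exactly gds.<algorithm>.<mode>, with nothing after the mode."""
    selected = []
    for row in rows:
        parts = row["name"].split(".")
        if len(parts) == 3 and parts[1] in algorithms and parts[2] in modes:
            selected.append(row)
    return selected


# Hand-written sample rows (only the "name" field), not real query output
sample_rows = [
    {"name": "gds.louvain.stats"},
    {"name": "gds.louvain.stats.estimate"},
    {"name": "gds.louvain.stream"},
    {"name": "gds.pageRank.write"},
    {"name": "gds.list"},
]

names = [
    row["name"]
    for row in select_procedures(sample_rows, {"pageRank", "louvain"}, {"stream", "stats"})
]
print(names)
```

Checking `len(parts) == 3` is what excludes `gds.louvain.stats.estimate` and also protects against two-part names such as `gds.list`.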

We will only use the stream and stats modes. Let's start by defining the subgraphs on which we will run the queries.

In [10]:
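# Use the following query to drop the graph if it exists.
# This avoids errors when running the same code multiple times.
# On a first run the graph does not exist yet, so this call raises the ClientError shown below;
# passing false as a second argument (failIfMissing) to gds.graph.drop avoids that.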
query0="""CALL gds.graph.drop('paper-citations')
YIELD graphName
RETURN graphName"""
run_query(query0)

ClientError: {code: Neo.ClientError.Procedure.ProcedureCallFailed} {message: Failed to invoke procedure `gds.graph.drop`: Caused by: java.util.NoSuchElementException: Graph with name `paper-citations` does not exist on database `neo4j`. It might exist on another database.}
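The error above appears because the graph has not been projected yet. One way to make the drop step safe to re-run is to pass `false` for the `failIfMissing` argument of `gds.graph.drop`. The sketch below only builds the query and its parameters; the helper name `drop_graph_query` is hypothetical, and the result is meant to be passed to the notebook's `run_query`.

```python
def drop_graph_query(graph_name):
    """Return (query, params) that drop a projected graph without failing if it is missing."""
    query = (
        "CALL gds.graph.drop($graphName, false) "  # second argument: failIfMissing
        "YIELD graphName "
        "RETURN graphName"
    )
    return query, {"graphName": graph_name}


query, params = drop_graph_query("paper-citations")
print(query)
print(params)
# In the notebook this would be executed with: run_query(query, params)
```

Passing the graph name as a query parameter rather than formatting it into the string keeps the same helper usable for `author-review` and `author-write`.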

In [11]:
query1="""
MATCH (citing:paper)-[r:cite]->(cited:paper)
RETURN gds.graph.project(
  'paper-citations',
  citing,
  cited
)"""
run_query(query1)

[{"gds.graph.project(\n  'paper-citations',\n  citing,\n  cited\n)": {'relationshipCount': 3277,
   'graphName': 'paper-citations',
   'query': "\nMATCH (citing:paper)-[r:cite]->(cited:paper)\nRETURN gds.graph.project(\n  'paper-citations',\n  citing,\n  cited\n)",
   'projectMillis': 3419,
   'configuration': {'readConcurrency': 4,
    'undirectedRelationshipTypes': [],
    'jobId': 'd1bf4e09-8454-4205-9c4e-05161d7075b2',
    'logProgress': True,
    'query': "\nMATCH (citing:paper)-[r:cite]->(cited:paper)\nRETURN gds.graph.project(\n  'paper-citations',\n  citing,\n  cited\n)",
    'inverseIndexedRelationshipTypes': []},
   'nodeCount': 1885}}]
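The projection result is a single row whose key is the whole `gds.graph.project(...)` expression, which makes it awkward to index by name. A small sketch of pulling out the useful counts is shown below. The helper name `projection_summary` is hypothetical, and the sample row is an abbreviated copy of the output above.

```python
def projection_summary(result):
    """Extract graph name and size from a one-row gds.graph.project result."""
    # The row has exactly one column, so take its value regardless of the key text
    details = next(iter(result[0].values()))
    return {key: details[key] for key in ("graphName", "nodeCount", "relationshipCount")}


# Abbreviated copy of the output shown above
sample_result = [
    {
        "gds.graph.project(...)": {
            "relationshipCount": 3277,
            "graphName": "paper-citations",
            "projectMillis": 3419,
            "nodeCount": 1885,
        }
    }
]

summary = projection_summary(sample_result)
print(summary)
```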

In [12]:
query="""
CALL gds.pageRank.stream('paper-citations')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).title AS title, score
ORDER BY score DESC, title ASC
"""

result=run_query(query)
for row in result:
    print(row) 


{'title': 'Statistical Learning Theory', 'score': 2.929660237371651}
{'title': 'Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift', 'score': 2.057274151467868}
{'title': 'Backpropagation Applied to Handwritten Zip Code Recognition', 'score': 1.9949260255627461}
{'title': 'Long Short-Term Memory', 'score': 1.7404189012593918}
{'title': 'Object Detection with Discriminatively Trained Part Based Models', 'score': 1.3052627232632437}
{'title': 'Dropout: a simple way to prevent neural networks from overfitting', 'score': 1.229563627892045}
{'title': 'Pegasos: primal estimated sub-gradient solver for SVM', 'score': 1.2192262076381837}
{'title': 'Contrastive Learning and Neural Oscillations', 'score': 1.2063022344433276}
{'title': 'Discrimination-aware data mining', 'score': 1.1449828984383572}
{'title': 'An Information-Maximization Approach to Blind Separation and Blind Deconvolution', 'score': 1.0328620499182861}
{'title': 'Practical Bayesian Optim
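To give an intuition for these scores, here is a minimal pure-Python sketch of the PageRank update on a toy citation graph. It is a plain fixed-iteration version written for illustration, not the GDS implementation; the damping factor of 0.85 and the 20 iterations are assumptions chosen to mirror the GDS defaults, and the toy edges are made up.

```python
def pagerank(edges, damping=0.85, iterations=20):
    """Unnormalised PageRank: score = (1 - d) + d * sum(score(citing) / out_degree(citing))."""
    nodes = sorted({node for edge in edges for node in edge})
    out_degree = {node: 0 for node in nodes}
    incoming = {node: [] for node in nodes}
    for source, target in edges:
        out_degree[source] += 1
        incoming[target].append(source)

    scores = {node: 1.0 - damping for node in nodes}
    for _ in range(iterations):
        # Build the new scores from the previous ones (synchronous update)
        scores = {
            node: (1.0 - damping)
            + damping * sum(scores[src] / out_degree[src] for src in incoming[node])
            for node in nodes
        }
    return scores


# Toy graph: (citing, cited) pairs, same direction as the cite relationship
toy_edges = [("B", "A"), ("C", "A"), ("D", "B")]
scores = pagerank(toy_edges)
for paper, score in sorted(scores.items(), key=lambda item: (-item[1], item[0])):
    print(paper, round(score, 4))
```

A paper that nobody cites keeps the base score `1 - damping`, which is 0.15 here. The most cited paper ranks first, and it gains more when the papers citing it are themselves well cited.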

In [15]:
import json
query2="""
CALL gds.pageRank.stats('paper-citations')
YIELD centralityDistribution
RETURN centralityDistribution
"""

result=run_query(query2)
print(json.dumps(result[0]["centralityDistribution"], indent=1))

{
 "min": 0.14999961853027344,
 "max": 2.929672241210937,
 "p90": 0.3532552719116211,
 "p999": 2.0572805404663086,
 "p99": 0.919886589050293,
 "p50": 0.18356800079345703,
 "p75": 0.23738956451416016,
 "p95": 0.4851522445678711,
 "mean": 0.23539210605368374
}


In [None]:
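# Use the following queries to drop the graphs if they exist.
# This avoids errors when running the same code multiple times.
# If a graph has not been projected yet, its drop call raises a ClientError;
# passing false as a second argument (failIfMissing) to gds.graph.drop avoids that.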
query0="""CALL gds.graph.drop('author-review')
YIELD graphName
RETURN graphName"""
run_query(query0)

query0="""CALL gds.graph.drop('author-write')
YIELD graphName
RETURN graphName"""
run_query(query0)

[{'graphName': 'author-review'}]

In [None]:
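# Link each reviewer to the authors of the papers they reviewed.
# OPTIONAL MATCH keeps authors without such a link in the projection as isolated nodes.
query1="""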
MATCH (reviewer:author)
OPTIONAL MATCH (reviewer)<-[:reviewed_by]->(:paper)<-[:writes]-(author:author)
RETURN gds.graph.project(
  'author-review',
  reviewer,
  author
)"""
run_query(query1)


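# Link each author to their co-authors (authors who wrote the same paper).
# OPTIONAL MATCH keeps authors without co-authors in the projection as isolated nodes.
query2="""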
MATCH (a:author)
OPTIONAL MATCH (a)-[:writes]->(:paper)<-[:writes]-(author:author)
RETURN gds.graph.project(
  'author-write',
  a,
  author
)"""
run_query(query2)

[{"gds.graph.project(\n  'author-write',\n  a,\n  author\n)": {'relationshipCount': 56266,
   'graphName': 'author-write',
   'query': "\nMATCH (a:author)\nOPTIONAL MATCH (a)-[:writes]->(:paper)<-[:writes]-(author:author)\nRETURN gds.graph.project(\n  'author-write',\n  a,\n  author\n)",
   'projectMillis': 467,
   'configuration': {'readConcurrency': 4,
    'undirectedRelationshipTypes': [],
    'jobId': '2828d07f-ba67-42ce-984f-3c1a735a1cf0',
    'logProgress': True,
    'query': "\nMATCH (a:author)\nOPTIONAL MATCH (a)-[:writes]->(:paper)<-[:writes]-(author:author)\nRETURN gds.graph.project(\n  'author-write',\n  a,\n  author\n)",
    'inverseIndexedRelationshipTypes': []},
   'nodeCount': 7343}}]

In [None]:
query="""
CALL gds.louvain.stream('author-write')
YIELD nodeId, communityId, intermediateCommunityIds
RETURN gds.util.asNode(nodeId).authorName AS name, communityId
ORDER BY communityId 
"""

result=run_query(query)
for row in result:
    print(row) 

print("----------------------------------------")

query="""
CALL gds.louvain.stream('author-review')
YIELD nodeId, communityId, intermediateCommunityIds
RETURN gds.util.asNode(nodeId).authorName AS name, communityId
ORDER BY communityId 
"""

result=run_query(query)
for row in result:
    print(row) 


{'name': 'Jie Huang', 'communityId': 1}
{'name': 'Cheng-Xiang Wang', 'communityId': 1}
{'name': 'L. Bai', 'communityId': 1}
{'name': 'Yang Yang', 'communityId': 1}
{'name': 'Jie Li', 'communityId': 1}
{'name': 'O. Tirkkonen', 'communityId': 1}
{'name': 'Ming-Tuo Zhou', 'communityId': 1}
{'name': 'Xuri Tang', 'communityId': 7}
{'name': 'Tian Han', 'communityId': 9}
{'name': 'Qianqian Yang', 'communityId': 9}
{'name': 'Zhiguo Shi', 'communityId': 9}
{'name': 'Shibo He', 'communityId': 9}
{'name': 'Zhaoyang Zhang', 'communityId': 9}
{'name': 'Binhui Xie', 'communityId': 20}
{'name': 'Shuang Li', 'communityId': 20}
{'name': 'Mingjiang Li', 'communityId': 20}
{'name': 'Chi Harold Liu', 'communityId': 20}
{'name': 'Gao Huang', 'communityId': 20}
{'name': 'Guoren Wang', 'communityId': 20}
{'name': 'Yulin Wang', 'communityId': 20}
{'name': 'Shiji Song', 'communityId': 20}
{'name': 'Xuran Pan', 'communityId': 20}
{'name': 'Yitong Xia', 'communityId': 20}
{'name': 'Cheng Wu', 'communityId': 20}


In [None]:
query="""
CALL gds.louvain.stats('author-write')
YIELD communityCount
"""

result=run_query(query)
for row in result:
    print(row) 
    
print("----------------------------------------")
query="""
CALL gds.louvain.stats('author-review')
YIELD communityCount
"""

result=run_query(query)
for row in result:
    print(row) 



{'communityCount': 1198}
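Besides `communityCount`, `gds.louvain.stats` also yields a `modularity` value, which is the quantity Louvain tries to maximise. As a rough illustration of what that number measures, the sketch below computes modularity for an undirected, unweighted toy graph. The undirected formula is an assumption made for simplicity (the projections above are directed), and the function and toy data are hypothetical.

```python
def modularity(edges, community_of):
    """Q = sum over communities of (internal_edges / m) - (degree_sum / (2 * m)) ** 2."""
    m = len(edges)
    internal = {}    # edges with both endpoints inside the community
    degree_sum = {}  # total degree of the nodes in the community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles joined by a single edge
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("x", "y"), ("y", "z"), ("x", "z"),
    ("c", "x"),
]
two_groups = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}
one_group = {node: 0 for node in two_groups}

q_two = modularity(toy_edges, two_groups)
q_one = modularity(toy_edges, one_group)
print(round(q_two, 4), round(q_one, 4))
```

Splitting the toy graph into its two triangles scores higher than putting every node into one community, which is the kind of partition Louvain searches for.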
