# Neo4j GDS

https://neo4j.com/docs/graph-data-science/current/algorithms/community/

## Community
Community detection algorithms are used to evaluate how groups of nodes are clustered or partitioned, as well as their tendency to strengthen or break apart. The Neo4j GDS library includes the following community detection algorithms, grouped by quality tier:

## Production-quality

- Conductance metric
- K-Core Decomposition
- K-1 Coloring
- K-Means Clustering
- Label Propagation
- Leiden
- Local Clustering Coefficient
- Louvain
- Modularity metric
- Modularity Optimization
- Strongly Connected Components
- Triangle Count
- Weakly Connected Components

## Alpha

- Approximate Maximum k-cut
- Speaker-Listener Label Propagation

In [2]:
from langchain.graphs import Neo4jGraph

graph = Neo4jGraph(
    url="bolt://18.234.164.187:7687",
    username="neo4j",
    password="energies-hope-fuses"
)

r = graph.query("""MATCH (n:Airport {city:"Los Angeles"}) RETURN n""")
print(r)

[{'n': {'altitude': 127, 'longest': 12091, 'city': 'Los Angeles', 'descr': 'Los Angeles International Airport', 'iata': 'LAX', 'icao': 'KLAX', 'location': POINT(-118.4079971 33.94250107), 'pagerank_weight': 8.19355807544669, 'id': '13', 'pagerank': 8.193558075446687, 'runways': 4, 'region_airports': 1, 'country_airports': 1}}]


# Conductance metric
https://neo4j.com/docs/graph-data-science/current/algorithms/conductance/

Conductance is a metric that allows you to evaluate the quality of a community detection. Relationships of nodes in a community C connect to nodes either within C or outside C. The conductance is the ratio between relationships that point outside C and the total number of relationships of C. The lower the conductance, the more "well-knit" a community is.

It was shown by Yang and Leskovec in the paper "Defining and Evaluating Network Communities based on Ground-truth" that conductance is a very good metric for evaluating actual communities of real world graphs.

The algorithm runs in time linear in the number of relationships in the graph.
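
As a minimal sketch of the definition above (not the GDS implementation), conductance can be computed by hand on a small undirected graph. The edge list, the community assignment, and the `conductance` helper are hypothetical and exist only for illustration. Each undirected edge is counted once from each endpoint, which is an assumption meant to mirror an `UNDIRECTED` projection.

```python
from collections import defaultdict


def conductance(edges, community_of):
    """Return {community: conductance} for an undirected, weighted edge list."""
    outside = defaultdict(float)  # weight of relationships leaving each community
    total = defaultdict(float)    # weight of all relationships of each community
    for u, v, weight in edges:
        # Count the edge once from each endpoint.
        for source, target in ((u, v), (v, u)):
            c = community_of[source]
            total[c] += weight
            if community_of[target] != c:
                outside[c] += weight
    return {c: outside[c] / total[c] for c in total}


# Hypothetical toy graph: two triangles joined by a single bridge (c-d).
edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("c", "d", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
]
community_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

scores = conductance(edges, community_of)
print(scores)
```

In this toy graph each community has seven relationship endpoints, one of which crosses the bridge, so both communities have a conductance of 1/7. A single pass over the edge list is enough, which matches the linear running time mentioned above.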


In [62]:
# create a new Relation named  HAS_CONTINENT_ROUTE
r = graph.query("""MATCH (p:Airport)-[:HAS_ROUTE]->(c:Airport) 
MATCH (p:Airport)-[:ON_CONTINENT]->(b:Continent) 
MATCH (c:Airport)-[:ON_CONTINENT]->(d:Continent) 
where b.name <> d.name
with  p,c,b,d, p.region_airports as region_airports, p.country_airports as country_airports, p.runways as runways ,c.region_airports as cregion_airports, c.country_airports as ccountry_airports, c.runways as crunways
merge (p)-[:HAS_CONTINENT_ROUTE {weight : ((region_airports* runways)/country_airports) / ((cregion_airports* crunways)/ccountry_airports)}]->(c) 
return p,c,b,d limit 25 """)
print(len(r))

3503


In [6]:
# create Projection Num Airports

In [None]:
# Call gds.graph.drop("Graph_Airports")

In [3]:
r = graph.query("""CALL gds.graph.project(
    'Conductance',
    'Airport',
    {
        HAS_CONTINENT_ROUTE: {
            orientation: 'UNDIRECTED'
        }
    },
    {
        nodeProperties: 'country_airports',
        relationshipProperties: 'weight'
    }
)""")
print(r)

[{'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {'country_airports': {'property': 'country_airports', 'defaultValue': None}}}}, 'relationshipProjection': {'HAS_CONTINENT_ROUTE': {'aggregation': 'DEFAULT', 'orientation': 'UNDIRECTED', 'indexInverse': False, 'properties': {'weight': {'aggregation': 'DEFAULT', 'property': 'weight', 'defaultValue': None}}, 'type': 'HAS_CONTINENT_ROUTE'}}, 'graphName': 'Conductance', 'nodeCount': 3503, 'relationshipCount': 16560, 'projectMillis': 388}]


In [66]:
len(r)

1

In [139]:
r[-1]

{'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {}}},
 'relationshipProjection': {'HAS_ROUTE_WEIGHT': {'aggregation': 'DEFAULT',
   'orientation': 'NATURAL',
   'indexInverse': False,
   'properties': {'weight': {'aggregation': 'DEFAULT',
     'property': 'weight',
     'defaultValue': None}},
   'type': 'HAS_ROUTE_WEIGHT'}},
 'graphName': 'Graph_Airports',
 'nodeCount': 3503,
 'relationshipCount': 46389,
 'projectMillis': 31}

In [None]:
# We now run the Louvain algorithm to create a division of the nodes into communities that we can then evaluate.

In [4]:
r = graph.query("""CALL gds.louvain.mutate('Conductance', { mutateProperty: 'community', relationshipWeightProperty: 'weight' })
YIELD communityCount""")
print(r)

[{'communityCount': 2731}]


# Run the algorithm in stream mode

Since we now have a community detection, we can evaluate how good it is under the conductance metric. Note that in this case relationships are weighted by a relationship property.

In [31]:
# stream the community assigned to each node
r = graph.query("""CALL gds.graph.nodeProperty.stream('Conductance', 'community')
YIELD nodeId, propertyValue
RETURN gds.util.asNode(nodeId).descr AS name, propertyValue AS community
ORDER BY community ASC
""")
print(r[:50])

[{'name': 'Ronald Reagan Washington National Airport', 'community': 6}, {'name': 'New York La Guardia', 'community': 13}, {'name': 'Palm Beach International Airport', 'community': 18}, {'name': 'Long Beach Airport', 'community': 26}, {'name': 'Orange County/Santa Ana, John Wayne', 'community': 27}, {'name': 'Westchester County', 'community': 31}, {'name': 'San Antonio', 'community': 32}, {'name': 'The Eastern Iowa Airport', 'community': 35}, {'name': 'Houston Hobby', 'community': 37}, {'name': 'El Paso International Airport', 'community': 38}, {'name': 'Tucson International Airport', 'community': 42}, {'name': 'Santa Fe', 'community': 43}, {'name': 'Newark, Liberty', 'community': 51}, {'name': 'Dublin International Airport', 'community': 51}, {'name': 'Brussels Airport', 'community': 51}, {'name': 'Munich International Airport', 'community': 51}, {'name': 'Southwest Florida International Airport', 'community': 51}, {'name': 'Manchester Airport', 'community': 51}, {'name': 'Cologne Bonn

In [28]:
# weighted: relationships are weighted by the 'weight' property
r = graph.query("""CALL gds.conductance.stream('Conductance', { communityProperty: 'community', relationshipWeightProperty: 'weight' })
YIELD community, conductance
RETURN  community, conductance
ORDER BY conductance DESC
""")

print(r[:5])

[{'community': 214, 'conductance': 0.6001809136137495}, {'community': 160, 'conductance': 0.5393586005830904}, {'community': 51, 'conductance': 0.4486486486486487}, {'community': 335, 'conductance': 0.44703965236284626}, {'community': 1763, 'conductance': 0.4326846174353506}]


In [34]:
for x in r:
    if x.get("community") == 214:
        print(x.get("name"))

Warsaw Chopin Airport
Malta International Airport
Sofia Airport
Belgrade Nikola Tesla Airport
Cairo International Airport
Addis Ababa Bole International Airport
Kuwait International Airport
Bahrain International Airport
Abu Dhabi International Airport
Bologna Guglielmo Marconi Airport
Tunis Carthage International Airport
Freeport, Grand Bahama International Airport
King Abdulaziz International Airport
Muscat International Airport
Luxor International Airport
Riyadh, King Khaled International Airport
Malaga Airport
Modlin Airport
Podgorica Airport
Berbera Airport
Borg El Arab International Airport
Port Sudan New International Airport
Juba International Airport
Khartoum International Airport
Julius Nyerere International Airport
Skopje Alexander the Great Airport
Ostrava Leos Janáček Airport
Kopitnari Airport
Saniat Rmel Airport
Aden Adde International Airport
Assiut International Airport
Sohag International Airport
Aswan International Airport
Asmara International Airport
John Paul II Inte

In [None]:
# Call gds.graph.drop("Conductance")

### K-Core Decompositionhttps://neo4j.com/docs/graph-data-science/current/algorithms/k-core/

The K-core decomposition is a process that separates the nodes in a graph into groups based on the degree sequence and topology of the graph.

The term i-core refers to a maximal subgraph of the original graph such that each node in this subgraph has degree at least i. The maximality ensures that it is not possible to find another subgraph with more nodes where this degree property holds.

The nodes in the subgraph denoted by i-core also belong to the subgraph denoted by j-core for any j<i. The converse however is not true. Each node u is associated with a core value which denotes the largest value i such that u belongs to the i-core. The largest core value is called the degeneracy of the graph.

Standard algorithms for K-Core Decomposition iteratively remove the node of lowest degree until the graph becomes empty. When a node is removed from the graph, all of its relationships are removed, and the degree of its neighbors is reduced by one. With this approach, the different core groups are discovered one by one.
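
The peeling procedure described above can be sketched in a few lines of plain Python. This is an illustrative version under simple assumptions (an undirected graph given as a hypothetical adjacency dictionary, and a linear scan for the lowest-degree node), not the implementation used by GDS.

```python
def core_values(adj):
    """Return {node: core value} by repeatedly removing the lowest-degree node."""
    degree = {node: len(neighbors) for node, neighbors in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Pick the remaining node with the lowest current degree.
        node = min(remaining, key=degree.get)
        # The core value never decreases as the peeling proceeds.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        # Removing the node lowers the degree of its remaining neighbors by one.
        for neighbor in adj[node]:
            if neighbor in remaining:
                degree[neighbor] -= 1
    return core


# Hypothetical toy graph: a triangle (a, b, c) with a pendant node d attached to c.
adj = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

core = core_values(adj)
degeneracy = max(core.values())
print(core, degeneracy)
```

The pendant node is peeled first with core value 1, while the triangle nodes all end up with core value 2, which is also the degeneracy of this toy graph. The linear scan keeps the sketch short; a bucket queue keyed by degree would be the usual choice for large graphs.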

In [52]:
r = graph.query("""CALL gds.graph.project(
  'Kcore',
  'Airport',
  {
    HAS_ROUTE_WEIGHT: {
      orientation: 'UNDIRECTED'
    }
  }
)
""")

print(r[:25])

[{'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {}}}, 'relationshipProjection': {'HAS_ROUTE_WEIGHT': {'aggregation': 'DEFAULT', 'orientation': 'UNDIRECTED', 'indexInverse': False, 'properties': {}, 'type': 'HAS_ROUTE_WEIGHT'}}, 'graphName': 'Kcore', 'nodeCount': 3503, 'relationshipCount': 92778, 'projectMillis': 26}]


# Memory Estimation 
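Before running the algorithm, it is useful to estimate how much memory it will need. The estimate procedure reports the node count, the relationship count, and the expected memory range for the given configuration.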

In [53]:
r = graph.query("""CALL gds.kcore.write.estimate('Kcore', { writeProperty: 'coreValue' })
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory
""")

print(r[:25])

[{'nodeCount': 3503, 'relationshipCount': 92778, 'bytesMin': 144120, 'bytesMax': 144120, 'requiredMemory': '140 KiB'}]


# Stream the results
In stream mode, the algorithm returns the core value of each node. If the results of the algorithm are as expected, the next step can be to write them back to the Neo4j database.

In [54]:
r = graph.query("""CALL gds.kcore.stream('Kcore')
YIELD nodeId, coreValue
RETURN gds.util.asNode(nodeId).descr AS name, coreValue
ORDER BY coreValue DESC
""")

print(r[:20])

[{'name': 'London Heathrow', 'coreValue': 82}, {'name': 'London Gatwick', 'coreValue': 82}, {'name': 'Paris Charles de Gaulle', 'coreValue': 82}, {'name': 'Frankfurt am Main', 'coreValue': 82}, {'name': 'Helsinki Ventaa', 'coreValue': 82}, {'name': 'Dubai International Airport', 'coreValue': 82}, {'name': 'Dublin International Airport', 'coreValue': 82}, {'name': 'Leonardo da Vinci-Fiumicino International Airport', 'coreValue': 82}, {'name': 'Amsterdam Airport Schiphol', 'coreValue': 82}, {'name': 'Prague, Ruzyne International Airport', 'coreValue': 82}, {'name': 'Barcelona International Airport', 'coreValue': 82}, {'name': 'Adolfo Suarez Barajas Airport International Airport', 'coreValue': 82}, {'name': 'Vienna International Airport', 'coreValue': 82}, {'name': 'Zurich-Kloten Airport', 'coreValue': 82}, {'name': 'Geneva-Cointrin International Airport', 'coreValue': 82}, {'name': 'Brussels Airport', 'coreValue': 82}, {'name': 'Munich International Airport', 'coreValue': 82}, {'name': '

In [56]:
cv=[]
for x in r:
    if x.get("coreValue") not in cv:
        print(x.get("coreValue"), x.get("name"))
        cv.append(x.get("coreValue"))

82 London Heathrow
80 Brussels South Charleroi Airport
79 Liverpool John Lennon Airport
77 Vilnius International Airport
74 Venice, Treviso-Sant Angelo Airport
73 Newcastle Airport
72 Hartsfield - Jackson Atlanta International Airport
71 Modlin Airport
70 Columbus, Port Columbus International Airport
69 Rotterdam Airport
68 Katowice International Airport
67 Adnan Menderes International Airport
66 Niederrhein Airport
65 Zagreb Airport
64 Jose Marti International Airport
63 King Abdulaziz International Airport
62 Hurghada International Airport
61 Oaklahoma City, Will Rogers World Airport
60 Puerto Rico, Luis Munoz International Airport
58 Milas Bodrum International Airport
57 Enfidha - Hammamet International Airport
56 Don Miguel Hidalgo Y Costilla International Airport
55 Ljubljana Jože Pučnik Airport
54 Santiago de Compostela Airport
53 Krasnodar International Airport
52 Al Maktoum International Airport
51 Jersey Airport
50 Lanzhou Zhongchuan Airport
49 Iaşi Airport
48 Mactan Cebu Inte

# Delete Projection

In [57]:
r = graph.query(""" Call gds.graph.drop("Kcore")
""")

print(r)

[{'graphName': 'Kcore', 'database': 'neo4j', 'databaseLocation': 'local', 'memoryUsage': '', 'sizeInBytes': -1, 'nodeCount': 3503, 'relationshipCount': 92778, 'configuration': {'relationshipProjection': {'HAS_ROUTE_WEIGHT': {'aggregation': 'DEFAULT', 'orientation': 'UNDIRECTED', 'indexInverse': False, 'properties': {}, 'type': 'HAS_ROUTE_WEIGHT'}}, 'readConcurrency': 4, 'relationshipProperties': {}, 'nodeProperties': {}, 'jobId': 'e7907772-bb86-4de8-a2aa-21b2298652f3', 'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {}}}, 'logProgress': True, 'creationTime': neo4j.time.DateTime(2024, 4, 4, 14, 44, 4, 40694529, tzinfo=<UTC>), 'validateRelationships': False, 'sudo': False}, 'density': 0.007562906429391597, 'creationTime': neo4j.time.DateTime(2024, 4, 4, 14, 44, 4, 40694529, tzinfo=<UTC>), 'modificationTime': neo4j.time.DateTime(2024, 4, 4, 14, 44, 4, 67689109, tzinfo=<UTC>), 'schema': {'graphProperties': {}, 'nodes': {'Airport': {}}, 'relationships': {'HAS_ROUTE_WEIGHT':

# K-1 Coloring

https://neo4j.com/docs/graph-data-science/current/algorithms/k1coloring/

The K-1 Coloring algorithm assigns a color to every node in the graph, trying to optimize for two objectives:

- To make sure that every neighbor of a given node has a different color than the node itself.
- To use as few colors as possible.

Note that the graph coloring problem is proven to be NP-complete, which makes it intractable on anything but trivial graph sizes. For that reason the implemented algorithm is a greedy algorithm. Thus it is neither guaranteed that the result is an optimal solution, using as few colors as theoretically possible, nor does it always produce a correct result where no two neighboring nodes have the same color. However, the precision of the latter can be controlled by the number of iterations the algorithm runs.
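
To illustrate the greedy idea, here is a minimal sequential sketch in plain Python. It is an assumption-laden simplification and not the GDS procedure: nodes are visited one at a time in sorted order, so the result is always a proper coloring, whereas the GDS algorithm may need several iterations to resolve conflicts. The adjacency dictionary and the `greedy_coloring` helper are hypothetical.

```python
def greedy_coloring(adj):
    """Assign each node the smallest color not used by its already-colored neighbors."""
    colors = {}
    for node in sorted(adj):
        used = {colors[n] for n in adj[node] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


# Hypothetical toy graph: a 4-cycle 0-1-2-3-0 with one chord between 0 and 2.
adj = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1, 3],
    3: [0, 2],
}

colors = greedy_coloring(adj)
color_count = len(set(colors.values()))
print(colors, color_count)
```

The chord creates two triangles, so three colors are needed here and the greedy pass finds a coloring with exactly three. In general a greedy pass gives no such optimality guarantee, and its color count depends on the visiting order.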

In [85]:
 # MATCH (n:Country)-[:ON_CONTINENT]->(c:Continent) RETURN n,c LIMIT 50
r = graph.query("""CALL gds.graph.project(
    'K1-COLORING',
    'Airport',
    {
        HAS_ROUTE : {
            orientation: 'UNDIRECTED'
        }
    }
)""")
print(r)

[{'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {}}}, 'relationshipProjection': {'HAS_ROUTE': {'aggregation': 'DEFAULT', 'orientation': 'UNDIRECTED', 'indexInverse': False, 'properties': {}, 'type': 'HAS_ROUTE'}}, 'graphName': 'K1-COLORING', 'nodeCount': 3503, 'relationshipCount': 185556, 'projectMillis': 85}]


In [79]:
r = graph.query("""CALL gds.k1coloring.stream('K1-COLORING')
YIELD nodeId, color
RETURN gds.util.asNode(nodeId).descr AS name, color
ORDER BY name""")
print(r[:20])

[{'name': 'A Coruna Airport', 'color': 0}, {'name': 'Aalborg Airport', 'color': 0}, {'name': 'Aarhus Airport', 'color': 6}, {'name': 'Aasiaat Airport', 'color': 0}, {'name': 'Aba Tenna Dejazmach Yilma International Airport', 'color': 0}, {'name': 'Abadan Airport', 'color': 0}, {'name': 'Abakan Airport', 'color': 0}, {'name': 'Abbotsford Airport', 'color': 5}, {'name': 'Abdul Rachman Saleh Airport', 'color': 0}, {'name': 'Abeche Airport', 'color': 0}, {'name': 'Abeid Amani Karume International Airport', 'color': 0}, {'name': 'Abel Santamaria Airport', 'color': 3}, {'name': 'Aberdeen Dyce Airport', 'color': 6}, {'name': 'Aberdeen Regional Airport', 'color': 0}, {'name': 'Abha Regional Airport', 'color': 3}, {'name': 'Abilene Regional Airport', 'color': 0}, {'name': 'Abraham González International Airport', 'color': 3}, {'name': 'Abraham Lincoln Capital Airport', 'color': 0}, {'name': 'Abu Dhabi International Airport', 'color': 14}, {'name': 'Abu Simbel Airport', 'color': 1}]


In [80]:
r = graph.query("""CALL gds.k1coloring.stats('K1-COLORING')
YIELD nodeCount, colorCount, ranIterations, didConverge""")
print(r)

[{'nodeCount': 3503, 'colorCount': 37, 'ranIterations': 2, 'didConverge': True}]


In [81]:
r = graph.query("""CALL gds.k1coloring.write('K1-COLORING', {writeProperty: 'color'})
YIELD nodeCount, colorCount, ranIterations, didConverge""")
print(r)

[{'nodeCount': 3503, 'colorCount': 38, 'ranIterations': 2, 'didConverge': True}]


In [None]:
# MATCH (n:Airport)-[:HAS_ROUTE]->(c:Airport) where n.color=9 RETURN n, c LIMIT 5

In [82]:
r = graph.query("""CALL gds.ephemeral.database.create(
  'gdsdb',
  'K1-COLORING')
""")

print(r)

[{'dbName': 'gdsdb', 'graphName': 'K1-COLORING', 'createMillis': 2768}]


In [86]:
r = graph.query(""" Call gds.graph.drop("K1-COLORING")
""")

print(r)

[{'graphName': 'K1-COLORING', 'database': 'neo4j', 'databaseLocation': 'local', 'memoryUsage': '', 'sizeInBytes': -1, 'nodeCount': 3503, 'relationshipCount': 185556, 'configuration': {'relationshipProjection': {'HAS_ROUTE': {'aggregation': 'DEFAULT', 'orientation': 'UNDIRECTED', 'indexInverse': False, 'properties': {}, 'type': 'HAS_ROUTE'}}, 'readConcurrency': 4, 'relationshipProperties': {}, 'nodeProperties': {}, 'jobId': '76b4af55-26c2-484b-bb3a-b087c69453c6', 'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {}}}, 'logProgress': True, 'creationTime': neo4j.time.DateTime(2024, 4, 4, 15, 24, 28, 964720748, tzinfo=<UTC>), 'validateRelationships': False, 'sudo': False}, 'density': 0.015125812858783194, 'creationTime': neo4j.time.DateTime(2024, 4, 4, 15, 24, 28, 964720748, tzinfo=<UTC>), 'modificationTime': neo4j.time.DateTime(2024, 4, 4, 15, 24, 29, 51058699, tzinfo=<UTC>), 'schema': {'graphProperties': {}, 'nodes': {'Airport': {}}, 'relationships': {'HAS_ROUTE': {}}}, 's

# K-Means Clustering
https://neo4j.com/docs/graph-data-science/current/algorithms/kmeans/

K-Means clustering is an unsupervised learning algorithm that is used to solve clustering problems. It follows a simple procedure of classifying a given data set into a number of clusters, defined by the parameter k. The Neo4j GDS Library conducts clustering based on node properties, with a float array node property being passed as input via the nodeProperty parameter. Nodes in the graph are then positioned as points in a d-dimensional space (where d is the length of the array property).
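
The procedure can be sketched with a plain-Python version of the classic assign-and-update loop (Lloyd's algorithm). This is only an illustration under stated assumptions: the points are hypothetical 2-dimensional coordinates, the initial centroids are sampled from the input, and the iteration count is fixed. The GDS procedure offers more configuration and may differ internally.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=10, seed=42):
    """Return one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return assignment


# Hypothetical coordinates forming two well-separated groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = kmeans(points, k=2)
print(labels)
```

Because the two groups are far apart, the first three points end up in one cluster and the last three in the other, whichever centroids are drawn first. Which cluster index each group receives depends on the seed.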

In [88]:
# create coordinates
r = graph.query("""MATCH (m:Airport)
with  m, m.location as location
set m.coordinates  =  [location.x, location.y]
return m limit 25""")
print(r[:5])

[{'m': {'altitude': 1026, 'longest': 12390, 'color': 0, 'city': 'Atlanta', 'coordinates': [-84.4281005859375, 33.6366996765137], 'descr': 'Hartsfield - Jackson Atlanta International Airport', 'iata': 'ATL', 'icao': 'KATL', 'location': POINT(-84.4281005859375 33.6366996765137), 'pagerank_weight': 10.38994835030296, 'id': '1', 'pagerank': 10.389948350302957, 'runways': 5, 'region_airports': 1, 'country_airports': 1}}, {'m': {'altitude': 151, 'longest': 12400, 'color': 0, 'city': 'Anchorage', 'coordinates': [-149.996002197266, 61.1744003295898], 'descr': 'Anchorage Ted Stevens', 'iata': 'ANC', 'icao': 'PANC', 'location': POINT(-149.996002197266 61.1744003295898), 'pagerank_weight': 2.651157253813937, 'id': '2', 'pagerank': 2.6511572538139374, 'runways': 3, 'region_airports': 1, 'country_airports': 1}}, {'m': {'altitude': 542, 'longest': 12250, 'color': 1, 'city': 'Austin', 'coordinates': [-97.6698989868164, 30.1944999694824], 'descr': 'Austin Bergstrom International Airport', 'iata': 'AUS

In [89]:
r = graph.query("""CALL gds.graph.project(
    'Airports',
    {
      Airport: {
        properties: 'coordinates'
      }
    },
    '*'
)""")
print(r)

[{'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {'coordinates': {'property': 'coordinates', 'defaultValue': None}}}}, 'relationshipProjection': {'__ALL__': {'aggregation': 'DEFAULT', 'orientation': 'NATURAL', 'indexInverse': False, 'properties': {}, 'type': '*'}}, 'graphName': 'Airports', 'nodeCount': 3503, 'relationshipCount': 147447, 'projectMillis': 340}]


In [90]:
r = graph.query("""CALL gds.kmeans.write.estimate('Airports', {
  writeProperty: 'kmeans',
  nodeProperty: 'coordinates'
})
YIELD nodeCount, bytesMin, bytesMax, requiredMemory""")
print(r)

[{'nodeCount': 3503, 'bytesMin': 103888, 'bytesMax': 124880, 'requiredMemory': '[101 KiB ... 121 KiB]'}]


In [91]:
r = graph.query("""CALL gds.kmeans.stream('Airports', {
  nodeProperty: 'coordinates',
  k: 30,
  randomSeed: 42
})
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).descr AS name, communityId
ORDER BY communityId, name ASC""")
print(r[:25])

[{'name': 'Abakan Airport', 'communityId': 0}, {'name': 'Aksu Airport', 'communityId': 0}, {'name': 'Alashankou Bole (Bortala) airport', 'communityId': 0}, {'name': 'Almaty Airport', 'communityId': 0}, {'name': 'Altai Airport', 'communityId': 0}, {'name': 'Altay Air Base', 'communityId': 0}, {'name': 'Astana International Airport', 'communityId': 0}, {'name': 'Barnaul Airport', 'communityId': 0}, {'name': 'Bayankhongor Airport', 'communityId': 0}, {'name': 'Bogashevo Airport', 'communityId': 0}, {'name': 'Bratsk Airport', 'communityId': 0}, {'name': 'Chinggis Khaan International Airport', 'communityId': 0}, {'name': 'Dalanzadgad Airport', 'communityId': 0}, {'name': 'Donoi Airport', 'communityId': 0}, {'name': 'Dunhuang Airport', 'communityId': 0}, {'name': 'Fuyun Kroktokay Airport', 'communityId': 0}, {'name': 'Gorno-Altaysk Airport', 'communityId': 0}, {'name': 'Hami Airport', 'communityId': 0}, {'name': 'Igarka Airport', 'communityId': 0}, {'name': 'Irkutsk Airport', 'communityId': 

In [92]:
cv=[]
for x in r:
    if x.get("communityId") not in cv:
        print(x.get("communityId"), x.get("name"))
        cv.append(x.get("communityId"))

0 Abakan Airport
1 Abu Dhabi International Airport
2 Auckland International Airport
3 Aktobe Airport
4 Aba Tenna Dejazmach Yilma International Airport
5 Alta Airport
6 Aberdeen Dyce Airport
7 Afonso Pena Airport
8 Agatti Airport
9 Anqing Airport
10 Abadan Airport
11 Adnan Menderes International Airport
12 Akita Airport
13 Antonio B. Won Pat International Airport
14 A Coruna Airport
15 Al Massira Airport
16 Aalborg Airport
17 Abel Santamaria Airport
18 Arad International Airport
19 Ahe Airport
20 Ajaccio-Napoléon Bonaparte Airport
21 Buon Ma Thuot Airport
22 Abeid Amani Karume International Airport
23 Adelaide International Airport
24 Abdul Rachman Saleh Airport
25 Aasiaat Airport
26 Aneityum Airport
27 Abbotsford Airport
28 Abeche Airport
29 Agartala Airport


In [93]:
r = graph.query("""Call gds.graph.drop("Airports")""")
print(r)

[{'graphName': 'Airports', 'database': 'neo4j', 'databaseLocation': 'local', 'memoryUsage': '', 'sizeInBytes': -1, 'nodeCount': 3503, 'relationshipCount': 147447, 'configuration': {'relationshipProjection': {'__ALL__': {'aggregation': 'DEFAULT', 'orientation': 'NATURAL', 'indexInverse': False, 'properties': {}, 'type': '*'}}, 'readConcurrency': 4, 'relationshipProperties': {}, 'nodeProperties': {}, 'jobId': '97e82383-f66f-495b-9db3-7e8d1cc2479e', 'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {'coordinates': {'property': 'coordinates', 'defaultValue': None}}}}, 'logProgress': True, 'creationTime': neo4j.time.DateTime(2024, 4, 4, 15, 38, 4, 918719575, tzinfo=<UTC>), 'validateRelationships': False, 'sudo': False}, 'density': 0.012019313461106112, 'creationTime': neo4j.time.DateTime(2024, 4, 4, 15, 38, 4, 918719575, tzinfo=<UTC>), 'modificationTime': neo4j.time.DateTime(2024, 4, 4, 15, 38, 5, 260820016, tzinfo=<UTC>), 'schema': {'graphProperties': {}, 'nodes': {'Airport'

# Label Propagation
https://neo4j.com/docs/graph-data-science/current/algorithms/label-propagation/

The Label Propagation algorithm (LPA) is a fast algorithm for finding communities in a graph. It detects these communities using network structure alone as its guide, and doesn’t require a pre-defined objective function or prior information about the communities.

LPA works by propagating labels throughout the network and forming communities based on this process of label propagation.

The intuition behind the algorithm is that a single label can quickly become dominant in a densely connected group of nodes, but will have trouble crossing a sparsely connected region. Labels will get trapped inside a densely connected group of nodes, and those nodes that end up with the same label when the algorithms finish can be considered part of the same community.
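
The intuition above can be tried out with a small plain-Python sketch. It is a simplified, unweighted variant written for illustration only: nodes are updated in sorted order and ties are broken by taking the largest label, both of which are arbitrary assumptions made to keep the example deterministic. The adjacency dictionary and the `label_propagation` helper are hypothetical, and the GDS procedure may behave differently.

```python
from collections import Counter


def label_propagation(adj, max_iterations=10):
    """Return {node: label} after propagating labels until nothing changes."""
    labels = {node: node for node in adj}  # every node starts with a unique label
    for _ in range(max_iterations):
        changed = False
        for node in sorted(adj):
            counts = Counter(labels[n] for n in adj[node])
            if not counts:
                continue  # isolated nodes keep their own label
            best = max(counts.values())
            # Adopt the most frequent neighbor label; ties go to the largest label.
            new_label = max(l for l, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


# Hypothetical toy graph: two triangles joined by a single bridge (2-3).
adj = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4, 5],
    4: [3, 5],
    5: [3, 4],
}

labels = label_propagation(adj)
print(labels)
```

On this toy graph each triangle settles on its own label, because the single bridge relationship is outvoted by the two relationships inside each triangle. The outcome of label propagation is sensitive to update order and tie-breaking, so a different rule can merge the two groups.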


In [94]:
r = graph.query("""CALL gds.graph.project(
    'LP',
    'Airport',
    'HAS_ROUTE_WEIGHT',
    {
        nodeProperties: ['coordinates', 'altitude', 'runways'],
        relationshipProperties: 'weight'
    }
)""")
print(r)

[{'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {'runways': {'property': 'runways', 'defaultValue': None}, 'altitude': {'property': 'altitude', 'defaultValue': None}, 'coordinates': {'property': 'coordinates', 'defaultValue': None}}}}, 'relationshipProjection': {'HAS_ROUTE_WEIGHT': {'aggregation': 'DEFAULT', 'orientation': 'NATURAL', 'indexInverse': False, 'properties': {'weight': {'aggregation': 'DEFAULT', 'property': 'weight', 'defaultValue': None}}, 'type': 'HAS_ROUTE_WEIGHT'}}, 'graphName': 'LP', 'nodeCount': 3503, 'relationshipCount': 46389, 'projectMillis': 255}]


In [95]:
r = graph.query("""CALL gds.labelPropagation.write.estimate('LP', { writeProperty: 'community' })
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory""")
print(r)

[{'nodeCount': 3503, 'relationshipCount': 46389, 'bytesMin': 30040, 'bytesMax': 553816, 'requiredMemory': '[29 KiB ... 540 KiB]'}]


In [97]:
# In the stream execution mode, the algorithm returns the community ID for each node.
r = graph.query("""CALL gds.labelPropagation.stream('LP')
YIELD nodeId, communityId AS Community
RETURN gds.util.asNode(nodeId).descr AS Name, Community
ORDER BY Community, Name""")
print(r[:25])

[{'Name': 'A Coruna Airport', 'Community': 8629}, {'Name': 'Aalborg Airport', 'Community': 8629}, {'Name': 'Aarhus Airport', 'Community': 8629}, {'Name': 'Aba Tenna Dejazmach Yilma International Airport', 'Community': 8629}, {'Name': 'Abbotsford Airport', 'Community': 8629}, {'Name': 'Abeid Amani Karume International Airport', 'Community': 8629}, {'Name': 'Abel Santamaria Airport', 'Community': 8629}, {'Name': 'Aberdeen Dyce Airport', 'Community': 8629}, {'Name': 'Aberdeen Regional Airport', 'Community': 8629}, {'Name': 'Abha Regional Airport', 'Community': 8629}, {'Name': 'Abilene Regional Airport', 'Community': 8629}, {'Name': 'Abraham González International Airport', 'Community': 8629}, {'Name': 'Abraham Lincoln Capital Airport', 'Community': 8629}, {'Name': 'Abu Dhabi International Airport', 'Community': 8629}, {'Name': 'Achmad Yani Airport', 'Community': 8629}, {'Name': 'Adak Airport', 'Community': 8629}, {'Name': 'Adana Airport', 'Community': 8629}, {'Name': 'Addis Ababa Bole Int

In [99]:
cv=[]
for x in r:
    if x.get("Community") not in cv:
        print(x.get("Community"), x.get("Name"))
        cv.append(x.get("Community"))

8629 A Coruna Airport
8634 Atmautluak Airport
8826 Berlin, Tegel International Airport *Closed*
9099 Berlin-Schönefeld International Airport *Closed*
9568 New Castle Airport
9634 Toowoomba Airport
10850 Tikehau Airport
10851 Fakarava Airport
10852 Manihi Airport
10853 Arutua Airport
10854 Mataiva Airport
10855 Ahe Airport
10856 Aratika Nord Airport
10857 Takaroa Airport
10858 Nuku Hiva Airport
10859 Hiva Oa-Atuona Airport
10860 Bora Bora Airport
10861 Rangiroa Airport
10862 Huahine-Fare Airport
10863 Moorea Airport
10864 Hao Airport
10865 Maupiti Airport
10866 Raiatea Airport
10867 Sola Airport
10868 Siwo Airport
10869 Craig Cove Airport
10870 Longana Airport
10871 Sara Airport
10872 Tavie Airport
10873 Lamap Airport
10874 Lamen Bay Airport
10875 Maewo-Naone Airport
10876 Lonorore Airport
10877 Norsup Airport
10878 Gaua Island Airport
10879 Tongoa Airport
10880 Valesdir Airport
10881 Walaha Airport
10882 Southwest Bay Airport
10883 Dillon's Bay Airport
10884 Ipota Airport
10885 Tanna A

In [100]:
r = graph.query("""Call gds.graph.drop("LP")""")
print(r)

[{'graphName': 'LP', 'database': 'neo4j', 'databaseLocation': 'local', 'memoryUsage': '', 'sizeInBytes': -1, 'nodeCount': 3503, 'relationshipCount': 46389, 'configuration': {'relationshipProjection': {'HAS_ROUTE_WEIGHT': {'aggregation': 'DEFAULT', 'orientation': 'NATURAL', 'indexInverse': False, 'properties': {'weight': {'aggregation': 'DEFAULT', 'property': 'weight', 'defaultValue': None}}, 'type': 'HAS_ROUTE_WEIGHT'}}, 'readConcurrency': 4, 'relationshipProperties': {}, 'nodeProperties': {}, 'jobId': 'e83a75fa-a950-4dc0-ac89-2b4fb8153eb2', 'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {'runways': {'property': 'runways', 'defaultValue': None}, 'altitude': {'property': 'altitude', 'defaultValue': None}, 'coordinates': {'property': 'coordinates', 'defaultValue': None}}}}, 'logProgress': True, 'creationTime': neo4j.time.DateTime(2024, 4, 4, 15, 47, 27, 298504429, tzinfo=<UTC>), 'validateRelationships': False, 'sudo': False}, 'density': 0.0037814532146957986, 'creationT

# Louvain
https://neo4j.com/docs/graph-data-science/current/algorithms/louvain/

The Louvain method is an algorithm to detect communities in large networks. It maximizes a modularity score for each community, where the modularity quantifies the quality of an assignment of nodes to communities. This means evaluating how much more densely connected the nodes within a community are, compared to how connected they would be in a random network.

The Louvain algorithm is a hierarchical clustering algorithm that recursively merges communities into a single node and executes the modularity clustering on the condensed graphs.
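
To make the modularity score concrete, the sketch below computes it for a given partition of a small unweighted, undirected graph, using the standard formula Q = Σ_c [ L_c / m − (d_c / 2m)² ], where m is the number of edges, L_c the number of edges inside community c, and d_c the sum of degrees in c. It only evaluates a partition and does not implement the Louvain optimization itself. The edge list and the `modularity` helper are hypothetical.

```python
def modularity(edges, community_of):
    """Modularity of a partition of an unweighted, undirected graph."""
    m = len(edges)
    inside = {}      # number of edges with both endpoints in the community
    degree_sum = {}  # sum of node degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(
        inside.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Hypothetical toy graph: two triangles joined by a single bridge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

split = modularity(edges, {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"})
merged = modularity(edges, {n: "A" for n in range(6)})
print(split, merged)
```

Splitting the graph into its two triangles gives a modularity of 5/14, while putting every node in one community gives 0. Louvain searches for partitions with a high score of this kind.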

In [101]:
r = graph.query("""CALL gds.graph.project(
    'Louvain',
    'Airport',
    {
        HAS_ROUTE_WEIGHT: {
            orientation: 'UNDIRECTED'
        }
    },
    {
        nodeProperties: 'altitude',
        relationshipProperties: 'weight'
    }
)""")
print(r)

[{'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {'altitude': {'property': 'altitude', 'defaultValue': None}}}}, 'relationshipProjection': {'HAS_ROUTE_WEIGHT': {'aggregation': 'DEFAULT', 'orientation': 'UNDIRECTED', 'indexInverse': False, 'properties': {'weight': {'aggregation': 'DEFAULT', 'property': 'weight', 'defaultValue': None}}, 'type': 'HAS_ROUTE_WEIGHT'}}, 'graphName': 'Louvain', 'nodeCount': 3503, 'relationshipCount': 92778, 'projectMillis': 129}]


In [102]:
r = graph.query("""CALL gds.louvain.write.estimate('Louvain', { writeProperty: 'community' })
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory""")
print(r[:25])

[{'nodeCount': 3503, 'relationshipCount': 92778, 'bytesMin': 229921, 'bytesMax': 2529048, 'requiredMemory': '[224 KiB ... 2469 KiB]'}]


In [103]:
r = graph.query("""CALL gds.louvain.stream('Louvain')
YIELD nodeId, communityId, intermediateCommunityIds
RETURN gds.util.asNode(nodeId).descr AS name, communityId
ORDER BY name ASC""")
print(r[:20])

[{'name': 'A Coruna Airport', 'communityId': 1312}, {'name': 'Aalborg Airport', 'communityId': 1312}, {'name': 'Aarhus Airport', 'communityId': 1312}, {'name': 'Aasiaat Airport', 'communityId': 1811}, {'name': 'Aba Tenna Dejazmach Yilma International Airport', 'communityId': 153}, {'name': 'Abadan Airport', 'communityId': 153}, {'name': 'Abakan Airport', 'communityId': 1956}, {'name': 'Abbotsford Airport', 'communityId': 1863}, {'name': 'Abdul Rachman Saleh Airport', 'communityId': 1487}, {'name': 'Abeche Airport', 'communityId': 153}, {'name': 'Abeid Amani Karume International Airport', 'communityId': 153}, {'name': 'Abel Santamaria Airport', 'communityId': 1863}, {'name': 'Aberdeen Dyce Airport', 'communityId': 1312}, {'name': 'Aberdeen Regional Airport', 'communityId': 1863}, {'name': 'Abha Regional Airport', 'communityId': 153}, {'name': 'Abilene Regional Airport', 'communityId': 1863}, {'name': 'Abraham González International Airport', 'communityId': 1863}, {'name': 'Abraham Linco

In [104]:
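# Print the first airport (in alphabetical order) seen for each Louvain community.
seen_communities = set()
for row in r:
    community_id = row.get("communityId")
    if community_id not in seen_communities:
        print(community_id, row.get("name"))
        seen_communities.add(community_id)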

1312 A Coruna Airport
1811 Aasiaat Airport
153 Aba Tenna Dejazmach Yilma International Airport
1956 Abakan Airport
1863 Abbotsford Airport
1487 Abdul Rachman Saleh Airport
65 Adelaide International Airport
1573 Afonso Pena Airport
1097 Akhiok Airport
3065 Akiak Airport
1862 Akulivik Airport
3070 Alakanuk Airport
3080 Alashankou Bole (Bortala) airport
1108 Allakaket Airport
3057 Altai Airport
3061 Altay Air Base
1112 Ambler Airport
3071 Aneityum Airport
1104 Aniak Airport
3072 Aniwa Airport
3068 Anvik Airport
3465 Apataki Airport
3063 Araracuara Airport
3219 Arctic Village Airport
3432 Ataturk International Airport
3049 Attawapiskat Airport
3404 Awassa Airport
3086 Babo Airport
3076 Bakalalan Airport
3087 Bam Airport
3454 Barra do Garças Airport
3462 Batagay Airport
3082 Batticaloa Airport
3445 Bazhong Enyang Airport
1881 Bearskin Lake Airport
3073 Bedourie Airport
1817 Belize City Municipal Airport
199 Berlin, Tegel International Airport *Closed*
472 Berlin-Schönefeld International Air
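
A complementary summary is the number of airports per community. The sketch below assumes rows shaped like the Louvain stream output above (dictionaries with `name` and `communityId` keys). `sample_rows` is a hand-copied stand-in for the first five rows of that output, so the block runs without a database connection.

```python
from collections import Counter

# Stand-in for the first rows returned by gds.louvain.stream above.
sample_rows = [
    {"name": "A Coruna Airport", "communityId": 1312},
    {"name": "Aalborg Airport", "communityId": 1312},
    {"name": "Aarhus Airport", "communityId": 1312},
    {"name": "Aasiaat Airport", "communityId": 1811},
    {"name": "Aba Tenna Dejazmach Yilma International Airport", "communityId": 153},
]

# Count how many airports fall into each community.
community_sizes = Counter(row["communityId"] for row in sample_rows)

# Largest communities first.
for community_id, size in community_sizes.most_common():
    print(community_id, size)
```

Replacing `sample_rows` with the full query result `r` should give the size of every community in the projected graph.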

In [105]:
r = graph.query("""CALL gds.graph.drop("Louvain")""")
print(r)

[{'graphName': 'Louvain', 'database': 'neo4j', 'databaseLocation': 'local', 'memoryUsage': '', 'sizeInBytes': -1, 'nodeCount': 3503, 'relationshipCount': 92778, 'configuration': {'relationshipProjection': {'HAS_ROUTE_WEIGHT': {'aggregation': 'DEFAULT', 'orientation': 'UNDIRECTED', 'indexInverse': False, 'properties': {'weight': {'aggregation': 'DEFAULT', 'property': 'weight', 'defaultValue': None}}, 'type': 'HAS_ROUTE_WEIGHT'}}, 'readConcurrency': 4, 'relationshipProperties': {}, 'nodeProperties': {}, 'jobId': '7c1305df-6d88-44dd-984d-c83fffe5a79b', 'nodeProjection': {'Airport': {'label': 'Airport', 'properties': {'altitude': {'property': 'altitude', 'defaultValue': None}}}}, 'logProgress': True, 'creationTime': neo4j.time.DateTime(2024, 4, 4, 15, 53, 1, 651738470, tzinfo=<UTC>), 'validateRelationships': False, 'sudo': False}, 'density': 0.007562906429391597, 'creationTime': neo4j.time.DateTime(2024, 4, 4, 15, 53, 1, 651738470, tzinfo=<UTC>), 'modificationTime': neo4j.time.DateTime(202

# Local Clustering Coefficient
https://neo4j.com/docs/graph-data-science/current/algorithms/local-clustering-coefficient/
<br>
The Local Clustering Coefficient algorithm computes the local clustering coefficient for each node in the graph. The local clustering coefficient Cn of a node n describes the likelihood that the neighbours of n are also connected to each other.
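
For an undirected graph, the coefficient is commonly computed as Cn = 2·Tn / (dn·(dn − 1)), where Tn is the number of triangles through n and dn is its degree. The following is a minimal pure-Python sketch of that formula on a hypothetical four-node toy graph. It does not call GDS, and the function name and the adjacency-dictionary input are assumptions made for illustration.

```python
from itertools import combinations


def local_clustering_coefficient(adjacency):
    """Return {node: C_n} for an undirected graph given as {node: set_of_neighbours}."""
    coefficients = {}
    for node, neighbours in adjacency.items():
        degree = len(neighbours)
        if degree < 2:
            # Fewer than two neighbours means no neighbour pair can exist.
            coefficients[node] = 0.0
            continue
        # Each connected pair of neighbours closes one triangle through `node`.
        triangles = sum(
            1 for a, b in combinations(neighbours, 2) if b in adjacency[a]
        )
        coefficients[node] = 2 * triangles / (degree * (degree - 1))
    return coefficients


# Hypothetical toy graph: triangle A-B-C plus a pendant node D attached to A.
toy_graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

lcc = local_clustering_coefficient(toy_graph)
print(lcc)
```

Here B and C score 1.0 because their only two neighbours are connected, while A scores 1/3 because only one of its three neighbour pairs is connected.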

# Modularity metric

https://neo4j.com/docs/graph-data-science/current/algorithms/modularity/

Modularity is a metric for evaluating the quality of a community detection result. Relationships of nodes in a community C connect to nodes either within C or outside C. Graphs with high modularity have dense connections between the nodes within communities but sparse connections between nodes in different communities.
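
As a rough illustration, the standard modularity of an unweighted, undirected graph sums, over all communities, the fraction of relationships inside the community minus the squared fraction of relationship endpoints that belong to it. The sketch below computes that quantity in plain Python on a hypothetical toy graph. It is not the GDS implementation, which may differ in details such as relationship weights, and the names `modularity`, `toy_edges` and `two_communities` are assumptions.

```python
def modularity(edges, community_of):
    """Standard modularity for an unweighted, undirected edge list."""
    m = len(edges)
    internal = {}    # community -> number of edges fully inside it
    degree_sum = {}  # community -> total degree of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

two_communities = {0: "left", 1: "left", 2: "left", 3: "right", 4: "right", 5: "right"}
one_community = {node: "all" for node in range(6)}

q_split = modularity(toy_edges, two_communities)
q_single = modularity(toy_edges, one_community)
print(q_split, q_single)
```

Splitting the graph along the bridge gives a modularity of 5/14 (about 0.357), whereas putting every node into one community gives 0, which matches the intuition that the two triangles are the natural communities.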

# Modularity Optimization
https://neo4j.com/docs/graph-data-science/current/algorithms/modularity-optimization/

The Modularity Optimization algorithm tries to detect communities in the graph based on their modularity. Modularity is a measure of the structure of a graph, measuring the density of connections within a module or community. Graphs with a high modularity score have many connections within a community but only a few pointing outwards to other communities. For every node, the algorithm checks whether the modularity score would increase if the node moved to the community of one of its neighboring nodes.
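
The node-by-node move described above can be sketched as a simple greedy loop. This is a deliberately naive illustration and not the GDS implementation: it recomputes the full modularity for every candidate move, runs sequentially, and ignores weights. The function names, the `max_sweeps` parameter and the toy graph are all assumptions.

```python
def modularity(edges, community_of):
    """Standard modularity for an unweighted, undirected edge list."""
    m = len(edges)
    internal, degree_sum = {}, {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


def greedy_local_moves(edges, max_sweeps=10):
    """Move each node to a neighbour's community whenever that raises modularity."""
    nodes = sorted({n for edge in edges for n in edge})
    neighbours = {n: set() for n in nodes}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    community_of = {n: n for n in nodes}  # start with one community per node
    best = modularity(edges, community_of)

    for _ in range(max_sweeps):
        moved = False
        for node in nodes:
            original = community_of[node]
            best_community = original
            candidates = sorted({community_of[nb] for nb in neighbours[node]})
            for candidate in candidates:
                if candidate == original:
                    continue
                community_of[node] = candidate  # try the move
                score = modularity(edges, community_of)
                if score > best + 1e-12:
                    best, best_community = score, candidate
            community_of[node] = best_community  # keep the best option found
            moved = moved or best_community != original
        if not moved:
            break  # a full sweep without moves: local optimum reached
    return community_of, best


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition, final_score = greedy_local_moves(toy_edges)
print(partition, final_score)
```

A greedy loop like this only guarantees a local optimum, so the final partition can depend on the order in which nodes are visited.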

# Strongly Connected Components
https://neo4j.com/docs/graph-data-science/current/algorithms/strongly-connected-components/

The Strongly Connected Components (SCC) algorithm finds maximal sets of connected nodes in a directed graph. A set is considered a strongly connected component if there is a directed path from each node in the set to every other node in the set. It is often used early in a graph analysis process to help us get an idea of how our graph is structured.
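
To make the definition concrete, here is a small pure-Python sketch that finds strongly connected components with Kosaraju's two-pass depth-first search. Kosaraju's method is chosen here for readability and is not claimed to be what GDS uses internally. The function name and the toy graph are assumptions, and the recursive DFS is only suitable for small graphs.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm. `graph` maps each node to a list of successor nodes."""
    # Pass 1: record nodes in order of DFS completion.
    visited, finish_order = set(), []

    def visit(node):
        visited.add(node)
        for successor in graph.get(node, []):
            if successor not in visited:
                visit(successor)
        finish_order.append(node)

    for node in graph:
        if node not in visited:
            visit(node)

    # Build the reversed graph (every relationship flipped).
    reverse = {node: [] for node in graph}
    for node, successors in graph.items():
        for successor in successors:
            reverse.setdefault(successor, []).append(node)

    # Pass 2: DFS on the reversed graph in decreasing finish order.
    assigned, components = set(), []

    def collect(node, component):
        assigned.add(node)
        component.add(node)
        for predecessor in reverse.get(node, []):
            if predecessor not in assigned:
                collect(predecessor, component)

    for node in reversed(finish_order):
        if node not in assigned:
            component = set()
            collect(node, component)
            components.append(component)
    return components


# Hypothetical toy graph: cycle a -> b -> c -> a, plus c -> d with no way back.
toy_graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
components = strongly_connected_components(toy_graph)
print(components)
```

Nodes a, b and c form one component because each can reach the others along the cycle, while d is a component on its own since no directed path leads from d back to the cycle.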


# Triangle Count
https://neo4j.com/docs/graph-data-science/current/algorithms/triangle-count/

The Triangle Count algorithm counts the number of triangles for each node in the graph. A triangle is a set of three nodes where each node has a relationship to the other two. In graph theory terminology, this is sometimes referred to as a 3-clique. The Triangle Count algorithm in the GDS library only finds triangles in undirected graphs.
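
The per-node count can be illustrated with a few lines of plain Python. This is a brute-force sketch on a hypothetical toy graph, assuming an undirected graph stored as a dictionary of neighbour sets. It does not call GDS and is not meant to be efficient on large graphs.

```python
from itertools import combinations


def triangle_counts(adjacency):
    """Return {node: number of triangles the node is part of}."""
    counts = {}
    for node, neighbours in adjacency.items():
        # A triangle through `node` is a pair of its neighbours that are also connected.
        counts[node] = sum(
            1 for a, b in combinations(neighbours, 2) if b in adjacency[a]
        )
    return counts


# Hypothetical toy graph: triangle A-B-C plus a pendant node D attached to A.
toy_graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

per_node = triangle_counts(toy_graph)

# Every triangle is counted once at each of its three corners.
global_triangles = sum(per_node.values()) // 3
print(per_node, global_triangles)
```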


# Weakly Connected Components
https://neo4j.com/docs/graph-data-science/current/algorithms/wcc/

The Weakly Connected Components (WCC) algorithm finds sets of connected nodes in directed and undirected graphs. Two nodes are connected if there exists a path between them. The set of all nodes that are connected with each other forms a component. In contrast to Strongly Connected Components (SCC), the direction of relationships on the path between two nodes is not considered. For example, in a directed graph (a)→(b), a and b will be in the same component, even if there is no directed relationship (b)→(a).
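
Because direction is ignored, weakly connected components can be sketched with a small union-find (disjoint-set) structure that merges the two endpoints of every relationship. The code below is an illustrative pure-Python version on a hypothetical toy graph, and the function and variable names are assumptions.

```python
def weakly_connected_components(nodes, edges):
    """Group nodes into components, ignoring the direction of each (source, target) edge."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent pointers to the root, shortening the path along the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for source, target in edges:
        root_source, root_target = find(source), find(target)
        if root_source != root_target:
            parent[root_target] = root_source  # merge the two components

    components = {}
    for node in nodes:
        components.setdefault(find(node), set()).add(node)
    return list(components.values())


# Hypothetical toy graph: (a)->(b) and (c)->(b), plus an isolated node d.
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b"), ("c", "b")]

components = weakly_connected_components(toy_nodes, toy_edges)
print(components)
```

Nodes a, b and c end up in one component even though no directed path leads from a to c, while d stays in a component of its own.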
