# Graph Data Science for Supply Chain: Part II

In this notebook, we explore the application of graph data science to supply chain logistics. Specifically, we will:

1. Show how to calculate centrality metrics and Louvain communities using Neo4j Graph Data Science.
2. Provide meaningful interpretations for centrality metrics and Louvain communities within the context of supply chain, and discuss their relationship to performance and risks.
3. Demonstrate how these metrics can be used in statistical and predictive modeling to understand their association with delays and risks in supply chain networks.


For a sample dataset we will use the “Cargo 2000” transport and logistics case study [[1]](#1). Cargo 2000 (re-branded as Cargo iQ in 2016) is an initiative of the International Air Transport Association (IATA) that aims to deliver a new quality management system for the air cargo industry.

The figure below shows a model of the business processes covered in the IATA case study. It represents the business processes of a freight forwarding company, in which up to three smaller shipments from suppliers are consolidated and then shipped together to customers. The business process is structured into incoming and outgoing transport legs, with the overall objective of delivering freight to customers in a timely manner. You can find out more about the business model in the [first blog of this series](https://neo4j.com/developer-blog/supply-chain-neo4j-gds-bloom/), where we explored the dataset in Neo4j Bloom, or from the [original data source](https://s-cube-network.eu/c2k/).

<img src="img/logistics-diagram.png" alt="summary" width="1000"/>


## Prerequisites
- Neo4j >= 4.3
- GDS >= 2.0
- The Cargo 2000 case study dataset loaded into a Neo4j database. There is a notebook to generate the Neo4j database [here](https://github.com/neo4j-product-examples/demo-supply-chain-logistics/blob/main/airplane-cargo/part1/transform-and-load.ipynb).

## References
<a id="1">[1]</a> A. Metzger, P. Leitner, D. Ivanovic, E. Schmieders, R. Franklin, M. Carro, S. Dustdar, and K. Pohl, “Comparing and combining predictive business process monitoring techniques,” IEEE Trans. on Systems, Man, and Cybernetics: Systems, 2015.


In [6]:
import pandas as pd
import numpy as np

## Connect to Neo4j Graph Data Science

In [7]:
from graphdatascience import GraphDataScience

# Use Neo4j URI and credentials according to your setup
gds = GraphDataScience('neo4j://localhost', auth=('neo4j', 'neo'))

## Collapse Graph Data Model with `SENDS_TO` Relationships
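Adding `SENDS_TO` relationships directly between `Airport` nodes lets us calculate centrality and community metrics on transport routes without traversing the intermediate departure and arrival nodes. Each `SENDS_TO` relationship stores a `flightCount` property: the number of `TRANSPORT` relationships between the two airports.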

In [8]:
gds.run_cypher('''
    MATCH(a1:Airport)<-[:LOCATED_AT]-(d1:DeparturePoint)-[r:TRANSPORT]->(d2:ArrivalWarehouse)-[:LOCATED_AT]->(a2:Airport)
    WITH a1, a2, count(r) AS flightCount
    MERGE (a1)-[s:SENDS_TO]->(a2)
    SET s.flightCount = flightCount
    RETURN count(s)
''')

   count(s)
0      1205
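
To make the aggregation concrete, the following is a minimal pure-Python sketch of what the `WITH a1, a2, count(r) AS flightCount` step does: it groups transport legs by (origin, destination) airport pair and counts them. The airport IDs and the `legs` list are made up for illustration and are not taken from the Cargo 2000 data.

```python
from collections import Counter

# Hypothetical transport legs as (origin airport, destination airport) pairs.
legs = [
    (815, 128),
    (815, 128),
    (128, 700),
    (815, 700),
    (128, 700),
    (128, 700),
]

# Counting identical pairs mirrors `WITH a1, a2, count(r) AS flightCount`.
flight_counts = Counter(legs)

# One entry per airport pair corresponds to one SENDS_TO relationship.
sends_to = [
    {"source": a1, "target": a2, "flightCount": n}
    for (a1, a2), n in sorted(flight_counts.items())
]

for rel in sends_to:
    print(rel)
```

Six legs collapse into three directed `SENDS_TO` relationships here, which is the same reduction the Cypher statement applies to the full dataset.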


## Calculating Centrality Metrics Using Neo4j Graph Data Science

In [9]:
# Create the in-memory graph projection
g, _ = gds.graph.project('proj', 'Airport', {'SENDS_TO':{'properties':['flightCount']}})

In [10]:
# Calculate and write out-degree centrality, weighted by flight count
gds.degree.write(g, relationshipWeightProperty='flightCount', writeProperty='outDegreeCentrality')
# Calculate and write betweenness centrality (no weight property is passed)
gds.betweenness.write(g, writeProperty='betweennessCentrality')
# Calculate and write eigenvector centrality, weighted by flight count
gds.eigenvector.write(g, relationshipWeightProperty='flightCount', writeProperty='eigenvectorCentrality')
# Drop the projected in-memory graph
g.drop()
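
Of the three metrics, betweenness is probably the least intuitive: it counts how often a node lies on shortest paths between other pairs of nodes. The betweenness call above passes no weight property, so path length is measured in hops. The sketch below is a plain-Python version of Brandes' algorithm on a hypothetical five-airport network. It is meant to illustrate the idea and is not the GDS implementation; the function name `betweenness` and the sample routes are assumptions for this example.

```python
from collections import deque


def betweenness(adj):
    """Unweighted, directed betweenness via Brandes' algorithm.

    adj maps each node to a list of the nodes it sends to.
    """
    nodes = set(adj) | {w for targets in adj.values() for w in targets}
    score = {n: 0.0 for n in nodes}
    for s in nodes:
        dist = {s: 0}                   # hop distance from s
        sigma = {n: 0 for n in nodes}   # number of shortest paths from s
        sigma[s] = 1
        preds = {n: [] for n in nodes}  # predecessors on shortest paths
        order = []                      # nodes in order of discovery
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj.get(v, []):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back towards s.
        delta = {n: 0.0 for n in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Hypothetical routes: A and B feed a hub H, which feeds C and D.
routes = {"A": ["H"], "B": ["H"], "H": ["C", "D"]}
scores = betweenness(routes)
print(sorted(scores.items()))
```

In this toy network the hub `H` sits on all four shortest paths between the feeder and destination airports, while every other airport scores zero. An airport with a high betweenness score in the real graph plays a similar bridging role.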

## Calculating Louvain Communities on UNDIRECTED Orientation

In [11]:
g, _ = gds.graph.project('proj', 'Airport', {'SENDS_TO':{'orientation':'UNDIRECTED', 'properties':['flightCount']}})
gds.louvain.write(g, relationshipWeightProperty='flightCount', writeProperty='louvainId')

g.drop()
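
Louvain groups nodes by greedily maximizing modularity, a score that compares the relationship weight inside communities with what would be expected if relationships were placed at random. As a rough illustration (not the GDS implementation), the sketch below computes weighted modularity for a given partition of a small, made-up undirected network. The function name `modularity` and the sample data are assumptions for this example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Weighted modularity of a partition of an undirected graph.

    edges: iterable of (node_u, node_v, weight), one entry per undirected edge.
    community: dict mapping each node to its community id.
    """
    total = 0.0                    # total edge weight m
    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)    # summed weighted degree per community
    for u, v, w in edges:
        total += w
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(
        internal[c] / total - (degree[c] / (2 * total)) ** 2
        for c in degree
    )


# Two hypothetical triangles of airports joined by a single route.
edges = [
    ("A", "B", 1), ("B", "C", 1), ("A", "C", 1),
    ("D", "E", 1), ("E", "F", 1), ("D", "F", 1),
    ("C", "D", 1),
]
split = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
merged = {n: 0 for n in split}

print(modularity(edges, split))   # two communities
print(modularity(edges, merged))  # everything in one community
```

Splitting the network into its two triangles scores higher than lumping all airports together, which is the kind of improvement Louvain searches for when it assigns a `louvainId`.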

## Calculating In-Degree Centrality on REVERSE Orientation

In [12]:
g, _ = gds.graph.project('proj', 'Airport', {'SENDS_TO':{'orientation':'REVERSE', 'properties':['flightCount']}})
gds.degree.write(g,relationshipWeightProperty='flightCount', writeProperty='inDegreeCentrality')
g.drop()
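
By default, degree centrality counts the outgoing relationships of the projected graph, so reversing `SENDS_TO` in the projection turns the same call into an in-degree calculation. The following minimal sketch shows this equivalence in plain Python on a hypothetical three-airport network; the helper names and the edge list are assumptions for illustration only.

```python
def weighted_out_degree(edges):
    """Sum outgoing relationship weights per node."""
    totals = {}
    for source, target, weight in edges:
        totals[source] = totals.get(source, 0.0) + weight
        totals.setdefault(target, 0.0)  # keep nodes with no outgoing edges
    return totals


def reverse(edges):
    """Flip the direction of every relationship, keeping its weight."""
    return [(target, source, weight) for source, target, weight in edges]


# Hypothetical SENDS_TO relationships: (source, target, flightCount).
edges = [("A", "B", 3), ("A", "C", 2), ("B", "C", 4)]

out_degree = weighted_out_degree(edges)
in_degree = weighted_out_degree(reverse(edges))

print("out:", out_degree)
print("in: ", in_degree)
```

Airport `C` sends nothing but receives six flights in total, so it ranks last on out-degree and first on in-degree.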

## Top 5 Airports for Each Centrality Metric

In [16]:
metrics = ['outDegreeCentrality', 'inDegreeCentrality', 'betweennessCentrality', 'eigenvectorCentrality']
top_n = 5
for metric in metrics:
    print('\n=======================================')
    print(f'Top {top_n} Airports for {metric}')
    print(gds.run_cypher(f'''
        MATCH(a:Airport)
        RETURN a.airportId AS airportId, a.name AS name, a.{metric} AS {metric}
        ORDER BY {metric} DESC LIMIT {top_n}
    '''))


Top 5 Airports for outDegreeCentrality
   airportId         name  outDegreeCentrality
0        815    Moodytown               2240.0
1        128    Shanefort               2195.0
2        700    Davisfort               2003.0
3        349  Richardberg               1104.0
4        485  Michaelstad                711.0

Top 5 Airports for inDegreeCentrality
   airportId         name  inDegreeCentrality
0        700    Davisfort              2205.0
1        128    Shanefort              1839.0
2        349  Richardberg              1312.0
3        815    Moodytown              1091.0
4        485  Michaelstad               758.0

Top 5 Airports for betweennessCentrality
   airportId         name  betweennessCentrality
0        349  Richardberg           11579.601583
1        128    Shanefort           10715.139355
2        700    Davisfort            5178.707487
3        815    Moodytown            4471.174851
4        555  Masseyhaven            3722.941825

Top 5 Airports for eigenve