# Testing the Resilience of the Global Supply Chain for Microchips

Robust and efficient supply chains are crucial for businesses to remain competitive. With hundreds of suppliers and intermediaries, tracking the resilience of your supply chain can be challenging.

Graph Analytics can help.

We can group similar suppliers into communities with algorithms such as Louvain, and find suppliers that form bottlenecks within a supply chain with centrality algorithms such as PageRank. Finally, we can test the resilience of a supply chain by removing a few suppliers and measuring the downstream effects with Weakly Connected Components (WCC).

# Set Up

First, we need to install the `graphdatascience` package and load all of our secrets.

In [None]:
!pip install graphdatascience

In [None]:
!pip install neo4j



In [None]:
from graphdatascience.session import GdsSessions, AuraAPICredentials, DbmsConnectionInfo, AlgorithmCategory
from datetime import timedelta
import pandas as pd
import os
from google.colab import userdata

In [None]:
CLIENT_ID = userdata.get("CLIENT_ID")
CLIENT_SECRET = userdata.get("CLIENT_SECRET")
TENANT_ID = userdata.get("TENANT_ID")

# Neo4j Database Connection Info
SUPPLIER_URI = userdata.get("SUPPLIER_URI")
NEO4J_USER = userdata.get("NEO4J_USER")
SUPPLIER_PASSWORD = userdata.get("SUPPLIER_PASSWORD")

## Establishing a Session

We then use our secrets to establish a connection to our AuraDB instance.

In [None]:
sessions = GdsSessions(api_credentials=AuraAPICredentials(CLIENT_ID, CLIENT_SECRET, TENANT_ID))

name = "my-new-session"
memory = sessions.estimate(
    node_count=475,
    relationship_count=800,
    algorithm_categories=[AlgorithmCategory.CENTRALITY, AlgorithmCategory.NODE_EMBEDDING],
)

db_connection_info = DbmsConnectionInfo(SUPPLIER_URI, NEO4J_USER, SUPPLIER_PASSWORD)

# Create or retrieve a session
gds = sessions.get_or_create(
    session_name=name,
    memory=memory,
    db_connection=db_connection_info, # this is checking for a bolt server currently
    ttl=timedelta(hours=5),
)


# Our First Projection

We only have two types of nodes and they are fully interconnected, representing the high level supply chain needed to make microchips.

Therefore, we are going to project the entire graph:

In [None]:
# Define the custom Cypher query for projecting the graph
query = """
CALL {
    MATCH (source)-[rel]->(target)
    RETURN
        source,
        rel,
        target
}
RETURN gds.graph.project.remote(source, target, {
    sourceNodeLabels: labels(source),
    targetNodeLabels: labels(target),
    relationshipType: type(rel)
});

"""

# Project the graph into GDS
full, result = gds.graph.project(
    graph_name="full-graph",
    query=query
)




# Running Louvain to See Communities

Next, we are going to run Louvain to see which communities exist within our data. By using the `.write` method of `gds.louvain`, we can write our results directly back to the database.

In [None]:
result = gds.louvain.write(full, writeProperty="louvain")


## Returning Results
We can use the `GraphDatabase` object to run and return results from our graph database.

In [None]:
from neo4j import GraphDatabase
import pandas as pd

driver = GraphDatabase.driver(SUPPLIER_URI, auth=(NEO4J_USER, SUPPLIER_PASSWORD))

# Define a function to execute the Cypher query and return a DataFrame
def query_to_dataframe(query, parameters=None):
    with driver.session() as session:
        result = session.run(query, parameters)
        # Convert the result to a list of dictionaries (records)
        records = [record.data() for record in result]
        # Convert the list of dictionaries into a Pandas DataFrame
        df = pd.DataFrame(records)
        return df

# Example Cypher query
cypher_query = """
MATCH (c)
RETURN c.louvain AS louvain, count(*) AS Count
ORDER BY Count DESC;
"""

# Execute the query and get the results as a DataFrame
df = query_to_dataframe(cypher_query)

# Display the DataFrame
df
# The driver stays open for the later cells; call driver.close() once you are finished


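|   | louvain | Count |
|---|---------|-------|
| 0 | 7       | 163   |
| 1 | 48      | 65    |
| 2 | 11      | 36    |
| 3 | 21      | 31    |
| 4 | 231     | 13    |
| 5 | 2       | 8     |
| 6 | 1       | 6     |
| 7 | 4       | 5     |
| 8 | 13      | 4     |
| 9 | 99      | 3     |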


## What Did We Find?
We found 10 different communities in our graph. Louvain finds communities based on the topology of the graph: it groups nodes so that relationships are denser within a community than between communities.
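To make "denser within than between" concrete, here is a minimal sketch of modularity, the score that Louvain tries to maximize. It assumes an undirected, unweighted toy graph (two triangles joined by one relationship); the `modularity` helper and the node names are hypothetical and are not part of the GDS client.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity Q of an undirected, unweighted graph for a given partition."""
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # relationships with both ends in one community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    total_degree = defaultdict(int)  # summed degree per community
    for node, d in degree.items():
        total_degree[community_of[node]] += d
    return sum(
        internal[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )

# Toy graph (assumption): two triangles connected by a single relationship.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

by_triangle = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
scrambled = {"a": 0, "d": 0, "b": 1, "e": 1, "c": 2, "f": 2}

good = modularity(edges, by_triangle)
bad = modularity(edges, scrambled)
print(good, bad)
```

The partition that follows the triangles scores higher than the scrambled one, which is the sense in which a Louvain community keeps most of its relationships inside the group.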

That means a few things. Smaller communities are relatively isolated within the supply chain. Take Group 13 for instance:

In [None]:
# Example Cypher query
cypher_query = """
MATCH (n)
WHERE n.louvain = 13
RETURN n.name, n.input_name;
"""

# Execute the query and get the results as a DataFrame
df = query_to_dataframe(cypher_query)

# Display the DataFrame
df

|   | n.name        | n.input_name                     |
|---|---------------|----------------------------------|
| 0 | AMD           |                                  |
| 1 | Nvidia        |                                  |
| 2 | Jingjia Micro |                                  |
| 3 |               | Logic chip design: Discrete GPUs |


These are all the major designers of discrete GPUs, along with the GPU design input itself. If these companies were to go out of business, it would have a large effect on this community and on the larger graph.

But how about those larger groups? Companies within the largest group will be highly connected to a large portion of the graph. But how can we tell which of these companies are most important to the health of the supply chain?

# PageRank

PageRank is a centrality algorithm originally developed by Larry Page and Sergey Brin to rank web pages for Google's search engine. It measures the importance of a node within a graph based on the quantity and quality of its incoming relationships.
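As a rough illustration of how the score propagates, the sketch below iterates the non-normalized update `PR(n) = (1 - d) + d * sum(PR(s) / out_degree(s))` over a node's incoming neighbors `s`. The `pagerank` helper, the toy relationships, and the node names are assumptions made for this example; it is not the GDS implementation.

```python
def pagerank(edges, damping=0.85, iterations=20):
    """Iterate the non-normalized PageRank update on a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out_degree = {n: 0 for n in nodes}
    incoming = {n: [] for n in nodes}
    for source, target in edges:
        out_degree[source] += 1
        incoming[target].append(source)

    scores = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Each node receives an equal share of every incoming neighbor's score.
        scores = {
            n: (1 - damping)
            + damping * sum(scores[s] / out_degree[s] for s in incoming[n])
            for n in nodes
        }
    return scores

# Toy supply chain (assumption): three inputs feed a wafer, which feeds a chip.
edges = [("furnace", "wafer"), ("tools", "wafer"),
         ("implanter", "wafer"), ("wafer", "chip")]

scores = pagerank(edges)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.4f}")
```

Nodes with no incoming relationships settle at the baseline `1 - d`, while nodes that many others feed into accumulate score. Because this form is not normalized, scores are not probabilities and can exceed 1 on larger graphs.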

In [None]:
pagerank_result = gds.pageRank.write(
    full,
    writeProperty="PR",  # Name of the property to store scores
    maxIterations=20,      # Maximum number of iterations
    dampingFactor=0.85     # Damping factor (default is 0.85)
)


Let's take a look at what PageRank has to say about the entire graph:




In [None]:
# Example Cypher query
cypher_query = """
MATCH (n:Company)
RETURN n.name, n.provider_id, n.PR
ORDER BY n.PR DESC;
"""

# Execute the query and get the results as a DataFrame
df = query_to_dataframe(cypher_query)

# Display the DataFrame
df.head(5)

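|   | n.name          | n.provider_id | n.PR     |
|---|-----------------|---------------|----------|
| 0 | Intel           | P9            | 6.331569 |
| 1 | TSMC            | P34           | 6.222168 |
| 2 | Samsung         | P35           | 6.222168 |
| 3 | Microchip       | P19           | 4.128678 |
| 4 | GlobalFoundries | P36           | 4.104069 |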


Now comes the real test!

How brittle is our supply chain? What would happen if we removed a few of these companies?

To test this, we will project a subgraph into memory that excludes Intel (P9), Samsung (P35), and Microchip (P19).

In [None]:
# Define the custom Cypher query for projecting the graph without the excluded companies
query = """
CALL {
    MATCH (source)-[rel]->(target)
    WHERE NOT (source:Company AND source.provider_id IN ["P9",  "P35", "P19"])
      AND NOT (target:Company AND target.provider_id IN ["P9",  "P35", "P19"])
    RETURN source, rel, target
}
RETURN gds.graph.project.remote(source, target, {
    sourceNodeLabels: labels(source),
    targetNodeLabels: labels(target),
    relationshipType: type(rel)
});
"""

graph_name = "four-less"

if gds.graph.exists(graph_name)["exists"]:
    # Drop the graph if it exists
    gds.graph.drop(graph_name)
    print(f"Graph '{graph_name}' dropped.")

# Project the filtered subgraph into GDS using the custom query
G, result = gds.graph.project(
    graph_name=graph_name,
    query=query
)


Then we will run Weakly Connected Components (WCC) against it to see whether the supply chain has split apart. WCC identifies groups of nodes that are connected in some way, even if you ignore the direction of the relationships.

Before we removed these three companies, we had one big connected graph. If we find more than one group in our subgraph, the graph has split: some parts of the supply chain can no longer reach the rest, and we can no longer make microchips.
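The idea behind this test can be sketched without a database. The example below is a minimal union-find version of WCC on a hypothetical five-node chain; the `count_components` helper and the node names are assumptions for illustration only. As in the Cypher projection, a node that loses all of its relationships simply drops out of the graph.

```python
def count_components(edges, removed=frozenset()):
    """Count weakly connected components after dropping the nodes in `removed`."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for source, target in edges:
        # Skip every relationship that touches a removed node.
        if source in removed or target in removed:
            continue
        # Direction is ignored: both ends are merged into one component.
        parent[find(source)] = find(target)

    return len({find(node) for node in list(parent)})

# Toy chain (assumption): everything flows through a single "hub" supplier.
edges = [("mine", "refiner"), ("refiner", "hub"),
         ("hub", "fab"), ("fab", "chip")]

before = count_components(edges)
after = count_components(edges, removed={"hub"})
print(before, after)
```

One component before the removal and two afterwards is the same signal we look for in the real graph.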

In [None]:
# Run Weakly Connected Components on the projected graph
result = gds.wcc.write(G, writeProperty="wcc")

# Example Cypher query
cypher_query = """
MATCH (n)
RETURN n.wcc, count(*)
"""

# Execute the query and get the results as a DataFrame
df = query_to_dataframe(cypher_query)

# Display the DataFrame
df.head(10)


|   | n.wcc | count(*) |
|---|-------|----------|
| 0 | 0     | 50       |
| 1 | 26    | 286      |


As you can see, WCC finds two components. That means that if just these three companies faced major disruptions, the supply chain would break apart and it would be impossible to find alternatives.

## Working from Dataframes

Graph Analytics does not require an AuraDB instance in order to run. As long as you can export your data into pandas dataframes, you can use graph analytics.

Let's take a look at how we can use a pathfinding algorithm to navigate our way through our suppliers.

Take a look at the CSV below. We have everything we need to easily model a supply chain. We just need to split it into a nodes dataframe and a relationships dataframe in order to run graph analytics against it.

In [19]:
sequence = pd.read_csv("https://raw.githubusercontent.com/corydonbaylor/supply-chain/refs/heads/main/data/sequence_clean.csv")

sequence.head()

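|   | Unnamed: 0 | input_name                                 | input_id | goes_into_name     | goes_into_id | is_type_of_name | is_type_of_id |
|---|------------|--------------------------------------------|----------|--------------------|--------------|-----------------|---------------|
| 0 | 0          | Crystal growing furnaces                   | N8       | Wafer              | N26          |                 |               |
| 1 | 1          | Crystal machining tools                    | N9       | Wafer              | N26          |                 |               |
| 2 | 5          | Advanced photolithography equipment        | N19      | Photolithography   | N25          |                 |               |
| 3 | 6          | Ion implanters                             | N17      | Wafer              | N26          |                 |               |
| 4 | 7          | Photomask (maskless) lithography equipment | N28      | Advanced photomask | N33          |                 |               |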


With some minimal cleaning, we can easily put our data in the format needed to project a graph!

In [20]:
# Strip the leading "N" from each ID and convert it to an integer
sequence["input_id"] = sequence["input_id"].str[1:].astype(int)
sequence["goes_into_id"] = sequence["goes_into_id"].str[1:].astype(int)

# Create 'nodes' DataFrame
nodes_input = sequence[['input_id']].drop_duplicates().rename(columns={'input_id': 'nodeId'})
nodes_goes = sequence[['goes_into_id']].drop_duplicates().rename(columns={'goes_into_id': 'nodeId'})
nodes = pd.concat([nodes_input, nodes_goes]).drop_duplicates()

nodes.head()



|   | nodeId |
|---|--------|
| 0 | 8      |
| 1 | 9      |
| 2 | 19     |
| 3 | 17     |
| 4 | 28     |


In [21]:
# Create 'relationships' DataFrame
relationships = sequence[['input_id', 'goes_into_id']].rename(
    columns={'input_id': 'sourceNodeId', 'goes_into_id': 'targetNodeId'}
)
relationships['relationshipType'] = "GOES_INTO"

relationships.head()

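|   | sourceNodeId | targetNodeId | relationshipType |
|---|--------------|--------------|------------------|
| 0 | 8            | 26           | GOES_INTO        |
| 1 | 9            | 26           | GOES_INTO        |
| 2 | 19           | 25           | GOES_INTO        |
| 3 | 17           | 26           | GOES_INTO        |
| 4 | 28           | 33           | GOES_INTO        |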


### Constructing a Graph

Finally, we can build a projection using just these two dataframes. No AuraDB required. That means that if you can get your data into Python, you can immediately spin up a graph analytics session and experience the power of graph algorithms!

In [None]:
G = gds.graph.construct("chain", nodes, relationships)

Uploading Nodes:   0%|          | 0/48 [00:00<?, ?Records/s]

Uploading Relationships:   0%|          | 0/62 [00:00<?, ?Records/s]

Let's run a pathfinding algorithm to find the shortest path through our supply chain and stream the results into a dataframe.

We will start at N8, the crystal growing furnaces, and find the shortest path to N99, the finished logic chip.

In [None]:
gds.shortestPath.dijkstra.stream(G, sourceNode=8, targetNode=99)

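|   | index | sourceNode | targetNode | totalCost | nodeIds                             | costs                                         | path |
|---|-------|------------|------------|-----------|-------------------------------------|-----------------------------------------------|------|
| 0 | 0     | 8          | 99         | 8.0       | [8, 26, 35, 25, 46, 57, 69, 78, 99] | [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0] |      |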
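Since no relationship weight is supplied, every hop costs 1, and in that case Dijkstra gives the same answer as a plain breadth-first search. The sketch below uses breadth-first search rather than Dijkstra itself. Its edge list is an assumption: it is reconstructed only from the sample rows and the returned path shown above, not from the full dataset, and `shortest_path` is a hypothetical helper.

```python
from collections import deque

def shortest_path(edges, source, target):
    """Breadth-first search for the fewest-hop path in a directed graph."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)

    previous = {source: None}  # doubles as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # target is unreachable

# Partial edge list (assumption): sample rows plus the hops of the path above.
edges = [(8, 26), (9, 26), (17, 26), (19, 25), (28, 33),
         (26, 35), (35, 25), (25, 46), (46, 57), (57, 69), (69, 78), (78, 99)]

path = shortest_path(edges, 8, 99)
print(path, "hops:", len(path) - 1)
```

The hop count matches the `totalCost` of 8.0 reported by the streamed result. A weighted variant, for example with lead times on each relationship, would need Dijkstra proper.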


In [None]:
sessions.delete(session_name="my-new-session")

True