
### Efficient Supply Chain Management using Graph Neural Networks (GNNs)

**Notebook-3**


**Kunal Kishore || 22810041**

This notebook implements an edge prediction problem, comparing classical classifiers (Random Forest and Logistic Regression) with a Graph Neural Network (GNN) approach.

In this notebook, the SupplyGraph dataset is used.

SupplyGraph data is available at the repository [https://github.com/CIOL-SUST/SupplyGraph](https://github.com/CIOL-SUST/SupplyGraph).


To run this notebook successfully, place the dataset file "SupplyGraph/Raw Dataset/Homogenoeus/Edges/EdgesIndex/Edges (Product Group).csv" in the same directory as this notebook.
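
A quick check before running the cells can catch a misplaced file. The sketch below assumes the notebook's working directory is the notebook's own folder; `find_dataset` is a hypothetical helper introduced here for illustration and is not part of SupplyGraph.

```python
from pathlib import Path

# File name taken from the instructions above
DATASET_NAME = "Edges (Product Group).csv"


def find_dataset(directory="."):
    """Return the dataset path inside `directory`, or raise if it is missing.

    Hypothetical helper: the default directory is an assumption.
    """
    path = Path(directory) / DATASET_NAME
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected '{DATASET_NAME}' in {Path(directory).resolve()}"
        )
    return path
```

Calling `find_dataset()` at the top of the notebook would fail early with a readable message instead of a later `pd.read_csv` error.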


In [1]:
pip install torch_geometric



## Random Forest Approach
First, we'll use a Random Forest Classifier to predict the presence of an edge between nodes in the graph.



In [2]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier as rf
from sklearn.metrics import accuracy_score
import itertools

# Read the data
rf_data = pd.read_csv('Edges (Product Group).csv')

# Step 1: Create a DataFrame containing all combinations of node1 and node2 for each product category
rf_all_nodes = set(rf_data['node1']).union(set(rf_data['node2']))
rf_node_combinations = list(itertools.product(rf_all_nodes, repeat=2))

# Create an empty DataFrame to store the merged data
rf_merged_data = pd.DataFrame(columns=['node1', 'node2', 'GroupCode', 'has_edge'])

# Iterate over each unique product category
for group_code in ['A', 'M', 'P', 'S']:
    # Create a DataFrame containing all combinations of nodes for the current product category
    group_combinations = pd.DataFrame(rf_node_combinations, columns=['node1', 'node2'])

    # Rename the 'GroupCode' column in rf_data to avoid duplicate columns after merging
    rf_data_group = rf_data[rf_data['GroupCode'] == group_code].rename(columns={'GroupCode': 'GroupCode_data'})

    # Merge with the original data to check for edges
    merged_group_data = pd.merge(group_combinations, rf_data_group, on=['node1', 'node2'], how='left')

    # Create a new label column based on the presence of an edge
    merged_group_data['has_edge'] = 0

    # Drop the redundant columns
    merged_group_data.drop(columns=['GroupCode_data'], inplace=True)
    merged_group_data["GroupCode"] = group_code
    # Concatenate with the main merged data
    rf_merged_data = pd.concat([rf_merged_data, merged_group_data], ignore_index=True)


def check_edge_presence(edge1, edge2, groupcode, rf_data):
    """
    Check if an edge exists between two nodes for a given groupcode.

    Parameters:
    - edge1 (int): The first node of the edge.
    - edge2 (int): The second node of the edge.
    - groupcode (str): The groupcode to check.
    - rf_data (DataFrame): The DataFrame containing edge data.

    Returns:
    - int: 1 if edge is found, 0 otherwise.
    """
    # Filter rf_data for the specified groupcode and edge
    edge_exists = (rf_data['GroupCode'] == groupcode) & ((rf_data['node1'] == edge1) & (rf_data['node2'] == edge2) | (rf_data['node1'] == edge2) & (rf_data['node2'] == edge1))

    # Return 1 if edge exists, 0 otherwise
    return int(edge_exists.any())


for i in range(len(rf_merged_data)):
    rf_merged_data.loc[i, 'has_edge'] = check_edge_presence(rf_merged_data.loc[i, 'node1'], rf_merged_data.loc[i, 'node2'], rf_merged_data.loc[i, 'GroupCode'], rf_data)


# Save rf_merged_data to a CSV file
rf_merged_data.to_csv('rf_merged_data.csv', index=False)


rf_merged_data

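| | node1 | node2 | GroupCode | has_edge |
|---|---|---|---|---|
| 0 | 0 | 0 | A | 0 |
| 1 | 0 | 1 | A | 0 |
| 2 | 0 | 2 | A | 0 |
| 3 | 0 | 3 | A | 0 |
| 4 | 0 | 4 | A | 0 |
| ... | ... | ... | ... | ... |
| 6401 | 39 | 35 | S | 0 |
| 6402 | 39 | 36 | S | 0 |
| 6403 | 39 | 37 | S | 0 |
| 6404 | 39 | 38 | S | 1 |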
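
The labelling loop in the cell above scans the whole edge table once for every node pair. A minimal sketch of the same undirected check using a set lookup is shown below, in plain Python. The toy edge list is hypothetical and only stands in for the rows of `Edges (Product Group).csv`.

```python
import itertools

# Hypothetical toy edges: (node1, node2, GroupCode)
toy_edges = [(0, 1, "A"), (1, 2, "A"), (0, 2, "S")]

# Store each edge as (group, unordered node pair) so direction is ignored
edge_set = {(group, frozenset((a, b))) for a, b, group in toy_edges}

# All nodes that appear at either end of an edge
nodes = sorted({n for a, b, _ in toy_edges for n in (a, b)})

# Label every ordered node pair for every group, mirroring itertools.product above
labelled = [
    (a, b, group, int((group, frozenset((a, b))) in edge_set))
    for group in ["A", "S"]
    for a, b in itertools.product(nodes, repeat=2)
]

print(labelled[:4])
```

Each membership test is a constant-time hash lookup, so the total work grows with the number of pairs rather than with pairs times edges.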


In [3]:
# Initialize dictionaries to store models and accuracies for each product group
rf_models = {}
rf_accuracies = {}

# Iterate over each product group
for group_code in ['A', 'M', 'P', 'S']:
    # Filter data for the current product group
    group_data = rf_merged_data[rf_merged_data['GroupCode'] == group_code]

    # Split the dataset into features (X) and the target variable (y)
    X = group_data[['node1', 'node2']]
    y = group_data['has_edge']

    # Convert target variable to integers
    y = y.astype(int)

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=22810041)

    # Initialize and train the Random Forest model
    rf_model = rf(random_state=22810041)
    rf_model.fit(X_train, y_train)

    # Make predictions on the testing set
    y_pred = rf_model.predict(X_test)

    # Evaluate the model's performance
    accuracy = accuracy_score(y_test, y_pred)

    # Store the model and accuracy for the current product group
    rf_models[group_code] = rf_model
    rf_accuracies[group_code] = accuracy


In [4]:
# Print accuracy scores for each product group with Random Forest
for group_code, accuracy in rf_accuracies.items():
    print(f"Accuracy for Group Code {group_code} (Random Forest): {accuracy}")


Accuracy for Group Code A (Random Forest): 0.96875
Accuracy for Group Code M (Random Forest): 0.984375
Accuracy for Group Code P (Random Forest): 0.9937888198757764
Accuracy for Group Code S (Random Forest): 0.975
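
Accuracy on edge prediction is easier to interpret next to a trivial baseline. The sketch below computes the accuracy of always predicting the most frequent label. The label list is hypothetical and is not taken from the SupplyGraph data.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical labels: 19 non-edges and 1 edge
toy_labels = [0] * 19 + [1]
print(majority_baseline_accuracy(toy_labels))  # 0.95
```

With the notebook's variables, the same function could be applied to `list(y_test)` for each product group and compared with the reported accuracy.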


## Logistic Regression Approach
Next, we'll implement a Logistic Regression model to predict edges in the graph dataset.
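
With `node1` and `node2` as the only features, logistic regression models the edge probability as a sigmoid of a weighted sum of the two node IDs. The sketch below evaluates that formula in plain Python. The weights and bias are hypothetical values chosen for illustration, not fitted parameters.

```python
import math


def edge_probability(node1, node2, w1, w2, bias):
    """Logistic model: sigmoid(w1 * node1 + w2 * node2 + bias)."""
    z = w1 * node1 + w2 * node2 + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters chosen only for illustration
w1, w2, bias = 0.1, -0.1, 0.0
print(edge_probability(3, 3, w1, w2, bias))  # z = 0, so the probability is 0.5
```

Because the score is linear in the raw node IDs, the decision boundary is a straight line in the (`node1`, `node2`) plane.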



In [5]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import itertools

# Read the data
lg_data = pd.read_csv('Edges (Product Group).csv')

# Step 1: Create a DataFrame containing all combinations of node1 and node2 for each product category
lg_all_nodes = set(lg_data['node1']).union(set(lg_data['node2']))
lg_node_combinations = list(itertools.product(lg_all_nodes, repeat=2))

# Create an empty DataFrame to store the merged data
lg_merged_data = pd.DataFrame(columns=['node1', 'node2', 'GroupCode', 'has_edge'])

# Iterate over each unique product category
for lg_group_code in ['A', 'M', 'P', 'S']:
    # Create a DataFrame containing all combinations of nodes for the current product category
    lg_group_combinations = pd.DataFrame(lg_node_combinations, columns=['node1', 'node2'])

    # Rename the 'GroupCode' column in lg_data to avoid duplicate columns after merging
    lg_data_group = lg_data[lg_data['GroupCode'] == lg_group_code].rename(columns={'GroupCode': 'GroupCode_data'})

    # Merge with the original data to check for edges
    lg_merged_group_data = pd.merge(lg_group_combinations, lg_data_group, on=['node1', 'node2'], how='left')

    # Create a new label column based on the presence of an edge
    lg_merged_group_data['has_edge'] = 0

    # Drop the redundant columns
    lg_merged_group_data.drop(columns=['GroupCode_data'], inplace=True)
    lg_merged_group_data["GroupCode"] = lg_group_code
    # Concatenate with the main merged data
    lg_merged_data = pd.concat([lg_merged_data, lg_merged_group_data], ignore_index=True)


def check_edge_presence1(edge1, edge2, groupcode, data):
    """
    Check if an edge exists between two nodes for a given groupcode.

    Parameters:
    - edge1 (int): The first node of the edge.
    - edge2 (int): The second node of the edge.
    - groupcode (str): The groupcode to check.
    - data (DataFrame): The DataFrame containing edge data.

    Returns:
    - int: 1 if edge is found, 0 otherwise.
    """
    # Filter data for the specified groupcode and edge
    edge_exists = (data['GroupCode'] == groupcode) & ((data['node1'] == edge1) & (data['node2'] == edge2) | (data['node1'] == edge2) & (data['node2'] == edge1))

    # Return 1 if edge exists, 0 otherwise
    return int(edge_exists.any())


for i in range(len(lg_merged_data)):
    lg_merged_data.loc[i, 'has_edge'] = check_edge_presence1(lg_merged_data.loc[i, 'node1'], lg_merged_data.loc[i, 'node2'], lg_merged_data.loc[i, 'GroupCode'], lg_data)


lg_merged_data

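| | node1 | node2 | GroupCode | has_edge |
|---|---|---|---|---|
| 0 | 0 | 0 | A | 0 |
| 1 | 0 | 1 | A | 0 |
| 2 | 0 | 2 | A | 0 |
| 3 | 0 | 3 | A | 0 |
| 4 | 0 | 4 | A | 0 |
| ... | ... | ... | ... | ... |
| 6401 | 39 | 35 | S | 0 |
| 6402 | 39 | 36 | S | 0 |
| 6403 | 39 | 37 | S | 0 |
| 6404 | 39 | 38 | S | 1 |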


In [6]:
# Train logistic regression model for each product group
lg_models = {}
lg_accuracies = {}

# Iterate over each product group
for lg_group_code in ['A', 'M', 'P', 'S']:
    # Filter data for the current product group
    lg_group_data = lg_merged_data[lg_merged_data['GroupCode'] == lg_group_code]

    # Split the dataset into features (X) and the target variable (y)
    lg_X = lg_group_data[['node1', 'node2']]
    lg_y = lg_group_data['has_edge']

    # Convert target variable to integers
    lg_y = lg_y.astype(int)

    # Split the data into training and testing sets
    lg_X_train, lg_X_test, lg_y_train, lg_y_test = train_test_split(lg_X, lg_y, test_size=0.2, random_state=22810041)


    # Initialize and train the Logistic Regression model
    lg_model = LogisticRegression(random_state=22810041)
    lg_model.fit(lg_X_train, lg_y_train)

    # Make predictions on the testing set
    lg_y_pred = lg_model.predict(lg_X_test)

    # Evaluate the model's performance
    lg_accuracy = accuracy_score(lg_y_test, lg_y_pred)

    # Store the model and accuracy for the current product group
    lg_models[lg_group_code] = lg_model
    lg_accuracies[lg_group_code] = lg_accuracy


In [7]:
# Print accuracy scores for each product group with Logistic Regression
for lg_group_code, lg_accuracy in lg_accuracies.items():
    print(f"Accuracy for Group Code {lg_group_code} (Logistic Regression): {lg_accuracy}")


Accuracy for Group Code A (Logistic Regression): 0.965625
Accuracy for Group Code M (Logistic Regression): 0.96875
Accuracy for Group Code P (Logistic Regression): 0.9596273291925466
Accuracy for Group Code S (Logistic Regression): 0.884375


## Graph Neural Network (GNN) Approach
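Next, we'll implement a Graph Neural Network style model to predict edges in the graph dataset. As implemented below, the model learns an embedding for each node ID and classifies each node pair with dense layers; it does not propagate messages along edges.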
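
For reference, the core operation of a graph neural network layer is neighbourhood aggregation: each node updates its representation from its neighbours' features. Below is a minimal sketch of one mean-aggregation round in plain Python. The toy graph and features are hypothetical, and this is not the model trained in this notebook.

```python
def mean_aggregate(features, edges):
    """One round of mean neighbourhood aggregation on an undirected graph.

    features: dict mapping node -> list of floats
    edges: iterable of (node1, node2) pairs
    """
    # Build neighbour lists, including a self-loop for every node
    neighbours = {node: [node] for node in features}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Replace each node's features with the mean over its neighbourhood
    updated = {}
    for node, nbrs in neighbours.items():
        dim = len(features[node])
        updated[node] = [
            sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)
        ]
    return updated


# Hypothetical toy graph: a path 0 - 1 - 2 with one-dimensional features
toy_features = {0: [1.0], 1: [0.0], 2: [2.0]}
toy_edges = [(0, 1), (1, 2)]
print(mean_aggregate(toy_features, toy_edges))
```

Stacking several such rounds, each followed by a learned transformation, lets information travel across multiple hops of the graph.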



In [8]:
import itertools
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Read the data
gnn_data = pd.read_csv('Edges (Product Group).csv')

# Step 1: Create a DataFrame containing all combinations of node1 and node2 for each product category
gnn_all_nodes = set(gnn_data['node1']).union(set(gnn_data['node2']))
gnn_node_combinations = list(itertools.product(gnn_all_nodes, repeat=2))

# Create an empty DataFrame to store the merged data
gnn_merged_data = pd.DataFrame(columns=['node1', 'node2', 'GroupCode', 'has_edge'])

# Iterate over each unique product category
for gnn_group_code in ['A', 'M', 'P', 'S']:
    # Create a DataFrame containing all combinations of nodes for the current product category
    gnn_group_combinations = pd.DataFrame(gnn_node_combinations, columns=['node1', 'node2'])

    # Rename the 'GroupCode' column in gnn_data to avoid duplicate columns after merging
    gnn_data_group = gnn_data[gnn_data['GroupCode'] == gnn_group_code].rename(columns={'GroupCode': 'GroupCode_data'})

    # Merge with the original data to check for edges
    gnn_merged_group_data = pd.merge(gnn_group_combinations, gnn_data_group, on=['node1', 'node2'], how='left')

    # Create a new label column based on the presence of an edge
    gnn_merged_group_data['has_edge'] = 0

    # Drop the redundant columns
    gnn_merged_group_data.drop(columns=['GroupCode_data'], inplace=True)
    gnn_merged_group_data["GroupCode"] = gnn_group_code
    # Concatenate with the main merged data
    gnn_merged_data = pd.concat([gnn_merged_data, gnn_merged_group_data], ignore_index=True)


def check_edge_presence(edge1, edge2, groupcode, data):
    """
    Check if an edge exists between two nodes for a given groupcode.

    Parameters:
    - edge1 (int): The first node of the edge.
    - edge2 (int): The second node of the edge.
    - groupcode (str): The groupcode to check.
    - data (DataFrame): The DataFrame containing edge data.

    Returns:
    - int: 1 if edge is found, 0 otherwise.
    """
    # Filter data for the specified groupcode and edge
    edge_exists = (data['GroupCode'] == groupcode) & ((data['node1'] == edge1) & (data['node2'] == edge2) | (data['node1'] == edge2) & (data['node2'] == edge1))

    # Return 1 if edge exists, 0 otherwise
    return int(edge_exists.any())


for i in range(len(gnn_merged_data)):
    gnn_merged_data.loc[i, 'has_edge'] = check_edge_presence(gnn_merged_data.loc[i, 'node1'], gnn_merged_data.loc[i, 'node2'], gnn_merged_data.loc[i, 'GroupCode'], gnn_data)

gnn_merged_data


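| | node1 | node2 | GroupCode | has_edge |
|---|---|---|---|---|
| 0 | 0 | 0 | A | 0 |
| 1 | 0 | 1 | A | 0 |
| 2 | 0 | 2 | A | 0 |
| 3 | 0 | 3 | A | 0 |
| 4 | 0 | 4 | A | 0 |
| ... | ... | ... | ... | ... |
| 6401 | 39 | 35 | S | 0 |
| 6402 | 39 | 36 | S | 0 |
| 6403 | 39 | 37 | S | 0 |
| 6404 | 39 | 38 | S | 1 |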


In [9]:
# Step 4: Train a separate GNN model for each product group
gnn_models = {}  # Dictionary to store trained GNN models

# Dictionary to store accuracy scores
accuracy_scores = {}

for group_code in ['A', 'M', 'P', 'S']:
    print(f"Training GNN model for Group Code {group_code}...")

    # Filter data for the current product group
    group_data = gnn_merged_data[gnn_merged_data['GroupCode'] == group_code]

    # Split the data into features (X) and the target variable (y)
    group_X = group_data[['node1', 'node2']]
    group_y = group_data['has_edge']

    # Split the data into training and testing sets
    group_X_train, group_X_test, group_y_train, group_y_test = train_test_split(group_X, group_y, test_size=0.2, random_state=22810041)

    # Ensure that the data is numeric and has no missing values
    group_X_train = group_X_train.astype(float)
    group_y_train = group_y_train.astype(int)

    # Define and train the GNN model
    gnn_model = tf.keras.Sequential([
        tf.keras.layers.Embedding(input_dim=len(gnn_all_nodes)+1, output_dim=64, input_length=2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])

    gnn_model.compile(optimizer='adam',
                      loss='binary_crossentropy',
                      metrics=['accuracy'])

    gnn_model.fit(group_X_train, group_y_train, epochs=10, batch_size=32, validation_split=0.2, verbose=0)

    # Convert the test features to a float NumPy array, matching the training dtype
    group_X_test_array = group_X_test.values.astype(float)

    # Predict edge probabilities and threshold them at 0.5
    group_y_pred = gnn_model.predict(group_X_test_array)
    group_y_pred_binary = (group_y_pred > 0.5).astype(int).ravel()

    # Convert group_y_test to an integer NumPy array
    group_y_test_array = group_y_test.astype(int).values


    group_accuracy = accuracy_score(group_y_test_array, group_y_pred_binary)
    print(f"Accuracy for Group Code {group_code}: {group_accuracy}")
    print(" ")
    # Store the trained model in the dictionary
    gnn_models[group_code] = gnn_model

    # Save the accuracy score for the current product group
    accuracy_scores[group_code] = group_accuracy



Training GNN model for Group Code A...
Accuracy for Group Code A: 0.98125
 
Training GNN model for Group Code M...
Accuracy for Group Code M: 0.996875
 
Training GNN model for Group Code P...
Accuracy for Group Code P: 0.9937888198757764
 
Training GNN model for Group Code S...
Accuracy for Group Code S: 0.990625
 


In [10]:
# Print accuracy scores for each product group with GNN
for group_code, accuracy in accuracy_scores.items():
    print(f"Accuracy for Group Code {group_code} (GNN): {accuracy}")


Accuracy for Group Code A (GNN): 0.98125
Accuracy for Group Code M (GNN): 0.996875
Accuracy for Group Code P (GNN): 0.9937888198757764
Accuracy for Group Code S (GNN): 0.990625
