## Chapter 3: Convolutional GNNs: GCN and GraphSAGE

### Introduction
This notebook demonstrates Graph Convolutional Networks (GCNs) and GraphSAGE for graph representation learning, using the PyTorch Geometric library. To stay within the limits of resource-constrained environments, we work with a subgraph of only 10,000 nodes, which is less than 1% of the nodes in the OGBN-Products dataset. The notebook covers environment setup, data preparation, model definition, training, and evaluation.

### Section 1: Environment Setup

In this section, we ensure the environment is correctly configured with the necessary libraries and dependencies. We start by checking the CUDA version and PyTorch installation, followed by installing additional required libraries.
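As a small illustration of how the printed version strings map onto the wheel index URL used in the install command, the following sketch builds that URL from a `torch.__version__` string. The helper name `pyg_wheel_index` is hypothetical, and treating an untagged version as a CPU build is an assumption; the URL pattern itself is copied from the install cell in this notebook.

```python
def pyg_wheel_index(torch_version: str) -> str:
    """Build the PyG wheel index URL for a given torch.__version__ string."""
    # torch.__version__ may carry a local build tag, e.g. "2.3.0+cu121".
    base, _, build = torch_version.partition("+")
    build = build or "cpu"  # assumption: no tag means a CPU-only build
    return f"https://data.pyg.org/whl/torch-{base}+{build}.html"


print(pyg_wheel_index("2.3.0+cu121"))
```

If the versions printed in your environment differ from the ones shown here, the install command needs the matching URL.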

In [68]:
# Find the CUDA version PyTorch was installed with
!python -c "import torch; print(torch.version.cuda)"

12.1


In [69]:
# PyTorch version
!python -c "import torch; print(torch.__version__)"

2.3.0+cu121


In [None]:

%%capture
# Use the CUDA and PyTorch versions printed above to fill in the wheel index URL below
!pip install ogb pyg-lib torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-2.3.0+cu121.html
!pip install torch-geometric

In [None]:
# import gc

import torch
import torch.nn.functional as F
from torch.optim.lr_scheduler import ReduceLROnPlateau

import torch_geometric.transforms as T
from torch_geometric.loader import NeighborSampler
from torch_geometric.nn import SAGEConv, GCNConv
from torch_geometric import utils, loader
from torch_geometric.utils import subgraph

from torch_geometric.nn import (
    Aggregation,
    MaxAggregation,
    MeanAggregation,
    MultiAggregation,
    SoftmaxAggregation,
    StdAggregation,
    SumAggregation,
    VarAggregation,
    LSTMAggregation
)

# Import the OGB dataset loader and evaluator
from ogb.nodeproppred import PygNodePropPredDataset, Evaluator

from pandas.core.common import flatten
# import seaborn as sns
import matplotlib.pyplot as plt
# sns.set(rc={'figure.figsize':(16.7,8.27)})
# sns.set_theme(style="ticks")
from collections import Counter


from tqdm import tqdm
import collections
import os.path as osp
import pandas as pd
import numpy as np
# from pandas.core.common import flatten
# from scipy.special import softmax

import random

# Setting the seed
random.seed(42)
np.random.seed(42)
torch.manual_seed(42)
torch.cuda.manual_seed(42)
# torch.backends.cudnn.deterministic = True


### Section 2: Loading the Graph Data and Creating a Subgraph
In this section, we load the OGBN-Products dataset and create a subgraph consisting of 10,000 nodes. This subset of the full dataset allows us to perform experiments in environments with limited computational resources while still demonstrating the effectiveness of graph convolutional networks (GCNs) and GraphSAGE.

#### Step 1: Load the OGBN-Products Dataset
First, we load the OGBN-Products dataset through OGB's PygNodePropPredDataset loader, which returns a PyTorch Geometric data object. The dataset contains product information from Amazon, where nodes represent products and edges represent co-purchasing relationships.


#### Step 2: Create a Subgraph
To create a subgraph with 10,000 nodes, we take the first 10,000 nodes by index and keep only the edges whose endpoints both fall in that set. A random sample of node indices would work the same way. This subset is used for training and evaluating our models.
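To make the idea of an induced subgraph with relabeled nodes concrete, here is a library-free sketch. The function name `induced_subgraph` and the toy edge list are hypothetical; it mirrors the idea behind extracting a subgraph with relabeling, not PyTorch Geometric's implementation.

```python
def induced_subgraph(node_ids, edges):
    """Keep edges whose endpoints are both in node_ids and relabel nodes to 0..k-1."""
    # Map each kept original id to its position in node_ids.
    new_id = {old: new for new, old in enumerate(node_ids)}
    return [(new_id[u], new_id[v]) for u, v in edges if u in new_id and v in new_id]


edges = [(0, 1), (1, 2), (2, 5), (5, 0), (3, 4)]
print(induced_subgraph([0, 2, 5], edges))  # [(1, 2), (2, 0)]
```

Relabeling matters because node features and labels are sliced to the kept nodes, so edge endpoints must index into those smaller arrays.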


#### Step 3: Process Mappings for Labels and Product IDs
Next, we load two mappings shipped with the dataset: label index to product category, and node index to product ID (ASIN). These mappings help us interpret the results and visualize the data.


#### Step 4: Verify the Subgraph
Finally, we verify that the subgraph has been created correctly by checking the number of nodes and edges.


By completing these steps, we have successfully loaded the OGBN-Products dataset and created a subgraph with 10,000 nodes. This subgraph will be used in subsequent sections to train and evaluate our GCN and GraphSAGE models.

In [None]:
opt = 'GCN'

In [None]:
# Load the OGB evaluator for the dataset
evaluator = Evaluator(name='ogbn-products')

# Establish the device for model training 'cuda' if GPU, 'cpu' otherwise
device = 'cuda' if torch.cuda.is_available() else 'cpu'
device = torch.device(device)

# Confirm the device. If it's a GPU, 'cuda' will print
print('Device: {}'.format(device))

In [None]:
# root = osp.join(osp.dirname(osp.realpath('./')), 'data', 'products')
# Download (if needed) and load the OGB dataset
root = osp.join(osp.dirname(osp.realpath('./')), 'content')
if opt == 'GCN':
  dataset = PygNodePropPredDataset( name='ogbn-products', root=root)
else:
  dataset = PygNodePropPredDataset( name='ogbn-products', transform=T.ToSparseTensor(), root=root)

In [None]:
data = dataset[0]


In [None]:
# Define the indices of nodes to include in your subset
subset_indices = torch.arange(0, 10000)  # For example, first 10,000 nodes


# Extract the subgraph corresponding to the subset
subset_edge_index, edge_attr, edge_mask = subgraph(subset_indices, data.edge_index, None, relabel_nodes=True, num_nodes=data.num_nodes, return_edge_mask=True)


# Adjust node features and labels for the subset
subset_features = data.x[subset_indices]
subset_labels = data.y[subset_indices]

# Create a new graph object for the subset
subset_graph = data.__class__()
subset_graph.edge_index = subset_edge_index
subset_graph.x = subset_features
subset_graph.y = subset_labels

In [None]:
# Load the mapping from label indices to product categories
path_to_file = '/content/ogbn_products/mapping/labelidx2productcategory.csv.gz'  # Adjust the path as needed
df = pd.read_csv(path_to_file)

# Create a dictionary mapping label indices to product categories
index_product_dict = dict(zip(df['label idx'], df['product category']))

In [None]:
# Load the mapping from node indices to product IDs (ASINs)
path_to_asin_file = '/content/ogbn_products/mapping/nodeidx2asin.csv.gz'  # Adjust the path as needed
asin_df = pd.read_csv(path_to_asin_file)

# Create a dictionary mapping node indices to ASINs
node_asin_dict = dict(zip(asin_df['node idx'], asin_df['asin']))

In [None]:
# Verify the subgraph
print(f"Number of nodes in the subgraph: {subset_graph.num_nodes}")
print(f"Number of edges in the subgraph: {subset_graph.num_edges}")

In [None]:
subset_graph

### Section 3: Model Definition, Training, and Evaluation
In this section, we define and train our first GCN and GraphSAGE models. We will define the architecture of both models, set up the training process, and evaluate their performance. Additionally, we will use category and product/ASIN information to evaluate the accuracy of the category predictions and to generate product recommendations using similarity.

#### Step 1: Define the GCN Model
We start by defining the GCN model. The model consists of two graph convolutional layers (GCNConv) with a ReLU activation between them. The second layer outputs one score per class, and a log softmax turns these scores into log-probabilities. With return_embeds=True, the forward pass returns the raw second-layer output instead.
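For intuition about what a graph convolution does to node features, the sketch below performs one GCN-style propagation step in plain Python: self-loops are added and each message is scaled by symmetric degree normalization. It deliberately omits the learned weight matrix and bias, and the name `gcn_propagate` and the toy path graph are hypothetical. It is an illustration of the textbook formulation, not the internals of GCNConv.

```python
import math


def gcn_propagate(features, edges, num_nodes):
    """One propagation step: D^{-1/2} (A + I) D^{-1/2} X, without learned weights."""
    # Build an undirected neighbor set per node, including a self-loop.
    neighbors = {i: {i} for i in range(num_nodes)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    degree = {i: len(neighbors[i]) for i in range(num_nodes)}

    dim = len(features[0])
    out = [[0.0] * dim for _ in range(num_nodes)]
    for i in range(num_nodes):
        for j in neighbors[i]:
            # Each message is scaled by 1 / sqrt(deg(i) * deg(j)).
            w = 1.0 / math.sqrt(degree[i] * degree[j])
            for d in range(dim):
                out[i][d] += w * features[j][d]
    return out


# Path graph 0 - 1 - 2 with one scalar feature per node.
x = [[1.0], [2.0], [3.0]]
h = gcn_propagate(x, [(0, 1), (1, 2)], num_nodes=3)
print([round(row[0], 4) for row in h])
```

Stacking two such steps, as the two-layer model does, lets each node see information from neighbors up to two hops away.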

#### Step 2: Define the GraphSAGE Model
Next, we define the GraphSAGE model. It mirrors the GCN architecture but uses SAGEConv layers: two layers with a ReLU activation between them, followed by a log softmax over the class scores.
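The following sketch shows the shape of a GraphSAGE-style update with mean aggregation: a node's own feature and the mean of its neighbors' features are transformed separately and then added. Scalar weights stand in for the learned weight matrices, and the names `sage_layer`, `w_root`, and `w_neigh` are hypothetical.

```python
def sage_layer(features, neighbors, w_root, w_neigh):
    """Scalar-feature update: w_root * x_i + w_neigh * mean of neighbor features."""
    out = []
    for i, x_i in enumerate(features):
        nbrs = neighbors.get(i, [])
        # Assumption: nodes without neighbors contribute a zero neighbor term.
        mean_nbr = sum(features[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        out.append(w_root * x_i + w_neigh * mean_nbr)
    return out


features = [1.0, 2.0, 4.0]
neighbors = {0: [1, 2], 1: [0], 2: [0]}
print(sage_layer(features, neighbors, w_root=1.0, w_neigh=0.5))  # [2.5, 2.5, 4.5]
```

Keeping a separate weight for the node itself is the main structural difference from the GCN sketch, where the node is folded into its neighborhood through a self-loop.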


#### Step 3: Define the Training and Evaluation Process
We set up an optimizer, a loss function, and a training loop for both models. Each training step runs a forward pass, computes the loss on the training nodes, backpropagates, and updates the parameters. A separate function computes the loss on the validation nodes, and we keep the parameters from the epoch with the lowest validation loss.
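One detail worth checking: the models in this notebook return log-softmax outputs, and the loss used is cross-entropy, which applies a log softmax internally. The library-free sketch below suggests why this is harmless: applying log softmax to values that are already log-probabilities leaves them unchanged. The helper names `log_softmax` and `nll` are hypothetical stand-ins, not the PyTorch functions.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


def nll(log_probs, target):
    """Negative log-likelihood of the target class."""
    return -log_probs[target]


scores = [2.0, 0.5, -1.0]
once = log_softmax(scores)
twice = log_softmax(once)

# Log-probabilities already sum to one after exponentiation, so a second pass is a no-op.
print(all(abs(a - b) < 1e-12 for a, b in zip(once, twice)))  # True
print(round(nll(once, target=0), 4))
```

The extra log softmax costs a little computation but does not change the loss value or the gradients' direction.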

#### Step 4: Evaluate the Models
After training, we evaluate the GCN and GraphSAGE models on held-out nodes. We calculate the weighted F1 score and the log loss to assess the quality of the predictions. The example call below scores the validation split; passing the test mask instead gives the final test metrics.
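To show what a weighted F1 score measures, here is a small plain-Python sketch: F1 is computed per class and then averaged with weights equal to each class's number of true examples. The function name `weighted_f1` and the toy labels are hypothetical; the notebook itself relies on scikit-learn for this metric.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's true count."""
    support = Counter(y_true)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n_true
    return total / len(y_true)


y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 0]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.6
```

Because the weights follow class frequency, large categories dominate the score, which is worth remembering on a dataset with very uneven category sizes.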


#### Step 5: Visualize the Training and Validation Loss
Next, we plot the training and validation loss curves for both models to visualize their learning progress over epochs.


#### Step 6: Use Category and Product/ASIN Information
We use the category and product/ASIN information to further analyze the models and to generate recommendations:

Analyze Category Predictions:

For each product category, we compare the true labels with the predicted labels. This shows how often the model is right for that category and which other category it most often predicts by mistake.

Generate Product Recommendations:

Using the embeddings produced by the models (here, the output of the final convolutional layer), we compute the cosine similarity between products. The recommendations for a given product are the products whose embeddings are most similar to it.
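The recommendation step can be sketched without any libraries as follows. A product is always maximally similar to itself, so this version excludes the query from its own results. The function name `top_k_similar` and the toy embeddings are hypothetical.

```python
import math


def top_k_similar(embeddings, query_idx, k):
    """Indices of the k rows most cosine-similar to the query row, excluding itself."""

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    query = embeddings[query_idx]
    scores = [(cosine(query, e), i) for i, e in enumerate(embeddings) if i != query_idx]
    scores.sort(key=lambda s: -s[0])  # highest similarity first
    return [i for _, i in scores[:k]]


embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(top_k_similar(embeddings, query_idx=0, k=2))  # [1, 2]
```

Cosine similarity compares direction only, so two products with embeddings of very different magnitude can still rank as close neighbors.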



In [None]:

class GCN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super(GCN, self).__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)

    def forward(self, x, edge_index, return_embeds=False):
        x = self.conv1(x, edge_index)
        x = torch.relu(x)
        x = self.conv2(x, edge_index)
        if return_embeds:
            return x
        return torch.log_softmax(x, dim=1)



gcn_model = GCN(in_channels=dataset.num_features, hidden_channels=64, out_channels=dataset.num_classes)



class GraphSAGE(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super(GraphSAGE, self).__init__()
        self.conv1 = SAGEConv(in_channels, hidden_channels)
        self.conv2 = SAGEConv(hidden_channels, out_channels)

    def forward(self, x, edge_index, return_embeds=False):
        x = self.conv1(x, edge_index)
        x = torch.relu(x)
        x = self.conv2(x, edge_index)
        if return_embeds:
            return x
        return torch.log_softmax(x, dim=1)

graphsage_model = GraphSAGE(in_channels=dataset.num_features, hidden_channels=64, out_channels=dataset.num_classes)


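# Preliminary optimizer, loss, and training step for the GCN model.
# Note: this version expects `data` to carry a boolean `train_mask`, which the
# OGB data object does not provide by default. It is redefined further below
# to take the model, optimizer, and subgraph as arguments.
optimizer = torch.optim.Adam(gcn_model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()

def train():
    gcn_model.train()
    optimizer.zero_grad()
    out = gcn_model(data.x, data.edge_index)
    loss = criterion(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    return loss.item()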



In [None]:
subset_graph.num_features = subset_graph.x.size(1)


# Ensure 'y' is a tensor of class labels
subset_graph.y = subset_graph.y.squeeze()  # Remove any extra dimensions

# If 'y' contains integer class labels
if subset_graph.y.dim() == 1:
    num_classes = int(subset_graph.y.max().item()) + 1

# If 'y' contains one-hot encoded labels
elif subset_graph.y.dim() == 2:
    num_classes = subset_graph.y.size(1)

subset_graph.num_classes = num_classes

In [None]:
num_classes, subset_graph.num_features

In [None]:

def generate_split_masks(num_nodes, train_frac=0.6, val_frac=0.2, test_frac=0.2):
    """
    Generates masks for train, validation, and test splits.

    Args:
        num_nodes (int): The number of nodes in the dataset.
        train_frac (float): The fraction of nodes to include in the training set.
        val_frac (float): The fraction of nodes to include in the validation set.
        test_frac (float): The fraction of nodes to include in the test set.

    Returns:
        A dictionary containing boolean masks for training, validation, and test splits.
    """
    # Ensure fractions sum to 1 (with a tolerance for floating-point rounding)
    assert abs(train_frac + val_frac + test_frac - 1.0) < 1e-8, "Fractions must sum up to 1."

    # Generate a random permutation of node indices
    indices = torch.randperm(num_nodes)

    # Determine split sizes
    train_size = int(num_nodes * train_frac)
    val_size = int(num_nodes * val_frac)

    # Create masks
    train_mask = torch.zeros(num_nodes, dtype=torch.bool)
    val_mask = torch.zeros(num_nodes, dtype=torch.bool)
    test_mask = torch.zeros(num_nodes, dtype=torch.bool)

    # Assign masks based on split sizes
    train_mask[indices[:train_size]] = True
    val_mask[indices[train_size:train_size + val_size]] = True
    test_mask[indices[train_size + val_size:]] = True

    return {'train_mask': train_mask, 'val_mask': val_mask, 'test_mask': test_mask}

# Usage example with a subset graph
subset_masks = generate_split_masks(num_nodes=subset_graph.num_nodes)

# Applying the masks to the subset graph
subset_graph.train_mask = subset_masks['train_mask']
subset_graph.val_mask = subset_masks['val_mask']
subset_graph.test_mask = subset_masks['test_mask']


In [None]:
# Update the in_channels and out_channels according to your subset_graph features and classes
gcn_model = GCN(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=subset_graph.num_classes)
graphsage_model = GraphSAGE(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=subset_graph.num_classes)

optimizer_gcn = torch.optim.Adam(gcn_model.parameters(), lr=0.01)
optimizer_sage = torch.optim.Adam(graphsage_model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()

# Initialize variables to track the best model
best_val_loss_gcn = float('inf')
best_model_state_gcn = None

best_val_loss_sage = float('inf')
best_model_state_sage = None

def train(model, optimizer, data):
    model.train()
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = criterion(out[data.train_mask], data.y[data.train_mask].squeeze())  # Ensure the target tensor is of the correct shape
    loss.backward()
    optimizer.step()
    return loss.item()


In [None]:


# Initialize lists to store the training and validation loss values for each epoch
train_loss_gcn = []
val_loss_gcn = []
train_loss_sage = []
val_loss_sage = []

def validate(model, data):
    model.eval()
    with torch.no_grad():
        out = model(data.x, data.edge_index)
        val_loss = criterion(out[data.val_mask], data.y[data.val_mask].squeeze())  # Make sure the target tensor is of the correct shape
    return val_loss.item()

for epoch in range(200):
    loss_gcn = train(gcn_model, optimizer_gcn, subset_graph)
    train_loss_gcn.append(loss_gcn)
    current_val_loss_gcn = validate(gcn_model, subset_graph)
    val_loss_gcn.append(current_val_loss_gcn)

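    # Check if the current validation loss is the best we've seen; if so, snapshot the weights.
    # state_dict() returns references to the live tensors, so clone them to freeze this epoch's values.
    if current_val_loss_gcn < best_val_loss_gcn:
        best_val_loss_gcn = current_val_loss_gcn
        best_model_state_gcn = {k: v.detach().clone() for k, v in gcn_model.state_dict().items()}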

    loss_sage = train(graphsage_model, optimizer_sage, subset_graph)
    train_loss_sage.append(loss_sage)
    current_val_loss_sage = validate(graphsage_model, subset_graph)
    val_loss_sage.append(current_val_loss_sage)

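    # Similarly for GraphSAGE: clone the tensors so later updates do not overwrite the snapshot
    if current_val_loss_sage < best_val_loss_sage:
        best_val_loss_sage = current_val_loss_sage
        best_model_state_sage = {k: v.detach().clone() for k, v in graphsage_model.state_dict().items()}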


    if epoch % 10 == 0:
        print(f'Epoch {epoch}, GCN Loss: {loss_gcn:.4f}, GraphSAGE Loss: {loss_sage:.4f}, GCN Val Loss: {val_loss_gcn[-1]:.4f}, GraphSAGE Val Loss: {val_loss_sage[-1]:.4f}')


# Optionally, load the best model state back into the model
gcn_model.load_state_dict(best_model_state_gcn)
graphsage_model.load_state_dict(best_model_state_sage)


def test_model(model, data):
    model.eval()  # Set the model to evaluation mode
    model_device = next(model.parameters()).device  # Run on whichever device the model lives on
    with torch.no_grad():
        out = model(data.x.to(model_device), data.edge_index.to(model_device))  # Forward pass
        preds = torch.argmax(out, dim=1)[data.test_mask.to(model_device)]  # Keep only the test-node predictions
        true_labels = data.y[data.test_mask]  # True labels for the test nodes
    return preds.cpu().numpy(), true_labels.cpu().numpy()

# Plotting the training and validation loss curves for GCN
plt.figure(figsize=(10, 6))
plt.plot(train_loss_gcn, label='GCN Training Loss')
plt.plot(val_loss_gcn, label='GCN Validation Loss', linestyle='--')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('GCN Training and Validation Loss Curves')
plt.legend()
plt.show()

# Plotting the training and validation loss curves for GraphSAGE
plt.figure(figsize=(10, 6))
plt.plot(train_loss_sage, label='GraphSAGE Training Loss')
plt.plot(val_loss_sage, label='GraphSAGE Validation Loss', linestyle='--')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('GraphSAGE Training and Validation Loss Curves')
plt.legend()
plt.show()


In [None]:
import torch
import torch.nn.functional as F
from sklearn.metrics import f1_score

def evaluate(model, data, mask):
    model.eval()  # Set the model to evaluation mode
    with torch.no_grad():
        logits = model(data.x, data.edge_index)  # Forward pass
        preds = logits.argmax(dim=1)  # Get the predicted classes

        # Calculate F1 score
        f1 = f1_score(data.y[mask].cpu().numpy(), preds[mask].cpu().numpy(), average='weighted')

        # Calculate log loss with NLLLoss; the model output is already log-softmaxed, so log_softmax leaves it unchanged
        log_probs = F.log_softmax(logits[mask], dim=1)
        loss_function = torch.nn.NLLLoss()
        logloss = loss_function(log_probs, data.y[mask])

    return f1, logloss.item()

# Example usage
f1_gcn, logloss_gcn = evaluate(gcn_model, subset_graph, subset_graph.val_mask)  # Evaluate GCN on validation set
f1_sage, logloss_sage = evaluate(graphsage_model, subset_graph, subset_graph.val_mask)  # Evaluate GraphSAGE on validation set

print(f"GCN F1 Score: {f1_gcn:.4f}, Log Loss: {logloss_gcn:.4f}")
print(f"GraphSAGE F1 Score: {f1_sage:.4f}, Log Loss: {logloss_sage:.4f}")


In [None]:
def analyze_category_predictions(true_labels, pred_labels):
    analysis_results = {}
    for category in set(true_labels):
        indices = [i for i, label in enumerate(true_labels) if label == category]

        if not indices:  # Skip categories not present in the subset
            continue

        category_preds = [pred_labels[i] for i in indices]
        correct_preds_count = category_preds.count(category)

        mispredictions = [pred for pred in category_preds if pred != category]
        most_common_misprediction, most_common_misprediction_count = Counter(mispredictions).most_common(1)[0] if mispredictions else ("None", 0)

        analysis_results[category] = {
            'Correct Prediction Percentage': correct_preds_count / len(indices) * 100 if indices else 0,
            'Correct Prediction Count': correct_preds_count,
            'Most Common Misprediction': most_common_misprediction,
            'Misprediction Percentage': most_common_misprediction_count / len(indices) * 100 if mispredictions and indices else 0,
            'Misprediction Count': most_common_misprediction_count,
            'Total Count': len(indices)
        }

    return analysis_results

all_true_labels = [index_product_dict[label.item()] for label in subset_graph.y]

with torch.no_grad():
    gcn_all_logits = gcn_model(subset_graph.x, subset_graph.edge_index)
    gcn_all_preds = torch.argmax(gcn_all_logits, dim=1)
    gcn_all_preds = [index_product_dict[pred.item()] for pred in gcn_all_preds]

    sage_all_logits = graphsage_model(subset_graph.x, subset_graph.edge_index)
    sage_all_preds = torch.argmax(sage_all_logits, dim=1)
    sage_all_preds = [index_product_dict[pred.item()] for pred in sage_all_preds]

gcn_analysis_all = analyze_category_predictions(all_true_labels, gcn_all_preds)

# Example: Printing analysis results for the GCN model for all nodes
for category, stats in gcn_analysis_all.items():
    print(f"Category: {category}")
    print(f"GCN Correct Predictions: {stats['Correct Prediction Count']} / {stats['Total Count']} ({stats['Correct Prediction Percentage']:.2f}%)")
    print(f"GCN Most Common Misprediction: {stats['Most Common Misprediction']} - {stats['Misprediction Count']} / {stats['Total Count']} ({stats['Misprediction Percentage']:.2f}%)")
    print("-" * 50)


In [None]:
import pandas as pd

# Convert the analysis results into a list of dictionaries for DataFrame construction
# (named gcn_data so it does not overwrite the `data` graph object loaded earlier)
gcn_data = []
for category, stats in gcn_analysis_all.items():
    gcn_data.append({
        'Category': category,
        'GCN Correct Predictions (%)': f"{stats['Correct Prediction Percentage']:.2f}% ({stats['Correct Prediction Count']}/{stats['Total Count']})",
        'GCN Most Common Misprediction': f"{stats['Most Common Misprediction']} ({stats['Misprediction Percentage']:.2f}% of {stats['Total Count']})",
        'Total Count': stats['Total Count']  # Adding the total count
    })

# Create a DataFrame from the list of dictionaries
df = pd.DataFrame(gcn_data)

# Sort the DataFrame by the 'Total Count' column in descending order
df.sort_values('Total Count', ascending=False, inplace=True)

# Optional: drop the 'Total Count' column if you don't want to display it
# df = df.drop(columns=['Total Count'])

# Display the DataFrame
print(df)
df

In [None]:
import pandas as pd

# Analyze the predictions made by the GraphSAGE model
sage_analysis_all = analyze_category_predictions(all_true_labels, sage_all_preds)

# Convert the analysis results into a list of dictionaries for DataFrame construction
sage_data = []
for category, stats in sage_analysis_all.items():
    sage_data.append({
        'Category': category,
        'GraphSAGE Correct Predictions (%)': f"{stats['Correct Prediction Percentage']:.2f}% ({stats['Correct Prediction Count']}/{stats['Total Count']})",
        'GraphSAGE Most Common Misprediction': f"{stats['Most Common Misprediction']} ({stats['Misprediction Percentage']:.2f}% of {stats['Total Count']})",
        'Total Count': stats['Total Count']  # Adding the total count
    })

# Create a DataFrame from the list of dictionaries
sage_df = pd.DataFrame(sage_data)

# Sort the DataFrame by the 'Total Count' column in descending order
sage_df.sort_values('Total Count', ascending=False, inplace=True)

# Display the DataFrame
print(sage_df)
sage_df

In [None]:
pd.concat([df, sage_df.drop(columns=['Category'])], axis=1).drop(columns=['Total Count'])


# df.columns = ['Product Category'] + df.columns[1:].tolist()


In [None]:
# Switch the trained models to evaluation mode before extracting embeddings
gcn_model.eval()
graphsage_model.eval()

with torch.no_grad():
    gcn_embeddings = gcn_model(subset_graph.x, subset_graph.edge_index, return_embeds=True)
    graphsage_embeddings = graphsage_model(subset_graph.x, subset_graph.edge_index, return_embeds=True)


In [None]:
from sklearn.metrics.pairwise import cosine_similarity

# Compute cosine similarity matrices
gcn_similarity_matrix = cosine_similarity(gcn_embeddings.cpu().numpy())
graphsage_similarity_matrix = cosine_similarity(graphsage_embeddings.cpu().numpy())

# Example usage: find the top-k most similar items to a given product node
product_idx = 123  # Replace with the index of the product node
top_k = 6  # The query product is usually its own nearest neighbor, so this typically yields 5 other products

# For GCN embeddings
top_k_similar_indices_gcn = np.argsort(-gcn_similarity_matrix[product_idx])[:top_k]
print(f"Top {top_k} similar products to product {product_idx} according to GCN:")
print(top_k_similar_indices_gcn)

# For GraphSAGE embeddings
top_k_similar_indices_sage = np.argsort(-graphsage_similarity_matrix[product_idx])[:top_k]
print(f"Top {top_k} similar products to product {product_idx} according to GraphSAGE:")
print(top_k_similar_indices_sage)


In [None]:
node_asin_dict

In [None]:
list(top_k_similar_indices_gcn)

In [None]:
[ node_asin_dict[x] for x in list(top_k_similar_indices_gcn) ]





6" Handheld Brass Telescope with Wooden Box - Pirate Navigation



In [None]:
# Funko POP Television: Adventure Time Marceline Vinyl Figure


# Adventure Time 5" Finn with Accessories

# Funko My Little Pony: DJ Pon-3 Vinyl Figure

# My Little Pony: Twilight Sparkle


# Plastic Gold Coins 288ct With 24 Pirate Themed tatoos

# Handheld Brass Telescope with Wooden Box - Pirate Navigation

In [None]:
list(top_k_similar_indices_sage)

In [None]:
[ node_asin_dict[x] for x in list(top_k_similar_indices_sage) ]

### Section 4: Experimenting with Aggregations
In this section, we experiment with different aggregation functions within the GraphSAGE model to understand their impact on model performance. Aggregation functions determine how node features are combined from neighboring nodes, which can significantly affect the learning process. We will test mean, sum, and max aggregation functions and evaluate their effects on training and validation performance.
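The difference between the three aggregation functions can be seen on a toy example. In the sketch below, two nodes have neighbors with the same mix of feature values, but one has twice as many neighbors. The function name `aggregate` is hypothetical, and returning zero for a node with no neighbors is an assumption.

```python
def aggregate(values, mode):
    """Combine a list of neighbor feature values with the given reduction."""
    if not values:
        return 0.0  # assumption: isolated nodes receive a zero message
    if mode == "mean":
        return sum(values) / len(values)
    if mode == "sum":
        return sum(values)
    if mode == "max":
        return max(values)
    raise ValueError(f"unknown aggregation: {mode}")


few = [1.0, 3.0]
many = [1.0, 3.0, 1.0, 3.0]  # same value mix, twice the neighbors
for mode in ("mean", "sum", "max"):
    print(mode, aggregate(few, mode), aggregate(many, mode))
```

Mean and max give the same result for both nodes, while sum grows with the number of neighbors, so only sum carries information about node degree.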

#### Step 1: Define the GraphSAGE Model with Custom Aggregation
We modify the GraphSAGE model to accept an aggregation function as a parameter. This allows us to easily switch between different aggregation strategies and observe their effects on the model's performance.

#### Step 2: Train and Evaluate the Model with Different Aggregations
We define a function to train and evaluate the GraphSAGE model with a specified aggregation function. This function trains the model, computes the loss, and evaluates the validation accuracy at regular intervals. By comparing these metrics, we can determine the effectiveness of each aggregation strategy.
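Validation accuracy is simply the share of correct predictions among the nodes selected by the validation mask. The plain-Python sketch below illustrates the computation; the name `masked_accuracy` and the toy data are hypothetical. In a real training loop, the predictions passed in should come from a fresh forward pass in evaluation mode, after the optimizer step.

```python
def masked_accuracy(preds, labels, mask):
    """Fraction of correct predictions among the positions where mask is True."""
    pairs = [(p, y) for p, y, m in zip(preds, labels, mask) if m]
    if not pairs:
        return 0.0  # assumption: an empty split scores zero rather than raising
    return sum(1 for p, y in pairs if p == y) / len(pairs)


preds = [0, 1, 1, 2, 0]
labels = [0, 1, 2, 2, 1]
val_mask = [False, True, True, True, False]
print(masked_accuracy(preds, labels, val_mask))
```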

#### Step 3: Training Curve Comparison
To visualize the impact of different aggregation functions, we plot the training and validation loss curves for each aggregation strategy (mean, sum, max). This helps us understand how each aggregation function influences the convergence and performance of the model over time.

#### Step 4: Analyze Category Predictions
We further analyze the model's predictions by comparing the predicted labels with the true labels for each category. This shows how effectively the model classifies products into their respective categories.

In [None]:
# Section 4: aggregation experiments

In [None]:
from torch_geometric.nn import SAGEConv

class GraphSAGE(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, agg_func='mean'):
        super(GraphSAGE, self).__init__()
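        self.agg_func = agg_func
        # Pass the aggregation by keyword so it cannot be mistaken for another positional argument
        self.conv1 = SAGEConv(in_channels, hidden_channels, aggr=self.agg_func)
        self.conv2 = SAGEConv(hidden_channels, out_channels, aggr=self.agg_func)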

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = torch.relu(x)
        x = self.conv2(x, edge_index)
        return torch.log_softmax(x, dim=1)


In [None]:
def train_evaluate(agg_func):
    model = GraphSAGE(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes, agg_func=agg_func).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    criterion = torch.nn.CrossEntropyLoss()

    # Move the tensors to the device once instead of on every epoch
    x = subset_graph.x.to(device)
    edge_index = subset_graph.edge_index.to(device)
    y = subset_graph.y.squeeze().to(device)
    train_mask = subset_graph.train_mask.to(device)
    val_mask = subset_graph.val_mask.to(device)

    acc = 0.0
    for epoch in range(100):  # You can adjust the number of epochs
        model.train()
        optimizer.zero_grad()
        out = model(x, edge_index)
        loss = criterion(out[train_mask], y[train_mask])
        loss.backward()
        optimizer.step()

        if epoch % 10 == 0:
            model.eval()
            with torch.no_grad():
                # Run a fresh forward pass so the accuracy reflects the updated weights
                preds = model(x, edge_index).argmax(dim=1)
                correct = (preds[val_mask] == y[val_mask]).sum()
                acc = int(correct) / int(val_mask.sum())
                print(f'Epoch: {epoch}, Loss: {loss.item():.4f}, Val Acc: {acc:.4f}')

    return acc  # Validation accuracy from the last evaluated epoch

# Aggregation functions to test
aggregations = ['mean', 'sum', 'max']

# Evaluate models with different aggregation functions
for agg in aggregations:
    print(f'\nTesting with {agg} aggregation:')
    acc = train_evaluate(agg)
    print(f'Validation Accuracy with {agg} aggregation: {acc:.4f}\n')


In [None]:
# Training curve comparison

In [None]:
import matplotlib.pyplot as plt

def train_model(model, data, optimizer, criterion):
    model.train()  # Set the model to training mode
    optimizer.zero_grad()  # Clear gradients

    out = model(data.x.to(device), data.edge_index.to(device))  # Forward pass
    loss = criterion(out[data.train_mask], data.y[data.train_mask].squeeze().to(device))  # Calculate loss only on the training mask
    loss.backward()  # Backpropagate the loss
    optimizer.step()  # Update model parameters

    return loss.item()  # Return the loss value

# Assuming GraphSAGE model is defined with an 'agg_func' parameter to specify the aggregation function
class GraphSAGE(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, agg_func='mean'):
        super(GraphSAGE, self).__init__()
        self.conv1 = SAGEConv(in_channels, hidden_channels, aggr=agg_func)  # First GraphSAGE layer with specified aggregation
        self.conv2 = SAGEConv(hidden_channels, out_channels, aggr=agg_func)  # Second GraphSAGE layer with specified aggregation

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = self.conv2(x, edge_index)
        return F.log_softmax(x, dim=1)

# Create models with different aggregation functions
model_mean = GraphSAGE(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes, agg_func='mean').to(device)
model_sum = GraphSAGE(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes, agg_func='add').to(device)  # Use 'add' for sum aggregation
model_max = GraphSAGE(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes, agg_func='max').to(device)

# Create separate optimizers for each model
optimizer_mean = torch.optim.Adam(model_mean.parameters(), lr=0.01)
optimizer_sum = torch.optim.Adam(model_sum.parameters(), lr=0.01)
optimizer_max = torch.optim.Adam(model_max.parameters(), lr=0.01)

criterion = torch.nn.CrossEntropyLoss()

# Training loops for different aggregation strategies
losses_mean_agg, losses_sum_agg, losses_max_agg = [], [], []

for epoch in range(1000):
    loss_mean = train_model(model_mean, subset_graph, optimizer_mean, criterion)
    losses_mean_agg.append(loss_mean)

    loss_sum = train_model(model_sum, subset_graph, optimizer_sum, criterion)
    losses_sum_agg.append(loss_sum)

    loss_max = train_model(model_max, subset_graph, optimizer_max, criterion)
    losses_max_agg.append(loss_max)

    if epoch % 10 == 0:
        print(f"Epoch {epoch}, Mean Agg Loss: {loss_mean:.4f}, Sum Agg Loss: {loss_sum:.4f}, Max Agg Loss: {loss_max:.4f}")

# Plotting
plt.figure(figsize=(10, 6))
plt.plot(losses_mean_agg, label='Mean Aggregation')
plt.plot(losses_sum_agg, label='Sum Aggregation')
plt.plot(losses_max_agg, label='Max Aggregation')
plt.xlabel('Epoch')
plt.ylabel('Training Loss')
plt.title('Training Loss Curves for Different Aggregation Strategies')
plt.legend()
plt.show()


In [None]:
import matplotlib.pyplot as plt

import torch
from torch_geometric.data import Data
from torch_geometric.utils import sort_edge_index, subgraph



class GraphSAGE(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, agg_func='mean'):
        super(GraphSAGE, self).__init__()
        self.conv1 = SAGEConv(in_channels, hidden_channels, aggr=agg_func)  # First GraphSAGE layer with specified aggregation
        self.conv2 = SAGEConv(hidden_channels, out_channels, aggr=agg_func)  # Second GraphSAGE layer with specified aggregation

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = self.conv2(x, edge_index)
        return F.log_softmax(x, dim=1)


def evaluate_model(model, data, criterion):
    model.eval()  # Set the model to evaluation mode
    with torch.no_grad():
        out = model(data.x.to(device), data.edge_index.to(device))  # Forward pass
        val_loss = criterion(out[data.val_mask], data.y[data.val_mask].squeeze().to(device))  # Calculate loss on the validation mask
    return val_loss.item()

# Training and evaluation loop
def train_and_evaluate(model, data, optimizer, criterion, epochs):
    train_losses, val_losses = [], []

    for epoch in range(epochs):
        train_loss = train_model(model, data, optimizer, criterion)
        val_loss = evaluate_model(model, data, criterion)
        train_losses.append(train_loss)
        val_losses.append(val_loss)

        if epoch % 10 == 0:
            print(f"Epoch {epoch}, Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}")

    return train_losses, val_losses

model_mean = GraphSAGE(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes, agg_func='mean').to(device)
model_sum = GraphSAGE(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes, agg_func='add').to(device)  # Use 'add' for sum aggregation
model_max = GraphSAGE(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes, agg_func='max').to(device)


# Create separate optimizers for each model
optimizer_mean = torch.optim.Adam(model_mean.parameters(), lr=0.01)
optimizer_sum = torch.optim.Adam(model_sum.parameters(), lr=0.01)
optimizer_max = torch.optim.Adam(model_max.parameters(), lr=0.01)


criterion = torch.nn.CrossEntropyLoss()

# Train and evaluate the models
train_losses_mean, val_losses_mean = train_and_evaluate(model_mean, subset_graph, optimizer_mean, criterion, 200)
train_losses_sum, val_losses_sum = train_and_evaluate(model_sum, subset_graph, optimizer_sum, criterion, 200)
train_losses_max, val_losses_max = train_and_evaluate(model_max, subset_graph, optimizer_max, criterion, 200)



# Calculate global minimum and maximum
global_min = min(min(train_losses_mean), min(val_losses_mean), min(train_losses_sum), min(val_losses_sum), min(train_losses_max), min(val_losses_max))
global_max = max(max(train_losses_mean), max(val_losses_mean), max(train_losses_sum), max(val_losses_sum), max(train_losses_max), max(val_losses_max))


# Plotting training and validation loss curves
plt.figure(figsize=(14, 7))

plt.subplot(1, 3, 1)
plt.plot(train_losses_mean, label='Train Loss')
plt.plot(val_losses_mean, label='Val Loss', linestyle='--')
plt.title('Mean Aggregation')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.ylim(global_min, global_max)  # Set uniform y-axis limits
plt.legend()

# Sum aggregation curves
plt.subplot(1, 3, 2)
plt.plot(train_losses_sum, label='Train Loss')
plt.plot(val_losses_sum, label='Val Loss', linestyle='--')
plt.title('Sum Aggregation')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.ylim(global_min, global_max)  # Set uniform y-axis limits
plt.legend()

# Max aggregation curves
plt.subplot(1, 3, 3)
plt.plot(train_losses_max, label='Train Loss')
plt.plot(val_losses_max, label='Val Loss', linestyle='--')
plt.title('Max Aggregation')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.ylim(global_min, global_max)  # Set uniform y-axis limits
plt.legend()


plt.tight_layout()
plt.show()


In [None]:
# Comparing the performance of the models

import pandas as pd
from collections import Counter

# Function to perform analysis
def analyze_category_predictions(true_labels, pred_labels):
    analysis_results = {}
    for category in set(true_labels):
        indices = [i for i, label in enumerate(true_labels) if label == category]
        if not indices:
            continue
        category_preds = [pred_labels[i] for i in indices]
        correct_preds_count = category_preds.count(category)
        mispredictions = [pred for pred in category_preds if pred != category]
        most_common_misprediction = Counter(mispredictions).most_common(1)
        most_common_misprediction = most_common_misprediction[0] if most_common_misprediction else ("None", 0)

        analysis_results[category] = {
            'Correct Prediction Percentage': correct_preds_count / len(indices) * 100,
            'Correct Prediction Count': correct_preds_count,
            'Most Common Misprediction': most_common_misprediction[0],
            'Misprediction Percentage': most_common_misprediction[1] / len(indices) * 100,
            'Total Count': len(indices)
        }
    return analysis_results

# Convert to human-readable labels
all_true_labels = [index_product_dict[label.item()] for label in subset_graph.y.squeeze()]

# Get predictions for all models
def get_model_predictions(model, graph):
    model.eval()
    device = graph.x.device  # Ensure that model and data are on the same device
    model.to(device)
    with torch.no_grad():
        logits = model(graph.x.to(device), graph.edge_index.to(device))
        preds = torch.argmax(logits, dim=1)
        return [index_product_dict[pred.item()] for pred in preds]

# Analyze predictions for all models
mean_preds = get_model_predictions(model_mean, subset_graph)
sum_preds = get_model_predictions(model_sum, subset_graph)
max_preds = get_model_predictions(model_max, subset_graph)

mean_analysis = analyze_category_predictions(all_true_labels, mean_preds)
sum_analysis = analyze_category_predictions(all_true_labels, sum_preds)
max_analysis = analyze_category_predictions(all_true_labels, max_preds)

# Create DataFrames from the analysis
def create_dataframe(analysis_results):
    data = [{
        'Category': category,
        'Correct Predictions (%)': f"{stats['Correct Prediction Percentage']:.2f}% ({stats['Correct Prediction Count']})",
        'Most Common Misprediction': f"{stats['Most Common Misprediction']} ({stats['Misprediction Percentage']:.2f}%)",
        'Total Count': stats['Total Count']
    } for category, stats in analysis_results.items()]
    return pd.DataFrame(data).sort_values('Total Count', ascending=False)

mean_df = create_dataframe(mean_analysis)
sum_df = create_dataframe(sum_analysis)
max_df = create_dataframe(max_analysis)

# Display the DataFrames
print("Mean Aggregation Analysis")
print(mean_df)
print("\nSum Aggregation Analysis")
print(sum_df)
print("\nMax Aggregation Analysis")
print(max_df)


In [None]:
# Ensure 'Category' is a column and not an index
mean_df = mean_df.reset_index(drop=True)
sum_df = sum_df.reset_index(drop=True)
max_df = max_df.reset_index(drop=True)

# Merge the dataframes on 'Category' column
comparison_df = pd.merge(mean_df, sum_df, on='Category', suffixes=('_Mean', '_Sum'))
comparison_df = pd.merge(comparison_df, max_df, on='Category')
comparison_df.rename(columns={
    'Correct Predictions (%)': 'Correct Predictions (%)_Max',
    'Most Common Misprediction': 'Most Common Misprediction_Max',
    'Total Count': 'Total Count_Max'
}, inplace=True)

# Order the columns to group by Mean, Sum, Max
column_order = ['Category',
                'Correct Predictions (%)_Mean', 'Most Common Misprediction_Mean',
                'Correct Predictions (%)_Sum', 'Most Common Misprediction_Sum',
                'Correct Predictions (%)_Max', 'Most Common Misprediction_Max',
                'Total Count_Mean']  # You can include the Total Counts for Sum and Max if they differ

# Reindex the DataFrame based on the new column order
comparison_df = comparison_df[column_order]

# Display the combined DataFrame
comparison_df


In [None]:
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.metrics import f1_score, log_loss

def evaluate(model, data, mask):
    device = data.x.device  # Ensure that model and data are on the same device
    model.to(device)
    model.eval()

    # Initialize the loss function for negative log likelihood
    loss_function = torch.nn.NLLLoss()

    with torch.no_grad():
        logits = model(data.x.to(device), data.edge_index.to(device))

        # Convert logits to log probabilities (required by NLLLoss)
        log_probs = F.log_softmax(logits[mask], dim=1)

        # Get the predicted classes for accuracy/F1 calculations
        preds = logits[mask].argmax(dim=1)

        # Flatten the labels to 1-D (OGB stores them as [N, 1]), as NLLLoss expects
        true_labels = data.y[mask].reshape(-1).to(device)

        # Calculate F1 score using true labels and predictions
        f1 = f1_score(true_labels.cpu().numpy(), preds.cpu().numpy(), average='weighted')

        # Calculate negative log likelihood loss
        nll_loss = loss_function(log_probs, true_labels)

        # # Calculate log loss manually for comparison and potential use (this is informational)
        # probabilities = torch.softmax(logits[mask], dim=1).cpu().numpy()
        # logloss_sklearn = log_loss(true_labels.cpu().numpy(), probabilities, labels=np.unique(true_labels.cpu().numpy()))

    return f1, nll_loss.item()


# Define all possible labels explicitly from the dataset's class count
all_labels = list(range(dataset.num_classes))

f1_score_mean, logloss_mean = evaluate(model_mean, subset_graph, subset_graph.val_mask)
print(f"Mean F1 Score: {f1_score_mean:.4f}, Log Loss: {logloss_mean:.4f}" )

f1_score_sum, logloss_sum = evaluate(model_sum, subset_graph, subset_graph.val_mask)
print(f"Sum F1 Score: {f1_score_sum:.4f}, Log Loss: {logloss_sum:.4f}" )

f1_score_max, logloss_max = evaluate(model_max, subset_graph, subset_graph.val_mask)
print(f"Max F1 Score: {f1_score_max:.4f}, Log Loss: {logloss_max:.4f}" )

### Section 5: Experimenting with Additional Aggregation Methods and Parameters
In this section, we extend our experimentation with aggregation methods by testing layer-wise aggregations and advanced configurations. By using different combinations of aggregation strategies within the layers of the GraphSAGE model, we aim to identify which configurations yield the best performance.

#### Layer-Wise Aggregation
We experiment with applying different aggregation functions at each layer of the GraphSAGE model. This approach allows us to understand the impact of combining multiple aggregation methods within a single model architecture.
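
To make the difference between aggregators concrete, the following dependency-free sketch applies sum, mean, and max to two toy neighborhoods. The feature values and the `aggregate` helper are illustrative assumptions, not part of PyTorch Geometric; `SAGEConv` performs the equivalent reduction over each node's neighbors.

```python
# Toy illustration of neighborhood aggregation (hypothetical helper, plain Python).
# Each neighbor is a feature vector; the aggregator reduces them element-wise.

def aggregate(neighbor_features, how):
    """Reduce a list of equal-length feature vectors element-wise."""
    columns = list(zip(*neighbor_features))  # group values by feature dimension
    if how == "sum":
        return [sum(col) for col in columns]
    if how == "mean":
        return [sum(col) / len(col) for col in columns]
    if how == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown aggregation: {how}")

# Two neighborhoods with the same feature values but different sizes.
small = [[1.0, 2.0], [3.0, 0.0]]
large = small * 2  # the same two neighbors, each appearing twice

for how in ("sum", "mean", "max"):
    print(how, aggregate(small, how), aggregate(large, how))
```

Mean and max give the same result for both neighborhoods, while sum reflects the neighborhood size. This is one reason the choice of aggregator can change what a layer is able to distinguish.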

#### Custom Aggregation Configurations
We test several custom configurations:

- Sum-Max Aggregation: Applying sum aggregation in the first layer and max aggregation in the second layer.
- Mean-Max Aggregation: Applying mean aggregation in the first layer and max aggregation in the second layer.
- Multiple Aggregations: Combining 'max', 'sum', and 'mean' aggregations in the first layer, followed by mean aggregation in the second layer.
- Advanced Aggregations: Combining Softmax and Std aggregation in the first layer, followed by mean aggregation in the second layer.

#### Jumping Knowledge Aggregation
We also implement Jumping Knowledge (JK) networks, which combine the representations produced by different layers so that the final prediction can draw on several neighborhood ranges at once.
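
As a rough sketch of what Jumping Knowledge does with per-layer outputs, the snippet below mimics the concatenation and element-wise max modes in plain Python for a single node. The `jumping_knowledge` helper and the numbers are illustrative assumptions; in the notebook the `JumpingKnowledge` module from PyTorch Geometric plays this role.

```python
def jumping_knowledge(layer_outputs, mode):
    """Combine one node's representations from several layers (toy version)."""
    if mode == "cat":
        # Concatenate: the output width is the sum of the layer widths.
        return [value for output in layer_outputs for value in output]
    if mode == "max":
        # Element-wise max: all layers must share the same width.
        widths = {len(output) for output in layer_outputs}
        if len(widths) != 1:
            raise ValueError("'max' mode needs equal-width layer outputs")
        return [max(values) for values in zip(*layer_outputs)]
    raise ValueError(f"unknown mode: {mode}")

layer1 = [0.2, 0.9, 0.1]   # representation after the first layer
layer2 = [0.5, 0.3, 0.4]   # representation after the second layer

jk_cat = jumping_knowledge([layer1, layer2], "cat")
jk_max = jumping_knowledge([layer1, layer2], "max")
print(len(jk_cat), jk_cat)  # width 6
print(len(jk_max), jk_max)  # width 3
```

Because concatenation widens the representation, a model using the 'cat' mode typically needs a final layer that maps the combined width to the number of classes.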

#### Training and Evaluation
For each aggregation method and configuration, we train the model and evaluate its performance on a validation set. The training involves optimizing the model parameters using the Adam optimizer and computing the loss with CrossEntropyLoss.
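
Most models in this notebook return log-probabilities from `log_softmax`, while `CrossEntropyLoss` applies a log-softmax of its own before taking the negative log-likelihood. The plain-Python sketch below, with made-up scores and hypothetical helper names, illustrates why this still yields the same loss: applying log-softmax to values that are already log-probabilities leaves them unchanged up to floating-point error.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax of one score vector."""
    peak = max(scores)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_norm for s in scores]

def cross_entropy(scores, target):
    """Negative log-likelihood of the target class after log-softmax."""
    return -log_softmax(scores)[target]

raw_scores = [2.0, 0.5, -1.0]          # illustrative logits for 3 classes
log_probs = log_softmax(raw_scores)    # what a log_softmax output layer returns

loss_from_logits = cross_entropy(raw_scores, target=0)
loss_from_log_probs = cross_entropy(log_probs, target=0)
print(loss_from_logits, loss_from_log_probs)
```

The two printed losses agree, so feeding log-probabilities into a cross-entropy loss costs a redundant normalization but does not change the value being optimized.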

#### Performance Comparison
We plot the training and validation loss curves for each aggregation method to compare their performance visually. Additionally, we calculate and compare the F1 scores and log loss values to quantitatively assess the effectiveness of each method.
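
For reference, the two metrics can be written out by hand. The sketch below computes a support-weighted F1 score and the mean negative log-likelihood on a tiny made-up example in plain Python. The helper names and data are assumptions for illustration; the notebook itself relies on `sklearn.metrics.f1_score` and `torch.nn.NLLLoss`.

```python
import math
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for cls, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = count - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * count
    return total / len(y_true)

def mean_log_loss(y_true, probabilities):
    """Average negative log of the probability assigned to the true class."""
    return -sum(math.log(p[t]) for t, p in zip(y_true, probabilities)) / len(y_true)

# Made-up labels, predictions, and predicted class probabilities.
y_true = [0, 0, 1, 2]
y_pred = [0, 1, 1, 2]
probabilities = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
]

print(f"weighted F1: {weighted_f1(y_true, y_pred):.4f}")
print(f"log loss:    {mean_log_loss(y_true, probabilities):.4f}")
```

F1 only looks at the predicted class, whereas log loss also penalizes low confidence in the correct class, so the two metrics can rank models differently.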

In [None]:
# layer-wise aggregation

import torch
from torch_geometric.nn import SAGEConv

class CustomGraphSAGE(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super(CustomGraphSAGE, self).__init__()
        # First layer with sum aggregation
        self.conv1 = SAGEConv(in_channels, hidden_channels, aggr='sum')
        # Second layer with max aggregation
        self.conv2 = SAGEConv(hidden_channels, out_channels, aggr='max')


    def forward(self, x, edge_index):
        # Apply the first layer and ReLU activation function
        x = self.conv1(x, edge_index)
        x = torch.relu(x)
        # Apply the second layer
        x = self.conv2(x, edge_index)
        return torch.log_softmax(x, dim=1)

# Instantiate the model (sum in the first layer, max in the second) and move it to the appropriate device
model_layer_max_sum = CustomGraphSAGE(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes).to(device)

# Create an optimizer for the model's parameters
optimizer_layer = torch.optim.Adam(model_layer_max_sum.parameters(), lr=0.01)

# Define the loss criterion
criterion = torch.nn.CrossEntropyLoss()

# Train and evaluate the models
train_losses_layer_max_sum, val_losses_layer_max_sum = train_and_evaluate(model_layer_max_sum, subset_graph, optimizer_layer, criterion, 200)

# Plotting training and validation loss curves
plt.figure(figsize=(14, 7))

# Sum-max layer-wise aggregation curves
plt.subplot(1, 3, 1)
plt.plot(train_losses_layer_max_sum, label='Train Loss')
plt.plot(val_losses_layer_max_sum, label='Val Loss', linestyle='--')
plt.title('Layer_max_sum Aggregation')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

In [None]:
# layer-wise aggregation

import torch
from torch_geometric.nn import SAGEConv

class CustomGraphSAGE(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super(CustomGraphSAGE, self).__init__()
        # First layer with mean aggregation
        self.conv1 = SAGEConv(in_channels, hidden_channels, aggr='mean')
        # Second layer with max aggregation
        self.conv2 = SAGEConv(hidden_channels, out_channels, aggr='max')


    def forward(self, x, edge_index):
        # Apply the first layer and ReLU activation function
        x = self.conv1(x, edge_index)
        x = torch.relu(x)
        # Apply the second layer
        x = self.conv2(x, edge_index)
        return torch.log_softmax(x, dim=1)

# Instantiate the model (mean in the first layer, max in the second) and move it to the appropriate device
model_layer_sum_mean = CustomGraphSAGE(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes).to(device)

# Create an optimizer for the model's parameters
optimizer_layer = torch.optim.Adam(model_layer_sum_mean.parameters(), lr=0.01)

# Define the loss criterion
criterion = torch.nn.CrossEntropyLoss()

# Train and evaluate the models
train_losses_layer_sum_mean, val_losses_layer_sum_mean = train_and_evaluate(model_layer_sum_mean, subset_graph, optimizer_layer, criterion, 200)

# Plotting training and validation loss curves
plt.figure(figsize=(14, 7))

# Mean-max layer-wise aggregation curves
plt.subplot(1, 3, 1)
plt.plot(train_losses_layer_sum_mean, label='Train Loss')
plt.plot(val_losses_layer_sum_mean, label='Val Loss', linestyle='--')
plt.title('Layer_sum_mean Aggregation')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

In [None]:
# layer-wise aggregation

import torch
from torch_geometric.nn import SAGEConv

class CustomGraphSAGE(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super(CustomGraphSAGE, self).__init__()
        # First layer combines max, sum, and mean aggregation
        self.conv1 = SAGEConv(in_channels, hidden_channels, aggr=['max', 'sum', 'mean'])
        # Second layer with mean aggregation
        self.conv2 = SAGEConv(hidden_channels, out_channels, aggr='mean')


    def forward(self, x, edge_index):
        # Apply the first layer and ReLU activation function
        x = self.conv1(x, edge_index)
        x = torch.relu(x)
        # Apply the second layer
        x = self.conv2(x, edge_index)
        return torch.log_softmax(x, dim=1)

# Instantiate the model and move it to the appropriate device
model_layer_list1 = CustomGraphSAGE(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes).to(device)

# Create an optimizer for the model's parameters
optimizer_layer = torch.optim.Adam(model_layer_list1.parameters(), lr=0.01)

# Define the loss criterion
criterion = torch.nn.CrossEntropyLoss()

# Train and evaluate the models
train_losses_layer_list1, val_losses_layer_list1 = train_and_evaluate(model_layer_list1, subset_graph, optimizer_layer, criterion, 200)

# Plotting training and validation loss curves
plt.figure(figsize=(14, 7))

# Multi-aggregation (max, sum, mean) curves
plt.subplot(1, 3, 1)
plt.plot(train_losses_layer_list1, label='Train Loss')
plt.plot(val_losses_layer_list1, label='Val Loss', linestyle='--')
plt.title('Layer_list1 Aggregation')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

In [None]:
# layer-wise aggregation

import torch
from torch_geometric.nn import SAGEConv

class CustomGraphSAGE(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super(CustomGraphSAGE, self).__init__()
        # First layer combines softmax and standard-deviation aggregation
        self.conv1 = SAGEConv(in_channels, hidden_channels, aggr=[SoftmaxAggregation(), StdAggregation()])
        # Second layer with mean aggregation
        self.conv2 = SAGEConv(hidden_channels, out_channels, aggr='mean')

    def forward(self, x, edge_index):
        # Apply the first layer and ReLU activation function
        x = self.conv1(x, edge_index)
        x = torch.relu(x)
        # Apply the second layer
        x = self.conv2(x, edge_index)
        return torch.log_softmax(x, dim=1)

# Instantiate the model and move it to the appropriate device
model_layer_list2 = CustomGraphSAGE(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes).to(device)

# Create an optimizer for the model's parameters
optimizer_layer = torch.optim.Adam(model_layer_list2.parameters(), lr=0.01)

# Define the loss criterion
criterion = torch.nn.CrossEntropyLoss()

# Train and evaluate the models
train_losses_layer_list2, val_losses_layer_list2 = train_and_evaluate(model_layer_list2, subset_graph, optimizer_layer, criterion, 200)

# Plotting training and validation loss curves
plt.figure(figsize=(14, 7))

# Softmax/Std aggregation curves
plt.subplot(1, 3, 1)
plt.plot(train_losses_layer_list2, label='Train Loss')
plt.plot(val_losses_layer_list2, label='Val Loss', linestyle='--')
plt.title('Layer_list2 Aggregation')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

In [None]:
# Jumping Knowledge

In [None]:
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, JumpingKnowledge

class CustomGCN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super(CustomGCN, self).__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, hidden_channels)

        # Initialize Jumping Knowledge with concatenation mode
        self.jk = JumpingKnowledge(mode='cat')
        # 'cat' concatenates both layer outputs (2 * hidden_channels features),
        # so a linear layer maps them to one score per class
        self.lin = torch.nn.Linear(2 * hidden_channels, out_channels)

    def forward(self, x, edge_index):
        # List to save outputs from each layer for JK
        layer_outputs = []

        # First layer
        x1 = self.conv1(x, edge_index)
        x1 = F.relu(x1)
        layer_outputs.append(x1)

        # Second layer
        x2 = self.conv2(x1, edge_index)
        x2 = F.relu(x2)
        layer_outputs.append(x2)

        # Apply Jumping Knowledge to aggregate layer outputs
        x = self.jk(layer_outputs)

        # Return raw class scores; CrossEntropyLoss applies log-softmax itself
        return self.lin(x)



# Instantiate the model and move it to the appropriate device
model_layer_jk = CustomGCN(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes).to(device)

# Create an optimizer for the model's parameters
optimizer_layer = torch.optim.Adam(model_layer_jk.parameters(), lr=0.01)

# Define the loss criterion
criterion = torch.nn.CrossEntropyLoss()

# Function to train and evaluate the models.
# Note: this version takes an explicit `device` argument and shadows the
# earlier train_and_evaluate(model, data, optimizer, criterion, epochs).
def train_and_evaluate(model, data, device, optimizer, criterion, epochs):
    model.to(device)  # Ensure the model is on the correct device
    train_losses, val_losses = [], []

    # Move the data to the device once, rather than on every epoch
    data.x = data.x.to(device)
    data.edge_index = data.edge_index.to(device)
    data.y = data.y.to(device)
    labels = data.y.reshape(-1)  # Flatten [N, 1] labels to 1-D for the loss

    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()

        # Forward pass
        output = model(data.x, data.edge_index)

        # Only compute loss on the training mask
        loss = criterion(output[data.train_mask], labels[data.train_mask])
        loss.backward()
        optimizer.step()

        # Validation phase
        model.eval()
        with torch.no_grad():
            val_output = model(data.x, data.edge_index)
            val_loss = criterion(val_output[data.val_mask], labels[data.val_mask])

        train_losses.append(loss.item())
        val_losses.append(val_loss.item())

        if epoch % 10 == 0:
            print(f"Epoch {epoch}, Train Loss: {loss.item():.4f}, Val Loss: {val_loss.item():.4f}")

    return train_losses, val_losses

# # Dummy data for demonstration
# train_losses_layer_jk = [0.5 - 0.005*i for i in range(200)]  # Example loss data
# val_losses_layer_jk = [0.6 - 0.004*i for i in range(200)]  # Example validation loss data


train_losses_layer_jk, val_losses_layer_jk = train_and_evaluate(model_layer_jk, subset_graph, device, optimizer_layer, criterion, 500)


# Plotting training and validation loss curves
plt.figure(figsize=(14, 7))

# Plot for Layer Jumping Knowledge aggregation
plt.plot(train_losses_layer_jk, label='Train Loss')
plt.plot(val_losses_layer_jk, label='Val Loss', linestyle='--')
plt.title('Layer with Jumping Knowledge Aggregation')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()


In [None]:
# Mean F1 Score: 0.7406, Log Loss: 2.1215

f1_score_mean, logloss_mean = evaluate(model_mean, subset_graph, subset_graph.val_mask)
print(f"Mean F1 Score: {f1_score_mean:.4f}, Log Loss: {logloss_mean:.4f}" )

f1_score_sum, logloss_sum = evaluate(model_sum, subset_graph, subset_graph.val_mask)
print(f"Sum F1 Score: {f1_score_sum:.4f}, Log Loss: {logloss_sum:.4f}" )

f1_score_max, logloss_max = evaluate(model_max, subset_graph, subset_graph.val_mask)
print(f"Max F1 Score: {f1_score_max:.4f}, Log Loss: {logloss_max:.4f}" )

f1_score_layer_sum_mean, logloss_layer_sum_mean = evaluate(model_layer_sum_mean, subset_graph, subset_graph.val_mask)
print(f"Layer_sum_mean F1 Score: {f1_score_layer_sum_mean:.4f}, Log Loss: {logloss_layer_sum_mean:.4f}" )

f1_score_layer_max_sum, logloss_layer_max_sum = evaluate(model_layer_max_sum, subset_graph, subset_graph.val_mask)
print(f"Layer_max_sum F1 Score: {f1_score_layer_max_sum:.4f}, Log Loss: {logloss_layer_max_sum:.4f}" )

f1_score_layer_list1, logloss_layer_list1 = evaluate(model_layer_list1, subset_graph, subset_graph.val_mask)
print(f"Layer_list1 F1 Score: {f1_score_layer_list1:.4f}, Log Loss: {logloss_layer_list1:.4f}" )

f1_score_layer_list2, logloss_layer_list2 = evaluate(model_layer_list2, subset_graph, subset_graph.val_mask)
print(f"Layer_list2 F1 Score: {f1_score_layer_list2:.4f}, Log Loss: {logloss_layer_list2:.4f}" )

f1_score_layer_jk, logloss_layer_jk = evaluate(model_layer_jk, subset_graph, subset_graph.val_mask)
print(f"Layer_jk F1 Score: {f1_score_layer_jk:.4f}, Log Loss: {logloss_layer_jk:.4f}" )

### Section 6: Experimenting with Dropout
In this section, we vary the dropout rate in the GCN model to see how it affects training and validation performance. Dropout is a regularization technique that helps prevent overfitting by randomly zeroing a fraction of the activations during training.

#### Dropout in GCN
We modify the GCN model to include a dropout layer. This model will apply dropout at each layer, except the final one, to observe its effect on the model's ability to generalize.
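
The mechanics of dropout can be sketched without any framework. The hypothetical `dropout` helper below follows the usual "inverted dropout" convention, which is also what `F.dropout` uses: during training each activation is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`, while in evaluation mode the input passes through unchanged. The seed and input values are arbitrary.

```python
import random

def dropout(values, p, training, rng):
    """Inverted dropout on a list of activations (toy version)."""
    if not training or p == 0:
        return list(values)            # evaluation mode: identity
    if p >= 1:
        return [0.0 for _ in values]   # everything is dropped
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]

rng = random.Random(0)                 # fixed seed so the sketch is repeatable
activations = [1.0] * 10_000

for p in (0.0, 0.5, 0.85):
    dropped = dropout(activations, p, training=True, rng=rng)
    kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
    mean_value = sum(dropped) / len(dropped)
    print(f"p={p}: kept {kept_fraction:.2f} of activations, mean {mean_value:.2f}")

# In evaluation mode the activations are returned unchanged.
print(dropout([1.0, 2.0], 0.85, training=False, rng=rng))
```

The rescaling keeps the expected activation roughly constant, which is why the printed means stay close to 1 even at a high rate such as 0.85.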

#### Training and Evaluation with and without Dropout
We train and evaluate the model with a specific dropout rate and compare its performance against a model without dropout. This comparison helps us understand the benefits and potential drawbacks of using dropout in GCNs.

#### Performance Comparison with Different Dropout Rates
We experiment with different dropout rates (e.g., 0%, 50%, and 85%) to observe how each setting impacts the training and validation loss curves. By plotting these curves, we can visualize the effects of dropout on model convergence and stability.

#### Summarizing Results
- Dropout Rates: We experiment with three different dropout rates, 0% (no dropout), 50%, and 85%, to evaluate their impact on model performance.
- Training and Validation Losses: We plot the training and validation loss curves for each dropout rate to visualize how dropout affects model training and generalization.
- Performance Metrics: We calculate precision, recall, and F1 scores for each model to quantitatively assess the effectiveness of dropout in improving model performance.
- Confusion Matrix: We generate confusion matrices for each dropout rate to analyze the model's classification accuracy and error distribution.
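
As a small illustration of the metrics listed above, the following plain-Python sketch builds a confusion matrix and derives per-class precision and recall from it. The labels are made up and the helper names are hypothetical; in practice a library routine such as scikit-learn's `confusion_matrix` would usually be used.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def precision_recall(matrix, cls):
    """Precision and recall for one class, read off the confusion matrix."""
    tp = matrix[cls][cls]
    predicted = sum(row[cls] for row in matrix)   # column sum
    actual = sum(matrix[cls])                     # row sum
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall

# Made-up labels and predictions for three classes.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
for row in cm:
    print(row)
for cls in range(3):
    precision, recall = precision_recall(cm, cls)
    print(f"class {cls}: precision={precision:.2f}, recall={recall:.2f}")
```

Off-diagonal entries show which classes are confused with each other, which complements the single-number F1 score.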

In [None]:
#dropout


In [None]:
class GCNWithDropout(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, num_layers=3, dropout_rate=0.5):
        super(GCNWithDropout, self).__init__()
        self.layers = torch.nn.ModuleList([GCNConv(in_channels, hidden_channels)])
        self.dropout_rate = dropout_rate
        for _ in range(1, num_layers-1):
            self.layers.append(GCNConv(hidden_channels, hidden_channels))
        self.layers.append(GCNConv(hidden_channels, out_channels))

    def forward(self, x, edge_index):
        for layer in self.layers[:-1]:
            x = F.relu(layer(x, edge_index))
            x = F.dropout(x, p=self.dropout_rate, training=self.training)
        x = self.layers[-1](x, edge_index)
        return F.log_softmax(x, dim=1)


In [None]:
# Instantiate the model and move it to the appropriate device
model_layer = GCNWithDropout(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes).to(device)

# Create an optimizer for the model's parameters
optimizer_layer = torch.optim.Adam(model_layer.parameters(), lr=0.01)

# Define the loss criterion
criterion = torch.nn.CrossEntropyLoss()

# Redefine train_and_evaluate with the original signature (no explicit device argument)
def train_and_evaluate(model, data, optimizer, criterion, epochs):
    train_losses, val_losses = [], []

    for epoch in range(epochs):
        train_loss = train_model(model, data, optimizer, criterion)
        val_loss = evaluate_model(model, data, criterion)
        train_losses.append(train_loss)
        val_losses.append(val_loss)

        if epoch % 10 == 0:
            print(f"Epoch {epoch}, Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}")

    return train_losses, val_losses

# Train and evaluate the models
train_losses_layer, val_losses_layer = train_and_evaluate(model_layer, subset_graph, optimizer_layer, criterion, 200)

# Plotting training and validation loss curves
plt.figure(figsize=(14, 7))

# GCN with dropout: training and validation curves
plt.subplot(1, 3, 1)
plt.plot(train_losses_layer, label='Train Loss')
plt.plot(val_losses_layer, label='Val Loss', linestyle='--')
plt.title('GCN with Dropout (p=0.5)')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

In [None]:
#compare dropout with no-dropout

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCNWithOptionalDropout(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, num_layers=3, dropout_rate=0):
        super(GCNWithOptionalDropout, self).__init__()
        self.layers = torch.nn.ModuleList([GCNConv(in_channels, hidden_channels)])
        self.dropout_rate = dropout_rate
        for _ in range(1, num_layers-1):
            self.layers.append(GCNConv(hidden_channels, hidden_channels))
        self.layers.append(GCNConv(hidden_channels, out_channels))

    def forward(self, x, edge_index):
        for layer in self.layers[:-1]:
            x = F.relu(layer(x, edge_index))
            if self.dropout_rate > 0:  # Apply dropout only if rate is greater than 0
                x = F.dropout(x, p=self.dropout_rate, training=self.training)
        x = self.layers[-1](x, edge_index)
        return F.log_softmax(x, dim=1)


# Model without dropout
model_no_dropout = GCNWithOptionalDropout(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes, num_layers=3, dropout_rate=0).to(device)

# Model with dropout
model_with_dropout = GCNWithOptionalDropout(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes, num_layers=3, dropout_rate=.8).to(device)

# Optimizers
optimizer_no_dropout = torch.optim.Adam(model_no_dropout.parameters(), lr=0.01)
optimizer_with_dropout = torch.optim.Adam(model_with_dropout.parameters(), lr=0.01)

# Training and evaluating both models
train_losses_no_dropout, val_losses_no_dropout = train_and_evaluate(model_no_dropout, subset_graph, optimizer_no_dropout, criterion, 200)
train_losses_with_dropout, val_losses_with_dropout = train_and_evaluate(model_with_dropout, subset_graph, optimizer_with_dropout, criterion, 200)


plt.figure(figsize=(14, 7))

# Training and validation losses without dropout
plt.subplot(1, 2, 1)
plt.plot(train_losses_no_dropout, label='Train Loss - No Dropout')
plt.plot(val_losses_no_dropout, label='Val Loss - No Dropout', linestyle='--')
plt.title('Performance without Dropout')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

# Training and validation losses with dropout
plt.subplot(1, 2, 2)
plt.plot(train_losses_with_dropout, label='Train Loss - With Dropout')
plt.plot(val_losses_with_dropout, label='Val Loss - With Dropout', linestyle='--')
plt.title('Performance with Dropout')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()


In [None]:
# 3 dropout cases

class GCNWithCustomDropout(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, num_layers=3, dropout_rate=0):
        super(GCNWithCustomDropout, self).__init__()
        self.layers = torch.nn.ModuleList([GCNConv(in_channels, hidden_channels)])
        self.dropout_rate = dropout_rate
        for _ in range(1, num_layers-1):
            self.layers.append(GCNConv(hidden_channels, hidden_channels))
        self.layers.append(GCNConv(hidden_channels, out_channels))

    def forward(self, x, edge_index):
        for layer in self.layers[:-1]:
            x = F.relu(layer(x, edge_index))
            if self.dropout_rate > 0:  # Apply dropout only if rate is greater than 0
                x = F.dropout(x, p=self.dropout_rate, training=self.training)
        x = self.layers[-1](x, edge_index)
        return F.log_softmax(x, dim=1)


# Models with different dropout rates
model_dropout_0 = GCNWithCustomDropout(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes, dropout_rate=0).to(device)
model_dropout_05 = GCNWithCustomDropout(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes, dropout_rate=0.5).to(device)
model_dropout_085 = GCNWithCustomDropout(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes, dropout_rate=0.85).to(device)

# Optimizers
optimizer_0 = torch.optim.Adam(model_dropout_0.parameters(), lr=0.01)
optimizer_05 = torch.optim.Adam(model_dropout_05.parameters(), lr=0.01)
optimizer_085 = torch.optim.Adam(model_dropout_085.parameters(), lr=0.01)

# Training and evaluating all models
train_losses_0, val_losses_0 = train_and_evaluate(model_dropout_0, subset_graph, optimizer_0, criterion, 200)
train_losses_05, val_losses_05 = train_and_evaluate(model_dropout_05, subset_graph, optimizer_05, criterion, 200)
train_losses_085, val_losses_085 = train_and_evaluate(model_dropout_085, subset_graph, optimizer_085, criterion, 200)

max_epoch = 200  # must match the number of epochs passed to train_and_evaluate above

# Use common x and y limits so the three panels are directly comparable
all_losses = train_losses_0 + val_losses_0 + train_losses_05 + val_losses_05 + train_losses_085 + val_losses_085
x_limits = (0, max_epoch)
y_limits = (min(all_losses), max(all_losses))

# Create subplots
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15, 5))

# Plot for 0% dropout
ax1.plot(train_losses_0, label='Train Loss - 0% Dropout')
ax1.plot(val_losses_0, label='Val Loss - 0% Dropout', linestyle='--')
ax1.set_title('Performance with 0% Dropout')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.legend()
ax1.set_xlim(x_limits)
ax1.set_ylim(y_limits)

# Plot for 50% dropout
ax2.plot(train_losses_05, label='Train Loss - 50% Dropout')
ax2.plot(val_losses_05, label='Val Loss - 50% Dropout', linestyle='--')
ax2.set_title('Performance with 50% Dropout')
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Loss')
ax2.legend()
ax2.set_xlim(x_limits)
ax2.set_ylim(y_limits)

# Plot for 85% dropout
ax3.plot(train_losses_085, label='Train Loss - 85% Dropout')
ax3.plot(val_losses_085, label='Val Loss - 85% Dropout', linestyle='--')
ax3.set_title('Performance with 85% Dropout')
ax3.set_xlabel('Epoch')
ax3.set_ylabel('Loss')
ax3.legend()
ax3.set_xlim(x_limits)
ax3.set_ylim(y_limits)

plt.tight_layout()
plt.show()



In [None]:
# Function to get predictions from a model
def get_predictions(model, data):
    model.eval()
    with torch.no_grad():
        out = model(data.x.to(device), data.edge_index.to(device))
        preds = torch.argmax(out, dim=1)
        return [index_product_dict[pred.item()] for pred in preds]

all_true_labels = [index_product_dict[label.item()] for label in subset_graph.y.squeeze()]


# Analyze predictions from models with different dropout rates
predictions_0_dropout = get_predictions(model_dropout_0, subset_graph)
predictions_50_dropout = get_predictions(model_dropout_05, subset_graph)
predictions_85_dropout = get_predictions(model_dropout_085, subset_graph)

# Perform analysis for each dropout rate
analysis_0_dropout = analyze_category_predictions(all_true_labels, predictions_0_dropout)
analysis_50_dropout = analyze_category_predictions(all_true_labels, predictions_50_dropout)
analysis_85_dropout = analyze_category_predictions(all_true_labels, predictions_85_dropout)

# Convert the analysis results into DataFrames
df_0_dropout = create_dataframe(analysis_0_dropout)
df_50_dropout = create_dataframe(analysis_50_dropout)
df_85_dropout = create_dataframe(analysis_85_dropout)

# Merge the DataFrames for comparison
comparison_df = pd.merge(df_0_dropout, df_50_dropout, on='Category', suffixes=('_0', '_50'))
comparison_df = pd.merge(comparison_df, df_85_dropout, on='Category')
comparison_df.rename(columns={
    'Correct Predictions (%)': 'Correct Predictions (%)_85',
    'Most Common Misprediction': 'Most Common Misprediction_85',
    'Total Count': 'Total Count_85'
}, inplace=True)

# Set the desired order for the columns
column_order = ['Category',
                'Correct Predictions (%)_0', 'Most Common Misprediction_0', 'Total Count_0',
                'Correct Predictions (%)_50', 'Most Common Misprediction_50', 'Total Count_50',
                'Correct Predictions (%)_85', 'Most Common Misprediction_85', 'Total Count_85']

# Reindex the DataFrame based on the new column order and display it
comparison_df = comparison_df[column_order]
comparison_df


In [None]:
import torch
import torch.nn.functional as F
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

def test_model(model, data):
    model.eval()  # Set the model to evaluation mode
    with torch.no_grad():
        out = model(data.x.to(device), data.edge_index.to(device))  # Forward pass
        preds = torch.argmax(out, dim=1)[data.test_mask]  # Only take the predictions from the test mask
        true_labels = data.y[data.test_mask]  # True labels for the test data
    return preds.cpu().numpy(), true_labels.cpu().numpy()

# The model is assumed to be trained already; test_model selects the test nodes via subset_graph.test_mask
predictions, true_labels = test_model(model_dropout_05, subset_graph)

# Calculate precision, recall, and F1 score
precision = precision_score(true_labels, predictions, average='weighted')
recall = recall_score(true_labels, predictions, average='weighted')
f1 = f1_score(true_labels, predictions, average='weighted')

# Compute the confusion matrix
cm = confusion_matrix(true_labels, predictions)

# Display the metrics
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")

# Plot confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()


In [None]:
from torch_geometric.loader import DataLoader
# Build a standalone test graph: the subgraph induced by the test nodes.
# Data.subgraph keeps the node-level attributes (x, y, masks) of the selected
# nodes and relabels edge_index so that it only references those nodes.
# Edges between a test node and a non-test node are dropped.
test_data = subset_graph.subgraph(subset_graph.test_mask)

# test_data can now be passed to a model on its own. The cells below instead run
# each model on the full subset_graph and select the test nodes with test_mask,
# which keeps the neighborhood information from non-test nodes.



In [None]:
# Each model is run on the full subset_graph; test_model selects the test nodes via test_mask

# Test and evaluate each model
preds_0, labels_0 = test_model(model_dropout_0, subset_graph)
preds_05, labels_05 = test_model(model_dropout_05, subset_graph)
preds_085, labels_085 = test_model(model_dropout_085, subset_graph)

# Compute metrics for each model
precision_0 = precision_score(labels_0, preds_0, average='weighted')
recall_0 = recall_score(labels_0, preds_0, average='weighted')
f1_0 = f1_score(labels_0, preds_0, average='weighted')
cm_0 = confusion_matrix(labels_0, preds_0)

precision_05 = precision_score(labels_05, preds_05, average='weighted')
recall_05 = recall_score(labels_05, preds_05, average='weighted')
f1_05 = f1_score(labels_05, preds_05, average='weighted')
cm_05 = confusion_matrix(labels_05, preds_05)

precision_085 = precision_score(labels_085, preds_085, average='weighted')
recall_085 = recall_score(labels_085, preds_085, average='weighted')
f1_085 = f1_score(labels_085, preds_085, average='weighted')
cm_085 = confusion_matrix(labels_085, preds_085)

# Print metrics for 0% Dropout
print("0% Dropout:")
print(f"Precision: {precision_0:.4f}, Recall: {recall_0:.4f}, F1 Score: {f1_0:.4f}")
sns.heatmap(cm_0, annot=True, fmt="d", cmap="Blues")
plt.title('Confusion Matrix - 0% Dropout')
plt.show()

# Print metrics for 50% Dropout
print("50% Dropout:")
print(f"Precision: {precision_05:.4f}, Recall: {recall_05:.4f}, F1 Score: {f1_05:.4f}")
sns.heatmap(cm_05, annot=True, fmt="d", cmap="Blues")
plt.title('Confusion Matrix - 50% Dropout')
plt.show()

# Print metrics for 85% Dropout
print("85% Dropout:")
print(f"Precision: {precision_085:.4f}, Recall: {recall_085:.4f}, F1 Score: {f1_085:.4f}")
sns.heatmap(cm_085, annot=True, fmt="d", cmap="Blues")
plt.title('Confusion Matrix - 85% Dropout')
plt.show()


### Section 7: Experimenting with Model Depth
In this section, we experiment with the depth of the GCN by varying the number of layers in the model. Model depth can significantly impact the learning capacity and generalization ability of neural networks. We aim to explore how different depths affect the training and validation performance of our models.
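
One way to see why depth matters: each graph convolution layer mixes information from a node's direct neighbors, so a stack of k layers lets a node draw on nodes up to k hops away. The sketch below uses plain Python on a hypothetical six-node path graph (the graph and the helper name `receptive_field` are illustrative assumptions, not part of the notebook) to list which nodes can influence node 0 after k layers.

```python
from collections import deque

def receptive_field(adjacency, node, num_layers):
    """Return the set of nodes within `num_layers` hops of `node` (breadth-first search)."""
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == num_layers:
            continue  # do not expand beyond the k-th hop
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

# Hypothetical toy graph: a path 0 - 1 - 2 - 3 - 4 - 5
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

for k in (1, 2, 5):
    print(k, sorted(receptive_field(path_graph, 0, k)))
```

On a small, well-connected graph the receptive field can cover most nodes after only a few layers, which is one commonly cited reason why very deep GCNs tend to produce increasingly similar node representations.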

#### Variable Depth in GCN
We modify the GCN model to accept a variable number of layers. This flexibility allows us to create models with different depths and evaluate their performance on the same dataset.
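
As a rough illustration of how a `num_layers` argument translates into a stack of layers, the sketch below lists the (input, output) sizes of each layer: one input layer, `num_layers - 2` hidden layers, and one output layer. The helper `layer_dims` and the sizes used are hypothetical and chosen only for illustration.

```python
def layer_dims(in_channels, hidden_channels, out_channels, num_layers):
    """List the (input, output) sizes of each layer in a `num_layers`-deep stack."""
    if num_layers < 2:
        raise ValueError("this layout needs at least an input and an output layer")
    dims = [(in_channels, hidden_channels)]                          # input layer
    dims += [(hidden_channels, hidden_channels)] * (num_layers - 2)  # hidden layers
    dims.append((hidden_channels, out_channels))                     # output layer
    return dims

# Illustrative sizes only: 100 input features, 64 hidden units, 47 classes
print(layer_dims(100, 64, 47, 2))
print(len(layer_dims(100, 64, 47, 10)))
```

With this layout, a 2-layer model has no hidden-to-hidden layer at all, while a 50-layer model has 48 of them.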

#### Training and Evaluation with Different Depths
We initialize three models with varying depths:

- 2 Layers: A shallow model with one hidden layer.
- 10 Layers: A moderately deep model with multiple hidden layers.
- 50 Layers: A very deep model with many hidden layers.

Each model is trained and evaluated using the same training and validation procedures to ensure a fair comparison.

#### Performance Comparison
We compare the training and validation loss curves for each model depth to understand how increasing the number of layers affects the model's ability to learn and generalize. By plotting these curves, we can visualize the impact of model depth on convergence and stability.
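
Alongside the plots, a loss curve can be condensed into a few numbers. The following sketch (the helper `summarize_curve` and the loss values are made up for illustration) reports the epoch with the lowest validation loss and the final gap between validation and training loss, which is one simple indicator of overfitting.

```python
def summarize_curve(train_losses, val_losses):
    """Summarize one run: best validation epoch and the final train/val gap."""
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return {
        "best_epoch": best_epoch,
        "best_val_loss": val_losses[best_epoch],
        "final_gap": val_losses[-1] - train_losses[-1],
    }

# Made-up loss values purely for illustration
train = [2.0, 1.2, 0.8, 0.5, 0.3]
val = [2.1, 1.4, 1.1, 1.2, 1.4]
print(summarize_curve(train, val))
```

Applied to the lists returned by `train_and_evaluate`, a summary like this makes it easier to compare depths side by side than reading the raw curves.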

#### Summary of Results
- Model Depths: We experiment with three different depths (2, 10, and 50 layers) to evaluate their impact on model performance.
- Training and Validation Losses: We plot the training and validation loss curves for each model depth to visualize how the number of layers affects the learning process.
- Performance Metrics: We calculate and compare the training and validation losses to assess the effectiveness of each model depth.

In [None]:
class GCNWithVariableDepth(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, num_layers=2):
        super(GCNWithVariableDepth, self).__init__()
        self.layers = torch.nn.ModuleList()
        # Base layer
        self.layers.append(GCNConv(in_channels, hidden_channels))
        # Hidden layers
        for _ in range(num_layers - 2):
            self.layers.append(GCNConv(hidden_channels, hidden_channels))
        # Output layer
        self.layers.append(GCNConv(hidden_channels, out_channels))

    def forward(self, x, edge_index):
        for layer in self.layers[:-1]:
            x = F.relu(layer(x, edge_index))
        x = self.layers[-1](x, edge_index)
        return F.log_softmax(x, dim=1)


In [None]:
# Initialize models with different depths
model_2_layers = GCNWithVariableDepth(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes, num_layers=2).to(device)
model_10_layers = GCNWithVariableDepth(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes, num_layers=10).to(device)
model_50_layers = GCNWithVariableDepth(in_channels=subset_graph.num_features, hidden_channels=64, out_channels=dataset.num_classes, num_layers=50).to(device)

optimizer_2 = torch.optim.Adam(model_2_layers.parameters(), lr=0.01)
optimizer_10 = torch.optim.Adam(model_10_layers.parameters(), lr=0.01)
optimizer_50 = torch.optim.Adam(model_50_layers.parameters(), lr=0.01)


In [None]:
# Example training and evaluation function
import torch
import torch.nn.functional as F

def train_and_evaluate(model, data, device, optimizer, criterion, epochs):
    model.to(device)  # Ensure the model is on the correct device
    train_losses, val_losses = [], []

    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()

        # Ensure data is on the correct device
        data.x = data.x.to(device)
        data.edge_index = data.edge_index.to(device)
        data.y = data.y.to(device)  # Move labels to GPU if using GPU

        # Forward pass
        output = model(data.x, data.edge_index)

        # Only compute loss on the training mask
        loss = criterion(output[data.train_mask], data.y[data.train_mask])
        loss.backward()
        optimizer.step()

        # Validation phase
        model.eval()
        with torch.no_grad():
            val_output = model(data.x, data.edge_index)
            val_loss = criterion(val_output[data.val_mask], data.y[data.val_mask])

        train_losses.append(loss.item())
        val_losses.append(val_loss.item())

        if epoch % 10 == 0:
            print(f"Epoch {epoch}, Train Loss: {loss.item():.4f}, Val Loss: {val_loss.item():.4f}")

    return train_losses, val_losses




In [None]:
# Train each model for 100 epochs; the returned loss histories are not stored in this cell
train_and_evaluate(model_2_layers, subset_graph, device, epochs=100, optimizer=optimizer_2, criterion=torch.nn.CrossEntropyLoss())
train_and_evaluate(model_10_layers, subset_graph, device, epochs=100, optimizer=optimizer_10, criterion=torch.nn.CrossEntropyLoss())
train_and_evaluate(model_50_layers, subset_graph, device, epochs=100, optimizer=optimizer_50, criterion=torch.nn.CrossEntropyLoss())


In [None]:
import torch
import torch.nn.functional as F

def train_and_evaluate(model, data, device, optimizer, criterion, epochs):
    model.to(device)  # Ensure the model is on the correct device
    train_losses, val_losses = [], []

    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()

        # Move data to the correct device
        data.x = data.x.to(device)
        data.edge_index = data.edge_index.to(device)
        data.y = data.y.to(device)  # Move labels to GPU if using GPU

        # Forward pass and loss calculation on the training mask
        output = model(data.x, data.edge_index)
        train_loss = criterion(output[data.train_mask], data.y[data.train_mask])
        train_loss.backward()
        optimizer.step()

        # Validation loss calculation on the validation mask
        model.eval()
        with torch.no_grad():
            val_output = model(data.x, data.edge_index)
            val_loss = criterion(val_output[data.val_mask], data.y[data.val_mask])

        train_losses.append(train_loss.item())
        val_losses.append(val_loss.item())

        if epoch % 10 == 0:
            print(f"Epoch {epoch}: Train Loss: {train_loss.item():.4f}, Val Loss: {val_loss.item():.4f}")

    return train_losses, val_losses

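# Train the 2-, 10-, and 50-layer models and keep their loss histories.
# Note: models that were already trained in an earlier cell continue from their current
# weights; re-run the initialization cell first if the curves should start from scratch.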
train_losses_2, val_losses_2 = train_and_evaluate(model_2_layers, subset_graph, device, optimizer_2, torch.nn.CrossEntropyLoss(), 100)
train_losses_10, val_losses_10 = train_and_evaluate(model_10_layers, subset_graph, device, optimizer_10, torch.nn.CrossEntropyLoss(), 100)
train_losses_50, val_losses_50 = train_and_evaluate(model_50_layers, subset_graph, device, optimizer_50, torch.nn.CrossEntropyLoss(), 100)

# Printing training and validation losses for each model configuration
print("Model with 2 layers - Train Losses:", train_losses_2)
print("Model with 2 layers - Validation Losses:", val_losses_2)

print("Model with 10 layers - Train Losses:", train_losses_10)
print("Model with 10 layers - Validation Losses:", val_losses_10)

print("Model with 50 layers - Train Losses:", train_losses_50)
print("Model with 50 layers - Validation Losses:", val_losses_50)


In [None]:
import matplotlib.pyplot as plt

def plot_losses(train_losses, val_losses, title):
    plt.figure(figsize=(10, 5))
    plt.plot(train_losses, label='Train Loss')
    plt.plot(val_losses, label='Validation Loss', linestyle='--')
    plt.title(title)
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid(True)
    plt.show()

# Assuming you have already collected the losses in the variables below
# train_losses_2, val_losses_2
# train_losses_10, val_losses_10
# train_losses_50, val_losses_50

plot_losses(train_losses_2, val_losses_2, "Training and Validation Losses for 2 Layers")
plot_losses(train_losses_10, val_losses_10, "Training and Validation Losses for 10 Layers")
plot_losses(train_losses_50, val_losses_50, "Training and Validation Losses for 50 Layers")


### Section 8: Combining Concepts into a Comprehensive Model
In this section, we integrate the various concepts explored earlier, including dropout, aggregation methods, and model depth, into a comprehensive model. This combined approach allows us to leverage the strengths of each technique to optimize our Graph Convolutional Network (GCN) performance.
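
As a reminder of what the dropout component does, the sketch below reimplements inverted dropout in plain Python: during training each value is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`, while at evaluation time the input passes through unchanged. This mirrors the behavior of `F.dropout(x, p, training=...)` but is an illustrative stand-in, not the PyTorch implementation; the helper name and the `rng` argument are assumptions.

```python
import random

def dropout(values, p, training, rng):
    """Inverted dropout on a list of floats."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1) for this sketch")
    if not training or p == 0:
        return list(values)  # identity at evaluation time
    scale = 1.0 / (1.0 - p)  # rescale survivors so the expected value is unchanged
    return [0.0 if rng.random() < p else v * scale for v in values]

rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]
print(dropout(activations, p=0.5, training=True, rng=rng))
print(dropout(activations, p=0.5, training=False, rng=rng))
```

Because of the rescaling, a high rate such as 0.85 leaves few active units per pass and multiplies each survivor by roughly 6.7, which makes the training signal considerably noisier.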

#### Combined Model Architecture
We utilize a flexible GCN model that incorporates dropout, different aggregation strategies, and variable depth. This model aims to balance the trade-offs between model complexity, regularization, and representational power.
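
To make the aggregation strategies concrete, the following plain-Python sketch combines the scalar messages arriving from a node's neighbors using sum, mean, max, and a softmax-weighted sum with temperature `t`. It is a simplified illustration under the assumption of one scalar per neighbor; the helper `aggregate` is hypothetical and is not the PyTorch Geometric implementation.

```python
import math

def aggregate(messages, mode, t=1.0):
    """Combine scalar neighbor messages into one value."""
    if mode == "sum":
        return sum(messages)
    if mode == "mean":
        return sum(messages) / len(messages)
    if mode == "max":
        return max(messages)
    if mode == "softmax":
        # Weight each message by softmax(t * message), then take the weighted sum
        shift = max(t * m for m in messages)  # subtract the max for numerical stability
        weights = [math.exp(t * m - shift) for m in messages]
        total = sum(weights)
        return sum(w / total * m for w, m in zip(weights, messages))
    raise ValueError(f"unknown mode: {mode}")

neighbor_messages = [1.0, 2.0, 3.0]
for mode in ("sum", "mean", "max", "softmax"):
    print(mode, aggregate(neighbor_messages, mode))
```

In this sketch the softmax variant interpolates between the others: with `t = 0` it reduces to the mean, and as `t` grows it approaches the max.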

#### Model Initialization
We initialize three models with varying depths (2, 10, and 50 layers), each with a dropout rate of 50%:

- 2 Layers: Dropout rate of 50%, Softmax Aggregation.
- 10 Layers: Dropout rate of 50%, combination of 'max', 'sum', and 'mean' aggregations.
- 50 Layers: Dropout rate of 50%, advanced aggregation techniques (Softmax and Std Aggregation).

#### Training and Evaluation
Each model is trained and evaluated using the same dataset and training procedure. We record the training and validation losses and compute performance metrics such as precision, recall, F1 score, and log loss.
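
The metrics named above can be computed by hand on a toy example, which helps when interpreting the `average='macro'` and `average='weighted'` options used with scikit-learn in this notebook. The sketch below is a from-scratch illustration in plain Python with made-up labels; the helper names are hypothetical, and it only considers classes that occur in the true labels, which is a simplifying assumption.

```python
import math
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Return {label: F1} computed from scratch for each label in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def macro_and_weighted_f1(y_true, y_pred):
    """Macro: plain mean over classes. Weighted: mean weighted by class support."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    macro = sum(scores.values()) / len(scores)
    weighted = sum(scores[c] * support[c] for c in scores) / len(y_true)
    return macro, weighted

def log_loss(probs, y_true, eps=1e-9):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(max(row[t], eps)) for row, t in zip(probs, y_true)) / len(y_true)

# Made-up, imbalanced toy labels
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]
print(macro_and_weighted_f1(y_true, y_pred))
print(log_loss([[0.9, 0.1], [0.2, 0.8]], [0, 1]))
```

On imbalanced data the two averages can differ noticeably: the weighted score is dominated by the frequent classes, while the macro score treats every class equally.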

#### Performance Comparison
We compare the training and validation loss curves, as well as the performance metrics, for each model to understand the impact of combining dropout, aggregation methods, and varying depths on the overall model performance.

In [None]:
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
import matplotlib.pyplot as plt

class GCNWithCustomDropout(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, num_layers, dropout_rate=0.5):
        super(GCNWithCustomDropout, self).__init__()
        self.layers = torch.nn.ModuleList([GCNConv(in_channels, hidden_channels)])
        for _ in range(1, num_layers-1):
            self.layers.append(GCNConv(hidden_channels, hidden_channels))
        self.layers.append(GCNConv(hidden_channels, out_channels))
        self.dropout_rate = dropout_rate

    def forward(self, x, edge_index):
        for layer in self.layers[:-1]:
            x = F.relu(layer(x, edge_index))
            x = F.dropout(x, p=self.dropout_rate, training=self.training)
        x = self.layers[-1](x, edge_index)
        return F.log_softmax(x, dim=1)

def train_and_evaluate(model, data, device, optimizer, criterion, epochs):
    train_losses, val_losses = [], []
    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()
        out = model(data.x.to(device), data.edge_index.to(device))
        loss = criterion(out[data.train_mask], data.y[data.train_mask].to(device))
        loss.backward()
        optimizer.step()
        train_losses.append(loss.item())

        model.eval()
        with torch.no_grad():
            out = model(data.x.to(device), data.edge_index.to(device))
            val_loss = criterion(out[data.val_mask], data.y[data.val_mask].to(device))
            val_losses.append(val_loss.item())
    return train_losses, val_losses

# Initialize models with 50% dropout
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_2_layers = GCNWithCustomDropout(subset_graph.num_features, 64, dataset.num_classes, 2, dropout_rate=0.5).to(device)
model_10_layers = GCNWithCustomDropout(subset_graph.num_features, 64, dataset.num_classes, 10, dropout_rate=0.5).to(device)
model_50_layers = GCNWithCustomDropout(subset_graph.num_features, 64, dataset.num_classes, 50, dropout_rate=0.5).to(device)

# Create optimizers and criterion
optimizer_2 = torch.optim.Adam(model_2_layers.parameters(), lr=0.01)
optimizer_10 = torch.optim.Adam(model_10_layers.parameters(), lr=0.01)
optimizer_50 = torch.optim.Adam(model_50_layers.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()

# Train and evaluate each model
train_losses_2, val_losses_2 = train_and_evaluate(model_2_layers, subset_graph, device, optimizer_2, criterion, 100)
train_losses_10, val_losses_10 = train_and_evaluate(model_10_layers, subset_graph, device, optimizer_10, criterion, 100)
train_losses_50, val_losses_50 = train_and_evaluate(model_50_layers, subset_graph, device, optimizer_50, criterion, 100)

# Plot the losses
plot_losses(train_losses_2, val_losses_2, "Training and Validation Losses for 2 Layers with 50% Dropout")
plot_losses(train_losses_10, val_losses_10, "Training and Validation Losses for 10 Layers with 50% Dropout")
plot_losses(train_losses_50, val_losses_50, "Training and Validation Losses for 50 Layers with 50% Dropout")


In [None]:
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from sklearn.metrics import precision_recall_fscore_support
import matplotlib.pyplot as plt

class GCNWithCustomDropout(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, num_layers, dropout_rate=0.5):
        super(GCNWithCustomDropout, self).__init__()
        self.layers = torch.nn.ModuleList([GCNConv(in_channels, hidden_channels)])
        for _ in range(1, num_layers-1):
            self.layers.append(GCNConv(hidden_channels, hidden_channels))
        self.layers.append(GCNConv(hidden_channels, out_channels))
        self.dropout_rate = dropout_rate

    def forward(self, x, edge_index):
        for layer in self.layers[:-1]:
            x = F.relu(layer(x, edge_index))
            x = F.dropout(x, p=self.dropout_rate, training=self.training)
        x = self.layers[-1](x, edge_index)
        return F.log_softmax(x, dim=1)

def train_and_evaluate(model, data, device, optimizer, criterion, epochs):
    train_losses, val_losses = [], []
    best_metrics = {'precision': 0, 'recall': 0, 'f1_score': 0}

    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()
        out = model(data.x.to(device), data.edge_index.to(device))
        loss = criterion(out[data.train_mask], data.y[data.train_mask].to(device))
        loss.backward()
        optimizer.step()
        train_losses.append(loss.item())

        model.eval()
        with torch.no_grad():
            out = model(data.x.to(device), data.edge_index.to(device))
            val_loss = criterion(out[data.val_mask], data.y[data.val_mask].to(device))
            val_losses.append(val_loss.item())

            preds = torch.argmax(out[data.test_mask], dim=1)
            labels = data.y[data.test_mask].to(device)
            precision, recall, f1_score, _ = precision_recall_fscore_support(labels.cpu(), preds.cpu(), average='macro')

            # Update best metrics
            if f1_score > best_metrics['f1_score']:
                best_metrics.update({'precision': precision, 'recall': recall, 'f1_score': f1_score})

    return train_losses, val_losses, best_metrics

def evaluate_predictions(probs, labels):
    epsilon = 1e-9  # Small constant to prevent log(0)
    probs_clamped = torch.clamp(probs, epsilon, 1-epsilon)
    log_probs = torch.log(probs_clamped)
    loss_function = torch.nn.NLLLoss()
    log_loss = loss_function(log_probs, labels)
    return log_loss.item()

def plot_losses(train_losses, val_losses, best_metrics, title):
    plt.figure(figsize=(10, 5))
    plt.plot(train_losses, label='Train Loss')
    plt.plot(val_losses, label='Validation Loss', linestyle='--')
    plt.title(title)
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.ylim(0, max(max(train_losses), max(val_losses)) + 0.05)  # Adjust y-limit
    plt.legend()
    # plt.text(0.5, 0.1, f'Best Precision: {best_metrics["precision"]:.4f}\n'
    #                    f'Best Recall: {best_metrics["recall"]:.4f}\n'
    #                    f'Best F1 Score: {best_metrics["f1_score"]:.4f}', horizontalalignment='center',
    #          verticalalignment='center', transform=plt.gca().transAxes, fontsize=10, bbox=dict(facecolor='white', alpha=0.5))
    plt.show()

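# Initialize and train models as before, then plot with the new `plot_losses` function
# (the best-metrics annotation inside it is currently commented out).

# Initialize three 2-layer models with 0%, 25%, and 50% dropout.
# The variable names are carried over from the depth experiment above.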
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_2_layers = GCNWithCustomDropout(subset_graph.num_features, 64, dataset.num_classes, 2, dropout_rate=0).to(device)
model_10_layers = GCNWithCustomDropout(subset_graph.num_features, 64, dataset.num_classes, 2, dropout_rate=0.25).to(device)
model_50_layers = GCNWithCustomDropout(subset_graph.num_features, 64, dataset.num_classes, 2, dropout_rate=0.50).to(device)

# Create optimizers and criterion
optimizer_2 = torch.optim.Adam(model_2_layers.parameters(), lr=0.01)
optimizer_10 = torch.optim.Adam(model_10_layers.parameters(), lr=0.01)
optimizer_50 = torch.optim.Adam(model_50_layers.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()

# Train and evaluate each model
train_losses_2, val_losses_2, best_metrics_2 = train_and_evaluate(model_2_layers, subset_graph, device, optimizer_2, criterion, 12000)
train_losses_10, val_losses_10 , best_metrics_10= train_and_evaluate(model_10_layers, subset_graph, device, optimizer_10, criterion, 12000)
train_losses_50, val_losses_50 , best_metrics_50 = train_and_evaluate(model_50_layers, subset_graph, device, optimizer_50, criterion, 12000)

# Plot the losses with aligned axes
max_loss = max(max(train_losses_2 + val_losses_2), max(train_losses_10 + val_losses_10), max(train_losses_50 + val_losses_50))


# Plot the losses
plot_losses(train_losses_2, val_losses_2, best_metrics_2, "Training and Validation Losses for 2 Layers with 0% Dropout")
plot_losses(train_losses_10, val_losses_10, best_metrics_10, "Training and Validation Losses for 2 Layers with 25% Dropout")
plot_losses(train_losses_50, val_losses_50, best_metrics_50, "Training and Validation Losses for 2 Layers with 50% Dropout")


In [None]:
import torch
import torch.nn.functional as F
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

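def test_model(model, data, device):
    model.eval()
    with torch.no_grad():
        # The model returns log-probabilities (log_softmax), so exponentiate to get probabilities
        log_probs = model(data.x.to(device), data.edge_index.to(device))
        probs = log_probs.exp()
        preds = torch.argmax(probs, dim=1)
        # Report on the held-out test nodes only
        mask = data.test_mask.to(device)
        return preds[mask], data.y.to(device)[mask], probs[mask]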

def evaluate_predictions(probs, labels):
    epsilon = 1e-9  # Small constant to prevent log(0)
    probs_clamped = torch.clamp(probs, epsilon, 1-epsilon)
    log_probs = torch.log(probs_clamped)
    loss_function = torch.nn.NLLLoss()
    log_loss = loss_function(log_probs, labels)
    return log_loss.item()


# Testing and evaluation of models
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
preds_2, labels_2, probs_2 = test_model(model_2_layers, subset_graph, device)
preds_10, labels_10, probs_10 = test_model(model_10_layers, subset_graph, device)
preds_50, labels_50, probs_50 = test_model(model_50_layers, subset_graph, device)

log_loss_2 = evaluate_predictions(probs_2, labels_2)
log_loss_10 = evaluate_predictions(probs_10, labels_10)
log_loss_50 = evaluate_predictions(probs_50, labels_50)

# Compute metrics for each model
precision_2 = precision_score(labels_2.cpu().numpy(), preds_2.cpu().numpy(), average='weighted')
recall_2 = recall_score(labels_2.cpu().numpy(), preds_2.cpu().numpy(), average='weighted')
f1_2 = f1_score(labels_2.cpu().numpy(), preds_2.cpu().numpy(), average='weighted')
cm_2 = confusion_matrix(labels_2.cpu().numpy(), preds_2.cpu().numpy())

precision_10 = precision_score(labels_10.cpu().numpy(), preds_10.cpu().numpy(), average='weighted')
recall_10 = recall_score(labels_10.cpu().numpy(), preds_10.cpu().numpy(), average='weighted')
f1_10 = f1_score(labels_10.cpu().numpy(), preds_10.cpu().numpy(), average='weighted')
cm_10 = confusion_matrix(labels_10.cpu().numpy(), preds_10.cpu().numpy())

precision_50 = precision_score(labels_50.cpu().numpy(), preds_50.cpu().numpy(), average='weighted')
recall_50 = recall_score(labels_50.cpu().numpy(), preds_50.cpu().numpy(), average='weighted')
f1_50 = f1_score(labels_50.cpu().numpy(), preds_50.cpu().numpy(), average='weighted')
cm_50 = confusion_matrix(labels_50.cpu().numpy(), preds_50.cpu().numpy())

# Print metrics for each model
# The three models trained above are all 2-layer GCNs that differ only in dropout rate
print("2-Layer, 0% Dropout:")
print(f"Precision: {precision_2:.4f}, Recall: {recall_2:.4f}, F1 Score: {f1_2:.4f}, Log Loss: {log_loss_2:.4f}")
sns.heatmap(cm_2, annot=True, fmt="d", cmap="Blues")
plt.title('Confusion Matrix - 0% Dropout')
plt.show()

print("2-Layer, 25% Dropout:")
print(f"Precision: {precision_10:.4f}, Recall: {recall_10:.4f}, F1 Score: {f1_10:.4f}, Log Loss: {log_loss_10:.4f}")
sns.heatmap(cm_10, annot=True, fmt="d", cmap="Blues")
plt.title('Confusion Matrix - 25% Dropout')
plt.show()

print("2-Layer, 50% Dropout:")
print(f"Precision: {precision_50:.4f}, Recall: {recall_50:.4f}, F1 Score: {f1_50:.4f}, Log Loss: {log_loss_50:.4f}")
sns.heatmap(cm_50, annot=True, fmt="d", cmap="Blues")
plt.title('Confusion Matrix - 50% Dropout')
plt.show()


In [None]:
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from sklearn.metrics import precision_recall_fscore_support
import matplotlib.pyplot as plt

class GCNWithCustomDropout(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, num_layers, dropout_rate=0.5, aggr='mean'):
        super(GCNWithCustomDropout, self).__init__()
        # Forward `aggr` to every GCNConv so the chosen neighborhood aggregation is actually used
        # (GCNConv itself defaults to 'add' when no aggregation is given)
        self.layers = torch.nn.ModuleList([GCNConv(in_channels, hidden_channels, aggr=aggr)])
        for _ in range(1, num_layers-1):
            self.layers.append(GCNConv(hidden_channels, hidden_channels, aggr=aggr))
        self.layers.append(GCNConv(hidden_channels, out_channels, aggr=aggr))
        self.dropout_rate = dropout_rate

    def forward(self, x, edge_index):
        for layer in self.layers[:-1]:
            x = F.relu(layer(x, edge_index))
            x = F.dropout(x, p=self.dropout_rate, training=self.training)
        x = self.layers[-1](x, edge_index)
        return F.log_softmax(x, dim=1)

def train_and_evaluate(model, data, device, optimizer, criterion, epochs):
    """Full-batch training; returns per-epoch losses and the best test metrics seen."""
    train_losses, val_losses = [], []
    best_metrics = {'precision': 0, 'recall': 0, 'f1_score': 0}

    # Move the tensors to the device once instead of on every epoch
    x, edge_index, y = data.x.to(device), data.edge_index.to(device), data.y.to(device)
    train_mask = data.train_mask.to(device)
    val_mask = data.val_mask.to(device)
    test_mask = data.test_mask.to(device)

    for epoch in range(epochs):
        # One full-batch gradient step on the training nodes
        model.train()
        optimizer.zero_grad()
        out = model(x, edge_index)
        loss = criterion(out[train_mask], y[train_mask])
        loss.backward()
        optimizer.step()
        train_losses.append(loss.item())

        model.eval()
        with torch.no_grad():
            out = model(x, edge_index)
            val_loss = criterion(out[val_mask], y[val_mask])
            val_losses.append(val_loss.item())

            preds = torch.argmax(out[test_mask], dim=1)
            labels = y[test_mask]
            precision, recall, f1, _ = precision_recall_fscore_support(
                labels.cpu(), preds.cpu(), average='macro', zero_division=0)

            # Keep the metrics of the epoch with the best test F1.
            # NOTE: selecting on the test set gives an optimistic estimate;
            # use the validation set for model selection in a real experiment.
            if f1 > best_metrics['f1_score']:
                best_metrics.update({'precision': precision, 'recall': recall, 'f1_score': f1})

    return train_losses, val_losses, best_metrics

def plot_losses(train_losses, val_losses, best_metrics, title, y_max=None):
    """Plot training and validation loss curves.

    `best_metrics` is only used by the optional annotation that is commented out below.
    Pass the same `y_max` to several calls to align their y-axes.
    """
    plt.figure(figsize=(10, 5))
    plt.plot(train_losses, label='Train Loss')
    plt.plot(val_losses, label='Validation Loss', linestyle='--')
    plt.title(title)
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    if y_max is None:
        y_max = max(max(train_losses), max(val_losses))
    plt.ylim(0, y_max + 0.05)  # Small headroom above the largest loss
    plt.legend()
    # plt.text(0.5, 0.1, f'Best Precision: {best_metrics["precision"]:.4f}\n'
    #                    f'Best Recall: {best_metrics["recall"]:.4f}\n'
    #                    f'Best F1 Score: {best_metrics["f1_score"]:.4f}', horizontalalignment='center',
    #          verticalalignment='center', transform=plt.gca().transAxes, fontsize=10, bbox=dict(facecolor='white', alpha=0.5))
    plt.show()

# Initialize three 2-layer models with 50% dropout and different `aggr` settings.
# The variable names (model_2_layers, model_10_layers, model_50_layers) are carried over
# from the earlier depth experiment and are kept because the evaluation cell below uses them.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_2_layers = GCNWithCustomDropout(
    subset_graph.num_features, 64, dataset.num_classes, 2,
    dropout_rate=0.5, aggr=SoftmaxAggregation()).to(device)
model_10_layers = GCNWithCustomDropout(
    subset_graph.num_features, 64, dataset.num_classes, 2,
    dropout_rate=0.5, aggr=['max', 'sum', 'mean']).to(device)
model_50_layers = GCNWithCustomDropout(
    subset_graph.num_features, 64, dataset.num_classes, 2,
    dropout_rate=0.5, aggr=[SoftmaxAggregation(), StdAggregation()]).to(device)

# Create optimizers and criterion.
# The model already returns log-probabilities; CrossEntropyLoss applies log_softmax again,
# which leaves normalized log-probabilities unchanged, so this is equivalent to NLLLoss.
optimizer_2 = torch.optim.Adam(model_2_layers.parameters(), lr=0.005)
optimizer_10 = torch.optim.Adam(model_10_layers.parameters(), lr=0.005)
optimizer_50 = torch.optim.Adam(model_50_layers.parameters(), lr=0.005)
criterion = torch.nn.CrossEntropyLoss()

# Train and evaluate each model
train_losses_2, val_losses_2, best_metrics_2 = train_and_evaluate(model_2_layers, subset_graph, device, optimizer_2, criterion, 12000)
train_losses_10, val_losses_10, best_metrics_10 = train_and_evaluate(model_10_layers, subset_graph, device, optimizer_10, criterion, 12000)
train_losses_50, val_losses_50, best_metrics_50 = train_and_evaluate(model_50_layers, subset_graph, device, optimizer_50, criterion, 12000)

# Use one shared y-axis limit so the three plots are directly comparable
max_loss = max(max(train_losses_2 + val_losses_2), max(train_losses_10 + val_losses_10), max(train_losses_50 + val_losses_50))

# Plot the losses
plot_losses(train_losses_2, val_losses_2, best_metrics_2, "Training and Validation Losses: 2 Layers, 50% Dropout, aggr=SoftmaxAggregation", y_max=max_loss)
plot_losses(train_losses_10, val_losses_10, best_metrics_10, "Training and Validation Losses: 2 Layers, 50% Dropout, aggr=['max', 'sum', 'mean']", y_max=max_loss)
plot_losses(train_losses_50, val_losses_50, best_metrics_50, "Training and Validation Losses: 2 Layers, 50% Dropout, aggr=[SoftmaxAggregation, StdAggregation]", y_max=max_loss)
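
The cell above passes three different `aggr` settings: a single aggregator, a list of basic aggregators, and a list of learnable or statistical ones. As a rough illustration of what the basic aggregators compute, the sketch below applies sum, mean, and max to the neighbors of one node in a toy graph. It is plain Python, not PyTorch Geometric code; the graph, the feature values, and the helper name `aggregate` are all hypothetical. Combining several aggregators by concatenation, as done at the end, is only one possible choice.

```python
# Toy illustration of neighborhood aggregation (not PyTorch Geometric code).
# The graph, the features, and the helper `aggregate` are hypothetical.

def aggregate(features, neighbors, mode):
    """Combine the feature vectors of `neighbors` element-wise."""
    # Transpose so that each tuple holds one feature dimension across all neighbors
    columns = list(zip(*(features[n] for n in neighbors)))
    if mode == "sum":
        return [sum(col) for col in columns]
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown aggregation mode: {mode}")

# Node id -> 2-dimensional feature vector
features = {
    0: [1.0, 0.0],
    1: [3.0, 2.0],
    2: [5.0, 4.0],
}
neighbors_of_0 = [1, 2]

agg_sum = aggregate(features, neighbors_of_0, "sum")
agg_mean = aggregate(features, neighbors_of_0, "mean")
agg_max = aggregate(features, neighbors_of_0, "max")

# One way to combine several aggregators: concatenate their outputs
multi = agg_max + agg_sum + agg_mean

print("sum: ", agg_sum)
print("mean:", agg_mean)
print("max: ", agg_max)
print("concatenated:", multi)
```

Note that concatenating three aggregations triples the width of the aggregated vector, so whatever consumes it must expect the larger size.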


In [None]:
import torch
import torch.nn.functional as F
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

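def test_model(model, data, device):
    """Return predictions, labels, and class probabilities for the test nodes only."""
    model.eval()
    with torch.no_grad():
        # The model returns log-probabilities; softmax over them recovers the probabilities
        log_probs = model(data.x.to(device), data.edge_index.to(device))
        probs = F.softmax(log_probs, dim=1)
        preds = torch.argmax(probs, dim=1)
    # Restrict to held-out test nodes so training nodes do not inflate the metrics
    test_mask = data.test_mask.to(device)
    return preds[test_mask], data.y.to(device)[test_mask], probs[test_mask]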

def evaluate_predictions(probs, labels):
    epsilon = 1e-9  # Small constant to prevent log(0)
    probs_clamped = torch.clamp(probs, epsilon, 1-epsilon)
    log_probs = torch.log(probs_clamped)
    loss_function = torch.nn.NLLLoss()
    log_loss = loss_function(log_probs, labels)
    return log_loss.item()


# Testing and evaluation of models
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
preds_2, labels_2, probs_2 = test_model(model_2_layers, subset_graph, device)
preds_10, labels_10, probs_10 = test_model(model_10_layers, subset_graph, device)
preds_50, labels_50, probs_50 = test_model(model_50_layers, subset_graph, device)

log_loss_2 = evaluate_predictions(probs_2, labels_2)
log_loss_10 = evaluate_predictions(probs_10, labels_10)
log_loss_50 = evaluate_predictions(probs_50, labels_50)

# Compute metrics for each model
precision_2 = precision_score(labels_2.cpu().numpy(), preds_2.cpu().numpy(), average='weighted')
recall_2 = recall_score(labels_2.cpu().numpy(), preds_2.cpu().numpy(), average='weighted')
f1_2 = f1_score(labels_2.cpu().numpy(), preds_2.cpu().numpy(), average='weighted')
cm_2 = confusion_matrix(labels_2.cpu().numpy(), preds_2.cpu().numpy())

precision_10 = precision_score(labels_10.cpu().numpy(), preds_10.cpu().numpy(), average='weighted')
recall_10 = recall_score(labels_10.cpu().numpy(), preds_10.cpu().numpy(), average='weighted')
f1_10 = f1_score(labels_10.cpu().numpy(), preds_10.cpu().numpy(), average='weighted')
cm_10 = confusion_matrix(labels_10.cpu().numpy(), preds_10.cpu().numpy())

precision_50 = precision_score(labels_50.cpu().numpy(), preds_50.cpu().numpy(), average='weighted')
recall_50 = recall_score(labels_50.cpu().numpy(), preds_50.cpu().numpy(), average='weighted')
f1_50 = f1_score(labels_50.cpu().numpy(), preds_50.cpu().numpy(), average='weighted')
cm_50 = confusion_matrix(labels_50.cpu().numpy(), preds_50.cpu().numpy())

# Print metrics for each model. All three are 2-layer GCNs with 50% dropout;
# they differ only in the `aggr` setting passed at construction time.
print("aggr=SoftmaxAggregation:")
print(f"Precision: {precision_2:.4f}, Recall: {recall_2:.4f}, F1 Score: {f1_2:.4f}, Log Loss: {log_loss_2:.4f}")
sns.heatmap(cm_2, annot=True, fmt="d", cmap="Blues")
plt.title('Confusion Matrix - aggr=SoftmaxAggregation')
plt.show()

print("aggr=['max', 'sum', 'mean']:")
print(f"Precision: {precision_10:.4f}, Recall: {recall_10:.4f}, F1 Score: {f1_10:.4f}, Log Loss: {log_loss_10:.4f}")
sns.heatmap(cm_10, annot=True, fmt="d", cmap="Blues")
plt.title("Confusion Matrix - aggr=['max', 'sum', 'mean']")
plt.show()

print("aggr=[SoftmaxAggregation, StdAggregation]:")
print(f"Precision: {precision_50:.4f}, Recall: {recall_50:.4f}, F1 Score: {f1_50:.4f}, Log Loss: {log_loss_50:.4f}")
sns.heatmap(cm_50, annot=True, fmt="d", cmap="Blues")
plt.title('Confusion Matrix - aggr=[SoftmaxAggregation, StdAggregation]')
plt.show()
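
The `evaluate_predictions` helper above reports log loss: the mean negative log-probability that the model assigns to the true class, with probabilities clamped away from 0 and 1 so the logarithm stays finite. The sketch below reproduces that calculation in plain Python on a few hand-written rows. The function name `log_loss` and the toy probabilities are hypothetical and are not taken from the notebook's data.

```python
import math

def log_loss(probs, labels, eps=1e-9):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for row, label in zip(probs, labels):
        # Clamp to [eps, 1 - eps] so that log(0) can never occur
        p = min(max(row[label], eps), 1 - eps)
        total += -math.log(p)
    return total / len(labels)

# Three nodes, three classes (hypothetical values)
probs = [
    [0.50, 0.25, 0.25],  # true class 0 gets probability 0.5
    [0.00, 1.00, 0.00],  # true class 1 gets probability 1.0 (a confident, correct prediction)
    [0.25, 0.25, 0.50],  # true class 0 gets probability 0.25
]
labels = [0, 1, 0]

loss = log_loss(probs, labels)
print(f"log loss: {loss:.4f}")

# A confident but wrong prediction is penalized heavily, bounded only by the clamp
worst_case = log_loss([[0.0, 1.0]], [0])
print(f"confident and wrong: {worst_case:.4f}")
```

Because of the clamp, a single fully confident mistake contributes at most `-log(eps)` to the sum instead of infinity, which keeps the average usable as a comparison metric.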


In [None]:
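# gcn_model and graphsage_model are assumed to have been trained in earlier cells.
# Switch both to evaluation mode so that dropout is disabled.
gcn_model.eval()
graphsage_model.eval()

# Extract node embeddings from the GCN (return_embeds=True is assumed to make
# the forward pass return hidden representations instead of class scores)
with torch.no_grad():
    embeddings = gcn_model(subset_graph.x, subset_graph.edge_index, return_embeds=True)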


In [None]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Compute the pairwise cosine similarity matrix of the GCN embeddings
similarity_matrix = cosine_similarity(embeddings.cpu().numpy())

# Example usage: find the top-k most similar items to a given product node
product_idx = 123  # Replace with the index of the product node
top_k = 6

# Rank all nodes by descending similarity to the query node.
# The query node has similarity 1 with itself, so it normally appears first in the result.
top_k_similar_indices = np.argsort(-similarity_matrix[product_idx])[:top_k]
print(f"Top {top_k} similar products to product {product_idx} according to GCN:")
print(top_k_similar_indices)
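
Since every node is maximally similar to itself, a top-k lookup like the one above usually returns the query node as its first hit. The following sketch shows one way to exclude it. It is written in plain Python on toy 2-dimensional embeddings; the helper names `cosine` and `top_k_similar` and the vectors themselves are hypothetical, and the code assumes no embedding is the zero vector.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k_similar(embeddings, query_idx, k):
    """Indices of the k nodes most similar to `query_idx`, excluding the query itself."""
    scores = [
        (cosine(embeddings[query_idx], emb), idx)
        for idx, emb in enumerate(embeddings)
        if idx != query_idx  # skip the query node
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scores[:k]]

# Hypothetical embeddings for five nodes
toy_embeddings = [
    [1.0, 0.0],   # node 0: the query
    [0.9, 0.1],   # node 1: almost the same direction as the query
    [0.0, 1.0],   # node 2: orthogonal to the query
    [-1.0, 0.0],  # node 3: opposite direction
    [0.5, 0.5],   # node 4: 45 degrees away from the query
]

result = top_k_similar(toy_embeddings, query_idx=0, k=2)
print(result)
```

On the full similarity matrix, a comparable effect can be had by requesting `top_k + 1` indices and dropping the query index from the result.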