
# Graded Assessment -- AAI 6350 Recommender Systems Course --

# Part 1: Recommendation System Using GCNN [weight: 40\%]

# Step 1: Data Preparation
- Load the Data: Read the Excel file and extract the relevant columns (CustomerID, StockCode, Quantity).
- Data Cleaning: Ensure there are no missing values in the relevant columns.
- Create Interaction Matrix: Construct an adjacency matrix where rows represent customers and columns represent items. The values in the matrix will be the quantities purchased.

In [None]:
import pandas as pd
import numpy as np

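# Load the dataset and keep only the columns used below
data = pd.read_excel("/content/Rec_sys_data.xlsx")
data = data[['CustomerID', 'StockCode', 'Quantity']]

# Data cleaning: drop rows with missing values in the relevant columns
data = data.dropna(subset=['CustomerID', 'StockCode', 'Quantity'])

# Pivot into a customers x items matrix of total quantities purchased.
# aggfunc='sum' adds up repeated purchases (the default would average them).
interaction_matrix = data.pivot_table(index='CustomerID', columns='StockCode', values='Quantity', aggfunc='sum', fill_value=0)

# Convert to a NumPy array for further processing
interaction_matrix = interaction_matrix.values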
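
The aggregation step above can also be sketched without pandas. The snippet below is a minimal illustration on a hypothetical five-row transaction log (the IDs and quantities are made up for the example); it maps raw IDs to contiguous indices and sums the quantities of repeated (customer, item) pairs, which is what "quantities purchased" suggests the matrix should hold.

```python
import numpy as np

# Toy transaction log (hypothetical values): one row per purchase line.
customer_ids = np.array([17850, 17850, 13047, 13047, 12583])
stock_codes = np.array(["84029E", "71053", "84029E", "84029E", "22752"])
quantities = np.array([6, 8, 2, 3, 4])

# Map raw IDs to contiguous row/column indices (np.unique sorts the IDs).
customers_toy, row_idx = np.unique(customer_ids, return_inverse=True)
items_toy, col_idx = np.unique(stock_codes, return_inverse=True)

# Sum quantities per (customer, item) pair; np.add.at accumulates repeated pairs.
interaction_toy = np.zeros((len(customers_toy), len(items_toy)), dtype=np.int64)
np.add.at(interaction_toy, (row_idx, col_idx), quantities)

print(customers_toy)     # [12583 13047 17850]
print(interaction_toy)   # customer 13047 bought item "84029E" twice: 2 + 3 = 5
```

Plain fancy-index assignment (`matrix[rows, cols] += q`) would count a repeated pair only once, which is why `np.add.at` is used here.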

# Step 2: Graph Construction [25 points]
- Graph Representation: Each customer and item will be a node in the graph. An edge exists between a customer and an item if the customer has purchased that item.
- Adjacency Matrix: Create an adjacency matrix where the rows represent customers and the columns represent items.

In [None]:
# Get unique customers and items (StockCode)
customers = data['CustomerID'].unique()
items = data['StockCode'].unique()

# Create mappings from customer/item IDs to matrix indices
customer_to_index = {customer: i for i, customer in enumerate(customers)}
item_to_index = {item: j for j, item in enumerate(items)}

# Initialize the adjacency matrix with zeros
num_customers = len(customers)
num_items = len(items)
adjacency_matrix = np.zeros((num_customers, num_items), dtype=int)

# Populate the adjacency matrix: 1 if a customer purchased an item, 0 otherwise
for index, row in data.iterrows():
    customer_id = row['CustomerID']
    item_id = row['StockCode']
    if customer_id in customer_to_index and item_id in item_to_index:
        customer_index = customer_to_index[customer_id]
        item_index = item_to_index[item_id]
        adjacency_matrix[customer_index, item_index] = 1

# Print the shape of the adjacency matrix and a few rows to verify
print("Shape of the Adjacency Matrix (Customers x Items):", adjacency_matrix.shape)
print("\nFirst 5 rows of the Adjacency Matrix:")
print(adjacency_matrix[:5])

# Create a DataFrame for better visualization if needed
adjacency_df = pd.DataFrame(adjacency_matrix, index=customers, columns=items)
print("\nAdjacency DataFrame (first 5 rows and columns):")
print(adjacency_df.iloc[:5, :5])

Shape of the Adjacency Matrix (Customers x Items): (3647, 3538)

First 5 rows of the Adjacency Matrix:
[[1 1 1 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]

Adjacency DataFrame (first 5 rows and columns):
       84029E  71053  21730  84406B  22752
17850       1      1      1       1      1
13047       0      0      0       0      0
12583       0      0      0       0      0
13748       0      0      0       0      0
15100       0      0      0       0      0
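
The customers-by-items matrix above is only one block of the full graph adjacency. As a small sketch (toy 2 x 3 matrix, hypothetical values), the adjacency over all customer and item nodes can be assembled as a block matrix whose customer-customer and item-item blocks are zero, since the graph is bipartite:

```python
import numpy as np

# Toy customer-by-item matrix (hypothetical): 2 customers, 3 items.
B = np.array([[1, 0, 1],
              [0, 1, 1]])
n_customers, n_items = B.shape

# Full adjacency over all nodes: customers first, then items.
A_full = np.block([
    [np.zeros((n_customers, n_customers), dtype=int), B],
    [B.T, np.zeros((n_items, n_items), dtype=int)],
])

print(A_full.shape)                       # (5, 5)
print(np.array_equal(A_full, A_full.T))   # True: the graph is undirected
print(A_full.sum() // 2 == B.sum())       # True: each purchase is one edge
```

For the real data (3647 customers, 3538 items) a dense block matrix of this kind would be large, so an edge list or sparse format is usually preferable.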


# Step 3: Model Definition (GCNN) [35 points]
- Define the GCNN Architecture: Use a library like PyTorch Geometric or TensorFlow with Keras to define the GCNN model.
- The model will consist of graph convolutional layers that learn representations for both customers and items.
- Prepare Data for Training: Convert the adjacency matrix and features into a format suitable for the GCNN.

In [None]:
import torch

!pip uninstall -y torch-scatter torch-sparse torch-geometric torch-cluster
!pip install torch-scatter -f https://data.pyg.org/whl/torch-{torch.__version__}.html
!pip install torch-sparse -f https://data.pyg.org/whl/torch-{torch.__version__}.html
!pip install torch-cluster -f https://data.pyg.org/whl/torch-{torch.__version__}.html
!pip install git+https://github.com/pyg-team/pytorch_geometric.git


In [None]:
import torch
from torch_geometric.data import Data

# Convert adjacency matrix to edge list (source and target).
# Customer nodes: 0 to num_customers-1; item nodes: num_customers to num_customers+num_items-1.
edge_customers, edge_items = np.nonzero(adjacency_matrix)  # row-major order, same as a nested loop
edges = torch.tensor(np.stack([edge_customers, num_customers + edge_items]), dtype=torch.long)


# If you don't have additional features, you can just use ones as dummy features
customer_features = torch.ones(num_customers, 1)  # Example: all customers having the same feature
item_features = torch.ones(num_items, 1)  # Example: all items having the same feature
node_features = torch.cat([customer_features, item_features], dim=0)  # Concatenate customer and item features

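# Create the Data object (PyTorch Geometric format).
# Note: this rebinds `data`, which until now held the pandas DataFrame.
# Edges are stored in one direction only (customer -> item).
data = Data(x=node_features, edge_index=edges)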
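
The edge list above contains customer -> item edges only, so message passing flows toward item nodes. A hedged sketch of how both directions could be produced is shown below on a toy matrix (NumPy only; the variable names are illustrative). The resulting array could be wrapped with `torch.tensor(..., dtype=torch.long)`, and PyTorch Geometric also offers `torch_geometric.utils.to_undirected` for the same purpose.

```python
import numpy as np

# Toy customer-by-item matrix (hypothetical): 2 customers, 3 items.
B = np.array([[1, 0, 1],
              [0, 1, 1]])
n_customers = B.shape[0]

cust, item = np.nonzero(B)        # positive entries in row-major order
src = cust
dst = n_customers + item          # item nodes are numbered after the customers

forward = np.stack([src, dst])    # customer -> item
# Append the reversed edges (item -> customer) to make the graph undirected.
undirected = np.concatenate([forward, forward[::-1]], axis=1)

print(forward)            # [[0 0 1 1]
                          #  [2 4 3 4]]
print(undirected.shape)   # (2, 8)
```

Later cells of this notebook assume that row 0 of `edge_index` holds customers and row 1 holds items, so labels would still have to be built from the forward edges only if this variant were adopted.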


In [None]:
import torch
import torch.nn as nn
import torch_geometric.nn as pyg_nn

class GCNN(nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super(GCNN, self).__init__()
        self.conv1 = pyg_nn.GCNConv(in_channels, hidden_channels)
        self.conv2 = pyg_nn.GCNConv(hidden_channels, out_channels)
        self.fc = nn.Linear(out_channels, 1)  # Output a single prediction (e.g., purchase likelihood)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index

        # Apply first convolutional layer
        x = self.conv1(x, edge_index)
        x = torch.relu(x)

        # Apply second convolutional layer
        x = self.conv2(x, edge_index)
        x = torch.relu(x)

        # Apply a fully connected layer (optional)
        x = self.fc(x)

        return x
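
To make the graph convolution less of a black box, here is a minimal NumPy sketch of the standard GCN propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W), applied to a three-node path graph. It is an illustration of the rule on a symmetric adjacency matrix, not a re-implementation of `GCNConv`; the weight value is hypothetical.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])      # add self-loops
    deg = A_hat.sum(axis=1)             # degree including the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

# Path graph 0 - 1 - 2 (symmetric adjacency).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.ones((3, 1))      # constant node features, as in the notebook
W = np.array([[1.0]])    # hypothetical weight

out = gcn_layer(A, H, W)
print(out.ravel())       # nodes 0 and 2 receive identical values
```

With constant input features the output depends only on the degree structure, so structurally equivalent nodes cannot be told apart. This is one reason learned node embeddings or real features are often used instead of all-ones inputs.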


In [None]:
from torch.optim import Adam

# Set model, optimizer, and loss function
model = GCNN(in_channels=1, hidden_channels=64, out_channels=32)  # Example channel sizes
optimizer = Adam(model.parameters(), lr=0.01)
criterion = nn.BCEWithLogitsLoss()  # Use BCEWithLogitsLoss for binary classification (purchase/no-purchase)

# Example training loop
epochs = 100
for epoch in range(epochs):
    model.train()
    optimizer.zero_grad()

    # Forward pass
    out = model(data)

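    # Get edge indices
    edge_index = data.edge_index

    # Create target tensor based on edges.
    # Every edge is an observed purchase, so all targets equal 1 (no negative examples).
    target = torch.tensor(adjacency_matrix[edge_index[0], edge_index[1] - num_customers], dtype=torch.float)

    # Compute the loss. `out` holds one score per node, so out[edge_index[0]]
    # selects the source (customer) node's score for each edge.
    loss = criterion(out[edge_index[0]].view(-1), target.view(-1))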

    # Backpropagate
    loss.backward()
    optimizer.step()

    if epoch % 10 == 0:
        print(f'Epoch {epoch}, Loss: {loss.item()}')


Epoch 0, Loss: 0.7085666656494141
Epoch 10, Loss: 0.09228347986936569
Epoch 20, Loss: 0.000785060110501945
Epoch 30, Loss: 2.6711670216172934e-05
Epoch 40, Loss: 4.004476068075746e-06
Epoch 50, Loss: 9.940775953509728e-07
Epoch 60, Loss: 3.4912361002170655e-07
Epoch 70, Loss: 1.7277960751016508e-07
Epoch 80, Loss: 1.109426577272643e-07
Epoch 90, Loss: 8.105943294367535e-08
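
The loss above falls to nearly zero because every training target is 1: the model only has to output a large logit everywhere. A common remedy for binary purchase prediction is to add sampled non-purchases as negative examples. The sketch below shows one simple way to do that on a toy matrix (NumPy only; enumerating all zero entries is an assumption that suits small matrices, and the names are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy customer-by-item matrix (hypothetical): 5 purchases, 7 non-purchases.
B = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 1]])

pos_cust, pos_item = np.nonzero(B)                # observed purchases, label 1
neg_cust_all, neg_item_all = np.nonzero(B == 0)   # unobserved pairs

# Draw as many negatives as positives, without replacement.
pick = rng.choice(len(neg_cust_all), size=len(pos_cust), replace=False)
neg_cust, neg_item = neg_cust_all[pick], neg_item_all[pick]

# One row per (customer, item) training pair, positives first.
pairs = np.stack([np.concatenate([pos_cust, neg_cust]),
                  np.concatenate([pos_item, neg_item])], axis=1)
labels = np.concatenate([np.ones(len(pos_cust)), np.zeros(len(neg_cust))])

print(pairs.shape, labels.mean())   # (10, 2) 0.5
```

With balanced labels, a model that predicts 1 for everything no longer achieves a near-zero loss, so the loss curve becomes more informative.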


In [None]:
model.eval()
with torch.no_grad():
    predictions = model(data)
    predictions = torch.sigmoid(predictions).view(-1)  # Convert logits to probabilities


# Step 4: Training the Model [40 points]

- Loss Function: Use a suitable loss function, such as Mean Squared Error (MSE) as we are working with continuous interaction scores.
- Optimizer: Choose an optimizer like Adam or SGD.
- Training Loop: Implement the training loop to update the model weights based on the loss. In each epoch, calculate the predictions using the model, compute the loss between predicted and actual values, and perform backpropagation to update the model's weights.
- Also compute the validation loss to evaluate the model's performance on unseen data, and use early stopping to halt training when the validation loss stops improving, preventing overfitting.

In [None]:
import torch
import torch.nn as nn
from torch.optim import Adam
from sklearn.model_selection import train_test_split
from torch_geometric.data import Data
import numpy as np
import torch_geometric.nn as pyg_nn

# 1. Prepare the data with proper train/val split
# Convert edge indices and features to numpy first
edge_index_np = data.edge_index.numpy().T
edge_labels = torch.tensor(adjacency_matrix[edge_index_np[:, 0], edge_index_np[:, 1] - num_customers], dtype=torch.float)

# Split edges into train and validation sets
train_idx, val_idx = train_test_split(
    np.arange(edge_index_np.shape[0]),
    test_size=0.2,
    random_state=42
)

train_edge_index = torch.tensor(edge_index_np[train_idx], dtype=torch.long).t().contiguous()
val_edge_index = torch.tensor(edge_index_np[val_idx], dtype=torch.long).t().contiguous()

train_labels = edge_labels[train_idx]
val_labels = edge_labels[val_idx]

# 2. Redefine the model for regression on interaction scores
class GCNN(nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super(GCNN, self).__init__()
        self.conv1 = pyg_nn.GCNConv(in_channels, hidden_channels)
        self.conv2 = pyg_nn.GCNConv(hidden_channels, out_channels)
        self.fc = nn.Linear(out_channels, 1)

    def forward(self, data, edge_index=None):
        x, full_edge_index = data.x, data.edge_index
        edge_index = full_edge_index if edge_index is None else edge_index

        x = self.conv1(x, full_edge_index)
        x = torch.relu(x)
        x = self.conv2(x, full_edge_index)
        x = torch.relu(x)
        return self.fc(x)

# 3. Initialize model, optimizer, and loss
model = GCNN(in_channels=1, hidden_channels=64, out_channels=32)
optimizer = Adam(model.parameters(), lr=0.01)
criterion = nn.MSELoss()  # For continuous interaction scores

# 4. Training loop with early stopping
best_val_loss = float('inf')
patience = 5
patience_counter = 0

for epoch in range(100):
    model.train()
    optimizer.zero_grad()

    # Forward pass: the model returns one score per node
    # (the second argument is accepted but not used by forward)
    preds = model(data, train_edge_index)

    # For each training edge, take the score of its source (customer) node
    train_preds = preds[train_edge_index[0]].view(-1)

    # Training loss on the training edges
    train_loss = criterion(train_preds, train_labels)
    # Backpropagation
    train_loss.backward()
    optimizer.step()
    # Validation
    model.eval()
    with torch.no_grad():
        val_preds = model(data, val_edge_index)  # Node scores, computed on the full graph
        val_loss = criterion(val_preds[val_edge_index[0]].view(-1), val_labels)  # Compare source-node scores with the validation labels
    # Early stopping check
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        patience_counter = 0
    else:
        patience_counter += 1
        if patience_counter >= patience:
            print(f"Early stopping at epoch {epoch}")
            break

    if epoch % 5 == 0:
        print(f'Epoch {epoch}: Train Loss: {train_loss.item():.4f}, Val Loss: {val_loss.item():.4f}')


Epoch 0: Train Loss: 0.7561, Val Loss: 0.5544
Epoch 5: Train Loss: 0.0045, Val Loss: 0.0230
Early stopping at epoch 9
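
The early-stopping rule used in the loop above can be isolated into a small helper, which makes it easy to check on a known loss curve. This is a sketch: the function name, the `min_delta` parameter, and the loss values are illustrative and not part of the notebook.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    stop_epoch is None if the patience budget is never exhausted.
    """
    best_loss = float("inf")
    best_epoch = None
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Improvement: remember it and reset the patience counter.
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return None, best_epoch

# Hypothetical loss curve: improves for three epochs, then plateaus.
losses = [0.55, 0.20, 0.05, 0.06, 0.07, 0.06, 0.08, 0.09]
print(early_stopping_epoch(losses, patience=3))   # (5, 2)
```

Because training continues for `patience` epochs after the best one, the final weights are not the best weights. Saving a copy of `model.state_dict()` at each improvement and reloading it after the loop is a common way to recover them.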


# Part 2: Recommendation System Evaluation and Comparison Using GCNN and NeuMF Models [weight: 30\%]

# Step 1: Evaluation [40 points]

To calculate the average precision, recall, and F1 score for all customers, follow these steps:

- Obtain Model Predictions: Use the trained model to predict interaction scores for all customer-item pairs in the validation set.

- Rank Items by Predicted Scores: For each customer, rank items based on the predicted interaction scores in descending order.

- Define Relevant Items: Set a threshold to determine which items are considered relevant (e.g., based on the top-k predictions or actual interactions greater than zero).

- Calculate Precision, Recall, and F1 Score for Each Customer: For each customer, calculate precision, recall, and F1 score using the relevant predicted and actual items.

- Compute Average Precision, Recall, and F1 Score: Calculate the mean of precision, recall, and F1 scores across all customers.
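
As a small worked example of the per-customer metrics described in the steps above, the sketch below computes Precision@k, Recall@k, and F1@k for one ranked list. The function name and the item labels are hypothetical.

```python
def precision_recall_f1_at_k(ranked_items, relevant_items, k):
    """Precision, recall and F1 of the first k ranked items against a relevant set."""
    top_k = ranked_items[:k]
    relevant = set(relevant_items)
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

ranked = ["A", "B", "C", "D", "E"]   # items sorted by predicted score
relevant = {"B", "E", "Z"}           # items the customer actually bought

p, r, f = precision_recall_f1_at_k(ranked, relevant, k=4)
print(round(p, 4), round(r, 4), round(f, 4))   # 0.25 0.3333 0.2857
```

Only "B" is among the top four, so precision is 1/4 and recall is 1/3. Averaging these values over all customers gives the mean metrics asked for in the last step.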

In [None]:
model.eval()
with torch.no_grad():
    output = model(data).view(-1)  # Get raw scores


In [None]:
from collections import defaultdict

# Dictionaries to store predictions and ground truth per customer
predictions_per_customer = defaultdict(list)
actuals_per_customer = defaultdict(set)

# Collect (item, score) pairs and ground truth for every validation edge
for i in range(val_edge_index.shape[1]):
    customer_idx = val_edge_index[0, i].item()
    item_idx = val_edge_index[1, i].item() - num_customers  # Convert to item index

    # Score of the customer node; it is the same for every item of this customer
    score = output[val_edge_index[0, i]].item()
    actual = adjacency_matrix[customer_idx, item_idx]

    predictions_per_customer[customer_idx].append((item_idx, score))
    if actual > 0:
        actuals_per_customer[customer_idx].add(item_idx)


In [None]:
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 of recommended items `y_pred` against the relevant set `y_true`."""
    if not y_pred:
        return 0.0, 0.0, 0.0

    tp = sum(1 for item in y_pred if item in y_true)  # Recommended items that are relevant
    precision = tp / len(y_pred)
    recall = tp / len(y_true) if y_true else 0.0
    f1 = (2 * precision * recall) / (precision + recall + 1e-8)  # Avoid divide by zero
    return precision, recall, f1

# Store all scores
precision_list = []
recall_list = []
f1_list = []

k = 10  # Top-k items to recommend

for customer, predictions in predictions_per_customer.items():
    predictions_sorted = sorted(predictions, key=lambda x: x[1], reverse=True)
    top_k_items = [item for item, score in predictions_sorted[:k]]
    actual_items = actuals_per_customer[customer]

    p, r, f = precision_recall_f1(actual_items, top_k_items)
    precision_list.append(p)
    recall_list.append(r)
    f1_list.append(f)

# Compute averages
avg_precision = np.mean(precision_list)
avg_recall = np.mean(recall_list)
avg_f1 = np.mean(f1_list)

print(f"Average Precision@{k}: {avg_precision:.4f}")
print(f"Average Recall@{k}: {avg_recall:.4f}")
print(f"Average F1 Score@{k}: {avg_f1:.4f}")


Average Precision@10: 1.0000
Average Recall@10: 0.8408
Average F1 Score@10: 0.8872


In [None]:
import torch
import torch.nn as nn
import torch_geometric.nn as pyg_nn

class GCNN(nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super(GCNN, self).__init__()
        self.conv1 = pyg_nn.GCNConv(in_channels, hidden_channels)
        self.conv2 = pyg_nn.GCNConv(hidden_channels, out_channels)
        self.fc = nn.Linear(out_channels, 1)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = torch.relu(self.conv1(x, edge_index))
        x = torch.relu(self.conv2(x, edge_index))
        x = self.fc(x)
        return x


In [None]:
from collections import defaultdict
import numpy as np

def evaluate_gcnn(model, data, adjacency_matrix, num_customers, num_items, top_k=10):
    model.eval()
    with torch.no_grad():
        scores = model(data).view(-1)

    item_scores = scores[num_customers:]  # Item nodes only

    precision_list, recall_list, f1_list = [], [], []

    for cust_idx in range(num_customers):
        actual_indices = np.where(adjacency_matrix[cust_idx] > 0)[0]
        if len(actual_indices) == 0:
            continue  # Skip customers with no purchases

        predicted_scores = item_scores  # Same for all customers
        top_k_items = torch.topk(predicted_scores, k=top_k).indices.tolist()

        actual_set = set(actual_indices)
        pred_set = set(top_k_items)

        tp = len(actual_set & pred_set)
        precision = tp / top_k
        recall = tp / len(actual_set)
        f1 = (2 * precision * recall) / (precision + recall + 1e-8)

        precision_list.append(precision)
        recall_list.append(recall)
        f1_list.append(f1)

    return np.mean(precision_list), np.mean(recall_list), np.mean(f1_list)
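
`evaluate_gcnn` ranks items by a single score per item node, so every customer receives the same top-k list. One common alternative, sketched here under the assumption that the model exposes customer and item embeddings, is to score each (customer, item) pair by a dot product. The embeddings below are random stand-ins and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_customers, n_items, dim = 3, 6, 4

# Stand-ins for learned node embeddings (random here, hypothetical).
customer_emb = rng.normal(size=(n_customers, dim))
item_emb = rng.normal(size=(n_items, dim))

# Items already seen in training, which should not be recommended again.
seen = np.zeros((n_customers, n_items), dtype=bool)
seen[0, [1, 2]] = True

scores = customer_emb @ item_emb.T     # one score per (customer, item) pair
scores[seen] = -np.inf                 # mask items the customer already has

top_k = 2
top_items = np.argsort(-scores, axis=1)[:, :top_k]   # per-customer ranking
print(top_items.shape)   # (3, 2)
```

Masking seen items only makes sense when the evaluation targets are held-out purchases. The notebook evaluates against the full adjacency matrix, so it would need a train/test split of the purchases before adopting this step.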


In [None]:
class NeuMF(nn.Module):
    def __init__(self, num_users, num_items, embedding_dim=32):
        super(NeuMF, self).__init__()
        self.user_embed = nn.Embedding(num_users, embedding_dim)
        self.item_embed = nn.Embedding(num_items, embedding_dim)

        self.mlp = nn.Sequential(
            nn.Linear(2 * embedding_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1)
        )

    def forward(self, user_indices, item_indices):
        u = self.user_embed(user_indices)
        v = self.item_embed(item_indices)
        x = torch.cat([u, v], dim=-1)
        return self.mlp(x).squeeze()
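
The class above scores a pair with an MLP over concatenated embeddings. The NeuMF architecture as usually described also has a generalized matrix factorization (GMF) branch, with its own embeddings, whose element-wise product is fused with the MLP output. The NumPy forward pass below is a hedged sketch of that two-branch idea with random weights and hypothetical layer sizes; it is not the notebook's model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 4, 5, 8

# Separate embedding tables for the two branches (random stand-ins).
P_gmf, Q_gmf = rng.normal(size=(n_users, dim)), rng.normal(size=(n_items, dim))
P_mlp, Q_mlp = rng.normal(size=(n_users, dim)), rng.normal(size=(n_items, dim))
W1 = rng.normal(size=(2 * dim, 16))   # single hidden MLP layer (hypothetical size)
h = rng.normal(size=(dim + 16,))      # fusion weights over both branches

def neumf_score(u, i):
    gmf = P_gmf[u] * Q_gmf[i]                            # element-wise product branch
    mlp_in = np.concatenate([P_mlp[u], Q_mlp[i]], axis=-1)
    mlp = np.maximum(mlp_in @ W1, 0.0)                   # ReLU hidden layer
    logit = np.concatenate([gmf, mlp], axis=-1) @ h      # fuse both branches
    return 1.0 / (1.0 + np.exp(-logit))                  # sigmoid -> probability

user_ids = np.array([0, 0, 3])
item_ids = np.array([1, 4, 2])
print(neumf_score(user_ids, item_ids).shape)   # (3,)
```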


In [None]:
def evaluate_neumf(model, adjacency_matrix, top_k=10):
    model.eval()
    precision_list, recall_list, f1_list = [], [], []

    num_users, num_items = adjacency_matrix.shape

    with torch.no_grad():
        for user in range(num_users):
            actual_items = np.where(adjacency_matrix[user] > 0)[0]
            if len(actual_items) == 0:
                continue

            item_indices = torch.arange(num_items)
            user_tensor = torch.full((num_items,), user, dtype=torch.long)

            scores = model(user_tensor, item_indices)
            top_items = torch.topk(scores, k=top_k).indices.tolist()

            actual_set = set(actual_items)
            pred_set = set(top_items)

            tp = len(actual_set & pred_set)
            precision = tp / top_k
            recall = tp / len(actual_set)
            f1 = (2 * precision * recall) / (precision + recall + 1e-8)

            precision_list.append(precision)
            recall_list.append(recall)
            f1_list.append(f1)

    return np.mean(precision_list), np.mean(recall_list), np.mean(f1_list)


In [None]:
# Note: both models below are freshly initialized and are evaluated without
# any training, so the printed metrics reflect random weights.
gcnn_model = GCNN(in_channels=1, hidden_channels=64, out_channels=32)  # Example channel sizes
gcnn_data = data

gcnn_p, gcnn_r, gcnn_f1 = evaluate_gcnn(gcnn_model, gcnn_data, adjacency_matrix, num_customers, num_items)
print(f"GCNN -> Precision@10: {gcnn_p:.4f}, Recall@10: {gcnn_r:.4f}, F1: {gcnn_f1:.4f}")

# num_customers and num_items were defined during graph construction
neumf_model = NeuMF(num_users=num_customers, num_items=num_items)

neumf_p, neumf_r, neumf_f1 = evaluate_neumf(neumf_model, adjacency_matrix)
print(f"NeuMF -> Precision@10: {neumf_p:.4f}, Recall@10: {neumf_r:.4f}, F1: {neumf_f1:.4f}")

GCNN -> Precision@10: 0.0003, Recall@10: 0.0001, F1: 0.0001
NeuMF -> Precision@10: 0.0135, Recall@10: 0.0023, F1: 0.0035


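## Performance Comparison: GCNN vs. NeuMF for Recommender Systems

Metric analysis: the evaluation metrics show a large gap between the two models.

| Metric | GCNN | NeuMF | NeuMF relative to GCNN |
|---|---|---|---|
| Precision@10 | 0.0003 | 0.0135 | about 45x |
| Recall@10 | 0.0001 | 0.0023 | about 23x |
| F1 Score | 0.0001 | 0.0035 | about 35x |

The ratios are computed from the rounded values shown in the table.

Key observations:

- Performance difference: NeuMF scores 1-2 orders of magnitude higher than GCNN on all three metrics, but both models are close to zero in absolute terms. The GCNN's metrics suggest it is essentially not functioning as a recommender.
- Recommendation quality: NeuMF's Precision@10 (1.35%) means roughly 1 in 75 recommended items is relevant. GCNN's Precision@10 (0.03%) means only about 3 in 10,000 recommendations are relevant.
- Coverage: NeuMF's Recall@10 (0.23%) shows it finds a few relevant items. GCNN's Recall@10 (0.01%) indicates it misses nearly all of them.

Caveats on interpreting these numbers:

- The evaluation cell creates `gcnn_model` and `neumf_model` and evaluates them without a training step, so the metrics reflect randomly initialized weights rather than learned behavior.
- `evaluate_gcnn` ranks items by one score per item node, so every customer receives the same top-10 list. `evaluate_neumf` scores each (user, item) pair separately, which allows personalized rankings.
- The `NeuMF` class defined above uses only an MLP over concatenated embeddings. The full NeuMF architecture also includes a matrix factorization branch.

Why NeuMF may perform better:

- Architecture: NeuMF combines matrix factorization and neural networks, capturing both linear and non-linear patterns, and is designed for implicit feedback scenarios.
- Data suitability: the GCNN may be suffering from poor message passing between nodes (edges are stored in the customer -> item direction only), inadequate feature representation (every node has the same constant feature), and untuned hyperparameters.
- Implementation: the GCNN's poor performance points to potential issues in edge construction, loss computation (all training targets equal 1 and predictions are per node rather than per pair), and node feature representation. NeuMF's more straightforward architecture may be more robust to implementation errors.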
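
A quick way to put the metrics above in context is to compare them with a recommender that picks k items uniformly at random. For a customer with n relevant items in a catalogue of M, the expected Precision@k is n/M and the expected Recall@k is k/M. The sketch below evaluates this for customer 17850, using the 21 purchased items listed later in this notebook and the 3538 items in the adjacency matrix; the function name is illustrative.

```python
def random_baseline(num_relevant, num_items, k):
    """Expected Precision@k and Recall@k when k items are drawn uniformly at random."""
    precision = num_relevant / num_items   # expected hits = k * n / M, divided by k
    recall = k / num_items                 # expected hits divided by n
    return precision, recall

# Customer 17850: 21 purchased items out of 3538 in the catalogue.
p_rand, r_rand = random_baseline(num_relevant=21, num_items=3538, k=10)
print(f"{p_rand:.4f} {r_rand:.4f}")   # 0.0059 0.0028
```

The expected random Recall@10 does not depend on the customer, so 0.0028 can be compared directly with the averaged recall values reported above.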

# Step 2: Generating Recommendations and Evaluating for a Specific Customer [40 points]

1- Mapping Customer IDs to Indices.

2- Get Predicted Scores for the Customer.

3- Rank Items by Predicted Scores.

4- Map Recommended Items to Stock Codes.

5- Compare Recommendations with Actual Interactions.

6- Calculate Precision, Recall, and F1 Score.

In [None]:
customer_to_index = {customer: i for i, customer in enumerate(customers)}
index_to_customer = {i: customer for i, customer in enumerate(customers)}
item_to_index = {item: j for j, item in enumerate(items)}
index_to_item = {j: item for j, item in enumerate(items)}


In [None]:
customer_id = customers[0]  # Or any ID of interest
customer_idx = customer_to_index[customer_id]

model.eval()
with torch.no_grad():
    scores = model(data).view(-1)

# Get predicted scores for all item nodes (offset by num_customers).
# These are per-item scores and do not depend on customer_idx.
item_scores = scores[num_customers:]  # Only item nodes


In [None]:
# Sort items by descending predicted scores
top_k = 10
ranked_items = torch.topk(item_scores, k=top_k).indices.tolist()


In [None]:
recommended_stock_codes = [index_to_item[i] for i in ranked_items]
print(f"Top-{top_k} recommended items for customer {customer_id}:")
print(recommended_stock_codes)


Top-10 recommended items for customer 17850:
[22423, '85123A', 47566, 21212, 22720, 84879, '85099B', 22960, 23298, 22457]


In [None]:
# Get actual items purchased by the customer
actual_purchased_item_indices = np.where(adjacency_matrix[customer_idx] > 0)[0]
actual_stock_codes = [index_to_item[i] for i in actual_purchased_item_indices]

print(f"\nActual purchased items by customer {customer_id}:")
print(actual_stock_codes)



Actual purchased items by customer 17850:
['84029E', 71053, 21730, '84406B', 22752, '85123A', '84029G', 22633, 22632, 20679, 21068, 21871, 82483, 21071, 82486, 37370, '82494L', 82482, '15056BL', 22411, 22803]


In [None]:
def precision_recall_f1_set(y_true, y_pred):
    y_true_set = set(y_true)
    y_pred_set = set(y_pred)
    tp = len(y_true_set & y_pred_set)

    precision = tp / len(y_pred) if y_pred else 0.0
    recall = tp / len(y_true) if y_true else 0.0
    f1 = (2 * precision * recall) / (precision + recall + 1e-8)
    return precision, recall, f1

precision, recall, f1 = precision_recall_f1_set(actual_stock_codes, recommended_stock_codes)

print(f"\nPrecision@{top_k}: {precision:.4f}")
print(f"Recall@{top_k}: {recall:.4f}")
print(f"F1 Score@{top_k}: {f1:.4f}")



Precision@10: 0.1000
Recall@10: 0.0476
F1 Score@10: 0.0645
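
The stock codes in this dataset mix integers (for example 22423) and strings (for example '85123A'). Set comparison treats `22423` and `'22423'` as different values, which can hide real matches when one list comes from another source. The helper below is a hedged sketch of normalizing codes before comparing; the function name and the second list are hypothetical.

```python
def normalize_codes(codes):
    """Cast stock codes to stripped strings so int and str forms compare equal."""
    return {str(code).strip() for code in codes}

recommended = [22423, "85123A", 47566]
actual = ["22423", "85123A ", 71053]   # same codes stored as text, one with a stray space

naive_overlap = set(recommended) & set(actual)
safe_overlap = normalize_codes(recommended) & normalize_codes(actual)

print(naive_overlap)          # set()
print(sorted(safe_overlap))   # ['22423', '85123A']
```

Within this notebook both lists come from the same `items` array, so the types already agree. The normalization matters mainly when comparing against an external export.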


In [None]:
def compare_gcnn_neumf(customer_id, gcnn_model, neumf_model, data, adjacency_matrix, customer_to_index, items, top_k=10):
    customer_idx = customer_to_index[customer_id]
    num_items = len(items)

    # ===== GCNN Predictions =====
    gcnn_model.eval()
    with torch.no_grad():
        gcnn_scores = gcnn_model(data).view(-1)
        gcnn_item_scores = gcnn_scores[len(customer_to_index):]  # Only item node predictions
        gcnn_top_k_items = torch.topk(gcnn_item_scores, k=top_k).indices.tolist()
        gcnn_stockcodes = [items[i] for i in gcnn_top_k_items]

    # ===== NeuMF Predictions =====
    neumf_model.eval()
    with torch.no_grad():
        item_indices = torch.arange(num_items)
        user_tensor = torch.full((num_items,), customer_idx, dtype=torch.long)
        neumf_scores = neumf_model(user_tensor, item_indices)
        neumf_top_k_items = torch.topk(neumf_scores, k=top_k).indices.tolist()
        neumf_stockcodes = [items[i] for i in neumf_top_k_items]

    # ===== Actual Interactions =====
    actual_indices = np.where(adjacency_matrix[customer_idx] > 0)[0]
    actual_stockcodes = [items[i] for i in actual_indices]

    # ===== Print Results =====
    print(f"\n🧾 Recommendations for Customer ID {customer_id}")
    print(f"GCNN  Top-{top_k} StockCodes:  {gcnn_stockcodes}")
    print(f"NeuMF Top-{top_k} StockCodes:  {neumf_stockcodes}")
    print(f"Actual StockCodes (Purchased): {actual_stockcodes}")

    # Optional: Calculate overlap
    overlap = set(gcnn_stockcodes) & set(neumf_stockcodes)
    print(f"Overlap between GCNN and NeuMF: {overlap if overlap else 'None'}")

# 🔍 Call the function for customer 17850
compare_gcnn_neumf(
    customer_id=17850,
    gcnn_model=gcnn_model,
    neumf_model=neumf_model,
    data=data,
    adjacency_matrix=adjacency_matrix,
    customer_to_index=customer_to_index,
    items=items,
    top_k=10
)



🧾 Recommendations for Customer ID 17850
GCNN  Top-10 StockCodes:  ['90214M', 21895, '90214J', '90214V', 21370, '90214S', 21268, 22275, 82615, 84854]
NeuMF Top-10 StockCodes:  ['72802C', 79163, 23370, 84949, 22889, 22366, '84032B', 20816, 22038, 46118]
Actual StockCodes (Purchased): ['84029E', 71053, 21730, '84406B', 22752, '85123A', '84029G', 22633, 22632, 20679, 21068, 21871, 82483, 21071, 82486, 37370, '82494L', 82482, '15056BL', 22411, 22803]
Overlap between GCNN and NeuMF: None


## Comparison Summary

- Overlap between GCNN and NeuMF: none. The two models produced completely different top-10 recommendations.
- Overlap with actual purchases: none of the top-10 items from either GCNN or NeuMF appears in the actual purchases of customer 17850.

Inference:

- The two models rank items differently. GCNN scores come from message passing over the graph of user-item interactions, while NeuMF scores come from learned user and item embeddings.
- Neither model captured the customer's actual preferences in this instance. The most direct explanation is that `gcnn_model` and `neumf_model` are passed to `compare_gcnn_neumf` without having been trained. Other possible contributors are:
  - sparse interaction data (this customer has 21 purchased items out of 3538);
  - cold-start effects for customers or items with few interactions;
  - lack of side features (e.g., timestamps, categories).

# Step 3: Discussion of Results [20 points]

Discuss the performance of the GCNN model compared to the Feedforward NeuMF model. Provide insights on which model performs better and why, based on the evaluation metrics. Consider aspects like Precision@K, Recall@K, and F1 score.

Compare the recommended items for Customer 17850 generated by your model with those recommended by Neo4j. Are there similarities between the two sets of recommendations?
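
One simple way to quantify the similarity between two recommendation sets is the Jaccard index, the size of the intersection divided by the size of the union. The sketch below is illustrative only: the first list reuses four items from the GCNN output earlier in this notebook, and the second list is a hypothetical placeholder, not actual Neo4j output.

```python
def jaccard(a, b):
    """Jaccard similarity of two item lists, comparing codes as strings."""
    a, b = {str(x) for x in a}, {str(x) for x in b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0

model_recs = [22423, "85123A", 47566, 21212]   # first four GCNN items shown earlier
other_recs = ["85123A", 22423, 84879, 20725]   # hypothetical second list

print(round(jaccard(model_recs, other_recs), 4))   # 0.3333
```

A value of 0 means the two sets share no items and 1 means they are identical, which gives a single number to report alongside the list of overlapping stock codes.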