<a href="https://colab.research.google.com/github/saerarawas/AAI_635O_B11_202520-Recommender-System/blob/main/Saera_Recommender_System_Course_Project_GitHub.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Graded Assessment -- AAI 6350 Recommender Systems Course --

# Part 1: Recommendation System Using GCNN [weight: 40\%]

# Step 1: Data Preparation
- Load the Data: Read the Excel file and extract the relevant columns (CustomerID, StockCode, Quantity).
- Data Cleaning: Ensure there are no missing values in the relevant columns.
- Create Interaction Matrix: Construct an adjacency matrix where rows represent customers and columns represent items. The values in the matrix will be the quantities purchased.

In [9]:
import pandas as pd
import numpy as np

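# Load the dataset
data = pd.read_excel("/content/Rec_sys_data.xlsx")

# Data cleaning: drop rows with missing values in the columns used below
data = data.dropna(subset=['CustomerID', 'StockCode', 'Quantity'])

# Create a pivot table to form the interaction matrix.
# aggfunc='sum' adds up repeated purchases; the pivot_table default (mean) would average them.
interaction_matrix = data.pivot_table(index='CustomerID', columns='StockCode', values='Quantity', aggfunc='sum', fill_value=0)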

# Convert to a NumPy array for further processing
interaction_matrix = interaction_matrix.values
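
As a small illustration of the interaction matrix described in Step 1, the sketch below builds one from a few made-up purchase rows using only the standard library. The rows and all `toy_` names are hypothetical. It sums repeated purchases of the same item, which is one reasonable reading of "quantities purchased".

```python
from collections import defaultdict

# Hypothetical purchase rows: (CustomerID, StockCode, Quantity)
toy_rows = [
    (17850, "85123A", 6),
    (17850, "85123A", 2),   # repeat purchase of the same item
    (17850, "71053", 6),
    (13047, "84879", 32),
]

# Sum quantities per (customer, item) pair
toy_totals = defaultdict(int)
for customer_id, stock_code, quantity in toy_rows:
    toy_totals[(customer_id, stock_code)] += quantity

toy_customers = sorted({row[0] for row in toy_rows})
toy_items = sorted({row[1] for row in toy_rows})

# Dense matrix: rows follow toy_customers, columns follow toy_items; 0 means no purchase
toy_interaction_matrix = [
    [toy_totals.get((c, i), 0) for i in toy_items] for c in toy_customers
]

for customer_id, matrix_row in zip(toy_customers, toy_interaction_matrix):
    print(customer_id, matrix_row)
```

Customer 17850 ends up with a total of 8 for item 85123A, where an averaging pivot would report 4.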

# Step 2: Graph Construction [25 points]
- Graph Representation: Each customer and item will be a node in the graph. An edge exists between a customer and an item if the customer has purchased that item.
- Adjacency Matrix: Create an adjacency matrix where the rows represent customers and the columns represent items.

In [10]:
# Get unique customers and items (StockCode)
customers = data['CustomerID'].unique()
items = data['StockCode'].unique()

# Create mappings from customer/item IDs to matrix indices
customer_to_index = {customer: i for i, customer in enumerate(customers)}
item_to_index = {item: j for j, item in enumerate(items)}

# Initialize the adjacency matrix with zeros
num_customers = len(customers)
num_items = len(items)
adjacency_matrix = np.zeros((num_customers, num_items), dtype=int)

# Populate the adjacency matrix: 1 if a customer purchased an item, 0 otherwise
for index, row in data.iterrows():
    customer_id = row['CustomerID']
    item_id = row['StockCode']
    if customer_id in customer_to_index and item_id in item_to_index:
        customer_index = customer_to_index[customer_id]
        item_index = item_to_index[item_id]
        adjacency_matrix[customer_index, item_index] = 1

# Print the shape of the adjacency matrix and a few rows to verify
print("Shape of the Adjacency Matrix (Customers x Items):", adjacency_matrix.shape)
print("\nFirst 5 rows of the Adjacency Matrix:")
print(adjacency_matrix[:5])

# Optional: You can also create a DataFrame for better visualization if needed
adjacency_df = pd.DataFrame(adjacency_matrix, index=customers, columns=items)
print("\nAdjacency DataFrame (first 5 rows and columns):")
print(adjacency_df.iloc[:5, :5])

Shape of the Adjacency Matrix (Customers x Items): (3647, 3538)

First 5 rows of the Adjacency Matrix:
[[1 1 1 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]

Adjacency DataFrame (first 5 rows and columns):
       84029E  71053  21730  84406B  22752
17850       1      1      1       1      1
13047       0      0      0       0      0
12583       0      0      0       0      0
13748       0      0      0       0      0
15100       0      0      0       0      0


# Step 3: Model Definition (GCNN) [35 points]
- Define the GCNN Architecture: Use a library like PyTorch Geometric or TensorFlow with Keras to define the GCNN model.
- The model will consist of graph convolutional layers that learn representations for both customers and items.
- Prepare Data for Training: Convert the adjacency matrix and features into a format suitable for the GCNN.
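
To make the idea of a graph convolutional layer more concrete, here is a minimal sketch of the propagation step in the standard Kipf and Welling formulation, D^-1/2 (A + I) D^-1/2 X, written in plain Python. It leaves out the learnable weight matrix and the activation, and the three-node graph and `toy_` names are assumptions made for illustration only.

```python
import math

def gcn_propagate(adjacency, features):
    """One propagation step D^-1/2 (A + I) D^-1/2 X, without weights or activation."""
    n = len(adjacency)
    # Add self-loops so each node keeps part of its own features
    a_hat = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    output = []
    for i in range(n):
        row = [0.0] * len(features[0])
        for j in range(n):
            if a_hat[i][j]:
                # Symmetric normalization by the degrees of both endpoints
                weight = a_hat[i][j] / math.sqrt(degree[i] * degree[j])
                for f, value in enumerate(features[j]):
                    row[f] += weight * value
        output.append(row)
    return output

# Toy bipartite graph: node 0 is a customer, nodes 1 and 2 are items it purchased
toy_adjacency = [[0, 1, 1],
                 [1, 0, 0],
                 [1, 0, 0]]
toy_features = [[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]]  # one-hot features, one per node
toy_smoothed = gcn_propagate(toy_adjacency, toy_features)
for node, row in enumerate(toy_smoothed):
    print(node, [round(value, 3) for value in row])
```

After one step the customer's row mixes in the features of both purchased items, while each item's row mixes in the customer's features. Stacking two such layers lets information travel customer to item to customer.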

In [11]:
#!pip install torch-geometric torch-sparse torch-scatter

In [12]:
!pip install torch torchvision torchaudio  # Install PyTorch
!pip install torch-scatter torch-sparse torch-geometric  # Install PyG

In [13]:
import torch_geometric
print(torch_geometric.__version__)

2.6.1


In [14]:
import torch
from torch import nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from torch_geometric.data import Data
import pandas as pd
import numpy as np

# Assume 'adjacency_matrix', 'customers', and 'items' are already created as in the previous step

# 1. Define Node Features (Optional but Recommended)
# For simplicity, we'll start without explicit features.
# In a real-world scenario, you might have customer demographics, item descriptions, etc.
# If you don't have features, you can use an identity matrix as initial node embeddings.

num_customers = adjacency_matrix.shape[0]
num_items = adjacency_matrix.shape[1]
num_nodes = num_customers + num_items

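# Create initial node embeddings (one-hot rows) since no other features are available.
# Customer and item blocks must share one feature width, so the item identity matrix
# is zero-padded from num_items to num_customers columns. This requires
# num_customers >= num_items (3647 vs. 3538 here).
# Caveat: after padding, item j has the same one-hot vector as customer j;
# torch.eye(num_nodes) would give every node a unique feature at a higher memory cost.
customer_features = torch.eye(num_customers)
item_features = torch.eye(num_items)
padding_size = num_customers - num_items
assert padding_size >= 0, "Padding assumes at least as many customers as items"
padding = torch.zeros(num_items, padding_size)  # Create a padding tensor
item_features = torch.cat([item_features, padding], dim=1)  # Pad along dimension 1 (columns)

node_features = torch.cat([customer_features, item_features], dim=0).float()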
# 2. Create Edge List (COO format) from the Adjacency Matrix
# PyTorch Geometric uses the COO format for representing sparse graphs
row_indices, col_indices = adjacency_matrix.nonzero()

# Shift item indices to account for customer nodes
edge_index_0 = torch.tensor(row_indices, dtype=torch.long)
edge_index_1 = torch.tensor(col_indices + num_customers, dtype=torch.long)
edge_index = torch.stack([edge_index_0, edge_index_1], dim=0)

# Also create the reverse edges (item purchased by customer implies connection)
reverse_edge_index_0 = torch.tensor(col_indices + num_customers, dtype=torch.long)
reverse_edge_index_1 = torch.tensor(row_indices, dtype=torch.long)
reverse_edge_index = torch.stack([reverse_edge_index_0, reverse_edge_index_1], dim=0)

# Combine forward and reverse edges to create an undirected graph representation
edge_index = torch.cat([edge_index, reverse_edge_index], dim=1)

# 3. Create Labels (for a supervised learning task - you'll need to define your task)
# For example, if you want to predict future purchases, you might need to create labels
# based on temporal splits of your data.
# For now, let's assume a simple task where the presence of an edge is the signal.
# We might not need explicit labels in the same way as node/graph classification.

# 4. Create the PyTorch Geometric Data object
data = Data(x=node_features, edge_index=edge_index)

# Print the Data object to see its structure
print(data)
print("Node Feature Shape:", data.x.shape)
print("Edge Index Shape:", data.edge_index.shape)

# 5. Define the GCNN Architecture
class GCN(torch.nn.Module):
    def __init__(self, num_node_features, hidden_channels, num_embeddings):
        super(GCN, self).__init__()
        self.conv1 = GCNConv(num_node_features, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, num_embeddings)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = self.conv2(x, edge_index)
        return x

# Instantiate the model
hidden_channels = 64  # You can experiment with this
embedding_dim = 32   # The dimensionality of the learned embeddings
model = GCN(num_node_features=node_features.shape[1], hidden_channels=hidden_channels, num_embeddings=embedding_dim)

print("\nGCNN Model:")
print(model)

# Example of a forward pass
out = model(data.x, data.edge_index)
print("\nOutput Embeddings Shape:", out.shape)
# The first 'num_customers' rows of 'out' are the customer embeddings,
# and the rest are the item embeddings.

Data(x=[7185, 3647], edge_index=[2, 385516])
Node Feature Shape: torch.Size([7185, 3647])
Edge Index Shape: torch.Size([2, 385516])

GCNN Model:
GCN(
  (conv1): GCNConv(3647, 64)
  (conv2): GCNConv(64, 32)
)

Output Embeddings Shape: torch.Size([7185, 32])
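
The index bookkeeping behind the edge list (item nodes numbered after customer nodes, every purchase stored in both directions) can be checked on a tiny example. The following is a dependency-free sketch with a made-up 2 x 3 adjacency matrix. The helper name `bipartite_edge_index` is hypothetical, and it interleaves forward and reverse edges rather than concatenating them, which describes the same graph.

```python
def bipartite_edge_index(adjacency):
    """Return [sources, targets] in COO form for a customers-by-items 0/1 matrix."""
    num_customers = len(adjacency)
    sources, targets = [], []
    for customer_node, row in enumerate(adjacency):
        for item_column, purchased in enumerate(row):
            if purchased:
                # Item nodes are numbered after all customer nodes
                item_node = num_customers + item_column
                # Store the edge in both directions (undirected graph)
                sources += [customer_node, item_node]
                targets += [item_node, customer_node]
    return [sources, targets]

# Two customers (nodes 0-1) and three items (nodes 2-4)
toy_adjacency = [[1, 0, 1],
                 [0, 1, 0]]
toy_edge_index = bipartite_edge_index(toy_adjacency)
print(toy_edge_index[0])
print(toy_edge_index[1])
```

Three purchases produce six directed edges, which matches the notebook's edge count being twice the number of nonzero adjacency entries.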


# Step 4: Training the Model [40 points]

- Loss Function: Use a suitable loss function, such as Mean Squared Error (MSE) as we are working with continuous interaction scores.
- Optimizer: Choose an optimizer like Adam or SGD.
- Training Loop: Implement the training loop to update the model weights based on the loss. In each epoch, calculate the predictions using the model, compute the loss between predicted and actual values, and perform backpropagation to update the model's weights.
- Also compute the validation loss to evaluate the model's performance on unseen data, and use early stopping to halt training when the validation loss stops improving, preventing overfitting.
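
The early-stopping rule described above can be isolated in a small helper. This is a minimal sketch rather than the notebook's implementation: the class name `EarlyStopping`, the `min_delta` parameter, and the loss values are all assumptions chosen for illustration.

```python
class EarlyStopping:
    """Signal a stop once the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # smallest decrease that counts as an improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Made-up validation losses for eight epochs
toy_val_losses = [1.30, 1.25, 1.26, 1.27, 1.24, 1.28, 1.29, 1.31]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, val_loss in enumerate(toy_val_losses, start=1):
    if stopper.step(val_loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best_loss)
```

Keeping the counter and the best loss in one object makes it easy to also save the model weights whenever `best_loss` is updated.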

In [15]:
import torch
from torch import nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from torch_geometric.data import Data
from torch.optim import Adam
from sklearn.model_selection import train_test_split
from tqdm import tqdm  # For progress bar

In [16]:
# 1. Define Loss Function
# For link prediction (reconstructing the adjacency), a common approach is to use
# a loss based on the dot product of the embeddings of connected nodes.
# We'll define a loss that encourages higher dot products for existing edges
# and lower dot products for non-existent edges (though sampling negatives is more efficient).

def link_prediction_loss(embeddings, edge_index):
    # Positive examples: existing edges
    src, dst = edge_index
    positive_scores = torch.sum(embeddings[src] * embeddings[dst], dim=1)
    positive_loss = -F.logsigmoid(positive_scores).mean()

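    # Simplified negative sampling: node pairs drawn uniformly at random.
    # These can include customer-customer or item-item pairs and, occasionally, true edges;
    # sampling only customer-item pairs without an observed purchase would be stricter.
    num_edges = edge_index.size(1)
    num_nodes = embeddings.size(0)
    negative_src = torch.randint(0, num_nodes, (num_edges,), device=embeddings.device)
    negative_dst = torch.randint(0, num_nodes, (num_edges,), device=embeddings.device)
    negative_scores = torch.sum(embeddings[negative_src] * embeddings[negative_dst], dim=1)
    negative_loss = -F.logsigmoid(-negative_scores).mean()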

    return positive_loss + negative_loss

# 2. Define Optimizer
learning_rate = 0.01
optimizer = Adam(model.parameters(), lr=learning_rate)

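# 3. Prepare Data for Training and Validation (Splitting Edges)
# Splitting nodes might not be the best approach for graph-based tasks.
# We'll split the edges into training and validation sets.
# Note: data.edge_index stores each purchase in both directions, so a random split over
# its columns can place an edge in the training set and its reverse in the validation set.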

num_edges = data.edge_index.size(1)
train_edges, val_edges = train_test_split(torch.arange(num_edges).numpy(), test_size=0.2, random_state=42)

train_edge_index = data.edge_index[:, train_edges]
val_edge_index = data.edge_index[:, val_edges]

# Create training and validation Data objects
train_data = Data(x=data.x, edge_index=train_edge_index)
val_data = Data(x=data.x, edge_index=val_edge_index)

# 4. Implement Training Loop with Validation and Early Stopping
num_epochs = 200
patience = 10  # Number of epochs to wait for improvement
best_val_loss = float('inf')
epochs_without_improvement = 0

for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()
    embeddings = model(train_data.x, train_data.edge_index)
    loss = link_prediction_loss(embeddings, train_data.edge_index)
    loss.backward()
    optimizer.step()

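    model.eval()
    with torch.no_grad():
        # Note: the validation embeddings are computed by message passing over the
        # validation edges themselves, so the model sees the edges it is scored on.
        val_embeddings = model(val_data.x, val_data.edge_index)
        # .item() converts the loss to a plain float for logging and early stopping
        val_loss = link_prediction_loss(val_embeddings, val_data.edge_index).item()

    print(f"Epoch {epoch+1}/{num_epochs}, Train Loss: {loss.item():.4f}, Val Loss: {val_loss:.4f}")

    # Early Stopping
    if val_loss < best_val_loss:
        best_val_loss = val_loss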
        epochs_without_improvement = 0
        # Optionally save the best model state here
        # torch.save(model.state_dict(), 'best_gcn_model.pth')
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Early stopping triggered at epoch {epoch+1}")
            break

# Optionally load the best model
# model.load_state_dict(torch.load('best_gcn_model.pth'))

print("Training Finished!")

Epoch 1/200, Train Loss: 1.3863, Val Loss: 1.3671
Epoch 2/200, Train Loss: 1.3642, Val Loss: 1.3242
Epoch 3/200, Train Loss: 1.3105, Val Loss: 1.3439
Epoch 4/200, Train Loss: 1.2865, Val Loss: 1.3745
Epoch 5/200, Train Loss: 1.2994, Val Loss: 1.3113
Epoch 6/200, Train Loss: 1.2651, Val Loss: 1.2872
Epoch 7/200, Train Loss: 1.2633, Val Loss: 1.2823
Epoch 8/200, Train Loss: 1.2645, Val Loss: 1.2726
Epoch 9/200, Train Loss: 1.2533, Val Loss: 1.2635
Epoch 10/200, Train Loss: 1.2367, Val Loss: 1.2652
Epoch 11/200, Train Loss: 1.2274, Val Loss: 1.2773
Epoch 12/200, Train Loss: 1.2276, Val Loss: 1.2614
Epoch 13/200, Train Loss: 1.2177, Val Loss: 1.2332
Epoch 14/200, Train Loss: 1.2043, Val Loss: 1.2170
Epoch 15/200, Train Loss: 1.2021, Val Loss: 1.2093
Epoch 16/200, Train Loss: 1.1970, Val Loss: 1.2023
Epoch 17/200, Train Loss: 1.1842, Val Loss: 1.2078
Epoch 18/200, Train Loss: 1.1804, Val Loss: 1.2093
Epoch 19/200, Train Loss: 1.1781, Val Loss: 1.1956
Epoch 20/200, Train Loss: 1.1688, Val Lo
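
One way to keep the two directions of a purchase in the same split is to divide the unique (customer, item) pairs first and add the reverse edges afterwards. The sketch below shows this with the standard library only. The function name `split_interactions` and the toy pairs are hypothetical, and it is an alternative to, not a description of, the split used in the cell above.

```python
import random

def split_interactions(pairs, val_fraction=0.2, seed=42):
    """Split unique (customer, item) pairs, then mirror each split into both directions."""
    unique_pairs = sorted(set(pairs))
    rng = random.Random(seed)
    rng.shuffle(unique_pairs)
    n_val = max(1, int(len(unique_pairs) * val_fraction))
    val_pairs, train_pairs = unique_pairs[:n_val], unique_pairs[n_val:]

    def both_directions(pair_list):
        # Forward edges followed by their reverses
        return list(pair_list) + [(v, u) for u, v in pair_list]

    return both_directions(train_pairs), both_directions(val_pairs)

# Customers are nodes 0-4, items are nodes 5-7 (made-up interactions)
toy_pairs = [(0, 5), (0, 6), (1, 5), (2, 7), (3, 6),
             (3, 7), (4, 5), (1, 7), (2, 6), (4, 7)]
toy_train_edges, toy_val_edges = split_interactions(toy_pairs)
print(len(toy_train_edges), len(toy_val_edges))
```

With this scheme no purchase appears in both sets, in either direction, so the validation loss is measured on interactions that were absent from training.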

# Part 2: Recommendation System Evaluation and Comparison Using GCNN and NeuMF Models [weight: 30\%]

# Step 1: Evaluation [40 points]

To calculate the average precision, recall, and F1 score for all customers, follow these steps:

- Obtain Model Predictions: Use the trained model to predict interaction scores for all customer-item pairs in the validation set.

- Rank Items by Predicted Scores: For each customer, rank items based on the predicted interaction scores in descending order.

- Define Relevant Items: Set a threshold to determine which items are considered relevant (e.g., based on the top-k predictions or actual interactions greater than zero).

- Calculate Precision, Recall, and F1 Score for Each Customer: For each customer, calculate precision, recall, and F1 score using the relevant predicted and actual items.

- Compute Average Precision, Recall, and F1 Score: Calculate the mean of precision, recall, and F1 scores across all customers.
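
The per-customer metrics in these steps reduce to a few set operations. Here is a minimal, dependency-free sketch with a worked example. The function name `precision_recall_f1_at_k` and the item labels are made up, and the ranked list is assumed to contain no duplicates.

```python
def precision_recall_f1_at_k(ranked_items, relevant_items, k=10):
    """Precision, recall and F1 of the top-k ranked items against the relevant set."""
    top_k = ranked_items[:k]
    relevant = set(relevant_items)
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Worked example: 4 recommendations, 3 relevant items, 1 overlap ("B")
toy_ranked = ["A", "B", "C", "D", "E"]
toy_relevant = ["B", "E", "Z"]
p, r, f1 = precision_recall_f1_at_k(toy_ranked, toy_relevant, k=4)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

Here precision is 1/4, recall is 1/3, and F1 is 2/7. Averaging these three values over all customers gives the reported Precision@K, Recall@K, and F1@K.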

In [17]:
import torch
import pandas as pd
from collections import defaultdict
from sklearn.metrics import precision_score, recall_score, f1_score

# Assume 'model', 'val_data', 'customers', 'items', 'customer_to_index', 'item_to_index' are available

# 1. Obtain Model Predictions for Validation Set
model.eval()
with torch.no_grad():
    val_embeddings = model(val_data.x, val_data.edge_index)
    customer_embeddings = val_embeddings[:len(customers)]
    item_embeddings = val_embeddings[len(customers):]

# Create a dictionary to store actual interactions in the validation set for each customer
actual_interactions = defaultdict(list)
for edge_idx in range(val_data.edge_index.shape[1]):
    u_idx = val_data.edge_index[0, edge_idx].item()
    i_idx = val_data.edge_index[1, edge_idx].item() - len(customers) # Adjust item index

    # Check if u_idx is within the bounds of the customers array
    if u_idx < len(customers):
        customer_id = customers[u_idx]
    else:
        # Handle the case where u_idx is out of bounds (e.g., skip or print a warning)
        continue  # Skip this edge if it refers to an item node

    # Check if i_idx is within the bounds of the items array
    if 0 <= i_idx < len(items):
        item_id = items[i_idx]
    else:
        # Handle the case where i_idx is out of bounds (e.g., skip or print a warning)
        continue  # Skip this edge if it refers to an invalid item index

    actual_interactions[customer_id].append(item_id)

# Score every customer-item pair with one matrix product (dot product as a proxy for
# interaction score); row i holds the scores of customers[i] for all items
score_matrix = customer_embeddings @ item_embeddings.T

# 2. Rank Items by Predicted Scores for Each Customer
# argsort returns, per customer, the item indices ordered from highest to lowest score
ranked_item_indices = torch.argsort(score_matrix, dim=1, descending=True).cpu().numpy()
ranked_predictions = defaultdict(list)
for i, customer_id in enumerate(customers):
    ranked_predictions[customer_id] = items[ranked_item_indices[i]].tolist()

# 3. Define Relevant Items (Top-k approach)
k = 10 # Consider top-k predicted items as relevant
average_precision_sum = 0
average_recall_sum = 0
average_f1_sum = 0
num_customers_evaluated = 0

# 4. Calculate Precision, Recall, and F1 Score for Each Customer
for customer_id in actual_interactions:
    if customer_id in ranked_predictions:
        actual_relevant = set(actual_interactions[customer_id])
        top_k_predicted = set(ranked_predictions[customer_id][:k])

        true_positives = len(actual_relevant.intersection(top_k_predicted))
        predicted_positives = len(top_k_predicted)
        actual_positives = len(actual_relevant)

        # Calculate Precision
        precision = true_positives / predicted_positives if predicted_positives > 0 else 0

        # Calculate Recall
        recall = true_positives / actual_positives if actual_positives > 0 else 0

        # Calculate F1 Score
        f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

        average_precision_sum += precision
        average_recall_sum += recall
        average_f1_sum += f1
        num_customers_evaluated += 1

# 5. Compute Average Precision, Recall, and F1 Score
if num_customers_evaluated > 0:
    avg_precision = average_precision_sum / num_customers_evaluated
    avg_recall = average_recall_sum / num_customers_evaluated
    avg_f1 = average_f1_sum / num_customers_evaluated
else:
    avg_precision = 0
    avg_recall = 0
    avg_f1 = 0

print(f"Average Precision@{k}: {avg_precision:.4f}")
print(f"Average Recall@{k}: {avg_recall:.4f}")
print(f"Average F1 Score@{k}: {avg_f1:.4f}")

Average Precision@10: 0.0299
Average Recall@10: 0.0299
Average F1 Score@10: 0.0240


# Step 2: Generating Recommendations and Evaluating for a Specific Customer [40 points]

1- Mapping Customer IDs to Indices.

2- Get Predicted Scores for the Customer.

3- Rank Items by Predicted Scores.

4- Map Recommended Items to Stock Codes.

5- Compare Recommendations with Actual Interactions.

6- Calculate Precision, Recall, and F1 Score.
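
Steps 2 to 4 can be tried on a toy example before running them against the trained model. The sketch below uses made-up two-dimensional embeddings and `heapq.nlargest`, which picks the top k without sorting every item. The embedding values and the `recommend` helper are assumptions for illustration, not outputs of the model.

```python
import heapq

# Hypothetical 2-dimensional embeddings keyed by customer ID and StockCode
toy_customer_embeddings = {17850: [0.9, 0.1]}
toy_item_embeddings = {
    "85123A": [1.0, 0.0],
    "71053": [0.0, 1.0],
    "21730": [0.5, 0.5],
    "22752": [0.8, 0.3],
}

def recommend(customer_id, k=2):
    """Return the k (StockCode, score) pairs with the highest dot-product score."""
    customer_vec = toy_customer_embeddings[customer_id]
    scores = {
        stock_code: sum(c * i for c, i in zip(customer_vec, item_vec))
        for stock_code, item_vec in toy_item_embeddings.items()
    }
    return heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])

toy_top_items = [stock_code for stock_code, _ in recommend(17850, k=2)]
print(toy_top_items)
```

The returned keys are already StockCodes, so mapping to product names is a dictionary lookup on top of this list.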

In [18]:
import torch
import pandas as pd
from collections import defaultdict
from sklearn.metrics import precision_score, recall_score, f1_score

# Assume 'model', 'val_data', 'customers', 'items', 'customer_to_index', 'item_to_index' are available

# Load product data for mapping StockCode to product names
data_prod = pd.read_excel("/content/Rec_sys_data.xlsx", sheet_name='product')
item_titles = data_prod[['StockCode', 'Product Name']].drop_duplicates()
item_titles_dict = dict(zip(item_titles['StockCode'], item_titles['Product Name']))


In [19]:
import torch
import pandas as pd

# 2. Get Predicted Scores for the Customer
def get_predicted_scores(model, customer_id, customers, items, customer_to_index, item_to_index, node_features, edge_index):
    """
    Calculates predicted interaction scores for a given customer and all items.

    Args:
        model: Your trained PyTorch GCNN model.
        customer_id: The ID of the customer for whom to generate predictions.
        customers: List of all customer IDs.
        items: List of all item IDs (StockCodes).
        customer_to_index: Dictionary mapping customer IDs to their numerical indices.
        item_to_index: Dictionary mapping item IDs to their numerical indices.
        node_features: The node feature matrix (torch.Tensor) for the graph.
        edge_index: The edge index tensor (torch.Tensor) representing the graph connections.

    Returns:
        A dictionary where keys are item IDs (StockCodes) and values are the predicted scores.
    """
    model.eval()  # Set the model to evaluation mode (important for inference)
    with torch.no_grad():  # Disable gradient calculation to save memory and speed up inference
        embeddings = model(node_features, edge_index)  # Get the node embeddings from the model
        customer_embeddings = embeddings[:len(customers)]  # Separate customer embeddings
        item_embeddings = embeddings[len(customers):]      # and item embeddings

        if customer_id not in customer_to_index:
            return {}  # Return an empty dictionary if the customer ID is not found

        customer_index = customer_to_index[customer_id]  # Get the numerical index of the customer
        customer_embed = customer_embeddings[customer_index]  # Get the embedding vector for the customer
        item_scores = {}  # Initialize a dictionary to store item scores

        for item_id, item_index in item_to_index.items():
            item_embed = item_embeddings[item_index]  # Get the embedding vector for the item
            score = torch.dot(customer_embed, item_embed).item()  # Calculate the dot product as the interaction score
            item_scores[item_id] = score  # Store the score for the item

        return item_scores

# 3. Rank Items by Predicted Scores
def rank_items_by_score(item_scores, top_n=10):
    """
    Ranks items based on their predicted scores in descending order.

    Args:
        item_scores: A dictionary of item IDs (StockCodes) and their predicted scores.
        top_n: The number of top-ranked items to return.  Defaults to 10.

    Returns:
        A list of (item_id, score) tuples, sorted in descending order of score, containing the top_n items.
    """
    sorted_items = sorted(item_scores.items(), key=lambda item: item[1], reverse=True)  # Sort items by score
    return sorted_items[:top_n]  # Return the top_n items
# 4. Map Recommended Items to Stock Codes (Already the keys in the ranked list)
# We will map to product names later if needed for display.
# 5. Get Recommendations and Actual Interactions
def get_recommendations_and_actual(customer_id, model, customers, items, customer_to_index, item_to_index, node_features, edge_index, val_data, k=10):
    """
    Generates top-k recommendations for a customer and retrieves the actual items they interacted with in the validation data.

    Args:
        customer_id: The ID of the customer.
        model: Your trained PyTorch GCNN model.
        customers: List of all customer IDs.
        items: List of all item IDs (StockCodes).
        customer_to_index: Dictionary mapping customer IDs to their numerical indices.
        item_to_index: Dictionary mapping item IDs to their numerical indices.
        node_features: The node feature matrix (torch.Tensor).
        edge_index: The edge index tensor (torch.Tensor).
        val_data: The validation data (a PyTorch Geometric Data object or a dictionary-like object).  Crucially, it must have 'edge_index'.
        k: The number of top recommendations to generate. Defaults to 10.

    Returns:
        A tuple containing:
        - A list of the top-k recommended item IDs (StockCodes).
        - A list of the actual item IDs (StockCodes) the customer interacted with in the validation data.
    """
    predicted_scores_for_customer = get_predicted_scores(model, customer_id, customers, items, customer_to_index, item_to_index, node_features, edge_index)
    ranked_predictions = [item[0] for item in rank_items_by_score(predicted_scores_for_customer, top_n=k)]  # Get ranked item IDs

    actual_relevant = set()  # Use a set for efficient membership checking
    customer_index_val = customer_to_index.get(customer_id)  # Get the customer's index in the validation data
    if customer_index_val is not None:
        # Iterate through the edges in the validation data's edge_index
        for edge_idx in range(val_data.edge_index.shape[1]):
            u_idx_val = val_data.edge_index[0, edge_idx].item()  # Get the source node index (customer)
            i_idx_val = val_data.edge_index[1, edge_idx].item() - len(customers)  # Get the target node index (item), adjusting for the offset
            if u_idx_val == customer_index_val:  # If the edge involves the customer we're interested in
                actual_relevant.add(items[i_idx_val])  # Add the item ID to the set of actual interactions

    return ranked_predictions, list(actual_relevant)  # Return the recommendations and actual interactions

# --- Assuming your model is trained and 'data' and 'val_data' are available ---
# Load product data for mapping StockCode to product names
try:
    data_prod = pd.read_excel("/content/Rec_sys_data.xlsx", sheet_name='product')
    item_titles = data_prod[['StockCode', 'Product Name']].drop_duplicates()
    item_titles_dict = dict(zip(item_titles['StockCode'], item_titles['Product Name']))
except FileNotFoundError:
    item_titles_dict = {}
    print("Warning: 'Rec_sys_data.xlsx' not found. Product names will not be available.")

# Get recommendations for Customer ID 17850
target_customer_id = 17850
top_k = 10  # Number of recommendations to generate

if 'model' in locals() and 'data' in locals() and 'val_data' in locals() and 'customers' in locals() and 'items' in locals() and 'customer_to_index' in locals() and 'item_to_index' in locals():
    recommendations, actual_interactions = get_recommendations_and_actual(
        target_customer_id, model, customers, items, customer_to_index, item_to_index, data.x, data.edge_index, val_data, k=top_k
    )

    print(f"\nTop {top_k} Recommendations for Customer ID: {target_customer_id}")
    for stock_code in recommendations:
        product_name = item_titles_dict.get(stock_code, "Product Name Not Found")
        print(f"- {stock_code}: {product_name}")

    print(f"\nActual Interactions for Customer ID: {target_customer_id}")
    for stock_code in actual_interactions:
        product_name = item_titles_dict.get(stock_code, "Product Name Not Found")
        print(f"- {stock_code}: {product_name}")
else:
    print("Error: make sure that 'model', 'data', 'val_data', 'customers', 'items', 'customer_to_index', and 'item_to_index' are defined and that the model is trained.")

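# 6. Evaluate Model Performance for all customers
# NOTE: 'evaluate_all_customers' is called in this notebook but never defined in it.
# The definition below is an assumed reconstruction that matches the call signature
# and mirrors the Precision/Recall/F1@k logic of Part 2, Step 1.
def evaluate_all_customers(model, customers, items, customer_to_index, item_to_index, node_features, edge_index, val_data, k=10):
    num_customers = len(customers)
    model.eval()
    with torch.no_grad():
        embeddings = model(node_features, edge_index)
        # Dot-product scores for every customer-item pair, then the top-k item indices per customer
        scores = embeddings[:num_customers] @ embeddings[num_customers:].T
        top_k_items = scores.topk(k, dim=1).indices.tolist()

    # Validation items per customer index, using only customer -> item edges
    actual = {}
    for u_idx, v_idx in val_data.edge_index.t().tolist():
        if u_idx < num_customers <= v_idx:
            actual.setdefault(u_idx, set()).add(v_idx - num_customers)

    # Per-customer (precision, recall, F1), keyed by customer ID
    results = {}
    for u_idx, relevant in actual.items():
        hits = len(relevant.intersection(top_k_items[u_idx]))
        precision, recall = hits / k, hits / len(relevant)
        f1 = 2 * precision * recall / (precision + recall) if hits > 0 else 0.0
        results[customers[u_idx]] = (precision, recall, f1)

    n = max(len(results), 1)  # avoid division by zero when no customer has validation edges
    averages = [sum(r[m] for r in results.values()) / n for m in range(3)]
    return averages[0], averages[1], averages[2], results

if 'model' in locals() and 'data' in locals() and 'val_data' in locals() and 'customers' in locals() and 'items' in locals() and 'customer_to_index' in locals() and 'item_to_index' in locals():
    avg_precision_at_k, avg_recall_at_k, avg_f1_at_k, all_customer_results = evaluate_all_customers(
        model, customers, items, customer_to_index, item_to_index, data.x, data.edge_index, val_data, k=10
    )

    print(f"\nAverage Precision@10: {avg_precision_at_k:.4f}")
    print(f"Average Recall@10: {avg_recall_at_k:.4f}")
    print(f"Average F1 Score@10: {avg_f1_at_k:.4f}")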




Top 10 Recommendations for Customer ID: 17850
- 22423: Handcrafted Ercolano Music Box Featuring "Luncheon of the Boating Party" by Renoir, Pierre Auguste - New YorkNew York
- 85123A: Mediven Sheer and Soft 15-20 mmHg Thigh w/ Lace Silicone Top Band CT Wheat II - Ankle 8-8.75 inches
- 21212: 3 1/2"W x 20"D x 20"H Funston Craftsman Smooth Bracket, Douglas Fir
- 47566: Port Authority K110 Dry Zone UV Micro-Mesh Polo, Gusty Grey, S
- 22720: MightySkins Skin Decal Wrap Compatible with DJI Sticker Protective Cover 100's of Color Options
- 20725: billyboards Porcelain Menu Chalkboard
- 22961: 1.30 Carat (ctw) 14K White Gold Round Diamond Ladies 3 Stone Bridal Engagement Ring Set With Band
- 22960: Augusta 1235 Ladies Triumph Jersey
- 85099B: Ebe Women Reading Glasses Reader Cheaters Anti Reflective Lenses TR90 ry2209
- 23298: 3 1/2"W x 20"D x 20"H Funston Craftsman Smooth Bracket, Douglas Fir

Actual Interactions for Customer ID: 17850
- 22633: Handcrafted Ercolano Music Box Featuring "Lunch


# Step 3: Discussion of Results [20 points]

Discuss the performance of the GCNN model compared to the Feedforward NeuMF model. Provide insights on which model performs better and why, based on the evaluation metrics. Consider aspects like Precision@K, Recall@K, and F1 score.

Compare the recommended items for Customer 17850 generated by your model with those recommended by Neo4j. Are there similarities between the two sets of recommendations?