ASSIGNMENT :
    Design a real-time recommendation system for an e-commerce platform. The system should provide product recommendations based on the  purchase history using collaborative filtering.

In [163]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics import mean_squared_error
from sklearn.decomposition import TruncatedSVD
from math import sqrt

DATASET SOURCE : KAGGLE
LINK : "https://www.kaggle.com/code/iamsouravbanerjee/shopping-trends-unveiled-eda-for-beginners?select=shopping_trends.csv"

In [164]:
df = pd.read_csv("shopping_trends.csv")

Select and preprocess relevant columns for collaborative filtering

In [165]:

interaction_data = df[["Customer ID", "Item Purchased", "Purchase Amount (USD)"]].copy()
interaction_data.loc[:, 'Customer Index'] = interaction_data['Customer ID'].astype('category').cat.codes
interaction_data.loc[:, 'Item Index'] = interaction_data['Item Purchased'].astype('category').cat.codes

Normalize purchase amounts to reduce skewness

In [166]:

interaction_data['Normalized Purchase Amount'] = (
    interaction_data['Purchase Amount (USD)'] - interaction_data['Purchase Amount (USD)'].mean()
) / interaction_data['Purchase Amount (USD)'].std()

Create a user-item interaction matrix

In [167]:

interaction_matrix = interaction_data.pivot_table(
    index='Customer Index', 
    columns='Item Index', 
    values='Normalized Purchase Amount', 
    fill_value=0
)

Adjust n_components to be <= number of columns in interaction_matrix

In [168]:

n_features = interaction_matrix.shape[1]
n_components = min(25, n_features)  # Set n_components dynamically based on the number of features

Apply dimensionality reduction using SVD

In [169]:

svd = TruncatedSVD(n_components=n_components, random_state=42)
reduced_matrix = svd.fit_transform(interaction_matrix)

Reconstruct an approximation of the interaction matrix

In [170]:

interaction_matrix_approx = pd.DataFrame(
    svd.inverse_transform(reduced_matrix),
    index=interaction_matrix.index,
    columns=interaction_matrix.columns
)

Function to split the interaction matrix into train and test sets

In [171]:

def train_test_split_interaction(interaction_matrix, test_size=0.1):
    train = interaction_matrix.copy()
    test = interaction_matrix.copy()
    
    for user in interaction_matrix.index:
        non_zero_indices = interaction_matrix.loc[user].to_numpy().nonzero()[0]
        if len(non_zero_indices) == 0:
            continue
        test_items = np.random.choice(
            non_zero_indices, 
            size=max(1, int(len(non_zero_indices) * test_size)), 
            replace=False
        )
        train.loc[user, test_items] = 0
        test.loc[user, np.setdiff1d(interaction_matrix.columns, test_items)] = 0
    
    return train, test

Split into train and test sets

In [172]:

train_matrix, test_matrix = train_test_split_interaction(interaction_matrix_approx)

Compute user similarity matrix using Pearson correlation

In [173]:

user_similarity_train = train_matrix.T.corr(method='pearson')
user_similarity_train.fillna(0, inplace=True)

Predict ratings for a given user

In [174]:
 
def predict_ratings(user_index, train_matrix, user_similarity_train_df, reg=0.1):
    user_interactions = train_matrix.loc[user_index]
    similar_users = user_similarity_train_df[user_index]
    weighted_scores = similar_users.dot(train_matrix)
    prediction_scores = weighted_scores / (similar_users.sum() + reg)
    return prediction_scores

Evaluate the system using RMSE on the test set

In [175]:

predicted_ratings = []
true_ratings = []

for user in test_matrix.index:
    predicted = predict_ratings(user, train_matrix, user_similarity_train)
    true = test_matrix.loc[user][test_matrix.loc[user] > 0]
    if true.empty:
        continue
    predicted = predicted.loc[true.index]
    if predicted.empty:
        continue
    true_ratings.extend(true.values)
    predicted_ratings.extend(predicted.values)

Compute MSE and RMSE

In [176]:

if true_ratings and predicted_ratings:
    mse = mean_squared_error(true_ratings, predicted_ratings)
    rmse = sqrt(mse)
    print(f"Mean Squared Error (MSE): {mse:.4f}")
    print(f"Root Mean Squared Error (RMSE): {rmse:.4f}")
else:
    print("Insufficient data for evaluation.")

Mean Squared Error (MSE): 18.1420
Root Mean Squared Error (RMSE): 4.2593


In [177]:
def recommend_items(user_index, train_matrix, user_similarity_train_df, top_n=5):
    user_data = train_matrix.loc[user_index]
    predicted_scores = predict_ratings(user_index, train_matrix, user_similarity_train_df)
    items_to_recommend = predicted_scores[user_data == 0]
    top_recommendations = items_to_recommend.sort_values(ascending=False).head(top_n)
    return top_recommendations.index.tolist()

In [178]:
user_input = int(input("Enter the user index for recommendations: "))

Ensure the input user index exists in the train matrix

In [179]:

if user_input in train_matrix.index:
    # Generate recommendations for the input user
    recommended_items = recommend_items(user_input, train_matrix, user_similarity_train)
    
    # Map item indices back to item names
    item_mapping = interaction_data[['Item Index', 'Item Purchased']].drop_duplicates().set_index('Item Index')
    recommended_item_names = item_mapping.loc[recommended_items]['Item Purchased'].tolist()

    print(f"Recommendations for User {user_input}: {recommended_item_names}")
else:
    print(f"User with index {user_input} does not exist in the dataset.")

Recommendations for User 12: ['Sunglasses', 'Jeans', 'Backpack']
