
## Product Recommendation using Matrix Factorization 🛍️

### Overview
This code implements a recommendation system using matrix factorization, specifically truncated Singular Value Decomposition (SVD). It builds a user-item rating matrix from purchase history and product details, reconstructs it from `k` latent factors to predict user-item interactions, and then recommends the top products for a given user.

### Steps Explained:

#### 1. Load Datasets
   - **Purpose**: Load customer interactions, purchase history, and product details datasets from CSV files.
   - **Inputs**: None as arguments; the three CSV file paths are hard-coded in the script.
   - **Output**: DataFrames `interactions`, `purchase_history`, and `product_details`.

#### 2. Preprocess Data
   - **Purpose**: Merge datasets to create a unified dataset and prepare it for matrix factorization.
   - **Inputs**: Purchase history and product details DataFrames.
   - **Output**: User-item matrix (`user_item_matrix`) and sparse matrix representation (`user_item_sparse`).
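
As a rough illustration of this preprocessing step, the sketch below pivots a tiny, made-up purchase table into a user-item matrix and converts it to CSR form. The data values and the `toy_` names are assumptions for illustration; only the column names (`Customer ID`, `Product ID`, `Ratings`) come from the notebook.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical toy purchase history; the column names mirror the notebook's CSVs
toy_history = pd.DataFrame({
    "Customer ID": [1, 1, 2, 3, 3],
    "Product ID": [10, 20, 20, 10, 30],
    "Ratings": [5.0, 3.0, 4.0, 2.0, 1.0],
})

# One row per customer, one column per product; unobserved pairs become 0
toy_matrix = pd.pivot_table(
    toy_history,
    values="Ratings",
    index="Customer ID",
    columns="Product ID",
    fill_value=0,
)

# CSR format stores only the non-zero entries
toy_sparse = csr_matrix(toy_matrix.values)

print(toy_matrix)
print("stored non-zeros:", toy_sparse.nnz)
```

Note that filling with 0 makes "not rated" indistinguishable from a rating of 0, which the factorization then tries to reproduce.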

#### 3. Decompose Matrix using SVD
   - **Purpose**: Decompose the user-item matrix using Singular Value Decomposition (SVD) to extract latent factors.
   - **Inputs**: Sparse user-item matrix and the number of latent factors (`k`).
   - **Output**: Decomposed matrices `U`, `sigma`, and `Vt`.

#### 4. Make Predictions
   - **Purpose**: Reconstruct the user-item matrix using the decomposed matrices to make predictions.
   - **Inputs**: Decomposed matrices `U`, `sigma`, and `Vt`.
   - **Output**: Predicted ratings for all user-item interactions.
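
Steps 3 and 4 can be sketched on a small matrix as follows. The 4 x 5 `ratings` array is invented for illustration; `svds` and the `U`, `sigma`, `Vt` product are the same calls the notebook uses.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Hypothetical 4-user x 5-item rating matrix (0 = no rating)
ratings = np.array([
    [5, 3, 0, 1, 0],
    [4, 0, 0, 1, 0],
    [1, 1, 0, 5, 4],
    [0, 1, 5, 4, 0],
], dtype=float)

k = 2  # number of latent factors; must be smaller than min(ratings.shape)
U, s, Vt = svds(csr_matrix(ratings), k=k)

# Rank-k reconstruction: U (4 x k) @ diag(s) (k x k) @ Vt (k x 5)
predicted = U @ np.diag(s) @ Vt

print(U.shape, s.shape, Vt.shape)
print(np.round(predicted, 2))
```

Cells that were 0 in `ratings` generally receive non-zero values in `predicted`; those values are what the notebook treats as predicted ratings.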

#### 5. Recommend Products
   - **Purpose**: Generate top product recommendations for a given user based on predicted ratings.
   - **Inputs**: User ID and number of recommendations (`num_recommendations`).
   - **Output**: DataFrame containing top recommended products for the user.
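
The ranking logic in this step amounts to "drop what the user already bought, sort the rest, keep the top N". A minimal sketch with made-up scores (the `predicted_scores` and `already_purchased` values are assumptions, not data from the notebook):

```python
import pandas as pd

# Hypothetical predicted scores for one user, indexed by Product ID
predicted_scores = pd.Series({10: 4.8, 20: 4.1, 30: 3.9, 40: 2.5, 50: 4.5})

# Hypothetical list of products this user has already purchased
already_purchased = [10, 30]

# Drop purchased items, rank the remainder, keep the top N
num_recommendations = 2
top_n = (
    predicted_scores
    .drop(already_purchased, errors="ignore")
    .sort_values(ascending=False)
    .head(num_recommendations)
)

print(top_n.index.tolist())  # [50, 20]
```

Using `errors="ignore"` keeps the call from failing when a purchased product has no column in the prediction matrix.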

### How to Use:
1. Ensure you have the necessary CSV files containing customer interactions, purchase history, and product details.
2. Run the script to obtain recommendations for a specific user.
3. Check the printed output for the top recommended products.

### Additional Notes:
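- Adjust the number of latent factors (`k`) in SVD to trade off model complexity against fit: a larger `k` reproduces the observed matrix more closely but is more prone to overfitting. With the default solver, `svds` requires `k` to be smaller than both dimensions of the matrix.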
- The provided `user_id` should correspond to the IDs used in the purchase history dataset.
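
To get a feel for how `k` affects the fit, the sketch below factorizes a randomly generated matrix at several values of `k` and reports the reconstruction error. The matrix size, sparsity, and list of `k` values are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)

# Hypothetical 30-user x 20-item matrix with roughly 30% of entries rated 1-5
mask = rng.random((30, 20)) < 0.3
ratings = np.where(mask, rng.integers(1, 6, size=(30, 20)), 0).astype(float)
sparse = csr_matrix(ratings)

errors = {}
for k in (2, 5, 10, 15):  # each k must be smaller than min(ratings.shape) == 20
    U, s, Vt = svds(sparse, k=k)
    reconstructed = U @ np.diag(s) @ Vt
    errors[k] = np.sqrt(np.mean((ratings - reconstructed) ** 2))

for k, err in errors.items():
    print(f"k={k:2d}  reconstruction RMSE={err:.3f}")
```

The error shrinks as `k` grows, but this only measures how well the training matrix is reproduced. Choosing `k` for recommendation quality would need a held-out set of ratings.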


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
import pandas as pd
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Load datasets
interactions = pd.read_csv('/content/drive/MyDrive/fair_dataset/customer_interactions_synthetic.csv')
purchase_history = pd.read_csv('/content/drive/MyDrive/fair_dataset/purchase_history_synthetic.csv')
product_details = pd.read_csv('/content/drive/MyDrive/fair_dataset/product_details_synthetic.csv')

# Preprocess data
# Merge datasets to create a unified dataset
purchase_history = pd.merge(purchase_history, product_details, on='Product ID')

# Create a user-item matrix
user_item_matrix = pd.pivot_table(purchase_history, values='Ratings', index='Customer ID', columns='Product ID', fill_value=0)

# Convert the user-item matrix to a sparse matrix
user_item_sparse = csr_matrix(user_item_matrix.values)

# Decompose the matrix using SVD
U, sigma, Vt = svds(user_item_sparse, k=50)  # k is the number of latent factors

# Convert sigma to diagonal matrix form
sigma = np.diag(sigma)

# Make predictions
all_user_predicted_ratings = np.dot(np.dot(U, sigma), Vt)

# Convert the reconstructed matrix back to a DataFrame, keeping Customer IDs as the row index
preds_df = pd.DataFrame(all_user_predicted_ratings, index=user_item_matrix.index, columns=user_item_matrix.columns)

# Function to recommend top products for a given user
def recommend_products(user_id, num_recommendations=5):
    # Look the user up by Customer ID instead of assuming row number == user_id - 1
    if user_id not in preds_df.index:
        print("User ID not found in the user-item matrix.")
        return None
    sorted_user_predictions = preds_df.loc[user_id].sort_values(ascending=False)

    # Get the user's purchase history
    user_history = purchase_history[purchase_history['Customer ID'] == user_id]['Product ID']

    # Filter out products the user has already purchased
    recommendations = sorted_user_predictions.drop(user_history, errors='ignore')

    # Get top recommendations
    top_recommendations = recommendations.head(num_recommendations)
    # Note: isin() returns rows in product_details order, not in predicted-rating order
    top_product_details = product_details[product_details['Product ID'].isin(top_recommendations.index)]

    return top_product_details[['Product ID', 'Category', 'Price', 'Ratings', 'Product Icon']]

# Example: Get recommendations for a specific user (e.g., user ID 100)
user_id = 100
recommendations = recommend_products(user_id)
print("Top 5 recommendations for user", user_id)
print(recommendations)


Top 5 recommendations for user 100
     Product ID                         Category       Price  Ratings  \
250         251                  Books and Media    5.894098      5.0   
636         637  Apparel and Fashion Accessories  133.945918      4.6   
697         698      Home and Kitchen Appliances  559.541235      4.9   
928         929                   Toys and Games  277.117038      5.0   
951         952                  Books and Media   45.681721      4.9   

                                          Product Icon  
250  https://raw.githubusercontent.com/reckn/super-...  
636  https://raw.githubusercontent.com/reckn/super-...  
697  https://raw.githubusercontent.com/reckn/super-...  
928  https://raw.githubusercontent.com/reckn/super-...  
951  https://raw.githubusercontent.com/reckn/super-...  


## Recommendation Model Serialization Documentation

### Overview
This code precomputes the predicted ratings and serializes them for future use. It loads datasets, preprocesses the data, performs matrix factorization using Singular Value Decomposition (SVD), makes predictions, and then saves the predictions DataFrame to a file using pickle.

### Steps Explained:

#### 1. Load Datasets
   - **Purpose**: Load customer interactions, purchase history, and product details datasets from CSV files.
   - **Inputs**: None as arguments; the three CSV file paths are hard-coded in the script.
   - **Output**: DataFrames `interactions`, `purchase_history`, and `product_details`.

#### 2. Preprocess Data
   - **Purpose**: Merge datasets to create a unified dataset and prepare it for matrix factorization.
   - **Inputs**: Purchase history and product details DataFrames.
   - **Output**: User-item matrix (`user_item_matrix`) and sparse matrix representation (`user_item_sparse`).

#### 3. Decompose Matrix using SVD
   - **Purpose**: Decompose the user-item matrix using Singular Value Decomposition (SVD) to extract latent factors.
   - **Inputs**: Sparse user-item matrix and the number of latent factors (`k`).
   - **Output**: Decomposed matrices `U`, `sigma`, and `Vt`.

#### 4. Make Predictions
   - **Purpose**: Reconstruct the user-item matrix using the decomposed matrices to make predictions.
   - **Inputs**: Decomposed matrices `U`, `sigma`, and `Vt`.
   - **Output**: Predicted ratings for all user-item interactions.

#### 5. Save Predictions
   - **Purpose**: Serialize the predictions DataFrame using pickle for future use.
   - **Inputs**: Predictions DataFrame.
   - **Output**: Serialized file (`preds_df.pkl`).

### How to Use:
1. Run the script to perform matrix factorization and generate predictions.
2. Check for the serialized file `preds_df.pkl`, which contains the predictions DataFrame, for future use.

### Additional Notes:
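- Serializing `preds_df` saves the full matrix of predicted ratings, so later runs can produce recommendations without recomputing the SVD. The factor matrices `U`, `sigma`, and `Vt` themselves are not saved.
- Load the file with `pickle.load` on a file opened in binary mode (`'rb'`), and only unpickle files that come from a trusted source.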
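
The save-and-reload cycle can be sketched as a round trip. The small `preds` frame and the temporary directory are assumptions for illustration; the notebook itself writes `preds_df.pkl` to the working directory.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical predictions frame: rows are Customer IDs, columns are Product IDs
preds = pd.DataFrame(
    [[4.2, 0.3], [1.1, 3.7]],
    index=pd.Index([101, 102], name="Customer ID"),
    columns=pd.Index([10, 20], name="Product ID"),
)

# Write to a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "preds_df.pkl")
    with open(path, "wb") as f:
        pickle.dump(preds, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored.equals(preds))
```

The restored frame keeps its index and column labels, so storing Customer IDs as the index lets a later script look users up by ID.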


In [3]:
import pandas as pd
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds
import pickle

# Load datasets
interactions = pd.read_csv('/content/drive/MyDrive/fair_dataset/customer_interactions_synthetic.csv')
purchase_history = pd.read_csv('/content/drive/MyDrive/fair_dataset/purchase_history_synthetic.csv')
product_details = pd.read_csv('/content/drive/MyDrive/fair_dataset/product_details_synthetic.csv')

# Preprocess data
# Merge datasets to create a unified dataset
purchase_history = pd.merge(purchase_history, product_details, on='Product ID')

# Create a user-item matrix
user_item_matrix = pd.pivot_table(purchase_history, values='Ratings', index='Customer ID', columns='Product ID', fill_value=0)

# Convert the user-item matrix to a sparse matrix
user_item_sparse = csr_matrix(user_item_matrix.values)

# Decompose the matrix using SVD
U, sigma, Vt = svds(user_item_sparse, k=50)  # k is the number of latent factors

# Convert sigma to diagonal matrix form
sigma = np.diag(sigma)

# Make predictions
all_user_predicted_ratings = np.dot(np.dot(U, sigma), Vt)

# Convert the reconstructed matrix back to a DataFrame, keeping Customer IDs as the row index
preds_df = pd.DataFrame(all_user_predicted_ratings, index=user_item_matrix.index, columns=user_item_matrix.columns)

# Save preds_df to a file
with open('preds_df.pkl', 'wb') as f:
    pickle.dump(preds_df, f)

## Product Recommendation Documentation

### Overview
This code provides product recommendations for a given user based on a previously trained recommendation model. It loads datasets, including customer interactions, purchase history, and product details, and utilizes a serialized prediction DataFrame (`preds_df.pkl`) to make recommendations.

### Function Explained:

#### 1. `recommend_products(user_id, num_recommendations=5)`
   - **Purpose**: Generate top product recommendations for a given user based on the serialized predictions DataFrame.
   - **Inputs**:
       - `user_id` (int): ID of the user for whom recommendations are generated.
       - `num_recommendations` (int): Number of recommendations to generate (default is 5).
   - **Output**: DataFrame containing top recommended products for the user.

### How to Use:
1. Ensure you have the necessary CSV files containing customer interactions, purchase history, and product details.
2. Run the script to obtain recommendations for a specific user (e.g., user ID 100).
3. Check the printed output for the top recommended products.

### Additional Notes:
- The serialized predictions DataFrame (`preds_df.pkl`) must be available in the current working directory, because the script opens it by relative path.
- Adjust the number of recommendations to generate by modifying the `num_recommendations` parameter.


In [4]:
import pandas as pd
import pickle

# Load datasets
interactions = pd.read_csv('/content/drive/MyDrive/fair_dataset/customer_interactions_synthetic.csv')
purchase_history = pd.read_csv('/content/drive/MyDrive/fair_dataset/purchase_history_synthetic.csv')
product_details = pd.read_csv('/content/drive/MyDrive/fair_dataset/product_details_synthetic.csv')

# Function to load preds_df and recommend top products for a given user
def recommend_products(user_id, num_recommendations=5):
    # Load preds_df
    with open('preds_df.pkl', 'rb') as f:
        preds_df = pickle.load(f)

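    # Get the row corresponding to the user.
    # NOTE: this positional lookup assumes Customer IDs are the consecutive integers 1..N,
    # in the same order as the rows of preds_df.
    user_row_number = user_id - 1
    if not 0 <= user_row_number < len(preds_df):
        print("User ID is outside the range of rows in preds_df.")
        return None
    sorted_user_predictions = preds_df.iloc[user_row_number].sort_values(ascending=False)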

    # Get the user's purchase history
    user_history = purchase_history[purchase_history['Customer ID'] == user_id]['Product ID']

    # Filter out products the user has already purchased
    recommendations = sorted_user_predictions.drop(user_history, errors='ignore')

    # Get top recommendations
    top_recommendations = recommendations.head(num_recommendations)
    # Note: isin() returns rows in product_details order, not in predicted-rating order
    top_product_details = product_details[product_details['Product ID'].isin(top_recommendations.index)]

    return top_product_details[['Product ID', 'Category', 'Price', 'Ratings', 'Product Icon']]

# Example: Get recommendations for a specific user (e.g., user ID 100)
user_id = 100
recommendations = recommend_products(user_id)
print("Top 5 recommendations for user", user_id)
print(recommendations)

Top 5 recommendations for user 100
     Product ID                         Category       Price  Ratings  \
250         251                  Books and Media    5.894098      5.0   
636         637  Apparel and Fashion Accessories  133.945918      4.6   
697         698      Home and Kitchen Appliances  559.541235      4.9   
928         929                   Toys and Games  277.117038      5.0   
951         952                  Books and Media   45.681721      4.9   

                                          Product Icon  
250  https://raw.githubusercontent.com/reckn/super-...  
636  https://raw.githubusercontent.com/reckn/super-...  
697  https://raw.githubusercontent.com/reckn/super-...  
928  https://raw.githubusercontent.com/reckn/super-...  
951  https://raw.githubusercontent.com/reckn/super-...  


## Miscellaneous

In [5]:
import pandas as pd
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

# Load datasets
interactions = pd.read_csv('/content/drive/MyDrive/fair_dataset/customer_interactions_synthetic.csv')
purchase_history = pd.read_csv('/content/drive/MyDrive/fair_dataset/purchase_history_synthetic.csv')
product_details = pd.read_csv('/content/drive/MyDrive/fair_dataset/product_details_synthetic.csv')

# Merge datasets to create a unified dataset
purchase_history = pd.merge(purchase_history, product_details, on='Product ID')

# Create a user-item matrix
user_item_matrix = pd.pivot_table(purchase_history, values='Ratings', index='Customer ID', columns='Product ID', fill_value=0)

# Convert the user-item matrix to a sparse matrix
user_item_sparse = csr_matrix(user_item_matrix.values)

# Decompose the matrix using SVD
U, sigma, Vt = svds(user_item_sparse, k=50)  # k is the number of latent factors

# Convert sigma to diagonal matrix form
sigma = np.diag(sigma)

# Make predictions
all_user_predicted_ratings = np.dot(np.dot(U, sigma), Vt)

# Get the customer IDs from the user_item_matrix
customer_ids = user_item_matrix.index

# Create a mapping dictionary from customer IDs to row indices
id_to_index_mapping = {customer_id: idx for idx, customer_id in enumerate(customer_ids)}

# Function to recommend top products for a given user
def recommend_products(user_id, num_recommendations=5):
    # Get the corresponding index for the given user_id
    svd_index = id_to_index_mapping.get(user_id)
    if svd_index is None:
        print("User ID not found in the mapping dictionary.")
        return None

    # Retrieve predicted ratings for the user (copy so the shared matrix is not modified)
    user_ratings = all_user_predicted_ratings[svd_index].copy()

    # Get the user's purchase history
    user_history = user_item_matrix.iloc[svd_index]

    # Exclude already purchased products; -inf (rather than 0) keeps them below negative predictions
    user_ratings[user_history.values > 0] = -np.inf

    # Get column positions of top recommendations
    top_indices = np.argsort(user_ratings)[::-1][:num_recommendations]

    # Map column positions to Product IDs, then look up product details in ranked order
    top_product_ids = user_item_matrix.columns[top_indices]
    top_product_details = product_details.set_index('Product ID').loc[top_product_ids].reset_index()

    return top_product_details[['Product ID', 'Category', 'Price', 'Ratings', 'Product Icon']]

# Example usage
recommendations = recommend_products(user_id=12345, num_recommendations=5)
print(recommendations)


User ID not found in the mapping dictionary.
None


Save and load the SVD factors so that the model does not need to be retrained each time the program runs.

In [6]:
import pandas as pd
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds
import pickle

# File paths
interactions_path = '/content/drive/MyDrive/fair_dataset/customer_interactions_synthetic.csv'
purchase_history_path = '/content/drive/MyDrive/fair_dataset/purchase_history_synthetic.csv'
product_details_path = '/content/drive/MyDrive/fair_dataset/product_details_synthetic.csv'

# Function to load datasets
def load_datasets(interactions_path, purchase_history_path, product_details_path):
    interactions = pd.read_csv(interactions_path)
    purchase_history = pd.read_csv(purchase_history_path)
    product_details = pd.read_csv(product_details_path)
    return interactions, purchase_history, product_details

# Function to merge datasets
def merge_datasets(purchase_history, product_details):
    merged_data = pd.merge(purchase_history, product_details, on='Product ID')
    return merged_data

# Function to create user-item matrix using groupby
def create_user_item_matrix(data):
    user_item_matrix = data.groupby(['Customer ID', 'Product ID'])['Ratings'].max().unstack(fill_value=0)
    return user_item_matrix

# Function to convert user-item matrix to sparse matrix
def convert_to_sparse_matrix(user_item_matrix):
    user_item_sparse = csr_matrix(user_item_matrix.values)
    return user_item_sparse

# Function to perform SVD decomposition
def perform_svd(user_item_sparse, k=50):
    U, sigma, Vt = svds(user_item_sparse, k=k)
    sigma = np.diag(sigma)
    return U, sigma, Vt

# Function to save SVD results and mapping dictionary
def save_results(U, sigma, Vt, id_to_index_mapping):
    np.save('U.npy', U)
    np.save('sigma.npy', sigma)
    np.save('Vt.npy', Vt)
    with open('id_to_index_mapping.pkl', 'wb') as f:
        pickle.dump(id_to_index_mapping, f)

# Function to load SVD results and mapping dictionary
def load_results():
    U = np.load('U.npy')
    sigma = np.load('sigma.npy')
    Vt = np.load('Vt.npy')
    with open('id_to_index_mapping.pkl', 'rb') as f:
        id_to_index_mapping = pickle.load(f)
    return U, sigma, Vt, id_to_index_mapping

# Function to recommend top products for a given user
def recommend_products(user_id, all_user_predicted_ratings, user_item_matrix, product_details, id_to_index_mapping, num_recommendations=5):
    svd_index = id_to_index_mapping.get(user_id)
    if svd_index is None:
        print("User ID not found in the mapping dictionary.")
        return None

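    # Copy the user's row so the shared prediction matrix is not modified in place
    user_ratings = all_user_predicted_ratings[svd_index].copy()
    user_history = user_item_matrix.iloc[svd_index]
    # Exclude already purchased products from the ranking
    user_ratings[user_history.values > 0] = -np.inf

    top_indices = np.argsort(user_ratings)[::-1][:num_recommendations]
    # Map matrix column positions to Product IDs before looking up product details
    top_product_ids = user_item_matrix.columns[top_indices]
    top_product_details = product_details.set_index('Product ID').loc[top_product_ids].reset_index()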

    return top_product_details[['Product Icon']]

# Load datasets
interactions, purchase_history, product_details = load_datasets(interactions_path, purchase_history_path, product_details_path)

# Merge datasets
merged_data = merge_datasets(purchase_history, product_details)

# Create user-item matrix
user_item_matrix = create_user_item_matrix(merged_data)

# Convert user-item matrix to sparse matrix
user_item_sparse = convert_to_sparse_matrix(user_item_matrix)

try:
    # Load precomputed SVD results and mapping dictionary
    U, sigma, Vt, id_to_index_mapping = load_results()
    print("Loaded precomputed SVD results and mapping dictionary.")
except FileNotFoundError:
    # Perform SVD decomposition
    U, sigma, Vt = perform_svd(user_item_sparse)

    # Create mapping dictionary from customer IDs to row indices
    customer_ids = user_item_matrix.index
    id_to_index_mapping = {customer_id: idx for idx, customer_id in enumerate(customer_ids)}

    # Save SVD results and mapping dictionary
    save_results(U, sigma, Vt, id_to_index_mapping)
    print("Computed SVD results and created mapping dictionary.")

# Make predictions
all_user_predicted_ratings = np.dot(np.dot(U, sigma), Vt)

# Example usage
recommendations = recommend_products(user_id=4126209, all_user_predicted_ratings=all_user_predicted_ratings, user_item_matrix=user_item_matrix, product_details=product_details, id_to_index_mapping=id_to_index_mapping, num_recommendations=5)
print(recommendations)


Computed SVD results and created mapping dictionary.
User ID not found in the mapping dictionary.
None


In [7]:
from sklearn.metrics import mean_squared_error

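# Calculate RMSE between the user-item matrix and its SVD reconstruction.
# Unrated cells count as 0, so this is reconstruction error on the training data, not held-out accuracy.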
rmse = np.sqrt(mean_squared_error(user_item_matrix.values, all_user_predicted_ratings))

# Print RMSE
print("Root Mean Squared Error (RMSE):", rmse)

Root Mean Squared Error (RMSE): 0.502364716890839
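
The RMSE above averages over every cell of the matrix, including the zeros that stand for "not rated". As a point of comparison, the sketch below computes RMSE both ways on a small made-up example; the `actual` and `predicted` arrays are assumptions for illustration only.

```python
import numpy as np

# Hypothetical actual ratings (0 = not rated) and model predictions
actual = np.array([
    [5.0, 0.0, 3.0],
    [0.0, 4.0, 0.0],
])
predicted = np.array([
    [4.5, 0.4, 3.5],
    [0.2, 3.0, 0.1],
])

# RMSE over every cell, zeros included (what the cell above computes)
rmse_all = np.sqrt(np.mean((actual - predicted) ** 2))

# RMSE over rated cells only
observed = actual > 0
rmse_observed = np.sqrt(np.mean((actual[observed] - predicted[observed]) ** 2))

print(f"all cells: {rmse_all:.3f}, rated cells only: {rmse_observed:.3f}")
```

When most cells are unrated, the all-cells figure is dominated by how well the model reproduces zeros, so it can look better than the error on actual ratings.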
