<a href="https://colab.research.google.com/github/reckn/super-disco/blob/main/butimtoobusyonworkingdays.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Collaborative Recommendation System

### Overview
This code is all about recommendation systems! 🎉 It builds a LightFM recommendation model from customer purchase history. The customer interactions file is loaded as well, but only the purchase history is used for training.

### Functions Explained:

#### 1. `load_data(file_path)`
   - **Purpose**: Loads data from CSV files.
   - **Input**: `file_path` - Path to the directory containing CSV files.
   - **Output**: Three DataFrames: `interactions`, `purchase_history`, and `product_details`.
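
`load_data` builds each file name by plain string concatenation (`file_path + '...csv'`), so `file_path` has to end with a slash. Below is a minimal sketch of a more forgiving alternative based on `pathlib`. The helper name `csv_paths` and the `CSV_NAMES` dictionary are hypothetical and not part of the notebook; only the three file names come from it.

```python
from pathlib import Path

# File names as used in the notebook's load_data().
CSV_NAMES = {
    "interactions": "customer_interactions_synthetic.csv",
    "purchase_history": "purchase_history_synthetic.csv",
    "product_details": "product_details_synthetic.csv",
}

def csv_paths(data_dir):
    """Return one Path per expected CSV file (hypothetical helper)."""
    base = Path(data_dir)
    return {key: base / name for key, name in CSV_NAMES.items()}

# Works the same with or without a trailing slash on the directory.
paths = csv_paths("/content/drive/MyDrive/fair_dataset")
print(paths["purchase_history"])
```

Each resulting `Path` can be passed straight to `pd.read_csv`.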
   
#### 2. `preprocess_data(purchase_history, product_details, save_path)`
   - **Purpose**: Prepares the purchase history data for modeling.
   - **Inputs**:
       - `purchase_history`: DataFrame containing purchase history.
       - `product_details`: DataFrame containing product details.
       - `save_path`: Path to save the preprocessed dataset.
   - **Output**: Preprocessed purchase history DataFrame.

#### 3. `build_interaction_matrix(purchase_history, save_path)`
   - **Purpose**: Constructs an interaction matrix from purchase history data.
   - **Inputs**:
       - `purchase_history`: DataFrame containing purchase history.
       - `save_path`: Path to save the sparse interaction matrix.
   - **Output**: Sparse interaction matrix.
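
To make the idea concrete, here is a small pure-Python sketch of what an interaction matrix encodes: each distinct customer and product gets a consecutive integer index, and every purchase becomes a `(row, col, 1)` entry. It illustrates the concept only and is not LightFM's implementation; `build_index` and `to_triplets` are hypothetical names, and the sample IDs are made up.

```python
def build_index(values):
    """Map each distinct value to a consecutive integer, in order of first appearance."""
    index = {}
    for value in values:
        if value not in index:
            index[value] = len(index)
    return index

def to_triplets(purchases):
    """Turn (customer_id, product_id) pairs into sorted, de-duplicated (row, col, 1) triplets."""
    user_index = build_index(customer for customer, _ in purchases)
    item_index = build_index(product for _, product in purchases)
    cells = {(user_index[customer], item_index[product]) for customer, product in purchases}
    triplets = sorted((row, col, 1) for row, col in cells)
    return user_index, item_index, triplets

# Made-up sample purchases: (Customer ID, Product ID)
purchases = [(501, 11), (501, 56), (777, 11), (501, 11)]
user_index, item_index, triplets = to_triplets(purchases)

print(user_index)  # {501: 0, 777: 1}
print(item_index)  # {11: 0, 56: 1}
print(triplets)    # [(0, 0, 1), (0, 1, 1), (1, 0, 1)]
```

The row and column numbers are internal indices, not the original IDs, which is why a raw `Customer ID` cannot be used directly as a row of the matrix.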

#### 4. `train_model(interactions_matrix, epochs=10)`
   - **Purpose**: Trains a LightFM model using the interaction matrix.
   - **Inputs**:
       - `interactions_matrix`: Sparse interaction matrix.
       - `epochs`: Number of training iterations (default is 10).
   - **Output**: Trained LightFM model.

#### 5. `save_model(model, save_path)`
   - **Purpose**: Saves the trained LightFM model to disk.
   - **Inputs**:
       - `model`: Trained LightFM model.
       - `save_path`: Path to save the trained model.
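
Because the model is stored with `pickle`, loading it back is the mirror image of saving it. The round trip can be sketched as follows, using a plain dictionary as a stand-in for the trained model and a temporary directory instead of Google Drive (both are assumptions made for illustration; `save_object` and `load_object` are hypothetical names).

```python
import os
import pickle
import tempfile

def save_object(obj, path):
    """Serialize obj to path with pickle (the same approach save_model uses)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load_object(path):
    """Load a pickled object back from path."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Stand-in for a trained model; this dictionary is purely illustrative.
fake_model = {"loss": "warp", "epochs": 10}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_file = os.path.join(tmp_dir, "lightfm_model.pkl")
    save_object(fake_model, model_file)
    restored = load_object(model_file)

print(restored == fake_model)  # True
```

Only unpickle files you created or trust, since loading a pickle can execute arbitrary code.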

#### 6. `main()`
   - **Purpose**: Orchestrates the entire process of data loading, preprocessing, model training, and saving.
   - **Steps**:
       1. Load data.
       2. Preprocess data.
       3. Build interaction matrix.
       4. Train model.
       5. Save model.

### How to Use:
1. Make sure you have CSV files containing customer interactions, purchase history, and product details.
2. Set the paths for your data files and where you want to save the results.
3. Run `main()` and let the magic happen! 🚀

### Additional Notes:
- If you need to use the interaction matrix or purchase history again, there are commented-out lines to help you reload them.
- Feel free to tweak the number of training epochs for the model by changing the `epochs` parameter in `train_model()`.


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
pip install lightfm

Collecting lightfm
  Downloading lightfm-1.17.tar.gz (316 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 316.4/316.4 kB 5.0 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: lightfm
  Building wheel for lightfm (setup.py) ... done
  Created wheel for lightfm: filename=lightfm-1.17-cp310-cp310-linux_x86_64.whl size=808330 sha256=d837701c01f78934c8a58c2cfa5d10db98f5da4e114472cbc84aab8c71c9eda9
  Stored in directory: /root/.cache/pip/wheels/4f/9b/7e/0b256f2168511d8fa4dae4fae0200fdbd729eb424a912ad636
Successfully built lightfm
Installing collected packages: lightfm
Successfully installed lightfm-1.17


In [3]:
import pandas as pd
from scipy.sparse import save_npz, load_npz
from lightfm import LightFM
from lightfm.data import Dataset
from lightfm.cross_validation import random_train_test_split
import pickle

def load_data(file_path):
    """
    Load data from CSV files.

    Args:
    - file_path (str): Path to the directory containing CSV files.

    Returns:
    - interactions (DataFrame): DataFrame containing customer interactions.
    - purchase_history (DataFrame): DataFrame containing purchase history.
    - product_details (DataFrame): DataFrame containing product details.
    """
    interactions = pd.read_csv(file_path + 'customer_interactions_synthetic.csv')
    purchase_history = pd.read_csv(file_path + 'purchase_history_synthetic.csv')
    product_details = pd.read_csv(file_path + 'product_details_synthetic.csv')
    return interactions, purchase_history, product_details

def preprocess_data(purchase_history, product_details, save_path):
    """
    Preprocess purchase history data.

    Args:
    - purchase_history (DataFrame): DataFrame containing purchase history.
    - product_details (DataFrame): DataFrame containing product details.
    - save_path (str): Path to save preprocessed dataset.

    Returns:
    - purchase_history (DataFrame): Preprocessed purchase history DataFrame.
    """
    purchase_history = pd.merge(purchase_history, product_details, on='Product ID')
    purchase_history.to_csv(save_path + 'preprocessed_dataset.csv', index=False)
    return purchase_history

def build_interaction_matrix(purchase_history, save_path):
    """
    Build interaction matrix from purchase history data.

    Args:
    - purchase_history (DataFrame): DataFrame containing purchase history.
    - save_path (str): Path to save sparse interaction matrix.

    Returns:
    - interactions_matrix (sparse matrix): Sparse interaction matrix.
    """
    dataset = Dataset()
    dataset.fit(users=purchase_history['Customer ID'], items=purchase_history['Product ID'])
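    # Build (user, item, weight) triples directly from the two ID columns,
    # which avoids the per-row overhead of DataFrame.iterrows()
    interactions_matrix, _ = dataset.build_interactions(
        (customer_id, product_id, 1)
        for customer_id, product_id in zip(purchase_history['Customer ID'], purchase_history['Product ID'])
    )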
    save_npz(save_path + 'interactions_matrix_sparse.npz', interactions_matrix)
    return interactions_matrix

def train_model(interactions_matrix, epochs=10):
    """
    Train LightFM model.

    Args:
    - interactions_matrix (sparse matrix): Sparse interaction matrix.
    - epochs (int): Number of training epochs.

    Returns:
    - model (LightFM model): Trained LightFM model.
    """
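    # Hold out 20% of the interactions; the evaluation step reuses the same
    # test_percentage and random_state so that it sees the same split
    train_interactions, _ = random_train_test_split(interactions_matrix, test_percentage=0.2, random_state=42)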
    model = LightFM(loss='warp')
    model.fit(train_interactions, epochs=epochs)
    return model

def save_model(model, save_path):
    """
    Save trained LightFM model to disk.

    Args:
    - model (LightFM model): Trained LightFM model.
    - save_path (str): Path to save trained model.
    """
    with open(save_path + 'lightfm_model.pkl', 'wb') as f:
        pickle.dump(model, f)

def main():
    """
    Main function to orchestrate data loading, preprocessing, model training, and saving.
    """
    file_path = '/content/drive/MyDrive/fair_dataset/'
    save_path = '/content/drive/MyDrive/fair_dataset/'

    # Load data
    interactions, purchase_history, product_details = load_data(file_path)

    # Preprocess data
    purchase_history = preprocess_data(purchase_history, product_details, save_path)

    # Build interaction matrix
    interactions_matrix = build_interaction_matrix(purchase_history, save_path)

    # Train model
    model = train_model(interactions_matrix)

    # Save model
    save_model(model, save_path)

    # Additional steps
    # If needed, load interaction matrix and purchase history again
    # interactions_matrix = load_npz(save_path + 'interactions_matrix_sparse.npz')
    # purchase_history = pd.read_csv(save_path + 'preprocessed_dataset.csv')

if __name__ == "__main__":
    main()


## Evaluation of Recommendation System Documentation

### Overview
This code evaluates the performance of a recommendation system built using LightFM. It assesses the model's ability to predict user-item interactions and provides evaluation metrics such as RMSE (Root Mean Squared Error) and AUC (Area Under the Curve) scores.

### Functions Explained:

#### 1. Load Data and Model
   - **Purpose**: Load preprocessed dataset and trained model.
   - **Inputs**: None as parameters; the file paths on Google Drive are hard-coded in the script.
   - **Output**: Loaded `purchase_history` DataFrame and trained `model`.

#### 2. Split Data
   - **Purpose**: Split interactions matrix into train and test sets.
   - **Inputs**: The `interactions_matrix` loaded from disk, split with `test_percentage=0.2` and `random_state=42`.
   - **Output**: `train_interactions` and `test_interactions` sparse matrices.
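
The evaluation only makes sense if it sees the same train/test split that was used during training, which is why both places pass the same `random_state`. Here is a minimal sketch of that idea in pure Python; `split_interactions` is a simplified, hypothetical stand-in and not the actual implementation of `random_train_test_split`, and the triplets are made up.

```python
import random

def split_interactions(triplets, test_percentage=0.2, seed=42):
    """Shuffle triplets with a fixed seed and split them into (train, test) lists."""
    shuffled = list(triplets)
    random.Random(seed).shuffle(shuffled)
    n_test = round(test_percentage * len(shuffled))
    cutoff = len(shuffled) - n_test
    return shuffled[:cutoff], shuffled[cutoff:]

# Made-up (row, col, weight) interactions: 5 users x 2 items
triplets = [(user, item, 1) for user in range(5) for item in range(2)]

train_a, test_a = split_interactions(triplets, test_percentage=0.2, seed=42)
train_b, test_b = split_interactions(triplets, test_percentage=0.2, seed=42)

print(len(train_a), len(test_a))  # 8 2
print(train_a == train_b)         # True: same seed, same split
```

Changing the seed or the test percentage in only one of the two places would let training interactions leak into the test set.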

#### 3. Predict Scores
   - **Purpose**: Predict preference scores for train and test sets using the trained model.
   - **Inputs**: The trained `model` and the row/column indices of the train and test interactions.
   - **Output**: Predicted scores for train and test sets.

#### 4. Compute RMSE (Root Mean Squared Error)
   - **Purpose**: Calculate RMSE for train and test sets to evaluate model performance.
   - **Inputs**: Predicted scores and actual interactions data.
   - **Output**: RMSE for train and test sets.

#### 5. Compute AUC (Area Under the Curve)
   - **Purpose**: Calculate AUC score for train and test sets to evaluate model performance.
   - **Inputs**: Trained model and interaction matrices.
   - **Output**: AUC score for train and test sets.

#### 6. Print Evaluation Metrics
   - **Purpose**: Output evaluation metrics (RMSE and AUC) for train and test sets.
   - **Inputs**: RMSE and AUC scores.
   - **Output**: Printed evaluation metrics.

### How to Use:
1. Ensure you have the preprocessed dataset, trained model, and interaction matrix saved.
2. Run the script to evaluate the model's performance.
3. Check the printed RMSE and AUC scores for train and test sets.

### Additional Notes:
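- RMSE measures the difference between predicted and actual values, with lower values indicating better performance. Because the model is trained with the WARP ranking loss, its scores are unbounded ranking scores rather than predicted ratings, so an RMSE against interaction values that are all 1 is hard to interpret; AUC is the more informative metric here.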
- AUC measures the model's ability to discriminate between positive and negative interactions, with higher values indicating better performance.
- Adjust the test percentage in data splitting by modifying the `test_percentage` parameter in `random_train_test_split`.
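
As a worked example of the two metrics, the sketch below computes RMSE and a per-user AUC by hand on made-up scores. The AUC here uses the pairwise definition (the fraction of positive/negative item pairs in which the positive item gets the higher score); `rmse` and `user_auc` are hypothetical helper names, not functions from LightFM or scikit-learn.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def user_auc(scores, positives):
    """Fraction of (positive, negative) item pairs ranked correctly; ties count as half."""
    pos = [s for i, s in enumerate(scores) if i in positives]
    neg = [s for i, s in enumerate(scores) if i not in positives]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up scores for four items; items 0 and 2 are the user's known purchases.
scores = [2.0, -1.0, 0.5, 3.0]
positives = {0, 2}

print(user_auc(scores, positives))                    # 0.5
print(rmse([1, 1], [scores[0], scores[2]]))
```

An AUC of 0.5 means the ranking is no better than chance, while 1.0 means every purchased item is ranked above every non-purchased one.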


In [4]:
import pandas as pd
import numpy as np
from lightfm import LightFM
from lightfm.data import Dataset
from lightfm.cross_validation import random_train_test_split
from lightfm.evaluation import auc_score
from sklearn.metrics import mean_squared_error
import pickle
from scipy.sparse import load_npz

# Load preprocessed dataset
purchase_history = pd.read_csv('/content/drive/MyDrive/fair_dataset/preprocessed_dataset.csv')

# Load the sparse matrix containing user-item interactions
interactions_matrix = load_npz('/content/drive/MyDrive/fair_dataset/interactions_matrix_sparse.npz')

# Load trained model from pickle file
with open('/content/drive/MyDrive/fair_dataset/lightfm_model.pkl', 'rb') as f:
    model = pickle.load(f)

# Split data into train and test sets
# Randomly split interactions matrix into train and test sets
train_interactions, test_interactions = random_train_test_split(interactions_matrix, test_percentage=0.2, random_state=42)

# Predict scores for training and test sets
# Predict the preference scores for interactions in both train and test sets using the trained model
train_scores = model.predict(train_interactions.row, train_interactions.col)
test_scores = model.predict(test_interactions.row, test_interactions.col)

# Compute RMSE (Root Mean Squared Error)
# Calculate RMSE for both train and test sets to evaluate the model's performance
train_rmse = np.sqrt(mean_squared_error(train_interactions.data, train_scores))
test_rmse = np.sqrt(mean_squared_error(test_interactions.data, test_scores))

# Compute AUC (Area Under the Curve)
# Calculate AUC score for both train and test sets to evaluate the model's performance
train_auc = auc_score(model, train_interactions).mean()
test_auc = auc_score(model, test_interactions).mean()

# Print RMSE and AUC
# Output the evaluation metrics: RMSE and AUC scores for both train and test sets
print("Train RMSE:", train_rmse)
print("Test RMSE:", test_rmse)
print("Train AUC Score:", train_auc)
print("Test AUC Score:", test_auc)


Train RMSE: 2.5414734892967394
Test RMSE: 2.868328441671248
Train AUC Score: 0.67846113
Test AUC Score: 0.49914727


## Recommendation Generation Documentation

### Overview
This code generates personalized product recommendations for a given user using a trained LightFM model. It utilizes user-item interaction data and product details to recommend products based on the user's preferences.

### Functions Explained:

#### 1. `load_model(model_path)`
   - **Purpose**: Load a trained LightFM model from a pickle file.
   - **Inputs**:
       - `model_path` (str): Path to the pickle file containing the trained model.
   - **Output**: Trained LightFM model.

#### 2. `recommend_products(model, user_id, interactions_matrix, product_details, num_recommendations=5)`
   - **Purpose**: Generate recommendations for a given user.
   - **Inputs**:
       - `model` (LightFM): Trained LightFM model.
       - `user_id` (int): Row index of the user in the interactions matrix (not necessarily the original `Customer ID`).
       - `interactions_matrix` (sparse matrix): Sparse matrix representing user-item interactions.
       - `product_details` (DataFrame): DataFrame containing product details.
       - `num_recommendations` (int): Number of recommendations to generate (default is 5).
   - **Output**: DataFrames containing the user's known interactions (`known_positives`) and top recommended products (`top_items`).

### How to Use:
1. Ensure you have the trained LightFM model, interactions matrix, and product details dataset saved.
2. Specify the user ID for whom recommendations will be generated (`user_id`).
3. Run the script to obtain recommendations.
4. Check the printed outputs for the user's known interactions and top recommended products.

### Additional Notes:
- Adjust the number of recommendations to generate by modifying the `num_recommendations` parameter in `recommend_products`.
- The provided `user_id` must be the internal row index used in the interactions matrix, which is not necessarily the same as the original `Customer ID`.
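
The translation between original IDs and internal indices can be sketched in pure Python as shown here. The dictionaries and scores are made up, and `recommend` is a hypothetical helper. With LightFM, comparable ID-to-index dictionaries can be obtained from `Dataset.mapping()`, assuming the `Dataset` object used to build the matrix is still available (the notebook does not save it).

```python
def recommend(scores, item_index, known_items, num_recommendations=5):
    """Return original product IDs ranked by score, skipping already-known items."""
    index_to_item = {idx: item for item, idx in item_index.items()}
    ranked = sorted(range(len(scores)), key=lambda idx: scores[idx], reverse=True)
    recs = [index_to_item[idx] for idx in ranked if index_to_item[idx] not in known_items]
    return recs[:num_recommendations]

# Made-up mappings from original IDs to internal matrix indices.
user_index = {501: 0, 777: 1}
item_index = {11: 0, 56: 1, 87: 2, 158: 3}

customer_id = 777
row = user_index[customer_id]    # internal index, the value a model would be queried with
scores = [0.9, 0.1, 1.5, -0.3]   # made-up scores, one per internal item index

print(row)                                                                      # 1
print(recommend(scores, item_index, known_items={11}, num_recommendations=2))  # [87, 56]
```

Filtering out `known_items` is a design choice of this sketch; the notebook's `recommend_products` returns the top-scored products without removing ones the user already bought.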


In [13]:
import pandas as pd
import numpy as np
from lightfm import LightFM
from lightfm.data import Dataset
from lightfm.cross_validation import random_train_test_split
from scipy.sparse import load_npz
import pickle

def load_model(model_path):
    """
    Load a trained LightFM model from a pickle file.

    Args:
    - model_path (str): Path to the pickle file containing the trained model.

    Returns:
    - model (LightFM): Trained LightFM model.
    """
    with open(model_path, 'rb') as f:
        model = pickle.load(f)
    return model

def recommend_products(model, user_id, interactions_matrix, product_details, num_recommendations=5):
    """
    Generate recommendations for a given user.

    Args:
    - model (LightFM): Trained LightFM model.
    - user_id (int): Row index of the user in interactions_matrix (not necessarily the original Customer ID).
    - interactions_matrix (sparse matrix): Sparse matrix representing user-item interactions.
    - product_details (DataFrame): DataFrame containing product details.
    - num_recommendations (int): Number of recommendations to generate.

    Returns:
    - known_positives (DataFrame): DataFrame containing the user's known interactions.
    - top_items (DataFrame): DataFrame containing the top recommended products.
    """
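    # Get the user's known interactions.
    # Note: the column numbers are the matrix's internal item indices; using them as
    # row positions in product_details assumes items were indexed in the same order as its rows.
    known_positives = product_details.iloc[interactions_matrix.tocsr()[user_id].indices]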

    # Predict scores for all products
    num_items = interactions_matrix.shape[1]
    scores = model.predict(user_id, np.arange(num_items))

    # Rank the products based on scores
    top_items = product_details.iloc[np.argsort(-scores)][:num_recommendations]

    return known_positives, top_items

# Load the trained LightFM model
model_path = '/content/drive/MyDrive/fair_dataset/lightfm_model.pkl'
model = load_model(model_path)

# Load interactions matrix
interactions_matrix = load_npz('/content/drive/MyDrive/fair_dataset/interactions_matrix_sparse.npz')

# Load product details
product_details = pd.read_csv('/content/drive/MyDrive/fair_dataset/product_details_synthetic.csv')

# User ID to make recommendations for
user_id = 2665  # Note: this is the internal row index in interactions_matrix, not the actual Customer ID (this is driving me nuts).

# Get recommendations for the user
known_positives, top_items = recommend_products(model, user_id, interactions_matrix, product_details)

print("User's known interactions:")
print(known_positives)

print("\nTop recommended products:")
print(top_items)


User's known interactions:
     Product ID                              Category        Price  Ratings  \
10           11           Home and Kitchen Appliances   498.231012      1.9   
55           56      Consumer Electronics Accessories   115.233923      4.8   
86           87      Consumer Electronics Accessories    21.833070      2.8   
157         158           Home and Kitchen Appliances  1931.479308      4.2   
188         189       Apparel and Fashion Accessories    48.677301      2.1   
196         197     Beauty and Personal Care Products    69.109522      2.6   
207         208                           Electronics  3342.564356      4.8   
237         238  Sporting Goods and Fitness Equipment   904.035733      3.3   
325         326           Home and Kitchen Appliances   319.079902      4.0   
380         381      Consumer Electronics Accessories   199.505981      2.8   
454         455          Health and Wellness Products    33.911870      3.6   
539         540          