<a href="https://colab.research.google.com/github/novrian6/ml_product_prediction/blob/main/Complete_Product_Recommendation_using_SVD.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Programmed by Nova Novriansyah
### Created at: 26 December 2023 22:30
### Note:
##### To run this notebook, make sure merged_data.csv is available. The file can be found in the Git repository and uploaded to Colab.
#### 1. The following code builds a model from the merged data (merged_data.csv). The data merges three files (product_details, purchase_history, customer_interactions) and flattens them for ML model training.
#### 2. ML algorithm: Given the small amount of data and the columns available, the algorithm used is collaborative filtering with SVD (Singular Value Decomposition).
#### 3. Singular Value Decomposition (SVD) is a mathematical technique used for matrix factorization. In recommendation systems and collaborative filtering, SVD factorizes a user-item interaction matrix into lower-dimensional matrices that capture latent features or preferences. Here the latent features are learned from the observed ratings.
#### 4. Deep learning is not practical in this case because the amount of data is too small. With more data, a deep learning model (RNN/LSTM) might give more accurate predictions.
#### 5. The trained model is saved as collab_filtering_model.pkl.
#### 6. The model reaches a test RMSE of 0.3162 on a 1 to 5 rating scale. With the six rows shown below and test_size=0.2, the test set holds at most two ratings, so this figure is a sanity check rather than a reliable accuracy estimate.
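
To make note 3 concrete, the sketch below shows how a biased matrix-factorization model of this kind scores one user-item pair: global mean, plus a user bias, plus an item bias, plus the dot product of two latent vectors, clipped to the rating scale. The function name `predict_rating` and every number here are illustrative assumptions, not values learned by this notebook.

```python
def predict_rating(mu, b_u, b_i, p_u, q_i, scale=(1.0, 5.0)):
    """Score one user-item pair: mu + b_u + b_i + dot(q_i, p_u), clipped to the scale."""
    raw = mu + b_u + b_i + sum(q * p for q, p in zip(q_i, p_u))
    low, high = scale
    return min(high, max(low, raw))

# Hypothetical values, for illustration only
mu = 4.0            # global mean rating
b_u = 0.5           # this user rates higher than average
b_i = -0.25         # this item is rated lower than average
p_u = [0.5, -1.0]   # user latent factors
q_i = [1.0, 0.25]   # item latent factors

estimate = predict_rating(mu, b_u, b_i, p_u, q_i)
print(estimate)  # 4.5
```

When a user or item has no learned bias or factors, those terms drop out and the estimate moves toward the global mean.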




## 1. Load libraries

In [5]:
# Run the line below if surprise is not installed (comment it out otherwise)
!pip install surprise

Collecting surprise
  Downloading surprise-0.1-py2.py3-none-any.whl (1.8 kB)
Collecting scikit-surprise (from surprise)
  Downloading scikit-surprise-1.1.3.tar.gz (771 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.3-cp310-cp310-linux_x86_64.whl size=3163752 sha256=0bd5603d3218f3920f2dbe7cf1e10a9b4f61df90ceb88b4ea897fd1207c406c1
  Stored in directory: /root/.cache/pip/wheels/a5/ca/a8/4e28def53797fdc4363ca4af740db15a9c2f1595ebc51fb445
Successfully built scikit-surprise
Installing collected packages: scikit-surprise, surprise
Successfully installed scikit-surprise-1.1.3 surprise-0.1


In [10]:
from surprise import Dataset
from surprise import Reader
from surprise import SVD
from surprise.model_selection import train_test_split
from surprise import accuracy
import pandas as pd

## 2. Load and preprocess data as a DataFrame

In [11]:
# Load the data as a DataFrame
# The data merges three files (product_details, purchase_history, customer_interactions), flattened for ML model training.
df = pd.read_csv("merged_data.csv")
df


| Unnamed: 0 | customer_id | product_id | purchase_date | page_views | time_spent | category | price | ratings |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 101 | 2023-01-01 | 25 | 120 | Electronics | 500 | 4.5 |
| 1 | 1 | 105 | 2023-01-05 | 25 | 120 | Electronics | 800 | 4.8 |
| 2 | 2 | 102 | 2023-01-02 | 20 | 90 | Clothing | 50 | 3.8 |
| 3 | 3 | 103 | 2023-01-03 | 30 | 150 | Home & Kitchen | 200 | 4.2 |
| 4 | 4 | 104 | 2023-01-04 | 15 | 80 | Beauty | 30 | 4.0 |
| 5 | 5 | 101 | 2023-01-05 | 22 | 110 | Electronics | 500 | 4.5 |


## 3. Preprocess the data and create the train and test sets

In [14]:
# Load the DataFrame into a Surprise Dataset.
# Surprise expects three columns in this order: user ID (customer_id), item ID (product_id), rating (ratings).
# SVD learns latent factors that represent customer preferences from these ratings.
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['customer_id', 'product_id', 'ratings']], reader)



In [15]:
# Split the dataset into training and testing sets
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)
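
With so few rows, it helps to work out how many ratings land in each split. The arithmetic below assumes a fractional `test_size` is rounded up to a whole number of ratings, which matches the outputs later in this notebook; the variable names are illustrative.

```python
import math

n_ratings = 6      # rows in the DataFrame loaded above
test_size = 0.2    # fraction passed to train_test_split

# Assumption: a fractional test size is rounded up to a whole number of ratings
n_test = math.ceil(test_size * n_ratings)
n_train = n_ratings - n_test

print(n_train, n_test)  # 4 2
```

Every metric reported on this split is therefore based on two ratings.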


## 4. Create and train the model using SVD (Singular Value Decomposition)

In [16]:
# Use the SVD algorithm for collaborative filtering. SVD was chosen because it is simple and can be trained on the small amount of data available.
# With more data, an improved model could use deep learning (RNN/LSTM).
model = SVD()
model.fit(trainset)


<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7d0023e88a90>
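
For intuition about what `fit` does, here is a minimal pure-Python sketch of matrix factorization trained with stochastic gradient descent, the general family that Surprise's `SVD` belongs to. It is not the library's implementation: the function name `fit_mf`, the hyperparameter values, and the initialization are assumptions chosen for readability, and the four training rows are an assumed split of the table above.

```python
import random

def fit_mf(ratings, n_factors=2, n_epochs=50, lr=0.01, reg=0.02, seed=0):
    """Fit biases and latent factors to (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    b_u = {u: 0.0 for u in users}
    b_i = {i: 0.0 for i in items}
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        # Unknown users or items add nothing beyond the terms that are known
        est = mu + b_u.get(u, 0.0) + b_i.get(i, 0.0)
        if u in p and i in q:
            est += sum(pf * qf for pf, qf in zip(p[u], q[i]))
        return est

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            # Gradient step on the regularized squared error
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return mu, predict

# Assumed training split: four of the six ratings from the table above
train = [(1, 101, 4.5), (1, 105, 4.8), (2, 102, 3.8), (5, 101, 4.5)]
mu, predict = fit_mf(train)

print(round(mu, 2))
print(predict(4, 104))  # user 4 and item 104 are unseen: global mean only
```

In this sketch, a pair whose user and item are both absent from the training data gets the global mean as its estimate.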

## 5. Test and evaluate the model using test data

In [17]:
# Predictions on test set
test_predictions = model.test(testset)

# Evaluate on the test set with RMSE (root mean squared error; lower is better). This run gives 0.3162.
test_rmse = accuracy.rmse(test_predictions)
print(f"Test RMSE: {test_rmse}")

RMSE: 0.3162
Test RMSE: 0.3162277660168382
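
RMSE is the square root of the mean squared difference between actual and predicted ratings. The sketch below computes it by hand for two (actual, predicted) pairs. The notebook does not print the test set, so the specific pairs are an assumption; they are one combination consistent with the outputs shown here.

```python
import math

def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    squared_errors = [(actual - predicted) ** 2 for actual, predicted in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Assumed held-out ratings 4.2 and 4.0, each predicted as 4.4
pairs = [(4.2, 4.4), (4.0, 4.4)]

print(round(rmse(pairs), 4))  # 0.3162
```

Because the score averages over so few pairs, changing the split or the random seed could move it substantially.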


## 6. Test the model's predictions


In [18]:
# Make predictions for a particular user (example: customer_id = 4)
user_id = 4
user_predictions = []
for item_id in df['product_id'].unique():
    pred = model.predict(user_id, item_id)
    user_predictions.append({
        'user_id': user_id,
        'item_id': item_id,
        'predicted_rating': pred.est
    })

# Display predictions for the user
user_predictions_df = pd.DataFrame(user_predictions)
print(user_predictions_df)

   user_id  item_id  predicted_rating
0        4      101          4.410539
1        4      105          4.429267
2        4      102          4.347228
3        4      103          4.400000
4        4      104          4.400000
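
Predicted ratings become recommendations once they are ranked and the items the user already bought are filtered out. The sketch below does this in plain Python with the values printed above; the helper `top_n` and the `already_purchased` set are illustrative assumptions (customer 4 bought product 104 in the data shown earlier).

```python
# Predicted ratings for user 4, copied from the output above
user_predictions = [
    {'user_id': 4, 'item_id': 101, 'predicted_rating': 4.410539},
    {'user_id': 4, 'item_id': 105, 'predicted_rating': 4.429267},
    {'user_id': 4, 'item_id': 102, 'predicted_rating': 4.347228},
    {'user_id': 4, 'item_id': 103, 'predicted_rating': 4.400000},
    {'user_id': 4, 'item_id': 104, 'predicted_rating': 4.400000},
]
already_purchased = {104}

def top_n(predictions, exclude, n=3):
    """Return the n highest-rated items that are not in the exclude set."""
    candidates = [p for p in predictions if p['item_id'] not in exclude]
    return sorted(candidates, key=lambda p: p['predicted_rating'], reverse=True)[:n]

recommended = [p['item_id'] for p in top_n(user_predictions, already_purchased)]
print(recommended)  # [105, 101, 103]
```

The predicted ratings differ by less than 0.1, so this ranking should not be read as a strong preference signal.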


## 7. Save the model

In [19]:
# Save the trained model to disk with pickle
import pickle
model_filename = 'collab_filtering_model.pkl'
with open(model_filename, 'wb') as file:
    pickle.dump(model, file)

# The code below is for testing purposes

### 1. Load the saved model

In [20]:
# Load the model from the file saved in step 7
model_filename = 'collab_filtering_model.pkl'
with open(model_filename, 'rb') as file:
    loaded_model = pickle.load(file)

# Now you can use loaded_model for predictions or other tasks
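
A quick way to check a save-and-load cycle is to compare the restored object with the original. The sketch below does that with a plain dictionary standing in for model state; the dictionary contents and the file name `model_state.pkl` are hypothetical. Note that `pickle.load` can execute arbitrary code, so only load files from a trusted source.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a trained model's state
model_state = {'global_mean': 4.4, 'item_bias': {101: 0.01, 105: 0.03}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'model_state.pkl')
    # Write the object to disk, then read it back
    with open(path, 'wb') as file:
        pickle.dump(model_state, file)
    with open(path, 'rb') as file:
        restored = pickle.load(file)

print(restored == model_state)  # True
```

For a trained model, the analogous check is that the loaded model returns the same predictions as the original.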


### 2. Use the loaded model to predict


In [21]:
# Make predictions for a particular user (example: customer_id = 4) using the loaded model
user_id = 4
user_predictions = []
for item_id in df['product_id'].unique():
    pred = loaded_model.predict(user_id, item_id)
    user_predictions.append({
        'user_id': user_id,
        'item_id': item_id,
        'predicted_rating': pred.est
    })

# Display predictions for the user
user_predictions_df = pd.DataFrame(user_predictions)
print(user_predictions_df)

   user_id  item_id  predicted_rating
0        4      101          4.410539
1        4      105          4.429267
2        4      102          4.347228
3        4      103          4.400000
4        4      104          4.400000
