# Product Recommender on RetailRocket Dataset

- We use the Cornac package and Microsoft's Recommenders module to train a Bayesian Personalised Ranking (BPR) model on the RetailRocket e-commerce dataset.

- The model learns from user-product interactions, ranks the products for each user, and recommends the top K items.

- The full dataset is large, so we train the model on a subsample in Google Colab.

In [1]:
AUTHORNAME = "Archit Kaila"
COLLABORATORS = "Shrey Gupta, Shen Juin Lee"

In [2]:
## Clone the repository and code base to run Non DRL Recommenders
!git clone https://github.com/architkaila/recommenders_aipi590.git

Cloning into 'recommenders_aipi590'...
remote: Enumerating objects: 76, done.
remote: Counting objects: 100% (76/76), done.
remote: Compressing objects: 100% (50/50), done.
remote: Total 76 (delta 29), reused 64 (delta 20), pack-reused 0
Unpacking objects: 100% (76/76), done.


In [3]:
## Install required libraries (only for google colab)
!pip install cornac
!pip install recommenders

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting cornac
  Downloading cornac-1.14.2-cp38-cp38-manylinux1_x86_64.whl (14.4 MB)
Collecting powerlaw
  Downloading powerlaw-1.5-py3-none-any.whl (24 kB)
Installing collected packages: powerlaw, cornac
Successfully installed cornac-1.14.2 powerlaw-1.5
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting recommenders
  Downloading recommenders-1.1.1-py3-none-any.whl (339 kB)
Collecting scikit-surprise>=1.0.6
  Downloading scikit-surprise-1.1.3.tar.gz (771 kB)
Collecting transformers<5,>=2.5.0
  Downloading transformers-4.25.1-py3-none-any.whl (5.8 MB)
Collecting retrying>=1.3

In [4]:
## Fetch the dataset from S3 Bucket
!wget https://aipi590.s3.amazonaws.com/events.csv -P "/content/recommenders_aipi590/Non_DRL_Recommenders/Dataset_1_Retail_Rocket/"


--2022-12-12 16:01:54--  https://aipi590.s3.amazonaws.com/events.csv
Resolving aipi590.s3.amazonaws.com (aipi590.s3.amazonaws.com)... 52.217.232.201, 52.217.78.4, 52.217.129.249, ...
Connecting to aipi590.s3.amazonaws.com (aipi590.s3.amazonaws.com)|52.217.232.201|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 94237913 (90M) [text/csv]
Saving to: ‘/content/recommenders_aipi590/Non_DRL_Recommenders/Dataset_1_Retail_Rocket/events.csv’


2022-12-12 16:01:57 (27.6 MB/s) - ‘/content/recommenders_aipi590/Non_DRL_Recommenders/Dataset_1_Retail_Rocket/events.csv’ saved [94237913/94237913]



In [5]:
## Import standard libraries
import pandas as pd
import numpy as np

In [6]:
## Import python script to run and evaluate BPR model
from recommenders_aipi590.Non_DRL_Recommenders.bpr_model import run_bpr_model

### Read dataset

In [7]:
## Reading the e-commerce dataset
df = pd.read_csv('/content/recommenders_aipi590/Non_DRL_Recommenders/Dataset_1_Retail_Rocket/events.csv')
df.head()

Unnamed: 0,timestamp,visitorid,event,itemid,transactionid
0,1433221332117,257597,view,355908,
1,1433224214164,992329,view,248676,
2,1433221999827,111016,view,318965,
3,1433221955914,483717,view,253185,
4,1433221337106,951259,view,367447,


In [8]:
## The implicit feedback for each user-item pair can be derived from the event column
df.event.value_counts()

view           2664312
addtocart        69332
transaction      22457
Name: event, dtype: int64

In [9]:
## We take a subsample of our original dataset to train our model
df = df.sample(n=5000, random_state=0)

### Prepare dataset

- The BPR implementation in the Cornac module works on implicit feedback for each user-item pair.

- We prepare the data with negative sampling: if a user has interacted with an item, the rating for that pair is set to 1; otherwise it is set to 0.

- The positive interactions are already present in our dataset; we generate the negative interactions ourselves by pairing every user with every item they have not interacted with.
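
The preparation described above can be sketched on a toy frame. This is a minimal illustration, assuming the column names `userID`, `itemID` and `rating`; the toy values are made up. It builds every user-item pair with `pd.MultiIndex.from_product` and marks observed pairs as 1 and all other pairs as 0, without first materializing a Python list of rows.

```python
import pandas as pd

# Toy positive interactions (hypothetical values, for illustration only)
positives = pd.DataFrame({
    "userID": [1, 1, 2, 3],
    "itemID": [10, 20, 20, 30],
})
positives["rating"] = 1

# Every possible (user, item) pair among the users and items that were seen
all_pairs = pd.MultiIndex.from_product(
    [positives["userID"].unique(), positives["itemID"].unique()],
    names=["userID", "itemID"],
).to_frame(index=False)

# Left-join the observed interactions; pairs without a match get rating 0
toy_dataset = all_pairs.merge(positives, on=["userID", "itemID"], how="left")
toy_dataset["rating"] = toy_dataset["rating"].fillna(0).astype(int)

print(toy_dataset["rating"].value_counts())
```

With 3 users and 3 items the toy frame has 9 rows, 4 of them positive. The same pattern scales to the real subsample, although the number of rows still grows as users times items.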

In [10]:
## Set rating (implicit feedback) to 1 for observed interactions between user and item
df = df[['visitorid', 'itemid']].copy()
df['FEEDBACK'] = 1

# Remove duplicate user-item pairs from our sample
df = df.drop_duplicates()

# Rename the columns for readability
df = df.rename(columns={'visitorid': 'userID', 'itemid': 'itemID', 'FEEDBACK': 'rating'})

df.head()

Unnamed: 0,userID,itemID,rating
125223,1034162,392595,1
2744309,203418,409876,1
1829628,1023669,325670,1
43129,1290714,229204,1
1261516,168692,220360,1


In [11]:
## Obtain the unique items and users present in our dataset to generate negative interactions
item_ids = df['itemID'].unique()
user_ids = df['userID'].unique()

In [12]:
## Build a rating-0 row for every user-item pair; pairs with an observed interaction
## are set back to 1 when this frame is merged with the positive feedback
absent_interactions_feedback = [[user, item, 0] for item in item_ids for user in user_ids]

# Convert prepared data into a dataframe
negative_feedback_df = pd.DataFrame(data=absent_interactions_feedback, columns=["userID", "itemID", "rating"])

negative_feedback_df.head()

Unnamed: 0,userID,itemID,rating
0,1034162,392595,0
1,203418,392595,0
2,1023669,392595,0
3,1290714,392595,0
4,168692,392595,0


In [13]:
## Merge the positive and negative feedback into one single master dataframe
prepared_dataset = pd.merge(negative_feedback_df, df, on=['userID', 'itemID'], how='outer').fillna(0).drop('rating_x', axis = 1)

# Cleaning up the column names
prepared_dataset.rename(columns = {'rating_y': 'rating'}, inplace = True)

prepared_dataset.head()

Unnamed: 0,userID,itemID,rating
0,1034162,392595,1.0
1,203418,392595,0.0
2,1023669,392595,0.0
3,1290714,392595,0.0
4,168692,392595,0.0


In [14]:
## Check number of positive and negative feedback samples
prepared_dataset['rating'].value_counts()

0.0    22284404
1.0        4996
Name: rating, dtype: int64
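
The counts above show how sparse the interaction matrix is. A quick check, using only the numbers printed above plus the user and item counts reported in this notebook's training log (4872 users, 4575 items):

```python
# Counts copied from the value_counts() output above
n_negative = 22_284_404
n_positive = 4_996

# User and item counts as reported in this notebook's training log
n_users, n_items = 4_872, 4_575

total_pairs = n_negative + n_positive
# Every user-item pair appears exactly once in the prepared dataset
assert total_pairs == n_users * n_items

density = n_positive / total_pairs
print(f"total pairs: {total_pairs:,}")
print(f"positive share: {density:.4%}")
```

Only about 0.02% of the pairs are positive, roughly one in every 4,500 rows, which is worth keeping in mind when reading the ranking metrics.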

### Run and Evaluate Product Ranking Model

- We use the Cornac module to train and evaluate a Bayesian Personalised Ranking model
- We set the value for top K to 10 and train our model for 20 epochs
- We set the learning rate to 0.001
- We utilize 80% of our dataset for training and 20% for testing
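
For intuition about what a BPR model optimizes, here is a minimal NumPy sketch of the pairwise update from Rendle et al. (2009), listed in the references. It is not Cornac's implementation and not the `run_bpr_model` script; the toy interaction matrix, the function name `train_bpr_sketch`, and all hyperparameter values are assumptions for illustration. For a user u, an observed item i and an unobserved item j, each step performs gradient ascent on ln σ(x_ui − x_uj) with L2 regularization, where x_ui is the dot product of the user and item factors.

```python
import numpy as np

def train_bpr_sketch(interactions, n_factors=4, epochs=200, lr=0.05, reg=0.01, seed=0):
    """Matrix-factorization BPR trained with SGD on a binary user x item matrix."""
    rng = np.random.default_rng(seed)
    n_users, n_items = interactions.shape
    P = rng.normal(scale=0.1, size=(n_users, n_factors))  # user factors
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))  # item factors
    pos_pairs = np.argwhere(interactions == 1)            # observed (user, item) pairs

    for _ in range(epochs):
        for u, i in rng.permutation(pos_pairs):
            # Sample an item j this user has not interacted with
            j = rng.choice(np.flatnonzero(interactions[u] == 0))
            pu, qi, qj = P[u].copy(), Q[i].copy(), Q[j].copy()
            x_uij = pu @ (qi - qj)
            # Derivative of ln sigmoid(x) with respect to x is sigmoid(-x)
            g = 1.0 / (1.0 + np.exp(x_uij))
            # Ascent step on ln sigmoid(x_uij) - (reg / 2) * ||theta||^2
            P[u] += lr * (g * (qi - qj) - reg * pu)
            Q[i] += lr * (g * pu - reg * qi)
            Q[j] += lr * (-g * pu - reg * qj)
    return P, Q

# Toy data (hypothetical): two groups of users with disjoint item tastes
toy = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])
P, Q = train_bpr_sketch(toy)
scores = P @ Q.T
print(np.round(scores, 2))
```

After training, each user's observed items should score higher than the unobserved ones, which is exactly the ordering the BPR objective rewards. Recommending the top K then amounts to sorting each row of `scores` and keeping the K highest entries.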

In [17]:
## Call our BPR model train and evaluation script on our prepared dataset
result = run_bpr_model(data=prepared_dataset, k=10, epochs=20, learning_rate=0.001, train_size=0.8)

rating_threshold = 1.0
exclude_unknowns = False
---
Training data:
Number of users = 4872
Number of items = 4575
Number of ratings = 17831520
Max rating = 1.0
Min rating = 0.0
Global mean = 0.0
---
Test data:
Number of users = 4872
Number of items = 4575
Number of ratings = 4457880
Number of unknown users = 0
Number of unknown items = 0
---
Total users = 4872
Total items = 4575

[BPR] Training started!


  0%|          | 0/20 [00:00<?, ?it/s]

Optimization finished!

[BPR] Evaluation started!


Ranking:   0%|          | 0/4872 [00:00<?, ?it/s]

In [18]:
## Capture the model metric results on test data
print(result)

    |    MAP |    MRR | NDCG@10 | Train (s) | Test (s)
--- + ------ + ------ + ------- + --------- + --------
BPR | 0.0015 | 0.0015 |  0.0007 |  171.2883 |   5.0060
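
To make the columns in the table above more concrete, the following sketch shows how a reciprocal rank and NDCG@K can be computed for a single user with binary relevance. It is an illustration under assumptions (hypothetical function names and a made-up ranked list), not Cornac's code. Evaluation libraries typically average such per-user values over all test users to report MRR and NDCG@K.

```python
import math

def reciprocal_rank(ranked_items, relevant):
    """Return 1 / rank of the first relevant item, or 0.0 if none is ranked."""
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_items, relevant, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked_items[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

# Made-up example: the only relevant item is ranked third
ranked = ["a", "b", "c", "d", "e"]
relevant = {"c"}
print(reciprocal_rank(ranked, relevant))   # 1/3
print(ndcg_at_k(ranked, relevant, k=5))    # 0.5
```

Both metrics reward placing relevant items near the top of the list, so values close to zero, as in the table above, indicate that held-out positives rarely appear near the top of the ranking.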



# **References**

1. Data Preparation for Collaborative Filtering | Microsoft
https://github.com/microsoft/recommenders/blob/main/examples/01_prepare_data/data_transform.ipynb

2. Cornac Movie Recommendation using BPR | Microsoft
https://github.com/microsoft/recommenders/blob/main/examples/02_model_collaborative_filtering/cornac_bpr_deep_dive.ipynb

3. Bayesian Personalised Ranking (BPR) Evaluation Example | PreferredAI, Cornac
https://github.com/PreferredAI/cornac/blob/master/examples/bpr_netflix.py
https://cornac.preferred.ai/

4. BPR: Bayesian personalized ranking from implicit feedback | Rendle, S., Freudenthaler, C., Gantner, Z., & Schmidt-Thieme, L. (2009, June).
https://arxiv.org/ftp/arxiv/papers/1205/1205.2618.pdf