# Duke University

Course: AIPI590 (Deep Reinforcement Learning Applications)

Fall 2022:


# Final Project: Product recommender for E-commerce

**Dataset**: [Amazon Clothing, Shoes and Jewelry (5-core)](http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Clothing_Shoes_and_Jewelry_5.json.gz)

**Metric**: NDCG

# Dataset Explanation

**Description**:

The full Amazon dataset contains product reviews and metadata, including 142.8 million reviews spanning May 1996 - July 2014.

It includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs).

The subset used here is the Clothing, Shoes and Jewelry category reduced to its 5-core, so that each remaining user and item has at least 5 reviews.
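
As an illustration of what a 5-core reduction means, the sketch below repeatedly drops users and items with fewer than `k` reviews until nothing changes. It assumes reviews are given as `(user, item)` pairs; `k_core` is a hypothetical helper written for this notebook, not the code used by the dataset authors.

```python
from collections import Counter

def k_core(pairs, k=5):
    """Iteratively drop (user, item) pairs until every remaining user
    and item appears in at least k pairs. Hypothetical helper."""
    pairs = list(pairs)
    while True:
        user_counts = Counter(u for u, _ in pairs)
        item_counts = Counter(i for _, i in pairs)
        kept = [(u, i) for u, i in pairs
                if user_counts[u] >= k and item_counts[i] >= k]
        if len(kept) == len(pairs):  # fixed point: nothing was removed
            return kept
        pairs = kept

# Toy example with k=2: user "u3" and item "c" each have a single review.
toy = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "a"), ("u1", "c")]
core = k_core(toy, k=2)
print(core)
```

The loop is needed because removing a sparse item can push a user below the threshold, and vice versa.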


**Basic Statistics**:

Number of users:	39,387\
Number of items:	23,033\
Number of transactions:	278,677

**Example**
```
{
  "reviewerID": "A2SUAM1J3GNN3B",
  "asin": "0000013714",
  "reviewerName": "J. McDonald",
  "helpful": [2, 3],
  "reviewText": "I bought this for my husband who plays the piano.  He is having a wonderful time playing these old hymns.  The music  is at times hard to read because we think the book was published for singing from more than playing from.  Great purchase though!",
  "overall": 5.0,
  "summary": "Heavenly Highway Hymns",
  "unixReviewTime": 1252800000,
  "reviewTime": "09 13, 2009"
}
```
where

- reviewerID - ID of the reviewer, e.g. A2SUAM1J3GNN3B
- asin - ID of the product, e.g. 0000013714
- reviewerName - name of the reviewer
- helpful - helpfulness rating of the review, e.g. 2/3
- reviewText - text of the review
- overall - rating of the product
- summary - summary of the review
- unixReviewTime - time of the review (unix time)
- reviewTime - time of the review (raw)
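
The fields above can be read with the standard library alone. The following is a minimal sketch on a shortened copy of the example record; it assumes `helpful` holds `[helpful_votes, total_votes]` and that `unixReviewTime` is in seconds.

```python
import json
from datetime import datetime, timezone

# Shortened copy of the example record shown above.
record = json.loads("""
{"reviewerID": "A2SUAM1J3GNN3B", "asin": "0000013714",
 "helpful": [2, 3], "overall": 5.0, "unixReviewTime": 1252800000}
""")

# "helpful" is [helpful_votes, total_votes]; guard against zero votes.
helpful_votes, total_votes = record["helpful"]
helpful_ratio = helpful_votes / total_votes if total_votes else None

# Convert the Unix timestamp (seconds) to a UTC calendar date.
review_date = datetime.fromtimestamp(record["unixReviewTime"], tz=timezone.utc).date()

print(helpful_ratio, review_date.isoformat())
```

The resulting date agrees with the raw `reviewTime` string of the record ("09 13, 2009").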


More details can be found here: http://jmcauley.ucsd.edu/data/amazon/links.html

# Imports


In [9]:
import pandas as pd
import os

# Install

In [None]:
! pip install pandas trfl

# Loading Data

Download the data from the link given above into the `Datasets/Amazon` folder of this repository.

In [56]:

amazon = pd.read_json('Datasets/Amazon/reviews_Clothing_Shoes_and_Jewelry_5.json.gz',compression='gzip',lines=True)
amazon.head()

| | reviewerID | asin | reviewerName | helpful | reviewText | overall | summary | unixReviewTime | reviewTime |
|---|---|---|---|---|---|---|---|---|---|
| 0 | A1KLRMWW2FWPL4 | 31887 | Amazon Customer "cameramom" | [0, 0] | This is a great tutu and at a really great pri... | 5 | Great tutu- not cheaply made | 1297468800 | 02 12, 2011 |
| 1 | A2G5TCU2WDFZ65 | 31887 | Amazon Customer | [0, 0] | I bought this for my 4 yr old daughter for dan... | 5 | Very Cute!! | 1358553600 | 01 19, 2013 |
| 2 | A1RLQXYNCMWRWN | 31887 | Carola | [0, 0] | What can I say... my daughters have it in oran... | 5 | I have buy more than one | 1357257600 | 01 4, 2013 |
| 3 | A8U3FAMSJVHS5 | 31887 | Caromcg | [0, 0] | We bought several tutus at once, and they are ... | 5 | Adorable, Sturdy | 1398556800 | 04 27, 2014 |
| 4 | A3GEOILWLK86XM | 31887 | CJ | [0, 0] | Thank you Halo Heaven great product for Little... | 5 | Grammy's Angels Love it | 1394841600 | 03 15, 2014 |


In [44]:
print(len(amazon))
print(len(amazon.reviewerID.unique()))
print(len(amazon.asin.unique()))

278677
39387
23033


# Modeling

The deep RL model used on the Amazon dataset is
[**Self-Supervised Reinforcement Learning for Recommender Systems**](https://arxiv.org/abs/2006.05779).


Download the code from the link given above and save the files from its `Kaggle` folder into the `Scripts` folder of this repository.

## TensorFlow Conversion
Since this code is based on TensorFlow version 1, we update it to run under TensorFlow version 2 with the `tf_upgrade_v2` script.
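
The conversion cells in this section repeat the same `tf_upgrade_v2` invocation for each file. As a rough sketch, the command lines could be generated from a list of scripts. `upgrade_command` is a hypothetical helper, the relative paths are illustrative, and only the `--infile`, `--outfile` and `--reportfile` flags used in this notebook are assumed.

```python
import shlex
from pathlib import PurePosixPath

def upgrade_command(infile, outfile=None, report="report.txt"):
    """Build a tf_upgrade_v2 command line (hypothetical helper).
    If outfile is omitted, write next to the input as <name>_new.py."""
    src = PurePosixPath(infile)
    dst = PurePosixPath(outfile) if outfile else src.with_name(src.stem + "_new.py")
    return ["tf_upgrade_v2", "--infile", str(src),
            "--outfile", str(dst), "--reportfile", report]

# Illustrative relative paths; adjust to where the scripts actually live.
scripts = ["Scripts/SA2C.py", "Scripts/utility.py"]
for path in scripts:
    # Print each command; pass the list to subprocess.run to execute it.
    print(shlex.join(upgrade_command(path)))
```

Writing to a separate `_new.py` file keeps the original TensorFlow 1 script available for comparison.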

In [None]:
PROJ_DIR = '/content/drive/Othercomputers/My-MacBook-Air/SA2C_Code/Kaggle'
# change current directory after mounting
%cd $PROJ_DIR
! ls

/content/drive/Othercomputers/My-MacBook-Air/SA2C_Code/Kaggle
data		     preprocess_kaggle.py  SA2C_new.py	     split_data.py
DQN_NS.py	     __pycache__	   SA2C.py	     test.py
NextItNetModules.py  replay_buffer.py	   SASRecModules.py  utility.py
pop.py		     report.txt		   SNQN.py


In [None]:
!tf_upgrade_v2 \
  --infile '/content/drive/Othercomputers/My-MacBook-Air/SA2C_Code/Kaggle/SA2C.py' \
  --outfile '/content/drive/Othercomputers/My-MacBook-Air/SA2C_Code/Kaggle/SA2C_new.py' \
  --reportfile report.txt

In [None]:
!tf_upgrade_v2 \
  --infile '/content/drive/Othercomputers/My-MacBook-Air/590-Final-Project/split_data.py' \
  --outfile '/content/drive/Othercomputers/My-MacBook-Air/590-Final-Project/split_data_new.py' \
  --reportfile report.txt

In [None]:
!tf_upgrade_v2 \
  --infile '/content/drive/Othercomputers/My-MacBook-Air/590-Final-Project/Scripts/replay_buffer.py' \
  --outfile '/content/drive/Othercomputers/My-MacBook-Air/590-Final-Project/Scripts/replay_buffer_new.py' \
  --reportfile report.txt

In [None]:
!tf_upgrade_v2 \
  --infile '/content/drive/Othercomputers/My-MacBook-Air/590-Final-Project/Scripts/NextItNetModules.py' \
  --outfile '/content/drive/Othercomputers/My-MacBook-Air/590-Final-Project/Scripts/NextItNetModules.py' \
  --reportfile report.txt

In [None]:
!tf_upgrade_v2 \
  --infile '/content/drive/Othercomputers/My-MacBook-Air/590-Final-Project/Scripts/utility.py' \
  --outfile '/content/drive/Othercomputers/My-MacBook-Air/590-Final-Project/Scripts/utility.py' \
  --reportfile report.txt

In [36]:
!tf_upgrade_v2 \
  --infile '/content/drive/Othercomputers/My-MacBook-Air/590-Final-Project/Scripts/SASRecModules.py' \
  --outfile '/content/drive/Othercomputers/My-MacBook-Air/590-Final-Project/Scripts/SASRecModules.py' \
  --reportfile report.txt

INFO line 22:11: Added keywords to args of function 'tf.convert_to_tensor'
INFO line 90:9: Renamed 'tf.variable_scope' to 'tf.compat.v1.variable_scope'
INFO line 91:23: Renamed 'tf.get_variable' to 'tf.compat.v1.get_variable'
INFO line 95:51: Multiplying scale arg of tf.contrib.layers.l2_regularizer by half to what tf.keras.regularizers.l2 expects.

INFO line 99:18: Added keywords to args of function 'tf.nn.embedding_lookup'
INFO line 134:9: Renamed 'tf.variable_scope' to 'tf.compat.v1.variable_scope'
INFO line 142:12: Renamed 'tf.layers.dense' to 'tf.compat.v1.layers.dense'
INFO line 143:12: Renamed 'tf.layers.dense' to 'tf.compat.v1.layers.dense'
INFO line 144:12: Renamed 'tf.layers.dense' to 'tf.compat.v1.layers.dense'
INFO line 152:32: Added keywords to args of function 'tf.transpose'
INFO line 158:35: Added keywords to args of function 'tf.reduce_sum'
INFO line 160:62: Added keywords to args of function 'tf.shape'
INFO line 163:18: Renamed 'tf.where' to 'tf.compat.v1.where'
INFO l

In [2]:
cd '/content/drive/Othercomputers/My-MacBook-Air/SA2C_Code/Kaggle'

/content/drive/Othercomputers/My-MacBook-Air/SA2C_Code/Kaggle


## Data Preparation
Run the code below to create the following datasets, which are required before running the model.

* sorted_amazon.df
* data_statis.df
* pop_dict.df
* replay_buffer.df
* sampled_train.df
* sampled_val.df
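
The preprocessing scripts themselves are not reproduced in this notebook. To give an idea of what a replay buffer for sequential recommendation typically contains, here is a minimal sketch that turns one user's time-ordered item sequence into `(state, action, next_state, is_done)` transitions. The function names, the fixed `state_size`, and the padding ID are assumptions made for illustration, not the contents of `Scripts/replay_buffer.py`.

```python
def pad_history(items, state_size, pad_item):
    """Keep the last `state_size` items, right-padded with `pad_item`."""
    recent = list(items[-state_size:])
    return recent + [pad_item] * (state_size - len(recent))

def build_transitions(sequence, state_size, pad_item):
    """Turn one user's time-ordered item sequence into
    (state, action, next_state, is_done) tuples. Illustrative only."""
    transitions = []
    for t, action in enumerate(sequence):
        state = pad_history(sequence[:t], state_size, pad_item)
        next_state = pad_history(sequence[:t + 1], state_size, pad_item)
        is_done = t == len(sequence) - 1  # last interaction of this user
        transitions.append((state, action, next_state, is_done))
    return transitions

# Toy sequence of encoded item IDs; 99 is an assumed padding ID.
buffer = build_transitions([7, 3, 5], state_size=2, pad_item=99)
for row in buffer:
    print(row)
```

Each state is the user's recent history, the action is the item interacted with next, and the next state is the history extended by that item.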



In [45]:
cd /content/drive/Othercomputers/My-MacBook-Air/590-Final-Project

/content/drive/Othercomputers/My-MacBook-Air/590-Final-Project


In [50]:
%run 'Scripts/preprocess.py'

In [51]:
%run 'Scripts/split_data.py'

In [62]:
!python 'Scripts/pop.py' --data='Datasets/Amazon'

0.0
2.0
1.0


In [57]:
%run 'Scripts/replay_buffer.py'

        reviewerID   asin reviewerName helpful  \
30668            0   1883     LarryRun  [0, 0]   
72389            0   4954     LarryRun  [0, 0]   
122588           0   9193     LarryRun  [2, 4]   
135480           0  10286     LarryRun  [0, 0]   
55901            0   3676     LarryRun  [1, 1]   

                                               reviewText  overall  \
30668   These champion mesh are so much better than th...        5   
72389   These wet look leggings were a great buy and f...        5   
122588  I am not sure why other reviews say this is no...        5   
135480  I got so many comments when I wore these pants...        5   
55901   Bought this for my girl for Halloween and it f...        5   

                           summary  unixReviewTime   reviewTime  
30668        Great fit and quality      1370908800  06 11, 2013  
72389    Super fit. Looks amazing!      1350518400  10 18, 2012  
122588   Exactly as pictured! HOT!      1350518400  10 18, 2012  
135480  Awesom

## Running the model on the Amazon dataset

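NDCG (normalized discounted cumulative gain) is a measure of ranking quality: it rewards placing relevant items near the top of the recommendation list and is normalized to lie between 0 and 1.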
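
For reference, a minimal sketch of NDCG@k is given below, assuming the common formulation in which the gain at 0-based rank `r` is discounted by `log2(r + 2)`. The function names are hypothetical, and the evaluation code in `Scripts/SA2C.py` may differ in detail.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Next-item recommendation: the single relevant item is ranked third.
score = ndcg_at_k([0, 0, 1, 0, 0], k=5)
print(score)  # 1 / log2(4) = 0.5
```

With a single relevant item, the score depends only on the position at which that item is ranked, which matches the next-item prediction setting.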

In [None]:
!python 'Scripts/SA2C.py' --model=GRU --data='Datasets/Amazon'

  tf.compat.v1.nn.rnn_cell.GRUCell(self.hidden_size),
Instructions for updating:
Please use `keras.layers.RNN(cell)`, which is equivalent to this API
Instructions for updating:
Please use `keras.layers.RNN(cell)`, which is equivalent to this API
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2022-12-11 19:36:25.608243: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
  self.output1 = tf.compat.v1.layers.dense(self.states_hidden, self.item_num,
  self.output2= tf.compat.v1.layers.dense(

In [37]:
!python 'Scripts/SA2C.py' --model=SASRec --data='Datasets/Amazon'

  self.seq = tf.compat.v1.layers.dropout(self.seq,
2022-12-11 18:43:01.463359: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
  Q = tf.compat.v1.layers.dense(queries, num_units, activation=None) # (N, T_q, C)
  K = tf.compat.v1.layers.dense(keys, num_units, activation=None) # (N, T_k, C)
  V = tf.compat.v1.layers.dense(keys, num_units, activation=None) # (N, T_k, C)
  outputs = tf.compat.v1.layers.dropout(outputs, rate=dropout_rate, training=tf.convert_to_tensor(value=is_training))
  outputs = tf.compat.v1.layers.conv1d(**params)
  outputs = tf.compat.v1.layers.dropout(outputs, rate=dropout_rate, training=tf.convert_to_tensor(value=is_training))
  outputs = tf.compat.v1.layers.conv1d(**params)
  outputs = tf.compat.v1.layers.dropout(outputs, rate=dropout_rate, training=tf.convert_to_tensor(value=is_training))
  self.output1 = tf.compat.v1.layers.dense(self.states_hidden, self.item_num,
  self.output2