# Self-Attentive Sequential Recommender (SASRec) + Supervised Negative Q-Learning (SNQN) Recommender with CQL Loss

In this notebook, we train a SASRec-SNQN model to recommend a list of items to each user. We train on two datasets, RetailRocket and H&M, and compare models trained with and without the CQL (Conservative Q-Learning) loss.

## Setup

In [None]:
# Clone our repository
!git clone https://github.com/szheng3/recommendation-system.git
# Install Requirements
!pip install trfl

## RetailRocket

### Prepare the Data

In [None]:
# Download the data
!wget https://aipi590.s3.amazonaws.com/events.csv -P 'recommendation-system/Explore_CQL/Data/RR_data'


In [None]:
# Preprocess the data
!python 'recommendation-system/Explore_CQL/DLR2/src/gen_replay_buffer.py' --data='recommendation-system/Explore_CQL/Data/RR_data'

In [None]:
# Convert the training script to TensorFlow 2-compatible APIs (rewrites the file in place)
!tf_upgrade_v2 \
  --infile 'recommendation-system/Explore_CQL/DLR2/src/SNQN_v2.py' \
  --outfile 'recommendation-system/Explore_CQL/DLR2/src/SNQN_v2.py' \
  --reportfile report_SNQN.txt

### Training

#### Without CQL Loss

In [None]:
!python "recommendation-system/Explore_CQL/DLR2/src/SNQN_v2.py" --model=SASRec --epoch=10 --data="recommendation-system/Explore_CQL/Data/RR_data"

2023-05-02 16:22:00.707715: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Not using CQL loss
  self.seq = tf.compat.v1.layers.dropout(self.seq,
2023-05-02 16:22:04.258908: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder' with dtype bool
	 [[{{node Placeholder}}]]
2023-05-02 16:22:04.338297: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/D

#### With CQL Loss
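
For context, conservative Q-learning adds a regularizer to the Q-learning loss that pushes Q-values down across all actions while pushing up the Q-value of the action observed in the logged data. The sketch below shows the standard discrete-action form of that penalty in plain Python. It is an illustration only: the function names are hypothetical, and it is an assumption that `--CQL_alpha` in `SNQN_v2.py` plays the role of the weight `alpha` used here.

```python
import math
from typing import Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def cql_penalty(q_values: Sequence[float], logged_action: int, alpha: float) -> float:
    """Discrete-action CQL regularizer for a single state (hypothetical helper).

    q_values      : Q(s, a) for every candidate item a
    logged_action : index of the item the user actually interacted with
    alpha         : penalty weight (assumed to correspond to --CQL_alpha)
    """
    # logsumexp is a soft maximum over all actions, so the difference is
    # never negative and is small when the logged action already dominates.
    return alpha * (logsumexp(q_values) - q_values[logged_action])


if __name__ == "__main__":
    q = [1.0, 2.0, 0.5]  # made-up Q-values for three items
    print(cql_penalty(q, logged_action=1, alpha=0.5))
```

With `alpha=0` the penalty vanishes and only the original loss remains, which is how the run without CQL can be read in this sketch.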

In [None]:
!python "recommendation-system/Explore_CQL/DLR2/src/SNQN_v2.py" --model=SASRec --epoch=10 --CQL_alpha=0.5 --data="recommendation-system/Explore_CQL/Data/RR_data"

2023-05-03 14:44:24.600262: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Using CQL loss.
  self.seq = tf.compat.v1.layers.dropout(self.seq,
2023-05-03 14:44:28.318958: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder' with dtype bool
	 [[{{node Placeholder}}]]
2023-05-03 14:44:28.391534: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentatio

In [None]:
!python "recommendation-system/Explore_CQL/DLR2/src/SNQN_v2.py" --model=SASRec --epoch=10 --CQL_alpha=1.0 --data="recommendation-system/Explore_CQL/Data/RR_data"

2023-05-03 04:06:37.666411: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Using CQL loss.
  self.seq = tf.compat.v1.layers.dropout(self.seq,
2023-05-03 04:06:40.988046: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder' with dtype bool
	 [[{{node Placeholder}}]]
2023-05-03 04:06:41.052021: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentatio

## H&M

### Prepare the Data

In [None]:
# Download the data
!wget https://aipi590.s3.amazonaws.com/transactions_train.csv -P "recommendation-system/Explore_CQL/Data/HM_data"

--2023-05-02 20:34:42--  https://aipi590.s3.amazonaws.com/transactions_train.csv
Resolving aipi590.s3.amazonaws.com (aipi590.s3.amazonaws.com)... 52.217.228.57, 3.5.29.19, 54.231.204.217, ...
Connecting to aipi590.s3.amazonaws.com (aipi590.s3.amazonaws.com)|52.217.228.57|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3488002253 (3.2G) [text/csv]
Saving to: ‘recommendation-system/Explore_CQL/Data/HM_data/transactions_train.csv’


2023-05-02 20:39:01 (12.9 MB/s) - ‘recommendation-system/Explore_CQL/Data/HM_data/transactions_train.csv’ saved [3488002253/3488002253]



In [None]:
# Preprocess the data
!python "recommendation-system/Explore_CQL/DLR2/src/gen_replay_buffer_HM.py" --data="recommendation-system/Explore_CQL/Data/HM_data"


Start reading all transaction data ...
Finish reading in 00:00:28

Filter and save all valid sampled data
Index(['timestamp', 'session_id', 'item_id', 'price', 'sales_channel_id'], dtype='object')

Start counting popularity ...
13040912it [08:15, 26313.18it/s]
Popularity finished in 00:08:15

Start spliting into train, val, test data ...

           Generate Replay Buffer:
                Total Session Size : 1245612
                     Train:      871928 ids | 9124752 actions
                     Validation: 249122 ids | 2611174 actions
                     Test:       124562 ids | 1304986 actions
                     
                Random : True
                Random Seed : 1234
                Format : paper
    
                Total session id number : 1245612
                Total item id number  : 96222
    
Generating training replay buffer
100% 608701/608701 [09:51<00:00, 1029.08it/s]
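
As a quick sanity check on the log above, the session and action counts reported for the three splits can be turned into fractions of the total. The numbers are copied from the output; the variable names are only for illustration.

```python
# Counts copied from the "Generate Replay Buffer" log output above.
total_sessions = 1245612
split_sessions = {"train": 871928, "validation": 249122, "test": 124562}
split_actions = {"train": 9124752, "validation": 2611174, "test": 1304986}

# Every session should land in exactly one split.
assert sum(split_sessions.values()) == total_sessions

# The action counts add up to the row count shown by the popularity pass.
total_actions = sum(split_actions.values())
assert total_actions == 13040912

# Fraction of sessions in each split.
split_fractions = {
    name: count / total_sessions for name, count in split_sessions.items()
}

for name, fraction in split_fractions.items():
    print(f"{name:<10} {fraction:.1%}")
```

The fractions work out to roughly 70% train, 20% validation, and 10% test.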


### Training

#### Without CQL Loss

In [None]:
!python "recommendation-system/Explore_CQL/DLR2/src/SNQN_v2.py" --model=SASRec --epoch=10 --data="recommendation-system/Explore_CQL/Data/HM_data"

2023-05-02 21:00:17.915279: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-05-02 21:00:17.970957: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Not using CQL loss
  self.seq = tf.compat.v1.layers.dropout(self.seq,
2023-05-02 21:00:21.143740: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'Placeholder' with dtype bool
	 [[{{node P

#### With CQL Loss

In [None]:
!python "recommendation-system/Explore_CQL/DLR2/src/SNQN_v2.py" --model=SASRec --epoch=10 --CQL_alpha=0.5 --data="recommendation-system/Explore_CQL/Data/HM_data"

In [None]:
!python "recommendation-system/Explore_CQL/DLR2/src/SNQN_v2.py" --model=SASRec --epoch=10 --CQL_alpha=1.0 --data="recommendation-system/Explore_CQL/Data/HM_data"
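
The training cells in this notebook differ only in the dataset directory and the `--CQL_alpha` value. A minimal sketch of a helper that builds the same commands programmatically is shown below. The helper names (`build_command`, `run_sweep`) and the `dry_run` switch are hypothetical, and only flags that already appear in the cells above are used.

```python
import subprocess
from typing import List, Optional, Sequence

SCRIPT = "recommendation-system/Explore_CQL/DLR2/src/SNQN_v2.py"


def build_command(data_dir: str, cql_alpha: Optional[float] = None,
                  epochs: int = 10) -> List[str]:
    """Build the argument list for one training run (hypothetical helper).

    --CQL_alpha is omitted when cql_alpha is None, which matches the
    baseline cells that train without the CQL loss.
    """
    command = ["python", SCRIPT, "--model=SASRec",
               f"--epoch={epochs}", f"--data={data_dir}"]
    if cql_alpha is not None:
        command.append(f"--CQL_alpha={cql_alpha}")
    return command


def run_sweep(data_dir: str,
              alphas: Sequence[Optional[float]] = (None, 0.5, 1.0),
              dry_run: bool = True) -> List[List[str]]:
    """Build one command per alpha; execute them only when dry_run is False."""
    commands = [build_command(data_dir, alpha) for alpha in alphas]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True raises if a training run exits with an error.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print the three H&M commands without launching training.
    run_sweep("recommendation-system/Explore_CQL/Data/HM_data")
```

Passing `dry_run=False` would launch the runs one after another; keeping the default only prints the commands so they can be compared with the cells above.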