# <center>DataLab Cup 4: Recommender Systems</center>
<center>Shan-Hung Wu & DataLab</center>
<center>Fall 2023</center>

Team: 陳瑜旋轉陳玟旋轉陳瑜旋

Team Members: 111501538 劉杰閎, 111062588 陳玟璇, 111062697 吳律穎

## Platform: [Kaggle](https://www.kaggle.com/t/b06e248a3827434f80c4fdc6009d5fe0)

Please download the dataset and the environment source code from Kaggle.

## Environment Setting


In [1]:
import os
import random
import copy

import tensorflow as tf
import numpy as np
import pandas as pd
from tqdm import tqdm

from evaluation.environment import TrainingEnvironment, TestingEnvironment
from sentence_transformers import SentenceTransformer



In [2]:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        # Select the first GPU (index 0)
        tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

2 Physical GPUs, 1 Logical GPUs



In [3]:
# Official hyperparameters for this competition (do not modify)
N_TRAIN_USERS = 1000
N_TEST_USERS = 2000
N_ITEMS = 209527
HORIZON = 2000
TEST_EPISODES = 5
SLATE_SIZE = 5



## Datasets

In [4]:
# Dataset paths
USER_DATA = os.path.join('dataset', 'user_data.json')
ITEM_DATA = os.path.join('dataset', 'item_data.json')

# Output file path
OUTPUT_PATH = os.path.join('output', 'output.csv')

### User Data

In [5]:
df_user = pd.read_json(USER_DATA, lines=True)
df_user

Unnamed: 0,user_id,history
0,0,"[42558, 65272, 13353]"
1,1,"[146057, 195688, 143652]"
2,2,"[67551, 85247, 33714]"
3,3,"[116097, 192703, 103229]"
4,4,"[68756, 140123, 135289]"
...,...,...
1995,1995,"[95090, 131393, 130239]"
1996,1996,"[2360, 147130, 8145]"
1997,1997,"[99794, 138694, 157888]"
1998,1998,"[55561, 60372, 51442]"


### Item Data

In [6]:
df_item = pd.read_json(ITEM_DATA, lines=True)
df_item

Unnamed: 0,item_id,headline,short_description
0,0,Over 4 Million Americans Roll Up Sleeves For O...,Health experts said it is too early to predict...
1,1,"American Airlines Flyer Charged, Banned For Li...",He was subdued by passengers and crew when he ...
2,2,23 Of The Funniest Tweets About Cats And Dogs ...,"""Until you have a dog you don't understand wha..."
3,3,The Funniest Tweets From Parents This Week (Se...,"""Accidentally put grown-up toothpaste on my to..."
4,4,Woman Who Called Cops On Black Bird-Watcher Lo...,Amy Cooper accused investment firm Franklin Te...
...,...,...,...
209522,209522,RIM CEO Thorsten Heins' 'Significant' Plans Fo...,Verizon Wireless and AT&T are already promotin...
209523,209523,Maria Sharapova Stunned By Victoria Azarenka I...,"Afterward, Azarenka, more effusive with the pr..."
209524,209524,"Giants Over Patriots, Jets Over Colts Among M...","Leading up to Super Bowl XLVI, the most talked..."
209525,209525,Aldon Smith Arrested: 49ers Linebacker Busted ...,CORRECTION: An earlier version of this story i...


## Text Embedding from item descriptions

Here we use `SentenceTransformer` to embed each item's text description. Every item gets two embeddings, one for its headline and one for its short_description, and both have dimension 768.


In [7]:
dataset_dir = './dataset/'
if os.path.exists(dataset_dir + 'train_data_embedding.pkl'):
    df_item_train = pd.read_pickle(dataset_dir + 'train_data_embedding.pkl')
else:
    sbert = SentenceTransformer('all-mpnet-base-v2')
    df_item_train = pd.read_json(ITEM_DATA, lines=True)
    df_item_train['headline_embeddings'] = df_item_train['headline'].apply(lambda x: sbert.encode(x))
    df_item_train['short_description_embeddings'] = df_item_train['short_description'].apply(lambda x: sbert.encode(x))
    df_item_train.to_pickle(dataset_dir + 'train_data_embedding.pkl')

df_item_train

Unnamed: 0,item_id,headline,short_description,headline_embeddings,short_description_embeddings
0,0,Over 4 Million Americans Roll Up Sleeves For O...,Health experts said it is too early to predict...,"[-0.054995973, 0.10514701, 0.0009537986, -0.07...","[0.04689467, 0.089309394, -0.018575395, -0.029..."
1,1,"American Airlines Flyer Charged, Banned For Li...",He was subdued by passengers and crew when he ...,"[-0.020863444, 0.011131575, 0.0013632453, -0.0...","[0.017128233, -0.0062120855, 0.015252358, 0.02..."
2,2,23 Of The Funniest Tweets About Cats And Dogs ...,"""Until you have a dog you don't understand wha...","[0.017761054, 0.053476874, 6.918786e-05, -0.03...","[0.10238154, 0.07736524, 0.0020822838, -0.0614..."
3,3,The Funniest Tweets From Parents This Week (Se...,"""Accidentally put grown-up toothpaste on my to...","[-0.0029250348, 0.01137404, 0.0045979875, -0.0...","[0.04334459, 0.056244634, 0.0071496996, -0.057..."
4,4,Woman Who Called Cops On Black Bird-Watcher Lo...,Amy Cooper accused investment firm Franklin Te...,"[-0.0049342206, 0.053551663, 0.027952224, -0.0...","[-0.0066743735, 0.03416268, -0.00058029604, 0...."
...,...,...,...,...,...
209522,209522,RIM CEO Thorsten Heins' 'Significant' Plans Fo...,Verizon Wireless and AT&T are already promotin...,"[0.041218493, -0.007820907, -0.01887703, -0.02...","[-0.029524302, -0.0045847334, -0.054970894, -0..."
209523,209523,Maria Sharapova Stunned By Victoria Azarenka I...,"Afterward, Azarenka, more effusive with the pr...","[-0.047861934, -0.027825285, -0.0048302715, -0...","[0.03547541, -0.027677324, 0.019167567, -0.007..."
209524,209524,"Giants Over Patriots, Jets Over Colts Among M...","Leading up to Super Bowl XLVI, the most talked...","[-0.0816778, 0.022369152, 0.027179016, 0.02018...","[-0.020275101, 0.10664522, -0.007810726, -0.01..."
209525,209525,Aldon Smith Arrested: 49ers Linebacker Busted ...,CORRECTION: An earlier version of this story i...,"[-0.04274766, 0.12479968, -0.047635496, -0.057...","[0.044633802, 0.014033731, -0.004920267, -0.02..."


### Creating User Embedding dataframe

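+ We build a user embedding DataFrame that represents each user's features.
+ For each user, we look up the embeddings of the three items in their click history and average them; the mean serves as the user's embedding. The last column concatenates the headline and short-description embeddings.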
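
The averaging and concatenation described above can also be written with NumPy indexing. The following is only a minimal sketch on toy data: `build_user_embeddings` and the tiny arrays are hypothetical names introduced for illustration, and it assumes the item embeddings have already been stacked into 2-D arrays indexed by item id.

```python
import numpy as np

def build_user_embeddings(histories, headline_emb, desc_emb):
    """Average the embeddings of each user's history items, then concatenate.

    histories:    int array-like of shape (n_users, history_size), item ids per user
    headline_emb: float array of shape (n_items, dim)
    desc_emb:     float array of shape (n_items, dim)
    Returns an array of shape (n_users, 2 * dim).
    """
    histories = np.asarray(histories)
    # Fancy indexing yields (n_users, history_size, dim); average over the history axis.
    mean_headline = headline_emb[histories].mean(axis=1)
    mean_desc = desc_emb[histories].mean(axis=1)
    # Headline part first, short-description part second, as in the notebook.
    return np.concatenate([mean_headline, mean_desc], axis=1)

# Toy data (assumption): 4 items with 2-dimensional embeddings, 2 users with 3 clicks each.
headline_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]])
desc_emb = headline_emb * 10
histories = [[0, 1, 2], [1, 2, 3]]

user_emb = build_user_embeddings(histories, headline_emb, desc_emb)
print(user_emb.shape)
```

With the real data, `dim` would be 768, so each user vector would have 1536 entries, and index 768 would be the first short-description component, which matches the check printed in the notebook.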


In [8]:
N_USERS = 2000
N_ITEMS = 209527
EMBEDDING_DIM = 768
HISTORY_SIZE = 3


In [9]:
if os.path.exists(dataset_dir + 'user_embedding.pkl'):
    df_user_embedding = pd.read_pickle(dataset_dir+'user_embedding.pkl')
else:
    df_user_embedding_list = []
    for user in range(N_USERS):
        # print(df_user.iloc[user])
        sum_headline = tf.zeros(shape=(EMBEDDING_DIM,)) # since all embeddings are (768,)
        sum_short_description = tf.zeros(shape=(EMBEDDING_DIM,)) # since all embeddings are (768,)
        for item in df_user.iloc[user]["history"]:
            headline_tensor = tf.convert_to_tensor(df_item_train.iloc[item]["headline_embeddings"])
            short_description_tensor = tf.convert_to_tensor(df_item_train.iloc[item]["short_description_embeddings"])
            sum_headline += headline_tensor
            sum_short_description += short_description_tensor
        sum_headline = tf.divide(sum_headline, HISTORY_SIZE)
        sum_short_description = tf.divide(sum_short_description, HISTORY_SIZE)
        concat_embedding = tf.concat([sum_headline, sum_short_description],axis=0)
        df_user_embedding_list.append([user, sum_headline.numpy(), sum_short_description.numpy(), concat_embedding.numpy()])
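    df_user_embedding = pd.DataFrame(df_user_embedding_list, columns = ['user_id', 'headline_embedding', 'short_description_embedding','concat_embeddings'])
    # Cache the result so the existence check above can skip recomputation next time.
    df_user_embedding.to_pickle(dataset_dir + 'user_embedding.pkl')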

In [10]:
print(df_user_embedding.iloc[0])
print(df_user_embedding.iloc[0]["concat_embeddings"][768])
display(df_user_embedding.head(5))

user_id                                                                        0
headline_embedding             [0.038156908, 0.041387293, -0.004624029, -0.02...
short_description_embedding    [0.03716896, 0.0512002, -0.025162613, -0.01830...
concat_embeddings              [0.038156908, 0.041387293, -0.004624029, -0.02...
Name: 0, dtype: object
0.03716896


Unnamed: 0,user_id,headline_embedding,short_description_embedding,concat_embeddings
0,0,"[0.038156908, 0.041387293, -0.004624029, -0.02...","[0.03716896, 0.0512002, -0.025162613, -0.01830...","[0.038156908, 0.041387293, -0.004624029, -0.02..."
1,1,"[-0.01718974, 0.04380578, -0.0033005949, 0.026...","[-0.022690356, 0.041603807, -0.009130617, -0.0...","[-0.01718974, 0.04380578, -0.0033005949, 0.026..."
2,2,"[-0.0044823308, -0.017107317, -0.038572405, -0...","[0.03541447, 0.021701857, -0.016264068, -0.025...","[-0.0044823308, -0.017107317, -0.038572405, -0..."
3,3,"[-0.03414116, 0.035626348, -0.024177575, 0.043...","[-0.032475274, 0.034354758, -0.0064822477, 0.0...","[-0.03414116, 0.035626348, -0.024177575, 0.043..."
4,4,"[-0.0039870082, 0.082041055, -0.01948834, -0.0...","[0.009200389, 0.0539891, -0.026606128, -0.0117...","[-0.0039870082, 0.082041055, -0.01948834, -0.0..."


### Creating Item embedding dataframe

We likewise rebuild an item embedding DataFrame, dropping the original text descriptions. The last column concatenates the headline and short-description embeddings.

In [11]:
if (os.path.exists(dataset_dir+'item_embedding.pkl')):
    df_item_embedding = pd.read_pickle(dataset_dir+'item_embedding.pkl')
else:
    item_embedding_list = []
    for item in range(len(df_item_train)):
        headline_tensor = tf.convert_to_tensor(df_item_train.iloc[item]["headline_embeddings"])
        short_description_tensor = tf.convert_to_tensor(df_item_train.iloc[item]["short_description_embeddings"])
        concat_embedding = tf.concat([headline_tensor, short_description_tensor],axis=0)
        item_embedding_list.append([item, headline_tensor.numpy(), short_description_tensor.numpy(), concat_embedding.numpy()])
    df_item_embedding = pd.DataFrame(item_embedding_list, columns=["item_id","headline_embeddings","short_description_embeddings","concat_embeddings"])
    df_item_embedding.to_pickle(dataset_dir + 'item_embedding.pkl')

In [12]:
print(df_item_embedding.iloc[0])
print(df_item_embedding.iloc[0]["concat_embeddings"][768])
print("len of item dataframe is ", len(df_item_embedding))
display(df_item_embedding.head(5))

item_id                                                                         0
headline_embeddings             [-0.054995973, 0.10514701, 0.0009537986, -0.07...
short_description_embeddings    [0.04689467, 0.089309394, -0.018575395, -0.029...
concat_embeddings               [-0.054995973, 0.10514701, 0.0009537986, -0.07...
Name: 0, dtype: object
0.04689467
len of item dataframe is  209527


Unnamed: 0,item_id,headline_embeddings,short_description_embeddings,concat_embeddings
0,0,"[-0.054995973, 0.10514701, 0.0009537986, -0.07...","[0.04689467, 0.089309394, -0.018575395, -0.029...","[-0.054995973, 0.10514701, 0.0009537986, -0.07..."
1,1,"[-0.020863444, 0.011131575, 0.0013632453, -0.0...","[0.017128233, -0.0062120855, 0.015252358, 0.02...","[-0.020863444, 0.011131575, 0.0013632453, -0.0..."
2,2,"[0.017761054, 0.053476874, 6.918786e-05, -0.03...","[0.10238154, 0.07736524, 0.0020822838, -0.0614...","[0.017761054, 0.053476874, 6.918786e-05, -0.03..."
3,3,"[-0.0029250348, 0.01137404, 0.0045979875, -0.0...","[0.04334459, 0.056244634, 0.0071496996, -0.057...","[-0.0029250348, 0.01137404, 0.0045979875, -0.0..."
4,4,"[-0.0049342206, 0.053551663, 0.027952224, -0.0...","[-0.0066743735, 0.03416268, -0.00058029604, 0....","[-0.0049342206, 0.053551663, 0.027952224, -0.0..."


In [13]:
from scipy import spatial
from sklearn.metrics.pairwise import cosine_similarity

### User-Item Similarity Matrix

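Here we compute the cosine similarity between every user and every item and store the result as a DataFrame.

For example, the value at row 10, column 100 is the similarity between user 10 and item 100; the higher the value, the more likely this user is to like this item.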
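
The same matrix can be obtained without a Python double loop: after scaling every row to unit length, a single matrix product gives all cosine similarities at once. This is a sketch under assumptions, not the notebook's implementation; `cosine_similarity_matrix`, the `eps` guard, and the toy vectors are hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(user_emb, item_emb, eps=1e-12):
    """Return S with S[u, i] = cosine similarity between user u and item i."""
    user_emb = np.asarray(user_emb, dtype=np.float64)
    item_emb = np.asarray(item_emb, dtype=np.float64)
    # Scale each row to unit length; eps guards against division by zero.
    user_unit = user_emb / np.maximum(np.linalg.norm(user_emb, axis=1, keepdims=True), eps)
    item_unit = item_emb / np.maximum(np.linalg.norm(item_emb, axis=1, keepdims=True), eps)
    # The dot product of unit vectors is the cosine of the angle between them.
    return user_unit @ item_unit.T

# Toy data (assumption): 2 users and 3 items in 2 dimensions.
users = np.array([[1.0, 0.0], [1.0, 1.0]])
items = np.array([[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]])

sim = cosine_similarity_matrix(users, items)
print(sim)
```

For the full problem the result has 2000 × 209527 entries, so in practice one might process the items in chunks and use `float32` to limit memory use.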

In [14]:
dataset_dir = './dataset/'
if os.path.exists(dataset_dir + 'user_item_similarity.pkl'):
    df_user_item_similarity = pd.read_pickle(dataset_dir + 'user_item_similarity.pkl')
else:
    with open(dataset_dir+'user_item_similarity.txt','a') as fin:
        user_item_similarity_matrix = []
        for user in range(N_USERS):
            cosine_similarity_list = []
            for item in range(N_ITEMS):
                # Cosine similarity = 1 - cosine distance.
                # Stored in `similarity` so the imported sklearn `cosine_similarity` is not shadowed.
                similarity = 1 - spatial.distance.cosine(df_user_embedding.iloc[user]["concat_embeddings"], df_item_embedding.iloc[item]["concat_embeddings"])
                cosine_similarity_list.append(similarity)
            
            user_item_similarity_matrix.append(cosine_similarity_list)
            fin.write(",".join([str(x) for x in cosine_similarity_list]))
            fin.write('\n')
            fin.flush()
        df_user_item_similarity = pd.DataFrame(user_item_similarity_matrix)
        df_user_item_similarity.to_pickle(dataset_dir + 'user_item_similarity.pkl')

In [15]:
display(df_user_item_similarity.head(5))
print(df_user_item_similarity.shape)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,209517,209518,209519,209520,209521,209522,209523,209524,209525,209526
0,0.062287,0.003323,0.261205,0.274883,0.064868,0.013853,0.226747,0.061278,0.196566,0.109174,...,0.043999,0.059647,0.192302,0.062932,0.180494,0.012018,0.048684,0.079522,-0.015544,0.075266
1,0.221764,0.055316,0.112049,0.190925,0.091024,0.078525,0.017141,0.173482,0.051898,0.018566,...,-0.004871,-0.035946,0.117543,-0.005578,0.029166,0.013565,0.055716,0.008542,0.07323,0.03899
2,0.106126,0.022207,0.117865,0.251699,-0.012615,-0.014647,0.126634,0.021103,0.089571,0.015671,...,0.03622,0.084124,-0.012249,0.043811,0.079477,-0.061269,0.03782,0.004233,0.000226,0.029401
3,0.093782,0.016448,0.107896,0.075283,0.06736,-0.035875,0.099989,0.168689,0.12879,0.098932,...,0.056915,0.040844,0.170552,0.072513,0.178354,0.060635,0.035779,0.203117,0.084404,0.080729
4,0.122536,0.071247,0.153996,0.225814,0.183863,0.045011,0.210932,0.074886,0.209297,0.033176,...,0.063099,0.134406,0.060474,0.110714,0.092821,0.012666,0.129592,0.059556,0.067587,0.003772


(2000, 209527)


## Simulation Environments

### Training

#### Example of Interacting with the Training Environment

In [16]:
# Copy the original similarity DataFrame.
df_similarity = df_user_item_similarity.copy()
sorted_indices = np.argsort(df_similarity.iloc[0])
print((sorted_indices))
sorted_indices = list(sorted_indices)
print('max similarity index: ', sorted_indices[-1])
print('min similarity index: ', sorted_indices[0])
print('max similarity score: ', df_similarity.iloc[0][sorted_indices[-1]])
print('min similarity score: ', df_similarity.iloc[0][sorted_indices[0]])
print('difference between max score and min score:', df_similarity.iloc[0][sorted_indices[-1]] - df_similarity.iloc[0][sorted_indices[0]])
print('difference between max score and min score:', df_similarity.iloc[1][sorted_indices[-1]] - df_similarity.iloc[1][sorted_indices[0]])

df_similarity

0          40075
1          96295
2         187611
3         139955
4         191058
           ...  
209522      9102
209523      9810
209524     65272
209525     42558
209526     13353
Name: 0, Length: 209527, dtype: int64
max similarity index:  13353
min similarity index:  40075
max similarity score:  0.6880343556404114
min similarity score:  -0.12662772834300995
difference between max score and min score: 0.8146620839834213
difference between max score and min score: 0.04912831587716937


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,209517,209518,209519,209520,209521,209522,209523,209524,209525,209526
0,0.062287,0.003323,0.261205,0.274883,0.064868,0.013853,0.226747,0.061278,0.196566,0.109174,...,0.043999,0.059647,0.192302,0.062932,0.180494,0.012018,0.048684,0.079522,-0.015544,0.075266
1,0.221764,0.055316,0.112049,0.190925,0.091024,0.078525,0.017141,0.173482,0.051898,0.018566,...,-0.004871,-0.035946,0.117543,-0.005578,0.029166,0.013565,0.055716,0.008542,0.073230,0.038990
2,0.106126,0.022207,0.117865,0.251699,-0.012615,-0.014647,0.126634,0.021103,0.089571,0.015671,...,0.036220,0.084124,-0.012249,0.043811,0.079477,-0.061269,0.037820,0.004233,0.000226,0.029401
3,0.093782,0.016448,0.107896,0.075283,0.067360,-0.035875,0.099989,0.168689,0.128790,0.098932,...,0.056915,0.040844,0.170552,0.072513,0.178354,0.060635,0.035779,0.203117,0.084404,0.080729
4,0.122536,0.071247,0.153996,0.225814,0.183863,0.045011,0.210932,0.074886,0.209297,0.033176,...,0.063099,0.134406,0.060474,0.110714,0.092821,0.012666,0.129592,0.059556,0.067587,0.003772
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,0.100583,0.015665,0.170305,0.348556,0.004256,0.037536,0.147794,0.190394,0.262166,0.082865,...,0.060153,0.081491,0.119582,0.026745,0.165972,0.012540,0.067981,0.044060,-0.026443,-0.015400
1996,0.091208,0.042278,0.095109,0.068298,0.088406,0.037034,0.087014,0.329571,0.112063,0.101930,...,-0.045567,-0.018176,0.064000,0.040877,0.201333,0.011241,0.077976,0.176499,0.095174,0.040855
1997,0.084990,0.107297,0.142880,0.165294,0.093537,0.032139,0.144430,0.103643,0.136623,0.098574,...,0.043109,-0.038939,0.077905,0.010034,0.133854,0.056863,0.035067,0.070287,0.059929,0.019807
1998,0.170120,0.071936,0.222804,0.198526,0.098436,-0.000620,0.210958,0.153454,0.097726,0.120007,...,0.026999,0.008165,0.022288,0.071645,0.013865,0.025776,0.133728,0.029238,0.022635,0.013028


Here we can load the matrix saved at the most recent epoch, either to continue training from it or to use it for testing.
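
The notebook hard-codes the checkpoint name `epoch_119.pkl`. As a hedged alternative, the latest checkpoint could be found by scanning the directory for the highest epoch number. The helper `latest_epoch_checkpoint` is hypothetical, and the sketch assumes files follow the `epoch_<n>.pkl` naming used by the training loop; the demo creates empty placeholder files in a temporary directory rather than real pickles.

```python
import os
import re
import tempfile

def latest_epoch_checkpoint(directory):
    """Return (epoch, path) for the highest-numbered epoch_<n>.pkl, or None if there is none."""
    pattern = re.compile(r"^epoch_(\d+)\.pkl$")
    best = None
    for name in os.listdir(directory):
        match = pattern.match(name)
        if match is None:
            continue
        epoch = int(match.group(1))
        # Compare numerically so that epoch_119 beats epoch_20.
        if best is None or epoch > best[0]:
            best = (epoch, os.path.join(directory, name))
    return best

# Demo with empty placeholder files (assumption: real checkpoints would be pickles).
with tempfile.TemporaryDirectory() as tmp:
    for epoch in (9, 119, 20):
        open(os.path.join(tmp, f"epoch_{epoch}.pkl"), "w").close()
    found = latest_epoch_checkpoint(tmp)
    found_epoch = found[0]
    found_name = os.path.basename(found[1])

print(found_epoch, found_name)
```

The returned path could then be passed to `pd.read_pickle`, falling back to a copy of the initial similarity matrix when the function returns `None`.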

In [17]:
# Reload last training dataframe.
print(dataset_dir)
if(os.path.exists(dataset_dir+'test/epoch_119.pkl')):
    df_next = pd.read_pickle(dataset_dir+'test/epoch_119.pkl')
    print("Load weights with epoch119")
    display(df_next)
else:
    print('Checkpoint not found; starting from the initial similarity matrix')
    df_next = df_user_item_similarity.copy()
    display(df_next)

./dataset/
Load weights with epoch119


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,209517,209518,209519,209520,209521,209522,209523,209524,209525,209526
0,0.062287,0.003323,0.261205,0.274883,0.064868,0.013853,0.226747,0.061278,0.196566,0.109174,...,0.043999,0.059647,0.192302,0.062932,0.180494,0.012018,0.048684,0.079522,-0.015544,0.075266
1,-1.000000,0.055316,0.112049,0.171833,0.091024,0.078525,0.017141,0.173482,0.051898,0.018566,...,-0.004871,-0.035946,0.117543,-0.005578,0.029166,0.013565,0.055716,0.008542,0.073230,0.038990
2,0.106126,0.022207,0.117865,0.251699,-0.012615,-0.014647,0.126634,0.021103,0.089571,0.015671,...,0.036220,0.084124,-0.012249,0.043811,0.079477,-0.061269,0.037820,0.004233,0.000226,0.029401
3,0.093782,0.016448,0.107896,0.075283,0.067360,-0.035875,0.099989,0.168689,0.128790,0.098932,...,0.056915,0.040844,0.170552,0.072513,0.178354,0.060635,0.035779,0.203117,0.084404,0.080729
4,0.122536,0.071247,0.153996,0.225814,0.183863,0.045011,0.210932,0.074886,0.209297,0.033176,...,0.063099,0.134406,0.060474,0.110714,0.092821,0.012666,0.129592,0.059556,0.067587,0.003772
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,0.100583,0.015665,0.170305,0.348556,0.004256,0.037536,0.147794,0.190394,0.262166,0.082865,...,0.060153,0.081491,0.119582,0.026745,0.165972,0.012540,0.067981,0.044060,-0.026443,-0.015400
1996,0.091208,0.042278,0.095109,0.068298,0.088406,0.037034,0.087014,0.329571,0.112063,0.101930,...,-0.045567,-0.018176,0.064000,0.040877,0.201333,0.011241,0.077976,0.176499,0.095174,0.040855
1997,0.084990,0.107297,0.142880,0.165294,0.093537,0.032139,0.144430,0.103643,0.136623,0.098574,...,0.043109,-0.038939,0.077905,0.010034,0.133854,0.056863,0.035067,0.070287,0.059929,0.019807
1998,0.170120,0.071936,0.222804,0.198526,0.098436,-0.000620,0.210958,0.153454,0.097726,0.120007,...,0.026999,0.008165,0.022288,0.071645,0.013865,0.025776,0.133728,0.029238,0.022635,0.013028


In [18]:
import time
from datetime import datetime

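def time_log(message):
    # Prefix the message with a timestamp, print it, and append it to the training log.
    # The parameter is named `message` so the built-in `str` is not shadowed.
    log_str = f'[{datetime.now().strftime("%Y-%m-%d %H:%M:%S")}] {message}'
    log_str = log_str.replace('\n', '')
    print(log_str)
    with open('./dataset/test/test_training.log', 'a') as fout:
        fout.write(f'{log_str}\n')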

time_log("We assume remove HIGH_SIMILARITY_RATE_THRESHOLD would lead to better result")

[2024-01-15 22:56:36] We assume remove HIGH_SIMILARITY_RATE_THRESHOLD would lead to better result


In [19]:
HIGH_SIMILARITY_RATE_THRESHOLD = 0.0
training = False

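During training I keep two DataFrames, `df_next` and `df_current`. At the start of each iteration, `df_current` is copied from `df_next`. Whenever a user needs a recommendation, I recommend the five items with the highest similarity in that user's row of `df_current`. Once an item has been clicked, it should not be recommended again in the same iteration, so I set its score to -1 for this iteration (in `df_current`) and to 1 for the next iteration (in `df_next`). If the user clicks none of the five recommended items, I penalize those items by multiplying their scores by 0.9.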
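
The update rule can be illustrated on a tiny score matrix. This is only a sketch: plain NumPy arrays stand in for the two DataFrames, and `top_k_slate`, `update_scores`, and the toy scores are hypothetical names and values, with clicks chosen by hand instead of coming from the environment.

```python
import numpy as np

def top_k_slate(scores, k=5):
    """Ids of the k highest-scoring items, in ascending score order (as np.argsort returns them)."""
    return [int(i) for i in np.argsort(scores)[-k:]]

def update_scores(current, nxt, user_id, slate, clicked_id, penalty=0.9):
    """Apply the update rule to one user's row of both score matrices, in place."""
    if clicked_id == -1:
        # No click: penalize every item of the slate in both matrices.
        current[user_id, slate] *= penalty
        nxt[user_id, slate] *= penalty
    else:
        # Click: hide the item for the rest of this epoch, rank it first in the next one.
        current[user_id, clicked_id] = -1.0
        nxt[user_id, clicked_id] = 1.0

# Toy data (assumption): one user, four items, slates of size 2.
current = np.array([[0.1, 0.5, 0.3, 0.2]])
nxt = current.copy()

first_slate = top_k_slate(current[0], k=2)
update_scores(current, nxt, 0, first_slate, clicked_id=-1)   # the user ignores the slate

second_slate = top_k_slate(current[0], k=2)
update_scores(current, nxt, 0, second_slate, clicked_id=1)   # the user clicks item 1

third_slate = top_k_slate(current[0], k=2)                   # item 1 is no longer offered
print(first_slate, second_slate, third_slate)
```

One property worth keeping in mind: multiplying a negative score by 0.9 moves it toward zero, so for items with negative similarity the penalty slightly raises the score rather than lowering it.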

In [20]:
if(training):
    # Initialize the training environment
    train_env = TrainingEnvironment()

    training_scores = []
    # Reset the training environment (this can be useful when you have finished one episode of simulation and do not want to re-initialize a new environment)
    train_env.reset()
    for epoch in range(1,100):
        # APPLY CHANGE TO CURRENT DF.
        df_current = df_next.copy()
        with tqdm(desc='Training') as pbar:
            # Check if there exist any active users in the environment
            while (train_env.has_next_state()):

                # print(f'There is {"still some" if train_env.has_next_state() else "no"} active users in the training environment.')

                # Get the current user ID
                user_id = train_env.get_state()
                sorted_indices = np.argsort(df_current.iloc[user_id])
                # Recommend the five items with the highest current scores and observe the user's response
                slate = list(sorted_indices[-5:])
                clicked_id, in_environment = train_env.get_response(slate)
                # Update similarity matrix here
                if clicked_id == -1:  # -1 means no item in the slate was clicked
                    for item in slate:
                        # if(df_current.iloc[user_id, item] >= HIGH_SIMILARITY_RATE_THRESHOLD): # used to click, but not click this time
                        # Penalize the item in both matrices. A single .iloc[row, col] call
                        # writes in place; chained .iloc[row][col] may modify a temporary copy.
                        df_current.iloc[user_id, item] *= 0.9
                        df_next.iloc[user_id, item] *= 0.9
                        # else:
                        #     df_current.iloc[user_id, item] = -1
                        #     df_next.iloc[user_id, item] = -1

                else:
                    # Hide the clicked item for the rest of this epoch and rank it first in the next one.
                    df_current.iloc[user_id, clicked_id] = -1
                    df_next.iloc[user_id, clicked_id] = 1
                    # Unclicked items in the slate are left unchanged (setting them to 0.8 was tried and disabled).
                # print(f'The click result of recommending {slate} to user {user_id} is {f"item {clicked_id}" if clicked_id != -1 else f"{clicked_id} (no click)"}.')
                # print(f'User {user_id} {"is still in" if in_environment else "leaves"} the environment.')
                pbar.update(1)
        training_scores.append(train_env.get_score())
        # print(f"Epoch: {25+epoch}, Score: {sum(train_env.get_score())}")
        time_log(f"Epoch: {epoch+92}, Score: {sum(train_env.get_score())}, Iteration:{pbar}")
        df_next.to_pickle(f"./dataset/test/epoch_{epoch+92}.pkl")
        train_env.reset()

    # Get the normalized session length score of all users
    avg_train_scores = [np.average(score) for score in zip(*training_scores)]
    df_train_score = pd.DataFrame([[user_id, score] for user_id, score in enumerate(avg_train_scores)], columns=['user_id', 'avg_score'])
    display(df_train_score)
    print(training_scores)
    total_score = df_train_score['avg_score'].sum()
    print(f"Total train score:{total_score}")
else:
    print("Not train this time")

Not train this time


## Testing

## Consider user-user similarity (Create user-user similarity matrix)

To take user-user similarity into account, I bring the user embeddings back out here.

In [21]:
display(df_user_embedding)

Unnamed: 0,user_id,headline_embedding,short_description_embedding,concat_embeddings
0,0,"[0.038156908, 0.041387293, -0.004624029, -0.02...","[0.03716896, 0.0512002, -0.025162613, -0.01830...","[0.038156908, 0.041387293, -0.004624029, -0.02..."
1,1,"[-0.01718974, 0.04380578, -0.0033005949, 0.026...","[-0.022690356, 0.041603807, -0.009130617, -0.0...","[-0.01718974, 0.04380578, -0.0033005949, 0.026..."
2,2,"[-0.0044823308, -0.017107317, -0.038572405, -0...","[0.03541447, 0.021701857, -0.016264068, -0.025...","[-0.0044823308, -0.017107317, -0.038572405, -0..."
3,3,"[-0.03414116, 0.035626348, -0.024177575, 0.043...","[-0.032475274, 0.034354758, -0.0064822477, 0.0...","[-0.03414116, 0.035626348, -0.024177575, 0.043..."
4,4,"[-0.0039870082, 0.082041055, -0.01948834, -0.0...","[0.009200389, 0.0539891, -0.026606128, -0.0117...","[-0.0039870082, 0.082041055, -0.01948834, -0.0..."
...,...,...,...,...
1995,1995,"[0.007958271, 0.06905828, -0.012985666, 0.0113...","[-0.005231034, 0.03755425, -0.013397035, -0.01...","[0.007958271, 0.06905828, -0.012985666, 0.0113..."
1996,1996,"[-0.036823038, 0.03161138, -0.017325308, 0.003...","[-0.010244309, -0.01270321, 0.00085817085, -0....","[-0.036823038, 0.03161138, -0.017325308, 0.003..."
1997,1997,"[0.02073842, 0.056454603, 0.000992029, 0.00940...","[0.012727796, 0.0049337894, -0.012769024, 0.02...","[0.02073842, 0.056454603, 0.000992029, 0.00940..."
1998,1998,"[0.016565206, 0.06024626, -0.0010410805, -0.01...","[0.025898555, 0.046939295, -0.02147156, -0.017...","[0.016565206, 0.06024626, -0.0010410805, -0.01..."


I compute the similarity between users. Before recommending items to a test user, I find the training users that are most similar to that test user and recommend what those training users would like.
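
A compact way to find the nearest training users is to compute the whole train-by-test similarity matrix with one matrix product and then sort each column. The sketch below is an illustration under assumptions: `most_similar_train_users` and the toy embeddings are hypothetical, and all embeddings are assumed to be non-zero vectors.

```python
import numpy as np

def most_similar_train_users(train_emb, test_emb, k=1):
    """Return (sim, top) where sim has shape (n_train, n_test) and
    top[j] lists the k training users most similar to test user j, best first."""
    train_unit = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    test_unit = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)
    # Rows are training users and columns are test users, as in the notebook's matrix.
    sim = train_unit @ test_unit.T
    # Sort each column ascending, keep the last k rows, then reverse so the best comes first.
    top = np.argsort(sim, axis=0)[-k:][::-1].T
    return sim, top

# Toy data (assumption): 3 training users and 2 test users in 2 dimensions.
train_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
test_emb = np.array([[2.0, 0.1], [0.1, 5.0]])

sim, top = most_similar_train_users(train_emb, test_emb, k=2)
print(top)
```

Here `top[j][0]` plays the role of the single most similar training user for test user `j`.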

In [22]:
if(os.path.exists(dataset_dir + 'user_user_similarity_matrix.pkl')):
    df_user_user_similarity_matrix = pd.read_pickle(dataset_dir + 'user_user_similarity_matrix.pkl')
else:
    user_user_similarity_matrix = []
    for train_user in range(0,1000):
        row_similarity_list = []
        for test_user in range(1000,2000):
            # Stored in `similarity` so the imported sklearn `cosine_similarity` is not shadowed.
            similarity = 1 - spatial.distance.cosine(df_user_embedding.iloc[train_user]["concat_embeddings"], df_user_embedding.iloc[test_user]["concat_embeddings"])
            row_similarity_list.append(similarity)
        user_user_similarity_matrix.append(row_similarity_list)
    # print(user_user_similarity_matrix)
    columns_names = list(range(1000,2000))
    df_user_user_similarity_matrix = pd.DataFrame(user_user_similarity_matrix, columns = columns_names)
    df_user_user_similarity_matrix.to_pickle(dataset_dir + 'user_user_similarity_matrix.pkl')


In [23]:
display(df_user_user_similarity_matrix)

Unnamed: 0,1000,1001,1002,1003,1004,1005,1006,1007,1008,1009,...,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999
0,0.281648,0.208695,0.125609,0.179470,0.183822,0.109009,0.052345,0.202675,0.335739,0.217125,...,0.173621,0.222536,0.221223,0.189922,0.258909,0.221459,0.169270,0.253004,0.236725,0.312044
1,0.107756,0.157925,0.171894,0.191783,0.215627,0.263724,0.066528,0.048420,0.140302,0.219080,...,0.188666,0.127834,0.170375,0.164114,0.132382,0.155832,0.172978,0.137667,0.036835,0.184401
2,0.340798,0.104089,0.117053,0.121768,0.192938,0.217003,-0.007815,0.082012,0.093833,0.201811,...,0.162832,0.347783,0.093973,0.229857,0.219791,0.337218,-0.006569,0.166411,0.199871,0.136065
3,0.152984,0.204095,0.054092,0.151933,0.109114,0.051870,0.128989,0.186338,0.399219,0.216707,...,0.244051,0.087133,0.480642,0.195987,0.244529,0.177108,0.142491,0.211308,-0.009201,0.387058
4,0.410860,0.238368,0.188938,0.215626,0.339893,0.351212,0.086864,0.204705,0.207856,0.335223,...,0.293352,0.420971,0.161138,0.279038,0.344157,0.386043,0.085055,0.218188,0.275574,0.251951
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,0.318691,0.269468,0.119631,0.186885,0.302532,0.243200,0.058353,0.258401,0.215519,0.326197,...,0.224176,0.286827,0.162822,0.274992,0.254537,0.387950,0.023907,0.317955,0.197057,0.175260
996,0.246249,0.342056,0.188454,0.261937,0.381595,0.282175,0.023506,0.282524,0.276801,0.293508,...,0.296109,0.172917,0.195978,0.312822,0.194241,0.184523,0.052026,0.216403,0.212665,0.304270
997,0.253551,0.193244,0.278929,0.220450,0.320774,0.388695,0.158398,0.211402,0.324682,0.186920,...,0.220253,0.273932,0.362760,0.306700,0.410991,0.453089,0.130172,0.230232,0.169021,0.406711
998,0.232465,0.211833,0.121255,0.133941,0.225623,0.214997,0.058819,0.195395,0.183076,0.203365,...,0.160095,0.239532,0.179513,0.185097,0.231373,0.376211,0.013491,0.250970,0.063096,0.201342


## Updating User-Item Similarity matrix


### Load the latest dataframe

Next, I take the most similar user (among the training users) and apply that user's item recommendation order to this test user.

In [24]:
df_user_item_similarity_latest = pd.read_pickle('./dataset/test/epoch_119.pkl')
display(df_user_item_similarity_latest)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,209517,209518,209519,209520,209521,209522,209523,209524,209525,209526
0,0.062287,0.003323,0.261205,0.274883,0.064868,0.013853,0.226747,0.061278,0.196566,0.109174,...,0.043999,0.059647,0.192302,0.062932,0.180494,0.012018,0.048684,0.079522,-0.015544,0.075266
1,-1.000000,0.055316,0.112049,0.171833,0.091024,0.078525,0.017141,0.173482,0.051898,0.018566,...,-0.004871,-0.035946,0.117543,-0.005578,0.029166,0.013565,0.055716,0.008542,0.073230,0.038990
2,0.106126,0.022207,0.117865,0.251699,-0.012615,-0.014647,0.126634,0.021103,0.089571,0.015671,...,0.036220,0.084124,-0.012249,0.043811,0.079477,-0.061269,0.037820,0.004233,0.000226,0.029401
3,0.093782,0.016448,0.107896,0.075283,0.067360,-0.035875,0.099989,0.168689,0.128790,0.098932,...,0.056915,0.040844,0.170552,0.072513,0.178354,0.060635,0.035779,0.203117,0.084404,0.080729
4,0.122536,0.071247,0.153996,0.225814,0.183863,0.045011,0.210932,0.074886,0.209297,0.033176,...,0.063099,0.134406,0.060474,0.110714,0.092821,0.012666,0.129592,0.059556,0.067587,0.003772
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,0.100583,0.015665,0.170305,0.348556,0.004256,0.037536,0.147794,0.190394,0.262166,0.082865,...,0.060153,0.081491,0.119582,0.026745,0.165972,0.012540,0.067981,0.044060,-0.026443,-0.015400
1996,0.091208,0.042278,0.095109,0.068298,0.088406,0.037034,0.087014,0.329571,0.112063,0.101930,...,-0.045567,-0.018176,0.064000,0.040877,0.201333,0.011241,0.077976,0.176499,0.095174,0.040855
1997,0.084990,0.107297,0.142880,0.165294,0.093537,0.032139,0.144430,0.103643,0.136623,0.098574,...,0.043109,-0.038939,0.077905,0.010034,0.133854,0.056863,0.035067,0.070287,0.059929,0.019807
1998,0.170120,0.071936,0.222804,0.198526,0.098436,-0.000620,0.210958,0.153454,0.097726,0.120007,...,0.026999,0.008165,0.022288,0.071645,0.013865,0.025776,0.133728,0.029238,0.022635,0.013028


### Updating the test user to item similarity

Update the user-item matrix.


In [25]:
# for test_user in range(1000, 2000):
#     # print("Origin similarity:", df_user_item_similarity_latest.iloc[test_user][0])
#     column_data = df_user_user_similarity_matrix[test_user]
#     top_three_similarity = column_data.nlargest(3)  # top-3 most similar users and their similarities
#     sum_of_similarities = top_three_similarity.sum()  # used for normalization
#     normalized_list = [top_three_similarity.iloc[0]/sum_of_similarities, top_three_similarity.iloc[1]/sum_of_similarities, top_three_similarity.iloc[2]/sum_of_similarities]
#     # print(top_three_similarity)
#     # print("Top 3 similar user: ", top_three_similarity.index[0], top_three_similarity.index[1], top_three_similarity.index[2])
#     # print("We should multiply their similarity to :", normalized_list)
#     # print(f"Origin similarity of item 0 is {df_user_item_similarity_latest.iloc[top_three_similarity.index[0]][0]}, {df_user_item_similarity_latest.iloc[top_three_similarity.index[1]][0]}, {df_user_item_similarity_latest.iloc[top_three_similarity.index[2]][0]}")
#     new_similarity_list = df_user_item_similarity_latest.iloc[top_three_similarity.index[0]] * normalized_list[0] + df_user_item_similarity_latest.iloc[top_three_similarity.index[1]] * normalized_list[1] + df_user_item_similarity_latest.iloc[top_three_similarity.index[2]] * normalized_list[2]
#     # print("After update, the new similarity of user item is :", new_similarity_list[0])
#     df_user_item_similarity_latest.iloc[test_user] = df_user_item_similarity_latest.iloc[top_three_similarity.index[0]]
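
The commented-out cell above describes blending the rows of the three most similar users, weighted by normalized similarity. A minimal sketch of that idea follows. The function name `blend_top_k_rows`, its parameters, and the choice to exclude the target user from its own neighbour list are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np
import pandas as pd


def blend_top_k_rows(user_item: pd.DataFrame, user_user: pd.DataFrame,
                     target_user: int, k: int = 3) -> pd.Series:
    """Return a similarity-weighted average of the k most similar users' rows."""
    # Column `target_user` holds that user's similarity to every user.
    # Assumption: drop the user itself so it cannot be its own neighbour.
    sims = user_user[target_user].drop(index=target_user, errors="ignore")
    top_k = sims.nlargest(k)
    weights = top_k / top_k.sum()  # normalize so the weights sum to 1
    # (k,) @ (k, n_items) -> (n_items,)
    blended = weights.to_numpy() @ user_item.loc[top_k.index].to_numpy()
    return pd.Series(blended, index=user_item.columns)
```

Selecting rows with `.loc` keeps the lookup label-based, which matters if the DataFrame index is ever not a plain `0..n-1` range.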

In [26]:
display(df_user_item_similarity_latest)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,209517,209518,209519,209520,209521,209522,209523,209524,209525,209526
0,0.062287,0.003323,0.261205,0.274883,0.064868,0.013853,0.226747,0.061278,0.196566,0.109174,...,0.043999,0.059647,0.192302,0.062932,0.180494,0.012018,0.048684,0.079522,-0.015544,0.075266
1,-1.000000,0.055316,0.112049,0.171833,0.091024,0.078525,0.017141,0.173482,0.051898,0.018566,...,-0.004871,-0.035946,0.117543,-0.005578,0.029166,0.013565,0.055716,0.008542,0.073230,0.038990
2,0.106126,0.022207,0.117865,0.251699,-0.012615,-0.014647,0.126634,0.021103,0.089571,0.015671,...,0.036220,0.084124,-0.012249,0.043811,0.079477,-0.061269,0.037820,0.004233,0.000226,0.029401
3,0.093782,0.016448,0.107896,0.075283,0.067360,-0.035875,0.099989,0.168689,0.128790,0.098932,...,0.056915,0.040844,0.170552,0.072513,0.178354,0.060635,0.035779,0.203117,0.084404,0.080729
4,0.122536,0.071247,0.153996,0.225814,0.183863,0.045011,0.210932,0.074886,0.209297,0.033176,...,0.063099,0.134406,0.060474,0.110714,0.092821,0.012666,0.129592,0.059556,0.067587,0.003772
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1995,0.100583,0.015665,0.170305,0.348556,0.004256,0.037536,0.147794,0.190394,0.262166,0.082865,...,0.060153,0.081491,0.119582,0.026745,0.165972,0.012540,0.067981,0.044060,-0.026443,-0.015400
1996,0.091208,0.042278,0.095109,0.068298,0.088406,0.037034,0.087014,0.329571,0.112063,0.101930,...,-0.045567,-0.018176,0.064000,0.040877,0.201333,0.011241,0.077976,0.176499,0.095174,0.040855
1997,0.084990,0.107297,0.142880,0.165294,0.093537,0.032139,0.144430,0.103643,0.136623,0.098574,...,0.043109,-0.038939,0.077905,0.010034,0.133854,0.056863,0.035067,0.070287,0.059929,0.019807
1998,0.170120,0.071936,0.222804,0.198526,0.098436,-0.000620,0.210958,0.153454,0.097726,0.120007,...,0.026999,0.008165,0.022288,0.071645,0.013865,0.025776,0.133728,0.029238,0.022635,0.013028


In [27]:
TESTING = True

## Load Experience Data

In [28]:
user_item_info = {uid: {} for uid in range(N_TEST_USERS)}  # for every element: {user_id: {item_id: click_count}}

with open('./dataset/clicked_ids_output_final.txt', 'r') as file:
    for line in file:
        line = line.strip().split(', ')
        user_id = int(line[0])
        item_id = int(line[1])
        
        try:
            user_item_info[user_id][item_id] += 1
        except KeyError:
            user_item_info[user_id][item_id] = 1
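
The same `{user_id: {item_id: click_count}}` structure can also be built without the `try`/`except` block. The sketch below assumes the `user_id, item_id` line format parsed above; the helper name `count_clicks` and the sample lines in the usage note are hypothetical.

```python
from collections import Counter, defaultdict


def count_clicks(lines):
    """Build {user_id: Counter({item_id: click_count})} from 'user, item' lines."""
    clicks = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Only the first two comma-separated fields are used, as in the cell above.
        user_id, item_id = (int(field) for field in line.split(", ")[:2])
        clicks[user_id][item_id] += 1
    return clicks
```

For example, `count_clicks(["3, 10", "3, 10", "3, 7"])[3]` counts item 10 twice and item 7 once. Passing an open file object instead of a list works the same way, since both yield one line at a time.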

### Most similar user

In [29]:
# for test_user_id in range(1000, 2000):
#     # print("Origin similarity:", df_user_item_similarity_latest.iloc[test_user][0])
#     column_data = df_user_user_similarity_matrix[test_user_id]
#     most_alike_user = column_data.nlargest(1) # most alike user and its similarity
#     most_alike_user_id = most_alike_user.index[0]
#     user_item_info[test_user_id] = user_item_info[most_alike_user_id]

### Top-3 most similar users

In [30]:
# Ensure every test user starts with an empty dict
for uid in range(1000, 2000):
    user_item_info[uid] = {}

for test_user_id in range(1000, 2000):
    # print("Origin similarity:", df_user_item_similarity_latest.iloc[test_user][0])
    column_data = df_user_user_similarity_matrix[test_user_id]
    top_three_similarity = column_data.nlargest(3)  # top-3 most similar users and their similarities
    
    sum_of_similarities = top_three_similarity.sum()  # used for normalization
    normalized_list = [top_three_similarity.iloc[i]/sum_of_similarities for i in range(3)]
    # print(normalized_list, "\n")
    
    for i in range(3):
        alike_user_id = top_three_similarity.index[i]
        interest_items = list(user_item_info[alike_user_id].keys())
        interest_count = list(user_item_info[alike_user_id].values())
        interest_count = [x * normalized_list[i] for x in interest_count]
                
        for j in range(len(interest_items)):
            item_id = interest_items[j]
            item_count = interest_count[j]
            try:
                user_item_info[test_user_id][item_id] += item_count
            except KeyError:
                user_item_info[test_user_id][item_id] = item_count
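
The aggregation step above can be read as: each neighbour contributes its click counts, scaled by its share of the total similarity. A compact sketch of that step, assuming plain dictionaries as inputs, is shown below. The name `borrow_click_profile` and its argument layout are illustrative and do not appear in the original code.

```python
def borrow_click_profile(neighbor_sims, click_counts):
    """Merge neighbours' click counts, each scaled by its normalized similarity.

    neighbor_sims: {neighbor_id: similarity}
    click_counts:  {user_id: {item_id: count}}
    """
    total = sum(neighbor_sims.values())
    profile = {}
    for neighbor_id, sim in neighbor_sims.items():
        share = sim / total  # this neighbour's normalized weight
        for item_id, count in click_counts.get(neighbor_id, {}).items():
            profile[item_id] = profile.get(item_id, 0.0) + share * count
    return profile
```

With similarities `{1: 0.6, 2: 0.2}`, neighbour 1 gets a share of 0.75 and neighbour 2 a share of 0.25, so an item clicked twice by neighbour 1 and four times by neighbour 2 ends up with a weighted count of 2.5.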

In [31]:
Items = {}
Weights = {}

for user_id in tqdm(range(N_TEST_USERS)):
    Items[user_id] = list(user_item_info[user_id].keys())
    interest_count = np.array(list(user_item_info[user_id].values()))
    total_count = interest_count.sum()
    Weights[user_id] = interest_count / total_count

100%|██████████| 2000/2000 [00:01<00:00, 1702.67it/s]
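
The normalization in the cell above divides each user's counts by their sum, which is undefined for a user with no recorded clicks. The sketch below shows one way to guard that case; `to_sampling_weights` is a hypothetical helper, and returning all-zero weights for an empty or all-zero profile is an assumption, not the notebook's behaviour.

```python
import numpy as np


def to_sampling_weights(counts):
    """Turn {item_id: count} into (items, probabilities)."""
    items = list(counts.keys())
    values = np.asarray(list(counts.values()), dtype=float)
    total = values.sum()
    if total <= 0:
        # Assumption: signal "nothing to sample" with all-zero weights.
        return items, np.zeros(len(items))
    return items, values / total
```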


In [33]:
if(TESTING):
    # Initialize the testing environment
    test_env = TestingEnvironment()
    scores = []

    # The item_ids here are for the random recommender
    item_ids = [i for i in range(N_ITEMS)]

    # Repeat the testing process 3 times
    for _ in range(3):
        # [TODO] Load your model weights here (in the beginning of each testing episode)
        # [TODO] Code for loading your model weights...
        # df_test_similarity = df_user_item_similarity_latest.copy() # reload from train
        current_weights = copy.deepcopy(Weights)
        
        # Start the testing process
        with tqdm(desc='Testing') as pbar:
            # Run as long as there exist some active users
            while test_env.has_next_state():
                # Get the current user id
                cur_user = test_env.get_state()

                # [TODO] Employ your recommendation policy to generate a slate of 5 distinct items
                # [TODO] Code for generating the recommended slate...
                # Here we provide a simple random implementation
                # slate = random.sample(item_ids, k=SLATE_SIZE)

                interest_items = Items[cur_user]
                weight = current_weights[cur_user]
                
                slate = random.choices(interest_items, weight, k=SLATE_SIZE)
                while len(np.unique(slate)) != SLATE_SIZE:
                    slate = random.choices(interest_items, weight, k=SLATE_SIZE)
    
                    
                # Get the response of the slate from the environment
                clicked_id, in_environment = test_env.get_response(slate)
                if (clicked_id != -1):
                    weight_idx = interest_items.index(clicked_id)
                    current_weights[cur_user][weight_idx] = 0
                    

                # [TODO] Update your model here (optional)
                # [TODO] You can update your model at each step, or perform a batched update after some interval
                # [TODO] Code for updating your model...
   
   
                # Update the progress indicator
                pbar.update(1)

        # Record the score of this testing episode
        scores.append(test_env.get_score())

        # Reset the testing environment
        test_env.reset()

        # [TODO] Delete or reset your model weights here (in the end of each testing episode)
        # [TODO] Code for deleting your model weights...
        # df_test_similarity = df_user_item_similarity.copy()
    # Calculate each user's average score across the testing episodes
    avg_scores = [np.average(score) for score in zip(*scores)]

    # Generate a DataFrame to output the result in a .csv file
    df_result = pd.DataFrame([[user_id, avg_score] for user_id, avg_score in enumerate(avg_scores)], columns=['user_id', 'avg_score'])
    df_result.to_csv(OUTPUT_PATH, index=False)
    display(df_result)
else:
    print("Testing skipped (TESTING is False)")

Testing: 270942it [14:17, 316.07it/s]
Testing: 270415it [14:11, 317.62it/s]
Testing: 272063it [14:11, 319.53it/s]


Unnamed: 0,user_id,avg_score
0,0,0.010000
1,1,0.780667
2,2,0.025167
3,3,0.030333
4,4,0.020667
...,...,...
1995,1995,0.003167
1996,1996,0.003500
1997,1997,0.002833
1998,1998,0.002500
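
The testing loop above draws a slate with `random.choices`, which samples with replacement, and redraws until all items are distinct. Once clicked items have their weight set to 0, a user can end up with fewer positive-weight items than the slate size, and the redraw loop then cannot finish. A sketch of weighted sampling without replacement is given below as one possible alternative. The function `sample_slate` and the uniform padding with zero-weight items are assumptions for illustration and were not part of the submitted solution.

```python
import numpy as np


def sample_slate(items, weights, slate_size=5, rng=None):
    """Draw `slate_size` distinct items, favouring higher-weight ones."""
    rng = np.random.default_rng() if rng is None else rng
    items = np.asarray(items)
    weights = np.asarray(weights, dtype=float)

    positive = np.flatnonzero(weights > 0)       # positions that can still be drawn
    n_weighted = min(slate_size, positive.size)
    chosen = []
    if n_weighted > 0:
        p = weights[positive] / weights[positive].sum()
        picked = rng.choice(positive, size=n_weighted, replace=False, p=p)
        chosen.extend(picked.tolist())

    if len(chosen) < slate_size:
        # Assumption: pad uniformly from the remaining (zero-weight) items.
        # Raises ValueError if the user has fewer than `slate_size` items overall.
        rest = np.setdiff1d(np.arange(items.size), np.asarray(chosen, dtype=int))
        pad = rng.choice(rest, size=slate_size - len(chosen), replace=False)
        chosen.extend(pad.tolist())

    return items[chosen].tolist()
```

In the loop, a call such as `sample_slate(Items[cur_user], current_weights[cur_user], SLATE_SIZE)` would replace the draw-and-retry pattern. Note that sampling without replacement changes the selection probabilities slightly compared with rejection sampling, so scores may differ.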


In [34]:
if(TESTING):
    total_score = df_result['avg_score'].sum()
    print(f"Total test score:{total_score}")
    print(f"eval metric: {1-total_score/2000}")

Total test score:135.57
eval metric: 0.932215


## Scoring

- Ranking of **private** leaderboard of the Kaggle competition. (80%)
- Report. (20%)

### How the Ranking Score Is Calculated:

We will calculate the MAE (Mean Absolute Error) between your submitted `output.csv` and a "ground-truth" of all 1s. The lower the better.
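
As a sanity check on the metric, the sketch below computes the MAE against an all-ones target directly. It assumes every per-user score lies in [0, 1], as the outputs in this notebook do; under that assumption the MAE reduces to one minus the mean score, which is the expression printed as `eval metric` above. The function name `mae_against_ones` is hypothetical.

```python
import numpy as np


def mae_against_ones(avg_scores):
    """Mean absolute error between per-user scores and a target of all 1s."""
    scores = np.asarray(avg_scores, dtype=float)
    return float(np.mean(np.abs(1.0 - scores)))
```

For instance, a total score of 135.57 over 2000 users gives a mean of 0.067785 and therefore an MAE of 0.932215.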

### Your Report Should Contain:

- Models you have tried during the competition. Briefly describe the main idea of the model and the reason why you chose that model.
- List the experiments you have done. For instance, data collection, use of the user / item datasets, hyperparameter tuning, training process, and so on.
- Discussions, lessons learned, or anything else worth mentioning.
- **Ensure your report notebook contains your training and testing code. We will re-run your code if we find your score on Kaggle suspicious.**

Please name your report `DL_comp4_{Your Team name}_report.ipynb` and submit it to the eeclass system before the deadline.

## What You Can Do

- Implement any recommender models.
- Collect data through accessing the **public methods provided by the environments** (i.e. methods listed in the ***Environment Public Methods*** section) and train your model.
- Use the provided user history data (`dataset/user_data.json`) and item text description data (`dataset/item_data.json`) as auxiliary data to aid your model training.
- Update the model during one testing episode while **following the rules mentioned in the ***Testing*** section.**
- You can use a pretrained text encoder if you need text embeddings for the item text descriptions. **(This is the only place where you may use a pretrained model in this competition.)**

## What You CAN NOT Do

- Use any dataset other than the provided ones. Using the original News Category Dataset is also prohibited.
- Use any pretrained recommender models.
- Plagiarize other teams' work.
- Hack our simulation environments. Any attempt to access or modify the data files in the `evaluation` directory, modify the source code of the environments, access or modify private attributes and methods (i.e. methods and attributes not listed in the ***Environment Public Methods*** section), break the rules in the ***Testing*** section, or perform any other forbidden action mentioned in the previous sections of the notebook will be regarded as cheating.

## Competition Timeline

- 2024/01/08 (Mon): Competition launched.
- 2024/01/15 (Mon) 08:00 (TW): Competition deadline.
- 2024/01/16 (Tue) 12:00 (TW): Report deadline.
- 2024/01/16 (Tue) 15:30 (TW): Top-3 teams sharing.

## References

1. Misra, Rishabh. "News Category Dataset." arXiv preprint arXiv:2209.11429 (2022).
2. Misra, Rishabh and Jigyasa Grover. "Sculpting Data for ML: The first act of Machine Learning." ISBN 9798585463570 (2021).