# DL_comp4_全_report

## Member

- 邱煒甯, 108072244
- 劉祥暉, 109072142
- 簡佩如, 112065525
- 陳凱揚, 108032053

## 1. Load data

In [2]:
import os
import random
import copy
import pickle
import csv

import numpy as np
import pandas as pd
from tqdm import tqdm
import tensorflow as tf

from evaluation.environment import TrainingEnvironment, TestingEnvironment

os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '0'
os.environ['TF_XLA_FLAGS'] = '--tf_xla_auto_jit=2 --tf_xla_cpu_global_jit'

2024-01-15 20:36:23.066878: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9360] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-15 20:36:23.066929: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-15 20:36:23.066954: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1537] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-15 20:36:23.076403: I tensorflow/core/platform/cpu_feature_guard.cc:183] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [18]:
# Official hyperparameters for this competition (do not modify)
N_TRAIN_USERS = 1000
N_TEST_USERS = 2000
N_ITEMS = 209527
HORIZON = 2000

EMBEDDING_SIZE = 128
LEARNING_RATE = 1e-3
TRAIN_EPISODES = 250
TEST_EPISODES = 5
TRAIN_RETRAIN = 400
TEST_RETRAIN = 100
SLATE_SIZE = 5

In [4]:
# Dataset paths
USER_DATA = os.path.join('dataset', 'user_data.json')
ITEM_DATA = os.path.join('dataset', 'item_data.json')

# Output file path
OUTPUT_PATH = os.path.join('output', 'output.csv')

In [5]:
df_user = pd.read_json(USER_DATA, lines=True)
# df_user

In [6]:
df_item = pd.read_json(ITEM_DATA, lines=True)
# df_item

## 2. Preprocess

In [7]:
BATCH_SIZE = 64
BUFFER_SIZE = 5000
REPEAT_TIME = 1
MAX_DATA = 200000

In [8]:
def dataset_generator(history, clicked):
    dataset = tf.data.Dataset.from_tensor_slices((history, clicked))
    dataset = dataset.shuffle(BUFFER_SIZE)
    dataset = dataset.repeat(REPEAT_TIME)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    return dataset


def save_history(history, clicked):
    history_file = os.path.join('dataset', 'history.pkl')
    clicked_file = os.path.join('dataset', 'clicked.pkl')
    
    if len(history) > MAX_DATA:
        history = history[:6000] + history[-(MAX_DATA - 6000):]
        clicked = clicked[:6000] + clicked[-(MAX_DATA - 6000):]
    
    with open(history_file, 'wb') as file:
        pickle.dump(history, file)
    with open(clicked_file, 'wb') as file:
        pickle.dump(clicked, file)

        
def read_history():
    user_history = [copy.deepcopy(h) for h in df_user['history']]
    try:
        history_file = os.path.join('dataset', 'history.pkl')
        clicked_file = os.path.join('dataset', 'clicked.pkl')

        with open(history_file, 'rb') as file:
            history = pickle.load(file)
        with open(clicked_file, 'rb') as file:
            clicked = pickle.load(file)
    except:
        history = [(idx, i) for idx, row in enumerate(user_history) for i in row]
        clicked = [1 for _ in range(len(history))]
    
    return user_history, history, clicked


def save_csv(train_history):
    with open('train_history1.csv', 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerows(train_history)
    

user_history, history, clicked = read_history()
save_history(history, clicked)

print(sum([len(i) for i in user_history]))
print(len(history))
print(len(clicked))

6000
123658
123658


## 3. Define model

In [19]:
class FunkSVDRecommender(tf.keras.Model):
    '''
    Simplified Funk-SVD recommender model
    '''

    def __init__(self, m_users: int, n_items: int, embedding_size: int, learning_rate: float):
        super().__init__()
        self.m = m_users
        self.n = n_items
        self.k = embedding_size
        self.lr = learning_rate

        # user embeddings P
        self.P = tf.Variable(tf.keras.initializers.RandomNormal()(shape=(self.m, self.k)))

        # item embeddings Q
        self.Q = tf.Variable(tf.keras.initializers.RandomNormal()(shape=(self.n, self.k)))
        
        # loss object
        self.loss_object = tf.keras.losses.BinaryCrossentropy(from_logits=False)

        # optimizer
        self.optimizer = tf.optimizers.Adam(learning_rate=self.lr)
        
        # checkpoint
        self.checkpoint_path ="/home/u6180060/DL/C4/ckpt" #'./ckpt'
        self.checkpoint = tf.train.Checkpoint(optimizer=self.optimizer, model=self)
        self.checkpoint_manager = tf.train.CheckpointManager(self.checkpoint, self.checkpoint_path, max_to_keep=3)
        
    def save_checkpoint(self):
        self.checkpoint_manager.save()

    def restore_checkpoint(self):
        # Restore the latest checkpoint
        if self.checkpoint_manager.latest_checkpoint:
            self.checkpoint.restore(self.checkpoint_manager.latest_checkpoint)

    @tf.function
    def call(self, user_ids: tf.Tensor, item_ids: tf.Tensor):
        # dot product the user and item embeddings corresponding to the observed interaction pairs to produce predictions
        raw_score = tf.reduce_sum(tf.gather(self.P, indices=user_ids) * tf.gather(self.Q, indices=item_ids), axis=1)
        y_pred = tf.nn.sigmoid(raw_score)

        return y_pred

    @tf.function
    def compute_loss(self, y_true: tf.Tensor, y_pred: tf.Tensor):
        loss = self.loss_object(y_true, y_pred)
        
        return loss

    @tf.function
    def train_step(self, data: tf.Tensor, label: tf.Tensor):
        user_ids = tf.cast(data[:, 0], dtype=tf.int32)
        item_ids = tf.cast(data[:, 1], dtype=tf.int32)
        y_true = tf.cast(label, dtype=tf.float32)
        
        # compute loss
        with tf.GradientTape() as tape:
            y_pred = self(user_ids, item_ids)
            loss = self.compute_loss(y_true, y_pred)

        # compute gradients
        gradients = tape.gradient(loss, self.trainable_variables)

        # update weights
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))

        return loss

    @tf.function
    def eval_predict_onestep(self, query: int):
        # dot product the selected user and all item embeddings to produce predictions
        user_id = tf.cast(query, tf.int32)
        y_pred = tf.reduce_sum(tf.gather(self.P, user_id) * self.Q, axis=1)
        y_top = tf.math.top_k(y_pred, 10).indices
        
        return y_top
    def eval_predict_onestep_test(self, query:int, train_history):
        user_id = tf.cast(query, tf.int32)
        y_pred = tf.reduce_sum(tf.gather(self.P, user_id) * self.Q, axis=1).numpy().tolist()
        for i in train_history:
            y_pred[i]=tf.float32.max
        y_top = tf.math.top_k(y_pred, 10).indices
        return y_top

## 4. Training

In [34]:
def get_more_data(sorted_y_pred,train_history):
    y_top_5 = []
    
    # select the top 5 items with highest scores in y_pred and not in the history
    idx = 0
    while idx < len(sorted_y_pred) and len(y_top_5) < 5:
        if sorted_y_pred[idx] not in train_history:
            y_top_5.append(sorted_y_pred[idx])
        idx += 1
    
    # select the item in history if len(y_top_5) < 5
    while len(y_top_5) < 5:
        random_number = random.randint(0, 209526)
        if random_number not in train_history and random_number not in y_top_5:
            y_top_5.append(random_number)
        
    return y_top_5

def get_top_5_train(sorted_y_pred, history):
    y_top_5 = []
    
    # select the top 5 items with highest scores in y_pred and not in the history
    idx = 0
    while idx < len(sorted_y_pred) and len(y_top_5) < 5:
        if sorted_y_pred[idx] not in history:
            y_top_5.append(sorted_y_pred[idx])
        idx += 1
    
    # select the item in history if len(y_top_5) < 5
    if len(y_top_5) < 5:
        y_top_5 += history[:5 - len(y_top_5)]
        
    return y_top_5

def get_top_5_test(sorted_y_pred, train_history, test_history):
    y_top_5 = []
    
    # select the top 5 items with highest scores in y_pred and not in the history
    idx = 0
    while idx < len(sorted_y_pred) and len(y_top_5) < 5:
        if sorted_y_pred[idx] not in test_history and sorted_y_pred[idx] in train_history:
            y_top_5.append(sorted_y_pred[idx])
        idx += 1
    
    idx = 0
    while idx < len(sorted_y_pred) and len(y_top_5) < 5:
        if sorted_y_pred[idx] not in test_history and sorted_y_pred[idx] not in y_top_5:
            y_top_5.append(sorted_y_pred[idx])
        idx += 1
    
    # select the item in history if len(y_top_5) < 5
    if len(y_top_5) < 5:
        y_top_5 += test_history[:5 - len(y_top_5)]
        
    return y_top_5

def update_history(y_top_5, clicked_id, user_id, user_history, history, clicked):
    if clicked_id in y_top_5:
        if clicked_id in user_history[user_id]:
            user_history[user_id].append(clicked_id)
        for _ in range(2):
            history.append((user_id, clicked_id))
            clicked.append(1)
    else:
        for y in y_top_5:
            history.append((user_id, y))
            clicked.append(0)
    
    return user_history, history, clicked

        
def retrain(model, history, clicked):
    dataset = dataset_generator(history, clicked)
    
    train_loss = []

    # training
    for (data, label) in dataset:
        loss = recommend_model.train_step(data, label)
        train_loss.append(loss.numpy())

    # print losses
    # print(f'Retrain_loss: {avg_train_loss:.4f}')


recommend_model = FunkSVDRecommender(N_TEST_USERS, N_ITEMS, EMBEDDING_SIZE, LEARNING_RATE)
recommend_model.restore_checkpoint()
# retrain(recommend_model, history, clicked)

In [11]:
# Initialize the training environment
train_env = TrainingEnvironment()

# Reset the training environment (this can be useful when you have finished one episode of simulation and do not want to re-initialize a new environment)
train_env.reset()

# Check if there exist any active users in the environment
env_has_next_state = train_env.has_next_state()
print(f'There is {"still some" if env_has_next_state else "no"} active users in the training environment.')

# Get the current user ID
user_id = train_env.get_state()
print(f'The current user is user {user_id}.')

# Get the response of recommending the slate to the current user
sorted_y_pred = recommend_model.eval_predict_onestep(user_id).numpy()
slate = get_top_5_train(sorted_y_pred, user_history[user_id])
clicked_id, in_environment = train_env.get_response(slate)
print(f'The click result of recommending {slate} to user {user_id} is {f"item {clicked_id}" if clicked_id != -1 else f"{clicked_id} (no click)"}.')
print(f'User {user_id} {"is still in" if in_environment else "leaves"} the environment.')

# Get the normalized session length score of all users
train_score = train_env.get_score()
df_train_score = pd.DataFrame([[user_id, score] for user_id, score in enumerate(train_score)], columns=['user_id', 'avg_score'])
# df_train_score

There is still some active users in the training environment.
The current user is user 500.


2024-01-15 20:36:57.704604: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5602ab91d830 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2024-01-15 20:36:57.704667: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla V100-SXM2-32GB, Compute Capability 7.0
2024-01-15 20:36:57.740709: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8906
2024-01-15 20:36:57.799189: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.


The click result of recommending [105680, 147868, 38295, 80453, 95466] to user 500 is item 105680.
User 500 is still in the environment.


In [None]:
train_history = [copy.deepcopy(h) for h in df_user['history']]

# Initialize the training environment
train_env = TrainingEnvironment()

# The item_ids here is for the random recommender
item_ids = [i for i in range(N_ITEMS)]

# Repeat the training process for 5 times
start_epoch = 0


for epoch in range(start_epoch, 10000):
    # [TODO] Load your model weights here (in the beginning of each testing episode)
    # [TODO] Code for loading your model weights...
    recommend_model.restore_checkpoint()
    user_history, history, clicked = read_history()
    retrain_cnt = 0

    # Start the training process
    with tqdm(desc='Training') as pbar:
        # Run as long as there exist some active users
        while train_env.has_next_state():
            # Get the current user id
            cur_user = train_env.get_state()

            # [TODO] Employ your recommendation policy to generate a slate of 5 distinct items
            # [TODO] Code for generating the recommended slate...
            # Here we provide a simple random implementation
            sorted_y_pred = random.sample(range(0, 209527), 10)#recommend_model.eval_predict_onestep(cur_user).numpy()
            slate = get_more_data(sorted_y_pred, train_history[cur_user])

            # Get the response of the slate from the environment
            clicked_id, in_environment = train_env.get_response(slate)

            # [TODO] Update your model here (optional)
            # [TODO] You can update your model at each step, or perform a batched update after some interval
            # [TODO] Code for updating your model...
            if clicked_id != -1 and clicked_id not in train_history[cur_user]:
                train_history[cur_user].append(clicked_id)
                save_csv(train_history)
            
            user_history, history, clicked = update_history(slate, clicked_id, cur_user, user_history, history, clicked)
            # Update retrain count
            retrain_cnt += 1
            if retrain_cnt == TRAIN_RETRAIN:
                retrain_cnt = 0
                retrain(recommend_model, history, clicked)
            
            # Update the progress indicator
            pbar.update(1)
    
    # Output the training score
    train_score = train_env.get_score()
    avg_scores = [np.average(score) for score in zip(*[train_score])]
    result = (len(avg_scores) - sum(avg_scores)) / len(avg_scores)
    print(f'Epoch {epoch} score: {result:.6f}\n')
    with open('traing_output.txt', 'a') as file:
        file.write(f'Epoch {epoch} score: {result:.6f}\n')
    
    # Reset the training environment
    train_env.reset()
    
    # Save checkpoint
    save_history(history, clicked)
    recommend_model.save_checkpoint()

Training: 5104it [00:07, 671.84it/s]


Epoch 0 score: 0.997448



Training: 5100it [00:07, 676.07it/s]


Epoch 1 score: 0.997450



Training: 5123it [00:07, 672.78it/s]


Epoch 2 score: 0.997439



Training: 5123it [00:07, 663.87it/s]


Epoch 3 score: 0.997439



Training: 5140it [00:07, 656.43it/s]


Epoch 4 score: 0.997430



Training: 5102it [00:07, 678.41it/s]


Epoch 5 score: 0.997449



Training: 5096it [00:07, 691.79it/s]


Epoch 6 score: 0.997452



Training: 5104it [00:07, 666.12it/s]


Epoch 7 score: 0.997448



Training: 5111it [00:07, 670.65it/s]


Epoch 8 score: 0.997445



Training: 5135it [00:07, 659.58it/s]


Epoch 9 score: 0.997433



Training: 5118it [00:07, 677.18it/s]


Epoch 10 score: 0.997441



Training: 5140it [00:08, 635.42it/s]


Epoch 11 score: 0.997430



Training: 5138it [00:08, 642.22it/s]


Epoch 12 score: 0.997431



Training: 5112it [00:07, 670.58it/s]


Epoch 13 score: 0.997444



Training: 5139it [00:07, 663.69it/s]


Epoch 14 score: 0.997430



Training: 5140it [00:07, 648.27it/s]


Epoch 15 score: 0.997430



Training: 5129it [00:07, 657.87it/s]


Epoch 16 score: 0.997436



Training: 5127it [00:07, 666.21it/s]


Epoch 17 score: 0.997437



Training: 5130it [00:07, 658.12it/s]


Epoch 18 score: 0.997435



Training: 5121it [00:07, 672.21it/s]


Epoch 19 score: 0.997440



Training: 5099it [00:07, 678.63it/s]


Epoch 20 score: 0.997451



Training: 5121it [00:07, 662.00it/s]


Epoch 21 score: 0.997440



Training: 5117it [00:07, 667.45it/s]


Epoch 22 score: 0.997441



Training: 5121it [00:07, 669.09it/s]


Epoch 23 score: 0.997440



Training: 5112it [00:07, 673.90it/s]


Epoch 24 score: 0.997444



Training: 5130it [00:07, 651.92it/s]


Epoch 25 score: 0.997435



Training: 5110it [00:07, 674.24it/s]


Epoch 26 score: 0.997445



Training: 5124it [00:08, 585.34it/s]


Epoch 27 score: 0.997438



Training: 5114it [00:07, 674.21it/s]


Epoch 28 score: 0.997443



Training: 5118it [00:07, 667.66it/s]


Epoch 29 score: 0.997441



Training: 5124it [00:07, 670.86it/s]


Epoch 30 score: 0.997438



Training: 5130it [00:07, 660.55it/s]


Epoch 31 score: 0.997435



Training: 5136it [00:07, 665.28it/s]


Epoch 32 score: 0.997432



Training: 5123it [00:07, 662.97it/s]


Epoch 33 score: 0.997439



Training: 5119it [00:07, 662.08it/s]


Epoch 34 score: 0.997441



Training: 5098it [00:07, 649.35it/s]


Epoch 35 score: 0.997451



Training: 5122it [00:08, 622.63it/s]


Epoch 36 score: 0.997439



Training: 5110it [00:08, 594.81it/s]


Epoch 37 score: 0.997445



Training: 5125it [00:08, 635.17it/s]


Epoch 38 score: 0.997437



Training: 5131it [00:08, 633.54it/s]


Epoch 39 score: 0.997435



Training: 5105it [00:08, 635.70it/s]


Epoch 40 score: 0.997448



Training: 5121it [00:08, 635.10it/s]


Epoch 41 score: 0.997440



Training: 5105it [00:07, 643.38it/s]


Epoch 42 score: 0.997448



Training: 5116it [00:08, 637.16it/s]


Epoch 43 score: 0.997442



Training: 5131it [00:08, 619.90it/s]


Epoch 44 score: 0.997435



Training: 5139it [00:08, 622.21it/s]


Epoch 45 score: 0.997430



Training: 5102it [00:08, 632.65it/s]


Epoch 46 score: 0.997449



Training: 5121it [00:08, 626.27it/s]


Epoch 47 score: 0.997440



Training: 5131it [00:07, 644.18it/s]


Epoch 48 score: 0.997435



Training: 5135it [00:08, 623.93it/s]


Epoch 49 score: 0.997433



Training: 5119it [00:08, 622.77it/s]


Epoch 50 score: 0.997441



Training: 5132it [00:08, 629.71it/s]


Epoch 51 score: 0.997434



Training: 5118it [00:08, 632.29it/s]


Epoch 52 score: 0.997441



Training: 5118it [00:07, 639.81it/s]


Epoch 53 score: 0.997441



Training: 5125it [00:08, 635.32it/s]


Epoch 54 score: 0.997437



Training: 5145it [00:08, 621.99it/s]


Epoch 55 score: 0.997428



Training: 5127it [00:08, 629.30it/s]


Epoch 56 score: 0.997437



Training: 5119it [00:08, 624.60it/s]


Epoch 57 score: 0.997441



Training: 5141it [00:08, 626.59it/s]


Epoch 58 score: 0.997430



Training: 5146it [00:08, 627.32it/s]


Epoch 59 score: 0.997427



Training: 5112it [00:08, 634.18it/s]


Epoch 60 score: 0.997444



Training: 5133it [00:08, 626.67it/s]


Epoch 61 score: 0.997434



Training: 5127it [00:08, 615.38it/s]


Epoch 62 score: 0.997437



Training: 5114it [00:08, 633.28it/s]


Epoch 63 score: 0.997443



Training: 5096it [00:08, 635.13it/s]


Epoch 64 score: 0.997452



Training: 5135it [00:08, 614.87it/s]


Epoch 65 score: 0.997433



Training: 5118it [00:08, 631.09it/s]


Epoch 66 score: 0.997441



Training: 5107it [00:08, 635.12it/s]


Epoch 67 score: 0.997447



Training: 5122it [00:08, 626.46it/s]


Epoch 68 score: 0.997439



Training: 5137it [00:08, 621.95it/s]


Epoch 69 score: 0.997432



Training: 5150it [00:08, 603.85it/s]


Epoch 70 score: 0.997425



Training: 5119it [00:08, 627.98it/s]


Epoch 71 score: 0.997441



Training: 5131it [00:08, 615.09it/s]


Epoch 72 score: 0.997435



Training: 5144it [00:08, 618.24it/s]


Epoch 73 score: 0.997428



Training: 5134it [00:08, 624.69it/s]


Epoch 74 score: 0.997433



Training: 5107it [00:08, 631.85it/s]


Epoch 75 score: 0.997447



Training: 5152it [00:08, 623.06it/s]


Epoch 76 score: 0.997424



Training: 5114it [00:07, 643.28it/s]


Epoch 77 score: 0.997443



Training: 5108it [00:07, 657.47it/s]


Epoch 78 score: 0.997446



Training: 5129it [00:07, 657.24it/s]


Epoch 79 score: 0.997436



Training: 5122it [00:07, 662.37it/s]


Epoch 80 score: 0.997439



Training: 5119it [00:07, 655.05it/s]


Epoch 81 score: 0.997441



Training: 5115it [00:07, 668.94it/s]


Epoch 82 score: 0.997443



Training: 5120it [00:07, 662.57it/s]


Epoch 83 score: 0.997440



Training: 5107it [00:07, 669.72it/s]


Epoch 84 score: 0.997447



Training: 5148it [00:07, 651.30it/s]


Epoch 85 score: 0.997426



Training: 5123it [00:07, 670.32it/s]


Epoch 86 score: 0.997439



Training: 5104it [00:07, 668.87it/s]


Epoch 87 score: 0.997448



Training: 5111it [00:07, 661.50it/s]


Epoch 88 score: 0.997445



Training: 5143it [00:07, 644.03it/s]


Epoch 89 score: 0.997428



Training: 5097it [00:07, 672.68it/s]


Epoch 90 score: 0.997452



Training: 5154it [00:08, 643.71it/s]


Epoch 91 score: 0.997423



Training: 5136it [00:07, 653.31it/s]


Epoch 92 score: 0.997432



Training: 5124it [00:07, 661.01it/s]


Epoch 93 score: 0.997438



Training: 5124it [00:07, 657.68it/s]


Epoch 94 score: 0.997438



Training: 5123it [00:07, 664.78it/s]


Epoch 95 score: 0.997439



Training: 5127it [00:07, 651.50it/s]


Epoch 96 score: 0.997437



Training: 5108it [00:07, 660.51it/s]


Epoch 97 score: 0.997446



Training: 5129it [00:07, 657.90it/s]


Epoch 98 score: 0.997436



Training: 5126it [00:07, 651.08it/s]


Epoch 99 score: 0.997437



Training: 5133it [00:07, 659.53it/s]


Epoch 100 score: 0.997434



Training: 5118it [00:07, 662.57it/s]


Epoch 101 score: 0.997441



Training: 5118it [00:07, 667.37it/s]


Epoch 102 score: 0.997441



Training: 5132it [00:07, 660.92it/s]


Epoch 103 score: 0.997434



Training: 5122it [00:07, 653.44it/s]


Epoch 104 score: 0.997439



Training: 5101it [00:07, 669.49it/s]


Epoch 105 score: 0.997450



Training: 5111it [00:07, 658.67it/s]


Epoch 106 score: 0.997445



Training: 5129it [00:07, 661.01it/s]


Epoch 107 score: 0.997436



Training: 5135it [00:07, 658.17it/s]


Epoch 108 score: 0.997433



Training: 5133it [00:07, 644.71it/s]


Epoch 109 score: 0.997434



Training: 5122it [00:07, 665.92it/s]


Epoch 110 score: 0.997439



Training: 5120it [00:07, 659.60it/s]


Epoch 111 score: 0.997440



Training: 5130it [00:07, 651.93it/s]


Epoch 112 score: 0.997435



Training: 5124it [00:07, 660.97it/s]


Epoch 113 score: 0.997438



Training: 5135it [00:07, 649.67it/s]


Epoch 114 score: 0.997433



Training: 5135it [00:07, 645.22it/s]


Epoch 115 score: 0.997433



Training: 5131it [00:07, 649.68it/s]


Epoch 116 score: 0.997435



Training: 5105it [00:07, 663.47it/s]


Epoch 117 score: 0.997448



Training: 5146it [00:07, 651.45it/s]


Epoch 118 score: 0.997427



Training: 5137it [00:07, 646.22it/s]


Epoch 119 score: 0.997432



Training: 5097it [00:07, 663.42it/s]


Epoch 120 score: 0.997452



Training: 5119it [00:07, 658.17it/s]


Epoch 121 score: 0.997441



Training: 5124it [00:07, 657.93it/s]


Epoch 122 score: 0.997438



Training: 5128it [00:07, 643.02it/s]


Epoch 123 score: 0.997436



Training: 5110it [00:07, 656.62it/s]


Epoch 124 score: 0.997445



Training: 5122it [00:07, 660.45it/s]


Epoch 125 score: 0.997439



Training: 5089it [00:07, 665.49it/s]


Epoch 126 score: 0.997456



Training: 5134it [00:07, 653.79it/s]


Epoch 127 score: 0.997433



Training: 5119it [00:07, 657.75it/s]


Epoch 128 score: 0.997441



Training: 5127it [00:07, 655.93it/s]


Epoch 129 score: 0.997437



Training: 5129it [00:07, 654.51it/s]


Epoch 130 score: 0.997436



Training: 5104it [00:07, 664.08it/s]


Epoch 131 score: 0.997448



Training: 5123it [00:07, 656.04it/s]


Epoch 132 score: 0.997439



Training: 5132it [00:07, 648.78it/s]


Epoch 133 score: 0.997434



Training: 5137it [00:07, 647.34it/s]


Epoch 134 score: 0.997432



Training: 5111it [00:07, 664.02it/s]


Epoch 135 score: 0.997445



Training: 5113it [00:07, 663.15it/s]


Epoch 136 score: 0.997444



Training: 5109it [00:07, 654.46it/s]


Epoch 137 score: 0.997445



Training: 5103it [00:07, 661.30it/s]


Epoch 138 score: 0.997449



Training: 5125it [00:07, 658.39it/s]


Epoch 139 score: 0.997437



Training: 5113it [00:07, 653.72it/s]


Epoch 140 score: 0.997444



Training: 5135it [00:07, 650.76it/s]


Epoch 141 score: 0.997433



Training: 5117it [00:07, 656.81it/s]


Epoch 142 score: 0.997441



Training: 5115it [00:07, 664.08it/s]


Epoch 143 score: 0.997443



Training: 5138it [00:07, 649.38it/s]


Epoch 144 score: 0.997431



Training: 5109it [00:07, 665.44it/s]


Epoch 145 score: 0.997445



Training: 5120it [00:07, 648.65it/s]


Epoch 146 score: 0.997440



Training: 5093it [00:07, 671.59it/s]


Epoch 147 score: 0.997454



Training: 5169it [00:08, 640.88it/s]


Epoch 148 score: 0.997416



Training: 5120it [00:07, 660.95it/s]


Epoch 149 score: 0.997440



Training: 5180it [00:08, 629.19it/s]


Epoch 150 score: 0.997410



Training: 5120it [00:07, 653.76it/s]


Epoch 151 score: 0.997440



Training: 5176it [00:08, 644.36it/s]


Epoch 152 score: 0.997412



Training: 5121it [00:07, 660.15it/s]


Epoch 153 score: 0.997440



Training: 5130it [00:07, 663.58it/s]


Epoch 154 score: 0.997435



Training: 5117it [00:07, 656.90it/s]


Epoch 155 score: 0.997441



Training: 5114it [00:07, 648.56it/s]


Epoch 156 score: 0.997443



Training: 5105it [00:07, 667.97it/s]


Epoch 157 score: 0.997448



Training: 5119it [00:07, 666.27it/s]


Epoch 158 score: 0.997441



Training: 5094it [00:07, 653.48it/s]

## 5. Testing

In [37]:
# Initialize the testing environment
test_env = TestingEnvironment()
scores = []

# The item_ids here is for the random recommender
item_ids = [i for i in range(N_ITEMS)]
test_history = [copy.deepcopy(h) for h in df_user['history']]
# Repeat the testing process for 5 times
for _ in range(TEST_EPISODES):
    # [TODO] Load your model weights here (in the beginning of each testing episode)
    # [TODO] Code for loading your model weights...
#     recommend_model.restore_checkpoint()
    recommend_model.checkpoint.restore("/home/u6180060/DL/C4/ckpt/ckpt-250")
    test_history, history, clicked = read_history()
    retrain_cnt = 0

    # Start the testing process
    with tqdm(desc='Testing') as pbar:
        # Run as long as there exist some active users
        while test_env.has_next_state():
            # Get the current user id
            cur_user = test_env.get_state()

            # [TODO] Employ your recommendation policy to generate a slate of 5 distinct items
            # [TODO] Code for generating the recommended slate...
            # Here we provide a simple random implementation
            sorted_y_pred = recommend_model.eval_predict_onestep_test(cur_user,train_history[cur_user]).numpy()
            slate = get_top_5_train(sorted_y_pred, test_history[cur_user])

            # Get the response of the slate from the environment
            clicked_id, in_environment = test_env.get_response(slate)

            # [TODO] Update your model here (optional)
            # [TODO] You can update your model at each step, or perform a batched update after some interval
            # [TODO] Code for updating your model...
            if clicked_id != -1 and clicked_id not in train_history[cur_user]:
                train_history[cur_user].append(clicked_id)
            test_history, history, clicked = update_history(slate, clicked_id, cur_user, test_history, history, clicked)
            
            # Update retrain count
            retrain_cnt += 1
            if retrain_cnt == TEST_RETRAIN:
                retrain_cnt = 0
                retrain(recommend_model, history, clicked)
            
            # Update the progress indicator
            pbar.update(1)

    # Record the score of this testing episode
    scores.append(test_env.get_score())
    
    # Output the testing score
    test_score = test_env.get_score()
    avg_scores = [np.average(score) for score in zip(*[test_score])]
    result = (len(avg_scores) - sum(avg_scores)) / len(avg_scores)
    print(f'Score: {result:.6f}\n')
    with open('testing_output.txt', 'a') as file:
        file.write(f'Score: {result:.6f}\n')

    # Reset the testing environment
    test_env.reset()

    # [TODO] Delete or reset your model weights here (in the end of each testing episode)
    # [TODO] Code for deleting your model weights...
    test_history, history, clicked = None, None, None

# Calculate the average scores 
avg_scores = [np.max(score) for score in zip(*scores)]
result = (len(avg_scores) - sum(avg_scores)) / len(avg_scores)
print(f'Result: {result:.6f}')

# Generate a DataFrame to output the result in a .csv file
df_result = pd.DataFrame([[user_id, avg_score] for user_id, avg_score in enumerate(avg_scores)], columns=['user_id', 'avg_score'])
df_result.to_csv(OUTPUT_PATH, index=False)
df_result

Testing: 14369it [39:54,  6.00it/s]


Score: 0.996408

Result: 0.996408


Unnamed: 0,user_id,avg_score
0,0,0.0025
1,1,0.0045
2,2,0.0050
3,3,0.0050
4,4,0.0050
...,...,...
1995,1995,0.0025
1996,1996,0.0025
1997,1997,0.0025
1998,1998,0.0025


## 6. Report

### Models you have tried during the competition. Briefly describe the main idea of the model and the reason why you chose that model.

- Content-based algorithms (item feature)

    我們有嘗試將 item 的 headline, description 使用 bert 去產生 embedding, 並利用已經有的 user history 和從 training 環境中獲取的 interact data 去對每個 user 都 train 一個簡單的 linear model，用來找出最適合推薦給這個 user 的 item。

- Collaborative filtering (Funk-SVD)

    這邊是根據 tutorial 的範例修改而成，其中不一樣的地方在於我們會將曾經選擇過的 history 跳過不選擇，因為 user 去選看過的 item 的機率非常低，且會紀錄曾經點擊過的 item，再將來重新 train 的時候，會優先推薦這些 item 給 user。而這邊的 loss 是使用 `BinaryCrossEntropy`，最後我們採用這個 model。

### List the experiments you have done. For instance, data collecting, utilizing the user / item datasets, hyperparameters tuning, training process, and so on.

- 收集 data 的方式為如果有點選其中一個 item，那就會紀錄這筆點過的資訊並 oversample 2-4 次，其他 4 個 item 則不紀錄，因為這 4 個 item 不一定是 user 不想看的，有可能是因為被點的那個太想看，導致這些 item 沒被選到；如果 user 5 個 item 都不點選，則這 5 個 item 都會記錄下來，因為代表 user 選擇不看的機率比較高，這 5 個 item 是 user 不想看的。

- 在和 training 環境互動時，會每隔 500 個 response，就將新蒐集到的資料和原始的資料一起給 model 來 retrain，在 test 環境時則會更頻繁，因為希望多學到新 user 的資料。

- 我們有一直在微調 learning rate，不然很多時候會 train 太慢或是一直大震盪，同時也有調整要間隔幾個 response 再去做 retrain。

### Discussions, lessons learned, or anything else worth mentioning.

- 我認為這次很重要的點就是要超級大量的一直去搜集 training 環境的互動資料，並利用這些曾經點選的紀錄再將來推薦給 user，尤其 training 環境可以無限一直重複跑，所以可以蒐集到無限的資料來準確地建立出每個 user 對每個 item 的喜好，有了這些資料後，甚至不用太厲害的 model 都能輕鬆推薦正確的 item 給 training 環境中的 1000 名 user。

- 而 testing 環境中新的 1000 名 user 要準確的推薦給他們實在是太困難了，因為在每次環境中只有短短的時間去得到這些新 user 的 response，然而 item 數量多達 20 萬，要能成功推薦有很大的難度。


## 7. Conclusion

Overall, through this competition, we found that the key point is to collect more data by interacting with the environment. We mainly tried two different model,  including content-based personal model and collaborative filtering, but we found that their performance are quite similar. Thus, we understood that the model was not the bottleneck. Just before deadline, we finally understood that we should collect as many as possible so our performace started to improve. We used random recommender to interact with the environment and record the data. However, it was a pity that we did’t have enough time to collect more data to pass the TA60 and TA70 at the end. We should have noticed the key point was data earlier:(