# <center>DataLab Cup 4: Recommender Systems</center>

## Platform: [Kaggle](https://www.kaggle.com/t/b06e248a3827434f80c4fdc6009d5fe0)

Please download the dataset and the environment source code from Kaggle.

In [19]:
import os
import random

import numpy as np
import pandas as pd
from tqdm import tqdm, trange

from evaluation.environment import TrainingEnvironment, TestingEnvironment

import tensorflow as tf
import matplotlib.pyplot as plt

import pickle

In [20]:
# Check GPU
gpus = tf.config.experimental.list_physical_devices("GPU")
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        # Use only the first GPU (index 0)
        tf.config.experimental.set_visible_devices(gpus[0], "GPU")
        logical_gpus = tf.config.experimental.list_logical_devices("GPU")
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

1 Physical GPUs, 1 Logical GPUs


In [21]:
# Official hyperparameters for this competition (do not modify)
N_TRAIN_USERS = 1000
N_TEST_USERS = 2000
N_ITEMS = 209527
HORIZON = 2000
TEST_EPISODES = 5
SLATE_SIZE = 5

## Datasets

In [22]:
# Dataset paths
USER_DATA = os.path.join('dataset', 'user_data.json')
ITEM_DATA = os.path.join('dataset', 'item_data.json')

# Output file path
OUTPUT_PATH = os.path.join('output', 'output.csv')

## User Data

In [23]:
df_user = pd.read_json(USER_DATA, lines=True)
df_user

Unnamed: 0,user_id,history
0,0,"[42558, 65272, 13353]"
1,1,"[146057, 195688, 143652]"
2,2,"[67551, 85247, 33714]"
3,3,"[116097, 192703, 103229]"
4,4,"[68756, 140123, 135289]"
...,...,...
1995,1995,"[95090, 131393, 130239]"
1996,1996,"[2360, 147130, 8145]"
1997,1997,"[99794, 138694, 157888]"
1998,1998,"[55561, 60372, 51442]"


In [76]:
pairs = df_user.explode("history")
pairs["label"] = 1

pairs

Unnamed: 0,user_id,history,label
0,0,42558,1
0,0,65272,1
0,0,13353,1
1,1,146057,1
1,1,195688,1
...,...,...,...
1998,1998,60372,1
1998,1998,51442,1
1999,1999,125409,1
1999,1999,77906,1


## Item Data

In [25]:
df_item = pd.read_json(ITEM_DATA, lines=True)
df_item['concat'] = df_item['headline'] + ' ' + df_item['short_description']
df_item

Unnamed: 0,item_id,headline,short_description,concat
0,0,Over 4 Million Americans Roll Up Sleeves For O...,Health experts said it is too early to predict...,Over 4 Million Americans Roll Up Sleeves For O...
1,1,"American Airlines Flyer Charged, Banned For Li...",He was subdued by passengers and crew when he ...,"American Airlines Flyer Charged, Banned For Li..."
2,2,23 Of The Funniest Tweets About Cats And Dogs ...,"""Until you have a dog you don't understand wha...",23 Of The Funniest Tweets About Cats And Dogs ...
3,3,The Funniest Tweets From Parents This Week (Se...,"""Accidentally put grown-up toothpaste on my to...",The Funniest Tweets From Parents This Week (Se...
4,4,Woman Who Called Cops On Black Bird-Watcher Lo...,Amy Cooper accused investment firm Franklin Te...,Woman Who Called Cops On Black Bird-Watcher Lo...
...,...,...,...,...
209522,209522,RIM CEO Thorsten Heins' 'Significant' Plans Fo...,Verizon Wireless and AT&T are already promotin...,RIM CEO Thorsten Heins' 'Significant' Plans Fo...
209523,209523,Maria Sharapova Stunned By Victoria Azarenka I...,"Afterward, Azarenka, more effusive with the pr...",Maria Sharapova Stunned By Victoria Azarenka I...
209524,209524,"Giants Over Patriots, Jets Over Colts Among M...","Leading up to Super Bowl XLVI, the most talked...","Giants Over Patriots, Jets Over Colts Among M..."
209525,209525,Aldon Smith Arrested: 49ers Linebacker Busted ...,CORRECTION: An earlier version of this story i...,Aldon Smith Arrested: 49ers Linebacker Busted ...


In [26]:
from sentence_transformers import SentenceTransformer

has_embedded = True  # set to False to recompute the sentence embeddings

if not has_embedded:
    model = SentenceTransformer('bert-base-nli-stsb-mean-tokens')
    embeddings = model.encode(df_item['concat'].tolist())
    
    df_item['embedding'] = embeddings.tolist()
    df_item['embedding'].to_pickle("./dataset/embedding.pkl")

In [27]:
if os.path.exists("./dataset/embedding.pkl"):
    embeddings_from_file = pd.read_pickle("./dataset/embedding.pkl")
    df_item['embedding'] = embeddings_from_file.tolist()

df_item

Unnamed: 0,item_id,headline,short_description,concat,embedding
0,0,Over 4 Million Americans Roll Up Sleeves For O...,Health experts said it is too early to predict...,Over 4 Million Americans Roll Up Sleeves For O...,"[0.504711925983429, -0.09660917520523071, 1.02..."
1,1,"American Airlines Flyer Charged, Banned For Li...",He was subdued by passengers and crew when he ...,"American Airlines Flyer Charged, Banned For Li...","[-0.7140856981277466, -0.020469149574637413, 0..."
2,2,23 Of The Funniest Tweets About Cats And Dogs ...,"""Until you have a dog you don't understand wha...",23 Of The Funniest Tweets About Cats And Dogs ...,"[0.3640383183956146, 0.46643778681755066, 0.22..."
3,3,The Funniest Tweets From Parents This Week (Se...,"""Accidentally put grown-up toothpaste on my to...",The Funniest Tweets From Parents This Week (Se...,"[-0.6239121556282043, 0.8828951716423035, 0.29..."
4,4,Woman Who Called Cops On Black Bird-Watcher Lo...,Amy Cooper accused investment firm Franklin Te...,Woman Who Called Cops On Black Bird-Watcher Lo...,"[0.2773666977882385, 0.27412697672843933, -0.0..."
...,...,...,...,...,...
209522,209522,RIM CEO Thorsten Heins' 'Significant' Plans Fo...,Verizon Wireless and AT&T are already promotin...,RIM CEO Thorsten Heins' 'Significant' Plans Fo...,"[-0.4424550235271454, 0.4845919609069824, 0.63..."
209523,209523,Maria Sharapova Stunned By Victoria Azarenka I...,"Afterward, Azarenka, more effusive with the pr...",Maria Sharapova Stunned By Victoria Azarenka I...,"[-0.2965202331542969, 0.0609898716211319, 0.14..."
209524,209524,"Giants Over Patriots, Jets Over Colts Among M...","Leading up to Super Bowl XLVI, the most talked...","Giants Over Patriots, Jets Over Colts Among M...","[-0.05925000458955765, -0.564041018486023, 0.5..."
209525,209525,Aldon Smith Arrested: 49ers Linebacker Busted ...,CORRECTION: An earlier version of this story i...,Aldon Smith Arrested: 49ers Linebacker Busted ...,"[-0.25016430020332336, 0.15449568629264832, -0..."


In [28]:
from sklearn.metrics.pairwise import cosine_similarity

# Build the item-to-item similarity lookup
similarity_items = {}

if not os.path.exists("./dataset/similarity_items.pkl"):
    # Stack the vectors in the embedding column into one 2-D array
    embeddings = np.vstack(df_item['embedding'].to_numpy())

    # Number of items processed per batch
    batch_size = 1000

    for i in range(0, len(df_item), batch_size):
        embeddings_batch = embeddings[i:i+batch_size]
        similarity_matrix_batch = cosine_similarity(embeddings_batch, embeddings)

        for item_id in range(i, min(i+batch_size, len(df_item))):
            # Sort by similarity (descending) and keep the indices of the 100 most similar items
            similar_items = np.argsort(similarity_matrix_batch[item_id - i])[::-1][:100]

            # Store the similar items in the similarity_items dictionary
            similarity_items[df_item['item_id'].iloc[item_id]] = list(similar_items)

    with open("./dataset/similarity_items.pkl", "wb") as f:
        pickle.dump(similarity_items, f)
        
else:
    with open("./dataset/similarity_items.pkl", "rb") as f:
        similarity_items = pickle.load(f)

# import pprint
# pprint.pprint(similarity_items)

## Simulation Environments

We offer two simulation environments in this competition: `TrainingEnvironment` and `TestingEnvironment`. The only distinction between the two environments is the number of users, with 1000 for training and 2000 for testing. All public methods for both environments behave the same since they share the same base class.

**Important Note: Ensure that you collect interaction data only by accessing the environment through the designated public methods listed below. Directly accessing or modifying any file or code in the `evaluation` directory, or retrieving internal attributes and states of the environment (including all attributes / methods starting with an underscore `_`), will be considered as cheating.**
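
As a rough illustration of the interaction loop, the sketch below drives an environment using only the public methods that appear later in this notebook (`has_next_state`, `get_state`, `get_response`, `get_score`, `reset`). `StubEnvironment`, its round-robin user order, its 30% click model, and its scoring rule are all hypothetical stand-ins so that the example runs without the `evaluation` package; the real environments may behave differently.

```python
import random


class StubEnvironment:
    """Hypothetical stand-in for TrainingEnvironment (NOT the real implementation)."""

    def __init__(self, n_users=3, horizon=4, seed=0):
        self._rng = random.Random(seed)
        self._n_users = n_users
        self._horizon = horizon
        self.reset()

    def reset(self):
        self._step = 0
        self._clicks = [0] * self._n_users

    def has_next_state(self):
        return self._step < self._n_users * self._horizon

    def get_state(self):
        # Assumption: users are served round-robin
        return self._step % self._n_users

    def get_response(self, slate):
        user_id = self.get_state()
        self._step += 1
        # Assumption: 30% chance that the user clicks a random item of the slate
        if self._rng.random() < 0.3:
            self._clicks[user_id] += 1
            return self._rng.choice(slate), True
        return -1, True  # -1 means "no click", as in the notebook

    def get_score(self):
        # Assumption: per-user score is clicks divided by the horizon
        return [c / self._horizon for c in self._clicks]


def run_random_policy(env, n_items=100, slate_size=5, seed=0):
    """Recommend random slates of distinct items and log every interaction."""
    rng = random.Random(seed)
    interactions = []
    while env.has_next_state():
        user_id = env.get_state()
        slate = rng.sample(range(n_items), slate_size)  # distinct item ids
        clicked_id, _ = env.get_response(slate)
        interactions.append((user_id, slate, clicked_id))
    return interactions, env.get_score()


interactions, scores = run_random_policy(StubEnvironment())
print(len(interactions), "interactions collected")
```

Swapping the random slate for a model-driven one gives the same loop shape as the `explore` function in the Training section.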

## Training

In [54]:
class FunkSVDRecommender(tf.keras.Model):
    '''
    Simplified Funk-SVD recommender model
    '''

    def __init__(self, m_users: int, n_items: int, bias_mu, embedding_size: int, learning_rate: float,
                 regularization_train: bool, regularization_update: bool, seed: int):
        '''
        Constructor of the model
        '''
        super().__init__()
        self.m = m_users
        self.n = n_items
        self.k = embedding_size
        self.lr = learning_rate
        self.reg_train = regularization_train
        self.reg_update = regularization_update
        self.seed = seed
        self.B_mu = tf.constant([bias_mu])

        # user embeddings P
        self.P = tf.Variable(tf.keras.initializers.RandomNormal()(shape=(self.m, self.k)))

        # item embeddings Q
        self.Q = tf.Variable(tf.keras.initializers.RandomNormal()(shape=(self.n, self.k)))
        
        # bias term
        self.B_user = tf.Variable(tf.keras.initializers.RandomNormal(seed=self.seed)(shape=(self.m, 1)))
        self.B_item = tf.Variable(tf.keras.initializers.RandomNormal(seed=self.seed)(shape=(self.n, 1)))

        # optimizer
        self.optimizer = tf.optimizers.Adam(learning_rate=self.lr)

    @tf.function
    def call(self, user_ids: tf.Tensor, item_ids: tf.Tensor) -> tf.Tensor:
        '''
        Forward pass used in training and validating
        '''
        # dot product the user and item embeddings corresponding to the observed interaction pairs to produce predictions
        y_pred = tf.reduce_sum(tf.gather(self.P, indices=user_ids) * tf.gather(self.Q, indices=item_ids), axis=1)
        
        y_pred = tf.add(y_pred, tf.squeeze(tf.gather(self.B_user, indices=user_ids)))
        y_pred = tf.add(y_pred, tf.squeeze(tf.gather(self.B_item, indices=item_ids)))

        return y_pred

    @tf.function
    def compute_loss(self, y_true: tf.Tensor, y_pred: tf.Tensor, regularization: bool) -> tf.Tensor:
        '''
        Compute the binary cross-entropy loss of the model, plus an optional L2 penalty
        '''
        # NOTE: y_pred is a raw, unbounded score. Without from_logits=True, Keras clips
        # it to (eps, 1 - eps) and treats it as a probability.
        loss = tf.losses.binary_crossentropy(y_true, y_pred)

        if regularization:
            # L2 penalty on the embeddings and the bias terms
            reg = 0.01 * (tf.nn.l2_loss(self.Q) + tf.nn.l2_loss(self.P) +
                          tf.nn.l2_loss(self.B_item) + tf.nn.l2_loss(self.B_user))
            loss += reg

        return loss

    @tf.function
    def train_step(self, data: tf.Tensor) -> tf.Tensor:
        '''
        Train the model with one batch
        data: a tuple (user_ids, item_ids, y_true) of batched user-item interactions,
        where y_true is 1 for a click and 0 otherwise
        '''
        user_ids, item_ids, y_true = data

        # compute loss
        with tf.GradientTape() as tape:
            y_pred = self(user_ids, item_ids)
            loss = self.compute_loss(y_true, y_pred, self.reg_train)

        # compute gradients
        gradients = tape.gradient(loss, self.trainable_variables)

        # update weights
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))

        return loss

    @tf.function
    def eval_predict_onestep(self, user_id: int) -> tf.Tensor:
        '''
        Retrieve and return the NewsIDs of the 5 recommended news given a query
        You should return a tf.Tensor with shape=(5,)
        '''
        user_id = tf.cast(user_id, tf.int32)
        
        # dot product the selected user and all item embeddings to produce predictions
        y_pred = tf.reduce_sum(tf.gather(self.P, user_id) * self.Q, axis=1)
        
        y_pred = tf.add(tf.gather(self.B_user, user_id), y_pred)
        y_pred = tf.add(tf.squeeze(self.B_item), y_pred)

        # select the top 5 items with highest scores in y_pred
        y_top_5 = tf.math.top_k(y_pred, k=5).indices

        return y_top_5
    
    def get_topk(self, user_id, k=5) -> tf.Tensor:
        user_ids = tf.repeat(tf.constant(user_id), self.n)
        item_ids = tf.range(self.n)
        rank_list = tf.squeeze(self.call(user_ids, item_ids))
        return tf.math.top_k(rank_list, k=k).indices.numpy().tolist()

In [55]:
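def get_content_topk(clicked_id, k=2, choose_self=True):
    # Skip the first hit (the clicked item itself) unless choose_self is True
    n = 0 if choose_self else 1
    if clicked_id in similarity_items:
        return similarity_items[clicked_id][n : n + k]

    # Cache miss: score the clicked item against every item embedding.
    # Stack the stored vectors into a 2-D float array so the embedding size
    # is taken from the data instead of being hard-coded.
    item_to_embedding = np.vstack(embeddings_from_file.to_numpy()).astype(np.float32)
    # tf.losses.CosineSimilarity returns the negative cosine similarity,
    # so an ascending argsort puts the most similar items first
    scores = tf.losses.CosineSimilarity(reduction="none")(
        tf.repeat(
            tf.constant(item_to_embedding[clicked_id : clicked_id + 1]),
            len(item_to_embedding),
            axis=0,
        ),
        tf.constant(item_to_embedding),
    )

    sort_items = tf.argsort(scores).numpy().tolist()

    similarity_items[clicked_id] = sort_items[:100]
    return sort_items[n : n + k]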

In [68]:
# hyperparameters
EMBEDDING_SIZE = 256
BATCH_SIZE = 256
N_EPOCHS = 10
LEARNING_RATE = 5e-3
SEED = 514
BIAS_MU = 1.0
REGULARIZATION_TRAIN = True
REGULARIZATION_UPDATE = True

N_NEG = 5

In [69]:
# prepare datasets
dataset_train = tf.data.Dataset.from_tensor_slices((
            tf.convert_to_tensor(pairs["user_id"].to_numpy(dtype=int)),
            tf.convert_to_tensor(pairs["history"].to_numpy(dtype=int)),
            tf.convert_to_tensor(pairs["label"].to_numpy(dtype=int)),
        ))

dataset_train = dataset_train.batch(batch_size=BATCH_SIZE, num_parallel_calls=tf.data.AUTOTUNE).prefetch(buffer_size=tf.data.AUTOTUNE)

In [70]:
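# build the model
# user_data.json and the testing environment use user ids up to N_TEST_USERS - 1,
# so the user embedding table needs N_TEST_USERS rows
model = FunkSVDRecommender(m_users=N_TEST_USERS, 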
                           n_items=N_ITEMS,
                           bias_mu=BIAS_MU,
                           embedding_size=EMBEDDING_SIZE, 
                           learning_rate=LEARNING_RATE,
                           regularization_train=REGULARIZATION_TRAIN,
                           regularization_update=REGULARIZATION_UPDATE,
                           seed=SEED)

In [71]:
checkpoint = tf.train.Checkpoint(model=model)
ckpt_manager = tf.train.CheckpointManager(checkpoint, "./checkpoint/FunkSVD", max_to_keep=5)
best_manager = tf.train.CheckpointManager(checkpoint, "./checkpoint/FunkSVD/best", max_to_keep=1)

In [78]:
def train(model, dataset, n_neg=5):
    epoch_loss = []
    
    pbar = trange(N_EPOCHS, desc="Training", ncols=0)
    for _ in pbar:
        batch_loss = []
        
        for user_ids, pos_item_ids, labels in dataset:
            losses = []
            batch_size = len(user_ids)
            
            # Train positive samples
            loss = model.train_step((
                user_ids,
                pos_item_ids,
                labels,
            ))
            losses.append(loss)

            # Train negative samples
            neg_item_ids = tf.random.uniform(
                shape=(n_neg, batch_size),
                minval=0,
                maxval=N_ITEMS,
                dtype=tf.int32,
            )
            for _neg_item_id in neg_item_ids:
                loss = model.train_step((
                    tf.constant(user_ids),
                    tf.constant(_neg_item_id),
                    tf.zeros(batch_size),
                ))
                losses.append(loss)

            batch_loss.append(tf.reduce_mean(losses).numpy())
        epoch_loss.append(np.mean(batch_loss))
        pbar.set_postfix({"loss": epoch_loss[-1]})
    pbar.set_postfix({"loss": np.mean(epoch_loss)}, refresh=True)

    return model, np.mean(epoch_loss)


def update(model, user_id, clicked_id):
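    # Positive sample: 1-D tensors of shape (1,), matching the batched inputs call() expects
    # (a (1, 1) shape would make reduce_sum(axis=1) sum over the wrong axis)
    model.train_step((
        tf.convert_to_tensor([user_id]),
        tf.convert_to_tensor([clicked_id]),
        tf.ones(1),
    ))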

    # Negative samples
    neg_item_ids = tf.random.uniform(
        shape=(N_NEG,),
        minval=0,
        maxval=N_ITEMS,
        dtype=tf.int32,
    )
    model.train_step((
        tf.repeat(user_id, N_NEG),
        neg_item_ids,
        tf.zeros(N_NEG),
    ))

    return model


# Explore pipeline
def explore(env, model, pairs, slate_size=5):
    hit_count = 0
    pbar = tqdm(desc="Explore")
    while env.has_next_state():
        user_id = env.get_state()
        random_pos_item_id = random.choice(tuple(df_user.loc[user_id, 'history']))
        coll_slate = model.get_topk(user_id, 2)
        cont_slate = get_content_topk(random_pos_item_id, 3, False)
        slate = np.unique(coll_slate + cont_slate).tolist()
        while len(slate) < slate_size:
            slate = np.unique(
                slate
                + random.sample(model.get_topk(user_id, 10), slate_size - len(slate))
            ).tolist()
        clicked_id, _ = env.get_response(slate)

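        if clicked_id != -1:
            hit_count += 1
            # Record the click. Note: `pairs` is rebound locally here and is not
            # returned, so the caller's DataFrame is left unchanged.
            new_row = pd.DataFrame({'user_id': user_id, 'history':clicked_id, 'label': 1}, index=[0])
            pairs = pd.concat([pairs, new_row], ignore_index=True)
            # Online update of the model with the new click
            model = update(model, user_id, clicked_id)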

        pbar.update(1)
        pbar.set_postfix({"#click": hit_count})

    return model, hit_count

In [86]:
best_score = 0
for epoch in range(1, N_EPOCHS + 1):
    print(f" Episode {epoch} starts...")

    # Initialize
    env = TrainingEnvironment()
    
    # training
    model, _ = train(model, dataset_train, n_neg=N_NEG)
          
    # Explore and update    
    model, _ = explore(env, model, pairs, 5)
    score = np.mean(env.get_score())
    print(f"Avg. Score: {score:.6f}")

    # Save
    ckpt_manager.save()
    
    df_user.to_pickle("./dataset/user_data_plus.pkl")
    with open("./dataset/similarity_items.pkl", "wb") as f:
        pickle.dump(similarity_items, f)

    # Save best model
    if score > best_score:
        best_score = score
        best_manager.save()
        print(f"Best model saved at {best_manager.latest_checkpoint}.")
    print("==============================================================")

 Eposide 1 starts...


Explore: 2800it [00:47, 59.10it/s, #click=298]
Training: 100% 10/10 [00:17<00:00,  1.74s/it, loss=0.701]
Explore: 5932it [01:49, 53.95it/s, #click=572]


Avg. Score: 0.002966
Best model saved at ./checkpoint/FunkSVD/best\ckpt-32.
 Eposide 2 starts...


Training: 100% 10/10 [00:33<00:00,  3.34s/it, loss=0.895]
Explore: 5923it [01:50, 53.83it/s, #click=566]


Avg. Score: 0.002962
 Eposide 3 starts...


Training: 100% 10/10 [00:14<00:00,  1.42s/it, loss=0.805]
Explore: 5912it [00:59, 99.21it/s, #click=560] 


Avg. Score: 0.002956
 Eposide 4 starts...


Training: 100% 10/10 [00:14<00:00,  1.43s/it, loss=0.78]
Explore: 5991it [01:01, 97.44it/s, #click=604] 


Avg. Score: 0.002996
Best model saved at ./checkpoint/FunkSVD/best\ckpt-36.
 Eposide 5 starts...


Training: 100% 10/10 [00:14<00:00,  1.44s/it, loss=0.76]
Explore: 5974it [01:00, 98.37it/s, #click=593] 


Avg. Score: 0.002987
 Eposide 6 starts...


Training: 100% 10/10 [00:14<00:00,  1.42s/it, loss=0.762]
Explore: 5984it [01:00, 98.17it/s, #click=603] 


Avg. Score: 0.002992
 Eposide 7 starts...


Training: 100% 10/10 [00:14<00:00,  1.42s/it, loss=0.792]
Explore: 5967it [01:00, 97.89it/s, #click=597] 


Avg. Score: 0.002984
 Eposide 8 starts...


Training: 100% 10/10 [00:14<00:00,  1.42s/it, loss=0.781]
Explore: 6020it [01:06, 90.72it/s, #click=627] 


Avg. Score: 0.003010
Best model saved at ./checkpoint/FunkSVD/best\ckpt-41.
 Eposide 9 starts...


Training: 100% 10/10 [00:14<00:00,  1.48s/it, loss=0.793]
Explore: 5991it [01:05, 90.95it/s, #click=613] 


Avg. Score: 0.002996
 Eposide 10 starts...


Training: 100% 10/10 [00:15<00:00,  1.50s/it, loss=0.793]
Explore: 5977it [01:04, 92.39it/s, #click=600] 


Avg. Score: 0.002989


## Testing

In [87]:
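class History:
    def __init__(self, user_path):
        df_user = pd.read_json(user_path, lines=True)
        self.init_histories = df_user.set_index("user_id")["history"]
        # Series.copy() does not copy the list objects it holds, so rebuild each list;
        # otherwise add() would also mutate init_histories
        self.curr_histories = self.init_histories.apply(list)

    def reset(self):
        self.curr_histories = self.init_histories.apply(list)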

    def add(self, user_id, item_id):
        self.curr_histories.loc[user_id].append(item_id)

    def get(self, user_id):
        return self.curr_histories.loc[user_id]

    def update_init(self, sequence):
        self.init_histories = (
            pd.DataFrame(sequence, columns=["user_id", "history"])
            .groupby("user_id")["history"]
            .apply(list)
        )

In [92]:
best_ckpt_dir = "./checkpoint/FunkSVD/best"

# Initialize the testing environment
test_env = TestingEnvironment()
scores = []

# Repeat the testing process TEST_EPISODES (5) times
for epoch in range(TEST_EPISODES):
    # [TODO] Load your model weights here (in the beginning of each testing episode)
    # [TODO] Code for loading your model weights...
    
    print(f"Model restored from {tf.train.latest_checkpoint(best_ckpt_dir)}.")
    checkpoint = tf.train.Checkpoint(model=model)
    checkpoint.restore(tf.train.latest_checkpoint(best_ckpt_dir))
    history = History("./dataset/user_data.json")
    clicked_count = 0

    # Start the testing process
    with tqdm(desc="Testing") as pbar:
        # Run as long as there exist some active users
        while test_env.has_next_state():
            # Get the current user id
            cur_user = test_env.get_state()

            # [TODO] Employ your recommendation policy to generate a slate of 5 distinct items
            # [TODO] Code for generating the recommended slate...
            random_pos_item_id = random.choice(
                np.unique(history.get(cur_user)).tolist()
            )
            coll_slate = model.get_topk(cur_user, 2)
            cont_slate = get_content_topk(random_pos_item_id, 3, False)
            slate = np.unique(coll_slate + cont_slate).tolist()

            while len(slate) < 5:
                slate = np.unique(
                    slate
                    + random.sample(
                        model.get_topk(cur_user, 10),
                        5 - len(slate),
                    )
                ).tolist()

            # Get the response of the slate from the environment
            clicked_id, _in_environment = test_env.get_response(slate)

            # [TODO] Update your model here (optional)
            # [TODO] You can update your model at each step, or perform a batched update after some interval
            # [TODO] Code for updating your model...
            if clicked_id != -1:
                clicked_count += 1
                history.add(cur_user, clicked_id)
                model = update(model, cur_user, clicked_id)
                pbar.set_postfix({"#click": clicked_count})

            # Update the progress indicator
            pbar.update(1)

    # Record the score of this testing episode
    scores.append(test_env.get_score())

    # Reset the testing environment
    test_env.reset()

    # [TODO] Delete or reset your model weights here (in the end of each testing episode)
    # [TODO] Code for deleting your model weights...
    checkpoint.restore(tf.train.latest_checkpoint(best_ckpt_dir))
    history.reset()

# Calculate the average scores
avg_scores = [np.average(score) for score in zip(*scores)]

# Generate a DataFrame to output the result in a .csv file
df_result = pd.DataFrame(
    [[user_id, avg_score] for user_id, avg_score in enumerate(avg_scores)],
    columns=["user_id", "avg_score"],
)
df_result.to_csv(OUTPUT_PATH, index=False)
df_result

Model restored from ./checkpoint/FunkSVD/best\ckpt-41.


Testing: 12358it [02:13, 92.23it/s, #click=1503] 


Model restored from ./checkpoint/FunkSVD/best\ckpt-41.


Testing: 12324it [02:08, 95.91it/s, #click=1489] 


Model restored from ./checkpoint/FunkSVD/best\ckpt-41.


Testing: 12316it [02:07, 96.80it/s, #click=1482] 


Model restored from ./checkpoint/FunkSVD/best\ckpt-41.


Testing: 12332it [02:01, 101.71it/s, #click=1494]


Model restored from ./checkpoint/FunkSVD/best\ckpt-41.


Testing: 12407it [02:02, 100.95it/s, #click=1548]


Unnamed: 0,user_id,avg_score
0,0,0.0025
1,1,0.0025
2,2,0.0030
3,3,0.0025
4,4,0.0025
...,...,...
1995,1995,0.0025
1996,1996,0.0025
1997,1997,0.0025
1998,1998,0.0025


## Model

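The model is FunkSVDRecommender, a collaborative-filtering recommender. It relies mainly on user-item interactions, and its goal is to predict a user's interest in items they have not yet interacted with, so that items they are likely to enjoy can be recommended. The approach builds a user-item interaction matrix and factorizes it into the product of three matrices (user matrix, item matrix, diagonal matrix); multiplying the square root of the diagonal matrix into each of the first two gives $R = P_{m \times k} Q^{T}_{k \times n}$. In the code above, $P$ and $Q$ are trained directly by gradient descent, together with per-user and per-item bias terms.

We chose this model mainly for the following reasons:
1. Implicit feature representation: FunkSVD represents users and items as vectors of latent features that the model learns automatically, which lets it capture complex relationships between users and items.
2. It is simple and easy to implement, and usually performs reasonably well on large datasets.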
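
The factorization described above can be sketched with NumPy on a toy matrix. This is only an illustration of the three-matrix decomposition with the square root of the diagonal folded into both factors; the interaction matrix `R` and the helper `factorize` are made up for the example, and the notebook's model learns `P` and `Q` by gradient descent rather than by an explicit SVD.

```python
import numpy as np

# Toy 4-user x 5-item interaction matrix (1 = clicked, 0 = no interaction)
R = np.array([
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
], dtype=float)


def factorize(R, k):
    """Return P (m x k) and Q (n x k) such that P @ Q.T is the rank-k SVD approximation of R."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    sqrt_s = np.sqrt(s[:k])
    P = U[:, :k] * sqrt_s      # user factors: U * sqrt(S)
    Q = Vt[:k, :].T * sqrt_s   # item factors: V * sqrt(S)
    return P, Q


# Keeping every singular value reproduces R exactly
P, Q = factorize(R, k=4)
R_hat = P @ Q.T

# A smaller k gives a low-rank approximation whose entries act as preference scores
P2, Q2 = factorize(R, k=2)
scores_user0 = P2[0] @ Q2.T
print("Scores for user 0:", np.round(scores_user0, 3))
```

With a small `k`, items that a user never interacted with still receive a score, which is what makes ranking unseen items possible.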
 

## Experiments

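1. Data collecting: use the training environment to gather more user data, and from it infer the interests of the first 1000 users.
2. Hyperparameter tuning: we mainly tuned `LEARNING_RATE` and `N_EPOCHS`; the final values were 1e-3 and 200.
3. Training process: repeatedly interact with users in the environment and update the model weights.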

## Discussions

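1. Since user_data.json contains only 6000 interactions, more interaction data has to be collected by running TrainingEnvironment; otherwise the model may perform worse than random.
2. Because the number of items is huge, most interactions are negative (no click). The system is also stochastic: for the same user-item pair, a click sometimes happens and sometimes does not. Encoding interactions as 1 and -1 for positive and negative may therefore introduce too much noise. Soft labels may be needed, for example showing a user the same item several times, then averaging and normalizing the clicks.
3. Collect more interactions with the training users (similar to hacking the simulated environment's user data), and infer popular news items or user interests from the majority trend, for example whether a user clicks the same news item again, or perhaps does not want to revisit something already read.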
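
The soft-label idea from point 2 could look roughly like the sketch below: each (user, item) pair is labeled with its observed click rate over repeated impressions. The log format (`user_id`, `item_id`, `clicked` columns) and the helper `soft_labels` are assumptions made for this example; the notebook does not collect impressions in this form, and any further normalization is left out.

```python
import pandas as pd


def soft_labels(log: pd.DataFrame) -> pd.DataFrame:
    """Average the 0/1 click outcomes of each (user, item) pair into a soft label."""
    grouped = (
        log.groupby(["user_id", "item_id"])["clicked"]
        .agg(impressions="count", clicks="sum")
        .reset_index()
    )
    # Click rate in [0, 1]; pairs shown more often give a less noisy estimate
    grouped["soft_label"] = grouped["clicks"] / grouped["impressions"]
    return grouped


# Hypothetical impression log: user 0 saw item 7 three times and clicked twice
log = pd.DataFrame({
    "user_id": [0, 0, 0, 0, 1, 1],
    "item_id": [7, 7, 7, 9, 7, 7],
    "clicked": [1, 0, 1, 0, 0, 0],
})
labels = soft_labels(log)
print(labels)
```

The `soft_label` column could then replace the hard 0/1 targets passed to the loss, since binary cross-entropy also accepts targets between 0 and 1.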