# DataLab Cup 4: Recommender Systems

## Overview
In this competition, your goal is to design a recommender system that suggests news articles to users. The performance of your recommender system will be assessed using a simulation environment.

At each timestep, the simulation environment randomly selects an active user with a given user_id. Once you receive this user_id, your recommender system must generate a slate (a list of 5 distinct item_ids to recommend to the current user) and pass it to the environment. The environment then uses its internal information to determine which item the user will choose from the recommended list (with some degree of stochasticity) or decide not to choose any item due to a lack of interest.
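The slate constraint above (5 distinct item_ids) can be enforced with a small wrapper around whatever policy you use. The following is only a minimal sketch: `make_slate` is a hypothetical helper, not part of the provided environment, and it assumes valid item_ids run from 0 to N_ITEMS - 1. It drops duplicates and out-of-range IDs from a ranked candidate list and pads with random items when too few candidates remain.

```python
import random

N_ITEMS = 209527  # size of the item pool (official hyperparameter)
SLATE_SIZE = 5    # number of items per slate (official hyperparameter)


def make_slate(ranked_candidates, n_items=N_ITEMS, slate_size=SLATE_SIZE, rng=random):
    """Hypothetical helper: turn a ranked candidate list into a valid slate."""
    slate = []
    seen = set()
    for item_id in ranked_candidates:
        # Keep only in-range items that are not already in the slate
        if 0 <= item_id < n_items and item_id not in seen:
            slate.append(item_id)
            seen.add(item_id)
        if len(slate) == slate_size:
            return slate
    # Pad with random unseen items if the policy produced too few candidates
    while len(slate) < slate_size:
        item_id = rng.randrange(n_items)
        if item_id not in seen:
            slate.append(item_id)
            seen.add(item_id)
    return slate


slate = make_slate([7, 7, 42, -3, 13])
print(slate)  # starts with 7, 42, 13, followed by two random distinct items
```

Keeping the policy's ranking order and padding only at the end means a weak candidate list still produces a legal slate instead of an error.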

Each user has a latent patience value (invisible to your recommender system) that increases slightly in each round where an item is chosen and decreases drastically in each round where no item is chosen. If a user's patience drops below 0 or the user runs out of the time budget (2000 timesteps), the user leaves the environment. Recommending a slate returns two values: the chosen item_id (or -1 if no item is chosen) and whether the current user stays (True) or leaves (False). Once the current user's response has been generated, the environment randomly selects an active user (if any remain) for the next timestep.

Your recommender system should continue recommending items to the current user at each timestep as long as there are still active users in the environment. The simulation process terminates after all users have left the system.

Your goal is to maximize the session length of each user, defined as the number of timesteps the user interacts with your recommender system before leaving the environment. After the simulation completes, the environment provides the session length score of each user, normalized to the range [0, 1].
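The exact normalization is not spelled out in this notebook. One plausible reading, stated here purely as an assumption, is that the score is the session length divided by the 2000-timestep budget. The sketch below shows that reading; `normalized_session_length` is a hypothetical function, not the environment's implementation.

```python
HORIZON = 2000  # per-user time budget (official hyperparameter)


def normalized_session_length(session_length, horizon=HORIZON):
    """Assumed normalization: fraction of the time budget the user stayed."""
    return min(session_length, horizon) / horizon


print(normalized_session_length(5))
print(normalized_session_length(2000))
```

Under this assumption, a user who leaves after 5 timesteps scores 0.0025, which matches the order of magnitude of the random recommender's scores reported in the Testing section.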

In [1]:
import os
import random

import numpy as np
import pandas as pd
from tqdm import tqdm

from evaluation.environment import TrainingEnvironment, TestingEnvironment

In [2]:
# Official hyperparameters for this competition (do not modify)
N_TRAIN_USERS = 1000
N_TEST_USERS = 2000
N_ITEMS = 209527
HORIZON = 2000
TEST_EPISODES = 5
SLATE_SIZE = 5

## Datasets
In this competition, we won't provide a substantial user-item interaction dataset. Instead, limited information (3 items per user) on historical interactions will be available. To train your recommender system effectively, you need to employ a recommender policy to interact with the training environment and collect additional interaction data.

We will introduce the side-information datasets provided in the following sections.

In [3]:
# Dataset paths
USER_DATA = os.path.join('dataset', 'user_data.json')
ITEM_DATA = os.path.join('dataset', 'item_data.json')

# Output file path
OUTPUT_PATH = os.path.join('output', 'output.csv')

## User Data
In the training environment, there are a total of 1000 users identified by IDs ranging from 0 to 999. For the testing environment, there are 2000 users with IDs ranging from 0 to 1999. The testing environment includes the same 1000 users found in the training environment (user 0 to user 999), and an additional 1000 new users (user 1000 to user 1999) are introduced.

For all 2000 users, we provide you with the past 3 clicked item IDs of each user. Let's examine the user dataset.
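As a minimal sketch of how this file might be consumed without pandas, the snippet below parses a few JSON Lines records into a lookup table and separates users shared with training from test-only users. The record shape (`user_id`, `history`) is assumed from the columns shown in the DataFrame, and `sample_lines` is an illustrative stand-in for the real file contents.

```python
import json

N_TRAIN_USERS = 1000  # official hyperparameter

# Illustrative records in the JSON Lines shape assumed for user_data.json
sample_lines = """\
{"user_id": 0, "history": [42558, 65272, 13353]}
{"user_id": 1, "history": [146057, 195688, 143652]}
{"user_id": 1995, "history": [95090, 131393, 130239]}
"""

# Map each user_id to its list of previously clicked item IDs
user_history = {}
for line in sample_lines.splitlines():
    record = json.loads(line)
    user_history[record["user_id"]] = record["history"]

# Users below N_TRAIN_USERS also appear in training; the rest are test-only
train_users = sorted(u for u in user_history if u < N_TRAIN_USERS)
test_only_users = sorted(u for u in user_history if u >= N_TRAIN_USERS)
print(train_users)      # [0, 1]
print(test_only_users)  # [1995]
```

For test-only users, these 3 clicked items are the only signal available before the first recommendation, so a lookup like this is a natural starting point for cold-start handling.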

In [4]:
df_user = pd.read_json(USER_DATA, lines=True)
df_user

| user_id | history |
|---|---|
| 0 | [42558, 65272, 13353] |
| 1 | [146057, 195688, 143652] |
| 2 | [67551, 85247, 33714] |
| 3 | [116097, 192703, 103229] |
| 4 | [68756, 140123, 135289] |
| ... | ... |
| 1995 | [95090, 131393, 130239] |
| 1996 | [2360, 147130, 8145] |
| 1997 | [99794, 138694, 157888] |
| 1998 | [55561, 60372, 51442] |


## Item Data
The training and testing environments share a common candidate pool of 209527 items. As side information, we provide a text description (headline and short description) for each news article. The item dataset is derived from the News Category Dataset. Note that you may only use the dataset we provide: using the original dataset, which contains extra information, will be considered cheating. Let's explore the item dataset.
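One way to use the text side information is to rank items by how similar their text is to items a user has already clicked. The sketch below uses plain Jaccard similarity over word tokens. It illustrates the idea only and is not a recommended or official method; the headlines are made up for the example, and `tokenize`, `jaccard`, and `rank_by_history` are hypothetical helpers.

```python
import re

# Made-up headlines for illustration only (not taken from item_data.json)
items = {
    0: "Funniest tweets about cats and dogs this week",
    1: "Funniest tweets from parents this week",
    2: "Linebacker arrested after traffic stop",
    3: "Tennis star stunned in straight sets",
}


def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_history(history, items, top_k=2):
    """Rank unseen items by their best similarity to any history item."""
    tokens = {item_id: tokenize(text) for item_id, text in items.items()}
    scores = {}
    for item_id in items:
        if item_id in history:
            continue
        # default=0.0 covers users with an empty history
        scores[item_id] = max(
            (jaccard(tokens[item_id], tokens[h]) for h in history), default=0.0
        )
    # Highest similarity first; ties are broken by the smaller item_id
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_k]


print(rank_by_history(history=[0], items=items))  # [1, 2]
```

On the full pool of 209527 items, a brute-force scan like this may be too slow to run at every timestep, so precomputed embeddings or an index would likely be needed in practice.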

In [5]:
df_item = pd.read_json(ITEM_DATA, lines=True)
df_item

| item_id | headline | short_description |
|---|---|---|
| 0 | Over 4 Million Americans Roll Up Sleeves For O... | Health experts said it is too early to predict... |
| 1 | American Airlines Flyer Charged, Banned For Li... | He was subdued by passengers and crew when he ... |
| 2 | 23 Of The Funniest Tweets About Cats And Dogs ... | "Until you have a dog you don't understand wha... |
| 3 | The Funniest Tweets From Parents This Week (Se... | "Accidentally put grown-up toothpaste on my to... |
| 4 | Woman Who Called Cops On Black Bird-Watcher Lo... | Amy Cooper accused investment firm Franklin Te... |
| ... | ... | ... |
| 209522 | RIM CEO Thorsten Heins' 'Significant' Plans Fo... | Verizon Wireless and AT&T are already promotin... |
| 209523 | Maria Sharapova Stunned By Victoria Azarenka I... | Afterward, Azarenka, more effusive with the pr... |
| 209524 | Giants Over Patriots, Jets Over Colts Among M... | Leading up to Super Bowl XLVI, the most talked... |
| 209525 | Aldon Smith Arrested: 49ers Linebacker Busted ... | CORRECTION: An earlier version of this story i... |


## Simulation Environments
We offer two simulation environments in this competition: TrainingEnvironment and TestingEnvironment. The only distinction between the two environments is the number of users, with 1000 for training and 2000 for testing. All public methods for both environments behave the same since they share the same base class.

Important Note: Collect interaction data only by accessing the environment through the designated public methods listed below. Directly accessing or modifying any file or code in the evaluation directory, or retrieving internal attributes and states of the environment (including all attributes and methods whose names start with an underscore _), will be considered cheating.

## Training
The implementation of the recommender algorithm is left to you. If you're in need of ideas, you can refer to the Recommender Systems Tutorial notebook in Lecture 16. Here, we'll just provide some example use cases of the public methods.

Hint: If you're looking for inspiration, consider starting by collecting interaction data from the environment using your initial recommender policy. Afterward, improve your model with this data, and iterate through this collect-then-train loop.

Important Note: Ensure that you save your model weights after training. You will need to load a set of model weights trained exclusively on the training environment at the beginning of each test episode.
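The collect step of the collect-then-train loop in the hint can be sketched as a function that logs every interaction of one episode. To keep the example runnable on its own, it uses `StubEnvironment`, a hypothetical stand-in that only mimics the `has_next_state`, `get_state`, and `get_response` methods; its user behavior is invented and unrelated to the real simulation. `collect_interactions` and `random_policy` are illustrative names as well.

```python
import random

N_ITEMS = 209527  # official hyperparameter
SLATE_SIZE = 5    # official hyperparameter


class StubEnvironment:
    """Hypothetical stand-in, NOT the competition environment.

    Every user leaves after `steps_per_user` rounds and clicks the first
    slate item about half of the time.
    """

    def __init__(self, n_users=3, steps_per_user=2, seed=0):
        self.rng = random.Random(seed)
        self.remaining = {user: steps_per_user for user in range(n_users)}
        self.current = self.pick_user()

    def pick_user(self):
        if not self.remaining:
            return None
        return self.rng.choice(sorted(self.remaining))

    def has_next_state(self):
        return bool(self.remaining)

    def get_state(self):
        return self.current

    def get_response(self, slate):
        user = self.current
        clicked_id = slate[0] if self.rng.random() < 0.5 else -1
        self.remaining[user] -= 1
        stays = self.remaining[user] > 0
        if not stays:
            del self.remaining[user]
        self.current = self.pick_user()
        return clicked_id, stays


def collect_interactions(env, policy):
    """Run one episode and log (user_id, slate, clicked_id, stays) tuples."""
    log = []
    while env.has_next_state():
        user_id = env.get_state()
        slate = policy(user_id)
        clicked_id, stays = env.get_response(slate)
        log.append((user_id, slate, clicked_id, stays))
    return log


def random_policy(user_id):
    """Initial policy: a uniformly random slate of distinct items."""
    return random.sample(range(N_ITEMS), k=SLATE_SIZE)


log = collect_interactions(StubEnvironment(), random_policy)
print(len(log))  # 6 interactions: 3 stub users x 2 steps each
```

With the real `TrainingEnvironment`, the same function could be called once per episode (calling `reset()` in between), with the returned log used to retrain the model before the next round of collection.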


In [6]:
# Initialize the training environment
train_env = TrainingEnvironment()

# Reset the training environment (this can be useful when you have finished one episode of simulation and do not want to re-initialize a new environment)
train_env.reset()

# Check if there exist any active users in the environment
env_has_next_state = train_env.has_next_state()
print(f'There is {"still some" if env_has_next_state else "no"} active users in the training environment.')

# Get the current user ID
user_id = train_env.get_state()
print(f'The current user is user {user_id}.')

# Get the response of recommending the slate to the current user
slate = [0, 1, 2, 3, 4]
clicked_id, in_environment = train_env.get_response(slate)
print(f'The click result of recommending {slate} to user {user_id} is {f"item {clicked_id}" if clicked_id != -1 else f"{clicked_id} (no click)"}.')
print(f'User {user_id} {"is still in" if in_environment else "leaves"} the environment.')

# Get the normalized session length score of all users
train_score = train_env.get_score()
df_train_score = pd.DataFrame([[user_id, score] for user_id, score in enumerate(train_score)], columns=['user_id', 'avg_score'])
df_train_score

There is still some active users in the training environment.
The current user is user 389.
The click result of recommending [0, 1, 2, 3, 4] to user 389 is -1 (no click).
User 389 is still in the environment.


| user_id | avg_score |
|---|---|
| 0 | 0.0 |
| 1 | 0.0 |
| 2 | 0.0 |
| 3 | 0.0 |
| 4 | 0.0 |
| ... | ... |
| 995 | 0.0 |
| 996 | 0.0 |
| 997 | 0.0 |
| 998 | 0.0 |


## Testing
While testing, you are allowed to update your model. However, please adhere to the following rules:

1. Follow the testing template provided below. Modify only the sections marked as [TODO]. Additionally, please carefully follow the instructions specified in each [TODO] section. Modifying other sections or not adhering to the instructions is strictly forbidden.

2. Limit model updates to one testing episode. During testing-time updates, follow these steps: (a) Load your model weights trained exclusively on the training environment. (b) Run the testing environment and update your model with the collected data during the testing process. (c) Obtain the score for this testing episode and delete your model weights since they now contain some testing information. You should not save the model weights trained on the testing environment for another testing episode. Doing so will be regarded as cheating.

3. Due to the randomness in the user decision process, run the testing process 5 times and calculate the average session length for each user as the final score. This part has been covered for you.
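The load, update, and delete cycle in rule 2 can be sketched with a toy model. Everything here is hypothetical: `TRAINED_WEIGHTS` stands in for weights saved after training (a real model would load them from a file), `ClickCountModel` is a placeholder for your own model, and the click sequence is made up.

```python
import copy

# Stand-in for weights trained only on the training environment
# (hypothetical: a real model would load them from a saved file)
TRAINED_WEIGHTS = {"click_counts": {10: 3, 20: 1}}


class ClickCountModel:
    """Toy model whose 'weights' are per-item click counts."""

    def __init__(self, weights):
        self.weights = weights

    def update(self, clicked_id):
        """Test-time update: count a click, ignoring -1 (no click)."""
        if clicked_id != -1:
            counts = self.weights["click_counts"]
            counts[clicked_id] = counts.get(clicked_id, 0) + 1


for episode in range(2):
    # (a) Start every episode from a fresh copy of the training-only weights
    model = ClickCountModel(copy.deepcopy(TRAINED_WEIGHTS))
    # (b) Update with responses observed during this episode (made-up clicks)
    for clicked_id in [10, -1, 30]:
        model.update(clicked_id)
    print(episode, model.weights["click_counts"])
    # (c) Discard the updated weights so nothing leaks into the next episode
    del model

print(TRAINED_WEIGHTS["click_counts"])  # unchanged: {10: 3, 20: 1}
```

Copying the trained weights at the start of each episode, instead of mutating them in place, is what keeps information from one testing episode out of the next.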

After completing the testing process, remember to submit the generated output.csv file to the Kaggle competition.

We will illustrate the testing process with a pure random recommender below.


In [7]:
# Initialize the testing environment
test_env = TestingEnvironment()
scores = []

# item_ids is only needed by the random recommender
item_ids = list(range(N_ITEMS))

# Repeat the testing process 5 times
for _ in range(TEST_EPISODES):
    # [TODO] Load your model weights here (in the beginning of each testing episode)
    # [TODO] Code for loading your model weights...

    # Start the testing process
    with tqdm(desc='Testing') as pbar:
        # Run as long as there exist some active users
        while test_env.has_next_state():
            # Get the current user id
            cur_user = test_env.get_state()

            # [TODO] Employ your recommendation policy to generate a slate of 5 distinct items
            # [TODO] Code for generating the recommended slate...
            # Here we provide a simple random implementation
            slate = random.sample(item_ids, k=SLATE_SIZE)

            # Get the response of the slate from the environment
            clicked_id, in_environment = test_env.get_response(slate)

            # [TODO] Update your model here (optional)
            # [TODO] You can update your model at each step, or perform a batched update after some interval
            # [TODO] Code for updating your model...

            # Update the progress indicator
            pbar.update(1)

    # Record the score of this testing episode
    scores.append(test_env.get_score())

    # Reset the testing environment
    test_env.reset()

    # [TODO] Delete or reset your model weights here (in the end of each testing episode)
    # [TODO] Code for deleting your model weights...

# Calculate each user's average score across the testing episodes
avg_scores = [np.average(score) for score in zip(*scores)]

# Generate a DataFrame to output the result in a .csv file
df_result = pd.DataFrame([[user_id, avg_score] for user_id, avg_score in enumerate(avg_scores)], columns=['user_id', 'avg_score'])
df_result.to_csv(OUTPUT_PATH, index=False)
df_result

Testing: 10223it [00:10, 962.43it/s]
Testing: 10247it [00:10, 991.02it/s]
Testing: 10232it [00:10, 977.50it/s]
Testing: 10293it [00:10, 976.84it/s]
Testing: 10247it [00:10, 983.85it/s]


| user_id | avg_score |
|---|---|
| 0 | 0.0025 |
| 1 | 0.0027 |
| 2 | 0.0025 |
| 3 | 0.0025 |
| 4 | 0.0025 |
| ... | ... |
| 1995 | 0.0025 |
| 1996 | 0.0025 |
| 1997 | 0.0025 |
| 1998 | 0.0025 |
