In this notebook, we compare two types of user embeddings on a mean rating prediction task: statistical embeddings, which are based on users' genre preferences derived from anime descriptions, and PTUM embeddings learned through the PTUM model. We use XGBoost as the regression model and evaluate the quality of the predictions on the validation dataset using the median absolute error (MedAE).

In [None]:
!git clone https://github.com/horacemtb/Anime-recommender-engine.git

Cloning into 'Anime-recommender-engine'...
remote: Enumerating objects: 128, done.
remote: Counting objects: 100% (107/107), done.
remote: Compressing objects: 100% (85/85), done.
remote: Total 128 (delta 38), reused 75 (delta 17), pack-reused 21 (from 1)
Receiving objects: 100% (128/128), 75.11 MiB | 17.90 MiB/s, done.
Resolving deltas: 100% (40/40), done.
Updating files: 100% (41/41), done.


In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
import pandas as pd
import numpy as np

import xgboost as xgb

from sklearn.model_selection import train_test_split
from sklearn.metrics import median_absolute_error

from tqdm import tqdm
import pickle

In [None]:
pd.set_option('display.max_columns', None)

Load datasets and generate mean ratings as the target variable. Write functions to train an XGBoost regressor for predicting mean ratings and implement a baseline estimation method using simple median prediction for comparison.

In [None]:
ptum_user_emb_df = pd.read_csv('/content/user_embeddings.csv')
holdout_data = pd.read_csv('/content/holdout_data.csv')

user_emb_df = pd.read_csv('/content/Anime-recommender-engine/app/data/user-preferences.csv')
user_emb_df = user_emb_df[user_emb_df['user_id'].isin(ptum_user_emb_df['user_id'].unique())].reset_index(drop=True)
user_emb_df.drop('mean_rating', axis = 1, inplace = True)

In [None]:
user_mean_rating = holdout_data.groupby('user_id')['rating'].mean().to_dict()

In [None]:
ptum_user_emb_df['rating'] = ptum_user_emb_df['user_id'].map(user_mean_rating)
user_emb_df['rating'] = user_emb_df['user_id'].map(user_mean_rating)

In [None]:
np.all(ptum_user_emb_df['user_id'].unique() == user_emb_df['user_id'].unique())

True

In [None]:
def train_xgboost(df, params):
  """Train an XGBoost regressor on an 80/20 split and return the validation MedAE."""

  train_df, val_df = train_test_split(df, test_size = 0.2, random_state = 789)

  # Drop the ID and target columns so they are not used as features
  xgb_train = xgb.DMatrix(train_df.drop(['user_id', 'rating'], axis=1), label=train_df['rating'])
  xgb_val = xgb.DMatrix(val_df.drop(['user_id', 'rating'], axis=1), label=val_df['rating'])

  # Stop once the validation metric has not improved for 20 rounds
  xgb_regressor = xgb.train(params,
                            dtrain=xgb_train,
                            num_boost_round=2000,
                            evals=[(xgb_val, 'validation')],
                            early_stopping_rounds=20,
                            verbose_eval = 50
                           )

  val_preds = xgb_regressor.predict(xgb_val)

  return round(median_absolute_error(val_df['rating'], val_preds), 4)

In [None]:
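def estimate_baseline(df):
  """Predict a single constant for every validation user and return the MedAE."""

  train_df, val_df = train_test_split(df, test_size = 0.2, random_state = 789)

  # Note: the constant is the median of the validation ratings themselves,
  # so this baseline uses validation data to build its prediction.
  val_preds = [val_df['rating'].median()] * len(val_df)

  return round(median_absolute_error(val_df['rating'], val_preds), 4)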
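
The baseline above takes its constant from the validation split. A variant that derives the median from the training split only could look roughly like the sketch below. It is dependency-free on purpose: `medae`, `split_ratings`, `train_median_baseline`, and the toy ratings are hypothetical stand-ins for `median_absolute_error`, `train_test_split`, and the notebook's data, so the resulting number says nothing about the real datasets.

```python
import random
from statistics import median


def medae(y_true, y_pred):
    """Median of |y_true - y_pred| (stand-in for sklearn's median_absolute_error)."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


def split_ratings(ratings, val_fraction=0.2, seed=789):
    """Shuffle and split a list of ratings (hypothetical stand-in for train_test_split)."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    # First n_val shuffled items form the validation split, the rest the training split
    return shuffled[n_val:], shuffled[:n_val]


def train_median_baseline(ratings):
    """Predict the training-split median for every validation rating."""
    train, val = split_ratings(ratings)
    constant = median(train)  # computed without looking at the validation split
    return medae(val, [constant] * len(val))


# Made-up mean ratings, not the notebook's data
toy_ratings = [6.5, 7.0, 7.2, 7.8, 8.0, 8.1, 8.4, 8.9, 9.0, 9.5]
baseline_medae = train_median_baseline(toy_ratings)
print(baseline_medae)
```

In the notebook itself, the equivalent change would be to replace `val_df['rating'].median()` with `train_df['rating'].median()` inside `estimate_baseline`; the reported baseline value would then need to be recomputed.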

In [None]:
params = {'objective': 'reg:squarederror',
          'max_depth': 8,
          'min_child_weight': 10,
          'learning_rate': 0.01,
          'colsample_bytree': 0.6,
          'subsample': 0.8,
          'random_state': 42,
          'n_jobs': 8
         }

In [None]:
print(f"MedAE on validation data with PTUM embeddings: {train_xgboost(ptum_user_emb_df, params)}")
print(f"MedAE on validation data with regular embeddings: {train_xgboost(user_emb_df, params)}")
print(f"MedAE on validation data using baseline: {estimate_baseline(user_emb_df)}")

[0]	validation-rmse:0.81526
[50]	validation-rmse:0.79089
[100]	validation-rmse:0.78312
[136]	validation-rmse:0.78276
MedAE on validation data with PTUM embeddings: 0.4891
[0]	validation-rmse:0.81483
[50]	validation-rmse:0.77444
[100]	validation-rmse:0.75715
[150]	validation-rmse:0.74795
[200]	validation-rmse:0.74281
[250]	validation-rmse:0.74235
[256]	validation-rmse:0.74203
MedAE on validation data with regular embeddings: 0.5034
MedAE on validation data using baseline: 0.5075


On the validation data, PTUM-based user embeddings achieved the lowest median absolute error (MedAE), 0.4891, ahead of the statistical embeddings (0.5034) and the median-prediction baseline (0.5075). The margins are small: PTUM improves on the baseline by about 0.018 and the statistical embeddings by about 0.004. The ranking also depends on the metric. The training logs report validation RMSE, and there the statistical embeddings come out ahead (0.74203 versus 0.78276 for PTUM), which suggests that PTUM predictions are closer for the typical user but miss more badly on some users.
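
The training logs report RMSE while the final comparison uses MedAE, and the two metrics can rank models differently because RMSE is dominated by the largest errors while MedAE ignores them. The toy sketch below illustrates the effect; the residuals are made up for two hypothetical models and are not taken from the notebook's predictions.

```python
import math
from statistics import median


def rmse(errors):
    """Root mean squared error of a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def medae(errors):
    """Median absolute error of a list of residuals."""
    return median(abs(e) for e in errors)


# Made-up residuals for two hypothetical models
model_a_errors = [0.1, -0.1, 0.2, -0.2, 3.0]  # mostly tight, one large miss
model_b_errors = [0.6, -0.6, 0.7, -0.7, 0.8]  # uniformly mediocre

print("model A:", medae(model_a_errors), round(rmse(model_a_errors), 3))
print("model B:", medae(model_b_errors), round(rmse(model_b_errors), 3))
```

Model A has the lower MedAE and model B the lower RMSE, so which model looks better depends on whether occasional large misses matter for the application.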

Notably, the PTUM embeddings did not rely on any anime descriptions, whereas the statistical embeddings were built explicitly from that information. PTUM reached a comparable, and on MedAE slightly better, result from interaction data alone, which points to its potential for learning meaningful representations of user behavior.