## NRMS Model with Trivial Discounting
### Course: *02456 - Deep Learning*  
**Technical University of Denmark (DTU)**  
---

### 📜 **Context**  
- This notebook was created as part of the course *02456 - Deep Learning* at DTU. It demonstrates a news recommender model that predicts user preferences for news articles, using the dataset from the Danish news outlet *Ekstra Bladet*. The model implementation is inspired by the article [Neural News Recommendation with Multi-Head Self-Attention](https://dl.acm.org/doi/10.1145/3640457.3687164).
---

### 📝 **Differences from the Original Paper**  
- **Added Temporal Features**: The publication time of each article is taken into account. Time deltas relative to the most recently published in-view article are fed into a predefined discounting function, and the resulting discount factors scale the news representations.
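
A minimal sketch of this discounting idea in plain Python, assuming the same functional form that the notebook's own helper functions use: ages are measured against the newest candidate in the impression and the discount is `exp(-delta / (4 * max_delta))`. The function name `discount_factors` is illustrative and not part of `ebrec`.

```python
import math
from datetime import datetime
from typing import List, Optional


def discount_factors(published: List[Optional[datetime]]) -> List[Optional[float]]:
    """Return one discount in (0, 1] per article; None where the time is unknown."""
    known = [t for t in published if t is not None]
    if not known:
        return [None] * len(published)
    reference = max(known)  # newest article in the impression
    deltas = [
        (reference - t).total_seconds() if t is not None else None
        for t in published
    ]
    # Largest age, floored at 1 second to avoid division by zero.
    max_delta = max(1.0, max(d for d in deltas if d is not None))
    return [
        math.exp(-d / (4 * max_delta)) if d is not None else None
        for d in deltas
    ]


times = [
    datetime(2023, 5, 30, 19, 48),
    datetime(2023, 5, 30, 7, 43),
    None,
    datetime(2022, 11, 11, 8, 55),
]
factors = discount_factors(times)
print(factors)
```

With this form the newest article keeps a factor of 1.0 and the oldest one in the impression always gets `exp(-1/4)`, roughly 0.7788, so the discount is fairly mild by construction.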

### 🛠️ **What Does This Script Do?**  
1. **Model Creation**:  
   - Implements an intuitive temporal discounting in TensorFlow and integrates it into the NRMS model.

2. **Training**:  
   - Trains the model using data from *Ekstra Bladet's "2024 RecSys Challenge"*.

3. **Evaluation**:  
   - Evaluates the model on a dataset from *Ekstra Bladet's "2024 RecSys Challenge"*.
---

### 💻 **Hardware Setup**  
- This notebook has been tested on DTU's HPC and Google Colab Pro using a T4 GPU with 50GB of system RAM.
---

### 🔗 **References**  
1. [Neural News Recommendation with Multi-Head Self-Attention](https://dl.acm.org/doi/10.1145/3640457.3687164)  
2. [Ekstra Bladet's "2024 RecSys Challenge"](https://recsys.eb.dk/)
3. The main script is inspired by the examples provided by the organizers of the challenge. The dataloader, the temporal layer, and its integration into the original model were written entirely by the authors.
---

### 🖊️ **Authors**  
- Simon Stohrer
- Jonas Vincent Ralf Dauscher
- Jofre Bonillo Mesegué
- Jan Christopher Leisbrock
- Emil Kragh Toft

### **Reproducibility**
Change the path in the following cell to the location where your `src` folder is stored. The data path (`PATH`, set further down) needs to be changed as well. `FRACTION` is set to a very small value so that the code runs quickly; set it to 1 to reproduce the reported scores.

In [21]:
import sys
sys.path.append('/content/drive/MyDrive/Deepl learning/Jan_update/src')  # Add the parent directory to sys.path

## Load functionality

In [22]:
from transformers import AutoTokenizer, AutoModel
from pathlib import Path
import tensorflow as tf
import polars as pl
from typing import List, Dict, Any, Tuple, Optional, Union
from datetime import datetime, timedelta  # the datetime class, not the module
import numpy as np

from ebrec.utils._constants import *

from ebrec.utils._behaviors import (
    create_binary_labels_column,
    sampling_strategy_wu2019,
    add_prediction_scores,
    truncate_history,
    ebnerd_from_path,
)
from ebrec.evaluation import MetricEvaluator, AucScore, NdcgScore, MrrScore
from ebrec.utils._articles import convert_text2encoding_with_transformers
from ebrec.utils._polars import concat_str_columns, slice_join_dataframes
from ebrec.utils._articles import create_article_id_to_value_mapping
from ebrec.utils._nlp import get_transformers_word_embeddings
from ebrec.utils._python import write_submission_file, rank_predictions_by_score

# NewsrecDataLoader is the base class of the custom dataloader defined later in this notebook.
from ebrec.models.newsrec.dataloader import NewsrecDataLoader
from ebrec.models.newsrec.model_config import hparams_nrms


In [23]:
# Enable memory growth on every GPU so TensorFlow does not reserve all GPU memory up front
gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

physical_devices = tf.config.list_physical_devices()
print("Available devices:", physical_devices)

Available devices: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]


### Generate labels


In [24]:
# Make sure to adjust this path to the location of the EBNeRD dataset.
PATH = Path("/content/drive/MyDrive/Deepl learning/Jan_update/ebnerd_data")
DATASPLIT = "ebnerd_small"
DUMP_DIR = PATH.joinpath("ebnerd_predictions")
DUMP_DIR.mkdir(exist_ok=True, parents=True)

History size can often be a memory bottleneck. If you change it, update the NRMS hyperparameter ```history_size``` as well so that the model's input shape stays compatible with the data.
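
As a rough illustration of what a fixed history size means for the data, the sketch below keeps the most recent clicks and pads short histories. It assumes left padding with the value 0; the helper `fix_history_length` is hypothetical, and the exact behaviour of `ebnerd_from_path` should be checked in the `ebrec` source.

```python
from typing import List


def fix_history_length(history: List[int], size: int, pad_value: int = 0) -> List[int]:
    """Keep the `size` most recent ids and left-pad shorter histories."""
    recent = history[-size:] if size > 0 else []
    return [pad_value] * (size - len(recent)) + recent


short = fix_history_length([11, 12, 13], size=5)        # short history is padded
long = fix_history_length(list(range(1, 9)), size=5)    # long history is truncated
print(short)
print(long)
```

Every user then contributes exactly `size` article slots per sample, which is why memory use grows linearly with the history size.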

In [25]:
HISTORY_SIZE = 20
hparams_nrms.history_size = HISTORY_SIZE

In [26]:
# We just want to load the necessary columns
COLUMNS = [
    DEFAULT_USER_COL,
    DEFAULT_IMPRESSION_ID_COL,
    DEFAULT_IMPRESSION_TIMESTAMP_COL,
    DEFAULT_HISTORY_ARTICLE_ID_COL,
    DEFAULT_CLICKED_ARTICLES_COL,
    DEFAULT_INVIEW_ARTICLES_COL,
]
# This notebook is just a simple 'get-started'; we down sample the number of samples to just run quickly through it.
FRACTION = 0.001

In this example we sample the dataset to keep it small. The training data is then split into a training and a validation set: impressions from the last day form the validation set.
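
The negative sampling used in the next cell (`npratio=4`, with replacement, shuffled) can be pictured roughly as follows. This is a simplified stand-in, not the `sampling_strategy_wu2019` implementation: the function `sample_impression` is hypothetical and handles a single clicked article per impression.

```python
import random
from typing import List, Tuple


def sample_impression(
    inview: List[int], clicked: List[int], npratio: int, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Return (candidates, labels): one clicked article plus `npratio` negatives."""
    negatives = [a for a in inview if a not in clicked]
    if not clicked or not negatives:
        raise ValueError("need at least one clicked and one non-clicked article")
    positive = rng.choice(clicked)
    # Sampling with replacement always yields exactly `npratio` negatives,
    # even when fewer distinct non-clicked articles are available.
    sampled = rng.choices(negatives, k=npratio)
    pairs = [(positive, 1)] + [(a, 0) for a in sampled]
    rng.shuffle(pairs)
    candidates = [a for a, _ in pairs]
    labels = [y for _, y in pairs]
    return candidates, labels


rng = random.Random(123)
candidates, labels = sample_impression(
    inview=[9775042, 9769557, 9775079], clicked=[9775042], npratio=4, rng=rng
)
print(candidates, labels)
```

Because every sampled impression ends up with `npratio + 1` candidates, the training batches can be stacked into rectangular arrays.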

In [27]:
df = (
    ebnerd_from_path(
        PATH.joinpath(DATASPLIT, "train"),
        history_size=HISTORY_SIZE,
        padding=0,
    )
    .select(COLUMNS)
    .pipe(
        sampling_strategy_wu2019,
        npratio=4,
        shuffle=True,
        with_replacement=True,
        seed=123,
    )
    .pipe(create_binary_labels_column)
    .sample(fraction=FRACTION)
)

dt_split = pl.col(DEFAULT_IMPRESSION_TIMESTAMP_COL).max() - timedelta(days=1)
df_train = df.filter(pl.col(DEFAULT_IMPRESSION_TIMESTAMP_COL) < dt_split)
df_validation = df.filter(pl.col(DEFAULT_IMPRESSION_TIMESTAMP_COL) >= dt_split)

print(f"Train samples: {df_train.height}\nValidation samples: {df_validation.height}")
df_train.head(2)

Train samples: 202
Validation samples: 32


| user_id | impression_id | impression_time | article_id_fixed | article_ids_clicked | article_ids_inview | labels |
|---|---|---|---|---|---|---|
| u32 | u32 | datetime[μs] | list[i32] | list[i64] | list[i64] | list[i8] |
| 335666 | 566421329 | 2023-05-23 19:12:35 | [9765438, 9761384, … 9769575] | [9778168] | [9778139, 9778277, … 9778277] | [0, 0, … 0] |
| 1300067 | 123067130 | 2023-05-21 18:19:46 | [9768793, 9768566, … 9770452] | [9775042] | [9775042, 9769557, … 9775079] | [1, 0, … 0] |


### Test set
We'll use the validation split of the dataset as the test set.

In [28]:
df_test = (
    ebnerd_from_path(
        PATH.joinpath(DATASPLIT, "validation"),
        history_size=HISTORY_SIZE,
        padding=0,
    )
    .select(COLUMNS)
    .pipe(create_binary_labels_column)
    .sample(fraction=FRACTION)
)

## Load articles

In [29]:
df_articles = pl.read_parquet(PATH.joinpath(DATASPLIT, "articles.parquet"))


In [30]:
# Prepare temporal features

def create_article_time_dict(df_articles: pl.DataFrame) -> Dict[int, datetime]:
    """Create lookup dictionary for article publishing times"""
    return dict(zip(
        df_articles["article_id"].to_list(),
        df_articles["published_time"].to_list()
    ))
article_time_dict = create_article_time_dict(df_articles)

def prepare_temporal_features(
    df: pl.DataFrame,
    article_time_dict: Dict[int, datetime],
    inview_col: str
) -> pl.DataFrame:
    """Add a list column with the publishing time of every in-view article (row-wise lookup)."""

    inview_time_col = f"published_time_{inview_col}"

    return df.with_columns([
        pl.col(inview_col).map_elements(
            lambda ids: [article_time_dict.get(article_id) for article_id in ids],
            return_dtype=pl.List(pl.Datetime)
        ).alias(inview_time_col)
    ])


In [31]:

# Add temporal features
df_train = prepare_temporal_features(
    df_train,
    article_time_dict,
    DEFAULT_INVIEW_ARTICLES_COL
)

df_validation = prepare_temporal_features(
    df_validation,
    article_time_dict,
    DEFAULT_INVIEW_ARTICLES_COL
)

df_test = prepare_temporal_features(
    df_test,
    article_time_dict,
    DEFAULT_INVIEW_ARTICLES_COL
)   


In [32]:

def compute_temporal_differences(
    df: pl.DataFrame,
    inview_time_col: str
) -> pl.DataFrame:
    """Add a reference date per impression; time differences and discounts are computed separately."""

    # Reference date: the latest publishing time among the in-view articles
    df = df.with_columns(
        pl.col(inview_time_col)
        .map_elements(
            lambda dates: max((d for d in dates if d), default=None),
            return_dtype=pl.Datetime
        )
        .alias("reference_date")
    )
    
    return df


def calculate_time_difference_seconds(
    timestamps: List[Optional[datetime]], 
    reference_time: datetime
) -> List[Optional[float]]:
    """
    Calculate the time difference in seconds between a list of timestamps and a reference time.
    
    Args:
        timestamps: List of timestamps to compare (can contain None)
        reference_time: The reference timestamp to compare against
        
    Returns:
        List of time differences in seconds or None if timestamp is None
    """
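    # Without a reference time (no known publishing time in the row) no delta can be computed.
    if reference_time is None:
        return [None] * len(timestamps)
    return [(reference_time - timestamp).total_seconds() if timestamp else None for timestamp in timestamps]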

def add_time_difference_column(
    df: pl.DataFrame,
    timestamp_column: str,
    reference_time_column: str,
    output_column: str
) -> pl.DataFrame:
    """
    Add a column with time differences in seconds between lists of timestamps and a reference time.
    
    Args:
        df: Input Polars DataFrame
        timestamp_column: Name of column containing lists of timestamps
        reference_time_column: Name of column containing the reference time
        output_column: Name of output column
        
    Returns:
        DataFrame with added time difference column
    """
    df = df.with_columns([
        pl.struct([timestamp_column, reference_time_column]).map_elements(
            lambda row: calculate_time_difference_seconds(row[timestamp_column], row[reference_time_column]),
            return_dtype=pl.List(pl.Float64)
        ).alias(output_column)
    ])

    return df


def compute_exponential_discount(deltas: List[Optional[float]]) -> List[Optional[float]]:
    """
    Compute exponential discount based on time deltas.
    
    Args:
        deltas: List of time deltas in seconds
        
    Returns:
        List of discounts
    """
    
    max_delta = max((d for d in deltas if d is not None), default=1)
    max_delta = max(1, max_delta)  # Ensure max_delta is at least 1 to avoid division by zero
    
    return [np.exp(-d / (max_delta*4)) if d is not None else None for d in deltas]

def add_discount_column(
    df: pl.DataFrame,
    time_delta_column: str,
    output_column: str
) -> pl.DataFrame:
    """
    Add a column with exponential discounts based on time deltas.
    
    Args:
        df: Input Polars DataFrame
        time_delta_column: Name of column containing lists of time deltas
        output_column: Name of output column
        
    Returns:
        DataFrame with added discount column
    """
    df = df.with_columns([
        pl.col(time_delta_column).map_elements(
            compute_exponential_discount,
            return_dtype=pl.List(pl.Float64)
        ).alias(output_column)
    ])

    return df

In [33]:

df_train = compute_temporal_differences(
    df_train,
    f"published_time_{DEFAULT_INVIEW_ARTICLES_COL}"
)

df_validation = compute_temporal_differences(
    df_validation,
    f"published_time_{DEFAULT_INVIEW_ARTICLES_COL}"
)

df_test = compute_temporal_differences(
    df_test,
    f"published_time_{DEFAULT_INVIEW_ARTICLES_COL}"
)

df_train = add_time_difference_column(
    df_train,
    f"published_time_{DEFAULT_INVIEW_ARTICLES_COL}",
    "reference_date", 
    "time_delta"
)

df_validation = add_time_difference_column(
    df_validation,
    f"published_time_{DEFAULT_INVIEW_ARTICLES_COL}",
    "reference_date", 
    "time_delta"
)

df_test = add_time_difference_column(
    df_test,
    f"published_time_{DEFAULT_INVIEW_ARTICLES_COL}",
    "reference_date", 
    "time_delta"
)

df_train = add_discount_column(
    df_train,
    "time_delta",
    "discount_time_delta"
)

df_validation = add_discount_column(
    df_validation,
    "time_delta",
    "discount_time_delta"
)

df_test = add_discount_column(
    df_test,
    "time_delta",
    "discount_time_delta"
)



## Init model using Hugging Face's tokenizer and word embeddings
The original implementation uses GloVe embeddings and the matching tokenizer. To get going fast, we use a multilingual transformer model from Hugging Face instead: its tokenizer encodes the articles and its word embeddings initialize NRMS.
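
To make the two roles concrete, here is a toy sketch of "text to fixed-length token ids" followed by "token ids to embedding rows". The vocabulary, the whitespace tokenizer and the 2-dimensional embedding table are all made up for illustration; the real tokenizer works on subwords and its padding conventions may differ.

```python
from typing import Dict, List

MAX_LEN = 6  # stands in for MAX_TITLE_LENGTH
PAD_ID = 0
UNK_ID = 1

# Toy vocabulary and embedding table (one row per token id).
vocab: Dict[str, int] = {
    "<pad>": PAD_ID, "<unk>": UNK_ID, "ny": 2, "regering": 3, "i": 4, "danmark": 5,
}
embedding_table: List[List[float]] = [
    [0.0, 0.0], [0.1, 0.1], [0.2, 0.5], [0.9, 0.3], [0.4, 0.4], [0.7, 0.8],
]


def encode(text: str, max_len: int = MAX_LEN) -> List[int]:
    """Whitespace-tokenize, map to ids, then truncate or right-pad to max_len."""
    ids = [vocab.get(tok, UNK_ID) for tok in text.lower().split()][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def embed(ids: List[int]) -> List[List[float]]:
    """Look up one embedding row per token id."""
    return [embedding_table[i] for i in ids]


token_ids = encode("Ny regering i Danmark nu")
vectors = embed(token_ids)
print(token_ids)
print(len(vectors), len(vectors[0]))
```

In the notebook the same two steps are handled by the Hugging Face tokenizer and by the embedding matrix passed to the model as `word2vec_embedding`.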


In [34]:
TRANSFORMER_MODEL_NAME = "FacebookAI/xlm-roberta-base"
TEXT_COLUMNS_TO_USE = [DEFAULT_SUBTITLE_COL, DEFAULT_TITLE_COL]
MAX_TITLE_LENGTH = 30

# LOAD HUGGINGFACE:
transformer_model = AutoModel.from_pretrained(TRANSFORMER_MODEL_NAME)
transformer_tokenizer = AutoTokenizer.from_pretrained(TRANSFORMER_MODEL_NAME)

# Initialize the word embeddings from the pretrained transformer model:
word2vec_embedding = get_transformers_word_embeddings(transformer_model)
# Concatenate the text columns and tokenize them:
df_articles, cat_cal = concat_str_columns(df_articles, columns=TEXT_COLUMNS_TO_USE)
df_articles, token_col_title = convert_text2encoding_with_transformers(
    df_articles, transformer_tokenizer, cat_cal, max_length=MAX_TITLE_LENGTH
)
# =>
article_mapping = create_article_id_to_value_mapping(
    df=df_articles, value_col=token_col_title
)

## Initiate the customized dataloaders
In the implementations, the models and the data are decoupled. Hence, you should build a dataloader that fits your needs.

Note that with the ```NRMSTemporalDataLoader``` defined here, ```eval_mode=False``` is meant for ```model.model.fit()``` whereas ```eval_mode=True``` is meant for ```model.scorer.predict()```. 
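
The difference between the two modes is mostly a matter of how rows are laid out. The sketch below uses plain lists and hypothetical helper names (`training_batch`, `eval_batch`) to show the idea: in training mode one row holds all candidates of an impression, while in evaluation mode every candidate becomes its own row with the user's history repeated.

```python
from typing import Dict, List, Tuple

Impression = Dict[str, list]


def training_batch(impressions: List[Impression]) -> Tuple[list, list, list, list]:
    """One row per impression; all candidates of an impression stay together."""
    his = [imp["history"] for imp in impressions]
    cand = [imp["inview"] for imp in impressions]
    disc = [imp["discount"] for imp in impressions]
    y = [imp["labels"] for imp in impressions]
    return his, cand, disc, y


def eval_batch(impressions: List[Impression]) -> Tuple[list, list, list, list]:
    """One row per candidate; the history is repeated for every candidate."""
    his, cand, disc, y = [], [], [], []
    for imp in impressions:
        for article, d, label in zip(imp["inview"], imp["discount"], imp["labels"]):
            his.append(imp["history"])
            cand.append([article])
            disc.append([[d]])
            y.append([label])
    return his, cand, disc, y


batch = [
    {"history": [1, 2], "inview": [7, 8, 9], "discount": [1.0, 0.9, 0.8], "labels": [0, 1, 0]},
    {"history": [3, 4], "inview": [5, 6], "discount": [1.0, 0.78], "labels": [1, 0]},
]
train_rows = training_batch(batch)
eval_rows = eval_batch(batch)
print(len(train_rows[0]), len(eval_rows[0]))
```

The evaluation layout copes with impressions of different lengths, which is why it is paired with the single-candidate scorer, whereas the training layout relies on the sampling step having produced equally long candidate lists.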

In [35]:
from dataclasses import dataclass, field
import tensorflow as tf
import polars as pl
import numpy as np

from ebrec.utils._articles_behaviors import map_list_article_id_to_value
from ebrec.utils._python import (
    repeat_by_list_values_from_matrix,
    create_lookup_objects,
)

from ebrec.utils._constants import (
    DEFAULT_INVIEW_ARTICLES_COL,
    DEFAULT_LABELS_COL,
    DEFAULT_USER_COL,
)

In [36]:
@dataclass
class NRMSTemporalDataLoader(NewsrecDataLoader):
    def transform(self, df: pl.DataFrame) -> pl.DataFrame:
        return df.pipe(
            map_list_article_id_to_value,
            behaviors_column=self.history_column,
            mapping=self.lookup_article_index,
            fill_nulls=self.unknown_index,
            drop_nulls=False,
        ).pipe(
            map_list_article_id_to_value,
            behaviors_column=self.inview_col,
            mapping=self.lookup_article_index,
            fill_nulls=self.unknown_index,
            drop_nulls=False,
        )
 
    def __getitem__(self, idx) -> tuple[tuple[np.ndarray], np.ndarray]:
        """
        his_input_title:     (samples, history_size, document_dimension)
        pred_input_title:    (samples, npratio, document_dimension)
        discount_time_delta: (samples, npratio) when training; (samples, 1, 1) in eval mode
        batch_y:             (samples, npratio)
        """
        batch_X = self.X[idx * self.batch_size : (idx + 1) * self.batch_size].pipe(
            self.transform
        )
        batch_y = self.y[idx * self.batch_size : (idx + 1) * self.batch_size]
        # =>
        if self.eval_mode:
            repeats = np.array(batch_X["n_samples"])
            # =>
            batch_y = np.array(batch_y.explode().to_list()).reshape(-1, 1)
            # =>
            his_input_title = repeat_by_list_values_from_matrix(
                batch_X[self.history_column].to_list(),
                matrix=self.lookup_article_matrix,
                repeats=repeats,
            )
            # =>
            pred_input_title = self.lookup_article_matrix[
                batch_X[self.inview_col].explode().to_list()
            ]
 
            discount_time_delta = np.array(batch_X["discount_time_delta"].explode().to_list()).reshape(-1, 1, 1)
        else:
            batch_y = np.array(batch_y.to_list())
            his_input_title = self.lookup_article_matrix[
                batch_X[self.history_column].to_list()
            ]
            pred_input_title = self.lookup_article_matrix[
                batch_X[self.inview_col].to_list()
            ]
            pred_input_title = np.squeeze(pred_input_title, axis=2)
 
            discount_time_delta = np.array(batch_X["discount_time_delta"].to_list())
 
        his_input_title = np.squeeze(his_input_title, axis=2)
        return (his_input_title, pred_input_title, discount_time_delta), batch_y

In [37]:
BATCH_SIZE = 16

train_dataloader = NRMSTemporalDataLoader(
    behaviors=df_train,
    article_dict=article_mapping,
    unknown_representation="zeros",
    history_column=DEFAULT_HISTORY_ARTICLE_ID_COL,
    eval_mode=False,
    batch_size=BATCH_SIZE,
)
val_dataloader = NRMSTemporalDataLoader(
    behaviors=df_validation,
    article_dict=article_mapping,
    unknown_representation="zeros",
    history_column=DEFAULT_HISTORY_ARTICLE_ID_COL,
    eval_mode=False,
    batch_size=BATCH_SIZE,
)

## Initiate the custom model
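
Before the full Keras implementation, a small plain-Python sketch of the scoring idea used in this section: each candidate news vector is multiplied by its discount, the click logit is the dot product with the user vector, and a softmax over the candidates gives the training output. The vectors and the helper `score_candidates` are made up for illustration.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_candidates(
    user_vec: List[float], news_vecs: List[List[float]], discounts: List[float]
) -> List[float]:
    """Scale each news vector by its discount, then softmax over user . news."""
    discounted = [[d * v for v in vec] for vec, d in zip(news_vecs, discounts)]
    return softmax([dot(vec, user_vec) for vec in discounted])


user = [1.0, 0.5]
news = [[0.8, 0.2], [0.8, 0.2], [0.1, 0.9]]
probs = score_candidates(user, news, discounts=[1.0, 0.78, 1.0])
print(probs)
```

Scaling the vector is equivalent to scaling the logit by the same factor. One consequence worth keeping in mind is that an older article is only pushed down when its dot product with the user vector is positive; a negative logit is moved towards zero instead.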

In [38]:
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
from ebrec.models.newsrec.layers import AttLayer2, SelfAttention
import tensorflow as tf
import numpy as np
 
from tensorflow.keras.layers import Embedding, Input, Dropout, Dense, BatchNormalization
from tensorflow.keras.initializers import GlorotUniform
from tensorflow.keras.regularizers import l2
 
 
class NRMSTemporalModel:
    """NRMS model(Neural News Recommendation with Multi-Head Self-Attention)
 
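    Chuhan Wu, Fangzhao Wu, Suyu Ge, Tao Qi, Yongfeng Huang, and Xing Xie, "Neural News
    Recommendation with Multi-Head Self-Attention" in Proceedings of the 2019 Conference
    on Empirical Methods in Natural Language Processing and the 9th International Joint Conference
    on Natural Language Processing (EMNLP-IJCNLP)
 
    Attributes:
        hparams: hyperparameter object (here ``hparams_nrms``).
        seed: random seed used for TensorFlow, NumPy and the layers.
        word2vec_embedding: word embedding matrix used to initialize the news encoder.
        model: Keras model used for training (softmax over the candidates).
        scorer: Keras model used for inference (sigmoid score for a single candidate).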
    """
 
    def __init__(
        self,
        hparams: dict,
        word2vec_embedding: np.ndarray = None,
        word_emb_dim: int = 300,
        vocab_size: int = 32000,
        seed: int = None,
    ):
        """Initialization steps for NRMS."""
        self.hparams = hparams
        self.seed = seed
 
        # SET SEED:
        tf.random.set_seed(seed)
        np.random.seed(seed)
 
        # INIT THE WORD-EMBEDDINGS:
        if word2vec_embedding is None:
            # Xavier Initialization
            initializer = GlorotUniform(seed=self.seed)
            self.word2vec_embedding = initializer(shape=(vocab_size, word_emb_dim))
            # self.word2vec_embedding = np.random.rand(vocab_size, word_emb_dim)
        else:
            self.word2vec_embedding = word2vec_embedding
 
        # BUILD AND COMPILE MODEL:
        self.model, self.scorer = self._build_graph()
        data_loss = self._get_loss(self.hparams.loss)
        train_optimizer = self._get_opt(
            optimizer=self.hparams.optimizer, lr=self.hparams.learning_rate
        )
        self.model.compile(loss=data_loss, optimizer=train_optimizer)
 
    def _get_loss(self, loss: str):
        """Make loss function, consists of data loss and regularization loss
        Returns:
            object: Loss function or loss function name
        """
        if loss == "cross_entropy_loss":
            data_loss = "categorical_crossentropy"
        elif loss == "log_loss":
            data_loss = "binary_crossentropy"
        else:
            raise ValueError(f"this loss not defined {loss}")
        return data_loss
 
    def _get_opt(self, optimizer: str, lr: float):
        """Get the optimizer according to configuration. Usually we will use Adam.
        Returns:
            object: An optimizer.
        """
        # TODO: shouldn't be a string input you should just set the optimizer, to avoid stuff like this:
        # => 'WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.'
        if optimizer == "adam":
            train_opt = tf.keras.optimizers.Adam(learning_rate=lr)
        else:
            raise ValueError(f"this optimizer not defined {optimizer}")
        return train_opt
 
    def _build_graph(self):
        """Build NRMS model and scorer.
 
        Returns:
            object: a model used to train.
            object: a model used to evaluate and inference.
        """
        model, scorer = self._build_nrms()
        return model, scorer
 
    def _build_userencoder(self, titleencoder):
        """The main function to create user encoder of NRMS.
 
        Args:
            titleencoder (object): the news encoder of NRMS.
 
        Return:
            object: the user encoder of NRMS.
        """
        his_input_title = tf.keras.Input(
            shape=(self.hparams.history_size, self.hparams.title_size), dtype="int32"
        )
 
        click_title_presents = tf.keras.layers.TimeDistributed(titleencoder)(
            his_input_title
        )
        y = SelfAttention(self.hparams.head_num, self.hparams.head_dim, seed=self.seed)(
            [click_title_presents] * 3
        )
        user_present = AttLayer2(self.hparams.attention_hidden_dim, seed=self.seed)(y)
 
        model = tf.keras.Model(his_input_title, user_present, name="user_encoder")
        return model
 
    def _build_newsencoder(self):
        """The main function to create news encoder of NRMS.
 
        The word embedding layer is built inside this method from
        ``self.word2vec_embedding``; the method takes no arguments.
 
        Return:
            object: the news encoder of NRMS.
        """
        embedding_layer = tf.keras.layers.Embedding(
            self.word2vec_embedding.shape[0],
            self.word2vec_embedding.shape[1],
            weights=[self.word2vec_embedding],
            trainable=True,
        )
        sequences_input_title = tf.keras.Input(
            shape=(self.hparams.title_size,), dtype="int32"
        )
        embedded_sequences_title = embedding_layer(sequences_input_title)
 
        y = tf.keras.layers.Dropout(self.hparams.dropout)(embedded_sequences_title)
        y = SelfAttention(self.hparams.head_num, self.hparams.head_dim, seed=self.seed)(
            [y, y, y]
        )
 
        # Create configurable Dense layers:
        for layer in [400, 400, 400]:
            y = tf.keras.layers.Dense(units=layer, activation="relu")(y)
            y = tf.keras.layers.BatchNormalization()(y)
            y = tf.keras.layers.Dropout(self.hparams.dropout)(y)
 
        y = tf.keras.layers.Dropout(self.hparams.dropout)(y)
        pred_title = AttLayer2(self.hparams.attention_hidden_dim, seed=self.seed)(y)
 
        model = tf.keras.Model(sequences_input_title, pred_title, name="news_encoder")
        return model
 
    def _build_nrms(self):
        """The main function to create NRMS's logic. The core of NRMS
        is a user encoder and a news encoder.
 
        Returns:
            object: a model used to train.
            object: a model used to evaluate and inference.
        """
 
        his_input_title = tf.keras.Input(
            shape=(self.hparams.history_size, self.hparams.title_size),
            dtype="int32",
        )
        pred_input_title = tf.keras.Input(
            # shape = (hparams.npratio + 1, hparams.title_size)
            shape=(None, self.hparams.title_size),
            dtype="int32",
        )
        pred_input_title_one = tf.keras.Input(
            shape=(
                1,
                self.hparams.title_size,
            ),
            dtype="int32",
        )
 
        discount_time_delta = tf.keras.Input(
            shape=(None, 1), dtype="float32"
        )
 
        discount_time_delta_one = tf.keras.Input(
            shape=(1, 1), dtype="float32"
        )
 
        pred_title_one_reshape = tf.keras.layers.Reshape((self.hparams.title_size,))(
            pred_input_title_one
        )
        titleencoder = self._build_newsencoder()
        self.userencoder = self._build_userencoder(titleencoder)
        self.newsencoder = titleencoder
 
        user_present = self.userencoder(his_input_title)
        news_present = tf.keras.layers.TimeDistributed(self.newsencoder)(
            pred_input_title
        )
        # Scale every candidate representation by its temporal discount (broadcast over the last axis).
        news_present = tf.keras.layers.Multiply()([news_present, discount_time_delta])
       
        news_present_one = self.newsencoder(pred_title_one_reshape)
        news_present_one = tf.keras.layers.Multiply()([news_present_one, discount_time_delta_one])
 
        preds = tf.keras.layers.Dot(axes=-1)([news_present, user_present])
        preds = tf.keras.layers.Activation(activation="softmax")(preds)
 
        pred_one = tf.keras.layers.Dot(axes=-1)([news_present_one, user_present])
        pred_one = tf.keras.layers.Activation(activation="sigmoid")(pred_one)
 
        model = tf.keras.Model([his_input_title, pred_input_title, discount_time_delta], preds)
        scorer = tf.keras.Model([his_input_title, pred_input_title_one, discount_time_delta_one], pred_one)
 
        return model, scorer
 

## Train the model


In [39]:
# List all physical devices
physical_devices = tf.config.list_physical_devices()
print("Available devices:", physical_devices)

Available devices: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]


Initiate the NRMS-model:

In [40]:
model = NRMSTemporalModel(
    hparams=hparams_nrms,
    word2vec_embedding=word2vec_embedding,
    seed=42,
)
model.model.compile(
    optimizer=model.model.optimizer,
    loss=model.model.loss,
    metrics=["AUC"],
)

MODEL_NAME = model.__class__.__name__
MODEL_WEIGHTS = DUMP_DIR.joinpath(f"state_dict/{MODEL_NAME}/mini.weights.h5")
LOG_DIR = DUMP_DIR.joinpath(f"runs/{MODEL_NAME}")
print(MODEL_WEIGHTS)

# Callbacks
# We add some callbacks to the model training.
# Tensorboard:
#tensorboard_callback = tf.keras.callbacks.TensorBoard(
#    log_dir=LOG_DIR,
#    histogram_freq=1,
#)

# Earlystopping:
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_AUC",
    mode="max",
    patience=3,
    restore_best_weights=True,
)

# ModelCheckpoint:
modelcheckpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath=MODEL_WEIGHTS,
    monitor="val_AUC",
    mode="max",
    save_best_only=False,
    save_weights_only=True,
    verbose=1,
)

# Learning rate scheduler:
lr_scheduler = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_AUC",
    mode="max",
    factor=0.2,
    patience=2,
    min_lr=1e-6,
)

callbacks = [early_stopping, modelcheckpoint, lr_scheduler]  # tensorboard_callback is disabled
USE_CALLBACKS = True
EPOCHS = 4

hist = model.model.fit(
    train_dataloader,
    validation_data=val_dataloader,
    epochs=EPOCHS,
    callbacks=callbacks if USE_CALLBACKS else [],
)

C:\Users\janle\Desktop\Master_local\Data_storage\Deep_learning\ebnerd_data\ebnerd_predictions\state_dict\NRMSTemporalModel\mini.weights.h5
Epoch 1/4
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 2s/step - AUC: 0.4890 - loss: 3.6007
Epoch 1: saving model to C:\Users\janle\Desktop\Master_local\Data_storage\Deep_learning\ebnerd_data\ebnerd_predictions\state_dict\NRMSTemporalModel\mini.weights.h5
13/13 ━━━━━━━━━━━━━━━━━━━━ 63s 3s/step - AUC: 0.4906 - loss: 3.5912 - val_AUC: 0.4855 - val_loss: 2.3757 - learning_rate: 1.0000e-04
Epoch 2/4
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 2s/step - AUC: 0.7048 - loss: 1.9654
Epoch 2: saving model to C:\Users\janle\Desktop\Master_local\Data_storage\Deep_learning\ebnerd_data\ebnerd_predictions\state_dict\NRMSTemporalModel\mini.weights.h5
13/13 ━━━━━━━━━━━━━━━━━━━━ 30s 2s/step - AUC: 0.7018 - loss: 1.9873 - val_AUC: 0.5070 - val_loss: 5.0848 - learning_rate: 1.0000e-04
Epoch 3/4
13/13 ━━━━━━━━━━━━━━━━━━━━ 0s 2s/step - AUC: 0.7120 - loss: 1.9289
Epoch 3: saving

In [41]:
if USE_CALLBACKS:
    _ = model.model.load_weights(filepath=MODEL_WEIGHTS)

## Example of how to compute some metrics

In [42]:
BATCH_SIZE_TEST = 16

test_dataloader = NRMSTemporalDataLoader(
    behaviors=df_test,
    article_dict=article_mapping,
    unknown_representation="zeros",
    history_column=DEFAULT_HISTORY_ARTICLE_ID_COL,
    eval_mode=True,
    batch_size=BATCH_SIZE_TEST,
)

In [43]:
pred_test = model.scorer.predict(test_dataloader)



16/16 ━━━━━━━━━━━━━━━━━━━━ 36s 2s/step


## Add the predictions to the dataframe

In [44]:
df_test = add_prediction_scores(df_test, pred_test)
df_test.head(2)

| user_id | impression_id | impression_time | article_id_fixed | article_ids_clicked | article_ids_inview | labels | published_time_article_ids_inview | reference_date | time_delta | discount_time_delta | scores |
|---|---|---|---|---|---|---|---|---|---|---|---|
| u32 | u32 | datetime[μs] | list[i32] | list[i32] | list[i32] | list[i8] | list[datetime[μs]] | datetime[μs] | list[f64] | list[f64] | list[f32] |
| 2530015 | 434585758 | 2023-05-26 07:20:38 | [9779383, 9779269, … 9780181] | [9778944] | [9781598, 9781624, … 9186608] | [0, 0, … 0] | [2023-05-25 20:53:00, 2023-05-25 18:22:12, … 2022-04-21 05:41:48] | 2023-05-26 03:12:18 | [22758.0, 31806.0, … 3.455103e7] | [0.999835, 0.99977, … 0.778801] | [0.071588, 0.172548, … 0.00758] |
| 1227139 | 7074582 | 2023-05-30 20:21:31 | [9780096, 9780096, … 9780181] | [9788149] | [9788149, 9780702, … 9506503] | [1, 0, … 0] | [2023-05-30 19:48:09, 2023-05-30 07:43:11, … 2022-11-11 08:55:58] | 2023-05-30 19:48:09 | [0.0, 43498.0, … 1.7319131e7] | [1.0, 0.999372, … 0.778801] | [0.002128, 0.944029, … 0.018472] |


### Compute metrics

In [45]:
metrics = MetricEvaluator(
    labels=df_test["labels"].to_list(),
    predictions=df_test["scores"].to_list(),
    metric_functions=[AucScore(), MrrScore(), NdcgScore(k=5), NdcgScore(k=10)],
)
metrics.evaluate()

AUC: 100%|███████████████████████████████████| 244/244 [00:00<00:00, 615.02it/s]
AUC: 100%|████████████████████████████████████████████| 244/244 [00:00<?, ?it/s]
AUC: 100%|█████████████████████████████████| 244/244 [00:00<00:00, 10421.69it/s]
AUC: 100%|█████████████████████████████████| 244/244 [00:00<00:00, 17789.16it/s]


<MetricEvaluator class>: 
 {
    "auc": 0.5380245008302906,
    "mrr": 0.34388649184913933,
    "ndcg@5": 0.37738480858637524,
    "ndcg@10": 0.4589275216023012
}
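
For readers who want to see what these per-impression metrics measure, here is a compact plain-Python sketch. It follows the common definitions (pairwise AUC, reciprocal rank averaged over the clicked items, exponential-gain nDCG); the function names are illustrative and the `ebrec` evaluators may differ in details such as tie handling.

```python
import math
from typing import List


def auc(labels: List[int], scores: List[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def order_by_score(labels: List[int], scores: List[float]) -> List[int]:
    """Labels sorted by descending score."""
    return [y for _, y in sorted(zip(scores, labels), key=lambda t: -t[0])]


def mrr(labels: List[int], scores: List[float]) -> float:
    """Mean of 1/rank over the clicked items."""
    rr = [1.0 / (i + 1) for i, y in enumerate(order_by_score(labels, scores)) if y == 1]
    if not rr:
        raise ValueError("MRR needs at least one positive")
    return sum(rr) / len(rr)


def dcg_at_k(ranked_labels: List[int], k: int) -> float:
    return sum((2 ** y - 1) / math.log2(i + 2) for i, y in enumerate(ranked_labels[:k]))


def ndcg_at_k(labels: List[int], scores: List[float], k: int) -> float:
    """DCG of the predicted order divided by the DCG of the ideal order."""
    best = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(order_by_score(labels, scores), k) / best


labels = [1, 0, 0]
scores = [0.3, 0.9, 0.1]  # the clicked article is ranked second
print(auc(labels, scores), mrr(labels, scores), ndcg_at_k(labels, scores, k=5))
```

The notebook's reported numbers are averages of such per-impression values over all test impressions.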

## Make submission file

In [46]:
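df_test = df_test.with_columns(
    pl.col("scores")
    # Passing return_dtype avoids Polars having to infer the output type.
    .map_elements(
        lambda x: list(rank_predictions_by_score(x)),
        return_dtype=pl.List(pl.Int64),
    )
    .alias("ranked_scores")
)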
df_test.head(2)


| user_id | impression_id | impression_time | article_id_fixed | article_ids_clicked | article_ids_inview | labels | published_time_article_ids_inview | reference_date | time_delta | discount_time_delta | scores | ranked_scores |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| u32 | u32 | datetime[μs] | list[i32] | list[i32] | list[i32] | list[i8] | list[datetime[μs]] | datetime[μs] | list[f64] | list[f64] | list[f32] | list[i64] |
| 2530015 | 434585758 | 2023-05-26 07:20:38 | [9779383, 9779269, … 9780181] | [9778944] | [9781598, 9781624, … 9186608] | [0, 0, … 0] | [2023-05-25 20:53:00, 2023-05-25 18:22:12, … 2022-04-21 05:41:48] | 2023-05-26 03:12:18 | [22758.0, 31806.0, … 3.455103e7] | [0.999835, 0.99977, … 0.778801] | [0.071588, 0.172548, … 0.00758] | [20, 19, … 23] |
| 1227139 | 7074582 | 2023-05-30 20:21:31 | [9780096, 9780096, … 9780181] | [9788149] | [9788149, 9780702, … 9506503] | [1, 0, … 0] | [2023-05-30 19:48:09, 2023-05-30 07:43:11, … 2022-11-11 08:55:58] | 2023-05-30 19:48:09 | [0.0, 43498.0, … 1.7319131e7] | [1.0, 0.999372, … 0.778801] | [0.002128, 0.944029, … 0.018472] | [5, 1, … 3] |


This submission is built from the validation split. To create a submission for the test set, run the test split through the same pipeline.
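
The `ranked_scores` column stores, for every candidate, its position when the candidates are ordered by score. A minimal sketch of that conversion, assuming that rank 1 goes to the highest score and that ties are broken by position (the helper `rank_by_score` is illustrative; `rank_predictions_by_score` may treat ties differently):

```python
from typing import List


def rank_by_score(scores: List[float]) -> List[int]:
    """Rank 1 goes to the highest score; ties are broken by position."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


ranks = rank_by_score([0.002, 0.944, 0.31, 0.018])
print(ranks)
```

The output keeps the original candidate order, so each rank can be matched back to the corresponding entry of the in-view article list.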

In [47]:
write_submission_file(
    impression_ids=df_test[DEFAULT_IMPRESSION_ID_COL],
    prediction_scores=df_test["ranked_scores"],
    path=DUMP_DIR.joinpath("predictions.txt"),
    filename_zip=f"{DATASPLIT}_predictions-{MODEL_NAME}.zip",
)

244it [00:00, ?it/s]

Zipping C:\Users\janle\Desktop\Master_local\Data_storage\Deep_learning\ebnerd_data\ebnerd_predictions\predictions.txt to C:\Users\janle\Desktop\Master_local\Data_storage\Deep_learning\ebnerd_data\ebnerd_predictions\ebnerd_small_predictions-NRMSTemporalModel.zip





# DONE 🚀