## NRMS Model with Temporal Layer  
### Course: *02456 - Deep Learning*  
**Technical University of Denmark (DTU)**  
---

### 📜 **Context**  
- This notebook was created as part of the course *02456 - Deep Learning* at DTU. It demonstrates a news recommender model that uses the dataset of the Danish news outlet *Ekstra Bladet* to predict which articles a user will click. The model implementation is inspired by the article [Neural News Recommendation with Multi-Head Self-Attention](https://dl.acm.org/doi/10.1145/3640457.3687164).
---

### 📝 **Differences from the Original Paper**  
- **Addition of Temporal Features**: The article's publication time is taken into account by feeding relative time deltas into a dedicated layer, which returns discount factors for the news representation.
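
To make the idea concrete, here is a minimal sketch of how a per-article discount factor could rescale a news vector before it is scored against the user vector. The vectors and the discount values are made up for illustration; in the notebook the factor comes from a trained layer rather than from fixed numbers.

```python
# Toy illustration: a temporal discount in (0, 1) scales the news vector,
# which in turn scales its dot-product score with the user vector.
# All numbers below are hypothetical.

def dot(a, b):
    """Plain dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def apply_discount(news_vector, discount):
    """Multiply every component of the news vector by the same factor."""
    return [discount * x for x in news_vector]


user_vector = [0.5, -1.0, 2.0]
news_vector = [1.0, 0.0, 1.5]

fresh_score = dot(apply_discount(news_vector, 0.9), user_vector)  # recent article
stale_score = dot(apply_discount(news_vector, 0.3), user_vector)  # old article
print(fresh_score, stale_score)
```

Because the same factor is applied to every component, the discount changes the magnitude of the score but not the direction of the news vector.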

### 🛠️ **What Does This Script Do?**  
1. **Model Creation**:  
   - Implements a temporal layer in TensorFlow and integrates it into the NRMS model.

2. **Training**:  
   - Trains the model using data from *Ekstra Bladet's "2024 RecSys Challenge"*.

3. **Evaluation**:  
   - Evaluates the model on a dataset from *Ekstra Bladet's "2024 RecSys Challenge"*.
---

### 💻 **Hardware Setup**  
- This notebook has been tested on DTU's HPC and Google Colab Pro using a T4 GPU with 50GB of system RAM.
---

### 🔗 **References**  
1. [Neural News Recommendation with Multi-Head Self-Attention](https://dl.acm.org/doi/10.1145/3640457.3687164)  
2. [Ekstra Bladet's "2024 RecSys Challenge"](https://recsys.eb.dk/)
3. The main script is inspired by the examples provided by the challenge organizers. The dataloader, the temporal layer, and its integration into the original model were written entirely by the authors.
---

### 🖊️ **Authors**  
- Simon Stohrer
- Jonas Vincent Ralf Dauscher
- Jofre Bonillo Mesegué
- Jan Christopher Leisbrock
- Emil Kragh Toft


### **Reproducibility**
The path in the following cell must point to the location of your `src` folder. The data path under the *Load dataset* heading must be changed as well. `FRACTION` is set to a small value so that the code runs quickly; set it to 1 to reproduce the scores.

In [43]:
import sys
sys.path.append('/content/drive/MyDrive/Deepl learning/Jan_update/src')  # Add the parent directory to sys.path

In [42]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Load functionality

In [None]:
from transformers import AutoTokenizer, AutoModel
from pathlib import Path
import tensorflow as tf
import polars as pl
from typing import List, Dict, Any, Tuple, Optional, Union
from datetime import datetime, timedelta
import numpy as np

from ebrec.utils._constants import *

from ebrec.utils._behaviors import (
    create_binary_labels_column,
    sampling_strategy_wu2019,
    add_prediction_scores,
    truncate_history,
    ebnerd_from_path,
)
from ebrec.evaluation import MetricEvaluator, AucScore, NdcgScore, MrrScore
from ebrec.utils._articles import convert_text2encoding_with_transformers
from ebrec.utils._polars import concat_str_columns
from ebrec.utils._articles import create_article_id_to_value_mapping
from ebrec.utils._nlp import get_transformers_word_embeddings
from ebrec.utils._python import write_submission_file, rank_predictions_by_score

from ebrec.models.newsrec.dataloader import NewsrecDataLoader
from ebrec.models.newsrec.model_config import hparams_nrms

In [None]:
# List all physical devices
gpus = tf.config.experimental.list_physical_devices("GPU")
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
physical_devices = tf.config.list_physical_devices()
print("Available devices:", physical_devices)

Available devices: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


## Load dataset

In [None]:
# Make sure that PATH points to the data provided by Ekstra Bladet
PATH = Path("/content/drive/MyDrive/Deepl learning/Jan_update/ebnerd_data")
DATASPLIT = "ebnerd_small"
DUMP_DIR = PATH.joinpath("ebnerd_predictions")
DUMP_DIR.mkdir(exist_ok=True, parents=True)

In [48]:
HISTORY_SIZE = 20
hparams_nrms.history_size = HISTORY_SIZE

In [None]:
# We just want to load the necessary columns
COLUMNS = [
    DEFAULT_USER_COL,
    DEFAULT_IMPRESSION_ID_COL,
    DEFAULT_IMPRESSION_TIMESTAMP_COL,
    DEFAULT_HISTORY_ARTICLE_ID_COL,
    DEFAULT_CLICKED_ARTICLES_COL,
    DEFAULT_INVIEW_ARTICLES_COL,
]
# This notebook is just a simple 'get-started'; we down sample the number of samples to just run quickly through it.
FRACTION = 0.1

In this example we sample a fraction of the dataset to keep it small. The training data is then split by time: impressions from the last day become the validation set, and all earlier impressions are used for training.

In [None]:
df = (
    ebnerd_from_path(
        PATH.joinpath(DATASPLIT, "train"),
        history_size=HISTORY_SIZE,
        padding=0,
    )
    .select(COLUMNS)
    .pipe(
        sampling_strategy_wu2019,
        npratio=4,
        shuffle=True,
        with_replacement=True,
        seed=123,
    )
    .pipe(create_binary_labels_column)
    .sample(fraction=FRACTION)
)

dt_split = pl.col(DEFAULT_IMPRESSION_TIMESTAMP_COL).max() - timedelta(days=1)
df_train = df.filter(pl.col(DEFAULT_IMPRESSION_TIMESTAMP_COL) < dt_split)
df_validation = df.filter(pl.col(DEFAULT_IMPRESSION_TIMESTAMP_COL) >= dt_split)

print(f"Train samples: {df_train.height}\nValidation samples: {df_validation.height}")

Train samples: 201537
Validation samples: 32740


| user_id | impression_id | impression_time | article_id_fixed | article_ids_clicked | article_ids_inview | labels |
|---|---|---|---|---|---|---|
| u32 | u32 | datetime[μs] | list[i32] | list[i64] | list[i64] | list[i8] |
| 526520 | 157014 | 2023-05-22 19:50:50 | [9758182, 9761469, … 9770799] | [9776442] | [9776394, 9776223, … 9776442] | [0, 0, … 1] |
| 526520 | 157016 | 2023-05-22 19:52:45 | [9758182, 9761469, … 9770799] | [9776234] | [9776322, 9776234, … 9220931] | [0, 1, … 0] |
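
The train/validation split above holds out the final day of impressions. A minimal sketch of the same idea in plain Python, assuming a handful of made-up impression timestamps (the notebook does this with Polars expressions on the real data):

```python
from datetime import datetime, timedelta

# Hypothetical impressions: (impression_id, impression_time)
impressions = [
    (1, datetime(2023, 5, 20, 8, 0)),
    (2, datetime(2023, 5, 21, 12, 30)),
    (3, datetime(2023, 5, 22, 7, 15)),
    (4, datetime(2023, 5, 22, 19, 50)),
]

# Cut-off: exactly one day before the latest impression.
latest = max(ts for _, ts in impressions)
cutoff = latest - timedelta(days=1)

# Everything before the cut-off is training data, the rest is validation.
train = [imp for imp in impressions if imp[1] < cutoff]
validation = [imp for imp in impressions if imp[1] >= cutoff]

print([i for i, _ in train], [i for i, _ in validation])
```

Splitting by time rather than at random keeps the validation impressions strictly later than the training impressions, which is closer to how a recommender is used in practice.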


### Test set
We'll use the validation split of the dataset as the test set.

In [51]:
df_test = (
    ebnerd_from_path(
        PATH.joinpath(DATASPLIT, "validation"),
        history_size=HISTORY_SIZE,
        padding=0,
    )
    .select(COLUMNS)
    .pipe(create_binary_labels_column)
    .sample(fraction=FRACTION)
)
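
Unlike the training frame, the test frame is not passed through `sampling_strategy_wu2019`, so every in-view article is kept as a candidate. For comparison, here is a rough sketch of negative sampling with `npratio=4`. It is a simplified stand-in, not the `ebrec` implementation; `sample_candidates` and the article IDs are hypothetical.

```python
import random


def sample_candidates(inview, clicked, npratio, rng):
    """Return one clicked article plus `npratio` non-clicked in-view articles.

    Negatives are drawn with replacement, so an impression with fewer than
    `npratio` non-clicked articles still yields a full candidate list.
    """
    negatives = [a for a in inview if a not in clicked]
    if not negatives:
        return None  # nothing to contrast the click against
    positive = clicked[0]
    sampled = [rng.choice(negatives) for _ in range(npratio)]
    candidates = [positive] + sampled
    rng.shuffle(candidates)  # avoid the positive always sitting at index 0
    labels = [1 if a == positive else 0 for a in candidates]
    return candidates, labels


rng = random.Random(123)
candidates, labels = sample_candidates(
    inview=[101, 102, 103], clicked=[102], npratio=4, rng=rng
)
print(candidates, labels)
```

With `npratio=4` every training row has exactly five candidates, which is what allows the training model to apply a softmax over a fixed-size candidate axis.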

## Load articles

In [52]:
df_articles = pl.read_parquet(PATH.joinpath(DATASPLIT+"/articles.parquet"))
df_articles.head(2)

article_id,title,subtitle,last_modified_time,premium,body,published_time,image_ids,article_type,url,ner_clusters,entity_groups,topics,category,subcategory,category_str,total_inviews,total_pageviews,total_read_time,sentiment_score,sentiment_label
i32,str,str,datetime[μs],bool,str,datetime[μs],list[i64],str,str,list[str],list[str],list[str],i16,list[i16],str,i32,i32,f32,f32,str
3001353,"""Natascha var ikke den første""","""Politiet frygter nu, at Natasc…",2023-06-29 06:20:33,False,"""Sagen om den østriske Natascha…",2006-08-31 08:06:45,[3150850],"""article_default""","""https://ekstrabladet.dk/krimi/…",[],[],"[""Kriminalitet"", ""Personfarlig kriminalitet""]",140,[],"""krimi""",,,,0.9955,"""Negative"""
3003065,"""Kun Star Wars tjente mere""","""Biografgængerne strømmer ind f…",2023-06-29 06:20:35,False,"""Vatikanet har opfordret til at…",2006-05-21 16:57:00,[3006712],"""article_default""","""https://ekstrabladet.dk/underh…",[],[],"[""Underholdning"", ""Film og tv"", ""Økonomi""]",414,"[433, 434]","""underholdning""",,,,0.846,"""Positive"""


In [None]:
# Prepare temporal features
def create_article_time_dict(df_articles: pl.DataFrame) -> Dict[int, datetime]:
    """Create lookup dictionary for article publishing times"""
    return dict(zip(
        df_articles["article_id"].to_list(),
        df_articles["published_time"].to_list()
    ))
article_time_dict = create_article_time_dict(df_articles)

In [54]:
def prepare_temporal_features(
    df: pl.DataFrame,
    article_time_dict: Dict[int, datetime],
    inview_col: str
) -> pl.DataFrame:
    """Add publication times, a reference date and time deltas (computed row-wise with map_elements)."""
    # Add published times
    df = df.with_columns([
        pl.col(inview_col).map_elements(
            lambda ids: [article_time_dict.get(article_id) for article_id in ids],
            return_dtype=pl.List(pl.Datetime)
        ).alias(f"published_time_{inview_col}")
    ])
    # Add reference date (latest date from inview articles)
    df = df.with_columns(
        pl.col(f"published_time_{inview_col}")
        .map_elements(
            lambda dates: max((d for d in dates if d), default=None),
            return_dtype=pl.Datetime
        )
        .alias("reference_date")
    )
    # Calculate time differences in seconds
    df = df.with_columns([
        pl.struct([f"published_time_{inview_col}", "reference_date"])
        .map_elements(
            lambda row: calculate_time_difference_seconds(
                row[f"published_time_{inview_col}"],
                row["reference_date"]
            ),
            return_dtype=pl.List(pl.Float64)
        ).alias("time_delta")
    ])
    return df
def calculate_time_difference_seconds(
    timestamps: List[Optional[datetime]],
    reference_time: datetime
) -> List[Optional[float]]:
    """Calculate time differences in seconds between timestamps and reference time."""
    return [
        (reference_time - timestamp).total_seconds()
        if timestamp else None
        for timestamp in timestamps
    ]
# Create article time dictionary
article_time_dict = create_article_time_dict(df_articles)
# Add temporal features to your datasets
df_train = prepare_temporal_features(
    df_train,
    article_time_dict,
    DEFAULT_INVIEW_ARTICLES_COL
)
df_validation = prepare_temporal_features(
    df_validation,
    article_time_dict,
    DEFAULT_INVIEW_ARTICLES_COL
)
df_test = prepare_temporal_features(
    df_test,
    article_time_dict,
    DEFAULT_INVIEW_ARTICLES_COL
)
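
A small worked example of the time-delta computation above, for a single impression with made-up publication times. The reference date is the most recent known publication time among the in-view articles, and each delta is the distance to that reference in seconds:

```python
from datetime import datetime

# Hypothetical publication times for the in-view articles of one impression;
# None marks an article whose publication time is unknown.
published = [
    datetime(2023, 5, 22, 18, 0, 0),
    datetime(2023, 5, 22, 12, 0, 0),
    None,
    datetime(2023, 5, 21, 18, 0, 0),
]

# Reference date: the most recent known publication time in the impression.
reference = max(p for p in published if p is not None)

# Time delta in seconds; unknown publication times stay None.
time_delta = [
    (reference - p).total_seconds() if p is not None else None
    for p in published
]
print(time_delta)
```

The newest article always gets a delta of zero, so the feature measures age relative to the freshest candidate in the same impression, not relative to the impression timestamp.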


## Init model using HuggingFace's tokenizer and word embedding
The original implementation uses GloVe embeddings and the matching tokenizer. To get going fast, we use a multilingual transformer model from Hugging Face instead:
its tokenizer tokenizes the articles, and its word embeddings initialize NRMS.
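
As a rough picture of what encoding text to a fixed length and looking up word embeddings involves, here is a toy sketch. The whitespace tokenizer, the vocabulary, `PAD_ID`, `UNK_ID` and the two-dimensional embedding matrix are all made up; the notebook relies on the Hugging Face tokenizer and embedding matrix instead.

```python
MAX_LENGTH = 6  # toy value; the notebook uses a longer maximum length
PAD_ID = 0      # hypothetical padding id

# Hypothetical toy vocabulary; a real tokenizer works on subword units.
vocab = {"politiet": 1, "frygter": 2, "nu": 3, "star": 4, "wars": 5}
UNK_ID = len(vocab) + 1  # id for words outside the vocabulary


def encode(text, max_length=MAX_LENGTH):
    """Map words to ids, then truncate or pad to exactly `max_length`."""
    ids = [vocab.get(token, UNK_ID) for token in text.lower().split()]
    ids = ids[:max_length]
    return ids + [PAD_ID] * (max_length - len(ids))


# Toy embedding matrix: one row per token id (including PAD and UNK).
embedding_matrix = [[float(i), float(i) / 10] for i in range(UNK_ID + 1)]

encoded = encode("Politiet frygter nu")
vectors = [embedding_matrix[i] for i in encoded]
print(encoded)
```

Every article ends up as a fixed-length list of integer ids, and the embedding layer of the news encoder turns each id into a row of the embedding matrix.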


In [55]:
TRANSFORMER_MODEL_NAME = "FacebookAI/xlm-roberta-base"
TEXT_COLUMNS_TO_USE = [DEFAULT_SUBTITLE_COL, DEFAULT_TITLE_COL]
MAX_TITLE_LENGTH = 30

# LOAD HUGGINGFACE:
transformer_model = AutoModel.from_pretrained(TRANSFORMER_MODEL_NAME)
transformer_tokenizer = AutoTokenizer.from_pretrained(TRANSFORMER_MODEL_NAME)

# We'll init the word embeddings using the transformer model's word embeddings
word2vec_embedding = get_transformers_word_embeddings(transformer_model)
#
df_articles, cat_cal = concat_str_columns(df_articles, columns=TEXT_COLUMNS_TO_USE)
df_articles, token_col_title = convert_text2encoding_with_transformers(
    df_articles, transformer_tokenizer, cat_cal, max_length=MAX_TITLE_LENGTH
)
# =>
article_mapping = create_article_id_to_value_mapping(
    df=df_articles, value_col=token_col_title
)

# Initiate the customized NRMSTemporal dataloaders


In [56]:
from dataclasses import dataclass, field
import tensorflow as tf
import polars as pl
import numpy as np

from ebrec.utils._articles_behaviors import map_list_article_id_to_value
from ebrec.utils._python import (
    repeat_by_list_values_from_matrix,
    create_lookup_objects,
)

from ebrec.utils._constants import (
    DEFAULT_INVIEW_ARTICLES_COL,
    DEFAULT_LABELS_COL,
    DEFAULT_USER_COL,
)


In [None]:
@dataclass
class NRMSTemporalDataLoader(NewsrecDataLoader):
    """DataLoader for NRMS model with temporal features.

    This dataloader handles both the article content and temporal features,
    ensuring proper shape and normalization of time-based signals.

    Attributes:
        behaviors (pl.DataFrame): DataFrame containing user behaviors
        history_column (str): Name of column containing user history
        article_dict (dict): Dictionary mapping article IDs to their embeddings
        unknown_representation (str): How to handle unknown articles
        eval_mode (bool): Whether in evaluation mode
        batch_size (int): Size of batches
        inview_col (str): Column name for candidate articles
        labels_col (str): Column name for labels
        user_col (str): Column name for user IDs
    """

    def transform(self, df: pl.DataFrame) -> pl.DataFrame:
        """Transform article IDs to their corresponding embeddings."""
        return df.pipe(
            map_list_article_id_to_value,
            behaviors_column=self.history_column,
            mapping=self.lookup_article_index,
            fill_nulls=self.unknown_index,
            drop_nulls=False,
        ).pipe(
            map_list_article_id_to_value,
            behaviors_column=self.inview_col,
            mapping=self.lookup_article_index,
            fill_nulls=self.unknown_index,
            drop_nulls=False,
        )

    def normalize_time_deltas(self, time_deltas: np.ndarray, eval_mode: bool = False) -> np.ndarray:
        """Normalize time deltas and ensure correct shape.

        Args:
            time_deltas: Array of time differences in seconds
            eval_mode: Whether in evaluation mode (affects reshaping)

        Returns:
            Normalized time deltas with proper shape
        """
        # Add small epsilon to avoid division by zero
        epsilon = 1e-10

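        # Cast to float so that missing values (None) become NaN
        time_deltas = np.asarray(time_deltas, dtype=np.float64)

        # Replace NaN values with the maximum time delta in the batch
        max_delta = np.nanmax(time_deltas) + epsilon
        time_deltas = np.nan_to_num(time_deltas, nan=max_delta)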

        # Normalize to [0, 1] range using log-scale normalization
        # Adding 1 to avoid log(0) and to make very recent items close to 0
        normalized = np.log1p(time_deltas) / np.log1p(max_delta)

        # Shape handling
        if normalized.ndim == 1:
            normalized = normalized.reshape(-1, 1)

        return normalized

    def __getitem__(self, idx) -> tuple[tuple[np.ndarray], np.ndarray]:
        """Get a batch of data.

        Args:
            idx: Batch index

        Returns:
            Tuple containing:
                - his_input_title: User history article embeddings
                - pred_input_title: Candidate article embeddings
                - time_deltas: Normalized time differences
                - batch_y: Labels
        """
        batch_X = self.X[idx * self.batch_size : (idx + 1) * self.batch_size].pipe(
            self.transform
        )
        batch_y = self.y[idx * self.batch_size : (idx + 1) * self.batch_size]

        if self.eval_mode:
            # Evaluation mode - process all candidates
            repeats = np.array(batch_X["n_samples"])
            batch_y = np.array(batch_y.explode().to_list()).reshape(-1, 1)

            # Process history
            his_input_title = repeat_by_list_values_from_matrix(
                batch_X[self.history_column].to_list(),
                matrix=self.lookup_article_matrix,
                repeats=repeats,
            )

            # Process candidates
            pred_input_title = self.lookup_article_matrix[
                batch_X[self.inview_col].explode().to_list()
            ]

            # Process time deltas
            time_deltas = np.array(batch_X["time_delta"].explode().to_list())
            time_deltas = self.normalize_time_deltas(time_deltas, eval_mode=True)

        else:
            # Training mode - process fixed number of candidates
            batch_y = np.array(batch_y.to_list())

            # Process history
            his_input_title = self.lookup_article_matrix[
                batch_X[self.history_column].to_list()
            ]

            # Process candidates
            pred_input_title = self.lookup_article_matrix[
                batch_X[self.inview_col].to_list()
            ]
            pred_input_title = np.squeeze(pred_input_title, axis=2)

            # Process time deltas
            time_deltas = np.array(batch_X["time_delta"].to_list())
            time_deltas = self.normalize_time_deltas(time_deltas)

        # Final shape adjustments
        his_input_title = np.squeeze(his_input_title, axis=2)

        # Ensure time_deltas matches pred_input_title shape for training mode
        if not self.eval_mode:
            # Reshape time_deltas to match pred_input_title: (batch_size, n_candidates, 1)
            time_deltas = time_deltas.reshape(pred_input_title.shape[0], -1, 1)
        else:
            # For eval mode, maintain the proper shape based on all candidates
            time_deltas = time_deltas.reshape(-1, 1, 1)

        return (his_input_title, pred_input_title, time_deltas), batch_y
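
The log-scale normalization used by `normalize_time_deltas` can be checked by hand. The sketch below mirrors the same formula, `log1p(delta) / log1p(max_delta)`, in plain Python on a few made-up deltas; `normalize` is a hypothetical stand-in for the NumPy version in the class.

```python
import math


def normalize(time_deltas, epsilon=1e-10):
    """Log-scale time deltas (in seconds) into the range [0, 1]."""
    known = [d for d in time_deltas if d is not None]
    max_delta = max(known) + epsilon
    # Missing deltas are treated like the oldest article in the batch.
    filled = [max_delta if d is None else d for d in time_deltas]
    return [math.log1p(d) / math.log1p(max_delta) for d in filled]


# 0 s, 1 hour, unknown, 1 day
normalized = normalize([0.0, 3600.0, None, 86400.0])
print(normalized)
```

Note that the maximum is taken over whatever batch is passed in, so the same raw delta can map to slightly different normalized values in different batches.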

In [None]:
BATCH_SIZE = 64
train_dataloader = NRMSTemporalDataLoader(
    behaviors=df_train,
    article_dict=article_mapping,
    unknown_representation="zeros",
    history_column=DEFAULT_HISTORY_ARTICLE_ID_COL,
    eval_mode=False,
    batch_size=BATCH_SIZE,
)
val_dataloader = NRMSTemporalDataLoader(
    behaviors=df_validation,
    article_dict=article_mapping,
    unknown_representation="zeros",
    history_column=DEFAULT_HISTORY_ARTICLE_ID_COL,
    eval_mode=False,
    batch_size=BATCH_SIZE,
)
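
Both loaders above run in training mode, where every row has the same number of candidates. In evaluation mode (`eval_mode=True`) the loader instead flattens each impression into one row per candidate and repeats the user history for each of them. A minimal sketch of that flattening with made-up data:

```python
# Two hypothetical impressions with a different number of candidates each.
impressions = [
    {"history": ["h1", "h2"], "inview": ["a", "b", "c"], "labels": [0, 1, 0]},
    {"history": ["h3", "h4"], "inview": ["d", "e"], "labels": [1, 0]},
]

histories, candidates, labels = [], [], []
for imp in impressions:
    for article, label in zip(imp["inview"], imp["labels"]):
        histories.append(imp["history"])  # history repeated once per candidate
        candidates.append([article])      # exactly one candidate per row
        labels.append([label])

print(len(histories), candidates, labels)
```

This is why the evaluation loader can handle impressions of any length: the variable-length candidate axis is folded into the batch axis.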

# Create the model

## Create the customized Temporal Layer

In [60]:
class TemporalLayer(tf.keras.layers.Layer):
    """Custom layer to learn temporal relationships in news recommendations.
    This layer takes time differences as input and learns a temporal weighting function.
    Instead of using a fixed exponential decay, it allows the model to learn the optimal
    temporal weighting scheme.
    """
    def __init__(self, units=64, activation='relu', **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = tf.keras.activations.get(activation)
        self.dropout_rate = 0.2
    def build(self, input_shape):
        # Create trainable weights for temporal transformation
        self.temporal_transform = tf.keras.layers.Dense(
            self.units,
            activation=self.activation,
            kernel_initializer='glorot_uniform',
            name='temporal_transform'
        )
        # First hidden layer on top of the temporal transform
        self.temporal_intermediate = tf.keras.layers.Dense(
            400,
            activation=self.activation,  # ReLU by default: non-negative, not bounded by 1
            kernel_initializer='glorot_uniform',
            name='temporal_intermediate'
        )
        # Second hidden layer
        self.temporal_intermediate_2 = tf.keras.layers.Dense(
            400,
            activation=self.activation,  # same activation as the first hidden layer
            kernel_initializer='glorot_uniform',
            name='temporal_intermediate_2'
        )
        # Final projection to scalar weight
        self.temporal_weight = tf.keras.layers.Dense(
            1,
            activation='sigmoid',  # Ensure output is between 0 and 1
            kernel_initializer='glorot_uniform',
            name='temporal_weight'
        )
        self.temporal_dropout = tf.keras.layers.Dropout(self.dropout_rate)

        super().build(input_shape)
    def call(self, inputs, training=None):
        # inputs shape: (batch_size, sequence_length, 1)
        # Transform temporal features through MLP
        x = self.temporal_transform(inputs)  # (batch_size, sequence_length, units)
        x = self.temporal_intermediate(x)
        x = self.temporal_dropout(x)
        x = self.temporal_intermediate_2(x)
        x = self.temporal_dropout(x)
        # Project to temporal weights
        temporal_weights = self.temporal_weight(x)  # (batch_size, sequence_length, 1)

        temporal_weights = tf.ensure_shape(temporal_weights, (None, None, 1))
        temporal_weights_400 = tf.tile(temporal_weights, [1, 1, 400])  # Shape: (batch_size, sequence_length, 400)
        temporal_weights_400 = tf.ensure_shape(temporal_weights_400, (None, None, 400))

        return temporal_weights_400  # Will be used for multiplication with news embeddings
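
The forward pass of the layer above boils down to a small MLP that ends in a sigmoid, followed by copying the resulting scalar across the embedding dimension. The sketch below shows that structure in plain Python with one hidden layer and hand-picked weights; all weights, sizes and the helper names are hypothetical and much smaller than in the notebook.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """Fully connected layer; `weights` holds one list of input weights per output unit."""
    return [
        activation(sum(i * w for i, w in zip(inputs, unit_weights)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


# Hypothetical parameters: 1 input -> 3 hidden units -> 1 output.
hidden_w = [[1.5], [-2.0], [0.5]]
hidden_b = [0.0, 1.0, -0.1]
out_w = [[-1.0, 0.8, -0.5]]
out_b = [0.2]


def temporal_weight(normalized_delta, width=4):
    """Map one normalized time delta to a weight in (0, 1), tiled `width` times."""
    hidden = dense([normalized_delta], hidden_w, hidden_b, relu)
    weight = dense(hidden, out_w, out_b, sigmoid)[0]
    return [weight] * width  # same role as tf.tile over the last axis


print(temporal_weight(0.0), temporal_weight(1.0))
```

With these particular weights a recent article (delta 0) gets a larger weight than an old one (delta 1), but nothing in the architecture enforces that; the trained layer is free to learn any shape.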

In [None]:
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
from ebrec.models.newsrec.layers import AttLayer2, SelfAttention
import tensorflow as tf
import numpy as np
from tensorflow.keras.layers import Embedding, Input, Dropout, Dense, BatchNormalization
from tensorflow.keras.initializers import GlorotUniform
from tensorflow.keras.regularizers import l2
class NRMSTemporalModel:
    """NRMS model(Neural News Recommendation with Multi-Head Self-Attention)
    Chuhan Wu, Fangzhao Wu, Suyu Ge, Tao Qi, Yongfeng Huang,and Xing Xie, "Neural News
    Recommendation with Multi-Head Self-Attention" in Proceedings of the 2019 Conference
    on Empirical Methods in Natural Language Processing and the 9th International Joint Conference
    on Natural Language Processing (EMNLP-IJCNLP)
    Attributes:
    """
    def __init__(
        self,
        hparams: dict,
        word2vec_embedding: np.ndarray = None,
        word_emb_dim: int = 300,
        vocab_size: int = 32000,
        seed: int = None,
    ):
        """Initialization steps for NRMS."""
        self.hparams = hparams
        self.seed = seed
        # SET SEED:
        tf.random.set_seed(seed)
        np.random.seed(seed)
        # INIT THE WORD-EMBEDDINGS:
        if word2vec_embedding is None:
            # Xavier Initialization
            initializer = GlorotUniform(seed=self.seed)
            self.word2vec_embedding = initializer(shape=(vocab_size, word_emb_dim))
            # self.word2vec_embedding = np.random.rand(vocab_size, word_emb_dim)
        else:
            self.word2vec_embedding = word2vec_embedding
        # BUILD AND COMPILE MODEL:
        self.model, self.scorer = self._build_graph()
        data_loss = self._get_loss(self.hparams.loss)
        train_optimizer = self._get_opt(
            optimizer=self.hparams.optimizer, lr=self.hparams.learning_rate
        )
        self.model.compile(loss=data_loss, optimizer=train_optimizer)
    def _get_loss(self, loss: str):
        """Make loss function, consists of data loss and regularization loss
        Returns:
            object: Loss function or loss function name
        """
        if loss == "cross_entropy_loss":
            data_loss = "categorical_crossentropy"
        elif loss == "log_loss":
            data_loss = "binary_crossentropy"
        else:
            raise ValueError(f"this loss not defined {loss}")
        return data_loss
    def _get_opt(self, optimizer: str, lr: float):
        """Get the optimizer according to configuration. Usually we will use Adam.
        Returns:
            object: An optimizer.
        """
        # TODO: shouldn't be a string input you should just set the optimizer, to avoid stuff like this:
        # => 'WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.'
        if optimizer == "adam":
            train_opt = tf.keras.optimizers.Adam(learning_rate=lr)
        else:
            raise ValueError(f"this optimizer not defined {optimizer}")
        return train_opt
    def _build_graph(self):
        """Build NRMS model and scorer.
        Returns:
            object: a model used to train.
            object: a model used to evaluate and inference.
        """
        model, scorer = self._build_nrms()
        return model, scorer
    def _build_userencoder(self, titleencoder):
        """The main function to create user encoder of NRMS.
        Args:
            titleencoder (object): the news encoder of NRMS.
        Return:
            object: the user encoder of NRMS.
        """
        his_input_title = tf.keras.Input(
            shape=(self.hparams.history_size, self.hparams.title_size), dtype="int32"
        )
        click_title_presents = tf.keras.layers.TimeDistributed(titleencoder)(
            his_input_title
        )
        y = SelfAttention(self.hparams.head_num, self.hparams.head_dim, seed=self.seed)(
            [click_title_presents] * 3
        )
        user_present = AttLayer2(self.hparams.attention_hidden_dim, seed=self.seed)(y)
        model = tf.keras.Model(his_input_title, user_present, name="user_encoder")
        return model
    def _build_newsencoder(self):
        """The main function to create news encoder of NRMS.
        Args:
            embedding_layer (object): a word embedding layer.
        Return:
            object: the news encoder of NRMS.
        """
        embedding_layer = tf.keras.layers.Embedding(
            self.word2vec_embedding.shape[0],
            self.word2vec_embedding.shape[1],
            weights=[self.word2vec_embedding],
            trainable=True,
        )
        sequences_input_title = tf.keras.Input(
            shape=(self.hparams.title_size,), dtype="int32"
        )
        embedded_sequences_title = embedding_layer(sequences_input_title)
        y = tf.keras.layers.Dropout(self.hparams.dropout)(embedded_sequences_title)
        y = SelfAttention(self.hparams.head_num, self.hparams.head_dim, seed=self.seed)(
            [y, y, y]
        )
        # Create configurable Dense layers:
        for layer in [400, 400, 400]:
            y = tf.keras.layers.Dense(units=layer, activation="relu")(y)
            y = tf.keras.layers.BatchNormalization()(y)
            y = tf.keras.layers.Dropout(self.hparams.dropout)(y)
        y = tf.keras.layers.Dropout(self.hparams.dropout)(y)
        pred_title = AttLayer2(self.hparams.attention_hidden_dim, seed=self.seed)(y)
        model = tf.keras.Model(sequences_input_title, pred_title, name="news_encoder")
        return model
    def _build_nrms(self):
        """Build NRMS model with learned temporal features.

        Instead of using pre-computed temporal discounts, this version learns
        temporal relationships from raw time differences.
        """
        # Input layers
        his_input_title = tf.keras.Input(
            shape=(self.hparams.history_size, self.hparams.title_size),
            dtype="int32",
        )
        pred_input_title = tf.keras.Input(
            shape=(None, self.hparams.title_size),
            dtype="int32",
        )
        pred_input_title_one = tf.keras.Input(
            shape=(1, self.hparams.title_size),
            dtype="int32",
        )

        # Time delta inputs (now just raw time differences)
        time_delta = tf.keras.Input(
            shape=(None, 1), dtype="float32"
        )
        time_delta_one = tf.keras.Input(
            shape=(1, 1), dtype="float32"
        )

        # Reshape single prediction input
        pred_title_one_reshape = tf.keras.layers.Reshape(
            (self.hparams.title_size,)
        )(pred_input_title_one)

        # Build encoders
        titleencoder = self._build_newsencoder()
        self.userencoder = self._build_userencoder(titleencoder)
        self.newsencoder = titleencoder

        # Get user representation
        user_present = self.userencoder(his_input_title)

        # Get news representations
        news_present = tf.keras.layers.TimeDistributed(self.newsencoder)(
            pred_input_title
        )
        news_present_one = self.newsencoder(pred_title_one_reshape)

        # Create temporal layer
        temporal_layer = TemporalLayer(units=64, name='temporal_layer')

        # Learn temporal weights and apply them
        temporal_weights = temporal_layer(time_delta)
        print(temporal_weights)

        temporal_weights_one = temporal_layer(time_delta_one)

        # Apply temporal weights to news representations
        news_present = tf.keras.layers.Multiply()([news_present, temporal_weights])
        news_present_one = tf.keras.layers.Multiply()([news_present_one, temporal_weights_one])

        # Compute final predictions
        preds = tf.keras.layers.Dot(axes=-1)([news_present, user_present])
        preds = tf.keras.layers.Activation(activation="softmax")(preds)
        pred_one = tf.keras.layers.Dot(axes=-1)([news_present_one, user_present])
        pred_one = tf.keras.layers.Activation(activation="sigmoid")(pred_one)

        # Create models
        model = tf.keras.Model(
            [his_input_title, pred_input_title, time_delta],
            preds
        )
        scorer = tf.keras.Model(
            [his_input_title, pred_input_title_one, time_delta_one],
            pred_one
        )
        return model, scorer
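
The two models returned above share their weights but differ in the final activation: the training model applies a softmax across all candidates of an impression, while the scorer applies a sigmoid to a single candidate. A minimal sketch of the difference, with made-up user and news vectors:

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


user = [1.0, 0.5]                                   # hypothetical user vector
candidates = [[2.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]  # hypothetical news vectors

raw = [dot(c, user) for c in candidates]

train_probs = softmax(raw)                # training model: candidates compete
score_probs = [sigmoid(r) for r in raw]   # scorer: each candidate on its own

print(train_probs, score_probs)
```

Both activations are monotonic in the raw dot product, so they rank the candidates of one impression identically; only the scale of the outputs differs.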



In [None]:
model = NRMSTemporalModel(
    hparams=hparams_nrms,
    word2vec_embedding=word2vec_embedding,
    seed=42,
)
model.model.compile(
    optimizer=model.model.optimizer,
    loss=model.model.loss,
    metrics=["AUC"],
)

<KerasTensor shape=(None, None, 400), dtype=float32, sparse=False, name=keras_tensor_73>


In [63]:
# Print model summary
def print_model_summary(model):
    """Print detailed summary of model architecture"""
    # Print overall model summary
    print("Overall Model Summary:")
    model.model.summary()
    # Print individual component summaries
    print("\nNews Encoder Summary:")
    model.newsencoder.summary()
    print("\nUser Encoder Summary:")
    model.userencoder.summary()

# Plot model architecture
def plot_model_architecture(model, filename="nrms_model.png"):
    """Save visualization of model architecture"""
    tf.keras.utils.plot_model(
        model.model,
        to_file=filename,
        show_shapes=True,
        show_layer_names=True,
        rankdir="TB",
        expand_nested=True,
        dpi=96,
    )

# Usage:
print_model_summary(model)
plot_model_architecture(model)

Overall Model Summary:



News Encoder Summary:



User Encoder Summary:


## Train the model


In [59]:
# List all physical devices
physical_devices = tf.config.list_physical_devices()
print("Available devices:", physical_devices)

Available devices: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


In [None]:
MODEL_NAME = model.__class__.__name__
MODEL_WEIGHTS = DUMP_DIR.joinpath(f"state_dict/{MODEL_NAME}/mini_0_1_fraction_small_dataset_with_normalization.weights.h5")
LOG_DIR = DUMP_DIR.joinpath(f"runs/{MODEL_NAME}")


# Earlystopping:
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_AUC",
    mode="max",
    patience=3,
    restore_best_weights=True,
)

# ModelCheckpoint:
modelcheckpoint = tf.keras.callbacks.ModelCheckpoint(
    filepath=MODEL_WEIGHTS,
    monitor="val_AUC",
    mode="max",
    save_best_only=False,
    save_weights_only=True,
    verbose=1,
)

# Learning rate scheduler:
lr_scheduler = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_AUC",
    mode="max",
    factor=0.2,
    patience=2,
    min_lr=1e-6,
)

callbacks = [early_stopping, modelcheckpoint, lr_scheduler]#tensorboard_callback
USE_CALLBACKS = True
EPOCHS = 4

hist = model.model.fit(
    train_dataloader,
    validation_data=val_dataloader,
    epochs=EPOCHS,
    callbacks=callbacks if USE_CALLBACKS else [],
)

/content/drive/MyDrive/Deepl learning/Jan_update/ebnerd_data/ebnerd_predictions/state_dict/NRMSTemporalModel/mini_0_1_fraction_small_dataset_with_normalization.weights.h5
Epoch 1/4

  self._warn_if_super_not_called()

394/394 ━━━━━━━━━━━━━━━━━━━━ 0s 397ms/step - AUC: 0.5505 - loss: 1.7850
Epoch 1: saving model to /content/drive/MyDrive/Deepl learning/Jan_update/ebnerd_data/ebnerd_predictions/state_dict/NRMSTemporalModel/mini_0_1_fraction_small_dataset_with_normalization.weights.h5
394/394 ━━━━━━━━━━━━━━━━━━━━ 316s 493ms/step - AUC: 0.5506 - loss: 1.7846 - val_AUC: 0.6002 - val_loss: 1.7131 - learning_rate: 1.0000e-04
Epoch 2/4
394/394 ━━━━━━━━━━━━━━━━━━━━ 0s 158ms/step - AUC: 0.7077 - loss: 1.4276
Epoch 2: saving model to /content/drive/MyDrive/Deepl learning/Jan_update/ebnerd_data/ebnerd_predictions/state_dict/NRMSTemporalModel/mini_0_1_fraction_small_dataset_with_normalization.weights.h5
394/394 ━━━━━━━━━━━━━━━━━━━━ 78s 197ms/step - AUC: 0.7077 - loss: 1.4276 - val_AUC: 0.6485 - val_loss: 1.5866 - learning_rate: 1.0000e-04
Epoch 3/4
394/394 ━━━━━━━━━━

In [None]:
if USE_CALLBACKS:
    _ = model.model.load_weights(filepath=MODEL_WEIGHTS)
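
The learning-rate callback configured above (`factor=0.2`, `patience=2`, `min_lr=1e-6`) can be illustrated with a simplified, framework-free re-implementation. This is a sketch, not Keras code: `simulate_reduce_lr` is a hypothetical helper, the validation AUC values are made up for illustration, and the `min_delta=1e-4` threshold and the absence of a cooldown are assumed to match the Keras defaults.

```python
def simulate_reduce_lr(val_aucs, initial_lr=1e-4, factor=0.2, patience=2,
                       min_lr=1e-6, min_delta=1e-4):
    """Return the learning rate in effect after each epoch (mode="max").

    Simplified stand-in for ReduceLROnPlateau (assumption: no cooldown).
    """
    lr = initial_lr
    best = float("-inf")
    wait = 0  # number of consecutive epochs without improvement
    history = []
    for auc in val_aucs:
        if auc > best + min_delta:
            # Validation AUC improved: remember it and reset the counter.
            best = auc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: shrink the learning rate, but not below min_lr.
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


# Made-up validation AUCs: two improving epochs, then a plateau.
illustrative_aucs = [0.60, 0.65, 0.64, 0.63, 0.62, 0.61]
lrs = simulate_reduce_lr(illustrative_aucs)
for epoch, (auc, lr) in enumerate(zip(illustrative_aucs, lrs), start=1):
    print(f"epoch {epoch}: val_AUC={auc:.2f} -> lr={lr:.1e}")
```

Under these assumptions, a run with `EPOCHS = 4` can trigger at most one reduction, and only if validation AUC fails to improve in two consecutive epochs.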

# Compute some metrics on the test dataset

In [67]:
BATCH_SIZE_TEST = 512

test_dataloader = NRMSTemporalDataLoader(
    behaviors=df_test,
    article_dict=article_mapping,
    unknown_representation="zeros",
    history_column=DEFAULT_HISTORY_ARTICLE_ID_COL,
    eval_mode=True,
    batch_size=BATCH_SIZE_TEST,
)

In [None]:
pred_test = model.scorer.predict(test_dataloader)

420/478 ━━━━━━━━━━━━━━━━━━━━ 10:11 11s/step

## Add the predictions to the dataframe

In [None]:
from typing import Any, Iterable
from pathlib import Path
from tqdm import tqdm
import warnings
import datetime
import inspect


from ebrec.utils._polars import (
    slice_join_dataframes,
    _check_columns_in_df,
    drop_nulls_from_list,
    generate_unique_name,
    shuffle_list_column,
)
import polars as pl

from ebrec.utils._constants import *
from ebrec.utils._python import create_lookup_dict


def add_prediction_scores(
    df: pl.DataFrame,
    scores: Iterable[float],
    prediction_scores_col: str = "scores",
    inview_col: str = DEFAULT_INVIEW_ARTICLES_COL,
) -> pl.DataFrame:
    """
    Adds prediction scores to a DataFrame for the corresponding test predictions.

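    Args:
        df (pl.DataFrame): The DataFrame to which the prediction scores will be added.
        scores (Iterable[float]): A list, array, or similar of prediction scores for the test data.
        prediction_scores_col (str): Name of the column that will hold the scores. Defaults to "scores".
        inview_col (str): Name of the list column with the in-view article IDs.
            Defaults to DEFAULT_INVIEW_ARTICLES_COL.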

    Returns:
        pl.DataFrame: The DataFrame with the prediction scores added.

    Raises:
        ValueError: If there is a mismatch in the lengths of the list columns.

    >>> from ebrec.utils._constants import DEFAULT_INVIEW_ARTICLES_COL
    >>> df = pl.DataFrame(
            {
                "id": [1,2],
                DEFAULT_INVIEW_ARTICLES_COL: [
                    [1, 2, 3],
                    [4, 5],
                ],
            }
        )
    >>> test_prediction = [[0.3], [0.4], [0.5], [0.6], [0.7]]
    >>> add_prediction_scores(df.lazy(), test_prediction).collect()
        shape: (2, 3)
        ┌─────┬─────────────┬────────────────────────┐
        │ id  ┆ article_ids ┆ prediction_scores_test │
        │ --- ┆ ---         ┆ ---                    │
        │ i64 ┆ list[i64]   ┆ list[f32]              │
        ╞═════╪═════════════╪════════════════════════╡
        │ 1   ┆ [1, 2, 3]   ┆ [0.3, 0.4, 0.5]        │
        │ 2   ┆ [4, 5]      ┆ [0.6, 0.7]             │
        └─────┴─────────────┴────────────────────────┘
    ## The input can also be an np.array
    >>> add_prediction_scores(df.lazy(), np.array(test_prediction)).collect()
        shape: (2, 3)
        ┌─────┬─────────────┬────────────────────────┐
        │ id  ┆ article_ids ┆ prediction_scores_test │
        │ --- ┆ ---         ┆ ---                    │
        │ i64 ┆ list[i64]   ┆ list[f32]              │
        ╞═════╪═════════════╪════════════════════════╡
        │ 1   ┆ [1, 2, 3]   ┆ [0.3, 0.4, 0.5]        │
        │ 2   ┆ [4, 5]      ┆ [0.6, 0.7]             │
        └─────┴─────────────┴────────────────────────┘
    """
    GROUPBY_ID = generate_unique_name(df.columns, "_groupby_id")
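    # Explode the in-view lists to one row per candidate article, attach the
    # flattened scores, then regroup by the original row index so that each
    # impression gets back one list of scores.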
    scores = (
        df.lazy()
        .select(pl.col(inview_col))
        .with_row_index(GROUPBY_ID)
        .explode(inview_col)
        .with_columns(pl.Series(prediction_scores_col, scores).explode())
        .group_by(GROUPBY_ID)
        .agg(inview_col, prediction_scores_col)
        .sort(GROUPBY_ID)
        .collect()
    )
    return df.with_columns(scores.select(prediction_scores_col))


In [None]:
df_test = add_prediction_scores(df_test, pred_test)
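
The regrouping step can be illustrated without polars. The sketch below is a plain-Python analogue, not the function above: `regroup_scores` is a hypothetical helper that slices a flat sequence of scores into one list per impression, using the length of each in-view list.

```python
def regroup_scores(inview_lists, flat_scores):
    """Split a flat score sequence into one list per impression.

    Hypothetical helper for illustration; assumes the scores are ordered
    exactly like the flattened in-view lists.
    """
    scores = list(flat_scores)
    expected = sum(len(inview) for inview in inview_lists)
    if len(scores) != expected:
        raise ValueError(
            f"Got {len(scores)} scores for {expected} in-view articles"
        )
    grouped = []
    start = 0
    for inview in inview_lists:
        end = start + len(inview)
        grouped.append(scores[start:end])  # scores belonging to this impression
        start = end
    return grouped


inview = [[1, 2, 3], [4, 5]]
flat = [0.3, 0.4, 0.5, 0.6, 0.7]
print(regroup_scores(inview, flat))  # [[0.3, 0.4, 0.5], [0.6, 0.7]]
```

Like the polars version, this only gives correct results if the predictions arrive in the same order as the flattened in-view articles.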

### Compute metrics

In [71]:
metrics = MetricEvaluator(
    labels=df_test["labels"].to_list(),
    predictions=df_test["scores"].to_list(),
    metric_functions=[AucScore(), MrrScore(), NdcgScore(k=5), NdcgScore(k=10)],
)
metrics.evaluate()

AUC: 100%|█████████████████████████████| 244647/244647 [05:49<00:00, 699.64it/s]
AUC: 100%|███████████████████████████| 244647/244647 [00:06<00:00, 40561.90it/s]
AUC: 100%|███████████████████████████| 244647/244647 [00:13<00:00, 17681.06it/s]
AUC: 100%|███████████████████████████| 244647/244647 [00:14<00:00, 17434.03it/s]


<MetricEvaluator class>: 
 {
    "auc": 0.639020025664197,
    "mrr": 0.413061618053277,
    "ndcg@5": 0.46424257054059764,
    "ndcg@10": 0.5231771458662604
}
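
To make the reported numbers easier to interpret, here is a minimal sketch of the three metrics for a single impression, using their standard textbook definitions in plain Python. The helper names are hypothetical, and the code is not the `ebrec` implementation, which may differ in details such as tie handling.

```python
import math


def auc_single(labels, scores):
    """Fraction of (clicked, non-clicked) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _ranked_labels(labels, scores):
    """Labels reordered from highest to lowest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def mrr_single(labels, scores):
    """Reciprocal rank of the first clicked article (0.0 if none was clicked)."""
    for rank, y in enumerate(_ranked_labels(labels, scores), start=1):
        if y == 1:
            return 1.0 / rank
    return 0.0


def ndcg_single(labels, scores, k):
    """DCG of the top-k ranking divided by the DCG of the ideal ranking."""
    def dcg(ranked):
        return sum(y / math.log2(pos + 2) for pos, y in enumerate(ranked[:k]))

    ideal = dcg(sorted(labels, reverse=True))
    return dcg(_ranked_labels(labels, scores)) / ideal if ideal > 0 else 0.0


# One impression with three candidates; the clicked article is ranked second.
labels = [0, 1, 0]
scores = [0.9, 0.4, 0.2]
print(auc_single(labels, scores))        # 0.5
print(mrr_single(labels, scores))        # 0.5
print(ndcg_single(labels, scores, k=5))  # about 0.631
```

The progress bars above iterate over 244,647 impressions, which suggests that each reported value is an average of such per-impression scores.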

# Plot the resulting Temporal Layer

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

def analyze_temporal_weights(nrms_model, hours_range=(0, 1), num_points=100):
    """Analyze and visualize the temporal weights learned by the model.

    Args:
        nrms_model: Instance of NRMSTemporalModel
        hours_range: Tuple of (min_hours, max_hours) to analyze
        num_points: Number of points to sample within the hours range

    Returns:
        tuple: (time_points, weights) - Arrays containing the analyzed data
    """
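    # Sample evenly spaced time deltas. The variable is called `hours`, but the
    # values must be in the unit the model was trained on; the default (0, 1)
    # range corresponds to the normalized scale shown on the x-axis.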
    hours = np.linspace(hours_range[0], hours_range[1], num_points)

    # Get the temporal layer from the model
    # First, get the base Keras model
    model = nrms_model.model

    # Find the temporal layer
    temporal_layer = None
    for layer in model.layers:
        if layer.name == 'temporal_layer':
            temporal_layer = layer
            break

    if temporal_layer is None:
        raise ValueError("Could not find TemporalLayer in the model")

    # Prepare batch of time differences
    time_diffs = hours.reshape(-1, 1, 1).astype(np.float32)  # Shape: (num_points, 1, 1); float32 to match the layer's dtype

    # Get temporal weights
    temporal_weights = temporal_layer(time_diffs)  # Shape: (num_points, 1, 400)

    # Average weights across the embedding dimension
    mean_weights = tf.reduce_mean(temporal_weights, axis=-1).numpy().flatten()

    # Create visualization
    plt.figure(figsize=(12, 6))
    plt.plot(hours, mean_weights, 'b-', linewidth=2)
    plt.fill_between(hours, mean_weights, alpha=0.2)
    plt.grid(True, linestyle='--', alpha=0.7)

    plt.title('Learned Temporal Importance Weights', fontsize=14, pad=20)
    plt.xlabel('Normalized Time Since Publication', fontsize=12)
    plt.ylabel('Temporal Weight', fontsize=12)

    # Add explanatory text
    plt.figtext(0.02, -0.1,
                'Higher weights indicate greater importance in the recommendation system.\n' +
                'Shows how the model weights news articles based on their age.',
                fontsize=10, ha='left')

    # Customize the plot
    plt.tight_layout()

    # Return the raw data for further analysis if needed
    return hours, mean_weights


hours, weights = analyze_temporal_weights(model, hours_range=(0, 1), num_points=1000)
plt.show()