<a href="https://colab.research.google.com/github/ruth-chirinos/Internal_MIA/blob/main/HM_multitask_tensorflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##### Copyright 2020 The TensorFlow Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Multi-task recommenders

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://www.tensorflow.org/recommenders/examples/multitask"><img src="https://www.tensorflow.org/images/tf_logo_32px.png" />View on TensorFlow.org</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/recommenders/blob/main/docs/examples/multitask.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/recommenders/blob/main/docs/examples/multitask.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
  <td>
    <a href="https://storage.googleapis.com/tensorflow_docs/recommenders/docs/examples/multitask.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
  </td>
</table>

In the [basic retrieval tutorial](basic_retrieval) we built a retrieval system using movie watches as positive interaction signals.

In many applications, however, there are multiple rich sources of feedback to draw upon. For example, an e-commerce site may record user visits to product pages (abundant, but relatively low signal), image clicks, adding to cart, and, finally, purchases. It may even record post-purchase signals such as reviews and returns.

Integrating all these different forms of feedback is critical to building systems that users love to use, and that do not optimize for any one metric at the expense of overall performance.

In addition, building a joint model for multiple tasks may produce better results than building a number of task-specific models. This is especially true where some data is abundant (for example, clicks), and some data is sparse (purchases, returns, manual reviews). In those scenarios, a joint model may be able to use representations learned from the abundant task to improve its predictions on the sparse task via a phenomenon known as [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning). For example, [this paper](https://openreview.net/pdf?id=SJxPVcSonN) shows that a model predicting explicit user ratings from sparse user surveys can be substantially improved by adding an auxiliary task that uses abundant click log data.

In this tutorial, we are going to build a multi-objective recommender for Movielens, using both implicit (movie watches) and explicit signals (ratings).

## Imports


Let's first get our imports out of the way.


In [None]:
!pip install -q tensorflow-recommenders
!pip install -q --upgrade tensorflow-datasets

In [None]:
import datetime
import os
import pprint
import tempfile
from typing import Dict, Text

import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# H&M Group


In [None]:

# Load datasets
#base_path_data_example = "/Users/rchirinos-macbook/Documents/Boss/MIA/TFM2023/PythonProject/SimilarProjects/HM-CreateDatasetSamples"
#base_path_global = "/Users/rchirinos-macbook/Documents/Boss/MIA/TFM2023/PythonProject/data"

transactions = pd.read_csv('transactions_train_sample00001.csv')
articles_total = pd.read_csv('articles.csv')

#transactions = tfds.load('transactions_train_sample00001.csv')
#articles_total = tfds.load('articles.csv')

print('QTY of transactions '+str(len(transactions)))
print('QTY of articles_total '+str(len(articles_total)))


In [None]:
# Merge datasets based on article_id
merged_df = pd.merge(transactions, articles_total, on='article_id', how='left')
print('QTY of merged '+str(len(merged_df)))

In [None]:
merged_df.info()

In [None]:
merged_df.head(5)

In [None]:
merged_df[['customer_id', 'article_id', 'prod_name']].sort_values(by=['article_id'])

In [None]:
transactions_short = merged_df[['customer_id', 'article_id', 'prod_name', 'price']]
transactions_short.info()

In [None]:
# Group together
transactions_groupped = transactions_short.groupby(['customer_id', 'article_id', 'prod_name']).sum().reset_index()
transactions_groupped

In [None]:
def create_DataBinary(DataGrouped):
    # Work on a copy of the argument (not the global) and flag every row as a purchase.
    transactions_binary = DataGrouped.copy()
    transactions_binary['PurchasedYes'] = 1
    return transactions_binary

transactions_binary = create_DataBinary(transactions_groupped)
transactions_binary.head()

In [None]:
df_resultado = transactions_binary.drop(['price', 'article_id'], axis=1)

In [None]:
print(df_resultado.info())

In [None]:
df_resultado.head(5)

In [None]:
# Get the users and products, plus the unique values of each
usuarios = df_resultado['customer_id']
productos = df_resultado['prod_name']  # this could instead be 100% of the articles
usuarios_unique = df_resultado['customer_id'].unique()
productos_unique = df_resultado['prod_name'].unique()

# Build every possible combination of users and products
combinaciones = pd.DataFrame([(usuario, producto) for usuario in usuarios_unique for producto in productos_unique],
                              columns=['customer_id', 'prod_name'])

print(combinaciones.info())
combinaciones

In [None]:
# Left join the combinations with the transactions
df_resultado = pd.merge(combinaciones, df_resultado, on=['customer_id', 'prod_name'], how='left')

# Fill missing values with 0 (not purchased). Assigning the result back avoids
# the chained in-place fillna, which newer pandas versions warn about.
df_resultado['PurchasedYes'] = df_resultado['PurchasedYes'].fillna(0)

# Print the result
print(df_resultado.info())
print("Final result:")
print(df_resultado[['customer_id', 'prod_name', 'PurchasedYes']])
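
The cross join above can be illustrated on toy data. The following is a minimal sketch in plain Python rather than pandas; the customer ids and product names are made up for illustration. Every (customer, product) pair gets a row, labelled 1 if it appears in the purchase log and 0 otherwise.

```python
from itertools import product

# Toy purchase log: (customer_id, prod_name) pairs that were actually bought.
# These values are hypothetical and only serve the illustration.
purchases = {("c1", "Jeans"), ("c1", "T-shirt"), ("c2", "Jeans")}

customers = sorted({customer for customer, _ in purchases})
products = sorted({prod for _, prod in purchases})

# Cross join: one row per (customer, product) combination, labelled 1 if the
# pair appears in the purchase log and 0 otherwise.
labelled = [
    (customer, prod, 1 if (customer, prod) in purchases else 0)
    for customer, prod in product(customers, products)
]

for row in labelled:
    print(row)
```

Note that the number of rows grows as the number of customers times the number of products, so on a full catalog it may be necessary to sample the negatives instead of materializing all of them.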

## Preparing the dataset

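We're going to use the H&M transactions prepared above in place of the Movielens 100K dataset from the original tutorial.

We split the data into a train and a test set; the vocabularies are the unique customer ids and product names computed earlier (`usuarios_unique` and `productos_unique`):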

In [None]:

# Creating training and testing sets
train, test = train_test_split(df_resultado, test_size=0.2, random_state=42)


In [None]:

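# Encode 'customer_id' to numerical values.
# Note: this only changes df_resultado; the `train` and `test` frames split
# above keep the original string ids.
label_encoder = LabelEncoder()
df_resultado['customer_id'] = label_encoder.fit_transform(df_resultado['customer_id'])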



In [None]:
print(type(train))
print(type(test))

In [None]:
print(train.info())
train.head(5)

In [None]:
print(test.info())
test.head(5)

## Implementing a Model


In [None]:
df_transacciones = df_resultado

In [None]:
print(tf.__version__)
print(tfrs.__version__)

In [None]:
# productos
# usuarios
# df_resultado
# customer_id	prod_name	PurchasedYes

# Create user and item datasets
#user_ids = df_transacciones['customer_id'].astype('category').cat.codes.astype(int).unique()
#item_ids = df_transacciones['prod_name'].astype('category').cat.codes.astype(int).unique()


In [None]:
# Create the training dataset
train_data = tf.data.Dataset.from_tensor_slices(
    ({"customer_id": df_transacciones['customer_id'], "prod_name": df_transacciones['prod_name']},
     df_transacciones['PurchasedYes'].astype(int))
)

In [None]:
print(df_transacciones.info())
print(type(usuarios_unique))
print(type(productos_unique))

In [None]:
type(productos)
tf_productos = tf.data.Dataset.from_tensor_slices(productos)
type(tf_productos)

In [None]:
#**************************************
class MovielensModel(tfrs.models.Model):

  def __init__(self, rating_weight: float, retrieval_weight: float) -> None:
    # We take the loss weights in the constructor: this allows us to instantiate
    # several model objects with different loss weights.

    super().__init__()

    embedding_dimension = 32

    # User and movie models.
    self.movie_model: tf.keras.layers.Layer = tf.keras.Sequential([
      tf.keras.layers.StringLookup(
        vocabulary=productos_unique, mask_token=None),
      tf.keras.layers.Embedding(len(productos_unique) + 1, embedding_dimension)
    ])
    self.user_model: tf.keras.layers.Layer = tf.keras.Sequential([
      tf.keras.layers.StringLookup(
        vocabulary=usuarios_unique, mask_token=None),
      tf.keras.layers.Embedding(len(usuarios_unique) + 1, embedding_dimension)
    ])

    # A small model to take in user and movie embeddings and predict ratings.
    # We can make this as complicated as we want as long as we output a scalar
    # as our prediction.
    self.rating_model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

    # The tasks.
    self.rating_task: tf.keras.layers.Layer = tfrs.tasks.Ranking(
        loss=tf.keras.losses.MeanSquaredError(),
        metrics=[tf.keras.metrics.RootMeanSquaredError()],
    )
    self.retrieval_task: tf.keras.layers.Layer = tfrs.tasks.Retrieval(
        metrics=tfrs.metrics.FactorizedTopK(
            candidates=tf_productos.batch(128).map(self.movie_model)
        )
    )

    # The loss weights.
    self.rating_weight = rating_weight
    self.retrieval_weight = retrieval_weight


  def call(self, features: Dict[Text, tf.Tensor]) -> tf.Tensor:
    # We pick out the user features and pass them into the user model.
    user_embeddings = self.user_model(features["customer_id"])
    # And pick out the movie features and pass them into the movie model.
    movie_embeddings = self.movie_model(features["prod_name"])

    return (
        user_embeddings,
        movie_embeddings,
        # We apply the multi-layered rating model to a concatenation of
        # user and movie embeddings.
        self.rating_model(
            tf.concat([user_embeddings, movie_embeddings], axis=1)
        ),
    )

  def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:

    ratings = features.pop("PurchasedYes")

    user_embeddings, movie_embeddings, rating_predictions = self(features)

    # We compute the loss for each task.
    rating_loss = self.rating_task(
        labels=ratings,
        predictions=rating_predictions,
    )
    retrieval_loss = self.retrieval_task(user_embeddings, movie_embeddings)

    # And combine them using the loss weights.
    return (self.rating_weight * rating_loss
            + self.retrieval_weight * retrieval_loss)


In [None]:
model = MovielensModel(rating_weight=1.0, retrieval_weight=0.0)


In [None]:
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

In [None]:
# Randomly shuffle data and split between train and test.
tf_transacciones = tf.data.Dataset.from_tensor_slices(dict(df_transacciones))

tf.random.set_seed(42)
shuffled = tf_transacciones.shuffle(100_000, seed=42, reshuffle_each_iteration=False)

train = shuffled.take(80_000)
test = shuffled.skip(80_000).take(20_000)

In [None]:
cached_train = train.shuffle(100_000).batch(8192).cache()
cached_test = test.batch(4096).cache()

In [None]:
model.fit(cached_train, epochs=3)
metrics = model.evaluate(cached_test, return_dict=True)

print(f"Retrieval top-100 accuracy: {metrics['factorized_top_k/top_100_categorical_accuracy']:.3f}.")
print(f"Ranking RMSE: {metrics['root_mean_squared_error']:.3f}.")

In [None]:
#**************************************

class HyM_model(tfrs.Model):
  # We derive from a custom base class to help reduce boilerplate. Under the hood,
  # these are still plain Keras Models.

  def __init__(
      self,
      customer_model: tf.keras.Model,
      article_model: tf.keras.Model,
      task: tfrs.tasks.Retrieval):
    super().__init__()

    # Set up user and movie representations.
    self.customer_model = customer_model
    self.article_model = article_model

    # Set up a retrieval task.
    self.task = task

  def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
    # Define how the loss is computed.

    # Use the towers stored in __init__ (customer_model / article_model).
    user_embeddings = self.customer_model(features["customer_id"])
    movie_embeddings = self.article_model(features["prod_name"])

    return self.task(user_embeddings, movie_embeddings)

In [None]:
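# Create the two towers of the model
embedding_dimension = 32

# customer_id was label-encoded to integers 0..N-1 above, so it can index the
# embedding table directly.
customer_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        input_dim=len(usuarios_unique), output_dim=embedding_dimension
    )
])

# prod_name is a string, so map it to an integer index first; the extra
# embedding row accounts for unknown tokens.
article_model = tf.keras.Sequential([
    tf.keras.layers.StringLookup(vocabulary=productos_unique, mask_token=None),
    tf.keras.layers.Embedding(
        input_dim=len(productos_unique) + 1, output_dim=embedding_dimension
    )
])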

In [None]:
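# Create the metrics

task = tfrs.tasks.Retrieval(
    metrics=tfrs.metrics.FactorizedTopK(
        # Candidates must be a dataset of embeddings, not raw product names.
        candidates=tf.data.Dataset.from_tensor_slices(productos_unique)
        .batch(128).map(article_model)
    )
)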

In [None]:
# Create the model
model = HyM_model(
    customer_model,
    article_model,
    task=task
)

'''
model = tfrs.Model(
    user_model,
    item_model,
    #task=tfrs.tasks.Retrieval(metrics=tfrs.metrics.FactorizedTopK(candidates=df_articulos['prod_name'].unique())),
    #task=tfrs.tasks.Retrieval(metrics=tfrs.metrics.FactorizedTopK(candidates=productos))
    task=task
)
'''

In [None]:

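# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

# HyM_model.compute_loss expects a batched dict of features, so drop the
# label and batch the dataset before training.
retrieval_data = train_data.map(lambda features, label: features).batch(4096)

# Train the model
model.fit(retrieval_data, epochs=10)

# Evaluate the model (on the training data, so this is not a held-out estimate)
evaluation = model.evaluate(retrieval_data, return_dict=True)

# Print the evaluation metrics
for metric, value in evaluation.items():
    print(f"{metric}: {value}")

# Get recommendations for one user. tfrs.Model has no `recommend` method, so
# build a brute-force top-k index over the article embeddings instead.
user_id = 0
index = tfrs.layers.factorized_top_k.BruteForce(model.customer_model)
index.index_from_dataset(
    tf.data.Dataset.from_tensor_slices(productos_unique).batch(128).map(
        lambda name: (name, model.article_model(name))))
_, top_k_recommendations = index(tf.constant([user_id]))  # k defaults to 10

# Print the recommendations
print(f"Top-10 recommendations for user {user_id}: {top_k_recommendations[0]}")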

### The query tower

In [None]:
#The first step is to decide on the dimensionality of the query and candidate representations:
embedding_dimension = 32

Higher values will correspond to models that may be more accurate, but will also be slower to fit and more prone to overfitting.

The second is to define the model itself. Here, we're going to use Keras preprocessing layers to first convert user ids to integers, and then convert those to user embeddings via an Embedding layer. Note that we use the list of unique user ids we computed earlier as a vocabulary:

In [None]:
# productos
# usuarios

# Defining the Model
user_model = tf.keras.Sequential([
  # The vocabulary must not contain duplicates, so use the unique ids.
  tf.keras.layers.StringLookup(
      vocabulary=usuarios_unique, mask_token=None),
  # We add an additional embedding to account for unknown tokens.
  tf.keras.layers.Embedding(len(usuarios_unique) + 1, embedding_dimension)
])

A simple model like this corresponds exactly to a classic matrix factorization approach. While defining a subclass of tf.keras.Model for this simple model might be overkill, we can easily extend it to an arbitrarily complex model using standard Keras components, as long as we return an embedding_dimension-wide output at the end.
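
To make the matrix factorization view concrete, here is a minimal sketch in plain Python. The embedding values and the names `user_embeddings`, `item_embeddings`, and `affinity` are made up for illustration and are not part of the notebook; the point is only that the affinity of a (user, item) pair is the dot product of their embedding vectors.

```python
# Hypothetical 2-dimensional embeddings for two users and three items.
user_embeddings = {"u1": [1.0, 0.0], "u2": [0.5, 0.5]}
item_embeddings = {"a": [2.0, 1.0], "b": [0.0, 3.0], "c": [1.0, 1.0]}


def affinity(user, item):
    """Dot product of the user and item embedding vectors."""
    return sum(u * v for u, v in zip(user_embeddings[user], item_embeddings[item]))


# The full score matrix is the product of the user and item factor matrices.
scores = {
    user: {item: affinity(user, item) for item in item_embeddings}
    for user in user_embeddings
}

for user, row in scores.items():
    print(user, row)
```

In a trained model the embedding values are learned so that observed (user, item) pairs receive higher scores than unobserved ones.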

### The candidate tower
We can do the same with the candidate tower.

In [None]:
movie_model = tf.keras.Sequential([
  # The vocabulary must not contain duplicates, so use the unique names.
  tf.keras.layers.StringLookup(
      vocabulary=productos_unique, mask_token=None),
  tf.keras.layers.Embedding(len(productos_unique) + 1, embedding_dimension)
])
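
Both towers follow the same lookup-then-embed pattern. The sketch below imitates it in plain Python to show why the embedding table needs one extra row: index 0 is reserved for tokens outside the vocabulary. The helper names and the customer ids are hypothetical, and the random initial values stand in for weights that would normally be learned.

```python
import random


def build_lookup(vocabulary):
    """Map each known token to 1..N; index 0 is reserved for unknown tokens."""
    return {token: index for index, token in enumerate(vocabulary, start=1)}


def lookup(table, token):
    """Return the integer index of a token, or 0 if it is not in the vocabulary."""
    return table.get(token, 0)


vocabulary = ["cust_a", "cust_b", "cust_c"]  # hypothetical customer ids
table = build_lookup(vocabulary)

embedding_dimension = 4
rng = random.Random(0)
# One row per index, hence len(vocabulary) + 1 rows in total.
embedding_table = [
    [rng.uniform(-0.05, 0.05) for _ in range(embedding_dimension)]
    for _ in range(len(vocabulary) + 1)
]


def embed(token):
    """Look up the token's index, then return the matching embedding row."""
    return embedding_table[lookup(table, token)]


print(lookup(table, "cust_b"), lookup(table, "never_seen"))
```

All unseen customers share the row at index 0, so the model can still produce an embedding for them, although it carries no customer-specific information.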

## Metrics
In our training data we have positive (user, movie) pairs. To figure out how good our model is, we need to compare the affinity score that the model calculates for this pair to the scores of all the other possible candidates: if the score for the positive pair is higher than for all other candidates, our model is highly accurate.

To do this, we can use the tfrs.metrics.FactorizedTopK metric. The metric has one required argument: the dataset of candidates that are used as implicit negatives for evaluation.

In our case, that's the movies dataset, converted into embeddings via our movie model:


In [None]:
metrics = tfrs.metrics.FactorizedTopK(
  # Batch the product names and embed them with the candidate tower.
  candidates=tf.data.Dataset.from_tensor_slices(productos_unique).batch(128).map(movie_model)
)
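
The idea behind a top-k metric can be sketched without TensorFlow. The function below is an illustrative stand-in, not the TFRS implementation: for each query it ranks all candidate scores and checks whether the positive candidate lands in the first k positions. The scores are made up.

```python
def top_k_accuracy(score_rows, positive_indices, k):
    """Fraction of queries whose positive candidate is among the k best scores."""
    hits = 0
    for scores, positive in zip(score_rows, positive_indices):
        # Candidate indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if positive in ranked[:k]:
            hits += 1
    return hits / len(score_rows)


# One row of candidate scores per query (hypothetical values).
score_rows = [
    [0.9, 0.1, 0.3, 0.2],  # positive (index 0) is ranked 1st
    [0.2, 0.8, 0.5, 0.1],  # positive (index 2) is ranked 2nd
    [0.4, 0.3, 0.2, 0.1],  # positive (index 3) is ranked 4th
]
positive_indices = [0, 2, 3]

for k in (1, 2, 4):
    print(k, top_k_accuracy(score_rows, positive_indices, k))
```

Larger values of k are more forgiving, which is why top-100 accuracy is typically much higher than top-1 accuracy.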

## A multi-task model

There are two critical parts to multi-task recommenders:

1. They optimize for two or more objectives, and so have two or more losses.
2. They share variables between the tasks, allowing for transfer learning.

In this tutorial, we will define our models as before, but instead of having a single task, we will have two tasks: one that predicts ratings, and one that predicts movie watches.

The user and movie models are as before:

```python
user_model = tf.keras.Sequential([
  tf.keras.layers.StringLookup(
      vocabulary=unique_user_ids, mask_token=None),
  # We add 1 to account for the unknown token.
  tf.keras.layers.Embedding(len(unique_user_ids) + 1, embedding_dimension)
])

movie_model = tf.keras.Sequential([
  tf.keras.layers.StringLookup(
      vocabulary=unique_movie_titles, mask_token=None),
  tf.keras.layers.Embedding(len(unique_movie_titles) + 1, embedding_dimension)
])
```

However, now we will have two tasks. The first is the rating task:

```python
tfrs.tasks.Ranking(
    loss=tf.keras.losses.MeanSquaredError(),
    metrics=[tf.keras.metrics.RootMeanSquaredError()],
)
```

Its goal is to predict the ratings as accurately as possible.
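
As a reminder of what the reported metric measures, here is a small plain-Python sketch of root mean squared error. The labels mimic a binary purchased / not purchased target and the predictions are made up for illustration.

```python
import math


def rmse(labels, predictions):
    """Root of the mean squared difference between labels and predictions."""
    squared_errors = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


labels = [1.0, 0.0, 1.0, 0.0]
constant_guess = [0.5, 0.5, 0.5, 0.5]  # always predicts the midpoint
perfect_guess = [1.0, 0.0, 1.0, 0.0]

print(rmse(labels, constant_guess))
print(rmse(labels, perfect_guess))
```

The mean squared error used as the loss is the square of this quantity, so minimizing one also minimizes the other.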

The second is the retrieval task:

```python
tfrs.tasks.Retrieval(
    metrics=tfrs.metrics.FactorizedTopK(
        candidates=movies.batch(128).map(movie_model)
    )
)
```

As before, this task's goal is to predict which movies the user will or will not watch.
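
One common way to train such a retrieval objective is to treat the other items in a batch as negatives and apply a softmax cross-entropy over the dot-product scores. The sketch below illustrates that idea in plain Python under this assumption; it is not a description of the TFRS internals, and the function name and vectors are hypothetical.

```python
import math


def in_batch_softmax_loss(user_vectors, item_vectors):
    """Mean cross-entropy where row i's positive is item i.

    All other items in the batch act as negatives for user i.
    """
    losses = []
    for i, user in enumerate(user_vectors):
        logits = [sum(u * v for u, v in zip(user, item)) for item in item_vectors]
        # Numerically stable log-sum-exp over all candidates in the batch.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


users = [[2.0, 0.0], [0.0, 2.0]]
# Each user's positive item points in the same direction as the user vector.
aligned = in_batch_softmax_loss(users, [[2.0, 0.0], [0.0, 2.0]])
# Positives swapped: each user now scores the other user's item highest.
swapped = in_batch_softmax_loss(users, [[0.0, 2.0], [2.0, 0.0]])

print(aligned, swapped)
```

The loss is small when each user scores its own positive item above the rest of the batch, and large when another item wins.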

### Putting it together

We put it all together in a model class.

The new component here is that - since we have two tasks and two losses - we need to decide on how important each loss is. We can do this by giving each of the losses a weight, and treating these weights as hyperparameters. If we assign a large loss weight to the rating task, our model is going to focus on predicting ratings (but still use some information from the retrieval task); if we assign a large loss weight to the retrieval task, it will focus on retrieval instead.
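
The weighting itself is just a weighted sum of the two per-task losses. The following sketch uses made-up loss values to show how the three weight settings explored in this tutorial change the quantity being minimized; `combined_loss` is a hypothetical helper, not a TFRS function.

```python
def combined_loss(rating_loss, retrieval_loss, rating_weight, retrieval_weight):
    """Weighted sum of the two task losses."""
    return rating_weight * rating_loss + retrieval_weight * retrieval_loss


# Made-up per-batch loss values, chosen only for illustration.
rating_loss, retrieval_loss = 0.8, 5.0

rating_only = combined_loss(rating_loss, retrieval_loss, 1.0, 0.0)
retrieval_only = combined_loss(rating_loss, retrieval_loss, 0.0, 1.0)
joint = combined_loss(rating_loss, retrieval_loss, 1.0, 1.0)

print(rating_only, retrieval_only, joint)
```

Because the two losses can live on quite different scales, equal weights do not necessarily mean equal influence on the gradients, which is one reason to treat the weights as hyperparameters.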

In [None]:
print(type(usuarios[0]))
print(type(productos[0]))

In [None]:
class MovielensModel(tfrs.models.Model):

  def __init__(self, rating_weight: float, retrieval_weight: float) -> None:
    # We take the loss weights in the constructor: this allows us to instantiate
    # several model objects with different loss weights.

    super().__init__()

    embedding_dimension = 32

    # User and movie models. The vocabularies must not contain duplicates,
    # so we use the unique ids and names.
    self.movie_model: tf.keras.layers.Layer = tf.keras.Sequential([
      tf.keras.layers.StringLookup(
        vocabulary=productos_unique, mask_token=None),
      tf.keras.layers.Embedding(len(productos_unique) + 1, embedding_dimension)
    ])

    self.user_model: tf.keras.layers.Layer = tf.keras.Sequential([
      tf.keras.layers.StringLookup(
        vocabulary=usuarios_unique, mask_token=None),
      tf.keras.layers.Embedding(len(usuarios_unique) + 1, embedding_dimension)
    ])


    # A small model to take in user and movie embeddings and predict ratings.
    # We can make this as complicated as we want as long as we output a scalar
    # as our prediction.
    self.rating_model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

    # The tasks.
    self.rating_task: tf.keras.layers.Layer = tfrs.tasks.Ranking(
        loss=tf.keras.losses.MeanSquaredError(),
        metrics=[tf.keras.metrics.RootMeanSquaredError()],
    )
    self.retrieval_task: tf.keras.layers.Layer = tfrs.tasks.Retrieval(
        metrics=tfrs.metrics.FactorizedTopK(
            # Candidates must be a dataset of embeddings, not raw product names.
            candidates=tf.data.Dataset.from_tensor_slices(productos_unique)
            .batch(128).map(self.movie_model)
        )
    )

    # The loss weights.
    self.rating_weight = rating_weight
    self.retrieval_weight = retrieval_weight

  def call(self, features: Dict[Text, tf.Tensor]) -> tf.Tensor:
    # We pick out the user features and pass them into the user model.
    user_embeddings = self.user_model(features["customer_id"])
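    # And pick out the movie features and pass them into the movie model.
    # The movie model's vocabulary is built from product names, and article_id
    # was dropped from the data earlier, so we look up "prod_name".
    movie_embeddings = self.movie_model(features["prod_name"])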

    return (
        user_embeddings,
        movie_embeddings,
        # We apply the multi-layered rating model to a concatenation of
        # user and movie embeddings.
        self.rating_model(
            tf.concat([user_embeddings, movie_embeddings], axis=1)
        ),
    )

  def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:

    ratings = features.pop("PurchasedYes")

    user_embeddings, movie_embeddings, rating_predictions = self(features)

    # We compute the loss for each task.
    rating_loss = self.rating_task(
        labels=ratings,
        predictions=rating_predictions,
    )
    retrieval_loss = self.retrieval_task(user_embeddings, movie_embeddings)

    # And combine them using the loss weights.
    return (self.rating_weight * rating_loss
            + self.retrieval_weight * retrieval_loss)

### Rating-specialized model

Depending on the weights we assign, the model will encode a different balance of the tasks. Let's start with a model that only considers ratings.

In [None]:
model = MovielensModel(rating_weight=1.0, retrieval_weight=0.0)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

In [None]:
'''
cached_train = train.shuffle(100_000).batch(8192).cache()
cached_test = test.batch(4096).cache()
print(type(cached_train))
print(type(cached_test))
'''

In [None]:
# Use the batched, cached datasets: the model expects batched inputs.
model.fit(cached_train, epochs=3)
metrics = model.evaluate(cached_test, return_dict=True)

print(f"Retrieval top-100 accuracy: {metrics['factorized_top_k/top_100_categorical_accuracy']:.3f}.")
print(f"Ranking RMSE: {metrics['root_mean_squared_error']:.3f}.")

In the original Movielens tutorial, this configuration does OK at predicting ratings (with an RMSE of around 1.11), but performs poorly at predicting which movies will be watched: its top-100 accuracy is almost 4 times worse than that of a model trained solely to predict watches. These figures come from Movielens, not from the H&M data used here.

### Retrieval-specialized model

Let's now try a model that focuses on retrieval only.

In [None]:
model = MovielensModel(rating_weight=0.0, retrieval_weight=1.0)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

In [None]:
model.fit(cached_train, epochs=3)
metrics = model.evaluate(cached_test, return_dict=True)

print(f"Retrieval top-100 accuracy: {metrics['factorized_top_k/top_100_categorical_accuracy']:.3f}.")
print(f"Ranking RMSE: {metrics['root_mean_squared_error']:.3f}.")

We get the opposite result: a model that does well on retrieval, but poorly on predicting ratings.

### Joint model

Let's now train a model that assigns positive weights to both tasks.

In [None]:
model = MovielensModel(rating_weight=1.0, retrieval_weight=1.0)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

In [None]:
model.fit(cached_train, epochs=3)
metrics = model.evaluate(cached_test, return_dict=True)

print(f"Retrieval top-100 accuracy: {metrics['factorized_top_k/top_100_categorical_accuracy']:.3f}.")
print(f"Ranking RMSE: {metrics['root_mean_squared_error']:.3f}.")

The result is a model that performs roughly as well on both tasks as each specialized model.

### Making predictions

We can use the trained multitask model to get trained user and movie embeddings, as well as the predicted rating:

In [None]:
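# call() returns (user_embeddings, movie_embeddings, rating), in that order,
# and expects the feature names used during training.
trained_user_embeddings, trained_movie_embeddings, predicted_rating = model({
      "customer_id": np.array([usuarios_unique[0]]),
      "prod_name": np.array([productos_unique[0]])
  })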
print("Predicted rating:")
print(predicted_rating)

While the results here do not show a clear accuracy benefit from a joint model in this case, multi-task learning is in general an extremely useful tool. We can expect better results when we can transfer knowledge from a data-abundant task (such as clicks) to a closely related data-sparse task (such as purchases).