# TensorFlow Recommenders (TFRS) Demo 

In this notebook we will demonstrate the use of TFRS on sales data from [Babyshop](https://www.babyshop.se/). This tutorial is heavily based on the [official TensorFlow Recommenders tutorials](https://www.tensorflow.org/recommenders/examples/quickstart).

<div class="alert alert-block alert-info">
There are a lot of machine learning "best practices" that are ignored in this notebook for the sake of simplicity. The focus is to get an introduction to TFRS and a general understanding of how this library works, not to build an industrial recommendation system.
</div>

**YOU WILL NEED TO RUN THE FOLLOWING CELL IF USING GOOGLE COLAB**

In [None]:
!pip install tensorflow_recommenders==0.6.0

**AFTER YOU INSTALL TENSORFLOW RECOMMENDERS YOU MAY NEED TO RESTART THE RUNTIME** 

"Runtime" --> "Restart Runtime"

## **Imports**

In [1]:
from typing import Dict, Any, Text

import numpy as np 
import pandas as pd

import tensorflow as tf
import tensorflow_recommenders as tfrs

# **Reading in the Data** 

First we will read in the training and test data. 

<div class="alert alert-block alert-info">
See <code>EDA.ipynb</code> for analysis on the data and details on how the train and test sets were created. 
</div>

In the following cells we will create an even smaller version of the dataset so that we can train in a reasonable amount of time on a CPU.

In [2]:
TRAIN_PATH = 'https://raw.githubusercontent.com/msvensson222/tfrs-retail-example/master/train.csv'
TEST_PATH = 'https://raw.githubusercontent.com/msvensson222/tfrs-retail-example/master/test.csv'
ITEM_INFO_PATH = 'https://raw.githubusercontent.com/msvensson222/tfrs-retail-example/master/item_info.csv'

In [3]:
train_df = pd.read_csv(TRAIN_PATH, dtype={'user_no': str, 'item_no': str})
test_df = pd.read_csv(TEST_PATH, dtype={'user_no': str, 'item_no': str})

# For evaluation
item_info_df = pd.read_csv(ITEM_INFO_PATH, dtype={'item_no': str})

In [4]:
display(train_df)

Unnamed: 0,user_no,item_no,gender_description,brand,product_group,first_interaction_month
0,3514657341026450752,-8200171396217105230,girls,jacadi,all in ones,5
1,-2544835772752526495,6010486836306001722,unisex,done by deer,tableware,11
2,-6023760384625599940,-289310928076258010,unisex,axkid,car seat accessories,2
3,4084143572023326121,-1069008842172275553,boys,pom dapi,sandals,5
4,-4787976733877481713,608763176274829755,unisex,little luwi,tops,8
...,...,...,...,...,...,...
667004,6183491195824661353,-487489333946043722,girls,ikks,dresses,4
667005,-8074445800271606192,7154496603299236573,unisex,tommee tippee,baby feeding,3
667006,3873852775369901008,3465194094158419708,unisex,by nils,sandals,5
667007,-1306455725574612144,2424760068735106973,girls,kenzo,tops,1


We will obtain our smaller dataset by just taking the top 2000 users (i.e. users with the most interactions) in the training data. 
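
The selection relies on `value_counts` returning counts sorted in descending order, so keeping the first N index entries keeps the N most active users. A toy illustration of the same pattern, using a made-up interaction log (the `toy_*` names and IDs are hypothetical, and `.head(n)` is used here as an equivalent of the `[:n]` slice):

```python
import pandas as pd

# Toy interaction log with made-up user and item IDs.
toy_df = pd.DataFrame({
    "user_no": ["a", "b", "a", "c", "a", "b"],
    "item_no": ["i1", "i2", "i3", "i1", "i2", "i4"],
})

num_users = 2
# value_counts() sorts by count (descending), so head() keeps the most active users.
toy_top_users = toy_df["user_no"].value_counts().head(num_users).index
toy_filtered = toy_df.loc[toy_df["user_no"].isin(toy_top_users)]

print(sorted(toy_top_users))  # ['a', 'b']
print(len(toy_filtered))      # 5
```

Note that filtering the test set with the same user list, as done in the next cell, keeps only test interactions from users the model has seen during training.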

In [5]:
NUM_USERS = 2000
top_users = train_df['user_no'].value_counts()[:NUM_USERS].index

train_df_filtered = train_df.loc[train_df['user_no'].isin(top_users), :]
test_df_filtered = test_df.loc[test_df['user_no'].isin(top_users), :]
items = train_df_filtered['item_no'].unique()

In the following cells we create TensorFlow datasets out of the Pandas DataFrames and print out the first few instances just to get an idea of what the datasets look like. 

In [6]:
train_dataset = tf.data.Dataset.from_tensor_slices(dict(train_df_filtered))
test_dataset = tf.data.Dataset.from_tensor_slices(dict(test_df_filtered))

items_dataset = tf.data.Dataset.from_tensor_slices(items)

2022-02-10 09:28:28.824445: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [7]:
for item in items_dataset.take(3):
    print(item)

tf.Tensor(b'-1119687312509640915', shape=(), dtype=string)
tf.Tensor(b'-3219910350938683317', shape=(), dtype=string)
tf.Tensor(b'1179978263120783371', shape=(), dtype=string)


In [8]:
for interaction in train_dataset.take(3):
    print(interaction)

{'user_no': <tf.Tensor: shape=(), dtype=string, numpy=b'-2683506524939646253'>, 'item_no': <tf.Tensor: shape=(), dtype=string, numpy=b'-1119687312509640915'>, 'gender_description': <tf.Tensor: shape=(), dtype=string, numpy=b'unisex'>, 'brand': <tf.Tensor: shape=(), dtype=string, numpy=b'reima'>, 'product_group': <tf.Tensor: shape=(), dtype=string, numpy=b'boots'>, 'first_interaction_month': <tf.Tensor: shape=(), dtype=int64, numpy=11>}
{'user_no': <tf.Tensor: shape=(), dtype=string, numpy=b'-8270295623916047084'>, 'item_no': <tf.Tensor: shape=(), dtype=string, numpy=b'-3219910350938683317'>, 'gender_description': <tf.Tensor: shape=(), dtype=string, numpy=b'boys'>, 'brand': <tf.Tensor: shape=(), dtype=string, numpy=b'moschino kid-teen'>, 'product_group': <tf.Tensor: shape=(), dtype=string, numpy=b'tops'>, 'first_interaction_month': <tf.Tensor: shape=(), dtype=int64, numpy=11>}
{'user_no': <tf.Tensor: shape=(), dtype=string, numpy=b'-1493854771764820101'>, 'item_no': <tf.Tensor: shape=()

---
---

# **Baseline**

The first thing we can do is start with a very "naive" baseline: for every interaction in the test dataset we will just predict the top 100 items from the training set. This will give us a reference point for any metrics we calculate after training a model. 

A side benefit is that we can get a better understanding of TFRS by recreating the way that metrics are calculated by TFRS. See [here](https://github.com/tensorflow/recommenders/blob/8b249f3fc0f8d3d907eecf010809a5df3759d65d/tensorflow_recommenders/metrics/factorized_top_k.py#L64) for the source code; the following cells are basically a simplified version of the code found in the TFRS library. 
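
As a compact reference, the quantity being recreated is a top-k hit rate: for each test interaction, check whether the true item appears among the first k retrieved items, then average over all interactions. The following is a NumPy sketch of that idea on toy IDs; it is an illustration under that reading of the metric, not the TFRS implementation, and `top_k_hit_rate` is a hypothetical helper name.

```python
import numpy as np

def top_k_hit_rate(true_ids, retrieved_ids, k):
    """Fraction of rows whose true id appears among the first k retrieved ids."""
    true_ids = np.asarray(true_ids).reshape(-1, 1)    # shape (n, 1)
    retrieved_ids = np.asarray(retrieved_ids)[:, :k]  # shape (n, k)
    # Broadcasting compares each true id against its row of retrieved ids.
    hits = (true_ids == retrieved_ids).any(axis=1)
    return float(hits.mean())

true_ids = ["a", "b", "c", "d"]   # one true item per test interaction
popular = ["b", "a", "x"]         # the same "most popular" list for everyone
retrieved = np.tile(popular, (len(true_ids), 1))

hr1 = top_k_hit_rate(true_ids, retrieved, 1)  # only "b" is hit
hr2 = top_k_hit_rate(true_ids, retrieved, 2)  # "a" and "b" are hit
print(hr1, hr2)  # 0.25 0.5
```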

In [9]:
NUM_TOP_ITEMS = 100
top_items = train_df_filtered['item_no'].value_counts()[:NUM_TOP_ITEMS].index

ks = (1, 5, 10, 50, 100)
metrics = [tf.keras.metrics.Mean() for k in ks]

true_candidates = tf.expand_dims(tf.constant(test_df_filtered['item_no'].values), 1)
retrieved_candidates = tf.expand_dims(top_items, 1)
# Pretend like we retrieve the same top 100 candidates for every interaction in test data
retrieved_candidates = tf.transpose(tf.repeat(retrieved_candidates, 
                                              tf.constant(true_candidates.shape[0]), 
                                              axis=1))
ids_match = tf.cast(tf.math.equal(true_candidates, retrieved_candidates), tf.float32)

In [10]:
for k, metric in zip(ks, metrics):
    # Clip to only count multiple matches once.
    match_found = tf.clip_by_value(
        tf.reduce_sum(ids_match[:, :k], axis=1, keepdims=True),
        0.0, 1.0
    )
    metric.update_state(match_found)

In [11]:
for k, metric in zip(ks, metrics):
    print(f'Top {k} categorical accuracy: {metric.result().numpy():.5f}')

Top 1 categorical accuracy: 0.00218
Top 5 categorical accuracy: 0.00480
Top 10 categorical accuracy: 0.01134
Top 50 categorical accuracy: 0.02967
Top 100 categorical accuracy: 0.04058


# Creating a Simple Model

We will start by creating a very simple model similar to the one created in [the TFRS basic retrieval tutorial](https://www.tensorflow.org/recommenders/examples/basic_retrieval). Quoting from the tutorial, the model will be created by composing two sub-models: 

> 1. A query model computing the query representation (normally a fixed-dimensionality embedding vector) using query features
> 2. A candidate model computing the candidate representation (an equally-sized vector) using the candidate features
> 
> The outputs of the two models are then multiplied together to give a query-candidate affinity score, with higher scores expressing a better match between the candidate and the query.

For our use case, we will pretend that we want to recommend items to users. As such, our **query** model will produce representations of the **users** (and potentially additional **context**, such as time, device, etc.) and our **candidate** model will produce representations of the **items**. 
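
To make the "multiplied together" step from the quote concrete, here is a small NumPy sketch in which random vectors stand in for the outputs of the two sub-models. The shapes and variable names are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
user_emb = rng.normal(size=(2, 4))  # 2 users (queries), embedding dim 4
item_emb = rng.normal(size=(5, 4))  # 5 items (candidates), same dim

# Affinity score for every (user, item) pair is a dot product.
scores = user_emb @ item_emb.T      # shape (2, 5)

# The best match for each user is the item with the highest score.
best_item_per_user = scores.argmax(axis=1)
print(scores.shape, best_item_per_user.shape)  # (2, 5) (2,)
```

Both towers must output vectors of the same size, otherwise the dot product is not defined.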

For the rest of the notebook we will refer to the "query" model as a `user_model` and the "candidate" model as an `item_model`.

<div class="alert alert-block alert-info">
<b>Tip:</b>  There is nothing forcing us to associate users with a query model and items with a candidate model. For example, we could just as easily associate items with a query model and items with a candidate model for an <b>item-item</b> recommender. 
</div>

In the following cells we will build each tower separately (via the `create_embedding_model` function). We will also define the task, which in this case will be a retrieval task. Finally we will put together the two sub-models and the task in a `tfrs.Model`, which allows us to implement a model by only implementing the `__init__` and `compute_loss` methods—the base model class will take care of the training loop. 
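
Conceptually, each tower in this simple model is a string-to-index lookup followed by a row lookup in an embedding table. The NumPy sketch below mimics that behavior with a made-up three-ID vocabulary and a random table in place of learned weights; it assumes a single out-of-vocabulary (OOV) bucket at index 0, which matches `StringLookup` with `num_oov_indices=1` and no mask token.

```python
import numpy as np

vocab = ["u1", "u2", "u3"]  # hypothetical user IDs
num_oov_indices = 1
embedding_dim = 4

# Index 0 is reserved for unknown strings; known IDs start at 1.
lookup = {token: i + num_oov_indices for i, token in enumerate(vocab)}

# One row per known ID plus one row for the OOV bucket.
rng = np.random.default_rng(0)
table = rng.normal(size=(len(vocab) + num_oov_indices, embedding_dim))

def embed(ids):
    """Map raw string IDs to rows of the embedding table."""
    idx = [lookup.get(token, 0) for token in ids]
    return table[idx]

out = embed(["u2", "unseen"])
print(out.shape)  # (2, 4)
```

This is why the embedding layer in the next cell is sized `len(feature_vocab) + num_oov_indices`: every ID that was not in the training vocabulary shares the same OOV row.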

In [12]:
def get_vocab(df, feature, top_n=None):
    return df[feature].value_counts()[:top_n].index

def create_embedding_model(df, feature, num_oov_indices=1, embedding_dim=32):
    feature_vocab = get_vocab(df, feature)
    embedding_model = tf.keras.Sequential([
        tf.keras.layers.StringLookup(vocabulary=feature_vocab, 
                                     num_oov_indices=num_oov_indices),
        tf.keras.layers.Embedding(len(feature_vocab) + num_oov_indices, embedding_dim)
    ])
    
    
    return embedding_model

class SimpleTFRSModel(tfrs.Model):

    def __init__(self, user_model, item_model, task):
        super().__init__()
        self.user_model: tf.keras.Model = user_model
        self.item_model: tf.keras.Model = item_model
        self.task: tf.keras.layers.Layer = task
            

    def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
        # We pick out the user features and pass them into the user model
        # and item features to pass to the item model. Use the returned embeddings 
        # to calculate the loss
        user_embeddings = self.user_model(features['user_no'])
        positive_item_embeddings = self.item_model(features['item_no'])
        # The task computes the loss and the metrics. Don't compute metrics during training 
        # because it will take too long otherwise
        return self.task(user_embeddings, positive_item_embeddings, compute_metrics=not training)

In [13]:
user_model = create_embedding_model(train_df_filtered, "user_no")
item_model = create_embedding_model(train_df_filtered, "item_no")
metrics = tfrs.metrics.FactorizedTopK(
  candidates=items_dataset.batch(128).map(item_model)
)
task = tfrs.tasks.Retrieval(
  metrics=metrics
)

simple_tfrs_model = SimpleTFRSModel(user_model, item_model, task)

---
---

<div class="alert alert-block alert-warning">
<b>The above is just a convenience!</b> The following class is a simplified version of what
is actually going on under-the-hood:

```python 
class NonTFRSModel(tf.keras.Model):
    def __init__(self, user_model, item_model, metrics):
        """
        Note that we don't pass in the task! That's because we define 
        what the task is here.
        """
        super().__init__()
        self.user_model = user_model 
        self.item_model = item_model 
        # When we perform retrieval, the default loss is actually just good 
        # old CategoricalCrossentropy :) 
        self._loss = tf.keras.losses.CategoricalCrossentropy(
            from_logits=True, reduction=tf.keras.losses.Reduction.SUM
        )
        self._factorized_metrics = metrics

    def calc_loss(self, query_embeddings, candidate_embeddings): 
        scores = tf.linalg.matmul(
            query_embeddings, 
            candidate_embeddings, 
            transpose_b=True
        )
        num_queries, num_candidates = scores.shape
        labels = tf.eye(num_queries, num_candidates)
        loss = self._loss(y_true=labels, y_pred=scores)
        self._factorized_metrics.update_state(
            query_embeddings, 
            candidate_embeddings
        )
        return loss
    

    def train_step(self, features: Dict[Text, tf.Tensor]) -> Dict[Text, tf.Tensor]:
        with tf.GradientTape() as tape: 
            user_embeddings = self.user_model(features['user_no'])
            positive_item_embeddings = self.item_model(features['item_no'])
            loss = self.calc_loss(user_embeddings, positive_item_embeddings)

        gradients = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))

        metrics = {metric.name: metric.result() for metric in self.metrics}
        return metrics 

    def test_step(self, features: Dict[Text, tf.Tensor]) -> Dict[Text, tf.Tensor]:
        user_embeddings = self.user_model(features['user_no'])
        positive_item_embeddings = self.item_model(features['item_no'])

        # Reuse calc_loss so the factorized metrics are updated on this batch
        loss = self.calc_loss(user_embeddings, positive_item_embeddings)

        metrics = {metric.name: metric.result() for metric in self.metrics}
        return metrics 
```

We can then instantiate and compile a model like so: 

```python 
simple_model = NonTFRSModel(user_model, item_model, metrics)
# Need to specify run_eagerly=True because we need the shape of the scores 
# in the calc_loss function
simple_model.compile(optimizer=tf.keras.optimizers.Adam(), run_eagerly=True)
```

After that we can just train the model the same as below :)

</div>
---
---
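
The loss in the box above treats every other item in the batch as a negative: row i of the score matrix should put most of its probability mass on column i. Here is a NumPy sketch of that in-batch softmax cross-entropy with identity labels and a sum reduction. It illustrates the idea only, is not the TFRS code, and `in_batch_softmax_loss` is a hypothetical name.

```python
import numpy as np

def in_batch_softmax_loss(query_emb, cand_emb):
    """Summed cross-entropy where query i's positive is candidate i."""
    scores = query_emb @ cand_emb.T                      # (batch, batch) logits
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # Identity labels pick out the diagonal of the log-probability matrix.
    return float(-np.trace(log_probs))

aligned = np.eye(3) * 5.0
# Each query matches its own candidate: loss is close to zero.
loss_good = in_batch_softmax_loss(aligned, aligned)
# Candidates in reversed order: two of three queries point at the wrong item.
loss_bad = in_batch_softmax_loss(aligned, aligned[::-1])
print(loss_good < loss_bad)  # True
```

One consequence of this setup is that popular items show up as negatives more often than rare ones, simply because they appear in more batches.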

In [14]:
train_dataset_interactions = train_dataset.map(lambda x: {
    'user_no': x['user_no'],
    'item_no': x['item_no']
})
test_dataset_interactions = test_dataset.map(lambda x: {
    'user_no': x['user_no'],
    'item_no': x['item_no']
})

train_ds = train_dataset_interactions.shuffle(1_000).batch(4096)
test_ds = test_dataset_interactions.batch(4096)

<div class="alert alert-block alert-info">
In the interest of accelerating training as much as possible, we won't calculate any metrics on the training batches (they are only computed on the validation data). If we were training "for real" we'd probably want to monitor the training and implement early stopping.
</div>

In [15]:
simple_tfrs_model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))
history = simple_tfrs_model.fit(train_ds, 
                                epochs=3, 
                                validation_data=test_ds)

Epoch 1/3
Epoch 2/3
Epoch 3/3


## Evaluation

In [16]:
train_results = simple_tfrs_model.evaluate(train_ds, return_dict=True)
test_results = simple_tfrs_model.evaluate(test_ds, return_dict=True)



In [17]:
print(f"Train top-100 accuracy:  {train_results['factorized_top_k/top_100_categorical_accuracy']}")
print(f"Test top-100 accuracy:  {test_results['factorized_top_k/top_100_categorical_accuracy']}")

Train top-100 accuracy:  0.8510119318962097
Test top-100 accuracy:  0.050610821694135666


The model does slightly better than the baseline, but it is also overfitting like crazy. Quoting from the tutorial, this is due to two factors: 

> 1. Our model is likely to perform better on the data that it has seen, simply because it can memorize it. This overfitting phenomenon is especially strong when models have many parameters. It can be mediated by model regularization and use of user and movie features that help the model generalize better to unseen data.
> 2. The model is re-recommending some of users' already [bought items]. These known-positive watches can crowd out test [items] out of top K recommendations.

## Serving and Qualitative Evaluation

In order to serve the model, we create an "index". Basically this is a way for us to do nearest neighbor search in the embedding space: we take in a "query" (in this case a user), calculate an embedding, and then compare that embedding to the embeddings of all candidate items.

In this case, the number of candidate items is very small, so we just brute force the search. For real-world use cases we would want to use an approximate nearest neighbor search. TFRS allows us to build an index based on [ScaNN](https://github.com/google-research/google-research/tree/master/scann) if we install the optional dependency. 
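
Brute-force search simply scores the query against every candidate and keeps the k best. A NumPy sketch of the idea on a four-item toy catalogue follows; the function name, IDs, and vectors are made up, and this is not how the TFRS layer is implemented internally.

```python
import numpy as np

def brute_force_top_k(query_emb, item_ids, item_emb, k):
    """Score every item against the query and return the k best ids and scores."""
    scores = item_emb @ query_emb     # one dot product per item
    top = np.argsort(-scores)[:k]     # indices of the k highest scores
    return [item_ids[i] for i in top], scores[top]

item_ids = ["a", "b", "c", "d"]
item_emb = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.9, 0.1],
                     [-1.0, 0.0]])
query = np.array([1.0, 0.0])

top_ids, top_scores = brute_force_top_k(query, item_ids, item_emb, k=2)
print(top_ids)  # ['a', 'c']
```

The cost grows linearly with the number of candidates, which is why approximate methods become attractive for large catalogues.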

In [18]:
# Create a model that takes in raw query features, and
index = tfrs.layers.factorized_top_k.BruteForce(simple_tfrs_model.user_model)
# recommends items out of the entire items dataset.
_ = index.index_from_dataset(
        tf.data.Dataset.zip((items_dataset.batch(100), 
                             items_dataset.batch(100).map(simple_tfrs_model.item_model))))

To qualitatively analyze the performance of the model, we can look at the predictions for a random user. 

<div class="alert alert-block alert-info">
<b>Tip: </b> Rerun the next few cells to get predictions for different users. 
    
<b>Tip: </b> The first time you run the <code>%%time</code> cells, it may take a bit longer than normal if there is any <code>@tf.function</code> tracing going on. 
</div>

In [19]:
random_user = np.random.choice(train_df_filtered['user_no'].unique())

In [20]:
%%time
# Get recommendations.
_, titles = index(tf.constant([random_user]))

CPU times: user 4.17 ms, sys: 2.01 ms, total: 6.18 ms
Wall time: 11.2 ms


<div class="alert alert-block alert-info">
<b>Tip: </b> You can also explicitly exclude certain items (e.g. items previously interacted with) by calling <code>query_with_exclusions</code>.
</div>
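
One simple way to think about exclusions is as a mask applied to the scores before ranking. The sketch below sets the scores of excluded items to negative infinity so they can never reach the top k; this is an assumed illustration of the concept with made-up IDs, not a description of how `query_with_exclusions` works internally.

```python
import numpy as np

def top_k_with_exclusions(scores, item_ids, excluded, k):
    """Rank items by score, skipping any id listed in `excluded`."""
    item_ids = np.asarray(item_ids)
    # Excluded items get a score of -inf and therefore sort last.
    masked = np.where(np.isin(item_ids, list(excluded)), -np.inf, scores)
    top = np.argsort(-masked)[:k]
    return [str(item_ids[i]) for i in top]

scores = np.array([1.0, 0.0, 0.9, -1.0])
item_ids = ["a", "b", "c", "d"]

recs = top_k_with_exclusions(scores, item_ids, excluded={"a"}, k=2)
print(recs)  # ['c', 'b']
```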

In [21]:
items_to_exclude = train_df_filtered.loc[train_df_filtered['user_no'] == random_user]['item_no'].unique()

In [22]:
%%time
_, titles = index.query_with_exclusions(tf.constant([random_user]), 
                                       tf.constant([items_to_exclude]))

CPU times: user 493 ms, sys: 13.1 ms, total: 506 ms
Wall time: 550 ms


Now we can actually examine these recommendations to see if they make any sense! 

**Historical purchases**

In [23]:
train_df_filtered.loc[train_df_filtered['user_no'] == random_user]

Unnamed: 0,user_no,item_no,gender_description,brand,product_group,first_interaction_month
1086,-4227527154985019677,3644027680429875428,unisex,kuling,clothing sets,9
22976,-4227527154985019677,7547780292943122816,girls,hype,bags,9
56639,-4227527154985019677,-488449987991724866,girls,molo,tops,10
78130,-4227527154985019677,5072443198733158252,girls,hype,bags,9
127732,-4227527154985019677,-697862567626410763,girls,molo,bags,10
172484,-4227527154985019677,-932661838383765191,girls,kuling,coveralls,2
178817,-4227527154985019677,-50943422945763155,girls,hype,bags,9
203611,-4227527154985019677,-6961812644339906604,girls,emma och malena maternity,maternity tops,6
240005,-4227527154985019677,-7279020476320904084,girls,le toy van,role play,3
323177,-4227527154985019677,-7380651070414010617,unisex,kavat,sandals,3


**Recommendations**

In [24]:
recommendations = [item.numpy().decode() for item in titles[0]]
item_info_df.loc[item_info_df['item_no'].isin(recommendations)]

Unnamed: 0,item_no,colour,gender_description,brand,product_group,min_age,max_age
6819,6248379259425891960,yellow,unisex,mainio,bottoms,0.125,10.0
13381,40742030666431930,blue,unisex,kavat,boots,1.0,11.0
16889,7423339463580660099,white,unisex,mini rodini,fleeces and midlayers,0.375,11.0
25142,-2685666217734801200,pink,unisex,stoy,bicycles and other vehicles,,
35926,2649172241095163913,white,unisex,mini rodini,coats and jackets,0.875,11.0
36394,1011817660855101910,white,unisex,stoy,role play,,
47917,1052690307609796416,purple,unisex,kuling,boots,0.875,11.0
48272,7799576263764959168,blue,unisex,kuling,swimwear and coverups,0.625,6.0
51095,-6507239208000082480,black,unisex,kuling,boots,1.0,10.0
57234,-2121255597513549382,purple,unisex,kuling,coveralls,0.625,4.0


---
---
---

## **Content-Based Filtering**

Another way to approach recommendations is to base them solely on content metadata, rather than learning from patterns in interactions across the customer base as a whole. As such, we will likely not get any "novel" recommendations and instead many of the recommendations will be very similar to the user's purchase history. 

In order to take advantage of some TFRS "machinery", we can build user and item models as before. However, this time instead of *learning* embeddings for each individual user and each item, we will manually compute the representations of each user and each item. 

In this case an item embedding will just consist of the concatenated one-hot encodings of the brand, product group, and gender description, and a user embedding will be the average of all the item embeddings in their purchase history. 
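
A small NumPy sketch may help to see what these representations mean. With one-hot item vectors and averaged user vectors, the dot product between a user and an item adds up, feature by feature, the share of the user's purchases that have the same value as the item. The vocabularies and purchase history below are made up, and only two of the three features are used to keep the example short.

```python
import numpy as np

brands = ["molo", "kuling"]            # hypothetical vocabularies
genders = ["boys", "girls", "unisex"]

def one_hot(value, vocab):
    vec = np.zeros(len(vocab))
    vec[vocab.index(value)] = 1.0
    return vec

def item_embedding(brand, gender):
    """Concatenate the one-hot encodings of gender and brand."""
    return np.concatenate([one_hot(gender, genders), one_hot(brand, brands)])

# A user's purchase history as (brand, gender) pairs.
history = [("molo", "girls"), ("molo", "unisex"),
           ("kuling", "girls"), ("molo", "girls")]
user_embedding = np.mean([item_embedding(b, g) for b, g in history], axis=0)
print(user_embedding)  # [0.   0.75 0.25 0.75 0.25]

# 75% of purchases were "girls" and 75% were "molo", so the score is 1.5.
score = float(user_embedding @ item_embedding("molo", "girls"))
print(score)  # 1.5
```

All items sharing the same metadata get identical scores, which helps explain why such recommendations tend to lack diversity.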

In [25]:
top_brands = get_vocab(train_df_filtered, 'brand', 100)
top_groups = get_vocab(train_df_filtered, 'product_group', 50)
COLS_TO_KEEP = ['gender_description', 'brand', 'product_group']

def precompute_embeddings(df, agg_col):
    df.loc[:, 'brand'] = df['brand'].apply(lambda x: x if x in top_brands else 'other')
    df.loc[:, 'product_group'] = df['product_group'].apply(lambda x: x if x in top_groups else 'other')
    df_one_hot = pd.get_dummies(df[COLS_TO_KEEP + [agg_col]], columns=COLS_TO_KEEP)
    return df_one_hot.groupby(agg_col).agg('mean')

In [26]:
precomputed_user_embeddings = precompute_embeddings(train_df_filtered, agg_col='user_no')
precomputed_item_embeddings = precompute_embeddings(item_info_df, agg_col='item_no')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[item] = s


In [27]:
display(precomputed_user_embeddings)

Unnamed: 0_level_0,gender_description_boys,gender_description_girls,gender_description_unisex,brand_1+ in the family,brand_a happy brand,brand_a monday in copenhagen,brand_adidas,brand_barts,brand_beau loves,brand_billieblush,...,product_group_stroller parts and customisati,product_group_strollers,product_group_swimwear and coverups,product_group_tableware,product_group_textile,product_group_tops,product_group_trainers,product_group_underwear,product_group_vehicles,product_group_water toys
user_no,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
-1012876894217140776,0.000000,0.277778,0.722222,0.000000,0.166667,0.000000,0.055556,0.0,0.0,0.0,...,0.000000,0.000000,0.055556,0.0,0.000000,0.000000,0.055556,0.166667,0.0,0.0
-1022934284196456562,0.000000,0.166667,0.833333,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,...,0.000000,0.000000,0.111111,0.0,0.055556,0.111111,0.000000,0.000000,0.0,0.0
-1031375167955555195,0.000000,0.736842,0.263158,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.0,0.000000,0.210526,0.052632,0.105263,0.0,0.0
-1041412818309902183,0.200000,0.550000,0.250000,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.0,0.000000,0.600000,0.000000,0.000000,0.0,0.0
-1044709512978776856,0.200000,0.050000,0.750000,0.000000,0.000000,0.000000,0.050000,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.0,0.000000,0.050000,0.050000,0.150000,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
965297199758713016,0.166667,0.111111,0.722222,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,...,0.055556,0.055556,0.000000,0.0,0.000000,0.166667,0.000000,0.055556,0.0,0.0
968073716034597193,0.111111,0.777778,0.111111,0.000000,0.000000,0.000000,0.000000,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.0,0.000000,0.166667,0.000000,0.000000,0.0,0.0
976567085753614314,0.000000,0.947368,0.052632,0.000000,0.000000,0.052632,0.000000,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.0,0.052632,0.210526,0.000000,0.157895,0.0,0.0
987479213534973896,0.000000,0.277778,0.722222,0.166667,0.000000,0.000000,0.000000,0.0,0.0,0.0,...,0.000000,0.000000,0.000000,0.0,0.055556,0.000000,0.000000,0.277778,0.0,0.0


In [28]:
display(precomputed_item_embeddings)

Unnamed: 0_level_0,gender_description_boys,gender_description_girls,gender_description_unisex,brand_1+ in the family,brand_a happy brand,brand_a monday in copenhagen,brand_adidas,brand_barts,brand_beau loves,brand_billieblush,...,product_group_stroller parts and customisati,product_group_strollers,product_group_swimwear and coverups,product_group_tableware,product_group_textile,product_group_tops,product_group_trainers,product_group_underwear,product_group_vehicles,product_group_water toys
item_no,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
-10001501373726678,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
-1000182030290830232,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
-1000183384954605528,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
-1000321715684049686,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
-1000570342615087077,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
999030474988862413,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
999032067904529387,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
999084409713144028,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
999328979874402204,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [29]:
def create_precomputed_embedding_model(precomputed_embeddings):
    num_columns = len(precomputed_embeddings.columns)
    embedding_matrix = np.concatenate((np.zeros((1, num_columns)),
                                      precomputed_embeddings.values))
    embedding_layer = tf.keras.layers.Embedding(*embedding_matrix.shape,
                                                embeddings_initializer=tf.keras.initializers.Constant(
                                                    embedding_matrix),
                                                trainable=False)
    model = tf.keras.Sequential([
        tf.keras.layers.StringLookup(
            vocabulary=precomputed_embeddings.index,
            num_oov_indices=1
        ),
        embedding_layer
    ])
    return model

In [30]:
user_model = create_precomputed_embedding_model(precomputed_user_embeddings)
item_model = create_precomputed_embedding_model(precomputed_item_embeddings)

The following cell demonstrates that all the item model is doing is looking up the one-hot encoding of brand, product group, and gender description for an item.

In [31]:
item_model(tf.constant(['-1000183384954605528']))

<tf.Tensor: shape=(1, 155), dtype=float32, numpy=
array([[0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>

Even though we didn't train a `tfrs.Model`, we can still create an index exactly as we did above! 

In [32]:
items_dataset = tf.data.Dataset.from_tensor_slices(item_info_df['item_no'])
# Create a model that takes in raw query features, and
index = tfrs.layers.factorized_top_k.BruteForce(user_model)
# recommends items out of the entire items dataset.
_ = index.index_from_dataset(
  tf.data.Dataset.zip((items_dataset.batch(100), items_dataset.batch(100).map(item_model)))
)

In [33]:
random_user = np.random.choice(train_df_filtered['user_no'].unique())

In [34]:
items_to_exclude = train_df_filtered.loc[train_df_filtered['user_no'] == random_user]['item_no'].unique()

In [35]:
%%time
_, titles = index.query_with_exclusions(tf.constant([random_user]), 
                                       tf.constant([items_to_exclude]))

CPU times: user 189 ms, sys: 7.29 ms, total: 196 ms
Wall time: 211 ms


In [36]:
train_df_filtered.loc[train_df_filtered['user_no'] == random_user]

Unnamed: 0,user_no,item_no,gender_description,brand,product_group,first_interaction_month
44764,-582134469683447242,-7062902217557939573,unisex,reima,headwear,11
61710,-582134469683447242,2164894343335326037,unisex,tartine et chocolat,shoes,9
98899,-582134469683447242,3933678542901544177,boys,molo,jumpers and knitwear,9
103112,-582134469683447242,4490900546980254270,boys,jacadi,jumpers and knitwear,11
178498,-582134469683447242,1193766607815201020,boys,other,all in ones,9
181962,-582134469683447242,-4251817304791664609,unisex,fub,headwear,11
187693,-582134469683447242,-8038060062925076296,boys,other,trainers,3
195508,-582134469683447242,-5170604594377873809,girls,billieblush,tops,10
201915,-582134469683447242,-809248011627862386,girls,jacadi,dresses,9
214200,-582134469683447242,1305137013358548518,unisex,other,trainers,3


In [37]:
recommendations = [item.numpy().decode() for item in titles[0]]
item_info_df.loc[item_info_df['item_no'].isin(recommendations)]

Unnamed: 0,item_no,colour,gender_description,brand,product_group,min_age,max_age
15,457467103957514638,beige,unisex,other,headwear,0.125,0.125
19,5751912426656680356,white,unisex,other,headwear,0.125,2.0
209,3958544075576929766,beige,unisex,other,headwear,0.625,8.0
252,5765193949212180695,yellow,unisex,other,headwear,0.375,2.0
495,7912340583694710060,grey,unisex,other,headwear,0.125,2.0
670,-8259248676814643686,black,unisex,other,headwear,,
942,2589311358657624625,black,unisex,other,headwear,0.125,4.0
1022,5146575204438291293,grey,unisex,other,headwear,,
1092,7732139752943756076,white,unisex,other,headwear,0.125,2.0
1406,1088769495416475325,blue,unisex,other,headwear,0.125,0.875


**Usually the recommendations have little or no diversity**

In [38]:
test_users_dataset = tf.data.Dataset.from_tensor_slices(test_df_filtered['user_no'])

In [39]:
%%time
_, retrieved_items = index(tf.constant(test_df_filtered['user_no']), k=100)

CPU times: user 2.57 s, sys: 448 ms, total: 3.02 s
Wall time: 1.28 s


~1s to produce 100 recommendations for each of the ~2000 test interactions (one query per row of `test_df_filtered`).

In [40]:
ids_match = tf.cast(tf.math.equal(true_candidates, retrieved_items), tf.float32)

In [41]:
metrics = [tf.keras.metrics.Mean() for k in ks]
for k, metric in zip(ks, metrics):
    # By slicing until :k we assume scores are sorted.
    # Clip to only count multiple matches once.
    match_found = tf.clip_by_value(
        tf.reduce_sum(ids_match[:, :k], axis=1, keepdims=True),
        0.0, 1.0
    )
    metric.update_state(match_found)

In [42]:
for k, metric in zip(ks, metrics):
    print(f'Top {k} categorical accuracy: {metric.result().numpy():.5f}')

Top 1 categorical accuracy: 0.00044
Top 5 categorical accuracy: 0.00262
Top 10 categorical accuracy: 0.00698
Top 50 categorical accuracy: 0.03752
Top 100 categorical accuracy: 0.06937


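At top-50 and top-100, this very simple model actually does better than the baseline, and at top-100 it also beats our simple collaborative filtering model (0.069 vs. 0.051). For k ≤ 10, however, it trails the baseline.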

What if there was a way to get the best of both worlds?

---

# **Using Additional Features to Create a Hybrid Model**

From the [using rich features tutorial](https://www.tensorflow.org/recommenders/examples/featurization): 

> One of the great advantages of using a deep learning framework to build recommender models is the freedom to build rich, flexible feature representations.

We can use additional features to build our representations. In this case, we will use the metadata features from our content-based filtering "model" (brand, product group, gender description) to create richer representations of each item.

Intuitively, the collaborative filtering model created item representations based solely on purchasing behavior: "many customers bought items A, B, and C, so their representations should be 'similar'. If a customer has bought items A and B, we should recommend C." The content-based filtering model, on the other hand, created item representations based solely on item metadata: "items X, Y, and Z all have the same brand, product group, and gender description, so their representations should be 'similar'. If a customer has bought items X and Y, we should recommend Z."

With this next model, we will try to combine these ideas so that item representations capture **both** purchasing behavior and item characteristics based on metadata features. 

We will only add features to the item model, but there is nothing stopping us from adding them to the user model as well. If, for example, we had demographic features, they could be worth adding to the user model. We could also add **context** features to the user/query model, such as time, device, etc.

In [43]:
class ItemModel(tf.keras.Model):
    def __init__(self, 
                 train_df):
        super().__init__()
        # Embed the item id into 16 dimensions
        self.item_embedding = create_embedding_model(train_df, "item_no", embedding_dim=16)

        # There are only three gender description values in the data, so just one-hot encode them
        gender_description = get_vocab(train_df, 'gender_description')
        self.gender_description_lookup = tf.keras.layers.StringLookup(vocabulary=gender_description, 
                                                                      output_mode='one_hot',
                                                                      num_oov_indices=0)
        self.brand_embedding = create_embedding_model(train_df, 
                                                     "brand", 
                                                     embedding_dim=8)
        self.product_group_embedding = create_embedding_model(train_df, 
                                                              "product_group", 
                                                              embedding_dim=5)
        
    def call(self, inputs):
        """
        Item representation is the concatenation of ID embedding, gender description one-hot 
        encoding, brand embedding, and product group embedding
        """
        return tf.concat([
             self.item_embedding(inputs['item_no']),
             self.gender_description_lookup(inputs['gender_description']),
             self.brand_embedding(inputs['brand']),
             self.product_group_embedding(inputs['product_group'])
        ], axis=1)
    
class TFRSContextModel(tfrs.models.Model):
    def __init__(self, 
                 user_model,
                 item_model, 
                 items_w_context):
        super().__init__()
        self.user_model = user_model
        self.item_model = item_model
        self.task = tfrs.tasks.Retrieval(
            metrics = tfrs.metrics.FactorizedTopK(
                candidates=items_w_context.batch(128).map(self.item_model)
            )
        )
        
    def compute_loss(self, inputs, training=False):
        query_embeddings = self.user_model(inputs['user_no'])
        candidate_embeddings = self.item_model({
            'item_no': inputs['item_no'],
            'gender_description': inputs['gender_description'],
            'brand': inputs['brand'],
            'product_group': inputs['product_group']
        })
        
        return self.task(query_embeddings, candidate_embeddings, compute_metrics=not training)
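
Because the item representation is a concatenation, its width is the sum of its parts: 16 (item ID) + 3 (gender one-hot) + 8 (brand) + 5 (product group) = 32. The retrieval task scores users against items with dot products, so the user embedding has to have the same width. The default dimension of `create_embedding_model` is not shown in this section, so treat the matching user width below as an assumption. A minimal NumPy sketch with random stand-in values:

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size = 4

# Random stand-ins for the four parts of the item representation,
# using the widths from ItemModel above.
item_id_emb = rng.normal(size=(batch_size, 16))
gender_one_hot = np.eye(3)[[0, 2, 1, 0]]          # three gender values, one-hot
brand_emb = rng.normal(size=(batch_size, 8))
product_group_emb = rng.normal(size=(batch_size, 5))

item_repr = np.concatenate(
    [item_id_emb, gender_one_hot, brand_emb, product_group_emb], axis=1
)

# Assumption: the user embedding has the same width as the item representation.
user_repr = rng.normal(size=(batch_size, item_repr.shape[1]))

# Dot-product affinity between every user and every item in the batch.
scores = user_repr @ item_repr.T
print(item_repr.shape, scores.shape)
```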

In [44]:
items_df = item_info_df.loc[item_info_df['item_no'].isin(items)][
    ['item_no', 'gender_description', 'brand', 'product_group']]

items_dataset_w_context = tf.data.Dataset.from_tensor_slices(dict(items_df))
user_model = create_embedding_model(train_df_filtered, "user_no")
item_model = ItemModel(train_df_filtered)

model = TFRSContextModel(user_model, item_model, items_dataset_w_context)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

In [45]:
train_ds = train_dataset.shuffle(1_000).batch(4096)
test_ds = test_dataset.batch(4096)

In [46]:
history = model.fit(train_ds, epochs=3, validation_data=test_ds)

Epoch 1/3
Epoch 2/3
Epoch 3/3


In [47]:
train_results = model.evaluate(train_ds, return_dict=True)
test_results = model.evaluate(test_ds, return_dict=True)



In [48]:
print(f"Train top-100 accuracy:  {train_results['factorized_top_k/top_100_categorical_accuracy']}")
print(f"Test top-100 accuracy:  {test_results['factorized_top_k/top_100_categorical_accuracy']}")

Train top-100 accuracy:  0.3460131585597992
Test top-100 accuracy:  0.12521815299987793


There is still some overfitting, but it is much better than before and the top-100 accuracy is the best of all the models so far!

In [49]:
# Create a model that takes in raw query features and
# recommends items out of the entire items dataset.
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
items_identifier_ds = items_dataset_w_context.map(lambda x: x['item_no'])
_ = index.index_from_dataset(
        tf.data.Dataset.zip((items_identifier_ds.batch(100), 
                             items_dataset_w_context.batch(100).map(model.item_model))))
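
Conceptually, a brute-force index scores the query against every candidate and keeps the `k` best. The NumPy sketch below illustrates that idea only; it is not how TFRS implements `BruteForce`, and the function name, item names, and embeddings are made up.

```python
import numpy as np


def brute_force_top_k(query_emb, candidate_emb, candidate_ids, k):
    """Hypothetical helper: exhaustive dot-product search over all candidates."""
    scores = query_emb @ candidate_emb.T                 # (n_queries, n_candidates)
    top_idx = np.argsort(-scores, axis=1)[:, :k]         # best scores first
    top_scores = np.take_along_axis(scores, top_idx, axis=1)
    return top_scores, candidate_ids[top_idx]


# Toy candidates and a single toy query (values made up for illustration).
candidate_ids = np.array(["item_a", "item_b", "item_c"])
candidate_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query_emb = np.array([[2.0, 0.5]])

top_scores, top_ids = brute_force_top_k(query_emb, candidate_emb, candidate_ids, k=2)
print(top_ids, top_scores)
```

The cost grows linearly with the number of candidates, which is fine for a catalogue of this size.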

In [50]:
random_user = np.random.choice(train_df_filtered['user_no'].unique())
items_to_exclude = train_df_filtered.loc[train_df_filtered['user_no'] == random_user]['item_no'].unique()

In [51]:
%%time
_, titles = index.query_with_exclusions(tf.constant([random_user]), 
                                       tf.constant([items_to_exclude]))

CPU times: user 133 ms, sys: 4.58 ms, total: 137 ms
Wall time: 184 ms
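
The idea behind querying with exclusions is to skip items the user has already bought before taking the top `k`. A minimal NumPy sketch of that idea follows; the helper name and toy data are hypothetical, and this is not the TFRS implementation.

```python
import numpy as np


def top_k_with_exclusions(scores, candidate_ids, excluded_ids, k):
    """Hypothetical helper: top-k items for one user, skipping excluded items."""
    # Excluded items get a score of -inf so they can never rank in the top k.
    masked = np.where(np.isin(candidate_ids, excluded_ids), -np.inf, scores)
    top_idx = np.argsort(-masked)[:k]
    return candidate_ids[top_idx]


# Toy scores for one user (values made up for illustration).
candidate_ids = np.array(["item_a", "item_b", "item_c", "item_d"])
scores = np.array([0.9, 0.8, 0.7, 0.1])
already_bought = np.array(["item_a", "item_c"])

recs = top_k_with_exclusions(scores, candidate_ids, already_bought, k=2)
print(recs)
```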


In [52]:
train_df_filtered.loc[train_df_filtered['user_no'] == random_user]

Unnamed: 0,user_no,item_no,gender_description,brand,product_group,first_interaction_month
15282,-355917100050929382,2727390070415090157,unisex,småfolk,all in ones,11
67539,-355917100050929382,941004878426231475,boys,champion,jumpers and knitwear,11
87417,-355917100050929382,-7150404021226002845,unisex,mini rodini,jumpers and knitwear,11
121109,-355917100050929382,-7679098013634930632,boys,timberland,all in ones,11
130455,-355917100050929382,-4222212719177975921,unisex,småfolk,all in ones,11
158873,-355917100050929382,7902408375612206522,boys,champion,tops,11
175769,-355917100050929382,8398670320846353131,boys,moschino kid-teen,jumpers and knitwear,11
208072,-355917100050929382,7783670181111540799,girls,buddy & hope,bottoms,11
208302,-355917100050929382,-826806050499523555,unisex,moschino kid-teen,jumpers and knitwear,11
241655,-355917100050929382,-2902137136503355060,boys,moncler,coats and jackets,10


In [53]:
recommendations = [item.numpy().decode() for item in titles[0]]
item_info_df.loc[item_info_df['item_no'].isin(recommendations)]

Unnamed: 0,item_no,colour,gender_description,brand,product_group,min_age,max_age
1065,5310331093029041126,blue,boys,småfolk,all in ones,0.125,0.625
9680,-6590454116463224914,blue,boys,småfolk,all in ones,0.125,2.0
13531,-2692550025455852972,green,boys,småfolk,all in ones,0.125,2.0
13751,-6529328654082724751,blue,boys,småfolk,all in ones,0.125,2.0
22292,-3022787317072387909,green,boys,småfolk,all in ones,0.125,2.0
26803,1192638001384264749,yellow,boys,småfolk,underwear,0.875,6.0
31310,3653303747819224244,blue,boys,småfolk,all in ones,0.125,2.0
46602,5338400809252936593,green,boys,småfolk,all in ones,0.125,2.0
50453,7554514181884221350,blue,boys,småfolk,all in ones,0.125,2.0
60869,-5695807411571609566,blue,boys,småfolk,all in ones,0.125,0.625


### **Performance**

In [55]:
%%time
_, retrieved_items = index(tf.constant(test_df_filtered['user_no']), k=100)

CPU times: user 626 ms, sys: 148 ms, total: 773 ms
Wall time: 303 ms


It takes less than a second to predict the top 100 items for ~2000 users, and that's with a brute-force search.