<div align="center"><a href="https://www.nvidia.com/en-us/deep-learning-ai/education/"><img src="../../images/DLI_Header.png"></a></div>

# Implicit Feedback and Top K Accuracy

In the lab so far, we've been trying to predict the user's rating of an item. This type of feedback is **explicit**, meaning the user consciously gave us feedback. Other examples include a "thumbs up" or a "like". This is valuable insight, but not all users want to take the time to give thoughtful feedback, so many companies use **implicit** feedback instead, where a user's behavior serves as a proxy for their satisfaction.

## Objective
* Understand how to extract implicit feedback.
* Learn how to use the Top K Accuracy metric.

## Implicit Feedback

Let's take a look at YouTube. YouTube trains its recommendation model on [watch time](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/45530.pdf). The user isn't consciously giving feedback to YouTube when watching a video, but we might assume that if a user watches a long video, they probably enjoy it. And the longer users stay on the site, the more ads YouTube can show.

Another example is Amazon. In 2003, its algorithm was based on Item-to-Item Collaborative Filtering. Remember our User-Item interaction matrix from Lab 1? Instead of comparing users with items, Amazon used the same item indices for both rows and columns. Then, it would keep track of how often one item was purchased with another. Amazon has since moved on to researching and applying a wide variety of systems, a history of which can be found [here](https://www.amazon.science/the-history-of-amazons-recommendation-algorithm).
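
The co-purchase bookkeeping described above can be sketched in a few lines. This is only a toy illustration of counting how often two items appear in the same order, not Amazon's actual algorithm; the `orders` data and the helper `most_co_purchased` are made up for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical orders: each inner list holds the item indices bought together.
orders = [
    [0, 1, 2],
    [0, 2],
    [1, 2],
    [0, 2, 3],
]

# Sparse item x item table: (row item, column item) -> times bought together.
co_purchases = Counter()
for order in orders:
    # Count every unordered pair of distinct items that share an order.
    for a, b in combinations(sorted(set(order)), 2):
        co_purchases[(a, b)] += 1
        co_purchases[(b, a)] += 1  # keep the table symmetric


def most_co_purchased(item, k=2):
    """Return up to k items most often bought with `item`, most frequent first."""
    neighbors = [(other, n) for (i, other), n in co_purchases.items() if i == item]
    # Sort by descending count, breaking ties by the lower item index.
    neighbors.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in neighbors[:k]]


print(most_co_purchased(0))
```

Note that no rating is ever recorded here: the purchase itself is the feedback.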

[Facebook](https://engineering.fb.com/core-data/recommending-items-to-more-than-a-billion-people/) similarly uses a form of implicit feedback based on a confidence score of a user's explicit feedback.

For more information on implicit feedback, check out [this paper](https://terpconnect.umd.edu/~oard/pdf/aaai98.pdf).

|Types of Explicit Feedback|Types of Implicit Feedback|
|--------|--------|
| 1 - 5 stars | Watch time |
| Like / Dislike | Click-through rate|
| + 1 Button | Link sharing |
| Written review | Amount bought |


## Deriving Implicit Feedback

Finding implicit feedback can sometimes take a bit of creativity. Let's load the Amazon Dataset we've been using.

In [None]:
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, concatenate, Dense

ratings = pd.read_csv("../../data/task_2.csv")
ratings.head()

Rather than using the rating directly, let us simply consider whether an item was purchased. To keep things simple, let us see if we can use the user embedding to predict which item the user purchased and reviewed.

In [None]:
ratings = ratings[['item_index', 'user_embed_0', 'user_embed_1', 'valid']]
ratings.head()

If we are not going to use our `'overall'` column as the label, what should we choose? Since we're trying to predict what item the user is making a rating for, we'll use `item_index` and assign each item a probability.

This unfortunately means we can't use any user x item interaction features or properties of the item as inputs, but there are other advantages to this structure. Since we have an output for each item, we only need to run the model once to get the top recommended items for a user. No need to compare it to every item since that's built into the model!

To do this, we'll need to keep track of the number of items. We've already factorized the index, so the indices run from 0 upward and the number of items is the maximum index plus one.

In [None]:
NUM_ITEMS = ratings['item_index'].max() + 1
NUM_ITEMS
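
To see why the maximum index plus one gives the item count, here is a minimal plain-Python sketch of factorizing. The raw IDs are made up, and this only approximates what a helper such as `pandas.factorize` does (codes assigned in order of first appearance).

```python
# Hypothetical raw item IDs, in the order they appear in the reviews.
raw_item_ids = ["B00A", "B00C", "B00A", "B00B", "B00C"]

item_to_index = {}
item_index = []
for item_id in raw_item_ids:
    # Assign the next unused integer the first time an ID is seen.
    if item_id not in item_to_index:
        item_to_index[item_id] = len(item_to_index)
    item_index.append(item_to_index[item_id])

# Indices start at 0 and have no gaps, so the count is max + 1.
num_items = max(item_index) + 1
print(item_index, num_items)
```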

As before, let us build a tf.data pipeline. Since we have an output for each item, our loss will be very large at first. To prevent Not a Number (NaN) errors, we'll train based on batches instead of epochs. If we leave the `count` argument of a tf.data dataset's [repeat](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#repeat) method as `None`, the dataset will repeat indefinitely.

We'll define how many batches to train on by specifying a number of steps when we fit the data.

In [None]:
def get_ds_from_df(df_tmp, cols, label, batch_size=1028, repeats=None):
    dataset = (
        tf.data.Dataset.from_tensor_slices(
            (df_tmp[cols].to_dict("list"), df_tmp[label].values)
        )
        .shuffle(len(df_tmp))
        .repeat(repeats)
        .batch(batch_size)
        .prefetch(1)
    )
    return dataset
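
As a rough analogy for how repeating and batching interact, the sketch below mimics the pipeline with `itertools` instead of tf.data. The `batches` helper is hypothetical and leaves out shuffling and prefetching. It only shows that an unbounded stream must be cut off by a step count, while `repeats=1` ends on its own.

```python
from itertools import cycle, islice


def batches(examples, batch_size, repeats=None):
    """Yield batches, restarting the data forever when repeats is None."""
    if repeats is None:
        stream = cycle(examples)
    else:
        stream = (x for _ in range(repeats) for x in examples)
    while True:
        batch = list(islice(stream, batch_size))
        if not batch:
            return  # only reachable when repeats is a finite number
        yield batch


examples = [0, 1, 2, 3, 4]

# Infinite stream: we must say how many steps (batches) to take.
train_steps = list(islice(batches(examples, 2), 4))

# One pass: the generator stops by itself, the last batch may be smaller.
valid_batches = list(batches(examples, 2, repeats=1))

print(train_steps)
print(valid_batches)
```

Because repeating happens before batching, a batch can straddle the point where the data restarts, as the third training batch does here.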

Next, let's build our input tensors, just like the previous lab.

In [None]:
# shape must be a tuple: (1,) is a one-element tuple, while (1) is just the integer 1.
input_user_embed_0 = Input(name="user_embed_0", shape=(1,), dtype="float32")
input_user_embed_1 = Input(name="user_embed_1", shape=(1,), dtype="float32")

In [None]:
input_tensor = {
    "user_embed_0": input_user_embed_0,
    "user_embed_1": input_user_embed_1
}

## Building the model

We'll build our neural network mostly as usual, but the difference from previous labs is the last layer. Now, we have an output neuron for each item. We're using a [Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) layer this time, but since we're predicting the chance that a user will review an item, [Softmax](https://www.tensorflow.org/api_docs/python/tf/nn/softmax) would also work. Our dense layer has a linear activation, and its outputs are called `logits`. For our use case, we only need the highest-scoring predictions, and softmax does not change their order, so calculating probabilities here would be extra computation.

In [None]:
# Dense Neural Network
inputs = concatenate([input_user_embed_0, input_user_embed_1], axis=1)
x = Dense(10000, activation="relu", name="hidden_1")(inputs)
x = Dense(5000, activation="relu", name="hidden_2")(x)
output_tensor = Dense(NUM_ITEMS, activation=None, name="logits")(x)
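
To check the claim that probabilities are not needed for ranking, here is a small plain-Python sketch. The logits are made-up values for four items, and `softmax` and `ranking` are helpers written for this illustration only.

```python
import math


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ranking(scores):
    """Return item indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


logits = [1.5, -0.3, 4.0, 0.2]  # hypothetical outputs for four items
probs = softmax(logits)

print(ranking(logits))
print(ranking(probs))
```

Softmax is strictly increasing in each logit, so both rankings agree and the recommended items are the same either way.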

Let's visually inspect the model to verify it's been set up correctly.

In [None]:
model = tf.keras.Model(input_tensor, output_tensor)
tf.keras.utils.plot_model(model, show_shapes=True, rankdir='LR')

Since we're using the item index as our label, we will use [SparseCategoricalCrossentropy](https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy) for our loss. This way, the label is already the index of the correct output, so there is no need to build a one-hot vector for each example. This is very useful in situations like our Amazon dataset where we have thousands of outputs! We will set `from_logits` to `True` so the loss function applies softmax to our logits before comparing them against the label.

To stay consistent, we'll use [SparseCategoricalAccuracy](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/SparseCategoricalAccuracy). This accuracy metric expects data to be in the same format as what is used for `SparseCategoricalCrossentropy` (meaning the label is an index). A debugging tip: if the model accuracy is always `0`, check that the accuracy metric matches the format of the labels the model is learning on.

In [None]:
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metrics = [tf.keras.metrics.SparseCategoricalAccuracy(name="acc")]
model.compile(optimizer="adam", loss=loss, metrics=metrics)
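
For intuition, the sketch below computes by hand what a sparse cross-entropy loss on logits and a sparse accuracy measure for individual examples. It is a plain-Python illustration with made-up numbers, not the Keras implementation (which also averages the loss over the batch).

```python
import math


def sparse_cross_entropy_from_logits(logits, label):
    """Negative log-probability of the labeled item, computed from raw logits."""
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[label]


def sparse_accuracy(batch_logits, labels):
    """Fraction of examples whose highest logit sits at the label index."""
    correct = 0
    for logits, label in zip(batch_logits, labels):
        predicted = max(range(len(logits)), key=lambda i: logits[i])
        correct += predicted == label
    return correct / len(labels)


batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]  # hypothetical model outputs
labels = [0, 1]                                      # integer item indices

losses = [sparse_cross_entropy_from_logits(l, y) for l, y in zip(batch_logits, labels)]
print(losses, sparse_accuracy(batch_logits, labels))

# A model that scores every item equally gets a loss of log(number of items).
uniform_loss = sparse_cross_entropy_from_logits([0.0] * 4, 2)
```

The last line shows why the loss starts out large with many outputs: with evenly spread scores the loss equals the logarithm of the number of items.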

Ok, let's prepare our tf.data dataset. We'll specify the number of repeats for our validation data, but not for training. Leaving the number of repeats unspecified makes a dataset loop forever, so an unbounded validation dataset would make the validation step of our training run forever. We'll pass `repeats=1` so that the validation data is gone through exactly once.

In [None]:
data_cols = ['user_embed_0', 'user_embed_1']
train_ds = ratings[~ratings["valid"]]
train_ds = get_ds_from_df(train_ds, data_cols, 'item_index')

In [None]:
valid_ds = ratings[ratings["valid"]]
valid_ds = get_ds_from_df(valid_ds, data_cols, 'item_index', repeats=1)

Alright, moment of truth! Let's see how well our model trains. For [model.fit](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit) we've defined both `epochs` and `steps_per_epoch`.

`steps_per_epoch` is how many batches we want to train on per `epoch`. This isn't an epoch in the traditional machine learning sense of one full pass over the dataset. Here, an epoch is a collection of batches that our model ingests one step at a time, just like how a set is a number of repetitions in weight training.

In [None]:
history = model.fit(train_ds, validation_data=valid_ds, epochs=5, steps_per_epoch=20)
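
The arithmetic behind these settings is worth spelling out. The sketch assumes every batch is full (which holds for an endlessly repeating dataset) and uses a made-up training set size, `n_train`, purely for comparison.

```python
import math

batch_size = 1028       # default in get_ds_from_df
steps_per_epoch = 20
epochs = 5

# Examples the model sees under the settings above.
examples_per_epoch = batch_size * steps_per_epoch
total_examples = examples_per_epoch * epochs

# Hypothetical dataset size, to compare against a traditional full-pass epoch.
n_train = 1_000_000
steps_for_full_pass = math.ceil(n_train / batch_size)

print(examples_per_epoch, total_examples, steps_for_full_pass)
```

With these numbers, one "epoch" of 20 steps covers only a small slice of what a full pass would.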

## Top K Accuracy

Hmm, it looks like our accuracy is a bit low, less than 1%. This is due to a number of reasons:
* We are not using any contextual information other than the user embeddings.
* Users rate multiple items, not just one.
* Think of how often a user chooses to interact with an advertisement. Accuracy for these types of recommendation models is historically low. This is not a bad thing! If users interact with an advertisement 10% of the time, that can still lead to good business.

To account for this, we'll use another metric called [Top K Accuracy](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/TopKCategoricalAccuracy?hl=fr). What this means is we'll look at the top K recommended items and see if any of them match the label. If one does, we'll call that a hit; otherwise, it's a miss.
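
The hit-or-miss idea can be written out in plain Python. This is a sketch of the concept with made-up scores, not the Keras metric itself. In particular, ties are broken here by the lower item index, which may differ from how a library handles them.

```python
def top_k_accuracy(batch_scores, labels, k):
    """Fraction of examples whose label is among the k highest-scoring items."""
    hits = 0
    for scores, label in zip(batch_scores, labels):
        # Indices of the k largest scores, best first.
        top_k = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


batch_scores = [
    [0.1, 2.0, 0.3, 1.5],  # ranking: 1, 3, 2, 0
    [3.0, 0.2, 0.1, 2.5],  # ranking: 0, 3, 1, 2
    [0.5, 0.4, 0.9, 0.1],  # ranking: 2, 0, 1, 3
]
labels = [3, 2, 2]

for k in (1, 2, 4):
    print(k, top_k_accuracy(batch_scores, labels, k))
```

With `k=1` this reduces to ordinary accuracy, and once `k` equals the number of items every example counts as a hit.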

The `TopKCategoricalAccuracy` metric compares the label against each output, so it expects the label to be a one-hot vector rather than the single index we used with [SparseCategoricalCrossentropy](https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy). To keep the loss and metric in the same format, we'll switch to [CategoricalCrossentropy](https://www.tensorflow.org/api_docs/python/tf/keras/losses/CategoricalCrossentropy). This loss expects our label to be a one-hot encoding.

It would take a lot of space in our DataFrame to do this one-hot encoding (63001 columns!), so instead, we'll do it in batches. tf.data has a [map](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#map) function that allows us to map a function to our features and labels. We'll use this with [tf.one_hot](https://www.tensorflow.org/api_docs/python/tf/one_hot) to get the format we need.

In [None]:
def one_hot(features, label):
    label = tf.one_hot(label, NUM_ITEMS)
    return features, label

def get_ds_from_df(df_tmp, cols, label, batch_size=1028, repeats=None):
    dataset = (
        tf.data.Dataset.from_tensor_slices(
            (df_tmp[cols].to_dict("list"), df_tmp[label].values)
        )
        .shuffle(len(df_tmp))
        .repeat(repeats)
        .batch(batch_size)
        .map(one_hot)
        .prefetch(1)
    )
    return dataset
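
To get a feel for the sizes involved, here is a plain-Python sketch of a one-hot row and a back-of-the-envelope memory estimate. It assumes 4-byte values (float32, the default dtype of `tf.one_hot`) and takes the 63001 items and the batch size of 1028 from above. The helper names are made up for this illustration.

```python
def one_hot_row(label, depth):
    """Return a list of `depth` zeros with a single 1.0 at position `label`."""
    row = [0.0] * depth
    row[label] = 1.0
    return row


def one_hot_bytes(num_rows, depth, bytes_per_value=4):
    """Memory needed to hold num_rows one-hot labels, assuming 4-byte values."""
    return num_rows * depth * bytes_per_value


print(one_hot_row(2, 5))

# One batch of one-hot labels: 1028 rows x 63001 items x 4 bytes.
batch_bytes = one_hot_bytes(1028, 63001)
print(batch_bytes / 1e6, "MB per batch")
```

Even a single batch takes a few hundred megabytes, which is why the encoding is applied batch by batch instead of to the whole DataFrame.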

Let's recompile our model with our new metrics and loss. This time, we'll use [CategoricalAccuracy](https://www.tensorflow.org/api_docs/python/tf/keras/metrics/CategoricalAccuracy) to reflect our new data structure.

We'll also set our `K` to `200`, meaning if the label is in the top 200 recommendations, that's a success. The appropriate number for K is going to depend on how these recommendations are served to users. For instance, if it's a travel website that shows the top 10 recommended travel destinations, then `K` should be 10 to reflect this.

In [None]:
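tf.keras.backend.clear_session()
model = tf.keras.Model(input_tensor, output_tensor)
# A hit if the label is among the 200 highest-scoring items.
top_k = tf.keras.metrics.TopKCategoricalAccuracy(200, name="topk")
accuracy = tf.keras.metrics.CategoricalAccuracy(name="acc")
# Track both metrics (the list must hold the metric objects defined above).
metrics = [top_k, accuracy]
loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
# Pass the metrics list so both are reported during training.
model.compile(optimizer="adam", loss=loss, metrics=metrics)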

Next, we'll grab our new one-hot encoded dataset.

In [None]:
train_ds = ratings[~ratings["valid"]]
train_ds = get_ds_from_df(train_ds, data_cols, 'item_index')
valid_ds = ratings[ratings["valid"]]
valid_ds = get_ds_from_df(valid_ds, data_cols, 'item_index', repeats=1)

Finally, let's run our new model.

In [None]:
history = model.fit(train_ds, validation_data=valid_ds, epochs=5, steps_per_epoch=20)

Over 10% top K accuracy. Considering we added no other contextual information, this is not bad! Trying to decide whether Wide and Deep or this is better? Since we're not running the model for every item, consider using this structure when speed is a priority. If model size is an issue with so many outputs, consider trimming the item pool to the more popular items.
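
One way trimming the item pool might look is sketched below in plain Python. The helper `keep_popular_items` and the sample indices are hypothetical. Interactions with items outside the pool are mapped to `None` here, and whether to drop them or group them into a catch-all class is a design choice left open.

```python
from collections import Counter


def keep_popular_items(item_indices, max_items):
    """Re-index the max_items most frequent items; map everything else to None."""
    counts = Counter(item_indices)
    popular = [item for item, _ in counts.most_common(max_items)]
    # New contiguous indices, so the output layer needs only max_items neurons.
    new_index = {item: i for i, item in enumerate(popular)}
    return [new_index.get(item) for item in item_indices], new_index


item_indices = [5, 2, 5, 7, 2, 5, 9]  # hypothetical item_index column
remapped, new_index = keep_popular_items(item_indices, max_items=2)

print(remapped)
print(new_index)
```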
