# Model Evaluation Demonstration:
This notebook shows how to evaluate our GMF and NCF models, covering both the preprocessing and the evaluation steps.

## Import Packages and Modules:
We'll use the following packages and modules in this notebook.

In [2]:
## Import necessary Python packages.
import tensorflow as tf
import numpy as np
import pandas as pd
import random

## Import the preprocessing and evaluating modules.
import preprocess
import evaluation

## Import the models.
import models.gmf_model as GMF
import models.ncf_model as NCF

## Preprocess Dataset:
As in the `training_demo.ipynb` notebook, we first assign a numerical ID to each unique user and item, then split the data into training and testing sets. This is done below.

In [3]:
## Load in the MSD interaction csv file.
interaction_df = pd.read_csv('interactions.csv')
## Convert whether the user liked a song into binary (0 or 1).
interaction_df['liked'] = (interaction_df['count'] > 0).astype(int)

## Map the DataFrame with numerical IDs.
mapped_df = preprocess.MapUserItemID(df = interaction_df)

## Split the training and testing dataset using the leave-one-out technique.
train_df, test_df = preprocess.LeaveOneOut(df = mapped_df)
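
The internals of `preprocess.MapUserItemID` and `preprocess.LeaveOneOut` are not shown in this notebook. As a rough sketch of the idea, assuming the columns are named `user_id` and `song_id`, the hypothetical helpers below (`map_ids_sketch` and `leave_one_out_sketch`, names invented for illustration) map raw IDs to consecutive integers with `pd.factorize` and hold out one randomly chosen interaction per user. The actual module may work differently, for example by holding out the most recent interaction.

```python
import pandas as pd

def map_ids_sketch(df):
    """Hypothetical stand-in: replace raw IDs with consecutive integer codes."""
    mapped = df.copy()
    # pd.factorize returns (codes, uniques); the codes start at 0.
    mapped['user_id'] = pd.factorize(mapped['user_id'])[0]
    mapped['song_id'] = pd.factorize(mapped['song_id'])[0]
    return mapped

def leave_one_out_sketch(df, seed=0):
    """Hypothetical stand-in: hold out one random interaction per user."""
    test = df.groupby('user_id').sample(n=1, random_state=seed)
    # Everything that was not held out stays in the training set.
    train = df.drop(test.index)
    return train, test

## Toy data with string IDs, standing in for the raw interaction file.
toy = pd.DataFrame({
    'user_id': ['u1', 'u1', 'u2', 'u2', 'u2'],
    'song_id': ['sA', 'sB', 'sA', 'sC', 'sD'],
})
toy_mapped = map_ids_sketch(toy)
toy_train, toy_test = leave_one_out_sketch(toy_mapped)
```

With this split, every user contributes exactly one row to the test set and the rest to the training set.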

We'll also build, for each user, the set of items they have interacted with, along with the pool of all unique item IDs.

In [4]:
## Create the corresponding sets to positive interactions and unique item IDs.
user_positive_itemsets, item_pool = preprocess.CreatePositiveInteractions(df = mapped_df)

Processing: 100%|██████████| 404103/404103 [04:38<00:00, 1453.11it/s]
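
For readers without access to the `preprocess` module, here is a minimal sketch of what such a step could look like. It assumes `user_id` and `song_id` columns, and `create_positive_sets_sketch` is a hypothetical name; it is not the implementation of `CreatePositiveInteractions`.

```python
import pandas as pd

def create_positive_sets_sketch(df):
    """Hypothetical stand-in: per-user item sets plus the set of all item IDs."""
    # One set of interacted items per user, keyed by user ID.
    user_positive = df.groupby('user_id')['song_id'].apply(set).to_dict()
    # Every item ID that appears anywhere in the data.
    item_pool = set(df['song_id'].tolist())
    return user_positive, item_pool

toy_interactions = pd.DataFrame({
    'user_id': [0, 0, 1, 1, 1],
    'song_id': [0, 1, 0, 2, 3],
})
toy_positive_sets, toy_item_pool = create_positive_sets_sketch(toy_interactions)
```

Sets give constant-time membership checks, which helps when filtering out items a user has already interacted with.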


## Evaluation Process:
While we could use the `test_df` generated by the leave-one-out split, we have already prepared 10,000 randomly selected testing interactions in the `sampled_test_data.csv` file. You can load this file with `pandas` and append each user-item interaction as a tuple to the `test_data` list, as shown below.

In [5]:
## Load in the testing interactions.
test_df = pd.read_csv('sampled_test_data.csv')

## Create a list of testing interactions.
test_data = []

## For each row in the test DataFrame, add the user and song as a tuple.
for _, row in test_df.iterrows():
    user_id = row['user_id']
    song_id = row['song_id']

    ## Append tuple to the test_data list.
    test_data.append((user_id, song_id))
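
On larger test files, `iterrows` can be slow. One alternative that should produce the same list of tuples is to zip the two columns directly. The sketch below uses a toy DataFrame and assumes the same `user_id` and `song_id` column names.

```python
import pandas as pd

## Toy stand-in for the sampled test file.
toy_test_df = pd.DataFrame({'user_id': [0, 1, 2], 'song_id': [10, 11, 12]})

## Pair each user with the corresponding song without a row-by-row loop.
toy_test_data = list(zip(toy_test_df['user_id'], toy_test_df['song_id']))
```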

We'll now initialize the model and load the trained weights. The `ncf_msd_weights.h5` file contains the saved weights of an NCF model trained for 20 epochs on the MSD dataset. Before loading the weights, we need to pass dummy inputs through the model so that it creates its weight variables. We can then replace those freshly initialized weights with our trained ones. This is shown below.

In [6]:
## Initialize the NCF model.
ncf_model = NCF.NCFModel(
                num_users = mapped_df['user_id'].nunique(), 
                num_items = mapped_df['song_id'].nunique(), 
                num_latent = 8
            )

## Pass dummy inputs to initialize the weights in the NCF model.
ncf_model(np.array([np.array([0]), np.array([1])]))
## After initializing the model's weights, we can load in the 
## trained weights.
ncf_model.load_weights('ncf_msd_weights.h5')

We can finally evaluate the performance of the model using the `EvaluateModel` function. This function computes the Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) for each user in our testing dataset. 

The Hit Ratio measures how often a held-out liked song appears in the user's top-k recommendation list. The NDCG is similar to the HR metric, but it also accounts for the position of the liked song within that list, rewarding higher ranks.
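
To make the two metrics concrete, here is a minimal sketch of the common formulation for a single held-out item, where NDCG reduces to `1 / log2(position + 2)` for a 0-based position. Whether `EvaluateModel` uses exactly this formulation is an assumption, and the function names below are hypothetical.

```python
import math

def hit_ratio_at_k(ranked_items, target_item, k):
    """Return 1.0 if the held-out item is among the top k, else 0.0."""
    return 1.0 if target_item in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target_item, k):
    """Discounted gain of the single held-out item within the top k."""
    top_k = ranked_items[:k]
    if target_item not in top_k:
        return 0.0
    position = top_k.index(target_item)  # 0-based position in the list
    return 1.0 / math.log2(position + 2)

## Items sorted from highest to lowest predicted score (toy example).
ranked = [7, 3, 9, 1]
hr_top3 = hit_ratio_at_k(ranked, target_item=9, k=3)   # 9 is in the top 3
ndcg_top3 = ndcg_at_k(ranked, target_item=9, k=3)      # position 2 -> 1 / log2(4)
hr_top2 = hit_ratio_at_k(ranked, target_item=9, k=2)   # 9 is not in the top 2
```

In this example the item at the third position scores a hit at k = 3 but only half the NDCG of an item ranked first, which is why NDCG is usually lower than HR.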

You can change the number of recommendations the model makes (`top_k`). You can also vary the number of sampled negatives (`num_negatives`) to see how the metrics change across different settings. This is shown below.

In [7]:
## The top k recommendations for evaluating the performance of the models.
k_val = 10

## Compute the average Hit Ratio and Normalized Discounted Cumulative Gain (NDCG) using the EvaluateModel function.
hit_ratio, ndcg = evaluation.EvaluateModel(model = ncf_model, 
                                           test_data = test_data, 
                                           user_positive_itemsets = user_positive_itemsets, 
                                           item_pool = item_pool, 
                                           num_negatives = 100, 
                                           top_k = k_val
                                        )

## Print the computed performance metrics.
print(f'Hit Ratio at Top {k_val}: {hit_ratio} | NDCG at Top {k_val}: {ndcg}')

Generating 100 Negative Test Samples: 100%|██████████| 10000/10000 [00:01<00:00, 5576.19it/s]
Applying Model: 100%|██████████| 10000/10000 [04:55<00:00, 33.84it/s]


Hit Ratio at Top 10: 0.8018 | NDCG at Top 10: 0.5482748504832984
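
The progress output above suggests that each test interaction is ranked against 100 sampled negative items. A rough sketch of that idea follows, using hypothetical helper names and a stand-in scoring function; the real `EvaluateModel` scores candidates with the trained model and may sample differently.

```python
import random

def sample_negatives_sketch(user_positives, item_pool, num_negatives, rng):
    """Draw items the user has not interacted with, without replacement."""
    candidates = sorted(item_pool - user_positives)
    return rng.sample(candidates, num_negatives)

def rank_candidates_sketch(target_item, negatives, score_fn):
    """Sort the held-out item and its negatives from highest to lowest score."""
    candidates = [target_item] + negatives
    return sorted(candidates, key=score_fn, reverse=True)

rng = random.Random(0)
toy_pool = set(range(20))
toy_user_positives = {1, 2, 3}
toy_negatives = sample_negatives_sketch(toy_user_positives, toy_pool, 5, rng)

## Stand-in for model predictions: the held-out item (2) gets the top score.
toy_ranked = rank_candidates_sketch(
    target_item=2,
    negatives=toy_negatives,
    score_fn=lambda item: 1.0 if item == 2 else 0.0,
)
```

Because the held-out item competes with only `num_negatives` candidates rather than the full catalogue, the reported metrics depend on that setting: fewer negatives generally make a hit easier.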
