## **Installation . . .**

In [None]:
import torch
if not torch.cuda.is_available():
    raise Exception("You should enable GPU runtime")

In [None]:
device = torch.device("cuda")

## **Installing tensorboard and setting it up . . .**

In this session, I wanted to use the original TensorBoard instead of the TensorboardColab version. This way we can log images and graphs, not just scalars, and we can load several experiments into the same plot to compare them directly.

In [None]:
%load_ext tensorboard 

In [None]:
import os
logs_base_dir = "runs"
os.makedirs(logs_base_dir, exist_ok=True)

In [None]:
from torch.utils.tensorboard import SummaryWriter

tb_fm = SummaryWriter(log_dir=f'{logs_base_dir}/{logs_base_dir}_FM/')
tb_gcn = SummaryWriter(log_dir=f'{logs_base_dir}/{logs_base_dir}_GCN/')
tb_gcn_attention = SummaryWriter(log_dir=f'{logs_base_dir}/{logs_base_dir}_GCN_att/')

## **Movielens - 100k dataset**

MovieLens [datasets](https://grouplens.org/datasets/movielens/) were collected by the GroupLens Research Project at the University of Minnesota.
 
&nbsp;


This data set consists of:

* 100,000 ratings (1-5) from 943 users on 1682 movies. 
* Each user has rated at least 20 movies. 
* Simple demographic info for the users (age, gender, occupation, zip)

 &nbsp;

The data was collected through the MovieLens web site (movielens.umn.edu) during the seven-month period from September 19th, 1997 through April 22nd, 1998. This data has been cleaned up: users who had fewer than 20 ratings or did not have complete demographic information were removed from this data set.


> Note that the rating matrix is quite sparse (about 93.7%) as it only holds 100,000 ratings out of a possible 1,586,126 (943*1682).
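
The sparsity figure can be reproduced with a few lines of arithmetic (a quick check, not part of the original pipeline):

```python
# Back-of-the-envelope check of the rating-matrix sparsity quoted above
n_users, n_items, n_ratings = 943, 1682, 100_000

possible = n_users * n_items      # number of possible (user, movie) pairs
density = n_ratings / possible    # fraction of pairs that were actually rated
sparsity = 1 - density

print(f"possible pairs: {possible:,}")
print(f"density: {density:.2%}, sparsity: {sparsity:.2%}")
```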

&nbsp;

For this notebook, we will use a preprocessed version of the original data so that we do not have to split it ourselves. The preprocessed dataset was split following the *leave-one-out* strategy: one interaction of each user is held out for testing / validation, while all the others are kept for training.
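
The provided files are already split, so this step is not needed here. As an illustration only, a leave-one-out split could be sketched with pandas as below. It assumes the held-out interaction is each user's most recent one by timestamp, which is a common choice but not something the preprocessed files document; the toy `ratings` frame is hypothetical.

```python
import pandas as pd

# Toy interaction log (hypothetical data): user_id, item_id, rating, timestamp
ratings = pd.DataFrame({
    "user_id":   [1, 1, 1, 2, 2, 3, 3, 3],
    "item_id":   [10, 20, 30, 10, 40, 20, 30, 50],
    "rating":    [5, 3, 4, 2, 5, 1, 4, 3],
    "timestamp": [100, 200, 300, 150, 250, 120, 220, 320],
})

# Rank each user's interactions from newest (rank 1) to oldest
latest_rank = ratings.groupby("user_id")["timestamp"].rank(method="first", ascending=False)

# Leave-one-out: the newest interaction of every user goes to the test set
test = ratings[latest_rank == 1]
train = ratings[latest_rank > 1]

print(test)
print(len(train), len(test))
```

Every user keeps at least one interaction for training as long as they have two or more interactions, which MovieLens-100k guarantees (at least 20 per user).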



### Preparing imports

In [None]:
from torch.utils.data import DataLoader, Dataset
from IPython import embed
from sklearn.metrics import roc_auc_score
import pandas as pd
import numpy as np
import csv
import os
import scipy.sparse as sp
from tqdm import tqdm, trange


### **Downloading data and loading it with pandas ...**

In [None]:
if not os.path.exists('data/ml-100k'):
    !gdown --id 1rE20sLow9sT2ULpBOOWqw2SEnpIm16OZ
    !mkdir -p data
    !unzip ml-dataset-splitted.zip && mv ml-dataset-splitted data/ml-100k

In [None]:
!ls data/ml-100k/

The data sets `movielens.train.rating` and `movielens.test.rating` are the splits generated from `u.data` (which contains the entire data). They follow the "leave-one-out" strategy, which will allow us to **evaluate ranking prediction**.

 &nbsp;

Both files have the same tab-separated format:

    user_id   movie_id   rating   timestamp

where `user_id` is an integer between 1 and 943, `movie_id` is an integer between 1 and 1682, `rating` is an integer between 1 and 5 and `timestamp` is an epoch-based integer.

<div>
<center><img src="https://files.realpython.com/media/movielens-head.0542b4c067c7.jpg" width="300"/></center>
</div>


However, in the provided preprocessed splits all ratings have also been replaced by binary labels, so that we can treat the problem as an `implicit feedback` task. Every interaction in the dataset gets a positive label (`1`): interacting with a film is taken as a sign of interest in it (even if the user did not like it in the end). For negative labels (`0`), we will perform negative sampling, i.e. sample user-item pairs that did not actually occur. Now we can look at the data:
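
A minimal sketch of both ideas on toy data (the `explicit` array, `n_items = 10` and the `sample_negatives` helper are hypothetical; the notebook's own sampler, built further below, checks candidates against the sparse adjacency matrix instead of a Python set):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical explicit ratings: (user, item, rating)
explicit = np.array([[0, 3, 5], [0, 7, 1], [1, 2, 4]])

# Implicit feedback: every interaction becomes a positive label, whatever the rating
positives = explicit.copy()
positives[:, 2] = 1

n_items = 10
observed = {(int(u), int(i)) for u, i, _ in positives}

def sample_negatives(user, k):
    """Draw k items the user never interacted with, labelled 0 (repeats allowed)."""
    negatives = []
    while len(negatives) < k:
        item = int(rng.integers(n_items))
        if (user, item) not in observed:
            negatives.append([user, item, 0])
    return np.array(negatives)

neg_user0 = sample_negatives(0, 4)
print(positives)
print(neg_user0)
```

Note that the pair `(0, 7)` stays positive even though its original rating was 1: in the implicit setting, only the existence of the interaction matters.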





In [None]:
# LOAD TRAINING DATA
colnames = ["user_id", 'item_id', 'label', 'timestamp']
data = pd.read_csv('data/ml-100k/movielens.train.rating', sep="\t", header=None, names=colnames)
data.head()


In [None]:
# Unique value for the label is 1 (we will need to manually sample negative data)
data.nunique()

In [None]:
data.shape

In [None]:
# 943 users, one held-out interaction each: 100,000 - 943 = 99,057 training rows
assert data.shape[0] == 100000 - 943

So, while the training data contains many interactions per user, the test (or validation) set below holds out just one interaction per user, which will be used as the ground truth when evaluating the ranking produced by the model.

In [None]:
# LOAD TESTING DATA
colnames = ["user_id", 'item_id', 'label', 'timestamp']
test_data = pd.read_csv('data/ml-100k/movielens.test.rating', sep="\t", header=None, names=colnames)
test_data.head()


In [None]:
test_data.shape

> Note that we need to preprocess the dataset, for example by re-indexing the films and dropping the timestamp, which is not useful for our task. We also need to build the adjacency matrix and perform negative sampling for training. Finally, we need to build a test set that lets us evaluate the ranking with the *HR* and *NDCG* metrics seen in theory.


### **Preprocessing dataset ...**


We will first show how to preprocess data for a few individual examples in the `1. Understanding how to process data` section, and then build a *PyTorch `Dataset` class* that preprocesses and handles the whole dataset so it can be fed to the model (see the `2. Building dataset and preparing data for the model` section).


#### **1. Understanding how to process data...**

##### *Pre-process Movielens-100k*

In [None]:
# Columns: user_id, item_id, label, timestamp
data = data.to_numpy()
data

In [None]:
items = data[:, :2].astype(int) - 1  # -1 because IDs begin at 1 (np.int was removed in NumPy 1.24)
items

In [None]:
np.max(items, axis=0)[:2] + 1 

In [None]:
# Every node needs a unique id: users keep ids 0..942, items are shifted to 943..2624
reindex_items = items.copy()
reindex_items[:, 1] = reindex_items[:, 1] + 943
reindex_items

In [None]:
field_dims = np.max(reindex_items, axis=0) + 1
field_dims

In [None]:
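def build_adj_mx(dims, interactions):
    """Build the symmetric (dims x dims) adjacency matrix of the user-item graph.

    Each (user, item) interaction sets both [user, item] and [item, user] to 1.
    """
    train_mat = sp.dok_matrix((dims, dims), dtype=np.float32)
    for x in tqdm(interactions, desc="BUILDING ADJACENCY MATRIX..."):
        train_mat[x[0], x[1]] = 1.0
        train_mat[x[1], x[0]] = 1.0

    return train_mat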

In [None]:
train_mat = build_adj_mx(field_dims[-1], reindex_items.copy())
train_mat

In [None]:
# The symmetric matrix stores each of the 99057 training interactions twice (2*99057 = 198114 entries)
train_mat.nnz, 99057*2
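
To see what `build_adj_mx` produces, here is a sketch on a hypothetical toy graph with 2 users and 3 items, using the same re-indexing trick (item ids shifted past the user ids) and the same symmetric `dok_matrix` filling:

```python
import numpy as np
import scipy.sparse as sp

# Toy bipartite graph (hypothetical): 2 users (nodes 0-1), 3 items (nodes 2-4)
n_users, n_items = 2, 3
interactions = np.array([[0, 0], [0, 2], [1, 1]])  # (user, item) with 0-based item ids

# Same re-indexing trick as above: shift item ids past the user ids
nodes = interactions.copy()
nodes[:, 1] += n_users

n_nodes = n_users + n_items
adj = sp.dok_matrix((n_nodes, n_nodes), dtype=np.float32)
for u, i in nodes:
    adj[u, i] = 1.0  # user -> item
    adj[i, u] = 1.0  # item -> user (undirected graph)

dense = adj.toarray()
print(dense)
print("stored entries:", adj.nnz)  # two per interaction
```

Users only connect to items, so the user-user and item-item blocks of the matrix stay empty.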

##### *Checking we have just positive data:*

In [None]:
targets = data[:, 2]
targets

In [None]:
np.unique(targets)

##### *Example of generating negative data for a training sample: (u, i, j)*

In [None]:
data = np.c_[(reindex_items, targets)].astype(int)
data

In [None]:
field_dims[:2]

In [None]:
# EXAMPLE interaction number 988 : user 6 - item 1470
x = data[988]
x

In [None]:
neg_triplet = np.array([0,0,0])
neg_triplet[0] = x[0].copy()
neg_triplet

In [None]:
# Example: We find item 1200 has no connection with user 6
j = 1200
neg_triplet[1] = j
neg_triplet

##### *Define metrics:*

In [None]:
import math

def getHitRatio(recommend_list, gt_item):
    if gt_item in recommend_list:
        return 1
    else:
        return 0

def getNDCG(recommend_list, gt_item):
    idx = np.where(recommend_list == gt_item)[0]
    if len(idx) > 0:
        # Single relevant item: NDCG = 1 / log2(rank + 2), with a 0-based rank
        return math.log(2) / math.log(idx[0] + 2)
    else:
        return 0
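
Since each user has a single relevant (held-out) item, the ideal DCG is 1 and NDCG@k reduces to 1/log2(rank + 2) for a 0-based rank inside the top-k list, while HR@k only checks membership. A small self-contained check on a hypothetical top-3 list (the helpers are local re-implementations of `getHitRatio` / `getNDCG`):

```python
import math
import numpy as np

def hit_ratio(recommend_list, gt_item):
    # 1 if the held-out item appears anywhere in the top-k list, else 0
    return int(gt_item in recommend_list)

def ndcg(recommend_list, gt_item):
    # With a single relevant item, IDCG = 1, so NDCG = 1 / log2(rank + 2)
    idx = np.where(np.asarray(recommend_list) == gt_item)[0]
    if len(idx) == 0:
        return 0.0
    return math.log(2) / math.log(idx[0] + 2)

top3 = np.array([1200, 1016, 980])  # hypothetical top-3 item ids
for gt in (1200, 1016, 980, 1500):
    print(gt, hit_ratio(top3, gt), round(ndcg(top3, gt), 4))
```

HR gives the same credit wherever the item appears, whereas NDCG rewards placing it near the top.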

##### *Build test dataset for evaluation*

In [None]:
dataset_path = 'data/ml-100k/movielens'
test_data = pd.read_csv(f'{dataset_path}.test.rating', sep='\t',
                        header=None, names=colnames).to_numpy()
test_data

In [None]:
# Number of users and total number of nodes (users + items), taken from the re-indexed train set
users, items = np.max(reindex_items, axis=0)[:2] + 1 # [943, 2625]: `items` here counts all nodes, not just movies
print(users)
print(items)

In [None]:
# Make test ids 0-based (subtract 1) and shift item ids past the user ids
pairs_test = test_data[:, :2].astype(int) - 1
pairs_test[:, 1] = pairs_test[:, 1] + users
pairs_test

In [None]:
assert 74 + 943 - 1 == 1016  # raw movie id 74 maps to node 74 - 1 + 943 = 1016

In [None]:
pair = pairs_test[0]
pair

In [None]:
# GENERATE TEST SET WITH NEGATIVE EXAMPLES TO EVALUATE
max_users, max_items = field_dims[:2] # users are nodes [0, 943), items are nodes [943, 2625)
negatives = []
for t in range(10):
    j = np.random.randint(max_users, max_items)
    while (pair[0], j) in train_mat or j == pair[1]:
        j = np.random.randint(max_users, max_items)
    negatives.append(j)
negatives

In [None]:
single_user_test_set = np.vstack([pair, ] * (len(negatives)+1))
single_user_test_set

In [None]:
single_user_test_set[:, 1][1:] = negatives
single_user_test_set

#### **2. Building dataset and preparing data for the model ...**

In [None]:
#@title
import numpy as np
import pandas as pd
import torch.utils.data


class MovieLens100kDataset(torch.utils.data.Dataset):
    """
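    MovieLens 100k Dataset (preprocessed leave-one-out splits)

    Data preparation
        every interaction in the train split is a positive sample (label 1);
        negative samples (label 0) are drawn at random among the items
        each user has not interacted with

    :param dataset_path: MovieLens dataset path (prefix of the .train.rating / .test.rating files)
    :param num_negatives_train: negative samples drawn per positive training interaction
    :param num_negatives_test: negative items added to each user's held-out test interaction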

    """

    def __init__(self, dataset_path, num_negatives_train=4, num_negatives_test=99, sep='\t'):

        colnames = ["user_id", 'item_id', 'label', 'timestamp']
        data = pd.read_csv(f'{dataset_path}.train.rating', sep=sep, header=None, names=colnames).to_numpy()
        test_data = pd.read_csv(f'{dataset_path}.test.rating', sep=sep, header=None, names=colnames).to_numpy()

        # TAKE items, targets and test_items
        self.targets = data[:, 2]
        self.items = self.preprocess_items(data)

        # Save dimensions of max users and items and build training matrix
        self.field_dims = np.max(self.items, axis=0) + 1 # ([ 943, 2625])
        self.train_mat = build_adj_mx(self.field_dims[-1], self.items.copy())

        # Generate train interactions with 4 negative samples for each positive
        self.negative_sampling(num_negatives=num_negatives_train)
        
        # Build test set by passing as input the test item interactions
        self.test_set = self.build_test_set(self.preprocess_items(test_data),
                                            num_neg_samples_test = num_negatives_test)

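    def __len__(self):
        # One entry per interaction (positives + sampled negatives), not just per positive target;
        # otherwise the DataLoader would only ever see the first fifth of self.interactions
        return self.interactions.shape[0]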

    def __getitem__(self, index):
        return self.interactions[index]
    
    def preprocess_items(self, data, users=943):
        reindexed_items = data[:, :2].astype(int) - 1  # -1 because IDs begin at 1
        #users, items = np.max(reindexed_items, axis=0)[:2] + 1 # [ 943, 1682])
        # Reindex items (we need to have [users + items] nodes with unique idx)
        reindexed_items[:, 1] = reindexed_items[:, 1] + users

        return reindexed_items

    def negative_sampling(self, num_negatives=4):
        self.interactions = []
        data = np.c_[(self.items, self.targets)].astype(int)
        max_users, max_items = self.field_dims[:2] # users are nodes [0, 943), items are nodes [943, 2625)

        for x in tqdm(data, desc="Performing negative sampling on training data..."):  # x are triplets (u, i, 1)
            # Append positive interaction
            self.interactions.append(x)
            # Copy user and maintain last position to 0. Now we will need to update neg_triplet[1] with j
            neg_triplet = np.vstack([x, ] * (num_negatives))
            neg_triplet[:, 2] = np.zeros(num_negatives)

            # Generate num_negatives negative interactions
            for idx in range(num_negatives):
                j = np.random.randint(max_users, max_items)
                # IDEA: Loop to exclude true interactions (set to 1 in adj_train) user - item
                while (x[0], j) in self.train_mat:
                    j = np.random.randint(max_users, max_items)
                neg_triplet[:, 1][idx] = j
            self.interactions.append(neg_triplet.copy())

        self.interactions = np.vstack(self.interactions)
    
    def build_test_set(self, gt_test_interactions, num_neg_samples_test=99):
        max_users, max_items = self.field_dims[:2] # users are nodes [0, 943), items are nodes [943, 2625)
        test_set = []
        for pair in tqdm(gt_test_interactions, desc="BUILDING TEST SET..."):
            negatives = []
            for t in range(num_neg_samples_test):
                j = np.random.randint(max_users, max_items)
                while (pair[0], j) in self.train_mat or j == pair[1]:
                    j = np.random.randint(max_users, max_items)
                negatives.append(j)
            #APPEND TEST SETS FOR SINGLE USER
            single_user_test_set = np.vstack([pair, ] * (len(negatives)+1))
            single_user_test_set[:, 1][1:] = negatives
            test_set.append(single_user_test_set.copy())
        return test_set

In [None]:
full_dataset= MovieLens100kDataset(dataset_path, num_negatives_train=4, num_negatives_test=99)

In [None]:
# 99057 positive interactions (pairs that have interacted) + 4*99057 sampled negatives
full_dataset.interactions

In [None]:
full_dataset.interactions[:20]

In [None]:
## We had 99057 interactions in training_matrix --> now we have 99057 positive plus 4*99057 negative
assert 5*99057 == full_dataset.interactions.shape[0]

In [None]:
# For test set, we keep the size (one interaction per user) but we append 99 negative samples for evaluation
print(len(full_dataset.test_set))

In [None]:
len(full_dataset.test_set[0]) # --> [gt_pair + 99_neg_samples]

In [None]:
full_dataset.test_set[0]

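Sampling 4 negative samples for each positive interaction fixes the ratio of positive to negative examples in the training data at 1:4. Without negatives every training label would be `1`, and the model could minimize the loss by simply predicting a positive for every pair.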

In [None]:
data_loader = DataLoader(full_dataset, batch_size=256, shuffle=True, num_workers=0)

In [None]:
interactions = next(iter(data_loader))  # take only the first batch
print(interactions.shape)

### **Building Factorization Machines model**


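Our training data is even sparser when viewed the way an FM sees it: each sample is a one-hot row over all 2,625 nodes (943 users + 1,682 items) with exactly two non-zero entries, one for the user and one for the item. With 495,285 training samples (99,057 positives plus 4*99,057 negatives), that is 1,300,123,125 values (495,285*2,625), of which only 990,570 (495,285*2) are non-zero. In other words, the matrix is 99.92% sparse. Storing it as a dense matrix would be a massive waste of both storage and computing power!
Instead of materializing this one-hot matrix, the model below stores each sample as the indices of its two non-zero entries and uses `torch.nn.Embedding` lookups, which select exactly the weight rows that a multiplication with the one-hot row would touch.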
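
The equivalence between a one-hot row and an embedding lookup can be checked directly. In this sketch (the node ids 5 and 1016 and `embed_dim = 4` are arbitrary choices), multiplying the one-hot row by the weight matrix gives the same vector as summing the two looked-up rows:

```python
import torch

torch.manual_seed(0)

n_nodes, embed_dim = 2625, 4  # 943 users + 1682 items, as in field_dims
embedding = torch.nn.Embedding(n_nodes, embed_dim)

# One training sample = (user node, item node); the ids are arbitrary examples
pair = torch.tensor([5, 1016])

# Dense view: a one-hot row of length 2625 with exactly two non-zero entries
one_hot = torch.zeros(n_nodes)
one_hot[pair] = 1.0
print("non-zero entries:", int(one_hot.count_nonzero()), "of", n_nodes)

# Multiplying the sparse row by the weight matrix only picks up two rows...
dense_result = one_hot @ embedding.weight        # shape (embed_dim,)
# ...which is what an embedding lookup followed by a sum computes
lookup_result = embedding(pair).sum(dim=0)       # shape (embed_dim,)

print(torch.allclose(dense_result, lookup_result))
```

This is also why `FeaturesLinear` below is an `Embedding` with output dimension 1: its lookup plays the role of the first-order weights applied to the one-hot row.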



<div>
<center><img src="https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2019/04/03/sagemaker-factorization-1.gif" width="400"/></center>
</div>

##### **LAYERS:** Linear and FM part of the equation
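
For reference, these layers implement the second-order Factorization Machine of Rendle (2010), where $w_0$ is `bias`, the $w_i$ are the rows of `fc` in `FeaturesLinear`, the $\mathbf{v}_i \in \mathbb{R}^k$ are the rows of the `embedding`, $n = 2625$ nodes and $k$ = `embed_dim`. The pairwise term can be rewritten so that it is computed in $O(kn)$ instead of $O(kn^2)$, which is the identity `FM_operation` uses:

$$
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j
$$

$$
\sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j
= \frac{1}{2}\sum_{f=1}^{k}\left[\left(\sum_{i=1}^{n} v_{i,f}\, x_i\right)^{2} - \sum_{i=1}^{n} v_{i,f}^{2}\, x_i^{2}\right]
$$

Since each sample has exactly two active entries ($x_u = x_i = 1$, every other $x$ is 0), the prediction reduces to $\hat{y} = w_0 + w_u + w_i + \langle \mathbf{v}_u, \mathbf{v}_i \rangle$.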

In [None]:
# EMBEDDING PYTORCH: https://pytorch.org/docs/stable/nn.html?highlight=embedding#torch.nn.Embedding

In [None]:
# Linear part of the equation
class FeaturesLinear(torch.nn.Module):

    def __init__(self, field_dims, output_dim=1):
        super().__init__()

        self.fc = torch.nn.Embedding(field_dims, output_dim)
        self.bias = torch.nn.Parameter(torch.zeros((output_dim,)))

    def forward(self, x):
        """
        :param x: Long tensor of size ``(batch_size, num_fields)``
        """
        # self.fc(x).shape --> [batch_size, num_fields, 1]
        # torch.sum(self.fc(x), dim=1).shape --> ([batch_size, 1])
        return torch.sum(self.fc(x), dim=1) + self.bias

In [None]:
# FM part of the equation
class FM_operation(torch.nn.Module):

    def __init__(self, reduce_sum=True):
        super().__init__()
        self.reduce_sum = reduce_sum

    def forward(self, x):
        """
        :param x: Float tensor of size ``(batch_size, num_fields, embed_dim)``
        """
        square_of_sum = torch.sum(x, dim=1) ** 2
        sum_of_square = torch.sum(x ** 2, dim=1)
        ix = square_of_sum - sum_of_square
        if self.reduce_sum:
            ix = torch.sum(ix, dim=1, keepdim=True)
        return 0.5 * ix
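
A quick numerical sanity check (random tensors, arbitrary sizes) that the square-of-sum minus sum-of-squares trick in `FM_operation` matches the naive sum of pairwise dot products:

```python
import torch

torch.manual_seed(0)
batch_size, num_fields, embed_dim = 3, 4, 8
x = torch.randn(batch_size, num_fields, embed_dim)  # stands in for embedding(interaction_pairs)

# O(k*n) trick, as in FM_operation with reduce_sum=True
square_of_sum = torch.sum(x, dim=1) ** 2
sum_of_square = torch.sum(x ** 2, dim=1)
fast = 0.5 * torch.sum(square_of_sum - sum_of_square, dim=1, keepdim=True)

# Naive O(k*n^2) version: explicit sum of <v_i, v_j> over all pairs i < j
naive = torch.zeros(batch_size, 1)
for i in range(num_fields):
    for j in range(i + 1, num_fields):
        naive[:, 0] += (x[:, i] * x[:, j]).sum(dim=1)

print(torch.allclose(fast, naive, atol=1e-5))
```

With only two fields per sample in this notebook the savings are small, but the same layer works unchanged for samples with more active features.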


##### MODEL

In [None]:
class FactorizationMachineModel(torch.nn.Module):
    """
    A pytorch implementation of Factorization Machine.

    Reference:
        S Rendle, Factorization Machines, 2010.
    """

    def __init__(self, field_dims, embed_dim):
        super().__init__()
        # field_dims == total of nodes (sum users + context)
        # self.linear = torch.nn.Linear(field_dims, 1, bias=True)
        self.linear = FeaturesLinear(field_dims)
        self.embedding = torch.nn.Embedding(field_dims, embed_dim, sparse=False)
        self.fm = FM_operation(reduce_sum=True)

        torch.nn.init.xavier_uniform_(self.embedding.weight.data)

    def forward(self, interaction_pairs):
        """
        :param interaction_pairs: Long tensor of size ``(batch_size, num_fields)``
        """
        out = self.linear(interaction_pairs) + self.fm(self.embedding(interaction_pairs))
        
        return out.squeeze(1)
        
    def predict(self, interactions, device):
        # return the score, inputs are numpy arrays, outputs are tensors
        test_interactions = torch.from_numpy(interactions).to(dtype=torch.long, device=device)
        output_scores = self.forward(test_interactions)
        return output_scores
    


### **Workflow for FM with usual embeddings ...**

#### **Train**

In [None]:
from statistics import mean

def train_one_epoch(model, optimizer, data_loader, criterion, device, log_interval=100):
    model.train()
    total_loss = []

    for i, (interactions) in enumerate(data_loader):
        interactions = interactions.to(device)
        targets = interactions[:,2]
        predictions = model(interactions[:,:2])
        
        loss = criterion(predictions, targets.float())
        model.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss.append(loss.item())

    return mean(total_loss)

#### **Evaluation**

##### **Understanding evaluation ...**

In [None]:
len(full_dataset.test_set)

In [None]:
user_test = full_dataset.test_set[0]
user_test.shape

In [None]:
user_test

In [None]:
gt_pair = user_test[0]
neg_items = user_test[1:]
print(f'gt_pair: {gt_pair}')
print(f'length neg_items: {len(neg_items)}')

In [None]:
# DEFINE GT_ITEM
gt_item = user_test[0][1]
gt_item

In [None]:
# Defining dummy model with 8 embedding dimensions
dummy_model = FactorizationMachineModel(full_dataset.field_dims[-1], 8).to(device)
out = dummy_model.predict(user_test, device)
out.shape

In [None]:
# Print first 10 predictions, where 1st one is the one for the GT
out[:10]

In [None]:
values, indices = torch.topk(out, 10)
print(values)
print(indices.cpu().detach().numpy())

In [None]:
user_test[0]

In [None]:
# RANKING LIST TO RECOMMEND
recommend_list = user_test[indices.cpu().detach().numpy()][:, 1]
recommend_list

In [None]:
gt_item in recommend_list

##### **Defining test function...**

In [None]:
def test(model, full_dataset, device, topk=10):
    # Compute HR@topk and NDCG@topk over the per-user test sets
    model.eval()

    HR, NDCG = [], []

    with torch.no_grad():  # no gradients are needed for evaluation
        for user_test in full_dataset.test_set:
            gt_item = user_test[0][1]  # first row of each test set is the held-out (ground-truth) pair

            predictions = model.predict(user_test, device)
            _, indices = torch.topk(predictions, topk)
            recommend_list = user_test[indices.cpu().numpy()][:, 1]

            HR.append(getHitRatio(recommend_list, gt_item))
            NDCG.append(getNDCG(recommend_list, gt_item))
    return mean(HR), mean(NDCG)

#### **Model, loss and optimizer definition**

In [None]:
model = FactorizationMachineModel(full_dataset.field_dims[-1], 32).to(device)

In [None]:
criterion = torch.nn.BCEWithLogitsLoss(reduction='mean')
optimizer = torch.optim.Adam(params=model.parameters(), lr=0.001)


#### **Random evaluation**

In [None]:
topk = 10

# Check Init performance
hr, ndcg = test(model, full_dataset, device, topk=topk)
print("initial HR: ", hr)
print("initial NDCG: ", ndcg)
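
For comparison, the initial scores printed above can be checked against the expected scores of a model that ranks candidates at random. Assuming each test list has 100 candidates (1 held-out item + 99 negatives) and ignoring ties, the held-out item lands at each position with probability 1/100:

```python
import math

num_candidates, topk = 100, 10  # 1 held-out item + 99 sampled negatives, top-10 list

# Uniformly random ranking: each of the 100 positions is equally likely
expected_hr = topk / num_candidates
expected_ndcg = sum(1 / math.log2(rank + 2) for rank in range(topk)) / num_candidates

print(f"random HR@{topk}   ~ {expected_hr:.3f}")
print(f"random NDCG@{topk} ~ {expected_ndcg:.4f}")
```

An untrained model with randomly initialized embeddings should score roughly in this range; a trained model is only useful if it clearly beats it.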


#### **Start training the model**

In [None]:
# DO EPOCHS NOW
tb = True
topk = 10
for epoch_i in range(20):
    #data_loader.dataset.negative_sampling()
    train_loss = train_one_epoch(model, optimizer, data_loader, criterion, device)
    hr, ndcg = test(model, full_dataset, device, topk=topk)

    print('\n')

    print(f'epoch {epoch_i}:')
    print(f'training loss = {train_loss:.4f} | Eval: HR@{topk} = {hr:.4f}, NDCG@{topk} = {ndcg:.4f} ')
    print('\n')
    if tb:
        tb_fm.add_scalar('train/loss', train_loss, epoch_i)
        tb_fm.add_scalar(f'eval/HR@{topk}', hr, epoch_i)
        tb_fm.add_scalar(f'eval/NDCG@{topk}', ndcg, epoch_i)


## **VISUALIZING RESULTS**

Once we have trained both models (*FM with usual embedding layers* vs *FM with embeddings from GCN*), we can plot their metrics and losses on the same graph to compare them:

In [None]:
%tensorboard --logdir runs