## Training and Testing

Let's start by manually defining some necessary parameters.

In [1]:
import numpy as np
import torch
import os
import re
import scipy.sparse as sp
import multiprocessing
import torch.nn.functional as F

from time import time
from functools import partial
from utils.load_data import Data
from utils.metrics import ranklist_by_heapq, get_performance
# from utils.parser import parse_args
from ngcf import NGCF
from multiprocessing import Pool

In [9]:
use_cuda = torch.cuda.is_available()

cores = multiprocessing.cpu_count()

Ks = [10, 20]

data_path = "Data/toy_data/"
batch_size = 64
data_generator = Data(data_path, batch_size, val=False)
n_users = data_generator.n_users
n_items = data_generator.n_items

_, _, mean_adj = data_generator.get_adj_mat()
adj_mtx = mean_adj + sp.eye(mean_adj.shape[0])

emb_dim = 12
layers = [12, 6]
node_dropout = 0.1
mess_dropout = [0.1]*len(layers)
reg = 1e-5
lr = 0.01
n_fold = 10

pretrain = 0

print_every, eval_every, save_every = 1, 1, 10

n_users=1000, n_items=2000
n_interactions=30780
n_train=24228, n_test=6552, sparsity=0.01539
already load adj matrix (3000, 3000) 0.012362957000732422


In [10]:
model = NGCF(data_generator.n_users, data_generator.n_items, emb_dim, layers, 
             reg,node_dropout, mess_dropout, adj_mtx, n_fold)
if use_cuda: 
    model = model.cuda()

In [11]:
for n,p in model.named_parameters():
    if p.requires_grad: print(n)

u_embeddings
i_embeddings
u_g_embeddings
i_g_embeddings
W1.0.weight
W1.0.bias
W1.1.weight
W1.1.bias
W2.0.weight
W2.0.bias
W2.1.weight
W2.1.bias


And that is really it regarding the parameters of the model. We only need the user/item embeddings and then a series of linear layers (which we could refer to as graph layers). The embeddings will be concatenated over rows, multiplied by the Laplacian matrix, and then passed through the graph/linear layers recursively.
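To make the shapes concrete, here is a minimal NumPy sketch of one such propagation step on a toy graph. It is an illustration under assumptions, not the code in `ngcf.py`: it assumes `mean_adj` is the row-normalised adjacency matrix, and the layer form, the toy matrix `R`, and the names `leaky_relu`, `W1` and `W2` are hypothetical stand-ins that follow the propagation rule as described in the NGCF paper.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # Element-wise LeakyReLU; the slope is an illustrative choice.
    return np.where(x > 0, x, slope * x)

# Toy interaction matrix: 3 users x 4 items (1 = observed interaction).
R = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 1]], dtype=float)
n_users, n_items = R.shape
n_nodes = n_users + n_items

# Bipartite adjacency with users and items stacked on the same axis.
A = np.zeros((n_nodes, n_nodes))
A[:n_users, n_users:] = R
A[n_users:, :n_users] = R.T

# Row-normalise (assumed meaning of `mean_adj`), then add self-connections,
# mirroring `adj_mtx = mean_adj + sp.eye(...)` above.
deg = A.sum(axis=1, keepdims=True)
mean_adj = np.divide(A, deg, out=np.zeros_like(A), where=deg > 0)
adj = mean_adj + np.eye(n_nodes)

rng = np.random.default_rng(0)
emb_dim, out_dim = 12, 6
u_emb = rng.normal(size=(n_users, emb_dim))
i_emb = rng.normal(size=(n_items, emb_dim))
W1 = rng.normal(size=(emb_dim, out_dim))
W2 = rng.normal(size=(emb_dim, out_dim))

# 1) concatenate over rows, 2) multiply by the adjacency, 3) apply the layers.
E = np.concatenate([u_emb, i_emb], axis=0)
side = adj @ E
E_next = leaky_relu(side @ W1) + leaky_relu((side * E) @ W2)

# Split back into user and item blocks for the next layer or for scoring.
u_next, i_next = E_next[:n_users], E_next[n_users:]
print(E.shape, E_next.shape)
```

Stacking several of these steps, each with its own `W1`/`W2` pair, gives the recursive structure reflected in the parameter names `W1.0`, `W1.1`, `W2.0`, `W2.1` listed above.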

Let's now move to the training phase. The training function is your typical PyTorch training function, with the exception that the output of the forward pass is already the [BPR](https://arxiv.org/pdf/1205.2618.pdf) loss. I will leave schedulers, different optimizers and other bells and whistles out of this notebook.

It goes this way:

In [12]:
def train(model, data_generator, optimizer):
    model.train()
    n_batch = data_generator.n_train // data_generator.batch_size + 1
    running_loss=0
    for _ in range(n_batch):
        u, i, j = data_generator.sample()
        optimizer.zero_grad()
        loss = model(u,i,j)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    return running_loss
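Note that `train` returns the loss *summed* over all batches of the epoch, not averaged. As a quick sanity check, the arithmetic below uses the `n_train` and `batch_size` values printed above and the first epoch's `Loss = 183.0861` from the log further down; the variable names `summed_loss` and `mean_batch_loss` are only illustrative.

```python
# Values taken from the notebook output above.
n_train, batch_size = 24228, 64

# Same formula as in train(): integer division plus one extra batch.
n_batch = n_train // batch_size + 1
print(n_batch)  # 379

# The epoch loss reported by train() is a sum over those batches.
summed_loss = 183.0861
mean_batch_loss = summed_loss / n_batch
print(round(mean_batch_loss, 2))  # 0.48
```

A per-batch value of about 0.48 is below $\ln 2 \approx 0.69$, which is what the first term of the BPR loss gives when positive and negative items are scored equally.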

We have not talked about the BPR loss yet, so let's have a look. The definition in the [paper](https://arxiv.org/pdf/1905.08108.pdf) is:

$$
Loss = \sum_{(u,i,j) \in \mathcal{O}} -\ln \big(\sigma(\hat{y}_{ui} - \hat{y}_{uj})\big) + \lambda ||\Theta||^{2}_{2}
$$

Here $\mathcal{O} = \{ (u,i,j)|(u,i) \in  R^{+}, (u,j) \in R^{-} \}$ is the set of training tuples, with $R^{+}$ and $R^{-}$ corresponding to observed and unobserved interactions (aka positive and negative) respectively. $\sigma$ is the sigmoid function, $\Theta = \{ \textbf{E}, \{ \textbf{W}^{l}_{1},\textbf{W}^{l}_{2} \}^{L}_{l=1}  \}$ denotes all trainable parameters, and $\lambda$ controls the strength of the $L_2$ regularization.

In PyTorch (note that the code averages the first term over the batch rather than summing it):

In [13]:
def bpr_loss(self, u, i, j):
    # first term
    y_ui = torch.mul(u, i).sum(dim=1)
    y_uj = torch.mul(u, j).sum(dim=1)
    log_prob = (torch.log(torch.sigmoid(y_ui-y_uj))).mean()

    # regularization (to be honest this does not help much when using pytorch)
    l2norm = (torch.sum(u**2)/2. + torch.sum(i**2)/2. + torch.sum(j**2)/2.).mean()
    l2reg  = reg*l2norm

    # Loss
    return -log_prob + l2reg
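To get a feel for how this loss behaves, here is a small NumPy re-implementation evaluated on hand-made embeddings. It is a sketch for illustration only: `bpr_loss_np`, `sigmoid` and the toy vectors are hypothetical and not part of the model code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_loss_np(u, i, j, reg=1e-5):
    # Scores are dot products between user and item embeddings.
    y_ui = (u * i).sum(axis=1)
    y_uj = (u * j).sum(axis=1)
    # First term: mean log-probability that the positive item outranks the negative.
    log_prob = np.log(sigmoid(y_ui - y_uj)).mean()
    # Second term: L2 penalty on the embeddings used in this batch.
    l2norm = ((u ** 2).sum() + (i ** 2).sum() + (j ** 2).sum()) / 2.0
    return -log_prob + reg * l2norm

# Two users, 2-dimensional embeddings.
u = np.array([[1.0, 0.0], [0.0, 1.0]])
pos = np.array([[2.0, 0.0], [0.0, 2.0]])     # aligned with the users
neg = np.array([[-2.0, 0.0], [0.0, -2.0]])   # opposed to the users

good = bpr_loss_np(u, pos, neg)          # positives ranked above negatives
bad = bpr_loss_np(u, neg, pos)           # ranking reversed
tie = bpr_loss_np(u, pos, pos, reg=0.0)  # equal scores, no regularization

print(good < tie < bad)
```

With equal scores the first term is exactly $\ln 2$, so a loss well below that value indicates that positives are, on average, being ranked above the sampled negatives.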

Okay, so now that we know how the training happens, let's move to the validation/testing. Here, we will first use the authors' `early_stopping` function. I am sure there are more "pytorchian" ways of doing it, but this function is simple and does the job, so let's use it.

In [14]:
def early_stopping(log_value, best_value, stopping_step, expected_order='asc', patience=10):

    # better is higher or lower
    assert expected_order in ['asc', 'dec']

    if (expected_order == 'asc' and log_value >= best_value) or (expected_order == 'dec' and log_value <= best_value):
        stopping_step = 0
        best_value = log_value
    else:
        stopping_step += 1

    if stopping_step >= patience:
        print("Early stopping is triggered at step: {} log:{}".format(patience, log_value))
        should_stop = True
    else:
        should_stop = False

    return best_value, stopping_step, should_stop
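The function is meant to be called once per evaluation, feeding its own outputs back in as the next inputs. The snippet below shows that pattern with a condensed copy of the function (same logic, without the `print`) and a made-up sequence of recall values; `recalls` and `history` are illustrative names and the numbers are not results from this notebook.

```python
def early_stopping(log_value, best_value, stopping_step, expected_order='asc', patience=10):
    # Condensed copy of the function above, so that this snippet runs on its own.
    assert expected_order in ['asc', 'dec']
    if expected_order == 'asc':
        improved = log_value >= best_value
    else:
        improved = log_value <= best_value
    if improved:
        stopping_step, best_value = 0, log_value
    else:
        stopping_step += 1
    return best_value, stopping_step, stopping_step >= patience

# Made-up validation recalls, one per evaluation round.
recalls = [0.010, 0.012, 0.011, 0.0115, 0.0119, 0.013]

best, step, stop = 0.0, 0, False
history = []
for epoch, rec in enumerate(recalls):
    # Higher recall is better, hence expected_order='asc'.
    best, step, stop = early_stopping(rec, best, step, expected_order='asc', patience=3)
    history.append((epoch, best, step, stop))
    if stop:
        break

print(history[-1])
```

With `patience=3`, training stops after three consecutive evaluations without improving on the best value, so the last entry of `recalls` is never reached.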

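Now let's see how we test on one user. Items the user interacted with during training are excluded from the candidates, and the remaining items are ranked by their predicted rating: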

In [21]:
def test_one_user(x):
    rating = x[0]
    u = x[1]

    try:
        training_items = data_generator.train_items[u]
    except Exception:
        training_items = []

    user_pos_test = data_generator.test_set[u]
    all_items = set(range(data_generator.n_items))
    test_items = list(all_items - set(training_items))
    r = ranklist_by_heapq(user_pos_test, test_items, rating, Ks)

    return get_performance(user_pos_test, r, Ks)
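`ranklist_by_heapq` and `get_performance` live in `utils/metrics.py` and are not shown here. To give an idea of the kind of computation involved, here is a pure-Python sketch for a single user and a single cutoff. The helpers `topk_hits` and `metrics_at_k` are hypothetical, and the NDCG uses the standard binary-relevance definition, which may differ from the one in the repo.

```python
import heapq
import math

def topk_hits(user_pos_test, test_items, rating, K):
    # K candidate items with the highest predicted score, best first.
    top = heapq.nlargest(K, test_items, key=lambda item: rating[item])
    # 1 where the recommended item is a held-out positive, else 0.
    return [1 if item in user_pos_test else 0 for item in top]

def metrics_at_k(hits, n_pos, K):
    hits = hits[:K]
    tp = sum(hits)
    # DCG discounts each hit by the log of its (1-based) rank + 1.
    dcg = sum(h / math.log2(rank + 2) for rank, h in enumerate(hits))
    # Ideal DCG: all positives placed at the top of the list.
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(n_pos, K)))
    return {
        'precision': tp / K,
        'recall': tp / n_pos,
        'hit_ratio': float(tp > 0),
        'ndcg': dcg / idcg if idcg > 0 else 0.0,
    }

# Toy example: 6 items, item 0 was seen in training, items 2 and 5 are test positives.
rating = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]
training_items = [0]
user_pos_test = [2, 5]
test_items = [it for it in range(len(rating)) if it not in training_items]

hits = topk_hits(user_pos_test, test_items, rating, K=3)
res = metrics_at_k(hits, n_pos=len(user_pos_test), K=3)
print(hits, res)
```

Item 0 has the highest score but is removed from the candidates, exactly as `test_one_user` does with `training_items`.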

And now that we know how to test on one user, let's do it for the whole dataset.

In [22]:
def test_CPU(model, users_to_test):
    model.eval()
    result = {'precision': np.zeros(len(Ks)), 'recall': np.zeros(len(Ks)), 'ndcg': np.zeros(len(Ks)),
              'hit_ratio': np.zeros(len(Ks))}

    pool = multiprocessing.Pool(cores)

    u_batch_size = batch_size * 2
    test_users = users_to_test
    n_test_users = len(test_users)
    n_user_batchs = n_test_users // u_batch_size + 1

    count = 0

    for u_batch_id in range(n_user_batchs):

        start = u_batch_id * u_batch_size
        end = (u_batch_id + 1) * u_batch_size

        user_batch = test_users[start: end]
        item_batch = range(data_generator.n_items)

        user_emb = model.u_g_embeddings[user_batch].detach()

        rate_batch  = torch.mm(user_emb, model.i_g_embeddings.t().detach()).cpu().numpy()

        user_batch_rating_uid = zip(rate_batch, user_batch)
        batch_result = pool.map(test_one_user, user_batch_rating_uid)
        count += len(batch_result)

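        # `user_res` rather than `re`, to avoid shadowing the `re` module imported above
        for user_res in batch_result:
            result['precision'] += user_res['precision']/n_test_users
            result['recall'] += user_res['recall']/n_test_users
            result['ndcg'] += user_res['ndcg']/n_test_users
            result['hit_ratio'] += user_res['hit_ratio']/n_test_users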

    assert count == n_test_users
    pool.close()
    return result


Let's see how it all comes together! (Note that the process here is **extremely** inefficient, since we are splitting a 3000x3000 matrix into 10 folds and using a batch size of 64 for only 1000 users.)

In [24]:
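# the optimizer must exist before the first call to train()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
stopping_step, should_stop = 0, False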
for epoch in range(2):

    t1 = time()
    loss = train(model, data_generator, optimizer)

    if epoch % print_every  == (print_every - 1):
        print("Epoch:{} {:.2f}s, Loss = {:.4f}".
            format(epoch, time()-t1, loss))

    if epoch % eval_every  == (eval_every - 1):
        with torch.no_grad():
            t2 = time()
            users_to_test = list(data_generator.test_set.keys())
            res = test_CPU(model, users_to_test)
        print(
            "VALIDATION:","\n",
            "Epoch: {}, {:.2f}s".format(epoch, time()-t2),"\n",
            "Recall@{}: {:.4f}, Recall@{}: {:.4f}".format(Ks[0], res['recall'][0],  Ks[-1], res['recall'][-1]), "\n",
            "Precision@{}: {:.4f}, Precision@{}: {:.4f}".format(Ks[0], res['precision'][0],  Ks[-1], res['precision'][-1]), "\n",
            "Hit_ratio@{}: {:.4f}, Hit_ratio@{}: {:.4f}".format(Ks[0], res['hit_ratio'][0],  Ks[-1], res['hit_ratio'][-1]), "\n",
            "NDCG@{}: {:.4f}, NDCG@{}: {:.4f}".format(Ks[0], res['ndcg'][0],  Ks[-1], res['ndcg'][-1])
            )        

Epoch:0 94.24s, Loss = 183.0861
VALIDATION: 
 Epoch: 0, 0.65s 
 Recall@10: 0.0056, Recall@20: 0.0109 
 Precision@10: 0.0036, Precision@20: 0.0035 
 Hit_ratio@10: 0.0360, Hit_ratio@20: 0.0660 
 NDCG@10: 0.0185, NDCG@20: 0.0265
Epoch:1 94.42s, Loss = 169.0698
VALIDATION: 
 Epoch: 1, 0.64s 
 Recall@10: 0.0063, Recall@20: 0.0108 
 Precision@10: 0.0042, Precision@20: 0.0034 
 Hit_ratio@10: 0.0420, Hit_ratio@20: 0.0650 
 NDCG@10: 0.0167, NDCG@20: 0.0229


## GPU-enabled test

We could also use the GPU-enabled test functions I described in Chapter03:

In [27]:
from collections import defaultdict

def split_mtx(X, n_folds=10):
    X_folds = []
    fold_len = X.shape[0]//n_folds
    for i in range(n_folds):
        start = i * fold_len
        if i == n_folds -1:
            end = X.shape[0]
        else:
            end = (i + 1) * fold_len
        X_folds.append(X[start:end])
    return X_folds


def ndcg_at_k_gpu(pred_items, test_items, test_indices, k):
    r = (test_items * pred_items).gather(1, test_indices)
    f = torch.from_numpy(np.log2(np.arange(2, k+2))).float().cuda()
    dcg = (r[:, :k]/f).sum(1)
    dcg_max = (torch.sort(r, dim=1, descending=True)[0][:, :k]/f).sum(1)
    ndcg = dcg/dcg_max
    ndcg[torch.isnan(ndcg)] = 0
    return ndcg


def test_GPU(u_emb, i_emb, Rtr, Rte, Ks):

    ue_folds = split_mtx(u_emb)
    tr_folds = split_mtx(Rtr)
    te_folds = split_mtx(Rte)

    fold_prec, fold_rec, fold_ndcg, fold_hr = \
        defaultdict(list), defaultdict(list), defaultdict(list), defaultdict(list)
    for ue_f, tr_f, te_f in zip(ue_folds, tr_folds, te_folds):

        scores = torch.mm(ue_f, i_emb.t())
        test_items = torch.from_numpy(te_f.todense()).float().cuda()
        non_train_items = torch.from_numpy(1-(tr_f.todense())).float().cuda()
        scores = scores * non_train_items
        _, test_indices = torch.topk(scores, dim=1, k=max(Ks))
        pred_items = torch.zeros_like(scores).float()
        # passing a scalar `value` avoids building a 0-dim `src` tensor on the right device
        pred_items.scatter_(dim=1, index=test_indices, value=1.0)

        for k in Ks:
            topk_preds = torch.zeros_like(scores).float()
            topk_preds.scatter_(dim=1, index=test_indices[:, :k], value=1.0)

            TP = (test_items * topk_preds).sum(1)
            prec = TP/k
            rec = TP/test_items.sum(1)
            hit_r = (TP > 0).float()
            ndcg = ndcg_at_k_gpu(pred_items, test_items, test_indices, k)

            fold_prec[k].append(prec)
            fold_rec[k].append(rec)
            fold_ndcg[k].append(ndcg)
            fold_hr[k].append(hit_r)

    result = {'precision': [], 'recall': [], 'ndcg': [], 'hit_ratio': []}
    for k in Ks:
        result['precision'].append(torch.cat(fold_prec[k]).mean())
        result['recall'].append(torch.cat(fold_rec[k]).mean())
        result['ndcg'].append(torch.cat(fold_ndcg[k]).mean())
        result['hit_ratio'].append(torch.cat(fold_hr[k]).mean())
    return result
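The core of `test_GPU` is a mask, a top-k and a one-hot scatter, all done on whole matrices instead of user by user. Here is a CPU-only NumPy analogue of that idea on a 2-user, 4-item toy example. It is a sketch, not a drop-in replacement: the arrays are made up, and `np.argsort`/`np.put_along_axis` stand in for `torch.topk`/`scatter_`.

```python
import numpy as np

# Predicted scores for 2 users x 4 items (all positive in this toy example).
scores = np.array([[0.9, 0.4, 0.8, 0.1],
                   [0.2, 0.7, 0.3, 0.6]])
# Binary train/test interaction matrices (dense stand-ins for one fold of Rtr/Rte).
train = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
test = np.array([[0, 0, 1, 0],
                 [0, 0, 0, 1]], dtype=float)
k = 2

# Zero out the items already seen in training, as test_GPU does.
masked = scores * (1 - train)

# Indices of the k best remaining items per user.
topk_idx = np.argsort(-masked, axis=1)[:, :k]

# One-hot matrix with ones at the recommended positions.
topk_preds = np.zeros_like(masked)
np.put_along_axis(topk_preds, topk_idx, 1.0, axis=1)

# True positives per user, then the same metrics as in test_GPU.
TP = (test * topk_preds).sum(axis=1)
prec = TP / k
rec = TP / test.sum(axis=1)
hit = (TP > 0).astype(float)
print(topk_idx.tolist(), prec.tolist(), rec.tolist(), hit.tolist())
```

One thing to keep in mind is that masking by multiplication sets training items to a score of 0, so it relies on the candidate items scoring above 0; filling the masked positions with a very negative value is a common alternative if that does not hold.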

And in identical fashion to the CPU test, we can do:

In [28]:
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
stopping_step, should_stop = 0, False
for epoch in range(2):
    t1 = time()
    loss = train(model, data_generator, optimizer)
    if epoch % print_every  == (print_every - 1):
        print("Epoch:{} {:.2f}s, Loss = {:.4f}".
            format(epoch, time()-t1, loss))
    if epoch % eval_every  == (eval_every - 1):
        t2 = time()
        res = test_GPU(
            model.u_g_embeddings.detach(),
            model.i_g_embeddings.detach(),
            data_generator.Rtr,
            data_generator.Rte,
            Ks)
        print("VALIDATION.","\n"
            "Epoch: {}, {:.2f}s".format(epoch, time()-t2),"\n",
            "Recall@{}: {:.4f}, Recall@{}: {:.4f}".format(Ks[0], res['recall'][0],  Ks[-1], res['recall'][-1]), "\n"
            "Precision@{}: {:.4f}, Precision@{}: {:.4f}".format(Ks[0], res['precision'][0],  Ks[-1], res['precision'][-1]), "\n"
            "Hit_ratio@{}: {:.4f}, Hit_ratio@{}: {:.4f}".format(Ks[0], res['hit_ratio'][0],  Ks[-1], res['hit_ratio'][-1]), "\n"
            "NDCG@{}: {:.4f}, NDCG@{}: {:.4f}".format(Ks[0], res['ndcg'][0],  Ks[-1], res['ndcg'][-1])
            )

Epoch:0 94.42s, Loss = 148.5109
VALIDATION. 
Epoch: 0, 0.27s 
 Recall@10: 0.0045, Recall@20: 0.0081 
Precision@10: 0.0031, Precision@20: 0.0028 
Hit_ratio@10: 0.0290, Hit_ratio@20: 0.0530 
NDCG@10: 0.0147, NDCG@20: 0.0208
Epoch:1 94.22s, Loss = 140.6485
VALIDATION. 
Epoch: 1, 0.20s 
 Recall@10: 0.0039, Recall@20: 0.0096 
Precision@10: 0.0029, Precision@20: 0.0032 
Hit_ratio@10: 0.0270, Hit_ratio@20: 0.0610 
NDCG@10: 0.0143, NDCG@20: 0.0228


`test_GPU` will be a lot faster than `test_CPU` when running the real exercise. For example, for the `Gowalla` dataset used by the authors, using a batch size of 5096, testing takes nearly 5 min with `test_CPU` and less than 15 s with `test_GPU`.