# VAE for ranking items

## Model Formalization

For each user $u \in U$, we have a set, $P_u$ = { $(m_1, m_2)$ | $rating_u^{m_1}$ > $rating_u^{m_2}$ }

$P = \bigcup\limits_{u \in U} \lbrace (u, m_1, m_2) \mid (m_1, m_2) \in P_u \rbrace$
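
As an illustration of the two definitions above, here is a minimal sketch that builds the pairs from explicit ratings. The `ratings_by_user` dictionary and the helper name `preference_pairs` are hypothetical and only serve the example; ties produce no pair, which matches the strict inequality in $P_u$.

```python
from itertools import permutations

def preference_pairs(ratings):
    """Return P_u: all ordered pairs (m1, m2) with ratings[m1] > ratings[m2]."""
    return {
        (m1, m2)
        for m1, m2 in permutations(ratings, 2)
        if ratings[m1] > ratings[m2]
    }

# Hypothetical toy data: user -> {movie: rating}
ratings_by_user = {
    "u1": {"A": 5, "B": 3, "C": 3},
    "u2": {"A": 2, "B": 4},
}

# P is the union over all users; the user id is kept so that pairs
# coming from different users stay distinct.
P = {
    (u, m1, m2)
    for u, ratings in ratings_by_user.items()
    for (m1, m2) in preference_pairs(ratings)
}

print(sorted(P))
```

Note that the number of pairs grows quadratically with the number of rated items per user, which is one reason the implementation further down samples negatives instead of enumerating every pair.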

For every $(u, m_1, m_2) \in P$, we send two inputs, $x_1 = u \Vert m_1$ and $x_2 = u \Vert m_2$, to a VAE (with the same parameters for both inputs).

We expect the VAE's encoder, with parameters $\theta$, to map $x_1$ to a distribution $(\mu_1, \Sigma_1)$ from which $z_1$ is sampled; $z_2$ is obtained from $x_2$ in the same way.
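
The sampling step can be sketched with the reparameterization trick, $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim N(0, I)$. The sketch below is plain Python so it runs without PyTorch, and it assumes a diagonal covariance parameterized by `log_sigma` (an assumption of this example; the text above writes a general $\Sigma$). The function name and the toy values are illustrative only.

```python
import math
import random

def sample_latent(mu, log_sigma, rng=random):
    """Draw z = mu + sigma * eps element-wise, with eps ~ N(0, 1)."""
    return [
        m + math.exp(ls) * rng.gauss(0.0, 1.0)
        for m, ls in zip(mu, log_sigma)
    ]

rng = random.Random(0)                 # seeded for reproducibility
mu = [0.5, -1.0, 2.0]
log_sigma = [-20.0, -20.0, -20.0]      # sigma is about 2e-9, so z is essentially mu

z = sample_latent(mu, log_sigma, rng)
print(z)
```

Because the noise is drawn independently of $\mu$ and $\sigma$, gradients can flow through both, which is what makes the encoder trainable by backpropagation.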

The decoder network is expected to learn a mapping function $f_{\phi}$ from $z_1$ to $m_1$.

We currently have two ideas for the decoder network:
1. Using two sets of network parameters, $\phi$ and $\psi$, for $z_1$ and $z_2$ respectively.
2. Using $\phi$ for both $z_1$ and $z_2$.

For ranking the pairs of movies, we have another network:
1. Its input is $z_1 \Vert z_2$.
2. It is expected to learn a mapping, $f_{\delta}$, to a Bernoulli distribution over True/False, modelling $rating_u^{m_1} > rating_u^{m_2}$.
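
A minimal sketch of such a ranking head, assuming the simplest possible form of $f_{\delta}$: one linear layer on the concatenation followed by a sigmoid. The function name, weights and latent vectors are hypothetical. With antisymmetric weights `[w, -w]` the head reduces to comparing two per-item scores, which is how the implementation further down scores pairs (one linear layer per $z$, then a difference).

```python
import math

def rank_probability(z1, z2, weights, bias=0.0):
    """P(rating of m1 > rating of m2) from one linear layer on z1 || z2."""
    x = list(z1) + list(z2)                      # concatenation z1 || z2
    if len(weights) != len(x):
        raise ValueError("weights must have len(z1) + len(z2) entries")
    logit = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-logit))        # Bernoulli parameter

# Hypothetical per-item scoring weights w, made antisymmetric as [w, -w]
w = [0.5, 1.0]
weights = w + [-wi for wi in w]

z1, z2 = [2.0, 1.0], [0.5, 0.5]
p_forward = rank_probability(z1, z2, weights)
p_swapped = rank_probability(z2, z1, weights)

print(p_forward, p_swapped)
```

With antisymmetric weights and zero bias, swapping the two inputs flips the prediction, so the two probabilities sum to one.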

## Loss Function

$$Loss \; = \; KL( \, q_{\theta}(z_1 \vert x_1) \Vert \mathcal{N}(0, I) \, ) \; + \; KL( \, q_{\theta}(z_2 \vert x_2) \Vert \mathcal{N}(0, I) \, ) \; - \; \sum_{i} m_{1i} \, \log( \, f_{\phi}(z_1)_i ) \; - \; \sum_{i} m_{2i} \, \log( \, f_{\psi}(z_2)_i ) \; - \; f_{\delta}(z_1 \Vert z_2) $$
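
For reference, if the encoder outputs a diagonal Gaussian (an assumption of this note; the formalization above writes a general $\Sigma$), each KL term has a closed form, where $d$ is the latent size:

$$KL( \, \mathcal{N}(\mu, diag(\sigma^2)) \Vert \mathcal{N}(0, I) \, ) \; = \; \frac{1}{2} \sum_{j=1}^{d} ( \, \mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1 \, )$$

The `VAELoss` cell further down uses the same summand, but averages it over all entries and drops the factor $\frac{1}{2}$, which only rescales the term.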

# Imports

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

import gc
import time
import json
import pickle
import random
import functools
import numpy as np
from tqdm import tqdm
import datetime as dt

# Utility functions

In [2]:
LongTensor = torch.LongTensor
FloatTensor = torch.FloatTensor

is_cuda_available = torch.cuda.is_available()

if is_cuda_available: 
    print("Using CUDA...\n")
    LongTensor = torch.cuda.LongTensor
    FloatTensor = torch.cuda.FloatTensor

def save_obj(obj, name):
    with open(name + '.pkl', 'wb') as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)

def save_obj_json(obj, name):
    with open(name + '.json', 'w') as f:
        json.dump(obj, f)

def load_obj(name):
    with open(name + '.pkl', 'rb') as f:
        return pickle.load(f)

def load_obj_json(name):
    with open(name + '.json', 'r') as f:
        return json.load(f)

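def file_write(log_file, s):
    # Echo to stdout and append the same line to the log file
    print(s)
    with open(log_file, 'a') as f:
        f.write(s + '\n')

def clear_log_file(log_file):
    # Opening in 'w' mode truncates the file
    with open(log_file, 'w'):
        pass

def pretty_print(h):
    print("{")
    for key in h:
        # str() so that non-string values (ints, floats, lists) can be printed too
        print(' ' * 4 + str(key) + ': ' + str(h[key]))
    print('}\n')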

Using CUDA...



# Hyper Parameters

In [3]:
hyper_params = {
#     'data_base': 'saved_data/pro_sg/',
#     'project_name': 'ranking_vae_single_score_ml1m',
    'data_base': 'saved_data/netflix-good-sample/pro_sg/',
    'project_name': 'ranking_vae_netflix_good_sample',
    'model_file_name': '',
    'log_file': '',
    'data_split': [0.8, 0.2], # Train : Test

    'learning_rate': 0.05, # not required for adadelta; the adam and rmsprop branches below also leave lr at its default
    'optimizer': 'adam',
    'loss_type': 'hinge',
    'm_loss': float(1),
    'weight_decay': float(1e-6),

    'epochs': 50,
    'batch_size': 512,

    'user_embed_size': 128,
    'item_embed_size': 128,
    
    'hidden_size': 100,
    'latent_size': 64,

    'number_users_to_keep': 100000,
    'batch_log_interval': 2000,
}

file_name = '_optimizer_' + str(hyper_params['optimizer'])
if hyper_params['optimizer'] != 'adadelta':
    file_name += '_lr_' + str(hyper_params['learning_rate'])
file_name += '_user_embed_size_' + str(hyper_params['user_embed_size'])
file_name += '_item_embed_size_' + str(hyper_params['item_embed_size'])
file_name += '_weight_decay_' + str(hyper_params['weight_decay'])

hyper_params['log_file'] = 'saved_logs/' + hyper_params['project_name'] + '_log' + file_name + '.txt'
hyper_params['model_file_name'] = 'saved_models/' + hyper_params['project_name'] + '_model' + file_name + '.pt'


# Data Parsing

In [4]:
def load_data(hyper_params):   
    file_write(hyper_params['log_file'], "Started reading data file")
    with open(hyper_params['data_base'] + "train.csv") as f:
        lines = f.readlines()

    item_hist = {}
    max_item = 0
    max_user = 0

    for line in lines[1:]:
        line = line.strip().split(",")
        if line[0] not in item_hist: item_hist[line[0]] = []
        item_hist[line[0]].append(int(line[1]))
        max_item = max(max_item, int(line[1]))
        max_user = max(max_user, int(line[0]))

    with open(hyper_params['data_base'] + "test_tr.csv") as f:
        lines = f.readlines()

    test_tr = {}

    for line in lines[1:]:
        line = line.strip().split(",")
        if line[0] not in test_tr: test_tr[line[0]] = []
        test_tr[line[0]].append(int(line[1]))
        
        # ADDING FOLD-IN SET TO TRAINING
        if line[0] not in item_hist: item_hist[line[0]] = []
        item_hist[line[0]].append(int(line[1]))

        max_item = max(max_item, int(line[1]))
        max_user = max(max_user, int(line[0]))

    # Sample negs
    number_negs = 5
    negs = {}
    for user in item_hist:
        seen = set(item_hist[user])  # set lookup instead of scanning the list each time
        negs[user] = set()
        while len(negs[user]) < number_negs:
            random_neg = random.randint(0, max_item)
            if random_neg not in seen: negs[user].add(random_neg)
        negs[user] = list(negs[user])

    with open(hyper_params['data_base'] + "test_te.csv") as f:
        lines = f.readlines()

    test = {}

    for line in lines[1:]:
        line = line.strip().split(",")
        if line[0] not in test: test[line[0]] = []
        test[line[0]].append(int(line[1]))
        max_item = max(max_item, int(line[1]))
        max_user = max(max_user, int(line[0]))
        
    file_write(hyper_params['log_file'], "Data Files loaded!")
    
    train_reader = DataReader(hyper_params, item_hist, negs, max_user+1, max_item+1, True)
    test_reader = DataReader(hyper_params, test_tr, test, max_user+1, max_item+1, False)

    return train_reader, test_reader, max_user+1, max_item+1
    
#     return item_hist, negs, test_tr, test, max_user+1, max_item+1
    
#     train = load_obj_json(hyper_params['data_base'] + 'train_ranking_vae')
#     test = load_obj_json(hyper_params['data_base'] + 'test_ranking_vae')
#     user_hist = load_obj_json(hyper_params['data_base'] + 'user_hist_ranking_vae')
#     item_hist = load_obj_json(hyper_params['data_base'] + 'item_hist_ranking_vae')

#     train_reader = DataReader(hyper_params, train, len(user_hist), item_hist, True)
#     test_reader = DataReader(hyper_params, test, len(user_hist), item_hist, False)

#     return train_reader, test_reader, len(user_hist), len(item_hist)

class DataReader:

    def __init__(self, hyper_params, a, b, num_users, num_items, is_training):
        self.hyper_params = hyper_params
        self.batch_size = hyper_params['batch_size']
        self.num_users = num_users
        self.num_items = num_items
        self.is_training = is_training
        
        if not is_training:
            self.test_tr = a
            self.test_te = b
        else:
            self.data = a
            self.negs = b
            self.number()

    def number(self):
        users_done = 0
        count = 0

        x_batch_user = []
        x_batch_item = []

        for user in self.data:

            if users_done > self.hyper_params['number_users_to_keep']: break
            users_done += 1

            for i in range(len(self.data[user])):
                for ii in range(len(self.negs[user])):

                    x_batch_user.append(0)

                    x_batch_item.append(0)
                    x_batch_item.append(0)

                    if len(x_batch_user) == self.batch_size:

                        count += 1
                        
                        x_batch_user = []
                        x_batch_item = []

        self.num_b = count

    def iter(self):
        users_done = 0

        x_batch_user = []
        x_batch_item = []

        for user in self.data:

            if users_done > self.hyper_params['number_users_to_keep']: break
            users_done += 1

            for i in range(len(self.data[user])):
                for ii in range(len(self.negs[user])):

                    x_batch_user.append(int(user))

                    x_batch_item.append(self.data[user][i])
                    x_batch_item.append(self.negs[user][ii])

                    if len(x_batch_user) == self.batch_size:

                        yield Variable(LongTensor(x_batch_user)), Variable(LongTensor(x_batch_item[::2])), Variable(LongTensor(x_batch_item[1::2]))
                        
                        x_batch_user = []
                        x_batch_item = []

    def iter_eval(self):
        all_users = list(self.test_te.keys())
        users_done = 0
        
        for user_now in tqdm(range(len(all_users))):
            if users_done > self.hyper_params['number_users_to_keep']: break
            users_done += 1
            
            user = all_users[user_now]

            yield int(user), self.test_tr[user], self.test_te[user]
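
The negative sampling and pair expansion done by `load_data` and `DataReader.iter` can be sketched in isolation as follows. This is a simplified illustration, not the reader itself: the helper names `sample_negatives` and `triples` are hypothetical, and the guard against running out of unseen items is an addition that the cell above does not have.

```python
import random

def sample_negatives(seen_items, num_items, num_negs, rng=random):
    """Sample `num_negs` distinct item ids in [0, num_items) not in `seen_items`."""
    seen = set(seen_items)                       # O(1) membership checks
    seen_in_range = sum(1 for i in seen if 0 <= i < num_items)
    if num_items - seen_in_range < num_negs:
        raise ValueError("not enough unseen items to sample from")
    negs = set()
    while len(negs) < num_negs:
        candidate = rng.randrange(num_items)
        if candidate not in seen:
            negs.add(candidate)
    return sorted(negs)

def triples(user, positives, negatives):
    """Every (user, positive, negative) combination, in the order the reader emits them."""
    return [(user, p, n) for p in positives for n in negatives]

rng = random.Random(0)
negatives = sample_negatives([0, 1, 2], num_items=10, num_negs=5, rng=rng)
print(negatives)
print(triples(7, [1, 2], [8, 9]))
```

Each user therefore contributes `len(positives) * len(negatives)` training triples, which is what `DataReader.number` counts when it computes the number of batches.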

# Evaluation Code

In [5]:
def map_int(a):
    if float(a.data) < 0.0: return -1
    if float(a.data) > 0.0: return 1
    return 0

def evaluate_ndcg(model, criterion, reader, hyper_params):
    model.eval()

    ret = 0.0
    
    Ks = [10, 100]
    metrics = {}
    for k in Ks:
        metrics['NDCG@' + str(k)] = 0.0
        metrics['HR@' + str(k)] = 0.0
        metrics['Prec@' + str(k)] = 0.0

    user_done = 0
    total_ndcg = 0.0

    for u, x_tr, x_te in reader.iter_eval():
        user_done += 1
        
        x_user = [ u for i in range(hyper_params['total_items']) ]
        x_item = list(range(hyper_params['total_items']))
        
        x_user = Variable(LongTensor(x_user))
        x_item = Variable(LongTensor(x_item))

        _, scores = model(x_user, x_item)

        scores = scores.data
        scores[LongTensor(x_tr)] = -np.inf
        
        _, argsorted = torch.sort(-1.0 * scores)
        for k in Ks:
            best = 0.0
            now_at = 0.0
            dcg = 0.0
            hr = 0.0

            rec_list = list(argsorted[:k].cpu().numpy())
            for m in range(len(x_te)):
                movie = x_te[m]
                now_at += 1.0
                if now_at <= k: best += 1.0 / float(np.log2(now_at + 1))

                if movie not in rec_list: continue
                hr += 1.0
                dcg += 1.0 / float(np.log2(float(rec_list.index(movie) + 2)))

            metrics['NDCG@' + str(k)] += float(dcg) / float(best)
            metrics['HR@' + str(k)] += float(hr) / float(len(x_te))
            metrics['Prec@' + str(k)] += float(hr) / float(k)

        total_ndcg += 1.0
    
    for k in Ks:
        metrics['NDCG@' + str(k)] = round((100.0 * metrics['NDCG@' + str(k)]) / float(total_ndcg), 4)
        metrics['HR@' + str(k)] = round((100.0 * metrics['HR@' + str(k)]) / float(total_ndcg), 4)
        metrics['Prec@' + str(k)] = round((100.0 * metrics['Prec@' + str(k)]) / float(total_ndcg), 4)
    
    return metrics

def evaluate(model, criterion, reader, hyper_params, is_train_set):
    model.eval()

    metrics = {}
    metrics['CP'] = 0.0
    metrics['ZEROS'] = 0.0
    metrics['loss'] = 0.0

    correct = 0
    not_correct = 0
    zeros = 0
    total = 0
    batch = 0
    
    NDCG = evaluate_ndcg(model, criterion, reader, hyper_params)
    for k in NDCG: metrics[k] = NDCG[k]
        
    return metrics
    
#     for x1, x2, y in reader.iter():
#         batch += 1
#         if is_train_set == True and batch > hyper_params['testing_batch_limit']: break
        
#         o1, output1 = model(x1)
#         o2, output2 = model(x2)
#         out_diff = torch.gt(output1, output2).float() - torch.lt(output1, output2).float()

#         metrics['loss'] += criterion(o1 + o2, [output1] + [output2], y, x1[1], x2[1], x1[0], x2[0]).data
        
#         temp_correct  = int(torch.sum((torch.lt(y, 0.0) * torch.lt(out_diff, 0.0)).float()).data)
#         temp_correct += int(torch.sum((torch.gt(y, 0.0) * torch.gt(out_diff, 0.0)).float()).data)

#         temp_not_correct  = int(torch.sum((torch.lt(y, 0.0) * torch.gt(out_diff, 0.0)).float()).data)
#         temp_not_correct += int(torch.sum((torch.gt(y, 0.0) * torch.lt(out_diff, 0.0)).float()).data)

#         temp_zeros = int(torch.sum(torch.eq(out_diff, 0.0)).data)

#         correct += temp_correct
#         not_correct += temp_not_correct
#         zeros += temp_zeros
#         total += int(y.shape[0])
        
#         assert temp_correct + temp_not_correct + temp_zeros == int(y.shape[0])

#     assert correct + not_correct + zeros == total

#     metrics['CP'] = float(correct) / float(total)
#     metrics['CP'] *= 100.0
#     metrics['CP'] = round(metrics['CP'], 4)

#     metrics['ZEROS'] = float(zeros) / float(total)
#     metrics['ZEROS'] *= 100.0
#     metrics['ZEROS'] = round(metrics['ZEROS'], 4)

#     metrics['loss'] = float(metrics['loss'][0]) / float(batch)
#     metrics['loss'] = round(metrics['loss'], 4)

#     if is_train_set == False:
#         ndcg = evaluate_ndcg(model, criterion, reader, hyper_params)
#         for k in ndcg: metrics['NDCG@' + str(k)] = ndcg[k]

    return metrics
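
To make the metric definitions in `evaluate_ndcg` concrete, here is a small standalone sketch for a single user. It follows the same conventions as the cell above (binary relevance, `HR` divided by the number of held-out items, `Prec` divided by `k`), but the function name and toy lists are hypothetical, and it treats the held-out items as a set, so duplicates would be counted once.

```python
import math

def ranking_metrics(rec_list, held_out, k):
    """NDCG@k, HR@k and Prec@k for one user with binary relevance."""
    held = set(held_out)
    top_k = rec_list[:k]
    # 0-based ranks of the recommended items that are in the held-out set
    hits = [rank for rank, item in enumerate(top_k) if item in held]
    dcg = sum(1.0 / math.log2(rank + 2) for rank in hits)
    # Ideal DCG: all held-out items (at most k of them) placed at the top
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(held), k)))
    return {
        "NDCG": dcg / ideal if ideal > 0 else 0.0,
        "HR": len(hits) / len(held) if held else 0.0,
        "Prec": len(hits) / k,
    }

# Items 20 and 50 are hits at 0-based ranks 1 and 4; item 99 is missed.
metrics_example = ranking_metrics([10, 20, 30, 40, 50], [20, 50, 99], k=5)
print(metrics_example)
```

Hits near the top of the list contribute more to NDCG than hits further down, whereas HR and Prec only count how many hits there are.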

# Model

In [6]:
class Encoder(nn.Module):
    def __init__(self, hyper_params):
        super(Encoder, self).__init__()
        self.linear1 = nn.Linear(
            hyper_params['user_embed_size'] + hyper_params['item_embed_size'], hyper_params['hidden_size']
        )
        nn.init.xavier_normal_(self.linear1.weight)  # in-place variant; the un-suffixed name is deprecated
        self.activation = nn.ReLU()
        #self.dropout = nn.Dropout(0.1)

    def forward(self, x):
        x = self.linear1(x)
        #x = self.dropout(x)
        x = self.activation(x)
        return x

class Decoder(nn.Module):
    def __init__(self, hyper_params, out_size):
        super(Decoder, self).__init__()
        self.linear1 = nn.Linear(hyper_params['latent_size'], hyper_params['hidden_size'])
        self.linear2 = nn.Linear(hyper_params['hidden_size'], out_size)
        nn.init.xavier_normal_(self.linear1.weight)
        nn.init.xavier_normal_(self.linear2.weight)
        self.activation = nn.ReLU()

    def forward(self, x):
        x = self.linear1(x)
        x = self.activation(x)
        
        x = self.linear2(x)
        x = self.activation(x)
        return x

class Model(nn.Module):
    def __init__(self, hyper_params):
        super(Model, self).__init__()
        self.hyper_params = hyper_params
        
        self.encoder = Encoder(hyper_params)
        self.decoder_item = Decoder(hyper_params, hyper_params['total_items'])
        #self.decoder_user = Decoder(hyper_params, hyper_params['total_users'])
        
        self._enc_mu = nn.Linear(hyper_params['hidden_size'], hyper_params['latent_size'])
        self._enc_log_sigma = nn.Linear(hyper_params['hidden_size'], hyper_params['latent_size'])
        nn.init.xavier_normal_(self._enc_mu.weight)
        nn.init.xavier_normal_(self._enc_log_sigma.weight)
        
        self.user_embed = nn.Embedding(hyper_params['total_users'], hyper_params['user_embed_size'])
        self.item_embed = nn.Embedding(hyper_params['total_items'], hyper_params['item_embed_size'])
        nn.init.normal_(self.user_embed.weight, mean=0, std=0.01)
        nn.init.normal_(self.item_embed.weight, mean=0, std=0.01)
        
        self.activation = nn.ReLU()
        self.activation_last = nn.Tanh()
        if self.hyper_params['loss_type'] == 'bce': self.activation_last = nn.Sigmoid()
        
        prev = hyper_params['latent_size']
        self.layer_hinge1 = nn.Linear(prev, 1)
        nn.init.xavier_normal_(self.layer_hinge1.weight)
        # self.layer_hinge2 = nn.Linear(64, 1)
        # nn.init.xavier_normal(self.layer_hinge2.weight)
        # xavier_uniform
        
        self.dropout = nn.Dropout(0.1)
        
    def sample_latent(self, h_enc):
        """
        Return the latent normal sample z ~ N(mu, sigma^2)
        """
        mu = self._enc_mu(h_enc)
        #mu = self.dropout(mu)
        log_sigma = self._enc_log_sigma(h_enc)
        #log_sigma = self.dropout(log_sigma)
        sigma = torch.exp(log_sigma)
        # Standard normal noise on the same device and dtype as sigma,
        # so this also works when CUDA is not available
        std_z = torch.randn_like(sigma)

        self.z_mean = mu
        self.z_sigma = sigma

        return mu + sigma * std_z  # Reparameterization trick

    def forward(self, xuser, xitem):
        user = (self.user_embed(xuser))
        item = (self.item_embed(xitem))
        
        send = torch.cat([user, item], dim=-1)
#         send = user * item
        
        h_enc = self.encoder(send)
        z = self.sample_latent(h_enc)
        dec_item = self.decoder_item(z)
        
        # dec_user = self.decoder_user(z)
        # Can also produce user as decoder output
        
        output = self.layer_hinge1(z)
        #output = self.dropout(output)
        # output = self.activation(output)
        # output = self.layer_hinge2(output)
        # output = self.activation_last(output)
                              
        return [
            dec_item, self.z_mean, self.z_sigma
            # self.z_mean, self.z_sigma
        ], output.squeeze(-1)

# Custom loss

In [7]:
class VAELoss(torch.nn.Module):
    def __init__(self, hyper_params):
        super(VAELoss, self).__init__()

        self.loss_type = hyper_params['loss_type']
        self.m_loss = hyper_params['m_loss']

        self.cce_movie = nn.CrossEntropyLoss()  # mean reduction is the default
        self.bce = nn.BCELoss()                 # only used when loss_type == 'bce'

    def forward(self, o1, o2, tm1, tm2, anneal):
        # o1: [logits, z_mean, z_sigma] for the positive item, followed by the
        #     same three tensors for the negative item
        # o2: [score_pos, score_neg]; tm1 / tm2: positive / negative item ids
        m1, zm1, zs1, m2, zm2, zs2 = o1

        # KL to N(0, I) up to a constant factor, averaged over all entries
        var1 = zs1 * zs1
        kld  = torch.mean(zm1 * zm1 + var1 - torch.log(var1) - 1)
        var2 = zs2 * zs2
        kld += torch.mean(zm2 * zm2 + var2 - torch.log(var2) - 1)

        # Reconstruction of the item id from z
        likelihood  = self.cce_movie(m1, tm1)
        likelihood += self.cce_movie(m2, tm2)

        out_diff = o2[0] - o2[1]
        # Every training pair is (positive, negative), so the target label is always +1
        y = torch.ones_like(out_diff)

        # Reference: https://papers.nips.cc/paper/3708-ranking-measures-and-loss-functions-in-learning-to-rank.pdf
        if self.loss_type == 'hinge':
            pairwise_loss = torch.mean(torch.clamp(self.m_loss - out_diff, min=0))

        elif self.loss_type == 'bce':
            # BCELoss expects probabilities, so squash the score difference first
            pairwise_loss = self.bce(torch.sigmoid(out_diff), y)

        elif self.loss_type == 'easy_hinge':
            pairwise_loss = torch.log((out_diff * out_diff) - (10 * y * out_diff) + 26) - 2
            pairwise_loss = torch.mean(torch.clamp(pairwise_loss, min=0))

        elif self.loss_type == 'difficult_hinge':
            pairwise_loss = torch.pow(100.0, 1 - (y * out_diff)) - 1
            pairwise_loss = torch.mean(torch.clamp(pairwise_loss, min=0))

        elif self.loss_type == 'saddle':
            pairwise_loss = torch.mean(torch.pow(y + out_diff, 2))

        elif self.loss_type == 'exp':
            pairwise_loss = torch.mean(torch.exp(y * out_diff))

        elif self.loss_type == 'logistic':
            pairwise_loss = torch.mean(torch.log(self.m_loss + torch.exp(-out_diff)))

        else:
            raise ValueError('Unknown loss_type: ' + str(self.loss_type))

        # The KL term is currently switched off (weight 0.0) and `anneal` is unused
        final = (0.0 * kld) + (3.8 * pairwise_loss) + (1 * likelihood)

        return final
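
To see how the pairwise terms behave, the following sketch evaluates the `hinge` and `logistic` variants on a few score pairs. It is a scalar, plain-Python illustration of the same formulas with a margin of 1 (the value of `m_loss` in the hyper-parameters); the helper names and the scores are made up for the example.

```python
import math

def hinge_pairwise(score_pos, score_neg, margin=1.0):
    """max(0, margin - (s_pos - s_neg)): zero once the positive wins by the margin."""
    return max(0.0, margin - (score_pos - score_neg))

def logistic_pairwise(score_pos, score_neg):
    """log(1 + exp(-(s_pos - s_neg))): smooth, never exactly zero."""
    return math.log(1.0 + math.exp(-(score_pos - score_neg)))

# (positive score, negative score): clearly right, barely right, wrong order
pairs = [(2.0, 0.5), (0.6, 0.5), (0.1, 0.9)]

for s_pos, s_neg in pairs:
    print(s_pos, s_neg, hinge_pairwise(s_pos, s_neg), logistic_pairwise(s_pos, s_neg))
```

The hinge variant stops penalizing a pair as soon as the positive item leads by the margin, while the logistic variant keeps a small gradient for every pair.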

# Training loop

In [8]:
def train(reader):
    # Relies on the globals `model`, `optimizer`, `criterion`, `anneal`, `epoch`
    # and `hyper_params` defined elsewhere in this notebook
    model.train()
    total_loss = 0.0
    start_time = time.time()
    batch = 0
    batch_limit = int(reader.num_b)
    log_interval = hyper_params['batch_log_interval']

    for xuser, xpos, xneg in reader.iter():
        batch += 1

        optimizer.zero_grad()

        temp_o1, temp_o2 = model(xuser, xpos)
        temp_o3, temp_o4 = model(xuser, xneg)

        loss = criterion(temp_o1 + temp_o3, [temp_o2, temp_o4], xpos, xneg, anneal)
        # loss = criterion(temp_o1 + temp_o3, [temp_o2, temp_o4], anneal)
        loss.backward()

        optimizer.step()

        total_loss += loss.item()  # accumulate a Python float rather than a tensor

        if batch % log_interval == 0 or batch == batch_limit:
            # Number of batches accumulated since the last log line
            div = batch % log_interval
            if div == 0: div = log_interval

            cur_loss = total_loss / div
            elapsed = time.time() - start_time

            ss = '| epoch {:3d} | {:5d}/{:5d} batches | ms/batch {:5.2f} | loss {:5.4f}'.format(
                    epoch, batch, batch_limit, (elapsed * 1000) / div, cur_loss
            )

            file_write(hyper_params['log_file'], ss)

            total_loss = 0.0
            start_time = time.time()

train_reader, test_reader, total_users, total_items = load_data(hyper_params)
hyper_params['total_users'] = total_users
hyper_params['total_items'] = total_items
# hyper_params['testing_batch_limit'] = test_reader.num_b
anneal = 0.1

file_write(hyper_params['log_file'], "\n\nSimulation run on: " + str(dt.datetime.now()) + "\n\n")
file_write(hyper_params['log_file'], "Data reading complete!")
file_write(hyper_params['log_file'], "Number of train batches: {:4d}".format(train_reader.num_b))
# file_write(hyper_params['log_file'], "Number of test batches: {:4d}".format(test_reader.num_b))
file_write(hyper_params['log_file'], "Total Users: " + str(total_users))
file_write(hyper_params['log_file'], "Total Items: " + str(total_items) + "\n")

model = Model(hyper_params)
if is_cuda_available: model.cuda()

criterion = VAELoss(hyper_params)

if hyper_params['optimizer'] == 'adagrad':
    optimizer = torch.optim.Adagrad(
        model.parameters(), weight_decay=hyper_params['weight_decay'], lr=hyper_params['learning_rate']
    )
elif hyper_params['optimizer'] == 'adadelta':
    optimizer = torch.optim.Adadelta(
        model.parameters(), weight_decay=hyper_params['weight_decay']
    )
elif hyper_params['optimizer'] == 'adam':
    optimizer = torch.optim.Adam(
        model.parameters(), weight_decay=hyper_params['weight_decay']#, lr=hyper_params['learning_rate']
    )
elif hyper_params['optimizer'] == 'rmsprop':
    optimizer = torch.optim.RMSprop(
        model.parameters(), weight_decay=hyper_params['weight_decay']#, lr=hyper_params['learning_rate']
    )

file_write(hyper_params['log_file'], str(model))
file_write(hyper_params['log_file'], "\nModel Built!\nStarting Training...\n")

best_val_loss = None

try:
    for epoch in range(1, hyper_params['epochs'] + 1):
        epoch_start_time = time.time()
        
        train(train_reader)
        
        # Calculating the metrics on the train set
#         metrics = evaluate(model, criterion, train_reader, hyper_params, True)
#         string = ""
#         for m in metrics: string += " | " + m + ' = ' + str(metrics[m])
#         string += ' (TRAIN)'
    
        # Calculating the metrics on the test set
        metrics = evaluate(model, criterion, test_reader, hyper_params, False)
        string2 = ""
        for m in metrics: string2 += " | " + m + ' = ' + str(metrics[m])
        string2 += ' (TEST)'

        ss  = '-' * 89
#         ss += '\n| end of epoch {:3d} | time: {:5.2f}s'.format(epoch, (time.time() - epoch_start_time))
#         ss += string
#         ss += '\n'
#         ss += '-' * 89
        ss += '\n| end of epoch {:3d} | time: {:5.2f}s'.format(epoch, (time.time() - epoch_start_time))
        ss += string2
        ss += '\n'
        ss += '-' * 89
        file_write(hyper_params['log_file'], ss)
        
        anneal += 0.1
        
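        # NOTE: evaluate() never fills metrics['loss'] (it stays 0.0), so this check
        # passes every epoch and the saved checkpoint is simply the most recent model
        if best_val_loss is None or metrics['loss'] <= best_val_loss:
            with open(hyper_params['model_file_name'], 'wb') as f: torch.save(model, f)
            best_val_loss = metrics['loss']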

except KeyboardInterrupt: print('Exiting from training early')

with open(hyper_params['model_file_name'], 'rb') as f: model = torch.load(f)
metrics = evaluate(model, criterion, test_reader, hyper_params, False)

string = ""
for m in metrics: string += " | " + m + ' = ' + str(metrics[m])

ss  = '=' * 89
ss += '\n| End of training'
ss += string
ss += '\n'
ss += '=' * 89
file_write(hyper_params['log_file'], ss)

Started reading data file
Data Files loaded!


Simulation run on: 2018-08-16 04:54:23.770395


Data reading complete!
Number of train batches: 121628
Total Users: 75454
Total Items: 17647

Model(
  (encoder): Encoder(
    (linear1): Linear(in_features=256, out_features=100, bias=True)
    (activation): ReLU()
  )
  (decoder_item): Decoder(
    (linear1): Linear(in_features=64, out_features=100, bias=True)
    (linear2): Linear(in_features=100, out_features=17647, bias=True)
    (activation): ReLU()
  )
  (_enc_mu): Linear(in_features=100, out_features=64, bias=True)
  (_enc_log_sigma): Linear(in_features=100, out_features=64, bias=True)
  (user_embed): Embedding(75454, 128)
  (item_embed): Embedding(17647, 128)
  (activation): ReLU()
  (activation_last): Tanh()
  (layer_hinge1): Linear(in_features=64, out_features=1, bias=True)
  (dropout): Dropout(p=0.1)
)

Model Built!
Starting Training...

| epoch   1 |  2000/121628 batches | ms/batch 13.96 | loss 19.4710
| epoch   1 |  4000/121628 

  0%|          | 2/8000 [00:00<08:35, 15.52it/s]

| epoch   1 | 121628/121628 batches | ms/batch 14.04 | loss 18.3734


100%|██████████| 8000/8000 [08:08<00:00, 16.37it/s]
  "type " + obj.__name__ + ". It won't be checked "
  "type " + obj.__name__ + ". It won't be checked "
  "type " + obj.__name__ + ". It won't be checked "


-----------------------------------------------------------------------------------------
| end of epoch   1 | time: 2192.89s | CP = 0.0 | ZEROS = 0.0 | loss = 0.0 | NDCG@10 = 4.1407 | HR@10 = 1.3056 | Prec@10 = 3.86 | NDCG@100 = 6.0774 | HR@100 = 9.2612 | Prec@100 = 2.8833 (TEST)
-----------------------------------------------------------------------------------------
| epoch   2 |  2000/121628 batches | ms/batch 14.03 | loss 18.2641
| epoch   2 |  4000/121628 batches | ms/batch 14.02 | loss 18.3125
| epoch   2 |  6000/121628 batches | ms/batch 14.02 | loss 18.4493
| epoch   2 |  8000/121628 batches | ms/batch 14.03 | loss 18.3800
| epoch   2 | 10000/121628 batches | ms/batch 14.04 | loss 18.4609
| epoch   2 | 12000/121628 batches | ms/batch 14.05 | loss 18.4579
| epoch   2 | 14000/121628 batches | ms/batch 14.05 | loss 18.5111
| epoch   2 | 16000/121628 batches | ms/batch 14.05 | loss 18.5030
| epoch   2 | 18000/121628 batches | ms/batch 14.05 | loss 18.4793
| epoch   2 | 20000/12162

  0%|          | 2/8000 [00:00<08:31, 15.65it/s]

| epoch   2 | 121628/121628 batches | ms/batch 14.05 | loss 18.1791


100%|██████████| 8000/8000 [08:08<00:00, 16.39it/s]


-----------------------------------------------------------------------------------------
| end of epoch   2 | time: 2197.99s | CP = 0.0 | ZEROS = 0.0 | loss = 0.0 | NDCG@10 = 3.7882 | HR@10 = 1.2456 | Prec@10 = 3.55 | NDCG@100 = 6.0211 | HR@100 = 9.5141 | Prec@100 = 2.9195 (TEST)
-----------------------------------------------------------------------------------------
| epoch   3 |  2000/121628 batches | ms/batch 14.18 | loss 18.1229
| epoch   3 |  4000/121628 batches | ms/batch 14.47 | loss 18.1593
| epoch   3 |  6000/121628 batches | ms/batch 15.09 | loss 18.2952
| epoch   3 |  8000/121628 batches | ms/batch 14.12 | loss 18.2137
| epoch   3 | 10000/121628 batches | ms/batch 14.05 | loss 18.3181
| epoch   3 | 12000/121628 batches | ms/batch 14.06 | loss 18.3014
| epoch   3 | 14000/121628 batches | ms/batch 14.07 | loss 18.3350
| epoch   3 | 16000/121628 batches | ms/batch 14.07 | loss 18.3301
| epoch   3 | 18000/121628 batches | ms/batch 13.98 | loss 18.3054
| epoch   3 | 20000/12162

RuntimeError: value cannot be converted to type double without overflow: inf