# Notebook #1: Designing and evaluating a recommendation algorithm

In this notebook, we get familiar with the Python recommendation toolbox in the simplest possible way. First, we set up the working environment in GDrive. Then, we go through the experimental pipeline by:
- loading the MovieLens 1M dataset;
- performing a train-test split;
- creating a pointwise / pairwise / random / mostpop recommendation object;
- training the model (if applicable);
- computing the user-item relevance matrix;
- calculating some of the recommendation metrics (e.g., NDCG, Item Coverage, Diversity, Novelty).

The trained models and the intermediate results we save (e.g., the user-item relevance matrix and the metrics) will be the starting point for the investigation and the treatment covered in the other Jupyter notebooks.

## Set up the working environment

- Python 3.6
- Package Requirements: matplotlib, numpy, pandas, scikit-learn, scipy, tensorflow-gpu==2.0
- Storage requirements: around 1GB

This step mounts GDrive storage in this Jupyter notebook. The command asks us to grant access permissions to the notebook, so that we can clone the project repository into our Drive. Please follow the prompted instructions.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

We will clone the project repository in our My Drive folder. If you wish to change the target folder, please modify the command below.

In [None]:
%cd /content/gdrive/My Drive/

In [None]:
! git clone https://github.com/mirkomarras/bias-recsys-tutorial.git

We will move to the project folder in order to install the required packages. 

In [None]:
%cd bias-recsys-tutorial

In [None]:
! ls

In [None]:
! pip install -r requirements.txt

We will configure the notebooks directory as our working directory in order to simulate a local notebook execution. 

In [None]:
%cd notebooks

## Import packages

In [1]:
import sys 
import os

sys.path.append(os.path.join('..'))

In [None]:
import pandas as pd
import numpy as np

In [None]:
from helpers.train_test_splitter import *
from helpers.utils import *

We will define the folders where we will store our pre-computed results. 

In [None]:
data_path = '../data/'

In [None]:
# -p creates missing parent folders and does not fail if the folder already exists
!mkdir -p '../data/outputs/splits'
!mkdir -p '../data/outputs/instances'
!mkdir -p '../data/outputs/models'
!mkdir -p '../data/outputs/predictions'
!mkdir -p '../data/outputs/metrics'

## Load data

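First, we specify the dataset name and the names of its columns, then we load the interactions from the corresponding csv file.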

In [None]:
dataset = 'ml1m'  
user_field = 'user_id'
item_field = 'item_id'
rating_field = 'rating'
time_field = 'timestamp'

In [None]:
data = pd.read_csv(os.path.join(data_path, 'datasets/' + dataset + '.csv'), encoding='utf8')

In [None]:
data.head()

## Split data in train and test sets

Define the train-test split parameters
- **dataset**: csv file present in the data/datasets folder
- **method**: 'uftime' for fixed timestamp split, 'utime' for time-based split per user, 'urandom' for random split per user 
- **percentage**: percentage of data to be included in the train set
- **min_train**: minimum number of train samples for a user to be included  
- **min_test**: minimum number of test samples for a user to be included
- **min_time**: start time of interactions to be included
- **max_time**: end time of interactions to be included
- **step_time**: timestamp step while computing the fixed timestamp splitter (only for method "uftime") 
- **user_field**: name of the user column in the dataset csv file
- **item_field**: name of the item column in the dataset csv file
- **rating_field**: name of the rating column in the dataset csv file
- **time_field**: name of the timestamp column in the dataset csv file

In [5]:
method = 'utime'
percentage = 0.80        
min_train = 8
min_test = 2
min_time = None
max_time = None
step_time = 1000

Load dataset interactions

| Unnamed: 0 | user_id | item_id | rating | timestamp | type | type_id |
|---|---|---|---|---|---|---|
| 0 | 1 | 1193 | 5.0 | 2000-12-31 23:12:40 | Drama | 7 |
| 1 | 2 | 1193 | 5.0 | 2000-12-31 22:33:33 | Drama | 7 |
| 2 | 12 | 1193 | 4.0 | 2000-12-31 00:49:39 | Drama | 7 |
| 3 | 15 | 1193 | 4.0 | 2000-12-30 19:01:19 | Drama | 7 |
| 4 | 17 | 1193 | 5.0 | 2000-12-30 07:41:11 | Drama | 7 |


Perform the train and test split (splitting methods are defined in ./helpers/train_test_splitter.py)

In [8]:
if method == 'uftime':
    traintest = fixed_timestamp(data, min_train, min_test, min_time, max_time, step_time, user_field, item_field, time_field, rating_field)
elif method == 'utime':
    traintest = user_timestamp(data, percentage, min_train+min_test, user_field, item_field, time_field)
elif method == 'urandom':
    traintest = user_random(data, percentage, min_train+min_test, user_field, item_field)

> Parsing user 6000 of 6040
> Mean number of train ratings per learner: 133.07913907284768
> Mean number of test ratings per learner: 32.51837748344371
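To make the 'utime' method more concrete, here is a minimal sketch of a time-based split per user. It is not the code in the helpers module: the function name `split_per_user_by_time` and the toy data are hypothetical, and we only assume that each user's oldest interactions go to the train set and that the result carries a `set` column with the values 'train' and 'test', as in the split file loaded later in this notebook.

```python
import pandas as pd

def split_per_user_by_time(df, percentage, min_samples,
                           user_field="user_id", time_field="timestamp"):
    """Hypothetical helper: the oldest `percentage` of each user's
    interactions is labeled 'train', the remaining ones 'test'."""
    parts = []
    for _, group in df.groupby(user_field):
        if len(group) < min_samples:
            continue  # skip users with too few interactions
        group = group.sort_values(time_field).copy()
        n_train = int(len(group) * percentage)
        group["set"] = ["train"] * n_train + ["test"] * (len(group) - n_train)
        parts.append(group)
    return pd.concat(parts, ignore_index=True)

# Toy data: user 1 has 10 interactions, user 2 only 3
toy = pd.DataFrame({
    "user_id": [1] * 10 + [2] * 3,
    "item_id": list(range(10)) + [0, 1, 2],
    "timestamp": pd.to_datetime("2000-12-01")
                 + pd.to_timedelta(list(range(13)), unit="D"),
})
toy_split = split_per_user_by_time(toy, percentage=0.8, min_samples=10)
print(toy_split["set"].value_counts())
```

With these toy values, user 2 is dropped because it has fewer than 10 interactions, and user 1 contributes 8 train and 2 test interactions.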


Save train and test sets in ./data/outputs/splits

In [10]:
traintest.to_csv(os.path.join(data_path, 'outputs/splits/' + dataset + '_' + method + '.csv'))

## Run the model train and test

Define the experiment parameters
- **dataset**: csv file present in the data/datasets folder
- **method**: 'uftime' for fixed timestamp split, 'utime' for time-based split per user, 'urandom' for random split per user 
- **mode**: type of feedback to be used (i.e., 'implicit' or 'explicit')
- **user_field**: name of the user column in the dataset csv file
- **item_field**: name of the item column in the dataset csv file
- **rating_field**: name of the rating column in the dataset csv file
- **type_field**: name of the category id column in the dataset csv file
- **model_type**: identifier of the recommendation model to test
- **cutoffs**: array of cutoffs (lengths of the top-k recommended lists) at which the metrics are computed

In [6]:
dataset = 'ml1m'
method = 'utime'
mode = 'implicit'
user_field = 'user_id'
item_field = 'item_id'
rating_field = 'rating'
type_field = 'type_id'
cutoffs = np.array([5,10,20,50,100,200])

Load pre-computed train and test sets

In [7]:
traintest = pd.read_csv(os.path.join(data_path, 'outputs/splits/' + dataset + '_' + method + '.csv'), encoding='utf8')
train = traintest[traintest['set']=='train'].copy()
test = traintest[traintest['set']=='test'].copy()
print('> Loaded', len(train.index), 'train interactions')
print('> Loaded', len(test.index), 'test interactions')

> Loaded 803798 train interactions
> Loaded 196411 test interactions


Show some statistics on users and items

In [8]:
users = list(np.unique(traintest[user_field].values))
items = list(np.unique(traintest[item_field].values))
users.sort()
items.sort()
print('> Loaded', len(users), 'users -', np.min(users), '-', np.max(users), '-', len(np.unique(users)), '-', users[:10])
print('> Loaded', len(items), 'items -', np.min(items), '-', np.max(items), '-', len(np.unique(items)), '-', items[:10])

> Loaded 6040 users - 0 - 6039 - 6040 - [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
> Loaded 3706 items - 0 - 3705 - 3706 - [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


Load category data for items

In [9]:
items_metadata = traintest.drop_duplicates(subset=['item_id'], keep='first')
print('> Retrieved', len(items_metadata.index), 'mapping indexes, one per course')
category_per_item = items_metadata[type_field].values
print('> Loading item categories identifiers -', len(set(category_per_item)), 'categories like', category_per_item[:3])

> Retrieved 3706 mapping indexes, one per course
> Loading item categories identifiers - 18 categories like [ 7 13  3]


Choose the type of feedback you want to work with

In [10]:
if mode == 'implicit':
    # With implicit feedback, every observed interaction counts as relevant (1.0)
    train[rating_field] = 1.0
    test[rating_field] = 1.0
    traintest[rating_field] = 1.0

Load the model architecture to train and test

In [11]:
from models.pairwise import PairWise
model_type = 'pairwise'
model = PairWise(users, items, train, test, category_per_item, item_field, user_field, rating_field)

> Initializing user, item, and categories lists
> Initializing observed, unobserved, and predicted relevance scores
> Initializing item popularity lists
> Initializing category per item
> Initializing category preference per user
> Initializing metrics
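A pairwise model is usually trained on triplets (user, observed item, unobserved item), so that observed items are pushed above unobserved ones. The sketch below shows one common way to build such triplets by uniform negative sampling. It is an illustration only: `sample_triplets` and its arguments are hypothetical names, and the `PairWise` class in the repository may generate its instances differently.

```python
import numpy as np

def sample_triplets(observed, n_items, n_negatives, seed=0):
    """Hypothetical helper. `observed` maps each user id to the set of
    item ids the user interacted with in the train set."""
    rng = np.random.default_rng(seed)
    triplets = []
    for user, positives in observed.items():
        for pos in positives:
            drawn = 0
            while drawn < n_negatives:
                neg = int(rng.integers(n_items))  # uniform draw in [0, n_items)
                if neg not in positives:          # keep only unobserved items
                    triplets.append((user, pos, neg))
                    drawn += 1
    return np.array(triplets)

# Toy example: 2 users, 5 items, 3 negatives per observed interaction
observed = {0: {0, 1}, 1: {2}}
triplets = sample_triplets(observed, n_items=5, n_negatives=3)
print(triplets.shape)  # (9, 3): 3 observed interactions x 3 negatives
```

The sketch assumes every user has at least one unobserved item, otherwise the sampling loop would not terminate.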


Train the model

In [12]:
model.train(os.path.join(data_path, 'outputs/models/' + dataset + '_' + method + '_' + model_type + '_model.h5'))

Generating training instances of type pair
> Making instances for interaction 800000 / 803798 of type pair
> Making training - Epochs 20 Batch Size 1024 Learning Rate 0.001 Factors 10 Negatives 10 Mode pair
Train on 7877220 samples
> auc score: 0.858201970092185 - validation sample 3016000 of 3016544

> auc score: 0.9137085829878704 - validation sample 3016000 of 3016544

Train on 7877220 samples
Epoch 9/9
> auc score: 0.9221561929190342 - validation sample 3016000 of 3016544

> auc score: 0.92481624675398 - validation sample 2639000 of 3016544

> auc score: 0.9247288047981417 - validation sample 3016000 of 3016544

> auc score: 0.9271636733173271 - validation sample 1479000 of 3016544

> auc score: 0.9254055287116947 - validation sample 3016000 of 3016544
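The training log reports an AUC score on validation samples. For pairwise data, a common definition of AUC is the fraction of (observed, unobserved) pairs in which the observed item receives the higher score. The sketch below implements that definition, counting ties as half a win. We do not know whether the repository computes its AUC in exactly this way, so `pairwise_auc` is a hypothetical helper meant to clarify what the number measures.

```python
import numpy as np

def pairwise_auc(pos_scores, neg_scores):
    """Share of pairs where the observed item outranks the unobserved one.
    pos_scores[i] and neg_scores[i] belong to the same validation pair."""
    pos_scores = np.asarray(pos_scores, dtype=float)
    neg_scores = np.asarray(neg_scores, dtype=float)
    wins = (pos_scores > neg_scores).sum()
    ties = (pos_scores == neg_scores).sum()
    return (wins + 0.5 * ties) / len(pos_scores)

# Toy example: 2 wins, 1 loss, 1 tie out of 4 pairs
auc = pairwise_auc([0.9, 0.4, 0.7, 0.2], [0.1, 0.6, 0.3, 0.2])
print(auc)  # 0.625
```

Under this definition, a value of 0.5 corresponds to random ordering and 1.0 to every observed item being ranked above its paired unobserved item.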

Model architecture

In [None]:
model.print()

## Compute user-item relevance scores

In [88]:
model.predict()

> Making predictions for user 6039 / 6040

In [None]:
scores = model.get_predictions()

In [None]:
scores.shape

In [None]:
user_id, item_id = 120, 320
scores[user_id, item_id]
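Once we have the user-item relevance matrix, a recommended list for each user can be obtained by sorting the items by score. The sketch below assumes a matrix of shape (users, items), like the one indexed above, and assumes that items already seen in the train set should be excluded, which is a common convention but not something this notebook states. The names `top_k_items`, `toy_scores`, and `toy_seen` are hypothetical.

```python
import numpy as np

def top_k_items(scores, train_mask, k):
    """scores: (n_users, n_items) relevance matrix.
    train_mask: boolean matrix of the same shape, True where the user
    already interacted with the item in the train set."""
    masked = np.where(train_mask, -np.inf, scores)  # never recommend seen items
    # Sorting the negated scores yields a descending order per user
    return np.argsort(-masked, axis=1)[:, :k]

toy_scores = np.array([[0.1, 0.9, 0.5, 0.3],
                       [0.8, 0.2, 0.6, 0.4]])
toy_seen = np.array([[False, True, False, False],
                     [False, False, False, False]])
top2 = top_k_items(toy_scores, toy_seen, k=2)
print(top2)
# [[2 3]
#  [0 2]]
```

For the first toy user, item 1 has the highest score but is masked because it was seen in training, so items 2 and 3 are returned instead.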

In [89]:
save_obj(scores, os.path.join(data_path, 'outputs/predictions/' + dataset + '_' + method + '_' + model_type + '_scores.h5'))

## Calculate metrics

In [90]:
model.test(cutoffs)

> Making metrics for user 6000 / 6040

In [None]:
metrics = model.get_metrics()

In [None]:
metrics.keys()

In [None]:
for name, values in metrics.items():
    print(name, values.shape)

In [91]:
save_obj(metrics, os.path.join(data_path, 'outputs/metrics/' + dataset + '_' + method + '_' + model_type + '_metrics.h5'))

In [93]:
model.show_metrics(np.where(cutoffs == 10))

Precision: 0.1106 
 Recall: 0.0217 
 NDCG: 0.1149 
 Hit Rate: 0.326 
 Avg Popularity: 2512.4757 
 Category Diversity: 0.1917 
 Novelty: 1.2783 
 Item Coverage: 0.02 
 User Coverage: 0.326
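To clarify what the accuracy metrics above measure, here is a minimal sketch of Precision, Recall, NDCG, and Hit Rate at cutoff k for a single user with binary relevance. The function `ranking_metrics` is hypothetical and uses the standard textbook definitions; the toolbox may differ in details such as normalization or averaging across users.

```python
import numpy as np

def ranking_metrics(ranked_items, relevant_items, k):
    """ranked_items: recommended item ids, best first.
    relevant_items: non-empty set of item ids in the user's test set."""
    top_k = list(ranked_items[:k])
    hits = np.array([1.0 if item in relevant_items else 0.0 for item in top_k])
    precision = hits.sum() / k
    recall = hits.sum() / len(relevant_items)
    # Position i (starting from 1) is discounted by 1 / log2(i + 1)
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    dcg = (hits * discounts[:len(hits)]).sum()
    # Ideal ranking: all relevant items placed at the top positions
    idcg = discounts[:min(len(relevant_items), k)].sum()
    return {
        "precision": precision,
        "recall": recall,
        "ndcg": dcg / idcg,
        "hit": float(hits.sum() > 0),
    }

m = ranking_metrics(ranked_items=[3, 1, 7, 5], relevant_items={1, 5, 9}, k=4)
print(m)
```

In this toy case, two of the four recommended items are relevant, so the precision is 0.5 and the recall is 2/3. Averaging each value over all users gives dataset-level figures comparable in spirit to those printed above.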
