## Online Deep Factorization Machine
Online factorization models take a single data point as input, make a prediction, and then train on that data point.
This notebook demonstrates fitting online models trained with the Adam optimizer and with hedge backpropagation (HBP) to the Criteo data.
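
As a rough illustration of the predict-then-train protocol described above, the sketch below runs an online logistic regression over a stream of samples in plain NumPy. It is not one of the notebook's models: the function `online_logistic_regression`, the learning rate, and the synthetic stream are all assumptions made for illustration.

```python
import numpy as np

def online_logistic_regression(stream, dim, lr=0.1):
    """Predict-then-train over a stream of (x, y) pairs.

    Returns the final weights and the accuracy of the predictions
    made *before* each label was used for training.
    """
    w = np.zeros(dim)
    correct = 0
    seen = 0
    for x, y in stream:
        # 1) Predict with the current weights, before using the label.
        p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
        correct += int((p >= 0.5) == bool(y))
        seen += 1
        # 2) Train on the same sample: one SGD step on the log loss.
        w -= lr * (p - y) * x
    return w, correct / max(seen, 1)

# Hypothetical synthetic stream: the label is 1 when the first feature is positive.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
Y = (X[:, 0] > 0).astype(float)
weights, online_acc = online_logistic_regression(zip(X, Y), dim=3)
print(f"online accuracy: {online_acc:.3f}")
```

Because each prediction is made before the model sees the label, the running accuracy measures how well the model performs on data it has not yet trained on.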

### 1. Setup
The `from models...` imports load the model classes used in this notebook. We also import a few other packages for data preprocessing and plotting.

In [3]:
import sys
sys.path.append('./../')


from utils import data_preprocess, plot
import os
import pickle
import numpy as np
import torch

from time import time

from models.models_online_deep.deepfm_adam import DeepFMAdam
from models.models_online_deep.deepfm_onn import DeepFMOnn
from models.models_online_deep.nfm_adam import NFMAdam
from models.models_online_deep.nfm_onn import NFMOnn
from models.models_online_deep.fm_adam import FMAdam

from models.models_online_deep.afm_adam import AFMAdam

In [4]:
save_log = os.getcwd() + '/performance/save_log/'
save_model = os.getcwd() + '/performance/save_model/'

### 2. Create a Criteo dataset
A dataset for a factorization machine requires indices, values, and labels. 
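
One common layout, and the one assumed in this sketch, stores one index and one value per field for every sample: a numeric field keeps its number as the value, and a categorical field maps its category to an integer index with value 1. The helper `encode_rows` and the toy rows are hypothetical; the structure actually returned by `data_preprocess.read_criteo_data` may differ.

```python
def encode_rows(rows, numeric_fields):
    """Turn raw rows (label first, then one entry per field) into indices, values, labels.

    Numeric fields use index 0 and keep the number as the value.
    Categorical fields map each category to an integer index and use value 1.0.
    """
    vocab = {}  # field position -> {category: index}
    indices, values, labels = [], [], []
    for label, *fields in rows:
        xi, xv = [], []
        for pos, raw in enumerate(fields):
            if pos in numeric_fields:
                xi.append(0)
                xv.append(float(raw))
            else:
                field_vocab = vocab.setdefault(pos, {})
                xi.append(field_vocab.setdefault(raw, len(field_vocab)))
                xv.append(1.0)
        indices.append(xi)
        values.append(xv)
        labels.append(int(label))
    # Number of distinct features in each field.
    feature_sizes = [1 if pos in numeric_fields else len(vocab[pos])
                     for pos in range(len(rows[0]) - 1)]
    return indices, values, labels, feature_sizes

# Toy data: label, one numeric field, two categorical fields.
rows = [
    (1, 0.5, "a", "x"),
    (0, 2.0, "b", "x"),
    (1, 1.5, "a", "y"),
]
Xi, Xv, Y, sizes = encode_rows(rows, numeric_fields={0})
print(Xi)     # per-sample feature index within each field
print(Xv)     # per-sample feature value for each field
print(Y)      # labels
print(sizes)  # number of distinct features per field
```

Under this assumption, the per-field counts play the same role as the `feature_sizes` list defined later in the notebook, which gives the size of each field's embedding table.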

In [9]:
dataset_path = './../dataset/criteo/'
dataset_input_path = dataset_path + 'tiny_train_input.csv'
dataset_emb_path = dataset_path + 'tiny_train_input.csv'


train_dict = data_preprocess.read_criteo_data(dataset_input_path, dataset_emb_path)
train_dict_size = train_dict['size']

In [10]:
# Reuses dataset_input_path and dataset_emb_path defined in the previous cell.

num_batchdata = 2500
num_batch = 10
data_config = 7
# data_config = 3

if data_config == "Iteration":
    batch_train_Xi_list, batch_train_Xv_list, batch_train_Y_list, ratio_list \
        = data_preprocess.create_ten_iter(dataset_input_path, dataset_emb_path, num_batch, num_batchdata)

elif isinstance(data_config, int):
    batch_train_Xi_list, batch_train_Xv_list, batch_train_Y_list, ratio_list \
        = data_preprocess.create_dataset(dataset_input_path, dataset_emb_path, data_config, num_batch, num_batchdata)

else:
    batch_train_Xi_list, batch_train_Xv_list, batch_train_Y_list, ratio_list \
        = data_preprocess.create_dataset(dataset_input_path, dataset_emb_path, num_batch // 2, num_batch, num_batchdata)

In [11]:
num_hidden_layers = 5
neuron_per_hidden_layer = 10
data_feature_dim = 39
embedding_size = 10
n = 0.0001

feature_sizes = [63, 113, 126, 51, 224, 148, 100, 79, 104, 9, 32, 57, 82, 1457, 555, 176373, 129683, 305, 19, 11887,
                 632, 3, 41738, 5170, 175446, 3170, 27, 11356, 165602, 10, 4641, 2030, 4, 172761, 18, 15, 57903, 86,
                 44549]
# num_feature = sum(feature_sizes)

### 3. Create deep factorization machines and online deep factorization machines
This notebook compares five models: NFM and DeepFM, each trained with either the Adam optimizer or hedge backpropagation (HBP), and FM trained with the Adam optimizer. The implementation of each model follows [PyTorch Implementations of Factorization Machines](https://github.com/nzc/dnn_ctr).
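
To give a feel for the hedge part of HBP, here is a minimal sketch assuming the usual Hedge scheme: every hidden layer has its own output classifier, the final prediction is a weighted sum of those outputs, and each weight is discounted according to its classifier's loss on the latest sample. The function names, `beta`, and `smoothing` are assumptions for illustration and are not taken from the `NFMOnn` or `DeepFMOnn` code.

```python
import numpy as np

def hedge_update(alpha, layer_losses, beta=0.99, smoothing=0.2):
    """One Hedge step over per-layer classifier weights.

    alpha        : current weights, one per hidden-layer classifier (sums to 1)
    layer_losses : loss of each layer's classifier on the sample just seen
    beta         : discount in (0, 1); a larger loss shrinks a weight more
    smoothing    : floors every weight at smoothing / num_layers before renormalizing
    """
    alpha = np.asarray(alpha, dtype=float)
    losses = np.asarray(layer_losses, dtype=float)
    alpha = alpha * np.power(beta, losses)             # penalize each layer by its loss
    alpha = np.maximum(alpha, smoothing / len(alpha))  # floor so no layer is starved
    return alpha / alpha.sum()                         # renormalize to sum to 1

def hedge_predict(alpha, layer_outputs):
    """Final prediction: weighted sum of the per-layer classifier outputs."""
    return float(np.dot(alpha, layer_outputs))

alpha = np.full(3, 1 / 3)
alpha = hedge_update(alpha, layer_losses=[0.2, 1.5, 0.7])
print(alpha)
print(hedge_predict(alpha, layer_outputs=[0.9, 0.4, 0.6]))
```

After the update, the layer with the smallest loss carries the largest weight, so the combined prediction leans toward whichever depth is currently performing best.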

In [13]:
model_list = [
    NFMAdam(feature_sizes,
            embedding_size=embedding_size,
            num_hidden_layers=num_hidden_layers,
            neuron_per_hidden_layer=neuron_per_hidden_layer,
            n=n),
    NFMOnn(feature_sizes,
           embedding_size=embedding_size,
           num_hidden_layers=num_hidden_layers,
           neuron_per_hidden_layer=neuron_per_hidden_layer,
           n=n),
    DeepFMAdam(feature_sizes,
               embedding_size=embedding_size,
               num_hidden_layers=num_hidden_layers,
               neuron_per_hidden_layer=neuron_per_hidden_layer,
               n=n),
    DeepFMOnn(feature_sizes,
              embedding_size=embedding_size,
              num_hidden_layers=num_hidden_layers,
              neuron_per_hidden_layer=neuron_per_hidden_layer,
              n=n),
    FMAdam(feature_sizes,
           embedding_size=embedding_size,
           n=n)
]

In [14]:
model_name_list = [str(model).split('-')[0] for model in model_list]
print(model_name_list)

['NFMAdam', 'NFMOnn', 'DeepFMAdam', 'DeepFMOnn', 'FMAdam']


### 4. Pretrain the models
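Each model is pretrained for 1,000 iterations on a single batch of the dataset created above (the batch at index 5, the middle of the 10 batches). The loss and training accuracy are printed every 100 iterations.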

In [15]:
# Pretrain on the middle batch only.
mid = num_batch // 2
Xi_mid, Xv_mid, Y_mid = batch_train_Xi_list[mid], batch_train_Xv_list[mid], batch_train_Y_list[mid]

for ith_model, ith_model_name in zip(model_list, model_name_list):
    print(f"====={ith_model_name}=====")
    for j in range(1000):
        loss_emb = ith_model.update_embedding(Xi_mid, Xv_mid, Y_mid)
        pred_label = ith_model.predict(Xi_mid, Xv_mid)

        # Report loss and training accuracy every 100 iterations.
        if j % 100 == 0:
            print('i th iter %d , loss : %f' % (j, loss_emb.cpu().data))
            right_count = np.count_nonzero(np.asarray(pred_label) == np.asarray(Y_mid))
            total_count = len(np.asarray(Y_mid))
            print('training accuracy : %.4f\n' % (right_count / total_count))

=====NFMAdam=====
i th iter 0 , loss : 0.649882
training accuracy : 0.5836

i th iter 100 , loss : 0.635214
training accuracy : 0.5892

i th iter 200 , loss : 0.629482
training accuracy : 0.5904

i th iter 300 , loss : 0.624124
training accuracy : 0.5964

i th iter 400 , loss : 0.619241
training accuracy : 0.6048

i th iter 500 , loss : 0.615242
training accuracy : 0.6072

i th iter 600 , loss : 0.611514
training accuracy : 0.6096

i th iter 700 , loss : 0.609084
training accuracy : 0.6116

i th iter 800 , loss : 0.607181
training accuracy : 0.6168

i th iter 900 , loss : 0.604746
training accuracy : 0.6216

=====NFMOnn=====
i th iter 0 , loss : 0.656520
training accuracy : 0.6996

i th iter 100 , loss : 0.641535
training accuracy : 0.6996

i th iter 200 , loss : 0.635665
training accuracy : 0.6996

i th iter 300 , loss : 0.630929
training accuracy : 0.6996

i th iter 400 , loss : 0.628255
training accuracy : 0.6996

i th iter 500 , loss : 0.623599
training accuracy : 0.6996

... (output truncated)

### 5. Fit models to the whole dataset
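After instantiating and pretraining the models, we run the online task: every model processes the batches in order via `run_experiment`, during which its embedding and weight parameters are updated. The elapsed time, accuracy, and ROC values for each batch are collected in `result_dict`.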

In [16]:
result_dict = {}
result_dict['roc'] = {}
result_dict['data_ratio'] = {}
result_dict['time'] = {}
result_dict['accuracy'] = {}

result_dict['num_batch'] = num_batch
result_dict['num_batchdata'] = num_batchdata
result_dict['user_auc_mean'] = {}

In [17]:
for ith_exp in range(num_batch):
    print('#' * 100)

    for jth_model_name, jth_model in zip(model_name_list, model_list):
        print('%d th batch, %s model' % (ith_exp + 1, jth_model_name))
        print('neg ratio : %d,  pos ratio %d ' % (ratio_list[ith_exp][0], ratio_list[ith_exp][1]))

        time_elapsed, accuracy, roc, confusion_matrix\
            = jth_model.run_experiment(batch_train_Xi_list[ith_exp], batch_train_Xv_list[ith_exp], batch_train_Y_list[ith_exp])

        print('fpr : %.4f , tpr : %.4f ' % (roc['fpr'], roc['tpr']))
        print('confusion matrix : %s' % confusion_matrix)
        print('accuracy : %.4f \n' % accuracy)

        if ith_exp == 0:
            result_dict['roc'][jth_model_name] = [roc]
            result_dict['data_ratio'][jth_model_name] = [ratio_list[ith_exp]]
            result_dict['time'][jth_model_name] = [time_elapsed]
            result_dict['accuracy'][jth_model_name] = [accuracy]

        else:
            result_dict['roc'][jth_model_name].append(roc)
            result_dict['data_ratio'][jth_model_name].append(ratio_list[ith_exp])
            result_dict['time'][jth_model_name].append(time_elapsed)
            result_dict['accuracy'][jth_model_name].append(accuracy)

####################################################################################################
1 th batch, NFMAdam model
neg ratio : 7,  pos ratio 3 
fpr : 0.8493 , tpr : 0.8846 
confusion matrix : {'tp': 1548, 'fp': 637, 'tn': 113, 'fn': 202}
accuracy : 66.4400 

1 th batch, NFMOnn model
neg ratio : 7,  pos ratio 3 
fpr : 1.0000 , tpr : 1.0000 
confusion matrix : {'tp': 1750, 'fp': 750, 'tn': 0, 'fn': 0}
accuracy : 70.0000 

1 th batch, DeepFMAdam model
neg ratio : 7,  pos ratio 3 
fpr : 0.5533 , tpr : 0.6349 
confusion matrix : {'tp': 1111, 'fp': 415, 'tn': 335, 'fn': 639}
accuracy : 57.8400 

1 th batch, DeepFMOnn model
neg ratio : 7,  pos ratio 3 
fpr : 0.6387 , tpr : 0.6829 
confusion matrix : {'tp': 1195, 'fp': 479, 'tn': 271, 'fn': 555}
accuracy : 58.6400 

1 th batch, FMAdam model
neg ratio : 7,  pos ratio 3 
fpr : 0.5400 , tpr : 0.5749 
confusion matrix : {'tp': 1006, 'fp': 405, 'tn': 345, 'fn': 744}
accuracy : 54.0400 

##################################################

fpr : 0.6507 , tpr : 0.7183 
confusion matrix : {'tp': 1257, 'fp': 488, 'tn': 262, 'fn': 493}
accuracy : 60.7600 

9 th batch, FMAdam model
neg ratio : 7,  pos ratio 3 
fpr : 0.5733 , tpr : 0.5926 
confusion matrix : {'tp': 1037, 'fp': 430, 'tn': 320, 'fn': 713}
accuracy : 54.2800 

####################################################################################################
10 th batch, NFMAdam model
neg ratio : 7,  pos ratio 3 
fpr : 0.9853 , tpr : 0.9897 
confusion matrix : {'tp': 1732, 'fp': 739, 'tn': 11, 'fn': 18}
accuracy : 69.7200 

10 th batch, NFMOnn model
neg ratio : 7,  pos ratio 3 
fpr : 1.0000 , tpr : 1.0000 
confusion matrix : {'tp': 1750, 'fp': 750, 'tn': 0, 'fn': 0}
accuracy : 70.0000 

10 th batch, DeepFMAdam model
neg ratio : 7,  pos ratio 3 
fpr : 0.7093 , tpr : 0.7697 
confusion matrix : {'tp': 1347, 'fp': 532, 'tn': 218, 'fn': 403}
accuracy : 62.6000 

10 th batch, DeepFMOnn model
neg ratio : 7,  pos ratio 3 
fpr : 0.6760 , tpr : 0.7103 
confusion matrix : 

### 6. Save models
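When training is done, the models are saved to a designated directory as pickle files. Their ROC values, accuracy scores, and elapsed times are saved as well. The `save_log` and `save_model` directories must already exist; otherwise `open` raises a `FileNotFoundError`, as in the output recorded below.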

In [18]:
save_filename = 'Time_Stamp' + str(int(time()))\
                + '-Dataset' + str('criteo') \
                + '-Num_BatchLength' + str(num_batchdata) \
                + '-Num_Batch' + str(num_batch) \
                + '-Num_Hidden_Layers' + str(num_hidden_layers) \
                + '-Neuron_Per_Hidden_Layer' + str(neuron_per_hidden_layer) \
                + '_' + str(data_config)

with open(save_log + save_filename + '.pickle', 'wb') as f:
    pickle.dump(result_dict, f)

for ith_model, ith_model_name in zip(model_list, model_name_list):
    with open(save_model + str(ith_model_name) + '.pickle', 'wb') as f:
        pickle.dump(ith_model, f)

print('save_log : %s' % (save_log + save_filename + '.pickle'))
print('save_model : %s' % (save_model))

FileNotFoundError: [Errno 2] No such file or directory: '/home/yohan/Myenv/Project/kaist_cccproject_yeji/jupyters/performance/save_log/Time_Stamp1578479969-Datasetcriteo-Num_BatchLength2500-Num_Batch10-Num_Hidden_Layers5-Neuron_Per_Hidden_Layer10_7.pickle'
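
The `FileNotFoundError` above indicates that the target directory did not exist when `open` was called. One possible guard is to create the directory before writing. In this sketch the helper `dump_pickle` is hypothetical and a temporary directory stands in for the notebook's `save_log` path.

```python
import os
import pickle
import tempfile

def dump_pickle(obj, directory, filename):
    """Create `directory` if needed, then pickle `obj` into it and return the path."""
    os.makedirs(directory, exist_ok=True)  # no error if the directory already exists
    path = os.path.join(directory, filename)
    with open(path, 'wb') as f:
        pickle.dump(obj, f)
    return path

# Stand-in for save_log; the nested folders do not exist yet.
base = tempfile.mkdtemp()
log_dir = os.path.join(base, 'performance', 'save_log')
saved_path = dump_pickle({'accuracy': {}}, log_dir, 'example.pickle')
print(os.path.exists(saved_path))
```

The same call could be applied to both `save_log` and `save_model` before the save cell runs.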

### 7. Plot graphs
We can draw ROC curves and accuracy graphs from the models' saved results. For the ROC curve, the x-axis shows the false positive rate (FPR, or 1 - specificity) and the y-axis shows the true positive rate (TPR, or sensitivity). In the accuracy graph, the x-axis shows the iteration sequence and the y-axis shows the number of correct predictions the model made.
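
To make the two rates concrete, the sketch below derives FPR, TPR, and accuracy from a confusion-matrix dictionary shaped like the ones printed in section 5. The helper `rates_from_confusion` is hypothetical and is not part of the `plot` module.

```python
def rates_from_confusion(cm):
    """Return (fpr, tpr, accuracy in percent) from a dict with tp, fp, tn, fn counts."""
    tp, fp, tn, fn = cm['tp'], cm['fp'], cm['tn'], cm['fn']
    fpr = fp / (fp + tn) if (fp + tn) else 0.0   # false positives among actual negatives
    tpr = tp / (tp + fn) if (tp + fn) else 0.0   # true positives among actual positives
    accuracy = 100.0 * (tp + tn) / (tp + fp + tn + fn)
    return fpr, tpr, accuracy

# Confusion matrix printed for NFMAdam on the first batch.
cm = {'tp': 1548, 'fp': 637, 'tn': 113, 'fn': 202}
fpr, tpr, accuracy = rates_from_confusion(cm)
print('fpr : %.4f , tpr : %.4f , accuracy : %.4f' % (fpr, tpr, accuracy))
```

For this batch the computed values agree with the ones printed in the notebook output (fpr 0.8493, tpr 0.8846, accuracy 66.4400).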

In [None]:
plot.draw_roc_graph("performance/save_log/", "Time_Stamp1578375578-Datasetcriteo-Num_BatchLength2500-Num_Batch10-Num_Hidden_Layers5-Neuron_Per_Hidden_Layer10_7" + '.pickle')

In [None]:
plot.draw_acc_graph("performance/save_log/", "Time_Stamp1578375578-Datasetcriteo-Num_BatchLength2500-Num_Batch10-Num_Hidden_Layers5-Neuron_Per_Hidden_Layer10_7" + '.pickle')