## Evaluate and compare the different models
Using the 10% of training records held out for validation and the training histories I saved:

In [1]:
import pickle
import pandas as pd
import numpy as np

import os

#Keras
from keras.models import load_model
from keras import backend as K

# Tensorflow
import tensorflow as tf

from sklearn.metrics import mean_squared_error

Using TensorFlow backend.





### Set and Check GPUs

In [2]:
def set_check_gpu():
    # CUDA reads these environment variables when it is first initialized,
    # so set them before the TensorFlow session below touches the GPU.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # number GPUs by PCI bus ID, matching nvidia-smi
    # set for 8 GPUs
    # os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"
    # set for 1 GPU
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"

    cfg = K.tf.ConfigProto()
    cfg.gpu_options.per_process_gpu_memory_fraction = 1  # allow all of the GPU memory to be allocated
    # for 8 GPUs
    # cfg.gpu_options.visible_device_list = "0,1,2,3,4,5,6,7"
    # for 1 GPU
    cfg.gpu_options.visible_device_list = "0"
    # cfg.gpu_options.allow_growth = True  # don't pre-allocate memory; allocate GPU memory as needed
    # cfg.log_device_placement = True  # log which device each operation runs on
    sess = K.tf.Session(config=cfg)
    K.set_session(sess)  # set this TensorFlow session as the default session for Keras

    print("* TF version: ", [tf.__version__, tf.test.is_gpu_available()])
    print("* List of GPU(s): ", tf.config.experimental.list_physical_devices())
    print("* Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))

    # TF debugging option: log the device each operation is placed on
    tf.debugging.set_log_device_placement(True)

    gpus = tf.config.experimental.list_physical_devices('GPU')

    if gpus:
        try:
            # Currently, memory growth needs to be the same across GPUs
            for gpu in gpus:
                tf.config.experimental.set_memory_growth(gpu, True)
            logical_gpus = tf.config.experimental.list_logical_devices('GPU')
            print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
        except RuntimeError as e:
            # Memory growth must be set before GPUs have been initialized
            print(e)

    print(gpus)
    print("Num GPUs Available: ", len(gpus))

In [3]:
set_check_gpu()

* TF version:  ['1.15.2', True]
* List of GPU(s):  [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:XLA_CPU:0', device_type='XLA_CPU'), PhysicalDevice(name='/physical_device:XLA_GPU:0', device_type='XLA_GPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
* Num GPUs Available:  1
1 Physical GPUs, 1 Logical GPUs
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Num GPUs Available:  1


## Names of the trained models

In [4]:
from os import listdir
from os.path import isfile, join

mypath = '../models'

onlyfiles = [f.replace('.h5', '') for f in listdir(mypath) if isfile(join(mypath, f))]
onlyfiles

['dense_1_Multiply_50_embeddings_4_epochs_dropout',
 'dense_5_Multiply_50_embeddings_10_epochs_dropout',
 'matrix_facto_10_embeddings_100_epochs',
 'dense_5_Meta_Multiply_50_embeddings_10_epochs_dropout',
 'dense_1_Multiply_50_embeddings_100_epochs_dropout']
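The file names encode the hyperparameters that the comparison below is about (number of dense layers, metadata, merge type, embedding size, epochs, Dropout). A minimal sketch of a parser that turns a name into a dict, so results can be grouped by hyperparameter; the naming scheme is inferred from the five names listed above, and `parse_model_name` is a hypothetical helper, not part of the project code.

```python
import re

# Naming scheme inferred from the file names above (an assumption, not a documented format):
#   dense_<layers>_[Meta_]<Merge>_<emb>_embeddings_<epochs>_epochs[_dropout]
#   matrix_facto_<emb>_embeddings_<epochs>_epochs
DENSE_RE = re.compile(
    r"^dense_(?P<layers>\d+)_(?P<meta>Meta_)?(?P<merge>[A-Za-z]+)_"
    r"(?P<emb>\d+)_embeddings_(?P<epochs>\d+)_epochs(?P<dropout>_dropout)?$"
)
MF_RE = re.compile(r"^matrix_facto_(?P<emb>\d+)_embeddings_(?P<epochs>\d+)_epochs$")


def parse_model_name(name):
    """Split a model file name into the hyperparameters it encodes."""
    m = DENSE_RE.match(name)
    if m:
        return {
            "type": "dense",
            "layers": int(m.group("layers")),
            "meta": m.group("meta") is not None,
            "merge": m.group("merge"),
            "embedding_size": int(m.group("emb")),
            "epochs": int(m.group("epochs")),
            "dropout": m.group("dropout") is not None,
        }
    m = MF_RE.match(name)
    if m:
        return {
            "type": "matrix_factorization",
            "embedding_size": int(m.group("emb")),
            "epochs": int(m.group("epochs")),
        }
    raise ValueError("Unrecognised model name: %s" % name)


print(parse_model_name('dense_5_Meta_Multiply_50_embeddings_10_epochs_dropout'))
```

Applied to the `onlyfiles` list, this gives one row of hyperparameters per model, which could be turned into a `pd.DataFrame` and joined with the error Series computed below.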

In [5]:
models_history =['dense_1_Multiply_50_embeddings_4_epochs_dropout',
 'dense_5_Multiply_50_embeddings_10_epochs_dropout',
 'matrix_facto_10_embeddings_100_epochs',
 'dense_5_Meta_Multiply_50_embeddings_10_epochs_dropout',
 'dense_1_Multiply_50_embeddings_100_epochs_dropout']

### Compare validation MSE and training MSE

In [6]:
hist_path = "../histories/"

validation_error = {}
train_error = {}

for val in models_history:
    # Each pickle holds the per-epoch losses ("loss", "val_loss") saved from the Keras training history
    with open(hist_path + val + '.pkl', 'rb') as file_pi:
        thepickle = pickle.load(file_pi)

    # Lowest MSE over all epochs; note the two minima may come from different epochs
    validation_error[val] = np.min(thepickle["val_loss"])
    train_error[val] = np.min(thepickle["loss"])

validation_error = pd.Series(validation_error)
train_error = pd.Series(train_error)
print("MSE validation error \n", validation_error.sort_values(ascending=True).head(20))
print("\nTrain error \n", train_error.sort_values(ascending=True).head(20))

MSE validation error 
 dense_1_Multiply_50_embeddings_4_epochs_dropout           1.573231
dense_1_Multiply_50_embeddings_100_epochs_dropout         1.577702
dense_5_Multiply_50_embeddings_10_epochs_dropout          1.599752
dense_5_Meta_Multiply_50_embeddings_10_epochs_dropout     1.608152
matrix_facto_10_embeddings_100_epochs                    18.228967
dtype: float64

Train error 
 matrix_facto_10_embeddings_100_epochs                    0.026353
dense_1_Multiply_50_embeddings_100_epochs_dropout        0.360574
dense_1_Multiply_50_embeddings_4_epochs_dropout          0.889537
dense_5_Meta_Multiply_50_embeddings_10_epochs_dropout    0.913438
dense_5_Multiply_50_embeddings_10_epochs_dropout         1.125387
dtype: float64
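Because `np.min` is taken over each loss curve separately, the training error reported for a model may come from a much later epoch than its best validation error (for example the last epoch of a 100-epoch run). A minimal sketch that reports both losses at the epoch with the lowest validation loss, assuming each pickle is a dict with equal-length `"loss"` and `"val_loss"` lists as used in the cell above; `summarize_history` is a hypothetical helper and the toy values are made up.

```python
def summarize_history(history):
    """Report train and validation loss at the epoch with the lowest val_loss.

    `history` is assumed to be a dict with equal-length "loss" and "val_loss"
    lists, one entry per epoch.
    """
    val_loss = history["val_loss"]
    # Index of the epoch with the smallest validation loss (first one on ties)
    best = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        "best_epoch": best + 1,                      # 1-based, as Keras prints epochs
        "val_loss": val_loss[best],
        "train_loss_same_epoch": history["loss"][best],
        "min_train_loss": min(history["loss"]),      # what np.min(thepickle["loss"]) reports
    }


toy_history = {"loss": [1.40, 1.10, 0.70, 0.36], "val_loss": [1.62, 1.57, 1.59, 1.63]}  # hypothetical values
# best_epoch is 2: the train loss there is 1.10, not the 0.36 that np.min reports
print(summarize_history(toy_history))
```

Comparing `train_loss_same_epoch` with `val_loss` gives a like-for-like view of the train/validation gap at the model's best epoch.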


### Observations from the results above:

- Performance is much better with the neural networks than with matrix factorization (validation MSE of about 1.57 to 1.61 versus 18.23).

- With the neural networks, training converges to the best model very quickly, sometimes after only 2 epochs; after that the model starts overfitting, or at least the validation error stops going down. Matrix factorization does not converge to a useful solution at all: its training MSE drops to 0.026 while its validation MSE stays at about 18.2.

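- Adding epochs leads to overfitting: the 100-epoch dense_1 model reaches a much lower training MSE than the 4-epoch one (0.36 vs 0.89) but a slightly higher validation MSE (1.578 vs 1.573).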

- Adding more than 3 layers does not help much and actually leads to overfitting.

- Changing the number of hidden units does not help.

- Simplifying the model by reducing embedding size does not help either.

- Choosing a larger embedding size made a small improvement in the results.

- Whether the user and item embeddings are multiplied or concatenated does not seem to matter much, although concatenation seems to give slightly better results.

- Training with Dropout seems to prevent some overfitting.

- Adding dense layers on top of the embeddings before the merge helps a bit.

- Adding some metadata (the review's helpful-votes count) led to some improvement in the results.

- Running on a larger dataset does not help either, because the data in both datasets is very skewed.
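Since the validation error bottoms out after a few epochs and then stalls or rises, a patience-based early-stopping rule is a natural fit. In Keras this is usually done with `keras.callbacks.EarlyStopping(monitor='val_loss', patience=...)`, optionally together with `keras.callbacks.ModelCheckpoint(..., save_best_only=True)` to keep the best weights. The sketch below replays roughly the same patience rule on a list of validation losses to show at which epoch training would stop; `early_stopping_epoch` is a hypothetical helper and the loss values are made up.

```python
def early_stopping_epoch(val_loss, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-based, under a patience rule.

    Training stops once val_loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs. If the rule never
    triggers, stop_epoch is the last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_loss, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_loss), best_epoch


# Hypothetical curve: best at epoch 2, then two epochs without improvement
print(early_stopping_epoch([1.70, 1.58, 1.60, 1.61, 1.65], patience=2))  # (4, 2)
```

With a small patience, a run configured for 100 epochs would stop shortly after the best epoch, which would save training time without costing validation accuracy.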



## Predict - Verifying the performance on the test set.
- Check whether our results are reproducible on unseen data.
- Test on new data using previously saved models.
- I got the following results on the test set:

In [15]:
from sklearn.model_selection import train_test_split

review_data = pd.read_csv('../data/amazon_reviews_us_Shoes_v1_00_help_voted_And_cut_lognTail.csv')
review_data.rename(columns={'star_rating': 'score', 'customer_id': 'user_id', 'user': 'user_name'}, inplace=True)

# Map the original product IDs to contiguous integer item IDs (used as embedding indices)
items = review_data.product_id.unique()
item_map = {i: val for i, val in enumerate(items)}
inverse_item_map = {val: i for i, val in enumerate(items)}
review_data["old_item_id"] = review_data["product_id"]  # copying for join with metadata
review_data["item_id"] = review_data["product_id"].map(inverse_item_map)
items = review_data.item_id.unique()
print("We have %d unique items in the review data" % items.shape[0])

# Map the original customer IDs to contiguous integer user IDs
users = review_data.user_id.unique()
user_map = {i: val for i, val in enumerate(users)}
inverse_user_map = {val: i for i, val in enumerate(users)}
review_data["old_user_id"] = review_data["user_id"]
review_data["user_id"] = review_data["user_id"].map(inverse_user_map)

items_reviewed = review_data.item_id.unique()
users = review_data.user_id.unique()
helpful_votes = review_data.helpful_votes.unique()

# Hold out 10% of the ratings as a test set (fixed seed for reproducibility)
ratings_train, ratings_test = train_test_split(review_data, test_size=0.1, random_state=0)
# Explicit copy, so adding prediction columns later does not raise SettingWithCopyWarning
ratings_test = ratings_test.copy()

We have 97758 unique items in the review data
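The embedding layers expect user and item inputs in the range 0 to n-1, which is why the cell above maps raw IDs through `unique()` and `enumerate`. Since `unique()` returns values in order of first appearance, the mapping is "first time seen gets the next integer". A minimal plain-Python sketch of that mapping; `build_index` is a hypothetical helper and the product IDs are made up. In pandas, `pd.factorize(review_data["product_id"])` returns the integer codes and the uniques in the same first-appearance order, and could replace the two hand-built dicts.

```python
def build_index(raw_ids):
    """Map raw IDs to contiguous integers 0..n-1 in order of first appearance.

    Returns (index_to_id, id_to_index), the same pair of dicts the cell above
    builds as item_map / inverse_item_map (and user_map / inverse_user_map).
    """
    index_to_id = {}
    id_to_index = {}
    for raw in raw_ids:
        if raw not in id_to_index:
            idx = len(id_to_index)
            id_to_index[raw] = idx
            index_to_id[idx] = raw
    return index_to_id, id_to_index


product_ids = ["B01", "B07", "B01", "B03"]  # hypothetical product IDs
index_to_id, id_to_index = build_index(product_ids)
print([id_to_index[p] for p in product_ids])  # [0, 1, 0, 2]
```

The embedding layer's `input_dim` then needs to be at least `len(id_to_index)`, that is, the largest index plus one.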


In [16]:
models =['dense_1_Multiply_50_embeddings_4_epochs_dropout',
 'dense_5_Multiply_50_embeddings_10_epochs_dropout',
 'matrix_facto_10_embeddings_100_epochs',
 'dense_1_Multiply_50_embeddings_100_epochs_dropout']

models_with_Meta =[
 'dense_5_Meta_Multiply_50_embeddings_10_epochs_dropout'
]

In [17]:
load_path = "../models/"

perfs = {}

for mod in models:
    model = load_model(load_path + mod + '.h5')
    # These models take two inputs: the integer user ID and the integer item ID
    ratings_test['preds_' + mod] = model.predict([ratings_test['user_id'],
                                                  ratings_test['item_id']])
    # Test MSE between the true star rating and the predicted rating
    perfs[mod] = mean_squared_error(ratings_test['score'], ratings_test['preds_' + mod])

perfs = pd.Series(perfs)
perfs.sort_values(ascending=True).head(20)



dense_5_Multiply_50_embeddings_10_epochs_dropout      1.596477
dense_1_Multiply_50_embeddings_4_epochs_dropout       1.615289
dense_1_Multiply_50_embeddings_100_epochs_dropout     1.616922
matrix_facto_10_embeddings_100_epochs                18.230796
dtype: float64
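`mean_squared_error` averages the squared differences between true and predicted ratings, so taking its square root (RMSE) puts the error back on the star scale. Assuming `score` is the 1 to 5 `star_rating` column, the best test MSE above corresponds to an average error of about 1.26 stars, and matrix factorization to about 4.27 stars. Since true ratings and in-range predictions both lie between 1 and 5, no squared error can exceed 16 while predictions stay in range, so an MSE of 18.23 means at least some matrix-factorization predictions fall outside the 1 to 5 range. A minimal plain-Python sketch of both quantities; `mse` and `rmse_from_mse` are hypothetical helpers.

```python
import math


def mse(y_true, y_pred):
    """Mean of the squared differences between true and predicted ratings."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse_from_mse(value):
    """Square root of an MSE, which expresses the error in stars."""
    return math.sqrt(value)


# Test MSE values copied from the output above
test_mse = {"dense_5_Multiply_50_embeddings_10_epochs_dropout": 1.596477,
            "matrix_facto_10_embeddings_100_epochs": 18.230796}
for name, value in test_mse.items():
    print(name, round(rmse_from_mse(value), 2))  # about 1.26 and 4.27 stars
```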

In [19]:
perfs = {}

for mod in models_with_Meta:
    model = load_model(load_path + mod + '.h5')
    # The metadata model takes a third input: the helpful-votes count of each review
    ratings_test['preds_' + mod] = model.predict([ratings_test["user_id"],
                                                  ratings_test["item_id"],
                                                  ratings_test["helpful_votes"]])

    perfs[mod] = mean_squared_error(ratings_test['score'], ratings_test['preds_' + mod])  # MSE between real score and predicted score

perfs = pd.Series(perfs)
perfs


dense_5_Meta_Multiply_50_embeddings_10_epochs_dropout    1.605814
dtype: float64

### The MSE on the test data is very similar to the MSE on the validation data (within about 0.04 for every model)
### The best test-set result (MSE 1.596) was achieved with 5 dense layers, 50-dimensional embeddings, Dropout and 10 epochs; on the validation data this model ranked third (1.600), behind the two 1-layer models
### I will use this model (dense_5_Multiply_50_embeddings_10_epochs_dropout) for generating recommendations
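The validation and test rankings do not agree on the best model, so it can help to put both side by side. A minimal sketch, with the minimum validation MSE and the test MSE copied from the outputs above; `rank` is a hypothetical helper.

```python
# Minimum validation MSE (history cell) and test MSE (prediction cells), copied from the outputs above
validation_mse = {
    "dense_1_Multiply_50_embeddings_4_epochs_dropout": 1.573231,
    "dense_1_Multiply_50_embeddings_100_epochs_dropout": 1.577702,
    "dense_5_Multiply_50_embeddings_10_epochs_dropout": 1.599752,
    "dense_5_Meta_Multiply_50_embeddings_10_epochs_dropout": 1.608152,
    "matrix_facto_10_embeddings_100_epochs": 18.228967,
}
test_mse = {
    "dense_5_Multiply_50_embeddings_10_epochs_dropout": 1.596477,
    "dense_1_Multiply_50_embeddings_4_epochs_dropout": 1.615289,
    "dense_1_Multiply_50_embeddings_100_epochs_dropout": 1.616922,
    "dense_5_Meta_Multiply_50_embeddings_10_epochs_dropout": 1.605814,
    "matrix_facto_10_embeddings_100_epochs": 18.230796,
}


def rank(scores):
    """Return {model: rank}, where rank 1 is the lowest MSE."""
    ordered = sorted(scores, key=scores.get)
    return {name: i + 1 for i, name in enumerate(ordered)}


val_rank, test_rank = rank(validation_mse), rank(test_mse)
for name in sorted(test_mse, key=test_mse.get):
    print("%-55s val #%d (%.4f)  test #%d (%.4f)"
          % (name, val_rank[name], validation_mse[name], test_rank[name], test_mse[name]))
```

The four dense models span only about 1.573 to 1.617 across both splits, so their relative order may not hold on a different random split.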

## Conclusion
- In this work I created and compared two models for predicting users' ratings from Amazon's review data, a matrix factorization model and a deep network model, and used the models to recommend items to users.

- I showed that using deep neural networks can achieve better performance than using matrix factorization. 

- Going deeper (more than 3 layers) seems to lead to overfitting and not to further improvement.

- Adding epochs, reducing the embedding size or changing the number of hidden units does not help either.

- Running on a larger dataset does not help either, because the data in both datasets is very skewed.

- Choosing a larger embedding size (50) and adding dense layers on top of the embeddings before concatenating them helps a bit.

- Adding metadata and training with Dropout led to some improvement in the results.

- The fact that the data is so sparse and skewed has a huge impact on the ability to model the recommendation problem and to achieve a lower test MSE.

