# Customized Product Proposal - Recommender System

--------------------------------------------------------

@author: jl-datascientist (Jérémy Lecourt)

Objective:
Advanced product recommender system that makes personalised proposals based on customer profiles.

Project scope:
Application that builds a personalised movie proposal tailored to each individual's profile, using a dataset obtained from Imdb.com, the largest collection of movies and user ratings on the web.

--------------------------------------------------------

# STEP 4: Advanced modelling and optimisation

## 0 - Import dataset and advanced filtering

In [13]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the ratings from the dataset built in STEP 1
# A time filter is applied, following the data visualisation done in STEP 2
# The dataset is restricted to 2014, the last complete year available in the data
# This yields a consistent, up-to-date dataset of adequate size: on the order of 10^5 ratings

data_r = pd.read_csv(r"Dataset\data_t0.csv")
data_r.head()


Unnamed: 0,movieId,originalTitle,averageRating,numVotes,userId,rating,timestamp,date_rating,year
0,1,Toy Story,8.3,867989,107,4.0,1397338073,2014-04-12 21:27:53,2014
1,1,Toy Story,8.3,867989,133,4.0,1397182851,2014-04-11 02:20:51,2014
2,1,Toy Story,8.3,867989,136,5.0,1415635425,2014-11-10 16:03:45,2014
3,1,Toy Story,8.3,867989,215,2.0,1419568797,2014-12-26 04:39:57,2014
4,1,Toy Story,8.3,867989,285,2.5,1392514571,2014-02-16 01:36:11,2014


In [14]:
# Analyse the dataset with a view to modelling a recommender system

Nb_ratings = len(data_r)
Nb_users = len(data_r.groupby('userId'))
Nb_movies = len(data_r.groupby('movieId'))

print("Nb Ratings : ", Nb_ratings)
print("Nb Users : ", Nb_users)
print("Nb Movies : ", Nb_movies)
print("Sparsity : ", round(100 * (1 - Nb_ratings / (Nb_users * Nb_movies)), 1), "%", "\n")

#print("Stats Nb ratings per user:\n\n", data_r['userId'].value_counts().describe(), "\n")
#print("Stats Nb ratings per movie:\n\n", data_r['movieId'].value_counts().describe(), "\n")
print("Stats Ratings:\n\n", data_r['rating'].describe(), "\n")

Nb Ratings :  290101
Nb Users :  5844
Nb Movies :  5144
Sparsity :  99.0 % 

Stats Ratings:

 count    290101.000000
mean          3.569781
std           1.043957
min           0.500000
25%           3.000000
50%           4.000000
75%           4.500000
max           5.000000
Name: rating, dtype: float64 



In [15]:
# Data filtering (to reduce the 99% sparsity):
# Very unpopular movies and very occasional users carry little useful signal, so they can be set aside
# This avoids data for which an accurate prediction is very hard or outright illusory, and removes a potential bias
# With a reasonable filter on both the user axis and the movie axis, sparsity is expected to drop

rates_f0 = data_r.copy()

# -> Keep only users with enough ratings (at least 15)

Filter_User = rates_f0.groupby('userId')['movieId'].count().reset_index()
Filter_User.columns = ['userId', 'nbMovie']

rates_f0 = rates_f0.merge(Filter_User)
print(rates_f0.shape)

rates_f1 = rates_f0[rates_f0['nbMovie']>=15]
#rates_f1 = rates_f1[rates_f1['nbMovie']<=100]
print(rates_f1.shape)

(290101, 10)
(279353, 10)


In [16]:
# -> Keep only movies with enough ratings (at least 100)

Filter_Movie = rates_f1.groupby('movieId')['userId'].count().reset_index()
Filter_Movie.columns = ['movieId', 'nbUser']

rates_f1 = rates_f1.merge(Filter_Movie)
print(rates_f1.shape)

rates_f2 = rates_f1[rates_f1['nbUser']>=100]
#rates_f2 = rates_f2[rates_f2['nbUser']<=100]

print(rates_f2.shape)

(279353, 11)
(210582, 11)


In [17]:
# -> Keep the first 8 columns, which drops the helper count columns (nbMovie, nbUser)
data_r = rates_f2.iloc[:,:8]
data_r.shape

(210582, 8)

In [18]:
# Analyse the filtered dataset with a view to modelling a recommender system

Nb_ratings = len(data_r)
Nb_users = len(data_r.groupby('userId'))
Nb_movies = len(data_r.groupby('movieId'))
print("Nb Ratings : ", Nb_ratings)
print("Nb Users : ", Nb_users)
print("Nb Movies : ", Nb_movies)
print("Sparsity : ", round(100 * (1 - Nb_ratings / (Nb_users * Nb_movies)), 1), "%", "\n")

#print("Stats Nb ratings per user:\n\n", data_r['userId'].value_counts().describe(), "\n")
#print("Stats Nb ratings per movie:\n\n", data_r['movieId'].value_counts().describe(), "\n")
print("Stats Ratings:\n\n", data_r['rating'].describe(), "\n")


Nb Ratings :  210582
Nb Users :  3929
Nb Movies :  695
Sparsity :  92.3 % 

Stats Ratings:

 count    210582.000000
mean          3.678605
std           1.004736
min           0.500000
25%           3.000000
50%           4.000000
75%           4.500000
max           5.000000
Name: rating, dtype: float64 



In [19]:
#l_0 = []
#l_10 = []

#for i in list(data_r['userId'].value_counts()):
#    if i >= 10:
#        l_10.append(i)
#    else:
#        l_0.append(i)
#print(round(len(l_0)/len(l_10)*100,2),'%')

In [20]:
# Save the resulting dataset for modelling
df = pd.DataFrame(data = data_r[['movieId','userId','rating']])
df.to_csv(r'Dataset\data_t0f.csv', index = False)

print(df.shape)

(210582, 3)


The result is a dataset ready for training and evaluating the models: about 210,000 user ratings covering 695 movies and 3,929 users, with sparsity reduced from 99.0% to 92.3%.
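The two-step filter and the sparsity figure can be reproduced on a toy example. The sketch below is dependency-free and only illustrative: the helper names `sparsity` and `filter_min_counts`, the toy triples and the thresholds of 2 are assumptions, not part of the notebook, which works on a pandas DataFrame with thresholds of 15 and 100.

```python
from collections import Counter

def sparsity(ratings):
    """Share of empty cells in the user x movie matrix, in percent."""
    users = {u for u, _, _ in ratings}
    movies = {m for _, m, _ in ratings}
    return round(100 * (1 - len(ratings) / (len(users) * len(movies))), 1)

def filter_min_counts(ratings, min_per_user, min_per_movie):
    """Keep active users first, then popular movies (same order as the notebook)."""
    per_user = Counter(u for u, _, _ in ratings)
    kept = [r for r in ratings if per_user[r[0]] >= min_per_user]
    per_movie = Counter(m for _, m, _ in kept)
    return [r for r in kept if per_movie[r[1]] >= min_per_movie]

# Toy (userId, movieId, rating) triples, invented for illustration
toy = [
    ("u1", "m1", 4.0), ("u1", "m2", 3.5),
    ("u2", "m1", 5.0), ("u2", "m2", 2.0),
    ("u3", "m3", 1.0),
]
filtered = filter_min_counts(toy, min_per_user=2, min_per_movie=2)
print(sparsity(toy), sparsity(filtered))
```

Because movies are filtered after users, a user can end up below the user threshold once rarely rated movies are removed; the sketch, like the notebook, does not iterate the two filters until both hold.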

## 1 - Selected models and first evaluations

In [60]:
# Load the dataset with the appropriate Surprise classes

from surprise import Dataset
from surprise import Reader

df = pd.read_csv(r"Dataset\data_t0f.csv")

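# Note: load_from_df reads the three columns positionally as (user, item, rating);
# data_t0f.csv was saved in the order movieId, userId, rating
reader = Reader(rating_scale=(0.5, 5))
data = Dataset.load_from_df(df, reader)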

print(df.shape, data)

from surprise.model_selection import train_test_split

trainset, testset = train_test_split(data, test_size=.25)
print('Testset length:', len(testset))

(210582, 3) <surprise.dataset.DatasetAutoFolds object at 0x00000137E35DAEB0>
Testset length: 52646


In [61]:
# Define the selected models

from surprise import NormalPredictor
from surprise import KNNBasic
from surprise import SVD

algo_NP = NormalPredictor()           #Normal distribution approach

sim_opt_I = {"user_based": False}     #Item-based with KNN approach
algo_KNN_I = KNNBasic(sim_options=sim_opt_I)

sim_opt_U = {"user_based": True}      #User-based with KNN approach
algo_KNN_U = KNNBasic(sim_options=sim_opt_U)

algo_SVD = SVD()                      #Matrix factorisation approach

In [62]:
# Train the models

print('\n','NormalPredictor...')
algo_NP.fit(trainset)

print('\n','KNN Item-based...')
algo_KNN_I.fit(trainset)

print('\n','KNN User-based...')
algo_KNN_U.fit(trainset)

print('\n','SVD...')
algo_SVD.fit(trainset)

print('\n','Training done!')



 NormalPredictor...

 KNN Item-based...
Computing the msd similarity matrix...
Done computing similarity matrix.

 KNN User-based...
Computing the msd similarity matrix...
Done computing similarity matrix.

 SVD...

 Training done!
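The "msd similarity matrix" in the log above is the mean squared difference similarity that `KNNBasic` uses by default. As a minimal sketch of the idea, assuming ratings are stored as plain dicts (the function name `msd_similarity` and the sample profiles are hypothetical): the squared rating differences are averaged over the co-rated entries, then mapped to a similarity with 1 / (msd + 1).

```python
def msd_similarity(ratings_a, ratings_b, min_support=1):
    """MSD similarity between two rating dicts {id: rating}.

    The mean squared difference over co-rated ids is mapped to (0, 1]
    with 1 / (msd + 1). Returns 0.0 when fewer than `min_support`
    ids are co-rated.
    """
    common = ratings_a.keys() & ratings_b.keys()
    if len(common) < max(min_support, 1):
        return 0.0
    msd = sum((ratings_a[i] - ratings_b[i]) ** 2 for i in common) / len(common)
    return 1.0 / (msd + 1.0)

# Hypothetical rating profiles, for illustration only
alice = {"m1": 4.0, "m2": 3.0, "m3": 5.0}
bob = {"m1": 4.0, "m2": 2.0}
carol = {"m4": 1.0}

print(msd_similarity(alice, bob))                  # co-rated m1, m2: msd = (0 + 1) / 2
print(msd_similarity(alice, carol))                # nothing co-rated
print(msd_similarity(alice, bob, min_support=3))   # only 2 co-rated ids
```

Identical profiles give a similarity of 1, and the value shrinks towards 0 as the ratings diverge.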


In [63]:
# Evaluate the models
from surprise import accuracy

print('Model evaluation','\n')

pred_NP = algo_NP.test(testset)
acc_NP = accuracy.rmse(pred_NP)
print('NormalPredictor :',acc_NP,'\n')

pred_KNN_I = algo_KNN_I.test(testset)
acc_KNN_I = accuracy.rmse(pred_KNN_I)
print('KNN Item-based :',acc_KNN_I,'\n')

pred_KNN_U = algo_KNN_U.test(testset)
acc_KNN_U = accuracy.rmse(pred_KNN_U)
print('KNN User-based :',acc_KNN_U,'\n')

pred_SVD = algo_SVD.test(testset)
acc_SVD = accuracy.rmse(pred_SVD)
print('SVD :',acc_SVD,'\n')


Model evaluation 

RMSE: 1.3700
NormalPredictor : 1.3700431765306664 

RMSE: 0.8670
KNN Item-based : 0.8670141347697389 

RMSE: 0.8422
KNN User-based : 0.8421772194208027 

RMSE: 0.8112
SVD : 0.8111849663674929 



In [64]:
# Distribution of the best model's predicted ratings
p_0 = []
p_1 = []
p_2 = []
p_3 = []
p_4 = []
p_5 = []

for pred in pred_SVD:
    est = pred.est  # estimated rating (field 3 of a Prediction)
    if est < 1.5:
        p_0.append(est)
    elif est <= 2.9:
        p_1.append(est)
    elif est <= 3.4:
        p_2.append(est)
    elif est <= 3.9:
        p_3.append(est)
    elif est <= 4.4:
        p_4.append(est)
    else:
        p_5.append(est)
print('Distribution of predictions around the mean (', round(df.rating.mean(),1),'/ 5 ) :','\n')
print('Between 0.5 and 1.5: ',round(len(p_0)/len(testset)*100,2),' %')
print('Between 1.5 and 2.9: ',round(len(p_1)/len(testset)*100,2),' %')
print('Between 2.9 and 3.4: ',round(len(p_2)/len(testset)*100,2),' %')
print('Between 3.4 and 3.9: ',round(len(p_3)/len(testset)*100,2),' %')
print('Between 3.9 and 4.4: ',round(len(p_4)/len(testset)*100,2),' %')
print('Between 4.4 and 5.0: ',round(len(p_5)/len(testset)*100,2),' %')

Distribution of predictions around the mean ( 3.7 / 5 ) : 

Between 0.5 and 1.5:  0.3  %
Between 1.5 and 2.9:  9.98  %
Between 2.9 and 3.4:  18.5  %
Between 3.4 and 3.9:  32.4  %
Between 3.9 and 4.4:  29.24  %
Between 4.4 and 5.0:  9.58  %
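The same binning can be written more compactly with a sorted list of edges. This is only a sketch: `EDGES`, `LABELS`, `bin_shares` and the sample estimates are illustrative names and values, and every bin here is half-open, [low, high), so a value falling exactly on an edge may land in a different bin than in the cell above.

```python
from bisect import bisect_right
from collections import Counter

EDGES = [1.5, 2.9, 3.4, 3.9, 4.4]  # inner bin edges, as in the cell above
LABELS = ["0.5-1.5", "1.5-2.9", "2.9-3.4", "3.4-3.9", "3.9-4.4", "4.4-5.0"]

def bin_shares(estimates):
    """Percentage of estimates falling in each bin [low, high)."""
    counts = Counter(bisect_right(EDGES, est) for est in estimates)
    return {label: round(100 * counts[i] / len(estimates), 2)
            for i, label in enumerate(LABELS)}

# Invented estimates, for illustration only
sample = [1.0, 3.0, 3.5, 3.7, 4.0, 4.8, 4.9, 2.0]
for label, share in bin_shares(sample).items():
    print(label, share, "%")
```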


In [65]:
# Save the best model and its predictions

import time
import os
from surprise.dump import dump

date = time.strftime('%y%m%d-%Hh%Mm%S', time.localtime())
file_name = 'AlgoEval_BM_' + algo_SVD.__class__.__name__ + '_' + date
#file_name += '-fold{0}'.format(fold_i + 1)
dump_dir = os.path.expanduser('~') + '/Desktop/DS-Bootcamp/Projet/Projet-RS_Films/Model/'
file_name = os.path.join(dump_dir, file_name)

dump(file_name, pred_SVD, algo_SVD)

In [71]:
testset[10]

(2858, 115430, 5.0)

In [72]:
# Check one specific prediction (example: the pair of raw ids 2858 and 115430 from testset[10], true rating 5.0)

prediction = algo_SVD.predict(2858,115430)
prediction.est

4.711222738997067

Before tuning, the best model appears to be SVD (RMSE 0.8112), closely followed by the user-based KNN model (RMSE 0.8422). These two models are therefore selected for tuning.
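The ranking above rests on RMSE, the root of the mean squared gap between true and estimated ratings. A minimal hand-rolled sketch shows what the figure measures; the helper names and the (true, estimate) pairs are made up for illustration, and MAE is included only for contrast.

```python
import math

def rmse(pairs):
    """Root mean squared error over (true_rating, estimate) pairs."""
    return math.sqrt(sum((true - est) ** 2 for true, est in pairs) / len(pairs))

def mae(pairs):
    """Mean absolute error over (true_rating, estimate) pairs."""
    return sum(abs(true - est) for true, est in pairs) / len(pairs)

# Invented (true rating, estimate) pairs, for illustration only
pairs = [(5.0, 4.5), (3.0, 3.5), (4.0, 2.0), (1.0, 1.0)]

print("RMSE:", round(rmse(pairs), 4))
print("MAE :", round(mae(pairs), 4))
```

The single large miss (4.0 predicted as 2.0) weighs more on RMSE than on MAE, which is why RMSE penalises models that are occasionally far off.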

## 2 - KNN User-Based optimisation (parameter tuning)

In [44]:
# Load the dataset

from surprise import Dataset
from surprise import Reader

df = pd.read_csv(r"Dataset\data_t0f.csv")

reader = Reader(rating_scale=(0.5, 5))
data = Dataset.load_from_df(df, reader)

print(df.shape, data)

(210582, 3) <surprise.dataset.DatasetAutoFolds object at 0x00000137F32E6460>


In [45]:
# Training / validation split ahead of the grid search

# Shuffle the data
import random
raw_ratings = data.raw_ratings
random.shuffle(raw_ratings)

# Train = 80% , Test = 20%
threshold = int(.8 * len(raw_ratings))
Train_raw_ratings = raw_ratings[:threshold]
Test_raw_ratings = raw_ratings[threshold:]

data.raw_ratings = Train_raw_ratings  # data is now the Trainset
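The cell above shuffles the raw ratings without a seed, so the 80/20 split changes on every run. A reproducible variant could look like the sketch below; the `holdout_split` helper, the `seed` argument and the sample rows are assumptions added for illustration, not part of the notebook.

```python
import random

def holdout_split(rows, train_share=0.8, seed=0):
    """Shuffle a copy of `rows` with a fixed seed and cut it in two."""
    rng = random.Random(seed)      # local generator: global random state untouched
    shuffled = list(rows)
    rng.shuffle(shuffled)
    threshold = int(train_share * len(shuffled))
    return shuffled[:threshold], shuffled[threshold:]

# Invented rating rows, for illustration only
rows = [(f"u{i}", f"m{i % 3}", 3.0) for i in range(10)]
train, test = holdout_split(rows)
print(len(train), len(test))
```

The held-out part plays the role of `Test_raw_ratings`: it is never seen by the grid search, so the final score measured on it is not biased by the tuning.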

In [47]:
from surprise import KNNBasic
from surprise.model_selection import GridSearchCV

# Fit the grid by cross-validation on the trainset (number of combinations x number of folds)
param_grid = {
    'k': [10, 20, 50, 100],
    'min_k': [1, 2],
    'sim_options': {
        'name': ['msd', 'cosine'],
        'min_support': [1, 2],
        'user_based': [True, False],
    },
}
grid_search = GridSearchCV(KNNBasic, param_grid, measures=['rmse'], cv=5, n_jobs = -1)
grid_search.fit(data)

In [48]:
grid_search.best_score

{'rmse': 0.8384364965819039}

In [49]:
grid_search.best_params

{'rmse': {'k': 20,
  'min_k': 2,
  'sim_options': {'name': 'msd', 'min_support': 2, 'user_based': True}}}

In [51]:
results_df = pd.DataFrame.from_dict(grid_search.cv_results)
best_results_KNNBasic_t0f = results_df[results_df['rank_test_rmse'] < 10][['mean_test_rmse','std_test_rmse','rank_test_rmse','mean_fit_time','mean_test_time','param_k','param_min_k','param_sim_options']]

In [23]:
# SAVE RESULTS FOR KNNBasic
#results_df.to_csv(r'Model\All_GSresults_KNNBasic_t0f_210314.csv', index = False)
#best_results_KNNBasic_t0f.to_csv(r'Model\Best_GSresults_KNNBasic_t0f_210314.csv', index = False)

In [52]:
best_results_KNNBasic_t0f.head(10)
#best_results_KNNBasic_t0f['param_sim_options'].astype(str).str.split(',',expand=True).head(10)

Unnamed: 0,mean_test_rmse,std_test_rmse,rank_test_rmse,mean_fit_time,mean_test_time,param_k,param_min_k,param_sim_options
0,0.84076,0.003445,7,1.819134,10.425914,10,1,"{'name': 'msd', 'min_support': 1, 'user_based'..."
1,0.840206,0.003394,6,2.05211,11.611143,10,1,"{'name': 'msd', 'min_support': 2, 'user_based'..."
4,0.840766,0.003424,8,1.963145,10.724322,10,2,"{'name': 'msd', 'min_support': 1, 'user_based'..."
5,0.840174,0.003417,5,2.028371,12.033615,10,2,"{'name': 'msd', 'min_support': 2, 'user_based'..."
8,0.838817,0.003665,3,1.989877,12.904284,20,1,"{'name': 'msd', 'min_support': 1, 'user_based'..."
9,0.838469,0.003666,2,1.737153,11.936274,20,1,"{'name': 'msd', 'min_support': 2, 'user_based'..."
12,0.838824,0.003643,4,1.847259,12.047775,20,2,"{'name': 'msd', 'min_support': 1, 'user_based'..."
13,0.838436,0.003692,1,1.673923,11.868255,20,2,"{'name': 'msd', 'min_support': 2, 'user_based'..."
21,0.850727,0.003938,9,1.716608,14.564842,50,2,"{'name': 'msd', 'min_support': 2, 'user_based'..."


In [54]:
# Algo with best params
algo = grid_search.best_estimator['rmse']


In [55]:
# Retrain on the whole Trainset
trainset = data.build_full_trainset()
algo.fit(trainset)

Computing the msd similarity matrix...
Done computing similarity matrix.


<surprise.prediction_algorithms.knns.KNNBasic at 0x137f266d0d0>

In [56]:
# Compute biased accuracy on Trainset
from surprise import accuracy

predictions = algo.test(trainset.build_testset())
print('Biased accuracy on Trainset,', end='   ')
accuracy.rmse(predictions)

Biased accuracy on Trainset,   RMSE: 0.6553


0.6553469703170417

In [57]:
# Compute unbiased accuracy on Testset
testset = data.construct_testset(Test_raw_ratings)
predictions = algo.test(testset)
print('Unbiased accuracy on Testset,', end=' ')
accuracy.rmse(predictions)

Unbiased accuracy on Testset, RMSE: 0.8286


0.8285714310786031

In [59]:
import time
import os
from surprise.dump import dump

date = time.strftime('%y%m%d-%Hh%Mm%S', time.localtime())
file_name = 'AlgoEval_BP_KNN_' + algo.__class__.__name__ + '_' + date
#file_name += '-fold{0}'.format(fold_i + 1)
dump_dir = os.path.expanduser('~') + '/Desktop/DS-Bootcamp/Projet/Projet-RS_Films/Model/'
file_name = os.path.join(dump_dir, file_name)

dump(file_name, predictions, algo)

The user-based approach outperforms the item-based approach (RMSE 0.8422 vs 0.8670 before tuning, and the grid search also selects user_based=True), and tuning improves the RMSE slightly, to 0.8286 on the held-out test set.

## 3 - Matrix Factorisation SVD optimisation (parameter tuning)

In [None]:
# Load the dataset

from surprise import Dataset
from surprise import Reader

df = pd.read_csv(r"Dataset\data_t0f.csv")

reader = Reader(rating_scale=(0.5, 5))
data = Dataset.load_from_df(df, reader)

print(df.shape, data)

In [10]:
# Training / validation split ahead of the grid search

# Shuffle the data
import random
raw_ratings = data.raw_ratings
random.shuffle(raw_ratings)

# Train = 80% , Test = 20%
threshold = int(.8 * len(raw_ratings))
Train_raw_ratings = raw_ratings[:threshold]
Test_raw_ratings = raw_ratings[threshold:]

data.raw_ratings = Train_raw_ratings  # data is now the Trainset

In [11]:
from surprise import SVD
from surprise.model_selection import GridSearchCV

# Fit the grid by cross-validation on the trainset (number of combinations x number of folds)
param_grid = {
    'n_factors': [50, 80, 100, 120, 150, 200],
    'init_mean': [0, 1.0, 2.5],
    'init_std_dev': [0.05, 0.1, 0.2, 0.5],
    'lr_all': [0.002, 0.005, 0.01, 0.05, 0.1],
    'reg_all': [0.02, 0.05, 0.1, 0.5, 1],
}

grid_search = GridSearchCV(SVD, param_grid, measures=['mae', 'rmse', 'fcp'], cv=5, n_jobs = -1)
grid_search.fit(data)

In [12]:
grid_search.best_score

{'mae': 0.59257003938627,
 'rmse': 0.7939014637603912,
 'fcp': 0.7147300892988626}

In [13]:
grid_search.best_params

{'mae': {'n_factors': 200,
  'init_mean': 0,
  'init_std_dev': 0.05,
  'lr_all': 0.01,
  'reg_all': 0.05},
 'rmse': {'n_factors': 200,
  'init_mean': 0,
  'init_std_dev': 0.05,
  'lr_all': 0.01,
  'reg_all': 0.05},
 'fcp': {'n_factors': 200,
  'init_mean': 0,
  'init_std_dev': 0.05,
  'lr_all': 0.05,
  'reg_all': 0.05}}

In [14]:
results_df = pd.DataFrame.from_dict(grid_search.cv_results)
results_df.head()
best_results_SVD_t0f = results_df[results_df['rank_test_rmse'] < 10][['mean_test_mae','std_test_mae','rank_test_mae','mean_test_rmse','std_test_rmse','rank_test_rmse','mean_test_fcp','std_test_fcp','rank_test_fcp','mean_fit_time','mean_test_time','param_n_factors','param_init_mean','param_init_std_dev','param_lr_all','param_reg_all']]
best_results_SVD_t0f.head(10)

Unnamed: 0,mean_test_mae,std_test_mae,rank_test_mae,mean_test_rmse,std_test_rmse,rank_test_rmse,mean_test_fcp,std_test_fcp,rank_test_fcp,mean_fit_time,mean_test_time,param_n_factors,param_init_mean,param_init_std_dev,param_lr_all,param_reg_all
311,0.595234,0.004427,11,0.797043,0.005752,5,0.710177,0.003857,1770,23.155665,0.793278,80,0.0,0.05,0.01,0.05
611,0.593797,0.004002,4,0.795584,0.00537,4,0.712106,0.003154,1794,27.139808,0.790885,100,0.0,0.05,0.01,0.05
910,0.594398,0.004817,8,0.798341,0.00625,9,0.711514,0.003215,1790,30.878208,0.792082,120,0.0,0.05,0.01,0.02
911,0.593755,0.004345,3,0.795261,0.005892,3,0.712763,0.003585,1795,31.324614,0.788092,120,0.0,0.05,0.01,0.05
1210,0.594102,0.004536,5,0.798269,0.006034,8,0.711968,0.003726,1793,37.155618,0.797068,150,0.0,0.05,0.01,0.02
1211,0.592666,0.004277,2,0.794075,0.005791,2,0.713621,0.003318,1798,37.512264,0.797866,150,0.0,0.05,0.01,0.05
1510,0.594298,0.004069,7,0.797585,0.005535,6,0.711425,0.002692,1789,47.55221,0.806443,200,0.0,0.05,0.01,0.02
1511,0.59257,0.004357,1,0.793901,0.00574,1,0.714056,0.003636,1799,47.264778,0.807042,200,0.0,0.05,0.01,0.05
1516,0.595267,0.004825,12,0.798174,0.006355,7,0.71473,0.003167,1800,47.15547,0.794076,200,0.0,0.05,0.05,0.05


In [20]:
# SAVE RESULTS FOR SVD GRID SEARCH
results_df.to_csv(r'Model\All_GSresults_SVD_t0f_210315.csv', index = False)
best_results_SVD_t0f.to_csv(r'Model\Best_GSresults_SVD_t0f_210315.csv', index = False)

In [15]:
# Select best parameters
algo = grid_search.best_estimator['rmse']

In [16]:
# Retrain on the whole Trainset
trainset = data.build_full_trainset()
algo.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x28325a7b910>

In [17]:
# Compute biased accuracy on Trainset
from surprise import accuracy

predictions = algo.test(trainset.build_testset())
print('Biased accuracy on Trainset,', end='   ')
accuracy.rmse(predictions)

Biased accuracy on Trainset,   RMSE: 0.5902


0.5902481765198406

In [19]:
# Compute unbiased accuracy on Testset

testset = data.construct_testset(Test_raw_ratings)
predictions = algo.test(testset)
print('Unbiased accuracy on Testset,', end=' ')
accuracy.rmse(predictions)
accuracy.mae(predictions)
accuracy.fcp(predictions)

Unbiased accuracy on Testset, RMSE: 0.7794
MAE:  0.5811
FCP:  0.7282


0.7281971298916068
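FCP (fraction of concordant pairs) looks at ordering rather than error size: for each user, it asks how often two items are ranked by the estimates in the same order as by the true ratings. The sketch below is a simplified version that pools pair counts across users; the function name, the tuple layout and the sample data are assumptions, and Surprise's own `accuracy.fcp` may aggregate per-user counts differently, so the values need not match exactly.

```python
from collections import defaultdict
from itertools import combinations

def fcp(predictions):
    """Simplified FCP over (user_id, true_rating, estimate) triples."""
    by_user = defaultdict(list)
    for user, true, est in predictions:
        by_user[user].append((true, est))

    concordant = discordant = 0
    for rated in by_user.values():
        for (t1, e1), (t2, e2) in combinations(rated, 2):
            if t1 == t2:
                continue                # no true preference to recover
            if (t1 - t2) * (e1 - e2) > 0:
                concordant += 1         # estimates rank the pair correctly
            else:
                discordant += 1         # wrong order, or tied estimates
    total = concordant + discordant
    return concordant / total if total else float("nan")

# Invented predictions, for illustration only
preds = [
    ("u1", 5.0, 4.6), ("u1", 3.0, 3.9), ("u1", 1.0, 4.0),
    ("u2", 4.0, 3.0), ("u2", 2.0, 3.5),
]
print(fcp(preds))
```

A value of 0.5 is what random ordering would give on average, which puts the 0.7282 reported above in perspective.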

In [21]:
# Share of testset predictions per estimated-rating bin, in %
p_0 = []  # est <= 1.25
p_1 = []  # 1.25 < est <= 2.5
p_2 = []  # 2.5 < est <= 3
p_3 = []  # 3 < est <= 4
p_4 = []  # 4 < est <= 4.5
p_5 = []  # est > 4.5

for pred in predictions:
    est = pred.est  # estimated rating (field 3 of a Prediction)
    if est <= 1.25:
        p_0.append(est)
    elif est <= 2.5:
        p_1.append(est)
    elif est <= 3:
        p_2.append(est)
    elif est <= 4:
        p_3.append(est)
    elif est <= 4.5:
        p_4.append(est)
    else:
        p_5.append(est)
for group in (p_0, p_1, p_2, p_3, p_4, p_5):
    print(round(len(group)/len(testset)*100,2))

0.16
4.07
8.46
55.93
25.62
5.76


In [25]:
import time
import os
from surprise.dump import dump

date = time.strftime('%y%m%d-%Hh%Mm%S', time.localtime())
file_name = date + '-' + algo.__class__.__name__ + '_BP_Vt0f'
#file_name += '-fold{0}'.format(fold_i + 1)
dump_dir = os.path.expanduser('~') + '/Desktop/DS-Bootcamp/Projet/Projet-RS_Films/Model/'
file_name = os.path.join(dump_dir, file_name)

dump(file_name, predictions, algo, 1)

The dump has been saved as file C:\Users\Client/Desktop/DS-Bootcamp/Projet/Projet-RS_Films/Model/210315-21h47m15-SVD_BP_Vt0f_t


Tuning the SVD algorithm (1,800 parameter combinations tested, about 10 hours of fitting and evaluation) improved prediction accuracy: the test-set RMSE drops below 0.80, to 0.7794, versus 0.8112 before tuning. This tuned model is the one that is saved.
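The figure of 1,800 combinations follows directly from the grid: it is the Cartesian product of the five value lists. A short sketch using the same values as the SVD grid (the variable names `combos` and `n_folds` are illustrative) makes the arithmetic explicit.

```python
from itertools import product

# Same values as the SVD grid searched in this section
param_grid = {
    'n_factors': [50, 80, 100, 120, 150, 200],
    'init_mean': [0, 1.0, 2.5],
    'init_std_dev': [0.05, 0.1, 0.2, 0.5],
    'lr_all': [0.002, 0.005, 0.01, 0.05, 0.1],
    'reg_all': [0.02, 0.05, 0.1, 0.5, 1],
}

# One dict per combination: 6 * 3 * 4 * 5 * 5 of them
combos = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]

n_folds = 5  # cv=5 in the grid search
print(len(combos), "combinations,", len(combos) * n_folds, "cross-validation fits")
```

With 5 folds, each combination is fitted and scored 5 times, which explains the long run time; a coarser grid or a randomised search would be the usual way to cut it down.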