## Computing the Optimal Weights for Blending

In [1]:
%load_ext autoreload
%autoreload 2

Necessary imports:

In [2]:
import numpy as np
from baselines import Baselines
from MF_SGD import MF_SGD
from MF_BSGD import MF_BSGD
from MF_ALS import MF_ALS
from surprise_models import SurpriseModels
from blending import Blending
from data import Data

Set the random seed to be able to reproduce the results.

In [3]:
np.random.seed(98)
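A minimal sketch of what seeding buys: after `np.random.seed`, NumPy's global generator replays the same sequence, so random splits and random factor initialisations repeat across runs. This only covers code that draws from NumPy's global generator; a library that keeps its own random state has to be seeded separately (an assumption about the wrapped models, not something this notebook shows).

```python
import numpy as np

# Seeding the global generator fixes every subsequent np.random draw.
np.random.seed(98)
first_run = np.random.rand(3)

# Re-seeding with the same value replays exactly the same sequence.
np.random.seed(98)
second_run = np.random.rand(3)

print(np.array_equal(first_run, second_run))
```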

Load and prepare data.

In [4]:
data = Data(test_purpose=True)

Preparing data ...
Splitting data to train and test data ...
... data is splitted.
... data is prepared.
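The internals of `Data` are not shown here. As a hedged illustration only, a uniform random train/test split of `(user, item, rating)` rows could look like the sketch below; `split_ratings` and its `test_ratio` parameter are hypothetical names, and the real class may split differently (for example per user).

```python
import numpy as np

def split_ratings(ratings, test_ratio=0.1, seed=98):
    """Hypothetical helper: uniformly split an (n, 3) array of
    (user, item, rating) rows into disjoint train and test parts."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(ratings))       # random row order
    n_test = int(round(test_ratio * len(ratings)))
    test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]
    return ratings[train_idx], ratings[test_idx]

# Toy data: 10 users x 10 items, every pair rated once.
ratings = np.array([[u, i, (u + i) % 5 + 1] for u in range(10) for i in range(10)])
train, test = split_ratings(ratings, test_ratio=0.2)
print(train.shape, test.shape)
```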


Create a dictionary that collects the test-set predictions of every model to blend:

In [5]:
models = {}

Run the baseline models.

In [6]:
baselines = Baselines(data=data, test_purpose=True)

print('\nModelling using baseline_global_mean:')
models['baseline_global_mean'] = baselines.baseline_global_mean()['Rating']

print('\nModelling using baseline_user_mean:')
models['baseline_user_mean'] = baselines.baseline_user_mean()['Rating']

print('\nModelling using baseline_movie_mean:')
models['baseline_item_mean'] = baselines.baseline_item_mean()['Rating']

print('\nModelling using baseline_global_median:')
models['baseline_global_median'] = baselines.baseline_global_median()['Rating']

print('\nModelling using baseline_user_median:')
models['baseline_user_median'] = baselines.baseline_user_median()['Rating']

print('\nModelling using baseline_movie_median:')
models['baseline_item_median'] = baselines.baseline_item_median()['Rating']


Modelling using baseline_global_mean:
Test RMSE using baseline_global_mean: 1.1217510250034082

Modelling using baseline_user_mean:
Test RMSE using baseline_user_mean: 1.096396482836658

Modelling using baseline_movie_mean:
Test RMSE using baseline_item_mean: 1.0308878155269185

Modelling using baseline_global_median:
Test RMSE using baseline_global_median: 1.130537396215373

Modelling using baseline_user_median:
Test RMSE using baseline_user_median: 1.1515699986539643

Modelling using baseline_movie_median:
Test RMSE using baseline_item_median: 1.0984206090659812
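The `Baselines` class itself is not shown. The sketch below illustrates the idea behind these baselines under stated assumptions: each test rating is predicted by a statistic (mean or median) of the training ratings of the same user or item, unseen keys fall back to the global statistic (an assumption), and predictions are scored with RMSE. `group_baseline` is a hypothetical helper.

```python
import numpy as np
from collections import defaultdict

def rmse(y_true, y_pred):
    """Root-mean-square error between two rating vectors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def group_baseline(keys_train, ratings_train, keys_test, agg=np.mean):
    """Predict each test rating by agg() of the training ratings sharing its
    key (a user id or an item id). Unseen keys use the global agg()."""
    groups = defaultdict(list)
    for key, r in zip(keys_train, ratings_train):
        groups[key].append(r)
    stats = {key: agg(vals) for key, vals in groups.items()}
    fallback = agg(ratings_train)
    return np.array([stats.get(key, fallback) for key in keys_test], dtype=float)

users_train, items_train = [0, 0, 1, 1, 2], [0, 1, 0, 2, 1]
ratings_train = [5, 3, 4, 2, 1]
users_test, items_test, ratings_test = [0, 1, 3], [2, 1, 0], [4, 3, 5]

global_mean = np.full(len(ratings_test), np.mean(ratings_train))
user_mean = group_baseline(users_train, ratings_train, users_test)
item_median = group_baseline(items_train, ratings_train, items_test, agg=np.median)
for name, pred in [("global_mean", global_mean), ("user_mean", user_mean),
                   ("item_median", item_median)]:
    print(name, pred, rmse(ratings_test, pred))
```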


Run Matrix Factorization model trained using Stochastic Gradient Descent.

In [7]:
mf_sgd = MF_SGD(data=data, test_purpose=True)

print('\nModelling using MF_SGD:')
models['mf_sgd'] = mf_sgd.train()['Rating']


Modelling using MF_SGD:
Learning the matrix factorization using SGD ...
Iteration: 1, RMSE on training set: 1.0160678051052041
Iteration: 2, RMSE on training set: 1.007356323368439
Iteration: 3, RMSE on training set: 1.0004793119769244
Iteration: 4, RMSE on training set: 0.9955558890929159
Iteration: 5, RMSE on training set: 0.9917812226205461
Iteration: 6, RMSE on training set: 0.9884934261639605
Iteration: 7, RMSE on training set: 0.9859018403955917
Iteration: 8, RMSE on training set: 0.982875352018487
Iteration: 9, RMSE on training set: 0.9804447964872828
Iteration: 10, RMSE on training set: 0.9794597522010847
Iteration: 11, RMSE on training set: 0.9774870835122362
Iteration: 12, RMSE on training set: 0.9768867592093701
Iteration: 13, RMSE on training set: 0.9755857527900949
Iteration: 14, RMSE on training set: 0.9746623618252294
Iteration: 15, RMSE on training set: 0.9741756825755195
Iteration: 16, RMSE on training set: 0.9735984975199187
Iteration: 17, RMSE on training set: 0.973
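For reference, a minimal sketch of plain matrix factorization trained with SGD, which predicts $\hat r_{ui} = p_u^\top q_i$ and takes one regularized gradient step per observed rating. The hyperparameter names and values (`k`, `gamma`, `lambda_`, `n_epochs`) are illustrative assumptions, not the settings used by `MF_SGD`.

```python
import numpy as np

def mf_sgd(users, items, ratings, n_users, n_items, k=10, gamma=0.02,
           lambda_=0.1, n_epochs=30, seed=98):
    """Fit r_ui ~ p_u . q_i by SGD; return P, Q and per-epoch training RMSE."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, k))  # user factors
    Q = rng.normal(scale=0.1, size=(n_items, k))  # item factors
    history = []
    for _ in range(n_epochs):
        # Visit the observed ratings in a fresh random order every epoch.
        for idx in rng.permutation(len(ratings)):
            u, i = users[idx], items[idx]
            err = ratings[idx] - P[u] @ Q[i]
            p_u = P[u].copy()  # pre-update user vector for the item step
            P[u] += gamma * (err * Q[i] - lambda_ * P[u])
            Q[i] += gamma * (err * p_u - lambda_ * Q[i])
        pred = np.sum(P[users] * Q[items], axis=1)
        history.append(np.sqrt(np.mean((ratings - pred) ** 2)))
    return P, Q, history

# Small synthetic example in which every user rates every item.
rng = np.random.default_rng(0)
n_users, n_items = 20, 15
users, items = np.meshgrid(np.arange(n_users), np.arange(n_items), indexing="ij")
users, items = users.ravel(), items.ravel()
true_r = 1.0 + rng.uniform(0, 1.4, (n_users, 2)) @ rng.uniform(0, 1.4, (2, n_items))
ratings = true_r[users, items]
P, Q, history = mf_sgd(users, items, ratings, n_users, n_items, k=3)
print(history[0], history[-1])
```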

Run the biased Matrix Factorization model (with user and item bias terms), trained using Stochastic Gradient Descent (BSGD).

In [8]:
mf_bsgd = MF_BSGD(data=data, test_purpose=True)

print('\nModelling using MF_BSGD:')
models['mf_bsgd'] = mf_bsgd.train()['Rating']


Modelling using MF_BSGD:
Learning the matrix factorization using BSGD ...
Iteration: 1, RMSE on training set: 1.0031893656613933
Iteration: 2, RMSE on training set: 0.9917689208035846
Iteration: 3, RMSE on training set: 0.9840764212615188
Iteration: 4, RMSE on training set: 0.9792922297767432
Iteration: 5, RMSE on training set: 0.9761765216047489
Iteration: 6, RMSE on training set: 0.973157866279339
Iteration: 7, RMSE on training set: 0.9713441594690397
Iteration: 8, RMSE on training set: 0.9696744429435734
Iteration: 9, RMSE on training set: 0.9685373094228283
Iteration: 10, RMSE on training set: 0.9673615128082158
Iteration: 11, RMSE on training set: 0.9664318851358776
Iteration: 12, RMSE on training set: 0.9656141499944134
Iteration: 13, RMSE on training set: 0.9650112733199667
Iteration: 14, RMSE on training set: 0.9645321669298093
Iteration: 15, RMSE on training set: 0.9640508333332661
Iteration: 16, RMSE on training set: 0.9636882114236515
Iteration: 17, RMSE on training set: 0.
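A hedged sketch of matrix factorization with biases, assuming the common formulation $\hat r_{ui} = \mu + b_u + b_i + p_u^\top q_i$, where $\mu$ is the global mean and $b_u$, $b_i$ are learned offsets. Whether `MF_BSGD` uses exactly these update rules and hyperparameters is an assumption.

```python
import numpy as np

def biased_mf_sgd(users, items, ratings, n_users, n_items, k=3, gamma=0.02,
                  lambda_=0.1, n_epochs=30, seed=98):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i by SGD (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    mu = ratings.mean()            # global mean, kept fixed
    b_user = np.zeros(n_users)     # user bias terms
    b_item = np.zeros(n_items)     # item bias terms
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    history = []
    for _ in range(n_epochs):
        for idx in rng.permutation(len(ratings)):
            u, i = users[idx], items[idx]
            err = ratings[idx] - (mu + b_user[u] + b_item[i] + P[u] @ Q[i])
            b_user[u] += gamma * (err - lambda_ * b_user[u])
            b_item[i] += gamma * (err - lambda_ * b_item[i])
            p_u = P[u].copy()
            P[u] += gamma * (err * Q[i] - lambda_ * P[u])
            Q[i] += gamma * (err * p_u - lambda_ * Q[i])
        pred = mu + b_user[users] + b_item[items] + np.sum(P[users] * Q[items], axis=1)
        history.append(np.sqrt(np.mean((ratings - pred) ** 2)))
    return mu, b_user, b_item, P, Q, history

rng = np.random.default_rng(0)
n_users, n_items = 20, 15
users, items = np.meshgrid(np.arange(n_users), np.arange(n_items), indexing="ij")
users, items = users.ravel(), items.ravel()
true_r = 1.0 + rng.uniform(0, 1.4, (n_users, 2)) @ rng.uniform(0, 1.4, (2, n_items))
ratings = true_r[users, items]
mu, b_user, b_item, P, Q, history = biased_mf_sgd(users, items, ratings, n_users, n_items)
print(history[0], history[-1])
```

The bias terms absorb user and item offsets, so the factors only need to model the remaining interaction.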

Run Matrix Factorization model trained using Alternating Least Squares.

In [9]:
mf_als = MF_ALS(data=data, test_purpose=True)

print('\nModelling using MF_ALS:')
models['mf_als'] = mf_als.train()['Rating']


Modelling using MF_ALS:
Learning the matrix factorization using ALS ...
Iteration: 1, RMSE on training set: 0.984673949906507
Iteration: 2, RMSE on training set: 0.969514886406822
Iteration: 3, RMSE on training set: 0.9556154436965887
Iteration: 4, RMSE on training set: 0.9468573624882057
Iteration: 5, RMSE on training set: 0.9427911726103717
Iteration: 6, RMSE on training set: 0.9403818073184986
Iteration: 7, RMSE on training set: 0.9387652328269936
Iteration: 8, RMSE on training set: 0.9376801079932285
Iteration: 9, RMSE on training set: 0.9369711988246903
Iteration: 10, RMSE on training set: 0.936517493924976
Iteration: 11, RMSE on training set: 0.9362331353027183
Iteration: 12, RMSE on training set: 0.9360602304304075
Iteration: 13, RMSE on training set: 0.9359596492220911
Iteration: 14, RMSE on training set: 0.9359052775650721
Iteration: 15, RMSE on training set: 0.9358801604599987
Iteration: 16, RMSE on training set: 0.9358734638055952
The training process converged to a thresho
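A minimal ALS sketch, assuming the objective $\sum_{(u,i)\ \text{observed}} (r_{ui} - p_u^\top q_i)^2 + \lambda(\lVert P\rVert^2 + \lVert Q\rVert^2)$. With $Q$ fixed, every $p_u$ is a small ridge regression with a closed-form solution, and vice versa, so each sweep cannot increase this objective. The regularization form and parameter names are assumptions about `MF_ALS`.

```python
import numpy as np

def mf_als(R, mask, k=3, lambda_=0.1, n_iters=15, seed=98):
    """ALS on the observed entries of R (mask marks observed ratings).
    Returns P, Q and the regularized objective after each sweep."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    reg = lambda_ * np.eye(k)
    objective = []
    for _ in range(n_iters):
        # Fix Q and solve a ridge regression for each user vector.
        for u in range(n_users):
            Qo = Q[mask[u]]
            P[u] = np.linalg.solve(Qo.T @ Qo + reg, Qo.T @ R[u, mask[u]])
        # Fix P and solve a ridge regression for each item vector.
        for i in range(n_items):
            Po = P[mask[:, i]]
            Q[i] = np.linalg.solve(Po.T @ Po + reg, Po.T @ R[mask[:, i], i])
        err = (R - P @ Q.T)[mask]
        objective.append(np.sum(err ** 2) + lambda_ * (np.sum(P ** 2) + np.sum(Q ** 2)))
    return P, Q, objective

rng = np.random.default_rng(0)
R = 1.0 + rng.uniform(0, 1.4, (20, 2)) @ rng.uniform(0, 1.4, (2, 15))
mask = rng.random(R.shape) < 0.7  # keep roughly 70% of the entries as observed
P, Q, objective = mf_als(R, mask)
train_rmse = np.sqrt(np.mean((R - P @ Q.T)[mask] ** 2))
print(objective[0], objective[-1], train_rmse)
```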

Run models from the Surprise library.

In [10]:
surprise_models = SurpriseModels(data=data, test_purpose=True)

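Run the neighborhood (k-NN) models from the Surprise library: a user-based kNN Baseline with k=100 and an item-based one with k=300, both using the pearson_baseline similarity.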

In [11]:
print('\nModelling using user based Surprise kNN Baseline:')
models['surprise_kNN_baseline_user'] = surprise_models.kNN_baseline(k=100, 
                                                                    sim_options={'name': 'pearson_baseline',
                                                                                 'user_based': True})['Rating']

print('\nModelling using item based Surprise kNN Baseline:')
models['surprise_kNN_baseline_item'] = surprise_models.kNN_baseline(k=300, 
                                                                    sim_options={'name': 'pearson_baseline',
                                                                                 'user_based': False})['Rating']


Modelling using user based Surprise kNN Baseline:
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Test RMSE using Surprise kNN_baseline: 0.997609482251249

Modelling using item based Surprise kNN Baseline:
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
Test RMSE using Surprise kNN_baseline: 0.9866224991008465
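The Surprise documentation gives the user-based KNNBaseline prediction as $\hat r_{ui} = b_{ui} + \frac{\sum_{v \in N_i^k(u)} \text{sim}(u, v)\,(r_{vi} - b_{vi})}{\sum_{v \in N_i^k(u)} \text{sim}(u, v)}$, where $N_i^k(u)$ are the $k$ users most similar to $u$ who rated $i$. The NumPy sketch below implements that formula for one prediction; the dense `R`/`mask` layout, the given baseline matrix, and skipping non-positive similarities are simplifying assumptions, not Surprise's actual code.

```python
import numpy as np

def knn_baseline_predict(u, i, R, mask, baseline, sim, k):
    """User-based kNN-with-baseline prediction for user u and item i."""
    raters = [v for v in np.flatnonzero(mask[:, i]) if v != u]
    # Keep the k users most similar to u among those who rated item i.
    neighbours = sorted(raters, key=lambda v: sim[u, v], reverse=True)[:k]
    neighbours = [v for v in neighbours if sim[u, v] > 0]
    num = sum(sim[u, v] * (R[v, i] - baseline[v, i]) for v in neighbours)
    den = sum(sim[u, v] for v in neighbours)
    # Without usable neighbours, fall back to the baseline estimate.
    return baseline[u, i] + (num / den if den > 0 else 0.0)

R = np.array([[0.0, 4.0], [3.0, 5.0], [2.0, 1.0]])  # 0 marks a missing rating
mask = R > 0
baseline = np.full(R.shape, 3.0)                    # toy baseline b_ui
sim = np.array([[1.0, 0.8, 0.2], [0.8, 1.0, 0.5], [0.2, 0.5, 1.0]])
print(knn_baseline_predict(0, 0, R, mask, baseline, sim, k=2))
print(knn_baseline_predict(0, 0, R, mask, baseline, sim, k=1))
```

The item-based variant swaps the roles of users and items.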


Run two simpler models from Surprise: SlopeOne and Co-Clustering (the SVD and SVD++ calls are commented out).

In [12]:
print('\nModelling using Surprise SlopeOne:')
models['surprise_slope_one'] = surprise_models.slope_one()['Rating']

#print('\nModelling using Surprise SVD:')
#models['surprise_SVD'] = surprise_models.SVD()['Rating']

#print('\nModelling using Surprise SVD++:')
#models['surprise_SVDpp'] = surprise_models.SVDpp()['Rating']

print('\nModelling using Surprise Co-Clustering:')
models['surprise_co_clustering'] = surprise_models.co_clustering()['Rating']


Modelling using Surprise SlopeOne:
Test RMSE using Surprise slope_one: 0.999198097767189

Modelling using Surprise Co-Clustering:
Test RMSE using Surprise co_clustering: 1.0100282598031178
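As documented for Surprise's SlopeOne, the prediction is $\hat r_{ui} = \mu_u + \frac{1}{|R_i(u)|}\sum_{j \in R_i(u)} \text{dev}(i, j)$, where $\text{dev}(i, j)$ is the average of $r_{ui} - r_{uj}$ over the users who rated both items, and $R_i(u)$ holds the items rated by $u$ that share at least one user with $i$. A small dense-matrix sketch of that formula (the data layout is an assumption):

```python
import numpy as np

def slope_one_predict(u, i, R, mask):
    """Slope One prediction for user u and item i on a dense rating matrix."""
    mu_u = R[u, mask[u]].mean()  # mean rating of user u
    devs = []
    for j in np.flatnonzero(mask[u]):
        if j == i:
            continue
        common = mask[:, i] & mask[:, j]  # users who rated both i and j
        if common.any():
            devs.append(np.mean(R[common, i] - R[common, j]))
    return mu_u + (np.mean(devs) if devs else 0.0)

R = np.array([[0.0, 3.0, 4.0],
              [5.0, 4.0, 0.0],
              [3.0, 0.0, 5.0]])  # 0 marks a missing rating
mask = R > 0
print(slope_one_predict(0, 0, R, mask))
```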


Run the blending algorithm to find the optimal weights of the blended (combined) model.

In [13]:
blending = Blending(models, data.test_df['Rating'])

print('\nModelling using weighted averaging of the previous models.')
optimal_weights = blending.optimize_weighted_average()
print('\nOptimal weights: ', optimal_weights)


Modelling using weighted averaging of the previous models.
     fun: 0.9761825961868279
     jac: array([ 1.78217888e-05, -2.65136361e-04,  3.32579017e-04,  1.87158585e-05,
        8.75219703e-05,  1.69582665e-04,  7.85216689e-05,  6.29574060e-05,
       -9.56803560e-05, -1.75088644e-05,  3.53455544e-05,  1.20326877e-05,
       -1.53258443e-05])
 message: 'Optimization terminated successfully.'
    nfev: 633
     nit: 42
    njev: 42
  status: 0
 success: True
       x: array([ 1.92715920e-01, -4.33729900e-01, -2.98515008e-01,  1.97066800e-01,
        1.52011392e-02, -1.89821999e-03, -1.79234920e-01,  3.45241789e-01,
        7.33516634e-01,  2.58685080e-01,  3.52442872e-01, -1.81266862e-01,
       -5.49359673e-04])

Optimal weights:  {'baseline_global_mean': 0.19271591983375924, 'baseline_user_mean': -0.4337299003053698, 'baseline_item_mean': -0.2985150084211989, 'baseline_global_median': 0.1970668001493541, 'baseline_user_median': 0.015201139195270304, 'baseline_item_median': -0.0018
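The optimizer output above contains one weight per model (13 here) and several of them are negative, which indicates the weights are neither constrained to be positive nor to sum to one. Under the assumption that `optimize_weighted_average` minimizes the test RMSE of a linear combination without intercept, the same minimizer can be obtained in closed form with linear least squares, because RMSE is a monotone function of the sum of squared errors. `optimal_blend_weights` and `blend` are hypothetical helpers for illustration.

```python
import numpy as np

def optimal_blend_weights(predictions, y_true):
    """Least-squares weights for a dict of name -> prediction vector."""
    names = list(predictions)
    X = np.column_stack([np.asarray(predictions[n], dtype=float) for n in names])
    w, *_ = np.linalg.lstsq(X, np.asarray(y_true, dtype=float), rcond=None)
    return dict(zip(names, w))

def blend(predictions, weights):
    """Weighted sum of the model predictions."""
    return sum(w * np.asarray(predictions[n], dtype=float) for n, w in weights.items())

rng = np.random.default_rng(0)
preds = {"model_a": rng.uniform(1, 5, 50), "model_b": rng.uniform(1, 5, 50)}
y = 0.7 * preds["model_a"] + 0.3 * preds["model_b"]  # synthetic target
weights = optimal_blend_weights(preds, y)
blended = blend(preds, weights)
print(weights)
```

An iterative optimizer such as the one reported above should land on the same weights up to its tolerance; the closed form just avoids the iterations.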