<a href="https://colab.research.google.com/github/sergiokapone/DataScience/blob/main/Hw7/Hw7_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Take the [movielens](https://surprise.readthedocs.io/en/stable/dataset.html) dataset and build a matrix factorization model (in this library the algorithm is called SVD). Tune its hyperparameters with cross-validation, then experiment with other [prediction algorithms](https://surprise.readthedocs.io/en/stable/prediction_algorithms_package.html) (SVD++, NMF) and choose the one that works best.
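For reference, the Surprise documentation describes SVD's prediction as $\hat{r}_{ui} = \mu + b_u + b_i + q_i^T p_u$, where the biases and latent factors are learned by stochastic gradient descent on the regularized squared error. The pure-Python sketch below illustrates that idea on a tiny rating list. It is not the library's implementation: `train_mf`, `rmse` and the toy data are hypothetical, chosen only to mirror the update rules described in the Surprise SVD docs.

```python
import math
import random

def train_mf(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Biased matrix factorization trained with SGD (illustrative sketch)."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    users = {u for u, _, _ in ratings}
    items = {i for _, i, _ in ratings}
    bu = {u: 0.0 for u in users}  # user biases start at zero
    bi = {i: 0.0 for i in items}  # item biases start at zero
    # Latent factors drawn from a normal distribution with std 0.1
    pu = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    qi = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(a * b for a, b in zip(pu[u], qi[i]))

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            # Gradient steps on the regularized squared error
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = pu[u][f], qi[i][f]
                pu[u][f] += lr * (err * qif - reg * puf)
                qi[i][f] += lr * (err * puf - reg * qif)
    return predict

def rmse(predict, ratings):
    return math.sqrt(sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings))

# Hypothetical toy data: (user, item, rating) triples
toy = [("u1", "i1", 5), ("u1", "i2", 3), ("u2", "i1", 4),
       ("u2", "i3", 1), ("u3", "i2", 2), ("u3", "i3", 5)]
predict = train_mf(toy)
print(f"train RMSE: {rmse(predict, toy):.3f}")
```

The parameters `n_factors`, `n_epochs`, `lr` and `reg` play the same roles as `n_factors`, `n_epochs`, `lr_all` and `reg_all` in the search space tuned below.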


In [1]:
!pip install scikit-surprise hyperopt



In [2]:
from surprise import SVD, SVDpp, NMF
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import train_test_split, cross_validate
from hyperopt import hp, fmin, tpe, Trials


In [3]:
# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k')

In [4]:
# Randomly split the ratings into a trainset and a testset;
# the testset holds 25% of the ratings.
train_set, test_set = train_test_split(data, test_size=0.25)
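This cell creates a single random 75/25 hold-out split, and the tuning below scores every candidate on that one `test_set`. Cross-validation, which the task asks for, instead averages the score over k disjoint test folds (Surprise offers this through `cross_validate`, used later in the notebook). A plain-Python sketch of both schemes on rating indices; `holdout_split` and `kfold_splits` are hypothetical helpers, not Surprise functions:

```python
import random

def holdout_split(n, test_size=0.25, seed=0):
    """Return (train_idx, test_idx) for a single random hold-out split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_size))
    return idx[n_test:], idx[:n_test]

def kfold_splits(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every index lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]

train_idx, test_idx = holdout_split(100)
print(len(train_idx), len(test_idx))  # 75 25
for train, test in kfold_splits(100, k=5):
    print(len(train), len(test))      # 80 20 for each of the 5 folds
```

A hold-out score depends on which 25% happened to be drawn; the k-fold average uses every rating for testing once, so it is a steadier basis for comparing hyperparameters.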

In [28]:
space_svd = {
    'n_factors': hp.choice('n_factors', [50, 100, 150]),
    'n_epochs': hp.choice('n_epochs', [10, 20, 30]),
    'lr_all': hp.uniform('lr_all', 0.001, 0.1),
    'reg_all': hp.uniform('reg_all', 0.01, 0.2),
    'random_state': 42
}

space_nmf = {
    'n_factors': hp.choice('n_factors', [5, 10, 15, 20, 25]),
    'n_epochs': hp.choice('n_epochs', [10, 20, 30, 40, 50]),
    'biased': hp.choice('biased', [True, False]),
    'reg_pu': hp.uniform('reg_pu', 0.001, 0.1),
    'reg_qi': hp.uniform('reg_qi', 0.001, 0.1),
    # The bias learning rates only take effect when biased=True
    'lr_bu': hp.uniform('lr_bu', 0.001, 0.1),
    'lr_bi': hp.uniform('lr_bi', 0.001, 0.1),
    'random_state': 42
}
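Both spaces mix discrete `hp.choice` options with continuous `hp.uniform` ranges. The search in the next cell runs only `max_evals=2` trials per algorithm; as we understand hyperopt's defaults, TPE begins with a batch of purely random trials (20 by default) before it starts modelling the loss, so two evaluations amount to random search. A stdlib sketch of random search over a space shaped like `space_svd`; `SPACE`, `toy_objective` and `random_search` are hypothetical stand-ins (the real objective trains a Surprise model):

```python
import random

# Same ranges as space_svd, written as plain Python samplers (illustrative only)
SPACE = {
    "n_factors": lambda rng: rng.choice([50, 100, 150]),
    "n_epochs": lambda rng: rng.choice([10, 20, 30]),
    "lr_all": lambda rng: rng.uniform(0.001, 0.1),
    "reg_all": lambda rng: rng.uniform(0.01, 0.2),
}

def toy_objective(params):
    """Hypothetical stand-in for objective(): smaller is better."""
    return (abs(params["lr_all"] - 0.005) + abs(params["reg_all"] - 0.02)
            + params["n_factors"] / 1000)

def random_search(objective, space, max_evals, seed=0):
    """Sample max_evals parameter sets and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(max_evals):
        params = {name: sample(rng) for name, sample in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

best, loss = random_search(toy_objective, SPACE, max_evals=50)
print(best, round(loss, 4))
```

Unlike `fmin`, this sketch stores the sampled values themselves, so a choice parameter comes back as, for example, `n_factors=100` rather than as a position in the options list.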

In [29]:
# Objective function: train the model and return its RMSE on the hold-out testset
def objective(params, func):
    model = func(**params)
    model.fit(train_set)
    predictions = model.test(test_set)
    rmse = accuracy.rmse(predictions)
    return rmse

best_params = []
# One search space per algorithm (SVD and SVD++ share the same space)
spaces = [space_svd, space_svd, space_nmf]
for func, space in zip([SVD, SVDpp, NMF], spaces):
    print('-------------------------')
    print(f'Start: {func.__name__}')
    print('-------------------------')
    # For hp.choice parameters, fmin returns the index of the chosen option, not its value
    best = fmin(fn=lambda params: objective(params, func), space=space, algo=tpe.suggest, max_evals=2)
    best_params.append({func.__name__: best})

best_params

-------------------------
Start: SVD
-------------------------
RMSE: 0.9458
RMSE: 0.9936
100%|██████████| 2/2 [00:03<00:00,  1.88s/trial, best loss: 0.9457963458077339]
-------------------------
Start: SVDpp
-------------------------
RMSE: 0.9504
RMSE: 0.9337
100%|██████████| 2/2 [02:56<00:00, 88.37s/trial, best loss: 0.9337113704161145]
-------------------------
Start: NMF
-------------------------
RMSE: 0.9450
RMSE: 1.3555
100%|██████████| 2/2 [00:02<00:00,  1.12s/trial, best loss: 0.9450289140643958]


[{'SVD': {'lr_all': 0.06500877783410028,
   'n_epochs': 0,
   'n_factors': 1,
   'reg_all': 0.1843122134050255}},
 {'SVDpp': {'lr_all': 0.06220343496137148,
   'n_epochs': 2,
   'n_factors': 0,
   'reg_all': 0.15178561592744472}},
 {'NMF': {'biased': 0,
   'lr_bi': 0.007639956045705341,
   'lr_bu': 0.08109029459453065,
   'n_epochs': 2,
   'n_factors': 0,
   'reg_pu': 0.05387519948683046,
   'reg_qi': 0.09478436121820444}}]
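The dictionary above is raw `fmin` output: for `hp.choice` parameters hyperopt returns the index of the chosen option rather than the option itself (hence `'n_epochs': 0`, `'n_factors': 1`, `'biased': 0`), and the constant `random_state` is not included. Passed to the models unchanged, as in the cells below, these values train SVD with `n_epochs=0` and NMF with `n_factors=0`. hyperopt's `space_eval(space, best)` maps such a result back to actual values; the stdlib sketch below does the same mapping by hand for the SVD entry (`decode_choices` and `SVD_CHOICES` are hypothetical names for illustration):

```python
# Options of every hp.choice parameter in space_svd, copied from the cell above
SVD_CHOICES = {"n_factors": [50, 100, 150], "n_epochs": [10, 20, 30]}

def decode_choices(best, choices, constants=None):
    """Replace hp.choice indices in fmin's result with the chosen values (illustrative)."""
    decoded = {k: (choices[k][v] if k in choices else v) for k, v in best.items()}
    decoded.update(constants or {})
    return decoded

# SVD result exactly as printed above
best_svd = {'lr_all': 0.06500877783410028, 'n_epochs': 0,
            'n_factors': 1, 'reg_all': 0.1843122134050255}
params = decode_choices(best_svd, SVD_CHOICES, constants={'random_state': 42})
print(params["n_factors"], params["n_epochs"])  # 100 10

# The library route (assumed to give the same dict):
# from hyperopt import space_eval
# params = space_eval(space_svd, best)
```

Decoding matters for NMF as well: `'biased': 0` is index 0 of `[True, False]`, i.e. `biased=True`, whereas passing the raw `0` disables the biases.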

In [30]:
# Build the SVD model with the parameters found above
# (this cell reports MSE, whereas the NMF and SVD++ cells report RMSE)
model_svd = SVD(**best_params[0].get('SVD'))
results_svd = cross_validate(model_svd, data, measures=['MSE', 'MAE'], cv=5, verbose=True)

Evaluating MSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
MSE (testset)     1.2691  1.2666  1.2710  1.2669  1.2627  1.2673  0.0028  
MAE (testset)     0.9445  0.9468  0.9466  0.9427  0.9430  0.9447  0.0017  
Fit time          0.04    0.07    0.06    0.08    0.08    0.07    0.02    
Test time         0.14    0.29    0.13    0.22    0.40    0.24    0.10    
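Because this table reports MSE while the NMF and SVD++ tables report RMSE, taking the square root of each fold's MSE puts SVD on the same scale as the others:

```python
import math

# Per-fold MSE of SVD, copied from the table above
svd_mse = [1.2691, 1.2666, 1.2710, 1.2669, 1.2627]
svd_rmse = [math.sqrt(m) for m in svd_mse]
mean_rmse = sum(svd_rmse) / len(svd_rmse)
print([round(r, 4) for r in svd_rmse])
print(f"mean RMSE: {mean_rmse:.4f}")  # about 1.126
```

An RMSE around 1.13 is noticeably worse than the hold-out scores seen during tuning; this is what one would expect given that the parameters printed above set `n_epochs` to 0, so the model is essentially never trained.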


In [31]:
# Build the NMF model with the parameters found above
model_nmf = NMF(**best_params[2].get('NMF'))
results_nmf = cross_validate(model_nmf, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm NMF on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    2.7789  2.7541  2.7655  2.7682  2.7778  2.7689  0.0091  
MAE (testset)     2.5423  2.5143  2.5249  2.5310  2.5383  2.5302  0.0099  
Fit time          0.09    0.13    0.13    0.11    0.12    0.11    0.02    
Test time         0.10    0.10    0.10    0.25    0.10    0.13    0.06    
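An RMSE of about 2.77 on a 1 to 5 scale is far worse than always predicting the average rating. The parameters printed above (`n_factors: 0`, `biased: 0`) suggest a reason: with zero latent factors and no biases, NMF's raw estimate is 0, which, to our reading of Surprise, `predict` clips to the bottom of the rating scale, so every prediction would be 1. Under that assumption, $\text{MAE} = \bar{r} - 1$ (all ratings are at least 1) and $\text{MSE} = \sigma^2 + (\bar{r} - 1)^2$. The rough check below uses only this notebook's numbers, taking SVD's mean MSE as a stand-in for the rating variance $\sigma^2$ (SVD with `n_epochs=0` roughly predicts the global mean):

```python
import math

svd_mse = 1.2673           # SVD mean MSE, used as a rough estimate of the rating variance
nmf_mae = 2.5302           # NMF mean MAE from the table above
mean_rating = nmf_mae + 1  # if NMF always predicts 1, MAE = mean rating - 1

# MSE of a constant prediction c: variance + (mean - c)^2, here with c = 1
expected_rmse = math.sqrt(svd_mse + (mean_rating - 1) ** 2)
print(f"implied mean rating: {mean_rating:.2f}")    # 3.53
print(f"expected NMF RMSE:   {expected_rmse:.4f}")  # compare with 2.7689 reported above
```

The estimate agrees with the reported 2.7689 to within about 0.001, which is consistent with (though not proof of) the constant-prediction explanation.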


In [33]:
# Build the SVD++ model with the parameters found above
model_svdpp = SVDpp(**best_params[1].get('SVDpp'))
results_svdpp = cross_validate(model_svdpp, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVDpp on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9580  0.9584  0.9629  0.9689  0.9599  0.9616  0.0040  
MAE (testset)     0.7613  0.7630  0.7638  0.7679  0.7652  0.7642  0.0022  
Fit time          0.64    0.63    0.62    0.62    0.66    0.63    0.02    
Test time         3.05    4.30    3.00    3.00    4.27    3.52    0.62    
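To choose an algorithm as the task asks, the mean test RMSE of the three cross-validation runs can be put side by side (SVD's value is the square root of its reported mean MSE). All three runs used the raw `best_params` printed earlier (for example `n_epochs: 0` for SVD), so this ranks the runs as executed rather than each algorithm at its best:

```python
import math

# Mean 5-fold CV results copied from the tables above (SVD reported MSE, so take its root)
mean_rmse = {
    "SVD": math.sqrt(1.2673),
    "NMF": 2.7689,
    "SVDpp": 0.9616,
}
mean_test_time_s = {"SVD": 0.24, "NMF": 0.13, "SVDpp": 3.52}

best = min(mean_rmse, key=mean_rmse.get)
for name in sorted(mean_rmse, key=mean_rmse.get):
    print(f"{name:6s} RMSE={mean_rmse[name]:.4f}  test time={mean_test_time_s[name]:.2f}s")
print("lowest RMSE:", best)  # SVDpp
```

SVD++ has the lowest RMSE here, but its mean test time per fold is roughly 15 times that of SVD (3.52 s vs 0.24 s) and its fit time roughly 9 times (0.63 s vs 0.07 s), a cost worth weighing alongside accuracy.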
