# Applying algorithms using recommender library

This involves applying different recommenders using an algorithm library

Some resources to get started, and for different algorithm options

- http://surpriselib.com/
- http://surprise.readthedocs.io/en/stable/


## importing libraries

In [1]:
from __future__ import (absolute_import, division, print_function, unicode_literals)

# import surprise
from surprise import evaluate, print_perf, dump, Reader, Dataset
#import algorithms from surprise
from surprise.model_selection import cross_validate
from surprise.model_selection import GridSearchCV
import time

ImportError: No module named model_selection

Importing surprise algorithms

### KNN Algorithms

In [None]:
from surprise import KNNBasic
from surprise import KNNBaseline
from surprise import KNNWithMeans
from surprise import KNNWithZScore

### Matrix Factorization

In [None]:
from surprise import NMF
from surprise import SVD

### Base Algorithms, SlopeOne, NormalPredictor,Baseline

In [2]:
from surprise import AlgoBase
from surprise import SlopeOne
from surprise import NormalPredictor
from surprise import BaselineOnly

### Co-Clustering

In [3]:
from surprise import CoClustering

In [4]:
import pandas as pd
import numpy as np


In [5]:
from surprise import accuracy
import numpy as np
import pandas as pd
np.random.seed(101)
from collections import defaultdict
import os, io, sys

In [6]:
mae = 0
rmse = 0

In [7]:
df_ratings = pd.read_csv("cleaned_ratings.csv", low_memory=False)

IOError: File cleaned_ratings.csv does not exist

In [None]:
df_ratings.shape

In [None]:
df_ratings.head()

In [None]:
def compute_recommendations(algo):
    global df_ratings
    df_ratings=df_ratings[['user_id','item_id','rating']]
    df_ratings = df_ratings.dropna()
    df_ratings = df_ratings.drop_duplicates()

    #formatting the dataset using the surprise library
    reader = Reader(line_format='user item rating', sep=',', rating_scale=(1, 5))
    data = Dataset.load_from_df(df_ratings, reader=reader)
    training_set = data.build_full_trainset()

    algo = algo # use the singular value decomposition
#     cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

#     algorithm.fit(training_set)# fit the data to the model
#     testing_set = training_set.build_anti_testset()
#     predictions = algorithm.test(testing_set)# make prediction



    param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005],
                  'reg_all': [0.4, 0.6]}
    
    print("\nGrid Search time")
    start_time = time.time()
    gs = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3)
    gs.fit(data)
    print(time.time() - start_time, "seconds")
    
    # best RMSE score
    print(gs.best_score['rmse'])

    # combination of parameters that gave the best RMSE score
    print(gs.best_params['rmse'])
    
    
    algo = gs.best_estimator['rmse']
    
    print("\nCross Validation Score")
    cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

    
    print("\nAlgorithm Train Time")
    start_time = time.time()
    algo.fit(training_set)# fit the data to the model
    print(time.time() - start_time, "seconds")
    
    
    print("\nAlgorithm BUILD Anti-Testset Time")
    start_time = time.time()
    algo.fit(training_set)# fit the data to the model
    print(time.time() - start_time, "seconds")
    
    
    print("\nAlgorithm BUILD Anti-Testset Time")
    start_time = time.time()
    testing_set = training_set.build_anti_testset()
    print(time.time() - start_time, "seconds")
    
    
    
    print("\nAlgorithm Testing Anti_testset Time")
    start_time = time.time()
    predictions = algo.test(testing_set)# make prediction
    print(time.time() - start_time, "seconds")
    
#     print("\nMean Absolute Error", float(accuracy.mae(predictions)))
#     print("\nMean Square Error", float(accuracy.rmse(predictions)))

    global mae
    global rmse
    mae = accuracy.mae(predictions)
    rmse = accuracy.rmse(predictions)
    mae = float(mae)
    rmse = float(rmse)
    print("\nMean Absolute Error", mae)
    print("\nMean Square Error", rmse)

## A KNN Algorithm

In [48]:
mae = 0
rmse = 0
print("Basic KNN Algorithm")
compute_recommendations(KNNBasic())

Basic KNN Algorithm

Grid Search time
9.13635802268982 seconds
0.9755618541913663
{'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.4}

Cross Validation Score
Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9693  0.9644  0.9798  0.9618  0.9668  0.9684  0.0062  
MAE (testset)     0.7420  0.7376  0.7530  0.7372  0.7420  0.7424  0.0057  
Fit time          1.45    1.53    1.53    1.35    1.35    1.44    0.08    
Test time         0.10    0.08    0.09    0.07    0.07    0.08    0.01    

Algorithm Train Time
1.5632081031799316 seconds

Algorithm BUILD Anti-Testset Time
1.9908599853515625 seconds

Algorithm BUILD Anti-Testset Time
3.8130252361297607 seconds

Algorithm Testing Anti_testset Time
39.820473432540894 seconds
MAE:  0.2643
RMSE: 0.3401

Mean Absolute Error 0.26425497043915624

Mean Square Error 0.340141389660591


## A Matrix Factorization Algorithm

In [49]:
mae = 0
rmse = 0

print("SVD Algorithm")
compute_recommendations(SVD())

SVD Algorithm

Grid Search time
7.8044047355651855 seconds
0.9761377175797857
{'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.4}

Cross Validation Score
Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9619  0.9820  0.9727  0.9725  0.9504  0.9679  0.0108  
MAE (testset)     0.7349  0.7507  0.7438  0.7464  0.7330  0.7418  0.0068  
Fit time          1.17    1.15    1.11    1.15    1.07    1.13    0.03    
Test time         0.11    0.06    0.06    0.06    0.06    0.07    0.02    

Algorithm Train Time
1.3836185932159424 seconds

Algorithm BUILD Anti-Testset Time
1.3383402824401855 seconds

Algorithm BUILD Anti-Testset Time
3.345135450363159 seconds

Algorithm Testing Anti_testset Time
38.42778992652893 seconds
MAE:  0.2644
RMSE: 0.3402

Mean Absolute Error 0.2643806888672658

Mean Square Error 0.34020838251695457


## A Basic Algorithm

In [43]:
mae = 0
rmse = 0

print("Random Rating Algorithm")
compute_recommendations(NormalPredictor())

Random Rating Algorithm

Grid Search time
9.545884370803833 seconds
0.976116211960863
{'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.4}

Cross Validation Score
Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9718  0.9540  0.9652  0.9815  0.9673  0.9679  0.0089  
MAE (testset)     0.7422  0.7379  0.7388  0.7492  0.7421  0.7420  0.0040  
Fit time          1.31    1.47    1.63    1.64    1.60    1.53    0.13    
Test time         0.09    0.09    0.12    0.08    0.09    0.10    0.01    

Algorithm Run Time
1.7000212669372559 seconds

Algorithm BUILD Anti-Testset Time
1.623244047164917 seconds

Algorithm BUILD Anti-Testset Time
4.306072950363159 seconds

Algorithm Testing Anti_testset Time
45.2378089427948 seconds
MAE:  0.2643

Mean Absolute Error 0.26428393657409294
RMSE: 0.3402

Mean Square Error 0.34023059234724684


## A Co-Clustering Algorithm

In [50]:
mae = 0
rmse = 0

print("A co-clustering Algorithm")
compute_recommendations(CoClustering())

A co-clustering Algorithm

Grid Search time
7.760693550109863 seconds
0.9760097962144766
{'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.4}

Cross Validation Score
Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9769  0.9537  0.9721  0.9625  0.9702  0.9671  0.0081  
MAE (testset)     0.7410  0.7333  0.7469  0.7421  0.7431  0.7413  0.0045  
Fit time          1.15    1.17    1.22    1.13    1.08    1.15    0.04    
Test time         0.10    0.06    0.06    0.06    0.06    0.07    0.01    

Algorithm Train Time
1.3719029426574707 seconds

Algorithm BUILD Anti-Testset Time
1.3392219543457031 seconds

Algorithm BUILD Anti-Testset Time
3.3509840965270996 seconds

Algorithm Testing Anti_testset Time
38.21390223503113 seconds
MAE:  0.2643
RMSE: 0.3401

Mean Absolute Error 0.2642843964469402

Mean Square Error 0.340110313546458
