# Recommender Systems with Surprise
From: http://surpriselib.com/

Run `!pip install surprise` in an empty cell to install the Surprise package.

### Overview
Surprise is a Python scikit for building and analyzing recommender systems that deal with explicit rating data.
Surprise was designed with the following purposes in mind:

> Give users perfect control over their experiments.

> Alleviate the pain of Dataset handling. Users can use both built-in datasets (Movielens, Jester), and their own custom datasets.


> Provide various ready-to-use prediction algorithms such as baseline algorithms, neighborhood methods, matrix factorization-based methods (SVD, PMF, SVD++, NMF), and many others. Also, various similarity measures (cosine, MSD, pearson…) are built-in.

> Make it easy to implement new algorithm ideas.

> Provide tools to evaluate, analyze and compare the algorithms’ performance. Cross-validation procedures can be run very easily using powerful CV iterators (inspired by scikit-learn's excellent tools), as well as exhaustive search over a set of parameters.

## Data & problem

In [1]:
from surprise import SVD
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import train_test_split, KFold, GridSearchCV

import pandas as pd
import numpy as np

In [2]:
data = Dataset.load_builtin(name='ml-100k')

### Explore your data
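
One simple way to explore rating data is to count users, items, and how often each rating value occurs. The sketch below runs on a small, made-up list of `(user, item, rating)` tuples; the `ratings` list and all variable names are hypothetical and are not taken from ml-100k. As far as I know, Surprise keeps the loaded ratings in `data.raw_ratings` as tuples that also carry a timestamp, so the same counting should carry over once the extra field is unpacked.

```python
from collections import Counter

# Hypothetical toy ratings: (user_id, item_id, rating). Not real ml-100k data.
ratings = [
    ("u1", "i1", 4.0),
    ("u1", "i2", 3.0),
    ("u2", "i1", 5.0),
    ("u2", "i3", 4.0),
    ("u3", "i2", 1.0),
]

n_users = len({u for u, _, _ in ratings})
n_items = len({i for _, i, _ in ratings})
rating_counts = Counter(r for _, _, r in ratings)
ratings_per_user = Counter(u for u, _, _ in ratings)
# Share of the user-item matrix that is actually filled in
density = len(ratings) / (n_users * n_items)

print(n_users, n_items)
print(sorted(rating_counts.items()))
print(f"density: {density:.2f}")
```

A low density is typical for rating data and is the reason a model is needed to fill in the missing user-item pairs.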

### Split your data into train and test sets

In [3]:
train_data, test_data = train_test_split(data, test_size=.25)

### Fit a model to the data

In [4]:
algo = SVD()
algo.fit(train_data)
predictions = algo.test(test_data)
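
For reference, the Surprise documentation describes the `SVD` algorithm as predicting a rating from a global mean, a user bias, an item bias, and a dot product of latent factor vectors, fitted by stochastic gradient descent on a regularized squared error. In the notation of that documentation:

```latex
\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u

\min_{b,\,p,\,q} \sum_{r_{ui} \in R_{\text{train}}}
  \left( r_{ui} - \hat{r}_{ui} \right)^2
  + \lambda \left( b_i^2 + b_u^2 + \lVert q_i \rVert^2 + \lVert p_u \rVert^2 \right)
```

Here $\lambda$ is the regularization strength (`reg_all` sets it for all terms at once), while `lr_all` and `n_epochs` control the step size and the number of passes of the gradient descent.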


### Evaluate your results

In [5]:
# Compute RMSE on the test-set predictions
accuracy.rmse(predictions)

RMSE: 0.9405


0.9405066075770725
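
`accuracy.rmse` reports the root mean squared error between the true and the estimated ratings. A minimal pure-Python sketch of that calculation follows; the `rmse` helper and the `pairs` list are illustrative only and are not Surprise's implementation or data from the run above.

```python
import math

def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    if not pairs:
        raise ValueError("need at least one prediction")
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical (true, estimated) ratings, made up for illustration
pairs = [(4.0, 3.5), (3.0, 3.0), (5.0, 4.0), (1.0, 2.0)]
print(rmse(pairs))
```

Each element of Surprise's `predictions` list has `r_ui` (true rating) and `est` (estimate) fields, so `[(p.r_ui, p.est) for p in predictions]` would produce pairs in this shape.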

In [6]:
# Use cross-validation: define a 5-fold cross-validation iterator
kf = KFold(n_splits=5)

algo = SVD()

rmse_array = []
for train_data, test_data in kf.split(data):
    # Train on the training folds, then predict on the held-out fold
    algo.fit(train_data)
    predictions = algo.test(test_data)

    # Compute and print the root mean squared error for this fold
    rmse = accuracy.rmse(predictions, verbose=True)
    rmse_array.append(rmse)

RMSE: 0.9317
RMSE: 0.9375
RMSE: 0.9359
RMSE: 0.9351
RMSE: 0.9363


In [7]:
# Average RMSE across the 5 folds
np.mean(rmse_array)

0.9352897458779422

In [8]:
# Min RMSE
np.min(rmse_array)

0.9317384560096411

In [9]:
# Try a grid search over SVD hyperparameters

# Define the parameter grid
param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005],
              'reg_all': [0.4, 0.6]}

gs = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3)

gs.fit(data)

# best RMSE score
print(gs.best_score['rmse'])

# combination of parameters that gave the best RMSE score
print(gs.best_params['rmse'])

0.9634328553809479
{'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.4}
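
A grid search tries every combination of the listed values. The sketch below only enumerates the combinations for the same `param_grid` using `itertools.product`; it is an illustration of the search space, not of `GridSearchCV` internals, and the `names`, `combos`, and `cv` variables are introduced here for the example.

```python
from itertools import product

param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005],
              'reg_all': [0.4, 0.6]}

# Build one dict per combination of one value for each parameter
names = list(param_grid)
combos = [dict(zip(names, values)) for values in product(*param_grid.values())]

cv = 3  # number of folds, matching cv=3 above
print(len(combos))       # number of parameter combinations
print(len(combos) * cv)  # number of model fits with 3-fold CV
print(combos[0])
```

With 2 values for each of 3 parameters there are 8 combinations, so 3-fold cross-validation fits 24 models. The cost grows multiplicatively with every value added to the grid.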
