## Collaborative Filtering
#### Model-Based Approach

In [2]:
import pandas as pd
# import SVD from surprise
from surprise import SVD

# import Dataset and Reader from surprise
from surprise import Dataset
from surprise import Reader
import surprise

# import accuracy from surprise
from surprise import accuracy

# import train_test_split from surprise.model_selection
from surprise.model_selection import train_test_split
# import GridSearchCV from surprise.model_selection
from surprise.model_selection import GridSearchCV
# import cross_validate from surprise.model_selection
from surprise.model_selection import cross_validate

We will be working with the [same data](https://drive.google.com/file/d/1WvTmAfO09TCX7xp7uu06__ziic7JnrL5/view?usp=sharing) we used in the previous exercise.

In [3]:
book_ratings = pd.read_csv('data/BX-Book-Ratings.csv',sep=";", encoding="latin")

In [4]:
book_ratings

|  | User-ID | ISBN | Book-Rating |
|---|---|---|---|
| 0 | 276725 | 034545104X | 0 |
| 1 | 276726 | 0155061224 | 5 |
| 2 | 276727 | 0446520802 | 0 |
| 3 | 276729 | 052165615X | 3 |
| 4 | 276729 | 0521795028 | 6 |
| ... | ... | ... | ... |
| 1149775 | 276704 | 1563526298 | 9 |
| 1149776 | 276706 | 0679447156 | 0 |
| 1149777 | 276709 | 0515107662 | 10 |
| 1149778 | 276721 | 0590442449 | 10 |


In [5]:
book_ratings.shape

(1149780, 3)

* create a Surprise dataset from `book_ratings`

In [6]:
reader = Reader(rating_scale=(0, 10))

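# load only the first 1,000 ratings of the DataFrame (a small subset of the full data);
# load_from_df reads the columns in the order user ID, item ID, rating
data = Dataset.load_from_df(book_ratings[:1000], reader)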
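Surprise documents that the DataFrame passed to `Dataset.load_from_df` must have exactly three columns: user IDs, item IDs and ratings, in that order. The ratings should also fall inside the `Reader`'s `rating_scale`. Below is a minimal pre-flight check written in plain pandas. `check_ratings_frame` is a hypothetical helper, not part of Surprise, and the toy rows are copied from the output above.

```python
import pandas as pd


def check_ratings_frame(df: pd.DataFrame, rating_scale: tuple) -> None:
    """Hypothetical helper: sanity-check a ratings frame before Dataset.load_from_df.

    Assumes the layout Surprise documents: exactly three columns,
    ordered as user ID, item ID, rating.
    """
    if df.shape[1] != 3:
        raise ValueError(f"expected 3 columns (user, item, rating), got {df.shape[1]}")
    low, high = rating_scale
    ratings = df.iloc[:, 2]  # the third column is interpreted as the rating
    if not ratings.between(low, high).all():
        raise ValueError(f"ratings outside {rating_scale}")


# Toy frame with three rows taken from the book_ratings output above
toy = pd.DataFrame({
    "User-ID": [276725, 276726, 276729],
    "ISBN": ["034545104X", "0155061224", "052165615X"],
    "Book-Rating": [0, 5, 3],
})
check_ratings_frame(toy, (0, 10))  # passes silently
```

A scale of `(1, 10)` would reject this frame because it contains a 0 rating. That is why the notebook's `Reader` uses `(0, 10)`.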

In [7]:
data

<surprise.dataset.DatasetAutoFolds at 0x15d800052e0>

* split the data into train and test sets, using a test size of 15%

In [8]:
X_train, X_test = train_test_split(data, test_size=0.15)

In [9]:
X_train

<surprise.trainset.Trainset at 0x15d80005340>
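Surprise's `train_test_split` shuffles the ratings and holds out a fraction of them. `X_train` comes back as a `Trainset`, and `X_test` is a plain list of (user, item, rating) tuples. The plain-Python sketch below illustrates the same idea. It is not Surprise's code: `holdout_split` is a hypothetical helper, and rounding the test count up with `math.ceil` is an assumption of this sketch.

```python
import math
import random


def holdout_split(ratings, test_size=0.15, seed=None):
    """Shuffle rating tuples and hold out a fraction of them as a test set."""
    rng = random.Random(seed)
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    n_test = math.ceil(test_size * len(shuffled))  # assumption: round the test count up
    return shuffled[n_test:], shuffled[:n_test]


# 1,000 synthetic (user, item, rating) tuples, matching book_ratings[:1000] in size only
ratings = [(f"u{n % 50}", f"i{n}", n % 11) for n in range(1000)]
train, test = holdout_split(ratings, test_size=0.15, seed=0)
print(len(train), len(test))  # 850 150
```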

* use SVD (with default settings) to create recommendations for each user
    - print the default model's RMSE on the test set (using the `accuracy` module imported at the beginning)

In [10]:
# train SVD with its default hyperparameters; fit() returns the fitted algorithm itself
alg = SVD()
output = alg.fit(X_train)
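Surprise's documentation describes `SVD` as biased matrix factorization, popularized by Simon Funk during the Netflix Prize. Each rating is predicted as `mu + b_u + b_i + q_i · p_u`. The user and item biases and factor vectors are learned by stochastic gradient descent with L2 regularization. The NumPy sketch below implements that update loop on a six-rating toy matrix. It is not Surprise's code, and the learning rate and epoch count were chosen for the toy data. They are not Surprise's defaults, which are `n_factors=100`, `n_epochs=20`, `lr_all=0.005` and `reg_all=0.02`.

```python
import numpy as np


def fit_biased_mf(ratings, n_users, n_items, n_factors=2, n_epochs=300,
                  lr=0.05, reg=0.02, seed=0):
    """Biased matrix factorization trained with SGD (a sketch of the SVD model idea).

    ratings: list of (user_index, item_index, rating) with 0-based integer indices.
    Returns a function predicting mu + b_u + b_i + q_i . p_u.
    """
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in ratings])          # global mean, kept fixed
    bu = np.zeros(n_users)                            # user biases
    bi = np.zeros(n_items)                            # item biases
    pu = rng.normal(0.0, 0.1, (n_users, n_factors))   # user factor vectors
    qi = rng.normal(0.0, 0.1, (n_items, n_factors))   # item factor vectors
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + bu[u] + bi[i] + qi[i] @ pu[u])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            pu_old = pu[u].copy()  # both factor updates use pre-update values
            pu[u] += lr * (err * qi[i] - reg * pu[u])
            qi[i] += lr * (err * pu_old - reg * qi[i])
    return lambda u, i: mu + bu[u] + bi[i] + qi[i] @ pu[u]


toy_ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
predict = fit_biased_mf(toy_ratings, n_users=3, n_items=3)
train_rmse = np.sqrt(np.mean([(r - predict(u, i)) ** 2 for u, i, r in toy_ratings]))
print(f"training RMSE: {train_rmse:.3f}")
```

In this sketch, `reg` plays the role that `reg_all` plays in the grid search later on. `n_factors` is the length of each `p_u` and `q_i` vector.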

In [11]:
uids = book_ratings['User-ID'].unique()
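The cell above collects `uids`, but the notebook never builds the per-user recommendations the task describes. One recipe, similar to the one in the Surprise FAQ, has three steps. First, estimate ratings for the items each user has not rated, for example with `alg.test(X_train.build_anti_testset())`. Next, read each `Prediction`'s `uid`, `iid` and `est` fields. Finally, keep each user's top n items. The sketch below covers only that last step. It assumes the (user, item, estimate) triples have already been extracted, and the book IDs are hypothetical.

```python
import heapq
from collections import defaultdict


def top_n_per_user(predictions, n=10):
    """Group (user, item, estimated_rating) triples by user and keep the n highest estimates."""
    by_user = defaultdict(list)
    for uid, iid, est in predictions:
        by_user[uid].append((iid, est))
    return {
        uid: heapq.nlargest(n, items, key=lambda pair: pair[1])
        for uid, items in by_user.items()
    }


preds = [
    ("u1", "bookA", 7.5), ("u1", "bookB", 9.1), ("u1", "bookC", 4.0),
    ("u2", "bookA", 6.2), ("u2", "bookC", 8.8),
]
print(top_n_per_user(preds, n=2))
# {'u1': [('bookB', 9.1), ('bookA', 7.5)], 'u2': [('bookC', 8.8), ('bookA', 6.2)]}
```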

In [12]:
accuracy.rmse(output.test(X_test))

RMSE: 3.6284


3.6283620779413024
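The 3.63 above is on the notebook's 0 to 10 rating scale. RMSE is the square root of the mean squared difference between true and estimated ratings, and MAE is the mean absolute difference. The sketch below computes both from (true, estimated) pairs. Surprise's `accuracy.rmse` and `accuracy.mae` take a list of `Prediction` objects instead, so these helper functions are illustrative only.

```python
import math


def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    return math.sqrt(sum((r - est) ** 2 for r, est in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true_rating, estimated_rating) pairs."""
    return sum(abs(r - est) for r, est in pairs) / len(pairs)


# One error of 4, three perfect predictions
pairs = [(0, 4.0), (10, 10.0), (5, 5.0), (6, 6.0)]
print(rmse(pairs), mae(pairs))  # 2.0 1.0
```

A single large error doubles RMSE relative to MAE here. This is why RMSE is the stricter of the two metrics used in the grid search.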

* create a parameter grid with these values:
    - 'n_factors': [110, 120, 140, 160]
    - 'reg_all': [0.08, 0.1, 0.15]

In [13]:
param_grid = {'n_factors': [110, 120, 140, 160], 'reg_all': [0.08, 0.1, 0.15]}
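A grid search tries every combination of the listed values. This grid has 4 × 3 = 12 settings, and with `cv=3` each one is trained and scored on three folds, for 36 fits in total. The sketch below enumerates the combinations with `itertools.product` to make that count concrete. It is an illustration, not Surprise's internal code.

```python
from itertools import product

param_grid = {'n_factors': [110, 120, 140, 160], 'reg_all': [0.08, 0.1, 0.15]}

# Every combination of parameter values, as a list of dicts
keys = list(param_grid)
combos = [dict(zip(keys, values)) for values in product(*param_grid.values())]

cv = 3
print(len(combos), len(combos) * cv)  # 12 36
print(combos[0])                      # {'n_factors': 110, 'reg_all': 0.08}
```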

* instantiate GridSearchCV with SVD as the model, our predefined parameter grid, and RMSE and MAE as evaluation metrics

In [14]:
# the task asks for SVD (not SVDpp); the grid is searched with 3-fold cross-validation
ga = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3)

* fit GridSearchCV on the training data

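In [17]:
# GridSearchCV.fit expects a Dataset (it builds the CV folds itself), not a Trainset,
# so rebuild a Dataset from the training ratings only; X_test stays held out
train_df = pd.DataFrame(X_train.build_testset(), columns=['User-ID', 'ISBN', 'Book-Rating'])
ga.fit(Dataset.load_from_df(train_df, reader))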

* print the best RMSE score from the grid search

* predict the test set with the optimal model according to `RMSE`

* print the optimal model's RMSE on the test set
    - is it better than the default parameters?
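In Surprise, the grid-search results are exposed as `ga.best_score['rmse']`, `ga.best_params['rmse']` and `ga.best_estimator['rmse']`. With `refit` left at its default of `False`, the best estimator should be fitted on `X_train` before calling `.test(X_test)`. The sketch below walks through the same workflow end to end in plain Python. It runs k-fold cross-validation on the training split only, picks the setting with the lowest mean RMSE, refits on the whole training split, and compares default and tuned settings on the held-out test set. To keep the loop short, it swaps SVD for a much simpler stand-in model, a shrunk item-mean baseline whose `reg` plays the role of `reg_all`. The model, helper names and synthetic data are all assumptions of the sketch.

```python
import math
import random
from statistics import mean


def fit_item_baseline(train, reg):
    """Stand-in model: global mean plus a shrunk per-item bias.

    b_i = sum(r - mu) / (reg + n_i), so a larger reg pulls rarely rated items toward mu.
    """
    mu = mean(r for _, _, r in train)
    sums, counts = {}, {}
    for _, i, r in train:
        sums[i] = sums.get(i, 0.0) + (r - mu)
        counts[i] = counts.get(i, 0) + 1
    bias = {i: sums[i] / (reg + counts[i]) for i in sums}
    return lambda u, i: mu + bias.get(i, 0.0)  # unseen items fall back to mu


def rmse(model, data):
    """RMSE of a fitted model over (user, item, rating) triples."""
    return math.sqrt(mean((r - model(u, i)) ** 2 for u, i, r in data))


def grid_search_cv(train, grid, k=3, seed=0):
    """Score each reg value by its mean k-fold CV RMSE on the training data only."""
    rows = train[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[f::k] for f in range(k)]
    scores = {}
    for reg in grid:
        fold_scores = []
        for f in range(k):
            fit_rows = [row for g in range(k) if g != f for row in folds[g]]
            fold_scores.append(rmse(fit_item_baseline(fit_rows, reg), folds[f]))
        scores[reg] = mean(fold_scores)
    return min(scores, key=scores.get), scores


# Synthetic 0-10 ratings: each item has a hidden quality, each rating adds noise
rng = random.Random(42)
quality = {i: rng.uniform(2, 8) for i in range(40)}
ratings = [(u, i, min(10, max(0, round(quality[i] + rng.gauss(0, 2)))))
           for u in range(60) for i in rng.sample(range(40), 5)]
rng.shuffle(ratings)
n_test = math.ceil(0.15 * len(ratings))
test, train = ratings[:n_test], ratings[n_test:]

best_reg, scores = grid_search_cv(train, grid=[0, 1, 5, 25])
print("best reg:", best_reg, "| best CV RMSE:", round(scores[best_reg], 3))

# Refit on the whole training split, then compare default vs tuned on the test set
default_rmse = rmse(fit_item_baseline(train, reg=0), test)
tuned_rmse = rmse(fit_item_baseline(train, reg=best_reg), test)
print("test RMSE, default:", round(default_rmse, 3), "| tuned:", round(tuned_rmse, 3))
```

The tuned setting is not guaranteed to win on the test set. Cross-validation selects the best setting for the training folds, and a small held-out set can disagree. The comparison asked for in the last bullet is a check, not a foregone conclusion.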