## Collaborative Filtering
#### Model-Based Approach

In [13]:
import pandas as pd
# Surprise: dataset loading, the SVD algorithm, and accuracy metrics
from surprise import Dataset, Reader, SVD, accuracy
# Surprise model-selection helpers: hold-out split, grid search, cross-validation
from surprise.model_selection import train_test_split, GridSearchCV, cross_validate

We will be working with the [same data](https://drive.google.com/file/d/1WvTmAfO09TCX7xp7uu06__ziic7JnrL5/view?usp=sharing) we used in the previous exercise.

In [14]:
# Raw string so the backslashes in the Windows path are not treated as escape sequences
book_ratings = pd.read_csv(r'E:\Vocational\Lighthouse Labs\Flex Course\C08_Machine Learning Application\exercise_recommender_engines\data\bx_book_ratings.csv', sep=";", encoding="latin")

* Create a Surprise dataset from `book_ratings`.

In [15]:
reader = Reader(rating_scale=(0, 10))

# Load the pandas DataFrame; Surprise expects exactly three columns
# in the order user, item, rating
data = Dataset.load_from_df(book_ratings, reader)

* Split the data into training and test sets, using a test size of 15%.

In [16]:
# Split the dataset into training and testing sets
trainset, testset = train_test_split(data, test_size=0.15, random_state=42)

* Use SVD (with default settings) to create recommendations for each user.
    - Print the default model's RMSE on the test set (using the `accuracy` module imported at the beginning).

In [17]:
# Use SVD with default settings to create recommendations
default_svd = SVD()
default_svd.fit(trainset)
predictions = default_svd.test(testset)
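
For intuition about what the fitted model computes for each (user, item) pair in the test set: Surprise's `SVD` estimates a rating as the global mean plus a user bias, an item bias, and the dot product of the user and item latent-factor vectors, and by default clips the result to the rating scale. The following is a minimal pure-Python sketch of that rule. The function name `svd_estimate` and every numeric value are hypothetical, chosen only for illustration, and are not taken from the fitted model.

```python
def svd_estimate(global_mean, user_bias, item_bias, user_factors, item_factors,
                 rating_scale=(0, 10)):
    """Biased matrix-factorization estimate: mu + b_u + b_i + q_i . p_u."""
    # Dot product of the user and item latent-factor vectors
    dot = sum(p * q for p, q in zip(user_factors, item_factors))
    raw = global_mean + user_bias + item_bias + dot
    # Clip the estimate to the rating scale
    low, high = rating_scale
    return min(high, max(low, raw))

# Hypothetical values for one (user, item) pair with two latent factors
estimate = svd_estimate(
    global_mean=2.9, user_bias=0.4, item_bias=-0.2,
    user_factors=[0.5, 1.0], item_factors=[1.0, 0.5],
)
print(round(estimate, 2))
```

The real model uses `n_factors` latent dimensions per user and item (100 by default), learned by stochastic gradient descent during `fit`.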

In [18]:
# Print default model's RMSE
default_rmse = accuracy.rmse(predictions, verbose=True)

RMSE: 3.4989
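
`accuracy.rmse` reports the root mean squared error between the true and the estimated ratings. As a sketch of the arithmetic, the helper below (a hypothetical `rmse_by_hand`, not part of Surprise) computes the same quantity from plain (true, estimated) pairs. The sample pairs are made up for illustration.

```python
import math

def rmse_by_hand(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up (true, estimated) ratings on the 0-10 scale
sample_pairs = [(8, 5), (0, 4), (10, 10)]
sample_rmse = rmse_by_hand(sample_pairs)
print(f"RMSE: {sample_rmse:.4f}")
```

With the notebook's `predictions`, the pairs would be `[(p.r_ui, p.est) for p in predictions]`, since each Surprise prediction carries the true rating `r_ui` and the estimate `est`.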


* Create a parameter grid with these values:
    - 'n_factors': [110, 120, 140, 160]
    - 'reg_all': [0.08, 0.1, 0.15]

In [19]:
# Create a parameter grid
param_grid = {
    'n_factors': [110, 120, 140, 160],
    'reg_all': [0.08, 0.1, 0.15]
}

* Instantiate `GridSearchCV` with SVD as the model, the parameter grid defined above, and RMSE and MAE as evaluation metrics.

In [20]:
# Instantiate GridSearchCV with SVD as the model, using 3-fold cross-validation
grid_search = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3)
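
Grid search fits one model per parameter combination per cross-validation fold, so the cost grows multiplicatively. A quick sketch that enumerates the grid above with `itertools.product` (standard library; the variable names `combinations`, `cv_folds`, and `total_fits` are illustrative) shows how many fits `cv=3` implies:

```python
from itertools import product

param_grid = {
    'n_factors': [110, 120, 140, 160],
    'reg_all': [0.08, 0.1, 0.15],
}
cv_folds = 3  # matches cv=3 in the GridSearchCV call

# Every combination of one value per parameter
combinations = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
total_fits = len(combinations) * cv_folds
print(len(combinations), "combinations,", total_fits, "model fits")
```

That is 12 combinations and 36 SVD fits, which is why this step takes noticeably longer than fitting the single default model.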

* Fit the grid search.

In [21]:
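# Fit the grid search. Note: `data` is the full dataset, so the cross-validation
# folds also include the ratings held out in `testset`.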
grid_search.fit(data)

* Print the best RMSE score from training (the mean cross-validated RMSE of the best parameter combination).

In [24]:
# Print the best mean cross-validated RMSE found by the grid search
print(f"Best RMSE score from training: {grid_search.best_score['rmse']}")

# Get an SVD instance configured with the best RMSE parameters (not yet fitted)
best_svd = grid_search.best_estimator['rmse']

Best RMSE score from training: 3.4432040918906206


* Predict the test set with the optimal model based on RMSE.

In [26]:
# Train the best model on the entire training set
best_svd.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x1a230ba2650>

In [27]:
# Predict the test set with the optimal model
best_predictions = best_svd.test(testset)

* Print the optimal model's RMSE computed on the test set.
    - Is it better than the default model's RMSE?

In [28]:
# Print optimal model's RMSE on the test set
best_rmse = accuracy.rmse(best_predictions, verbose=True)

RMSE: 3.4310


In [29]:
# Compare with the default parameters
print(f"Default model RMSE: {default_rmse}")
print(f"Optimal model RMSE: {best_rmse}")
print(f"Is the optimal model better? {'Yes' if best_rmse < default_rmse else 'No'}")

Default model RMSE: 3.4989099480619457
Optimal model RMSE: 3.430999654233622
Is the optimal model better? Yes
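
To put the difference in perspective, the relative improvement can be computed from the two test-set RMSE values printed above. This is a small arithmetic sketch: the numbers are copied from the output, and the names `absolute_gain` and `relative_gain_pct` are illustrative.

```python
# Test-set RMSE values copied from the notebook output
default_rmse = 3.4989099480619457
best_rmse = 3.430999654233622

absolute_gain = default_rmse - best_rmse
relative_gain_pct = 100 * absolute_gain / default_rmse
print(f"Absolute improvement: {absolute_gain:.4f}")
print(f"Relative improvement: {relative_gain_pct:.2f}%")
```

The tuned model lowers RMSE by roughly 0.07 rating points on the 0 to 10 scale, just under 2%, so the gain is real but modest.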
