# Exercise: Build a Collaborative Filtering Movie Recommender System with Surprise

In this exercise, you will build a collaborative filtering movie recommender system using either the `KNNWithMeans` or `SVD` algorithm from the `Scikit-Surprise` library. The dataset is the combined MovieLens dataset in the `ratings_movies.csv` file, which you saved at the end of the [previous exercise](./02_exercise_most_popular.ipynb#save-the-combined-dataframe-to-a-csv-file) after merging the `ratings.csv` and `movies.csv` files from the MovieLens dataset.

**Instructions**:
1. Load the combined MovieLens dataset from the `ratings_movies.csv` file.
2. Create a `Reader` object, mapping the rating scale from 0.5 to 5.
3. Load the dataset into a `Dataset` object, using the columns `['userId', 'movieId', 'rating']`.
4. Split the dataset into training and testing sets using the `train_test_split` method from the `model_selection` module.
5. Train a collaborative filtering model using either the `KNNWithMeans` or `SVD` algorithm.
6. Make predictions on the test set and evaluate the model using the `RMSE` metric.
7. Generate top-N movie recommendations for a given user ID using the `get_top_n` function provided in the following notebooks:
   + [KNN notebook](./03_collaborative_filtering_similarity.ipynb#k-nearest-neighbors).
   + [SVD notebook](./03_collaborative_filtering_matrix_factorization.ipynb#singular-value-decomposition).
8. Optionally, make movie recommendations for a new user by providing a list of movie ratings. Follow the steps provided in the following notebooks:
   + [KNN notebook](./03_collaborative_filtering_similarity.ipynb#predictions-for-a-new-user).
   + [SVD notebook](./03_collaborative_filtering_matrix_factorization.ipynb#predictions-for-a-new-user).

In [None]:
# Import libraries
import pandas as pd
import numpy as np
from surprise import Dataset
from surprise import Reader
from surprise import KNNWithMeans, SVD
from surprise.model_selection import GridSearchCV
from surprise.model_selection import train_test_split
from surprise import accuracy
from collections import defaultdict

In [None]:
# Load the combined MovieLens dataset from the `ratings_movies.csv` file.
ratings_movies = pd.read_csv('./data/ratings_movies.csv')
ratings_movies

In [None]:
# Create a `Reader` object using the `rating_scale` parameter set to `(0.5, 5)`.
reader = Reader(rating_scale=(0.5, 5))
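
For intuition, the rating scale tells Surprise the valid range of ratings, and by default its predictions are clipped to that range. The sketch below mimics that clipping in plain Python; the helper `clip_to_scale` is hypothetical and is not part of Surprise.

```python
def clip_to_scale(estimate, rating_scale=(0.5, 5)):
    """Clamp a raw estimate to the rating scale (hypothetical helper)."""
    low, high = rating_scale
    return min(high, max(low, estimate))

print(clip_to_scale(5.7))  # 5
print(clip_to_scale(0.1))  # 0.5
print(clip_to_scale(3.2))  # 3.2
```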

In [None]:
# Load the dataset into a `Dataset` object, using the columns [`userId`, `movieId`, `rating`].
data = Dataset.load_from_df(ratings_movies[['userId', 'movieId','rating']], reader)
data

In [None]:
# Split the dataset into training and testing sets using the `train_test_split` method from the `model_selection` module.
trainset, testset = train_test_split(data, test_size=0.25, random_state=42)
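
Conceptually, a random split with `test_size=0.25` shuffles the individual ratings and holds out a quarter of them. The following is a minimal plain-Python sketch of that idea on made-up data; `split_ratings` is a hypothetical helper, and it will not reproduce the exact rows that Surprise selects.

```python
import random

def split_ratings(ratings, test_size=0.25, seed=42):
    """Shuffle (user, item, rating) tuples and split them (illustrative only)."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    # The first n_test shuffled ratings become the test set.
    return shuffled[n_test:], shuffled[:n_test]

toy_ratings = [(u, i, 3.0) for u in range(4) for i in range(5)]  # 20 made-up ratings
train, test = split_ratings(toy_ratings)
print(len(train), len(test))  # 15 5
```

Note that the split is over ratings, not over users, so the same user usually appears in both sets.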

---
### Train a collaborative filtering model using the `KNNWithMeans` algorithm

In [None]:
# Create an item-based `KNNWithMeans` model with cosine similarity.
sim_options = {
    'name': 'cosine',
    'user_based': False # Item-based
}
knn = KNNWithMeans(sim_options=sim_options, k=40)

# Fit the model to the training data.
knn.fit(trainset)
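
To make the idea behind an item-based, mean-centered KNN model more concrete, here is a rough plain-Python sketch on made-up ratings. It assumes cosine similarity computed over the users two items have in common and a similarity-weighted average of mean-centered ratings. The names `item_ratings`, `cosine_sim`, `item_mean`, and `predict` are illustrative only, and the sketch ignores details of the real implementation such as `min_k`.

```python
from math import sqrt

# Made-up ratings: item -> {user: rating}
item_ratings = {
    "A": {"u1": 5.0, "u2": 3.0, "u3": 4.0},
    "B": {"u1": 4.0, "u2": 2.0, "u4": 5.0},
    "C": {"u1": 1.0, "u3": 2.0, "u4": 4.0},
}

def cosine_sim(ratings_i, ratings_j):
    """Cosine similarity over the users who rated both items."""
    common = ratings_i.keys() & ratings_j.keys()
    if not common:
        return 0.0
    dot = sum(ratings_i[u] * ratings_j[u] for u in common)
    norm_i = sqrt(sum(ratings_i[u] ** 2 for u in common))
    norm_j = sqrt(sum(ratings_j[u] ** 2 for u in common))
    return dot / (norm_i * norm_j)

def item_mean(item):
    """Average rating the item received."""
    values = item_ratings[item].values()
    return sum(values) / len(values)

def predict(user, item, k=40):
    """Item mean plus a similarity-weighted average of mean-centered ratings."""
    neighbors = [
        (cosine_sim(item_ratings[item], item_ratings[j]), j)
        for j in item_ratings
        if j != item and user in item_ratings[j]
    ]
    neighbors = sorted(neighbors, reverse=True)[:k]  # keep the k most similar items
    sim_total = sum(sim for sim, _ in neighbors)
    if sim_total == 0:
        return item_mean(item)
    offset = sum(sim * (item_ratings[j][user] - item_mean(j)) for sim, j in neighbors)
    return item_mean(item) + offset / sim_total

print(round(predict("u2", "C"), 2))
```

Here user `u2` rated items `A` and `B` below their averages, so the estimate for item `C` ends up below the average rating of `C`.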

In [None]:
# Run the model on the test data.
predictions_knn = knn.test(testset)

In [None]:
# Calculate the RMSE of the model.
accuracy.rmse(predictions_knn)
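
As a reminder of what the metric measures, RMSE is the square root of the mean squared difference between actual and estimated ratings. A minimal plain-Python sketch on made-up rating pairs (the `rmse` function here is a stand-in for illustration, not the Surprise function):

```python
from math import sqrt

# Made-up (actual, estimated) rating pairs.
pairs = [(4.0, 3.5), (3.0, 3.0), (5.0, 4.0), (2.0, 2.5)]

def rmse(pairs):
    """Root of the mean squared difference between actual and estimated ratings."""
    squared_errors = [(actual - est) ** 2 for actual, est in pairs]
    return sqrt(sum(squared_errors) / len(squared_errors))

print(round(rmse(pairs), 4))  # 0.6124
```

Lower values are better; an RMSE of 0.61 means the estimates are off by roughly 0.6 stars on this scale.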

In [None]:
def get_top_n(predictions, userId, n=10):
    """ Return the top-N recommendations for one user from a set of predictions.

    Args:
    predictions(list of Prediction objects): The list of predictions, as
        returned by the test method of an algorithm.
    userId(int): The raw id of the user to get recommendations for.
    n(int): The number of recommendations to output. Default is 10.

    Returns:
    A list of tuples [(raw item id, rating estimation), ...] of size at
        most n, sorted by descending rating estimation. The list is empty
        if there are no predictions for the user.
    """

    # First map the predictions to each user.
    top_n = defaultdict(list)

    for user_id, item_id, actual_rating, estimated_rating, _ in predictions:
        top_n[user_id].append((item_id, estimated_rating))

    # Then sort the predictions for each user and retrieve the n highest ones.
    for user_id, estimated_ratings in top_n.items():
        # Sort by rating estimation, descending. x[1] is the estimated rating.
        estimated_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[user_id] = estimated_ratings[:n]

    return top_n[userId]

In [None]:
# Use the `get_top_n` function to get the top 10 recommendations for a particular user.
userId = 100
top_n = get_top_n(predictions_knn, userId=userId, n=10)
top_n
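
Because `predictions_knn` comes from `knn.test(testset)`, the candidates are limited to the movies that the user rated in the test set. Since only one user's list is needed, the ranking can also be sketched more directly, as below. This is an illustrative alternative on made-up data, not the function from the referenced notebooks: `Prediction` is a stand-in with the same field order as Surprise's prediction tuples, and `top_n_for_user` is a hypothetical name.

```python
import heapq
from collections import namedtuple

# Stand-in with the same field order as Surprise's prediction tuples.
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])

toy_predictions = [
    Prediction(100, 1, 4.0, 3.8, {}),
    Prediction(100, 2, 3.0, 4.6, {}),
    Prediction(100, 3, 5.0, 4.1, {}),
    Prediction(7, 1, 2.0, 2.9, {}),
]

def top_n_for_user(predictions, user_id, n=10):
    """Return the n highest (item id, estimate) pairs for a single user."""
    candidates = [(p.iid, p.est) for p in predictions if p.uid == user_id]
    return heapq.nlargest(n, candidates, key=lambda pair: pair[1])

print(top_n_for_user(toy_predictions, user_id=100, n=2))  # [(2, 4.6), (3, 4.1)]
```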

In [None]:
# Get the movie ID from the top_n 
movie_ids = [movie_id for movie_id, _ in top_n] # List comprehension
movie_ids

In [None]:
# Get the recommended movie titles
recommended_movies = ratings_movies.set_index('movieId').loc[movie_ids, 'title'].drop_duplicates().to_list()
print(f"Recommended movies for user {userId} are: ", recommended_movies)

---
### Train a collaborative filtering model using the SVD algorithm

In [None]:
# Create a `SVD` model.
svd = SVD(n_factors=100, n_epochs=20, lr_all=0.005, reg_all=0.02, random_state=42)

# Fit the model to the training data.
svd.fit(trainset)
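
For intuition about the hyperparameters, the sketch below shows how a biased matrix factorization model of this kind forms an estimate (global mean, user bias, item bias, and a dot product of latent factors) and how one stochastic gradient descent step with learning rate `lr` and regularization `reg` nudges the parameters. It is a toy illustration with made-up numbers and two factors instead of 100; `predict` and `sgd_step` are hypothetical names, not Surprise functions.

```python
def predict(mu, b_u, b_i, p_u, q_i):
    """Estimate: global mean + user bias + item bias + dot(q_i, p_u)."""
    return mu + b_u + b_i + sum(q * p for q, p in zip(q_i, p_u))

def sgd_step(r_ui, mu, b_u, b_i, p_u, q_i, lr=0.005, reg=0.02):
    """One regularized SGD update for a single observed rating."""
    err = r_ui - predict(mu, b_u, b_i, p_u, q_i)
    new_b_u = b_u + lr * (err - reg * b_u)
    new_b_i = b_i + lr * (err - reg * b_i)
    new_p_u = [p + lr * (err * q - reg * p) for p, q in zip(p_u, q_i)]
    new_q_i = [q + lr * (err * p - reg * q) for p, q in zip(p_u, q_i)]
    return new_b_u, new_b_i, new_p_u, new_q_i

mu, b_u, b_i = 3.5, 0.0, 0.0
p_u, q_i = [0.1, -0.2], [0.3, 0.1]  # two made-up latent factors
before = abs(4.0 - predict(mu, b_u, b_i, p_u, q_i))
b_u, b_i, p_u, q_i = sgd_step(4.0, mu, b_u, b_i, p_u, q_i)
after = abs(4.0 - predict(mu, b_u, b_i, p_u, q_i))
print(before > after)  # True
```

Training repeats such updates over every rating in the training set for `n_epochs` passes.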

In [None]:
# Predict the ratings for the testset
predictions_svd = svd.test(testset)

# Calculate the RMSE of the model (verbose=False avoids printing the value twice).
rmse_svd = accuracy.rmse(predictions_svd, verbose=False)
print("RMSE for SVD model is: ", rmse_svd)

In [None]:
# Use the `get_top_n` function to get the top 10 recommendations for a particular user.
top_n_svd = get_top_n(predictions_svd, userId=userId, n=10)
top_n_svd

In [None]:
# Get the movie IDs from top_n_svd
movie_ids_svd = [movie_id for movie_id, _ in top_n_svd] # List comprehension
movie_ids_svd

In [None]:
# Get the recommended movie titles with SVD
recommended_movies_svd = ratings_movies.set_index('movieId').loc[movie_ids_svd, 'title'].drop_duplicates().to_list()
print(f"Recommended movies for user {userId} with SVD are: ", recommended_movies_svd)