> # Collaborative Filtering Goodbooks Recommender System

In this project, our aim is to find a rating-based or matching-based (binary) dataset suitable for a recommender system (recsys) based on collaborative filtering, and to build a Python notebook that:

   (1) Loads the dataset

   (2) Tries at least 2 different recommendation methods based on collaborative filtering (TensorFlow, matrix factorization, count-based)

   (3) Uses quantitative metrics to evaluate the recommendations of each of the two selected methods.

## About Dataset

To begin with, we have selected a dataset from Kaggle containing ten thousand books and one million ratings.

The dataset can be found [here](https://www.kaggle.com/datasets/zygmunt/goodbooks-10k?select=ratings.csv).

Some information about the dataset:

   * Contains ratings for ten thousand popular books. 
   * As to the source, let's say that these ratings were found on the internet. 
   * Generally, there are 100 reviews for each book, although some have fewer ratings. 
   * Also, ratings go from one to five.

## Recommender System

We will start by **importing the necessary libraries**.

In [1]:
# import libraries
import pandas as pd
import numpy as np
from termcolor import colored
from collections import defaultdict
from datetime import datetime
import random

from datasketch import MinHash, MinHashLSH

from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate

from functions.pre_process_data import *
from functions.count_based_recsys_functions import *
from functions.svd_functions import *

Next, we are going to **import the data** and **select a user** for the recommendations.

In [2]:
# paths to load data
path_books = 'data/books.csv'
path_ratings = 'data/ratings.csv'

In [3]:
# function to import data
books, ratings = import_data(path_books,path_ratings)

Import book data
Succesfull

Import rating data
Succesfull



In [4]:
# function to select user
user_id = select_user()

Please provide a user ID (eg. a number from 1 to 53424): 123
Succesfull



With the data loaded and a user selected, we can **start building the recommender system**.

> ### Count-Based Recommender System

We are going to build our **first recommender system** based on **collaborative filtering**, using MinHash (the min-wise independent permutations scheme) for **locality-sensitive hashing**. [`@MinHash LSH`](https://en.wikipedia.org/wiki/MinHash)
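
To make the idea concrete, here is a minimal pure-Python sketch of MinHash, independent of the `datasketch` implementation used in this notebook. Each user is turned into a set of discretized tokens, a signature keeps the smallest hash value per seeded hash function, and the fraction of matching signature positions estimates the Jaccard similarity between two users. The helper names, the like threshold of 4, and the toy users are assumptions made for illustration; they are not taken from `functions/count_based_recsys_functions.py`.

```python
import hashlib

def discretize(user_ratings, like_threshold=4):
    """Turn {book_id: rating} into tokens such as '42:like' (threshold is an assumption)."""
    return {
        f"{book_id}:{'like' if rating >= like_threshold else 'dislike'}"
        for book_id, rating in user_ratings.items()
    }

def _hash(token, seed):
    """Seeded 64-bit hash; each seed plays the role of one random permutation."""
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def minhash_signature(tokens, num_perm=128):
    """For each seeded hash function, keep the smallest hash over the (non-empty) set."""
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_perm)]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where two signatures agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

def exact_jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)

# Hypothetical toy users: {book_id: rating}
alice = discretize({1: 5, 2: 4, 3: 2, 4: 5})
bob = discretize({1: 5, 2: 5, 3: 1, 5: 4})
carol = discretize({1: 1, 2: 2, 6: 5, 7: 5})

sig_alice, sig_bob, sig_carol = (minhash_signature(s) for s in (alice, bob, carol))
sim_ab = estimate_jaccard(sig_alice, sig_bob)    # approximates exact_jaccard(alice, bob)
sim_ac = estimate_jaccard(sig_alice, sig_carol)  # alice and carol share no tokens
```

An LSH index then groups signature positions into bands, so that only users who collide in at least one band are compared. This avoids comparing every pair of users.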

In [5]:
# set starting time
start_time = datetime.now()

# map each user to discretized ratings
ratings_ = map_users_to_discretized_ratings(ratings)

# create MinHashLSH indexes
index, hashes = create_user_based_LSH_index(ratings_)

# recommendations
df_recommendations, already_rated = get_recommendation_LSH(books, ratings_, user_id, index, hashes)

# validation using Precision, Recall
precision, recall = validation_of_user_based_recsys(ratings, already_rated, user_id)

# end time
end_time = datetime.now()

# total execution time
total_time = end_time - start_time

print(f'Execution time: {total_time}')

[1] Map each user to discretized ratings
Succesfull

[2] Create user-based LSH indexes
10000 out of 53424 users indexed.
20000 out of 53424 users indexed.
30000 out of 53424 users indexed.
40000 out of 53424 users indexed.
50000 out of 53424 users indexed.
index created
Succesfull

[3] The recommendation for user 123 is:

I suggest the following books because they have received positive ratings from users who tend to like what you like:

 book_id                                                                 title
    9219 On Becoming Baby Wise: Giving Your Infant the Gift of Nighttime Sleep
    3448                                                           Term Limits
    4482                  It's a Magical World: A Calvin and Hobbes Collection
    7091                                                                 Cabal
    3694                The Intelligent Investor (Collins Business Essentials)
    9932                                                     Parts
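
The validation step computes precision and recall for the selected user. The exact logic of `validation_of_user_based_recsys` is not shown here, so the following is only a sketch of how these two metrics are commonly computed from a list of recommended books and a set of books the user is known to like. The function name and the toy book IDs are hypothetical.

```python
def precision_recall(recommended, relevant):
    """Precision = hits / number recommended; recall = hits / number relevant."""
    recommended = set(recommended)
    relevant = set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 recommended books, 3 held-out books the user liked
recommended_books = [10, 20, 30, 40]
liked_held_out = [30, 40, 50]
precision, recall = precision_recall(recommended_books, liked_held_out)
```

Here two of the four recommendations are relevant, so precision is 2/4, and two of the three relevant books were found, so recall is 2/3.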

> ### Recommender System using SVD (Matrix Factorization Technique)

Next, we will build the **second recommender system** based on **Matrix Factorization**. [`@SVD`](https://en.wikipedia.org/wiki/Singular_value_decomposition)

For that purpose we will use the [`@surprise`](https://pypi.org/project/scikit-surprise/1.0.2/) library.
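
As a rough illustration of what matrix factorization does, the sketch below trains a small biased factorization model with stochastic gradient descent in plain Python. It predicts a rating as global mean + user bias + item bias + dot product of latent factors, which is the model form the `surprise` documentation describes for its `SVD` class. The function name `train_mf`, the hyperparameters, and the toy ratings are assumptions for illustration; the notebook itself relies on `surprise` through `svd_algorithm`.

```python
import random

def train_mf(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit r_hat = mu + b_u + b_i + p_u . q_i by SGD on (user, item, rating) triples."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    users = sorted({u for u, _, _ in ratings})   # sorted for reproducible initialization
    items = sorted({i for _, i, _ in ratings})
    bu = {u: 0.0 for u in users}
    bi = {i: 0.0 for i in items}
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    for _ in range(n_epochs):
        for u, i, r in ratings:
            pred = mu + bu[u] + bi[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))
            err = r - pred
            # Gradient steps with L2 regularization
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)

    def predict(u, i):
        """Estimate a rating; unknown users or items fall back to the terms that are known."""
        est = mu + bu.get(u, 0.0) + bi.get(i, 0.0)
        if u in p and i in q:
            est += sum(pf * qf for pf, qf in zip(p[u], q[i]))
        return min(5.0, max(1.0, est))   # clip to the 1-5 rating scale

    return predict

# Hypothetical toy ratings: (user, book, rating)
toy = [
    ("ann", "b1", 5), ("ann", "b2", 4), ("ann", "b3", 1),
    ("bob", "b1", 5), ("bob", "b2", 5), ("bob", "b3", 2),
    ("cat", "b1", 1), ("cat", "b2", 2), ("cat", "b3", 5),
    ("dan", "b1", 4), ("dan", "b3", 1),
]
predict = train_mf(toy)
estimate = predict("dan", "b2")   # dan has not rated b2
```

Recommending then amounts to estimating ratings for the books a user has not rated yet and keeping the highest estimates.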

In [6]:
# define the rating scale (ratings in this dataset go from 1 to 5)
reader = Reader(rating_scale=(1, 5))

# build a surprise Dataset from the ratings DataFrame
data = Dataset.load_from_df(ratings[['user_id', 'book_id', 'rating']], reader)

In [7]:
# set starting time
start_time = datetime.now()

# svd
svd = svd_algorithm(data)

# recommendations
rec_table = recommend_surprise(user_id, ratings, svd, books, 10)

# validation using RMSE
rmse = validate(user_id, ratings, svd, books) 

# end time
end_time = datetime.now()

# total execution time
total_time = end_time - start_time

print(f'Execution time: {total_time}')
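
The validation above relies on RMSE, evaluated with cross-validation. A minimal sketch of both pieces in plain Python, assuming predictions are available as (true rating, estimated rating) pairs, might look as follows. The helper names `rmse` and `k_folds` are hypothetical and are not the functions from `functions/svd_functions.py` or from `surprise`.

```python
import math
import random

def rmse(predictions):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    squared_errors = [(true - est) ** 2 for true, est in predictions]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def k_folds(items, n_splits=5, seed=0):
    """Yield (train, test) lists so that every item lands in exactly one test fold."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    for k in range(n_splits):
        test = shuffled[k::n_splits]
        train = [x for idx, x in enumerate(shuffled) if idx % n_splits != k]
        yield train, test

# Hypothetical predictions: errors of 1, 0 and 1 rating points
score = rmse([(5, 4.0), (3, 3.0), (1, 2.0)])

# Split ten hypothetical rating rows into 5 folds
folds = list(k_folds(list(range(10)), n_splits=5))
```

In a full evaluation, the model is fitted on each `train` part, RMSE is computed on the matching `test` part, and the per-fold scores are averaged.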

[1] Compute the RMSE of the SVD algorithm
Evaluating RMSE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8391  0.8404  0.8381  0.8425  0.8421  0.8404  0.0017  
Fit time          8.29    8.38    8.54    8.74    8.45    8.48    0.16    
Test time         1.08    1.16    1.02    1.10    1.09    1.09    0.05    
Succesfull

[2] Create training set
Succesfull

[3] Fit SVD
Succesfull

[4] The recommendation for user 123 is:
 book_id                                                      title
    4868                                           Jesus the Christ
    1788               The Calvin and Hobbes Tenth Anniversary Book
    8109                           The Absolute Sandman, Volume One
    3628                             The Complete Calvin and Hobbes
     862             Words of Radiance (The Stormlight Archive, #2)
    5207   The Days Are Just Packed: A Calvin and Hobbes Colle