# Recommendation System

This notebook uses [Surprise](http://surpriselib.com/), an easy-to-use Python scikit for recommender systems. It includes several commonly used algorithms, including [neighborhood-based collaborative filtering](https://surprise.readthedocs.io/en/stable/knn_inspired.html) and [matrix factorization-based algorithms](https://surprise.readthedocs.io/en/stable/matrix_factorization.html).

In [11]:
# Install Surprise into the environment of the running kernel
import sys

!{sys.executable} -m pip install scikit-surprise



In [4]:
from surprise.prediction_algorithms.matrix_factorization import SVD
from surprise.prediction_algorithms.knns import KNNBasic
from surprise.prediction_algorithms.knns import KNNWithMeans
from surprise.prediction_algorithms.knns import KNNBaseline
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import cross_validate
from surprise.model_selection import train_test_split
from surprise.model_selection import GridSearchCV

## Loading data from the Surprise package

Download the dataset bundled with the Surprise package. The data will be saved in the .surprise_data folder in your home directory.

In [6]:
# Load the MovieLens 100k dataset (downloading it if needed);
# it is built into the package.
data = Dataset.load_builtin('ml-100k')

# Sample a random trainset and testset; the testset holds 20% of the ratings.
trainset, testset = train_test_split(data, test_size=.20, random_state=0)

## Collaborative Filtering

Applying three different flavors of collaborative filtering to this data.

### The basic collaborative filtering algorithm

In [7]:
# Use the basic collaborative filtering algorithm (KNNBasic).
# See https://surprise.readthedocs.io/en/stable/knn_inspired.html for more details.
# KNNBasic predicts a rating as the similarity-weighted average of the ratings
# given to the item by the user's nearest neighbors (user-based by default).
# Caveat: individual rating bias is not taken into account here.
cf = KNNBasic()

# Train the algorithm on the trainset, then predict ratings for the testset
cf.fit(trainset)
predictions = cf.test(testset)

# Then compute RMSE and MAE
accuracy.rmse(predictions)
accuracy.mae(predictions)

Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9872
MAE:  0.7822


0.7821631158458218
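
The idea behind KNNBasic and the "msd similarity matrix" mentioned in the output can be sketched in plain Python. This is a minimal illustration on made-up ratings, not Surprise's implementation: the `ratings` dictionary and the helper names `msd_similarity` and `predict_basic` are hypothetical, and returning `None` when no neighbor rated the item is a simplification.

```python
# Toy ratings: user -> {item: rating}. Hypothetical data, not MovieLens.
ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob":   {"A": 4.0, "B": 2.0, "C": 5.0, "D": 4.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 2.0},
}

def msd_similarity(u, v):
    """Return 1 / (MSD + 1) over the items both users rated, or 0 if none."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    msd = sum((ratings[u][i] - ratings[v][i]) ** 2 for i in common) / len(common)
    return 1.0 / (msd + 1.0)

def predict_basic(user, item, k=40):
    """Similarity-weighted average of the ratings given to `item`
    by the (at most) k users most similar to `user`."""
    neighbors = [
        (msd_similarity(user, v), r[item])
        for v, r in ratings.items()
        if v != user and item in r
    ]
    neighbors = sorted(neighbors, reverse=True)[:k]
    total_sim = sum(s for s, _ in neighbors)
    if total_sim == 0:
        return None  # simplification: no usable neighbors
    return sum(s * r for s, r in neighbors) / total_sim

sim_ab = msd_similarity("alice", "bob")    # users who mostly agree
sim_ac = msd_similarity("alice", "carol")  # users who mostly disagree
pred = predict_basic("alice", "D")
print(sim_ab, sim_ac, pred)
```

Because bob's ratings are much closer to alice's than carol's are, bob's rating of item D dominates the prediction. Note that the raw ratings are averaged directly, so a neighbor who rates everything high pulls the estimate up regardless of taste.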

### The basic collaborative filtering algorithm with user mean ratings

In [8]:
# Use collaborative filtering that takes each user's mean rating into account (KNNWithMeans).
# See https://surprise.readthedocs.io/en/stable/knn_inspired.html for more details.
# sim_options: user_based=True selects user-user collaborative filtering.
# k, the size of the neighborhood, defaults to 40.
# Centering on each user's mean rating helps reduce individual rating bias.
sim_options = {'user_based': True}
cf_mean = KNNWithMeans(sim_options=sim_options)

# Train the algorithm on the trainset, then predict ratings for the testset
cf_mean.fit(trainset)
predictions = cf_mean.test(testset)

# Then compute RMSE and MAE
accuracy.rmse(predictions)
accuracy.mae(predictions)

Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9629
MAE:  0.7597


0.7597060868238855

RMSE drops from 0.9872 with KNNBasic to 0.9629 with KNNWithMeans.
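
Why mean-centering helps can be seen in a small sketch. It assumes the neighbors have already been found and are passed in as `(similarity, neighbor_rating, neighbor_mean)` tuples; the function names and the numbers are hypothetical and only illustrate the idea, not Surprise's code.

```python
def predict_basic(neighbors):
    """Plain similarity-weighted average of the neighbors' ratings."""
    total_sim = sum(s for s, _, _ in neighbors)
    return sum(s * r for s, r, _ in neighbors) / total_sim

def predict_with_means(user_mean, neighbors):
    """User's own mean plus the weighted average of how far each
    neighbor's rating sits above or below that neighbor's mean."""
    total_sim = sum(s for s, _, _ in neighbors)
    if total_sim == 0:
        return user_mean  # fall back to the user's mean
    offset = sum(s * (r - m) for s, r, m in neighbors) / total_sim
    return user_mean + offset

# Hypothetical case: the target user is a harsh rater (mean 2.5),
# while both neighbors rate generously (means 4.5 and 4.0).
neighbors = [(0.8, 5.0, 4.5), (0.4, 4.0, 4.0)]

basic = predict_basic(neighbors)
centered = predict_with_means(2.5, neighbors)
print(round(basic, 4), round(centered, 4))
```

The plain average lands near the neighbors' generous scale, while the mean-centered estimate stays close to the target user's own scale and only shifts by how much the neighbors liked the item relative to their usual ratings.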

In [9]:
# Use the same mean-centered algorithm (KNNWithMeans), this time item-based.
# See https://surprise.readthedocs.io/en/stable/knn_inspired.html for more details.
# sim_options: user_based=False selects item-item collaborative filtering,
# so similarities are computed between items and the mean rating of each item is used.
# k, the size of the neighborhood, defaults to 40.
sim_options = {'user_based': False}
cf_mean = KNNWithMeans(sim_options=sim_options)

# Train the algorithm on the trainset, then predict ratings for the testset
cf_mean.fit(trainset)
predictions = cf_mean.test(testset)

# Then compute RMSE and MAE
accuracy.rmse(predictions)
accuracy.mae(predictions)

Computing the msd similarity matrix...
Done computing similarity matrix.
RMSE: 0.9464
MAE:  0.7427


0.7426643957744533

RMSE drops further, from 0.9629 (user-user) to 0.9464 (item-item).
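
For reference, the two error measures being compared can be computed by hand. The sketch below assumes a list of `(true_rating, estimated_rating)` pairs with made-up values; in Surprise each prediction carries these as the `r_ui` and `est` fields.

```python
import math

def rmse(pairs):
    """Root mean squared error: penalizes large misses more heavily."""
    return math.sqrt(sum((true - est) ** 2 for true, est in pairs) / len(pairs))

def mae(pairs):
    """Mean absolute error: average size of the miss, in rating points."""
    return sum(abs(true - est) for true, est in pairs) / len(pairs)

# Hypothetical (true, estimated) ratings
pairs = [(4.0, 3.5), (2.0, 3.0), (5.0, 4.5), (3.0, 3.0)]

print(rmse(pairs), mae(pairs))
```

RMSE is never smaller than MAE on the same predictions, which is consistent with the outputs above; a widening gap between the two suggests a few large errors rather than many small ones.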

## Matrix Factorization

Applying matrix factorization to this data.

In [10]:
# Use the well-known SVD algorithm.
# The user-item rating matrix is factorized into latent user and item factors,
# which are then used to make predictions.
# n_factors is the number of latent dimensions used to describe each user and item.
svd = SVD(n_factors=100)

# Train the algorithm on the trainset, then predict ratings for the testset
svd.fit(trainset)
predictions = svd.test(testset)

# Then compute RMSE and MAE
accuracy.rmse(predictions)
accuracy.mae(predictions)

RMSE: 0.9459
MAE:  0.7460


0.7459825271297698

Adjusting n_factors may improve accuracy, but the more factors, the more computation is needed. Here the RMSE (0.9459) is almost the same as that of item-item collaborative filtering (0.9464).
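
The role of n_factors can be sketched as follows. The prediction rule is the biased matrix factorization form described in the Surprise documentation (global mean plus user bias plus item bias plus the dot product of the two factor vectors). The function names, the factor values and the user and item counts are hypothetical, chosen only for illustration.

```python
def predict_svd(mu, b_u, b_i, p_u, q_i):
    """Estimate = global mean + user bias + item bias + p_u . q_i."""
    return mu + b_u + b_i + sum(p * q for p, q in zip(p_u, q_i))

def n_parameters(n_users, n_items, n_factors):
    """Each user and each item has n_factors latent values plus one bias."""
    return (n_users + n_items) * (n_factors + 1)

# Hypothetical learned values with n_factors = 2
est = predict_svd(mu=3.5, b_u=0.2, b_i=-0.1, p_u=[0.5, -1.0], q_i=[1.0, 0.25])
print(round(est, 2))

# Parameter count grows linearly with n_factors (hypothetical sizes)
for f in (50, 100, 200):
    print(f, n_parameters(n_users=1000, n_items=2000, n_factors=f))
```

Every training epoch has to update a factor vector of length n_factors for each rating it visits, so both the model size and the training time scale roughly linearly with n_factors.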