# Recommender System 

Recommendation or Recommender system is one of the most used application data science. The system employs statistical algorithm that help to predict using user rating, review or view etc.The system assume that, it is highly likely for users to have similar kind of review/rating for a set of entities. Netflix, Amazon, Facebook, YouTube etc. uses recommender system in one way or the other way to increase the customer base for their products.

In this project a simple recommendation system is developed using movie rating data from MovieLens. I have used the latest dataset "ml-latest". It has 27,000,000 ratings and 1,100,000 tag applications applied to 58,000 movies by 280,000 users. Includes tag genome data with 14 million relevance scores across 1,100 tags. 

import all the packages required for the project

In [53]:
import numpy as np
from surprise import SVD
from surprise import Dataset
from surprise import Reader
from surprise.model_selection import cross_validate
from surprise import accuracy
from surprise.model_selection import KFold
from surprise.model_selection import train_test_split
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

%matplotlib inline

Read the ratings and movie data into data frame

In [18]:
ratings=pd.read_csv("/Users/SimnaRassak/Documents/Project/DataSet/Project2/ml-latest/ratings.csv")

In [20]:
movies=pd.read_csv("/Users/SimnaRassak/Documents/Project/DataSet/Project2/ml-latest/movies.csv")

In [None]:
del ratings['timestamp']#Removing timestamp as it is not used in this project at this moment

Data Preprocessing and visualization

In [25]:
ratings.head()

Unnamed: 0,userId,movieId,rating
0,1,307,3.5
1,1,481,3.5
2,1,1091,1.5
3,1,1257,4.5
4,1,1449,4.5


In [21]:
movies.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [32]:
reader = Reader(rating_scale=(1, 5))

Loading the data for the model training and testing

In [50]:
dataset = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)

Top 100k is used for cross validation

In [45]:
d = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']][:10000], reader)

The are two major recommendor approaches-
1. Content based filtering

The similarity is identified based on the attributes and products. In the case of movies content based filter uses genres, production house, directors, actors

2. Collaborative filtering

It is based on the users past behaviour and similar decision or choices made by other users

The Singular Value Decomposition (SVD)

SVD is used as a collaborative filtering technique. It is a method from linear algebra that has been generally used as a dimensionality reduction technique in machine learning. SVD is a matrix factorisation technique.

In [51]:
algorithm1=SVD()

Cross-validation is primarily used in applied machine learning to estimate model performance. It gives a less biased or less optimistic estimate of the model performance.

In [46]:
cross_validate(algorithm1,d,measures=['RMSE','MAE'],cv=5,verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.0018  1.0020  1.0177  1.0213  1.0448  1.0175  0.0158  
MAE (testset)     0.7696  0.7784  0.7764  0.7863  0.7955  0.7812  0.0089  
Fit time          0.52    0.51    0.53    0.47    0.47    0.50    0.02    
Test time         0.02    0.02    0.02    0.01    0.02    0.02    0.00    


{'test_rmse': array([1.00176085, 1.00199813, 1.01768363, 1.02130148, 1.04482959]),
 'test_mae': array([0.76958161, 0.77840218, 0.77637411, 0.78626676, 0.79548886]),
 'fit_time': (0.5152969360351562,
  0.5058181285858154,
  0.5295648574829102,
  0.4651219844818115,
  0.47403502464294434),
 'test_time': (0.016223907470703125,
  0.019187211990356445,
  0.018877029418945312,
  0.013238906860351562,
  0.019151926040649414)}

25% the data is used as test and remaining as the training set to generate a model

In [54]:
traindata,testdata=train_test_split(dataset,test_size=0.25)

The algorithm is trained using train data

In [55]:
algorithm1.fit(traindata)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x7fb11d1231f0>

Prediction is made on test data using the train model. The algorithm gives an accuracy of 79.8%

In [56]:
predict=algorithm1.test(testdata)
accuracy.rmse(predict)

RMSE: 0.7981


0.798063802651853

Using this model a recommendation is made for any selected user. A user input is used to review the user previous movie rating. From the previous history of the user a new set of movies are recommended. The recommendation is based on the estimated score from the model using customer behaviour.

In [59]:
#Select a user
Idu=input("Enter the user Id:")
user_Id=int(Idu)

Enter the user Id:300


In [65]:
#user previous views
user=ratings[(ratings['userId']==user_Id)&(ratings['rating']<=4)]
user=user.set_index('movieId')
user=user.join(movies)['title']
print(user)

movieId
16               Sense and Sensibility (1995)
21                             Copycat (1995)
48               When Night Is Falling (1995)
112                  Margaret's Museum (1995)
151                     Batman Forever (1995)
                        ...                  
8783                          Dolemite (1975)
8798                  1900 (Novecento) (1976)
8807              Maîtresse (Mistress) (1975)
8810    Slave of Love, A (Raba lyubvi) (1976)
8815                   ABBA: The Movie (1977)
Name: title, Length: 95, dtype: object


In [73]:
#Recommendation for user
user=movies.copy()
user=user.reset_index()
user['Estimate Score']=user['movieId'].apply(lambda x:algorithm1.predict(user_Id,x).est)
user=user.drop('movieId',axis=1)
user=user.sort_values('Estimate Score',ascending=False)
print(user.head(5))

      index                                              title  \
1171   1171  Star Wars: Episode V - The Empire Strikes Back...   
293     293                                Pulp Fiction (1994)   
843     843                              Godfather, The (1972)   
1195   1195                     Godfather: Part II, The (1974)   
257     257          Star Wars: Episode IV - A New Hope (1977)   

                           genres  Estimate Score  
1171      Action|Adventure|Sci-Fi        5.000000  
293   Comedy|Crime|Drama|Thriller        5.000000  
843                   Crime|Drama        5.000000  
1195                  Crime|Drama        4.968325  
257       Action|Adventure|Sci-Fi        4.965219  
