In [1]:
from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate, KFold
import pandas as pd
import numpy as np

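# Reader() assumes a 1-5 rating scale by default; pass rating_scale=(0.5, 5)
# if the data contains half-star ratings below 1.
reader = Reader()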
ratings = pd.read_csv(r'C:\Users\subra\OneDrive\Desktop\datasets\ratings_small.csv')
ratings.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205


In [2]:
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)
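fold = KFold(n_splits=5)
# split() returns a lazy generator of (trainset, testset) pairs; nothing is computed until it is iterated
fold.split(data)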

<generator object KFold.split at 0x00000278C218E270>
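
The generator above produces nothing until it is iterated; each step yields one (trainset, testset) pair. As a rough illustration of the partitioning idea only, and not Surprise's actual implementation, the following standard-library sketch splits a handful of rating rows into k folds. The function name `k_fold_indices` and the seed are hypothetical; the rows are copied from the head of the ratings table.

```python
import random

def k_fold_indices(n_items, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs that partition range(n_items)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # shuffle once so folds are random
    for k in range(n_splits):
        test_idx = indices[k::n_splits]   # every n_splits-th item, offset by k
        test_set = set(test_idx)
        train_idx = [i for i in indices if i not in test_set]
        yield train_idx, test_idx

# (userId, movieId, rating) rows from the first five lines of the ratings table
rows = [(1, 31, 2.5), (1, 1029, 3.0), (1, 1061, 3.0), (1, 1129, 2.0), (1, 1172, 4.0)]

folds = list(k_fold_indices(len(rows), n_splits=5))
for train_idx, test_idx in folds:
    print(len(train_idx), "train rows,", len(test_idx), "test row(s)")
```

Every row lands in exactly one test fold, so each rating is used for evaluation once and for training in the remaining folds.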

In [3]:
svd = SVD()
# Evaluate SVD with the 5-fold splitter defined above
cross_validate(svd, data, measures=['RMSE', 'MAE'], cv=fold)

{'test_rmse': array([0.8977437 , 0.90816465, 0.89297663, 0.89325792, 0.8925646 ]),
 'test_mae': array([0.68905864, 0.69655792, 0.68766969, 0.68921514, 0.68792203]),
 'fit_time': (3.7000789642333984,
  3.688384771347046,
  3.628535509109497,
  3.7064368724823,
  3.5163609981536865),
 'test_time': (0.1320357322692871,
  0.10916304588317871,
  0.19594335556030273,
  0.15203499794006348,
  0.20462775230407715)}

We get a mean Root Mean Square Error of about 0.90 across the five folds, which is good enough for our purposes. Let us now train on the full dataset and generate predictions.
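
The per-fold scores can be summarized with a mean and a standard deviation. Here is a minimal sketch using only the standard library, with the values copied from the `test_rmse` and `test_mae` arrays in the output above. The `rmse` helper is a hypothetical illustration of how the metric is defined, not a Surprise function.

```python
import math
import statistics

# Per-fold scores copied from the cross_validate output
test_rmse = [0.8977437, 0.90816465, 0.89297663, 0.89325792, 0.8925646]
test_mae = [0.68905864, 0.69655792, 0.68766969, 0.68921514, 0.68792203]

mean_rmse = statistics.mean(test_rmse)
mean_mae = statistics.mean(test_mae)
print(f"RMSE: {mean_rmse:.4f} +/- {statistics.stdev(test_rmse):.4f}")
print(f"MAE:  {mean_mae:.4f} +/- {statistics.stdev(test_mae):.4f}")

def rmse(actual, predicted):
    """Root mean square error between two equal-length rating sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

The small spread between folds suggests the score does not depend much on which ratings happen to be held out.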

In [4]:
# Build a trainset from every rating (no held-out fold) and refit the model on it
trainset = data.build_full_trainset()
svd.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x278c218f9a0>

Let us pick the user with userId 2 and check the ratings they have given.

In [5]:
ratings[ratings['userId'] == 2]

Unnamed: 0,userId,movieId,rating,timestamp
20,2,10,4.0,835355493
21,2,17,5.0,835355681
22,2,39,5.0,835355604
23,2,47,4.0,835355552
24,2,50,4.0,835355586
...,...,...,...,...
91,2,592,5.0,835355395
92,2,593,3.0,835355511
93,2,616,3.0,835355932
94,2,661,4.0,835356141


In [6]:
ratings[ratings['movieId'] == 300]

Unnamed: 0,userId,movieId,rating,timestamp
50,2,300,3.0,835355532
1038,15,300,4.0,1054449869
3193,19,300,3.0,855193220
4043,23,300,4.0,1149868544
5097,30,300,5.0,945122065
...,...,...,...,...
96205,641,300,4.0,834636572
96610,647,300,5.0,947292218
98451,659,300,5.0,834598140
98679,662,300,3.0,839022324


In [7]:
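# r_ui is the known true rating; it is stored in the result but not used to compute the estimate
svd.predict(uid=2, iid=300, r_ui=3)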

Prediction(uid=2, iid=300, r_ui=3, est=3.583150355851875, details={'was_impossible': False})

For the movie with ID 300, we get an estimated rating of 3.58 for user 2, against an actual rating of 3.0. One notable feature of this recommender system is that it doesn't care what the movie is (or what it contains). It works purely on the basis of an assigned movie ID and predicts ratings from how other users have rated the movie.
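
Surprise's `SVD` estimates a rating as the global mean plus a user bias, an item bias, and the dot product of the user and item latent-factor vectors, then clips the result to the rating scale. The sketch below reproduces that arithmetic in plain Python. All numbers are made up for illustration and are not the parameters learned above, and `predict_rating` is a hypothetical helper, not part of Surprise.

```python
def predict_rating(global_mean, user_bias, item_bias,
                   user_factors, item_factors, rating_scale=(1, 5)):
    """Biased matrix-factorization estimate, clipped to the rating scale."""
    # Interaction term: dot product of the user and item latent vectors
    interaction = sum(p * q for p, q in zip(user_factors, item_factors))
    estimate = global_mean + user_bias + item_bias + interaction
    low, high = rating_scale
    return min(high, max(low, estimate))

# Illustrative values only (assumptions, not learned parameters)
est = predict_rating(
    global_mean=3.5,
    user_bias=0.1,
    item_bias=-0.2,
    user_factors=[0.3, -0.1],
    item_factors=[0.5, 0.2],
)
print(round(est, 2))
```

Nothing in the formula refers to the movie's content: the item enters only through its bias and its factor vector, both of which are learned from ratings.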

### Conclusion

We created recommenders using demographic, content-based, and collaborative filtering. While demographic filtering is very elementary and cannot be used practically, hybrid systems can take advantage of content-based and collaborative filtering, as the two approaches have proved to be almost complementary. This model is only a baseline and provides a fundamental framework to start with.