# Recommender System example with MovieLens Dataset

This recommender system uses a matrix factorization algorithm (SVD) that I implemented following
<a href="https://www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf">Koren's paper</a>.
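To make the idea concrete, here is a minimal sketch of SVD-style matrix factorization trained by stochastic gradient descent. It assumes the common biased model, where a rating is predicted as the global mean plus a user bias, an item bias, and the dot product of user and item factor vectors. The function name `factorize`, its parameters, and the toy matrix are illustrative assumptions; the actual `Recommender` class may differ in model form and hyperparameters.

```python
import numpy as np

def factorize(R, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i by SGD on the observed (non-NaN) entries."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    users, items = np.nonzero(~np.isnan(R))   # coordinates of known ratings
    mu = R[users, items].mean()                # global mean rating
    b_u = np.zeros(n_users)                    # user biases
    b_i = np.zeros(n_items)                    # item biases
    P = rng.normal(0, 0.1, (n_users, n_factors))  # user factors
    Q = rng.normal(0, 0.1, (n_items, n_factors))  # item factors
    for _ in range(n_epochs):
        # visit the known ratings in a fresh random order each epoch
        for idx in rng.permutation(len(users)):
            u, i = users[idx], items[idx]
            err = R[u, i] - (mu + b_u[u] + b_i[i] + P[u] @ Q[i])
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            p_old = P[u].copy()                # keep the old value for the Q update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    # dense matrix of predictions for every (user, item) pair
    return mu + b_u[:, None] + b_i[None, :] + P @ Q.T

# toy 3 x 3 rating matrix; NaN marks an unrated movie
R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 2.0, 5.0]])
pred = factorize(R)
```

The returned matrix has a prediction for every cell, including the ones that were NaN in the input, which is what makes the factorization usable for recommendations.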

In [1]:
# imports
from recommender import Recommender
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from utils import from_pd_to_array

In [2]:
# read MovieLens dataset
movies = pd.read_csv('/home/marianna/Projects/python/ml/Recommender System/data/movies.csv', dtype='category')
ratings = pd.read_csv('/home/marianna/Projects/python/ml/Recommender System/data/ratings.csv', dtype='category')

ratings.head()

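|   | userId | movieId | rating | timestamp |
|---|--------|---------|--------|-----------|
| 0 | 1 | 1 | 4.0 | 964982703 |
| 1 | 1 | 3 | 4.0 | 964981247 |
| 2 | 1 | 6 | 4.0 | 964982224 |
| 3 | 1 | 47 | 5.0 | 964983815 |
| 4 | 1 | 50 | 5.0 | 964982931 |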


In [3]:
# encode movie IDs and user IDs as consecutive integer indices
movie_enc = LabelEncoder()
user_enc = LabelEncoder()
ratings['movie'] = movie_enc.fit_transform(ratings['movieId'].values)
ratings['user'] = user_enc.fit_transform(ratings['userId'].values)

# convert values to categorical type
ratings['user'] = ratings['user'].astype('category')
ratings['movie'] = ratings['movie'].astype('category')

# ratings table after encodings
ratings.head()

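|   | userId | movieId | rating | timestamp | movie | user |
|---|--------|---------|--------|-----------|-------|------|
| 0 | 1 | 1 | 4.0 | 964982703 | 0 | 0 |
| 1 | 1 | 3 | 4.0 | 964981247 | 3676 | 0 |
| 2 | 1 | 6 | 4.0 | 964982224 | 6894 | 0 |
| 3 | 1 | 47 | 5.0 | 964983815 | 5544 | 0 |
| 4 | 1 | 50 | 5.0 | 964982931 | 5889 | 0 |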
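The encoded values in the `movie` column may look surprising (movieId 6 gets a larger code than movieId 47). A label encoder assigns codes by the sorted order of the distinct values, and because the CSV was read with `dtype='category'` the IDs are most likely strings, so the order is lexicographic rather than numeric. The sketch below reproduces that behaviour in plain Python; `label_encode` is a hypothetical helper written for illustration, not part of scikit-learn.

```python
def label_encode(values):
    """Map each distinct value to its rank in sorted order, as a label encoder does."""
    classes = sorted(set(values))                       # distinct values, sorted
    lookup = {value: code for code, value in enumerate(classes)}
    return [lookup[v] for v in values], classes

# string IDs sort lexicographically: '47' and '50' come before '6'
codes, classes = label_encode(['1', '3', '6', '47', '50'])
```

The codes are still a valid one-to-one mapping, so the factorization is unaffected; only the ordering of the columns differs from what numeric sorting would give.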


In [4]:
# pivot the ratings into a user x movie NumPy array
rating_data = from_pd_to_array(ratings, 'user', 'movie', 'rating')
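The source of `from_pd_to_array` is not shown here, so the following is only a guess at what such a helper could look like: it pivots the long ratings table into a wide user-by-movie matrix and returns it as a NumPy array. The name `to_user_item_array` and the choice to mark unrated movies with NaN are assumptions; the real helper may fill missing ratings differently.

```python
import numpy as np
import pandas as pd

def to_user_item_array(df, user_col, item_col, value_col):
    """Hypothetical stand-in for from_pd_to_array: rows are users, columns are items."""
    table = df.pivot(index=user_col, columns=item_col, values=value_col)
    return table.to_numpy(dtype=float)   # unrated (user, item) pairs become NaN

# toy long-format ratings: one row per (user, movie, rating)
toy = pd.DataFrame({
    'user':   [0, 0, 1],
    'movie':  [0, 2, 1],
    'rating': [4.0, 5.0, 3.0],
})
arr = to_user_item_array(toy, 'user', 'movie', 'rating')
```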

In [5]:
rec = Recommender(n_epochs=5)
rec.fit(rating_data, test_size=0.1, verbose=1)

# validate() computes the mean squared error (MSE) between the test set and the predictions
rec.validate()

Proesssing epoch: 1
Training loss: 2.286194031031647
Proesssing epoch: 2
Training loss: 1.5538030686141222
Proesssing epoch: 3
Training loss: 1.1794577994307922
Proesssing epoch: 4
Training loss: 0.9427813797965081
Proesssing epoch: 5
Training loss: 0.7723712186760323
Mean Squared Error: 0.0033439556136727333
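For reference, a mean squared error over held-out ratings can be computed as sketched below. How `validate()` selects the entries it averages over is not shown in this notebook, so restricting the average to known (non-NaN) test ratings is an assumption, and `masked_mse` is a hypothetical name.

```python
import numpy as np

def masked_mse(actual, predicted):
    """Mean squared error over the entries of `actual` that are not NaN."""
    mask = ~np.isnan(actual)                  # only compare where a true rating exists
    diff = actual[mask] - predicted[mask]
    return float(np.mean(diff ** 2))

actual = np.array([[4.0, np.nan], [np.nan, 2.0]])   # held-out ratings
predicted = np.array([[3.5, 1.0], [4.0, 3.0]])      # model output for every cell
mse = masked_mse(actual, predicted)
```

Which cells enter the average matters: including unrated cells in the denominator would change the reported value considerably.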


### My implementation is fairly slow to train, but it performs well even on this relatively small dataset