# Non-negative Matrix Factorization (Daniel Pak and Harry Wei)

**For** this assignment, we built a recommendation model using non-negative matrix factorization (NMF). We chose the Surprise library (http://surpriselib.com/), which provides well-built recommendation algorithms, including NMF. Surprise ships with several built-in datasets suited to recommender systems, and we selected the MovieLens 100k dataset.

**We** imported the NMF model, then tuned and cross-validated it to select the best values for the number of latent dimensions, the regularization parameters, and other hyperparameters.
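
As a rough illustration of the idea behind NMF, the sketch below factors a small, fully observed ratings matrix into two non-negative matrices using the classic Lee-Seung multiplicative updates. This is not how Surprise fits its NMF model (Surprise trains only on the observed ratings and adds regularization); the toy matrix, the function name `nmf_multiplicative`, and its parameters are made up for this example.

```python
import numpy as np


def nmf_multiplicative(V, n_factors=2, n_iter=200, seed=0, eps=1e-9):
    """Approximate a non-negative matrix V as W @ H with W, H >= 0.

    Hypothetical helper: dense Lee-Seung multiplicative updates,
    not the algorithm used inside Surprise.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    # Start from strictly positive random factors
    W = rng.random((n_rows, n_factors)) + eps
    H = rng.random((n_factors, n_cols)) + eps
    for _ in range(n_iter):
        # Multiplying by non-negative ratios keeps every entry non-negative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy ratings: rows are users, columns are movies (values invented for illustration)
V = np.array(
    [
        [5, 4, 1, 1],
        [4, 5, 1, 2],
        [1, 1, 5, 4],
        [2, 1, 4, 5],
    ],
    dtype=float,
)

W, H = nmf_multiplicative(V)
nmf_error = np.linalg.norm(V - W @ H)
baseline_error = np.linalg.norm(V - V.mean())  # error of predicting the global mean

print(np.round(W @ H, 2))
print(f"NMF error: {nmf_error:.3f}, mean-only error: {baseline_error:.3f}")
```

Each predicted rating is the dot product of a user row of `W` and a movie column of `H`; with two latent factors the toy example can already separate the two groups of users.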

## Importing All Modules and the Dataset

In [None]:
!pip3 install surprise



In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import surprise
from surprise import Dataset, accuracy
from surprise.model_selection.search import GridSearchCV
from surprise.prediction_algorithms.matrix_factorization import NMF
from sklearn.model_selection import train_test_split
from time import time
import copy
import os

# Load the MovieLens 100k dataset and hold out 20% of the ratings as a test set
data = Dataset.load_builtin("ml-100k")
ratings = data.raw_ratings
X_train, X_test = train_test_split(ratings, test_size=0.2, random_state=0)

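## Basic Recommendation System Using the MovieLens Dataset

This section creates and trains a recommendation system using non-negative matrix factorization. GridSearchCV exhaustively evaluates every combination in the parameter grid with 3-fold cross-validation and reports the combination with the lowest MSE. The wider candidate ranges are commented out in the cell below, so the grid as written contains a single combination.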
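
To get a feel for what an exhaustive search costs, the sketch below expands the commented-out candidate ranges from the next cell into explicit combinations and counts the model fits that 3-fold cross-validation would require. It only enumerates the grid and does not call Surprise; the variable names are chosen for this example.

```python
from itertools import product

import numpy as np

# Candidate ranges taken from the commented-out lines in the grid-search cell
param_grid = {
    "n_factors": np.linspace(100, 1000, 5).astype(int).tolist(),
    "n_epochs": [5, 10, 20, 25, 30],
    "reg_pu": np.linspace(0.1, 0.5, 5).tolist(),
    "reg_qi": np.linspace(0.15, 0.2, 5).tolist(),
}

# The Cartesian product of all value lists is the full grid
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

cv_folds = 3  # same number of folds as cv=3 in the grid search
n_fits = len(combinations) * cv_folds

print(f"{len(combinations)} combinations, {n_fits} model fits")
```

The grid grows multiplicatively with each added hyperparameter, which is one reason to narrow the ranges before running the full search.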

In [None]:
# Restrict the dataset to the training ratings so the grid search never sees the test set
data.raw_ratings = X_train

# Wider candidate ranges, left commented out
# factors_val = np.linspace(100,1000,5).astype('int')
# epoch_val = [5, 10, 20, 25, 30]
# p_val = np.linspace(0.1,0.5,5)
# q_val = np.linspace(0.15,0.2,5)

# Grid actually searched here: a single combination
factors_val = [550]
epoch_val = [20]
p_val = [0.3]
q_val = [0.18]

# NMF hyperparameters: https://surprise.readthedocs.io/en/stable/matrix_factorization.html
parameters = {'n_factors': factors_val, 'n_epochs': epoch_val, 'reg_pu': p_val, 'reg_qi': q_val}
gs = GridSearchCV(NMF, parameters, measures=['mse'], cv=3, n_jobs=-1)
# Training model using the training set
t0 = time()
gs.fit(data)
tf = time()
print(f"Training finished in {tf-t0} seconds.")

# Print the lowest cross-validated MSE and the hyperparameters that produced it
print(gs.best_score['mse'])
print(gs.best_params['mse'])

Training finished in 82.9665150642395 seconds.
0.9143814647209784
{'n_factors': 550, 'n_epochs': 20, 'reg_pu': 0.3, 'reg_qi': 0.18}


### Prediction and test scores
The trained recommendation system predicts every rating in the training set, and the mean squared error (MSE) between the predicted and actual ratings is calculated and displayed. The same is done for the held-out test set. The printed label says "Accuracy", but the value is the MSE, so lower is better.

In [None]:
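# best_estimator holds an unfitted NMF instance configured with the best hyperparameters
model = gs.best_estimator['mse']
# Refit on the full training split, then predict every rating in it
trainset = data.build_full_trainset()
model.fit(trainset)
train_predict = model.test(trainset.build_testset())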
print(f"Accuracy of Training Set is {accuracy.mse(train_predict, verbose=False)}")

Accuracy of Training Set is 0.8083042015083123


In [None]:
testset = data.construct_testset(X_test)
test_predict = model.test(testset)

print(f"Accuracy of Test Set is {accuracy.mse(test_predict, verbose=False)}")

Accuracy of Test Set is 0.8985747233268002
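
To make the metric concrete, here is a minimal sketch that computes the MSE by hand, along with its square root (RMSE), which is expressed in rating units. The five actual/predicted pairs are copied from the first rows of the test-set table in this notebook; five rows are far too few to reproduce the reported score, so the result is only illustrative.

```python
import numpy as np

# Actual and predicted ratings for five test-set rows from this notebook
actual = np.array([4.0, 5.0, 3.0, 3.0, 3.0])
predicted = np.array([3.986150, 3.972350, 2.368556, 3.144461, 2.665321])

errors = predicted - actual
mse = float(np.mean(errors ** 2))  # mean of the squared errors
rmse = float(np.sqrt(mse))         # same scale as the ratings themselves

print(f"MSE:  {mse:.4f}")
print(f"RMSE: {rmse:.4f}")
```

Squaring means a single large miss, such as predicting about 4 for a 5-star rating, dominates the average.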


### Interpretation of movie recommendations
The following table shows the predicted and actual ratings side by side for the test set. The second table shows the same data sorted by user ID. Because the user IDs are strings, the order is lexicographic, so user 99 appears last.

In [None]:
df_test = pd.DataFrame(test_predict)
df_test = df_test.drop(["details"], axis=1)
df_test.columns = ["User ID", "Movie ID", "Actual Movie Ratings", "Predicted Movie Ratings"]

# Load the movie titles; columns 0 and 1 hold the movie ID and the title
movies = pd.read_csv("movie_item.txt", sep="|", header=None)
movies = movies[[0, 1]]
movies.columns = ["Movie ID", "Movie Title"]
# Key the lookup on the movie ID itself (IDs in the file start at 1, row positions at 0)
title_by_id = movies.set_index("Movie ID")["Movie Title"]
df_test["Movie ID"] = df_test["Movie ID"].astype("int").map(title_by_id)
display(df_test)

| | User ID | Movie ID | Actual Movie Ratings | Predicted Movie Ratings |
|---|---|---|---|---|
| 0 | 23 | My Life as a Dog (Mitt liv som hund) (1985) | 4.0 | 3.986150 |
| 1 | 695 | Jungle2Jungle (1997) | 5.0 | 3.972350 |
| 2 | 774 | Batman Forever (1995) | 3.0 | 2.368556 |
| 3 | 417 | Lord of Illusions (1995) | 3.0 | 3.144461 |
| 4 | 234 | Drop Dead Fred (1991) | 3.0 | 2.665321 |
| ... | ... | ... | ... | ... |
| 19995 | 659 | Copycat (1995) | 3.0 | 3.574292 |
| 19996 | 14 | Better Off Dead... (1985) | 5.0 | 4.259313 |
| 19997 | 629 | High Noon (1952) | 5.0 | 4.097314 |
| 19998 | 892 | Field of Dreams (1989) | 2.0 | 3.721906 |


In [None]:
df_test.sort_values(by='User ID')

| | User ID | Movie ID | Actual Movie Ratings | Predicted Movie Ratings |
|---|---|---|---|---|
| 11783 | 1 | Love Bug, The (1969) | 1.0 | 2.799149 |
| 3660 | 1 | Carlito's Way (1993) | 4.0 | 3.169712 |
| 7645 | 1 | Jude (1996) | 2.0 | 3.272450 |
| 3126 | 1 | Beavis and Butt-head Do America (1996) | 4.0 | 3.599938 |
| 13431 | 1 | Mimic (1997) | 1.0 | 2.375029 |
| ... | ... | ... | ... | ... |
| 8990 | 99 | Wrong Trousers, The (1993) | 5.0 | 4.142914 |
| 14046 | 99 | Brothers McMullen, The (1995) | 3.0 | 3.446930 |
| 13110 | 99 | Back to the Future (1985) | 4.0 | 3.889419 |
| 14708 | 99 | Independence Day (ID4) (1996) | 2.0 | 2.210188 |
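
One way to read a table like this is to add an absolute-error column and summarize it per user. The sketch below does so on the nine visible rows of the sorted table (users 1 and 99); the column name "Absolute Error" and the variable names are introduced for this example, and nine rows say little about the model as a whole.

```python
import pandas as pd

# Visible rows of the sorted table; "Movie ID" holds the title, as in the notebook
rows = [
    ("1", "Love Bug, The (1969)", 1.0, 2.799149),
    ("1", "Carlito's Way (1993)", 4.0, 3.169712),
    ("1", "Jude (1996)", 2.0, 3.272450),
    ("1", "Beavis and Butt-head Do America (1996)", 4.0, 3.599938),
    ("1", "Mimic (1997)", 1.0, 2.375029),
    ("99", "Wrong Trousers, The (1993)", 5.0, 4.142914),
    ("99", "Brothers McMullen, The (1995)", 3.0, 3.446930),
    ("99", "Back to the Future (1985)", 4.0, 3.889419),
    ("99", "Independence Day (ID4) (1996)", 2.0, 2.210188),
]
columns = ["User ID", "Movie ID", "Actual Movie Ratings", "Predicted Movie Ratings"]
sample = pd.DataFrame(rows, columns=columns)

# How far each prediction is from the actual rating, ignoring direction
sample["Absolute Error"] = (
    sample["Predicted Movie Ratings"] - sample["Actual Movie Ratings"]
).abs()

# Mean absolute error per user, and the three worst individual predictions
per_user = sample.groupby("User ID")["Absolute Error"].mean()
worst = sample.sort_values("Absolute Error", ascending=False).head(3)

print(per_user)
print(worst[["User ID", "Movie ID", "Absolute Error"]])
```

In these rows the predictions for 1-star and 2-star ratings are pulled toward the middle of the scale, which is where the largest errors come from.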
