In [11]:
import pandas as pd
import numpy as np
from math import sqrt
from scipy.sparse import coo_matrix
from collections import namedtuple
from sklearn.decomposition import NMF
from sklearn.metrics import mean_squared_error

MV_users = pd.read_csv('./data/users.csv')
MV_movies = pd.read_csv('./data/movies.csv')
train = pd.read_csv('./data/train.csv')
test = pd.read_csv('./data/test.csv')

Data = namedtuple('Data', ['users','movies','train','test'])
data = Data(MV_users, MV_movies, train, test)

In [12]:
# prepare the data for NMF processing
user_list = list(data.users['uID'])
movie_list = list(data.movies['mID'])
genres = list(data.movies.columns.drop(['mID', 'title', 'year']))
# Map each user/movie ID to a row/column index
mid2idx = dict(zip(data.movies.mID, range(len(data.movies))))
uid2idx = dict(zip(data.users.uID, range(len(data.users))))

# Build the dense user x movie rating matrix from the train data via a sparse COO matrix.
# Every (user, movie) pair without a rating in train ends up as 0.
ind_movie = [mid2idx[x] for x in data.train.mID]
ind_user = [uid2idx[x] for x in data.train.uID]
train_rating = list(data.train.rating)
rating_matrix = coo_matrix(
    (train_rating, (ind_user, ind_movie)),
    shape=(len(user_list), len(movie_list)),
).toarray()

In [13]:
nmf_model = NMF(n_components=20, init='random', random_state=0, max_iter=1000)
W = nmf_model.fit_transform(rating_matrix)
H = nmf_model.components_

In [15]:
# make predictions
pred = np.dot(W, H)
rating_pred_test = pred[[uid2idx[x] for x in data.test.uID], [mid2idx[x] for x in data.test.mID]]

# calculate RMSE
rmse = np.sqrt(mean_squared_error(list(data.test.rating), rating_pred_test))
print('RMSE value is: ', rmse)

RMSE value is:  2.861970909347182


Summary:
* Non-negative matrix factorization (NMF) with sklearn performed significantly worse than the baseline, content-based, and collaborative methods, with a test RMSE of 2.86.
* Likely contributors are the random initialization, the number of latent factors chosen (n_components=20), and the sparsity of the rating matrix: most user/movie pairs have no rating.
* The sparsity hurts because the dense matrix stores every unknown rating as 0, and sklearn's NMF fits those zeros like real ratings. The missing entries act as false negatives, so the factor updates are pulled toward 0 and the predicted ratings come out too low.
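
The effect of zero-filling can be illustrated on synthetic data. The sketch below is not the notebook's data: the matrix sizes, the 10% observation rate, and all variable names are assumptions chosen for illustration. It builds a low-rank matrix of ratings between 1 and 5, hides most entries as zeros, fits sklearn's NMF the same way as above, and compares predictions on the hidden entries with their true values.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
n_users, n_items, rank = 200, 100, 5  # hypothetical sizes

# Synthetic low-rank "true" ratings scaled into the range [1, 5]
low_rank = (rng.random((n_users, rank)) @ rng.random((rank, n_items))) / rank
true_ratings = 1 + 4 * low_rank

# Keep roughly 10% of the ratings; every other cell is stored as 0
observed = rng.random(true_ratings.shape) < 0.1
zero_filled = np.where(observed, true_ratings, 0.0)

# Fit NMF on the zero-filled matrix, so the zeros are treated as ratings
model = NMF(n_components=rank, init='random', random_state=0, max_iter=500)
W = model.fit_transform(zero_filled)
H = model.components_
pred = W @ H

# Evaluate only on the entries that were hidden from the model
held_out = ~observed
mean_true = true_ratings[held_out].mean()
mean_pred = pred[held_out].mean()
rmse_held_out = np.sqrt(np.mean((true_ratings[held_out] - pred[held_out]) ** 2))

print('mean true rating on held-out entries:     ', mean_true)
print('mean predicted rating on held-out entries:', mean_pred)
print('held-out RMSE:                            ', rmse_held_out)
```

If the predicted mean falls well below the true mean, the model is reproducing the filled-in zeros rather than the rating pattern, which is consistent with the large RMSE seen above.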

Some fixes to consider:
* Fill unknown ratings with a sensible default (for example a user or movie mean) before applying NMF, or use a loss function that ignores the unknown ratings.
* Tune the NMF hyperparameters, such as the number of components and the initialization method, against a validation split.
* Try hybrid approaches that combine NMF with the content-based or collaborative methods.
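
As a rough sketch of the "ignore the unknown ratings" idea, the code below factorizes a matrix with multiplicative updates that are weighted by a 0/1 mask, so only observed cells contribute to the squared error. This is a swapped-in technique, not something sklearn's NMF does. The function names masked_nmf and masked_rmse, the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np

def masked_nmf(R, mask, n_components=5, n_iter=200, eps=1e-9, seed=0):
    """Factor R ~ W @ H with non-negative W, H, fitting only cells where mask == 1."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    W = rng.random((n_users, n_components)) + 0.1
    H = rng.random((n_components, n_items)) + 0.1
    MR = mask * R  # observed ratings, zeros elsewhere
    for _ in range(n_iter):
        # Masked multiplicative updates: unobserved cells are zeroed out on
        # both sides, so they never influence the factors
        W *= (MR @ H.T) / ((mask * (W @ H)) @ H.T + eps)
        H *= (W.T @ MR) / (W.T @ (mask * (W @ H)) + eps)
    return W, H

def masked_rmse(R, mask, W, H):
    """RMSE between R and W @ H over the cells where mask == 1."""
    diff = mask * (R - W @ H)
    return np.sqrt((diff ** 2).sum() / mask.sum())

# Synthetic example (assumed data): low-rank ratings in [1, 5], about 30% observed
rng = np.random.default_rng(0)
true_R = 1 + 4 * (rng.random((60, 3)) @ rng.random((3, 40))) / 3
mask = (rng.random(true_R.shape) < 0.3).astype(float)

W1, H1 = masked_nmf(true_R, mask, n_iter=1)
W, H = masked_nmf(true_R, mask, n_iter=300)

print('observed-cell RMSE after 1 iteration:   ', masked_rmse(true_R, mask, W1, H1))
print('observed-cell RMSE after 300 iterations:', masked_rmse(true_R, mask, W, H))
print('held-out RMSE after 300 iterations:     ', masked_rmse(true_R, 1 - mask, W, H))
```

For the notebook's data, the mask could be built as (rating_matrix > 0).astype(float), assuming every real rating is strictly positive so that 0 only ever marks a missing entry. The number of components and iterations would still need tuning on a validation split.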