# Movie recommender with multinomial RBM (Python, TensorFlow, GPU)

A Restricted Boltzmann Machine (RBM) is used to perform collaborative filtering over the MovieLens dataset.
The RBM is a generative model: it learns the joint probability distribution $P(v,h)$, where $v$ are the visible units and $h$ the hidden ones. The hidden units are latent variables, while the visible units are clamped to the input data. The model generates ratings for a user/movie pair using a collaborative filtering approach.
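
To make the roles of the visible and hidden units concrete, here is a minimal NumPy sketch of one Gibbs step in a binary RBM. It is not the `reco_utils` implementation: the names `sample_hidden`, `sample_visible`, `W`, `b_v` and `b_h` are hypothetical, and the weights are random rather than trained.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))


def sample_hidden(v, W, b_h, rng):
    """Return P(h=1|v) and a Bernoulli sample of the hidden units."""
    p_h = sigmoid(v @ W + b_h)
    h = (rng.uniform(size=p_h.shape) < p_h).astype(np.float64)
    return p_h, h


def sample_visible(h, W, b_v, rng):
    """Return P(v=1|h) and a Bernoulli sample of the visible units."""
    p_v = sigmoid(h @ W.T + b_v)
    v = (rng.uniform(size=p_v.shape) < p_v).astype(np.float64)
    return p_v, v


rng = np.random.RandomState(0)
n_visible, n_hidden = 6, 3  # toy sizes, chosen for illustration only

# Untrained (random) parameters: hypothetical stand-ins for a fitted model
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
b_v = np.zeros(n_visible)
b_h = np.zeros(n_hidden)

# Clamp the visible units on one input vector, then run one Gibbs step
v0 = np.array([[1, 0, 1, 1, 0, 0]], dtype=np.float64)
p_h, h0 = sample_hidden(v0, W, b_h, rng)
p_v, v1 = sample_visible(h0, W, b_v, rng)

print("P(h=1|v0):", p_h)
print("reconstructed v:", v1)
```

Because the visible units are conditionally independent given the hidden ones (and vice versa), each half-step is a single matrix product followed by an element-wise sigmoid.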

The dataset contains 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040 MovieLens users. The movies are rated on a scale from 1 to 5. In the first iteration of this notebook we consider a simplified version of the problem, implementing a binary encoding of the dataset. This choice lets us use a more traditional binary RBM instead of a multinomial one; see below for further discussion.
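
As an illustration of what a binary encoding could look like, the sketch below maps 1-5 ratings onto "disliked"/"liked" and pivots them into a user/movie matrix. The threshold (`LIKE_THRESHOLD = 3`) and the coding 0 = unrated, 1 = disliked, 2 = liked are assumptions for this example; the coding was chosen only to agree with the later cell that selects liked movies with `== 2`. The toy data is made up.

```python
import numpy as np
import pandas as pd

# Toy ratings using the same column names as this notebook
toy = pd.DataFrame({
    "userID": [1, 1, 2, 2, 3],
    "MovieId": [10, 20, 10, 30, 20],
    "Rating": [5, 2, 3, 4, 1],
})

LIKE_THRESHOLD = 3  # assumption: ratings >= 3 count as "liked"

# 2 = liked, 1 = disliked; 0 is reserved for unrated movies
toy["Binary"] = np.where(toy["Rating"] >= LIKE_THRESHOLD, 2, 1)

# Users as rows, movies as columns, 0 wherever the user gave no rating
matrix = (
    toy.pivot(index="userID", columns="MovieId", values="Binary")
    .fillna(0)
    .astype(int)
)
print(matrix)
```

Keeping 0 for "unrated" separate from 1 for "disliked" matters: a missing rating carries no preference signal and should not be treated as a negative one.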



## 0 Global Settings and Import

In [3]:
#load libraries

from __future__ import print_function
from __future__ import absolute_import
from __future__ import division

# set the environment path to find Recommenders
import sys
sys.path.append("../../")

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

import papermill as pm
from zipfile import ZipFile

from reco_utils.recommender.rbm.Mrbm_tensorflow import RBM
from reco_utils.dataset.python_splitters import python_stratified_split
from reco_utils.dataset.url_utils import maybe_download
from reco_utils.evaluation.python_evaluation import map_at_k, ndcg_at_k, precision_at_k, recall_at_k

#For interactive mode only
%load_ext autoreload
%autoreload 2

print("System version: {}".format(sys.version))
print("Pandas version: {}".format(pd.__version__))

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
System version: 3.6.6 |Anaconda custom (64-bit)| (default, Jun 28 2018, 11:27:44) [MSC v.1900 64 bit (AMD64)]
Pandas version: 0.23.4


We first load the movies and the users' ratings.

In [10]:
#Load the movielens dataset
path1 = 'C:/Users/mimillet/OneDrive - Microsoft/Recommender Systems/Recommenders/reco_utils/recommender/rbm/movielens_data/movies.dat'

movies_df = pd.read_csv(path1, sep='::', header=None, engine='python', names=['MovieId', 'Title', 'Genre'])  # movies dataset

#inspect first entries
movies_df.head()

Unnamed: 0,MovieId,Title,Genre
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy


In [13]:
#load the ratings
path2 ='C:/Users/mimillet/OneDrive - Microsoft/Recommender Systems/Recommenders/reco_utils/recommender/rbm/movielens_data/ratings.dat'
ratings_df = pd.read_csv(path2, sep='::', header=None, engine='python', names=['userID', 'MovieId', 'Rating', 'Timestamp'])

ratings_df.head()

Unnamed: 0,userID,MovieId,Rating,Timestamp
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291


## 1 Split the data using the stratified Python splitter

In [14]:
train, test = python_stratified_split(ratings_df)
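
For readers unfamiliar with stratified splitting, the sketch below shows the general idea: each user's ratings are split separately, so every user appears in both the train and the test set. This is not the code of `python_stratified_split`; the function name `stratified_split_sketch`, the 0.75 ratio and the seed are assumptions made for the example.

```python
import pandas as pd


def stratified_split_sketch(df, ratio=0.75, col_user="userID", seed=42):
    """Split each user's ratings into train/test with roughly `ratio` in train."""
    # Shuffle all rows so the per-user selection is random
    shuffled = df.sample(frac=1.0, random_state=seed)
    # Position of each row within its user's (shuffled) ratings: 0, 1, 2, ...
    rank = shuffled.groupby(col_user).cumcount()
    # Number of ratings of the user each row belongs to
    size = shuffled.groupby(col_user)[col_user].transform("count")
    in_train = rank < (size * ratio).round()
    return shuffled[in_train], shuffled[~in_train]


# Toy data: two users with four ratings each
toy = pd.DataFrame({
    "userID": [1, 1, 1, 1, 2, 2, 2, 2],
    "MovieId": [10, 20, 30, 40, 10, 20, 30, 40],
    "Rating": [5, 3, 4, 1, 2, 5, 4, 3],
})

toy_train, toy_test = stratified_split_sketch(toy)
print(toy_train.groupby("userID").size())
print(toy_test.groupby("userID").size())
```

With four ratings per user and a ratio of 0.75, each user contributes three ratings to the train set and one to the test set.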

## 2 Train the RBM model






In [15]:
header = {
        "col_user": "userID",
        "col_item": "MovieId",
        "col_rating": "Rating",
    }


model = RBM(hidden_units=1000, keep_prob=0.7, training_epoch=10, **header)


In [17]:
model.fit(train)

Building user affinity sparse matrix...
Collecting user affinity matrix...
Creating the computational graph


## 3 Recommendation report

Once the system has been trained, we produce a recommendation report for a given user. Below we select a user and list the k movies with the highest score. Note that these suggestions may include movies the user has already seen. We can also provide a list of recommended movies that the user has not seen yet.
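
The two kinds of report differ only in whether already-seen movies are masked out before ranking. A minimal sketch of that selection step, using made-up scores and a hypothetical helper `top_k_items` (not part of `reco_utils`), could look like this:

```python
import numpy as np


def top_k_items(scores, seen, k, exclude_seen=True):
    """Return the indices and scores of the k highest-scoring items."""
    scores = np.asarray(scores, dtype=float).copy()
    if exclude_seen:
        # Seen items can never be selected once their score is -inf
        scores[np.asarray(seen, dtype=bool)] = -np.inf
    top = np.argsort(-scores)[:k]  # sort in descending order of score
    return top, scores[top]


# Made-up scores for five movies and a flag for the ones already seen
scores = np.array([0.74, 0.10, 0.73, 0.55, 0.72])
seen = np.array([True, False, False, False, True])

idx_all, _ = top_k_items(scores, seen, k=3, exclude_seen=False)
idx_new, _ = top_k_items(scores, seen, k=3)

print("top 3, possibly seen:", idx_all)
print("top 3, unseen only:  ", idx_new)
```

Masking with `-np.inf` rather than dropping entries keeps the returned indices aligned with the original movie columns.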

In [11]:
# Recommended movies, possibly including ones the user has already seen.
# Select a user by changing the id below; usr_id is a zero-based row index, so usr_id = 0 is userID 1.
# NOTE: vp and pvh are not defined in the cells shown here; they are assumed to come from an earlier run of the model.

usr_id = 0
k = 10

usr_mv_like = np.where(vp[usr_id, :] == 2)  # column indices of the movies flagged as liked (value 2) for this user
mv_id = (np.asanyarray(usr_mv_like) + 1).flatten()  # shift the zero-based indices to movie ids

MVI = np.in1d(movies_df['MovieId'].values, mv_id)  # boolean mask of the liked movies

# add the recommendation score; .copy() avoids pandas' SettingWithCopyWarning
sel_movie = movies_df[MVI].copy()
sel_movie['reco score'] = pvh[usr_id, usr_mv_like, 1].flatten()

# order the movies according to their score
sel_movie.sort_values(['reco score'], ascending=False).head(k)


Unnamed: 0,MovieId,Title,Genre,reco score
2502,2571,"Matrix, The (1999)",Action|Sci-Fi|Thriller,0.738748
1192,1210,Star Wars: Episode VI - Return of the Jedi (1983),Action|Adventure|Romance|Sci-Fi|War,0.737414
0,1,Toy Story (1995),Animation|Children's|Comedy,0.737017
604,608,Fargo (1996),Crime|Drama|Thriller,0.734234
907,919,"Wizard of Oz, The (1939)",Adventure|Children's|Drama|Musical,0.732835
1959,2028,Saving Private Ryan (1998),Action|Drama|War,0.73275
2647,2716,Ghostbusters (1984),Comedy|Horror,0.732745
2559,2628,Star Wars: Episode I - The Phantom Menace (1999),Action|Adventure|Fantasy|Sci-Fi,0.732674
589,593,"Silence of the Lambs, The (1991)",Drama|Thriller,0.732558
2693,2762,"Sixth Sense, The (1999)",Thriller,0.732054


In [12]:
# Recommended movies that the user has not seen yet.
# Reuses usr_id, k and usr_mv_like from the previous cell.
# NOTE: RX is not defined in the cells shown here; it is assumed to hold the original ratings, with 0 marking an unrated movie.

mv_unseen = np.where(RX[usr_id] == 0)  # indices of the movies the user has not rated
mv_id_un = np.intersect1d(usr_mv_like, mv_unseen)  # flagged as liked and not yet seen
mv_id_unseen = mv_id_un + 1  # shift the zero-based indices to movie ids

MVI_unseen = np.in1d(movies_df['MovieId'].values, mv_id_unseen)

# .copy() avoids pandas' SettingWithCopyWarning
sel_unseen_movie = movies_df[MVI_unseen].copy()
sel_unseen_movie['reco score'] = pvh[usr_id, mv_id_un, 1].flatten()

sel_unseen_movie.sort_values(['reco score'], ascending=False).head(k)


Unnamed: 0,MovieId,Title,Genre,reco score
2502,2571,"Matrix, The (1999)",Action|Sci-Fi|Thriller,0.738748
1192,1210,Star Wars: Episode VI - Return of the Jedi (1983),Action|Adventure|Romance|Sci-Fi|War,0.737414
2647,2716,Ghostbusters (1984),Comedy|Horror,0.732745
2559,2628,Star Wars: Episode I - The Phantom Menace (1999),Action|Adventure|Fantasy|Sci-Fi,0.732674
589,593,"Silence of the Lambs, The (1991)",Drama|Thriller,0.732558
642,648,Mission: Impossible (1996),Action|Adventure|Mystery,0.731406
1203,1221,"Godfather: Part II, The (1974)",Action|Crime|Drama,0.731205
453,457,"Fugitive, The (1993)",Action|Thriller,0.731133
1271,1291,Indiana Jones and the Last Crusade (1989),Action|Adventure,0.730507
1245,1265,Groundhog Day (1993),Comedy|Romance,0.730242


We can check whether this result makes sense by looking at the movies originally rated by user 1. The recommendations are a good mixture of Action/Sci-Fi, Comedy and Drama, although they contain few children's movies. For example, the user watched Star Wars: Episode IV and the model suggests watching Episode VI. If you wonder why Episode V is missing, that is a good question! Personally, my favourites of the old series are IV and VI, as seems to be the case for many people, so it may not be totally unreasonable.

In [13]:
merged_df = pd.merge(movies_df, ratings_df[['userID', 'MovieId', 'Rating']], on='MovieId')
merged_df[merged_df['userID'] == 1]

Unnamed: 0,MovieId,Title,Genre,userID,Rating
0,1,Toy Story (1995),Animation|Children's|Comedy,1,5
22893,48,Pocahontas (1995),Animation|Children's|Musical|Romance,1,5
41541,150,Apollo 13 (1995),Drama,1,5
67447,260,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Fantasy|Sci-Fi,1,4
141660,527,Schindler's List (1993),Drama|War,1,5
144754,531,"Secret Garden, The (1993)",Children's|Drama,1,4
158459,588,Aladdin (1992),Animation|Children's|Comedy|Musical,1,4
167921,594,Snow White and the Seven Dwarfs (1937),Animation|Children's|Musical,1,4
168684,595,Beauty and the Beast (1991),Animation|Children's|Musical,1,5
172011,608,Fargo (1996),Crime|Drama|Thriller,1,4


The analysis of the multinomial case will be presented later.