# Recommending movies using Restricted Boltzmann Machines

\- [Saurabh Mathur](saurabhmathur96.github.io)

The aim of this experiment is to recommend movies to a user, given their earlier ratings. To accomplish this, I am using a modified version of the Restricted Boltzmann Machine (RBM) model described by [Ruslan Salakhutdinov, Andriy Mnih and Geoffrey Hinton](http://www.machinelearning.org/proceedings/icml2007/papers/407.pdf) in *Restricted Boltzmann Machines for Collaborative Filtering* (ICML 2007).

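```python
# this notebook uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session);
# under TensorFlow 2, `import tensorflow.compat.v1 as tf` followed by
# `tf.disable_v2_behavior()` provides the same API
import tensorflow as tf
```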

I will be using the [MovieLens 1M dataset](http://grouplens.org/datasets/movielens/), released in February 2003. It contains about 1 million ratings from 6,040 users on about 3,900 movies.

I have downloaded the data into the `data/raw/ml-1m` directory (see `scripts/fetch.py`).

I have also pre-processed the ratings data and converted it into a user by movie matrix, stored in `data/intermediate`.
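The pre-processing script is not shown here. Below is a minimal sketch of one way to build such a matrix, assuming the MovieLens 1M `ratings.dat` line format (`UserID::MovieID::Rating::Timestamp`), user IDs that run contiguously from 1, and a contiguous movie index like the `ContinuousMovieID` column. The helper `build_user_movie_matrix` is hypothetical, not the author's script.

```python
import io

import numpy as np
import pandas as pd
import scipy.sparse


def build_user_movie_matrix(ratings_file, movie_ids):
    """Build a sparse users x movies matrix of raw 1-5 ratings.

    ratings_file: file-like object in the "::"-separated ratings.dat format.
    movie_ids: every MovieID in catalogue order; the position in this list
    becomes the movie's column (assumed to play the role of ContinuousMovieID).
    """
    ratings = pd.read_csv(
        ratings_file,
        sep="::",
        engine="python",  # multi-character separators need the python engine
        names=["UserID", "MovieID", "Rating", "Timestamp"],
    )
    column_of = {movie_id: i for i, movie_id in enumerate(movie_ids)}
    rows = ratings["UserID"].to_numpy() - 1  # assumption: UserIDs are 1..n_users
    cols = ratings["MovieID"].map(column_of).to_numpy()
    shape = (rows.max() + 1, len(movie_ids))
    data = ratings["Rating"].to_numpy().astype(np.float32)
    return scipy.sparse.coo_matrix((data, (rows, cols)), shape=shape).tocsr()


# tiny in-memory example in the ratings.dat format
sample = io.StringIO("1::1::5::978300760\n1::3::3::978302109\n2::2::4::978301968\n")
R_small = build_user_movie_matrix(sample, movie_ids=[1, 2, 3])
print(R_small.toarray())
# scipy.io.mmwrite(path, R_small) would save it in the Matrix Market format read below
```

Unrated (user, movie) pairs are simply not stored, so they read back as zeros when the matrix is densified.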

```python
import pandas as pd

movies = pd.read_csv("../data/intermediate/movies.csv", index_col=0)

movies.head()
```

| | MovieID | Title | Genres | ContinuousMovieID |
|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Animation\|Children's\|Comedy | 0 |
| 1 | 2 | Jumanji (1995) | Adventure\|Children's\|Fantasy | 1 |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance | 2 |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama | 3 |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy | 4 |


```python
import scipy.io

# sparse users x movies rating matrix, converted to CSR for fast row slicing
R = scipy.io.mmread("../data/intermediate/user_movie_ratings.mtx").tocsr()

print('{0}x{1} user by movie matrix'.format(*R.shape))
```

```
6040x3883 user by movie matrix
```
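Most entries of this matrix are empty. One way to quantify that is the density: the fraction of entries that are actually stored. The helper `density` below is a hypothetical sketch that works on any SciPy sparse matrix; it is shown on a toy matrix, and `density(R)` would give the figure for the real data.

```python
import numpy as np
import scipy.sparse


def density(matrix):
    """Fraction of the entries of a sparse matrix that are explicitly stored."""
    n_rows, n_cols = matrix.shape
    return matrix.nnz / (n_rows * n_cols)


# csr_matrix built from a dense array stores only the non-zero ratings
toy = scipy.sparse.csr_matrix(np.array([[5, 0, 0, 3],
                                        [0, 0, 4, 0]], dtype=np.float32))
print("density = {0:.3f}".format(density(toy)))  # 3 of the 8 entries are rated
```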


## Restricted Boltzmann Machines

Restricted Boltzmann Machine, [Source: Asimov Institute's Neural Network Zoo](http://www.asimovinstitute.org/neural-network-zoo/)
![Restricted Boltzmann Machine](../images/rbm.png)

The visible nodes are shown in yellow and the hidden nodes in green.
Data flows through an RBM as follows:
- Input data is fed into the visible nodes.
- Hidden node activations are computed from the visible nodes through the weights.
- The direction of propagation is then reversed.
- Visible node activations are recomputed from the hidden nodes, giving a reconstruction of the input.
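The two passes in the list above can be sketched in plain NumPy. This minimal sketch uses activation probabilities only (no sampling), which is also what the recommendation step later in this notebook does; the sizes and random weights are arbitrary toy values.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def reconstruct(v, W, v_bias, h_bias):
    """One visible -> hidden -> visible pass using activation probabilities."""
    h = sigmoid(v @ W + h_bias)        # hidden activations, shape (batch, n_hidden)
    return sigmoid(h @ W.T + v_bias)   # reconstruction, shape (batch, n_visible)


rng = np.random.default_rng(0)
toy_W = rng.normal(scale=0.1, size=(6, 3))             # 6 visible, 3 hidden units
toy_v = rng.integers(0, 2, size=(2, 6)).astype(float)  # two binary input vectors
toy_recon = reconstruct(toy_v, toy_W, np.zeros(6), np.zeros(3))
print(toy_recon.shape)
```

The same weight matrix is used in both directions (transposed on the way back), which is what makes the connections symmetric.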

More formally, the RBM can be shown as:

Restricted Boltzmann Machine, [Source: Deep Learning Tutorial part 3/3: Deep Belief Networks](https://lazyprogrammer.me/deep-learning-tutorial-part-33-deep-belief/)
![Restricted Boltzmann Machine](../images/rbm_vars.png)

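The energy of the RBM is defined as:

$$ E(v, h) = -\sum_i a_i \cdot v_i - \sum_j b_j \cdot h_j - \sum_i \sum_j v_i \cdot w_{ij} \cdot h_j $$

where $v_i$ and $h_j$ are the states of visible unit $i$ and hidden unit $j$, $a_i$ and $b_j$ are their biases (`v_bias` and `h_bias` in the code below), and $w_{ij}$ is the weight between them (`W`). Lower-energy configurations are more probable, since $p(v, h) \propto e^{-E(v, h)}$.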
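As a check on the signs, here is the standard RBM energy, in which all three terms carry a minus sign, evaluated in NumPy for a single configuration. `a`, `b` and `w` follow the symbols of the equation; the toy values are arbitrary.

```python
import numpy as np


def rbm_energy(v, h, a, b, w):
    """E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_i sum_j v_i w_ij h_j."""
    return -(a @ v) - (b @ h) - (v @ w @ h)


a = np.array([0.1, -0.2])       # visible biases a_i
b = np.array([0.3])             # hidden bias b_j
w = np.array([[0.5], [-1.0]])   # weights w_ij, shape (n_visible, n_hidden)
v = np.array([1.0, 1.0])        # visible states
h = np.array([1.0])             # hidden state
print(rbm_energy(v, h, a, b, w))
```

Here the bias terms contribute 0.1 and -0.3 and the interaction term contributes 0.5, for a total energy of about 0.3.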

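Since this is unsupervised learning (there are no labels, only the ratings themselves), I train the RBM with a modification of [*Contrastive Divergence*](https://chronicles.mfglabs.com/rbm-and-recommender-systems-3fa30f53d1dc#.9i7w8t9re), which approximates the intractable log-likelihood gradient using a short Gibbs sampling chain started at the data.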
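For reference, one step of plain CD-1 (a single Gibbs step) can be sketched in NumPy. This is the textbook update, not necessarily the exact variant used in this notebook: it uses hidden probabilities rather than samples in the weight statistics, a common choice. The learning rate and toy shapes are assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_step(v0, W, v_bias, h_bias, lr, rng):
    """Return updated (W, v_bias, h_bias) after one CD-1 step on a batch v0."""
    # positive phase: hidden probabilities and a binary sample given the data
    p_h0 = sigmoid(v0 @ W + h_bias)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(v0.dtype)
    # negative phase: sample a reconstruction, then recompute hidden probabilities
    p_v1 = sigmoid(h0 @ W.T + v_bias)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(v0.dtype)
    p_h1 = sigmoid(v1 @ W + h_bias)
    # <v h>_data - <v h>_reconstruction, averaged over the batch
    grad_W = (v0.T @ p_h0 - v1.T @ p_h1) / v0.shape[0]
    W = W + lr * grad_W
    v_bias = v_bias + lr * (v0 - v1).mean(axis=0)
    h_bias = h_bias + lr * (p_h0 - p_h1).mean(axis=0)
    return W, v_bias, h_bias


rng = np.random.default_rng(42)
toy_v = rng.integers(0, 2, size=(4, 5)).astype(np.float64)  # 4 examples, 5 visible units
toy_W, toy_vb, toy_hb = cd1_step(toy_v, np.zeros((5, 2)), np.zeros(5), np.zeros(2),
                                 lr=0.01, rng=rng)
print(toy_W.shape, toy_vb.shape, toy_hb.shape)
```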

## The trick


The main issue here is that of missing values: there are a *lot* of them, since each user has rated only a small fraction of the movies.

To overcome this, the solution proposed [here](http://www.machinelearning.org/proceedings/icml2007/papers/407.pdf) is:
- each visible node of the RBM represents a movie (and takes the corresponding movie rating as input);
- each user vector is treated as a training example.

This is similar to training a separate RBM for each user while sharing the weights between them. In this notebook, movies a user has not rated are simply fed in as zeros.
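Since a dense batch holds 0 wherever a user has not rated a movie, an error metric averaged over every entry mostly measures how well those zeros are reproduced. A sketch of MAE and RMSE restricted to observed ratings could look like this, assuming (as in this dataset) that a stored 0 always means "not rated"; `masked_errors` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np


def masked_errors(ratings, predictions):
    """MAE and RMSE over rated entries only (entries where ratings != 0)."""
    ratings = np.asarray(ratings, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    observed = ratings != 0
    diff = (ratings - predictions)[observed]
    return np.abs(diff).mean(), np.sqrt(np.square(diff).mean())


toy_ratings = np.array([[5, 0, 3],
                        [0, 4, 0]])
toy_predictions = np.array([[4.0, 2.5, 3.0],
                            [1.0, 2.0, 0.5]])
toy_mae, toy_rmse = masked_errors(toy_ratings, toy_predictions)
print(toy_mae, toy_rmse)
```

Only the three rated cells count: the errors are 1, 0 and 2, so the MAE is 1.0 and the RMSE is the square root of 5/3.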

## Building the model

```python
from __future__ import division

n_visible, n_hidden = len(movies), 20

graph = tf.Graph()

with graph.as_default():
    # model parameters are fed in as placeholders and updated outside the graph
    v_bias = tf.placeholder(tf.float32, [n_visible])
    h_bias = tf.placeholder(tf.float32, [n_hidden])
    W = tf.placeholder(tf.float32, [n_visible, n_hidden])

    # visible to hidden pass: hidden probabilities, then a Bernoulli sample
    v_1 = tf.placeholder(tf.float32, [None, n_visible])
    h_1_ = tf.sigmoid(tf.matmul(v_1, W) + h_bias)
    h_1 = tf.nn.relu(tf.sign(h_1_ - tf.random_uniform(tf.shape(h_1_))))

    # hidden to visible pass: sampled reconstruction, then hidden probabilities
    v_2_ = tf.sigmoid(tf.matmul(h_1, tf.transpose(W)) + v_bias)
    v_2 = tf.nn.relu(tf.sign(v_2_ - tf.random_uniform(tf.shape(v_2_))))
    h_2 = tf.nn.sigmoid(tf.matmul(v_2, W) + h_bias)

    # learning rate
    lr = 0.01

    # positive (data) and negative (reconstruction) statistics <v h>
    W_gradient_1 = tf.matmul(tf.transpose(v_1), h_1)
    W_gradient_2 = tf.matmul(tf.transpose(v_2), h_2)

    # average over the examples in the batch
    n_examples = tf.cast(tf.shape(v_1)[0], tf.float32)
    contrastive_divergence = (W_gradient_1 - W_gradient_2) / n_examples

    # parameter updates
    W_update = W + lr * contrastive_divergence
    v_bias_update = v_bias + lr * tf.reduce_mean(v_1 - v_2, 0)
    h_bias_update = h_bias + lr * tf.reduce_mean(h_1 - h_2, 0)

    # error metrics, averaged over every entry of the batch
    # (unrated movies are 0 in v_1, so they are included)
    mae = tf.reduce_mean(tf.abs(v_1 - v_2))
    rmse = tf.sqrt(tf.reduce_mean(tf.square(v_1 - v_2)))
```
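The expression `tf.nn.relu(tf.sign(p - u))`, with `u` drawn uniformly from [0, 1), acts as a Bernoulli sampler: it yields 1 where `u < p` and 0 otherwise (a tie `u == p` gives `sign` = 0, which also stays 0). The NumPy check below, on random toy values, compares it with the more direct `u < p` form.

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random((1000, 20))   # activation probabilities
u = rng.random((1000, 20))   # uniform noise in [0, 1), like tf.random_uniform

# sign(p - u) is -1, 0 or +1; relu maps -1 and 0 to 0
relu_sign = np.maximum(np.sign(p - u), 0.0)
direct = (u < p).astype(float)
print(np.array_equal(relu_sign, direct))

# over many draws, a unit with probability 0.3 is on about 30% of the time
samples = (rng.random(100_000) < 0.3).astype(float)
print(samples.mean())
```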
    

## Training the model

```python
import numpy as np

n_epoch = 20
batch_size = 100

# start from all-zero parameters
current_W = np.zeros([n_visible, n_hidden], np.float32)
current_v_bias = np.zeros([n_visible], np.float32)
current_h_bias = np.zeros([n_hidden], np.float32)

# split users into train (first 4500) and test (remaining 1540)
train_R = R[0:4500]
test_R = R[4500:]

errors = []

with tf.Session(graph=graph) as sess:
    # the graph has no tf.Variables, so no initializer needs to run
    for epoch in range(n_epoch):
        for start in range(0, train_R.shape[0], batch_size):
            end = start + batch_size
            # raw 0-5 ratings (0 = unrated); the sampled reconstruction v_2 is binary
            batch = train_R[start:end].todense()
            feed_dict = {v_1: batch, W: current_W, v_bias: current_v_bias, h_bias: current_h_bias}
            updates = [W_update, v_bias_update, h_bias_update]
            current_W, current_v_bias, current_h_bias = sess.run(updates, feed_dict=feed_dict)

        # evaluate on the held-out users after every epoch
        feed_dict = {v_1: test_R.todense(), W: current_W, v_bias: current_v_bias, h_bias: current_h_bias}
        mean_absolute_error, root_mean_squared_error = sess.run([mae, rmse], feed_dict=feed_dict)
        current_error = {"MAE": mean_absolute_error, "RMSE": root_mean_squared_error}

        print("MAE = {MAE:10.9f}, RMSE = {RMSE:10.9f}".format(**current_error))
        errors.append(current_error)
```

```
MAE = 0.393706173, RMSE = 0.837142766
MAE = 0.307844728, RMSE = 0.795156062
MAE = 0.283565879, RMSE = 0.782326221
MAE = 0.274305254, RMSE = 0.777140021
MAE = 0.268723488, RMSE = 0.773530304
MAE = 0.266444802, RMSE = 0.772211075
MAE = 0.265085578, RMSE = 0.771489143
MAE = 0.264171988, RMSE = 0.770580709
MAE = 0.263169616, RMSE = 0.770097256
MAE = 0.262013227, RMSE = 0.769408047
MAE = 0.261296660, RMSE = 0.768895447
MAE = 0.261128098, RMSE = 0.768850684
MAE = 0.260994464, RMSE = 0.768646479
MAE = 0.260592788, RMSE = 0.768427193
MAE = 0.260619551, RMSE = 0.768634498
MAE = 0.259937257, RMSE = 0.768016875
MAE = 0.259915859, RMSE = 0.768064320
MAE = 0.259853303, RMSE = 0.768015981
MAE = 0.259344101, RMSE = 0.767609239
MAE = 0.258970350, RMSE = 0.767514586
```
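The training loop collects each epoch's test metrics in `errors`, but the list is not used afterwards. One way to inspect it is to load it into a pandas DataFrame and pick the epoch with the lowest RMSE. In this sketch the input is the last three logged epochs (epochs are counted from 0, so they are 17 to 19); in the notebook, `pd.DataFrame(errors)` would give the full history.

```python
import pandas as pd

# last three epochs from the log above, in the same dict format as `errors`
logged = [
    {"MAE": 0.259853303, "RMSE": 0.768015981},
    {"MAE": 0.259344101, "RMSE": 0.767609239},
    {"MAE": 0.258970350, "RMSE": 0.767514586},
]
history = pd.DataFrame(logged, index=[17, 18, 19])
history.index.name = "epoch"

best_epoch = history["RMSE"].idxmin()  # index label of the smallest RMSE
print(history)
print("lowest RMSE at epoch", best_epoch)
```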


## Save the model parameters for reuse

```python
np.save("../models/W.npy", current_W)
np.save("../models/v_bias.npy", current_v_bias)
np.save("../models/h_bias.npy", current_h_bias)
```

## Load the model parameters

```python
import numpy as np

current_W = np.load("../models/W.npy")
current_v_bias = np.load("../models/v_bias.npy")
current_h_bias = np.load("../models/h_bias.npy")
```
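As an alternative to three separate `.npy` files, NumPy can store all parameters in a single archive with `np.savez` and read them back with `np.load`. A minimal sketch of the round trip, using a temporary directory, a hypothetical file name `rbm_params.npz`, and toy arrays in place of `current_W` and the biases:

```python
import os
import tempfile

import numpy as np

params = {
    "W": np.zeros((4, 2), np.float32),
    "v_bias": np.zeros(4, np.float32),
    "h_bias": np.ones(2, np.float32),
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rbm_params.npz")
    np.savez(path, **params)          # one file, one named array per keyword
    with np.load(path) as archive:    # the returned NpzFile is a context manager
        loaded = {name: archive[name] for name in archive.files}

print(sorted(loaded))
```

Keeping the parameters in one file means they cannot get out of sync between saves.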

## Making recommendations

```python
import numpy as np
import tensorflow as tf
from IPython.display import display, HTML

graph = tf.Graph()

with graph.as_default():
    v_bias = tf.placeholder(tf.float32, [n_visible])
    h_bias = tf.placeholder(tf.float32, [n_hidden])
    W = tf.placeholder(tf.float32, [n_visible, n_hidden])
    v_1 = tf.placeholder(tf.float32, [None, n_visible])

    # deterministic pass: probabilities instead of samples
    h_1 = tf.nn.sigmoid(tf.matmul(v_1, W) + h_bias)
    v_2 = tf.nn.sigmoid(tf.matmul(h_1, tf.transpose(W)) + v_bias)

# first user of the test split
current_user = R[4500].todense()
recommendations = movies.copy(deep=True)
recommendations["Ratings"] = np.asarray(current_user).ravel()
display(HTML("<h3> Rated movies </h3>"))
display(recommendations.sort_values(by=["Ratings"], ascending=False).head())

print("current_user = {0}".format(current_user))
with tf.Session(graph=graph) as sess:
    feed_dict = {v_1: current_user, W: current_W, v_bias: current_v_bias, h_bias: current_h_bias}
    v2 = sess.run(v_2, feed_dict=feed_dict)
    # scale the reconstruction probabilities (0-1) by 5 to compare with ratings
    recommendations["Score"] = v2[0] * 5.0
    display(HTML("<h3> Recommended movies </h3>"))
    display(recommendations.sort_values(by=["Score"], ascending=False).head())
```

| | Unnamed: 0 | MovieID | Title | Genres | ContinuousMovieID | Ratings |
|---|---|---|---|---|---|---|
| 1265 | 1265 | 1285 | Heathers (1989) | Comedy | 1265 | 5 |
| 1353 | 1353 | 1374 | Star Trek: The Wrath of Khan (1982) | Action\|Adventure\|Sci-Fi | 1353 | 5 |
| 2398 | 2398 | 2467 | Name of the Rose, The (1986) | Mystery | 2398 | 5 |
| 2647 | 2647 | 2716 | Ghostbusters (1984) | Comedy\|Horror | 2647 | 5 |
| 1180 | 1180 | 1198 | Raiders of the Lost Ark (1981) | Action\|Adventure | 1180 | 5 |

```
current_user = [[0 0 0 ..., 0 0 0]]
```

| | Unnamed: 0 | MovieID | Title | Genres | ContinuousMovieID | Ratings | Score |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | Toy Story (1995) | Animation\|Children's\|Comedy | 0 | 0 | 5.0 |
| 3045 | 3045 | 3114 | Toy Story 2 (1999) | Animation\|Children's\|Comedy | 3045 | 0 | 5.0 |
| 2286 | 2286 | 2355 | Bug's Life, A (1998) | Animation\|Children's\|Comedy | 2286 | 0 | 5.0 |
| 2327 | 2327 | 2396 | Shakespeare in Love (1998) | Comedy\|Romance | 2327 | 0 | 5.0 |
| 585 | 585 | 589 | Terminator 2: Judgment Day (1991) | Action\|Sci-Fi\|Thriller | 585 | 5 | 5.0 |
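The top of this list includes Terminator 2, which the user has already rated, and several movies share the displayed maximum score of 5.0. Below is a sketch of a ranking step that skips already-rated movies and orders the rest by raw reconstruction score, written against plain NumPy arrays; `top_unrated` is a hypothetical helper, and the toy inputs are arbitrary.

```python
import numpy as np


def top_unrated(scores, ratings, k):
    """Indices of the k highest-scoring movies the user has not rated.

    scores: 1-D array of reconstruction scores, one per movie.
    ratings: the user's ratings, 0 meaning "not rated" (any shape, flattened).
    """
    scores = np.asarray(scores, dtype=float)
    ratings = np.asarray(ratings).ravel()
    candidates = np.flatnonzero(ratings == 0)               # unrated movie indices
    order = np.argsort(-scores[candidates], kind="stable")  # highest score first
    return candidates[order[:k]]


toy_scores = np.array([0.9, 0.99, 0.2, 0.95, 0.5])
toy_ratings = np.array([0, 5, 0, 0, 3])
print(top_unrated(toy_scores, toy_ratings, k=2))
```

With the arrays from the cell above, `top_unrated(v2[0], current_user, k=5)` would return row positions that can be passed to `movies.iloc[...]`. Ranking on the unscaled probabilities keeps their full precision instead of values rounded for display.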
