# Deep learning A-Z : Building a Boltzmann Machine

<p align="justify">
This notebook is my response to the fifth homework of the course called *Deep Learning A-Z™: Hands-On Artificial Neural Networks* accessible here : https://www.udemy.com/deeplearning/
</p>
<p align="justify">
In this notebook, we are going to build two recommended systems which will tell for each user if this user liked or no a movie. 
The first recommended system will have a binary output to tell if the user liked or not the movie. We are going to build this recommended system using Boltzmann machines built with pytorch.
The second recommended system will output a rating between 1 to five to scale if the user liked or not the movie. We are going to build this recommended system using Autoencoder built with pytorch.
</p>

### Imports

In [23]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.nn.parallel
import torch.optim as optim
import torch.utils.data
from torch.autograd import Variable

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

### 1. Data preprocessing

In [24]:
base_path = os.path.join(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath('__file__')))), 
                          'ressources/Boltzmann_Machines/')

movies = pd.read_csv(base_path + 'ml-1m/movies.dat', sep='::', header=None, engine='python', encoding='latin-1')
users = pd.read_csv(base_path + 'ml-1m/users.dat', sep='::', header=None, engine='python', encoding='latin-1')
ratings = pd.read_csv(base_path + 'ml-1m/ratings.dat', sep='::', header=None, engine='python', encoding='latin-1')

movies.head()


Unnamed: 0,0,1,2
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy


For the ratings dataset, the first col is the user id, the second is the genre and the third one is the grade. Fourth one is the zip code.

In [25]:
users.head()

Unnamed: 0,0,1,2,3,4
0,1,F,1,10,48067
1,2,M,56,16,70072
2,3,M,25,15,55117
3,4,M,45,7,2460
4,5,M,25,20,55455


For the ratings dataset, the first col is the user id, the second is the movie that this user rated and the third one is the grade. Fourth one is just a timestamp.

In [26]:
ratings.head()

Unnamed: 0,0,1,2,3
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291


In [27]:
# cols are the same as for ratings dataset
training_set = pd.read_csv(base_path + 'ml-100k/u1.base', delimiter='\t')
test_set = pd.read_csv(base_path + 'ml-100k/u1.test', delimiter='\t')
training_set.head()

Unnamed: 0,1,1.1,5,874965758
0,1,2,3,876893171
1,1,3,4,878542960
2,1,4,3,876893119
3,1,5,3,889751712
4,1,7,4,875071561


In [28]:
# convert data to array
training_set = np.array(training_set, dtype='int')
test_set = np.array(test_set, dtype='int')
training_set

array([[        1,         2,         3, 876893171],
       [        1,         3,         4, 878542960],
       [        1,         4,         3, 876893119],
       ...,
       [      943,      1188,         3, 888640250],
       [      943,      1228,         3, 888640275],
       [      943,      1330,         3, 888692465]])

In [29]:
# Getting the number of users and movies
nb_users = int(max(max(training_set[:,0]), max(test_set[:,0])))
nb_movies = int(max(max(training_set[:,1]), max(test_set[:,1])))

In [30]:
# Converting the data into an array with users in lines and movies in columns
def convert(data):
    new_data = []
    for id_users in range(1, nb_users + 1):
        id_movies = data[:,1][data[:,0] == id_users]
        id_ratings = data[:,2][data[:,0] == id_users]
        ratings = np.zeros(nb_movies)
        ratings[id_movies - 1] = id_ratings
        new_data.append(list(ratings))
    return new_data
training_set = convert(training_set)
test_set = convert(test_set)

Tensors are array with only one data type. So Torch tensors are just multi-dimentionnal arrays exactly as tensorflow tensors.

In [31]:
# Converting the data into Torch tensors
training_set = torch.FloatTensor(training_set)
test_set = torch.FloatTensor(test_set)

Now, our data are tensors :

In [32]:
training_set

tensor([[0., 3., 4.,  ..., 0., 0., 0.],
        [4., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [5., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 5., 0.,  ..., 0., 0., 0.]])

### 2. Building the Boltzmann Machine for binary outputs

First, as we want to predict binary output, we need to change our input ratings into a binary rating. Indeed, predicted binary rating will be calculated from input rating so it must be binary. Moreover, as 0 currently means that a user doesn't rated a movie, we need to change this value (to -1) because for binary output we need the 0 value.

In [11]:
training_set[training_set == 0] = -1 # replace 0 by -1 for non rated movies
training_set[training_set == 1] = 0
training_set[training_set == 2] = 0
training_set[training_set >= 3] = 1
test_set[test_set == 0] = -1
test_set[test_set == 1] = 0
test_set[test_set == 2] = 0
test_set[test_set >= 3] = 1

Now, let's create the model !

In [12]:
class RBM: 
    
    """
    Create a Bernouilli RBM that will predict if user liked of not a movie, using a binary output
    """
    
    def __init__(self, nv, nh):
        """
        Args:
        nv (int): number of visible nodes
        nh (int): number of hidden nodes
        """
        self.Weights = torch.randn(nh, nv) # initialize weights
        self.bias_hidden = torch.randn(1, nh) # initialize bias of hidden nodes (fist dimension is batch, second is bias)
        self.bias_visible = torch.randn(1, nv) # initialize bias of visible nodes
        
    def sample_hidden(self, x):
        """
        Function which will activate hidden nodes according to a certain probability given the input nodes. 
        This probability is the sigmoid function applied to (self.Weights * x + self.bias_hidden).
        """
        wx = torch.mm(x, self.Weights.t()) # make product of two tensors
        activation = wx + self.bias_hidden.expand_as(wx) # expand add a new dimention to make sure that bias is applied to each line of the mini batch (arg "1")
        p_hidden_given_visible = torch.sigmoid(activation) # represents the probability of the hidden node to active given visible node
        
        # Bernouilli function will create a random number between 0 and 1 and return one if 
        # this number is higher than p_hidden_given_visible, and 0 else.
        
        return p_hidden_given_visible, torch.bernoulli(p_hidden_given_visible)
    
    def sample_visible(self, y):
        """
        Function which will activate visible nodes according to a certain probability given the hidden nodes. 
        This probability is the sigmoid function applied to (self.Weights * y + self.bias_visible).
        """
        wy = torch.mm(y, self.Weights)
        activation = wy + self.bias_visible.expand_as(wy)
        p_visible_given_hidden = torch.sigmoid(activation) 
        
        return p_visible_given_hidden, torch.bernoulli(p_visible_given_hidden)
    
    def train(self, v0, vk, ph0, phk):
        """
        Use contrastive divergence to train our RBM

        Args:
        v0 : input vector containing ratings
        vk : visible nodes obtained after k steps of contrastive divergence
        ph0 : initial probability of hidden nodes
        phk : probability of hidden nodes after k gibbs samplings
        """
        self.Weights += (torch.mm(v0.t(), ph0) - torch.mm(vk.t(), phk)).t()

        self.bias_visible += torch.sum((v0 - vk), 0)
        self.bias_hidden += torch.sum((ph0 - phk), 0)

Train function (lines 8/9/10) and epoch training (lines 3/4/5/6) follows this algorithm :

![title](../images/cd.png)

In [13]:
nv = len(training_set[0]) # lenght of fisrt line of training set is the number of inputs
nh = 100 # 1682 movies so the model may detect many features, we can start by 100 of them
batch_size = 100 # Start with 100 for a fast training (increase diminushes precision)
rbm = RBM(nv, nh) # Create restricted Boltzmann machine

Finally, we can try our model :

In [14]:
# Training the RBM
nb_epoch = 10
for epoch in range(1, nb_epoch + 1):
    train_loss = 0 # introduce a loss variable which will increase when we will find differences between predictions and answer
    counter = 0.
    for id_user in range(0, nb_users - batch_size, batch_size):
        # create batchs of users
        vk = training_set[id_user:id_user+batch_size] # vector that will be the input of gibbs chain and will be updated at each input
        v0 = training_set[id_user:id_user+batch_size] 
        ph0,_ = rbm.sample_hidden(v0)
        for k in range(10): # 10 is an hyperparameter, number of random walks
            _,hk = rbm.sample_hidden(vk) # do gibbs chain
            _,vk = rbm.sample_visible(hk) # do gibbs chain (update of visible nodes)
            vk[v0<0] = v0[v0<0] 
        phk,_ = rbm.sample_hidden(vk)
        rbm.train(v0, vk, ph0, phk)
        train_loss += torch.mean(torch.abs(v0[v0>=0] - vk[v0>=0])) # update train loss
        counter += 1.
    print('epoch: ' + str(epoch) + ' loss: ' + str(train_loss/counter))

epoch: 1 loss: tensor(0.3346)
epoch: 2 loss: tensor(0.2571)
epoch: 3 loss: tensor(0.2552)
epoch: 4 loss: tensor(0.2504)
epoch: 5 loss: tensor(0.2461)
epoch: 6 loss: tensor(0.2446)
epoch: 7 loss: tensor(0.2499)
epoch: 8 loss: tensor(0.2480)
epoch: 9 loss: tensor(0.2454)
epoch: 10 loss: tensor(0.2480)


In [15]:
# Testing the RBM
test_loss = 0
counter = 0.
for id_user in range(nb_users):
    v = training_set[id_user:id_user+1] # we use inputs of training set to activate neurons to predict output for test set
    vt = test_set[id_user:id_user+1]
    if len(vt[vt>=0]) > 0:
        _,h = rbm.sample_hidden(v)
        _,v = rbm.sample_visible(h)
        test_loss += torch.mean(torch.abs(vt[vt>=0] - v[vt>=0]))
        counter += 1.
print('test loss: ' + str(test_loss/counter))

test loss: tensor(0.2533)


### 3. Building the Autoencoder for ratings outputs

We are going to build a Stacked Auto Encoder. The architecture of such model can be found in this figure :

![title](../images/sae.png)

In [33]:
class SAE(nn.Module): # inheritance from Module class of nn module of pytorch
    
    """
    Building a Stacked Auto Encoder
    """
    
    def __init__(self, ):
        
        super(SAE, self).__init__()
        self.full_connection_1 = nn.Linear(nb_movies, 20) # 20 is the number of neurons in hidden layer (number of detected features)
        self.full_connection_2 = nn.Linear(20, 10) # we encode the first hidden layer (decrease number of nodes)
        self.full_connection_3 = nn.Linear(10, 20) # we decode the second hidden layer (increase number of nodes)
        self.full_connection_4 = nn.Linear(20, nb_movies) # ouput layer (with same number of nodes than in input layer)
        self.activation = nn.Sigmoid() # sigmoid activation function
        
    def forward(self, x):
        """
        Args:
        x : input vector
        """
        
        x = self.activation(self.full_connection_1(x)) # pass input to fisrt hidden layer
        x = self.activation(self.full_connection_2(x)) # pass input to second hidden layer
        x = self.activation(self.full_connection_3(x)) # pass input to third hidden layer
        x = self.full_connection_4(x) # we do not apply activation function of output layer
        
        return x

In [34]:
sae = SAE()
criterion = nn.MSELoss()
optimizer = optim.RMSprop(sae.parameters(), lr = 0.01, weight_decay = 0.5)

In [36]:
# Training the SAE
nb_epoch = 200
for epoch in range(1, nb_epoch + 1):
    train_loss = 0
    counter = 0. # float to avoid warning
    for id_user in range(nb_users):
        input = Variable(training_set[id_user]).unsqueeze(0) # add additinnal dimension corresponding to the batch (0 is index of new dimension)
        target = input.clone() # target is the same as input vector for Auto Encoder 
        if torch.sum(target.data > 0) > 0: # optimize code for bigger datasets (goes in loof only if at least one user rated one movie)
            output = sae(input) # use Auto Encoder
            target.require_grad = False # compute gradient only with respect to the input and not with respect to the target
            output[target == 0] = 0 # save up some memory for bigger datasets
            loss = criterion(output, target) # calculate loss between output and target
            mean_corrector = nb_movies/float(torch.sum(target.data > 0) + 1e-10) # number of movies over number of movies that habe positive rating (+1e10 to avoid zero division without adding bias)
            loss.backward() # indicate if needs to increase or decrease the weights using backward function
            train_loss += np.sqrt(loss.item()*mean_corrector) # update loss value
            counter += 1.
            optimizer.step() # apply the optimizer to change the weights (amount of change in the weights)
    print('epoch: ' + str(epoch) + ' loss: ' +  str(train_loss/counter))

epoch: 1 loss: 1.7663763737549565
epoch: 2 loss: 1.0964976124193597
epoch: 3 loss: 1.0534610503261894
epoch: 4 loss: 1.0381563866651102
epoch: 5 loss: 1.031060397527553
epoch: 6 loss: 1.0264345873198473
epoch: 7 loss: 1.0238328781391688
epoch: 8 loss: 1.0218807300819088
epoch: 9 loss: 1.0210072640670216
epoch: 10 loss: 1.0196825133320673
epoch: 11 loss: 1.019082204160153
epoch: 12 loss: 1.0181814042926405
epoch: 13 loss: 1.0182670753373997
epoch: 14 loss: 1.017563198485881
epoch: 15 loss: 1.0172430537748878
epoch: 16 loss: 1.016888068500237
epoch: 17 loss: 1.0169046023269215
epoch: 18 loss: 1.0164247637942174
epoch: 19 loss: 1.016565284026937
epoch: 20 loss: 1.0160099571776615
epoch: 21 loss: 1.016035854306826
epoch: 22 loss: 1.0156174869763015
epoch: 23 loss: 1.0158948898161921
epoch: 24 loss: 1.015702642954535
epoch: 25 loss: 1.0160270450979854
epoch: 26 loss: 1.0155900618067224
epoch: 27 loss: 1.0156824777170594
epoch: 28 loss: 1.0149671533787115
epoch: 29 loss: 1.0134353086391739
e

In [38]:
# Testing the SAE
test_loss = 0
counter = 0.
for id_user in range(nb_users):
    input = Variable(training_set[id_user]).unsqueeze(0) # we use training ratings to predict ratings for test set
    target = Variable(test_set[id_user]).unsqueeze(0) # what we want to predict
    if torch.sum(target.data > 0) > 0:
        output = sae(input)
        target.require_grad = False
        output[target == 0] = 0
        loss = criterion(output, target)
        mean_corrector = nb_movies/float(torch.sum(target.data > 0) + 1e-10)
        test_loss += np.sqrt(loss.item()*mean_corrector)
        counter += 1.
print('test loss: ' + str(test_loss/counter))

test loss: 0.9533672705973602
