# LAB 10 : Restricted Boltzmann Machine

Name : 

Roll Number :

References : 

1. MNIST Dataset : http://yann.lecun.com/exdb/mnist/
2. Movie Lens Dataset : https://grouplens.org/datasets/movielens/
3. https://towardsdatascience.com/restricted-boltzmann-machine-how-to-create-a-recommendation-system-for-movie-review-45599a406deb
4. https://towardsdatascience.com/restricted-boltzmann-machine-as-a-recommendation-system-for-movie-review-part-2-9a6cab91d85b
5. https://github.com/echen/restricted-boltzmann-machines

# **Problem 1**: MNIST Digit Classification using RBM + Logistic Regression

1. Consider the MNIST digit dataset.
2. Use the `BernoulliRBM` API from the scikit-learn package to build a pipeline of an RBM network followed by logistic regression, and use it to classify the digits.

## Write down the Objectives, Hypothesis and Experimental description for the above problem


## Programming : 
  Please write a program to demonstrate the same

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import convolve
from sklearn import linear_model, datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import minmax_scale
from sklearn.base import clone

In [3]:
def nudge_dataset(X, Y):
  """
  This produces a dataset 5 times bigger than the original one,
  by moving the 8x8 images in X around by 1px to left, right, down, up
  """
  direction_vectors = [
  [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
  [[0, 0, 0], [1, 0, 0], [0, 0, 0]],
  [[0, 0, 0], [0, 0, 1], [0, 0, 0]],
  [[0, 0, 0], [0, 0, 0], [0, 1, 0]],
  ]
  def shift(x, w):
    return convolve(x.reshape((8, 8)), mode="constant", weights=w).ravel()
  
  X = np.concatenate([X] + [np.apply_along_axis(shift, 1, X, vector) for vector in direction_vectors])
  Y = np.concatenate([Y for _ in range(5)], axis=0)
  return X, Y
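
To see what a single direction kernel does, here is a minimal sketch that applies one of them to a hypothetical 8x8 image containing one bright pixel. The image and the names `image`, `kernel`, and `shifted` are illustrative assumptions, not part of the lab code.

```python
import numpy as np
from scipy.ndimage import convolve

# Hypothetical 8x8 test image with a single bright pixel at row 3, column 4.
image = np.zeros((8, 8), dtype="float32")
image[3, 4] = 1.0

# One of the four direction kernels used by nudge_dataset.
kernel = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]

# mode="constant" pads with zeros, so pixels pushed past the border are dropped.
shifted = convolve(image, weights=kernel, mode="constant")

old_pos = np.argwhere(image == 1.0)[0]
new_pos = np.argwhere(shifted == 1.0)[0]
print("pixel moved from", tuple(old_pos), "to", tuple(new_pos))
```

The pixel ends up exactly one position away from where it started, which is why four such kernels plus the original image give a dataset five times larger.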


In [4]:
X, y = datasets.load_digits(return_X_y=True)
X = np.asarray(X, "float32")
X, Y = nudge_dataset(X, y)
X = minmax_scale(X, feature_range=(0, 1)) # 0-1 scaling


In [6]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

In [7]:
logistic = linear_model.LogisticRegression(solver="newton-cg", tol=1)
rbm = BernoulliRBM(random_state=0, verbose=True)
rbm_features_classifier = Pipeline(steps=[("rbm", rbm), ("logistic", logistic)])


In [8]:
# Hyper-parameters. These were set by cross-validation,
# using a GridSearchCV. Here we are not performing cross-validation to
# save time.
rbm.learning_rate = 0.06
rbm.n_iter = 10
# More components tend to give better prediction performance, but larger
# fitting time
rbm.n_components = 100
logistic.C = 6000
# Training RBM-Logistic Pipeline
rbm_features_classifier.fit(X_train, Y_train)


[BernoulliRBM] Iteration 1, pseudo-likelihood = -25.57, time = 0.38s
[BernoulliRBM] Iteration 2, pseudo-likelihood = -23.68, time = 0.52s
[BernoulliRBM] Iteration 3, pseudo-likelihood = -22.88, time = 0.55s
[BernoulliRBM] Iteration 4, pseudo-likelihood = -21.91, time = 0.50s
[BernoulliRBM] Iteration 5, pseudo-likelihood = -21.79, time = 0.42s
[BernoulliRBM] Iteration 6, pseudo-likelihood = -20.96, time = 0.32s
[BernoulliRBM] Iteration 7, pseudo-likelihood = -20.80, time = 0.26s
[BernoulliRBM] Iteration 8, pseudo-likelihood = -20.63, time = 0.26s
[BernoulliRBM] Iteration 9, pseudo-likelihood = -20.38, time = 0.25s
[BernoulliRBM] Iteration 10, pseudo-likelihood = -20.19, time = 0.25s


Pipeline(steps=[('rbm',
                 BernoulliRBM(learning_rate=0.06, n_components=100,
                              random_state=0, verbose=True)),
                ('logistic',
                 LogisticRegression(C=6000, solver='newton-cg', tol=1))])
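
The hyper-parameter comment above mentions a cross-validated grid search. A minimal sketch of how such a search could be set up with `GridSearchCV` follows. The grid values, the reduced `n_components` and `n_iter`, and the names `X_demo`, `y_demo`, and `search` are assumptions chosen to keep the run short; this is not the search that produced the settings used in this lab.

```python
from sklearn import datasets, linear_model
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import minmax_scale

# Un-nudged digits data, scaled to [0, 1] as BernoulliRBM expects.
X_demo, y_demo = datasets.load_digits(return_X_y=True)
X_demo = minmax_scale(X_demo.astype("float32"), feature_range=(0, 1))

pipeline = Pipeline(steps=[
    ("rbm", BernoulliRBM(n_components=50, n_iter=5, random_state=0)),
    ("logistic", linear_model.LogisticRegression(solver="newton-cg", tol=1)),
])

# Illustrative grid only (assumed values). Pipeline parameters are addressed
# as "<step name>__<parameter name>".
param_grid = {
    "rbm__learning_rate": [0.01, 0.06],
    "logistic__C": [100, 6000],
}

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_demo, y_demo)
print(search.best_params_, round(search.best_score_, 3))
```

Searching over the whole pipeline, rather than over each step separately, means the RBM is refitted inside every fold, so the reported score is not inflated by features learned on held-out data.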

In [10]:
raw_pixel_classifier = clone(logistic)
raw_pixel_classifier.C = 100.0
raw_pixel_classifier.fit(X_train, Y_train)

Y_pred = rbm_features_classifier.predict(X_test)
print(
 "Logistic regression using RBM features:\n%s\n"
 % (metrics.classification_report(Y_test, Y_pred))
)
print(
 "Confusion Matrix:\n%s\n"
 % (metrics.confusion_matrix(Y_test, Y_pred))
)
Y_pred = raw_pixel_classifier.predict(X_test)
print(
 "Logistic regression using raw pixel features:\n%s\n"
 % (metrics.classification_report(Y_test, Y_pred))
)
print(
 "Confusion Matrix:\n%s\n"
 % (metrics.confusion_matrix(Y_test, Y_pred))
)

Logistic regression using RBM features:
              precision    recall  f1-score   support

           0       0.99      0.98      0.99       174
           1       0.90      0.91      0.91       184
           2       0.92      0.95      0.93       166
           3       0.95      0.89      0.91       194
           4       0.95      0.94      0.94       186
           5       0.93      0.92      0.93       181
           6       0.97      0.96      0.97       207
           7       0.93      0.99      0.96       154
           8       0.90      0.88      0.89       182
           9       0.88      0.91      0.89       169

    accuracy                           0.93      1797
   macro avg       0.93      0.93      0.93      1797
weighted avg       0.93      0.93      0.93      1797


Confusion Matrix:
[[171   0   0   0   2   0   1   0   0   0]
 [  0 168   1   1   1   1   1   1   4   6]
 [  0   4 158   1   0   0   0   1   2   0]
 [  0   0   8 172   0   2   0   2   6   4]
 [  1   1 

## Inferences and Conclusion : State all the key observations and conclusion

# **Problem 2**: RBM as a Recommendation System for Movie Review on MovieLens Dataset

1. Use the MovieLens dataset and split it into train and test sets. Convert the ratings to binary (the task is to predict whether a user likes a movie or not).
2. Build an RBM network, train the model, and test it on the test set.
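
The binary conversion in step 1 can be sketched as follows. This is a minimal illustration that assumes ratings of 3 or more stars count as a like and that 0 marks a movie the user has not rated; the array `raw` is made-up data.

```python
import numpy as np

# Hypothetical ratings for one user over six movies; 0 means "not rated".
raw = np.array([0, 1, 2, 3, 4, 5])

binary = np.full(raw.shape, -1)      # -1 marks movies the user has not rated
binary[(raw >= 1) & (raw <= 2)] = 0  # 1-2 stars: not liked
binary[raw >= 3] = 1                 # 3-5 stars: liked

print(binary)
```

Using -1 for missing ratings keeps them distinguishable from a genuine dislike (0), which matters when unrated entries have to be excluded from training and from the loss.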

## Write down the Objectives, Hypothesis and Experimental description for the above problem


## Programming : 
  Please write a program to demonstrate the same

In [None]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.optim as optim
import torch.utils.data
from torch.autograd import Variable

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Import the MovieLens 1M metadata (movies, users, ratings)
movies = pd.read_csv('ml-1m/movies.dat', sep='::', header=None, engine='python',
                     encoding='latin-1')
users = pd.read_csv('ml-1m/users.dat', sep='::', header=None, engine='python',
                    encoding='latin-1')
ratings = pd.read_csv('ml-1m/ratings.dat', sep='::', header=None, engine='python',
                      encoding='latin-1')

In [None]:
##create training and test set data
training_set = pd.read_csv('ml-100k/u1.base', delimiter = '\t', header = None)
##convert it to array
training_set = np.array(training_set, dtype = 'int')

test_set = pd.read_csv('ml-100k/u1.test', delimiter = '\t', header = None)
##convert it to array
test_set = np.array(test_set, dtype = 'int')

In [None]:
# Number of users and movies: the largest ID seen in either the train or the test split
nb_users = int(max(max(training_set[:, 0]), max(test_set[:, 0])))
nb_movies = int(max(max(training_set[:, 1]), max(test_set[:, 1])))


In [None]:
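def convert(data):
    """Turn (user, movie, rating) rows into one rating vector per user."""
    new_data = []
    for id_users in range(1, nb_users + 1):
        # Movies rated by this user and the corresponding ratings
        id_movies = data[:, 1][data[:, 0] == id_users]
        id_ratings = data[:, 2][data[:, 0] == id_users]
        # Unrated movies stay at 0; movie IDs are 1-based, hence the "- 1"
        ratings = np.zeros(nb_movies)
        ratings[id_movies - 1] = id_ratings
        new_data.append(list(ratings))
    return new_data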
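
As a small worked example of the same conversion, the sketch below builds the user-by-movie matrix for three made-up rating rows. It uses a vectorised assignment instead of the per-user loop, which is equivalent only under the assumption that each (user, movie) pair appears at most once. The names `toy`, `n_users`, `n_movies`, and `matrix` are illustrative.

```python
import numpy as np

# Hypothetical rating rows: (user_id, movie_id, rating), with 1-based IDs.
toy = np.array([
    [1, 1, 5],
    [1, 3, 2],
    [2, 2, 4],
])
n_users, n_movies = 2, 3

# Rows are users, columns are movies; 0 means "not rated".
matrix = np.zeros((n_users, n_movies))
# Subtract 1 from both ID columns to get 0-based indices.
matrix[toy[:, 0] - 1, toy[:, 1] - 1] = toy[:, 2]
print(matrix)
```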

In [None]:
training_set = convert(training_set)
test_set = convert(test_set)

In [None]:
training_set = torch.FloatTensor(training_set)
test_set = torch.FloatTensor(test_set)

In [None]:
# Binarize the ratings: -1 = not rated, 0 = not liked (1-2 stars), 1 = liked (3-5 stars)
training_set[training_set == 0] = -1
training_set[training_set == 1] = 0
training_set[training_set == 2] = 0
training_set[training_set >= 3] = 1
test_set[test_set == 0] = -1
test_set[test_set == 1] = 0
test_set[test_set == 2] = 0
test_set[test_set >= 3] = 1

In [None]:
class RBM():
    def __init__(self, nv, nh):
        # Weights: a (nh, nv) tensor drawn from a standard normal (mean 0, variance 1)
        self.W = torch.randn(nh, nv)
        # Bias for the hidden nodes; the first dimension is the batch dimension
        self.a = torch.randn(1, nh)
        # Bias for the visible nodes
        self.b = torch.randn(1, nv)

    def sample_h(self, x):
        """Sample the hidden nodes given the visible values x."""
        wx = torch.mm(x, self.W.t())
        # Add the hidden bias a to every row of the batch
        activation = wx + self.a.expand_as(wx)
        # Entry j is the probability that hidden node j is activated given x
        p_h_given_v = torch.sigmoid(activation)
        # Return the probabilities and a Bernoulli sample of the hidden nodes
        return p_h_given_v, torch.bernoulli(p_h_given_v)

    def sample_v(self, y):
        """Sample the visible nodes given the hidden values y."""
        wy = torch.mm(y, self.W)
        # Add the visible bias b to every row of the batch
        activation = wy + self.b.expand_as(wy)
        # Entry i is the probability that visible node i is activated given y
        p_v_given_h = torch.sigmoid(activation)
        # Return the probabilities and a Bernoulli sample of the visible nodes
        return p_v_given_h, torch.bernoulli(p_v_given_h)

    def train(self, v0, vk, ph0, phk):
        """Contrastive-divergence update.

        v0: input visible values; vk: visible values after k Gibbs steps;
        ph0, phk: hidden activation probabilities given v0 and vk.
        """
        # Transpose so the update matches the (nh, nv) shape of W
        self.W += (torch.mm(v0.t(), ph0) - torch.mm(vk.t(), phk)).t()
        # Sum over the batch dimension; the result broadcasts onto the (1, n) biases
        self.b += torch.sum((v0 - vk), 0)
        self.a += torch.sum((ph0 - phk), 0)
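
For reference, the computations in the class above can be written out as equations. This is a restatement of the code as given, with $W \in \mathbb{R}^{n_h \times n_v}$, hidden bias $a$, visible bias $b$, $\sigma$ the logistic sigmoid, and $n$ indexing the samples in a batch. The index names are notation introduced here, and the update uses an implicit learning rate of 1 because the code applies no scaling.

```latex
\begin{aligned}
p(h_j = 1 \mid v) &= \sigma\Big(a_j + \sum_i W_{ji}\, v_i\Big) \\
p(v_i = 1 \mid h) &= \sigma\Big(b_i + \sum_j W_{ji}\, h_j\Big) \\
\Delta W_{ji} &= \sum_n \Big[\, p\big(h_j = 1 \mid v^{(0)}_n\big)\, v^{(0)}_{n,i}
                 - p\big(h_j = 1 \mid v^{(k)}_n\big)\, v^{(k)}_{n,i} \Big] \\
\Delta b_i &= \sum_n \big( v^{(0)}_{n,i} - v^{(k)}_{n,i} \big) \\
\Delta a_j &= \sum_n \Big[\, p\big(h_j = 1 \mid v^{(0)}_n\big) - p\big(h_j = 1 \mid v^{(k)}_n\big) \Big]
\end{aligned}
```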

In [None]:
#number of movies
nv = len(training_set[0])
#number of hidden nodes or num of features
nh = 100
batch_size = 100
rbm = RBM(nv, nh)

In [None]:
nb_epoch = 10
for epoch in range(1, nb_epoch + 1):
    # Accumulated training loss
    train_loss = 0
    # Batch counter used to normalize the loss
    s = 0.
    # Mini-batch learning over users
    for id_user in range(0, nb_users - batch_size, batch_size):
        # Input batch; vk is overwritten by the Gibbs chain
        vk = training_set[id_user: id_user + batch_size]
        # Target used for the loss: the original ratings
        v0 = training_set[id_user: id_user + batch_size]
        # ph0: probabilities of the hidden nodes given the real ratings
        ph0, _ = rbm.sample_h(v0)
        # k steps of contrastive divergence
        for k in range(10):
            _, hk = rbm.sample_h(vk)
            _, vk = rbm.sample_v(hk)
            # Keep unrated entries (-1) fixed so only existing ratings are trained on
            vk[v0 < 0] = v0[v0 < 0]
        phk, _ = rbm.sample_h(vk)
        # Update weights and biases
        rbm.train(v0, vk, ph0, phk)
        # Mean absolute error over all rated entries (both 0 and 1)
        train_loss += torch.mean(torch.abs(v0[v0 >= 0] - vk[v0 >= 0]))
        s += 1
    print('epoch: ' + str(epoch) + ' loss: ' + str(train_loss / s))
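
The loss in the loop is a mean absolute error restricted to rated movies. A minimal NumPy sketch of such a masked error follows, assuming the mask should cover every rated entry, dislikes (0) as well as likes (1). The arrays `target` and `reconstruction` are made-up values.

```python
import numpy as np

target = np.array([1, 0, -1, 1, 0])        # -1 means "not rated"
reconstruction = np.array([1, 1, 0, 0, 0])  # hypothetical sampled visible units

# Include both likes (1) and dislikes (0); skip unrated entries.
rated = target >= 0
mae = np.abs(target[rated] - reconstruction[rated]).mean()
print(mae)
```

With binary targets this value is simply the fraction of rated movies that the reconstruction gets wrong.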

In [None]:
# Accumulated test loss
test_loss = 0
# Counter of evaluated users, used to normalize the loss
s = 0.
# Evaluate one user at a time
for id_user in range(0, nb_users):
    # The user's training ratings are used to activate the RBM
    v_input = training_set[id_user: id_user + 1]
    # Target used for the loss: the user's test ratings
    v_target = test_set[id_user: id_user + 1]
    # Only one Gibbs step is used for prediction, although training used 10
    if len(v_target[v_target >= 0]):
        _, h = rbm.sample_h(v_input)
        _, v_input = rbm.sample_v(h)
        # Mean absolute error over all rated test entries (both 0 and 1)
        test_loss += torch.mean(torch.abs(v_target[v_target >= 0] - v_input[v_target >= 0]))
        s += 1
print('test loss: ' + str(test_loss / s))

## Inferences and Conclusion : State all the key observations and conclusion