## Boltzmann Machines

#### Importing the libraries

In [1]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.optim as optim
import torch.utils.data
from torch.autograd import Variable

#### Importing the dataset

In [2]:
movies = pd.read_csv('ml-1m/movies.dat', sep = '::', header = None, engine = 'python', encoding = 'latin-1')
users = pd.read_csv('ml-1m/users.dat', sep = '::', header = None, engine = 'python', encoding = 'latin-1')
ratings = pd.read_csv('ml-1m/ratings.dat', sep = '::', header = None, engine = 'python', encoding = 'latin-1')

#### Preparing the training set and the test set

In [3]:
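# u1.base and u1.test have no header row; header = None keeps the first rating as data
training_set = pd.read_csv('ml-100k/u1.base', delimiter = '\t', header = None)
training_set = np.array(training_set, dtype = 'int')
test_set = pd.read_csv('ml-100k/u1.test', delimiter = '\t', header = None)
test_set = np.array(test_set, dtype = 'int')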

#### Getting the number of users and movies

In [4]:
nb_users = int(max(max(training_set[:,0]), max(test_set[:,0])))
nb_movies = int(max(max(training_set[:,1]), max(test_set[:,1])))

In [5]:
nb_users

943

In [6]:
nb_movies

1682

#### Converting the data into an array with users in rows and movies in columns

In [7]:
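def convert(data):
    # Build one list of nb_movies ratings per user; 0 means "not rated"
    new_data = []
    for id_users in range(1, nb_users + 1):
        id_movies = data[:,1][data[:,0] == id_users] # movies rated by this user
        id_ratings = data[:,2][data[:,0] == id_users] # the matching ratings
        ratings = np.zeros(nb_movies)
        ratings[id_movies - 1] = id_ratings # movie IDs start at 1, indices at 0
        new_data.append(list(ratings))
    return new_data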
training_set = convert(training_set)
test_set = convert(test_set)
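
To see what `convert` produces, here is a minimal sketch on made-up data. The `toy` array and the `n_users` / `n_movies` values are hypothetical stand-ins for the MovieLens files, and `convert_toy` is an illustrative copy of the logic above with the sizes passed as arguments.

```python
import numpy as np

def convert_toy(data, n_users, n_movies):
    """Same logic as convert(), with the sizes passed in explicitly."""
    new_data = []
    for id_user in range(1, n_users + 1):
        mask = data[:, 0] == id_user           # rows belonging to this user
        ratings = np.zeros(n_movies)           # 0 = not rated
        ratings[data[mask, 1] - 1] = data[mask, 2]
        new_data.append(list(ratings))
    return new_data

# Hypothetical ratings: columns are user ID, movie ID, rating
toy = np.array([[1, 1, 5],
                [1, 3, 2],
                [2, 2, 4]])

matrix = convert_toy(toy, n_users=2, n_movies=4)
print(matrix)
```

Each inner list is one user and each position one movie: user 1 rated movies 1 and 3, user 2 rated movie 2, and every other entry stays at 0.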


#### Converting the data into Torch tensors

Rows are the observations fed into the network (one per user); columns are the features, one input node per movie.

The architecture is built from tensors: multidimensional arrays whose elements all share a single data type. A PyTorch tensor is PyTorch's version of such an array.
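
As a small illustration, assuming a hypothetical list of lists in place of the real ratings, the conversion gives a two-dimensional tensor with a single float type:

```python
import torch

# Hypothetical 2 users x 3 movies ratings
rows = [[5.0, 0.0, 2.0],
        [0.0, 4.0, 0.0]]

t = torch.FloatTensor(rows)
print(t.shape, t.dtype)
```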

In [10]:
training_set = torch.FloatTensor(training_set)
test_set = torch.FloatTensor(test_set)

#### Converting the ratings into binary ratings 1 (Liked) or 0 (Not Liked)

In [11]:
training_set[training_set == 0] = -1 # not rated (must run before 1 and 2 become 0)
training_set[training_set == 1] = 0 # not liked
training_set[training_set == 2] = 0 # not liked
training_set[training_set >= 3] = 1 # liked
test_set[test_set == 0] = -1
test_set[test_set == 1] = 0
test_set[test_set == 2] = 0
test_set[test_set >= 3] = 1
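
The order of these assignments matters: unrated entries (0) have to become -1 before the ratings 1 and 2 are mapped to 0, otherwise "not rated" and "not liked" would be merged. A minimal sketch on a hypothetical one-user tensor (the values are made up):

```python
import torch

# Hypothetical ratings for one user: 0 means "not rated"
ratings = torch.FloatTensor([[0., 1., 2., 3., 4., 5.]])

ratings[ratings == 0] = -1   # not rated
ratings[ratings == 1] = 0    # not liked
ratings[ratings == 2] = 0    # not liked
ratings[ratings >= 3] = 1    # liked

print(ratings.tolist())
```

The unrated entry ends up as -1, the ratings 1 and 2 as 0, and the ratings 3 to 5 as 1.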

In [12]:
training_set

tensor([[-1.,  1.,  1.,  ..., -1., -1., -1.],
        [ 1., -1., -1.,  ..., -1., -1., -1.],
        [-1., -1., -1.,  ..., -1., -1., -1.],
        ...,
        [ 1., -1., -1.,  ..., -1., -1., -1.],
        [-1., -1., -1.,  ..., -1., -1., -1.],
        [-1.,  1., -1.,  ..., -1., -1., -1.]])

#### Creating the architecture of the Neural Network

A Boltzmann Machine is a probabilistic graphical model.

The Boltzmann Machine built here uses the Restricted Boltzmann Machine (RBM) architecture.

##### 1 Function (`__init__`): the parameters

`__init__` takes the number of visible nodes `nv` and the number of hidden nodes `nh`, and creates the weights `W` and two biases: `a`, used in the probability of the hidden nodes given the visible nodes, and `b`, used in the probability of the visible nodes given the hidden nodes.

`self.W = torch.randn(nh, nv)` initializes the weights randomly (normal distribution).

`self.a = torch.randn(1, nh)` initializes the hidden bias randomly (normal distribution). It is a vector of `nh` elements, created with shape `(1, nh)`: the first dimension corresponds to the batch and the second to the bias values, so the tensor is two-dimensional like the mini-batches it is added to.

`self.b = torch.randn(1, nv)` does the same for the visible nodes: `b` is a vector of `nv` elements, created with shape `(1, nv)`.

##### 2 Function (`sample_h`): the hidden nodes

`sample_h` computes the probabilities of the hidden nodes given the visible nodes, using the sigmoid activation function. It is needed because during training we approximate the log-likelihood gradient using Gibbs sampling.

1) Multiply the input by the weights: `wx = torch.mm(x, self.W.t())`.

2) Add the bias to get the linear activation of each hidden neuron: `activation = wx + self.a.expand_as(wx)`. `expand_as(wx)` makes sure the bias is applied to every row of the mini-batch.

3) Apply the sigmoid to get the probability that each hidden node is activated given the visible nodes: `p_h_given_v = torch.sigmoid(activation)`.

4) Return the probabilities and a sample drawn from them: `return p_h_given_v, torch.bernoulli(p_h_given_v)`. The sample is Bernoulli because we are predicting a binary outcome, whether the user likes a movie or not (1 or 0).
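
The four steps above can be traced on a toy example. This is only a sketch with hypothetical sizes (2 users, 3 visible nodes, 2 hidden nodes) and random parameters, not the trained model:

```python
import torch

torch.manual_seed(0)  # only to make the sketch repeatable

nv, nh = 3, 2                     # hypothetical sizes
W = torch.randn(nh, nv)           # weights, shaped as in RBM.__init__
a = torch.randn(1, nh)            # hidden bias
x = torch.FloatTensor([[1., 0., 1.],
                       [0., 1., 1.]])    # mini-batch of 2 users

wx = torch.mm(x, W.t())                   # step 1: shape (2, nh)
activation = wx + a.expand_as(wx)         # step 2: bias added to every row
p_h_given_v = torch.sigmoid(activation)   # step 3: probabilities between 0 and 1
h = torch.bernoulli(p_h_given_v)          # step 4: binary sample (0. or 1.)

print(p_h_given_v.shape, h.shape)
```

Both results have one row per user and one column per hidden node; `h` contains only zeros and ones.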


##### 3 Function (`sample_v`): the visible nodes

`sample_v` mirrors `sample_h`: it returns the probabilities that the visible nodes equal 1 given the hidden nodes, together with a Bernoulli sample drawn from them. The sampled visible nodes are the predicted ratings.
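
Written out for a single user, and assuming the shapes used in the class (`W` is `nh` by `nv`, `a` is the hidden bias, `b` the visible bias), the two conditional probabilities computed by `sample_h` and `sample_v` are:

$$
p(h_j = 1 \mid v) = \sigma\Big(a_j + \sum_{i=1}^{n_v} W_{ji}\, v_i\Big),
\qquad
p(v_i = 1 \mid h) = \sigma\Big(b_i + \sum_{j=1}^{n_h} W_{ji}\, h_j\Big)
$$

Here $\sigma$ is the sigmoid function, $v_i$ is visible node $i$ and $h_j$ is hidden node $j$.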

##### 4 Function (`train`): contrastive divergence, an approximation of the log-likelihood gradient

An RBM is an energy-based model: the weights are adjusted to minimize the energy, which amounts to maximizing the log-likelihood of the training data.
To do that we need the gradient of the log-likelihood, which we only approximate, using Gibbs sampling. This is k-step contrastive divergence.
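
As a reading aid, the update that the `train` method applies can be written as follows, where $v_0$ is the input batch, $v_k$ the batch of visible nodes after $k$ Gibbs steps, and $p(h \mid v)$ the matrix of hidden probabilities returned by `sample_h`. Note that the code as written uses no explicit learning rate:

$$
\begin{aligned}
W &\leftarrow W + \big(v_0^{\top}\, p(h \mid v_0) - v_k^{\top}\, p(h \mid v_k)\big)^{\top} \\
b &\leftarrow b + \sum_{\text{batch}} (v_0 - v_k) \\
a &\leftarrow a + \sum_{\text{batch}} \big(p(h \mid v_0) - p(h \mid v_k)\big)
\end{aligned}
$$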

In [19]:
class RBM():
    def __init__(self, nv, nh):
        self.W = torch.randn(nh, nv) # weights
        self.a = torch.randn(1, nh) # bias of the hidden nodes
        self.b = torch.randn(1, nv) # bias of the visible nodes
    def sample_h(self, x):
        # probabilities of the hidden nodes given the visible nodes, plus a sample
        wx = torch.mm(x, self.W.t())
        activation = wx + self.a.expand_as(wx)
        p_h_given_v = torch.sigmoid(activation)
        return p_h_given_v, torch.bernoulli(p_h_given_v)
    def sample_v(self, y):
        # probabilities of the visible nodes given the hidden nodes, plus a sample
        wy = torch.mm(y, self.W)
        activation = wy + self.b.expand_as(wy)
        p_v_given_h = torch.sigmoid(activation)
        return p_v_given_h, torch.bernoulli(p_v_given_h)
    def train(self, v0, vk, ph0, phk):
        # contrastive divergence update from the initial (v0, ph0) and final (vk, phk) states
        self.W += (torch.mm(v0.t(), ph0) - torch.mm(vk.t(), phk)).t()
        self.b += torch.sum((v0 - vk), 0)
        self.a += torch.sum((ph0 - phk), 0)

In [20]:
nv = len(training_set[0])
nh = 100 # number of features (hidden nodes)
batch_size = 100 # number of observations per batch
rbm = RBM(nv, nh) # creating the RBM object

#### Training the RBM

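The outer `for` loop runs over the epochs; the inner loop goes through the observations in batches and adjusts the weights using the final visible nodes.
`range(1, nb_epoch + 1)` goes from 1 to 10.
`train_loss` measures the error, the difference between the predicted and the real ratings.
The loss used is the simple difference in absolute value.
`s` is a counter used to normalize the loss.
`for id_user in range(0, nb_users - batch_size, batch_size)`: the batches start at user 0, stop before `nb_users - batch_size` and advance in steps of `batch_size` (100).
The inputs `vk` are the ratings of all the movies by the users in the batch.
The targets `v0` keep the same initial values.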
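
To make the batch boundaries concrete, this short sketch lists the start index of every batch, using the `nb_users` value printed earlier in the notebook:

```python
nb_users = 943      # value printed earlier in the notebook
batch_size = 100

starts = list(range(0, nb_users - batch_size, batch_size))
last_row_used = starts[-1] + batch_size

print(starts)
print(last_row_used, nb_users - last_row_used)
```

With these values there are 9 batches and the last one ends at row 900, so the final 43 users are never visited by this loop.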


In [21]:
nb_epoch = 10
for epoch in range(1, nb_epoch + 1):
    train_loss = 0
    s = 0.
    for id_user in range(0, nb_users - batch_size, batch_size):
        vk = training_set[id_user:id_user+batch_size] # inputs: ratings of this batch of 100 users
        v0 = training_set[id_user:id_user+batch_size] # targets: left untouched so the loss can be measured against them
        ph0,_ = rbm.sample_h(v0) # hidden probabilities given v0; ",_" keeps only the first returned element
        for k in range(10): # k steps of contrastive divergence (Gibbs sampling)
            _,hk = rbm.sample_h(vk) # sample the hidden nodes from the visible nodes
            _,vk = rbm.sample_v(hk) # sample the visible nodes from the hidden nodes
            vk[v0<0] = v0[v0<0] # freeze the unrated entries (-1): nothing is learned where there is no rating
        phk,_ = rbm.sample_h(vk)
        rbm.train(v0, vk, ph0, phk) # update the weights and biases
        train_loss += torch.mean(torch.abs(v0[v0>=0] - vk[v0>=0])) # mean absolute difference over the rated entries
        s += 1.
    print('epoch: '+str(epoch)+' loss: '+str(train_loss/s)) # average loss per batch, to watch how it decreases


epoch: 1 loss: tensor(0.3477)
epoch: 2 loss: tensor(0.2575)
epoch: 3 loss: tensor(0.2470)
epoch: 4 loss: tensor(0.2508)
epoch: 5 loss: tensor(0.2463)
epoch: 6 loss: tensor(0.2511)
epoch: 7 loss: tensor(0.2475)
epoch: 8 loss: tensor(0.2497)
epoch: 9 loss: tensor(0.2483)
epoch: 10 loss: tensor(0.2472)


#### Testing the RBM

In [22]:
test_loss = 0
s = 0.
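for id_user in range(nb_users):
    v = training_set[id_user:id_user+1] # training ratings of the user activate the hidden nodes
    vt = test_set[id_user:id_user+1] # test ratings of the same user are the targets
    if len(vt[vt>=0]) > 0: # skip users with no rating in the test set
        _,h = rbm.sample_h(v)
        _,v = rbm.sample_v(h) # a single Gibbs step gives the predicted ratings
        test_loss += torch.mean(torch.abs(vt[vt>=0] - v[vt>=0]))
        s += 1.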
print('test loss: '+str(test_loss/s))

test loss: tensor(0.2604)


### Evaluating our RBM with the RMSE

The RMSE (Root Mean Squared Error) is calculated as the root of the mean of the squared differences between the predictions and the targets.
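
A minimal sketch comparing the two metrics on hypothetical targets and predictions (the values are made up, with -1 marking an unrated movie). Because every rated entry is 0 or 1, each squared difference equals its absolute difference, so the RMSE of a batch is the square root of its mean absolute error:

```python
import torch

# Hypothetical targets and predictions; -1 marks an unrated movie
v0 = torch.FloatTensor([[1., 0., -1., 1., 0.]])
vk = torch.FloatTensor([[1., 1., -1., 0., 0.]])

rated = v0 >= 0                        # ignore the unrated entries
diff = v0[rated] - vk[rated]

mae = torch.mean(torch.abs(diff))          # mean absolute error
rmse = torch.sqrt(torch.mean(diff ** 2))   # root mean squared error

print(mae.item(), rmse.item())
```

Two of the four rated entries are wrong here, so the mean absolute error is 0.5 and the RMSE is its square root. This is likely why the RMSE losses in this section sit roughly at the square root of the absolute-difference losses reported earlier.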

In [23]:
nb_epoch = 10
for epoch in range(1, nb_epoch + 1):
    train_loss = 0
    s = 0.
    for id_user in range(0, nb_users - batch_size, batch_size):
        vk = training_set[id_user:id_user+batch_size]
        v0 = training_set[id_user:id_user+batch_size]
        ph0,_ = rbm.sample_h(v0)
        for k in range(10):
            _,hk = rbm.sample_h(vk)
            _,vk = rbm.sample_v(hk)
            vk[v0<0] = v0[v0<0]
        phk,_ = rbm.sample_h(vk)
        rbm.train(v0, vk, ph0, phk)
        train_loss += torch.sqrt(torch.mean((v0[v0>=0] - vk[v0>=0])**2)) # RMSE over the rated entries
        s += 1.
    print('epoch: '+str(epoch)+' loss: '+str(train_loss/s))

epoch: 1 loss: tensor(0.4975)
epoch: 2 loss: tensor(0.4942)
epoch: 3 loss: tensor(0.4965)
epoch: 4 loss: tensor(0.4982)
epoch: 5 loss: tensor(0.4943)
epoch: 6 loss: tensor(0.4941)
epoch: 7 loss: tensor(0.4980)
epoch: 8 loss: tensor(0.4944)
epoch: 9 loss: tensor(0.4974)
epoch: 10 loss: tensor(0.4992)


#### Test phase

In [24]:
test_loss = 0
s = 0.
for id_user in range(nb_users):
    v = training_set[id_user:id_user+1]
    vt = test_set[id_user:id_user+1]
    if len(vt[vt>=0]) > 0:
        _,h = rbm.sample_h(v)
        _,v = rbm.sample_v(h)
        test_loss += torch.sqrt(torch.mean((vt[vt>=0] - v[vt>=0])**2)) # RMSE over the rated entries
        s += 1.
print('test loss: '+str(test_loss/s))

test loss: tensor(0.4786)
