<a href="https://colab.research.google.com/github/meryltheng/DeepLearningA-Z/blob/main/Boltzmann_Machine.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Boltzmann Machine

## Downloading the dataset

Dataset info: https://grouplens.org/datasets/movielens/

### ML-100K

In [1]:
!wget "http://files.grouplens.org/datasets/movielens/ml-100k.zip"
!unzip ml-100k.zip
!ls

--2023-06-15 07:45:25--  http://files.grouplens.org/datasets/movielens/ml-100k.zip
Resolving files.grouplens.org (files.grouplens.org)... 128.101.65.152
Connecting to files.grouplens.org (files.grouplens.org)|128.101.65.152|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4924029 (4.7M) [application/zip]
Saving to: ‘ml-100k.zip’


2023-06-15 07:45:25 (27.3 MB/s) - ‘ml-100k.zip’ saved [4924029/4924029]

Archive:  ml-100k.zip
   creating: ml-100k/
  inflating: ml-100k/allbut.pl       
  inflating: ml-100k/mku.sh          
  inflating: ml-100k/README          
  inflating: ml-100k/u.data          
  inflating: ml-100k/u.genre         
  inflating: ml-100k/u.info          
  inflating: ml-100k/u.item          
  inflating: ml-100k/u.occupation    
  inflating: ml-100k/u.user          
  inflating: ml-100k/u1.base         
  inflating: ml-100k/u1.test         
  inflating: ml-100k/u2.base         
  inflating: ml-100k/u2.test         
  inflating: ml-100k/u3.base    

### ML-1M

In [2]:
!wget "http://files.grouplens.org/datasets/movielens/ml-1m.zip"
!unzip ml-1m.zip
!ls

--2023-06-15 07:45:26--  http://files.grouplens.org/datasets/movielens/ml-1m.zip
Resolving files.grouplens.org (files.grouplens.org)... 128.101.65.152
Connecting to files.grouplens.org (files.grouplens.org)|128.101.65.152|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5917549 (5.6M) [application/zip]
Saving to: ‘ml-1m.zip’


2023-06-15 07:45:26 (31.3 MB/s) - ‘ml-1m.zip’ saved [5917549/5917549]

Archive:  ml-1m.zip
   creating: ml-1m/
  inflating: ml-1m/movies.dat        
  inflating: ml-1m/ratings.dat       
  inflating: ml-1m/README            
  inflating: ml-1m/users.dat         
ml-100k  ml-100k.zip  ml-1m  ml-1m.zip	sample_data


## Importing the libraries

In [3]:
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.optim as optim
import torch.utils.data
from torch.autograd import Variable

## Importing the dataset


In [4]:
# We won't be using this dataset.
movies = pd.read_csv('ml-1m/movies.dat', sep = '::', header = None, engine = 'python', encoding = 'latin-1')
users = pd.read_csv('ml-1m/users.dat', sep = '::', header = None, engine = 'python', encoding = 'latin-1')
ratings = pd.read_csv('ml-1m/ratings.dat', sep = '::', header = None, engine = 'python', encoding = 'latin-1')

See data file descriptions in `README` file.

## Preparing the training set and the test set


Refer to `README` file in `ml-100k` folder.

In [5]:
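# The u1.base / u1.test files have no header row; `header = None` keeps the first rating as data
training_set = pd.read_csv('ml-100k/u1.base', delimiter = '\t', header = None)
training_set = np.array(training_set, dtype = 'int')
test_set = pd.read_csv('ml-100k/u1.test', delimiter = '\t', header = None)
test_set = np.array(test_set, dtype = 'int')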

## Getting the number of users and movies


Build one matrix for the training set and one for the test set. Both matrices must have the same shape (users in rows, movies in columns), so the number of users and movies is taken as the maximum ID found in either split.

The maximum user/movie ID needs to cover ALL data splits, which matters especially when doing k-fold cross-validation.

In [6]:
nb_users = int(max(max(training_set[:,0]), max(test_set[:,0])))
nb_movies = int(max(max(training_set[:,1]), max(test_set[:,1])))
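As a minimal sketch of the point about covering all splits, the snippet below takes the maximum IDs over a list of splits at once. The helper name `max_ids` and the toy arrays are assumptions made for illustration; they are not part of the notebook.

```python
import numpy as np

def max_ids(splits):
    """Return (nb_users, nb_movies) as the largest IDs seen in any split.

    Each split is an array whose column 0 holds user IDs and column 1 movie IDs.
    """
    nb_users = max(int(split[:, 0].max()) for split in splits)
    nb_movies = max(int(split[:, 1].max()) for split in splits)
    return nb_users, nb_movies

# Toy splits with made-up values: columns are user ID, movie ID, rating.
fold_a = np.array([[1, 10, 5], [2, 3, 4]])
fold_b = np.array([[4, 2, 1], [3, 12, 3]])

nb_users_toy, nb_movies_toy = max_ids([fold_a, fold_b])
print(nb_users_toy, nb_movies_toy)  # 4 12
```

With k folds, the same call would simply receive every training and test array in the list.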

## Converting the data into an array with users in lines and movies in columns


In [7]:
def convert(data):
  new_data = []
  for id_users in range(1, nb_users + 1):  # user IDs start at 1 and the end of `range` is exclusive, hence the + 1
    id_movies = data[:, 1] [data[:, 0] == id_users]
    id_ratings = data[:, 2] [data[:, 0] == id_users]
    ratings = np.zeros(nb_movies)
    ratings[id_movies - 1] = id_ratings # minus 1 for indexes to correspond
    new_data.append(list(ratings)) # list of lists - what Torch expects
  return new_data
training_set = convert(training_set)
test_set = convert(test_set)
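The per-user loop above can also be written with NumPy fancy indexing. The following is a sketch under the assumption that each (user, movie) pair appears at most once in the data; the function name `convert_vectorized` and the toy array are hypothetical, and the function returns a NumPy array rather than a list of lists.

```python
import numpy as np

def convert_vectorized(data, nb_users, nb_movies):
    """Build a (nb_users, nb_movies) rating matrix; 0 means 'not rated'."""
    ratings = np.zeros((nb_users, nb_movies))
    # IDs in the data are 1-based, array indices are 0-based.
    ratings[data[:, 0] - 1, data[:, 1] - 1] = data[:, 2]
    return ratings

# Toy data with made-up values: columns are user ID, movie ID, rating.
toy = np.array([[1, 2, 5], [1, 3, 1], [3, 1, 4]])
matrix = convert_vectorized(toy, nb_users=3, nb_movies=4)
print(matrix)
```

User 2 has no ratings in the toy data, so the second row stays all zeros, which is the same behaviour as the loop version.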

## Converting the data into Torch tensors


In [8]:
training_set = torch.FloatTensor(training_set) # expects a list of lists
test_set = torch.FloatTensor(test_set)

## Converting the ratings into binary ratings 1 (Liked) or 0 (Not Liked)


In [9]:
training_set[training_set == 0] = -1 # movies not rated
training_set[training_set == 1] = 0
training_set[training_set == 2] = 0
training_set[training_set >= 3] = 1

test_set[test_set == 0] = -1
test_set[test_set == 1] = 0
test_set[test_set == 2] = 0
test_set[test_set >= 3] = 1
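The in-place assignments above depend on their order (the zeros must become -1 before ratings 1 and 2 are mapped to 0). As an alternative sketch, the same mapping can be expressed with `torch.where`, which does not modify its input. The function name `binarize` and the toy tensor are assumptions for illustration.

```python
import torch

def binarize(ratings):
    """Map 0 -> -1 (not rated), 1-2 -> 0 (not liked), 3-5 -> 1 (liked)."""
    liked = torch.where(ratings >= 3,
                        torch.ones_like(ratings),
                        torch.zeros_like(ratings))
    # Unrated entries (original value 0) override the liked / not-liked result.
    return torch.where(ratings == 0, torch.full_like(ratings, -1.0), liked)

toy = torch.tensor([[0., 1., 2.], [3., 4., 5.]])
print(binarize(toy))
```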

## Creating the architecture of the Neural Network


We're going to build a probabilistic graphical model as a class. A class bundles the model's parameters with the instructions (methods) that operate on them, which makes it a practical blueprint for what we want to build.

Four functions in this class:


1.   Initialise the RBM
2.   Sample the probabilities of the hidden nodes `h` given the visible nodes `v`
3.   Sample the probabilities of the visible nodes `v` given the hidden nodes `h`
4.   Update the weights and biases with contrastive divergence (`train`)



In [10]:
class RBM():
  def __init__(self, nv, nh): # nv = number of visible nodes, nh = number of hidden nodes
    self.W = torch.randn(nh, nv) # weights of shape (nh, nv), drawn from a normal distribution with mean 0 and sd 1
    self.a = torch.randn(1, nh) # one bias per hidden node; kept 2D (1 x nh) so it can be expanded across the batch
    self.b = torch.randn(1, nv) # one bias per visible node
  def sample_h(self, x): # x corresponds to v
    wx = torch.mm(x, self.W.t()) # matrix product; W is transposed so the dims agree: (batch, nv) x (nv, nh)
    activation = wx + self.a.expand_as(wx) # weighted input + bias; `.expand_as` applies the bias to each row of the mini-batch
    p_h_given_v = torch.sigmoid(activation) # probability that each hidden node is activated given the visible nodes
    return p_h_given_v, torch.bernoulli(p_h_given_v) # 2nd value is a 0/1 sample of the hidden nodes drawn from these probabilities
  def sample_v(self, y): # y corresponds to h
    wy = torch.mm(y, self.W) # no transpose needed here: (batch, nh) x (nh, nv)
    activation = wy + self.b.expand_as(wy)
    p_v_given_h = torch.sigmoid(activation)
    return p_v_given_h, torch.bernoulli(p_v_given_h)
  def train(self, v0, vk, ph0, phk): # Contrastive Divergence based on algo 1 in Fischer & Igel (2012)
    self.W += (torch.mm(v0.t(),ph0) - torch.mm(vk.t(),phk)).t() # update weights
    self.b += torch.sum((v0 - vk), 0) # sum over the batch (dim 0); the result broadcasts onto the (1, nv) bias
    self.a += torch.sum((ph0 - phk), 0) # same for the hidden bias
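To make the tensor shapes in `sample_h` concrete, here is a small stand-alone sketch that repeats its three steps on a toy batch. The sizes and variable names are assumptions chosen for illustration; the RBM in this notebook uses `nv` equal to the number of movies.

```python
import torch

nv, nh, batch = 6, 3, 4            # toy sizes, for illustration only
W = torch.randn(nh, nv)            # same layout as self.W
a = torch.randn(1, nh)             # hidden bias, same layout as self.a
x = torch.bernoulli(torch.full((batch, nv), 0.5))  # random binary "ratings"

wx = torch.mm(x, W.t())            # (batch, nv) x (nv, nh) -> (batch, nh)
p_h = torch.sigmoid(wx + a.expand_as(wx))  # bias repeated for every row of the batch
h = torch.bernoulli(p_h)           # one 0/1 sample per hidden node and per row

print(wx.shape, p_h.shape, h.shape)
```

All three tensors have shape `(batch, nh)`: one row per user in the mini-batch and one column per hidden node.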

**Note on RBM:**
The standard type of RBM has binary-valued (Boolean) hidden and visible units, and consists of a matrix of weights $W$ of size $m \times n$, where $m$ is the number of visible units and $n$ the number of hidden units. Each weight element $w_{i,j}$ of the matrix is associated with the connection between the visible (input) unit $v_i$ and the hidden unit $h_j$. In addition, there are bias weights (offsets) $a_i$ for $v_i$ and $b_j$ for $h_j$. (The `RBM` class above uses the opposite letters: `self.a` holds the hidden biases, `self.b` the visible biases, and `self.W` is stored with shape `(nh, nv)`.) Given the weights and biases, the energy of a configuration (pair of Boolean vectors) $(v,h)$ is defined as

$$ E(v,h) = - \sum_i{a_i v_i} - \sum_j{b_j h_j} - \sum_i \sum_j{v_i w_{i,j} h_j}. $$

The joint probability distribution for the visible and hidden vectors is defined in terms of the energy function as follows,

$$ P(v,h) = \frac{1}{Z} e^{-E(v,h)} $$

where $Z$ is a partition function defined as the sum of $e^{-E(v,h)}$ over all possible configurations, which can be interpreted as a normalizing constant to ensure that the probabilities sum to 1.

More here: https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine

[Note: biases are analogous to the role of a constant in a linear function]
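For a very small RBM, the energy and the partition function from the note above can be evaluated by brute force. The sketch below uses the note's convention ($a$ for visible biases, $b$ for hidden biases, $W$ of size $m \times n$) with random toy parameters, all of which are assumptions for illustration. It checks that the probabilities sum to 1 and that the conditional $P(h_j = 1 \mid v)$ obtained from the joint distribution matches the sigmoid form used in `sample_h`.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 2                        # toy sizes: 3 visible and 2 hidden units
W = rng.normal(size=(m, n))        # W[i, j] links visible unit i and hidden unit j
a = rng.normal(size=m)             # visible biases (convention of the note)
b = rng.normal(size=n)             # hidden biases (convention of the note)

def energy(v, h):
    """E(v, h) = -a.v - b.h - v^T W h"""
    return -(a @ v) - (b @ h) - (v @ W @ h)

# Enumerate every binary configuration of the visible and hidden vectors.
vs = [np.array(c, dtype=float) for c in itertools.product([0, 1], repeat=m)]
hs = [np.array(c, dtype=float) for c in itertools.product([0, 1], repeat=n)]

Z = sum(np.exp(-energy(v, h)) for v in vs for h in hs)   # partition function
joint = {(tuple(v), tuple(h)): np.exp(-energy(v, h)) / Z for v in vs for h in hs}
total = sum(joint.values())        # should be 1 up to rounding error

# P(h_0 = 1 | v) from the joint distribution versus the sigmoid closed form.
v = vs[5]
numerator = sum(joint[(tuple(v), tuple(h))] for h in hs if h[0] == 1)
denominator = sum(joint[(tuple(v), tuple(h))] for h in hs)
brute_force = numerator / denominator
closed_form = 1.0 / (1.0 + np.exp(-(b[0] + v @ W[:, 0])))

print(total, brute_force, closed_form)
```

The enumeration has $2^{m+n}$ terms, so this only works for toy sizes; with one visible unit per movie, computing $Z$ exactly is out of reach, which is why training relies on sampling.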

### Create RBM object

In [11]:
nv = len(training_set[0]) # no. of visible nodes = no. of movies
nh = 100 # tunable param; no. of hidden nodes, i.e. latent features to detect
batch_size = 100 # tunable param; larger batches mean fewer updates per epoch, so each epoch runs faster
rbm = RBM(nv, nh) # creates RBM object

## Training the RBM


In [12]:
nb_epoch = 10
for epoch in range(1, nb_epoch + 1):
  train_loss = 0 # initialised value
  s = 0. # counter that is a float
  for id_user in range(0, nb_users - batch_size, batch_size): # 3rd arg is the step size
    vk = training_set[id_user : id_user + batch_size] # starting point of the Gibbs chain; replaced by sampled ratings below
    v0 = training_set[id_user : id_user + batch_size] # won't be updated; original ratings of users (ie. target)
    ph0,_ = rbm.sample_h(v0) # `,_` keeps only the first return value (the probabilities) and discards the sample
    for k in range(10): # k-step contrastive divergence
      _,hk = rbm.sample_h(vk)
      _,vk = rbm.sample_v(hk)
      vk[v0<0] = v0[v0<0] # ensure training does not happen on movies that weren't rated (== -1)
    phk,_ = rbm.sample_h(vk)
    rbm.train(v0, vk, ph0, phk) # this executes training, does not return anything
    train_loss += torch.mean(torch.abs(v0[v0 >= 0] - vk[v0 >= 0]))
    s += 1. # counter to normalise the train loss
  print('epoch: '+str(epoch)+' loss: '+str(train_loss/s))

epoch: 1 loss: tensor(0.3569)
epoch: 2 loss: tensor(0.2452)
epoch: 3 loss: tensor(0.2522)
epoch: 4 loss: tensor(0.2524)
epoch: 5 loss: tensor(0.2471)
epoch: 6 loss: tensor(0.2497)
epoch: 7 loss: tensor(0.2464)
epoch: 8 loss: tensor(0.2498)
epoch: 9 loss: tensor(0.2445)
epoch: 10 loss: tensor(0.2488)
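One detail of the batch loop is worth checking: because the `range` stops at `nb_users - batch_size`, the last users never form a batch. The sketch below assumes `nb_users` is 943, the number of users in ML-100K; the variable names are hypothetical.

```python
nb_users_example = 943   # assumption: number of users in ML-100K
batch_size = 100

# Same range expression as in the training loop.
starts = list(range(0, nb_users_example - batch_size, batch_size))
covered = starts[-1] + batch_size            # index just past the last trained user
skipped = nb_users_example - covered

print(starts)    # [0, 100, 200, 300, 400, 500, 600, 700, 800]
print(skipped)   # 43

# Possible alternative: include a final, smaller batch starting at index 900.
starts_all = list(range(0, nb_users_example, batch_size))
print(starts_all[-1])  # 900
```

Whether to train on the final partial batch is a design choice; the training loop in this notebook skips it.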


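The loss is the mean absolute difference between the original and the reconstructed binary ratings, so a value of about 0.25 means that roughly one in four rated movies in the training set is reconstructed incorrectly.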
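For 0/1 ratings, the mean absolute difference is the same as the fraction of mismatched entries. A minimal sketch with made-up values, using the same mask on rated movies as the training loop:

```python
import torch

v0 = torch.tensor([[1., 0., -1., 1., 1.]])   # original ratings; -1 marks an unrated movie
vk = torch.tensor([[1., 1., -1., 1., 1.]])   # reconstruction with one rated entry flipped

rated = v0 >= 0                               # ignore unrated movies, as in the loop
loss = torch.mean(torch.abs(v0[rated] - vk[rated]))
mismatch_rate = (v0[rated] != vk[rated]).float().mean()

print(loss.item(), mismatch_rate.item())      # 0.25 0.25
```

One of the four rated entries differs, so both quantities equal 0.25.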

## Testing the RBM


The "blind-walk" technique (from MCMC; close to a random walk, but the step probabilities are not all the same) is analogous to having to stay on a straight line while blindfolded. The model has been trained to do this for 10 steps, so there is a high chance of "staying on the line" when making a single step, which is much easier.

In [15]:
test_loss = 0
s = 0.
for id_user in range(nb_users): # batch_size is a technique specific to the training; don't need it here
  v = training_set[id_user : id_user + 1] # training set is the input used to activate the hidden neurons to predict the output
  vt = test_set[id_user : id_user + 1]
  if len(vt[vt >= 0]) > 0: # the blind-walk is one step, so the loop is removed; instead, we only evaluate users with at least one existing test rating (>= 0)
    _,h = rbm.sample_h(v)
    _,v = rbm.sample_v(h)
    test_loss += torch.mean(torch.abs(vt[vt >= 0] - v[vt >= 0]))
    s += 1.
print('test loss ' +str(test_loss/s))

test loss tensor(0.2432)


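The test loss (about 0.24) is slightly lower than the final training loss (about 0.25), so the model reconstructs held-out ratings about as well as the ratings it was trained on. Note that the test loss is measured after a single sampling step, whereas the training loss is measured after 10.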