# Boltzmann Machines

<img src="img/67_blog_image_2.png" width="800" height="500">

* Use-case: Recommender Systems
* Undirected, unsupervised models
  * Generative deep learning
  * No output layer 
  * Each visible node is something that we measure, e.g. each component of a power plant
  * Each hidden node is something we don't measure, e.g. wind speed, humidity <img src="img/67_blog_image_3.png" width="500" height="250" >
* Energy-based models (EBM)
  * Uses the Boltzmann distribution formula <img src="img/boltzmann_dist.png" width="100" height="50" >
  * Systems tend to move towards their lowest energy state
  * [A Tutorial on Energy-Based Learning](http://yann.lecun.com/exdb/publis/pdf/lecun-06.pdf)
  * Mr. Nobody
* Restricted Boltzmann Machine (RBM) <img src="img/restricted_boltzmann_machine.png" height="500" width="800" >
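  * Nodes within the same layer are not connected (no visible-visible or hidden-hidden connections); every connection runs between a visible and a hidden node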
  * Visible nodes can be different movies
  * Hidden nodes can be features such as movie genre, actor, award, director, etc.
* Contrastive Divergence
  * Allows RBMs to learn through Gibbs sampling
  * Iteratively updates the weights to lower the energy (raise the probability) of the training data relative to the samples the model generates
  * [A Fast Learning Algorithm for Deep Belief Nets](https://www.cs.toronto.edu/~hinton/absps/fastnc.pdf)
* Deep Belief Networks (DBN)
  * Stacked RBMs
  * Greedy layer-wise training
  * Wake-Sleep algorithm
  * Connections between the top two layers are undirected
  * Connections in the lower layers are directed, pointing downwards towards the inputs
  * [Greedy Layer-Wise Training of Deep Networks](http://www.iro.umontreal.ca/~lisa/pointeurs/BengioNips2006all.pdf)
* Deep Boltzmann Machines (DBM)
  * Stacked RBMs, but all connections remain undirected
  * [Deep Boltzmann Machines](http://www.utstat.toronto.edu/~rsalakhu/dbm.pdf)
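
To make the energy-based view concrete, here is a small sketch (not part of the original notes) that enumerates every joint state of a toy RBM with 2 visible and 1 hidden unit. It uses the standard RBM energy E(v, h) = -Σ a_i v_i - Σ b_j h_j - Σ v_i W_ij h_j and turns the energies into Boltzmann probabilities p(v, h) = exp(-E(v, h)) / Z. The parameter values and the helper name `energy` are arbitrary choices for illustration only.

```python
import itertools
import math

# Toy RBM parameters (arbitrary, illustrative values only)
visible_bias = [0.5, -0.5]   # a_i, one per visible unit
hidden_bias = [0.2]          # b_j, one per hidden unit
weights = [[1.0], [-1.0]]    # weights[i][j] connects visible i to hidden j


def energy(v, h):
    """RBM energy: E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_ij v_i W_ij h_j."""
    e = -sum(a * vi for a, vi in zip(visible_bias, v))
    e -= sum(b * hj for b, hj in zip(hidden_bias, h))
    e -= sum(v[i] * weights[i][j] * h[j]
             for i in range(len(v)) for j in range(len(h)))
    return e


# Enumerate every binary joint state (v, h)
states = [(v, h)
          for v in itertools.product([0, 1], repeat=len(visible_bias))
          for h in itertools.product([0, 1], repeat=len(hidden_bias))]

# Boltzmann distribution: p(v, h) = exp(-E(v, h)) / Z
unnormalized = [math.exp(-energy(v, h)) for v, h in states]
Z = sum(unnormalized)  # partition function: sum over ALL joint states
probs = [u / Z for u in unnormalized]

# List states from most to least probable
for (v, h), p in sorted(zip(states, probs), key=lambda item: -item[1]):
    print(v, h, f"E={energy(v, h):+.2f}", f"p={p:.3f}")
```

The lowest-energy state, v = (1, 0) with h = (1,), gets the highest probability, which is the sense in which the system "prefers" low energy. Brute-force enumeration of Z is only feasible for tiny models; the number of joint states grows as 2^(nv + nh), which is why RBMs are trained with sampling-based approximations such as contrastive divergence.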

In [1]:
from os.path import dirname, abspath, join, curdir

import numpy as np
import pandas as pd

from torch import FloatTensor, mean, abs

In [2]:
# Import the dataset
datapath = join(dirname(dirname(abspath(curdir))), "data", "raw", "rbm")

movies = pd.read_csv(join(datapath, "movielens-1m", "movies.dat"),
                     sep="::",
                     header=None,
                     engine="python",
                     encoding="latin-1")

users = pd.read_csv(join(datapath, "movielens-1m", "users.dat"),
                    sep="::",
                    header=None,
                    engine="python",
                    encoding="latin-1")

ratings = pd.read_csv(join(datapath, "movielens-1m", "ratings.dat"),
                      sep="::",
                      header=None,
                      engine="python",
                      encoding="latin-1")

movies.shape, users.shape, ratings.shape

((3883, 3), (6040, 5), (1000209, 4))

In [3]:
# Prepare training and test sets
train_df = pd.read_csv(join(datapath, "movielens-100k", "u1.base"),
                       sep="\t",
                       header=None)

train_set = np.array(train_df, dtype="int")

test_df = pd.read_csv(join(datapath, "movielens-100k", "u1.test"),
                      sep="\t",
                      header=None)

test_set = np.array(test_df, dtype="int")

train_set.shape, test_set.shape

((80000, 4), (20000, 4))

In [4]:
# Get the total number of users and movies so the training and test matrices share the same shape
# The max user/movie ID may be present in either the training or the test data
nb_users = int(max(max(train_set[:, 0]), max(test_set[:, 0])))
nb_movies = int(max(max(train_set[:, 1]), max(test_set[:, 1])))

nb_users, nb_movies

(943, 1682)

In [5]:
def convert(data: np.ndarray) -> list:
    """Convert (user, movie, rating) rows into one rating vector per user.

    Uses the global ``nb_users`` and ``nb_movies`` so that the training and
    test sets produce vectors of the same length.

    Parameters
    ----------
    data : np.ndarray
        Array whose columns are user ID, movie ID, rating and timestamp

    Returns
    -------
    list
        One np.ndarray of length nb_movies per user; unrated movies are -1
    """
    new_data = []

    for user_id in range(1, nb_users + 1):
        # Get the movies rated by this user and the corresponding ratings
        movie_ids = data[:, 1][data[:, 0] == user_id]
        user_ratings = data[:, 2][data[:, 0] == user_id]

        # Build the user's rating vector over all movies; unrated movies = -1
        ratings_vec = -np.ones(nb_movies)
        ratings_vec[movie_ids - 1] = user_ratings  # movie IDs start at 1
        new_data.append(ratings_vec)

    return new_data
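
The loop above filters the whole array once per user. A vectorized alternative (a sketch using NumPy integer-array indexing, not the original author's code) builds the same user × movie matrix in a single assignment. The name `convert_vectorized` and its explicit size arguments are hypothetical; it assumes the first three columns hold 1-based user IDs, 1-based movie IDs and ratings, as in `u1.base` / `u1.test`.

```python
import numpy as np


def convert_vectorized(data: np.ndarray, nb_users: int, nb_movies: int) -> np.ndarray:
    """Build a (nb_users, nb_movies) rating matrix; unrated entries are -1."""
    matrix = -np.ones((nb_users, nb_movies))
    # Shift the 1-based IDs to 0-based row/column indices and scatter the ratings
    matrix[data[:, 0] - 1, data[:, 1] - 1] = data[:, 2]
    return matrix


# Tiny example: 2 users, 3 movies; columns = (user_id, movie_id, rating, timestamp)
toy = np.array([[1, 1, 5, 0],
                [1, 3, 2, 0],
                [2, 2, 4, 0]])
print(convert_vectorized(toy, nb_users=2, nb_movies=3))
```

It returns one 2-D array instead of a list of rows, which can be passed straight to a tensor constructor.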

In [6]:
train_set_converted = convert(train_set)
test_set_converted = convert(test_set)

train_set_converted[:5]

[array([ 5.,  3.,  4., ..., -1., -1., -1.]),
 array([ 4., -1., -1., ..., -1., -1., -1.]),
 array([-1., -1., -1., ..., -1., -1., -1.]),
 array([-1., -1., -1., ..., -1., -1., -1.]),
 array([-1., -1., -1., ..., -1., -1., -1.])]

In [7]:
test_set_converted[:5]

[array([-1., -1., -1., ..., -1., -1., -1.]),
 array([-1., -1., -1., ..., -1., -1., -1.]),
 array([-1., -1., -1., ..., -1., -1., -1.]),
 array([-1., -1., -1., ..., -1., -1., -1.]),
 array([ 4.,  3., -1., ..., -1., -1., -1.])]

In [8]:
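# Convert data into tensors (stacking the rows into one ndarray first avoids
# PyTorch's slow list-of-ndarrays conversion and its warning)
train_set_ft = FloatTensor(np.array(train_set_converted))
test_set_ft = FloatTensor(np.array(test_set_converted))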


In [9]:
train_set_ft[:5]

tensor([[ 5.,  3.,  4.,  ..., -1., -1., -1.],
        [ 4., -1., -1.,  ..., -1., -1., -1.],
        [-1., -1., -1.,  ..., -1., -1., -1.],
        [-1., -1., -1.,  ..., -1., -1., -1.],
        [-1., -1., -1.,  ..., -1., -1., -1.]])

In [10]:
test_set_ft[:5]

tensor([[-1., -1., -1.,  ..., -1., -1., -1.],
        [-1., -1., -1.,  ..., -1., -1., -1.],
        [-1., -1., -1.,  ..., -1., -1., -1.],
        [-1., -1., -1.,  ..., -1., -1., -1.],
        [ 4.,  3., -1.,  ..., -1., -1., -1.]])

In [11]:
# Convert ratings into binary (1=Liked, 0=Not Liked)
train_set_ft[train_set_ft == 1] = 0
train_set_ft[train_set_ft == 2] = 0
train_set_ft[train_set_ft >= 3] = 1

test_set_ft[test_set_ft == 1] = 0
test_set_ft[test_set_ft == 2] = 0
test_set_ft[test_set_ft >= 3] = 1
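
The masked assignments above map ratings 1 and 2 to 0 (not liked), ratings 3 to 5 to 1 (liked), and leave -1 (unrated) untouched. Their order matters: the `>= 3` step has to run last, otherwise the newly created 1s would be turned back into 0 by the `== 1` step. A single-expression NumPy sketch of the same mapping (the helper name `binarize_ratings` is hypothetical):

```python
import numpy as np


def binarize_ratings(r: np.ndarray) -> np.ndarray:
    """Map 1-2 -> 0 (not liked), 3-5 -> 1 (liked), keep -1 (unrated) as -1."""
    return np.where(r < 0, -1.0, np.where(r >= 3, 1.0, 0.0))


print(binarize_ratings(np.array([5, 3, 2, 1, -1, 4])))
```

Because each output is computed from the original rating rather than from a partially updated array, this form has no ordering pitfall.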

In [12]:
train_set_ft[:5]

tensor([[ 1.,  1.,  1.,  ..., -1., -1., -1.],
        [ 1., -1., -1.,  ..., -1., -1., -1.],
        [-1., -1., -1.,  ..., -1., -1., -1.],
        [-1., -1., -1.,  ..., -1., -1., -1.],
        [-1., -1., -1.,  ..., -1., -1., -1.]])

In [13]:
test_set_ft[:5]

tensor([[-1., -1., -1.,  ..., -1., -1., -1.],
        [-1., -1., -1.,  ..., -1., -1., -1.],
        [-1., -1., -1.,  ..., -1., -1., -1.],
        [-1., -1., -1.,  ..., -1., -1., -1.],
        [ 1.,  1., -1.,  ..., -1., -1., -1.]])

In [14]:
from rbm import RBM
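
The `rbm` module itself is not included in these notes. The class below is a hypothetical NumPy sketch of the interface the training loop relies on (`sample_h`, `sample_v`, `train`), assuming Bernoulli (0/1) units, sigmoid conditionals, small Gaussian weight initialization and the contrastive-divergence update (positive phase minus negative phase). The class name `RBMSketch` and the learning-rate argument `lr` are assumptions; the notebook's `RBM(nv, nh)` takes no learning rate and operates on PyTorch tensors.

```python
import numpy as np


class RBMSketch:
    """Bernoulli RBM with nv visible and nh hidden units (hypothetical sketch)."""

    def __init__(self, nv: int, nh: int, lr: float = 1.0, seed: int = 0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.01, size=(nh, nv))  # hidden x visible weights
        self.a = np.zeros((1, nh))  # hidden biases
        self.b = np.zeros((1, nv))  # visible biases
        self.lr = lr                # assumed learning rate (not in the original call)

    @staticmethod
    def _sigmoid(x: np.ndarray) -> np.ndarray:
        return 1.0 / (1.0 + np.exp(-x))

    def sample_h(self, v: np.ndarray):
        """Return p(h=1 | v) and a Bernoulli sample of the hidden units."""
        p_h = self._sigmoid(v @ self.W.T + self.a)
        return p_h, (self.rng.random(p_h.shape) < p_h).astype(float)

    def sample_v(self, h: np.ndarray):
        """Return p(v=1 | h) and a Bernoulli sample of the visible units."""
        p_v = self._sigmoid(h @ self.W + self.b)
        return p_v, (self.rng.random(p_v.shape) < p_v).astype(float)

    def train(self, v0, vk, ph0, phk):
        """Contrastive-divergence update: data statistics minus model statistics."""
        self.W += self.lr * (ph0.T @ v0 - phk.T @ vk)
        self.b += self.lr * np.sum(v0 - vk, axis=0, keepdims=True)
        self.a += self.lr * np.sum(ph0 - phk, axis=0, keepdims=True)


# Usage: one CD-1 step on a toy batch of 4 users x 6 movies (binary ratings)
v0 = np.array([[1, 1, 0, 0, 1, 0],
               [0, 1, 1, 0, 0, 1],
               [1, 0, 0, 1, 1, 0],
               [0, 0, 1, 1, 0, 1]], dtype=float)
toy_rbm = RBMSketch(nv=6, nh=3, lr=0.1)
ph0, h0 = toy_rbm.sample_h(v0)   # hidden probabilities / sample given the data
_, v1 = toy_rbm.sample_v(h0)     # one Gibbs step back to the visible layer
ph1, _ = toy_rbm.sample_h(v1)    # hidden probabilities given the reconstruction
toy_rbm.train(v0, v1, ph0, ph1)
```

Running more Gibbs steps between `sample_h` and `sample_v` before calling `train` gives CD-k, which is what the training loop below does with k = 10.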

In [26]:
# Number of visible nodes
nv = len(train_set_ft[0])

# Tunable parameters: number of hidden nodes and number of users per batch
nh = 150
batch_size = 100

In [27]:
# Initialize the RBM model
rbm = RBM(nv, nh)

In [28]:
# Train the RBM model
epochs = 10

for epoch in range(epochs):
    train_loss = 0
    s = 0.

    for uid in range(0, nb_users - batch_size, batch_size):
        # Get initial visible nodes (movie ratings)
        vk = train_set_ft[uid:uid+batch_size]
        v0 = train_set_ft[uid:uid+batch_size]

        # Get initial hidden node probabilities
        ph0, _ = rbm.sample_h(v0)

        # contrastive divergence to obtain samples of
        # the visible/hidden node activations after k-steps
        for k in range(10):
            _, hk = rbm.sample_h(vk)
            _, vk = rbm.sample_v(hk)

            # Freeze visible nodes containing -1 ratings during training
            # so we don't learn from movies that were not rated by the user
            vk[v0<0] = v0[v0<0]

        # Get final hidden node probabilities after k-steps
        phk, _ = rbm.sample_h(vk)
        rbm.train(v0, vk, ph0, phk)
        train_loss += mean(abs(v0[v0>=0] - vk[v0>=0]))

        # RMSE Loss
        # train_loss += np.sqrt(mean((v0[v0>=0] - vk[v0>=0])**2))

        # Update counter to normalize train_loss
        s += 1.

    print(f"Epoch: {epoch+1} Train_loss: {train_loss/s}")

Epoch: 1 Train_loss: 0.3574598729610443
Epoch: 2 Train_loss: 0.25747954845428467
Epoch: 3 Train_loss: 0.2497379034757614
Epoch: 4 Train_loss: 0.25117620825767517
Epoch: 5 Train_loss: 0.24890729784965515
Epoch: 6 Train_loss: 0.24733179807662964
Epoch: 7 Train_loss: 0.24786941707134247
Epoch: 8 Train_loss: 0.25024890899658203
Epoch: 9 Train_loss: 0.24323007464408875
Epoch: 10 Train_loss: 0.24826949834823608
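
One detail of the training loop: `range(0, nb_users - batch_size, batch_size)` stops before the last batch boundary, so with 943 users and batches of 100, rows 900 to 942 (user IDs 901 to 943) never take part in training. The arithmetic can be checked directly; `range(0, nb_users, batch_size)` would also visit the final, partial batch of 43 users. That is offered as an alternative, not what produced the losses above.

```python
nb_users, batch_size = 943, 100

# Batch start indices used by the training loop above
starts = list(range(0, nb_users - batch_size, batch_size))
covered = sum(min(start + batch_size, nb_users) - start for start in starts)
print(starts)
print(covered, "of", nb_users, "users seen per epoch")

# Alternative: also include the final partial batch
all_starts = list(range(0, nb_users, batch_size))
print(all_starts[-1], nb_users - all_starts[-1])  # start and size of the last batch
```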


In [32]:
# Testing the RBM model
test_loss = 0
s = 0.

for uid in range(nb_users):
    v = train_set_ft[uid:uid+1]
    vt = test_set_ft[uid:uid+1]

    if len(vt[vt>=0]) > 0:
        _, h = rbm.sample_h(v)
        _, v = rbm.sample_v(h)

        test_loss += mean(abs(vt[vt>=0] - v[vt>=0]))

        # RMSE Loss
        # test_loss += np.sqrt(mean((vt[vt>=0] - v[vt>=0])**2))

        s += 1.

print(f"Test_Loss: {test_loss/s}")

Test_Loss: 0.264729380607605
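
Because both the held-out ratings and the sampled predictions are 0/1 values, each per-user mean absolute error equals the fraction of that user's rated test movies whose like/dislike was predicted wrongly. The reported test loss is the average of that fraction over test users, so about 0.26 means roughly 74% of a user's held-out ratings are predicted correctly on average. A small NumPy check of the identity, using the same `>= 0` mask that skips unrated (-1) entries (the arrays are made-up toy values):

```python
import numpy as np

# Toy held-out ratings for one user: -1 = not in the test set
vt = np.array([1, 0, -1, 1, -1, 0, 1])
# Toy binary predictions for the same movies
v = np.array([1, 1, 0, 1, 1, 0, 0])

mask = vt >= 0                                  # score only movies the user rated
mae = np.mean(np.abs(vt[mask] - v[mask]))       # loss as computed in the notebook
mismatch_rate = np.mean(vt[mask] != v[mask])    # fraction of wrong predictions
print(mae, mismatch_rate)
```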
