Note: what follows is not quite the use case these models are usually studied for. The purpose here is to see how well an unsupervised model can extract useful features and model the probability distribution of the given data. An RBM can be viewed as an ordinary neural network (albeit a 'two-way' one, with symmetric connections), so the weights it learns through unsupervised training can be used to initialize a feedforward network with a good prior, which is then fine-tuned to achieve faster learning and better generalization (as Hinton originally proposed in his 2006 paper).
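The pretraining idea above can be sketched as follows: the RBM's weight matrix and hidden biases become the first layer of a feedforward network, and only the new output layer starts from random values. This is a minimal illustration, not the notebook's code. It assumes `W` has shape (n_visible, n_hidden) and covers only the 784 pixel units (for the classifier RBM used below, the 10 label units would first have to be dropped from the visible layer); the helper names `init_from_rbm` and `forward` are hypothetical, random arrays stand in for trained weights, and the backpropagation fine-tuning step is not shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_from_rbm(W, Bh, n_classes, rng):
    """Hypothetical helper: a 2-layer feedforward net whose first layer reuses
    RBM weights W (n_visible x n_hidden) and hidden biases Bh."""
    n_hidden = W.shape[1]
    return {
        "W1": W.copy(), "b1": Bh.copy(),
        # the output layer has no pretrained counterpart, so it starts small and random
        "W2": rng.normal(0.0, 0.01, size=(n_hidden, n_classes)),
        "b2": np.zeros(n_classes),
    }

def forward(params, x):
    h = sigmoid(x @ params["W1"] + params["b1"])   # same mapping as the RBM's p(h=1 | v)
    logits = h @ params["W2"] + params["b2"]
    logits -= logits.max(axis=1, keepdims=True)    # numerically stable softmax
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.01, size=(784, 100))         # stand-in for pretrained RBM weights
Bh = np.zeros(100)
params = init_from_rbm(W, Bh, 10, rng)
probs = forward(params, rng.random((5, 784)))
print(probs.shape)  # (5, 10)
```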

In [1]:
import numpy as np
import MNIST_Dataset.LoadMNIST as mnist

from MyRBM.RMB_Classifier import Classifier

### Loading the MNIST dataset

In [2]:
mnist.load('MNIST_Dataset/')

In [3]:
# one-hot encode the labels (the 'sparse' label vectors fed to the classifier RBM)
sparse_training_labels = np.zeros((mnist.training_images_count, 10))

for i, label in enumerate(mnist.training_labels):
    sparse_training_labels[i][int(label)] = 1.0

sparse_testing_labels = np.zeros((mnist.test_images_count, 10))

for i, label in enumerate(mnist.test_labels):
    sparse_testing_labels[i][int(label)] = 1.0
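The loops above build one-hot ("sparse") label matrices. The same result can be obtained without a Python loop by indexing rows of an identity matrix; a minimal sketch, where `one_hot` is a hypothetical helper name:

```python
import numpy as np

def one_hot(labels, n_classes=10):
    """Row i gets 1.0 in column labels[i] and 0.0 elsewhere."""
    idx = np.asarray(labels).astype(int)   # labels may be stored as floats
    return np.eye(n_classes)[idx]

labels = np.array([3.0, 0.0, 9.0])
print(one_hot(labels))
```

With it, the two loops would reduce to `sparse_training_labels = one_hot(mnist.training_labels)` and `sparse_testing_labels = one_hot(mnist.test_labels)`.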

In [4]:
# flatten each 28x28 image into a vector and scale pixel values to [0, 1]
train_images = mnist.training_images.reshape((mnist.training_images.shape[0], mnist.images_size)) / 255
test_images = mnist.test_images.reshape((mnist.test_images.shape[0], mnist.images_size)) / 255

In [5]:
def get_accuracy(model: Classifier, data: np.ndarray, labels: np.ndarray) -> float:
    # fraction of samples whose predicted class matches the true label
    successes = 0
    for x, y in zip(data, labels):
        if y == model.class_predict(x):
            successes += 1

    return successes / data.shape[0]

## RBM with 10 hidden units

### Initializing the model

In [6]:
model = Classifier(mnist.images_size, 10, 10, True)

In [7]:
# an untrained model is expected to score around 0.1 (chance level for 10 classes)
print('Initial test accuracy: ', get_accuracy(model, test_images, mnist.test_labels))

Initial test accuracy:  0.101


### Training the model

In [8]:
model.fit_classifier(train_images, sparse_training_labels, lr=1e-1)
print('Test accuracy: ', get_accuracy(model, test_images, mnist.test_labels))

Running model training.
Epochs:  10
Batch size:  16
Gibb samples:  1
Learning rate:  0.1
Persistence:  off
Momentum:  0.8
Weight decay:  1e-06
Sample binary:  on

EPOCH 1...
EPOCH 2...
EPOCH 3...
EPOCH 4...
EPOCH 5...
EPOCH 6...
EPOCH 7...
EPOCH 8...
EPOCH 9...
EPOCH 10...
Done!
Test accuracy:  0.6892


In [9]:
model.fit_classifier(train_images, sparse_training_labels, lr=1e-2, gibb_samples=5, batch_size=64)
print('Test accuracy: ', get_accuracy(model, test_images, mnist.test_labels))

Running model training.
Epochs:  10
Batch size:  64
Gibb samples:  5
Learning rate:  0.01
Persistence:  off
Momentum:  0.8
Weight decay:  1e-06
Sample binary:  on

EPOCH 1...
EPOCH 2...
EPOCH 3...
EPOCH 4...
EPOCH 5...
EPOCH 6...
EPOCH 7...
EPOCH 8...
EPOCH 9...
EPOCH 10...
Done!
Test accuracy:  0.7081


In [11]:
model.fit_classifier(train_images, sparse_training_labels, lr=1e-3, gibb_samples=10, batch_size=128)
print('Test accuracy: ', get_accuracy(model, test_images, mnist.test_labels))

Running model training.
Epochs:  10
Batch size:  128
Gibb samples:  10
Learning rate:  0.001
Persistence:  off
Momentum:  0.8
Weight decay:  1e-06
Sample binary:  on

EPOCH 1...
EPOCH 2...
EPOCH 3...
EPOCH 4...
EPOCH 5...
EPOCH 6...
EPOCH 7...
EPOCH 8...
EPOCH 9...
EPOCH 10...
Done!
Test accuracy:  0.7089


In conclusion, an RBM with 10 hidden units reaches about 71% test accuracy in ~30 epochs using standard CD and suitably tuned hyperparameters. The model has (28*28+10) * 10 + 28*28+10 + 10 = 8744 parameters. Considering that this classifier is built on an unsupervised model (an RBM) that learns features through an energy function, and that the gradient of its log-likelihood is intractable and must be approximated, the result is quite impressive.
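The energy-based view mentioned above also suggests how a joint RBM (image plus one-hot label in the visible layer) can classify: clamp the image, try each of the 10 label vectors, and pick the one with the lowest free energy F(v) = -v·Bv - Σ_j log(1 + exp(Bh_j + (vW)_j)). This is a sketch of that common technique, not necessarily how `class_predict` is implemented; it assumes `W` has shape (n_visible, n_hidden) with the 10 label units appended after the 784 pixels, and uses random parameters in place of trained ones.

```python
import numpy as np

def free_energy(v, W, Bv, Bh):
    """F(v) = -v.Bv - sum_j softplus(Bh_j + (v W)_j) for a binary RBM."""
    return -v @ Bv - np.sum(np.logaddexp(0.0, v @ W + Bh), axis=-1)

def predict_class(x, W, Bv, Bh, n_classes=10):
    """Append each one-hot label to the image x and return the label with minimal F."""
    candidates = np.hstack([np.tile(x, (n_classes, 1)), np.eye(n_classes)])
    return int(np.argmin(free_energy(candidates, W, Bv, Bh)))

rng = np.random.default_rng(0)
n_pixels, n_classes, n_hidden = 784, 10, 10
W = rng.normal(0.0, 0.01, size=(n_pixels + n_classes, n_hidden))
Bv = np.zeros(n_pixels + n_classes)
Bh = np.zeros(n_hidden)
x = rng.random(n_pixels)
print(predict_class(x, W, Bv, Bh))
```

`np.logaddexp(0.0, a)` computes log(1 + exp(a)) without overflowing for large activations.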

Although the implementation assumes binary units, this particular model performed better when the input was kept real-valued instead of being binarized internally. Treated this way, it reached 75% test accuracy after 50 epochs, using plain CD-1 and switching to CD-5 at the end.
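For reference, here is a minimal sketch of one CD-k update for a binary RBM, with a `sample_visible` switch that either samples binary visible states or keeps the reconstructed visibles as real-valued probabilities. The function and argument names and the (n_visible, n_hidden) weight layout are assumptions for illustration, not the notebook's `MyRBM` code; momentum and weight decay, which the training logs show the notebook uses, are left out for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_update(v0, W, Bv, Bh, lr=0.1, k=1, sample_visible=True, rng=None):
    """One CD-k step on a mini-batch v0 (batch x n_visible); updates W, Bv, Bh in place.
    With sample_visible=False the reconstructed visibles stay real-valued probabilities."""
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = np.random.default_rng() if rng is None else rng
    ph0 = sigmoid(v0 @ W + Bh)                       # positive phase: p(h=1 | data)
    h = (rng.random(ph0.shape) < ph0).astype(float)  # binary hidden sample
    for _ in range(k):                               # k steps of block Gibbs sampling
        pv = sigmoid(h @ W.T + Bv)
        v = (rng.random(pv.shape) < pv).astype(float) if sample_visible else pv
        ph = sigmoid(v @ W + Bh)
        h = (rng.random(ph.shape) < ph).astype(float)
    n = v0.shape[0]
    # hidden probabilities (not samples) enter the statistics to reduce variance
    W += lr * (v0.T @ ph0 - v.T @ ph) / n
    Bv += lr * (v0 - v).mean(axis=0)
    Bh += lr * (ph0 - ph).mean(axis=0)
    return float(np.mean((v0 - pv) ** 2))            # reconstruction error, for monitoring only

# toy usage: 64 binary "images" of 20 pixels, 5 hidden units, CD-5 with real-valued visibles
rng = np.random.default_rng(0)
data = (rng.random((64, 20)) < 0.3).astype(float)
W = rng.normal(0.0, 0.01, size=(20, 5))
Bv, Bh = np.zeros(20), np.zeros(5)
W_before = W.copy()
err = cd_k_update(data, W, Bv, Bh, k=5, sample_visible=False, rng=rng)
print(err)
```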

The current implementation of PCD (as of 29 Dec 2022) yielded no benefit, only quick over-fitting and longer training.
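With persistent CD (PCD), the negative-phase Gibbs chain is not restarted from the data at every update; its state (the "fantasy particles") is carried over between mini-batches. A minimal sketch of that difference, assuming a binary RBM with weights `W` of shape (n_visible, n_hidden); `pcd_update` and its arguments are hypothetical names, not the notebook's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pcd_update(v0, chain, W, Bv, Bh, lr=1e-4, k=1, rng=None):
    """One PCD-k step; updates W, Bv, Bh in place. `chain` holds the persistent
    visible states (n_chains x n_visible); the advanced chain is returned."""
    rng = np.random.default_rng() if rng is None else rng
    ph0 = sigmoid(v0 @ W + Bh)                      # positive phase from the data
    v = chain
    for _ in range(k):                              # continue the chain instead of restarting at v0
        ph = sigmoid(v @ W + Bh)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(h @ W.T + Bv)
        v = (rng.random(pv.shape) < pv).astype(float)
    ph = sigmoid(v @ W + Bh)                        # negative phase from the chain state
    W += lr * (v0.T @ ph0 / v0.shape[0] - v.T @ ph / v.shape[0])
    Bv += lr * (v0.mean(axis=0) - v.mean(axis=0))
    Bh += lr * (ph0.mean(axis=0) - ph.mean(axis=0))
    return v

rng = np.random.default_rng(0)
data = (rng.random((256, 20)) < 0.3).astype(float)
W = rng.normal(0.0, 0.01, size=(20, 5))
Bv, Bh = np.zeros(20), np.zeros(5)
chain = (rng.random((16, 20)) < 0.5).astype(float)   # initialized once, reused across batches
for start in range(0, len(data), 16):
    chain = pcd_update(data[start:start + 16], chain, W, Bv, Bh, lr=1e-2, rng=rng)
```

Because the chain only moves a little per update, PCD typically relies on a small learning rate so that the chain can keep up with the changing model.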

## RBM with 100 hidden units

### Initializing the model

In [12]:
model100 = Classifier(mnist.images_size, 10, 100, True)

In [13]:
# an untrained model is expected to score around 0.1 (chance level for 10 classes)
print('Initial test accuracy: ', get_accuracy(model100, test_images, mnist.test_labels))

Initial test accuracy:  0.1009


In [14]:
model100.fit_classifier(train_images, sparse_training_labels, lr=1e-2, epochs=5, momentum=.9)
print('Test accuracy: ', get_accuracy(model100, test_images, mnist.test_labels))

Running model training.
Epochs:  5
Batch size:  16
Gibb samples:  1
Learning rate:  0.01
Persistence:  off
Momentum:  0.9
Weight decay:  1e-06
Sample binary:  on

EPOCH 1...
EPOCH 2...
EPOCH 3...
EPOCH 4...
EPOCH 5...
Done!
Test accuracy:  0.9166


In [45]:
# saving parameters before fumbling with fine-tuning
W_copy = model100.W.copy()
Bv_copy = model100.Bv.copy()
Bh_copy = model100.Bh.copy()

In [46]:
# restoring the saved parameters (re-run before each fine-tuning attempt)
model100.W = W_copy.copy()
model100.Bv = Bv_copy.copy()
model100.Bh = Bh_copy.copy()

### Trying to fine-tune with Persistent Contrastive Divergence (not quite beneficial here)

In [47]:
for i in range(10):
    model100.fit_classifier(train_images, sparse_training_labels, lr=1e-4, epochs=1, persistence=True)
    print('Test accuracy: ', get_accuracy(model100, test_images, mnist.test_labels))

Running model training.
Epochs:  1
Batch size:  16
Gibb samples:  1
Learning rate:  0.0001
Persistence:  on
Momentum:  0.8
Weight decay:  1e-06
Sample binary:  on

EPOCH 1...
Done!
Test accuracy:  0.92
Running model training.
Epochs:  1
Batch size:  16
Gibb samples:  1
Learning rate:  0.0001
Persistence:  on
Momentum:  0.8
Weight decay:  1e-06
Sample binary:  on

EPOCH 1...
Done!
Test accuracy:  0.9186
Running model training.
Epochs:  1
Batch size:  16
Gibb samples:  1
Learning rate:  0.0001
Persistence:  on
Momentum:  0.8
Weight decay:  1e-06
Sample binary:  on

EPOCH 1...
Done!
Test accuracy:  0.9186
Running model training.
Epochs:  1
Batch size:  16
Gibb samples:  1
Learning rate:  0.0001
Persistence:  on
Momentum:  0.8
Weight decay:  1e-06
Sample binary:  on

EPOCH 1...
Done!
Test accuracy:  0.9198
Running model training.
Epochs:  1
Batch size:  16
Gibb samples:  1
Learning rate:  0.0001
Persistence:  on
Momentum:  0.8
Weight decay:  1e-06
Sample binary:  on

EPOCH 1...
Done!
Test 

This classifier is based on an RBM with 100 hidden units, which has (28*28 + 10) * 100 + (28*28 + 10) + 100 = 80 294 parameters. After 6 epochs (5 of CD-1 and 1 of PCD) it reaches 92% classification accuracy, which is probably close to its limit.
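Both parameter counts in this notebook follow from the same layout: the visible layer is the 784 pixels plus the 10 label units, fully connected to the hidden layer, with one bias per unit. A quick check (the helper name is hypothetical):

```python
def rbm_classifier_param_count(n_pixels, n_classes, n_hidden):
    """Weights + visible biases + hidden biases of an RBM whose visible
    layer is the image concatenated with a one-hot label."""
    n_visible = n_pixels + n_classes
    return n_visible * n_hidden + n_visible + n_hidden

print(rbm_classifier_param_count(28 * 28, 10, 10))   # 8744
print(rbm_classifier_param_count(28 * 28, 10, 100))  # 80294
```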