
# How to show that pretraining works on the MNIST dataset?

> In this tutorial, I check whether an autoencoder pretrained on unlabeled data improves model performance when there is plenty of unlabeled data but only limited labeled data.


Method:
*   Baseline model:
I used 1,000 labeled MNIST images to train a plain multilayer neural network, and measured validation accuracy by predicting the remaining 69,000 images.

*   Autoencoder-pretrained model:
In the pretraining step, I used those remaining 69,000 MNIST images, without their labels, to train an autoencoder in an unsupervised way (these are the same images later used for validation). Once the autoencoder is trained, its encoder half is extracted to compress each input image into a low-dimensional code. In the subsequent supervised classification step, the 1,000 labeled images are compressed by the encoder and then fed into a plain multilayer neural network with the same hidden layers as the baseline model.
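A minimal sketch of the index bookkeeping behind this protocol, assuming (as in the loading code) that the 60,000 training and 10,000 test rows are concatenated into one 70,000-row array. `split_indices` is a hypothetical helper, not part of the notebook. It makes explicit that the unlabeled pretraining pool and the evaluation set are the same 69,000 images, while the 1,000 labeled images never overlap with them: the encoder sees the evaluation images, but never their labels.

```python
import numpy as np

def split_indices(n_total=70000, n_labeled=1000):
    """Hypothetical helper mirroring the notebook's split: the first n_labeled
    rows are labeled training data; all remaining rows form one pool."""
    idx = np.arange(n_total)
    labeled = idx[:n_labeled]  # supervised training (labels visible)
    pool = idx[n_labeled:]     # autoencoder training (images only) AND evaluation
    return labeled, pool

labeled, pool = split_indices()
print(len(labeled), len(pool))             # 1000 69000
print(np.intersect1d(labeled, pool).size)  # 0: labeled set and pool are disjoint
```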

Conclusion:
Validation accuracy of the baseline model saturates at around 0.83 after 1,000 epochs, while the autoencoder-pretrained model reaches about 0.93. Pretraining works!
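To put the two accuracies on a common scale, it can help to compare error rates instead. A quick calculation using the approximate figures quoted above (0.83 and 0.93; exact numbers will vary from run to run):

```python
baseline_acc = 0.83
pretrained_acc = 0.93

baseline_err = 1 - baseline_acc      # about 0.17
pretrained_err = 1 - pretrained_acc  # about 0.07

# Fraction of the baseline's errors that pretraining removes
relative_reduction = (baseline_err - pretrained_err) / baseline_err
print(f"{relative_reduction:.0%}")   # 59%
```

So the error rate drops from roughly 17% to roughly 7%, which removes more than half of the baseline's mistakes.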



![](https://drive.google.com/uc?export=view&id=1c_Ut7jJYbYNmko9z8HpEIUgrypKR5HHq)

Future work:

*   Check the effect of the encoding dimension (`encoding_dim`)
*   Try different pretraining architectures, e.g., a restricted Boltzmann machine (RBM)
*   Does pretraining also work for time series?
*   Try different image datasets: Fashion-MNIST and notMNIST



References:

*   This post shows how to classify MNIST with only 1,000 labels:
https://towardsdatascience.com/a-wizards-guide-to-adversarial-autoencoders-part-4-classify-mnist-using-1000-labels-2ca08071f95
*   This post shows how to compress a signal with an autoencoder:
https://towardsdatascience.com/unsupervised-learning-of-gaussian-mixture-models-on-a-selu-auto-encoder-not-another-mnist-11fceccc227e
*   Hinton's Coursera lecture 14 explains why and when pretraining works:
https://d3c33hcgiwev3.cloudfront.net/_4bd9216688e0605b8e05f5533577b3b8_lec14.pdf?Expires=1540425600&Signature=QYnddOB54RDuCJ8ETkAq7xc3E05nUMeFGWtbUsvArIHkE2SVWLfvMe~Qz6ph~LB~HaDfnQz8eaITic-8qqk3CJwXkOyoFKlXdLo8bCddK8C1sr-ASE2WS6kmuyl-oPwuz1oaKuKQkazeUuapPLdV7RpG2X35jVHSt0yRef6JjqM_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5







Load the MNIST data

In [None]:
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from keras.layers import Input, Dense
from keras.models import Model
from keras.regularizers import l2
from keras.utils import to_categorical

# List the CSV files available in the Kaggle input directory
print(os.listdir("../input"))


In [None]:
#### Load the data (first column: label; remaining 784 columns: pixel values)
data_train = pd.read_csv("../input/mnist_train.csv")
y_train = np.array(data_train.iloc[:, 0])
x_train = np.array(data_train.iloc[:, 1:])

data_test = pd.read_csv("../input/mnist_test.csv")
y_test = np.array(data_test.iloc[:, 0])
x_test = np.array(data_test.iloc[:, 1:])

# Scale pixel values to [0, 1] and flatten each image to a 784-vector
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))

# Pool all 70,000 images; labels become one-hot vectors of length 10
x_all = np.concatenate((x_train, x_test))
y_all = to_categorical(np.concatenate((y_train, y_test)))

print('Shape of x_train:',x_train.shape)
print('Shape of x_test:', x_test.shape)
print('Shape of y_train:', y_train.shape)
print('Shape of y_test:',y_test.shape)
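`to_categorical` turns each integer label into a one-hot row of length 10. A minimal NumPy equivalent, assuming integer labels in 0..9 (`one_hot` is a hypothetical helper for illustration; the notebook itself keeps using `to_categorical`):

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # Row i of the identity matrix is the one-hot vector for class i,
    # so fancy-indexing with the label array builds the whole matrix at once.
    return np.eye(num_classes, dtype="float32")[labels]

encoded = one_hot(np.array([3, 0, 9]))
print(encoded.shape)            # (3, 10)
print(encoded.argmax(axis=1))   # [3 0 9]: argmax recovers the original labels
```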


In [None]:
#### Split into a labeled training set (first 1,000 images) and an evaluation set (remaining 69,000)
n_labeled = 1000
x_train = x_all[:n_labeled,:]
x_test = x_all[n_labeled:,:]
y_train = y_all[:n_labeled,:]
y_test = y_all[n_labeled:,:]

print(x_train.shape)
print(x_test.shape)
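The split above simply takes the first 1,000 rows, so the number of examples per digit in the labeled set depends on the file order. A variant that is not what the notebook does is to draw exactly 100 images per digit. A minimal sketch, assuming integer labels (for the notebook's one-hot `y_all`, pass `y_all.argmax(axis=1)`); `balanced_labeled_subset` is a hypothetical helper and the demo uses synthetic labels:

```python
import numpy as np

def balanced_labeled_subset(y, per_class=100, num_classes=10, seed=0):
    """Hypothetical helper: indices of a class-balanced labeled subset."""
    rng = np.random.default_rng(seed)
    picks = [rng.choice(np.flatnonzero(y == c), size=per_class, replace=False)
             for c in range(num_classes)]
    idx = np.concatenate(picks)
    rng.shuffle(idx)  # avoid feeding the classes in sorted blocks
    return idx

# Synthetic labels: 500 examples of each digit
demo_y = np.repeat(np.arange(10), 500)
idx = balanced_labeled_subset(demo_y)
print(np.bincount(demo_y[idx]))  # 100 examples of every digit
```

To apply it, one could build a boolean mask from `idx` and use `x_all[mask]` as the labeled set and `x_all[~mask]` as the pool.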

In [None]:
#### Construct the neural architecture for the baseline model
input_img = Input(shape=(784,))
d = Dense(20, activation='relu')(input_img)
d = Dense(10, activation='relu')(d)
output = Dense(10, activation='softmax', kernel_regularizer=l2(0.01))(d)
baseline = Model(input_img, output)
# A softmax over 10 mutually exclusive classes pairs with categorical cross-entropy
baseline.compile(optimizer='adam', loss='categorical_crossentropy',
                 metrics=['categorical_accuracy'])
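For a softmax output over 10 mutually exclusive digits with one-hot targets, the standard loss is categorical cross-entropy, `-sum_k y_k * log(p_k)` averaged over the batch, which only looks at the probability assigned to the true class. A simplified NumPy sketch of that formula (clipping added to avoid `log(0)`; this is not Keras's exact implementation):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip probabilities away from 0 and 1 so log() stays finite,
    # then average the per-sample loss over the batch.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

y_true = np.array([[0, 1, 0], [1, 0, 0]], dtype=float)
y_pred = np.array([[0.1, 0.8, 0.1], [0.6, 0.3, 0.1]])
loss = categorical_crossentropy(y_true, y_pred)
print(loss)  # mean of -log(0.8) and -log(0.6)
```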

In [None]:
#### Train the model
history_baseline = baseline.fit(x_train, y_train,
                                epochs=1000,
                                batch_size=100,
                                shuffle=True,
                                verbose=2,
                                validation_data=(x_test, y_test))

#### Check the model performance
score = baseline.evaluate(x_test, y_test)
print('Keras test accuracy:', score[1])


After 1,000 epochs, training accuracy reaches 100%, but test accuracy stays at ~86%. With so little labeled data, the overfitting is obvious.
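One way to quantify this overfitting is the gap between the final training and validation accuracy. A small sketch, assuming a Keras-style history dict such as `history_baseline.history`, which should contain `categorical_accuracy` and `val_categorical_accuracy` keys because the model was compiled with `metrics=['categorical_accuracy']`. `generalization_gap` is a hypothetical helper, and the demo values are toy numbers, not results from this notebook:

```python
def generalization_gap(history, metric="categorical_accuracy"):
    """Final-epoch training metric minus final-epoch validation metric."""
    train = history[metric][-1]
    val = history["val_" + metric][-1]
    return train - val

# Toy history for illustration only
demo_history = {"categorical_accuracy": [0.5, 0.75, 1.0],
                "val_categorical_accuracy": [0.5, 0.6, 0.7]}
print(round(generalization_gap(demo_history), 2))  # 0.3
```

On the real run it would be called as `generalization_gap(history_baseline.history)`.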



**Autoencoder-pretrained model:**

In [None]:
#### Construct the neural architecture of the autoencoder
# Architecture adapted from: https://towardsdatascience.com/unsupervised-learning-of-gaussian-mixture-models-on-a-selu-auto-encoder-not-another-mnist-11fceccc227e
# Size of the encoded representation
encoding_dim = 6
# Encoder: 784 -> 256 -> 128 -> encoding_dim; the decoder mirrors it back to 784
input_img = Input(shape=(784,))
d = Dense(256, activation='selu')(input_img)
d = Dense(128, activation='selu')(d)
encoded = Dense(encoding_dim, activation='selu', kernel_regularizer=l2(0.01))(d)
d = Dense(128, activation='selu')(encoded)
d = Dense(256, activation='selu')(d)
decoded = Dense(784, activation='sigmoid')(d)  # reconstructed pixels in [0, 1]
# This model maps an input to its reconstruction
autoencoder = Model(input_img, decoded)
# This model maps an input to its encoded representation
encoder = Model(input_img, encoded)
# Create a placeholder for an encoded (encoding_dim-dimensional) input
encoded_input = Input(shape=(encoding_dim,))
# Reuse the last three layers of the autoencoder, i.e. the decoder
deco = autoencoder.layers[-3](encoded_input)
deco = autoencoder.layers[-2](deco)
deco = autoencoder.layers[-1](deco)
# Create the decoder model
decoder = Model(encoded_input, deco)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
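The encoder half applies SELU after each Dense layer. SELU is `scale * x` for `x > 0` and `scale * alpha * (exp(x) - 1)` otherwise, with fixed constants alpha ≈ 1.6733 and scale ≈ 1.0507. The NumPy sketch below reproduces the encoder's forward pass (784 → 256 → 128 → 6) with random weights standing in for trained ones, only to make the shapes and the activation concrete; it is an illustration, not the Keras model, and it omits the L2 regularizer, which only affects training.

```python
import numpy as np

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    # scale * x for x > 0, scale * alpha * (exp(x) - 1) otherwise
    return SELU_SCALE * np.where(x > 0, x, SELU_ALPHA * (np.exp(x) - 1.0))

def encode(x, weights):
    """Forward pass through Dense layers, each followed by SELU."""
    h = x
    for W, b in weights:
        h = selu(h @ W + b)
    return h

rng = np.random.default_rng(0)
sizes = [784, 256, 128, 6]  # input, two hidden layers, encoding_dim
# Random weights (std 1/sqrt(fan_in)) stand in for trained ones: illustration only
weights = [(rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]

codes = encode(rng.random((5, 784)), weights)  # 5 fake "images" with pixels in [0, 1)
print(codes.shape)  # (5, 6)
```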

In [None]:
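#### Train the autoencoder on the 69,000 unlabeled images in x_test (images only; labels are not used).
# validation_data is the same array, so val_loss simply tracks the training loss.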
autoencoder.fit(x_test, x_test,
                epochs=500,
                batch_size=2000,
                shuffle=True,
                verbose=2,
                validation_data=(x_test, x_test))
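The autoencoder is compiled with `binary_crossentropy`, which treats each of the 784 sigmoid outputs as the probability of a Bernoulli "pixel". A simplified NumPy version of that per-pixel loss (clipping added to avoid `log(0)`; not Keras's exact implementation):

```python
import numpy as np

def binary_crossentropy(x_true, x_recon, eps=1e-7):
    # Per-pixel Bernoulli cross-entropy, averaged over pixels, then over images.
    p = np.clip(x_recon, eps, 1.0 - eps)
    per_pixel = -(x_true * np.log(p) + (1.0 - x_true) * np.log(1.0 - p))
    return float(per_pixel.mean(axis=-1).mean())

x = np.array([[0.0, 1.0, 0.5]])
bce = binary_crossentropy(x, np.full_like(x, 0.5))
print(bce)  # log(2), about 0.6931: predicting 0.5 costs the same for any target
```

Because MNIST pixels are grayscale rather than strictly 0 or 1, this loss cannot reach zero even for a perfect reconstruction: for a pixel of intensity `t`, its minimum (at prediction `t`) is the binary entropy of `t`.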

In [None]:
#### Use the encoder part of the autoencoder to compress the images for supervised training
x_train_en = encoder.predict(x_train)
x_test_en = encoder.predict(x_test)
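An optional sanity check that is not part of the original notebook: before training the classifier, a nearest-centroid rule (one mean code per digit) gives a rough idea of how well the 6-dimensional codes separate the classes. `nearest_centroid_accuracy` is a hypothetical helper, and the demo uses synthetic clusters rather than real codes.

```python
import numpy as np

def nearest_centroid_accuracy(z_train, y_train, z_test, y_test):
    """Fit one centroid per class on the training codes and score the test codes.
    Labels may be integer class ids or one-hot rows."""
    if y_train.ndim == 2:
        y_train = y_train.argmax(axis=1)
    if y_test.ndim == 2:
        y_test = y_test.argmax(axis=1)
    classes = np.unique(y_train)
    centroids = np.stack([z_train[y_train == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every test code to every centroid
    d = ((z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    pred = classes[d.argmin(axis=1)]
    return float((pred == y_test).mean())

# Synthetic, well-separated 6-d clusters for illustration
rng = np.random.default_rng(0)
centers = rng.normal(0.0, 10.0, size=(3, 6))
y = np.repeat(np.arange(3), 50)
z = centers[y] + rng.normal(0.0, 0.1, size=(150, 6))
acc = nearest_centroid_accuracy(z, y, z, y)
print(acc)
```

On the real codes it would be called as `nearest_centroid_accuracy(x_train_en, y_train, x_test_en, y_test)`; the helper accepts the notebook's one-hot labels directly.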

In [None]:
#### Construct the classifier for the autoencoder-pretrained model (same hidden layers as the baseline)
input_encoded_img = Input(shape=(encoding_dim,))
d = Dense(20, activation='relu')(input_encoded_img)
d = Dense(10, activation='relu')(d)
output = Dense(10, activation='softmax', kernel_regularizer=l2(0.01))(d)

classifier = Model(input_encoded_img, output)
classifier.compile(optimizer='adam', loss='categorical_crossentropy',
                 metrics=['categorical_accuracy'])
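One plausible contributor to the reduced overfitting is the much smaller number of parameters the classifier has to fit from the 1,000 labeled images, since the encoder weights are fixed once `encoder.predict` has produced the codes. A quick count for the two Dense stacks, assuming `encoding_dim = 6` as above (`dense_params` is a hypothetical helper):

```python
def dense_params(layer_sizes):
    """Trainable parameters (weights + biases) of a stack of Dense layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

baseline_params = dense_params([784, 20, 10, 10])   # raw pixels as input
classifier_params = dense_params([6, 20, 10, 10])   # 6-d codes as input
print(baseline_params, classifier_params)           # 16020 460
```

Almost all of the baseline's parameters sit in the first 784 × 20 layer, which is exactly the part the pretrained encoder replaces.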

In [None]:
#### Train the model
history_pretrained = classifier.fit(x_train_en, y_train,
                                    epochs=1000,
                                    batch_size=100,
                                    shuffle=True,
                                    verbose=2,
                                    validation_data=(x_test_en, y_test))


In [None]:
#### Check the score
score = classifier.evaluate(x_test_en, y_test)
print('Keras test accuracy:', score[1])

In [None]:
#### Visualize the training history
plt.plot(history_baseline.history['val_categorical_accuracy'])
plt.plot(history_pretrained.history['val_categorical_accuracy'])
plt.title('Validation accuracy during training')
plt.ylabel('Validation categorical accuracy')
plt.xlabel('Epoch')
plt.legend(['baseline', 'autoencoder-pretrained'], loc='lower right')
plt.show()
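Over 1,000 epochs the validation curves are noisy, so a single final-epoch number can be misleading. A small sketch that smooths a curve with a trailing moving average and reports the best epoch; `moving_average` and `best_epoch` are hypothetical helpers, and the demo values are toy numbers. Note that picking the best epoch on the same data used for evaluation gives an optimistic estimate.

```python
import numpy as np

def moving_average(values, window=20):
    """Trailing-window mean; returns len(values) - window + 1 points."""
    v = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(v, kernel, mode="valid")

def best_epoch(values):
    """1-based epoch with the highest value, and that value."""
    i = int(np.argmax(values))
    return i + 1, float(values[i])

demo_curve = [0.1, 0.4, 0.3, 0.5, 0.45]  # toy values for illustration
print(moving_average(demo_curve, window=2))
print(best_epoch(demo_curve))  # (4, 0.5)
```

For example, `plt.plot(moving_average(history_baseline.history['val_categorical_accuracy']))` would draw a smoothed version of the baseline curve.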