# MLP (MNIST, TensorFlow)
In this tutorial, we will use the MNIST dataset to practice building a Multi-Layer Perceptron (MLP) with TensorFlow.

You will learn three concepts from this tutorial:  
1. TensorFlow MLP implementation  
2. dropout  
3. early stopping  

In [6]:
import tensorflow as tf
import numpy as np

# Collect MNIST Data

In [7]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

In [8]:
print(x_train.shape)
print(x_test.shape)

(60000, 28, 28)
(10000, 28, 28)


![Image of Perceptron](https://raw.githubusercontent.com/minsuk-heo/deeplearning/master/img/MLP_XOR.png)


The training data has **60000** samples  
The test data has **10000** samples  
Each sample is a **28 * 28** pixel image  

![MNIST sample](https://raw.githubusercontent.com/minsuk-heo/deeplearning/master/img/mnist_sample.png)


# Split train data into train and validation data
Validating during training has two advantages:  
1) we can check whether training is going well based on the validation score  
2) we can apply **early stopping** when the validation score stops improving while the training score keeps going up (a sign of **overfitting**)
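
The early stopping idea in point 2 can be sketched in plain Python. This is a generic patience-based variant, not the exact rule used in the training loop later in this tutorial; the function name, `patience`, and `min_delta` are hypothetical names chosen for illustration, and the scores are made-up numbers.

```python
def early_stopping_epoch(val_scores, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_score) for a list of per-epoch validation scores.

    Training "stops" once the score has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    best_score = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores):
        if score > best_score + min_delta:
            # New best score: remember it and reset the counter
            best_score = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_score
    # Never triggered: ran through every epoch
    return len(val_scores) - 1, best_score


# Made-up validation accuracies: improvement stalls after epoch 2
scores = [0.80, 0.85, 0.88, 0.879, 0.878, 0.877, 0.90]
stop_epoch, best = early_stopping_epoch(scores, patience=3)
print(stop_epoch, best)  # 5 0.88
```

Note that the late jump to 0.90 is never seen, which is the usual trade-off of stopping early: a larger `patience` waits longer before giving up.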

In [4]:
x_val  = x_train[50000:60000]
x_train = x_train[0:50000]
y_val  = y_train[50000:60000]
y_train = y_train[0:50000]
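
The same hold-out split can be written as a small helper. This is a minimal sketch on toy arrays; `hold_out_last` is a hypothetical name, and like the cell above it does not shuffle the data before splitting.

```python
import numpy as np


def hold_out_last(x, y, n_val):
    """Split off the last n_val samples as a validation set (no shuffling)."""
    return (x[:-n_val], y[:-n_val]), (x[-n_val:], y[-n_val:])


# Toy data: 6 samples with 2 features each, plus one label per sample
x = np.arange(12).reshape(6, 2)
y = np.arange(6)

(x_tr, y_tr), (x_va, y_va) = hold_out_last(x, y, n_val=2)
print(x_tr.shape, x_va.shape)  # (4, 2) (2, 2)
```

Slicing images and labels with the same indices keeps each image paired with its label.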

In [5]:
print("train data has " + str(x_train.shape[0]) + " samples")
print("every train data is " + str(x_train.shape[1]) 
      + " * " + str(x_train.shape[2]) + " image")

train data has 50000 samples
every train data is 28 * 28 image


In [6]:
print("validation data has " + str(x_val.shape[0]) + " samples")
print("every validation data is " + str(x_val.shape[1]) 
      + " * " + str(x_val.shape[2]) + " image")

validation data has 10000 samples
every validation data is 28 * 28 image


Each of the 28 * 28 pixels holds a grayscale value from **0** to **255**.

In [7]:
# print row 8 of the first training image to show its grayscale values
print(x_train[0][8])

[  0   0   0   0   0   0   0  18 219 253 253 253 253 253 198 182 247 241
   0   0   0   0   0   0   0   0   0   0]


Each training sample has a label from **0** to **9**.

In [8]:
# show the labels of the first 9 training samples
print(y_train[0:9])

[5 0 4 1 9 2 1 3 1]


The test data has **10000** samples  
Each test sample is a **28 * 28** image  

In [9]:
print("test data has " + str(x_test.shape[0]) + " samples")
print("every test data is " + str(x_test.shape[1]) 
      + " * " + str(x_test.shape[2]) + " image")

test data has 10000 samples
every test data is 28 * 28 image


# Reshape
To fully connect every pixel to the hidden layer,  
we reshape each (28, 28) image into a vector of 28 x 28 = 784 values.  
In other words, we flatten the row x column grid into a one-dimensional array with 784 items.

In [10]:
x_train = x_train.reshape(50000, 784)
x_val = x_val.reshape(10000, 784)
x_test = x_test.reshape(10000, 784)

print(x_train.shape)
print(x_test.shape)

(50000, 784)
(10000, 784)
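
To see what flattening does to individual pixels, here is a minimal sketch on three fake images (the array contents are made up; only the shapes match MNIST). Passing `-1` lets NumPy infer the 784 from the remaining dimensions.

```python
import numpy as np

# Three fake 28 x 28 "images" filled with consecutive integers
images = np.arange(3 * 28 * 28).reshape(3, 28, 28)

# Keep the sample axis, flatten everything else
flat = images.reshape(images.shape[0], -1)
print(flat.shape)  # (3, 784)

# Row-major order: the pixel at (row r, column c) lands at index r * 28 + c
r, c = 8, 7
print(flat[1, r * 28 + c] == images[1, r, c])  # True
```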


# Normalize data
Normalization helps optimization converge faster: dividing by 255 scales every pixel value into the range 0 to 1.

In [11]:
x_train = x_train.astype('float32')
x_val = x_val.astype('float32')
x_test = x_test.astype('float32')

gray_scale = 255
x_train /= gray_scale
x_val /= gray_scale
x_test /= gray_scale
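
A quick check of the scaling on a handful of made-up pixel values. The cast to `float32` comes first because MNIST pixels are stored as 8-bit integers, which cannot hold fractional values.

```python
import numpy as np

# A few grayscale values in the raw 0..255 range, stored as 8-bit integers
pixels = np.array([[0, 18, 219, 253, 255]], dtype=np.uint8)

# Cast to float first, then scale into the range 0..1
scaled = pixels.astype("float32") / 255
print(scaled.dtype, scaled.min(), scaled.max())
```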

# Convert labels to one-hot encoding

In [12]:
num_classes = 10
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_val = tf.keras.utils.to_categorical(y_val, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)

In [13]:
y_train

array([[0., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.]])
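
For intuition, one-hot encoding can be reproduced with plain NumPy. This is an illustrative sketch, not how `to_categorical` is implemented: each label selects the matching row of a 10 x 10 identity matrix.

```python
import numpy as np

labels = np.array([5, 0, 4, 1, 9])  # first five training labels shown earlier
num_classes = 10

# Row k of the identity matrix is the one-hot vector for class k
one_hot = np.eye(num_classes, dtype="float32")[labels]
print(one_hot[0])  # a single 1 at index 5

# argmax recovers the original integer labels
print(one_hot.argmax(axis=1))
```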

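# TensorFlow MLP Graph
The cells below use the TensorFlow 1.x graph API (`tf.placeholder`, `tf.Session`). Under TensorFlow 2.x the same calls are available through `tf.compat.v1` after calling `tf.compat.v1.disable_eager_execution()`.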

In [14]:
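# input images, flattened to 784 pixels (None allows any batch size)
x = tf.placeholder(tf.float32, [None, 784])
# one-hot labels for the 10 digit classes
y = tf.placeholder(tf.float32, [None, 10])
# probability of keeping a unit in dropout, fed at run time
keep_prob = tf.placeholder(tf.float32)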

In [15]:
def mlp(x):
    # hidden layer1
    w1 = tf.Variable(tf.random_uniform([784,256]))
    b1 = tf.Variable(tf.zeros([256]))
    h1 = tf.nn.relu(tf.matmul(x, w1) + b1)
    # hidden layer2
    w2 = tf.Variable(tf.random_uniform([256,128]))
    b2 = tf.Variable(tf.zeros([128]))
    h2 = tf.nn.relu(tf.matmul(h1, w2) + b2)
    # dropout: keep each unit with probability keep_prob
    h2_drop = tf.nn.dropout(h2, keep_prob)
    # output layer
    w3 = tf.Variable(tf.random_uniform([128,10]))
    b3 = tf.Variable(tf.zeros([10]))
    logits= tf.matmul(h2_drop, w3) + b3
    
    return logits
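
The dropout step in `mlp` can be illustrated with NumPy. This is a minimal sketch of inverted dropout, which matches the documented behaviour of `tf.nn.dropout` with `keep_prob` (surviving units are scaled by `1 / keep_prob`); the helper name and the toy activations are assumptions for illustration.

```python
import numpy as np


def dropout(activations, keep_prob, rng):
    """Zero each unit with probability 1 - keep_prob, scale survivors by 1 / keep_prob."""
    if keep_prob >= 1.0:
        return activations  # evaluation mode: nothing is dropped
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob


rng = np.random.default_rng(0)
h = np.ones((1000, 128), dtype="float32")  # toy hidden-layer activations

dropped = dropout(h, keep_prob=0.9, rng=rng)
print((dropped == 0).mean())  # fraction of dropped units, close to 0.1
print(dropped.mean())         # close to 1.0 thanks to the rescaling
```

Because of the rescaling, the expected activation is the same with and without dropout, so `keep_prob` can simply be set to 1.0 at validation and test time.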

In [16]:
logits = mlp(x)

In [17]:
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=logits, labels=y))
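
For reference, the quantity computed by the loss op above can be sketched in NumPy: apply softmax to the logits, take the cross-entropy against the one-hot labels, then average over the batch. The function name and the toy inputs are assumptions for illustration.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between one-hot labels and softmax(logits)."""
    # Subtract the row maximum so exp() cannot overflow
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # The one-hot labels pick out the log-probability of the true class
    return float(-(labels * log_probs).sum(axis=1).mean())


toy_logits = np.array([[2.0, 1.0, 0.1],
                       [0.0, 0.0, 0.0]])
toy_labels = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
loss = softmax_cross_entropy(toy_logits, toy_labels)
print(loss)
```

The second row has equal logits, so its softmax is uniform and its contribution to the loss is log(3).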

In [18]:
train_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss_op)

In [19]:
# Build the accuracy op once, so the graph does not grow every epoch
preds = tf.nn.softmax(logits)  # Apply softmax to logits
correct_prediction = tf.equal(tf.argmax(preds, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

batch_size = 1000
num_batches = x_train.shape[0] // batch_size  # 50 batches per epoch

# initialize
init = tf.global_variables_initializer()
# Start training
with tf.Session() as sess:
    # Run the initializer
    sess.run(init)
    max_val_acc = 0.0
    for epoch in range(100):
        avg_loss = 0.
        for i in range(num_batches):
            start = i * batch_size
            end = start + batch_size
            # keep_prob 0.9: dropout is active while training
            _, loss = sess.run([train_op, loss_op],
                               feed_dict={x: x_train[start: end], y: y_train[start: end],
                                          keep_prob: 0.9})
            # Compute average loss
            avg_loss += loss / num_batches

        # Validate model with dropout disabled (keep_prob 1.0)
        cur_val_acc = accuracy.eval({x: x_val, y: y_val, keep_prob: 1.0})
        print("epoch: "+str(epoch)+", validation accuracy: "
              + str(cur_val_acc) +', loss: '+str(avg_loss))
        # Early stopping: after epoch 50, stop as soon as validation accuracy
        # fails to beat the best value so far by at least 0.0005
        if epoch > 50 and cur_val_acc < max_val_acc + 0.0005:
            print("early stopped on "+str(epoch))
            break
        else:
            if max_val_acc < cur_val_acc:
                max_val_acc = cur_val_acc

    # Test model with dropout disabled
    print("[Test Accuracy] :", accuracy.eval({x: x_test, y: y_test, keep_prob: 1.0}))

epoch: 0, validation accuracy: 0.2212, loss: 9524.407270507818
epoch: 1, validation accuracy: 0.6217, loss: 517.2178720712662
epoch: 2, validation accuracy: 0.6637, loss: 2.4575348401069643
epoch: 3, validation accuracy: 0.6909, loss: 1.6795690083503723
epoch: 4, validation accuracy: 0.7216, loss: 1.446189036369324
epoch: 5, validation accuracy: 0.751, loss: 1.2856665849685671
epoch: 6, validation accuracy: 0.7795, loss: 1.1384457182884216
epoch: 7, validation accuracy: 0.7978, loss: 1.052964658737183
epoch: 8, validation accuracy: 0.8125, loss: 0.9662364649772648
epoch: 9, validation accuracy: 0.8282, loss: 0.8758240914344784
epoch: 10, validation accuracy: 0.8375, loss: 0.8203610956668854
epoch: 11, validation accuracy: 0.8478, loss: 0.7551597130298612
epoch: 12, validation accuracy: 0.8563, loss: 0.7112305164337158
epoch: 13, validation accuracy: 0.862, loss: 0.659562692642212
epoch: 14, validation accuracy: 0.8713, loss: 0.6162692201137541
epoch: 15, validation accuracy: 0.8753, lo