# MLP (MNIST, Tensorflow)
In this tutorial, we use the MNIST dataset to practice building a multilayer perceptron (MLP) with TensorFlow. The code uses the TensorFlow 1.x graph API (`tf.placeholder`, `tf.Session`).

In [None]:
import tensorflow as tf
import numpy as np
from IPython.display import Image

# MLP Architecture
Here is an overview of the MLP architecture we will implement with TensorFlow.

In [None]:
Image(url= "https://raw.githubusercontent.com/minsuk-heo/deeplearning/master/img/simple_mlp_mnist.png", width=500, height=250)

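- There is 1 input layer, 2 hidden layers, and 1 output layer, and the number of nodes differs per layer
  - Each node has one bias
  - 'Dropout', which leaves some nodes unused in some layers, can be applied
- The output of the last computed layer, passed through an activation, is the prediction
  - The prediction is compared with the label value.
  - Correcting the weights based on that comparison is 'backpropagation'
    - The method used for the correction is 'gradient descent'
    - An 'optimizer' is used to avoid local minimum issues
- input data: (70000, 28, 28), which is the dataset shape and not the same thing as the input layer
  - 28 * 28 is reshaped to 784
- label: 10 classes
  - one-hot encoding is applied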
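
To make the layer and bias notes above concrete, here is a small sketch that counts the trainable parameters of a fully connected network. The layer sizes (784, 256, 128, 10) are taken from the `mlp()` function defined later in this notebook; `count_parameters` is a hypothetical helper name, not part of TensorFlow.

```python
# Layer sizes assumed from the mlp() definition later in this notebook.
layer_sizes = [784, 256, 128, 10]

def count_parameters(sizes):
    """Return (weights, biases) totals for a fully connected network."""
    # One weight per connection between consecutive layers.
    weights = sum(n_in * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))
    # One bias per node in every non-input layer.
    biases = sum(sizes[1:])
    return weights, biases

weights, biases = count_parameters(layer_sizes)
print("weights:", weights)
print("biases:", biases)
print("total:", weights + biases)
```

Almost all parameters sit in the first weight matrix (784 x 256), which is typical when the input is a flattened image.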

# Collect MNIST Data

In [None]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

In [None]:
print(x_train.shape)
print(x_test.shape)

(60000, 28, 28)
(10000, 28, 28)


- Of the 70,000 samples, 10,000 are held out as test data
- Each sample consists of 28x28 pixels

The training data has **60000** samples  
The test data has **10000** samples  
Each sample is **28 * 28** pixels  

The image below shows a 28*28 pixel sample of the handwritten digit '0' from MNIST.  
MNIST images of handwritten digits are grayscale, with pixel values from 0 to 255.

![0 from MNIST](https://raw.githubusercontent.com/minsuk-heo/deeplearning/master/img/mnist_sample.png)

# Split train data into train and validation data
Validating during training has the following advantages:  
1) it shows whether training is going well, based on the validation score  
2) it lets us apply **early stopping** when the validation score stops improving while the training score keeps going up (to counter **overfitting**)
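
The early stopping idea can be sketched in plain Python as follows. This is only an illustration under assumptions: the function name `early_stopping_epoch`, the `patience` parameter, and the validation scores are made up, and the training loop later in this notebook does not actually apply early stopping.

```python
def early_stopping_epoch(val_scores, patience=3):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation score has failed to beat its best
    value for `patience` consecutive epochs. If that never happens, the
    index of the last epoch is returned.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores):
        if score > best:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_scores) - 1

# Made-up validation accuracies, for illustration only.
scores = [0.70, 0.80, 0.85, 0.84, 0.85, 0.83, 0.86]
print(early_stopping_epoch(scores, patience=3))
```

With these scores the best value (0.85) is reached at epoch 2 and not beaten in the next three epochs, so training would stop at epoch 5 and never see the later 0.86. That trade-off is why `patience` is usually tuned.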

In [None]:
x_val  = x_train[50000:60000]
x_train = x_train[0:50000]
y_val  = y_train[50000:60000]
y_train = y_train[0:50000]

In [None]:
print("train data has " + str(x_train.shape[0]) + " samples")
print("every train data is " + str(x_train.shape[1]) 
      + " * " + str(x_train.shape[2]) + " image")

train data has 50000 samples
every train data is 28 * 28 image


In [None]:
print("validation data has " + str(x_val.shape[0]) + " samples")
print("every validation data is " + str(x_val.shape[1]) 
      + " * " + str(x_val.shape[2]) + " image")

validation data has 10000 samples
every validation data is 28 * 28 image


Each of the 28 * 28 pixels has a grayscale value from **0** to **255**.

In [None]:
# sample to show gray scale values
print(x_train[0][8])

[  0   0   0   0   0   0   0  18 219 253 253 253 253 253 198 182 247 241
   0   0   0   0   0   0   0   0   0   0]


Each training sample has a label from **0** to **9**.

In [None]:
# sample to show labels for the first 9 training samples
print(y_train[0:9])

[5 0 4 1 9 2 1 3 1]


The test data has **10000** samples  
Each test sample is a **28 * 28** image  

In [None]:
print("test data has " + str(x_test.shape[0]) + " samples")
print("every test data is " + str(x_test.shape[1]) 
      + " * " + str(x_test.shape[2]) + " image")

test data has 10000 samples
every test data is 28 * 28 image


# Reshape
To fully connect every pixel to the hidden layer,  
we reshape each (28, 28) image into a flat vector of shape (784,).  
That is, we flatten the row x column grid into an array of 28x28 (784) items.

In [None]:
Image(url= "https://raw.githubusercontent.com/minsuk-heo/deeplearning/master/img/reshape_mnist.png", width=500, height=250)

In [None]:
x_train = x_train.reshape(50000, 784)
x_val = x_val.reshape(10000, 784)
x_test = x_test.reshape(10000, 784)

print(x_train.shape)
print(x_test.shape)

(50000, 784)
(10000, 784)


- Each 28*28 image is reshaped into 784 values
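
The same flattening can be tried on a tiny array to see how rows are laid out. This is a sketch with made-up 2x3 "images" standing in for the 28x28 ones; using `-1` for the first dimension is an alternative to hard-coding the sample count as the cell above does.

```python
import numpy as np

# Two tiny 2x3 "images" stand in for MNIST's 28x28 ones (illustrative data).
images = np.arange(12).reshape(2, 2, 3)

# -1 lets NumPy infer the sample count; each image becomes one flat row.
flat = images.reshape(-1, 2 * 3)

print(flat.shape)
print(flat[0])  # rows of the first image laid end to end
```

Because NumPy flattens in row-major order, pixel `(r, c)` of a 28x28 image ends up at index `r * 28 + c` in the 784-element vector.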

In [None]:
x_train[0]

array([  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   3,  18,  18,  18,
       126, 136, 175,  26, 166, 255, 247, 127,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,  30,  36,  94, 154, 17

# Normalize data
Normalization usually speeds up learning and improves performance  
by reducing variance and giving all input features the same range.  
Since every MNIST input already ranges from 0 to 255, normalization here only helps by reducing variance.  
In my experiments, normalization worked better than standardization for MNIST with this MLP architecture.  
I believe this is because ReLU treats 0 differently in both the feed-forward pass and backpropagation.  
Treating 0 differently matters for MNIST, since 1-255 means there is some handwriting on that pixel,  
while 0 means there is none.

In [None]:
x_train = x_train.astype('float32')
x_val = x_val.astype('float32')
x_test = x_test.astype('float32')

gray_scale = 255
x_train /= gray_scale
x_val /= gray_scale
x_test /= gray_scale

- Dividing by 255 is the normalization step
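
To illustrate the difference between the two scalings discussed above, here is a small sketch on made-up pixel values. It only shows that dividing by 255 keeps empty pixels at exactly 0 while standardization shifts them to a negative value. It does not reproduce the accuracy comparison mentioned in the text.

```python
import numpy as np

pixels = np.array([0, 0, 128, 255], dtype=np.float32)  # illustrative values

# Normalization as used in this notebook: scale into [0, 1].
scaled = pixels / 255.0

# Standardization: zero mean and unit variance.
standardized = (pixels - pixels.mean()) / pixels.std()

print(scaled)
print(standardized)
```

After standardization the background pixels are no longer 0, so a zero input no longer means "no ink on this pixel".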

# Convert labels to one-hot encoding

In [None]:
num_classes = 10
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_val = tf.keras.utils.to_categorical(y_val, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)

In [None]:
y_train

array([[0., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.]])

- The y values (0-9) are one-hot encoded
  - There is no ordinal relationship between the digit classes, which is why they are encoded this way
- one-hot
  - 0: 1 0 0 0 0 0 0 0 0 0
  - 1: 0 1 0 0 0 0 0 0 0 0
  - 2: 0 0 1 0 0 0 0 0 0 0
# Tensorflow MLP Graph
Let's implement the MLP graph with Tensorflow

In [None]:
Image(url= "https://raw.githubusercontent.com/minsuk-heo/deeplearning/master/img/simple_mlp_mnist.png", width=500, height=250)

In [None]:
x = tf.placeholder(tf.float32, [None, 784]) # accepts any number of samples (None), each with 784 columns
y = tf.placeholder(tf.float32, [None, 10])

In [None]:
def mlp(x):
    # hidden layer1
    w1 = tf.Variable(tf.random_uniform([784,256])) # takes 784 inputs and processes them with 256 nodes
    b1 = tf.Variable(tf.zeros([256])) # there is one bias per node
    h1 = tf.nn.relu(tf.matmul(x, w1) + b1) # relu is the activation function
    # hidden layer2
    w2 = tf.Variable(tf.random_uniform([256,128])) # takes 256 inputs and processes them with 128 nodes
    b2 = tf.Variable(tf.zeros([128]))
    h2 = tf.nn.relu(tf.matmul(h1, w2) + b2)
    # output layer
    w3 = tf.Variable(tf.random_uniform([128,10]))
    b3 = tf.Variable(tf.zeros([10]))
    logits = tf.matmul(h2, w3) + b3

    return logits

In [None]:
logits = mlp(x)
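
To see how the shapes flow through the network, here is a NumPy sketch of the same forward pass. It assumes the layer sizes from `mlp()` above and draws weights uniformly from [0, 1), which matches the default range of `tf.random_uniform` for floats. The names `relu` and `mlp_forward` and the fake input batch are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def mlp_forward(x, sizes=(784, 256, 128, 10)):
    """Forward pass with freshly drawn uniform weights and zero biases."""
    h = x
    n_layers = len(sizes) - 1
    for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        w = rng.uniform(0.0, 1.0, size=(n_in, n_out))
        b = np.zeros(n_out)
        z = h @ w + b
        # ReLU on the hidden layers, no activation on the output layer.
        h = relu(z) if i < n_layers - 1 else z
        print("layer", i + 1, "output shape:", h.shape)
    return h

batch = rng.uniform(0.0, 1.0, size=(5, 784))  # 5 fake flattened images
out = mlp_forward(batch)
print(out.shape)
```

Each layer maps `(batch, n_in)` to `(batch, n_out)`, so a batch of 5 images ends with a `(5, 10)` matrix of logits, one row per image and one column per digit class.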

In [None]:
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=logits, labels=y)) # loss function
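
For a single sample, softmax cross-entropy can be sketched in plain Python as below. This is a simplified illustration of the quantity being averaged, not TensorFlow's implementation. The function name and the example logits are made up.

```python
import math

def softmax_cross_entropy(logits, one_hot):
    """Cross-entropy between softmax(logits) and a one-hot label, for one sample."""
    m = max(logits)  # subtracting the max keeps exp() from overflowing
    log_sum = math.log(sum(math.exp(v - m) for v in logits))
    log_probs = [v - m - log_sum for v in logits]  # log-softmax
    return -sum(t * lp for t, lp in zip(one_hot, log_probs))

print(softmax_cross_entropy([2.0, 1.0, 0.1], [1, 0, 0]))
print(softmax_cross_entropy([0.0, 0.0, 0.0, 0.0], [0, 1, 0, 0]))  # equals log(4)
```

When all logits are equal the model is maximally unsure, and the loss equals the log of the number of classes. For the 10 MNIST classes that baseline would be log(10), roughly 2.3.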

In [None]:
train_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss_op) # optimizer
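
As a rough picture of what an Adam step does, here is a scalar sketch of the standard Adam update rule applied to a toy quadratic. It is a minimal illustration under assumptions: the function name, the toy objective, and the hyperparameter values are chosen for the example, and TensorFlow's `AdamOptimizer` handles whole tensors and differs in implementation detail.

```python
import math

def adam_minimize(grad_fn, w, learning_rate=0.1, steps=1000,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a scalar function with the Adam update rule, given its gradient."""
    m = 0.0  # running mean of gradients
    v = 0.0  # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return w

# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_final = adam_minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
print(w_final)
```

The division by the square root of `v_hat` rescales each step by the recent gradient magnitude, so the step size stays near `learning_rate` even when raw gradients are very large or very small.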

The code below runs the actual training.
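
The training loop walks through the training set in fixed-size slices. The index arithmetic can be sketched on its own as follows. `batch_bounds` is a hypothetical helper, and the numbers assume the 50,000 training samples and batch size of 1000 used in this notebook.

```python
def batch_bounds(n_samples, batch_size):
    """Yield (start, end) index pairs covering full batches only."""
    for start in range(0, n_samples - batch_size + 1, batch_size):
        yield start, start + batch_size

bounds = list(batch_bounds(50000, 1000))
print(len(bounds))  # number of iterations per epoch
print(bounds[0])
print(bounds[-1])
```

Like `len(x_train) // batch_size` in the training cell, this drops any leftover samples that do not fill a whole batch. With 50,000 samples and a batch size of 1000 there is no remainder.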

In [None]:
# initialize: in TF1, variables must be initialized before they are used
init = tf.global_variables_initializer()

# train hyperparameters
epoch_cnt = 30 # train for 30 epochs in total
batch_size = 1000 # 50,000 samples in batches of 1000 -> 50 iterations (batches) per epoch
iteration = len(x_train) // batch_size

# Accuracy ops, built once so the graph does not grow on every epoch
preds = tf.nn.softmax(logits)  # Apply softmax to logits
correct_prediction = tf.equal(tf.argmax(preds, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

# Start training
with tf.Session() as sess:
    # Run the initializer
    sess.run(init) # assigns the initial values to the variables
    for epoch in range(epoch_cnt):
        avg_loss = 0.
        start = 0; end = batch_size

        for i in range(iteration):
            _, loss = sess.run([train_op, loss_op], 
                               feed_dict={x: x_train[start: end], y: y_train[start: end]})
            start += batch_size; end += batch_size
            # Compute average loss
            avg_loss += loss / iteration

        # Validate model
        cur_val_acc = accuracy.eval({x: x_val, y: y_val})
        print("epoch: "+str(epoch)+", validation accuracy: " 
              + str(cur_val_acc) +', loss: '+str(avg_loss))

    # Test model
    print("[Test Accuracy] :", accuracy.eval({x: x_test, y: y_test}))

epoch: 0, validation accuracy: 0.1064, loss: 9479.23924316406
epoch: 1, validation accuracy: 0.7404, loss: 487.9066563034058
epoch: 2, validation accuracy: 0.8683, loss: 20.24389074325562
epoch: 3, validation accuracy: 0.8761, loss: 11.892984285354613
epoch: 4, validation accuracy: 0.8858, loss: 9.276838760375973
epoch: 5, validation accuracy: 0.8785, loss: 8.25918293952942
epoch: 6, validation accuracy: 0.8832, loss: 7.402374687194823
epoch: 7, validation accuracy: 0.906, loss: 6.622725062370303
epoch: 8, validation accuracy: 0.9034, loss: 5.537717547416686
epoch: 9, validation accuracy: 0.8971, loss: 4.807866144180299
epoch: 10, validation accuracy: 0.8396, loss: 7.349521398544313
epoch: 11, validation accuracy: 0.9066, loss: 6.607095966339113
epoch: 12, validation accuracy: 0.8217, loss: 52.06143003463745
epoch: 13, validation accuracy: 0.8922, loss: 15.170511302948
epoch: 14, validation accuracy: 0.9016, loss: 6.205790314674376
epoch: 15, validation accuracy: 0.9036, loss: 4.978821