# 10. Improve Neural Network Performance
- The techniques for improving a neural network's training results are summarized below.
  - Use ReLU as the activation function for the input and hidden layers, and sigmoid for the output layer.

  - Initialize the weights with Xavier initialization.

  - Apply dropout to the layers during training.

  - Use an ensemble: train several independent models and combine their results.

  - Experiment with different network architectures.

## 1. Better activation function

### 1.1. Activation function
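- A neural network is built by connecting many individual networks (units) together. They are usually grouped into an input layer, hidden layers, and an output layer.
- The hypothesis of each unit passes through a function such as sigmoid, which activates the unit once its input exceeds a certain value. This is why, in neural networks, it is called an activation function.

- Drawn as a block diagram, the activation function sits at the output stage of each unit, as in the figure below.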

<img src="https://s3-ap-south-1.amazonaws.com/av-blog-media/wp-content/uploads/2017/05/20090158/node-output-300x195.png" alt="" title="" />


### 1.2. Vanishing gradient problem
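- Learning does not improve in proportion to the depth of the network. With sigmoid activations, layers more than about three steps back from the output layer barely learn.

- Chapter 5 introduced sigmoid as an activation function that outputs values between 0.0 and 1.0, and Chapter 9 explained that backpropagation learns by differentiating backward from the output end. The diagram below shows the backpropagation process for an example neural network.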

In [None]:
'''
                   x
-(S)-(Net N-2)-(S)-----(Net N-1)--------\
                  ∂f/∂x = ∂f/∂g * ∂g/∂x  \     
                        = ∂f/∂g * y       \  g                       a
                        = ∂f/∂g * 0.01    (*)---(S)-----(Net N)-(S)------\
                                          / ∂f/∂g                   ∂f/∂a \
                                         /                                 \
                    y = 0.01            /                                  (+)--- f
-(S)--(Net N-2)-(S)--------------------/                              b    /
                   ∂f/∂y = ∂f/∂g * ∂g/∂y                          --------/
                                                                    ∂f/∂b 
'''

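- In the network above, if the output of Net N-2 is -10, the sigmoid curve maps it to an output y close to 0 (the diagram uses 0.01 for illustration).

- By the chain rule, ∂f/∂x = ∂f/∂g * ∂g/∂x = ∂f/∂g * 0.01. This value is multiplied again as differentiation continues back through the earlier networks. Likewise, the sigmoid curve of Net N-3 can push the value corresponding to y close to 0, so as the network gets deeper the gradient reaching x accumulates factors such as "0.01 * 0.02 * ... * 0.001" and approaches 0.

- In conclusion, the deeper the network, the closer to 0 the influence of the input-layer value x on the output f becomes, so learning does not scale with depth. As a rule of thumb, a sigmoid network about four or more layers deep hardly learns at all.

- This disappearance of the gradient is called the vanishing gradient problem.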

<img src="http://www.birc.co.kr/wp-content/uploads/2018/01/vanishing_gradient-1024x386.png" alt="" title="" />
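
The shrinking product described above can be checked numerically. The following is a minimal sketch, not part of the original lab code: it assumes a chain of scalar sigmoid units with every weight fixed at 1.0 and no bias, and the helper names `sigmoid_grad` and `chain_gradient` are made up for illustration. It multiplies the local derivatives along the chain, as the chain rule does during backpropagation.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum is 0.25 at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def chain_gradient(x, depth):
    """Gradient of the last activation w.r.t. x for a chain of `depth`
    scalar sigmoid units (assumption: every weight is 1.0, no bias)."""
    grad = 1.0
    a = x
    for _ in range(depth):
        grad *= sigmoid_grad(a)  # chain rule: multiply local derivatives
        a = sigmoid(a)
    return grad


for depth in (1, 2, 4, 8):
    print(depth, chain_gradient(0.0, depth))
```

Because each factor is at most 0.25, the gradient after `depth` layers is bounded by 0.25 raised to `depth`, which is why the early layers receive almost no update.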

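### 1.3. ReLU (Rectified Linear Unit)

- The rectified linear unit was introduced to address the vanishing gradient problem of the sigmoid activation function. As the blue curve below shows, it outputs 0 for inputs of 0 or less and passes positive inputs through unchanged.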

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/6/6c/Rectifier_and_softplus_functions.svg/495px-Rectifier_and_softplus_functions.svg.png" alt="" title="" />

- In a deep neural network, the output layer has to produce a value between 0.0 and 1.0 for classification, so it keeps the sigmoid activation. The remaining input and hidden layers train well with ReLU activation.

- In TensorFlow, replace the sigmoid function with `tf.nn.relu` as follows.
  - `L1 = tf.nn.relu(tf.matmul(X, W1) + b1)`
  - `L2 = tf.nn.relu(tf.matmul(L1, W2) + b2)`
  - ...
  - `L10 = tf.nn.relu(tf.matmul(L9, W10) + b10)`
  - `hypothesis = tf.sigmoid(tf.matmul(L10, W11) + b11)`
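
To see why ReLU helps, the local derivatives of the two activations can be compared directly. This is an illustrative sketch in plain Python rather than TensorFlow; the depth of 10 and the pre-activation value 2.0 are arbitrary choices, and the derivative of ReLU at exactly 0 is taken to be 0 here by convention.

```python
import math


def relu(z):
    """Rectified linear unit: 0 for negative inputs, identity otherwise."""
    return max(0.0, z)


def relu_grad(z):
    """1 for positive inputs, 0 otherwise (0 chosen at z = 0)."""
    return 1.0 if z > 0 else 0.0


def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


# Product of local derivatives across 10 layers for a positive pre-activation.
depth = 10
z = 2.0
relu_product = relu_grad(z) ** depth        # stays exactly 1.0
sigmoid_product = sigmoid_grad(z) ** depth  # shrinks toward 0
print(relu_product, sigmoid_product)
```

For active (positive) units the ReLU factor is 1, so the gradient is not attenuated on its way back. For negative inputs the factor is 0, which is the trade-off that variants such as Leaky ReLU try to soften.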
  
### 1.4. Activation functions
- Besides sigmoid and ReLU, a variety of other activation functions exist, as shown below.

<img src="https://cdn-images-1.medium.com/max/1600/1*DRKBmIlr7JowhSbqL6wngg.png" alt="" title="" />

- A comparison of the activation functions is shown below.

<img src="https://camo.qiitausercontent.com/f0c0b6b782f620311b05441ddda4228cc9b43f90/68747470733a2f2f71696974612d696d6167652d73746f72652e73332e616d617a6f6e6177732e636f6d2f302f3130303532332f62613931616231302d393238642d306464612d616438622d6338326534396562343564622e706e67" alt="" title="" />

## 2. Better weight initialization
- Up to the previous chapter, weights were created from random values as shown below, so the cost values during training differed slightly from run to run. A better-founded initialization method is therefore needed.
 
 W1 = tf.Variable(tf.random_normal([2, 10]), name='weight1')
 

### 2.1. Set all initial weights to zero
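- In the network below, if the weight w is initialized to 0, the gradient with respect to x, ∂f/∂x = ∂f/∂g * w, becomes 0. Every gradient in the earlier networks that produced x is then 0 as well, so no learning takes place. Weights must therefore never all be initialized to 0.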

In [None]:
'''
0        0         x
-(S)-(Net N-2)-(S)-----(Net N-1)--------\
                  ∂f/∂x = ∂f/∂g * ∂g/∂x  \     
                        = ∂f/∂g * w       \  g                       a
                        = ∂f/∂g * 0.00    (*)---(S)-----(Net N)-(S)------\
                                          / ∂f/∂g                   ∂f/∂a \
                                         /                                 \
                    w = 0.0             /                                  (+)--- f
                   --------------------/                              b    /
                                                                  --------/
                                                                    ∂f/∂b 
'''
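
The multiply node in the diagram can be written out as a small backward-pass function. This is a sketch for illustration only; the function name `backward_through_multiply` and the sample values are assumptions, not part of the original material.

```python
def backward_through_multiply(upstream_grad, x, w):
    """Backward pass of g = x * w.

    Returns (df/dx, df/dw) given upstream_grad = df/dg."""
    grad_x = upstream_grad * w  # dg/dx = w
    grad_w = upstream_grad * x  # dg/dw = x
    return grad_x, grad_w


# Zero-initialized weight: nothing flows back to x (or to earlier layers).
print(backward_through_multiply(upstream_grad=1.0, x=0.5, w=0.0))
# Small non-zero weight: the gradient w.r.t. x is no longer zero.
print(backward_through_multiply(upstream_grad=1.0, x=0.5, w=0.1))
```

With `w = 0.0` the first returned value is 0, so whatever sits upstream of x receives no learning signal, which matches the argument above.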

### 2.2. Xavier initialization
- Makes sure the weights are 'just right', not too small, not too big

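- Uses the number of inputs (fan_in) and outputs (fan_out)

- To implement it, plug the number of inputs and the number of outputs into the formula below. The results are remarkably good. The usual explanation is that this scaling keeps the variance of each layer's output close to that of its input, so signals neither shrink nor blow up as they pass through the layers.

- When tested in practice, the cost drops quickly from the very beginning of training.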

In [None]:
'''
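# Xavier initialization (Glorot & Bengio, 2010)
W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)
# He (MSRA) initialization (He et al., 2015), suited to ReLU
W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in / 2)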
'''
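
The effect of the 1/sqrt(fan_in) scaling can be checked empirically. The sketch below uses only the Python standard library instead of NumPy; the layer sizes, the seed, and the helper names `xavier_normal` and `variance` are arbitrary choices made for illustration.

```python
import math
import random


def xavier_normal(fan_in, fan_out, rng):
    """fan_in x fan_out weights drawn from N(0, 1 / fan_in)."""
    std = 1.0 / math.sqrt(fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


rng = random.Random(777)  # fixed seed, chosen arbitrarily
fan_in, fan_out = 100, 50
W = xavier_normal(fan_in, fan_out, rng)

# One output unit is a sum of fan_in terms x_i * w_i. With unit-variance
# inputs its variance is roughly fan_in * (1 / fan_in) = 1.
outputs = []
for _ in range(2000):
    x = [rng.gauss(0.0, 1.0) for _ in range(fan_in)]
    outputs.append(sum(x[i] * W[i][0] for i in range(fan_in)))

flat = [w for row in W for w in row]
print("weight variance:", variance(flat))     # close to 1 / fan_in
print("output variance:", variance(outputs))  # roughly 1
```

The output variance staying near the input variance is the "just right" property: with a much larger or smaller scale, the signal would grow or shrink at every layer.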

- In TensorFlow, it is used as follows.

In [None]:
'''
W1 = tf.get_variable("W1", shape=[784, 256], initializer=tf.contrib.layers.xavier_initializer())
'''

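- Starting with Restricted Boltzmann Machine (RBM) initialization, methods such as LSUV, OrthoNorm, Xavier, and MSRA have been developed, but as of 2016 it had still not been established which initialization method is best.

- In practice, Xavier and MSRA initialization are the most widely used, but developers need to try different options themselves to find what works best.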

## 3. Dropout

### 3.1. Overfitting
- The deeper the network, the more likely it is to overfit.

- An overfitted network may reach an accuracy of 0.99 when it is trained and then tested on the training data, yet its accuracy on the test data can drop (the test error rises), as in the graph below.

<img src="https://i.stack.imgur.com/rpqa6.jpg" alt="" title="" />

### 3.2. Solution for overfitting
  - More training data
  - Regularization (see Chapter 7, Machine learning tips)
  - Dropout
  - Model ensemble

### 3.3. Dropout
- Randomly set some neurons to zero in the forward pass

- As in the figure below, this means randomly cutting connections in the network during the forward pass.

- It must be used only during training. It must never be applied during evaluation.

<img src="https://cdn-images-1.medium.com/max/1044/1*iWQzxhVlvadk6VAJjsgXgg.png" alt="" title="" />


- In TensorFlow, it is used as follows.
  - In each layer, pass the activation output through the dropout function and feed the result to the next layer.
  - Set the keep probability (`keep_prob`) for training. A value between 0.5 and 0.7 is typical; 0.7 means that 70% of the units are passed through and 30% are dropped.
  - For evaluation, set `keep_prob` to 1.0, which means dropout is not applied.

In [None]:
'''
# dropout (keep_prob) rate 0.7 on training, but should be 1 for testing!!
keep_prob = tf.placeholder(tf.float32)

# Weights and bias for NN layers
W1 = tf.get_variable("W1", shape=[784, 512], initializer=tf.contrib.layers.xavier_initializer())
b1 = tf.Variable(tf.random_normal([512]))
L1 = tf.nn.relu(tf.matmul(X, W1) + b1)
L1 = tf.nn.dropout(L1, keep_prob=keep_prob)

W2 = tf.get_variable("W2", shape=[512, 512], initializer=tf.contrib.layers.xavier_initializer())
b2 = tf.Variable(tf.random_normal([512]))
L2 = tf.nn.relu(tf.matmul(L1, W2) + b2)
L2 = tf.nn.dropout(L2, keep_prob=keep_prob)


# Train model
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)

    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict={X: batch_xs, Y: batch_ys, keep_prob: 0.7}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)

# Test model and check accuracy
correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("\nAccuracy:", sess.run(accuracy, feed_dict={X: mnist.test.images,
    Y: mnist.test.labels, keep_prob: 1}))
'''
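
What the dropout call does to a layer's output can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not TensorFlow's implementation: it follows the same inverted-dropout convention as `tf.nn.dropout` (kept values are scaled by 1 / keep_prob), and the function name, seed, and all-ones input are chosen arbitrarily.

```python
import random


def dropout(activations, keep_prob, rng):
    """Inverted dropout: keep each value with probability keep_prob and
    scale the kept values by 1 / keep_prob; dropped values become 0."""
    if keep_prob >= 1.0:
        return list(activations)  # evaluation: nothing is dropped
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(777)  # arbitrary fixed seed
layer_output = [1.0] * 10000

train_out = dropout(layer_output, keep_prob=0.7, rng=rng)
eval_out = dropout(layer_output, keep_prob=1.0, rng=rng)

kept_fraction = sum(1 for v in train_out if v != 0.0) / len(train_out)
mean_activation = sum(train_out) / len(train_out)
print("kept fraction:", kept_fraction)      # about 0.7
print("mean activation:", mean_activation)  # about 1.0
```

Scaling the surviving units keeps the expected activation unchanged, which is why the same weights can be used at evaluation time with `keep_prob` set to 1.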

## 4. Model ensemble
- As in the figure below, an ensemble trains several independent learning models, each on its own training set, and combines their results.

- In practice, this reportedly improves accuracy by roughly 2% to 5%.


<img src="https://www.researchgate.net/publication/273153116/figure/fig1/AS:267380941127694@1440759992754/BsN-Score-ensemble-neural-network-SF-using-boosting-approach.png" alt="" title="" />
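
One common way to combine the results is to average the class probabilities of the individual models and take the most likely class. The sketch below assumes that combination rule (the text does not say which one is used), and the three probability vectors are hypothetical values, not outputs of the labs.

```python
def ensemble_predict(prob_lists):
    """Average class probabilities from several models and return
    (averaged probabilities, index of the most likely class)."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(p[c] for p in prob_lists) / n_models
           for c in range(n_classes)]
    return avg, max(range(n_classes), key=lambda c: avg[c])


# Hypothetical outputs of three independently trained models for one sample.
model_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.4, 0.1],
]
avg, predicted = ensemble_predict(model_probs)
print(avg, predicted)
```

Here the second model on its own would pick class 1, but the averaged prediction is class 0, showing how the ensemble can outvote an individual model's mistake.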

## 5. Various neural network architectures
- Accuracy can be improved by routing the layers of a neural network in different ways.

- Try building networks from a variety of ideas; if one works well, use it.

### 5.1. Feed forward neural network

In [None]:
'''
(X)───(L1)───(L2)───(L3)───(L4)───(L6)───(L7)───(Y)
'''

### 5.2. Fast forward neural network

In [None]:
'''
(X)────(L1)──●──(L2)────(L3)──⊙──(L4)──●──(L6)───(L7)──⊙──(L8)───(Y)
             │                ▲        │               ▲
             └────────────────┘        └───────────────┘
'''
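
The shortcut connections in the diagram above can be sketched as a block whose input is added back to its output. This is only an illustration: it assumes the merge point (⊙) is an element-wise addition, as in residual networks, and the helper names `dense` and `skip_block` are hypothetical.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def dense(v, weights, bias):
    """Fully connected layer: weights is a list of rows, one per output."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def skip_block(v, w1, b1, w2, b2):
    """Two dense layers whose result is added to the block input."""
    hidden = relu(dense(v, w1, b1))
    out = dense(hidden, w2, b2)
    return relu([o + x for o, x in zip(out, v)])  # merge with the shortcut


# With all-zero weights the block reduces to relu(input): the input
# still reaches the next layer through the shortcut.
zeros_w = [[0.0, 0.0], [0.0, 0.0]]
zeros_b = [0.0, 0.0]
print(skip_block([1.0, 2.0], zeros_w, zeros_b, zeros_w, zeros_b))
```

Because the shortcut carries the input past the two layers, the signal and its gradient have a path that does not depend on those layers' weights.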

### 5.3. Split and merge neural network

In [None]:
'''
            ┌──(L2)───(L3)──┐
            │               ▼
(X)───(L1)──●               ⊙────(L6)───(L7)───(Y)
            │               ▲
            └──(L4)───(L5)──┘
'''

### 5.4. Convolutional neural network

In [None]:
'''
(X1)───(L1)───(L2)──┐
                    ▼
(X2)───(L3)───(L4)──⊙──(L7)───(L8)───(Y)
                    ▲
(X3)───(L5)───(L6)──┘
'''

### 5.5. Recurrent neural network

In [None]:
'''
(X1) ────(L1)──●──(L2)──●──(L3)──●──(L4)──┐
               │        │        │        │
               ▼        ▼        ▼        ▼
(X2) ────(L5)──⊙──(L6)──⊙──(L7)──⊙──(L8)──⊙──(Y)
               ▲        ▲        ▲        ▲
               │        │        │        │
(X3) ────(L9)──●──(L10)─●──(L11)─●──(L12)─┘
'''

## 6. Labs for improving performance

### 6.1. Lab 1: Softmax classifier for MNIST (Accuracy 0.9023)
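- The differences from Lab 4 (MNIST introduction) in Chapter 7 are as follows.
  - The cost function, previously implemented by hand, is replaced with softmax_cross_entropy_with_logits_v2.
  - GradientDescentOptimizer is replaced with AdamOptimizer.
  - Accuracy is similar; the run recorded below reports 0.8974.

- Weights and biases are initialized with random values.

- The network is simple: a single softmax layer maps the 784 input pixels directly to the 10 output classes, with no hidden layer.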
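
The cost used in these labs combines softmax and cross-entropy in one step, taking raw logits as input. The following plain-Python sketch shows the computation for a single sample under that assumption; it is not TensorFlow's implementation, and the function names and sample logits are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_with_logits(logits, one_hot_label):
    """Cross-entropy between a one-hot label and softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(y * (z - log_sum_exp)
                for z, y in zip(logits, one_hot_label))


logits = [2.0, 1.0, 0.1]
label = [1.0, 0.0, 0.0]
print(softmax(logits))
print(cross_entropy_with_logits(logits, label))
```

Working from logits and using the log-sum-exp form avoids taking the logarithm of a probability that has underflowed to 0, which is one reason to prefer the fused function over a hand-written `-sum(Y * log(softmax))`.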

In [2]:
################################################################################
# lab10-1 : mnist by softmax cross entropy
#           accuracy : 0.9023
################################################################################

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import random

# for reproducibility
tf.set_random_seed(777) 

# Import MNIST data
# Check out https://www.tensorflow.org/get_started/mnist/beginners for
# more information about the mnist dataset
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("../../mnist_data/", one_hot=True)

# Parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100
nb_classes = 10

# Input placeholders, MNIST data image of shape 28 * 28 = 784 pixel
X = tf.placeholder(tf.float32, shape=[None, 784])
# 0 ~ 9 digits recognition = 10 classes
Y = tf.placeholder(tf.float32, shape=[None, nb_classes])

# Weights and bias for NN layers
W = tf.Variable(tf.random_normal([784, nb_classes]))
b = tf.Variable(tf.random_normal([nb_classes]))

# Hypothesis (logits); softmax is applied inside the cost function below
hypothesis = tf.matmul(X, W) + b

# Define cost/loss & optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=hypothesis, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Initialize
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Train model
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)

    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict={X: batch_xs, Y: batch_ys}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        avg_cost += c / total_batch
    print("Epoch:", "%04d" % (epoch + 1), "\tCost:", "{:.9f}".format(avg_cost))

print("Learning finished!!")

# Test model and check accuracy
correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("\nAccuracy:", sess.run(accuracy, feed_dict={X: mnist.test.images,
    Y: mnist.test.labels}))

# Get one and predict
r = random.randint(0, mnist.test.num_examples - 1)
print("\nTest one label and prediction...")
print("Label:     \t", sess.run(tf.argmax(mnist.test.labels[r:r + 1], 1)))
print("Prediction:\t", sess.run(tf.argmax(hypothesis, 1), feed_dict={
    X: mnist.test.images[r:r + 1]}))

sess.close()

Extracting ../../mnist_data/train-images-idx3-ubyte.gz
Extracting ../../mnist_data/train-labels-idx1-ubyte.gz
Extracting ../../mnist_data/t10k-images-idx3-ubyte.gz
Extracting ../../mnist_data/t10k-labels-idx1-ubyte.gz
Epoch: 0001 	Cost: 4.715295316
Epoch: 0002 	Cost: 1.608079906
Epoch: 0003 	Cost: 1.069499107
Epoch: 0004 	Cost: 0.848682950
Epoch: 0005 	Cost: 0.726699845
Epoch: 0006 	Cost: 0.649306182
Epoch: 0007 	Cost: 0.593791151
Epoch: 0008 	Cost: 0.552169523
Epoch: 0009 	Cost: 0.519451609
Epoch: 0010 	Cost: 0.493092402
Epoch: 0011 	Cost: 0.470662844
Epoch: 0012 	Cost: 0.451730263
Epoch: 0013 	Cost: 0.435721565
Epoch: 0014 	Cost: 0.421236033
Epoch: 0015 	Cost: 0.408534826
Learning finished!!

Accuracy: 0.8974

Test one label and prediction...
Label:     	 [4]
Prediction:	 [4]


### 6.2. Lab 2: Neural network for MNIST (Accuracy 0.9423)
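- The differences from Lab 1 above are as follows.
  - The network has three layers: two hidden layers and an output layer.
  - ReLU is used as the activation function of the hidden layers.
  - Accuracy is 0.9423 (94.23%), about 4 percentage points higher than Lab 1; the run recorded below reports 0.9448.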

In [19]:
################################################################################
# lab10-2 : mnist by neural network
#           accuracy : 0.9423
################################################################################

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import random

# for reproducibility
tf.set_random_seed(777) 

# Import MNIST data
# Check out https://www.tensorflow.org/get_started/mnist/beginners for
# more information about the mnist dataset
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("../../mnist_data/", one_hot=True)

# Parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100
nb_classes = 10

# Input placeholders, MNIST data image of shape 28 * 28 = 784 pixel
X = tf.placeholder(tf.float32, shape=[None, 784])
# 0 ~ 9 digits recognition = 10 classes
Y = tf.placeholder(tf.float32, shape=[None, nb_classes])

# Weights and bias for NN layers
W1 = tf.Variable(tf.random_normal([784, 256]))
b1 = tf.Variable(tf.random_normal([256]))
L1 = tf.nn.relu(tf.matmul(X, W1) + b1)

W2 = tf.Variable(tf.random_normal([256, 256]))
b2 = tf.Variable(tf.random_normal([256]))
L2 = tf.nn.relu(tf.matmul(L1, W2) + b2)

W3 = tf.Variable(tf.random_normal([256, nb_classes]))
b3 = tf.Variable(tf.random_normal([nb_classes]))

# Hypothesis (logits); softmax is applied inside the cost function below
hypothesis = tf.matmul(L2, W3) + b3

# Define cost/loss & optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=hypothesis, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Initialize
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Train model
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)

    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict={X: batch_xs, Y: batch_ys}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        avg_cost += c / total_batch
    print("Epoch:", "%04d" % (epoch + 1), "\tCost:", "{:.9f}".format(avg_cost))

print("Learning finished!!")

# Test model and check accuracy
correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("\nAccuracy:", sess.run(accuracy, feed_dict={X: mnist.test.images,
    Y: mnist.test.labels}))

# Get one and predict
r = random.randint(0, mnist.test.num_examples - 1)
print("\nTest one label and prediction...")
print("Label:     \t", sess.run(tf.argmax(mnist.test.labels[r:r + 1], 1)))
print("Prediction:\t", sess.run(tf.argmax(hypothesis, 1), feed_dict={
    X: mnist.test.images[r:r + 1]}))

sess.close()

Extracting ../../mnist_data/train-images-idx3-ubyte.gz
Extracting ../../mnist_data/train-labels-idx1-ubyte.gz
Extracting ../../mnist_data/t10k-images-idx3-ubyte.gz
Extracting ../../mnist_data/t10k-labels-idx1-ubyte.gz
Epoch: 0001 	Cost: 168.082875335
Epoch: 0002 	Cost: 41.620471441
Epoch: 0003 	Cost: 26.003158995
Epoch: 0004 	Cost: 18.060220715
Epoch: 0005 	Cost: 12.970530490
Epoch: 0006 	Cost: 9.597306450
Epoch: 0007 	Cost: 7.096123880
Epoch: 0008 	Cost: 5.451318178
Epoch: 0009 	Cost: 3.946004150
Epoch: 0010 	Cost: 3.001856147
Epoch: 0011 	Cost: 2.205120762
Epoch: 0012 	Cost: 1.806419487
Epoch: 0013 	Cost: 1.242452548
Epoch: 0014 	Cost: 1.038211533
Epoch: 0015 	Cost: 0.754097386
Learning finished!!

Accuracy: 0.9448

Test one label and prediction...
Label:     	 [1]
Prediction:	 [1]


### 6.3. Lab 3: Xavier init neural network for MNIST (Accuracy 0.9730)
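- The differences from Lab 2 above are as follows.
  - Weights are initialized with Xavier initialization instead of plain random normal values.
  - Accuracy is 0.9730 (97.30%), about 3 percentage points higher than Lab 2; the run recorded below reports 0.9744.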

In [3]:
################################################################################
# lab10-3 : mnist by neural network with xavier
#           accuracy : 0.9780
################################################################################

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import random

# for reproducibility
tf.set_random_seed(777) 

# Import MNIST data
# Check out https://www.tensorflow.org/get_started/mnist/beginners for
# more information about the mnist dataset
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("../../mnist_data/", one_hot=True)

# Parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100
nb_classes = 10

# Input placeholders, MNIST data image of shape 28 * 28 = 784 pixel
X = tf.placeholder(tf.float32, shape=[None, 784])
# 0 ~ 9 digits recognition = 10 classes
Y = tf.placeholder(tf.float32, shape=[None, nb_classes])

# Weights and bias for NN layers
W1 = tf.get_variable("W1", shape=[784, 256],
        initializer=tf.contrib.layers.xavier_initializer())
b1 = tf.Variable(tf.random_normal([256]))
L1 = tf.nn.relu(tf.matmul(X, W1) + b1)

W2 = tf.get_variable("W2", shape=[256, 256],
        initializer=tf.contrib.layers.xavier_initializer())
b2 = tf.Variable(tf.random_normal([256]))
L2 = tf.nn.relu(tf.matmul(L1, W2) + b2)

W3 = tf.get_variable("W3", shape=[256, nb_classes],
        initializer=tf.contrib.layers.xavier_initializer())
b3 = tf.Variable(tf.random_normal([nb_classes]))

# Hypothesis (logits); softmax is applied inside the cost function below
hypothesis = tf.matmul(L2, W3) + b3

# Define cost/loss & optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=hypothesis, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Initialize
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Train model
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)

    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict={X: batch_xs, Y: batch_ys}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        avg_cost += c / total_batch
    print("Epoch:", "%04d" % (epoch + 1), "\tCost:", "{:.9f}".format(avg_cost))

print("Learning finished!!")

# Test model and check accuracy
correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("\nAccuracy:", sess.run(accuracy, feed_dict={X: mnist.test.images,
    Y: mnist.test.labels}))

# Get one and predict
r = random.randint(0, mnist.test.num_examples - 1)
print("\nTest one label and prediction...")
print("Label:     \t", sess.run(tf.argmax(mnist.test.labels[r:r + 1], 1)))
print("Prediction:\t", sess.run(tf.argmax(hypothesis, 1), feed_dict={
    X: mnist.test.images[r:r + 1]}))

sess.close()

Extracting ../../mnist_data/train-images-idx3-ubyte.gz
Extracting ../../mnist_data/train-labels-idx1-ubyte.gz
Extracting ../../mnist_data/t10k-images-idx3-ubyte.gz
Extracting ../../mnist_data/t10k-labels-idx1-ubyte.gz
Epoch: 0001 	Cost: 0.294609584
Epoch: 0002 	Cost: 0.113068595
Epoch: 0003 	Cost: 0.074517656
Epoch: 0004 	Cost: 0.052327855
Epoch: 0005 	Cost: 0.038574041
Epoch: 0006 	Cost: 0.030784411
Epoch: 0007 	Cost: 0.025149835
Epoch: 0008 	Cost: 0.019272849
Epoch: 0009 	Cost: 0.013632683
Epoch: 0010 	Cost: 0.014041129
Epoch: 0011 	Cost: 0.014991121
Epoch: 0012 	Cost: 0.012427811
Epoch: 0013 	Cost: 0.011833470
Epoch: 0014 	Cost: 0.007531284
Epoch: 0015 	Cost: 0.008567470
Learning finished!!

Accuracy: 0.9744

Test one label and prediction...
Label:     	 [9]
Prediction:	 [9]


### 6.4. Lab 4: Deep neural network for MNIST (Accuracy 0.9785)
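- The differences from Lab 3 above are as follows.
  - The network is deeper, with five layers.
  - The hidden layers are wider: each outputs 512 units instead of 256.
  - Accuracy is 0.9785 (97.85%), about 0.5 percentage points higher than Lab 3. In some runs the accuracy actually decreases (the run recorded below reports 0.9746), probably because the network overfits. Lab 5 therefore adds dropout to the network.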

In [3]:
################################################################################
# lab10-4 : mnist by neural network with deep learning
#           accuracy : 0.9785
################################################################################

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import random

# for reproducibility
tf.set_random_seed(777) 

# Import MNIST data
# Check out https://www.tensorflow.org/get_started/mnist/beginners for
# more information about the mnist dataset
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("../../mnist_data/", one_hot=True)

# Parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100
nb_classes = 10

# Input placeholders, MNIST data image of shape 28 * 28 = 784 pixel
X = tf.placeholder(tf.float32, shape=[None, 784])
# 0 ~ 9 digits recognition = 10 classes
Y = tf.placeholder(tf.float32, shape=[None, nb_classes])

# Weights and bias for NN layers
W1 = tf.get_variable("W1", shape=[784, 512],
        initializer=tf.contrib.layers.xavier_initializer())
b1 = tf.Variable(tf.random_normal([512]))
L1 = tf.nn.relu(tf.matmul(X, W1) + b1)

W2 = tf.get_variable("W2", shape=[512, 512],
        initializer=tf.contrib.layers.xavier_initializer())
b2 = tf.Variable(tf.random_normal([512]))
L2 = tf.nn.relu(tf.matmul(L1, W2) + b2)

W3 = tf.get_variable("W3", shape=[512, 512],
        initializer=tf.contrib.layers.xavier_initializer())
b3 = tf.Variable(tf.random_normal([512]))
L3 = tf.nn.relu(tf.matmul(L2, W3) + b3)

W4 = tf.get_variable("W4", shape=[512, 512],
        initializer=tf.contrib.layers.xavier_initializer())
b4 = tf.Variable(tf.random_normal([512]))
L4 = tf.nn.relu(tf.matmul(L3, W4) + b4)

W5 = tf.get_variable("W5", shape=[512, nb_classes],
        initializer=tf.contrib.layers.xavier_initializer())
b5 = tf.Variable(tf.random_normal([nb_classes]))

# Hypothesis (logits); softmax is applied inside the cost function below
hypothesis = tf.matmul(L4, W5) + b5

# Define cost/loss & optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=hypothesis, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Initialize
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Train model
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)

    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict={X: batch_xs, Y: batch_ys}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        avg_cost += c / total_batch
    print("Epoch:", "%04d" % (epoch + 1), "\tCost:", "{:.9f}".format(avg_cost))

print("Learning finished!!")

# Test model and check accuracy
correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("\nAccuracy:", sess.run(accuracy, feed_dict={X: mnist.test.images,
    Y: mnist.test.labels}))

# Get one and predict
r = random.randint(0, mnist.test.num_examples - 1)
print("\nTest one label and prediction...")
print("Label:     \t", sess.run(tf.argmax(mnist.test.labels[r:r + 1], 1)))
print("Prediction:\t", sess.run(tf.argmax(hypothesis, 1), feed_dict={
    X: mnist.test.images[r:r + 1]}))

sess.close()

Extracting ../../mnist_data/train-images-idx3-ubyte.gz
Extracting ../../mnist_data/train-labels-idx1-ubyte.gz
Extracting ../../mnist_data/t10k-images-idx3-ubyte.gz
Extracting ../../mnist_data/t10k-labels-idx1-ubyte.gz
Epoch: 0001 	Cost: 0.301736783
Epoch: 0002 	Cost: 0.108016018
Epoch: 0003 	Cost: 0.071312178
Epoch: 0004 	Cost: 0.053806748
Epoch: 0005 	Cost: 0.039286335
Epoch: 0006 	Cost: 0.034092403
Epoch: 0007 	Cost: 0.030653502
Epoch: 0008 	Cost: 0.027925185
Epoch: 0009 	Cost: 0.020962704
Epoch: 0010 	Cost: 0.020842491
Epoch: 0011 	Cost: 0.017203622
Epoch: 0012 	Cost: 0.018255131
Epoch: 0013 	Cost: 0.016747021
Epoch: 0014 	Cost: 0.015837919
Epoch: 0015 	Cost: 0.014658532
Learning finished!!

Accuracy: 0.9746

Test one label and prediction...
Label:     	 [3]
Prediction:	 [3]


### 6.5. Lab 5: Dropout neural network for MNIST (Accuracy 0.9848)
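- The differences from Lab 4 above are as follows.
  - Dropout is applied to the five-layer neural network.
  - Accuracy is 0.9848 (98.48%), a little under 1 percentage point higher than Lab 4; the run recorded below reports 0.9801.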

In [3]:
################################################################################
# lab10-5 : mnist by neural network with dropout
#           accuracy : 0.9848
################################################################################

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import random

# for reproducibility
tf.set_random_seed(777) 

# Import MNIST data
# Check out https://www.tensorflow.org/get_started/mnist/beginners for
# more information about the mnist dataset
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("../../mnist_data/", one_hot=True)

# Parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100
nb_classes = 10

# dropout (keep_prob) rate 0.7 on training, but should be 1 for testing!!
keep_prob = tf.placeholder(tf.float32)

# Input placeholders, MNIST data image of shape 28 * 28 = 784 pixel
X = tf.placeholder(tf.float32, shape=[None, 784])
# 0 ~ 9 digits recognition = 10 classes
Y = tf.placeholder(tf.float32, shape=[None, nb_classes])

# Weights and bias for NN layers
W1 = tf.get_variable("W1", shape=[784, 512],
        initializer=tf.contrib.layers.xavier_initializer())
b1 = tf.Variable(tf.random_normal([512]))
L1 = tf.nn.relu(tf.matmul(X, W1) + b1)
L1 = tf.nn.dropout(L1, keep_prob=keep_prob)

W2 = tf.get_variable("W2", shape=[512, 512],
        initializer=tf.contrib.layers.xavier_initializer())
b2 = tf.Variable(tf.random_normal([512]))
L2 = tf.nn.relu(tf.matmul(L1, W2) + b2)
L2 = tf.nn.dropout(L2, keep_prob=keep_prob)

W3 = tf.get_variable("W3", shape=[512, 512],
        initializer=tf.contrib.layers.xavier_initializer())
b3 = tf.Variable(tf.random_normal([512]))
L3 = tf.nn.relu(tf.matmul(L2, W3) + b3)
L3 = tf.nn.dropout(L3, keep_prob=keep_prob)

W4 = tf.get_variable("W4", shape=[512, 512],
        initializer=tf.contrib.layers.xavier_initializer())
b4 = tf.Variable(tf.random_normal([512]))
L4 = tf.nn.relu(tf.matmul(L3, W4) + b4)
L4 = tf.nn.dropout(L4, keep_prob=keep_prob)

W5 = tf.get_variable("W5", shape=[512, nb_classes],
        initializer=tf.contrib.layers.xavier_initializer())
b5 = tf.Variable(tf.random_normal([nb_classes]))
# Hypothesis (logits); softmax is applied inside the cost function below
hypothesis = tf.matmul(L4, W5) + b5

# Define cost/loss & optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=hypothesis, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Initialize
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Train model
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)

    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict={X: batch_xs, Y: batch_ys, keep_prob: 0.7}
        c, _ = sess.run([cost, optimizer], feed_dict=feed_dict)
        avg_cost += c / total_batch
    print("Epoch:", "%04d" % (epoch + 1), "\tCost:", "{:.9f}".format(avg_cost))

print("Learning finished!!")

# Test model and check accuracy
correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("\nAccuracy:", sess.run(accuracy, feed_dict={X: mnist.test.images,
    Y: mnist.test.labels, keep_prob: 1}))

# Get one and predict
r = random.randint(0, mnist.test.num_examples - 1)
print("\nTest one label and prediction...")
print("Label:     \t", sess.run(tf.argmax(mnist.test.labels[r:r + 1], 1)))
print("Prediction:\t", sess.run(tf.argmax(hypothesis, 1), feed_dict={
    X: mnist.test.images[r:r + 1], keep_prob: 1}))

sess.close()

Extracting ../../mnist_data/train-images-idx3-ubyte.gz
Extracting ../../mnist_data/train-labels-idx1-ubyte.gz
Extracting ../../mnist_data/t10k-images-idx3-ubyte.gz
Extracting ../../mnist_data/t10k-labels-idx1-ubyte.gz
Epoch: 0001 	Cost: 0.458700201
Epoch: 0002 	Cost: 0.172372331
Epoch: 0003 	Cost: 0.129912548
Epoch: 0004 	Cost: 0.108884575
Epoch: 0005 	Cost: 0.095104513
Epoch: 0006 	Cost: 0.080963188
Epoch: 0007 	Cost: 0.078037094
Epoch: 0008 	Cost: 0.068476953
Epoch: 0009 	Cost: 0.065428976
Epoch: 0010 	Cost: 0.061137256
Epoch: 0011 	Cost: 0.057880940
Epoch: 0012 	Cost: 0.054597637
Epoch: 0013 	Cost: 0.050098970
Epoch: 0014 	Cost: 0.051898345
Epoch: 0015 	Cost: 0.045763725
Learning finished!!

Accuracy: 0.9801

Test one label and prediction...
Label:     	 [6]
Prediction:	 [6]


### 6.6. Lab 6: High level tensorflow API for MNIST (Accuracy 0.9850)
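- The differences from Lab 5 above are as follows.
  - The code uses TensorFlow's high-level API (tf.contrib.layers). The layer structure matches Lab 5, but the code also adds batch normalization and raises the learning rate to 0.01.
  - Accuracy is similar at 0.9850 (98.50%); the run recorded below reports 0.9832.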

In [4]:
################################################################################
# lab10-6 : mnist by neural network with high level tensorflow api
#           accuracy : 0.9850
################################################################################
from tensorflow.contrib.layers import fully_connected, batch_norm, dropout
from tensorflow.contrib.framework import arg_scope
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import random

# for reproducibility
tf.set_random_seed(777) 

# Import MNIST data
# Check out https://www.tensorflow.org/get_started/mnist/beginners for
# more information about the mnist dataset
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("../../mnist_data/", one_hot=True)

# Parameters
learning_rate = 0.01 # we can use large learning rate using Batch Normalization
training_epochs = 15
batch_size = 100
keep_prob = 0.7
nb_classes = 10 # 0 ~ 9 digits recognition = 10 classes

# Input placeholders 
X = tf.placeholder(tf.float32, shape=[None, 784]) # image = 28*28 = 784 pixel
Y = tf.placeholder(tf.float32, shape=[None, nb_classes])
train_mode = tf.placeholder(tf.bool, name='train_mode')

# Layer output size
hidden_output_size = 512
final_output_size = nb_classes

# Weight initializer: xavier
xavier_init = tf.contrib.layers.xavier_initializer()

# Normalizer: batch normalization    
bn_params = {
    'is_training': train_mode,
    'decay': 0.9,
    'updates_collections': None
}

# We can build short code using 'arg_scope' to avoid duplicate code
# same function with different arguments
with arg_scope([fully_connected],
        activation_fn = tf.nn.relu,
        weights_initializer = xavier_init,
        biases_initializer = None,
        normalizer_fn = batch_norm,
        normalizer_params = bn_params
        ):
    hidden_layer1 = fully_connected(X, hidden_output_size, scope="h1")
    h1_drop = dropout(hidden_layer1, keep_prob, is_training=train_mode)
    hidden_layer2 = fully_connected(h1_drop, hidden_output_size, scope="h2")
    h2_drop = dropout(hidden_layer2, keep_prob, is_training=train_mode)
    hidden_layer3 = fully_connected(h2_drop, hidden_output_size, scope="h3")
    h3_drop = dropout(hidden_layer3, keep_prob, is_training=train_mode)
    hidden_layer4 = fully_connected(h3_drop, hidden_output_size, scope="h4")
    h4_drop = dropout(hidden_layer4, keep_prob, is_training=train_mode)
    hypothesis = fully_connected(h4_drop, final_output_size,
            activation_fn=None, scope="hypothesis")

# Define cost/loss & optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=hypothesis, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Initialize
sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Train model
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(mnist.train.num_examples / batch_size)

    for i in range(total_batch):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        feed_dict_train = {X: batch_xs, Y: batch_ys, train_mode: True}
        feed_dict_cost = {X: batch_xs, Y: batch_ys, train_mode: False}
        opt = sess.run(optimizer, feed_dict=feed_dict_train)
        c = sess.run(cost, feed_dict=feed_dict_cost)
        avg_cost += c / total_batch

    print("[Epoch: {:>4}]\tCost: {:>.9}".format(epoch + 1, avg_cost))

print("Learning finished!!")

# Test model and check accuracy
correct_prediction = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print("\nAccuracy:", sess.run(accuracy, feed_dict={X: mnist.test.images,
    Y: mnist.test.labels, train_mode: False}))

# Get one and predict
r = random.randint(0, mnist.test.num_examples - 1)
print("\nTest one label and prediction...")
print("Label:     \t", sess.run(tf.argmax(mnist.test.labels[r:r + 1], 1)))
print("Prediction:\t", sess.run(tf.argmax(hypothesis, 1), feed_dict={
    X: mnist.test.images[r:r + 1], train_mode: False}))

sess.close()

Extracting ../../mnist_data/train-images-idx3-ubyte.gz
Extracting ../../mnist_data/train-labels-idx1-ubyte.gz
Extracting ../../mnist_data/t10k-images-idx3-ubyte.gz
Extracting ../../mnist_data/t10k-labels-idx1-ubyte.gz
[Epoch:    1]	Cost: 0.38708553
[Epoch:    2]	Cost: 0.330159667
[Epoch:    3]	Cost: 0.321581603
[Epoch:    4]	Cost: 0.316077446
[Epoch:    5]	Cost: 0.311775169
[Epoch:    6]	Cost: 0.310022716
[Epoch:    7]	Cost: 0.307601213
[Epoch:    8]	Cost: 0.307955301
[Epoch:    9]	Cost: 0.305440181
[Epoch:   10]	Cost: 0.303470927
[Epoch:   11]	Cost: 0.302448962
[Epoch:   12]	Cost: 0.302503021
[Epoch:   13]	Cost: 0.302841304
[Epoch:   14]	Cost: 0.30146164
[Epoch:   15]	Cost: 0.302058428
Learning finished!!

Accuracy: 0.9832

Test one label and prediction...
Label:     	 [9]
Prediction:	 [9]
