* activation function: the function applied at each unit, such as sigmoid

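## Problems with backpropagation
- As the number of layers grows, accuracy gets worse.
- Vanishing gradient: a sigmoid output always lies between 0 and 1, so the factors multiplied together by the chain rule become extremely small for the earlier layers. The closer a layer is to the input, the weaker its gradient becomes.
- Causes
    1. We used the wrong type of non-linearity (by Geoffrey Hinton)
    2. We initialized the weights in a stupid way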
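
The shrinking gradient can be checked numerically. The sketch below is a simplification that ignores the weight terms and only multiplies the sigmoid derivative, which is at most 0.25, once per layer. The helper names `sigmoid` and `sigmoid_grad` are illustrative and are not part of the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative of sigmoid: s * (1 - s)
    s = sigmoid(z)
    return s * (1.0 - s)

# The derivative peaks at z = 0, where it equals 0.25.
max_grad = sigmoid_grad(0.0)

# Best case for a chain of sigmoid layers: every layer contributes 0.25.
for depth in (1, 3, 5, 10):
    print(depth, max_grad ** depth)
```

Even in this best case, the factor reaching the first layer falls below one millionth after ten sigmoid layers.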
    
   

### Solution1 : ReLU(Rectified Linear Unit)
<img src="relu.png">

#### NNs now use ReLU instead of sigmoid
The last layer still uses sigmoid, because its output has to be a value between 0 and 1.

##### Other variants: Leaky ReLU   /   Maxout   /   ELU   /   tanh (improves on sigmoid)
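
A few of the activations named above can be written in plain NumPy. This is a minimal sketch. The `alpha` defaults are commonly used values chosen here as assumptions, and Maxout is left out because it needs extra weights.

```python
import numpy as np

def relu(z):
    # max(0, z): the gradient is 1 for positive inputs, so it does not shrink
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Small slope alpha for negative inputs instead of a hard zero
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha=1.0):
    # Smooth exponential curve for negative inputs
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))
print(leaky_relu(z))
print(elu(z))
print(np.tanh(z))
```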

## Solution2
1. Do not initialize all weights to 0
2. Paper: 'A Fast Learning Algorithm for Deep Belief Nets' (rarely used now)
    - Restricted Boltzmann Machine (RBM)
    
### How can we use RBM to initialize weights?
1. Apply the RBM idea on two adjacent layers as a pre-training step
2. Repeat this process for all layers
3. This sets the initial weights

### Good News
- Simple methods are OK

#### Xavier/He initialization
- Makes sure the weights are 'just right', not too small, not too big
- Uses the number of inputs (fan_in) and outputs (fan_out)
    1. Xavier: W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)
    2. He: W = np.random.randn(fan_in, fan_out) / np.sqrt(fan_in/2)
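
The two formulas can be compared directly. The sketch below assumes the layer sizes 784 and 256 from the MNIST example later in these notes and uses a seeded NumPy generator so the run is repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 784, 256  # first-layer sizes, taken as an example

# Xavier: standard normal scaled by 1 / sqrt(fan_in)
W_xavier = rng.standard_normal((fan_in, fan_out)) / np.sqrt(fan_in)

# He: standard normal scaled by 1 / sqrt(fan_in / 2)
W_he = rng.standard_normal((fan_in, fan_out)) / np.sqrt(fan_in / 2)

print('Xavier std:', W_xavier.std(), 'target:', 1 / np.sqrt(fan_in))
print('He std:    ', W_he.std(), 'target:', np.sqrt(2 / fan_in))
```

The He variant has a standard deviation larger by a factor of sqrt(2), which is meant to compensate for ReLU zeroing out roughly half of the activations.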

# NN dropout and model ensemble

Why do this? To deal with overfitting.

* Am I overfitting ?
    1. Very high accuracy on the training dataset
    2. But poor accuracy on the test data set
    
- As more layers are added, the training error keeps decreasing, while the test error decreases at first and then starts to rise again at some point.

* Solution
    1. More Training Data
    2. Reduce the # of features
    3. Regularization  (+Dropout)
  
  
### Regularization: Dropout - randomly set some neurons to zero in the forward pass
- Training: apply dropout <-> Evaluation: keep every neuron (keep_prob = 1)
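
A minimal NumPy sketch of the idea, assuming the "inverted dropout" variant in which the surviving activations are divided by `keep_prob` so that their expected value stays the same. The function name `dropout` and the array sizes are illustrative.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    # Zero each unit with probability 1 - keep_prob and rescale the survivors.
    if keep_prob >= 1.0:
        return activations  # evaluation: nothing is dropped
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones((1000, 512))

train_out = dropout(a, 0.7, rng)  # training: about 30% of units become 0
test_out = dropout(a, 1.0, rng)   # evaluation: input passes through unchanged

print('fraction dropped:', (train_out == 0).mean())
print('mean activation: ', train_out.mean())
```

Because of the rescaling, the mean activation stays close to 1 during training, so no extra correction is needed at test time.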


### Ensemble?
<img src="ensem.png">
Source: http://www.slideshare.net/sasasiapacific/ipb-improving-the-models-predictive-power-with-ensemble-approaches

## Various ways to stack NN module

### 1. Feedforward neural network

### 2. Fast forward
- Send the signal forward, skipping two layers
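
One way to read this is as a skip connection, where the input is added back after two layers. The sketch below is an assumption about the intended structure, not the lecture's exact design. The function `fast_forward_block` and its weight shapes are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def fast_forward_block(x, W1, W2):
    # Two layers whose result is added to the input that skipped over them.
    h = relu(x @ W1)
    return relu(h @ W2 + x)  # x is forwarded past both layers

rng = np.random.default_rng(0)
x = rng.random((4, 8))               # batch of 4 non-negative inputs, 8 features
W1 = rng.standard_normal((8, 8)) * 0.1
W2 = rng.standard_normal((8, 8)) * 0.1

out = fast_forward_block(x, W1, W2)
print(out.shape)
```

With both weight matrices set to zero the block simply returns its non-negative input, which shows that the signal still reaches later layers even when the skipped layers contribute nothing.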

### 3. Split & merge


### 4. Recurrent network (RNN)


# NN for MNIST
- More layers than the softmax-only model! + ReLU

In [2]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

In [2]:
mnist = input_data.read_data_sets('MNIST_data/', one_hot = True) # convert labels to one-hot

nb_classes = 10

X = tf.placeholder(tf.float32, [None, 784])
Y = tf.placeholder(tf.float32, [None, nb_classes])

W1 = tf.Variable(tf.random_normal([784, 256]))
b1 = tf.Variable(tf.random_normal([256]))
L1 = tf.nn.relu(tf.matmul(X,W1) + b1)

W2 = tf.Variable(tf.random_normal([256, 256]))
b2 = tf.Variable(tf.random_normal([256]))
L2 = tf.nn.relu(tf.matmul(L1, W2) + b2)

W3 = tf.Variable(tf.random_normal([256, nb_classes]))
b3 = tf.Variable(tf.random_normal([nb_classes]))
hypothesis = tf.matmul(L2, W3) +b3

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [4]:
#define cost/loss & Optimizer
learning_rate = 0.01
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = hypothesis,
                                                             labels = Y))

# AdamOptimizer: keeps moving averages of the gradients and their squares (momentum-like)
optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)

In [23]:
is_correct = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

In [24]:
training_epochs = 15
batch_size = 100

config = tf.ConfigProto(device_count={'GPU': 0})
with tf.Session(config = config) as sess:
    sess.run(tf.global_variables_initializer())
    #Training cycle:
    for epoch in range(training_epochs):
        avg_cost = 0
        total_batch = int(mnist.train.num_examples/batch_size) # number of iterations per epoch
        
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)  # read the data 100 examples at a time
            c, _ = sess.run([cost, optimizer], feed_dict = {X:batch_xs, Y: batch_ys})
            avg_cost += c / total_batch
            
        print('Epoch: ', '%04d' % (epoch +1), ' cost = ','{:.9f}'.format(avg_cost))
    
    #test the model using test sets
    print('Accuracy:', accuracy.eval(session = sess, feed_dict = {X: mnist.test.images,
                                                                 Y: mnist.test.labels}))

Epoch:  0001  cost =  48.531320539
Epoch:  0002  cost =  9.034787826
Epoch:  0003  cost =  4.922225090
Epoch:  0004  cost =  3.301491155
Epoch:  0005  cost =  2.714303530
Epoch:  0006  cost =  2.484510562
Epoch:  0007  cost =  2.334515521
Epoch:  0008  cost =  1.746345760
Epoch:  0009  cost =  1.806353526
Epoch:  0010  cost =  1.512919767
Epoch:  0011  cost =  1.351888039
Epoch:  0012  cost =  1.231365173
Epoch:  0013  cost =  0.959691505
Epoch:  0014  cost =  1.087991192
Epoch:  0015  cost =  0.963870677
Accuracy: 0.9589


# Xavier initialization
- The cost is low from the very first epoch, which indicates that the initial weights were set well.

In [37]:
tf.reset_default_graph() # TensorFlow cannot delete existing variables, so reset the graph first

X = tf.placeholder(tf.float32, [None, 784])
Y = tf.placeholder(tf.float32, [None, nb_classes])

W1 = tf.get_variable('W1', shape = [784,256],
                    initializer = tf.contrib.layers.xavier_initializer())
b1 = tf.Variable(tf.random_normal([256]))
L1 = tf.nn.relu(tf.matmul(X,W1)+b1)

W2 = tf.get_variable('W2', shape = [256, 256],
                    initializer = tf.contrib.layers.xavier_initializer())
b2 = tf.Variable(tf.random_normal([256]))
L2 = tf.nn.relu(tf.matmul(L1, W2)+b2)

W3 = tf.get_variable('W3', shape = [256, nb_classes],
                    initializer = tf.contrib.layers.xavier_initializer())
b3 = tf.Variable(tf.random_normal([nb_classes]))
hypothesis = tf.matmul(L2, W3) + b3

In [38]:
#define cost/loss & Optimizer
learning_rate = 0.01
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = hypothesis,
                                                             labels = Y))
optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)

is_correct = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

In [39]:
training_epochs = 15
batch_size = 100

config = tf.ConfigProto(device_count={'GPU': 0})
with tf.Session(config = config) as sess:
    sess.run(tf.global_variables_initializer())
    #Training cycle:
    for epoch in range(training_epochs):
        avg_cost = 0
        total_batch = int(mnist.train.num_examples/batch_size) # number of iterations per epoch
        
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)  # read the data 100 examples at a time
            c, _ = sess.run([cost, optimizer], feed_dict = {X:batch_xs, Y: batch_ys})
            avg_cost += c / total_batch
            
        print('Epoch: ', '%04d' % (epoch +1), ' cost = ','{:.9f}'.format(avg_cost))
    
    #test the model using test sets
    print('Accuracy:', accuracy.eval(session = sess, feed_dict = {X: mnist.test.images,
                                                                 Y: mnist.test.labels}))

Epoch:  0001  cost =  0.261068299
Epoch:  0002  cost =  0.135333015
Epoch:  0003  cost =  0.118715815
Epoch:  0004  cost =  0.105170587
Epoch:  0005  cost =  0.093711926
Epoch:  0006  cost =  0.095917307
Epoch:  0007  cost =  0.083916248
Epoch:  0008  cost =  0.081445102
Epoch:  0009  cost =  0.079361560
Epoch:  0010  cost =  0.069836883
Epoch:  0011  cost =  0.074417542
Epoch:  0012  cost =  0.077027685
Epoch:  0013  cost =  0.066237198
Epoch:  0014  cost =  0.049401297
Epoch:  0015  cost =  0.073024502
Accuracy: 0.9691


# Deep NN for MNIST
- From 3 layers to 5! The number of nodes per layer is increased as well

In [40]:
tf.reset_default_graph() 

X = tf.placeholder(tf.float32, [None, 784])
Y = tf.placeholder(tf.float32, [None, nb_classes])

W1 = tf.get_variable('W1', shape = [784,512],
                    initializer = tf.contrib.layers.xavier_initializer())
b1 = tf.Variable(tf.random_normal([512]))
L1 = tf.nn.relu(tf.matmul(X,W1)+b1)

W2 = tf.get_variable('W2', shape = [512, 512],
                    initializer = tf.contrib.layers.xavier_initializer())
b2 = tf.Variable(tf.random_normal([512]))
L2 = tf.nn.relu(tf.matmul(L1, W2)+b2)

W3 = tf.get_variable('W3', shape = [512, 512],
                    initializer = tf.contrib.layers.xavier_initializer())
b3 = tf.Variable(tf.random_normal([512]))
L3 = tf.nn.relu(tf.matmul(L2, W3)+b3)

W4 = tf.get_variable('W4', shape = [512, 512],
                    initializer = tf.contrib.layers.xavier_initializer())
b4 = tf.Variable(tf.random_normal([512]))
L4 = tf.nn.relu(tf.matmul(L3, W4)+b4)

W5 = tf.get_variable('W5', shape = [512, nb_classes],
                    initializer = tf.contrib.layers.xavier_initializer())
b5 = tf.Variable(tf.random_normal([nb_classes]))
hypothesis = tf.matmul(L4, W5) + b5

In [42]:
#define cost/loss & Optimizer
learning_rate = 0.01
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = hypothesis,
                                                             labels = Y))
optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)

is_correct = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

In [43]:
training_epochs = 15
batch_size = 100

config = tf.ConfigProto(device_count={'GPU': 0})
with tf.Session(config = config) as sess:
    sess.run(tf.global_variables_initializer())
    #Training cycle:
    for epoch in range(training_epochs):
        avg_cost = 0
        total_batch = int(mnist.train.num_examples/batch_size) # number of iterations per epoch
        
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)  # read the data 100 examples at a time
            c, _ = sess.run([cost, optimizer], feed_dict = {X:batch_xs, Y: batch_ys})
            avg_cost += c / total_batch
            
        print('Epoch: ', '%04d' % (epoch +1), ' cost = ','{:.9f}'.format(avg_cost))
    
    #test the model using test sets
    print('Accuracy:', accuracy.eval(session = sess, feed_dict = {X: mnist.test.images,
                                                                 Y: mnist.test.labels}))

Epoch:  0001  cost =  0.488696199
Epoch:  0002  cost =  0.190954977
Epoch:  0003  cost =  0.158582509
Epoch:  0004  cost =  0.142573393
Epoch:  0005  cost =  0.134667423
Epoch:  0006  cost =  0.127765756
Epoch:  0007  cost =  0.111130574
Epoch:  0008  cost =  0.114275015
Epoch:  0009  cost =  0.111815214
Epoch:  0010  cost =  0.106692245
Epoch:  0011  cost =  0.116992086
Epoch:  0012  cost =  0.097166270
Epoch:  0013  cost =  0.100400571
Epoch:  0014  cost =  0.127970845
Epoch:  0015  cost =  0.108218026
Accuracy: 0.9681


Even though the network is deeper than before, accuracy is slightly lower (0.9681 vs. 0.9691).
 - The likely cause is overfitting
 
Dropout is used to prevent this.

# Dropout for MNIST
- keep_prob is typically 0.5~0.7
- At test time keep_prob must be 1
- That is why keep_prob is defined as a placeholder

In [16]:
import tensorflow as tf
import random
# import matplotlib.pyplot as plt

from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data/', one_hot = True) # convert labels to one-hot

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [24]:
tf.reset_default_graph() 

# keep_prob is 0.7 during training, but should be 1 for testing
keep_prob = tf.placeholder(tf.float32)

In [25]:
X = tf.placeholder(tf.float32, [None, 784])
Y = tf.placeholder(tf.float32, [None,10])

W1 = tf.get_variable('W1', shape = [784,512],
                    initializer = tf.contrib.layers.xavier_initializer())
b1 = tf.Variable(tf.random_normal([512]))
L1 = tf.nn.relu(tf.matmul(X,W1)+b1)
L1 = tf.nn.dropout(L1, keep_prob = keep_prob)

W2 = tf.get_variable('W2', shape = [512, 512],
                    initializer = tf.contrib.layers.xavier_initializer())
b2 = tf.Variable(tf.random_normal([512]))
L2 = tf.nn.relu(tf.matmul(L1, W2)+b2)
L2 = tf.nn.dropout(L2, keep_prob = keep_prob)

W3 = tf.get_variable('W3', shape = [512, 512],
                    initializer = tf.contrib.layers.xavier_initializer())
b3 = tf.Variable(tf.random_normal([512]))
L3 = tf.nn.relu(tf.matmul(L2, W3)+b3)
L3 = tf.nn.dropout(L3, keep_prob = keep_prob)

W4 = tf.get_variable('W4', shape = [512, 512],
                    initializer = tf.contrib.layers.xavier_initializer())
b4 = tf.Variable(tf.random_normal([512]))
L4 = tf.nn.relu(tf.matmul(L3, W4)+b4)
L4 = tf.nn.dropout(L4, keep_prob = keep_prob)

W5 = tf.get_variable('W5', shape = [512, 10],
                    initializer = tf.contrib.layers.xavier_initializer())
b5 = tf.Variable(tf.random_normal([10]))
hypothesis = tf.matmul(L4, W5) + b5  # no dropout on the output layer


In [28]:
#define cost/loss & Optimizer
learning_rate = 0.001  # smaller learning rate than before
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits = hypothesis,
                                                             labels = Y))
optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)

is_correct = tf.equal(tf.argmax(hypothesis, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

In [29]:
training_epochs = 15
batch_size = 100

config = tf.ConfigProto(device_count={'GPU': 0})
with tf.Session(config = config) as sess:
    sess.run(tf.global_variables_initializer())
    #Training cycle:
    for epoch in range(training_epochs):
        avg_cost = 0
        total_batch = int(mnist.train.num_examples/batch_size) # number of iterations per epoch
        
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)  # read the data 100 examples at a time
            c, _ = sess.run([cost, optimizer], feed_dict = {X:batch_xs, Y: batch_ys, keep_prob:0.7})
            avg_cost += c / total_batch
            
        print('Epoch: ', '%04d' % (epoch +1), ' cost = ','{:.9f}'.format(avg_cost))
    
    #test the model using test sets
    print('Accuracy:', accuracy.eval(session = sess, feed_dict = {X: mnist.test.images,
                                                                 Y: mnist.test.labels,
                                                                 keep_prob: 1}))

Epoch:  0001  cost =  0.454613173
Epoch:  0002  cost =  0.171633255
Epoch:  0003  cost =  0.132865671
Epoch:  0004  cost =  0.104946284
Epoch:  0005  cost =  0.093991493
Epoch:  0006  cost =  0.082176484
Epoch:  0007  cost =  0.075727579
Epoch:  0008  cost =  0.065757217
Epoch:  0009  cost =  0.063511627
Epoch:  0010  cost =  0.059284707
Epoch:  0011  cost =  0.056172906
Epoch:  0012  cost =  0.048765170
Epoch:  0013  cost =  0.051348418
Epoch:  0014  cost =  0.045860301
Epoch:  0015  cost =  0.046133322
Accuracy: 0.9829


# Optimizer
<img src="opt.png">

- A site where you can watch simulations of which optimizer works well
    - http://www.denizyuret.com/2015/03/alec-radfords-animations-for.html
    
- Adam Optimizer is generally a good choice
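
For intuition about what Adam does, here is a minimal NumPy sketch of a single update step, applied to a toy problem (minimizing theta squared). The function name `adam_step` is illustrative. The `beta1`, `beta2`, and `eps` values are the commonly cited defaults, and `lr=0.01` mirrors the learning rate used earlier in these notes.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # Moving average of the gradient (first moment)
    m = beta1 * m + (1 - beta1) * grad
    # Moving average of the squared gradient (second moment)
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction, since m and v start at zero
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem: minimize f(theta) = theta^2, whose gradient is 2 * theta
theta = np.array([5.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)

print(theta)
```

Each parameter's step is scaled by its own gradient history, which is one reason Adam tends to need less learning-rate tuning than plain gradient descent.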


<img src="sum.png">