# Gradient vanishing and exploding
## 2. Batch Normalization
### 2.1 Problem

Using better initialization methods and activation functions eases the vanishing/exploding gradient problem quite a lot, but it cannot avoid it completely. The solution I'll introduce in this note is very effective.

### 2.2 Solution

In 2015, a research paper by Sergey Ioffe and Christian Szegedy proposed **batch normalization** to solve the gradient problem. It adds an operation just before the activation function of each layer: it zero-centers and normalizes the inputs, then scales and shifts the result using two additional parameters per layer, γ (gamma) and β (beta).

The mean and standard deviation are computed over each mini-batch.

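- (1) Compute the mean μ_B and standard deviation σ_B of the inputs over the mini-batch.

- (2) Normalize each input x(i) of the mini-batch: x̂(i) = (x(i) - μ_B) / sqrt(σ_B² + ε), where ε is a small constant that avoids division by zero.

- (3) Scale and shift the normalized input to get the output: z(i) = γ · x̂(i) + β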
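The three steps above can be sketched in plain Python for a single feature. This is only an illustration of the formulas, not TensorFlow's implementation: the function name `batch_norm_forward`, the sample inputs, and `eps=1e-5` are assumptions chosen for the example, and step (3) is applied to the normalized values.

```python
import math


def batch_norm_forward(xs, gamma, beta, eps=1e-5):
    """Batch-normalize one feature over a mini-batch `xs` (a list of floats).

    Hypothetical helper for illustration only.
    """
    m = len(xs)
    # (1) mini-batch mean and variance (average squared deviation)
    mu = sum(xs) / m
    var = sum((x - mu) ** 2 for x in xs) / m
    # (2) normalize to zero mean and unit variance; eps avoids division by zero
    x_hat = [(x - mu) / math.sqrt(var + eps) for x in xs]
    # (3) scale by gamma and shift by beta
    return [gamma * xh + beta for xh in x_hat]


z = batch_norm_forward([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=0.5)
print([round(v, 3) for v in z])
```

Whatever the scale of the raw inputs, the output has mean β and standard deviation (almost exactly) γ, so the network itself learns which scale and offset suit each layer.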

---

With this method, the performance of deep neural networks improves greatly.

- The vanishing gradient problem is greatly reduced, even with saturating activation functions such as tanh and sigmoid.

- Much larger learning rates can be used.

- It acts as a regularizer, reducing overfitting a bit.

However, it adds computation to every layer, which increases model complexity and computational cost.


### 2.3 In TensorFlow


```python
import tensorflow as tf  # TensorFlow 1.x graph API

tf.reset_default_graph()

n_inputs = 28*28  # MNIST images are 28x28 pixels
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")

# False by default (inference mode); feed True during training
training = tf.placeholder_with_default(False, shape=(), name="training")

# layer1: dense -> batch norm -> ELU
hidden1 = tf.layers.dense(X, n_hidden1, name="hidden1")
bn1 = tf.layers.batch_normalization(hidden1, training=training, momentum=0.9)
bn1_act = tf.nn.elu(bn1)

# layer2: dense -> batch norm -> ELU
hidden2 = tf.layers.dense(bn1_act, n_hidden2, name="hidden2")
bn2 = tf.layers.batch_normalization(hidden2, training=training, momentum=0.9)
bn2_act = tf.nn.elu(bn2)

# output: dense -> batch norm (no activation; softmax is applied in the loss)
logits_before_bn = tf.layers.dense(bn2_act, n_outputs, name="outputs")
logits = tf.layers.batch_normalization(logits_before_bn, training=training, momentum=0.9)
```



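Batch normalization also keeps moving averages of the mean and variance, which are used at inference time, and updates them with exponential decay (similar to exponential smoothing).

The `momentum` parameter controls this update: at each step, moving_average = moving_average × momentum + batch_value × (1 - momentum). A value close to 1 is appropriate, especially when mini-batches are small (e.g. from 0.9 up to 0.999).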
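A minimal sketch of the exponential-decay update in plain Python. The per-batch means below are made-up values, the helper names are hypothetical, and starting the moving average at 0.0 mirrors TensorFlow's default zero initialization of the moving mean.

```python
def update_moving_average(moving, batch_value, momentum):
    """One exponential-decay step: keep `momentum` of the old estimate and
    blend in (1 - momentum) of the current mini-batch statistic."""
    return moving * momentum + batch_value * (1.0 - momentum)


def moving_average_after(batch_values, momentum):
    """Apply the update once per mini-batch, starting from 0.0."""
    moving = 0.0
    for value in batch_values:
        moving = update_moving_average(moving, value, momentum)
    return moving


# Made-up per-batch means of one feature, fluctuating around 5.0 (200 batches)
batch_means = [4.0, 6.0, 5.0, 4.5, 5.5] * 40

fast = moving_average_after(batch_means, momentum=0.9)
slow = moving_average_after(batch_means, momentum=0.999)
print(round(fast, 3), round(slow, 3))
```

With momentum 0.9 the estimate settles near 5.0 within a few dozen batches, while with 0.999 it is still far from 5.0 after 200 batches: a larger momentum averages over more batches (less noise from small mini-batches) but needs more training steps to converge.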

##### partial() function

To reduce duplication of code, we can use Python's `functools.partial()` function, which fixes the repeated arguments (`training` and `momentum`) once.

```python
from functools import partial

tf.reset_default_graph()

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")

training = tf.placeholder_with_default(False, shape=(), name="training")

# Freeze the shared arguments once; every call below reuses them
my_batch_norm_layer = partial(tf.layers.batch_normalization, training=training, momentum=0.9)

hidden1 = tf.layers.dense(X, n_hidden1, name="hidden1")
bn1 = my_batch_norm_layer(hidden1)
bn1_act = tf.nn.elu(bn1)

hidden2 = tf.layers.dense(bn1_act, n_hidden2, name="hidden2")
bn2 = my_batch_norm_layer(hidden2)
bn2_act = tf.nn.elu(bn2)

logits_before_bn = tf.layers.dense(bn2_act, n_outputs, name="outputs")
logits = my_batch_norm_layer(logits_before_bn)
```

##### Differences at run time

There are two differences in the execution phase.

First, while training the model, we have to set the `training` placeholder to `True` whenever we run an operation that depends on `batch_normalization()`.

Second, `batch_normalization()` creates some operations (the updates of the moving averages) that must be evaluated at every training step. These operations are automatically added to the `UPDATE_OPS` collection, so all we have to do is fetch them from that collection and run them at every training iteration.
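To see why the `training` flag and the update operations matter, here is a toy, single-feature batch norm layer in plain Python. The class `SimpleBatchNorm` is hypothetical and only mimics the behavior described above: in training mode it normalizes with mini-batch statistics and updates the moving averages (the work TensorFlow places in `UPDATE_OPS`); in inference mode it uses the moving averages and leaves them untouched. The momentum 0.9 matches the code above; `eps=1e-3` and the input values are assumptions.

```python
import math


class SimpleBatchNorm:
    """Toy single-feature batch norm (hypothetical, not TensorFlow's code)."""

    def __init__(self, momentum=0.9, eps=1e-3):
        self.gamma, self.beta = 1.0, 0.0
        self.momentum, self.eps = momentum, eps
        # Running statistics used at inference time (start: mean 0, variance 1)
        self.moving_mean, self.moving_var = 0.0, 1.0

    def __call__(self, xs, training=False):
        if training:
            # Training mode: normalize with the current mini-batch statistics...
            mean = sum(xs) / len(xs)
            var = sum((x - mean) ** 2 for x in xs) / len(xs)
            # ...and update the moving averages (the role of UPDATE_OPS)
            m = self.momentum
            self.moving_mean = self.moving_mean * m + mean * (1 - m)
            self.moving_var = self.moving_var * m + var * (1 - m)
        else:
            # Inference mode: use the moving averages, do not update them
            mean, var = self.moving_mean, self.moving_var
        return [self.gamma * (x - mean) / math.sqrt(var + self.eps) + self.beta
                for x in xs]


bn = SimpleBatchNorm()
for _ in range(100):
    bn([9.0, 10.0, 11.0, 10.0], training=True)  # each training step updates stats
single = bn([10.0])  # inference on one example uses the moving averages
print(bn.moving_mean, bn.moving_var, single)
```

If the update step were skipped, the moving averages would stay at their initial values (mean 0, variance 1) and inference would normalize with the wrong statistics. Normalizing a single example with its own "batch" statistics would always output β, which is why inference relies on the moving averages instead.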

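```python
# Assumes training_op, accuracy, init, saver, n_epochs, batch_size, a label
# placeholder Y, and the MNIST dataset object `mnist` are defined elsewhere.
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, Y_batch = mnist.train.next_batch(batch_size)
            # Run the batch norm update ops together with the training op,
            # with batch normalization switched to training mode
            sess.run([training_op, extra_update_ops],
                     feed_dict={training: True, X: X_batch, Y: Y_batch})
        # `training` defaults to False, so evaluation uses the moving averages
        accuracy_val = accuracy.eval(feed_dict={X: mnist.validation.images, Y: mnist.validation.labels})

        print(epoch, "Dev accuracy:", accuracy_val)

    save_path = saver.save(sess, "./my_model_final.ckpt")
```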
