# Batch Normalization - Solutions
Batch normalization is most useful when building deep neural networks. To demonstrate this, we'll create a convolutional neural network with 19 convolutional layers, followed by a fully connected layer. We'll use it to classify handwritten digits in the MNIST dataset, which should be familiar to you by now.

This is **not** a good network for classifying MNIST digits. You could create a much simpler network and get better results. However, to give you hands-on experience with batch normalization, we had to make an example that was:
1. Complicated enough that training would benefit from batch normalization.
2. Simple enough that it would train quickly, since this is meant to be a short exercise just to give you some practice adding batch normalization.
3. Simple enough that the architecture would be easy to understand without additional resources.

This notebook includes two versions of the network that you can edit. The first uses higher-level functions from the tf.layers package. The second is the same network, but uses only lower-level functions in the tf.nn package.
1. Batch Normalization with tf.layers.batch_normalization
2. Batch Normalization with tf.nn.batch_normalization
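Both versions apply the same arithmetic. As a rough NumPy sketch (not TensorFlow code), following the formula documented for `tf.nn.batch_normalization`, a batch normalization layer subtracts a mean, divides by the square root of the variance plus a small epsilon, and then applies a learned scale (gamma) and offset (beta). The helper `batch_norm`, the random data, and the epsilon of 1e-3 are illustrative assumptions. The sketch also shows why inference should use population statistics rather than the statistics of the current batch.

```python
import numpy as np

def batch_norm(x, mean, variance, offset, scale, epsilon=1e-3):
    """scale * (x - mean) / sqrt(variance + epsilon) + offset (epsilon is an assumed value)."""
    return scale * (x - mean) / np.sqrt(variance + epsilon) + offset

rng = np.random.default_rng(0)
# Hypothetical pre-activation values: a batch of 64 examples, 100 units each.
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 100))
gamma = np.ones(100)   # learned scale, initialized to 1 in this sketch
beta = np.zeros(100)   # learned offset, initialized to 0 in this sketch

# Training: statistics come from the current batch (axis 0 = examples).
batch_mean = batch.mean(axis=0)
batch_var = batch.var(axis=0)
train_out = batch_norm(batch, batch_mean, batch_var, beta, gamma)

# A single example normalized with its *own* batch statistics: the mean equals
# the example itself and the variance is 0, so every unit collapses to beta.
single = batch[:1]
collapsed = batch_norm(single, single.mean(axis=0), single.var(axis=0), beta, gamma)

# Inference with population statistics. Here the training-batch values stand in
# for the moving averages a real network would accumulate during training.
pop_out = batch_norm(single, batch_mean, batch_var, beta, gamma)
```

The collapsed output illustrates why the training cells score test images one at a time: a network that kept using batch statistics at inference would produce the same output for every single-image batch.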

The following cell loads TensorFlow, downloads the MNIST dataset if necessary, and loads it into an object named mnist. You'll need to run this cell before running anything else in the notebook.

In [1]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True, reshape=False)

Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [2]:
"""
DO NOT MODIFY THIS CELL
"""
def fully_connected(prev_layer, num_units):
    """
    Create a fully connected layer with the given layer as input and the given number of neurons.
    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param num_units: int
        The size of the layer. That is, the number of units, nodes, or neurons.
    :returns Tensor
        A new fully connected layer
    """
    layer = tf.layers.dense(prev_layer, num_units, activation=tf.nn.relu)
    return layer
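For intuition, the call to tf.layers.dense with activation=tf.nn.relu computes ReLU(x·W + b) for a learned weight matrix W and bias b. The NumPy sketch below shows only that forward computation; `dense_relu`, the random weights, and the input size of 76 are illustrative assumptions, not values from the notebook.

```python
import numpy as np

def dense_relu(x, weights, bias):
    """Forward pass of a fully connected layer with ReLU: max(x @ W + b, 0)."""
    return np.maximum(x @ weights + bias, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 76))                     # hypothetical batch: 64 examples, 76 features
weights = rng.normal(scale=0.1, size=(76, 100))   # illustrative weights for 100 units
bias = np.zeros(100)
out = dense_relu(x, weights, bias)                # shape (64, 100), all values >= 0
```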

We'll use the following function to create convolutional layers in our network. It is very basic: we always use a 3x3 kernel and ReLU activation functions, give each layer layer_depth*4 feature maps, and use strides of 2x2 on layers whose depth is a multiple of 3 and strides of 1x1 on all other layers. We aren't bothering with pooling layers at all in this network.

This version of the function does not include batch normalization.

In [3]:
"""
DO NOT MODIFY THIS CELL
"""
def conv_layer(prev_layer, layer_depth):
    """
    Create a convolutional layer with the given layer as input.
    
    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param layer_depth: int
        We'll set the strides and number of feature maps based on the layer's depth in the network.
        This is *not* a good way to make a CNN, but it helps us create this example with very little code.
    :return Tensor
        A new convolutional layer
    """
    strides = 2 if layer_depth % 3 == 0 else 1
    conv_layer = tf.layers.conv2d(prev_layer, layer_depth*4, 3, strides, 'same', activation=tf.nn.relu)
    return conv_layer
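The stride rule is easiest to see by tracing shapes. The plain-Python sketch below assumes, as the training cell does, that conv_layer is called for depths 1 through 19 (`range(1, 20)`) on a 28x28x1 input, and that 'same' padding yields an output size of ceil(input / stride), which is how TensorFlow defines 'same' padding. `conv_stack_shapes` is a hypothetical helper, not part of the notebook.

```python
import math

def conv_stack_shapes(input_size=28, num_layers=19):
    """Trace (depth, stride, spatial size, channels) through successive conv_layer calls."""
    shapes = []
    size = input_size
    for depth in range(1, num_layers + 1):
        stride = 2 if depth % 3 == 0 else 1   # same stride rule as conv_layer
        size = math.ceil(size / stride)       # 'same' padding: out = ceil(in / stride)
        channels = depth * 4                  # conv_layer uses layer_depth*4 filters
        shapes.append((depth, stride, size, channels))
    return shapes

for depth, stride, size, channels in conv_stack_shapes():
    print(f"layer {depth:2d}: stride {stride}, output {size}x{size}x{channels}")
```

Six layers (depths 3, 6, 9, 12, 15, and 18) downsample, shrinking 28x28 to 1x1, so the last layer outputs 1x1x76 and the flatten step in the training cell produces 76 features for the 100-unit fully connected layer.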

**Run the following cell**, along with the earlier cells (to load the dataset and define the necessary functions).
This cell builds the network **without** batch normalization, then trains it on the MNIST dataset. It displays loss and accuracy data periodically while training.
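The printed losses are easier to interpret with two reference points. The cell applies `tf.nn.sigmoid_cross_entropy_with_logits` and averages over all 10 output units, so each unit is scored as an independent yes/no prediction rather than through a softmax over the classes. The plain-Python sketch below reproduces that averaged loss for one example, using the per-element formula given in the TensorFlow documentation for that function; `mean_sigmoid_ce` is a hypothetical helper.

```python
import math

def mean_sigmoid_ce(logits, labels):
    """Average sigmoid cross-entropy over every output unit.

    Per-element form: max(x, 0) - x * z + log(1 + exp(-|x|)),
    where x is a logit and z is its 0/1 label.
    """
    terms = [max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))
             for x, z in zip(logits, labels)]
    return sum(terms) / len(terms)

# One-hot label for a single digit (which class is marked does not matter here).
label = [0.0] * 10
label[3] = 1.0

# Logits of zero: every sigmoid output is 0.5.
untrained = mean_sigmoid_ce([0.0] * 10, label)

# A network that ignores its input and predicts p = 0.1 for every class.
base_rate_logit = math.log(0.1 / 0.9)
base_rate = mean_sigmoid_ce([base_rate_logit] * 10, label)

print(f"all-zero logits:  {untrained:.5f}")
print(f"constant p = 0.1: {base_rate:.5f}")
```

All-zero logits give ln 2 (about 0.69315), close to the loss reported at batch 0, and a constant prediction of 0.1 per class gives about 0.32508. A loss hovering near 0.325 while accuracy stays around 10%, as in this cell's output, suggests the network without batch normalization has learned little beyond the class frequencies.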

In [7]:
"""
DO NOT MODIFY THIS CELL
"""
def train(num_batches, batch_size, learning_rate):
    # Build placeholders for the input samples and labels
    inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])
    labels = tf.placeholder(tf.float32, [None, 10])
    
    # Feed the inputs into a series of 19 convolutional layers (depths 1 through 19)
    layer = inputs
    for layer_i in range(1, 20):
        layer = conv_layer(layer, layer_i)
    
    # Flatten the output from the convolutional layers
    orig_shape = layer.get_shape().as_list()
    layer = tf.reshape(layer, shape=[-1, orig_shape[1] * orig_shape[2] * orig_shape[3]])
    
    # Add one fully connected layer
    layer = fully_connected(layer, 100)
    
    # Create the output layer with 1 node for each of the 10 digit classes
    logits = tf.layers.dense(layer, 10)
    
    print(logits)
    
    # Define the loss
    model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))
    
    train_opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)
    
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    
    # Train and test the network
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for batch_i in range(num_batches):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        
            # train this batch
            sess.run(train_opt, {inputs: batch_xs,
                                labels: batch_ys})
            
            # Periodically check the validation or training loss and accuracy
            if batch_i % 100 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,
                                                             labels: mnist.validation.labels})
                print('Batch: {:>2}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'.format(batch_i, loss, acc))
            elif batch_i % 25 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs, labels: batch_ys})
                print('Batch: {:>2}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'.format(batch_i, loss, acc))
        
        # At the end, score the final accuracy for both the validation and test sets
        acc = sess.run(accuracy, {inputs: mnist.validation.images,
                                 labels: mnist.validation.labels})
        print('Final validation accuracy: {:>3.5f}'.format(acc))
        acc = sess.run(accuracy, {inputs: mnist.test.images,
                                 labels: mnist.test.labels})
        print('Final test accuracy: {:>3.5f}'.format(acc))
        
        # Score the first 100 test images individually, just to make sure batch normalization really worked
        correct = 0
        for i in range(100):
            correct += sess.run(accuracy, feed_dict={inputs: [mnist.test.images[i]],
                                                    labels: [mnist.test.labels[i]]})
        print('Accuracy on 100 samples:', correct/100)
        
num_batches = 800
batch_size = 64
learning_rate = 0.002

tf.reset_default_graph()
with tf.Graph().as_default():
    train(num_batches, batch_size, learning_rate)

Tensor("dense_2/BiasAdd:0", shape=(?, 10), dtype=float32)
Batch:  0: Validation loss: 0.69004, Validation accuracy: 0.11260
Batch: 25: Training loss: 0.37000, Training accuracy: 0.09375
Batch: 50: Training loss: 0.32630, Training accuracy: 0.15625
Batch: 75: Training loss: 0.32496, Training accuracy: 0.10938
Batch: 100: Validation loss: 0.32535, Validation accuracy: 0.09900
Batch: 125: Training loss: 0.32635, Training accuracy: 0.06250
Batch: 150: Training loss: 0.32320, Training accuracy: 0.15625
Batch: 175: Training loss: 0.32810, Training accuracy: 0.14062
Batch: 200: Validation loss: 0.32530, Validation accuracy: 0.11260
Batch: 225: Training loss: 0.32757, Training accuracy: 0.09375
Batch: 250: Training loss: 0.32598, Training accuracy: 0.10938
Batch: 275: Training loss: 0.32371, Training accuracy: 0.12500
Batch: 300: Validation loss: 0.32534, Validation accuracy: 0.09060
Batch: 325: Training loss: 0.32623, Training accuracy: 0.10938
Batch: 350: Training loss: 0.32395, Training acc