### Building a Convolutional Neural Network 
This notebook is a brief guide to building a convolutional neural network in TensorFlow
that recognizes the handwritten digits in the MNIST data set. MNIST contains 60,000 training
examples and 10,000 test examples of the handwritten digits 0–9, stored as 28x28-pixel grayscale images.

The layers module provides methods for creating fully connected and convolutional layers, adding
activation functions, and applying dropout regularization. The code in this notebook builds the same
components from the lower-level `tf.nn` operations.

Convolutional neural networks (CNNs) are an architecture commonly used for image classification. A CNN
applies a series of filters to the raw pixel data of an image to extract (and learn) higher-level features,
which the model then uses for classification.

A CNN contains three components:
- Convolutional layers, each of which applies a set of convolution filters to its input. For each subregion of the input, a filter produces a single value in the output feature map.
- Pooling layers, which downsample the feature map to reduce its spatial dimensions (e.g., max pooling or average pooling). Max pooling partitions the feature map into non-overlapping rectangles (e.g., 2x2-pixel regions) and outputs the maximum of each, which shrinks the representation and the amount of computation needed in the network. This also helps control overfitting. Pooling layers are commonly inserted between successive convolutional layers.
- Fully-connected layers (where every node in the layer is connected to every node in the preceding layer), which perform classification on the features extracted by the convolutional layers.
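
The first two components can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not the notebook's implementation: the helper names `conv2d_valid` and `max_pool`, the 5x5 image, and the 2x2 filter are all made up for illustration. It uses no padding, whereas the notebook's layers use `SAME` padding.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # Each subregion of the image is reduced to a single value.
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, k=2):
    """Keep the maximum of each non-overlapping k x k block."""
    return [
        [
            max(feature_map[r + i][c + j] for i in range(k) for j in range(k))
            for c in range(0, len(feature_map[0]) - k + 1, k)
        ]
        for r in range(0, len(feature_map) - k + 1, k)
    ]


# Hypothetical 5x5 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical 2x2 filter that responds to a dark-to-bright step.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d_valid(image, kernel)  # 4x4: strongest response at the edge
pooled = max_pool(feature_map)             # 2x2 after 2x2 max pooling

print(feature_map)
print(pooled)
```

The filter responds only where the image changes from dark to bright, and pooling halves each spatial dimension while keeping that response.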

A CNN is typically composed of a stack of modules, each consisting of a convolutional layer followed by a pooling layer, which together perform feature extraction. The last convolutional module is followed by one or more fully-connected (i.e., dense) layers that perform classification. The last dense layer contains one node for each class the model may predict, with a softmax activation function that produces a value between 0 and 1 for each node. The softmax values for a given image are interpreted as how likely the image is to belong to each class, and they sum to 1.
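
As a small illustration of the softmax step, here is a sketch in plain Python. The `softmax` helper and the three example scores are assumptions for illustration, not part of the notebook's code.

```python
import math


def softmax(logits):
    """Turn raw class scores into values in (0, 1) that sum to 1."""
    # Subtracting the maximum does not change the result but avoids overflow.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three classes.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

# The predicted class is the one with the largest softmax value.
predicted_class = probs.index(max(probs))

print(probs, predicted_class)
```

Because softmax preserves the ordering of the scores, taking the argmax of the raw scores gives the same predicted class.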

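Let's start by installing TensorFlow and importing the necessary modules. The code below uses the TensorFlow 1.x API (for example, `tf.placeholder` and `tf.Session`), so it needs a 1.x release to run as written.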

In [2]:
!pip install --upgrade tensorflow

In [4]:
import tensorflow as tf

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz


In [5]:
# Parameters
learning_rate = 0.001
training_iters = 200000
batch_size = 128
display_step = 10

# Network Parameters
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
dropout = 0.75 # Dropout, probability to keep units

# tf Graph input
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32) #dropout (keep probability)
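
The `keep_prob` placeholder lets the same graph run with dropout during training (`dropout = 0.75`) and without it during evaluation (`1.0`). The sketch below mimics the behaviour of `tf.nn.dropout` in plain Python: each unit is kept with probability `keep_prob` and the kept units are scaled by `1 / keep_prob`. The helper name `dropout_sketch` and the sample activations are assumptions for illustration.

```python
import random


def dropout_sketch(values, keep_prob, rng):
    """Zero each value with probability 1 - keep_prob; scale the rest by 1 / keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)       # fixed seed so the sketch is repeatable
activations = [1.0] * 10000  # hypothetical layer activations

# Training: keep roughly 75% of the units.
dropped = dropout_sketch(activations, 0.75, rng)
kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)

# Evaluation: keep_prob = 1.0 leaves the activations untouched.
unchanged = dropout_sketch(activations, 1.0, rng)

print(kept_fraction)
```

The scaling keeps the expected sum of the activations the same whether or not dropout is active, which is why no extra correction is needed at evaluation time.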

In [6]:
# Create some wrappers for simplicity
def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)


def maxpool2d(x, k=2):
    # MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME')


# Create model
def conv_net(x, weights, biases, dropout):
    # Reshape input picture
    x = tf.reshape(x, shape=[-1, 28, 28, 1])

    # Convolution Layer
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    # Max Pooling (down-sampling)
    conv1 = maxpool2d(conv1, k=2)

    # Convolution Layer
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    # Max Pooling (down-sampling)
    conv2 = maxpool2d(conv2, k=2)

    # Fully connected layer
    # Reshape conv2 output to fit fully connected layer input
    fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    # Apply Dropout
    fc1 = tf.nn.dropout(fc1, dropout)

    # Output, class prediction
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out
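
The fully connected layer expects `7*7*64` inputs. That number follows from the layer shapes: with `padding='SAME'`, the output size along each spatial dimension is the input size divided by the stride, rounded up. A short sketch of the arithmetic, where the helper name `same_padding_output` is hypothetical:

```python
import math


def same_padding_output(size, stride):
    """Spatial output size of a conv or pool op that uses 'SAME' padding."""
    return math.ceil(size / stride)


side = 28                            # input image is 28x28x1
side = same_padding_output(side, 1)  # conv1, stride 1 -> 28x28x32
side = same_padding_output(side, 2)  # pool1, stride 2 -> 14x14x32
side = same_padding_output(side, 1)  # conv2, stride 1 -> 14x14x64
side = same_padding_output(side, 2)  # pool2, stride 2 -> 7x7x64

channels = 64                        # output channels of the second conv layer
flattened = side * side * channels   # length of each flattened example

print(side, flattened)
```

This is the size that `tf.reshape` flattens each example to before the matrix multiplication with `weights['wd1']`.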

In [7]:
# Store layers weight & bias
weights = {
    # 5x5 conv, 1 input, 32 outputs
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    # 5x5 conv, 32 inputs, 64 outputs
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    # fully connected, 7*7*64 inputs, 1024 outputs
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
    # 1024 inputs, 10 outputs (class prediction)
    'out': tf.Variable(tf.random_normal([1024, n_classes]))
}

biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Construct model
pred = conv_net(x, weights, biases, keep_prob)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()

Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See @{tf.nn.softmax_cross_entropy_with_logits_v2}.
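
To make the loss and accuracy definitions concrete, the following plain-Python sketch computes both for a tiny hand-written batch. The helper names and the example logits and labels are assumptions for illustration. The numbers are not taken from the notebook.

```python
import math


def softmax_cross_entropy(logits, one_hot):
    """Cross-entropy between softmax(logits) and a one-hot label."""
    m = max(logits)
    # log(sum(exp(logits))), computed in a numerically stable way
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(t * (v - log_sum_exp) for v, t in zip(logits, one_hot))


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical batch of three examples with three classes each.
batch_logits = [[2.0, 0.5, 0.1], [0.2, 0.1, 3.0], [1.0, 1.5, 0.3]]
batch_labels = [[1, 0, 0], [0, 0, 1], [1, 0, 0]]

# cost: mean cross-entropy over the batch (the role of tf.reduce_mean)
losses = [softmax_cross_entropy(l, y) for l, y in zip(batch_logits, batch_labels)]
cost = sum(losses) / len(losses)

# accuracy: fraction of examples whose predicted class matches the label
correct = [argmax(l) == argmax(y) for l, y in zip(batch_logits, batch_labels)]
accuracy = sum(correct) / len(correct)

print(cost, accuracy)
```

The third example is misclassified, so it contributes the largest loss and brings the accuracy down to two out of three.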



In [9]:
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until the maximum number of iterations is reached
    while step * batch_size < training_iters:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y,
                                       keep_prob: dropout})
        if step % display_step == 0:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x,
                                                              y: batch_y,
                                                              keep_prob: 1.})
            print("Iter {}, Minibatch Loss= {:.6f}, Training Accuracy= {:.5f}".format(
                step * batch_size, loss, acc))
        step += 1
    print("Optimization Finished!")

    # Calculate accuracy for 256 mnist test images
    print("Testing Accuracy:",
          sess.run(accuracy, feed_dict={x: mnist.test.images[:256],
                                        y: mnist.test.labels[:256],
                                        keep_prob: 1.}))

Iter 1280, Minibatch Loss= 25206.261719, Training Accuracy= 0.27344
Iter 2560, Minibatch Loss= 11158.732422, Training Accuracy= 0.46875
Iter 3840, Minibatch Loss= 5476.475586, Training Accuracy= 0.71094
Iter 5120, Minibatch Loss= 5257.539551, Training Accuracy= 0.69531
Iter 6400, Minibatch Loss= 3663.018066, Training Accuracy= 0.78906
Iter 7680, Minibatch Loss= 2602.932129, Training Accuracy= 0.80469
Iter 8960, Minibatch Loss= 2046.976196, Training Accuracy= 0.85938
Iter 10240, Minibatch Loss= 4052.697021, Training Accuracy= 0.80469
Iter 11520, Minibatch Loss= 2944.110840, Training Accuracy= 0.82812
Iter 12800, Minibatch Loss= 2067.766357, Training Accuracy= 0.87500
Iter 14080, Minibatch Loss= 1334.779053, Training Accuracy= 0.92188
Iter 15360, Minibatch Loss= 1942.644531, Training Accuracy= 0.90625
Iter 16640, Minibatch Loss= 1402.522339, Training Accuracy= 0.90625
Iter 17920, Minibatch Loss= 1830.195557, Training Accuracy= 0.89062
Iter 19200, Minibatch Loss= 2092.083008, Training Acc

Iter 157440, Minibatch Loss= 0.000000, Training Accuracy= 1.00000
Iter 158720, Minibatch Loss= 262.972107, Training Accuracy= 0.96094
Iter 160000, Minibatch Loss= 251.116821, Training Accuracy= 0.96875
Iter 161280, Minibatch Loss= 293.155273, Training Accuracy= 0.95312
Iter 162560, Minibatch Loss= 125.187431, Training Accuracy= 0.96875
Iter 163840, Minibatch Loss= 209.545898, Training Accuracy= 0.96094
Iter 165120, Minibatch Loss= 447.211945, Training Accuracy= 0.93750
Iter 166400, Minibatch Loss= 120.575859, Training Accuracy= 0.96875
Iter 167680, Minibatch Loss= 85.991066, Training Accuracy= 0.96875
Iter 168960, Minibatch Loss= 69.658447, Training Accuracy= 0.98438
Iter 170240, Minibatch Loss= 0.000000, Training Accuracy= 1.00000
Iter 171520, Minibatch Loss= 90.872154, Training Accuracy= 0.97656
Iter 172800, Minibatch Loss= 223.042633, Training Accuracy= 0.97656
Iter 174080, Minibatch Loss= 94.388855, Training Accuracy= 0.98438
Iter 175360, Minibatch Loss= 125.202782, Training Accura
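
The `Iter` values in the log count training examples seen, not optimization steps. The sketch below replays only the loop counters from the training cell, with no TensorFlow involved, to show how many steps run and which iterations get logged. The names `logged_iters` and `total_steps` are introduced here for illustration.

```python
training_iters = 200000
batch_size = 128
display_step = 10

step = 1
logged_iters = []
# Same loop condition as the training cell, without the actual training.
while step * batch_size < training_iters:
    if step % display_step == 0:
        logged_iters.append(step * batch_size)
    step += 1

total_steps = step - 1  # number of optimizer updates that were run

print(total_steps, logged_iters[0], logged_iters[-1])
```

With these settings the loop performs 1,562 updates and logs every tenth one, which is why the first line of output reads `Iter 1280`.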