# Chapter 13: Convolutional Neural Networks

#### Intro

* CNNs are most widely used for visual perception, and are also applied to voice recognition and natural language processing (NLP).
* In contrast to fully connected DNNs (which scale badly to inputs with millions of features), each neuron is connected to only a small region of the previous layer.
* Neurons in the first convolutional layer are not connected to every single pixel in the input image, but only to pixels in their receptive fields. In turn, each neuron in the second convolutional layer is connected only to neurons located within a small rectangle in the first layer.
* This idea is based on how the visual cortex works: each neuron responds to only a subset of the previous layer (its local receptive field), and successive layers combine these responses to detect increasingly complex patterns.
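To make the scaling argument concrete, here is a rough parameter count for one fully connected layer and one convolutional layer on the same input. All sizes (a 100x100 RGB image, 1,000 hidden neurons, 32 filters of 5x5) are illustrative assumptions and do not come from the chapter. The convolutional count also relies on each filter reusing the same weights at every position.

```python
# Illustrative sizes only; none of these numbers come from the chapter.
height, width, channels = 100, 100, 3   # assumed input image
n_hidden = 1000                          # assumed fully connected layer width
n_filters, f_h, f_w = 32, 5, 5           # assumed convolutional layer

# Fully connected: every input value connects to every neuron, plus one bias each.
dense_params = height * width * channels * n_hidden + n_hidden

# Convolutional: each filter has one f_h x f_w x channels kernel (plus a bias)
# that is reused at every position, so the count is independent of image size.
conv_params = (f_h * f_w * channels + 1) * n_filters

print(dense_params)  # 30001000
print(conv_params)   # 2432
```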

#### Implementation

* A neuron located in row $i$, column $j$ of a layer is connected to the neurons in the previous layer located in rows $i$ to $i + f_h - 1$, columns $j$ to $j + f_w - 1$.
* $f_h$ and $f_w$ are the height and width of the receptive field.
* The distance between two consecutive receptive fields is called the *stride*; the index ranges above assume a stride of 1.
* To keep a layer's height and width the same as the previous layer's, it is common to add zeros around the inputs. This is called zero padding.
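The indexing rules above can be sketched with a naive single-channel convolution in NumPy. This is a minimal illustration, not how TensorFlow implements it: the function name `conv2d_single` and its `zero_pad` parameter are hypothetical, and for a stride $s$ the receptive field of output neuron $(i, j)$ is assumed to start at row $i \cdot s$, column $j \cdot s$ of the padded input.

```python
import numpy as np

def conv2d_single(image, kernel, stride=1, zero_pad=0):
    """Naive single-channel 2D convolution (hypothetical helper for illustration)."""
    f_h, f_w = kernel.shape
    # Zero padding: add `zero_pad` rows/columns of zeros on every side
    padded = np.pad(image, zero_pad, mode="constant")
    out_h = (padded.shape[0] - f_h) // stride + 1
    out_w = (padded.shape[1] - f_w) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Receptive field: rows r..r+f_h-1, columns c..c+f_w-1 of the padded input
            r, c = i * stride, j * stride
            out[i, j] = np.sum(padded[r:r + f_h, c:c + f_w] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3))

same = conv2d_single(image, kernel, stride=1, zero_pad=1)
strided = conv2d_single(image, kernel, stride=2, zero_pad=1)
print(same.shape)     # (5, 5)
print(strided.shape)  # (3, 3)
```

With a 3x3 receptive field, one ring of zeros keeps the 5x5 output the same size as the input at stride 1, while a stride of 2 roughly halves each dimension.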

#### Visual representation

<img src="files/conv_vis_rep.png">

In [71]:
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

In [93]:
# Layer factories: each returns a function that maps an input tensor to an output tensor
def conv_layer(filters=32, activation_fn=tf.nn.relu):
    # 5x5 convolution; "SAME" zero padding keeps the height and width unchanged
    return lambda layer: tf.layers.conv2d(
        inputs=layer,
        filters=filters,
        kernel_size=[5, 5],
        padding="SAME",
        activation=activation_fn)

def relu_layer():
    return lambda layer: tf.nn.relu(layer)

def pool_layer():
    # 2x2 max pooling with stride 2 halves the height and width
    return lambda layer: tf.nn.max_pool(
        layer,
        ksize=[1, 2, 2, 1],
        strides=[1, 2, 2, 1],
        padding="VALID")

def fully_connected_layer(n_inputs, n_outputs, activation_fn=tf.nn.relu):
    # Flatten each example to n_inputs values before the dense layer
    return lambda layer: tf.contrib.layers.fully_connected(
        tf.reshape(layer, [-1, n_inputs]),
        n_outputs,
        activation_fn=activation_fn)

def softmax_layer():
    return lambda layer: tf.nn.softmax(layer)

# Build the network by applying each layer function in order
def create_nn(X, layer_fs):
    for layer_f in layer_fs:
        X = layer_f(X)
    return X
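The composition pattern used by `create_nn` does not depend on TensorFlow, so it can be sketched with plain numbers. The factories `scale_layer` and `shift_layer` below are hypothetical stand-ins for the real layer factories. They exist only to show that each factory returns a one-argument function and that `create_nn` threads a value through the list in order.

```python
from functools import reduce

def scale_layer(factor):
    # Hypothetical stand-in for a layer factory such as conv_layer()
    return lambda x: x * factor

def shift_layer(offset):
    # Hypothetical stand-in for a second kind of layer
    return lambda x: x + offset

def create_nn(X, layer_fs):
    # Same loop as in the notebook: feed each layer's output to the next layer
    for layer_f in layer_fs:
        X = layer_f(X)
    return X

layers = [scale_layer(2), shift_layer(3), scale_layer(10)]
result = create_nn(1, layers)

# The loop is a left fold over the list of layer functions
folded = reduce(lambda acc, f: f(acc), layers, 1)
print(result, folded)  # 50 50
```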

In [94]:
tf.reset_default_graph()

# Load sample images from CIFAR-10 (162.17 MiB download):
# 60,000 32x32 colour images in 10 classes, 6,000 per class;
# the TRAIN split used here contains 50,000 of them
dataset = tfds.load(name="cifar10", split=tfds.Split.TRAIN).map(
    lambda elem : {"image": tf.cast(elem['image'], tf.float64), 
                   "label": elem['label']})

# Load shapes
height, width, channels = dataset.output_shapes['image']
n_outputs = 10

In [95]:
save_path = "./c13_1.ckpt"

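# Training hyperparameters
num_epochs = 10
batch_size = 64     # examples per batch
num_dataset = 6400  # training examples used (100 batches per epoch)

# Dataset
train_dataset = dataset.map(
    lambda elem: {"image": tf.to_float(elem['image']),
                  "label": elem['label']})
train_dataset = train_dataset.take(num_dataset)
train_dataset = train_dataset.shuffle(buffer_size=1000)
train_dataset = train_dataset.batch(batch_size)
iterator = train_dataset.make_initializable_iterator()
elem = iterator.get_next()
X, y = elem['image'], elem['label']

# Flattened size after the conv layer ("SAME" padding keeps 32x32) and the
# 2x2 pooling layer (halves each side): 16 * 16 * 32 = 8192 values per image
n_filters = 32
n_flat = (int(height) // 2) * (int(width) // 2) * n_filters

# Create CNN layers
# Note: both fully connected layers use the default ReLU activation, so the
# logits passed to the loss are non-negative; pass activation_fn=None to the
# last layer to get unclipped logits
output = create_nn(X, [conv_layer(filters=n_filters),
                       pool_layer(),
                       fully_connected_layer(n_flat, n_outputs),
                       fully_connected_layer(n_outputs, n_outputs)
                      ])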

# Loss and training operation
loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=output)
training_op = tf.train.AdamOptimizer().minimize(loss)

predictions = {
    "classes": tf.argmax(input=output, axis=1),
    "probabilities": tf.nn.softmax(output, name="softmax_tensor"),
    "loss": loss
}

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(num_epochs):
        loss_sum = 0
        sess.run(iterator.initializer)
        try:
            while True:
                _, result = sess.run([training_op, predictions])
                loss_sum += result["loss"]
        except tf.errors.OutOfRangeError:
            pass
        print(epoch, loss_sum)
    tf.train.Saver().save(sess, save_path)

0 872.2247304916382
1 228.6411497592926
2 222.81011629104614
3 213.71810901165009
4 202.59857857227325
5 191.03315341472626
6 181.55472087860107
7 171.2621158361435
8 161.51873195171356
9 153.3797744512558
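The first fully connected layer in the network above flattens its input, so its `n_inputs` must equal the number of values the pooling layer produces per image. The sketch below derives that number from TensorFlow's documented `"SAME"` and `"VALID"` output-size rules. The helper `out_size` is a hypothetical name introduced here, and the 32x32 input size is taken from the CIFAR-10 description in the notebook.

```python
import math

def out_size(in_size, kernel, stride, padding):
    """Spatial output size under TensorFlow's "SAME" / "VALID" padding rules."""
    if padding == "SAME":
        return math.ceil(in_size / stride)
    if padding == "VALID":
        return math.ceil((in_size - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

height = width = 32   # CIFAR-10 image size
n_filters = 32        # default of conv_layer()

# 5x5 convolution, stride 1, "SAME" padding: spatial size is unchanged
conv_h = out_size(height, kernel=5, stride=1, padding="SAME")
conv_w = out_size(width, kernel=5, stride=1, padding="SAME")

# 2x2 max pooling, stride 2, "VALID" padding: spatial size is halved
pool_h = out_size(conv_h, kernel=2, stride=2, padding="VALID")
pool_w = out_size(conv_w, kernel=2, stride=2, padding="VALID")

n_flat = pool_h * pool_w * n_filters
print(conv_h, conv_w)  # 32 32
print(pool_h, pool_w)  # 16 16
print(n_flat)          # 8192
```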


In [96]:
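# Collect predictions and labels over the same 6,400 training examples
# (no held-out test split is used here)
y_pred = []
y_target = []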

with tf.Session() as sess:
    tf.train.Saver().restore(sess, save_path)
    sess.run(iterator.initializer)
    try:
        while True:
            y_pred_batch, y_target_batch = sess.run([predictions["classes"], y])
            y_pred.append(y_pred_batch)
            y_target.append(y_target_batch)
    except tf.errors.OutOfRangeError:
        pass

INFO:tensorflow:Restoring parameters from ./c13_1.ckpt




In [106]:
# Concatenate the per-batch arrays; this also works if the last batch is smaller
y_target_flat = np.concatenate(y_target)
y_pred_flat = np.concatenate(y_pred)

print("Target   ", y_target_flat[:30])
print("Predicted", y_pred_flat[:30])

Target    [8 6 6 8 6 2 3 6 9 8 7 1 3 1 2 1 0 7 2 3 4 2 7 2 8 0 2 3 0 9]
Predicted [8 6 3 9 6 6 7 5 9 9 7 5 2 4 0 8 9 7 7 6 6 9 7 5 0 0 2 6 8 9]
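A quick way to summarise the comparison is to count matches. The sketch below uses only the 30 values printed above, copied by hand, so it is a rough indication on a small slice of the training data and not a measure of how well the model generalises.

```python
import numpy as np

# First 30 targets and predictions, copied from the printed output above
target = np.array([8, 6, 6, 8, 6, 2, 3, 6, 9, 8, 7, 1, 3, 1, 2,
                   1, 0, 7, 2, 3, 4, 2, 7, 2, 8, 0, 2, 3, 0, 9])
predicted = np.array([8, 6, 3, 9, 6, 6, 7, 5, 9, 9, 7, 5, 2, 4, 0,
                      8, 9, 7, 7, 6, 6, 9, 7, 5, 0, 0, 2, 6, 8, 9])

# Element-wise comparison gives a boolean array; summing it counts the matches
matches = int(np.sum(target == predicted))
accuracy = matches / len(target)

print(matches, len(target))  # 10 30
print(round(accuracy, 3))    # 0.333
```

The same two lines applied to `y_target_flat` and `y_pred_flat` would give the accuracy over all 6,400 examples.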
