# 94-775/95-865: Image Analysis with Convolutional Neural Nets (CNNs, also called convnets)

Author: George H. Chen (georgechen [at symbol] cmu.edu)

This demo draws heavily from the handwritten digit example in Chapter 2 of Francois Chollet's "Deep Learning with Python" book. I've added a simpler single-layer example first before moving to the 2-layer example. I then proceed to two CNN examples.

We start with loading in the data.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
np.set_printoptions(precision=5, suppress=True)

from tensorflow.python import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

flattened_train_images = train_images.reshape(len(train_images), -1)  # flattens out each training image
flattened_train_images = flattened_train_images.astype(np.float32) / 255  # rescale to be between 0 and 1
flattened_test_images = test_images.reshape(len(test_images), -1)  # flattens out each test image
flattened_test_images = flattened_test_images.astype(np.float32) / 255  # rescale to be between 0 and 1

from keras.utils import to_categorical
train_labels_categorical = to_categorical(train_labels)
test_labels_categorical = to_categorical(test_labels)

Using TensorFlow backend.


In [2]:
train_labels[0]

5

In [3]:
train_labels_categorical[0]

array([0., 0., 0., 0., 0., 1., 0., 0., 0., 0.], dtype=float32)
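
For intuition, the one-hot encoding shown above can be reproduced with plain NumPy. This is only a sketch and not how Keras implements `to_categorical`; the helper name `one_hot` is hypothetical, and it assumes the labels are integers from 0 to `num_classes - 1`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: row i is all zeros except for a 1 at index labels[i]."""
    labels = np.asarray(labels)
    # Indexing the identity matrix by label picks out the matching one-hot row.
    return np.eye(num_classes, dtype=np.float32)[labels]

example = one_hot([5, 0, 4], num_classes=10)
print(example[0])  # the row for label 5 has its single 1 at index 5
```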

## Single-layer neural net

Making a neural net with a single Dense layer (also called a fully-connected layer) is fairly simple. Note that we need to specify the input shape for the initial layer.

Make sure you understand where the number of parameters comes from! To do this, count how many numbers are in the weight matrix and the bias vector.

In [4]:
single_layer_model = Sequential()  # this is Keras's way of specifying a model that is a single sequence of layers
single_layer_model.add(Dense(10, activation='softmax', input_shape=(784,)))
single_layer_model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 10)                7850      
Total params: 7,850
Trainable params: 7,850
Non-trainable params: 0
_________________________________________________________________
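
The parameter count in the summary above can be checked by hand. The short sketch below just does the arithmetic for a Dense layer with 784 inputs and 10 outputs; the variable names are made up for illustration.

```python
num_inputs = 784   # 28 * 28 pixels per flattened image
num_outputs = 10   # one output per digit class

weight_count = num_inputs * num_outputs  # one weight per (input, output) pair
bias_count = num_outputs                 # one bias per output
dense_param_count = weight_count + bias_count
print(dense_param_count)  # 7850
```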


In [5]:
single_layer_model.compile(optimizer='adam',
                           loss='categorical_crossentropy',
                           metrics=['accuracy'])

In [6]:
single_layer_model.fit(flattened_train_images,
                       train_labels_categorical,
                       validation_split=0.2,
                       epochs=5,
                       batch_size=128)

Train on 48000 samples, validate on 12000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.callbacks.History at 0x6461a0e10>

## Two-layer neural net

Going from 1 Dense layer to 2 Dense layers is straightforward. Importantly, we only need to specify the input shape for the first layer added; the input shape is automatically determined by Keras for the second layer.

Once again, make sure you know where the number of parameters comes from.

In [7]:
two_layer_model = Sequential()  # this is Keras's way of specifying a model that is a single sequence of layers
two_layer_model.add(Dense(512, activation='relu', input_shape=(784,)))
two_layer_model.add(Dense(10, activation='softmax'))
two_layer_model.compile(optimizer='adam',
                        loss='categorical_crossentropy',
                        metrics=['accuracy'])
two_layer_model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_2 (Dense)              (None, 512)               401920    
_________________________________________________________________
dense_3 (Dense)              (None, 10)                5130      
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
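
The same arithmetic as for the single-layer model explains both rows of this summary. The sketch below uses a hypothetical helper `dense_param_count` (not part of Keras) that counts the weight matrix entries plus the bias entries.

```python
def dense_param_count(num_inputs, num_outputs):
    """Hypothetical helper: weight matrix entries plus one bias per output."""
    return num_inputs * num_outputs + num_outputs

hidden_layer_params = dense_param_count(784, 512)  # first Dense layer
output_layer_params = dense_param_count(512, 10)   # second Dense layer
total_params = hidden_layer_params + output_layer_params

print(hidden_layer_params)  # 401920
print(output_layer_params)  # 5130
print(total_params)         # 407050
```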


In [8]:
two_layer_model.fit(flattened_train_images,
                    train_labels_categorical,
                    validation_split=0.2,
                    epochs=5,
                    batch_size=128)

Train on 48000 samples, validate on 12000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.callbacks.History at 0x646958f90>

## A simple CNN

To work with CNNs, we do not need to flatten the images. However, we still reshape them to include depth information (since we're looking at grayscale images, the depth is just 1).

In [9]:
from keras.layers import Conv2D, MaxPooling2D, Flatten

In [10]:
# reshape images to have an additional dimension for color (even though there's no color)
scaled_train_images = train_images.reshape(len(train_images), train_images.shape[1], train_images.shape[2], -1)
scaled_test_images = test_images.reshape(len(test_images), test_images.shape[1], test_images.shape[2], -1)

# rescale to be between 0 and 1
scaled_train_images = scaled_train_images.astype(np.float32) / 255
scaled_test_images = scaled_test_images.astype(np.float32) / 255

In [11]:
print(scaled_train_images.shape)

(60000, 28, 28, 1)
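
To see what the two kinds of reshaping do without loading MNIST, here is a small sketch on a made-up array `toy_images` of 5 blank 28-by-28 images. The `-1` asks NumPy to infer that dimension from the total number of elements.

```python
import numpy as np

toy_images = np.zeros((5, 28, 28), dtype=np.uint8)  # stand-in for 5 grayscale images

# Flattening for Dense layers: each image becomes one row of 28 * 28 = 784 values.
toy_flattened = toy_images.reshape(len(toy_images), -1)

# Adding a depth axis for CNN layers: the inferred last dimension is 1.
toy_with_depth = toy_images.reshape(len(toy_images), 28, 28, -1)

print(toy_flattened.shape)   # (5, 784)
print(toy_with_depth.shape)  # (5, 28, 28, 1)
```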


We now create a simple CNN.

Make sure you understand why the output shape after each layer is what it is, and where the number of parameters comes from for the convolutional and dense layers.

In [12]:
simple_convnet_model = Sequential()
simple_convnet_model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
simple_convnet_model.add(MaxPooling2D((2, 2)))
simple_convnet_model.add(Flatten())
simple_convnet_model.add(Dense(10, activation='softmax'))
simple_convnet_model.summary()

simple_convnet_model.compile(optimizer='adam',
                             loss='categorical_crossentropy',
                             metrics=['accuracy'])

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 5408)              0         
_________________________________________________________________
dense_4 (Dense)              (None, 10)                54090     
Total params: 54,410
Trainable params: 54,410
Non-trainable params: 0
_________________________________________________________________
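
The shapes and parameter counts in this summary can be reproduced with a little arithmetic. The sketch below assumes the Keras defaults used in the model above (no padding, stride 1 for the convolution, and non-overlapping 2-by-2 pooling); the helper names are hypothetical.

```python
def conv_output_size(input_size, kernel_size):
    """Width/height after a convolution with no padding and stride 1."""
    return input_size - kernel_size + 1

def conv_param_count(kernel_size, input_depth, num_filters):
    """Each filter has kernel_size * kernel_size * input_depth weights plus 1 bias."""
    return kernel_size * kernel_size * input_depth * num_filters + num_filters

conv_size = conv_output_size(28, 3)             # 26
pooled_size = conv_size // 2                    # 2-by-2 max pooling halves it: 13
flattened_size = pooled_size * pooled_size * 32  # 13 * 13 * 32 = 5408

conv_params = conv_param_count(3, 1, 32)         # 320
dense_params = flattened_size * 10 + 10          # 54090
print(conv_params + dense_params)                # 54410
```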


In [13]:
simple_convnet_model.fit(scaled_train_images,
                         train_labels_categorical,
                         validation_split=0.2,
                         epochs=5,
                         batch_size=128)

Train on 48000 samples, validate on 12000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.callbacks.History at 0x6475f0ad0>

## A deeper CNN

We next create a deeper CNN. Note that despite being deeper, this CNN has _fewer_ parameters than the simple CNN above (17,578 versus 54,410)!

For the second convolutional layer: remember that we treat its input as a 13-by-13 image that has a depth of 32 (i.e., you can think of this as a stack of 32 images, each of size 13-by-13 pixels). Keras automatically makes each filter 3-by-3-**by-32** in this case. Make sure you understand how this leads to this layer's parameter count of 9,248.

In [14]:
deeper_convnet_model = Sequential()
deeper_convnet_model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
deeper_convnet_model.add(MaxPooling2D((2, 2)))
deeper_convnet_model.add(Conv2D(32, (3, 3), activation='relu'))
deeper_convnet_model.add(MaxPooling2D((2, 2)))
deeper_convnet_model.add(Flatten())
deeper_convnet_model.add(Dense(10, activation='softmax'))
deeper_convnet_model.summary()

deeper_convnet_model.compile(optimizer='adam',
                             loss='categorical_crossentropy',
                             metrics=['accuracy'])

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_2 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 11, 11, 32)        9248      
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 5, 5, 32)          0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 800)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 10)                8010      
Total params: 17,578
Trainable params: 17,578
Non-trainable params: 0
_________________________________________________________________
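
As a check on this summary, the sketch below redoes the arithmetic for the deeper CNN, again assuming no padding, stride 1, and non-overlapping 2-by-2 pooling (which rounds down when the size is odd). The variable names are made up for illustration.

```python
# Spatial size after each layer, starting from 28-by-28 input images.
size_after_conv1 = 28 - 3 + 1                # 26
size_after_pool1 = size_after_conv1 // 2     # 13
size_after_conv2 = size_after_pool1 - 3 + 1  # 11
size_after_pool2 = size_after_conv2 // 2     # 5 (11 / 2 rounded down)
flattened_size = size_after_pool2 * size_after_pool2 * 32  # 800

# Parameters: each filter spans the full input depth and has one bias.
conv1_params = 3 * 3 * 1 * 32 + 32           # 320
conv2_params = 3 * 3 * 32 * 32 + 32          # 9248
dense_params = flattened_size * 10 + 10      # 8010
total_params = conv1_params + conv2_params + dense_params
print(total_params)  # 17578
```

Most of the savings relative to the simple CNN come from the final Dense layer, which now sees 800 inputs instead of 5408.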

In [15]:
deeper_convnet_model.fit(scaled_train_images,
                         train_labels_categorical,
                         validation_split=0.2,
                         epochs=5,
                         batch_size=128)

Train on 48000 samples, validate on 12000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.callbacks.History at 0x64750b890>

## Finally evaluate on test data

We now look at the test set accuracy of each model. Note that the deeper CNN has the best test set accuracy despite having fewer parameters than both the two-Dense-layer neural net and the simpler CNN! In machine learning, among models that achieve the same accuracy, we often favor the one with fewer parameters.

In [16]:
test_loss, test_acc = single_layer_model.evaluate(flattened_test_images, test_labels_categorical)
print('Test accuracy:', test_acc)

Test accuracy: 0.9208999872207642


In [17]:
test_loss, test_acc = two_layer_model.evaluate(flattened_test_images, test_labels_categorical)
print('Test accuracy:', test_acc)

Test accuracy: 0.9765999913215637


In [18]:
test_loss, test_acc = simple_convnet_model.evaluate(scaled_test_images, test_labels_categorical)
print('Test accuracy:', test_acc)

Test accuracy: 0.9811999797821045


In [19]:
test_loss, test_acc = deeper_convnet_model.evaluate(scaled_test_images, test_labels_categorical)
print('Test accuracy:', test_acc)

Test accuracy: 0.984000027179718


To get the actual predicted labels for any of these models, we can use the `predict_classes` function. We can then check that the accuracy computed directly from these labels agrees with the accuracy reported above by the `evaluate` function.

In [20]:
predicted_labels = deeper_convnet_model.predict_classes(scaled_test_images)

In [21]:
predicted_labels

array([7, 2, 1, ..., 4, 5, 6])

In [22]:
np.mean(predicted_labels == test_labels)

0.984
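
`predict_classes` is available on `Sequential` models in the Keras version used here, but newer TensorFlow releases no longer provide it. An equivalent that works either way is to take the argmax over the class probabilities. The sketch below shows the idea on a made-up 3-class probability matrix `toy_outputs` with made-up labels `toy_true_labels`, so it runs without a trained model.

```python
import numpy as np

# Made-up class probabilities for 3 examples and 3 classes (each row sums to 1).
toy_outputs = np.array([[0.1, 0.7, 0.2],
                        [0.8, 0.1, 0.1],
                        [0.2, 0.3, 0.5]])
toy_true_labels = np.array([1, 0, 1])  # made-up ground truth

# The predicted label is the index of the largest probability in each row.
toy_predicted_labels = np.argmax(toy_outputs, axis=-1)

# Accuracy is the fraction of predictions that match the true labels.
toy_accuracy = np.mean(toy_predicted_labels == toy_true_labels)
print(toy_predicted_labels)  # [1 0 2]
print(toy_accuracy)          # 2 of the 3 predictions are correct
```

With a real model, the same pattern would be applied to the array returned by `predict`.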

Note that the `predict` function produces the raw output of the neural net, which for each test image corresponds to the probabilities of the different digits 0, 1, ..., 9.

In [23]:
predicted_outputs = deeper_convnet_model.predict(scaled_test_images)

In [24]:
predicted_outputs.shape

(10000, 10)

For example, we can see what the predicted class probabilities are for the 0-th test example:

In [25]:
predicted_outputs[0]

array([0.     , 0.     , 0.00001, 0.00001, 0.     , 0.     , 0.     ,
       0.99997, 0.     , 0.     ], dtype=float32)
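
These probabilities come from the `softmax` activation on the final Dense layer. Below is a minimal NumPy sketch of softmax for a single vector of scores; the function and the scores in `toy_scores` are made up for illustration and this is not Keras's implementation.

```python
import numpy as np

def softmax(scores):
    """Turn a 1D array of real-valued scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtracting the max avoids overflow in exp
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

toy_scores = np.array([1.0, 2.0, 8.0, 0.5])  # made-up scores for 4 classes
toy_probabilities = softmax(toy_scores)

print(toy_probabilities)             # the largest score gets most of the probability
print(np.argmax(toy_probabilities))  # 2
```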