# Introduction to Convolutional Neural Networks (convnets or CNNs) with tf.keras and tf.data

In this notebook we will look at a convnet for image classification. As in the first two tutorials, we will classify Fashion-MNIST images, which show items of clothing rather than digits. This tutorial was inspired by [this one](https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/5.1-introduction-to-convnets.ipynb).

For image classification, rather than flattening our images as we did in the previous notebooks, it is better to exploit the 2D structure of the images and train on that. This is what convnets (CNNs) do. At a very high level, convnets are stacks of convolutional layers (`Conv2D`) and pooling layers (`MaxPooling2D`). Most importantly, each of these layers takes as input a 3D tensor of shape (`height`, `width`, `channels`), where `channels=1` for grayscale images, and returns a 3D tensor.
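To make the two building blocks concrete, here is a minimal NumPy sketch of what a single `Conv2D` filter and a `MaxPooling2D` layer compute on one grayscale image. The helper names `conv2d_valid` and `max_pool_2x2` are ours and not part of Keras. The sketch ignores the bias term, multiple channels and batching, and it uses random values in place of a real image and a learned filter.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding and stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the window and the kernel, summed up
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool_2x2(feature_map):
    """Keep the maximum of every non-overlapping 2x2 window."""
    h, w = feature_map.shape
    h, w = h - h % 2, w - w % 2  # drop a trailing odd row/column
    trimmed = feature_map[:h, :w]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((28, 28)).astype(np.float32)           # stand-in for one image
kernel = rng.standard_normal((3, 3)).astype(np.float32)   # stand-in for one filter

feature_map = np.maximum(conv2d_valid(image, kernel), 0.0)  # ReLU activation
pooled = max_pool_2x2(feature_map)
print(feature_map.shape, pooled.shape)
```

A 3x3 filter on a 28x28 image gives a 26x26 feature map, and 2x2 pooling halves that to 13x13. A real `Conv2D` layer learns many such filters at once and stacks their feature maps along the channels axis.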

In [1]:
import tensorflow as tf

import numpy as np

In [2]:
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.fashion_mnist.load_data()

In [3]:
TRAINING_SIZE = len(train_images)
TEST_SIZE = len(test_images)

# Scale pixel values from [0, 255] to [0, 1]
train_images = np.asarray(train_images, dtype=np.float32) / 255

# Reshape the training images to add a channels dimension
train_images = train_images.reshape((TRAINING_SIZE, 28, 28, 1))

test_images = np.asarray(test_images, dtype=np.float32) / 255

# Reshape the test images to add a channels dimension
test_images = test_images.reshape((TEST_SIZE, 28, 28, 1))

In [4]:
# Number of categories we are predicting (labels 0-9)
LABEL_DIMENSIONS = 10

train_labels = tf.keras.utils.to_categorical(train_labels, LABEL_DIMENSIONS)
test_labels = tf.keras.utils.to_categorical(test_labels, LABEL_DIMENSIONS)

# Cast the labels to floats, needed later
train_labels = train_labels.astype(np.float32)
test_labels = test_labels.astype(np.float32)
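As a rough illustration of the one-hot encoding that `to_categorical` produces for integer labels, the same effect can be reproduced in plain NumPy by indexing into an identity matrix. The three labels below are made-up examples, and `NUM_CLASSES` is our own name for the value held in `LABEL_DIMENSIONS`.

```python
import numpy as np

NUM_CLASSES = 10

# Three made-up integer labels, one per image
labels = np.array([9, 0, 3])

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(NUM_CLASSES, dtype=np.float32)[labels]

print(one_hot.shape)
print(one_hot[0])
```

Each row contains a single 1 at the position of the class and 0 everywhere else, which is the format the `categorical_crossentropy` loss expects.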

In [5]:
model = tf.keras.Sequential()

model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(3, 3), activation=tf.nn.relu, input_shape=(28, 28, 1)))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2))
model.add(tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), activation=tf.nn.relu))
model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2))
model.add(tf.keras.layers.Conv2D(filters=64, kernel_size=(3, 3), activation=tf.nn.relu))

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     
Total params: 55,744
Trainable params: 55,744
Non-trainable params: 0
_________________________________________________________________


You can see above that the output of every convolutional layer is a 3D tensor of shape (`height`, `width`, `filters`). The width and height shrink as we go deeper into the network: each 3x3 convolution without padding trims 2 pixels from each dimension, and each 2x2 pooling layer halves them (rounding down). Meanwhile, the number of filters or channels increases from the input channel size of 1.
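The shapes and parameter counts in the summary above can be checked by hand. The sketch below assumes what the model definition uses: 3x3 kernels with stride 1 and no padding, 2x2 pooling with stride 2, and one bias per filter. The helper names are ours.

```python
def conv_output_size(size, kernel=3):
    """Spatial size after a convolution with no padding and stride 1."""
    return size - kernel + 1

def pool_output_size(size, pool=2):
    """Spatial size after non-overlapping pooling (rounds down)."""
    return size // pool

def conv_params(kernel, in_channels, filters):
    """Weights of every filter plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

# (layer kind, input channels, filters) in the order used by the model
layers = [
    ("conv", 1, 32),
    ("pool", None, None),
    ("conv", 32, 64),
    ("pool", None, None),
    ("conv", 64, 64),
]

size = 28
sizes, params = [], []
for kind, in_channels, filters in layers:
    if kind == "conv":
        size = conv_output_size(size)
        params.append(conv_params(3, in_channels, filters))
    else:
        size = pool_output_size(size)
    sizes.append(size)

print(sizes)                # [26, 13, 11, 5, 3]
print(params, sum(params))  # [320, 18496, 36928] 55744
```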

The last part of the network for the classification task is similar to the other notebooks and consists of `Dense` layers which process 1D vectors. So we first need to `Flatten` our 3D outputs from the convolutional part to 1D and then add the `Dense` layers:

In [6]:
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(64, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(LABEL_DIMENSIONS, activation=tf.nn.softmax))
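What `Flatten` does to the output of the last convolutional layer can be mimicked with a NumPy reshape. This is only a sketch on a dummy array of zeros with a made-up batch size of 2. It keeps the batch dimension and merges the remaining three.

```python
import numpy as np

# Dummy batch of 2 outputs from the last Conv2D layer: (batch, 3, 3, 64)
conv_output = np.zeros((2, 3, 3, 64), dtype=np.float32)

# Keep the batch axis, merge height, width and channels into one axis
flattened = conv_output.reshape(conv_output.shape[0], -1)

print(flattened.shape)  # (2, 576)
```

The 576 features per image (3 * 3 * 64) are what the first `Dense` layer receives.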

Training the network is again similar to all the previous notebooks:

In [7]:
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001)

model.compile(loss='categorical_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten (Flatten)            (None, 576)               0         
_________________________________________________________________
dense (Dense)                (None, 64)                3

In [8]:
BATCH_SIZE = 128

# Because tf.data may work with potentially large collections of data,
# it does not shuffle the entire dataset by default.
# Instead, it maintains a buffer of SHUFFLE_SIZE elements
# and samples from there.
SHUFFLE_SIZE = 10000

# Create the dataset
dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
dataset = dataset.shuffle(SHUFFLE_SIZE)
dataset = dataset.batch(BATCH_SIZE)
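The buffer-based shuffling described in the comments above can be sketched in pure Python. This is a conceptual illustration only, not the actual `tf.data` implementation, and `buffered_shuffle` is a hypothetical helper of ours.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield `items` in a partially shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything
            buffer.append(item)
            continue
        # Emit a random element and put the new item in its slot
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: emit what is left in random order
    rng.shuffle(buffer)
    yield from buffer

shuffled = list(buffered_shuffle(range(100), buffer_size=10, seed=0))
print(shuffled[:10])
```

With `buffer_size=1` the order is unchanged, and with a buffer at least as large as the dataset the shuffle is uniform. In this sketch, the element emitted at position `k` always comes from the first `k + buffer_size` inputs, so a buffer of 10,000 on 60,000 training images gives a shuffle somewhere between those two extremes.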

In [9]:
EPOCHS = 5  # the number of times we go through our entire training dataset

for epoch in range(EPOCHS):
    for batch, (images, labels) in enumerate(dataset):
        train_loss, train_accuracy = model.train_on_batch(images, labels)

        if batch % 10 == 0:
            print(batch, train_accuracy)

    # Here you can gather any metrics or adjust your training parameters.
    # Note: train_loss and train_accuracy hold the values returned for the
    # last batch of the epoch.
    print('Epoch #%d\t Loss: %.6f\tAccuracy: %.6f' % (epoch + 1, train_loss, train_accuracy))

0 0.09375
10 0.4609375
20 0.5234375
30 0.5703125
40 0.765625
50 0.65625
60 0.6953125
70 0.671875
80 0.734375
90 0.734375
100 0.640625
110 0.6953125
120 0.7421875
130 0.828125
140 0.8515625
150 0.828125
160 0.75
170 0.703125
180 0.75
190 0.7734375
200 0.7421875
210 0.8046875
220 0.7265625
230 0.8203125
240 0.875
250 0.796875
260 0.828125
270 0.8125
280 0.8046875
290 0.8515625
300 0.8359375
310 0.8359375
320 0.8203125
330 0.8203125
340 0.8359375
350 0.7890625
360 0.8125
370 0.8125
380 0.7421875
390 0.8828125
400 0.828125
410 0.859375
420 0.84375
430 0.828125
440 0.796875
450 0.7890625
460 0.8671875
Epoch #1	 Loss: 0.482308	Accuracy: 0.802083
0 0.8125
10 0.828125
20 0.8671875
30 0.8984375
40 0.8515625
50 0.84375
60 0.828125
70 0.8359375
80 0.828125
90 0.90625
100 0.8046875
110 0.8671875
120 0.8125
130 0.8515625
140 0.90625
150 0.8984375
160 0.828125
170 0.84375
180 0.8515625
190 0.921875
200 0.8203125
210 0.828125
220 0.8046875
230 0.875
240 0.8984375
250 0.84375
260 0.8828125
270 0.86718
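The batch indices in the log above follow from the dataset and batch sizes. The short calculation below assumes the standard Fashion-MNIST training set of 60,000 images. It also suggests that, in the run shown, the accuracy printed at the end of epoch 1 is that of the final, smaller batch rather than an average over the epoch: 0.802083 matches 77 correct predictions out of 96.

```python
import math

TRAINING_SIZE = 60000  # size of the Fashion-MNIST training set
BATCH_SIZE = 128

# The last batch is smaller because 60000 is not a multiple of 128
batches_per_epoch = math.ceil(TRAINING_SIZE / BATCH_SIZE)
last_batch_size = TRAINING_SIZE - (batches_per_epoch - 1) * BATCH_SIZE

print(batches_per_epoch, last_batch_size)  # 469 96
print(round(77 / last_batch_size, 6))      # 0.802083
```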

Again, to evaluate the model we need to check its accuracy on unseen (test) data:

In [10]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('\nTest Model \t\t Loss: %.6f\tAccuracy: %.6f' % (test_loss, test_acc))


Test Model 		 Loss: 0.281278	Accuracy: 0.896900
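For intuition about the accuracy figure, here is a small NumPy sketch of how accuracy is obtained from softmax outputs and one-hot labels: the predicted class is the index of the largest probability, and accuracy is the fraction of predictions that match the true class. The probabilities and labels are a made-up toy example with 3 classes, not output from the model above.

```python
import numpy as np

# Toy softmax outputs for 4 images and 3 classes (made-up values)
probabilities = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
], dtype=np.float32)

# One-hot labels for the true classes 1, 0, 1, 1
true_labels = np.eye(3, dtype=np.float32)[[1, 0, 1, 1]]

predicted_classes = probabilities.argmax(axis=1)
true_classes = true_labels.argmax(axis=1)

accuracy = float(np.mean(predicted_classes == true_classes))
print(accuracy)  # 0.75
```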
