# Chapter 5 - Deep learning for computer vision

## Introduction to convnets

In this section, we take a look at convolutional networks (convnets) and talk about why they have been so successful at computer vision tasks. To get started, we will see how a convnet can improve on the MNIST digit recognition example from chapter 2.

In [1]:
from tensorflow.keras import layers
from tensorflow.keras import models

## The difference

The section below is where we start to depart from the example from chapter 2. Instead of a network consisting of just densely connected layers, there is a stack of Conv2D layers that I like to think of as a pre-processing step for the densely connected layers. Convolutional layers excel at finding patterns in images because each filter is a small window (3x3 here) that slides over the entire image, so it learns local patterns that can be recognized anywhere in the image. Dense layers, by contrast, have no filters: every unit is connected to every input pixel, so they can only learn global patterns tied to specific pixel positions. The pre-processing analogy is probably imperfect, but it's how I like to think about it.
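
To make the sliding-window idea concrete, here is a minimal NumPy sketch of a single 3x3 filter applied to a 28x28 image with stride 1 and no padding. The function name `conv2d_valid`, the toy image, and the hand-written edge filter are all illustrative assumptions; a real Conv2D layer learns its filter values during training and handles many filters and channels at once.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding (hypothetical helper)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            # The same weights are applied to every patch of the image.
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# Toy 28x28 "image": dark left half, bright right half (a vertical edge).
image = np.zeros((28, 28), dtype=np.float32)
image[:, 14:] = 1.0

# Hand-written vertical-edge filter, used here only for illustration.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=np.float32)

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)  # (26, 26)
```

The output is 26x26 because a 3x3 window fits in 28 - 3 + 1 = 26 positions along each axis, and the filter responds only in the columns where the edge sits, wherever that edge appears vertically.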

As we can see in the model summary below, MaxPooling2D effectively halves the height and width of the feature maps each time it is called. This is by design, as max pooling is intended to aggressively downsample the feature maps from the preceding Conv2D layer. The max pooling operation is similar to the convolution operation, except that it takes the maximum over each 2x2 window with a stride of 2, whereas the convolutions here use 3x3 windows with a stride of 1, hence the halving. This downsampling reduces the number of feature-map values the later layers have to process, and therefore the number of parameters in the dense layers, which helps limit overfitting. It also helps the model generalize, because each successive convolution layer looks at a larger window of the original image.
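
The pooling step can be sketched in a few lines of NumPy. This is a simplified single-channel version under the assumption of a 2x2 window with stride 2; `max_pool_2x2` is a hypothetical helper, not the Keras implementation.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Take the max over non-overlapping 2x2 windows (hypothetical helper)."""
    h, w = feature_map.shape
    out_h, out_w = h // 2, w // 2
    # Drop a trailing row/column when the size is odd, as "valid" pooling does.
    trimmed = feature_map[:out_h * 2, :out_w * 2]
    # Group the pixels into (row block, row in block, col block, col in block).
    windows = trimmed.reshape(out_h, 2, out_w, 2)
    return windows.max(axis=(1, 3))

fm = np.arange(16, dtype=np.float32).reshape(4, 4)
pooled = max_pool_2x2(fm)
print(pooled)
# [[ 5.  7.]
#  [13. 15.]]
```

Each output value is the largest of four neighbouring inputs, so a 4x4 map becomes 2x2. An odd-sized map such as 11x11 becomes 5x5 because the last row and column are dropped.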

In [2]:
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

In [3]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     
Total params: 55,744
Trainable params: 55,744
Non-trainable params: 0
_________________________________________________________________
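
The output shapes and parameter counts in this summary can be checked by hand. The sketch below assumes 3x3 convolutions with stride 1 and no padding, and 2x2 pooling with stride 2; the helper names are made up for this illustration.

```python
def conv_output_size(size, kernel=3):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1

def pool_output_size(size, pool=2):
    """Spatial size after non-overlapping pooling (odd remainders are dropped)."""
    return size // pool

def conv_params(kernel, in_channels, filters):
    """Weights per filter (kernel * kernel * in_channels) plus one bias, times filters."""
    return (kernel * kernel * in_channels + 1) * filters

size = 28
size = conv_output_size(size)   # 26
size = pool_output_size(size)   # 13
size = conv_output_size(size)   # 11
size = pool_output_size(size)   # 5
size = conv_output_size(size)   # 3

params = [conv_params(3, 1, 32), conv_params(3, 32, 64), conv_params(3, 64, 64)]
print(size, params, sum(params))  # 3 [320, 18496, 36928] 55744
```

Note that the pooling layers contribute no parameters: they only select maxima, so there is nothing to learn.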


## Conventional methods

After the convolutional layers we add two densely connected layers. At first, this did not make sense to me. But then I considered what each layer type excels at. Convolutional layers excel at finding image features: they turn the input image into a small stack of feature maps (here a 3x3x64 tensor) whose values indicate how strongly each learned feature is present. Flatten unrolls that tensor into a vector of 576 values, and dense layers excel at finding correlations in that kind of numerical data. So this network is doing feature extraction on the image using convolutional layers and then doing classification on the resulting features with a 10-way softmax. Smart.
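
As a rough illustration of the hand-off from feature extraction to classification, the NumPy sketch below flattens a 3x3x64 tensor, counts the weights a dense layer would need, and applies a softmax. The random values stand in for real feature maps and logits, and `dense_params` and `softmax` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the output of the last Conv2D layer (random values, not real features).
features = rng.random((3, 3, 64)).astype(np.float32)
flat = features.reshape(-1)  # what Flatten does for a single sample

def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return inputs * units + units

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Stand-in for the 10 raw scores of the final layer.
probs = softmax(rng.normal(size=10))

print(flat.shape, dense_params(576, 64), dense_params(64, 10))  # (576,) 36928 650
```

The predicted digit is the index of the largest probability, for example `int(np.argmax(probs))`.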

In [4]:
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

In [5]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten (Flatten)            (None, 576)               0         
_________________________________________________________________
dense (Dense)                (None, 64)                3

In [6]:
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

## Normalizing the data

Again we have an example of data normalization, but in this case it's an image, so normalization is pretty straightforward because we know the bounds of the pixel values (0 to 255). First we reshape the image arrays to add the single channel dimension the convnet expects and cast all of the values to float32, and then we divide by 255 to map every value into the range 0 to 1. Next we see to_categorical being used on the train and test labels. This one-hot encodes each integer label into a binary vector with a 1 at the index of the class, which is the format the categorical_crossentropy loss expects.

For example a label array containing:  
`[1, 3]`   
would be converted to:   
`[[0, 1, 0, 0], 
 [0, 0, 0, 1]]`
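
Both preprocessing steps can be reproduced with plain NumPy. This is a minimal sketch, with `one_hot` as a hypothetical stand-in for to_categorical that infers the number of classes from the largest label when none is given.

```python
import numpy as np

# Pixel normalization: uint8 values in [0, 255] become float32 values in [0, 1].
pixels = np.array([[0, 128, 255]], dtype=np.uint8)
scaled = pixels.astype("float32") / 255

def one_hot(labels, num_classes=None):
    """One-hot encode integer labels (hypothetical stand-in for to_categorical)."""
    labels = np.asarray(labels)
    if num_classes is None:
        num_classes = int(labels.max()) + 1
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Put a 1 in each row at the column given by that row's label.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

print(one_hot([1, 3]))
# [[0. 1. 0. 0.]
#  [0. 0. 0. 1.]]
```

For MNIST the labels run from 0 to 9, so each encoded label is a vector of length 10.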

In [7]:
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

In [8]:
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

In [9]:
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

In [10]:
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

In [11]:
model.fit(train_images, train_labels, epochs=5, batch_size=64)

Train on 60000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7fde5003e198>

## Results

After compiling and training the model, we see that just by including the convolutional layers we have increased the test accuracy of the model to about 99%, versus the 97% we got in chapter 2. That is a marked improvement from a smarter model architecture rather than from trying to brute-force improvements with more data.

In [16]:
test_loss, test_acc = model.evaluate(test_images, test_labels)



In [17]:
test_acc

0.9923
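
Two percentage points may not sound like much, so it can help to look at error rates instead. The arithmetic below uses the rounded 97% figure quoted above for chapter 2 and the test accuracy just printed; because the chapter 2 number is rounded, the result is only approximate.

```python
dense_acc = 0.97      # chapter 2 accuracy as quoted in the text (rounded)
convnet_acc = 0.9923  # test accuracy reported above

dense_err = 1 - dense_acc      # about 3% of test digits misclassified
convnet_err = 1 - convnet_acc  # about 0.77% misclassified

# Fraction of the dense network's errors that the convnet eliminates.
relative_reduction = (dense_err - convnet_err) / dense_err
print(f"{relative_reduction:.0%}")  # 74%
```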

## Training a convnet using a small dataset

In this example, we will use data augmentation to train a convnet on a small dataset.

In [1]:
import os, shutil

In [2]:
original_dataset_dir = '../datasets/dogs-vs-cats'
base_dir = '../datasets/dogs-vs-cats-small'

In [3]:
os.mkdir(base_dir)

In [5]:
validation_dir = os.path.join(base_dir, 'validation')
os.mkdir(validation_dir)
test_dir = os.path.join(base_dir, 'test')
os.mkdir(test_dir)

In [9]:
train_dir = os.path.join(base_dir, 'train')
os.mkdir(train_dir)

train_cats_dir = os.path.join(train_dir, 'cats')
os.mkdir(train_cats_dir)

train_dogs_dir = os.path.join(train_dir, 'dogs')
os.mkdir(train_dogs_dir)

In [10]:
validation_cats_dir = os.path.join(validation_dir, 'cats')
os.mkdir(validation_cats_dir)

validation_dogs_dir = os.path.join(validation_dir, 'dogs')
os.mkdir(validation_dogs_dir)
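
One caveat with the cells above is that os.mkdir raises FileExistsError if the notebook is re-run after the directories already exist. A possible alternative, sketched here under the assumption that every split (including test) gets `cats` and `dogs` subdirectories, is a small helper built on os.makedirs with exist_ok=True. The name `make_split_dirs` is hypothetical, and the demo writes to a temporary directory rather than `../datasets`.

```python
import os
import tempfile

def make_split_dirs(base_dir,
                    splits=("train", "validation", "test"),
                    classes=("cats", "dogs")):
    """Create base_dir/<split>/<class> for every combination (hypothetical helper)."""
    paths = {}
    for split in splits:
        for cls in classes:
            path = os.path.join(base_dir, split, cls)
            # exist_ok=True makes the call safe to repeat; parents are created as needed.
            os.makedirs(path, exist_ok=True)
            paths[(split, cls)] = path
    return paths

# Demo in a throwaway directory so nothing is written next to the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    demo_base = os.path.join(tmp, "dogs-vs-cats-small")
    paths = make_split_dirs(demo_base)
    make_split_dirs(demo_base)  # second call does not raise
    created = sorted(os.listdir(demo_base))

print(created)  # ['test', 'train', 'validation']
```

The returned dictionary can be indexed by split and class, for example `paths[("train", "cats")]`, in place of the individual `train_cats_dir` style variables.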