In [2]:
import pandas as pd
import numpy as np

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist

## Chapter 8: Introduction to deep learning for computer vision

You’ll learn to apply convnets to image-classification problems, in particular those involving small training datasets, which are the most common use case if you aren’t a large tech company.

A basic convnet is a stack of `Conv2D` and `MaxPooling2D` layers. We build the model using the Functional API.

In [3]:
# Dataset example : MNIST

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# 60k train images + 10k test images, each of size 28x28
print("Train images:", train_images.shape)
print("Test images:", test_images.shape)

# integer labels (i.e. we need to use sparse categorical loss)
print("First 10 train labels:", train_labels[0:10])
print("First 10 test labels:", test_labels[0:10])

Train images: (60000, 28, 28)
Test images: (10000, 28, 28)
First 10 train labels: [5 0 4 1 9 2 1 3 1 4]
First 10 test labels: [7 2 1 0 4 1 4 9 5 9]


In [4]:
# reshape to appropriate tensor shape

train_images = train_images.reshape((60000, 28, 28, 1))
print("Train images:", train_images.shape)

# pixel intensities range from 0 to 255
train_images[0,:,:,0]

Train images: (60000, 28, 28, 1)


array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   3,
         18,  18,  18, 126, 136, 175,  26, 166, 255, 247, 127,   0,   0,
          0,   0],
       [  

In [5]:
# normalize train_images
train_images = train_images.astype("float32") / 255

# reshape and normalize test_images
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype("float32") / 255 

### About Convnets
* Convnet input tensor shape: `(image_height, image_width, image_channels)`
* The batch dimension is **not** included in this shape.
* `Conv2D` means that the kernel slides along two spatial dimensions (height and width).
* The `filters` argument of `Conv2D` sets the number of filters the layer uses.
* The output of every `Conv2D` and `MaxPooling2D` layer has shape `(height, width, channels)`.
* The `channels` dimension of that output is set by the `filters` argument of the `Conv2D` layer.

### Step 1: Defining ConvNet in functional API format

In [6]:
# basic convnet in Functional API format

inputs = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(inputs)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

# to get summary of above model
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 28, 28, 1)]       0         
                                                                 
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 13, 13, 32)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 5, 5, 64)         0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 3, 3, 128)         73856 

### About code:
* The input is an image of dim $28 \times 28 \times 1$, i.e. a greyscale image.
* The first layer uses $32$ different filters, each of size $3 \times 3$.

The first `Conv2D` layer:

* One $3 \times 3$ filter gives an output of size $26 \times 26$. Stacking the outputs of 32 such filters gives an output tensor of size $26 \times 26 \times 32$.
* By default the stride is 1 and no padding is applied (`padding="valid"`).
* Note: $26 = 28 - 3 + 1$
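
The same arithmetic extends to the whole stack. The following is a minimal sketch in plain Python, assuming stride-1 convolutions without padding and non-overlapping pooling windows. The helper names `conv2d_output`, `maxpool_output` and `conv2d_params` are hypothetical and are not Keras functions. It recomputes the output shapes and parameter counts listed in the model summary.

```python
def conv2d_output(size, kernel_size):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel_size + 1


def maxpool_output(size, pool_size):
    """Spatial size after non-overlapping max pooling (any remainder is dropped)."""
    return size // pool_size


def conv2d_params(kernel_size, in_channels, filters):
    """(kernel_size^2 * in_channels weights + 1 bias) per filter, times the filter count."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


size, channels = 28, 1          # MNIST input: 28 x 28 x 1
shapes, params = [], []

# (number of filters, whether a MaxPooling2D layer follows), as in the model above
for filters, pool in [(32, True), (64, True), (128, False)]:
    params.append(conv2d_params(3, channels, filters))
    size, channels = conv2d_output(size, 3), filters
    shapes.append((size, size, channels))
    if pool:
        size = maxpool_output(size, 2)
        shapes.append((size, size, channels))

print(shapes)
print(params)
```

The printed shapes run from `(26, 26, 32)` down to `(3, 3, 128)`, and the parameter counts are `320`, `18496` and `73856`, in agreement with `model.summary()`.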

The `dense` layer(s):

* The convnet output is finally fed into a densely connected classifier.
* For this, the rank-3 output tensor of the last `Conv2D` layer must be converted to a 1D vector, which is what the `Flatten` layer does.
* Since MNIST has 10 classes, the final `Dense` layer has 10 units and a softmax activation.
* `categorical_crossentropy` is the loss to use when labels are one-hot encoded.
* `sparse_categorical_crossentropy` is the loss to use when labels are integers, as in the MNIST example.
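
To see why the two losses are interchangeable up to label encoding, here is a small plain-Python sketch of the per-sample definitions. It is only an illustration: the probability vector is made up, the two functions are hypothetical stand-ins rather than the Keras implementations, and Keras additionally works on batches and clips probabilities for numerical stability.

```python
import math


def categorical_crossentropy(one_hot, probs):
    """Cross-entropy for a one-hot label vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


def sparse_categorical_crossentropy(label, probs):
    """Cross-entropy for an integer class label."""
    return -math.log(probs[label])


# Hypothetical softmax output for one image over the 10 digit classes (sums to 1).
probs = [0.01, 0.01, 0.02, 0.05, 0.01, 0.80, 0.03, 0.03, 0.02, 0.02]

label = 5                                                   # integer label, as in MNIST
one_hot = [1.0 if i == label else 0.0 for i in range(10)]   # same label, one-hot encoded

loss_sparse = sparse_categorical_crossentropy(label, probs)
loss_onehot = categorical_crossentropy(one_hot, probs)
print(loss_sparse, loss_onehot)
```

Both calls return the same value, $-\log(0.8)$, because the one-hot vector zeroes out every term except the one for the true class.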

### Step 2: Compile and Fit the CNN model

In [7]:
# define optimizer, loss function and metric to monitor along with loss
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])


# fit the model specified above for the given number of epochs.
# "fit" processes the images in batches of 64.
model.fit(train_images, train_labels, epochs=5, batch_size=64)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x1d3001d63e0>

In [8]:
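# evaluate returns [loss, accuracy]; the name avoids shadowing the built-in eval()
eval_results = model.evaluate(test_images, test_labels)



In [9]:
print("eval object is of type ", type(eval_results))
print("eval is a list of ",len(eval_results), "objects")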

eval object is of type  <class 'list'>
eval is a list of  2 objects


In [10]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc:.3f}")

Test accuracy: 0.992


Points to note:
* `Dense` layers learn global patterns in the input feature space, while `Conv` layers learn local patterns.
* The visual world is fundamentally **translation-invariant** and **spatially hierarchical**.

Convnets have two important properties:
* **The patterns they learn are translation-invariant**. That is, after learning a certain pattern in the lower-right corner of a picture, a convnet can recognize it anywhere: for example, in the upper-left corner. A densely connected model would have to learn the pattern anew if it appeared at a new location. This makes convnets data-efficient when processing images. They need fewer training samples to learn representations that have generalization power.

* **They can learn spatial hierarchies of patterns.** A first convolution layer will learn small local patterns such as edges, a second convolution layer will learn larger patterns made of the features of the first layers, and so on.
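
The first property can be illustrated without any deep learning library. The sketch below, in plain Python, slides one fixed filter over two images that contain the same pattern at different locations. All names (`correlate2d_valid`, `place_pattern`, `argmax2d`) and the 3x3 "corner" pattern are hypothetical choices for this illustration. A real `Conv2D` layer learns its filter values and adds a bias and an activation.

```python
def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def place_pattern(size, pattern, top, left):
    """Return a size x size image of zeros with `pattern` pasted at (top, left)."""
    image = [[0] * size for _ in range(size)]
    for a, row in enumerate(pattern):
        for b, value in enumerate(row):
            image[top + a][left + b] = value
    return image


def argmax2d(grid):
    """Position (row, col) of the largest value in a 2D list."""
    return max(
        ((i, j) for i in range(len(grid)) for j in range(len(grid[0]))),
        key=lambda pos: grid[pos[0]][pos[1]],
    )


# A made-up 3x3 "corner" pattern; the same values serve as the filter.
pattern = [[1, 1, 1],
           [1, 0, 0],
           [1, 0, 0]]

lower_right = correlate2d_valid(place_pattern(8, pattern, 5, 5), pattern)
upper_left = correlate2d_valid(place_pattern(8, pattern, 0, 0), pattern)

print(argmax2d(lower_right), argmax2d(upper_left))
```

The strongest response appears at `(5, 5)` in the first map and at `(0, 0)` in the second, with the same peak value in both. The filter did not need to be changed to find the pattern at its new location, whereas a dense layer would need separate weights for each position.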


![](response_map.png)

* Convolutions operate over rank-3 tensors called *feature maps*, with two spatial axes (height and width) as well as a depth axis (also called the channels axis).

* For the *input feature map*, the depth axis is the color channels axis (depth = 3 for an RGB color image and depth = 1 for a greyscale image).

* The *output feature map* is still a rank-3 tensor with height, width and depth, but its depth is a parameter of the layer. The channels along the depth axis no longer stand for specific colors as in RGB input; rather, they stand for *filters*. Filters encode specific aspects of the input data.
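
As a rough illustration of how the output depth comes from the number of filters, the following NumPy sketch applies 32 random 3x3 filters to a random 28x28x1 input. The values are random placeholders rather than learned weights, and the bias and activation are omitted. The variable names are chosen for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input feature map: one 28x28 greyscale image, depth 1.
input_map = rng.random((28, 28, 1))

# 32 filters of size 3x3 spanning the full input depth:
# shape is (kernel_height, kernel_width, input_depth, number_of_filters).
filters = rng.standard_normal((3, 3, 1, 32))

kh, kw, _, n_filters = filters.shape
out_h = input_map.shape[0] - kh + 1
out_w = input_map.shape[1] - kw + 1
output_map = np.zeros((out_h, out_w, n_filters))

for i in range(out_h):
    for j in range(out_w):
        patch = input_map[i:i + kh, j:j + kw, :]   # shape (kh, kw, input_depth)
        # Contract the patch with every filter: one response value per filter.
        output_map[i, j, :] = np.tensordot(patch, filters, axes=3)

print(output_map.shape)
```

Each slice `output_map[:, :, k]` is the 2D response map of filter `k` over the input, so the output depth equals the number of filters regardless of the input depth.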

In [11]:
data_augmentation = keras.Sequential(
    [
        # Applies horizontal flipping to a random 50% of the images 
        layers.RandomFlip("horizontal"),

        # Rotates the input images by a random angle in the range [-10%, +10%] of a full circle (i.e. up to 36 degrees either way)
        layers.RandomRotation(0.1),

        # Zooms in or out of the image by a random factor in the range [-20%, +20%]
        layers.RandomZoom(0.2),
    ]
)

In [None]:
model = keras.Sequential([
    layers.Dense(4, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid")
])




conv_base = keras.applications.vgg16.VGG16(
    weights="imagenet",
    include_top=False,
    input_shape=(180, 180, 3))