# Chapter 14: Deep Computer Vision Using Convolutional Neural Networks

## Problem 1

What are the advantages of a CNN over a fully connected DNN for image classification?

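* Parameter efficiency: a convolutional layer reuses the same small kernel across the whole image, so it has far fewer parameters than a fully connected layer
* Translation invariance: a kernel that has learned to detect a pattern in one location can detect it anywhere in the image
* Locality: neighboring pixels are processed together, and stacking multiple convolutional layers combines low-level features into larger structures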
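
To make the parameter-efficiency point concrete, here is a small sketch that compares parameter counts for one dense layer and one convolutional layer on the same input. The image size and the 100 feature maps are borrowed from Problem 2; the dense layer with 100 neurons and the helper names `dense_params` and `conv_params` are assumptions made for illustration only.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per neuron in a fully connected layer."""
    return (n_inputs + 1) * n_units


def conv_params(in_channels, kernel_h, kernel_w, n_filters):
    """Weights plus one bias per filter in a 2D convolutional layer."""
    return (in_channels * kernel_h * kernel_w + 1) * n_filters


# A 200 x 300 RGB image, flattened for the dense layer
n_pixels = 200 * 300 * 3

dense_count = dense_params(n_pixels, 100)   # 18,000,100 parameters
conv_count = conv_params(3, 3, 3, 100)      # 2,800 parameters

print(f"Dense layer: {dense_count:,} parameters")
print(f"Conv layer:  {conv_count:,} parameters")
```

The convolutional layer's count does not depend on the image size at all, which is why it stays small on large inputs.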

## Problem 2

Consider a CNN composed of three convolutional layers, each with 3 × 3 kernels, a stride of 2, and "same" padding. The lowest layer outputs 100 feature maps, the middle one outputs 200, and the top one outputs 400. The input images are RGB images of 200 × 300 pixels.

What is the total number of parameters in the CNN? If we are using 32-bit floats, at least how much RAM will this network require when making a prediction for a single instance? What about when training on a mini-batch of 50 images?

* First layer: 3 input channels, 3 x 3 kernel, 1 bias, 100 output maps -> (3 x 3 x 3 + 1) x 100 = 2'800 parameters
* Second layer: 100 input channels, 3 x 3 kernel, 1 bias, 200 output maps -> (100 x 3 x 3 + 1) x 200 = 180'200 parameters
* Third layer: 200 input channels, 3 x 3 kernel, 1 bias, 400 output maps -> (200 x 3 x 3 + 1) x 400 = 720'400 parameters
* Total parameters = 2'800 + 180'200 + 720'400 = 903'400 parameters

For inference, only two consecutive layers need to be retained in memory:
* "Same" padding with stride 2 halves each dimension (rounding up): input 200 x 300 -> first feature maps 100 x 150 -> second feature maps 50 x 75 -> third feature maps 25 x 38
* 32 bits = 4 bytes per value
* First layer: 4 x 100 x 100 x 150 = 6'000'000 bytes = 6 MB
* Second layer: 4 x 200 x 50 x 75 = 3'000'000 bytes = 3 MB
* Third layer: 4 x 400 x 25 x 38 = 1'520'000 bytes = 1.52 MB
* Assuming optimization (a layer's output is released once the next layer has been computed), the peak occurs while the first two layers are in memory: 6 + 3 = 9 MB of RAM for the feature maps, plus 903'400 parameters x 4 bytes ~ 3.6 MB, plus the image itself, 4 x 200 x 300 x 3 bytes = 720 kB, for a total of about 13.3 MB

For training, the outputs of all layers need to be stored in memory for the backward pass:
* The three layers together require 6 + 3 + 1.52 = 10.52 MB per instance
* For a batch size of 50: 10.52 x 50 = 526 MB
* Size of the input images: 50 x 720 kB = 36 MB
* Model parameters: 3.6 MB
* Minimum RAM for training: 526 + 36 + 3.6 = 565.6 MB (a lower bound, since the gradients also need some memory)
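
The arithmetic above can be double-checked with a short script. This is only a sketch under the same assumptions as the answer: 32-bit floats, 1 MB = 1'000'000 bytes, "same" padding, and no memory counted for gradients. The variable and function names are illustrative.

```python
import math

BYTES_PER_FLOAT = 4  # 32-bit floats


def same_padding_output(size, stride):
    """Output size along one dimension for "same" padding."""
    return math.ceil(size / stride)


height, width, in_channels = 200, 300, 3
kernel, stride = 3, 2
filters_per_layer = [100, 200, 400]

n_params = 0
feature_map_bytes = []
h, w, channels = height, width, in_channels
for n_filters in filters_per_layer:
    # (inputs per kernel + 1 bias) x number of filters
    n_params += (channels * kernel * kernel + 1) * n_filters
    h, w = same_padding_output(h, stride), same_padding_output(w, stride)
    feature_map_bytes.append(BYTES_PER_FLOAT * n_filters * h * w)
    channels = n_filters

param_bytes = BYTES_PER_FLOAT * n_params
image_bytes = BYTES_PER_FLOAT * height * width * in_channels

# Inference: at most two consecutive layers are held in memory at once
peak_pair = max(a + b for a, b in zip(feature_map_bytes, feature_map_bytes[1:]))
inference_bytes = peak_pair + param_bytes + image_bytes

# Training: every layer's output is kept for each instance in the batch
batch_size = 50
training_bytes = batch_size * (sum(feature_map_bytes) + image_bytes) + param_bytes

print(f"Parameters: {n_params:,}")
print(f"Inference:  {inference_bytes / 1e6:.1f} MB")
print(f"Training:   {training_bytes / 1e6:.1f} MB")
```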

## Problem 3

If your GPU runs out of memory while training a CNN, what are five things you could try to solve the problem?

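* Increase the stride in one or more layers to reduce dimensionality
* Stack consecutive 3x3 kernels rather than using 5x5 or 7x7 kernels
* Reduce the mini-batch size
* Use lower-precision data: 16-bit floats instead of 32-bit floats, or even 8-bit values
* Add more max pooling layers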

## Problem 4

Why would you want to add a max pooling layer rather than a convolutional layer with the same stride?

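* A max pooling layer has no parameters to learn, whereas a convolutional layer has quite a few
* Max pooling keeps only the most dominant signal in each window and suppresses weaker, noisier activations -> it acts as a regularizer and can improve generalization performance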
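
As an illustration of the first point, the following NumPy sketch implements 2 x 2 max pooling with stride 2 on a single feature map. It contains no weights at all: the output is fully determined by the input. The function name `max_pool_2x2` and the sample values are made up for this example.

```python
import numpy as np


def max_pool_2x2(feature_map):
    """2 x 2 max pooling with stride 2 on a 2D feature map (no parameters)."""
    h, w = feature_map.shape
    # Drop a trailing row/column if the size is odd ("valid" pooling)
    trimmed = feature_map[:h - h % 2, :w - w % 2]
    # Group pixels into 2 x 2 blocks and keep the maximum of each block
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 0, 5, 6],
                        [1, 2, 7, 8]], dtype=np.float32)

pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4. 2.]
               #  [2. 8.]]
```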

## Problem 5

When would you want to add a local response normalization layer?

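A local response normalization layer makes the most strongly activated neurons inhibit neurons at the same location in neighboring feature maps. Add one when you want to force different feature maps to specialize and learn/explore a wider range of features.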
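
A minimal NumPy sketch of the idea: each activation is divided by a term that grows with the squared activations at the same position in nearby feature maps. The formula is meant to mirror the one documented for `tf.nn.local_response_normalization`; the function below is an illustrative reimplementation, not TensorFlow's code, and the hyperparameter values in the usage example are chosen only to make the result easy to check by hand.

```python
import numpy as np


def local_response_normalization(x, depth_radius=2, bias=1.0, alpha=1e-4, beta=0.75):
    """Normalize across channels. x has shape (height, width, channels)."""
    n_channels = x.shape[-1]
    out = np.empty_like(x)
    for i in range(n_channels):
        # Neighboring feature maps, clipped at the first and last channel
        low = max(0, i - depth_radius)
        high = min(n_channels, i + depth_radius + 1)
        sqr_sum = np.sum(x[..., low:high] ** 2, axis=-1)
        out[..., i] = x[..., i] / (bias + alpha * sqr_sum) ** beta
    return out


# One pixel with three feature maps
x = np.array([[[1.0, 2.0, 3.0]]])
normalized = local_response_normalization(x, depth_radius=1, bias=1.0, alpha=1.0, beta=1.0)
print(normalized)  # 1/6, 2/15, 3/14
```

The strongest activation (3.0) ends up closest to its neighbors after normalization, which is the inhibition effect described above.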

## Problem 6

Can you name the main innovations in AlexNet, compared to LeNet-5? What about the main innovations in GoogLeNet, ResNet, SENet, and Xception?

* AlexNet: much larger and deeper than LeNet-5, stacks convolutional layers directly on top of one another, and uses local response normalization
* GoogLeNet: Inception modules
* ResNet: Residual learning / skip connections
* Xception: Depthwise separable convolution layers
* SENet: SE (squeeze-and-excitation) blocks
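
To give a feel for why depthwise separable convolutions are attractive, this sketch compares parameter counts for a regular and a separable 3 x 3 convolution. The channel counts (128 in, 256 out) are hypothetical, and the separable count assumes one depthwise kernel per input channel, a 1 x 1 pointwise convolution, and a single bias per output channel.

```python
def regular_conv_params(in_channels, out_channels, kernel=3):
    """Regular convolution: every filter spans all input channels."""
    return (kernel * kernel * in_channels + 1) * out_channels


def separable_conv_params(in_channels, out_channels, kernel=3):
    """Depthwise separable convolution: spatial and cross-channel parts are split."""
    depthwise = kernel * kernel * in_channels   # one spatial filter per input channel
    pointwise = in_channels * out_channels      # 1 x 1 convolution mixing channels
    bias = out_channels
    return depthwise + pointwise + bias


regular = regular_conv_params(128, 256)      # 295,168
separable = separable_conv_params(128, 256)  # 34,176

print(f"Regular:   {regular:,}")
print(f"Separable: {separable:,}")
```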

## Problem 7

What is a fully convolutional network? How can you convert a dense layer into a convolutional layer?

* A neural network composed only of convolutional and pooling layers
* Replace the first dense layer with a convolutional layer whose kernel size equals the layer's input size, with one filter per neuron, stride 1, and "valid" padding. The following dense layers become convolutional layers with 1x1 filters

Fully convolutional networks are neural networks composed exclusively of convolutional and pooling layers. FCNs can efficiently process images of any width
and height (at least above the minimum size). They are most useful for object detection and semantic segmentation because they only need to look at the image
once (instead of having to run a CNN multiple times on different parts of the image). If you have a CNN with some dense layers on top, you can convert these
dense layers to convolutional layers to create an FCN: just replace the lowest dense layer with a convolutional layer with a kernel size equal to the layer’s input size, with one filter per neuron in the dense layer, and using "valid" padding.
Generally the stride should be 1, but you can set it to a higher value if you want. The activation function should be the same as the dense layer’s. The other dense layers should be converted the same way, but using 1 × 1 filters. It is actually possible to convert a trained CNN this way by appropriately reshaping the dense layers’ weight matrices.
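
The weight reshaping mentioned above can be verified numerically. The sketch below uses plain NumPy rather than Keras, with hypothetical sizes (a 7 x 7 x 16 feature map feeding a dense layer with 10 neurons) and random weights. It assumes channels-last layout and row-major flattening, and it leaves out the activation function, which would be the same on both sides.

```python
import numpy as np

rng = np.random.default_rng(42)
height, width, channels, units = 7, 7, 16, 10

feature_map = rng.normal(size=(height, width, channels))
dense_kernel = rng.normal(size=(height * width * channels, units))
dense_bias = rng.normal(size=units)

# Dense layer applied to the flattened feature map
dense_out = feature_map.reshape(-1) @ dense_kernel + dense_bias

# Same weights viewed as a conv kernel: (kernel_h, kernel_w, in_channels, filters)
conv_kernel = dense_kernel.reshape(height, width, channels, units)


def conv2d_valid(image, kernel, bias):
    """Stride-1 convolution with "valid" padding, channels last."""
    kh, kw, _, n_filters = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w, n_filters))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw, :]
            out[i, j] = np.tensordot(patch, kernel, axes=3) + bias
    return out


conv_out = conv2d_valid(feature_map, conv_kernel, dense_bias)
print(conv_out.shape)                        # (1, 1, 10)
print(np.allclose(conv_out[0, 0], dense_out))

# On a larger input, the converted layer yields one dense output per location
larger_map = rng.normal(size=(9, 12, channels))
larger_out = conv2d_valid(larger_map, conv_kernel, dense_bias)
print(larger_out.shape)                      # (3, 6, 10)
```

The last lines show the practical benefit: the converted layer accepts a larger input and produces a grid of outputs in a single pass.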

## Problem 8

What is the main technical difficulty of semantic segmentation?

Loss of spatial information in the CNN. Pixel-level information needs to be restored to classify every pixel.

The main technical difficulty of semantic segmentation is the fact that a lot of the spatial information gets lost in a CNN as the signal flows through each layer, especially in pooling layers and layers with a stride greater than 1. This spatial information needs to be restored somehow to accurately predict the class of each pixel.
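
A quick calculation shows how much resolution is lost. The numbers are hypothetical: a 224 x 224 input passing through five layers with stride 2 and "same" padding. The overall stride is the factor by which the output would have to be upsampled to get back to one prediction per pixel.

```python
import math


def downsampled_size(size, strides):
    """Spatial size after a chain of strided layers with "same" padding."""
    for stride in strides:
        size = math.ceil(size / stride)
    return size


input_size = 224
strides = [2, 2, 2, 2, 2]

output_size = downsampled_size(input_size, strides)  # 7
overall_stride = math.prod(strides)                  # 32

print(f"{input_size} x {input_size} -> {output_size} x {output_size}")
print(f"Upsampling factor needed: {overall_stride}")
```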

## Problem 9

Build your own CNN from scratch and try to achieve the highest possible accuracy on MNIST.

In [1]:
import os

import numpy as np
import tensorflow as tf

from tensorflow import keras

In [2]:
# Let TensorFlow allocate GPU memory on demand instead of reserving it all up front
physical_devices = tf.config.list_physical_devices('GPU')
if physical_devices:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

In [3]:
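# Load MNIST, hold out the first 5,000 training images for validation,
# and scale the pixel values to the [0, 1] range
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()
X_valid, X_train = X_train_full[:5000] / 255., X_train_full[5000:] / 255.
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test / 255.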

In [4]:
# Add a channel axis so each image has shape (28, 28, 1), as Conv2D expects
X_train = X_train[..., np.newaxis].astype('float32')
X_valid = X_valid[..., np.newaxis].astype('float32')
X_test = X_test[..., np.newaxis].astype('float32')

In [5]:
model = keras.models.Sequential([
    keras.layers.Conv2D(64, 3, activation='relu', padding='same', input_shape=X_train.shape[1:]),
    keras.layers.Conv2D(64, 3, activation='relu', padding='same'),
    keras.layers.MaxPooling2D(2),
    keras.layers.Conv2D(128, 3, activation='relu', padding='same'),
    keras.layers.Conv2D(128, 3, activation='relu', padding='same'),
    keras.layers.MaxPooling2D(2),
    keras.layers.Flatten(),
    keras.layers.Dense(100, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(50, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation='softmax')
])

In [6]:
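# The labels are integer class IDs, so only 'accuracy' and the "sparse" metric are meaningful here;
# 'categorical_accuracy' and 'top_k_categorical_accuracy' expect one-hot labels
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='nadam',
              metrics=['accuracy', 'categorical_accuracy',
                       'sparse_top_k_categorical_accuracy', 'top_k_categorical_accuracy'])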

In [7]:
run_index = 1 # increment every time you train the model
run_logdir = os.path.join(os.curdir, "my_mnist_logs", "run_{:03d}".format(run_index))

In [8]:
%load_ext tensorboard

In [9]:
%tensorboard --logdir=./my_mnist_logs --port=6006

Reusing TensorBoard on port 6006 (pid 4312), started 4:03:23 ago. (Use '!kill 4312' to kill it.)

In [12]:
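# Make sure the checkpoint directory exists
os.makedirs('./models', exist_ok=True)

# Keep the best model (lowest validation loss), stop after 10 epochs
# without improvement, and log training curves for TensorBoard
modelcheckpoint_cb = keras.callbacks.ModelCheckpoint('./models/my_mnist_cnn.h5', save_best_only=True)
earlystopping_cb = keras.callbacks.EarlyStopping(patience=10)
tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)
callbacks = [modelcheckpoint_cb, earlystopping_cb, tensorboard_cb]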

In [13]:
keras.backend.clear_session()
tf.random.set_seed(42)
np.random.seed(42)

In [14]:
history = model.fit(X_train, y_train, epochs=200, validation_data=(X_valid, y_valid), callbacks = callbacks)

Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200


In [15]:
model.evaluate(X_valid, y_valid)



[0.041420646011829376,
 0.993399977684021,
 0.09600000083446503,
 0.9998000264167786,
 0.20880000293254852]

In [16]:
model.evaluate(X_train, y_train)



[0.004471472930163145,
 0.9987454414367676,
 0.09907272458076477,
 0.9999818205833435,
 0.21067272126674652]

In [17]:
model_load = keras.models.load_model('./models/my_mnist_cnn.h5')
model_load.evaluate(X_valid, y_valid)



[0.02713191509246826,
 0.09640000015497208,
 0.09640000015497208,
 0.9998000264167786,
 0.2433999925851822]

In [18]:
model_load.evaluate(X_train, y_train)



[0.017329227179288864,
 0.09901817888021469,
 0.09901817888021469,
 0.9998363852500916,
 0.23772726953029633]

In [44]:
y_pred = np.argmax(model.predict(X_train), axis=-1)
sum(y_train == y_pred) / len(y_pred)

0.9983636363636363

In [45]:
y_pred = np.argmax(model_load.predict(X_train), axis=-1)
sum(y_train == y_pred) / len(y_pred)

0.9987454545454545

In [46]:
y_pred = np.argmax(model_load.predict(X_valid), axis=-1)
sum(y_valid == y_pred) / len(y_pred)

0.9938

In [47]:
y_pred = np.argmax(model.predict(X_valid), axis=-1)
sum(y_valid == y_pred) / len(y_pred)

0.9934

In [49]:
model_load.metrics_names

['loss', 'accuracy']

In [51]:
model.metrics_names

['loss', 'accuracy']

In [52]:
model.save('./my_mnist_cnn_test.h5')

In [53]:
model_test = keras.models.load_model('./my_mnist_cnn_test.h5')

In [55]:
model_test.evaluate(X_valid, y_valid)



[0.06746520102024078, 0.09600000083446503]

In [58]:
np.all(model.predict(X_valid) == model_test.predict(X_valid))

True

In [59]:
keras.backend.clear_session()
tf.random.set_seed(42)
np.random.seed(42)

In [60]:
model_test = keras.models.load_model('./my_mnist_cnn_test.h5')
model_test.evaluate(X_valid, y_valid)



[0.06746520102024078, 0.09600000083446503]

In [61]:
tf.__version__

'2.3.0'

In [62]:
model.evaluate(X_valid, y_valid)



[0.06746520102024078, 0.993399977684021]

In [63]:
model.metrics_names

['loss', 'accuracy']

In [64]:
model.metrics

[<tensorflow.python.keras.metrics.Mean at 0x182c6d43e08>,
 <tensorflow.python.keras.metrics.MeanMetricWrapper at 0x182d3c3a248>]

In [None]:
keras.metrics.Accuracy()

In [67]:
model.layers[0].get_weights()[0][0][0][0]

array([-0.02089118, -0.03093303,  0.05925622,  0.05330097,  0.0412606 ,
       -0.01940396, -0.07862163, -0.14383893, -0.57217515,  0.01543191,
        0.00902614,  0.04170617, -0.34896886, -0.03886291,  0.02617732,
        0.0256169 ,  0.24014159, -0.33885354, -0.07032701,  0.03729414,
       -0.02416613, -0.22400498, -0.00736859, -0.13455546,  0.11253003,
       -0.12499595,  0.10964944, -0.170484  , -0.18827076, -0.1883732 ,
        0.03201261, -0.11146889,  0.29235902,  0.06446967, -0.06647974,
       -0.00145599,  0.04357658, -0.00444707, -0.09551448, -0.02749488,
        0.03206693, -0.08407733,  0.09534223, -0.14873868, -0.28262717,
       -0.15177178,  0.07111242,  0.01771266, -0.03874651, -0.09678347,
       -0.00453642, -0.0568822 , -0.02944077, -0.05342673, -0.01625358,
       -0.05732775,  0.0736178 ,  0.01949573, -0.07842135, -0.0534521 ,
        0.09091351,  0.01105258, -0.02580779, -0.18441854], dtype=float32)

In [68]:
model_test.layers[0].get_weights()[0][0][0][0]

array([-0.02089118, -0.03093303,  0.05925622,  0.05330097,  0.0412606 ,
       -0.01940396, -0.07862163, -0.14383893, -0.57217515,  0.01543191,
        0.00902614,  0.04170617, -0.34896886, -0.03886291,  0.02617732,
        0.0256169 ,  0.24014159, -0.33885354, -0.07032701,  0.03729414,
       -0.02416613, -0.22400498, -0.00736859, -0.13455546,  0.11253003,
       -0.12499595,  0.10964944, -0.170484  , -0.18827076, -0.1883732 ,
        0.03201261, -0.11146889,  0.29235902,  0.06446967, -0.06647974,
       -0.00145599,  0.04357658, -0.00444707, -0.09551448, -0.02749488,
        0.03206693, -0.08407733,  0.09534223, -0.14873868, -0.28262717,
       -0.15177178,  0.07111242,  0.01771266, -0.03874651, -0.09678347,
       -0.00453642, -0.0568822 , -0.02944077, -0.05342673, -0.01625358,
       -0.05732775,  0.0736178 ,  0.01949573, -0.07842135, -0.0534521 ,
        0.09091351,  0.01105258, -0.02580779, -0.18441854], dtype=float32)

In [72]:
np.all(model_test.layers[1].get_weights()[0][0][0][0] == model.layers[1].get_weights()[0][0][0][0])

True

In [73]:
model.evaluate(X_valid, y_valid)



[0.06746520102024078, 0.993399977684021]

In [74]:
model_test.evaluate(X_valid, y_valid)



[0.06746520102024078, 0.09600000083446503]

In [83]:
model_test.evaluate(X_valid, y_valid)



[0.06746520102024078, 0.09600000083446503]

## Problem 10

Use transfer learning for large image classification, going through these steps:

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import os
import tensorflow as tf

from tensorflow.keras.preprocessing import image_dataset_from_directory

### a.

Create a training set containing at least 100 images per class. For example, you could classify your own pictures based on the location (beach, mountain, city, etc.), or alternatively you can use an existing dataset (e.g., from TensorFlow Datasets).

In [None]:
_URL = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'
path_to_zip = tf.keras.utils.get_file('cats_and_dogs.zip', origin=_URL, extract=True)
PATH = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')

train_dir = os.path.join(PATH, 'train')
validation_dir = os.path.join(PATH, 'validation')

BATCH_SIZE = 32
IMG_SIZE = (160, 160)

### b.

Split it into a training set, a validation set, and a test set.

In [None]:
train_dataset = image_dataset_from_directory(train_dir,
                                             shuffle=True,
                                             batch_size=BATCH_SIZE,
                                             image_size=IMG_SIZE)

In [None]:
validation_dataset = image_dataset_from_directory(validation_dir,
                                                  shuffle=True,
                                                  batch_size=BATCH_SIZE,
                                                  image_size=IMG_SIZE)

In [None]:
class_names = train_dataset.class_names

plt.figure(figsize=(10, 10))
for images, labels in train_dataset.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
        plt.axis("off")

In [None]:
# Hold out one fifth of the validation batches as a test set
val_batches = tf.data.experimental.cardinality(validation_dataset)
test_dataset = validation_dataset.take(val_batches // 5)
validation_dataset = validation_dataset.skip(val_batches // 5)

In [None]:
print('Number of validation batches: %d' % tf.data.experimental.cardinality(validation_dataset))
print('Number of test batches: %d' % tf.data.experimental.cardinality(test_dataset))
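
The batch counts printed above follow from simple integer arithmetic. The sketch below reproduces it without TensorFlow; the figure of 1,000 validation images is an assumption about the dataset, not something stated in this notebook.

```python
import math

n_validation_images = 1000  # assumed size of the validation folder
batch_size = 32             # BATCH_SIZE in the notebook

# The last batch may be partial, hence the ceiling
val_batches = math.ceil(n_validation_images / batch_size)

n_test_batches = val_batches // 5                      # taken for the test set
n_validation_batches = val_batches - n_test_batches    # left for validation

print('Number of validation batches: %d' % n_validation_batches)
print('Number of test batches: %d' % n_test_batches)
```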

In [None]:
AUTOTUNE = tf.data.experimental.AUTOTUNE

train_dataset = train_dataset.prefetch(buffer_size=AUTOTUNE)
validation_dataset = validation_dataset.prefetch(buffer_size=AUTOTUNE)
test_dataset = test_dataset.prefetch(buffer_size=AUTOTUNE)

### c.

Build the input pipeline, including the appropriate preprocessing operations, and optionally add data augmentation.

In [None]:
preprocess_input = tf.keras.applications.mobilenet_v2.preprocess_input

In [None]:
# Create the base model from the pre-trained model MobileNet V2
IMG_SHAPE = IMG_SIZE + (3,)
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')

In [None]:
image_batch, label_batch = next(iter(train_dataset))
feature_batch = base_model(image_batch)
print(feature_batch.shape)

In [None]:
base_model.trainable = False

In [None]:
# Let's take a look at the base model architecture
base_model.summary()

In [None]:
# Add a classification head
global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
feature_batch_average = global_average_layer(feature_batch)
print(feature_batch_average.shape)

In [None]:
prediction_layer = tf.keras.layers.Dense(1)
prediction_batch = prediction_layer(feature_batch_average)
print(prediction_batch.shape)

In [None]:
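# Assumed augmentation pipeline (`data_augmentation` is not defined elsewhere in this notebook).
# TF 2.3 API; recent versions expose these as tf.keras.layers.RandomFlip / RandomRotation
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.experimental.preprocessing.RandomFlip('horizontal'),
    tf.keras.layers.experimental.preprocessing.RandomRotation(0.2),
])

# Full model: augmentation -> preprocessing -> frozen base model -> classification head
inputs = tf.keras.Input(shape=(160, 160, 3))
x = data_augmentation(inputs)
x = preprocess_input(x)
x = base_model(x, training=False)
x = global_average_layer(x)
x = tf.keras.layers.Dropout(0.2)(x)
outputs = prediction_layer(x)
model = tf.keras.Model(inputs, outputs)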

In [None]:
base_learning_rate = 0.0001
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate),
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])

In [None]:
model.summary()

In [None]:
initial_epochs = 10

loss0, accuracy0 = model.evaluate(validation_dataset)

In [None]:
print("initial loss: {:.2f}".format(loss0))
print("initial accuracy: {:.2f}".format(accuracy0))

In [None]:
history = model.fit(train_dataset,
                    epochs=initial_epochs,
                    validation_data=validation_dataset)

In [None]:
# learning curves
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.ylabel('Accuracy')
plt.ylim([min(plt.ylim()),1])
plt.title('Training and Validation Accuracy')

plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.ylabel('Cross Entropy')
plt.ylim([0,1.0])
plt.title('Training and Validation Loss')
plt.xlabel('epoch')
plt.show()

### d.

Fine-tune a pretrained model on this dataset.

In [None]:
base_model.trainable = True

In [None]:
# Let's take a look to see how many layers are in the base model
print("Number of layers in the base model: ", len(base_model.layers))

# Fine-tune from this layer onwards
fine_tune_at = 100

# Freeze all the layers before the `fine_tune_at` layer
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False

In [None]:
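# Recompile so that the changes to `trainable` take effect; use a lower learning rate for fine-tuning
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.RMSprop(learning_rate=base_learning_rate / 10),
              metrics=['accuracy'])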

In [None]:
model.summary()

In [None]:
len(model.trainable_variables)

In [None]:
fine_tune_epochs = 10
total_epochs =  initial_epochs + fine_tune_epochs

history_fine = model.fit(train_dataset,
                         epochs=total_epochs,
                         initial_epoch=history.epoch[-1],
                         validation_data=validation_dataset)

In [None]:
acc += history_fine.history['accuracy']
val_acc += history_fine.history['val_accuracy']

loss += history_fine.history['loss']
val_loss += history_fine.history['val_loss']

In [None]:
plt.figure(figsize=(8, 8))
plt.subplot(2, 1, 1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.ylim([0.8, 1])
plt.plot([initial_epochs-1,initial_epochs-1],
          plt.ylim(), label='Start Fine Tuning')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(2, 1, 2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.ylim([0, 1.0])
plt.plot([initial_epochs-1,initial_epochs-1],
         plt.ylim(), label='Start Fine Tuning')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.xlabel('epoch')
plt.show()

In [None]:
loss, accuracy = model.evaluate(test_dataset)
print('Test accuracy :', accuracy)

In [None]:
# Retrieve a batch of images from the test set
image_batch, label_batch = test_dataset.as_numpy_iterator().next()
predictions = model.predict_on_batch(image_batch).flatten()

# Apply a sigmoid since our model returns logits
predictions = tf.nn.sigmoid(predictions)
predictions = tf.where(predictions < 0.5, 0, 1)

print('Predictions:\n', predictions.numpy())
print('Labels:\n', label_batch)

plt.figure(figsize=(10, 10))
for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(image_batch[i].astype("uint8"))
    plt.title(class_names[predictions[i]])
    plt.axis("off")
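
The thresholding step in the cell above can be checked without TensorFlow. Because the model outputs logits, applying a sigmoid and thresholding at 0.5 gives the same classes as checking the sign of the logit. The logit values below are made up for illustration.

```python
import math


def sigmoid(logit):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


logits = [-2.0, -0.1, 0.3, 4.0]  # hypothetical model outputs

probabilities = [sigmoid(z) for z in logits]
predicted_classes = [0 if p < 0.5 else 1 for p in probabilities]

# Equivalent shortcut: a negative logit means class 0, otherwise class 1
sign_based_classes = [0 if z < 0 else 1 for z in logits]

print(predicted_classes)   # [0, 0, 1, 1]
print(sign_based_classes)  # [0, 0, 1, 1]
```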

## Problem 11

Go through TensorFlow’s Style Transfer tutorial. It is a fun way to generate art using Deep Learning.