<a href="https://colab.research.google.com/github/google/applied-machine-learning-intensive/blob/master/content/05_deep_learning/00_convolutional_neural_networks/colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### Copyright 2020 Google LLC.

In [0]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are deep neural networks with the addition of two very special types of layers: **convolutional layers** and **pooling layers**. We will take a look at both in this lesson.

## Convolutional Layers

Convolutional layers are layers in a neural network that only partially connect to their input layers. The layer is divided into receptive fields that each only look at a portion of the input layer and apply filters to it.

Let's see this in action. First, we will create a 100 x 100 x 3 image that contains red vertical stripes centered every 10 pixels on the image.

In [0]:
import matplotlib.pyplot as plt
import numpy as np

# Create an image that is completely black.
vertical_stripes = np.zeros((100, 100, 3))

# Loop over the image 10 pixels at a time, turning the centerline of vertical
# pixels red.
for x in range(4, 101, 10):
  vertical_stripes[:, x:x+2, 0] = 1.0

_ = plt.imshow(vertical_stripes)

Now let's create a filter we'll apply using TensorFlow's [tf.nn.conv2d](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d) function.

For illustrative purposes we'll create a filter to extract the red out of the image we just created. The filter will be `10 x 10 x 3`. (`10 x 10` is the size of our receptor because our vertical red lines are centered within every 10 pixels. `3` is the number of color channels we are reading because our image has RGB values.) The final number in the filter (1) is the number of output channels we'd like the filter to produce. These output channels are called "feature maps." You get one feature map per filter.

In [0]:
receptor_height, receptor_width = 10, 10
input_color_channels, output_color_channels = 3, 1

filters = np.zeros(shape=(receptor_height, receptor_width, input_color_channels,
                          output_color_channels), dtype=np.float32)

We created our filter and set it to all zeros. We now need to indicate which portion of the receptor field we want to extract data from. In this case we are trying to extract the vertical red line that repeats every ten pixels. Counting from zero, the loop above painted columns 4 and 5 of each 10-pixel block red. The slice `5:7` below selects columns 5 and 6 of the filter, so in every row it overlaps one of the two red columns and one black column.

In [0]:
filters[:, 5:7, :, 0] = 1

Now let's get our image ready to pass to our convolutional layer. To do that we package the 3-dimensional image in yet another array to create a dataset for TensorFlow. TensorFlow's convolutional function expects a 4-dimensional dataset: image count, height, width, and color channels.

In [0]:
dataset = np.array([vertical_stripes], dtype=np.float32)
image_count, image_height, image_width, color_channels = dataset.shape

image_count, image_height, image_width, color_channels

To get the image into TensorFlow we need to convert it into a Tensor.

In [0]:
import tensorflow as tf

X = tf.convert_to_tensor(dataset, dtype=tf.float32)

To create our convolutional layer, we use [tf.nn.conv2d](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d). The arguments we are passing it are:

*  The image that we are processing.
*  The filters we want to apply to the data. In this case we are passing in the filter that will capture the middle vertical pixels in a 10x10 receptor.
*  The strides we want the layer to take when operating on the data, one per input dimension (image, height, width, channel). The 1s at either end mean that every image and every color channel is processed. The 10s cause the receptor to shift by 10 pixels with every vertical and horizontal step through the image. This is exactly our filter size, so each receptor lines up with one repetition of the stripe pattern. In practice you'd likely want some overlap.
*  A padding argument we input as "SAME", which causes TensorFlow to pad the image if necessary (as evenly as possible on each side) in order to make the filter process the entire image.
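
The effect of the stride and padding arguments on the output size can be sketched in plain Python. This is a minimal sketch of the standard "SAME" and "VALID" size formulas for one spatial dimension; the helper name `conv_output_size` is hypothetical and not part of TensorFlow.

```python
import math

def conv_output_size(input_size, filter_size, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    if padding == "SAME":
        # The input is padded so every position is covered.
        return math.ceil(input_size / stride)
    if padding == "VALID":
        # No padding: only windows that fit entirely inside the input count.
        return math.ceil((input_size - filter_size + 1) / stride)
    raise ValueError(f"unknown padding: {padding}")

# The striped image: 100 pixels wide, 10-pixel filter, stride of 10.
print(conv_output_size(100, 10, 10, "SAME"))   # 10
print(conv_output_size(100, 10, 10, "VALID"))  # 10
# With a stride of 4, SAME padding keeps the partial window at the edge.
print(conv_output_size(100, 10, 4, "SAME"))    # 25
print(conv_output_size(100, 10, 4, "VALID"))   # 23
```

With a stride equal to the filter size and an image width that is a multiple of it, no padding is needed, so both modes give the same result.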

In [0]:
convolution = tf.nn.conv2d(X, filters, strides=[1, 10, 10, 1], padding="SAME")

TensorFlow 2 executes operations eagerly, so the convolution has already been computed. We only need to convert the resulting tensor to a NumPy array to inspect it.

Notice our output shape reduces the input image to a 10 x 10 x 1 matrix from a 100 x 100 x 3 matrix. This is because we processed the image using a 10 x 10 single-channel output filter and stepped 10 pixels each time.

In [0]:
output = convolution.numpy()
output.shape

Looking at the image isn't very telling. It simply looks like a single-color image.

In [0]:
plt.imshow(output[0, :, :, 0])

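When we look at the data, we can see that the values are uniformly 10. Each receptor spans 10 rows and the filter overlaps one red column in each row, so every output value is the sum of 10 red pixels with a value of 1.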

In [0]:
np.unique(output)

What happens if we widen our vertical filter to cover the four center columns (zero-based columns 4-7, rather than just columns 5-6)? The filter now overlaps both red columns plus two black ones. Black pixels contribute 0, so our output number changes to 20: 10 rows times 2 red pixels.


In [0]:
filters = np.zeros(shape=(receptor_height, receptor_width, input_color_channels,
                          output_color_channels), dtype=np.float32)
filters[:, 4:8, :, :] = 1

X = tf.convert_to_tensor(dataset)
convolution = tf.nn.conv2d(X, filters, strides=[1,10,10,1], padding="SAME")

output = convolution.numpy()

np.unique(output)

If we move our filter to only capture black pixels, our output becomes 0.

In [0]:
filters = np.zeros(shape=(receptor_height, receptor_width, input_color_channels,
                          output_color_channels), dtype=np.float32)
filters[:, :2, :, :] = 1

X = tf.convert_to_tensor(dataset)
convolution = tf.nn.conv2d(X, filters, strides=[1,10,10,1], padding="SAME")

output = convolution.numpy()

np.unique(output)
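
The three results above (10, 20, and 0) can be cross-checked without TensorFlow. The sketch below rebuilds the striped image and computes the response of the first 10 x 10 receptor as an element-wise product and sum. It assumes that no padding is added, which holds here because the stride equals the filter size and divides the image width evenly. The helper name `first_field_response` is hypothetical.

```python
import numpy as np

# Rebuild the striped image: red columns at zero-based indices 4-5, 14-15, ...
image = np.zeros((100, 100, 3), dtype=np.float32)
for x in range(4, 100, 10):
    image[:, x:x + 2, 0] = 1.0

def first_field_response(column_slice):
    """Sum of the first 10 x 10 receptor under a column-mask filter."""
    mask = np.zeros((10, 10, 3), dtype=np.float32)
    mask[:, column_slice, :] = 1
    window = image[0:10, 0:10, :]
    return float(np.sum(window * mask))

print(first_field_response(slice(5, 7)))  # 10.0
print(first_field_response(slice(4, 8)))  # 20.0
print(first_field_response(slice(0, 2)))  # 0.0
```

Because the stripes repeat every 10 pixels, every other receptor sees the same window, which is why `np.unique` reports a single value each time.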

Let's look at a convolutional layer on a real image. We'll load a sample image from scikit-learn.

In [0]:
from sklearn.datasets import load_sample_image

china = load_sample_image('china.jpg')

plt.imshow(china)

We will package the image in a 4-dimensional matrix for processing by TensorFlow.

In [0]:
dataset = np.array([china], dtype=np.float32)
image_count, image_height, image_width, color_channels = dataset.shape

image_count, image_height, image_width, color_channels

To see the convolutional layer in action, let's recreate our vertical line filter and apply it to the image.

In [0]:
receptor_height, receptor_width = 10, 10
input_color_channels, output_color_channels = 3, 1
filters = np.zeros(shape=(receptor_height, receptor_width, input_color_channels,
                          output_color_channels), dtype=np.float32)
filters[:, 5:7, :, :] = 1

image_count, image_height, image_width, color_channels = dataset.shape
X = tf.convert_to_tensor(dataset)

convolution = tf.nn.conv2d(X, filters, strides=[1,4,4,1], padding="SAME")

output = convolution.numpy()

plt.imshow(output[0, :, :, 0], cmap="gray")
plt.show()

You won't typically define your own filters. You can let TensorFlow discover them by using [tf.keras.layers.Conv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D) instead of [tf.nn.conv2d](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d).

In this example we ask for three features with a 5x5 visual receptor, stepping two pixels at a time.

In [0]:
image_count, image_height, image_width, color_channels = dataset.shape
X = tf.convert_to_tensor(dataset)

convolution = tf.keras.layers.Conv2D(filters=3, kernel_size=5, strides=[2,2],
                               padding="SAME")

output = convolution(X)
output = output.numpy()

Let's look at the first feature map.

In [0]:
plt.imshow(output[0, :, :, 0])
plt.show()

Here is the second feature map.

In [0]:
plt.imshow(output[0, :, :, 1])
plt.show()

And the third.

In [0]:
plt.imshow(output[0, :, :, 2])
plt.show()

## Pooling Layers

Pooling layers are used to shrink the data from their input layer by sampling the data per receptor. Let's look at an example. We'll first load a sample image.

In [0]:
flower = load_sample_image('flower.jpg')

plt.imshow(flower)
plt.show()

We can package this image in a 4-dimensional matrix and pass it to the [tf.nn.max_pool](https://www.tensorflow.org/api_docs/python/tf/nn/max_pool) function. This function extracts the maximum value from each receptor field.

In the example below, we create a 2 x 2 receptor and move it around the image, shifting 2 pixels each time. This reduces the height and width of the image by half, effectively reducing our dataset size by 75%.

In [0]:
dataset = np.array([flower], dtype=np.float32)

X = tf.convert_to_tensor(dataset)
max_pool = tf.nn.max_pool(X, ksize=[1,2,2,1], strides=[1,2,2,1],
                          padding="VALID")

output = max_pool.numpy()

plt.imshow(output[0].astype(np.uint8))
plt.show()
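
To make the pooling arithmetic concrete, here is a minimal NumPy sketch of 2 x 2 max pooling with a stride of 2 on a single-channel array. The function name `max_pool_2x2` is hypothetical; it drops any leftover odd row or column, which mirrors "VALID" padding.

```python
import numpy as np

def max_pool_2x2(image):
    """2 x 2 max pooling with stride 2 on a 2-D array (VALID padding)."""
    height, width = image.shape
    half_h, half_w = height // 2, width // 2
    # Drop a trailing odd row or column, then group pixels into 2 x 2 blocks.
    trimmed = image[:half_h * 2, :half_w * 2]
    blocks = trimmed.reshape(half_h, 2, half_w, 2)
    # Take the maximum within each block.
    return blocks.max(axis=(1, 3))

sample = np.arange(16).reshape(4, 4)
pooled = max_pool_2x2(sample)
print(pooled)
# [[ 5  7]
#  [13 15]]
```

The 4 x 4 input has 16 values and the 2 x 2 output has 4, which is the 75% reduction described above.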

## Exercise 1: Manual Filtering

Use [tf.nn.conv2d](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d) to apply a stack of filters to the scikit-learn built-in flower image mentioned earlier in this colab.

* Create a (7, 7, 3, 2) filter set. The `2` at the end indicates that we'll create two filters and get two output channels (feature maps).
* Make the first filter be a vertical line filter on the middle pixel of each row.
* Make the second filter be a horizontal line filter on the middle pixel of each column.
* Pass the flower image and filters to [tf.nn.conv2d](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d), stepping 3 pixels vertically and horizontally.
* Display the first feature map as an image.
* Display the second feature map as an image.

### **Student Solution**

In [0]:
# Create your filters and apply them to the flower image using TensorFlow here.

# Use PyPlot to output the first feature map here.

# Use PyPlot to output the second feature map here.

---

### Answer Key

In [0]:
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from sklearn.datasets import load_sample_image

flower = load_sample_image('flower.jpg')

receptor_height, receptor_width = 7, 7
input_color_channels, output_color_channels = 3, 2
filters = np.zeros(shape=(receptor_height, receptor_width, input_color_channels,
                          output_color_channels), dtype=np.float32)
filters[:, 3:4, :, 0] = 1
filters[3:4, :, :, 1] = 1

X = tf.convert_to_tensor([flower], dtype=tf.float32)

convolution = tf.nn.conv2d(X, filters, strides=[1,3,3,1], padding="SAME")

output = convolution.numpy()

plt.imshow(output[0, :, :, 0], cmap="gray")
plt.show()

plt.imshow(output[0, :, :, 1], cmap="gray")
plt.show()

---

## Building a CNN

Now that we have learned about the component parts of a convolutional neural network, let's actually build one.

In this section we will use the [Fruits 360](https://www.kaggle.com/moltean/fruits) dataset that is hosted on Kaggle.

Upload your `kaggle.json` file and run the code below to download the file with the Kaggle API.

In [0]:
! chmod 600 kaggle.json && (ls ~/.kaggle 2>/dev/null || mkdir ~/.kaggle) && mv kaggle.json ~/.kaggle/ && echo 'Done'
! kaggle datasets download moltean/fruits
! ls

The dataset file is `fruits.zip`. Let's unzip and inspect it.

In [0]:
import os
import zipfile

zipfile.ZipFile('fruits.zip').extractall()
os.listdir('./fruits-360/')

We've listed the unzipped directory. Inside it there are two primary folders we'll work with in this dataset:

* Test
* Training

There are folders for each category in the `Test` and `Training` folders. Let's make sure all of the categories are represented in test and train, and let's see how many categories we are working with.

In [0]:
train_dir = './fruits-360/Training'
train_categories = set(os.listdir(train_dir))
test_dir = './fruits-360/Test'
test_categories = set(os.listdir(test_dir))

if train_categories.symmetric_difference(test_categories):
  print("Warning!: ", train_categories.symmetric_difference(test_categories))

print(sorted(train_categories))
print(len(train_categories))

`131` categories, each with representation in test and train.

According to the documentation, the images are all `100x100` pixels. Let's load one and see what the images look like.


In [0]:
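import cv2 as cv
import matplotlib.pyplot as plt

sample_dir = os.path.join(train_dir, 'Lychee')
img = cv.imread(os.path.join(sample_dir, os.listdir(sample_dir)[0]))
# OpenCV loads images in BGR channel order; convert to RGB for Matplotlib.
img = cv.cvtColor(img, cv.COLOR_BGR2RGB)
_ = plt.imshow(img)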

We can also verify that the shape is what we expect.

In [0]:
img.shape

We find a `100x100` pixel image with three channels of color.

We can see the color encoding range:

In [0]:
img.min(), img.max()

This hints at a `[0, 255]` range. Depending on how long our model takes to train, it might be wise to scale the values down to `[0.0, 1.0]`, but we'll hold off for now.
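
If we did decide to scale, the operation itself is a single division. The sketch below uses a small hypothetical array (`pixels`) as a stand-in for a loaded image, so it runs without the dataset.

```python
import numpy as np

# Hypothetical stand-in for a loaded image: 8-bit values in [0, 255].
pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Cast to float32 first so the result stays float32; dividing the uint8
# array directly would produce float64.
scaled = pixels.astype(np.float32) / 255.0

print(scaled.min(), scaled.max())  # 0.0 1.0
```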

Now we need to find a way to get the images into the model. TensorFlow Keras has a class called [`DirectoryIterator`](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/DirectoryIterator) that can help with that.

The iterator pulls images from a directory and passes them to our model in batches. There are many settings we can change. In our example here, we set the `target_size` to the size of our input images. Notice that we don't provide a third dimension even though these are RGB files. This is because the default `color_mode` is `'rgb'`, which implies three values.

We also set `image_data_generator` to `None`. If we wanted to, we could have passed an [`ImageDataGenerator`](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator) to augment the image and increase the size of our dataset. We'll save this for an exercise.

In [0]:
import tensorflow as tf

train_dir = './fruits-360/Training'

train_image_iterator = tf.keras.preprocessing.image.DirectoryIterator(
    target_size=(100, 100),
    directory=train_dir,
    batch_size=128,
    image_data_generator=None)

The output for the code above notes that `67,692` images were found across `131` classes. These classes are the directories that were in our root folder. They are sorted, so the actual values of the classes are:

* 0 - Apple Braeburn
* 1 - Apple Crimson Snow
* 2 - Apple Golden 1
* ... 
* 128 - Tomato not Ripened
* 129 - Walnut
* 130 - Watermelon

We can validate that using the code below.

In [0]:
print(train_image_iterator.filepaths[np.where(train_image_iterator.labels == 0)[0][0]])
print(train_image_iterator.filepaths[np.where(train_image_iterator.labels == 1)[0][0]])
print(train_image_iterator.filepaths[np.where(train_image_iterator.labels == 2)[0][0]])
print('...')
print(train_image_iterator.filepaths[np.where(train_image_iterator.labels == 128)[0][0]])
print(train_image_iterator.filepaths[np.where(train_image_iterator.labels == 129)[0][0]])
print(train_image_iterator.filepaths[np.where(train_image_iterator.labels == 130)[0][0]])

Let's build our model now. We'll use the [`Sequential`](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential) and [`Dense`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) classes that we've used in many previous labs, as well as a few new classes:

* [`Conv2D`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D) which creates a convolutional layer.
* [`MaxPool2D`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPooling2D) which creates a pooling layer.
* [`Flatten`](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten) which creates a layer that converts a multidimensional tensor down to a flat tensor.

You can see the entire model below. We input our images into a convolutional layer followed by a pooling layer. After stacking a few convolutional layers and pooling layers, we flatten the final pooling output and finish with some traditional dense layers. The final dense layer is `131` nodes wide and is activated by softmax. This layer represents our classification predictions.

In [0]:
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu',
                           input_shape=(100, 100, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(131, activation='softmax')
])

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.summary()
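
The shapes and parameter counts in the summary can be reproduced by hand. The sketch below assumes the Keras defaults for `MaxPooling2D` (a 2 x 2 pool with a stride of 2 and no padding, so each side is halved and rounded down). The helper names `conv_params` and `dense_params` are hypothetical.

```python
def conv_params(kernel, in_channels, filters):
    # One kernel x kernel x in_channels weight block plus one bias per filter.
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    # One weight per input-unit pair plus one bias per unit.
    return (inputs + 1) * units

size, channels, total = 100, 3, 0
for filters in (16, 32, 64):
    total += conv_params(3, channels, filters)  # 'same' padding keeps the size
    channels = filters
    size //= 2  # 2 x 2 max pooling halves (and floors) each side

flattened = size * size * channels
total += dense_params(flattened, 512) + dense_params(512, 131)

print(size, flattened, total)  # 12 9216 4809891
```

Almost all of the parameters sit in the first dense layer, which connects the 9,216 flattened values to 512 nodes.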

Now let's start training. Let one or two epochs run but then **!!!! STOP THE CELL FROM RUNNING !!!!**

How long was each epoch taking? Ours was taking about `4` minutes. Let's do the math. If each epoch takes `4` minutes and we run the `10` epochs requested in the cell below, then we'd be training for `40` minutes.

Luckily there is a better way. In the menu click on 'Runtime' and then 'Change runtime type'. In the modal that appears, there is an option called 'Hardware accelerator' that is set to 'None'. Change this to 'GPU' and save your settings.

Your runtime will change, so you'll need to go back to the start of this section and run all of the cells from the start. Don't forget to upload your `kaggle.json` again.

When you get back to this cell a second time and start it running, you should notice a big improvement in training time. We were getting `9` seconds per epoch, which is about `90` seconds for all `10` epochs. That is much better. Let the cell run to completion (hopefully about a minute and a half). You should see it progressing as it is running.

In [0]:
history = model.fit(
    train_image_iterator,
    epochs=10,
)

You might have noticed that each epoch only processed `529` items. These are batches, not images. We set our [`DirectoryIterator`](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/DirectoryIterator) batch size to `128`. We have `67,692` images. `67,692 / 128 = 528.84375`, which rounds up to `529` because the final batch is only partially full.
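
The batch count can be checked with a few lines of arithmetic, using the image count and batch size reported above.

```python
import math

image_count = 67692  # images found by the training iterator
batch_size = 128

# Full batches, plus the number of images left over for a final partial batch.
full_batches, remainder = divmod(image_count, batch_size)
steps_per_epoch = math.ceil(image_count / batch_size)

print(full_batches, remainder, steps_per_epoch)  # 528 108 529
```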

Now let's plot our training accuracy over time.

In [0]:
import matplotlib.pyplot as plt

plt.plot(list(range(len(history.history['accuracy']))),
         history.history['accuracy'])
plt.show()

And our loss.

In [0]:
import matplotlib.pyplot as plt

plt.plot(list(range(len(history.history['loss']))), history.history['loss'])
plt.show()

Over `99%` training accuracy. Let's see how well this generalizes:

In [0]:
import tensorflow as tf

test_dir = './fruits-360/Test'

test_image_iterator = tf.keras.preprocessing.image.DirectoryIterator(
    target_size=(100, 100),
    directory=test_dir,
    batch_size=128,
    shuffle=False,
    image_data_generator=None)

model.evaluate(test_image_iterator)

When we ran this, we got just under `90%` accuracy, so we are definitely overfitting.

We can also make predictions. The code below selects the next batch, gets predictions for it, and then returns the first prediction.

In [0]:
predicted_class = np.argmax(model(next(test_image_iterator)[0])[0])
predicted_class

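This maps to the directory in that position of the sorted directory listing. `os.listdir` returns names in arbitrary order, so we sort them to match the class indices.

In [0]:
sorted(os.listdir(train_dir))[predicted_class]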
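
The lookup from a predicted index back to a class name can be sketched without the dataset. It relies on the class indices following the sorted directory names, as the iterator output earlier showed. The directory names and probabilities below are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical directory listing in arbitrary (unsorted) order.
listed = ["Walnut", "Apple Braeburn", "Lychee"]

# Class indices follow the sorted names, so sort before indexing.
class_names = sorted(listed)

# Hypothetical softmax output for one image.
probabilities = np.array([0.1, 0.7, 0.2])
predicted_class = int(np.argmax(probabilities))

print(predicted_class, class_names[predicted_class])  # 1 Lychee
```

Indexing the unsorted list with the same prediction would return `Apple Braeburn`, which is the wrong class.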

Overall the model seemed to train well, though overfit a bit. We'll try to address this in the exercise below by augmenting our images.

### Exercise 2: `ImageDataGenerator`

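Recreate the model above using an [`ImageDataGenerator`](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator) to augment the training dataset. When running `fit` be sure to pay attention to the `steps_per_epoch` parameter. An augmenting generator can keep producing new variations of the images, so `steps_per_epoch` is what decides how many batches make up one epoch. If you leave it unset for an iterator of known length, such as the one used above, each epoch is one pass over the files (`529` batches in our case).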

When you have finished training your model, visualize your training loss. 

Next, use the model to make predictions, and then calculate the F1 score of your validation results.

Explain your work.

*Use as many code blocks and text blocks as necessary below.*

#### **Student Solution**

In [0]:
# Your code goes here

---

#### Answer Key

Get the data.

In [0]:
import os
import zipfile

! chmod 600 kaggle.json && (ls ~/.kaggle 2>/dev/null || mkdir ~/.kaggle) && mv kaggle.json ~/.kaggle/ && echo 'Done'
! kaggle datasets download moltean/fruits

zipfile.ZipFile('fruits.zip').extractall()
os.listdir('./fruits-360/')

Build a model.

In [0]:
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu',
                           input_shape=(100, 100, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(131, activation='softmax')
])

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.summary()

Create an `ImageDataGenerator` to augment the data and then train.

In [0]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

training_dir = './fruits-360/Training/'
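# The augmentation settings below are illustrative choices, not tuned values.
train_data_gen = ImageDataGenerator(
    rotation_range=20,
    horizontal_flip=True).flow_from_directory(
        batch_size=128,
        directory=training_dir,
        target_size=(100, 100))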

history = model.fit(
    train_data_gen,
    epochs=10,
    steps_per_epoch=100
)

Once training is complete, we can plot accuracy across time.

In [0]:
import matplotlib.pyplot as plt

plt.plot(list(range(len(history.history['accuracy']))),
         history.history['accuracy'])
plt.show()

And loss.

In [0]:
import matplotlib.pyplot as plt

plt.plot(list(range(len(history.history['loss']))),
         history.history['loss'])
plt.show()

Finally, we want to make predictions using our test dataset.

In [0]:
test_dir = './fruits-360/Test/'

test_image_iterator = tf.keras.preprocessing.image.DirectoryIterator(
    target_size=(100, 100),
    directory=test_dir,
    shuffle=False,
    image_data_generator=None)

predictions = model.predict(test_image_iterator)

predictions

And print the F1 score.

In [0]:
from sklearn.metrics import f1_score

import numpy as np

actual_classes = test_image_iterator.classes
predicted_classes = [np.argmax(prediction) for prediction in predictions]

f1_score(actual_classes, predicted_classes, average='micro')
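
It helps to know what the `'micro'` average measures. The NumPy sketch below pools true positives, false positives, and false negatives across all classes before computing F1. When every image has exactly one label, this pooled score equals plain accuracy. The function name `micro_f1` and the labels are hypothetical.

```python
import numpy as np

def micro_f1(actual, predicted, class_count):
    """Micro-averaged F1: pool TP, FP, and FN over all classes first."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    tp = fp = fn = 0
    for c in range(class_count):
        tp += int(np.sum((predicted == c) & (actual == c)))
        fp += int(np.sum((predicted == c) & (actual != c)))
        fn += int(np.sum((predicted != c) & (actual == c)))
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical labels for six images across three classes.
actual = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]

accuracy = float(np.mean(np.array(actual) == np.array(predicted)))
print(micro_f1(actual, predicted, 3), accuracy)
```

If per-class balance matters more than overall accuracy, `average='macro'` in `f1_score` weights every class equally instead.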

---