#### Copyright 2019 Google LLC.

In [0]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# WIP Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are deep neural networks with the addition of two special types of layers: convolutional layers and pooling layers. Let's take a look at both.

## Overview

### Learning Objectives

* Introduction to Convolutional Neural Networks
* Using TensorFlow to implement a CNN

### Prerequisites

* Introduction to TensorFlow
* Image Manipulation

### Estimated Duration

60 minutes

### Grading Criteria

Each exercise is worth 3 points. The rubric for calculating those points is:

| Points | Description |
|--------|-------------|
| 0      | No attempt at exercise |
| 1      | Attempted exercise, but code does not run |
| 2      | Attempted exercise, code runs, but produces incorrect answer |
| 3      | Exercise completed successfully |

There are no graded exercises in this Colab, so there are 0 points available.

## Convolutional Layers

Convolutional layers are layers in the network that only partially connect to their input layer. Each output value is computed from a receptive field: a small region of the input layer to which a filter is applied.
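To make the idea of a receptive field concrete, here is a minimal NumPy sketch of a single-channel convolution with no padding. The helper name `conv2d_single_channel` and the toy 4 x 4 input are illustrative assumptions, not part of TensorFlow.

```python
import numpy as np

def conv2d_single_channel(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and sum the elementwise products."""
    k_h, k_w = kernel.shape
    out_h = (image.shape[0] - k_h) // stride + 1
    out_w = (image.shape[1] - k_w) // stride + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            # One receptive field: the only part of the input this output sees.
            patch = image[top:top + k_h, left:left + k_w]
            out[i, j] = np.sum(patch * kernel)
    return out

toy_image = np.arange(16, dtype=np.float32).reshape(4, 4)
toy_kernel = np.ones((2, 2), dtype=np.float32)
toy_output = conv2d_single_channel(toy_image, toy_kernel, stride=2)
print(toy_output)
```

Each output value depends only on its own 2 x 2 patch of the input, which is what "partially connected" means here.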

Let's see this in action. First, we will create a 100 x 100 x 3 image that contains red vertical stripes centered every 10 pixels on the image.

In [0]:
import matplotlib.pyplot as plt
import numpy as np

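# Create an image that is completely black. Using uint8 keeps the 0-255
# values in the range plt.imshow expects for integer RGB data.
vertical_stripes = np.zeros((100, 100, 3), dtype=np.uint8)

# Loop over the image 10 pixels at a time, turning the
# two center columns of each 10-pixel block red
for x in range(4, 101, 10):
  vertical_stripes[:, x:x+2, 0] = 255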

plt.imshow(vertical_stripes)

In [0]:
# Run this cell to mount your Google Drive.
from google.colab import drive
drive.mount('/content/drive')

We will now create a filter that we'll apply using TensorFlow's [tf.nn.conv2d](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d) function. For illustrative purposes we'll create a filter to extract the red out of the image we just created.

The filter will have shape 10 x 10 x 3 x 1. The first three numbers cover a 10 x 10 pixel area across all 3 RGB channels (remember that our vertical red lines repeat every 10 pixels). The final number, 1, is the number of output channels we'd like the filter to produce. These output channels are called "feature maps". You get one feature map per filter.

In [0]:
receptor_height, receptor_width, input_color_channels, output_color_channels = (10, 10, 3, 1)
filters = np.zeros(shape=(receptor_height, receptor_width, input_color_channels, output_color_channels), dtype=np.float32)

We created our filter and set it to all zeros. We now need to indicate which portion of the receptive field we want to extract data from. Each red stripe occupies zero-based columns 4 and 5 of its 10-pixel block (the 5th and 6th pixels). The slice `5:7` used below selects zero-based columns 5 and 6 in every row of the filter, so the filter overlaps one red column and one black column of each stripe.

In [0]:
filters[:, 5:7, :, 0] = 1
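As a quick sanity check, the sketch below compares the columns the stripes occupy inside each 10-pixel block with the columns the filter selects. It uses only NumPy, and the variable names are illustrative assumptions.

```python
import numpy as np

block_width = 10

# Columns (within a 10-pixel block) painted red by the stripe loop.
stripe_starts = range(4, 100, block_width)
stripe_columns = sorted({(x + dx) % block_width for x in stripe_starts for dx in (0, 1)})

# Columns selected by the filter slice 5:7.
filter_row = np.zeros(block_width)
filter_row[5:7] = 1
filter_columns = np.flatnonzero(filter_row).tolist()

overlap = sorted(set(stripe_columns) & set(filter_columns))
print(stripe_columns, filter_columns, overlap)
```

Only one column is shared, so each window of the filter sees a single red column.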

Let's now get our image ready to pass to our convolutional layer. To do that, we package the 3-dimensional image in yet another array to create a dataset for TensorFlow. TensorFlow's convolutional function expects a 4-dimensional dataset: image count, height, width, and color channels.

In [0]:
dataset = np.array([vertical_stripes], dtype=np.float32)
image_count, image_height, image_width, color_channels = dataset.shape

image_count, image_height, image_width, color_channels
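Wrapping the image in a list is one way to add the leading "image count" dimension. The sketch below shows two equivalent NumPy alternatives; the all-zero `single_image` is a stand-in assumed for illustration.

```python
import numpy as np

single_image = np.zeros((100, 100, 3), dtype=np.float32)  # stand-in image

batch_a = np.array([single_image], dtype=np.float32)  # approach used above
batch_b = single_image[np.newaxis, ...]               # index with a new leading axis
batch_c = np.expand_dims(single_image, axis=0)        # same result, more explicit

print(batch_a.shape, batch_b.shape, batch_c.shape)
```

All three produce an array of shape `(1, 100, 100, 3)`.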

To get the image into TensorFlow we need to create a placeholder. We'll create one that can take any number of images (None as the first value of `shape`). We could have also passed in `image_count` since we knew exactly how many images we were processing in this case, but it isn't required.

In [0]:
# This Colab uses the TensorFlow 1.x graph API (placeholders and sessions).
import tensorflow as tf

X = tf.placeholder(tf.float32, shape=(None, image_height, image_width, color_channels))

To create our convolutional layer we use [tf.nn.conv2d](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d). The arguments we are passing it are:

*  Our placeholder that we'll use to pass data into the layer.
*  The filters that we want to apply to the data. In this case we are passing in the filter that will capture the middle vertical pixels in a 10x10 receptive field.
*  The strides we want the layer to take when operating on the data. In this case we want the input data to be processed for every image (the first 1) and every color channel (the last 1). The 10s cause the receptive field to shift by 10 pixels every vertical and horizontal step through the image. This is exactly our filter size and keeps each step aligned with the red vertical lines. In practice you'd likely want some overlap.
*  A padding argument. In this case we chose "SAME", which causes TensorFlow to pad the image with zeros if necessary (split as evenly as possible between the two sides) so that the filter processes the entire image.

In [0]:
convolution = tf.nn.conv2d(X, filters, strides=[1,10,10,1], padding="SAME")
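The effect of the stride and padding arguments on the output size can be sketched with a little arithmetic. The helper names below are hypothetical; the formulas follow the padding rules described in the TensorFlow documentation for `"SAME"` and `"VALID"`.

```python
import math

def conv_output_size(input_size, filter_size, stride, padding):
    """Output length along one spatial dimension."""
    if padding == "SAME":
        return math.ceil(input_size / stride)
    if padding == "VALID":
        return math.ceil((input_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

def same_padding(input_size, filter_size, stride):
    """(before, after) zero padding that "SAME" adds along one dimension."""
    output_size = math.ceil(input_size / stride)
    total = max((output_size - 1) * stride + filter_size - input_size, 0)
    before = total // 2
    return before, total - before

striped_out = conv_output_size(100, 10, 10, "SAME")  # our striped image
striped_pad = same_padding(100, 10, 10)
uneven_pad = same_padding(10, 3, 2)                  # a case with uneven padding
print(striped_out, striped_pad, uneven_pad)
```

For the striped image no padding is needed, because ten steps of 10 pixels cover the 100-pixel width exactly. When the total padding is odd, the extra pixel goes on the bottom or right side.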

We can now run our convolutional layer using a TensorFlow session.

Notice that the layer reduces the 100 x 100 x 3 input image to a 10 x 10 x 1 output (the full output shape, including the image count, is 1 x 10 x 10 x 1). This is because we processed the image using a 10 x 10 single-channel-output filter and stepped 10 pixels each time.

In [0]:
with tf.Session() as sess:
  output = sess.run(convolution, feed_dict={X: dataset})

output.shape

Looking at the image isn't very telling. It simply looks like a single-color image.

In [0]:
plt.imshow(output[0, :, :, 0])

If you look at the data you can see that the values are uniformly 2550. Each 10 x 10 window has exactly one red column under the filter, and 10 rows x 255 = 2550.

In [0]:
np.unique(output)

What happens if we widen our vertical filter to cover the four center columns (`4:8`)? The filter now captures both red columns of each stripe along with two black columns, and our output changes to 5100 (2 columns x 10 rows x 255).


In [0]:
filters = np.zeros(shape=(receptor_height, receptor_width, input_color_channels, output_color_channels), dtype=np.float32)
filters[:, 4:8, :, :] = 1

X = tf.placeholder(tf.float32, shape=(None, image_height, image_width, color_channels))
convolution = tf.nn.conv2d(X, filters, strides=[1,10,10,1], padding="SAME")

with tf.Session() as sess:
  output = sess.run(convolution, feed_dict={X: dataset})

np.unique(output)

If we move our filter to only capture black pixels our output becomes 0.

In [0]:
filters = np.zeros(shape=(receptor_height, receptor_width, input_color_channels, output_color_channels), dtype=np.float32)
filters[:, :2, :, :] = 1

X = tf.placeholder(tf.float32, shape=(None, image_height, image_width, color_channels))
convolution = tf.nn.conv2d(X, filters, strides=[1,10,10,1], padding="SAME")

with tf.Session() as sess:
  output = sess.run(convolution, feed_dict={X: dataset})

np.unique(output)
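The three results above can be reproduced without TensorFlow. Because the stride equals the filter size and no padding is needed, the convolution reduces to a weighted sum over non-overlapping 10 x 10 tiles. The sketch below assumes that special case; `block_responses` is a hypothetical helper, not a TensorFlow function.

```python
import numpy as np

# Rebuild the striped image from earlier in this Colab.
stripes = np.zeros((100, 100, 3), dtype=np.float32)
for x in range(4, 100, 10):
    stripes[:, x:x + 2, 0] = 255

def block_responses(image, column_slice, block=10):
    """Apply a column-selecting filter to each non-overlapping block x block tile."""
    channels = image.shape[2]
    kernel = np.zeros((block, block, channels), dtype=np.float32)
    kernel[:, column_slice, :] = 1
    rows, cols = image.shape[0] // block, image.shape[1] // block
    # Axes after the reshape: (tile row, row in tile, tile col, col in tile, channel)
    tiles = image.reshape(rows, block, cols, block, channels)
    return np.einsum("ihjwc,hwc->ij", tiles, kernel)

slices = {"5:7": slice(5, 7), "4:8": slice(4, 8), ":2": slice(0, 2)}
results = {name: np.unique(block_responses(stripes, s)).tolist()
           for name, s in slices.items()}
print(results)
```

This shortcut only works when the windows do not overlap; for other strides the tiles would need to be extracted window by window.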

Let's look at a convolutional layer on a real image. We'll load a sample image from scikit-learn.

In [0]:
from sklearn.datasets import load_sample_image

china = load_sample_image('china.jpg')

plt.imshow(china)

We will package the image in a 4-dimensional matrix for processing by TensorFlow.

In [0]:
dataset = np.array([china], dtype=np.float32)
image_count, image_height, image_width, color_channels = dataset.shape

image_count, image_height, image_width, color_channels

Let's re-create our vertical line filter and apply it to the image to see the convolutional layer in action.

In [0]:
receptor_height, receptor_width, input_color_channels, output_color_channels = (10, 10, 3, 1)
filters = np.zeros(shape=(receptor_height, receptor_width, input_color_channels, output_color_channels), dtype=np.float32)
filters[:, 5:7, :, :] = 1

image_count, image_height, image_width, color_channels = dataset.shape
X = tf.placeholder(tf.float32, shape=(None, image_height, image_width, color_channels))

convolution = tf.nn.conv2d(X, filters, strides=[1,4,4,1], padding="SAME")

with tf.Session() as sess:
  output = sess.run(convolution, feed_dict={X: dataset})

plt.imshow(output[0, :, :, 0], cmap="gray")
plt.show()

Typically you won't define your own filters, though. You can let TensorFlow learn them during training by using [tf.layers.conv2d](https://www.tensorflow.org/api_docs/python/tf/layers/conv2d) instead of [tf.nn.conv2d](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d). No training happens in this Colab, so the filters applied below are simply the layer's randomly initialized values.

In this example we ask for three feature maps with a 5x5 receptive field, stepping two pixels at a time.

In [0]:
image_count, image_height, image_width, color_channels = dataset.shape
X = tf.placeholder(tf.float32, shape=(None, image_height, image_width, color_channels))

convolution = tf.layers.conv2d(X, filters=3, kernel_size=5, strides=[2,2], padding="SAME")

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  output = sess.run(convolution, feed_dict={X: dataset})
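To get a feel for what the layer would learn, the sketch below counts its trainable parameters. It assumes the layer's default settings, which include one bias per filter; `conv2d_parameter_count` is a hypothetical helper written for this illustration.

```python
def conv2d_parameter_count(kernel_size, in_channels, filters, use_bias=True):
    """Weights in a square-kernel 2D convolution, plus one bias per filter."""
    weights = kernel_size * kernel_size * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases

# 5x5 kernels over 3 color channels, producing 3 feature maps.
layer_params = conv2d_parameter_count(kernel_size=5, in_channels=3, filters=3)
print(layer_params)
```

The count does not depend on the image size, because the same filters are reused at every position.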

Let's look at the first feature map.

In [0]:
plt.imshow(output[0, :, :, 0])
plt.show()

Here is the second feature map.

In [0]:
plt.imshow(output[0, :, :, 1])
plt.show()

And the third.

In [0]:
plt.imshow(output[0, :, :, 2])
plt.show()

## Pooling Layers

Pooling layers shrink the data from their input layer by summarizing each receptive field with a single value, such as its maximum. Let's look at an example. We'll first load a sample image.

In [0]:
flower = load_sample_image('flower.jpg')

plt.imshow(flower)
plt.show()

We can package this image in a 4-dimensional array and pass it to the [tf.nn.max_pool](https://www.tensorflow.org/api_docs/python/tf/nn/max_pool) function. This function extracts the maximum value from each receptive field.

In the example below we create a 2 x 2 receptive field (`ksize`) and move it around the image, shifting 2 pixels each time. This reduces the height and width of the image by half, effectively reducing our dataset size by 75%.

In [0]:
dataset = np.array([flower], dtype=np.float32)
image_count, image_height, image_width, color_channels = dataset.shape

X = tf.placeholder(tf.float32, shape=(None, image_height, image_width, color_channels))
max_pool = tf.nn.max_pool(X, ksize=[1,2,2,1], strides=[1,2,2,1], padding="VALID")

with tf.Session() as sess:
  output = sess.run(max_pool, feed_dict={X: dataset})

plt.imshow(output[0].astype(np.uint8))
plt.show()
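The same 2 x 2 max pooling can be sketched in plain NumPy by reshaping the image into tiles and taking the maximum of each one. The helper `max_pool_2x2` and the toy input are assumptions made for illustration; the sketch mirrors "VALID" padding by dropping any leftover row or column.

```python
import numpy as np

def max_pool_2x2(image):
    """2 x 2 max pooling with stride 2 on a (height, width, channels) array."""
    height, width, channels = image.shape
    out_h, out_w = height // 2, width // 2
    trimmed = image[:out_h * 2, :out_w * 2, :]  # drop an odd trailing row/column
    tiles = trimmed.reshape(out_h, 2, out_w, 2, channels)
    return tiles.max(axis=(1, 3))

toy = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
pooled = max_pool_2x2(toy)
reduction = 1 - pooled.size / toy.size
print(pooled[:, :, 0])
print(reduction)
```

Halving both the height and the width leaves a quarter of the values, which is where the 75% reduction comes from.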

# Exercises

## Exercise 1: Challenge (Ungraded)

Use [tf.nn.conv2d](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d) to apply a stack of filters to the scikit-learn built-in flower image used earlier in this Colab.

* Create a (7, 7, 3, 2) filter set. The `2` on the end indicates that we'll create two filters and get two output channels (feature maps).
* Make the first filter a vertical line filter on the middle pixel of each row.
* Make the second filter a horizontal line filter on the middle pixel of each column.
* Pass the flower image and filters to [tf.nn.conv2d](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d), step 3 pixels vertically and horizontally.
* Display the first feature map as an image.
* Display the second feature map as an image.

### Student Solution

In [0]:
# Create your filters and apply them to the flower image using TensorFlow here.
receptor_height, receptor_width, input_color_channels, output_color_channels = (7, 7, 3, 2)
filters = np.zeros(shape=(receptor_height, receptor_width, input_color_channels, output_color_channels), dtype=np.float32)

dataset = np.array([flower], dtype=np.float32)
image_count, image_height, image_width, color_channels = dataset.shape

# First filter: vertical line on the middle column (index 3 of 0..6)
filters[:, 3, :, 0] = 1
X = tf.placeholder(tf.float32, shape=(None, image_height, image_width, color_channels))
convolution = tf.nn.conv2d(X, filters, strides=[1,3,3,1], padding="SAME")

with tf.Session() as sess:
  output = sess.run(convolution, feed_dict={X: dataset})

# Use PyPlot to output the first feature map here.
plt.imshow(output[0, :, :, 0], cmap="gray")
plt.show()




In [0]:
# Second filter: horizontal line on the middle row (index 3 of 0..6)
filters[3, :, :, 1] = 1
X = tf.placeholder(tf.float32, shape=(None, image_height, image_width, color_channels))
convolution = tf.nn.conv2d(X, filters, strides=[1,3,3,1], padding="SAME")

with tf.Session() as sess:
  output = sess.run(convolution, feed_dict={X: dataset})

# Use PyPlot to output the second feature map here.
plt.imshow(output[0, :, :, 1], cmap="gray")
plt.show()


In [0]:
image_count, image_height, image_width, color_channels = dataset.shape
X = tf.placeholder(tf.float32, shape=(None, image_height, image_width, color_channels))

convolution = tf.layers.conv2d(X, filters=3, kernel_size=7, strides=[3,3], padding="SAME")

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  output = sess.run(convolution, feed_dict={X: dataset})

In [0]:
plt.imshow(output[0, :, :, 1])
plt.show()