# Image Recognition

(live long, prosper, and watch out for cats)

![cat detectors rule](assets/image/meme.jpg)

(image: https://towardsdatascience.com/the-whos-who-of-machine-learning-and-why-you-should-know-them-9cefbbc84f07)

# Topics

- Input Representation
- Convolutional Neural Networks
  - Convolution, Activation, Pooling
- Architectures
  - Image Classification
  - Object Detection
  - Instance Segmentation

# Image Recognition Tasks
![cat detectors](assets/image/cat_detectors.png)

(image: analyticsindiamag.com)

## Image Representation: Tensor

- 3 channels: 'rgb'
- rows: image height
- columns: image width

Ordering:
- Channels-first: channel, rows, columns
- Channels-last: rows, columns, channels
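
As a minimal sketch of the two orderings, NumPy's `transpose` can convert between them. The 4 x 6 array below is made-up placeholder data, not the demo image:

```python
import numpy as np

# hypothetical 4 x 6 RGB image in channels-last order: (rows, columns, channels)
channels_last = np.zeros((4, 6, 3), dtype=np.uint8)

# move the channel axis to the front: (channels, rows, columns)
channels_first = np.transpose(channels_last, (2, 0, 1))

print('channels-last shape:', channels_last.shape)
print('channels-first shape:', channels_first.shape)

# and back again to (rows, columns, channels)
restored = np.transpose(channels_first, (1, 2, 0))
print('restored shape:', restored.shape)
```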

## Walkthrough - Image Tensors

In this walkthrough, we will read an image from file and examine the data.

### Setup

Install the Python image library:
```
conda install pillow
```

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

# read an image file
demo = Image.open('assets/image/cat.jpg') # source: pxhere.com/en/photo/1337399

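# check the colour mode: Pillow reports 'RGB' for a 3-channel colour image
# (other libraries, such as OpenCV, load images as BGR instead),
# so that we can input the images correctly to our neural network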
print('channel ordering:', demo.mode)

# display the image
plt.imshow(demo)
plt.title('moar food')
plt.show()

In [None]:
# examine the numpy array
demo_arr = np.array(demo)

print('shape:', demo_arr.shape)
print('data type:', demo_arr.dtype)
print('rank:', demo_arr.ndim)

In [None]:
# since the sides of the picture are the same boring color
# inspect the (roughly) middle 5 rows and columns
midpoint_row = demo_arr.shape[0] // 2
midpoint_col = demo_arr.shape[1] // 2

demo_arr[midpoint_row:midpoint_row+5, midpoint_col:midpoint_col+5, :]

In [None]:
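# shrink the image in place so that it fits within 224 by 224
# (thumbnail preserves the aspect ratio, so one side may end up smaller than 224)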
demo.thumbnail((224, 224), resample=Image.BICUBIC)

# display the image
plt.imshow(demo, interpolation='nearest')
plt.title('moar food (224x224)')
plt.show()

In [None]:
# examine the numpy array again
demo_arr = np.array(demo)

print(demo_arr.shape)
print(demo_arr.dtype)

# notice the difference in values from previously
midpoint_row = demo_arr.shape[0] // 2
midpoint_col = demo_arr.shape[1] // 2
demo_arr[midpoint_row:midpoint_row+5, midpoint_col:midpoint_col+5, :]

A histogram is sometimes helpful for visualizing the colour distribution of a given channel.

In [None]:
# order: RGB
red_channel = demo_arr[:, :, 0]
green_channel = demo_arr[:, :, 1]
blue_channel = demo_arr[:, :, 2]

fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(20, 10))

# flatten the [row_size, col_size] matrix into a vector of [row_size * col_size]
# we just need to count the raw pixel values for the histogram,
# so it doesn't matter where they are located.
ax[0].hist(red_channel.flatten(), 256, range=(0,256), color='red')
ax[1].hist(green_channel.flatten(), 256, range=(0,256), color='green')
ax[2].hist(blue_channel.flatten(), 256, range=(0,256), color='blue')

fig.suptitle('Histogram of input image')
plt.show()

## Output

- Image Classification: labels
- Object Detection: labels + bounding boxes
- Instance Segmentation: labels + boundaries

## Problem: many input features $\rightarrow$ many parameters

224 x 224 pixel colour image: 224 x 224 x 3 = 150528 features

Dimensionality reduction may help, but there's a better way...

## Convolution

- Reduces parameter space
- Looks at localized, spatial information
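
To make "reduces parameter space" concrete, here is a rough comparison of parameter counts for a single layer applied to a 224 x 224 x 3 image. The layer sizes (1000 hidden units, 64 filters of 3 x 3) are illustrative assumptions, not values used elsewhere in this notebook:

```python
# input size from the 224 x 224 colour image discussed above
height, width, channels = 224, 224, 3
n_features = height * width * channels

# fully connected layer (assumed 1000 units):
# one weight per (input feature, unit) pair, plus one bias per unit
hidden_units = 1000
dense_params = n_features * hidden_units + hidden_units

# convolutional layer (assumed 64 filters of 3 x 3):
# each filter's weights are shared across every position in the image
kernel_h, kernel_w, n_filters = 3, 3, 64
conv_params = kernel_h * kernel_w * channels * n_filters + n_filters

print('input features:', n_features)
print('dense layer parameters:', dense_params)
print('conv layer parameters:', conv_params)
```

Because the same small filter is reused at every position, the number of convolution parameters does not depend on the image size.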

![cnn](assets/image/cnn.png)

(image: [leonardoaraujosantos.gitbooks.io](https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/convolutional_neural_networks.html))

## Convolution - Hyperparameters

- Kernel size: size of the window (pixels)
- Stride: how many pixels to slide the window
- Depth: how many filters to use
- Padding: whether to keep the output size the same as input size
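
The effect of kernel size, stride and padding on the output size can be sketched with a small helper. The function name `conv_output_size` is hypothetical, and the `'same'` rule assumes the convention used by Keras and TensorFlow (output size = ceil(input size / stride)):

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, padding='valid'):
    """Output length along one spatial dimension of a convolution."""
    if padding == 'same':
        # zero-pad so that, with stride 1, the output size equals the input size
        return math.ceil(input_size / stride)
    if padding == 'valid':
        # no padding: the window must fit entirely inside the input
        return (input_size - kernel_size) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")

print(conv_output_size(224, 3, stride=1, padding='valid'))  # 222
print(conv_output_size(224, 3, stride=1, padding='same'))   # 224
print(conv_output_size(224, 3, stride=2, padding='valid'))  # 111
```

Depth does not change the spatial size: it sets the number of channels in the output.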

## Kernel size = (2, 2), Stride = 1

![convolution](assets/image/2d_convolution.png)

(image: http://www.deeplearningbook.org/contents/convnets.html)

## Depth = number of filters

![depth col](assets/image/depthcol.jpg)

(image: https://cs231n.github.io/convolutional-networks)

## Padding = same

![same_padding_no_strides](assets/image/same_padding_no_strides.gif)

(image: [leonardoaraujosantos.gitbooks.io](https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/convolutional_neural_networks.html))

## Padding = valid (none)

![no_padding_no_strides](assets/image/no_padding_no_strides.gif)

(image: [leonardoaraujosantos.gitbooks.io](https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/convolutional_neural_networks.html))

## Activity - Convolution Hyperparameters

1. Run and watch the example in the next cell. 

2. Fill in the corresponding hyperparameters

 - Kernel Size = 
 - Stride = 
 - Depth = 
 - Padding =

In [None]:
# Source: https://cs231n.github.io/convolutional-networks
from IPython.display import HTML

HTML('<iframe src=conv-demo.html width=800 height=700></iframe>')

## Walkthrough - 2D Convolution

In this walkthrough, we will convolve our demo image with a kernel that performs edge detection.

Credits: http://machinelearninguru.com/computer_vision/basics/convolution/image_convolution_1.html

In [None]:
# convert our input image to greyscale (1 channel)
demo_grey = demo.convert(mode='L')

plt.imshow(demo_grey, interpolation='nearest')
plt.title('moar grey')
plt.show()

In [None]:
# https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.convolve2d.html
from scipy.signal import convolve2d

# edge-detection kernel
kernel = np.array([[-1,-1,-1],[-1,8,-1],[-1,-1,-1]])

# we use 'valid' which means we do not add zero padding to our image
edges = convolve2d(demo_grey, kernel, mode='valid')

plt.imshow(edges)
plt.title('moar edges')
plt.show()
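
To see what the convolution computes at each position, here is a naive NumPy-only sketch of a `'valid'` 2D convolution applied to a toy 5 x 5 image. The helper `convolve2d_valid` and the toy data are illustrative assumptions; this is not how `scipy.signal.convolve2d` is implemented internally:

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Naive 2D convolution without padding ('valid' mode)."""
    # true convolution flips the kernel in both directions before sliding it
    flipped = kernel[::-1, ::-1]
    k_rows, k_cols = flipped.shape
    out_rows = image.shape[0] - k_rows + 1
    out_cols = image.shape[1] - k_cols + 1
    output = np.zeros((out_rows, out_cols), dtype=np.int64)
    for r in range(out_rows):
        for c in range(out_cols):
            # multiply the window by the kernel element-wise, then sum
            window = image[r:r + k_rows, c:c + k_cols]
            output[r, c] = np.sum(window * flipped)
    return output

# toy 5 x 5 image: dark background with a bright 3 x 3 square in the middle
toy = np.zeros((5, 5), dtype=np.int64)
toy[1:4, 1:4] = 10

# same edge-detection kernel values as in the walkthrough
edge_kernel = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]])
toy_edges = convolve2d_valid(toy, edge_kernel)
print(toy_edges)
```

The centre of the output is 0 because that window covers a uniform region, while windows that straddle the border of the square give large values.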

## Convolutional Block

Generally, 3 stages:
1. Convolutional Layer
2. Activation Layer
3. Pooling Layer

Sometimes 1 & 2 will repeat a few times before 3.

![conv block](assets/image/convnet.jpg)

(image: https://cs231n.github.io/convolutional-networks)

## Activation Layer

The output of convolution is typically passed through an "activation" function, so that the network can model non-linear relationships.

Examples:
- linear (= no activation)
- sigmoid
- tanh
- Rectified Linear Units, leaky ReLU, Parametric ReLU, Exponential Linear Units

https://keras.io/activations/

## Walkthrough - Activation Functions

Let's see what happens when we pass our convolved, edge-detected image through various activation functions.

In [None]:
import numpy as np

input_arr = edges
input_arr.shape

Sigmoid

$f(x) = \frac{1}{1 + e^{-x}}$

In [None]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

plt.imshow(sigmoid(input_arr))
plt.title('edges + sigmoid')
plt.show()

Tanh

$f(x) = \tanh(x)$

In [None]:
plt.imshow(np.tanh(input_arr))
plt.title('edges + tanh')
plt.show()

ReLU

$f(x) = \max(0, x)$

In [None]:
def relu(x):
    return np.maximum(x, 0)

plt.imshow(relu(input_arr))
plt.title('edges + ReLU')
plt.show()

Leaky ReLU

$f(x) = \begin{cases}
x & \text{if } x > 0 \\
0.01x & \text{otherwise}
\end{cases}$

In [None]:
def leaky_relu(x):
    return np.where(x > 0, x, x * 0.01)

plt.imshow(leaky_relu(input_arr))
plt.title('edges + Leaky ReLU')
plt.show()

## Pooling Layer

- Summarizes the activations
  - Take the maximum of a window size: Max pooling
  - Take the average of a window size: Average pooling

https://keras.io/layers/pooling/

## Pooling Layer

- Translation invariance:
  - Robust to shifts in locations of pixels within that window
- Downsampling:
  - Compressing and summarizing inputs into next layers
  - Pass the highest activation, or the average activations to the next layer
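
A small sketch of the translation-invariance point, using a NumPy-only (2, 2) max pool. The helper `max_pool_2x2` and the two toy arrays are illustrative assumptions:

```python
import numpy as np

def max_pool_2x2(x):
    """Max pooling with a (2, 2) window; assumes even height and width."""
    rows, cols = x.shape
    # split the array into 2 x 2 blocks, then take the maximum inside each block
    blocks = x.reshape(rows // 2, 2, cols // 2, 2)
    return blocks.max(axis=(1, 3))

# a single strong activation in the top-left 2 x 2 window
a = np.zeros((4, 4))
a[0, 0] = 1.0

# the same activation shifted by one pixel, still inside that window
b = np.zeros((4, 4))
b[1, 1] = 1.0

print(max_pool_2x2(a))
print(max_pool_2x2(b))
print('same pooled output:', np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))
```

The invariance is only local: a shift that moves the activation into a neighbouring window would change the pooled output.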

## Walkthrough - Pooling

Let's see what happens when we pass our convolved + activated image through various pooling functions.

### Setup
Install the skimage library:
```
conda install scikit-image
```

In [None]:
# https://stackoverflow.com/questions/42463172/how-to-perform-max-mean-pooling-on-a-2d-array-using-numpy
from skimage.measure import block_reduce

def max_pool(x, pool_size=(2, 2)):
    return block_reduce(x, pool_size, np.max)

def average_pool(x, pool_size=(2, 2)):
    return block_reduce(x, pool_size, np.mean)

In [None]:
plt.imshow(max_pool(sigmoid(edges)))
plt.title('edges + sigmoid + max pool')
plt.show()

print('Original shape:', edges.shape)
print('After activation:', sigmoid(edges).shape)

# a (2, 2) pool size halves the height and the width of the output
print('After pooling (2, 2):', max_pool(sigmoid(edges)).shape)

In [None]:
plt.imshow(average_pool(sigmoid(edges)))
plt.title('edges + sigmoid + average pool')
plt.show()

In [None]:
plt.imshow(max_pool(relu(edges)))
plt.title('edges + relu + max pool')
plt.show()

In [None]:
plt.imshow(average_pool(relu(edges)))
plt.title('edges + relu + average pool')
plt.show()

## Regularization Layers

These layers are common in Deep Convolutional Neural Networks:

- Dropout
- Batch Normalization

## Dropout

- Randomly sets some layer inputs to 0 during training to reduce overfitting
- Does nothing (no-op) during prediction

https://keras.io/layers/core/#dropout
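
The two behaviours above can be sketched in NumPy. This is a minimal illustration, not the Keras implementation: the function name `dropout` and its arguments are hypothetical, and the rescaling of kept inputs by `1 / (1 - rate)` assumes the common "inverted dropout" convention:

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero roughly a fraction `rate` of inputs during training."""
    if not training:
        # no-op at prediction time
        return x
    # keep each input with probability (1 - rate)
    keep_mask = rng.random(x.shape) >= rate
    # scale the kept inputs so that the expected sum is unchanged
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((2, 5))

print('training:\n', dropout(activations, rate=0.4, training=True, rng=rng))
print('prediction:\n', dropout(activations, rate=0.4, training=False, rng=rng))
```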

## Dropout
![dropout](assets/image/dropout.png)

(image: http://www.cs.toronto.edu/~rsalakhu/papers/srivastava14a.pdf)

## Batch Normalization

- Avoid saturating non-linearities by normalizing the input to the next layer
- Normalizing is done per minibatch
- Speeds up training

https://keras.io/layers/normalization/#batchnormalization

## Batch Normalization - Training

![batch norm](assets/image/batchnorm.png)

(image: [Batch Normalization Paper](https://arxiv.org/abs/1502.03167))

## Batch Normalization - Prediction

- Minibatch mean and variance don't apply at prediction time
- Instead, "population" mean and variance are computed and stored during training
- Batch Norm layer will use the population mean and variance for prediction
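
The difference between training and prediction can be sketched as follows. This is a simplified illustration under stated assumptions: the class `BatchNorm`, the `momentum` and `eps` values, and the use of an exponential moving average to estimate the population statistics are choices made for this sketch, and the scale and shift parameters are held fixed rather than learned:

```python
import numpy as np

class BatchNorm:
    """Minimal batch normalization over the batch axis (axis 0)."""

    def __init__(self, n_features, momentum=0.9, eps=1e-5):
        self.gamma = np.ones(n_features)    # scale (learned in practice, fixed here)
        self.beta = np.zeros(n_features)    # shift (learned in practice, fixed here)
        self.running_mean = np.zeros(n_features)
        self.running_var = np.ones(n_features)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, training):
        if training:
            # normalize with the statistics of the current minibatch
            mean, var = x.mean(axis=0), x.var(axis=0)
            # update the stored "population" estimates
            self.running_mean = self.momentum * self.running_mean + (1 - self.momentum) * mean
            self.running_var = self.momentum * self.running_var + (1 - self.momentum) * var
        else:
            # prediction: use the stored estimates instead of minibatch statistics
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

rng = np.random.default_rng(0)
bn = BatchNorm(n_features=3)
batch = rng.normal(loc=5.0, scale=2.0, size=(32, 3))

train_out = bn(batch, training=True)
print('training output mean:', train_out.mean(axis=0))
print('prediction output for one sample:', bn(batch[:1], training=False))
```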

## Architectures

- Image Classification
- Object Detection
- Instance Segmentation

## Reading List

|Material|Read it for|URL|
|--|--|--|
|Deep Learning - Chapter 9.2: Motivation (p 329-335)|3 motivations for convolution|http://www.deeplearningbook.org/contents/convnets.html|
|Deep Learning - Chapter 9.3: Pooling (p 335-339)|The idea behind pooling|http://www.deeplearningbook.org/contents/convnets.html|