# Introduction to Computer Vision and CNNs

### Trinity 2021 - Week 6 - 2021.06.01
### Lucas Kruitwagen
DPhil, Geography and the Environment, Smith School of Enterprise and the Environment
#### lucas.kruitwagen@gmail.com
#### @lucaskruitwagen
#### https://github.com/Lkruitwagen


## Contents - Week 6

1. Computer Vision Problems

1. Machine Learning Approach

1. Let's Code! TF+MNIST

1. What are CNNs?

1. History of CNNs

1. Let's Code! TF+MNIST+CNNs

### 1. What is Computer Vision

Core problems in computer vision:

<img src="https://raw.githubusercontent.com/Lkruitwagen/teaching/main/cv/assets/computer_vision_tasks.png" alt="drawing" style="display:inline" width="800"></img><sub>[1]</sub>

... Information extraction from spatially-structured data (e.g. images, video)

#### Example Applications

Optical Character Recognition | Facial Detection | Pose Detection 
 -- | ---------------- | -------------- 
 <img src="https://raw.githubusercontent.com/Lkruitwagen/teaching/main/cv/assets/license_plate.jpeg" alt="drawing" style="display:inline" width="300"></img><sub>[2]</sub> |   <img src="https://raw.githubusercontent.com/Lkruitwagen/teaching/main/cv/assets/friends.gif" alt="drawing" style="display:inline" width="300"></img><sub>[3]</sub>  |   <img src="https://raw.githubusercontent.com/Lkruitwagen/teaching/main/cv/assets/dance.gif" alt="drawing" style="display:inline" width="300"></img><sub>[4]</sub>  
 
Self-driving Vehicles | Anomaly Detection | Medical Imagery
--------------------- | ----------------- | --------------
 <img src="https://raw.githubusercontent.com/Lkruitwagen/teaching/main/cv/assets/selfdriving.gif" alt="drawing" style="display:inline" width="300"></img><sub>[5]</sub>  | <img src="https://raw.githubusercontent.com/Lkruitwagen/teaching/main/cv/assets/anomaly.jpg" alt="drawing" style="display:inline" width="300"></img><sub>[6]</sub>  | <img src="https://raw.githubusercontent.com/Lkruitwagen/teaching/main/cv/assets/medical.png" alt="drawing" style="display:inline" width="300"></img><sub>[7]</sub> 

#### More examples from Climate Change + AI

**Remote Sensing Solar PV Facilities - A global inventory**

Solar PV is a key technology for mitigating climate change while increasing energy access in the Global South. Coauthors and I have used ML with remote sensing imagery to search the entire planet for solar PV facilities and determine their installation dates - critical data for supporting policy, engineering, and planning.

<img src="https://raw.githubusercontent.com/Lkruitwagen/teaching/main/cv/assets/fig-1_samples.png" alt="drawing" style="display:inline" height="500"></img> 

<sub>Kruitwagen, L., Story, K., Friedrich, J., Buyers, L., Skillman, S., Hepburn, C. (2021) In peer review at _Nature_. Supported by Descartes Labs Inc., the World Resources Institute, and computing grants from AWS and GCP.</sub>

**Cloud Type Detection for causal inference of aerosol effects**

The planet's energy balance is sensitive to the reflectance of marine boundary layer clouds. Cloud reflectance is determined by its mesoscale structure. Anthropogenic aerosols cause transitions in these structures. Coauthors and I use unsupervised ML to characterise cloud types and then isolate aerosol causal effects.

<img src="https://raw.githubusercontent.com/Lkruitwagen/teaching/main/cv/assets/fdl.gif" alt="drawing" style="display:inline" height="500"></img> 

<sub>Christensen, M., Jones, W., Kusner, M., Kruitwagen, L., Pearce, T., Saengkyongam, S., Watson-Parris, D. (2020) *Aerosol Effects on Mesoscale Structures in Marine Boundary Layer Clouds*. Supported by the European Space Agency and the Frontier Development Lab.</sub>

**Flood Detection and Mapping**

Timely flood mapping is crucial for emergency response efforts. Lightweight flood-mapping models can be implemented on spacecraft for streaming inference and alert systems.

<img src="https://raw.githubusercontent.com/Lkruitwagen/teaching/main/cv/assets/ml4cc.png" alt="drawing" style="display:inline" height="500"></img> 

<sub>Ahmed, N., Budd, S., Kruitwagen, L., Mateo-Garcia, G., Maynard-Reid, M., Praveen, S., Roth, N. (alph.) (2021) A Machine-Learning for Climate Change (ML4CC) project. Supported by Trillium Technologies Ltd and UNOSAT.</sub>

**Multispectral+Radar Self-Supervised Sensor Fusion**

Self-supervised sensor fusion of Sentinel-1 synthetic-aperture radar and Sentinel-2 multispectral data for general purpose semantic embeddings, leading to a proliferation of low-data use cases.

<img src="https://raw.githubusercontent.com/Lkruitwagen/teaching/main/cv/assets/deepsentinel.png" alt="drawing" style="display:inline" height="500"></img> 

<sub>Kruitwagen, L. (2020) *DeepSentinel*. Supported by Microsoft AI for Earth and the European Space Agency.</sub>

**Image References**

<sub>[1] Li, F, Johnson, J., Yeung, S. (2017) http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf</sub>
<sub>[2] https://medium.com/@quangnhatnguyenle/detect-and-recognize-vehicles-license-plate-with-machine-learning-and-python-part-1-detection-795fda47e922</sub>
<sub>[3] https://towardsdatascience.com/real-time-face-recognition-with-cpu-983d35cc3ec5</sub>
<sub>[4] https://nanonets.com/blog/human-pose-estimation-2d-guide/</sub>
<sub>[5] https://towardsdatascience.com/semantic-segmentation-popular-architectures-dff0a75f39d0</sub>
<sub>[6] https://www.ricoh.com/technology/tech/073_imagerecognition</sub>
<sub>[7] https://www.nature.com/articles/s41598-019-42557-4</sub>

### 2. Machine Learning Approach

Like other domains, we have some data, $X$, with which we want to predict some target, $Y$. We're looking for a function:

$F(X,\theta) = \hat{Y} \approx Y$

We can use a neural network, parameterised by $\theta$, as a universal approximator. As long as $F$ is differentiable, we can define a loss function $\mathcal{L}(\hat{Y}, Y)$ which we minimise to find the values $\tilde{\theta}$ that maximise the likelihood function:

$\tilde{\theta} = \underset{\theta}{\text{argmin }} \mathcal{L}(F(X,\theta),Y)$

Common loss functions are MSE (L2 loss) for regression problems and cross-entropy for classification problems. These loss functions are smooth and convex in the predictions $\hat{Y}$, so gradient descent can be used to solve for $\tilde{\theta}$. For a linear model this reaches the global minimum of $\mathcal{L}$; for a neural network $\mathcal{L}$ is generally non-convex in $\theta$, so gradient descent typically converges to a local minimum.
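
As a minimal sketch of the minimisation above, the following runs plain gradient descent on an MSE loss for a linear model $F(X,\theta) = X\theta$ in NumPy. The toy data, the learning rate, and the step count are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (assumed for illustration): Y = X @ theta_true + small noise
X = rng.normal(size=(200, 3))
theta_true = np.array([1.5, -2.0, 0.5])
Y = X @ theta_true + 0.01 * rng.normal(size=200)

def mse_loss(theta):
    """L2 loss between predictions F(X, theta) = X @ theta and targets Y."""
    return np.mean((X @ theta - Y) ** 2)

def mse_grad(theta):
    """Analytical gradient of the MSE loss with respect to theta."""
    return 2.0 * X.T @ (X @ theta - Y) / X.shape[0]

theta = np.zeros(3)      # initial guess
learning_rate = 0.1      # hypothetical step size
for step in range(500):
    theta = theta - learning_rate * mse_grad(theta)   # step against the gradient

print("final loss:", mse_loss(theta))
print("estimated theta:", theta)
```

This linear case is convex in $\theta$, so the iterates approach the least-squares solution. Deep learning libraries compute the gradient automatically instead of requiring a hand-written `mse_grad`.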

We also want to ensure our function $F$ is generalisable to $X'$ and $Y'$ not in $X$ and $Y$, i.e. that neither the model nor its parameters have been overfit to the available data. Parameter overfit is mitigated by regularisation, e.g. via random dropout. Model overfit is managed by retaining an out-of-sample _validation set_ alongside the _training set_ and _test set_.
* **training set**: used to solve for parameters $\tilde{\theta}$
* **validation set**: used to explore the hyperparameter space
* **test set**: used to report the out-of-sample performance of the maximum likelihood estimator
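
One possible way to produce the three sets is to shuffle the sample indices once and cut them into disjoint parts. The sketch below assumes a hypothetical `split_indices` helper and 10% validation and test fractions; neither comes from the course material.

```python
import numpy as np

def split_indices(n_samples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    test_idx = idx[:n_test]                   # held back for the final report
    val_idx = idx[n_test:n_test + n_val]      # used for hyperparameter choices
    train_idx = idx[n_test + n_val:]          # used to fit the parameters
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))   # 800 100 100
```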

Sometimes training data from different distributions can be used to train a single model. Multiple training sets and additional _training-validation_ sets can be used, but a machine learning problem should use only a single validation set drawn from the same distribution as the test set.

#### Approach in Computer Vision

##### A single sample $X$

<img src="https://raw.githubusercontent.com/Lkruitwagen/teaching/main/cv/assets/sample_X.png" alt="drawing" style="display:inline" width="800"></img>

C: Channels; H: Height (pixels); W: Width (pixels)

**NB:** TensorFlow defaults to "channels-last", i.e. [H,W,C]; PyTorch uses "channels-first", i.e. [C,H,W]
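
Converting between the two layouts is an axis permutation. A small NumPy sketch follows; the array sizes are arbitrary assumptions.

```python
import numpy as np

image_hwc = np.zeros((28, 32, 3), dtype=np.float32)        # [H, W, C], channels-last
image_chw = np.transpose(image_hwc, (2, 0, 1))             # [C, H, W], channels-first
print(image_hwc.shape, image_chw.shape)                    # (28, 32, 3) (3, 28, 32)

batch_nhwc = np.zeros((16, 28, 32, 3), dtype=np.float32)   # [N, H, W, C]
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))        # [N, C, H, W]
print(batch_nchw.shape)                                    # (16, 3, 28, 32)
```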

##### A single target $Y$

<img src="https://raw.githubusercontent.com/Lkruitwagen/teaching/main/cv/assets/sample_Y.png" alt="drawing" style="display:inline" width="1000"></img>

How it used to be done:

<img src="https://raw.githubusercontent.com/Lkruitwagen/teaching/main/cv/assets/traditional_cv.png" alt="drawing" style="display:inline" width="800"></img><sub>[8]</sub>

Feature engineering: mathematical transforms of the input image that the expert hypothesizes are important for the downstream task, e.g. edge detection, color, color gradients
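
As an illustration of a hand-engineered feature, here is a rough sketch of edge detection with a Sobel kernel in plain NumPy. The `filter2d_valid` helper and the toy image are assumptions made for this example.

```python
import numpy as np

def filter2d_valid(image, kernel):
    """Slide `kernel` over `image` (cross-correlation, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Sobel kernel that responds to horizontal intensity changes (vertical edges)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)

# Toy image (assumed): dark left half, bright right half
image = np.zeros((8, 8), dtype=np.float32)
image[:, 4:] = 1.0

edges = filter2d_valid(image, sobel_x)
print(edges.shape)   # (6, 6)
print(edges[0])      # non-zero only where the window straddles the dark/bright boundary
```

The kernel weights here are fixed by the expert; nothing in this transform is learned from data.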

<img src="https://raw.githubusercontent.com/Lkruitwagen/teaching/main/cv/assets/lecun_2015_features.png" alt="drawing" style="display:inline" width="200"></img><sub>[9]</sub>

How it's done now:

<img src="https://raw.githubusercontent.com/Lkruitwagen/teaching/main/cv/assets/dl_cv.png" alt="drawing" style="display:inline" width="800"></img><sub>[8]</sub>

**Image References**

<sub>[8]: O'Mahony, N., Campbell, S., Carvalho, A., Harapanahalli, S., Velasco-Hernandez, G., Krpalkova, L., Riordan, D., Walsh, J. (2019) https://arxiv.org/abs/1910.13796</sub>
<sub>[9]: LeCun, Y., Bengio, Y., Hinton G. (2015) https://www.nature.com/articles/nature14539</sub>

### 3. Computer Vision Hello World: MNIST
**M**odified **N**ational **I**nstitute of **S**tandards and **T**echnology database: 70,000 grayscale pictures of hand-written digits 0-9, 28x28px.

The original problem: how to machine-read US zip codes on letter envelopes. Now an ML benchmark and teaching dataset.

<img src="https://raw.githubusercontent.com/Lkruitwagen/teaching/main/cv/assets/mnist.png" alt="drawing" style="display:inline" width="600"></img>

### Let's Code!

Import dependencies and set up Jupyter environment. 

*Best practices: separate built-ins, packages, and ML-libraries. Vertically align in-line comments.*

In [None]:
import os, sys                       # some built-ins 

import matplotlib.pyplot as plt      # visualisation
import numpy as np                   # data manipulations

import tensorflow as tf
import tensorflow_datasets as tfds   # built-in MNIST

*You may wish to also set up your Lab environment if possible. Some helpful commands:*

Watch GPU loading: `watch nvidia-smi -i 1`

Watch CPU and memory loading: `htop`

In [None]:
tf.config.list_physical_devices()    # let's check that TF is GPU-ready

In [None]:
### Load the MNIST dataset iterators
(ds_train, ds_test), ds_info = tfds.load(
    'mnist',                      # TFDS has MNIST built-in
    split=['train', 'test'],      # MNIST only has train and test data... fine for today
    as_supervised=True,           # dataset generators include samples and labels 
    with_info=True,               # also return dataset metadata
)

In [None]:
type(ds_train), type(ds_test)

`ds_train` and `ds_test` are `tf.data.Dataset` objects, which behave like *generators*. They serve our data to the ML model, manage batch size and parallelisation, and can be customised with augmentations.

In [None]:
### Inspect our data
sample_X, sample_Y = next(ds_train.as_numpy_iterator())             # get a single sample
print ('Single sample:',type(sample_X), type(sample_Y), sample_X.shape, sample_Y.shape)

sample_X, sample_Y = next(ds_train.batch(100).as_numpy_iterator())  # get a batch of 100 samples
print ('Batch sample:',type(sample_X), type(sample_Y), sample_X.shape, sample_Y.shape)

In [None]:
### Visualise our data
fig, axs = plt.subplots(10,10,figsize=(8,8))
axs = axs.flatten()                                           # flatten array of axes (10,10) -> (100,)
for ii in range(100):
    axs[ii].text(13,-1,str(sample_Y[ii]),color='k')           # annotate above axis
    axs[ii].imshow(np.squeeze(sample_X[ii,:,:]), cmap='gray') # squeeze out the channel dimension
    axs[ii].axis('off')
plt.show()

### Let's build on what we know - multiclass classification with scikit-learn

We want a multi-class classifier because our labels are one of 10 digits. We can make one with an ensemble of one-vs-all classifiers, conveniently implemented by sklearn. We can use an SVM for each class's classifier.
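
The one-vs-all idea can be sketched without sklearn: fit one binary scorer per class, then let the most confident scorer win. The example below is an assumption-laden illustration: it uses toy 2D clusters and swaps the per-class SVM for a least-squares linear scorer to keep the code short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed): three well-separated 2D clusters, one per class
centres = np.array([[0.0, 5.0], [5.0, -5.0], [-5.0, -5.0]])
labels = np.repeat(np.arange(3), 50)
X = centres[labels] + rng.normal(scale=0.5, size=(150, 2))
X_aug = np.hstack([X, np.ones((150, 1))])              # append a bias column

# One binary "class k vs rest" linear scorer per class, fit by least squares
# (a stand-in for the per-class SVM, not what sklearn does internally)
weights = []
for k in range(3):
    binary_target = np.where(labels == k, 1.0, -1.0)   # +1 for class k, -1 for the rest
    w, *_ = np.linalg.lstsq(X_aug, binary_target, rcond=None)
    weights.append(w)
weights = np.stack(weights, axis=1)                    # shape (n_features + 1, n_classes)

scores = X_aug @ weights                               # one score per class per sample
predicted = np.argmax(scores, axis=1)                  # most confident scorer wins
accuracy = np.mean(predicted == labels)
print("training accuracy:", accuracy)
```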

In [None]:
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

We have a nice small dataset that can fit in memory. Let's get it all from our TF datasets into numpy arrays.

In [None]:
X_trn = np.array([_x for _x, _y in ds_train.as_numpy_iterator()])
Y_trn = np.array([_y for _x, _y in ds_train.as_numpy_iterator()])
X_test = np.array([_x for _x, _y in ds_test.as_numpy_iterator()])
Y_test = np.array([_y for _x, _y in ds_test.as_numpy_iterator()])

Image data usually comes in as Byte integers. Let's normalise our input data and change it to float. Float64 is a bit excessive, so let's go with Float32.

In [None]:
print('raw data:',X_trn.max(), X_trn.min(), X_trn.dtype)
X_trn, X_test = (X_trn/255.).astype(np.float32), (X_test/255.).astype(np.float32)
print('normalised:',X_trn.max(), X_trn.min(), X_trn.dtype)

Our targets are currently stored as categorical values represented by integers. We need to 'one-hot' encode them to a vector of {0,1} targets.

In [None]:
n_classes = np.unique(Y_trn).shape[0]

In [None]:
def one_hot_encode(targets, n_classes):
    return np.eye(n_classes)[targets]

In [None]:
Y_trn = one_hot_encode(Y_trn, n_classes)
Y_test = one_hot_encode(Y_test, n_classes)
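
A small worked example may help show what the encoding produces. The labels and the class count of 4 below are arbitrary assumptions.

```python
import numpy as np

targets = np.array([3, 0, 2])            # integer class labels
one_hot = np.eye(4)[targets]             # row k of the identity matrix encodes class k
print(one_hot)
# [[0. 0. 0. 1.]
#  [1. 0. 0. 0.]
#  [0. 0. 1. 0.]]

recovered = np.argmax(one_hot, axis=1)   # argmax inverts the encoding
print(recovered)                         # [3 0 2]
```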

Last thing - we need to flatten our training data X. The classifier expects a 2D array of [n_samples, m_features]. Each pixel-channel datum will be a feature.

In [None]:
X_trn = X_trn.reshape(X_trn.shape[0],-1)
X_test = X_test.reshape(X_test.shape[0],-1)
print (X_trn.shape)

In [None]:
### train our one-vs-all classifier
classifier = OneVsRestClassifier(LinearSVC()).fit(X_trn, Y_trn)

In [None]:
### run prediction on our test data
Y_test_hat = classifier.predict(X_test)

In [None]:
# use np.argmax to return test data to categorical
accuracy = (np.argmax(Y_test_hat, axis=1)==np.argmax(Y_test, axis=1)).sum()/Y_test.shape[0]

In [None]:
accuracy

### A simple deep neural network with TF

As before, we need to normalise our data and cast it to Float32. We can use `tf.data.Dataset.map` to map a normalising function over all our data.

In [None]:
def normalise_mapper(sample, target):                     # sample and target are now tf tensors
    return tf.cast(tf.squeeze(sample), tf.float32) / 255., target      # return the (image, label) tuple

In [None]:
ds_train = ds_train.map(normalise_mapper, num_parallel_calls=tf.data.experimental.AUTOTUNE)   # AUTOTUNE -> allows TF to decide how many CPU processes to use
ds_test = ds_test.map(normalise_mapper, num_parallel_calls=tf.data.experimental.AUTOTUNE)

Configure the data pipeline:
- *cache*: for a small dataset, read it only once and keep it in memory
- *shuffle*: randomly select dataset elements. 
- *batch*: set the batch size
- *prefetch*: allow the data pipeline to fetch samples while the model is running and updating
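
To illustrate the *shuffle* and *batch* steps, here is a rough NumPy sketch of a generator that visits the data in random order and yields fixed-size chunks. The `shuffled_batches` helper and the toy arrays are assumptions; this is not how `tf.data` is implemented, and caching and prefetching are left out.

```python
import numpy as np

def shuffled_batches(samples, targets, batch_size, seed=0):
    """Yield (samples, targets) batches after shuffling the whole dataset once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(samples))              # shuffle: a random visiting order
    for start in range(0, len(samples), batch_size):
        batch_idx = order[start:start + batch_size]
        yield samples[batch_idx], targets[batch_idx]   # batch: one chunk of the data

# Toy stand-in data (assumed): 300 blank "images" with integer labels
images = np.zeros((300, 28, 28), dtype=np.float32)
labels = np.arange(300) % 10

batch_shapes = [x.shape for x, y in shuffled_batches(images, labels, batch_size=128)]
print(batch_shapes)   # [(128, 28, 28), (128, 28, 28), (44, 28, 28)]
```

The last batch is smaller whenever the dataset size is not a multiple of the batch size.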

In [None]:
ds_train = ds_train.cache()
ds_train = ds_train.shuffle(ds_info.splits['train'].num_examples) # Set the buffer to the full dataset for small datasets
ds_train = ds_train.batch(128)                                    # 128 a nice power of 2
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)       # allow TF to decide how many batches to prefetch

ds_test = ds_test.cache()
ds_test = ds_test.shuffle(ds_info.splits['test'].num_examples)
ds_test = ds_test.batch(128)
ds_test = ds_test.prefetch(tf.data.experimental.AUTOTUNE)

Build a simple fully-connected neural network with one hidden layer

In [None]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),   # a non-parameterised layer to flatten our image data for full-connection
  tf.keras.layers.Dense(128,activation='relu'),    # a fully connected layer with ReLU activation
  tf.keras.layers.Dropout(0.5),                    # a dropout layer for regularisation
  tf.keras.layers.Dense(10)                        # an output layer with same dimension as our targets
])
model.summary()
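
To make the layer stack concrete, here is a rough NumPy sketch of the forward pass such a network computes. The random weights, the `forward` helper, and the toy batch are assumptions for illustration; Keras initialises and trains its own weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly initialised weights with the same shapes as the two Dense layers
W1, b1 = rng.normal(scale=0.05, size=(784, 128)), np.zeros(128)   # Dense(128)
W2, b2 = rng.normal(scale=0.05, size=(128, 10)), np.zeros(10)     # Dense(10)

def forward(images, training=False, drop_rate=0.5):
    """Flatten -> Dense + ReLU -> Dropout (training only) -> Dense (logits)."""
    x = images.reshape(images.shape[0], -1)         # Flatten: (N, 28, 28) -> (N, 784)
    h = np.maximum(x @ W1 + b1, 0.0)                # fully connected layer + ReLU
    if training:
        keep = rng.random(h.shape) >= drop_rate     # randomly zero about half the units
        h = h * keep / (1.0 - drop_rate)            # rescale so the expected value is unchanged
    return h @ W2 + b2                              # raw logits, one per class

batch = rng.random((4, 28, 28)).astype(np.float32)  # toy batch (assumed)
logits = forward(batch)
n_params = W1.size + b1.size + W2.size + b2.size
print(logits.shape, n_params)                       # (4, 10) 101770
```

The parameter count is $784 \times 128 + 128 + 128 \times 10 + 10 = 101{,}770$; the Flatten and Dropout layers contribute none.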

In [None]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),                                # ADAM optimizer -> always a good first bet
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),     # Sparse -> integer targets, no one-hot needed; from_logits -> model outputs raw logits
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()],
)
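
The loss chosen above can be sketched in NumPy: apply a log-softmax to the raw logits, then pick the log-probability of the true class using the integer label. The helper name and the toy values are assumptions, and this is an illustration of the formula rather than the Keras implementation.

```python
import numpy as np

def sparse_categorical_crossentropy_from_logits(logits, labels):
    """Mean cross-entropy for integer labels, given unnormalised logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)    # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))   # log-softmax
    return -np.mean(log_probs[np.arange(len(labels)), labels])   # true-class log-probability

logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])        # integer labels, no one-hot needed
loss = sparse_categorical_crossentropy_from_logits(logits, labels)
print(loss)
```

The second sample has uniform logits, so its contribution is $\log 3$, the loss of a uniform guess over three classes.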

In [None]:
model.fit(
    ds_train,
    epochs=10,
    validation_data=ds_test,
)