<a href="https://colab.research.google.com/github/phjacobs/academic-kickstart/blob/master/CSDMS_DeepLearning_Buscombe.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Clinic: Landcover and landform classification using deep neural networks


CSDMS 2019 Annual Meeting: CSDMS 3.0 – Bridging Boundaries

May 21-23, 2019

### Daniel Buscombe

Assistant Research Professor

School of Earth and Sustainability

School of Informatics, Computing and Cyber Systems


Northern Arizona University, Flagstaff, AZ

[Email](mailto:daniel.buscombe@nau.edu)
[Web](www.danielbuscombe.com)
[Google Scholar](https://scholar.google.com/citations?user=bwVl0NwAAAAJ&hl=en)


### What do I do with this notebook file?

You probably have opened this from a direct link I sent you. If, however, you have been given this as a .ipynb file, you should save it to your personal google drive account. Once there, you should open it using [colaboratory](https://research.google.com/colaboratory/faq.html). 

When open in colaboratory, go to **Runtime** on the toolbar, then **Change runtime type**, and finally under **Hardware accelerator** select **GPU** and hit **Save**. 

This will allow you to train models using a GPU, which is a necessity if you want model training to complete in a reasonable time.



## Purpose and content of this workshop


### Purpose
* Deep learning models are the current state-of-the-art for image recognition, segmentation, and classification tasks

* The purpose of this clinic is to demonstrate a workflow for training pixelwise segmentation models using pairs of images and label images

* The big limitation we have today is time, so we'll be using a small (manageable, and hopefully relevant) dataset and going over a workflow that you can explore in more detail on your own


* I'll provide an overview of some tools and concepts; hopefully enough to equip you with the understanding you'll need to apply this to your own data


In [0]:
%%html
<marquee style='width: 50%; color: red;'><b>Warning: there is only so much we can achieve in 2 hours!</b></marquee>

## Using jupyter notebooks

[Jupyter](https://jupyter.org/) notebooks are a way to share executable code that can be run through a web browser. 

A notebook kernel is a computational engine that executes the code contained in a notebook. The ipython kernel executes python code. Kernels for many other languages also exist.


You can navigate and list file contents in python using the ```os``` module

In [0]:
import os
print(os.getcwd())

/content


You can also access shell commands using the bang (`!`) operator, which is usually simpler (typically using less code)


For example to print working directory (`pwd`)

In [0]:
! pwd

/content


We'll attempt to use minimal code in this workshop, which will necessitate the use of both python and bash (shell) commands


There are a couple of different ways of running a python script and passing it variables. To demonstrate this, let's first make a simple script that accepts 2 variables and prints them to screen 

In [0]:
var1 = 50
var2 = 'jupyter_is_cool'

```writefile``` is a so-called [magic command](https://ipython.readthedocs.io/en/stable/interactive/magics.html)

In [0]:
%%writefile test_script.py
import sys
script, var1, var2 = sys.argv
print("Script name:", script)
print("First variable:", var1)
print("Second variable:", var2)

Writing test_script.py


Below I use the ```bash``` command ```cat``` to show the contents of the file

In [0]:
! cat test_script.py

You can run the script using the ```run``` jupyter magic command. Variables are passed to the script using a $ symbol

In [0]:
%run ./test_script.py $var1 $var2

... or using bash. This is the convention we'll adopt in this clinic:

In [0]:
! python3 test_script.py $var1 $var2
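One detail worth keeping in mind: whichever way you launch the script, everything in `sys.argv` arrives as a string, so `var1` reaches the script as `'50'` rather than the integer `50`. Below is a minimal sketch of converting it back. The helper name `parse_args` is hypothetical and is not part of the workshop code.

```python
def parse_args(argv):
    """Split an argv-style list into (script name, integer, string)."""
    script, var1, var2 = argv
    # command-line arguments are always strings, so convert explicitly
    return script, int(var1), var2

# simulate the call `python3 test_script.py 50 jupyter_is_cool`
script, var1, var2 = parse_args(['test_script.py', '50', 'jupyter_is_cool'])
print(type(var1).__name__, var1 + 1)
```

Inside a real script you would pass `sys.argv` to the helper instead of a hand-written list.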

### What is Machine Learning?

Machine Learning is a set of methods that allow computers to learn from data to make and improve predictions (for example cancer, weekly sales, credit default). 

Machine learning is a paradigm shift from “normal programming” where all instructions must be explicitly given to the computer to “indirect programming” that takes place through providing data.

![](https://christophm.github.io/interpretable-ml-book/images/programing-ml.png)
[source](https://christophm.github.io/interpretable-ml-book/terminology.html)

![](https://christophm.github.io/interpretable-ml-book/images/learner.png)
[source](https://christophm.github.io/interpretable-ml-book/terminology.html)

## Distinction between Machine and Deep Learning

#### Machine learning 
 * requires extracting features from data to input to the model
 * requires fine-tuning of model architecture
 * requires fine-tuning of model hyperparameters
 * performance tends to plateau with more data
 * lots of different models

#### Deep learning ...
 * automatically extracts features from data
 * automatically fine-tunes hyperparameters
 * performance doesn't tend to plateau with more data
 * requires fine-tuning of model architecture
 * just one model - the artificial neural network
    
   ![](https://images.xenonstack.com/blog/machine-learning-vs-deep-learning.png)
   
   [source](https://www.mathworks.com/videos/introduction-to-deep-learning-what-is-deep-learning--1489502328819.html)
   

Therefore, lots of interest from geoscientists

* we don't have time or inclination to research models and fine-tune their parameters
* we don't have time or inclination to develop optimal feature extraction techniques


### Why use DL for image classification?
 
* No ‘feature engineering’
* Instead, hierarchy of features automatically learned from data
* Potentially more powerful -  learns more abstract information


### ML or AI?

Difference between machine learning and AI:

"If it is written in Python, it's probably machine learning. If it is written in PowerPoint, it's probably AI"

[source](https://twitter.com/matvelloso/status/1065778379612282885)

A less facetious definition:

“**Machine learning** is the study of computer algorithms that allow computer programs to automatically improve through experience.”

“**Artificial intelligence** is the science and engineering of making computers behave in ways that, until recently, we thought required human intelligence.”

The term AI is a moving target "... based on those capabilities that humans possess but which machines do not.” 

[source](https://medium.com/datadriveninvestor/differences-between-ai-and-machine-learning-and-why-it-matters-1255b182fc6)


### Problems DL is supposed to overcome

![](https://github.com/dbuscombe-usgs/cdi_dl_workshop/raw/67e8c84d0e0b89b024814ef2e8f3bed091ee0c4e/Day2/figs/Picture2.png)

Ok, we'll pick up lots of these threads, fill in details and explain the most important concepts as we go.


So let's get started

## Downloading the scripts and data

The scripts and data are in a github repository called [```Semantic-Segmentation-Suite```](https://github.com/dbuscombe-usgs/Semantic-Segmentation-Suite), which is a fork of the [original repository](https://github.com/GeorgeSeif/Semantic-Segmentation-Suite) with a few minor alterations made by myself, and different data sets


### Data

* The example datasets are 30-cm aerial imagery of riverine and riparian environments of the Colorado River in Grand Canyon National Park and their associated label images. 

* The imagery is described in [Durning et al., 2017](https://pubs.usgs.gov/ds/1027/ds1027_introduction.html)

* The pixel labels have been compiled by me, and are a somewhat coarse-resolution classification

* There are two data sets, **GeomorphA** and **GeomorphB**

* Due to time and compute limitations, the training data set is fairly small. A larger training data set is being compiled and will be available at a later date.


#### GeomorphA
The scene is classified into the following broad landcover/geomorphic categories: 

1. water

> all types (aerated, rough, smooth)

2. sand

> all types (wet and dry open and sparsely vegetated sand, aeolian, beach and channel margin)

3. talus and bedrock

> all bedrock and coarse material deposited under gravity

4. debris fan

> all coarse clast- and matrix-supported sediment deposited by water

5. vegetation

> medium and dense stands, low and canopy

6. other

> boats and beach umbrellas!


#### GeomorphB


1. water

> all types (aerated, rough, smooth)

2. sand

>  dry open and sparsely vegetated sand (aeolian, beach and channel margin)

3. wet sand

> wet "intertidal" channel margins adjacent to dry sand

4. submerged sand

> sand deposits visible through shallow water

5. talus and bedrock

> all bedrock and coarse material deposited under gravity

6. debris fan

> all coarse clast-supported sediment deposited by water

7. muddy debris

> all poorly sorted matrix-supported sediment deposited by water

8. vegetation

> medium and dense stands, low and canopy

9. other

> boats and beach umbrellas!



### Code
The repository contains a python/tensorflow workflow for retraining numerous deep learning models for generic image semantic segmentation. It contains several state-of-the-art models, and makes it easy to plug and play with different models.

In [0]:
! rm -rf Semantic-Segmentation-Suite/

Head over to the "files" tab and hit "refresh". You'll see that the Semantic-Segmentation-Suite folder is gone.

Next, use ```git clone``` to clone my github-hosted repository which contains the data and code for this workshop:

In [0]:
! git clone --depth 1 https://github.com/dbuscombe-usgs/Semantic-Segmentation-Suite.git

## Navigating the file system and exploring the data

We're going to change our working directory to that folder we just downloaded from github. We could do that using bash or python. Let's use python: 

In [0]:
import os
os.chdir('./Semantic-Segmentation-Suite')
print(os.getcwd())

Next we're going to import the ```tensorflow``` module and print the version number. At the time of writing, the version number is ```1.13.1```. Be aware that it is possible this code could break with other (later/earlier) versions of tensorflow

In [0]:
import tensorflow as tf
tf.VERSION


The following bash command will tell you how much RAM you have available, in GB


In [0]:
! free -g

You can test to see if a GPU is available like so:

In [0]:
tf.test.is_gpu_available()

Let's explore the contents of our ```Semantic-Segmentation-Suite``` folder

In [0]:
! ls

In [0]:
! ls GeomorphA

In [0]:
! ls GeomorphB/

Let's take a look at the contents of ```class_dict.csv```, which contains the labels and associated RGB colors: 

In [0]:
!cat GeomorphA/class_dict.csv

In [0]:
!cat GeomorphB/class_dict.csv
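If you would rather work with the class list in python than read it off the screen, the file can be parsed into a dictionary. This is a minimal sketch that assumes the file has a header row `name,r,g,b` (check the `cat` output above to confirm). The two rows in `example` are made up for illustration and are not the real class colors.

```python
import csv
import io

def read_class_dict(csv_text):
    """Map each class name to its (r, g, b) label color."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row['name']: (int(row['r']), int(row['g']), int(row['b']))
            for row in reader}

# made-up rows for illustration; the real file is GeomorphA/class_dict.csv
example = "name,r,g,b\nwater,0,0,255\nsand,255,255,0\n"
classes = read_class_dict(example)
print(classes)
```

With the real file you would call `read_class_dict(open('GeomorphA/class_dict.csv').read())`.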

We can count the number of train/test/validation images using code such as:


In [0]:
from glob import glob
nval = len(glob('GeomorphA/val/*.jpg'))
print('%i validation images' % nval)

ntrain = len(glob('GeomorphA/train/*.jpg'))
print('%i training images' % ntrain)

In [0]:
nval = len(glob('GeomorphB/val/*.jpg'))
print('%i validation images' % nval)

ntrain = len(glob('GeomorphB/train/*.jpg'))
print('%i training images' % ntrain)
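The two cells above repeat the same pattern, so it can be wrapped in a small function. This is only a sketch: `count_images` is a hypothetical helper, and the `test` subfolder name is an assumption (only `train` and `val` are shown above). A folder that does not exist is simply reported as 0 images.

```python
import os
from glob import glob

def count_images(dataset, splits=('train', 'val', 'test'), ext='jpg'):
    """Return {split: number of images} for one dataset folder."""
    return {s: len(glob(os.path.join(dataset, s, '*.' + ext))) for s in splits}

for dataset in ('GeomorphA', 'GeomorphB'):
    print(dataset, count_images(dataset))
```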

## Getting to know ```train.py``` and using it for quick/exploratory tests

It is often useful to make quick tests to see if there is any major issue with the architecture of the model itself

Let's execute the next bit of code, and while that is running we'll go through those command-line inputs

We'll demonstrate this using the **GeomorphA** data set


**12 mins**


In [0]:
! rm -rf checkpoints/

Let's define some inputs and then run the script. While it's running, we'll talk about what it is doing

**9 mins**

In [0]:
N = 10 ## number of epochs
B = 1 ## batch size

In [0]:
! python3 train.py --num_epochs $N --dataset GeomorphA --num_val_images $nval --batch_size $B --frontend ResNet152

### train.py
What is this script doing?

First, it automatically downloads the model pre-trained checkpoints from the web

> **Checkpoints?**

Checkpoints are a way to save the current state of your model so that you can pick up from where you left off. They capture

* The architecture of the model, allowing you to re-create the model

* The weights of the model (for now, these will essentially be random numbers)

* The training configuration (loss, optimizer, epochs, and other meta-information - I'll explain what all this means later)


Then, it looks for images and corresponding label images, and trains a deep learning model **end-to-end**. This means that all the parameters in the model will be learned from our data, using a single large network that automatically learns how to map inputs (pixels and regions of images) to outputs (our classes). This is computationally intensive, hence the need for GPUs. Our workflow would therefore be described in the DL literature as 'end-to-end'. 

We are not using 'transfer learning', which is the (usually less computationally intensive) process of using not only existing models, but also re-purposing existing parameter sets learned from other data.


### Command-line inputs

The ```train.py``` script takes a lot of command line inputs. For now, we're using most of the defaults and only specifying 5 things, which are described below. 

Later we will keep adding command line inputs that will specify which model(s) to use and how to treat the data, but for now, we are specifying: 

* The **batch size** (1). This is the number of training images to work through before the model’s internal parameters are updated. In general, larger numbers are better. However, because our GPU memory is limited, we are using a batch size of only 1.

* The **number of epochs** (10). In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights. 

So, with a batch size of 1 and 40+ training images, the model doesn't even see all the images. Given that image batches are drawn randomly, the minimum number of epochs for the model to see all images would be 40, but probably much larger. Typically, 100s to 1000s of epochs are used in model training

* the dataset to use ("GeomorphA")

* the number of validation images to use (all of them)

* **Frontend** model or 'feature extractor'

#### Feature Extractors

Three feature extractors are implemented in the code we're using: 

* [MobileNetV2](https://arxiv.org/abs/1801.04381)
* [ResNet50/101/152](https://arxiv.org/abs/1512.03385), and 
* [InceptionV4](https://arxiv.org/abs/1602.07261). 

A feature extractor in this case is a deep neural network architecture that has been optimized to automatically extract only those features in your data that contribute most to the prediction variable or output in which you are interested.

We are using the feature extractor called  **ResNet152**, which is a slightly larger version of the default (ResNet101)

[ResNet152](https://medium.com/@14prakash/understanding-and-implementing-architectures-of-resnet-and-resnext-for-state-of-the-art-image-cf51669e1624) is a really popular generic model used in image analysis. The number refers to the number of layers, but the rest of the architecture is the same. More layers usually means more accuracy but slower training


#### Dense (pixelwise) classifiers

Once image features are extracted they are then further processed at different scales. Why? 

* processing the features at different scales will give the network the capacity to handle objects at different sizes

* when performing segmentation there is a tradeoff. If you want good classification accuracy, then you’ll definitely want to process those high level features from later in the network since they are more discriminative and contain more useful semantic information. On the other hand, if you only process those deep features, you won’t get good localisation because of the low resolution!

Fifteen segmentation models are implemented. See the [github repo](https://github.com/dbuscombe-usgs/Semantic-Segmentation-Suite) for full descriptions and links to the papers that describe them

More details of these implementations can be found in [this](https://towardsdatascience.com/semantic-segmentation-with-deep-learning-a-guide-and-code-e52fc8958823) accessible blog post


#### Initial checks

You'll notice that there are 3 png files in the ```Semantic-Segmentation-Suite``` top-level directory, namely:

* accuracy_vs_epochs.png
* iou_vs_epochs.png
* loss_vs_epochs.png

These are made and updated every epoch by the program

Double-click on them in the files tab and they will each pop out into their own window


In the above, two important things to look for are

* per-class accuracies. We want these to be fairly uniform across all classes. The accuracies will (for now) be low

* average loss per epoch. Click on "loss_vs_epochs.png"

In order to quantify how a given model performs, the loss function is usually used to evaluate to what extent the actual outputs are correctly predicted by the model outputs.


### Looking at training results (Validation statistics)

Double click on the images:

* loss_vs_epochs.png
* accuracy_vs_epochs.png
* iou_vs_epochs.png


True positives are image regions/pixels correctly classified as belonging to a certain class by the model

True negatives are correctly classified as not belonging to a certain class.

False negatives are regions/pixels incorrectly classified as not belonging to a certain class

False positives are those regions/pixels incorrectly classified as belonging to a certain class.

#### Accuracy




In [0]:
from IPython.display import Math, HTML

display(HTML("<script src='https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/"
               "latest.js?config=default'></script>"))

Math(r'A = \frac{TP + TN}{TP + TN + FP + FN}')


#### F1 score (recall and precision)

These metrics are commonly used in evaluation of pixelwise segmentations, where the number of pixels corresponding to each class varies considerably.

Precision and recall are useful where the number of observations belonging to one class is significantly lower than those belonging to the other classes.

##### Precision
The proportion of positive identifications that are correct (a precision of 1 means there are no false positives)

##### Recall
Recall is the proportion of actual positives identified correctly (a recall of 1 means there are no false negatives)

##### F1 score
The harmonic mean of recall and precision, which weights the two equally and quantifies how well the model performs in general.
 


In [0]:
display(HTML("<script src='https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/"
               "latest.js?config=default'></script>"))

Math(r'P = \frac{TP}{TP + FP}')

In [0]:
display(HTML("<script src='https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/"
               "latest.js?config=default'></script>"))
Math(r'R = \frac{TP}{TP + FN}')

In [0]:
display(HTML("<script src='https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/"
               "latest.js?config=default'></script>"))
Math(r'F_1 = 2\frac{P \times R}{P + R}')
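The four counts defined above (TP, TN, FP, FN) and the scores built from them can be computed directly from a pair of label images. Below is a minimal sketch using `numpy`, treating one class at a time as "positive". The function name `binary_scores`, the toy arrays, and the choice to return 0 when a denominator is zero are all assumptions made for illustration. This is not the code used inside `train.py`.

```python
import numpy as np

def binary_scores(y_true, y_pred, cls):
    """Accuracy, precision, recall and F1 for one class of a label image."""
    t = (np.asarray(y_true) == cls)
    p = (np.asarray(y_pred) == cls)
    tp = int(np.sum(t & p))      # predicted cls, truly cls
    tn = int(np.sum(~t & ~p))    # predicted other, truly other
    fp = int(np.sum(~t & p))     # predicted cls, truly other
    fn = int(np.sum(t & ~p))     # predicted other, truly cls
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1

# toy 2x4 label images: 0 = water, 1 = sand (made-up values)
y_true = np.array([[0, 0, 1, 1],
                   [0, 1, 1, 1]])
y_pred = np.array([[0, 1, 1, 1],
                   [0, 0, 1, 0]])
print(binary_scores(y_true, y_pred, cls=1))
```

In this toy case the "sand" class has 3 true positives, 1 false positive, 2 false negatives and 2 true negatives, so precision is 3/4 and recall is 3/5.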


#### IoU metric

The intersection over union (IoU) metric is a simple metric used to evaluate the performance of a segmentation algorithm. 

In [0]:
display(HTML("<script src='https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/"
               "latest.js?config=default'></script>"))
        
Math(r'IoU = \frac{|y_{true} \cap y_{pred}|}{|y_{true} \cup y_{pred}|}')

![](https://www.oreilly.com/library/view/deep-learning-for/9781788295628/assets/63fb2c41-8e83-49c5-ad3a-fee59e8a178b.png)

By convention, a predicted bounding box is considered good if IoU > 0.5

![](https://stanford.edu/~shervine/images/intersection-over-union.png)
[source](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks)
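For pixelwise segmentation the same idea applies to label images rather than boxes: for each class, count the pixels where truth and prediction agree (intersection) and divide by the pixels where either carries that label (union). Here is a minimal sketch, with hypothetical function names and made-up toy arrays. Returning `nan` for a class absent from both images is a convention chosen here for illustration.

```python
import numpy as np

def class_iou(y_true, y_pred, cls):
    """Intersection over union of the pixels labeled `cls`."""
    t = (np.asarray(y_true) == cls)
    p = (np.asarray(y_pred) == cls)
    union = np.sum(t | p)
    if union == 0:
        return float('nan')      # class absent from both images
    return float(np.sum(t & p) / union)

# toy 2x4 label images: 0 = water, 1 = sand (made-up values)
y_true = np.array([[0, 0, 1, 1],
                   [0, 1, 1, 1]])
y_pred = np.array([[0, 1, 1, 1],
                   [0, 0, 1, 0]])
ious = [class_iou(y_true, y_pred, c) for c in (0, 1)]
print(ious, np.nanmean(ious))
```

Averaging the per-class values, as in the last line, gives the "mean IoU" often reported for segmentation models.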

### Looking at per-class accuracies, per image

Below we're checking the contents of the ```checkpoints``` folder

and then looking at the contents of a csv file that contains accuracies and other validation statistics for each category and each validation image:

In [0]:
! ls checkpoints/
! cat checkpoints/0009/val_scores.csv

### Using checkpoints

With Colab, GPU kernels may terminate suddenly and without warning. Therefore, it is often a good idea to keep the number of training epochs small, but to use checkpoints

Add ```--continue_training True``` to the arguments list below if you want to retrain from a checkpoint

**You can only use this option with one particular model** (but you can use a different feature extractor if you wish)

Below we'll start from the last checkpoint, and train for an additional 10 epochs

Like before, let's run the code first, then talk about it while it's running (scroll down)

**9 mins**

In [0]:
! python3 train.py --num_epochs $N --dataset GeomorphA --num_val_images $nval --batch_size $B --continue_training True --frontend ResNet152

The relatively minor disadvantage with the code we are using is that it doesn't keep track of the training history in the checkpoints, so our "X_vs_epoch.png" figures don't include the results from the previous model training step



## Convolutional neural networks

While the model is training, let's explore convolutional neural networks in a little more detail. Also known as **CNNs**, they are a specific type of neural network used extensively in deep learning classification (and regression) tasks

* Convolutional Neural Networks (CNNs) are very similar to ordinary Neural Networks: they are made up of neurons that have learnable weights and biases.

* CNN architectures make the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture.

Natural images exhibit "stationarity", meaning that the statistics of one part of the image are the same as those of any other part. This suggests that the features that we learn at one part of the image can also be applied to other parts of the image, and we can use the same features at all locations.

More precisely, having learned features over small (say 8x8) patches sampled randomly from the larger image, we can then apply this learned 8x8 feature detector anywhere in the image. Specifically, we can take the learned 8x8 features and "convolve" them with the larger image, thus obtaining a different feature activation value at each location in the image.

### Layers

The layers consist of hierarchical filters that are designed to extract features of increasing complexity

* The input of each layer is the output of the previous one

* A layer does not need to learn the whole concept at once; instead, the layers build a chain of features that together build that knowledge.

* It learns the best way to map inputs to outputs (you don’t need to)


![](https://i1.wp.com/www.michaelchimenti.com/wp-content/uploads/2017/11/Deep-Neural-Network-What-is-Deep-Learning-Edureka.png)


### Types of layers

CNNs are generally composed of the following layers:

![](https://stanford.edu/~shervine/images/architecture-cnn.png)
[source](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks)

#### Convolution layer (CONV)

[source](http://deeplearning.stanford.edu/tutorial/supervised/FeatureExtractionUsingConvolution/)

The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output is called feature map or activation map.

![](https://stanford.edu/~shervine/images/convolution-layer-a.png)
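To make the roles of the filter size F and stride S concrete, here is a minimal single-channel sketch in `numpy`. It assumes no padding, so an N x N input gives an output of size (N - F)/S + 1 along each side. As in most deep learning libraries, the filter is slid over the image without being flipped. The filter values are hand-picked for illustration, whereas in a CNN they would be learned.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """'Valid' 2D convolution-layer operation (no padding, one channel)."""
    F = kernel.shape[0]                          # filter size (square filter)
    out_h = (image.shape[0] - F) // stride + 1
    out_w = (image.shape[1] - F) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+F, j*stride:j*stride+F]
            out[i, j] = np.sum(patch * kernel)   # one feature activation
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1., 0., -1.]] * 3)           # simple horizontal-gradient filter
print(conv2d(image, kernel, stride=1).shape)
print(conv2d(image, kernel, stride=3).shape)
```

The returned array is the feature (activation) map: one value per filter position.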

#### Pooling (POOL)

The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which provides some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.

![](https://stanford.edu/~shervine/images/max-pooling-a.png)
[source](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks)
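Max and average pooling are easy to reproduce on a small array. The sketch below assumes non-overlapping windows (stride equal to the window size), which is a common but not universal choice. `pool2d` is a hypothetical helper written for this illustration.

```python
import numpy as np

def pool2d(fmap, size=2, mode='max'):
    """Non-overlapping pooling of a 2D feature map (stride equals size)."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    # group pixels into (h, size, w, size) blocks, then reduce each block
    blocks = fmap[:h*size, :w*size].reshape(h, size, w, size)
    if mode == 'max':
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

fmap = np.array([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [9., 1., 0., 2.],
                 [1., 1., 3., 4.]])
print(pool2d(fmap, mode='max'))
print(pool2d(fmap, mode='mean'))
```

Each 2x2 block of the 4x4 input collapses to a single value, so the output is 2x2.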

#### Fully Connected (FC)

The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.

![](https://stanford.edu/~shervine/images/fully-connected.png)
[source](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks)
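As a rough sketch of what "flattened input connected to all neurons" means in practice, the block below flattens a toy feature map, multiplies it by a weight matrix, and turns the result into class scores with a softmax. The weights are random here purely for illustration (in a trained network they would be learned), and the choice of 6 outputs simply mirrors the number of GeomorphA classes.

```python
import numpy as np

def fully_connected(features, weights, bias):
    """Flatten the input, apply one dense layer, return softmax class scores."""
    x = features.reshape(-1)                 # flatten to a vector
    logits = weights @ x + bias              # every input feeds every neuron
    exp = np.exp(logits - logits.max())      # subtract max for numerical stability
    return exp / exp.sum()

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 4, 2))           # toy feature map
weights = rng.normal(size=(6, features.size))   # 6 output classes
bias = np.zeros(6)
scores = fully_connected(features, weights, bias)
print(scores.round(3), scores.sum())
```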

Let's take another look at the validation scores from the last epoch

In [0]:
! ls checkpoints/
! cat checkpoints/0009/val_scores.csv

Looks reasonable despite only training for 20 epochs

In [0]:
%%html
<marquee style='width: 30%; color: blue;'><b>Congratulations!</b></marquee>

Before we go on, we should rename the checkpoints folder

In [0]:
! mv checkpoints/ checkpoints_geoA_resnet152_densenet56/

### Testing a different model (and data set)

Like before, let's run the code first, then talk about it while it's running (scroll down)

(You may get a "running out of memory" warning. Just hit **Ignore** - you should be ok)

We'll demonstrate this using the **GeomorphB** data set

**11 mins**

In [0]:
! python3 train.py --num_epochs $N --dataset GeomorphB --num_val_images $nval --batch_size $B --frontend ResNet152 --model FC-DenseNet103 

This time, we switched from the default model ("FC-DenseNet56") to a slightly larger version of the same model, called **FC-DenseNet103**

Let's take a quick look at what that means

![](https://cdn-images-1.medium.com/max/1000/1*RfyAoe6Wlv4aLip2Y5Aw-Q.png)

A standard approach is to pass the input image through multiple convolutions to obtain high-level features.

![](https://cdn-images-1.medium.com/max/1000/1*4wx7szWCBse9-7eemGQJSw.png)

In a **ResNet** architecture, each layer gets to see both the output from the previous layer (standard) as well as the inputs to that layer. So it not only sees the outputs but also the data used to learn those outputs

![](https://cdn-images-1.medium.com/max/1000/1*rmHdoPjGUjRek6ozH7altw.png)

A **DenseNet** takes that concept even further: each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers. Each layer is receiving a “collective knowledge” from all preceding layers.

Since each layer receives feature maps from all preceding layers, the network can be thinner (fewer layers than it would ordinarily be). The result is higher computational and memory efficiencies. The following figure shows the concept of concatenation during forward propagation:

![](https://cdn-images-1.medium.com/max/1000/1*9ysRPSExk0KvXR0AhNnlAA.gif)

[source](https://towardsdatascience.com/review-densenet-image-classification-b6631a8ef803)
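The difference between the two connection patterns shows up clearly in array shapes. In the sketch below, `layer` is a made-up stand-in for a convolutional layer (a random linear map over channels followed by ReLU), and the "growth" of 3 channels per layer is an arbitrary illustrative value. The point is only that a ResNet-style block adds its input to its output, while a DenseNet-style block concatenates, so the channel count grows with every layer.

```python
import numpy as np

def layer(x, n_out, seed):
    """Stand-in for a conv layer: random linear map over channels plus ReLU."""
    w = np.random.default_rng(seed).normal(size=(x.shape[-1], n_out))
    return np.maximum(x @ w, 0.0)

x = np.ones((8, 8, 4))                   # toy feature map: 8x8 pixels, 4 channels

# ResNet-style: add the layer's input to its output (channel count unchanged)
res_out = x + layer(x, n_out=4, seed=0)

# DenseNet-style: concatenate each layer's output onto everything before it
growth = 3                               # channels added per layer (assumed)
dense = x
for k in range(3):
    dense = np.concatenate([dense, layer(dense, growth, seed=k)], axis=-1)

print(res_out.shape, dense.shape)
```

After three dense layers the feature map carries the original 4 channels plus 3 x 3 new ones, and the original input is still present unchanged in the first 4 channels.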


## Model training

### Backpropagation
Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. 

### Updating weights
In a neural network, weights are updated as follows:

* Step 1: Take a batch of training data and perform forward propagation to compute the loss. 
* Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight. 
* Step 3: Use the gradients to update the weights of the network.

![](https://stanford.edu/~shervine/images/update-weights.png)
[source](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-deep-learning-tips-and-tricks)

### Optimizing convergence

Stochastic Gradient Descent is a popular way to minimize a cost function (ideally finding its global minimum)

For each example in the data:

* find the value predicted by the neural network 
* calculate the loss from the loss function 
* find partial derivatives of the loss function, these partial derivatives produce gradients
* use the gradients to update the values of weights and biases

A more detailed yet accessible explanation may be found [here](http://ruder.io/optimizing-gradient-descent/) 
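The four steps listed above can be sketched on a problem small enough to follow by hand: fitting a straight line `y = w*x + b` with a squared-error loss. The data, learning rate and number of epochs are made-up illustrative values, and a real network has millions of weights rather than two, but the update loop has the same shape.

```python
import random

def sgd_fit(data, lr=0.05, n_epochs=200, seed=0):
    """Fit y = w*x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(n_epochs):
        rng.shuffle(data)                    # visit examples in random order
        for x, y in data:
            pred = w * x + b                 # value predicted by the model
            err = pred - y                   # the loss is err**2
            grad_w, grad_b = 2 * err * x, 2 * err   # partial derivatives
            w -= lr * grad_w                 # update the weight
            b -= lr * grad_b                 # update the bias
    return w, b

# noise-free toy data generated from y = 2x + 1
data = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
w, b = sgd_fit(list(data))
print(round(w, 3), round(b, 3))
```

Because the toy data are noise-free, the fitted values approach the generating slope and intercept.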

#### Learning rate
The learning rate indicates the pace at which the weights get updated. It can be fixed or adaptively changed. 

* If the learning rate is low, then training is more reliable, but optimization will take a lot of time because steps towards the minimum of the loss function are tiny.

* If the learning rate is high, then training may not converge or even diverge. Weight changes can be so big that the optimizer overshoots the minimum and makes the loss worse.

![](https://cdn-images-1.medium.com/max/1000/0*QwE8M4MupSdqA3M4.png)

[source](https://towardsdatascience.com/a-look-at-gradient-descent-and-rmsprop-optimizers-f77d483ef08b)
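Both failure modes can be seen on the simplest possible loss, L(w) = w², whose gradient is 2w and whose minimum is at w = 0. The three learning rates below are arbitrary values chosen to show slow convergence, fast convergence and divergence. They are not recommendations for real models.

```python
def descend(lr, w0=1.0, n_steps=20):
    """Gradient descent on the loss L(w) = w**2, whose gradient is 2*w."""
    w = w0
    for _ in range(n_steps):
        w -= lr * 2 * w
    return w

for lr in (0.01, 0.4, 1.1):
    print(lr, descend(lr))
```

With the smallest rate, `w` has barely moved from its starting value after 20 steps. With the middle rate it is essentially at the minimum. With the largest rate each step overshoots by more than it corrects, so `w` grows instead of shrinking.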


#### Adaptive learning rates
Letting the learning rate vary when training a model can reduce the training time and improve the solution that the optimization converges to. 

* The Adam optimizer is the most commonly used technique. 

* Here we are using a technique called "Root Mean Squared Propagation", or RMSprop, an adaptive version of Stochastic Gradient Descent. A more detailed yet accessible explanation may be found [here](http://ruder.io/optimizing-gradient-descent/) 
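The core of RMSprop fits in two lines: keep a running average of squared gradients, and divide each step by its square root so that the effective step size adapts to the recent gradient magnitude. The sketch below applies it to a single weight. The values of `lr`, `decay` and `eps` are illustrative assumptions and are not the settings used by `train.py`.

```python
import math

def rmsprop_step(w, grad, cache, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSprop update for a single weight."""
    cache = decay * cache + (1 - decay) * grad ** 2   # running mean of squared gradients
    w = w - lr * grad / (math.sqrt(cache) + eps)      # step scaled by recent gradient size
    return w, cache

# minimize L(w) = w**2 (gradient 2*w), starting from w = 5
w, cache = 5.0, 0.0
for _ in range(1000):
    w, cache = rmsprop_step(w, 2 * w, cache)
print(w)
```

Note that far from the minimum the step is roughly `lr` regardless of how large the gradient is, which is what makes the method less sensitive to the scale of the loss than plain gradient descent.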


In your own time, you'll be able to explore what "front-end" and "model" combinations are best


Ok, so let's take a look at our validation scores once more so we can compare them to the previous set. 

## A note on creating label images

My colleagues and I are working on a tool called [Earth Annotator](https://github.com/dbuscombe-usgs/EarthAnnotator) that uses supervised machine learning to quickly and efficiently create label imagery.

* Details of the algorithms used are described in [this open access paper](https://www.mdpi.com/2076-3263/8/7/244/pdf).

* In your own time, you can trial this technique by launching an ipython notebook from [here](https://mybinder.org/v2/gh/dbuscombe-usgs/EarthAnnotator/master?filepath=EarthAnnotator.ipynb). 

* The user is prompted to provide examples of all the categories in the image, using the cursor. The algorithm (called [a fully connected conditional random field](https://en.wikipedia.org/wiki/Conditional_random_field)) then models the likelihood of those labels for every unannotated pixel in the image, based on color and/or location.

* This makes labeling a lot faster than doing it completely by hand

* Optimizing this technique is an active research topic: please get in touch or submit a pull request on github if you have ideas to make it better.

In [0]:
! ls checkpoints/
! cat checkpoints/0009/val_scores.csv

## Managing and using checkpoints

Before we go further, let's talk a little more about checkpoints


### Viewing model predictions

In [0]:
! ls checkpoints/0009

In each of the subfolders of checkpoints, and for each validation image, there is a label image enabling us to visually compare the ground truth (suffix "gt") and the model prediction (suffix "pred") 

In [0]:
! cp checkpoints/0009/B_tile_11264_0_gt.png exampleB_gt.png
! cp checkpoints/0009/B_tile_11264_0_pred.png exampleB_pred.png

In your files list (remember to hit **Refresh**), you can double click on each image to view

In [0]:
! cp checkpoints_geoA_resnet152_densenet56/0009/B_tile_11264_0_gt.png exampleA_gt.png
! cp checkpoints_geoA_resnet152_densenet56/0009/B_tile_11264_0_pred.png exampleA_pred.png

Afterwards, we can remove them

In [0]:
! rm example*.png

Before we move on, let's take a look at another example validation image

In [0]:
! cp checkpoints/0009/A_tile_4608_10752_gt.png exampleB_gt.png
! cp checkpoints/0009/A_tile_4608_10752_pred.png exampleB_pred.png
! cp checkpoints_geoA_resnet152_densenet56/0009/A_tile_4608_10752_gt.png exampleA_gt.png
! cp checkpoints_geoA_resnet152_densenet56/0009/A_tile_4608_10752_pred.png exampleA_pred.png

Ok, so it looks like we're making headway, but clearly we need to give the model more time (epochs)


Unfortunately, we don't have that time in this class, so instead we'll download some checkpoints I made earlier and resume model training 


But before we do anything, let's do some housekeeping by removing those example files

In [0]:
! rm example*.png

### Saving and downloading checkpoints

For consistency, we'll first rename our **GeomorphB** checkpoints using the same convention as before:

In [0]:
! mv checkpoints/ checkpoints_geoB_resnet152_densenet103/

The next bit of code will remove all 10 checkpoint validation folders, leaving only the checkpoint files themselves

In [0]:
!for i in `seq 0 9`; do rm -rf checkpoints_geoB_resnet152_densenet103/000$i; done
!ls checkpoints_geoB_resnet152_densenet103

And we'll do the same for the **GeomorphA** tests too

In [0]:
!for i in `seq 0 9`; do rm -rf checkpoints_geoA_resnet152_densenet56/000$i; done
!ls checkpoints_geoA_resnet152_densenet56

In [0]:
## Note that if you had more than 10 epochs, say 30, you'd also have to add a line like this:
## !for i in `seq 10 29`; do rm -rf checkpoints/00$i; done
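
If you don't want to adjust the `seq` range every time the number of epochs changes, a Python version can simply look for every folder with a four-digit name. This is a minimal sketch, assuming (as in the folders above) that validation folders are always named with exactly four digits; the function name `remove_epoch_folders` is my own.

```python
import re
import shutil
from pathlib import Path

def remove_epoch_folders(checkpoint_dir):
    """Delete every sub-folder whose name is exactly four digits (e.g. 0000, 0029).

    Returns the sorted names of the folders that were removed.
    """
    removed = []
    # sorted() lists the folder contents up front, before anything is deleted
    for path in sorted(Path(checkpoint_dir).iterdir()):
        if path.is_dir() and re.fullmatch(r"\d{4}", path.name):
            shutil.rmtree(path)
            removed.append(path.name)
    return removed
```

For example, `remove_epoch_folders('checkpoints_geoB_resnet152_densenet103')` would do the same job as the shell loop above, for any number of epochs up to 9999.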

The following command will zip up the contents of the checkpoint folders, for easy download

In [0]:
! zip -r checkpoints_geoA_resnet152_densenet56.zip checkpoints_geoA_resnet152_densenet56/*.*
! zip -r checkpoints_geoB_resnet152_densenet103.zip checkpoints_geoB_resnet152_densenet103/*.*
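
The same thing can be done from Python with the standard `zipfile` module, which is handy if `zip` isn't installed. This is a sketch rather than a drop-in replacement: unlike the `*.*` pattern above, it also includes files that have no dot in their name, and it skips sub-folders entirely. The function name `zip_checkpoint_files` is my own.

```python
import zipfile
from pathlib import Path

def zip_checkpoint_files(checkpoint_dir, zip_path):
    """Zip the files directly inside checkpoint_dir (sub-folders are skipped)."""
    checkpoint_dir = Path(checkpoint_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(checkpoint_dir.iterdir()):
            if path.is_file():
                # Store as "<folder name>/<file name>" so unzipping recreates the folder
                zf.write(path, arcname=f"{checkpoint_dir.name}/{path.name}")
    return zip_path
```

For example: `zip_checkpoint_files('checkpoints_geoA_resnet152_densenet56', 'checkpoints_geoA_resnet152_densenet56.zip')`.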

If you like, you could check the file size of the zipped folders (`du -h` prints it in human-readable units):

In [0]:
! du -h checkpoints_geoB_resnet152_densenet103.zip
! du -h checkpoints_geoA_resnet152_densenet56.zip

Now that we have zipped folders containing the checkpoints, we can download them for further use, on- or off-line.

You can download each one straight from the **Files** browser on the left: right-click the zip file and select **Download**

### Downloading and using a checkpoints file saved on Google Drive

The following workflow would enable you to download that zipped checkpoint file from Google Drive. First we'll define some functions for downloading (the details are not too important)

[source for following code](https://stackoverflow.com/questions/38511444/python-download-files-from-google-drive-using-url)

In [0]:
import requests

def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"

    session = requests.Session()

    # first request: the reply may carry a confirmation cookie rather than the file
    response = session.get(URL, params = { 'id' : id }, stream = True)
    token = get_confirm_token(response)

    # if a confirmation token was issued, repeat the request with it
    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)

    save_response_content(response, destination)

def get_confirm_token(response):
    # the token is the value of the cookie whose name starts with 'download_warning'
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value

    return None

def save_response_content(response, destination):
    # write the response to disk in chunks, so the whole file is never held in memory
    CHUNK_SIZE = 32768

    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)

The following code will download a zipped file containing checkpoint files from my Google Drive. The cells after it check the file size and unzip it into a folder

In [0]:
import os
import requests
import zipfile

# make the checkpoints folder if it doesn't already exist
os.makedirs(os.path.join(os.getcwd(), 'checkpoints'), exist_ok=True)

url = 'https://drive.google.com/file/d/1mS-ODCEGVct3QSEVkZsyNHQkO7SOa76k/view?usp=sharing'
file_id = '1mS-ODCEGVct3QSEVkZsyNHQkO7SOa76k'  

destination = 'csdms_checkpoints_geoA_resnet152_densenet103_200.zip'
download_file_from_google_drive(file_id, destination)
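
Notice that the file ID is just the long string in the middle of the share link. If you have your own link, a small helper can pull it out for you. This is a minimal sketch, assuming the link has the `/file/d/<id>/` form shown above; the function name `drive_file_id` is my own, and other link formats would need a different pattern.

```python
import re

def drive_file_id(share_url):
    """Return the file ID embedded in a Google Drive '/file/d/<id>/' share link."""
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", share_url)
    if match is None:
        raise ValueError(f"no file ID found in {share_url!r}")
    return match.group(1)

url = 'https://drive.google.com/file/d/1mS-ODCEGVct3QSEVkZsyNHQkO7SOa76k/view?usp=sharing'
file_id = drive_file_id(url)
```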

Check on the size of the zipped file

In [0]:
#! stat --printf="%s" csdms_checkpoints_geoA_resnet152_densenet103_200.zip
! du -h csdms_checkpoints_geoA_resnet152_densenet103_200.zip

Then unzip it

In [0]:
! unzip csdms_checkpoints_geoA_resnet152_densenet103_200.zip

If you're on a Windows machine, you could uncomment and run the following Python code instead (note that it also deletes the zipped file afterwards)

In [0]:
#print('unzipping')	
#zip_ref = zipfile.ZipFile(destination, 'r')
#zip_ref.extractall(os.getcwd())
#zip_ref.close()
#os.remove(destination)

Double check the contents:

In [0]:
! ls checkpoints/

## Model training -- this time for real (with my pre-prepared checkpoints)

### GeomorphA

This model checkpoint has been trained for 200 epochs, but let's train for another 5 and look at the results

**6-7 mins**


In [0]:
! python3 train.py --num_epochs 5 --dataset GeomorphA --continue_training True --num_val_images $nval --batch_size $B --frontend ResNet152 --model FC-DenseNet103

You'll see after the first epoch that validation metrics have improved considerably, but that the model is still imperfect

While that's running, let's discuss strategies that we might adopt to improve overall and specific class accuracies

### The dataset

* There is error in the label data

* Should there be other categories?

* Debris and sand both score relatively poorly -- are these categories too broad?


The **lumping or splitting?** conundrum: often the hardest thing to get right with an image classification task



Some insight into this may be obtained when we examine the results of **GeomorphB**, which has split sand and debris categories
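
In code, "lumping" is just a lookup from fine classes to coarse ones. The sketch below is purely illustrative: the class lists, their order, and the grouping are my guesses, not the real **GeomorphA** or **GeomorphB** definitions, and it assumes labels are stored as integer class indices (color-coded label images would need converting first).

```python
import numpy as np

# Hypothetical class lists and grouping, for illustration only
FINE = ["water", "sand", "wetsand", "submerged_sand", "debrisfan", "muddy_debris"]
COARSE = ["water", "sand", "debris"]
LUMP = {
    "water": "water",
    "sand": "sand", "wetsand": "sand", "submerged_sand": "sand",
    "debrisfan": "debris", "muddy_debris": "debris",
}

# Lookup table: index of fine class -> index of coarse class
lut = np.array([COARSE.index(LUMP[name]) for name in FINE])

def lump_labels(fine_label_image):
    """Map an integer image of fine class indices to coarse class indices."""
    return lut[fine_label_image]
```

Going the other way (splitting) can't be automated like this, because it needs new annotations. That is one reason to label finely at the start and lump later if needed.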

First, let's rename the checkpoints folder, then take a look at some examples 

In [0]:
! mv checkpoints/ checkpoints_geoA_resnet152_densenet103_205epochs/

In [0]:
! ls checkpoints_geoA_resnet152_densenet103_205epochs/0004
! cat checkpoints_geoA_resnet152_densenet103_205epochs/0004/val_scores.csv

Overall some categories are predicted really well:
* Water ~95%
* Vegetation > 90%
* Boats/umbrellas > 90%

Others, not so good:
* Bedrock + talus > 70%
* Sand ~ 50%
* Debris ~ 40%


In [0]:
! cp checkpoints_geoA_resnet152_densenet103_205epochs/0004/D_tile_3584_2048_gt.png geoA_ex1_gt.png
! cp checkpoints_geoA_resnet152_densenet103_205epochs/0004/D_tile_3584_2048_pred.png geoA_ex1_pred.png
! cp GeomorphA/val/D_tile_3584_2048.jpg geo_ex1.jpg

! cp checkpoints_geoA_resnet152_densenet103_205epochs/0004/A_tile_2560_15872_gt.png geoA_ex2_gt.png
! cp checkpoints_geoA_resnet152_densenet103_205epochs/0004/A_tile_2560_15872_pred.png geoA_ex2_pred.png
! cp GeomorphA/val/A_tile_2560_15872.jpg geo_ex2.jpg

! cp checkpoints_geoA_resnet152_densenet103_205epochs/0004/A_tile_5632_7680_gt.png geoA_ex3_gt.png
! cp checkpoints_geoA_resnet152_densenet103_205epochs/0004/A_tile_5632_7680_pred.png geoA_ex3_pred.png
! cp GeomorphA/val/A_tile_5632_7680.jpg geo_ex3.jpg

### GeomorphB

In [0]:
import os

# make the checkpoints folder if it doesn't already exist
os.makedirs(os.path.join(os.getcwd(), 'checkpoints'), exist_ok=True)

url = 'https://drive.google.com/file/d/1oq7WYItW38uZL4es1FRXdGq7vE_0tXu1/view?usp=sharing'
file_id = '1oq7WYItW38uZL4es1FRXdGq7vE_0tXu1'  

destination = 'csdms_checkpoints_geoB_resnet152_densenet103_200.zip'
download_file_from_google_drive(file_id, destination)

In [0]:
! unzip csdms_checkpoints_geoB_resnet152_densenet103_200.zip
! ls checkpoints/

This model checkpoint has been trained for 200 epochs, but let's train for another 5 and look at the results

**9 mins**

In [0]:
! python3 train.py --num_epochs 5 --dataset GeomorphB --continue_training True --num_val_images $nval --batch_size $B --frontend ResNet152 --model FC-DenseNet103

### About the model training

I didn't test all the models and feature extractors. There could be better models. 

Similarly, image augmentation could be explored

## Using image augmentation

Deep learning models usually need a lot of data to be properly trained. It is often useful to generate more data from the existing data using augmentation techniques. The main ones are:

* Flip (horizontal or vertical). Both are implemented in the training script used here (`--h_flip`, `--v_flip`)

* Rotation. 

* Brightness.  

Other codes might offer other augmentation tricks such as:

* Information loss
* Random cropping (subsets of images)
* Contrast
* Random zoom
* etc
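
For segmentation there is one catch: geometric changes (flips, rotations, crops) must be applied identically to the image and its label image, whereas brightness changes apply to the image only. Here is a minimal NumPy sketch of that idea. It is not the code inside `train.py`; the function name `augment_pair` is my own, the brightness factor range is an assumption about what a value like `0.1` might mean, and rotation is left out for brevity.

```python
import numpy as np

def augment_pair(image, label, rng, h_flip=True, v_flip=True, brightness=0.1):
    """Randomly flip an image/label pair together and jitter image brightness.

    image: (H, W, 3) float array in [0, 1]; label: (H, W) integer array.
    rng: a numpy.random.Generator, e.g. np.random.default_rng(0).
    """
    if h_flip and rng.random() < 0.5:
        image, label = image[:, ::-1], label[:, ::-1]  # mirror left-right
    if v_flip and rng.random() < 0.5:
        image, label = image[::-1], label[::-1]  # mirror top-bottom
    if brightness:
        # Scale intensities by a random factor in [1 - brightness, 1 + brightness];
        # the label is untouched because brightness does not move any pixels
        factor = rng.uniform(1.0 - brightness, 1.0 + brightness)
        image = np.clip(image * factor, 0.0, 1.0)
    return image, label
```

Calling this once per training sample per epoch means the model rarely sees exactly the same image twice, which is the whole point of augmentation.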

In your own time, uncomment and run the following code to see if validation results improve: 

In [0]:
##! python3 train.py --num_epochs 10 --dataset Geomorph --num_val_images 14 --batch_size 1 --continue_training True --frontend ResNet152 --model FC-DenseNet103 --h_flip True --v_flip True --brightness 0.1 --rotation 10

Let's rename the folder and check the contents of the last validation set, just like we did before:

In [0]:
! mv checkpoints/ checkpoints_geoB_resnet152_densenet103_205epochs/
! ls checkpoints_geoB_resnet152_densenet103_205epochs/0004
! cat checkpoints_geoB_resnet152_densenet103_205epochs/0004/val_scores.csv

Then, for comparison, we'll take a look at the same 3 validation images

In [0]:
! cp checkpoints_geoB_resnet152_densenet103_205epochs/0004/D_tile_3584_2048_gt.png geoB_ex1_gt.png
! cp checkpoints_geoB_resnet152_densenet103_205epochs/0004/D_tile_3584_2048_pred.png geoB_ex1_pred.png

! cp checkpoints_geoB_resnet152_densenet103_205epochs/0004/A_tile_2560_15872_gt.png geoB_ex2_gt.png
! cp checkpoints_geoB_resnet152_densenet103_205epochs/0004/A_tile_2560_15872_pred.png geoB_ex2_pred.png

! cp checkpoints_geoB_resnet152_densenet103_205epochs/0004/A_tile_5632_7680_gt.png geoB_ex3_gt.png
! cp checkpoints_geoB_resnet152_densenet103_205epochs/0004/A_tile_5632_7680_pred.png geoB_ex3_pred.png

Some **GeomorphB** average accuracies were very similar to those of **GeomorphA**, for example:

* water ~ 0.95
* bedrocktalus ~ 0.7
* veg ~ 0.9
* other ~ 0.9

Values for sand were generally not good:

* sand ~ 0.4
* wetsand ~ 0.4
* submerged_sand ~ 0.5

* Average debrisfan accuracy was marginally better ~ 0.45 

* But muddy debris accuracy was poor at ~ 0.3


DL workflows are scalable and should keep getting better with more and more data. 

![](https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uploads/2016/08/Why-Deep-Learning-1024x742.png)

[source: Andrew Ng](https://www.slideshare.net/ExtractConf)

But how much is enough? Clearly in this case we need a bigger data set


## GeomorphB loss, accuracy and IoU curves

![](https://github.com/dbuscombe-usgs/cdi_dl_workshop/raw/67e8c84d0e0b89b024814ef2e8f3bed091ee0c4e/Day2/figs/Picture1.png)

#### Loss vs Epochs            

![alt text-1](https://user-images.githubusercontent.com/3596509/57955589-351aa900-78ee-11e9-8944-ff7769a1b645.png)


#### Accuracy vs Epochs            

![alt text-2](https://user-images.githubusercontent.com/3596509/57955613-419f0180-78ee-11e9-80f4-5ada1fc346e8.png)

#### IoU vs Epochs            

![alt text-1](https://user-images.githubusercontent.com/3596509/57955618-4368c500-78ee-11e9-8131-45cb5a594b6d.png)  
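
If you want to compute scores like these yourself from a ground truth ("gt") and prediction ("pred") label pair, here is a minimal sketch. It uses one common definition of per-class accuracy (the fraction of a class's true pixels that were predicted correctly) and the usual intersection-over-union. How the training script defines its own metrics isn't shown here, so treat the numbers as comparable only in spirit. It also assumes the labels are integer class indices, and the function name `per_class_scores` is my own.

```python
import numpy as np

def per_class_scores(gt, pred, n_classes):
    """Return per-class accuracy and IoU for two integer label images.

    Classes that never occur are reported as NaN rather than 0.
    """
    acc = np.full(n_classes, np.nan)
    iou = np.full(n_classes, np.nan)
    for c in range(n_classes):
        gt_c, pred_c = gt == c, pred == c
        inter = np.logical_and(gt_c, pred_c).sum()
        union = np.logical_or(gt_c, pred_c).sum()
        if gt_c.sum() > 0:
            acc[c] = inter / gt_c.sum()  # fraction of true class-c pixels recovered
        if union > 0:
            iou[c] = inter / union  # intersection over union
    return acc, iou
```

IoU is the stricter of the two, because it also penalizes pixels that were wrongly assigned to a class. That is why IoU curves typically sit below accuracy curves.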





### General limitations

* It can be very difficult to interpret a model produced with deep learning. Such models may have many layers and thousands of nodes; interpreting each of these individually is impossible

* We therefore evaluate deep learning models by measuring how well they predict, treating the architecture itself as a "black box"

* DL models require a great deal of computing power to build. For simpler problems with small data sets, deep learning may not produce sufficient added benefit over simpler methods to justify the cost and time.

* DL models are very data hungry


### Deep learning in the geosciences


* It is very early days in the development and understanding of the role of DL models in the geosciences

* Many claims about the efficacy of DCNNs for image classification are largely based upon analyses of conventional photographic imagery of familiar, mostly anthropogenic objects. Much more work is required on the classification of natural textures and objects.

* We need to build our own **shared databases**

* We need robust **benchmarks**

* We need guidance on **best practices**

Finally ...

> "There is an ongoing misconception that AI/ML are intrinsically valuable, and that therefore working in the field is bound to make you rich. A ML model is only as valuable as the problem it solves. ML without an application isn't worth anything (beyond intellectual curiosity)." Francois Chollet (author of Keras). [source](https://twitter.com/fchollet/status/1130000985466626048)


