# Machine Learning Tutorial

## Oceanhackweek 2019

In recent years we have heard many success stories about applications of Machine Learning to problems across domains. However, machine learning is not new: it has a long history intertwined with other fields such as Statistics, Data Mining, Pattern Recognition, Artificial Intelligence, Deep Learning, and Data Science.

So let's spend a few minutes and compare our own understanding of what all these terms mean to us.

### Activity 1:

Discuss with your neighbor and provide your own definition of these terms in this etherpad:

[https://etherpad.net/p/Oceanhackweek2019](https://etherpad.net/p/Oceanhackweek2019)

![](img/Terms.png)

**Some History**:

Statistics: [The Lady Tasting Tea]()

Machine Learning: [Statistical Modeling: The Two Cultures]()

Artificial Intelligence: [A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence]()

Data Science: [An Action Plan for Expanding the Technical Areas of the Field of Statistics]()


**Focus on the problems not the names!**

Machine learning is about machines (i.e. algorithms) that learn from data.

**Machine Learning Categories**:
* Supervised Learning
* Unsupervised Learning
* Semi-supervised Learning
* Reinforcement Learning

### Some popular Machine Learning Tasks 
(and a roadmap for solving them, Andreas Mueller '13)

![](img/MLmap.png)

![](img/Classification.png)

---

![](img/Regression.png)

---

![](img/Clustering.png)

---

![](img/DimensionalityReduction.png)

---

![](img/ReinforcementLearning.png)

**Regression:** each observation has a corresponding output value. We want to learn a mapping that predicts the output values of future observations for which they are not known.
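As a minimal sketch of the regression setting, the snippet below fits a straight line to synthetic observations with NumPy's least-squares solver and then predicts outputs for new inputs. The data, the choice of a linear model, and all variable names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic observations (assumed for illustration): y = 2x + 1 plus small noise
x_train = np.linspace(0.0, 10.0, 50)
y_train = 2.0 * x_train + 1.0 + rng.normal(scale=0.1, size=x_train.size)

# Design matrix with a column of ones for the intercept
A = np.column_stack([x_train, np.ones_like(x_train)])

# Least-squares estimate of slope and intercept
(slope, intercept), *_ = np.linalg.lstsq(A, y_train, rcond=None)

# Predict output values for new observations that have no output yet
x_new = np.array([11.0, 12.0])
y_pred = slope * x_new + intercept
print(slope, intercept, y_pred)
```

The learned slope and intercept should land close to the values used to generate the data, which is what "learning a mapping" means in this simple case.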

**Clustering:** combine the observations into several groups according to some measure of 'closeness'.
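One possible way to make 'closeness' concrete is Euclidean distance to a group center, which leads to the k-means algorithm. The sketch below is a bare-bones NumPy version run on synthetic data; the function name `kmeans`, its parameters, and the data are assumptions for illustration, not part of the tutorial material.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Basic k-means (Lloyd's algorithm) with Euclidean distance as 'closeness'."""
    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every observation to every center: shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its group (keep it if the group is empty)
        centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels, centers

# Two well-separated synthetic groups (assumed data)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(30, 2)),
               rng.normal(5.0, 0.5, size=(30, 2))])
labels, centers = kmeans(X, k=2)
print(labels)
```

Note that no labels are used at any point: the groups come only from the geometry of the observations.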

**Dimensionality Reduction:** embed the observations into a lower-dimensional space while preserving some measure of 'closeness'.
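Principal Component Analysis (PCA) is one common instance of this idea: it finds the directions of largest variance and projects the observations onto them. The following sketch computes PCA with a singular value decomposition in NumPy; the helper name `pca` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def pca(X, n_components):
    """Project observations (rows of X) onto the top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance_ratio = (S**2 / np.sum(S**2))[:n_components]
    return X_centered @ components.T, components, explained_variance_ratio

# Synthetic data (assumed): 200 observations in 3-D that vary mostly along one direction
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.outer(t, [1.0, 2.0, -1.0]) + 0.05 * rng.normal(size=(200, 3))

Z, components, ratio = pca(X, n_components=1)
print(Z.shape, ratio)
```

Here a single component should retain nearly all of the variance, so the 3-D observations can be described by one number each with little loss.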

## Examples of Applications in Oceanography

**Classification:** Quality Control with Machine Learning

Detect outliers in hydrographic profiles.

* Annotations of outliers

![](img/Labels.png)

* Set of Features

![](img/Features.png)

Guilherme P. Castelão, *A Flexible System for Automatic Quality Control of Oceanographic Data*, [https://arxiv.org/pdf/1503.02714.pdf](https://arxiv.org/pdf/1503.02714.pdf)


**Regression:** Sea Surface Temperature Interpolation with Gaussian Process regression.

Argo Raw Data

<img src="https://raw.githubusercontent.com/oceanhackweek/ohw19-tutorial-machine-learning/master/img/Argo_raw.png" width="600">

Argo Predictions

![Argo Predictions](img/Argo_Predictions.png)



Mikael Kuusela and Michael L. Stein, Locally stationary spatio-temporal interpolation of Argo profiling float data [https://arxiv.org/abs/1711.00460](https://arxiv.org/abs/1711.00460)
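To give a feel for how Gaussian Process regression interpolates sparse observations, here is a small one-dimensional sketch in NumPy. It uses a plain stationary squared-exponential kernel with hand-picked hyperparameters, which is an assumption made for brevity; it is not the locally stationary spatio-temporal model of the paper cited above, and the data are synthetic.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_predict(x_train, y_train, x_test, noise=1e-2, **kernel_args):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train, **kernel_args) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, **kernel_args)
    K_ss = rbf_kernel(x_test, x_test, **kernel_args)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, np.diag(cov)

# Sparse, irregularly spaced synthetic "observations" (assumed) of a smooth signal
rng = np.random.default_rng(0)
x_obs = np.sort(rng.uniform(0.0, 10.0, size=25))
y_obs = np.sin(x_obs) + 0.05 * rng.normal(size=x_obs.size)

# Interpolate onto a regular grid, with an uncertainty estimate at every grid point
x_grid = np.linspace(0.0, 10.0, 101)
mean, var = gp_predict(x_obs, y_obs, x_grid, noise=0.05**2, length_scale=1.0)
print(mean[:5], var[:5])
```

Besides the interpolated values, the method returns a predictive variance, which grows in regions far from any observation.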

**Dimensionality Reduction**: PCA of Sea Surface Temperature
    
![](img/ElNinoEOF.png)

**Clustering:**

Marine Bacteria Gene Clustering 

![](img/gene_clustering.png)



Leão T, Castelão G, Korobeynikov A, Monroe EA, Podell S, Glukhov E, Allen EE, Gerwick WH, Gerwick L. 2017. Comparative genomics uncovers the prolific and distinctive metabolic potential of the cyanobacterial genus Moorea. Proc Natl Acad Sci U S A 114:3198–3203. doi:10.1073/pnas.1618556114.

### Activity 2:
Go to the [etherpad](https://etherpad.net/p/Oceanhackweek2019) and try to provide one example of the above problems in oceanography.

### Oceanography Data Comes in all Sizes and Shapes

Oceans can be observed through multiple sensors at many different scales. Oceans are complex ecosystems, which results in an abundance of intercorrelated measurements. Many machine learning techniques assume that the data come in a specific format and satisfy explicit independence constraints. To apply standard techniques and extract meaningful information, researchers often need to transform the data into a standard format that existing libraries can process.

<img src = "https://raw.githubusercontent.com/oceanhackweek/ohw19-tutorial-machine-learning/master/img/ocean_data.png" width="600">

### Feature Extraction

Many of the existing machine learning libraries assume observations are represented by a set of features and can be stored in a table format. But oceanography data often has time, spatial, frequency, channel or other dimensions. 


<img src="https://raw.githubusercontent.com/oceanhackweek/ohw19-tutorial-machine-learning/master/img/feature_table.png" width=300>       <img src="img/dataset-diagram.png" width=300>

[Image Source](https://www.superhumaninvesting.com/understand-machine-learning-five-minutes/)

This requires either extracting simple features from the complex data or using machine learning libraries that can handle complex relationships.
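The first option can be sketched with NumPy on a hypothetical gridded dataset with dimensions (time, lat, lon). The array `sst`, its sizes, and the chosen summary features are all assumptions for illustration; real data would more likely be loaded from a file than generated randomly.

```python
import numpy as np

# Hypothetical gridded dataset with dimensions (time, lat, lon)
rng = np.random.default_rng(0)
n_time, n_lat, n_lon = 120, 10, 20
sst = rng.normal(loc=15.0, scale=2.0, size=(n_time, n_lat, n_lon))

# Option 1: every time step is one observation, every grid cell is one feature
table = sst.reshape(n_time, n_lat * n_lon)            # shape (120, 200)

# Option 2: summarize each time step with a few hand-picked features
features = np.column_stack([
    sst.mean(axis=(1, 2)),                            # spatial mean
    sst.std(axis=(1, 2)),                             # spatial variability
    sst.max(axis=(1, 2)) - sst.min(axis=(1, 2)),      # spatial range
])                                                    # shape (120, 3)
print(table.shape, features.shape)
```

Both results are plain two-dimensional tables (observations by features), which is the layout most general-purpose machine learning libraries expect. Flattening discards the information about which grid cells are neighbors, so the choice between the two options depends on the problem.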


### What about Deep Learning?

Can't deep learning solve all problems?

![](img/ML_DL.png)

Deep learning is the methodology of using deep neural networks to solve machine learning problems.

![](img/History_DL.png)

Li Deng and Dong Yu, *Deep Learning: Methods and Applications*, Foundations and Trends in Signal Processing, 7 (3-4), 197-387, 2014.

#### Shallow vs Deep Learning

![a](img/LogisticRegressionNN.png)

Logistic Regression is a single layer neural network.
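To make this connection concrete, the sketch below writes logistic regression as one linear layer followed by a sigmoid activation and trains it with plain gradient descent in NumPy. The synthetic data, learning rate, and number of steps are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.1, n_steps=2000):
    """Gradient descent on the mean cross-entropy loss of a single sigmoid unit."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_steps):
        p = sigmoid(X @ w + b)        # forward pass: one linear layer + sigmoid
        grad_z = (p - y) / len(y)     # gradient of the loss w.r.t. the pre-activation
        w -= lr * (X.T @ grad_z)
        b -= lr * grad_z.sum()
    return w, b

# Two well-separated synthetic classes (assumed data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(2.0, 1.0, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = train_logistic(X, y)
accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(accuracy)
```

A deep network stacks several such layers, each with its own weights and nonlinearity, and is trained with the same kind of gradient updates.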

Image Source: http://samuelhermanblog.blogspot.com/2017/01/deep-learning-part-1-logistic.html


Fully connected networks have too many parameters:

$$\# \textrm{ parameters for two fully connected layers} \approx \# \textrm{ neurons in layer}_1 \times \# \textrm{ neurons in layer}_2 + \# \textrm{ neurons in layer}_2 $$

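    # 5 fully connected layers, each mapping 100 inputs to 100 neurons:
    # (100*100 weights + 100 biases) per layer
    (100*100 + 100)*5

Output:

    50500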
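The same count can be generalized to arbitrary layer sizes. The helper below is a hypothetical convenience function (its name and argument convention are assumptions) that sums weights and biases over consecutive pairs of layers.

```python
def count_dense_parameters(layer_sizes):
    """Weights plus biases of a fully connected network.

    layer_sizes lists the input size followed by the number of
    neurons in each layer, e.g. [100, 100, 10].
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# 100 inputs followed by 5 layers of 100 neurons each
print(count_dense_parameters([100] * 6))           # 50500

# A flattened 256 x 256 image fed into one layer of 1000 neurons
print(count_dense_parameters([256 * 256, 1000]))   # 65537000
```

The second call shows how quickly the count grows once the input is an image or a gridded field rather than a short feature vector.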

With enough parameters, a fully connected network can perfectly predict the labels of the data it was trained on.

### Bias-Variance Trade-off

![](img/target_bias_variance_tradeoff.png)

![](img/bias_variance_tradeoff.png)

Image Source: https://djsaunde.wordpress.com/2017/07/17/the-bias-variance-tradeoff/
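The trade-off can be explored with a small experiment: fit polynomials of increasing degree to a few noisy samples and compare the error on the training data with the error on held-out data. Everything in this sketch (the sine signal, the noise level, the chosen degrees) is an assumption for illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)

def true_signal(x):
    return np.sin(2 * np.pi * x)

# Small noisy training set and a larger held-out test set (both synthetic)
x_train = np.linspace(0.0, 1.0, 15)
y_train = true_signal(x_train) + 0.2 * rng.normal(size=x_train.size)
x_test = np.linspace(0.0, 1.0, 200)
y_test = true_signal(x_test) + 0.2 * rng.normal(size=x_test.size)

errors = {}
for degree in (1, 3, 9):
    coeffs = P.polyfit(x_train, y_train, degree)
    train_mse = np.mean((P.polyval(x_train, coeffs) - y_train) ** 2)
    test_mse = np.mean((P.polyval(x_test, coeffs) - y_test) ** 2)
    errors[degree] = (train_mse, test_mse)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

The training error can only decrease as the degree grows, because each model contains the previous one. The straight line is too rigid to follow the signal (high bias), while a high-degree polynomial has enough freedom to start fitting the noise (high variance), which typically shows up as a gap between training and test error.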

The most useful neural network architectures usually have a more complex structure, which aims to:

* reduce the number of model parameters
* capture repeatable patterns in the data

![](img/NeuralNetworkZo19High.png)

**Popular Architectures:**

* Convolutional Neural Networks: images, spatial data, localized temporal data
    
* Long Short Term Memory Networks (LSTM): temporal data with longer term dependencies
    
* Autoencoders: large data => dimensionality reduction 
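Convolutional layers illustrate both goals: a small filter is shared across all positions, which cuts the parameter count and lets the same pattern be detected wherever it occurs. The sketch below compares parameter counts for a hypothetical 64 x 64 single-channel image and then applies one filter to a 1-D signal with NumPy; the helper names and sizes are assumptions for illustration.

```python
import numpy as np

def dense_parameters(n_inputs, n_outputs):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_outputs + n_outputs

def conv2d_parameters(kernel_size, in_channels, out_channels):
    """Weights plus biases of one 2-D convolutional layer with square kernels."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

# Hypothetical 64 x 64 single-channel image mapped to 16 feature maps of the same size
print(dense_parameters(64 * 64, 16 * 64 * 64))   # every pixel connected to every output
print(conv2d_parameters(3, 1, 16))               # 16 shared 3 x 3 filters: 160

# Weight sharing in action: one small filter responds to the same pattern
# wherever it occurs in a 1-D signal.
signal = np.zeros(20)
signal[3:6] = [1.0, 2.0, 1.0]
signal[12:15] = [1.0, 2.0, 1.0]
kernel = np.array([1.0, 2.0, 1.0])
response = np.correlate(signal, kernel, mode="valid")
matches = np.flatnonzero(response == response.max())
print(matches)                                   # start indices 3 and 12
```

The fully connected version needs hundreds of millions of parameters for the same input and output sizes, while the convolutional layer needs 160.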

**Examples in Oceanography:**
* finding patterns in multidimensional data: [plankton classification](http://benanne.github.io/2015/03/17/plankton.html), [whale call detection](https://docs.meridian.cs.dal.ca/ketos/tutorials/index.html)
* time series forecasting: [predicting El Niño Southern Oscillation](https://developers.arcgis.com/python/sample-notebooks/predicting-enso/)
* speedup of model simulation: [predicting ocean waves](https://www.ibm.com/blogs/research/2017/09/deep-learning-forecast-ocean-waves/)
* missing data generation, super-resolution: [coarser grid weather models](https://ai.googleblog.com/2019/07/learning-better-simulation-methods-for.html)