# Machine learning

# What is it?

- Machine learning (ML) is a technique which uses computers to discover patterns or information in your data.
- It is a part of the wider topic of *artificial intelligence*
- There are lots of different types of machine learning

# Examples of machine learning

- The simplest ML algorithm is arguably *linear regression*. It automatically calculates, from your data, the parameters $m$ and $c$ of the straight line $y = mx + c$ that best fits it.
- A more advanced technique is *K-means clustering*. It is a way of finding clusters of points in your data without having to provide any explicit labels.
- The most famous are *neural networks* (NN), which were inspired by the brain and use a directed network of connected neurons to describe features of the data set.
    - More recently (since about 2010), *deep neural networks* (DNN) have become practical, allowing more detailed models of data to be learned and starting the modern buzz around *deep learning*.

# What are neural networks?

Neural networks are collections of artificial neurons connected together, so it's best to start by learning about neurons.

In nature, a neuron is a cell which has an electrical connection to other neurons. If a charge is felt from 'enough' of the input neurons then the neuron fires and passes a charge to its output.

An artificial neuron has multiple inputs and can pass its output to multiple other neurons.

A neuron will calculate its value, $p = \sum_i{x_iw_i}$ where $x_i$ is the input value and $w_i$ is a weight assigned to that connection. This $p$ is then passed through some *activation function* to determine the output of the neuron.

<img src="neuron.png" alt="An artificial neuron" style="width: 200px; margin:0 auto;"/>
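A single neuron can be sketched in a few lines of Python. The slides don't name a specific activation function, so the `step` and `sigmoid` functions below are example choices: `step` mimics a biological neuron that either fires or doesn't, while `sigmoid` is a smooth version of the same idea. The input values and weights are hypothetical.

```python
import math

def neuron(inputs, weights, activation):
    """Weighted sum of the inputs, p = sum_i x_i * w_i, passed through an activation."""
    p = sum(x * w for x, w in zip(inputs, weights))
    return activation(p)

def step(p):
    """Fires (outputs 1) only if the weighted input is large enough."""
    return 1.0 if p > 0 else 0.0

def sigmoid(p):
    """A smooth version of the step: squashes p into the range (0, 1)."""
    return 1 / (1 + math.exp(-p))

x = [1.0, 0.5, -1.0]   # hypothetical input values
w = [0.8, -0.4, 0.3]   # hypothetical weights
print(neuron(x, w, step))     # p is about 0.3, so the neuron fires: 1.0
print(neuron(x, w, sigmoid))  # about 0.574
```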

## Networks

The inputs to each neuron either come from the outputs of other neurons or are explicit inputs from the user. This allows you to connect together a large network of neurons:

<img src="network.png" alt="An artificial neural network" style="width: 400px; margin:0 auto;"/>

In this network every neuron on one layer is connected to every neuron on the next. Every arrow in the diagram has a weight assigned to it.

You input values on the left-hand side of the network, and the data flows through the network from layer to layer until the output layer has a value.
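Because every neuron in one layer is connected to every neuron in the next, each layer's weights fit naturally in a matrix, and the flow from left to right becomes a sequence of matrix-vector products. The sketch below assumes NumPy, a sigmoid activation and a hypothetical 3-4-2 network with random weights; none of these come from the diagram.

```python
import numpy as np

def sigmoid(p):
    return 1 / (1 + np.exp(-p))

def forward(x, layers):
    """Pass the input x through each layer in turn.

    layers[k][j, i] is the weight on the arrow from neuron i in one layer
    to neuron j in the next layer.
    """
    a = x
    for W in layers:
        a = sigmoid(W @ a)   # each neuron: weighted sum of all inputs, then activation
    return a

rng = np.random.default_rng(0)
# Hypothetical shape: 3 inputs -> 4 hidden neurons -> 2 outputs
layers = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
y = forward(np.array([0.5, -1.0, 2.0]), layers)
print(y)   # the values of the 2 output neurons
```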

## What shape should the network be?

There is some art and some science to deciding the shape of a network. There are rules of thumb (for example, that a hidden layer should be of similar size to the input and output layers), but this is one of the things you need to experiment with to see how it affects performance.

The number of hidden layers relates to the level of abstraction you are looking at. Generally, more complex problems need more hidden layers (i.e. deeper networks) but this makes training harder.

## How are the weights calculated?

The calculation of the weights in a network is done through a process called *training*. This generally uses lots of data examples to iteratively work out good values for the weights.

# How do you train neural networks?

The main method by which NNs are trained is a technique called *backpropagation*.

In order to train your network you need a few things:
 - A labelled training data set
 - A labelled test (or evaluation) data set
 - A set of initial weights

## Initial weights

The weights to start with are easy: just set them randomly!

## Training and testing data sets

You will need two data sets. One will be used by the learning algorithm to train the network and the other will be used to report on the quality of the training at the end.

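It is important that these data sets are disjoint. If the network were tested on examples it had already seen during training, it could score well just by memorising them, and *overfitting* would go unnoticed.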

It is common to start with one large set of data that you want to learn about and to split it into 80% training data set and 20% test data set.
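An 80/20 split can be sketched with the standard library alone. The function name `train_test_split` here is just an illustrative helper (libraries such as scikit-learn provide their own versions). Shuffling first is a design choice that matters: if the data are sorted, for example by species, taking the first 80% would leave whole classes out of the training set.

```python
import random

def train_test_split(examples, train_fraction=0.8, seed=0):
    """Shuffle the examples, then split them into disjoint training and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)   # avoid any ordering bias in the original data
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# For a data set of 150 examples this gives 120 for training and 30 for testing
train, test = train_test_split(range(150))
print(len(train), len(test))
```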

## Backpropagation ("the backward propagation of errors")

Once you have your network structure, your initial weights and your training data set, you can start training.

There have been many algorithms for doing this over the last several decades, but currently the most popular is *backpropagation*.

The key quantity is the derivative of the network's output with respect to each weight, $D_n = \frac{\partial y}{\partial w_n}$. It tells you how much, and in which direction, changing each weight moves the output. Backpropagation gets its name from how it computes these derivatives: it applies the chain rule layer by layer, working backwards from the output towards the inputs.

Then for each training entry:
 - pass it through the network to find the output $y$ and the derivatives $D_n$
 - compare $y$ with the expected true output $t$ to calculate the error $\epsilon = t - y$
 - tweak each weight by $\Delta w_n = R \, \epsilon \, D_n$, where $R$ is the *learning rate*; this moves each weight downhill on the squared error $\frac{1}{2}\epsilon^2$
 
This means that the more wrong the network's output is, the larger the change to the weights. As training proceeds over many examples, the errors shrink, the changes get smaller and the network *converges*.

<img src="backprop1.png" alt="Back propogation example" style="width: 100%; margin:0 auto;"/>
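Here is a minimal NumPy sketch of the training loop for a tiny network with one hidden layer. For each example it computes the output $y$, the error $\epsilon = t - y$, the derivatives $\partial y / \partial w$ by the chain rule, and updates each weight by $R \, \epsilon \, \partial y / \partial w$. The toy data set, network size (2 inputs, 4 `tanh` hidden neurons, 1 linear output), learning rate and number of passes are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set (hypothetical): the true output t is a simple function of two inputs
X = rng.uniform(-1, 1, size=(200, 2))
T = 0.5 * X[:, 0] - 0.3 * X[:, 1]

# Network: 2 inputs -> 4 tanh hidden neurons -> 1 linear output, random initial weights
W1 = rng.normal(scale=0.5, size=(4, 2))   # input -> hidden weights
w2 = rng.normal(scale=0.5, size=4)        # hidden -> output weights
R = 0.05                                   # learning rate

def forward(x):
    h = np.tanh(W1 @ x)   # hidden-layer outputs
    return h, w2 @ h      # (hidden outputs, network output y)

def mean_squared_error():
    return np.mean([(t - forward(x)[1]) ** 2 for x, t in zip(X, T)])

initial_error = mean_squared_error()
for epoch in range(20):
    for x, t in zip(X, T):
        h, y = forward(x)
        eps = t - y                                   # error for this example
        # Backward pass: chain rule from the output back to each weight
        dy_dw2 = h                                    # dy/dw2_j = h_j
        dy_dW1 = np.outer(w2 * (1 - h ** 2), x)       # dy/dW1_ji = w2_j * tanh'(.) * x_i
        # Update every weight by R * eps * dy/dw
        w2 += R * eps * dy_dw2
        W1 += R * eps * dy_dW1
final_error = mean_squared_error()
print(f"mean squared error: {initial_error:.4f} -> {final_error:.4f}")
```

Note that `dy_dW1` is computed with the old `w2`, before either weight is updated, so both updates use derivatives from the same forward pass.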

# Common neural network libraries

As with most things, it would be possible to do all of the above by hand, but it would take years to make any progress. Instead, we use software packages to do the legwork for us.

In general, they can construct networks, automatically calculate derivatives, perform backpropagation and evaluate performance for you.

- PyTorch
- Tensorflow
- Keras
- Caffe2
- scikit-learn

In this workshop, we will be using Tensorflow.

# Our first neural network: classifying Irises

We're going to start with a classic machine learning example, classifying species of Irises.

![three iris species](iris_three_species.jpg)

Iris setosa, Iris versicolor, and Iris virginica 

## Data set

There [exists a data set of 150 irises](https://en.wikipedia.org/wiki/Iris_flower_data_set), each described by four measurements (sepal length and width, and petal length and width) and labelled with its species.

| Sepal length (cm) | Sepal width (cm) | Petal length (cm) | Petal width (cm) | Species |
|---|---|---|---|---|
| 6.4 | 2.8 | 5.6 | 2.2 | 2 |
| 5.0 | 2.3 | 3.3 | 1.0 | 1 |
| 4.9 | 2.5 | 4.5 | 1.7 | 2 |
| 4.9 | 3.1 | 1.5 | 0.1 | 0 |
| ... | ... | ... | ... | ... |

Each species label is naturally a string (for example, "setosa"), but machine learning typically relies on numeric values. Therefore, someone mapped each string to a number. Here's the representation scheme:

 - 0 represents setosa
 - 1 represents versicolor
 - 2 represents virginica


## The code

The Python code that we will be running is available at [premade_estimator.py](https://github.com/tensorflow/models/blob/master/samples/core/get_started/premade_estimator.py). Feel free to follow along with that file but the important parts of the code will be on these slides.

## Loading our data

Since we're working with a common data set, the Tensorflow example code comes with some helper functions to load the data into the correct form for us.

In [iris_data.py](https://github.com/tensorflow/models/blob/master/samples/core/get_started/iris_data.py), there is a function `load_data`.

```python
>>> (train_x, train_y), (test_x, test_y) = load_data()
>>> train_x.head()
   SepalLength  SepalWidth  PetalLength  PetalWidth
0          6.4         2.8          5.6         2.2
1          5.0         2.3          3.3         1.0
2          4.9         2.5          4.5         1.7
3          4.9         3.1          1.5         0.1
4          5.7         3.8          1.7         0.3
>>> train_y.head()
0    2
1    1
2    2
3    0
4    0
Name: Species, dtype: int64
```
It reads the training and test data from CSV files into Pandas objects: a `DataFrame` of features and a `Series` of labels for each.
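The core of this step can be sketched with Pandas directly. This is a rough approximation, not the actual `load_data` code: it assumes a CSV with no header row and inlines the five training rows shown above so that it runs offline. The key idea is that `pop` removes the `Species` column from the features and returns it as the label `Series`.

```python
import io

import pandas as pd

CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']

# The first five training rows shown above, inlined so the sketch runs without a download
csv_text = """6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,2.5,4.5,1.7,2
4.9,3.1,1.5,0.1,0
5.7,3.8,1.7,0.3,0
"""

train_x = pd.read_csv(io.StringIO(csv_text), names=CSV_COLUMN_NAMES)
# Split the label column off from the four feature columns
train_y = train_x.pop('Species')
print(train_x.head())
print(train_y.head())
```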

## Prepping our data

Also in [iris_data.py](https://github.com/tensorflow/models/blob/master/samples/core/get_started/iris_data.py) there is a function called `train_input_fn`:

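```python
import tensorflow as tf

def train_input_fn(features, labels, batch_size):
    # Turn the (features, labels) pair into a Dataset of individual examples
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    # Shuffle, repeat indefinitely, and group the examples into batches
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)
    return dataset
```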

We pass it `train_x`, `train_y` and the batch size we want.

 - First it converts the input data format to a Tensorflow `Dataset`
 - Then it shuffles, repeats and batches the examples
 - Finally it returns the data set
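To see what `shuffle(1000).repeat().batch(batch_size)` does to the stream of examples, here is a plain-Python sketch of the same three steps. It is only an illustration of the behaviour, not how `tf.data` is implemented, and the generator names are hypothetical. The shuffle uses a buffer: it holds up to `buffer_size` examples and emits a random one each time a new example arrives, so a buffer larger than the data set (as with 1000 for the iris data) gives a full shuffle. Because the shuffle happens before the repeat, the order is reshuffled on every pass through the data.

```python
import itertools
import random

def shuffle(stream, buffer_size, rng):
    """Buffered shuffle: hold up to buffer_size items, emit a random one, refill."""
    buffer = []
    for item in stream:
        if len(buffer) < buffer_size:
            buffer.append(item)
            continue
        i = rng.randrange(buffer_size)
        yield buffer[i]
        buffer[i] = item
    rng.shuffle(buffer)   # stream exhausted: drain the rest in random order
    yield from buffer

def repeat(make_stream):
    """Start the stream again every time it runs out (forever)."""
    while True:
        yield from make_stream()

def batch(stream, batch_size):
    """Group consecutive items into lists of batch_size."""
    while True:
        chunk = list(itertools.islice(stream, batch_size))
        if not chunk:
            return
        yield chunk

rng = random.Random(0)
examples = list(range(10))
batches = batch(repeat(lambda: shuffle(examples, 1000, rng)), 4)
first_three = [next(batches) for _ in range(3)]
print(first_three)   # 12 examples: one full shuffled pass, then the start of the next
```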

## Designing our network

Tensorflow comes with a network specially designed for this kind of *classification* problem. It automates a lot of the setup work but has a few configurable parameters.

The network is called [tf.estimator.DNNClassifier](https://www.tensorflow.org/api_docs/python/tf/estimator/DNNClassifier) (Deep Neural Network Classifier). In our case we will give it three things:
 1. the list of the features (in our case 'SepalLength', 'SepalWidth', 'PetalLength' and 'PetalWidth')
 2. the number and size of the hidden layers
 3. the number of output classes to create

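```python
import tensorflow as tf

# One numeric feature column per measurement in train_x
my_feature_columns = []
for key in train_x.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))

classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers of 10 nodes each.
    hidden_units=[10, 10],
    # The model must choose between 3 classes.
    n_classes=3
)
```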

and that is all that is needed to describe the shape of our network. We can now get to work training it.
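It is worth counting how many values training will have to find for this network. With 4 inputs, two hidden layers of 10 and 3 outputs, counting one weight per arrow as in the network diagram gives $4 \times 10 + 10 \times 10 + 10 \times 3 = 170$ weights. Standard fully connected layers also add one *bias* per neuron (not covered in these slides), giving 23 more. The helper below is a hypothetical illustration of that arithmetic.

```python
def count_weights(layer_sizes, biases=True):
    """Number of trainable values in a fully connected network of the given layer sizes."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out   # one weight per arrow between the two layers
        if biases:
            total += n_out      # plus one bias per neuron in the receiving layer
    return total

print(count_weights([4, 10, 10, 3], biases=False))  # 170
print(count_weights([4, 10, 10, 3]))                # 193
```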

## Training our network

To train our network, all we need to do is call the `train` method on the classifier object we just created.

It takes two arguments: `input_fn`, the function that generates the training data (our `train_input_fn` from above), and `steps`, the number of training steps to perform, which controls how long it trains for.

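```python
# input_fn must be a function taking no arguments, so wrap train_input_fn in a lambda.
# args holds the script's command-line options (batch size, number of steps).
classifier.train(
    input_fn=lambda:iris_data.train_input_fn(train_x, train_y,
                                             args.batch_size),
    steps=args.train_steps
)
```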

At this point, Tensorflow will go ahead and train the network, outputting its progress to the screen. It should take a few seconds to run.

## Evaluating our model

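We want to check how good a job the training did, so we evaluate the network on our test data set. This takes a very similar form to training. `eval_input_fn`, also in `iris_data.py`, works like `train_input_fn` except that it neither shuffles nor repeats the data, so each test example is evaluated exactly once: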

```python
eval_result = classifier.evaluate(
    input_fn=lambda:iris_data.eval_input_fn(test_x, test_y,
                                            args.batch_size))

print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))
```

It should print something like:

```
Test set accuracy: 0.933
```

telling us that the network classified 93.3% of the test data set correctly.
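Accuracy is simply the fraction of test examples whose predicted class matches the true label. Assuming a test set of 30 flowers, 93.3% corresponds to 28 correct and 2 wrong, as this sketch with hypothetical labels shows.

```python
def accuracy(predicted, expected):
    """Fraction of examples whose predicted class matches the true class."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)

# Hypothetical labels: 30 test flowers, two of which are misclassified
expected = [0] * 10 + [1] * 10 + [2] * 10
predicted = list(expected)
predicted[12] = 2   # a versicolor mistaken for a virginica
predicted[25] = 1   # a virginica mistaken for a versicolor
print(f"Test set accuracy: {accuracy(predicted, expected):0.3f}")   # 0.933
```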

## Use the model

Finally, we want to use the model to make predictions about the real world. Given a few examples of irises, we evaluate them using the model and compare the results to what we would expect:

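```python
expected = ['Setosa', 'Versicolor', 'Virginica']
predict_x = {
    'SepalLength': [5.1, 5.9, 6.9],
    'SepalWidth': [3.3, 3.0, 3.1],
    'PetalLength': [1.7, 4.2, 5.4],
    'PetalWidth': [0.5, 1.5, 2.1],
}

# No labels this time: we want the model to tell us the species
predictions = classifier.predict(
    input_fn=lambda:iris_data.eval_input_fn(predict_x,
                                            labels=None,
                                            batch_size=args.batch_size))

# Each prediction is a dict; report the most likely class and its probability
template = 'Prediction is "{}" ({:.1f}%), expected "{}"'
for pred_dict, expec in zip(predictions, expected):
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]
    print(template.format(iris_data.SPECIES[class_id], 100 * probability, expec))
```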
gives us:
```
Prediction is "Setosa" (99.8%), expected "Setosa"
Prediction is "Versicolor" (99.6%), expected "Versicolor"
Prediction is "Virginica" (98.5%), expected "Virginica"
```

# Introduction to image analysis

The iris example worked well but the big downside is that it required manual processing of the real-world data before it could be modelled. Someone had to go with a ruler and measure the lengths and widths of each of the flowers. A more common and easily obtainable corpus is images.

There have been many advances in image analysis, but at the core of most of them is *kernel convolution*. This starts by treating the image as a grid of numbers, where each number represents the brightness of a pixel:

$$
\begin{matrix} 
105 & 102 & 100 & 97 & 96 & \dots \\
103 & 99 & 103 & 101 & 102 & \dots \\
101 & 98 & 104 & 102 & 100 & \dots \\
99 & 101 & 106 & 104 & 99 & \dots \\
104 & 104 & 104 & 100 & 98 & \dots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{matrix}
$$


## Define a kernel

You can then define a *kernel*, a small grid of numbers which defines a filter to be applied to the image:

$$
\text{Kernel} = \begin{bmatrix}
0 & -1 & 0 \\
-1 & 5 & -1 \\
0 & -1 & 0
\end{bmatrix}
$$

Depending on the values in the kernel, different filtering operations will be performed. The most common are:

 - sharpen (shown above)
 - blur
 - edge detection (directional or isotropic)
 
In classical image processing, kernel values are designed by mathematical analysis and then kept fixed.
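For comparison, two other common $3 \times 3$ kernels are a simple box blur, which replaces each pixel by the average of its neighbourhood, and the Laplacian, an isotropic edge detector. The sharpen kernel above is exactly the identity kernel (a single 1 in the centre) minus this Laplacian.

$$
\text{Blur} = \frac{1}{9}\begin{bmatrix}
1 & 1 & 1 \\
1 & 1 & 1 \\
1 & 1 & 1
\end{bmatrix}
\qquad
\text{Laplacian} = \begin{bmatrix}
0 & 1 & 0 \\
1 & -4 & 1 \\
0 & 1 & 0
\end{bmatrix}
$$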

## Applying a kernel

The kernel is then placed over each pixel of the image in turn, together with its neighbours; corresponding values are multiplied and the products are summed to give the new value of the central pixel:

<img src="conv1.jpg" alt="Convolution" style="width: 600px; margin:0 auto;"/>
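The same calculation in NumPy, applying the sharpen kernel to the $5 \times 5$ corner of the image grid shown earlier. This sketch only computes output pixels where the whole kernel fits inside the image (so a $5 \times 5$ input gives a $3 \times 3$ output). Like most image-processing and CNN code, it does not flip the kernel first; for a symmetric kernel like this one that makes no difference. The function name `convolve2d` is a hypothetical helper.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over every position where it fits inside the image,
    multiply the overlapping values and sum them."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[105, 102, 100, 97, 96],
                  [103, 99, 103, 101, 102],
                  [101, 98, 104, 102, 100],
                  [99, 101, 106, 104, 99],
                  [104, 104, 104, 100, 98]])
sharpen = np.array([[0, -1, 0],
                    [-1, 5, -1],
                    [0, -1, 0]])
print(convolve2d(image, sharpen))
# [[ 89 111 101]
#  [ 85 111 101]
#  [ 98 117 113]]
```

For example, the first output pixel is $5 \times 99 - (102 + 103 + 103 + 98) = 89$.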

## First pixel

![Convolution](conv3.jpg)

## Second pixel

![Convolution](conv4.jpg)

## Dealing with edges

![Convolution](conv5.jpg)
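At the edges of the image the kernel overhangs pixels that don't exist. One common option, assumed in this sketch (other choices include padding with zeros or simply dropping the border), is to pad the image by repeating its outermost pixels so that the output is the same size as the input. NumPy's `np.pad` with `mode="edge"` does this; the $3 \times 3$ image is the top-left corner of the grid shown earlier.

```python
import numpy as np

image = np.array([[105, 102, 100],
                  [103, 99, 103],
                  [101, 98, 104]])
kernel = np.array([[0, -1, 0],
                   [-1, 5, -1],
                   [0, -1, 0]])

# Repeat the outermost pixels so every pixel has a full set of neighbours
padded = np.pad(image, pad_width=1, mode="edge")
out = np.array([[np.sum(padded[i:i + 3, j:j + 3] * kernel) for j in range(3)]
                for i in range(3)])
print(out.shape)   # (3, 3): same size as the input image
print(out[0, 0])   # 5*105 - (105 + 105 + 102 + 103) = 110
```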

## Before and after

Using a Sobel edge detection kernel gives the following effect:

![Before and after](filter.jpg)
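For reference, the Sobel kernels for horizontal ($G_x$) and vertical ($G_y$) changes in brightness are shown below; sign and orientation conventions vary between sources. If $I_x$ and $I_y$ are the images filtered with each kernel, the overall edge strength is commonly taken as $\sqrt{I_x^2 + I_y^2}$.

$$
G_x = \begin{bmatrix}
-1 & 0 & 1 \\
-2 & 0 & 2 \\
-1 & 0 & 1
\end{bmatrix}
\qquad
G_y = \begin{bmatrix}
-1 & -2 & -1 \\
0 & 0 & 0 \\
1 & 2 & 1
\end{bmatrix}
$$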

# Convolutional neural networks

At the core of convolutional neural networks is their ability to create abstract feature detectors automatically. If carefully combined, these detectors form a network with layers of abstraction, going from "is there an edge here" to "is there an eye here" to "is this a person".

From a neural network perspective, there is little difference in training. You can simply treat each element of the convolution kernel as a weight, as we did before, and the backpropagation algorithm will automatically learn kernel values that describe the training data set.
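To make "treat each kernel element as a weight" concrete, this NumPy sketch learns a $3 \times 3$ kernel from examples. It assumes a random $20 \times 20$ training image and uses the sharpen kernel from earlier to generate the target outputs. Each output pixel is a weighted sum of a $3 \times 3$ patch, exactly like a neuron, so $\partial y / \partial w$ is just the patch itself. For simplicity it uses full-batch gradient descent (averaging the updates $R \, \epsilon \, \partial y / \partial w$ over all pixels) rather than per-example updates.

```python
import numpy as np

rng = np.random.default_rng(0)

def patches(image, size=3):
    """Every size x size window of the image, flattened to one row each."""
    h, w = image.shape
    return np.array([image[i:i + size, j:j + size].ravel()
                     for i in range(h - size + 1)
                     for j in range(w - size + 1)])

true_kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float)

# Training data: a random "image" and its sharpened version as the target
image = rng.uniform(0, 1, size=(20, 20))
P = patches(image)                 # shape (324, 9): one row per output pixel
target = P @ true_kernel.ravel()

kernel = rng.normal(size=9)        # random initial weights
R = 0.3                            # learning rate
for step in range(3000):
    y = P @ kernel                 # convolve with the current kernel
    eps = target - y               # error at every output pixel
    kernel += R * P.T @ eps / len(P)   # average of R * eps * dy/dw over all pixels
print(np.round(kernel.reshape(3, 3), 3))   # close to the sharpen kernel
```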

## Typical CNN

![Typical CNN](typical_cnn.png)

# Handwriting recognition

# Ethics of machine learning

Credits:

- Dog photo: CC BY 2.0 [Emily Mathews](https://www.flickr.com/photos/eamathe/14517807267/)
- Irises: <a href="https://commons.wikimedia.org/w/index.php?curid=170298"><em>Iris setosa</em></a> (by
<a href="https://commons.wikimedia.org/wiki/User:Radomil">Radomil</a>, CC BY-SA 3.0),
<a href="https://commons.wikimedia.org/w/index.php?curid=248095"><em>Iris versicolor</em></a> (by
<a href="https://commons.wikimedia.org/wiki/User:Dlanglois">Dlanglois</a>, CC BY-SA 3.0),
and <a href="https://www.flickr.com/photos/33397993@N05/3352169862"><em>Iris virginica</em></a>
(by <a href="https://www.flickr.com/photos/33397993@N05">Frank Mayfield</a>, CC BY-SA
2.0).
- Kernel convolution images: http://machinelearninguru.com/computer_vision/basics/convolution/image_convolution_1.html
- CNN layout: CC BY-SA 4.0 [Aphex34](https://commons.wikimedia.org/wiki/File:Typical_cnn.png)