<h1><center>Deep learning with Keras</center></h1>

<center>Owen Jones | Bath ML | 3rd June 2018</center>

Let's start at the very beginning...

## What is Keras?

> Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. -- **[keras.io](https://keras.io)**

In other words, Keras makes it super easy to build neural networks. And that is exactly what we're going to do.

In [1]:
import keras

Using TensorFlow backend.


## The best dataset in the world

To get the hang of the Keras syntax, we're going to start off with a really simple network on a really simple dataset. You might have seen this one before...

In [2]:
# Set a seed for reproducibility
import numpy as np
seed = 42
np.random.seed(seed)

In [3]:
iris = np.load("data/iris.npy")
iris

array([[5.1, 3.5, 1.4, 0.2, 0. ],
       [4.9, 3. , 1.4, 0.2, 0. ],
       [4.7, 3.2, 1.3, 0.2, 0. ],
       [4.6, 3.1, 1.5, 0.2, 0. ],
       [5. , 3.6, 1.4, 0.2, 0. ],
       [5.4, 3.9, 1.7, 0.4, 0. ],
       [4.6, 3.4, 1.4, 0.3, 0. ],
       [5. , 3.4, 1.5, 0.2, 0. ],
       [4.4, 2.9, 1.4, 0.2, 0. ],
       [4.9, 3.1, 1.5, 0.1, 0. ],
       [5.4, 3.7, 1.5, 0.2, 0. ],
       [4.8, 3.4, 1.6, 0.2, 0. ],
       [4.8, 3. , 1.4, 0.1, 0. ],
       [4.3, 3. , 1.1, 0.1, 0. ],
       [5.8, 4. , 1.2, 0.2, 0. ],
       [5.7, 4.4, 1.5, 0.4, 0. ],
       [5.4, 3.9, 1.3, 0.4, 0. ],
       [5.1, 3.5, 1.4, 0.3, 0. ],
       [5.7, 3.8, 1.7, 0.3, 0. ],
       [5.1, 3.8, 1.5, 0.3, 0. ],
       [5.4, 3.4, 1.7, 0.2, 0. ],
       [5.1, 3.7, 1.5, 0.4, 0. ],
       [4.6, 3.6, 1. , 0.2, 0. ],
       [5.1, 3.3, 1.7, 0.5, 0. ],
       [4.8, 3.4, 1.9, 0.2, 0. ],
       [5. , 3. , 1.6, 0.2, 0. ],
       [5. , 3.4, 1.6, 0.4, 0. ],
       [5.2, 3.5, 1.5, 0.2, 0. ],
       [5.2, 3.4, 1.4, 0.2, 0. ],
       ...])

The first four columns are numeric features (sepal length, sepal width, petal length and petal width, in cm... don't worry too much), and the fifth column is a label corresponding to the species, which is what we're going to use as our target.

First we're just going to shuffle the rows, because at the moment they're in order (notice the label in the last column); in a minute we'll be splitting the data and we want a mixture of labels in each part.

In [4]:
np.random.shuffle(iris)

Now we'll separate the labels...

In [5]:
iris_labels = iris[:, 4]

... because Keras needs the labels to be "one-hot encoded".

**One-hot encoded label:** list out all the possible labels, and mark the one which is correct.

Here, our label could be 0, 1 or 2. 
    Is it: 0? 1? 2?
    ---------------
    0 =>  [1, 0, 0]
    1 =>  [0, 1, 0]
    2 =>  [0, 0, 1]
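Under the hood, one-hot encoding amounts to picking rows of an identity matrix. A minimal NumPy sketch, assuming integer-valued labels 0, 1, 2 (`one_hot` is a hypothetical helper, not part of Keras):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: row i of the identity matrix is the one-hot vector for label i."""
    labels = np.asarray(labels, dtype=int)  # iris labels are stored as floats (0., 1., 2.)
    return np.eye(num_classes)[labels]

print(one_hot([0, 1, 2, 1], num_classes=3))
```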

Keras can do this for us...

In [6]:
from keras.utils import to_categorical
iris_onehot = to_categorical(iris_labels)
iris_onehot

array([[0., 1., 0.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.],
       [0., 0., 1.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 0., 1.],
       [0., 0., 1.],
       [0., 0., 1.],
       [0., 0., 1.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       ...])

Now we can get on with building a neural net!

## A simple network

We're going to build a "sequential" model. The clue's in the name - we start with an empty model, and _sequentially_ add layers.

In [7]:
from keras.models import Sequential

model = Sequential()

There are plenty of layers to choose from, and we'll see some more exciting ones later on, but for now we'll stick with dense layers.

For each dense layer, we specify:
* The number of `units`, or how many neurons we want in the layer - the final layer will need to have 3 units, because we're classifying as one of 3 labels
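* The `activation` function we want to use - here we'll use sigmoid activation on the hidden layer and softmax on the output layer, which turns the 3 outputs into probabilities that sum to 1 (ReLU is also a very common choice for hidden layers, and we'll meet it later)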

And Keras takes care of everything else for us.

Well, almost. For the very first layer in any model, we need to tell Keras what "shape" our input will be. We ignore the first dimension (the number of observations, or "rows"), because it can change without consequence; but we have to specify the other dimensions in a tuple. Here, it's a size-1 tuple, specifying that we have 4 features ("columns").

Oh, and to add layers we use... umm, `add()`.

In [8]:
from keras.layers import Dense

model.add(Dense(15, activation="sigmoid", input_shape=(4,)))
model.add(Dense(3, activation="softmax"))
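For intuition, here is a NumPy sketch of roughly what those two layers compute on a batch of inputs: each `Dense` layer applies `activation(inputs @ weights + bias)`. The weights are random stand-ins (an assumption for illustration), not anything Keras has trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the row-wise max for numerical stability; each row then sums to 1
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
# Random, untrained stand-in weights with the shapes Keras would create
W1, b1 = rng.normal(size=(4, 15)), np.zeros(15)  # Dense(15): 4 inputs -> 15 units
W2, b2 = rng.normal(size=(15, 3)), np.zeros(3)   # Dense(3): 15 inputs -> 3 units

x = rng.normal(size=(5, 4))        # a batch of 5 observations with 4 features
hidden = sigmoid(x @ W1 + b1)      # shape (5, 15)
probs = softmax(hidden @ W2 + b2)  # shape (5, 3); each row sums to 1
print(hidden.shape, probs.shape)
print(probs.sum(axis=1))
```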

Let's see what we've created...

In [9]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 15)                75        
_________________________________________________________________
dense_2 (Dense)              (None, 3)                 48        
=================================================================
Total params: 123
Trainable params: 123
Non-trainable params: 0
_________________________________________________________________


Looking good!
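The parameter counts in the summary are easy to check by hand: a dense layer has one weight for every (input, unit) pair, plus one bias per unit. A quick sketch (`dense_params` is just an illustrative helper name):

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair, plus one bias per unit
    return n_inputs * n_units + n_units

first = dense_params(4, 15)   # 4*15 + 15 = 75
second = dense_params(15, 3)  # 15*3 + 3 = 48
print(first, second, first + second)  # 75 48 123
```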

There's one more thing to do before we can train our model though: now that we're happy with it, we have to lock it in by "compiling" it.

At this point we also need to tell Keras which loss function (categorical crossentropy is the standard choice for multiclass classification with one-hot labels) and which optimizer (stochastic gradient descent, or something fancier) to use.

We can also ask for a list of metrics which we would like to see reported during training. These have nothing to do with the training process itself: training tries to minimise the loss function, and that usually results in an increase in accuracy as a side effect. To say it once more: the metrics have NOTHING to do with the training process. It's just nice to see them!
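To see why the loss, not the accuracy, is what gets optimised, here is a small NumPy sketch with hand-rolled versions of both (illustrative re-implementations, not Keras's own code). Accuracy only changes when the top prediction flips, so it can't tell a hesitant correct prediction from a confident one; the crossentropy loss can, and it changes smoothly, which is what gradient-based training needs.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean over observations of -sum(y_true * log(y_pred)); clip to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    # Fraction of observations whose highest-probability class is the true class
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

y_true = np.array([[1, 0, 0], [0, 1, 0]])
unsure = np.array([[0.4, 0.3, 0.3], [0.3, 0.4, 0.3]])
confident = np.array([[0.9, 0.05, 0.05], [0.05, 0.9, 0.05]])

# Both sets of predictions are 100% accurate, but the loss still tells them apart
print(accuracy(y_true, unsure), categorical_crossentropy(y_true, unsure))
print(accuracy(y_true, confident), categorical_crossentropy(y_true, confident))
```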

In [10]:
model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["acc"])

Now we can fit our model! We pass in our training data (remember that the label is in the last column, and we don't want to include that here!) and our one-hot encoded labels, as well as:
* The number of `epochs` to train for, or the total number of times that all our training data gets passed through the network
* A `batch_size` - the parameters in the network get updated after each batch of `batch_size` observations has been passed through
* A `validation_split`: the proportion of the data which we'll set aside and use to assess our model's generalised performance (because a model can usually make great predictions about the data that's been used to train it, but it might not do so well on data it hasn't seen before). Keras takes this validation set from the *end* of the arrays, before any shuffling, which is why we shuffled the rows earlier
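Here's a rough sketch of what `validation_split=0.2` and `batch_size=20` mean for our 150 rows, assuming Keras's documented behaviour of taking the validation rows from the end of the arrays. The helper name is hypothetical; the 120/30 split it produces matches the "Train on 120 samples, validate on 30 samples" line Keras prints below.

```python
import math
import numpy as np

def split_like_validation_split(x, y, validation_split):
    """Hypothetical helper mimicking Keras's documented behaviour: the LAST
    fraction of the rows (before any shuffling) becomes the validation set."""
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

x = np.zeros((150, 4))  # stand-in for iris[:, :4]
y = np.zeros((150, 3))  # stand-in for iris_onehot
(x_train, y_train), (x_val, y_val) = split_like_validation_split(x, y, 0.2)
print(len(x_train), len(x_val))  # 120 30

# With batch_size=20, each epoch performs ceil(120 / 20) = 6 parameter updates
updates_per_epoch = math.ceil(len(x_train) / 20)
print(updates_per_epoch, updates_per_epoch * 50)  # 6 300
```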

In [11]:
model.fit(iris[:, :4], iris_onehot, epochs=50, batch_size=20, validation_split=0.2)

Train on 120 samples, validate on 30 samples
Epoch 1/50
...
Epoch 50/50


<keras.callbacks.History at 0x1b79a626a58>

Notice that the loss keeps dropping, and the accuracy fluctuates but generally speaking increases for both the training and validation sets.
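To check that trend yourself, note that `model.fit` returns a `History` object whose `.history` attribute is a dict mapping each metric name to a list of per-epoch values. A small sketch with a hypothetical `summarise_history` helper (the exact key names, e.g. `acc` vs `accuracy`, depend on your Keras version and the metrics you asked for):

```python
def summarise_history(history, every=10):
    """Hypothetical helper: format every `every`-th epoch of a Keras
    History.history-style dict (metric name -> list of per-epoch values)."""
    lines = []
    n_epochs = len(history["loss"])
    for epoch in range(0, n_epochs, every):
        values = ", ".join(f"{name}={vals[epoch]:.3f}"
                           for name, vals in sorted(history.items()))
        lines.append(f"epoch {epoch + 1}: {values}")
    return lines

# In the notebook (assumption: assign the result of the fit call above):
#     history = model.fit(iris[:, :4], iris_onehot, epochs=50, batch_size=20, validation_split=0.2)
#     print("\n".join(summarise_history(history.history)))
# Placeholder numbers, NOT real results, just to show the dict layout:
example = {"loss": [1.1, 0.9, 0.7], "acc": [0.3, 0.5, 0.7],
           "val_loss": [1.2, 1.0, 0.8], "val_acc": [0.3, 0.4, 0.6]}
for line in summarise_history(example, every=1):
    print(line)
```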

So, that's the syntax, but let's be honest - it's a rubbish boring network and a rubbish boring dataset. Let's move on to something more interesting!

## Problem 2: Identify person based on gait

Dataset from https://archive.ics.uci.edu/ml/datasets/Activity+Recognition+from+Single+Chest-Mounted+Accelerometer

### Preparing the data

In [None]:
walking = np.load("data/walking_data.npy")

In [None]:
walking_labels = np.load("data/walking_labels.npy")

In [None]:
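m = walking.shape[0]
# Shuffle the row indices with NumPy, which we seeded above, so the split is reproducible
# (Python's random.shuffle uses a separate generator that np.random.seed doesn't touch)
indices = np.random.permutation(m)
# 60% training, 20% validation, 20% test
train_indices = indices[:int(m*0.6)]
val_indices = indices[int(m*0.6):int(m*0.8)]
test_indices = indices[int(m*0.8):]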

In [None]:
X_train = walking[train_indices, :, :]
X_val = walking[val_indices, :, :]
X_test = walking[test_indices, :, :]

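# We have 15 integer labels, but these need to be one-hot encoded
# e.g. '4' becomes [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Passing num_classes=15 keeps every split 15 columns wide, even if a split
# happens not to contain the highest label
y_train = to_categorical(walking_labels[train_indices], num_classes=15)
y_val = to_categorical(walking_labels[val_indices], num_classes=15)
y_test = to_categorical(walking_labels[test_indices], num_classes=15)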

In [None]:
print(X_train.shape)
print(y_train.shape)

In [None]:
# Let's have a little look...
import matplotlib.pyplot as plt
%matplotlib inline

def plot_series(series):
    # series has shape (time_steps, 3): one column per acceleration direction
    plt.plot(series[:, 0], color="red")
    plt.plot(series[:, 1], color="green")
    plt.plot(series[:, 2], color="blue")

In [None]:
plot_series(X_train[0, :, :])

Can we tell different people's data apart by eye?

Let's plot a few series for some different people - say, 5 series for 3 people.

![](three.png)
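One way to produce a grid like this, sketched with a hypothetical `plot_people` helper and synthetic random data so it runs on its own; in the notebook you would pass `X_train` and the integer labels `np.argmax(y_train, axis=1)` instead. This is an assumption about how such a figure could be made, not the code that produced `three.png`.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_people(X, labels, people, n_series=5):
    """Hypothetical helper: one row per person, one column per series.
    X has shape (n_obs, time_steps, 3); labels holds one integer per observation."""
    fig, axes = plt.subplots(nrows=len(people), ncols=n_series,
                             figsize=(4 * n_series, 3 * len(people)), squeeze=False)
    for row, person in enumerate(people):
        idx = np.flatnonzero(labels == person)[:n_series]  # first few series for this person
        for col, i in enumerate(idx):
            ax = axes[row, col]
            for channel, colour in enumerate(["red", "green", "blue"]):
                ax.plot(X[i, :, channel], color=colour)
            ax.set_title(f"person {person}, series {i}")
    return fig

# Synthetic stand-in data: 60 random series of 260 steps x 3 axes, 20 per "person"
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 260, 3))
labels_demo = np.repeat(np.arange(3), 20)
fig = plot_people(X_demo, labels_demo, people=[0, 1, 2])
```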

### The neural network

In [None]:
from keras.layers import Conv1D, MaxPooling1D, Flatten

In [None]:
# Initiate the model - we'll use a sequential model so we can add to it
model = Sequential()

# Start with a convolutional layer:
#  * filters: The number of "features" we want to learn; number of patterns to try to identify
#  * kernel_size: The "window" to consider, i.e. we look at a rolling window captuiring [kernel_size] time points at once
#  * strides: How many time steps to "roll forward" each time we move the window
#  * activation: The activation function to use; convolutional layers typically use REctified Linear Unit function
#  * input_shape: We're feeding in observations each of shape 260{time points}*3{directional acceleration features}
model.add(Conv1D(filters=40, kernel_size=40, strides=2, activation="relu", input_shape=(260, 3)))
print(model.output_shape)

model.add(MaxPooling1D(pool_size=2))
print(model.output_shape)

# Another convolutional layer: this one finds "meta-patterns" in the patterns the first layer picked up
model.add(Conv1D(filters=40, kernel_size=10, activation="relu"))
print(model.output_shape)

# Max pooling halves the length of each feature map: neighbouring time steps get paired up
# (by position) and only the larger value of each pair is kept. The layer has no parameters
# of its own, but it shrinks the input to later layers, which cuts the number of weights the
# Dense layer after Flatten needs (useful if the net is too large and computation too slow)
model.add(MaxPooling1D(pool_size=2))
print(model.output_shape)

# The output is still 3-dimensional (observations x time steps x filters), but Dense layers
# expect (observations x features), so we "flatten" each observation into one long vector
# (Unstack all the leaves and lay them out next to each other)
model.add(Flatten())
print(model.output_shape)

# We need to finish with a couple of dense layers: one to detect relationships between the (flattened)
# convolutional neurons, and...
model.add(Dense(100, activation="sigmoid"))
print(model.output_shape)

# ... one to present the output as a one-hot vector.
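# (We use softmax in the very final layer because it turns the 15 outputs into probabilities
# that sum to 1, i.e. a distribution over the mutually exclusive classes; sigmoid would
# squash each output independently)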
model.add(Dense(15, activation="softmax"))
print(model.output_shape)
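The shapes printed by the cell above can be worked out by hand, assuming the Keras defaults of `padding="valid"` for both layer types and `strides=pool_size` for `MaxPooling1D`. A small sketch (the helper names are illustrative):

```python
def conv1d_output_length(length, kernel_size, strides=1):
    # "valid" padding: the window must fit entirely inside the input
    return (length - kernel_size) // strides + 1

def pool1d_output_length(length, pool_size):
    # MaxPooling1D with its default strides (= pool_size) and "valid" padding
    return (length - pool_size) // pool_size + 1

n = 260
n = conv1d_output_length(n, kernel_size=40, strides=2)  # (260 - 40) // 2 + 1 = 111
n = pool1d_output_length(n, pool_size=2)                 # 55
n = conv1d_output_length(n, kernel_size=10)              # 55 - 10 + 1 = 46
n = pool1d_output_length(n, pool_size=2)                 # 23
flattened = n * 40                                       # 23 time steps x 40 filters = 920
print(n, flattened)  # 23 920
```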

In [None]:
model.summary()
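Likewise, the parameter counts in the summary follow from the layer sizes: each `Conv1D` filter has `kernel_size x input_channels` weights plus a bias, and pooling and flattening add no parameters. Assuming the architecture above, a rough tally shows the first dense layer holds most of the weights, which is why shrinking its input with max pooling helps:

```python
def conv1d_params(kernel_size, in_channels, filters):
    # Each filter has kernel_size x in_channels weights plus one bias
    return kernel_size * in_channels * filters + filters

def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit
    return n_inputs * n_units + n_units

counts = {
    "conv1d, 40 filters of width 40": conv1d_params(40, 3, 40),   # 4,840
    "conv1d, 40 filters of width 10": conv1d_params(10, 40, 40),  # 16,040
    "dense, 100 units": dense_params(23 * 40, 100),               # 92,100
    "dense, 15 units": dense_params(100, 15),                     # 1,515
}
for name, count in counts.items():
    print(f"{name}: {count}")
print("total:", sum(counts.values()))  # total: 114495
```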

In [None]:
# We have to compile the network before we can run it, defining:
# * Loss function to use (categorical cross-entropy, the standard choice for multi-class classification)
# * Optimizer to use
#   ("adam" = "ADAptive Moment estimation"; e.g. "sgd" = "Stochastic Gradient Descent" will also work,
#   but often converges more slowly)
# * Metrics to report (NOT used for adjusting parameters - that's what the loss function is for!)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
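Roughly, Adam keeps running averages of each parameter's gradient (the first "moment") and squared gradient (the second moment), and scales each step by their ratio. A minimal NumPy sketch of that update rule on a one-parameter toy problem, using the usual default betas and epsilon but a larger learning rate than Keras's default, chosen for the toy (an illustration, not Keras's implementation):

```python
import numpy as np

def adam_minimise(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimal Adam loop for a single parameter vector."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)  # running average of gradients (first moment)
    v = np.zeros_like(x)  # running average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction: m and v start at zero
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Minimise f(x) = (x - 3)^2, whose gradient is 2 (x - 3)
print(adam_minimise(lambda x: 2 * (x - 3), x0=[0.0]))
```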

In [None]:
# Fit it!
# * X_train and y_train are training data/labels
# * epochs: How many times to pass the training data through and update the network's parameters
# * batch_size: How many observations to include in each batch the optimizer sees
# * validation_data: Also report loss and accuracy on the held-out validation set after each epoch
model.fit(X_train, y_train, epochs=10, batch_size=100, validation_data=(X_val, y_val))

At this point we could try to improve that validation accuracy, e.g. by changing the network structure, before looking at the test set.

### Reporting

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

In [None]:
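# predict() returns class probabilities; argmax picks the most likely class
# (Sequential.predict_classes did this too, but it has been removed from newer Keras versions)
y_pred = np.argmax(model.predict(X_test), axis=1)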
print(classification_report(np.argmax(y_test, axis=1), y_pred))
print(confusion_matrix(np.argmax(y_test, axis=1), y_pred))
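When reading the printed confusion matrix: rows are the true classes and columns the predicted classes (scikit-learn's convention), so correct predictions sit on the diagonal. A tiny NumPy sketch on made-up labels shows the layout (`confusion_matrix_np` is a hypothetical stand-in for the scikit-learn function):

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, n_classes):
    """Rows = true class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# One-hot test labels -> integer classes, as done with np.argmax above
y_test_demo = np.array([[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]])
y_true = np.argmax(y_test_demo, axis=1)  # [0, 1, 1, 2]
y_pred = np.array([0, 1, 2, 2])          # one class-1 observation misclassified as 2
cm = confusion_matrix_np(y_true, y_pred, n_classes=3)
print(cm)
# [[1 0 0]
#  [0 1 1]
#  [0 0 1]]
print(np.trace(cm) / cm.sum())  # overall accuracy: 0.75
```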

---

## Visualising features

We can try to visualise the "features" of the time series which the convolutional layers of the net have learned to identify.

In [None]:
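# Plot filter k of a convolutional layer: Conv1D weights have shape
# (kernel_size, input_channels, filters), so [:, :, k] is a (kernel_size, 3) "series"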
def plot_filter(model, layer, k):
    x = model.layers[layer].get_weights()[0][:, :, k]
    plot_series(x)

In [None]:
plot_filter(model, 0, 4)
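What does a filter "respond" to? A `Conv1D` layer slides each (kernel_size x 3) filter along the series and takes a dot product with each window, so its output is large where the series locally resembles the plotted filter. A rough NumPy sketch of that computation, assuming random weights in the same `(kernel_size, input_channels, filters)` layout that `get_weights()[0]` returns for the first layer (the helper name is hypothetical):

```python
import numpy as np

def conv1d_valid(x, kernel, bias, strides=1):
    """Sketch of a Conv1D layer before its activation.
    x: (time_steps, in_channels); kernel: (kernel_size, in_channels, filters)."""
    kernel_size, _, filters = kernel.shape
    n_out = (x.shape[0] - kernel_size) // strides + 1
    out = np.empty((n_out, filters))
    for i in range(n_out):
        window = x[i * strides : i * strides + kernel_size, :]  # (kernel_size, in_channels)
        # Dot product of the window with every filter at once -> shape (filters,)
        out[i] = np.tensordot(window, kernel, axes=([0, 1], [0, 1])) + bias
    return out

rng = np.random.default_rng(0)
series = rng.normal(size=(260, 3))     # one observation: 260 steps x 3 axes
kernel = rng.normal(size=(40, 3, 40))  # same shape as the first layer's weights
bias = np.zeros(40)
response = np.maximum(conv1d_valid(series, kernel, bias, strides=2), 0)  # ReLU
print(response.shape)  # (111, 40)
```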

We can also see if there are any patterns in the autocorrelation plots which might suggest strong periodicity.

In [None]:
def plot_filter_corr(model, layer, k):
    weights = model.layers[layer].get_weights()[0][:, :, k]
    corrs = np.apply_along_axis(lambda y: np.correlate(y, y, mode="full"), 0, weights)
    plot_series(corrs[corrs.shape[0]//2:, :])

In [None]:
plot_filter_corr(model, 0, 4)
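As a sanity check on how `plot_filter_corr` reads the output of `np.correlate(..., mode="full")`: the full result covers lags from -(n-1) to n-1, and keeping the second half leaves lags 0 to n-1. For a signal with period 10, the autocorrelation is high again at lag 10 (one full period) and strongly negative at lag 5 (half a period). A sketch on a synthetic sine wave:

```python
import numpy as np

def autocorrelation(y):
    """Non-negative-lag autocorrelation, computed the same way as in plot_filter_corr."""
    full = np.correlate(y, y, mode="full")  # 2n - 1 values, lags -(n-1) .. (n-1)
    return full[full.shape[0] // 2:]        # keep lags 0 .. n-1

t = np.arange(40)
y = np.sin(2 * np.pi * t / 10)  # a signal with period 10
acf = autocorrelation(y)
print(len(acf))           # 40
print(acf[10] > acf[5])   # True
```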

Let's plot each filter with its autocorrelation plot.

In [None]:
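fig, ax = plt.subplots(ncols=5, nrows=2, figsize=(50, 20))

for k in range(5):
    # Top row: filter k's weights; bottom row: its autocorrelation
    # plt.sca makes the chosen subplot current, so plot_series draws on it
    plt.sca(ax[0, k])
    plot_filter(model, 0, k)
    plt.sca(ax[1, k])
    plot_filter_corr(model, 0, k)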

In [None]:
fig.savefig("corrs.png")