<img style="float: left; margin-bottom: 1em" src="images/PRACE.png" width="200">
<img style="float: right; margin-bottom: 1em" src="images/surfsara.png" width="150">
<hr style="clear: both"/>

# Improving neural networks
In this notebook you will work on improving a neural network on a new data set. This data set is taken from the [PCam](https://github.com/basveeling/pcam) dataset and consists of a thousand patches cut from much larger histopathological slices of lymph node sections.

Each patch either contains metastatic (cancerous) tissue or it does not. As in the previous notebooks, your task is to train a network that will solve this classification task.

Let's load the data, and plot the first 16 images:

In [None]:
import lib
import keras

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

X, Y, labels = lib.dataset_pcam()

X = X[:1000]
Y = Y[:1000]

print('Data set size: {}'.format(X.shape))

lib.plot_examples(X[:16], Y[:16], labels);

Let's train a dense network to serve as a performance baseline. We can use it to compare to more advanced architectures like CNNs. Run the next cell and take note of the final accuracy on the training set.
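
`model.summary()` in the next cell lists a parameter count per layer. For a `Dense` layer that count is the number of inputs times the number of units, plus one bias per unit. The sketch below reproduces that arithmetic in plain Python for the layer sizes used in the baseline. The 96x96x3 input shape is an assumption (the patch size of the original PCam data) and `dense_param_counts` is a hypothetical helper, so compare the result with `X.shape[1:]` and the summary output rather than trusting it blindly.

```python
from math import prod


def dense_param_counts(input_shape, units_per_layer):
    """Return the number of trainable parameters of each Dense layer.

    Hypothetical helper: every layer has n_inputs * n_units weights
    plus n_units biases.
    """
    n_inputs = prod(input_shape)  # Flatten turns the image into one long vector
    counts = []
    for n_units in units_per_layer:
        counts.append(n_inputs * n_units + n_units)
        n_inputs = n_units  # the next layer sees this layer's outputs
    return counts


# Assumed input shape: 96x96 RGB patches; use X.shape[1:] in the notebook.
counts = dense_param_counts((96, 96, 3), [1024, 512, 256, 2])
print(counts, sum(counts))
```

Almost all parameters sit in the first layer, because every pixel value is connected to every one of its units.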

In [None]:
from keras.models import Sequential
from keras.layers import Flatten, Dense
from keras.optimizers import Adam

model = Sequential()
model.add(Flatten(input_shape=X.shape[1:]))
model.add(Dense(1024, activation='relu'))
model.add(Dense(512, activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dense(2, activation='softmax'))
model.summary()

model.compile(Adam(lr=0.000001), loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(X, Y, validation_split=0.2, epochs=25, batch_size=32)
lib.plot_history(history);

## Exercise 1
Now implement a CNN to solve the problem by filling out the skeleton below.

How does a CNN perform on this dataset compared with a dense network? Why do you think it performs worse or better?
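
When you stack extra `Conv2D` and `MaxPool2D` blocks, it helps to know how quickly the feature maps shrink. With the Keras default `padding='valid'`, a 3x3 convolution removes two pixels per dimension and a 2x2 max pool with stride 2 halves the size (rounding down). The sketch below traces this for three blocks. The starting size of 96 is an assumption and both helper functions are hypothetical; they are not part of `lib` or Keras.

```python
def conv_output_size(size, kernel_size, stride=1):
    """Spatial size after a convolution with 'valid' padding."""
    return (size - kernel_size) // stride + 1


def pool_output_size(size, pool_size=2, stride=2):
    """Spatial size after max pooling with 'valid' padding."""
    return (size - pool_size) // stride + 1


size = 96  # assumed patch width and height; use X.shape[1] in the notebook
for block in range(1, 4):
    size = conv_output_size(size, kernel_size=3)
    size = pool_output_size(size)
    print('after block {}: {} x {}'.format(block, size, size))
```

The same numbers appear in the output shape column of `model.summary()`, which is a quick way to check how many blocks fit before the feature maps become too small.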

In [None]:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense
from keras.optimizers import Adam

model = Sequential()
model.add(Conv2D(32, kernel_size=3, activation='relu', input_shape=X.shape[1:]))
model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))

# Add additional conv and max pool layers here

# Add one or more dense layers here

model.add(Dense(2, activation='softmax'))
model.summary()

model.compile(Adam(lr=<FILL IN>), loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(X, Y, validation_split=0.2, epochs=25, batch_size=32)
lib.plot_history(history);

In [None]:
# PLEASE FILL IN YOUR ANSWER HERE

## Exercise 2
Inspect the class activation maps for the different layers by running the next cell. Do they make sense?

In [None]:
import numpy as np  # needed for np.argmax below

example_index = 1
lib.plot_cam(model, X[example_index], np.argmax(Y[example_index]), labels);

## Exercise 3
Although the network is performing quite well, we can speed up convergence by adding **batch normalisation** layers. We add those between the convolutional layers and the max pooling layers.

In the next cell we import the batch normalisation layer, called [`BatchNormalization`](https://keras.io/layers/normalization/) in Keras. Copy-paste the network you trained in exercise 1 into the cell below, and add the batch normalisation layers **between the `Conv2D` and `MaxPool2D` layers**.

Rerun the training. How do the layers affect the convergence speed? Why do you think that is?
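
To make the operation concrete, here is a minimal sketch of what batch normalisation computes for a single feature during training: subtract the mini-batch mean, divide by the mini-batch standard deviation, then scale by `gamma` and shift by `beta`. In Keras `gamma` and `beta` are learned and moving averages replace the batch statistics at inference time; this sketch leaves both out. The function `batch_normalise` is a hypothetical illustration in plain Python, not the Keras implementation.

```python
from statistics import fmean, pvariance


def batch_normalise(values, gamma=1.0, beta=0.0, epsilon=1e-3):
    """Normalise one feature over a mini-batch, then scale and shift it."""
    mean = fmean(values)
    variance = pvariance(values, mu=mean)  # population variance over the batch
    # epsilon keeps the division stable when the variance is close to zero
    return [gamma * (v - mean) / (variance + epsilon) ** 0.5 + beta
            for v in values]


activations = [2.0, 4.0, 6.0, 8.0]  # made-up activations of one feature
normalised = batch_normalise(activations)
print(normalised)
```

Whatever the scale of the incoming activations, the output has a mean close to 0 and a variance close to 1 before `gamma` and `beta` are applied.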

In [None]:
from keras.layers import BatchNormalization

# COPY-PASTE YOUR NETWORK FROM EXERCISE 1 HERE, AND ADD THE BATCH NORMALISATION LAYERS

In [None]:
# FILL IN YOUR ANSWER HERE

## Exercise 4
Look at the loss curves for the training and validation set in exercise 3. At what point does the model start overfitting? How can you tell?

Motivate your answer in the cell below.
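
If the curves are hard to read, you can look at the numbers behind them. The object returned by `model.fit` has a `history` attribute, a dictionary with one list per metric (including `'loss'` and `'val_loss'` when a validation split is used). The sketch below prints the difference between the two per epoch. The values are made up for illustration and `loss_gap` is a hypothetical helper; in the notebook you would pass `history.history` instead.

```python
def loss_gap(history_dict):
    """Return the per-epoch difference between validation and training loss."""
    return [val - train
            for train, val in zip(history_dict['loss'], history_dict['val_loss'])]


# Made-up values for illustration only; use history.history in the notebook.
example = {'loss': [0.70, 0.55, 0.42, 0.30, 0.21],
           'val_loss': [0.69, 0.58, 0.52, 0.55, 0.61]}

for epoch, gap in enumerate(loss_gap(example), start=1):
    print('epoch {}: gap {:+.2f}'.format(epoch, gap))
```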

In [None]:
# FILL IN YOUR ANSWER HERE

## Exercise 5
Add dropout layers between the dense layers and train the network again. Use a fairly high rate (e.g. 0.5) of dropout and inspect the resulting validation and training loss curves. Does it have any effect? If so, how?

We have imported the dropout layer (simply called [`Dropout`](https://keras.io/layers/core/#dropout) in Keras) for you in the first line of the next cell. Copy-paste your original network from exercise 3 and add the dropout layers.

**Hint**: the dropout rate is a floating point number, passed as the first parameter of `Dropout`.
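
As a rough picture of what a dropout layer does during training: each activation is set to zero with probability `rate`, and the surviving activations are scaled by `1 / (1 - rate)` so that their expected sum stays the same. At inference time the layer passes its input through unchanged. The plain Python sketch below illustrates the training-time behaviour; the function `dropout` is a hypothetical stand-in, not the Keras layer.

```python
import random


def dropout(values, rate, rng):
    """Zero each value with probability `rate` and rescale the survivors."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


rng = random.Random(0)  # fixed seed so the example is repeatable
activations = [1.0] * 10  # made-up activations
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)
```

With a rate of 0.5 every output is either 0 or twice the input, and a different random subset is dropped on every call.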

In [None]:
from keras.layers import Dropout

# COPY-PASTE YOUR NETWORK FROM EXERCISE 3 HERE, AND ADD THE DROPOUT LAYERS

In [None]:
# FILL IN YOUR ANSWER HERE

## Exercise 6
We have regularised the dense part of the network by using dropout. We can do the same for the convolutional layers by penalising large weights in the filter kernels. We do this by adding an l2 regulariser on the kernel weights.

In the cell below we load the [`l2`](https://keras.io/regularizers) regulariser and add it to the first convolutional layer. For this we use the `kernel_regularizer` parameter of the `Conv2D` layer. In this example we set the strength of the regularisation to 0.01. Depending on the data set and optimisation you may want to try different values.

Apply l2 regularisation to the kernels of all convolutional layers of your network from exercise 5, and rerun the training process. How can you tell that overfitting is now controlled more effectively?
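
The penalty itself is simple: the regulariser adds the strength times the sum of the squared kernel weights to the loss. The sketch below computes that term in plain Python for a handful of made-up weights and two strengths. The function `l2_penalty` is a hypothetical illustration of the formula, not the Keras code.

```python
def l2_penalty(weights, strength):
    """Penalty added to the loss: strength times the sum of squared weights."""
    return strength * sum(w * w for w in weights)


kernel = [0.5, -1.0, 2.0]  # made-up weights standing in for one flattened filter
for strength in (0.01, 0.1):
    print('strength {}: penalty {:.4f}'.format(strength, l2_penalty(kernel, strength)))
```

Because the weights are squared, the largest weight contributes most of the penalty, so the optimiser is pushed hardest to shrink the biggest weights.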

In [None]:
# EXAMPLE:

from keras.regularizers import l2

model = Sequential()
model.add(Conv2D(32, kernel_size=3, activation='relu', input_shape=X.shape[1:], kernel_regularizer=l2(0.01)))
model.add(BatchNormalization())
model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))

# COPY-PASTE THE REST OF YOUR NETWORK HERE, OR COPY THE kernel_regularizer PARAMETER TO YOUR OWN NETWORK

In [None]:
# FILL IN YOUR ANSWER HERE

## Exercise 7: bonus
Increase the regularisation strength (e.g. by an order of magnitude, going from 0.01 to 0.1) and/or increase the dropout rate. What happens to the convergence? Why is that? Motivate your answer in the cell below.

In [None]:
# PLEASE COPY-PASTE YOUR NETWORK HERE FROM EXERCISE 6, AND MODIFY THE REGULARISATION STRENGTH AND/OR DROPOUT RATE

In [None]:
# PLEASE FILL IN YOUR ANSWER HERE