## Set-up: Run the following cell to configure our working directory appropriately.
If we are running on Google Colab, the following cell will clone the notebooks into our Colab filespace. If instead running locally, it will add the parent directory of the notebooks to the path such that we can import the custom module ```funcs```.

In [None]:
import sys
if 'google.colab' in sys.modules:
    !git clone https://github.com/harry-rendell/MLworkshop.git
    sys.path.append('./MLworkshop')
else:
    sys.path.append('..')

## Tips!
* In Google Colab you can hover your cursor over a function to see what it does.
* If instead you are running locally, you can use Shift+Tab while your cursor is in a function to see what it does.

# Introduction
---
We are going to build and train a simple neural network to classify handwritten digits. We use the digits dataset bundled with scikit-learn, a small MNIST-style dataset containing 1,797 grayscale images of handwritten digits from 0 to 9. Each image is an 8x8 grid of pixel values. Although this is an easy task for a human, it's not so easy for a computer. Since every image in the dataset is unique, we need a model which can adapt to different handwriting styles and classify them accurately. This is where machine learning comes in!

In [None]:
# Standard imports
import numpy as np
import matplotlib.pyplot as plt
# Keras imports
from tensorflow.keras.utils import plot_model
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Flatten, Reshape, Dropout
from tensorflow.keras.regularizers import l1_l2
# Custom imports
from funcs.plotting import plot_classifications, plot_training, plot_data

# Load in data
---

In [None]:
from sklearn import datasets
digits = datasets.load_digits()
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(digits.images, digits.target, test_size=0.6, shuffle=True)
# Normalise data so pixel values are between 0 and 1.
# Compute the scale once, before x_train is overwritten, so both sets use the same factor.
x_max = x_train.max()
x_train = x_train/x_max
x_test  = x_test/x_max
input_shape  = (8,8)
n_classes = 10 # we have 10 different classes, ie 10 integers from 0 to 9
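
When normalising, it is worth checking that the training and test images are divided by the same scale factor, computed before either array is overwritten. The following is a minimal NumPy sketch of that idea; `toy_train` and `toy_test` are hypothetical stand-ins for the real image arrays, not part of the workshop code.

```python
import numpy as np

# Hypothetical stand-ins for the real image arrays (pixel values from 0 to 16).
toy_train = np.array([[0.0, 8.0], [16.0, 4.0]])
toy_test = np.array([[2.0, 16.0], [12.0, 0.0]])

# Compute the scale factor once, from the training data only,
# before either array is overwritten.
scale = toy_train.max()

toy_train_scaled = toy_train / scale
toy_test_scaled = toy_test / scale

print(toy_train_scaled.max(), toy_test_scaled.max())
```

If the scale were recomputed from an already-normalised array it would equal 1, and the second division would leave the data unchanged.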

# Plot the data
---
### Let's see what we're working with here.

In [None]:
plot_data(x_train, y_train)

# Making your model
---
### This is the basic structure of constructing a dense neural network using Keras
> ```
> Line 1: i = Input(shape)
> Line 2: x = Flatten()(i)
> Line 3: x = Dense(n_nodes, activation='relu')(x)
> Line 4: x = Dense(n_nodes, activation='relu')(x)
> ...
> Line 5: o = Dense(n_classes, activation='softmax')(x)
>
> Line 6: mymodel = Model(i, o, name='My first model!')
> Line 7: mymodel.summary()

* Line 1: We set the input of the model using the shape of our input. Since we are using 8x8 images in the training data, our input shape is (8,8). You can use the ```input_shape``` parameter defined earlier.
* Line 2: This step flattens the 2D input with shape (8,8) into a 1D array with shape (64,), since Dense networks require a 1D input. Note that for every layer we need to pass the previous layer to the current one. Here we do this by putting (i) at the end which passes the input to Flatten().
* Line 3: Here we create the first layer. We can choose how many nodes we want in this layer (more nodes = able to model complex data better, but takes longer to train). We also need to set the activation, a sensible choice would be activation='relu'.
* Line 4: We can add more layers like this, provided we pass the previous layer to the new layer by putting (x) at the end as before.
* Line 5: We define the final output layer. The number of nodes needs to match the number of classes, ie 10 (you can use the ```n_classes``` parameter defined earlier). These 10 numbers will correspond to the probability that the input is each of the numbers 0-9. We use the 'softmax' activation function here as it ensures the probabilities sum to 1.
* Line 6: We construct the model using the ```Model()``` function. We pass the input and output. You can also name the model anything you like, e.g. name = 'My first model!'
* Line 7: Prints a summary of our model
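
To make the steps above more concrete, here is a rough NumPy sketch of what a Flatten layer followed by two Dense layers computes for a single image. It is an illustration only, not how Keras implements it: the weights are random rather than trained, and the layer sizes (30 and 10) and all variable names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # ReLU keeps positive values and sets negative values to zero
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()           # outputs are positive and sum to 1

image = rng.random((8, 8))       # stand-in for one normalised 8x8 image
flat = image.reshape(-1)         # Flatten: (8, 8) -> (64,)

# Untrained, randomly initialised weights and zero biases (illustrative only)
W1, b1 = rng.normal(scale=0.1, size=(64, 30)), np.zeros(30)
W2, b2 = rng.normal(scale=0.1, size=(30, 10)), np.zeros(10)

hidden = relu(flat @ W1 + b1)        # like Dense(30, activation='relu')
probs = softmax(hidden @ W2 + b2)    # like Dense(10, activation='softmax')

print(flat.shape, hidden.shape, probs.shape)
print(probs.sum())
```

Training adjusts `W1`, `b1`, `W2` and `b2` so that the largest entry of `probs` lands on the correct digit.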

In [None]:
### Use the template above to make your model here
i = Input(shape=input_shape)
x = Flatten()(i)
x = Dense(30, activation='relu')(x)
o = Dense(n_classes, activation='softmax')(x)

# Create Model
fcc = Model(i, o, name='Dense')
fcc.summary()
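
The parameter counts reported by `summary()` can be checked by hand: a Dense layer has one weight per input-node pair plus one bias per node, and Flatten has no parameters. A small sketch, assuming the example architecture of 64 inputs, 30 hidden nodes and 10 outputs (the helper `dense_params` is hypothetical):

```python
def dense_params(n_inputs, n_nodes):
    # one weight per (input, node) pair, plus one bias per node
    return n_inputs * n_nodes + n_nodes

hidden_params = dense_params(8 * 8, 30)   # flattened 8x8 image -> 30 nodes
output_params = dense_params(30, 10)      # 30 nodes -> 10 classes
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)
```

If you change the number of nodes or add layers, the totals will change accordingly.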

# Training your model
---
### Now you have defined your model, use the template below to compile and train it.
> ```
> Line 8: mymodel.compile(...)
> Line 9: mymodel_history = mymodel.fit(...)


* Line 8: Here we compile the model using ```.compile()```. We need to pass the following: 
    * optimizer='adam' - a particularly good adaptive optimizer. See https://arxiv.org/abs/1412.6980 if you are interested
    * loss='sparse_categorical_crossentropy' - a loss function suited to classification into several classes when the labels are integers (0-9 here)
    * metrics=['accuracy'] - ask the model to calculate the accuracy during training


* Line 9: Train the model using ```.fit()```. We need to pass a few things here:
    * x - training images
    * y - training labels
    * epochs - how long to train for. ~100 is a good start.
    * batch_size - how many images to group up for each training step. ~32 is sensible.
    * validation_data - the test images and labels, ie (x_test, y_test).
    * verbose - Set this to True if you wish to see the progress of training. Otherwise set to False.
    
Note, you will need to rerun lines 1-9 if you wish to start training from scratch, as if you only run lines 8 & 9 it will continue where it left off.
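
For intuition about the loss function, the sketch below computes a sparse categorical cross-entropy by hand in NumPy: for each sample it takes the predicted probability of the true class, and averages the negative logarithm. This is a simplified illustration with made-up probabilities (the Keras version also clips probabilities for numerical safety), and the function here is our own, not the Keras one.

```python
import numpy as np

def sparse_categorical_crossentropy(probs, labels):
    # probs: (n_samples, n_classes), each row sums to 1
    # labels: integer class index for each sample
    picked = probs[np.arange(len(labels)), labels]  # probability of the true class
    return -np.log(picked).mean()

# Made-up predictions for two samples and three classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])

loss = sparse_categorical_crossentropy(probs, labels)
# The same predictions scored against the wrong labels give a larger loss
wrong_loss = sparse_categorical_crossentropy(probs, np.array([2, 0]))
print(loss, wrong_loss)
```

The loss is small when the network assigns high probability to the correct class, and grows quickly when it is confidently wrong.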

In [None]:
### Use the template above to compile and fit your model here
fcc.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history_fcc = fcc.fit(x=x_train, y=y_train, epochs=100, validation_data=(x_test, y_test), batch_size=32, verbose=1)

# Plot progress of training
---

### Now you have trained your model, use the template below to plot the training progress and evaluate the model.
> ```
> Line 10: plot_training(...)
> Line 11: mymodel.evaluate(...)


* Line 10: Pass the output from Line 9 to my custom function plot_training() to see how the training progressed over time.

* Line 11: Evaluate the model on the test data to find the final accuracy. Note that this function returns two numbers, loss and accuracy, but we are only interested in the accuracy at this point.
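
The accuracy itself is simply the fraction of images whose predicted label matches the true label. A tiny NumPy sketch with made-up labels (`y_true` and `y_pred` are hypothetical, not the workshop data):

```python
import numpy as np

# Made-up true and predicted labels for five images
y_true = np.array([3, 1, 4, 1, 5])
y_pred = np.array([3, 1, 4, 7, 5])

# Fraction of predictions that match the true label
accuracy = (y_true == y_pred).mean()
print("Accuracy: {:.1f}%".format(accuracy * 100.))
```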

In [None]:
### Use the template above to plot the training of your model, and evaluate the final test accuracy.
plot_training(history_fcc)

# Calculate accuracy on entire test set
_, acc = fcc.evaluate(x_test, y_test, verbose=0)
print("Testing accuracy: {:.1f}%".format(acc * 100.))

# Plot classifications
---
### Let's plot some of the test data along with the predicted classifications from the network.
> ```
> Line 12: predicted = mymodel.predict(x_test).argmax(axis=-1)
> Line 13: plot_classifications(x_test, y_test, predicted)


* Line 12: Ask the network to predict the class probabilities for the test data. Then choose the class with the highest probability (argmax).

* Line 13: Use my custom function to plot a grid of test data with their true and predicted labels. Note, misclassifications will appear in red.
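
As a small illustration of the argmax step, the sketch below turns a made-up array of class probabilities into predicted labels and finds which samples were misclassified. The arrays are hypothetical and use three classes for brevity.

```python
import numpy as np

# Made-up network outputs: one row of class probabilities per sample
probs = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.3, 0.5]])
true_labels = np.array([1, 2, 2])

# Pick the most probable class along the last axis
predicted = probs.argmax(axis=-1)

# Indices of the samples where the prediction disagrees with the true label
wrong = np.flatnonzero(predicted != true_labels)
print(predicted, wrong)
```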

In [None]:
### Use the template above to predict and plot the classifications of the test data
predicted = fcc.predict(x_test).argmax(axis=-1) 
plot_classifications(x_test, y_test, predicted)

# What happens if we train for too long?
---
### Recompile your model and train your network for longer. What happens to the test accuracy?
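
One way to spot overfitting is to look for the epoch at which the validation loss stops falling while the training loss keeps decreasing. The sketch below uses synthetic curves generated from a formula purely for illustration; they are not results from this network. With a real Keras run, the corresponding values would come from `history_fcc.history['loss']` and `history_fcc.history['val_loss']`.

```python
import numpy as np

epochs = np.arange(1, 301)

# Synthetic curves (illustrative only): the training loss keeps falling,
# while the validation loss falls and then slowly rises again.
train_loss = np.exp(-epochs / 50)
val_loss = np.exp(-epochs / 50) + 0.001 * epochs

# Epoch with the lowest validation loss; training beyond this point
# lowers the training loss but no longer helps on unseen data.
best_epoch = epochs[val_loss.argmin()]
print("Validation loss is lowest at epoch", best_epoch)
```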

In [None]:
### Copy your code corresponding to lines 1-11, then increase epochs (max 500 otherwise it will take too long)
i = Input(shape=input_shape)
x = Flatten()(i)
x = Dense(30, activation='relu')(x)
o = Dense(n_classes, activation='softmax')(x)

# Create Model
fcc = Model(i, o, name='Dense')
fcc.summary()

In [None]:
fcc.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history_fcc = fcc.fit(x=x_train, y=y_train, epochs=300, validation_data=(x_test, y_test), batch_size=32, verbose=1)

In [None]:
plot_training(history_fcc)

# Calculate accuracy on entire test set
_, train_acc = fcc.evaluate(x_train, y_train, verbose=0)
_, test_acc  = fcc.evaluate(x_test, y_test, verbose=0)
print("Training accuracy: {:.1f}%".format(train_acc * 100.))
print("Testing accuracy: {:.1f}%".format(test_acc * 100.))

# Improving test accuracy
---
### Some techniques we can use to improve test accuracy:
* Dropout - During training, a random fraction of nodes are deactivated for each training step. E.g. inserting Dropout(0.1) after a Dense() layer will randomly deactivate 10% of the nodes of that Dense layer, for each training step. This benefits the network as it encourages it to behave like a combination of smaller networks, each of which can continue to work even when some nodes are missing. To use dropout, insert a ```Dropout()``` layer after the Dense() layer that you would like to apply the dropout to.
* L1/L2 Regularisation - Adds a penalty to the loss for large parameter values; the L1 part gradually pushes unused weights to zero. You can use the ```l1_l2()``` function and pass it to a Dense layer, e.g. ```Dense(..., kernel_regularizer=l1_l2() )``` to penalise the weights or ```Dense(..., bias_regularizer=l1_l2() )``` to penalise the biases.
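
To show roughly what these two techniques do numerically, here is a NumPy sketch of a dropout mask and an L1/L2 penalty. Both functions are our own simplified illustrations, not the Keras implementations, and the penalty strengths of 0.01 are assumed values for the example.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    # At test time dropout does nothing
    if not training:
        return activations
    # Keep each node with probability (1 - rate) ...
    keep = rng.random(activations.shape) >= rate
    # ... and rescale the survivors so the expected total is unchanged
    return activations * keep / (1.0 - rate)

def l1_l2_penalty(weights, l1=0.01, l2=0.01):
    # Extra term added to the loss: l1 * sum|w| + l2 * sum(w^2)
    return l1 * np.abs(weights).sum() + l2 * np.square(weights).sum()

rng = np.random.default_rng(0)
activations = np.ones(1000)
dropped = dropout(activations, rate=0.1, rng=rng)
print("Fraction of nodes deactivated:", (dropped == 0).mean())

weights = np.array([0.5, -1.0, 0.0])
penalty = l1_l2_penalty(weights)
print("Penalty added to the loss:", penalty)
```

Because the penalty grows with the size of the weights, the optimizer is nudged towards smaller weights unless a larger weight clearly reduces the classification loss.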

In [None]:
### Copy your code corresponding to lines 1-11, then add Dropout and/or L1/L2 Regularisation 
from tensorflow.keras.regularizers import l1_l2
# Connect input, intermediate, and output layers using the Keras functional API
i = Input(shape=input_shape)
x = Flatten()(i)
x = Dense(50, activation='relu', bias_regularizer=l1_l2())(x)
x = Dropout(0.2)(x)
o = Dense(n_classes, activation='softmax')(x)


# Create Model
fcc = Model(i, o, name='Dense')
fcc.summary()


fcc.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history_fcc = fcc.fit(x=x_train, y=y_train, epochs=100, validation_data=(x_test, y_test), batch_size=32, verbose=1)

plot_training(history_fcc)

# Calculate accuracy on entire test set
_, train_acc = fcc.evaluate(x_train, y_train, verbose=0)
_, test_acc = fcc.evaluate(x_test, y_test, verbose=0)
print("Training accuracy: {:.1f}%".format(train_acc * 100.))
print("Testing accuracy: {:.1f}%".format(test_acc * 100.))

# Challenge!
### I was able to make a network with a test accuracy of 98.1%. Can you do better than this using Dropout and Regularisation?

In [None]:
### Make your best model here!
