# Introduction to Neural Networks (Keras + MNIST)

### Aims

The main concepts covered in this notebook are: 

>* getting familiar with basic keras
>* input-output with keras
>* construction of neural network models with keras

In [None]:
# Import necessary libraries
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt

The following code cells let you visualise your model training in TensorBoard. Once you reach a `model.fit` statement, scroll back up to take a look (you'll need to refresh the dashboard with the refresh button in the top right).

In [None]:
import os, datetime
logdir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))

In [None]:
%load_ext tensorboard
%tensorboard --port=5036 --logdir $logdir
tensorboard_callback = keras.callbacks.TensorBoard(logdir, histogram_freq=1) 

# Part 1: Synthetic Data

In the first part of this workshop we will work with a "clean" dataset, generating data from purely deterministic functions.

First, generate data according to 
\begin{equation}
  x_2 = \cos(x_1),
\end{equation}
with $x_1 \in [-\pi, \pi]$. This data will be labelled as belonging to class $y=0$.

For data in class $y=1$, generate 
\begin{equation}
  x_2 = a + \cos(x_1),
\end{equation}
where $a=1$ for now.

You should generate ~2000 samples of $x_1$, drawn uniformly from $[-\pi,\pi]$, i.e. $x_1\sim U[-\pi,\pi]$.

For use in keras, it helps to build numpy arrays of the following shape
$$X.\text{shape}=(N, D)$$
with a corresponding set of labels
$$y.\text{shape}=(N,)$$
where $N$ is the number of samples and $D$ the number of features ($D=2$ here).
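A minimal numpy sketch of the data generation, assuming ~2000 points in total split evenly between the two classes (the text leaves open whether each class gets its own 2000 samples). `make_cosine_data` is a hypothetical helper name. It shuffles the rows at the end because Keras takes `validation_split` samples from the end of the arrays before any shuffling, so class-ordered data would give a validation set containing only class 1.

```python
import numpy as np

def make_cosine_data(n_samples=2000, a=1.0, seed=0):
    """Class 0: x2 = cos(x1); class 1: x2 = a + cos(x1); x1 ~ U[-pi, pi].

    Returns X with shape (N, 2) and labels y with shape (N,).
    """
    rng = np.random.default_rng(seed)
    n0 = n_samples // 2                        # assumption: half the points per class
    y = np.concatenate([np.zeros(n0), np.ones(n_samples - n0)])
    x1 = rng.uniform(-np.pi, np.pi, size=n_samples)
    x2 = np.cos(x1) + a * y                    # class-1 points are shifted up by a
    X = np.column_stack([x1, x2])
    perm = rng.permutation(n_samples)          # mix the classes before any validation split
    return X[perm], y[perm]

X, y = make_cosine_data()
print(X.shape, y.shape)  # (2000, 2) (2000,)
```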

### 🚩 Exercise 1 (CORE): Building a Baseline Model

a) Build a logistic regression model in keras. The model should consist of an input layer and a fully-connected output layer (a single unit with a sigmoid activation). No hidden layer for now.
See lecture notes for details of how to create these objects, or ask your tutors. 

b) Compile the model. At this stage you need to select a loss function (specified via the `loss` keyword) and an optimizer. For binary labels, binary cross-entropy (`"binary_crossentropy"`) is the natural choice. Any optimizer will do; you could use one of the more "exciting" ones, e.g. Adam.

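c) Train the model with `model.fit`. Pass the keyword argument

```
callbacks=[tensorboard_callback]
```

to visualise training in the TensorBoard dashboard above.

You might also want to split the dataset into training and validation sets via

```
validation_split=0.X
```

where `0.X` is the fraction of samples held out for validation (e.g. `0.2`). Keras takes these samples from the end of the arrays, before any shuffling, so shuffle your data first if it is ordered by class.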
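One possible sketch of a) to c), assuming a sigmoid output unit trained with binary cross-entropy and Adam. The epoch count, batch size and 20% validation split are arbitrary choices, not values from the worksheet. To keep the cell runnable on its own it regenerates the Part 1 data inline; in the notebook, use your own `X`, `y` and uncomment the TensorBoard callback.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Regenerate Part 1 data so this cell runs on its own (random labels, so already shuffled)
rng = np.random.default_rng(0)
x1 = rng.uniform(-np.pi, np.pi, size=2000)
y = rng.integers(0, 2, size=2000).astype("float32")
X = np.column_stack([x1, np.cos(x1) + 1.0 * y]).astype("float32")  # a = 1

# a) logistic regression: input layer + one sigmoid unit, no hidden layer
model = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(1, activation="sigmoid"),  # P(y=1 | x) = sigmoid(w . x + b)
])

# b) binary labels -> binary cross-entropy; any optimizer works, Adam here
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# c) train, holding out the last 20% of samples for validation
history = model.fit(
    X, y,
    epochs=20,
    batch_size=32,
    validation_split=0.2,
    # callbacks=[tensorboard_callback],  # defined in the TensorBoard cell of the notebook
    verbose=0,
)
print("parameters:", model.count_params())  # 2 weights + 1 bias = 3
```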

### 🚩 Exercise 2 (CORE): Testing your model

Generate "test" data uniformly over $x_1\in[-\pi, \pi]$ and $x_2\in[-1, 1+a]$. Use your trained model to predict the $y$ labels for this data and visualise the results. Overlay the original curves on the output. Is the result what you expect, and why?
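A sketch of the test-and-visualise step. It uses a regular grid rather than random uniform points (an assumption; a scatter plot of random points would work equally well) so the predicted probability can be drawn as a filled contour plot. `make_test_grid` and `plot_predictions` are hypothetical helper names; `plot_predictions` accepts any object with a Keras-style `predict(X, verbose=0)` method, such as the model trained in Exercise 1.

```python
import numpy as np
import matplotlib.pyplot as plt

def make_test_grid(a=1.0, n=100):
    """Regular n x n grid over x1 in [-pi, pi] and x2 in [-1, 1 + a]."""
    g1, g2 = np.meshgrid(np.linspace(-np.pi, np.pi, n), np.linspace(-1.0, 1.0 + a, n))
    X_test = np.column_stack([g1.ravel(), g2.ravel()]).astype("float32")  # shape (n*n, 2)
    return X_test, g1, g2

def plot_predictions(model, a=1.0, n=100):
    """Colour the grid by predicted P(y=1) and overlay the two generating curves."""
    X_test, g1, g2 = make_test_grid(a, n)
    p = model.predict(X_test, verbose=0).reshape(g1.shape)  # back to grid shape
    fig, ax = plt.subplots()
    cs = ax.contourf(g1, g2, p, levels=np.linspace(0.0, 1.0, 11), cmap="coolwarm")
    fig.colorbar(cs, ax=ax, label="predicted P(y=1)")
    x1 = np.linspace(-np.pi, np.pi, 200)
    ax.plot(x1, np.cos(x1), "k-", label="class 0: x2 = cos(x1)")
    ax.plot(x1, a + np.cos(x1), "k--", label="class 1: x2 = a + cos(x1)")
    ax.set_xlabel("x1")
    ax.set_ylabel("x2")
    ax.legend(loc="lower center")
    return fig
```

In the notebook, `plot_predictions(model)` draws the result for the logistic model; the same call works unchanged for the model of Exercise 3.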

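The prediction is clearly wrong. Without hidden layers, the model can only represent functions of the form $G(\mathbf{x}) = \sigma\left(\sum_n \alpha_n x_n + \theta\right)$, whose decision boundary $G(\mathbf{x}) = 0.5$ is the straight line $\sum_n \alpha_n x_n + \theta = 0$ in the $(x_1, x_2)$ plane. For $a=1$, a line separating the two classes would have to pass above $x_2 = 1$ at $x_1 = 0$ but below $x_2 = 0$ at $x_1 = \pm\pi$, which no straight line can do (its value at $x_1 = 0$ is the average of its values at $\pm\pi$). In other words, this pool of functions is too small to correctly approximate the true decision function $f(\mathbf{x})$.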

### 🚩 Exercise 3 (CORE): Adding a Hidden Layer

Now create a new model by adding a fully-connected hidden layer with 2 neurons between your input and output above. 

Train the new model and visualise the same test data from above.
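A minimal sketch of the model with one hidden layer of 2 units. The sigmoid hidden activation is an assumption (the exercise does not fix one; `"tanh"` or `"relu"` are also reasonable), and `n_epochs` in the commented training call is a placeholder for your own choice.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Same as the logistic model, plus one hidden layer of 2 units in between.
model_hidden = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(2, activation="sigmoid"),  # hidden layer: 2 x 2 weights + 2 biases
    layers.Dense(1, activation="sigmoid"),  # output P(y=1): 2 weights + 1 bias
])
model_hidden.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model_hidden.summary()  # 9 trainable parameters in total

# Train and visualise as before, e.g.
# history_hidden = model_hidden.fit(X, y, epochs=n_epochs, validation_split=0.2,
#                                   callbacks=[tensorboard_callback])
```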

# Part 2: MNIST Dataset and Study of the Hidden Layers

### Loading, exploring, and preparing the data

For the second part, we are going to use the MNIST dataset, partly because you are already familiar with it, and partly because it can be loaded directly through Keras, which is bundled with TensorFlow, one of the most widely used libraries for neural networks in Python.

a. Load the MNIST dataset using `keras.datasets.mnist.load_data()`, which already returns it split into training and test sets.

b. Print the shapes of the training and testing data and labels.

c. Display the first image in the training set and its label.

d. Normalize the pixel values of the training and testing data to be between 0 and 1.

e. Convert the labels to one-hot encoded vectors using `keras.utils.to_categorical()`.
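A sketch covering steps a. to e. The helper names `prepare_mnist`, `show_image` and `load_mnist` are hypothetical; only `keras.datasets.mnist.load_data()` and `keras.utils.to_categorical()` come from the text. `load_data()` downloads the data on its first call, so that part is wrapped in a function rather than run at import time.

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras

def prepare_mnist(x, y, num_classes=10):
    """d) scale uint8 pixels (0-255) to floats in [0, 1]; e) one-hot encode the labels."""
    x = np.asarray(x, dtype="float32") / 255.0
    y = keras.utils.to_categorical(y, num_classes)
    return x, y

def show_image(image, label):
    """c) display one 28x28 image with its label as the title."""
    fig, ax = plt.subplots()
    ax.imshow(image, cmap="gray")
    ax.set_title(f"label: {label}")
    ax.axis("off")
    return fig

def load_mnist():
    """a) load, b) print shapes, c) show image 0, then d)-e) preprocess both splits."""
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    # b) MNIST has 60000 training and 10000 test images of 28x28 pixels
    print("train:", x_train.shape, y_train.shape)
    print("test: ", x_test.shape, y_test.shape)
    show_image(x_train[0], y_train[0])  # shown before one-hot encoding
    return prepare_mnist(x_train, y_train), prepare_mnist(x_test, y_test)
```

In the notebook, `(x_train, y_train), (x_test, y_test) = load_mnist()` gives scaled images and one-hot labels; after encoding, the integer label of an image is recovered with `y_train[i].argmax()`.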

### 🚩 Exercise 4 (CORE): Building a Baseline Model

a. Build a sequential neural network model with one hidden layer of 128 neurons and an output layer with 10 neurons. Since each image is a 28×28 array, flatten it first (e.g. with a `Flatten` layer) so the dense layers receive 784-dimensional vectors. Let's also try using different activation functions (there is no real need to do this here, except to learn how to set them in keras). For example, use ReLU activation for the hidden layer and softmax for the output layer.

b. Compile the model using the Adam optimizer, categorical crossentropy loss, and the accuracy metric. You can do this with

```
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```

c. Train the model for 10 epochs with a batch size of 128 and a 10% validation split.
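A minimal sketch of a. to c. It assumes the labels are one-hot encoded as in the loading step, which is why the loss is categorical (not sparse categorical) cross-entropy. The training call is left commented because it needs the notebook's `x_train`, `y_train` and `tensorboard_callback`.

```python
from tensorflow import keras
from tensorflow.keras import layers

# a) 28x28 image -> 784-vector -> 128 ReLU units -> 10 softmax outputs
model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# b) one-hot labels -> categorical cross-entropy
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()  # 784*128 + 128 + 128*10 + 10 = 101,770 parameters

# c) 10 epochs, batch size 128, last 10% of the training set used for validation:
# history = model.fit(x_train, y_train, epochs=10, batch_size=128,
#                     validation_split=0.1, callbacks=[tensorboard_callback])
```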

### 🚩 Exercise 5 (CORE): Evaluating the Model

a. Evaluate the model on the test set and print the test loss and accuracy.

b. Plot the training and validation loss curves.

c. Generate and visualize a confusion matrix for the test set predictions.
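A sketch of the evaluation tools, using only numpy and matplotlib for the confusion matrix (scikit-learn's `confusion_matrix` or `tf.math.confusion_matrix` would do the same job). The helper names are hypothetical; rows of the matrix are true classes and columns are predicted classes.

```python
import numpy as np
import matplotlib.pyplot as plt

def confusion_matrix(y_true, y_pred, num_classes=10):
    """Counts with rows = true class and columns = predicted class."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def evaluate_model(model, x_test, y_test_onehot):
    """a) test loss and accuracy, plus the confusion matrix needed for c)."""
    test_loss, test_acc = model.evaluate(x_test, y_test_onehot, verbose=0)
    print(f"test loss: {test_loss:.4f}  test accuracy: {test_acc:.4f}")
    y_onehot = np.asarray(y_test_onehot)
    y_pred = model.predict(x_test, verbose=0).argmax(axis=1)  # most probable digit
    y_true = y_onehot.argmax(axis=1)
    return test_loss, test_acc, confusion_matrix(y_true, y_pred, y_onehot.shape[1])

def plot_loss_curves(history, title="single hidden layer"):
    """b) training vs validation loss per epoch from a Keras History object."""
    epochs = np.arange(1, len(history.history["loss"]) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, history.history["loss"], label="training loss")
    ax.plot(epochs, history.history["val_loss"], label="validation loss")
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss")
    ax.set_title(title)
    ax.legend()
    return fig

def plot_confusion_matrix(cm):
    """c) heat map of the confusion matrix with the count written in each cell."""
    fig, ax = plt.subplots()
    im = ax.imshow(cm, cmap="Blues")
    fig.colorbar(im, ax=ax)
    n = cm.shape[0]
    ax.set_xticks(range(n))
    ax.set_yticks(range(n))
    ax.set_xlabel("predicted label")
    ax.set_ylabel("true label")
    for i in range(n):
        for j in range(n):
            ax.text(j, i, cm[i, j], ha="center", va="center", fontsize=7)
    return fig
```

In the notebook: `loss, acc, cm = evaluate_model(model, x_test, y_test)`, then `plot_loss_curves(history)` and `plot_confusion_matrix(cm)`. Large off-diagonal entries show which digits the model confuses.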

### 🚩 Exercise 6 (CORE): Exploring Network Depth

a. Build a deeper network with two hidden layers, each with 128 neurons and ReLU activation, and an output layer with 10 neurons and softmax activation.

b. Compile and train the model as in Exercise 4.

c. Evaluate the model on the test set and compare the results with the single-layer model.

d. Plot the training and validation loss curves for the deep network.

e. Generate and visualize a confusion matrix for the deep network's test set predictions.
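A sketch of the two-hidden-layer network, named `model_deep` to match the next exercise. Only the extra `Dense(128)` layer differs from the single-layer model; it adds 128*128 + 128 = 16,512 parameters. The commented lines assume the notebook's `x_train`, `y_train`, `x_test`, `y_test`, and `history_deep` is a hypothetical name.

```python
from tensorflow import keras
from tensorflow.keras import layers

model_deep = keras.Sequential([
    keras.Input(shape=(28, 28)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),   # hidden layer 1
    layers.Dense(128, activation="relu"),   # hidden layer 2 (new)
    layers.Dense(10, activation="softmax"),
])
model_deep.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
print("parameters:", model_deep.count_params())  # 101,770 + 16,512 = 118,282

# Same training and evaluation settings as the single-layer model:
# history_deep = model_deep.fit(x_train, y_train, epochs=10, batch_size=128,
#                               validation_split=0.1)
# test_loss_deep, test_acc_deep = model_deep.evaluate(x_test, y_test, verbose=0)
```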

### 🚩 Exercise 7 (CORE): When to Stop Training?

Compare the training and validation loss curves for `model` (single hidden layer) and `model_deep` (two hidden layers).

a. In which scenario do you observe signs of overfitting? Explain your reasoning.

b. Based on these graphs, suggest a stopping criterion for training to prevent overfitting.

c. How does the depth of the network influence the point at which overfitting begins?
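One common stopping criterion is to stop at the epoch with the lowest validation loss. The sketch below shows it both as a hypothetical helper, `best_epoch`, applied to a recorded history, and as Keras' built-in `EarlyStopping` callback. The `patience=3` value is an arbitrary assumption, and `history` / `history_deep` in the comments stand for the History objects returned by your two `fit` calls.

```python
import numpy as np
from tensorflow import keras

def best_epoch(val_loss):
    """1-based epoch with the lowest validation loss: a natural point to stop."""
    return int(np.argmin(val_loss)) + 1

# The same idea as a callback: stop once val_loss has not improved for
# `patience` consecutive epochs and restore the weights from the best epoch.
early_stopping = keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=3,                  # assumption: tolerate 3 epochs without improvement
    restore_best_weights=True,
)

# Usage in the notebook (epochs then acts only as an upper bound):
# print(best_epoch(history.history["val_loss"]), best_epoch(history_deep.history["val_loss"]))
# history = model.fit(x_train, y_train, epochs=50, batch_size=128,
#                     validation_split=0.1, callbacks=[early_stopping])
```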

### 🚩 Exercise 8 (CORE): Going Deep

Let's now test the conclusions of the previous exercise by increasing the number of hidden layers further. We expect the trends observed when going from one to two hidden layers to become even more pronounced.

a. Build a neural network with 10 hidden layers, each with 128 neurons and ReLU activation, and an output layer with 10 neurons and softmax activation.

b. Compile and train the model for 20 epochs with a batch size of 128 and a 10% validation split.

c. Evaluate the model on the test set and plot the training and validation loss curves.

d. Discuss any challenges encountered during training and potential solutions.
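With ten identical hidden layers, a loop is easier than listing each layer. A minimal sketch, where `build_deep_mnist_model` and `history_10` are hypothetical names and the commented call assumes the notebook's `x_train`, `y_train`:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_deep_mnist_model(n_hidden_layers=10, units=128):
    """Flatten -> n_hidden_layers x Dense(units, relu) -> Dense(10, softmax)."""
    model = keras.Sequential([keras.Input(shape=(28, 28)), layers.Flatten()])
    for _ in range(n_hidden_layers):
        model.add(layers.Dense(units, activation="relu"))
    model.add(layers.Dense(10, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model

model_10 = build_deep_mnist_model()
print("parameters:", model_10.count_params())  # 100,480 + 9 * 16,512 + 1,290 = 250,378

# b) 20 epochs, batch size 128, 10% validation split:
# history_10 = model_10.fit(x_train, y_train, epochs=20, batch_size=128,
#                           validation_split=0.1)
```

The network now has about 2.5 times the parameters of the single-hidden-layer model, which is worth keeping in mind when discussing d.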

### 🚩 Exercise 9 (EXTRA): Regularisation Techniques (10 Layers Deep)

We briefly touched on regularisation on Monday: techniques that limit the effective complexity of a network to counter overfitting. Here, let's implement dropout regularization, which randomly ignores ("drops out") a fraction of the units (neurons) of a layer at every training step, so the network cannot rely too heavily on any single unit. Look up how the technique works before having a go. In keras it is implemented by the `Dropout` layer, which takes a rate $p$ between 0 and 1: during training it randomly sets each input unit of that layer to 0 with probability $p$ and scales the remaining units by $1/(1-p)$, so the expected sum of the inputs is unchanged. At inference time the layer does nothing.
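A small experiment to see what a `Dropout` layer does to its input, assuming an input of 10,000 ones so the fraction of zeroed units is easy to read off. The exact fraction varies from run to run because the mask is random.

```python
import numpy as np
from tensorflow.keras import layers

dropout = layers.Dropout(rate=0.2)          # p = 0.2
x = np.ones((1, 10_000), dtype="float32")  # 10,000 identical input units

out_train = np.asarray(dropout(x, training=True))   # training step: random units zeroed
out_infer = np.asarray(dropout(x, training=False))  # inference: layer is the identity

print("fraction set to 0:", np.mean(out_train == 0))               # should be near p = 0.2
print("surviving values:", np.unique(out_train[out_train != 0]))   # scaled by 1 / (1 - p)
print("unchanged at inference:", np.array_equal(out_infer, x))
```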

a. Implement dropout regularization in the 10-layer deep network after each hidden layer with a dropout rate of $p=0.2$.

b. Train the regularized model for 20 epochs and compare the training and validation loss curves with the original 10-layer deep model.

c. Discuss the impact of dropout regularization on the deep network's performance and generalization.
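A minimal sketch of a., assuming a `Dropout(0.2)` layer directly after each of the ten hidden layers; `build_dropout_model` and `history_dropout` are hypothetical names. Dropout has no trainable weights, so the parameter count matches the plain 10-layer model.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_dropout_model(n_hidden_layers=10, units=128, rate=0.2):
    """n_hidden_layers x (Dense(units, relu) -> Dropout(rate)) -> Dense(10, softmax)."""
    model = keras.Sequential([keras.Input(shape=(28, 28)), layers.Flatten()])
    for _ in range(n_hidden_layers):
        model.add(layers.Dense(units, activation="relu"))
        model.add(layers.Dropout(rate))  # zeroes units only while training
    model.add(layers.Dense(10, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model

model_dropout = build_dropout_model()
print("parameters:", model_dropout.count_params())  # same 250,378 as without dropout

# b) same settings as the plain 10-layer model, for a like-for-like comparison:
# history_dropout = model_dropout.fit(x_train, y_train, epochs=20, batch_size=128,
#                                     validation_split=0.1)
```

When comparing curves, note that Keras computes the training loss with dropout active but the validation loss with dropout switched off, so the training curve can sit above the validation curve.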

**Discussion:**



# Completing the Worksheet

At this point you have hopefully been able to complete all the CORE exercises and attempted the EXTRA ones. Now 
is a good time to check the reproducibility of this document by restarting the notebook's
kernel and rerunning all cells in order.

Before generating the PDF, please go to Edit -> Edit Notebook Metadata and change 'Student 1' and 'Student 2' in the **name** attribute to include your name. If you are unable to edit the Notebook Metadata, please add a Markdown cell at the top of the notebook with your name(s).

Once that is done and you are happy with everything, run the following cell
to generate your PDF. Once it is generated, please submit the PDF on the Learn page by 16:00 on the Friday of the week the workshop was given.

In [None]:
!jupyter nbconvert --to pdf mlp_week09_workshop.ipynb 