# Week 9: Neural networks I 
[Jacob Page](jacob-page.com)

In this workshop we will work through a series of model problems using (fairly shallow) fully connected neural networks. You will see how appropriately designed networks can construct complicated functions by visualising the internal (learnt) representations in the hidden layer(s). 

Today's tasks are based on the excellent blog post by Chris Olah (https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/), which also links to further resources on network visualisation.

---


As you work through the problems it will help to refer to your lecture notes. The exercises here are designed to reinforce the topics covered this week. The lecture notes include a small amount of documentation on the keras library, but please ask/discuss with the tutors if you get stuck, even early on! This may be the first time many of you have seen keras, and things may be a little counter-intuitive initially.


# Imports

We're only going to need a couple of standard libraries this week, as well as keras. 

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import tensorflow.keras as keras

The following code cells let you visualise your model training in TensorBoard. Scroll back up to take a look once you reach a `model.fit` statement! (You'll need to refresh the dashboard with the refresh button at the top right.)

In [None]:
import os, datetime
logdir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))

In [None]:
%load_ext tensorboard
%tensorboard --port=5036 --logdir $logdir
tensorboard_callback = keras.callbacks.TensorBoard(logdir, histogram_freq=1) 

# Exercise 0
In the first part of this workshop we will work with a "clean" dataset, generating data from purely deterministic functions.

First, generate data according to 
\begin{equation}
  x_2 = \cos(x_1),
\end{equation}
with $x_1 \in [-\pi, \pi]$. This data will be labelled as belonging to class $y=0$.

For data in class $y=1$, generate 
\begin{equation}
  x_2 = a + \cos(x_1),
\end{equation}
where $a=1$ for now.

You should generate ~2000 samples of $x_1$, drawn from a uniform distribution, $x_1\sim U[-\pi,\pi]$.

For use in keras, it helps to build numpy arrays of the following shape
$$X.\text{shape}=(N, D)$$
with a corresponding set of labels
$$y.\text{shape}=(N,)$$
where $N$ is the number of samples and $D$ the number of features ($D=2$ here).
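A minimal numpy sketch of the Exercise 0 data, assuming the ~2000 samples are split evenly between the two classes (1000 each) and that each curve gets its own draw of $x_1$. The seed and the final shuffle are choices made here, not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded generator so results are reproducible

a = 1.0              # vertical offset of the y=1 curve
n_per_class = 1000   # assumption: ~2000 samples in total, split evenly between classes

# Class y=0: points on the curve x2 = cos(x1), with x1 ~ U[-pi, pi]
x1_0 = rng.uniform(-np.pi, np.pi, size=n_per_class)
X0 = np.column_stack([x1_0, np.cos(x1_0)])

# Class y=1: points on the shifted curve x2 = a + cos(x1)
x1_1 = rng.uniform(-np.pi, np.pi, size=n_per_class)
X1 = np.column_stack([x1_1, a + np.cos(x1_1)])

# Stack into the (N, D) feature array and the (N,) label vector expected by keras
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

# Shuffle: keras' validation_split takes the *last* fraction of the arrays,
# so unshuffled data would put only class-1 samples in the validation set
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]

print(X.shape, y.shape)  # (2000, 2) (2000,)
```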

# Exercise 1
Build a logistic regression model in keras. 
The model should consist of an input layer and a fully-connected output layer. 
See lecture notes for details of how to create these objects, or ask your tutors. 
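One way to write this, sketched with the keras `Sequential` API (the lecture notes may use the functional API instead; both describe the same model). Logistic regression is a single fully-connected unit with a sigmoid activation, $p(y=1\mid\mathbf x)=\sigma(\mathbf w\cdot\mathbf x+b)$.

```python
import tensorflow.keras as keras

# Logistic regression: one Dense unit with a sigmoid activation acting on the D=2 inputs
model = keras.Sequential([
    keras.Input(shape=(2,)),                      # input layer: D = 2 features
    keras.layers.Dense(1, activation="sigmoid"),  # fully-connected output layer
])

model.summary()  # 2 weights + 1 bias = 3 trainable parameters
```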

# Exercise 2 
Compile the model. At this stage you need to select a loss function (specified via the `loss` keyword) and an optimizer. Any optimizer will do; you could use one of the "exciting" ones, e.g. Adam.
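For a single sigmoid output with 0/1 labels, binary cross-entropy is the natural loss: it is the negative log-likelihood that logistic regression minimises. A sketch of the compile step follows; the Adam learning rate of `1e-2` is an arbitrary assumption, and the model is rebuilt so the cell runs on its own.

```python
import tensorflow.keras as keras

# Same logistic-regression model as in Exercise 1 (rebuilt so this cell runs on its own)
model = keras.Sequential([keras.Input(shape=(2,)), keras.layers.Dense(1, activation="sigmoid")])

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-2),  # learning rate is an arbitrary choice
    loss="binary_crossentropy",  # matches a single sigmoid output and 0/1 labels
    metrics=["accuracy"],        # reported alongside the loss during training
)
```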

# Exercise 3 
Train the model with `model.fit`. Pass the keyword argument

```
callbacks=[tensorboard_callback]
```

to visualise training in the TensorBoard dashboard above.

You might also want to split the dataset into a training and validation component via

```
validation_split=0.X
```
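A self-contained sketch of the training call. The data line uses the shortcut $x_2=\cos x_1 + a\,y$, which reproduces both Exercise 0 curves at once; the epoch count and batch size are assumptions to tune. Keras takes `validation_split` from the *last* samples before any shuffling, which is why the data are shuffled first.

```python
import numpy as np
import tensorflow.keras as keras

# Exercise 0 data, regenerated compactly (a = 1, 1000 points per class)
rng = np.random.default_rng(0)
a, n = 1.0, 1000
x1 = rng.uniform(-np.pi, np.pi, size=2 * n)
y = np.concatenate([np.zeros(n), np.ones(n)])
X = np.column_stack([x1, np.cos(x1) + a * y])  # class-1 points are shifted up by a
perm = rng.permutation(2 * n)
X, y = X[perm], y[perm]

# Logistic-regression model from Exercises 1-2
model = keras.Sequential([keras.Input(shape=(2,)), keras.layers.Dense(1, activation="sigmoid")])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-2),
              loss="binary_crossentropy", metrics=["accuracy"])

history = model.fit(
    X, y,
    epochs=20,             # assumption: enough to see the loss flatten; adjust freely
    batch_size=32,
    validation_split=0.2,  # hold out the last 20% of the (shuffled) data
    verbose=0,
    # in the notebook, also pass callbacks=[tensorboard_callback] to log to TensorBoard
)

print(sorted(history.history.keys()))
```

`history.history` holds one value per epoch for each metric, which is handy for plotting loss curves outside TensorBoard.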



# Exercise 4
Generate "test" data uniformly over $x_1\in[-\pi, \pi]$ and $x_2\in[-1, 1+a]$. Use your trained model to predict the $y$ labels for this data and visualise the results. Overlay the original curves on the output -- is the result what you expect, and why?
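A sketch of the Exercise 4 workflow, written as two hypothetical helpers (`make_test_data` and `plot_predictions` are names introduced here) so they can be reused for every model in the workshop. The number of test points and the colour map are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

def make_test_data(a, n=10_000, seed=1):
    """Hypothetical helper: n points uniform over x1 in [-pi, pi], x2 in [-1, 1 + a]."""
    rng = np.random.default_rng(seed)
    x1 = rng.uniform(-np.pi, np.pi, size=n)
    x2 = rng.uniform(-1.0, 1.0 + a, size=n)
    return np.column_stack([x1, x2])

def plot_predictions(model, X_test, a):
    """Hypothetical helper: colour test points by predicted p(y=1), overlay the true curves."""
    p = model.predict(X_test, verbose=0).ravel()  # sigmoid output, shape (n,)
    fig, ax = plt.subplots(figsize=(6, 4))
    sc = ax.scatter(X_test[:, 0], X_test[:, 1], c=p, s=2, cmap="coolwarm", vmin=0, vmax=1)
    fig.colorbar(sc, ax=ax, label="predicted p(y=1)")
    x = np.linspace(-np.pi, np.pi, 200)
    ax.plot(x, np.cos(x), "k-", label=r"y=0: $x_2=\cos x_1$")
    ax.plot(x, a + np.cos(x), "k--", label=r"y=1: $x_2=a+\cos x_1$")
    ax.set_xlabel("$x_1$")
    ax.set_ylabel("$x_2$")
    ax.legend(loc="upper right")
    plt.show()
    return p

X_test = make_test_data(a=1.0)
```

With the trained model from Exercise 3 in scope, `plot_predictions(model, X_test, a=1.0)` draws the predicted probability underneath the two curves.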

# Exercise 5
Now create a new model by adding a fully-connected hidden layer with 2 neurons between your input and output above. 

Train the new model and visualise the same test data from above. 
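A sketch of the one-hidden-layer network, assuming a `tanh` hidden activation (the nonlinearity used throughout Olah's post; other choices also work). It uses the keras functional API and names the hidden layer `"hidden"`; both are choices made here so the hidden layer is easy to extract in the next exercise.

```python
import numpy as np
import tensorflow.keras as keras

# Exercise 0 data again (a = 1), built compactly so the cell runs on its own
rng = np.random.default_rng(0)
a, n = 1.0, 1000
x1 = rng.uniform(-np.pi, np.pi, size=2 * n)
y = np.concatenate([np.zeros(n), np.ones(n)])
X = np.column_stack([x1, np.cos(x1) + a * y])
perm = rng.permutation(2 * n)
X, y = X[perm], y[perm]

# Input (2) -> hidden Dense(2, tanh) -> output Dense(1, sigmoid)
inputs = keras.Input(shape=(2,))
hidden = keras.layers.Dense(2, activation="tanh", name="hidden")(inputs)
outputs = keras.layers.Dense(1, activation="sigmoid", name="output")(hidden)
model = keras.Model(inputs=inputs, outputs=outputs)

model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-2),
              loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(X, y, epochs=50, batch_size=32, validation_split=0.2, verbose=0)
print(f"final validation accuracy: {history.history['val_accuracy'][-1]:.3f}")
```

The `plot_predictions`-style visualisation from Exercise 4 applies unchanged to this model.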

# Exercise 6 
Create a new model featuring only your input and hidden layers that were trained above. Visualise the output of the two hidden neurons for the training and test datasets. What does the decision boundary (that separates y=0 from y=1) look like in this representation?
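Because functional-API layers can be reused, a second `keras.Model` that stops at the hidden layer shares the trained weights rather than copying them. The sketch below wraps this in a hypothetical helper, `hidden_model_from`, assuming the hidden layer was named `"hidden"` as in the Exercise 5 sketch; the untrained stand-in network only exists so the cell runs on its own.

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow.keras as keras

def hidden_model_from(model, layer_name="hidden"):
    """Hypothetical helper: a Model mapping the inputs to the output of `layer_name`.

    The new model reuses (does not copy) the layers of `model`, so it always
    sees whatever weights `model` currently has.
    """
    return keras.Model(inputs=model.inputs, outputs=model.get_layer(layer_name).output)

# Stand-in for the Exercise 5 network: untrained here, so the picture is meaningless.
# In the notebook, pass your trained model instead.
inputs = keras.Input(shape=(2,))
hidden = keras.layers.Dense(2, activation="tanh", name="hidden")(inputs)
model = keras.Model(inputs, keras.layers.Dense(1, activation="sigmoid")(hidden))

hidden_model = hidden_model_from(model)
X_demo = np.random.default_rng(2).uniform(-np.pi, np.pi, size=(500, 2))
H = hidden_model.predict(X_demo, verbose=0)  # shape (500, 2): one column per hidden neuron

fig, ax = plt.subplots(figsize=(4, 4))
ax.scatter(H[:, 0], H[:, 1], s=2)
ax.set_xlabel("hidden neuron 1")
ax.set_ylabel("hidden neuron 2")
plt.show()
```

Colouring the scatter by the class labels (`c=y`) for the training data, or by `model.predict` for the test data, makes the boundary in this representation visible.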

# Exercise 7
Now we're going to try and break the 2-hidden-neuron classifier above. We'll do this by generating a dataset for which there is no way for the two neurons in the hidden layer to create a linear decision boundary. 

Generate two-dimensional, normally distributed data (class $y=0$), according to $\mathbf X \sim N(\mathbf 0, \mathbf I_{2\times 2})$.

For the second class, $y=1$, generate one-dimensional data $r \sim N(a,\sigma^2)$, with $a=4$ and $\sigma=0.2$, then set $x_1 = r\cos \theta$, $x_2=r\sin \theta$, with $\theta \sim U[0, 2\pi)$ (this does not actually give a normal distribution across the annulus, but it's fine for our purposes).

Visualise the results to verify things are behaving as expected.
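A numpy sketch of the Exercise 7 data, assuming 1000 points per class; as before, the seed and the shuffle are choices made here.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)
n_per_class = 1000  # assumption: same size as Exercise 0
a, sigma = 4.0, 0.2

# Class y=0: X ~ N(0, I_2)
X0 = rng.standard_normal(size=(n_per_class, 2))

# Class y=1: radius r ~ N(a, sigma^2), angle theta ~ U[0, 2*pi)
r = rng.normal(loc=a, scale=sigma, size=n_per_class)
theta = rng.uniform(0.0, 2.0 * np.pi, size=n_per_class)
X1 = np.column_stack([r * np.cos(theta), r * np.sin(theta)])

X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])
perm = rng.permutation(len(y))  # shuffle before relying on validation_split
X, y = X[perm], y[perm]

fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(X[y == 0, 0], X[y == 0, 1], s=2, label="y=0 (Gaussian blob)")
ax.scatter(X[y == 1, 0], X[y == 1, 1], s=2, label="y=1 (annulus)")
ax.set_aspect("equal")
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.legend()
plt.show()
```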

# Exercise 8 
Train the simple neural network you defined above on this new dataset, and visualise the results on a new test dataset of uniformly distributed points. Where is the decision boundary? (This will likely differ each time you train.)

Of course, we should really generate test data using the same distributions as the training set, but for visualisation it's helpful to have lots of data all over the $x_1, x_2$ plane. 

Visualise the output of the hidden layer.

(Aside -- explore how many epochs you need to train the model; 1 epoch=1 pass through the training data)
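A self-contained sketch of Exercise 8: the 2-hidden-neuron network from Exercise 5 trained on the blob-plus-annulus data, then evaluated on uniform test points over an assumed box $[-6,6]^2$. The left panel shows the predicted probability in the input plane, the right panel the same points in the hidden-layer representation. The epoch count is an assumption; re-running with different values is one way to tackle the aside.

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow.keras as keras

# Blob-plus-annulus data from Exercise 7, regenerated so the cell runs on its own
rng = np.random.default_rng(0)
n, a, sigma = 1000, 4.0, 0.2
r = rng.normal(a, sigma, size=n)
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
X = np.vstack([rng.standard_normal((n, 2)),
               np.column_stack([r * np.cos(theta), r * np.sin(theta)])])
y = np.concatenate([np.zeros(n), np.ones(n)])
perm = rng.permutation(2 * n)
X, y = X[perm], y[perm]

# The 2-hidden-neuron network (functional API, hidden layer named "hidden")
inputs = keras.Input(shape=(2,))
hidden = keras.layers.Dense(2, activation="tanh", name="hidden")(inputs)
outputs = keras.layers.Dense(1, activation="sigmoid")(hidden)
model = keras.Model(inputs, outputs)
hidden_model = keras.Model(inputs, hidden)  # shares the layers, so it sees the trained weights

model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-2),
              loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(X, y, epochs=50, batch_size=32, validation_split=0.2, verbose=0)

# Uniform test points over a box containing both classes (for visualisation only)
X_test = rng.uniform(-6.0, 6.0, size=(10_000, 2))
p_test = model.predict(X_test, verbose=0).ravel()  # predicted p(y=1)
H_test = hidden_model.predict(X_test, verbose=0)   # (10000, 2) hidden-layer outputs

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(10, 4.5))
ax0.scatter(X_test[:, 0], X_test[:, 1], c=p_test, s=2, cmap="coolwarm", vmin=0, vmax=1)
ax0.set_title("input space, coloured by p(y=1)")
ax1.scatter(H_test[:, 0], H_test[:, 1], c=p_test, s=2, cmap="coolwarm", vmin=0, vmax=1)
ax1.set_title("hidden-layer space")
plt.show()
```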

# Exercise 9 
What is the simplest modification to your architecture that would be able to classify this dataset? 

Attempt to verify your predictions following a similar workflow to the above. 
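One candidate worth testing, following the topological argument in Olah's post, is to widen the hidden layer from 2 to 3 neurons so the data can be lifted into three dimensions before the linear output layer acts. The sketch below (the helper `build_classifier` is hypothetical) trains both widths on the same data and plots the 3-neuron hidden representation in 3D. Treat it as a hypothesis to check: an unlucky initialisation or too few epochs can still leave the wider network stuck.

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow.keras as keras

# Exercise 7 data (a = 4), regenerated so the cell runs on its own
rng = np.random.default_rng(0)
n, a, sigma = 1000, 4.0, 0.2
r = rng.normal(a, sigma, size=n)
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
X = np.vstack([rng.standard_normal((n, 2)),
               np.column_stack([r * np.cos(theta), r * np.sin(theta)])])
y = np.concatenate([np.zeros(n), np.ones(n)])
perm = rng.permutation(2 * n)
X, y = X[perm], y[perm]

def build_classifier(n_hidden):
    """Hypothetical helper: input(2) -> Dense(n_hidden, tanh) -> Dense(1, sigmoid).

    Returns the compiled classifier and a sub-model that outputs the hidden layer.
    """
    inputs = keras.Input(shape=(2,))
    hidden = keras.layers.Dense(n_hidden, activation="tanh", name="hidden")(inputs)
    outputs = keras.layers.Dense(1, activation="sigmoid")(hidden)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-2),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model, keras.Model(inputs, hidden)

val_acc = {}
for n_hidden in (2, 3):
    model, hidden_model = build_classifier(n_hidden)
    history = model.fit(X, y, epochs=50, batch_size=32, validation_split=0.2, verbose=0)
    val_acc[n_hidden] = history.history["val_accuracy"][-1]
print(val_acc)

# 3D view of the 3-neuron hidden representation (hidden_model is from the last loop pass)
H = hidden_model.predict(X, verbose=0)
ax = plt.figure(figsize=(6, 5)).add_subplot(projection="3d")
ax.scatter(H[:, 0], H[:, 1], H[:, 2], c=y, s=2, cmap="coolwarm")
ax.set_xlabel("h1")
ax.set_ylabel("h2")
ax.set_zlabel("h3")
plt.show()
```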

# Exercise 10 (optional)
Now redefine your data to overlap by setting $a=2$.

Complicate your network a little: add extra layers, more neurons, etc. Train and watch the validation loss. Can you make your model overfit? How might you counteract this? You may want to read the keras documentation, e.g. https://keras.io/api/layers/, to decide exactly how to implement your ideas (e.g. custom losses, dropout layers...).
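A sketch of one way to explore overfitting, under several assumptions: three 128-unit ReLU layers as the deliberately over-sized network, and dropout (rate 0.3), an L2 weight penalty (`1e-4`) and early stopping on `val_loss` as the countermeasures. The helper `build_big_model` and all these numbers are illustrative choices, not recommended settings.

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow.keras as keras

# Exercise 7 data, but with a = 2 so the annulus overlaps the Gaussian blob
rng = np.random.default_rng(0)
n, a, sigma = 1000, 2.0, 0.2
r = rng.normal(a, sigma, size=n)
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
X = np.vstack([rng.standard_normal((n, 2)),
               np.column_stack([r * np.cos(theta), r * np.sin(theta)])])
y = np.concatenate([np.zeros(n), np.ones(n)])
perm = rng.permutation(2 * n)
X, y = X[perm], y[perm]

def build_big_model(dropout_rate=0.0, l2=0.0):
    """Hypothetical over-sized classifier; dropout_rate and l2 switch regularisation on."""
    reg = keras.regularizers.l2(l2) if l2 > 0 else None
    model = keras.Sequential([keras.Input(shape=(2,))])
    for _ in range(3):
        model.add(keras.layers.Dense(128, activation="relu", kernel_regularizer=reg))
        if dropout_rate > 0:
            model.add(keras.layers.Dropout(dropout_rate))  # active only during training
    model.add(keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

fit_kwargs = dict(epochs=60, batch_size=32, validation_split=0.2, verbose=0)
histories = {
    # No regularisation: watch for val_loss rising while the training loss keeps falling
    "plain": build_big_model().fit(X, y, **fit_kwargs).history,
    # Dropout + L2, plus early stopping that restores the weights with the lowest val_loss
    "regularised": build_big_model(dropout_rate=0.3, l2=1e-4).fit(
        X, y, callbacks=[keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=10, restore_best_weights=True)],
        **fit_kwargs).history,
}

fig, ax = plt.subplots(figsize=(6, 4))
for name, h in histories.items():
    ax.plot(h["loss"], label=f"{name}: train")
    ax.plot(h["val_loss"], "--", label=f"{name}: validation")
ax.set_xlabel("epoch")
ax.set_ylabel("loss")
ax.legend()
plt.show()
```

Keras adds the L2 penalty to the reported `loss` and `val_loss`, so with `l2 > 0` those numbers sit slightly above the plain cross-entropy; the `accuracy`/`val_accuracy` entries in each history give a penalty-free comparison.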