# Exercise 3

Work on this before the next lecture on 1 May. We will talk about questions, comments, and solutions during the exercise after the third lecture.

Please do form study groups! When you do, make sure you can explain everything in your own words; do not simply copy and paste from others.

The solutions to a lot of these problems can probably be found with Google. Please don't. You will not learn a lot by copying and pasting from the internet.

If you want to get credit/examination for this course, please upload your work to **your GitHub repository** for this course **before** the next lecture starts and post a link to your repository [in this thread](). If you worked on things together with others, please add their names to the notebook so we can see who formed groups.

---

These are some useful default imports for plotting and [`numpy`](http://www.numpy.org/):

In [1]:
%config InlineBackend.figure_format='retina'
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (8, 8)
plt.rcParams["font.size"] = 14
from sklearn.utils import check_random_state

---

## Question 1

Finding analytic gradients of expressions. Use the backprop framework we built in the lecture to compute the analytic gradient of an expression. This is a useful way to start thinking about
writing computations as graphs and to see first-hand that there is no magic involved
in automatically finding derivatives.

For example, take the expression $f(x) = \sin(x^2)$: implement a module for $\sin(x)$, build the graph representing the function, and plot the expression as well as its gradient as a function of $x$.

* add a new operation (e.g. $\sin, \cos, \exp, x^y$, ...) as a module alongside `Add` and `Multiply`
* build a new expression using the available expressions
* plot your expression as well as its gradient
* compare the gradient to one you worked out by hand (or some other software package)

P.S. If you want to use automatic differentiation for serious projects outside of neural
network projects, check out packages like https://github.com/HIPS/autograd
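For the comparison step in the list above, one option besides working the derivative out by hand is a numerical check. The following is a minimal sketch, assuming a central finite difference is accurate enough for this purpose; the helper names `numerical_gradient` and `analytic_gradient` are made up for illustration and are not part of the lecture framework.

```python
import numpy as np


def numerical_gradient(func, x, eps=1e-6):
    """Central finite-difference estimate of d func / d x, element-wise."""
    return (func(x + eps) - func(x - eps)) / (2 * eps)


def f(x):
    # the example expression from the text: f(x) = sin(x^2)
    return np.sin(x ** 2)


def analytic_gradient(x):
    # chain rule worked out by hand: df/dx = 2 x cos(x^2)
    return 2 * x * np.cos(x ** 2)


x = np.linspace(-2.0, 2.0, 9)
max_error = np.max(np.abs(numerical_gradient(f, x) - analytic_gradient(x)))
print(max_error)  # should be tiny compared to the gradient values themselves
```

The same `numerical_gradient` helper can be pointed at the output of any graph you build, which makes sign errors in a `backward` method easy to spot.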

In [2]:
class Multiply:
    def forward(self, x, y):
        self.x = x
        self.y = y
        return x * y
    
    def backward(self, dLdz):
        dzdx = self.y
        dLdx = dLdz * dzdx
        
        dzdy = self.x
        dLdy = dLdz * dzdy
        return [dLdx, dLdy]


class Add:
    def forward(self, x, y):
        self.x = x
        self.y = y
        return x + y
        
    def backward(self, dLdz):
        dzdx = 1
        dzdy = 1
        return [dLdz * dzdx, dLdz * dzdy]

In [8]:
def f_with_gradients(x, y, z):
    """ Implements (x+y)*z """
    q = Add()
    f = Multiply()
    
    q_out = q.forward(x, y)
    f_out = f.forward(q_out, z)
    
    # suppose L = u
    # u = (x+y)*z
    # let f(a,b) = a*b
    # q(a,b) = a + b
    # then
    # L = f( q(x,y), z )
    # dL = [ df/dq*dq/dx, df/dq*dq/dy, df/dz]
    # dL = [ z, z, x+y]
    grad_f = f.backward(1.)
    grad_q = q.backward(grad_f[0])

    # return [dL/dx, dL/dy, dL/dz]
    gradients = [grad_q[0], grad_q[1], grad_f[1]]
    return f_out, gradients

In [11]:
x = np.array([0.,1.,2.])
y = np.array([1.,3.,5.])
z = np.array([0.,1.,2.])
L = f_with_gradients(x, y, z)
# L = (x+y)*z
# dL = [ df/dq*dq/dx, df/dq*dq/dy, df/dz]
# dL = [ z, z, x+y]
print(L)

(array([  0.,   4.,  14.]), [array([ 0.,  1.,  2.]), array([ 0.,  1.,  2.]), array([ 1.,  4.,  7.])])


In [12]:
class Square:
    def forward(self, x):
        self.x = x
        return x**2
    
    def backward(self, dLdz):
        dzdx = 2*self.x
        return dLdz*dzdx
        
class Sin:
    def forward(self,x):
        self.x = x
        return np.sin(x)

    def backward(self, dLdz):
        dzdx = np.cos(self.x)  # d/dx sin(x) = cos(x)
        return dLdz*dzdx

In [14]:
def graph_sin_x_squared(x):
    f = Sin()
    q = Square()
    
    q_out = q.forward(x)
    f_out = f.forward(q_out)
    
    grad_f = f.backward(1.)
    grad   = q.backward(grad_f)
    
    return f_out, grad

Test by computing
$f = \sin(x^2), x = \sqrt{\pi}$

which should yield
$f = \sin(\pi) = 0$
and
$\frac{df}{dx} = 2x\cos(x^2) = 2\sqrt{\pi}\cos(\pi) = -2\sqrt{\pi}$

In [21]:
#print(-2.0*np.sqrt(np.pi))
print(graph_sin_x_squared(np.sqrt(np.pi)))
assert( np.abs( graph_sin_x_squared(np.sqrt(np.pi))[1] / (-2.0*np.sqrt(np.pi)) - 1) < 0.001 )

(5.6655388976479796e-16, -3.5449077018110318)
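The question also asks for a plot of the expression and its gradient. A possible sketch follows; compact versions of `Square` and `Sin` are repeated so the block runs on its own, and the helper name `sin_x_squared`, the plotting range, and the styling are arbitrary choices made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt


class Square:
    def forward(self, x):
        self.x = x
        return x ** 2

    def backward(self, dLdz):
        return dLdz * 2 * self.x


class Sin:
    def forward(self, x):
        self.x = x
        return np.sin(x)

    def backward(self, dLdz):
        return dLdz * np.cos(self.x)


def sin_x_squared(x):
    """Forward and backward pass through the graph sin(square(x))."""
    square, sin = Square(), Sin()
    value = sin.forward(square.forward(x))
    grad = square.backward(sin.backward(np.ones_like(x)))
    return value, grad


xs = np.linspace(-3.0, 3.0, 400)
values, grads = sin_x_squared(xs)

fig, ax = plt.subplots()
ax.plot(xs, values, label=r"$f(x) = \sin(x^2)$")
ax.plot(xs, grads, label=r"$df/dx$ from backprop")
# hand-derived gradient drawn on top as a visual cross-check
ax.plot(xs, 2 * xs * np.cos(xs ** 2), "k--", lw=1, label=r"$2x\cos(x^2)$ by hand")
ax.set_xlabel("x")
ax.legend()
```

If the dashed curve does not sit exactly on top of the backprop gradient, one of the `backward` methods is likely wrong.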


---

## Question 2

Use the circle data set from exercise 1 and build a neural network classifier that can solve the problem (scikit-learn provides `MLPClassifier`, which implements a neural network). Comment on:

* what is the minimum number of layers?
* what is the minimum width of each layer?
* does the answer change if you provide polynomial features?
* thinking about how NNs distort and transform the problem space, can you create
  a visualisation of what it is the network is doing to make this problem linearly
  separable?
* is there a difference between using the tanh and ReLU activation functions?
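As a possible starting point for the questions above, here is a hedged sketch. It assumes `sklearn.datasets.make_circles` is an acceptable stand-in for the circle data set from exercise 1 (the original data is not shown here), and the layer width, solver, and noise level are guesses to experiment with rather than answers. The last lines recompute the hidden-layer activations from the fitted weights, which is one way to look at how the network transforms the input space.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Assumption: make_circles stands in for the circle data set from exercise 1.
X, y = make_circles(n_samples=400, noise=0.1, factor=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer with three tanh units; both numbers are starting guesses.
clf = MLPClassifier(hidden_layer_sizes=(3,), activation="tanh",
                    solver="lbfgs", max_iter=2000, random_state=0)
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print(test_accuracy)

# Hidden-layer representation of every point: tanh(X W + b).
# With three hidden units this can be shown as a 3D scatter plot.
hidden = np.tanh(X @ clf.coefs_[0] + clf.intercepts_[0])
print(hidden.shape)
```

Changing `hidden_layer_sizes` and `activation` and watching `test_accuracy` is a quick way to probe the minimum-width question. For ReLU, replace `np.tanh` with `np.maximum(0, ...)` when recomputing the hidden activations.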

In [None]:
# your answer

---

## Question 2.5

Use the spiral data set to build a neural network classifier that can solve the problem (scikit-learn provides `MLPClassifier`, which implements a neural network). Comment on:

* what is the minimum number of layers?
* what is the minimum width of each layer?
* is there a difference between using the tanh and ReLU activation functions?
* (bonus) build the same neural network using `keras` instead.
* (bonus) can you create a visualisation to show how the network transforms the problem? (Tim isn't sure this can be done, so don't spend forever on this.)

In [None]:
def make_spiral():
    N = 100 # number of points per class
    K = 3 # number of classes
    X = np.zeros((N*K, 2)) # data matrix (each row = single example)
    y = np.zeros(N*K, dtype='uint8') # class labels
    for j in range(K):
        ix = range(N*j, N*(j+1))
        r = np.linspace(0.0, 1, N) # radius
        t = np.linspace(j*4, (j+1)*4, N) + np.random.randn(N)*0.2 # theta
        X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
        y[ix] = j

    return X, y
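The generator above draws its noise from NumPy's global random state, so every call gives a different data set. When comparing architectures it can help to fix the data. Below is a sketch of a seeded variant built on `check_random_state`, followed by a small loop comparing the two activation functions; the name `make_spiral_seeded` and the `(50, 50)` layer sizes are assumptions chosen for illustration, not recommended answers.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.utils import check_random_state


def make_spiral_seeded(random_state=None, N=100, K=3):
    """Same construction as make_spiral, with a controllable random state."""
    rng = check_random_state(random_state)
    X = np.zeros((N * K, 2))            # data matrix (each row = single example)
    y = np.zeros(N * K, dtype="uint8")  # class labels
    for j in range(K):
        ix = range(N * j, N * (j + 1))
        r = np.linspace(0.0, 1, N)                                    # radius
        t = np.linspace(j * 4, (j + 1) * 4, N) + rng.randn(N) * 0.2   # theta
        X[ix] = np.c_[r * np.sin(t), r * np.cos(t)]
        y[ix] = j
    return X, y


X, y = make_spiral_seeded(random_state=1)

scores = {}
for activation in ("tanh", "relu"):
    clf = MLPClassifier(hidden_layer_sizes=(50, 50), activation=activation,
                        max_iter=2000, random_state=1)
    clf.fit(X, y)
    scores[activation] = clf.score(X, y)  # training accuracy only
print(scores)
```

Training accuracy alone says little about generalisation, so a train/test split is worth adding before drawing conclusions.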

In [None]:
# your answer

---

## Question 3

Build a fully connected neural network and a ConvNet to classify handwritten digits.

scikit-learn's NN implementation does not support convolutional layers so it is probably
best to implement both using `keras`.

Experiment with different network architectures, different optimisers and such.

You should be able to achieve accuracies > 95% pretty quickly. With a bit of experimenting
you can probably reach 98% or 99%.

This question is about getting you comfortable using `keras` and constructing networks
from basic building blocks. Feel free to experiment, branch out and try different things.

Note: this might take a while to train, unfortunately.

In [None]:
# loading the MNIST (or digits) dataset in keras
from keras.datasets import mnist

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# the first time you run this you need to have internet so it can download the data

In [None]:
# some of the layers we discussed in class, check out the documentation if you need more
# or want to experiment
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D

# your solution
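To give an idea of how the building blocks fit together, here is a minimal sketch of one fully connected model and one small ConvNet. It assumes a reasonably recent `keras` version in which an `Input` object can be the first entry of a `Sequential` model; the layer sizes, dropout rate, and optimiser are arbitrary starting values, and the function names `build_dense_model` and `build_conv_model` are made up for this example.

```python
from keras.models import Sequential
from keras.layers import Input, Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D


def build_dense_model():
    """Fully connected network on flattened 28x28 images."""
    model = Sequential([
        Input(shape=(28, 28)),
        Flatten(),
        Dense(128, activation="relu"),
        Dropout(0.2),
        Dense(10, activation="softmax"),
    ])
    # integer class labels (0-9), hence the sparse cross-entropy loss
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def build_conv_model():
    """Small ConvNet; expects images with an explicit channel axis."""
    model = Sequential([
        Input(shape=(28, 28, 1)),
        Conv2D(32, (3, 3), activation="relu"),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(64, activation="relu"),
        Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


dense_model = build_dense_model()
conv_model = build_conv_model()

# Training would look roughly like this (pixel values scaled to [0, 1]):
# dense_model.fit(X_train / 255.0, y_train, epochs=5, validation_split=0.1)
# conv_model.fit(X_train[..., None] / 255.0, y_train, epochs=5, validation_split=0.1)
```

Calling `dense_model.summary()` and `conv_model.summary()` shows the parameter counts, which is a useful comparison between the two architectures before any training.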