# From Single-layer Perceptrons to Multi-layer Neural Networks

### Outline
* Multi-layer structure
* Activation functions
* Back-Propagation Algorithm
* Stepping through Theano, Lasagne and NoLearn
* Tips

# Multi-layer structure
![Multi-layer neural network](http://ufldl.stanford.edu/tutorial/images/Network3322.png)
Source: http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
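In a fully connected network, each layer $l$ computes $z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)}$ and passes it through the activation function, $a^{(l+1)} = f(z^{(l+1)})$. The input vector is $a^{(1)}$. Below is a minimal NumPy sketch of this forward pass. It assumes sigmoid activations, random weights, and layer sizes 3, 3, 3, 2 chosen only for illustration. The names `forward` and `layer_sizes` are hypothetical and do not come from any library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Propagate one input vector x through all layers and return every activation."""
    activations = [x]                      # a^(1) is the input itself
    for W, b in zip(weights, biases):
        z = W @ activations[-1] + b        # z^(l+1) = W^(l) a^(l) + b^(l)
        activations.append(sigmoid(z))     # a^(l+1) = f(z^(l+1))
    return activations

# hypothetical layer sizes: 3 inputs, two hidden layers of 3 units, 2 outputs
layer_sizes = [3, 3, 3, 2]
rng = np.random.default_rng(0)
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

activations = forward(np.array([1.0, 0.5, -0.5]), weights, biases)
print([a.shape for a in activations])  # [(3,), (3,), (3,), (2,)]
```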

# Activation functions
* Choose an activation function with a simple derivative, because backpropagation evaluates that derivative for every unit during training

## Sigmoid function

$$ f(x) = \frac{1}{1+e^{-x}} $$

$$ f'(x) = f(x) (1-f(x)) $$

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 1000)
f = 1 / (1 + np.exp(-x))  # logistic sigmoid
fig = plt.figure(figsize=(10,5))
ax = fig.gca()
ax.grid()
plt.plot(x, f, color='black')
plt.xlim(x.min(), x.max())
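The closed-form derivative $f'(x) = f(x)(1-f(x))$ can be checked numerically against a central finite difference. The sketch below uses helper names (`sigmoid`, `sigmoid_prime`) of our own choosing. It also shows that the derivative peaks at $x = 0$, where it equals $0.5 \cdot 0.5 = 0.25$.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # closed form: f'(x) = f(x) (1 - f(x))
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.linspace(-5, 5, 1000)
h = 1e-5
# central finite difference approximates the derivative directly
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(np.max(np.abs(numeric - sigmoid_prime(x))))  # close to zero
print(sigmoid_prime(0.0))  # 0.25
```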

## Hyperbolic tangent function

$$ f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} $$

$$ f(x) = \frac{2}{1+e^{-2x}} - 1 $$

$$ f'(x) = 1 - f(x)^2$$

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5, 5, 1000)
f = 2 / (1 + np.exp(-2 * x)) - 1  # tanh written via the logistic function
fig = plt.figure(figsize=(10,5))
ax = fig.gca()
ax.grid()
plt.plot(x, f, color='black')
plt.xlim(x.min(), x.max())
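NumPy already provides `np.tanh`. The following check confirms that the logistic form plotted above agrees with it. It also confirms the derivative $f'(x) = 1 - f(x)^2$ against a central finite difference.

```python
import numpy as np

x = np.linspace(-5, 5, 1000)
f = np.tanh(x)

# the logistic form used in the plot: 2 / (1 + e^{-2x}) - 1
f_logistic = 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0
print(np.allclose(f, f_logistic))  # True

# derivative: f'(x) = 1 - f(x)^2, checked against a central difference
h = 1e-5
numeric = (np.tanh(x + h) - np.tanh(x - h)) / (2 * h)
print(np.allclose(numeric, 1.0 - f**2, atol=1e-8))  # True
```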

* [Great comparison of activation functions](https://en.wikipedia.org/wiki/Activation_function)

# Backpropagation Algorithm

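1. Compute the forward pass by calculating the activations $a^{(2)}$ and $a^{(3)}$.
2. Calculate the cost function $$ J(W) = \frac{1}{2}(a^{(3)} - y)^2 $$
3. Calculate the error of the output layer $$ \delta^{(3)} = a^{(3)} - y $$
This form holds for a linear output unit (or a sigmoid output trained with cross-entropy). For a sigmoid output unit with the squared-error cost above, multiply it element-wise by $a^{(3)} \odot (1-a^{(3)})$.
4. Calculate the error of the hidden layer $$ \delta^{(2)} = \left( (W^{(2)})^T \delta^{(3)} \right) \odot \frac{\partial \phi (z^{(2)})}{\partial z^{(2)}} $$
For the sigmoid activation, $$ \frac{\partial \phi (z^{(2)})}{\partial z^{(2)}} = a^{(2)} \odot (1-a^{(2)}) $$
where $\odot$ denotes element-wise multiplication.

5. Accumulate the weight changes over all training examples, starting from $\Delta^{(l)} = 0$:
$$ \Delta^{(l)}_{i,j} = \Delta^{(l)}_{i,j} + a^{(l)}_{j} \delta^{(l+1)}_{i} $$
6. Update the weights with learning rate $\eta$:
$$ W^{(l)} = W^{(l)} - \eta \Delta^{(l)} $$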
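The six steps can be sketched in NumPy on the XOR data used later in this notebook. This is a minimal illustration, not how Lasagne implements training. It makes several arbitrary choices:

* sigmoid units in both layers
* the squared-error cost $J$ summed over the four examples
* bias vectors (which the steps above omit)
* three hidden units and a learning rate of 0.5

Because the output unit is a sigmoid, $\delta^{(3)}$ includes the factor $a^{(3)}(1-a^{(3)})$. A finite-difference check confirms that the backpropagated gradient matches the cost. Whether training fully separates XOR depends on the random initialization, so the final outputs are not guaranteed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR data, one example per column
X = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]], dtype=float)
Y = np.array([[0, 1, 1, 0]], dtype=float)

rng = np.random.default_rng(1)
params = {
    "W1": rng.normal(size=(3, 2)), "b1": np.zeros((3, 1)),  # layer 1 -> layer 2
    "W2": rng.normal(size=(1, 3)), "b2": np.zeros((1, 1)),  # layer 2 -> layer 3
}

def forward(p):
    a2 = sigmoid(p["W1"] @ X + p["b1"])   # step 1: hidden activations a^(2)
    a3 = sigmoid(p["W2"] @ a2 + p["b2"])  # step 1: output activations a^(3)
    return a2, a3

def cost(p):
    # step 2: squared-error cost, summed over the four examples
    return 0.5 * np.sum((forward(p)[1] - Y) ** 2)

def gradients(p):
    a2, a3 = forward(p)
    delta3 = (a3 - Y) * a3 * (1 - a3)              # step 3, sigmoid output unit
    delta2 = (p["W2"].T @ delta3) * a2 * (1 - a2)  # step 4
    # step 5: the matrix products sum a_j^(l) * delta_i^(l+1) over all examples
    return {"W1": delta2 @ X.T, "b1": delta2.sum(axis=1, keepdims=True),
            "W2": delta3 @ a2.T, "b2": delta3.sum(axis=1, keepdims=True)}

# check one backpropagated gradient against a central finite difference
eps = 1e-6
analytic = gradients(params)["W1"][0, 0]
original = params["W1"][0, 0]
params["W1"][0, 0] = original + eps
j_plus = cost(params)
params["W1"][0, 0] = original - eps
j_minus = cost(params)
params["W1"][0, 0] = original
numeric = (j_plus - j_minus) / (2 * eps)
print(abs(numeric - analytic))  # close to zero

# step 6: repeated gradient-descent updates
eta = 0.5
initial_cost = cost(params)
for epoch in range(5000):
    grads = gradients(params)
    for name in params:
        params[name] -= eta * grads[name]
print(initial_cost, cost(params))
print(np.round(forward(params)[1], 2))  # outputs for (0,0), (0,1), (1,0), (1,1)
```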

# Tools for building and training neural networks

* Theano: Symbolic computation library for Python (CUDA support)
* Lasagne: Neural network library based on Theano
* NoLearn: Python wrapper for Lasagne


## NoLearn
In NoLearn, you can define the network layers as a Python list of `(layer class, parameters)` tuples:

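```python
from lasagne.layers import InputLayer, DenseLayer
from lasagne.nonlinearities import sigmoid, softmax

# X is the input matrix with one example per row; None leaves the batch size open
layers = [
    (InputLayer, {'shape': (None, X.shape[1])}),
    (DenseLayer, {'num_units': 2, 'nonlinearity': sigmoid}),  # hidden layer
    (DenseLayer, {'num_units': 2, 'nonlinearity': softmax}),  # output layer, one unit per class
]
```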

Then pass the list to `NeuralNet`:
```python
from nolearn.lasagne import NeuralNet, TrainSplit

net1 = NeuralNet(
    layers=layers,
    max_epochs=100,
    update_learning_rate=1,
    train_split=TrainSplit(eval_size=0),  # train on all samples, no validation split
    verbose=3,
)
```

Use `net1.fit(X, y)` to train the network and `net1.predict(X)` to predict the class of each row of `X`.
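With a softmax output layer, the network produces one probability per class, and the predicted class is conceptually the most probable one. The sketch below shows that final step in plain NumPy, independent of NoLearn. The function name `softmax` is our own, and the input scores are made up for illustration.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# made-up output-layer inputs for 3 examples and 2 classes
z = np.array([[2.0, 0.0],
              [0.0, 0.0],
              [-1.0, 3.0]])
probs = softmax(z)
print(probs.sum(axis=1))     # [1. 1. 1.]
print(probs.argmax(axis=1))  # [0 0 1], ties resolve to the first class
```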


# XOR prediction with a Multi-layer neural network

In [None]:
import numpy as np

from lasagne.layers import DenseLayer
from lasagne.layers import InputLayer
from lasagne.nonlinearities import softmax, sigmoid

from nolearn.lasagne import TrainSplit
from nolearn.lasagne import NeuralNet

data_set = np.array([
    [0, 0, 0],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
])

X = data_set[:, :2]
y = data_set[:, 2:]
X = np.array(X).astype(np.float32)
y = np.array(y).ravel().astype(np.int32)

layers = [
    (InputLayer, {'shape': (None, X.shape[1])}),
    # add hidden and output layers
    # ...
]

net1 = NeuralNet(
    layers=layers,
    # setup network training parameters
    # ...
)
net1.fit(X, y)

In [None]:
# Test the model by predicting the output for (1, 1); XOR of (1, 1) is 0
net1.predict(np.array([[1, 1]], dtype=np.float32))
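The hidden layer is what makes XOR learnable. No single straight line separates the inputs with output 1 from those with output 0, so any single-layer perceptron classifies at most three of the four points correctly. The sketch below illustrates this with the classic perceptron learning rule. It uses a step activation, a learning rate of 1, and 100 epochs, all arbitrary choices.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

w = np.zeros(2)
b = 0.0
for epoch in range(100):
    for xi, target in zip(X, y):
        prediction = int(w @ xi + b > 0)  # step activation
        update = target - prediction      # perceptron learning rule
        w += update * xi
        b += update

predictions = (X @ w + b > 0).astype(int)
accuracy = np.mean(predictions == y)
print(predictions, accuracy)  # accuracy never reaches 1.0
```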

# Credit card approval prediction

In [None]:
# data from https://onlinecourses.science.psu.edu/stat857/node/215
import pandas as pd
import numpy as np
from lasagne.layers import DenseLayer
from lasagne.layers import InputLayer
from lasagne.nonlinearities import softmax, tanh

from nolearn.lasagne import NeuralNet
from nolearn.lasagne import TrainSplit

training_set = pd.read_csv('../../data/German_credit_card_training_500.csv')
# balance the classes: sort by label (class 0 first) and keep all class-0 rows
# plus the same number of class-1 rows
training_set = training_set.sort_values(['Creditability']).head(
    2 * len(training_set[training_set['Creditability'] == 0])
)
# extract the creditability column as y vector
y = training_set['Creditability'].values
# drop the creditability column from the dataset
training_set.drop('Creditability', axis=1, inplace=True)

# remaining dataset is used as input matrix
X = np.array(training_set.values).astype(np.float32)
y = np.array(y).astype(np.int32)

# standardize each feature with the training-set mean and standard deviation
X_mean = X.mean(axis=0)
X_std = X.std(axis=0)
X_std[X_std == 0] = 1  # guard against constant columns
X = (X - X_mean) / X_std

credit_approval_net = NeuralNet(
    layers=[  # three layers: one hidden layer
        (InputLayer, {'shape': (None, X.shape[1],)}),
        # add hidden and output layers
        # ...
        ],
    # setup the training parameters
    )

credit_approval_net.fit(X, y)

In [None]:
# determine score on test set
test_set = pd.read_csv('../../data/German_credit_card_test_500.csv')
# balance the classes in the same way as the training data
test_set = test_set.sort_values(['Creditability']).head(
    2 * len(test_set[test_set['Creditability'] == 0])
)
# extract the creditability column as y vector
y_test = test_set['Creditability'].values
# drop the creditability column from the dataset
test_set.drop('Creditability', axis=1, inplace=True)

# remaining dataset is used as input matrix
X_test = np.array(test_set.values).astype(np.float32)
y_test = np.array(y_test).astype(np.int32)

# normalize with the training statistics, not the test set's own
X_test = (X_test - X_mean) / X_std

# accuracy on the test set
credit_approval_net.score(X_test, y_test)
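Both credit-card cells balance the classes by sorting on `Creditability` and keeping the first `2 * n0` rows, where `n0` is the number of rows with `Creditability == 0`. This works only if the labels are 0 and 1 and class 0 is the minority. In that case the sorted frame starts with all `n0` class-0 rows, so `head` keeps exactly `n0` rows of each class. Here is a toy sketch with made-up data:

```python
import pandas as pd

# made-up labels: 2 rows of class 0, 5 rows of class 1
df = pd.DataFrame({"Creditability": [1, 0, 1, 1, 0, 1, 1],
                   "Amount": [10, 20, 30, 40, 50, 60, 70]})

n0 = len(df[df["Creditability"] == 0])
balanced = df.sort_values(["Creditability"]).head(2 * n0)
print(balanced["Creditability"].tolist())  # [0, 0, 1, 1]
```

Which class-1 rows survive depends on the sort order, so this gives a balanced but not a random subsample.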

## Classify handwritten digits

In [None]:
import gzip
import pickle

# Load the dataset (pickled under Python 2, hence encoding='latin1' on Python 3)
with gzip.open('../../data/mnist.pkl.gz', 'rb') as f:
    training_set, valid_set, test_set = pickle.load(f, encoding='latin1')

In [None]:
X_train = training_set[0]
y_train = training_set[1]
X_test = test_set[0]
y_test = test_set[1]

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

def get_images(training_set):
    """ Return a list containing the images from the MNIST data
    set. Each image is represented as a 2-d numpy array.
    
    source: https://github.com/mnielsen/neural-networks-and-deep-learning/blob/master/fig/mnist.py
    """
    flattened_images = training_set[0]
    return [np.reshape(f, (-1, 28)) for f in flattened_images]

def plot_4_by_4_images(images, labels):
    """ Plot the first 16 MNIST images in a 4 by 4 grid, titled with their labels. """
    fig, axes = plt.subplots(4, 4, figsize=(6, 6))
    for i in range(4):
        for j in range(4):
            k = i + 4 * j
            # negate the pixel values so the digits appear dark on a light background
            axes[i, j].imshow(-images[k], cmap='gray', interpolation='none')
            axes[i, j].set_title("Label: {}".format(labels[k]))
            axes[i, j].axis('off')

images = get_images(training_set)
plot_4_by_4_images(images, y_train)

In [None]:
import numpy as np
from lasagne.layers import DenseLayer
from lasagne.layers import InputLayer
from lasagne.nonlinearities import softmax, sigmoid

from nolearn.lasagne import NeuralNet
from nolearn.lasagne import TrainSplit

X = X_train.astype(np.float32)
y = y_train.astype(np.int32)

# apply some very simple normalization to the data (all pixels share one scale)
X_mean, X_std = X.mean(), X.std()
X = (X - X_mean) / X_std

mnist_net = NeuralNet(
    layers=[  # three layers: one hidden layer
        (InputLayer, {'shape': (None, X.shape[1], )}),
        (DenseLayer, {'num_units': 50, 'nonlinearity': sigmoid}),
        (DenseLayer, {'num_units': 10, 'nonlinearity': softmax}),
        ],
    update_learning_rate=0.1,
    max_epochs=5,  # we want to train this many epochs
    verbose=2,
    train_split=TrainSplit(eval_size=0.25),  # hold out about 25% of the training data for validation
    )

mnist_net.fit(X, y)

In [None]:
# normalize the test images with the training statistics before scoring
X_test_norm = ((X_test - X_mean) / X_std).astype(np.float32)
mnist_net.score(X_test_norm, y_test)

# Tips

* [Efficient BackProp (LeCun et al., 1998), pdf](http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf)
* [Comparison of activation functions](https://en.wikipedia.org/wiki/Activation_function)
* [Introduction to Theano](http://on-demand.gputechconf.com/gtc/2015/webinar/deep-learning-course/getting-started-with-theano.pdf)