# Neural Networks

## Motivation

Neural Networks: General-purpose learning algorithm for modeling non-linearity

... if you train it with "enough" data

## Non-linear inputs

- Images
- Text
- Speech
- XOR

## Limitations of linear models

Not "linearly separable"

![xor](assets/neural/xor.png)

No single straight line can separate the x's from the o's

## Modeling non-linearity

Transform $x$ into $\phi(x)$ to become linearly separable

![xor](assets/neural/xor_phi.png)

$\phi(x)$ is the basis for a "neuron"
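
To make the idea concrete, here is a minimal sketch of one transform that makes XOR linearly separable. The particular choice $\phi(x_1, x_2) = (x_1 x_2,\ x_1 + x_2)$ and the line used to separate the transformed points are hand-picked assumptions for illustration; a neural network would learn its own transform from data.

```python
# XOR truth table: (inputs, label). Treating 1 as "x" and 0 as "o" is arbitrary.
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def phi(x):
    """Hand-picked (not learned) transform: (x1, x2) -> (x1 * x2, x1 + x2)."""
    x1, x2 = x
    return (x1 * x2, x1 + x2)


def linear_classifier(z):
    """A straight-line decision rule applied in the transformed space."""
    z1, z2 = z
    score = -2.0 * z1 + 1.0 * z2 - 0.5
    return 1 if score > 0 else 0


labels = [label for _, label in XOR_DATA]
predictions = [linear_classifier(phi(x)) for x, _ in XOR_DATA]

print("labels:     ", labels)
print("predictions:", predictions)
```

In the original $(x_1, x_2)$ space no such line exists, which is why the transform is needed.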

## Neuron

$$y = W'\phi(x) + b'$$

$$\phi(x) = g(Wx + b)$$

Trainable: $W, b, W', b'$

$g(x)$ is a non-linear function, e.g. Sigmoid
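
The two equations can be sketched in NumPy as follows. This is an illustrative sketch, not a trained model: the names `W_hidden`, `b_hidden` (the parameters inside $g$) and `W_out`, `b_out` (the outer linear parameters) are made up for readability, and the weight values are random.

```python
import numpy as np


def sigmoid(z):
    """Sigmoid non-linearity g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


def neuron_layer(x, W_hidden, b_hidden, W_out, b_out):
    """Compute phi(x) = g(W_hidden x + b_hidden), then a linear model on phi(x)."""
    phi = sigmoid(W_hidden @ x + b_hidden)  # non-linear transform of the input
    return W_out @ phi + b_out              # linear model on the transformed input


rng = np.random.default_rng(0)      # fixed seed so the sketch is repeatable
x = np.array([0.0, 1.0])            # one 2-dimensional input
W_hidden = rng.normal(size=(2, 2))  # untrained, random parameters
b_hidden = np.zeros(2)
W_out = rng.normal(size=(1, 2))
b_out = np.zeros(1)

y = neuron_layer(x, W_hidden, b_hidden, W_out, b_out)
print(y.shape)
```

Training consists of adjusting all four parameter arrays so that `y` matches the desired output.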

## Neuron (Perceptron)

![neuron](assets/neural/neuron.png)

(image: Neural Network Methods in Natural Language Processing, Goldberg, 2017)

## Neural Network

Multiple neurons in 1 layer make up an "Artificial Neural Network"

![neural network](assets/neural/300px-Colored_neural_network.svg.png)

(image: [Wikipedia](https://en.wikipedia.org/wiki/Artificial_neural_network))

## Neural Network (Deep)

Multiple "hidden" layers of neurons make up a "Deep Neural Network"

![multi-layer perceptron](assets/neural/deep_nn.png)

(image: Goldberg, 2017)

## Properties of a Neural Network

|Term|Description|Examples|
|--|--|--|
|Input dimension|How many inputs|4|
|Output dimension|How many outputs|3|
|Number of hidden layers|Number of layers, excluding input and output|2|
|Activation type|Type of non-linear function|sigmoid, ReLU, tanh|
|Hidden layer type|How the neurons are connected together|Fully-connected, Convolutional|

## Activation types

What non-linearity is applied

![dnn](assets/neural/activations.png)

(image: Goldberg, 2017)
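
For reference, the three activations listed in the properties table (sigmoid, tanh, ReLU) can be written in a few lines of NumPy. This is a sketch for building intuition; Keras ships its own implementations.

```python
import numpy as np


def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def tanh(z):
    """Squashes any real number into the range (-1, 1)."""
    return np.tanh(z)


def relu(z):
    """Passes positive values through unchanged and clips negatives to 0."""
    return np.maximum(0.0, z)


z = np.array([-2.0, 0.0, 2.0])
for name, fn in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu)]:
    print(name, fn(z))
```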

## Layer types

How the neurons are connected together, and what operations are performed with x, W, and b:

- Dense
- Convolutional
- Recurrent
- Residual

More detail to come...
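
As a rough preview, the sketch below contrasts a dense layer with a residual connection wrapped around it. It is a simplification under stated assumptions: the helper names `dense` and `residual` are hypothetical, ReLU is an arbitrary choice of activation, and real residual blocks are more elaborate. Convolutional and recurrent layers are not covered here.

```python
import numpy as np


def dense(x, W, b):
    """Fully-connected layer: every output depends on every input."""
    return np.maximum(0.0, W @ x + b)  # ReLU chosen arbitrarily for the sketch


def residual(x, W, b):
    """Residual connection: add the layer's input back onto its output.

    Input and output must have the same size for the addition to work.
    """
    return x + dense(x, W, b)


x = np.array([1.0, -1.0, 0.5])
W = np.zeros((3, 3))  # all-zero weights make the skip path easy to see
b = np.zeros(3)

print(dense(x, W, b))     # the dense layer alone outputs zeros
print(residual(x, W, b))  # the skip connection still passes x through
```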

## Walkthrough: Neural Network Architectures in keras

In this walkthrough, we will use Keras to examine the architecture of some well-known neural networks.

### Setup - Conda environment

1. Create a new conda environment called `mldds03`
  a. Launch an `Anaconda Python` command window
  b. `conda create -n mldds03 python=3`
2. Activate the conda environment: `conda activate mldds03`
3. Install: `conda install jupyter numpy pandas matplotlib keras pydot python-graphviz`
4. Navigate to the courseware folder: `cd mldds-courseware`
5. Launch Jupyter: `jupyter notebook` and open this notebook

### Pre-trained Neural Networks in Keras

"Pre-trained" neural networks are available under `keras.applications`

https://keras.io/applications/

These are trained on the ImageNet dataset (http://www.image-net.org/), which contains millions of images.

The neural network architectures in Keras are submissions from previous years of the annual ImageNet challenge.

In [None]:
import keras

print(keras.__version__)

### MobileNet

MobileNet is a pre-trained ImageNet DNN optimized to run on smaller devices.

Documentation: https://keras.io/applications/#mobilenet

Implementation: https://github.com/keras-team/keras-applications/blob/master/keras_applications/mobilenet.py

In [None]:
from keras.applications import mobilenet

mobilenet_model = mobilenet.MobileNet(weights='imagenet')
mobilenet_model.summary()

### ResNet50

ResNet50 is another pre-trained ImageNet DNN. This is a larger network than MobileNet (almost 26 million parameters). It improves accuracy by introducing residual connections, which are connections that skip layers.

Documentation: https://keras.io/applications/#resnet50

Implementation: https://github.com/keras-team/keras-applications/blob/master/keras_applications/resnet50.py

In [None]:
from keras.applications import resnet50

resnet_model = resnet50.ResNet50(weights='imagenet')
resnet_model.summary()

### Creating Neural Networks using Keras
Finally, let's try something simpler.

Let's create a 1-layer network that can do linear regression.

In [None]:
# Reference: https://gist.github.com/fchollet/b7507f373a3446097f26840330c1c378
from keras.models import Sequential
from keras.layers import Dense

simple_model = Sequential()
simple_model.add(Dense(1, input_dim=4, activation='linear')) # 4 inputs, 1 output; linear activation for regression
simple_model.compile(optimizer='rmsprop', loss='mse')

simple_model.summary()

In [None]:
keras.models.Sequential?

In [None]:
keras.layers.Dense?

In [None]:
keras.Model.compile?

How about a 2-layer network to make it a deep neural network?

In [None]:
from keras.layers import Activation

deeper_model = Sequential()

# imagine we tuned hyperparameters and derived this magic architecture to solve all our problems
deeper_model.add(Dense(12, input_dim=42, activation='relu')) # 42 inputs, 12 outputs
deeper_model.add(Dense(4, activation='relu')) # 12 inputs, 4 outputs
deeper_model.add(Activation("softmax")) # turn the 4 outputs into class probabilities
deeper_model.compile(optimizer='rmsprop', loss='categorical_crossentropy') # 4 classes, so categorical rather than binary

deeper_model.summary()
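
The parameter counts reported by `summary()` can be checked by hand: a dense layer has one weight per input-output pair plus one bias per output. A small sketch (the helper `dense_param_count` is a made-up name for illustration):

```python
def dense_param_count(n_inputs, n_outputs):
    """Weights (n_inputs * n_outputs) plus biases (n_outputs) of a dense layer."""
    return n_inputs * n_outputs + n_outputs


# (inputs, outputs) of the two Dense layers in deeper_model.
# The softmax Activation layer has no trainable parameters.
layer_shapes = [(42, 12), (12, 4)]
counts = [dense_param_count(n_in, n_out) for n_in, n_out in layer_shapes]

print(counts)       # parameters per Dense layer
print(sum(counts))  # total trainable parameters
```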

### Visualizing Neural Net Architectures in Keras

https://keras.io/visualization/

In [None]:
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot

model_to_dot?

In [None]:
SVG(model_to_dot(simple_model, show_shapes=True).create(prog='dot', format='svg'))

In [None]:
SVG(model_to_dot(deeper_model, show_shapes=True).create(prog='dot', format='svg'))

In [None]:
SVG(model_to_dot(mobilenet_model, show_shapes=True).create(prog='dot', format='svg'))

In [None]:
SVG(model_to_dot(resnet_model, show_shapes=True).create(prog='dot', format='svg'))

### Troubleshooting: Graphviz Installation

If pydot is not able to find `graphviz`, you can try installing graphviz manually.

1. Download and install graphviz binaries from: https://graphviz.gitlab.io/download/
2. Add the path to graphviz to your PATH environment variable, e.g. `C:/Program Files (x86)/Graphviz2.38/bin`
3. Launch a new `Anaconda Prompt` and re-run the Jupyter notebook.

## Training a neural network

A neural network is trained using Stochastic Gradient Descent. Each iteration consists of:

- Forward Propagation to compute the output at each layer
- Back Propagation to compute gradients
- Update weights and biases using gradients

### Forward Propagation

For 1 neuron:

$$y = W'g(Wx + b) + b'$$

### Forward Propagation

2 layers of neurons:

$$x_1 = W_1'g(W_1x + b_1) + b_1'$$

$$y = x_2 = W_2'g(W_2x_1 + b_2) + b_2'$$

### Forward Propagation

For layer $l$, single layer operation:

$$x_l = \sigma_l(W_lx_{l-1} + b_l)$$

where $\sigma_l(z) = W_l'g(z) + b_l'$

### Feedforward through Layers

for $l = 1$ to $\,L$:

$\,\,\,\,x_l = \sigma_l(W_lx_{l-1} + b_l)$

Where:
- Number of layers: $L$
- Input: $x_0$, Output: $x_L$
- Note: $x_{l-1}$ and $x_l$ are tensors with the input and output dimensions of layer $l$, respectively
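
The loop above translates almost line for line into NumPy. In this sketch $\sigma_l$ is simplified to an element-wise sigmoid for every layer, which is an assumption made to keep the code short; the layer sizes and random weights are also illustrative.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def feedforward(x0, weights, biases, activation=sigmoid):
    """Apply x_l = activation(W_l x_{l-1} + b_l) for l = 1..L.

    Returns the list [x_0, x_1, ..., x_L] so intermediate outputs can be inspected.
    """
    xs = [x0]
    for W, b in zip(weights, biases):
        xs.append(activation(W @ xs[-1] + b))
    return xs


layer_sizes = [4, 5, 3]  # input dimension 4, one hidden layer of 5, output dimension 3
rng = np.random.default_rng(0)
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

xs = feedforward(np.ones(4), weights, biases)
print([x.shape for x in xs])
```

Keeping every intermediate $x_l$ is deliberate: back propagation needs them later.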

### Backward Propagation

Objective
- Compute the gradients of the cost function $J$ w.r.t. $W^j_l$ and $b^j_l$ (layer $l$, neuron $j$)
  - Partial derivatives $\frac{\partial J}{\partial W^j_l}$, $\frac{\partial J}{\partial b^j_l}$
- E.g. quadratic cost function, $n$ training samples, output $x_L$:
$$J(\{W_l\},\{b_l\}) = \frac{1}{2n}\sum_{i=1}^n {\|y^i - x_{L}^i\|}^2$$
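
As a quick numerical illustration, the quadratic cost can be computed for a tiny batch. The target and prediction values below are made up for the example.

```python
import numpy as np


def quadratic_cost(y_true, y_pred):
    """J = 1/(2n) * sum over samples of ||y - x_L||^2."""
    n = y_true.shape[0]  # number of training samples
    return np.sum((y_true - y_pred) ** 2) / (2 * n)


# Two samples, two outputs each (illustrative values)
y_true = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
y_pred = np.array([[0.5, 0.5],   # squared error 0.25 + 0.25 = 0.5
                   [0.0, 1.0]])  # perfect prediction, squared error 0

cost_value = quadratic_cost(y_true, y_pred)
print(cost_value)  # 0.5 / (2 * 2) = 0.125
```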



### Backward Propagation

1. Feedforward from layer $1$ to $L$
2. Compute the output error vector at layer $L$ ($\delta_L$)
3. Backward propagate the error (through layers $L-1, \ldots, 1$) to compute per-layer error vectors ($\delta_l$)
4. Compute gradient of cost function for layer $l$, neuron $j$:
$$\frac{\partial J}{\partial W_l^j} = \delta_l^j \, x_{l-1}$$
$$\frac{\partial J}{\partial b_l^j} = \delta_l^j$$
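
The four steps can be sketched in NumPy for a single training sample. Assumptions in this sketch: every layer uses an element-wise sigmoid, the cost is $\frac{1}{2}\|y - x_L\|^2$, and the network sizes and random values are illustrative. For a whole layer, the weight gradient is the outer product of $\delta_l$ and $x_{l-1}$. The last lines compare one gradient entry against a finite-difference estimate as a sanity check.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x0, weights, biases):
    """Return [x_0, ..., x_L] with x_l = sigmoid(W_l x_{l-1} + b_l)."""
    xs = [x0]
    for W, b in zip(weights, biases):
        xs.append(sigmoid(W @ xs[-1] + b))
    return xs


def cost(x0, y, weights, biases):
    """Quadratic cost for one sample: 0.5 * ||y - x_L||^2."""
    return 0.5 * np.sum((y - forward(x0, weights, biases)[-1]) ** 2)


def backprop(x0, y, weights, biases):
    xs = forward(x0, weights, biases)                # step 1: feedforward
    # sigmoid'(z) equals x * (1 - x) when x = sigmoid(z)
    delta = (xs[-1] - y) * xs[-1] * (1.0 - xs[-1])   # step 2: output error
    grads_W, grads_b = [], []
    for i in range(len(weights) - 1, -1, -1):        # weights[i] maps xs[i] to xs[i + 1]
        grads_W.insert(0, np.outer(delta, xs[i]))    # step 4: dJ/dW = delta * input
        grads_b.insert(0, delta)                     # step 4: dJ/db = delta
        if i > 0:                                    # step 3: propagate error backwards
            delta = (weights[i].T @ delta) * xs[i] * (1.0 - xs[i])
    return grads_W, grads_b


rng = np.random.default_rng(0)
sizes = [3, 4, 2]
weights = [rng.normal(size=(o, n)) for n, o in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=o) for o in sizes[1:]]
x0, y = rng.normal(size=3), np.array([1.0, 0.0])

grads_W, grads_b = backprop(x0, y, weights, biases)

# Sanity check: central finite difference on a single weight entry
eps = 1e-5
plus = [W.copy() for W in weights]
minus = [W.copy() for W in weights]
plus[0][1, 2] += eps
minus[0][1, 2] -= eps
numeric = (cost(x0, y, plus, biases) - cost(x0, y, minus, biases)) / (2 * eps)
print(numeric, grads_W[0][1, 2])
```

The two printed numbers should agree closely; a large gap usually points to an indexing mistake in the backward pass.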

### Gradient Descent Update Rule

$$W_l^j := W_l^j - \epsilon \frac{\partial J}{\partial W_l^j}$$

$$b_l^j := b_l^j - \epsilon \frac{\partial J}{\partial b_l^j}$$

$\epsilon$ = learning rate
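
A gradient descent step moves each parameter against its gradient, so the cost goes down. The sketch below shows this on a one-parameter toy cost $J(w) = (w - 3)^2$; the cost function, starting point, and learning rate are illustrative choices, not part of the network above.

```python
def cost(w):
    """Toy cost with its minimum at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Derivative of the toy cost: dJ/dw = 2 (w - 3)."""
    return 2.0 * (w - 3.0)


learning_rate = 0.1  # epsilon in the update rule
w = 0.0              # arbitrary starting point
history = [cost(w)]

for step in range(100):
    w = w - learning_rate * gradient(w)  # step against the gradient
    history.append(cost(w))

print(w)                        # close to 3
print(history[0], history[-1])  # cost before and after training
```

Adding the gradient instead of subtracting it would move `w` away from the minimum and the cost would grow.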

## Workshop: Neural Network for Logistic Regression

In this workshop, you'll train a neural network to perform logistic regression on the MNIST dataset.

Credits: https://medium.com/@the1ju/simple-logistic-regression-using-keras-249e0cc9a970

In [None]:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import backend as K

In [None]:
# Training settings
BATCH_SIZE = 128
NUM_CLASSES = 10
EPOCHS = 30

# Input size settings
IMG_ROWS = 28 # 28 pixels high
IMG_COLS = 28 # 28 pixels wide

In [None]:
# Import the dataset, split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

In [None]:
# Input processing
input_dim = IMG_ROWS * IMG_COLS
X_train = X_train.reshape(60000, input_dim) 
X_test = X_test.reshape(10000, input_dim) 

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

In [None]:
# We are doing multi-class classification
# Convert class vectors to binary class matrices
y_train_cat = keras.utils.to_categorical(y_train, NUM_CLASSES)
y_test_cat = keras.utils.to_categorical(y_test, NUM_CLASSES)

# Show what the classes look like
print('y_train shape:', y_train_cat.shape)
print('First y_train sample:', y_train_cat[0])
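
The conversion to binary class matrices is commonly called one-hot encoding. A NumPy sketch of the idea follows; it illustrates the encoding and is not the Keras implementation. The helper name `one_hot` and the sample labels are made up.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1 at the label's index."""
    return np.eye(num_classes, dtype='float32')[labels]


sample_labels = np.array([5, 0, 4])  # illustrative digit labels
encoded = one_hot(sample_labels, 10)

print(encoded.shape)
print(encoded[0])  # a 1 at index 5, zeros elsewhere
```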

In [None]:
# Create the model
model = Sequential() 
model.add(Dense(NUM_CLASSES, input_dim=input_dim, activation='softmax'))
model.summary()
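
This model is a single dense layer followed by softmax, which turns 10 raw scores into probabilities that sum to 1. The sketch below shows the softmax computation on made-up scores and works out the number of trainable parameters the layer should have (one weight per pixel-class pair plus one bias per class).

```python
import numpy as np


def softmax(z):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = z - np.max(z)  # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / np.sum(exp)


logits = np.array([2.0, 1.0, 0.1])  # illustrative raw scores for 3 classes
probs = softmax(logits)
print(probs, probs.sum())

# Parameter count for Dense(10) on 28 * 28 inputs: weights plus biases
n_params = 28 * 28 * 10 + 10
print(n_params)
```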

### Exercise: Training the Neural Network

1. Compile the model
2. Train the model using `sgd`, minibatch size 128 
  - Training set: `X_train`, `y_train_cat`
  - Test set: `X_test`, `y_test_cat`
3. Plot the learning curve, using the accuracy metrics
4. Analyze the learning curve to determine whether overfitting or underfitting occurred.
  - If overfitting occurred, at which epoch could training have stopped?
  - If underfitting occurred, train for more epochs to find the optimum number of epochs

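How to get the accuracy metrics (note that `metrics` is an argument of `compile`, not `fit`):
```
model.compile(..., metrics=['accuracy'])
history = model.fit(..., validation_data=(X_test, y_test_cat))
...
acc = history.history['acc']          # key is 'accuracy' in newer Keras versions
val_acc = history.history['val_acc']  # key is 'val_accuracy' in newer Keras versions
loss = history.history['loss']
val_loss = history.history['val_loss']
```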

You may reference this example for steps 1 and 2: https://medium.com/@the1ju/simple-logistic-regression-using-keras-249e0cc9a970
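
For step 4, one simple way to read a learning curve is to find the epoch where the validation loss is lowest: if it rises afterwards while the training loss keeps falling, that suggests overfitting. The sketch below uses made-up loss values, not results from this model.

```python
# Hypothetical validation losses, one per epoch (made up for illustration)
val_loss = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47]

# Index of the smallest validation loss, converted to a 1-based epoch number
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

# Validation loss rising after the best epoch hints at overfitting
rises_afterwards = val_loss[-1] > val_loss[best_epoch - 1]

print(best_epoch, rises_afterwards)
```

If the lowest value is at the final epoch instead, the curve has not flattened yet and more epochs may help.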

In [None]:
# Compile and train model
# Your code here













In [None]:
# Plot learning curve
# Your code here
















## Reading List

|Material|Read it for|URL|
|--|--|--|
|Lecture 1: Deep Learning Challenge. Is There Theory?|Intro to Deep Learning|https://stats385.github.io/lecture_slides (lecture 1)|
|Lecture 2: Overview of Deep Learning from a Practical Point of View|More background on Neural Nets|https://stats385.github.io/lecture_slides (lecture 2)|
|Neural Networks and Deep Learning, Chapter 2|Understanding Back Propagation|http://neuralnetworksanddeeplearning.com/chap2.html|
|Guide to the Sequential Model|Basic usage of Keras for neural net training|https://keras.io/getting-started/sequential-model-guide/|