# Neural Networks

## Motivation

General-purpose learning algorithm for modeling non-linearity

## Non-linear inputs

- Images
- Text
- Speech
- XOR

## Limitations of linear models

XOR is not "linearly separable"

![xor](assets/neural/xor.png)

No single straight line can separate the x's from the o's
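One way to see why no straight line works is a short argument by contradiction. This is a sketch; the weights $w_1, w_2$ and bias $b$ are notation introduced here, not taken from the figure. Suppose a linear rule $w_1 x_1 + w_2 x_2 + b > 0$ is true exactly on the points $(0,1)$ and $(1,0)$, where XOR is 1:

$$
\begin{aligned}
(0,0):&\quad b \le 0 \\
(0,1):&\quad w_2 + b > 0 \\
(1,0):&\quad w_1 + b > 0 \\
(1,1):&\quad w_1 + w_2 + b \le 0
\end{aligned}
$$

Adding the two middle inequalities gives $w_1 + w_2 + 2b > 0$, so $w_1 + w_2 + b > -b \ge 0$, which contradicts the last line. The mirrored labelling fails the same way with every sign flipped.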

## Modeling non-linearity

Transform $x$ into $\phi(x)$ to become linearly separable

![xor](assets/neural/xor_phi.png)

$\phi(x)$ is the basis for a "neuron"

## Neuron

$$y = W\phi(x) + b$$

$$\phi(x) = g(W'x + b')$$

Trainable: $W', b', W, b$

$g(x)$ is a non-linear function, e.g. Sigmoid
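The two equations above can be sketched in plain Python on the XOR inputs. This is an illustration under stated assumptions: the weights are hand-picked rather than trained, and $g$ is taken to be ReLU instead of sigmoid because it makes the arithmetic exact. The helper names `relu`, `affine`, and `predict` are hypothetical.

```python
# Hand-picked (not trained) weights that solve XOR with one hidden layer.
# Assumption: g is ReLU here; sigmoid is another valid choice of g.

def relu(v):
    """Apply max(0, z) to every element of a list."""
    return [max(0.0, z) for z in v]

def affine(W, x, b):
    """Compute W x + b for a matrix W (list of rows) and vectors x, b."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

# Hidden layer: phi(x) = g(W' x + b')
W_prime = [[1.0, 1.0],
           [1.0, 1.0]]
b_prime = [0.0, -1.0]

# Output layer: y = W phi(x) + b
W = [[1.0, -2.0]]
b = [0.0]

def predict(x):
    phi = relu(affine(W_prime, x, b_prime))  # transformed input
    return affine(W, phi, b)[0]              # linear model on phi(x)

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", predict(x))
```

With these weights, the inputs $(0,1)$ and $(1,0)$ both map to the same $\phi(x)$, which is what lets a linear output layer separate the classes.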

## Neuron (Single-layer Perceptron)

![neuron](assets/neural/neuron.png)

(image: Neural Network Methods in Natural Language Processing, Goldberg, 2017)

## Neural Network

Multiple neurons in 1 layer make up an "Artificial Neural Network"

![neural network](assets/neural/300px-Colored_neural_network.svg.png)

(image: [Wikipedia](https://en.wikipedia.org/wiki/Artificial_neural_network))

## Neural Network (Deep)

Multiple "hidden" layers of neurons make up a "Deep Neural Network"

![dnn](assets/neural/deep_nn.png)

(image: Goldberg, 2017)

## Properties of a Neural Network

|Term|Description|Examples|
|--|--|--|
|Input dimension|How many inputs|4|
|Output dimension|How many outputs|3|
|Number of hidden layers|Number of layers, excluding input and output|2|
|Activation type|Type of non-linear function|sigmoid, ReLU, tanh|
|Hidden layer type|How the neurons are connected together|Fully-connected, Convolutional|

## Activation types

What non-linearity is applied

![dnn](assets/neural/activations.png)

(image: Goldberg, 2017)
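As a rough companion to the plot, the three activations named in the table above can be written as scalar functions using only the standard library. The function names are chosen for this sketch; Keras provides its own implementations.

```python
import math

def sigmoid(z):
    """Squash z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Squash z into the range (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Pass positive values through unchanged; clip negatives to 0."""
    return max(0.0, z)

# Evaluate each activation at a negative, zero, and positive input.
for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  "
          f"tanh={tanh(z):.3f}  relu={relu(z):.1f}")
```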

## Layer types

How the neurons are connected together, and what operations are performed with $x$, $W$, and $b$:

- Dense
- Convolutional
- Recurrent
- Residual

More detail to come...

## Walkthrough: Neural Network Architectures in Keras

In this walkthrough, we will use Keras to examine the architecture of some well-known neural networks.

### Setup

1. Create a new conda environment called `mldds03`
  a. Launch an `Anaconda Python` command window
  b. `conda create -n mldds03 python=3`
2. Activate the conda environment: `conda activate mldds03`
3. Install Jupyter and Keras: `conda install jupyter keras`
4. Navigate to the courseware folder: `cd mldds-courseware`
5. Launch Jupyter: `jupyter notebook`
6. Open this notebook

### Pre-trained Neural Networks in Keras

"Pre-trained" neural networks are available under `keras.applications`

https://keras.io/applications/

These are trained on the ImageNet dataset (http://www.image-net.org/), which contains millions of images.

The neural network architectures available in Keras were submissions to previous years' ImageNet annual challenges.

In [2]:
import keras

print(keras.__version__)

2.1.6


First, we'll take a look at MobileNet.

You can find a description of it in the Keras documentation:
https://keras.io/applications/#mobilenet

The description also contains a URL to the research paper that describes the architecture.

In [10]:
from keras.applications import mobilenet

mobilenet_model = mobilenet.MobileNet(weights='imagenet')
mobilenet_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
conv1_pad (ZeroPadding2D)    (None, 226, 226, 3)       0         
_________________________________________________________________
conv1 (Conv2D)               (None, 112, 112, 32)      864       
_________________________________________________________________
conv1_bn (BatchNormalization (None, 112, 112, 32)      128       
_________________________________________________________________
conv1_relu (Activation)      (None, 112, 112, 32)      0         
_________________________________________________________________
conv_pad_1 (ZeroPadding2D)   (None, 114, 114, 32)      0         
_________________________________________________________________
conv_dw_1 (DepthwiseConv2D)  (None, 112, 112, 32)      288       
__________

Next, we'll inspect another architecture, just for comparison.

https://keras.io/applications/#resnet50

In [12]:
from keras.applications import resnet50

resnet_model = resnet50.ResNet50(weights='imagenet')
resnet_model.summary()

Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels.h5
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_4 (InputLayer)            (None, 224, 224, 3)  0                                            
__________________________________________________________________________________________________
conv1_pad (ZeroPadding2D)       (None, 230, 230, 3)  0           input_4[0][0]                    
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 112, 112, 64) 9472        conv1_pad[0][0]                  
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, 112, 112, 64) 256         co

Finally, let's try something simpler.

Let's create a 1-layer network that can do linear regression.

In [15]:
# Reference: https://gist.github.com/fchollet/b7507f373a3446097f26840330c1c378
from keras.models import Sequential
from keras.layers import Dense

simple_model = Sequential()
simple_model.add(Dense(1, input_dim=4, activation='linear')) # 4 inputs, 1 output; linear activation for regression
simple_model.compile(optimizer='rmsprop', loss='mse')

simple_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_2 (Dense)              (None, 1)                 5         
Total params: 5
Trainable params: 5
Non-trainable params: 0
_________________________________________________________________


How about a 2-layer network to make it a deep neural network?

In [18]:
from keras.layers import Activation

deeper_model = Sequential()
deeper_model.add(Dense(8, input_dim=16, activation='relu')) # 16 inputs, 8 outputs
deeper_model.add(Dense(1)) # 8 inputs, 1 output (no activation yet)
deeper_model.add(Activation("sigmoid")) # sigmoid, not softmax: softmax over a single unit always outputs 1.0
deeper_model.compile(optimizer='rmsprop', loss='binary_crossentropy')

deeper_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_7 (Dense)              (None, 8)                 136       
_________________________________________________________________
dense_8 (Dense)              (None, 1)                 9         
_________________________________________________________________
activation_144 (Activation)  (None, 1)                 0         
Total params: 145
Trainable params: 145
Non-trainable params: 0
_________________________________________________________________
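The `Param #` columns in the summaries above can be reproduced by hand. The sketch below uses two hypothetical helpers, `dense_params` and `conv2d_params`. The kernel sizes and bias settings for the two `conv1` layers are assumptions inferred from the published architectures (3x3 without bias for MobileNet, 7x7 with bias for ResNet50); they are not shown in the summaries themselves.

```python
def dense_params(n_inputs, n_units, use_bias=True):
    """One weight per (input, unit) pair, plus one bias per unit."""
    return n_inputs * n_units + (n_units if use_bias else 0)

def conv2d_params(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """One kernel_h x kernel_w x in_channels kernel per filter, plus biases."""
    weights = kernel_h * kernel_w * in_channels * filters
    return weights + (filters if use_bias else 0)

# Dense layers from the two hand-built models
print("simple_model dense:", dense_params(4, 1))
print("deeper_model first dense:", dense_params(16, 8))
print("deeper_model second dense:", dense_params(8, 1))
print("deeper_model total:", dense_params(16, 8) + dense_params(8, 1))

# First convolution of each pre-trained model (kernel sizes are assumptions)
print("MobileNet conv1:", conv2d_params(3, 3, 3, 32, use_bias=False))
print("ResNet50 conv1:", conv2d_params(7, 7, 3, 64))
```

The activation layers report 0 parameters because they apply a fixed function with nothing to train.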


## Training a neural network

A neural network is trained using:
- Stochastic Gradient Descent
- Back Propagation
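To make the first item concrete, here is a minimal sketch of stochastic gradient descent on a single linear neuron, written without Keras. The toy data (points on the line $y = 2x + 1$), the learning rate, and the epoch count are all assumptions chosen for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Toy data from a known line y = 2x + 1 (an assumption made for this demo).
data = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(-10, 11)]

w, b = 0.0, 0.0       # trainable parameters, starting from zero
learning_rate = 0.1

def mean_squared_error(w, b):
    """Average squared error of the model w * x + b over the data set."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

initial_loss = mean_squared_error(w, b)

for epoch in range(200):
    random.shuffle(data)            # "stochastic": visit samples in random order
    for x, y in data:
        error = (w * x + b) - y     # forward pass, then residual
        grad_w = 2.0 * error * x    # d(error^2)/dw
        grad_b = 2.0 * error        # d(error^2)/db
        w -= learning_rate * grad_w  # step against the gradient
        b -= learning_rate * grad_b

final_loss = mean_squared_error(w, b)
print(f"w={w:.3f}, b={b:.3f}, loss {initial_loss:.3f} -> {final_loss:.6f}")
```

Each update uses the gradient from one sample only, which is what distinguishes this from full-batch gradient descent.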

## Back Propagation

http://neuralnetworksanddeeplearning.com/chap2.html
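As a small illustration of the idea, the sketch below applies the chain rule to a scalar version of the neuron $y = W g(W'x + b') + b$ with $g$ = sigmoid and a squared-error loss, then checks the result against finite differences. The parameter values, input, and target are arbitrary, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Scalar version of y = W g(W' x + b') + b with g = sigmoid."""
    w_prime, b_prime, w, b = params
    z = w_prime * x + b_prime
    phi = sigmoid(z)
    y = w * phi + b
    return z, phi, y

def loss(params, x, target):
    _, _, y = forward(params, x)
    return (y - target) ** 2

def backprop(params, x, target):
    """Gradients of the loss w.r.t. (w', b', w, b), from output back to input."""
    w_prime, b_prime, w, b = params
    z, phi, y = forward(params, x)
    dL_dy = 2.0 * (y - target)
    dL_dw = dL_dy * phi
    dL_db = dL_dy
    dL_dphi = dL_dy * w
    dL_dz = dL_dphi * phi * (1.0 - phi)  # sigmoid'(z) = phi * (1 - phi)
    dL_dw_prime = dL_dz * x
    dL_db_prime = dL_dz
    return [dL_dw_prime, dL_db_prime, dL_dw, dL_db]

def numerical_gradient(params, x, target, eps=1e-6):
    """Central finite differences, used only to check backprop."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(up, x, target) - loss(down, x, target)) / (2 * eps))
    return grads

params = [0.5, -0.3, 1.2, 0.1]  # arbitrary illustrative values for w', b', w, b
analytic = backprop(params, x=0.8, target=1.0)
numeric = numerical_gradient(params, x=0.8, target=1.0)
for name, a, n in zip(["w'", "b'", "w", "b"], analytic, numeric):
    print(f"dL/d{name}: backprop={a:.6f}  numeric={n:.6f}")
```

The gradients for the hidden-layer parameters reuse `dL_dy` and `dL_dphi` computed for the output layer, which is the reuse that makes backpropagation efficient in deeper networks.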