# How do I build a neural net I read about in a paper and adapt it to my problem?

<img src="conv.gif">
Image source: https://medium.datadriveninvestor.com/convolutional-neural-networks-3b241a5da51e

## Information you will need

### Architectural 
- Number of convolutional layers
- Filter/stride/padding for each conv layer
- Number of dense (i.e. fully connected) layers
- Number of nodes per dense layer
- Other: 
    - most conv nets use max pooling after some (not necessarily all) conv layers; each pooling layer has its own filter size/stride/padding
    - BatchNormalization layers are common; other normalization schemes, such as local response normalization (LRN), may be used instead
    - Dropout layers are also common, need to know dropout ratio (usually between 0.1 and 0.5 - this is a hyperparameter!)
    
### Additional info/parameters
- batch size
- dropout ratio
- learning rate
- optimizer
- activation function
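One way to keep track of these items while reading a paper is a plain Python dictionary with one entry per item, filled in as each value turns up. This is a minimal sketch; the key names and the `missing()` helper are hypothetical, not part of any library.

```python
# Hypothetical checklist of everything needed to rebuild a network from a paper.
# None means "not found in the paper yet".
checklist = {
    # architectural
    "n_conv_layers": None,
    "conv_filter_stride_padding": None,  # one entry per conv layer
    "n_dense_layers": None,
    "nodes_per_dense_layer": None,
    "pooling": None,        # which conv layers are followed by pooling, and its size/stride/padding
    "normalization": None,  # e.g. batch normalization or LRN, if any
    # additional info/parameters
    "batch_size": None,
    "dropout_ratio": None,
    "learning_rate": None,
    "optimizer": None,
    "activation": None,
}


def missing(items):
    """Return the names of checklist entries that are still unknown."""
    return [name for name, value in items.items() if value is None]


# Example: after reading a table of layers we might know the layer counts but nothing else yet.
checklist["n_conv_layers"] = 8
checklist["n_dense_layers"] = 3
print(missing(checklist))
```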

## VGG
For this tutorial, we'll build VGG-11 (Simonyan et al., 2015). **Note that ImageNet (like any image-recognition task) is a classification problem (i.e. "this is a cat"), but we will convert the network to perform regression ($\mathbf{y}=f(\mathbf{x})$) by altering the last dense layer: its activation function becomes "linear", i.e. the identity, so the layer outputs $\hat{\mathbf{y}}=\mathbf{W}\mathbf{h}+\mathbf{b}$ directly (where $\mathbf{h}$ is the output of the previous layer), and its number of output nodes equals the number of values we want to predict.** For ImageNet, the final dense layer has 1000 output nodes because the dataset has 1000 classes ("fox", "dog", "car", etc.). For our regression problem, the number of nodes in the final dense layer will equal the number of pixels/grid cells in the target ($\mathbf{Y}$) data.

For this tutorial, we'll assume we have an input of shape (224, 224, 3) and an output of (224, 224, 1). The I/O features mean that we are predicting a single feature that we assume is parameterized by three input features. For example, if we are predicting precipitation amount (one feature), input features might be temperature, pressure, and relative humidity. Each 224x224 "image" represents, in our precipitation thought experiment, a 224x224 spatial grid (lat/lon for a given time step).
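To make the change to the output layer concrete, here is a NumPy-only sketch (no Keras) of what the modified last layer does: a linear (identity) activation applied to $\mathbf{W}\mathbf{h}+\mathbf{b}$, a reshape of the 224·224 = 50176 outputs into a (224, 224, 1) grid, and a mean-squared-error comparison with the target. The random weights and target are placeholders, and the hidden width is an assumption: 64 instead of VGG's 4096, only so the weight matrix stays small.

```python
import numpy as np

rng = np.random.default_rng(0)

height, width, n_targets = 224, 224, 1   # target grid, e.g. precipitation
n_out = height * width * n_targets       # 50176 output nodes
n_hidden = 64                            # VGG uses 4096; kept small so W fits easily in memory

h = rng.standard_normal(n_hidden)                   # output of the previous dense layer (one sample)
W = rng.standard_normal((n_out, n_hidden)) * 0.01   # placeholder weights
b = np.zeros(n_out)                                 # placeholder biases

# Linear activation is the identity: the layer output is just W h + b.
# (A classifier would instead apply softmax to turn the outputs into class probabilities.)
y_hat_flat = W @ h + b
y_hat = y_hat_flat.reshape(height, width, n_targets)   # same job as layers.Reshape

y_true = rng.standard_normal((height, width, n_targets))  # placeholder target grid
mse = np.mean((y_true - y_hat) ** 2)
print(y_hat_flat.shape, y_hat.shape, round(float(mse), 3))
```

The reshape uses NumPy's default row-major order, so grid cell `(i, j)` corresponds to output node `i * 224 + j`.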

### Data dimensions
If you've worked with neural networks before, you've probably heard of ImageNet, the dataset used in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC). Many popular networks were developed on it, and, like MNIST, it is considered a benchmark dataset. VGG takes fixed-size input images of shape (224, 224, 3), i.e. (image height, image width, number of channels). The word "channels" can be confusing: it can often be read as "features", and it describes the depth of the image. For example, an RGB image has a depth of 3, i.e. 3 channels. The data for your problem is likely a different shape, so some parameters from the original network will have to be tweaked to fit.
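In code, "channels last" simply means the features are stacked along the final axis. A minimal NumPy sketch for the precipitation example, assuming three hypothetical 224x224 fields (temperature, pressure, relative humidity) on the same lat/lon grid at one time step; the random values stand in for real data:

```python
import numpy as np

rng = np.random.default_rng(42)
n_lat, n_lon = 224, 224

# Hypothetical input fields on the same lat/lon grid for one time step
temperature = rng.standard_normal((n_lat, n_lon))
pressure = rng.standard_normal((n_lat, n_lon))
rel_humidity = rng.standard_normal((n_lat, n_lon))

# Stack along the last axis -> (height, width, channels),
# the layout that keras.Input(shape=(224, 224, 3)) expects by default
x_sample = np.stack([temperature, pressure, rel_humidity], axis=-1)

# Several samples (e.g. time steps) add a leading batch axis: (batch, height, width, channels)
x_batch = np.stack([x_sample, x_sample], axis=0)
print(x_sample.shape, x_batch.shape)
```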

## Build the structure first
Often there will be a table or a diagram showing which layers are used, what order they go in, and some (but not all!) of the parameters. From Table 1 of Simonyan et al., we'll use the shallowest network (left-most column), which has a total of 11 weight layers: 8 conv and 3 dense layers. Pooling, dropout, and LRN layers have no weights to learn, so they are not counted.
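Only those 11 layers contribute trainable parameters. A conv layer with an $f \times f$ filter, $c_{in}$ input channels and $c_{out}$ output channels has $(f \cdot f \cdot c_{in} + 1)\,c_{out}$ parameters (the $+1$ is the bias), and a dense layer with $n_{in}$ inputs and $n_{out}$ nodes has $(n_{in}+1)\,n_{out}$. The sketch below applies these formulas to VGG-11, assuming 3x3 filters throughout and the 7x7x512 feature map that reaches the first dense layer for a 224x224x3 input; the helper names are hypothetical.

```python
def conv_params(f, c_in, c_out):
    """Weights (f*f*c_in per filter) plus one bias per filter."""
    return (f * f * c_in + 1) * c_out


def dense_params(n_in, n_out):
    """One weight per input per node, plus one bias per node."""
    return (n_in + 1) * n_out


# (c_in, c_out) for the 8 conv layers of VGG-11, all with 3x3 filters
conv_channels = [(3, 64), (64, 128), (128, 256), (256, 256),
                 (256, 512), (512, 512), (512, 512), (512, 512)]
conv_total = sum(conv_params(3, c_in, c_out) for c_in, c_out in conv_channels)


def vgg11_params(n_outputs):
    flat = 7 * 7 * 512  # length of the flattened feature map entering DENSE1
    dense_total = (dense_params(flat, 4096)
                   + dense_params(4096, 4096)
                   + dense_params(4096, n_outputs))
    return conv_total + dense_total


print(conv_params(3, 3, 64))        # 1792 (first conv layer)
print(vgg11_params(1000))           # 132863336 (ImageNet classifier)
print(vgg11_params(224 * 224 * 1))  # 334337408 (regression head from this tutorial)
```

The ImageNet total is in line with the roughly 133 million parameters the paper reports for this configuration. Most of the weights sit in the dense layers, and swapping the 1000-node output for 50176 nodes more than doubles the total.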

Table 1 denotes conv layers in the format "conv⟨receptive field size⟩-⟨number of channels⟩". The receptive field size is the filter size of the layer, and the number of channels is the number of channels in that layer's output. For example, "conv3-64" means that the convolutional layer uses a 3x3 filter and outputs data of shape (new_height, new_width, 64). The new height (or width) after a convolution is:
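Reading the table programmatically can help avoid transcription mistakes. A small sketch, assuming layer entries written as "conv3-64", "maxpool" and "FC-4096" as in Table 1; the `parse_layer` helper is hypothetical:

```python
import re


def parse_layer(spec):
    """Turn a Table 1 entry into a (kind, details) tuple."""
    conv = re.fullmatch(r"conv(\d+)-(\d+)", spec)
    if conv:
        return ("conv", {"filter_size": int(conv.group(1)), "channels": int(conv.group(2))})
    fc = re.fullmatch(r"FC-(\d+)", spec)
    if fc:
        return ("dense", {"units": int(fc.group(1))})
    if spec == "maxpool":
        return ("maxpool", {})
    raise ValueError(f"unrecognised layer spec: {spec!r}")


# Left-most column of Table 1 (VGG-11), without the final soft-max
vgg11 = ["conv3-64", "maxpool", "conv3-128", "maxpool",
         "conv3-256", "conv3-256", "maxpool",
         "conv3-512", "conv3-512", "maxpool",
         "conv3-512", "conv3-512", "maxpool",
         "FC-4096", "FC-4096", "FC-1000"]

parsed = [parse_layer(s) for s in vgg11]
n_weighted = sum(kind in ("conv", "dense") for kind, _ in parsed)
print(parsed[0], n_weighted)
```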

$$
dim_{new} = \left\lfloor \frac{dim_{old} + 2p - f}{s} \right\rfloor + 1
$$

where:

- $f$ = filter size
- $s$ = stride
- $p$ = padding (number of zeros added to each side)

Often, padding and/or stride will not be explicitly stated in the text, but can be inferred.
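The formula is easy to turn into a helper for checking shapes, and rearranging it for $p$ shows how a missing padding value can be inferred from known input and output sizes. A sketch with hypothetical function names; it assumes the same padding on both sides of each dimension and, for the inference, that the division in the formula is exact:

```python
def conv_output_dim(dim_old, f, s=1, p=0):
    """dim_new = floor((dim_old + 2p - f) / s) + 1"""
    return (dim_old + 2 * p - f) // s + 1


def infer_padding(dim_old, dim_new, f, s=1):
    """Solve dim_new = (dim_old + 2p - f)/s + 1 for p."""
    two_p = (dim_new - 1) * s - dim_old + f
    if two_p < 0 or two_p % 2:
        raise ValueError("no symmetric integer padding gives this output size")
    return two_p // 2


# A 3x3 conv with stride 1 that keeps 224 -> 224 must use p = 1
print(infer_padding(224, 224, f=3, s=1))
# The same formula covers pooling: a 2x2 window with stride 2 and no padding halves 224 -> 112
print(conv_output_dim(224, f=2, s=2, p=0))
```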

<img src="VGG_table1_box.png">

## VGG-11 structure

**NOTE:** The code sample below shows the structure of VGG-11 using *only* the parameters specified in Table 1. We will have to dig through the text for the rest of the network parameters.

```Python
import keras
from keras import layers
import numpy as np

# Table 1 does not give these values (the "?"s). They are set to Keras defaults
# only so this skeleton runs; the actual values are worked out from the paper text below.
ACT = None      # ? activation function (None = linear)
PAD = 'valid'   # ? padding
POOL = (2, 2)   # ? max-pooling window (stride defaults to the window size)

# INPUT
input_img = keras.Input(shape=(224, 224, 3))
# CONV1
x = layers.Conv2D(64, 3, activation=ACT, padding=PAD)(input_img)
# MAX POOLING
x = layers.MaxPooling2D(POOL)(x)
# CONV2
x = layers.Conv2D(128, 3, activation=ACT, padding=PAD)(x)
# MAX POOLING
x = layers.MaxPooling2D(POOL)(x)
# CONV3
x = layers.Conv2D(256, 3, activation=ACT, padding=PAD)(x)
# CONV4
x = layers.Conv2D(256, 3, activation=ACT, padding=PAD)(x)
# MAX POOLING
x = layers.MaxPooling2D(POOL)(x)
# CONV5
x = layers.Conv2D(512, 3, activation=ACT, padding=PAD)(x)
# CONV6
x = layers.Conv2D(512, 3, activation=ACT, padding=PAD)(x)
# MAX POOLING
x = layers.MaxPooling2D(POOL)(x)
# CONV7
x = layers.Conv2D(512, 3, activation=ACT, padding=PAD)(x)
# CONV8
x = layers.Conv2D(512, 3, activation=ACT, padding=PAD)(x)
# MAX POOLING
x = layers.MaxPooling2D(POOL)(x)
# FLATTEN
x = layers.Flatten()(x)
# DENSE1
x = layers.Dense(4096, activation=ACT)(x)
# DENSE2
x = layers.Dense(4096, activation=ACT)(x)
# DENSE3
x = layers.Dense(224*224*1, activation=ACT)(x)
# reshape the flat Dense output into the (224, 224, 1) target grid
output = layers.Reshape((224, 224, 1))(x)
```

<img src="VGG_architecture.png">

## Notes on max pooling and padding in Keras - from Keras docs

<img src="Keras_maxpool.png">

## Padding in Keras conv layers - from Keras docs
padding: one of "valid" or "same" (case-insensitive). "valid" means no padding. "same" results in padding with zeros evenly to the left/right or up/down of the input such that output has the same height/width dimension as the input. (https://keras.io/api/layers/convolution_layers/convolution2d/)
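Along each spatial dimension of size $n$, with filter size $f$ and stride $s$, Keras "valid" padding gives $\lfloor (n - f)/s \rfloor + 1$ outputs and "same" padding gives $\lceil n/s \rceil$. A pure-Python sketch (no Keras needed) that uses these rules to trace the spatial size through VGG-11, assuming "same" padding for the 3x3 convs and 2x2, stride-2 "valid" max pooling; the helper and variable names are hypothetical:

```python
import math


def keras_out_dim(n, f, s, padding):
    """Spatial output size of a Keras conv/pooling layer along one dimension."""
    if padding == "valid":
        return (n - f) // s + 1
    if padding == "same":
        return math.ceil(n / s)
    raise ValueError(f"unknown padding: {padding!r}")


# VGG-11 layer sequence: ("conv", output channels) or ("pool", None)
layers_a = [("conv", 64), ("pool", None), ("conv", 128), ("pool", None),
            ("conv", 256), ("conv", 256), ("pool", None),
            ("conv", 512), ("conv", 512), ("pool", None),
            ("conv", 512), ("conv", 512), ("pool", None)]

size, channels = 224, 3
for kind, out_channels in layers_a:
    if kind == "conv":
        size = keras_out_dim(size, f=3, s=1, padding="same")
        channels = out_channels
    else:
        size = keras_out_dim(size, f=2, s=2, padding="valid")

print((size, size, channels), size * size * channels)  # (7, 7, 512) 25088
```

With "valid" padding the same 3x3 conv would shrink 224 to 222, which is why the padding choice matters before the Flatten layer.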

```Python
import keras
from keras import layers
import numpy as np

# INPUT
input_img = keras.Input(shape=(224, 224, 3))
print('INPUT image shape: ',input_img.shape)
# CONV1
x = layers.Conv2D(64, 3, activation='relu', padding='same')(input_img)
print('CONV1 output shape: ',x.shape)
# MAX POOLING
x = layers.MaxPooling2D(pool_size=(2, 2), strides=2)(x)
print('MP1 output shape: ',x.shape)
# CONV2
x = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
print('CONV2 output shape: ',x.shape)
# MAX POOLING
x = layers.MaxPooling2D(pool_size=(2, 2), strides=2)(x)
print('MP2 output shape: ',x.shape)
# CONV3
x = layers.Conv2D(256, 3, activation='relu', padding='same')(x)
print('CONV3 output shape: ',x.shape)
# CONV4
x = layers.Conv2D(256, 3, activation='relu', padding='same')(x)
print('CONV4 output shape: ',x.shape)
# MAX POOLING
x = layers.MaxPooling2D(pool_size=(2, 2), strides=2)(x)
print('MP3 output shape: ',x.shape)
# CONV5
x = layers.Conv2D(512, 3, activation='relu', padding='same')(x)
print('CONV5 output shape: ',x.shape)
# CONV6
x = layers.Conv2D(512, 3, activation='relu', padding='same')(x)
print('CONV6 output shape: ',x.shape)
# MAX POOLING
x = layers.MaxPooling2D(pool_size=(2, 2), strides=2)(x)
print('MP4 output shape: ',x.shape)
# CONV7
x = layers.Conv2D(512, 3, activation='relu', padding='same')(x)
print('CONV7 output shape: ',x.shape)
# CONV8
x = layers.Conv2D(512, 3, activation='relu', padding='same')(x)
print('CONV8 output shape: ',x.shape)
# MAX POOLING
x = layers.MaxPooling2D(pool_size=(2, 2), strides=2)(x)
print('MP5 output shape: ',x.shape)
# FLATTEN
x = layers.Flatten()(x)

# DENSE1
x = layers.Dense(4096, activation='relu')(x)
print('DENSE1 output shape: ',x.shape)
# DENSE2
x = layers.Dense(4096, activation='relu')(x)
print('DENSE2 output shape: ',x.shape)
# DENSE3
x = layers.Dense(224*224*1, activation='linear')(x)  # this would be a softmax layer for a classification problem
print('DENSE3 output shape: ',x.shape)
output = layers.Reshape((224,224,1))(x)  # reshape the flat (batch, 50176) Dense output into a (224, 224, 1) grid per sample
print('OUTPUT shape: ',output.shape)
```

```
Using TensorFlow backend.

INPUT image shape:  (?, 224, 224, 3)
CONV1 output shape:  (?, 224, 224, 64)
MP1 output shape:  (?, 112, 112, 64)
CONV2 output shape:  (?, 112, 112, 128)
MP2 output shape:  (?, 56, 56, 128)
CONV3 output shape:  (?, 56, 56, 256)
CONV4 output shape:  (?, 56, 56, 256)
MP3 output shape:  (?, 28, 28, 256)
CONV5 output shape:  (?, 28, 28, 512)
CONV6 output shape:  (?, 28, 28, 512)
MP4 output shape:  (?, 14, 14, 512)
CONV7 output shape:  (?, 14, 14, 512)
CONV8 output shape:  (?, 14, 14, 512)
MP5 output shape:  (?, 7, 7, 512)
DENSE1 output shape:  (?, 4096)
DENSE2 output shape:  (?, 4096)
DENSE3 output shape:  (?, 50176)
OUTPUT shape:  (?, 224, 224, 1)
```


## Training parameters

<img src="VGG_train.png">
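When copying hyperparameters out of a paper, watch how powers of ten are written. The paper writes values such as $10^{-2}$ (initial learning rate) and $5 \cdot 10^{-4}$ (weight decay). In Python, `1e-2` means $1 \times 10^{-2}$, while `10e-2` means $10 \times 10^{-2} = 10^{-1}$, ten times larger. A quick check:

```python
import math

# Paper notation -> Python literal
lr_paper = 1e-2             # 10^-2
weight_decay_paper = 5e-4   # 5 * 10^-4

# Easy-to-make mistakes: "10e-x" is 10 * 10^-x, one order of magnitude too large
lr_typo = 10e-2                 # 0.1
weight_decay_typo = 5 * 10e-4   # 0.005

print(lr_typo / lr_paper, weight_decay_typo / weight_decay_paper)
```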

```Python
import keras
from keras import layers
import numpy as np


def weight_kwargs():
    """Initializer/regularizer settings from the paper's training section.

    Returns a new dict with fresh initializer objects each time it is called,
    so that no two layers share the same initializer instance.
    """
    return dict(
        # weights drawn from N(0, variance 10^-2), i.e. standard deviation 0.1
        kernel_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.1),
        # biases initialized to zero
        bias_initializer=keras.initializers.Zeros(),
        # weight decay (L2 penalty multiplier) of 5*10^-4; note that 5*10e-4 would be 5e-3
        kernel_regularizer=keras.regularizers.l2(5e-4),
    )


# INPUT
input_img = keras.Input(shape=(224, 224, 3))
print('INPUT image shape: ',input_img.shape)
# CONV1
x = layers.Conv2D(64, 3, activation='relu', padding='same', **weight_kwargs())(input_img)
print('CONV1 output shape: ',x.shape)
# DROPOUT
# NOTE: the paper uses dropout (ratio 0.5) only on the first two dense layers;
# the dropout after CONV1 and CONV2 here is not part of the paper's setup.
x = layers.Dropout(0.5)(x)
# MAX POOLING
x = layers.MaxPooling2D(pool_size=(2, 2), strides=2)(x)
print('MP1 output shape: ',x.shape)
# CONV2
x = layers.Conv2D(128, 3, activation='relu', padding='same', **weight_kwargs())(x)
print('CONV2 output shape: ',x.shape)
# DROPOUT (not in the paper, see note above)
x = layers.Dropout(0.5)(x)
# MAX POOLING
x = layers.MaxPooling2D(pool_size=(2, 2), strides=2)(x)
print('MP2 output shape: ',x.shape)
# CONV3
x = layers.Conv2D(256, 3, activation='relu', padding='same', **weight_kwargs())(x)
print('CONV3 output shape: ',x.shape)
# CONV4
x = layers.Conv2D(256, 3, activation='relu', padding='same', **weight_kwargs())(x)
print('CONV4 output shape: ',x.shape)
# MAX POOLING
x = layers.MaxPooling2D(pool_size=(2, 2), strides=2)(x)
print('MP3 output shape: ',x.shape)
# CONV5
x = layers.Conv2D(512, 3, activation='relu', padding='same', **weight_kwargs())(x)
print('CONV5 output shape: ',x.shape)
# CONV6
x = layers.Conv2D(512, 3, activation='relu', padding='same', **weight_kwargs())(x)
print('CONV6 output shape: ',x.shape)
# MAX POOLING
x = layers.MaxPooling2D(pool_size=(2, 2), strides=2)(x)
print('MP4 output shape: ',x.shape)
# CONV7
x = layers.Conv2D(512, 3, activation='relu', padding='same', **weight_kwargs())(x)
print('CONV7 output shape: ',x.shape)
# CONV8
x = layers.Conv2D(512, 3, activation='relu', padding='same', **weight_kwargs())(x)
print('CONV8 output shape: ',x.shape)
# MAX POOLING
x = layers.MaxPooling2D(pool_size=(2, 2), strides=2)(x)
print('MP5 output shape: ',x.shape)
# FLATTEN
x = layers.Flatten()(x)

# DENSE1
x = layers.Dense(4096, activation='relu', **weight_kwargs())(x)
print('DENSE1 output shape: ',x.shape)
# DENSE2
x = layers.Dense(4096, activation='relu', **weight_kwargs())(x)
print('DENSE2 output shape: ',x.shape)
# DENSE3
x = layers.Dense(224*224*1, activation='linear', **weight_kwargs())(x)  # this would be a softmax layer for a classification problem
print('DENSE3 output shape: ',x.shape)
output = layers.Reshape((224,224,1))(x)  # reshape the flat (batch, 50176) Dense output into a (224, 224, 1) grid per sample
print('OUTPUT shape: ',output.shape)
```

```
INPUT image shape:  (?, 224, 224, 3)
CONV1 output shape:  (?, 224, 224, 64)
MP1 output shape:  (?, 112, 112, 64)
CONV2 output shape:  (?, 112, 112, 128)
MP2 output shape:  (?, 56, 56, 128)
CONV3 output shape:  (?, 56, 56, 256)
CONV4 output shape:  (?, 56, 56, 256)
MP3 output shape:  (?, 28, 28, 256)
CONV5 output shape:  (?, 28, 28, 512)
CONV6 output shape:  (?, 28, 28, 512)
MP4 output shape:  (?, 14, 14, 512)
CONV7 output shape:  (?, 14, 14, 512)
CONV8 output shape:  (?, 14, 14, 512)
MP5 output shape:  (?, 7, 7, 512)
DENSE1 output shape:  (?, 4096)
DENSE2 output shape:  (?, 4096)
DENSE3 output shape:  (?, 50176)
OUTPUT shape:  (?, 224, 224, 1)
```


```Python
# Generate random synthetic data purely to demonstrate the training call.
# NOTE: I made this tutorial on my local machine, which doesn't have much memory,
# so the number of samples (axis 0) below is far too small to be practical;
# something like 10K samples would be more appropriate for real training.
x_train = np.random.randn(128, 224, 224, 3)
y_train = np.random.randn(128, 224, 224, 1)
x_val = np.random.randn(16, 224, 224, 3)
y_val = np.random.randn(16, 224, 224, 1)

# define, compile and train the model
from keras.optimizers import SGD  # stochastic gradient descent
# define the model object from the input and output tensors built in the cell above
model = keras.Model(input_img, output)
# compile model
model.compile(
    optimizer=SGD(learning_rate=1e-2, momentum=0.9),  # paper: initial learning rate 10^-2, momentum 0.9
    loss='mean_squared_error'
    )
# NOTE: the paper trains with SGD and decreases the learning rate by a factor of 10
# whenever validation accuracy stops improving. Keras can automate this with the
# keras.callbacks.ReduceLROnPlateau callback; adaptive optimizers such as Adam instead
# adjust per-parameter step sizes.

# train
model.fit(
    x_train, y_train,  # training I/O
    epochs=5,  # small number used in development
    batch_size=8,  # the paper uses 256; reduced here to fit in memory
    validation_data=(x_val, y_val)  # validation runs at the end of each training epoch
)
```

```
Train on 128 samples, validate on 16 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

<keras.callbacks.callbacks.History at 0x1ed71e1a4c8>
```
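The paper's schedule (start at $10^{-2}$ and divide the learning rate by 10 whenever validation accuracy stops improving) is simple enough to write out by hand. A minimal pure-Python sketch, assuming a patience of one epoch and monitoring validation loss (for regression, loss takes the place of accuracy); `plateau_schedule` and the loss values are hypothetical. Keras' built-in counterpart is the `keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=...)` callback, passed to `model.fit(..., callbacks=[...])`.

```python
def plateau_schedule(val_losses, lr0=1e-2, factor=0.1, patience=1):
    """Learning rate used in each epoch: multiplied by `factor` after
    `patience` consecutive epochs without a new best validation loss."""
    lrs, lr = [], lr0
    best, waited = float("inf"), 0
    for loss in val_losses:
        lrs.append(lr)  # learning rate used during this epoch
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                lr *= factor
                waited = 0
    return lrs


# Hypothetical validation losses for 8 epochs
history = [1.0, 0.8, 0.8, 0.6, 0.5, 0.55, 0.52, 0.4]
lrs = plateau_schedule(history)
print(lrs)
```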

```Python
model.summary()
```

Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         (None, 224, 224, 3)       0         
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 224, 224, 64)      1792      
_________________________________________________________________
dropout_1 (Dropout)          (None, 224, 224, 64)      0         
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 112, 112, 64)      0         
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 112, 112, 128)     73856     
_________________________________________________________________
dropout_2 (Dropout)          (None, 112, 112, 128)     0         
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 56, 56, 128)       0   