# Keras Functional API
This notebook shows how to use the Keras Functional API by implementing a 20-layer ResNet.

Up to this point we have only seen examples where the layers follow each other sequentially: the input of each layer is always the output of the previous one. This setup was the standard until about 2014-15, when the InceptionNet and ResNet architectures appeared. Both have more complicated connections than simple sequential ones.

Previous examples used the Sequential model API, which is designed specifically for sequential models. The Functional API lets you connect layers arbitrarily, in whatever way you want, which makes it very useful for more advanced architectures. Additionally, it enables the network to have multiple inputs and outputs.
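
To make the multiple-input, multiple-output case concrete, here is a minimal sketch. The input names, layer widths and the two output heads are made up for illustration; `Input` and `Model` are explained in the next section.

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense, concatenate
from tensorflow.keras.models import Model

# Two separate inputs (names and sizes are hypothetical).
features_a = Input(shape=(16,), name='features_a')
features_b = Input(shape=(8,), name='features_b')

# Each input gets its own branch, then the branches are merged.
branch_a = Dense(32, activation='relu')(features_a)
branch_b = Dense(32, activation='relu')(features_b)
merged = concatenate([branch_a, branch_b])

# Two output heads share the merged representation.
class_output = Dense(3, activation='softmax', name='class_output')(merged)
score_output = Dense(1, name='score_output')(merged)

multi_model = Model(inputs=[features_a, features_b],
                    outputs=[class_output, score_output])

# Forward pass on random data to check the output shapes.
class_pred, score_pred = multi_model.predict(
    [np.random.rand(4, 16), np.random.rand(4, 8)], verbose=0)
print(class_pred.shape, score_pred.shape)  # (4, 3) (4, 1)
```

A Sequential model cannot express this graph, because it has two entry points and two exits.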

## Functional API example
First we show how the two-layer network from the second lesson can be written with the Functional API. Written with the Sequential API, the network looks like this:



In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

seq_model = Sequential()
seq_model.add(Dense(64, input_shape=(16,)))
seq_model.add(Dense(64))
seq_model.add(Dense(10))

seq_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 64)                1088      
_________________________________________________________________
dense_1 (Dense)              (None, 64)                4160      
_________________________________________________________________
dense_2 (Dense)              (None, 10)                650       
Total params: 5,898
Trainable params: 5,898
Non-trainable params: 0
_________________________________________________________________


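The same architecture written with the Functional API (this version also adds ReLU and softmax activations, which do not change the parameter counts):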

In [None]:
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Input() returns a symbolic tensor that describes the shape of the data.
# It is named `inputs` to avoid shadowing Python's built-in input().
inputs = Input(shape=(16,))

# A layer instance is callable on a tensor, and returns a tensor
# The parameters are the inputs, and the result of the call is the output tensor.
output1 = Dense(64, activation='relu')(inputs)
output2 = Dense(64, activation='relu')(output1)  # Here we connect the output of the first layer to the second layer
predictions = Dense(10, activation='softmax')(output2)  # Here we connect the output of the second layer to the last layer

# This creates a model that includes
# the Input layer and three Dense layers
func_model = Model(inputs=inputs, outputs=predictions)
func_model.summary()

Model: "functional_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 16)]              0         
_________________________________________________________________
dense_3 (Dense)              (None, 64)                1088      
_________________________________________________________________
dense_4 (Dense)              (None, 64)                4160      
_________________________________________________________________
dense_5 (Dense)              (None, 10)                650       
Total params: 5,898
Trainable params: 5,898
Non-trainable params: 0
_________________________________________________________________


In the Functional API you declare how the layers are connected by passing the output of one layer as the input to another. Each layer is a Python callable that takes a symbolic (placeholder) tensor and returns another tensor. Only the shape of the `Input` tensor has to be specified; from then on, shapes are calculated automatically.
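
Because every shape follows from the input shape, the parameter counts in the summaries above can be checked by hand. The helper `dense_params` below is a hypothetical name used only for this sketch; it counts one weight per input-unit pair plus one bias per unit.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Input width followed by the widths of the three Dense layers.
layer_sizes = [16, 64, 64, 10]

counts = [dense_params(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(counts)
print(counts, total)  # [1088, 4160, 650] 5898
```

The numbers match the `Param #` column and the total of 5,898 reported by `summary()`.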


## The ResNet architecture

The next example contains the newer variant of ResNet (ResNet v2), trained on the CIFAR-10 dataset. The architecture can be viewed [here](http://vegesm.web.elte.hu/resnet-20-keras.png).


### Setup
First do some imports needed later:

In [None]:
from tensorflow import keras
from tensorflow.keras.layers import Dense, Conv2D, BatchNormalization, Activation
from tensorflow.keras.layers import AveragePooling2D, Input, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.regularizers import l2
from tensorflow.keras.models import Model
from tensorflow.keras.datasets import cifar10
import numpy as np
import os


Next, load and normalize the data:

In [None]:

# Load the CIFAR10 data.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Input image dimensions.
input_shape = x_train.shape[1:]

# Normalize data.
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# Normalize data by subtracting the mean, slightly improves the results
x_train_mean = np.mean(x_train, axis=0)
x_train -= x_train_mean
x_test -= x_train_mean

print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
print('y_train shape:', y_train.shape)

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)


Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
x_train shape: (50000, 32, 32, 3)
50000 train samples
10000 test samples
y_train shape: (50000, 1)
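
The labels arrive as one integer per row, and `to_categorical` turns each of them into a one-hot vector of length 10. The effect can be illustrated with plain NumPy; this is a sketch of the idea on made-up labels, not the Keras implementation.

```python
import numpy as np

# Hypothetical labels in the same layout as y_train: one integer per row.
labels = np.array([[3], [0], [9]])
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes, dtype='float32')[labels.ravel()]

print(one_hot.shape)  # (3, 10)
print(one_hot[0])     # a single 1.0 at index 3, zeros elsewhere
```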


### Creating the network
First we define the `resnet_layer` function. This function creates a 2D Convolution-Batch Normalization-Activation stack of layers that will serve as a basic building block. The return value `x` is a tensor that can be used as input to the next layer.

The block looks like this:  
<center><img src="https://drive.google.com/uc?id=14AOnj3igJGGpnmIfvhuE1U_GffI1LR2Q" width="50%" /></center>



In [None]:
def resnet_layer(inputs,
                 num_filters=16,  # parameters for the convolution
                 kernel_size=3,
                 strides=1,
                 activation='relu',  # name of the activation function
                 batch_normalization=True,  # should we include a batchnorm layer
                 conv_first=True):
    
    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=l2(1e-4))

    x = inputs

    # Should the convolution come before the BatchNorm+ReLu or the other way?
    if conv_first:
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
    else:
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
        x = conv(x)
    return x

One residual block contains a stack of (1 x 1)-(3 x 3)-(1 x 1) convolutions with BatchNormalization and ReLU layers. Here is a diagram of the block:
<center><img src="https://drive.google.com/uc?id=11c7I0m8Xk_6TX9TT76K0QFJw0OvTnV7U"  /></center>




In [None]:
def resnet_block(x, num_filters_in, num_filters_out, strides, activation, batch_normalization,
                 conv_on_bottleneck):
    y = resnet_layer(inputs=x, num_filters=num_filters_in, kernel_size=1,
                      strides=strides, activation=activation, batch_normalization=batch_normalization,
                      conv_first=False)
    y = resnet_layer(inputs=y, num_filters=num_filters_in,
                      kernel_size=3, conv_first=False)
    y = resnet_layer(inputs=y,  num_filters=num_filters_out,
                      kernel_size=1,  conv_first=False)
    
    # In each stage the first block decreases the size of the feature maps 
    # and increases the number of filters. We can only add tensors of the same dimensions,
    # so in the skip-connection we have to resize the input.
    # We do that by simply doing a 1x1 convolution
    if conv_on_bottleneck:
        x = resnet_layer(inputs=x,
                          num_filters=num_filters_out,
                          kernel_size=1,
                          strides=strides,
                          activation=None,
                          batch_normalization=False)
    x = keras.layers.add([x, y])
    return x
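
The reason for the 1x1 convolution on the skip connection can be seen with plain NumPy arrays. The sketch below uses random data and a random weight matrix and leaves out the bias term, so it only illustrates the shape logic. With a 1x1 kernel, stride 2 and `'same'` padding, the convolution keeps every second pixel and applies the same linear map to the channels of each kept pixel.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical block input: a 32x32 feature map with 64 channels.
x = rng.standard_normal((32, 32, 64))
# Hypothetical main-path output after a stride-2 block: 16x16, 128 channels.
y = rng.standard_normal((16, 16, 128))

# x and y cannot be added directly because their shapes differ.
assert x.shape != y.shape

# 1x1 convolution with stride 2: subsample the pixels, then map
# the 64 input channels to 128 output channels at every position.
w = rng.standard_normal((64, 128))
shortcut = x[::2, ::2, :] @ w

out = shortcut + y
print(shortcut.shape, out.shape)  # (16, 16, 128) (16, 16, 128)
```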

The ResNet presented here has 3 stages, each with two residual blocks. The first residual block of each stage changes the shape of the feature maps: in stage 0 it raises the number of filters from 16 to 64 at full resolution, while in stages 1 and 2 it halves the feature map size (downsampling) and doubles the number of filters. The remaining residual blocks of a stage keep their input and output sizes unchanged.

Feature map sizes and number of filters after each stage:
- conv1 (pre stage 0): 32x32,  16
- stage 0: 32x32,  64
- stage 1: 16x16, 128
- stage 2:  8x8,  256
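
The list above can be reproduced with a few lines of arithmetic. This sketch assumes the stride rule described earlier (only the first block of stages 1 and 2 uses stride 2) and `'same'` padding, for which the output size is the input size divided by the stride, rounded up.

```python
import math

num_filters = [16, 64, 128, 256]  # filter counts from the list above
num_res_blocks = 2                # residual blocks per stage
size = 32                         # CIFAR-10 images are 32x32

shapes = [('conv1', size, num_filters[0])]
for stage in range(3):
    for res_block in range(num_res_blocks):
        stride = 2 if stage > 0 and res_block == 0 else 1
        size = math.ceil(size / stride)
    shapes.append((f'stage {stage}', size, num_filters[stage + 1]))

for name, s, f in shapes:
    print(f'{name}: {s}x{s}, {f}')

# Each residual block stacks 3 convolutions; add the first convolution
# and the final Dense layer (shortcut convolutions are not counted).
depth = 3 * 3 * num_res_blocks + 2
print('depth:', depth)  # depth: 20
```

With two blocks per stage this gives the 20 layers mentioned in the introduction.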

In [None]:
def resnet_v2(input_shape, num_res_blocks, num_classes=10):
    """
    Creates a ResNetv2 model.
    
    Parameters:
      input_shape: shape of input image tensor
      num_res_blocks: number of residual blocks per stage
      num_classes: number of output classes (CIFAR10 has 10)
    Returns:
      The Keras model.
    """
        
    # Start model definition.
    num_filters = [16, 64, 128, 256]

    inputs = Input(shape=input_shape)
    
    # ResNet first performs a Conv2D with BN-ReLU on input before splitting into 2 paths
    x = resnet_layer(inputs=inputs,
                     num_filters=num_filters[0],
                     conv_first=True)

    # Instantiate the stack of residual units
    for stage in range(3):
        for res_block in range(num_res_blocks):
            activation = 'relu'
            batch_normalization = True
            if stage == 0 and res_block == 0:  # first block of the first stage
                activation = None
                batch_normalization = False

            # Only the first block of stages 1 and 2 downsamples with stride 2
            strides = 2 if stage > 0 and res_block == 0 else 1

            x = resnet_block(x, num_filters[stage], num_filters[stage+1], strides, activation,
                             batch_normalization, conv_on_bottleneck=(res_block == 0))


    # Add classifier on top.
    # v2 has BN-ReLU before Pooling
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = AveragePooling2D(pool_size=8)(x)
    x = Flatten()(x)
    outputs = Dense(num_classes,
                    activation='softmax',
                    kernel_initializer='he_normal')(x)

    # Instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    return model

Next we train our neural network. The only new thing here is the `callbacks` argument of `fit`. Callbacks can be used in Keras to add extra steps to training that are performed after every epoch or batch. Some examples:
- `ModelCheckpoint`: saves the model on every epoch. Useful if you have a large model that takes a long time to train and you want to save the model every now and then in case the computer crashes.
- `CSVLogger`: saves the train/validation accuracy and loss during training, useful for visualisations.
- `ReduceLROnPlateau`: decreases the learning rate if the validation loss has stopped decreasing.

We use the `ReduceLROnPlateau` callback, which divides the learning rate by $\sqrt{10}$ if the validation loss has not decreased for five epochs. Decreasing the learning rate over time is often a useful strategy.
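
The effect of such a schedule can be simulated without training anything. The function below is a simplified sketch of the plateau rule, not the Keras implementation: it ignores the cooldown period and the minimum-improvement threshold, and the validation losses are made up.

```python
import math

def simulate_reduce_on_plateau(val_losses, lr=1e-3, factor=math.sqrt(0.1),
                               patience=5, min_lr=0.5e-6):
    """Return the learning rate in effect after each epoch (simplified)."""
    best = float('inf')
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best:
            # Improvement: remember it and reset the patience counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau: shrink the learning rate, but not below min_lr.
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

# Made-up validation losses: two improving epochs, then a plateau.
losses = [1.0, 0.9] + [0.95] * 10
rates = simulate_reduce_on_plateau(losses)
print(rates[5], rates[6], rates[11])
```

After five epochs without improvement the rate drops by a factor of $\sqrt{10}$, and after five more it drops again, so two reductions together divide the learning rate by 10.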


In [None]:
model = resnet_v2(input_shape=input_shape, num_res_blocks=2)
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(learning_rate=1e-3),
              metrics=['accuracy'])


# This callback multiplies the learning rate by sqrt(0.1) when the validation loss plateaus
lr_reducer = ReduceLROnPlateau(factor=np.sqrt(0.1),
                               cooldown=0,
                               patience=5,
                               min_lr=0.5e-6)

callbacks = [lr_reducer]

# Run training
# Training ResNet is slower than training the previous networks, so it is trained only for 10 epochs.
# To train it fully, you can run it for 100 or 200 epochs, if you have time!
model.fit(x_train, y_train, batch_size=32,
          epochs=10,
          validation_data=(x_test, y_test),
          shuffle=True,
          callbacks=callbacks)


# Score trained model.
scores = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test loss: 1.1989514827728271
Test accuracy: 0.7412999868392944


We have achieved 74% accuracy on CIFAR-10. By training the network much longer (200 epochs), one can achieve 91-92% accuracy.