In [None]:
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

<a target="_blank" href="https://colab.research.google.com/github/GoogleCloudPlatform/keras-idiomatic-programmer/blob/master/notebooks/prestem_deconvolution.ipynb">
<img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>

Because training the model takes a while, change the Colab runtime to use a GPU:

`Runtime -> Change runtime type -> Hardware accelerator -> GPU`

# Add Deconvolution Pre-Stem to ResNet50

### Background

The ResNet50 architecture ([Deep Residual Learning for Image Recognition, 2015](https://arxiv.org/pdf/1512.03385.pdf)) does not learn well (or at all) with small images, such as those in CIFAR-10 and CIFAR-100, which are 32x32. The reason is that the feature maps are downsampled too early in the architecture and shrink to 1x1 (a single pixel) before reaching the bottleneck layer prior to the classifier.

ResNet50 was designed for 224x224 images but also works well at 128x128.
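
The arithmetic behind this can be sketched in a few lines. The sketch assumes that ResNet50 halves the spatial size five times (the stem convolution, the max pooling layer, and three strided residual groups) and that each halving rounds up; the function name is hypothetical and used only for illustration.

```python
import math

def final_feature_map_size(image_size, num_halvings=5):
    """Spatial size left after repeatedly halving (rounding up) the input size."""
    size = image_size
    for _ in range(num_halvings):
        size = math.ceil(size / 2)
    return size

for image_size in (32, 128, 224):
    print(image_size, "->", final_feature_map_size(image_size))
```

Under these assumptions a 32x32 input is reduced to a single pixel, while a 128x128 input still leaves a 4x4 feature map.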

## Setup

We start by importing the tf.keras modules we will use, along with numpy and the builtin dataset for CIFAR-10.

This tutorial will work with both TF 1.X and TF 2.0.

In [1]:
# imports for the model
from tensorflow.keras import Sequential, Model
from tensorflow.keras.layers import Dense, Conv2DTranspose, BatchNormalization
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.applications import ResNet50

# imports for the dataset
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
import numpy as np

## Solution

ResNet50 was designed for 224x224 images and works well at 128x128, but not for small images such as 32x32.

The authors addressed this in their second paper ([Identity Mappings in Deep Residual Networks](https://arxiv.org/pdf/1603.05027.pdf)) with the ResNet56 v2 architecture optimized for CIFAR-10/CIFAR-100. They did this by:

    - Reducing the stem convolution from using a large 7x7 filter to a much smaller 3x3
      filter that captured coarse details in a much smaller image size (32x32).
      
    - Changing the residual groups to reduce downsampling of the feature maps, such that
      they were still 8x8 (rather than a single pixel) at the bottleneck layer, comparable
      to the 7x7 feature maps of ResNet50 v1 on 224 x 224 images in the first paper.

*Drawback*

The drawback of this 2016 approach was that one had to redesign the micro-architecture (meta-parameters) of an existing neural network to accommodate an image size that was substantially different from the one the network was originally designed for.

In recent years, the design of convolutional neural networks has become increasingly formalized. In addition to a stem group, we see the emergence of **pre-stem** groups, which learn transformations of the input to *dynamically* adapt it to an existing architecture that was designed for a different image size.

*Alternate Solution - Upscale Image Upstream from Model*

One could upsample the CIFAR-10 images upstream from 32x32 to 128x128 using an interpolation algorithm such as bicubic, but this 'hardwired' interpolation may not be the best choice and may introduce artifacts. Additionally, because it happens upstream from the model, it is generally an inefficient method.
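
One way to get a feel for part of that cost is to compare the size of the training set before and after upsampling. This is only a rough sketch: it assumes the upsampled images are materialized in memory as float32 arrays, which is an assumption for illustration, and the helper name is hypothetical.

```python
def dataset_bytes(num_images, height, width, channels=3, bytes_per_value=4):
    """Bytes needed to hold the whole image set as float32 values."""
    return num_images * height * width * channels * bytes_per_value

small = dataset_bytes(50000, 32, 32)     # CIFAR-10 training images as-is
large = dataset_bytes(50000, 128, 128)   # same images upsampled upstream

print(f"32x32:   {small / 1e6:.1f} MB")
print(f"128x128: {large / 1e6:.1f} MB")
print(f"ratio:   {large // small}x")
```

Upsampling each side by 4 multiplies the data that must be produced and fed to the model by 16.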

### Pre-Stem Solution

Instead, one can add a *pre-stem* group at the bottom (input) of a stock ResNet50 to learn the best upsampling using deconvolution (also known as transposed convolution).

Additionally, the pre-stem becomes part of the model graph.
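
To make the idea of learned upsampling concrete, here is a minimal 1D sketch of a transposed convolution in plain numpy. It is not the Keras implementation: it ignores channels, bias and padding, and the function name is hypothetical. Each input value "stamps" a scaled copy of the kernel onto the output, `stride` positions apart.

```python
import numpy as np

def transpose_conv1d(x, kernel, stride=2):
    """Naive 1D transposed convolution with no padding or cropping."""
    out = np.zeros((len(x) - 1) * stride + len(kernel), dtype=np.float32)
    for i, value in enumerate(x):
        # Each input value adds a scaled copy of the kernel to the output.
        out[i * stride : i * stride + len(kernel)] += value * kernel
    return out

x = np.array([1.0, 2.0, 3.0], dtype=np.float32)

# A kernel of ones reproduces nearest-neighbor upsampling.
nearest = transpose_conv1d(x, np.array([1.0, 1.0], dtype=np.float32))
print(nearest)   # [1. 1. 2. 2. 3. 3.]

# A triangular kernel fills the gaps by linear interpolation.
linear = transpose_conv1d(x, np.array([0.5, 1.0, 0.5], dtype=np.float32))
print(linear)    # [0.5 1.  1.5 2.  2.5 3.  1.5]
```

Fixed interpolation schemes correspond to particular hand-picked kernels; in a pre-stem the kernel values are weights that are trained along with the rest of the model.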

### Step 1

We start with a stock `ResNet50` without a classifier and reset the input shape to (128, 128, 3), which we name as the `base` model.

Next, we add the classifier layer as a Dense layer of 10 nodes, which we name as the `resnet` model.

In [2]:
# Get a pre-built ResNet50 w/o the top layer (classifier) and input shape configured for 128 x 128
base = ResNet50(include_top=False, input_shape=(128, 128, 3), pooling='max')

# Add a new classifier (top) layer for the 10 classes in CIFAR-10
outputs = Dense(10, activation='softmax')(base.output)

# Rebuild the model with the new classifier
resnet = Model(base.input, outputs)
resnet.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
94765736/94765736 ━━━━━━━━━━━━━━━━━━━━ 3s 0us/step


### Step 2

We construct a pre-stem group using two deconvolutions (also called transposed convolutions):

    1. The first deconvolution takes (32, 32, 3) as input and upsamples to (64, 64, 3).
    2. The second deconvolution upsamples to (128, 128, 3).
    3. We use the add() method to attach the pre-stem to the resnet model.
    
Essentially, the pre-stem takes the (32, 32, 3) CIFAR-10 inputs and outputs (128, 128, 3) which is then the input to the resnet model.
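
The shapes and the size of the pre-stem can be checked by hand. The sketch below assumes `padding='same'` with `strides=2` (so each deconvolution doubles height and width), a 3x3 kernel, and 3 input and output channels, as in the code cell that follows; the helper names are hypothetical.

```python
def conv2d_transpose_params(kernel_size, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

def batch_norm_params(channels):
    """gamma, beta, moving mean and moving variance per channel."""
    return 4 * channels

size, channels = 32, 3
total_params = 0
for step in (1, 2):
    size *= 2  # padding='same' with strides=2 doubles height and width
    total_params += conv2d_transpose_params(3, channels, 3) + batch_norm_params(3)
    print(f"after deconvolution {step}: ({size}, {size}, {channels})")

print("pre-stem parameters:", total_params)
```

The pre-stem adds fewer than two hundred parameters, so its cost comes almost entirely from the larger feature maps it hands to ResNet50.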

In [4]:
# Create the pre-stem as a Sequential model
model = Sequential()

# This is the first deconvolution, which takes the (32, 32, 3) CIFAR-10 input and outputs (64, 64, 3)
model.add(Conv2DTranspose(3, (3, 3), strides=2, padding='same', activation='relu', input_shape=(32,32,3)))
model.add(BatchNormalization())

# This is the second deconvolution which outputs (128, 128, 3) which matches the input to our ResNet50 model
model.add(Conv2DTranspose(3, (3, 3), strides=2, padding='same', activation='relu'))
model.add(BatchNormalization())

# Add the ResNet50 model as the remaining layers and rebuild
model.add(resnet)
model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['acc'])

model.summary()



### Prepare the Data

We will use the built-in CIFAR-10 dataset, normalize the image data, and one-hot encode the labels upstream from the model.

In [5]:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = (x_train / 255.0).astype(np.float32)
x_test  = (x_test  / 255.0).astype(np.float32)

y_train = to_categorical(y_train)
y_test  = to_categorical(y_test)

print(x_train.shape)

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 6s 0us/step
(50000, 32, 32, 3)
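
The two preprocessing steps can be illustrated on a tiny made-up example in plain numpy. This is a sketch of what normalization and one-hot encoding do, not a replacement for `to_categorical`; the sample labels and pixel values are invented for illustration.

```python
import numpy as np

# Labels in the same (N, 1) integer layout that cifar10.load_data() returns.
labels = np.array([[3], [0], [9]])

# One-hot encoding: row i of the identity matrix is the encoding of class i.
one_hot = np.eye(10, dtype=np.float32)[labels.reshape(-1)]
print(one_hot.shape)

# Normalization: map 8-bit pixel values from [0, 255] into [0, 1].
pixels = np.array([0, 128, 255], dtype=np.uint8)
scaled = (pixels / 255.0).astype(np.float32)
print(scaled.min(), scaled.max())
```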


### Train the Model

Let's partially train the model to demonstrate how a pre-stem works. For ResNet50, I find that a reliable choice is the Adam optimizer with a learning rate of 0.001. While batch normalization should make higher learning rates possible, I find that with higher rates on ResNet50 the per-epoch loss does not converge.


We will then use the fit() method for a small number of epochs (5) and set aside 10% of the training data for the per epoch validation data.
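
As a quick sanity check on what to expect from the progress bar, the number of training steps per epoch follows from the split and the batch size. The sketch assumes the 50,000 CIFAR-10 training images and the batch size of `32*2*2` used in the `fit()` call.

```python
import math

num_samples = 50000
validation_split = 0.1
batch_size = 32 * 2 * 2

num_val = int(num_samples * validation_split)   # held out for validation
num_train = num_samples - num_val               # actually used for training
steps_per_epoch = math.ceil(num_train / batch_size)

print(num_train, num_val, steps_per_epoch)
```

Note that Keras takes the validation samples from the end of the arrays, before any shuffling.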

From my test run, I got:

    Epoch 1: 27.7%
    Epoch 2: 33.8%
    Epoch 3: 42.9%
    Epoch 4: 35.9%  -- dropped into a worse local minimum
    Epoch 5: 49.1%  -- found a better local minimum to dive into

In [None]:
model.fit(x_train, y_train, epochs=5, batch_size=32*2*2, verbose=1, validation_split=0.1)

Epoch 1/5
352/352 ━━━━━━━━━━━━━━━━━━━━ 222s 527ms/step - acc: 0.5132 - loss: 1.3939 - val_acc: 0.4012 - val_loss: 1.7217
Epoch 2/5
352/352 ━━━━━━━━━━━━━━━━━━━━ 187s 417ms/step - acc: 0.6595 - loss: 1.0132 - val_acc: 0.6474 - val_loss: 1.0207
Epoch 3/5
166/352 ━━━━━━━━━━━━━━━━━━━━ 1:15 405ms/step - acc: 0.7468 - loss: 0.7773