Tutorial credit: https://machinelearningmastery.com/upsampling-and-transpose-convolution-layers-for-generative-adversarial-networks/

How to use the UpSampling2D Layer: 

In [None]:
#importing the necessary packages and configuring some parameters

#basic packages
import pandas as pd
import numpy as np
import re
import collections
import matplotlib.pyplot as plt
from pathlib import Path

#Packages for data prep
from sklearn.model_selection import train_test_split
from nltk.corpus import stopwords
from keras.preprocessing.text import Tokenizer
from keras.utils.np_utils import to_categorical
from sklearn.preprocessing import LabelEncoder

#Packages for modeling
from keras import models
from keras import layers
from keras import regularizers

In [None]:
"""the simplest way to upsample an input is to double each row and column
For example, a 2x2 input image would be output as 4x4

         1,2
input = (3,4)

         1, 1, 2, 2
Output: (1, 1, 2, 2)
         3, 3, 4, 4
         3, 3, 4, 4
"""
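
The doubling described above can also be reproduced without Keras. The following is a minimal NumPy sketch, not the Keras implementation; the helper name `upsample_nearest` is made up for illustration.

```python
import numpy as np

def upsample_nearest(image, row_factor=2, col_factor=2):
    """Repeat every row and column of a 2D array (hypothetical helper)."""
    # repeat rows along axis 0, then columns along axis 1
    return np.repeat(np.repeat(image, row_factor, axis=0), col_factor, axis=1)

small_image = np.asarray([[1, 2],
                          [3, 4]])
upsampled = upsample_nearest(small_image)
print(upsampled)
```

Each input pixel becomes a 2x2 block of identical values, which is why no weights are needed for this operation.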

The Keras deep learning library provides this capability in a layer called UpSampling2D.



It can be added to a convolutional neural network and repeats the rows and columns provided as input in the output. For example:

In [None]:
#define model
from keras.models import Sequential
from keras.layers import UpSampling2D

model = Sequential()
model.add(UpSampling2D())

We can demonstrate the behavior of this layer with a simple contrived example.

First, we can define a contrived input image that is 2×2 pixels. We can use specific values for each pixel so that after upsampling, we can see exactly what effect the operation had on the input.

In [None]:
#define input data
import numpy

X = numpy.asarray([[1,2],
             [3,4]])
#show input data for context
print(X)

[[1 2]
 [3 4]]


Once the image is defined, we must add a channel dimension (e.g. grayscale) and also a sample dimension (e.g. we have 1 sample) so that we can pass it as input to the model.

In [None]:
#reshape input data into one sample with one channel
X = X.reshape((1, 2, 2, 1))
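
As a small illustration of what the reshape does, the sketch below builds the same 4D array in two ways. It assumes the channels-last layout (samples, rows, columns, channels); the variable names are chosen for this example only.

```python
import numpy as np

image = np.asarray([[1, 2],
                    [3, 4]])

# explicit target shape: (samples, rows, columns, channels)
batch_a = image.reshape((1, 2, 2, 1))

# equivalent: add a leading sample axis and a trailing channel axis
batch_b = image[np.newaxis, :, :, np.newaxis]

print(batch_a.shape)
print(np.array_equal(batch_a, batch_b))
```

Both forms leave the pixel values untouched; only the number of axes changes.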

We can now use the model defined above. It has only the UpSampling2D layer, which takes 2x2 grayscale images as input directly and outputs the result of the upsampling operation.

In [None]:
#make a prediction w/the model
yhat = model.predict(X)

#reshape output to remove channel to make printing easier
yhat = yhat.reshape((4,4))

#summarize output
print(yhat)

[[1 1 2 2]
 [1 1 2 2]
 [3 3 4 4]
 [3 3 4 4]]


Running the above cells first creates and prints our 2x2 input data.

The model is then used to upsample our input, resulting in a doubling of each row and column of our input data, as we expected.

Finally, the model is summarized below. We can see that it outputs a 4x4 result as we expect and, importantly, that the layer has no parameters or model weights. This is because it is not learning anything; it is just doubling the input.

In [None]:
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
up_sampling2d_1 (UpSampling2 (None, 4, 4, 1)           0         
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________


By default, the UpSampling2D will double each input dimension. This is defined by the "size" argument that is set to the tuple (2,2).



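You may want to use different factors on each dimension, such as doubling the height and tripling the width. This can be achieved by setting the "size" argument to (2,3), which gives the factors for rows and columns respectively. The result of applying this operation to a 2x2 image would be a 4x6 output image (i.e. 2x2 rows and 2x3 columns). Note that the cell below appends the new layer to the existing model, so the summary reports the output shapes of the stacked layers rather than 4x6. For example: 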

In [None]:
#example of using different scale factors for each dimension
model.add(UpSampling2D(size = (2,3)))
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
up_sampling2d_1 (UpSampling2 (None, 4, 4, 1)           0         
_________________________________________________________________
up_sampling2d_2 (UpSampling2 (None, 8, 12, 1)          0         
_________________________________________________________________
up_sampling2d_3 (UpSampling2 (None, 16, 36, 1)         0         
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
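
To see the effect of size = (2,3) on the original 2x2 image in isolation, here is a NumPy sketch of the same row and column repetition. It is an illustration only and does not call Keras.

```python
import numpy as np

image = np.asarray([[1, 2],
                    [3, 4]])

# size=(2, 3): repeat each row twice and each column three times
rows_factor, cols_factor = 2, 3
stretched = np.repeat(np.repeat(image, rows_factor, axis=0), cols_factor, axis=1)

print(stretched.shape)
print(stretched)
```

The result has 4 rows and 6 columns, matching the 4x6 shape described in the text.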


Additionally, by default, the UpSampling2D layer will use a nearest neighbor algorithm to fill in the new rows and columns. This has the effect of simply doubling rows and columns, as described, and is specified by the "interpolation" argument set to "nearest". 

Alternatively, a bilinear interpolation method can be used, which draws upon multiple surrounding points. This can be specified by setting the "interpolation" argument to "bilinear". The cell below again appends the layer to the existing model, so the summary lists every layer added so far. For example: 

In [None]:
#example of using bilinear interpolation when upscaling
model.add(UpSampling2D(interpolation = 'bilinear'))
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
up_sampling2d_1 (UpSampling2 (None, 4, 4, 1)           0         
_________________________________________________________________
up_sampling2d_2 (UpSampling2 (None, 8, 12, 1)          0         
_________________________________________________________________
up_sampling2d_3 (UpSampling2 (None, 16, 36, 1)         0         
_________________________________________________________________
up_sampling2d_4 (UpSampling2 (None, 32, 72, 1)         0         
_________________________________________________________________
up_sampling2d_5 (UpSampling2 (None, 64, 144, 1)        0         
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
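
To get a feel for how bilinear interpolation differs from nearest neighbor, the sketch below resizes the 2x2 image to 4x4 by blending the surrounding pixels. It uses corner-aligned sampling, which is an assumption made for simplicity; Keras may place its sample points differently, so the exact values it produces can differ. The function name `bilinear_upsample` is hypothetical.

```python
import numpy as np

def bilinear_upsample(image, out_h, out_w):
    """Bilinear resize with corner-aligned sampling (illustrative convention)."""
    in_h, in_w = image.shape
    # fractional source coordinates for every output row and column
    rows = np.linspace(0, in_h - 1, out_h)
    cols = np.linspace(0, in_w - 1, out_w)
    r0 = np.floor(rows).astype(int)
    c0 = np.floor(cols).astype(int)
    r1 = np.minimum(r0 + 1, in_h - 1)
    c1 = np.minimum(c0 + 1, in_w - 1)
    # blend weights: distance from the upper-left neighbor
    fr = (rows - r0)[:, None]
    fc = (cols - c0)[None, :]
    top = image[r0][:, c0] * (1 - fc) + image[r0][:, c1] * fc
    bottom = image[r1][:, c0] * (1 - fc) + image[r1][:, c1] * fc
    return top * (1 - fr) + bottom * fr

image = np.asarray([[1.0, 2.0],
                    [3.0, 4.0]])
smooth = bilinear_upsample(image, 4, 4)
print(smooth)
```

Instead of repeated blocks, the output contains intermediate values that ramp smoothly between the four original pixels.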


The UpSampling2D layer is simple and effective, although it does not perform any learning.

It is not able to fill in useful detail in the upsampling operation. To be useful in a GAN, each UpSampling2D layer must be followed by a Conv2D layer that will learn to interpret the doubled input and be trained to translate it into meaningful detail.

We can demonstrate this with an example.

In this case, our little GAN generator model must produce a 10×10 image and take a 100 element vector from the latent space as input.

First, a Dense fully connected layer can be used to interpret the input vector and create a sufficient number of activations (outputs) that can be reshaped into a low-resolution version of our output image, in this case, 128 versions of a 5×5 image.

In [None]:
from keras.layers import Dense
from keras.layers import Reshape
from keras.layers import UpSampling2D
from keras.layers import Conv2D
from keras.models import Sequential

#define model
model = Sequential()

#define input shape, output enough activations for 128 5x5 feature maps
model.add(Dense(128 * 5 * 5, input_dim = 100))

#reshape vector of activations into 128 feature maps with 5x5
model.add(Reshape((5, 5, 128)))

#double input from 128 5x5 to 128 10x10 feature maps
model.add(UpSampling2D())

#fill in detail in the upsampled feature maps and output a single image
model.add(Conv2D(1, (3,3), padding = 'same'))

#summarize model
model.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 3200)              323200    
_________________________________________________________________
reshape_1 (Reshape)          (None, 5, 5, 128)         0         
_________________________________________________________________
up_sampling2d_6 (UpSampling2 (None, 10, 10, 128)       0         
_________________________________________________________________
conv2d (Conv2D)              (None, 10, 10, 1)         1153      
Total params: 324,353
Trainable params: 324,353
Non-trainable params: 0
_________________________________________________________________


Running the example creates the model and summarizes the output shape of each layer. 

We can see that the Dense layer outputs 3,200 activations that are then reshaped into 128 feature maps with the shape 5x5. 

The widths and heights are doubled to 10x10 by the UpSampling 2D layer, resulting in a feature map with quadruple the area. 

Finally, the Conv2D processes these feature maps and adds in detail, outputting a single 10x10 image. 
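
The shape changes described above can be traced with plain NumPy arrays. This is only a sketch of the shapes, assuming channels-last feature maps; it does not reproduce the learned Dense or Conv2D computations.

```python
import numpy as np

# stand-in for the 3,200 activations produced by the Dense layer
activations = np.arange(128 * 5 * 5, dtype=float)

# Reshape((5, 5, 128)): 128 feature maps of 5x5 (channels last)
feature_maps = activations.reshape((5, 5, 128))

# UpSampling2D(): double rows and columns, channel count unchanged
upsampled_maps = np.repeat(np.repeat(feature_maps, 2, axis=0), 2, axis=1)

print(feature_maps.shape)
print(upsampled_maps.shape)

# a Conv2D with 1 filter and padding='same' keeps the 10x10 size
# and reduces the 128 channels to 1, giving (10, 10, 1)
```

The upsampling step quadruples the number of values per feature map (25 to 100) without adding any parameters.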

# Conv2D Transpose / Transpose Convolution Layer

The Conv2D Transpose or transpose convolution layer is more complicated than a simple upsampling layer.

It performs the upsampling operation AND interprets the raw input data to fill in the detail while it is upsampling. It's like a layer that combines the UpSampling2D and Conv2D layers into one layer. 




In [None]:
"""
consider an input image with the size 2x2 as follows: 

        1, 2
Input = (3,4)
"""

Assuming a single filter with a 1x1 kernel and model weights that result in no changes to the inputs when output (e.g. a model weight of 1.0 and a bias of 0.0), a transpose convolution operation with an output stride of 1x1 will reproduce the input as-is:

In [None]:
"""
          1, 2
Output = (3, 4)
"""

With an output stride of (2,2), the 1x1 convolution requires the insertion of additional rows and columns into the input image so that the reads of the operation can be performed. Therefore, the input looks as follows: 

In [None]:
"""
         1, 0, 2, 0
Input = (0, 0, 0, 0)
         3, 0, 4, 0
         0, 0, 0, 0 
"""

The model can then read across this input using an output stride of (2,2) and will output a 4x4 image, in this case with no change as our model weights have no effect by design:

In [None]:
"""
          1, 0, 2, 0
Output = (0, 0, 0, 0)
          3, 0, 4, 0
          0, 0, 0, 0
"""

Keras provides the transpose convolution capability via the Conv2DTranspose layer. It can be added to our model directly; for example: 

In [None]:
#example of using the transpose convolutional layer
from numpy import asarray
from keras.models import Sequential
from keras.layers import Conv2DTranspose

#define model
model = Sequential()
model.add(Conv2DTranspose(...))

We can demonstrate the behavior of this layer with a simple contrived example.

First, we can define a contrived input image that is 2×2 pixels. We can use specific values for each pixel so that after upsampling, we can see exactly what effect the operation had on the input.

In [None]:
# define input data
X = asarray([[1, 2],
			 [3, 4]])
# show input data for context
print(X)

[[1 2]
 [3 4]]


Once the image is defined, we must add a channel dimension (e.g. grayscale) and also a sample dimension (e.g. we have 1 sample) so that we can pass it as input to the model.



In [None]:
# reshape input data into one sample with one channel
X = X.reshape((1, 2, 2, 1))

We can now define our model.

The model has only the Conv2DTranspose layer, which takes 2×2 grayscale images as input directly and outputs the result of the operation.

The Conv2DTranspose both upsamples and performs a convolution. As such, we must specify both the number of filters and the size of the filters as we do for Conv2D layers. Additionally, we must specify a stride of (2,2) because the upsampling is achieved by the stride behavior of the convolution on the input.

Specifying a stride of (2,2) has the effect of spacing out the input. Specifically, rows and columns of 0.0 values are inserted to achieve the desired stride.
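
The spacing-out effect can be sketched in NumPy for a single channel. The function below is a hypothetical helper, not the Keras implementation: each input pixel stamps a scaled copy of the kernel into the output at a position determined by the stride. The output size formula is an assumption chosen to match the shapes reported for "valid" padding in this notebook.

```python
import numpy as np

def transpose_conv2d(image, kernel, stride=2):
    """Single-channel transpose convolution via scatter-add (illustrative)."""
    in_h, in_w = image.shape
    k_h, k_w = kernel.shape
    # assumed output size for 'valid' padding:
    # input * stride + max(kernel - stride, 0)
    out_h = in_h * stride + max(k_h - stride, 0)
    out_w = in_w * stride + max(k_w - stride, 0)
    output = np.zeros((out_h, out_w))
    for i in range(in_h):
        for j in range(in_w):
            # each input pixel adds a scaled copy of the kernel to the output
            output[i * stride:i * stride + k_h,
                   j * stride:j * stride + k_w] += image[i, j] * kernel
    return output

image = np.asarray([[1.0, 2.0],
                    [3.0, 4.0]])
identity_kernel = np.asarray([[1.0]])  # 1x1 kernel, weight 1.0, no bias

same_size = transpose_conv2d(image, identity_kernel, stride=1)
spaced_out = transpose_conv2d(image, identity_kernel, stride=2)
print(same_size)
print(spaced_out)
```

With a stride of 1 the input is reproduced as-is; with a stride of 2 the original values land on every second row and column and the remaining cells stay at 0.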

In this example, we will use one filter, with a 1×1 kernel and a stride of 2×2 so that the 2×2 input image is upsampled to 4×4.

In [None]:
# define model
model = Sequential()
model.add(Conv2DTranspose(1, (1,1), strides=(2,2), input_shape=(2, 2, 1)))
# summarize the model
model.summary()

Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_transpose_1 (Conv2DTr (None, 4, 4, 1)           2         
Total params: 2
Trainable params: 2
Non-trainable params: 0
_________________________________________________________________


To make it clear what the Conv2DTranspose layer is doing, we will fix the single weight in the single filter to the value of 1.0 and use a bias value of 0.0.

These weights, along with a kernel size of (1,1) will mean that values in the input will be multiplied by 1 and output as-is, and the 0 values in the new rows and columns added via the stride of 2×2 will be output as 0 (e.g. 1 * 0 in each case).

In [None]:
# define weights so that they do nothing
weights = [asarray([[[[1]]]]), asarray([0])]
# store the weights in the model
model.set_weights(weights)
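
The nested brackets in the weights are easier to read once the array shapes are printed. The sketch below assumes the kernel of a Conv2DTranspose layer is laid out as (kernel rows, kernel columns, filters, input channels), followed by one bias value per filter; the variable names are for this example only.

```python
from numpy import asarray

# one 1x1 kernel connecting 1 input channel to 1 filter
kernel_weights = asarray([[[[1]]]])
# one bias value for the single filter
bias_weights = asarray([0])

print(kernel_weights.shape)
print(bias_weights.shape)
```

A layer with more filters or a larger kernel would need arrays whose shapes grow accordingly.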

We can then use the model to make a prediction, that is, upsample a provided input image.


In [None]:
# make a prediction with the model
yhat = model.predict(X)

# reshape output to remove channel to make printing easier
yhat = yhat.reshape((4, 4))

# summarize output
print(yhat)

The full example is typed below: 

In [None]:
# example of using the transpose convolutional layer
from numpy import asarray
from keras.models import Sequential
from keras.layers import Conv2DTranspose
# define input data
X = asarray([[1, 2],
			 [3, 4]])
# show input data for context
print(X)
# reshape input data into one sample with one channel
X = X.reshape((1, 2, 2, 1))
# define model
model = Sequential()
model.add(Conv2DTranspose(1, (1,1), strides=(2,2), input_shape=(2, 2, 1)))
# summarize the model
model.summary()
# define weights so that they do nothing
weights = [asarray([[[[1]]]]), asarray([0])]
# store the weights in the model
model.set_weights(weights)
# make a prediction with the model
yhat = model.predict(X)
# reshape output to remove channel to make printing easier
yhat = yhat.reshape((4, 4))
# summarize output
print(yhat)

[[1 2]
 [3 4]]
Model: "sequential_10"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_transpose_2 (Conv2DTr (None, 4, 4, 1)           2         
Total params: 2
Trainable params: 2
Non-trainable params: 0
_________________________________________________________________
[[1. 0. 2. 0.]
 [0. 0. 0. 0.]
 [3. 0. 4. 0.]
 [0. 0. 0. 0.]]


Running the example first creates and summarizes our 2×2 input data.

Next, the model is summarized. We can see that it will output a 4×4 result as we expect, and importantly, the layer has two parameters or model weights: one for the single 1×1 filter and one for the bias. Unlike the UpSampling2D layer, the Conv2DTranspose will learn during training and will attempt to fill in detail as part of the upsampling process.

Finally, the model is used to upsample our input. We can see that the cells that involve real input values reproduce those values in the output (e.g. 1×1, 1×2, etc.). Where new rows and columns have been inserted by the stride of 2×2, their 0.0 values multiplied by the 1.0 weight of the single 1×1 filter result in 0 values in the output.

Remember: this is a contrived case where we artificially specified the model weights so that we could see the effect of the transpose convolutional operation.

In practice, we will use a large number of filters (e.g. 64 or 128), a larger kernel (e.g. 3×3, 5×5, etc.), and the layer will be initialized with random weights that will learn how to effectively upsample with detail during training.

In fact, you might imagine how different sized kernels will result in different sized outputs, more than doubling the width and height of the input. In this case, the "padding" argument of the layer can be set to "same" to force the output to have the desired (doubled) output shape. The cell below appends such a layer to the model from the full example, so each added layer doubles the output of the layer before it; for example:

In [None]:
# example of using padding to ensure that the output is only doubled
model.add(Conv2DTranspose(1, (3,3), strides=(2,2), padding='same', input_shape=(2, 2, 1)))
model.summary()

Model: "sequential_10"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_transpose_2 (Conv2DTr (None, 4, 4, 1)           2         
_________________________________________________________________
conv2d_transpose_3 (Conv2DTr (None, 8, 8, 1)           10        
_________________________________________________________________
conv2d_transpose_4 (Conv2DTr (None, 16, 16, 1)         10        
Total params: 22
Trainable params: 22
Non-trainable params: 0
_________________________________________________________________
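
The relationship between kernel size, stride, padding and output size can be written as a small helper. This is a sketch with a made-up function name; the formulas are an assumption that agrees with the shapes shown in this notebook (4x4 for a 1x1 kernel, 11x11 versus 10x10 for a 3x3 kernel on a 5x5 input).

```python
def transpose_conv_output_length(input_length, kernel_size, stride, padding):
    """Output length along one dimension (hypothetical helper)."""
    if padding == 'same':
        # 'same' padding: output is exactly input * stride
        return input_length * stride
    if padding == 'valid':
        # 'valid' padding: extra cells appear when the kernel is larger than the stride
        return input_length * stride + max(kernel_size - stride, 0)
    raise ValueError("padding must be 'same' or 'valid'")

print(transpose_conv_output_length(2, 1, 2, 'valid'))
print(transpose_conv_output_length(5, 3, 2, 'valid'))
print(transpose_conv_output_length(5, 3, 2, 'same'))
```

The three calls correspond to the 2x2 contrived example, a 3x3 kernel without padding, and the same kernel with "same" padding.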


# Simple Generator Model With the Conv2DTranspose Layer

The Conv2DTranspose is more complex than the UpSampling2D layer, but it is also effective when used in GAN models, specifically the generator model.

Either approach can be used, although the Conv2DTranspose layer is preferred, perhaps because of the simpler generator models and possibly better results, although GAN performance and skill are notoriously difficult to quantify.

We can demonstrate using the Conv2DTranspose layer in a generator model with another simple example.

In this case, our little GAN generator model must produce a 10×10 image and take a 100-element vector from the latent space as input, as in the previous UpSampling2D example.

First, a Dense fully connected layer can be used to interpret the input vector and create a sufficient number of activations (outputs) that can be reshaped into a low-resolution version of our output image, in this case, 128 versions of a 5×5 image.

In [None]:
# define model
model = Sequential()
# define input shape, output enough activations for 128 5x5 feature maps
model.add(Dense(128 * 5 * 5, input_dim=100))
# reshape vector of activations into 128 feature maps with 5x5
model.add(Reshape((5, 5, 128)))

Next, the 5×5 feature maps can be upsampled to a 10×10 feature map.

We will use a 3×3 kernel size for the single filter, which will result in a slightly larger than doubled width and height in the output feature map (11×11).

Therefore, we will set "padding" to "same" to ensure the output dimensions are 10×10 as required.

In [None]:
# double input from 128 5x5 to 1 10x10 feature map
model.add(Conv2DTranspose(1, (3,3), strides=(2,2), padding='same'))

Tying this together, the complete example is listed below.

In [None]:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Reshape
from keras.layers import Conv2DTranspose
from keras.layers import Conv2D
# define model
model = Sequential()
# define input shape, output enough activations for 128 5x5 feature maps
model.add(Dense(128 * 5 * 5, input_dim=100))
# reshape vector of activations into 128 feature maps with 5x5
model.add(Reshape((5, 5, 128)))
# double input from 128 5x5 to 1 10x10 feature map
model.add(Conv2DTranspose(1, (3,3), strides=(2,2), padding='same'))
# summarize model
model.summary()

Model: "sequential_12"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_3 (Dense)              (None, 3200)              323200    
_________________________________________________________________
reshape_3 (Reshape)          (None, 5, 5, 128)         0         
_________________________________________________________________
conv2d_transpose_6 (Conv2DTr (None, 10, 10, 1)         1153      
Total params: 324,353
Trainable params: 324,353
Non-trainable params: 0
_________________________________________________________________


Running the example creates the model and summarizes the output shape of each layer.

We can see that the Dense layer outputs 3,200 activations that are then reshaped into 128 feature maps with the shape 5×5.

The widths and heights are doubled to 10×10 by the Conv2DTranspose layer, resulting in a single feature map with quadruple the area.
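
The parameter counts in the summary can be checked by hand. The arithmetic below is a sketch using the layer sizes from this example; the variable names are chosen for illustration.

```python
latent_dim = 100
n_maps, height, width = 128, 5, 5
dense_units = n_maps * height * width

# Dense: one weight per (input, output) pair plus one bias per output
dense_params = latent_dim * dense_units + dense_units

# Conv2DTranspose: kernel rows * kernel columns * input channels * filters,
# plus one bias per filter
kernel_h, kernel_w, filters = 3, 3, 1
conv_transpose_params = kernel_h * kernel_w * n_maps * filters + filters

print(dense_params)
print(conv_transpose_params)
print(dense_params + conv_transpose_params)
```

Almost all of the weights sit in the Dense layer; the transpose convolution itself contributes only 1,153 of the 324,353 parameters.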