<a href="https://colab.research.google.com/github/rahiakela/generative-adversarial-networks-with-python/blob/part-1-foundations/1_upsampling_and_transpose_convolutional_for_generative_adversarial_networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Upsampling and Transpose Convolutional Layers for Generative Adversarial Networks

The GAN architecture comprises both a generator and a discriminator model. The generator is responsible for creating new outputs, such as images, that plausibly could have come from the original dataset. The generator model is typically implemented using a deep convolutional neural network with specialized layers that learn to fill in features in an image rather than extract features from an input image.

Two common types of layers that can be used in the generator model are an upsampling layer, which simply repeats the rows and columns of the input (doubling its dimensions by default), and the transpose convolutional layer, which performs a convolution in the backwards direction and learns its weights during training.

In this notebook, we will discover how to use upsampling and transpose convolutional layers in Generative Adversarial Networks when generating images.

After completing this guide, we will know:

* Generative models in the GAN architecture are required to upsample input data in order to generate an output image.

* The Upsampling layer is a simple layer with no weights that will double the dimensions of the input and can be used in a generative model when followed by a traditional convolutional layer.

* The Transpose Convolutional layer works like a convolutional layer run in reverse: it will both upsample the input and learn how to fill in details during the model training process.

## Setup

In [10]:
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

# Scikit-Learn ≥0.20 is required
import sklearn
assert tuple(int(p) for p in sklearn.__version__.split(".")[:2]) >= (0, 20)  # compare numerically, not as strings

try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
    IS_COLAB = True
except Exception:
    IS_COLAB = False

# TensorFlow ≥2.0 is required
import tensorflow as tf
from tensorflow import keras
assert tuple(int(p) for p in tf.__version__.split(".")[:2]) >= (2, 0)  # compare numerically, not as strings

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import UpSampling2D, Dense, Reshape, Conv2D

if not tf.config.list_physical_devices('GPU'):
    print("No GPU was detected. LSTMs and CNNs can be very slow without a GPU.")
    if IS_COLAB:
        print("Go to Runtime > Change runtime and select a GPU hardware accelerator.")

# Common imports
import numpy as np
import os

# to make this notebook's output stable across runs
np.random.seed(42)
tf.random.set_seed(42)

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

No GPU was detected. LSTMs and CNNs can be very slow without a GPU.
Go to Runtime > Change runtime and select a GPU hardware accelerator.


## Need for Upsampling in GANs

Generative Adversarial Networks are a neural network architecture for training a generative model. The architecture comprises a generator and a discriminator model, each of which is implemented as a deep convolutional neural network.

The discriminator is responsible for classifying images as either real (from the domain) or fake (generated). 

The generator is responsible for generating new plausible examples from the problem domain. The generator works by taking a random point from the latent space as input and outputting a complete image, in a one-shot manner.

A traditional convolutional neural network for image classification, and related tasks, will use pooling layers to downsample input images. For example, an average pooling or max pooling layer with a 2 × 2 window will reduce the feature maps from a convolutional layer by half on each dimension, resulting in an output that is one quarter the area of the input.
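
The halving on each dimension can be checked with a small NumPy sketch. The helper `max_pool_2x2` is a hypothetical illustration written for this note, not a Keras function, and it assumes a single 2D feature map with even height and width.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling on a 2D array with even height and width."""
    h, w = feature_map.shape
    # Split the map into 2x2 blocks, then keep the maximum of each block.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.arange(16).reshape(4, 4)
pooled = max_pool_2x2(feature_map)

print(feature_map.shape, "->", pooled.shape)  # (4, 4) -> (2, 2)
print(pooled)
```

Each output dimension is half of the input dimension, so the pooled map holds one quarter as many values.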

Convolutional layers themselves also perform a form of downsampling by applying each filter across the input images or feature maps; the resulting activations are an output feature map that is smaller because of the border effects. Often padding is used to counter this effect.
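
As a rough illustration of the border effect, the standard output-size formula for a convolution can be written as a small helper. The function name `conv_output_size` and the 8-pixel input are assumptions chosen for the example.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one dimension for a convolution.

    n: input length, kernel: filter length, padding: zeros added on each side.
    """
    return (n + 2 * padding - kernel) // stride + 1

# A 3x3 filter on an 8x8 input without padding loses one pixel on each border.
valid = conv_output_size(8, 3)
# Padding by one pixel on each side keeps the original size.
same = conv_output_size(8, 3, padding=1)

print(valid, same)  # 6 8
```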

The generator model in a GAN requires the inverse of the pooling operation used in a traditional convolutional network. It needs a layer to translate from coarse salient features to a more dense and detailed output.

A simple version of an unpooling, or opposite pooling, layer is called an upsampling layer. It works by repeating the rows and columns of the input. A more elaborate approach is to perform a backwards convolutional operation. This was originally referred to as a deconvolution, which is a misnomer; it is more commonly referred to as a fractionally strided convolutional layer or a transposed convolutional layer.

## How to Use the Upsampling Layer

Perhaps the simplest way to upsample an input is to double each row and column. 

For example, an input image with the shape 2 × 2 would be output as 4 × 4.

```python
image = [
  [1, 2],
  [3, 4]
]

upsampled = [
  [1, 1, 2, 2],
  [1, 1, 2, 2],
  [3, 3, 4, 4],
  [3, 3, 4, 4]
]
```
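
The same doubling can be sketched in plain NumPy before involving Keras. This is only an illustration of the idea using `np.repeat`; it is not how the Keras layer is implemented internally.

```python
import numpy as np

X = np.array([
    [1, 2],
    [3, 4]
])

# Repeat every row twice (axis 0), then every column twice (axis 1).
upsampled = np.repeat(np.repeat(X, 2, axis=0), 2, axis=1)

print(upsampled.shape)  # (4, 4)
print(upsampled)
```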

### Worked Example Using the UpSampling2D Layer

The Keras deep learning library provides this capability in a layer called UpSampling2D. It can be added to a convolutional neural network and repeats the rows and columns provided as input in the output.

**Step-1**

First, we can define a contrived input image that is 2 × 2 pixels. We can use specific values for each pixel so that after upsampling, we can see exactly what effect the operation had on the input.

**Step-2**

Once the image is defined, we must add a channel dimension (e.g. grayscale) and also a sample dimension (e.g. we have 1 sample) so that we can pass it as input to the model. The data dimensions in order are: samples, rows, columns, and channels.

**Step-3**

We can now define our model. The model has only the UpSampling2D layer which takes 2 × 2 grayscale images as input directly and outputs the result of the upsampling operation.

**Step-4**

We can then use the model to make a prediction, that is, upsample a provided input image.

**Step-5**

The output will have four dimensions, like the input. Therefore, we can convert it back to a two-dimensional 4 × 4 array to make it easier to review the result.

Tying all of this together, the complete example of using the UpSampling2D layer in Keras is provided below.

In [3]:
# step-1: define input data
X = np.array([
    [1, 2],
    [3, 4]          
])
print(X)

# step-2: reshape input data into one sample with one channel
X = X.reshape((1, 2, 2, 1))

# step-3: define model
model = Sequential()
model.add(UpSampling2D(input_shape=(2, 2, 1)))
model.summary()

# step-4: make a prediction with the model
yhat = model.predict(X)

# step-5: reshape output to remove channel to make printing easier
yhat = yhat.reshape((4, 4))
print(yhat)

[[1 2]
 [3 4]]
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
up_sampling2d (UpSampling2D) (None, 4, 4, 1)           0         
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
[[1. 1. 2. 2.]
 [1. 1. 2. 2.]
 [3. 3. 4. 4.]
 [3. 3. 4. 4.]]


We can see that it will output a 4 × 4 result as we expect, and importantly, the
layer has no parameters or model weights. This is because it is not learning anything; it is just doubling the input. 

Finally, the model is used to upsample our input, resulting in a doubling of
each row and column for our input data, as we expected.

By default, the UpSampling2D layer will double each input dimension. This is defined by the size argument, which is set to the tuple (2, 2). You may want to use different factors on each dimension, such as double the height (rows) and triple the width (columns).

This could be achieved by setting the size argument to (2, 3). The result of applying this operation to a 2 × 2 image would be a 4 × 6 output image (i.e. 2 × 2 rows and 2 × 3 columns).

In [8]:
# step-1: define input data
X = np.array([
    [1, 2],
    [3, 4]          
])
print(X)

# step-2: reshape input data into one sample with one channel
X = X.reshape((1, 2, 2, 1))

# step-3: define model
model = Sequential()
model.add(UpSampling2D(input_shape=(2, 2, 1), size=(2, 3)))
model.summary()

# step-4: make a prediction with the model
yhat = model.predict(X)

# step-5: reshape output to remove channel to make printing easier
yhat = yhat.reshape((4, 6))
print(yhat)

[[1 2]
 [3 4]]
Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
up_sampling2d_5 (UpSampling2 (None, 4, 6, 1)           0         
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
[[1. 1. 1. 2. 2. 2.]
 [1. 1. 1. 2. 2. 2.]
 [3. 3. 3. 4. 4. 4.]
 [3. 3. 3. 4. 4. 4.]]
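
The printed result can be cross-checked without Keras. One way to sketch nearest-neighbor upsampling with different factors per dimension is a Kronecker product with a block of ones; this is an equivalent formulation chosen for illustration, not the layer's actual implementation.

```python
import numpy as np

X = np.array([
    [1, 2],
    [3, 4]
])

# Each input value is expanded into a 2-row by 3-column block of copies.
size = (2, 3)
upsampled = np.kron(X, np.ones(size, dtype=X.dtype))

print(upsampled.shape)  # (4, 6)
print(upsampled)
```

The first factor applies to rows and the second to columns, which matches the 4 × 6 output shape reported in the model summary.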


Additionally, by default, the UpSampling2D layer will use a nearest neighbor algorithm to fill in the new rows and columns. This has the effect of simply repeating rows and columns, as described, and is specified by the interpolation argument set to ‘nearest’.

Alternatively, a bilinear interpolation method can be used, which draws upon multiple surrounding points. This can be specified by setting the interpolation argument to ‘bilinear’.

In [9]:
# step-1: define input data
X = np.array([
    [1, 2],
    [3, 4]          
])
print(X)

# step-2: reshape input data into one sample with one channel
X = X.reshape((1, 2, 2, 1))

# step-3: define model
model = Sequential()
model.add(UpSampling2D(input_shape=(2, 2, 1), size=(2, 3), interpolation='bilinear'))
model.summary()

# step-4: make a prediction with the model
yhat = model.predict(X)

# step-5: reshape output to remove channel to make printing easier
yhat = yhat.reshape((4, 6))
print(yhat)

[[1 2]
 [3 4]]
Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
up_sampling2d_6 (UpSampling2 (None, 4, 6, 1)           0         
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
[[1.        1.        1.3333334 1.6666667 2.        2.       ]
 [1.5       1.5       1.8333334 2.1666667 2.5       2.5      ]
 [2.5       2.5       2.8333335 3.1666667 3.5       3.5      ]
 [3.        3.        3.3333335 3.6666667 4.        4.       ]]
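
To see where these values come from, the following NumPy sketch reproduces them under one assumption: that output pixels are sampled at half-pixel centers and clamped at the borders. That convention appears consistent with the numbers printed above, but the helper `bilinear_weights` is hypothetical and is not part of Keras.

```python
import numpy as np

def bilinear_weights(n_in, scale):
    """Build an (n_in * scale, n_in) interpolation matrix for one dimension."""
    n_out = n_in * scale
    # Map each output index to a fractional input coordinate (half-pixel centers).
    src = (np.arange(n_out) + 0.5) / scale - 0.5
    # Clamp coordinates that fall outside the input to the nearest border pixel.
    src = np.clip(src, 0, n_in - 1)
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, n_in - 1)
    frac = src - lo
    weights = np.zeros((n_out, n_in))
    rows = np.arange(n_out)
    weights[rows, lo] += 1 - frac
    weights[rows, hi] += frac
    return weights

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Interpolate rows by a factor of 2 and columns by a factor of 3.
row_weights = bilinear_weights(2, 2)
col_weights = bilinear_weights(2, 3)
upsampled = row_weights @ X @ col_weights.T

print(np.round(upsampled, 4))
```

Because bilinear interpolation is separable, applying one weight matrix per dimension is enough; each output value is a weighted average of at most four input pixels.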


### Simple Generator Model With the UpSampling2D Layer

The UpSampling2D layer is simple and effective, although it does not perform any learning. It is not able to fill in useful detail in the upsampling operation. To be useful in a GAN, each UpSampling2D layer must be followed by a Conv2D layer that will learn to interpret the doubled input and be trained to translate it into meaningful detail.

In this example, our little GAN generator model must produce a 10 × 10 image as output and take a 100-element vector of random numbers from the latent space as input.

First, a Dense fully connected layer can be used to interpret the input vector and create a sufficient number of activations (outputs) that can be reshaped into a low-resolution version of our output image, in this case, 128 versions of a 5 × 5 image.

Next, the 5 × 5 feature maps can be upsampled to a 10 × 10 feature map.

Finally, the upsampled feature maps can be interpreted and filled in with hopefully useful detail by a Conv2D layer. The Conv2D has a single feature map as output to create the single image we require.

Tying this together, the complete example is listed below.

In [11]:
# define model
model = Sequential()

# define input shape, output enough activations for 128 5x5 feature maps
model.add(Dense(128 * 5 * 5, input_dim=100))

# reshape vector of activations into 128 feature maps of size 5x5
model.add(Reshape((5, 5, 128)))

# double each 5x5 feature map to 10x10 (still 128 feature maps)
model.add(UpSampling2D())

# fill in detail in the upsampled feature maps and output a single image
model.add(Conv2D(1, (3, 3), padding='SAME'))

model.summary()

Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 3200)              323200    
_________________________________________________________________
reshape (Reshape)            (None, 5, 5, 128)         0         
_________________________________________________________________
up_sampling2d_7 (UpSampling2 (None, 10, 10, 128)       0         
_________________________________________________________________
conv2d (Conv2D)              (None, 10, 10, 1)         1153      
Total params: 324,353
Trainable params: 324,353
Non-trainable params: 0
_________________________________________________________________


We can see that the Dense layer outputs 3,200 activations that are then reshaped into 128 feature maps with the shape 5 × 5. The widths and heights are doubled to 10 × 10 by the UpSampling2D layer, resulting in feature maps with quadruple the area.

Finally, the Conv2D processes these feature maps and adds in detail, outputting a single 10 × 10 image.
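
The parameter counts in the summary can be verified by hand. The short calculation below assumes the usual weights-plus-biases accounting for Dense and Conv2D layers; the variable names are chosen only for this illustration.

```python
latent_dim = 100
n_maps, height, width = 128, 5, 5

# Dense: one weight per (input, unit) pair plus one bias per unit.
dense_units = n_maps * height * width
dense_params = latent_dim * dense_units + dense_units

# Conv2D: a 3x3 kernel over 128 input channels for 1 output filter, plus 1 bias.
conv_params = 3 * 3 * n_maps * 1 + 1

# UpSampling2D and Reshape contribute no parameters.
total_params = dense_params + conv_params

# Upsampling doubles the height and width and leaves the channel count unchanged.
upsampled_shape = (height * 2, width * 2, n_maps)

print(dense_units, dense_params, conv_params, total_params)  # 3200 323200 1153 324353
print(upsampled_shape)  # (10, 10, 128)
```

Almost all of the weights sit in the Dense layer; the upsampling step itself adds nothing to learn, which is why the following Conv2D layer is needed.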

## How to Use the Transpose Convolutional Layer