#### Copyright 2019 Google LLC.

In [0]:
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Autoencoders

An **autoencoder** is a type of neural network used to learn an efficient representation, or encoding, for a set of data. The advantages of using these learned encodings are similar to those of word embeddings; they reduce the dimension of the feature space and can capture similarities between different inputs. Autoencoders are a useful *unsupervised* learning method, as they do not require any ground truth labels to train.

This notebook is based on [this tutorial](https://github.com/mrdragonbear/Autoencoders/blob/master/Autoencoder-Tutorial.ipynb) and [this Keras example](https://www.kaggle.com/vikramtiwari/autoencoders-using-tf-keras-mnist).

## Data

We will use the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset, which contains images of handwritten digits (0, 1, 2, etc.). This dataset has 60,000 training examples and 10,000 testing examples.

In [0]:
# Set random seeds for reproducible results.
import numpy as np
import tensorflow as tf

np.random.seed(42)
tf.random.set_seed(42)

In [0]:
# Load dataset using keras data loader.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

Each image in the dataset is 28 x 28 pixels. Let's flatten each to a 1-dimensional vector of length 784.

In [0]:
image_size = x_train.shape[1]
original_dim = image_size * image_size
# Flatten each image into a 1-d vector.
x_train = np.reshape(x_train, [-1, original_dim])
x_test = np.reshape(x_test, [-1, original_dim])

# Rescale pixel values to a 0-1 range.
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

print('x_train:', x_train.shape)
print('x_test:', x_test.shape)

## Autoencoder Structure


<a title="Chervinskii [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons" href="https://commons.wikimedia.org/wiki/File:Autoencoder_structure.png"><img width="512" alt="Autoencoder structure" src="https://upload.wikimedia.org/wikipedia/commons/2/28/Autoencoder_structure.png"></a>

Source: [Wikipedia](https://en.wikipedia.org/wiki/Autoencoder)

An autoencoder works by learning to output a copy of its input, after passing the input through one or more smaller hidden layers. The output of the smallest hidden layer is an encoding, or "code", used to represent the input. An autoencoder has two main parts: an **encoder** that maps the input into the code, and a **decoder** that maps the code back to a reconstruction of the original input. This structure forces the hidden layer to learn a more efficient, useful representation of the input data (also called a "latent representation").
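
To make the encoder / decoder split concrete, here is a minimal NumPy sketch of a single forward pass. The weights are random and untrained, and the names `encode`, `decode`, `w_enc`, and `w_dec` are assumptions made for illustration; it only shows how the shapes change, not how the network learns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sizes chosen for illustration: a flattened 28 x 28 image and a small code.
input_dim, code_dim = 784, 36

# Randomly initialized (untrained) weights; a real autoencoder learns these.
w_enc = rng.normal(scale=0.01, size=(input_dim, code_dim))
w_dec = rng.normal(scale=0.01, size=(code_dim, input_dim))

def encode(x):
    # Map the input to the smaller code, with a ReLU non-linearity.
    return np.maximum(x @ w_enc, 0.0)

def decode(code):
    # Map the code back to the original input space.
    return code @ w_dec

x = rng.random(size=(3, input_dim))  # 3 fake flattened images
code = encode(x)
reconstruction = decode(code)

print(x.shape, code.shape, reconstruction.shape)  # (3, 784) (3, 36) (3, 784)
```

The reconstruction has the same shape as the input, while everything it contains had to pass through the much smaller code.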

## Basic Model

Below is an example of a simple autoencoder that maps the 784-dimensional input image to a 36-dimensional latent representation, then attempts to reconstruct the 784-dimensional input image from that encoded representation. 

Instead of `keras.models.Sequential`, we'll use `keras.models.Model` to more clearly show the encoder and decoder parts of the autoencoder as individual models. This will also make it easier to extract the latent representations from the encoder. The `Sequential` API is usually easier to use while the `Model` API is more flexible. You can read more about their differences [here](https://keras.io/models/about-keras-models/).

In [0]:
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

### Encoder

In [0]:
latent_dim = 36

# Input layer (needed for the Model API).
input_layer = Input(shape=(original_dim,), name='encoder_input')

# Notice that with all layers except for the first,
# we need to specify which layer is used as input.
latent_layer = Dense(latent_dim, activation='relu',
                     name='latent_layer')(input_layer)

encoder = Model(input_layer, latent_layer, name='encoder')
encoder.summary()

### Decoder

In [0]:
latent_inputs = Input(shape=(latent_dim,), name='decoder_input')
output_layer = Dense(original_dim, name='decoder_output')(latent_inputs)

decoder = Model(latent_inputs, output_layer, name='decoder')
decoder.summary()
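
The parameter counts reported by `summary()` can be checked by hand. The sketch below assumes each `Dense` layer has one weight per input-unit pair plus one bias per unit; `dense_params` is a hypothetical helper, not part of Keras.

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit.
    return n_inputs * n_units + n_units

n_pixels = 784  # flattened 28 x 28 image
n_latent = 36   # size of the latent representation

encoder_params = dense_params(n_pixels, n_latent)
decoder_params = dense_params(n_latent, n_pixels)

print(encoder_params)                   # 28260
print(decoder_params)                   # 29008
print(encoder_params + decoder_params)  # 57268
```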

### Training

The full autoencoder passes the inputs to the encoder, then the latent representations from the encoder to the decoder. We'll use the Adam optimizer and Mean Squared Error loss.

In [0]:
autoencoder = Model(
    input_layer,
    decoder(encoder(input_layer)),
    name="autoencoder"
)

autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.summary()
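
For intuition about the loss, here is a small NumPy sketch of mean squared error between an input and its reconstruction. The toy arrays and the `mean_squared_error` helper are assumptions for illustration; Keras computes its own `'mse'` loss internally.

```python
import numpy as np

def mean_squared_error(original, reconstruction):
    # Average of the squared per-pixel differences.
    return np.mean((original - reconstruction) ** 2)

original = np.array([[0.0, 0.5, 1.0, 0.0]])
perfect = original.copy()
blurry = np.array([[0.1, 0.4, 0.8, 0.1]])

print(mean_squared_error(original, perfect))  # 0.0
print(mean_squared_error(original, blurry))   # approximately 0.0175
```

A perfect copy gives a loss of zero, and the loss grows as the reconstruction drifts away from the input.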

We will train for 50 epochs, using [`EarlyStopping`](https://keras.io/callbacks/#earlystopping) to stop training early if validation loss improves by less than 0.0001 for 10 consecutive epochs. Using a batch size of 2048, this should take 1-2 minutes to train.

In [0]:
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    # minimum change in loss that qualifies as "improvement"
    # higher values of min_delta lead to earlier stopping
    min_delta=0.0001,
    # threshold for number of epochs with no improvement
    patience=10,
    verbose=1
)
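
The stopping rule can be sketched in plain Python. This is a simplified illustration of how `min_delta` and `patience` interact, not the Keras implementation; the function name `early_stopping_epoch` and the sample losses are assumptions made for the example.

```python
def early_stopping_epoch(val_losses, min_delta=0.0001, patience=10):
    """Returns the 0-based epoch at which training would stop, or None."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # The loss improved by more than min_delta: reset the counter.
            best = loss
            wait = 0
        else:
            # No meaningful improvement this epoch.
            wait += 1
            if wait >= patience:
                return epoch
    return None

# The loss keeps falling, but by less than min_delta after epoch 1.
plateau = [1.0, 0.5, 0.49999, 0.49998, 0.49997]
print(early_stopping_epoch(plateau, patience=3))  # 4

# A loss that improves clearly every epoch never triggers the rule.
improving = [1.0, 0.8, 0.6, 0.4, 0.2]
print(early_stopping_epoch(improving, patience=3))  # None
```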

In [0]:
autoencoder.fit(
    # input
    x_train,
    # output
    x_train,
    epochs=50,
    batch_size=2048,
    validation_data=(x_test, x_test),
    callbacks=[early_stopping]
)
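
As a rough guide to how much work each epoch involves, the number of weight updates follows from the dataset and batch sizes. This sketch assumes the final, smaller batch is also used rather than dropped.

```python
import math

n_train = 60000    # MNIST training examples
batch_size = 2048
max_epochs = 50

steps_per_epoch = math.ceil(n_train / batch_size)

print(steps_per_epoch)               # 30
print(steps_per_epoch * max_epochs)  # 1500 updates if no early stopping
```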

### Visualize Predictions

In [0]:
decoded_imgs = autoencoder.predict(x_test)

In [0]:
import matplotlib.pyplot as plt

def visualize_imgs(nrows, axis_names, images, sizes, n=10):
  '''
  Plots images in a grid layout.

  nrows: number of rows of images to display
  axis_names: list of names for each row
  images: list of arrays of images
  sizes: list of image sizes to display for each row
  n: number of images to display per row (default 10)

  nrows = len(axis_names) = len(images)
  '''
  fig, axes = plt.subplots(figsize=(20,4), nrows=nrows, ncols=1, sharey=False)
  for i in range(nrows):
    axes[i].set_title(axis_names[i], fontsize=16)
    axes[i].axis('off')

  for col in range(n):
    for i in range(nrows):
      ax = fig.add_subplot(nrows, n, col + 1 + i * n)
      plt.imshow(images[i][col].reshape(sizes[i], sizes[i]))
      plt.gray()
      ax.get_xaxis().set_visible(False)
      ax.get_yaxis().set_visible(False)

  fig.tight_layout()
  plt.show()

In [0]:
visualize_imgs(
    2,
    ['Original Images', 'Reconstructions'],
    [x_test, decoded_imgs],
    [image_size, image_size]
)

This shows 10 original images with their corresponding reconstructed images directly below. Clearly, our autoencoder captured the basic digit structure of each image, though the reconstructed images are less sharp.

## Application: Image Compression

Autoencoders have been used extensively in image compression and processing. An autoencoder can create higher resolution images from low-resolution images, and even colorize black and white images.

To see how autoencoders can be used to compress images, we can use our already trained encoder as an image compressor. You can think of the decoder as a decompressor, reconstructing the original image from the compressed one.

In [0]:
# Compress original images.
encoded_imgs = encoder.predict(x_test)
# Reconstruct original images.
decoded_imgs = decoder.predict(encoded_imgs)

In [0]:
visualize_imgs(
    3,
    ['Original Images', '36-dimensional Latent Representation', 'Reconstructions'],
    [x_test, encoded_imgs, decoded_imgs],
    [image_size, 6, image_size]
)

The middle row shows the latent representation that the autoencoder learned for each image, drawn as a 6 x 6 grid. Since the encoder reduces each 784-dimensional original image to a 36-dimensional vector, it essentially performs image compression.
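
The amount of compression can be worked out directly. The byte counts below assume the original pixels are stored as 1-byte `uint8` values and the latent values as 4-byte `float32` values; other storage formats would give different ratios.

```python
n_pixels = 784  # 28 x 28 original image
n_latent = 36   # 6 x 6 latent representation

# Ratio in number of values.
value_ratio = n_pixels / n_latent
print(round(value_ratio, 1))  # 21.8

# Ratio in bytes, under the storage assumptions stated above.
original_bytes = n_pixels * 1
latent_bytes = n_latent * 4
byte_ratio = original_bytes / latent_bytes
print(round(byte_ratio, 1))   # 5.4
```

Keep in mind that this compression is lossy: the reconstructions are close to the originals but not identical.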

## Application: Image Denoising

Autoencoders can also "de-noise" images, such as poorly scanned pictures, and even partially damaged and destroyed paper documents ([Kaggle dataset](https://www.kaggle.com/c/denoising-dirty-documents)). To train a de-noising autoencoder, we must first add noise to the images. 

*Note: "Noise" refers to something that interferes with the quality of original input, such as static in an image or a partially jumbled message.*

### Add Noise

[imgaug](https://github.com/aleju/imgaug) is a useful package for performing various image augmentations. Many of the `arithmetic` functions in the package simulate adding noise to an image. We'll use the [`SaltAndPepper`](https://imgaug.readthedocs.io/en/latest/source/api_augmenters_arithmetic.html#imgaug.augmenters.arithmetic.SaltAndPepper) technique.

*Note: This will take slightly under a minute to run on the full training and testing sets.*

In [0]:
from imgaug import augmenters

# Reshape images to 3-dimensional for augmenter. Since the images were
# originally 2-dimensional, the third dimension is just 1.
x_train = x_train.reshape(-1, image_size, image_size, 1)
x_test = x_test.reshape(-1, image_size, image_size, 1)
  
# p is the probability of changing a pixel to noise.
# higher values of p mean noisier images.
noise = augmenters.SaltAndPepper(p=0.6)
# We could chain multiple augmenters using Sequential.
seq = augmenters.Sequential([noise])

# Rescale pixel values to 0-255 (instead of 0-1) for augmenter,
# add noise to images, then rescale pixel values back to 0-1.
x_train_noise = seq.augment_images(x_train * 255) / 255
x_test_noise = seq.augment_images(x_test * 255) / 255
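
For intuition about what salt-and-pepper noise does, here is a simplified NumPy sketch. It is not how `imgaug` implements `SaltAndPepper`; the `salt_and_pepper` function, the 50/50 split between white and black pixels, and the random stand-in images are all assumptions made for illustration.

```python
import numpy as np

def salt_and_pepper(images, p, rng):
    # Choose which pixels to corrupt, each independently with probability p.
    corrupt = rng.random(images.shape) < p
    # Corrupted pixels become white (1.0, "salt") or black (0.0, "pepper").
    salt = rng.random(images.shape) < 0.5
    noise = np.where(salt, 1.0, 0.0)
    return np.where(corrupt, noise, images).astype(images.dtype)

rng = np.random.default_rng(42)
# Random stand-in for 4 images with pixel values in the 0-1 range.
clean = rng.random((4, 28, 28, 1)).astype('float32')
noisy = salt_and_pepper(clean, p=0.6, rng=rng)

# Fraction of pixels that were changed; this should be close to p.
changed = float(np.mean(noisy != clean))
print(noisy.shape, round(changed, 2))
```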

For comparison, here is what 5 images look like before we add noise:

In [0]:
f, ax = plt.subplots(figsize=(20,2), nrows=1, ncols=5)
for i in range(5, 10):
    ax[i-5].imshow(x_train[i].reshape(image_size, image_size))
plt.show()

After we add noise, the images look like this:

In [0]:
f, ax = plt.subplots(figsize=(20,2), nrows=1, ncols=5)
for i in range(5, 10):
    ax[i-5].imshow(x_train_noise[i].reshape(image_size, image_size))
plt.show()

As you can see, the images are quite noisy and difficult to read, even for the human eye. Luckily, autoencoders are much better at this task. We'll follow a similar architecture as before, but this time we'll train the model using the *noisy* images as input and the *original, un-noisy* images as output.

### Encoder

We will need a more sophisticated encoder / decoder architecture to handle the more complex problem. The encoder will use 3 `Conv2D` layers, with a decreasing number of output filters (64, 32, then 16) and a `MaxPool2D` layer after each. This has the desired effect of compressing or "downsampling" the image.

Since we are using convolutional layers, we can work directly with the 3-dimensional images.

In [0]:
from tensorflow.keras.layers import Conv2D, MaxPool2D, UpSampling2D

filter_1 = 64
filter_2 = 32
filter_3 = 16
kernel_size = (3, 3)
pool_size = (2, 2)
latent_dim = 4

In [0]:
input_layer = Input(shape=(image_size, image_size, 1))
# First convolutional layer
encoder_conv1 = Conv2D(filter_1, kernel_size,
                        activation='relu', padding='same')(input_layer)
encoder_pool1 = MaxPool2D(pool_size, padding='same')(encoder_conv1)
# Second convolutional layer
encoder_conv2 = Conv2D(filter_2, kernel_size, activation='relu',
                       padding='same')(encoder_pool1)
encoder_pool2 = MaxPool2D(pool_size, padding='same')(encoder_conv2)
# Third convolutional layer
encoder_conv3 = Conv2D(filter_3, kernel_size,
                       activation='relu', padding='same')(encoder_pool2)
latent_layer = MaxPool2D(pool_size, padding='same')(encoder_conv3)

encoder_denoise = Model(input_layer, latent_layer, name='encoder')
encoder_denoise.summary()
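
The value `latent_dim = 4` follows from how the image size shrinks through the encoder. The sketch below traces the spatial size by hand, assuming each `Conv2D` with `padding='same'` keeps the size and each `MaxPool2D` with `padding='same'` halves it, rounding up; `pool_same` is a hypothetical helper.

```python
import math

def pool_same(size, pool=2):
    # Pooling with padding='same' and a stride equal to the pool size
    # rounds the output size up.
    return math.ceil(size / pool)

size = 28
sizes = [size]
for _ in range(3):          # three Conv2D + MaxPool2D stages
    size = pool_same(size)  # Conv2D with padding='same' keeps the size
    sizes.append(size)

print(sizes)  # [28, 14, 7, 4]
```

The latent representation is therefore 4 x 4 with 16 channels, or 256 values per image.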

### Decoder

The decoder will work in reverse, using 3 `Conv2D` layers, with an *increasing* number of output filters (16, 32, then 64) and an [`UpSampling2D`](https://keras.io/layers/convolutional/#UpSampling2D) layer after each. This has the desired effect of reconstructing or de-noising the image.

In [0]:
latent_inputs = Input(shape=(latent_dim, latent_dim, filter_3))

# First convolutional layer
decoder_conv1 = Conv2D(filter_3, kernel_size,
                       activation='relu', padding='same')(latent_inputs)
decoder_up1 = UpSampling2D(pool_size)(decoder_conv1)
# Second convolutional layer
decoder_conv2 = Conv2D(filter_2, kernel_size,
                        activation='relu', padding='same')(decoder_up1)
decoder_up2 = UpSampling2D(pool_size)(decoder_conv2)
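# Third convolutional layer. padding='same' is deliberately omitted: the
# default 'valid' padding shrinks 16 x 16 to 14 x 14, so that the final
# upsampling step produces 28 x 28.
decoder_conv3 = Conv2D(filter_1, kernel_size,
                        activation='relu')(decoder_up2)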
decoder_up3 = UpSampling2D(pool_size)(decoder_conv3)

# Output layer, which outputs images of size (28 x 28 x 1)
output_layer = Conv2D(1, kernel_size, padding='same')(decoder_up3)

decoder_denoise = Model(latent_inputs, output_layer, name='decoder')
decoder_denoise.summary()
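
To see why the decoder ends at exactly 28 x 28, the spatial size can be traced by hand. This sketch assumes stride-1 convolutions with a 3 x 3 kernel, where `'same'` padding keeps the size and `'valid'` padding (the Keras default) trims 2 pixels; `conv_out` and `upsample` are hypothetical helpers.

```python
def conv_out(size, kernel=3, padding='same'):
    # 'same' keeps the spatial size; 'valid' trims kernel - 1 pixels.
    return size if padding == 'same' else size - (kernel - 1)

def upsample(size, factor=2):
    # UpSampling2D repeats rows and columns, multiplying the size.
    return size * factor

size = 4
trace = [size]
# Padding used by the three decoder Conv2D layers, in order.
for padding in ['same', 'same', 'valid']:
    size = upsample(conv_out(size, padding=padding))
    trace.append(size)

print(trace)  # [4, 8, 16, 28]
```

If the third convolution also used `padding='same'`, the output would be 32 x 32 and would no longer match the input images.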

### Training

We will again use early stopping, along with the same optimizer and loss function as before.

In [0]:
denoise_autoencoder = Model(
    input_layer,
    decoder_denoise(encoder_denoise(input_layer))
)

denoise_autoencoder.compile(optimizer='adam', loss='mse')
denoise_autoencoder.summary()

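We will only train for 10 epochs this time since the model is more complex and takes longer to train. This should take around a minute. Note that with `patience=10`, the early stopping callback cannot trigger within only 10 epochs, so all 10 epochs will run.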

In [0]:
denoise_autoencoder.fit(
    # Input
    x_train_noise,
    # Output
    x_train,
    epochs=10,
    batch_size=2048,
    validation_data=(x_test_noise, x_test),
    callbacks=[early_stopping]
)

### Visualize Denoised Images

Let's visualize the first 10 denoised images.

In [0]:
denoised_imgs = denoise_autoencoder.predict(x_test_noise[:10])

In [0]:
visualize_imgs(
    3,
    ['Noisy Images', 'Denoised Images', 'Original Images'],
    [x_test_noise, denoised_imgs, x_test],
    [image_size, image_size, image_size]
)

As we can see, the autoencoder is mostly successful in recovering the original image, though a few de-noised images are still blurry or unclear. More training or a different model architecture may help.

## Resources

* [Introduction to Autoencoders](https://www.jeremyjordan.me/autoencoders)
* [Building Autoencoders in Keras](https://blog.keras.io/building-autoencoders-in-keras.html)
* [PCA vs. Autoencoders](https://towardsdatascience.com/pca-vs-autoencoders-1ba08362f450)
* [Variational Autoencoders](https://www.jeremyjordan.me/variational-autoencoders/)
* [Auto-Encoding Variational Bayes paper](https://arxiv.org/abs/1312.6114)
* [Generating Images with VAEs](https://towardsdatascience.com/generating-images-with-autoencoders-77fd3a8dd368)
* [Credit Card Fraud Detection using Autoencoders](https://medium.com/@curiousily/credit-card-fraud-detection-using-autoencoders-in-keras-tensorflow-for-hackers-part-vii-20e0c85301bd)
* [Autoencoder Explained Video](https://www.youtube-nocookie.com/embed/H1AllrJ-_30?start=359)

# Exercises

## Exercise 1

Use the architecture we used for the de-noising autoencoder to train a model for image compression. Try to get the autoencoder's output to have the same quality as the original images. Visualize your best results.



### Student Solution

In [0]:
# Your answer goes here

### Answer Key

**Solution**

In [0]:
# Put the recommended solution here; if there is more than one "good" solution
# that you think students should know put those solutions in subsequent code
# boxes with "# Solution" in the first line.

## Exercise 2

Using the de-noising autoencoder:

1.   Try using larger `p` values for the `SaltAndPepper` augmenter to generate even noisier images. Experiment with how noisy the images can get and still be de-noised.

2.   Try using other kinds of noise in place of `SaltAndPepper`. You can use other `imgaug` functions such as `Salt` and `CoarseSaltAndPepper`, or randomly generate your own noise using `numpy`. Which augmentations or noises are harder to denoise, using our model? Why might that be?



### Student Solution

In [0]:
# Your answer goes here

### Answer Key

**Solution**

In [0]:
# Put the recommended solution here; if there is more than one "good" solution
# that you think students should know put those solutions in subsequent code
# boxes with "# Solution" in the first line.