# Convolutional Autoencoder (CAE) Tutorial

## Introduction

A Convolutional Autoencoder (CAE) is a type of autoencoder that uses convolutional layers to encode and decode the input data. CAEs are particularly well-suited for image data, as convolutional layers can capture spatial hierarchies in images more effectively than fully connected layers.

## Architecture

A CAE consists of two main parts:
1. **Encoder**: Uses convolutional layers to map the input to a latent-space representation.
2. **Decoder**: Uses transposed convolutional layers to reconstruct the input from the latent space representation.

### Encoder

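The encoder uses convolutional layers, usually combined with striding or pooling, to reduce the spatial dimensions of the input. The depth (number of filters) is often increased at the same time, although this is a design choice rather than a requirement.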

Mathematically, a convolutional layer can be described as:

$$
h = \sigma(W * x + b)
$$

where:
- $*$ denotes the convolution operation
- $W$ is a set of learnable filters
- $b$ is a bias term
- $\sigma$ is an activation function (e.g., ReLU, sigmoid)
- $x$ is the input image
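
To make the formula concrete, here is a minimal NumPy sketch of one single-channel convolutional layer with ReLU as $\sigma$. The helper names (`conv2d_valid`, `relu`) and the toy values are assumptions made for illustration, not part of any library. Like most deep learning frameworks, it computes a cross-correlation (the kernel is not flipped).

```python
import numpy as np

def conv2d_valid(x, w, b):
    """Single-channel 'valid' convolution (cross-correlation, no kernel flip)."""
    kh, kw = w.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the patch under the kernel, plus the bias
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w) + b
    return out

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical toy values, chosen only for illustration
x = np.arange(16, dtype=float).reshape(4, 4)   # 4x4 "image"
w = np.array([[-1.0, 0.0], [0.0, 1.0]])        # one 2x2 filter W
b = 0.5                                        # bias

h = relu(conv2d_valid(x, w, b))                # h = sigma(W * x + b)
print(h.shape)  # (3, 3)
```

A 2x2 kernel applied without padding shrinks a 4x4 input to 3x3. Real layers stack many such filters and sum over input channels.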

### Decoder

The decoder uses transposed convolutional layers (sometimes called deconvolutional layers, although they do not compute a mathematical deconvolution) to reconstruct the input from the latent-space representation.

Mathematically, a transposed convolutional layer can be described as:

$$
\hat{x} = \sigma(W^T * h + b')
$$

where:
- $W^T *$ denotes the transposed convolution operation (the decoder's filters may be tied to the encoder's $W$ or learned separately)
- $b'$ is a bias term
- $\sigma$ is an activation function
- $h$ is the latent space representation
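
The upsampling effect of a transposed convolution can be sketched in NumPy as follows. This is a simplified single-channel version that leaves out the bias and activation. The function name `conv_transpose2d` and the toy values are assumptions for illustration only.

```python
import numpy as np

def conv_transpose2d(h, w, stride=2):
    """Single-channel transposed convolution: every input value
    adds a scaled copy of the kernel into the (larger) output."""
    kh, kw = w.shape
    out_h = (h.shape[0] - 1) * stride + kh
    out_w = (h.shape[1] - 1) * stride + kw
    out = np.zeros((out_h, out_w))
    for i in range(h.shape[0]):
        for j in range(h.shape[1]):
            r, c = i * stride, j * stride
            out[r:r + kh, c:c + kw] += h[i, j] * w
    return out

h = np.array([[1.0, 2.0], [3.0, 4.0]])  # hypothetical 2x2 latent map
w = np.ones((2, 2))                      # hypothetical 2x2 kernel

x_hat = conv_transpose2d(h, w, stride=2)
print(x_hat.shape)  # (4, 4)
```

With a stride of 2 and a 2x2 kernel the scattered copies do not overlap, so each latent value fills its own 2x2 block of the output. Larger kernels overlap and their contributions are summed.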

### Loss Function

The loss function for a CAE is typically the mean squared error (MSE) between the input and the reconstructed output, averaged over the $n$ elements (e.g., pixels) being compared:

$$
L = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{x}_i)^2
$$
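
As a quick illustration, the MSE can be computed in NumPy as below. The arrays are made-up toy values, not results from a trained model.

```python
import numpy as np

def mse(x, x_hat):
    """Mean squared error over all elements."""
    return np.mean((x - x_hat) ** 2)

# Hypothetical 2x2 "image" and an imperfect reconstruction of it
x = np.array([[0.0, 1.0], [1.0, 0.0]])
x_hat = np.array([[0.1, 0.8], [0.9, 0.2]])

loss_value = mse(x, x_hat)
print(loss_value)  # approximately 0.025
```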

## Training Process

Training a CAE involves minimizing the loss function with respect to the weights and biases of the convolutional and transposed convolutional layers. This is typically done with gradient descent or a variant such as Adam, using gradients computed by backpropagation.

### Derivatives

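Let's derive the gradients for the weights of the convolutional and transposed convolutional layers. The expressions are schematic: each product stands for the appropriate convolution, matrix, or element-wise operation.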

#### Convolutional Layer Gradients

For the convolutional layer, the gradient of the loss function with respect to the weights $W$ is:

$$
\frac{\partial L}{\partial W} = \frac{\partial L}{\partial h} \cdot \frac{\partial h}{\partial W}
$$

Since $h = \sigma(W * x + b)$, we have:

$$
\frac{\partial h}{\partial W} = x \cdot \sigma'(W * x + b)
$$

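Thus, expanding $\frac{\partial L}{\partial h}$ back through the decoder, with $\frac{\partial L}{\partial \hat{x}} = \frac{2}{n}(\hat{x} - x)$ from the MSE loss and a factor of $W$ from $\frac{\partial \hat{x}}{\partial h}$:

$$
\frac{\partial L}{\partial W} = \frac{2}{n}(\hat{x} - x) \cdot \sigma'(W^T * h + b') \cdot W \cdot x^T \cdot \sigma'(W * x + b)
$$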

#### Transposed Convolutional Layer Gradients

For the transposed convolutional layer, the gradient of the loss function with respect to the weights $W^T$ is:

$$
\frac{\partial L}{\partial W^T} = \frac{\partial L}{\partial \hat{x}} \cdot \frac{\partial \hat{x}}{\partial W^T}
$$

Since $\hat{x} = \sigma(W^T * h + b')$, we have:

$$
\frac{\partial \hat{x}}{\partial W^T} = h \cdot \sigma'(W^T * h + b')
$$

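Thus, with $\frac{\partial L}{\partial \hat{x}} = \frac{2}{n}(\hat{x} - x)$ from the MSE loss,

$$
\frac{\partial L}{\partial W^T} = \frac{2}{n}(\hat{x} - x) \cdot \sigma'(W^T * h + b') \cdot h^T
$$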
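
A finite-difference check is a handy way to validate a gradient formula of this kind. The sketch below uses a fully connected analogue, in which a matrix `V` stands in for the transposed convolution and $\sigma$ is a sigmoid. `V`, `b2`, and the random toy data are assumptions for illustration. For the MSE loss, the derivative with respect to the reconstruction is $\frac{2}{n}(\hat{x} - x)$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(V, b2, h, x):
    x_hat = sigmoid(V @ h + b2)
    return np.mean((x - x_hat) ** 2)

rng = np.random.default_rng(0)
h = rng.normal(size=3)          # latent vector
x = rng.uniform(size=4)         # target (the "input" to reconstruct)
V = rng.normal(size=(4, 3))     # decoder weights (dense stand-in)
b2 = rng.normal(size=4)         # decoder bias

# Analytic gradient: delta = dL/dx_hat * sigma'(z), then outer product with h
x_hat = sigmoid(V @ h + b2)
n = x.size
delta = (2.0 / n) * (x_hat - x) * x_hat * (1.0 - x_hat)
grad_analytic = np.outer(delta, h)

# Numerical gradient by central differences
eps = 1e-6
grad_numeric = np.zeros_like(V)
for i in range(V.shape[0]):
    for j in range(V.shape[1]):
        V_plus, V_minus = V.copy(), V.copy()
        V_plus[i, j] += eps
        V_minus[i, j] -= eps
        grad_numeric[i, j] = (loss(V_plus, b2, h, x) - loss(V_minus, b2, h, x)) / (2 * eps)

max_diff = np.max(np.abs(grad_analytic - grad_numeric))
print(max_diff)  # should be very small
```

In practice, frameworks such as Keras compute these gradients automatically, so a check like this is mainly useful for building intuition or debugging a hand-written layer.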

### Gradient Descent Update

The weights and biases are updated using the gradients:

$$
W \leftarrow W - \eta \frac{\partial L}{\partial W}
$$

$$
b \leftarrow b - \eta \frac{\partial L}{\partial b}
$$

where $\eta$ is the learning rate.
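
The update rule can be seen in action on a deliberately tiny problem. The sketch below fits a scalar weight and bias to toy data with plain gradient descent on an MSE loss. The data, learning rate, and step count are arbitrary choices for illustration, not recommendations for training a CAE.

```python
import numpy as np

# Toy data generated with a true weight of 2.0 and a true bias of 0.0
x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x

w, b = 0.0, 0.0   # initial parameters
eta = 0.05        # learning rate

for step in range(2000):
    err = w * x + b - y                 # prediction error
    grad_w = 2.0 * np.mean(err * x)     # dL/dw for L = mean(err^2)
    grad_b = 2.0 * np.mean(err)         # dL/db
    w -= eta * grad_w                   # W <- W - eta * dL/dW
    b -= eta * grad_b                   # b <- b - eta * dL/db

print(w, b)  # close to 2.0 and 0.0
```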

## Advantages and Drawbacks

### Advantages
- **Spatial Hierarchies**: CAEs can capture spatial hierarchies in the data, making them particularly well-suited for image data.
- **Parameter Efficiency**: Because filters are shared across all spatial positions, convolutional layers typically have far fewer parameters than fully connected layers applied to the same input, which reduces the risk of overfitting.
- **Local Receptive Fields**: Convolutional layers focus on local regions of the input, allowing the model to learn localized features.

### Drawbacks
- **Computational Cost**: Convolutional layers can be computationally expensive, especially with large input sizes and deep networks.
- **Hyperparameter Tuning**: Choosing the right architecture and hyperparameters (e.g., number of filters, kernel size, pooling size) can be challenging and time-consuming.
- **Reconstruction Quality**: While CAEs can capture spatial features effectively, their reconstructions can be blurry and lose fine detail, particularly with aggressive downsampling or a pixel-wise loss such as MSE.
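
The parameter-efficiency point listed under the advantages can be made concrete with a little arithmetic. The sketch below compares a 3x3 convolution that maps a 28x28x1 image to 16 feature maps with a fully connected layer producing an output of the same size. The helper functions are hypothetical and simply count weights plus biases.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return (in_units + 1) * out_units

# 3x3 convolution, 1 input channel -> 16 filters
conv_count = conv2d_params(3, 3, 1, 16)

# Dense layer from 28*28 inputs to a 28*28*16 output of the same size
dense_count = dense_params(28 * 28, 28 * 28 * 16)

print(conv_count)   # 160
print(dense_count)  # 9847040
```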




```python
import numpy as np
from keras.datasets import mnist
from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
from keras.optimizers import Adam
import matplotlib.pyplot as plt

# Load the MNIST dataset (labels are not needed for an autoencoder)
(x_train, _), (x_test, _) = mnist.load_data()
# Scale pixel values to [0, 1] and add a channel dimension
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))

# Define the CAE architecture
input_img = Input(shape=(28, 28, 1))

# Encoder: each pooling step halves the spatial size (28 -> 14 -> 7 -> 4)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

# Decoder: UpSampling2D followed by Conv2D is used here in place of
# transposed convolutions
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
# No padding here on purpose: 16 -> 14, so the last upsampling gives 28
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

# Create CAE model
autoencoder = Model(input_img, decoded)
# Binary cross-entropy treats each pixel in [0, 1] as a probability;
# loss='mse' would match the MSE loss described above
autoencoder.compile(optimizer=Adam(), loss='binary_crossentropy')

# Train the CAE (the input is also the target)
autoencoder.fit(x_train, x_train,
                epochs=50,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))

# Encode and decode some digits
decoded_imgs = autoencoder.predict(x_test)

# Display original and reconstructed images
n = 10  # Number of digits to display
plt.figure(figsize=(20, 4))
for i in range(n):
    # Display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
```
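
The spatial sizes in the model above can be traced by hand, which also shows why one decoder convolution omits `padding='same'`. The sketch below assumes the usual behaviour of these Keras layers: `'same'` convolutions keep the size, 2x2 pooling with `'same'` padding rounds up when halving, an unpadded 3x3 convolution removes 2 pixels, and 2x2 upsampling doubles the size. The helper names are hypothetical.

```python
import math

def pool_same(size, pool=2):
    """Output size of pooling with padding='same' and stride == pool."""
    return math.ceil(size / pool)

def conv_valid(size, kernel=3):
    """Output size of an unpadded ('valid') convolution with stride 1."""
    return size - kernel + 1

# Encoder: three pooling steps ('same' convolutions keep the size)
size = 28
encoder_sizes = []
for _ in range(3):
    size = pool_same(size)
    encoder_sizes.append(size)
print(encoder_sizes)  # [14, 7, 4]

# Decoder: upsample, upsample, unpadded 3x3 convolution, upsample
size = size * 2          # 4 -> 8
size = size * 2          # 8 -> 16
size = conv_valid(size)  # 16 -> 14
size = size * 2          # 14 -> 28
print(size)  # 28
```

Because 7 is odd, pooling takes it to 4 rather than 3.5, and doubling three times from 4 would overshoot to 32. The unpadded convolution trims 16 down to 14 so that the output matches the 28x28 input.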

