<h1>
Reverse Diffusion With Denoising
</h1>


# Brief Recap

This notebook is a guide to implementing reverse-diffusion denoising with TensorFlow. We present a minimal implementation in the spirit of a Denoising Diffusion Probabilistic Model (DDPM), trained on the CIFAR-10 dataset and extensible to other image domains. The implementation covers the mathematical foundations and practical considerations while staying compatible with standard deep learning workflows.

# Architecture

<img src='https://raw.githubusercontent.com/raminmohammadi/GEN-AI/7642375d6798d75d28100008e3c74b5a96d79d21/Labs/Diffusion%20Models/assets/architecture.png' width=450>

This diagram illustrates the architecture of a diffusion model, highlighting both the forward and reverse diffusion processes.

1. **Input Data**: This is the original data that you want to model or generate, such as an image.

2. **Noise Schedule**: A predefined schedule that determines how much noise is added at each step in the forward diffusion process. This schedule controls the transformation from the original data to noise.

3. **Forward Diffusion Process**: In this phase, noise is progressively added to the input data according to the noise schedule, transforming it into noisy data. This simulates the degradation of data.

4. **Noisy Data**: The result of the forward diffusion process, where the original data has been converted into a noisy version.

5. **Neural Network**: A model trained to predict and reverse the noise added in the forward process. It learns to generate the original data from the noisy data.

6. **Reverse Diffusion Process**: Guided by the neural network and using the noisy data as a starting point, this process iteratively removes noise, working backwards to recreate a denoised version of the original data.

7. **Denoised Output**: The final output generated by the reverse diffusion process, ideally resembling the initial input data.

Overall, this architecture shows how diffusion models use a combination of noise addition and removal to generate new data that mimics the original input.


## Forward Diffusion Process

<img src='https://cdn-ilclanb.nitrocdn.com/IekjQeaQhaYynZsBcscOhxvktwdZlYmf/assets/images/source/rev-b2d8ac0/learnopencv.com/wp-content/uploads/2023/02/denoising-diffusion-probabilistic-models_forward_process_changing_distribution.png' width=500>


> ***It is easy to destroy but hard to create - Pearl S. Buck***

1. In the “Forward Diffusion” process, we slowly and iteratively add noise to (corrupt) the images in our training set such that they “move out or move away” from their existing subspace.
2. What we are doing here is converting the unknown and complex distribution that our training set belongs to into one that is easy for us to sample a (data) point from and understand.
3. At the end of the forward process, the images become entirely unrecognizable. The complex data distribution is wholly transformed into a (chosen) simple distribution. Each image gets mapped to a space outside the data subspace.
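
The three points above can be checked numerically. The sketch below is a minimal NumPy example and not part of the original notebook: it assumes a Gaussian corruption step with a small, linearly increasing variance (`betas`, hypothetical values) and applies it to a deliberately non-Gaussian toy dataset with two clusters. After enough steps the samples should be statistically close to a standard normal distribution, which is the "simple" distribution we know how to sample from.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "data distribution": two tight clusters at -2 and +2 (clearly not Gaussian).
x = np.concatenate([rng.normal(-2.0, 0.1, 5000), rng.normal(2.0, 0.1, 5000)])

# Assumed noise schedule: 1000 small variances increasing linearly.
betas = np.linspace(1e-4, 0.02, 1000)

for beta in betas:
    noise = rng.normal(size=x.shape)
    # One corruption step: shrink the signal slightly, then add a little noise.
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise

final_mean, final_std = x.mean(), x.std()
print(f"mean={final_mean:.3f}, std={final_std:.3f}")  # mean should be near 0, std near 1
```

The two clusters are no longer visible in the final samples: the information about the data has been destroyed, and only a simple Gaussian remains.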







## Reverse Diffusion Process

<img src='https://cdn-ilclanb.nitrocdn.com/IekjQeaQhaYynZsBcscOhxvktwdZlYmf/assets/images/source/rev-b2d8ac0/learnopencv.com/wp-content/uploads/2023/02/denoising-diffusion-probabilistic-models_moving_from_simple_to_data_space-1.png' width=400>

> ***By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. - Stable Diffusion, 2022***


1. In the “Reverse Diffusion process,” the idea is to reverse the forward diffusion process.
2. We slowly and iteratively try to reverse the corruption performed on images in the forward process.
3. The reverse process starts where the forward process ends.
4. The benefit of starting from a simple space is that we know how to get/sample a point from this simple distribution (think of it as any point outside the data subspace). 
5. And our goal here is to figure out how to return to the data subspace.
6. However, the problem is that there are infinitely many paths starting from a point in this “simple” space, but only a fraction of them will take us to the “data” subspace. 
7. In diffusion probabilistic models, this is done by referring to the small iterative steps taken during the forward diffusion process. 
8. The probability density function (PDF) that describes the corrupted images in the forward process differs only slightly from one step to the next.
9. Hence, in the reverse process, we use a deep-learning model at each step to predict the parameters of the PDF that undoes the corresponding forward step. 
10. And once we train the model, we can start from any point in the simple space and use the model to iteratively take steps to lead us back to the data subspace. 
11. In reverse diffusion, we iteratively perform the **“denoising”** in small steps, starting from a noisy image.
12. This approach to training and generating new samples is generally more stable than GAN training, and it often produces higher-quality samples than earlier approaches such as variational autoencoders (VAEs) and normalizing flows. 





## Theoretical Foundations of Reverse Diffusion


<img src='https://cdn-ilclanb.nitrocdn.com/IekjQeaQhaYynZsBcscOhxvktwdZlYmf/assets/images/source/rev-b2d8ac0/learnopencv.com/wp-content/uploads/2023/01/diffusion-models-forwardbackward_process_ddpm.png' width=600>


* **Forward Process:**

  Gradually adds Gaussian noise to data samples over T steps:

  $$
  q(x_t|x_{t-1})=\mathcal{N}(x_t;\sqrt{1-\beta_t}\,x_{t-1}, \beta_t I)
  $$

  where $\beta_t$ defines the noise schedule.

* **Reverse Process:** 

  Learned transition that iteratively denoises samples:
  $$
  p_\theta(x_{t-1}|x_t)=\mathcal{N}(x_{t-1};\mu_\theta(x_t,t),\Sigma_\theta(x_t,t))
  $$

  The neural network $\epsilon_\theta$ predicts the noise component for denoising.


* **Training Objective** 

  The simplified objective minimizes the MSE between predicted and actual noise:
  $$
  {L} = \mathbb{E}_{t, x_0, \epsilon} \left[ \left\| \epsilon - \epsilon_{\theta} \left( \sqrt{\bar{\alpha}_t} x_0 + \sqrt{1 - \bar{\alpha}_t} \epsilon, t \right) \right\|^2 \right]
  $$

  where
  $$
  \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s).
  $$
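
The training objective relies on the fact that $x_t$ can be drawn directly from $x_0$ as $\sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$, without simulating every intermediate step. The following is a minimal NumPy sketch of that idea, not the code used later in this notebook. The linear `betas` schedule, the helper names `q_sample` and `simple_loss`, and the placeholder `zero_model` are all assumptions made for illustration. The loss is averaged over elements rather than summed, which only changes it by a constant factor.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # assumed linear schedule
alpha_bar = np.cumprod(1.0 - betas)     # cumulative product, one value per step

def q_sample(x0, t, eps):
    """Draw x_t directly from x_0 using the closed form of the forward process."""
    ab = alpha_bar[t].reshape(-1, 1, 1, 1)   # broadcast over (H, W, C)
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps

def simple_loss(eps_model, x0):
    """Monte Carlo estimate of the simplified objective for one batch."""
    t = rng.integers(0, T, size=len(x0))     # 0-based index into alpha_bar
    eps = rng.normal(size=x0.shape)
    x_t = q_sample(x0, t, eps)
    return np.mean((eps - eps_model(x_t, t)) ** 2)

# Placeholder "network" that always predicts zero noise.
zero_model = lambda x_t, t: np.zeros_like(x_t)

x0 = rng.uniform(-1.0, 1.0, size=(64, 8, 8, 3))  # stand-in for a batch of images
loss_zero = simple_loss(zero_model, x0)
print(f"loss of the zero predictor: {loss_zero:.3f}")  # roughly 1, the variance of eps
```

A trained $\epsilon_\theta$ should score well below this zero-predictor baseline, because it can use $x_t$ and $t$ to infer part of the noise.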

# **Implementation**

### **Key Components**
* **Data Preprocessing:** Prepare the dataset, typically normalizing images to ensure they fit within the model's expected range.

* **Diffusion Process:**

  * **Forward Diffusion:** Incrementally adds noise to the data.
  * **Reverse Diffusion:** The model learns to remove this noise to reconstruct the data.
* **Noise Schedule:** Determines how noise is added at each step of the forward process.

* **Neural Network Architecture:** Usually a U-Net, which is effective for capturing spatial information and reconstructing images.

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt

# Example U-Net like architecture
def get_unet_model(input_shape):
    inputs = keras.Input(shape=input_shape)

    # Encoding path
    x = layers.Conv2D(64, 3, activation='relu', padding='same')(inputs)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
    x = layers.MaxPooling2D(2)(x)

    # Bottom
    x = layers.Conv2D(256, 3, activation='relu', padding='same')(x)

    # Decoding path
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(64, 3, activation='relu', padding='same')(x)

    outputs = layers.Conv2D(3, 1, activation='sigmoid', padding='same')(x)

    model = keras.Model(inputs, outputs)
    return model


### **Explanation**

**U-Net Architecture:**

<img src='https://raw.githubusercontent.com/raminmohammadi/GEN-AI/7642375d6798d75d28100008e3c74b5a96d79d21/Labs/Diffusion%20Models/assets/unet.jpg' width=500>

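A full U-Net uses encoding and decoding paths with skip connections to capture and reconstruct image features effectively. The simplified example above keeps the encoder-decoder shape but leaves out the skip connections.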
  * **Training Process:** The model is optimized using a mean squared error loss, attempting to learn the denoising process.
  * **Epochs:** Set to a low number for quick demonstration; more epochs would be needed for real data.

**Use Cases**

This basic setup can be extended for advanced applications like super-resolution, style transfer, or generative tasks by adjusting the architecture and training protocols. For real implementations, consider using additional noise schedules and more complex architectures tailored to specific tasks.

In [None]:
# Instantiate and compile the model
image_size = (128, 128, 3)
model = get_unet_model(input_shape=image_size)
model.compile(optimizer='adam', loss='mse')

# Example dummy data for training
import numpy as np
dummy_images = np.random.rand(100, 128, 128, 3)

# Train the model
model.fit(dummy_images, dummy_images, epochs=5, batch_size=10)

# Display some outputs
sample_output = model.predict(dummy_images[:5])
for img in sample_output:
    plt.imshow(img)
    plt.show()

# **Denoising Images using Reverse Diffusion**


### **Idea**
<img src='https://raw.githubusercontent.com/raminmohammadi/GEN-AI/7642375d6798d75d28100008e3c74b5a96d79d21/Labs/Diffusion%20Models/assets/denoise.jpg' width=500>

The idea is to see how a diffusion model breaks an image down into noise and then reconstructs it into a recognizable form, which demonstrates how learned noise removal can generate high-quality images.

* **Forward Diffusion (Top Row):**

  Starts with a clear image and progressively adds noise over several steps, making the image noisier.

* **Reverse Diffusion (Bottom Row):**

  The model learns to gradually remove noise, starting from a noisy image and reconstructing it back into a clear image.
  Arrows indicate how the model refines the image at each step, effectively demonstrating its capability to denoise and generate the original content.



### **Reverse process**

<img src='https://raw.githubusercontent.com/raminmohammadi/GEN-AI/7642375d6798d75d28100008e3c74b5a96d79d21/Labs/Diffusion%20Models/assets/denoise_flow.jpg' width=500>

The reverse process of a diffusion model is illustrated above:

1. **Noise Initialization** (first input, $t = 0$):
   - The model begins with an entirely noisy image.

2. **Model Processing**:
   - The model takes the noisy image and predicts a slightly denoised version.
   - This process iteratively refines the image by reducing noise step by step.

3. **Output and Feedback Loop**:
   - The output image, with slightly reduced noise, becomes the input for the next iteration ($t > 0$).
   - This feedback loop continues until the image is sufficiently denoised, progressively reconstructing the original content.

Essentially, the model uses iterative predictions to transform noise into a coherent image through multiple steps.
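
For comparison with the theory section, here is a hedged sketch of the textbook DDPM sampling loop written in the $\beta_t$ and $\bar{\alpha}_t$ notation. It is not the simplified scheme implemented later in this notebook. To keep it runnable without a trained network, it assumes one-dimensional Gaussian data, for which the ideal noise prediction is known in closed form (`oracle_eps`, a hypothetical stand-in for $\epsilon_\theta$), and it fixes the reverse variance to $\beta_t$. Note that the step index here counts down from the noisiest step, whereas the diagram above labels the first noisy input as $t = 0$.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

# Toy data distribution x_0 ~ N(m, s^2), chosen so the ideal predictor is known.
m, s = 2.0, 0.5

def oracle_eps(x_t, t):
    """E[eps | x_t] for Gaussian data; stands in for a trained noise predictor."""
    ab = alpha_bar[t]
    var_t = ab * s**2 + (1.0 - ab)                 # variance of x_t
    return np.sqrt(1.0 - ab) * (x_t - np.sqrt(ab) * m) / var_t

x = rng.normal(size=20000)                          # start from pure noise
for t in reversed(range(T)):
    eps_hat = oracle_eps(x, t)
    # Mean of the reverse transition, expressed through the predicted noise.
    mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
    # Add fresh noise at every step except the last one.
    z = rng.normal(size=x.shape) if t > 0 else 0.0
    x = mean + np.sqrt(betas[t]) * z

sample_mean, sample_std = x.mean(), x.std()
print(f"mean={sample_mean:.3f}, std={sample_std:.3f}")  # should approach m and s
```

Each iteration feeds its output back in as the next input, which is the feedback loop described above.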



In [None]:
import numpy as np

from tqdm.auto import trange, tqdm
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow.keras import layers

## Data Preparation

In [None]:
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()
X_train = X_train[y_train.squeeze() == 1]
X_train = (X_train / 127.5) - 1.0

In [None]:
IMG_SIZE = 32     # input image size, CIFAR-10 is 32x32
BATCH_SIZE = 128  # training batch size
timesteps = 16    # number of steps from pure noise to a clear image
time_bar = 1 - np.linspace(0, 1.0, timesteps + 1) # noise level per step, from 1.0 down to 0.0

In [None]:
plt.plot(time_bar, label='Noise')
plt.plot(1 - time_bar, label='Clarity')
plt.legend()

## Image Processing

In [None]:
def cvtImg(img):
    img = img - img.min()
    img = (img / img.max())
    return img.astype(np.float32)

def show_examples(x):
    plt.figure(figsize=(10, 10))
    for i in range(25):
        plt.subplot(5, 5, i+1)
        img = cvtImg(x[i])
        plt.imshow(img)
        plt.axis('off')

show_examples(X_train)

**Key Components**

* **Normalization**: `cvtImg` rescales the input image (img) to the range 0 to 1 so that it can be displayed.

* **Data Type:** It converts the image data type to np.float32, which is often preferred for numerical computations in machine learning.
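
As a quick check of the rescaling described above, the sketch below applies the same min-max idea to a random array in the $[-1, 1]$ range used for training. The function name `to_unit_range` and the `eps` guard against constant images are additions for illustration, not part of the notebook's `cvtImg`.

```python
import numpy as np

def to_unit_range(img, eps=1e-8):
    """Min-max rescale to [0, 1]; eps guards against constant images (an addition)."""
    img = img - img.min()
    return (img / max(img.max(), eps)).astype(np.float32)

# A fake image in the [-1, 1] range used for training in this notebook.
rng = np.random.default_rng(0)
img = rng.uniform(-1.0, 1.0, size=(32, 32, 3))

out = to_unit_range(img)
print(out.dtype, out.min(), out.max())   # float32 0.0 1.0

flat = to_unit_range(np.zeros((4, 4, 3)))  # constant image: no division by zero
```

Because the rescaling is per image, it stretches each image to full contrast. That is convenient for display but should not be applied to training data.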

## Noise Generation

We are going to simulate a diffusion process where an image is gradually degraded by adding noise over multiple steps. This process is essential for training diffusion models, as they learn to reverse this noise addition to generate images.


The `forward_noise` function takes an image and a timestep as input and applies Gaussian noise according to a pre-defined schedule. By repeating this process, you can progressively degrade the clarity of an image. This technique is fundamental to how diffusion models learn to generate images.

In [None]:
def forward_noise(x, t):
    a = time_bar[t]        # base noise level
    b = time_bar[t + 1]    # next noise level

    noise = np.random.normal(size=x.shape)
    # Reshape 'a' and 'b' to add dimensions compatible with 'x'
    a = a[:, np.newaxis, np.newaxis, np.newaxis]  # Add 3 new axes
    b = b[:, np.newaxis, np.newaxis, np.newaxis]  # Add 3 new axes

    img_a = x * (1 - a) + noise * a
    img_b = x * (1 - b) + noise * b
    return img_a, img_b

def generate_ts(num):
    return np.random.randint(0, timesteps, size=num)

# t = np.full((25,), timesteps - 1) # uncomment to see nearly clean images
# t = np.full((25,), 0)             # uncomment to see pure noise
t = generate_ts(25)                 # random timesteps, as used for training
a, b = forward_noise(X_train[:25], t)
show_examples(a)

**Key Components**

* **Input:** It takes an image (x) and a timestep (t) as input.
* **Noise Levels:** It looks up the noise level for the current timestep (a) and the next timestep (b) in the `time_bar` array. `time_bar` contains pre-calculated noise levels that decrease from 1 to 0 as the timestep increases. This is the noise schedule described earlier.
* **Adding Noise:** Gaussian noise is blended with the input image according to the noise level: each output is `x * (1 - level) + noise * level`, so `img_a` is the noisier image and `img_b` is the slightly cleaner one.
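
To make the two ends of the schedule concrete, the sketch below repeats the same linear blend in a self-contained form. The helper `noisy_pair` is a hypothetical variant of `forward_noise` that takes the noise as an argument so the result can be inspected; the small random array stands in for real images.

```python
import numpy as np

timesteps = 16
time_bar = 1 - np.linspace(0, 1.0, timesteps + 1)   # 1.0 (pure noise) down to 0.0 (clean)

def noisy_pair(x, t, noise):
    """Same interpolation as forward_noise, but with the noise passed in explicitly."""
    a = time_bar[t][:, None, None, None]       # noise level of the model input
    b = time_bar[t + 1][:, None, None, None]   # slightly lower level of the target
    return x * (1 - a) + noise * a, x * (1 - b) + noise * b

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(2, 4, 4, 3))   # stand-in for two images
noise = rng.normal(size=x.shape)

# Image 0 uses the noisiest step, image 1 the cleanest.
t = np.array([0, timesteps - 1])
img_a, img_b = noisy_pair(x, t, noise)

print(np.allclose(img_a[0], noise[0]))   # t = 0: the input is pure noise
print(np.allclose(img_b[1], x[1]))       # t = 15: the target is the clean image
```

Both checks print `True`: at the first timestep the model input contains no signal at all, and at the last timestep the target is the original image.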

## Model Building

**Block Function**


This function defines a basic building block of the U-Net, consisting of convolutional layers, activation functions, and layer normalization. It processes both the image features (x_img) and timestep information (x_ts) to learn how noise affects the image at different stages of the diffusion process.
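
The key step in the block is the multiplication of image features by a timestep-dependent vector. The NumPy sketch below illustrates only the broadcasting involved, using random arrays in place of the real `Conv2D` and `Dense` outputs; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

batch, height, width, channels = 4, 32, 32, 128

# Stand-ins for the two branches inside the block (random values, not real layers).
img_features = rng.normal(size=(batch, height, width, channels))   # conv output
time_features = np.maximum(rng.normal(size=(batch, channels)), 0)  # Dense + ReLU output

# Reshape to (batch, 1, 1, channels) so one scale per channel is applied
# to every spatial position of the matching image.
gate = time_features.reshape(batch, 1, 1, channels)
gated = img_features * gate

print(gated.shape)   # (4, 32, 32, 128)
```

Each channel of each image is scaled by a single number that depends on the timestep, which lets the same convolution weights behave differently at different noise levels.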

In [None]:
def block(x_img, x_ts):
    x_parameter = layers.Conv2D(128, kernel_size=3, padding='same')(x_img)
    x_parameter = layers.Activation('relu')(x_parameter)

    time_parameter = layers.Dense(128)(x_ts)
    time_parameter = layers.Activation('relu')(time_parameter)
    time_parameter = layers.Reshape((1, 1, 128))(time_parameter)
    x_parameter = x_parameter * time_parameter

    # -----
    x_out = layers.Conv2D(128, kernel_size=3, padding='same')(x_img)
    x_out = x_out + x_parameter
    x_out = layers.LayerNormalization()(x_out)
    x_out = layers.Activation('relu')(x_out)

    return x_out

**`make_model` Function**

This function assembles the complete U-Net model:

* **Input Layers:** It creates input layers for the image (x_input) and the timestep (x_ts_input).
* **Timestep Embedding:** The timestep is embedded into a higher-dimensional representation to provide more context to the model about the diffusion stage.
* **Encoding Path:** The image is processed through a series of convolutional blocks, gradually downsampling the feature maps to capture larger-scale patterns.
* **Bottleneck:** A central block processes the most compressed representation of the image.
* **Decoding Path:** The feature maps are upsampled and combined with features from the encoding path using skip connections to reconstruct fine details.
* **Output Layer:** A final convolutional layer produces the denoised image.
* **Model Creation:** The function returns a Keras model object that takes the image and the timestep as inputs and outputs the denoised image.

In [None]:
def make_model():
    x = x_input = layers.Input(shape=(IMG_SIZE, IMG_SIZE, 3), name='x_input')

    x_ts = x_ts_input = layers.Input(shape=(1,), name='x_ts_input')
    x_ts = layers.Dense(192)(x_ts)
    x_ts = layers.LayerNormalization()(x_ts)
    x_ts = layers.Activation('relu')(x_ts)

    # ----- left ( down ) -----
    x = x32 = block(x, x_ts)
    x = layers.MaxPool2D(2)(x)

    x = x16 = block(x, x_ts)
    x = layers.MaxPool2D(2)(x)

    x = x8 = block(x, x_ts)
    x = layers.MaxPool2D(2)(x)

    x = x4 = block(x, x_ts)

    # ----- MLP -----
    x = layers.Flatten()(x)
    x = layers.Concatenate()([x, x_ts])
    x = layers.Dense(128)(x)
    x = layers.LayerNormalization()(x)
    x = layers.Activation('relu')(x)

    x = layers.Dense(4 * 4 * 32)(x)
    x = layers.LayerNormalization()(x)
    x = layers.Activation('relu')(x)
    x = layers.Reshape((4, 4, 32))(x)

    # ----- right ( up ) -----
    x = layers.Concatenate()([x, x4])
    x = block(x, x_ts)
    x = layers.UpSampling2D(2)(x)

    x = layers.Concatenate()([x, x8])
    x = block(x, x_ts)
    x = layers.UpSampling2D(2)(x)

    x = layers.Concatenate()([x, x16])
    x = block(x, x_ts)
    x = layers.UpSampling2D(2)(x)

    x = layers.Concatenate()([x, x32])
    x = block(x, x_ts)

    # ----- output -----
    x = layers.Conv2D(3, kernel_size=1, padding='same')(x)
    model = tf.keras.models.Model([x_input, x_ts_input], x)
    return model


model = make_model()
model.summary()

The U-Net architecture is characterized by its symmetrical encoding and decoding paths, resembling a "U" shape. Skip connections between corresponding layers in the encoding and decoding paths allow the model to preserve fine-grained details during the denoising process.
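
The sketch below traces only the tensor shapes along the "U", using plain NumPy pooling and upsampling. It is a simplification: the convolutional blocks are left out, so channel counts simply add up at each concatenation, and the ordering of operations is not identical to `make_model`. The helper names are hypothetical.

```python
import numpy as np

def max_pool_2x(x):
    """2x2 max pooling on a (batch, H, W, C) array."""
    b, h, w, c = x.shape
    return x.reshape(b, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))

def upsample_2x(x):
    """Nearest-neighbour upsampling, the default mode of UpSampling2D."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

x = np.random.default_rng(0).normal(size=(1, 32, 32, 8))

# Encoder: keep a copy of the features at each resolution.
skip32 = x
skip16 = max_pool_2x(skip32)
skip8 = max_pool_2x(skip16)
bottom = max_pool_2x(skip8)                      # (1, 4, 4, 8)

# Decoder: upsample, then concatenate the matching encoder features on the channel axis.
up8 = np.concatenate([upsample_2x(bottom), skip8], axis=-1)
up16 = np.concatenate([upsample_2x(up8), skip16], axis=-1)
up32 = np.concatenate([upsample_2x(up16), skip32], axis=-1)

print(bottom.shape, up8.shape, up16.shape, up32.shape)
```

The concatenations give the decoder direct access to the high-resolution encoder features, which would otherwise be lost after pooling.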

## Model Training



#### **Compilation**


The model is compiled with an Adam optimizer and a mean absolute error (MAE) loss function. During training, it learns to predict a slightly less noisy version of the image at each timestep. By applying this denoising step iteratively, it learns to reverse the diffusion process and generate new images from noise.
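
For readers unfamiliar with MAE, the small NumPy example below compares it with mean squared error on the same made-up residuals. The numbers are illustrative only and say nothing about which loss trains this model better.

```python
import numpy as np

target = np.array([0.0, 0.0, 0.0, 0.0])
prediction = np.array([0.1, 0.1, 0.1, 2.0])   # one large error among small ones

mae = np.mean(np.abs(target - prediction))     # mean of absolute differences
mse = np.mean((target - prediction) ** 2)      # mean of squared differences

print(f"MAE={mae:.4f}, MSE={mse:.4f}")
```

MAE grows linearly with the size of an error while MSE grows quadratically, so the single large error dominates the MSE much more than the MAE.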

In [None]:
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0008)
loss_func = tf.keras.losses.MeanAbsoluteError()
model.compile(loss=loss_func, optimizer=optimizer)

In [None]:
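def predict(x_idx=None):
    # Start from pure Gaussian noise and denoise step by step
    x = np.random.normal(size=(32, IMG_SIZE, IMG_SIZE, 3))

    for i in trange(timesteps):
        t = i
        # Feed the current images and their timestep; the output is one step cleaner
        x = model.predict([x, np.full((32), t)], verbose=0)
    show_examples(x)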

predict()

In [None]:
def predict_step():
    xs = []
    x = np.random.normal(size=(8, IMG_SIZE, IMG_SIZE, 3))

    for i in trange(timesteps):
        t = i
        x = model.predict([x, np.full((8),  t)], verbose=0)
        if i % 2 == 0:
            xs.append(x[0])

    plt.figure(figsize=(20, 2))
    for i in range(len(xs)):
        plt.subplot(1, len(xs), i+1)
        plt.imshow(cvtImg(xs[i]))
        plt.title(f'{i}')
        plt.axis('off')

predict_step()

In [None]:
def train_one(x_img):
    x_ts = generate_ts(len(x_img))
    x_a, x_b = forward_noise(x_img, x_ts)
    loss = model.train_on_batch([x_a, x_ts], x_b)
    return loss

In [None]:
def train(R=50):
    bar = trange(R)
    total = 100
    for i in bar:
        for j in range(total):
            x_img = X_train[np.random.randint(len(X_train), size=BATCH_SIZE)]
            loss = train_one(x_img)
            pg = (j / total) * 100
            if j % 5 == 0:
                bar.set_description(f'loss: {loss:.5f}, p: {pg:.2f}%')

#### **Training**

The training process involves repeatedly presenting the model with noisy images and their corresponding timesteps, and guiding it to predict less noisy versions. This process allows the model to learn the underlying patterns in the data and effectively reverse the diffusion process to generate high-quality images.
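
To see what the model is being asked to learn, the sketch below replaces the network with an idealized "oracle" that already knows the clean image and therefore produces the exact training target at every step. This is purely illustrative: `oracle_step` is a hypothetical helper, and a real model never sees the clean image at sampling time. It shows that chaining the 16 targets leads from pure noise to the image.

```python
import numpy as np

timesteps = 16
time_bar = 1 - np.linspace(0, 1.0, timesteps + 1)

rng = np.random.default_rng(0)
clean = rng.uniform(-1.0, 1.0, size=(1, 8, 8, 3))   # stand-in for a training image

def oracle_step(x_a, t):
    """Ideal prediction of the next, less noisy image when the clean image is known."""
    a, b = time_bar[t], time_bar[t + 1]
    noise = (x_a - clean * (1 - a)) / a           # invert x_a = clean*(1-a) + noise*a
    return clean * (1 - b) + noise * b

x = rng.normal(size=clean.shape)                  # t = 0: pure noise
for t in range(timesteps):
    x = oracle_step(x, t)                         # output feeds the next step

print(np.allclose(x, clean))   # True
```

The trained network has to approximate each of these steps from the noisy input and the timestep alone, which is why its samples resemble the training data rather than reproduce a specific image.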

In [None]:
for _ in range(10):
    train()
    # reduce learning rate for next training
    model.optimizer.learning_rate = max(0.000001, model.optimizer.learning_rate * 0.9)

    # show result
    predict()
    predict_step()
    plt.show()

Diffusion models are a conceptually simple and elegant approach to the problem of generating data. Their state-of-the-art results, combined with non-adversarial training, have propelled them to great heights, and further improvements can be expected in the coming years given how young the field is. In particular, diffusion models have been found to be essential to the performance of cutting-edge models like DALL-E 2.

