# Diffusion models
In this notebook, I build a diffusion model using PyTorch.

This notebook is adapted from the DeepLearning.AI course, [Diffusion Models](https://learn.deeplearning.ai/diffusion-models/).

# Setup
Import required packages.

```python
# Built-in packages

# External packages
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation, PillowWriter
import numpy as np
from IPython.display import HTML

# PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import models, transforms

from tqdm.auto import tqdm
```

DDPM ([Denoising Diffusion Probabilistic Models](https://hojonathanho.github.io/diffusion/)) is a _scheduler_, or sampling algorithm. Starting from a sample of pure Gaussian (normal) noise, it iteratively subtracts the noise predicted by the neural network over a fixed number of timesteps.
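The scheduler's behavior is governed by a _noise schedule_. Below is a minimal sketch of a linear DDPM schedule, assuming $T = 500$ timesteps and a variance $\beta_t$ that grows linearly from `1e-4` to `0.02` (the original DDPM paper uses the same $\beta$ range with $T = 1000$). These values and the names `b_t`, `a_t`, and `ab_t` are illustrative assumptions, not necessarily what this notebook uses later.

```python
import torch

# Assumed hyperparameters for illustration
T = 500        # number of diffusion timesteps
beta1 = 1e-4   # noise variance at the first timestep
beta2 = 0.02   # noise variance at the last timestep

# Linear variance schedule beta_t for t = 0..T
b_t = torch.linspace(beta1, beta2, T + 1)
a_t = 1 - b_t                        # alpha_t = 1 - beta_t
ab_t = torch.cumprod(a_t, dim=0)     # alpha-bar_t = product of alpha_s for s <= t
ab_t[0] = 1                          # treat t = 0 as the clean, noise-free image
```

`ab_t[t]` ($\bar\alpha_t$) is the fraction of the original signal's variance that survives at step $t$. It starts at 1 and decays toward 0, which is why the final timestep is essentially pure noise.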

The sampling process runs backwards, from the final timestep down to the first one, as if going back in time from a fully diffused drop of ink in water to the moment the ink was first dropped (time 0). Each backward step removes a little noise, so by time 0 the scheduler has recovered an estimate of what the original, noise-free sample would look like.

For a given timestep (iteration), the neural network predicts the _noise_ in the sampled image, that is, the part of the sample that does not belong to the underlying image. A scaled version of that predicted noise is then subtracted from the sample, producing a less noisy image (hence _denoising_).
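In DDPM the predicted noise is not subtracted as-is: it is scaled by the schedule, and the result is rescaled by $1/\sqrt{\alpha_t}$, giving $\hat{x}_{t-1} = \big(x_t - \tfrac{1-\alpha_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta\big)/\sqrt{\alpha_t}$. A minimal sketch of this step (without the extra noise), assuming the same hypothetical linear schedule and using a random tensor as a stand-in for the network's prediction. `remove_predicted_noise` is an illustrative name.

```python
import torch

# Assumed linear schedule (illustrative values)
T = 500
b_t = torch.linspace(1e-4, 0.02, T + 1)
a_t = 1 - b_t
ab_t = torch.cumprod(a_t, dim=0)

def remove_predicted_noise(x_t, t, pred_noise):
    """Return the DDPM mean estimate of x_{t-1} from x_t and the predicted noise."""
    # Scale the predicted noise by (1 - alpha_t) / sqrt(1 - alpha-bar_t) ...
    scaled_noise = (1 - a_t[t]) / (1 - ab_t[t]).sqrt() * pred_noise
    # ... subtract it, then rescale by 1 / sqrt(alpha_t)
    return (x_t - scaled_noise) / a_t[t].sqrt()

# Toy example: a random "noisy image" and a hypothetical network prediction
torch.manual_seed(0)
x_t = torch.randn(1, 3, 16, 16)
pred_noise = torch.randn_like(x_t)
x_prev_mean = remove_predicted_noise(x_t, t=T, pred_noise=pred_noise)
```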

After the predicted noise is subtracted, and before the sample is passed through the neural network again, the scheduler adds extra noise, scaled according to the timestep. This keeps the sample close to the normally distributed noisy input the network expects. Without the extra noise, the network tends to predict something like the average of the dataset rather than sharp, distinct samples.
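Putting the pieces together, here is a sketch of the backward sampling loop. It assumes a hypothetical `DummyNoisePredictor` in place of the trained network and uses $\sigma_t = \sqrt{\beta_t}$ for the extra noise, which is one of the two variance choices discussed in the DDPM paper. The loop runs from $t = T$ down to 1 and adds fresh noise $z$ at every step except the last.

```python
import torch
import torch.nn as nn

# Assumed linear schedule (illustrative values)
T = 500
b_t = torch.linspace(1e-4, 0.02, T + 1)
a_t = 1 - b_t
ab_t = torch.cumprod(a_t, dim=0)

class DummyNoisePredictor(nn.Module):
    """Hypothetical stand-in for the trained denoising network."""
    def __init__(self, channels=3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x, t_norm):
        # A real network would condition on t_norm; this toy ignores it
        return self.conv(x)

@torch.no_grad()
def sample_ddpm(model, n_samples, channels=3, height=16, width=16, add_noise=True):
    # Start from pure Gaussian noise: the "fully diffused ink" at time T
    x = torch.randn(n_samples, channels, height, width)
    # Walk backwards in time: T, T-1, ..., 1
    for t in range(T, 0, -1):
        t_norm = torch.full((n_samples, 1, 1, 1), t / T)  # timestep scaled to (0, 1]
        eps = model(x, t_norm)                            # predicted noise
        # Extra noise for every step except the last (or never, if disabled)
        z = torch.randn_like(x) if (add_noise and t > 1) else torch.zeros_like(x)
        mean = (x - (1 - a_t[t]) / (1 - ab_t[t]).sqrt() * eps) / a_t[t].sqrt()
        x = mean + b_t[t].sqrt() * z                      # sigma_t = sqrt(beta_t)
    return x

torch.manual_seed(0)
samples = sample_ddpm(DummyNoisePredictor(), n_samples=4)
```

Calling `sample_ddpm(..., add_noise=False)` skips the extra noise; with a trained network, that is the setting that tends to yield the blurry, dataset-average images described above.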

## Training
During training, noise is added to the input image at a randomly chosen timestep, and the neural network predicts the noise that was added. The difference between the predicted noise and the actual added noise is measured (typically with mean squared error), and backpropagation then updates the network's parameters so that it predicts the added noise more accurately.
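A sketch of one training step, assuming the same hypothetical linear schedule, a toy `TinyNoisePredictor` in place of the notebook's network, random tensors in place of real images, and mean squared error as the loss (as in the DDPM paper's simplified objective). The forward process jumps straight to timestep $t$ with $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed linear schedule (illustrative values)
T = 500
b_t = torch.linspace(1e-4, 0.02, T + 1)
ab_t = torch.cumprod(1 - b_t, dim=0)

def perturb_input(x0, t, noise):
    """Forward diffusion: x_t = sqrt(alpha-bar_t) * x0 + sqrt(1 - alpha-bar_t) * noise."""
    ab = ab_t[t].view(-1, 1, 1, 1)   # one alpha-bar per image, broadcast over C, H, W
    return ab.sqrt() * x0 + (1 - ab).sqrt() * noise

class TinyNoisePredictor(nn.Module):
    """Hypothetical toy network; a real model would be a larger UNet."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, 32, 3, padding=1), nn.GELU(),
            nn.Conv2d(32, channels, 3, padding=1),
        )

    def forward(self, x, t_norm):
        # Feed the normalized timestep as an extra constant input channel
        t_map = t_norm.view(-1, 1, 1, 1).expand(-1, 1, x.shape[2], x.shape[3])
        return self.net(torch.cat([x, t_map], dim=1))

def train_step(model, optimizer, x0):
    noise = torch.randn_like(x0)                  # the noise actually added
    t = torch.randint(1, T + 1, (x0.shape[0],))   # random timestep per image
    x_t = perturb_input(x0, t, noise)
    pred_noise = model(x_t, t.float() / T)        # the network's guess of that noise
    loss = F.mse_loss(pred_noise, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

torch.manual_seed(0)
model = TinyNoisePredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x0 = torch.rand(8, 3, 16, 16) * 2 - 1   # fake batch of images scaled to [-1, 1]
losses = [train_step(model, optimizer, x0) for _ in range(50)]
```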

## Additional embeddings
In the upsampling path, the neural network receives two kinds of embeddings: a context embedding and a timestep embedding. The upsampled feature map is multiplied element-wise by the context embedding, and the timestep embedding is then added to the result. Together, these embeddings tell the network the level of noise to predict in the image (the timestep) and what kind of image is desired (the context).
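A minimal sketch of how the two embeddings can be combined, assuming small fully connected layers (the hypothetical `EmbedFC`) that map the normalized timestep and a one-hot context vector to one value per feature channel. The feature-map size, channel count, and number of classes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbedFC(nn.Module):
    """Hypothetical MLP mapping a conditioning vector to one value per channel."""
    def __init__(self, input_dim, emb_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, emb_dim), nn.GELU(), nn.Linear(emb_dim, emb_dim)
        )

    def forward(self, x):
        # Reshape to (batch, emb_dim, 1, 1) so it broadcasts over height and width
        return self.net(x).view(x.shape[0], -1, 1, 1)

torch.manual_seed(0)
batch, channels, n_classes = 4, 64, 5
up = torch.randn(batch, channels, 8, 8)   # hypothetical upsampled feature map
t = torch.rand(batch, 1)                  # normalized timestep in [0, 1)
# Context as one-hot class labels (an assumed form of context)
c = F.one_hot(torch.randint(0, n_classes, (batch,)), n_classes).float()

timestep_embed = EmbedFC(1, channels)
context_embed = EmbedFC(n_classes, channels)
temb = timestep_embed(t)   # shape (4, 64, 1, 1)
cemb = context_embed(c)    # shape (4, 64, 1, 1)

# Scale the features by the context embedding, then shift by the timestep embedding
combined = cemb * up + temb   # shape (4, 64, 8, 8)
```

Because both embeddings have shape `(batch, channels, 1, 1)`, broadcasting applies the same scale and shift at every spatial position of a given channel.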