```{contents}
```
## Autoencoders and Variational Autoencoders (VAE)

Both are usually grouped under **Generative AI**. They are designed to **learn compact, meaningful representations (a latent space)** of input data and reconstruct the input from them, and in the VAE this extends to generating new samples.

---

### Autoencoders (AE)

#### Core Intuition

An **Autoencoder** learns to **compress** data (encode) into a small internal representation and then **rebuild** (decode) the original data from that compressed form.
It captures the **most important features** of the input — removing redundancy and noise.

Think of it as:

> “Learning to copy the input efficiently through a bottleneck.”

---

#### Architecture

An Autoencoder has three main components, trained with a single objective:

```
Input  →  Encoder  →  Latent Space  →  Decoder  →  Output
   x         ↓             z             ↓          x̂
```

* **Encoder:**
  A neural network that compresses input $x$ into a smaller latent vector $z$.
  Mathematically:
  $$
  z = f_\theta(x)
  $$

* **Latent Space:**
  A compressed “information capsule” representing key features of $x$.

* **Decoder:**
  Another neural network that reconstructs data from $z$.
  $$
  \hat{x} = g_\phi(z)
  $$

* **Objective:**
  Minimize **Reconstruction Loss**:
  $$
  L = \|x - \hat{x}\|^2
  $$
  (often Mean Squared Error or Cross-Entropy)
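
The pipeline above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the encoder and decoder are assumed to be single linear layers, and the toy data, learning rate, step count, and names such as `W_enc` and `W_dec` are made up for the example. Training is plain gradient descent on the squared reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): 200 samples in 8 dimensions that lie on a 2-D
# subspace, so a 2-D bottleneck can in principle reconstruct them well.
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 8))

# Encoder f_theta and decoder g_phi as single linear layers (assumption).
W_enc = rng.normal(scale=0.1, size=(8, 2))   # theta
W_dec = rng.normal(scale=0.1, size=(2, 8))   # phi

def encode(x):
    return x @ W_enc                         # z = f_theta(x)

def decode(z):
    return z @ W_dec                         # x_hat = g_phi(z)

def reconstruction_loss(x, x_hat):
    # Squared Euclidean norm ||x - x_hat||^2, averaged over samples.
    return np.mean(np.sum((x - x_hat) ** 2, axis=1))

initial_loss = reconstruction_loss(X, decode(encode(X)))

learning_rate = 0.005                        # illustrative value
for _ in range(3000):
    Z = encode(X)
    X_hat = decode(Z)
    # Gradient of the averaged loss with respect to X_hat.
    grad_out = 2.0 * (X_hat - X) / len(X)
    # Backpropagate through the decoder, then the encoder.
    grad_dec = Z.T @ grad_out
    grad_enc = X.T @ (grad_out @ W_dec.T)
    W_dec -= learning_rate * grad_dec
    W_enc -= learning_rate * grad_enc

final_loss = reconstruction_loss(X, decode(encode(X)))
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

Because the toy data really is two-dimensional, the loss after training should be well below the initial loss. A practical autoencoder would use nonlinear layers and a deep learning framework, but the encode, decode, compare loop is the same.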

---

#### Intuition in Simple Terms

Imagine giving an artist a photo and asking them to draw it from memory.

* The **encoder** is like how their brain stores only key features (shape, color, structure).
* The **decoder** is their hand drawing it again — not pixel-perfect, but close.

---

#### Applications

* **Denoising:** Remove noise from corrupted images.
* **Dimensionality reduction:** Alternative to PCA for nonlinear data.
* **Anomaly detection:** Large reconstruction error = anomaly.
* **Image compression / feature extraction.**
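
As a rough sketch of the anomaly-detection idea, the snippet below scores points by their reconstruction error. To stay short it does not train a neural network: it uses a PCA projection as a linear stand-in for the encoder/decoder pair. The synthetic data, the 99th-percentile threshold, and the function name `reconstruction_error` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Normal" data (assumption): points near a 2-D plane inside a 6-D space.
basis = rng.normal(size=(2, 6))
normal = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 6))

# Stand-in for a trained autoencoder: project onto the top-2 principal
# directions (encode) and map back to the input space (decode).
mean = normal.mean(axis=0)
_, _, Vt = np.linalg.svd(normal - mean, full_matrices=False)
components = Vt[:2]                          # shape (2, 6)

def reconstruction_error(x):
    z = (x - mean) @ components.T            # encode
    x_hat = z @ components + mean            # decode
    return np.sum((x - x_hat) ** 2, axis=-1)

# Hypothetical rule: flag anything above the 99th percentile of the
# reconstruction error seen on normal data.
threshold = np.percentile(reconstruction_error(normal), 99)

typical = rng.normal(size=2) @ basis         # lies on the plane
outlier = typical + 3.0 * Vt[-1]             # pushed off the plane

for name, point in [("typical", typical), ("outlier", outlier)]:
    error = reconstruction_error(point)
    print(f"{name}: error={error:.4f}, anomaly={error > threshold}")
```

The point that was moved off the plane cannot pass through the 2-D bottleneck intact, so its reconstruction error is large. The threshold choice is application-specific and would normally be tuned on held-out data.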

---

### Variational Autoencoder (VAE)

#### Motivation

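Standard autoencoders learn **a fixed deterministic mapping**: each input → one latent code.
Nothing constrains how those codes are arranged, so the latent space can have gaps, and a randomly chosen point may decode to something meaningless. That makes a plain autoencoder **poorly suited to generating new data**.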

#### VAE Solution

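VAEs make the latent space **probabilistic**, not deterministic: each input is encoded as a distribution over latent codes, and training pulls those distributions toward a shared prior so that the space can be sampled.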

---

#### Architecture Overview

Still has **encoder → latent space → decoder**, but with probabilistic modeling:

```
Input x → Encoder → (μ, σ) → z ~ N(μ, σ²) → Decoder → x̂
```

* **Encoder:**
  Instead of producing a single latent vector, it outputs two:

  * Mean vector $\mu(x)$
  * Standard deviation vector $\sigma(x)$

  These define a **Gaussian distribution** in latent space:
  $$
  z \sim \mathcal{N}(\mu, \sigma^2)
  $$

* **Reparameterization Trick:**
  To make sampling differentiable for backpropagation:
  $$
  z = \mu + \sigma \odot \epsilon,\quad \epsilon \sim \mathcal{N}(0, I)
  $$

* **Decoder:**
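  Maps a sampled $z$ back to data space: $\hat{x} = g_\phi(z)$. During training, $z$ comes from the encoder's distribution; to generate new samples, $z$ is drawn from the prior $\mathcal{N}(0, I)$.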
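
The reparameterization trick can be illustrated with a short NumPy sketch. The function name `reparameterize` and the example values of $\mu$ and $\sigma$ are assumptions for the demo. Many implementations have the encoder predict $\log \sigma^2$ instead of $\sigma$ for numerical stability, but $\sigma$ is used directly here to match the formula above.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, sigma, rng):
    """Draw z ~ N(mu, sigma^2) as mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(size=mu.shape)  # all randomness lives in eps
    return mu + sigma * eps                   # deterministic in mu and sigma

# Example encoder outputs for one input (illustrative values).
mu = np.array([1.0, -2.0])
sigma = np.array([0.5, 2.0])

# Many draws should recover the requested mean and standard deviation.
samples = np.stack([reparameterize(mu, sigma, rng) for _ in range(20000)])
sample_mean = samples.mean(axis=0)
sample_std = samples.std(axis=0)
print("mean:", sample_mean.round(2), "std:", sample_std.round(2))
```

Since `eps` does not depend on the network parameters, gradients can flow through `mu` and `sigma` as through any other arithmetic operation; the sampling step itself never needs to be differentiated.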

---

#### Loss Function

VAE balances two goals:

1. **Reconstruction Loss:** Rebuild input accurately.
   $$
   L_{recon} = \|x - \hat{x}\|^2
   $$
2. **KL Divergence Loss:** Keep the latent distribution $q(z|x)$ close to a standard Gaussian prior $p(z) = \mathcal{N}(0, I)$:
   $$
   L_{KL} = D_{KL}\big(q(z|x) \,\|\, p(z)\big)
   $$

**Total Loss:**
$$
L = L_{recon} + \beta L_{KL}
$$
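($\beta$ weights the KL term: $\beta = 1$ gives the standard VAE, while β-VAE uses $\beta > 1$ to push for a more disentangled latent space at some cost in reconstruction quality)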
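
For a diagonal Gaussian encoder and a standard normal prior, the KL term has the closed form $\frac{1}{2}\sum_j \left(\mu_j^2 + \sigma_j^2 - 1 - \ln \sigma_j^2\right)$. The sketch below evaluates it and the total loss for hand-picked values. The function names `kl_to_standard_normal` and `vae_loss` and all inputs are illustrative assumptions, and the reconstruction term is taken to be the squared error.

```python
import numpy as np

def kl_to_standard_normal(mu, sigma):
    # Closed form of D_KL( N(mu, diag(sigma^2)) || N(0, I) ),
    # summed over the latent dimensions.
    return 0.5 * np.sum(mu**2 + sigma**2 - 1.0 - np.log(sigma**2), axis=-1)

def vae_loss(x, x_hat, mu, sigma, beta=1.0):
    recon = np.sum((x - x_hat) ** 2, axis=-1)  # L_recon = ||x - x_hat||^2
    kl = kl_to_standard_normal(mu, sigma)      # L_KL
    return recon + beta * kl                   # L = L_recon + beta * L_KL

# The KL term is zero when q(z|x) already equals the prior N(0, I).
kl_at_prior = kl_to_standard_normal(np.zeros(2), np.ones(2))

# Shifting one mean by 1 costs 0.5 * 1^2 = 0.5.
kl_shifted = kl_to_standard_normal(np.array([1.0, 0.0]), np.ones(2))

# Squared error of 1.0 plus beta = 4 times a KL of 0.5.
x = np.array([1.0, 2.0, 3.0])
x_hat = np.array([1.0, 2.0, 2.0])
loss = vae_loss(x, x_hat, np.array([1.0, 0.0]), np.ones(2), beta=4.0)

print(kl_at_prior, kl_shifted, loss)
```

Raising `beta` makes the same deviation from the prior more expensive, which is the lever β-VAE uses to trade reconstruction accuracy for a more regular latent space.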

---

#### Intuition

* The **encoder** learns a distribution, not a fixed point.
* Sampling from this smooth latent space allows you to **generate new, realistic data**.
* Points close in latent space correspond to **semantically similar samples**.

---

#### Applications

* **Image generation:** Produce realistic new faces, objects, or digits.
* **Data compression:** Learn meaningful latent representations.
* **Anomaly detection:** Outliers tend to produce a high reconstruction loss, a high KL loss, or both.
* **Latent space arithmetic:**
  E.g., “smiling face” − “neutral face” + “male” = “smiling male face”.

---

#### Comparison: Autoencoder vs VAE

| Feature            | Autoencoder                       | Variational Autoencoder                 |
| ------------------ | --------------------------------- | --------------------------------------- |
| Latent Space       | Deterministic                     | Probabilistic                           |
| Output             | Reconstructed input               | Reconstructed input or new samples      |
| Generative Ability | Limited (no prior to sample from) | Built in (sample $z \sim p(z)$, decode) |
| Loss Function      | MSE / Cross-Entropy               | Reconstruction + KL Divergence          |
| Use Case           | Compression, denoising            | Generation, representation learning     |

---

#### Example (MNIST)

* Input: Handwritten digit image (28×28)
* Encoder: Compresses to latent vector (say 2D)
* Decoder: Reconstructs digit
* In a VAE, you can sample random latent vectors from the prior and decode them to generate **new digits** that are not in the training set.
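
The generation step for this setup can be sketched as follows. The decoder here is an untrained, randomly initialised stand-in (an assumption, so the outputs are noise rather than digits); the layer sizes and the names `W1`, `W2`, and `decode` are hypothetical. Only the shapes and the sample-then-decode flow are the point.

```python
import numpy as np

rng = np.random.default_rng(0)

latent_dim, hidden_dim, image_side = 2, 64, 28

# Untrained stand-in for a trained decoder g_phi (random weights).
W1 = rng.normal(scale=0.5, size=(latent_dim, hidden_dim))
W2 = rng.normal(scale=0.5, size=(hidden_dim, image_side * image_side))

def decode(z):
    h = np.tanh(z @ W1)
    pixels = 1.0 / (1.0 + np.exp(-(h @ W2)))   # sigmoid keeps pixels in (0, 1)
    return pixels.reshape(-1, image_side, image_side)

# Generation: sample z from the prior N(0, I), then decode.
z = rng.standard_normal(size=(16, latent_dim))
generated = decode(z)

print(generated.shape)   # (16, 28, 28)
```

With a decoder trained as part of a VAE, the same two lines (sample `z`, call `decode`) would yield digit-like images, and a 2D latent space can also be scanned on a grid to see how digits change from one region to the next.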