# Model: AutoEncoder (2-D bottleneck)
### What is an AutoEncoder (AE)?

An **AutoEncoder** learns to:
1) **Encode** an input \(x\) into a low-dimensional **latent vector** \(z\) (the **bottleneck**), and
2) **Decode** \(z\) back into a reconstruction \(\hat{x}\) that resembles the original.

For MNIST (28×28 grayscale), we’ll use a **fully-connected (MLP) AE** with a **2-D bottleneck**:
- **Why 2-D?** So we can **visualize** the latent space later as a 2-D scatter plot (each image → one point).
- **Encoder**: $(784 \rightarrow 300 \rightarrow 2)$
- **Decoder**: $(2 \rightarrow 300 \rightarrow 784)$

We keep it minimal:
- **Activations**: `LeakyReLU` after each hidden layer (its small negative slope keeps a non-zero gradient for negative pre-activations).
- **No output activation** on the decoder’s last layer because inputs are **standardized** (mean = 0.1307, std = 0.3081).  
  With raw \([0,1]\) pixels, a `Sigmoid` output could be sensible; standardized pixels fall outside that range (roughly \([-0.42, 2.82]\)), so an unconstrained linear output is the better fit and works well with L1/L2 losses.
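To make the output-activation argument concrete, the short sketch below computes where raw pixel values 0 and 1 land after standardization with the constants quoted above. The helper name `standardize` is purely illustrative and not part of the notebook's code.

```python
# MNIST normalization constants quoted in the text.
MEAN, STD = 0.1307, 0.3081


def standardize(pixel: float) -> float:
    """Map a raw pixel in [0, 1] to its standardized value."""
    return (pixel - MEAN) / STD


lo = standardize(0.0)  # black (background) pixel
hi = standardize(1.0)  # white (full-intensity) pixel
print(f"standardized range: [{lo:.4f}, {hi:.4f}]")
```

Background pixels become negative and bright pixels exceed 1, so a `Sigmoid` output (limited to the open interval between 0 and 1) could never match these targets.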

---

### Shape flow

- Input batch \(x\): `[B, 1, 28, 28]`
- **Flatten** to `[B, 784]` before feeding the encoder.
- Encoder outputs \(z\): `[B, 2]`
- Decoder maps back to `[B, 784]`
- **Unflatten** to `[B, 1, 28, 28]` to compare against the original images.

Because `x_hat` ends up with the same shape as `x`, the training loss can be computed element-wise as `loss(x, x_hat)`.
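The shape flow above can be traced with a minimal PyTorch sketch. The batch size, the random input, and the two single `nn.Linear` layers are placeholders chosen only to show the shapes; they are not the actual encoder and decoder.

```python
import torch
from torch import nn
import torch.nn.functional as F

B = 4                                   # hypothetical batch size
x = torch.randn(B, 1, 28, 28)           # random stand-in for standardized MNIST images

to_latent = nn.Linear(784, 2)           # placeholder for the encoder
from_latent = nn.Linear(2, 784)         # placeholder for the decoder

x_flat = x.flatten(start_dim=1)         # [B, 1, 28, 28] -> [B, 784]
z = to_latent(x_flat)                   # [B, 2]
x_hat_flat = from_latent(z)             # [B, 784]
x_hat = x_hat_flat.view(B, 1, 28, 28)   # [B, 784] -> [B, 1, 28, 28]

# Matching shapes allow an element-wise loss; MSE is used here as one example.
loss = F.mse_loss(x_hat, x)
print(x_flat.shape, z.shape, x_hat.shape, loss.shape)
```

`flatten(start_dim=1)` keeps the batch dimension and merges the rest, which is why the same code works for any batch size.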

---

### Methods we implement

- `encode(x) → z`: only the encoder path (used later to visualize latents).
- `decode(z) → x_hat`: only the decoder path (used later to decode latent vectors after manipulating them).
- `forward(x) → x_hat`: full AE (encode then decode).
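One way these three methods could fit together is sketched below. This is a minimal version under stated assumptions: the class name `AutoEncoder`, the constructor arguments, the attribute names `encoder` and `decoder`, and the default `LeakyReLU` slope are illustrative choices, and the notebook's own implementation may differ in detail.

```python
import torch
from torch import nn


class AutoEncoder(nn.Module):
    """MLP autoencoder sketch: 784 -> 300 -> 2 -> 300 -> 784."""

    def __init__(self, input_dim: int = 784, hidden_dim: int = 300, latent_dim: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.LeakyReLU(),                    # default negative_slope=0.01 (assumed)
            nn.Linear(hidden_dim, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.LeakyReLU(),
            nn.Linear(hidden_dim, input_dim),  # linear output, no activation
        )

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # [B, 1, 28, 28] -> [B, 784] -> [B, 2]
        return self.encoder(x.flatten(start_dim=1))

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        # [B, 2] -> [B, 784] -> [B, 1, 28, 28]
        return self.decoder(z).view(-1, 1, 28, 28)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decode(self.encode(x))
```

Putting the flatten step inside `encode` and the unflatten step inside `decode` means callers can pass image-shaped tensors in and get image-shaped tensors out, without handling the reshaping themselves.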

---

### Why `LeakyReLU`?

- Like ReLU, but negative inputs are scaled by a small slope instead of being set to zero, so units keep a non-zero gradient (mitigates “dying ReLUs”).
- Often improves stability for simple MLP autoencoders.
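The difference is easy to see in plain Python. The sketch below writes out both functions by hand; the slope of `0.01` matches the default `negative_slope` of PyTorch's `nn.LeakyReLU`, and the function names are illustrative only.

```python
def relu(v: float) -> float:
    """Standard ReLU: negative inputs are clamped to zero."""
    return max(0.0, v)


def leaky_relu(v: float, negative_slope: float = 0.01) -> float:
    """LeakyReLU: negative inputs are scaled by a small slope instead."""
    return v if v >= 0 else negative_slope * v


def leaky_relu_grad(v: float, negative_slope: float = 0.01) -> float:
    """Derivative of LeakyReLU: never exactly zero, even for negative inputs."""
    return 1.0 if v >= 0 else negative_slope


for v in (-2.0, -0.5, 0.0, 1.5):
    print(v, relu(v), leaky_relu(v), leaky_relu_grad(v))
```

For a negative input, ReLU outputs zero and passes back zero gradient, while LeakyReLU still passes back a gradient equal to the slope, so the unit can recover during training.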

---

### Sanity check idea (optional)

After defining the model, run a tiny batch through it and verify:
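- `x.shape` and `recon.shape` are both `[B, 1, 28, 28]` (`recon` is the reconstruction \(\hat{x}\))
- `features.shape` is `[B, 2]` (`features` is the batch of latent vectors \(z\))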

We’ll do that right after we instantiate the model.
