# Week 5: Autoencoders & Embeddings - Homework

**ML2: Advanced Machine Learning**

**Estimated Time**: 1 hour

---

This homework combines programming exercises and knowledge-based questions to reinforce this week's concepts.

## Setup

Run this cell to import the necessary libraries:

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn

# Set random seed for reproducibility
np.random.seed(42)
torch.manual_seed(42)

print('✓ Libraries imported successfully')

---
## Part 1: Programming Exercises (60%)

Complete the following programming tasks. Read each description carefully and implement the requested functionality.

### Exercise 1 (Experiment): Compression Forces Learning

**Time**: 12 min

Observe how different bottleneck sizes affect reconstruction quality and feature learning.

In [None]:
import torch
import torch.nn as nn
import torchvision
import matplotlib.pyplot as plt

class Autoencoder(nn.Module):
    def __init__(self, latent_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(784, 256),
            nn.ReLU(),
            nn.Linear(256, latent_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 784),
            nn.Sigmoid()
        )
    
    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded, encoded

# Train autoencoders with different bottleneck sizes
latent_dims = [2, 8, 32, 128]

# TODO: Train one Autoencoder per value in latent_dims and compare reconstruction quality
# Question: How does reconstruction quality change as latent_dim changes?
# What must the model learn in order to fit the data through the bottleneck?
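
One possible starting point for the TODO above is a small training helper. This is a minimal sketch, not the reference solution: `train_autoencoder` and `DemoAE` are hypothetical names, MSE loss and the Adam optimizer are assumed choices, and random tensors stand in for flattened MNIST images so the block runs without a download. The helper accepts any model whose `forward` returns `(reconstruction, latent)`, as the `Autoencoder` class above does.

```python
import torch
import torch.nn as nn


class DemoAE(nn.Module):
    """Tiny stand-in with the same forward() contract as Autoencoder above."""

    def __init__(self, latent_dim):
        super().__init__()
        self.encoder = nn.Linear(784, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 784), nn.Sigmoid())

    def forward(self, x):
        encoded = self.encoder(x)
        return self.decoder(encoded), encoded


def train_autoencoder(model, data, epochs=3, batch_size=64, lr=1e-3):
    """Train `model` to reconstruct `data` ([N, 784], values in [0, 1]).

    Returns the mean reconstruction loss of each epoch.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # assumed optimizer
    criterion = nn.MSELoss()                                 # assumed loss
    history = []
    for _ in range(epochs):
        perm = torch.randperm(data.shape[0])  # reshuffle every epoch
        total = 0.0
        for start in range(0, data.shape[0], batch_size):
            batch = data[perm[start:start + batch_size]]
            reconstructed, _ = model(batch)
            loss = criterion(reconstructed, batch)  # the target is the input itself
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item() * batch.shape[0]
        history.append(total / data.shape[0])
    return history


torch.manual_seed(0)
fake_images = torch.rand(256, 784)  # placeholder for flattened MNIST in [0, 1]
results = {}
for latent_dim in [2, 8, 32, 128]:
    results[latent_dim] = train_autoencoder(DemoAE(latent_dim), fake_images)
    print(f"latent_dim={latent_dim:>3}: final loss {results[latent_dim][-1]:.4f}")
```

To use it for the exercise, swap `DemoAE` for `Autoencoder` and `fake_images` for real flattened MNIST images (for example, loaded through `torchvision.datasets.MNIST`). Random noise has no structure to compress, so the numbers printed here say nothing about how the bottleneck behaves on digits.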

---
## Part 2: Knowledge Questions (40%)

Answer the following questions to test your conceptual understanding.

### Question 1 (Short Answer)

**Question 1 - Why Compression Forces Learning**

An autoencoder with 784-dim input → 32-dim latent → 784-dim output must compress 784 numbers into 32.

Explain:
1. Why can't the model just memorize each input?
2. What must it learn instead?
3. What happens if latent_dim = 784 (no compression)?

**Hint**: Compression = information bottleneck. The model must learn the ESSENCE of the data.

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 2 (Short Answer)

**Question 2 - Latent Space as Learned Representation**

After training an autoencoder on MNIST digits, the 32-dimensional latent space captures what makes each digit unique.

Explain:
1. Why might similar digits (like 3 and 8) be close in latent space?
2. How is this different from pixel space (raw 784 dimensions)?
3. What makes the latent representation 'better' than raw pixels?

**Hint**: Latent space captures semantic similarity, not just pixel similarity.

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 3 (Multiple Choice)

**Question 3 - Reconstruction Loss**

You train an autoencoder and get: Train reconstruction loss = 0.01, Test reconstruction loss = 0.10

What does this suggest?

A) The model is working perfectly
B) The model is overfitting
C) The latent dimension is too large
D) The model needs more training

**Hint**: Large gap between train and test = overfitting.

**Your Answer**: [Write your answer here - e.g., 'B']

**Explanation**: [Explain why this is correct]

### Question 4 (Short Answer)

**Question 4 - Autoencoders vs Supervised Learning**

Autoencoders are UNSUPERVISED - they don't need labels.

Explain:
1. What is the 'label' that an autoencoder trains on?
2. Why is this useful when you don't have labeled data?
3. How could you use an autoencoder's learned representations for a downstream supervised task?

**Hint**: The input IS the label (reconstruct yourself). The latent space can be used for other tasks.
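
The downstream use mentioned in the hint can be sketched as a frozen encoder feeding a small classifier. This is a minimal sketch under stated assumptions: the `encoder` below is an untrained stand-in with the same layer sizes as the exercise's encoder, the images and labels are random placeholders, and the 10-class linear classifier is an illustrative choice.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for an encoder that has already been trained as part of an autoencoder.
encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 32))
for param in encoder.parameters():
    param.requires_grad = False  # freeze: only the classifier is trained

classifier = nn.Linear(32, 10)  # 10 classes assumed (e.g. digits 0-9)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

images = torch.rand(64, 784)             # placeholder for a small labeled set
labels = torch.randint(0, 10, (64,))     # placeholder labels

with torch.no_grad():
    features = encoder(images)           # 32-dim latent codes used as features

logits = classifier(features)
loss = nn.functional.cross_entropy(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The autoencoder itself is trained with the input as its own target, so the labeled set is only needed for this last, much smaller step.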

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 5 (Short Answer)

**Question 5 - VAE vs Standard Autoencoder**

Variational Autoencoders (VAEs) learn a DISTRIBUTION in latent space, not just a point.

Explain:
1. Why is learning a distribution useful for GENERATION?
2. What can VAEs do that standard autoencoders cannot?
3. What's the tradeoff?

**Hint**: Distribution = you can sample new points. Standard AE only encodes existing data.
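
The sampling idea in the hint can be sketched with two small functions. This assumes the common Gaussian VAE formulation, where the encoder outputs a mean `mu` and a log-variance `log_var` per latent dimension and the prior is a standard normal; the function names are hypothetical and the tensors are placeholders rather than real encoder outputs.

```python
import torch


def reparameterize(mu, log_var):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), keeping gradients w.r.t. mu and log_var."""
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)
    return mu + std * eps


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dims and averaged over the batch."""
    per_sample = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp(), dim=1)
    return per_sample.mean()


torch.manual_seed(0)
mu = torch.zeros(4, 32)        # placeholder encoder output: means
log_var = torch.zeros(4, 32)   # placeholder encoder output: log-variances
z = reparameterize(mu, log_var)
kl = kl_to_standard_normal(mu, log_var)

# Generation: sample directly from the prior and pass the result to a trained decoder.
z_new = torch.randn(4, 32)
```

The KL term is the part of the tradeoff that pulls the latent codes toward the prior, which is what makes decoding `z_new` meaningful.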

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 6 (Multiple Choice)

**Question 6 - Bottleneck Size Selection**

You're building an autoencoder for 1000x1000 grayscale images. Which latent dimension is most reasonable?

A) latent_dim = 2 (extreme compression)
B) latent_dim = 256 (moderate compression)
C) latent_dim = 1000000 (no compression)
D) latent_dim = 100000 (minimal compression)

**Hint**: Too small = loss of information. Too large = no compression benefit. Need balance.
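
The arithmetic behind the options can be worked out directly. This sketch assumes single-channel images (so the flattened input has 1000 × 1000 values) and, purely for illustration, a single dense layer mapping the flattened image to the latent vector.

```python
input_dim = 1000 * 1000  # assumes one channel; an RGB image would be 3x larger

ratios = {}
for latent_dim in (2, 256, 100_000, 1_000_000):
    ratios[latent_dim] = input_dim / latent_dim
    # Weights in one dense layer from the flattened image to the latent vector.
    dense_weights = input_dim * latent_dim
    print(
        f"latent_dim={latent_dim:>9,}  "
        f"ratio={ratios[latent_dim]:>12,.2f}x  "
        f"dense weights={dense_weights:,}"
    )
```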

**Your Answer**: [Write your answer here - e.g., 'B']

**Explanation**: [Explain why this is correct]

### Question 7 (Short Answer)

**Question 7 - Denoising Autoencoders**

A denoising autoencoder is trained with: corrupted_input → encoder → decoder → clean_output

Explain:
1. Why does this make the learned features MORE robust?
2. What additional capability does the model gain?
3. How is this related to data augmentation?

**Hint**: Learning to denoise forces the model to learn the underlying structure, not memorize noise.
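
The corrupted-input, clean-target setup can be sketched as a single training step. This is a minimal sketch under stated assumptions: additive Gaussian noise with `noise_std=0.3` is one of several possible corruptions, the function names are hypothetical, and the stand-in `model` returns only the reconstruction (the `Autoencoder` class from Exercise 1 returns a tuple, so its output would need unpacking).

```python
import torch
import torch.nn as nn


def add_gaussian_noise(x, noise_std=0.3):
    """Corrupt inputs with Gaussian noise, then clip back to the valid [0, 1] range."""
    noisy = x + noise_std * torch.randn_like(x)
    return noisy.clamp(0.0, 1.0)


def denoising_step(model, optimizer, clean_batch, noise_std=0.3):
    """One update: the model sees the noisy batch but is scored against the clean one."""
    noisy_batch = add_gaussian_noise(clean_batch, noise_std)
    reconstructed = model(noisy_batch)
    loss = nn.functional.mse_loss(reconstructed, clean_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 784), nn.Sigmoid()
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
clean_batch = torch.rand(64, 784)  # placeholder for flattened images in [0, 1]
noisy_example = add_gaussian_noise(clean_batch)
loss_value = denoising_step(model, optimizer, clean_batch)
```

Because fresh noise is drawn on every call, each clean image is seen in many corrupted forms over training, which is where the link to data augmentation comes from.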

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 8 (Short Answer)

**Question 8 - Embeddings as Dimensionality Reduction**

Autoencoder latent space, PCA, and t-SNE all reduce dimensionality. 

Compare:
1. How does an autoencoder differ from PCA?
2. When would you prefer an autoencoder over PCA?
3. What's the computational tradeoff?

**Hint**: PCA = linear. Autoencoder = nonlinear (with activation functions). PCA is faster.
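
For comparison, PCA can be written as a linear encode/decode pair. The sketch below uses NumPy's SVD on random placeholder data; the function names are hypothetical, and 32 components are chosen only to mirror the latent size used elsewhere in this homework.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the data mean and the top principal directions (rows of `components`)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_encode(X, mean, components):
    """Linear 'encoder': project centered data onto the principal directions."""
    return (X - mean) @ components.T


def pca_decode(Z, mean, components):
    """Linear 'decoder': map codes back to the original space."""
    return Z @ components + mean


rng = np.random.default_rng(0)
X = rng.random((200, 784))  # placeholder for flattened images
mean, components = pca_fit(X, n_components=32)
Z = pca_encode(X, mean, components)
X_hat = pca_decode(Z, mean, components)
mse = float(np.mean((X - X_hat) ** 2))
print(f"codes: {Z.shape}, reconstruction MSE: {mse:.4f}")
```

There is no training loop here: the fit is a single matrix decomposition, whereas the autoencoder in Exercise 1 needs iterative gradient updates but can use nonlinear layers.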

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 9 (Short Answer)

**Question 9 - Interpolation in Latent Space**

You encode two images to latent vectors z1 and z2. Then you decode the midpoint (z1 + z2)/2.

What do you expect to see? Why is this useful?

**Hint**: If latent space is smooth, the midpoint should be a blend of the two images.
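
The midpoint is one point on a straight line between the two codes, which can be sketched as follows. The helper name `interpolate_latents` is hypothetical, `z1` and `z2` are random placeholders rather than real encodings, and the untrained `decoder` only demonstrates shapes.

```python
import torch
import torch.nn as nn


def interpolate_latents(z1, z2, num_steps=5):
    """Return `num_steps` points on the straight line from z1 to z2 (inclusive)."""
    alphas = torch.linspace(0.0, 1.0, num_steps).unsqueeze(1)  # shape [num_steps, 1]
    return (1 - alphas) * z1 + alphas * z2                     # shape [num_steps, D]


torch.manual_seed(0)
z1 = torch.randn(32)  # placeholder latent code of image 1
z2 = torch.randn(32)  # placeholder latent code of image 2
path = interpolate_latents(z1, z2, num_steps=5)
midpoint = path[2]    # alpha = 0.5, i.e. (z1 + z2) / 2

# Untrained stand-in decoder; a trained one would turn each row into an image.
decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())
with torch.no_grad():
    decoded = decoder(path)  # shape [5, 784]
```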

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 10 (Short Answer)

**Question 10 - Real-World Application**

Google Photos uses learned embeddings to search photos by similarity without tags.

Explain:
1. How does an autoencoder-style approach enable this?
2. Why is pixel-space similarity not good enough?
3. What must the latent space capture to make semantic search work?

**Hint**: Latent space must capture 'what the image contains' not 'what pixels look like'.
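
Similarity search over embeddings can be sketched as a nearest-neighbor lookup. This is an illustrative sketch only and does not describe how any production photo service works: the embeddings are random placeholders, cosine similarity is an assumed choice of metric, and `top_k_similar` is a hypothetical helper.

```python
import numpy as np


def top_k_similar(query, embeddings, k=3):
    """Return indices and cosine similarities of the k embeddings closest to `query`."""
    q = query / np.linalg.norm(query)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = e @ q                    # cosine similarity to every stored embedding
    order = np.argsort(-scores)[:k]   # highest similarity first
    return order, scores[order]


rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 32))              # placeholder photo embeddings
query = embeddings[7] + 0.01 * rng.normal(size=32)   # slightly perturbed copy of item 7
indices, scores = top_k_similar(query, embeddings, k=3)
print(indices, np.round(scores, 3))
```

The search is only as good as the embeddings: it returns semantically similar photos only if the encoder placed them near each other in the first place.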

**Your Answer**:

[Write your answer here in 2-4 sentences]

---
## Submission

Before submitting:
1. Run all cells to ensure code executes without errors
2. Check that all questions are answered
3. Review your explanations for clarity

**To Submit**:
- File → Download → Download .ipynb
- Submit the notebook file to your course LMS

**Note**: Make sure your name is in the filename (e.g., homework_01_yourname.ipynb)