# Week 9: From Supervised to Generative Learning - Homework

**ML2: Advanced Machine Learning**

**Estimated Time**: 1 hour

---

This homework combines programming exercises and knowledge-based questions to reinforce this week's concepts.

## Setup

Run this cell to import the necessary libraries:

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn

# Set random seed for reproducibility
np.random.seed(42)
torch.manual_seed(42)

print('✓ Libraries imported successfully')

---
## Part 1: Programming Exercises (60%)

Complete the following programming tasks. Read each description carefully and implement the requested functionality.

### Exercise 1: Discriminative vs Generative Models

**Time**: 10 min

Compare what discriminative and generative models learn.

In [None]:
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# Discriminative model: P(y|x) - "Which of the 10 classes is this image?"
class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 128),
            nn.ReLU(),
            nn.Linear(128, 10)  # 10 classes
        )
    
    def forward(self, x):
        return self.model(x)  # Returns class logits

# Generative model: P(x) - "Generate a cat image"
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(32, 128),  # From latent space
            nn.ReLU(),
            nn.Linear(128, 784),  # To image space
            nn.Sigmoid()
        )
    
    def forward(self, z):
        return self.model(z)  # Returns generated image

# TODO: What can each model do that the other cannot?
# Discriminator: Can classify but cannot generate
# Generator: Can generate but cannot classify directly
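
To make the contrast concrete, the sketch below runs both kinds of model on random inputs and inspects what comes out. It uses `nn.Sequential` stand-ins with the same layer sizes as the classes above; the batch size of 4 and the random "images" are assumptions for illustration, and the models are untrained, so only the shapes and value ranges are meaningful.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins mirroring the two architectures in the exercise cell
classifier = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
generator = nn.Sequential(
    nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784), nn.Sigmoid()
)

x = torch.rand(4, 784)   # hypothetical batch of 4 flattened 28x28 images
z = torch.randn(4, 32)   # 4 latent vectors drawn from N(0, I)

with torch.no_grad():
    logits = classifier(x)                # image -> class scores
    probs = torch.softmax(logits, dim=1)  # scores -> P(y|x)
    fake_images = generator(z)            # noise -> image-shaped output

print(logits.shape)       # one score per class for each image
print(fake_images.shape)  # one 784-pixel image per latent vector
```

The classifier maps an existing image to a distribution over labels, while the generator maps a random latent vector to a new image-shaped tensor with pixel values in [0, 1].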

---
## Part 2: Knowledge Questions (40%)

Answer the following questions to test your conceptual understanding.

### Question 1 (Short Answer)

**Question 1 - Discriminative vs Generative**

Discriminative: Learn P(y|x) - "Given data x, what is label y?"
Generative: Learn P(x) or P(x,y) - "What does the data distribution look like?"

Explain:
1. What can a generative model do that a discriminative model cannot?
2. Why might you want to generate new data?
3. Give a real-world application for each type.

**Hint**: Generative models can create new samples. Discriminative models only classify existing ones.

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 2 (Short Answer)

**Question 2 - VAE: Learning a Distribution**

Standard autoencoder: Encodes each image to a SINGLE point in latent space
VAE: Encodes each image to a DISTRIBUTION (mean μ and variance σ²)

Explain:
1. Why is learning a distribution better for generation?
2. What does sampling from this distribution enable?
3. What's the tradeoff?

**Hint**: Distribution = you can sample infinitely many new points. Single point = the decoder has only seen the codes of training images, so arbitrary latent points may decode poorly.

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 3 (Multiple Choice)

**Question 3 - Reparameterization Trick**

VAEs need to sample z ~ N(μ, σ²) during training, but a raw sampling step is not differentiable with respect to μ and σ, so gradients cannot flow back to the encoder.

The reparameterization trick: z = μ + σ * ε, where ε ~ N(0,1)

Why does this solve the problem?

A) It makes sampling faster
B) It moves the randomness to ε, making z differentiable w.r.t. μ and σ
C) It reduces overfitting
D) It increases model capacity

**Hint**: We can't backprop through sampling, but we can backprop through μ + σ * ε.
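
A minimal sketch of the trick in PyTorch follows. The specific values of `mu` and `log_var` are arbitrary assumptions, and parameterizing the variance as `log_var` (log σ²) is a common convention rather than something this homework prescribes.

```python
import torch

torch.manual_seed(0)

# Pretend these came out of an encoder (values are arbitrary)
mu = torch.tensor([0.5, -1.0], requires_grad=True)
log_var = torch.tensor([0.0, 0.2], requires_grad=True)

sigma = torch.exp(0.5 * log_var)  # sigma = sqrt(exp(log_var))
eps = torch.randn_like(sigma)     # all randomness lives in eps
z = mu + sigma * eps              # deterministic function of mu and sigma

z.sum().backward()

# Gradients reach the distribution parameters:
# dz/dmu = 1 and dz/dlog_var = 0.5 * sigma * eps
print(mu.grad)
print(log_var.grad)
```

Because `eps` is treated as a fixed input, autograd sees `z` as an ordinary differentiable expression in `mu` and `log_var`.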

**Your Answer**: [Write your answer here - e.g., 'B']

**Explanation**: [Explain why this is correct]

### Question 4 (Short Answer)

**Question 4 - KL Divergence in VAEs**

VAE loss = Reconstruction loss + KL(q(z|x) || p(z))

The KL term encourages the learned distribution q(z|x) to be close to a standard normal p(z) = N(0,1).

Explain:
1. Why do we want the latent space to be normally distributed?
2. What would happen without the KL term?
3. How does this help with generation?

**Hint**: Normal distribution = smooth, continuous latent space where we can sample anywhere.
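
The loss above can be sketched as follows, assuming a diagonal Gaussian q(z|x) parameterized by `mu` and `log_var`, a standard normal prior, and binary cross-entropy as the reconstruction term (a common choice for pixel values in [0, 1], not the only one). The function name `vae_loss` and the tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def vae_loss(x_recon, x, mu, log_var):
    """Reconstruction loss plus KL(q(z|x) || N(0, I)), summed over the batch."""
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # Closed-form KL between N(mu, sigma^2) and N(0, 1), summed over dimensions
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl, recon, kl

torch.manual_seed(0)
x = torch.rand(4, 784)                               # placeholder images
x_recon = torch.rand(4, 784).clamp(1e-6, 1 - 1e-6)   # placeholder decoder output

# When q(z|x) already equals the prior, the KL term vanishes
mu = torch.zeros(4, 32)
log_var = torch.zeros(4, 32)
total, recon, kl = vae_loss(x_recon, x, mu, log_var)

# Shifting every mean away from 0 is penalized by the KL term
_, _, kl_shifted = vae_loss(x_recon, x, torch.ones(4, 32), log_var)
print(kl.item(), kl_shifted.item())
```

The KL term grows as the encoder's means move away from 0 or its variances move away from 1, which is what pulls the encoded distributions toward the prior.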

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 5 (Short Answer)

**Question 5 - GANs vs VAEs**

GANs: Generator vs Discriminator (adversarial training)
VAEs: Encoder-Decoder with probabilistic latent space

Compare:
1. Which typically generates sharper images?
2. Which is easier to train?
3. Which gives you explicit control over latent space?

**Hint**: GANs = sharper but unstable. VAEs = blurrier but stable with structured latent space.
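
For reference, one adversarial training step might look like the sketch below. The network sizes, the Adam learning rate, and the random `real` batch are assumptions for illustration. Note that the discriminator here outputs a single real/fake logit, unlike the 10-class classifier in Exercise 1.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

G = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784), nn.Sigmoid())
D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.rand(16, 784)  # placeholder for a batch of real images
z = torch.randn(16, 32)
ones, zeros = torch.ones(16, 1), torch.zeros(16, 1)

# Discriminator step: label real images 1 and generated images 0
fake = G(z).detach()        # detach so this step does not update G
d_loss = bce(D(real), ones) + bce(D(fake), zeros)
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Generator step: try to make D label generated images as real
g_loss = bce(D(G(z)), ones)
opt_g.zero_grad()
g_loss.backward()
opt_g.step()

print(d_loss.item(), g_loss.item())
```

The two losses pull in opposite directions, and keeping the two networks balanced is a large part of why GAN training is harder to stabilize than minimizing a single VAE objective.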

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 6 (Multiple Choice)

**Question 6 - Mode Collapse in GANs**

Mode collapse = Generator learns to produce only a few types of outputs, ignoring diversity.

Why does this happen?

A) Generator finds a few outputs that fool the discriminator and sticks with them
B) Not enough training data
C) Learning rate too low
D) Model is too small

**Hint**: The generator exploits weaknesses in the discriminator instead of learning the full distribution.

**Your Answer**: [Write your answer here - e.g., 'B']

**Explanation**: [Explain why this is correct]

### Question 7 (Short Answer)

**Question 7 - Conditional Generation**

Conditional VAE/GAN: Generate specific types of outputs (e.g., "generate a smiling face")

Explain: How do you modify the architecture to enable conditional generation?

**Hint**: Provide the condition (label) as additional input to the generator/decoder.
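
One common way to do this is to concatenate a one-hot label to the latent vector before decoding. The sketch below assumes 10 classes and a 32-dimensional latent space; the class name `ConditionalDecoder` is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalDecoder(nn.Module):
    """Decoder that receives the latent code together with a class label."""

    def __init__(self, latent_dim=32, num_classes=10):
        super().__init__()
        self.num_classes = num_classes
        self.model = nn.Sequential(
            nn.Linear(latent_dim + num_classes, 128),  # input widened by the label
            nn.ReLU(),
            nn.Linear(128, 784),
            nn.Sigmoid(),
        )

    def forward(self, z, y):
        y_onehot = F.one_hot(y, self.num_classes).float()
        return self.model(torch.cat([z, y_onehot], dim=1))

torch.manual_seed(0)
decoder = ConditionalDecoder()
z = torch.randn(5, 32)
y = torch.full((5,), 7, dtype=torch.long)  # ask for class 7 five times
images = decoder(z, y)
print(images.shape)
```

In a conditional VAE the encoder usually receives the label as well, and in a conditional GAN the discriminator does too, so that each part of the model can learn to use the condition.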

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 8 (Short Answer)

**Question 8 - Latent Space Arithmetic**

In a well-trained VAE/GAN:
man with glasses - man + woman ≈ woman with glasses

Explain:
1. Why does vector arithmetic work in latent space?
2. What does this reveal about what the model learned?
3. How is this similar to word embeddings?

**Hint**: Latent space organizes concepts as directions. Similar to king - man + woman = queen.
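
The idea can be illustrated with a toy example. The 3-dimensional latent space and the hand-picked attribute directions below are entirely hypothetical; in a trained model the directions are learned and the arithmetic holds only approximately.

```python
import numpy as np

# Hypothetical latent space where each attribute is a direction
base = np.array([1.0, 0.0, 0.0])         # shared "face" component
gender_dir = np.array([0.0, 1.0, 0.0])   # man -> woman
glasses_dir = np.array([0.0, 0.0, 1.0])  # no glasses -> glasses

man = base
woman = base + gender_dir
man_with_glasses = base + glasses_dir
woman_with_glasses = base + gender_dir + glasses_dir

# Subtracting "man" isolates the glasses direction; adding "woman" re-attaches it
result = man_with_glasses - man + woman
print(np.allclose(result, woman_with_glasses))
```

The arithmetic works here by construction, because each attribute corresponds to an independent direction that can be added or removed without affecting the others.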

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 9 (Short Answer)

**Question 9 - Diffusion Models**

Diffusion models (like DALL-E 2, Stable Diffusion) are a newer generative approach.

They learn to REVERSE a gradual noising process.

Explain: How is this conceptually different from VAEs/GANs?

**Hint**: Diffusion: learn to denoise. VAEs: learn to compress/decompress. GANs: learn to fool a critic.
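
The noising process that a diffusion model learns to reverse can be sketched as below. It uses the closed-form forward step from DDPM-style models, x_t = sqrt(ᾱ_t) · x_0 + sqrt(1 − ᾱ_t) · ε; the linear β schedule, `T = 1000`, and the helper name `add_noise` are assumptions for illustration.

```python
import torch

torch.manual_seed(0)

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # assumed linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # fraction of signal left at step t

def add_noise(x0, t):
    """Jump directly from a clean sample x0 to its noised version at step t."""
    eps = torch.randn_like(x0)
    x_t = alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps
    return x_t, eps

x0 = torch.rand(784)                  # placeholder for a flattened image
x_early, _ = add_noise(x0, 10)        # still mostly signal
x_late, eps_late = add_noise(x0, T - 1)  # almost pure Gaussian noise

print(alpha_bar[10].item(), alpha_bar[T - 1].item())
```

During training, a network is given `x_t` and `t` and is typically asked to predict `eps`; generation then starts from pure noise and applies the learned denoising step repeatedly.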

**Your Answer**:

[Write your answer here in 2-4 sentences]

### Question 10 (Short Answer)

**Question 10 - Real-World Applications**

Generative models power:
- DALL-E (text-to-image)
- ChatGPT (text generation)
- Deepfakes (face generation)

Explain:
1. What ethical concerns arise from powerful generative models?
2. How might you detect AI-generated content?
3. What safeguards should be in place?

**Hint**: Concerns: misinformation, copyright, consent. Detection: artifacts, watermarking.

**Your Answer**:

[Write your answer here in 2-4 sentences]

---
## Submission

Before submitting:
1. Run all cells to ensure code executes without errors
2. Check that all questions are answered
3. Review your explanations for clarity

**To Submit**:
- File → Download → Download .ipynb
- Submit the notebook file to your course LMS

**Note**: Make sure your name is in the filename (e.g., homework_01_yourname.ipynb)