# 📘 Day 1: Introduction to Generative AI

**🎯 Goal:** Master the fundamentals of Generative AI and understand how machines create new content

**⏱️ Time:** 90-120 minutes

**🌟 Why This Matters for AI:**
- Generative AI is behind DALL-E, Midjourney, Stable Diffusion, Sora, and ChatGPT
- Powers the AI revolution of 2023-2025 (text-to-image, text-to-video)
- Used for data augmentation, creative content, drug discovery, and design
- Foundation for understanding modern AI systems that create (not just classify)
- Critical skill for AI practitioners in 2024-2025
- Autoencoders and VAEs are building blocks for more advanced models

---

## 🤔 What is Generative AI?

**Generative AI = Models that CREATE new data**

### Discriminative vs Generative Models

**Discriminative Models (What you've learned so far):**
- **Task:** Classify or predict labels
- **Question:** "Is this a cat or dog?"
- **Output:** Category/Label (Cat = 0.9, Dog = 0.1)
- **Examples:** Image classification, spam detection, sentiment analysis
- **Models:** CNN, RNN, Random Forest, SVM

**Generative Models (This week!):**
- **Task:** Create new data samples
- **Question:** "Generate a picture of a cat"
- **Output:** New image, text, music, video
- **Examples:** DALL-E, Midjourney, ChatGPT, Sora
- **Models:** GANs, VAEs, Diffusion Models, Transformers

### 🎯 Key Difference:

| Aspect | Discriminative | Generative |
|--------|---------------|------------|
| **Goal** | Classify existing data | Create new data |
| **Learns** | Decision boundary | Data distribution |
| **Output** | Labels/Predictions | New samples |
| **Example** | "This is a cat" | "Here's a new cat image" |
| **Math** | P(y\|x) - probability of label given data | P(x) - probability of data itself |
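
The contrast in the table can be made concrete with a toy one-dimensional dataset. The sketch below is only an illustration under made-up assumptions (animal weights drawn from two Gaussians, a midpoint threshold standing in for a learned decision boundary); it is not how real image models work, but it shows the difference between learning a boundary and learning a distribution you can sample from.

```python
import random
import statistics

random.seed(0)

# Toy 1-D "dataset": feature x (a made-up animal weight in kg) per class.
cats = [random.gauss(4.0, 0.5) for _ in range(500)]
dogs = [random.gauss(20.0, 4.0) for _ in range(500)]

# Generative view: estimate the distribution of the data itself ...
cat_mu, cat_sigma = statistics.mean(cats), statistics.stdev(cats)
dog_mu, dog_sigma = statistics.mean(dogs), statistics.stdev(dogs)

# ... so that brand-new samples can be drawn from it.
new_cats = [random.gauss(cat_mu, cat_sigma) for _ in range(3)]

# Discriminative view: only learn where the boundary between classes lies.
# (The midpoint is a crude stand-in for a trained classifier.)
threshold = (cat_mu + dog_mu) / 2

def classify(x):
    return "cat" if x < threshold else "dog"

print("new cat weights:", [round(w, 2) for w in new_cats])
print("classify(5.0)  ->", classify(5.0))
print("classify(18.0) ->", classify(18.0))
```

The classifier can label an input but has nothing to sample from; the fitted Gaussian can produce new values that were never in the training set.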

### 🌟 Real-World Examples (2024-2025):

**Discriminative AI:**
- 🔍 Google image search: "Is this a cat?"
- 📧 Gmail spam filter: "Is this spam?"
- 🩺 Medical diagnosis: "Is this cancer?"

**Generative AI:**
- 🎨 **DALL-E 3:** "Create an image of a cyberpunk cat"
- 🎬 **Sora:** "Generate a video of waves crashing"
- ✍️ **ChatGPT:** "Write a poem about AI"
- 🎵 **Suno:** "Compose a jazz melody"
- 🧬 **AlphaFold:** "Predict protein structures"

Let's build generative models from scratch! 👇

In [None]:
# Import essential libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from IPython.display import Image, display

# Set random seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Make plots beautiful
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("✅ Libraries imported successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"Device: {device}")
print("Let's create new data with Generative AI! 🚀")

## 🔄 Autoencoders Explained

**Autoencoder = Compress then Reconstruct**

### The Concept:

**Human Analogy:**
Imagine describing a photo to someone:
1. **Encoder:** You compress the image into words: "A red car on a beach at sunset"
2. **Latent Space:** The compressed description (just 8 words instead of millions of pixels)
3. **Decoder:** The listener reconstructs the image in their mind

**In AI:**
```
Input Image (784 pixels)
     ↓
  ENCODER (compresses)
     ↓
Latent Space (32 numbers) ← Compressed representation!
     ↓
  DECODER (reconstructs)
     ↓
Output Image (784 pixels)
```

### Architecture:

**Encoder (Compression):**
- Input: Original image (28×28 = 784 pixels)
- Layers: Gradually reduce dimensions
- Output: Latent vector (e.g., 32 numbers)
- **Learns:** Important features that capture the essence

**Decoder (Reconstruction):**
- Input: Latent vector (32 numbers)
- Layers: Gradually increase dimensions
- Output: Reconstructed image (28×28 = 784 pixels)
- **Learns:** How to recreate the image from compressed form

### 🎯 Why Autoencoders?

**Applications:**
1. **Dimensionality Reduction:** Compress data (like PCA, but able to capture nonlinear structure)
2. **Denoising:** Remove noise from images/audio
3. **Anomaly Detection:** Find unusual patterns
4. **Feature Learning:** Extract meaningful representations
5. **Generation:** Sample from latent space to create new data!
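
As one concrete case from the list, anomaly detection with an autoencoder usually comes down to thresholding the reconstruction error: inputs unlike the training data reconstruct poorly. The sketch below assumes the per-sample errors have already been computed; the numbers and the "mean + 3 standard deviations" rule are illustrative assumptions, not values from this notebook.

```python
import statistics

# Hypothetical per-sample reconstruction errors (e.g. MSE between an input
# and the autoencoder's output). The last value stands in for an unusual input.
errors = [0.011, 0.009, 0.012, 0.010, 0.008, 0.013, 0.011, 0.010, 0.009, 0.095]

# Estimate "normal" behaviour from samples assumed to be clean.
baseline = errors[:-1]
mu = statistics.mean(baseline)
sigma = statistics.stdev(baseline)
threshold = mu + 3 * sigma  # assumed rule of thumb; tune per application

# Flag every sample whose error exceeds the threshold.
anomalies = [i for i, e in enumerate(errors) if e > threshold]
print(f"threshold = {threshold:.4f}, anomalous indices = {anomalies}")
```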

### 🌟 Real-World Uses (2024-2025):

- **Image Compression:** Learned image codecs are built on autoencoders (JPEG follows the same encode/decode idea, but with a fixed DCT transform rather than a learned one)
- **Recommendation Systems:** Autoencoder-based collaborative filtering for services like Netflix and Spotify
- **Medical Imaging:** Denoise MRI scans, detect anomalies
- **Fraud Detection:** Bank transactions (anomaly detection)
- **Foundation for VAEs:** Building blocks for latent-space models such as Stable Diffusion

Let's build one!

In [None]:
# Simple Autoencoder for MNIST Digits

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        """
        Simple Autoencoder
        
        Args:
            input_dim: Input size (28*28 = 784 for MNIST)
            latent_dim: Compressed representation size
        """
        super(Autoencoder, self).__init__()
        
        # ENCODER: 784 → 128 → 64 → 32
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, latent_dim),
            nn.ReLU()
        )
        
        # DECODER: 32 → 64 → 128 → 784
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
            nn.Sigmoid()  # Output between 0 and 1 (pixel values)
        )
        
    def forward(self, x):
        """
        Forward pass: Encode then Decode
        """
        # Encode
        latent = self.encoder(x)
        
        # Decode
        reconstructed = self.decoder(latent)
        
        return reconstructed, latent

# Create model
autoencoder = Autoencoder(input_dim=784, latent_dim=32).to(device)

print("✅ Autoencoder Created!")
print(f"\nArchitecture:")
print(autoencoder)
print(f"\n💡 Compression Ratio: 784 → 32 (24.5x compression!)")
print(f"   Like compressing a 784KB file to 32KB!")

# Count parameters
total_params = sum(p.numel() for p in autoencoder.parameters())
print(f"\nTotal Parameters: {total_params:,}")
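
The parameter total printed by the cell above can be checked by hand: each `nn.Linear(n_in, n_out)` layer holds `n_in * n_out` weights plus `n_out` biases. A small sketch of that arithmetic, with the layer sizes copied from the `Autoencoder` class (the helper names are just for illustration):

```python
# Layer sizes copied from the Autoencoder class.
encoder_sizes = [784, 128, 64, 32]
decoder_sizes = [32, 64, 128, 784]

def linear_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

def mlp_params(sizes):
    """Sum the parameters of consecutive fully connected layers."""
    return sum(linear_params(a, b) for a, b in zip(sizes, sizes[1:]))

total = mlp_params(encoder_sizes) + mlp_params(decoder_sizes)
print(f"encoder: {mlp_params(encoder_sizes):,}")
print(f"decoder: {mlp_params(decoder_sizes):,}")
print(f"total:   {total:,}")
```

Almost all of the parameters sit in the first encoder layer and the last decoder layer, the two that touch the 784-dimensional image.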

In [None]:
# Load MNIST Dataset

transform = transforms.Compose([
    transforms.ToTensor(),
])

# Download and load training data
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Create data loaders
batch_size = 128
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

print("✅ MNIST Dataset Loaded!")
print(f"\nTraining samples: {len(train_dataset):,}")
print(f"Test samples: {len(test_dataset):,}")
print(f"Batch size: {batch_size}")

# Visualize some examples
fig, axes = plt.subplots(2, 8, figsize=(16, 4))
fig.suptitle('📊 Sample MNIST Digits', fontsize=16, fontweight='bold')

for i in range(16):
    ax = axes[i // 8, i % 8]
    img, label = train_dataset[i]
    ax.imshow(img.squeeze(), cmap='gray')
    ax.set_title(f'Label: {label}')
    ax.axis('off')

plt.tight_layout()
plt.show()

print("\n💡 These are the digits our autoencoder will learn to compress and reconstruct!")

In [None]:
# Train the Autoencoder

def train_autoencoder(model, train_loader, epochs=5):
    """
    Train autoencoder to reconstruct images
    """
    # Loss function: How different is reconstruction from original?
    criterion = nn.MSELoss()  # Mean Squared Error
    
    # Optimizer
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    
    # Training loop
    model.train()
    losses = []
    
    for epoch in range(epochs):
        epoch_loss = 0
        
        for batch_idx, (images, _) in enumerate(train_loader):
            # Flatten images: (batch, 1, 28, 28) → (batch, 784)
            images = images.view(-1, 784).to(device)
            
            # Forward pass
            reconstructed, _ = model(images)
            
            # Calculate loss
            loss = criterion(reconstructed, images)
            
            # Backward pass
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            epoch_loss += loss.item()
        
        # Average loss for epoch
        avg_loss = epoch_loss / len(train_loader)
        losses.append(avg_loss)
        
        print(f"Epoch [{epoch+1}/{epochs}], Loss: {avg_loss:.4f}")
    
    return losses

print("🚀 Training Autoencoder...")
print("Goal: Learn to compress and reconstruct MNIST digits\n")

losses = train_autoencoder(autoencoder, train_loader, epochs=5)

# Plot training loss
plt.figure(figsize=(10, 5))
plt.plot(losses, marker='o', linewidth=2)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Reconstruction Loss', fontsize=12)
plt.title('📉 Autoencoder Training Loss', fontsize=14, fontweight='bold')
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

print("\n✅ Training Complete!")
print("💡 Lower loss = Better reconstruction")

In [None]:
# Visualize Reconstruction Results

def visualize_reconstruction(model, test_loader, n_samples=10):
    """
    Show original vs reconstructed images
    """
    model.eval()
    
    # Get a batch
    images, labels = next(iter(test_loader))
    images = images[:n_samples]
    labels = labels[:n_samples]
    
    # Flatten and reconstruct
    images_flat = images.view(-1, 784).to(device)
    with torch.no_grad():
        reconstructed, latent = model(images_flat)
    
    # Reshape for visualization
    images = images.cpu().numpy()
    reconstructed = reconstructed.view(-1, 1, 28, 28).cpu().numpy()
    
    # Plot
    fig, axes = plt.subplots(2, n_samples, figsize=(20, 4))
    fig.suptitle('🎨 Autoencoder: Original vs Reconstructed', fontsize=16, fontweight='bold')
    
    for i in range(n_samples):
        # Original
        axes[0, i].imshow(images[i].squeeze(), cmap='gray')
        axes[0, i].set_title(f'Original\n(Label: {labels[i]})', fontsize=10)
        axes[0, i].axis('off')
        
        # Reconstructed
        axes[1, i].imshow(reconstructed[i].squeeze(), cmap='gray')
        axes[1, i].set_title('Reconstructed', fontsize=10)
        axes[1, i].axis('off')
    
    plt.tight_layout()
    plt.show()
    
    # Print latent space info
    print(f"\n💡 Latent Space Analysis:")
    print(f"   Original: {28*28} pixels")
    print(f"   Compressed: {latent.shape[1]} numbers")
    print(f"   Compression: {(28*28/latent.shape[1]):.1f}x")
    print(f"\n   Latent vector for first image: {latent[0][:8].cpu().numpy()}...")

visualize_reconstruction(autoencoder, test_loader, n_samples=10)

print("\n🎯 Key Observations:")
print("  - Reconstructions look very similar to originals!")
print("  - We compressed 784 numbers → 32 numbers → 784 numbers")
print("  - The 32 numbers capture the 'essence' of the digit")
print("  - This is the foundation for generative models!")

## 🌌 Understanding Latent Space

**Latent Space = The compressed representation space**

### What is Latent Space?

**Concept:**
- The "hidden" representation learned by the encoder
- Low-dimensional space that captures key features
- Like a "coordinate system" for all possible images

**Example:**
```
Image of "5" → Encoder → [0.2, -0.5, 0.8, ...] ← Latent vector (32 numbers)
                              ↓
                    These 32 numbers encode:
                    - Curvature of the digit
                    - Thickness of lines
                    - Orientation
                    - Style
```

### 🎯 Why Latent Space Matters:

**1. Dimensionality Reduction:**
- 784 pixels → 32 numbers (much easier to work with!)
- Removes redundancy (neighboring pixels are correlated)

**2. Feature Learning:**
- Automatically discovers important features
- No manual feature engineering needed!

**3. Generation (The Key!):**
- Sample random points in latent space
- Decode them → New images!
- This is how we "generate" new data

**4. Interpolation:**
- Smoothly transition between images
- Morph a "3" into a "5"
- Used in DeepFakes, style transfer
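
Interpolation is simple once two latent vectors are available: walk along the straight line between them and decode each intermediate point. The sketch below covers only the latent-space part, in plain Python so the arithmetic is visible; the 4-D vectors are made-up stand-ins for encoder outputs, and `interpolate` is a hypothetical helper, not a function from this notebook.

```python
def interpolate(z_a, z_b, steps=5):
    """Return `steps` latent vectors evenly spaced from z_a to z_b (steps >= 2)."""
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # t runs from 0.0 to 1.0
        path.append([(1 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path

# Hypothetical 4-D latent codes for a "3" and a "5" (made-up numbers).
z_three = [0.0, 1.0, 2.0, 0.5]
z_five = [1.0, 0.0, 2.0, 1.5]

for z in interpolate(z_three, z_five, steps=5):
    print([round(v, 2) for v in z])
```

With a trained model, each vector on the path would be passed through the decoder to get the morphing sequence of images.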

### 🌟 In Modern AI (2024-2025):

**DALL-E / Stable Diffusion:**
- Latent space encodes "concepts" (not just pixels)
- "Cat" + "Astronaut" = points in latent space
- Decoder generates "astronaut cat" image

**ChatGPT:**
- Word embeddings are latent representations
- "King" - "Man" + "Woman" ≈ "Queen" (the classic word2vec example of vector arithmetic in embedding space)
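
The King/Queen arithmetic can be reproduced with hand-made vectors. The sketch below uses invented 2-D "embeddings" with the axes (royalty, gender); real embeddings are learned from text, have hundreds of dimensions, and only approximately satisfy this relation.

```python
import math

# Hand-made 2-D "embeddings": (royalty, gender). Purely illustrative values.
emb = {
    "king":   [1.0,  1.0],
    "queen":  [1.0, -1.0],
    "prince": [1.0,  0.8],
    "man":    [0.0,  1.0],
    "woman":  [0.0, -1.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# king - man + woman, computed component by component.
target = [k - m + w for k, m, w in zip(emb["king"], emb["man"], emb["woman"])]

# Nearest remaining word to the target, excluding the three query words.
candidates = {w: v for w, v in emb.items() if w not in {"king", "man", "woman"}}
best = max(candidates, key=lambda w: cosine(target, candidates[w]))
print(target, "->", best)
```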

Let's explore latent space!

In [None]:
# Visualize Latent Space (2D projection)

from sklearn.manifold import TSNE

def visualize_latent_space(model, test_loader, n_samples=1000):
    """
    Visualize latent space using t-SNE
    """
    model.eval()
    
    latent_vectors = []
    labels_list = []
    
    # Collect latent vectors
    with torch.no_grad():
        for images, labels in test_loader:
            images = images.view(-1, 784).to(device)
            _, latent = model(images)
            latent_vectors.append(latent.cpu().numpy())
            labels_list.append(labels.numpy())
            
            if sum(len(v) for v in latent_vectors) >= n_samples:
                break
    
    # Concatenate
    latent_vectors = np.concatenate(latent_vectors)[:n_samples]
    labels_list = np.concatenate(labels_list)[:n_samples]
    
    # Reduce to 2D using t-SNE
    print("🔄 Computing t-SNE (may take a moment...)")
    tsne = TSNE(n_components=2, random_state=42)
    latent_2d = tsne.fit_transform(latent_vectors)
    
    # Plot
    plt.figure(figsize=(12, 10))
    scatter = plt.scatter(latent_2d[:, 0], latent_2d[:, 1], 
                         c=labels_list, cmap='tab10', 
                         alpha=0.6, s=20)
    plt.colorbar(scatter, label='Digit')
    plt.xlabel('t-SNE Component 1', fontsize=12)
    plt.ylabel('t-SNE Component 2', fontsize=12)
    plt.title('🌌 Latent Space Visualization (32D → 2D)', fontsize=14, fontweight='bold')
    plt.grid(alpha=0.3)
    plt.tight_layout()
    plt.show()
    
    print("\n💡 Observations:")
    print("  - Similar digits cluster together (same color)")
    print("  - The model learned meaningful representations!")
    print("  - Each cluster roughly corresponds to one digit")
    print("  - t-SNE preserves local neighborhoods, so gaps between clusters are only a rough guide")

visualize_latent_space(autoencoder, test_loader, n_samples=1000)

## 🎲 Variational Autoencoders (VAEs)

**VAE = Autoencoder + Probability + Generation**

### Problem with Regular Autoencoders:

**Issue: Latent space is NOT continuous**
- Regular autoencoder: Latent vectors are scattered
- Random sampling from latent space → garbage output
- Example: Sample random [0.3, -0.2, 0.7, ...] → doesn't decode to a valid digit

**Why?**
- Only trained on REAL data points
- Gaps in latent space have no meaning
- Can't generate NEW samples reliably

### Solution: Variational Autoencoder (VAE)

**Key Idea: Force latent space to be continuous and smooth**

**How?**
1. **Encoder outputs distribution** (not just a point)
   - Regular: Encoder → single vector
   - VAE: Encoder → mean (μ) and variance (σ²)

2. **Sample from distribution**
   - z = μ + σ * ε (where ε ~ N(0,1))
   - Adds randomness during training

3. **Regularize latent space**
   - Force distributions to be similar to N(0,1)
   - Uses KL-Divergence loss
   - Result: Smooth, continuous latent space!
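
The sampling step is easy to check numerically. The sketch below is a plain-Python version of the reparameterization idea, assuming the encoder emits a log-variance per dimension (as the notebook's `VAE` class does); the `mu` and `logvar` values are invented. Drawing many samples should recover the mean and standard deviation that went in.

```python
import math
import random

random.seed(42)

def reparameterize(mu, logvar):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per dimension."""
    z = []
    for m, lv in zip(mu, logvar):
        sigma = math.exp(0.5 * lv)    # log-variance -> standard deviation
        eps = random.gauss(0.0, 1.0)  # the only source of randomness
        z.append(m + sigma * eps)
    return z

# Hypothetical encoder outputs for one image (2 latent dimensions).
mu = [2.0, -1.0]
logvar = [math.log(0.25), math.log(1.0)]  # variances 0.25 and 1.0

samples = [reparameterize(mu, logvar) for _ in range(20000)]
mean_0 = sum(s[0] for s in samples) / len(samples)
std_0 = math.sqrt(sum((s[0] - mean_0) ** 2 for s in samples) / len(samples))
print(f"dimension 0: mean ~ {mean_0:.2f}, std ~ {std_0:.2f}")
```

Because the randomness lives entirely in `eps`, the sample is a deterministic function of `mu` and `logvar`, which is what lets gradients flow back to the encoder during training.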

### Architecture:

```
Input Image
    ↓
  ENCODER
    ↓
  μ (mean) and σ² (variance)
    ↓
z = μ + σ * ε  ← Sampling!
    ↓
  DECODER
    ↓
Reconstructed Image
```

### Loss Function:

**Total Loss = Reconstruction Loss + KL Divergence**

1. **Reconstruction Loss:** How well can we reconstruct the input?
   - Same idea as the regular autoencoder: MSE, or binary cross-entropy (BCE) as used in the code below

2. **KL Divergence:** How different is our distribution from standard normal?
   - Regularizes latent space
   - Prevents overfitting to specific points
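
For a diagonal Gaussian compared against N(0,1), the KL term has a closed form, so it can be evaluated without any sampling. The sketch below writes that formula in plain Python with the sign folded in (0.5 * Σ(μ² + σ² - log σ² - 1), which equals the more common -0.5 * Σ(1 + log σ² - μ² - σ²)); the function name and inputs are illustrative.

```python
import math

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * sum(m ** 2 + math.exp(lv) - lv - 1 for m, lv in zip(mu, logvar))

kl_match = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])  # already N(0, 1)
kl_shift = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])  # mean moved by 1
kl_wide = kl_to_standard_normal([0.0], [math.log(4.0)])   # variance of 4

print(f"matching prior: {kl_match:.4f}")
print(f"shifted mean:   {kl_shift:.4f}")
print(f"wider variance: {kl_wide:.4f}")
```

The penalty is zero only when the encoder's distribution already matches the prior, and it grows as the mean drifts away from 0 or the variance away from 1.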

### 🎯 Why VAEs Matter:

**Generation:**
- Sample z ~ N(0,1)
- Decode(z) → NEW image!
- Smooth latent space → realistic outputs

**Interpolation:**
- Smooth transitions between images
- Morph one face into another

### 🌟 Modern Applications (2024-2025):

- **Stable Diffusion:** Uses VAE for image compression before diffusion
- **Music Generation:** Generate new melodies (MusicVAE)
- **Drug Discovery:** Generate molecular structures
- **Image Editing:** Smooth interpolation for transitions
- **Anomaly Detection:** Identify outliers in latent space

Let's implement a VAE!

In [None]:
# Variational Autoencoder Implementation

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        """
        Variational Autoencoder
        """
        super(VAE, self).__init__()
        
        # ENCODER
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
        )
        
        # Latent space parameters
        self.fc_mu = nn.Linear(64, latent_dim)      # Mean
        self.fc_logvar = nn.Linear(64, latent_dim)  # Log variance
        
        # DECODER
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 128),
            nn.ReLU(),
            nn.Linear(128, input_dim),
            nn.Sigmoid()
        )
        
    def encode(self, x):
        """
        Encode input to latent distribution parameters
        """
        h = self.encoder(x)
        mu = self.fc_mu(h)
        logvar = self.fc_logvar(h)
        return mu, logvar
    
    def reparameterize(self, mu, logvar):
        """
        Reparameterization trick: z = μ + σ * ε
        """
        std = torch.exp(0.5 * logvar)  # Standard deviation
        eps = torch.randn_like(std)    # Random noise from N(0,1)
        z = mu + std * eps             # Sample from N(μ, σ²)
        return z
    
    def decode(self, z):
        """
        Decode latent vector to reconstruction
        """
        return self.decoder(z)
    
    def forward(self, x):
        """
        Full forward pass
        """
        # Encode
        mu, logvar = self.encode(x)
        
        # Sample latent vector
        z = self.reparameterize(mu, logvar)
        
        # Decode
        reconstructed = self.decode(z)
        
        return reconstructed, mu, logvar

# Create VAE
vae = VAE(input_dim=784, latent_dim=32).to(device)

print("✅ Variational Autoencoder Created!")
print(f"\nArchitecture:")
print(vae)
print(f"\n💡 Key Difference from Regular Autoencoder:")
print("   - Encoder outputs μ (mean) and log σ² (log-variance)")
print("   - Sampling step: z = μ + σ * ε")
print("   - Enables generation from random sampling!")

In [None]:
# VAE Loss Function

def vae_loss(reconstructed, original, mu, logvar):
    """
    VAE loss = Reconstruction Loss + KL Divergence
    
    Args:
        reconstructed: Decoder output
        original: Original input
        mu: Mean from encoder
        logvar: Log variance from encoder
    """
    # Reconstruction loss (Binary Cross Entropy)
    BCE = F.binary_cross_entropy(reconstructed, original, reduction='sum')
    
    # KL Divergence: KL(N(μ,σ²) || N(0,1))
    # Formula: -0.5 * sum(1 + log(σ²) - μ² - σ²)
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    
    return BCE + KLD, BCE, KLD

print("✅ VAE Loss Function Defined")
print("\n📊 Loss Components:")
print("  1. Reconstruction Loss (BCE):")
print("     - Measures how well we reconstruct the input")
print("     - Lower = better reconstruction")
print("\n  2. KL Divergence (KLD):")
print("     - Measures distance from standard normal N(0,1)")
print("     - Regularizes latent space to be smooth")
print("     - Prevents overfitting")
print("\n  Total Loss = BCE + KLD")
print("  (Balance between reconstruction and regularization)")

In [None]:
# Train VAE

def train_vae(model, train_loader, epochs=5):
    """
    Train VAE
    """
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    model.train()
    
    losses = {'total': [], 'bce': [], 'kld': []}
    
    for epoch in range(epochs):
        epoch_loss = 0
        epoch_bce = 0
        epoch_kld = 0
        
        for batch_idx, (images, _) in enumerate(train_loader):
            images = images.view(-1, 784).to(device)
            
            # Forward pass
            reconstructed, mu, logvar = model(images)
            
            # Calculate loss
            loss, bce, kld = vae_loss(reconstructed, images, mu, logvar)
            
            # Backward pass
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            epoch_loss += loss.item()
            epoch_bce += bce.item()
            epoch_kld += kld.item()
        
        # Average losses
        avg_loss = epoch_loss / len(train_loader.dataset)
        avg_bce = epoch_bce / len(train_loader.dataset)
        avg_kld = epoch_kld / len(train_loader.dataset)
        
        losses['total'].append(avg_loss)
        losses['bce'].append(avg_bce)
        losses['kld'].append(avg_kld)
        
        print(f"Epoch [{epoch+1}/{epochs}]")
        print(f"  Total: {avg_loss:.4f} | BCE: {avg_bce:.4f} | KLD: {avg_kld:.4f}")
    
    return losses

print("🚀 Training VAE...")
print("Goal: Learn smooth latent space for generation\n")

vae_losses = train_vae(vae, train_loader, epochs=5)

# Plot losses
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Total loss
axes[0].plot(vae_losses['total'], marker='o', linewidth=2, color='purple')
axes[0].set_xlabel('Epoch', fontsize=12)
axes[0].set_ylabel('Loss', fontsize=12)
axes[0].set_title('📉 Total Loss', fontsize=13, fontweight='bold')
axes[0].grid(alpha=0.3)

# BCE
axes[1].plot(vae_losses['bce'], marker='s', linewidth=2, color='blue')
axes[1].set_xlabel('Epoch', fontsize=12)
axes[1].set_ylabel('Loss', fontsize=12)
axes[1].set_title('📉 Reconstruction Loss (BCE)', fontsize=13, fontweight='bold')
axes[1].grid(alpha=0.3)

# KLD
axes[2].plot(vae_losses['kld'], marker='^', linewidth=2, color='red')
axes[2].set_xlabel('Epoch', fontsize=12)
axes[2].set_ylabel('Loss', fontsize=12)
axes[2].set_title('📉 KL Divergence', fontsize=13, fontweight='bold')
axes[2].grid(alpha=0.3)

plt.tight_layout()
plt.show()

print("\n✅ VAE Training Complete!")

## 🌟 Real AI Example: Image Denoising and Reconstruction

**Task:** Remove noise from corrupted images using VAE

### Real-World Applications:

**Medical Imaging (2024-2025):**
- 🏥 MRI scans: Remove noise, enhance quality
- 🩻 X-rays: Denoise for better diagnosis
- 🧠 Brain scans: Clean up artifacts

**Photography:**
- 📸 Low-light enhancement (the problem Google Night Sight tackles)
- 🌙 Astrophotography: Remove sensor noise
- 📱 Smartphone cameras: Computational photography

**Satellite Imagery:**
- 🛰️ Weather prediction: Clean up atmospheric interference
- 🌍 Earth observation: Enhance resolution

**Video Restoration:**
- 🎬 Old film restoration
- 📺 Upscaling SD to HD/4K

Let's denoise images with our VAE!

In [None]:
# Image Denoising with VAE

def add_noise(images, noise_factor=0.5):
    """
    Add Gaussian noise to images
    """
    noisy = images + noise_factor * torch.randn_like(images)
    noisy = torch.clamp(noisy, 0., 1.)  # Keep values in [0, 1]
    return noisy

def denoise_images(model, test_loader, n_samples=10):
    """
    Demonstrate denoising with VAE
    """
    model.eval()
    
    # Get images
    images, labels = next(iter(test_loader))
    images = images[:n_samples]
    labels = labels[:n_samples]
    
    # Add noise
    noisy_images = add_noise(images, noise_factor=0.5)
    
    # Denoise
    noisy_flat = noisy_images.view(-1, 784).to(device)
    with torch.no_grad():
        denoised, _, _ = model(noisy_flat)
    
    # Convert to numpy
    original = images.cpu().numpy()
    noisy = noisy_images.cpu().numpy()
    denoised = denoised.view(-1, 1, 28, 28).cpu().numpy()
    
    # Visualize
    fig, axes = plt.subplots(3, n_samples, figsize=(20, 6))
    fig.suptitle('🎨 VAE Image Denoising: Real AI Application',
                 fontsize=16, fontweight='bold')
    
    for i in range(n_samples):
        # Original
        axes[0, i].imshow(original[i].squeeze(), cmap='gray')
        axes[0, i].set_title(f'Original\n(Label: {labels[i]})', fontsize=9)
        axes[0, i].axis('off')
        
        # Noisy
        axes[1, i].imshow(noisy[i].squeeze(), cmap='gray')
        axes[1, i].set_title('Noisy (50%)', fontsize=9)
        axes[1, i].axis('off')
        
        # Denoised
        axes[2, i].imshow(denoised[i].squeeze(), cmap='gray')
        axes[2, i].set_title('Denoised (VAE)', fontsize=9)
        axes[2, i].axis('off')
    
    plt.tight_layout()
    plt.show()

denoise_images(vae, test_loader, n_samples=10)

print("\n🎯 Real-World Impact:")
print("\n📱 Smartphone Photography:")
print("  - Google Pixel Night Sight: Low-light denoising built on computational photography")
print("  - iPhone: Computational photography with neural networks")
print("  - Result: Clear photos even in very low light")
print("\n🏥 Medical Imaging:")
print("  - MRI/CT scans: Reduce noise without losing detail")
print("  - CT: Denoising can allow lower radiation doses (safer for patients!)")
print("  - Better diagnosis from clearer images")
print("\n🎬 Video Production:")
print("  - Film restoration: Clean up old footage")
print("  - Streaming: Classic movies can be enhanced with AI denoising")
print("  - Video platforms: Noise reduction for creators")
print("\n💡 The same encode-then-reconstruct idea underlies learned denoisers in these areas!")

In [None]:
# Generate NEW Images from VAE

def generate_new_images(model, n_samples=16):
    """
    Generate completely new images by sampling from N(0,1)
    """
    model.eval()
    
    with torch.no_grad():
        # Sample from standard normal distribution (latent size read from the model)
        z = torch.randn(n_samples, model.fc_mu.out_features).to(device)
        
        # Decode to images
        generated = model.decode(z)
    
    # Visualize
    generated = generated.view(-1, 1, 28, 28).cpu().numpy()
    
    fig, axes = plt.subplots(4, 4, figsize=(10, 10))
    fig.suptitle('✨ Generated Images from Random Latent Vectors',
                 fontsize=16, fontweight='bold')
    
    for i in range(n_samples):
        ax = axes[i // 4, i % 4]
        ax.imshow(generated[i].squeeze(), cmap='gray')
        ax.set_title(f'Sample {i+1}', fontsize=10)
        ax.axis('off')
    
    plt.tight_layout()
    plt.show()
    
    print("\n🎨 What Just Happened?")
    print("  1. Sampled random vectors from N(0,1)")
    print("  2. Passed through decoder")
    print("  3. Generated COMPLETELY NEW digit-like images!")
    print("\n💡 This is the CORE of generative AI:")
    print("  - Random noise → Meaningful output")
    print("  - Same principle as DALL-E, Midjourney, Stable Diffusion!")

generate_new_images(vae, n_samples=16)

print("\n🌟 From VAE to Modern AI (2024-2025):")
print("\n📊 Evolution:")
print("  VAE (2013) → GANs (2014) → Diffusion Models (2020) → DALL-E 3 (2023)")
print("\n🎨 Stable Diffusion:")
print("  - Uses VAE for image compression (512x512 → 64x64 latent)")
print("  - Diffusion happens in latent space (faster!)")
print("  - VAE decoder: latent → final image")
print("\n✍️ ChatGPT/GPT-4:")
print("  - Text generation = sampling the next token from a learned distribution")
print("  - GPT models are decoder-only Transformers")
print("  - Same generative principle: model the data distribution, then sample!")
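
To see why running diffusion in latent space is cheaper, it helps to count the numbers involved. The sketch below assumes an RGB input (3 channels) and the 4-channel 64×64 latent described in this notebook; it is back-of-the-envelope arithmetic, not a measurement of any particular model.

```python
# Pixel space: 512 x 512 image with 3 colour channels (RGB assumed).
pixel_values = 512 * 512 * 3

# Latent space: 64 x 64 grid with 4 channels.
latent_values = 64 * 64 * 4

print(f"pixel values:  {pixel_values:,}")
print(f"latent values: {latent_values:,}")
print(f"reduction:     {pixel_values / latent_values:.0f}x")
```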

## 🎯 Interactive Exercises

Test your understanding of Generative AI!

### Exercise 1: Discriminative vs Generative

**Task:** Classify these AI systems as Discriminative or Generative

1. Gmail spam filter
2. DALL-E image generator
3. Face recognition (iPhone Face ID)
4. ChatGPT text generation
5. Credit card fraud detection
6. Midjourney art generator
7. Netflix recommendation system
8. Sora video generator

In [None]:
# YOUR ANSWERS HERE
# Example: {"system": "type"}

answers = {
    "Gmail spam filter": "?",
    "DALL-E": "?",
    "Face ID": "?",
    "ChatGPT": "?",
    "Fraud detection": "?",
    "Midjourney": "?",
    "Netflix recommendations": "?",
    "Sora": "?",
}

# Uncomment to see solution
# for system, answer in answers.items():
#     print(f"{system}: {answer}")

<details>
<summary>📖 Click here for solution</summary>

```python
answers = {
    "Gmail spam filter": "Discriminative (classifies spam vs not spam)",
    "DALL-E": "Generative (creates new images)",
    "Face ID": "Discriminative (verifies identity)",
    "ChatGPT": "Generative (generates text)",
    "Fraud detection": "Discriminative (classifies fraud vs legitimate)",
    "Midjourney": "Generative (creates art)",
    "Netflix recommendations": "Discriminative (predicts what you'll like)",
    "Sora": "Generative (creates videos)",
}
```

**Pattern:**
- Classification/Detection → Discriminative
- Creation/Generation → Generative
</details>

### Exercise 2: Understanding Latent Space

**Question:** Why is a smooth, continuous latent space important for VAEs?

**Think about:**
- What happens when you sample a random point?
- Why do we need KL Divergence loss?
- How does this enable generation?

<details>
<summary>📖 Click here for answer</summary>

**Why Smooth Latent Space Matters:**

1. **Generation Capability:**
   - Smooth space: ANY random point decodes to valid output
   - Gaps/holes: Random samples → garbage
   - Example: If latent space has gaps, sampling might give "non-digit" output

2. **Interpolation:**
   - Smooth transitions between points
   - Morph digit "3" to "5" smoothly
   - Used in face morphing, style transfer

3. **KL Divergence Role:**
   - Forces latent distributions to be similar to N(0,1)
   - Prevents scattered, disconnected clusters
   - Creates continuous manifold of valid samples

4. **Analogy:**
   - Regular autoencoder: Scattered islands in ocean
   - VAE: Continuous landmass (can walk anywhere)

**Real Impact:**
- Stable Diffusion: Smooth latent space allows smooth image variations
- Face generation: Realistic faces at ANY latent point
- Text: GPT models learn smooth language manifold
</details>

### Exercise 3: Modify the VAE

**Task:** Experiment with different latent dimensions

**Questions:**
1. What happens with latent_dim = 2? (very compressed)
2. What happens with latent_dim = 128? (less compressed)
3. Which gives better reconstruction?
4. Which is better for generation?

In [None]:
# YOUR CODE HERE
# Try different latent dimensions

# Example:
# vae_small = VAE(input_dim=784, latent_dim=2).to(device)
# train_vae(vae_small, train_loader, epochs=3)

# Compare reconstruction quality
# Visualize latent space (2D is easy to plot!)

# Your experiments...

<details>
<summary>📖 Click here for insights</summary>

**Latent Dimension Trade-offs:**

**latent_dim = 2:**
- ✅ Easy to visualize (2D plot)
- ✅ Very compressed
- ❌ Poor reconstruction (too much information loss)
- ❌ Limited generation diversity
- **Use:** Visualization, understanding structure

**latent_dim = 32 (our choice):**
- ✅ Good reconstruction quality
- ✅ Reasonable compression
- ✅ Good generation
- **Use:** Balanced performance

**latent_dim = 128:**
- ✅ Excellent reconstruction
- ✅ Captures fine details
- ❌ Less compression
- ❌ Harder to train (more parameters)
- **Use:** When quality > compression

**General Rule:**
- Larger latent dim = Better reconstruction, less compression
- Smaller latent dim = Worse reconstruction, more compression
- Sweet spot depends on application!

**Modern Models:**
- Stable Diffusion: 4-channel latent (64×64)
- DALL-E 2: Conditions on CLIP embeddings (size depends on the CLIP variant, e.g. 512-dim for ViT-B/32)
</details>

## 🎓 Key Takeaways

**You just learned:**

### 1. **Generative vs Discriminative AI**
   - ✅ Discriminative: Classify/predict (P(y|x))
   - ✅ Generative: Create new data (P(x))
   - ✅ Generative AI is behind DALL-E, ChatGPT, Sora
   - **Key insight:** Generation requires modeling data distribution

### 2. **Autoencoders**
   - ✅ Encoder: Compress input → latent representation
   - ✅ Decoder: Reconstruct from latent
   - ✅ Applications: Denoising, compression, feature learning
   - **Limitation:** Can't generate new samples reliably

### 3. **Variational Autoencoders (VAEs)**
   - ✅ Probabilistic latent space (μ, σ²)
   - ✅ Reparameterization trick: z = μ + σ * ε
   - ✅ KL Divergence: Regularizes latent space
   - ✅ Enables generation from random sampling!
   - **Key innovation:** Smooth, continuous latent space

### 4. **Real Applications (2024-2025)**
   - 🎨 **Image Generation:** Stable Diffusion uses VAE
   - 📸 **Denoising:** Smartphone cameras, medical imaging
   - 🎬 **Restoration:** Film enhancement, upscaling
   - 🧬 **Drug Discovery:** Generate molecular structures
   - **Impact:** Foundation for modern generative AI

### 🌟 Connections to Modern AI:

**How VAEs relate to 2024-2025 AI:**

1. **Stable Diffusion:**
   - VAE encoder: Image → latent space
   - Diffusion: Refine in latent space
   - VAE decoder: Latent → high-res image

2. **DALL-E (original, 2021):**
   - Uses a discrete VAE (closely related to VQ-VAE, the Vector Quantized VAE)
   - Compresses images to discrete tokens
   - Transformer generates tokens
   - Decoder: Tokens → image

3. **ChatGPT:**
   - Token embeddings play the role of a latent representation
   - Generation = sampling the next token, one step at a time
   - Built on a decoder-only Transformer (attention layers, no separate encoder)
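
The "compresses images to discrete tokens" step in the DALL-E item rests on vector quantization: each continuous encoder output is replaced by the index of its nearest entry in a codebook. The sketch below shows only that lookup, with a tiny made-up codebook; in a real model the codebook is learned and far larger, and `quantize` is a hypothetical helper.

```python
# Hypothetical codebook: each row is one code vector (made-up values).
codebook = [
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]

def quantize(vector):
    """Return the index of the codebook entry closest to `vector`."""
    def sq_dist(code):
        return sum((v - c) ** 2 for v, c in zip(vector, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))

# Continuous encoder outputs for three image patches (also made up).
encoder_outputs = [[0.1, -0.2], [0.9, 0.8], [0.2, 1.1]]
tokens = [quantize(v) for v in encoder_outputs]
print(tokens)
```

The resulting integer sequence is what a Transformer can model like text, and the decoder maps the looked-up code vectors back to pixels.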

### üìä Comparison:

| Feature | Autoencoder | VAE |
|---------|------------|-----|
| Latent space | Deterministic | Probabilistic |
| Generation | ❌ Poor | ✅ Good |
| Reconstruction | ✅ Excellent | ✅ Good |
| Training | Simple | More complex |
| Loss | Reconstruction only (MSE here) | Reconstruction (BCE here) + KL Div |

---

**🎉 Congratulations!** You now understand:
- How generative AI works at a fundamental level
- The building blocks of DALL-E, Stable Diffusion, and modern AI
- How to compress and generate data with neural networks

**Next:** We'll learn GANs - an even more powerful generative approach! 🚀

## 🚀 Next Steps

**Practice Exercises:**
1. Train VAE on different datasets (Fashion-MNIST, CIFAR-10)
2. Experiment with different latent dimensions (2, 8, 64, 128)
3. Implement conditional VAE (control what digit to generate)
4. Try interpolation between two images
5. Build a denoising autoencoder for audio

**Coming Next:**
- **Day 2:** GANs (Generative Adversarial Networks) - Generator vs Discriminator!
- **Day 3:** Advanced Models - Diffusion, StyleGAN, DALL-E concepts

---

**💡 Deep Dive Resources:**
- "Auto-Encoding Variational Bayes" (Kingma & Welling, 2013)
- "Tutorial on Variational Autoencoders" (Carl Doersch)
- Stanford CS231n Lecture on Generative Models
- Fast.ai: Deep Learning for Coders (Part 2)

---

*Remember: VAEs are the foundation of modern generative AI. Understanding them unlocks understanding DALL-E, Stable Diffusion, and more!* 🌟

**🎯 You now know how machines learn to create!**