# CMSC 178IP - Digital Image Processing
# Final Examination (Practical)

**Student Name:** _______________________

**Student Number:** _______________________

**Date:** _______________________

---

## Exam Information

| Item | Details |
|------|---------|
| **Total Points** | 100 (90 base + 10 bonus) |
| **Time Allocation** | 2-3 hours (self-paced) |
| **Deadline** | 1 week from release |
| **Format** | Jupyter Notebook + PDF export |

## Exam Structure (Ordered by Complexity)

| Part | Topic | Points | Difficulty |
|------|-------|--------|------------|
| Part 0 | Image Representation & Basics | 20 | ⭐ Easiest |
| Part 1 | Spatial Operations (Filtering, Edges, Thresholding) | 25 | ⭐⭐ Medium |
| Part 2 | CNN Architecture Analysis | 25 | ⭐⭐⭐ Medium-Hard |
| Part 3 | Generative Models | 20 | ⭐⭐⭐⭐ Hardest |
| Bonus | End-to-End Application | 10 | Applied |

## Instructions

1. **Complete all code cells** marked with `# TODO`
2. **Answer all analysis questions** in the designated markdown cells
3. **Complete all COMPARISON requirements** - try multiple approaches where asked
4. **Answer all REFLECTION questions** honestly - these help you learn!
5. **Run all cells** before submission (outputs must be visible)
6. **Export to PDF** and submit both `.ipynb` and `.pdf`
7. **Document LLM usage** including **what you learned** from each interaction
8. **Use YOUR student number as seed** for randomization (see below)
9. **Document your process** - what you tried, what failed, what you learned

## 🎲 Personalized Parameters (REQUIRED)

To ensure each student has unique values, use your **student number as a random seed**:

```python
MY_SEED = int("YOUR_STUDENT_NUMBER".replace("-", "")[-6:])  # Last 6 digits of your student number (dashes removed)
```

## Grading Breakdown

- **Implementation (40%):** Does your code work correctly?
- **Analysis (40%):** Do you understand WHY it works?
- **Reflection & Comparison (20%):** Did you explore alternatives and reflect on your learning?

---

## Setup and Imports

Run this cell first to import all required libraries.

In [None]:
# Standard imports
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

# Image processing
from skimage import data, color, filters, morphology, measure, exposure, transform
from skimage.util import random_noise
from scipy import ndimage
from scipy.signal import convolve2d

# Deep learning (for analysis - no training required)
try:
    import torch
    import torch.nn as nn
    TORCH_AVAILABLE = True
except ImportError:
    print("PyTorch not available - some cells will use numpy alternatives")
    TORCH_AVAILABLE = False

# Utilities
from sklearn.metrics import accuracy_score, confusion_matrix
import warnings
warnings.filterwarnings('ignore')

# Display settings
plt.rcParams['figure.figsize'] = [10, 6]
plt.rcParams['figure.dpi'] = 100

# ============================================================
# 🎲 PERSONALIZED SEED - REPLACE WITH YOUR STUDENT NUMBER!
# ============================================================
# Replace "123456" with the LAST 6 DIGITS of your student number
# Example: If your student number is 2020-12345, use "012345"

MY_SEED = int("123456")  # TODO: REPLACE WITH YOUR STUDENT NUMBER!

np.random.seed(MY_SEED)

# Generate your personalized parameters
MY_NOISE_VAR = 0.01 + (MY_SEED % 100) / 2000  # Noise variance: 0.0100-0.0595
MY_SP_AMOUNT = 0.03 + (MY_SEED % 50) / 1000   # S&P amount: 0.030-0.079
MY_LATENT_DIM = 16 + (MY_SEED % 32)           # Latent dim: 16-47

print("✅ All imports successful!")
print(f"NumPy version: {np.__version__}")
print(f"\n🎲 Your personalized parameters:")
print(f"   Noise variance: {MY_NOISE_VAR:.4f}")
print(f"   S&P amount: {MY_SP_AMOUNT:.4f}")
print(f"   Latent dimension: {MY_LATENT_DIM}")
print(f"\n⚠️  Did you replace MY_SEED with YOUR student number?")

In [None]:
# Helper function for displaying images
def show_images(images, titles=None, cmap='gray', figsize=(15, 5)):
    """Display multiple images in a row."""
    n = len(images)
    fig, axes = plt.subplots(1, n, figsize=figsize)
    if n == 1:
        axes = [axes]
    for i, (img, ax) in enumerate(zip(images, axes)):
        if img.ndim == 2:
            ax.imshow(img, cmap=cmap)
        else:
            ax.imshow(img)
        if titles:
            ax.set_title(titles[i])
        ax.axis('off')
    plt.tight_layout()
    plt.show()

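def calculate_psnr(original, processed):
    """Calculate Peak Signal-to-Noise Ratio (in dB)."""
    # Cast to float first: subtracting uint8 arrays wraps around instead of going negative
    original = np.asarray(original, dtype=np.float64)
    processed = np.asarray(processed, dtype=np.float64)
    mse = np.mean((original - processed) ** 2)
    if mse == 0:
        return float('inf')
    # Assume a [0, 1] range when the original never exceeds 1, otherwise [0, 255]
    max_pixel = 1.0 if original.max() <= 1 else 255.0
    return 20 * np.log10(max_pixel / np.sqrt(mse))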

print("Helper functions loaded!")
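
As a quick sanity check of the PSNR formula used in `calculate_psnr`: a constant error of 0.1 on a [0, 1] image gives an MSE of 0.01, so the PSNR should come out at about 20 dB. The sketch below is a standalone illustration on a synthetic array; the name `psnr_demo` and its `max_pixel` argument are assumptions made for this example, not part of the exam helpers.

```python
import numpy as np

def psnr_demo(original, processed, max_pixel=1.0):
    """PSNR in dB; max_pixel is the assumed peak intensity of the image."""
    diff = np.asarray(original, dtype=np.float64) - np.asarray(processed, dtype=np.float64)
    mse = np.mean(diff ** 2)
    return float('inf') if mse == 0 else 20 * np.log10(max_pixel / np.sqrt(mse))

clean = np.full((8, 8), 0.5)     # flat mid-gray test image
shifted = clean + 0.1            # every pixel is off by exactly 0.1
value = psnr_demo(clean, shifted)
print(f"{value:.2f} dB")
```

Identical inputs return infinity, which is why the helper checks for a zero MSE before dividing.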

---

# Part 0: Image Representation & Basics (20 points) ⭐

**Time Allocation:** 25-30 minutes | **Difficulty:** Easiest

This section covers fundamental concepts: how images are represented, color spaces, and basic properties.

## 0.1 Image Properties & Data Types (10 points)

Understanding how images are stored and represented is fundamental to image processing.
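
One practical consequence of the storage type is that arithmetic on `uint8` arrays wraps around modulo 256 rather than saturating or going negative. The following sketch uses made-up values (not the exam images) and assumes only NumPy; widening to `int16` is one possible workaround, not the only one.

```python
import numpy as np

a = np.array([200, 50], dtype=np.uint8)
b = np.array([100, 60], dtype=np.uint8)

wrapped_sum = a + b      # 200 + 100 = 300 does not fit in uint8 and wraps around
wrapped_diff = a - b     # 50 - 60 = -10 cannot be represented and wraps around
safe_diff = a.astype(np.int16) - b.astype(np.int16)  # widen first to keep the sign

print(wrapped_sum, wrapped_diff, safe_diff)
```

This is worth keeping in mind whenever two images are added or subtracted directly.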

In [None]:
# Load sample images
grayscale_img = data.camera()  # Grayscale image
color_img = data.astronaut()   # Color (RGB) image

# TODO: Explore image properties
print("=== Grayscale Image Properties ===")
print(f"Shape: {grayscale_img.shape}")
print(f"Data type: {grayscale_img.dtype}")
print(f"Min value: {grayscale_img.min()}")
print(f"Max value: {grayscale_img.max()}")
print(f"Total pixels: {grayscale_img.size}")

print("\n=== Color Image Properties ===")
print(f"Shape: {color_img.shape}")
print(f"Data type: {color_img.dtype}")
# TODO: Print min, max, and explain what the 3rd dimension represents

# Display both images
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
axes[0].imshow(grayscale_img, cmap='gray')
axes[0].set_title(f'Grayscale: {grayscale_img.shape}')
axes[0].axis('off')
axes[1].imshow(color_img)
axes[1].set_title(f'Color (RGB): {color_img.shape}')
axes[1].axis('off')
plt.tight_layout()
plt.show()

# TODO: Extract and display individual color channels
# red_channel = color_img[:, :, 0]
# green_channel = color_img[:, :, 1]
# blue_channel = color_img[:, :, 2]
# Display each channel as a grayscale image

In [None]:
# TODO: Color Space Conversions
# Convert the color image to different color spaces

# RGB to Grayscale
gray_from_rgb = color.rgb2gray(color_img)

# RGB to HSV (Hue, Saturation, Value)
hsv_img = color.rgb2hsv(color_img)

# TODO: Display the original and converted images
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Original RGB
axes[0, 0].imshow(color_img)
axes[0, 0].set_title('Original RGB')
axes[0, 0].axis('off')

# Grayscale
axes[0, 1].imshow(gray_from_rgb, cmap='gray')
axes[0, 1].set_title('Grayscale')
axes[0, 1].axis('off')

# HSV - Hue channel
axes[0, 2].imshow(hsv_img[:, :, 0], cmap='hsv')
axes[0, 2].set_title('Hue (H)')
axes[0, 2].axis('off')

# TODO: Display Saturation and Value channels
# axes[1, 0].imshow(hsv_img[:, :, 1], cmap='gray')
# axes[1, 0].set_title('Saturation (S)')

# axes[1, 1].imshow(hsv_img[:, :, 2], cmap='gray')
# axes[1, 1].set_title('Value (V)')

# TODO: What happens when you modify just the Hue?
# modified_hsv = hsv_img.copy()
# modified_hsv[:, :, 0] = (modified_hsv[:, :, 0] + 0.5) % 1.0  # Shift hue
# modified_rgb = color.hsv2rgb(modified_hsv)
# axes[1, 2].imshow(modified_rgb)
# axes[1, 2].set_title('Hue Shifted')

plt.tight_layout()
plt.show()

### Analysis 0.1 (5 points)

**Q1:** A grayscale image has shape `(512, 512)` and a color image has shape `(512, 512, 3)`. Explain what each dimension represents.

*Your answer:*


**Q2:** If an image has dtype `uint8` with values 0-255, and another has dtype `float64` with values 0.0-1.0, are they representing the same information? How would you convert between them?

*Your answer:*


**Q3:** Why is HSV color space useful for image processing tasks like object detection based on color? Give a specific example.

*Your answer:*

## 0.2 Histograms & Intensity Distribution (10 points)

Histograms show the distribution of pixel intensities and are essential for understanding image characteristics.
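
To make the idea concrete, here is a minimal sketch that counts intensities with `np.histogram` on a synthetic 8-bit array. The array, the seed, and the 100-140 intensity band are arbitrary choices for illustration and are unrelated to the exam images.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 8-bit "image" whose values all fall in a narrow mid-gray band
img = rng.integers(100, 141, size=(64, 64), dtype=np.uint8)

# One bin per intensity level: bin i counts the pixels with value i
counts, bin_edges = np.histogram(img, bins=256, range=(0, 256))
occupied = np.flatnonzero(counts)   # intensity levels that actually occur

print("pixels counted:", counts.sum())
print("occupied range:", occupied.min(), "to", occupied.max())
```

The counts always sum to the number of pixels, and the occupied range shows how much of the 0-255 scale the image actually uses.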

In [None]:
# Load images with different characteristics
dark_img = exposure.adjust_gamma(data.camera(), gamma=2.0)  # Darker
bright_img = exposure.adjust_gamma(data.camera(), gamma=0.5)  # Brighter
low_contrast = exposure.rescale_intensity(data.camera(), out_range=(80, 180))
normal_img = data.camera()

# TODO: Plot histograms for each image
fig, axes = plt.subplots(2, 4, figsize=(16, 8))

images = [dark_img, normal_img, bright_img, low_contrast]
titles = ['Dark Image', 'Normal Image', 'Bright Image', 'Low Contrast']

for i, (img, title) in enumerate(zip(images, titles)):
    # Display image
    axes[0, i].imshow(img, cmap='gray', vmin=0, vmax=255)
    axes[0, i].set_title(title)
    axes[0, i].axis('off')
    
    # TODO: Plot histogram
    # axes[1, i].hist(img.ravel(), bins=256, range=(0, 256), color='gray', alpha=0.7)
    # axes[1, i].set_xlabel('Pixel Value')
    # axes[1, i].set_ylabel('Frequency')
    # axes[1, i].set_xlim(0, 256)

plt.tight_layout()
plt.show()

# TODO: Calculate basic statistics for each image
for img, title in zip(images, titles):
    print(f"{title}: Mean={img.mean():.1f}, Std={img.std():.1f}, Min={img.min()}, Max={img.max()}")

### Analysis 0.2 (5 points)

**Q1:** By looking at a histogram, how can you tell if an image is: (a) too dark, (b) too bright, (c) low contrast?

*Your answer:*


**Q2:** What does the standard deviation of pixel values tell you about an image? What would a very low standard deviation indicate?

*Your answer:*


### Comparison Requirement

**Compare the histograms** of the four images above. For each, describe in 1-2 sentences what the histogram shape tells you about the image.

| Image | Histogram Description |
|-------|----------------------|
| Dark Image | |
| Normal Image | |
| Bright Image | |
| Low Contrast | |

---

## Part 0 Reflection (Required)

**What was the MOST intuitive concept in this section?**

*Your answer:*


**If you had to explain "color space" to a friend, what analogy would you use?**

*Your answer:*

## 0.3 Introduction to Kernels (5 points)

A **kernel** (or filter) is a small matrix that slides over an image to produce effects like blurring, sharpening, or edge detection. This is the foundation of convolution.
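
A simple way to see what a kernel does is to convolve it with a single bright pixel (an impulse): the output is a copy of the kernel centered on that pixel. The sketch below assumes SciPy's `convolve2d` with zero-filled borders and a tiny synthetic array.

```python
import numpy as np
from scipy.signal import convolve2d

impulse = np.zeros((5, 5))
impulse[2, 2] = 1.0               # single bright pixel in the center

box = np.ones((3, 3)) / 9.0       # 3x3 averaging kernel

# 'same' keeps the 5x5 size; 'fill' pads the border with zeros
response = convolve2d(impulse, box, mode='same', boundary='fill')
print(np.round(response, 3))
```

The bright pixel is spread evenly over its 3x3 neighborhood while the total intensity stays the same.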

In [None]:
# Exploring basic kernels - see what different small matrices do to an image
from scipy.signal import convolve2d

# Load a test image
test_img = data.camera() / 255.0

# Define some common kernels
kernels = {
    'Identity': np.array([[0, 0, 0],
                          [0, 1, 0],
                          [0, 0, 0]]),
    
    'Box Blur (3x3)': np.array([[1, 1, 1],
                                 [1, 1, 1],
                                 [1, 1, 1]]) / 9,
    
    'Sharpen': np.array([[0, -1, 0],
                         [-1, 5, -1],
                         [0, -1, 0]]),
    
    'Edge Detect (Horizontal)': np.array([[-1, -2, -1],
                                           [0, 0, 0],
                                           [1, 2, 1]]),
    
    'Edge Detect (Vertical)': np.array([[-1, 0, 1],
                                         [-2, 0, 2],
                                         [-1, 0, 1]])
}

# Apply each kernel and display results
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

# Original image
axes[0].imshow(test_img, cmap='gray')
axes[0].set_title('Original Image')
axes[0].axis('off')

# Apply kernels
for i, (name, kernel) in enumerate(kernels.items()):
    if i >= 5:
        break
    result = convolve2d(test_img, kernel, mode='same', boundary='symm')
    # Clip for display (edge detection can have negative values)
    axes[i+1].imshow(np.clip(result, 0, 1), cmap='gray')
    axes[i+1].set_title(f'{name}')
    axes[i+1].axis('off')

plt.tight_layout()
plt.show()

# TODO: Print one of the kernels to see its values
print("Example: Sharpen kernel:")
print(kernels['Sharpen'])
print("\nNotice how the center weight (5) outweighs the neighbors (which sum to -4), so the kernel sums to 1.")
print("This emphasizes the center pixel while subtracting surrounding values.")

### Analysis 0.3 (2 points)

**Q1:** Looking at the results above, describe what each kernel does to the image:
- **Box Blur:** 
- **Sharpen:** 
- **Edge Detect (Horizontal):** 

*Your answer:*


**Q2:** The sharpen kernel has a center value of 5 and surrounding values that sum to -4. Why do these values create a sharpening effect?

*Your answer:*


**Q3:** Why do edge detection kernels produce mostly dark images with bright lines where edges exist?

*Your answer:*

---

# Part 1: Spatial Operations (25 points) ⭐⭐

**Time Allocation:** 40-50 minutes | **Difficulty:** Medium

This section covers spatial filtering, edge detection, and basic segmentation.

## 1.1 Noise Filtering (8 points)

Different types of noise require different filtering approaches.
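
As a small illustration of how two filters treat an outlier, the sketch below applies a 3x3 mean and a 3x3 median to a flat patch containing one bright pixel. It uses `scipy.ndimage` rather than the `skimage.filters` calls in the exercise, and the patch values are made up.

```python
import numpy as np
from scipy import ndimage

patch = np.full((5, 5), 0.4)
patch[2, 2] = 1.0                 # one isolated bright outlier

# Both filters use a 3x3 window; 'nearest' repeats the border pixels
mean_filtered = ndimage.uniform_filter(patch, size=3, mode='nearest')
median_filtered = ndimage.median_filter(patch, size=3, mode='nearest')

print("center after mean:  ", round(mean_filtered[2, 2], 4))
print("center after median:", round(median_filtered[2, 2], 4))
```

Comparing the two center values against the background level of 0.4 is a useful starting point for the analysis questions.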

In [None]:
# Load test image and create noisy versions
original = data.camera() / 255.0

# Create noisy versions using YOUR personalized parameters
noisy_gaussian = random_noise(original, mode='gaussian', var=MY_NOISE_VAR)
noisy_sp = random_noise(original, mode='s&p', amount=MY_SP_AMOUNT)

print(f"🎲 Your personalized noise levels:")
print(f"   Gaussian variance: {MY_NOISE_VAR:.4f}")
print(f"   S&P amount: {MY_SP_AMOUNT:.4f}")

show_images([original, noisy_gaussian, noisy_sp], 
            ['Original', 'Gaussian Noise', 'Salt & Pepper Noise'])

# TODO: Apply appropriate filters for each noise type
# For Gaussian noise: try Gaussian blur or bilateral filter
filtered_gaussian = None  # TODO: filters.gaussian(noisy_gaussian, sigma=?)

# For Salt & Pepper: try median filter
filtered_sp = None  # TODO: filters.median(noisy_sp, morphology.disk(?))

# TODO: Display filtered results
# show_images([noisy_gaussian, filtered_gaussian, noisy_sp, filtered_sp],
#             ['Gaussian Noise', 'Filtered', 'S&P Noise', 'Filtered'],
#             figsize=(16, 4))

### Analysis 1.1 (3 points)

**Q1:** The two noisy images above contain Gaussian noise and salt-and-pepper (S&P) noise. How can you tell the two noise types apart visually?

*Your answer:*


**Q2:** Why does a median filter work better for S&P noise than a Gaussian blur?

*Your answer:*

## 1.2 Edge Detection (8 points)

Edges represent boundaries between regions of different intensities.
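
The sketch below shows directional gradients on a synthetic step edge. It is only an illustration: it uses `scipy.ndimage.sobel`, which is unnormalized, so its values differ in scale from `skimage.filters.sobel_h` and `sobel_v` used in the exercise.

```python
import numpy as np
from scipy import ndimage

step = np.zeros((6, 6))
step[:, 3:] = 1.0                         # dark left half, bright right half

grad_rows = ndimage.sobel(step, axis=0)   # intensity change from top to bottom
grad_cols = ndimage.sobel(step, axis=1)   # intensity change from left to right
magnitude = np.hypot(grad_rows, grad_cols)

print(np.abs(grad_rows).max(), np.abs(grad_cols).max())
print(magnitude[2])
```

Only the left-to-right gradient responds, because the image changes across columns and is constant down each column.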

In [None]:
# Edge Detection using Sobel and Canny operators
from skimage.feature import canny

edge_image = data.camera() / 255.0

# TODO: Apply Sobel edge detection
sobel_x = filters.sobel_h(edge_image)  # Horizontal edges
sobel_y = filters.sobel_v(edge_image)  # Vertical edges
sobel_magnitude = None  # TODO: Calculate magnitude = sqrt(sobel_x^2 + sobel_y^2)

# TODO: Apply Canny edge detection with different sigma values
canny_sigma1 = None  # TODO: canny(edge_image, sigma=1)
canny_sigma3 = None  # TODO: canny(edge_image, sigma=3)

# Display results
# show_images([edge_image, sobel_magnitude, canny_sigma1, canny_sigma3],
#             ['Original', 'Sobel Magnitude', 'Canny σ=1', 'Canny σ=3'],
#             figsize=(16, 4))

### Analysis 1.2 (3 points)

**Q1:** What is the difference between Sobel and Canny edge detection?

*Your answer:*


**Q2:** How does the sigma parameter in Canny affect the results?

*Your answer:*

## 1.3 Thresholding & Segmentation (9 points)

Thresholding converts grayscale images to binary, separating foreground from background.
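
A minimal sketch of global thresholding on a synthetic image follows. The image, the object size, and the threshold of 100 are arbitrary values chosen for illustration.

```python
import numpy as np

img = np.full((10, 10), 40, dtype=np.uint8)   # dark background
img[3:7, 3:7] = 200                           # bright 4x4 "object"

threshold = 100
binary = img > threshold                      # boolean mask: True = foreground

print("foreground pixels:", int(binary.sum()))
print("foreground fraction:", binary.mean())
```

The result is a boolean array, so summing it counts foreground pixels and averaging it gives the foreground fraction.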

In [None]:
# Load coins image for thresholding
coins = data.coins()

# TODO: Apply different thresholding methods

# 1. Manual threshold
threshold_manual = 100
binary_manual = coins > threshold_manual

# 2. Otsu's automatic threshold
threshold_otsu = None  # TODO: filters.threshold_otsu(coins)
binary_otsu = None  # TODO: coins > threshold_otsu

# 3. Adaptive (local) threshold
# TODO: filters.threshold_local(coins, block_size=35)
binary_adaptive = None

# Display results
# show_images([coins, binary_manual, binary_otsu, binary_adaptive],
#             ['Original', f'Manual (t={threshold_manual})', 
#              f'Otsu (t={threshold_otsu})', 'Adaptive'],
#             figsize=(16, 4))

# Try different manual thresholds for comparison
for t in [80, 100, 120, 140]:
    binary = coins > t
    # Count approximate number of "coin pixels"
    # print(f"Threshold {t}: {np.sum(binary)} foreground pixels")

### Analysis 1.3 (4 points)

**Q1:** Why does Otsu's method select the threshold it does? What is it optimizing?

*Your answer:*


**Q2:** When would adaptive thresholding be preferred over Otsu's?

*Your answer:*


### Comparison Requirement

Try **3 different manual thresholds**. Document your observations:

| Threshold | Effect on Coins | Effect on Background |
|-----------|-----------------|---------------------|
| 80 | | |
| 100 | | |
| 120 | | |

---

## Part 1 Reflection (Required)

**Which technique (filtering, edge detection, or thresholding) was EASIEST to understand?**

*Your answer:*


**Which was HARDEST? Why?**

*Your answer:*

---

# Part 2: CNN Architecture Analysis (25 points) ⭐⭐⭐

**Time Allocation:** 40-50 minutes | **Difficulty:** Medium-Hard

This section tests your understanding of Convolutional Neural Networks.

## 2.1 Convolution Operation (10 points)

Implement 2D convolution from scratch to understand how CNNs work.
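
Before working in 2D, it can help to compare convolution and cross-correlation in 1D, where NumPy provides both. The sketch below uses a deliberately asymmetric kernel so the two results differ; the signal and kernel values are arbitrary.

```python
import numpy as np

signal = np.array([1.0, 2.0, 3.0])
kernel = np.array([0.0, 1.0, 0.5])     # deliberately asymmetric

conv = np.convolve(signal, kernel, mode='full')
corr = np.correlate(signal, kernel, mode='full')

print("convolution:      ", conv)
print("cross-correlation:", corr)
# Correlating with a kernel matches convolving with the reversed kernel
print(np.allclose(corr, np.convolve(signal, kernel[::-1], mode='full')))
```

With a symmetric kernel the two operations would give identical results, which is why the distinction is easy to overlook.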

In [None]:
def convolve2d_manual(image, kernel):
    """
    Implement 2D convolution from scratch.
    
    This is how CNN convolution layers work internally!
    
    Parameters:
    -----------
    image : 2D numpy array
        Input grayscale image
    kernel : 2D numpy array
        Convolution kernel (filter)
    
    Returns:
    --------
    output : 2D numpy array
        Convolved image (same size as input)
    
    Steps:
    1. Flip the kernel (required for true convolution)
    2. Pad the image with zeros
    3. Slide kernel over padded image
    4. At each position: element-wise multiply and sum
    """
    # Get dimensions
    img_h, img_w = image.shape
    k_h, k_w = kernel.shape
    
    # TODO: Step 1 - Flip the kernel (180 degree rotation)
    # Hint: np.flip(kernel) or kernel[::-1, ::-1]
    flipped_kernel = None  # TODO: Flip the kernel
    
    # TODO: Step 2 - Calculate padding needed for 'same' output size
    # For 'same' convolution with an odd kernel size, pad by (kernel_size - 1) // 2 on each side
    pad_h = k_h // 2
    pad_w = k_w // 2
    
    # TODO: Step 3 - Zero-pad the image
    # Hint: np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)), mode='constant')
    padded = None  # TODO: Pad the image
    
    # TODO: Step 4 - Create output array
    output = np.zeros((img_h, img_w))
    
    # TODO: Step 5 - Perform convolution
    # For each output pixel position (i, j):
    #   - Extract the region from padded image
    #   - Multiply element-wise with flipped kernel
    #   - Sum all values
    
    # for i in range(img_h):
    #     for j in range(img_w):
    #         region = padded[i:i+k_h, j:j+k_w]
    #         output[i, j] = np.sum(region * flipped_kernel)
    
    return output

# Test your implementation
test_image = data.camera()[:64, :64] / 255.0  # Small test image
test_kernel = np.array([[1, 0, -1],
                        [2, 0, -2],
                        [1, 0, -1]])  # Sobel-x kernel

# Your implementation
# my_result = convolve2d_manual(test_image, test_kernel)

# Compare with scipy (ground truth)
from scipy.signal import convolve2d as scipy_convolve2d
scipy_result = scipy_convolve2d(test_image, test_kernel, mode='same', boundary='fill')

# Validation
# diff = np.abs(my_result - scipy_result).max()
# print(f"Maximum difference from scipy: {diff:.6f}")
# print("✅ Passed!" if diff < 1e-5 else "❌ Check your implementation")

# Display results
# fig, axes = plt.subplots(1, 3, figsize=(12, 4))
# axes[0].imshow(test_image, cmap='gray')
# axes[0].set_title('Original')
# axes[0].axis('off')
# axes[1].imshow(my_result, cmap='gray')
# axes[1].set_title('Your Convolution')
# axes[1].axis('off')
# axes[2].imshow(scipy_result, cmap='gray')
# axes[2].set_title('SciPy Reference')
# axes[2].axis('off')
# plt.tight_layout()
# plt.show()

### Analysis 2.1 (4 points)

**Q1:** Why does convolution require flipping the kernel? What would happen if we didn't flip it?

*Your answer:*


**Q2:** Explain why zero-padding is needed for "same" convolution. What happens to the output size without padding?

*Your answer:*


**Q3:** The nested for-loop approach above is slow. In a real CNN, how is convolution made efficient?

*Your answer:*


## 2.2 CNN Parameter Calculation (8 points)

Understanding how parameters are calculated is crucial for designing efficient networks.
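
The usual formulas can be wrapped in small helpers, sketched below. The function names are hypothetical, and the stride and padding defaults (stride 1, no padding) are assumptions, since a network description does not always state them. The example layer is a made-up one, not the network in this exam.

```python
def conv2d_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

def conv2d_params(kernel_h, kernel_w, in_channels, num_filters):
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * num_filters

def dense_params(in_features, out_features):
    """Weights plus one bias per output unit."""
    return (in_features + 1) * out_features

# Hypothetical layers: 28x28x1 input, six 5x5 filters, no padding, then a 120 -> 84 dense layer
out = conv2d_output_size(28, kernel=5)
print(out, conv2d_params(5, 5, 1, 6), dense_params(120, 84))
```

Note that the convolution's parameter count does not depend on the image size, while the dense layer's count depends directly on the number of input features.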

In [None]:
# CNN Architecture: Calculate parameters and output shapes
# Consider this simple CNN for CIFAR-10 (32x32x3 input, 10 classes)

"""
Layer 1: Conv2D(filters=32, kernel_size=3x3, input_channels=3)
Layer 2: MaxPool2D(pool_size=2x2)
Layer 3: Conv2D(filters=64, kernel_size=3x3)
Layer 4: MaxPool2D(pool_size=2x2)
Layer 5: Flatten
Layer 6: Dense(128)
Layer 7: Dense(10)  # Output layer
"""

# TODO: Calculate output shape after each layer
# Input: 32x32x3

print("=== Output Shapes ===")
print("Input:                32 × 32 × 3")
print("After Conv2D(32,3×3): ___ × ___ × ___")  # TODO
print("After MaxPool(2×2):   ___ × ___ × ___")  # TODO
print("After Conv2D(64,3×3): ___ × ___ × ___")  # TODO
print("After MaxPool(2×2):   ___ × ___ × ___")  # TODO
print("After Flatten:        ___")              # TODO
print("After Dense(128):     ___")              # TODO
print("After Dense(10):      ___")              # TODO

# TODO: Calculate parameters for each layer
# Formula for Conv2D: (kernel_h × kernel_w × input_channels + 1) × num_filters
# Formula for Dense: (input_features + 1) × output_features

print("\n=== Parameters ===")
print("Conv2D(32, 3×3, in=3):  (3 × 3 × 3 + 1) × 32 = ___")  # TODO
print("Conv2D(64, 3×3, in=32): (3 × 3 × ___ + 1) × 64 = ___")  # TODO
print("Dense(128):             (___ + 1) × 128 = ___")  # TODO (after flatten)
print("Dense(10):              (128 + 1) × 10 = ___")  # TODO
print("\nTotal parameters: ___")  # TODO

### Analysis 2.2 (3 points)

**Q1:** Why do Conv2D layers have far fewer parameters than Dense layers, even though they process the entire image?

*Your answer:*


**Q2:** What is the purpose of MaxPooling? What would happen if we removed all pooling layers?

*Your answer:*


**Q3:** If we doubled the number of filters in each Conv2D layer, how would that affect the total parameters?

*Your answer:*


## 2.3 Feature Maps (7 points)

Visualize what different convolutional filters detect in an image.

In [None]:
# Visualize feature maps from different filters
from scipy.signal import convolve2d

# Load test image
feature_img = data.camera() / 255.0

# Define filters that detect different features
feature_filters = {
    'Horizontal Edges': np.array([[-1, -2, -1],
                                   [0, 0, 0],
                                   [1, 2, 1]]),
    
    'Vertical Edges': np.array([[-1, 0, 1],
                                 [-2, 0, 2],
                                 [-1, 0, 1]]),
    
    'Diagonal (/)': np.array([[0, 1, 2],
                               [-1, 0, 1],
                               [-2, -1, 0]]),
    
    'Diagonal (\\)': np.array([[2, 1, 0],
                                [1, 0, -1],
                                [0, -1, -2]]),
    
    'Laplacian (Blob)': np.array([[0, 1, 0],
                                   [1, -4, 1],
                                   [0, 1, 0]])
}

# Apply each filter and display feature maps
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.flatten()

# Original
axes[0].imshow(feature_img, cmap='gray')
axes[0].set_title('Original Image')
axes[0].axis('off')

# Feature maps
for i, (name, kernel) in enumerate(feature_filters.items()):
    feature_map = convolve2d(feature_img, kernel, mode='same', boundary='symm')
    # Take absolute value for visualization
    axes[i+1].imshow(np.abs(feature_map), cmap='hot')
    axes[i+1].set_title(f'Feature: {name}')
    axes[i+1].axis('off')

plt.tight_layout()
plt.show()

# TODO: Observe which parts of the image activate each filter
print("Observe: Different filters 'light up' for different image features!")
print("- Horizontal edges filter responds to the tripod legs")
print("- Vertical edges filter responds to the camera body edges")
print("- This is how early CNN layers learn to detect basic features!")

### Analysis 2.3 (4 points)

**Q1:** Looking at the feature maps above, which filter responds most strongly to the camera's tripod? Why?

*Your answer:*


**Q2:** In a CNN, early layers learn filters similar to what we see above (edges, blobs). What do DEEPER layers learn to detect?

*Your answer:*


**Q3:** Why is the concept of "hierarchical feature learning" important for image classification?

*Your answer:*

---

## Part 2 Reflection (Required)

**What was the MOST challenging concept in this section (convolution, parameters, or feature maps)?**

*Your answer:*


**In your own words, explain why CNNs are better than fully-connected networks for image classification.**

*Your answer:*


**If you were designing a CNN to recognize faces, what kinds of features would you expect early vs. late layers to detect?**

*Your answer:*

---

# Part 3: Generative Models (20 points) ⭐⭐⭐⭐

**Time Allocation:** 35-45 minutes | **Difficulty:** Hardest

This section tests your understanding of autoencoders, VAEs, and GANs.

## 3.1 Autoencoder Architecture (12 points)

Design and analyze an autoencoder architecture for MNIST digit reconstruction.

In [None]:
# TODO: Define autoencoder architecture in pseudocode or PyTorch-like notation
# Input: 28x28 = 784 pixels (flattened)
# Goal: Compress to YOUR personalized latent space dimension, then reconstruct

print(f"🎲 Your personalized latent dimension: {MY_LATENT_DIM}")
print(f"   Design your autoencoder to compress to {MY_LATENT_DIM} dimensions\n")

# Example structure (complete this with YOUR latent dimension):
"""
Encoder:
    Input(784) 
    → Dense(?) + ReLU
    → Dense(?) + ReLU  
    → Dense(MY_LATENT_DIM)  # Your personalized latent space

Decoder:
    Input(MY_LATENT_DIM)  # From latent space
    → Dense(?) + ReLU
    → Dense(?) + ReLU
    → Dense(784) + Sigmoid  # Reconstruct image
"""

# TODO: Calculate total parameters for YOUR architecture
# The latent dimension affects your parameter count!
# Write your calculation below:

print("Encoder parameters:")
print(f"  Layer 1: (784 + 1) × ___ = ___")
print(f"  Layer 2: (___ + 1) × ___ = ___")
print(f"  Layer 3: (___ + 1) × {MY_LATENT_DIM} = ___")
# TODO: Complete

print("\nDecoder parameters:")
print(f"  Layer 1: ({MY_LATENT_DIM} + 1) × ___ = ___")
print(f"  Layer 2: (___ + 1) × ___ = ___")
print(f"  Layer 3: (___ + 1) × 784 = ___")
# TODO: Complete

print("\nTotal parameters:")
# TODO: Calculate

### Analysis 3.1 (3 points)

**Q1:** What is the purpose of the "bottleneck" (latent space) in an autoencoder? What happens if it's too small or too large?

*Your answer:*


**Q2:** An autoencoder trained on digit images achieves good reconstruction loss, but when you sample random points in the latent space, the decoded images look like noise. Why does this happen?

*Your answer:*


## 3.2 VAE Loss Function (9 points)

Implement and understand the VAE loss function components.
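
The closed-form KL term between a diagonal Gaussian and the standard normal prior can be checked on values where the answer is easy to reason about. The sketch below is one possible formulation; the function name and the choice to sum over the last axis are assumptions for this example, and your `vae_loss` may reduce over the batch differently.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, exp(log_var)) || N(0, 1)), summed over the last axis."""
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=-1)

zeros = np.zeros(4)
kl_same = kl_to_standard_normal(zeros, zeros)          # posterior equals the prior
kl_shifted = kl_to_standard_normal(np.ones(4), zeros)  # mean moved away from 0 in 4 dims

print(kl_same, kl_shifted)
```

The divergence vanishes when the encoder output matches the prior and grows as the mean moves away from zero.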

In [None]:
def vae_loss(x_original, x_reconstructed, mu, log_var, beta=1.0):
    """
    Calculate VAE loss = Reconstruction Loss + β * KL Divergence
    
    Parameters:
    -----------
    x_original : array
        Original input images (batch_size, 784)
    x_reconstructed : array
        Reconstructed images from decoder (batch_size, 784)
    mu : array
        Mean of latent distribution (batch_size, latent_dim)
    log_var : array
        Log variance of latent distribution (batch_size, latent_dim)
    beta : float
        Weight for KL divergence term
    
    Returns:
    --------
    total_loss, reconstruction_loss, kl_loss
    """
    # TODO: Implement reconstruction loss (MSE or BCE)
    reconstruction_loss = None  # TODO: Mean squared error between original and reconstructed
    
    # TODO: Implement KL divergence
    # KL(q(z|x) || p(z)) = -0.5 * sum(1 + log_var - mu^2 - exp(log_var))
    kl_loss = None  # TODO
    
    total_loss = reconstruction_loss + beta * kl_loss
    
    return total_loss, reconstruction_loss, kl_loss

# Test with dummy data
np.random.seed(42)
batch_size = 32
latent_dim = 16

x_orig = np.random.rand(batch_size, 784)
x_recon = x_orig + np.random.randn(batch_size, 784) * 0.1  # Noisy reconstruction
mu = np.random.randn(batch_size, latent_dim) * 0.5
log_var = np.random.randn(batch_size, latent_dim) * 0.5

# total, recon, kl = vae_loss(x_orig, x_recon, mu, log_var)
# print(f"Total Loss: {total:.4f}")
# print(f"Reconstruction Loss: {recon:.4f}")
# print(f"KL Divergence: {kl:.4f}")

### Analysis 3.2 (3 points)

**Q1:** What is the purpose of the KL divergence term in the VAE loss? What distribution are we encouraging the latent space to match?

*Your answer:*


**Q2:** In β-VAE, we use β > 1 to weight the KL term more heavily. What is the trade-off when increasing β?

*Your answer:*


## 3.3 GAN Training Dynamics (9 points)

Analyze GAN training behavior and common failure modes.
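
When reading loss curves, it helps to know what value a discriminator loss takes in simple reference cases. The sketch below assumes the common binary cross-entropy formulation, which the simulated curves in this section do not specify; the `bce` helper and the discriminator outputs are hypothetical.

```python
import numpy as np

def bce(pred, target):
    """Mean binary cross-entropy for predicted probabilities."""
    pred = np.clip(pred, 1e-12, 1 - 1e-12)   # avoid log(0)
    return float(np.mean(-(target * np.log(pred) + (1 - target) * np.log(1 - pred))))

# Hypothetical discriminator outputs on 4 real (label 1) and 4 fake (label 0) samples
targets = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
undecided = np.full(8, 0.5)                      # cannot tell real from fake
confident = np.array([0.99] * 4 + [0.01] * 4)    # separates them easily

loss_undecided = bce(undecided, targets)
loss_confident = bce(confident, targets)
print(loss_undecided, loss_confident)
```

Under this assumption, a discriminator that outputs 0.5 everywhere has a mean loss of ln 2, while a confident and correct discriminator has a loss close to zero.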

In [None]:
# Simulated GAN training curves (DO NOT MODIFY)
np.random.seed(42)
epochs = 100

# Scenario A: Healthy training
d_loss_healthy = 0.7 - 0.2 * (1 - np.exp(-np.arange(epochs)/30)) + 0.05 * np.random.randn(epochs)
g_loss_healthy = 2.0 - 1.3 * (1 - np.exp(-np.arange(epochs)/40)) + 0.08 * np.random.randn(epochs)

# Scenario B: Mode collapse  
d_loss_collapse = np.concatenate([0.7 - 0.3 * np.arange(30)/30, np.ones(70) * 0.1 + 0.02 * np.random.randn(70)])
g_loss_collapse = np.concatenate([2.0 - 0.5 * np.arange(30)/30, np.ones(70) * 0.3 + 0.05 * np.random.randn(70)])

# Scenario C: Discriminator too strong
d_loss_strong_d = 0.7 * np.exp(-np.arange(epochs)/10) + 0.02 * np.random.randn(epochs)
g_loss_strong_d = 2.0 + 0.5 * np.log(1 + np.arange(epochs)/20) + 0.1 * np.random.randn(epochs)

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

axes[0].plot(d_loss_healthy, 'b-', label='D Loss')
axes[0].plot(g_loss_healthy, 'r-', label='G Loss')
axes[0].set_title('Scenario A')
axes[0].legend()
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Loss')

axes[1].plot(d_loss_collapse, 'b-', label='D Loss')
axes[1].plot(g_loss_collapse, 'r-', label='G Loss')
axes[1].set_title('Scenario B')
axes[1].legend()
axes[1].set_xlabel('Epoch')

axes[2].plot(d_loss_strong_d, 'b-', label='D Loss')
axes[2].plot(g_loss_strong_d, 'r-', label='G Loss')
axes[2].set_title('Scenario C')
axes[2].legend()
axes[2].set_xlabel('Epoch')

plt.tight_layout()
plt.show()

### Analysis 3.3 (6 points)

**Q1:** For each scenario (A, B, C), describe what is happening during training and whether it represents healthy or problematic training.

*Scenario A:*

*Scenario B:*

*Scenario C:*


**Q2:** What is "mode collapse" in GANs? Why does it occur, and name ONE technique to mitigate it.

*Your answer:*


**Q3:** Compare VAEs and GANs: which typically produces sharper images, and why?

*Your answer:*


### Comparison Table (Required)

Fill in this comparison table based on your understanding:

| Aspect | Autoencoder | VAE | GAN |
|--------|-------------|-----|-----|
| Training stability | | | |
| Image sharpness | | | |
| Can generate new samples? | | | |
| Main loss function | | | |

---

## Part 3 Reflection (Required)

**Before this exam, what did you think "generative models" meant?**

*Your answer:*


**What is the MOST SURPRISING thing you learned about generative models?**

*Your answer:*


**If you had to explain VAE vs GAN to a non-technical friend, how would you describe the difference in ONE sentence each?**

*VAE:*

*GAN:*

---

# Bonus: End-to-End Application (10 points)

**Time Allocation:** 30-45 minutes

Design and implement a complete image processing pipeline.

## Scenario: Document Enhancement Pipeline

You receive a scanned document with multiple degradations:
- Uneven illumination
- Noise
- Low contrast
- Slight blur

Your task is to design a restoration pipeline that produces a clean, readable document.

In [None]:
# Create a degraded document image
def create_degraded_document():
    """Create a synthetic degraded document.

    Returns a (degraded, clean) pair of float images in [0, 1], each 300x400.
    """
    np.random.seed(42)
    
    # Start with white background
    doc = np.ones((300, 400)) * 0.95
    
    # Add text-like lines
    for row in range(40, 260, 20):
        line_length = np.random.randint(100, 350)
        start_col = np.random.randint(20, 50)
        doc[row:row+6, start_col:start_col+line_length] = 0.1
    
    # Add title
    doc[15:28, 80:320] = 0.05
    
    # Apply degradations
    # 1. Uneven illumination
    x, y = np.meshgrid(np.linspace(0, 1, 400), np.linspace(0, 1, 300))
    illumination = 0.6 + 0.4 * np.sin(x * np.pi) * (0.7 + 0.3 * y)
    degraded = doc * illumination
    
    # 2. Add noise
    degraded = degraded + np.random.randn(*degraded.shape) * 0.05
    
    # 3. Blur
    degraded = filters.gaussian(degraded, sigma=1.2)
    
    # 4. Reduce contrast
    degraded = exposure.rescale_intensity(degraded, out_range=(0.2, 0.8))
    
    return np.clip(degraded, 0, 1), np.clip(doc, 0, 1)

degraded_doc, clean_doc = create_degraded_document()
show_images([degraded_doc, clean_doc], ['Degraded Document', 'Original (Target)'])

In [None]:
# TODO: Design and implement your restoration pipeline

def restore_document(degraded):
    """
    Restore a degraded document image.
    
    Your pipeline should address:
    1. Uneven illumination
    2. Noise
    3. Low contrast
    4. Blur
    
    Consider the ORDER of operations carefully!
    """
    result = degraded.copy()
    
    # TODO: Step 1 - Address illumination
    # Hint: Consider using morphological operations or background estimation
    
    # TODO: Step 2 - Denoise
    # Hint: What type of noise is present?
    
    # TODO: Step 3 - Enhance contrast
    # Hint: Consider histogram equalization or contrast stretching
    
    # TODO: Step 4 - Sharpen (if needed)
    # Hint: Be careful not to amplify noise
    
    return result

# Apply your pipeline
# restored = restore_document(degraded_doc)

# Show results
# show_images([degraded_doc, restored, clean_doc],
#             ['Degraded', 'Your Restoration', 'Target'])

# Calculate PSNR
# psnr = calculate_psnr(clean_doc, restored)
# print(f"PSNR: {psnr:.2f} dB")
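
For reference when interpreting the PSNR value, the following sketch shows the standard definition for images scaled to [0, 1]. It assumes the notebook's own `calculate_psnr` follows this definition, which may not hold exactly; `psnr_sketch` and `data_range` are hypothetical names used only for illustration.

```python
import numpy as np

def psnr_sketch(reference, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB (sketch, standard definition)."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    # Mean squared error between the two images
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    # data_range is the maximum possible pixel value (1.0 for float images)
    return 10.0 * np.log10(data_range ** 2 / mse)
```

Higher values mean the restored image is closer to the target; because PSNR is logarithmic, a gain of a few dB corresponds to a substantial drop in mean squared error.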

### Bonus Analysis (4 points)

**Q1:** Explain why you chose the ORDER of operations in your pipeline. Why not a different order?

*Your answer:*


**Q2:** Which degradation is the most challenging to correct? Why?

*Your answer:*


**Q3:** How would you adapt your pipeline if this were a color document instead of grayscale?

*Your answer:*


### Comparison Requirement (2 points)

**Try TWO different orderings** of your pipeline operations. Document the results:

**Ordering A:** (your main approach)
- Step 1: _____
- Step 2: _____
- Step 3: _____
- Step 4: _____
- PSNR achieved: _____

**Ordering B:** (alternative approach)
- Step 1: _____
- Step 2: _____
- Step 3: _____
- Step 4: _____
- PSNR achieved: _____
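
One way to organize the two orderings is to treat each step as a function and pass the steps as a list, so that reordering means reordering the list. This is a sketch only: `run_pipeline`, `stretch`, and `gamma` are hypothetical names, and the two steps are simple placeholders chosen to show that order matters, not suggested restoration steps.

```python
import numpy as np

def run_pipeline(image, steps):
    """Apply (name, function) steps in the given order (hypothetical helper)."""
    result = np.asarray(image, dtype=float).copy()
    for name, step in steps:
        result = step(result)
    return np.clip(result, 0.0, 1.0)

# Placeholder steps (assumptions for illustration only)
def stretch(img):
    """Min-max contrast stretch to [0, 1]."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else img

def gamma(img):
    """Square-root intensity mapping."""
    return np.sqrt(np.clip(img, 0.0, None))

demo = np.array([[0.2, 0.5, 0.8]])
ordering_a = run_pipeline(demo, [("stretch", stretch), ("gamma", gamma)])
ordering_b = run_pipeline(demo, [("gamma", gamma), ("stretch", stretch)])
```

The two results differ for the middle pixel even though the same operations were applied, which is the effect the comparison asks you to measure with PSNR on your own steps.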

**Q4:** Why did one ordering work better than the other? Be specific about what went wrong with the worse ordering.

*Your answer:*

---

# LLM Usage Log

Document all LLM interactions below. **Be honest** - this log helps you reflect on your learning process.

For each LLM interaction, answer:
1. What you asked
2. Which LLM you used
3. **What you LEARNED from the response** (not just "used the code")

| Question/Task | LLM Used | What I LEARNED |
|---------------|----------|----------------|
| Example: "How to implement 2D convolution" | ChatGPT | I learned that convolution requires flipping the kernel first, which I didn't realize before |
| | | |
| | | |
| | | |
| | | |

### LLM Reflection (Required)

**Did using an LLM help you LEARN, or did it just give you answers?** Be honest.

*Your answer:*


**What is ONE thing you would have struggled to understand WITHOUT LLM help?**

*Your answer:*


**What is ONE thing you learned BETTER by trying yourself first before asking an LLM?**

*Your answer:*

---

# Submission Checklist

Before submitting, verify:

- [ ] Student name and student number filled in at the top
- [ ] **`MY_SEED` set from YOUR student number** (replace the `YOUR_STUDENT_NUMBER` placeholder; personalized parameters)
- [ ] All code cells executed (outputs visible)
- [ ] All TODO items completed
- [ ] All analysis questions answered **in your own words**
- [ ] All **reflection questions** answered honestly
- [ ] All **comparison requirements** completed
- [ ] LLM usage fully documented with **what you learned**
- [ ] **Process log completed** (failed attempts, debugging, time spent)
- [ ] All plots/figures visible
- [ ] Notebook exported as PDF
- [ ] Both `.ipynb` and `.pdf` files ready for submission

**File naming:** `LastName_FirstName_FinalsExam.ipynb` and `LastName_FirstName_FinalsExam.pdf`

---

## UP Honor Code Statement

*"Honor and Excellence" (Karangalan at Kahusayan)*

As a student of the University of the Philippines, I am committed to upholding the highest standards of academic integrity. The pursuit of knowledge is not merely about obtaining correct answers, but about genuine learning and intellectual growth.

### I pledge the following:

1. **Honesty in Work:** All answers, code, and analysis in this exam represent my own understanding. Where I received help (from LLMs, resources, or others), I have documented it truthfully.

2. **Integrity in Learning:** I used AI tools as learning aids, not as substitutes for understanding. I can explain any code I submitted and defend any answer I wrote.

3. **Respect for the Process:** I did not share exam questions or answers with classmates. I understand that copying defeats the purpose of education.

4. **Commitment to Excellence:** I approached this exam as an opportunity to demonstrate genuine learning, not just to obtain a grade.

---

**By submitting this exam, I affirm that I have upheld the UP tradition of Honor and Excellence.**

**Student Signature:** ______________________ 

**Date:** __________

**Student Number:** ______________________