# Homework 1 - Part 2: The Double Descent Phenomenon

**Due Date**: [Add due date]

**Name**: ___________________________

**Student ID**: ___________________________

---

## Assignment Overview

In this assignment, you will investigate the **double descent phenomenon** - a surprising discovery that challenges the classical bias-variance tradeoff.

### The Phenomenon

Classical machine learning theory predicts a U-shaped test error curve:
- Few parameters → underfitting
- Optimal parameters → sweet spot
- Too many parameters → overfitting
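
For squared-error loss, this classical picture comes from the standard bias-variance decomposition of the expected test error at a point $x$. It is written here for a true function $f$, a fitted predictor $\hat{f}$, and noise variance $\sigma^2$ (notation introduced for this illustration only):

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \sigma^2
$$

In the classical account, bias falls and variance rises as parameters are added, which produces the U shape.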

**But modern research shows**: Test error can **decrease again** when models become extremely overparameterized!

### What You'll Do:
1. Implement functions for linear regression double descent
2. Build and train neural networks of varying sizes
3. Observe and analyze the double descent curve
4. Compare results across both settings

### Learning Objectives:
- Understand the limitations of classical bias-variance theory
- Implement minimum-norm interpolation
- Analyze overparameterized models
- Connect theory to modern deep learning practice

### Grading:
- **Part 1**: Linear Regression Implementation (35 points)
- **Part 2**: Neural Network Implementation (25 points)
- **Part 3**: Analysis Questions (30 points)
- **Part 4**: Code Quality (10 points)
- **Total**: 100 points

---

## Setup: Import Libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import time
from typing import Tuple, List, Dict
import warnings
warnings.filterwarnings('ignore')

# Set plotting style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (14, 6)
plt.rcParams['font.size'] = 12

# Set random seeds for reproducibility
SEED = 42
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed(SEED)

# Check device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

## Part 1: Double Descent in Linear Regression (35 points)

### Background

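We study the linear model $y = X\beta + \epsilon$ with $n$ training samples. The data are generated from $d$ features, and each fitted model uses only $p$ of them.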

**Two regimes:**
1. **Underparameterized ($p < n$)**: Use standard least squares
   $$\hat{\beta} = (X^T X)^{-1} X^T y$$

2. **Overparameterized ($p \geq n$)**: Use the minimum-norm interpolator
   $$\hat{\beta} = X^T (XX^T)^{-1} y$$

The minimum-norm solution has the **smallest L2 norm** among all solutions that perfectly fit the training data.
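
The two closed forms can be sanity-checked against NumPy's built-in solvers. The sketch below is not part of the graded tasks. It uses small illustrative sizes and an arbitrary seed (both assumptions), and checks three things: the normal equations agree with `np.linalg.lstsq` when $p < n$, the minimum-norm formula agrees with `np.linalg.pinv` when $p > n$, and adding a null-space component to the interpolator keeps the fit exact but increases the norm.

```python
import numpy as np

rng = np.random.default_rng(0)  # local generator; the seed is arbitrary

# Underparameterized case (p < n): normal equations vs. np.linalg.lstsq
n, p = 30, 5
X_under = rng.standard_normal((n, p))
y_under = rng.standard_normal(n)
beta_normal = np.linalg.solve(X_under.T @ X_under, X_under.T @ y_under)
beta_lstsq = np.linalg.lstsq(X_under, y_under, rcond=None)[0]
under_gap = np.max(np.abs(beta_normal - beta_lstsq))

# Overparameterized case (p > n): closed form vs. the pseudoinverse
n, p = 5, 30
X_over = rng.standard_normal((n, p))
y_over = rng.standard_normal(n)
beta_min = X_over.T @ np.linalg.solve(X_over @ X_over.T, y_over)
beta_pinv = np.linalg.pinv(X_over) @ y_over
over_gap = np.max(np.abs(beta_min - beta_pinv))

# Another interpolating solution: add a vector from the null space of X_over.
# Subtracting the row-space projection of z leaves only its null-space part.
z = rng.standard_normal(p)
null_part = z - np.linalg.pinv(X_over) @ (X_over @ z)
beta_other = beta_min + null_part

fit_min = np.max(np.abs(X_over @ beta_min - y_over))
fit_other = np.max(np.abs(X_over @ beta_other - y_over))
norm_min = np.linalg.norm(beta_min)
norm_other = np.linalg.norm(beta_other)

print(f"normal equations vs lstsq, max gap: {under_gap:.2e}")
print(f"min-norm formula vs pinv, max gap:  {over_gap:.2e}")
print(f"max fit error (min-norm / other):   {fit_min:.2e} / {fit_other:.2e}")
print(f"L2 norm (min-norm / other):         {norm_min:.4f} / {norm_other:.4f}")
```

Both solutions in the overparameterized case fit the data exactly, but only the one without a null-space component attains the smallest norm.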

### Task 1.1: Implement Data Generation (8 points)

In [None]:
def generate_linear_regression_data(n: int = 100,
                                   d: int = 200,
                                   sigma: float = 0.2,
                                   beta_decay: bool = True) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    Generate linear regression data: y = X @ beta + epsilon

    TODO: Implement this function to generate:
    1. Design matrix X from standard normal distribution
    2. True coefficients beta with decay (if beta_decay=True: beta_j ~ 1/j^2)
    3. Response y = X @ beta + Gaussian noise

    Args:
        n: Number of samples
        d: Number of features
        sigma: Noise standard deviation
        beta_decay: If True, use decaying coefficients

    Returns:
        X: Design matrix (n x d)
        y: Response vector (n,)
        beta_true: True coefficients (d,)
    """
    # TODO: Generate design matrix X from standard normal
    # Hint: Use np.random.randn(n, d)
    X = None  # YOUR CODE HERE

    # TODO: Generate true coefficients
    if beta_decay:
        # Create decaying coefficients: beta_j ~ 1/j^2
        # Hint: np.random.randn(d) / (np.arange(1, d + 1) ** 2)
        beta_true = None  # YOUR CODE HERE
    else:
        # Standard normal coefficients
        beta_true = None  # YOUR CODE HERE

    # TODO: Generate response with Gaussian noise
    # y = X @ beta_true + noise
    # Hint: noise = np.random.randn(n) * sigma
    y = None  # YOUR CODE HERE

    return X, y, beta_true

# Test your implementation
X_test, y_test, beta_test = generate_linear_regression_data(n=50, d=100)
assert X_test.shape == (50, 100), "X shape incorrect"
assert y_test.shape == (50,), "y shape incorrect"
assert beta_test.shape == (100,), "beta shape incorrect"
print("✓ Data generation tests passed!")
print(f"X range: [{X_test.min():.2f}, {X_test.max():.2f}]")
print(f"y range: [{y_test.min():.2f}, {y_test.max():.2f}]")
print(f"beta range: [{beta_test.min():.4f}, {beta_test.max():.4f}]")
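
To get a feel for what the $1/j^2$ decay implies, the sketch below looks only at the deterministic envelope $1/j^2$ and ignores the random Gaussian factor suggested in the hint, so realized coefficients will vary around it. It computes how much of the squared envelope sits in the first few coordinates for `d = 200`.

```python
import numpy as np

d = 200
j = np.arange(1, d + 1)
envelope = 1.0 / j**2   # deterministic 1/j^2 scale of beta_j (random factor ignored)
energy = envelope**2    # contribution of each coordinate to the squared norm
cumulative_share = np.cumsum(energy) / energy.sum()

print(f"share of squared envelope in the first coordinate:   {cumulative_share[0]:.4f}")
print(f"share of squared envelope in the first 5 coordinates: {cumulative_share[4]:.4f}")
```

Most of the signal is carried by a handful of leading features, so whether a randomly chosen subset of $p$ features includes them can matter a great deal for the fitted model.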

### Task 1.2: Implement Least Squares (7 points)

In [None]:
def fit_least_squares(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """
    Fit least squares: beta_hat = (X^T X)^{-1} X^T y
    Works when X has full column rank (p < n)

    TODO: Implement standard least squares estimation

    Args:
        X: Design matrix (n x p)
        y: Response vector (n,)

    Returns:
        beta_hat: Estimated coefficients (p,)
    """
    try:
        # TODO: Implement least squares
        # Step 1: Compute X^T X
        # Step 2: Compute X^T y
        # Step 3: Solve (X^T X) beta = X^T y using np.linalg.solve
        # Hint: beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
        beta_hat = None  # YOUR CODE HERE
        return beta_hat
    except np.linalg.LinAlgError:
        # Fallback if singular
        return np.linalg.lstsq(X, y, rcond=None)[0]

# Test your implementation
X_test, y_test, beta_true = generate_linear_regression_data(n=100, d=50)
beta_hat = fit_least_squares(X_test, y_test)
assert beta_hat.shape == (50,), "beta_hat shape incorrect"
residuals = y_test - X_test @ beta_hat
print("✓ Least squares tests passed!")
print(f"Residual norm: {np.linalg.norm(residuals):.4f}")

### Task 1.3: Implement Minimum-Norm Interpolator (10 points)

This is the **key method** for the overparameterized regime!
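
As a sketch of where the formula comes from, assume $X$ has full row rank so that $XX^T$ is invertible. The minimum-norm interpolator solves a constrained problem, and a Lagrange multiplier $\lambda \in \mathbb{R}^n$ (a symbol introduced here for illustration) yields the closed form:

$$
\begin{aligned}
&\min_{\beta} \ \tfrac{1}{2}\|\beta\|_2^2 \quad \text{subject to} \quad X\beta = y \\
&\mathcal{L}(\beta, \lambda) = \tfrac{1}{2}\|\beta\|_2^2 + \lambda^T (y - X\beta) \\
&\nabla_\beta \mathcal{L} = \beta - X^T \lambda = 0 \ \Rightarrow\ \beta = X^T \lambda \\
&X\beta = XX^T \lambda = y \ \Rightarrow\ \lambda = (XX^T)^{-1} y \\
&\hat{\beta} = X^T (XX^T)^{-1} y
\end{aligned}
$$

The multiplier $\lambda$ plays the same role as the intermediate vector `alpha` in the step-by-step hints of the code cell for this task.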

In [None]:
def fit_minimum_norm(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """
    Fit minimum-norm interpolator: beta_hat = X^T (X X^T)^{-1} y
    This gives the minimum L2 norm solution among all interpolating solutions.
    Works when p >= n (overparameterized regime) and X has full row rank,
    so that X X^T is invertible.

    TODO: Implement minimum-norm interpolation

    Args:
        X: Design matrix (n x p)
        y: Response vector (n,)

    Returns:
        beta_hat: Estimated coefficients (p,)
    """
    # TODO: Implement minimum-norm interpolator
    # Step 1: Compute X @ X^T (this is n x n, smaller when p > n)
    # Step 2: Solve (X @ X^T) @ alpha = y for alpha
    # Step 3: Compute beta_hat = X^T @ alpha
    # Hint: beta_hat = X.T @ np.linalg.solve(X @ X.T, y)
    beta_hat = None  # YOUR CODE HERE

    return beta_hat

# Test your implementation
X_test, y_test, beta_true = generate_linear_regression_data(n=50, d=100)
beta_hat = fit_minimum_norm(X_test, y_test)
assert beta_hat.shape == (100,), "beta_hat shape incorrect"

# Check that it interpolates (perfectly fits training data)
predictions = X_test @ beta_hat
interpolation_error = np.max(np.abs(y_test - predictions))
print("✓ Minimum-norm tests passed!")
print(f"Max interpolation error: {interpolation_error:.6f} (should be ~0)")
print(f"Beta norm: {np.linalg.norm(beta_hat):.4f}")

### Task 1.4: Implement Risk Computation (5 points)

In [None]:
def compute_risk(X: np.ndarray, y: np.ndarray, beta_hat: np.ndarray) -> float:
    """
    Compute mean squared error: (1/n) ||y - X beta_hat||^2

    TODO: Implement MSE calculation

    Args:
        X: Design matrix (n x p)
        y: Response vector (n,)
        beta_hat: Estimated coefficients (p,)

    Returns:
        MSE risk
    """
    # TODO: Compute predictions and MSE
    # Step 1: predictions = X @ beta_hat
    # Step 2: risk = np.mean((y - predictions) ** 2)
    predictions = None  # YOUR CODE HERE
    risk = None  # YOUR CODE HERE

    return risk

# Test your implementation (reuses X_test, y_test, and beta_hat from the Task 1.3 test cell)
risk = compute_risk(X_test, y_test, beta_hat)
assert isinstance(risk, (float, np.floating)), "Risk should be a float"
assert risk >= 0, "Risk should be non-negative"
print("✓ Risk computation tests passed!")
print(f"Training risk: {risk:.6f} (should be ~0 because beta_hat interpolates this data)")

### Task 1.5: Run Double Descent Experiment (5 points)

Now we'll put it all together! The experiment function is provided.

In [None]:
def double_descent_experiment(n: int = 100,
                             d: int = 200,
                             p_values: List[int] = None,
                             sigma: float = 0.2,
                             n_trials: int = 50) -> Dict:
    """
    Run double descent experiment (PROVIDED)
    """
    if p_values is None:
        p_values = list(range(10, 151, 5))

    results = {
        'p_values': p_values,
        'train_risks': [],
        'test_risks': [],
        'train_risks_std': [],
        'test_risks_std': []
    }

    print(f"\nRunning experiment with n={n}, d={d}, sigma={sigma}")
    print(f"Testing {len(p_values)} different feature counts")
    print("-" * 80)

    for p in p_values:
        train_risks_trials = []
        test_risks_trials = []

        for trial in range(n_trials):
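            # Generate training data
            X_full_train, y_train, beta_true = generate_linear_regression_data(n, d, sigma)
            # Generate a fresh test design matrix; the y and beta returned here are
            # discarded, and the test response is rebuilt from the training beta_true
            X_full_test, _, _ = generate_linear_regression_data(n, d, sigma, beta_decay=False)
            y_test = X_full_test @ beta_true + np.random.randn(n) * sigma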

            # Select p features
            np.random.seed(SEED + trial)
            selected_features = np.random.choice(d, size=p, replace=False)
            X_train = X_full_train[:, selected_features]
            X_test = X_full_test[:, selected_features]

            # Fit model
            if p < n:
                beta_hat = fit_least_squares(X_train, y_train)
            else:
                beta_hat = fit_minimum_norm(X_train, y_train)

            # Compute risks
            train_risk = compute_risk(X_train, y_train, beta_hat)
            test_risk = compute_risk(X_test, y_test, beta_hat)

            train_risks_trials.append(train_risk)
            test_risks_trials.append(test_risk)

        results['train_risks'].append(np.mean(train_risks_trials))
        results['test_risks'].append(np.mean(test_risks_trials))
        results['train_risks_std'].append(np.std(train_risks_trials))
        results['test_risks_std'].append(np.std(test_risks_trials))

        if p % 20 == 0 or p == n:
            print(f"p={p:3d}: Train Risk={np.mean(train_risks_trials):.6f}, "
                  f"Test Risk={np.mean(test_risks_trials):.6f}")

    return results

# Run the experiment
print("="*80)
print("PART 1: LINEAR REGRESSION DOUBLE DESCENT")
print("="*80)

p_values = list(range(10, 151, 5))
results_linear = double_descent_experiment(
    n=100,
    d=200,
    p_values=p_values,
    sigma=0.2,
    n_trials=50
)

print("\n✓ Experiment complete!")

### Visualize Results (Provided)

In [None]:
# Visualization code (PROVIDED)
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

p_vals = results_linear['p_values']
train_risks = results_linear['train_risks']
test_risks = results_linear['test_risks']
train_std = results_linear['train_risks_std']
test_std = results_linear['test_risks_std']

# Plot 1: Log-log scale
ax1 = axes[0]
ax1.plot(p_vals, train_risks, 'o-', label='Train Risk',
         linewidth=2.5, markersize=6, color='#e74c3c', alpha=0.8)
ax1.plot(p_vals, test_risks, 's-', label='Test Risk',
         linewidth=2.5, markersize=6, color='#3498db', alpha=0.8)
ax1.fill_between(p_vals,
                 np.array(train_risks) - np.array(train_std),
                 np.array(train_risks) + np.array(train_std),
                 alpha=0.2, color='#e74c3c')
ax1.fill_between(p_vals,
                 np.array(test_risks) - np.array(test_std),
                 np.array(test_risks) + np.array(test_std),
                 alpha=0.2, color='#3498db')
ax1.axvline(x=100, color='black', linestyle='--', linewidth=2,
            alpha=0.7, label='Interpolation Threshold (n=100)')
ax1.set_xlabel('Number of Features (p)', fontsize=14, fontweight='bold')
ax1.set_ylabel('Risk (MSE)', fontsize=14, fontweight='bold')
ax1.set_title('Linear Regression Double Descent (Log-Log)', fontsize=16, fontweight='bold')
ax1.set_xscale('log')
ax1.set_yscale('log')
ax1.legend(fontsize=11)
ax1.grid(True, alpha=0.3, which='both')

# Plot 2: Linear scale with annotations
ax2 = axes[1]
ax2.plot(p_vals, test_risks, 's-', label='Test Risk',
         linewidth=2.5, markersize=6, color='#3498db', alpha=0.8)
ax2.axvline(x=100, color='black', linestyle='--', linewidth=2, alpha=0.7)

y_pos = ax2.get_ylim()[1] * 0.9
ax2.annotate('Classical\nRegime', xy=(50, y_pos),
             fontsize=12, ha='center', style='italic',
             bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
ax2.annotate('Peak', xy=(100, y_pos),
             fontsize=12, ha='center', style='italic',
             bbox=dict(boxstyle='round', facecolor='lightcoral', alpha=0.5))
ax2.annotate('Modern\nRegime', xy=(130, y_pos),
             fontsize=12, ha='center', style='italic',
             bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.5))

ax2.set_xlabel('Number of Features (p)', fontsize=14, fontweight='bold')
ax2.set_ylabel('Test Risk (MSE)', fontsize=14, fontweight='bold')
ax2.set_title('Linear Regression Double Descent', fontsize=16, fontweight='bold')
ax2.legend(fontsize=11)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('linear_double_descent.png', dpi=150, bbox_inches='tight')
plt.show()
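
One way to build intuition for the peak marked at $p = n$ in the plot above is to look at the conditioning of the design matrix. When $p$ is close to $n$, the smallest singular value of an $n \times p$ Gaussian matrix tends to be near zero, so solving for $\hat{\beta}$ amplifies the noise in $y$. The sketch below is a minimal illustration, not part of the assignment pipeline. It uses `n = 100` as in the experiment, while the grid of `p` values, the number of draws, and the seed are assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)  # local generator; the seed is arbitrary
n = 100
p_grid = [50, 100, 150]
n_draws = 20

smallest_sv = {}
for p in p_grid:
    values = []
    for _ in range(n_draws):
        X = rng.standard_normal((n, p))
        # compute_uv=False returns only the singular values, sorted in descending order
        s = np.linalg.svd(X, compute_uv=False)
        values.append(s[-1])
    smallest_sv[p] = float(np.mean(values))

for p, sv in smallest_sv.items():
    print(f"p={p:3d}: mean smallest singular value = {sv:.3f}")
```

The smallest singular value dips sharply at $p = n$ and recovers on either side, which mirrors the location of the test-risk peak.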

## Part 2: Double Descent in Neural Networks (25 points)

### Background

We'll demonstrate double descent on Fashion-MNIST:
- **Architecture**: Input (784) → Hidden (W) → Hidden (W) → Output (10)
- **Variable**: Width W (from 5 to 2000)
- **Goal**: Observe test accuracy improve again as the network moves into the overparameterized regime
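
To see how quickly the parameter count grows with the width $W$ for this architecture, the following sketch counts weights and biases layer by layer. The helper name `count_mlp_parameters` is hypothetical, and it assumes every linear layer has a bias term, which is the default for `nn.Linear`.

```python
def count_mlp_parameters(width: int, input_dim: int = 784, output_dim: int = 10) -> int:
    """Weights plus biases for Input -> Hidden(W) -> Hidden(W) -> Output."""
    layer1 = input_dim * width + width        # 784 -> W
    layer2 = width * width + width            # W -> W
    layer3 = width * output_dim + output_dim  # W -> 10
    return layer1 + layer2 + layer3

for w in (5, 100, 2000):
    print(f"W={w:4d}: {count_mlp_parameters(w):,} parameters")
```

The count grows roughly linearly in $W$ for narrow networks (the $784 \cdot W$ term dominates) and quadratically for wide ones (the $W^2$ term dominates). The result for a given width should match the `count_parameters()` method of the network class in Task 2.1.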

### Load Dataset (Provided)

In [None]:
def load_fashion_mnist(subset_size: int = 1000, test_size: int = 500):
    """Load Fashion-MNIST (PROVIDED)"""
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5,), (0.5,))
    ])

    train_dataset = datasets.FashionMNIST(
        root='./data', train=True, download=True, transform=transform
    )
    test_dataset = datasets.FashionMNIST(
        root='./data', train=False, download=True, transform=transform
    )

    if subset_size < len(train_dataset):
        indices = np.random.choice(len(train_dataset), subset_size, replace=False)
        train_dataset = torch.utils.data.Subset(train_dataset, indices)

    if test_size < len(test_dataset):
        indices = np.random.choice(len(test_dataset), test_size, replace=False)
        test_dataset = torch.utils.data.Subset(test_dataset, indices)

    return train_dataset, test_dataset

print("="*80)
print("PART 2: NEURAL NETWORK DOUBLE DESCENT")
print("="*80)

train_dataset, test_dataset = load_fashion_mnist(subset_size=1000, test_size=500)
print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")

### Task 2.1: Implement Neural Network Architecture (15 points)

In [None]:
class FullyConnectedNet(nn.Module):
    """
    Fully connected neural network with variable width.
    Architecture: Input (784) -> Hidden1 -> Hidden2 -> Output (10)

    TODO: Implement this network class
    """
    def __init__(self, input_dim: int = 784,
                 hidden_dims: List[int] = [100, 100],
                 output_dim: int = 10,
                 activation: str = 'relu'):
        super(FullyConnectedNet, self).__init__()

        # TODO: Set activation function
        if activation == 'relu':
            self.activation = None  # YOUR CODE HERE (nn.ReLU())
        elif activation == 'tanh':
            self.activation = None  # YOUR CODE HERE (nn.Tanh())
        else:
            raise ValueError(f"Unknown activation: {activation}")

        # TODO: Build layers
        # Create a sequential network:
        # Input -> Linear -> Activation -> Linear -> Activation -> ... -> Output
        layers = []
        in_dim = input_dim

        # YOUR CODE HERE
        # Loop through hidden_dims:
        #   - Add nn.Linear(in_dim, hidden_dim)
        #   - Add activation
        #   - Update in_dim = hidden_dim
        for hidden_dim in hidden_dims:
            pass  # YOUR CODE HERE

        # Add final output layer
        # YOUR CODE HERE: layers.append(nn.Linear(in_dim, output_dim))

        self.network = nn.Sequential(*layers)

    def forward(self, x):
        """
        TODO: Implement forward pass
        Flatten input and pass through network
        """
        # YOUR CODE HERE
        # Step 1: Flatten: x = x.view(x.size(0), -1)
        # Step 2: Pass through network: return self.network(x)
        pass

    def count_parameters(self):
        """Count parameters (PROVIDED)"""
        return sum(p.numel() for p in self.parameters() if p.requires_grad)

# Test your implementation
test_model = FullyConnectedNet(input_dim=784, hidden_dims=[50, 50], output_dim=10)
test_input = torch.randn(5, 1, 28, 28)
test_output = test_model(test_input)
assert test_output.shape == (5, 10), f"Expected shape (5, 10), got {test_output.shape}"
print("✓ Network tests passed!")
print(f"Parameters: {test_model.count_parameters():,}")

### Task 2.2: Implement Training Function (10 points)

In [None]:
def train_neural_network(model: nn.Module,
                        train_dataset,
                        test_dataset,
                        epochs: int = 100,
                        batch_size: int = 32,
                        lr: float = 0.001,
                        patience: int = 20,
                        verbose: bool = True) -> Dict:
    """
    Train a neural network.

    TODO: Complete the training loop
    """
    # Create data loaders (PROVIDED)
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

    model = model.to(device)

    # TODO: Define loss and optimizer
    criterion = None  # YOUR CODE HERE (nn.CrossEntropyLoss())
    optimizer = None  # YOUR CODE HERE (optim.Adam(model.parameters(), lr=lr))

    history = {'train_loss': [], 'test_loss': [], 'train_acc': [], 'test_acc': []}
    best_test_acc = 0
    patience_counter = 0
    start_time = time.time()

    for epoch in range(epochs):
        # Training phase
        model.train()
        train_loss = 0.0
        train_correct = 0
        train_total = 0

        for batch_X, batch_y in train_loader:
            batch_X, batch_y = batch_X.to(device), batch_y.to(device)

            # TODO: Training step
            # 1. Zero gradients
            # 2. Forward pass
            # 3. Compute loss
            # 4. Backward pass
            # 5. Optimizer step
            # YOUR CODE HERE
            pass

            train_loss += loss.item() * batch_X.size(0)
            _, predicted = outputs.max(1)
            train_total += batch_y.size(0)
            train_correct += predicted.eq(batch_y).sum().item()

        train_loss /= len(train_dataset)
        train_acc = 100. * train_correct / train_total

        # Test phase
        model.eval()
        test_loss = 0.0
        test_correct = 0
        test_total = 0

        with torch.no_grad():
            for batch_X, batch_y in test_loader:
                batch_X, batch_y = batch_X.to(device), batch_y.to(device)
                outputs = model(batch_X)
                loss = criterion(outputs, batch_y)

                test_loss += loss.item() * batch_X.size(0)
                _, predicted = outputs.max(1)
                test_total += batch_y.size(0)
                test_correct += predicted.eq(batch_y).sum().item()

        test_loss /= len(test_dataset)
        test_acc = 100. * test_correct / test_total

        history['train_loss'].append(train_loss)
        history['test_loss'].append(test_loss)
        history['train_acc'].append(train_acc)
        history['test_acc'].append(test_acc)

        # Early stopping
        if test_acc > best_test_acc:
            best_test_acc = test_acc
            patience_counter = 0
        else:
            patience_counter += 1

        if patience_counter >= patience:
            if verbose:
                print(f"Early stopping at epoch {epoch+1}")
            break

    training_time = time.time() - start_time

    return {
        'model': model,
        'history': history,
        'best_test_acc': best_test_acc,
        'final_train_acc': train_acc,
        'final_test_acc': test_acc,
        'training_time': training_time,
        'num_parameters': model.count_parameters(),
        'epochs_trained': len(history['train_loss'])
    }
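
The loop above accumulates `loss.item() * batch_X.size(0)` and divides by the dataset size at the end of each epoch. Because `nn.CrossEntropyLoss` averages over the batch by default, this size weighting recovers the per-sample mean even when the last batch is smaller than the others. The plain-Python sketch below illustrates the arithmetic with made-up per-sample losses. The numbers are hypothetical and not experimental results.

```python
# Hypothetical per-sample losses for 5 samples, split into batches of size 2, 2, 1
per_sample_losses = [0.9, 0.7, 0.4, 0.2, 1.3]
batches = [per_sample_losses[0:2], per_sample_losses[2:4], per_sample_losses[4:5]]

# What a mean-reducing criterion would report for each batch
batch_means = [sum(b) / len(b) for b in batches]

# Size-weighted accumulation, as in the training loop
weighted_total = sum(mean * len(b) for mean, b in zip(batch_means, batches))
weighted_mean = weighted_total / len(per_sample_losses)

# A naive average of batch means over-weights the small final batch
naive_mean = sum(batch_means) / len(batch_means)

true_mean = sum(per_sample_losses) / len(per_sample_losses)
print(f"true per-sample mean:  {true_mean:.4f}")
print(f"size-weighted mean:    {weighted_mean:.4f}")
print(f"naive mean of batches: {naive_mean:.4f}")
```

With 1,000 training samples and a batch size of 32, the final batch holds only 8 samples, so the weighting is what keeps the reported epoch loss exact.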

### Run Neural Network Experiment (Provided)

**Note**: This will take 10-20 minutes depending on your hardware.

In [None]:
print("\nRunning Neural Network Experiment...")
print("-" * 80)

widths = [5, 10, 15, 20, 30, 50, 75, 100, 150, 200, 300, 500, 750, 1000, 1500, 2000]
nn_results = []

for i, width in enumerate(widths, start=1):
    print(f"\n[{i}/{len(widths)}] Training width={width}...")
    
    model = FullyConnectedNet(
        input_dim=784,
        hidden_dims=[width, width],
        output_dim=10,
        activation='relu'
    )
    
    print(f"  Parameters: {model.count_parameters():,}")
    
    results = train_neural_network(
        model, train_dataset, test_dataset,
        epochs=100, batch_size=32, lr=0.001,
        patience=20, verbose=False
    )
    
    nn_results.append(results)
    
    print(f"  Train Acc: {results['final_train_acc']:.2f}%")
    print(f"  Test Acc:  {results['final_test_acc']:.2f}%")

print("\n✓ Neural network experiment complete!")

### Visualize Neural Network Results (Provided)

In [None]:
# Extract data
nn_params = [r['num_parameters'] for r in nn_results]
nn_train_acc = [r['final_train_acc'] for r in nn_results]
nn_test_acc = [r['final_test_acc'] for r in nn_results]
nn_test_error = [100 - acc for acc in nn_test_acc]

# Find interpolation threshold
interpolation_threshold_idx = None
for i, acc in enumerate(nn_train_acc):
    if acc >= 99.0:
        interpolation_threshold_idx = i
        break

# Plot
fig, ax = plt.subplots(1, 1, figsize=(12, 6))
ax.plot(nn_params, nn_test_error, 's-', linewidth=2.5, markersize=8,
        color='#e74c3c', label='Test Error')

if interpolation_threshold_idx is not None:
    ax.axvline(x=nn_params[interpolation_threshold_idx], color='black',
               linestyle='--', linewidth=2, alpha=0.7,
               label=f'Interpolation (~{nn_params[interpolation_threshold_idx]:,} params)')

ax.set_xlabel('Number of Parameters', fontsize=14, fontweight='bold')
ax.set_ylabel('Test Error (%)', fontsize=14, fontweight='bold')
ax.set_title('Neural Network: Double Descent', fontsize=16, fontweight='bold')
ax.set_xscale('log')
ax.legend(fontsize=12)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('neural_network_double_descent.png', dpi=150, bbox_inches='tight')
plt.show()

## Part 3: Analysis Questions (30 points)

Answer the following questions based on your experimental results.

### Question 1 (10 points)

**Describe the double descent phenomenon observed in linear regression.**

a) What happens to test risk as p increases from 10 to 100?

b) What happens at p = n = 100 (the interpolation threshold)?

c) What happens to test risk as p increases beyond 100?

d) Why is this surprising from a classical bias-variance perspective?

**YOUR ANSWER HERE:**

*Double-click to edit this cell*

### Question 2 (10 points)

**Explain the minimum-norm interpolator.**

a) When p > n, why are there infinitely many solutions that perfectly fit the training data?

b) What does "minimum-norm" mean? Why do we choose this particular solution?

c) Why does the minimum-norm solution tend to generalize better than other interpolating solutions?

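d) In linear regression, gradient descent initialized at zero (with a suitable step size) converges to this solution. Why, and what does this suggest about implicit regularization in neural networks?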

**YOUR ANSWER HERE:**

*Double-click to edit this cell*

### Question 3 (10 points)

**Connect to modern deep learning practice.**

a) Modern deep networks often have more parameters than training examples (for instance, image classifiers with millions of parameters trained on tens of thousands of images), yet they still generalize well. Based on double descent, why might this work?

b) In your neural network experiments, did you observe double descent? Compare the test accuracy at the interpolation threshold vs. the largest models.

c) What are the practical implications for practitioners? Should we:
   - Use small models and tune carefully?
   - Use very large models and rely on overparameterization?
   - Something in between?

d) How does this change our understanding of regularization (L2 penalty, dropout, etc.)?

**YOUR ANSWER HERE:**

*Double-click to edit this cell*

## Save Results (Provided)

In [None]:
# Save results (PROVIDED)
linear_df = pd.DataFrame({
    'p': results_linear['p_values'],
    'train_risk': results_linear['train_risks'],
    'test_risk': results_linear['test_risks'],
    'train_risk_std': results_linear['train_risks_std'],
    'test_risk_std': results_linear['test_risks_std']
})
linear_df.to_csv('linear_double_descent_results.csv', index=False)

nn_df = pd.DataFrame({
    'width': widths,
    'parameters': nn_params,
    'train_acc': nn_train_acc,
    'test_acc': nn_test_acc,
    'training_time': [r['training_time'] for r in nn_results],
    'epochs': [r['epochs_trained'] for r in nn_results]
})
nn_df.to_csv('neural_network_double_descent_results.csv', index=False)

print("\n" + "="*80)
print("Assignment Complete!")
print("="*80)
print("\nGenerated files:")
print("  - linear_double_descent.png")
print("  - neural_network_double_descent.png")
print("  - linear_double_descent_results.csv")
print("  - neural_network_double_descent_results.csv")

## Submission Checklist

Before submitting, ensure:

- [ ] All TODO sections are completed
- [ ] All code cells run without errors
- [ ] All plots are generated
- [ ] All analysis questions are answered
- [ ] Your name and student ID are at the top
- [ ] The notebook is saved with all outputs visible

**Submit**: Upload this completed notebook (.ipynb file) with all outputs.