# Lab 2 - Module 2: Gradient Descent on Parameter Space (Line Fitting)

**Learning Objectives:**
- Apply gradient descent to 2D parameter optimization
- Visualize GD paths on MSE contour plots
- Compare GD navigation to manual search from Lab 1
- Understand how learning rate affects convergence in 2D

**Time:** ~20 minutes

---

**IMPORTANT:** Enter the same group code you used in Lab 1!

## Connection to Lab 1

In **Lab 1 Modules 2-3**, you:
- Manually adjusted (m, b) sliders to fit a line
- Explored parameter space by submitting guesses
- Saw only MSE values (no plot of the data)
- Navigated by trial and error

**Today:** Gradient descent will navigate the same parameter space automatically!

### Key Difference:
- **Lab 1:** You chose where to go next based on MSE feedback
- **Lab 2:** GD computes gradients and moves systematically downhill

## 1. Setup: Generate the Same Data as Lab 1

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from ipywidgets import FloatText, Button, Output, VBox, HBox
from IPython.display import display

group_code = int(input("Enter your group code: "))
np.random.seed(group_code)

# Generate same line fitting data as Lab 1
true_m = np.random.uniform(-3, 3)
true_b = np.random.uniform(-5, 5)
x_data = np.linspace(-5, 5, 25)
noise = np.random.normal(0, 1.0, size=len(x_data))
y_data = true_m * x_data + true_b + noise

print("✓ Regenerated the same line-fitting data as Lab 1")
print(f"\nTrue parameters: m = {true_m:.3f}, b = {true_b:.3f}")
print("(Revealed for learning purposes)")

## 2. Define MSE Function and Gradients

The Mean Squared Error (MSE) measures how well a line fits the data:

```
MSE(m, b) = (1/N) × Σ(y_i - (m × x_i + b))²
```

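Gradient descent minimizes this function by repeatedly moving (m, b) a small step against the gradient, the direction in which MSE decreases fastest. The next cell estimates the gradient numerically using central differences.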
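For reference, differentiating the MSE formula above term by term gives its exact gradient. The notebook code approximates these two partial derivatives numerically, and each GD step applies the update on the second line, where η plays the role of the `learning_rate` argument:

$$
\frac{\partial\,\mathrm{MSE}}{\partial m} = -\frac{2}{N}\sum_{i=1}^{N} x_i\,\bigl(y_i - (m x_i + b)\bigr),
\qquad
\frac{\partial\,\mathrm{MSE}}{\partial b} = -\frac{2}{N}\sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr)
$$

$$
m \leftarrow m - \eta\,\frac{\partial\,\mathrm{MSE}}{\partial m},
\qquad
b \leftarrow b - \eta\,\frac{\partial\,\mathrm{MSE}}{\partial b}
$$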

In [None]:
def compute_mse(params):
    """
    Compute MSE for line parameters [m, b].
    
    Args:
        params: numpy array [m, b]
    
    Returns:
        MSE value (scalar)
    """
    m, b = params
    y_pred = m * x_data + b
    return np.mean((y_data - y_pred) ** 2)

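def compute_gradient_mse(params, h=1e-5):
    """
    Estimate the gradient of MSE with respect to [m, b] by central differences.
    
    Returns:
        numpy array [gradient_m, gradient_b]
    """
    # Cast to float so an integer input such as np.array([0, 0]) is not
    # truncated when h is added (which would silently give a zero gradient)
    params = np.asarray(params, dtype=float)
    grad = np.zeros_like(params)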
    
    for i in range(len(params)):
        params_forward = params.copy()
        params_backward = params.copy()
        
        params_forward[i] += h
        params_backward[i] -= h
        
        grad[i] = (compute_mse(params_forward) - compute_mse(params_backward)) / (2 * h)
    
    return grad

def gd_step_2d(params, learning_rate):
    """
    Perform one gradient descent step on (m, b).
    
    Returns:
        Updated [m, b]
    """
    grad = compute_gradient_mse(params)
    return params - learning_rate * grad

print("✓ MSE and gradient functions ready")
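As a sanity check on the numerical gradient, you can compare it with the closed-form partial derivatives ∂MSE/∂m = -(2/N) Σ xᵢ rᵢ and ∂MSE/∂b = -(2/N) Σ rᵢ, where rᵢ = yᵢ - (m xᵢ + b). This is a self-contained sketch: it uses stand-in data (seed 1 and the line y = -0.8x + 3 are arbitrary assumptions, not your group's data), and the helper names `grad_analytic` and `grad_central` are hypothetical.

```python
import numpy as np

# Stand-in data (assumption): seed 1 and the line y = -0.8x + 3 are arbitrary
rng = np.random.default_rng(1)
x = np.linspace(-5, 5, 25)
y = -0.8 * x + 3.0 + rng.normal(0, 1.0, size=x.size)

def mse(params):
    m, b = params
    return np.mean((y - (m * x + b)) ** 2)

def grad_analytic(params):
    """Closed-form gradient: -(2/N) * [sum(x_i * r_i), sum(r_i)]."""
    m, b = params
    r = y - (m * x + b)                      # residuals r_i
    return np.array([-2.0 * np.mean(x * r), -2.0 * np.mean(r)])

def grad_central(params, h=1e-5):
    """Central-difference estimate, the same scheme as compute_gradient_mse."""
    params = np.asarray(params, dtype=float)
    g = np.zeros_like(params)
    for k in range(params.size):
        step = np.zeros_like(params)
        step[k] = h
        g[k] = (mse(params + step) - mse(params - step)) / (2 * h)
    return g

for p in [np.array([0.0, 0.0]), np.array([2.0, -1.0])]:
    print(p, grad_analytic(p), grad_central(p))
```

Because MSE is quadratic in (m, b), the central-difference formula has no truncation error here, so the two gradients should agree up to floating-point rounding.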

## 3. Compute MSE Landscape (for visualization)

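We precompute the MSE on a 60 × 60 grid covering m, b ∈ [-5, 5] so we can draw contour plots. The grid minimum is only accurate to the grid spacing (about 0.17 in each direction), so treat it as an approximate reference point rather than the exact minimum.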

In [None]:
# Create grid for contour plot
m_vals = np.linspace(-5, 5, 60)
b_vals = np.linspace(-5, 5, 60)
M_grid, B_grid = np.meshgrid(m_vals, b_vals)
MSE_grid = np.zeros_like(M_grid)

for i in range(M_grid.shape[0]):
    for j in range(M_grid.shape[1]):
        MSE_grid[i, j] = compute_mse(np.array([M_grid[i, j], B_grid[i, j]]))

# Find grid minimum
flat_idx = np.argmin(MSE_grid)
i_min, j_min = np.unravel_index(flat_idx, MSE_grid.shape)
m_min_grid = M_grid[i_min, j_min]
b_min_grid = B_grid[i_min, j_min]
mse_min_grid = MSE_grid[i_min, j_min]

print("✓ MSE landscape computed")
print(f"Grid minimum: m = {m_min_grid:.3f}, b = {b_min_grid:.3f}, MSE = {mse_min_grid:.4f}")
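To see how close the grid minimum is to the exact minimizer of MSE, you can compare it with the least-squares line that `np.polyfit(x, y, 1)` returns as `[slope, intercept]`. This sketch also shows a vectorized alternative to the double loop above using NumPy broadcasting. It runs on stand-in data (seed 0 and the line y = 1.5x - 2 are assumptions), and names like `m_axis` and `mse_on_grid` are chosen here so they do not overwrite the notebook's variables.

```python
import numpy as np

# Stand-in data (assumption): seed 0 and the line y = 1.5x - 2 are arbitrary
rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 25)
y = 1.5 * x - 2.0 + rng.normal(0, 1.0, size=x.size)

# Vectorized MSE on a 60 x 60 grid: (60, 60, 1) broadcasts against the 25 data points
m_axis = np.linspace(-5, 5, 60)
b_axis = np.linspace(-5, 5, 60)
M, B = np.meshgrid(m_axis, b_axis)
residuals = y - (M[..., None] * x + B[..., None])   # shape (60, 60, 25)
mse_on_grid = np.mean(residuals ** 2, axis=-1)       # shape (60, 60)

row, col = np.unravel_index(np.argmin(mse_on_grid), mse_on_grid.shape)
m_grid, b_grid, mse_grid_min = M[row, col], B[row, col], mse_on_grid[row, col]

# Exact least-squares line: polyfit with deg=1 returns [slope, intercept]
m_ls, b_ls = np.polyfit(x, y, 1)
mse_ls = np.mean((y - (m_ls * x + b_ls)) ** 2)

spacing = m_axis[1] - m_axis[0]   # 10 / 59, about 0.17
print(f"grid : m = {m_grid:.3f}, b = {b_grid:.3f}, MSE = {mse_grid_min:.4f}")
print(f"exact: m = {m_ls:.3f}, b = {b_ls:.3f}, MSE = {mse_ls:.4f}")
print(f"grid spacing = {spacing:.3f}")
```

As long as the exact minimum lies inside the grid, the grid minimum should sit within about one grid spacing of it and have a slightly higher MSE.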

## 4. Prediction Questions (Answer BEFORE running GD)

**Q6 (PREDICTION):** 

Think about your Lab 1 experience:
- What was your best (m, b) from Lab 1 Module 3?
- If you start GD from that point, will it find an even lower MSE?
- Starting from (0, 0), predict: Will the path be straight or curved? Why?

Write your predictions on the answer sheet before continuing!

## 5. Interactive Gradient Descent

**Instructions:**
1. Enter starting values for m and b (try (0, 0) first)
2. Choose a learning rate (start with 0.1)
3. Click "Run 1 Step" to see a single GD update
4. Click "Run 20 Steps" to see a longer path
5. Click "Show Full Landscape" to reveal the MSE contours
6. Click "Reset" before trying a new starting point (the starting point is only read after a reset; a changed learning rate applies from the next step)

In [None]:
# State for interactive GD
gd_state_2d = {
    'history': [],
    'running': False,
    'show_landscape': False
}

# Widgets
m0_input = FloatText(description="Starting m:", value=0.0, step=0.5)
b0_input = FloatText(description="Starting b:", value=0.0, step=0.5)
lr_input = FloatText(description="Learning rate:", value=0.1, step=0.01)
step_button = Button(description="Run 1 Step", button_style='info')
multi_step_button = Button(description="Run 20 Steps", button_style='success')
reveal_button = Button(description="Show Full Landscape", button_style='primary')
reset_button = Button(description="Reset", button_style='warning')
output_area = Output()

def initialize_gd_2d():
    """Initialize or reset GD state"""
    m_start = m0_input.value
    b_start = b0_input.value
    params_start = np.array([m_start, b_start])
    
    gd_state_2d['history'] = [{
        'step': 0,
        'm': m_start,
        'b': b_start,
        'MSE': compute_mse(params_start),
        'grad_m': 0.0,
        'grad_b': 0.0
    }]
    gd_state_2d['running'] = True

def run_gd_steps_2d(n_steps):
    """Run n steps of gradient descent"""
    if not gd_state_2d['running']:
        initialize_gd_2d()
    
    lr = lr_input.value
    
    for _ in range(n_steps):
        current = gd_state_2d['history'][-1]
        params_current = np.array([current['m'], current['b']])
        
        # Compute gradient
        grad = compute_gradient_mse(params_current)
        
        # Update parameters
        params_new = params_current - lr * grad
        m_new, b_new = params_new
        
        gd_state_2d['history'].append({
            'step': current['step'] + 1,
            'm': m_new,
            'b': b_new,
            'MSE': compute_mse(params_new),
            'grad_m': grad[0],
            'grad_b': grad[1]
        })

def plot_gd_2d():
    """Plot GD path on parameter space"""
    with output_area:
        output_area.clear_output(wait=True)
        
        if not gd_state_2d['history']:
            print("Click 'Run 1 Step' or 'Run 20 Steps' to start!")
            return
        
        # Extract history
        df = pd.DataFrame(gd_state_2d['history'])
        m_hist = df['m'].values
        b_hist = df['b'].values
        mse_hist = df['MSE'].values
        
        # Create figure
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6), dpi=100)
        
        # Left plot: Parameter space with optional landscape
        if gd_state_2d['show_landscape']:
            # Show full MSE contours
            contour = ax1.contourf(M_grid, B_grid, MSE_grid, levels=25, cmap='viridis', alpha=0.6)
            plt.colorbar(contour, ax=ax1, label='MSE')
            
            # Mark grid minimum
            ax1.scatter([m_min_grid], [b_min_grid], c='white', marker='o', 
                       s=200, edgecolors='black', linewidths=2, zorder=10, label='Grid minimum')
        
        # Plot GD path
        ax1.plot(m_hist, b_hist, 'r-', linewidth=2, alpha=0.8, label='GD path')
        
        # Mark start and end
        ax1.scatter([m_hist[0]], [b_hist[0]], c='green', marker='o', 
                   s=200, edgecolors='black', linewidths=2, zorder=5, label='Start')
        ax1.scatter([m_hist[-1]], [b_hist[-1]], c='red', marker='*', 
                   s=300, edgecolors='black', linewidths=2, zorder=5, label='Current')
        
        # Add arrows
        arrow_freq = max(1, len(m_hist) // 8)
        for i in range(0, len(m_hist) - 1, arrow_freq):
            ax1.annotate('', xy=(m_hist[i+1], b_hist[i+1]), 
                        xytext=(m_hist[i], b_hist[i]),
                        arrowprops=dict(arrowstyle='->', color='red', lw=2, alpha=0.6))
        
        ax1.set_xlabel('m (slope)', fontsize=11)
        ax1.set_ylabel('b (intercept)', fontsize=11)
        ax1.set_title('GD Path in Parameter Space', fontsize=12, fontweight='bold')
        ax1.grid(True, alpha=0.3)
        ax1.legend(fontsize=9, loc='best')
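        ax1.set_xlim(-5, 5)
        ax1.set_ylim(-5, 5)
        ax1.set_aspect('equal')  # equal axis scales, so right angles to the contours are drawn as right angles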
        
        # Right plot: MSE over iterations
        ax2.plot(df['step'], mse_hist, 'b-o', linewidth=2, markersize=5)
        ax2.axhline(y=mse_min_grid, color='green', linestyle='--', 
                   linewidth=2, alpha=0.5, label='Grid minimum MSE')
        ax2.set_xlabel('Step', fontsize=11)
        ax2.set_ylabel('MSE', fontsize=11)
        ax2.set_title('MSE Convergence', fontsize=12, fontweight='bold')
        ax2.grid(True, alpha=0.3)
        ax2.legend(fontsize=9)
        
        plt.tight_layout()
        plt.show()
        
        # Summary stats
        print(f"\nSteps taken: {len(gd_state_2d['history']) - 1}")
        print(f"Start: m = {m_hist[0]:.4f}, b = {b_hist[0]:.4f}, MSE = {mse_hist[0]:.4f}")
        print(f"Current: m = {m_hist[-1]:.4f}, b = {b_hist[-1]:.4f}, MSE = {mse_hist[-1]:.4f}")
        print(f"MSE improvement: {mse_hist[0] - mse_hist[-1]:.4f}")
        print(f"\nGrid minimum: m = {m_min_grid:.4f}, b = {b_min_grid:.4f}, MSE = {mse_min_grid:.4f}")
        print(f"Distance from minimum: Δm = {abs(m_hist[-1] - m_min_grid):.4f}, Δb = {abs(b_hist[-1] - b_min_grid):.4f}")
        
        # Table of recent steps
        print("\n" + "="*80)
        print("Recent Steps:")
        df_display = df[['step', 'm', 'b', 'MSE', 'grad_m', 'grad_b']].tail(10).copy()
        for col in ['m', 'b', 'MSE', 'grad_m', 'grad_b']:
            df_display[col] = df_display[col].apply(lambda v: f"{v:.4f}")
        display(df_display)

# ipywidgets passes the clicked Button to each handler; it is unused here
def on_step_click(button):
    run_gd_steps_2d(1)
    plot_gd_2d()

def on_multi_step_click(button):
    run_gd_steps_2d(20)
    plot_gd_2d()

def on_reveal_click(button):
    gd_state_2d['show_landscape'] = True
    plot_gd_2d()

def on_reset_click(button):
    gd_state_2d['history'] = []
    gd_state_2d['running'] = False
    gd_state_2d['show_landscape'] = False
    with output_area:
        output_area.clear_output()
        print("Reset! Adjust starting point and learning rate, then click 'Run 1 Step'.")

step_button.on_click(on_step_click)
multi_step_button.on_click(on_multi_step_click)
reveal_button.on_click(on_reveal_click)
reset_button.on_click(on_reset_click)

print("Interactive Gradient Descent on Parameter Space")
print("="*80)
print("1. Set starting (m, b) and learning rate")
print("2. Click 'Run 1 Step' to see one GD update")
print("3. Click 'Run 20 Steps' to see longer path")
print("4. Click 'Show Full Landscape' to reveal MSE contours")
print("5. Try different starting points and learning rates")
print("\nSuggested experiments:")
print("  - Start at (0, 0) with LR = 0.1")
print("  - Try your Lab 1 best guess as starting point")
print("  - Try LR = 0.01 (too small), LR = 0.1 (good), LR = 0.5 (risky)")
print("="*80)

display(VBox([
    HBox([m0_input, b0_input, lr_input]),
    HBox([step_button, multi_step_button, reveal_button, reset_button]),
    output_area
]))
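The interactive cell runs a fixed number of steps per click. If you want to count how many steps GD needs to converge, one option is a stopping rule on the gradient norm. `gd_until_converged` below is a hypothetical helper, not part of the lab code, and the stand-in data (seed 3, line y = 0.5x + 1) is an assumption.

```python
import numpy as np

# Stand-in data (assumption): seed 3 and the line y = 0.5x + 1 are arbitrary
rng = np.random.default_rng(3)
x = np.linspace(-5, 5, 25)
y = 0.5 * x + 1.0 + rng.normal(0, 1.0, size=x.size)

def grad_mse(params):
    """Closed-form MSE gradient [dMSE/dm, dMSE/db]."""
    m, b = params
    r = y - (m * x + b)
    return np.array([-2.0 * np.mean(x * r), -2.0 * np.mean(r)])

def gd_until_converged(start, lr, tol=1e-6, max_steps=10_000):
    """Hypothetical helper: step until ||grad|| < tol; return (params, steps)."""
    params = np.asarray(start, dtype=float)
    for step in range(max_steps):
        g = grad_mse(params)
        if np.linalg.norm(g) < tol:
            return params, step
        params = params - lr * g
        if not np.all(np.isfinite(params)):
            raise FloatingPointError(f"diverged at step {step + 1} (lr={lr})")
    return params, max_steps

for lr in [0.01, 0.1]:
    p, n = gd_until_converged([0.0, 0.0], lr)
    print(f"lr={lr}: m={p[0]:.4f}, b={p[1]:.4f} after {n} steps")
```

The tolerance `tol` is an arbitrary choice: a looser tolerance stops earlier with a slightly less accurate fit, and the finite-value check turns a diverging run into an explicit error instead of a silent overflow.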

## 6. Systematic Comparison: Three Learning Rates

Let's compare three learning rates starting from the same point (0, 0).

In [None]:
# Compare three learning rates
start_params = np.array([0.0, 0.0])
learning_rates = [0.01, 0.1, 0.5]
max_steps = 50
colors_lr = ['blue', 'green', 'red']

results_2d = {}

for lr in learning_rates:
    history_params = [start_params.copy()]
    history_mse = [compute_mse(start_params)]
    
    params_current = start_params.copy()
    for step in range(max_steps):
        grad = compute_gradient_mse(params_current)
        params_current = params_current - lr * grad
        
        history_params.append(params_current.copy())
        history_mse.append(compute_mse(params_current))
    
    results_2d[lr] = {
        'params': np.array(history_params),
        'mse': np.array(history_mse)
    }

# Plot comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6), dpi=100)

# Left: Parameter space with contours
contour = ax1.contourf(M_grid, B_grid, MSE_grid, levels=25, cmap='viridis', alpha=0.4)
plt.colorbar(contour, ax=ax1, label='MSE')

# Plot paths for each LR
for i, lr in enumerate(learning_rates):
    params_hist = results_2d[lr]['params']
    ax1.plot(params_hist[:, 0], params_hist[:, 1], 'o-', 
            color=colors_lr[i], label=f'LR = {lr}', linewidth=2, markersize=4, alpha=0.8)

# Mark start and minimum
ax1.scatter([start_params[0]], [start_params[1]], c='black', marker='o', 
           s=200, edgecolors='white', linewidths=2, zorder=10, label='Start')
ax1.scatter([m_min_grid], [b_min_grid], c='white', marker='*', 
           s=300, edgecolors='black', linewidths=2, zorder=10, label='Grid minimum')

ax1.set_xlabel('m (slope)', fontsize=11)
ax1.set_ylabel('b (intercept)', fontsize=11)
ax1.set_title('GD Paths for Different Learning Rates', fontsize=12, fontweight='bold')
ax1.legend(fontsize=9, loc='best')
ax1.grid(True, alpha=0.3)
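ax1.set_xlim(-5, 5)
ax1.set_ylim(-5, 5)
ax1.set_aspect('equal')  # equal axis scales, so right angles to the contours are drawn as right angles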

# Right: MSE convergence
for i, lr in enumerate(learning_rates):
    mse_hist = results_2d[lr]['mse']
    ax2.plot(range(len(mse_hist)), mse_hist, 'o-',
            color=colors_lr[i], label=f'LR = {lr}', linewidth=2, markersize=4, alpha=0.8)

ax2.axhline(y=mse_min_grid, color='black', linestyle='--', 
           linewidth=2, alpha=0.3, label='Grid minimum MSE')
ax2.set_xlabel('Step', fontsize=11)
ax2.set_ylabel('MSE', fontsize=11)
ax2.set_title('Convergence Comparison', fontsize=12, fontweight='bold')
ax2.legend(fontsize=9)
ax2.grid(True, alpha=0.3)
ax2.set_yscale('log')  # Log scale to see differences better

plt.tight_layout()
plt.show()

# Summary table
print("\nComparison Summary:")
print("="*80)
summary_data = []
for lr in learning_rates:
    final_params = results_2d[lr]['params'][-1]
    final_mse = results_2d[lr]['mse'][-1]
    
    summary_data.append({
        'Learning Rate': lr,
        'Final m': f"{final_params[0]:.4f}",
        'Final b': f"{final_params[1]:.4f}",
        'Final MSE': f"{final_mse:.6f}",
        'MSE minus grid min': f"{final_mse - mse_min_grid:.6f}"  # may be < 0: the grid minimum is approximate
    })

display(pd.DataFrame(summary_data))

print(f"\nGrid minimum: m = {m_min_grid:.4f}, b = {b_min_grid:.4f}, MSE = {mse_min_grid:.6f}")
print(f"True parameters: m = {true_m:.4f}, b = {true_b:.4f}")
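The three runs behave differently because of the curvature of the MSE surface. MSE is quadratic in (m, b), so its Hessian is constant, and GD on a quadratic converges only if the learning rate is below 2 / λ_max, where λ_max is the Hessian's largest eigenvalue. The sketch below computes that bound for the lab's x-values (`np.linspace(-5, 5, 25)`). Since the Hessian depends only on x, not on y or the noise, the bound should be the same for every group code.

```python
import numpy as np

# Same x-values as the lab; the Hessian of MSE does not depend on y
x = np.linspace(-5, 5, 25)
N = x.size

# Hessian of MSE(m, b) = (2/N) * [[sum x^2, sum x], [sum x, N]]
H = (2.0 / N) * np.array([[np.sum(x ** 2), np.sum(x)],
                          [np.sum(x),      N        ]])
eigvals = np.linalg.eigvalsh(H)          # ascending order
lr_max = 2.0 / eigvals[-1]               # GD on a quadratic is stable iff 0 < lr < 2 / lambda_max

print("Hessian eigenvalues:", np.round(eigvals, 3))
print(f"Stable learning rates: lr < {lr_max:.4f}")

# Per-step error factor along each eigen-direction: |1 - lr * lambda|
for lr in [0.01, 0.1, 0.5]:
    factors = np.abs(1 - lr * eigvals)
    print(f"lr={lr}: factors={np.round(factors, 3)}, stable={bool(np.all(factors < 1))}")
```

Each factor |1 - η λ| is how much the error along that eigen-direction shrinks per step (or grows, if it exceeds 1). When 1 - η λ is negative, that parameter overshoots and changes sides every step, which appears as zig-zagging on the contour plot.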

## 7. Visualize the Fitted Line

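Let's see what the fitted line from the LR = 0.1 run looks like. Don't expect it to match the true line exactly: because noise was added to the data, the line with the lowest MSE on these 25 points differs slightly from the line that generated them.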

In [None]:
# Use the best result (LR = 0.1)
best_lr = 0.1
final_params = results_2d[best_lr]['params'][-1]
m_gd, b_gd = final_params

# Plot data with fitted line
plt.figure(figsize=(10, 6), dpi=100)
plt.scatter(x_data, y_data, s=80, alpha=0.6, edgecolors='black', label='Data')
plt.plot(x_data, m_gd * x_data + b_gd, 'r-', linewidth=3, 
        label=f'GD fit: y = {m_gd:.3f}x + {b_gd:.3f}')
plt.plot(x_data, true_m * x_data + true_b, 'g--', linewidth=2, alpha=0.5,
        label=f'True line: y = {true_m:.3f}x + {true_b:.3f}')

plt.xlabel('x', fontsize=12)
plt.ylabel('y', fontsize=12)
plt.title('Gradient Descent Line Fit', fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"GD found: m = {m_gd:.4f}, b = {b_gd:.4f}")
print(f"True params: m = {true_m:.4f}, b = {true_b:.4f}")
print(f"Difference from true params: Δm = {abs(m_gd - true_m):.4f}, Δb = {abs(b_gd - true_b):.4f}")
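Why don't the GD parameters match the true ones? GD minimizes MSE on the observed, noisy points, so at convergence its line should have an MSE no larger than the true line's. The sketch below checks this on stand-in data (seed 2 and the true line y = 2x - 1 are assumptions) with a plain GD loop that uses the closed-form gradient and LR = 0.1.

```python
import numpy as np

# Stand-in data (assumption): seed 2 and the true line y = 2x - 1 are arbitrary
rng = np.random.default_rng(2)
m_true, b_true = 2.0, -1.0
x = np.linspace(-5, 5, 25)
y = m_true * x + b_true + rng.normal(0, 1.0, size=x.size)

def mse(m, b):
    return np.mean((y - (m * x + b)) ** 2)

# Plain GD from (0, 0) with LR = 0.1, using the closed-form MSE gradient
params = np.zeros(2)
lr = 0.1
for _ in range(200):
    r = y - (params[0] * x + params[1])              # residuals
    grad = np.array([-2.0 * np.mean(x * r), -2.0 * np.mean(r)])
    params -= lr * grad
m_fit, b_fit = params

print(f"MSE at GD fit:      {mse(m_fit, b_fit):.4f}")
print(f"MSE at true params: {mse(m_true, b_true):.4f}")
```

The gap between the fitted and true parameters comes from the noise in the sample, not from GD failing to converge.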

## Questions for Your Answer Sheet

**Q7.** Describe the shape of the GD path on the MSE contour plot:
- Is it straight or curved? Why?
- Does the path always move perpendicular to contour lines?
- What happens to step size as GD approaches the minimum?

**Q8.** Compare GD to your manual exploration in Lab 1:
- Which was more efficient at finding the minimum?
- How many "guesses" did you make in Lab 1 vs. GD steps here?
- What advantage does GD have over random/systematic search?

**Q9.** Based on the learning rate comparison:
- Which LR converged fastest to near-minimum?
- What happens with LR = 0.01? (Too slow?)
- What happens with LR = 0.5? (Oscillation? Divergence?)
- How would you choose a good LR for a new problem?

## Next Steps

1. **Answer Q6, Q7, Q8, Q9** on your answer sheet
2. **Return to the LMS** and continue to Module 3
3. In Module 3, you'll explore learning rate effects in more depth!