# Lab 2 - Module 4: Mountain Landscape - Gradient Descent Limitations

**Learning Objectives:**
- Understand how GD gets trapped at local optima
- See the importance of starting position
- Connect to real neural network training challenges
- Recognize when GD fails despite perfect implementation

**Time:** ~15 minutes

---

**IMPORTANT:** Enter the same group code from Lab 1!

## Connection to Lab 1

In **Lab 1 Module 5**, you:
- Manually explored a mountain landscape with multiple peaks
- Chose (x, y) locations to sample
- Tried to find the global maximum
- Discovered that finding all peaks was hard!

**Today:** Gradient descent will face the same challenge!

### The Problem:
- GD only sees **local slope** (gradient)
- GD follows the steepest uphill direction
- GD gets **stuck** at the first peak it reaches
- GD cannot "see" that a higher peak exists elsewhere
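
A one-dimensional sketch makes this concrete. The function `two_bumps` below is a hypothetical example (not the lab's landscape), with a low peak near x = -2 and a taller one near x = 2. The learning rate and step count are also illustrative assumptions. Started on either side of the valley, plain gradient ascent should settle on whichever peak is uphill from its start.

```python
import numpy as np

def two_bumps(x):
    """Hypothetical 1-D landscape: low peak near x=-2, high peak near x=2."""
    return 1.0 * np.exp(-(x + 2.0)**2 / 2.0) + 3.0 * np.exp(-(x - 2.0)**2 / 2.0)

def slope(x, h=1e-4):
    """Central-difference estimate of the local slope at x."""
    return (two_bumps(x + h) - two_bumps(x - h)) / (2 * h)

def climb(x, learning_rate=0.2, n_steps=200):
    """Repeatedly step uphill using only the local slope."""
    for _ in range(n_steps):
        x = x + learning_rate * slope(x)
    return x

left_peak = climb(-3.0)   # starts on the flank of the low peak
right_peak = climb(1.0)   # starts on the flank of the high peak
print(f"start -3.0 -> x = {left_peak:.2f}, height = {two_bumps(left_peak):.2f}")
print(f"start  1.0 -> x = {right_peak:.2f}, height = {two_bumps(right_peak):.2f}")
```

The run that starts at -3.0 never learns that a taller peak exists, because nothing in its update uses information from beyond its immediate neighborhood.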

## 1. Setup: Generate Same Mountain Landscape

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from ipywidgets import FloatText, Button, Checkbox, Output, VBox, HBox
from IPython.display import display

group_code = int(input("Enter your group code: "))
np.random.seed(group_code)

# Consume the random draws Lab 1 used for the line and parabola parameters,
# so the mountain parameters below come out identical to Lab 1
_ = np.random.uniform(-3, 3)  # true_m
_ = np.random.uniform(-5, 5)  # true_b
_ = np.random.uniform(0.5, 2.0)  # hidden_a
_ = np.random.uniform(-4, 4)  # hidden_b
_ = np.random.uniform(-10, 10)  # hidden_c

# Mountain landscape parameters (same as Lab 1)
num_peaks = np.random.randint(3, 6)
peak_centers = []
peak_heights = []
peak_widths = []

for _ in range(num_peaks):
    cx = np.random.uniform(-3.0, 3.0)
    cy = np.random.uniform(-3.0, 3.0)
    height = np.random.uniform(1.0, 5.0)
    width = np.random.uniform(0.6, 1.5)
    peak_centers.append((cx, cy))
    peak_heights.append(height)
    peak_widths.append(width)

def mountain_height(pos):
    """Mountain landscape with multiple Gaussian peaks.
    
    Args:
        pos: [x, y] pair, or an array of shape (..., 2) holding many points
    
    Returns:
        Altitude (scalar or array)
    """
    if isinstance(pos, (list, tuple)):
        pos = np.array(pos)
    
    x = pos[..., 0] if pos.ndim > 1 else pos[0]
    y = pos[..., 1] if pos.ndim > 1 else pos[1]
    
    z = np.zeros_like(x, dtype=float)
    for (cx, cy), h, w in zip(peak_centers, peak_heights, peak_widths):
        z += h * np.exp(-(((x - cx)**2 + (y - cy)**2) / (2 * w**2)))
    return z

# Find global maximum
grid_size = 100
x_vals = np.linspace(-5, 5, grid_size)
y_vals = np.linspace(-5, 5, grid_size)
Xg, Yg = np.meshgrid(x_vals, y_vals)
# Evaluate the whole grid at once; mountain_height accepts arrays of shape (..., 2)
Zg = mountain_height(np.stack([Xg, Yg], axis=-1))

flat_idx = np.argmax(Zg)
i_max, j_max = np.unravel_index(flat_idx, Zg.shape)
x_global = Xg[i_max, j_max]
y_global = Yg[i_max, j_max]
h_global = Zg[i_max, j_max]

print("✓ Mountain landscape loaded (same as Lab 1 Module 5)")
print(f"Number of peaks: {num_peaks}")
print(f"Global maximum at: ({x_global:.2f}, {y_global:.2f}) with height {h_global:.2f}")
print("(Revealed for learning purposes)")

## 2. Gradient Ascent (Uphill Climbing)

To find peaks (maxima), we'll use **gradient ascent** instead of descent:

```
new = old + learning_rate × gradient  (note: + instead of -)
```

This moves **uphill** toward local peaks.
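
As a quick worked example of the sign difference, take the hypothetical function f(x) = -(x - 2)^2, whose single peak is at x = 2 and whose gradient is -2(x - 2). The function, starting point, and learning rate are chosen only for illustration.

```python
def grad_f(x):
    """Analytic gradient of the illustrative function f(x) = -(x - 2)**2."""
    return -2.0 * (x - 2.0)

old = 0.0
learning_rate = 0.1

ascent = old + learning_rate * grad_f(old)   # + sign: moves toward the peak at x = 2
descent = old - learning_rate * grad_f(old)  # - sign: moves away from the peak

print(f"ascent step:  {old} -> {ascent}")
print(f"descent step: {old} -> {descent}")
```

At x = 0 the gradient is 4, so the ascent step lands at 0.4 (closer to the peak) while the descent step lands at -0.4 (farther from it).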

In [None]:
def compute_gradient_mountain(pos, h=1e-4):
    """Compute numerical gradient of mountain height.
    
    Args:
        pos: [x, y] position
        h: step size for the central-difference approximation
    
    Returns:
        [grad_x, grad_y]
    """
    pos = np.array(pos, dtype=float)
    grad = np.zeros(2)
    
    for i in range(2):
        pos_forward = pos.copy()
        pos_backward = pos.copy()
        pos_forward[i] += h
        pos_backward[i] -= h
        grad[i] = (mountain_height(pos_forward) - mountain_height(pos_backward)) / (2 * h)
    
    return grad

def gradient_ascent_step(pos, learning_rate):
    """One step of gradient ascent (climbing uphill)."""
    grad = compute_gradient_mountain(pos)
    return pos + learning_rate * grad  # + for ascent (uphill)

def run_gradient_ascent(start_pos, learning_rate, max_steps=50, tol=1e-4):
    """Run gradient ascent from starting position.

    Stops early once the gradient norm drops below tol.
    """
    pos_current = np.array(start_pos, dtype=float)
    history = {
        'pos': [pos_current.copy()],
        'height': [float(mountain_height(pos_current))],
        'grad': [],
        'converged': False,
        'n_steps': 0
    }

    for step in range(max_steps):
        grad = compute_gradient_mountain(pos_current)
        history['grad'].append(grad)

        # Check convergence (gradient near zero) before taking another step
        if np.linalg.norm(grad) < tol:
            history['converged'] = True
            break

        # Ascent step
        pos_current = gradient_ascent_step(pos_current, learning_rate)

        history['pos'].append(pos_current.copy())
        history['height'].append(float(mountain_height(pos_current)))
        history['n_steps'] = step + 1

    return history

print("✓ Gradient ascent functions ready")
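
One way to gain confidence in a central-difference gradient is to compare it against a case where the exact answer is known. The sketch below does this for a single Gaussian peak with hypothetical parameters (`cx`, `cy`, `height`, `width` are made up for the check and are not the lab's values). It re-implements the same differencing scheme rather than calling the lab's functions, so it runs on its own.

```python
import numpy as np

# Hypothetical single peak, used only for this check
cx, cy, height, width = 1.0, -0.5, 3.0, 1.2

def single_peak(pos):
    """Altitude of one Gaussian peak at pos = [x, y]."""
    x, y = pos
    return height * np.exp(-((x - cx)**2 + (y - cy)**2) / (2 * width**2))

def analytic_gradient(pos):
    """Exact gradient of a Gaussian peak: -(pos - center) / width**2 * f(pos)."""
    x, y = pos
    f = single_peak(pos)
    return np.array([-(x - cx) / width**2 * f, -(y - cy) / width**2 * f])

def numerical_gradient(pos, h=1e-4):
    """Central differences along each axis."""
    pos = np.array(pos, dtype=float)
    grad = np.zeros(2)
    for i in range(2):
        step = np.zeros(2)
        step[i] = h
        grad[i] = (single_peak(pos + step) - single_peak(pos - step)) / (2 * h)
    return grad

point = [0.0, 0.0]
error = np.max(np.abs(numerical_gradient(point) - analytic_gradient(point)))
print(f"max abs difference: {error:.2e}")
```

The central-difference error shrinks in proportion to h squared, so with h = 1e-4 the two gradients should agree to many decimal places.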

## 3. Prediction Questions (Answer BEFORE running)

**Q13 (PREDICTION):** 

Think about starting positions:
- Starting at (1, 1): Which peak will gradient ascent reach?
- Will it find the global maximum?
- What if you start at (-2, 3)? Same peak or different?
- Why does starting position matter so much?

Write your predictions on the answer sheet!

## 4. Interactive: Run GD from Different Starting Points

In [None]:
# State for multiple GD runs
gd_runs = []
colors_runs = ['red', 'blue', 'green', 'orange', 'purple', 'brown']
show_landscape = False

# Widgets
x_start_input = FloatText(description="Start x:", value=0.0, step=0.5)
y_start_input = FloatText(description="Start y:", value=0.0, step=0.5)
lr_mountain_input = FloatText(description="Learning rate:", value=0.5, step=0.1)
run_ga_button = Button(description="Run Gradient Ascent", button_style='success')
reveal_landscape_button = Button(description="Reveal Landscape", button_style='primary')
reset_mountain_button = Button(description="Reset All", button_style='warning')
output_mountain = Output()

def plot_mountain_results():
    """Plot all GD runs on mountain landscape."""
    with output_mountain:
        output_mountain.clear_output(wait=True)
        
        if not gd_runs:
            print("No gradient ascent runs yet. Enter starting position and click 'Run Gradient Ascent'.")
            return
        
        # Create plot
        fig, ax = plt.subplots(figsize=(10, 8), dpi=100)
        
        # Show landscape if revealed
        if show_landscape:
            contour = ax.contourf(Xg, Yg, Zg, levels=30, cmap='plasma', alpha=0.6)
            plt.colorbar(contour, ax=ax, label='Altitude')
            
            # Mark global maximum
            ax.scatter([x_global], [y_global], c='yellow', marker='*', 
                      s=500, edgecolors='black', linewidths=3, zorder=20, label='Global maximum')
        
        # Plot each GD run
        for i, run in enumerate(gd_runs):
            pos_hist = np.array(run['history']['pos'])
            h_hist = run['history']['height']
            color = colors_runs[i % len(colors_runs)]
            
            # Path
            ax.plot(pos_hist[:, 0], pos_hist[:, 1], 'o-', 
                   color=color, linewidth=2, markersize=5, alpha=0.8, 
                   label=f"Run {i+1}: start ({run['start'][0]:.1f}, {run['start'][1]:.1f})")
            
            # Mark start
            ax.scatter([pos_hist[0, 0]], [pos_hist[0, 1]], 
                      c='white', marker='o', s=150, edgecolors=color, linewidths=3, zorder=10)
            
            # Mark end (peak reached)
            ax.scatter([pos_hist[-1, 0]], [pos_hist[-1, 1]], 
                      c=color, marker='*', s=300, edgecolors='black', linewidths=2, zorder=11)
        
        ax.set_xlabel('x', fontsize=12)
        ax.set_ylabel('y', fontsize=12)
        ax.set_title('Gradient Ascent on Mountain Landscape', fontsize=14, fontweight='bold')
        ax.set_xlim(-5, 5)
        ax.set_ylim(-5, 5)
        ax.grid(True, alpha=0.3)
        ax.legend(fontsize=9, loc='best')
        
        plt.tight_layout()
        plt.show()
        
        # Summary table
        print("\nSummary of Gradient Ascent Runs:")
        print("="*80)
        summary_data = []
        for i, run in enumerate(gd_runs):
            final_pos = run['history']['pos'][-1]
            final_h = run['history']['height'][-1]
            summary_data.append({
                'Run': i + 1,
                'Start (x, y)': f"({run['start'][0]:.2f}, {run['start'][1]:.2f})",
                'Final (x, y)': f"({final_pos[0]:.2f}, {final_pos[1]:.2f})",
                'Final height': f"{final_h:.3f}",
                'Steps': run['history']['n_steps'],
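                # Compare positions, not heights: a different peak can have a similar height
                'Found global?': 'Yes' if np.hypot(final_pos[0] - x_global, final_pos[1] - y_global) < 0.5 else 'No'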
            })
        
        display(pd.DataFrame(summary_data))
        print(f"\nGlobal maximum: ({x_global:.2f}, {y_global:.2f}), height = {h_global:.2f}")
        print("\nNotice: Different starting points can lead to different local peaks!")

def on_run_ga_click(b):
    x_start = x_start_input.value
    y_start = y_start_input.value
    lr = lr_mountain_input.value
    
    # Run gradient ascent
    history = run_gradient_ascent([x_start, y_start], lr, max_steps=50)
    
    gd_runs.append({
        'start': [x_start, y_start],
        'lr': lr,
        'history': history
    })
    
    plot_mountain_results()

def on_reveal_click(b):
    global show_landscape
    show_landscape = True
    plot_mountain_results()

def on_reset_click(b):
    global gd_runs, show_landscape
    gd_runs = []
    show_landscape = False
    with output_mountain:
        output_mountain.clear_output()
        print("Reset! Try new starting positions.")

run_ga_button.on_click(on_run_ga_click)
reveal_landscape_button.on_click(on_reveal_click)
reset_mountain_button.on_click(on_reset_click)

print("Interactive Gradient Ascent on Mountain Landscape")
print("="*80)
print("1. Enter starting (x, y) position (range: -5 to 5)")
print("2. Set learning rate (try 0.5)")
print("3. Click 'Run Gradient Ascent' to watch gradient ascent climb to a local peak")
print("4. Try MULTIPLE starting points to see different outcomes")
print("5. Click 'Reveal Landscape' to see all peaks\n")
print("Suggested starting points to try:")
print("  - (0, 0)")
print("  - (2, 2)")
print("  - (-3, 1)")
print("  - (1, -2)")
print("="*80)

display(VBox([
    HBox([x_start_input, y_start_input, lr_mountain_input]),
    HBox([run_ga_button, reveal_landscape_button, reset_mountain_button]),
    output_mountain
]))

## Questions for Your Answer Sheet

**Q14.** Based on your experiments:
- Did gradient ascent find the global maximum?
- Why or why not?
- How did the starting position affect which peak was reached?
- Can GD "see" distant peaks? Why not?

**Q15.** Connection to neural network training:
- Neural network loss landscapes are high-dimensional and non-convex, with many local minima and saddle points
- How is this mountain problem similar to training a neural network?
- What strategies might help overcome getting stuck at local optima?
  (Hint: random restarts, momentum, adaptive learning rates, etc.)
- Why is neural network optimization still successful despite local minima?

## Key Takeaways: Limitations of Gradient Descent

### What We Learned:

1. **Local Optima Problem:**
   - GD only sees **local slope**, not the full landscape
   - Gets trapped at the first peak/valley it reaches
   - Cannot escape local optima on its own

2. **Starting Point Matters:**
   - Different starting points → different local optima
   - GD alone cannot tell you whether you found the global optimum
   - In ML: Weight initialization is crucial!

3. **Gradient Descent is "Greedy":**
   - Always follows steepest descent/ascent
   - Never "backtracks" or explores
   - Deterministic path from starting point

### Real Machine Learning Solutions:

1. **Random Restarts:** Try multiple starting points
2. **Momentum:** Add "inertia" to push through small barriers
3. **Stochastic GD:** Mini-batch noise in the gradient can help escape shallow local minima
4. **Adaptive Methods:** Adjust learning rate dynamically (Adam, RMSprop)
5. **Good Initialization:** Smart starting points (Xavier, He initialization)
6. **Acceptance of Local Minima:** In practice, "good enough" local minima work!
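
Random restarts are the simplest of these to sketch. The example below reuses a hypothetical 1-D landscape with a low peak near x = -2 and a high peak near x = 2. The search range, number of restarts, seed, and learning rate are all illustrative assumptions, not values from the lab.

```python
import numpy as np

def two_bumps(x):
    """Hypothetical 1-D landscape: low peak near x=-2, high peak near x=2."""
    return 1.0 * np.exp(-(x + 2.0)**2 / 2.0) + 3.0 * np.exp(-(x - 2.0)**2 / 2.0)

def climb(x, learning_rate=0.2, n_steps=200, h=1e-4):
    """Plain gradient ascent using a central-difference slope."""
    for _ in range(n_steps):
        grad = (two_bumps(x + h) - two_bumps(x - h)) / (2 * h)
        x = x + learning_rate * grad
    return x

rng = np.random.default_rng(0)             # fixed seed so the run is repeatable
starts = rng.uniform(-4.0, 4.0, size=10)   # assumed search range
finals = np.array([climb(s) for s in starts])
heights = two_bumps(finals)

# Keep the best result across all restarts
best_x = finals[np.argmax(heights)]
print(f"best of {len(starts)} restarts: x = {best_x:.2f}, height = {heights.max():.2f}")
```

Each individual run is still greedy and can still end on the low peak. The improvement comes only from comparing the end points of several runs, and it offers no guarantee when the basin of the best peak is small relative to the search range.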

### Why Neural Networks Still Work:

- High-dimensional spaces have many "good" local minima
- In practice, many local minima perform nearly as well as the global minimum
- Overparameterization helps: many paths to good solutions
- Modern architectures and techniques reduce the problem

## Lab 2 Complete!

Congratulations! You've completed Lab 2 and learned:

✓ The universal update rule: `new = old - learning_rate × gradient`

✓ How gradient descent automates the manual search from Lab 1

✓ The critical importance of learning rate (Goldilocks problem)

✓ How GD navigates parameter space systematically

✓ The fundamental limitation: getting stuck at local optima

✓ Why starting position matters enormously

### Next Steps:

1. **Answer Q13, Q14, Q15** on your answer sheet
2. **Review your predictions** - How accurate were they?
3. **Return to the LMS** to submit your work

Great work! 🎉