# Lab 2 - Module 0: Setup and The Update Rule

**Run this module first!**

This module:
1. Regenerates your group's Lab 1 parameters from your group code
2. Introduces the universal update rule
3. Shows how gradient descent automates what you did in Lab 1

**Time:** ~5 minutes (Prelab)

## 0. From Lab 1 to Lab 2

In Lab 1, you **manually searched** for optimal parameters:
- Adjusted sliders to find the best line fit
- Chose (x, y) values to explore landscapes
- Used "warmer/colder" feedback to guide your choices

**Today:** You'll learn how **gradient descent** automates this process.

### Key Insight

> **Gradient descent is the automated version of what you did manually in Lab 1.**

Instead of guessing which direction to move, gradient descent:
- Computes the **local slope** (gradient)
- Moves **downhill** automatically
- Uses a simple update rule: `new = old + change`

## 1. Load Your Group Code

Enter the **same group code** you used in Lab 1.

This will regenerate the same landscapes so you can compare manual search (Lab 1) vs. gradient descent (Lab 2).

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import json
from datetime import datetime

group_code = int(input("Enter your group code from Lab 1: "))

np.random.seed(group_code)
print(f"Random seed set using group code: {group_code}")
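
The cell above assumes the input is a whole number that `np.random.seed` accepts, which is an integer from 0 to 2³² − 1. A minimal sketch of a validation step is shown below; the helper `parse_group_code` is hypothetical and not part of the lab code.

```python
MAX_SEED = 2**32 - 1  # largest value accepted by np.random.seed


def parse_group_code(text):
    """Convert raw input to a valid seed, or raise ValueError (hypothetical helper)."""
    code = int(text.strip())  # raises ValueError for non-numeric input
    if not 0 <= code <= MAX_SEED:
        raise ValueError(f"Group code must be between 0 and {MAX_SEED}, got {code}")
    return code


print(parse_group_code(" 1234 "))
```

Wrapping the `input(...)` call with a helper like this would turn a typo into a readable error message before any parameters are generated.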

## 2. Generate Same Parameters as Lab 1

This code regenerates the exact same landscapes you explored in Lab 1.

In [None]:
# Generate all parameters for this group (same as Lab 1)
np.random.seed(group_code)

# Line fitting parameters
true_m = np.random.uniform(-3, 3)
true_b = np.random.uniform(-5, 5)

# Hidden function parameters (parabola: f(x) = a(x-b)^2 + c)
hidden_a = np.random.uniform(0.5, 2.0)
hidden_b = np.random.uniform(-4, 4)
hidden_c = np.random.uniform(-10, 10)

# Mountain landscape parameters
num_peaks = np.random.randint(3, 6)

# Store all parameters
group_data = {
    "group_code": group_code,
    "created_at": datetime.now().isoformat(),
    "parameters": {
        "line_slope": float(true_m),
        "line_intercept": float(true_b),
        "hidden_func_a": float(hidden_a),
        "hidden_func_b": float(hidden_b),
        "hidden_func_c": float(hidden_c),
        "num_mountain_peaks": int(num_peaks)
    }
}

# Save to file
filename = f"lab2_group_{group_code}_params.json"
with open(filename, "w") as f:
    json.dump(group_data, f, indent=2)

print("=" * 60)
print("✓ Parameters Generated!")
print("=" * 60)
print(f"Group Code: {group_code}")
print(f"Parameters file: {filename}")
print()
print("These are the SAME landscapes you explored in Lab 1.")
print("Now you'll see how gradient descent navigates them automatically!")
print("=" * 60)
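
Reseeding works because NumPy's generator is deterministic: the same seed followed by the same calls in the same order yields the same numbers. The sketch below illustrates that property. The helper `draw_line_parameters` and the seeds 42 and 43 are assumptions made for the example; the helper mirrors only the first two draws from the cell above.

```python
import numpy as np


def draw_line_parameters(seed):
    """Seed the global generator, then draw slope and intercept (illustrative helper)."""
    np.random.seed(seed)
    m = np.random.uniform(-3, 3)  # same range as true_m
    b = np.random.uniform(-5, 5)  # same range as true_b
    return m, b


first = draw_line_parameters(42)
second = draw_line_parameters(42)
other = draw_line_parameters(43)

print(first == second)  # same seed and same draw order: values match
print(first == other)   # different seed: values differ
```

Changing the order or number of draws changes every later value, so the parameters match Lab 1 only if they are drawn in the same sequence.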

## 3. The Universal Update Rule

Gradient descent uses one simple rule to update parameters:

### The Rule:

```
new_value = old_value + change
```

Where:
```
change = -learning_rate × gradient
```

- **gradient** = local slope (direction of steepest ascent)
- **learning_rate** = how big a step to take
- **negative sign** = go downhill (opposite of gradient)

### Why This Works:

- **Gradient** tells us which direction is uphill
- We want to go **downhill** (minimize error)
- So we move in the **negative gradient** direction
- **Learning rate** controls how far we step
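
The rule and its sign convention can be sketched as a small function. The name `gradient_step` and the sample numbers are illustrative assumptions, not part of the lab code.

```python
def gradient_step(old_value, gradient, learning_rate):
    """Apply one update: new_value = old_value + change."""
    change = -learning_rate * gradient  # negative sign: step opposite the slope
    return old_value + change


# Positive slope (uphill to the right), so the step goes left.
print(gradient_step(old_value=3.0, gradient=6.0, learning_rate=0.1))

# Negative slope (uphill to the left), so the step goes right.
print(gradient_step(old_value=-3.0, gradient=-6.0, learning_rate=0.1))
```

In both calls the step is about 0.6 units long (0.1 × 6), and in both it points downhill: the first result is below 3.0 and the second is above -3.0.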

## 4. Visualizing the Update Rule

Let's see how the update rule works on a simple curve.

Consider a parabola: `f(x) = (x - 2)²`

The minimum is at x = 2. Let's see how gradient descent finds it.

In [None]:
# Simple parabola centered at x=2
def simple_parabola(x):
    return (x - 2)**2

# Plot the function
x_vals = np.linspace(-1, 5, 300)
y_vals = simple_parabola(x_vals)  # NumPy applies the function to the whole array at once

plt.figure(figsize=(12, 8), dpi=100)
plt.plot(x_vals, y_vals, 'b-', linewidth=2, label='f(x) = (x-2)²')
plt.axvline(x=2, color='green', linestyle='--', linewidth=2, alpha=0.5, label='Minimum at x=2')
plt.xlabel('x', fontsize=12)
plt.ylabel('f(x)', fontsize=12)
plt.title('Simple Parabola: f(x) = (x-2)²', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.legend(fontsize=11)
plt.tight_layout()
plt.show()

print("The minimum of f(x) = (x-2)² is at x = 2, where f(2) = 0")

## 5. Understanding Slope and Movement Direction

For a function of one variable, the **gradient** is just the slope at a point.

Let's mark three points and see their slopes:

In [None]:
# Compute numerical gradient (slope) at three points
def compute_slope(x, func, h=1e-5):
    """Compute slope using central difference"""
    return (func(x + h) - func(x - h)) / (2 * h)

# Three test points
test_points = [0.0, 2.0, 4.0]
test_colors = ['red', 'green', 'blue']
test_labels = ['Left of minimum (x=0)', 'At minimum (x=2)', 'Right of minimum (x=4)']

# Plot
plt.figure(figsize=(12, 8), dpi=100)
plt.plot(x_vals, y_vals, 'b-', linewidth=2, alpha=0.3, label='f(x) = (x-2)²')

print("Gradient Analysis:")
print("=" * 60)

for i, x in enumerate(test_points):
    y = simple_parabola(x)
    slope = compute_slope(x, simple_parabola)
    
    # Plot point
    plt.scatter(x, y, c=test_colors[i], s=200, edgecolors='black', 
               linewidths=2, zorder=5, label=test_labels[i])
    
    # Print analysis
    print(f"\nPoint: x = {x:.1f}")
    print(f"  f(x) = {y:.2f}")
    print(f"  Gradient (slope) = {slope:.2f}")
    
    # Slopes within ±0.1 of zero are treated as flat (tolerance for numerical error)
    if slope > 0.1:
        direction = "LEFT (downhill)"
        print(f"  → Slope is POSITIVE → Function increasing → Move {direction}")
    elif slope < -0.1:
        direction = "RIGHT (downhill)"
        print(f"  → Slope is NEGATIVE → Function decreasing → Move {direction}")
    else:
        print("  → Slope is ZERO → At the minimum!")

plt.xlabel('x', fontsize=12)
plt.ylabel('f(x)', fontsize=12)
plt.title('Gradient Direction Analysis', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.legend(fontsize=10)
plt.tight_layout()
plt.show()

print("\n" + "=" * 60)
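
For this parabola the slope can also be written exactly as f'(x) = 2(x - 2), which gives a quick check on the central-difference estimate. The sketch below compares the two; `analytic_slope` is a name introduced here for illustration.

```python
def simple_parabola(x):
    return (x - 2) ** 2


def compute_slope(x, func, h=1e-5):
    """Central-difference estimate of the slope, as in the cell above."""
    return (func(x + h) - func(x - h)) / (2 * h)


def analytic_slope(x):
    """Exact derivative of (x - 2)^2 (illustrative helper)."""
    return 2 * (x - 2)


for x in [0.0, 2.0, 4.0]:
    numeric = compute_slope(x, simple_parabola)
    exact = analytic_slope(x)
    print(f"x = {x:.1f}: numeric = {numeric:.6f}, exact = {exact:.6f}")
```

For a quadratic, the central difference has no truncation error, so the two values differ only by floating-point rounding.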

## 6. Interactive: Explore Learning Rate

Now let's see how **learning rate** affects the step size.

Starting from x = 0, we'll take a single gradient descent step with each of several learning rates and compare where they land.

In [None]:
# Starting point
x_start = 0.0
f_start = simple_parabola(x_start)
grad_start = compute_slope(x_start, simple_parabola)

print("Starting Point:")
print(f"  x = {x_start}")
print(f"  f(x) = {f_start:.2f}")
print(f"  gradient = {grad_start:.2f}")
print()

# Try different learning rates
learning_rates = [0.05, 0.2, 0.5, 1.0]

print("One Step of Gradient Descent:")
print("=" * 60)

for lr in learning_rates:
    # Apply update rule: new = old - learning_rate × gradient
    change = -lr * grad_start
    x_new = x_start + change
    f_new = simple_parabola(x_new)
    
    print(f"\nLearning Rate = {lr}:")
    print(f"  change = -({lr}) × ({grad_start:.2f}) = {change:.2f}")
    print(f"  x_new = {x_start} + {change:.2f} = {x_new:.2f}")
    print(f"  f(x_new) = {f_new:.2f}")
    print(f"  Improvement: {f_start:.2f} → {f_new:.2f} (reduced by {f_start - f_new:.2f})")

print("\n" + "=" * 60)

## 7. Visualize Learning Rate Effects

In [None]:
# Visualize the steps
plt.figure(figsize=(12, 8), dpi=100)
plt.plot(x_vals, y_vals, 'b-', linewidth=2, alpha=0.3, label='f(x) = (x-2)²')

# Mark starting point
plt.scatter(x_start, f_start, c='red', s=300, marker='o', 
           edgecolors='black', linewidths=2, zorder=5, label='Start')

# Show each step
colors_lr = ['orange', 'yellow', 'lightgreen', 'lightblue']
for i, lr in enumerate(learning_rates):
    change = -lr * grad_start
    x_new = x_start + change
    f_new = simple_parabola(x_new)
    
    plt.scatter(x_new, f_new, c=colors_lr[i], s=200, marker='*',
               edgecolors='black', linewidths=1.5, zorder=5, 
               label=f'LR={lr}: x={x_new:.2f}')
    
    # Arrow showing movement
    plt.annotate('', xy=(x_new, f_new), xytext=(x_start, f_start),
                arrowprops=dict(arrowstyle='->', color=colors_lr[i], lw=2))

plt.xlabel('x', fontsize=12)
plt.ylabel('f(x)', fontsize=12)
plt.title('Effect of Learning Rate on Step Size', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.legend(fontsize=10, loc='upper right')
plt.tight_layout()
plt.show()

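print("Notice:")
print("  • Larger learning rate → Bigger step")
print("  • Smaller learning rate → Smaller step")
print("  • All steps move in the SAME direction (downhill from the start)")
print("  • LR=0.5 lands exactly on the minimum; LR=1.0 overshoots to x=4,")
print("    where f(x) is no lower than at the start")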
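
A single step only shows the size of the move. Repeating the update shows where each learning rate ends up. The sketch below is an illustration, not lab code: it uses the exact slope 2(x - 2) instead of `compute_slope`, and the helper name `run_gradient_descent` and the choice of 10 steps are assumptions.

```python
def slope(x):
    """Exact derivative of f(x) = (x - 2)^2."""
    return 2 * (x - 2)


def run_gradient_descent(x_start, learning_rate, num_steps):
    """Repeat new = old - learning_rate * gradient and return every x visited."""
    path = [x_start]
    x = x_start
    for _ in range(num_steps):
        x = x - learning_rate * slope(x)
        path.append(x)
    return path


for lr in [0.05, 0.2, 0.5, 1.0]:
    path = run_gradient_descent(x_start=0.0, learning_rate=lr, num_steps=10)
    print(f"LR={lr}: x after 10 steps = {path[-1]:.4f}")
```

For this parabola each step multiplies the distance to the minimum by (1 - 2 × learning_rate). With LR = 0.05 and 0.2 the distance shrinks steadily, with LR = 0.5 it reaches x = 2 in one step, and with LR = 1.0 the factor is -1, so x bounces between 0 and 4 without improving.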

## Questions for Your Answer Sheet

**Q1.** If the gradient (slope) at a point is **positive**, which direction does gradient descent move? Why?

**Q2.** What happens to the step size if:
- (a) The learning rate is very large (e.g., 10.0)?
- (b) The slope (gradient magnitude) is very large?
- (c) Both learning rate and slope are small?

## Next Steps

1. **Answer Q1 and Q2** on your answer sheet
2. **Return to the LMS** and continue to Module 1
3. **Remember your group code** for the next modules!

In the next module, you'll apply gradient descent to the hidden parabola from Lab 1 and see how it automates the search you did manually.