<div style="text-align: center;">

# **Spring 2026 &mdash; CIS 3813<br>Advanced Data Science<br>(Introduction to Machine Learning)**
### Week 2: How Models Learn: Gradient Descent in Action

</div>

---

## **Lab Instructions**

**Due Date**: Monday, 09 February @ 6:00 PM (with grace period until Wednesday, 11 February @ 11:59 PM)

In this lab, you will:
1. Implement gradient descent from scratch
2. Explore the effects of different learning rates
3. Visualize the optimization process
4. Apply gradient descent to a real dataset

**AI Usage**: 
- You may use AI tools for this lab
- **REQUIRED**: Include AI attribution using the format shown in the syllabus
- For B/A level credit, include detailed attribution in markdown cells

## **Grading**

| Component | Points |
|-----------|--------|
| Exercise 1: Implementing Gradient Descent | 25 |
| Exercise 2: Learning Rate Experiments | 25 |
| Exercise 3: Visualizing the Optimization Path | 25 |
| Exercise 4: Application & Reflection | 15 |
| In-Class Mastery Assessment (Week 3) | 10 |
| **Total** | **100** |


---

## **AI Assistance Declaration**

**Tools used:** [e.g., ChatGPT-4 / GitHub Copilot / Claude / None]

**Sections with AI help:** [e.g., "Exercise 3: Visualizing the Optimization Path"]

**What I learned:** [Brief description of key concepts AI helped you understand]

**What I did independently:** [Sections you completed without AI assistance]

---

In [None]:
# Run this cell first to import required libraries
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

# Set up plotting defaults
# Optional: plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 12

# Set random seed for reproducibility
np.random.seed(42)

---

# **Exercise 1: Implementing Gradient Descent (25 points)**

## **1.1 Generate Training Data (5 points)**

Create a dataset for linear regression with the following specifications:
- True relationship: $y = 2.5x - 1.5 + \epsilon$
- $\epsilon$ is Gaussian noise with mean 0 and standard deviation 1.5
- 100 data points
- X values uniformly distributed between -3 and 3
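
As a rough illustration of the two sampling calls this part relies on, the sketch below builds a small dataset for a different, made-up relationship ($y = 0.5x + 1$ with noise standard deviation 0.2). The names `demo_rng`, `demo_X`, `demo_y` and all of the numeric values are assumptions chosen for illustration, not the lab's required values. A local `RandomState` is used so the global seed set earlier is left untouched; `np.random.uniform` and `np.random.normal` accept the same arguments.

```python
import numpy as np

# Local generator with an illustrative seed (does not touch the global seed of 42)
demo_rng = np.random.RandomState(0)

# Hypothetical toy relationship (NOT the lab's): y = 0.5x + 1 + noise
demo_n = 50
demo_X = demo_rng.uniform(low=0.0, high=10.0, size=demo_n)    # uniform inputs in [0, 10)
demo_noise = demo_rng.normal(loc=0.0, scale=0.2, size=demo_n)  # Gaussian noise, mean 0
demo_y = 0.5 * demo_X + 1.0 + demo_noise

print(demo_X.shape, demo_y.shape)
print(f"X range: [{demo_X.min():.2f}, {demo_X.max():.2f}]")
```

Both arrays are one-dimensional with one entry per sample, which is the shape the plotting cell expects.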

In [None]:
# TODO: Generate the training data
# Hint: Use np.random.uniform() for X and np.random.normal() for noise

true_m = 2.5
true_b = -1.5
n_samples = 100

# YOUR CODE HERE
X = None  # Replace with your code
y = None  # Replace with your code

# Visualize your data
plt.figure(figsize=(10, 6))
plt.scatter(X, y, alpha=0.6, label='Data points')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Generated Dataset')
plt.legend()
plt.show()

## **1.2 Implement the Loss Function (5 points)**

In the lecture, we used Mean Squared Error (MSE). For this lab, you will implement **Mean Absolute Error (MAE)** instead:

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - (mx_i + b)|$$

**Why MAE?** Unlike MSE, MAE does not square the errors, so a single large error cannot dominate the loss. This makes MAE more robust to outliers. It also means the gradients are different from the MSE gradients, so you will need to derive them yourself in Part 1.3.
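
To make the robustness claim concrete, here is a small sketch that compares how much each loss grows when one large error is added. It works directly on a made-up array of residuals; the helper names `mse_of` and `mae_of` and all values are assumptions for illustration, and this is not the `compute_mae` function you are asked to write.

```python
import numpy as np

# Hypothetical residuals (errors) for five points; values are made up
residuals = np.array([0.5, -1.0, 0.8, -0.3, 1.2])
with_outlier = np.append(residuals, 20.0)  # add one large outlier error

def mse_of(errors):
    """Mean of squared errors."""
    return np.mean(errors ** 2)

def mae_of(errors):
    """Mean of absolute errors."""
    return np.mean(np.abs(errors))

# How many times larger does each loss become once the outlier is included?
mse_ratio = mse_of(with_outlier) / mse_of(residuals)
mae_ratio = mae_of(with_outlier) / mae_of(residuals)
print(f"MSE grows by a factor of {mse_ratio:.1f}")
print(f"MAE grows by a factor of {mae_ratio:.1f}")
```

The squared loss reacts far more strongly to the single outlier than the absolute loss does.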

In [None]:
def compute_mae(m, b, X, y):
    """
    Compute Mean Absolute Error.

    Parameters:
    -----------
    m : float
        Slope parameter
    b : float
        Intercept parameter
    X : numpy array
        Input features
    y : numpy array
        Target values

    Returns:
    --------
    float : Mean absolute error
    """
    # YOUR CODE HERE
    # Hint: Use np.abs() for absolute value
    pass  # Replace with your implementation

In [None]:
# Test your MAE function
# These should make sense: better parameters = lower MAE
print(f"MAE with true parameters (m=2.5, b=-1.5): {compute_mae(2.5, -1.5, X, y):.4f}")
print(f"MAE with m=0, b=0: {compute_mae(0, 0, X, y):.4f}")
print(f"MAE with m=5, b=5: {compute_mae(5, 5, X, y):.4f}")

## **1.3 Implement the Gradient Calculation (10 points)**

Now you need to derive and implement the gradients for MAE. This is different from MSE!

**Your task:** Derive the partial derivatives of MAE with respect to $m$ and $b$.

**Hints:**
- The derivative of $|u|$ with respect to $u$ is $\text{sign}(u)$, where $\text{sign}(u) = +1$ if $u > 0$, $-1$ if $u < 0$, and $0$ if $u = 0$ (strictly, $|u|$ is not differentiable at $u = 0$; using $0$ there is a common convention)
- Use the chain rule
- Let $e_i = y_i - (mx_i + b)$ be the error for point $i$
- NumPy provides the `np.sign()` function
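
The first hint can be checked numerically. The sketch below, using made-up sample values, prints what `np.sign` returns and compares it with a central-difference estimate of the slope of $|u|$. The step `h` and the array `u` are illustrative assumptions.

```python
import numpy as np

u = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
signs = np.sign(u)
print(signs)  # -1 for negatives, 0 at zero, +1 for positives

# Central-difference estimate of d|u|/du with a small step h
h = 1e-6
numeric_slope = (np.abs(u + h) - np.abs(u - h)) / (2 * h)

# Away from u = 0 the numerical slope agrees with np.sign(u)
nonzero = u != 0
matches = np.allclose(numeric_slope[nonzero], signs[nonzero])
print(matches)
```

The same central-difference idea can be used to sanity-check the gradients you derive by hand.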

**Write your derivation here (in comments or markdown):**

In [None]:
def compute_gradients(m, b, X, y):
    """
    Compute the gradients of MAE with respect to m and b.

    Parameters:
    -----------
    m : float
        Current slope parameter
    b : float
        Current intercept parameter
    X : numpy array
        Input features
    y : numpy array
        Target values

    Returns:
    --------
    tuple : (dm, db) - gradients with respect to m and b

    Hint: Use np.sign() to get the sign of the errors
    """
    # YOUR CODE HERE
    # Step 1: Compute predictions
    # Step 2: Compute errors (y - predictions)
    # Step 3: Compute sign of errors
    # Step 4: Apply chain rule for dm and db
    pass  # Replace with your implementation

In [None]:
# Test your gradient function
# At the true parameters, gradients should be close to zero (not exactly zero, because of the noise)
dm, db = compute_gradients(true_m, true_b, X, y)
print(f"Gradients at true parameters: dm = {dm:.4f}, db = {db:.4f}")
print("(These should be close to zero)")

# At poor parameters, gradients should point toward better values
dm, db = compute_gradients(0, 0, X, y)
print(f"\nGradients at m=0, b=0: dm = {dm:.4f}, db = {db:.4f}")
print("(dm should be negative, indicating we need to increase m)")

## **1.4 Implement the Full Gradient Descent Algorithm (5 points)**

Now put it all together! Implement gradient descent that:
1. Initializes m and b to given starting values
2. Iteratively updates parameters using the gradient
3. Records the history of parameters and loss values
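
The three steps above can be sketched on a much simpler problem. The example below minimizes a hypothetical one-parameter loss, $(w - 3)^2$, rather than the lab's MAE; `toy_loss`, `toy_grad`, `step_size`, and the iteration count are all assumptions for illustration.

```python
def toy_loss(w):
    """Hypothetical loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def toy_grad(w):
    """Derivative of toy_loss with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0            # 1. initialize the parameter
step_size = 0.1    # illustrative learning rate
toy_history = []

for _ in range(50):
    # 2. move against the gradient
    w = w - step_size * toy_grad(w)
    # 3. record the updated parameter and its loss
    toy_history.append({'w': w, 'loss': toy_loss(w)})

print(f"final w = {w:.4f}, final loss = {toy_history[-1]['loss']:.6f}")
```

Your `gradient_descent` follows the same pattern with two parameters, using `compute_gradients` and `compute_mae` in place of the toy functions.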

In [None]:
def gradient_descent(X, y, m_init, b_init, learning_rate, n_iterations):
    """
    Perform gradient descent to find optimal m and b.

    Parameters:
    -----------
    X : numpy array
        Input features
    y : numpy array
        Target values
    m_init : float
        Initial slope value
    b_init : float
        Initial intercept value
    learning_rate : float
        Step size for gradient descent
    n_iterations : int
        Number of iterations to run

    Returns:
    --------
    tuple : (final_m, final_b, history)
        final_m, final_b: optimized parameters
        history: list of dicts with keys 'm', 'b', 'mae' for each iteration
    """
    m = m_init
    b = b_init
    history = []

    for i in range(n_iterations):
        # YOUR CODE HERE
        # 1. Compute gradients
        # 2. Update m and b
        # 3. Compute MAE and record in history (use 'mae' as key)
        pass  # Replace with your implementation

    return m, b, history

In [None]:
# Test your gradient descent implementation
final_m, final_b, history = gradient_descent(
    X, y,
    m_init=0.0,
    b_init=0.0,
    learning_rate=0.1,
    n_iterations=100
)

print(f"Starting: m=0.0, b=0.0")
print(f"Final: m={final_m:.4f}, b={final_b:.4f}")
print(f"True: m={true_m}, b={true_b}")
print(f"\nFinal MAE: {history[-1]['mae']:.4f}")

---

# **Exercise 2: Learning Rate Experiments (25 points)**

## **2.1 Experiment with Different Learning Rates (15 points)**

Run gradient descent with the following learning rates:
- 0.001 (very small)
- 0.01 (small)
- 0.1 (medium)
- 0.5 (large)
- 1.0 (very large)

Use the same starting point (m=0, b=0) and 500 iterations for each.
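
For intuition about why the step size matters, here is a minimal sketch on a hypothetical quadratic, $g(w) = w^2$, whose gradient is $2w$. Each update multiplies $w$ by $(1 - 2 \cdot \text{step size})$, so the three illustrative step sizes below shrink, oscillate, and diverge respectively. The function `run_toy_descent` and its values are assumptions, and this quadratic is only an illustration; the MAE loss in this lab need not behave the same way.

```python
def run_toy_descent(step_size, n_steps=20, w_start=1.0):
    """Minimize g(w) = w**2 (gradient 2w) and return every iterate."""
    w = w_start
    iterates = [w]
    for _ in range(n_steps):
        w = w - step_size * 2.0 * w
        iterates.append(w)
    return iterates

for step_size in [0.1, 0.9, 1.1]:
    final_w = run_toy_descent(step_size)[-1]
    print(f"step_size={step_size}: |w| after 20 steps = {abs(final_w):.4g}")
```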

In [None]:
# TODO: Run experiments with different learning rates
learning_rates = [0.001, 0.01, 0.1, 0.5, 1.0]
results = {}

for lr in learning_rates:
    # YOUR CODE HERE
    # Run gradient descent and store results
    pass

# Print final parameters for each learning rate
for lr in learning_rates:
    # YOUR CODE HERE
    # Print results
    pass

## **2.2 Visualize Convergence (10 points)**

Create a plot showing MAE vs. iteration for each learning rate. Use a log scale for the y-axis.

In [None]:
# TODO: Create convergence plot
plt.figure(figsize=(12, 6))

# YOUR CODE HERE
# Plot MAE history for each learning rate

plt.xlabel('Iteration')
plt.ylabel('MAE (log scale)')
plt.title('Convergence for Different Learning Rates')
plt.yscale('log')
plt.legend()
plt.show()

### **Question 2.2.1 (Answer in the markdown cell below)**

Based on your experiments:
1. Which learning rate converged fastest?
2. What happened with the very large learning rate (1.0)?
3. What is the tradeoff between using a small vs. large learning rate?

**Your Answer:**

1. [Your answer here]

2. [Your answer here]

3. [Your answer here]

---

# **Exercise 3: Visualizing the Optimization Path (25 points)**

## **3.1 Create a Loss Surface (10 points)**

Create a contour plot showing the loss surface (MAE as a function of m and b).
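
Evaluating a function on a grid is the core of this part. The sketch below does it for a hypothetical bowl-shaped function, not the MAE; `toy_surface`, the grid ranges, and the variable names are assumptions for illustration. Note that with `np.meshgrid`, rows follow the second argument and columns follow the first.

```python
import numpy as np

def toy_surface(p, q):
    """Hypothetical bowl-shaped surface with its minimum at (1.0, -0.5)."""
    return (p - 1.0) ** 2 + 2.0 * (q + 0.5) ** 2

p_values = np.linspace(-1, 3, 41)       # grid spacing 0.1
q_values = np.linspace(-2, 1, 31)       # grid spacing 0.1
P, Q = np.meshgrid(p_values, q_values)  # both have shape (31, 41)

# Explicit double loop: row i follows q, column j follows p
surface = np.zeros_like(P)
for i in range(P.shape[0]):
    for j in range(P.shape[1]):
        surface[i, j] = toy_surface(P[i, j], Q[i, j])

# Locate the smallest value on the grid
i_min, j_min = np.unravel_index(np.argmin(surface), surface.shape)
print(P.shape, P[i_min, j_min], Q[i_min, j_min])
```

An array filled this way can be passed straight to `plt.contourf(P, Q, surface, levels=20)` or `plt.contour`.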

In [None]:
# TODO: Create the loss surface
# Create a grid of m and b values
m_range = np.linspace(-1, 5, 100)
b_range = np.linspace(-4, 2, 100)
M, B = np.meshgrid(m_range, b_range)

# Compute MAE for each (m, b) combination
Z = np.zeros_like(M)
# YOUR CODE HERE
# Fill in Z with MAE values

# Create contour plot
plt.figure(figsize=(10, 8))
# YOUR CODE HERE
# Create contour plot and mark the true minimum

plt.xlabel('m (slope)')
plt.ylabel('b (intercept)')
plt.title('Loss Surface (MAE)')
plt.colorbar(label='MAE')
plt.show()

## **3.2 Plot Optimization Paths (15 points)**

On the same contour plot, show the optimization path for three different starting points:
- Start 1: (m=4, b=1)
- Start 2: (m=-0.5, b=-3)
- Start 3: (m=1, b=1)

Use learning_rate=0.1 and 200 iterations.
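
To draw a path, the recorded parameters have to be pulled out of the history into plain arrays. A minimal sketch, assuming the list-of-dicts format described in Part 1.4 and using made-up values:

```python
import numpy as np

# Hypothetical history in the list-of-dicts format; values are made up
toy_path_history = [
    {'m': 0.5, 'b': 0.2, 'mae': 3.1},
    {'m': 1.1, 'b': -0.1, 'mae': 2.4},
    {'m': 1.6, 'b': -0.5, 'mae': 1.9},
]

# One array per axis of the contour plot
path_m = np.array([step['m'] for step in toy_path_history])
path_b = np.array([step['b'] for step in toy_path_history])

print(path_m, path_b)
print("end point:", (path_m[-1], path_b[-1]))
```

These arrays can then be drawn over the contour with something like `plt.plot(path_m, path_b, 'o-')`. If your history does not include the starting point itself, you may want to prepend it so the path begins at the right place.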

In [None]:
# TODO: Run gradient descent from each starting point
starting_points = [
    (4.0, 1.0, 'red', 'Start 1'),
    (-0.5, -3.0, 'blue', 'Start 2'),
    (1.0, 1.0, 'green', 'Start 3')
]

# YOUR CODE HERE
# Run gradient descent for each starting point and store histories

In [None]:
# TODO: Create visualization with all paths
plt.figure(figsize=(12, 10))

# Draw contour
# YOUR CODE HERE

# Draw paths for each starting point
# YOUR CODE HERE
# Mark starting points with circles, ending points with stars

# Mark true minimum
plt.scatter([true_m], [true_b], color='black', s=300, marker='X',
            label='True Minimum', zorder=10)

plt.xlabel('m (slope)')
plt.ylabel('b (intercept)')
plt.title('Gradient Descent Paths from Different Starting Points')
plt.legend()
plt.show()

### **Question 3.2.1 (Answer in the markdown cell below)**

Do all three paths converge to (approximately) the same point? Why or why not? What does this tell us about the loss surface for linear regression?

**Your Answer:**

[Your answer here]

---

# **Exercise 4: Application & Reflection (15 points)**

## **4.1 Apply to a Non-Linear Function (10 points)**

Now let's apply gradient descent to minimize a different function:

$$f(x) = x^4 - 3x^2 + 2$$

This function is not convex: it has two separate minima, one on each side of $x = 0$. Use calculus to find the derivative, then implement gradient descent.
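
One way to check a hand-derived derivative before using it is to compare it with a central-difference estimate. The sketch below does this for a different, hypothetical function, $h(x) = x^3 - 2x$, not the lab's $f(x)$; the names `h_demo`, `dh_demo`, `central_difference`, and the step `eps` are assumptions for illustration.

```python
import numpy as np

def h_demo(x):
    """Hypothetical example function (NOT the lab's f): h(x) = x^3 - 2x."""
    return x**3 - 2*x

def dh_demo(x):
    """Hand-derived derivative of h_demo: 3x^2 - 2."""
    return 3*x**2 - 2

def central_difference(func, x, eps=1e-5):
    """Numerical derivative estimate: (func(x + eps) - func(x - eps)) / (2 * eps)."""
    return (func(x + eps) - func(x - eps)) / (2 * eps)

# Compare the analytic and numerical derivatives at a few points
test_points = np.array([-2.0, -0.5, 0.0, 1.0, 2.5])
max_gap = np.max(np.abs(dh_demo(test_points) - central_difference(h_demo, test_points)))
print(f"largest disagreement: {max_gap:.2e}")
```

A large disagreement usually points to an algebra slip in the hand-derived formula.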

In [None]:
def f(x):
    """The function to minimize: f(x) = x^4 - 3x^2 + 2"""
    return x**4 - 3*x**2 + 2

def df(x):
    """Derivative of f(x). YOU NEED TO COMPUTE THIS."""
    # YOUR CODE HERE
    pass  # Replace with the derivative

def gradient_descent_1d(start, learning_rate, n_iterations):
    """1D Gradient descent for f(x)"""
    x = start
    history = [(x, f(x))]

    for _ in range(n_iterations):
        # YOUR CODE HERE
        pass

    return x, history

In [None]:
# Visualize the function first
x_plot = np.linspace(-2.5, 2.5, 200)
y_plot = f(x_plot)

plt.figure(figsize=(10, 6))
plt.plot(x_plot, y_plot, 'b-', linewidth=2, label='$f(x) = x^4 - 3x^2 + 2$')
plt.axhline(y=0, color='k', linewidth=0.5)
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Function with Multiple Minima')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# TODO: Run gradient descent from different starting points
# Try starting from: x = -2, x = 0.1, x = 2
starting_points_1d = [-2.0, 0.1, 2.0]
learning_rate = 0.005
n_iterations = 200

plt.figure(figsize=(12, 5))

# YOUR CODE HERE
# Run gradient descent from each starting point
# Create a visualization showing all paths

### **Question 4.1.1**

Did gradient descent find the same minimum from all starting points? Explain what happened.

**Your Answer:**

[Your answer here]

## **4.2 Reflection: Connecting to Faith (5 points)**

In the lecture, we connected gradient descent to Philippians 3:14: *"I press on toward the goal for the prize of the upward call of God in Christ Jesus."*

In Part 4.1, you saw that the starting point determines which minimum gradient descent reaches. On other functions, this means it can get "stuck" in a local minimum: a place that seems like the lowest point but isn't the true global minimum.

### **Question 4.2.1**

How might this relate to our spiritual journey? Can you think of ways that people (or you personally) might settle for "local minima" in life rather than pressing on toward the ultimate goal? How can we avoid getting stuck?

**Your Reflection:**

[Your thoughtful response here - aim for 3-5 sentences]

---

## **Submission Checklist**

Before submitting, make sure you have:

- [ ] Completed the AI Assistance Declaration at the top
- [ ] Exercise 1: All functions implemented and tested
- [ ] Exercise 2: Learning rate experiments complete with visualization and written answers
- [ ] Exercise 3: Loss surface and optimization paths visualized with written answers
- [ ] Exercise 4: Non-linear function optimization and reflection completed
- [ ] All code cells run without errors
- [ ] Restarted the kernel and re-ran all cells to verify everything works

### **Submission Instructions**

1. Save this notebook
2. **Restart kernel and run all cells** (Kernel → Restart & Run All)
3. Verify all outputs appear correctly (especially visualizations)
4. Check that all written responses are complete
5. Submit the `.ipynb` file to Canvas before Monday, 09 February @ 6:00 PM
   - Grace period until Wednesday, 11 February @ 11:59 PM

**Remember:** This notebook submission is worth 90% of your Week 2 Lab grade. The remaining 10% comes from next week's in-class mastery assessment.

---

## **Next Week Preview**

**Mastery Assessment (Week 3)**: Be prepared to answer 1-2 questions about gradient descent without AI assistance. Focus on:
- What is a gradient and what does it tell us?
- What happens if the learning rate is too large or too small?
- How do MAE and MSE gradients differ? (Hint: think about sign vs. magnitude)
- What is the relationship between the loss function and model parameters?