# Gradient Descent Homework

Welcome to your homework on **Gradient Descent with PyTorch**!

In this notebook, you will practice the concepts from today's lesson:
- Linear regression
- Loss functions (MSE)
- Manual gradient computation
- PyTorch Autograd
- Multiple linear regression

**Instructions:**
- Fill in the code cells marked with `# YOUR CODE HERE`
- Do NOT change any other code
- Run all cells to check your answers

---

```python
import torch
import numpy as np
```

---

## Question 1: Conceptual Questions

Answer the following questions by assigning the correct option letter (as a string) to each variable.

**1a)** In the equation $y = wx + b$, what does $w$ represent?
- A: The input feature
- B: The weight (how strongly the input affects the output)
- C: The predicted output
- D: The loss value

**1b)** What does the Mean Squared Error (MSE) loss function measure?
- A: The sum of all predictions
- B: The average of squared differences between predictions and actual values
- C: The number of training examples
- D: The learning rate

**1c)** If the gradient $\frac{\partial \text{Loss}}{\partial w}$ is **positive**, what should we do to $w$?
- A: Increase $w$
- B: Decrease $w$
- C: Keep $w$ the same
- D: Set $w$ to zero

**1d)** What does `requires_grad=True` do in PyTorch?
- A: It makes the tensor immutable
- B: It tells PyTorch to track operations on this tensor for automatic gradient computation
- C: It normalizes the tensor values
- D: It converts the tensor to a NumPy array

```python
# YOUR CODE HERE
answer_1a = "__"  # Replace __ with A, B, C, or D
answer_1b = "__"
answer_1c = "__"
answer_1d = "__"
```

---

## Question 2: Compute MSE Loss Manually

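Given the predictions and actual values below, compute the MSE loss **manually**. Implement the formula yourself with elementwise PyTorch operations instead of calling a built-in loss such as `torch.nn.MSELoss`.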

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{\text{predicted},i} - y_{\text{actual},i}\right)^2$$

| Predicted | Actual |
|-----------|--------|
| 50        | 48     |
| 60        | 63     |
| 70        | 68     |
| 80        | 82     |
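The formula reduces to three steps: take the elementwise differences, square them, and average. Below is a minimal plain-Python sketch of those steps on hypothetical numbers (deliberately not the table above). Your PyTorch answer should follow the same steps using tensor operations.

```python
# Hypothetical values, not the Question 2 data
preds = [2.0, 4.0, 6.0]
actuals = [1.0, 5.0, 6.0]

# Square each prediction error, then average over the n examples
squared_errors = [(p - a) ** 2 for p, a in zip(preds, actuals)]  # [1.0, 1.0, 0.0]
mse = sum(squared_errors) / len(squared_errors)
print(mse)  # 0.666...
```

Squaring makes every error count as positive. It also weights large errors more heavily: an error of 3 contributes 9, while an error of 1 contributes only 1.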

```python
y_pred = torch.tensor([50, 60, 70, 80], dtype=torch.float32)
y_actual = torch.tensor([48, 63, 68, 82], dtype=torch.float32)

# YOUR CODE HERE
# Compute the MSE loss using the formula above
mse_loss = ...
```

---

## Question 3: Manual Gradient Computation

Given a simple model $y = wx + b$ with **one data point** $x = 3$, $y_{\text{actual}} = 10$, and current parameters $w = 2$, $b = 1$:

1. Compute the prediction $\hat{y}$
2. Compute the MSE loss (with just one data point: $\text{Loss} = (\hat{y} - y)^2$)
3. Compute the gradient $\frac{\partial \text{Loss}}{\partial w} = 2(\hat{y} - y) \cdot x$
4. Compute the gradient $\frac{\partial \text{Loss}}{\partial b} = 2(\hat{y} - y)$
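One way to convince yourself that the two derivative formulas are correct is a numerical check. Nudge $w$ (or $b$) by a tiny amount $h$ and measure how much the loss changes. The sketch below uses hypothetical values ($x = 2$, $y = 5$, $w = 1$, $b = 0.5$) rather than the ones in this question. It compares the analytic gradients with central finite differences, $\frac{L(w+h) - L(w-h)}{2h}$.

```python
# Hypothetical values, not the Question 3 numbers
x, y = 2.0, 5.0
w, b = 1.0, 0.5

def loss(w, b):
    """Squared error for a single data point."""
    y_hat = w * x + b
    return (y_hat - y) ** 2

# Analytic gradients from the formulas above
y_hat = w * x + b                 # 2.5
dw = 2 * (y_hat - y) * x          # -10.0
db = 2 * (y_hat - y)              # -5.0

# Central finite differences as an independent check
h = 1e-5
dw_numeric = (loss(w + h, b) - loss(w - h, b)) / (2 * h)
db_numeric = (loss(w, b + h) - loss(w, b - h)) / (2 * h)

print(dw, dw_numeric)
print(db, db_numeric)
```

Both gradients are negative here because the prediction (2.5) is below the target (5). Gradient descent therefore increases $w$ and $b$.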

```python
x_val = 3.0
y_actual_val = 10.0
w_val = 2.0
b_val = 1.0

# YOUR CODE HERE
# Step 1: Compute prediction
y_hat = ...

# Step 2: Compute loss
loss_val = ...

# Step 3: Compute dL/dw
dw = ...

# Step 4: Compute dL/db
db = ...

print(f"Prediction: {y_hat}")
print(f"Loss: {loss_val}")
print(f"dL/dw: {dw}")
print(f"dL/db: {db}")
```

---

## Question 4: Linear Regression with Manual Gradients

A gym trainer wants to predict how many **calories** a person burns based on the number of **minutes exercised**.

| Minutes Exercised | Calories Burned |
|-------------------|-----------------|
| 10                | 120             |
| 20                | 220             |
| 30                | 340             |
| 40                | 430             |
| 50                | 550             |

Build a linear model $y = wx + b$ using **manual gradient descent** (no autograd).

Use:
- `learning_rate = 0.0001`
- `epochs = 5000`
- `torch.manual_seed(0)` before initializing weights
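The loop you will write follows a fixed pattern: predict, measure the error, compute both gradients, then step each parameter against its gradient. Here is a minimal plain-Python sketch of that pattern on hypothetical toy data generated from $y = 3x - 1$ (not the calories data). The zero initialization, `learning_rate = 0.05`, and `epochs = 2000` are assumptions chosen for this toy data only.

```python
# Hypothetical toy data from y = 3x - 1 (not the calories data)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [-1.0, 2.0, 5.0, 8.0]
n = len(xs)

w, b = 0.0, 0.0          # fixed start instead of torch.randn, for reproducibility
learning_rate = 0.05     # assumption: stable for these small inputs
epochs = 2000

for epoch in range(epochs):
    preds = [w * xi + b for xi in xs]                      # 1. predictions
    errors = [p - t for p, t in zip(preds, ys)]            # error = y_pred - y
    loss = sum(e ** 2 for e in errors) / n                 # 2. MSE loss
    dw = (2 / n) * sum(xi * e for xi, e in zip(xs, errors))  # dL/dw
    db = (2 / n) * sum(errors)                             # dL/db
    w -= learning_rate * dw                                # step against the gradient
    b -= learning_rate * db
    if epoch % 500 == 0:
        print(f"epoch {epoch}: loss = {loss:.6f}")

print(f"w = {w:.4f}, b = {b:.4f}")  # close to 3 and -1
```

The calories data uses inputs up to 50. Because the gradient for $w$ scales with $x$, a step size like 0.05 would make that problem diverge, which is why this question uses `learning_rate = 0.0001`.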

```python
# --- Data ---
x = torch.tensor([10, 20, 30, 40, 50], dtype=torch.float32)
y = torch.tensor([120, 220, 340, 430, 550], dtype=torch.float32)

# --- Initialize parameters ---
torch.manual_seed(0)

# YOUR CODE HERE
w = ...  # Initialize w using torch.randn(1)
b = ...  # Initialize b using torch.randn(1)

learning_rate = 0.0001
epochs = 5000
n = len(x)
```

```python
# YOUR CODE HERE
# Implement the gradient descent loop
# For each epoch:
#   1. Compute predictions: y_pred = x * w + b
#   2. Compute MSE loss: loss = mean((y_pred - y)^2)
#   3. Compute error: error = y_pred - y
#   4. Compute gradient for w: dw = (2/n) * sum(x * error)
#   5. Compute gradient for b: db = (2/n) * sum(error)
#   6. Update w and b
#   7. Print loss every 500 epochs

for epoch in range(epochs):
    pass  # Replace this with your implementation

print(f"\nFinal weight: {w.item():.4f}")
print(f"Final bias: {b.item():.4f}")
```

---

## Question 5: Linear Regression with Autograd

Now solve the **same problem** from Question 4, but using **PyTorch Autograd** instead of computing gradients manually.

Remember the key differences:
- Use `requires_grad=True` when creating `w` and `b`
- Call `loss.backward()` to compute gradients automatically
- Update parameters inside `torch.no_grad()`
- Zero out gradients with `.grad.zero_()` after each update
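The four bullet points above fit together as shown in this minimal sketch. It performs one autograd step on a single hypothetical data point ($x = 2$, $y = 5$, $w = 1$, $b = 0.5$; not the calories data), with an assumed learning rate of 0.01. It also checks that `backward()` gives the same gradients as the manual formulas from Question 3.

```python
import torch

# Hypothetical single data point and starting parameters
x = torch.tensor(2.0)
y = torch.tensor(5.0)
w = torch.tensor(1.0, requires_grad=True)   # track operations on w
b = torch.tensor(0.5, requires_grad=True)   # track operations on b
lr = 0.01                                   # assumed learning rate

y_hat = w * x + b            # forward pass: 2.5
loss = (y_hat - y) ** 2      # 6.25
loss.backward()              # fills w.grad and b.grad

grad_w = w.grad.item()       # -10.0
grad_b = b.grad.item()       # -5.0
manual_dw = 2 * (y_hat.item() - y.item()) * x.item()
manual_db = 2 * (y_hat.item() - y.item())
print(grad_w, manual_dw)
print(grad_b, manual_db)

# Update in place without recording these operations in the graph
with torch.no_grad():
    w -= lr * w.grad         # 1.0 - 0.01 * (-10) = 1.1
    b -= lr * b.grad         # 0.5 - 0.01 * (-5)  = 0.55

# Gradients accumulate across backward() calls, so reset them
w.grad.zero_()
b.grad.zero_()
```

Skipping `zero_()` would make the next `backward()` call add its gradients to the old ones. The update would then use the sum of gradients from all previous epochs.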

```python
# --- Data (same as Q4) ---
x = torch.tensor([10, 20, 30, 40, 50], dtype=torch.float32)
y = torch.tensor([120, 220, 340, 430, 550], dtype=torch.float32)

# --- Initialize parameters ---
torch.manual_seed(0)

# YOUR CODE HERE
w_auto = ...  # Initialize with torch.randn(1) and requires_grad=True
b_auto = ...  # Initialize with torch.randn(1) and requires_grad=True

learning_rate = 0.0001
epochs = 5000
n = len(x)
```

```python
# YOUR CODE HERE
# Implement gradient descent using autograd
# For each epoch:
#   1. Forward pass: y_pred = x * w_auto + b_auto
#   2. Compute MSE loss
#   3. Call loss.backward()
#   4. Update parameters inside torch.no_grad() block
#   5. Zero out gradients
#   6. Print loss every 500 epochs

for epoch in range(epochs):
    pass  # Replace this with your implementation

print(f"\nFinal weight (autograd): {w_auto.item():.4f}")
print(f"Final bias (autograd): {b_auto.item():.4f}")
```

---

## Question 6: Multiple Linear Regression

A real estate agent wants to predict **house prices** (in $1000s) based on two features:
- **Size** (in hundreds of sq ft)
- **Number of bedrooms**

| Size (100s sqft) | Bedrooms | Price ($1000s) |
|------------------|----------|----------------|
| 8                | 2        | 250            |
| 12               | 3        | 340            |
| 15               | 3        | 395            |
| 18               | 4        | 450            |
| 22               | 4        | 520            |
| 25               | 5        | 580            |

Build a multiple linear regression model:
$$\text{Price} = w_1 \cdot \text{Size} + w_2 \cdot \text{Bedrooms} + b$$

Use Autograd with:
- `learning_rate = 0.0001`
- `epochs = 10000`
- `torch.manual_seed(42)` before initializing weights
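The `.squeeze()` step in the training cell matters because of broadcasting. With `w` of shape `(2, 1)`, `x @ w + b` has shape `(n, 1)`, but `y` has shape `(n,)`. Subtracting them then broadcasts to an `(n, n)` matrix and silently produces the wrong loss. The sketch below shows this on a hypothetical 3-sample batch (not the house data), with hand-picked weights `[[1], [0]]`.

```python
import torch

# Hypothetical toy batch: 3 samples, 2 features each
x = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])
y = torch.tensor([1.0, 2.0, 3.0])          # shape (3,)

w = torch.tensor([[1.0], [0.0]])           # shape (2, 1)
b = torch.zeros(1)                         # shape (1,)

y_pred = x @ w + b                         # (3, 2) @ (2, 1) -> (3, 1); b broadcasts
print(y_pred.shape)                        # torch.Size([3, 1])

wrong = (y_pred - y) ** 2                  # (3, 1) - (3,) broadcasts to (3, 3)
right = (y_pred.squeeze() - y) ** 2        # (3,) - (3,) -> (3,)
print(wrong.shape, right.shape)            # torch.Size([3, 3]) torch.Size([3])

print(wrong.mean().item(), right.mean().item())  # about 4.333 vs 1.667
```

Both versions run without an error, so a quick shape print is the easiest way to catch the mistake.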

```python
# --- Data ---
# x shape: (6, 2) -> 6 houses, 2 features each
x = torch.tensor([[8, 2],
                  [12, 3],
                  [15, 3],
                  [18, 4],
                  [22, 4],
                  [25, 5]], dtype=torch.float32)

y = torch.tensor([250, 340, 395, 450, 520, 580], dtype=torch.float32)

# --- Initialize parameters ---
torch.manual_seed(42)

# YOUR CODE HERE
w = ...  # Shape (2, 1) with requires_grad=True
b = ...  # Shape (1,) with requires_grad=True

learning_rate = 0.0001
epochs = 10000
n = len(x)
```

```python
# YOUR CODE HERE
# Implement gradient descent using autograd for multiple linear regression
# Remember:
#   - Use matrix multiplication (@) for the forward pass: y_pred = x @ w + b
#   - Use .squeeze() on y_pred before computing loss
#   - Print loss every 1000 epochs

for epoch in range(epochs):
    pass  # Replace this with your implementation

print(f"\nFinal weights: w1={w[0].item():.4f}, w2={w[1].item():.4f}")
print(f"Final bias: {b.item():.4f}")
```

---

## Question 7: Learning Rate Experiment

The **learning rate** is a critical hyperparameter. Let's see what happens when it's too large or too small.

Using the simple dataset below, run gradient descent with **three different learning rates** and observe the final loss after 1000 epochs.

Fill in the training loop, then answer the question at the end.
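The effect of the learning rate is easiest to see on a one-parameter toy problem. This sketch minimizes $f(w) = w^2$ (a hypothetical objective, not the Question 7 data), whose gradient is $2w$. Each update $w \leftarrow w - \text{lr} \cdot 2w$ multiplies $w$ by $(1 - 2\,\text{lr})$, so the iterates shrink toward the minimum at 0 only when $|1 - 2\,\text{lr}| < 1$.

```python
# Hypothetical toy objective f(w) = w**2, minimized at w = 0, gradient 2*w
def gradient_descent(lr, steps=20, w=1.0):
    for _ in range(steps):
        w = w - lr * 2 * w   # each step multiplies w by (1 - 2 * lr)
    return w

results = {lr: gradient_descent(lr) for lr in [0.01, 0.1, 1.1]}
for lr, w_final in results.items():
    print(f"lr = {lr}: w after 20 steps = {w_final:.4g}")
```

In this run, the smallest rate is still far from 0 after 20 steps and the middle rate gets close. The largest rate overshoots on every step, flipping sign and growing in magnitude. For MSE on linear regression, the stable range depends on the data: larger input values make the loss more sharply curved, so smaller learning rates are needed.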

```python
# --- Data ---
x = torch.tensor([1, 2, 3, 4, 5], dtype=torch.float32)
y = torch.tensor([3, 5, 7, 9, 11], dtype=torch.float32)  # y = 2x + 1

learning_rates = [0.001, 0.01, 0.5]
epochs = 1000
n = len(x)

for lr in learning_rates:
    torch.manual_seed(42)
    w = torch.randn(1, dtype=torch.float32, requires_grad=True)
    b = torch.randn(1, dtype=torch.float32, requires_grad=True)

    # YOUR CODE HERE
    # Run gradient descent for the given number of epochs using autograd,
    # storing the latest loss.item() in final_loss
    final_loss = 0.0
    for epoch in range(epochs):
        pass  # Replace with your implementation

    print(f"LR = {lr}: Final Loss = {final_loss:.4f}, w = {w.item():.4f}, b = {b.item():.4f}")
```

**Question:** Based on your results above, which learning rate gave the best result and why?

- A: `0.001` - Small learning rate, slow convergence, might not reach the minimum
- B: `0.01` - Good balance, converges well to the minimum
- C: `0.5` - Large learning rate, might overshoot and diverge

```python
# YOUR CODE HERE
best_lr_answer = "__"  # Replace __ with A, B, or C
```

---

## Bonus: Make a Prediction

Using the model you trained in **Question 4** (Calories burned), predict:

1. How many calories would someone burn in **35 minutes** of exercise?
2. How many calories would someone burn in **60 minutes** of exercise?

```python
# Reuse the model trained in Q4.
# Note: Q6 and Q7 reassign `w` and `b`, so re-run both Q4 cells right before
# this one (or use `w_auto` and `b_auto` from Q5, which solve the same problem).

# YOUR CODE HERE
x_new = torch.tensor([35, 60], dtype=torch.float32)
predictions = ...  # Use the model equation: y = w * x + b

print(f"Predicted calories for 35 minutes: {predictions[0].item():.1f}")
print(f"Predicted calories for 60 minutes: {predictions[1].item():.1f}")
```

---

## Summary

In this homework you practiced:

1. **Core concepts** of linear regression, loss functions, and gradients
2. **Manual MSE computation** from predictions and actual values
3. **Manual gradient computation** using the derivative formulas
4. **Full gradient descent loop** with manually computed gradients
5. **Autograd-based gradient descent** using `loss.backward()`
6. **Multiple linear regression** with matrix multiplication
7. **Learning rate effects** on model training

Great work! A loss function, its gradients, and gradient-based parameter updates are the building blocks of how deep learning models are trained.