# Machine Learning Cost Function Assignment
## Exploring RMSE (Root Mean Squared Error)

**Total Points: 25**

---

### Learning Objectives
By the end of this assignment, you will be able to:
1. Understand what a cost function is in machine learning
2. Calculate RMSE (Root Mean Squared Error) from scratch
3. Compare multiple models using RMSE
4. Determine which model performs best
5. Implement mathematical formulas using basic Python

---

### Important Rules
- **You may NOT use any external libraries** (no numpy, pandas, sklearn, etc.)
- **Use only basic Python** - loops, lists, math operations
- You may use the built-in `math` module for square root only
- Show all your work in the designated code cells

---

### Point Distribution
- Part 1: Understanding RMSE (5 points)
- Part 2: Implementing RMSE Calculation (8 points)
- Part 3: Comparing Models (7 points)
- Part 4: Analysis and Conclusion (5 points)

---
## Background: What is a Cost Function?

In machine learning, a **cost function** (also called a loss function) measures how well a model's predictions match the actual values.

Think of it like this:
- Your model makes predictions
- We compare those predictions to the real values
- The cost function tells us "how wrong" the model is
- **Lower cost = Better model**

### What is RMSE?

**RMSE (Root Mean Squared Error)** is one of the most common cost functions for regression problems (predicting continuous values like house prices).

**Formula:**

RMSE = √(Sum of (Actual - Predicted)² / Number of data points)

Or more mathematically:

RMSE = √(Σ(yᵢ - ŷᵢ)² / n)

Where:
- yᵢ = actual value
- ŷᵢ = predicted value
- n = number of data points
- Σ = sum over all n data points (from i = 1 to n)
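Here is a quick worked example of the formula. It uses made-up values chosen for clean arithmetic, not the assignment data: actual values [5, 10] and predictions [4, 17].

$$
\begin{aligned}
\text{Errors: } & 5 - 4 = 1, \quad 10 - 17 = -7 \\
\text{Squared errors: } & 1^2 = 1, \quad (-7)^2 = 49 \\
\text{Mean of squared errors: } & \frac{1 + 49}{2} = 25 \\
\text{RMSE: } & \sqrt{25} = 5
\end{aligned}
$$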

### Why Square the Errors?
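1. Makes every error non-negative, so positive and negative errors cannot cancel each other out
2. Penalizes large errors more heavily: an error of 10 contributes 100 to the sum, while an error of 2 contributes only 4
3. Produces a smooth (differentiable) function, which makes it convenient for optimization methods such as gradient descent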
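The short sketch below illustrates points 1 and 2 with made-up error values (Actual - Predicted), not the assignment data. It only sums the errors and the squared errors, so it does not compute RMSE itself.

```python
# Made-up errors (Actual - Predicted), not taken from the assignment data
example_errors = [5, -5, 5, -5]

raw_total = sum(example_errors)                      # 0: the +5 and -5 errors cancel out
squared_total = sum(e ** 2 for e in example_errors)  # 100: squared errors cannot cancel

# Two error lists with the same total absolute error (8 in each case)
spread_out = [2, 2, 2, 2]
one_big_miss = [0, 0, 0, 8]

spread_squared = sum(e ** 2 for e in spread_out)      # 16
big_miss_squared = sum(e ** 2 for e in one_big_miss)  # 64: one large error dominates

print(raw_total, squared_total)          # 0 100
print(spread_squared, big_miss_squared)  # 16 64
```

Both error lists in the second example miss by 8 in total. Squaring still makes the single large miss count four times as much as the evenly spread small misses.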

### Why Take the Square Root?
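- The mean of the squared errors is in squared units (e.g., thousands of dollars squared), which is hard to interpret
- Taking the square root returns the error to the same units as the original data
- This makes RMSE easier to interpret: if prices are in thousands of dollars, RMSE is also in thousands of dollars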

### Example:
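If RMSE = 50 for house prices (in thousands of dollars), a typical prediction error is roughly $50,000. Strictly speaking, RMSE is not the plain average of the errors. Squaring gives extra weight to large errors, so RMSE is always at least as large as the average absolute error.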
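For example, the two hypothetical sets of four errors below (in thousands of dollars) have the same RMSE of 50. Their mean absolute errors (MAE, the average of |Actual - Predicted|) are very different:

$$
\begin{aligned}
\text{Errors } [50, 50, 50, 50]: \quad & \text{RMSE} = \sqrt{\frac{4 \cdot 50^2}{4}} = 50, \quad \text{MAE} = \frac{4 \cdot 50}{4} = 50 \\
\text{Errors } [0, 0, 0, 100]: \quad & \text{RMSE} = \sqrt{\frac{100^2}{4}} = \sqrt{2500} = 50, \quad \text{MAE} = \frac{100}{4} = 25
\end{aligned}
$$

In the second set, three predictions are exact and a single miss of 100 drives the RMSE. So an RMSE of 50 does not mean every prediction is off by about 50 thousand dollars.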

---
## The Data: House Price Predictions

We have 20 houses with their actual prices and predictions from 3 different machine learning models.

**All prices are in thousands of dollars (e.g., 250 = $250,000)**

| House # | Actual Price | Model A | Model B | Model C |
|---------|--------------|---------|---------|----------|
| 1       | 250          | 245     | 260     | 248      |
| 2       | 300          | 305     | 310     | 295      |
| 3       | 180          | 175     | 195     | 182      |
| 4       | 420          | 415     | 400     | 425      |
| 5       | 275          | 280     | 265     | 273      |
| 6       | 350          | 345     | 370     | 348      |
| 7       | 195          | 200     | 185     | 198      |
| 8       | 410          | 405     | 430     | 408      |
| 9       | 285          | 290     | 275     | 287      |
| 10      | 320          | 315     | 340     | 318      |
| 11      | 230          | 235     | 220     | 232      |
| 12      | 380          | 375     | 395     | 378      |
| 13      | 265          | 270     | 255     | 267      |
| 14      | 295          | 300     | 285     | 293      |
| 15      | 340          | 335     | 360     | 338      |
| 16      | 215          | 220     | 205     | 217      |
| 17      | 390          | 385     | 410     | 388      |
| 18      | 270          | 275     | 260     | 272      |
| 19      | 310          | 305     | 325     | 308      |
| 20      | 360          | 355     | 380     | 358      |

```python
# Data provided for you - DO NOT MODIFY
# All values are in thousands of dollars

actual_prices = [250, 300, 180, 420, 275, 350, 195, 410, 285, 320,
                 230, 380, 265, 295, 340, 215, 390, 270, 310, 360]

model_a_predictions = [245, 305, 175, 415, 280, 345, 200, 405, 290, 315,
                       235, 375, 270, 300, 335, 220, 385, 275, 305, 355]

model_b_predictions = [260, 310, 195, 400, 265, 370, 185, 430, 275, 340,
                       220, 395, 255, 285, 360, 205, 410, 260, 325, 380]

model_c_predictions = [248, 295, 182, 425, 273, 348, 198, 408, 287, 318,
                       232, 378, 267, 293, 338, 217, 388, 272, 308, 358]

print("Data loaded successfully!")
print(f"Number of houses: {len(actual_prices)}")
print(f"First actual price: ${actual_prices[0]},000")
print(f"First Model A prediction: ${model_a_predictions[0]},000")
```

---
## Part 1: Understanding RMSE (5 points)

Before we calculate RMSE, let's make sure we understand the formula by working through a small example.

### TODO 1.1 (2 points)

Given these simple values:
- Actual: [10, 20, 30]
- Predicted: [12, 18, 32]

Calculate RMSE **by hand** (you can use a calculator) and write your answer in the markdown cell below.

Show your work:
1. Calculate each error: (Actual - Predicted)
2. Square each error
3. Find the average (mean) of squared errors
4. Take the square root

**Your Answer:**

Step 1: Errors
- (write your calculations here)

Step 2: Squared Errors
- (write your calculations here)

Step 3: Mean of Squared Errors
- (write your calculations here)

Step 4: Square Root
- RMSE = (write your final answer here)

### TODO 1.2 (3 points)

Now verify your hand calculation by writing Python code to calculate RMSE for the same small example.

**Remember:** Use only basic Python! No libraries except `math.sqrt()`

```python
import math  # Only for math.sqrt()

# Simple example data
actual = [10, 20, 30]
predicted = [12, 18, 32]

# TODO: Calculate RMSE step by step
# Step 1: Calculate errors (actual - predicted)
errors = []  # Your code here

# Step 2: Square each error
squared_errors = []  # Your code here

# Step 3: Calculate mean of squared errors
mean_squared_error = 0  # Your code here

# Step 4: Take square root
rmse = 0  # Your code here

print(f"RMSE = {rmse}")
```

---
## Part 2: Implementing RMSE Calculation (8 points)

Now let's create a reusable function to calculate RMSE.

### TODO 2.1 (5 points)

Create a function called `calculate_rmse()` that:
- Takes two parameters: `actual` (list) and `predicted` (list)
- Returns the RMSE value
- Uses only basic Python (loops, lists, arithmetic)
- You may use `math.sqrt()` for the square root

**Hint:** Follow the same steps as TODO 1.2, but make it a function!

```python
import math

def calculate_rmse(actual, predicted):
    """
    Calculate Root Mean Squared Error (RMSE)
    
    Parameters:
    actual (list): List of actual values
    predicted (list): List of predicted values
    
    Returns:
    float: The RMSE value
    """
    # TODO: Your code here
    # Step 1: Calculate squared errors
    
    # Step 2: Calculate mean of squared errors
    
    # Step 3: Take square root and return
    
    pass  # Remove this when you add your code

# Test your function with the simple example
test_actual = [10, 20, 30]
test_predicted = [12, 18, 32]
test_rmse = calculate_rmse(test_actual, test_predicted)
print(f"Test RMSE: {test_rmse}")
print("If this matches your earlier calculation, your function works!")
```

### TODO 2.2 (3 points)

Test your function with another example to make sure it works:
- Actual: [100, 200, 300, 400]
- Predicted: [110, 190, 310, 390]

Calculate RMSE and explain what this number means in plain English.

```python
# TODO: Test your function
test_actual_2 = [100, 200, 300, 400]
test_predicted_2 = [110, 190, 310, 390]

# Your code here:
```

**Your Explanation:**

The RMSE value means: (explain in plain English what this number tells us about the model's predictions)

---
## Part 3: Comparing Models (7 points)

Now we'll use RMSE to determine which of the three models (A, B, or C) is best at predicting house prices.

### TODO 3.1 (4 points)

Calculate RMSE for all three models using the house price data provided at the beginning.

Print the results in a clear, organized way.

```python
# TODO: Calculate RMSE for each model
# The data lists are already defined above (actual_prices, model_a_predictions, etc.)

# Calculate RMSE for Model A
rmse_a = 0  # Your code here

# Calculate RMSE for Model B
rmse_b = 0  # Your code here

# Calculate RMSE for Model C
rmse_c = 0  # Your code here

# Print results
print("="*50)
print("MODEL COMPARISON RESULTS")
print("="*50)
# Your code here to print each model's RMSE
```


### TODO 3.2 (3 points)

Write code to determine which model has the lowest RMSE (best model) and which has the highest (worst model).

Use basic Python - no libraries! You can use `min()` and `max()` functions if helpful.
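If you store results in a dictionary, the built-in `min()` and `max()` functions accept a `key` argument that tells them what to compare. The sketch below uses a hypothetical `scores` dictionary with made-up names and values (not the assignment results) purely to show the pattern.

```python
# Hypothetical scores, unrelated to the assignment data
scores = {"red": 3.2, "blue": 1.5, "green": 2.7}

# Iterating over a dict yields its keys; key=scores.get compares the keys by their values
lowest = min(scores, key=scores.get)   # "blue"
highest = max(scores, key=scores.get)  # "red"

print(f"Lowest:  {lowest} ({scores[lowest]})")    # Lowest:  blue (1.5)
print(f"Highest: {highest} ({scores[highest]})")  # Highest: red (3.2)
```

If two values tie, `min()` and `max()` return the first one they encounter. For a dictionary, that follows insertion order.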

```python
# TODO: Find the best and worst models

# Create a dictionary or list to store models and their RMSE values
# Your code here

# Find best model (lowest RMSE)
# Your code here

# Find worst model (highest RMSE)
# Your code here

# Print results
print("\n" + "="*50)
print("BEST MODEL: (print which model and its RMSE)")
print("WORST MODEL: (print which model and its RMSE)")
print("="*50)
```

---
## Part 4: Analysis and Conclusion (5 points)

### TODO 4.1 (3 points)

Write a short analysis answering these questions:

1. Which model performed best? Why do you think this is?
2. What does the RMSE value of the best model mean in practical terms? (Remember, prices are in thousands)
3. If you were a real estate company, would you use any of these models? Why or why not?

**Your Analysis:**

1. Best Model:
   - (Your answer here)

2. Practical Meaning:
   - (Your answer here)

3. Would You Use These Models:
   - (Your answer here)

### TODO 4.2 (2 points)

Create a simple visualization of the results WITHOUT using any plotting libraries.

Use basic Python to print a text-based bar chart showing the RMSE of each model.

Example format:
```
Model A: ******* (7.5)
Model B: *********** (11.2)
Model C: ****** (6.8)
```

Each `*` represents 1 unit of RMSE, rounded down (so 7.5 shows 7 stars). The values in this example are illustrative, not the actual results.
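The example bars above match rounding down with `int()`: 6.8 shows 6 stars, not 7. The sketch below contrasts `int()` with the built-in `round()` on one illustrative value taken from the example format, not a real result.

```python
# An illustrative RMSE value from the example format above (not a real result)
value = 7.5

truncated_bar = "*" * int(value)  # int() drops the fractional part: 7 stars
rounded_bar = "*" * round(value)  # round() goes to the nearest integer: 8 stars

print(truncated_bar, f"({value})")  # ******* (7.5)
print(rounded_bar, f"({value})")    # ******** (7.5)

# Python's round() breaks exact .5 ties toward the even integer
print(round(6.5), round(7.5))  # 6 8
```

Either approach gives a readable chart, provided the same rule is used for every model.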

```python
# TODO: Create a text-based bar chart
# Hint: You can use string multiplication like "*" * 5 to get "*****"

print("\nRMSE COMPARISON (Text Bar Chart)")
print("="*50)

# Your code here
```


---
## Bonus Challenge (Optional - No Extra Points, Just for Learning!)

Try these additional challenges to deepen your understanding:

1. Calculate the **Mean Absolute Error (MAE)** for each model
   - Formula: MAE = Sum of |Actual - Predicted| / n
   - Compare: Do the rankings change?

2. Find the **maximum error** (worst single prediction) for each model

3. Calculate what percentage of predictions were within $10,000 of the actual price for each model (an absolute error of 10 or less, since prices are in thousands)

```python
# Optional bonus challenges - try if you want extra practice!
# Your code here (optional)
```


---
## Summary and Key Takeaways

Congratulations! You've learned:

1. **What a cost function is** - A way to measure how well a model performs
2. **How to calculate RMSE** - Step by step, from scratch
3. **How to compare models** - Lower RMSE = Better model
4. **Why RMSE matters** - It tells us the typical size of a prediction error, in the same units as the data
5. **Basic Python implementation** - You can do complex calculations without libraries!

### Key Concepts:

- **RMSE formula**: √(Σ(actual - predicted)² / n)
- **Lower is better**: On the same data, a model with lower RMSE makes predictions that are closer to the actual values
- **Same units**: RMSE is in the same units as your data (dollars, meters, etc.)
- **Penalizes large errors**: Because we square the errors, big mistakes hurt more

### Real-World Applications:

RMSE is used in:
- Real estate price prediction
- Stock market forecasting
- Weather prediction
- Sales forecasting
- Energy consumption prediction
- Many other regression problems in machine learning!

---

### Before Submitting:

Make sure you've completed:
- [ ] Part 1: Hand calculation and verification (5 points)
- [ ] Part 2: RMSE function implementation (8 points)
- [ ] Part 3: Model comparison (7 points)
- [ ] Part 4: Analysis and visualization (5 points)
- [ ] All cells run without errors
- [ ] All markdown answers are filled in

**Total: 25 points**

Good luck!