# Stochastic Gradient Descent from Scratch

We're implementing one of the most powerful algorithms in machine learning: **Stochastic Gradient Descent (SGD)**. This is how machines learn to make predictions by gradually improving their guesses.

**Goal**: Predict house prices from area using only basic Python - no fancy libraries!

## 1. Our Dataset and Model

We have 15 houses with their area (m²) and price (k€). Our model is simple linear regression:

**Price = b + w × Area**

- `b` = base price (intercept)
- `w` = price per square meter (slope)

In [1]:
houses = [
    (50, 200), (80, 320), (100, 380), (120, 460), (150, 550),
    (180, 640), (200, 720), (40, 180), (90, 350), (110, 420),
    (160, 580), (220, 800), (70, 280), (130, 480), (170, 620)
]

def predict(area, b, w):
    # TODO: Implement error
    return 0

print("Sample houses (area, price):")
for i in range(5):
    area, price = houses[i]
    print(f"{area}m² -> {price}k€")

Sample houses (area, price):
50m² -> 200k€
80m² -> 320k€
100m² -> 380k€
120m² -> 460k€
150m² -> 550k€


## 2. Try Your Own Model

Before we let the algorithm learn, try setting your own parameters. What do you think are good values for base price and price per m²?

In [2]:
# Try your own parameters here!
my_b = 100  # base price in k€
my_w = 3    # price per m² in k€

print(f"Your model: Price = {my_b} + {my_w} × Area")
print("\nYour predictions:")
for i in range(5):
    area, actual = houses[i]
    pred = predict(area, my_b, my_w)
    error = abs(pred - actual)
    print(f"{area}m² -> Predicted: {pred:.0f}k€, Actual: {actual}k€, Error: {error:.0f}k€")

Your model: Price = 100 + 3 × Area

Your predictions:
50m² -> Predicted: 0k€, Actual: 200k€, Error: 200k€
80m² -> Predicted: 0k€, Actual: 320k€, Error: 320k€
100m² -> Predicted: 0k€, Actual: 380k€, Error: 380k€
120m² -> Predicted: 0k€, Actual: 460k€, Error: 460k€
150m² -> Predicted: 0k€, Actual: 550k€, Error: 550k€


## 3. Let the Algorithm Learn (SGD)

Now watch SGD automatically find better parameters by:
1. Making predictions on each house
2. Calculating the error
3. Adjusting parameters to reduce future errors
4. Repeating until parameters converge

In [3]:
def compute_error(houses, b, w):
    # TODO: Implement a method that computes this error
    return None

b, w = 0, 0
learning_rate = 0.00001
epochs = 10

print(f"Initial: b={b:.2f}, w={w:.4f}, Error={compute_error(houses, b, w):.1f}")

for epoch in range(epochs):
    for area, actual_price in houses:
        # TODO: predict the price for this house
        pred = None
        # TODO: Based on this prediction update your estimates for w and b.

    loss = compute_error(houses, b, w)
    print(f"Epoch {epoch+1}: b={b:.2f}, w={w:.4f}, Error={loss:.1f}")

print(f"\nFinal model: Price = {b:.2f} + {w:.4f} * Area")

TypeError: unsupported format string passed to NoneType.__format__

## 4. Evaluate the Results

Let's see how well our learned model performs compared to your manual guess:

In [None]:
print("SGD Predictions vs Your Manual Predictions:")
print("House\tActual\tSGD Pred\tSGD Error\tYour Pred\tYour Error")
print("-" * 65)

total_sgd_error = 0
total_your_error = 0

for i, (area, actual) in enumerate(houses[:10]):  # Show first 10 houses
    sgd_pred = predict(area, b, w)
    your_pred = predict(area, my_b, my_w)

    sgd_error = abs(sgd_pred - actual)
    your_error = abs(your_pred - actual)

    total_sgd_error += sgd_error
    total_your_error += your_error

    print(f"{area}m²\t{actual}k€\t{sgd_pred:.0f}k€\t\t{sgd_error:.0f}k€\t\t{your_pred:.0f}k€\t\t{your_error:.0f}k€")

print(f"\nAverage Error (first 10 houses):")
print(f"SGD Model: {total_sgd_error/10:.1f}k€")
print(f"Your Model: {total_your_error/10:.1f}k€")

print(f"\nFinal Error: {compute_error(houses, b, w):.1f}")
print(f"Your Error: {compute_error(houses, my_b, my_w):.1f}")

SGD Predictions vs Your Manual Predictions:
House	Actual	SGD Pred	SGD Error	Your Pred	Your Error
-----------------------------------------------------------------
50m²	200k€	189k€		11k€		250k€		50k€
80m²	320k€	300k€		20k€		340k€		20k€
100m²	380k€	374k€		6k€		400k€		20k€
120m²	460k€	448k€		12k€		460k€		0k€
150m²	550k€	559k€		9k€		550k€		0k€
180m²	640k€	670k€		30k€		640k€		0k€
200m²	720k€	744k€		24k€		700k€		20k€
40m²	180k€	152k€		28k€		220k€		40k€
90m²	350k€	337k€		13k€		370k€		20k€
110m²	420k€	411k€		9k€		430k€		10k€

Average Error (first 10 houses):
SGD Model: 16.3k€
Your Model: 18.0k€

Final Error: 15.5
Your Error: 18.0


## Key Takeaways

🎯 **SGD automatically found good parameters** by learning from data

⚡ **Each house teaches the algorithm** - errors guide parameter updates

🔄 **Iterative improvement** - small steps lead to better predictions

📊 **Lower MSE = Better model** - this is how we measure success

This is the foundation of how neural networks, recommendation systems, and most AI learns from data!