
# 🧪 Minimal ML Exercise with PyTorch: Predict House Price from Year

This **tiny, end-to-end exercise** uses a **small dataset** of past years and house prices to train a **linear regression model** in **PyTorch**.  
You’ll learn how to:
1. Load a toy dataset (Year → Price)
2. Visualize it
3. Train a minimal `nn.Linear` model with MSE loss
4. Predict price for a future year

> Keep it simple. Run each cell top-to-bottom. Adjust the data or the learning rate to experiment.



## 0) Setup
If you don't have PyTorch installed locally, install it first by uncommenting and running the line below. On Colab, PyTorch is usually preinstalled.


In [None]:

# !pip install torch pandas matplotlib --quiet



## 1) Imports


In [None]:

import torch
import torch.nn as nn
import torch.optim as optim

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt



## 2) Load a tiny dataset (Year, Price)
The dataset is deliberately small: ten yearly prices from 2015 through 2024. Edit the values below (Option A), or load them from `house_prices.csv` (Option B) if you downloaded that file.


In [None]:

# Option A: Define inline (edit as you like)
years = list(range(2015, 2025))  # 2015..2024
prices = [220_000, 230_000, 235_000, 245_000, 250_000, 265_000, 290_000, 320_000, 340_000, 360_000]

df = pd.DataFrame({"Year": years, "Price": prices})
df


In [None]:

# Option B: Load from CSV (if available in your working directory)
# The rest of the notebook expects columns named "Year" and "Price".
# import pandas as pd
# df = pd.read_csv('house_prices.csv')
# df.head()



## 3) Visualize the data


In [None]:

plt.figure()
plt.scatter(df["Year"], df["Price"])
plt.title("House Price vs. Year")
plt.xlabel("Year")
plt.ylabel("Price")
plt.show()



## 4) Prepare tensors for PyTorch
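We’ll normalize the input `Year` to zero mean and unit standard deviation, which keeps gradient descent stable at the learning rate used below, and express `Price` in thousands of dollars so the targets and the loss stay at a manageable magnitude.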


In [None]:

# Convert to tensors
X = torch.tensor(df["Year"].values, dtype=torch.float32).view(-1, 1)
y = torch.tensor(df["Price"].values, dtype=torch.float32).view(-1, 1)

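# Normalize features (Year): zero mean, unit standard deviation
# (torch's .std() uses the sample estimator, dividing by n-1, by default)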
X_mean = X.mean()
X_std  = X.std()
Xn = (X - X_mean) / X_std

# Scale price to thousands to keep numbers smaller
price_scale = 1000.0
y_scaled = y / price_scale

Xn, y_scaled
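
To see why the normalization step matters, the sketch below runs plain gradient descent on the same data twice: once on raw years and once on normalized years. It is written in pure Python so it runs without PyTorch. The helper name `gd_final_loss` and the step counts are illustrative assumptions, not part of the notebook.

```python
import statistics

years = list(range(2015, 2025))
prices_k = [220, 230, 235, 245, 250, 265, 290, 320, 340, 360]  # thousands of dollars

def gd_final_loss(xs, ys, lr, steps):
    """Full-batch gradient descent on y = w*x + b with MSE loss; returns the final loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    final_errors = [w * x + b - y for x, y in zip(xs, ys)]
    return sum(e * e for e in final_errors) / n

mean_year = statistics.mean(years)
std_year = statistics.stdev(years)  # sample standard deviation, like torch's default .std()
years_norm = [(x - mean_year) / std_year for x in years]

# Only 10 steps on raw years: the loss grows so fast that more steps would overflow
loss_raw = gd_final_loss(years, prices_k, lr=0.1, steps=10)
loss_norm = gd_final_loss(years_norm, prices_k, lr=0.1, steps=200)

print(f"raw years:        loss = {loss_raw:.3e}")
print(f"normalized years: loss = {loss_norm:.3f}")
```

With raw years around 2020, the gradient with respect to the weight is huge, so each update at `lr=0.1` overshoots and the loss grows by orders of magnitude per step. With normalized years the same learning rate converges. The loss settles at a nonzero value because the data is not perfectly linear.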



## 5) Define a minimal model
A single linear layer is enough for this toy problem.


In [None]:

model = nn.Linear(1, 1)  # y = w*x + b

criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)  # try 0.01 if it overshoots



## 6) Train the model


In [None]:

losses = []
epochs = 1000

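model.train()  # set training mode once; a no-op for nn.Linear, but a good habit

for epoch in range(epochs):
    optimizer.zero_grad()              # clear gradients from the previous step
    preds = model(Xn)                  # forward pass on the full dataset
    loss = criterion(preds, y_scaled)  # MSE in (thousands of dollars)^2
    loss.backward()                    # backpropagate
    optimizer.step()                   # gradient-descent update

    losses.append(loss.item())
    if (epoch + 1) % 200 == 0:
        print(f"Epoch {epoch+1:4d} | Loss: {loss.item():.6f}")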
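
For a one-feature linear model trained with MSE loss, gradient descent should converge toward the ordinary least-squares line, which has a closed-form solution. The sketch below computes that line in pure Python as a sanity check for the trained model. The data is copied from section 2, and `ols_fit` is a hypothetical helper name, not a PyTorch function.

```python
years = list(range(2015, 2025))
prices = [220_000, 230_000, 235_000, 245_000, 250_000,
          265_000, 290_000, 320_000, 340_000, 360_000]

def ols_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

slope, intercept = ols_fit(years, prices)
pred_2026 = slope * 2026 + intercept

print(f"Slope: ${slope:,.0f} per year")
print(f"Closed-form prediction for 2026: ${pred_2026:,.0f}")
```

If the training loop has converged, the model's prediction for the same year should land close to this closed-form value. A large gap suggests the learning rate or the number of epochs needs adjusting.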



## 7) Check training loss


In [None]:

plt.figure()
plt.plot(losses)
plt.title("Training Loss (MSE)")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.show()



## 8) Predict for any year
Change `test_year` to see predictions. The model expects normalized input, so we use the same `X_mean` and `X_std` as training.


In [None]:

def predict_price(year: int) -> float:
    model.eval()
    x = torch.tensor([[float(year)]], dtype=torch.float32)
    x_norm = (x - X_mean) / X_std
    with torch.no_grad():
        pred_scaled = model(x_norm)
    return float(pred_scaled.item() * price_scale)

test_year = 2026
print(f"Predicted price for {test_year}: ${predict_price(test_year):,.0f}")
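
Because the model is trained on normalized years and scaled prices, its weight and bias are not directly readable as "dollars per year". Substituting `x_norm = (year - X_mean) / X_std` into `price = price_scale * (w * x_norm + b)` gives a slope of `price_scale * w / X_std` and an intercept of `price_scale * (b - w * X_mean / X_std)`. The sketch below implements that conversion. `to_raw_units` is a hypothetical helper, and the numbers passed to it are made up for illustration.

```python
def to_raw_units(w, b, x_mean, x_std, price_scale):
    """Convert parameters learned on normalized years / scaled prices
    into a slope (dollars per year) and an intercept (dollars)."""
    slope = price_scale * w / x_std
    intercept = price_scale * (b - w * x_mean / x_std)
    return slope, intercept

# Illustrative numbers only, not values from a real training run
w, b, x_mean, x_std, price_scale = 2.0, 10.0, 5.0, 2.0, 1000.0
slope, intercept = to_raw_units(w, b, x_mean, x_std, price_scale)

# Both routes give the same prediction for the same raw input
x = 7.0
via_normalized = price_scale * (w * (x - x_mean) / x_std + b)
via_raw_line = slope * x + intercept
print(slope, intercept, via_normalized, via_raw_line)
```

In the notebook, the inputs would be `model.weight.item()`, `model.bias.item()`, `X_mean.item()`, `X_std.item()`, and `price_scale`.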



## 9) (Optional) Save the trained model


In [None]:

torch.save(
    {
        "model_state_dict": model.state_dict(),
        "X_mean": X_mean.item(),
        "X_std": X_std.item(),
        "price_scale": price_scale,
    },
    "house_price_model.pth",
)
print("Saved to house_price_model.pth")



## 10) (Optional) Load and reuse the model later


In [None]:

checkpoint = torch.load("house_price_model.pth", map_location="cpu")
model_loaded = nn.Linear(1, 1)
model_loaded.load_state_dict(checkpoint["model_state_dict"])

X_mean_loaded = torch.tensor(checkpoint["X_mean"])
X_std_loaded  = torch.tensor(checkpoint["X_std"])
price_scale_loaded = checkpoint["price_scale"]

def predict_with_loaded(year: int) -> float:
    model_loaded.eval()
    x = torch.tensor([[float(year)]], dtype=torch.float32)
    x_norm = (x - X_mean_loaded) / X_std_loaded
    with torch.no_grad():
        pred_scaled = model_loaded(x_norm)
    return float(pred_scaled.item() * price_scale_loaded)

print("Reloaded model prediction for 2026:", f"${predict_with_loaded(2026):,.0f}")



---

### ✅ What you learned
- A minimal **PyTorch** regression pipeline end-to-end
- Normalization, scaling, training loop, loss tracking
- Making predictions for arbitrary inputs

> Try: Change the dataset, adjust the learning rate, or extend the model with `nn.Sequential` and non-linear layers.
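
As one possible starting point for the `nn.Sequential` suggestion, the sketch below swaps the single linear layer for a small network with one hidden ReLU layer. The hidden size of 8, the learning rate of 0.05, the 500 epochs, and the choice to standardize the targets as well as the inputs are all assumptions made for this illustration, not settings from the notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # reproducible initial weights

years = torch.arange(2015, 2025, dtype=torch.float32).view(-1, 1)
prices_k = torch.tensor(
    [220, 230, 235, 245, 250, 265, 290, 320, 340, 360], dtype=torch.float32
).view(-1, 1)

# Standardize inputs and (an assumption of this sketch) targets as well
x_mean, x_std = years.mean(), years.std()
y_mean, y_std = prices_k.mean(), prices_k.std()
Xn = (years - x_mean) / x_std
yn = (prices_k - y_mean) / y_std

# One hidden layer with a ReLU non-linearity
mlp = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1))
criterion = nn.MSELoss()
optimizer = optim.SGD(mlp.parameters(), lr=0.05)

first_loss = None
for epoch in range(500):
    optimizer.zero_grad()
    loss = criterion(mlp(Xn), yn)
    loss.backward()
    optimizer.step()
    if first_loss is None:
        first_loss = loss.item()  # loss before any update
final_loss = loss.item()

print(f"first loss: {first_loss:.4f} | final loss: {final_loss:.4f}")
```

To turn an output of this network back into dollars, multiply by `y_std`, add `y_mean`, and multiply by 1000. Keep in mind that a ReLU network is piecewise linear, so predictions for years outside the training range simply continue along its last linear segment.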
