# House Price Prediction (Machine Learning)

This notebook demonstrates a simple end-to-end ML workflow:
- Load dataset
- Clean and preprocess data
- Train baseline and improved models
- Evaluate performance using RMSE
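
For readers new to the metric, RMSE (root mean squared error) is the square root of the average squared difference between actual and predicted values, so it is expressed in the same units as the target. A minimal sketch with made-up toy prices (the helper name `rmse` and the numbers are illustrative, not part of the notebook):

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy prices in thousands of dollars (illustrative values only)
actual = [200, 300, 400, 500]
predicted = [206, 292, 400, 500]

# Residuals are -6, 8, 0, 0, so the mean squared error is (36 + 64) / 4 = 25
toy_rmse = rmse(actual, predicted)
print(toy_rmse)  # 5.0
```

Because the errors are squared before averaging, one large miss raises RMSE more than several small ones.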


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression


## 1) Load Dataset
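The cell below generates a synthetic placeholder dataset so the notebook runs without external files. To use your own data, upload the file and load it with `pd.read_csv("data.csv")` instead, replacing `data.csv` with your file name.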

In [None]:
# Example placeholder dataset (you can replace with your real dataset later)
# This makes the notebook runnable even without external files

np.random.seed(42)
data = pd.DataFrame({
    "sqft": np.random.randint(500, 3500, 200),
    "bedrooms": np.random.randint(1, 6, 200),
    "bathrooms": np.random.randint(1, 4, 200),
    "age": np.random.randint(0, 60, 200),
})

# Target variable: price (synthetic)
data["price"] = (
    data["sqft"] * 300
    + data["bedrooms"] * 10000
    + data["bathrooms"] * 15000
    - data["age"] * 500
    + np.random.normal(0, 20000, 200)
)

data.head()
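
To swap the synthetic frame for a real file, something along these lines could work. This is a sketch under assumptions: the file name `data.csv`, the helper `load_housing_csv`, and the expectation that the file has the same five columns as the synthetic data are all hypothetical and will need adjusting to your dataset.

```python
from pathlib import Path

import pandas as pd

# Assumed schema: the same columns the synthetic dataset uses
REQUIRED_COLUMNS = ["sqft", "bedrooms", "bathrooms", "age", "price"]


def load_housing_csv(path):
    """Read a housing CSV and check it has the columns this notebook models on."""
    df = pd.read_csv(path)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"{path} is missing columns: {missing}")
    # Drop rows with gaps in any modeled column and renumber the index
    return df.dropna(subset=REQUIRED_COLUMNS).reset_index(drop=True)


csv_path = Path("data.csv")  # hypothetical file name
if csv_path.exists():
    # Only overrides the synthetic frame when the file is actually present
    data = load_housing_csv(csv_path)
```

Failing early on missing columns keeps a schema mismatch from surfacing later as a confusing error in the modeling cells.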

## 2) Train/Test Split

In [None]:
X = data.drop("price", axis=1)
y = data["price"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

X_train.shape, X_test.shape

## 3) Baseline Model: Linear Regression

In [None]:
lr = LinearRegression()
lr.fit(X_train, y_train)
pred_lr = lr.predict(X_test)

# RMSE = sqrt(MSE); avoids `squared=False`, which newer scikit-learn releases removed
rmse_lr = np.sqrt(mean_squared_error(y_test, pred_lr))
rmse_lr
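
Since the synthetic price is a known linear function of the features plus noise, one useful sanity check is to compare the fitted coefficients with the ones used to generate the data. The sketch below rebuilds the dataset with the same recipe so that it runs on its own; the names `true_coef` and `coef_table` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same recipe as the data cell, using a local generator instead of the global seed
rs = np.random.RandomState(42)
n = 200
X = pd.DataFrame({
    "sqft": rs.randint(500, 3500, n),
    "bedrooms": rs.randint(1, 6, n),
    "bathrooms": rs.randint(1, 4, n),
    "age": rs.randint(0, 60, n),
})

# Coefficients used to build the synthetic price
true_coef = pd.Series({"sqft": 300, "bedrooms": 10000, "bathrooms": 15000, "age": -500})
y = X.dot(true_coef) + rs.normal(0, 20000, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
lr = LinearRegression().fit(X_train, y_train)

# Side-by-side view of generating vs. fitted coefficients
coef_table = pd.DataFrame({
    "true": true_coef,
    "fitted": pd.Series(lr.coef_, index=X.columns),
})
coef_table["abs_error"] = (coef_table["fitted"] - coef_table["true"]).abs()
print(coef_table.round(1))
print("intercept:", round(lr.intercept_, 1))
```

The fitted values will not match exactly because of the added noise, and features with a narrow range (such as `bathrooms`) are typically estimated less precisely than `sqft`.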

## 4) Improved Model: Random Forest Regressor

In [None]:
rf = RandomForestRegressor(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)
pred_rf = rf.predict(X_test)

# RMSE = sqrt(MSE); avoids `squared=False`, which newer scikit-learn releases removed
rmse_rf = np.sqrt(mean_squared_error(y_test, pred_rf))
rmse_rf
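
A random forest also exposes impurity-based feature importances, which give a rough picture of which inputs drive its predictions. The following is a self-contained sketch that rebuilds the synthetic data with the same recipe; the variable name `importances` is illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Same recipe as the data cell, using a local generator instead of the global seed
rs = np.random.RandomState(42)
n = 200
X = pd.DataFrame({
    "sqft": rs.randint(500, 3500, n),
    "bedrooms": rs.randint(1, 6, n),
    "bathrooms": rs.randint(1, 4, n),
    "age": rs.randint(0, 60, n),
})
y = (
    X["sqft"] * 300
    + X["bedrooms"] * 10000
    + X["bathrooms"] * 15000
    - X["age"] * 500
    + rs.normal(0, 20000, n)
)

rf = RandomForestRegressor(n_estimators=200, random_state=42).fit(X, y)

# Impurity-based importances sum to 1; larger means the feature drove more splits
importances = pd.Series(rf.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

In this synthetic setup `sqft` accounts for most of the price variance, so it should dominate the ranking. Impurity-based importances can be misleading when features are correlated or differ greatly in cardinality, so treat them as a first look rather than a verdict.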

## 5) Quick Comparison

In [None]:
results = pd.DataFrame({
    "Model": ["Linear Regression", "Random Forest"],
    "RMSE": [rmse_lr, rmse_rf]
})

results

In [None]:
results.set_index("Model").plot(kind="bar")
plt.title("RMSE Comparison")
plt.ylabel("RMSE")
plt.show()
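
A single 80/20 split leaves only 40 test rows, so the comparison above can shift noticeably with the random seed. K-fold cross-validation gives a steadier estimate. The sketch below is one possible way to do it, rebuilding the synthetic data so it runs on its own; the 5-fold setting and the names `cv_results` and `models` are assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Same recipe as the data cell, using a local generator instead of the global seed
rs = np.random.RandomState(42)
n = 200
X = pd.DataFrame({
    "sqft": rs.randint(500, 3500, n),
    "bedrooms": rs.randint(1, 6, n),
    "bathrooms": rs.randint(1, 4, n),
    "age": rs.randint(0, 60, n),
})
y = (
    X["sqft"] * 300
    + X["bedrooms"] * 10000
    + X["bathrooms"] * 15000
    - X["age"] * 500
    + rs.normal(0, 20000, n)
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=42),
}
cv = KFold(n_splits=5, shuffle=True, random_state=42)

cv_rows = []
for name, model in models.items():
    # scikit-learn reports negated RMSE so that higher is better; flip the sign back
    scores = -cross_val_score(
        model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
    )
    cv_rows.append({"Model": name, "RMSE mean": scores.mean(), "RMSE std": scores.std()})

cv_results = pd.DataFrame(cv_rows).set_index("Model")
print(cv_results.round(0))
```

Because the synthetic target is linear in the features, the linear baseline may well come out ahead of the random forest here; on real housing data with nonlinear effects the ranking could differ.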

## Conclusion
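This notebook walks through a simple ML workflow for house price prediction on synthetic data: generate the data, split it into train and test sets, fit a linear regression baseline and a random forest, and compare their test RMSE. For a real project, replace the synthetic frame with a public housing dataset (for example, one from Kaggle) and add further feature engineering.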