# 🏡 Day 2: Predict House Prices Using Multivariate Linear Regression

Welcome to Day 2 of the 50 Days of ML Challenge! Today, we’ll build a multivariate linear regression model to predict house prices using real-world data from the California Housing dataset.

## 📦 Step 1: Import Required Libraries

In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

import joblib
import warnings
warnings.filterwarnings('ignore')


## 📥 Step 2: Load the Dataset
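We’ll use the **California Housing** dataset available via `sklearn.datasets`. Derived from the 1990 U.S. census, it has one row per census block group (20,640 rows in total) and eight numeric features, such as median income (`MedInc`), median house age (`HouseAge`), and average number of rooms (`AveRooms`). The target, `MedHouseVal`, is the median house value expressed in units of $100,000.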

In [None]:

from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing(as_frame=True)
df = housing.frame
df.head()


## 🔍 Step 3: Explore the Dataset
Let’s check the dataset’s shape, column types and non-null counts, and summary statistics.

In [None]:
df.shape

In [None]:
df.info()

In [None]:
df.describe()

## 📊 Step 4: Exploratory Data Analysis (EDA)
We’ll visualize feature distributions, pairwise correlations, and how key features relate to the target.

In [None]:

# Histograms of every column (the eight features and the target)
df.hist(bins=30, figsize=(15, 10))
plt.tight_layout()
plt.show()


In [None]:

# Correlation heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()


In [None]:

# Pairplot of selected features
sns.pairplot(df[['MedInc', 'AveRooms', 'HouseAge', 'MedHouseVal']])
plt.show()
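
Ranking features by their correlation with the target condenses the heatmap into a short list. Below is a minimal sketch that assumes a hypothetical helper `target_correlations` (not a pandas function) built on `DataFrame.corr`. The tiny DataFrame is made-up data standing in for `df`:

```python
import pandas as pd


def target_correlations(frame: pd.DataFrame, target: str) -> pd.Series:
    """Hypothetical helper: Pearson correlation of each column with `target`,
    ordered from strongest to weakest by absolute value."""
    corr = frame.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Made-up data standing in for df: 'a' is perfectly correlated with the target,
# 'c' is strongly negatively correlated, and 'b' is uncorrelated.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [1, -1, 1, -1, 1],
    "c": [5, 3, 4, 2, 1],
    "target": [2, 4, 6, 8, 10],
})
ranked = target_correlations(toy, "target")
print(ranked.round(2))
```

On the notebook’s data, the equivalent call would be `target_correlations(df, "MedHouseVal")`. Pearson correlation only captures linear association, so a low value does not rule out a nonlinear relationship.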


## 🧼 Step 5: Data Preprocessing
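We’ll separate the features from the target, split the data into training and test sets, and then standardize the features with `StandardScaler`. The scaler is fit on the training set only, so no information from the test set leaks into preprocessing.

In [None]:

X = df.drop("MedHouseVal", axis=1)
y = df["MedHouseVal"]

# Split first so the test set stays unseen during preprocessing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Learn scaling statistics from the training set, then apply them to both splits
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)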
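
`StandardScaler` transforms each feature as z = (x − μ) / σ, where μ is the column mean and σ is the population standard deviation (`ddof=0`). These statistics must be learned from the training rows and then reused unchanged on the test rows. The NumPy sketch below shows that fit/apply split on made-up arrays. `fit_standardizer` and `apply_standardizer` are illustrative names, not library functions:

```python
import numpy as np


def fit_standardizer(X_train: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Learn per-column mean and population std (ddof=0) from training rows only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    # Avoid division by zero for constant columns (leave them unscaled)
    std = np.where(std == 0, 1.0, std)
    return mean, std


def apply_standardizer(X: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Standardize any split using statistics learned from the training split."""
    return (X - mean) / std


X_train_raw = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test_raw = np.array([[4.0, 40.0]])

mean, std = fit_standardizer(X_train_raw)
X_train_std = apply_standardizer(X_train_raw, mean, std)
X_test_std = apply_standardizer(X_test_raw, mean, std)

print(X_train_std.mean(axis=0))  # training columns are centred at 0
print(X_test_std)                # test rows reuse the training mean and std
```

This mirrors calling `scaler.fit_transform` on the training split and `scaler.transform` on the test split.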


## 🧠 Step 6: Train the Model
We’ll fit an ordinary least squares `LinearRegression` model, which learns one coefficient per feature plus an intercept, on the scaled training data.

In [None]:

model = LinearRegression()
model.fit(X_train, y_train)
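
`LinearRegression` fits ordinary least squares. It chooses an intercept b and weights w that minimize Σ(yᵢ − (b + xᵢ·w))² over the training rows. The NumPy sketch below solves the same objective with `np.linalg.lstsq`. It uses synthetic data with known coefficients, and both the data and the true weights are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 3 features, known intercept and weights plus small noise
n_samples, n_features = 200, 3
X_syn = rng.normal(size=(n_samples, n_features))
true_w = np.array([2.0, -1.0, 0.5])
true_b = 3.0
y_syn = true_b + X_syn @ true_w + rng.normal(scale=0.1, size=n_samples)

# Prepend a column of ones so the intercept is learned as the first coefficient
X_design = np.column_stack([np.ones(n_samples), X_syn])

# Solve min ||X_design @ beta - y_syn||^2 (ordinary least squares)
beta, *_ = np.linalg.lstsq(X_design, y_syn, rcond=None)
intercept, weights = beta[0], beta[1:]

print(f"intercept ≈ {intercept:.2f}")
print("weights   ≈", np.round(weights, 2))
```

In the notebook, `model.intercept_` and `model.coef_` hold the fitted intercept and per-feature weights. Because the inputs are standardized, each coefficient is the predicted change in `MedHouseVal` for a one-standard-deviation increase in that feature, with the other features held fixed.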


## ✅ Step 7: Evaluate the Model
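We’ll evaluate the model on the held-out test set with three common regression metrics: mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R²).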

In [None]:

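y_pred = model.predict(X_test)

# MAE and RMSE are in the target's units (hundreds of thousands of dollars)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # RMSE penalizes large errors more heavily than MAE
# R² is unitless: the share of the target's variance explained by the model
r2 = r2_score(y_test, y_pred)

print(f"MAE: {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R² Score: {r2:.2f}")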
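
For reference, the three metrics are defined as MAE = mean |y − ŷ|, RMSE = √(mean (y − ŷ)²), and R² = 1 − SS_res / SS_tot. Here SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of y from its mean. The small NumPy sketch below computes them by hand on toy values. `regression_metrics` is an illustrative name, and the results should agree with the scikit-learn functions above:

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and R² directly from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    ss_res = np.sum(errors ** 2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


toy_mae, toy_rmse, toy_r2 = regression_metrics([3.0, 5.0, 2.0, 7.0], [2.5, 5.0, 3.0, 8.0])
print(f"MAE={toy_mae:.3f}  RMSE={toy_rmse:.3f}  R²={toy_r2:.3f}")
```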


## 📈 Step 8: Visualize Predictions
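We’ll plot actual against predicted values. The red dashed line marks perfect predictions, so the distance from it shows how far off each prediction is.

In [None]:

plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred, alpha=0.5)
plt.xlabel("Actual median house value (in $100,000)")
plt.ylabel("Predicted median house value (in $100,000)")
plt.title("Actual vs Predicted House Prices")
# Red dashed y = x reference line: points on it are perfect predictions
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')
plt.show()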


## 💾 Step 9: Save the Trained Model
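We’ll save both the trained model and the fitted scaler, since new inputs must be standardized the same way before prediction.

In [None]:
# Persist the model together with the scaler it depends on
joblib.dump(model, "linear_regression_house_model.pkl")
joblib.dump(scaler, "standard_scaler.pkl")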
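
New inputs must be standardized with the same fitted scaler before they reach the model, so the scaler is needed at inference time alongside the model. One way to keep the two together is to bundle them in a scikit-learn `Pipeline` and persist that single object. Below is a self-contained sketch on synthetic data. The filename `house_price_pipeline.pkl` and the temporary-directory location are arbitrary choices:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Made-up training data with an exact linear relationship
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
y_demo = X_demo @ np.array([0.4, -0.2, 0.1]) + 1.0

# Scaling and regression are stored as one object, so raw features go straight in
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", LinearRegression()),
])
pipeline.fit(X_demo, y_demo)

path = os.path.join(tempfile.gettempdir(), "house_price_pipeline.pkl")
joblib.dump(pipeline, path)
restored = joblib.load(path)

# The reloaded pipeline should reproduce the original predictions
same = np.allclose(pipeline.predict(X_demo[:5]), restored.predict(X_demo[:5]))
print("Predictions match after reload:", same)
```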

## 🔮 Step 10: Make Predictions on New Data
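As a quick check, we’ll predict the value of one held-out district from the test set and compare it with the actual value.

In [None]:

# X_test rows are already scaled; predict expects a 2D array of shape (n_samples, n_features)
sample_input = X_test[0].reshape(1, -1)
predicted_price = model.predict(sample_input)
# The target is in units of $100,000, so convert to dollars for readability
print(f"Predicted Price: ${predicted_price[0] * 100_000:,.0f}")
print(f"Actual Price:    ${y_test.iloc[0] * 100_000:,.0f}")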


## 🧠 Final Thoughts
Today we explored a real-world housing dataset and built a multivariate linear regression model. We covered EDA, feature scaling, training, evaluation with MAE, RMSE, and R², and saving the trained model for reuse. Great job! 🚀