# 🔹 STEP 1: Import Libraries

Description:
This step imports all the required Python libraries for data handling, model building, and evaluation.

- pandas and numpy are used for data manipulation.
- scikit-learn provides machine learning models and evaluation tools.

In [None]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

print("Libraries imported successfully ✅")


Libraries imported successfully ✅


# 🔹 STEP 2: Load Housing Price Dataset

Description:
In this step, the Housing Price CSV file is loaded into a pandas DataFrame.
This allows us to view, analyze, and process the data in a structured tabular format before applying machine learning models.

In [None]:
df = pd.read_csv("Housing Price.csv")

print("Dataset loaded successfully ✅")
print("\nFirst 5 rows of the dataset:")
display(df.head())

print("\nDataset shape (rows, columns):", df.shape)
print("\nColumn names:")
print(df.columns)



Dataset loaded successfully ✅

First 5 rows of the dataset:


,price,area,bedrooms,bathrooms,stories,mainroad,guestroom,basement,hotwaterheating,airconditioning,parking,prefarea,furnishingstatus
0,13300000,7420,4,2,3,yes,no,no,no,yes,2,yes,furnished
1,12250000,8960,4,4,4,yes,no,no,no,yes,3,no,furnished
2,12250000,9960,3,2,2,yes,no,yes,no,no,2,yes,semi-furnished
3,12215000,7500,4,2,2,yes,no,yes,no,yes,3,yes,furnished
4,11410000,7420,4,1,2,yes,yes,yes,no,yes,2,no,furnished



Dataset shape (rows, columns): (545, 13)

Column names:
Index(['price', 'area', 'bedrooms', 'bathrooms', 'stories', 'mainroad',
       'guestroom', 'basement', 'hotwaterheating', 'airconditioning',
       'parking', 'prefarea', 'furnishingstatus'],
      dtype='object')
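
Before modelling, it can help to confirm column types and check for missing values. The sketch below assumes a tiny in-memory sample (three rows and four columns copied from the preview above) rather than the real file; `sample_csv` and `df_demo` are illustrative names, not part of the notebook.

```python
import io
import pandas as pd

# Three rows copied from the preview above (illustrative subset of columns).
sample_csv = io.StringIO(
    "price,area,bedrooms,mainroad\n"
    "13300000,7420,4,yes\n"
    "12250000,8960,4,yes\n"
    "12250000,9960,3,yes\n"
)
df_demo = pd.read_csv(sample_csv)

# Numeric columns are parsed as integers; yes/no columns stay as text.
print(df_demo.dtypes)

# Count missing values per column (all zero for this sample).
missing_per_column = df_demo.isnull().sum()
print(missing_per_column)
```

The same two checks could be run on the full `df`; text columns such as `mainroad` would need encoding before they can be used as model features.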


# 🔹 STEP 3: Select Features and Target

Description:
This step separates the dataset into:

- Features (X): Input variables used for prediction. Only `area` is selected in this step; `bedrooms` and `bathrooms` are added later in the Multi-Linear Regression section.
- Target (y): Output variable to be predicted (`price`).

This separation is necessary for supervised learning.

In [None]:
X = df[['area']]     # Feature
y = df['price']      # Target

print("Feature (X) preview:")
display(X.head())

print("\nTarget (y) preview:")
display(y.head())



Feature (X) preview:


,area
0,7420
1,8960
2,9960
3,7500
4,7420



Target (y) preview:


,price
0,13300000
1,12250000
2,12250000
3,12215000
4,11410000
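
The double brackets in `df[['area']]` matter: scikit-learn estimators expect a 2-D feature matrix, while the target can stay 1-D. A minimal sketch of the difference, using a hypothetical three-row frame (`df_demo`) in place of the real dataset:

```python
import pandas as pd

# Hypothetical three-row frame standing in for the housing data.
df_demo = pd.DataFrame({
    "area": [7420, 8960, 9960],
    "price": [13300000, 12250000, 12250000],
})

X_2d = df_demo[["area"]]   # list of column names -> DataFrame, shape (3, 1)
x_1d = df_demo["area"]     # single column name   -> Series, shape (3,)

print(type(X_2d).__name__, X_2d.shape)
print(type(x_1d).__name__, x_1d.shape)
```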


# 🔹 STEP 4: Split the Dataset

Description:
The dataset is divided into:

- Training data: Used to train the model
- Testing data: Used to evaluate the model's performance

This helps measure how well the model generalizes to unseen data.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Data split completed ✅")
print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)
print("y_train shape:", y_train.shape)
print("y_test shape:", y_test.shape)


Data split completed ✅
X_train shape: (436, 1)
X_test shape: (109, 1)
y_train shape: (436,)
y_test shape: (109,)
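
To see where the 436/109 row counts come from and what `random_state` does, the split can be reproduced on stand-in data. This is a sketch that assumes only the row count (545) from the dataset; `X_demo` and `y_demo` are placeholder arrays, not the housing data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with the same number of rows as the housing dataset (545).
X_demo = np.arange(545).reshape(-1, 1)
y_demo = np.arange(545)

split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

X_tr, X_te, y_tr, y_te = split_a
print(len(X_tr), len(X_te))

# The same random_state selects the same test rows on every run.
same_split = np.array_equal(X_te, split_b[1])
print("Identical test sets:", same_split)
```

Fixing `random_state` keeps the reported errors reproducible from run to run.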


# 1. Linear Regression

Description:
Linear Regression models the relationship between a single independent variable and the house price using a straight-line equation.
It predicts prices by finding the best-fit line that minimizes prediction errors.

In [None]:
lr = LinearRegression()
lr.fit(X_train, y_train)

y_pred_lr = lr.predict(X_test)

print("Linear Regression completed ✅")
print("Coefficient (slope):", lr.coef_)
print("Intercept:", lr.intercept_)
print("Mean Squared Error:", mean_squared_error(y_test, y_pred_lr))

print("\nActual vs Predicted Prices:")
comparison = pd.DataFrame({
    'Actual Price': y_test.values,
    'Predicted Price': y_pred_lr
})
display(comparison.head())


Linear Regression completed ✅
Coefficient (slope): [425.72984194]
Intercept: 2512254.2639593435
Mean Squared Error: 3675286604768.185

Actual vs Predicted Prices:


,Actual Price,Predicted Price
0,4060000,5024060.0
1,6650000,5279498.0
2,3710000,4232203.0
3,6440000,4640903.0
4,2800000,4198144.0
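
The fitted model is just the line `price = intercept + slope * area`, so a prediction can be reproduced by hand from the printed coefficient and intercept. In the sketch below, `predict_price` is a helper introduced for illustration and the area of 5900 is a hypothetical input, not a row taken from the dataset.

```python
# Values copied from the printed output above.
slope = 425.72984194
intercept = 2512254.2639593435


def predict_price(area):
    """Evaluate the fitted line: price = intercept + slope * area."""
    return intercept + slope * area


# Hypothetical house with an area of 5900 (same units as the 'area' column).
example_area = 5900
predicted = predict_price(example_area)
print(f"Predicted price for area {example_area}: {predicted:,.0f}")
```

Read this way, the slope says that each extra unit of area adds roughly 426 to the predicted price.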


# 2. Multi-Linear Regression

Description:
Multi-Linear Regression extends linear regression by using multiple input features (such as area, number of bedrooms, and bathrooms) to predict house prices.
It captures the combined effect of several variables on the target value.

In [None]:
X_multi = df[['area', 'bedrooms', 'bathrooms']]
y = df['price']

print("Multi-linear features preview:")
display(X_multi.head())

X_train, X_test, y_train, y_test = train_test_split(
    X_multi, y, test_size=0.2, random_state=42
)

mlr = LinearRegression()
mlr.fit(X_train, y_train)

y_pred_mlr = mlr.predict(X_test)

print("Multi-Linear Regression completed ✅")
print("Coefficients:", mlr.coef_)
print("Intercept:", mlr.intercept_)
print("Mean Squared Error:", mean_squared_error(y_test, y_pred_mlr))


Multi-linear features preview:


,area,bedrooms,bathrooms
0,7420,4,2
1,8960,4,4
2,9960,3,2
3,7500,4,2
4,7420,4,1


Multi-Linear Regression completed ✅
Coefficients: [3.45466570e+02 3.60197650e+05 1.42231966e+06]
Intercept: 59485.379208717495
Mean Squared Error: 2750040479309.0522
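
MSE is expressed in squared price units, which makes the raw numbers hard to read. Taking the square root gives the root mean squared error (RMSE), which is in the same units as `price`. The sketch below applies this to the two MSE values printed so far; the dictionary names are illustrative.

```python
import math

# MSE values copied from the outputs of the two models above.
mse_by_model = {
    "Linear (area only)": 3675286604768.185,
    "Multi-linear (area, bedrooms, bathrooms)": 2750040479309.0522,
}

# RMSE is the square root of MSE, so it is in the same units as price.
rmse_by_model = {name: math.sqrt(mse) for name, mse in mse_by_model.items()}

for name, rmse in rmse_by_model.items():
    print(f"{name}: RMSE = {rmse:,.0f}")
```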


# 3. Polynomial Regression

Description:
Polynomial Regression models non-linear relationships by transforming input features into polynomial terms.
This allows the model to fit curved patterns in housing price data that linear regression cannot capture.

In [None]:
poly = PolynomialFeatures(degree=2)

X_poly = poly.fit_transform(df[['area']])

print("Polynomial feature transformation completed ✅")
print("Original X shape:", X.shape)
print("Polynomial X shape:", X_poly.shape)

X_train, X_test, y_train, y_test = train_test_split(
    X_poly, y, test_size=0.2, random_state=42
)

poly_model = LinearRegression()
poly_model.fit(X_train, y_train)

y_pred_poly = poly_model.predict(X_test)

print("Polynomial Regression MSE:", mean_squared_error(y_test, y_pred_poly))


Polynomial feature transformation completed ✅
Original X shape: (545, 1)
Polynomial X shape: (545, 3)
Polynomial Regression MSE: 3562004338819.1157
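
The jump from 1 column to 3 columns is easier to follow on a tiny input. A minimal sketch, using two toy values rather than the real `area` column, of what `PolynomialFeatures(degree=2)` produces for a single feature:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two toy area values, as a 2-D column like df[['area']].
x_demo = np.array([[2.0], [3.0]])

poly_demo = PolynomialFeatures(degree=2)
x_expanded = poly_demo.fit_transform(x_demo)

# Each row becomes [1, x, x^2]: a bias column, the original value, its square.
print(x_expanded)
```

The linear model fitted afterwards is still linear in these three columns; the curve comes from the squared term.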


# 4. K-Nearest Neighbors (KNN) Regression

Description:
KNN Regression predicts house prices by averaging the prices of the K most similar houses based on feature distance.
It is a simple, instance-based model that relies on similarity rather than a mathematical equation.

In [None]:
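# Note: X_train / X_test are still the polynomial-feature split created in the
# Polynomial Regression section (columns: 1, area, area^2), not the raw 'area' split.
knn = KNeighborsRegressor(n_neighbors=5)

# "Fitting" KNN stores the training rows for later neighbor lookups.
knn.fit(X_train, y_train)

# Each prediction is the mean price of the 5 nearest training rows.
y_pred_knn = knn.predict(X_test)

print("KNN Regression completed ✅")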
print("Number of neighbors used:", knn.n_neighbors)
print("Mean Squared Error:", mean_squared_error(y_test, y_pred_knn))


KNN Regression completed ✅
Number of neighbors used: 5
Mean Squared Error: 3409832445137.6147
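
The averaging step can be checked by hand. The sketch below uses five made-up houses (not the housing dataset) and compares a manual nearest-neighbor average with `KNeighborsRegressor`; the variable names and `k = 3` are choices made for this illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy data: five houses with made-up areas and prices.
areas = np.array([[1000], [1500], [2000], [3000], [5000]])
prices = np.array([100.0, 150.0, 200.0, 300.0, 500.0])

query_area = 1600
k = 3

# Manual version: sort by distance to the query, average the k closest prices.
distances = np.abs(areas[:, 0] - query_area)
nearest = np.argsort(distances)[:k]
manual_prediction = prices[nearest].mean()

# scikit-learn version (uniform weights by default).
knn_demo = KNeighborsRegressor(n_neighbors=k).fit(areas, prices)
sklearn_prediction = knn_demo.predict([[query_area]])[0]

print(manual_prediction, sklearn_prediction)
```

Because predictions depend on distances, features on very different scales can dominate the neighbor search, so scaling is often worth considering when several features are used.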


# 5. Decision Tree Regression

Description:
Decision Tree Regression predicts house prices by repeatedly splitting the data into branches based on feature values.
It learns a set of decision rules and can capture non-linear relationships, although a tree grown without depth limits tends to overfit the training data.


In [None]:
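# Note: X_train / X_test are still the polynomial-feature split created in the
# Polynomial Regression section (columns: 1, area, area^2), not the raw 'area' split.
# No max_depth is set, so the tree grows until it can no longer split.
dt = DecisionTreeRegressor(random_state=42)

dt.fit(X_train, y_train)

y_pred_dt = dt.predict(X_test)

print("Decision Tree Regression completed ✅")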
print("Tree depth:", dt.get_depth())
print("Number of leaves:", dt.get_n_leaves())
print("Mean Squared Error:", mean_squared_error(y_test, y_pred_dt))


Decision Tree Regression completed ✅
Tree depth: 20
Number of leaves: 247
Mean Squared Error: 3680624743980.632
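
A depth of 20 with 247 leaves for 436 training rows suggests the tree may be fitting individual rows rather than a general trend. One common way to limit this is the `max_depth` parameter. The sketch below shows its effect on synthetic data (a noisy linear trend generated here, not the housing dataset), so the numbers it prints say nothing about the results above.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: a noisy linear trend, not the housing dataset.
rng = np.random.default_rng(0)
X_demo = rng.uniform(1000, 10000, size=(200, 1))
y_demo = 400 * X_demo[:, 0] + rng.normal(0, 500_000, size=200)

# Unconstrained tree: keeps splitting until it can no longer split.
deep_tree = DecisionTreeRegressor(random_state=42).fit(X_demo, y_demo)

# Constrained tree: at most 3 levels of splits, so at most 8 leaves.
shallow_tree = DecisionTreeRegressor(max_depth=3, random_state=42).fit(X_demo, y_demo)

print("Deep tree:   ", deep_tree.get_depth(), "levels,", deep_tree.get_n_leaves(), "leaves")
print("Shallow tree:", shallow_tree.get_depth(), "levels,", shallow_tree.get_n_leaves(), "leaves")
```

Whether a shallower tree actually lowers the test MSE on the housing data would need to be checked by refitting with the same split.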
