**Write a program that takes in data on houses as input and uses machine-learning algorithms to predict the retail price of each house. The program should be able to handle a variety of features, such as the number of rooms, size, age, location, etc., and use these features to train multiple machine-learning models.**

**Your task is to compare the accuracy of the different models and identify which model performs best for predicting house prices. You can use metrics such as mean squared error or R-squared value to evaluate the performance of each model.**
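
To make the two metrics named above concrete, here is a minimal sketch that computes them by hand with NumPy. The helper names `mse` and `r_squared` and the toy predictions are assumptions made for illustration; the notebook itself relies on `mean_squared_error` and `r2_score` from scikit-learn.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """R-squared: 1 - (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy targets and made-up predictions (illustrative only, not model output)
y_true = [24.0, 21.6, 34.7, 33.4]
y_pred = [25.0, 20.6, 33.7, 34.4]

print("MSE:", mse(y_true, y_pred))
print("R-squared:", r_squared(y_true, y_pred))
```

A lower MSE is better, while R-squared is at most 1 and drops to 0 for a model that always predicts the mean of the targets.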

In [None]:
#Loading the required libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

In [None]:
#Mounting Google Drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
#Loading the dataset
df = pd.read_csv('/content/drive/My Drive/Colab Notebooks/HousingData.csv')

In [None]:
df.head()

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,MEDV
0,0.00632,18.0,2.31,0.0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98,24.0
1,0.02731,0.0,7.07,0.0,0.469,6.421,78.9,4.9671,2,242,17.8,396.9,9.14,21.6
2,0.02729,0.0,7.07,0.0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03,34.7
3,0.03237,0.0,2.18,0.0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94,33.4
4,0.06905,0.0,2.18,0.0,0.458,7.147,54.2,6.0622,3,222,18.7,396.9,,36.2


In [None]:
df.shape

(506, 14)

In [None]:
# Keep three features (RM, AGE, LSTAT) as predictors; MEDV is the target
x = df[['RM', 'AGE', 'LSTAT']].copy()
y = df['MEDV']

In [None]:
x.head()

Unnamed: 0,RM,AGE,LSTAT
0,6.575,65.2,4.98
1,6.421,78.9,9.14
2,7.185,61.1,4.03
3,6.998,45.8,2.94
4,7.147,54.2,


In [None]:
# Fill missing values in each feature with that column's median
x = x.fillna(x.median())

In [None]:
y.head()

0    24.0
1    21.6
2    34.7
3    33.4
4    36.2
Name: MEDV, dtype: float64

In [None]:
x.shape

(506, 3)

In [None]:
y.shape

(506,)

In [None]:
# Split the data into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
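
The notebook fills missing values with medians computed on the full dataset before splitting, so the test rows influence the imputed values. One possible alternative, sketched below, is to learn the medians from the training split only by putting a `SimpleImputer` in a scikit-learn pipeline. The data here is synthetic and the names `x_demo`, `y_demo`, and `pipe` are hypothetical stand-ins, not the notebook's variables.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the notebook's x and y (all values are made up)
rng = np.random.default_rng(0)
x_demo = pd.DataFrame({
    "RM": rng.normal(6.3, 0.7, 200),
    "AGE": rng.uniform(0, 100, 200),
    "LSTAT": rng.uniform(2, 35, 200),
})
y_demo = 5 * x_demo["RM"] - 0.5 * x_demo["LSTAT"] + rng.normal(0, 2, 200)

# Blank out some LSTAT values to mimic the missing entries in the real data
x_demo.loc[rng.choice(200, 20, replace=False), "LSTAT"] = np.nan

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=42
)

# The imputer learns its medians from x_tr only, then reuses them on x_te
pipe = make_pipeline(SimpleImputer(strategy="median"), LinearRegression())
pipe.fit(x_tr, y_tr)

learned_medians = pipe.named_steps["simpleimputer"].statistics_
print("Medians learned from the training split:", learned_medians)
print("Test R-squared:", pipe.score(x_te, y_te))
```

With a dataset of this size the difference from whole-dataset imputation is probably small, but the pipeline keeps the test set fully unseen during preprocessing.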

In [None]:
# Initialize models (the tree-based models are not seeded, so their scores vary slightly between runs)
models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(),
    "Random Forest": RandomForestRegressor()
}

In [None]:
# Train each model on the training set and score it on the held-out test set
results = {}
for model_name, model in models.items():
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    results[model_name] = {"MSE": mse, "R-squared": r2}
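
The loop above scores each model on a single 80/20 split, so the ranking can depend on which rows land in the test set. A hedged sketch of a more stable comparison uses k-fold cross-validation via `cross_val_score`. The data below is synthetic, and `demo_models`, `cv_results`, the 5-fold setting, and the fixed seeds are assumptions made for illustration rather than part of the original notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data with three features (made up for illustration)
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(200, 3))
y_demo = 3 * x_demo[:, 0] - 2 * x_demo[:, 2] + rng.normal(scale=0.5, size=200)

# Same three model families as the notebook, seeded for repeatable scores
demo_models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

# Shuffled 5-fold split shared by every model so the comparison is fair
cv = KFold(n_splits=5, shuffle=True, random_state=42)

cv_results = {}
for name, model in demo_models.items():
    scores = cross_val_score(model, x_demo, y_demo, cv=cv, scoring="r2")
    cv_results[name] = (scores.mean(), scores.std())
    print(f"{name}: mean R-squared = {scores.mean():.3f} (+/- {scores.std():.3f})")
```

On the notebook's data, `x_demo` and `y_demo` would be replaced by `x` and `y`; the spread across folds gives a rough sense of how much a single-split score can move.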

In [None]:
# Display results
print("Model Performance:")
for model_name, metrics in results.items():
    print(f"{model_name}:")
    print(f"  Mean Squared Error: {metrics['MSE']}")
    print(f"  R-squared Value: {metrics['R-squared']}")
    print("-" * 30)

Model Performance:
Linear Regression:
  Mean Squared Error: 31.46779703804514
  R-squared Value: 0.570896308877861
------------------------------
Decision Tree:
  Mean Squared Error: 37.193137254901956
  R-squared Value: 0.4928239666349956
------------------------------
Random Forest:
  Mean Squared Error: 19.687584970588233
  R-squared Value: 0.7315345789873262
------------------------------
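
MSE is expressed in squared target units, which makes it hard to read directly. The short sketch below takes the square root of the MSE values printed above to get RMSE, which is in the same units as MEDV. Assuming this file follows the usual Boston housing convention, where MEDV is recorded in thousands of dollars, an RMSE of 4.4 would correspond to a typical error of roughly $4,400. The dictionary name `reported_mse` is hypothetical.

```python
import math

# MSE values copied from the printed output of this notebook run
reported_mse = {
    "Linear Regression": 31.46779703804514,
    "Decision Tree": 37.193137254901956,
    "Random Forest": 19.687584970588233,
}

# RMSE is the square root of MSE, so it shares the target's units
rmse = {name: math.sqrt(value) for name, value in reported_mse.items()}

# Print models from lowest to highest error
for name, value in sorted(rmse.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {value:.2f}")

best_model = min(rmse, key=rmse.get)
print("Lowest RMSE:", best_model)
```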


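**The Random Forest model performs best for predicting house prices: on the test set it has the lowest mean squared error (about 19.69) and the highest R-squared value (about 0.73), ahead of Linear Regression (31.47, 0.57) and the Decision Tree (37.19, 0.49).**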