# House Price Prediction

## Objective
Predict house prices using property features such as size, number of bedrooms, and location.

This notebook includes:
- Loading and exploring the dataset
- Preprocessing and handling categorical variables
- Training a regression model (Linear Regression)
- Evaluating model performance (MAE & RMSE)
- Visualizing results and insights


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np


In [None]:
# Load the dataset
data = pd.read_csv("house_prices.csv")

# Preview data
print("First 5 rows:\n", data.head())
print("\nShape of dataset:", data.shape)
print("Columns:", data.columns)


In [None]:
# Summary statistics
print("\nSummary statistics:\n", data.describe())

# Check for missing values
print("\nMissing values:\n", data.isnull().sum())

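# Correlation heatmap (numeric columns only; pandas >= 2.0 raises an error
# if non-numeric columns such as 'location' are passed to corr())
plt.figure(figsize=(10,8))
sns.heatmap(data.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title("Feature Correlation")
plt.show()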


In [None]:
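# Drop rows with missing values first; otherwise get_dummies encodes a missing
# location as all zeros, which is indistinguishable from the dropped baseline category
data = data.dropna()

# One-hot encode the categorical 'location' column, if present
if 'location' in data.columns:
    data = pd.get_dummies(data, columns=['location'], drop_first=True)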
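
To make the encoding step concrete, here is a minimal sketch of what `pd.get_dummies` with `drop_first=True` produces. The toy frame and its values are hypothetical; only the `location` column name comes from the notebook.

```python
import pandas as pd

# Hypothetical toy data; the values are made up for illustration
toy = pd.DataFrame({
    "size": [1200, 1500, 900],
    "location": ["downtown", "suburb", "rural"],
})

# drop_first=True drops the first category in sorted order ("downtown"),
# which becomes the baseline the remaining dummy columns are compared against
encoded = pd.get_dummies(toy, columns=["location"], drop_first=True)
encoded_columns = list(encoded.columns)
print(encoded_columns)
```

Dropping one category avoids perfectly collinear dummy columns, which matters for a linear model fitted with an intercept.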


In [None]:
# Features and target
X = data.drop('price', axis=1)
y = data['price']

# Split into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


In [None]:
# Fit an ordinary least squares linear regression on the training set
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on test set
y_pred = model.predict(X_test)
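
A fitted linear model can be read through its coefficients. The sketch below is illustrative only: it uses hypothetical toy data in which price is an exact linear function of two made-up features, so the recovered coefficients are known in advance. The same pairing of `coef_` with column names would apply to `model` and `X_train` above.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: price = 100 * size + 5000 * bedrooms + 20000, with no noise
toy_X = pd.DataFrame({
    "size": [1000, 1500, 2000, 1200],
    "bedrooms": [2, 3, 3, 4],
})
toy_y = 100 * toy_X["size"] + 5000 * toy_X["bedrooms"] + 20000

toy_model = LinearRegression().fit(toy_X, toy_y)

# Pair each coefficient with its feature name for readability
coefficients = pd.Series(toy_model.coef_, index=toy_X.columns)
print(coefficients)
print("Intercept:", toy_model.intercept_)
```

Each coefficient is the change in predicted price for a one-unit increase in that feature with the other features held fixed.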


In [None]:
# Both metrics are in the same units as price; RMSE weights large errors more heavily
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print(f"Mean Absolute Error (MAE): {mae}")
print(f"Root Mean Squared Error (RMSE): {rmse}")
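
As a rough illustration of how the two metrics differ, the sketch below computes them by hand with NumPy. The four prices are hypothetical and are not results from this notebook.

```python
import numpy as np

# Hypothetical actual and predicted prices
actual = np.array([200_000.0, 300_000.0, 250_000.0, 400_000.0])
predicted = np.array([210_000.0, 290_000.0, 250_000.0, 360_000.0])

errors = predicted - actual

# MAE: mean of the absolute errors
toy_mae = np.mean(np.abs(errors))

# RMSE: square root of the mean squared error
toy_rmse = np.sqrt(np.mean(errors ** 2))

print(f"MAE:  {toy_mae:.1f}")
print(f"RMSE: {toy_rmse:.1f}")
```

The single 40,000 miss pulls RMSE well above MAE, which is why a large gap between the two suggests a few large errors rather than uniformly mediocre predictions.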


In [None]:
# Scatter of actual vs. predicted prices; a perfect prediction would fall on the diagonal
plt.figure(figsize=(8,6))
plt.scatter(y_test, y_pred, alpha=0.7)
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("Actual vs Predicted House Prices")
plt.show()
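
One possible refinement of this plot is a diagonal reference line marking where predicted equals actual. The sketch below assumes a hypothetical helper, `plot_actual_vs_predicted`, and uses made-up prices; in the notebook it could be called with `y_test` and `y_pred` instead.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_actual_vs_predicted(actual, predicted):
    """Scatter actual vs. predicted values with a y = x reference line."""
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.scatter(actual, predicted, alpha=0.7)

    # Span the diagonal across the full range of both series
    lo = min(np.min(actual), np.min(predicted))
    hi = max(np.max(actual), np.max(predicted))
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="gray",
            label="Perfect prediction")

    ax.set_xlabel("Actual Prices")
    ax.set_ylabel("Predicted Prices")
    ax.set_title("Actual vs Predicted House Prices")
    ax.legend()
    return ax

# Hypothetical prices for illustration
toy_actual = np.array([200_000.0, 300_000.0, 250_000.0, 400_000.0])
toy_predicted = np.array([210_000.0, 290_000.0, 250_000.0, 360_000.0])

ax = plot_actual_vs_predicted(toy_actual, toy_predicted)
plt.show()
```

Points above the line are over-predictions and points below it are under-predictions.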


## Summary / Insights

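- The dataset is expected to contain numeric features (such as size and number of bedrooms) and may include a categorical `location` column, which is one-hot encoded before training.
- Linear Regression models price as a weighted sum of these features plus an intercept.
- MAE and RMSE measure the model's prediction error on the held-out 20% test set, in the same units as price; lower is better, and RMSE penalizes large errors more heavily than MAE.
- The scatter plot shows how closely predicted prices match actual prices: the closer the points lie to the diagonal, the better the fit.
- This notebook provides a complete workflow from data loading to evaluation.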
