# House Price Prediction with 3D Visualization

This notebook demonstrates house price prediction with multiple linear regression and visualizes the fitted model as an interactive 3D surface. It is set up for the Bengaluru House Price dataset, but the CSV load is commented out and a small inline sample of 47 houses (square footage, number of bedrooms, price) is used instead. The surface plot shows how the predicted price varies with square footage and bedroom count.

## Import Required Libraries

We'll import the necessary libraries for data manipulation, machine learning, and visualization.

In [30]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
import plotly.graph_objects as go
import plotly.express as px

## Load and Prepare Data

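Load the data (an inline sample in place of the Bengaluru CSV) and prepare it for modeling: coerce the columns to numbers, drop incomplete rows, split into training and test sets, and standardize the features.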
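
The cleaning step relies on `pd.to_numeric(..., errors='coerce')`: any entry that cannot be parsed as a number becomes `NaN`, and the following `dropna()` discards that whole row. A minimal sketch on a small made-up frame; the values (including the `'n/a'` and range-style strings) are hypothetical and not taken from the Bengaluru file:

```python
import pandas as pd

# Hypothetical raw data: some entries are not plain numbers
raw = pd.DataFrame({
    'SquareFeet': ['1200', '1200 - 1400', '950', 'n/a'],
    'Bedrooms': [2, 3, 2, 1],
    'Price': ['250000', '310000', 'unknown', '180000'],
})

# Coerce each column; strings that cannot be parsed become NaN
for col in ['SquareFeet', 'Bedrooms', 'Price']:
    raw[col] = pd.to_numeric(raw[col], errors='coerce')

# Drop every row that now contains a missing value
clean = raw.dropna()
print(clean)
```

Only the first row survives here, which shows how quickly coercion plus `dropna()` can shrink a messy file.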

In [31]:
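# Load the dataset
# To use the full Bengaluru CSV, uncomment the next line and map its columns
# (e.g. total_sqft, price) onto the names used below
#df = pd.read_csv('dataset/Bengaluru_House_Data.csv')
# Small inline sample used instead: square footage, bedrooms, price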
df = pd.DataFrame([
    [2104,3,399900], [1600,3,329900], [2400,3,369000], [1416,2,232000],
    [3000,4,539900], [1985,4,299900], [1534,3,314900], [1427,3,198999],
    [1380,3,212000], [1494,3,242500], [1940,4,239999], [2000,3,347000],
    [1890,3,329999], [4478,5,699900], [1268,3,259900], [2300,4,449900],
    [1320,2,299900], [1236,3,199900], [2609,4,499998], [3031,4,599000],
    [1767,3,252900], [1888,2,255000], [1604,3,242900], [1962,4,259900],
    [3890,3,573900], [1100,3,249900], [1458,3,464500], [2526,3,469000],
    [2200,3,475000], [2637,3,299900], [1839,2,349900], [1000,1,169900],
    [2040,4,314900], [3137,3,579900], [1811,4,285900], [1437,3,249900],
    [1239,3,229900], [2132,4,345000], [4215,4,549000], [2162,4,287000],
    [1664,2,368500], [2238,3,329900], [2567,4,314000], [1200,3,299000],
    [852,2,179900], [1852,4,299900], [1203,3,239500]
], columns=['SquareFeet', 'Bedrooms', 'Price'])

# Basic cleaning: coerce each column to numeric (entries that cannot be
# parsed become NaN), then drop rows with missing values
for col in ['SquareFeet', 'Bedrooms', 'Price']:
    df[col] = pd.to_numeric(df[col], errors='coerce')
df = df.dropna()
print(df)

# Create feature matrix X (square footage and number of bedrooms) and target y (price)
X = df[['SquareFeet', 'Bedrooms']]
y = df['Price']

# Split into training and test sets; test_size=0.6 holds out 60% of the rows
# (29 of the 47 inline rows), leaving 18 for training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.6, random_state=42)

# Standardize the features: fit the scaler on the training set only,
# then apply the same transformation to the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

    SquareFeet  Bedrooms   Price
0         2104         3  399900
1         1600         3  329900
2         2400         3  369000
3         1416         2  232000
4         3000         4  539900
5         1985         4  299900
6         1534         3  314900
7         1427         3  198999
8         1380         3  212000
9         1494         3  242500
10        1940         4  239999
11        2000         3  347000
12        1890         3  329999
13        4478         5  699900
14        1268         3  259900
15        2300         4  449900
16        1320         2  299900
17        1236         3  199900
18        2609         4  499998
19        3031         4  599000
20        1767         3  252900
21        1888         2  255000
22        1604         3  242900
23        1962         4  259900
24        3890         3  573900
25        1100         3  249900
26        1458         3  464500
27        2526         3  469000
28        2200         3  475000
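29        2637         3  299900
30        1839         2  349900
31        1000         1  169900
32        2040         4  314900
33        3137         3  579900
34        1811         4  285900
35        1437         3  249900
36        1239         3  229900
37        2132         4  345000
38        4215         4  549000
39        2162         4  287000
40        1664         2  368500
41        2238         3  329900
42        2567         4  314000
43        1200         3  299000
44         852         2  179900
45        1852         4  299900
46        1203         3  239500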

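With `test_size=0.6`, `train_test_split` rounds the test share up, so 29 of 47 rows go to the test set and only 18 remain for training; the scaler then learns its mean and standard deviation from those 18 rows alone. A sketch that checks both points on stand-in random data with the same row count (the values are arbitrary; only the shapes matter for the split sizes):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data with 47 rows, like the inline sample (values are arbitrary)
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(850, 4500, 47), rng.integers(1, 6, 47)])
y = rng.uniform(150_000, 700_000, 47)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.6, random_state=42)
print(len(X_train), len(X_test))  # 18 29: the test count is ceil(0.6 * 47)

# StandardScaler computes z = (x - mean) / std from the training rows only
scaler = StandardScaler().fit(X_train)
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)  # numpy std uses ddof=0
print(np.allclose(scaler.transform(X_train), manual))  # True

# Test rows reuse the training statistics, so their mean is generally not 0
print(scaler.transform(X_test).mean(axis=0))
```
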
## Train Linear Regression Model

Create and train a multiple linear regression model using the prepared data.
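
`LinearRegression` fits an ordinary least-squares model, here price ≈ β₀ + β₁·SquareFeet + β₂·Bedrooms, choosing the coefficients that minimize the sum of squared residuals. A small sketch, using the first eight rows of the inline sample, that solves the same problem with `np.linalg.lstsq` on a design matrix with a leading column of ones and compares the result with scikit-learn's `intercept_` and `coef_`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# First eight rows of the inline sample: (SquareFeet, Bedrooms) -> Price
X = np.array([[2104, 3], [1600, 3], [2400, 3], [1416, 2],
              [3000, 4], [1985, 4], [1534, 3], [1427, 3]], dtype=float)
y = np.array([399900, 329900, 369000, 232000, 539900, 299900, 314900, 198999], dtype=float)

# scikit-learn's ordinary least-squares fit
model = LinearRegression().fit(X, y)

# The same fit by hand: prepend a column of ones for the intercept beta_0,
# then minimize ||A @ beta - y||^2 with a least-squares solver
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

print(beta)  # [beta_0, beta_1, beta_2]
print(model.intercept_, model.coef_)
print(np.allclose(beta, np.r_[model.intercept_, model.coef_]))  # True
```
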

In [32]:
from sklearn.metrics import mean_squared_error, r2_score

# Create and train the model on the standardized features, so that the
# same scaler can be applied to any new inputs (e.g. the plotting grid)
model = LinearRegression()
model.fit(X_train_scaled, y_train)

# Make predictions on the standardized test set
y_pred = model.predict(X_test_scaled)

# R² on the training and test sets
train_score = model.score(X_train_scaled, y_train)
test_score = model.score(X_test_scaled, y_test)
print(f"Training R² score: {train_score:.4f}")
print(f"Testing R² score: {test_score:.4f}")

# Evaluate the test predictions (r2_score gives the same value as test_score)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'R^2 Score: {r2}')

Training R² score: 0.6680
Testing R² score: 0.6750
Mean Squared Error: 5997340131.963454
R^2 Score: 0.6749768372279563
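
Standardizing the features does not change what an ordinary least-squares model with an intercept can represent: the coefficients rescale to compensate, so predictions (and therefore R² and MSE) are the same whether the model is trained on raw or standardized inputs. What does matter is that every new input goes through the same transformation as the training data. A sketch on the first eight rows of the inline sample; the two query points in `X_new` are made up for illustration, and the last lines compute R² from its definition, R² = 1 − SS_res / SS_tot, and MSE as SS_res / n:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler

X = np.array([[2104, 3], [1600, 3], [2400, 3], [1416, 2],
              [3000, 4], [1985, 4], [1534, 3], [1427, 3]], dtype=float)
y = np.array([399900, 329900, 369000, 232000, 539900, 299900, 314900, 198999], dtype=float)

# Fit once on raw features and once on standardized features
raw_model = LinearRegression().fit(X, y)
scaler = StandardScaler().fit(X)
scaled_model = LinearRegression().fit(scaler.transform(X), y)

# Hypothetical new houses: predictions agree when inputs are transformed consistently
X_new = np.array([[1800.0, 3.0], [2500.0, 4.0]])
pred_raw = raw_model.predict(X_new)
pred_scaled = scaled_model.predict(scaler.transform(X_new))
print(np.allclose(pred_raw, pred_scaled))  # True

# Mismatch: scaled inputs fed to the raw-feature model give meaningless prices
print(raw_model.predict(scaler.transform(X_new)))

# R² and MSE from their definitions, checked against scikit-learn
y_hat = raw_model.predict(X)
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print(np.isclose(1 - ss_res / ss_tot, r2_score(y, y_hat)))  # True
print(np.isclose(ss_res / len(y), mean_squared_error(y, y_hat)))  # True
```
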


## Create 3D Surface Plot

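Evaluate the trained model on a 50 × 50 grid spanning the observed ranges of square footage and bedrooms, draw the predictions as a surface, and overlay the actual prices as red markers. Because the model is linear, the surface is a plane, and bedroom counts between integers are simply interpolated.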

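The grid step turns two 1-D ranges into a 2-D grid with `np.meshgrid`, flattens it into one row per (square feet, bedrooms) pair for prediction, and reshapes the predictions back to the grid's shape. A tiny sketch with a 2 × 3 grid and a hypothetical linear function (`fake_predict`) standing in for the fitted model shows that `ravel` and `reshape` line up point for point:

```python
import numpy as np

sqft_range = np.linspace(1000, 3000, 3)  # [1000., 2000., 3000.]
bed_range = np.linspace(2, 4, 2)         # [2., 4.]
sqft_mesh, bed_mesh = np.meshgrid(sqft_range, bed_range)
print(sqft_mesh.shape)  # (2, 3): rows follow bed_range, columns follow sqft_range

# One row per grid point, in the same row-major order as ravel()
points = np.column_stack((sqft_mesh.ravel(), bed_mesh.ravel()))


# Hypothetical stand-in for model.predict: price = 100 * sqft + 5000 * beds
def fake_predict(p):
    return 100 * p[:, 0] + 5000 * p[:, 1]


price_mesh = fake_predict(points).reshape(sqft_mesh.shape)

# Each grid cell holds the prediction for that cell's own coordinates
print(np.allclose(price_mesh, 100 * sqft_mesh + 5000 * bed_mesh))  # True
print(price_mesh)
```
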
In [33]:
# Create a 50 x 50 grid spanning the observed ranges of both features
sqft_range = np.linspace(df['SquareFeet'].min(), df['SquareFeet'].max(), 50)
bed_range = np.linspace(df['Bedrooms'].min(), df['Bedrooms'].max(), 50)
sqft_mesh, bed_mesh = np.meshgrid(sqft_range, bed_range)

# Flatten the grid into a DataFrame with the training column names, then
# apply the same scaling that was used to train the model
mesh_points = pd.DataFrame({'SquareFeet': sqft_mesh.ravel(), 'Bedrooms': bed_mesh.ravel()})
mesh_points_scaled = scaler.transform(mesh_points)

# Generate predictions and reshape them back onto the grid
predicted_prices = model.predict(mesh_points_scaled)
price_mesh = predicted_prices.reshape(sqft_mesh.shape)

# Create the 3D surface plot
fig = go.Figure(data=[go.Surface(
    x=sqft_mesh,
    y=bed_mesh,
    z=price_mesh,
    colorscale='Viridis'
)])

# Update layout with labels and title
fig.update_layout(
    title='House Price Prediction Surface',
    scene=dict(
        xaxis_title='Total Square Feet',
        yaxis_title='Number of Bedrooms',
        zaxis_title='Predicted Price'
    ),
    width=900,
    height=700
)

# Add actual data points
fig.add_trace(go.Scatter3d(
    x=df['SquareFeet'],
    y=df['Bedrooms'],
    z=df['Price'],
    mode='markers',
    marker=dict(
        size=4,
        color='red',
        opacity=0.6
    ),
    name='Actual Prices'
))

# Show the plot
fig.show()
