A junior data scientist at a real estate company is tasked with predicting house prices based on multiple factors. The company wants to understand how variables like square footage, number of bedrooms, and age of the house affect house prices. The data scientist decides to use Multiple Linear Regression to analyze the relationship.

The objective is to predict house prices (y) based on three predictors:

- x<sub>1</sub>: Square footage (sqft)
- x<sub>2</sub>: Number of bedrooms
- x<sub>3</sub>: Age of the house
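
Written out, the model being fit has the standard multiple linear regression form. The coefficient symbols β<sub>0</sub> through β<sub>3</sub> and the error term ε are notation introduced here for illustration; they do not appear in the original text.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon
```

Here β<sub>0</sub> is the intercept, and each slope β<sub>j</sub> is the expected change in y for a one-unit increase in x<sub>j</sub> with the other two predictors held fixed.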

In [1]:
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
# Create a sample dataset
data = pd.DataFrame({
    'square_footage': [1500, 1800, 2400, 3000, 3500, 4000, 4500, 2000, 2200, 3200],
    'num_bedrooms': [3, 3, 4, 4, 5, 5, 6, 3, 3, 4],
    'house_age': [10, 15, 20, 5, 8, 12, 25, 20, 18, 6],
    'house_price': [300000, 350000, 500000, 600000, 650000, 700000, 750000, 400000, 420000, 580000]
})

# Display the first few rows
print(data.head())

   square_footage  num_bedrooms  house_age  house_price
0            1500             3         10       300000
1            1800             3         15       350000
2            2400             4         20       500000
3            3000             4          5       600000
4            3500             5          8       650000


In [3]:
# Define predictors (independent variables) and response (dependent variable)
X = data[['square_footage', 'num_bedrooms', 'house_age']]  # Independent variables
X = sm.add_constant(X)  # Add a constant for the intercept
y = data['house_price']  # Dependent variable


# Verify shape
print(X.shape, y.shape)

(10, 4) (10,)


In [4]:
# Fit the Multiple Linear Regression model
model = sm.OLS(y, X).fit()

# Print the model summary
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:            house_price   R-squared:                       0.975
Model:                            OLS   Adj. R-squared:                  0.963
Method:                 Least Squares   F-statistic:                     79.16
Date:                Sun, 26 Oct 2025   Prob (F-statistic):           3.24e-05
Time:                        10:22:59   Log-Likelihood:                -114.61
No. Observations:                  10   AIC:                             237.2
Df Residuals:                       6   BIC:                             238.4
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
const            1.02e+05    4.8e+04      2.



### Results ###

**Key Metrics**

R<sup>2</sup> = 0.975

- About 97.5% of the variance in house price in this sample is explained by the three predictors in the model
- This indicates a strong fit to the 10 observations used to estimate the model

adjusted R<sup>2</sup> = 0.963

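- Adjusted R<sup>2</sup> penalizes each added predictor, so the slight drop from R<sup>2</sup> reflects the cost of using three predictors with only 10 observations; by itself it does not show that the model is overfit
- The adjusted value remains high, so the model still explains most of the variation in price after that penalty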
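
The adjusted value can be reproduced by hand from the summary figures. The sketch below uses the standard adjusted R<sup>2</sup> formula; the helper name `adjusted_r_squared` is hypothetical, and because the input R<sup>2</sup> is the rounded 0.975 the result only approximately matches the reported 0.963.

```python
# Values read from the OLS summary (R-squared is rounded to 3 decimals there)
r_squared = 0.975
n_obs = 10        # "No. Observations"
n_predictors = 3  # "Df Model"


def adjusted_r_squared(r2, n, p):
    """Standard adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


adj_r2 = adjusted_r_squared(r_squared, n_obs, n_predictors)
print(f"Adjusted R^2 is approximately {adj_r2:.4f}")
```

The factor (n - 1) / (n - p - 1) is 9/6 here, which is why the penalty is noticeable even with only three predictors.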

F-statistic = 79.16, p-value = 3.24 × 10<sup>-5</sup>

- The F-statistic tests the null hypothesis that all slope coefficients (every coefficient except the intercept) are zero
- The very small p-value indicates that the model overall is statistically significant
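
As a rough consistency check, the F-statistic can be recovered from R<sup>2</sup> using the standard identity for a model with an intercept, where n = 10 is the number of observations and p = 3 is the number of predictors. Using the rounded R<sup>2</sup> = 0.975 gives a value near, but not exactly equal to, the reported 79.16; the gap is within what rounding R<sup>2</sup> to three decimals allows.

```latex
F = \frac{R^2 / p}{(1 - R^2) / (n - p - 1)}
  = \frac{0.975 / 3}{0.025 / 6}
  \approx 78.0
```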

Coefficients

- Intercept p-value is 0.078, which is not statistically significant at the 0.05 level, although it would be at the 0.10 level
- Square-footage p-value is 0.007, which is statistically significant. The coefficient indicates that each additional square foot is associated with an increase of about $140.55 in price, holding number of bedrooms and house age constant
- Number of bedrooms p-value is 0.716, which is not statistically significant
- House age p-value is 0.33, which is not statistically significant

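Overall, square footage is the only one of the three predictors whose coefficient is statistically significant in this model. That does not show that number of bedrooms and house age have no effect on price: the sample contains only 10 houses, and the number of bedrooms rises closely with square footage in this data, which makes their separate effects hard to estimate.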
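
One way to see how strongly the predictors move together, which matters when reading the individual p-values, is a pairwise correlation matrix. This check was not part of the original analysis; it is a sketch using only pandas on the same sample data, and the variable names `predictors` and `corr` are chosen here for illustration.

```python
import pandas as pd

# Predictor columns from the notebook's sample dataset
predictors = pd.DataFrame({
    'square_footage': [1500, 1800, 2400, 3000, 3500, 4000, 4500, 2000, 2200, 3200],
    'num_bedrooms': [3, 3, 4, 4, 5, 5, 6, 3, 3, 4],
    'house_age': [10, 15, 20, 5, 8, 12, 25, 20, 18, 6],
})

# Pearson correlation between every pair of predictors
corr = predictors.corr()
print(corr.round(2))

# Correlation between the two size-related predictors
sqft_beds = corr.loc['square_footage', 'num_bedrooms']
print(f"square_footage vs num_bedrooms: {sqft_beds:.2f}")
```

When two predictors are this closely related, the regression has little independent information with which to separate their effects, so a large p-value for one of them is weak evidence that it is irrelevant.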