Part 3: Multiple Linear Regression
Fit a multiple linear regression model using Building Height, Material Quality, Labor Cost, Concrete Strength, and Foundation Depth as independent variables.
- What is the equation of the multiple regression model?
- Which independent variable has the highest impact on Construction Cost based on the regression coefficients?
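
A note on the second question: raw regression coefficients are expressed in the units of each predictor, so the largest raw coefficient is not necessarily the most influential variable. One common way to put predictors on a comparable scale is the standardized coefficient, b_j * sd(x_j) / sd(y). The sketch below illustrates the idea on synthetic data only. The helper `standardized_coefficients`, the column names `small_scale` and `large_scale`, and all simulated values are hypothetical and are not taken from the construction dataset.

```python
import numpy as np
import pandas as pd


def standardized_coefficients(params: pd.Series, X: pd.DataFrame, y: pd.Series) -> pd.Series:
    """Rescale raw OLS slopes to standard-deviation units: b_j * sd(x_j) / sd(y)."""
    slopes = params.drop(labels="const", errors="ignore")  # ignore the intercept
    return slopes * X[slopes.index].std() / y.std()


# Synthetic data (assumption: two predictors measured on very different scales).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "small_scale": rng.normal(0, 1, n),     # standard deviation near 1
    "large_scale": rng.normal(0, 1000, n),  # standard deviation near 1000
})
y = pd.Series(
    5.0 * X["small_scale"] + 0.05 * X["large_scale"] + rng.normal(0, 1, n),
    name="y",
)

# Ordinary least squares with an intercept column, fitted with NumPy.
design = np.column_stack([np.ones(n), X.to_numpy()])
coef, *_ = np.linalg.lstsq(design, y.to_numpy(), rcond=None)
params = pd.Series(coef, index=["const", *X.columns])

# Ranking by raw coefficients versus ranking by standardized coefficients.
raw_winner = params.drop("const").abs().idxmax()
std_winner = standardized_coefficients(params, X, y).abs().idxmax()

print("Largest raw coefficient:         ", raw_winner)
print("Largest standardized coefficient:", std_winner)
```

In this synthetic setup the two rankings disagree, because `large_scale` has a small per-unit slope but varies over a much wider range. With a fitted statsmodels result, the same helper should accept `model.params` together with the predictor DataFrame and the response Series.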


In [14]:
import pandas as pd
import statsmodels.api as sm

try:
    df = pd.read_csv("construction_cost_data.csv")
except FileNotFoundError:
    # Abort with a clear message; the built-in exit() is meant for interactive shells.
    raise SystemExit("Error: 'construction_cost_data.csv' not found.")

X = df[['Building_Height', 'Material_Quality_Index', 'Labor_Cost', 'Concrete_Strength', 'Foundation_Depth']]
y = df['Construction_Cost']
X = sm.add_constant(X)
model = sm.OLS(y, X).fit()

print(model.summary())

coefficients = model.params

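print("\nEquation of the multiple regression model:")
# Build the equation term by term, printing negative coefficients as "- 1.5 * x" rather than "+ -1.5 * x".
equation = f"Construction Cost = {coefficients['const']}"
for col in X.columns[1:]:  # skip the constant, which is already included above
    sign = "-" if coefficients[col] < 0 else "+"
    equation += f" {sign} {abs(coefficients[col])} * {col}"
print(equation)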

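# Rank predictors by absolute raw coefficient, excluding the constant.
# Raw coefficients depend on each predictor's units, so this ranking reflects the
# change in cost per one-unit change in each variable, not a scale-free importance.
abs_coeffs = coefficients[X.columns[1:]].abs()
highest_impact_variable = abs_coeffs.idxmax()  # name of the predictor with the largest absolute coefficient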

print("\nIndependent variable with the highest impact on Construction Cost:", highest_impact_variable)

                            OLS Regression Results                            
Dep. Variable:      Construction_Cost   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 9.153e+04
Date:                Thu, 06 Feb 2025   Prob (F-statistic):          1.23e-171
Time:                        11:00:08   Log-Likelihood:                -372.31
No. Observations:                 100   AIC:                             756.6
Df Residuals:                      94   BIC:                             772.3
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                             coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------
const                    -15