6. Multiple Linear Regression
Scenario-based question:
A startup wants to predict employees' monthly salary from three factors: years of experience, education level (in years of study), and weekly working hours. The HR team has collected data from 10 employees and wants to build a Multiple Linear Regression model to:

--Identify which factors significantly impact salary.

--Predict salaries of new employees.

--Evaluate the model’s performance.
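
Written out, the model described in this scenario has the form below. The β symbols follow the coefficient labels used in the notebook's print statements; the error term ε and the observation index i are notation assumed here, not taken from the notebook.

```latex
\[
\text{salary}_i = \beta_0 + \beta_1\,\text{experience}_i + \beta_2\,\text{education}_i + \beta_3\,\text{hours}_i + \varepsilon_i,
\qquad i = 1, \dots, 10
\]
```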

In [1]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import statsmodels.api as sm

In [2]:
# Sample dataset
data = {
    "experience": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],     # Years of experience
    "education":  [12, 12, 14, 14, 16, 16, 18, 18, 20, 20],  # Years of education
    "hours":      [35, 36, 37, 38, 40, 42, 44, 45, 47, 50],  # Weekly hours worked
    "salary":     [25, 28, 32, 36, 42, 48, 55, 60, 68, 75]   # Monthly salary (in $1000s)
}

In [3]:
df = pd.DataFrame(data)


In [4]:
# Features and Target
X = df[["experience", "education", "hours"]]
y = df["salary"]
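
Because one goal is to identify which factors matter, it may help to check how strongly the three predictors move together before fitting. The sketch below uses NumPy only and computes pairwise correlations and variance inflation factors (VIF). The helper name `vif` and the variable names are hypothetical, and reading a VIF well above 10 as a warning sign is a common rule of thumb, not a hard threshold.

```python
import numpy as np

# Predictor columns copied from the notebook's sample dataset.
experience = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
education = np.array([12, 12, 14, 14, 16, 16, 18, 18, 20, 20], dtype=float)
hours = np.array([35, 36, 37, 38, 40, 42, 44, 45, 47, 50], dtype=float)

names = ["experience", "education", "hours"]
features = np.column_stack([experience, education, hours])

# Pairwise Pearson correlations between the predictors (columns are variables).
corr = np.corrcoef(features, rowvar=False)


def vif(features, j):
    """Hypothetical helper: VIF of column j, computed as 1 / (1 - R^2),
    where R^2 comes from regressing column j on the other columns
    plus an intercept."""
    target = features[:, j]
    others = np.delete(features, j, axis=1)
    design = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residuals = target - design @ coef
    total = np.sum((target - target.mean()) ** 2)
    r2 = 1.0 - (residuals @ residuals) / total
    return 1.0 / (1.0 - r2)


vifs = {name: vif(features, j) for j, name in enumerate(names)}

print(np.round(corr, 3))
for name, value in vifs.items():
    print(f"VIF({name}) = {value:.1f}")
```

In this dataset all three predictors rise almost in lockstep, so individual coefficients and their p-values should be interpreted with caution even when the overall fit is very good.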

In [5]:
# Train Multiple Linear Regression model
model = LinearRegression()
model.fit(X, y)
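
`LinearRegression` fits the model by ordinary least squares. As a cross-check, the same least-squares problem can be solved directly with NumPy. This is a minimal sketch, not a description of scikit-learn's internals; the names `design` and `beta` are assumptions introduced for illustration.

```python
import numpy as np

# Data copied from the notebook's sample dataset.
experience = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
education = np.array([12, 12, 14, 14, 16, 16, 18, 18, 20, 20], dtype=float)
hours = np.array([35, 36, 37, 38, 40, 42, 44, 45, 47, 50], dtype=float)
salary = np.array([25, 28, 32, 36, 42, 48, 55, 60, 68, 75], dtype=float)

# Design matrix: a column of ones for the intercept, then the three predictors.
design = np.column_stack([np.ones(len(salary)), experience, education, hours])

# Least-squares solution: minimizes the sum of squared residuals.
beta, *_ = np.linalg.lstsq(design, salary, rcond=None)
intercept, coefs = beta[0], beta[1:]

print("Intercept:", intercept)
print("Coefficients (experience, education, hours):", coefs)
```

The result should agree with `model.intercept_` and `model.coef_` up to floating-point rounding.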

In [6]:
# In-sample predictions (on the same rows used for fitting)
y_pred = model.predict(X)

In [7]:
# Evaluation on the training data; with only 10 rows and no held-out set, these metrics tend to be optimistic
mse = mean_squared_error(y, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y, y_pred)
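
For reference, the three metrics computed in this cell correspond to the formulas below. The symbols are notation assumed here: n is the number of observations, ŷ is the model's prediction, and ȳ is the mean of the observed salaries.

```latex
\[
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
\]
```

RMSE is expressed in the same units as the target, here thousands of dollars.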

In [8]:
print("Model Coefficients (β values):", model.coef_)
print("Model Intercept (β0):", model.intercept_)
print("Mean Squared Error (MSE):", mse)
print("Root Mean Squared Error (RMSE):", rmse)
print("R² Score:", r2)

Model Coefficients (β values): [0.6952381  0.54047619 2.69047619]
Model Intercept (β0): -76.95714285714286
Mean Squared Error (MSE): 0.5547619047619035
Root Mean Squared Error (RMSE): 0.7448234050846573
R² Score: 0.9979260461895327
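
These metrics are computed on the same 10 rows used to fit the model, so they say little about how the model would do on unseen employees. One hedged way to get an out-of-sample estimate from so few rows is leave-one-out cross-validation, sketched below with NumPy only. The helper `fit_predict` and the variable names are hypothetical, and this is an illustration, not part of the original notebook.

```python
import numpy as np

# Data copied from the notebook's sample dataset.
experience = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
education = np.array([12, 12, 14, 14, 16, 16, 18, 18, 20, 20], dtype=float)
hours = np.array([35, 36, 37, 38, 40, 42, 44, 45, 47, 50], dtype=float)
salary = np.array([25, 28, 32, 36, 42, 48, 55, 60, 68, 75], dtype=float)

design = np.column_stack([np.ones(len(salary)), experience, education, hours])


def fit_predict(train_X, train_y, test_X):
    """Hypothetical helper: fit by least squares on the training rows,
    then predict for the test rows."""
    beta, *_ = np.linalg.lstsq(train_X, train_y, rcond=None)
    return test_X @ beta


n = len(salary)

# In-sample RMSE: fit and evaluate on all 10 rows.
in_sample_pred = fit_predict(design, salary, design)
in_sample_rmse = np.sqrt(np.mean((salary - in_sample_pred) ** 2))

# Leave-one-out: hold out each row in turn, fit on the other nine,
# and predict the held-out row.
loo_pred = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    loo_pred[i] = fit_predict(design[keep], salary[keep], design[i])

loo_rmse = np.sqrt(np.mean((salary - loo_pred) ** 2))

print(f"In-sample RMSE:     {in_sample_rmse:.3f}")
print(f"Leave-one-out RMSE: {loo_rmse:.3f}")
```

For ordinary least squares the leave-one-out error is never smaller than the in-sample error, so the gap between the two numbers gives a rough sense of how optimistic the in-sample figure is.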


In [9]:
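# New employee profile: 5 years of experience, 16 years of education, 40 weekly hours.
# Note: this matches the fifth row of the training data, whose recorded salary is 42.
new_employee = pd.DataFrame({"experience": [5], "education": [16], "hours": [40]})
predicted_salary = model.predict(new_employee)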

In [10]:
print("\nPredicted Salary for New Employee: $", round(predicted_salary[0], 2), "k")


Predicted Salary for New Employee: $ 42.79 k
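
The prediction is simply the intercept plus the sum of each coefficient times its feature value. The short sketch below reproduces it by hand from the coefficients printed earlier; the variable names are chosen here for illustration, and the coefficient values are the rounded ones shown in the notebook output.

```python
import numpy as np

# Values copied from the notebook's printed output (rounded as displayed).
intercept = -76.95714285714286
coefs = np.array([0.6952381, 0.54047619, 2.69047619])  # experience, education, hours

# Same profile as the notebook's new employee.
new_employee = np.array([5, 16, 40])

# Linear model: intercept + sum of coefficient * feature value.
predicted_salary = intercept + coefs @ new_employee
print(round(float(predicted_salary), 2))  # 42.79, matching the notebook
```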


In [11]:
# Statistical summary using statsmodels
X_sm = sm.add_constant(X)   # add intercept term
ols_model = sm.OLS(y, X_sm).fit()
print("\nStatistical Summary:\n", ols_model.summary())


Statistical Summary:
                             OLS Regression Results                            
Dep. Variable:                 salary   R-squared:                       0.998
Model:                            OLS   Adj. R-squared:                  0.997
Method:                 Least Squares   F-statistic:                     962.3
Date:                Mon, 01 Sep 2025   Prob (F-statistic):           1.95e-08
Time:                        13:49:05   Log-Likelihood:                -11.243
No. Observations:                  10   AIC:                             30.49
Df Residuals:                       6   BIC:                             31.70
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        -76.9571     16.