# Module 4: Multiple Linear Regression
Multiple Linear Regression (MLR) is an extension of simple linear regression where more than one independent variable is used to predict a dependent variable.

## 🎯 Learning Objectives
- Understand how multiple variables influence an outcome
- Build and interpret multiple regression models in Python
- Evaluate model fit with R-squared and assess individual predictors with p-values
- Check for multicollinearity and examine residuals

## 🧠 What is Multiple Linear Regression?
MLR models the linear relationship between a continuous outcome and two or more predictors.

General form:
$$ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon $$
- $Y$: Dependent variable
- $X_1, \dots, X_k$: Independent variables
- $\beta_0$: Intercept
- $\beta_1, \dots, \beta_k$: Coefficients
- $\varepsilon$: Error term
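
To make the general form concrete, here is a minimal sketch that recovers the coefficients by ordinary least squares using NumPy. The data are hypothetical and noise-free (so $\varepsilon = 0$), generated from $Y = 1 + 2X_1 + 3X_2$; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical, noise-free data generated from Y = 1 + 2*X1 + 3*X2
X1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
Y = 1.0 + 2.0 * X1 + 3.0 * X2

# Design matrix: a column of ones (for the intercept) plus one column per predictor
A = np.column_stack([np.ones_like(X1), X1, X2])

# Least-squares solution for [beta_0, beta_1, beta_2]
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
beta0, beta1, beta2 = beta

print(f"beta_0 = {beta0:.3f}, beta_1 = {beta1:.3f}, beta_2 = {beta2:.3f}")
```

Because the data contain no noise, the estimates should match the generating coefficients up to floating-point error. With real data, the error term means the estimates only approximate the underlying relationship.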

## 📦 Example: Predicting Exam Scores
We’ll use hours studied and attendance rate to predict exam scores.

In [None]:
import pandas as pd
import statsmodels.api as sm

# Simulated dataset
data = pd.DataFrame({
    'hours_studied': [2, 3, 5, 7, 9, 10],
    'attendance': [80, 85, 90, 95, 98, 99],
    'score': [50, 55, 65, 70, 78, 85]
})

X = data[['hours_studied', 'attendance']]
X = sm.add_constant(X)  # add intercept
y = data['score']

model = sm.OLS(y, X).fit()
print(model.summary())

## 🔍 Interpreting Output
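- **Coefficients**: The expected change in the outcome for a one-unit increase in a predictor, holding the other predictors constant
- **R-squared**: Proportion of the variance in the outcome explained by the model (between 0 and 1)
- **p-values**: Test whether each coefficient differs from zero; a small p-value (commonly below 0.05) suggests the predictor is statistically significant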

## 📏 Checking Multicollinearity with VIF
The variance inflation factor (VIF) measures how much the variance of a coefficient estimate is inflated by correlation among the predictors. A VIF of 1 means a predictor is uncorrelated with the others; values above roughly 5 to 10 are commonly treated as a sign of problematic multicollinearity.

In [None]:
from statsmodels.stats.outliers_influence import variance_inflation_factor

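# Compute one VIF per column of X. X must include the constant so that each
# auxiliary regression has an intercept. The 'const' row is not a predictor
# and can be ignored when reading the table.
vif_data = pd.DataFrame()
vif_data['feature'] = X.columns
vif_data['VIF'] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
print(vif_data)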
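
To see what the VIF represents, it can be computed by hand: regress predictor $j$ on the remaining predictors, take the resulting $R_j^2$, and form $\text{VIF}_j = 1 / (1 - R_j^2)$. The following is a minimal NumPy sketch of that definition on the same two predictors; the helper `vif_by_hand` is hypothetical and written only for illustration.

```python
import numpy as np

hours = np.array([2, 3, 5, 7, 9, 10], dtype=float)
attendance = np.array([80, 85, 90, 95, 98, 99], dtype=float)

def vif_by_hand(target, others):
    """VIF of `target`: 1 / (1 - R^2) from regressing it on `others` plus an intercept."""
    A = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    residuals = target - A @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 / (1.0 - r2)

vif_hours = vif_by_hand(hours, attendance)
vif_attendance = vif_by_hand(attendance, hours)

print(f"VIF hours_studied: {vif_hours:.2f}")
print(f"VIF attendance:    {vif_attendance:.2f}")
```

With exactly two predictors, both VIFs are equal, because each one depends only on the squared correlation between the pair. In this simulated dataset the two predictors rise together, so a high VIF is expected.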

## 📈 Visualizing Residuals
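If the model assumptions hold, the residuals should scatter randomly around zero with no visible pattern. A curved trend suggests a nonlinear relationship, and a spread that widens or narrows across the fitted values suggests non-constant variance (heteroscedasticity).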

In [None]:
import matplotlib.pyplot as plt

residuals = model.resid
fitted = model.fittedvalues

plt.scatter(fitted, residuals)
plt.axhline(0, color='red', linestyle='--')
plt.xlabel('Fitted Values')
plt.ylabel('Residuals')
plt.title('Residuals vs Fitted')
plt.show()

## 🧠 Good Practices in MLR
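- Avoid using too many predictors with small datasets, since the model can overfit and the estimates become unstable
- Check correlation among predictors before fitting; strongly correlated predictors make individual coefficients hard to interpret
- Interpret coefficients in context, including their units and the range of the observed data
- Use adjusted R² when comparing models with different numbers of predictors, because plain R² never decreases when a predictor is added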
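
As a rough illustration of the last point, adjusted R² is defined as $1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}$, where $n$ is the number of observations and $k$ the number of predictors. The sketch below applies that formula; the function name and the R² value of 0.90 are hypothetical, chosen only to show how the penalty grows with $k$.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("Need more observations than predictors plus one.")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof

# Hypothetical R^2 of 0.90 on 6 observations, with 2 versus 4 predictors
two_predictors = adjusted_r2(0.90, n_obs=6, n_predictors=2)
four_predictors = adjusted_r2(0.90, n_obs=6, n_predictors=4)

print(f"k=2: {two_predictors:.2f}")
print(f"k=4: {four_predictors:.2f}")
```

For the same R², the adjusted value drops from about 0.83 with two predictors to about 0.50 with four, which is why it is the safer figure to compare across models of different sizes. statsmodels exposes the same quantity on a fitted model as `rsquared_adj`.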

## ✅ Practice Exercises
1. Use a dataset with at least three predictors to fit an MLR model.
2. Interpret the regression coefficients and R-squared.
3. Check for multicollinearity using VIF.
4. Create a residual plot and assess model assumptions.
5. Try adding or removing predictors. How do R-squared and adjusted R² change?