# Multiple Linear Regression Model
The modeling objective is to understand how different service-level features contribute to the **Overall Rating** assigned by passengers. To achieve this, a multiple linear regression model will be built using the **Ordinary Least Squares (OLS) method** from the *statsmodels* library.

import pandas as pd
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import train_test_split

# Load the dataset
df = pd.read_csv("Airline_review.csv")

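# Convert overall rating to numeric (unparseable entries become NaN) and store it
# in a new "Overall Rating" column; the raw CSV column is named "Overall_Rating"
df['Overall Rating'] = pd.to_numeric(df['Overall_Rating'], errors='coerce')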

# Define feature columns and target variable
features = ["Seat Comfort", "Cabin Staff Service", "Food & Beverages", "Ground Service", 
            "Inflight Entertainment", "Wifi & Connectivity", "Value For Money"]
target = "Overall Rating"

# Convert feature columns to numeric; non-numeric entries become NaN
df[features] = df[features].apply(pd.to_numeric, errors='coerce')

# Drop rows with missing values in either features or target
df = df.dropna(subset=features + [target])

# Prepare independent (X) and dependent (y) variables
X = df[features]
y = df[target]

# Add constant term for intercept
X = sm.add_constant(X)

# Fit an OLS regression model
model = sm.OLS(y, X).fit()

# Print summary with coefficients and statistical significance (p-values)
print(model.summary())


                            OLS Regression Results                            
Dep. Variable:         Overall Rating   R-squared:                       0.437
Model:                            OLS   Adj. R-squared:                  0.432
Method:                 Least Squares   F-statistic:                     91.07
Date:                Wed, 12 Mar 2025   Prob (F-statistic):           4.42e-98
Time:                        16:10:53   Log-Likelihood:                -1642.3
No. Observations:                 830   AIC:                             3301.
Df Residuals:                     822   BIC:                             3338.
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
                             coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------
const                     -0

## 1. Model Fit (R-Squared and Adjusted R-Squared)
- R-squared = 0.437 → The model explains 43.7% of the variation in Overall Rating using the selected predictors.
- Adjusted R-squared = 0.432 → R² penalized for the number of predictors. It is almost identical to R², so the seven predictors are not artificially inflating the fit. Roughly 57% of the variation remains unexplained by this model.

🔹 Interpretation: The model has moderate explanatory power. Variables not included here (e.g., ticket price, flight duration, delays) might account for part of the remaining variation.
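
As a quick sanity check, the adjusted R² can be recomputed from the figures in the regression summary (R² = 0.437, 830 observations, 7 predictors). The helper name `adjusted_r_squared` is illustrative and not part of the original notebook; this is only a sketch of the standard formula.

```python
def adjusted_r_squared(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Inputs read from the OLS summary: R^2 = 0.437, n = 830, p = 7.
adj_r2 = adjusted_r_squared(0.437, 830, 7)
print(round(adj_r2, 3))  # 0.432
```

The penalty factor (n - 1) / (n - p - 1) is close to 1 here because the sample is large relative to the number of predictors, which is why the two values barely differ.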

## 2. Statistical Significance of the Model (F-Statistic and P-Value)
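- F-statistic = 91.07, Prob (F-statistic) = 4.42e-98 (effectively zero).
- The null hypothesis of the F-test is that all seven slope coefficients are zero. A p-value this small rejects it: at least one predictor is linearly associated with Overall Rating.

🔹 Interpretation: The model is statistically significant overall.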
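
The overall F-statistic is tied directly to R²: it compares explained variation per predictor with unexplained variation per residual degree of freedom. The sketch below recomputes it from the rounded summary values, so it will only approximately match the reported 91.07. The function name `overall_f_statistic` is hypothetical.

```python
def overall_f_statistic(r2: float, n_obs: int, n_predictors: int) -> float:
    """F = (R^2 / p) / ((1 - R^2) / (n - p - 1))."""
    explained_per_predictor = r2 / n_predictors
    unexplained_per_df = (1.0 - r2) / (n_obs - n_predictors - 1)
    return explained_per_predictor / unexplained_per_df

# Inputs read from the OLS summary: R^2 = 0.437 (rounded), n = 830, p = 7.
f_stat = overall_f_statistic(0.437, 830, 7)
print(round(f_stat, 2))  # close to the reported 91.07; the gap comes from rounding R^2
```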

## 3. Interpretation of Individual Predictors (Coefficients, T-Values, and P-Values)
Each coefficient tells us how much Overall Rating changes when the respective feature increases by one unit, holding all other variables constant.

| Predictor               | Coefficient | P-Value | Interpretation |
|-------------------------|------------|---------|---------------|
| **const (Intercept)**   | -0.0631    | 0.616   | Not significant. Predicted rating when all predictors equal 0. |
| **Seat Comfort**        | -0.0244    | 0.754   | 🚫 Not significant (P > 0.05). No detectable effect once the other predictors are held constant. |
| **Cabin Staff Service** | 0.3128     | < 0.001 | ✅ Significant positive effect: about +0.31 rating points per one-unit increase. |
| **Food & Beverages**    | 0.0722     | 0.353   | 🚫 Not significant (P > 0.05). No detectable effect once the other predictors are held constant. |
| **Ground Service**      | 0.1837     | 0.007   | ✅ Significant, moderate positive effect: about +0.18 rating points per one-unit increase. |
| **Inflight Entertainment** | -0.0463  | 0.543   | 🚫 Not significant (P > 0.05). No detectable effect once the other predictors are held constant. |
| **Wifi & Connectivity** | -0.0676    | 0.384   | 🚫 Not significant (P > 0.05). No detectable effect once the other predictors are held constant. |
| **Value For Money**     | 0.7428     | < 0.001 | ✅ Largest significant effect: about +0.74 rating points per one-unit increase. |


#### Key Takeaways:

The predictors with a statistically significant effect on Overall Rating are:

- ✅ Value for Money (Strongest effect, Coefficient = 0.7428, P < 0.001)
- ✅ Cabin Staff Service (Positive effect, Coefficient = 0.3128, P < 0.001)
- ✅ Ground Service (Moderate effect, Coefficient = 0.1837, P = 0.007)

Predictors that, perhaps surprisingly, have no statistically significant effect on Overall Rating:

- 🚫 Seat Comfort (P = 0.754)
- 🚫 Food & Beverages (P = 0.353)
- 🚫 Inflight Entertainment (P = 0.543)
- 🚫 Wifi & Connectivity (P = 0.384)

Possible Explanation: Passengers might weigh Value for Money and Cabin Staff Service more heavily than Seat Comfort, Entertainment, or Wifi. A non-significant coefficient only means the effect could not be distinguished from zero after controlling for the other predictors; it does not prove the feature is irrelevant to passengers.
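
To make the "holding all other variables constant" reading concrete, the sketch below plugs the fitted coefficients from the table into the regression equation. The review values (every feature rated 3) are invented purely for illustration, and `predict_rating` is a hypothetical helper rather than anything from the notebook.

```python
# Coefficients copied from the regression table above.
coefficients = {
    "const": -0.0631,
    "Seat Comfort": -0.0244,
    "Cabin Staff Service": 0.3128,
    "Food & Beverages": 0.0722,
    "Ground Service": 0.1837,
    "Inflight Entertainment": -0.0463,
    "Wifi & Connectivity": -0.0676,
    "Value For Money": 0.7428,
}

def predict_rating(review: dict) -> float:
    """Intercept plus the sum of coefficient * feature value."""
    total = coefficients["const"]
    for name, value in review.items():
        total += coefficients[name] * value
    return total

# Hypothetical review: every feature rated 3 (illustrative values only).
baseline_review = {name: 3 for name in coefficients if name != "const"}
baseline = predict_rating(baseline_review)

# Same review, but Value For Money raised by one unit.
improved_review = dict(baseline_review, **{"Value For Money": 4})
improved = predict_rating(improved_review)

print(round(baseline, 4), round(improved, 4), round(improved - baseline, 4))
```

The difference between the two predictions equals the Value For Money coefficient (0.7428), which is exactly what the coefficient is meant to express.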


## 4. Multicollinearity Check (Condition Number)
- Condition Number = 13.3 (below the common rule-of-thumb threshold of 30).

🔹 Interpretation: No sign of serious multicollinearity; the predictors are not strongly linearly dependent on one another.
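
The condition number is a single global figure. A complementary per-predictor check is the variance inflation factor (VIF), where values near 1 indicate little collinearity and values above roughly 10 are often treated as a warning sign. The sketch below is not the notebook's method: it uses synthetic data rather than the airline dataset, and computes VIFs from the inverse correlation matrix with NumPy only.

```python
import numpy as np

def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(0)

# Synthetic stand-in for three unrelated predictors (not the airline data).
independent = rng.normal(size=(830, 3))
vif_independent = variance_inflation_factors(independent)

# Append a column that is almost a copy of the first one to simulate collinearity.
near_copy = independent[:, 0] + 0.1 * rng.normal(size=830)
collinear = np.column_stack([independent, near_copy])
vif_collinear = variance_inflation_factors(collinear)

print(vif_independent.round(2))  # all close to 1
print(vif_collinear.round(2))    # first and last entries are very large
```

On the real data, the same function could be applied to `df[features].to_numpy()`. statsmodels also provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence` for this purpose.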


## 5. Residual Analysis (Model Assumptions)
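- Durbin-Watson = 1.966 → Close to 2, indicating no meaningful first-order autocorrelation in the residuals (good).
- Jarque-Bera Test (Prob = 2.25e-32) → Strongly rejects the hypothesis that the residuals are normally distributed. With 830 observations the coefficient tests are fairly robust to this, but a residual histogram or Q-Q plot would be a sensible follow-up check.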
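
Both diagnostics are simple functions of the residuals. The sketch below implements the textbook formulas with NumPy and applies them to synthetic residuals (not the model's actual residuals) to show how each statistic behaves. It returns only the Jarque-Bera statistic, not its p-value.

```python
import numpy as np

def durbin_watson(resid) -> float:
    """Sum of squared successive differences divided by the sum of squared residuals."""
    resid = np.asarray(resid, dtype=float)
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))

def jarque_bera_statistic(resid) -> float:
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), with S the sample skewness and K the kurtosis."""
    resid = np.asarray(resid, dtype=float)
    n = resid.size
    centered = resid - resid.mean()
    variance = np.mean(centered ** 2)
    skewness = np.mean(centered ** 3) / variance ** 1.5
    kurtosis = np.mean(centered ** 4) / variance ** 2
    return float(n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0))

rng = np.random.default_rng(0)
normal_resid = rng.normal(size=830)             # well-behaved synthetic residuals
skewed_resid = rng.exponential(size=830) - 1.0  # strongly right-skewed synthetic residuals

dw = durbin_watson(normal_resid)                 # near 2 for uncorrelated residuals
jb_normal = jarque_bera_statistic(normal_resid)  # small for normal residuals
jb_skewed = jarque_bera_statistic(skewed_resid)  # very large for skewed residuals
print(round(dw, 3), round(jb_normal, 2), round(jb_skewed, 2))
```

For the fitted model, the same functions could be called on `model.resid`. statsmodels exposes equivalent `durbin_watson` and `jarque_bera` functions in `statsmodels.stats.stattools`.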


## Final Summary:
✅ Key predictors:

- Value for Money (Most important)
- Cabin Staff Service
- Ground Service

❌ Not significant predictors:

- Seat Comfort
- Food & Beverages
- Inflight Entertainment
- Wifi & Connectivity
  
🎯 Insights for Airlines:

- Invest in "Value for Money" improvements (better services at competitive prices).
- Train and support cabin staff for better customer service.
- Ground service matters – ensure smooth check-in, baggage handling, and boarding.