**T-test and ANOVA on Regression Coefficients**

In [1]:
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

This cell imports pandas, the ols() function (ordinary least squares regression) from the formula API of the statsmodels library, and anova_lm() for building ANOVA tables.

In [2]:
# Illustrative dataset: predicting student marks from hours studied and hours of sleep
data = {
    'Study_Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Sleep_Hours': [8, 7.5, 7, 6.5, 6, 6, 5.5, 5, 5, 4.5],
    'Marks': [40, 45, 50, 52, 55, 58, 60, 64, 68, 70]
}

In [3]:
df = pd.DataFrame(data)


In [4]:

# Step 1: Build regression model using formula-based OLS
model = ols('Marks ~ Study_Hours + Sleep_Hours', data=df).fit()

In [5]:
# Step 2: Perform t-test on coefficients
print("🔹 T-Test Results for Individual Coefficients:")
print(model.summary())

🔹 T-Test Results for Individual Coefficients:
                            OLS Regression Results                            
Dep. Variable:                  Marks   R-squared:                       0.991
Model:                            OLS   Adj. R-squared:                  0.988
Method:                 Least Squares   F-statistic:                     379.2
Date:                Thu, 04 Sep 2025   Prob (F-statistic):           7.31e-08
Time:                        05:46:16   Log-Likelihood:                -12.951
No. Observations:                  10   AIC:                             31.90
Df Residuals:                       7   BIC:                             32.81
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
Inte

In [6]:
# Step 3: Perform ANOVA (Type II partial F-test for each predictor)
print("\n🔹 ANOVA Table (Type II partial F-tests for each predictor):")
anova_results = anova_lm(model, typ=2)
print(anova_results)


🔹 ANOVA Table (Type II partial F-tests for each predictor):
                sum_sq   df         F    PR(>F)
Study_Hours  10.492295  1.0  9.408896  0.018134
Sleep_Hours   0.993976  1.0  0.891341  0.376551
Residual      7.806024  7.0       NaN       NaN


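T-test (from model.summary()):
The coefficient table reports a t-statistic and a p-value for each term:

- Intercept
- Study_Hours
- Sleep_Hours

A p-value below 0.05 means the coefficient is statistically significant at the 5% level. Because each predictor has a single degree of freedom, these p-values match the ones in the ANOVA output: Study_Hours (p ≈ 0.018) is significant, while Sleep_Hours (p ≈ 0.377) is not.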
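
Since the printed summary is long, it can help to pull the t-test quantities out of the fitted model directly. The sketch below uses the params, bse, tvalues, and pvalues attributes of the statsmodels results object; the coef_table name and the alpha threshold variable are illustrative choices, not part of the original notebook.

```python
import pandas as pd
from statsmodels.formula.api import ols

# Same data and model as in the notebook
df = pd.DataFrame({
    'Study_Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Sleep_Hours': [8, 7.5, 7, 6.5, 6, 6, 5.5, 5, 5, 4.5],
    'Marks': [40, 45, 50, 52, 55, 58, 60, 64, 68, 70],
})
model = ols('Marks ~ Study_Hours + Sleep_Hours', data=df).fit()

# Collect the per-coefficient t-test results in one table
coef_table = pd.DataFrame({
    'coef': model.params,      # estimated coefficient
    'std_err': model.bse,      # standard error of the estimate
    't': model.tvalues,        # t = coef / std_err
    'p_value': model.pvalues,  # two-sided p-value
})

# Flag coefficients that are significant at the chosen level
alpha = 0.05
coef_table['significant'] = coef_table['p_value'] < alpha
print(coef_table)
```

Each t-statistic is simply the coefficient divided by its standard error, which is why the table can be checked by hand from the first two columns.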

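ANOVA (F-test):

| Term | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| Study_Hours | 10.492295 | 1.0 | 9.408896 | 0.018134 |
| Sleep_Hours | 0.993976 | 1.0 | 0.891341 | 0.376551 |
| Residual | 7.806024 | 7.0 | NaN | NaN |

With typ=2, each row is a partial F-test of one predictor given that the other is already in the model. Study_Hours is significant at the 5% level (p ≈ 0.018); Sleep_Hours is not (p ≈ 0.377). The overall significance of the regression is given by the F-statistic in model.summary(): F = 379.2 with p = 7.31e-08.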
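
For a predictor with one degree of freedom, the Type II partial F-statistic equals the square of that predictor's t-statistic, so the ANOVA rows and the coefficient t-tests carry the same information. The sketch below checks this numerically and also reads the overall F-test from the fitted model via its fvalue and f_pvalue attributes. The names predictors, partial_f, and t_squared are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Same data and model as in the notebook
df = pd.DataFrame({
    'Study_Hours': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Sleep_Hours': [8, 7.5, 7, 6.5, 6, 6, 5.5, 5, 5, 4.5],
    'Marks': [40, 45, 50, 52, 55, 58, 60, 64, 68, 70],
})
model = ols('Marks ~ Study_Hours + Sleep_Hours', data=df).fit()
anova_results = anova_lm(model, typ=2)

# Partial F-test per predictor versus the squared t-statistic
predictors = ['Study_Hours', 'Sleep_Hours']
partial_f = anova_results.loc[predictors, 'F']
t_squared = model.tvalues[predictors] ** 2
print(pd.DataFrame({'partial_F': partial_f, 't_squared': t_squared}))

# Overall F-test: are all slope coefficients jointly zero?
print(f"Overall F = {model.fvalue:.1f}, p = {model.f_pvalue:.3g}")
```

The two columns agree, which is why the per-predictor p-values in the ANOVA output are identical to those of the t-tests, while the overall F-test answers a different question about the model as a whole.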




**Short Description of the Program**

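This Python program fits a multiple linear regression model that predicts student marks from study hours and sleep hours, using a small illustrative dataset. It evaluates:

- The individual significance of each predictor, using **t-tests** on the coefficients
- The contribution of each predictor given the other, using **Type II ANOVA F-tests**; the F-test for the overall model fit is reported in model.summary()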