**Hypothesis Test**

- [Assumption Check](#assumption-check)
- [Parametric Independent t-Test](#independent-samples-test)
- [Parametric Paired t-Test](#pair)
- [Non-Parametric Independent Test](#wilcoxon-rank-sum)
- [Non-Parametric Paired Test](#wilcoxon-sign-test)

**ANOVA**

- [One Way](#one-way-analysis-of-variance)
- [Comparing Specific Groups](#comparing-specific-groups)
- [Contrast Estimation](#sub-group-and-contrast-estimation)
- [Bonferroni Correction](#bonferroni-correction)
- [Tukey-HSD](#tukeyhsd)
- [Kruskal-Wallis Procedure](#kruskal-wallis-procedure)

In [6]:
import pandas as pd
import numpy as np
from scipy import stats
import statsmodels.api as sm
from statsmodels.formula.api import ols
import matplotlib.pyplot as plt

path = "../src/"
abl = pd.read_csv(path + "data/abalone_sub.csv")
hr_df = pd.read_csv(path + "data/health_promo_hr.csv")
heifers = pd.read_csv(path + "data/antibio.csv")

stud_perf = pd.read_csv(path + "data/student/student-mat.csv", sep=';')
stud_perf2 = stud_perf[stud_perf.Medu != 0]

def outliers_index(x):
    return np.where((0.6745 * np.absolute(x - np.median(x)) / stats.median_abs_deviation(x)) > 2.24)
out_df_list = []

for i,df in stud_perf2.groupby('Medu'):
    to_rm = outliers_index(df.G3)
    out_df = df.drop(df.index[to_rm])
    out_df_list.append(out_df)
stud_perf3 = pd.concat(out_df_list)

### Hypothesis Test

### Assumption Check

#### Skewness

In [None]:
abl.groupby("gender").skew()

#### Kurtosis
- Positive kurtosis implies that the tails are “fatter” than those of a Normal.
- Negative kurtosis implies that the tails are “thinner” than those of a Normal.
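
As a quick check of the sign convention, the sketch below computes excess kurtosis on synthetic samples (not the abalone data). The seed, sample size and choice of distributions are arbitrary assumptions made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # arbitrary seed, synthetic data only
n = 100_000

heavy = rng.laplace(size=n)      # fatter tails than a Normal
light = rng.uniform(size=n)      # thinner tails than a Normal
normal = rng.normal(size=n)      # baseline

# fisher=True (the default) reports excess kurtosis, so a Normal is close to 0
k_heavy = stats.kurtosis(heavy)
k_light = stats.kurtosis(light)
k_normal = stats.kurtosis(normal)
print(f"Laplace: {k_heavy:.2f}, Uniform: {k_light:.2f}, Normal: {k_normal:.2f}")
```

pandas `.kurt()` also reports excess kurtosis, so the same reading of the sign applies to the per-gender values.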

In [None]:
for i,df in abl.groupby('gender'):
    print(f"{df.gender.iloc[0]}: {df.viscera.kurt():.4f}")

##### Hypothesis tests for Normality
$$H_0: \text{Data follows a Normal distribution}$$
$$H_1: \text{Data does not follow a Normal distribution}$$

In [None]:
x = abl.viscera[abl.gender == "M"]
y = abl.viscera[abl.gender == "F"]

# print both results; a notebook cell only displays its last expression
print(stats.shapiro(x))
print(stats.shapiro(y))

In [None]:
def checkNormality(data):
    result = stats.shapiro(data)
    if result.pvalue < 0.05:
        print("Data does NOT follow normal distribution")
    else:
        print("Data follows normal distribution")

checkNormality(x)
checkNormality(y)

#### Equal Variance

If the larger s.d. is **more than twice** the smaller one, then we should not use the equal-variance form of the test.
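
The rule of thumb can be wrapped in a small helper. This is a minimal sketch on synthetic data; the function name `equal_var_ok` and its `max_ratio` parameter are hypothetical and not part of the original notebook.

```python
import numpy as np

def equal_var_ok(a, b, max_ratio=2.0):
    """Return True when the larger sample s.d. is at most max_ratio times the smaller."""
    s_a, s_b = np.std(a, ddof=1), np.std(b, ddof=1)
    return max(s_a, s_b) / min(s_a, s_b) <= max_ratio

rng = np.random.default_rng(1)            # arbitrary seed, synthetic data only
g1 = rng.normal(0, 1.0, size=200)
g2 = rng.normal(0, 1.2, size=200)         # similar spread to g1
g3 = rng.normal(0, 5.0, size=200)         # much larger spread than g1

similar = equal_var_ok(g1, g2)
different = equal_var_ok(g1, g3)
print(similar, different)
```

The returned flag could be passed as `equal_var` to `stats.ttest_ind`; with `equal_var=False` SciPy performs Welch's t-test instead of the pooled-variance test.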

In [None]:
abl.groupby('gender').describe()

### Parametric Tests
These tests assume that the data follow a specific distribution.

#### t-Test

- Assumes that the data originate from a **Normal distribution**.

#### Independent Samples Test
- `stats.ttest_ind` has no significance-level argument; compare the returned p-value with the chosen level instead.
- The confidence interval does take a level, via `confidence_level`.
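
To make the link between the p-value and the confidence interval concrete, the sketch below rebuilds the pooled-variance interval by hand on synthetic data (the samples, seed and `alpha` are assumptions). Rejecting at level `alpha` should coincide with the `1 - alpha` interval excluding zero.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)            # arbitrary seed, synthetic data only
a = rng.normal(10.0, 2.0, size=40)
b = rng.normal(11.5, 2.0, size=35)
alpha = 0.05

res = stats.ttest_ind(a, b, equal_var=True)

# Pooled-variance standard error of the difference in means
n1, n2 = len(a), len(b)
dof = n1 + n2 - 2
sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / dof
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
diff = a.mean() - b.mean()

# Two-sided (1 - alpha) confidence interval for the difference
t_crit = stats.t.ppf(1 - alpha / 2, dof)
ci = (diff - t_crit * se, diff + t_crit * se)

reject = res.pvalue < alpha
ci_excludes_zero = not (ci[0] <= 0 <= ci[1])
print(f"t = {diff / se:.3f}, reject H0: {reject}, CI excludes 0: {ci_excludes_zero}")
```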

In [None]:
x = abl.viscera[abl.gender == "M"]
y = abl.viscera[abl.gender == "F"]
t_out = stats.ttest_ind(x, y, alternative = "two-sided", equal_var = True)
ci_95 = t_out.confidence_interval(confidence_level=0.95)

print(f"""
* The p-value for the test is {t_out.pvalue:.3f}. 
* The actual value of the test statistic is {t_out.statistic:.3f}.
* The lower and upper limits of the CI are ({ci_95[0]:.3f}, {ci_95[1]:.3f}).
""")

#### Pair

$$D_i = X_i - Y_i$$
$$H_0: \mu_D = 0$$
$$H_1: \mu_D \neq 0$$
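
Because the hypotheses are stated in terms of $D_i$, the paired test is the same as a one-sample t-test on the differences. A minimal sketch on synthetic before/after values (not the heart-rate data; the seed and effect size are assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)                    # arbitrary seed, synthetic data only
before = rng.normal(70, 8, size=25)
after = before - rng.normal(2, 3, size=25)        # each subject changes by about 2 on average

d = before - after                                # D_i = X_i - Y_i
paired = stats.ttest_rel(before, after)
one_sample = stats.ttest_1samp(d, popmean=0.0)    # H0: mu_D = 0

print(f"paired:     t = {paired.statistic:.4f}, p = {paired.pvalue:.4f}")
print(f"one-sample: t = {one_sample.statistic:.4f}, p = {one_sample.pvalue:.4f}")
```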

In [None]:
checkNormality(hr_df.baseline)
checkNormality(hr_df.after5)

In [None]:
paired_out = stats.ttest_rel(hr_df.baseline, hr_df.after5)
print(f"""
Test statistic: {paired_out.statistic:.3f}.
p-val: {paired_out.pvalue:.3f}.""")

### Non-parametric Tests

Use these when the distributional assumptions of the t-test are not met.

#### Wilcoxon Rank Sum
- Independent 2-sample test
- Both $n_1$ and $n_2$ are at least 10
- Observations (not the ranks) come from an underlying **continuous distribution**

$H_0:$ the distribution of group 1 is in the same location as the distribution of group 2.

$H_1:$ the distribution of group 1 is a location shift of the distribution of group 2.
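
The rank-sum idea can be sketched by ranking the pooled observations and summing the ranks of group 1. The data here are synthetic and continuous (so there are no ties), and the comparison assumes a recent SciPy in which `mannwhitneyu` reports the U statistic of the first sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)                 # arbitrary seed, synthetic data only
g1 = rng.normal(0.0, 1.0, size=12)
g2 = rng.normal(1.0, 1.0, size=15)             # shifted location
n1, n2 = len(g1), len(g2)

# Rank all observations together, then sum the ranks belonging to group 1
ranks = stats.rankdata(np.concatenate([g1, g2]))
r1 = ranks[:n1].sum()

# Mann-Whitney U is the rank sum with its minimum possible value removed
u1 = r1 - n1 * (n1 + 1) / 2
u2 = n1 * n2 - u1

res = stats.mannwhitneyu(g1, g2, alternative="two-sided")
print(f"rank sum = {r1:.1f}, U1 = {u1:.1f}, U2 = {u2:.1f}, scipy U = {res.statistic:.1f}")
```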

In [None]:
stats.mannwhitneyu(x, y, alternative="two-sided")

#### Wilcoxon Sign Test

- Paired samples test (the Wilcoxon signed-rank test, `stats.wilcoxon` in SciPy)
- If the number of non-zero $D_i$'s is at least 16, then the test statistic $W$ approximately follows a $N(0,1)$ distribution.
- $D_i = X_i - Y_i$

$$H_0: \text{median of } D_i = 0$$
$$H_1: \text{median of } D_i \neq 0$$
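
A hand computation of the signed-rank sums, as a sketch on synthetic paired data (seed and effect size are assumptions). For the default two-sided alternative, `stats.wilcoxon` reports the smaller of the two rank sums.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)                 # arbitrary seed, synthetic data only
x = rng.normal(70, 8, size=20)
y = x - rng.normal(1.5, 3, size=20)

d = x - y
d = d[d != 0]                                  # zero differences are dropped
ranks = stats.rankdata(np.abs(d))              # rank the absolute differences

w_plus = ranks[d > 0].sum()                    # rank sum of positive differences
w_minus = ranks[d < 0].sum()                   # rank sum of negative differences

res = stats.wilcoxon(x, y)                     # two-sided by default
print(f"W+ = {w_plus:.1f}, W- = {w_minus:.1f}, scipy statistic = {res.statistic:.1f}")
```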

In [None]:
wsr_out = stats.wilcoxon(hr_df.baseline, hr_df.after5, correction=True, method='approx')
print(f"""Test statistic: {wsr_out.statistic:.3f}. p-val: {wsr_out.pvalue:.3f}.""")

### ANOVA

In [None]:
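# NOTE: `Cars93` is not loaded in the setup cell; a DataFrame with
# `MPG_city` and `Cylinders` columns is assumed to be available here.
ols('MPG_city ~ C(Cylinders, Treatment)', data=Cars93).fit() # anova model
ols('MPG_city ~ Cylinders', data=Cars93).fit() # regression model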

### One-Way F-test
Is there any significant difference, at the 5% level, between the mean decomposition levels of the groups?

- The observations are **independent** of each other.
- The errors are Normally distributed.
- The variance within each group is the same.

$$\text{Estimate } \mu = \text{Intercept} + \text{coef(Type)}$$
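
The relation between coefficients and group means can be checked with plain least squares. This sketch uses synthetic data with three hypothetical groups A, B and C and builds the treatment-coded design matrix by hand, rather than going through `ols`.

```python
import numpy as np

rng = np.random.default_rng(6)                 # arbitrary seed, synthetic data only
labels = np.repeat(["A", "B", "C"], 8)
true_means = {"A": 3.0, "B": 2.5, "C": 3.4}    # hypothetical group means
y = np.array([true_means[g] for g in labels]) + rng.normal(0, 0.2, size=labels.size)

# Treatment (dummy) coding: intercept plus indicators for B and C; A is the reference
X = np.column_stack([np.ones(labels.size), labels == "B", labels == "C"]).astype(float)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, coef_b, coef_c = coef

# Estimated mean of each group = intercept + coefficient (0 for the reference level)
est_means = {"A": intercept, "B": intercept + coef_b, "C": intercept + coef_c}
sample_means = {g: y[labels == g].mean() for g in "ABC"}
print(est_means)
print(sample_means)
```

The intercept is the mean of the reference group, and each coefficient is the difference between that group's mean and the reference mean.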

In [None]:
# this model sets type A as the reference level
lm_model = ols('org ~ type', data=heifers).fit()
anova_tab = sm.stats.anova_lm(lm_model, typ=3)  # Type III sums of squares
print(lm_model.summary())
print(anova_tab)

                            OLS Regression Results                            
Dep. Variable:                    org   R-squared:                       0.587
Model:                            OLS   Adj. R-squared:                  0.514
Method:                 Least Squares   F-statistic:                     7.973
Date:                Sun, 04 May 2025   Prob (F-statistic):           8.95e-05
Time:                        20:34:41   Log-Likelihood:                 26.655
No. Observations:                  34   AIC:                            -41.31
Df Residuals:                      28   BIC:                            -32.15
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------
Intercept            2.8950      0.050  

Setting the reference level $\Leftrightarrow$ the coefficient of that level (type A here) is zero in the model $\Leftrightarrow$ it is the first level of the factor.
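
Why relabelling changes the reference level can be illustrated without fitting anything. The sketch assumes that the formula interface orders string levels by sorting them, so the first label in sorted order becomes the reference; the rename mapping is the one used in this notebook.

```python
levels = ["F", "M", "R"]

# With the raw labels, "F" sorts first and would be the reference level
reference_before = sorted(levels)[0]

# Prefixing the labels forces a different sort order, so "1-R" comes first
rename = {"F": "3-F", "M": "2-M", "R": "1-R"}
reference_after = sorted(rename[lv] for lv in levels)[0]

print(reference_before, "->", reference_after)
```

An alternative that avoids renaming is to name the level in the formula, for example `C(location, Treatment(reference='R'))`.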

In [None]:
locate_df = pd.read_table(path + "data/locate.txt", delimiter="\\s+")
locate_df.replace({'F': '3-F', 'M': '2-M', 'R': '1-R'}, inplace=True)
locate_lm = ols('sales ~ C(location, Treatment)', data=locate_df).fit()
anova_tab = sm.stats.anova_lm(locate_lm, typ=3)  # Type III sums of squares
print(anova_tab)

**Check Assumptions**

In [None]:
f, axs = plt.subplots(1, 2, figsize=(8,4))
lm_model.resid.hist(ax=axs[0])                  # histogram of residuals
sm.qqplot(lm_model.resid, line="q", ax=axs[1])  # Normal Q-Q plot of residuals

### Comparing specific groups

In [None]:
lm_model.params

In [None]:
est1  = lm_model.params.iloc[2] - lm_model.params.iloc[1]
MSW = lm_model.mse_resid
df = lm_model.df_resid
t = -stats.t.ppf(0.025, df)

lower_ci = est1 - t*np.sqrt(MSW * (1/6 + 1/6))
upper_ci = est1 + t*np.sqrt(MSW * (1/6 + 1/6))
print(f"""The 95% CI for the diff. between Enrofloxacin and control is ({lower_ci:.3f}, {upper_ci:.3f}).""") 

### Sub-group and Contrast Estimation

In [None]:
c1 = np.array([-1, 0.5, 0.5])
n_vals = np.array([6, 6, 6,])
L = np.sum(c1 * lm_model.params.iloc[2:5])

In [None]:
MSW = lm_model.mse_resid
df = lm_model.df_resid
t = -stats.t.ppf(0.025, df)
se1 = np.sqrt(MSW*np.sum(c1**2 / n_vals))

In [None]:
lower_ci = L - t*se1
upper_ci = L + t*se1
print(f"""The 95% CI for the diff. between the two groups is ({lower_ci:.3f}, {upper_ci:.3f}).""") 

Estimate the confidence interval for a contrast comparing higher education to non-higher education
(i.e. Medu = 4 vs. Medu = 1|2|3)
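
For reference, the interval has the standard form for a contrast with coefficients $c_i$ that sum to zero. The symbols $\bar{Y}_i$ (sample mean of group $i$) and $n_i$ (size of group $i$) are introduced here for illustration; $MSW$ is the residual mean square and $df$ the residual degrees of freedom, as in the code.

$$\hat{L} = \sum_i c_i \bar{Y}_i$$
$$se(\hat{L}) = \sqrt{MSW \sum_i \frac{c_i^2}{n_i}}$$
$$\hat{L} \pm t_{0.975,\, df} \cdot se(\hat{L})$$

Because $\sum_i c_i = 0$, the intercept cancels, so $\hat{L}$ can be computed from the treatment-coded coefficients directly, with a coefficient of 0 for the reference level.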

In [10]:
lm_model = ols('G3 ~ C(Medu, Treatment)', data=stud_perf3).fit()


c1 = np.array([-1/3, -1/3, -1/3, 1])

# get the number of students in each group
n_vals = []
for i in range(1,5):
    l = len(stud_perf3[stud_perf3.Medu==i])
    n_vals.append(l)

# append 0 since the estimate of beta 1 is 0 (Since it is reference level)
# [1:] to remove the intercept 
est_params = np.append([0], lm_model.params.to_numpy()[1:])

L = np.sum(c1 * est_params)
MSW = lm_model.mse_resid
df = lm_model.df_resid
t = -stats.t.ppf(0.025, df)
se1 = np.sqrt(MSW*np.sum(c1**2 / n_vals))
lower_ci = L - t*se1
upper_ci = L + t*se1
print(f"""The 95% CI for the diff. between the two groups is ({lower_ci:.3f}, {upper_ci:.3f}).""") 

The 95% CI for the diff. between the two groups is (0.746, 2.133).


### Bonferroni Correction
Set the confidence level of each interval to $(1 - \alpha / m)$, where $m$ is the number of comparisons.
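
A small numeric sketch of the correction, assuming $m = 2$ comparisons (matching `alpha=0.05/2` in this notebook) and a hypothetical 28 residual degrees of freedom:

```python
from scipy import stats

alpha = 0.05      # family-wise significance level
m = 2             # number of comparisons
dof = 28          # hypothetical residual degrees of freedom

alpha_adj = alpha / m                 # per-comparison level
conf_level = 1 - alpha_adj            # per-interval confidence level

# Two-sided critical values without and with the correction
t_plain = stats.t.ppf(1 - alpha / 2, dof)
t_bonf = stats.t.ppf(1 - alpha_adj / 2, dof)
print(f"confidence level = {conf_level:.3f}, t = {t_plain:.3f} -> {t_bonf:.3f}")
```

The corrected critical value is larger, so each individual interval is wider than its uncorrected counterpart.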

In [None]:
print(locate_lm.summary(alpha=0.05/2))

### Multiple Comparisons

#### TukeyHSD

- Correcting for multiple comparisons
- Construct confidence intervals for **all** pairwise comparisons
- **Shorter** confidence intervals than a Bonferroni correction for all pairwise comparisons.

$$H_0: \mu_X = \mu_Y$$
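
The claim about interval length can be checked by comparing critical multipliers. The sketch assumes 6 groups and 28 residual degrees of freedom (as in the OLS summary for the heifers model) and a SciPy version that provides `stats.studentized_range`.

```python
import numpy as np
from scipy import stats

alpha = 0.05
k = 6                      # number of groups
dof = 28                   # residual degrees of freedom
m = k * (k - 1) // 2       # number of pairwise comparisons

# Multipliers applied to the s.e. of a pairwise difference to get the half-width
tukey_mult = stats.studentized_range.ppf(1 - alpha, k, dof) / np.sqrt(2)
bonf_mult = stats.t.ppf(1 - alpha / (2 * m), dof)
print(f"{m} comparisons: Tukey = {tukey_mult:.3f}, Bonferroni = {bonf_mult:.3f}")
```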

In [None]:
import statsmodels.stats.multicomp as mc

cp = mc.MultiComparison(heifers.org, heifers.type)
tk = cp.tukeyhsd()
print(tk)

#### Kruskal-Wallis Procedure

- If the assumptions of the ANOVA procedure are not met
- Generalisation of the Wilcoxon Rank-Sum test for 2 independent samples.
- This test should only be used if $n_i \geq 5$ for all groups.

$H_0:$ All groups follow the same distribution

$H_1:$ At least one of the groups’ distribution differs from another by a location shift.
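
The Kruskal-Wallis statistic can be computed from the pooled ranks. This is a sketch on synthetic, tie-free data with hypothetical group sizes, compared against `stats.kruskal`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)                 # arbitrary seed, synthetic data only
groups = [rng.normal(mu, 1.0, size=n) for mu, n in [(0.0, 6), (0.5, 7), (1.5, 8)]]

all_vals = np.concatenate(groups)
N = all_vals.size
ranks = stats.rankdata(all_vals)               # ranks over the pooled sample

# Sum of (rank sum)^2 / n_i over groups, walking through the pooled ranks in order
total = 0.0
start = 0
for g in groups:
    r = ranks[start:start + len(g)]
    total += r.sum() ** 2 / len(g)
    start += len(g)

h = 12 / (N * (N + 1)) * total - 3 * (N + 1)
p = stats.chi2.sf(h, df=len(groups) - 1)       # chi-square approximation

res = stats.kruskal(*groups)
print(f"by hand: H = {h:.3f}, p = {p:.3f}; scipy: H = {res.statistic:.3f}, p = {res.pvalue:.3f}")
```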

In [None]:
out = [x[1] for x in heifers.org.groupby(heifers.type)]
kw_out = stats.kruskal(*out)
print(f"""The test statistic is {kw_out.statistic:.3f}, the p-value is {kw_out.pvalue:.3f}.""")

In [None]:
[x[1] for x in heifers.org.groupby(heifers.type)]