# Hypothesis Testing
<hr style="border:2px solid black">

## 1. Introduction

---

**what is hypothesis testing?**
- statistical analysis that uses sample data to assess two mutually exclusive theories about a population
- computes a sample statistic and factors in an estimate of the sampling error to decide which of the theories the data support

---

**null hypothesis $H_0$**

- one of the two mutually exclusive theories in hypothesis testing
- typically, it states that there is no effect

---

**alternative hypothesis $H_1$**

- complementary theory to null hypothesis
- typically, it states that the population parameter does not equal the null hypothesis value

---

**level of significance**

- probability of rejecting $H_0$ when it is true (type I error), denoted $\alpha$
- typical values: $\alpha=0.05$ (95% confidence level) or $\alpha=0.01$ (99% confidence level)
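
The meaning of the level of significance can be checked by simulation. The sketch below is illustrative only: it assumes normally distributed synthetic data for which the null hypothesis (mean equal to 0) is true, and the number of experiments, sample size and seed are arbitrary choices. The fraction of experiments that reject the null hypothesis should then land close to the chosen level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)

alpha = 0.05
n_experiments, sample_size = 2000, 30

# Each row is one simulated experiment in which H_0 (mean = 0) is true.
samples = rng.normal(loc=0.0, scale=1.0, size=(n_experiments, sample_size))

# One-sample t-test per experiment; axis=1 runs the test along each row.
_, p_values = stats.ttest_1samp(samples, popmean=0.0, axis=1)

# Fraction of experiments in which a true H_0 is wrongly rejected.
type_1_error_rate = np.mean(p_values < alpha)
print(f"rejection rate under H_0: {type_1_error_rate:.3f}")
```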

---

**p-value**

- metric used to decide the outcome of a hypothesis test
- probability, assuming $H_0$ is true, of obtaining test results at least as extreme as the result actually observed
- calculated from the distribution function of the test statistic
- small $p$-value ($p\leq\alpha$) is significant: the observed data are unlikely under $H_0$, so $H_0$ is rejected
- larger $p$-value ($p>\alpha$) is insignificant: $H_0$ cannot be rejected
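
As a minimal sketch of how a p-value follows from the distribution function of a test statistic, the helper below assumes a statistic that is standard normal under the null hypothesis. The function name `two_sided_p_value` is hypothetical and only serves this illustration.

```python
from scipy import stats


def two_sided_p_value(z_stat):
    """Probability of a standard-normal statistic at least as extreme as z_stat."""
    # sf(x) = 1 - cdf(x); doubling covers both tails of the symmetric distribution.
    return 2 * stats.norm.sf(abs(z_stat))


print(two_sided_p_value(1.96))  # close to the conventional 0.05 threshold
print(two_sided_p_value(0.5))   # far from significant
```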

---

**guidelines for using p-value**

>|       p-value      |evidence against null hypothesis|
 |:------------------:|:------------------------------:|
 |     $$p>0.10$$     |       weak or no evidence      |
 | $$0.05<p\leq0.10$$ |        moderate evidence       |
 | $$0.01<p\leq0.05$$ |         strong evidence        |
 |    $$p\leq0.01$$   |       very strong evidence     |
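
The guideline table can be turned into a small lookup. The function name `evidence_against_null` is hypothetical; the thresholds are taken directly from the table.

```python
def evidence_against_null(p_value):
    """Map a p-value to the evidence label from the guideline table."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p_value <= 0.01:
        return "very strong evidence"
    if p_value <= 0.05:
        return "strong evidence"
    if p_value <= 0.10:
        return "moderate evidence"
    return "weak or no evidence"


for p in (0.48, 0.07, 0.03, 0.001):
    print(p, "->", evidence_against_null(p))
```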


---

**two-tailed & one-tailed tests**

- two-tailed
    + deviation from $H_0$ in either direction is unfavorable
    + example: testing if males in two ethnic groups have the same average height
- one-tailed
    + deviation in only one direction is unfavorable to $H_0$
    + example: testing the efficacy of a drug
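
In `scipy.stats` the choice between a two-tailed and a one-tailed test is made through the `alternative` argument. The sketch below uses a synthetic sample (an assumption, not data from this notebook) to show how the three options relate to each other.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
# Hypothetical sample whose true mean (0.3) lies above the null value 0.
sample = rng.normal(loc=0.3, scale=1.0, size=40)

# Two-tailed: H_1 is "mean != 0".
t_stat, p_two_sided = stats.ttest_1samp(sample, popmean=0.0, alternative='two-sided')
# One-tailed: H_1 is "mean > 0".
_, p_greater = stats.ttest_1samp(sample, popmean=0.0, alternative='greater')
# One-tailed in the opposite direction: H_1 is "mean < 0".
_, p_less = stats.ttest_1samp(sample, popmean=0.0, alternative='less')

print(f"t = {t_stat:.3f}")
print(f"two-sided: {p_two_sided:.4f}, greater: {p_greater:.4f}, less: {p_less:.4f}")
```

The two one-tailed p-values add up to 1, and the two-tailed p-value is twice the smaller of them.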

---

**test types based on distribution of test statistic**

- $z$-test
    + can be used in testing hypotheses regarding means, correlation coefficients etc.
    + applicable when population variance is known or sample size is large ($n\geq30$)
- $t$-test
    + can be used in testing hypotheses regarding means, correlation coefficients etc.
    + applicable when population variance is unknown, especially when sample size is small ($n<30$)
- $\chi^2$-test
    + can be used in testing hypotheses regarding the nature of one or more distributions
- $F$-test
    + can be used in testing hypotheses regarding equality of variance in two populations
    + useful in testing hypotheses regarding means, proportions and correlation coefficients
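
None of the later examples uses a $z$-test, so here is a minimal sketch of one for a mean with known population standard deviation. The helper `z_test_mean` and the synthetic sample are assumptions made for illustration.

```python
import numpy as np
from scipy import stats


def z_test_mean(sample, popmean, popstd):
    """Two-sided z-test for a mean when the population standard deviation is known."""
    sample = np.asarray(sample, dtype=float)
    standard_error = popstd / np.sqrt(sample.size)
    z_stat = (sample.mean() - popmean) / standard_error
    # The statistic is standard normal under H_0, so the p-value uses stats.norm.
    p_value = 2 * stats.norm.sf(abs(z_stat))
    return z_stat, p_value


rng = np.random.default_rng(seed=2)
sample = rng.normal(loc=100.0, scale=15.0, size=50)  # synthetic data
z_stat, p_value = z_test_mean(sample, popmean=100.0, popstd=15.0)
print(f"z = {z_stat:.3f}, p = {p_value:.3f}")
```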

---

**load packages**

In [1]:
# data analysis stack
import numpy as np
import pandas as pd

# data visualization stack (provides the example dataset)
import seaborn as sns

# statistics stack
from scipy import stats

# miscellaneous
import warnings
warnings.simplefilter('ignore')

**load data**

In [2]:
df = sns.load_dataset('penguins')
df.dropna(inplace=True, ignore_index=True)

In [3]:
df.head()

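|   | species |   island  | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |   sex  |
|:-:|:-------:|:---------:|:--------------:|:-------------:|:-----------------:|:-----------:|:------:|
| 0 |  Adelie | Torgersen |      39.1      |      18.7     |       181.0       |    3750.0   |  Male  |
| 1 |  Adelie | Torgersen |      39.5      |      17.4     |       186.0       |    3800.0   | Female |
| 2 |  Adelie | Torgersen |      40.3      |      18.0     |       195.0       |    3250.0   | Female |
| 3 |  Adelie | Torgersen |      36.7      |      19.3     |       193.0       |    3450.0   | Female |
| 4 |  Adelie | Torgersen |      39.3      |      20.6     |       190.0       |    3650.0   |  Male  |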


<hr style="border:2px solid black">

## 2. Test about Means

### 2.1 Comparison of Sample Mean with Population Mean

#### Example
>$\bf{H_0}~$: Chinstrap penguins' body_mass_g mean $=$ 3700.0
>
>$\bf{H_1}~$: Chinstrap penguins' body_mass_g mean $\neq$ 3700.0

In [4]:
species_groups = df.groupby('species')

**calculate the test statistic**

In [5]:
df[df.species=='Chinstrap'].shape

(68, 7)

In [6]:
df[df.species=='Chinstrap']['body_mass_g'].mean()

3733.0882352941176

In [7]:
t_stat, p_value = stats.ttest_1samp(
    a=df[df.species=='Chinstrap']['body_mass_g'],
    popmean=3700.0,
    alternative='two-sided'
)

>test statistic: $~~~t~=~\frac{\bar{x}-\mu}{s/\sqrt{n}}$
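
The one-sample $t$ statistic divides the difference between sample mean and hypothesized mean by the standard error $s/\sqrt{n}$. The sketch below recomputes it by hand and compares it with `stats.ttest_1samp`. It runs on synthetic body masses (an assumption), not on the penguin data, so its numbers differ from the cell outputs in this section.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)
# Hypothetical body masses in grams; not the penguin data.
sample = rng.normal(loc=3730.0, scale=380.0, size=68)
popmean = 3700.0

n = sample.size
# ddof=1 gives the sample standard deviation (n - 1 divisor).
standard_error = sample.std(ddof=1) / np.sqrt(n)
t_manual = (sample.mean() - popmean) / standard_error
# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(sample, popmean=popmean)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```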

In [8]:
t_stat

0.7099340949596884

In [9]:
p_value

0.4802086432837619

**conclusion**
>$p\approx0.48$ exceeds any conventional level of significance, so the null hypothesis cannot be rejected

### 2.2 Comparison of Two Correlated Sample Means

>test statistic: $~~~t~=~\frac{\bar{d}}{s_d/\sqrt{n}}$, where $\bar{d}$ and $s_d$ are the mean and standard deviation of the $n$ paired differences
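
For correlated (paired) samples the test works on the differences within each pair. A minimal sketch, assuming synthetic before/after measurements on the same subjects, compares the hand-computed statistic with `scipy.stats.ttest_rel`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=4)
# Hypothetical before/after measurements on the same 25 subjects.
before = rng.normal(loc=50.0, scale=5.0, size=25)
after = before + rng.normal(loc=1.0, scale=2.0, size=25)

# A paired test is a one-sample t-test on the pairwise differences.
differences = after - before
n = differences.size
t_manual = differences.mean() / (differences.std(ddof=1) / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_rel(after, before)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```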

### 2.3 Comparison of Two Independent Sample Means

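**function: [`scipy.stats.ttest_ind`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html#scipy.stats.ttest_ind)**

>test statistic (unequal variances, `equal_var=False`): $~~~t~=~\frac{\bar{x}_1\,-\,\bar{x}_2}{\sqrt{(s_1^2/n_1)\,+\,(s_2^2/n_2)}}$
>
>with the default `equal_var=True`, the denominator becomes $s_p\sqrt{(1/n_1)\,+\,(1/n_2)}$, where $s_p^2$ is the pooled variance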

#### Example
>$\bf{H_0}~$: Adelie and Chinstrap penguins have the same body_mass_g mean
>
>$\bf{H_1}~$: Adelie and Chinstrap penguins have different body_mass_g means

In [10]:
specieswise_body_mass = {
    species: sub_df.body_mass_g for species, sub_df in species_groups
}

**Adelie vs Chinstrap body mass**

In [11]:
t_stat, p_value = stats.ttest_ind(
    list(specieswise_body_mass['Adelie']),
    list(specieswise_body_mass['Chinstrap'])
)

In [12]:
p_value

0.6748289682757558

**conclusion**
- the null hypothesis cannot be rejected

**Adelie vs Gentoo body mass**

In [None]:
t_stat, p_value = stats.ttest_ind(
    list(specieswise_body_mass['Adelie']),
    list(specieswise_body_mass['Gentoo'])
)

In [None]:
p_value

**conclusion**
- very strong evidence against the null hypothesis
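
The calls above use the default `equal_var=True`, which pools the two sample variances. When the groups may have different variances, `equal_var=False` requests Welch's $t$-test instead. The sketch below uses two synthetic groups (an assumption) and checks the Welch statistic against a hand computation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=5)
# Two hypothetical groups with different spreads and sizes.
group_a = rng.normal(loc=10.0, scale=1.0, size=40)
group_b = rng.normal(loc=10.5, scale=3.0, size=25)

# Default: pooled-variance (Student) t-test.
t_pooled, p_pooled = stats.ttest_ind(group_a, group_b)
# Welch's t-test: no assumption of equal variances.
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

# Welch statistic by hand: mean difference over the combined standard error.
var_a, var_b = group_a.var(ddof=1), group_b.var(ddof=1)
t_manual = (group_a.mean() - group_b.mean()) / np.sqrt(
    var_a / group_a.size + var_b / group_b.size
)

print(f"pooled: t = {t_pooled:.3f}, p = {p_pooled:.4f}")
print(f"welch:  t = {t_welch:.3f}, p = {p_welch:.4f}")
```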

<hr style="border:2px solid black">

## 3. Test about Variances

### 3.1 Comparison of Sample Variance with Population Variance

>test statistic: $\chi^2 = (\text{sample size}-1)\times\left(\frac{\text{sample variance}}{\text{population variance}}\right)$

**$\chi^2$ distribution: [`scipy.stats.chi2`](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.chi2.html)**

#### Example
>$\bf{H_0}~$: Adelie penguins' bill_length_mm variance $=$ 10.0
>
>$\bf{H_1}~$: Adelie penguins' bill_length_mm variance $\neq$ 10.0

**sample**

In [13]:
specieswise_bill_length = {
    species: sub_df.bill_length_mm for species, sub_df in species_groups
}

In [14]:
sample = list(specieswise_bill_length['Adelie'])

In [15]:
sample_size = len(sample)
sample_size

146

**parameter**

In [16]:
population_variance = 10.0

**statistic**

In [17]:
# ddof=1 gives the sample variance (n - 1 divisor) that the statistic expects
chi2_stat = (sample_size - 1) * np.var(sample, ddof=1) / population_variance
round(chi2_stat, 4)

102.7966

**degrees of freedom**

In [18]:
dof = sample_size - 1

**level of significance**

In [19]:
alpha = 0.05

**lower critical value**

In [20]:
chi2_critical_lower = stats.chi2.ppf(alpha/2, dof)
chi2_critical_lower

113.55571355377477

**upper critical value**

In [21]:
chi2_critical_upper = stats.chi2.ppf(1-(alpha/2), dof)
chi2_critical_upper

180.22912239204106

**conclusion**
>the test statistic lies below the lower critical value, so the data do not support the null hypothesis
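
The critical-value decision can also be expressed as a p-value. The helper below is a sketch rather than a library routine: the name `variance_test_p_value` is hypothetical, and doubling the smaller tail probability is one common convention for a two-sided $\chi^2$ test.

```python
import numpy as np
from scipy import stats


def variance_test_p_value(a, popvar):
    """Two-sided chi-squared test of a sample variance against popvar."""
    a = np.asarray(a, dtype=float)
    dof = a.size - 1
    chi2_stat = dof * a.var(ddof=1) / popvar
    lower_tail = stats.chi2.cdf(chi2_stat, dof)
    upper_tail = stats.chi2.sf(chi2_stat, dof)
    # Two-sided p-value: twice the smaller tail, capped at 1.
    p_value = min(1.0, 2 * min(lower_tail, upper_tail))
    return chi2_stat, p_value


rng = np.random.default_rng(seed=6)
sample = rng.normal(loc=0.0, scale=1.0, size=100)  # synthetic, true variance 1
chi2_stat, p_value = variance_test_p_value(sample, popvar=4.0)
print(f"chi2 = {chi2_stat:.2f}, p = {p_value:.3g}")
```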

#### User Defined Function

In [22]:
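def variance_test(a, popvar, alternative="two-tailed", alpha=0.05):
    """
    chi-squared test comparing sample variance with population variance

    alternative: "lower", "upper", or any other value for a two-tailed test
    """
    n = len(a)
    # ddof=1 gives the sample variance (n - 1 divisor)
    chi2_stat = (n - 1) * np.var(a, ddof=1) / popvar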
    
    if alternative == "lower":
        chi2_critical = stats.chi2.ppf(alpha, n-1)
        if chi2_stat <= chi2_critical:
            return "H_0 is untenable"
        else:
            return "H_0 cannot be rejected"
    
    elif alternative == "upper":
        chi2_critical = stats.chi2.ppf(1-alpha, n-1)
        if chi2_stat >= chi2_critical:
            return "H_0 is untenable"
        else:
            return "H_0 cannot be rejected"
    
    else:
        chi2_critical_lower = stats.chi2.ppf(alpha/2, n-1)
        chi2_critical_upper = stats.chi2.ppf(1-(alpha/2), n-1)
        if (chi2_stat <= chi2_critical_lower) or (chi2_stat >= chi2_critical_upper):
            return "H_0 is untenable"
        else:
            return "H_0 cannot be rejected"

In [23]:
variance_test(
    a=specieswise_bill_length['Adelie'],
    popvar=25.0
)

'H_0 is untenable'

### 3.2 Comparison of Sample Variances

- **Bartlett Test: [`scipy.stats.bartlett`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.bartlett.html#scipy.stats.bartlett)**
>$\chi^2$-test assuming normal distribution

- **Levene Test: [`scipy.stats.levene`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.levene.html)**
>$F$-test that is less sensitive to departures from normality
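
Both tests take the groups as separate positional arguments. A minimal sketch on synthetic groups (an assumption: three normal samples, one with a clearly larger spread) shows the two calls side by side.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)
# Three hypothetical groups; the third has three times the standard deviation.
group_1 = rng.normal(loc=0.0, scale=1.0, size=200)
group_2 = rng.normal(loc=0.0, scale=1.0, size=200)
group_3 = rng.normal(loc=0.0, scale=3.0, size=200)

bartlett_stat, bartlett_p = stats.bartlett(group_1, group_2, group_3)
levene_stat, levene_p = stats.levene(group_1, group_2, group_3, center='median')

print(f"Bartlett: statistic = {bartlett_stat:.2f}, p = {bartlett_p:.3g}")
print(f"Levene:   statistic = {levene_stat:.2f}, p = {levene_p:.3g}")
```

With a spread this different, both tests should reject equality of variances.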

#### Example
>$\bf{H_0}~$: all penguin species have the same variance of flipper length
>
>$\bf{H_1}~$: at least two penguin species have different variance of flipper length

In [24]:
specieswise_flipper_length = {
    species: sub_df.flipper_length_mm for species, sub_df in species_groups
}

In [25]:
W_stat, p_value = stats.levene(
    *specieswise_flipper_length.values(),
    center='median'
)

In [26]:
p_value

0.6426253107522957

**conclusion**
>no evidence against the null hypothesis

<hr style="border:2px solid black">

## 4. Independence of Categorical Variables

#### function: [`scipy.stats.chi2_contingency`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html)

#### Example
>$\bf{H_0}~$: penguin species and island variables are independent
>
>$\bf{H_1}~$: penguin species and island variables are not independent

In [27]:
contingency_table = pd.crosstab(df.species, df.island)
contingency_table

| species \ island | Biscoe | Dream | Torgersen |
|:----------------:|:------:|:-----:|:---------:|
|      Adelie      |   44   |   55  |     47    |
|     Chinstrap    |    0   |   68  |     0     |
|      Gentoo      |   119  |   0   |     0     |


In [28]:
statistic, pvalue, dof, expected_freq = stats.chi2_contingency(
    observed=contingency_table,
    correction=True
)

In [29]:
print(f"""
statistic: {statistic}
pvalue: {pvalue},
dof: {dof},
expected_freq: 
{pd.DataFrame(expected_freq)}
""")


statistic: 284.5900126880923
pvalue: 2.2818915409873682e-60,
dof: 4,
expected_freq: 
           0          1          2
0  71.465465  53.927928  20.606607
1  33.285285  25.117117   9.597598
2  58.249249  43.954955  16.795796



**conclusion**
- data does not support the null hypothesis
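
The expected frequencies and the statistic reported above can be reproduced by hand. The sketch below hard-codes the observed counts from the contingency table and assumes no continuity correction, since that correction only applies when there is a single degree of freedom.

```python
import numpy as np
from scipy import stats

# Observed counts copied from the species-by-island contingency table.
observed = np.array([
    [44, 55, 47],   # Adelie
    [0, 68, 0],     # Chinstrap
    [119, 0, 0],    # Gentoo
])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
grand_total = observed.sum()

# Under independence: expected count = row total * column total / grand total.
expected = row_totals * col_totals / grand_total
chi2_stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = stats.chi2.sf(chi2_stat, dof)

print(f"statistic: {chi2_stat:.4f}, dof: {dof}, p-value: {p_value:.3g}")
```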

<hr style="border:2px solid black">