# Hypothesis Testing

- Comparison of Means
- Tests of Proportion
- Chi-Square Tests of Independence
- ANOVA (Analysis of Variance)

## Comparison of Means

**Theory:**
The two-sample t-test is used to determine whether there is a significant difference between the means of two independent groups. The standard (Student's) version assumes that the data in each group are approximately normally distributed and that the two groups have equal variances.

- **Null Hypothesis (H0)**: 
    - The means of the two groups are equal.
- **Alternative Hypothesis (H1)**: 
    - The means of the two groups are not equal.


- **T-statistic**: 
    - Measures the size of the difference relative to the variation in the sample data.
- **P-value**: 
    - The probability of observing the data, or something more extreme, if the null hypothesis is true. 
    - A small p-value (typically ≤ 0.05) is conventionally treated as evidence against the null hypothesis.
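
To make the t-statistic and p-value definitions concrete, here is a minimal sketch that computes both by hand and compares them with SciPy. The two lists `group_a` and `group_b` are hypothetical values chosen only for illustration; they are not the exercise data. The last call shows Welch's variant (`equal_var=False`), which may be preferable when the equal-variance assumption is in doubt.

```python
import math
from scipy import stats

# Hypothetical measurements for two independent groups (illustrative only)
group_a = [12.1, 11.8, 12.6, 12.9, 12.3]
group_b = [11.2, 11.5, 11.0, 11.9, 11.4]

n_a, n_b = len(group_a), len(group_b)
mean_a = sum(group_a) / n_a
mean_b = sum(group_b) / n_b

# Sample variances (divide by n - 1)
var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)

# Pooled variance, valid under the equal-variance assumption
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)

# t = difference in means / standard error of that difference
t_manual = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Two-sided p-value from the t distribution with n_a + n_b - 2 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n_a + n_b - 2)

# SciPy's pooled (Student) test should agree with the manual calculation
t_scipy, p_scipy = stats.ttest_ind(group_a, group_b)

# Welch's variant drops the equal-variance assumption
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"manual: t={t_manual:.4f}, p={p_manual:.4f}")
print(f"scipy:  t={t_scipy:.4f}, p={p_scipy:.4f}")
print(f"welch:  t={t_welch:.4f}, p={p_welch:.4f}")
```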

---



### Exercise 

**Scenario:**
A company wants to know if there is a significant difference in the average salaries of employees in two different departments: *Sales* and *Marketing*.

**Task:**
Perform a two-sample t-test to determine if there is a significant difference in the average salaries between the two departments.


<!--
```python
import scipy.stats as stats

# Perform two-sample t-test
t_stat, p_value = stats.ttest_ind(sales_dept, mktg_dept)

print(f"T-statistic: {t_stat}, P-value: {p_value}")
```
-->

In [1]:
from scipy.stats import ttest_ind

In [2]:
# Salaries written without thousands separators: `55,000` would be parsed as two list items (55 and 0)
sales_dept = [55000, 60000, 58000, 62000, 59000]
mktg_dept = [52000, 54000, 53000, 55000, 56000]

## Tests of Proportion

**Theory:**
The z-test for proportions is used to determine if there is a significant difference between the proportions of two independent groups. It is commonly used in surveys and experiments.

- **Null Hypothesis (H0)**: 
    - The proportions of the two groups are equal.
- **Alternative Hypothesis (H1)**: 
    - The proportions of the two groups are not equal.


- **Z-statistic**: 
    - Measures the difference between the observed proportions relative to the standard error of the difference.
- **P-value**: 
    - The probability of observing the data, or something more extreme, if the null hypothesis is true. 
    - A small p-value (typically ≤ 0.05) is conventionally treated as evidence against the null hypothesis.
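
As a rough illustration of how the z-statistic is built, the sketch below computes it with the standard library only. The counts are hypothetical and are not the exercise data. It uses the pooled proportion for the standard error; this should correspond to the default behaviour of `proportions_ztest`, though that is worth confirming against the statsmodels documentation.

```python
import math

# Hypothetical survey counts (illustrative only)
successes_1, n_1 = 45, 100
successes_2, n_2 = 30, 100

p_1 = successes_1 / n_1
p_2 = successes_2 / n_2

# Pooled proportion: the single best estimate if H0 (equal proportions) is true
p_pool = (successes_1 + successes_2) / (n_1 + n_2)

# Standard error of the difference in proportions under H0
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_1 + 1 / n_2))

# z = observed difference / standard error
z_stat = (p_1 - p_2) / se

# Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
p_value = math.erfc(abs(z_stat) / math.sqrt(2))

print(f"Z-statistic: {z_stat:.4f}, P-value: {p_value:.4f}")
```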

---



### Exercise

**Scenario:**
A survey was conducted to determine if there is a significant difference in the proportion of people who prefer product A over product B in two different cities.

**Task:**
Perform a z-test for proportions to determine if there is a significant difference in the proportion of people who prefer product A between the two cities.


**Data:**
- City 1: 120 out of 200 people prefer product A.
- City 2: 150 out of 250 people prefer product A.


<!--
```python
import statsmodels.api as sm
from statsmodels.stats.proportion import proportions_ztest

# Data
count = [120, 150]
nobs = [200, 250]

# Perform z-test for proportions
z_stat, p_value = proportions_ztest(count, nobs)

print(f"Z-statistic: {z_stat}, P-value: {p_value}")
```
-->

In [3]:
import statsmodels.api as sm
from statsmodels.stats.proportion import proportions_ztest

## Chi-Square Tests of Independence

**Theory:**
The chi-square test of independence is used to determine if there is a significant association between two categorical variables. It compares the observed frequencies in each category to the frequencies expected if the variables were independent.

- **Null Hypothesis (H0)**: 
    - The two variables are independent.
- **Alternative Hypothesis (H1)**: 
    - The two variables are not independent.

- **Chi-square statistic (χ²)**: 
    - Measures how far the observed frequencies are from the expected frequencies, summed over all cells of the contingency table.
- **P-value**: 
    - The probability of observing the data, or something more extreme, if the null hypothesis is true. 
    - A small p-value (typically ≤ 0.05) is conventionally treated as evidence against the null hypothesis.
- **Degrees of Freedom (dof)**: 
    - Calculated as (number of rows - 1) * (number of columns - 1).
- **Expected Frequencies**: 
    - The frequencies expected if the variables were independent.
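
The sketch below shows one way the expected frequencies, the chi-square statistic, and the degrees of freedom fit together. The 2x2 table is hypothetical and is not the exercise data. Note that `chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so `correction=False` is passed here to make SciPy's result comparable with the hand calculation.

```python
from scipy import stats

# Hypothetical 2x2 contingency table (illustrative only)
observed = [[20, 30],
            [40, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count per cell if the two variables were independent
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells
chi2_stat = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# dof = (rows - 1) * (columns - 1)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability of the chi-square distribution
p_value = stats.chi2.sf(chi2_stat, dof)

# Cross-check with SciPy, with the Yates continuity correction switched off
chi2_scipy, p_scipy, dof_scipy, expected_scipy = stats.chi2_contingency(
    observed, correction=False
)

print(f"manual: chi2={chi2_stat:.4f}, dof={dof}, p={p_value:.6f}")
print(f"scipy:  chi2={chi2_scipy:.4f}, dof={dof_scipy}, p={p_scipy:.6f}")
```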


---


### Exercise

**Scenario:**
A researcher wants to know if there is an association between gender and preference for a new product.

**Task:**
Perform a chi-square test of independence to determine if there is a significant association between gender and preference for the new product.

**Data:**
- Male: 30 prefer, 20 do not prefer
- Female: 25 prefer, 25 do not prefer


<!--
```python
import scipy.stats as stats

# Data
observed = [[30, 20], [25, 25]]

# Perform chi-square test of independence
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed)

print(f"Chi-square statistic: {chi2_stat}, P-value: {p_value}")
```
-->

In [4]:
from scipy.stats import chi2_contingency as chi2

## ANOVA (Analysis of Variance)

**Theory:**
ANOVA is used to determine if there are any statistically significant differences between the means of three or more independent groups. It tests the null hypothesis that all group means are equal.

- **Null Hypothesis (H0)**: 
    - All group means are equal.
    
- **Alternative Hypothesis (H1)**: 
    - At least one group mean is different.


- **F-statistic:** 
    - Measures the ratio of the variance between the groups to the variance within the groups.

- **P-value:** 
    - The probability of observing the data, or something more extreme, if the null hypothesis is true. 
    - A small p-value (typically ≤ 0.05) is conventionally treated as evidence against the null hypothesis.

In [5]:
import scipy.stats as stats

# Data
group1 = [23, 20, 22, 21, 24]
group2 = [30, 32, 29, 31, 33]
group3 = [40, 42, 41, 39, 43]

# Perform one-way ANOVA
f_stat, p_value = stats.f_oneway(group1, group2, group3)

print(f"F-statistic: {f_stat}, P-value: {p_value}")

F-statistic: 180.66666666666734, P-value: 1.1028256754466441e-09


### More detailed steps of ANOVA

#### 1. Formulate Hypotheses
- **Null Hypothesis (H0)**: All group means are equal.
- **Alternative Hypothesis (H1)**: At least one group mean is different.

#### 2. Calculate Group Means and Overall Mean
- Group Means: Calculate the mean of each group.
- Overall Mean: Calculate the mean of all data points combined.

#### 3. Calculate Sum of Squares
**Total Sum of Squares (SST)**: Measures the total variability in the data. 
$$SST = \sum_{i=1}^{N} (X_i - \bar{X})^2$$
- where $X_i$ is each individual data point
- $\bar{X}$ is the overall mean
- $N$ is the total number of data points

**Between-Group Sum of Squares (SSB)**: Measures the variability between the group means. 
$$SSB = \sum_{j=1}^{k} n_j (\bar{X}_j - \bar{X})^2$$
- where $n_j$ is the number of data points in group $j$
- $\bar{X}_j$ is the mean of group $j$
- $k$ is the number of groups

**Within-Group Sum of Squares (SSW)**: Measures the variability within each group. 
$$SSW = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$$
- where $X_{ij}$ is the $i$-th data point in group $j$
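
The three quantities are linked by the identity SST = SSB + SSW, which gives a handy sanity check on any hand calculation. A minimal sketch of that check, reusing the three groups from the ANOVA example above:

```python
import math

# Same three groups as in the ANOVA example
group1 = [23, 20, 22, 21, 24]
group2 = [30, 32, 29, 31, 33]
group3 = [40, 42, 41, 39, 43]
groups = [group1, group2, group3]

all_values = [x for g in groups for x in g]
overall_mean = sum(all_values) / len(all_values)

# Total variability around the overall mean
sst = sum((x - overall_mean) ** 2 for x in all_values)

# Variability of the group means around the overall mean, weighted by group size
ssb = sum(len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups)

# Variability of each point around its own group mean
ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

# The decomposition should hold up to floating-point rounding
print(f"SST = {sst:.4f}, SSB + SSW = {ssb + ssw:.4f}")
print("Identity holds:", math.isclose(sst, ssb + ssw))
```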

#### 4. Calculate Degrees of Freedom
- **Total Degrees of Freedom (DFT)**: $N - 1$
- **Between-Group Degrees of Freedom (DFB)**: $k - 1$
- **Within-Group Degrees of Freedom (DFW)**: $N - k$

#### 5. Calculate Mean Squares
- **Mean Square Between (MSB)**: 
$$MSB = \frac{SSB}{DFB}$$
- **Mean Square Within (MSW)**: 
$$MSW = \frac{SSW}{DFW}$$

#### 6. Calculate the F-statistic
- **F-statistic**: 
$$F = \frac{MSB}{MSW}$$

#### 7. Determine the P-value
- Compare the calculated F-statistic to the critical value from the F-distribution table, or use statistical software to determine the p-value.
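
Both routes can be sketched with `scipy.stats.f`: `ppf` gives the critical value that a printed F-table would list, and `sf` gives the p-value. The inputs below assume the example above (3 groups, 15 observations, F rounded to 180.67); the two decision rules should always agree.

```python
from scipy import stats

alpha = 0.05
f_stat = 180.67      # F-statistic of the example above, rounded
dfb, dfw = 2, 12     # k - 1 and N - k for 3 groups and 15 observations

# Critical value: the F value that cuts off the upper alpha tail
f_crit = stats.f.ppf(1 - alpha, dfb, dfw)

# P-value: upper-tail probability beyond the observed F-statistic
p_value = stats.f.sf(f_stat, dfb, dfw)

reject_by_critical_value = f_stat > f_crit
reject_by_p_value = p_value < alpha

print(f"Critical value: {f_crit:.3f}")
print(f"P-value: {p_value:.3e}")
print(f"Reject H0 (critical value rule): {reject_by_critical_value}")
print(f"Reject H0 (p-value rule): {reject_by_p_value}")
```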

#### 8. Make a Decision
- If the p-value is less than the significance level (typically 0.05), reject the null hypothesis. This indicates that there is a significant difference between the group means.

#### Step-by-step implementation

In [6]:
## Calculate Group Means and Overall Mean ##

group1 = [23, 20, 22, 21, 24]
group2 = [30, 32, 29, 31, 33]
group3 = [40, 42, 41, 39, 43]

mean1 = sum(group1) / len(group1)
mean2 = sum(group2) / len(group2)
mean3 = sum(group3) / len(group3)
overall_mean = sum(group1 + group2 + group3) / len(group1 + group2 + group3)

print(f"Group 1 Mean: {mean1}")
print(f"Group 2 Mean: {mean2}")
print(f"Group 3 Mean: {mean3}")
print(f"Overall Mean: {overall_mean}")

Group 1 Mean: 22.0
Group 2 Mean: 31.0
Group 3 Mean: 41.0
Overall Mean: 31.333333333333332


In [7]:
## Calculate Sum of Squares ##

# Total Sum of Squares (SST)
sst = sum((x - overall_mean)**2 for x in group1 + group2 + group3)

# Between-Group Sum of Squares (SSB)
ssb = len(group1) * (mean1 - overall_mean)**2 + len(group2) * (mean2 - overall_mean)**2 + len(group3) * (mean3 - overall_mean)**2

# Within-Group Sum of Squares (SSW)
ssw = sum((x - mean1)**2 for x in group1) + sum((x - mean2)**2 for x in group2) + sum((x - mean3)**2 for x in group3)

print(f"SST: {sst}")
print(f"SSB: {ssb}")
print(f"SSW: {ssw}")

SST: 933.3333333333335
SSB: 903.3333333333333
SSW: 30.0


In [8]:
## Calculate Degrees of Freedom ##

dft = len(group1 + group2 + group3) - 1
dfb = 3 - 1  # Number of groups - 1
dfw = dft - dfb

print(f"DFT: {dft}")
print(f"DFB: {dfb}")
print(f"DFW: {dfw}")

DFT: 14
DFB: 2
DFW: 12


In [9]:
## Calculate Mean Squares ##

msb = ssb / dfb
msw = ssw / dfw

print(f"MSB: {msb}")
print(f"MSW: {msw}")

MSB: 451.66666666666663
MSW: 2.5


In [10]:
## Calculate F-Statistic ##
f_stat = msb / msw

print(f"F-statistic: {f_stat}")

F-statistic: 180.66666666666666


In [11]:
## Determine P-Value ##

import scipy.stats as stats

p_value = stats.f.sf(f_stat, dfb, dfw)

print(f"P-value: {p_value}")

P-value: 1.1028256754466685e-09


In [12]:
## Make Decision ##

alpha = 0.05

if p_value <= alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")

    

Reject the null hypothesis


## Multiple Comparison of Means with ANOVA (Post-hoc Test)

**Theory**:
When ANOVA indicates a significant difference, a post-hoc test is used to determine which specific groups' means are different. One common post-hoc test is Tukey's Honest Significant Difference (HSD) test.

- **Tukey's HSD Test**: 
    - Compares all possible pairs of group means to determine which means are significantly different from each other.
- **endog**: 
    - The dependent variable (data).
- **groups**: 
    - The independent variable (group labels).
- **alpha**: 
    - The significance level (typically 0.05 but depends on use-case).


In [13]:
import statsmodels.api as sm
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Data
data = group1 + group2 + group3
groups = ['group1']*len(group1) + ['group2']*len(group2) + ['group3']*len(group3)

# Perform Tukey's HSD test
tukey_result = pairwise_tukeyhsd(endog=data, groups=groups, alpha=0.05)

print(tukey_result)

Multiple Comparison of Means - Tukey HSD, FWER=0.05
group1 group2 meandiff p-adj  lower   upper  reject
---------------------------------------------------
group1 group2      9.0   0.0  6.3321 11.6679   True
group1 group3     19.0   0.0 16.3321 21.6679   True
group2 group3     10.0   0.0  7.3321 12.6679   True
---------------------------------------------------


**Example with Ranking the Values:**
Let's extend the example to include ranking the values.

- **ANOVA Result**: 
    - Provides the F-statistic and p-value for the overall test.
- **Tukey's HSD Test Result**: 
    - Shows which group means are significantly different.
- **Ranking the Values**: 
    - Adds a rank column to the DataFrame and sorts the values by rank.

In [14]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Data
data = {
    'value': group1 + group2 + group3,
    'group': ['group1']*len(group1) + ['group2']*len(group2) + ['group3']*len(group3)
}

df = pd.DataFrame(data)

# Perform ANOVA
model = ols('value ~ group', data=df).fit()
anova_result = sm.stats.anova_lm(model, typ=2)
print(anova_result)

# Perform Tukey's HSD test
tukey_result = pairwise_tukeyhsd(endog=df['value'], groups=df['group'], alpha=0.05)
print(tukey_result)

# Ranking the values
df['rank'] = df['value'].rank()
print(df.sort_values(by='rank'))

              sum_sq    df           F        PR(>F)
group     903.333333   2.0  180.666667  1.102826e-09
Residual   30.000000  12.0         NaN           NaN
Multiple Comparison of Means - Tukey HSD, FWER=0.05
group1 group2 meandiff p-adj  lower   upper  reject
---------------------------------------------------
group1 group2      9.0   0.0  6.3321 11.6679   True
group1 group3     19.0   0.0 16.3321 21.6679   True
group2 group3     10.0   0.0  7.3321 12.6679   True
---------------------------------------------------
    value   group  rank
1      20  group1   1.0
3      21  group1   2.0
2      22  group1   3.0
0      23  group1   4.0
4      24  group1   5.0
7      29  group2   6.0
5      30  group2   7.0
8      31  group2   8.0
6      32  group2   9.0
9      33  group2  10.0
13     39  group3  11.0
10     40  group3  12.0
12     41  group3  13.0
11     42  group3  14.0
14     43  group3  15.0


### More detailed steps of ANOVA Post-hoc Test (Tukey's HSD)

#### 1. Perform ANOVA 
- First, perform ANOVA to determine whether there is a significant difference between the group means, as was done above. If ANOVA indicates a significant difference, proceed with the post-hoc test.

#### 2. Formulate Hypotheses for Post-hoc Test
- **Null Hypothesis (H0)**: For a given pair of groups, the two group means are equal.
- **Alternative Hypothesis (H1)**: For a given pair of groups, the two group means are not equal.

#### 3. Calculate Pairwise Comparisons
Tukey's HSD test compares all possible pairs of group means to determine which pairs are significantly different.

#### 4. Calculate the HSD Statistic
The HSD statistic is calculated for each pair of groups. It is based on the studentized range distribution.

#### 5. Determine the P-values for Each Pair
The p-values indicate the probability of observing the data, or something more extreme, if the null hypothesis is true for each pair of groups.

#### 6. Make a Decision for Each Pair
If the p-value for a pair is less than the significance level (typically 0.05), reject the null hypothesis for that pair, indicating a significant difference between the group means.
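
Steps 3 to 6 can be sketched by hand using `scipy.stats.studentized_range`, which is assumed here to be available (it was added in a fairly recent SciPy release). This is a simplified version for equal group sizes, not a reproduction of how statsmodels implements the test. With the example data, the margin `hsd` should be close to the half-width of the confidence intervals in the Tukey table shown earlier (for instance 9.0 - 6.3321).

```python
import math
from itertools import combinations
from scipy.stats import studentized_range

# Same three groups as in the ANOVA example
groups = {
    "group1": [23, 20, 22, 21, 24],
    "group2": [30, 32, 29, 31, 33],
    "group3": [40, 42, 41, 39, 43],
}

alpha = 0.05
k = len(groups)          # number of groups
n = 5                    # observations per group (equal sizes assumed)
dfw = k * n - k          # within-group degrees of freedom, N - k

means = {name: sum(vals) / n for name, vals in groups.items()}

# Mean square within (MSW), the pooled estimate of the error variance
ssw = sum((x - means[name]) ** 2 for name, vals in groups.items() for x in vals)
msw = ssw / dfw

# Standard error used by Tukey's HSD when group sizes are equal
se = math.sqrt(msw / n)

# Honest significant difference: smallest mean difference declared significant
q_crit = studentized_range.ppf(1 - alpha, k, dfw)
hsd = q_crit * se

results = []
for a, b in combinations(groups, 2):
    diff = means[b] - means[a]
    q = abs(diff) / se                          # studentized range statistic
    p_adj = studentized_range.sf(q, k, dfw)     # adjusted p-value for this pair
    reject = abs(diff) > hsd
    results.append((a, b, diff, p_adj, reject))
    print(f"{a} vs {b}: meandiff={diff:.1f}, "
          f"CI=({diff - hsd:.4f}, {diff + hsd:.4f}), reject={reject}")
```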

#### Step-by-step implementation

In [15]:
## Perform ANOVA ##
# even though we already did this

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Data
group1 = [23, 20, 22, 21, 24]
group2 = [30, 32, 29, 31, 33]
group3 = [40, 42, 41, 39, 43]

data = {
    'value': group1 + group2 + group3,
    'group': ['group1']*len(group1) + ['group2']*len(group2) + ['group3']*len(group3)
}

df = pd.DataFrame(data)

# Perform ANOVA
model = ols('value ~ group', data=df).fit()
anova_result = sm.stats.anova_lm(model, typ=2)
print("ANOVA Result:")
print(anova_result)

ANOVA Result:
              sum_sq    df           F        PR(>F)
group     903.333333   2.0  180.666667  1.102826e-09
Residual   30.000000  12.0         NaN           NaN


In [16]:
## Check ANOVA Result and potentially perform Tukey's HSD test ##

from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Check if ANOVA is significant
if anova_result['PR(>F)'].iloc[0] < 0.05:
    print("\nANOVA is significant. Proceeding with Tukey's HSD test.")
    
    # Perform Tukey's HSD test
    tukey_result = pairwise_tukeyhsd(endog=df['value'], groups=df['group'], alpha=0.05)
    print("\nTukey's HSD Test Result:")
    print(tukey_result)
else:
    print("\nANOVA is not significant. No need for further post-hoc testing.")


ANOVA is significant. Proceeding with Tukey's HSD test.

Tukey's HSD Test Result:
Multiple Comparison of Means - Tukey HSD, FWER=0.05
group1 group2 meandiff p-adj  lower   upper  reject
---------------------------------------------------
group1 group2      9.0   0.0  6.3321 11.6679   True
group1 group3     19.0   0.0 16.3321 21.6679   True
group2 group3     10.0   0.0  7.3321 12.6679   True
---------------------------------------------------


#### Interpret the Results

- **Group1 - Group2**: The comparison between Group 1 and Group 2.
- **Group1 - Group3**: The comparison between Group 1 and Group 3.
- **Group2 - Group3**: The comparison between Group 2 and Group 3.
- **meandiff**: The difference in means between the two groups.
- **p-adj**: The adjusted p-value for the comparison. A value printed as 0.0 is rounded for display; it is very small, not exactly zero.
- **lower**: The lower bound of the confidence interval for the mean difference.
- **upper**: The upper bound of the confidence interval for the mean difference.
- **reject**: Indicates whether the null hypothesis is rejected for the comparison (True if p-adj is below the chosen alpha, here 0.05).


