# Types of Hypothesis Tests

1. Parametric
    - z-test, t-test, and ANOVA
    - Assume a normal distribution
    - Require sufficiently large sample sizes
2. Non-parametric
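    - Avoid the parametric assumptions and conditions; useful when:
        - the data are not normally distributed
        - the sample size is small
    - e.g. Wilcoxon signed-rank test, Wilcoxon-Mann-Whitney test, Kruskal-Wallis test
    - Many non-parametric tests use the ranks of the data rather than the raw values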
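To make the idea of ranks concrete, here is a small sketch using `scipy.stats.rankdata` (the same function used in the Wilcoxon code below). The input values are made up for illustration. Tied values share the average of the ranks they occupy, and an extreme outlier simply receives the largest rank.

```python
from scipy.stats import rankdata

# Ties share the average rank: the two 3.1 values occupy ranks 2 and 3, so each gets 2.5
tied = rankdata([3.1, 1.2, 3.1, 5.0])
print(tied)

# An outlier only gets the top rank, so it cannot dominate a rank-based statistic
with_outlier = rankdata([1.0, 2.0, 3.0, 1000.0])
print(with_outlier)
```

Because only the order of the values matters, the ranks of `[1, 2, 3, 1000]` are the same as those of `[1, 2, 3, 4]`.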

# Assumptions in parametric hypothesis tests

- Hypothesis tests make assumptions about the data
- Every parametric hypothesis test assumes:
    1. Each sample is randomly sourced from its population
    2. Observations (and samples) are independent of each other
    3. The sample is large enough for the central limit theorem to apply
        - A smaller sample means greater uncertainty, wider confidence intervals, and a higher chance of false negative or false positive errors
        - How large? Common rules of thumb:
            - One-sample t-test: 30 observations in the sample
            - Two-sample t-test: 30 observations in each sample
            - Paired t-test: 30 pairs of observations in the sample
            - ANOVA: 30 observations in each sample
            - One-sample proportion test: 10 successes and 10 failures in the sample
            - Two-sample proportion test: 10 successes and 10 failures in each sample
            - Chi-square test: 5 successes and 5 failures in each sample
            - For proportion and chi-square tests, a proportion close to 0 or 1 requires a bigger sample
    4. The data (or the sampling distribution of the statistic) is approximately normal
- If the sample is not sourced randomly, it may not represent the entire population
    - No statistical test can confirm that a sample was drawn randomly
    - Ask the people involved in the data collection process
- Make sure observations / samples are really independent
    - If two samples are dependent (e.g. repeated measurements on the same subjects), use a paired test
    - If the samples are dependent but a paired test is not used, there is an increased chance of a false negative
    - Check for dependencies before collecting data
- Sanity check:
    - Calculate the bootstrap distribution of the sample statistic (e.g. the mean)
    - Visualize the distribution with a histogram
    - If the distribution is not approximately normal, one of the assumptions has likely not been met
    - Revisit data collection to check for randomness, independence, and sample size
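The sanity check can be sketched with NumPy alone. This is a minimal version assuming the statistic of interest is the mean; `bootstrap_means` is a hypothetical helper name, the data are simulated (deliberately skewed), and a text histogram stands in for a plotting library.

```python
import numpy as np


def bootstrap_means(sample, n_boot=5000, seed=0):
    """Return n_boot bootstrap replicates of the sample mean (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    # Resample with replacement, keeping the original sample size, then average each resample
    resamples = rng.choice(sample, size=(n_boot, len(sample)), replace=True)
    return resamples.mean(axis=1)


# Simulated, skewed data standing in for a real sample
sample = np.random.default_rng(1).exponential(scale=2.0, size=40)
boot = bootstrap_means(sample)

# Crude text histogram of the bootstrap distribution
counts, edges = np.histogram(boot, bins=15)
for count, left in zip(counts, edges[:-1]):
    print(f"{left:6.2f} | {'#' * (int(count) // 25)}")
```

Even though the raw data are skewed, the bootstrap distribution of the mean should look roughly bell-shaped with 40 observations; a clearly lopsided or multi-peaked histogram would be the warning sign described above.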

# Wilcoxon signed-rank test

- Non-parametric test
- For dependent (paired) samples
- Works well when the assumptions of a paired t-test aren't met
- Works on the ranked absolute differences between the pairs of data
- Steps:
    1. Calculate the difference between the 2 columns and store the result in a new column (`diff`); pairs with a zero difference are usually dropped
    2. Take the absolute value of the new column
    3. Rank the rows by the absolute difference
    4. Calculate `T_minus` and `T_plus`
        - `T_minus`: sum of ranks of negative differences
        - `T_plus`: sum of ranks of positive differences
    5. Calculate the test statistic `W`
        - the minimum of `T_minus` and `T_plus`
    6. Calculate the p-value from `W`

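```python
import numpy as np
from scipy.stats import rankdata

# sample_df: the notebook's DataFrame of paired values (not defined here)
sample_df['diff'] = sample_df['year_08'] - sample_df['year_09']
# Zero differences have no sign, so drop them before ranking (the usual convention)
nonzero_df = sample_df[sample_df['diff'] != 0].copy()
nonzero_df['abs_diff'] = nonzero_df['diff'].abs()
# Rank the absolute differences; ties get the average rank
nonzero_df['rank_abs_diff'] = rankdata(nonzero_df['abs_diff'])
# Sum the ranks of negative and positive differences separately
T_minus = nonzero_df.loc[nonzero_df['diff'] < 0, 'rank_abs_diff'].sum()
T_plus = nonzero_df.loc[nonzero_df['diff'] > 0, 'rank_abs_diff'].sum()
# Test statistic W: the smaller rank sum; look up its p-value in a Wilcoxon signed-rank table
W = np.min([T_minus, T_plus])
```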
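Step 6 needs the null distribution of `W`. Instead of a table, a common shortcut for larger samples is a normal approximation: with $n$ non-zero differences, $W$ has mean $n(n+1)/4$ and variance $n(n+1)(2n+1)/24$. The function below is a hypothetical, self-contained sketch of all six steps under that approximation. It applies no tie or continuity correction, so its p-value will differ from exact or corrected library results, especially for small $n$, and it assumes at least one non-zero difference.

```python
import numpy as np
from scipy.stats import norm, rankdata


def wilcoxon_signed_rank(x, y):
    """Return (W, approximate two-sided p-value) for paired samples x and y."""
    diffs = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    diffs = diffs[diffs != 0]          # zero differences carry no sign, so drop them
    n = len(diffs)
    ranks = rankdata(np.abs(diffs))    # rank absolute differences; ties get the average rank
    t_plus = ranks[diffs > 0].sum()
    t_minus = ranks[diffs < 0].sum()
    w = min(t_plus, t_minus)

    # Normal approximation to the null distribution of W (no tie or continuity correction)
    mean_w = n * (n + 1) / 4
    sd_w = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean_w) / sd_w
    p_value = 2 * norm.cdf(z)          # z <= 0 because W is the smaller rank sum
    return w, p_value


# Made-up paired data: differences are 2, -1, 2, 4, 0 (the zero is dropped)
w, p = wilcoxon_signed_rank([10, 12, 9, 15, 11], [8, 13, 7, 11, 11])
print(w, p)
```

Here only one small negative difference exists, so `W` is 1; with just four non-zero pairs the normal approximation is rough, which is why a table (or an exact method) is preferred for small samples.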

### Alternative Approach

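```python
import pingouin

alpha = 0.01
# One-sided test: are the year_08 values systematically smaller than the year_09 values?
pingouin.wilcoxon(x=sample_df['year_08'],
                  y=sample_df['year_09'],
                  alternative="less")
```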

# Wilcoxon-Mann-Whitney test


- Also known as the Mann-Whitney U test
- Similar to a t-test performed on the ranks of the numeric input
- Works on unpaired (independent) data
- Compares 2 groups
- Steps:
    1. Convert data from long to wide format
    2. Perform Wilcoxon-Mann-Whitney test

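```python
import pingouin

# Keep the grouping and numeric columns, then pivot to one column per category
long_df = sample_df[['cat_col', 'numeric_col']]
wide_df = long_df.pivot(columns='cat_col', values='numeric_col')

alpha = 0.01
# One-sided test: do values in 'cat1' tend to be greater than those in 'cat2'?
pingouin.mwu(x=wide_df['cat1'],
             y=wide_df['cat2'],
             alternative='greater')
```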
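Because the test works on ranks, its U statistic can be computed from the combined ranking of both groups: U for the first group equals its rank sum minus $n_1(n_1+1)/2$, which also counts how many cross-group pairs the first group "wins" (ties count one half). This is a minimal sketch with made-up data; `mann_whitney_u` is a hypothetical helper and returns only the statistic, not a p-value.

```python
import numpy as np
from scipy.stats import rankdata


def mann_whitney_u(x, y):
    """Return U for sample x: pairs (x_i, y_j) with x_i > y_j, ties counting 1/2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ranks = rankdata(np.concatenate([x, y]))   # rank both groups together
    r1 = ranks[:len(x)].sum()                  # rank sum of the first group
    return r1 - len(x) * (len(x) + 1) / 2


print(mann_whitney_u([3, 5, 8], [1, 2, 4]))    # 8 of the 9 cross-group pairs favor x
```

The two U values for a pair of groups always add up to $n_1 n_2$, so a U far from $n_1 n_2 / 2$ in either direction suggests one group tends to have larger values.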

# Kruskal-Wallis test

- Wilcoxon-Mann-Whitney test extended to more than 2 groups
- Tests whether a numeric variable differs across the groups of a categorical variable (a non-parametric counterpart to one-way ANOVA)


```python
import pingouin

alpha = 0.01
# Does numeric_col differ across the categories of multiclass_col?
pingouin.kruskal(data=sample_df,
                 dv='numeric_col',
                 between='multiclass_col')
```
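The Kruskal-Wallis statistic extends the same rank-sum idea to $k$ groups: $H = \frac{12}{N(N+1)} \sum_i \frac{R_i^2}{n_i} - 3(N+1)$, where $R_i$ is the rank sum of group $i$, $n_i$ its size, and $N$ the total count; $H$ is compared against a chi-square distribution with $k - 1$ degrees of freedom. The sketch below uses a hypothetical helper and omits the tie correction, so with tied values it will differ slightly from library results; with no ties it should agree with `scipy.stats.kruskal`.

```python
import numpy as np
from scipy.stats import chi2, rankdata


def kruskal_wallis_h(*groups):
    """Return (H, p-value) for two or more independent groups, without tie correction."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    ranks = rankdata(np.concatenate(groups))   # rank all observations together

    # Split the combined ranks back into groups and sum each group's ranks
    rank_sums = [chunk.sum() for chunk in np.split(ranks, np.cumsum(sizes)[:-1])]

    weighted = sum(r ** 2 / n for r, n in zip(rank_sums, sizes))
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    p_value = chi2.sf(h, df=len(groups) - 1)   # chi-square approximation with k - 1 df
    return h, p_value


# Made-up, perfectly separated groups
h, p = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(h, p)
```

With three groups of three, the chi-square approximation is crude; it is shown here only to make the arithmetic easy to follow.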