
# Section 22: Hypothesis Testing (A/B Testing) In-Depth

WORK IN PROGRESS (as of 01/26)
- online-ds-pt-100719
- For 01/28/20



# Topics 
 

- Revisiting workflow: choosing the correct hypothesis test.

- Apply workflow to ["In-Depth AB Testing Lab"](https://learn.co/tracks/data-science-career-v2/module-3-probability-sampling-and-ab-testing/section-22-ab-testing/in-depth-ab-testing-lab)

- Perform post-hoc calculations and write conclusions:
    - Effect Size
    - Post-hoc pairwise comparisons 
    - Statistical Power
    

## Resources
**Overviews/Cheatsheets**
- [Codecademy Hypothesis Testing Slideshow](https://drive.google.com/open?id=1p4R2KCErq_iUO-wnfDrGPukTgQDBNoc7)
- [Cheatsheet: Hypothesis Testing with Scipy](https://drive.google.com/open?id=1EY4UCg20HawWlWa50M2tFauoKBQcFFAW)


- [Choosing Between Parametric and Non-Parametric Tests](https://blog.minitab.com/blog/adventures-in-statistics-2/choosing-between-a-nonparametric-test-and-a-parametric-test)

**Trustworthy Stat References**:
- [Graphpad Prism's Stat Guide](https://www.graphpad.com/guides/prism/8/statistics/index.htm)
- [LAERD Statistics Test Selector](https://statistics.laerd.com/premium/sts/index.php)


# Choosing the Correct Hypothesis Test

##  STEP 0: Stating our Hypothesis:

- **Before selecting the correct hypothesis test, you must first officially state your null hypothesis ($H_0$) and alternative hypothesis ($H_A$ or $H_1$)**

> **Before stating your hypotheses, ask yourself:**
1. What question am I attempting to answer?
2. What metric/value do I want to measure to answer this question?
3. Do I expect the groups to be different in a specific way? (e.g. one group greater than the other).
    - Or do I just think they'll be different, but don't know how?

- **Now formally declare your hypotheses after asking yourself the questions above:**

- $H_1$ : 

- $H_0$ :

<br>

## STEP 1: Determine the category/type of test based on your data.

### Q1: What type of data do I have (Numeric or categorical?)

### Q2: How many samples/groups am I comparing?

- Using the answers to the above 2 questions: select the type of test from this table.

| What type of comparison? | Numeric Data | Categorical Data|
| --- | --- | --- |
|Sample vs Known Quantity/Target|1 Sample T-Test| Binomial Test|
|2 Samples | 2 Sample T-Test| Chi-Square|
|More than 2| ANOVA and/or Tukey | Chi-Square|
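
As a rough sketch, the lookup in the table above can be written as a small helper. `choose_test_family` and its argument names are hypothetical (they are not part of scipy or statsmodels); the function only returns the table's label for the test family.

```python
def choose_test_family(data_type, n_groups):
    """Return the test family from the selection table.

    Hypothetical helper for illustration only.
    data_type: "numeric" or "categorical"
    n_groups:  1 means one sample vs. a known quantity/target,
               2 means two samples, 3+ means more than two samples.
    """
    if data_type not in ("numeric", "categorical"):
        raise ValueError("data_type must be 'numeric' or 'categorical'")
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")

    table = {
        ("numeric", 1): "1 Sample T-Test",
        ("numeric", 2): "2 Sample T-Test",
        ("categorical", 1): "Binomial Test",
        ("categorical", 2): "Chi-Square",
    }
    if n_groups > 2:
        # Last row of the table: more than 2 groups
        return "ANOVA and/or Tukey" if data_type == "numeric" else "Chi-Square"
    return table[(data_type, n_groups)]


print(choose_test_family("numeric", 2))
print(choose_test_family("categorical", 3))
```

The returned label is only the starting point: STEP 2 still decides whether the parametric version of that test is appropriate.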

## STEP 2:  Do we meet the assumptions of the chosen test?

### ASSUMPTIONS SUMMARY


- [One-Sample T-Test](https://statistics.laerd.com/spss-tutorials/one-sample-t-test-using-spss-statistics.php)
    - No significant outliers
    - Normality

- [Independent t-test (2-sample)](https://statistics.laerd.com/statistical-guides/independent-t-test-statistical-guide.php)
    - No significant outliers
    - Normality
    - Equal Variance

- [One Way ANOVA](https://statistics.laerd.com/spss-tutorials/one-way-anova-using-spss-statistics.php)
    - No significant outliers
    - Equal variance
    - Normality

- [Chi-Square test](https://statistics.laerd.com/spss-tutorials/chi-square-test-for-association-using-spss-statistics.php)
    - Both variables are categorical


### HOW TO: TEST ASSUMPTIONS AND SELECT CORRECT TEST

#### 0. Check for & Remove Outliers


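- Required for the 1-sample t-test, 2-sample t-test, and ANOVA (see the assumptions summary above).
- Use one of the two methods below to identify outliers:
    - Use Tukey's interquartile range (IQR) rule: flag values more than 1.5 × IQR below the first quartile or above the third quartile.
    - Use the absolute value of z-scores: flag values with |z| > 3.
- CAUTION: Tukey's IQR method typically flags more outliers than the z-score method. Take care in choosing the appropriate outlier removal method.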
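
A minimal sketch of the two outlier rules, assuming the data is a 1-D numeric array. The helper names `find_outliers_iqr` and `find_outliers_z` and the example values are made up for illustration; both helpers return a boolean mask rather than removing anything.

```python
import numpy as np
from scipy import stats


def find_outliers_iqr(data):
    """Boolean mask: True where a value lies outside Tukey's fences (1.5 * IQR)."""
    data = np.asarray(data, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    return (data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)


def find_outliers_z(data, cutoff=3):
    """Boolean mask: True where the absolute z-score exceeds the cutoff."""
    data = np.asarray(data, dtype=float)
    z_scores = np.abs(stats.zscore(data))
    return z_scores > cutoff


# Made-up example values with one moderate (22) and one extreme (40) value
values = np.array([10, 11, 12, 12, 13, 13, 14, 14, 15, 16, 22, 40])

iqr_mask = find_outliers_iqr(values)
z_mask = find_outliers_z(values)

print("IQR rule flags:", values[iqr_mask])
print("Z-score rule flags:", values[z_mask])

# Keep only the values that were not flagged
cleaned = values[~iqr_mask]
```

On this example the IQR rule flags both unusual values while the z-score rule flags only the extreme one, which is the behavior the caution above describes.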

#### 1. **Test Assumption of  Normality**

- Use either of the following tests:
    - D'Agostino-Pearson's normality test<br>
    ```scipy.stats.normaltest```
    - Shapiro-Wilk Test<br>
    ```scipy.stats.shapiro```<br>


- **1A. If you have normal data:**

    - **Move on to assumption \#2**, testing the assumption of equal variance.
    
    
- **1B. If you don't have normal data:** 
    
    > **Check whether your group sizes (n) are big enough to safely ignore the normality assumption (see table below).**

    - **If your group n's are big enough:**
        - **Move on to assumption \#2**, testing the assumption of equal variance. 
    - **If your group n's are NOT big enough:**  
        - **Move on to step 3**, selecting the non-parametric equivalent of your test.
     


| Parametric Test| Sample size guidelines for nonnormal data| 
| --- | --- |
| 1-sample t test| Greater than 20|
| 2-sample t test| Each group should be greater than 15| 
| One-Way ANOVA|If you have 2-9 groups, each group n >= 15. <br>If you have 10-12 groups, each group n > 20.|
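
The normality check and the sample-size fallback above can be sketched as follows. `check_normality` and `can_ignore_normality` are hypothetical helper names, the data is synthetic, and the thresholds are copied from the guideline table (a 2-group comparison is treated as a 2-sample t-test).

```python
import numpy as np
from scipy import stats


def check_normality(sample):
    """Return the p-values of both normality tests for one group."""
    sample = np.asarray(sample, dtype=float)
    _, p_dagostino = stats.normaltest(sample)  # D'Agostino-Pearson
    _, p_shapiro = stats.shapiro(sample)       # Shapiro-Wilk
    return float(p_dagostino), float(p_shapiro)


def can_ignore_normality(group_sizes):
    """Apply the sample-size guidelines for non-normal data.

    group_sizes: list with the n of each group being compared.
    """
    k = len(group_sizes)
    smallest = min(group_sizes)
    if k == 1:            # 1-sample t-test
        return smallest > 20
    if k == 2:            # 2-sample t-test
        return smallest > 15
    if 3 <= k <= 9:       # One-Way ANOVA, 2-9 groups
        return smallest >= 15
    if 10 <= k <= 12:     # One-Way ANOVA, 10-12 groups
        return smallest > 20
    raise ValueError("The guideline table only covers up to 12 groups")


# Synthetic, clearly skewed data for illustration
rng = np.random.default_rng(0)
skewed = rng.exponential(scale=2.0, size=200)

p_dagostino, p_shapiro = check_normality(skewed)
alpha = 0.05
if p_dagostino < alpha:
    print("Data is NOT normal; large enough to continue anyway?",
          can_ignore_normality([len(skewed)]))
```

A p-value below alpha means the normality assumption is rejected; the group sizes then decide between step 2 and step 3.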
    

#### 2. Test for Equal Variance

 - Levene's Test<br>
```scipy.stats.levene```

- **If you fail the assumption of equal variance:**
    - Use a Welch's T-Test.
        - for scipy, add `equal_var=False` to `ttest_ind`
        
        
- **If you pass the assumption of equal variance:**
    - Use a regular 2-sample t-test.
    - See Final Summary Table at the bottom.
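
A short sketch of the equal-variance decision, using synthetic groups whose spreads are deliberately very different. The group names and values are made up; only `scipy.stats.levene` and `scipy.stats.ttest_ind` come from the text above.

```python
import numpy as np
from scipy import stats

# Synthetic groups: similar means, very different standard deviations
rng = np.random.default_rng(42)
group_a = rng.normal(loc=10.0, scale=1.0, size=100)
group_b = rng.normal(loc=10.5, scale=4.0, size=100)

alpha = 0.05

# Levene's test: the null hypothesis is that the groups have equal variance
_, p_levene = stats.levene(group_a, group_b)
equal_var = bool(p_levene >= alpha)

# equal_var=False switches ttest_ind to Welch's t-test
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=equal_var)

test_name = "2-sample t-test" if equal_var else "Welch's t-test"
print(f"Levene p = {p_levene:.4f} -> using {test_name}")
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

Passing the Levene result straight into `equal_var` keeps the choice between the regular and Welch's t-test in one place.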
    

#### 3. Select a non-parametric equivalent of your t-test.
 

> **Table Source: Parametric  T-Tests vs Non-Parametric Alternatives**
- [Choosing Between Parametric and Non-Parametric Tests](https://blog.minitab.com/blog/adventures-in-statistics-2/choosing-between-a-nonparametric-test-and-a-parametric-test)

- **Select the test from the right Nonparametric column that matches your Parametric t-test.** 


- See final summary table at bottom for scipy functions. 

| Parametric tests (means) | Nonparametric tests (medians) |
 | --- | --- |
 | 1-sample t test | 1-sample Wilcoxon |
 | 2-sample t test | Mann-Whitney U test |
 | One-Way ANOVA | Kruskal-Wallis |
 
 
 
 

### Summary Table - Hypothesis Testing Functions

| Parametric tests (means) | Function | Nonparametric tests (medians) | Function |
 | --- | --- | --- | --- |
 | **1-sample t test** |`scipy.stats.ttest_1samp()`|  **1-sample Wilcoxon** |`scipy.stats.wilcoxon`|
 | **2-sample t test** |`scipy.stats.ttest_ind()` | **Mann-Whitney U test** |`scipy.stats.mannwhitneyu()` |
 | **One-Way ANOVA** | `scipy.stats.f_oneway()` | **Kruskal-Wallis** | `scipy.stats.kruskal` | 
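
For reference, a minimal sketch that calls each function in the table on synthetic data. The groups `a`, `b`, `c` and the `target` value are made up. The one-sample Wilcoxon is run on the differences from the target, and `alternative="two-sided"` is passed to `mannwhitneyu` explicitly, because older scipy versions default to a one-sided p-value.

```python
import numpy as np
from scipy import stats

# Synthetic groups for illustration only
rng = np.random.default_rng(1)
a = rng.normal(10, 2, 40)
b = rng.normal(11, 2, 40)
c = rng.normal(12, 2, 40)
target = 10  # known quantity for the 1-sample tests

results = {
    # Parametric tests (means)
    "1-sample t test": stats.ttest_1samp(a, target),
    "2-sample t test": stats.ttest_ind(a, b),
    "One-Way ANOVA": stats.f_oneway(a, b, c),
    # Nonparametric tests (medians)
    "1-sample Wilcoxon": stats.wilcoxon(a - target),
    "Mann-Whitney U test": stats.mannwhitneyu(a, b, alternative="two-sided"),
    "Kruskal-Wallis": stats.kruskal(a, b, c),
}

p_values = {}
for name, result in results.items():
    stat, p = result  # every result unpacks as (statistic, p-value)
    p_values[name] = float(p)
    print(f"{name:<20} stat = {float(stat):8.3f}   p = {float(p):.4f}")
```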
 
 



    

## STEP 3: Interpret Result & Post-Hoc Tests

- **Perform hypothesis test from summary table above to get your p-value.**

- **If p value is < $\alpha$:**
    - Reject the null hypothesis.
    - Calculate effect size (e.g. Cohen's $d$)
    
- **If p < $\alpha$ AND you have more than 2 groups (i.e. ANOVA):**
    - **Must run a pairwise Tukey's test to know which groups were significantly different.**
    - [Tukey pairwise comparison test](https://www.statsmodels.org/stable/generated/statsmodels.stats.multicomp.pairwise_tukeyhsd.html)
    - `statsmodels.stats.multicomp.pairwise_tukeyhsd`
    
    
- Report statistical power (optional)
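
Neither scipy nor statsmodels ships a Cohen's $d$ function, so the effect size step needs a small helper. The version below is a sketch under one assumption: it uses the pooled standard deviation of the two groups. The name `cohens_d` is hypothetical and may differ from the helper used elsewhere in these notes.

```python
import numpy as np


def cohens_d(group1, group2):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    n1, n2 = len(g1), len(g2)

    # Pooled variance: sample variances (ddof=1) weighted by degrees of freedom
    pooled_var = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)

    return (g1.mean() - g2.mean()) / np.sqrt(pooled_var)


# Tiny made-up example: means differ by 1, pooled standard deviation is 2
d = cohens_d([2, 4, 6], [1, 3, 5])
print(f"Cohen's d = {d:.2f}")
```

By Cohen's conventional benchmarks, $d$ around 0.2 is a small effect, 0.5 a medium effect, and 0.8 a large effect.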

#### Post-Hoc Functions:

| Post-Hoc Tests/Calculations|Function|
|--- | --- |
|**Tukey's Pairwise Comparisons** | `statsmodels.stats.multicomp.pairwise_tukeyhsd`|
|**Effect Size**| `Cohen_d` (custom function; not part of scipy or statsmodels)|
|**Statistical Power** | `statsmodels.stats.power`:<br>  `TTestIndPower` , `TTestPower`|
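
A sketch of the two statsmodels post-hoc tools from the table, run on synthetic data. The group labels, means, and the effect size of 0.5 are made-up inputs for illustration.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.stats.power import TTestIndPower

# Synthetic data: groups A and B share a mean, group C is shifted upward
rng = np.random.default_rng(7)
values = np.concatenate([
    rng.normal(10, 1, 30),  # group A
    rng.normal(10, 1, 30),  # group B
    rng.normal(13, 1, 30),  # group C
])
labels = np.repeat(["A", "B", "C"], 30)

# Tukey's pairwise comparisons: one row per pair (A-B, A-C, B-C)
tukey = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)
print(tukey.summary())
# tukey.reject is a boolean array, True where a pair differs significantly

# Statistical power of a 2-sample t-test with 30 observations per group
analysis = TTestIndPower()
power = analysis.solve_power(effect_size=0.5, nobs1=30, alpha=0.05, ratio=1.0)

# Observations needed per group to reach 80% power for the same effect size
n_needed = analysis.solve_power(effect_size=0.5, power=0.8, alpha=0.05, ratio=1.0)

print(f"Power with n=30 per group: {power:.2f}")
print(f"n per group for 80% power: {n_needed:.1f}")
```

`solve_power` solves for whichever argument is left out, so the same call reports power for a finished experiment or the sample size for a planned one.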

# SUMMARY TABLES - COMPLETE

### Assumption Tests
 
|Assumption test| Function |
| --- | --- |
| **Normality**| `scipy.stats.normaltest`|
| **Equal Variance** | `scipy.stats.levene`|


### Hypothesis Tests

| Parametric tests (means) | Function | Nonparametric tests (medians) | Function |
| --- | --- | --- | --- |
| **1-sample t test** |`scipy.stats.ttest_1samp()`|  **1-sample Wilcoxon** |`scipy.stats.wilcoxon`|
| **2-sample t test** |`scipy.stats.ttest_ind()` | **Mann-Whitney U test** |`scipy.stats.mannwhitneyu()`|
| **One-Way ANOVA** | `scipy.stats.f_oneway()` | **Kruskal-Wallis** | `scipy.stats.kruskal` | 

 
 ### Post-Hoc Tests/Calculations
 
 | Post-Hoc Tests/Calculations|Function|
 |--- | --- |
 |**Tukey's Pairwise Comparisons** | `statsmodels.stats.multicomp.pairwise_tukeyhsd`|
 |**Effect Size**| `Cohen_d` (custom function; not part of scipy or statsmodels)|
 |**Statistical Power** | `statsmodels.stats.power`:<br>  `TTestIndPower` , `TTestPower`|


## HYPOTHESIS TESTING STEPS

- Separate the data into group variables.
- Visualize the data and calculate each group's n (size).

    
- Select the appropriate test based on the type of comparison being made, the number of groups, and the type of data.


- For t-tests: test the assumptions of normality and homogeneity of variance.

    1. Check if sample sizes allow us to ignore assumptions, and if not:
    2. **Test Assumption of Normality**

    3. **Test for Homogeneity of Variance**

    4. **Choose the appropriate test based upon the above** 
    
    
- **Perform the chosen statistical test, calculate effect size, and run any post-hoc tests.**
    - Post-hoc pairwise comparison testing
        - Tukey's test
    - Effect size calculation
        - Cohen's d
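
The steps above can be strung together for the two-group case roughly as follows. This is a sketch, not the function built in class: `compare_two_groups` is a hypothetical helper, it uses the Shapiro-Wilk test (because `normaltest` needs at least 8 observations per group), and the `min_n=15` threshold comes from the sample-size guideline for the 2-sample t-test.

```python
import numpy as np
from scipy import stats


def compare_two_groups(grp1, grp2, alpha=0.05, min_n=15):
    """Pick and run a two-group test based on the assumption checks."""
    grp1 = np.asarray(grp1, dtype=float)
    grp2 = np.asarray(grp2, dtype=float)

    # Step 1: normality of each group (Shapiro-Wilk)
    normal = all(stats.shapiro(g)[1] >= alpha for g in (grp1, grp2))

    # Step 1B: are the groups big enough to ignore non-normality?
    big_enough = min(len(grp1), len(grp2)) > min_n

    if normal or big_enough:
        # Step 2: equal variance decides between regular and Welch's t-test
        _, p_levene = stats.levene(grp1, grp2)
        equal_var = bool(p_levene >= alpha)
        stat, p = stats.ttest_ind(grp1, grp2, equal_var=equal_var)
        test = "2-sample t-test" if equal_var else "Welch's t-test"
    else:
        # Step 3: non-parametric equivalent
        stat, p = stats.mannwhitneyu(grp1, grp2, alternative="two-sided")
        test = "Mann-Whitney U"

    return {"test": test, "statistic": float(stat), "p": float(p),
            "reject_null": bool(p < alpha)}


# Synthetic examples: two large groups with different means,
# and two small, heavily skewed groups
rng = np.random.default_rng(3)
large = compare_two_groups(rng.normal(0, 1, 50), rng.normal(2, 1, 50))
small = compare_two_groups([1] * 9 + [100], [2] * 9 + [90])

print(large)
print(small)
```

Effect size and any post-hoc tests would follow once `reject_null` is known.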

# ACTIVITY: In-Depth AB Testing

- Open in another notebook.

In [1]:
!pip install -U fsds_100719
from fsds_100719.imports import *
import scipy.stats as stats
import statsmodels.api as sms
import statsmodels.formula.api as smf

fsds_1007219  v0.7.2 loaded.  Read the docs: https://fsds.readthedocs.io/en/latest/ 


| Handle | Package | Description |
| --- | --- | --- |
| dp | IPython.display | Display modules with helpful display and clearing commands. |
| fs | fsds_100719 | Custom data science bootcamp student package |
| mpl | matplotlib | Matplotlib's base OOP module with formatting artists |
| plt | matplotlib.pyplot | Matplotlib's matlab-like plotting module |
| np | numpy | scientific computing with Python |
| pd | pandas | High performance data structures and tools |
| sns | seaborn | High-level data visualization library based on matplotlib |


['[i] Pandas .iplot() method activated.']


## Getting Group Data and EDA

In [2]:
df = None

In [3]:
def plot_dists(grp1,grp2,col='BL',name1='Exp',name2='Control'):

    ## Defining "gridspec_kws" for plt.subplots()
    ## This will make our first plot 3 times wider than the second.
    gs_kw = dict(width_ratios=[3, 1])
    
    fig, axes = plt.subplots(figsize=(10,4),ncols=2,
                             gridspec_kw=gs_kw,constrained_layout=True)

    ## Defining the data 
    group1 = {'name':name1, 
             'data':grp1[col],
             'plot_specs':{
                 'hist_kws':dict(color='b', lw=2,ls='-'),
                 'kde_kws':dict(color='b',lw=1,ls='-'),
                 'label':f"{name1} (n={len(grp1[col])})"}
             }
    
    group2 = {'name':name2, 
             'data':grp2[col],
              'plot_specs':{
                  'hist_kws':dict(color='orange', lw=2,ls='-'),
                  'kde_kws':dict(color='orange',lw=1,ls='-'),
                   'label':f"{name2} (n={len(grp2[col])})"}
             }
    
    
    ax=axes[0]
    sns.distplot(group1['data'], **group1['plot_specs'],ax=axes[0])
    sns.distplot(group2['data'], **group2['plot_specs'],ax=axes[0])
    ax.legend()
    
    ax.set(ylabel="Density")
    ax.set(xlabel='Number of Licks')
    
    
    ax = axes[1]
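    ## Bar plot of group means with standard error of the mean as error bars
    ## (`sem` is not in the imported handles, so call it via scipy.stats)
    ax.bar(group1['name'],group1['data'].mean(),
          yerr=stats.sem(group1['data']))

    ax.bar(group2['name'],group2['data'].mean(),
          yerr=stats.sem(group2['data']))    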
    
    return fig,ax

### Writing functions to test assumptions

In [4]:
import scipy.stats as stats

def test_normality(grp_control,col='BL',alpha=0.05):
    """Run D'Agostino-Pearson's normality test on one column of a group's data.
    Prints the interpretation and returns the p-value."""
    stat,p = stats.normaltest(grp_control[col])
    if p<alpha:
        print(f"Normal test p value of {np.round(p,3)} is < {alpha}, therefore data is NOT normal.")
    else:
        print(f"Normal test p value of {np.round(p,3)} is >= {alpha}, therefore data IS normal.")
    return p

def test_equal_variance(grp1,grp2, alpha=.05):
    """Run Levene's test for equal variance on two groups.
    Prints the interpretation and returns the p-value."""
    stat,p = stats.levene(grp1,grp2)
    if p<alpha:
        print(f"Levene's test p value of {np.round(p,3)} is < {alpha}, therefore groups do NOT have equal variance.")
    else:
        print(f"Levene's test p value of {np.round(p,3)} is >= {alpha}, therefore groups DO have equal variance.")
    return p

# APPENDIX: 

## Functions from Last Class to Make Bite-Sized

In [5]:
def plot_statplot(df_means,grps=None,
                  group_col='Group',data_col='BL'):
    
    if grps is None:
        grps = df_means.groupby(group_col).groups

    ## Examine KDEs for BL
    fig= plt.figure(figsize=(10,6))
    axes=['','']
    # Define gridspec to create grid coordinates             
    gs = fig.add_gridspec(nrows=1,ncols=9)
    axes[0] = fig.add_subplot(gs[0,0:7])
    axes[1] = fig.add_subplot(gs[0,7:])

    data1=df_means.loc[grps['ChR2'],data_col]
    data2=df_means.loc[grps['Control'],data_col]
    
    group1 = {'name':'ChR2',
             'data':data1,#df_means.loc[grps['ChR2'],data_col],
             'n':len(data1)}
    plot1 = {'hist_kws':dict(color='blue',lw=2, ls='-')}#,bins='auto')}

    group2 = {'name':'Control',
             'data':data2,#df_means.loc[grps['Control'],data_col],
             'n':len(data2)}
    plot2 = {'hist_kws':dict(color='orange',lw=2, ls='-')}#,bins='auto')}
    
    ax = axes[0]
    label1= f"{group1['name']} n={group1['n']}"
    sns.distplot(group1['data'], label=label1,
                 ax=ax, hist_kws=plot1['hist_kws'])
    # ax.legend()

    label2= f"{group2['name']} n={group2['n']}"
    sns.distplot(group2['data'], label=label2,
                 ax=ax,hist_kws=plot2['hist_kws'])
    ax.legend()

    

    ax.axvline(group1['data'].mean(),color=plot1['hist_kws']['color'], ls='--')
    ax.axvline(group2['data'].mean(),color=plot2['hist_kws']['color'], ls='--')


    ax = axes[1]

    ## Bar plot of group means with standard error of the mean as error bars
    ax.bar(group1['name'],group1['data'].mean(),
          yerr=stats.sem(group1['data']))

    ax.bar(group2['name'],group2['data'].mean(),
          yerr=stats.sem(group2['data']))
    
    plt.suptitle(f"Phase = {data_col}",fontsize=20)
    
    return fig, ax

In [7]:
def test_assumptions(df_means,grps=None,
                     group_col='Group',
                     grp1='ChR2',
                     grp2='Control',
                     data_col='BL',
                    plot_data=False):
    """MASSIVE FUNCTION PASTED IN DUE TO VERY LATE STUDY GROUP
    WE WILL CONSTRUCT A BETTER/SIMPLER VERSION OF THIS TOGETHER IN NEXT STUDY GROUP."""
    
    if grps is None:
        grps = df_means.groupby(group_col).groups
        
        
    group1 = {'name':grp1,
              'data':df_means.loc[grps[grp1],data_col]}
    
    group2 = {'name':grp2,
              'data':df_means.loc[grps[grp2],data_col]}
    
    results = [['Col','Test','Group(s)','Stat','p','p<.05']]
    
    ## Normality testing
    stat,p = stats.normaltest(group1['data'])
    results.append([data_col,'Normality',group1['name'],
                  stat, p, p<.05])
    
    stat,p = stats.normaltest(group2['data'])    
    results.append([data_col,'Normality',group2['name'],
                  stat, p, p<.05])
    ## Homo. of Variance Testing
    stat,p = stats.levene(group1['data'],group2['data'])
    results.append([data_col,'Equal Variance','Both',
                  stat, p, p<.05])
    
    
    ## Parametric T-Test
    stat,p = stats.ttest_ind(group1['data'],group2['data'])
    results.append([data_col,'T-Test 2samp','Both',stat,p,p<.05])
    
    ## Non-Parametric MWU
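    ## alternative='two-sided' matches the two-tailed t-test above
    ## (older scipy versions default to a one-sided p-value)
    stat,p = stats.mannwhitneyu(group1['data'],group2['data'],
                                alternative='two-sided')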
    results.append([data_col,'Mann Whitney U','Both',stat,p,p<.05])
    
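    ## Effect size with Cohen's d
    ## NOTE: `Cohen_d` is not defined in this notebook; it must be defined or imported before this function is called.
    d = Cohen_d(group1['data'],group2['data'])
    results.append([data_col, "Cohen's d", 'Both','','',d])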
    
#     if plot_data:
#         plot_dists(grp, col=data_col)
    
    return pd.DataFrame(results[1:],columns=results[0])
