# Topic 15: ANOVAs & Statistical Power with Neuro Data

- 04/01/21
- online-ds-ft-022221

## Learning Objectives

- Revisit hypothesis testing using my neuroscience research data.
- Learn about ANOVAs.
- Discuss the multiple comparison problem and Tukey's test.

### NOTES

- This notebook walks through preparing my binge drinking data for hypothesis testing.
- Specifically, in this notebook I will attempt to use the most appropriate statistical tests, which are not taught in the Learn curriculum:
    - **Two-way RM ANOVA**
    - **Repeated Measures ANOVA in Python**


# REFERENCES


- Hypothesis Testing Workflow:
    - https://github.com/jirvingphd/hypothesis_testing_workflow_python

- **Two-Way and RM ANOVA Resources**
    - [RM ANOVA IN Python with Statsmodels](https://www.marsja.se/repeated-measures-anova-in-python-using-statsmodels/)
    - One-way RM ANOVA (other packages): https://www.marsja.se/repeated-measures-anova-using-python/
    - Two-Way: https://marsja.se/two-way-anova-repeated-measures-using-python/


## HYPOTHESIS TESTING STEPS

- Separate data in group vars.
- Visualize data and calculate group n (size)

    
* Select the appropriate test based on type of comparison being made, the number of groups, the type of data.


- For t-tests: test for the assumptions of normality and homogeneity of variance.

    1. Check if sample sizes allow us to ignore assumptions, and if not:
    2. **Test Assumption Normality**

    3. **Test for Homogeneity of Variance**

    4. **Choose appropriate test based upon the above** 
    
    
* **Perform chosen statistical test, calculate effect size, and any post-hoc tests.**
    - Post-hoc pairwise comparison testing (e.g., Tukey's test)
    - Effect size calculation
        - Cohen's d
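
As a rough illustration of the effect size step, here is a minimal sketch of Cohen's d for two independent groups using the pooled standard deviation. The helper name `cohens_d` and the numbers are made up for this example; they are not the mouse data.

```python
import numpy as np

def cohens_d(group1, group2):
    """Cohen's d for two independent samples (pooled standard deviation)."""
    group1 = np.asarray(group1, dtype=float)
    group2 = np.asarray(group2, dtype=float)
    n1, n2 = len(group1), len(group2)
    # Sample variances (ddof=1) of each group
    var1, var2 = group1.var(ddof=1), group2.var(ddof=1)
    # Pool the variances, weighting each by its degrees of freedom
    pooled_sd = np.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (group1.mean() - group2.mean()) / pooled_sd

# Made-up values purely for illustration
control = [10, 12, 11, 13, 9]
experimental = [14, 15, 13, 16, 12]
d = cohens_d(experimental, control)
print(f"Cohen's d = {d:.2f}")
```

A common rule of thumb treats d near 0.2 as small, 0.5 as medium, and 0.8 as large, but those cutoffs are conventions rather than hard rules.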

## Statistical Tests Summary Table



| Parametric tests (means) | Function | Nonparametric tests (medians) | Function |
| --- | --- | --- | --- |
| 1-sample t test | `scipy.stats.ttest_1samp()` | 1-sample Wilcoxon | `scipy.stats.wilcoxon()` |
| 2-sample t test | `scipy.stats.ttest_ind()` | Mann-Whitney U test | `scipy.stats.mannwhitneyu()` |
| One-Way ANOVA | `scipy.stats.f_oneway()` | Kruskal-Wallis | `scipy.stats.kruskal()` |
| Factorial DOE with one factor and one blocking variable | | Friedman test | |


# Real-World Science / Experimental Design

> ## The Role of Stress Neurons in the Amygdala in Addiction/Binge Drinking

- We will be talking through some of the experiments from my postdoctoral research on the role of stress neurons in the escalation of binge drinking.
- [James' Neuroscience Research Poster: Society for Neuroscience 2016](https://drive.google.com/open?id=14z2dUdPB_8ei3HA7R1j3ylwEP0kVZhJq)

<img src="https://raw.githubusercontent.com/jirvingphd/fsds_100719_cohort_notes/master/images/sect_20_neuro_data.png">



#### The Opponent-Process Theory of Addiction 


<img src="https://raw.githubusercontent.com/jirvingphd/fsds_pt_100719_cohort_notes/master/Images/robinson-berridge-fig1.jpg">

## Hypothesis 

- Based on prior evidence in the field, stress neurons in the amygdala are believed to be responsible for the negative emotions that promote binge consumption to relieve negative symptoms.

$H_1$: Increasing the activity of stress neurons (CRF neurons) in the amygdala will increase the amount of alcohol consumed by binge-drinking mice.

$H_0$: Stimulation of CRF neurons has no effect on the amount of alcohol consumed.

<img src="https://raw.githubusercontent.com/jirvingphd/fsds_pt_100719_cohort_notes/master/Images/jmi_poster_preds1.png" width=60%>

## Experimental Design

<img src="https://raw.githubusercontent.com/jirvingphd/fsds_pt_100719_cohort_notes/master/Images/opto_6steps.jpg">

<img src="https://raw.githubusercontent.com/jirvingphd/hypothesis_testing_lessons/master/images/jmi_poster_fig1_no_mouse.png">

<!---
<img src="https://raw.githubusercontent.com/jirvingphd/hypothesis_testing_lessons/master/images/jmi_poster_fig1.png">--->

<img src="https://raw.githubusercontent.com/jirvingphd/fsds_pt_100719_cohort_notes/master/Images/jmi_poster_fig2.png">

<!---
<img src="https://raw.githubusercontent.com/jirvingphd/fsds_100719_cohort_notes/master/images/sect_20_neuro_data.png">')
--->

## Hypothesis Testing: Mouse Data

### Hypothesis
> Question: does stimulation of CRF Neurons in the central amygdala increase alcohol consumption?

- Metric:
- Groups:


- $H_1$: 

- $H_0$: 

$\alpha$=0.05


### Step 1: which type of test?

- What type of data?
    -  Numerical (# of licks)
- How many groups?
    -  Control vs Experimental
    - Training Phases (BL,S,PS,R)

#### Let's First Try to Treat this as 2-sample T-Tests (one for each phase)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

In [None]:
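## matplotlib >= 3.6 renamed 'seaborn-notebook' to 'seaborn-v0_8-notebook'
styles = ('seaborn-v0_8-notebook','seaborn-notebook')
plt.style.use(next(s for s in styles if s in plt.style.available))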
pd.set_option('display.max_columns',0)
pd.set_option('display.precision',3)

# Obtaining/Preprocessing Data

In [None]:
## Load in the mouse drinking data cleaned csv
df = pd.read_csv('mouse_drinking_data_cleaned.csv',
                 index_col=0)
df.drop('Sex',inplace=True,axis=1)
df 

#### Laying Out Our Approach
We need to average all 4 sessions of the same phase (BL,S,R1,R2) for each mouse...

1. Make a **dict/lists of the column names** that should be **averaged together** (`col_dict`)

2. Make a new df of means using `col_dict`

3. Make a grp dict using  `df_means.groupby('Group').groups` 

- Visualize the two populations

- Prepare for hypothesis tests
    - Use the `grps` dict to reference the correct rows to pass into the tests
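
The approach above can be sketched roughly as follows. This is only an outline on a toy DataFrame: the session column names (`BL1` ... `S4`), the group labels, and the values are all assumptions standing in for the real csv, and the `startswith` match would need adjusting if one phase name were a prefix of another.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real csv: column names and values are hypothetical
rng = np.random.default_rng(0)
phases = ['BL', 'S']
session_cols = [f"{p}{i}" for p in phases for i in range(1, 5)]
toy = pd.DataFrame(rng.integers(100, 500, size=(6, len(session_cols))),
                   columns=session_cols)
toy.insert(0, 'Group', ['Control'] * 3 + ['Experimental'] * 3)

# 1. Map each phase to the session columns that belong to it
col_dict = {p: [c for c in toy.columns if c.startswith(p)] for p in phases}

# 2. Average each mouse's sessions within a phase
df_means = toy[['Group']].copy()
for phase, cols in col_dict.items():
    df_means[phase] = toy[cols].mean(axis=1)

# 3. Row indices belonging to each group
grps = df_means.groupby('Group').groups

# Slice the per-mouse means by group for plotting and testing
data = {grp: df_means.loc[idx] for grp, idx in grps.items()}
print(data['Control'])
```

Keeping one small DataFrame per group in `data` is what lets the later plotting and testing functions loop over `data.items()`.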

<!---
**Variables:**

- `col_dict` (dict): dict of column names to be grouped together for means
- `df_means` (df): df of col_dict column means.
- `grps` (dict): groupby dict where keys = 'Group' column and values = row indices

- `data` (dict): Dictionary of...
    - Series of each phase by group? --->

In [None]:
## Loop through the different phases of the experiment
phases = ['BL','S','PS','R1','R2']

## save corresponding column names as values 
col_dict = {}


In [None]:
## Get the opposite of col_dict (column name -> phase)


### Calculating individual mouse means by phase

In [None]:
## calculate the mean for all BL columns for each mouse


In [None]:
## Make a new df_means with just the mouse id and group first


In [None]:
## Loop through col_dict and calculate the means for each phase for each mouse


### Getting Group Data For EDA & Testing

In [None]:
## Use groupby.groups
grps = None
grps

In [None]:
## Make an empty data dict


## For each group and its row numbers

    
    ## Save the group df as grp name 
    
    
    # Display data


### Plotting Group Means + Standard Error of the Mean
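
One possible shape for the plotting helper used later in this notebook is sketched below. It assumes `data` is a dict of `{group name: DataFrame}` with one column per phase; the `ylabel` default and the toy values are illustrative assumptions only.

```python
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import sem

def plot_bars_yerr(data, phase, ylabel='Licks'):
    """Bar plot of each group's mean for one phase, with SEM error bars."""
    x, y, yerr = [], [], []
    for grp, grp_df in data.items():
        ## Grab the correct phase column from the group data
        grp_data = grp_df[phase]
        ## Save the group label, its mean, and its standard error
        x.append(grp)
        y.append(grp_data.mean())
        yerr.append(sem(grp_data))
    fig, ax = plt.subplots()
    ax.bar(x, y, yerr=yerr, capsize=5)
    ax.set(title=f'Phase: {phase}', ylabel=ylabel)
    return fig, ax

## Toy data purely for illustration (not the mouse data)
data = {'Control': pd.DataFrame({'BL': [200, 220, 210, 190]}),
        'Experimental': pd.DataFrame({'BL': [205, 215, 225, 195]})}
fig, ax = plot_bars_yerr(data, 'BL')
```

Returning `fig, ax` keeps the helper compatible with the `f,a = plot_bars_yerr(...)` calls further down.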

In [None]:
from scipy.stats import sem

## Select a phase to visualize

## Create lists for saving x,y, and yerr


# For each group
    
    ## grab the correct phase col from group data

    
    ## Save x,y 

    ## Calc and save error
    

In [None]:
## plot with matplotlib


In [None]:
## Functionize
from scipy.stats import sem

def plot_bars_yerr():
    pass

In [None]:
## test function


### Run 2-sample T-Test on Baseline Days

In [None]:
test_phase = "BL"
f,a = plot_bars_yerr(data,test_phase)

#### Test Assumptions

In [None]:
from scipy import stats


## Make list of list of headers

## Make an empty list for our group data

## Loop through the data dictionary 
   
    ## Grab the correct phase column from the group df
    
    ## Append group data to list of group data
    
    ## Test for normality and save result 
    
    ## save results 



### Adding Levene's Test

#### Run Correct Test

- Since we failed assumption of normality, we will perform the Mann Whitney U test instead of the 2-sample t-test
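
To make the swap concrete, here is a small sketch that runs both tests on the same two samples. The samples are randomly generated, skewed toy data (not the mouse data), so the printed p-values only illustrate the calls.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Made-up, right-skewed samples standing in for two groups
control = rng.exponential(scale=100, size=12)
experimental = rng.exponential(scale=100, size=12) + 80

# Parametric test (assumes normality)
t_stat, t_p = stats.ttest_ind(control, experimental)

# Nonparametric alternative (rank-based, no normality assumption)
u_stat, u_p = stats.mannwhitneyu(control, experimental,
                                 alternative='two-sided')

print(f"t-test:         stat={t_stat:.3f}, p={t_p:.4f}")
print(f"Mann-Whitney U: stat={u_stat:.3f}, p={u_p:.4f}")
```

If normality holds but equal variance fails, `stats.ttest_ind(..., equal_var=False)` (Welch's t-test) is the usual alternative.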

In [None]:
## visualize one more time and run the correct test


### >> Functionized

In [None]:
## Functionize code for testing other phases
def test_assumptions(data,test_phase):#,plot=True):

    ## Make list of list of headers
    results = [['Phase','Group','n','Test Name','Test Stat','p','sig?']]

    ## Make an empty list for our group data
    test_equal_var = []

    ## Loop through the data dictionary 
    for grp,grp_df in data.items():

        ## Grab the correct phase column from the group df
        grp_data = grp_df[test_phase].copy()
        ## Append group data to list of group data
        test_equal_var.append(grp_data)

        ## Test for normality and save result 
        stat,p = stats.normaltest(grp_data)
        results.append([test_phase, grp,len(grp_data),'normality',stat,p,p<.05])


    ## Test for equal variance
    stat, p = stats.levene(*test_equal_var)
    results.append([test_phase,'-','-','Equal Variance',stat,p,p<.05])

    results_df = pd.DataFrame(results[1:],columns=results[0])
    return results_df

In [None]:
## Using our two functions, plot and test the assumptions for S phase
current_phase = 'S'
fig,ax = plot_bars_yerr(data,current_phase)
res_df = test_assumptions(data,current_phase)
res_df

#### Make a final function to use both of the above

In [None]:
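from IPython.display import display

def test_and_plot_phase(data,phase):
    ## Run the assumption tests, then plot the group means for the phase
    res_df = test_assumptions(data,phase)
    f,a = plot_bars_yerr(data,phase)
    display(res_df)
    plt.show()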

### Using our functions, evaluate each phase's assumption tests and select the correct hypothesis test

In [None]:
## BL 

In [None]:
# S


In [None]:
# R1


In [None]:
#R2


## ANOVA

- Let's use an ANOVA to analyze the differences between phases.

### Run One-Way ANOVAs with Scipy
- One for Control Mice 
- One for Experimental Mice
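
A minimal sketch of one such ANOVA is below, using randomly generated per-mouse phase means for a single hypothetical group. Note that `f_oneway` treats the phases as independent samples, so it ignores the fact that the same mice are measured in every phase.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
phases = ['BL', 'S', 'PS', 'R1', 'R2']

# Toy per-mouse phase means for one group (values are made up)
control_means = pd.DataFrame({p: rng.normal(200, 30, size=8) for p in phases})

# Pass one array per phase; f_oneway compares the phase means
f_stat, p = stats.f_oneway(*[control_means[col] for col in phases])
print(f"F={f_stat:.3f}, p={p:.4f}")
```

Running the same call on the other group's DataFrame gives the second p-value to compare.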

In [None]:
## Run f_oneway for the Control mice


In [None]:
## Run f_oneway for the Experimental mice (scipy has no f_twoway)


- compare the two p-values

## Two-Way ANOVA with Statsmodels

<!-- ### RM ANOVA Melting DF -->

### Melting a dataframe
https://pandas.pydata.org/docs/reference/api/pandas.melt.html
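
As a sketch of what melting does here: a wide table with one column per session becomes a long table with one row per mouse per session. The column names (`Mouse`, `BL1`, `S1`, ...), the `Licks` value name, and the numbers are assumptions for illustration.

```python
import pandas as pd

# Toy wide-format data: one row per mouse, one column per session
wide = pd.DataFrame({
    'Mouse': ['m1', 'm2', 'm3', 'm4'],
    'Group': ['Control', 'Control', 'Experimental', 'Experimental'],
    'BL1': [200, 210, 190, 205],
    'BL2': [195, 220, 185, 215],
    'S1': [205, 215, 260, 270],
    'S2': [210, 205, 280, 265],
})

## melt: id columns are kept, session columns are stacked into rows
df2 = pd.melt(wide, id_vars=['Mouse', 'Group'],
              var_name='Session', value_name='Licks')

## Mapping Phase from a column -> phase dict (the reverse of col_dict)
col_dict = {'BL': ['BL1', 'BL2'], 'S': ['S1', 'S2']}
phase_dict = {col: phase for phase, cols in col_dict.items() for col in cols}
df2['Phase'] = df2['Session'].map(phase_dict)

## Getting Day of Phase from the trailing digits of the session name
df2['Day'] = df2['Session'].str.extract(r'(\d+)$', expand=False).astype(int)
print(df2.head())
```

The long format is what `sns.barplot` and the statsmodels formula API expect: one column for the outcome and one column per factor.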

In [None]:
import statsmodels.api as sm
from statsmodels.formula.api import ols

In [None]:
to_melt = None
to_melt

In [None]:
## melt to create df2


In [None]:
## Mapping Phase from Phase Dict

## Getting Day of Phase


In [None]:
## Now that we melted the data, use sns.barplot!


### Create an OLS Model to Run an ANOVA

In [None]:
## define formula for model and fit


In [None]:
## create the two way ANOVA table 


#### Tukey's Multiple Comparison Test
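
With k groups there are k(k-1)/2 pairwise comparisons, and each extra test raises the chance of a false positive; Tukey's test adjusts for that. The sketch below runs it on a combined Group-Phase label, using toy column names and randomly generated values.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Toy long-format data: 2 groups x 2 phases x 6 observations (made up)
rng = np.random.default_rng(3)
rows = []
for group in ['Control', 'Experimental']:
    for phase in ['BL', 'S']:
        for licks in rng.normal(200, 30, size=6):
            rows.append({'Group': group, 'Phase': phase, 'Licks': licks})
df2 = pd.DataFrame(rows)

## Create a Group-Phase column so every combination is its own group
df2['Group-Phase'] = df2['Group'] + '-' + df2['Phase']

## Run Tukey's test and display summary
tukey = pairwise_tukeyhsd(endog=df2['Licks'],
                          groups=df2['Group-Phase'],
                          alpha=0.05)
print(tukey.summary())
```

With 4 Group-Phase combinations the summary has 6 rows, one per pair, and the `reject` column flags pairs that differ at the adjusted alpha.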

In [None]:
## Follow up with Tukey's pairwise comparisons
from statsmodels.stats.multicomp import pairwise_tukeyhsd
pairwise_tukeyhsd

In [None]:
## create a Group-Phase column for tukey


In [None]:
## Run tukey's test and display summary



## The CORRECT Test: Repeated Measures ANOVA
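
A small sketch of `AnovaRM` with a single within-subject factor is below. The `Mouse`, `Phase`, and `Licks` column names and the values are toy assumptions. As far as I know, `AnovaRM` needs balanced data (one observation per mouse per phase, unless `aggregate_func` is given) and does not currently support between-subject factors, so this example covers one group at a time rather than the full Group x Phase design.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Toy long-format data: 8 mice, each measured once in every phase
rng = np.random.default_rng(4)
rows = []
for mouse in range(8):
    for phase in ['BL', 'S', 'PS']:
        rows.append({'Mouse': f'm{mouse}', 'Phase': phase,
                     'Licks': rng.normal(200, 30)})
long_df = pd.DataFrame(rows)

## subject identifies the repeated unit; within lists the repeated factor(s)
res = AnovaRM(data=long_df, depvar='Licks',
              subject='Mouse', within=['Phase']).fit()
print(res.anova_table)
```

Because each mouse serves as its own control, the error term only reflects within-mouse variability, which is what makes this test better suited to the design than the models above.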

In [None]:
from statsmodels.stats.anova import AnovaRM


## CONCLUSION
- Choosing the test according to the assumptions of normality and equal variance helps ensure that the test result is valid.

- Notice how the last phase (R) did NOT come back as significant when we ran the t-test, but DID come back significant when we performed the Mann-Whitney U instead.

- Documentation for Tukey's test: https://www.statsmodels.org/stable/generated/statsmodels.stats.multicomp.pairwise_tukeyhsd.html

## Effect Size Visual
- https://rpsychologist.com/d3/NHST/
