# Week02 - Hypothesis testing, Power analysis, ANOVA, T-test # 
Feb 14 2017

www.biostathandbook.com

## Hypothesis testing ##
What is a hypothesis? - A testable question

Four basic elements of a statistical test
- Null Hypothesis $ H_0 $

In a calibration bath measuring temperature,

x - sensor measurement <br>
$ \mu $ : instrument, error prone<br>
example: $ H_0: \bar{x} = \mu $

__Alternate hypothesis $H_a$__

$ H_a: \bar{x} \neq \mu $

__Parametric__ : Based on a theoretical distribution; the foundation for statistical theory


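__T-Statistic__ : $ t = \frac{ \bar{x} - \mu } {S\sqrt{\frac{1}{N}}} $, where $S$ is the sample standard deviation and $N$ is the number of samples <br>
We can define a __region of rejection of the null hypothesis__ as the area of the t-distribution beyond a chosen critical value. <br>
The probability contained in that region is the significance level $\alpha$ (i.e. if $\alpha = 0.05$, then we have a 95% confidence level)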
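
As a minimal sketch of the formula and the rejection region, the snippet below computes $t$ by hand and compares it with the two-sided critical value from the t-distribution. The readings in `x` and the reference value `mu` are made-up numbers for illustration, not data from the lecture.

```python
import numpy as np
from scipy import stats

# Illustrative sensor readings (made-up values) and a hypothetical reference value
x = np.array([20.1, 19.8, 20.4, 20.0, 19.7, 20.3])
mu = 20.0
alpha = 0.05

n = x.size
s = x.std(ddof=1)  # sample standard deviation S
t_stat = (x.mean() - mu) / (s * np.sqrt(1 / n))

# Two-sided test: reject H0 when |t| exceeds the critical value
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
reject = abs(t_stat) > t_crit

print("t statistic:", round(t_stat, 3))
print("critical value:", round(t_crit, 3))
print("reject H0:", reject)
```

With these particular numbers $|t|$ stays well inside the critical value, so the null hypothesis is not rejected.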


<img src='images/t_test_ex.png' width ='400'>










__Confidence Level__
if Null 



__Statistical Power__ : Depends on the hypothesis test, the number of samples, and the size of the effect that is being looked for

### Student's T-test ### 
Motivated by the need to understand the distribution of means of small samples <br>
__Comparing pairs of populations__ : Quantify whether the means are actually different

Box Plots
- Line represents Median
- Box 25-75th percentile
- Whiskers - 10-90th percentile
- outliers shown outside of whiskers
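
The quantities a box plot draws can be computed directly, which makes the convention above concrete. This is only a sketch with made-up data; it follows the 10-90th percentile whisker convention listed here, which is one of several in use.

```python
import numpy as np

# Illustrative data (made-up values) with one obvious high outlier
data = np.array([2.1, 2.4, 2.5, 2.7, 2.8, 3.0, 3.1, 3.3, 3.6, 9.0])

median = np.median(data)                     # line inside the box
q25, q75 = np.percentile(data, [25, 75])     # box edges
w_lo, w_hi = np.percentile(data, [10, 90])   # whisker ends
outliers = data[(data < w_lo) | (data > w_hi)]

print("median:", median)
print("box:", q25, "to", q75)
print("whiskers:", w_lo, "to", w_hi)
print("outliers:", outliers)
```

Note that with percentile whiskers roughly the lowest and highest 10% of points always fall outside the whiskers. For plotting, matplotlib's `boxplot` accepts `whis=(10, 90)` to draw whiskers at these percentiles.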

In [8]:
# Test whether the sample mean differs from the population mean

from scipy import stats
a = [1, 2., 4]
popmean = 10
t, p = stats.ttest_1samp(a, popmean)  # list of samples, population mean (true mean)
print("t:", round(t, 3))
print("p:", round(p, 3))  # 0.01 < p < 0.05, so we can reject the null hypothesis at the 95% but not the 99% confidence level

a = [1, 2., 4]
popmean = 3
t, p = stats.ttest_1samp(a, popmean)  # list of samples, population mean (true mean)
print("t:", round(t, 3))
print("p:", round(p, 3))  # p > 0.05, so we cannot reject the null hypothesis at the 95% confidence level

t: -8.693
p: 0.013
t: -0.756
p: 0.529


## Two sample t-test
__t-test #1__ <br>
Comparing $\bar{x}$ and $\bar{y}$, assuming the same variance, i.e. $S_x^2 = S_y^2 $

$ t = \frac{\bar{x} - \bar{y}} {S_{xy} \sqrt{\frac{1}{N_x} + \frac{1}{N_y} }} $


Pooled standard deviation (the square root of the pooled variance):

$ S_{xy} = \sqrt{\frac{(N_x - 1)S_x^2 + (N_y - 1)S_y^2} {N_x + N_y -2}  } $ <br>
$H_0$ : $ \bar{x} = \bar{y} $ <br>
$H_a$ : $ \bar{x} \neq \bar{y} $
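
The pooled formula above can be checked against SciPy. The sketch below computes $S_{xy}$ and $t$ by hand and compares them with `stats.ttest_ind`, which assumes equal variances by default. The samples `x` and `y` are made-up values for illustration.

```python
import numpy as np
from scipy import stats

# Illustrative samples (made-up values)
x = np.array([5.1, 4.9, 5.6, 5.8, 5.2, 5.5])
y = np.array([4.4, 4.8, 4.6, 5.0, 4.3])

nx, ny = x.size, y.size
sx2, sy2 = x.var(ddof=1), y.var(ddof=1)  # sample variances

# Pooled standard deviation S_xy
s_xy = np.sqrt(((nx - 1) * sx2 + (ny - 1) * sy2) / (nx + ny - 2))
t_manual = (x.mean() - y.mean()) / (s_xy * np.sqrt(1 / nx + 1 / ny))

# Two-sided p-value from the t distribution with nx + ny - 2 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=nx + ny - 2)

t_scipy, p_scipy = stats.ttest_ind(x, y)  # equal_var=True by default
print("manual:", round(t_manual, 3), round(p_manual, 4))
print("scipy: ", round(t_scipy, 3), round(p_scipy, 4))
```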

__Welch's t-test__

Compares $\bar{x}$ and $\bar{y}$ when the __variances of x and y are different__
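
For reference, the standard textbook form of Welch's statistic (not written out in these notes) replaces the pooled term with the per-sample variances, and the degrees of freedom $\nu$ come from the Welch-Satterthwaite approximation:

$$ t = \frac{\bar{x} - \bar{y}} {\sqrt{\frac{S_x^2}{N_x} + \frac{S_y^2}{N_y}}}, \qquad \nu \approx \frac{\left(\frac{S_x^2}{N_x} + \frac{S_y^2}{N_y}\right)^2} {\frac{(S_x^2/N_x)^2}{N_x - 1} + \frac{(S_y^2/N_y)^2}{N_y - 1}} $$

When $S_x^2 = S_y^2$ and $N_x = N_y$, this $t$ coincides with the pooled version and $\nu = N_x + N_y - 2$.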






# Two-sample t-test (assumes equal variances); x and y are sequences of samples
stats.ttest_ind(x, y)

# Welch's t-test (does not assume equal variances)
stats.ttest_ind(x, y, equal_var=False)

In [23]:
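# Emery and Thompson 3.14 Example
# The first value is the t statistic (not a p-value); the second is the critical t for alpha = 0.05
print('t statistic:', ((23-20) /  ((2.55)*(1/31 + 1/31)**.5)))
print('critical t (alpha = 0.05, two-sided, 60 dof):',stats.t.ppf(.975,60))

# The t statistic exceeds the critical value, so:
print('Reject the Null Hypothesis that the two means are equal!')

t statistic: 4.631769337654006
critical t (alpha = 0.05, two-sided, 60 dof): 2.00029782106
Reject the Null Hypothesis that the two means are equal!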


If the null hypothesis is __True__, there is a __5%__ chance of incorrectly rejecting it at $\alpha = 0.05$ (a Type I error)
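
A quick simulation can illustrate what the 5% figure means: when every sample is drawn from a population whose mean really equals the hypothesized value, about 5% of t-tests still come out "significant" at $\alpha = 0.05$. This is a sketch only; the population parameters, sample size, and seed are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
alpha = 0.05
n_trials, n_samples = 2000, 20
true_mean = 10.0

# Draw every trial from a population whose mean equals the hypothesized value
samples = rng.normal(loc=true_mean, scale=2.0, size=(n_trials, n_samples))
_, p_values = stats.ttest_1samp(samples, true_mean, axis=1)

# Fraction of trials in which a true null hypothesis is rejected
false_positive_rate = np.mean(p_values < alpha)
print("fraction of trials rejecting a true H0:", false_positive_rate)
```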

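__Power (1 - $ \beta $)__ <br>
Power is the probability of correctly rejecting a false null hypothesis; $\beta$ is the probability of failing to reject it (a Type II error). <br>
Generally accepted power level: $1 -  \beta = 0.8$

Power analysis links four quantities:
* Effect Size - $d = \frac{|\mu_1 - \mu_2 |} {\sigma} $
    - d=0.2 "small"
    - d=0.8 "large"
* Sample Size
* Confidence Level
* Target Power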

Instrument Calibration example:
* Want to be able to measure a minimum effect of 2 $\mu$Mol
* instrument noise = 5 $\mu$Mol
* Significance level: $\alpha$ = 0.05
* Power: 1- $\beta$ = 0.8

__How many samples do we need to detect this effect?__ <br>
49 samples

In this case, the effect size is the difference of 2 $\mu$Mol divided by the standard deviation of 5 $\mu$Mol, i.e. $d = 2/5 = 0.4$

If the actual difference is 2 $\mu$Mol, then we will get a significant difference about 80% of the time with N $\approx$ 50
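
One way to arrive at a number of this size is the normal approximation $N \approx \left( \frac{z_{1-\alpha/2} + z_{1-\beta}}{d} \right)^2$. The sketch below assumes a two-sided, one-sample comparison; the notes do not say which calculation was actually used in the lecture, so treat this as an illustration rather than the original method.

```python
from scipy import stats

effect = 2.0   # minimum difference to detect (uMol)
noise = 5.0    # instrument standard deviation (uMol)
alpha = 0.05   # significance level
power = 0.8    # target power, 1 - beta

d = effect / noise                       # effect size
z_alpha = stats.norm.ppf(1 - alpha / 2)  # two-sided critical value
z_beta = stats.norm.ppf(power)

# Normal-approximation sample size (assumed one-sample, two-sided test)
n_required = ((z_alpha + z_beta) / d) ** 2

print("effect size d:", d)
print("samples needed (approx.):", round(n_required, 1))
```

Under these assumptions the result is about 49, in line with the figure quoted above. An exact calculation based on the t-distribution would ask for slightly more samples.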

## ANOVA ##
__Analysis of variance__: tests for a statistically significant difference between the means of 3+ different groups

* Does not tell you which group is different
* Requires the use of __*post hoc*__ analysis to determine which means are different from each other

Example:
<img src='images/anova_example.png' width = '600'> http://www.biostathandbook.com/onewayanova.html

__One-Way ANOVA__ <br>
Fisher's LSD *post-hoc* test is used to determine which populations are different from each other.

__Two-Way ANOVA__ <br>
Data are grouped by genotype, and within each genotype they are separated by sex. Thus two factors vary across the samples.



One-Way example

* J Populations (or "treatments")
* N samples per population
* $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 $
* $H_a$: At least one mean is different from the others

Three different types of CTDs in a water bath, each with four different measurements
(ANOVA does not require the same number of samples within each population)

Use the F-statistic: The ratio of the variances of two groups of samples taken from a normal distribution follows an *F* distribution

## $$ F = \frac{s_1^2}{s_2^2} $$ ##
Can be used to test whether sample variances are significantly different.

Sum of Squares Between: __SSB__ $$\sum_{j=1}^J{N_j(\bar{y}_j - \bar{y})^2}$$where $\bar{y}_j$ is the mean of each population and $\bar{y}$ is the mean of all samples

Mean Square Between: __MSB__ $$\frac{SSB}{J-1}$$

Sum of Squares Within: __SSW__ $$ \sum_{j=1}^J{\sum_{i=1}^{N_j}}({y_{ij}} - \bar{y}_j)^2  $$where $y_{ij}$ is sample $i$ of population $j$

Mean Square Within: __MSW__ $$ \frac{SSW} {\sum_{j=1}^J { N_j} - J } $$


F-statistic: __F__ $$ =\frac{MSB}{MSW} $$


Reject the Null Hypothesis if F is large -> One-tailed test, i.e. not interested in small F <br>
Region of rejection is above some critical level
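
The sums of squares above can be worked through numerically. The sketch below uses made-up readings for three CTDs with four measurements each (not the lecture's data), builds $F$ step by step, and checks the result against `stats.f_oneway`.

```python
import numpy as np
from scipy import stats

# Made-up bath temperatures for three CTDs, four readings each
groups = [
    np.array([10.02, 10.05, 9.98, 10.01]),
    np.array([10.10, 10.12, 10.08, 10.15]),
    np.array([10.00, 9.97, 10.03, 10.02]),
]

J = len(groups)                          # number of populations
N_total = sum(g.size for g in groups)    # total number of samples
grand_mean = np.concatenate(groups).mean()

ssb = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)
msb = ssb / (J - 1)
msw = ssw / (N_total - J)
F = msb / msw

# One-tailed test: reject H0 when F exceeds the upper critical value
alpha = 0.05
F_crit = stats.f.ppf(1 - alpha, J - 1, N_total - J)
p_value = stats.f.sf(F, J - 1, N_total - J)

F_scipy, p_scipy = stats.f_oneway(*groups)
print("F:", round(F, 2), "critical value:", round(F_crit, 2))
print("p:", p_value, "scipy:", F_scipy, p_scipy)
```

For these numbers $F$ is well above the critical value, so the null hypothesis of equal means would be rejected.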


### Popular *post-hoc* tests ###

* Fisher's LSD (least significant difference)

* Tukey HSD (honestly significant difference) - More commonly used and more conservative
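
As a rough sketch of how Fisher's LSD works, two group means are declared different when their absolute difference exceeds $LSD = t_{1-\alpha/2,\,N-J} \sqrt{MSW \left( \frac{1}{N_i} + \frac{1}{N_j} \right)}$. This is the standard textbook form, conventionally applied only after a significant ANOVA F test. The group labels and readings below are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Made-up readings for three groups (hypothetical labels A, B, C)
groups = {
    "A": np.array([10.02, 10.05, 9.98, 10.01]),
    "B": np.array([10.10, 10.12, 10.08, 10.15]),
    "C": np.array([10.00, 9.97, 10.03, 10.02]),
}

J = len(groups)
N_total = sum(g.size for g in groups.values())
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups.values())
msw = ssw / (N_total - J)  # mean square within
t_crit = stats.t.ppf(1 - 0.05 / 2, N_total - J)

# Compare every pair of group means against its least significant difference
different = {}
for (a, ga), (b, gb) in combinations(groups.items(), 2):
    lsd = t_crit * np.sqrt(msw * (1 / ga.size + 1 / gb.size))
    diff = abs(ga.mean() - gb.mean())
    different[(a, b)] = bool(diff > lsd)
    print(a, b, "difference:", round(diff, 4), "LSD:", round(lsd, 4))
```

Because LSD applies no correction for the number of pairwise comparisons, it is less conservative than Tukey HSD.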
