## Power analysis

Statistical power is the probability of correctly rejecting the null hypothesis when it is false. A conventional target power is $(1 - \beta) = 0.8$. This means that **if** the null hypothesis is false, there is a probability $\beta = 0.2$ (a one-in-five chance) that the null hypothesis will incorrectly not be rejected (a Type II error). When comparing two sets of samples, this would mean incorrectly concluding that there is no difference between them.
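
The definition of power can be checked by simulation: generate many datasets for which the null hypothesis is false, run the test on each one, and count how often $H_0$ is rejected. The sketch below assumes normally distributed data and a two-sided one-sample t-test (`scipy.stats.ttest_1samp`). The helper name `simulated_power`, the effect size of 0.4, and the sample size of 52 are illustrative choices, not part of the original analysis.

```python
import numpy as np
from scipy import stats


def simulated_power(effect_size, n, alpha=0.05, n_sims=2000, seed=0):
    """Estimate the power of a two-sided one-sample t-test by simulation.

    Hypothetical helper (illustrative only). Each simulated dataset is drawn
    from a normal distribution with standard deviation 1 and true mean
    `effect_size`, so the null hypothesis (mean = 0) is false by construction.
    """
    rng = np.random.default_rng(seed)
    # One row per simulated experiment, n samples per experiment
    samples = rng.normal(loc=effect_size, scale=1.0, size=(n_sims, n))
    pvalues = stats.ttest_1samp(samples, popmean=0.0, axis=1).pvalue
    # Power = fraction of experiments in which H0 is (correctly) rejected
    return float(np.mean(pvalues < alpha))


power_est = simulated_power(effect_size=0.4, n=52)
beta_est = 1 - power_est  # estimated Type II error rate
print(f"estimated power = {power_est:.2f}, estimated beta = {beta_est:.2f}")
```

The estimate fluctuates slightly with the random seed and the number of simulations; increasing `n_sims` narrows that spread.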

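Statistical power and confidence level are not independent. The confidence level $(1-\alpha)$ used for rejecting a null hypothesis is one of the primary factors that determine statistical power: for a fixed effect size and sample size, a higher confidence level (smaller $\alpha$) gives lower power.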
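
One way to see this dependence is to hold the effect size and sample size fixed and vary $\alpha$. This sketch uses the `TTestPower` class from `statsmodels` (a one-sample t-test, two-sided by default); the values $d = 0.4$ and $N = 30$ are arbitrary choices for illustration.

```python
from statsmodels.stats.power import TTestPower

analysis = TTestPower()  # one-sample (or paired) t-test, two-sided by default

alphas = [0.01, 0.05, 0.10]
powers = []
for alpha in alphas:
    # Power for a fixed effect size (d = 0.4) and sample size (N = 30)
    p = float(analysis.power(effect_size=0.4, nobs=30, alpha=alpha))
    powers.append(p)
    print(f"alpha = {alpha:.2f} (confidence level {1 - alpha:.2f}): power = {p:.3f}")
```

A stricter test (smaller $\alpha$) therefore needs more samples to keep the same chance of detecting a real effect.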

Power analysis can be useful for studies that inform management decisions, in which avoiding a Type II error might be the "conservative" choice. For example, in a study of the effect of pollution on the abundance of an organism at two sites (pristine and disturbed), a Type II error would mean concluding that pollution has no effect when it actually does, and therefore failing to protect habitat. The acceptable risk of a Type II error in this case might differ between environmental and industry stakeholders.

Power analysis is also useful _before_ conducting an experiment, because it can help you determine how many samples you need to detect an effect of a given size with a statistical test. The effect you want to detect might be set by the resolution of your instrument, or by what you think is important in an ecological sense. Remember that a statistically significant difference between two sets of samples is not necessarily an important one.
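
As a sketch of this kind of planning calculation, the loop below uses `tt_solve_power` from `statsmodels.stats.power` to find the required sample size for several effect sizes. It assumes a two-sided one-sample t-test (the function's default), $\alpha = 0.05$ and a target power of 0.8; the effect sizes are chosen only for illustration.

```python
import math

from statsmodels.stats import power

required_n = {}
for d in (0.2, 0.4, 0.8):
    # Two-sided one-sample t-test (the default), alpha = 0.05, power = 0.8
    nobs = power.tt_solve_power(effect_size=d, alpha=0.05, power=0.8)
    # Samples come in whole numbers, so round up to reach the target power
    required_n[d] = math.ceil(float(nobs))
    print(f"d = {d}: N = {required_n[d]}")
```

Because the required sample size scales roughly with $1/d^2$, halving the effect size you want to detect roughly quadruples the number of samples needed.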

There are four ingredients in a power analysis. If three are known, then the fourth can be calculated.

1. Effect Size: $d = \frac{|\mu_1 - \mu_2|}{\sigma}$
    * d=0.2 "small"
    * d=0.5 "medium"
    * d=0.8 "large"
2. Sample size: $N$
3. Significance Level: $\alpha$, the probability of a Type I error (rejecting a true null hypothesis); the corresponding confidence level is $(1-\alpha)$. A typical value is $\alpha = 0.05$.
4. Target Power: $(1-\beta)$, typically 0.8
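
In `statsmodels`, this "three known, solve for the fourth" pattern is exposed through `solve_power`: exactly one of the arguments is left as `None`, and that one is solved for. The sketch below assumes a one-sample t-test (`TTestPower`, two-sided by default); the numbers are illustrative.

```python
from statsmodels.stats.power import TTestPower

analysis = TTestPower()  # one-sample (or paired) t-test, two-sided by default

# Exactly one argument is left as None; solve_power solves for that one.
n_needed = float(analysis.solve_power(effect_size=0.4, nobs=None, alpha=0.05, power=0.8))
d_detectable = float(analysis.solve_power(effect_size=None, nobs=30, alpha=0.05, power=0.8))
achieved_power = float(analysis.solve_power(effect_size=0.4, nobs=30, alpha=0.05, power=None))

print(f"N needed for d = 0.4:            {n_needed:.1f}")
print(f"smallest d detectable with N=30: {d_detectable:.3f}")
print(f"power with d = 0.4 and N = 30:   {achieved_power:.3f}")
```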

#### Effect size (d) - minimum deviation from the null hypothesis that you expect to be able to detect

This is the effect size for a difference between two means (t-test).

* d is non-dimensional
* often called "Cohen's d"
* d = difference of the means / standard deviation
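
When the means and standard deviation must be estimated from data, a common choice (assumed here; the text above does not specify one) is to use the two sample means and the pooled standard deviation as the estimate of $\sigma$. A minimal sketch, with a hypothetical helper `cohens_d` and made-up numbers:

```python
import numpy as np


def cohens_d(x1, x2):
    """Cohen's d for two independent samples (hypothetical helper).

    Uses the pooled standard deviation as the estimate of sigma; other
    choices of sigma (e.g. a control-group SD) are also used in practice.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = len(x1), len(x2)
    # Pooled variance from the unbiased (ddof=1) sample variances
    pooled_var = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
    return abs(x1.mean() - x2.mean()) / np.sqrt(pooled_var)


# Made-up example values (not real data)
before = [200, 210, 190, 205, 195]
after = [215, 225, 205, 220, 210]
print(round(cohens_d(before, after), 3))
```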

##### Example: Detecting change due to a restoration activity

Goal 1 (ecologically significant effect): increase the mean oxygen concentration by 20 $\mu$M. Because only an increase is of interest, a one-tailed z-test is appropriate.

* The natural variability is 50 $\mu$M (std. dev.)
* $\mu$ is the mean oxygen concentration before restoration
* $\bar{x}$ is the mean of the samples collected after restoration

* $d = \frac{|\mu_1 - \mu_2|}{\sigma}$
* $d = \frac{20\ \mu\mathrm{M}}{50\ \mu\mathrm{M}} = 0.4$


Goal 2: show that a *statistically* significant difference is present if the activity is a success. This is different from Goal 1.

* In this case, $H_0: \bar{x} \le \mu$  and $H_a : \bar{x} > \mu$

##### One-tailed z-test (like a t-test, valid for a large N)

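* $Z = \frac{\bar{x} - \mu}{S / \sqrt{N}}$, where $S$ is the sample standard deviation and $N$ is the number of samples
* **Compare $Z$ to the critical value** $Z_{crit} = Z_{1-\alpha}$, and reject $H_0$ if $Z > Z_{1-\alpha}$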
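
For a one-tailed one-sample z-test with known $\sigma$, the standard normal-approximation formula for the sample size needed to reach a target power is $N = \left(\frac{Z_{1-\alpha} + Z_{1-\beta}}{d}\right)^2$. The sketch below applies it to the Goal 1 effect size ($d = 0.4$), assuming $\alpha = 0.05$ and a target power of 0.8 (these two values are not stated for the restoration example). It uses `scipy.stats.norm.ppf` for the normal quantiles.

```python
import math

from scipy.stats import norm

alpha, target_power = 0.05, 0.8   # assumed values for this sketch
d = 20 / 50                       # Goal 1: 20 uM change / 50 uM natural variability

z_crit = norm.ppf(1 - alpha)      # one-tailed critical value Z_{1-alpha}
z_beta = norm.ppf(target_power)   # Z_{1-beta}

# Normal-approximation sample size for a one-tailed one-sample z-test
n_required = math.ceil(((z_crit + z_beta) / d) ** 2)
print(f"Z_crit = {z_crit:.3f}, N = {n_required}")
```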

#### Resources for calculating power

Online visualization (for one-sample z-test only):

http://rpsychologist.com/d3/NHST/


Online power calculator (for many different statistical tests):

http://webpower.psychstat.org/wiki/

G\*Power:

http://www.gpower.hhu.de/en.html

Python (`statsmodels`):

http://jpktd.blogspot.com/2013/03/statistical-power-in-statsmodels.html

#### Example: detecting small differences with a noisy instrument

* Want to be able to measure a difference of 2 $\mu$M
* instrument noise = 5 $\mu$M
* Significance level: $\alpha$ = 0.05
* Power: 1- $\beta$ = 0.8

_How many samples do we need to detect this difference?_

In this case, the effect size is the absolute difference of 2 $\mu$M relative to the standard deviation (noise level) of 5 $\mu$M, so $d = 2/5 = 0.4$.

```python
from statsmodels.stats import power

# Solve for N in a one-sample (two-sided) t-test, given power, alpha and d
nobs = power.tt_solve_power(power=0.8, alpha=0.05, effect_size=0.4)
print('N = ', round(nobs, 3))
```

```
N =  51.009
```


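If the true difference is 2 $\mu$M, then with $N = 52$ samples (51.009 rounded up, since samples come in whole numbers) we will detect a significant difference about 80% of the time. This example is for a one-sample t-test (`tt_solve_power` is two-sided by default), but other functions in the `power` module can be used for other statistical tests.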
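
For example, comparing two independent sets of samples (such as the two sites in the pollution example) calls for a two-sample t-test. The sketch below uses `TTestIndPower` from the same module and assumes equal group sizes (`ratio=1.0`) with the same $d = 0.4$, $\alpha = 0.05$ and power of 0.8; `nobs1` is the size of the first group.

```python
import math

from statsmodels.stats.power import TTestIndPower

# Two independent groups, two-sided test (the default). With ratio=1.0 both
# groups have the same size, so nobs1 is the number of samples per group.
n_per_group = TTestIndPower().solve_power(
    effect_size=0.4, nobs1=None, alpha=0.05, power=0.8, ratio=1.0
)
n_per_group = math.ceil(float(n_per_group))
print(f"N per group = {n_per_group} (total {2 * n_per_group})")
```

A two-sample design needs roughly twice as many samples per group as the one-sample case, because both means are estimated with error.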

```python
dir(power)  # List all the names (classes, functions, imports) in the power module
```

```
['FTestAnovaPower',
 'FTestPower',
 'GofChisquarePower',
 'NormalIndPower',
 'Power',
 'TTestIndPower',
 'TTestPower',
 '_GofChisquareIndPower',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'brentq_expanding',
 'ftest_anova_power',
 'ftest_power',
 'iteritems',
 'normal_power',
 'np',
 'optimize',
 'print_function',
 'stats',
 'tt_ind_solve_power',
 'tt_solve_power',
 'ttest_power',
 'zt_ind_solve_power']
```