## Power

In this lab, we explore the concept of statistical power. Recall that
power is the probability of rejecting the null hypothesis when it is
in fact false. Greater power means a better chance of detecting an
effect that really exists. If you cannot reject the null hypothesis,
you cannot rule out random chance as an explanation for your findings,
so adequate power is a bare minimum requirement for a study.

Technically defined, power is the proportion of the time in which you
would achieve *p* < .05 for a given population effect size and sample
size. Power is relevant *only* when the null hypothesis is false. You
want good power; the usual recommendation is .80 or higher.
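
The "proportion of the time" definition can be checked by simulation. The following is a minimal sketch, not part of the lab's own code: the function name `simulated_power` and its defaults (`n_sims`, `seed`) are illustrative assumptions, and it assumes two normally distributed groups with equal variance compared with an independent-samples *t* test.

```python
import numpy as np
from scipy import stats

def simulated_power(d, n, alpha=0.05, n_sims=2000, seed=0):
    """Estimate power as the share of simulated studies with p < alpha.

    Hypothetical helper: each simulated study draws n observations per group,
    with group means 0 and d and a common SD of 1, so the population
    Cohen's d equals d.
    """
    rng = np.random.default_rng(seed)
    group1 = rng.normal(0.0, 1.0, size=(n_sims, n))
    group2 = rng.normal(d, 1.0, size=(n_sims, n))
    # One two-sided independent-samples t test per simulated study (row)
    _, p_values = stats.ttest_ind(group1, group2, axis=1)
    return float(np.mean(p_values < alpha))

power_est = simulated_power(d=0.5, n=64)
print(power_est)
```

With *d* = 0.5 and 64 participants per group, the estimate should land close to .80, up to simulation noise.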

Power is **highly dependent on sample size**.

The standard guidelines for Cohen's *d* are:
<pre>
| #  |    d Value    |  Meaning   |
|:--:|:-------------:|:----------:|
| 1. |    0 - 0.2    | Negligible |
| 2. |   0.2 - 0.5   |   Small    |
| 3. |   0.5 - 0.8   |   Medium   |
| 4. |     0.80 +    |   Large    |
</pre>
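
For reference, Cohen's *d* for two independent groups is the difference in means divided by the pooled standard deviation. The sketch below is illustrative only: `cohens_d` and `label_d` are hypothetical helper names, and because the table's ranges share their endpoints, assigning a boundary value such as 0.5 to the higher category is an assumption made here.

```python
import numpy as np

def cohens_d(group1, group2):
    """Standardized mean difference using the pooled standard deviation."""
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    n1, n2 = len(g1), len(g2)
    pooled_var = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)
    return (g1.mean() - g2.mean()) / np.sqrt(pooled_var)

def label_d(d):
    """Map |d| onto the guideline labels (boundary handling is an assumption)."""
    size = abs(d)
    if size < 0.2:
        return "Negligible"
    if size < 0.5:
        return "Small"
    if size < 0.8:
        return "Medium"
    return "Large"

d_example = cohens_d([2, 4, 6, 8], [1, 3, 5, 7])
print(round(d_example, 3), label_d(d_example))
```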

In power calculations, with the significance level held fixed (here at
.05), there are always three things that interrelate, and fixing any
two determines the third:

1.  Power
2.  Sample size (*n* per group in group-comparison studies)
3.  Effect size
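
As a rough illustration of this interrelation, `tt_ind_solve_power` from statsmodels solves for whichever argument is passed as `None`, given the others. The values below (*d* = 0.5, 64 per group, power of .80) are example inputs chosen here, not figures from the lab.

```python
from statsmodels.stats.power import tt_ind_solve_power

# Exactly one of effect_size, nobs1, alpha, power is left as None;
# that quantity is solved for from the remaining ones.

# 1. Power, given effect size and sample size
power = float(tt_ind_solve_power(effect_size=0.5, nobs1=64, alpha=0.05,
                                 power=None, ratio=1, alternative='two-sided'))

# 2. Sample size per group, given effect size and target power
n_per_group = float(tt_ind_solve_power(effect_size=0.5, nobs1=None, alpha=0.05,
                                       power=0.8, ratio=1, alternative='two-sided'))

# 3. Smallest detectable effect size, given sample size and target power
min_d = float(tt_ind_solve_power(effect_size=None, nobs1=64, alpha=0.05,
                                 power=0.8, ratio=1, alternative='two-sided'))

print(power, n_per_group, min_d)
```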

In [None]:
from scipy import stats
from statsmodels.stats.power import tt_ind_solve_power
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

## Planning Sample Size

Sometimes it can be handy to generate a number of power estimates for
different effect sizes and sample sizes. 

Procedure for creating the power table:
1. Create a list of *d* values with a list comprehension.
2. Create a data frame with a single column of sample sizes.
3. Loop over the *d* values. Within the loop, a list comprehension computes the power of the test for each sample size, and the result is stored as a new column.

Afterwards, plot power against sample size for each of the *d* values:

In [None]:
# create a list of d values
d_vals = [d/10.0 for d in range(2, 16)]
# create a range of per-group sample sizes: 20, 30, ..., 200
sample_size = range(20,210,10)

powers = pd.DataFrame({'Sample Size':sample_size})

# for each d value, compute the power at every sample size and store it as a new column
for d in d_vals:
    powers['d = ' + str(d)] = [tt_ind_solve_power(effect_size=d, nobs1=x, alpha=0.05, 
                                                  power=None, ratio=1, alternative='two-sided')
                              for x in sample_size]

powers

In [None]:
fig = plt.figure(figsize=(12,10))
ax = fig.gca()
# use Pandas plotting function to generate the figure
powers.plot(x='Sample Size', ax=ax, linestyle = '-.')
plt.hlines(y = 0.8, xmin = 20, xmax = 200, color = 'red', linestyle = '--')
plt.title('Power vs. sample size for values of d')
plt.ylabel('Power')
plt.xlabel('Sample Size')

You can easily see from the graph that 80%
power (red dashed line) takes about 180 participants per group for *d*
= 0.3 but only about 45 people per group for *d* = 0.6.
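
Values read off a graph are approximate, so it can help to solve for the required sample size directly. This is a small sketch using the same `tt_ind_solve_power` settings as the table above, applied to the *d* = 0.3 and *d* = 0.6 curves; the dictionary name `required_n` is an assumption introduced here.

```python
import math
from statsmodels.stats.power import tt_ind_solve_power

required_n = {}
for d in (0.3, 0.6):
    # nobs1=None asks the solver for the per-group n that reaches power = 0.8
    n = tt_ind_solve_power(effect_size=d, nobs1=None, alpha=0.05,
                           power=0.8, ratio=1, alternative='two-sided')
    # round up, since a fractional participant cannot be recruited
    required_n[d] = math.ceil(float(n))

print(required_n)
```

The exact results should agree with the graph to within the 10-participant spacing of the plotted sample sizes.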