In [2]:
from statsmodels.stats import power
from math import ceil

# The interpretation of the p-value, and the “p-value fallacy”

---

A value of p < 0.05 for the null hypothesis has to be interpreted as follows: if the null hypothesis is true, the chance of finding a test statistic as extreme as or more extreme than the one observed is less than 5%. This is not the same as saying that the null hypothesis is false, and even less so that an alternative hypothesis is true! Stating a p-value alone is no longer considered sufficient for the statistical analysis of data. You should also state the confidence intervals for the parameter that you investigate.
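
As a minimal sketch of reporting a confidence interval alongside the p-value, the following uses scipy.stats (an assumption; the notebook itself only imports statsmodels) on simulated measurements. The sample values, the seed, and the hypothesized mean of 100 are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical data: 20 simulated measurements (illustration only)
rng = np.random.default_rng(0)
sample = rng.normal(loc=103, scale=7, size=20)

# One-sample, two-sided t-test of H0: population mean == 100
t_stat, p_value = stats.ttest_1samp(sample, popmean=100)

# 95% confidence interval for the mean, based on the t distribution
mean = sample.mean()
sem = stats.sem(sample)
ci_low, ci_high = stats.t.interval(0.95, len(sample) - 1, loc=mean, scale=sem)

print('p-value:', p_value)
print('mean:', mean, '95% CI:', (ci_low, ci_high))
```

For a two-sided test, p < 0.05 corresponds to the 95% confidence interval excluding the hypothesized mean, but the interval additionally shows how large the effect might plausibly be.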


# Types of Error

In hypothesis testing, two types of errors can occur:
![Type I and Type II errors](images/type12errors.png)

## Type I errors
 * Incorrect rejection of a true null hypothesis (a "false positive").
 * The probability of a Type I error is commonly denoted by α and is set before you start the data analysis.

## Type II errors and Test Power
* Incorrect retention of a false null hypothesis (a "false negative").
* The probability of this type of error is commonly denoted by β; the power of the test is 1−β.
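
Both error rates can be estimated by simulation. The sketch below is only an illustration under assumed settings: the group size of 64, the true difference of 0.5σ, the number of simulated experiments, and the seed are all hypothetical choices, and scipy.stats.ttest_ind stands in for whatever test is actually used.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05
n_per_group = 64   # hypothetical group size
n_sim = 2000       # number of simulated experiments

def rejection_rate(true_diff):
    """Fraction of simulated two-group t-tests with p < alpha."""
    # Each row is one simulated experiment; both groups have sigma = 1
    a = rng.normal(0.0, 1.0, size=(n_sim, n_per_group))
    b = rng.normal(true_diff, 1.0, size=(n_sim, n_per_group))
    p = stats.ttest_ind(a, b, axis=1).pvalue
    return np.mean(p < alpha)

# H0 true (no difference): every rejection is a Type I error
type1_rate = rejection_rate(0.0)

# H0 false (difference of 0.5 sigma): every non-rejection is a Type II error
power_est = rejection_rate(0.5)
type2_rate = 1 - power_est

print('Type I rate:', type1_rate)
print('Type II rate:', type2_rate, 'power:', power_est)
```

Under these assumptions, the estimated Type I rate should land near α, while the Type II rate depends on the effect size and the group size.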

# Sample Size
Power analysis for a statistical test involves four interrelated quantities:

* α, the probability for Type I errors
* β, the probability for Type II errors. Power = the probability of correctly rejecting a false null hypothesis = 1−β
* d, the effect size, i.e. the magnitude of the investigated effect relative to σ (the standard deviation of the sample)
* n, the sample size

Only three of these four parameters can be chosen freely; the fourth is then determined by the other three.
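
One way to see this dependency is a round trip: solve for the sample size from the other three quantities, then feed that sample size back in and solve for power. The sketch below uses power.tt_ind_solve_power from statsmodels, which solves for whichever argument is left out; the specific values (d = 0.5, α = 0.05, power = 0.8) are just an example.

```python
from statsmodels.stats import power

# Fix d, alpha and power; solve for the group size n
n = power.tt_ind_solve_power(effect_size=0.5, alpha=0.05, power=0.8)

# Fix d, alpha and that n; solve for power instead
achieved_power = power.tt_ind_solve_power(effect_size=0.5, nobs1=n, alpha=0.05)

print('n per group:', n)
print('power recovered from n:', achieved_power)
```

The recovered power should match the 0.8 that was used to compute n, up to numerical tolerance.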

---

* For a two-group (independent samples) t-test, use power.tt_ind_solve_power
* For a one-group t-test, use power.tt_solve_power

The call below tells us that if we compare two groups with the same number of subjects and the same standard deviation, require α=0.05 and a test power of 80%, and want to detect a difference between the groups that is half the standard deviation, we need to test 64 subjects in each group.

In [9]:
power.tt_ind_solve_power(effect_size=0.5, alpha=0.05, power=0.8)

63.765611775409525
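
As a rough cross-check on this result, the common normal-approximation formula n ≈ 2(z₁₋α/₂ + z₁₋β)² / d² per group can be evaluated directly. This sketch uses scipy.stats.norm (an assumption; it is not part of the original notebook) and ignores the t-distribution correction, so it comes out slightly below the value returned by statsmodels.

```python
from scipy.stats import norm

alpha, target_power, d = 0.05, 0.8, 0.5

# Standard normal quantiles for the two-sided alpha and for the target power
z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(target_power)

# Approximate number of subjects per group (normal approximation)
n_approx = 2 * (z_alpha + z_beta) ** 2 / d ** 2
print('approximate n per group:', n_approx)
```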

The call below tells us that if we have α=0.05, a test power of 80%, and 25 subjects in each group, then the smallest difference between the groups that we can detect is 81% of the sample standard deviation.

In [4]:
power.tt_ind_solve_power(nobs1=25, alpha=0.05, power=0.8)

0.9357585233711874

# Example (one group t-test)
Suppose a researcher wants to design a new study with a power of 0.8 and a significance level of 0.05 to test whether the caffeine content for a brand of coffee is really 100 mg. A previous study gave a mean caffeine level for this brand of 110 mg and a standard deviation of 7 mg. Use power.tt_solve_power to determine how many cups of coffee need testing.



In [4]:
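# Effect size: difference between the two means in units of the standard deviation
effect_size = (110 - 100) / 7
# Solve for the sample size that gives the requested power
nobs = power.tt_solve_power(effect_size=effect_size, alpha=0.05, power=0.8)
# Round up to a whole number of cups
nobs = ceil(nobs)
# Power actually achieved with that whole-number sample size
actual_power = power.tt_solve_power(effect_size=effect_size, alpha=0.05, nobs=nobs)
print('nobs:', nobs, 'actual power:', actual_power)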

nobs: 5 actual power: 0.670814322908
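
The achieved power printed above is below the 0.8 target, which suggests the numerical solver may not have settled on the intended sample size for this large effect. A simple cross-check, sketched here, is to step through integer sample sizes and stop at the first one whose power reaches the target. The helper smallest_nobs is hypothetical and not part of statsmodels; it only calls power.tt_solve_power to compute power for a given nobs.

```python
from statsmodels.stats import power

def smallest_nobs(effect_size, alpha=0.05, target_power=0.8, max_nobs=10000):
    """Smallest integer sample size whose one-sample t-test power
    reaches target_power. Hypothetical helper, not part of statsmodels."""
    for n in range(2, max_nobs + 1):
        # With power left out, tt_solve_power returns the power for this n
        achieved = power.tt_solve_power(effect_size=effect_size, nobs=n, alpha=alpha)
        if achieved >= target_power:
            return n, achieved
    raise ValueError('target power not reached within max_nobs')

effect_size = (110 - 100) / 7
nobs, actual_power = smallest_nobs(effect_size)
print('nobs:', nobs, 'actual power:', actual_power)
```

Because power increases with the sample size, the first integer that meets the target is also the smallest one that does.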


# Credits:
* https://en.wikipedia.org/wiki/Type_I_and_type_II_errors
* http://work.thaslwanter.at/Stats/html/statsAnalysis.html