# Chemometrics
<br>
**Julien Wist / 2017 / Universidad del Valle**
<br>
**Andrés Bernal / 2017 / ???**

An up-to-date version of this notebook can be found here: https://github.com/jwist/chemometrics/

In [2]:
options(repr.plot.width=4, repr.plot.height=4) # change these settings to plot larger figures

## power of an experiment

http://www.sciencedirect.com/science/book/9780121790608


In [3]:
library(pwr) # follows Cohen's book

### directionality

Suppose a researcher seeks to reject the null hypothesis by comparing two values A and B (for example, the means of two populations). If the null hypothesis is to be rejected whenever A and B differ, in either direction, then the test has no direction (two-tailed). If, on the contrary, the null hypothesis is to be rejected only when A is larger than B, then the test has a direction (one-tailed).
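The practical consequence of directionality can be sketched with `pwr.t.test`, which takes an `alternative` argument. The effect size and group size below are hypothetical values chosen only for illustration; they do not come from the data in this notebook.

```r
library(pwr)  # same package the notebook already loads

# Hypothetical inputs, chosen only for illustration
d <- 0.5      # assumed effect size (Cohen's d)
n <- 30       # assumed number of observations in each group

# Two-tailed: reject when A and B differ in either direction
two_tailed <- pwr.t.test(d = d, n = n, sig.level = 0.05,
                         type = "two.sample", alternative = "two.sided")

# One-tailed: reject only when A is larger than B
one_tailed <- pwr.t.test(d = d, n = n, sig.level = 0.05,
                         type = "two.sample", alternative = "greater")

c(two_tailed = two_tailed$power, one_tailed = one_tailed$power)
```

For the same $\alpha$, the one-tailed test has more power when the effect goes in the predicted direction, but it cannot detect an effect in the opposite direction.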

### significance level, $\alpha$

The significance level is the rate at which a true null hypothesis is wrongly rejected. A researcher claims that the results from a sample are significant *if* the probability of finding such results, assuming the null hypothesis is true, is below $\alpha$.

$\alpha$ is also known as the type-I error rate.
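A small simulation can make the meaning of $\alpha$ concrete. In this sketch both groups are drawn from the same normal distribution, so the null hypothesis is true by construction; the group size, number of repetitions and seed are arbitrary assumptions.

```r
set.seed(1)        # arbitrary seed, for reproducibility only
alpha <- 0.05      # assumed significance level
n_sim <- 2000      # assumed number of simulated experiments

# Both groups come from the same distribution, so the null hypothesis is true
p_values <- replicate(n_sim, t.test(rnorm(20), rnorm(20))$p.value)

# Fraction of experiments that wrongly reject the null hypothesis
false_positive_rate <- mean(p_values < alpha)
false_positive_rate
```

The fraction of false positives should be close to the chosen $\alpha$, up to simulation noise.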

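### power of a statistical test, $1 - \beta$

"The power of a statistical test of a null hypothesis is the probability that it will lead to the rejection of the null hypothesis, i.e., the probability that it will result in the conclusion that the phenomenon exists" [Cohen]

Here $\beta$ is the type-II error rate, that is, the probability of failing to reject a false null hypothesis, so the power is $1 - \beta$.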

Power reflects the fact that even if an effect exists in the population (the totality of individuals), it will not necessarily show up in a sample drawn from that population.

For example, if the power of a test is low and no effect is found, the result should be considered with care. It is analogous to the conclusion that no substance is present because no signal is observed. A chemist will ask about the limit of detection of the equipment before concluding that the substance is not present.
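The caution about low-powered tests can be illustrated by simulation. In the sketch below a real effect exists in the simulated population, yet small samples often fail to detect it. The effect size, group size and seed are hypothetical values, not taken from the notebook's data.

```r
set.seed(2)        # arbitrary seed, for reproducibility only
alpha <- 0.05      # assumed significance level
d     <- 0.5       # assumed true effect, in units of the standard deviation
n     <- 10        # assumed (small) number of observations per group
n_sim <- 2000      # assumed number of simulated experiments

# The two populations really differ by d, so the null hypothesis is false
p_values <- replicate(n_sim,
                      t.test(rnorm(n, mean = d), rnorm(n, mean = 0))$p.value)

# Empirical power: fraction of experiments that detect the effect
empirical_power <- mean(p_values < alpha)
empirical_power
```

Most of these simulated experiments report no significant difference although the effect is present, which is why a negative result from a low-powered test says little about the absence of an effect.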

Thus, the power depends on three parameters: the significance level, the reliability of the sample result (which depends on the sample size), and the effect size, which is the degree to which the phenomenon exists (think of the prevalence in our previous examples).

 - The reliability of the sample result is often quantified by the standard error, which is estimated from the standard deviation of the sample (the subset of the population) and decreases as the sample size grows.
 - The significance level is $\alpha$ and is chosen by the user as an adjustable parameter.
 - The effect size is often unknown and difficult to estimate. As its name indicates, it is the size of the effect that should be observed. Strong effects are easily detected, while slight ones are much more difficult to observe with certainty.
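How each of the three parameters moves the power can be sketched by changing one at a time. `power_for` is a hypothetical helper written for this illustration, and all numerical inputs are assumed values.

```r
library(pwr)  # same package the notebook already loads

# Hypothetical helper: power of a two-sided, two-sample t test
power_for <- function(d, n, sig.level) {
  pwr.t.test(d = d, n = n, sig.level = sig.level,
             type = "two.sample", alternative = "two.sided")$power
}

# Assumed baseline, then one parameter changed at a time
baseline       <- power_for(d = 0.5, n = 20, sig.level = 0.05)
larger_sample  <- power_for(d = 0.5, n = 80, sig.level = 0.05)
larger_effect  <- power_for(d = 1.0, n = 20, sig.level = 0.05)
stricter_alpha <- power_for(d = 0.5, n = 20, sig.level = 0.01)

round(c(baseline = baseline, larger_sample = larger_sample,
        larger_effect = larger_effect, stricter_alpha = stricter_alpha), 3)
```

A larger sample or a larger effect increases the power, whereas a stricter significance level decreases it.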

In [4]:
# We estimate the standard deviation of each group from its quartiles,
# treating the data as approximately normally distributed.

# Width of the interquartile range (IQR) of a standard normal, in SD units
S_75_25 <- qnorm(.75) - qnorm(.25)

# Infected group: observed IQR divided by the IQR of a standard normal
q75 <- 87500
q25 <- 30000
X_75_25 <- q75 - q25
S_X_i <- X_75_25 / S_75_25

# Control group: same procedure
q75 <- 5000
q25 <- 0
X_75_25 <- q75 - q25
S_X_c <- X_75_25 / S_75_25

# Estimated mean for the infected group
m1 <- 70000
# and for the control group
m2 <- 2000

# Cohen's d, using the larger of the two standard deviations,
# which yields the smaller (more conservative) effect size
d <- (m1 - m2) / max(c(S_X_i, S_X_c))

# Power for a total of N observations, split equally between the two groups
N <- 50
pwr.t.test(d = d, n = N/2, sig.level=0.01, type="two.sample", alternative="two.sided")



     Two-sample t test power calculation 

              n = 25
              d = 1.595315
      sig.level = 0.01
          power = 0.9979137
    alternative = two.sided

NOTE: n is number in *each* group
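The same function can be used the other way around: if `n` is left out and a target `power` is supplied, `pwr.t.test` solves for the group size. The sketch below reuses the effect size printed above; the target power of 0.8 is an assumption made for illustration, not a value from the notebook.

```r
library(pwr)  # same package the notebook already loads

d <- 1.595315          # effect size reported in the output above
target_power <- 0.8    # assumed target, chosen only for illustration

# Leaving n unspecified makes pwr.t.test solve for it
res <- pwr.t.test(d = d, power = target_power, sig.level = 0.01,
                  type = "two.sample", alternative = "two.sided")

# Round up to a whole number of observations per group
n_per_group <- ceiling(res$n)
n_per_group
```

Since 25 observations per group already give a power close to 1, the group size needed to reach the assumed target is smaller than 25.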
