# Estimation and Hypothesis Testing

In [None]:
library(tidyverse)

In [None]:
options(repr.plot.width=4, repr.plot.height=3)

## Set random number seed for reproducibility

In [None]:
set.seed(42)

## Functions around probability distributions

Random numbers

In [None]:
rnorm(5)

PDF

In [None]:
x <- seq(-3, 3, length.out = 100)
plot(x, dnorm(x), type="l", ylab="PDF")

CDF 

In [None]:
x <- seq(-3, 3, length.out = 100)
plot(x, pnorm(x), type="l", ylab="CDF")

Quantiles (inverse CDF)

In [None]:
p = seq(0, 1, length.out = 101)
plot(p, qnorm(p), type="l", ylab="Quantile")
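
As a small sketch of what "inverse CDF" means, the following checks that `pnorm` undoes `qnorm` for a few probabilities. The probabilities chosen here are illustrative assumptions, not values used elsewhere in this notebook.

```R
# Probabilities strictly inside (0, 1), where the quantile function is finite
p_in <- c(0.025, 0.5, 0.975)

# qnorm maps probabilities to quantiles of the standard normal
q <- qnorm(p_in)
q

# pnorm maps those quantiles back to the original probabilities
pnorm(q)

# At the endpoints 0 and 1 the quantiles are infinite
qnorm(c(0, 1))
```

The infinite endpoints are why the plotted curve only covers the finite part of the range.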

## Point estimates

In [None]:
x <- rnorm(10)

In [None]:
x

### Mean

Manual calculation

In [None]:
sum(x)/length(x)

Using built-in function

In [None]:
mean(x)

### Median

Manual calculation

In [None]:
x_sorted <- sort(x)

In [None]:
length(x)

Since there is an even number of observations, we need the average of the middle two data points.

In [None]:
sum(x_sorted[5:6])/2

Using built-in function

In [None]:
median(x)

### Quantiles

The median is just the 50th percentile. We can use R to get any percentile we like.

In [None]:
quantile(x, 0.5)

In [None]:
quantile(x, seq(0,1,length.out = 5))
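
A quick check, on a hypothetical sample `y` that is separate from `x` above, that the 50th percentile returned by `quantile` matches `median` under R's default quantile method:

```R
set.seed(1)        # hypothetical seed, for a repeatable illustration
y <- rnorm(10)     # hypothetical sample

# The 50th percentile and the median agree under quantile()'s default method
q50 <- unname(quantile(y, 0.5))
q50
median(y)

# Quartiles: 0%, 25%, 50%, 75%, 100%
quantile(y, seq(0, 1, length.out = 5))
```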

**Exercise**

Gene X is known to have a normal distribution with a mean of 100 units and a standard deviation of 15 units in the US population. With respect to this population,

- (1) What is the median value for gene X?

- (2) What is the probability of finding a value of more than 130 for gene X if you pick a person at random?

- (3) If you measure gene X and find that it is in the 95th percentile for this population, what is the measured value?

- (4) Find answers to questions (1), (2) and (3) by simulating 1 million people sampled from the US population. Do they agree with the theoretical calculated values?

- (5) Plot the PDF of gene X using `ggplot2` for values between 50 and 150. Give it a title of `PDF of N(100, 15)`, a subtitle of `I made this!`, and label the x-axis as `Gene X` and the y-axis as `PDF`. Make the PDF blue, and fill the region under the curve blue with a transparency of 50%.

## Interval estimates

### Confidence intervals

In [None]:
ci = 0.95

In [None]:
alpha = (1-ci)
n <- length(x)
m <- mean(x)
s <- sd(x)
se <- s/sqrt(n)
me <- qt(1-alpha/2, df=n-1) * se
c(m - me, m + me)

Note that confidence intervals get wider as the required confidence level increases.

In [None]:
ci = 0.99

In [None]:
alpha = (1-ci)
n <- length(x)
m <- mean(x)
s <- sd(x)
se <- s/sqrt(n)
me <- qt(1-alpha/2, df=n-1) * se
c(m - me, m + me)
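
The change in width comes from the critical value of the t distribution, since the standard error does not depend on the confidence level. A minimal sketch, assuming a sample size of 10 and three illustrative confidence levels:

```R
n_obs <- 10                      # assumed sample size
levels <- c(0.90, 0.95, 0.99)    # illustrative confidence levels
alphas <- 1 - levels

# Two-sided critical values of the t distribution with n_obs - 1 degrees of freedom
t_crit <- qt(1 - alphas / 2, df = n_obs - 1)
data.frame(level = levels, t_crit = t_crit)
```

The margin of error is the critical value times the standard error, so a larger critical value gives a proportionally wider interval.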

### Making a function

#### Review of R custom functions

In [None]:
f <- function(a, b=1) {
    a + b
}

In [None]:
f(2)

In [None]:
f(2,3)

In [None]:
f(b=4, a=1)

**Exercise**

Make a function called `conf` for calculating confidence intervals for the sample mean that takes two arguments:

- x is the vector of sample values
- ci is the confidence level, with a default of 0.95

The function should return a vector of two numbers indicating the lower and upper limits of the confidence interval.

Check that it gives the same answer as the example above.

### Coverage

In 1,000 experiments, we expect the true mean (0) to lie within the estimated 95% CIs about 950 times. The cell below uses the `conf` function from the exercise above.

In [None]:
n_expt <- 1000
n <- 10
cls <- t(replicate(n_expt, conf(rnorm(n))))

In [None]:
sum(cls[,1] < 0 & 0 < cls[,2])
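
As a cross-check that does not rely on a user-written function, the same coverage experiment can be sketched with the confidence interval that `t.test` reports in its `conf.int` element. The seed and the names `n_sim`, `n_obs`, and `limits` are assumptions made for this illustration.

```R
set.seed(2024)     # hypothetical seed, for repeatability
n_sim <- 1000      # number of simulated experiments
n_obs <- 10        # observations per experiment

# Each column holds the lower and upper 95% limits from one experiment
limits <- replicate(n_sim, t.test(rnorm(n_obs))$conf.int)

# Fraction of intervals that contain the true mean of 0
coverage <- mean(limits[1, ] < 0 & 0 < limits[2, ])
coverage
```

The observed fraction should be close to 0.95, but it will vary from run to run when the seed changes.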

## Hypothesis testing

### Binomial test

In [None]:
set.seed(123)

n = 50
tosses = sample(c('H', 'T'), n, replace=TRUE, prob=c(0.55, 0.45))
t = table(tosses)
t

In [None]:
binom.test(table(tosses))

In [None]:
set.seed(123)

n = 250
tosses = sample(c('H', 'T'), n, replace=TRUE, prob=c(0.55, 0.45))
t = table(tosses)
t

In [None]:
binom.test(t)

#### What happens if we choose a one-sided test?

In [None]:
binom.test(t, alternative = "greater")

In [None]:
binom.test(t, alternative = "less")
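
To see what a one-sided p-value measures, the sketch below compares `binom.test` with the upper tail of the binomial distribution computed directly. The counts `k_heads` and `n_tosses` are hypothetical and are not the simulated tosses above.

```R
k_heads  <- 30    # hypothetical number of heads
n_tosses <- 50    # hypothetical number of tosses

# P(X >= k_heads) under the null p = 0.5; lower.tail = FALSE gives P(X > k_heads - 1)
p_tail <- pbinom(k_heads - 1, n_tosses, 0.5, lower.tail = FALSE)

# The same quantity as reported by binom.test
p_test <- binom.test(k_heads, n_tosses, p = 0.5, alternative = "greater")$p.value

c(tail = p_tail, binom.test = p_test)
```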

#### What happens if we change our null hypothesis?

In [None]:
binom.test(t, p = 0.55)

## Two-sample model

### Welch t-test

In [None]:
set.seed(123)

n <- 10
x1 <- rnorm(n, 0, 1)
x2 <- rnorm(n, 1, 1)

In [None]:
t.test(x1, x2)

### Standard t-test

In [None]:
t.test(x1, x2, var.equal = TRUE)

### Power of t-test

Note: `pwr` is not installed in Docker containers.

```R
library(pwr)
d <- 1
pwr.t.test(d = d, sig.level = 0.05, power = 0.9)
```

gives output

```
     Two-sample t test power calculation 

              n = 22.02109
              d = 1
      sig.level = 0.05
          power = 0.9
    alternative = two.sided

NOTE: n is number in *each* group
```

#### Interpretation of the power calculation

If we did many experiments with `n=23` per group where the effect size is as specified and the test assumptions are valid, we expect that at least 90% of them will have a p-value less than the nominal significance level (0.05). If we used `n=22`, we would expect that just under 90% of the experiments will have a p-value less than the nominal significance level (0.05).

In particular, note that a fraction of about 1-power of the experiments will fail to show a statistically significant p-value even if the assumptions are met (false negatives).
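
Where `pwr` is unavailable, a similar calculation can be sketched with `power.t.test` from the `stats` package that ships with R. This is an alternative route, not the call used above. With `sd = 1`, the `delta` argument plays the role of the standardized effect size `d`.

```R
# power.t.test is part of the stats package, so no extra install is needed
res <- power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.9)
res$n            # fractional sample size per group

# Round up to get a whole number of subjects per group
n_per_group <- ceiling(res$n)
n_per_group
```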

In [None]:
n_expts <- 10000
n <- 22
alpha = 0.05
sum(replicate(n_expts, t.test(rnorm(n, 0, 1), rnorm(n, 1, 1))$p.value) < alpha)/n_expts

In [None]:
n_expts <- 10000
n <- 23
alpha = 0.05
sum(replicate(n_expts, t.test(rnorm(n, 0, 1), rnorm(n, 1, 1))$p.value) < alpha)/n_expts

#### Distribution of p-values under the null is uniform

That means that you expect a fraction $\alpha$ of the experiments to be false positives.

In [None]:
n_expts <- 10000
n <- 50
ps <- replicate(n_expts, t.test(rnorm(n), rnorm(n))$p.value)

In [None]:
sum(ps < alpha)/n_expts

In [None]:
hist(ps)
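
Uniformity implies more than the false positive rate at a single $\alpha$: for any threshold $a$, about a fraction $a$ of the p-values should fall below it. A minimal sketch with a smaller, hypothetical simulation (the seed, `n_sim`, and the thresholds are assumptions):

```R
set.seed(99)      # hypothetical seed
n_sim <- 2000     # fewer experiments than above, to keep the sketch fast
p_null <- replicate(n_sim, t.test(rnorm(50), rnorm(50))$p.value)

# For a uniform distribution, P(p < a) = a for every threshold a
thresholds <- c(0.01, 0.05, 0.25, 0.50)
observed <- sapply(thresholds, function(a) mean(p_null < a))
data.frame(threshold = thresholds, observed = observed)
```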

### Paired and one-sample t-tests

A paired t-test is commonly used to evaluate whether paired measurements (e.g. weight before and after a diet for the same person) have changed. The paired t-test is equivalent to a one-sample t-test that compares the differences between the paired values with a fixed number (usually 0).

In [None]:
x1 <- rnorm(10, 100, 15)
x2 <- rnorm(10, 100, 15)
delta <- x1 - x2

In [None]:
t.test(x1, x2, paired=TRUE)

In [None]:
t.test(delta, mu=0)
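
The equivalence can also be checked programmatically rather than by reading the two printouts. The sketch below uses hypothetical vectors `before` and `after` and compares the test statistics and p-values directly.

```R
set.seed(7)                    # hypothetical seed
before <- rnorm(10, 100, 15)   # hypothetical paired measurements
after  <- rnorm(10, 100, 15)

paired     <- t.test(before, after, paired = TRUE)
one_sample <- t.test(before - after, mu = 0)

# Both tests give the same t statistic and p-value
c(paired = unname(paired$statistic), one_sample = unname(one_sample$statistic))
c(paired = paired$p.value, one_sample = one_sample$p.value)
```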

**Exercise**

Suppose that the null hypothesis is that there is no difference in the paired measurements and the standard deviation of the difference is 2.

- Run a simulation for 100,000 experiments with `n=25` per experiment to show the distribution of the p-values using a paired or one-sample t-test under the null.
- If the significance level is 0.05, how many false positive results were observed?