# <center> Lesson 8 - Hypothesis Testing </center>

Today we'll work through a single hypothesis-testing exercise. Instead of using Monte Carlo sampling, we'll appeal to the Central Limit Theorem.

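Given i.i.d. $X_1, \ldots, X_n \sim \mathrm{Poisson}(\lambda)$, we want to test:

$$H_0: \lambda \leq 5 \text{ (null hypothesis)}$$

$$H_1: \lambda > 5 \text{ (alternative hypothesis)}$$

The boundary value belongs to $H_0$ because the p-value is computed assuming $\lambda$ equals the hypothesized value. The functions below take that value as an argument, `lambda_hyp`, so the same test works for thresholds other than 5.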

## Data

Here's the data. The sample size is $n = 20$.

```python
from numpy import array

x = array([1, 1, 3, 0, 1, 7, 7, 5, 5, 8, 6, 4, 9, 5, 6, 4, 5, 5, 4, 1])
```

## Compute the estimate

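Write a function `estimate_lambda_hat(x)` that takes the data `x` and returns the estimate $\hat{\lambda}$, here the sample mean.

```python
from numpy import mean

def estimate_lambda_hat(x):
    # The sample mean estimates the Poisson rate lambda.
    return mean(x)
```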
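For i.i.d. Poisson data the sample mean is the maximum-likelihood estimate of $\lambda$. A short derivation from the log-likelihood:

$$\ell(\lambda) = \sum_{i=1}^n \log\frac{e^{-\lambda}\lambda^{x_i}}{x_i!} = -n\lambda + \log\lambda \sum_{i=1}^n x_i - \sum_{i=1}^n \log x_i!$$

$$\ell'(\lambda) = -n + \frac{1}{\lambda}\sum_{i=1}^n x_i = 0 \quad\Longrightarrow\quad \hat{\lambda} = \frac{1}{n}\sum_{i=1}^n x_i$$

Since $\ell''(\lambda) = -\frac{1}{\lambda^2}\sum_i x_i < 0$ whenever $\sum_i x_i > 0$, this critical point is a maximum. For the data above, $\sum_i x_i = 87$, so $\hat{\lambda} = 87/20 = 4.35$.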

## Find p-values using the Central Limit Theorem

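Let $\hat{\Lambda}$ denote the sample mean viewed as a random variable, and $\hat{\lambda}$ its observed value. A $\mathrm{Poisson}(\lambda)$ variable has mean $\lambda$ and variance $\lambda$, so by the Central Limit Theorem

$$\hat{\Lambda} = \frac{1}{n}\sum_{i=1}^n X_i \quad \overset{a}{\sim} \quad \mathrm{Normal}\left(\lambda, \frac{\lambda}{n}\right)$$

where $\overset{a}{\sim}$ means "approximately distributed as" for large $n$, and the second parameter is the variance.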
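A quick Monte Carlo sanity check of this approximation, offered as a sketch: simulate many data sets of size $n = 20$, compute each sample mean, and compare the mean and variance of those sample means with $\lambda$ and $\lambda/n$. The true rate `lam = 3.5`, the seed, and the number of replications `reps` are arbitrary assumptions for illustration, not values from the lesson.

```python
import numpy as np

# Illustrative assumptions: true rate, sample size, number of simulated data sets.
lam, n, reps = 3.5, 20, 100_000
rng = np.random.default_rng(0)  # fixed seed for reproducibility

# Each row is one simulated data set of size n; average across each row.
lambda_hats = rng.poisson(lam, size=(reps, n)).mean(axis=1)

print("mean of lambda_hat:", lambda_hats.mean(), "vs lambda =", lam)
print("variance of lambda_hat:", lambda_hats.var(), "vs lambda/n =", lam / n)
```

Both printed pairs should agree closely; a histogram of `lambda_hats` would also look roughly bell-shaped.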

Next, we write a function `p_value(x, lambda_hyp)` that does the following:

+ Computes $\hat{\lambda}$ from the data

+ Computes the p-value using the cdf of the Normal distribution:

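$$\text{p-value} = P(\hat{\Lambda} > \hat{\lambda} \mid \lambda = \lambda_{\text{hyp}}) = 1 - P(\hat{\Lambda} \leq \hat{\lambda} \mid \lambda = \lambda_{\text{hyp}}) = 1 - F(\hat{\lambda})$$

where $F$ is the CDF of $\mathrm{Normal}(\lambda_{\text{hyp}}, \lambda_{\text{hyp}}/n)$, the approximate distribution of $\hat{\Lambda}$ when $\lambda = \lambda_{\text{hyp}}$.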
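As a worked example, take $\lambda_{\text{hyp}} = 3.5$, the value used in the call `p_value(x, lambda_hyp = 3.5)`. The data give $\hat{\lambda} = 87/20 = 4.35$. Standardizing, with $\Phi$ the standard normal CDF:

$$\sqrt{\frac{\lambda_{\text{hyp}}}{n}} = \sqrt{\frac{3.5}{20}} \approx 0.4183, \qquad z = \frac{4.35 - 3.5}{0.4183} \approx 2.032$$

$$\text{p-value} = 1 - F(4.35) = 1 - \Phi(2.032) \approx 0.0211$$

This is consistent with the `p_value` output of about 0.0211.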

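```python
from numpy import sqrt
from scipy.stats import norm

def p_value(x, lambda_hyp):
    # Point estimate from the data (the sample mean).
    lambda_hat = estimate_lambda_hat(x)
    n = len(x)
    # Under H0 with lambda = lambda_hyp, the CLT gives
    # lambda_hat ~ Normal(mean=lambda_hyp, variance=lambda_hyp / n).
    m = lambda_hyp
    # scipy's norm is parameterized by the standard deviation, not the variance.
    s = sqrt(lambda_hyp / n)
    # Upper-tail probability P(Lambda_hat > lambda_hat).
    return 1 - norm(m, s).cdf(lambda_hat)
```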
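An equivalent variant, offered as a sketch: standardize $\hat{\lambda}$ and use `scipy.stats.norm.sf`, the survival function $1 - \Phi$. It returns the same number here, but for very small p-values `sf` avoids the precision loss of computing `1 - cdf` when `cdf` is close to 1. The name `p_value_sf` is hypothetical.

```python
import numpy as np
from scipy.stats import norm

x = np.array([1, 1, 3, 0, 1, 7, 7, 5, 5, 8, 6, 4, 9, 5, 6, 4, 5, 5, 4, 1])

def p_value_sf(x, lambda_hyp):
    """CLT upper-tail p-value computed with the survival function."""
    lambda_hat = np.mean(x)
    n = len(x)
    # Standardize lambda_hat under H0 (lambda = lambda_hyp).
    z = (lambda_hat - lambda_hyp) / np.sqrt(lambda_hyp / n)
    # norm.sf(z) equals 1 - norm.cdf(z), evaluated without subtracting from 1.
    return norm.sf(z)

print(p_value_sf(x, 3.5))
```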

```python
p_value(x, lambda_hyp=3.5)
# output: 0.021082465626681479
```
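As a cross-check on the CLT approximation (not part of the lesson's method): a sum of independent Poisson variables is itself Poisson, so $S = \sum_i X_i \sim \mathrm{Poisson}(n\lambda)$ and an exact upper-tail p-value $P(S \geq s_{\text{obs}})$ is available. The sketch below uses `scipy.stats.poisson.sf`; the name `p_value_exact` is hypothetical. Because $S$ is discrete, the observed value is included in the tail, so the two p-values need not match exactly.

```python
import numpy as np
from scipy.stats import poisson

x = np.array([1, 1, 3, 0, 1, 7, 7, 5, 5, 8, 6, 4, 9, 5, 6, 4, 5, 5, 4, 1])

def p_value_exact(x, lambda_hyp):
    """Exact upper-tail p-value P(S >= s_obs) with S ~ Poisson(n * lambda_hyp)."""
    n = len(x)
    s_obs = int(np.sum(x))  # 87 for this data set
    # poisson.sf(k, mu) returns P(S > k), so P(S >= s_obs) = sf(s_obs - 1, mu).
    return poisson.sf(s_obs - 1, n * lambda_hyp)

print(p_value_exact(x, 3.5))  # compare with the CLT value of about 0.0211
```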

## Test

Write a function `test(x, lambda_hyp, alpha)` that returns `True` if the data "accepts" the null hypothesis (that is, we fail to reject $H_0$ because the p-value exceeds $\alpha$) and `False` if the data rejects it.

```python
def test(x, lambda_hyp, alpha):
    # Fail to reject ("accept") H0 when the p-value exceeds the significance level.
    p = p_value(x, lambda_hyp)
    return p > alpha
```

```python
test(x, lambda_hyp=3.4, alpha=0.01), test(x, lambda_hyp=3.8, alpha=0.1)
# output: (True, True)
```

## Discuss

+ When $\alpha = 0.01$, what is the lowest value of $\lambda$ that the data "accepts"?

+ When $\alpha = 0.1$, what is the lowest value of $\lambda$ that the data "accepts"?

+ What do these values mean in terms of Type 1 and Type 2 error? 

+ Graphically, how should we understand this?
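One way to explore the first two questions numerically, as a sketch under the same CLT approximation: the statistic $(\bar{x} - \lambda)/\sqrt{\lambda/n}$ decreases as $\lambda$ grows, so the accepted values form an interval $(\lambda^*, \infty)$, where $\lambda^*$ solves $(\bar{x} - \lambda^*)/\sqrt{\lambda^*/n} = z_{1-\alpha}$. Substituting $u = \sqrt{\lambda^*}$ turns this into the quadratic $u^2 + (z_{1-\alpha}/\sqrt{n})\,u - \bar{x} = 0$. Because `test` uses the strict comparison `p > alpha`, $\lambda^*$ itself sits exactly on the boundary. The helper names `lowest_accepted_lambda` and `p_value_sf` are hypothetical.

```python
import numpy as np
from scipy.stats import norm

x = np.array([1, 1, 3, 0, 1, 7, 7, 5, 5, 8, 6, 4, 9, 5, 6, 4, 5, 5, 4, 1])

def p_value_sf(x, lambda_hyp):
    """CLT upper-tail p-value, same formula as the lesson's p_value."""
    z = (np.mean(x) - lambda_hyp) / np.sqrt(lambda_hyp / len(x))
    return norm.sf(z)

def lowest_accepted_lambda(x, alpha):
    """Boundary lambda* at which the CLT p-value equals alpha.

    Every lambda_hyp above lambda* gives p > alpha ("accept");
    every lambda_hyp below it gives p < alpha (reject).
    """
    xbar = np.mean(x)
    n = len(x)
    b = norm.ppf(1 - alpha) / np.sqrt(n)
    # Positive root of u**2 + b*u - xbar = 0, where u = sqrt(lambda*).
    u = (-b + np.sqrt(b**2 + 4 * xbar)) / 2
    return u**2

for alpha in (0.01, 0.1):
    lam_star = lowest_accepted_lambda(x, alpha)
    print(f"alpha={alpha}: lambda*={lam_star:.4f}, p-value there={p_value_sf(x, lam_star):.4f}")
```

The printed p-values should equal $\alpha$, confirming that each $\lambda^*$ lies on the acceptance boundary.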