# Hypothesis and Inference
- Science in general seeks to find the truth. In statistics, we use numbers to describe a population.
- Most of the time, we can't actually find out the truth about the whole population. It is simply too big (too many humans to interview, too many vehicles to count, etc.).
- So we take samples and analyze them, and we assume that the population behaves similarly to the sample, i.e. statistical attributes of the sample carry over to the population.
- Once a certain belief about a population is established, there will be opposition:
 - "How can we trust the established norm?"
- In such a situation, the burden is on the scientist to give enough evidence to reject the current belief.
- Hypothesis testing is the process to do this.

## Statistical Hypothesis Testing Steps
1. State the hypothesis
2. Set decision criteria
3. Compute test stats
4. Make decision

## 1. State the hypothesis
We will go through the process with a coin flip example.

Hypothesis: a premise (or claim) that we want to test/investigate.
 - H0 - null hypothesis: the default/established/currently accepted value for a parameter
  - e.g. from previous studies
 - Ha - alternative hypothesis: also called the research hypothesis. It contains the claim to be tested. NOTE that we are testing H0 because we think it is wrong.
 - Note that H0 and Ha are mathematical opposites.
  - The alternative hypothesis is simply what the null hypothesis is not.
 - We first assume that the null hypothesis is true until the evidence suggests otherwise.

| Layman | Stats |
| --- | --- | 
| Current Belief: the coin is fair | H0: p(lands heads) = 0.5 |
| Alternate belief: the coin is NOT fair | Ha: p(lands heads) != 0.5 |

NOTE that we do not assert that p(lands heads) = 0.51 or any other specific value. The alternative hypothesis is simply the OPPOSITE of the null hypothesis.

Now, we conduct the test by collecting a sample.

Possible outcomes:
- Reject the null hypothesis
 - We are not saying that the alternative is the truth; we are saying that the data are too unlikely under the null for us to keep believing it.
 - We can't really prove that something is true.
- Fail to reject the null hypothesis
 - The null is still 'king of the hill' until a new contender comes along.

## 2. Set decision criteria
- Science places a conservative standard that a researcher must meet before claiming an important discovery (i.e. rejecting the current belief/null hypothesis).
- We set a standard value alpha (the significance level), with a conventionally accepted conservative value of .05.
- How do we understand this value?
 - Assuming that the null hypothesis is true (p(lands heads) is indeed 0.5),
 - an alpha of .05 means that we reject the null only if the observed data (from the sample) are so unusual that data at least this extreme would occur by chance at most 5% of the time.
 - i.e. reject the notion that "the coin is fair" if, over a reasonably large number of flips, the observed proportion of heads is something like 0.2, 0.3 or 0.8. Outcomes this far from 0.5 are so rare under the null that, if we observe one, we can no longer trust the null.
 - The smaller the alpha, the more stringent the test (the less likely we are to find a statistically significant result).
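To make "more stringent" concrete, here is a minimal sketch of the two-sided acceptance interval for the number of heads under H0. It assumes a normal approximation to Binomial(n, p) (introduced later in these notes) and uses Python's `statistics.NormalDist`; the function name `rejection_bounds` and the choice n = 1000 are illustrative only.

```python
from statistics import NormalDist


def rejection_bounds(n, p, alpha):
    """Two-sided acceptance interval for the number of heads under H0.

    Assumption: Binomial(n, p) is approximated by a normal distribution.
    Head counts outside (lower, upper) lead us to reject H0 at level alpha.
    """
    mu = n * p
    sigma = (n * p * (1 - p)) ** 0.5
    # z value that leaves alpha/2 in each tail of the standard normal
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return mu - z * sigma, mu + z * sigma


for alpha in (0.05, 0.01):
    lo, hi = rejection_bounds(n=1000, p=0.5, alpha=alpha)
    print(f"alpha={alpha}: keep H0 if {lo:.1f} <= heads <= {hi:.1f}")
```

For 1000 flips of a fair coin, alpha = .05 keeps H0 for roughly 469 to 531 heads, while alpha = .01 widens that range, so more extreme data are needed before we reject.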

## 3. Compute test stats
- Once alpha has been set, a test statistic is computed from the sample, e.g. the proportion of heads in this case, but it could be a mean, a correlation, etc.
- Each statistic has an associated probability value called the p-value.
 - p-value: assuming the null hypothesis is true, the probability of observing a statistic at least as extreme as the one actually observed, given its sampling distribution.
 - i.e. the same observed value can have a different p-value under a different sampling distribution.
 - e.g. in a normal distribution, the chance of a value close to the mean is high, but the chance of a value far from the mean (e.g. beyond mean + 2*std) is very low.
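A minimal sketch of a two-sided p-value when the statistic is assumed to follow a normal distribution: it is the probability of landing at least as far from the mean as the observed value, in either direction. The helper name `two_sided_p_value` is hypothetical; `statistics.NormalDist` is from Python's standard library.

```python
from statistics import NormalDist


def two_sided_p_value(x, mu=0.0, sigma=1.0):
    """Probability, under N(mu, sigma), of a value at least as far from mu as x."""
    z = abs(x - mu) / sigma
    # upper-tail area beyond |z|, doubled to count both tails
    return 2 * (1 - NormalDist().cdf(z))


print(two_sided_p_value(0.1))  # close to the mean: large p-value
print(two_sided_p_value(2.0))  # mean + 2*std: small p-value (about 0.0455)
```

A value 2 standard deviations from the mean has a p-value of about 0.0455, just under the conventional alpha of .05.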

## 4. Make decision
Now we compare alpha and the p-value:
- alpha sets the standard for how extreme the data must be before we can reject the null hypothesis
 - e.g. data this extreme must occur at most 5% of the time under the null
- The p-value indicates how extreme the data ARE.
- Compare the p-value with alpha to determine whether the observed data are statistically different from what the null predicts.
 - p-value <= alpha: reject the null hypothesis. The result is statistically significant (chance alone is an unlikely explanation for the observed sample).
 - p-value > alpha: fail to reject the null hypothesis. The observed data are consistent with chance alone (this does not prove that the null is true).
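The decision rule, and what alpha means in practice, can be sketched with a simulation. Assumptions (not from the original notes): a two-sided test, the normal approximation for the p-value, 100 flips per experiment, 5000 experiments, and a fixed seed. Because the coin here really is fair, every rejection is a false alarm, and the rejection rate should come out close to alpha.

```python
import random
from statistics import NormalDist


def decide(p_value, alpha=0.05):
    """Decision rule: reject H0 when the p-value is at most alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


def coin_p_value(heads, n, p0=0.5):
    """Two-sided p-value for H0: P(heads) = p0, via a normal approximation."""
    mu = n * p0
    sigma = (n * p0 * (1 - p0)) ** 0.5
    z = abs(heads - mu) / sigma
    return 2 * (1 - NormalDist().cdf(z))


# Simulate many experiments with a coin that really IS fair (H0 is true).
rng = random.Random(0)  # fixed seed so the run is reproducible
n_flips, n_experiments, alpha = 100, 5000, 0.05
rejections = 0
for _ in range(n_experiments):
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    if decide(coin_p_value(heads, n_flips), alpha) == "reject H0":
        rejections += 1

# Fraction of experiments where we wrongly rejected a true null
print(rejections / n_experiments)
```

The printed rate lands near, not exactly at, 0.05: head counts are discrete and the normal approximation is only approximate.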

## Back to the coin flip example
Steps 1 & 2: the hypotheses are stated above, and we set alpha = .05.

Testing:
- Flip the coin n times and count the number of heads, X.
- Each coin flip is a Bernoulli trial (a random trial with exactly two possible outcomes), so
 - X is a Binomial(n, p) random variable,
 - which, for large n, we can _approximate using the normal distribution_ (from the Central Limit Theorem).
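The normal approximation relies on Binomial(n, p) having mean np and variance np(1 - p). As a small exact check (a sketch; `binomial_pmf` is a hypothetical helper, and n = 10 is chosen only to keep it short), we can build the full distribution with `math.comb` and compute both moments directly.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): choose which k of the n flips are heads."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

# total probability 1, mean n*p, variance n*p*(1-p), up to floating-point rounding
print(sum(pmf), mean, var)
```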

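```python
import math

def normal_approximation_to_binomial(n, p):
    """Find mu and sigma of the normal distribution approximating Binomial(n, p)."""
    mu = p * n                          # mean of Binomial(n, p)
    sigma = math.sqrt(p * (1 - p) * n)  # standard deviation of Binomial(n, p)
    return mu, sigma
```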
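One way to sanity-check `normal_approximation_to_binomial` is to simulate many experiments and compare the sample mean and standard deviation of the head counts with the returned mu and sigma. This sketch repeats the function so it runs on its own; `simulate_heads`, the seed, n = 900 and the 1000 repetitions are illustrative assumptions.

```python
import math
import random
import statistics


def normal_approximation_to_binomial(n, p):
    """Mean and standard deviation of Binomial(n, p)."""
    return p * n, math.sqrt(p * (1 - p) * n)


def simulate_heads(n, p, rng):
    """Number of heads in n independent flips with P(heads) = p."""
    return sum(rng.random() < p for _ in range(n))


rng = random.Random(42)  # fixed seed for reproducibility
n, p, n_experiments = 900, 0.5, 1000
counts = [simulate_heads(n, p, rng) for _ in range(n_experiments)]

mu, sigma = normal_approximation_to_binomial(n, p)
print(mu, sigma)  # theoretical values: 450.0 15.0
print(statistics.mean(counts), statistics.stdev(counts))  # simulated, close to the above
```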

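- Remember: whenever a random variable follows a normal distribution, we can use `normal_cdf` to __figure out the probability that its realized value lies within (or outside) a particular interval.__

The probability of lying within an interval [lo, hi] is the difference in the area under the curve given by the CDF at the two endpoints, `normal_cdf(hi) - normal_cdf(lo)`; the probability of lying outside it is 1 minus that difference.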
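Putting the four steps together, here is a sketch of the full coin test. `normal_cdf` is re-implemented with the standard `math.erf` formula (an assumption; the book defines its own version), and `normal_probability_between` / `normal_probability_outside` are illustrative helpers. The numbers, 490 heads in 900 tosses, come from the stats.stackexchange question in the references. For comparison, the exact binomial tail probability is also computed with `math.comb`.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma), written with math.erf."""
    return (1 + math.erf((x - mu) / (sigma * math.sqrt(2)))) / 2


def normal_probability_between(lo, hi, mu=0.0, sigma=1.0):
    """P(lo <= X <= hi): the difference of the CDF at the two endpoints."""
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)


def normal_probability_outside(lo, hi, mu=0.0, sigma=1.0):
    """P(X < lo or X > hi): everything not between lo and hi."""
    return 1 - normal_probability_between(lo, hi, mu, sigma)


# Step 1: H0 is p = 0.5, Ha is p != 0.5. Step 2: alpha = 0.05.
n, p0, heads, alpha = 900, 0.5, 490, 0.05

# Step 3: p-value under H0 using the normal approximation
mu = n * p0                           # 450.0
sigma = math.sqrt(n * p0 * (1 - p0))  # 15.0
deviation = abs(heads - mu)           # 40.0 heads away from the expected count
p_value = normal_probability_outside(mu - deviation, mu + deviation, mu, sigma)

# For comparison: the exact binomial probability of a count at least as extreme
exact_p_value = sum(
    math.comb(n, k) * p0**k * (1 - p0) ** (n - k)
    for k in range(n + 1)
    if abs(k - mu) >= deviation
)

# Step 4: make the decision
print(p_value, exact_p_value)
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

Both p-values come out below 0.01, well under alpha = .05, so under these assumptions we would reject the hypothesis that this coin is fair.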

# Reference
- Chapter 7 of Data Science From Scratch
- https://www.youtube.com/watch?v=VK-rnA3-41c&feature=youtu.be
- https://www.thoughtco.com/the-difference-between-alpha-and-p-values-3126420
- https://www.sagepub.com/sites/default/files/upm-binaries/40007_Chapter8.pdf
- https://courses.washington.edu/p209s07/lecturenotes/Week%205_Monday%20overheads.pdf
- https://stats.stackexchange.com/questions/21581/how-to-assess-whether-a-coin-tossed-900-times-and-comes-up-heads-490-times-is-bi
- https://ipython-books.github.io/72-getting-started-with-statistical-hypothesis-testing-a-simple-z-test/