# Hypothesis Testing

## An assumption, a sample, and an important question

We have a set of assumptions about the population. With these assumptions, we suppose the population has a certain shape. The assumed population must then also have a certain parameter.

In contrast to the intangible assumed population and parameter, we have actual data from an observed sample and its corresponding sample statistic. Remember that, in general, a sample will resemble the population it was drawn from. But once we actually visualize our sample, it might so happen that our observed sample statistic differs substantially from the assumed parameter...

![](../images/assumed-vs-observed.png)

The question is simple: *could this observed sample statistic have really come from the assumed population?*

If not, there could be some serious repercussions -- chances are that our previous assumptions about the population this sample came from are wrong! We would need to move to a new set of beliefs about the population.

Otherwise, if it looks like the observed statistic could have feasibly come from the assumed population, then we don't have enough evidence to abandon the assumptions. We can continue operating under those beliefs (for now at least).

\`\`\`{note}

In statistics, these two competing answers to our question give rise to the formal definitions of the *null hypothesis* and *alternate hypothesis*:

- The {dterm}`null hypothesis`, denoted $H_0$, is the hypothesis that our assumptions about the population are valid. The null states that the population parameter is truly equal to the assumed value.
    $$H_0: \text{population parameter} = x$$
    Under this hypothesis, the only reason we managed to observe a sample whose statistic wasn't equal to this parameter value is *wholly due to random chance*.
    
- The {dterm}`alternate hypothesis`, denoted $H_1$, contradicts the null by stating that the true population parameter is *not* equal to the assumed value.
    $$H_1: \text{population parameter} \neq x$$
    or
    $$H_1: \text{population parameter} < x$$
    or
    $$H_1: \text{population parameter} > x$$
    This hypothesis stipulates that there was some reason *other than randomness* that we observed a sample whose statistic wasn't equal to the parameter value -- namely, the population the sample was drawn from doesn't actually have that parameter value.
\`\`\`
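
To make the notation concrete, here is a purely hypothetical illustration (the coin and its numbers are made up, not taken from any data in this chapter). Suppose we assume a coin is fair, so the assumed parameter is the probability of heads $p = 0.5$, and we then observe 62 heads in 100 flips. The two hypotheses and the observed statistic would be written as:

$$H_0: p = 0.5$$

$$H_1: p \neq 0.5$$

$$\hat{p} = \frac{62}{100} = 0.62$$

Here $\hat{p}$ denotes the observed sample statistic. The question is whether a gap of $0.12$ between $\hat{p}$ and the assumed $p$ is plausible under $H_0$.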

But how do we decide whether or not to abandon the null using unwavering, quantitative methods, rather than loose judgements? By harnessing the power of probabilities, of course!

## Simulating our sample

As you know by now, the way we data scientists answer uncertain questions is with a {dterm}`probability`. So, hopefully we can find a way to say "there's a $p\%$ chance the sample statistic could have been generated by sampling from this population."

We already know how to do this! We can use our simulation skills to literally just simulate a ton of random samples from the assumed population, then calculate the probability that these samples exhibit statistics like the one we observed.

![We draw a bunch of new samples from the assumed population. Each sample in turn generates a sample statistic.](../images/sample-from-population.png)
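
The simulation described above can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes a hypothetical population in which each draw is a "success" with probability `assumed_p` (think coin flips), it uses the sample proportion as the statistic, and it reads "statistics like the one we observed" as "at least as far from the assumed parameter as the observed statistic". The function names and the values 0.5, 0.62, 100, and 10,000 are made up for illustration.

```python
import random


def simulate_statistics(assumed_p, sample_size, num_samples, seed=0):
    """Draw many samples from the assumed population; return each sample's proportion."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    statistics = []
    for _ in range(num_samples):
        # One simulated sample: each draw is a success with probability assumed_p.
        sample = [rng.random() < assumed_p for _ in range(sample_size)]
        statistics.append(sum(sample) / sample_size)
    return statistics


def proportion_at_least_as_far(statistics, assumed_p, observed_statistic):
    """Fraction of simulated statistics at least as far from assumed_p as the observed one."""
    observed_distance = abs(observed_statistic - assumed_p)
    tolerance = 1e-9  # guards against floating-point ties being missed
    hits = sum(
        1 for s in statistics if abs(s - assumed_p) >= observed_distance - tolerance
    )
    return hits / len(statistics)


# Hypothetical setup: assumed parameter 0.5, observed sample proportion 0.62.
assumed_p = 0.5
observed_statistic = 0.62

simulated = simulate_statistics(assumed_p, sample_size=100, num_samples=10_000)
estimate = proportion_at_least_as_far(simulated, assumed_p, observed_statistic)
print(f"Share of simulated samples at least as extreme as observed: {estimate:.4f}")
```

A small share means the assumed population rarely produces a statistic like ours. A large share means our observation is nothing unusual under the assumptions.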

- If $H_0$ is true, then some of these simulated samples should have randomly produced sample statistics that look like the one we originally observed.
    - Remember, under $H_0$ this happens *due to random chance alone*.
- Let's take a look by drawing the {dterm}`sampling distribution` of the statistics from these simulated samples.

![](../images/simulated-statistics.png)
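
One rough way to inspect a sampling distribution without a plotting library is to tally the simulated statistics and print a text histogram. The sketch below is illustrative only: it assumes a hypothetical coin-flip style population with `assumed_p = 0.5`, and the helper `text_histogram`, the sample size of 50, and the 2,000 repetitions are made-up choices rather than anything prescribed by this chapter.

```python
import random
from collections import Counter

rng = random.Random(1)  # seeded so the sketch is reproducible
assumed_p, sample_size, num_samples = 0.5, 50, 2_000

# Tally the number of successes in each simulated sample.
# Counts are kept as integers so that equal statistics land in the same bin exactly.
counts = Counter(
    sum(rng.random() < assumed_p for _ in range(sample_size))
    for _ in range(num_samples)
)


def text_histogram(counts, sample_size, scale=10):
    """Return one line per observed statistic; each '#' stands for `scale` samples."""
    lines = []
    for successes in sorted(counts):
        proportion = successes / sample_size
        bar = "#" * (counts[successes] // scale)
        lines.append(f"{proportion:.2f} | {bar}")
    return lines


for line in text_histogram(counts, sample_size):
    print(line)
```

The bars should pile up around the assumed parameter and thin out toward the tails, which is where an unusual observed statistic would fall.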