> The goal of statistical inference is to make generalizations about the population when only a sample is available.


## Probability Distribution

In inference, we use an estimate $\hat{f}$ that predicts $Y$ in order to understand the relationship between the response $Y$ and each predictor $X_1, X_2, \dots, X_p$.


A population variable can take many values, some more likely than others. Its **probability distribution** describes how likely each value is, and a numerical feature of interest of that distribution, such as its mean, is called a **parameter**. We treat the data we collect as **randomly generated** from this probability distribution.

We draw our conclusions about the parameter from the sample **statistic**. Three important limitations:
+ Because a sample is only part of the population, the numerical value of the statistic will not be the exact value of the parameter.
+ The observed value of the statistic depends on the particular sample selected.
+ Some variability in the values of a statistic, over different samples, is unavoidable.
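
The three limitations above can be illustrated with a small simulation. This is only a sketch: the population (normal, mean 50, standard deviation 10), the sample sizes, and the helper `draw_sample_mean` are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: normal with mean 50 (the parameter) and std dev 10.
POPULATION_MEAN = 50.0
POPULATION_SD = 10.0


def draw_sample_mean(n):
    """Draw one random sample of size n and return its mean (the statistic)."""
    sample = [random.gauss(POPULATION_MEAN, POPULATION_SD) for _ in range(n)]
    return statistics.fmean(sample)


# Five different samples give five different values of the statistic,
# none of them exactly equal to the parameter.
sample_means = [draw_sample_mean(30) for _ in range(5)]
for i, mean in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {mean:.2f}")

# Spread of the statistic over many samples, for two sample sizes.
spread_small = statistics.stdev([draw_sample_mean(30) for _ in range(2000)])
spread_large = statistics.stdev([draw_sample_mean(300) for _ in range(2000)])
print(f"std dev of the sample mean, n=30:  {spread_small:.2f}")
print(f"std dev of the sample mean, n=300: {spread_large:.2f}")
```

The variability never disappears, but in this setup it shrinks as the sample grows.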

## Hypothesis Testing

More formally, understanding relationships between response and predictors is about testing two opposite hypotheses:
1. null hypothesis $H_0$: there is no relationship between a predictor and a response.
2. alternative hypothesis: there is a relationship.

### Experimental Design

The first step of experimental design is to formulate the null hypothesis, which we assume to be true. Next, we run an experiment to test whether the data are consistent with it:
+ collect data from a sample of predetermined size _(see [Statistical Power](#Statistical-Power) below)_.
+ perform the appropriate statistical test.

Based on the experimental results, we can either reject or fail to reject $H_0$. If we reject it, we say that the data support the alternative hypothesis.


### Test Distribution

A statistical test calculates a test statistic from the sample data. Under the null hypothesis, this statistic follows a specific, known probability distribution, which lets us compute the probability of observing results at least as extreme as the experimental ones if the null hypothesis were true.
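
To make the idea of a null distribution concrete, the sketch below simulates many samples for which $H_0$ is true and checks that the resulting statistic behaves like a standard normal variable. The z statistic, the known standard deviation, and the constants `MU0`, `SIGMA`, and `N` are assumptions made for this illustration; other tests use other statistics and distributions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical setup: null mean, known standard deviation, sample size.
MU0, SIGMA, N = 0.0, 1.0, 25


def z_statistic(sample):
    """Standardized distance between the sample mean and the null mean."""
    return (fmean(sample) - MU0) / (SIGMA / sqrt(len(sample)))


# Generate many samples for which H0 is true and record the statistic each time.
z_values = [
    z_statistic([random.gauss(MU0, SIGMA) for _ in range(N)])
    for _ in range(10_000)
]

# If the statistic is standard normal under H0, roughly 5% of the values
# should fall beyond the 97.5% quantile in absolute value.
threshold = NormalDist().inv_cdf(0.975)
tail_fraction = sum(abs(z) > threshold for z in z_values) / len(z_values)
print(f"fraction with |z| > {threshold:.2f}: {tail_fraction:.3f}")
```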


### P-Value

We reject the null hypothesis if the p-value, the probability of observing results at least as extreme as the experimental ones under its assumption, is very small. The cutoff probability is called the level of significance $\alpha$ and is typically 5%.

More specifically, we measure the probability that our sample(s) produce such a test statistic or one more extreme under the $H_0$ probability distribution. A low p-value means that the observed data would be unlikely if $H_0$ described the population, so we reject the null hypothesis. Note that the p-value is not the probability that $H_0$ is true.

+ $P\leq\alpha$: we reject the null hypothesis. The observed effect is statistically significant.
+ $P\gt\alpha$: we fail to reject the null hypothesis. The observed effect is not statistically significant.
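
As a minimal sketch of this decision rule, the example below computes a two-sided p-value for a one-sample z-test. The data, the null mean of 100, the known standard deviation of 15, and the helper `z_test_p_value` are all hypothetical, chosen only to illustrate the comparison with $\alpha$.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_test_p_value(sample, mu0, sigma):
    """Two-sided one-sample z-test: return (z statistic, p-value)."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    # Probability of a statistic at least this extreme under H0,
    # where the statistic is standard normal.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical measurements; H0 states that the population mean is 100.
sample = [112, 98, 105, 120, 101, 109, 117, 95, 108, 115]
ALPHA = 0.05

z, p = z_test_p_value(sample, mu0=100, sigma=15)
decision = "reject H0" if p <= ALPHA else "fail to reject H0"
print(f"z = {z:.2f}, p-value = {p:.3f}: {decision}")
```

With these made-up numbers the sample mean is 108, yet the p-value stays above 5%, so the observed difference is not statistically significant at that level.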


### Types of Errors

There are four possible outcomes for our hypothesis testing, with two [types of errors](https://en.wikipedia.org/wiki/Type_I_and_type_II_errors):

| Decision | $H_0$ is True | $H_0$ is False |
|-------------------:|:---------------------------------:|:---------------------------------:|
| **Reject $H_0$** | **Type I error**: False Positive | Correct inference: True Positive |
| **Fail to reject $H_0$** | Correct inference: True Negative | **Type II error**: False Negative |

<br>

#### Type I error

A Type I error occurs when we incorrectly reject a true null hypothesis: the sample does come from the $H_0$ population but happens to contain extreme values. The probability of this error is equal to the level of significance $\alpha$. It is also called a False Positive: falsely stating that the alternative hypothesis is supported.

#### Type II error

A Type II error occurs when we incorrectly fail to reject a false null hypothesis; its probability is denoted $\beta$. It is also called a False Negative.


_Note: The probabilities of making these two kinds of errors are related. For a fixed sample size, decreasing the probability of a Type I error increases the probability of a Type II error._
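
The trade-off can be checked numerically. The sketch below assumes a one-sided z-test with known standard deviation, for which the Type II error probability has a closed form; the function `type_ii_error` and the effect size, standard deviation, and sample size used are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def type_ii_error(alpha, delta, sigma, n):
    """Type II error probability for a one-sided z-test of H0: mu = mu0
    when the true mean is mu0 + delta."""
    # Rejection threshold on the z scale for the chosen significance level.
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    # Mean of the z statistic when the true effect is delta.
    shift = delta * sqrt(n) / sigma
    # Probability that the statistic stays below the threshold (no rejection).
    return STD_NORMAL.cdf(z_crit - shift)


# Hypothetical setting: true effect 0.5, standard deviation 1, 20 observations.
betas = {
    alpha: type_ii_error(alpha, delta=0.5, sigma=1.0, n=20)
    for alpha in (0.10, 0.05, 0.01)
}
for alpha, beta in betas.items():
    print(f"alpha = {alpha:.2f} -> Type II error probability = {beta:.3f}")
```

In this setting, tightening $\alpha$ from 10% to 1% makes a missed effect considerably more likely.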


### Statistical Power


[Power](https://en.wikipedia.org/wiki/Statistical_power), also called sensitivity, is the probability of correctly rejecting a false $H_0$; it is equal to $1 - \beta$.

Two key things impact statistical power:
+ the effect size: a large difference between groups is easier to detect.
+ the sample size: a larger sample reduces the variability of the test statistic, so a given effect yields a smaller p-value.

Given the variance of the data $\sigma^2$ and the minimum difference to detect $\delta$, a typical formula to assess [sample size](https://en.wikipedia.org/wiki/Sample_size_determination) is:

$$N = (z_\alpha + z_\beta)^2 \times \frac{\sigma^2}{\delta^2}$$

where $z_\alpha$ and $z_\beta$ are the standard normal values exceeded with probability $\alpha$ and $\beta$, respectively (for a two-sided test, $z_{\alpha/2}$ replaces $z_\alpha$).
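
As a rough sketch, the formula can be evaluated with the standard library. The function `sample_size` and the inputs below are hypothetical; the code assumes a one-sided test and takes $z_\alpha$ and $z_\beta$ to be the $(1-\alpha)$ and $(1-\beta)$ quantiles of the standard normal distribution.

```python
from math import ceil
from statistics import NormalDist


def sample_size(alpha, beta, sigma, delta):
    """N = (z_alpha + z_beta)^2 * sigma^2 / delta^2, rounded up.

    Assumes a one-sided test, so z_alpha is the (1 - alpha) quantile
    of the standard normal distribution.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    return ceil((z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Hypothetical inputs: 5% significance, 80% power, sigma = 1, delta = 0.5.
n = sample_size(alpha=0.05, beta=0.20, sigma=1.0, delta=0.5)
print(f"required sample size: {n}")

# Halving the minimum detectable difference roughly quadruples the sample size.
n_small_effect = sample_size(alpha=0.05, beta=0.20, sigma=1.0, delta=0.25)
print(f"required sample size for half the effect: {n_small_effect}")
```

Because $\delta$ enters the formula squared, small effects are expensive to detect.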