# The Multiple Testing Problem
So far, we have defined a framework for testing hypotheses using contrasts. We have seen how to turn these contrasts into maps of $t$-statistics, and how to threshold these maps using their associated $p$-values at $p < 0.05$. Although it may seem like we are essentially done, those of you with some statistical knowledge may have already spotted a big issue known as the *multiple testing problem*.

## The General Multiple Testing Problem
To understand the multiple testing problem, consider what a significance threshold of 5% actually means. In the world of frequentist inference, if the null hypothesis were true and we repeated our experiment many times, performing the same hypothesis test each time, we would expect a significant $p$-value in only 5% of those repetitions. A small $p$-value therefore means that either something rare has happened, or the null hypothesis is false.
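To make the "5% of the time" statement concrete, here is a small simulation sketch. It assumes the simplest possible test rather than the GLM contrasts used elsewhere: a two-sided one-sample $z$-test on normally distributed data with known unit variance and a true mean of zero, so the null hypothesis holds in every simulated experiment. The function names and sample sizes are illustrative choices, not part of any neuroimaging package.

```python
import math
import random

def two_sided_z_pvalue(sample):
    """Two-sided p-value for H0: mean = 0, assuming known unit variance."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)  # standard error of the mean is 1/sqrt(n)
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_null_rejection_rate(n_experiments=20000, n_obs=20, alpha=0.05, seed=0):
    """Fraction of simulated experiments with p < alpha when the null is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        # Data generated with a true mean of 0, i.e. the null hypothesis is true
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        if two_sided_z_pvalue(sample) < alpha:
            rejections += 1
    return rejections / n_experiments

rate = simulate_null_rejection_rate()
print(f"Proportion of significant results under the null: {rate:.3f}")
```

The printed proportion should land close to 0.05, but not exactly on it; the precise value depends on the random seed and the number of simulated experiments.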

This is fine for a *single* hypothesis test, but if we kept repeating the experiment over and over again, those rare events would suddenly not be so rare anymore. To see this, assume that the null hypothesis is true and that the tests are independent. The probability that at least one of the repeated tests is significant is then given by

$$
\text{FWER} = 1 - (1 - \alpha)^{m}.
$$

This is known as the *family-wise error rate*, where $\alpha$ is the significance level and $m$ is the number of tests. If we only conduct 1 test with $\alpha = 0.05$, then 

$$
\text{FWER} = 1 - (1 - 0.05)^{1} = 1 - 0.95 = 0.05,
$$

as we would expect. However, if we conducted 10 tests then we get

$$
\text{FWER} = 1 - (1 - 0.05)^{10} = 1 - 0.60 = 0.40.
$$
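The formula is easy to evaluate for any number of tests. The sketch below simply codes up $1 - (1 - \alpha)^m$; like the formula, it assumes that all $m$ tests are independent and that every null hypothesis is true. The function name `family_wise_error_rate` is a hypothetical helper chosen for illustration.

```python
def family_wise_error_rate(alpha, m):
    """Probability of at least one false positive among m independent tests
    when every null hypothesis is true: 1 - (1 - alpha)^m."""
    return 1.0 - (1.0 - alpha) ** m

# How quickly the FWER grows as the number of tests increases
for m in (1, 10, 100, 1000):
    print(f"m = {m:>4}: FWER = {family_wise_error_rate(0.05, m):.3f}")
```

For $m = 10$ this reproduces the value of roughly 0.40 from the worked example, and by the time we reach hundreds of tests a false positive becomes almost certain.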

So suddenly, the chance of at least one of those tests being significant is 40%, not the 5% that we wanted. Remember, this is the probability of significance *when the null is true*. So even if there is no effect, there is suddenly a 40% chance of declaring a result significant and thus making a [Type I error](https://en.wikipedia.org/wiki/Type_I_and_type_II_errors).
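We can also check the 40% figure empirically rather than from the formula. This Monte Carlo sketch relies on a standard fact: when the null hypothesis is true and the test statistic is continuous, the $p$-value is uniformly distributed on $[0, 1]$. It therefore draws $p$-values directly instead of simulating data, and it assumes the $m$ tests are independent. The name `simulate_fwer` and its defaults are illustrative.

```python
import random

def simulate_fwer(m=10, alpha=0.05, n_families=100000, seed=1):
    """Estimate the chance of at least one significant result among m null tests.

    Assumes the m tests are independent and that, under the null, each
    p-value is uniformly distributed on [0, 1]."""
    rng = random.Random(seed)
    families_with_false_positive = 0
    for _ in range(n_families):
        p_values = [rng.random() for _ in range(m)]
        if min(p_values) < alpha:  # at least one test declared "significant"
            families_with_false_positive += 1
    return families_with_false_positive / n_families

estimate = simulate_fwer()
print(f"Simulated FWER for 10 null tests: {estimate:.3f}")
```

The estimate should sit near 0.40, even though no test in any simulated family has a real effect behind it.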

```{admonition} Advanced: Understanding the Probability of Multiple Testing
:class: dropdown

To understand the probability behind the multiple testing problem, imagine rolling multiple dice simultaneously. The probability of rolling a 6 with a single die is $1/6 \approx 16.7\%$. The probability of at least one die showing a 6 when rolling multiple dice is given by

$$
P(6) = 1 - (1-0.167)^{m}, 
$$

where $m$ is the number of dice. If we were to roll 2 dice, this gives

$$
P(6) = 1 - (1-0.167)^{2} = 1 - 0.69 = 0.31, 
$$

meaning our chance of getting at least one 6 has increased from $\approx 16.7\%$ to $\approx 31\%$.

```
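For readers who opened the dropdown above, the dice calculation can be checked exactly using Python's standard `fractions` module, which avoids rounding $1/6$ to $0.167$. The helper name `prob_at_least_one_six` is hypothetical, and the sketch assumes fair, independent dice.

```python
from fractions import Fraction

def prob_at_least_one_six(m):
    """Exact probability that at least one of m fair dice shows a 6."""
    return 1 - Fraction(5, 6) ** m

for m in (1, 2, 5, 10):
    p = prob_at_least_one_six(m)
    print(f"{m:>2} dice: {p} (approx. {float(p):.3f})")
```

With two dice the exact answer is $11/36 \approx 0.306$, which agrees with the rounded value of $0.31$ in the dropdown.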