# Solutions to the Multiple Testing Problem
As mentioned at the end of the previous section, the multiple testing problem presents a large barrier to using the approach we have discussed so far. Luckily, there are some solutions available. In this section we will discuss the three main approaches you will need when working with `SPM`.

## Solution 1: A Stricter Threshold
The first approach we might consider is to simply use a stricter threshold than 5%. Typically, in imaging, the most liberal threshold we can get away with is $p < 0.001$. By using this threshold, we reduce the number of false positives to a worst-case scenario of 0.1% of the voxels. For an image of 100,000 voxels, this would mean 100 false positives, for an image of 50,000 voxels this would mean 50 false positives, and so on. Although this is still not great, it is nowhere near as bad as the traditional 5% threshold, which would give 5,000 and 2,500 false positives respectively. The advantage is that this approach retains good sensitivity, because we should still be detecting the majority of true effects. This is exemplified in {numref}`poldrack-uncorr-thresh-again-fig`, repeated from the previous section. Although there are lots of results outside the yellow circle, importantly we have also identified the majority of true positives inside the circle.

```{figure} images/poldrack-uncorr-thresh.png
---
width: 800px
name: poldrack-uncorr-thresh-again-fig
---
Visualisation of thresholding an image that contains true activations (within the circle) and noise (outside the circle) using $p < 0.10$. Notice that, despite the false positives, the sensitivity for detecting effects *within* the circle is very high.
```
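
To make the arithmetic behind an uncorrected threshold concrete, here is a minimal Python sketch. It assumes the worst case where every voxel is null, so the expected number of false positives is just the number of voxels multiplied by the threshold. The function name `expected_false_positives` is ours, purely for illustration, and is not part of `SPM`.

```python
def expected_false_positives(n_voxels: int, alpha: float) -> float:
    """Expected number of false positives if every voxel is null (worst case)."""
    return n_voxels * alpha


# Compare the traditional 5% threshold with the stricter p < 0.001
for n_voxels in (100_000, 50_000):
    for alpha in (0.05, 0.001):
        count = expected_false_positives(n_voxels, alpha)
        print(f"{n_voxels:>7,} voxels at p < {alpha}: about {count:,.0f} false positives")
```

The count scales linearly with both the number of voxels and the threshold, so a threshold that is 50 times stricter gives 50 times fewer expected false positives.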

## Solution 2: Control the Family-wise Error (FWE)
An alternative to a stricter threshold is to try and control the family-wise error rate (FWER) directly. By doing so, we can limit the probability of *any* false positives. If we control the FWER at 5%, there is only a 5% chance of any false positives in the image. This means that, after correction, we can be 95% confident that there are no false positives at all in our image. If we were to repeat our experiment multiple times, and correct our images each time with an FWER procedure, we would only expect 5% of those repeats to contain *any* false positives. So, most of the time we can be pretty confident that any results that survive an FWE-correction procedure are *true positives*. This is illustrated in {numref}`poldrack-fwe-thresh-fig` from [Poldrack, Mumford and Nichols (2011)](https://www-cambridge-org.manchester.idm.oclc.org/core/books/handbook-of-functional-mri-data-analysis/8EDF966C65811FCCC306F7C916228529#), where controlling the FWER at 10% leads to only 1/10 repeats showing any false positives (the second-to-last example, if you are having trouble seeing it). However, we should also note what this has done to our ability to see the true positives within the yellow circle.

```{figure} images/poldrack-fwe-thresh.png
---
width: 800px
name: poldrack-fwe-thresh-fig
---
Visualisation of thresholding an image that contains true activations (within the circle) and noise (outside the circle) using an FWE-corrected $p < 0.10$. Notice that only 10% of the repeats contain *any* false positives.
```
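
The idea of repeating the experiment many times can be checked with a small simulation. The sketch below is an illustration under simplifying assumptions, not a description of how `SPM` works: each voxel's $p$-value in a null image is drawn independently from a uniform distribution, the image has only 100 voxels to keep things fast, and the function names are hypothetical. The FWER is estimated as the proportion of simulated null images that contain at least one false positive.

```python
import random


def any_false_positive(n_voxels, threshold, rng):
    """Simulate one null image; True if at least one voxel passes the threshold."""
    # Under the null hypothesis, each p-value is uniform on [0, 1)
    return any(rng.random() < threshold for _ in range(n_voxels))


def estimate_fwer(n_voxels, threshold, n_repeats=2000, seed=0):
    """Proportion of simulated null images containing any false positive."""
    rng = random.Random(seed)
    hits = sum(any_false_positive(n_voxels, threshold, rng) for _ in range(n_repeats))
    return hits / n_repeats


# Assumed toy settings: 100 independent voxels, two per-voxel thresholds
print("Per-voxel p < 0.05:  ", estimate_fwer(100, 0.05))
print("Per-voxel p < 0.0005:", estimate_fwer(100, 0.0005))
```

In this toy setting, the 5% per-voxel threshold produces at least one false positive in nearly every null image, whereas the much stricter per-voxel threshold brings that proportion down to roughly 5%.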

```{tip}
When writing up your analyses using Microsoft Word, be aware of the fact that FWE will often get auto-corrected to "FEW". Please keep an eye on this as we get many submissions every year that talk about "FEW correction in SPM".
```

### Bonferroni Correction
One of the simplest ways of applying an FWE-correction is to use a Bonferroni procedure. Here, we simply create a new threshold by dividing the old threshold by the number of comparisons. So, for our image of 100,000 voxels, we have

$$
\alpha_{\text{BONF}} = \frac{0.05}{m} = \frac{0.05}{100000} = 0.0000005.
$$

Now, we only count voxels where $p < 0.0000005$ as significant. Hopefully it is clear just from this example how *strict* this approach is. It will indeed control the FWER. However, the Bonferroni correction is best suited to cases where our tests are independent of one another, meaning that the value of one test is not connected in any way to the value of another. When tests are correlated, it still controls the FWER, but it becomes more conservative than it needs to be. In fMRI, our tests do have a degree of correlation, because tests in neighbouring voxels are likely to be very similar. This is where the concept of multiple comparisons becomes a bit hazy. If two tests are perfectly correlated, does it still count as two comparisons or one? If those two tests are correlated, but not perfectly, how many tests does that count as? All of this is to say that if our tests are correlated then there should be some way of having a less severe correction, because the effective number of tests is not equivalent to the number of voxels.
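
As a rough check on these numbers, the sketch below computes the Bonferroni threshold and the FWER it would give if all tests were null and fully independent, using $1 - (1 - \alpha)^m$. That independence formula and the function names are assumptions made for illustration; real fMRI voxels are correlated, which is exactly the complication described above.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that controls the FWER at alpha."""
    return alpha / n_tests


def fwer_independent(per_test_alpha: float, n_tests: int) -> float:
    """FWER when all tests are null and mutually independent."""
    return 1.0 - (1.0 - per_test_alpha) ** n_tests


m = 100_000  # number of voxels, as in the example above
threshold = bonferroni_threshold(0.05, m)

print("Bonferroni threshold:        ", threshold)
print("FWER, uncorrected p < 0.05:  ", fwer_independent(0.05, m))
print("FWER, Bonferroni-corrected:  ", fwer_independent(threshold, m))
```

Under independence the corrected FWER sits just below 5%, which is what we want. At the other extreme, if all tests were perfectly correlated they would behave like a single test, so the chance of any false positive would be only the per-voxel threshold itself, far below the 5% we were aiming for.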

### Random Field Theory Correction
A solution to this issue that allows us to take the degree of correlation in an image into account is given by the application of something called *random field theory* (RFT). This was the crowning achievement of the SPM developers and is one of the most significant additions to the world of neuroimaging. However, there is no getting around the fact that it is *complicated*. In fact, you could go as far as to say that many of the people who work with fMRI data do not really understand how this works. By and large, they just assume that it does work.

It is useful to have some sense of RFT correction, which is explained more fully in the advanced drop-down below. The simple explanation is that this method is able to quantify the degree of correlation in an image by calculating its *smoothness*. This can then be used to redefine the image in terms of *resolution elements* (known as RESELs), which quantify the resolution of the image in terms of independent units. These RESELs are then used to determine the threshold needed to achieve FWER = 5%, using something known as the *Euler characteristic* (EC), which quantifies the expected number of clusters under the null hypothesis. Practically speaking, however, you do not need to know how this works to use `SPM`. If you are interested, read the drop-down below and consult the paper by [Nichols & Hayasaka (2003)](https://pubmed.ncbi.nlm.nih.gov/14599004/).

````{admonition} Advanced: How Does Random Field Theory Correction Work?
:class: dropdown

To understand random field theory in more detail, we start by breaking the process into 3 steps:
- Estimate the smoothness of an image to quantify the correlation between voxels
- Enter that smoothness value into an equation to compute the expected Euler characteristic at different thresholds
- Find the threshold where the Euler characteristic tells you that only 5% of equivalent images would be expected to show at least one result. This becomes your multiple comparison correction threshold.
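
The last two steps can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the `SPM` implementation: it takes the number of RESELs as a given input (skipping the smoothness estimation in the first step), it assumes a 2D Gaussian ($Z$) image, and it uses the commonly quoted 2D approximation $E[EC] = R \, (4 \ln 2) \, (2\pi)^{-3/2} \, z \, e^{-z^2/2}$, where $R$ is the number of RESELs and $z$ is the threshold. The function names and the value of 100 RESELs are hypothetical.

```python
import math


def expected_ec_2d(z: float, resels: float) -> float:
    """Approximate expected Euler characteristic of a smooth 2D Gaussian
    field thresholded at z (assumed formula, see the text above)."""
    return resels * (4 * math.log(2)) * (2 * math.pi) ** -1.5 * z * math.exp(-z**2 / 2)


def rft_threshold_2d(resels: float, alpha: float = 0.05, tol: float = 1e-6) -> float:
    """Find the z threshold where the expected EC equals alpha by bisection.

    The expected EC decreases with z once z > 1, so we search between 1 and 10.
    This assumes the expected EC at z = 1 is above alpha.
    """
    lo, hi = 1.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_ec_2d(mid, resels) > alpha:
            lo = mid  # still too many expected clusters: raise the threshold
        else:
            hi = mid
    return (lo + hi) / 2


resels = 100  # hypothetical resel count for a 2D image
for z in (2.5, 3.0, 3.5):
    print(f"z = {z}: expected EC = {expected_ec_2d(z, resels):.3f}")
print(f"Threshold for expected EC of 0.05: z = {rft_threshold_2d(resels):.2f}")
```

At high thresholds the expected EC is small and approximates the probability of seeing at least one cluster in a null image, which is why setting it to 0.05 gives an FWE-corrected threshold. For 100 RESELs, this approximation gives a threshold close to $z = 3.8$.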

To unpack these steps, let us examine an image in 2D. Imagine an image that was just pure noise (i.e. a null image). If the values in that image were drawn from a Gaussian distribution, it might look something like the *left* of {numref}`smoothed-grf-fig`. As it stands, this is not a good representation of a null fMRI image, because we would expect a degree of correlation between neighbouring voxels (even if there was nothing going on). So let us apply some smoothing to the image to create that correlation. This leads to the image on the *right* of {numref}`smoothed-grf-fig`.

```{figure} images/smoothed-grf.png
---
width: 800px
name: smoothed-grf-fig
---
Illustration of an un-smoothed (*left*) and smoothed (*right*) Gaussian random field.
```
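
The effect of smoothing on correlation can be demonstrated with a short simulation. The sketch below is illustrative only and is not how the figure above was produced: it generates a 64 x 64 image of independent Gaussian noise, smooths it with a simple separable Gaussian kernel that wraps around at the edges, and compares the correlation between horizontally neighbouring voxels before and after. The image size, kernel width and helper names are all assumptions.

```python
import math
import random


def gaussian_kernel(sigma, radius):
    """Normalised 1D Gaussian weights from -radius to +radius."""
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_rows(image, kernel):
    """Convolve each row with the kernel, wrapping around at the edges."""
    radius = len(kernel) // 2
    n = len(image[0])
    return [
        [sum(w * row[(j + k - radius) % n] for k, w in enumerate(kernel)) for j in range(n)]
        for row in image
    ]


def transpose(image):
    return [list(col) for col in zip(*image)]


def smooth_2d(image, sigma):
    """Separable Gaussian smoothing: rows first, then columns."""
    kernel = gaussian_kernel(sigma, radius=int(3 * sigma))
    rows_done = smooth_rows(image, kernel)
    return transpose(smooth_rows(transpose(rows_done), kernel))


def neighbour_correlation(image):
    """Pearson correlation between each voxel and its right-hand neighbour."""
    xs = [row[j] for row in image for j in range(len(row) - 1)]
    ys = [row[j + 1] for row in image for j in range(len(row) - 1)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)
size = 64
noise = [[rng.gauss(0, 1) for _ in range(size)] for _ in range(size)]
smoothed = smooth_2d(noise, sigma=2.0)

print("Neighbour correlation, un-smoothed:", round(neighbour_correlation(noise), 3))
print("Neighbour correlation, smoothed:   ", round(neighbour_correlation(smoothed), 3))
```

The un-smoothed noise should show a neighbour correlation close to zero, while the smoothed image should show a strong positive correlation. This is the property that random field theory exploits when it summarises an image in terms of its smoothness.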


````

## Solution 3: Control the False Discovery Rate (FDR)


## Choosing a Threshold
