# The Need for Statistical Modelling

## Why Do We Want to Model fMRI Data?

To understand the need for statistical modelling, consider an fMRI dataset of dimensions 50 $\times$ 60 $\times$ 50. A single volume alone contains 150,000 voxels. With a TR of 2 seconds and a scan lasting only 10 minutes, we would collect 300 volumes. This gives a total of 45,000,000 data points from this basic fMRI dataset. In part, the aim of statistical modelling is to help us make sense of this huge amount of data. In this sense, modelling the data is a means of condensing all these values into a smaller set of values that provide information on the magnitude of the experimental effects.
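The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are made up for this example, and the TR is assumed to be in seconds.

```python
# Dimensions of the example dataset described above
nx, ny, nz = 50, 60, 50
tr_seconds = 2       # repetition time (TR), assumed to be in seconds
scan_minutes = 10    # total scan duration

# Number of voxels in one volume
voxels_per_volume = nx * ny * nz

# One volume is collected every TR, so divide the scan length by the TR
n_volumes = (scan_minutes * 60) // tr_seconds

# Every voxel is measured once per volume
total_data_points = voxels_per_volume * n_volumes

print(f"Voxels per volume: {voxels_per_volume:,}")
print(f"Volumes collected: {n_volumes:,}")
print(f"Total data points: {total_data_points:,}")
```

Changing the dimensions, TR or scan length shows how quickly the number of data points grows for larger or longer acquisitions.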

A related point is that statistical modelling allows us to separate the effects of interest from the noise. As we know, the variation in the BOLD signal can come from a variety of sources. These include subject motion and scanner noise, as well as true changes associated with our experiment. A statistical model allows us to separate out these effects so that we can focus only on those of interest and discount any sources of variation not associated with our experiment, as illustrated in {numref}`modelling-fig`.

```{figure} images/modelling.png
---
width: 800px
name: modelling-fig
---
Illustration of how a statistical model splits a dataset into the effects of interest and error.
```

A model also allows us to quantify our degree of certainty about the experimental effects given the amount of noise in the data. This certainty can be combined with the magnitudes of the experimental effects to form statistics that can be used to answer questions about which brain regions are associated with our experimental task, as well as how those regions change across different experimental conditions. All of this allows us to simplify our data down into the parts we are most interested in and reach conclusions about the association between brain responses and our experiment.
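To make these ideas concrete, the sketch below simulates a single voxel's time series and fits a simple model to it. Everything here is an assumption made for illustration: the on/off task timing, the baseline, the effect size and the noise level are invented, the task regressor is a deliberately simplified box-car, and the noise is assumed to be independent across time points. It is not how any particular software package implements its analysis. The point is only to show a time series being split into a fitted part and an error part, and the effect estimate being combined with its uncertainty to form a statistic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical experiment: 300 volumes, alternating 10 rest / 10 task
n_volumes = 300
task = np.tile(np.r_[np.zeros(10), np.ones(10)], n_volumes // 20)

# Simulated data = baseline + effect of interest + noise (all values invented)
baseline = 100.0
true_effect = 2.0
noise = rng.normal(scale=3.0, size=n_volumes)
signal = baseline + true_effect * task + noise

# Model: signal = baseline + effect * task + error
X = np.column_stack([np.ones(n_volumes), task])
beta, *_ = np.linalg.lstsq(X, signal, rcond=None)

# Split the data into the part explained by the model and the error
fitted = X @ beta
residuals = signal - fitted

# Quantify uncertainty: standard error of the effect estimate
dof = n_volumes - X.shape[1]
sigma2 = residuals @ residuals / dof
se_effect = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

# Combine magnitude and certainty into a single statistic
t_stat = beta[1] / se_effect

print(f"Estimated effect: {beta[1]:.2f} (simulated true value {true_effect})")
print(f"Standard error:   {se_effect:.2f}")
print(f"t statistic:      {t_stat:.2f}")
```

Increasing the noise level in this simulation leaves the expected size of the effect unchanged but increases the standard error, which in turn shrinks the statistic. This is the sense in which the statistic reflects both the magnitude of the effect and our certainty about it.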

```{note}
The term *model* can be thought of in a similar fashion to its everyday use. In much the same way that a *model* is a simplified replication of something, a statistical model is a simplification of reality. In this sense, a model is not designed to create an *exact* prediction of the phenomena under study. Instead, a good model captures the general patterns in the relationships between the variables of interest, producing a prediction that is *close enough* to have some utility. As the statistician George Box stated: "All models are wrong, but some are useful". 
```

## How Do We Model fMRI Data?

Recall that each voxel from a single volume in an fMRI dataset represents a single point of a time series. This time series reflects how the measured signal changed during the course of our experiment. As illustrated in {numref}`timeseries-fig`, there is a time series at every voxel in an fMRI dataset.

```{figure} images/time-series-everywhere.png
---
width: 800px
name: timeseries-fig
---
Illustration of how each voxel of an image is associated with a time series of BOLD signal change.
```

In the Statistical Parametric Mapping approach to modelling fMRI data, we fit a statistical model to each one of these time series separately. The estimated parameters from those models then provide us with some indication of the magnitude and direction of our experimental effects. The tool that SPM uses to do this is known as the General Linear Model (GLM). The GLM is one of the most important concepts you will learn about in this course and will be the main focus of this lesson and, in many ways, this whole module.
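As a rough preview of what "fitting a model to each time series separately" means in practice, the sketch below applies the same simple model to every voxel of a tiny simulated dataset. This is a minimal illustration in Python with NumPy, not SPM's implementation: the dataset size, the task timing and the voxel given an effect are all hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny hypothetical 4D dataset with axes (x, y, z, time)
shape = (4, 5, 3)
n_volumes = 100
data = rng.normal(loc=100.0, scale=2.0, size=shape + (n_volumes,))

# Hypothetical on/off task: alternating 10 rest / 10 task volumes
task = np.tile(np.r_[np.zeros(10), np.ones(10)], n_volumes // 20)

# Add a task effect to a single voxel so there is something to find
data[1, 2, 1, :] += 5.0 * task

# The same model (baseline + task) is used at every voxel
X = np.column_stack([np.ones(n_volumes), task])

# Rearrange so that each column holds one voxel's time series: (time, voxels)
Y = data.reshape(-1, n_volumes).T

# Least squares treats each column of Y as its own independent fit
betas, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Put the task-effect estimates back into the shape of the image
effect_map = betas[1].reshape(shape)

peak = np.unravel_index(np.argmax(effect_map), shape)
print("Voxel with the largest estimated effect:", peak)
```

The result is an image of parameter estimates, with one value per voxel, which is the kind of map the rest of this lesson builds towards.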