# Problems With Using the GLM to Model fMRI Data
Although we have now seen one way of using the GLM to model fMRI data, there are several problems with the approach described in the previous section. These include issues with the shape and timing of the BOLD response, the correlation structure of time series data, the presence of low-frequency noise and the arbitrary scaling of BOLD data. Much of the early research into the mass-univariate method was concerned with identifying and providing solutions to these problems. This can be seen in a series of papers from Friston and colleagues called [Analysis of the fMRI time-series](https://onlinelibrary.wiley.com/doi/abs/10.1002/hbm.460010207), [Analysis of the fMRI time-series - revisited](https://pubmed.ncbi.nlm.nih.gov/9343589/), and [Analysis of the fMRI time-series - revisited -- again](https://pubmed.ncbi.nlm.nih.gov/9343600/). These papers demonstrate how it took a number of years to identify and solve the problems that accompany applying the GLM to fMRI data. Even to this day, some of the "solutions" used by SPM are contentious. As such, it is important to understand both the problems and their purported solutions in order to understand some of the limitations of the analysis approach we will be using going forward.

## Issue 1: The Shape and Timing of the BOLD Response
The first problem with the approach described in the previous section is the *shape* and *timing* of the predictor variables. So far, we have used a dummy variable to indicate the onset and offset of the experimental conditions. This implies an instantaneous change in signal at the start of each experimental condition, as well as an instantaneous change in signal at the end of the experimental condition. However, as we know, the BOLD response is far from instantaneous. In {numref}`hrf-fig` we can see a typical BOLD response to a stimulus at time 0. 

```{figure} images/hrf.png
---
width: 500px
name: hrf-fig
---
Illustration of a typical BOLD response to a single stimulus at time 0.
```

Notice how it takes around 6 seconds for the signal to reach its peak and around 20 seconds to come back to baseline. As such, if we use a dummy variable built from the onset times of the stimuli, we run the risk of our model not fitting the data simply because the peak BOLD response is offset by around 6 seconds. This lack of fit will lead to larger errors and a larger variance estimate. This will directly impact the magnitude of the standard errors, leading to a less sensitive model. This issue is illustrated in {numref}`bad-fit-fig`.

```{figure} images/bad-fit.png
---
width: 600px
name: bad-fit-fig
---
Illustration of how a dummy variable leads to poor correspondence with the BOLD signal given the delayed response to the stimuli.
```

One solution to this problem would be to offset the dummy variable in time to accommodate the delayed response. This would help improve the fit, but we would still not be capturing the *shape* of the actual response. In order to do so, we use the process of *convolution*, which we discussed earlier on the course. By convolving the dummy variable with a model of the hemodynamic response, we can create a much more realistic prediction of what the signal should look like. This is illustrated in {numref}`hrf-conv-fig`.

```{figure} images/hrf-convolve.png
---
width: 800px
name: hrf-conv-fig
---
Demonstration of how convolution of a hemodynamic response model (*left*) with a dummy variable (*middle*) can produce a prediction for the shape of the BOLD signal in response to the experimental manipulation (*right*).
```

This model of the hemodynamic response is known as the *hemodynamic response function* (HRF), and was derived using deconvolution methods by several authors in the mid-90s. The specific version shown in {numref}`hrf-conv-fig` is referred to as the *canonical* HRF by SPM, and was created by combining two gamma distributions. As such, this is sometimes termed the *double-gamma* HRF. We will discuss more about the justification for the HRF and the convolution operator in the *Experimental Design & Optimisation* module. For now, just consider the comparison in {numref}`conv-error-fig` of a model without convolution (*left*) and with convolution (*right*). As should hopefully be clear, the model *with* convolution provides a much better fit to the fMRI time series. 

```{figure} images/convolution-error.png
---
width: 800px
name: conv-error-fig
---
Comparison between a model without convolution (*left*) and with convolution (*right*).
```
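The convolution step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gamma shape parameters (6 and 16), the undershoot ratio (1/6), the 1-second sampling interval and the simulated voxel are all illustrative choices rather than values taken from SPM, and the helper names `gamma_pdf`, `double_gamma_hrf` and `residual_variance` are hypothetical.

```python
import math
import numpy as np

def gamma_pdf(t, shape, scale=1.0):
    """Gamma density evaluated at t (zero for t <= 0)."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = (t[pos] ** (shape - 1) * np.exp(-t[pos] / scale)
                / (math.gamma(shape) * scale ** shape))
    return out

def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1 / 6):
    """Difference of two gamma densities (parameter values are assumptions)."""
    h = gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, under_shape)
    return h / h.sum()

dt = 1.0                          # sampling interval in seconds (assumed)
n_scans = 200
t_hrf = np.arange(0, 32, dt)      # model the response over 32 seconds
hrf = double_gamma_hrf(t_hrf)
peak_time = t_hrf[np.argmax(hrf)]

# Dummy variable: blocks of 20 s "on" separated by 20 s "off"
boxcar = np.zeros(n_scans)
for onset in range(20, n_scans, 40):
    boxcar[onset:onset + 20] = 1.0

# Convolve the dummy variable with the HRF and trim to the scan length
predicted = np.convolve(boxcar, hrf)[:n_scans]

# Simulated voxel whose true response follows the convolved shape
rng = np.random.default_rng(0)
signal = 400 + 5 * predicted + rng.normal(0, 1, n_scans)

def residual_variance(regressor, y):
    """Fit constant + regressor by least squares; return the error variance."""
    X = np.column_stack([np.ones_like(regressor), regressor])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid / (len(y) - X.shape[1])

var_boxcar = residual_variance(boxcar, signal)
var_conv = residual_variance(predicted, signal)
print(f"HRF peak at {peak_time:.0f} s")
print(f"error variance: boxcar {var_boxcar:.2f}, convolved {var_conv:.2f}")
```

Because the simulated response is delayed and smoothed, the raw dummy variable leaves more unexplained variance than the convolved predictor, which is the loss of sensitivity described above.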

## Issue 2: Autocorrelation
The second problem with the approach described in the previous section is that it does not acknowledge that we are working with *time series* data. A particularly important element of the theory behind the GLM is that this method is only correct if the data in $\mathbf{Y}$ are *independent*. In other words, that there is no *correlation* between the data points. This matters because correlation has an impact on the standard errors and thus influences the magnitude of the test statistics and the calculated *p*-values. So far, we have used the GLM as if the time series contains no correlation. Unfortunately, time series data have a very specific correlation structure known as *autocorrelation*. This means that values close in time are more correlated than values far away in time. Because of this, we need some way to take this correlation structure into account to make sure the GLM calculations are accurate.

In order to accommodate the correlation in the time series, SPM performs an initial model fit at every voxel and then uses the errors to estimate the correlation structure in the data. This estimation is performed using an autoregressive model of order 1, usually shortened to AR(1). This is given by

$$
\begin{align}
\epsilon_{t} &= \rho \epsilon_{t-1} + \tau_{t} \\
\tau_{t} &\sim \mathcal{N}(0,\sigma^{2})
\end{align}
$$

Here, $\rho$ is the correlation between neighbouring errors, $\epsilon_{t-1}$ is the error one step back in time and $\tau_{t}$ is independent random noise. The main consequence of this structure is that when the covariance of two errors $n$ time points apart is calculated, you get

$$
\text{Cov}\left(\epsilon_{t},\epsilon_{t+n}\right) = \frac{\sigma^{2}}{1-\rho^{2}}\rho^{|n|}
$$

This means that the correlation gets *weaker* the further apart in time the data points are. This therefore captures a simple version of autocorrelation that can be estimated efficiently by SPM using maximum likelihood methods. However, it is notable that SPM does not estimate this correlation structure uniquely at each voxel. Instead, a pool of voxels is used to estimate a *single* correlation structure that is assumed to be the same everywhere in the brain. This is done in the name of computational efficiency, but will never be true in reality.
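The covariance formula can be checked numerically with a small simulation. This is a sketch, not a description of how SPM estimates anything: the values of `rho` and `sigma` are arbitrary illustrative choices, `sigma` is the standard deviation of $\tau_{t}$ (so its variance is $\sigma^{2}$), and the helper names are hypothetical.

```python
import numpy as np

rho, sigma = 0.4, 1.0      # illustrative values (assumed)
n = 100_000
rng = np.random.default_rng(1)

# Simulate eps_t = rho * eps_{t-1} + tau_t
tau = rng.normal(0.0, sigma, n)
eps = np.zeros(n)
eps[0] = tau[0] / np.sqrt(1 - rho**2)   # start with the stationary variance
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + tau[t]

def empirical_cov(x, lag):
    """Sample covariance between x_t and x_{t+lag}."""
    x = x - x.mean()
    return np.mean(x[:len(x) - lag] * x[lag:])

def theoretical_cov(lag):
    """sigma^2 / (1 - rho^2) * rho^|lag|, as in the formula above."""
    return sigma**2 / (1 - rho**2) * rho ** abs(lag)

for lag in range(4):
    print(lag, round(empirical_cov(eps, lag), 3), round(theoretical_cov(lag), 3))
```

The empirical and theoretical columns should agree closely and shrink towards zero as the lag increases.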

Once SPM has estimated the AR(1) model, it uses the estimated correlation structure to create a *whitening matrix* ($\mathbf{W}$) that can be used to remove the correlation from the data. SPM does this by pre-multiplying both the data and the design matrix by $\mathbf{W}$, giving

$$
\mathbf{WY} = \mathbf{WX}\boldsymbol{\beta} + \mathbf{W}\boldsymbol{\epsilon}
$$

From a practical perspective, after the whitening procedure our data and our design matrix will be different. This is because removing the correlation from the data breaks the connection between the data and the predictor variables, given that the shape of the BOLD response will change. In order to make sure our predictor variables are still accurate, they must also be adjusted by the whitening matrix. This is the reason why you will see the design matrix change colour in SPM during the course of the statistical modelling.
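A minimal sketch of whitening is given here, assuming the AR(1) correlation $\rho$ is already known. The whitening matrix is built from a Cholesky factor, which is one valid construction; SPM's own implementation may build $\mathbf{W}$ differently. The design matrix, parameter values and noise are made up for illustration.

```python
import numpy as np

n, rho = 50, 0.4           # illustrative values (assumed)

# AR(1) correlation matrix: entry (i, j) equals rho^|i - j|
idx = np.arange(n)
V = rho ** np.abs(idx[:, None] - idx[None, :])

# One possible whitening matrix: the inverse of the lower Cholesky factor
L = np.linalg.cholesky(V)
W = np.linalg.inv(L)
whitened_cov = W @ V @ W.T     # should be (numerically) the identity

# Toy design: a constant plus one slowly varying predictor
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(n), np.sin(np.linspace(0, 4 * np.pi, n))])
errors = L @ rng.normal(size=n)            # errors with correlation V
Y = X @ np.array([400.0, 3.0]) + errors

# Whiten BOTH the data and the design before estimating the parameters
WY, WX = W @ Y, W @ X
beta, *_ = np.linalg.lstsq(WX, WY, rcond=None)
print(beta)
```

Note that `WX` is no longer the same as `X`, which is the change in the design matrix described above.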

## Issue 3: Low-frequency Noise
A third issue we face with applying the GLM to fMRI data is that the time series is often contaminated by low-frequency noise, also known as signal drift. For example, the time series in {numref}`drift-fig` shows a steady increase in signal magnitude over time.

```{figure} images/drift.png
---
width: 800px
name: drift-fig
---
Illustration of signal drift caused by low-frequency noise in the fMRI time series.
```

The problem with the drift is that it will bias the parameter estimates. For instance, we may get the impression that the signal change is larger in one experimental condition simply because there were repeats closer to the end of the experiment when the signal was larger than at the start of the experiment. Unfortunately, the cause of the drift relates to the scanners themselves and thus is difficult to avoid. As such, we have to find a way of removing this drift from the data instead.

The way SPM does this is by high-pass filtering the data. This is done using a discrete cosine transform (DCT) basis set. This is essentially a series of cosines increasing in frequency up to the desired filter cutoff, as shown below. The high-pass filter can be enacted by adding the DCT basis set as extra columns in the design matrix. This works in a similar fashion to the Fourier transform, in the sense that a linear combination of periodic functions can be used to represent any signal. In this case, the cosines will act together to remove any frequency below the cutoff from the data.

In order to not clutter up the design matrix, SPM actually takes a slightly different approach (which works out the same as adding the cosines to the design matrix). What SPM does is use the cosines to form a filtering matrix (denoted $\mathbf{S}$) which can be used to pre-multiply the data and design, much in the same way as whitening.

$$
\mathbf{SY} = \mathbf{SX}\boldsymbol{\beta} + \mathbf{S}\boldsymbol{\epsilon}
$$

This removes the low frequencies from both the data and the design, using the same principles as the whitening procedure. The only element of this we need to concern ourselves with is what cutoff to use. By default, SPM chooses 128 seconds, which is equivalent to 1/128 ≈ 0.008 Hz, so any frequency below about 0.008 Hz will be removed from the data. Practically, this means we should design our experiments so that any periodic changes of interest (such as our experimental conditions) do not cycle more slowly than around once every 2 minutes, otherwise we run the risk of removing experimental signal with the filter. As we will see, there are facilities within SPM that can be used to find a suitable filter value, and this is something we will also come back to in the *Experimental Design & Optimisation* module. This filtering is also another reason why you will see the design matrix change colour during the course of the analysis in SPM.
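The filtering matrix can be sketched as follows. This is a generic DCT-based high-pass filter written under assumptions: the rule used to decide how many cosines fall below the cutoff, the function name `dct_basis`, the TR of 2 seconds and the simulated signal are all illustrative and should not be read as SPM's exact code.

```python
import numpy as np

def dct_basis(n_scans, tr, cutoff=128.0):
    """Cosine regressors with periods longer than roughly `cutoff` seconds."""
    # Number of basis functions, counting the constant (assumed rule)
    k = int(np.floor(2 * n_scans * tr / cutoff + 1))
    t = np.arange(n_scans)
    # Skip j = 0 (the constant) so that the mean signal is left untouched
    return np.column_stack([
        np.sqrt(2 / n_scans) * np.cos(np.pi * (2 * t + 1) * j / (2 * n_scans))
        for j in range(1, k)
    ])

n_scans, tr = 200, 2.0                     # assumed acquisition
X0 = dct_basis(n_scans, tr)

# Filtering matrix: project out everything the cosines can explain
S = np.eye(n_scans) - X0 @ np.linalg.pinv(X0)

time = np.arange(n_scans) * tr
drift = 0.05 * time                        # slow upward drift
task = np.sin(2 * np.pi * time / 40.0)     # effect repeating every 40 s
Y = 400 + drift + task

SY = S @ Y                                 # filtered data
print("cosines used:", X0.shape[1])
print("drift remaining:", np.std(S @ drift) / np.std(drift))
print("task remaining:", np.std(S @ task) / np.std(task))
```

In this example the drift is almost entirely removed, while the 40-second task effect passes through the filter nearly unchanged because it cycles much faster than the 128-second cutoff.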

## Issue 4: Image Scaling
The final issue that we have not considered is that the BOLD signal itself is measured on an arbitrary scale that can differ from subject to subject or scanner to scanner. The problem is that the GLM parameters are on the same scale as the data. For instance, $\beta_{1}$ gives the change in signal for a unit change in our first predictor variable. Unfortunately, this means we cannot meaningfully compare the magnitude of these parameters between subjects, because their value depends upon the arbitrary scale of the signal. To get around this, SPM will scale the data such that the mean of the data over all voxels and all volumes is equal to 100. This is known as *grand mean scaling* and will effectively put all our subjects on the same scale. This is done completely automatically in the background for you, so why do we care? Well, you have to be slightly careful about interpretation here, because the temptation might be to think that this scaling renders the parameters on some sort of standardised and interpretable scale, such as percentage signal change. However, this is not correct, and the actual procedure to convert the parameters to percentage signal change is much more involved (as explained by Pernet (2014)). So note that SPM automatically scales the data, but remember that this does not result in some standardised and easy-to-interpret metric for our effects.
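As a rough illustration of the scaling described above, the sketch below rescales two made-up datasets that were recorded on very different arbitrary scales. The array sizes, the signal values and the function name `grand_mean_scale` are hypothetical, and the code follows the description in the text rather than SPM's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data (volumes x voxels) from two subjects in arbitrary units
subject_a = rng.normal(8000, 50, size=(100, 500))
subject_b = rng.normal(350, 2, size=(100, 500))

def grand_mean_scale(data, target=100.0):
    """Multiply by one factor so the mean over all voxels and volumes is `target`."""
    return data * (target / data.mean())

scaled_a = grand_mean_scale(subject_a)
scaled_b = grand_mean_scale(subject_b)
print(scaled_a.mean(), scaled_b.mean())

# Individual voxels still have their own baselines after scaling
voxel_means = scaled_a.mean(axis=0)
print(voxel_means.min(), voxel_means.max())
```

Because a single factor is applied to the whole dataset, individual voxels will generally not have a baseline of exactly 100, which is one reason the parameters cannot be read directly as percentage signal change.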

## The Adapted GLM
We have now seen that in order to make the GLM work with fMRI data we have to convolve our dummy variables with the HRF, estimate and remove the autocorrelation in the data, filter the data to remove low-frequency noise and scale the data to put it on a common scale across all subjects. Together, these steps form an adapted version of the GLM, which can be written as

$$
\mathbf{Y}^{*} = \mathbf{X}^{*}\boldsymbol{\beta} + \boldsymbol{\epsilon}^{*}
$$

where the "star" symbol indicates the whitened and filtered versions of the data, design matrix and errors. Once the parameters have been estimated, they can be used to multiply each column to scale the prediction to best match the data, as shown below. Notice how the constant takes on the baseline level of signal (400 in this example) and that the conditions are simply changes relative to this baseline.

Adding all the scaled columns together provides our final model prediction, giving us a clear perspective on what the GLM is doing to try and separate the true signal from noise.
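The idea of scaling each column and summing can be sketched with a toy design. Everything here is made up for illustration: the two condition columns are simple stand-ins for predictors that would already have been convolved, whitened and filtered, and the parameter values and noise level are arbitrary.

```python
import numpy as np

n = 120
t = np.arange(n)

# Stand-in predictors for two conditions, plus the constant
cond_a = ((t % 40) < 10).astype(float)
cond_b = (((t + 20) % 40) < 10).astype(float)
X = np.column_stack([cond_a, cond_b, np.ones(n)])

# Simulated data with a baseline of 400
rng = np.random.default_rng(4)
true_beta = np.array([6.0, 3.0, 400.0])
Y = X @ true_beta + rng.normal(0, 1, n)

# Estimate the parameters by least squares
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Scale each column by its parameter, then add the columns together
scaled_columns = X * beta_hat
prediction = scaled_columns.sum(axis=1)
residuals = Y - prediction
print(beta_hat)
```

The last entry of `beta_hat` sits close to the baseline, while the first two give the change in signal associated with each condition relative to that baseline.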
