# Mixed-effects: The Best Way
To improve upon the FFX approach, we need a method that accommodates the between-subjects variance. To see how to do this, let us return to our hierarchical model from earlier:

$$
\begin{align}
y_{ij}  &= \mu_{j} + \epsilon_{ij} &\quad \text{(Level 1)} \\
\mu_{j} &= \mu + \eta_{j} &\quad \text{(Level 2)} \\
\end{align}
$$

If we collapse across the levels (by substituting the Level 2 expression for $\mu_{j}$ into Level 1), we create a single model:

$$
y_{ij}  = \mu + \eta_{j} + \epsilon_{ij}
$$

Importantly, this model contains a single parameter associated with the population mean (the $\mu$ term) and *two* error terms ($\eta_{j}$ and $\epsilon_{ij}$). Because the population mean is considered a *constant* (i.e. it does not change from measurement to measurement), it is known as a *fixed-effect*. Any random variation in the value of $y_{ij}$ therefore comes from $\eta_{j}$ and $\epsilon_{ij}$. These terms are both *random variables*, with distributions assumed to have the following form:

$$
\begin{align}
\eta_{j}       &\sim \mathcal{N}(0, \sigma^{2}_{b})     \\
\epsilon_{ij}  &\sim \mathcal{N}(0, \sigma^{2}_{w_{j}}) \\
\end{align}
$$

These error terms capture the *between-subject* variance ($\sigma^{2}_{b}$) and the *within-subject* variance ($\sigma^{2}_{w_{j}}$), respectively. Because they are random variables, they are termed *random-effects*. Because our model now contains both fixed and random effects, it is known as a *mixed-effects* (MFX) model.

```{note}
In the world of fMRI, MFX analyses are often referred to as *random-effects* (RFX) models. This reflects the fact that we are treating subjects as a random draw from a population. The subjects are therefore seen as *random* rather than *fixed*. It does *not* imply that the model only contains random effects, though this would be the interpretation if the term were being used in the usual statistical sense. Just remember that RFX and MFX are used somewhat interchangeably in fMRI, just to make sure everyone is as confused as possible.
```

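Importantly, collapsing the two levels together determines the overall probability model for an individual subject's data. Assuming $\eta_{j}$ and $\epsilon_{ij}$ are independent, their variances add, so each observation from subject $j$ is distributed as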

$$
y_{ij} \sim \mathcal{N}(\mu, \sigma^{2}_{b} + \sigma^{2}_{w_{j}}).
$$
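
As a quick sanity check on this result, the collapsed model can be simulated and its empirical variance compared with $\sigma^{2}_{b} + \sigma^{2}_{w_{j}}$. The sketch below is illustrative only: the parameter values are made up, every subject is assumed to share the same within-subject standard deviation (`sigma_w`), and one observation is drawn per subject so that the draws are independent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values chosen purely for illustration
mu = 2.0            # population mean (fixed effect)
sigma_b = 1.5       # between-subject standard deviation
sigma_w = 0.5       # within-subject standard deviation (assumed equal for all subjects)
n_subjects = 200_000

# One observation per subject: y_j = mu + eta_j + eps_j
eta = rng.normal(0.0, sigma_b, size=n_subjects)   # subject-level deviations
eps = rng.normal(0.0, sigma_w, size=n_subjects)   # measurement-level errors
y = mu + eta + eps

empirical_var = y.var(ddof=1)
theoretical_var = sigma_b**2 + sigma_w**2

print(f"empirical variance:   {empirical_var:.3f}")
print(f"theoretical variance: {theoretical_var:.3f}")
```

With a large number of simulated subjects, the two printed values should agree closely, which is simply the additivity of variances for independent random variables.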

Thus, each data-point we sample can be thought of as containing some mixture of the *within-subject* and the *between-subject* variance. The advantage of MFX models is that they can *separate* these two sources. This has several practical advantages:
- The correct variance terms can be selected for testing different effects ...
- Subjects who are *noisy* (i.e. who have larger values of $\sigma^{2}_{w_{j}}$) can be *down-weighted* in the analysis. This means the model will automatically trust cleaner data sets and use the information from less-noisy subjects more than noisy subjects.

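The down-weighting of noisy subjects can be illustrated with a simple precision-weighted average of subject means. This is only a sketch under strong assumptions: the variances are treated as *known* (a real MFX model has to estimate them), each subject contributes the same number of measurements (`n_obs`), and the function names and numbers are hypothetical. Under the model above, the mean of subject $j$ has variance $\sigma^{2}_{b} + \sigma^{2}_{w_{j}} / n$, and each subject is weighted by the inverse of that quantity.

```python
import numpy as np

def precision_weights(sigma2_b, sigma2_w, n_obs):
    """Normalised inverse-variance weights for subject means.

    The variance of subject j's mean is sigma2_b + sigma2_w[j] / n_obs,
    so subjects with larger within-subject variance get smaller weights.
    """
    sigma2_w = np.asarray(sigma2_w, dtype=float)
    var_of_mean = sigma2_b + sigma2_w / n_obs
    w = 1.0 / var_of_mean
    return w / w.sum()

def weighted_mean(subject_means, weights):
    """Weighted estimate of the population mean mu."""
    return float(np.sum(weights * np.asarray(subject_means, dtype=float)))

# Hypothetical example: the third subject is far noisier than the others
sigma2_b = 1.0                    # between-subject variance (assumed known)
sigma2_w = [4.0, 4.0, 400.0]      # within-subject variances (assumed known)
n_obs = 10                        # measurements per subject
subject_means = [1.0, 1.2, 5.0]

weights = precision_weights(sigma2_b, sigma2_w, n_obs)
mu_hat = weighted_mean(subject_means, weights)

print("weights:        ", np.round(weights, 3))
print("weighted mean:  ", round(mu_hat, 3))
print("unweighted mean:", round(float(np.mean(subject_means)), 3))
```

In this toy example the noisy third subject receives a much smaller weight than the other two, so its extreme mean pulls the weighted estimate far less than it pulls the simple average.
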
## Computational and Practical Challenges for MFX
