# The importance of cross-validation

In [1]:
import warnings
warnings.filterwarnings("ignore")

To date, we have focussed on "feature engineering" quite broadly.
When applying machine learning to neuroimaging data, however, equally important are (1) the model that we train to generate predictions and (2) how we assess the generalizability of our learned model.
In this context, appropriate cross-validation methods are critical for drawing meaningful inferences.
However, many neuroscience researchers are not familiar with how to choose an appropriate cross-validation method for their data.

```{figure} ../images/poldrack-2020-fig3.jpg
---
height: 250px
name: cv-usage
---
From {cite}`Poldrack_2020`, depicting results from a review of 100 studies (2017–2019) claiming prediction on fMRI data.
_Panel A_ shows the prevalence of cross-validation methods in this sample.
_Panel B_ shows a histogram of associated sample sizes.
```

We briefly overview what cross-validation aims to achieve, as well as several different strategies for cross-validation that are in use with neuroimaging data.
We then provide examples of appropriate and inappropriate cross-validation within the `development_fmri` dataset. 

## Why cross-validate?

First, let's formalize the problem that cross-validation aims to solve, using notation from {cite}`Little_2017`. 

For $n$ observations, we can choose a variable $y \in \mathbb{R}^n$ that we are trying to predict from data $X \in \mathbb{R}^{n \times p}$ in the presence of confounds $Z \in \mathbb{R}^{n \times k}$.
For example, we may have neuroimaging data for 155 participants, from which we are trying to predict their age group as either a child or an adult.
There are additional confounding measures in this prediction, both measured and unmeasured.
For example, motion is a likely confounding variable, as children often move more in the scanner than adults.

In this notation, we can then consider $y$ as a function of $X$ and $Z$:

$$
  y = Xw + Zu + \epsilon
$$

where $\epsilon$ is observation noise, and we have assumed a strictly linear relationship between the variables.

In such a model, $\epsilon$ may be independent and identically distributed (i.i.d.) even though the relationship between $y$ and $X$ is not i.i.d.; for example, if it changes with age group membership.
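To make the notation concrete, here is a minimal simulation of the linear model above. It is only a sketch: the sizes `n`, `p`, `k`, the weights `w` and `u`, and the noise scale are illustrative assumptions, not values taken from the `development_fmri` dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): observations, features, confounds
n, p, k = 155, 10, 2

X = rng.normal(size=(n, p))              # data of interest
Z = rng.normal(size=(n, k))              # confounds, e.g. motion summaries
w = rng.normal(size=p)                   # hypothetical weights on X
u = rng.normal(size=k)                   # hypothetical weights on Z
epsilon = rng.normal(scale=0.5, size=n)  # i.i.d. observation noise

# The generative model from the text: y = Xw + Zu + epsilon
y = X @ w + Z @ u + epsilon

# Fitting y from X alone leaves the confound term Zu in the residual
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = y - X @ w_hat
print(y.shape, w_hat.shape, residual.std())
```

Because the confound contribution $Zu$ is not modelled here, it ends up in the residual together with $\epsilon$, which is one way to see why ignoring confounds can distort what a model appears to learn.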

The machine learning problem is to estimate a function $\hat{f}_{\{ train \}}$ that best predicts $y$ from $X$.
In other words, we want to minimize an error $\mathcal{E}(y,\hat{f}(X))$.

The challenge is that we are interested in this error on new, unseen data.
Thus, we would like to know the expectation of the error for $(y, X)$ drawn from their unknown distribution:

$$
  \mathbb{E}_{(y,X)} [\mathcal{E}(y,\hat{f}(X))].
$$

From this we note two important points.
  1. Evaluation procedures _must_ test predictions of the model on held-out data that is independent from the data used to train the model.
  2. Cross-validation procedures that repeat the train-test split many times to vary the training set also allow us to ask a related question:
    given _future_ data to train a machine learning method on a clinical problem, what is the error that I can expect on new data?
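
Both points can be illustrated with a short sketch on synthetic data. The data from `make_classification`, the sample sizes, and the choice of `ShuffleSplit` with 20 splits are assumptions made for illustration only; they do not reproduce the analysis on the neuroimaging data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import ShuffleSplit
from sklearn.svm import LinearSVC

# Synthetic stand-in for (X, y); sizes are illustrative assumptions
X, y = make_classification(
    n_samples=155, n_features=200, n_informative=10, random_state=0)

train_errors, test_errors = [], []

# Repeat the train-test split many times to vary the training set
cv = ShuffleSplit(n_splits=20, test_size=0.2, random_state=0)
for train, test in cv.split(X):
    f_hat = LinearSVC().fit(X[train], y[train])
    # Error on the data used for fitting (not a generalization estimate)
    train_errors.append(1 - accuracy_score(y[train], f_hat.predict(X[train])))
    # Error on held-out data that the model never saw during fitting
    test_errors.append(1 - accuracy_score(y[test], f_hat.predict(X[test])))

print(f"mean training error: {np.mean(train_errors):.3f}")
print(f"mean held-out error: {np.mean(test_errors):.3f} "
      f"(std {np.std(test_errors):.3f})")
```

Only the held-out errors estimate the expectation above. Their spread across splits gives a sense of how much the estimate depends on which observations happened to land in the training set.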


## Forms of cross-validation

Given the importance of cross-validation in machine learning, many different schemes exist.
The [scikit-learn documentation has a section](https://scikit-learn.org/stable/modules/cross_validation.html) just on this topic, which is worth reviewing in full.
Here, we briefly highlight how cross-validation impacts our estimates in our example dataset.
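
As a quick orientation, the sketch below runs four scikit-learn splitters on a toy problem and prints what ends up in each test fold. The toy labels `y` and the `groups` array (imagined here as four acquisition sites) are hypothetical and serve only to show how the schemes differ.

```python
import numpy as np
from sklearn.model_selection import (
    GroupKFold, KFold, LeaveOneOut, StratifiedKFold)

# Toy problem (hypothetical): 12 samples, imbalanced labels, 4 groups
X = np.zeros((12, 3))
y = np.array([0] * 8 + [1] * 4)
groups = np.repeat(np.arange(4), 3)  # e.g. 4 sites with 3 participants each

splitters = {
    "KFold": KFold(n_splits=4),
    "StratifiedKFold": StratifiedKFold(n_splits=4),
    "GroupKFold": GroupKFold(n_splits=4),
    "LeaveOneOut": LeaveOneOut(),
}

test_labels = {}
test_groups = {}
for name, splitter in splitters.items():
    # Only GroupKFold needs the groups argument
    split_args = (X, y, groups) if name == "GroupKFold" else (X, y)
    folds = [test for _, test in splitter.split(*split_args)]
    test_labels[name] = [y[test].tolist() for test in folds]
    test_groups[name] = [np.unique(groups[test]).tolist() for test in folds]
    print(f"{name}: {len(folds)} folds, first test labels {test_labels[name][0]}")
```

Unshuffled `KFold` simply cuts the data in order, so some test folds contain a single class, whereas `StratifiedKFold` keeps the class ratio in every fold and `GroupKFold` keeps each group entirely on one side of the split.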

## Testing cross-validation schemes in our example dataset

We'll keep working with the same `development_dataset`, though this time we'll fetch all 155 subjects.
Again, we'll derive functional connectivity matrices for each participant, though this time we'll only consider the "correlation" measure.

In [2]:
import numpy as np
import matplotlib.pyplot as plt
from nilearn import (datasets, maskers, plotting)
from nilearn.connectome import ConnectivityMeasure
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.svm import LinearSVC

development_dataset = datasets.fetch_development_fmri()
msdl_atlas = datasets.fetch_atlas_msdl()

masker = maskers.NiftiMapsMasker(
    msdl_atlas.maps, resampling_target="data",
    t_r=2, detrend=True,
    low_pass=0.1, high_pass=0.01).fit()
correlation_measure = ConnectivityMeasure(kind='correlation')

pooled_subjects = []
groups = []  # child or adult

for func_file, confound_file, phenotypic in zip(
        development_dataset.func,
        development_dataset.confounds,
        development_dataset.phenotypic):

    time_series = masker.transform(func_file, confounds=confound_file)
    pooled_subjects.append(time_series)
    groups.append(phenotypic['Child_Adult'])

_, classes = np.unique(groups, return_inverse=True)
pooled_subjects = np.asarray(pooled_subjects)




In [our classification example](class-example), we used `StratifiedShuffleSplit` for cross-validation.
This method preserves the percentage of samples for each class across train and test splits; that is, the percentages of child and adult participants in our classification example.
What happens if we don't account for age groups when generating our cross-validation folds?

In [None]:
# First, re-generate our cross-validation scores for StratifiedShuffleSplit

strat_scores = []

# Stratify on the age-group labels so each split keeps the child/adult ratio
cv = StratifiedShuffleSplit(n_splits=15, random_state=0, test_size=5)
for train, test in cv.split(pooled_subjects, groups):
    # Fit the connectivity estimator on the training subjects only
    connectivity = ConnectivityMeasure(kind="correlation", vectorize=True)
    connectomes = connectivity.fit_transform(pooled_subjects[train])
    classifier = LinearSVC().fit(connectomes, classes[train])
    # Transform the held-out subjects with the already-fitted estimator
    predictions = classifier.predict(
        connectivity.transform(pooled_subjects[test]))
    strat_scores.append(accuracy_score(classes[test], predictions))
# Mean accuracy across the 15 splits
print(np.mean(strat_scores))

In [None]:
# Then, compare with cross-validation scores for ShuffleSplit

from sklearn.model_selection import ShuffleSplit
shuffle_scores = []

cv = ShuffleSplit(n_splits=15, random_state=0, test_size=5)
for train, test in cv.split(pooled_subjects):
    connectivity = ConnectivityMeasure(kind="correlation", vectorize=True)
    connectomes = connectivity.fit_transform(pooled_subjects[train])
    classifier = LinearSVC().fit(connectomes, classes[train])
    predictions = classifier.predict(
        connectivity.transform(pooled_subjects[test]))
    shuffle_scores.append(accuracy_score(classes[test], predictions))
print(np.mean(shuffle_scores))
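
To see what stratification changes, it can help to inspect the class composition of each test fold rather than only the final scores.
The sketch below is illustrative and does not use the tutorial's data: it assumes a hypothetical label vector with 122 children and 33 adults, and the helper `adults_per_test_fold` is introduced here only for this example.
With five-person test folds, stratification should keep roughly one adult per fold (5 × 33/155 ≈ 1), whereas plain shuffling lets that count vary from fold to fold.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit, StratifiedShuffleSplit

# Hypothetical labels (assumption for illustration): 0 = child, 1 = adult
labels = np.array([0] * 122 + [1] * 33)
X_dummy = np.zeros((len(labels), 1))  # features do not matter for splitting


def adults_per_test_fold(cv, X, y):
    """Count how many adults (label 1) land in each test fold."""
    return [int(y[test].sum()) for _, test in cv.split(X, y)]


strat_cv = StratifiedShuffleSplit(n_splits=15, random_state=0, test_size=5)
plain_cv = ShuffleSplit(n_splits=15, random_state=0, test_size=5)

strat_counts = adults_per_test_fold(strat_cv, X_dummy, labels)
plain_counts = adults_per_test_fold(plain_cv, X_dummy, labels)

print("Adults per test fold (stratified):", strat_counts)
print("Adults per test fold (shuffled):  ", plain_counts)
```

Test folds with no adults at all cannot tell us anything about how well the classifier identifies adults, which is one reason the two schemes can yield different scores.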

## Leave-one-out can give overly optimistic estimates

In {cite}`Varoquaux_2017`, Varoquaux and colleagues evaluated the impact of different cross-validation schemes on derived accuracy values.
We reproduce their Figure 6 below.

```{figure} ../images/varoquaux-2016-fig6.png
---
height: 400px
name: cv-strategies
---
From {cite}`Varoquaux_2017`, showing the difference in accuracy measured by cross-validation and on the held-out
validation set, in intra- and inter-subject settings, for different cross-validation strategies:
(1) leave one sample out, (2) leave one block of samples out (where the block is the natural unit of the experiment: subject or session), and random splits leaving out 20% of the blocks as test data, with (3) 3, (4) 10, or (5) 50 random splits.
For inter-subject settings, leave one sample out corresponds to leaving a session out.
The box gives the quartiles, while the whiskers give the 5th and 95th percentiles.
```

We see that cross-validation schemes that "leak" information from the training set to the test set can give overly optimistic estimates.
For example, if we leave one session out for predictions within a participant, our estimated prediction accuracy from cross-validation is much higher than our prediction accuracy on a held-out validation set.
This is because different sessions from the same participant are highly correlated;
that is, participants are likely to show similar patterns of neural responses across sessions.
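
The effect of correlated samples can be sketched with purely synthetic data.
The example below is a simplified inter-subject setting, not a reproduction of the figure: it assumes 20 hypothetical subjects with 5 sessions each, where every subject has a stable feature "signature" and the class labels are unrelated to the features.
Any accuracy above chance therefore reflects leakage, not real predictive signal.
The 1-nearest-neighbour classifier is chosen only because it makes the leak easy to see.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_subjects, n_sessions, n_features = 20, 5, 50  # illustrative sizes

# Each synthetic subject has a stable signature; sessions add small noise
signatures = rng.normal(size=(n_subjects, n_features))
X = np.repeat(signatures, n_sessions, axis=0)
X += 0.1 * rng.normal(size=X.shape)

# Labels are assigned per subject at random, independent of the features,
# so accuracy on genuinely new subjects should be near chance (0.5)
subject_labels = rng.permutation(np.repeat([0, 1], n_subjects // 2))
y = np.repeat(subject_labels, n_sessions)
groups = np.repeat(np.arange(n_subjects), n_sessions)

clf = KNeighborsClassifier(n_neighbors=1)

# Leaky scheme: sessions from one subject can fall in both train and test
leaky = cross_val_score(
    clf, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))

# Grouped scheme: all sessions from a subject stay on the same side
grouped = cross_val_score(
    clf, X, y, groups=groups, cv=GroupKFold(n_splits=5))

print(f"Session-wise folds: {leaky.mean():.2f}")
print(f"Subject-wise folds: {grouped.mean():.2f}")
```

Passing `groups` to a group-aware splitter such as `GroupKFold` is one way to keep all sessions from a participant together; with only 20 synthetic subjects, the subject-wise estimate is itself noisy.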


## Small sample sizes give a wide distribution of errors

Another common issue in cross-validation, particularly leave-one-out cross-validation, is the small size of the resulting test set.

```{figure} ../images/varoquaux-2017-fig1.png
---
height: 400px
name: test-size
---
From {cite}`Varoquaux_2018`, this plot shows the distribution of errors between the prediction accuracy as assessed via cross-validation (average across folds) and as measured on a large independent test set for different types of neuroimaging data.
Accuracy is reported for two reasonable choices of cross-validation strategy: leave-one-out (leave-one-run-out or leave-one-subject-out in data with multiple runs or subjects), or 50-times repeated splitting of 20% of the data.
The bar and whiskers indicate the median and the 5th and 95th percentiles.
```

The results show that these confidence bounds extend at least 10% both ways;
that is, there is a 5% chance that the cross-validated accuracy is 10% above the true generalization accuracy and a 5% chance that it is 10% below.
This wide confidence bound is a result of an interaction between (1) the large sampling noise in neuroimaging data and (2) the relatively small sample sizes that we provide to the classifier.
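
A rough sense of how test-set size alone affects this spread can be obtained from a binomial simulation.
The sketch below is a simplification, not the analysis from {cite}`Varoquaux_2018`: it assumes a classifier with a fixed, hypothetical true accuracy of 0.75 and independent test samples, and it ignores variability from the training set and dependence between folds.
The function name `accuracy_spread` and all parameter values are illustrative.

```python
import numpy as np


def accuracy_spread(n_test, true_accuracy=0.75, n_simulations=10_000, seed=0):
    """Simulate measured accuracy on independent test sets of size n_test.

    Returns the 5th and 95th percentiles of the measured accuracy.
    """
    rng = np.random.default_rng(seed)
    # Number of correct predictions in each simulated test set
    n_correct = rng.binomial(n_test, true_accuracy, size=n_simulations)
    measured = n_correct / n_test
    return np.percentile(measured, [5, 95])


for n_test in (30, 100, 1000):
    low, high = accuracy_spread(n_test)
    print(f"n_test={n_test:4d}: 5th-95th percentile = [{low:.2f}, {high:.2f}]")
```

Under these assumptions the interval narrows only slowly as the test set grows, roughly in proportion to one over the square root of the number of test samples.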

In [None]:
# Compare with cross-validation scores for leave-one-subject-out

from sklearn.model_selection import LeaveOneOut
loo_scores = []

cv = LeaveOneOut()
for train, test in cv.split(pooled_subjects):
    connectivity = ConnectivityMeasure(kind="correlation", vectorize=True)
    connectomes = connectivity.fit_transform(pooled_subjects[train])
    classifier = LinearSVC().fit(connectomes, classes[train])
    predictions = classifier.predict(
        connectivity.transform(pooled_subjects[test]))
    loo_scores.append(accuracy_score(classes[test], predictions))
# Each test set holds a single subject, so every fold's accuracy is 0 or 1
print(loo_scores)

```{bibliography} references.bib
:style: unsrt
:filter: docname in docnames
```