# Machine learning with `nilearn`

Although nilearn's visualizations are quite nice, its primary purpose is to facilitate machine learning in neuroimaging. In some sense it is the bridge between [nibabel](http://nipy.org/nibabel/) and [scikit-learn](http://scikit-learn.org/stable/). On the one hand, it reformats images so they can easily be passed to scikit-learn, and on the other, it reformats the results to produce valid nibabel images.

So let's take a look at a short multi-voxel pattern analysis (MVPA) example.

**Note**: This section is heavily based on the [nilearn decoding tutorial](https://nilearn.github.io/auto_examples/plot_decoding_tutorial.html).

## Setup

In [None]:
%matplotlib inline
import numpy as np
import nibabel as nb

## Load machine learning dataset

Let's load the dataset we prepared in the previous notebook:

In [None]:
func = '/home/neuro/notebooks/data/dataset_ML.nii.gz'
!nib-ls $func

## Create mask

As we only want to use voxels in a particular region of interest (ROI) for the classification, let's create a function that returns a mask containing either only the brain, only the eyes, or both:

In [None]:
from nilearn.image import resample_to_img, math_img
from scipy.ndimage import binary_dilation

def get_mask(mask_type):
    
    # Specify location of the brain and eye image
    brain = '/templates/MNI152_T1_1mm_brain.nii.gz'
    eyes = '/templates/MNI152_T1_1mm_eye.nii.gz'

    # Load region of interest
    if mask_type == 'brain':
        img_resampled = resample_to_img(brain, func)
    elif mask_type == 'eyes':
        img_resampled = resample_to_img(eyes, func)
    elif mask_type == 'both':
        img_roi = math_img("img1 + img2", img1=brain, img2=eyes)
        img_resampled = resample_to_img(img_roi, func)

    # Binarize ROI template
    data_binary = np.array(img_resampled.get_fdata()>=10, dtype=np.int8)

    # Dilate binary mask once
    data_dilated = binary_dilation(data_binary, iterations=1).astype(np.int8)

    # Save binary mask in NIfTI image
    mask = nb.Nifti1Image(data_dilated, img_resampled.affine, img_resampled.header)
    mask.set_data_dtype('i1')
    
    return mask
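
To see what the thresholding and the single dilation step in `get_mask` do, here is a minimal NumPy-only sketch on a toy 2D array. The helper `dilate_once` is hypothetical and only mimics one iteration of `binary_dilation` with its default cross-shaped structuring element; the image values are made up.

```python
import numpy as np

def dilate_once(mask):
    """Grow a boolean mask by one voxel along each axis (no diagonals).

    Hypothetical helper for illustration only; the notebook itself uses
    scipy.ndimage.binary_dilation(..., iterations=1) for this step.
    """
    padded = np.pad(mask, 1)  # False border, so shifted copies cannot wrap around
    grown = padded.copy()
    for axis in range(mask.ndim):
        grown |= np.roll(padded, 1, axis=axis)
        grown |= np.roll(padded, -1, axis=axis)
    core = tuple(slice(1, -1) for _ in range(mask.ndim))
    return grown[core]  # crop the border again

# Toy "template": one bright voxel (intensity 20) in a dark 5 x 5 image
toy_template = np.zeros((5, 5))
toy_template[2, 2] = 20

toy_binary = toy_template >= 10        # same threshold as in get_mask
toy_dilated = dilate_once(toy_binary)  # the bright voxel plus its direct neighbours

print(toy_binary.sum(), toy_dilated.sum())
print(toy_dilated.astype(np.int8))
```

The dilation makes the mask slightly more generous than the thresholded template, so voxels right at the edge of the ROI are not lost after resampling.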

## Masking and Un-masking data

For the classification with `nilearn` we need our functional data in a 2D, sample-by-voxel matrix. To get that, we'll select all the voxels defined in our `mask`.

In [None]:
from nilearn.plotting import plot_roi
anat = '/templates/MNI152_T1_1mm.nii.gz'
mask = get_mask('both')
plot_roi(mask, anat, cmap='Paired', dim=-.5, draw_cross=False, annotate=False)

`NiftiMasker` is an object that applies a mask to a dataset and returns the masked voxels as a vector at each time point.

In [None]:
from nilearn.input_data import NiftiMasker
masker = NiftiMasker(mask_img=mask, standardize=False, detrend=False)
samples = masker.fit_transform(func)
print(samples)

The shape of `samples` corresponds to the number of time points by the number of voxels in the mask.

In [None]:
print(samples.shape)

Because we created the masker with `standardize=False`, it does not z-score the data itself. To recover the original 4D data shape (giving us a masked BOLD series), we simply use the masker's inverse transform:

In [None]:
masked_epi = masker.inverse_transform(samples)
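
Conceptually, masking and un-masking are just boolean indexing. The following sketch reproduces the idea with plain NumPy on made-up data; all `toy_*` names are hypothetical, and `NiftiMasker` additionally takes care of affines, resampling and optional signal cleaning, which this sketch ignores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 4D "fMRI" series: 4 x 4 x 3 voxels and 6 time points (made-up data)
toy_data = rng.normal(size=(4, 4, 3, 6))

# Toy boolean ROI mask covering a 2 x 2 x 2 block, i.e. 8 voxels
toy_mask = np.zeros((4, 4, 3), dtype=bool)
toy_mask[1:3, 1:3, 0:2] = True

# "Masking": keep only the in-mask voxels -> (n_timepoints, n_voxels)
toy_samples = toy_data[toy_mask].T
print(toy_samples.shape)

# "Un-masking": write the samples back into an otherwise empty 4D volume
toy_restored = np.zeros_like(toy_data)
toy_restored[toy_mask] = toy_samples.T
print(toy_restored.shape)
```

Inside the mask the restored volume equals the original data, and everywhere else it is zero.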

Let's now visualize the masked epi.

In [None]:
from nilearn.image import math_img
from nilearn.plotting import plot_stat_map

max_zscores = math_img("np.abs(img).max(axis=3)", img=masked_epi)
plot_stat_map(max_zscores, bg_img=anat, dim=-.5, cut_coords=[33, -20, 20],
              draw_cross=False, annotate=False, colorbar=False,
              title='Maximum Amplitude per Voxel in Mask')

# Simple MVPA Example

Multi-voxel pattern analysis (MVPA) is a general term for techniques that contrast conditions over multiple voxels. It's very common to use machine learning models to generate statistics of interest.

In this case, we'll use the response patterns of voxels in the mask to predict if the eyes were **closed** or **open** during a resting-state fMRI recording. We'll use a support vector classifier (SVC) and leave-one-subject-out cross-validation.

**Note:** This section is not intended to teach machine learning, but to demonstrate a simple nilearn pipeline.

In [None]:
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

The labels file contains metadata for each volume, indicating the stimulus type and subject number.

In [None]:
labels = '/home/neuro/notebooks/data/labels.txt'

In [None]:
!head -n 17 $labels

Using `np.recfromcsv()`, we can refer to each column of this file by its header.

In [None]:
attrs = np.recfromcsv(labels, delimiter=" ")
attrs.shape
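
Newer NumPy releases (2.0 onward, to our knowledge) no longer expose `np.recfromcsv` in the main namespace. If the cell above fails for that reason, `np.genfromtxt` can read the same kind of file. The sketch below uses a small in-memory stand-in for `labels.txt`; its rows are made up, and only the column names `labels` and `chunks` are taken from this notebook.

```python
import io
import numpy as np

# Hypothetical stand-in for labels.txt: a header row plus one row per volume
toy_labels = io.StringIO(
    "labels chunks\n"
    "closed 1\n"
    "closed 1\n"
    "open 1\n"
    "open 2\n"
    "closed 2\n"
)

# names=True reads the column names from the first row;
# dtype=None lets NumPy infer a type for each column
toy_attrs = np.genfromtxt(toy_labels, delimiter=" ", names=True,
                          dtype=None, encoding="utf-8")

print(toy_attrs.dtype.names)
print(np.unique(toy_attrs["labels"]))
print(np.unique(toy_attrs["chunks"]))
```

With `encoding="utf-8"` the label column is read as regular strings, so comparisons such as `toy_attrs["labels"] == "open"` work directly.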

In [None]:
stimuli, runs = attrs['labels'], attrs['chunks']
print(np.unique(stimuli))

In [None]:
np.unique(runs)

Leave-one-subject-out cross-validation trains on `(n - 1)` subjects and classifies the remaining subject, repeating this for each subject. Mean (across subjects) cross-validation accuracy is a common statistic for classification-based MVPA.

In [None]:
# Let's specify the classifier
clf = SVC(kernel='linear')

In [None]:
%%time
# Perform the cross-validation (takes time to compute)
cva = cross_val_score(estimator=clf,
                      X=samples,
                      y=stimuli,
                      groups=runs,
                      cv=LeaveOneGroupOut(),
                      n_jobs=-1)
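
To make the splitting explicit, here is a hand-rolled sketch of the leave-one-group-out idea in plain NumPy. It only illustrates which volumes end up in the training and test sets of each fold; it is not scikit-learn's implementation, and the function name and toy group labels are hypothetical.

```python
import numpy as np

def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx) for each unique group.

    Hand-rolled illustration of the splitting idea only.
    """
    groups = np.asarray(groups)
    for group in np.unique(groups):
        is_test = groups == group
        yield group, np.flatnonzero(~is_test), np.flatnonzero(is_test)

# Made-up example: 3 subjects with 2 volumes each
toy_groups = np.array([1, 1, 2, 2, 3, 3])

for held_out, train_idx, test_idx in leave_one_group_out(toy_groups):
    print(f"test on subject {held_out}: train={train_idx}, test={test_idx}")
```

Each fold produces one accuracy value, which is why `cva` contains one entry per group.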

Once the cross-validation has been computed, we can extract the overall accuracy, as well as the accuracy for each individual fold (i.e. each leave-one-subject-out prediction).

In [None]:
print('Average accuracy = %.02f percent' % (cva.mean() * 100))

In [None]:
print('Accuracy per fold:', cva, sep='\n')

**Wow, 86.46% accuracy!!!** That's great! But with a simple MVPA approach we unfortunately don't know which regions are driving the classification accuracy. We only know that, taken together, the voxels in the mask allow us to distinguish the two classes.

## Same same, but different

Let's do the same MVPA approach again, but this time with a `LogisticRegression` classifier and a mask that only keeps the voxels around the eyes.

In [None]:
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()

In [None]:
masker = NiftiMasker(mask_img=get_mask('eyes'), standardize=False, detrend=False)
samples = masker.fit_transform(func)

In [None]:
%%time
cva = cross_val_score(estimator=clf,
                      X=samples,
                      y=stimuli,
                      groups=runs,
                      cv=LeaveOneGroupOut(),
                      n_jobs=-1)

In [None]:
print('Average accuracy = %.02f percent' % (cva.mean() * 100))

Hmm... 80.47% is still great, but worse than before. We need a better technique that tells us where in the head we should look. Luckily, there is the **Searchlight** approach.

# Searchlight approach

In [None]:
from nilearn import decoding

In [None]:
# radius is the radius (in mm) of the searchlight sphere that scans the volume
searchlight = decoding.SearchLight(
    get_mask('eyes'),
    process_mask_img=get_mask('eyes'),
    radius=5.6, n_jobs=-1,
    verbose=1, cv=LeaveOneGroupOut())

In [None]:
searchlight.fit(nb.load(func), stimuli, groups=runs)
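
Conceptually, the searchlight repeats the cross-validated classification once per voxel, each time using only the voxels inside a small sphere centered on that voxel, and stores the resulting score at the center. The sketch below shows which neighbours fall inside such a sphere. The helper `sphere_offsets` is hypothetical, and the isotropic 4 mm voxel size is an assumption for illustration, not a property of this dataset.

```python
import numpy as np

def sphere_offsets(radius_mm, voxel_size_mm):
    """Return integer voxel offsets whose centers lie within radius_mm.

    Hypothetical helper; assumes isotropic voxels of size voxel_size_mm.
    """
    reach = int(np.floor(radius_mm / voxel_size_mm))
    axis = np.arange(-reach, reach + 1)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    offsets = grid.reshape(-1, 3)
    dist_mm = np.linalg.norm(offsets * voxel_size_mm, axis=1)
    return offsets[dist_mm <= radius_mm]

# radius=5.6 as in the cell above; the 4 mm voxel size is an assumption
toy_offsets = sphere_offsets(radius_mm=5.6, voxel_size_mm=4.0)
print(len(toy_offsets), "voxels per searchlight sphere")
```

A larger radius or smaller voxels put more voxels into each sphere, which gives the classifier more features per location but makes the resulting map spatially less specific and slower to compute.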

In [None]:
# Use the fmri mean image as a surrogate of anatomical data
from nilearn import image
mean_fmri = image.mean_img(func)

In [None]:
from nilearn.image import new_img_like
# Wrap the searchlight scores in a NIfTI image on the grid of the mean image
searchlight_img = new_img_like(mean_fmri, searchlight.scores_)

In [None]:
from nilearn.plotting import plot_glass_brain

In [None]:
plot_glass_brain(searchlight_img, threshold=0.6, cmap='bwr', black_bg=True, colorbar=True, display_mode='lyrz', vmax=0.7)