In [None]:
%matplotlib inline

Get rid of warnings

In [None]:
import warnings
warnings.simplefilter("ignore")


# Decoding with ANOVA + SVM: face vs house in the Haxby dataset

This example does a simple but efficient decoding on the Haxby dataset:
feature selection followed by an SVM.

## What is decoding ?

In [None]:
from IPython.display import Image
Image(filename='../images/decoding_pipeline_example.png')

A schematic representation of a standard decoding workflow/pipeline. The input (data) is prepared and potentially preprocessed before being submitted to a model that then uses a chosen metric to produce an output. Image taken from https://main-educational.github.io/brain_encoding_decoding/haxby_data.html

## The Haxby Dataset

The dataset comes from the study by Haxby and colleagues (2001), one of the first to demonstrate the feasibility of brain decoding.
Subjects were presented with images drawn from different categories (faces, houses, cats, bottles, scrambled images, scissors, shoes, chairs) interleaved with resting periods.
Subsequently, a decoding model was used to predict the presented categories from the brain activity/responses. In the respective parts of this session, we will try to do the same!

## Retrieve the files of the Haxby dataset



In [None]:
# Use nilearn to fetch the data
from nilearn import datasets

# By default 2nd subject will be fetched
haxby_dataset = datasets.fetch_haxby()
func_img = haxby_dataset.func[0]

# print basic information on the dataset
print('Mask nifti image (3D) is located at: %s' % haxby_dataset.mask)
print('Functional nifti image (4D) is located at: %s' %
      func_img)

### Take a look at the fmri data we got

In [None]:
# take the mean image of the series, because it's hard to plot a 4D image
from nilearn.image import mean_img
func_image_mean = mean_img(func_img)

from nilearn.plotting import view_img
view_img(func_image_mean, cmap='magma', symmetric_cmap=False)

Note that the image is not exactly in MNI space.

### Take a look at the brain mask we got

For this, we overlay it on the anatomical image. Note that the correspondence is poor, but masking will most likely remove out-of-brain regions, which is a good thing.

In [None]:
from nilearn.plotting import plot_roi
plot_roi(haxby_dataset.mask, bg_img=haxby_dataset.anat[0],
         cmap='Paired', dim=-1)

### Take a look at the stimuli used.

In [None]:
# For that, you need to get the stimulus information
import matplotlib.pyplot as plt
haxby_dataset_ = datasets.fetch_haxby(subjects=[], fetch_stimuli=True)
stimulus_information = haxby_dataset_.stimuli

# Read the stimulus images and plot them
for stim_type in stimulus_information:
    # skip control images, there are too many
    if stim_type != 'controls':

        file_names = stimulus_information[stim_type]
        file_names = file_names[0:16]
        fig, axes = plt.subplots(4, 4)
        fig.suptitle(stim_type)

        for img_path, ax in zip(file_names, axes.ravel()):
            ax.imshow(plt.imread(img_path), cmap=plt.cm.gray)

        for ax in axes.ravel():
            ax.axis("off")

plt.show()

## Load the behavioral data

In [None]:
# Load the target information (condition labels and session ids) as a table
import pandas as pd
behavioral = pd.read_csv(haxby_dataset.session_target[0], sep=" ")
conditions = behavioral['labels']
print(conditions.unique())

In [None]:
# make this a BIDS-compatible events file for plotting
# we have no information on event duration, so we assume that it is 1 (in TR units)
event_dictionary = {'onset': conditions.index,
                    'trial_type': conditions.values,
                    'duration': [1] * len(conditions)}
events = pd.DataFrame(event_dictionary)

# plot the event structure
from nilearn.plotting import plot_event, show
plot_event(events, figsize=(15, 4))
show()

In [None]:
# Restrict the analysis to faces and houses
from nilearn.image import index_img
condition_mask = behavioral['labels'].isin(['face', 'house'])
conditions = conditions[condition_mask]
func_img = index_img(func_img, condition_mask)

# Confirm that we now have 2 conditions
print(conditions.unique())

# The number of the session is stored in the CSV file giving the behavioral
# data. We have to apply our session mask, to select only faces and houses.
session_label = behavioral['chunks'][condition_mask]

# plot that stuff
event_dictionary = {'onset': conditions.index,
                    'trial_type':conditions.values,
                    'duration': [1]* len(conditions)}
events = pd.DataFrame(event_dictionary)
plot_event(events, figsize=(15, 4))
show()

Note: We directly associate labels with images, ignoring the hemodynamic delay (!). Well, this is what Haxby did. It is not unreasonable, given the block structure of the design.
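For comparison, here is a minimal sketch of how one could account for the hemodynamic delay by shifting the labels a few volumes later. This is not what the notebook does: the helper `shift_labels`, the toy label sequence and the delay of 2 volumes are all assumptions chosen for illustration.

```python
def shift_labels(labels, delay_in_trs, fill="rest"):
    """Delay every label by `delay_in_trs` volumes.

    The first `delay_in_trs` volumes get the `fill` label, and the labels
    that would fall past the end of the run are dropped.
    """
    if delay_in_trs < 0:
        raise ValueError("delay_in_trs must be non-negative")
    padded = [fill] * delay_in_trs + list(labels)
    return padded[:len(labels)]


# Toy block design (hypothetical labels, one per volume)
labels = ["rest", "face", "face", "face", "rest", "house", "house", "rest"]
shifted = shift_labels(labels, delay_in_trs=2)
print(shifted)
```

With long blocks, most volumes keep the same label after the shift, which is why ignoring the delay is tolerable here.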

## ANOVA pipeline with :class:`nilearn.decoding.Decoder` object

The nilearn Decoder object aims to provide a smooth user experience by acting as a
pipeline of several tasks: preprocessing with NiftiMasker, reducing dimension
by selecting only relevant features with ANOVA (a classical univariate
feature selection based on the F-test), and then decoding with one of several
estimators (in this example, a Support Vector Machine with a linear kernel)
fitted within a nested cross-validation.



### What is an SVC ?

In [None]:
Image(filename='../images/optimal-hyperplane.png')

A Support Vector Classifier aims at finding an optimal hyperplane to separate two classes in high-dimensional space, while maximizing the margin. Image from the scikit-learn SVM documentation under BSD 3-Clause license. Note that the above image is misleading because in reality, images live in a very high-dimensional space (= the number of voxels), which is barely populated with the few samples we provided. Yet the intuition of an "optimal hyperplane" remains valid.
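To make the "hyperplane" and "margin" intuition concrete, here is a small sketch in two dimensions. The weights and bias are made-up numbers, not a fitted model: a linear classifier predicts from the sign of `w . x + b`, and for a max-margin hyperplane the margin width is `2 / ||w||`.

```python
import math


def decision_function(weights, bias, sample):
    """Signed score w . x + b; its sign gives the predicted class."""
    return sum(w * x for w, x in zip(weights, sample)) + bias


def margin_width(weights):
    """Width of the margin, 2 / ||w||, for a hyperplane with these weights."""
    return 2.0 / math.sqrt(sum(w * w for w in weights))


# Hypothetical hyperplane in a 2-voxel space
weights, bias = [3.0, 4.0], -5.0

print(decision_function(weights, bias, [3.0, 1.0]))  # positive side
print(decision_function(weights, bias, [0.0, 0.0]))  # negative side
print(margin_width(weights))
```

In the real problem the weight vector has one entry per voxel, which is what lets us display it as a brain map later on.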

### What is an ANOVA ?

It is a statistic that measures how strongly the signal in a voxel is explained by the associated stimulus. Technically, the statistic is analogous to the F statistic used in statistical parametric mapping: the voxels with a high F value are those you should retain to inform the classifier!
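As a sketch of what this F statistic measures, here is the textbook one-way ANOVA computation for a single voxel, written in plain Python. The function name `f_statistic` and the toy values are assumptions for illustration; the decoder computes this for you.

```python
def f_statistic(groups):
    """One-way ANOVA F statistic for one voxel.

    `groups` holds one list of signal values per condition.
    F = (between-condition variance) / (within-condition variance).
    """
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, group_means))
    ss_within = sum((v - mean) ** 2
                    for group, mean in zip(groups, group_means)
                    for v in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# A voxel whose signal differs between conditions (toy values)
informative = f_statistic([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
# A voxel with the same signal in both conditions
uninformative = f_statistic([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
print(informative, uninformative)
```

The first voxel gets a large F and would be kept; the second gets F = 0 and would be discarded.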

### Why do an ANOVA-based voxel selection followed by an SVC ?

There are 2 main reasons to do it:
* Fitting an SVC is costly, hence fitting it on reduced data will save computation time
* It can also benefit accuracy, because it amounts to removing noisy voxels that may degrade (a bit) classifier performance.
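The selection step itself can be sketched as "keep the voxels whose F score is in the top few percent". The helper `select_top_percentile` and the scores below are hypothetical, and nilearn may adjust the percentile internally, so treat this as the idea rather than the library's exact behavior.

```python
import math


def select_top_percentile(scores, percentile):
    """Return the sorted indices of the highest-scoring `percentile` percent."""
    n_keep = max(1, math.ceil(len(scores) * percentile / 100))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy F scores for 20 voxels
f_scores = [0.3, 12.1, 1.7, 0.9, 25.4, 0.1, 3.2, 0.5, 8.8, 0.2,
            0.4, 0.6, 1.1, 0.7, 2.3, 0.8, 1.9, 0.05, 4.4, 0.15]

kept = select_top_percentile(f_scores, 10)
print(kept)
```

Only the selected voxels are passed on to the SVC, which is where both the speed-up and the noise reduction come from.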

In [None]:
from nilearn.decoding import Decoder
# Here screening_percentile is set to 5 percent
screening_percentile = 5
# since we work on minimally preprocessed data, smoothing should be beneficial
smoothing_fwhm = 4
# For decoding, standardizing is often very important. Hence, standardize=True

decoder = Decoder(estimator='svc', 
                  mask=haxby_dataset.mask, 
                  smoothing_fwhm=smoothing_fwhm,
                  standardize=True, 
                  screening_percentile=screening_percentile,
                  scoring='accuracy')
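As an aside on the `standardize=True` option above: standardizing usually means z-scoring each voxel's time series, so that voxels with large raw amplitudes do not dominate the classifier. Here is a minimal sketch for one voxel, assuming the population standard deviation; the exact convention used by nilearn is not asserted here.

```python
import math


def zscore(series):
    """Rescale a time series to zero mean and unit variance."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / len(series))
    if std == 0:
        # A constant voxel carries no information; map it to zeros
        return [0.0 for _ in series]
    return [(v - mean) / std for v in series]


z = zscore([2.0, 4.0, 6.0])
print(z)
```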

## Fit the decoder and predict



In [None]:
decoder.fit(func_img, conditions)
y_pred = decoder.predict(func_img)

Note that for this classification task both classes contain the same number of samples (the problem is balanced), so we can use accuracy to measure the performance of the decoder. This is done by setting scoring='accuracy'. Let’s measure the prediction accuracy:

In [None]:
print((y_pred == conditions).sum() / float(len(conditions)))

This prediction accuracy score is meaningless. Why?

## Obtain prediction scores via cross validation
Define the cross-validation scheme used for validation. Here we use a
LeaveOneGroupOut cross-validation on the session group, which corresponds to a
leave-one-session-out scheme, then pass the cross-validator object to the cv
parameter of the decoder. For more details please take a
look at: https://nilearn.github.io/dev/auto_examples/00_tutorials/plot_decoding_tutorial.html#sphx-glr-auto-examples-00-tutorials-plot-decoding-tutorial-py
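To see what such a scheme does, here is a plain-Python sketch of the splitting logic: each fold holds out every sample of one session and trains on all the others. The generator `leave_one_group_out` and the toy session labels are illustrative assumptions; in practice the scikit-learn cross-validator does this for us.

```python
def leave_one_group_out(groups):
    """Yield (train_indices, test_indices), holding out one group per fold."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield train, test


# Toy example: 6 samples acquired in 3 sessions
sessions = [0, 0, 1, 1, 2, 2]
folds = list(leave_one_group_out(sessions))
for train, test in folds:
    print("train:", train, "test:", test)
```

Because a test session never contributes samples to training, correlations between samples of the same session cannot inflate the score.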

In [None]:
from sklearn.model_selection import LeaveOneGroupOut
cv = LeaveOneGroupOut()

decoder = Decoder(estimator='svc',
                  mask=haxby_dataset.mask,
                  standardize=True,
                  screening_percentile=5,
                  scoring='accuracy',
                  cv=cv)

# Compute the prediction accuracy for the different folds (i.e. sessions)
# now we need to provide the groups information at fitting time
decoder.fit(func_img, conditions, groups=session_label)

# Print the CV scores
print(decoder.cv_scores_['face'])

There exist other cross-validation schemes: 5-fold, Shuffle Split, etc. Note that it is advisable that the cross-validation structure be consistent with the data organization: e.g. LeaveOneGroupOut or LeavePGroupsOut, where groups can correspond to runs, sessions or individuals. The zoology of cross-validation schemes can be found at https://scikit-learn.org/stable/modules/cross_validation.html

### A note on hyperparameters

Note that when no hyperparameters are specified, the classifier will use default ones, e.g. C=100.

In [None]:
decoder?

In [None]:
print(decoder.cv_params_['face'])

One may instead provide a grid of parameters, among which the classifier will pick the best ones with nested cross-validation. This will impact accuracy.

In [None]:
param_grid = {'C': [1, 10., 100., 1000., 10000.]}
decoder = Decoder(estimator='svc', 
                  mask=haxby_dataset.mask,
                  standardize=True,
                  screening_percentile=5,
                  scoring='accuracy',
                  cv=cv,
                  param_grid=param_grid)
decoder.fit(func_img, conditions, groups=session_label)
print(decoder.cv_scores_['face'])
print(decoder.cv_params_['face'])

## Visualize the results
Look at the SVC's discriminating weights using
:func:`nilearn.plotting.plot_stat_map`



In [None]:
weight_img = decoder.coef_img_['face']
from nilearn.plotting import plot_stat_map, show
plot_stat_map(weight_img,
              bg_img=haxby_dataset.anat[0],
              title='SVM weights',
              dim=-1)
show()

Or we can plot the weights using :func:`nilearn.plotting.view_img` as a
dynamic HTML viewer



In [None]:
from nilearn.plotting import view_img
view_img(weight_img,
         bg_img=haxby_dataset.anat[0],
         title="SVM weights",
         dim=-1)

Saving the results as a Nifti file may also be important.



In [None]:
weight_img.to_filename('haxby_face_vs_house.nii')

## Getting the chance-level accuracy for this data

Does the model above perform better than chance? To answer this question, we measure a score at random using simple strategies that are implemented in the nilearn.decoding.Decoder object. This is useful to inspect the decoding performance by comparing to a score at chance.

Let’s define an object with a dummy estimator replacing ‘svc’ for the classification setting. This object initializes the estimator with the default dummy strategy.

In [None]:
dummy_decoder = Decoder(estimator='dummy_classifier', mask=haxby_dataset.mask, cv=cv)
dummy_decoder.fit(func_img, conditions, groups=session_label)

# Print the per-fold scores; averaging them over folds gives the chance
# level to compare against the SVC scores obtained earlier
print(dummy_decoder.cv_scores_['face'])
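To build intuition for what "chance level" means, here is a sketch of one simple dummy strategy: always predict the most frequent training label. This strategy is an assumption chosen for illustration and is not necessarily the one the Decoder uses by default; the helper name and labels are hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == most_common_label)
    return hits / len(test_labels)


# Balanced toy problem, as in the face vs house task
train_labels = ["face", "house"] * 9
test_labels = ["face", "house"] * 4

chance = majority_baseline_accuracy(train_labels, test_labels)
print(chance)
```

On a balanced two-class problem this baseline sits at 0.5, so cross-validated scores well above that indicate the decoder has picked up real signal.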