# 07 - Second Level Analysis

## Download data and install dependencies

For the group assignments, we will provide you with single-subject contrast maps that you can use to perform the second-level (or: group-level) analysis and report the results. In this notebook, we will work with data included in Nilearn, specifically using the ```fetch_localizer_contrasts()``` function, which gives access to a wide range of functional localizer contrasts (for further information see [here](https://osf.io/vhtf6)).

In [None]:
!pip install nilearn

In [None]:
import numpy as np
import pandas as pd
from nilearn import datasets, plotting, image
from pprint import pprint

For illustration purposes, we will only use the right vs. left button press contrast for a total of 20 participants.

In [None]:
n_subjects = 20

con_left_right = datasets.fetch_localizer_contrasts(["right vs left button press"],
                                                    n_subjects)

In [None]:
print(f"data is stored in: {datasets.get_data_dirs()[0]}")

Now we need to find a way to access the respective contrast maps (one contrast map for each subject). The ```glob``` module from the Python standard library can help us here. It provides a convenient way to list the contents of given paths:

In [None]:
from glob import glob

cmaps = sorted(
    glob(f"{datasets.get_data_dirs()[0]}/brainomics_localizer/brainomics_data/**/*.nii.gz",
         recursive=True)
)

pprint(cmaps)
print(f"\nThere are {len(cmaps)} contrast maps")
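
To make the behaviour of the recursive ```**``` pattern more concrete, here is a small self-contained sketch. It builds a throwaway folder tree (the folder and file names are made up for illustration and do not mirror the real dataset layout) and then lists only the ```.nii.gz``` files in it:

```python
import tempfile
from glob import glob
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one folder per subject, each holding one map and one other file
    for sub in ("S01", "S02"):
        folder = Path(root) / "data" / sub
        folder.mkdir(parents=True)
        (folder / "cmap.nii.gz").touch()
        (folder / "notes.txt").touch()

    # "**" together with recursive=True descends into all subfolders
    found = sorted(glob(f"{root}/**/*.nii.gz", recursive=True))

    # Strip the temporary root so the result is easier to read
    names = [Path(p).relative_to(root).as_posix() for p in found]

print(names)
```

Wrapping the call in ```sorted``` matters here because ```glob``` does not guarantee any particular order, and later steps rely on the order of the contrast maps.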

Before we move on, let's define a BIDS-style subject list (i.e., "sub-01", "sub-02" etc.). We can do this in one line using a [list comprehension](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions) and some [string formatting](https://docs.python.org/3/reference/lexical_analysis.html#f-strings):

In [None]:
subject_list = [f"sub-{i:02d}" for i in range(1, n_subjects + 1)]
pprint(subject_list)
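
The ```:02d``` part of the f-string is what produces the leading zero: it formats the integer with a minimum width of two digits and pads with zeros. A tiny sketch with a shortened, made-up subject count:

```python
# Toy example with three subjects instead of twenty
n_demo = 3
labels = [f"sub-{i:02d}" for i in range(1, n_demo + 1)]
print(labels)

# Numbers that already have two digits are left unchanged
two_digit_label = f"sub-{12:02d}"
print(two_digit_label)
```

The zero padding keeps alphabetical and numerical order identical, so "sub-02" sorts before "sub-10".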

## Basic Second-level model

To remind you, in the first-level analysis we summarized the data using linear contrasts of our predictors (or: regressors). This was done *per* subject and allowed us to model the experimental design. Now, the goal is to use the resulting contrast maps to summarize the evidence across all subjects, which increases statistical power.


### Set up Model

Now that we have our contrast maps in place, we have to define a design matrix (i.e., our independent variables) for the statistical test we want to perform. The design matrix should be specified as a ```pandas``` dataframe. It should have as many rows as there are contrast maps (one contrast map per subject). The contrast values of each voxel serve as the dependent variables; note that we follow a mass univariate approach, just as in the first-level analysis.

For now, we will only include an intercept in our model.

In [None]:
# one sample t-test: only intercept needed

design_matrix = pd.DataFrame([1] * n_subjects,
                             columns = ["intercept"],
                             index = subject_list)

design_matrix

In [None]:
from nilearn.glm.second_level import SecondLevelModel

second_level = SecondLevelModel(n_jobs=2)
second_level = second_level.fit(cmaps, design_matrix = design_matrix)

Great, now that we have fitted our second level design matrix to the first-level contrast maps, we can go ahead and calculate a second-level contrast. Say we want to know whether a right vs. a left button press elicits a statistically significant activation across all subjects. That is, we want to compute the average group level contrast of "right vs. left button press". We can do this by evaluating the intercept contrast:

In [None]:
right_left_avg = second_level.compute_contrast(second_level_contrast = "intercept",
                                               output_type = "z_score")
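
To see why an intercept-only model amounts to a one-sample t-test, here is a minimal NumPy sketch on simulated numbers. The array sizes and the random data are made up for illustration and are unrelated to the localizer dataset. The sketch fits the intercept by ordinary least squares for all voxels at once and compares the resulting t statistic with the textbook one-sample formula. Note that it stops at t values, whereas the cell above requests z-scores.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sub, n_voxels = 20, 5                      # toy sizes, not the real data
Y = rng.normal(loc=0.5, scale=1.0, size=(n_sub, n_voxels))  # simulated contrast values

X = np.ones((n_sub, 1))                      # intercept-only design matrix

# Ordinary least squares, solved for every voxel (column of Y) at once
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)  # shape: (1, n_voxels)

# Residual variance and standard error of the intercept estimate
residuals = Y - X @ beta
dof = n_sub - np.linalg.matrix_rank(X)
sigma2 = (residuals ** 2).sum(axis=0) / dof
se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[0, 0])
t_glm = beta[0] / se

# Classical one-sample t statistic per voxel
t_classic = Y.mean(axis=0) / (Y.std(axis=0, ddof=1) / np.sqrt(n_sub))

print(np.allclose(t_glm, t_classic))
```

The estimated intercept is simply the mean across subjects, which is why the "intercept" contrast can be read as the group average.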

Let's have a look at the resulting statistical map. To get a better feeling for the data, we will set an arbitrary threshold of 3:

In [None]:
plotting.plot_stat_map(right_left_avg, threshold=3)

### Hypothetical group comparison

Let's say we want to compare older vs. younger adults (i.e., compare two groups). To do so, we have to specify our design matrix in a specific way. For illustration purposes, let's just assume the first 10 subjects of the localizer dataset are young adults, while the last 10 subjects are older adults.

We start by including an intercept (just as we did above):

In [None]:
design_matrix_groups = pd.DataFrame([1] * n_subjects,
                                    columns = ["intercept"],
                                    index = subject_list)

As a next step, we initialize columns that will indicate group membership. That is, we will create a new column for each group.

In [None]:
# start by initializing the group dummy columns
design_matrix_groups["group_young"] = 0
design_matrix_groups["group_old"]   = 0

In [None]:
design_matrix_groups

Next, we will define the group membership. Remember that the rows of our design matrix correspond to the respective contrast maps. We know that the first 10 subjects (i.e., the first 10 contrast maps) are young adults, so we will set the first 10 entries of the column ```group_young``` to 1; the remaining rows will stay 0. Analogously, we set the last 10 entries of the ```group_old``` column to 1.

Note: It is very important that the order of the contrast maps matches the respective entries in the design matrix (i.e., make sure that the first 10 contrast maps belong to young adults).

In [None]:
design_matrix_groups.loc[subject_list[:10], "group_young"] = 1
design_matrix_groups.loc[subject_list[10:], "group_old"] = 1

In [None]:
design_matrix_groups
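
One property of this design matrix may be worth checking before fitting: the intercept column equals the sum of the two group columns, so the three columns are not linearly independent. The following sketch rebuilds the same matrix under a different, hypothetical name (```dm_check```) and verifies this with ```np.linalg.matrix_rank```:

```python
import numpy as np
import pandas as pd

n_check = 20
subjects = [f"sub-{i:02d}" for i in range(1, n_check + 1)]

# Same layout as above: intercept plus one dummy column per group
dm_check = pd.DataFrame({"intercept": 1, "group_young": 0, "group_old": 0},
                        index=subjects)
dm_check.loc[subjects[:10], "group_young"] = 1
dm_check.loc[subjects[10:], "group_old"] = 1

# Three columns, but only two of them carry independent information
rank = np.linalg.matrix_rank(dm_check.to_numpy())
redundant = (dm_check["group_young"] + dm_check["group_old"]
             == dm_check["intercept"]).all()

print(f"columns: {dm_check.shape[1]}, rank: {rank}, intercept redundant: {redundant}")
```

In such a design the individual parameters are not uniquely determined, so single-column contrasts should be interpreted with care. Differences between the two groups remain well defined. A common alternative is to drop the intercept when every subject belongs to exactly one group column.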

In [None]:
second_level = SecondLevelModel(n_jobs=2)
second_level = second_level.fit(cmaps, design_matrix = design_matrix_groups)

In practice, information regarding group membership or other participant-specific data is often stored in a separate [participants file](https://bids-specification.readthedocs.io/en/stable/03-modality-agnostic-files.html#participants-file). Thus, you would use this file to set up a second-level design matrix as above.
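
As a rough sketch of how that could look, the snippet below reads a small, entirely made-up participants table (the ```age``` and ```group``` columns and their values are assumptions for illustration; real files will differ) and turns the group labels into dummy columns with ```pd.get_dummies```:

```python
import io
import pandas as pd

# Hypothetical participants.tsv content, inlined so the example is self-contained
participants_tsv = io.StringIO(
    "participant_id\tage\tgroup\n"
    "sub-01\t24\tyoung\n"
    "sub-02\t71\told\n"
    "sub-03\t26\tyoung\n"
    "sub-04\t68\told\n"
)
participants = pd.read_csv(participants_tsv, sep="\t", index_col="participant_id")

# One 0/1 column per group label, named "group_<label>"
dummies = pd.get_dummies(participants["group"], prefix="group").astype(int)

# Put the columns into the same order as in the design matrix above
dm_from_file = dummies[["group_young", "group_old"]].copy()
dm_from_file.insert(0, "intercept", 1)

print(dm_from_file)
```

Whatever the source of the table, the rows still have to be in the same order as the contrast maps passed to the model.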

### Define second level contrasts

Now that we have fitted the model to the data, we have to compute some contrasts in order to make inferences about the data (that is, we assign weights to our parameters). For this, we will set up a contrast matrix.

We start by defining an identity matrix (using ```np.eye```) whose size corresponds to the number of columns in our design matrix (we could also write out the contrast vectors manually, as we did in the first-level analysis notebook):

In [None]:
contrast_matrix = np.eye(design_matrix_groups.shape[1])

In [None]:
contrast_matrix

With this contrast matrix we could test three contrasts. The first contrast (first row in the contrast matrix) would test the average effect across all subjects (intercept set to 1). The second one would test the average in the group of younger adults. Finally, the third one would test the average in the group of older adults.

For the sake of clarity, let's define a dictionary in which we can assign a verbal label to the contrasts. Here, we will also use a list comprehension:

In [None]:
contrasts = dict(
    [(column, contrast_matrix[i]) for i, column in enumerate(design_matrix_groups.columns)]
)

In [None]:
pprint(contrasts)

Cool, but what if we are interested in comparing younger and older adults? For this we would have to define further contrast(s).

Just as in the first level analysis we have to set one group (or condition in the 1st level) to 1 and the other one to -1. We can do this by simply subtracting the contrasts from one another:

In [None]:
contrasts['group_old-young'] =  contrasts['group_old'] - contrasts['group_young']
contrasts['group_young-old'] = -contrasts['group_old'] + contrasts['group_young']

In [None]:
pprint(contrasts)
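
To illustrate what such a weight vector does, the sketch below applies the "old minus young" contrast to a set of made-up parameter estimates for a single voxel (the numbers in ```beta_demo``` are purely hypothetical). A contrast is just the weighted sum of the parameters:

```python
import numpy as np

columns = ["intercept", "group_young", "group_old"]
eye = np.eye(len(columns))

# One unit vector per design-matrix column, then the difference contrast
demo_contrasts = {name: eye[i] for i, name in enumerate(columns)}
demo_contrasts["group_old-young"] = (
    demo_contrasts["group_old"] - demo_contrasts["group_young"]
)

# Hypothetical parameter estimates for one voxel, in design-matrix column order
beta_demo = np.array([0.0, 1.5, 2.0])

effect = demo_contrasts["group_old-young"] @ beta_demo
print(effect)  # 0.5, i.e. 2.0 (old) minus 1.5 (young)
```

Swapping the signs of the weights gives the "young minus old" contrast, whose effect is the same number with the opposite sign.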

Now that we have defined our contrasts, we can go ahead and actually calculate them. We will do this using a simple ```for``` loop:

In [None]:
# Compute a z-map for every contrast and collect the results by name
cmaps_second = {}

for name, weights in contrasts.items():
    print(f"\nRunning {name}...")
    print(" - Calculating contrast...")
    cmaps_second[name] = second_level.compute_contrast(weights, output_type="z_score")
    print("done.")

### Correcting for multiple comparisons

As in the first-level analysis, we need to correct for multiple comparisons in the second-level analysis. Before we do so, let's have a look at the raw z-map (or: the unthresholded map).

*Since the group comparisons were just hypothetical we will only look at the intercept contrast, i.e., the grand average*

In [None]:
disp  = plotting.plot_stat_map(cmaps_second["intercept"],
                               title='Raw z map')

Now, let's correct for multiple comparisons. As you know, there are different approaches to do this. Here, we will just look at the *false discovery rate* (which doesn't mean this is the preferred approach for every research question). Specifically, we will use an alpha level of 0.05 and set the minimum cluster size (```cluster_threshold```) to an arbitrary value of 10 voxels, so that smaller clusters are removed from the thresholded map.

In [None]:
from nilearn.glm import threshold_stats_img

thresholded_map, threshold = threshold_stats_img(
    cmaps_second["intercept"],
    alpha=.05, height_control='fdr', cluster_threshold=10)

print(f"The FDR=.05 threshold is {round(threshold, 3)}")

The FDR=.05 threshold is 3.197
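
For intuition about what an FDR threshold means, the following is a minimal sketch of the Benjamini-Hochberg procedure, a common way to control the false discovery rate, applied to a handful of made-up p-values. It is meant to illustrate the idea only and is not taken from Nilearn; the helper name ```bh_cutoff``` is hypothetical.

```python
import numpy as np

def bh_cutoff(p_values, alpha=0.05):
    """Return the largest p-value that survives Benjamini-Hochberg, or None."""
    p_sorted = np.sort(np.asarray(p_values, dtype=float))
    n = p_sorted.size
    # BH criterion: the k-th smallest p-value must satisfy p_(k) <= alpha * k / n
    passed = p_sorted <= alpha * np.arange(1, n + 1) / n
    if not passed.any():
        return None
    # Everything up to the largest passing rank is declared significant
    return float(p_sorted[np.nonzero(passed)[0].max()])

# Made-up p-values for ten tests
demo_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
cutoff = bh_cutoff(demo_p, alpha=0.05)
print(cutoff)
```

In this toy example only the two smallest p-values survive, even though five of them are below 0.05. This shows how the correction becomes stricter as more tests are carried out.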


In [None]:
plotting.plot_stat_map(thresholded_map, cut_coords=disp.cut_coords,
                       title="Thresholded z map, expected fdr = .05, clusters = 10",
                       threshold=threshold)

Again, there are different approaches to correct for multiple comparisons (for example, non-parametric approaches such as permutation testing are becoming more and more common). Check out the [Nilearn user guide](https://nilearn.github.io/dev/glm/second_level_model.html#multiple-comparisons-correction) to see which routines are implemented in the package.

## Store objects

Since setting up and fitting models as well as computing contrasts can take some time, it would be nice to be able to preserve the results of these computations (especially when working in Colab). Luckily, the ```pickle``` module from Python's standard library can help us here. It provides a way to save Python objects to files.

In [None]:
import pickle

Objects can be stored by using the ```dump``` function. All it takes is the object we want to save as well as a file to write to (the file name needs to be passed to the ```open``` function; here, ```wb``` stands for "write binary").

In [None]:
# the with-statement makes sure the file is closed after writing
with open("/content/second_level.pkl", "wb") as f:
    pickle.dump(second_level, f)

In [None]:
!ls /content

sample_data  second_level.pkl


Next, we unpickle the object. Here, we use ```rb``` (i.e., read binary) as the mode argument to the ```open``` function.

In [None]:
# the with-statement makes sure the file is closed after reading
with open("/content/second_level.pkl", "rb") as f:
    secondlvl_pkl = pickle.load(f)

After unpickling the object, we have access to the usual attributes, methods, and parameters of the second-level object. For example, it still contains our previously defined design matrix:

In [None]:
secondlvl_pkl.design_matrix_

As we have pickled the already fitted object, we can go ahead and evaluate some contrasts (without fitting the model again):

In [None]:
right_left_pkl = secondlvl_pkl.compute_contrast(
    second_level_contrast = "intercept", output_type = "z_score")

In [None]:
plotting.plot_stat_map(right_left_pkl, threshold=3)
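
The save-and-load round trip can also be tried out without a fitted model. The sketch below uses a plain dictionary as a stand-in object (its contents are made up) and a temporary folder, so it runs anywhere and leaves no files behind:

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in object; a fitted model would be passed to dump/load in the same way
results = {"contrast": "intercept", "threshold": 3.0}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "results.pkl"

    # Write the object to disk ("wb" = write binary)
    with open(path, "wb") as f:
        pickle.dump(results, f)

    # Read it back ("rb" = read binary)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == results)
```

Two caveats apply to pickle files in general: only unpickle files from sources you trust, since loading a pickle can execute arbitrary code, and be aware that a pickled object may not load correctly if the library versions differ between saving and loading.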

## Resources

[Nilearn user guide on second level models](https://nilearn.github.io/dev/glm/second_level_model.html#)
Here you can find links to different tutorials covering how to perform second level analyses with Nilearn. Included are (among others) overviews on:


- [Statistical testing of a second-level analysis](https://nilearn.github.io/dev/auto_examples/05_glm_second_level/plot_thresholding.html#sphx-glr-auto-examples-05-glm-second-level-plot-thresholding-py)

- [Basic one sample t test](https://nilearn.github.io/dev/auto_examples/05_glm_second_level/plot_second_level_one_sample_test.html)
  - includes different strategies for multiple comparison correction, e.g. non-parametric inference