
## Inter-Subject Correlation and Inter-Subject Functional Correlation 

Last week, we had you compare the activity between the left and right hemispheres of a single participant (homotopy analyses). We found a high correlation, but we mentioned that there are confounds that mean this correlation isn't **caused** exclusively by the stimulus. For instance, if the participant moved, then voxel activity would change all across the brain at that moment (e.g., because the fMRI acquisition now includes slightly different voxels). Another reason is that the two hemispheres would be correlated even if the participant wasn't doing anything (i.e., they were at rest); this is the basis for resting-state analyses. Today we are going to extend the logic we introduced last week to a case that addresses those confounds.

To understand the logic of this, consider that BOLD activity contains multiple components ([Figure 1a](#fig1)):
1. Task-based/stimulus-evoked signal that is reliable across participants. This is what we care about most of all.  
2. Intrinsic fluctuations in neural activity that are participant-specific. This could be signal (e.g., how that participant in particular processes the movie) or noise (e.g., the participant's movement).  
3. Scanner or physiological noise that can be either shared or participant-specific. Resting state activity falls in this category.    

In the homotopy analyses last week, these three components couldn't be disentangled. [Figure 1b](#fig1) shows the logic of functional connectivity. Functional connectivity is a more general version of the homotopy analyses we did last week. 

Today we will use methods to eliminate #2 and #3. Specifically, we will use intersubject correlation (ISC, [Hasson et al., 2004](https://doi.org/10.1126/science.1089506)) and intersubject functional correlation (ISFC, [Simony et al., 2016](https://doi.org/10.1038/ncomms12141)). ISC and ISFC help isolate #1 because it is the only component that ought to be shared across participants.

[Figure 1c](#fig1) shows the logic of ISC. Rather than correlating brain regions within a participant like functional connectivity, which preserves participant-specific activity and noise, ISC correlates the brains of different participants in order to capture only the activity that is evoked by the stimulus (e.g., a movie). In other words, participant-specific noise will not be common across participants, by definition, so the strength of the correlation measures the degree to which brain activity is driven by the shared stimulus. In ISC, this correlation is computed between every voxel in one brain and the matching voxel in other brains, producing a full-brain map. To simplify the interpretation of ISC analyses, it is typical to compare each individual participant with the average of all other participants.
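
A minimal NumPy sketch of leave-one-out ISC on simulated data may make this logic concrete. The simulated voxels (one driven by a shared "stimulus" timecourse plus participant-specific noise, one containing only participant-specific noise) and the helper name `loo_isc` are hypothetical and for illustration only; this is not BrainIAK's implementation.

```python
import numpy as np

def loo_isc(data):
    """Leave-one-out ISC.

    data: array of shape (n_TRs, n_voxels, n_subjects)
    returns: array of shape (n_subjects, n_voxels)
    """
    n_subjects = data.shape[2]
    iscs = np.zeros((n_subjects, data.shape[1]))
    for s in range(n_subjects):
        left_out = data[:, :, s]
        # Average timecourse of every participant except s
        others = np.delete(data, s, axis=2).mean(axis=2)
        # Pearson r per voxel: z-score each column, then average the products
        z_left = (left_out - left_out.mean(axis=0)) / left_out.std(axis=0)
        z_others = (others - others.mean(axis=0)) / others.std(axis=0)
        iscs[s] = (z_left * z_others).mean(axis=0)
    return iscs

# Simulate 8 participants, 200 TRs, 2 voxels (hypothetical values)
rng = np.random.default_rng(0)
n_sim_TRs, n_subjects = 200, 8
stimulus = rng.standard_normal(n_sim_TRs)  # shared, stimulus-evoked component (#1)
data = np.empty((n_sim_TRs, 2, n_subjects))
for s in range(n_subjects):
    data[:, 0, s] = stimulus + rng.standard_normal(n_sim_TRs)  # shared signal + idiosyncratic noise
    data[:, 1, s] = rng.standard_normal(n_sim_TRs)             # idiosyncratic noise only

iscs = loo_isc(data)
print(iscs.mean(axis=0))  # voxel 0 clearly positive, voxel 1 near zero
```

Only the component shared across participants survives the correlation; the noise that is unique to each participant averages out.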

[Figure 1d](#fig1) shows the logic of ISFC. For ISFC, we compute the correlation of every voxel in one participant with every voxel in another participant (or the average of the other participants). This is called the full ISFC, but you can also do a partial ISFC, in which you compare one voxel to all voxels in the other participants. ISFC is valuable because it can identify coupled activity between voxels that are not aligned across participants (i.e., different locations in the brain).

[Figure 1e](#fig1) shows the full ISFC analysis. You can see the off-diagonal represents similarity of non-identical parts in the brain. Horizontal lines in the matrix are partial ISFC analyses (called ISFC maps here). The diagonal is the same as the ISC (i.e., correlation between the same voxels across participants).
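
The relationship between the full ISFC matrix and ISC can be checked numerically. The sketch below (NumPy only, random simulated data, hypothetical variable names) correlates every voxel of one left-out participant with every voxel of the average of the others, then confirms that the diagonal reproduces the voxel-wise ISC.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sim_TRs, n_voxels, n_subjects = 150, 5, 6
data = rng.standard_normal((n_sim_TRs, n_voxels, n_subjects))

left_out = data[:, :, 0]              # participant 0, shape (n_TRs, n_voxels)
others = data[:, :, 1:].mean(axis=2)  # average of the remaining participants

# np.corrcoef treats rows as variables, so pass voxels as rows.
# The upper-right block holds corr(left-out voxel i, others voxel j).
full = np.corrcoef(left_out.T, others.T)
isfc_matrix = full[:n_voxels, n_voxels:]  # shape (n_voxels, n_voxels)
# Row i: seed voxel i of the left-out participant vs. every voxel of the others
# (i.e., a partial ISFC map for voxel i)

# Voxel-wise ISC: correlate each voxel only with the same voxel
isc_values = np.array([np.corrcoef(left_out[:, v], others[:, v])[0, 1]
                       for v in range(n_voxels)])

print(np.allclose(np.diag(isfc_matrix), isc_values))  # True
```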

<a id="fig1"></a>![image](https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fncomms12141/MediaObjects/41467_2016_Article_BFncomms12141_Fig1_HTML.jpg?as=webp)


In this notebook, we will use ISC and ISFC to identify brain regions that respond preferentially to narrative stories rather than to a random assortment of words (replicating Simony et al., 2016). Furthermore, we will compare functional connectivity analyses to ISFC in terms of how well they distinguish between different tasks (e.g., resting state or narratives), as shown by Simony et al. (2016).

## Goal of this script
1. To understand intersubject correlation (ISC) and its variant: spatial ISC.
2. To understand intersubject functional correlation (ISFC).  
3. To practice data visualization and data handling.
 

## Table of Contents

[1. The ISC-ISFC Workflow](#isc_isfc_wkflow)  

[Introduction to Pieman](#pieman)

[2. ISC](#isc)  

[3. ISFC](#isfc)  

[4. Spatial Correlation](#spat_corr)

#### Exercises
>[1](#ex1)   [2](#ex2)  [3](#ex3)  [4](#ex4)  [5](#ex5)  [6](#ex6)  [7](#ex7)  [8](#ex8)   [9](#ex9)  
>[Novel contribution](#novel)  

In [None]:
# We need to add the utils to the path
import sys
sys.path.insert(0, '..')

# Import utils
from utils import * 

# Specify where you want to save data
results_path = os.getcwd()

## 1. The ISC-ISFC workflow  <a id="isc_isfc_wkflow"></a>


The following sequence of steps is recommended for successfully running ISC and ISFC using [BrainIAK](http://brainiak.org/). ISC and ISFC don't do anything fancy beyond correlating voxel timecourses, so it is possible to write this code yourself; however, BrainIAK provides optimized procedures that make it faster and less memory intensive, as well as wrappers for various statistical tests.

For both analyses, you need to do **data preparation**, in which you take the preprocessed data and a whole-brain mask and return a time by voxel array for each participant (either as a list of arrays or stacked into a single time by voxel by participant array).  

For ISC, you then pass these data to the `isc` function and get back the ISC for each voxel; with the leave-one-out approach used here, you get one row of voxel-wise ISC values per participant. To test ISC statistically, BrainIAK offers several options, including `phaseshift_isc`, `bootstrap_isc`, and `permutation_isc`, but we only focus on the last one.

At the core of ISC's computation is the function `array_correlation` which takes in two 2d arrays (e.g., voxel by time arrays) and correlates row-wise (i.e., the vector of each row from one array is correlated with the same row in the other array) or column-wise, returning a single vector.
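
To see what a column-wise correlation like `array_correlation` computes, here is a NumPy sketch (the function name `columnwise_correlation` is hypothetical, and BrainIAK's own implementation may differ in its details). Z-scoring each column once and averaging the products avoids an explicit loop over voxels:

```python
import numpy as np

def columnwise_correlation(x, y):
    """Pearson r between matching columns of two (n_TRs, n_voxels) arrays."""
    x_z = (x - x.mean(axis=0)) / x.std(axis=0)
    y_z = (y - y.mean(axis=0)) / y.std(axis=0)
    # The mean of the products of z-scores equals Pearson r (with population std)
    return (x_z * y_z).mean(axis=0)

rng = np.random.default_rng(2)
x = rng.standard_normal((100, 4))
y = x + rng.standard_normal((100, 4))
r_vec = columnwise_correlation(x, y)

# Same answer as looping over columns with np.corrcoef
r_loop = np.array([np.corrcoef(x[:, i], y[:, i])[0, 1] for i in range(4)])
print(np.allclose(r_vec, r_loop))  # True
```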

To run ISFC, the `isfc` function also takes in a list of arrays and correlates every voxel in one subject with every voxel in the average of the other subjects. ISFC can be evaluated with the same methods as ISC or with clustering methods, but neither will be used here.

At the core of ISFC's computation is `compute_correlation`, which also takes two 2d arrays but correlates every row of one array with every row of the other, returning a 2d array of correlations (`isfc` produces one such matrix per participant).
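
Similarly, correlating every row of one array with every row of another reduces to a single matrix product once each row is z-scored. This NumPy sketch (hypothetical function name `all_pairs_correlation`) follows the row-by-row convention described for `compute_correlation`, but it is not BrainIAK's code:

```python
import numpy as np

def all_pairs_correlation(a, b):
    """Pearson r between every row of a (m, T) and every row of b (n, T); returns (m, n)."""
    a_z = (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)
    b_z = (b - b.mean(axis=1, keepdims=True)) / b.std(axis=1, keepdims=True)
    # (m, T) @ (T, n) sums z-score products over time; dividing by T gives r
    return a_z @ b_z.T / a.shape[1]

rng = np.random.default_rng(3)
a = rng.standard_normal((3, 120))  # e.g., 3 voxels x 120 TRs
b = rng.standard_normal((5, 120))  # e.g., 5 voxels x 120 TRs
corr = all_pairs_correlation(a, b)

# Compare with the off-diagonal block of np.corrcoef on the stacked rows
reference = np.corrcoef(a, b)[:3, 3:]
print(corr.shape, np.allclose(corr, reference))  # (3, 5) True
```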


**Self-study:** `compute_correlation` and `array_correlation` are faster than a 'for loop' that iterates through each possible voxel pair and computes the correlation. Consider reading the docstrings for these functions ([array_correlation](https://brainiak.org/docs/brainiak.utils.html?highlight=array_correlation#brainiak.utils.utils.array_correlation) and [compute_correlation](https://brainiak.org/docs/brainiak.fcma.html?highlight=compute_correlation#brainiak.fcma.util.compute_correlation)) to learn the 'tricks' they use to accelerate the computation.

## Introduction to the "Pieman" dataset <a id="pieman"></a>

We are going to use the "Pieman" dataset from [Simony et al. (2016)](https://doi.org/10.1038/ncomms12141) today. A description of the dataset is as follows:

>18 native English speakers were scanned (15 females, ages: 18–31), corresponding to the replication dataset from the Pieman study.  
>Stimuli for the experiment were generated from a 7 min real life story (["Pie Man", Jim O'Grady](https://www.youtube.com/watch?v=3nZzSUDECLo)) recorded at a live storytelling performance (["The Moth" storytelling event](https://themoth.org/), New York City). Subjects listened to the story from beginning to end (intact condition).
>In addition, subjects listened to scrambled versions of the story, which were generated by dividing the original stimulus into segments of different timescales (paragraphs and words) and then permuting the order of these segments. To generate the scrambled stimuli, the story was segmented manually by identifying the end points of each word and paragraph. Two adjacent short words were assigned to a single segment in cases where we could not separate them. Following segmentation, the intact story was scrambled at two timescales: short—‘words’ (W; 608 words, 0.7±0.5 s each) and long—‘paragraphs’ (P; 11 paragraphs, 38.1±17.6 s each). Laughter and applause were classified as single word events (4.4% of the words). Twelve seconds of neutral music and 3 s of silence preceded, and 15 s of silence followed, each playback in all conditions. These music and silence periods were discarded from all analyses.

Critically, this experiment resulted in two tasks that we will analyze separately. One is called `intact1` in which participants listened to the intact story and the other is called `word` in which the story was scrambled at the word level, making the story meaningless.

More details about the experiment may be accessed in the methods section of the paper.

In [None]:
# Here is the video, although remember that participants only heard the audio
from IPython.display import YouTubeVideo

YouTubeVideo('3nZzSUDECLo')

## 2. ISC  <a id="isc"></a>

### Data File Preparation

BrainIAK has its own methods to load data. These are used here to speed up analyses, but we could use nibabel instead if we preferred. The key thing we need at the end of data preparation is a 3d array that is time by voxel by participant.

> [load_images](https://brainiak.org/docs/brainiak.html#brainiak.io.load_images): Reads the images for all subjects in the list of file names that you provide.  
> [load_boolean_mask](https://brainiak.org/docs/brainiak.html#brainiak.io.load_boolean_mask): Create a binary mask from a brain volume.  
> [mask_images](https://brainiak.org/docs/brainiak.html#brainiak.image.mask_images): Loads the brain images and masks them with the mask provided.  
> [image.MaskedMultiSubjectData.from_masked_images](https://brainiak.org/docs/brainiak.html?highlight=image%20maskedmultisubjectdata%20from_masked_images#brainiak.image.MaskedMultiSubjectData.from_masked_images): Stacks the masked images into a single time by voxel by participant array, with one slice per subject. This data format is accepted by the BrainIAK ISC and ISFC functions.  
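
Conceptually, masking and stacking are simple array operations. The NumPy sketch below uses a tiny simulated 4D dataset and a random boolean mask (all sizes and names are hypothetical) to produce the time by voxel by participant layout described above; the BrainIAK functions additionally handle file I/O and bookkeeping for you.

```python
import numpy as np

rng = np.random.default_rng(4)
vol_shape, n_sim_TRs, n_subjects = (4, 5, 6), 10, 3

# Hypothetical boolean brain mask and one 4D (x, y, z, time) image per participant
mask = rng.random(vol_shape) > 0.5
toy_images = [rng.standard_normal(vol_shape + (n_sim_TRs,)) for _ in range(n_subjects)]

# Boolean indexing keeps only in-mask voxels: (n_voxels, n_TRs) per participant
masked = [img[mask] for img in toy_images]

# Transpose to (n_TRs, n_voxels) and stack participants along a third axis
bold_sim = np.stack([m.T for m in masked], axis=2)
print(bold_sim.shape)  # (10, <number of in-mask voxels>, 3)
```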


In [None]:
# Set up experiment metadata
print(f'Data directory is: {pieman2_dir}')

# Set up the variable paths
dir_mask = os.path.join(pieman2_dir, 'masks')
mask_name = os.path.join(dir_mask, 'avg152T1_gray_3mm.nii.gz')

# Specify the experiment conditions
all_task_names = ['intact1','word']
all_task_des = ['intact story','word level scramble']

# How many participants are there
n_subjs_total = 18

# Assign the task labels to a number
group_assignment_dict = {task_name: i for i, task_name in enumerate(all_task_names)}

# Where do you want to store the data
dir_out = f'{results_path}/isc/'
if not os.path.exists(dir_out):
    os.makedirs(dir_out)
    print(f'Dir {dir_out} created.')

**Self-study:** In the above block, strings are formatted differently than you have seen elsewhere. This style is called [f-strings](https://www.geeksforgeeks.org/formatted-string-literals-f-strings-python/) and was added in Python 3.6.
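
A short illustration of the f-string format specifiers used in this notebook (the values and variable names are made up for the example):

```python
subj = 7
toy_task = 'intact1'
toy_r = 0.123456

# Expressions inside {} are evaluated; the text after ':' is a format spec
toy_fname = f'sub-{subj:03d}-task-{toy_task}.nii.gz'  # zero-pad to 3 digits
summary = f'r={toy_r:0.3f}'                          # 3 decimal places
print(toy_fname)  # sub-007-task-intact1.nii.gz
print(summary)    # r=0.123
```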

### Making a function

We have been using functions throughout the past few notebooks, but so far the functions have all been hidden away. Python makes it very easy to define functions, and we are going to define one below. The syntax is `def FUNCTION_NAME(ARGUMENTS):` followed by an indented block of `COMMANDS`.
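
For example, a minimal function with one required argument and one default argument could look like this (`make_subject_label` is a hypothetical helper for illustration, not part of the notebook's utilities):

```python
def make_subject_label(subj, prefix='sub'):
    """Return a zero-padded participant label such as 'sub-003'."""
    label = f'{prefix}-{subj:03d}'
    return label

print(make_subject_label(3))               # sub-003
print(make_subject_label(12, prefix='s'))  # s-012
```

Arguments with a default (`prefix='sub'`) can be omitted when calling the function, or overridden by name.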

In [None]:
# Define the function
def get_file_names(data_dir, task_name, max_subjs=np.inf, verbose=False):
    """
    Get all the participant file names from a directory. Note that this function also uses a variable defined outside of it (n_subjs_total)
    
    Parameters
    ----------
    data_dir [str]: the data root dir
    task_name [str]: the name of the task 
    max_subjs [int]: what is the maximum number of subjects you want to use
    verbose [bool]: Do you want to print the participant names as you are getting them
    
    Return
    ----------
    fnames [list]: file names for all subjects
    """
    
    fnames = []
    # Collect all file names 
    for subj in range(1, n_subjs_total + 1): # range excludes its upper bound, so add 1 to include the last participant
        
        # Get the participant name
        fname = f'{data_dir}/sub-{subj:03d}/func/sub-{subj:03d}-task-{task_name}.nii.gz'
        
        # If the file exists
        if os.path.exists(fname):
            # Add to the list of file names 
            fnames.append(fname)
            
            # Do you want to print the name
            if verbose: 
                print(fname) 
        
        # Once you have reached the maximum number of names, stop looking
        if len(fnames) >= max_subjs:  
            break
    
    # Return the variable
    return fnames

<div class="alert alert-block alert-info"><strong>Recommendation</strong>: Creating and using functions is an efficient way to program. Functions are extremely useful for many reasons and so should be used everywhere: they allow you to remove redundancy in your code (e.g., you don't have to duplicate the same code block), functions reduce the likelihood of an error since if you update the function you update all of its uses, and functions make your code much more readable. A useful tutorial on functions is found <a href="https://www.datacamp.com/community/tutorials/functions-python-tutorial">here</a>. 
    
One important thing to be aware of is how variables are shared between your workspace and a function. If you have variables in your workspace (i.e., any variables you have created in the usage of jupyter) then they will usually be accessible/usable in a function, regardless of whether they are used as input parameters. For instance, the function above doesn't define `n_subjs_total` but because it is defined in the workspace, the function can access it.
    
However, any variables you create in a function cannot be used in your workspace if you don't return them as outputs. For this reason it is easier if you keep the names of the variables in your function separate from the names in your workspace. This is turtles all the way down: if you have a function within a function then variables will be shared in the same way. </div>
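
A tiny sketch of both rules (the names `n_total` and `count_remaining` are hypothetical): the function can read a workspace variable, but the variable it creates internally only reaches the workspace through `return`.

```python
n_total = 18  # defined in the workspace (global scope); hypothetical stand-in for n_subjs_total

def count_remaining(n_used):
    # Reading a workspace variable inside a function works without passing it in
    remaining = n_total - n_used  # 'remaining' only exists inside this function
    return remaining

result = count_remaining(9)      # the returned value is what reaches the workspace
print(result)                    # 9
print('remaining' in globals())  # False
```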

Now that our function is set up, we can load the data. We will load the data separately for each task (`intact1` vs. `word`). We are only going to load the data for 9 participants to avoid concerns about memory resources (see below).

In [None]:
# What is the maximum number of file names you will pull out
max_subjs = 9

# Load the brain mask
brain_mask = io.load_boolean_mask(mask_name)

# We use this information later (e.g., affine)
brain_nii = nib.load(mask_name)

# Preset some variables we are going to put data into 
fnames = {}
images = {}
masked_images = {}
bold = {}
group_assignment = []
n_subjs = {}

# Load BOLD data for each task
for task_name in all_task_names: 
    
    # Load all the filenames for this task using the function defined above.
    fnames[task_name] = get_file_names(pieman2_dir, task_name, max_subjs=max_subjs)
    
    # Load all the images from those filenames.
    images[task_name] = io.load_images(fnames[task_name]) 
    
    # Mask the images.
    masked_images[task_name] = image.mask_images(images[task_name], brain_mask) 
    
    # Concatenate all of the masked images across participants  
    bold[task_name] = image.MaskedMultiSubjectData.from_masked_images(masked_images[task_name], 
                                                                      len(fnames[task_name]))
    
    # Convert nans into zeros.
    np.nan_to_num(bold[task_name], copy=False)
    
    # Compute the group assignment label.
    n_subjs_this_task = len(fnames[task_name])
    group_assignment += [group_assignment_dict[task_name]] * n_subjs_this_task # Duplicate the condition label as many times as you have participants
    n_subjs[task_name] = n_subjs_this_task
    
    # Report the summary
    print(f'\nCondition loaded: {task_name}\nShape of array (time by voxel by participant): {np.shape(bold[task_name])}')

<div class="alert alert-block alert-warning">
<strong>Memory limits</strong> We are only running this analysis on half of the participants. Be aware that running this on all 18 participants may push the limits of your memory and computational resources.
</div>

**Exercise 1:**<a id="ex1"></a> Inspect the data you just loaded and answer the following questions.<br>
a. Report the shape of `brain_mask`. <br>
**A:**<br><br>
b. How many voxels are in the brain mask? <br>
**A:**<br><br>
c. Inspect the shape of the `bold` variable. How many subjects do we have for each task condition? Do different subjects have the same number of TRs/voxels?<br>
**A:**<br><br>
d. Visualize `brain_mask` using nilearn plotting tools.<br>

In [None]:
# Insert code here

### Bargain-basement ISC

Before using BrainIAK's tools for ISC, it is helpful to do an easy to understand version of ISC first. In the code below we get three voxels — one in auditory cortex, one in precuneus and one in visual cortex — and we take the timecourse from the first participant for the `intact1` task and correlate it with the average timecourse from the other participants. This is ISC! 

Since participants in this dataset were listening to a story from The Moth, we expect high ISC in auditory cortex and low ISC in visual cortex (i.e., there was no video, so their brain activity shouldn't be synchronized in visual cortex). ISC in the precuneus should be high for the intact condition, since the precuneus is thought to integrate information over long time periods into narratives.

In [None]:
# Get the voxels in the nifti space
voxel_coord = {}
voxel_coord['STC_R'] = [51, 37, 22] # This voxel is in auditory cortex (the superior temporal cortex specifically)
voxel_coord['dPCC'] = [30, 24, 34] # This voxel is in dorsal precuneus
voxel_coord['V1'] = [30, 10, 23] # This voxel is in early visual cortex

# The next step is tricky: it shows another way that masking matters
# We now need to take the coordinates and convert it into a vector index, since the brain data was masked into a 2d matrix
mask_coords = np.asarray(np.where(brain_mask)) # Get all the coordinates where the mask == 1 

# Cycle through the mask voxels and find the index that matches our coordinates, that is the element in the vector that matches
voxel_idx = {}
for region in voxel_coord.keys(): # Iterate through the keys (the three voxel regions)
    
    # Cycle through all voxels
    for voxel_counter in np.arange(mask_coords.shape[1]):
        
        # Does this match
        if np.all(mask_coords[:, voxel_counter] == voxel_coord[region]):
            voxel_idx[region] = voxel_counter
            
            # Quit the loop since you got it
            break
    
    # Report the matching voxel
    print('Index match for %s is %d' % (region, voxel_idx[region]))
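
The loop above works, but a vectorized lookup avoids iterating over every mask voxel in Python. A minimal sketch, assuming (as the loop above does) that voxel order in the masked data matches `np.where(mask)`; `coord_to_masked_index` is a hypothetical helper, not a BrainIAK function:

```python
import numpy as np

def coord_to_masked_index(mask, coord):
    """Position of a 3D voxel coordinate within the masked (flattened) data vector.

    Assumes a boolean mask and that the data were masked in the same order
    as np.where(mask), i.e., with boolean indexing (data[mask]).
    """
    index_vol = np.full(mask.shape, -1, dtype=int)
    index_vol[mask] = np.arange(np.count_nonzero(mask))  # number in-mask voxels in order
    idx = int(index_vol[tuple(coord)])
    if idx < 0:
        raise ValueError(f'{coord} is outside the mask')
    return idx

# Check against the loop-based search on a small random mask
rng = np.random.default_rng(5)
toy_mask = rng.random((6, 7, 8)) > 0.4
toy_coords = np.asarray(np.where(toy_mask))
coord = toy_coords[:, 10]                      # the 11th in-mask voxel
print(coord_to_masked_index(toy_mask, coord))  # 10
```

In the cell above, this would give the same result as the inner loop via `voxel_idx[region] = coord_to_masked_index(brain_mask, voxel_coord[region])`.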

In [None]:
# Iterate through the keys (the three voxel regions)
for region in voxel_coord.keys(): 

    # Get the time course from the 0th participant, which we are calling the left out one
    left_out_participant = bold['intact1'][:, voxel_idx[region], 0]

    # Average across the other participants
    av_participants = np.mean(bold['intact1'][:, voxel_idx[region], 1:], 1)

    # Compute the correlation
    r_val = np.corrcoef(left_out_participant, av_participants)[0, 1]

    print('Correlation of %s voxel for the first participant and the average of the others: r=%0.3f' % (region, r_val))


### ISC in Brainiak <a id="isc_compute"></a>

With that example in mind, we can now use BrainIAK's functions for computing ISC. This is simple and we can observe what the results look like. 

In [None]:
# run ISC, loop over conditions 
print('Running the ISC, this will take a while')
isc_maps = {}
for task_name in all_task_names:
    
    isc_maps[task_name] = isc(bold[task_name], pairwise=False) # setting pairwise to false means it does leave-one-subject-out
    print(f'Shape of {task_name} condition: {np.shape(isc_maps[task_name])}')

The output of ISC is a participant by voxel matrix (one row per left-out participant). We need to get this back into volume space by unmasking the data. Reversing the masking process was a key step in the data handling notebook; if you aren't sure how it works, review that code.
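
As a reminder, unmasking is just the inverse of boolean indexing. This NumPy sketch uses a small hypothetical mask to show that a masked vector can be written back into a volume without losing information (voxels outside the mask stay at zero):

```python
import numpy as np

rng = np.random.default_rng(6)
toy_mask = rng.random((5, 5, 5)) > 0.5
toy_volume = rng.standard_normal((5, 5, 5))

vector = toy_volume[toy_mask]        # masking: volume -> 1D vector of in-mask voxels
restored = np.zeros(toy_mask.shape)  # preset an empty volume
restored[toy_mask] = vector          # unmasking: vector -> volume

print(np.allclose(restored[toy_mask], toy_volume[toy_mask]))  # True
print(np.all(restored[~toy_mask] == 0))                       # True
```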

In [None]:
# Set parameters
subj_id = 0
task_name = 'intact1'
save_data = False

# Make the ISC output a volume
isc_vol = np.zeros(brain_nii.shape)

# Map the ISC data for the first participant into brain space. This is equivalent to unmasking
isc_vol[brain_mask == 1] = isc_maps[task_name][subj_id, :]

To use the plotting tools, we need the data to be in the NIfTI format: a 3d array doesn't carry information like the voxel size and affine that the plotting tools need.
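
The affine is what links array indices to positions in space. A minimal sketch, assuming a hypothetical 3 mm affine (the values are illustrative and are not read from this dataset's header):

```python
import numpy as np

# Hypothetical affine for a 3 mm isotropic image: the diagonal holds the voxel size
# (a negative x entry flips left/right), and the last column holds the offset in mm
affine = np.array([[-3.0, 0.0, 0.0,   90.0],
                   [ 0.0, 3.0, 0.0, -126.0],
                   [ 0.0, 0.0, 3.0,  -72.0],
                   [ 0.0, 0.0, 0.0,    1.0]])

voxel_ijk = np.array([10, 20, 5, 1])  # voxel indices in homogeneous coordinates
xyz_mm = affine @ voxel_ijk
print(xyz_mm[:3])  # x = 60, y = -66, z = -57 (mm)
```

Without the affine (and the rest of the header), a plotting tool has no way to place the array on the template brain.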

In [None]:
# make a nii image of the isc map. It needs to have the header information to know what properties to store the data with
isc_nifti = nib.Nifti1Image(isc_vol, brain_nii.affine, brain_nii.header)

# Save the ISC data as a volume (sample code, so that you don't have to rerun the analysis later)
if save_data: 
    isc_map_path = os.path.join(dir_out, f'ISC_{task_name}_sub{subj_id:02d}.nii.gz')
    nib.save(isc_nifti, isc_map_path)

Now we will visualize the ISC result for one participant and condition. This brain shows us where activity is correlated between participants.

In [None]:
# Plot the data as a statmap
threshold = .25

f, ax = plt.subplots(1,1, figsize = (12, 5))
plotting.plot_stat_map(isc_nifti,
                       threshold=threshold, 
                       axes=ax)
ax.set_title(f'ISC map for subject {subj_id+1}, task = {task_name}') 

**Exercise 2**:<a id="ex2"></a> Confirm that the ISC values for the selected voxels from `voxel_coord` are the same as those reported for our 'Bargain-basement' computation earlier.

In [None]:
# Insert code here

**Exercise 3:** <a id="ex3"></a> Visualize the **ISC maps averaged across all participants**, separately for each condition. For each condition, show the same cut that emphasizes what is different between the conditions. Moreover, use the same threshold and vmax for the visualizations. 

In [None]:
# Make the average ISC map

# Turn those into niftis

# Visualize them using `plotting.plot_stat_map`

This analysis was performed in volumetric space; however, nilearn makes it easy to visualize this data in surface space (assuming the alignment to the MNI standard is excellent). Here's an example of a surface plot.

In [None]:
# Set some plotting parameters. 
subj_id = 0 
task_name = 'intact1'
threshold = .2 
view = 'lateral'

# Make the ISC output a volume
isc_vol = np.zeros(brain_nii.shape)

# Map the ISC data for the first participant into brain space
isc_vol[brain_mask == 1] = isc_maps[task_name][subj_id, :]

# Make a nii image of the isc map 
isc_intact_1subj = nib.Nifti1Image(isc_vol, brain_nii.affine, brain_nii.header)

# Plot 
title_text = f'ISC map, {task_name}, participant {subj_id + 1}'

_=plotting.plot_img_on_surf(isc_intact_1subj, views=[view], # this shows only the lateral view
                              title=title_text,
                              inflate=True, 
                              alpha=1, darkness=1, cmap='RdYlBu_r',
                              threshold=threshold)


**Exercise 4:** <a id="ex4"></a> Using the **ISC maps averaged across participants**, visualize the surface for both conditions in both the medial and lateral views (i.e., you should have 4 plots). Make sure you use the same threshold (0.2) and vmax (0.6) for all plots, and add titles to each plot. 

In [None]:
# Insert code here

**Exercise 5:** <a id="ex5"></a> Based on your answers to the last two questions, what are some brain regions showing stronger correlation in the intact story condition than the word-level scramble condition? Use [this website](https://www.ebrains.eu/tools/human-brain-atlas) if you aren't familiar with the labels of brain regions.  What does this tell us about the processing of language? 

Hint: There are a few papers that discuss this type of contrast; this [paper](https://doi.org/10.1523/JNEUROSCI.3684-10.2011) may help.

**A:**

### ISC with statistical tests  <a id="isc_stats"></a>

BrainIAK provides several nonparametric statistical tests for ISC analysis ([Nastase et al., 2019](https://academic.oup.com/scan/article/14/6/667/5489905)). Nonparametric tests are preferred over parametric tests (i.e., converting an r value into a p value with a table) due to the inherent non-independence across participants — each subject contributes to the ISC of other subjects, violating assumptions of independence required for standard parametric tests (e.g., t-test, ANOVA). Moreover, the voxels are also not independent from each other due to spatial smoothing. 

**Self-study:** Additional resources on parametric versus nonparametric statistics [here](https://www.ibm.com/docs/en/db2woc?topic=nonparametric-background).

We are going to use permutation testing (a nonparametric method) to contrast the ISC maps in the intact story condition (`intact1`) with the word-level scrambled condition (`word`).

#### Permutation test
Permutation tests are used to compute a null distribution of values via the following steps:

1. Prepare the data. We have already done the hard part by making the ISC for each condition. Now we just have to concatenate the conditions. To do the concatenation, we will stack the data to make the `isc_maps_all_tasks` variable.  

2. The permutation test involves permuting the condition labels across participants to simulate the randomization of conditions. To do this, we first need to match up the condition labels with the correct data. We prepared such a list of assignments when we loaded the data and stored the information in the variable: `group_assignment`.
 
3. The next steps are executed internally in BrainIAK in the function `permutation_isc`. The steps are:    
> - For each permutation iteration: 
>> - BrainIAK permutes the group assignment for each subject (i.e., it randomly reassigns the condition labels).  
>> - A summary statistic of the ISC values (here the median, set by `summary_statistic`) is then computed for each shuffled group.   
>> - The difference in this summary statistic between the conditions is computed for each voxel.  
> - The differences from all iterations are collected and form the null distribution, with a separate distribution for each voxel.   
> - We then compare the *real* difference between conditions with the distribution of *permuted* differences. The proportion of permuted differences at least as extreme as the real one is the *p* value in non-parametric testing. The function returns the observed difference, the *p* values, and the null distribution for each voxel in the brain.

<div class="alert alert-block alert-warning">
    <strong>Why is the percentile the <em>p</em> value?</strong> The percentile tells us how likely we are to get a result at least as extreme as the real value if there is no difference between conditions (in which case shuffling the condition labels should change nothing systematic). This is the same definition as a <em>p</em> value in parametric testing.
</div>
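
The permutation steps above can be sketched in a few lines of NumPy. This is a simplified, hypothetical re-implementation for intuition only (two groups, median summary statistic, right-tailed *p* value with the common +1 correction); it is not BrainIAK's `permutation_isc`, which supports more designs and options. The function name and the simulated ISC values are assumptions.

```python
import numpy as np

def permutation_diff_test(iscs, groups, n_permutations=1000, seed=0):
    """Two-group permutation test on per-participant ISC values.

    iscs   : (n_participants, n_voxels) array of ISC values
    groups : (n_participants,) array of 0/1 condition labels
    Returns the observed median difference (group 0 minus group 1),
    right-tailed p values, and the null distribution.
    """
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)

    def median_diff(labels):
        return (np.median(iscs[labels == 0], axis=0)
                - np.median(iscs[labels == 1], axis=0))

    observed = median_diff(groups)
    null = np.empty((n_permutations, iscs.shape[1]))
    for i in range(n_permutations):
        null[i] = median_diff(rng.permutation(groups))  # shuffle condition labels
    # Proportion of permuted differences at least as large as the observed one;
    # the +1 counts the observed labeling itself, so p is never exactly zero
    p = (np.sum(null >= observed, axis=0) + 1) / (n_permutations + 1)
    return observed, p, null

# Simulated ISC values: 9 participants per condition, 5 voxels
rng = np.random.default_rng(7)
n_per_group, n_voxels = 9, 5
intact = rng.normal(0.1, 0.05, size=(n_per_group, n_voxels))
intact[:, 0] += 0.4                   # voxel 0: higher ISC in the "intact" group
word = rng.normal(0.1, 0.05, size=(n_per_group, n_voxels))
iscs_sim = np.vstack([intact, word])  # participants stacked, as with np.vstack below
sim_labels = np.array([0] * n_per_group + [1] * n_per_group)

obs_sim, p_sim, null_sim = permutation_diff_test(iscs_sim, sim_labels, n_permutations=1000)
print(np.round(obs_sim, 2))
print(p_sim)  # p_sim[0] should be small; the other voxels should not be
```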


In [None]:
# Concatenate ISCs from both tasks
isc_maps_all_tasks = np.vstack([isc_maps[task_name] for
                                task_name in all_task_names])

# Report the condition labels and the shape of the stacked ISC maps
print(f'group_assignment: {group_assignment}')
print(f'isc_maps_all_tasks: {np.shape(isc_maps_all_tasks)}')

<div class="alert alert-block alert-warning">
    <strong>Usage of hstack and vstack:</strong> `np.hstack` stacks arrays in sequence horizontally (column wise) and vstack stacks arrays in sequence vertically (row wise). More details can be found at <a href="https://docs.scipy.org/doc/numpy/reference/generated/numpy.hstack.html">np.hstack</a> and <a href="https://docs.scipy.org/doc/numpy/reference/generated/numpy.vstack.html">np.vstack</a>.
</div>
    

In [None]:
# Run permutation tests with Brainiak
n_permutations = 1000 # Normally you'd run >10**4 permutations, but we are running fewer for time's sake
summary_statistic = 'median' # Summarize the ISC values within each group using the median

print('Running permutations, this will take a minute')
observed, p, distribution = permutation_isc(
    isc_maps_all_tasks, 
    pairwise=False,
    group_assignment=group_assignment, 
    summary_statistic=summary_statistic,
    n_permutations=n_permutations
)

**Exercise 6:** <a id="ex6"></a> Interpret the results of the permutation test.  
**Q:** What do the outputs `observed`, `p`, `distribution` each mean? 

**A:**

**Q:** Make a plot that visualizes `p` on the brain. Specifically, set a threshold of `p<0.01`. To do this, follow the pseudo code below and refer to the Data Handling notebook on how to 'reverse' masking:

In [None]:
# Insert code here

# Flip the p values so that a low p value becomes high by doing 1 - p

# Preset a volume with zeros that is the same shape as `brain_mask`

# In the voxels where `brain_mask == 1`, add the `p` values (Hint: we referred to this as unmasking above)

# Make a nii image out of this newly created volume. (Hint: we did this step after running the ISC above)

# Visualize the nifti, setting the threshold to greater than 0.99 (which corresponds to p<0.01)

## 3. ISFC  <a id="isfc"></a>

ISFC is akin to functional connectivity (FC), in that the goal is to find out how regions are correlated with each other. Whereas FC is computed within individuals, ISFC is computed between individuals. Hence the only correlations that should be robust in ISFC are those that are present across individuals and driven by the shared stimulus. This makes the connectivity identified with ISFC a more trustworthy measure of stimulus-driven coupling than the connections found with FC (refer back to Figure 1 at the top of the notebook).

In this section, we will compare FC with ISFC on the Pieman data.  At the end of the exercises, you will qualitatively replicate [Simony et al. (2016)](https://doi.org/10.1038/ncomms12141), showing that ISFC is sensitive to the experimental condition participants are in.

### Parcellate the data  <a id="isfc_parcel"></a>

ISFC in voxel space is too computationally intensive for this notebook (and for many use cases). This is because ISFC in voxel space means computing the correlation between all voxels (about 30,000) with all other voxels, for each participant. Hence, we will simplify the ISFC analysis by dividing the brain into a smaller number of parcels. We are going to use predefined ROI masks to select the voxels. When you first run this command, it is going to download the parcellation into your home directory.

In [None]:
# Load a parcellation
atlas = datasets.fetch_atlas_harvard_oxford('cort-maxprob-thr25-2mm', symmetric_split=True)
plotting.plot_roi(atlas.maps, title='Harvard-Oxford Parcellation')

# Get the labels that we will use
labels = atlas.labels[1:] # Ignore the first entry since it is 'background'

# Get the shape of the data you are going to make
n_regions = len(labels) # rm background region 
n_TRs = np.shape(bold[task_name])[0]

print(f'Number of voxels:\t {np.shape(bold[task_name])[1]}')
print(f'Number of parcels:\t {n_regions}')

We will now mask the BOLD data according to the ROI parcels using nilearn's `NiftiLabelsMasker`, which averages the voxels within each parcel. As you can see, this works well when there are many individual masks to apply.
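
Conceptually, the parcel step averages the timecourses of all voxels that share a label. A NumPy sketch with a hypothetical voxel-wise label vector (0 = background); the real masker also handles resampling between the atlas and the data, which this sketch ignores:

```python
import numpy as np

def parcel_average(data, labels):
    """Average voxel timecourses within each parcel.

    data  : (n_TRs, n_voxels) array
    labels: (n_voxels,) integer parcel label per voxel; 0 is background
    returns (n_TRs, n_parcels) array, columns ordered by label value
    """
    parcel_ids = np.unique(labels[labels > 0])
    return np.column_stack([data[:, labels == p].mean(axis=1) for p in parcel_ids])

rng = np.random.default_rng(8)
toy_data = rng.standard_normal((20, 9))                # 20 TRs x 9 voxels
toy_labels = np.array([0, 1, 1, 2, 2, 2, 3, 0, 3])     # hypothetical parcel labels
parcels = parcel_average(toy_data, toy_labels)
print(parcels.shape)  # (20, 3)
```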

In [None]:
# Get a masker for the atlas. 
masker_HarOxf = NiftiLabelsMasker(labels_img=atlas.maps)

# Preallocate the parcel array (time by parcel by participant)
bold_HarOxf = {}
for task_name in all_task_names:
    bold_HarOxf[task_name] = np.zeros((n_TRs, n_regions, n_subjs[task_name])) 

# Transform the data to the parcel space.
for task_name in all_task_names:
    for subj_id in range(n_subjs[task_name]):
        
        # Load the nifti object
        nii_t_s = nib.load(fnames[task_name][subj_id])
        
        # Apply the masker, which involves averaging all voxels within each region
        bold_HarOxf[task_name][:,:,subj_id] = masker_HarOxf.fit_transform(nii_t_s)
       
    print('Finished loading in', task_name)        
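
Conceptually, with its default `strategy='mean'`, `NiftiLabelsMasker` reduces each parcel to a single time course by averaging over the voxels that carry that parcel's label. The NumPy sketch below mimics that step on a toy array. The function `parcel_average` and the toy numbers are hypothetical, and the sketch skips the resampling, standardization, and other options the real masker supports.

```python
import numpy as np

def parcel_average(bold, label_per_voxel, n_labels):
    """Average voxel time courses within each labeled parcel (hypothetical helper).

    bold: array of shape (n_TRs, n_voxels)
    label_per_voxel: int array of shape (n_voxels,), where 0 means background
    n_labels: number of non-background labels, assumed to be 1..n_labels
    Returns an array of shape (n_TRs, n_labels).
    """
    out = np.zeros((bold.shape[0], n_labels))
    for label in range(1, n_labels + 1):
        in_parcel = label_per_voxel == label          # voxels belonging to this parcel
        out[:, label - 1] = bold[:, in_parcel].mean(axis=1)
    return out

# Toy example: 3 TRs, 5 voxels, 2 parcels, and one background voxel (label 0)
bold_toy = np.array([[1., 3., 10., 20., 99.],
                     [2., 4., 30., 40., 99.],
                     [0., 2., 50., 60., 99.]])
labels_vec = np.array([1, 1, 2, 2, 0])
parcels = parcel_average(bold_toy, labels_vec, n_labels=2)
print(parcels)
```

The background voxel (value 99) never enters any parcel average, just as label 0 is ignored by the masker.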


### Compute functional connectivity 

Compute all pairwise correlations between regions **within** each participant.

In [None]:
# Preset volume
fc_maps_HarOxf = {}
for task_name in all_task_names:
    fc_maps_HarOxf[task_name] = np.zeros((n_regions, n_regions)) 

# Loop through task names
for task_name in all_task_names: 
    # Loop through the subjects
    for subj_id in range(n_subjs[task_name]):
        
        # The FC map for a given task & subject is the correlation between all regions' bold data, for that task. You can use np.corrcoef to do this easily
        fc_maps_HarOxf[task_name] += np.corrcoef(bold_HarOxf[task_name][:,:,subj_id].T)
        
    fc_maps_HarOxf[task_name] /= n_subjs[task_name] # now average over the number of subjects
    np.fill_diagonal(fc_maps_HarOxf[task_name], np.nan) # and make sure the diagonal is nans, since they are meaningless otherwise    

### Compute ISFC

Compute all pairwise correlations between regions **between** participants.
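
To make the difference between FC and ISFC concrete, here is a minimal NumPy sketch on synthetic data. All names and numbers are hypothetical, and this is not brainiak's implementation, which also supports pairwise comparisons, summary statistics such as the median, and NaN handling. Two regions follow a shared "stimulus" signal; two others follow a subject-specific "motion" signal that is not shared across participants. Within-subject FC shows both pairs as strongly coupled, whereas leave-one-out ISFC keeps only the stimulus-driven coupling. Symmetrizing each subject's matrix by averaging it with its transpose is one common choice, assumed here.

```python
import numpy as np

def zscore_time(x):
    # z-score along the first (time) axis, using the population std (ddof=0)
    return (x - x.mean(axis=0)) / x.std(axis=0)

def fc_within(data):
    """Within-subject region-by-region correlation, averaged over subjects.
    data: (n_TRs, n_regions, n_subjs)"""
    n_subjs = data.shape[2]
    return np.mean([np.corrcoef(data[:, :, s].T) for s in range(n_subjs)], axis=0)

def isfc_loo(data):
    """Leave-one-out ISFC: correlate each subject's regions with the mean of the
    other subjects' regions, symmetrize, then average across subjects."""
    n_TRs, n_regions, n_subjs = data.shape
    mats = []
    for s in range(n_subjs):
        left_out = zscore_time(data[:, :, s])
        others = zscore_time(np.delete(data, s, axis=2).mean(axis=2))
        r = left_out.T @ others / n_TRs   # Pearson r for every (region i, region j) pair
        mats.append((r + r.T) / 2)
    return np.mean(mats, axis=0)

# Synthetic data in the notebook's (TRs, regions, subjects) layout
rng = np.random.default_rng(0)
n_TRs_toy, n_regions_toy, n_subjs_toy = 300, 4, 10
data = 0.5 * rng.standard_normal((n_TRs_toy, n_regions_toy, n_subjs_toy))  # independent noise
stimulus = rng.standard_normal(n_TRs_toy)               # shared by all subjects
data[:, 0:2, :] += stimulus[:, None, None]              # regions 0 and 1 follow the stimulus
motion = rng.standard_normal((n_TRs_toy, n_subjs_toy))  # different for every subject
data[:, 2:4, :] += motion[:, None, :]                   # regions 2 and 3 follow each subject's motion

fc = fc_within(data)
isfc_mat = isfc_loo(data)
print('FC   (0,1): %.2f   (2,3): %.2f' % (fc[0, 1], fc[2, 3]))
print('ISFC (0,1): %.2f   (2,3): %.2f' % (isfc_mat[0, 1], isfc_mat[2, 3]))
```

Because the motion signal is independent across participants, it averages out in the "other subjects" time course, so it cannot drive the ISFC.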

In [None]:
isfc_maps_HarOxf = {}
for task_name in all_task_names:
    isfc_maps_HarOxf[task_name] = isfc(data=bold_HarOxf[task_name],
                                   summary_statistic='median',
                                   vectorize_isfcs=False)   

Wow, that ISFC calculation was fast! It takes much longer on the whole brain because you go from about 10^4 correlations (roughly 100 × 100 parcel pairs) to about 10^9 correlations (30,000 × 30,000 voxel pairs). The `isfc` function is efficient, but at that scale the computation is not tractable within this notebook.
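
The scaling argument is simple arithmetic. The sketch below uses round numbers as assumptions (about 100 parcels and the roughly 30,000 voxels quoted above); a full region-by-region matrix has n × n entries, and leave-one-out ISFC computes one such matrix per participant.

```python
n_parcels = 100      # assumed round number, close to the parcel count used above
n_voxels = 30_000    # approximate voxel count quoted in the text

parcel_pairs = n_parcels ** 2   # entries in one parcel-by-parcel matrix
voxel_pairs = n_voxels ** 2     # entries in one voxel-by-voxel matrix

print(f'{parcel_pairs:,} parcel pairs vs {voxel_pairs:,} voxel pairs')
print(f'voxel space is {voxel_pairs // parcel_pairs:,} times more correlations')
```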

**Exercise 7:** <a id="ex7"></a> First average the FC and ISFC matrices across participants for each condition separately. Then visualize the FC & ISFC matrices for each condition (i.e., make 4 figures). You will want to use `plt.imshow` to make these figures. Set the tick labels to be the ROI labels from the atlas (`labels`). Moreover, add axis labels explaining each axis. Add a colorbar and set the color limit to be the same across the plots so that you can compare the range. Remember, google is your friend!

In [None]:
# Insert code here


**Exercise 8:**<a id="ex8"></a> Comment on the differences between the FC and ISFC matrices for each of the conditions.

**A:**

### Plotting a connectome

Connectomes reflect the connection strengths between regions, so we can visualize FC or ISFC matrices as connectomes, just as we could for DTI results. Nilearn has some beautiful tools for plotting connectomes.

`plotting.plot_connectome` takes a node-by-node correlation matrix (e.g., what you get from ISFC) and a node-by-coordinate matrix (i.e., where each region in the atlas sits in the brain) and then draws a connectome. The `edge_threshold` argument can be used to show only the strongest connections.
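
A percentile threshold such as `edge_threshold='95%'` keeps only edges whose absolute weight lies above that percentile. The sketch below imitates the idea on the unique off-diagonal pairs of a symmetric matrix; the helper `edges_above_percentile` is hypothetical, and nilearn's exact choice of which entries enter the percentile (and how it treats the diagonal or NaNs) may differ.

```python
import numpy as np

def edges_above_percentile(matrix, percentile=95.0):
    """Return (i, j, weight) for unique region pairs whose |weight| exceeds the
    given percentile of |weight| over all unique off-diagonal pairs."""
    i_upper, j_upper = np.triu_indices_from(matrix, k=1)  # each pair counted once
    weights = matrix[i_upper, j_upper]
    cutoff = np.percentile(np.abs(weights), percentile)
    keep = np.abs(weights) > cutoff
    return list(zip(i_upper[keep], j_upper[keep], weights[keep]))

# Toy 5-region matrix: weak edges everywhere except one strong negative edge
m = np.full((5, 5), 0.05)
m[1, 3] = m[3, 1] = -0.9
np.fill_diagonal(m, 1.0)
edges = edges_above_percentile(m, 95)
print(edges)
```

Because the threshold is applied to absolute values, strong negative connections survive as well as strong positive ones.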

The first step is to get the coordinates of each ROI so that you have a mapping between matrix rows and locations in the brain (in standard space).

In [None]:
# Load the atlas as a volume
atlas_vol = atlas.maps.get_fdata()

# Get the unique values from the atlas
vals = np.unique(atlas_vol)

# Iterate through all of the ROIs
coords = []
for label_id in vals:
    
    # Skip the background
    if label_id == 0:
        continue
        
    # Pull out the voxels that belong to this ROI
    roi_mask = (atlas_vol == label_id)
    
    # Create a nifti object so it can be read by the cut coords algorithm
    nii = nib.Nifti1Image(roi_mask.astype('int16'), atlas.maps.affine)
    
    # Find a representative coordinate (centre of mass) for this ROI
    coords.append(plotting.find_xyz_cut_coords(nii))
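
What the loop above computes can be approximated with plain NumPy: take the voxel indices of each label, average them, and map the result to millimetres with the image affine. The hypothetical `label_centroids_mm` below uses the plain centroid, whereas `find_xyz_cut_coords` may instead use the centre of mass of the largest connected component, so the two can differ for irregular or split regions. Nilearn also provides `plotting.find_parcellation_cut_coords`, which may be a convenient one-call alternative; check its documentation for your nilearn version.

```python
import numpy as np

def label_centroids_mm(label_vol, affine):
    """Centroid of each non-zero label in world (mm) coordinates (hypothetical helper)."""
    centroids = []
    for label_id in np.unique(label_vol):
        if label_id == 0:                                   # skip background
            continue
        ijk = np.argwhere(label_vol == label_id)            # voxel indices, shape (n_vox, 3)
        centre_ijk = ijk.mean(axis=0)                       # centroid in voxel space
        xyz = affine[:3, :3] @ centre_ijk + affine[:3, 3]   # voxel space -> mm
        centroids.append(xyz)
    return np.array(centroids)

# Toy volume with 2 mm voxels and an origin offset of -10 mm on each axis
vol = np.zeros((4, 4, 4), dtype=int)
vol[0, 0, 0] = vol[2, 0, 0] = 1   # label 1: two voxels, centroid at voxel (1, 0, 0)
vol[3, 3, 3] = 2                  # label 2: a single voxel
toy_affine = np.diag([2.0, 2.0, 2.0, 1.0])
toy_affine[:3, 3] = -10.0
coords_demo = label_centroids_mm(vol, toy_affine)
print(coords_demo)
```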
    

Now plot the connectome for the ISFC data from the `intact1` condition.

In [None]:
# Plot the connectome
plotting.plot_connectome(isfc_maps_HarOxf['intact1'], 
                         coords, 
                         edge_threshold='95%');

To help you understand this figure: the dots correspond to the parcels of the atlas in which we computed ISFC, shown at their locations in the brain. The brain here is a little different from what we have looked at before: it is a glass brain. Imagine looking at a transparent brain with dots inside; you would see not only the dots on the side close to you but also those on the far side. For example, in the coronal view you see dots from both posterior and anterior regions, projected onto the same plane. The red lines connecting the dots are connections that pass the threshold (in our case, the top 5% of connections by absolute strength). You can see that the majority of connections are between regions in the temporal lobe, specifically the auditory cortex, but there are also connections to the precuneus.

### Next steps

Outside the scope of today's notebook are the many additional analyses you can do on ISFC data. One useful set of analyses is clustering, to see whether groups of regions in the held-out participant correlate with groups of regions in the other participants. For instance, you can use these matrices as inputs to multi-dimensional scaling or t-SNE analyses. Similarly, you can assess the hierarchical structure of the clusters using dendrograms.

All of these analyses are at your fingertips in Python. Below we list some of the functions you can call. Here `isfc_data` stands for a single region-by-region ISFC matrix, and `1 - isfc_data` turns correlations into distances:

Create a t-SNE plot:  
`from sklearn.manifold import TSNE`  
`dist = 1 - isfc_data; np.fill_diagonal(dist, 0)`  
`tsne = TSNE(n_components=2, metric="precomputed", init="random", random_state=0)`  
`embedding = tsne.fit_transform(dist)`  
`plt.scatter(embedding[:, 0], embedding[:, 1])`  

Do single linkage clustering and then create a dendrogram (using the same `dist`):  
`from scipy.cluster.hierarchy import linkage, dendrogram`  
`from scipy.spatial.distance import squareform`  
`linkage_clustering = linkage(squareform(dist, checks=False), method='single')`  
`dn = dendrogram(linkage_clustering)`  
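
One detail worth knowing before clustering an ISFC matrix: scipy's `linkage` interprets a square 2-D array as a set of observation vectors rather than as a distance matrix, so a correlation-distance matrix should first be converted to condensed form with `squareform`. The sketch below demonstrates this on a synthetic stand-in for an ISFC matrix (six hypothetical "regions" forming two groups). It uses average linkage because Ward's method assumes Euclidean distances, which correlation distances are not.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
n_TRs_toy = 200
sig_a, sig_b = rng.standard_normal((2, n_TRs_toy))
# Regions 0-2 share signal A and regions 3-5 share signal B, plus independent noise
ts = np.column_stack([sig_a] * 3 + [sig_b] * 3) + 0.5 * rng.standard_normal((n_TRs_toy, 6))
corr = np.corrcoef(ts.T)                     # stand-in for an ISFC matrix

dist = 1 - corr                              # correlation distance
dist = (dist + dist.T) / 2                   # enforce symmetry
np.fill_diagonal(dist, 0)                    # zero self-distance
condensed = squareform(dist, checks=False)   # condensed vector of unique pairs

tree = linkage(condensed, method='average')
cluster_ids = fcluster(tree, t=2, criterion='maxclust')  # cut the tree into 2 clusters
print(cluster_ids)
```

Passing `tree` to `dendrogram` draws the same hierarchy.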

## 4. Spatial pattern correlation across subjects <a id="spat_corr"></a>


### Spatial inter-subject correlation  <a id="spatial_isc"></a>

We can apply the idea of inter-subject correlation to ask a different question. Rather than asking *where* in the brain participants are similar (as is done with ISC), we can ask *when* in the story participants are similar.

Traditional ISC is computed by correlating each voxel's time course in one participant with the same voxel's time course in the other participants (or their average). Instead, we could correlate the pattern of activity across voxels at a specific time point: how does one participant's voxel pattern at that time point correlate with the average voxel pattern of the other participants at the same time point? By doing this for each time point, we can generate a time course of these correlations to observe the general ebb and flow of coupling in brain activity across participants.

This can be done simply by swapping the voxel and time dimensions before running ISC. If we have data in the format (TRs, voxels, subjects), we can use `.transpose(1,0,2)`, where the indices give the new order of the array's dimensions. This operation results in an array in the format (voxels, TRs, subjects), so each TR is now treated the way a voxel is treated in standard ISC.
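
The transposition trick can be sketched directly in NumPy. The hypothetical `spatial_isc_loo` below uses a leave-one-out scheme: for every subject and TR it correlates that subject's voxel pattern with the mean pattern of the remaining subjects at the same TR. It is meant to mirror what passing transposed data to brainiak's `isc` does with its default leave-one-out setting, but it is a simplified stand-in, not that library's code. The toy data are synthetic: voxel patterns are shared across subjects for the first 10 TRs and idiosyncratic afterwards.

```python
import numpy as np

def spatial_isc_loo(data):
    """Leave-one-out spatial ISC.
    data: (n_TRs, n_voxels, n_subjs). Returns (n_subjs, n_TRs): the correlation
    between each subject's voxel pattern and the other subjects' mean pattern."""
    patterns = data.transpose(1, 0, 2)       # -> (n_voxels, n_TRs, n_subjs)
    n_voxels, n_TRs, n_subjs = patterns.shape
    out = np.zeros((n_subjs, n_TRs))
    for s in range(n_subjs):
        others = np.delete(patterns, s, axis=2).mean(axis=2)   # (n_voxels, n_TRs)
        for t in range(n_TRs):
            out[s, t] = np.corrcoef(patterns[:, t, s], others[:, t])[0, 1]
    return out

rng = np.random.default_rng(0)
n_TRs_toy, n_voxels_toy, n_subjs_toy = 20, 50, 8
data = rng.standard_normal((n_TRs_toy, n_voxels_toy, n_subjs_toy))  # idiosyncratic patterns
shared = rng.standard_normal((10, n_voxels_toy))                    # patterns everyone shares
data[:10] += 2 * shared[:, :, None]                                 # only in the first 10 TRs

sisc = spatial_isc_loo(data)
timecourse = sisc.mean(axis=0)   # one spatial ISC value per TR
print(np.round(timecourse, 2))
```

The resulting time course is high during the shared TRs and near zero afterwards, which is exactly the kind of ebb and flow described above.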

You can do spatial ISC on the whole brain, but that is often less interesting than doing it on a masked portion of the brain. Below we will revisit the regions that we defined previously in `voxel_coord` and look at the spatial correlations in those regions.

### Why would spatial ISC be useful

Imagine an experiment (which has been done by [these authors](https://journals.sagepub.com/doi/10.1177/0956797616682029)) in which all participants hear the same story, but each is given one of two interpretive frameworks for it. Specifically, half of the participants think they are hearing a story about a jealous and anxious man whose wife happens to be out of the house, while the other half think they are hearing a story about a cuckolded man whose wife is cheating on him.

You might expect that, for most of the story, the interpretation has no effect on the way participants process it. However, there will be some events in the story, such as when the wife is mentioned, at which the neural representations diverge greatly.

Spatial ISC gives you a time-resolved way to pinpoint when those neural responses diverge, perhaps reflecting when the interpretations diverge.
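
One simple way to look for such divergence, sketched here under stated assumptions rather than as the analysis of the cited study, is to correlate the two groups' mean voxel patterns at every TR and flag the TRs where that correlation drops. The helper `between_group_pattern_corr`, the group sizes, the divergent TRs 10 to 14, and the 0.5 cutoff are all hypothetical choices for synthetic data.

```python
import numpy as np

def between_group_pattern_corr(group_a, group_b):
    """Per-TR correlation between two groups' mean voxel patterns.
    group_a, group_b: (n_TRs, n_voxels, n_subjs_in_group)."""
    mean_a = group_a.mean(axis=2)
    mean_b = group_b.mean(axis=2)
    return np.array([np.corrcoef(mean_a[t], mean_b[t])[0, 1]
                     for t in range(mean_a.shape[0])])

rng = np.random.default_rng(0)
n_TRs_toy, n_voxels_toy, n_per_group = 30, 60, 10
story = rng.standard_normal((n_TRs_toy, n_voxels_toy))   # response common to both framings
a_pat, b_pat = story.copy(), story.copy()
a_pat[10:15] = rng.standard_normal((5, n_voxels_toy))    # hypothetical events where the
b_pat[10:15] = rng.standard_normal((5, n_voxels_toy))    # two interpretations diverge
group_a = a_pat[:, :, None] + rng.standard_normal((n_TRs_toy, n_voxels_toy, n_per_group))
group_b = b_pat[:, :, None] + rng.standard_normal((n_TRs_toy, n_voxels_toy, n_per_group))

r = between_group_pattern_corr(group_a, group_b)
diverging_TRs = np.where(r < 0.5)[0]
print(diverging_TRs)
```

With real data you would compare this between-group curve against within-group spatial ISC rather than a fixed cutoff, since noise levels differ across regions.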

In [None]:
# Get a list of ROIs. 
roi_mask_path = os.path.join(pieman2_dir,'masks','rois')
all_roi_fpaths = glob.glob(os.path.join(roi_mask_path, '*.nii.gz'))

# Collect all ROIs 
all_roi_names = []
all_roi_nii = {}
all_roi_masker = {}
for roi_fpath in all_roi_fpaths:
    
    # Compute ROI name
    roi_fname = os.path.basename(roi_fpath)
    roi_name = roi_fname.split('.')[0]
    all_roi_names.append(roi_name)
    
    # Load roi nii file 
    roi_nii = nib.load(roi_fpath)
    all_roi_nii[roi_name] = roi_nii
    
    # Make roi maskers
    all_roi_masker[roi_name] = NiftiMasker(mask_img=roi_nii)

print(f'Path to all roi masks: {roi_mask_path}')    
print(f'Here are all ROIs:\n{all_roi_names}')

In [None]:
# Make a function to load data for one ROI
def load_roi_data(roi_name, fnames): 
    # Pick a ROI masker.
    roi_masker = all_roi_masker[roi_name]    
    
    # preset array. 
    bold_roi = {task_name:[] for i, task_name in enumerate(all_task_names)}
    
    # Gather data. 
    for task_name in all_task_names:
        for subj_id in range(n_subjs[task_name]):
            
            # Get the data for task t, subject s 
            nii = nib.load(fnames[task_name][subj_id])
            
            # Mask the data and append it to the list that is being formed
            bold_roi[task_name].append(roi_masker.fit_transform(nii))
            
        # Reformat the data to standard form, rather than list
        bold_roi[task_name] = np.transpose(np.array(bold_roi[task_name]), [1,2,0])
    return bold_roi

Compute spatial ISC on ROIs similar to those we used previously in the 'Bargain-basement ISC' section.

In [None]:
# Get the ROIs that were labeled in voxel_coord earlier
roi_selected = voxel_coord.keys()
roi_selected_names = ['primary auditory cortex', 'precuneus', 'primary visual cortex']

# Compute spatial ISC for each selected ROI
iscs_roi_selected = []
for j, roi_name in enumerate(roi_selected):
    print('%d %s (%s)' % (j, roi_selected_names[j], roi_name))
    
    # Load data 
    bold_roi = load_roi_data(roi_name, fnames)
    
    # Compute ISC 
    iscs_roi = {}
    for task_name in all_task_names: 
        
        # Transpose the data so that it is voxels x TRs x subjects; isc() then treats each TR as a "voxel"
        transposed_data = np.transpose(bold_roi[task_name], [1,0,2])
        
        # Run ISC and store the data (shape: subjects x TRs)
        iscs_roi[task_name] = isc(transposed_data)
    
    # Append the ISC results for this ROI
    iscs_roi_selected.append(iscs_roi)

Now that you have the time course of spatial ISC for each ROI, plot it.

In [None]:
# Plot the spatial ISC over time for each ROI

# For each ROI
for j, roi_name in enumerate(roi_selected):

    # Make one figure per ROI, with one line per task
    plt.figure()
    for i, task_name in enumerate(all_task_names): 
        
        # Average the spatial ISC across subjects to get one value per TR
        timecourse = np.mean(iscs_roi_selected[j][task_name], axis=0)
                             
        # Plot the data
        plt.plot(timecourse)
        
        print('Average spatial ISC for %s %s: %0.2f' % (roi_selected_names[j], task_name, np.mean(timecourse)))

    plt.ylabel('Correlation')
    plt.title('Spatial ISC, {}'.format(roi_selected_names[j]))
    plt.legend(all_task_des)   
    plt.hlines(0, 0, len(timecourse), 'k')
    plt.xlabel('TRs')

**Exercise 9:**<a id="ex9"></a> Interpret the spatial ISC results you observed above and what you can conclude from this type of analysis. Refer to the "Why would spatial ISC be useful" section if you are having trouble.

**A:** 

**Novel contribution:**<a id="novel"></a> be creative and make one new discovery by adding an analysis, visualization, or optimization.

In [None]:
# Code here

### Contributions<a id="contributions"></a>

E. Simony and U. Hasson for providing data  
C. Baldassano and C. Chen provided initial code  
M. Kumar, C. Ellis and N. Turk-Browne produced the initial notebook 4/4/18  
S. Nastase enhanced the ISC brainiak module; added the section on statistical testing   
Q. Lu added solutions; switched to S. Nastase's ISC module; replicated Lerner et al 2011 & Simony et al. 2016; added spatial ISC.   
M. Kumar edits to section introductions and explanation on permutation test.  
K.A. Norman provided suggestions on the overall content and made edits to this notebook.  
C. Ellis incorporated edits from cmhn-s19  
Q. Lu added solutions; modified the scripts using the newest brainiak, nilearn version.      
T. Yates made edits for cmhn_s21    
E. Busch edits for cmhn s22, cmhn s23  
C. Ellis edits for mmn23