# Searchlights <a name="searchlights"></a>

In univariate analyses there is a distinction between whole-brain and [ROI-based](https://doi.org/10.1093/scan/nsm006) analyses. When done correctly, whole-brain univariate analyses allow you to identify, in an unbiased and data-driven way, regions of the brain that differ between experimental conditions. Compare this to what we have done with multivariate analyses (e.g., classification, RSA): up to this point we have taken ROIs (e.g., lateral occipital cortex) and looked at the pattern of activity across voxels. Using these procedures we have been able to determine whether these voxels collectively contain information about different conditions. However, throughout these analyses it has been unclear whether a subset of these voxels have been driving performance. The weights of a classifier are not directly informative of the influence of individual voxels. In other words, our multivariate analyses have been unable to say *where* in the brain there are representations that distinguish between conditions. 

A searchlight is a moving window that exhaustively sweeps through the brain in order to localize representations. Running a searchlight is computationally intensive because it involves running a separate multivariate analysis for every voxel in the brain. Fortunately, [BrainIAK](http://brainiak.org/) (see introductory [article](https://apertureneuropub.cloud68.co/articles/42/)) contains a procedure that efficiently carves up brain data into appropriate chunks and distributes them across the computational resources available. 

For more information on the searchlight technique, read this comprehensive [NeuroImage Comment](https://doi.org/10.1016/j.neuroimage.2013.03.041) which also includes citations to landmark papers on the topic. 

We will be using the Sherlock dataset today. Review details about it from week 2 if you need a reminder.

## Goal of this script

1. Learn how to perform a whole-brain searchlight.  
2. Learn how to replace the kernel used inside the searchlight.  
3. Use batch scripts, SLURM, and MPI to run searchlight on compute clusters.
4. Perform multiple comparisons correction

## Table of contents

1. [Searchlight workflow](#sl_wf)  
2. [Running searchlight analyses on a cluster](#submitting_searchlights)  
3. [Multiple comparisons correction](#multi)

Exercises

>[1](#ex1)   [2](#ex2)  [3](#ex3)  [4](#ex4)  [5](#ex5)  [6](#ex6)  [7](#ex7)  [8](#ex8)  [9](#ex9) [10](#ex10) [11](#ex11)

[Novel Contribution](#novel)


## 1. Searchlight workflow <a id="sl_wf"></a>

Running a searchlight is computationally intensive and complex, involving multiple steps. To show how the searchlight functionality in BrainIAK works, we will walk through the steps needed to perform a simple searchlight that can be run in this notebook before moving onto a real example that requires submitting the job to a cluster, where it will have access to more computational resources. This simple workflow takes the following steps:

>1. [Data preparation](#data_prep)
>2. [Set the searchlight parameters](#set_param)
>3. [Create the searchlight object](#create_obj)
>4. [Create our classification kernel](#create_kernel)
>5. [Execute the searchlight](#exec_sl)

However, before we start, there are a few things that you should know about searchlights. Think of a searchlight as a processing step in which you pull out a certain size chunk of your data and then perform a kernel operation (specified by a function you write). You can use any kernel on this chunk of data. Critically, **the searchlight function does not specify the analysis you want to perform; all it does is carve up the data.**

**Computational demand and parallelization**

As mentioned before, searchlights are computationally intensive, so we need to be aware of the kind of computational burden this analysis imposes. With each subject, we want to perform the operation for each of the many thousands of voxels in their brain. If we were to run this serially (meaning, a single searchlight at a time), even if each operation only took *1s to complete*, the analysis could take multiple days for *just one* participant. With more complicated kernels, the full analysis could take months or years -- far beyond the due date for this assignment! In this notebook we are only going to consider fast/tractable use cases, but even then you will see how much parallelization speeds up searchlight analyses.

Fortunately, the searchlight function in BrainIAK intelligently parallelizes your data to give you a considerable and scalable speed boost. Parallelization is the idea that when two or more computational tasks can be completed independently (because they don't interact in any way), these tasks can be run simultaneously on different cores (Note: we refer to cores as our computational units of serial processing, although there are other parallelizations available within-core, such as threads or even multiple instructions within thread). The nice thing about parallelization is that it is scalable: in general, a job parallelized across 10 cores will run about 10 times faster than on 1 core. For reference, your computer likely has 4-16 cores, so you could speed up processing if you recruited all of these resources (and shut down all other types of background processing). 
<div class="alert alert-block alert-info">
So remember, the two main things that determine the speed of a searchlight: **the kernel algorithm** and the **amount of parallelization**.
</div>
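
To make these two factors concrete, here is a back-of-envelope sketch. The number of searchlights and the time per kernel call are hypothetical values chosen only for illustration, and the function assumes ideal scaling with no scheduling or communication overhead, so treat the result as a rough lower bound rather than a prediction.

```python
def estimate_runtime_hours(n_searchlights, seconds_per_kernel, n_cores=1):
    """Idealized runtime: perfect scaling, no overhead (an assumption)."""
    total_seconds = n_searchlights * seconds_per_kernel / n_cores
    return total_seconds / 3600


# Hypothetical numbers, not measurements from this dataset
n_searchlights = 200_000
serial_hours = estimate_runtime_hours(n_searchlights, seconds_per_kernel=1.0)
parallel_hours = estimate_runtime_hours(n_searchlights, seconds_per_kernel=1.0, n_cores=10)

print(f"Serial:   {serial_hours:.1f} hours ({serial_hours / 24:.1f} days)")
print(f"10 cores: {parallel_hours:.1f} hours")
```

Halving the kernel time and doubling the number of cores have the same effect on this estimate, which is why both are worth optimizing.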

**How does the BrainIAK searchlight work?**

<div class="alert alert-block alert-info"> This paragraph contains critical information that people often misunderstand about searchlights, so read it carefully</div>
To analyze data efficiently, it is important to understand how the Searchlight function in BrainIAK works. You must provide the function with a numpy volume of 4D data and a binary mask of brain voxels, which tells the function which of the voxels in that 4D volume correspond to voxels inside the brain. The searchlight code simply searches for every voxel in the mask that is equal to 1 and then centers a searchlight around this.  You specify the number of voxels that are used in each searchlight by setting a radius parameter. This means that for every value of 1 in your mask, the searchlight function will apply your kernel function to the corresponding voxel (and those around it) in the 4D volume. Hence when writing the kernel you need to keep in mind that the input data the function receives are not the size of the original data but instead the size of the searchlight radius. In other words, you might give the searchlight function a brain and mask volume that is 64x64x64 but each kernel operation only runs on a data and mask volume that is something like 3x3x3 (where the size depends on your searchlight radius, to be explained below). 
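
The carving step described above can be sketched with plain NumPy. This is only an illustration of the idea, not BrainIAK's implementation: `carve_cube`, the toy volume, and the toy mask are all made up for this example, and edge handling is ignored.

```python
import numpy as np


def carve_cube(volume_4d, center, sl_rad):
    """Return the cube of data around `center` (assumes it is not at the volume edge)."""
    x, y, z = (int(c) for c in center)
    return volume_4d[x - sl_rad:x + sl_rad + 1,
                     y - sl_rad:y + sl_rad + 1,
                     z - sl_rad:z + sl_rad + 1, :]


rng = np.random.default_rng(0)
toy_volume = rng.standard_normal((10, 10, 10, 20))  # toy data: x, y, z, time
toy_mask = np.zeros((10, 10, 10))
toy_mask[4:6, 4:6, 4:6] = 1                         # 8 "center" voxels

# One searchlight per voxel equal to 1 in the mask
centers = np.argwhere(toy_mask == 1)
chunk_shapes = [carve_cube(toy_volume, c, sl_rad=1).shape for c in centers]
print(f"{len(centers)} searchlights, each kernel call sees a chunk of shape {chunk_shapes[0]}")
```

Note that the kernel never sees the full 10x10x10 volume, only the small chunk around each center voxel.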

**How to start using a searchlight? Small!**

When getting used to searchlights it is encouraged that you scale up your code gently. This is to prevent the possibility that you run a searchlight that takes a very long time to finish only to find there was a small error (which happens a lot). Instead it is better to write a simple function that runs quickly so you can check if your code works properly before scaling up. A simple workflow for testing your searchlight (and estimating how long your searchlight would take on a bigger dataset) would be this:

1. Create a brain mask with a small cluster of voxels and run the searchlight interactively (like we are doing now, using a notebook) to check whether the code works.
2. Use timestamps to extract the execution time.  
3. Print the number of voxels that are passed to the searchlight kernel.  
4. Run the searchlight as a job on the smallest unit of real data you have (a single run or single participant).
5. Check the runtime and memory usage of this searchlight.
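
Steps 2 and 3 can be tried without BrainIAK at all by calling a kernel directly on a toy chunk. The sketch below assumes the kernel argument order used by the kernels in this notebook (`data, sl_mask, myrad, bcvar`); `toy_kernel` and the toy inputs are hypothetical.

```python
import time

import numpy as np


def toy_kernel(data, sl_mask, myrad, bcvar):
    """Toy kernel: report how many voxels it received and their mean signal."""
    n_voxels = int(np.sum(sl_mask))   # usable voxels in this searchlight
    chunk = data[0][sl_mask == 1]     # (n_voxels, n_timepoints) for the first dataset
    return n_voxels, float(chunk.mean())


side = 3                                            # cube edge for sl_rad = 1
toy_data = [np.ones((side, side, side, 50))]        # list of 4D chunks
toy_sl_mask = np.ones((side, side, side))

start = time.perf_counter()
n_voxels, result = toy_kernel(toy_data, toy_sl_mask, myrad=1, bcvar=None)
elapsed = time.perf_counter() - start

print(f"Kernel received {n_voxels} voxels and took {elapsed:.6f} seconds")
```

Multiplying the time for one kernel call by the number of voxels in your mask gives a first estimate of the serial runtime.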

Taking our own advice, we are going to write a searchlight script below to perform a very simple searchlight. In fact, we are only going to run this searchlight on a few voxels in the brain so that we can see whether our code is working and how long each kernel operation would take. After we have familiarized ourselves with the way searchlights work on a small scale, we will then graduate to full-scale analyses using batch scripts and cluster computing.

In [None]:
# Import utils
import sys
sys.path.insert(0, '..')
from utils import * 

# Get the specific name of the searchlight function for ease
from brainiak.searchlight.searchlight import Searchlight

# Get the searchlight utility functions
from searchlight_utils import *

# Use for inspecting code
import inspect

%autosave 5

# Preset some information about the experiment and analysis. These won't necessarily make sense now, but we will refer to them later
TR_duration = 1.5 # How many seconds is each TR
first_segment_duration = 946 # How long was the first movie segment (in TRs; used below to index the time axis)
ppt_num = 17 # How many participants are there?

### Data preparation <a id="data_prep"></a>
Prepare data for a single participant using a similar pipeline to what we used to prepare the data for RSA. The difference is that the data needs to stay as a 4D volume for Searchlight, rather than switching to a 2D voxel by time array. The script we are using to load in the data is called `prepare_sherlock_4D`. This function is stored in the script called `searchlight_utils.py`. Below, we print out the contents of this function.

<div class="alert alert-block alert-info">
Why did we make a separate script to run this function, rather than do it here? This function is going to be used later by the `searchlight.py` script, and so we want to make sure that both this script and the notebook are doing the same thing. Plus it reduces the clutter.
</div>

In [None]:
print(inspect.getsource(prepare_sherlock_4D))

In [None]:
# Now load the data
ppt = 'sub-01' # Specify the participant to load

# Generate the 4D volume, mask and affine matrix
func_vol, whole_brain_mask, affine = prepare_sherlock_4D(ppt)

# Since we are analyzing only the second half of data at the moment, we can crop it here
func_vol_2nd = func_vol[:, :, :, first_segment_duration:]

<a id="ex1"></a>
**Exercise 1:** Explain why the input to a searchlight analysis is 4D rather than 2D? 

**A:**

For our initial searchlight testing, we are going to do the time-point by time-point similarity comparison to Alexnet, like we did at the end of the RSA notebook. Hence we need to load in the off-diagonal of the model.

In [None]:
model_vec = np.load('alex_fc6_rsm.npy')
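
As a reminder of what "the off-diagonal" of a similarity matrix looks like, here is a minimal sketch on toy data. It assumes the upper triangle is used, with each pair of time points counted once; how `alex_fc6_rsm.npy` was actually generated is not shown in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal((5, 10))   # toy data: 5 time points x 10 features
rsm = np.corrcoef(features)               # 5 x 5 time-point similarity matrix

# Keep each pair of time points once, excluding the diagonal (k=1)
upper = np.triu_indices(rsm.shape[0], k=1)
toy_model_vec = rsm[upper]

print(rsm.shape, toy_model_vec.shape)     # 5 * 4 / 2 = 10 unique pairs
```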

### Set searchlight parameters <a id="set_param"></a>

To run the [searchlight](http://brainiak.org/docs/brainiak.searchlight.html) function in BrainIAK you need the following variables/parameters:  

1. **data** = The brain data as a 4D volume.  
2. **mask** = A binary mask specifying the "center" voxels in the brain around which you want to perform searchlight analyses. A searchlight will be drawn around every voxel in this mask with the value of 1. Hence, if you chose to use the whole-brain mask as the mask for the searchlight procedure, the searchlight may include voxels outside of your mask when the "center" voxel is at the border of the mask. In your kernel function you can set a threshold to include these voxels (e.g., you could require that each searchlight includes a certain number of voxels).  
3. **bcvar** = An optional variable (can be a list, numpy array, dictionary, etc.) that you want to **b**[road]**c**[ast] in your searchlight kernel. In our case, we are going to broadcast the Alexnet model vector but later we are going to broadcast the condition labels so that we can run a classifier. If you don't need to broadcast anything, set this to `None`.  
4. **sl_rad** = The size of the searchlight's radius, excluding the center voxel. This means the total volume size of the searchlight, if using a cube, is defined as: ((2 * sl_rad) + 1) ^ 3.  
5. **max_blk_edge** = When the searchlight function carves the data up into chunks, it doesn't distribute only a single searchlight's worth of data to each core. Instead, it creates a block of data, with the edge length specified by this variable, which determines the number of searchlights to run within a job. The larger this number is the more memory efficient, but there are slight losses in parallelization speed.  
6. **pool_size** = Maximum number of cores running on a block (typically 1).  

<a id="ex2"></a>
**Exercise 2:** BrainIAK searchlights can come in many shapes. What shapes does BrainIAK support? What variable sets the searchlight shape for BrainIAK? (*Hint:* Google can help)

**A:**

<a id="ex3"></a>
**Exercise 3:** How many voxels would be in a **cube** searchlight if you set `sl_rad = 4`? (Please still use `sl_rad = 1` in the following exercises, we just want you to state the answer.)

**A:**

Let's make a mask of the subset of voxels we want to run our searchlight while we are testing it out (remember the advice from earlier: start small)

In [None]:
# Make a mask of only a few voxels
small_mask = np.zeros(whole_brain_mask.shape) # Specify a mask that is the same size as the brain 
small_mask[28:32, 33:37, 28:32] = 1 # Specify the coordinates you want to make the small mask in.

# Visualize the small mask
small_mask_nii = nib.Nifti1Image(small_mask, affine)  # Create the volume image.
nilearn.plotting.view_img(small_mask_nii)

The mask we just made can be confusing, so let's be explicit about what happened here. When you normally run a searchlight, like we will do later on, the mask you use is the whole brain, and the searchlight runs through each voxel in the brain. We don't have time for that in the notebook, so we are only going to run the searchlight on a small mask that sits within the brain. This small mask is purely for instruction, and would not be used in a proper searchlight.

Now let's specify the other parameters we need for the searchlight

In [None]:
# Preset the variables to be used in the searchlight
data = func_vol_2nd # What data to load in 
mask = small_mask # The small mask (a 4x4x4 block of voxels) in which you want to run this analysis 
bcvar = model_vec # The model to do the comparison to brain data with
sl_rad = 1 # Specify the radius of the searchlight
max_blk_edge = 5 # Specify the size of the chunk to be handed to the MPI
pool_size = 1 # How many cores per chunk
kernel = calc_rsa # What is the function you are going to perform inside the kernel

### Create Searchlight Object  <a id="create_obj"></a>

Now that you have the data ready, we need to do the first step to set up the searchlight. In particular, we need to create the `Searchlight` object and then call the functions to distribute this information to the cluster.

In [None]:
# Create the searchlight object
sl_rsa = Searchlight(sl_rad=sl_rad, max_blk_edge=max_blk_edge)
print("Setup searchlight inputs")
print("Input data shape: ", data.shape)
print("Input mask shape: ", mask.shape)

# Distribute the information to the searchlights (preparing it to run)
sl_rsa.distribute([data], mask)

# Data that is needed for all searchlights is sent to all cores via the sl.broadcast function. In this example, we are sending the model responses
sl_rsa.broadcast(bcvar)

### Define the function (AKA "kernel") that needs to be computed <a id="create_kernel"></a>

Every searchlight needs a function to run, called a "kernel." This is the function that you want to measure/classify your data with. This could perform classification, RSA, or any other computation that you wish.

*Side note:* The word "kernel" is used in multiple mathematical/computational contexts. In this notebook, we mainly use kernel to refer to the function that is passed to the searchlight object.

The function is also defined in the `searchlight_utils.py` script but we print it here so you can see it.

In [None]:
print(inspect.getsource(calc_rsa))
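
If you are reading this without access to `searchlight_utils.py`, the general shape of an RSA-style kernel can be sketched as follows. This is a hypothetical stand-in, not the course's `calc_rsa`: it only assumes the same argument order as the kernels in this notebook and that `bcvar` holds a model vector with one entry per unique pair of time points.

```python
import numpy as np


def toy_rsa_kernel(data, sl_mask, myrad, bcvar):
    """Hypothetical RSA-style kernel; NOT the course's calc_rsa."""
    voxels_by_time = data[0][sl_mask == 1]          # (n_voxels, n_timepoints)
    brain_rsm = np.corrcoef(voxels_by_time.T)       # time x time similarity
    upper = np.triu_indices_from(brain_rsm, k=1)    # off-diagonal, each pair once
    # Correlate the brain's similarity structure with the broadcast model vector
    return np.corrcoef(brain_rsm[upper], bcvar)[0, 1]


rng = np.random.default_rng(1)
chunk = rng.standard_normal((3, 3, 3, 6))           # toy searchlight chunk
chunk_mask = np.ones((3, 3, 3))

# Build a "model" from the same chunk so the expected answer is known (close to 1)
reference = np.corrcoef(chunk.reshape(-1, 6).T)
model = reference[np.triu_indices(6, k=1)]
score = toy_rsa_kernel([chunk], chunk_mask, myrad=1, bcvar=model)
print(f"Brain-model correlation: {score:.3f}")
```

The kernel returns a single number per searchlight, which the searchlight machinery then assigns to the center voxel.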

### Execute the searchlight <a id="exec_sl"></a>

We execute the searchlight and save the results in brain space. Each searchlight result is assigned to the voxel that the searchlight is centered on. 


In [None]:
print("Begin Searchlight")
start_time = time.time()  # Start the clock to time the searchlight
sl_result = sl_rsa.run_searchlight(kernel, pool_size=pool_size)
end_time = time.time()
print("End Searchlight")

# How long did the searchlight take
print('\nSearchlight duration: %0.4f seconds' % (end_time - start_time))
print("Searchlight mask shape: ", sl_result.shape)


<a id="ex4"></a>
**Exercise 4:** If you ran this searchlight on the whole brain, how long would it take? To answer this, you need to know the total duration of the searchlights that were just run (which is printed above), the number of searchlights that were run (you will want to read on to get that number) and the number of voxels that are in the `whole_brain_mask`.

**A:** 

### Thinking about whole brain masks and searchlights

Let's look at the results of the searchlight. In the visualization below, you will see a blob of activity in the middle of the brain. This is because the `small_mask` only included a small part of the image.

In [None]:
# Save the results to a .nii file.
sl_result = sl_result.astype('double')  # Convert the output into a precision format that can be used by other applications.
sl_nii = nib.Nifti1Image(sl_result, affine)  # Create the volume image.

nilearn.plotting.view_img(sl_nii)

In this example we have used `small_mask` to simplify the computation, but in proper searchlights we still have the concepts of the searchlights and the whole-brain mask, which are very easily confused, so let's spend some time teasing them apart. 

The output of the Searchlight function was a 3D volume with the same dimensions as the brain data. Most of the volume is filled with zeroes, since we didn't run the searchlight on most parts of the brain. You can see that in the figure above since only the middle of the brain has a blob of activity. This happened because we ONLY ran the Searchlight function on voxels within `small_mask`. In the section below, we print the results of the searchlights for just those voxels in the small mask.

In [None]:
# Get the results from the searchlight output and store them in a volume that is the same size as the small mask
searchlight_voxels = np.zeros((4, 4, 4))
searchlight_voxels[searchlight_voxels == 0] = sl_result[small_mask == 1] 

# Report results of running the searchlight
print('Searchlight results:\n', searchlight_voxels)

print('\nResult for x=28 y=34 z=29: %0.7f' % (sl_result[28, 34, 29]))

Let's spend a minute to really understand what you are seeing. `searchlight_voxels` is a 3D volume (4x4x4), which we are printing. To show a 3D array, Python prints out each layer of the 0th axis separately, and then shows the 1st and 2nd axes as an array. This is why you see four 4x4 arrays with some numbers and some 'nan's. If an element in the array has a number, a searchlight result was computed for that voxel. We also give an example of how to find those voxels by traditional indexing. 
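
The 'nan' entries come from the conversion to `double` performed when saving the results: any voxel whose entry is `None` becomes 'nan'. Here is a minimal sketch with a made-up 2x2x2 output array standing in for a searchlight result.

```python
import numpy as np

raw = np.empty((2, 2, 2), dtype=object)   # stand-in for a searchlight output
raw[:] = None                             # voxels where no value was returned
raw[0, 0, 0] = 0.25
raw[1, 1, 1] = -0.1

as_float = raw.astype('double')           # None becomes nan
print(as_float)
print('Voxels with a value:', np.argwhere(~np.isnan(as_float)).tolist())
```

Functions such as `np.isnan` make it easy to count or locate the voxels that did receive a value.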

Think of the searchlight as carving out the data. For each voxel, the surrounding voxels (making a cube of 3x3x3) were carved out to be analyzed. In this carved out chunk of data, no other brain data matters, only the data in the chunk is what the `calc_rsa` function operates on.

If you still aren't sure what all this means, below we have created a tool for you to look at what voxels are included in the searchlight. If you specify a voxel coordinate in 3D (in the range of x=28:32, y=33:37, z=28:32) then this script will output which voxels of that searchlight are part of `small_mask`. If you look at a coordinate in the middle of this range (e.g., (30, 35, 30)) then all of the voxels in the searchlight are in `small_mask`, but if you pick a voxel on the edge (e.g., (28, 33, 28)) then few of the voxels will be included.

In [None]:
# Specify the single voxel you want to visualize
single_voxel = (30, 34, 30)
single_voxel_mask = np.zeros(small_mask.shape)
single_voxel_mask[single_voxel] = 1 

def mask_interrogate(data, sl_mask, myrad, bcvar):
    # Print the contents of the searchlight
    print('Which voxels of the searchlight are in the `small_mask`:\n', data[0][:, :, :, 0])
    
    return None # Don't return anything

# Create the searchlight object
sl_mask = Searchlight(sl_rad=sl_rad, max_blk_edge=max_blk_edge)

# Distribute the information to the searchlights (preparing it to run)
sl_mask.distribute([small_mask.reshape((*small_mask.shape, 1))], single_voxel_mask)

_ = sl_mask.run_searchlight(mask_interrogate, pool_size=pool_size)

**Exercise 5:**<a id="ex5"></a> Some of the voxels in `searchlight_voxels` are 'nan'. Why does this happen and why might we WANT this to happen? Refer to the `calc_rsa` function and the tool above to understand why it *does* happen. Note: `None` and 'nan' are equivalent for this purpose. To understand why we *want* it to happen, think about what might happen if you only have a few usable voxels in your searchlight.

**A:**

### Using new kernel functions

The beauty of the searchlight code is that the kernel used in the searchlight can be swapped out to ask a different question. That is what we are going to do below. Specifically, we are going to perform classification on the data to test whether someone was speaking (just like we did in the classification notebook). Three functions are needed for this, all of which are stored in `searchlight_utils.py`: 
1. `generate_classifier_labels`
2. `calc_svm`
3. `subsample_balance`

**Exercise 6:**<a id="ex6"></a> Inspect the three functions listed above and answer the following questions:

**Q:** The `generate_classifier_labels` function identifies the observations to be used (AKA events used in the classifier). What is the minimum time, in seconds, between the observations that is possible?  

**A:**

**Q:** Explain what the `for loop` in `calc_svm` does that starts with `for train_counter in [0, 1]:`?  

**A:**

**Q:** If you have 10 observations from condition A and 15 observations from condition B, `subsample_balance` can balance these conditions. How does this function balance the conditions and what role does randomness play?

**A:**

**Q:** Explain whether the 'curse of dimensionality' is helped or hurt by running classification with a searchlight rather than a large ROI.

**A:**

Set up the new input variables that you will use, replacing those used for the RSA

In [None]:
# Run the script to get the classifier labels
observation_TRs_shifted, labels = generate_classifier_labels()

# Trim the data to only include the relevant observations
func_vol_1st_observations = func_vol[:, :, :, observation_TRs_shifted['first_segment']]
func_vol_2nd_observations = func_vol[:, :, :, observation_TRs_shifted['second_segment']]

# Change the inputs for the new function
data = [func_vol_1st_observations, func_vol_2nd_observations] # What data to load in 
bcvar = [labels['first_segment'], labels['second_segment']] # The condition labels to broadcast to the kernel
kernel = calc_svm # Specify the function to be used

Now run the searchlight

In [None]:
# Create the searchlight object
sl_svm = Searchlight(sl_rad=sl_rad, max_blk_edge=max_blk_edge)
print("Setup searchlight inputs")

# Distribute the information to the searchlights (preparing it to run)
sl_svm.distribute(data, mask)

# Data that is needed for all searchlights is sent to all cores via the sl.broadcast function. In this example, we are sending the condition labels
sl_svm.broadcast(bcvar)

print("Begin Searchlight")
start_time = time.time()
sl_result = sl_svm.run_searchlight(kernel, pool_size=pool_size)
end_time = time.time()
print("End Searchlight")

# How long did the searchlight take
print('\nSearchlight duration: %0.4f seconds' % (end_time - start_time))
print("Searchlight mask shape: ", sl_result.shape)

# Report the searchlight results
searchlight_voxels = np.zeros((4, 4, 4))
searchlight_voxels[searchlight_voxels == 0] = sl_result[small_mask == 1] 

# Report results of running the searchlight
print('Searchlight results:\n', searchlight_voxels)

## 2. Running searchlight analyses on a cluster<a name="submitting_searchlights"></a>

Running searchlight analyses through notebooks or interactive sessions isn't tractable for real analyses. Although the example above ran quickly and without parallelization, we only performed 64 analyses. In what follows we are going to run 1000 times that number. To do that, we are going to submit jobs to the cluster and take advantage of the parallel computing resources available to you.

You have actually been submitting jobs throughout this course. Every time you run `launch_jupyter.sh` you are submitting a job to the cluster to be run. However, we haven't talked about how jobs work yet. A job is an allocation of computing resources for a specific amount of time. What you do with those resources is up to you, as specified by your code. Right now, this notebook is running on a job for the specific time requested (4 hours) using the requested resources (20 GB of memory and one computer core). Jobs will end when either 1. you tell it to, 2. the code finishes running, 3. the code hits an error that crashes it, or 4. the time allocation runs out. You can ask for any arrangement of time allocation or resources, but the cluster has limited resources, so there needs to be a way of prioritizing and organizing jobs. This is called "scheduling" and is implemented using [Slurm](https://slurm.schedmd.com/documentation.html). The details of how this works are complicated, but put simply: jobs that ask for fewer resources will generally be given a higher priority in the schedule than jobs that ask for more resources. In order to interact with Slurm and submit jobs, there is a whole suite of tools and practices to follow. Stanford's page for Sherlock has some details on it (https://www.sherlock.stanford.edu/docs/user-guide/running-jobs/#slurm-commands), but we will guide you through it here.

To run a job, a good workflow is to have two scripts: one script that actually does the computation you care about (e.g., a Python script like `my_script.py`) and a script that sets up the environment and specifies the job parameters (e.g., a bash script like `submit_my_script.sh`). The environment refers to the modules and packages you need to run your job. The job parameters refer to the partition you are going to use (`-p`), the number of cores (`-n`), the amount of memory (`--mem`) and the required time (`-t`). To run your job you then call the bash script with something like: `sbatch submit_my_script.sh` from the command line. 

<div class="alert alert-block alert-info">
Note: we will be doing a lot from the command line. Whenever we refer to it, we mean the command line that is on the cluster, not the local one that is currently hosting your notebook.  
</div>

Lucky for you, we have already written the script needed here, called `run_searchlight.sh`. This script is written in bash and submits the `searchlight.py` script. Please explore this script to get familiar with submitting jobs. This will form a good template for you to adapt in the future.

In [None]:
# Print the bash script for running searchlights
!cat run_searchlight.sh

**Self-study:** If you are allergic to the command line, you can do a lot from within jupyter. For instance, you can use `!` like above to run Unix commands. Alternatively, you can use `os.system('command')`: whatever string is contained in the `command` is executed as if from the command line.
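
One limitation of `os.system` is that it only gives you back the exit status, not what the command printed. If you want to capture the output as a string, the standard library's `subprocess.run` is an alternative. The sketch below runs the current Python interpreter as the child command purely so that the example works anywhere; on the cluster you would substitute the Unix command you care about.

```python
import subprocess
import sys

# Run a child process and capture what it prints
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a subprocess')"],
    capture_output=True,  # collect stdout and stderr instead of printing them
    text=True,            # decode the output as str rather than bytes
    check=True,           # raise an error if the command fails
)
print(result.stdout.strip())
```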

When a job starts, a log file named `./logs/searchlight-%j.out` is automatically created, where `%j` is replaced with the job ID number. You can see any output from your script printed into the log file. You can print out the content of the log file in the terminal by running `cat logs/searchlight-$JOBID.out` from the command line (replacing `$JOBID` with the job ID number). You could also open the file from the Jupyter notebook file tree.

### Distributing jobs with parallel computing<a name="ranks"></a>

When you parallelize an analysis across multiple cores, this typically means that each core will run the code independently (though there are ways for cores to communicate directly with each other). The searchlight function then notices that there are multiple cores running the same job and assigns different pieces of the data to each core for analysis. The message passing interface (MPI), the parallelizing framework used by BrainIAK and most high-performance computing applications, keeps track of each process by assigning it a rank, starting at 0. We provide a brief overview of ranks, cores, and nodes below.

<img src="nodes_process.jpg" width="300" height="300"/>

What is a rank? It is a process generated by the application that you are running. Often this will be a single core of a CPU, but a process can have multiple cores (as shown in the figure). In the above figure "Process 1" can be considered to be rank=0 and "Process 2" rank=1. A rank can use more than one core on the cluster and is managed by MPI. Each rank also uses a part of the memory allocated. The cores and memory are part of the hardware belonging to a "node". A node is a grouping of cores, and when you submit a job you are requesting use of part of the node, or sometimes even the entire node. **Optimizing the memory and number of cores for your batch job will lead to significant gains in run-time for your programs**.

MPI handles all communication between the ranks (or processes). This can occur within a single node, or if you have asked for lots of resources, this can even span multiple nodes. Each process can spawn multiple threads (a form of parallel computing *within* a process), and OpenMP handles communication between threads.

**You do not need to configure any of the above. BrainIAK handles all the parallelization for you.** Instead you just need to have a good grasp of the concepts here. 

**Exercise 7:** <a id="ex7"></a> In your own words, define the basic features of a job request (as listed below). Your answer should be specific to their meaning in the context of job submission, not what the concept may mean more generally. If any of these concepts aren't familiar, read the Sherlock description linked above.

**Cores:**  

**Time:**  

**Memory:**  

The searchlight code on BrainIAK is set up to have one core per process. You don't need to specify how many cores you have in the code, it will automatically look at how many are available and will use them. So, if you are running an analysis across 2 cores then some of your computations will be run with rank=0 and some with rank=1. The following commands are used to access MPI information that your job needs.

```
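from mpi4py import MPI  # Python bindings for MPI, used by BrainIAK

comm = MPI.COMM_WORLD  # Communicator containing every process in the job
rank = comm.rank  # Index of this process, starting at 0
size = comm.size  # Total number of processes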
```

Here, `rank` identifies the core that this copy of the code is running on and `size` is the number of cores available.
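
To build intuition for how `rank` and `size` can be used to split up work, here is a toy round-robin assignment of data blocks to ranks. This is illustrative only: it does not use MPI, the values of `size` and `n_blocks` are made up, and it is not a description of how BrainIAK actually schedules blocks.

```python
def blocks_for_rank(n_blocks, rank, size):
    """Round-robin assignment of block indices to one rank (illustrative only)."""
    return list(range(rank, n_blocks, size))


size = 3        # pretend the job has 3 processes
n_blocks = 10   # pretend the data was carved into 10 blocks

assignment = {rank: blocks_for_rank(n_blocks, rank, size) for rank in range(size)}
for rank, blocks in assignment.items():
    print(f"rank {rank}: blocks {blocks}")
```

Every block is handled by exactly one rank, and no rank needs to know what the others are doing, which is what makes the work parallelizable.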

A great feature of MPI is that although your analysis can run on multiple cores, your data doesn't need to be loaded on all cores! This is great because otherwise memory becomes a big limitation of running searchlight analyses. Instead, you can just load the data on one rank. If memory is sufficient to load all data on one rank, then we would write something like this pseudocode:
```
if rank==0:
    load fMRI data
load labels
load mask
distribute(pieces of my data to other ranks)
run searchlight
```

The above pseudocode will avoid loading all the data on every rank. **The labels and mask must be loaded on all ranks.** The distribution will allocate only pieces of the data for processing. Again, this is all handled by MPI. In searchlight, the `.distribute` method handles the distribution to different ranks.

After the searchlights have all finished, you need to save the data. MPI gathers the results from all of the cores and lets you save the result. 

We have written a Python script for running searchlights. The file `searchlight.py` loads in a participant, sets up the searchlight and kernel and then performs the searchlight across the multiple cores that the code is run on. 

By default, the code will use the `calc_rsa` kernel on just the first subject. In other words, it will perform the TSM comparison to the Alexnet model across the whole brain. However, you can change these parameters by specifying different arguments on the bash command line. For instance, the command `sbatch run_searchlight.sh 4 calc_svm` will perform a searchlight using the `calc_svm` kernel on participant 4.

The results of the searchlight will be stored in this directory with the following name: `searchlight_KERNEL_PARTICIPANT.nii.gz` where `KERNEL` will be the kernel chosen and `PARTICIPANT` will be the participant number.
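
As a rough sketch of how the defaults and the output name fit together, the hypothetical helper below mimics the argument handling described above. It is not the code in `searchlight.py`; in particular, the exact formatting of the participant number in the file name is an assumption.

```python
def searchlight_output_name(args):
    """Build the output file name from command-line style arguments.

    args: list of strings, e.g. ["4", "calc_svm"]; missing entries fall back
    to the defaults described in the text (participant 1, calc_rsa kernel).
    """
    participant = args[0] if len(args) > 0 else "1"
    kernel = args[1] if len(args) > 1 else "calc_rsa"
    return f"searchlight_{kernel}_{participant}.nii.gz"


print(searchlight_output_name([]))                 # default parameters
print(searchlight_output_name(["4", "calc_svm"]))  # participant 4, SVM kernel
```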

The script parallelizes across 3 cores, so it will run about 3 times faster than if you ran it serially.

**Exercise 8:**  <a id="ex8"></a> Run the `run_searchlight.sh` script using `sbatch run_searchlight.sh` on the command line. You can use the default parameters, so no need to provide additional arguments. This will take a bit of time: first the job needs to be scheduled, which will depend on the load on the cluster, and then it will take a long time to actually run. Specifically, it will take the duration estimated in Exercise 4, divided by the number of cores. Once your analysis has finished, load in the resulting volume and view it using nilearn's plotting tools (**CRITICALLY: don't use `view_img` because that won't appear on github. Use `plot_stat_map` or `plot_epi` instead**). Interpret the results, stating where in the brain you find evidence of a correspondence between the model and the brain's representation (Hint: Use this [website](https://www.ebrains.eu/tools/human-brain-atlas) if you aren't familiar with the labels of brain regions).

<div class="alert alert-block alert-info">
To check on your job, write "squeue -u \$USER" into the command line (or "!squeue -u \$USER" into a jupyter cell) and it will report the status of your jobs. Each row is a job, and if your job is still running then it will be its own row (you will also have the tunnel job running too). If it hasn't started then the `NODELIST` will either be None or Pending. Check the `TIME` column for your searchlight job to see how long it has been running. If you want to look at the log file that is created by the script, then find the job id number (printed when you submitted the job or you can find it in your `squeue`) and run the command: `cat logs/*JOBID*` where you replace JOBID with the number.
</div>

In [None]:
# Insert visualization code here

**A:**

**Exercise 9:**  <a id="ex9"></a> Submit the `run_searchlight.sh` code for participant `3` with the `calc_svm` kernel. Once it is finished, visualize the output in a way that emphasizes the regions that have strong decoding. Interpret the results, stating where in the brain you classify the difference between speaking vs non-speaking.

In [None]:
# Insert visualization code here

**A:**

## 3. Multiple comparisons correction<a name="multi"></a>

The searchlights above were run on one participant. Like before, we want to know if this is robust across participants. We won't ask you to run the searchlight on all participants because it would take up a lot of compute resources and may interfere with other students doing their assignments. However, we will show you simple code to do it:

```
for ppt in `seq 1 17`
do
sbatch run_searchlight.sh $ppt
done
```
<div class="alert alert-block alert-info">
You don't need to run this code; we have already done it for you.
</div>


This code loops through the values 1 to 17 and passes each one as an input to the searchlight script. Fortunately for you, we ran this earlier and stored the results in the following folder: `sherlock_dir + '/derivatives/searchlights/'`. Specifically, we created a nifti that stores each participant's searchlight in the 4th dimension. In other words, rather than the 4th dimension of the nifti representing the time point, it represents the participant ID.

Now let's figure out what results are robust across participants. As a start, let's first average all of the participants' searchlight results and visualize those. This approach should be familiar now. You load in the 4d volume of participant data as a nifti, convert it to a volume, average that volume along the 4th dimension, and then convert it back to a nifti that allows you to view it.

**Exercise 10:**<a id="ex10"></a> Visualize the average of the searchlights that are stored in: `sherlock_dir + '/derivatives/searchlights/searchlight_calc_rsa_all.nii.gz'` using nilearn plotting tools (**CRITICALLY: don't use `view_img` because that won't appear on github. Use `plot_stat_map` or `plot_epi` instead**).

In [None]:
# Insert code here

The average is a useful way to see where there might be robust activity, but it doesn't tell us whether effects are reliable. To do that we have to do statistics. There are lots of approaches to statistical testing of fMRI data. I personally favor [FSL](https://fsl.fmrib.ox.ac.uk/)'s [randomise](https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/Randomise/UserGuide). Randomise is a non-parametric method which computes a null distribution by permuting the condition labels. If you aren't familiar with permutation testing, here is a really cute example that clearly explains it: https://www.jwilber.me/permutationtest/

In our analysis we don't have condition labels, we are just testing whether a voxel is significantly different from chance (0 in the case of RSA, 0.5 in the case of SVM). Hence, what we do for this kind of data is for each voxel we take the difference from chance and randomly flip the sign of our participant data in order to create a 'permuted' sample. The reason for sign flipping is that if the null is true (which is what you are testing) then the distribution should be symmetrical around chance, so a positive value in your dataset is just as likely as a negative value. 

This permuted sample is averaged and the value is stored for each voxel. This is done at least 1000 times, and then the distribution of permuted values for each voxel is compared to the real value. For voxels that aren't reliable, the permuted distribution will contain the real value, but for reliable voxels the permuted distribution and real value won't overlap much. We quantify the degree of non-overlap between the real and permuted distribution by reporting a t-statistic.
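
The sign-flipping procedure can be sketched for a single voxel as follows. This is a simplified illustration under stated assumptions, not what randomise runs internally: the function `sign_flip_test` and the toy values are made up for this example, it compares means rather than t-statistics, and it ignores clustering entirely.

```python
import random
import statistics


def sign_flip_test(values, chance=0.0, n_perm=1000, seed=0):
    """Sign-flip permutation test for one voxel (simplified illustration).

    values: one searchlight value per participant.
    chance: value expected under the null (0 for RSA, 0.5 for SVM accuracy).
    Returns the observed mean difference and a two-tailed p-value.
    """
    rng = random.Random(seed)
    diffs = [v - chance for v in values]
    observed = statistics.mean(diffs)

    n_extreme = 0
    for _ in range(n_perm):
        # Randomly flip the sign of each participant's difference.
        permuted = [d * rng.choice((-1, 1)) for d in diffs]
        if abs(statistics.mean(permuted)) >= abs(observed):
            n_extreme += 1

    # Add 1 so the observed data count as one of the permutations.
    p_value = (n_extreme + 1) / (n_perm + 1)
    return observed, p_value


# Toy data for 17 participants (made-up numbers, not real results).
reliable = [0.12, 0.08, 0.15, 0.10, 0.09, 0.11, 0.14, 0.07, 0.13,
            0.10, 0.12, 0.09, 0.16, 0.08, 0.11, 0.10, 0.13]
unreliable = [0.12, -0.08, 0.15, -0.10, 0.09, -0.11, 0.14, -0.07, 0.03,
              -0.10, 0.12, -0.09, 0.06, -0.08, 0.01, -0.10, -0.13]

print(sign_flip_test(reliable))
print(sign_flip_test(unreliable))
```

When every participant shows an effect in the same direction, almost no sign-flipped sample reaches the observed mean, so the p-value is small. When the signs are mixed, many permutations match or exceed it.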

Critically, because there are tens of thousands of voxels in the brain, we have to do some multiple comparisons correction (otherwise, at a threshold of p<0.05, we would expect roughly 5% of our voxels to be significant due to mere chance). We don't want to be too conservative since fMRI data is smooth, so adjacent voxels aren't independent of each other (and thus the effective degrees of freedom of fMRI data are far fewer than the number of voxels in the brain). Fortunately, randomise gives us a data-driven method for clustering. In fMRI, we assume that real brain activity is spatially smoothed in blobs (AKA clusters). If only one voxel is strongly activated and is surrounded by non-responding voxels then we usually treat that as not real signal. However, how big should your cluster be? The common approach is to set a minimum cluster size, but that is arbitrary. Randomise does [Threshold-Free Cluster Enhancement (TFCE)](https://www.fmrib.ox.ac.uk/datasets/techrep/tr08ss1/tr08ss1.pdf) which uses the spatial smoothness properties of the data and the strength of activation to estimate likely cluster locations. 
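
To see why correction matters, the sketch below simulates p-values for voxels that contain no real effect and counts how many pass an uncorrected threshold. The voxel count is an assumed round number, and the simulated voxels are independent, which real (smooth) fMRI data are not. The Bonferroni threshold is included only as an example of a correction that treats every voxel as independent; it is not what randomise does.

```python
import random

rng = random.Random(0)
n_voxels = 20000   # assumed voxel count, for illustration only
alpha = 0.05

# Under the null, p-values are uniformly distributed between 0 and 1.
null_p_values = [rng.random() for _ in range(n_voxels)]

# Uncorrected: count voxels that pass p < alpha purely by chance.
n_false_positives = sum(p < alpha for p in null_p_values)
print(f"Uncorrected false positives: {n_false_positives} "
      f"(expected about {int(n_voxels * alpha)})")

# Bonferroni treats every voxel as independent, giving a very strict threshold.
bonferroni_alpha = alpha / n_voxels
n_bonferroni = sum(p < bonferroni_alpha for p in null_p_values)
print(f"Bonferroni threshold: {bonferroni_alpha:.1e}, "
      f"voxels passing: {n_bonferroni}")
```

The gap between these two thresholds is the space in which cluster-aware methods such as TFCE operate: stricter than no correction, but less severe than assuming all voxels are independent.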

To implement randomise and TFCE, we created another bash script for you to run, called `run_randomise.sh`. Specify the volume to run it on as an input (e.g., `sbatch run_randomise.sh $FILE`). You should inspect this script before running it. This produces several outputs in your current folder, each with a different suffix: 
> *tstat1.nii.gz*: This is the raw t test result for each voxel, based on the permutation. Hence, values with a large magnitude (either positive or negative) represent a strong difference from the null distribution.  
> *tfce_tstat1.nii.gz*: This is the volume with threshold free cluster enhancement. Each voxel will have a 1-p value (i.e., values near 1 are significant at p near 0), which reflects the degree of response and clustering that is present in the voxel. This is a 2-tailed test, so strongly negative or strongly positive t-statistic values will be significant.    
> *vox_corrp_tstat1.nii.gz*: This is a voxelwise correction that doesn't pay attention to clusters but just corrects based on intensity. Again this is 2-tailed and values near 1 are highly significant.

**Exercise 11:**<a id="ex11"></a> Submit randomise with the `sherlock_dir + '/derivatives/searchlights/searchlight_calc_rsa_all.nii.gz'` data (be aware that the command line won't recognize `sherlock_dir`). When the job has finished (about 5 minutes), visualize the output ending in `tfce_tstat1.nii.gz` (**CRITICALLY: don't use `view_img` because that won't appear on github. Use `plot_stat_map` or `plot_epi` instead**). Set the threshold to show values that are significant at p<0.05, with a 2-tailed test (Hint: setting a p-threshold with a 2-tailed test is not trivial). Additionally, visualize the t-statistic map (the file ending in `tstat1.nii.gz`) and find the region that has the strongest t-statistic. Finally, interpret those results, reporting which parts of the brain are significant and what that means. (Hint: think back to lecture when we discussed the assumptions of searchlights.)

In [None]:
# Insert visualization of the tfce results

In [None]:
# Insert visualization of the tstat results

**A:**

**Self Study:** The exercise above is performed using the `calc_rsa` kernel. Remember that randomise is comparing each voxel against zero. This works well for `calc_rsa` because a null result is equal to 0. If we wanted to do multiple comparisons correction using the `calc_svm` kernel, we would need to take additional steps, since a null result is 0.5 for this classification. Randomise gives you options to account for means, but the easiest is to just remove the baseline before conducting randomise (i.e., subtract 0.5 from all the data).

**Novel contribution:**<a name="novel"></a> be creative and make one new discovery by adding an analysis, visualization, or optimization.

In [None]:
# Novel contribution code here

## Contributions <a id="contributions"></a> 

M. Kumar, C. Ellis and N. Turk-Browne produced the initial notebook 03/2018  
T. Meissner minor edits  
H. Zhang preprocessed dataset, add multi-subject section, add multiple exercises, add solutions, other edits  
Vineet Bansal provided the MPI diagrams.  
David Turner provided extensive input on how to use MPI for efficient running of jobs.  
M. Kumar added MPI information and enhanced section contents.  
K.A. Norman provided suggestions on the overall content and made edits to this notebook.  
C. Ellis implemented updates from cmhn-s19.  
X. Li changed nibabel function get_data() to get_fdata() since get_data() is deprecated in section 1.1.1  
T. Yates made edits for cmhn_s21.  
E. Busch edited commenting and hardcoding for cmhn-s22, adjusted for Grace cluster and changed submit scripts in cmhn-s23  
C. Ellis adapted for movie data