# fMRI arrays in Python
In this notebook, we will learn to manipulate 3D or 4D arrays containing fMRI data. This is in some ways harder (because of the multi-dimensional nature of the arrays) and in some ways easier (because the data is interpretable and allows for intuitive visualizations of different dimensions).

We will load fMRI data using a library called nibabel, which reads many neuroimaging file formats (unfortunately, not BrainVoyager files...).

***Please note: for this notebook to work, you have to configure your pycortex installation to read the pycortex store associated with the example data. Edit the `filestore` line in `~/.config/pycortex/options.cfg` to read: `filestore = ~/remote_mounts/pomcloud0/datasets/IntroToEncodingModels/pycortex_store/`***

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import cortex as cx
import nibabel
import pathlib

%matplotlib inline

Show all files in the directory we will be working with:

In [None]:
iem_base = pathlib.Path('~/remote_mounts/pomcloud0/datasets/IntroToEncodingModels/').expanduser()
print(sorted(p.name for p in iem_base.iterdir()))

In [None]:
# We will work with a file from a category localizer experiment. 
# Load a single file of functional data
fname = iem_base / 's03_catloc_run01.nii.gz'
nii = nibabel.load(fname)
data = nii.get_fdata()

In [None]:
# The four dimensions of this array are (x, y, z, time)
print(data.shape)

Thus, this array contains 120 volumes of the brain. 
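
To make the (x, y, z, time) convention concrete without the real file, here is a minimal sketch that uses a small random array as a stand-in for `data`. The shape `(4, 5, 3, 6)` and the name `toy` are made up for illustration; the real scan is much larger.

```python
import numpy as np

# Toy stand-in for `data`; the shape is hypothetical, not the real scan size
rng = np.random.default_rng(0)
toy = rng.normal(size=(4, 5, 3, 6))

# Unpack the four dimensions in (x, y, z, time) order
n_x, n_y, n_z, n_t = toy.shape
voxels_per_volume = n_x * n_y * n_z
print(f"{n_t} volumes, each with {voxels_per_volume} voxels")

# One volume is what remains after fixing the last (time) index
one_volume = toy[..., 0]
print(one_volume.shape)
```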

# Data manipulation / display

In [None]:
# Select the middle z slice of the first volume of data from this array.
z12_0 = data[:, :, 12, 0]
# This will be a 2D array:
print(z12_0.shape)
# ... so we can use imshow on it:
plt.imshow(z12_0)
plt.show()

Dig it! A brain!

One important manipulation of fMRI arrays is to reorder the dimensions of the array. This can be done simply using the function `np.transpose()`. For 2D arrays, this is a very straightforward manipulation:

In [None]:
flip_me = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
flipped = np.transpose(flip_me)
print(flip_me)
print(flipped)


... In this example, the rows become columns and the columns become rows. This can also be accomplished by just using the `.T` attribute of the array:

In [None]:
print(flip_me)
print(flip_me.shape)
print(flip_me.T)
print(flip_me.T.shape)
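
One detail worth knowing: NumPy returns the transpose as a view of the same memory rather than a copy. The sketch below reuses the `flip_me` example to show what that means in practice; the names `flipped` and `independent` are just for illustration.

```python
import numpy as np

flip_me = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
flipped = flip_me.T

# The transpose shares memory with the original array
print(np.shares_memory(flip_me, flipped))

# Writing through the view changes the original too:
# flipped[0, 1] is the same element as flip_me[1, 0]
flipped[0, 1] = 50
print(flip_me)

# Use .copy() when an independent array is needed
independent = flip_me.T.copy()
independent[0, 0] = -1
print(flip_me[0, 0])  # unchanged by the edit to the copy
```

This is why transposing even a large 4D array is essentially free, and also why it pays to be careful when modifying a transposed array in place.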

For arrays with more than two dimensions, transposition is a more complex operation. By default, `np.transpose()` (or `<array>.T`) reverses the order of all dimensions in an array. So, if we apply that operation to our fMRI data, we get an array that is (T x Z x Y x X) instead of (X x Y x Z x T). As you will see below, it's useful to have time as the first dimension: for many common operations, the syntax is simpler for TZYX arrays. More concretely, pycortex assumes this shape for volumetric data (TZYX or ZYX instead of XYZT or XYZ). So, let's transpose the data:

In [None]:
dataT = data.T
print(dataT.shape)
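
As a sanity check on what the transpose does to a 4D array, here is a small sketch on a toy array (the sizes are hypothetical). It also shows `np.transpose()` with an explicit `axes` argument and `np.moveaxis()`, which are alternatives when only some axes should move.

```python
import numpy as np

# Toy array in (x, y, z, time) order; sizes are hypothetical
toy = np.arange(4 * 5 * 3 * 6).reshape(4, 5, 3, 6)

# .T reverses every axis: (x, y, z, t) -> (t, z, y, x)
toy_T = toy.T
print(toy_T.shape)

# The same result with the axis order spelled out
toy_explicit = np.transpose(toy, axes=(3, 2, 1, 0))
print(np.array_equal(toy_T, toy_explicit))

# A given value keeps its coordinates, just listed in reverse order
x, y, z, t = 1, 4, 2, 5
print(toy[x, y, z, t] == toy_T[t, z, y, x])

# Moving only time to the front gives (t, x, y, z) instead
toy_time_first = np.moveaxis(toy, -1, 0)
print(toy_time_first.shape)
```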

As a quick example of why this is neat: with the data in this format, volumes of data can be selected like this:

In [None]:
# Select first volume
first_vol = dataT[0,:,:,:]
# Or, more simply, like this:
first_vol_variant = dataT[0]
# ... That's clean. We like clean.
print(np.allclose(first_vol, first_vol_variant))

Finally, note that Python operations can be chained: one operation can be applied directly to the output of another, like this:

In [None]:
# Indexing following transposition
first_vol_variant2 = data.T[0]
print(np.allclose(first_vol, first_vol_variant2))
# Or: reshaping a linear array into a 2D array:
# (here, reshape is called on the output of np.arange())
my_2d_array = np.arange(10).reshape(5,2)
print(my_2d_array)
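
The same chaining works on fMRI-shaped arrays. As a rough sketch on a toy array (hypothetical sizes), transposing and then reshaping turns 4D data into a 2D (time x voxels) matrix, which is the layout regression code usually expects:

```python
import numpy as np

# Toy (x, y, z, time) array; sizes are hypothetical
toy = np.arange(4 * 5 * 3 * 6, dtype=float).reshape(4, 5, 3, 6)

# Transpose to (t, z, y, x), then collapse the three spatial axes into one
n_t = toy.shape[-1]
toy_2d = toy.T.reshape(n_t, -1)
print(toy_2d.shape)

# Each row of the 2D array is one flattened volume
print(np.array_equal(toy_2d[0], toy.T[0].flatten()))
```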

Best not to get too carried away with this; it makes code very difficult to read if taken to excess. But it can provide useful shortcuts.

## Exercises

In [None]:
# Select the middle X slice of the first volume, and plot it the same way


In [None]:
# Select the middle Y slice of the first volume, and plot it the same way


In [None]:
# Select all values from a single voxel (choose a voxel that is INSIDE the brain somewhere - how is up to you)
# and plot all 120 time values using plt.plot()


In [None]:
# Use plt.subplots() to create a plot of all 25 z slices of the fifth brain volume in a single figure. 
fig, axs = plt.subplots(...)

for ... # loop over axes and/or slices
    # Call some plot command here!

In [None]:
# Plot a histogram of ALL the values for all voxels at all time points using plt.hist()
# NOTE: you can't pass a 4D array to plt.hist(). You can try, but your computer will hate you. So you have to use 
# the .flatten() method of your array to convert it to a 1D array before passing it to plt.hist()


# Logical indices

We talked about logical indices in class this past week. Logical indices select values in an array where the logical index is True. Thus:

In [None]:
a = np.array([1, 2, 3, 4, 5])
b = np.array([True, True, False, True, False])
a[b]

Note that you can CREATE logical arrays to use for indexing. Check this out: 

In [None]:
c = a > 3
print(c)
print(a[c])
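
Logical arrays can also be combined and used for assignment. A short sketch, continuing with the same `a` (the names `middle`, `ends`, and `clipped` are just for illustration):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# Combine conditions element-wise with & (and), | (or), ~ (not).
# The parentheses are required because & binds tighter than > and <.
middle = (a > 1) & (a < 5)
print(a[middle])

ends = ~middle
print(a[ends])

# A logical index can also be used to assign values
clipped = a.copy()
clipped[clipped > 3] = 3
print(clipped)
```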

## Exercise on masking in general
The histogram you made above should show you that there are many values in the fMRI data that are near zero. These are very likely to be measurements from voxels outside the brain. Your task is to exclude these voxels and create a new histogram. Do this in two steps. First, take the mean of the data across time. This should result in a 3D array the size of the brain. Then use `>` or `<` to create a logical index of values above some minimum activity threshold. The conceptual basis here is that you want to include all voxels with an average activity OVER some value. (You need to choose what that value should be). Use this logical index to select values in the original data. Make a histogram of the selected values.

NOTE: your logical index will be a 3D array. Your data is 4D. Thus, you will need to include one `:` when using the logical index on your data, as shown in the answer below. 
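
To see why the position of the `:` matters, here is a minimal sketch on toy data. The array sizes and the hand-made index `toy_idx` are hypothetical; only the indexing pattern carries over to the real data.

```python
import numpy as np

# Toy (x, y, z, time) data; sizes and values are hypothetical
rng = np.random.default_rng(0)
toy = rng.normal(size=(4, 5, 3, 6))

# A 3D logical index with the same shape as one volume
toy_idx = np.zeros((4, 5, 3), dtype=bool)
toy_idx[1:3, 2:4, 1] = True   # mark 4 voxels as selected

# The index consumes the three spatial axes; ':' keeps the time axis
selected = toy[toy_idx, :]
print(selected.shape)          # (selected voxels, time)

# Putting ':' first lines the index up with (y, z, time) instead,
# and the shapes no longer match
misplaced_failed = False
try:
    toy[:, toy_idx]
except IndexError as err:
    misplaced_failed = True
    print("IndexError:", err)
```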

In [None]:
# Define some index here (like c was done above)
idx = ...
# Use this index on your data
selected_data = data[idx, :]
# Create a histogram of the selected values
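# selected_data is 2D (voxels x time); flatten it so plt.hist() treats it as one set of values
plt.hist(selected_data.flatten(), ...) # add some keyword arguments here (e.g. a number of bins) to make this look nicer.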

A very common use for logical arrays in fMRI analysis is the definition of regions of interest. These can be stored as brain-volume sized logical arrays that contain True values in the location of voxels within the ROI and False values elsewhere. 

Pycortex can generate ROI masks of this form, as shown in the following cell. Note that, as in creation of pycortex `Volume()` objects, you must specify a subject (whose brain is this?) and a transform (which experiment and/or which alignment of data with the underlying anatomy is this?), as well as a list of ROIs (which must have been pre-defined for that subject).

In [None]:
subject = 's03'
transform = 'category_localizer'
rois = cx.get_roi_masks(subject, transform, roi_list=['V1','V2','V3', 'FFA','PPA'])

In [None]:
# This returns a dictionary of arrays, one for each region of interest:
print(list(rois.keys()))
# Each one is the shape of the brain:
print(rois['V1'].shape)
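
Because ROI masks are just logical arrays, they can be combined like any other. The sketch below uses two hand-made masks in a tiny volume as stand-ins; `toy_rois` and the ROI names `"A"` and `"B"` are hypothetical and are not produced by pycortex.

```python
import numpy as np

# Hand-made stand-ins for ROI masks in a tiny (z, y, x) volume.
# These are hypothetical; real masks come from pycortex.
shape = (2, 3, 4)
toy_rois = {name: np.zeros(shape, dtype=bool) for name in ("A", "B")}
toy_rois["A"][0, :, :2] = True
toy_rois["B"][0, :, 2:] = True

# Union of two ROIs: voxels that belong to either one
either = toy_rois["A"] | toy_rois["B"]
print(either[0])

# Overlap check: does any voxel belong to both ROIs?
overlap = (toy_rois["A"] & toy_rois["B"]).any()
print(overlap)
```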

## Exercise on ROIs

How many voxels are in each ROI? (this just takes some simple math on the array - how many True values are in each array??)

In [None]:
# Answer


Select all voxels in V1, FFA, and PPA, and plot the mean timecourse for each region. The code here should get you started!

In [None]:
# Answer
# Select V1 voxels only
V1_voxels = dataT[...] # How will you index into dataT?
# Take the mean across all voxels in V1
V1_mean = # ...?
# Plot answer
plt.plot(V1_mean)
# Do the same for FFA and PPA
# ...?

There should be hints of differences between each timecourse - i.e., maybe blocks of time where the timecourse is a little higher than at other points. This is a block design experiment, after all. (Don't expect anything too clean - there are only 2 repeats of each condition in this data!)

# Last big exercise
This data is from a localizer experiment - the experiment shows faces, bodies, places, objects, and scrambled versions of some of the images in different blocks. A design matrix showing when each of the blocks started and stopped is stored in the X variable in the following file:

In [None]:
import h5py

In [None]:
hf = h5py.File(str(iem_base / 'catloc_design.hdf'))

In [None]:
with h5py.File(str(iem_base / 'catloc_design.hdf'), 'r') as hf:
    print(list(hf.keys()))
    X = hf['X'][:]
print(X.shape)

In [None]:
plt.imshow(X, aspect='auto', interpolation='nearest')

You may have noticed that neither dimension of this array matches the time dimension of the fMRI data (that was 120 TRs, and the dimensions of X are (720, 5)). BUT, you also may have noticed that there are six different fMRI files (labeled run 1 through run 6) in the directory that we loaded the original file from. SO. Your task is to analyze the experiment!
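
The arithmetic behind those shapes: six runs of 120 TRs stack to 720 rows. A minimal sketch with random stand-in data, assuming the rows of `X` follow the runs in order (worth checking against the real files); `n_voxels` and `toy_runs` are made up:

```python
import numpy as np

n_runs, n_trs, n_voxels = 6, 120, 10   # n_voxels is a made-up toy size

# Stand-ins for six runs of masked data, each (time, voxels)
rng = np.random.default_rng(0)
toy_runs = [rng.normal(size=(n_trs, n_voxels)) for _ in range(n_runs)]

# Stack runs end to end along the time axis
toy_Y = np.concatenate(toy_runs, axis=0)
print(toy_Y.shape)

# Rows 120 through 239 of the stacked array are the second run
print(np.array_equal(toy_Y[n_trs:2 * n_trs], toy_runs[1]))
```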

At this point, we have gone over all the necessary steps. Consult past notebooks to help you figure out what to do for each of these!

1. Load the fMRI data into a Y variable 
    * You should be able to create a Y array that is 720 x (voxels) in size.
    * It will be much gentler on your memory if you mask out the cortical voxels for each run before concatenating them. An example of how to get a cortical mask for this subject is shown below.
    * For our purposes here, you should z-score the fMRI data along the time dimension. You can use the zscore function in scipy.stats (`from scipy.stats import zscore`). This should be done SEPARATELY for each run, because the mean for each run is likely to be different for spurious reasons!
2. Convolve each column of the X variable with a hemodynamic response function
3. Use OLS regression to fit weights to each column of the design matrix for each voxel.
4. Compute the difference between the weights for the `object` regressor and the weights for the `face` regressor. 
5. This should give you one value for each voxel in the brain; display this data on a pycortex surface!
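
The steps above can be sketched end to end on synthetic data. This is a toy illustration under stated assumptions, not the intended solution: the TR of 2 seconds, the HRF shape in `toy_hrf`, the block layout of `X_toy`, and the column positions `face_col` and `object_col` are all hypothetical, and `zscore_columns` is a NumPy stand-in for `scipy.stats.zscore(run, axis=0)`.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n_trs, n_runs, n_voxels, n_cond = 120, 6, 8, 5
tr = 2.0                      # seconds per volume; an assumption
face_col, object_col = 0, 3   # hypothetical column positions in the design

def toy_hrf(tr, duration=32.0):
    """Difference-of-gammas HRF sampled every `tr` seconds (illustrative shape)."""
    t = np.arange(0, duration, tr)
    peak = t ** 5 * np.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)
    h = peak - undershoot / 6.0
    return h / h.sum()

def zscore_columns(run):
    """Z-score each voxel (column) over time within one run."""
    return (run - run.mean(axis=0)) / run.std(axis=0)

# Toy block design: 10-TR blocks cycling through 5 conditions, then rest
X_toy = np.zeros((n_trs * n_runs, n_cond))
for block in range(X_toy.shape[0] // 10):
    cond = block % (n_cond + 1)
    if cond < n_cond:
        X_toy[block * 10:(block + 1) * 10, cond] = 1.0

# Step 2: convolve each column with the HRF, run by run,
# so responses do not spill across run boundaries
hrf = toy_hrf(tr)
X_conv = np.zeros_like(X_toy)
for r in range(n_runs):
    rows = slice(r * n_trs, (r + 1) * n_trs)
    for c in range(n_cond):
        X_conv[rows, c] = np.convolve(X_toy[rows, c], hrf)[:n_trs]

# Simulated data: voxel 0 responds to "face", voxel 1 to "object", rest is noise
W_true = np.zeros((n_cond, n_voxels))
W_true[face_col, 0] = 2.0
W_true[object_col, 1] = 2.0
raw = X_conv @ W_true + 0.5 * rng.normal(size=(n_trs * n_runs, n_voxels))

# Step 1: z-score each run separately, then concatenate along time
runs = np.split(raw, n_runs, axis=0)
Y_toy = np.concatenate([zscore_columns(run) for run in runs], axis=0)

# Step 3: OLS weights for every voxel at once, shape (conditions, voxels)
W_hat, *_ = np.linalg.lstsq(X_conv, Y_toy, rcond=None)

# Step 4: object minus face contrast, one value per voxel
contrast = W_hat[object_col] - W_hat[face_col]
print(Y_toy.shape, W_hat.shape)
print(contrast[:2])  # negative for the "face" voxel, positive for the "object" voxel
```

Because `np.linalg.lstsq` accepts a 2D `Y`, all voxels are fit in one call; no loop over voxels is needed.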

In [None]:
# To get a pycortex mask for this subject:
subject = 's03'
transform = 'category_localizer'
mask = cx.db.get_mask(subject, transform, type='cortical')
print(mask.shape)
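
A mask like this is used in two directions: to pull voxels out of a volume, and to put one value per voxel back into a volume. Here is a rough sketch with a hand-made toy mask; `toy_mask`, `toy_T`, and all sizes are hypothetical stand-ins for the real `mask` and transposed data.

```python
import numpy as np

# Toy (z, y, x) mask and (t, z, y, x) data; all sizes are hypothetical
toy_mask = np.zeros((2, 3, 4), dtype=bool)
toy_mask[0, 1:, 1:3] = True            # 4 "cortical" voxels
rng = np.random.default_rng(0)
toy_T = rng.normal(size=(6, 2, 3, 4))

# Masking keeps time and flattens the selected voxels: (time, voxels)
masked = toy_T[:, toy_mask]
print(masked.shape)

# One value per masked voxel (e.g. a contrast) can be put back into a volume;
# voxels outside the mask are left as NaN
values = masked.mean(axis=0)
volume = np.full(toy_mask.shape, np.nan)
volume[toy_mask] = values
print(volume.shape)
```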

In [None]:
# Answer

# (it may be useful to use multiple cells for this)