# Download Data

*Written by Luke Chang*

Many of the imaging tutorials will use open data from the Pinel Localizer task.

The Pinel Localizer task was designed to probe several different types of basic cognitive processes, such as visual perception, finger tapping, language, and math. Several of the tasks are cued by reading text on the screen (i.e., visual modality) and also by hearing auditory instructions (i.e., auditory modality). The trials are randomized across conditions and have been optimized to maximize efficiency for a rapid event-related design. There are 100 trials in total over a 5-minute scanning session. Read the original [paper](https://bmcneurosci.biomedcentral.com/articles/10.1186/1471-2202-8-91) for more specific details about the task, and see the [dataset paper](https://doi.org/10.1016/j.neuroimage.2015.09.052) for details about the data.

This dataset is well suited for these tutorials as it is (a) publicly available to anyone in the world, (b) relatively small (each scan lasts only about 5 minutes), and (c) provides many options to create different types of contrasts.

There are a total of 94 subjects available, but we will primarily work with a smaller subset of about 30.

Downloading the data is very easy as it is currently available on the [OSF website](https://osf.io/vhtf6/files/).

We will use the `osfclient` [package](https://github.com/osfclient/osfclient) to download the entire dataset. Note that the entire dataset is fairly large (~5.25 GB), so make sure you have space on your computer. At some point, we will make a smaller version available for download for the dartbrains course.
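If you want to confirm that you have enough room before starting the download, a minimal sketch using only the Python standard library might look like the following. The helper name `has_enough_space` and the `"."` placeholder path are assumptions for illustration; the 5.25 GB figure is the approximate size quoted above.

```python
import shutil

REQUIRED_GB = 5.25  # approximate dataset size quoted above


def has_enough_space(path, required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


# "." is a placeholder; point this at the directory you plan to download into.
print(has_enough_space("."))
```

It is probably wise to leave some extra headroom beyond the dataset itself, since later tutorials write additional files.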

If you are taking the Psych60 course at Dartmouth, we have already made the download available on the jupyterhub server.

Let's first make sure the `osfclient` package is installed in our Python environment.

In [5]:
!pip install osfclient



`osfclient` provides a command line interface built in Python that can help us easily download (and also upload) datasets being shared on the Open Science Framework (OSF).

All we need to do is specify the OSF project id and the directory where we would like the data downloaded.

In [None]:
project_id = 'vhtf6'
output_directory = '/Users/lukechang/Dropbox/Dartbrains/Data'

!osf -p {project_id} clone {output_directory}

The dataset has been converted to the [Brain Imaging Data Structure](https://bids.neuroimaging.io/) (BIDS) format. BIDS is a specification for organizing imaging datasets in a standard way across laboratories, so that people can find the information they need to analyze a dataset in predictable places.
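To make the idea of a predictable layout concrete, here is a small sketch of how the path to one subject's functional image can be assembled from its BIDS labels. It assumes a dataset with no session or run labels, which matches this one; the helper name `bold_path` is hypothetical and is not part of any BIDS library.

```python
from pathlib import PurePosixPath


def bold_path(root, subject, task):
    """Build the path of a subject's BOLD file in a session-less, run-less BIDS dataset."""
    name = f"sub-{subject}_task-{task}_bold.nii.gz"
    return PurePosixPath(root) / f"sub-{subject}" / "func" / name


print(bold_path("/home/jovyan/Data", "S01", "localizer"))
# /home/jovyan/Data/sub-S01/func/sub-S01_task-localizer_bold.nii.gz
```

In practice you would rarely build these paths by hand; the point is only that the subject and task labels fully determine where a file lives.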

# Test Data
Let's look at the data to make sure everything downloaded properly.

This example assumes that you are using the docker container associated with the course.

We will use [pybids](https://bids-standard.github.io/pybids/) to explore the dataset. It should already be included in the dartbrains docker container. Otherwise, you can install it from PyPI with `!pip install pybids`.

In [7]:
from bids import BIDSLayout

data_dir = '/home/jovyan/Data'

layout = BIDSLayout(data_dir, derivatives=False)
layout

BIDS Layout: .../home/jovyan/Data | Subjects: 94 | Sessions: 0 | Runs: 0

This shows us that there are 94 subjects, each with only a single functional run. The summary reports `Runs: 0` because the file names do not carry a `run` label.

We can query the `BIDSLayout` object to get all of the file names for each participant's functional data. Let's just return the first 10.

In [31]:
file_list = layout.get(target='subject', suffix='bold', return_type='file', extension='nii.gz')
file_list[:10]

['/home/jovyan/Data/sub-S01/func/sub-S01_task-localizer_bold.nii.gz',
 '/home/jovyan/Data/sub-S02/func/sub-S02_task-localizer_bold.nii.gz',
 '/home/jovyan/Data/sub-S03/func/sub-S03_task-localizer_bold.nii.gz',
 '/home/jovyan/Data/sub-S04/func/sub-S04_task-localizer_bold.nii.gz',
 '/home/jovyan/Data/sub-S05/func/sub-S05_task-localizer_bold.nii.gz',
 '/home/jovyan/Data/sub-S06/func/sub-S06_task-localizer_bold.nii.gz',
 '/home/jovyan/Data/sub-S07/func/sub-S07_task-localizer_bold.nii.gz',
 '/home/jovyan/Data/sub-S08/func/sub-S08_task-localizer_bold.nii.gz',
 '/home/jovyan/Data/sub-S09/func/sub-S09_task-localizer_bold.nii.gz',
 '/home/jovyan/Data/sub-S10/func/sub-S10_task-localizer_bold.nii.gz']
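Because the subject label is embedded in every path, it can be recovered from the file names themselves. The sketch below uses a regular expression on two of the paths shown above; the helper name `subject_label` is hypothetical.

```python
import re

paths = [
    "/home/jovyan/Data/sub-S01/func/sub-S01_task-localizer_bold.nii.gz",
    "/home/jovyan/Data/sub-S02/func/sub-S02_task-localizer_bold.nii.gz",
]


def subject_label(path):
    """Return the label that follows 'sub-' in a BIDS path, or None if absent."""
    match = re.search(r"sub-([A-Za-z0-9]+)", path)
    return match.group(1) if match else None


labels = [subject_label(p) for p in paths]
print(labels)  # ['S01', 'S02']
```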

Ok, now let's try to load one of the functional datasets using `Brain_Data` from the nltools package.

In [11]:
from nltools.data import Brain_Data

data = Brain_Data(file_list[0])



DimensionError: Input data has incompatible dimensionality: Expected dimension is 4D and you provided a 5D image. See http://nilearn.github.io/manipulating_images/input_output.html.

Uh oh... This simple command isn't working. Here is your first lesson: things are always a little messy and require debugging.

Let's try to figure out what is going on.

First, let's look at the error and try to see what went wrong.

>DimensionError: Input data has incompatible dimensionality: Expected dimension is 4D and you provided a 5D image. See http://nilearn.github.io/manipulating_images/input_output.html.

It looks like the data is being read in as a five-dimensional image rather than a four-dimensional image. `Brain_Data` can't read this type of data. Perhaps it's because this nifti file was created using an older version of SPM.

Let's test our hypothesis and use nibabel to load the data and inspect the shape of the data file.

In [12]:
import nibabel as nib

dat = nib.load(file_list[0])
dat.shape

(64, 64, 40, 1, 128)

Ok, it looks like the first 3 dimensions correctly describe the spatial dimensions of the data and the fifth dimension reflects the number of volumes in the dataset.

Notice that there is an extra dimension of `1` that we need to remove. We can do that with the numpy `squeeze` function.

In [15]:
dat.get_fdata().squeeze().shape

(64, 64, 40, 128)
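As a side note, `squeeze` with no arguments removes every axis of length 1, not just the one we have in mind. The toy example below (a small array of zeros, not the real image) shows that naming the axis gives the same result here, and raises an error if the named axis is not actually of length 1.

```python
import numpy as np

# Toy array laid out like the image above: (x, y, z, 1, volumes).
toy = np.zeros((4, 4, 3, 1, 5))

all_squeezed = toy.squeeze()             # drops every length-1 axis
axis_squeezed = np.squeeze(toy, axis=3)  # drops only axis 3
print(all_squeezed.shape, axis_squeezed.shape)

# Naming an axis that is not length 1 raises instead of silently doing nothing.
try:
    np.squeeze(toy, axis=4)
except ValueError as err:
    print("refused:", err)
```

For this dataset either form should work, since only the fourth axis has length 1.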

`squeeze` gets rid of that extra dimension. Now we need to create a new nifti image with the correct data and write it back out to a file that we can use later.

We will initialize a new nibabel nifti instance and write it out to file.

In [18]:
dat_fixed = nib.Nifti1Image(dat.get_fdata().squeeze(), dat.affine)
nib.save(dat_fixed, file_list[0])

Let's double check that this worked correctly.

In [19]:
nib.load(file_list[0]).shape

(64, 64, 40, 128)

Ok! Looks like it worked!

Now let's fix the rest of the files so we can work with the data in all of the tutorials.

In [32]:
file_list = layout.get(target='subject', suffix='bold', return_type='file', extension='nii.gz')

for f in file_list:
    dat = nib.load(f)
    if len(dat.shape) > 4:
        dat_fixed = nib.Nifti1Image(dat.get_fdata().squeeze(), dat.affine)
        nib.save(dat_fixed, f)
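Note that the loop above overwrites each original file in place. If you would rather keep an untouched copy, one option is to back up each file before saving over it. The sketch below is one possible approach using only the standard library; the helper name `backup_once` and the backup directory are assumptions, and the demonstration uses a temporary stand-in file rather than real image data.

```python
import shutil
import tempfile
from pathlib import Path


def backup_once(path, backup_dir):
    """Copy `path` into `backup_dir` unless a backup with that name already exists."""
    src = Path(path)
    dst_dir = Path(backup_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / src.name
    if not dst.exists():
        shutil.copy2(src, dst)
    return dst


# Demonstration on a stand-in file inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "data" / "sub-S01_task-localizer_bold.nii.gz"
    original.parent.mkdir()
    original.write_bytes(b"stand-in for image bytes")
    copy = backup_once(original, Path(tmp) / "backup")
    print(copy.name, copy.read_bytes() == original.read_bytes())
```

Inside the loop, this would amount to calling `backup_once(f, some_backup_dir)` just before `nib.save(dat_fixed, f)`, with the backup directory kept outside the BIDS dataset so that it does not interfere with the layout.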

Now let's go back to our original code we tried to run that initially failed.

In [39]:
data = Brain_Data(file_list[0])

It works! 

Get used to debugging. It is a crucial part of neuroimaging data analysis, but it can be a frustrating process.