### Workshop Dataset

For this workshop, we will be using a subset of a publicly available dataset, ds000030, from [openneuro.org](https://openneuro.org/datasets/ds000030). The dataset is structured according to the Brain Imaging Data Structure ([BIDS](https://bids-specification.readthedocs.io/en/stable/)). BIDS is a simple and intuitive way to organize and describe your neuroimaging and behavioural data. Neuroimaging experiments result in complicated data that can be arranged in several different ways. BIDS tackles this problem by suggesting a new standard (based on consensus from multiple researchers across the world) for the arrangement of neuroimaging datasets. Using the same organizational standard for *all* of your studies will also allow you to easily reuse your scripts and share data and code with other researchers.

Below is a tree diagram showing the folder structure of a single MR session within ds000030. This was obtained by using the bash command `tree`.  
`!tree ../data/ds000030`

```
ds000030
├── CHANGES
├── dataset_description.json
├── derivatives
│   └── fmriprep
├── participants.tsv
├── README
├── sub-10159
│   ├── anat
│   │   ├── sub-10159_T1w.json
│   │   └── sub-10159_T1w.nii.gz
│   └── func
│       ├── sub-10159_task-rest_bold.json
│       └── sub-10159_task-rest_bold.nii.gz
└── task-rest_bold.json
```

The `participants.tsv` file describes demographic information for each participant in the study (e.g. age, handedness, sex). Let's take a look at the `participants.tsv` file to see what's been included in this dataset.
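
As a minimal sketch of how we might peek at this file, the snippet below reads the tab-separated table with Python's standard `csv` module and counts participants per group. The helper name `summarize_participants` and the column name `diagnosis` are assumptions made for illustration; check the header row of your own copy of `participants.tsv` before relying on them.

```python
import csv
from collections import Counter
from pathlib import Path


def summarize_participants(tsv_path, column="diagnosis"):
    """Count participants per value of `column` in a BIDS participants.tsv.

    `column` defaults to "diagnosis", which is an assumed column name.
    """
    with open(tsv_path, newline="") as f:
        # participants.tsv is tab-separated with a header row
        rows = list(csv.DictReader(f, delimiter="\t"))
    return Counter(row[column] for row in rows)


participants_path = Path("../data/ds000030/participants.tsv")
if participants_path.exists():  # only runs once the dataset is in place
    print(summarize_participants(participants_path))
```

If you prefer pandas, `pd.read_csv(path, sep='\t')` loads the same file into a DataFrame.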

### Downloading Data

We've already randomly sampled 10 CONTROL and 10 SCHZ participants and placed the participant list in the `../download_list` text file. The commands below download the T1w and resting state scans for one of those participants, `sub-10159`.

In [1]:
# download T1w scans
!aws s3 sync --no-sign-request \
  s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/sub-10159/anat \
  ../data/ds000030/sub-10159/anat

# download resting state fMRI scans
!aws s3 sync --no-sign-request \
  s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/sub-10159/func \
  ../data/ds000030/sub-10159/func \
  --exclude '*' \
  --include '*task-rest_bold*'

download: s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/sub-10159/anat/sub-10159_T1w.json to ../data/ds000030/sub-10159/anat/sub-10159_T1w.json
download: s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/sub-10159/anat/sub-10159_T1w.nii.gz to ../data/ds000030/sub-10159/anat/sub-10159_T1w.nii.gz
download: s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/sub-10159/func/sub-10159_task-rest_bold.json to ../data/ds000030/sub-10159/func/sub-10159_task-rest_bold.json
download: s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed/sub-10159/func/sub-10159_task-rest_bold.nii.gz to ../data/ds000030/sub-10159/func/sub-10159_task-rest_bold.nii.gz
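
To repeat the same download for every participant in `../download_list`, one option is to build the `aws s3 sync` commands in Python. This is only a sketch: it assumes the list file holds one subject label per line in the form `sub-10159`, and the helper names `read_subjects` and `build_sync_commands` are made up for this example. The commands are printed rather than executed so you can inspect them first.

```python
import shlex
from pathlib import Path

S3_ROOT = "s3://openneuro/ds000030/ds000030_R1.0.5/uncompressed"


def read_subjects(list_path):
    """Read subject labels, one per line, skipping blank lines (assumed layout)."""
    lines = Path(list_path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def build_sync_commands(subject, data_dir="../data/ds000030"):
    """Return the two `aws s3 sync` argument lists (T1w, resting fMRI) for a subject."""
    anat = ["aws", "s3", "sync", "--no-sign-request",
            f"{S3_ROOT}/{subject}/anat", f"{data_dir}/{subject}/anat"]
    func = ["aws", "s3", "sync", "--no-sign-request",
            f"{S3_ROOT}/{subject}/func", f"{data_dir}/{subject}/func",
            "--exclude", "*", "--include", "*task-rest_bold*"]
    return [anat, func]


list_path = Path("../download_list")
if list_path.exists():
    for subject in read_subjects(list_path):
        for cmd in build_sync_commands(subject):
            # shlex.join quotes the glob patterns so the line can be pasted into a shell
            print(shlex.join(cmd))
```

To actually run the downloads, each `cmd` list could be passed to `subprocess.run(cmd, check=True)` instead of being printed.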


### Querying a BIDS Dataset

[pybids](https://bids-standard.github.io/pybids/) is a Python API for querying, summarizing and manipulating the BIDS folder structure. A more detailed tutorial on using pybids can be found [here](https://github.com/bids-standard/pybids/blob/master/examples/pybids_tutorial.ipynb).

In [1]:
from bids.layout import BIDSLayout

In [2]:
layout = BIDSLayout('../data/ds000030')

The pybids layout object lets you query your BIDS dataset according to a number of parameters by using a `get_*()` method.  
We can get a list of the subjects we've downloaded from the dataset.

In [3]:
layout.get_subjects()

['10159']

To get a list of all of the files, just use `get()`. 

In [4]:
layout.get()

[<BIDSFile filename='/Users/Michael/projects/teaching/fmriprep_workshop/data/ds000030/CHANGES'>,
 <BIDSFile filename='/Users/Michael/projects/teaching/fmriprep_workshop/data/ds000030/dataset_description.json'>,
 <BIDSDataFile filename='/Users/Michael/projects/teaching/fmriprep_workshop/data/ds000030/participants.tsv'>,
 <BIDSFile filename='/Users/Michael/projects/teaching/fmriprep_workshop/data/ds000030/README'>,
 <BIDSFile filename='/Users/Michael/projects/teaching/fmriprep_workshop/data/ds000030/sub-10159/anat/sub-10159_T1w.json'>,
 <BIDSImageFile filename='/Users/Michael/projects/teaching/fmriprep_workshop/data/ds000030/sub-10159/anat/sub-10159_T1w.nii.gz'>,
 <BIDSFile filename='/Users/Michael/projects/teaching/fmriprep_workshop/data/ds000030/sub-10159/func/sub-10159_task-bart_bold.json'>,
 <BIDSImageFile filename='/Users/Michael/projects/teaching/fmriprep_workshop/data/ds000030/sub-10159/func/sub-10159_task-bart_bold.nii.gz'>,
 <BIDSDataFile filename='/Users/Michael/projects/teachi

There are many arguments we can use to filter down this list. Any BIDS-defined keyword can be passed as a constraint. In `pybids`, these keywords are known as **entities**. For a complete list of the available entities:

In [5]:
layout.entities

{'subject': <bids.layout.models.Entity at 0x10ec2aac8>,
 'session': <bids.layout.models.Entity at 0x10eb96a90>,
 'task': <bids.layout.models.Entity at 0x10ec2a550>,
 'acquisition': <bids.layout.models.Entity at 0x10ec2a5c0>,
 'ce': <bids.layout.models.Entity at 0x10ebcd898>,
 'reconstruction': <bids.layout.models.Entity at 0x10ec5d7f0>,
 'dir': <bids.layout.models.Entity at 0x10ec5dba8>,
 'run': <bids.layout.models.Entity at 0x10ec5de10>,
 'proc': <bids.layout.models.Entity at 0x10ec5dcf8>,
 'modality': <bids.layout.models.Entity at 0x10ec5dc50>,
 'echo': <bids.layout.models.Entity at 0x10ec5df60>,
 'recording': <bids.layout.models.Entity at 0x10ec5db38>,
 'suffix': <bids.layout.models.Entity at 0x10ec5d780>,
 'scans': <bids.layout.models.Entity at 0x10ec5d400>,
 'fmap': <bids.layout.models.Entity at 0x10eb96ac8>,
 'datatype': <bids.layout.models.Entity at 0x10ec5d630>,
 'extension': <bids.layout.models.Entity at 0x10ec67198>,
 'ImageType': <bids.layout.models.Entity at 0x10ecd64e0>,
 

For example, to get only the file paths of our resting state fMRI scans:

In [6]:
layout.get(datatype='func', suffix='bold', task='rest', extensions=['.nii.gz'], return_type='file')



['/Users/Michael/projects/teaching/fmriprep_workshop/data/ds000030/sub-10159/func/sub-10159_task-rest_bold.nii.gz']
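
It can help to see where these filter values come from. In a BIDS filename, entities appear as `key-value` pairs separated by underscores, followed by a suffix and an extension. The sketch below splits a filename along those lines using only the standard library. It is an illustration of the naming scheme, not how pybids parses files internally, and the function name `parse_bids_name` is hypothetical. Note that the filename uses the short key `sub`, which pybids exposes under the longer entity name `subject`.

```python
from pathlib import Path


def parse_bids_name(path):
    """Split a BIDS-style filename into key-value entities, suffix and extension."""
    name = Path(path).name
    # Everything after the first dot is treated as the extension (e.g. ".nii.gz")
    stem, dot, ext = name.partition(".")
    # The last underscore-separated chunk is the suffix; the rest are key-value pairs
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    entities["suffix"] = suffix
    entities["extension"] = dot + ext
    return entities


print(parse_bids_name("sub-10159/func/sub-10159_task-rest_bold.nii.gz"))
# {'sub': '10159', 'task': 'rest', 'suffix': 'bold', 'extension': '.nii.gz'}
```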

**EXERCISE**: Retrieve the file paths of any scan where the `RepetitionTime` is 2 seconds.

In [8]:
layout.get(RepetitionTime=2, return_type='file')

['/Users/Michael/projects/teaching/fmriprep_workshop/data/ds000030/sub-10159/func/sub-10159_task-bart_bold.nii.gz',
 '/Users/Michael/projects/teaching/fmriprep_workshop/data/ds000030/sub-10159/func/sub-10159_task-rest_bold.nii.gz',
 '/Users/Michael/projects/teaching/fmriprep_workshop/data/ds000030/sub-10159/func/sub-10159_task-scap_bold.nii.gz',
 '/Users/Michael/projects/teaching/fmriprep_workshop/data/ds000030/sub-10159/func/sub-10159_task-stopsignal_bold.nii.gz',
 '/Users/Michael/projects/teaching/fmriprep_workshop/data/ds000030/sub-10159/func/sub-10159_task-taskswitch_bold.nii.gz']

Let's save the first file from our list of file paths to a variable and pull the metadata from its associated JSON file using the `get_metadata()` method.

In [9]:
fmri_file = layout.get(RepetitionTime=2, return_type='file')[0]
layout.get_metadata(fmri_file)

{'AccelNumReferenceLines': 24,
 'AccelerationFactorPE': 2,
 'AcquisitionMatrix': '64/0/0/64',
 'CogAtlasID': 'trm_4d559bcd67c18',
 'CogPOID': '',
 'DeviceSerialNumber': '35343',
 'EPIFactor': 128,
 'EchoTime': 0.03,
 'EchoTrainLength': 1,
 'EffectiveEchoSpacing': 0.000395,
 'FlipAngle': 90,
 'ImageType': 'ORIGINAL/PRIMARY/M/ND/MOSAIC',
 'ImagingFrequency': 123249925,
 'InPlanePhaseEncodingDirection': 'COL',
 'Instructions': 'This task is the one where you score points by inflating balloons. You push the first button to inflate the balloon, and the second button to stop inflating and move on to the next one. The more you inflate the balloon the more points you’ll get, but if you inflate it too much the balloon will pop and you won’t get any points. There are two different colors of balloons, green and white. Green balloons give points, but white balloons don’t, so when you see a white balloon you can just inflate it until it goes away to move on to the next one. You only get a limited n
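
Metadata for a scan can be spread over more than one JSON sidecar: in the tree diagram earlier there is a dataset-level `task-rest_bold.json` as well as a subject-level `sub-10159_task-rest_bold.json`. Under the BIDS inheritance principle, values in the more specific file take precedence. The snippet below is a rough, standard-library sketch of that merging idea for two known files; it is not the pybids implementation, and `merged_metadata` is a name invented for this example.

```python
import json
from pathlib import Path


def merged_metadata(sidecars):
    """Merge JSON sidecars in order; later (more specific) files override earlier ones."""
    metadata = {}
    for sidecar in sidecars:
        sidecar = Path(sidecar)
        if sidecar.exists():  # silently skip sidecars that are not present
            metadata.update(json.loads(sidecar.read_text()))
    return metadata


root = Path("../data/ds000030")
meta = merged_metadata([
    root / "task-rest_bold.json",                           # dataset-level defaults
    root / "sub-10159/func/sub-10159_task-rest_bold.json",  # subject-level overrides
])
print(meta.get("RepetitionTime"))
```

In practice `layout.get_metadata()` takes care of locating the relevant sidecars for you, so a helper like this is only useful for understanding what happens behind the scenes.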

We can even collect the metadata for all of our fMRI scans into a list and convert this into a pandas DataFrame.

In [11]:
import pandas as pd

metadata_list = []
all_fmri_files = layout.get(datatype='func', suffix='bold', return_type='file', extensions='.nii.gz')
for fmri_file in all_fmri_files:
    fmri_metadata = layout.get_metadata(fmri_file)
    metadata_list.append(fmri_metadata)
df = pd.DataFrame.from_records(metadata_list)
df



|  | AccelNumReferenceLines | AccelerationFactorPE | AcquisitionMatrix | CogAtlasID | CogPOID | DeviceSerialNumber | EPIFactor | EchoTime | EchoTrainLength | EffectiveEchoSpacing | ... | SequenceVariant | SliceTiming | SoftwareVersions | TaskDescription | TaskFullName | TaskName | TaskParameters | TotalScanTimeSec | TransmitCoilName | VariableFlipAngleFlag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 24 | 2 | 64/0/0/64 | trm_4d559bcd67c18 |  | 35343 | 128 | 0.03 | 1 | 0.000395 | ... | SK | [1.0025, 0, 1.0625, 0.06, 1.1225, 0.1175, 1.18... | syngo MR B15 | In the BART (Lejuez et al., 2002), participant... | Balloon Analog Risk Task (BART) | bart | {'ISI': 3, 'ITI': 2, 'mean_iti': 4, 'min_iti':... | 542 | Body | N |
| 1 | 24 | 2 | 64/0/0/64 | trm_4c8a834779883 | COGPO_00086 | 35343 | 128 | 0.03 | 1 | 0.000395 | ... | SK | [1.005, 0, 1.0625, 0.06, 1.1225, 0.12, 1.1825,... | syngo MR B15 | In the Resting scan, participants were asked t... | Resting State | rest |  | 312 | Body | N |
| 2 | 24 | 2 | 64/0/0/64 | trm_4f2453b806fe1 |  | 35343 | 128 | 0.03 | 1 | 0.000395 | ... | SK | [1.005, 0, 1.0625, 0.06, 1.1225, 0.1175, 1.18,... | syngo MR B15 | SCAP is a working memory task that tests the m... | Spatial Working Memory Capacity Tasks (SCAP) | scap | {'trigger_time': None} | 590 | Body | N |
| 3 | 24 | 2 | 64/0/0/64 | tsk_4a57abb949e1a |  | 35343 | 128 | 0.03 | 1 | 0.000395 | ... | SK | [1.0025, 0, 1.0625, 0.0575, 1.12, 0.1175, 1.18... | syngo MR B15 | The Stop-Signal Task measures response inhibit... | Stop-Signal Task | stopsignal | {'Settings': {'BSI': 1, 'ISI': 1.5, 'Ladder1 s... | 376 | Body | N |
| 4 | 24 | 2 | 64/0/0/64 | tsk_4a57abb949e8a | COGPO_00107 | 35343 | 128 | 0.03 | 1 | 0.000395 | ... | SK | [1.0025, 0, 1.0625, 0.0575, 1.12, 0.1175, 1.18... | syngo MR B15 | In the Task-Switching (TS) task, participants ... | Task Switching | taskswitch | {'button_set': 2, 'left_color': 'red', 'right_... | 424 | Body | N |
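
With this many columns, a quick way to see what actually differs between scans is to list only the fields whose values are not the same everywhere. The helper below is a small sketch that works on a list of metadata dictionaries such as `metadata_list`; the name `varying_fields` is invented here, and the two example records reuse a few values from the table above.

```python
def varying_fields(records):
    """Return the sorted keys whose values are not identical across all records."""
    keys = set().union(*(r.keys() for r in records)) if records else set()
    varying = []
    for key in sorted(keys):
        # repr() lets us compare unhashable values such as lists and dicts
        values = {repr(r.get(key)) for r in records}
        if len(values) > 1:
            varying.append(key)
    return varying


example = [
    {"TaskName": "bart", "EchoTime": 0.03, "TotalScanTimeSec": 542},
    {"TaskName": "rest", "EchoTime": 0.03, "TotalScanTimeSec": 312},
]
print(varying_fields(example))
# ['TaskName', 'TotalScanTimeSec']
```

Calling `varying_fields(metadata_list)` on the full list would apply the same check to every metadata field collected above.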
