# Introduction to `pybids`

[`pybids`](https://github.com/bids-standard/pybids) is a Python package for querying, summarizing, and manipulating data organized according to the BIDS standard.
In this tutorial we will use a `pybids` test dataset to illustrate some of the functionality of the layout module (imported as `bids.layout`).

In [29]:
import os
import sys
import bids
from bids import BIDSLayout

## The `BIDSLayout`

At the core of pybids is the `BIDSLayout` object. A `BIDSLayout` is a lightweight Python class that represents a BIDS project file tree and provides a variety of helpful methods for querying and manipulating BIDS files. While the `BIDSLayout` initializer has a large number of arguments you can use to control the way files are indexed and accessed, you will most commonly initialize a `BIDSLayout` by passing in the BIDS dataset root location as a single argument:

In [30]:
# Build the dataset path from the current working directory
# (here, /Volumes/Main/Working/IntendedFor) plus the "data" subfolder
bids_dir = os.path.join(os.getcwd(), "data")

# Initialize the layout.
# Setting 'extension_initial_dot' avoids a warning about a deprecated feature.
bids.config.set_option('extension_initial_dot', True)
layout = BIDSLayout(bids_dir, validate=False)

# Print some basic information about the layout
layout

BIDS Layout: .../Main/Working/IntendedFor/data | Subjects: 3 | Sessions: 2 | Runs: 10
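
The summary line reports how many subjects, sessions, and runs were found. As a rough illustration of where a subject count can come from, the sketch below builds a throwaway directory tree with the same three subject labels and collects the `sub-*` folder names using only the standard library. The folder layout is an assumption made for the example, and this is not how `BIDSLayout` indexes a dataset internally.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Create empty subject folders that mimic the top level of a BIDS dataset.
    for label in ("078", "188", "219"):
        (root / f"sub-{label}" / "anat").mkdir(parents=True)

    # Collect subject labels by stripping the "sub-" prefix from folder names.
    subjects = sorted(
        p.name[len("sub-"):] for p in root.glob("sub-*") if p.is_dir()
    )

print(subjects)
print(len(subjects))
```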

### Querying the `BIDSLayout`
When we initialize a `BIDSLayout`, all of the files and metadata found under the specified root folder are indexed. This can take a few seconds (or, for very large datasets, a minute or two). Once initialization is complete, we can start querying the `BIDSLayout` in various ways. The workhorse method is [`.get()`](https://bids-standard.github.io/pybids/generated/bids.grabbids.BIDSLayout.html#bids.grabbids.BIDSLayout.get). If we call `.get()` with no additional arguments, we get back a list of all the BIDS files in our dataset:

In [31]:
all_files = layout.get()
print("There are {} files in the layout.".format(len(all_files)))
print("\nThe first 10 files are:")
all_files[:10]

There are 130 files in the layout.

The first 10 files are:


[<BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/dataset_description.json'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-078/anat/sub-078_defacemask.nii.gz'>,
 <BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-078/anat/sub-078_T1w.json'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-078/anat/sub-078_T1w.nii.gz'>,
 <BIDSFile filename='/Volumes/Main/Working/IntendedFor/data/sub-078/dwi/sub-078_acq-AP_dwi.bval'>,
 <BIDSFile filename='/Volumes/Main/Working/IntendedFor/data/sub-078/dwi/sub-078_acq-AP_dwi.bvec'>,
 <BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-078/dwi/sub-078_acq-AP_dwi.json'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-078/dwi/sub-078_acq-AP_dwi.nii.gz'>,
 <BIDSFile filename='/Volumes/Main/Working/IntendedFor/data/sub-078/fmap/sub-078_dir-PA_epi.bval'>,
 <BIDSFile filename='/Volumes/Main/Working/IntendedFor/data/sub-078/fmap/sub-078_dir-PA_epi.bvec'>]

The returned object is a Python list. By default, each element in the list is a `BIDSFile` object. We discuss the `BIDSFile` object in much more detail below. For now, let's simplify things and work with just filenames:

In [32]:
layout.get(return_type='filename')[:10]

['/Volumes/Main/Working/IntendedFor/data/dataset_description.json',
 '/Volumes/Main/Working/IntendedFor/data/sub-078/anat/sub-078_defacemask.nii.gz',
 '/Volumes/Main/Working/IntendedFor/data/sub-078/anat/sub-078_T1w.json',
 '/Volumes/Main/Working/IntendedFor/data/sub-078/anat/sub-078_T1w.nii.gz',
 '/Volumes/Main/Working/IntendedFor/data/sub-078/dwi/sub-078_acq-AP_dwi.bval',
 '/Volumes/Main/Working/IntendedFor/data/sub-078/dwi/sub-078_acq-AP_dwi.bvec',
 '/Volumes/Main/Working/IntendedFor/data/sub-078/dwi/sub-078_acq-AP_dwi.json',
 '/Volumes/Main/Working/IntendedFor/data/sub-078/dwi/sub-078_acq-AP_dwi.nii.gz',
 '/Volumes/Main/Working/IntendedFor/data/sub-078/fmap/sub-078_dir-PA_epi.bval',
 '/Volumes/Main/Working/IntendedFor/data/sub-078/fmap/sub-078_dir-PA_epi.bvec']

This time, we get back only the file paths, as plain strings.

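Next, we retrieve every file with the `T1w` suffix across all subjects. Sessions, where a subject has them, are taken into account. According to the PyBIDS documentation, the `target` argument is only used when `return_type` is `'id'` or `'dir'`, so it does not change the list of file objects returned here.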

In [34]:
layout.get(target='subject', suffix='T1w')

[<BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-078/anat/sub-078_T1w.json'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-078/anat/sub-078_T1w.nii.gz'>,
 <BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-188/anat/sub-188_T1w.json'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-188/anat/sub-188_T1w.nii.gz'>,
 <BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/anat/sub-219_ses-ctbs_T1w.json'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/anat/sub-219_ses-ctbs_T1w.nii.gz'>,
 <BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/anat/sub-219_ses-itbs_T1w.json'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/anat/sub-219_ses-itbs_T1w.nii.gz'>]

In [35]:
pd_fyls = layout.get(target='subject', suffix='phasediff')
pd_fyls

[<BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-078/fmap/sub-078_phasediff.json'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-078/fmap/sub-078_phasediff.nii.gz'>,
 <BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-188/fmap/sub-188_phasediff.json'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-188/fmap/sub-188_phasediff.nii.gz'>,
 <BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/fmap/sub-219_ses-ctbs_phasediff.json'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/fmap/sub-219_ses-ctbs_phasediff.nii.gz'>,
 <BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/fmap/sub-219_ses-itbs_phasediff.json'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/fmap/sub-219_ses-itbs_phasediff.nii.gz'>]

In [36]:
for fyl in pd_fyls:
    print(fyl.relpath)

sub-078/fmap/sub-078_phasediff.json
sub-078/fmap/sub-078_phasediff.nii.gz
sub-188/fmap/sub-188_phasediff.json
sub-188/fmap/sub-188_phasediff.nii.gz
sub-219/ses-ctbs/fmap/sub-219_ses-ctbs_phasediff.json
sub-219/ses-ctbs/fmap/sub-219_ses-ctbs_phasediff.nii.gz
sub-219/ses-itbs/fmap/sub-219_ses-itbs_phasediff.json
sub-219/ses-itbs/fmap/sub-219_ses-itbs_phasediff.nii.gz


In [41]:
jfyls = [f for f in pd_fyls if f.filename.endswith('.json')]
for j in jfyls:
    print(j)

<BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-078/fmap/sub-078_phasediff.json'>
<BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-188/fmap/sub-188_phasediff.json'>
<BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/fmap/sub-219_ses-ctbs_phasediff.json'>
<BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/fmap/sub-219_ses-itbs_phasediff.json'>


In [42]:
jf = jfyls[0]
jf

<BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-078/fmap/sub-078_phasediff.json'>

In [43]:
jf.get_json()

'{\n  "AcquisitionMatrixPE": 68,\n  "AcquisitionNumber": 1,\n  "AcquisitionTime": "14:44:19.717500",\n  "BaseResolution": 68,\n  "BodyPartExamined": "HEAD",\n  "ConsistencyInfo": "N4_VD13A_LATEST_20120616",\n  "ConversionSoftware": "dcm2niix",\n  "ConversionSoftwareVersion": "v1.0.20190410  GCC6.3.0",\n  "DeviceSerialNumber": "45424",\n  "DwellTime": 2.53e-05,\n  "EchoNumber": 2,\n  "EchoTime": 0.00738,\n  "EchoTime1": 0.00492,\n  "EchoTime2": 0.00738,\n  "FlipAngle": 60,\n  "ImageOrientationPatientDICOM": [1, -2.05103e-10, 0, 2.05103e-10, 1, 0],\n  "ImageType": [\n    "ORIGINAL",\n    "PRIMARY",\n    "P",\n    "ND"\n],\n  "ImagingFrequency": 123.253,\n  "InPlanePhaseEncodingDirectionDICOM": "ROW",\n  "InstitutionAddress": "North_Warren_Ave_1609_Tucson_Denver_US_85719",\n  "InstitutionName": "University_of_Arizona",\n  "InstitutionalDepartmentName": "Department",\n  "MRAcquisitionType": "2D",\n  "MagneticFieldStrength": 3,\n  "Manufacturer": "Siemens",\n  "ManufacturersModelName": "Sky
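
The output above is a quoted string, which suggests that `get_json()` returns the sidecar as JSON text rather than as a dictionary. A minimal sketch, using only the standard library, of turning such text into a dictionary and reading values from it. The string below is a hand-copied subset of the keys shown above, not the full sidecar.

```python
import json

# A small subset of the sidecar shown above, copied by hand for illustration.
sidecar_text = (
    '{"EchoTime1": 0.00492, "EchoTime2": 0.00738, "Manufacturer": "Siemens"}'
)

# json.loads turns the JSON text into a regular Python dictionary.
sidecar = json.loads(sidecar_text)

# Difference between the two echo times, in the same units as the sidecar values.
delta_te = sidecar["EchoTime2"] - sidecar["EchoTime1"]

print(sidecar["Manufacturer"])
print(round(delta_te, 5))
```

With a real file, the same `json.loads` call could be applied to the string returned by `jf.get_json()`.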

In [46]:
# Get BOLD image files for a single subject:
layout.get(subject='219', extension='nii.gz', suffix='bold')

[<BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/func/sub-219_ses-ctbs_task-rest_run-01_bold.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/func/sub-219_ses-ctbs_task-rest_run-02_bold.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/func/sub-219_ses-itbs_task-rest_run-01_bold.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/func/sub-219_ses-itbs_task-rest_run-02_bold.nii.gz'>]

In [47]:
# Same as above, but for one session only:
layout.get(subject='219', session='itbs', extension='nii.gz', suffix='bold')

[<BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/func/sub-219_ses-itbs_task-rest_run-01_bold.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/func/sub-219_ses-itbs_task-rest_run-02_bold.nii.gz'>]

### Filtering files by entities
The utility of the `BIDSLayout` would be pretty limited if all we could do was retrieve a list of all files in the dataset. Fortunately, the `.get()` method accepts a wide range of arguments that filter the result set by specified criteria. In fact, we can pass *any* BIDS-defined keyword (or, as they're called in PyBIDS, *entity*) as a constraint. For example, here's how we would retrieve all BOLD runs with `.nii.gz` extensions for subject `'219'`:

In [17]:
# Retrieve filenames of all BOLD runs for subject 219
layout.get(subject='219', extension='nii.gz', suffix='bold', return_type='filename')

['/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/func/sub-219_ses-ctbs_task-rest_run-01_bold.nii.gz',
 '/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/func/sub-219_ses-ctbs_task-rest_run-02_bold.nii.gz',
 '/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/func/sub-219_ses-itbs_task-rest_run-01_bold.nii.gz',
 '/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/func/sub-219_ses-itbs_task-rest_run-02_bold.nii.gz']

If you're wondering what entities you can pass in as filtering arguments, the answer is contained in the `.json` configuration files [housed here](https://github.com/bids-standard/pybids/tree/master/bids/layout/config). To save you the trouble, here are a few of the most common entities:

* `suffix`: The part of a BIDS filename just before the extension (e.g., `'bold'`, `'events'`, `'physio'`, etc.).
* `subject`: The subject label.
* `session`: The session label.
* `run`: The run index.
* `task`: The task name.

New entities are continually being defined as the spec grows, and in principle (though not always in practice), PyBIDS should be aware of all entities that are defined in the BIDS specification.
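
To make the idea of entities concrete, here is a minimal sketch that splits one of the BOLD filenames from this dataset into its key-value pairs, suffix, and extension. The helper `parse_bids_name` is hypothetical and deliberately simplified; it is not how PyBIDS parses filenames. Note that the filename keys `sub` and `ses` correspond to the PyBIDS entity names `subject` and `session`.

```python
import re

def parse_bids_name(filename):
    """Split a BIDS-style filename into entities, suffix, and extension.

    Illustrative only: a hypothetical helper, not the PyBIDS parser.
    """
    # Treat everything from the first dot onward as the extension.
    stem, dot, ext = filename.partition(".")
    parts = stem.split("_")

    entities = {}
    for part in parts:
        # Entities appear as "key-value" chunks separated by underscores.
        match = re.fullmatch(r"([A-Za-z0-9]+)-([A-Za-z0-9]+)", part)
        if match:
            entities[match.group(1)] = match.group(2)

    # The suffix is the final chunk when it is not a key-value pair.
    if "-" not in parts[-1]:
        entities["suffix"] = parts[-1]
    entities["extension"] = dot + ext
    return entities

parsed = parse_bids_name("sub-219_ses-itbs_task-rest_run-02_bold.nii.gz")
print(parsed)
```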

### Filtering by metadata
All of the entities listed above are found in the names of BIDS files. But sometimes we want to search for files based not just on their names, but also based on metadata defined (per the BIDS spec) in JSON files. Fortunately for us, when we initialize a `BIDSLayout`, all metadata files associated with BIDS files are automatically indexed. This means we can pass any key that occurs in any JSON file in our project as an argument to `.get()`. We can combine these with any number of core BIDS entities (like `subject`, `run`, etc.).

For example, say we want to retrieve all files where (a) the value of `SliceThickness` (a metadata key) is `3` and (b) the subject is `'219'`. Here's how we can do that:

In [27]:
# Retrieve all files for subject 219 where SliceThickness (a metadata key) = 3.
# A query that combines a metadata key with several entities would look like:
# layout.get(subject=['078', '188', '219'], RepetitionTime=6, acquisition="tse")
layout.get(subject=['219'], SliceThickness=3)

[<BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/fmap/sub-219_ses-ctbs_magnitude1.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/fmap/sub-219_ses-ctbs_magnitude2.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/fmap/sub-219_ses-ctbs_phasediff.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/fmap/sub-219_ses-itbs_magnitude1.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/fmap/sub-219_ses-itbs_magnitude2.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/fmap/sub-219_ses-itbs_phasediff.nii.gz'>]
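
As a rough picture of what a metadata query involves, the sketch below writes two made-up JSON sidecars to a temporary folder and keeps those whose content matches the requested key-value pairs. The helper `find_by_metadata`, the filenames, and the values are all invented for this example. Unlike `layout.get()`, which returns the data files associated with matching metadata, this simplified version only returns the sidecar files themselves.

```python
import json
import tempfile
from pathlib import Path

def find_by_metadata(root, **criteria):
    """Return sidecar paths under root whose JSON content matches all criteria.

    Hypothetical helper for illustration; not part of PyBIDS.
    """
    matches = []
    for sidecar in sorted(Path(root).rglob("*.json")):
        metadata = json.loads(sidecar.read_text())
        # Keep the file only if every requested key has the requested value.
        if all(metadata.get(key) == value for key, value in criteria.items()):
            matches.append(sidecar)
    return matches

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Two made-up sidecars; the values are invented for this example.
    (root / "sub-01_phasediff.json").write_text(json.dumps({"SliceThickness": 3}))
    (root / "sub-01_T1w.json").write_text(json.dumps({"SliceThickness": 1}))

    hits = [p.name for p in find_by_metadata(root, SliceThickness=3)]

print(hits)
```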