# Pybids

[`pybids`](https://github.com/bids-standard/pybids) is a tool to query, summarize and manipulate data using the BIDS standard. 
In this tutorial we will use a `pybids` test dataset to illustrate some of the functionality of `pybids.layout`.

## Jupyter Notebook Tips
- There is a table of contents based on markdown: toggle with hamburger menu on far left
- Hit ESC to make sure you are in command mode and you are not inside the cell:
- Shift+Enter to run a cell
- `dd` to delete a cell
- `a` to insert a cell above
- `b` to insert a cell below

In [156]:
import os
import sys
import bids
from bids import BIDSLayout

## Create the `BIDSLayout` Instance

At the core of pybids is the `BIDSLayout` object. A `BIDSLayout` is a lightweight Python class that represents a BIDS project file tree and provides a variety of helpful methods for querying and manipulating BIDS files. While the `BIDSLayout` initializer has a large number of arguments you can use to control the way files are indexed and accessed, you will most commonly initialize a `BIDSLayout` by passing in the BIDS dataset root location as a single argument:

In [160]:
# Here we start in /Volumes/Main/Working/IntendedFor and append "data" to build
# bids_dir, the variable that holds the dataset root
bids_dir = os.path.join(os.getcwd(), "data")

# Initialize the layout
# following setting avoids an annoying warning message about deprecated feature
bids.config.set_option('extension_initial_dot', True)
layout = BIDSLayout(bids_dir, validate=False)

# Print some basic information about the layout
print(layout)
print("")
print(f"type of layout is: {type(layout)}")

BIDS Layout: .../Main/Working/IntendedFor/data | Subjects: 3 | Sessions: 2 | Runs: 10

type of layout is: <class 'bids.layout.layout.BIDSLayout'>


## BIDSLayout get()

- When we initialize a `BIDSLayout`, all of the files and metadata found under the specified root folder are indexed. This can take a few seconds (or, for very large datasets, a minute or two). 
- Once initialization is complete, we can start querying the `BIDSLayout` in various ways. 
- The workhorse method is [`.get()`](https://bids-standard.github.io/pybids/generated/bids.grabbids.BIDSLayout.html#bids.grabbids.BIDSLayout.get). If we call `.get()` with no additional arguments, we get back a list of all the BIDS files in our dataset. 
- When you call `.get()` on a `BIDSLayout`, the default returned values are objects of class `BIDSFile`. A `BIDSFile` is a lightweight container for individual files in a BIDS dataset. It provides easy access to a variety of useful attributes and methods. 

### Get count of files with layout.get()

In [None]:
all_files = layout.get()

# count of files in layout.get using len:
print(f"Using len to count files from layout.get(): {len(all_files)}")

### Built-in layout.get() functions

These three special functions retrieve Python **lists** containing **strings**.

#### subjects

In [163]:
# Special built-in function to list subjects 
sub_list=layout.get_subjects()
print(sub_list)
print("")
# Retrieve the type of the collection and of the items in the collection

print(f"The sub_list is of type: {type(sub_list)}")
print(f"An item from the list is of type: {type(sub_list[0])}")
# Alternatively 
# layout.get(return_type='id', target='subject') 
# does the same thing, returning a list of strings

['078', '188', '219']

The sub_list is of type: <class 'list'>
An item from the list is of type: <class 'str'>


#### sessions

In [118]:
# Special built-in function to list sessions
layout.get_sessions()

# Alternatively
# layout.get(return_type='id', target='session')
# does the same thing

['ctbs', 'itbs']

#### tasks

In [119]:
# Special built-in function to list tasks
layout.get_tasks()

# Alternatively
# layout.get(return_type='id', target='task')
# does the same thing

['nad1', 'rest', 'russian', 'wb2']

### Filter the `BIDSLayout` Queries

The utility of the `BIDSLayout` would be pretty limited if all we could do was retrieve a list of all files in the dataset. Fortunately, the `.get()` method accepts all kinds of arguments that allow us to filter the result set based on specified criteria. In fact, we can pass *any* BIDS-defined keywords (or, as they're called in PyBIDS, *entities*) as constraints. 

If you're wondering what entities you can pass in as filtering arguments, the answer is contained in the `.json` configuration files [housed here](https://github.com/bids-standard/pybids/tree/master/bids/layout/config). To save you the trouble, here are a few of the most common entities:

* `suffix`: The part of a BIDS filename just before the extension (e.g., `'bold'`, `'events'`, `'phasediff'`, etc.).
* `subject`: The subject label (the string after `sub-`)
* `session`: The session label (the string after `ses-`)
* `run`: The run index (the string after `run-`)
* `task`: The task name (the string after `task-`)

The BIDS specification is still evolving, so the set of recognized entities may grow over time.

#### get file objects where suffix=T1w  

In [325]:
# Return all T1w file objects (the default): This is a list of objects of type `BIDSFile`:
# Both of these statements produce the same results
# layout.get(target='subject', suffix='T1w')
T1w=layout.get(suffix='T1w')

# Print one per line
for t1 in T1w:
    print(t1)
    
print("")
print(f"filetype of default collection: {type(T1w)}")
print(f"filetype of item: {type(T1w[0])}")

<BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-078/anat/sub-078_T1w.json'>
<BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-078/anat/sub-078_T1w.nii.gz'>
<BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-188/anat/sub-188_T1w.json'>
<BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-188/anat/sub-188_T1w.nii.gz'>
<BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/anat/sub-219_ses-ctbs_T1w.json'>
<BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/anat/sub-219_ses-ctbs_T1w.nii.gz'>
<BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/anat/sub-219_ses-itbs_T1w.json'>
<BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/anat/sub-219_ses-itbs_T1w.nii.gz'>

filetype of default collection: <class 'list'>
filetype of item: <class 'bids.layout.models.BIDSJSONFile'>


#### get filename strings where suffix=T1w

In [326]:
# Return all T1w file names: this is a list of strings instead of objects because return_type='filename':
t1_ft = layout.get(return_type='filename', suffix='T1w')

# Print one per line
for t1f in t1_ft:
    print(t1f)

print("")
print(f"filetype of collection for return_type='filename': {type(t1_ft)}")
print(f"filetype of item: {type(t1_ft[0])}")

/Volumes/Main/Working/IntendedFor/data/sub-078/anat/sub-078_T1w.json
/Volumes/Main/Working/IntendedFor/data/sub-078/anat/sub-078_T1w.nii.gz
/Volumes/Main/Working/IntendedFor/data/sub-188/anat/sub-188_T1w.json
/Volumes/Main/Working/IntendedFor/data/sub-188/anat/sub-188_T1w.nii.gz
/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/anat/sub-219_ses-ctbs_T1w.json
/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/anat/sub-219_ses-ctbs_T1w.nii.gz
/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/anat/sub-219_ses-itbs_T1w.json
/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/anat/sub-219_ses-itbs_T1w.nii.gz

filetype of collection for return_type='filename': <class 'list'>
filetype of item: <class 'str'>


#### get file objects for one subject, session & suffix

In [330]:
# Return T1w file objects for one subject and session only:
layout.get(subject='219', session='itbs', suffix='T1w')

[<BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/anat/sub-219_ses-itbs_T1w.json'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/anat/sub-219_ses-itbs_T1w.nii.gz'>]

#### get file objects for one subject, extension & suffix: return relative paths

In [327]:
# Return relative paths for all BOLD image files (*.nii.gz) for a single subject:

print("Relative paths for all sub-078 fmri image files:")
print("")    

# Here's where the work happens:
fmri_078=layout.get(subject='078', extension='nii.gz', suffix='bold')

# Here we extract the relative path for each one
for fmri in fmri_078:
    print(fmri.relpath)

print("")
print(f"type of the collection: {type(fmri_078)}")
print(f"type of items in collection: {type(fmri_078[0])}")
print(f"type of item containing the relative path is: {type(fmri_078[0].relpath)}")


Relative paths for all sub-078 fmri image files:

sub-078/func/sub-078_task-russian_run-01_bold.nii.gz
sub-078/func/sub-078_task-russian_run-02_bold.nii.gz
sub-078/func/sub-078_task-russian_run-03_bold.nii.gz
sub-078/func/sub-078_task-russian_run-04_bold.nii.gz
sub-078/func/sub-078_task-wb2_run-01_bold.nii.gz
sub-078/func/sub-078_task-wb2_run-02_bold.nii.gz
sub-078/func/sub-078_task-wb2_run-03_bold.nii.gz

type of the collection: <class 'list'>
type of items in collection: <class 'bids.layout.models.BIDSImageFile'>
type of item containing the relative path is: <class 'str'>


#### get filename string for first file [0]

In [270]:
fmri_078_file1=layout.get(subject='078', extension='nii.gz', suffix='bold')[0].filename
print(f"Retrieve the filename of the first fmri image for sub-078:\n{fmri_078_file1}") 

Retrieve the filename of the first fmri image for sub-078:
sub-078_task-russian_run-01_bold.nii.gz


#### get all phasediff JSON file relative paths

In [328]:
# Return all phasediff JSON files
print("Return all phasediff JSON files as relative paths:")
print("")

# Here's where the work happens
pd_json_files=layout.get(target='subject', suffix='phasediff', extension='json')

# To print the relative path, loop over the list and read each item's relpath
for pd_json_file in pd_json_files:
    print(pd_json_file.relpath)

print("")    
print("==================================")
# These items can also be extracted by their indices:
print("extract the 2nd indexed file:")
pdj=pd_json_files[1]
print(pdj)

print("")
print("==================================")
print(f"pd_json_file is of type:\n {type(pd_json_files)}")
print(f"first pd_json_file is of type:\n {type(pdj)}")

Return all phasediff JSON files as relative paths:

sub-078/fmap/sub-078_phasediff.json
sub-188/fmap/sub-188_phasediff.json
sub-219/ses-ctbs/fmap/sub-219_ses-ctbs_phasediff.json
sub-219/ses-itbs/fmap/sub-219_ses-itbs_phasediff.json

extract the 2nd indexed file:
<BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-188/fmap/sub-188_phasediff.json'>

pd_json_file is of type:
 <class 'list'>
first pd_json_file is of type:
 <class 'bids.layout.models.BIDSJSONFile'>


#### get path for the IntendedFor field for each filename

In [283]:
# Generate the path for the IntendedFor field for each filename
print("Return the datatype (hence directory label) and filename for all phasediff JSON files:")
print("")
for j_file in pd_json_files:
    print(f"{j_file.entities['datatype']}/{j_file.filename}")

Return the datatype (hence directory label) and filename for all phasediff JSON files:

fmap/sub-078_phasediff.json
fmap/sub-188_phasediff.json
fmap/sub-219_ses-ctbs_phasediff.json
fmap/sub-219_ses-itbs_phasediff.json
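
Note that the `datatype/filename` form above leaves out the `ses-` folder for sub-219. `IntendedFor` entries have conventionally been written relative to the subject directory, so for a dataset with sessions the session folder belongs in the path. The following is a minimal sketch of one way to derive such a path from a dataset-relative path (as returned by `.relpath`). The helper name `intended_for_path` is hypothetical and not part of pybids, and the second example filename is made up for illustration.

```python
from pathlib import PurePosixPath


def intended_for_path(relpath):
    """Drop the leading sub-<label> folder from a dataset-relative path.

    Hypothetical helper for illustration; it is not part of pybids.
    """
    parts = PurePosixPath(relpath).parts
    if not parts or not parts[0].startswith("sub-"):
        raise ValueError(f"expected a path starting with sub-<label>: {relpath}")
    return str(PurePosixPath(*parts[1:]))


examples = [
    "sub-078/func/sub-078_task-wb2_run-01_bold.nii.gz",
    # Made-up filename, used only to show the session case
    "sub-219/ses-ctbs/func/sub-219_ses-ctbs_task-rest_bold.nii.gz",
]
for relpath in examples:
    print(intended_for_path(relpath))
```

For the session case this keeps `ses-ctbs/func/...`, which is the part the `datatype/filename` approach loses.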


### Other `return_type` values: `id` and `dir`

- `get()` can return unique values (or ids) of particular entities. For example, say we want to know which subjects have at least one file with a `tse` acquisition label.
- We can request that information by setting `return_type='id'`.
- When using this option, we also need to specify a target entity (or metadata keyword) called `target`.
- This combination tells the `BIDSLayout` to return the unique values for the specified `target` entity.
- In the next example, we ask for all of the unique subject IDs that have at least one file whose acquisition is `tse`:

#### get ids for target subjects with at least one tse image (turbo-spin-echo)

In [285]:
# Return ids of target subjects with at least one tse file
layout.get(return_type='id', target='subject', acquisition="tse")

['219']
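
Conceptually, `return_type='id'` filters the indexed files and then collects the distinct values of the `target` entity. The sketch below mimics that idea on a handful of toy records. The records and the function name `unique_ids` are illustrative assumptions, not pybids internals.

```python
# Toy records standing in for indexed files; the values are illustrative only.
files = [
    {"subject": "078", "suffix": "T1w"},
    {"subject": "219", "session": "ctbs", "acquisition": "tse", "suffix": "T2w1"},
    {"subject": "219", "session": "itbs", "acquisition": "tse", "suffix": "T2w1"},
    {"subject": "188", "task": "nad1", "suffix": "bold"},
]


def unique_ids(records, target, **filters):
    """Return sorted unique values of `target` among records matching all filters."""
    matching = (
        r for r in records
        if all(r.get(key) == value for key, value in filters.items())
    )
    # Records that lack the target entity contribute nothing
    return sorted({r[target] for r in matching if target in r})


print(unique_ids(files, "subject", acquisition="tse"))  # ['219']
print(unique_ids(files, "session", acquisition="tse"))  # ['ctbs', 'itbs']
```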

#### get ids of target **subjects** with at least one nad1 task file

In [286]:
# Return ids of subjects with at least one task=nad1
layout.get(return_type='id', target='subject', task="nad1")

['188']

#### get ids of target **sessions** with at least one tse image

In [218]:
# Return ids of target sessions with at least one tse image
layout.get(return_type='id', target='session', acquisition="tse")

['ctbs', 'itbs']

#### get dirs for subject (not if there are sessions!)

If our `target` is a BIDS entity that corresponds to a particular directory in the BIDS spec (e.g., `subject` or `session`) we can also use `return_type='dir'` to get all matching subdirectories:

In [222]:
# Return all subject directories (though apparently not those with sessions): This is a list of strings. 
out_dir=layout.get(return_type='dir', target='subject')
# It is not allowed to set multiple targets
# out_ft=layout.get(return_type='dir', target=['subject','session'])

print(out_dir)
print(f"type of collection for return_type='dir': {type(out_dir)}")
print(f"type of item: {type(out_dir[0])}")

['/Volumes/Main/Working/IntendedFor/data/sub-078',
 '/Volumes/Main/Working/IntendedFor/data/sub-188']

#### get session dirs if they exist 

In [289]:
out_ses_dir=layout.get(return_type='dir', target='session')
print(out_ses_dir)
print("")
print(f"type of collection for return_type='dir': {type(out_ses_dir)}")
print(f"type of item: {type(out_ses_dir[0])}")

['/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs', '/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs']

type of collection for return_type='dir': <class 'list'>
type of item: <class 'str'>


## BIDSLayout Queries for JSON MetaData

### print content of a json file

In [290]:
print("Return contents of the individual JSON file extracted above as a JSON string:")
print("")
# get_json() returns a string; use print so its newlines are rendered
print(pdj.get_json())

Return contents of the individual JSON file extracted above as a JSON string:

{
  "AcquisitionMatrixPE": 68,
  "AcquisitionNumber": 1,
  "AcquisitionTime": "14:35:26.635000",
  "BaseResolution": 68,
  "BodyPartExamined": "BRAIN",
  "ConsistencyInfo": "N4_VD13A_LATEST_20120616",
  "ConversionSoftware": "dcm2niix",
  "ConversionSoftwareVersion": "v1.0.20190410  GCC6.3.0",
  "DeviceSerialNumber": "45424",
  "DwellTime": 2.53e-05,
  "EchoNumber": 2,
  "EchoTime": 0.00738,
  "EchoTime1": 0.00492,
  "EchoTime2": 0.00738,
  "FlipAngle": 60,
  "ImageOrientationPatientDICOM": [1, -2.04604e-10, -1.4307e-11, 2.05103e-10, 0.997564, 0.0697565],
  "ImageType": [
    "ORIGINAL",
    "PRIMARY",
    "P",
    "ND"
],
  "ImagingFrequency": 123.232,
  "InPlanePhaseEncodingDirectionDICOM": "ROW",
  "InstitutionAddress": "North_Warren_Ave_1609_Tucson_Denver_US_85719",
  "InstitutionName": "University_of_Arizona",
  "InstitutionalDepartmentName": "Department",
  "IntendedFor": [
    "func/sub-188_task-nad1_r

#### get all files for a subject where SliceThickness (a metadata key) = 3

In [291]:
# Retrieve all files for subject 219 where SliceThickness (a metadata key) = 3
layout.get(subject=['219'], SliceThickness=3)

[<BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/anat/sub-219_ses-ctbs_acq-tse_T2w1.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/anat/sub-219_ses-ctbs_acq-tse_T2w2.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/fmap/sub-219_ses-ctbs_magnitude1.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/fmap/sub-219_ses-ctbs_magnitude2.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/fmap/sub-219_ses-ctbs_phasediff.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/func/sub-219_ses-ctbs_acq-asl_run-01.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/func/sub-219_ses-ctbs_acq-asl_run-02.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/anat/sub-219_ses-itbs_acq-tse_T2

#### get image file names where TR=6 and the acquisition=tse 

In [292]:
# Retrieve image file names for which TR=6 and the acquisition is tse (turbo-spin-echo) 
layout.get(RepetitionTime=6, acquisition="tse")

[<BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/anat/sub-219_ses-ctbs_acq-tse_T2w1.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-ctbs/anat/sub-219_ses-ctbs_acq-tse_T2w2.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/anat/sub-219_ses-itbs_acq-tse_T2w1.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-219/ses-itbs/anat/sub-219_ses-itbs_acq-tse_T2w2.nii.gz'>]

#### get image file names where TR=2.6 & suffix=bold
You can always **pass in a list rather than a string** (illustrated for subject).  
This will be interpreted as a logical disjunction (i.e., a file must match any one of the provided values).   

In [152]:
layout.get(subject=['078', '188', '219'], RepetitionTime=2.6, suffix="bold")

[<BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-078/func/sub-078_task-wb2_run-01_bold.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-078/func/sub-078_task-wb2_run-02_bold.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-078/func/sub-078_task-wb2_run-03_bold.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-188/func/sub-188_task-nad1_run-01_bold.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-188/func/sub-188_task-nad1_run-02_bold.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-188/func/sub-188_task-nad1_run-03_bold.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-188/func/sub-188_task-nad1_run-04_bold.nii.gz'>]
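
The matching rule described above (different keywords are combined with AND, while a list of values for one keyword is combined with OR) can be sketched in a few lines of plain Python. This is only an illustration of the logic on toy records. The function name `matches` and the records are assumptions, not how pybids implements its queries.

```python
def matches(entities, **filters):
    """True if every filter is satisfied; a list, tuple or set matches any member."""
    for key, wanted in filters.items():
        allowed = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
        if entities.get(key) not in allowed:
            return False
    return True


# Toy records; the values are illustrative only.
records = [
    {"subject": "078", "task": "wb2", "suffix": "bold"},
    {"subject": "188", "task": "nad1", "suffix": "bold"},
    {"subject": "219", "task": "rest", "suffix": "bold"},
    {"subject": "188", "suffix": "T1w"},
]

selected = [r for r in records if matches(r, subject=["078", "188"], suffix="bold")]
for r in selected:
    print(r["subject"], r["task"])
```

Only the two `bold` records for subjects 078 and 188 survive: sub-219 fails the subject list, and the T1w record fails the suffix filter.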

## The `BIDSFile`
Here are some of the attributes and methods available to us in a `BIDSFile` (note that some of these are only available for certain subclasses of `BIDSFile`; e.g., you can't call `get_image()` on a `BIDSFile` that doesn't correspond to an image file!):
* `.path`: The full path of the associated file
* `.filename`: The associated file's filename (without directory)
* `.dirname`: The directory containing the file
* `.get_entities()`: Returns information about entities associated with this `BIDSFile` (optionally including metadata)
* `.get_image()`: Returns the file contents as a nibabel image (only works for image files)
* `.get_df()`: Get file contents as a pandas DataFrame (only works for TSV files)
* `.get_metadata()`: Returns a dictionary of all metadata found in associated JSON files
* `.get_associations()`: Returns a list of all files associated with this one in some way

Let's see some of these in action.

In [315]:
# Pick the file at index 55 (the 56th file in the dataset)
bf = layout.get()[55]

# Print it
print(bf)


<BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-188/fmap/sub-188_phasediff.nii.gz'>


#### Print entities for a file

In [316]:
# Print all the entities associated with this file, and their values
print(bf.get_entities())

{'datatype': 'fmap', 'extension': '.nii.gz', 'fmap': 'phasediff', 'subject': '188', 'suffix': 'phasediff'}


#### Print metadata for the file

In [317]:
# Print all the JSON sidecar metadata associated with this image file 
# If you try this with a JSON file instead of an image file, it returns an empty dictionary
bf.get_metadata()

{'AcquisitionMatrixPE': 68,
 'AcquisitionNumber': 1,
 'AcquisitionTime': '14:35:26.635000',
 'BaseResolution': 68,
 'BodyPartExamined': 'BRAIN',
 'ConsistencyInfo': 'N4_VD13A_LATEST_20120616',
 'ConversionSoftware': 'dcm2niix',
 'ConversionSoftwareVersion': 'v1.0.20190410  GCC6.3.0',
 'DeviceSerialNumber': '45424',
 'DwellTime': 2.53e-05,
 'EchoNumber': 2,
 'EchoTime': 0.00738,
 'EchoTime1': 0.00492,
 'EchoTime2': 0.00738,
 'FlipAngle': 60,
 'ImageOrientationPatientDICOM': [1,
  -2.04604e-10,
  -1.4307e-11,
  2.05103e-10,
  0.997564,
  0.0697565],
 'ImageType': ['ORIGINAL', 'PRIMARY', 'P', 'ND'],
 'ImagingFrequency': 123.232,
 'InPlanePhaseEncodingDirectionDICOM': 'ROW',
 'InstitutionAddress': 'North_Warren_Ave_1609_Tucson_Denver_US_85719',
 'InstitutionName': 'University_of_Arizona',
 'InstitutionalDepartmentName': 'Department',
 'IntendedFor': ['func/sub-188_task-nad1_run-01_bold.nii.gz',
  'func/sub-188_task-nad1_run-02_bold.nii.gz',
  'func/sub-188_task-nad1_run-03_bold.nii.gz',
  

#### Print union of metadata and entities

In [318]:
# We can print the union of both of the above in one shot like this
bf.get_entities(metadata='all')

{'AcquisitionMatrixPE': 68,
 'AcquisitionNumber': 1,
 'AcquisitionTime': '14:35:26.635000',
 'BaseResolution': 68,
 'BodyPartExamined': 'BRAIN',
 'ConsistencyInfo': 'N4_VD13A_LATEST_20120616',
 'ConversionSoftware': 'dcm2niix',
 'ConversionSoftwareVersion': 'v1.0.20190410  GCC6.3.0',
 'DeviceSerialNumber': '45424',
 'DwellTime': 2.53e-05,
 'EchoNumber': 2,
 'EchoTime': 0.00738,
 'EchoTime1': 0.00492,
 'EchoTime2': 0.00738,
 'FlipAngle': 60,
 'ImageOrientationPatientDICOM': [1,
  -2.04604e-10,
  -1.4307e-11,
  2.05103e-10,
  0.997564,
  0.0697565],
 'ImageType': ['ORIGINAL', 'PRIMARY', 'P', 'ND'],
 'ImagingFrequency': 123.232,
 'InPlanePhaseEncodingDirectionDICOM': 'ROW',
 'InstitutionAddress': 'North_Warren_Ave_1609_Tucson_Denver_US_85719',
 'InstitutionName': 'University_of_Arizona',
 'InstitutionalDepartmentName': 'Department',
 'IntendedFor': ['func/sub-188_task-nad1_run-01_bold.nii.gz',
  'func/sub-188_task-nad1_run-02_bold.nii.gz',
  'func/sub-188_task-nad1_run-03_bold.nii.gz',
  

#### Get associated files

Here are all the files associated with our target file in some way. Notice how we get back both the JSON sidecar for our target file and the BOLD runs that the fieldmap is intended to correct through SDC (susceptibility distortion correction).

In [321]:
bf.get_associations()

[<BIDSJSONFile filename='/Volumes/Main/Working/IntendedFor/data/sub-188/fmap/sub-188_phasediff.json'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-188/func/sub-188_task-nad1_run-01_bold.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-188/func/sub-188_task-nad1_run-02_bold.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-188/func/sub-188_task-nad1_run-03_bold.nii.gz'>,
 <BIDSImageFile filename='/Volumes/Main/Working/IntendedFor/data/sub-188/func/sub-188_task-nad1_run-04_bold.nii.gz'>]

## Use Synthetic Dataset to create dataframe

In cases where a file has a `.tsv.gz` or `.tsv` extension, it will automatically be created as a `BIDSDataFile`, and we can easily grab the contents as a pandas `DataFrame`:

In [323]:
# Use a different test dataset--one that contains physio recording files
from bids.tests import get_test_data_path

data_path = os.path.join(get_test_data_path(), 'synthetic')
layout2 = BIDSLayout(data_path)

# Get the first physiological recording file
recfile = layout2.get(suffix='physio')[0]

# Get contents as a DataFrame and show the first few rows
df = recfile.get_df()
df.head()

| | onset | respiratory | cardiac |
|---|---|---|---|
| 0 | 0.0 | -0.714844 | -0.262109 |
| 1 | 0.1 | -0.757342 | 0.048933 |
| 2 | 0.2 | -0.796851 | 0.355185 |
| 3 | 0.3 | -0.833215 | 0.626669 |
| 4 | 0.4 | -0.866291 | 0.83681 |


While it would have been easy enough to read the contents of the file ourselves with pandas' `read_csv()` method, notice that in the above example, `get_df()` saved us the trouble of having to read the physiological recording file's metadata, pull out the column names and sampling rate, and add timing information.

Mind you, if we don't *want* the timing information, we can ignore it:

In [15]:
recfile.get_df(include_timing=False).head()

| | respiratory | cardiac |
|---|---|---|
| 0 | -0.757342 | 0.048933 |
| 1 | -0.796851 | 0.355185 |
| 2 | -0.833215 | 0.626669 |
| 3 | -0.866291 | 0.83681 |
| 4 | -0.895948 | 0.965038 |
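
The `onset` values shown earlier advance in steps of 0.1 s, which is what a 10 Hz recording would produce. As a rough sketch of how such a column can be derived by hand, the snippet below computes onsets from a sampling frequency and a start time, the two quantities that a BIDS physio sidecar stores as `SamplingFrequency` and `StartTime`. The values 10.0 and 0.0 are assumptions chosen to match the table, not values read from the dataset, and the function name `onsets` is hypothetical.

```python
def onsets(n_samples, sampling_frequency, start_time=0.0):
    """Onset in seconds of each sample, given the sampling rate in Hz."""
    return [start_time + i / sampling_frequency for i in range(n_samples)]


# First few respiratory samples copied from the table above
respiratory = [-0.714844, -0.757342, -0.796851, -0.833215, -0.866291]

for onset, value in zip(onsets(len(respiratory), sampling_frequency=10.0), respiratory):
    print(f"{onset:.1f}\t{value}")
```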


## Other utilities

### Filename parsing
Say you have a filename, and you want to manually extract BIDS entities from it. The `parse_file_entities` method provides the facility:

In [16]:
path = "/a/fake/path/to/a/BIDS/file/sub-01_run-1_T2w.nii.gz"
layout.parse_file_entities(path)

{'subject': '01', 'run': 1, 'suffix': 'T2w', 'extension': 'nii.gz'}

A version of this utility that is independent of any specific layout is available in `bids.layout` ([doc](https://bids-standard.github.io/pybids/generated/bids.layout.parse_file_entities.html#bids.layout.parse_file_entities)):

In [17]:
from bids.layout import parse_file_entities

path = "/a/fake/path/to/a/BIDS/file/sub-01_run-1_T2w.nii.gz"
parse_file_entities(path)

{'extension': 'nii.gz', 'run': 1, 'subject': '01', 'suffix': 'T2w'}
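
To make the idea behind entity parsing concrete, here is a deliberately simplified sketch that splits a BIDS-style filename on underscores and hyphens. It is not how pybids does it (pybids drives parsing from its JSON configuration files, maps short keys such as `sub` to names such as `subject`, and converts `run` to an integer). The function name `parse_entities` is hypothetical, and this version keeps the short keys and string values as they appear in the filename.

```python
def parse_entities(path):
    """Split a BIDS-style filename into key-value pairs, suffix and extension.

    Simplified illustration only; it assumes a well-formed filename.
    """
    filename = path.rsplit("/", 1)[-1]
    # Everything after the first dot is treated as the extension (e.g. ".nii.gz")
    stem, dot, extension = filename.partition(".")
    # The last underscore-separated chunk is the suffix; the rest are key-value pairs
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    entities["suffix"] = suffix
    entities["extension"] = dot + extension
    return entities


print(parse_entities("/a/fake/path/to/a/BIDS/file/sub-01_run-1_T2w.nii.gz"))
```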

### Path construction
You may want to create valid BIDS filenames for files that are new or hypothetical that would sit within your BIDS project. This is useful when you know what entity values you need to write out to, but don't want to deal with looking up the precise BIDS file-naming syntax. In the example below, imagine we've created a new file containing stimulus presentation information, and we want to save it to a `.tsv.gz` file, per the BIDS naming conventions. All we need to do is define a dictionary with the name components, and `build_path` takes care of the rest (including injecting sub-directories!):

In [18]:
entities = {
    'subject': '01',
    'run': 2,
    'task': 'nback',
    'suffix': 'bold'
}

layout.build_path(entities)

'sub-01/func/sub-01_task-nback_run-2_bold.nii.gz'

You can also use `build_path` in more sophisticated ways—for example, by defining your own set of matching templates that cover cases not supported by BIDS out of the box. For example, suppose you want to create a template for naming a new z-stat file. You could do something like:

In [19]:
# Define the pattern to build out of the components passed in the dictionary
# (no trailing comma after the string, which would turn it into a tuple)
pattern = "sub-{subject}[_ses-{session}]_task-{task}[_acq-{acquisition}][_rec-{reconstruction}][_run-{run}][_echo-{echo}]_{suffix<z>}.nii.gz"

entities = {
    'subject': '01',
    'run': 2,
    'task': 'nback',
    'suffix': 'z'
}

# Notice we pass the new pattern as the second argument
layout.build_path(entities, pattern, validate=False)


'sub-01_task-nback_run-2_z.nii.gz'

Note that in the above example, we set `validate=False` to ensure that the standard BIDS file validator doesn't run (because the pattern we defined isn't actually compliant with the BIDS specification).
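
The pattern syntax used above has two ingredients: `{name}` placeholders that are filled from the entities dictionary, and `[...]` sections that are dropped when the entity inside them is missing. The sketch below re-creates just those two behaviours with regular expressions so the mechanics are easier to see. It is an illustration under simplifying assumptions, not the pybids implementation: the function name `fill_pattern` is hypothetical, and value constraints such as `{suffix<z>}` are not supported, so the pattern here uses a plain `{suffix}`.

```python
import re


def fill_pattern(pattern, entities):
    """Fill {name} placeholders; drop [optional] sections whose entity is missing."""

    def fill(text):
        # Replace each {name} with its value; a missing required entity raises KeyError
        return re.sub(r"\{(\w+)\}", lambda m: str(entities[m.group(1)]), text)

    def optional(match):
        section = match.group(1)
        names = re.findall(r"\{(\w+)\}", section)
        return fill(section) if all(name in entities for name in names) else ""

    # Resolve the bracketed sections first, then fill what remains
    without_optionals = re.sub(r"\[([^\[\]]*)\]", optional, pattern)
    return fill(without_optionals)


pattern = "sub-{subject}[_ses-{session}]_task-{task}[_run-{run}]_{suffix}.nii.gz"
entities = {"subject": "01", "run": 2, "task": "nback", "suffix": "z"}
print(fill_pattern(pattern, entities))
```

Because no `session` is supplied, the `[_ses-{session}]` section disappears, while `[_run-{run}]` is kept and filled.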

The `scope` argument to `get()` specifies which part of the project to look in. By default, valid values are `'bids'` (for the "raw" BIDS project that excludes derivatives) and `'derivatives'` (for all BIDS-derivatives files). You can also pass the names of individual derivatives pipelines (e.g., passing `'fmriprep'` would search only in a `/derivatives/fmriprep` folder). Either a string or a list of strings can be passed.

For example, `layout.get(scope='derivatives', return_type='filename')` returns the filenames of all derivatives files.

### Exporting a `BIDSLayout` to a pandas `DataFrame`
If you want a summary of all the files in your `BIDSLayout`, but don't want to have to iterate over `BIDSFile` objects and extract their entities, you can get a nice bird's-eye view of your dataset using the `to_df()` method.

In [244]:
# Convert the layout to a pandas dataframe
df = layout.to_df()
df.head()

| entity | path | acquisition | datatype | direction | extension | fmap | run | scans | session | subject | suffix | task |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | /Volumes/Main/Working/IntendedFor/data/dataset... | | | | .json | | | | | | description | |
| 1 | /Volumes/Main/Working/IntendedFor/data/sub-078... | | anat | | .json | | | | | 78.0 | T1w | |
| 2 | /Volumes/Main/Working/IntendedFor/data/sub-078... | | anat | | .nii.gz | | | | | 78.0 | T1w | |
| 3 | /Volumes/Main/Working/IntendedFor/data/sub-078... | | anat | | .nii.gz | | | | | 78.0 | defacemask | |
| 4 | /Volumes/Main/Working/IntendedFor/data/sub-078... | AP | dwi | | .bval | | | | | 78.0 | dwi | |


We can also include metadata in the result if we like (which may blow up our `DataFrame` if we have a large dataset). Note that in this case, most of our cells will have missing values.

In [245]:
layout.to_df(metadata=True).head()

| entity | path | AcquisitionMatrixPE | AcquisitionNumber | AcquisitionTime | BandwidthPerPixelPhaseEncode | BaseResolution | BodyPartExamined | BolusDuration | ConsistencyInfo | ConversionSoftware | ... | filename | fmap | operator | randstr | run | scans | session | subject | suffix | task |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | /Volumes/Main/Working/IntendedFor/data/dataset... | | | | | | | | | | ... | | | | | | | | | description | |
| 1 | /Volumes/Main/Working/IntendedFor/data/sub-078... | | | | | | | | | | ... | | | | | | | | 78.0 | T1w | |
| 2 | /Volumes/Main/Working/IntendedFor/data/sub-078... | 240.0 | 1.0 | 15:37:3.697500 | | 256.0 | HEAD | | N4_VD13A_LATEST_20120616 | dcm2niix | ... | | | | | | | | 78.0 | T1w | |
| 3 | /Volumes/Main/Working/IntendedFor/data/sub-078... | | | | | | | | | | ... | | | | | | | | 78.0 | defacemask | |
| 4 | /Volumes/Main/Working/IntendedFor/data/sub-078... | 128.0 | 1.0 | 15:29:39.717500 | 16.622 | 128.0 | HEAD | | N4_VD13A_LATEST_20120616 | dcm2niix | ... | | | | | | | | 78.0 | dwi | |


## Retrieving BIDS variables 
BIDS variables are stored in .tsv files at the run, session, subject, or dataset level. You can retrieve these variables with `layout.get_collections()`. The resulting objects can be converted to dataframes and merged with the layout to associate the variables with corresponding scans.

For example, `layout.get_collections(level='subject', merge=True)` requests all subject-level variable data available anywhere in the BIDS project and merges the results into a single collection, which `to_df()` can then convert to a single `DataFrame` (by default, without merging, we'd get back a separate `BIDSVariableCollection` object for each subject).
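
To give a feel for what "associating variables with scans" means, the sketch below joins a small, made-up `participants.tsv` (the dataset-level file in which BIDS stores subject-level variables, keyed by `participant_id`) with a toy list of scans, using only the standard library. The `age` and `sex` columns, their values, and the scan records are illustrative assumptions, and this is not how pybids builds its variable collections.

```python
import csv
import io

# Made-up participants.tsv content; only participant_id is a fixed BIDS column name here
participants_tsv = "participant_id\tage\tsex\nsub-078\t34\tF\nsub-188\t29\tM\n"

rows = csv.DictReader(io.StringIO(participants_tsv), delimiter="\t")
variables = {row["participant_id"]: row for row in rows}

# Toy scan records standing in for rows of a layout table
scans = [
    {"subject": "078", "suffix": "T1w"},
    {"subject": "188", "suffix": "bold"},
]

# Attach each subject's variables to the scans that belong to that subject
merged = [{**scan, **variables[f"sub-{scan['subject']}"]} for scan in scans]
for row in merged:
    print(row)
```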

## BIDSValidator

`pybids` implicitly imports a `BIDSValidator` class from the separate [`bids-validator`](https://github.com/bids-standard/bids-validator) package. You can use the `BIDSValidator` to determine whether a filepath is a valid BIDS filepath, as well as answering questions about what kind of data it represents. Note, however, that this implementation of the BIDS validator is *not* necessarily up-to-date with the JavaScript version available online. Moreover, the Python validator only tests individual files, and is currently unable to validate entire BIDS datasets. For that, you should use the [online BIDS validator](https://bids-standard.github.io/bids-validator/).

In [248]:
from bids import BIDSValidator

# Note that when using the bids validator, the filepath MUST be relative to the top level bids directory
validator = BIDSValidator()
validator.is_bids('/sub-188/anat/sub-188_T1w.nii.gz')

True

In [249]:
# Can decide if a filepath represents a file that is part of the specification
validator.is_file('/sub-188/anat/sub-188_T1w.json')

True

In [250]:
# Can check if a file is at the top level of the dataset
validator.is_top_level('/dataset_description.json')

True

In [251]:
# or subject (or session) level
validator.is_subject_level('/dataset_description.json')

False

In [252]:
validator.is_session_level('/sub-219/ses-itbs/sub-219_ses-itbs_scans.json')

True

In [253]:
# Can decide if a filepath represents phenotypic data
validator.is_phenotypic('/sub-188/anat/sub-188_T1w.nii.gz')

False