# Introduction to PyBIDS for parsing fUSI-BIDS datasets 

This notebook is adapted from the [PyBIDS tutorial](https://bids-standard.github.io/pybids/index.html#), with a focus on how to parse fUSI-BIDS datasets.

In [1]:
from bids import BIDSLayout

PyBIDS is a tool to query, summarize, and manipulate data that follow the BIDS standard. In this tutorial, we will use the example fUSI-BIDS dataset available for download from the [fUSI-BIDS specification draft](https://docs.google.com/document/d/1W3z01mf1E8cfg_OY7ZGqeUeOKv659jCHQBXavtmT-T8/edit?tab=t.0#heading=h.4k1noo90gelw) to illustrate some of the functionality of PyBIDS.

## 1 Parsing the dataset

At the core of PyBIDS is the `BIDSLayout` class. `BIDSLayout` is a lightweight Python class that represents a BIDS project file tree and provides a variety of helpful methods for querying and manipulating BIDS files. While the `BIDSLayout` initializer accepts a large number of arguments that control how files are indexed and accessed, you will most commonly parse a fUSI-BIDS dataset by passing in three arguments: the dataset root, a custom configuration file (`config`), and the `validate` flag:

In [2]:
# Initialize the layout
layout = BIDSLayout("fusi-bids-examples/datasets/0.0.10", config="fusi_bids.json", validate=False)

# Print some basic information about the layout
layout

BIDS Layout: ...-bids-examples/datasets/0.0.10 | Subjects: 10 | Sessions: 20 | Runs: 0

We set `validate` to `False` because the BIDS validator does not yet implement the fUSI-BIDS specification. Additionally, we set a custom PyBIDS configuration file to define the new `pose` entity and `fus`/`angio` datatypes introduced by the fUSI-BIDS specification.
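The contents of `fusi_bids.json` are not shown in this notebook. As a rough sketch only, a PyBIDS configuration file lists entities by name, each with a regular expression whose first capture group yields the entity value. The fragment below is hypothetical: the `hypothetical_config` dictionary and both regular expressions are assumptions that mimic that layout, not the patterns shipped with the example dataset.

```python
import json
import re

# Hypothetical configuration fragment; the real fusi_bids.json may differ.
hypothetical_config = {
    "name": "fusi_bids",
    "entities": [
        {"name": "pose", "pattern": r"[_/\\]+pose-([a-zA-Z0-9]+)"},
        {"name": "datatype", "pattern": r"[/\\]+(fus|angio)[/\\]+"},
    ],
}

# Relative path of a file that appears later in this notebook.
path = "sub-01/ses-vehicle/fus/sub-01_ses-vehicle_task-awake_pose-02_pwd.nii.gz"

# Apply each pattern; the first capture group is taken as the entity value.
found = {}
for entity in hypothetical_config["entities"]:
    match = re.search(entity["pattern"], path)
    if match:
        found[entity["name"]] = match.group(1)

# Show the fragment as it would look when serialized to JSON.
print(json.dumps(hypothetical_config, indent=2))
print(found)
```

Serializing the dictionary with `json.dumps` gives an idea of what such a file could look like on disk; check the configuration file distributed with the dataset for the authoritative definitions.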

## 2 Querying the dataset

When we initialize a `BIDSLayout`, all of the files and metadata found under the specified root folder are indexed. Once initialization is complete, we can start querying the `BIDSLayout` in various ways. The workhorse method is `get()`. If we call `get()` with no additional arguments, we get back a list of all the BIDS files in our dataset:

In [3]:
all_files = layout.get()

print("There are {} files in the layout.".format(len(all_files)))
print("\nThe first 10 files are:")
all_files[:10]

There are 337 files in the layout.

The first 10 files are:


[<BIDSFile filename='/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/CHANGES.md'>,
 <BIDSJSONFile filename='/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/dataset_description.json'>,
 <BIDSJSONFile filename='/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/participants.json'>,
 <BIDSDataFile filename='/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/participants.tsv'>,
 <BIDSJSONFile filename='/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/pwd.json'>,
 <BIDSFile filename='/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/README.md'>,
 <BIDSImageFile filename='/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/sub-01/ses-treatment/angio/sub-01_ses-treatment_pwd.nii.gz'>,
 <BIDSJSONFile filename='/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-exampl

The returned object is a Python list. By default, each element in the list is a `BIDSFile` object. We discuss the `BIDSFile` object in much more detail below. For now, let’s simplify things and work with just filenames:

In [4]:
layout.get(return_type="filename")[:10]

['/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/CHANGES.md',
 '/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/dataset_description.json',
 '/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/participants.json',
 '/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/participants.tsv',
 '/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/pwd.json',
 '/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/README.md',
 '/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/sub-01/ses-treatment/angio/sub-01_ses-treatment_pwd.nii.gz',
 '/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/sub-01/ses-treatment/fus/sub-01_ses-treatment_task-awake_pose-01_pwd.json',
 '/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/sub-0

## 3 Filtering files by entities

The utility of the `BIDSLayout` would be pretty limited if all we could do was retrieve a list of all files in the dataset. Fortunately, the `get()` method accepts all kinds of arguments that allow us to filter the result based on specified criteria. In fact, we can pass any BIDS-defined entity as a constraint. For example, here’s how we would retrieve all functional acquisitions with task `stim` for session `treatment`:

In [5]:
layout.get(session="treatment", task="stim", extension=".nii.gz", return_type="filename")

['/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/sub-01/ses-treatment/fus/sub-01_ses-treatment_task-stim_acq-bregmaMinus1_pwd.nii.gz',
 '/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/sub-01/ses-treatment/fus/sub-01_ses-treatment_task-stim_acq-bregmaMinus2_pwd.nii.gz',
 '/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/sub-01/ses-treatment/fus/sub-01_ses-treatment_task-stim_acq-bregmaPlus05_pwd.nii.gz',
 '/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/sub-02/ses-treatment/fus/sub-02_ses-treatment_task-stim_acq-bregmaMinus1_pwd.nii.gz',
 '/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/sub-02/ses-treatment/fus/sub-02_ses-treatment_task-stim_acq-bregmaMinus2_pwd.nii.gz',
 '/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/sub-02/ses-treatment/fus/sub-02_ses-treatment_task-stim_acq-bregma

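If you’re wondering what entities you can pass in as filtering arguments, the answer is contained in the JSON configuration files that ship with PyBIDS (in the `bids/layout/config` directory of its repository), together with any custom configuration file passed to `BIDSLayout`. To save you the trouble, here are a few of the most common BIDS entities: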

- `suffix`: The part of a BIDS filename just before the extension (e.g., `pwd`, `bold`, `events`, `physio`, etc.).
- `subject`: The subject label.
- `session`: The session label.
- `run`: The run index.
- `task`: The task name.

PyBIDS should be aware of all entities that are defined in the BIDS specification. Since we instantiated `BIDSLayout` using a custom fUSI-BIDS configuration file, it is also aware of the `pose` entity!
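To make the notion of an entity concrete, here is a minimal sketch of how the key-value pairs, suffix, and extension can be read off a BIDS-style filename. The helper `parse_bids_filename` is hypothetical and much simpler than what PyBIDS does: it returns the short keys as written in the filename (`sub`, `ses`, `acq`), whereas `get()` expects the long names (`subject`, `session`, `acquisition`).

```python
def parse_bids_filename(filename):
    """Split a BIDS-style filename into key-value entities, suffix, and extension."""
    # The extension is everything from the first dot onward (e.g. ".nii.gz").
    stem, dot, ext = filename.partition(".")
    parts = stem.split("_")
    entities = {}
    # Every part except the last is a "key-value" pair.
    for part in parts[:-1]:
        key, _, value = part.partition("-")
        entities[key] = value
    # The last part of the stem is the suffix.
    entities["suffix"] = parts[-1]
    entities["extension"] = dot + ext
    return entities


name = "sub-01_ses-treatment_task-stim_acq-bregmaPlus05_pwd.nii.gz"
parsed = parse_bids_filename(name)
print(parsed)
```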

## 4 Filtering by metadata

All of the entities listed above are found in the names of BIDS files. But sometimes we want to search for files based not just on their names, but also on metadata defined (per the BIDS specification) in JSON files. Fortunately for us, when we initialize a `BIDSLayout`, all metadata files associated with BIDS files are automatically indexed. This means we can pass any key that occurs in any JSON file in our project as an argument to `get()`. We can combine these keys with any number of core BIDS entities (like `subject`, `run`, etc.).

For example, say we want to retrieve all files where 

1. the value of `UltrafastSamplingFrequency` (a metadata key) is 500, 
2. the acquisition label is `bregmaPlus05`, and
3. the subject is `01` or `02`. 

Here’s how we can do that:

In [6]:
layout.get(
    subject=("01", "02"), UltrafastSamplingFrequency=500, acquisition="bregmaPlus05"
)

[<BIDSImageFile filename='/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/sub-01/ses-treatment/fus/sub-01_ses-treatment_task-stim_acq-bregmaPlus05_pwd.nii.gz'>,
 <BIDSImageFile filename='/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/sub-01/ses-vehicle/fus/sub-01_ses-vehicle_task-stim_acq-bregmaPlus05_pwd.nii.gz'>,
 <BIDSImageFile filename='/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/sub-02/ses-treatment/fus/sub-02_ses-treatment_task-stim_acq-bregmaPlus05_pwd.nii.gz'>,
 <BIDSImageFile filename='/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/sub-02/ses-vehicle/fus/sub-02_ses-vehicle_task-stim_acq-bregmaPlus05_pwd.nii.gz'>]

Notice that we passed a tuple in for `subject` rather than just a string. This principle applies to all filters: you can always pass in a list or tuple instead of a single value, and this will be interpreted as a logical disjunction (i.e., a file must match any one of the provided values). Different filters, in contrast, are combined conjunctively: a file must satisfy all of them.
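The filtering semantics can be illustrated with a small standalone sketch. It does not use PyBIDS: the `matches` helper and the toy `records` are hypothetical, and only mirror the rule that several values for one filter are alternatives while separate filters must all hold.

```python
def matches(record, **filters):
    """Return True if the record satisfies every filter.

    A list or tuple filter value matches when the record value equals any item.
    """
    for key, wanted in filters.items():
        allowed = wanted if isinstance(wanted, (list, tuple)) else [wanted]
        if record.get(key) not in allowed:
            return False
    return True


# Toy records mixing filename entities and one metadata key (illustrative values).
records = [
    {"subject": "01", "acquisition": "bregmaPlus05", "UltrafastSamplingFrequency": 500},
    {"subject": "02", "acquisition": "bregmaMinus1", "UltrafastSamplingFrequency": 500},
    {"subject": "03", "acquisition": "bregmaPlus05", "UltrafastSamplingFrequency": 500},
]

selected = [
    r
    for r in records
    if matches(
        r,
        subject=("01", "02"),
        UltrafastSamplingFrequency=500,
        acquisition="bregmaPlus05",
    )
]
print([r["subject"] for r in selected])
```

Only the first record is kept: the second fails the `acquisition` filter and the third fails the `subject` filter.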

## 5 Other `get()` options

The `get()` method has a number of other useful arguments that control its behavior. We won’t discuss these in detail here, but briefly, here are a couple worth knowing about:

- `regex_search`: If you set this to `True`, string filter argument values will be interpreted as regular expressions.
- `scope`: If your BIDS dataset contains BIDS-derivatives sub-datasets, you can specify the scope (e.g., `derivatives`, or a BIDS-Derivatives pipeline name) of the search space.
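As a rough illustration of what `regex_search` changes, the sketch below compares exact matching with regular-expression matching on the acquisition labels seen earlier. The `select` function is a hypothetical stand-in, not PyBIDS code, and it assumes that a regular expression may match anywhere in the value.

```python
import re

# Acquisition labels that appear in the example dataset.
acquisitions = ["bregmaMinus1", "bregmaMinus2", "bregmaPlus05"]


def select(values, query, regex_search=False):
    """Toy stand-in for a string filter; not PyBIDS code."""
    if regex_search:
        # Treat the query as a regular expression.
        return [v for v in values if re.search(query, v)]
    # Otherwise require the value to equal the query exactly.
    return [v for v in values if v == query]


exact = select(acquisitions, "bregmaMinus")
by_regex = select(acquisitions, "bregmaMinus", regex_search=True)
print(exact)
print(by_regex)
```

With exact matching the query `bregmaMinus` selects nothing, whereas the regular-expression variant selects both `bregmaMinus1` and `bregmaMinus2`.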



## 6 The `BIDSFile`

When you call `get()` on a `BIDSLayout`, the default returned values are objects of class `BIDSFile`. A `BIDSFile` is a lightweight container for individual files in a BIDS dataset. It provides easy access to a variety of useful attributes and methods. Let’s take a closer look. First, let’s pick an arbitrary file from our existing layout.

In [7]:
bf = layout.get()[26]
bf

<BIDSImageFile filename='/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/sub-01/ses-vehicle/fus/sub-01_ses-vehicle_task-awake_pose-02_pwd.nii.gz'>

Here are some of the attributes and methods available to us in a `BIDSFile` (note that some of these are only available for certain subclasses of `BIDSFile`; e.g., you can’t call `get_image()` on a `BIDSFile` that doesn’t correspond to an image file!):

- `path`: The full path of the associated file
- `filename`: The associated file’s filename (without directory)
- `dirname`: The directory containing the file
- `get_entities()`: Returns information about entities associated with this `BIDSFile` (optionally including metadata)
- `get_image()`: Returns the file contents as a NiBabel image (only works for image files)
- `get_df()`: Get file contents as a pandas DataFrame (only works for TSV files)
- `get_metadata()`: Returns a dictionary of all metadata found in associated JSON files
- `get_associations()`: Returns a list of all files associated with this one in some way

Let’s see some of these in action.

In [8]:
# Print all the entities associated with this file, and their values
bf.get_entities()

{'datatype': 'fus',
 'extension': '.nii.gz',
 'pose': '02',
 'session': 'vehicle',
 'subject': '01',
 'suffix': 'pwd',
 'task': 'awake'}

In [9]:
# Print all the metadata associated with this file
bf.get_metadata()

{'ClutterFilterWindowDuration': 400,
 'ClutterFilters': [{'FilterType': 'Butterworth low-pass',
   'HighThreshold': 25},
  {'FilterType': 'Fixed-threshold SVD',
   'LowThreshold': 60,
   'HighThreshold': 200}],
 'DelayAfterTrigger': 0.6,
 'DeviceSerialNumber': 'X23HFB12K8',
 'Manufacturer': 'Iconeus',
 'ManufacturersModelName': 'Iconeus One',
 'MaximalDepth': 10,
 'PlaneWaveAngles': [-10, -8, -6, -4, -2, 0, 2, 4, 6, 8, 10],
 'PowerDopplerIntegrationDuration': 400,
 'ProbeCentralFrequency': 15.625,
 'ProbeElevationAperture': 1.5,
 'ProbeElevationFocus': 8,
 'ProbeModel': 'IcoPrime',
 'ProbeNumberOfElements': 128,
 'ProbePitch': 0.11,
 'ProbeRadiusOfCurvature': 0,
 'ProbeType': 'linear',
 'ProbeVoltage': 25,
 'RepetitionTime': 2.4,
 'SequenceName': 'default',
 'SoftwareVersions': '1.5.0',
 'StationName': 'Machine01',
 'TaskDescription': 'Awake head-fixed state.',
 'TaskName': 'awake',
 'UltrafastSamplingFrequency': 500,
 'UltrasoundPulseRepetitionFrequency': 5500}
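The file listing earlier shows both a dataset-level `pwd.json` and per-file JSON sidecars. Under the BIDS inheritance principle, metadata from such files are combined, with values from the more specific file taking precedence. The sketch below illustrates that merge with plain dictionaries; how the keys are split between the two files here is an assumption, and the `"placeholder"` value is made up to show an override.

```python
# Hypothetical dataset-level metadata (e.g. from a top-level pwd.json).
dataset_level = {
    "Manufacturer": "Iconeus",
    "ProbeVoltage": 25,
    "TaskName": "placeholder",  # made-up value, overridden below
}

# Hypothetical per-file sidecar metadata.
sidecar = {
    "TaskName": "awake",
    "TaskDescription": "Awake head-fixed state.",
}

# Later dictionaries win, so the sidecar overrides the dataset-level value.
merged = {**dataset_level, **sidecar}
print(merged)
```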

In [10]:
# We can get the union of both of the above in one shot like this
bf.get_entities(metadata='all')

{'ClutterFilterWindowDuration': 400,
 'ClutterFilters': [{'FilterType': 'Butterworth low-pass',
   'HighThreshold': 25},
  {'FilterType': 'Fixed-threshold SVD',
   'LowThreshold': 60,
   'HighThreshold': 200}],
 'DelayAfterTrigger': 0.6,
 'DeviceSerialNumber': 'X23HFB12K8',
 'Manufacturer': 'Iconeus',
 'ManufacturersModelName': 'Iconeus One',
 'MaximalDepth': 10,
 'PlaneWaveAngles': [-10, -8, -6, -4, -2, 0, 2, 4, 6, 8, 10],
 'PowerDopplerIntegrationDuration': 400,
 'ProbeCentralFrequency': 15.625,
 'ProbeElevationAperture': 1.5,
 'ProbeElevationFocus': 8,
 'ProbeModel': 'IcoPrime',
 'ProbeNumberOfElements': 128,
 'ProbePitch': 0.11,
 'ProbeRadiusOfCurvature': 0,
 'ProbeType': 'linear',
 'ProbeVoltage': 25,
 'RepetitionTime': 2.4,
 'SequenceName': 'default',
 'SoftwareVersions': '1.5.0',
 'StationName': 'Machine01',
 'TaskDescription': 'Awake head-fixed state.',
 'TaskName': 'awake',
 'UltrafastSamplingFrequency': 500,
 'UltrasoundPulseRepetitionFrequency': 5500,
 'datatype': 'fus',
 'e

In [11]:
bf.get_associations()

[<BIDSJSONFile filename='/home/sdiebolt/Documents/Work/fusi-pybids-demo/fusi-bids-examples/datasets/0.0.10/sub-01/ses-vehicle/fus/sub-01_ses-vehicle_task-awake_pose-02_pwd.json'>]
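The association above pairs the image with the JSON sidecar that shares its name up to the extension. As a minimal sketch of that naming convention (PyBIDS determines associations during indexing using its own rules; the `sidecar_name` helper is hypothetical):

```python
def sidecar_name(image_name):
    """Return the JSON sidecar filename that shares the image's stem."""
    # Drop everything from the first dot onward (".nii.gz"), then add ".json".
    stem = image_name.split(".", 1)[0]
    return stem + ".json"


image = "sub-01_ses-vehicle_task-awake_pose-02_pwd.nii.gz"
print(sidecar_name(image))
```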

## 7 Other utilities

If you want to learn more about PyBIDS, e.g. how to build paths from entities or load derivatives, check out the [PyBIDS tutorial](https://bids-standard.github.io/pybids/examples/pybids_tutorial.html).