bidsify - converts your (raw) data to the BIDS-format
This package offers a tool to convert your raw (f)MRI data to the "Brain Imaging Data Structuce" (BIDS) format. Using only a simple (json or yaml) config-file, it renames, reformats, and restructures your files such that it fits the BIDS naming scheme and conforms to file-formats specified by BIDS. After using
bidsify, you can run your data through BIDS-compatible analysis/preprocessing pipelines such as fmriprep
and mriqc package.
Currently, we use
bidsify at the Spinoza Centre for Neuroimaging (location REC) to convert data to BIDS after each scan-session. We automated this process, including automatic preprocessing and quality control, using another package, nitools (which essentially "glues together"
This package was originally developed to handle MRI-data from Philips scanners, which are traditionally exported
in the "PAR/REC" format. Currently,
bidsify also supports Philips (enhanced) DICOM (
DICOMDIR format) and Siemens DICOM (
.dcm extension), but the latter has not been fully tested yet!
bidsify is still very much in development, so there are probably still some bugs for data
that differs from our standard format (at the Spinoza Centre in Amsterdam) and the API might change
in the future. If you encounter any issues, please submit an issue or (better yet), submit a pull-request
with your proposed solution!
bidsify & dependencies
This package can be installed using
$ pip install bidsify
To get the "bleeding edge" version, you can install the master branch from github:
$ pip install git+https://github.com/spinoza-rec/bidsify.git@master
bidsify depends on the following Python packages:
- joblib (for parallelization)
Moreover, if you want to use the defacing option (i.e., removing facial features from anatomical images), make sure you have FSL installed, as well as the pydeface Python package. Also, to enable validating the BIDS-conversion process,(i.e., running
bidsify with the
-v flag), make sure to install bids-validator.
Lastly, if you want to use the Docker interface (i.e., running
bidsify with the -D flag), which obviates the need for installing dcm2niix/FSL/bids-validator, make sure to install Docker and make sure your user account has permission to run Docker (see below).
The current version (master branch) allows you to run
bidsify from docker, so you don't
have to install all the (large) dependencies (FSL, pydeface, dcm2niix, bids-validator, etc.). To do so,
you need to do the following.
- Install Docker (if you haven't already) and make sure you have permission to run Docker;
- Pull the Docker image:
docker pull lukassnoek/bidsify:0.x.x(fill in the latest version at the x.x);
- Run bidsify with the -D flag (e.g.,
bidsify -c /home/user/config.yml -d /home/user/data -D)
Now you can use
bidsify even without having FSL, dcm2niix, and other dependencies installed!
(You do need to install
bidsify itself though.)
How does it work?
After installing, the
bidsify command can be called as follows:
$ bidsify [-c config_file] [-d path_to_data_directory] [-o output_directory] [-v] [-D]
-c flag defaults to
config.yml in the current working directory.
-d flag defaults to the current working directory.
-o flag defaults to the parent-directory of the data-directory.
-v flag calls bids-validator after BIDS-conversion (optional).
-D flag runs
bidsify from Docker (recommended; see "Docker" section above).
For example, if you would call the following command ...
$ bidsify -c /home/user/data/config.yml -d /home/user/data
... your bidsified data will be in the following location:
/home/user ├── data | ├── config.yml | ├── s01 | └── s02 | └── bids ├── dataset_description.json ├── participants.tsv ├── sub-01 └── sub-02
This package aims to take in any MRI-dataset and convert it to BIDS using information from the
config-file provided by the user. Obviously,
bidsify cannot handle all types of scans/data,
but it can process most of the default scans/files we use at our MRI centre (Spinoza Centre), including
- Standard (gradient-echo) EPI scans, both multiband and sequential
- Standard (spin-echo) DWI scans
- "Pepolar" (gradient-echo) EPI scans (also called "topup")
- B0-based fieldmap scans (1 phase-difference + 1 magnitude image)
- T1-weighted and T2-weighted scans
bidsify can handle both PAR/REC and DICOM files. Moreover, in the future we want to enable processing of:
- Philips physiology-files ("SCANPHYSLOG" files; WIP, not functional yet)
In terms of "structure", this package allows the following "types" of datasets:
- Multi-subject, multi-session datasets
The config file
bidsify only needs a config-file in either the json or YAML format. This file should contain
information that can be used to rename and convert the raw files.
The config file contains a couple of sections, which are explained below (we'll use the YAML format).
The first (top-level) section (or "attribute" in JSON/YAML-lingo) in the file is the "options" section. An example of this section could be:
options: mri_ext: PAR # alternatives: DICOM, dcm, nifti debug: False n_cores: -1 subject_stem: sub deface: True spinoza_data: True out_dir: bids
No options need to be set explicitly as they all have sensible defaults. The attribute-value pairs mean the following:
mri_type: filetype of MRI-scans (PAR, dcm, DICOM, nifti; default: PAR)
n_cores: how many CPUs to use during conversion (default: -1, all CPUs)
debug: whether to print extra output for debugging (default: False)
subject_stem: prefix for subject-directories, e.g. "subject" in "subject-001" (default: sub)
deface: whether to deface the data (default: True, takes substantially longer though)
spinoza_data: whether data is from the Spinoza centre (default: False)
out_dir: name of directory to save results to (default: bids), relative to project-root.
Note that with respect to DICOM files, the
mri_type can be set to
DICOM (referring to Philips [enhanced] DICOM files) or
dcm (referring to Siemens DICOM files with the extension
The BIDS-format specifies the naming and format of several types of MRI(-related) filetypes.
These filetypes have specific suffixes, which are appended to the filenames in the renaming
process handled by
bidsify. The "mappings" section in the config is meant to
bidsify what filetype can be identified by which "key". Thus, the mappings
section consists of "filetype": "identifier" pairs. Basically, if BIDS requires a
specific suffix for a filetype, you need to specify that here. For example, a standard
dataset with several BOLD-fMRI files, a T1, and physiological recordings could have
a mappings section like this:
options: # ............. # mappings: bold: _func T1w: 3DT1 dwi: DWI physio: ppuresp events: log phasediff: _ph magnitude: _mag epi: topup T2w: T2w
Note that every file should belong to one, and only one, file-type! In other words,
bidsify should be able to figure out what kind of file it's dealing with from the filename. For example, if you have a file named
my_mri_file.PAR and you have configured the mappings as in the example above,
bidsify won't be able to figure out what file-type it's dealing with (a
bold file? A
T1w file?), because the filename does not contain any of the mappings (e.g.,
Moreover, the filename should not contain more than one file-type identifier! Suppose you have a file named
workingmemory_func_ppuresp.nii.gz; with the above mappings,
bidsify would conclude that it's either a
bold file (because the name contains
_func) OR a
physio file (because the name contains
ppuresp). As such,
bidsify is going to skip converting/renaming this file and move it to the unallocated directory. In summary: files should contain one, and only one, identifier (such as
_func) mapping to a particular file-type (e.g.,
Also, check the BIDS-specification for all filetypes supported by the format.
At the same (hierarchical) level as the "mappings" and "options" sections, a section with the name "metadata" can be optionally specified. This attribute may contain an arbitrary amount of attribute-value pairs which will be appended to each JSON-metadata file during the conversion. These are thus "dataset-general" metadata parameters. For example, you could specify the data of conversion here, if you'd like:
options: # some options mappings: # some mappings metadata: MagneticFieldStrength: 3 ParallelAcquisitionTechnique: SENSE InstitutionName: Spinoza Centre for Neuroimaging, location REC
mappings, and (optionally) the
the specifications for the four general "BIDS-datatypes" -
are listed in separate sections.
Each section, like
func, can contain multiple sub-sections referring to different scans
for that datatype. For example, you could have two different functional runs
with each a different task ("workingmemory" and "nback"). In that case, the "func"
section could look like:
options: # some options mappings: # some mappings func: wm-task: id: wmtask task: workingmemory nback-task: id: nbacktask task: nback
The exact naming of the "attributes" (here:
nback-task) of the sub-sections
do not matter, but the subsequent key-value pairs do matter. You always need to set the
key, which is used to identify the files that belong to this particular task. Any key-value pair
id key-value pair are append to the renamed filename along the BIDS-format.
For example, suppose you have a raw file
sub-001_wmtask.PAR. With the above config-file, this file
will be renamed into
As discussed, any key-value pair besides
id will be appended (in the format "key-value") to the
filename during the renaming-process. Imagine, for example, that you have only one task - "nback" - but
you acquired four runs of it per subject, of which the first two were acquired with a sequential acquisition protocol,
but the last two with a multiband protocol (e.g. if you'd want to do some methodological comparison).
The config-file should, in that case, look like:
options: # some options mappings: # some mappings func: nback-task1: id: nback1 task: nback run: 1 acq: sequential nback-task2: id: nback1 task: nback run: 2 acq: sequential nback-task3: id: nback3 task: nback run: 3 acq: multiband nback-task4: id: nback4 task: nback run: 4 acq: multiband
bidsify will then create four files (assuming that they can be "found" using their corresponding ``id``s):
The same logic can be applied to the "dwi", "anat", and "fmap" sections. For example, if you would have two T1-weighted structural scans, the "anat" section could look like:
options: # some options mappings: # some mappings anat: firstT1: id: 3DT1_1 run: 1 secondT1: id: 3DT1_2 run: 2
Importantly, any UNIX-style wildcard (e.g. *, ?, and [a,A,1-9]) can be used in the
id values in these sections!
Lastly, apart from the different elements (such as
nback-task1 in the previous example),
each datatype-section (
dwi) also may include a
metadata section, similar to the "toplevel"
metadata section. This field may
include key-value pairs that will be appended to each JSON-file within that
datatype. This is especially nice if you'd want to add metadata that is needed for
specific preprocessing/analysis pipelines that are based on the BIDS-format.
For example, the fmriprep package provides
preprocessing pipelines for BIDS-datasets, but sometimes need specific metadata.
For example, for each BOLD-fMRI file, it needs a field
EffectiveEchoSpacing in the
corresponding JSON-file, and for B0-files (one phasediff, one magnitude image) it needs
EchoTime2. To include those metadata fields in the
corresponding JSON-files, just include a
metadata field under the appropriate
datatype section. For example, to do so for the previous examples:
func: metadata: EffectiveEchoSpacing: 0.00365 PhaseEncodingDirection: "j" nback: id: nback task: nback fmap: metadata: EchoTime1: 0.003 EchoTime2: 0.008 B0: id: B0
How to use
After installing this package, the
bidsify command should be available.
This command assumes a specific organization of your directory with raw data.
Below, I outlined the assumed structure for a simple dataset with one BOLD run and one T1-weighted scan across
/home/user/data/ ├── config.yml ├── sub-01 │ ├── ses-1 │ │ ├── boldrun1.PAR │ │ ├── boldrun1.REC │ │ ├── T1.PAR │ │ └── T1.REC │ └── ses-2 │ ├── boldrun1.PAR │ ├── boldrun1.REC │ ├── T1.PAR │ └── T1.REC └── sub-02 ├── ses-1 │ ├── boldrun1.PAR │ ├── boldrun1.REC │ ├── T1.PAR │ └── T1.REC └── ses-2 ├── boldrun1.PAR ├── boldrun1.REC ├── T1.PAR └── T1.REC
(If you have DICOM-files with the
.dcm extension, just replace the PAR/REC files with a single dcm file.)
So all raw files should be in a single directory, which can be the subject-directory or, optionally, a session-directory. Note: the session directory must be named "ses-<something>".