In [1]:
import os
import pandas as pd
from cedalion.io import bids
import snirf2bids as s2b

## Convert an fNIRS dataset to BIDS

This notebook automates the conversion of an fNIRS dataset into a BIDS-compliant format. 
To begin, you'll need:

1. Dataset Path: The folder containing the raw dataset.
2. Destination Path: The folder where the BIDS-compliant files will be saved.
3. Mapping CSV File: A CSV file that defines the dataset folder structure and includes the necessary details for constructing the BIDS structure.
4. Optional Metadata: Any additional metadata you want to include, which will be saved in the dataset_description.json file.

In [2]:
dataset_path = '/Users/shakiba/Downloads/snirf2bids_data/VFC High Density 2'
destination_path = '/Users/shakiba/Downloads/snirf2bids_data/VFC High Density Bids'
mapping_df_path = '/Users/shakiba/Downloads/snirf2bids_data/VFC High Density 2/snirf2BIDS_mapping.csv'
extra_meta_data_path = None

The `mapping_df_path` is the path to your mapping CSV file.
If you don’t already have a mapping CSV file, you can generate one using the scripts/parse_dataset.py script. After generating the CSV file, you might need to manually edit it to include additional information or make adjustments as required.

A valid mapping CSV must include all SNIRF files in your dataset, along with the following details for each file:

- Subject: The identifier for the participant.
- Session (optional): The session identifier, if applicable.
- Task: The task name or label.
- Run (optional): The run number, if applicable.
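
A quick sanity check can catch an incomplete mapping before any files are copied. The sketch below is only an illustration: `check_mapping` is a hypothetical helper (not part of cedalion), and the column name `task` is an assumption; `current_name` and `sub` are the names used later in this notebook.

```python
import pandas as pd

# Assumed column names; adjust them to match your mapping CSV.
REQUIRED_COLUMNS = ["current_name", "sub", "task"]


def check_mapping(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable problems found in a mapping table."""
    problems = []
    for col in REQUIRED_COLUMNS:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif df[col].isna().any() or (df[col].astype(str).str.strip() == "").any():
            problems.append(f"empty values in column: {col}")
    return problems


# Toy mapping table with one incomplete row (no subject for the second file).
example = pd.DataFrame(
    {
        "current_name": ["a/rec1.snirf", "a/rec2.snirf"],
        "sub": ["01", None],
        "task": ["rest", "rest"],
    }
)
print(check_mapping(example))
```

An empty list means every required column is present and filled in.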

In [None]:
mapping_df = pd.read_csv(mapping_df_path, dtype=str)
mapping_df.head(10)

The mapping table serves as a key component for organizing and processing your dataset. 

The `ses`, `run`, and `acq` columns are optional and can be set to None if not applicable. The `current_name` column contains the path to the SNIRF files in your dataset. Since we will need the base filenames of the SNIRF files for further processing, an additional column will be created to store just the base filename.

In [None]:
mapping_df["filename_org"] = mapping_df["current_name"].apply(
    lambda x: os.path.basename(x))

mapping_df.head(10)

### Looking for possible *_scans.tsv files

To ensure no important information (e.g., acquisition time) from the original dataset is lost, we will:

- Search Subdirectories: Traverse through all subdirectories within the dataset.
- Locate Existing Scan Files: Search for all *_scans.tsv files in the dataset.
- Integrate into Mapping Table: Extract the relevant information from these files and add it to our mapping table.

This approach ensures that any details, such as acquisition time, are retained and incorporated into the BIDS-compliant structure.
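
The search described above can be sketched with `os.walk`. This is a minimal illustration, not the implementation of `bids.search_for_acq_time`: the helper name `collect_scan_tables` is hypothetical, and it assumes each scans file has `filename` and `acq_time` columns and follows the BIDS naming `*_scans.tsv`.

```python
import os
import tempfile

import pandas as pd


def collect_scan_tables(root: str, suffix: str = "_scans.tsv") -> pd.DataFrame:
    """Walk `root` and concatenate every TSV file whose name ends with `suffix`."""
    tables = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                path = os.path.join(dirpath, name)
                tables.append(pd.read_csv(path, sep="\t", dtype=str))
    if not tables:
        # No scan files found: return an empty table with the expected columns.
        return pd.DataFrame(columns=["filename_org", "acq_time"])
    found = pd.concat(tables, ignore_index=True)
    # Reduce `filename` to its base name so it can be merged on `filename_org`.
    found["filename_org"] = found["filename"].apply(os.path.basename)
    return found[["filename_org", "acq_time"]]


# Demonstration on a throwaway directory containing one scans file.
with tempfile.TemporaryDirectory() as tmp:
    sub_dir = os.path.join(tmp, "sub-01")
    os.makedirs(sub_dir)
    pd.DataFrame(
        {"filename": ["nirs/rec1.snirf"], "acq_time": ["2024-01-01T10:00:00"]}
    ).to_csv(os.path.join(sub_dir, "sub-01_scans.tsv"), sep="\t", index=False)
    demo_scans = collect_scan_tables(tmp)

print(demo_scans)
```

Because the later merge uses `how="left"`, recordings without a matching scan entry simply keep a missing `acq_time`.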

In [None]:
scan_df = bids.search_for_acq_time(dataset_path)
mapping_df = pd.merge(mapping_df, scan_df, on="filename_org", how="left")

mapping_df.head(10)

The `acq_time` information is retrieved from the original dataset's *_scans.tsv files and integrated into the mapping table.

### Looking for possible *_sessions.tsv files

As with the *_scans.tsv files, we search the dataset path for *_sessions.tsv files to capture session-level metadata, such as acquisition times. Any relevant information from these files is added to the mapping table so that all session details are preserved.

In [None]:
session_df = bids.search_for_sessions_acq_time(dataset_path)
mapping_df = pd.merge(mapping_df, session_df, on=["sub", "ses"], how="left")

mapping_df.head(10)

### Create BIDS Folder Structure

The goal of this section is to rename the SNIRF files according to the BIDS naming convention and place them in the appropriate directory under `destination_path`, following the BIDS folder structure.

Steps:
1. Generate New Filenames: Create BIDS-compliant filenames for all SNIRF records.
2. Determine File Locations: Identify the appropriate locations for these files within the BIDS folder hierarchy.

This process ensures that the dataset adheres to BIDS standards for organization and naming.

In [None]:
mapping_df[["bids_name", "parent_path"]] = mapping_df.apply(
    bids.create_bids_standard_filenames, axis=1, result_type='expand')

mapping_df.head(10)

To facilitate proper organization:

- `parent_path`: Added to the mapping dataframe to define the location of each SNIRF file within `destination_path`.
- `bids_name`: Specifies the new BIDS-compliant name for each file.

In the next cell, we copy each file to its designated `parent_path` and rename it to its `bids_name`.
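
To make the naming rule concrete, here is a small sketch of how a BIDS name and folder can be assembled from the mapping columns. It is not necessarily how `bids.create_bids_standard_filenames` works: `sketch_bids_name` is a hypothetical helper, the column name `task` is an assumption, and the folder is joined with `/` purely for readability.

```python
import pandas as pd


def _is_set(value) -> bool:
    """True if an optional entity holds a usable value."""
    return value is not None and not pd.isna(value) and str(value).strip() != ""


def sketch_bids_name(row: pd.Series) -> tuple[str, str]:
    """Return (bids_name, parent_path) for one mapping row."""
    # BIDS orders these entities as sub, ses, task, acq, run.
    parts = [f"sub-{row['sub']}"]
    if _is_set(row.get("ses")):
        parts.append(f"ses-{row['ses']}")
    parts.append(f"task-{row['task']}")
    if _is_set(row.get("acq")):
        parts.append(f"acq-{row['acq']}")
    if _is_set(row.get("run")):
        parts.append(f"run-{row['run']}")
    bids_name = "_".join(parts) + "_nirs.snirf"

    # Recordings live in sub-<label>/[ses-<label>/]nirs
    folders = [f"sub-{row['sub']}"]
    if _is_set(row.get("ses")):
        folders.append(f"ses-{row['ses']}")
    folders.append("nirs")
    return bids_name, "/".join(folders)


with_session = pd.Series({"sub": "01", "ses": "1", "task": "rest", "run": None, "acq": None})
without_session = pd.Series({"sub": "02", "ses": None, "task": "rest", "run": "2", "acq": None})
print(sketch_bids_name(with_session))
print(sketch_bids_name(without_session))
```

Optional entities that are empty or missing are simply left out of both the filename and the folder path.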

In [114]:
_ = mapping_df.apply(bids.copy_rename_snirf, axis=1, args=(dataset_path, destination_path))

### Create BIDS-specific files (e.g., _coordsystem.json)

In this step, we use the snirf2bids Python package to generate the necessary .tsv and .json files for the BIDS structure.

For every record, the following files will be created:
1. *_coordsystem.json
2. *_optodes.json
3. *_optodes.tsv
4. *_channels.tsv
5. *_events.json
6. *_events.tsv
7. *_nirs.json

These files are essential for ensuring the dataset adheres to BIDS standards.

In [115]:
s2b.snirf2bids_recurse(destination_path)

### Create _scans.tsv Files

Now, we proceed to create scan files for all subjects and sessions. Previously, we searched the original dataset path for any provided scan information, which will now be incorporated into the BIDS structure.
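
As an illustration of what such a file contains, the sketch below writes a BIDS scans table (named `*_scans.tsv` in the BIDS specification) with a `filename` column, relative to the folder holding the scans file, and an `acq_time` column. `write_scans_tsv` is a hypothetical helper and does not reproduce `bids.create_scan_files`.

```python
import os
import tempfile

import pandas as pd


def write_scans_tsv(group: pd.DataFrame, session_dir: str, prefix: str) -> str:
    """Write <prefix>_scans.tsv into `session_dir` and return its path."""
    scans = pd.DataFrame(
        {
            # Paths are relative to the folder that holds the scans file.
            "filename": "nirs/" + group["bids_name"],
            "acq_time": group["acq_time"],
        }
    )
    os.makedirs(session_dir, exist_ok=True)
    out_path = os.path.join(session_dir, f"{prefix}_scans.tsv")
    # BIDS tabular files mark missing values as "n/a".
    scans.to_csv(out_path, sep="\t", index=False, na_rep="n/a")
    return out_path


demo = pd.DataFrame({"bids_name": ["sub-01_task-rest_nirs.snirf"], "acq_time": [None]})
with tempfile.TemporaryDirectory() as tmp:
    path = write_scans_tsv(demo, os.path.join(tmp, "sub-01"), "sub-01")
    with open(path) as fh:
        scans_text = fh.read()

print(scans_text)
```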

In [None]:
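# Copy the slice so that filling `ses` neither warns about nor modifies mapping_df.
scan_df = mapping_df[["sub", "ses", "bids_name", "acq_time"]].copy()
# groupby drops rows whose key is NaN, so give session-less records a placeholder.
scan_df["ses"] = scan_df["ses"].fillna("Unknown")
scan_df = scan_df.groupby(["sub", "ses"])
scan_df.apply(lambda group: bids.create_scan_files(group, destination_path))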

### Create _sessions.tsv Files

The next step is to create session files for all subjects. As with the scan files, we previously searched the original dataset path for any session information, which will now be used to create the corresponding BIDS session files.

In [None]:
session_df = mapping_df[["sub", "ses", "ses_acq_time"]]
session_df = session_df.groupby(["sub"])
session_df.apply(lambda group: bids.create_session_files(group, destination_path))

### Create participants.tsv and participants.json files

In this step, we gather all available participant information from the original dataset. If any participant details are provided, they will be incorporated into the BIDS structure.

Additionally, we create a template for the participants.json file with predefined columns, including:

- species
- age
- sex
- handedness

Each of these fields will include descriptive templates to ensure consistency in the BIDS-compliant structure.

In [118]:
bids.create_participants_tsv(dataset_path, destination_path, mapping_df)
bids.create_participants_json(dataset_path, destination_path)

### Create dataset description file

To create the dataset_description.json file, we follow these steps:

1. Search for an existing dataset_description.json in the dataset path and retain the provided information.
2. If extra_meta_data_path is specified, add the additional metadata about the dataset.
3. If neither dataset_description.json nor extra metadata is provided, use the basename of the dataset directory as the dataset name and set the BIDS version to '1.10.0'.
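
The precedence described in these steps can be sketched as a simple dictionary merge. This is an assumption-laden illustration rather than the code behind `bids.create_data_description`: `sketch_dataset_description` is a hypothetical helper, and here the defaults fill in `Name` and `BIDSVersion` whenever neither source supplies them.

```python
import json
import os


def sketch_dataset_description(dataset_path, existing=None, extra=None):
    """Merge description sources, falling back to minimal defaults."""
    description = {
        # Step 3: defaults derived from the dataset folder name.
        "Name": os.path.basename(os.path.normpath(dataset_path)),
        "BIDSVersion": "1.10.0",
    }
    description.update(existing or {})  # Step 1: keep what the dataset provides.
    description.update(extra or {})     # Step 2: add user-supplied metadata.
    return description


defaults_only = sketch_dataset_description("/data/VFC High Density 2/")
merged = sketch_dataset_description(
    "/data/VFC High Density 2",
    existing={"Name": "Visual finger-tapping study"},
    extra={"Authors": ["A. Researcher"]},
)
print(json.dumps(merged, indent=4))
```

The dataset name and author shown in the usage example are placeholders.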

In [119]:
bids.create_data_description(dataset_path, destination_path)

### Check _coordsystem.json file

Since an empty string is not allowed for the `NIRSCoordinateSystem` key in the *_coordsystem.json file, we will populate it with "Other" to ensure BIDS compliance.
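
For a single file, the check amounts to the following sketch. `fill_coordinate_system` is a hypothetical helper written for illustration; it is not the implementation of `bids.check_coord_files`, which processes the whole `destination_path`.

```python
import json
import os
import tempfile


def fill_coordinate_system(json_path: str, fallback: str = "Other") -> bool:
    """Replace an empty NIRSCoordinateSystem value; return True if the file changed."""
    with open(json_path, encoding="utf-8") as fh:
        data = json.load(fh)
    # A missing key, None, or a whitespace-only string all count as empty.
    if str(data.get("NIRSCoordinateSystem") or "").strip():
        return False
    data["NIRSCoordinateSystem"] = fallback
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=4)
    return True


# Demonstration on a temporary file with an empty coordinate system.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "sub-01_coordsystem.json")
    with open(demo_path, "w", encoding="utf-8") as fh:
        json.dump({"NIRSCoordinateSystem": ""}, fh)
    changed_first = fill_coordinate_system(demo_path)
    changed_again = fill_coordinate_system(demo_path)
    with open(demo_path, encoding="utf-8") as fh:
        result = json.load(fh)

print(changed_first, changed_again, result)
```

Running the check twice is harmless: the second call finds a non-empty value and leaves the file untouched.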

In [120]:
bids.check_coord_files(destination_path)

### Edit *_events.tsv

To allow editing of the `duration` or `trial_type` columns in the *_events.tsv files, the mapping CSV file must include the following extra columns:

1. `duration`: Specifies the new duration for each SNIRF file that needs editing.
2. `cond` and `cond_match`:
    - `cond`: A list of keys.
    - `cond_match`: A list of corresponding values.

    These two columns are combined into a dictionary that is used to remap the values in the `trial_type` column.
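
The remapping can be pictured with the sketch below. Several details are assumptions made for illustration: that `cond` and `cond_match` are stored in the CSV as Python-style list literals, that the keys are the existing `trial_type` labels, and that unmatched labels are left unchanged. `remap_events` is a hypothetical helper, not `bids.edit_events`.

```python
import ast

import pandas as pd


def remap_events(events, cond, cond_match, duration=None):
    """Return a copy of `events` with remapped trial types and optional new duration."""
    # Pair each key in `cond` with the value at the same position in `cond_match`.
    mapping = dict(zip(ast.literal_eval(cond), ast.literal_eval(cond_match)))
    out = events.copy()
    # Labels without an entry in the mapping are left unchanged.
    out["trial_type"] = out["trial_type"].map(lambda label: mapping.get(label, label))
    if duration is not None:
        out["duration"] = float(duration)
    return out


events = pd.DataFrame(
    {
        "onset": [0.0, 10.0, 20.0],
        "duration": [5.0, 5.0, 5.0],
        "trial_type": ["1", "2", "3"],
    }
)
edited = remap_events(events, "['1', '2']", "['left', 'right']", duration="8")
print(edited)
```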

In [None]:
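# `args` must be a tuple; the trailing comma keeps the path from being unpacked character by character.
_ = mapping_df.apply(bids.edit_events, axis=1, args=(destination_path,))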

### Creating sourcedata Directory

Finally, you can keep a copy of your original data in a `sourcedata` directory under your `destination_path`.

In [3]:
bids.save_source(dataset_path, destination_path)