In [2]:
import os
import pandas as pd
import re
import snirf2bids as s2b
from pathlib import Path
from cedalion.io import bids

## Convert a fNIRS dataset to BIDS

This notebook automates the conversion of an fNIRS dataset into a BIDS-compliant format. 
Before you begin, you will need:

1. Dataset Path: The folder containing the raw dataset.
2. Destination Path: The folder where the BIDS-compliant files will be saved.
3. Mapping CSV File: A CSV file that defines the dataset folder structure and includes the necessary details for constructing the BIDS structure.
4. Optional Metadata: A JSON file with any additional dataset metadata you want to include. You can create it with [this Google Form](https://docs.google.com/forms/d/e/1FAIpQLSeZjlgIqCwp054HsHmTBKPziqcOlfTcaWpdXcGFYPDf0Q5vNg/viewform?usp=sf_link) or [this website](https://neurojson.org/Create/dataset_description_fnirs).

In [None]:
dataset_path = 'path-to-your-dataset'
destination_path = 'your-destination-bids-path'

dataset_path = Path(dataset_path)
destination_path = Path(destination_path)

In [4]:
mapping_df_path = bids.get_snirf2bids_mapping_csv(dataset_path)
extra_meta_data_path = Path("path-to-your-meta-data")

extra_meta_data_path = extra_meta_data_path if os.path.exists(extra_meta_data_path) else None

`get_snirf2bids_mapping_csv` helps you create your mapping CSV file. After generating the CSV file, you might need to manually edit it to include additional information or make adjustments as required.

A valid mapping CSV must include all SNIRF files in your dataset, along with the following details for each file:

- sub: The identifier for the participant.
- ses (optional): The session identifier, if applicable.
- task: The task name or label.
- run (optional): The run number, if applicable.
- acq (optional): The acquisition number, if applicable.
- cond (optional): Conditions' keys as a list.
- cond_match (optional): Conditions' values as a list.
- duration (optional): Events' duration. 
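
Before moving on, it can help to check a hand-edited mapping table against the rules above. The following is only a minimal sketch: `find_mapping_problems` is a hypothetical helper (not part of cedalion), and it assumes the column names shown in this notebook, with `current_name`, `sub`, and `task` treated as mandatory.

```python
import pandas as pd

# Columns that must be filled for every SNIRF file (assumption based on the list above).
REQUIRED = ["current_name", "sub", "task"]


def find_mapping_problems(df: pd.DataFrame) -> list[str]:
    """Return human-readable problems found in a mapping table."""
    problems = []
    for col in REQUIRED:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
            continue
        # A cell counts as empty if it is NaN/None or only whitespace.
        empty = df[col].isna() | (df[col].astype(str).str.strip() == "")
        for idx in df.index[empty.to_numpy()]:
            problems.append(f"row {idx}: empty '{col}'")
    return problems


example = pd.DataFrame(
    {
        "current_name": ["SUB_018/Visit 1/rec_a", "SUB_018/Visit 1/rec_b"],
        "sub": ["018", "018"],
        "ses": ["1", "1"],
        "task": ["Electrical", None],  # the second row lacks a task label
        "run": ["1", "2"],
    }
)
print(find_mapping_problems(example))  # ["row 1: empty 'task'"]
```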

In [5]:
mapping_df = pd.read_csv(mapping_df_path, dtype=str)
mapping_df.head(10)

Unnamed: 0,current_name,sub,ses,task,run,acq,cond,cond_match,duration
0,SUB_018/Visit 1/sub-018_ses-1_task-Electrical_...,18,1,Electrical,1,,,,
1,SUB_018/Visit 1/sub-018_ses-1_task-Electrical_...,18,1,Electrical,2,,,,
2,SUB_018/Visit 2/sub-018_ses-2_task-Electrical_...,18,2,Electrical,1,,,,
3,SUB_018/Visit 2/sub-018_ses-2_task-Electrical_...,18,2,Electrical,0,,,,
4,SUB_017/Visit 1/sub-017_ses-1_task-Electrical_...,17,1,Electrical,1,,,,
5,SUB_017/Visit 1/sub-017_ses-1_task-Electrical_...,17,1,Electrical,2,,,,
6,SUB_017/Visit 2/sub-017_ses-2_task-Electrical_...,17,2,Electrical,1,,,,
7,SUB_017/Visit 2/sub-017_ses-2_task-Electrical_...,17,2,Electrical,2,,,,


The mapping table serves as a key component for organizing and processing your dataset. 

The `ses`, `run`, and `acq` columns are optional and can be set to None if not applicable. The `current_name` column contains the path to the SNIRF files in your dataset.
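
Since the mapping must cover every SNIRF file, a quick coverage check can catch files that were dropped while editing the CSV. This is a sketch under stated assumptions: `unlisted_snirf_files` is a hypothetical helper, and it assumes that `current_name` holds paths relative to the dataset folder, with or without the `.snirf` extension.

```python
import tempfile
from pathlib import Path


def unlisted_snirf_files(dataset_dir: Path, listed_names: list[str]) -> list[str]:
    """Return SNIRF files (relative paths, extension dropped) missing from the mapping."""
    listed = {name.removesuffix(".snirf") for name in listed_names}
    found = sorted(
        p.relative_to(dataset_dir).with_suffix("").as_posix()
        for p in dataset_dir.rglob("*.snirf")
    )
    return [name for name in found if name not in listed]


# Demonstration on a throwaway folder with two placeholder recordings.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "SUB_018" / "Visit 1").mkdir(parents=True)
    (root / "SUB_018" / "Visit 1" / "rec_a.snirf").touch()
    (root / "SUB_018" / "Visit 1" / "rec_b.snirf").touch()
    missing = unlisted_snirf_files(root, ["SUB_018/Visit 1/rec_a"])

print(missing)  # ['SUB_018/Visit 1/rec_b']
```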

### Looking for possible *_scans.tsv files

To ensure no important information (e.g., acquisition time) from the original dataset is lost, we will:

- Search Subdirectories: Traverse all subdirectories within the dataset.
- Locate Existing Scan Files: Search for all `*_scans.tsv` files in the dataset.
- Integrate into Mapping Table: Extract the relevant information from these files and add it to our mapping table.
- Fall Back to SNIRF Files: Extract the acquisition time from the SNIRF file itself if it is missing in the `*_scans.tsv` file.

This approach ensures that any details, such as acquisition time, are retained and incorporated into the BIDS-compliant structure.
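
The search described above can be approximated with `pathlib` and pandas. This is an illustrative sketch, not the implementation behind `bids.search_for_acq_time_in_scan_files`: `collect_scan_times` is a hypothetical name, and it assumes the scan files use the standard BIDS columns `filename` and `acq_time`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def collect_scan_times(dataset_dir: Path) -> pd.DataFrame:
    """Gather filename/acq_time pairs from every *_scans.tsv below dataset_dir."""
    frames = []
    for tsv in sorted(dataset_dir.rglob("*_scans.tsv")):
        df = pd.read_csv(tsv, sep="\t", dtype=str)
        if {"filename", "acq_time"} <= set(df.columns):
            frames.append(df[["filename", "acq_time"]])
    if not frames:
        return pd.DataFrame(columns=["filename_org", "acq_time"])
    out = pd.concat(frames, ignore_index=True)
    # Keep only the bare file name (no folder, no extension) as the merge key.
    out["filename_org"] = out["filename"].map(lambda f: Path(f).stem)
    return out[["filename_org", "acq_time"]]


# Demonstration with a single, hand-written scans file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub-01").mkdir()
    (root / "sub-01" / "sub-01_scans.tsv").write_text(
        "filename\tacq_time\nnirs/sub-01_task-rest_nirs.snirf\t2020-09-29T23:09:41\n"
    )
    scan_times = collect_scan_times(root)

print(scan_times)
```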

In [6]:
mapping_df["filename_org"] = mapping_df["current_name"].apply(
    lambda x: os.path.basename(x))
scan_df = bids.search_for_acq_time_in_scan_files(dataset_path)

mapping_df = pd.merge(mapping_df, scan_df, on="filename_org", how="left")
mapping_df["acq_time"] = mapping_df.apply(bids.search_for_acq_time_in_snirf_files, axis=1, args=(dataset_path,))

mapping_df.head(10)

Unnamed: 0,current_name,sub,ses,task,run,acq,cond,cond_match,duration,filename_org,acq_time
0,SUB_018/Visit 1/sub-018_ses-1_task-Electrical_...,18,1,Electrical,1,,,,,sub-018_ses-1_task-Electrical_run-01_nirs,2020-09-29 23:09:41
1,SUB_018/Visit 1/sub-018_ses-1_task-Electrical_...,18,1,Electrical,2,,,,,sub-018_ses-1_task-Electrical_run-02_nirs,2020-09-29 23:09:47
2,SUB_018/Visit 2/sub-018_ses-2_task-Electrical_...,18,2,Electrical,1,,,,,sub-018_ses-2_task-Electrical_run-01_nirs,2020-09-29 23:09:43
3,SUB_018/Visit 2/sub-018_ses-2_task-Electrical_...,18,2,Electrical,0,,,,,sub-018_ses-2_task-Electrical_run-00_nirs,2020-09-29 23:09:39
4,SUB_017/Visit 1/sub-017_ses-1_task-Electrical_...,17,1,Electrical,1,,,,,sub-017_ses-1_task-Electrical_run-01_nirs,2020-09-29 23:09:26
5,SUB_017/Visit 1/sub-017_ses-1_task-Electrical_...,17,1,Electrical,2,,,,,sub-017_ses-1_task-Electrical_run-02_nirs,2020-09-29 23:09:28
6,SUB_017/Visit 2/sub-017_ses-2_task-Electrical_...,17,2,Electrical,1,,,,,sub-017_ses-2_task-Electrical_run-01_nirs,2020-09-29 23:09:24
7,SUB_017/Visit 2/sub-017_ses-2_task-Electrical_...,17,2,Electrical,2,,,,,sub-017_ses-2_task-Electrical_run-02_nirs,2020-09-29 23:09:30


The `acq_time` values are retrieved from the original dataset's `*_scans.tsv` files, or from the SNIRF files themselves when no scan entry exists, and are now part of the mapping table.

### Looking for possible *_sessions.tsv files

As with the `*_scans.tsv` files, we search the dataset path for `*_sessions.tsv` files to capture session-level metadata, such as acquisition times. Any relevant information from these files is added to the mapping table so that all session details are preserved.
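
The session information is attached with a left merge on `sub` and `ses`, which keeps every row of the mapping table and leaves the new column empty where no session entry was found. A small sketch with made-up values shows the effect:

```python
import pandas as pd

# Three recordings, but session information exists for only one of them.
mapping = pd.DataFrame({"sub": ["018", "018", "017"], "ses": ["1", "2", "1"]})
sessions = pd.DataFrame(
    {"sub": ["018"], "ses": ["1"], "ses_acq_time": ["2020-09-29T23:00:00"]}
)

# how="left" preserves all mapping rows; unmatched rows get NaN in ses_acq_time.
merged = pd.merge(mapping, sessions, on=["sub", "ses"], how="left")
print(merged)
```

This is why `ses_acq_time` can be entirely empty in the output when the original dataset contains no session files.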

In [7]:
session_df = bids.search_for_sessions_acq_time(dataset_path)
mapping_df = pd.merge(mapping_df, session_df, on=["sub", "ses"], how="left")

mapping_df.head(10)

Unnamed: 0,current_name,sub,ses,task,run,acq,cond,cond_match,duration,filename_org,acq_time,ses_acq_time
0,SUB_018/Visit 1/sub-018_ses-1_task-Electrical_...,18,1,Electrical,1,,,,,sub-018_ses-1_task-Electrical_run-01_nirs,2020-09-29 23:09:41,
1,SUB_018/Visit 1/sub-018_ses-1_task-Electrical_...,18,1,Electrical,2,,,,,sub-018_ses-1_task-Electrical_run-02_nirs,2020-09-29 23:09:47,
2,SUB_018/Visit 2/sub-018_ses-2_task-Electrical_...,18,2,Electrical,1,,,,,sub-018_ses-2_task-Electrical_run-01_nirs,2020-09-29 23:09:43,
3,SUB_018/Visit 2/sub-018_ses-2_task-Electrical_...,18,2,Electrical,0,,,,,sub-018_ses-2_task-Electrical_run-00_nirs,2020-09-29 23:09:39,
4,SUB_017/Visit 1/sub-017_ses-1_task-Electrical_...,17,1,Electrical,1,,,,,sub-017_ses-1_task-Electrical_run-01_nirs,2020-09-29 23:09:26,
5,SUB_017/Visit 1/sub-017_ses-1_task-Electrical_...,17,1,Electrical,2,,,,,sub-017_ses-1_task-Electrical_run-02_nirs,2020-09-29 23:09:28,
6,SUB_017/Visit 2/sub-017_ses-2_task-Electrical_...,17,2,Electrical,1,,,,,sub-017_ses-2_task-Electrical_run-01_nirs,2020-09-29 23:09:24,
7,SUB_017/Visit 2/sub-017_ses-2_task-Electrical_...,17,2,Electrical,2,,,,,sub-017_ses-2_task-Electrical_run-02_nirs,2020-09-29 23:09:30,


### Create BIDS Folder Structure

The goal of this section is to rename the SNIRF files according to the BIDS naming convention and place them in the appropriate directory under `destination_path`, following the BIDS folder structure.

Steps:
1. Generate New Filenames: Create BIDS-compliant filenames for all SNIRF records.
2. Determine File Locations: Identify the appropriate locations for these files within the BIDS folder hierarchy.

This process ensures that the dataset adheres to BIDS standards for organization and naming.
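
To make the naming convention concrete, here is a rough sketch of how a BIDS fNIRS filename and its folder can be assembled from the mapping columns. `bids_nirs_name` is a hypothetical helper, not the code behind `bids.create_bids_standard_filenames`; it uses the labels exactly as given and does not attempt any zero-padding.

```python
from pathlib import PurePosixPath


def bids_nirs_name(sub, task, ses=None, acq=None, run=None):
    """Build (filename, parent folder) following the BIDS entity order for NIRS."""
    parts = [f"sub-{sub}"]
    if ses:
        parts.append(f"ses-{ses}")
    parts.append(f"task-{task}")
    if acq:
        parts.append(f"acq-{acq}")
    if run:
        parts.append(f"run-{run}")
    filename = "_".join(parts) + "_nirs.snirf"

    # Recordings live in sub-<label>/[ses-<label>/]nirs.
    parent = PurePosixPath(f"sub-{sub}")
    if ses:
        parent = parent / f"ses-{ses}"
    return filename, str(parent / "nirs")


print(bids_nirs_name("018", "Electrical", ses="1", run="01"))
# ('sub-018_ses-1_task-Electrical_run-01_nirs.snirf', 'sub-018/ses-1/nirs')
```

Optional entities are simply skipped when they are empty, which is why `ses`, `run`, and `acq` may be left blank in the mapping.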

In [8]:
mapping_df[["bids_name", "parent_path"]] = mapping_df.apply(
    bids.create_bids_standard_filenames, axis=1, result_type='expand')

mapping_df.head(10)

Unnamed: 0,current_name,sub,ses,task,run,acq,cond,cond_match,duration,filename_org,acq_time,ses_acq_time,bids_name,parent_path
0,SUB_018/Visit 1/sub-018_ses-1_task-Electrical_...,18,1,Electrical,1,,,,,sub-018_ses-1_task-Electrical_run-01_nirs,2020-09-29 23:09:41,,sub-018_ses-1_task-Electrical_run-01_nirs.snirf,sub-018/ses-1/nirs
1,SUB_018/Visit 1/sub-018_ses-1_task-Electrical_...,18,1,Electrical,2,,,,,sub-018_ses-1_task-Electrical_run-02_nirs,2020-09-29 23:09:47,,sub-018_ses-1_task-Electrical_run-02_nirs.snirf,sub-018/ses-1/nirs
2,SUB_018/Visit 2/sub-018_ses-2_task-Electrical_...,18,2,Electrical,1,,,,,sub-018_ses-2_task-Electrical_run-01_nirs,2020-09-29 23:09:43,,sub-018_ses-2_task-Electrical_run-01_nirs.snirf,sub-018/ses-2/nirs
3,SUB_018/Visit 2/sub-018_ses-2_task-Electrical_...,18,2,Electrical,0,,,,,sub-018_ses-2_task-Electrical_run-00_nirs,2020-09-29 23:09:39,,sub-018_ses-2_task-Electrical_run-00_nirs.snirf,sub-018/ses-2/nirs
4,SUB_017/Visit 1/sub-017_ses-1_task-Electrical_...,17,1,Electrical,1,,,,,sub-017_ses-1_task-Electrical_run-01_nirs,2020-09-29 23:09:26,,sub-017_ses-1_task-Electrical_run-01_nirs.snirf,sub-017/ses-1/nirs
5,SUB_017/Visit 1/sub-017_ses-1_task-Electrical_...,17,1,Electrical,2,,,,,sub-017_ses-1_task-Electrical_run-02_nirs,2020-09-29 23:09:28,,sub-017_ses-1_task-Electrical_run-02_nirs.snirf,sub-017/ses-1/nirs
6,SUB_017/Visit 2/sub-017_ses-2_task-Electrical_...,17,2,Electrical,1,,,,,sub-017_ses-2_task-Electrical_run-01_nirs,2020-09-29 23:09:24,,sub-017_ses-2_task-Electrical_run-01_nirs.snirf,sub-017/ses-2/nirs
7,SUB_017/Visit 2/sub-017_ses-2_task-Electrical_...,17,2,Electrical,2,,,,,sub-017_ses-2_task-Electrical_run-02_nirs,2020-09-29 23:09:30,,sub-017_ses-2_task-Electrical_run-02_nirs.snirf,sub-017/ses-2/nirs


To facilitate proper organization, two columns were added to the mapping dataframe:

- `parent_path`: The location of each SNIRF file within `destination_path`.
- `bids_name`: The new BIDS-compliant name for each file.

In the next step, we copy every file to its designated `parent_path` under its `bids_name`.
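
Conceptually, the copy step creates the target folder if needed and copies the recording under its new name while leaving the original untouched. The sketch below illustrates this with the standard library; `copy_as_bids` is a hypothetical helper and only approximates what `bids.copy_rename_snirf` is described as doing.

```python
import shutil
import tempfile
from pathlib import Path


def copy_as_bids(src_file: Path, bids_root: Path, parent_path: str, bids_name: str) -> Path:
    """Copy src_file to bids_root/parent_path/bids_name and return the new path."""
    target_dir = bids_root / parent_path
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / bids_name
    shutil.copy2(src_file, target)  # copy2 also preserves file timestamps
    return target


# Demonstration with a placeholder file in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw" / "SUB_018" / "Visit 1"
    raw.mkdir(parents=True)
    src = raw / "recording.snirf"
    src.write_bytes(b"placeholder")
    bids_root = Path(tmp) / "bids"
    target = copy_as_bids(
        src, bids_root, "sub-018/ses-1/nirs",
        "sub-018_ses-1_task-Electrical_run-01_nirs.snirf",
    )
    copied_ok = target.is_file() and src.is_file()
    relative_target = target.relative_to(bids_root).as_posix()

print(copied_ok, relative_target)
```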

In [9]:
_ = mapping_df.apply(bids.copy_rename_snirf, axis=1, args=(dataset_path, destination_path))

### Create BIDS-specific files (e.g., *_coordsystem.json)

In this step, we utilize the snirf2bids Python package to generate the necessary .tsv and .json files for the BIDS structure.

For every record, the following files will be created:
1. *_coordsystem.json
2. *_optodes.json
3. *_optodes.tsv
4. *_channels.tsv
5. *_events.json
6. *_events.tsv
7. *_nirs.json

These files are essential for ensuring the dataset adheres to BIDS standards.

In [10]:
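# Generate the BIDS sidecar files for every SNIRF file under destination_path.
s2b.snirf2bids_recurse(destination_path)

# Remove any scans and participants files present at this point;
# this notebook recreates them in the following sections.
pattern = re.compile(r'.*_scans\.tsv$|^participants\.tsv$|^temp_participants\.tsv$')
files_to_delete = [file for file in destination_path.rglob('*') if file.is_file() and pattern.match(file.name)]
for file in files_to_delete:
    file.unlink()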

### Create *_scans.tsv Files

Now, we proceed to create scan files for all subjects and sessions. Previously, we searched the original dataset path for any provided scan information, which will now be incorporated into the BIDS structure.
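
BIDS scan files are tab-separated tables with a `filename` column (relative to the folder containing the scan file) and, optionally, an `acq_time` column in `YYYY-MM-DDThh:mm:ss` format, with `n/a` for missing values. The sketch below shows one way to bring timestamps into that form with pandas. It is an illustration with made-up rows and makes no claim about how `bids.create_scan_files` formats its output.

```python
import pandas as pd

scans = pd.DataFrame(
    {
        "filename": ["nirs/sub-018_ses-1_task-Electrical_run-01_nirs.snirf"],
        "acq_time": ["2020-09-29 23:09:41"],
    }
)

# Convert "YYYY-MM-DD hh:mm:ss" into the ISO 8601 form with a "T" separator.
scans["acq_time"] = pd.to_datetime(scans["acq_time"]).dt.strftime("%Y-%m-%dT%H:%M:%S")

# With no path argument, to_csv returns the TSV content as a string.
tsv_text = scans.to_csv(sep="\t", index=False, na_rep="n/a")
print(tsv_text)
```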

In [11]:
# Work on an explicit copy so that filling `ses` does not raise SettingWithCopyWarning.
scan_df = mapping_df[["sub", "ses", "bids_name", "acq_time"]].copy()
scan_df["ses"] = scan_df["ses"].fillna("Unknown")
scan_df = scan_df.groupby(["sub", "ses"])
scan_df.apply(lambda group: bids.create_scan_files(group, destination_path))


### Create *_sessions.tsv Files

The next step is to create session files for all subjects. As with the scan files, we previously searched the original dataset path for any session information, which will now be used to create the corresponding BIDS session files.
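
A BIDS sessions file lists each session of a subject once, identified by a `session_id` column of the form `ses-<label>`. Because the mapping table has one row per recording, the rows need to be de-duplicated first. The following sketch shows that idea for a single subject with made-up rows; the column handling inside `bids.create_session_files` may differ.

```python
import pandas as pd

# Four recordings of one subject, spread over two sessions, no session times known.
rows = pd.DataFrame(
    {
        "sub": ["018", "018", "018", "018"],
        "ses": ["1", "1", "2", "2"],
        "ses_acq_time": [None, None, None, None],
    }
)

sessions = (
    rows.drop_duplicates(subset=["sub", "ses"])
    .assign(session_id=lambda d: "ses-" + d["ses"])
    .rename(columns={"ses_acq_time": "acq_time"})
    .loc[:, ["session_id", "acq_time"]]
)

# Missing values are written as "n/a", the BIDS convention for TSV files.
tsv_text = sessions.to_csv(sep="\t", index=False, na_rep="n/a")
print(tsv_text)
```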

In [12]:
session_df = mapping_df[["sub", "ses", "ses_acq_time"]]
session_df = session_df.groupby(["sub"])
session_df.apply(lambda group: bids.create_session_files(group, destination_path))



### Create and Integrate participants.tsv and participants.json

In this step, we gather available participant information from the original dataset and incorporate it into the BIDS structure.

If you provide a participants.tsv file but not a corresponding participants.json, you should fill out the participants.json manually to include descriptions for each field to comply with BIDS standards.

If you provide neither file, new participants.tsv and participants.json files will be automatically created with standard fields:

- species
- age
- sex
- handedness

You can also pass your own list of fields instead of these defaults when new files are created (this only applies if no valid TSV file is provided).
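
For orientation, the sketch below builds a bare participants table and a matching JSON sidecar from a list of subject labels. `build_participants` is a hypothetical helper, not the cedalion function; it relies only on the BIDS conventions that the first column is `participant_id` (`sub-<label>`) and that unknown values are written as `n/a`.

```python
import csv
import io
import json


def build_participants(subjects, fields=("species", "age", "sex", "handedness")):
    """Return (TSV text, JSON text) for a participants table with empty fields."""
    header = ["participant_id", *fields]
    rows = [[f"sub-{s}"] + ["n/a"] * len(fields) for s in sorted(set(subjects))]

    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)

    # Each column gets a description placeholder to be filled in by hand.
    sidecar = {f: {"Description": f"TODO: describe '{f}'"} for f in fields}
    return buf.getvalue(), json.dumps(sidecar, indent=2)


tsv_text, json_text = build_participants(["018", "017", "018"])
print(tsv_text)
```

The placeholder descriptions in the sidecar still need to be replaced with real ones before the dataset is shared.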

In [13]:
participants_tsv_file = "path-to-your-participants.tsv"
participants_json_file = "path-to-your-participants.json"
participants_tsv_file = Path(participants_tsv_file)
participants_json_file = Path(participants_json_file)

In [21]:
saved_participants = bids.create_participants_files(
    bids_dir=destination_path,
    participants_tsv_path=participants_tsv_file,
    participants_json_path=participants_json_file,
    mapping_df=mapping_df,
    fields=["gender", "age"],
)

### Create dataset description file

To create the dataset_description.json file, we follow these steps:

1. Search for an existing dataset_description.json in the dataset path and retain the provided information.
2. If extra_meta_data_path is specified, add the additional metadata about the dataset.
3. If neither dataset_description.json nor extra metadata is provided, use the basename of the dataset directory as the dataset name and set the BIDS version to '1.10.0'.
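
The three steps can be pictured as layering dictionaries on top of a set of defaults. This is a sketch only: `build_description` is a hypothetical helper, and the order of precedence when the same key appears in both sources (here, the extra metadata wins) is an assumption rather than documented behavior of `bids.create_data_description`.

```python
import json
from pathlib import Path


def build_description(dataset_dir, existing=None, extra=None, bids_version="1.10.0"):
    """Combine defaults, an existing description, and extra metadata into one dict."""
    # Step 3: fall-back values derived from the dataset folder name.
    description = {"Name": Path(dataset_dir).name, "BIDSVersion": bids_version}
    # Step 1: values from an existing dataset_description.json, if any.
    description.update(existing or {})
    # Step 2: additional metadata supplied by the user, if any.
    description.update(extra or {})
    return description


desc = build_description(
    "/data/my_study",
    existing={"Name": "Electrical stimulation study"},
    extra={"Authors": ["A. Author"]},
)
print(json.dumps(desc, indent=2))
```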

In [15]:
bids.create_data_description(dataset_path, destination_path, extra_meta_data_path)

### Check _coordsystem.json file

Since an empty string is not allowed for the `NIRSCoordinateSystem` key in the *_coordsystem.json file, we will populate it with "Other" to ensure BIDS compliance.
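
The fix amounts to a small dictionary update per file. The sketch below shows the idea on an in-memory dictionary; `fill_coordinate_system` is a hypothetical helper and stands in for whatever `bids.check_coord_files` does internally.

```python
def fill_coordinate_system(coordsystem: dict) -> dict:
    """Return a copy in which an empty NIRSCoordinateSystem is set to "Other"."""
    fixed = dict(coordsystem)
    value = fixed.get("NIRSCoordinateSystem")
    if value is None or not str(value).strip():
        fixed["NIRSCoordinateSystem"] = "Other"
    return fixed


print(fill_coordinate_system({"NIRSCoordinateSystem": "", "NIRSCoordinateUnits": "mm"}))
# {'NIRSCoordinateSystem': 'Other', 'NIRSCoordinateUnits': 'mm'}
```

To my understanding of the BIDS specification, `NIRSCoordinateSystemDescription` becomes required once the system is set to "Other", so it is worth adding a short description by hand afterwards.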

In [16]:
bids.check_coord_files(destination_path)

### Fix *_events.tsv order

Sort the rows of each `*_events.tsv` file by onset time.
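
One detail worth keeping in mind when sorting by onset: if the values were read as text, they must be converted to numbers first, otherwise "100.0" sorts before "3.0". A minimal pandas sketch with made-up events (not the code of `bids.sort_events`):

```python
import pandas as pd

events = pd.DataFrame(
    {
        "onset": ["12.5", "3.0", "100.0"],  # read as text, e.g. with dtype=str
        "duration": ["1.0", "1.0", "1.0"],
        "trial_type": ["b", "a", "c"],
    }
)

# Convert to numbers so that the sort is numeric rather than alphabetical.
events["onset"] = pd.to_numeric(events["onset"])
events = events.sort_values("onset", kind="stable").reset_index(drop=True)
print(events)
```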

In [17]:
_ = mapping_df.apply(bids.sort_events, axis=1, args=(destination_path,))

### Edit *_events.tsv

To allow editing of the `duration` or `trial_type` columns in the *_events.tsv files, the mapping CSV file must include the following extra columns:

1. `duration`: Specifies the new duration for each SNIRF file that needs editing.
2. `cond` and `cond_match`:
    - `cond`: A list of keys e.g. [1, 2].
    - `cond_match`: A list of corresponding values e.g. ["con", "inc"]. 
    
    These two columns will be used to create a dictionary that maps the trial_type column.
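
The dictionary built from `cond` and `cond_match` can be sketched as follows. The example assumes the two cells are stored in the CSV as list literals such as `[1, 2]` and `["con", "inc"]`, and that trial types are compared as strings; both are assumptions, and `bids.edit_events` may parse them differently.

```python
import ast

import pandas as pd

# Cell contents as they might appear in the mapping CSV (assumed format).
cond = "[1, 2]"
cond_match = '["con", "inc"]'
new_duration = 2.0

# Pair keys with values: {"1": "con", "2": "inc"}.
label_map = dict(zip(map(str, ast.literal_eval(cond)), ast.literal_eval(cond_match)))

events = pd.DataFrame(
    {"onset": [0.0, 5.0, 10.0], "duration": [1.0, 1.0, 1.0], "trial_type": ["1", "2", "1"]}
)

# Replace known keys and leave any unmapped trial types unchanged.
events["trial_type"] = events["trial_type"].map(lambda v: label_map.get(str(v), v))
events["duration"] = new_duration
print(events)
```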

In [18]:
_ = mapping_df.apply(bids.edit_events, axis=1, args=(destination_path,))

### Creating sourcedata Directory

Finally, you can keep a copy of your original data in a `sourcedata` directory under your `destination_path`.
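
In its simplest form, this is a recursive copy of the raw dataset folder. The sketch below uses `shutil.copytree`; `copy_to_sourcedata` is a hypothetical helper, and the exact layout that `bids.save_source` produces inside `sourcedata` may differ.

```python
import shutil
import tempfile
from pathlib import Path


def copy_to_sourcedata(dataset_dir: Path, bids_root: Path) -> Path:
    """Copy the whole raw dataset into bids_root/sourcedata and return that folder."""
    target = bids_root / "sourcedata"
    # dirs_exist_ok lets the copy be repeated without failing on an existing folder.
    shutil.copytree(dataset_dir, target, dirs_exist_ok=True)
    return target


# Demonstration with a placeholder dataset in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    dataset_dir = Path(tmp) / "raw"
    (dataset_dir / "SUB_018").mkdir(parents=True)
    (dataset_dir / "SUB_018" / "recording.snirf").write_bytes(b"placeholder")
    bids_root = Path(tmp) / "bids"
    bids_root.mkdir()
    target = copy_to_sourcedata(dataset_dir, bids_root)
    kept = sorted(p.relative_to(bids_root).as_posix() for p in target.rglob("*.snirf"))

print(kept)  # ['sourcedata/SUB_018/recording.snirf']
```

The destination folder should not be located inside the raw dataset folder, otherwise the copy would include itself.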

In [19]:
bids.save_source(dataset_path, destination_path)