This notebook shows how to use the BIDS Archive and BIDS Incremental classes.

# Preliminaries

There are a few terms that are important to understand before starting to use BIDS Archive.

## BIDS Entities

BIDS Entities, referred to later as just 'entities', represent metadata about files in the archive. You may already be familiar with common ones, like 'subject', 'task', and 'run'. 

Most entities are used in a key-value form and have their name and value present wherever they are used. They have three main representations. The first is the entity itself, a one-word, all-lowercase string (e.g., 'subject'). The second is the entity's name, which may be several words (e.g., 'Contrast Enhancing Agent' is the name for the 'ceagent' entity). The third is the entity's key, which is typically shorter and used in file names (e.g., 'ce' for the 'ceagent' entity). A few entities and their multiple representations are shown in the table below:

| Entity | Name | Key |
| --- | --- | --- |
| subject | Subject | sub |
| session | Session | ses |
| run | Run | run |
| ceagent | Contrast Enhancing Agent | ce |

Some entities aren't used in key-value format, and have only one representation. Examples include 'datatype' (e.g., 'func' or 'anat'), 'extension' (e.g., '.nii', '.nii.gz', or '.json'), and 'suffix' (e.g., 'bold'). The exercise at the end of this section shows how these appear in file names and in the archive's directory layout.

Together, these entities provide a unique and consistent way to name files and organize the BIDS dataset.
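
To make the naming scheme concrete, here is a minimal sketch that assembles a relative path from entity values. The function `build_bids_path` and the `ENTITY_KEYS` mapping are hypothetical helpers written for this illustration, not part of `rtCommon`; the mapping copies the table above and adds 'task' (key 'task'), following file names such as `task-languageproduction` used later in this notebook.

```python
# Entity -> key mapping, copied from the table above plus 'task'.
# Assumption: this small subset is enough for the illustration; real BIDS has more entities.
ENTITY_KEYS = {"subject": "sub", "session": "ses", "task": "task",
               "ceagent": "ce", "run": "run"}


def build_bids_path(entities, datatype, suffix, extension):
    """Return a relative path such as 'sub-02/func/sub-02_task-rest_bold.nii'."""
    # Key-value pairs appear in the file name in the order of ENTITY_KEYS.
    pairs = [f"{key}-{entities[name]}"
             for name, key in ENTITY_KEYS.items() if name in entities]
    filename = "_".join(pairs + [suffix]) + extension
    # Directory levels use the subject and (if present) session pairs.
    dirs = [p for p in pairs if p.startswith(("sub-", "ses-"))]
    return "/".join(dirs + [datatype, filename])


example_path = build_bids_path(
    {"subject": "02", "session": "01", "task": "rest", "run": "1"},
    datatype="func", suffix="bold", extension=".nii")
print(example_path)
```

Note how the key-value entities end up in both the directory names and the file name, while 'datatype', 'suffix', and 'extension' each appear exactly once without a key.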

#### Exercise: What entities are present in the path `sub-01/func/sub-01_task-languageproduction_run-01_bold.nii.gz`, and what are the entity values? 

##### Answer: 
| Entity Name | Value |
|---          | ---   |
| subject | 01 |
| datatype | func |
| task | languageproduction |
| run | 01 |
| suffix | bold |
| extension | .nii.gz|
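
The answer above can also be reproduced programmatically. The sketch below is a simplified, hypothetical parser (`parse_bids_path` is not an `rtCommon` or PyBids function) that only handles data files stored inside a datatype directory; in practice a BIDS library would do this work.

```python
def parse_bids_path(path):
    """Split a relative BIDS data-file path into a dict of entity values (simplified)."""
    parts = path.split("/")
    filename = parts[-1]
    # The extension is everything from the first '.' in the file name.
    stem, dot, ext = filename.partition(".")
    # Everything before the last '_' is key-value pairs; the last piece is the suffix.
    *pairs, suffix = stem.split("_")
    # Keys that differ from their entity; other keys equal the entity itself.
    key_to_entity = {"sub": "subject", "ses": "session", "ce": "ceagent"}
    entities = {}
    for pair in pairs:
        key, _, value = pair.partition("-")
        entities[key_to_entity.get(key, key)] = value
    entities["suffix"] = suffix
    entities["extension"] = dot + ext
    # The datatype is the directory that directly contains the file.
    if len(parts) > 1:
        entities["datatype"] = parts[-2]
    return entities


parsed = parse_bids_path(
    "sub-01/func/sub-01_task-languageproduction_run-01_bold.nii.gz")
print(parsed)
```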

# BIDS Archive: Opening Existing Dataset

Objective: Learn how to create a BIDS Archive pointing to a specific dataset on disk.

Procedure:
1. Download a small, sample dataset from OpenNeuro to use with `BidsArchive`.
2. Open the dataset using `BidsArchive` and print out some summary data about it.

In [None]:
""" Add rtCommon to the path """
import os
import sys
currPath = os.path.dirname(os.path.realpath(os.getcwd())) # docs
rootPath = os.path.dirname(currPath) # project root
sys.path.append(rootPath)


""" Download the dataset """
import subprocess

# https://openneuro.org/datasets/ds002014/versions/1.0.1/download -- <40MB dataset
TARGET_DIR = 'dataset'
command = 'aws s3 sync --no-sign-request s3://openneuro.org/ds002014 ' + TARGET_DIR
command = command.split(' ')
if subprocess.call(command) == 0:
    print("Dataset successfully downloaded")
else:
    print("Error in calling download command")
    

""" Open downloaded dataset """
from rtCommon.bidsArchive import BidsArchive

archive = BidsArchive(TARGET_DIR)
print('Archive: ', archive)

# BIDS Archive: Querying Dataset

Objective: Learn how to extract information and files from the `BidsArchive`.

Procedure:

1. Search for images in the dataset.
2. Search for sidecar metadata for the images in the dataset.

In [None]:
# Any BIDS entity can be extracted from the archive using getEntity() (e.g., getSubjects(), getRuns(), getTasks())
print('Dataset info: Subjects: {subjects} | Runs: {runs} | Tasks: {tasks}\n'
      .format(subjects=archive.getSubjects(), runs=archive.getRuns(), tasks=archive.getTasks()))

# Arguments can be passed as keywords or using a dictionary with equivalent results
entityDict = {'subject': archive.getSubjects()[0], 'run': archive.getRuns()[0]}
imagesUsingDict = archive.getImages(**entityDict)
imagesUsingKeywords = archive.getImages(subject=archive.getSubjects()[0], run=archive.getRuns()[0])
assert imagesUsingDict == imagesUsingKeywords

print('Number of image files associated with Subject {}, Run {}: {}'.format(
    entityDict['subject'], entityDict['run'], len(imagesUsingDict)))

# Get all images from the functional runs
images = archive.getImages(datatype='func')
print('Number of functional images: {}'.format(len(images)))

# Anatomical images can be retrieved too
images = archive.getImages(datatype='anat')
print('Number of anatomical images: {}'.format(len(images)))

In [None]:
# No images are returned if matches aren't found
subjectName='invalidSubject'
images = archive.getImages(subject=subjectName)
print('Number of image files associated with Subject "{}": {}'.format(subjectName, len(images)))

Now that we've seen how to get images from an archive, we'll look at how to get metadata for images we've retrieved from the archive.

To get metadata for an image, the path to the image file is required. Every `BIDSImageFile` returned from `getImages` has a `path` property you can use to obtain this path.

In [None]:
import json

# Get all image files, then create a dictionary mapping each image file's path to its metadata dictionary
imageFiles = archive.getImages()
metadata = {i.path: archive.getMetadata(i.path) for i in imageFiles}
for path, metaDict in metadata.items():
    print('Metadata for:', path, "is:\n", json.dumps(metaDict, indent=4, sort_keys=True), "\n")
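
In a BIDS dataset, image metadata is typically stored in a JSON "sidecar" file that shares the image's file name but ends in `.json`. The sketch below shows that naming relationship only; `sidecar_path` is a hypothetical helper, and it is an assumption of this illustration that the sidecar sits next to the image (BIDS also allows sidecars at higher directory levels). It does not describe how `getMetadata` is implemented.

```python
def sidecar_path(image_path):
    """Return the path of the same-named JSON sidecar for a NIfTI image."""
    # Check the longer extension first so '.nii.gz' is not cut at '.gz'.
    for ext in (".nii.gz", ".nii"):
        if image_path.endswith(ext):
            return image_path[: -len(ext)] + ".json"
    raise ValueError(f"Not a NIfTI path: {image_path}")


sidecar = sidecar_path(
    "sub-01/func/sub-01_task-languageproduction_run-01_bold.nii.gz")
print(sidecar)
```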

The last piece of data we'll see how to get from an archive is the events file corresponding to a particular scanning run.

In [None]:
# Event files to get can be filtered by entities, as with 
# getImages and getMetadata
events = archive.getEvents(subject='01', 
                           task='languageproduction', run=1)

# All event files can be retrieved by specifying no entities
events = archive.getEvents()

# Event files are returned as BIDSDataFile objects
# See the PyBids documentation for more information on those
eventsFile = events[0]
print('Events file type: ', type(eventsFile))

# One method of the BIDSDataFile object returns
# a Pandas data frame of the events file
eventsDF = eventsFile.get_df()

print("Sample data: \n", eventsDF.head())
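
For readers who have not seen an events file before, here is a small sketch of what one contains and how it can be read without Pandas. The file contents are made up for illustration; the BIDS specification requires the `onset` and `duration` columns (in seconds), and `trial_type` is a common optional column. `read_events` is a hypothetical helper, not part of `rtCommon`.

```python
import csv
import io

# Made-up events data in tab-separated form, as it would appear in an *_events.tsv file.
EVENTS_TSV = "onset\tduration\ttrial_type\n0.0\t2.5\tword\n10.0\t2.5\trest\n"


def read_events(text):
    """Parse tab-separated events text into a list of row dictionaries."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    for row in rows:
        # Convert the two required timing columns from strings to floats.
        row["onset"] = float(row["onset"])
        row["duration"] = float(row["duration"])
    return rows


events_rows = read_events(EVENTS_TSV)
print(events_rows[0])
```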

# BIDS Archive: Getting & Appending Incrementals

One of the most important functions that a BIDS Archive enables in the context of RT-Cloud is working with BIDS Incrementals. Using a BIDS Archive, you can extract data and package it into a BIDS Incremental using `getIncremental`, and you can append new data in BIDS Incrementals to the archive using `appendIncremental`.

For example, if you have a complete dataset that you want to test a new real-time experiment on, you can use `getIncremental` repeatedly to iterate over your entire dataset, streaming the resulting BIDS Incrementals to RT-Cloud and the new experimental script you want to try out. Then, when you're running your new experiment in RT-Cloud for real, as BIDS Incremental files are streamed from the scanner to your script, you can build up an archive of your entire experiment by calling `appendIncremental` for each BIDS Incremental you receive.

In [None]:
# Set up an archive and get an incremental
firstIncremental = archive.getIncremental(subject='01', task='languageproduction')

# Iterate through each time slice of the 4-D NIfTI file and 
# store the incrementals in order
entityFilterDict = {'subject': '01', 'task': 'languageproduction'}
NUM_SLICES = firstIncremental.imageDimensions[3]
incrementals = []
for i in range(NUM_SLICES):
    incrementals.append(archive.getIncremental(**entityFilterDict))
    
# Append them in the same order as they were retrieved to a new archive,
# then show that the two archives have the same data for that subject & task
import tempfile

with tempfile.TemporaryDirectory() as td:
    # Create new archive
    newArchive = BidsArchive(td)
    for inc in incrementals:
        newArchive.appendIncremental(inc)
        
    # Compare incrementals in original and new archive
    for i in range(NUM_SLICES):
        assert archive.getIncremental(**entityFilterDict) == \
               newArchive.getIncremental(**entityFilterDict)
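
The replay-then-rebuild workflow can be pictured with a toy model that uses plain Python lists in place of NIfTI volumes. Everything here (`ToyArchive`, `get_volume`, `append_volume`) is hypothetical and exists only for this sketch; it is not the `BidsArchive` API.

```python
class ToyArchive:
    """Hypothetical stand-in for an archive holding one 4-D run as a list of volumes."""

    def __init__(self, volumes=None):
        self.volumes = list(volumes or [])

    def get_volume(self, index):
        # Analogous to pulling one incremental out of an existing dataset.
        return self.volumes[index]

    def append_volume(self, volume):
        # Analogous to appending a received incremental to a growing archive.
        self.volumes.append(volume)


# Three tiny "volumes" stand in for the time points of a run.
source = ToyArchive(volumes=[[1, 2], [3, 4], [5, 6]])
rebuilt = ToyArchive()

# Stream the volumes one at a time, in order, into the new archive.
for i in range(len(source.volumes)):
    rebuilt.append_volume(source.get_volume(i))

print(rebuilt.volumes == source.volumes)
```

The point of the toy is the ordering: volumes are appended in the same order they were retrieved, so the rebuilt run matches the source.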

# BIDS Incremental: Creating Incremental

A `BIDS Incremental` has two primary components:
1. A NIfTI image
2. A metadata dictionary storing information about the image.

It also has a few other components that are used when the `BIDS Incremental` is written to disk, and may be used by you for other purposes. Those are:
1. The dataset description dictionary, which becomes the `dataset_description.json` in a BIDS Archive.
2. The README string, which becomes the `README` file in a BIDS archive.
3. The events dataframe, which becomes the `<file name entities>_events.tsv` file in a BIDS archive.

To create a `BIDS Incremental`, only the image and the metadata dictionary are needed, and default versions of the other components are created if the `BIDS Incremental` is written to disk.
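
As an illustration of one of these extra components, the sketch below writes a minimal `dataset_description.json` by hand. `Name` and `BIDSVersion` are the two fields the BIDS specification requires in that file; the values used here are placeholders, and the defaults that `BidsIncremental` actually writes may differ.

```python
import json
import os
import tempfile

# Placeholder values chosen for illustration only.
description = {"Name": "Example dataset", "BIDSVersion": "1.4.1"}

with tempfile.TemporaryDirectory() as td:
    path = os.path.join(td, "dataset_description.json")
    # Write the description the way it would sit at the archive root.
    with open(path, "w") as f:
        json.dump(description, f, indent=4)
    # Read it back to confirm the round trip.
    with open(path) as f:
        loaded = json.load(f)

print(loaded)
```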

When reading from a BIDS-compliant dataset, all metadata
should already be present, and using BIDS Archive methods
to read the image and metadata is sufficient to create the
incremental.

In [None]:
from rtCommon.bidsIncremental import BidsIncremental

# Get the NIfTI image
imageFile = archive.getImages(subject='01', run=1)[0]
image = imageFile.get_image()

# Get the metadata for the image
metadata = archive.getMetadata(imageFile.path)

# Create the BIDS Incremental
incremental = BidsIncremental(image, metadata)
print('Created Incremental: ', incremental)

If converting from a DICOM image, sometimes extra work on metadata is needed. This is because BIDS requires certain fields in order to build a valid archive, so BIDS Incremental requires that these fields be provided at creation time in the metadata dictionary. The following example shows how these fields sometimes must be added by the user of the system.

In [None]:
from rtCommon.imageHandling import convertDicomFileToNifti, readDicomFromFile, readNifti
from rtCommon.bidsCommon import getDicomMetadata
from rtCommon.errors import MissingMetadataError

with tempfile.TemporaryDirectory() as td:
    TEMP_NIFTI_NAME = 'temp.nii'
    TEMP_NIFTI_PATH = os.path.join(td, TEMP_NIFTI_NAME)
    dicomPath = os.path.join(rootPath, "tests/test_input/001_000013_000005.dcm")
    convertDicomFileToNifti(dicomPath, TEMP_NIFTI_PATH)
    image = readNifti(TEMP_NIFTI_PATH)

    publicMeta, privateMeta = getDicomMetadata(readDicomFromFile(dicomPath))
    publicMeta.update(privateMeta)

    try:
        incremental = BidsIncremental(image, publicMeta)
    except MissingMetadataError as e:
        print(e)
        # The error shows that 'subject', 'suffix', and 'datatype' could not
        # be extracted from the DICOM metadata, so we have to provide them
        # manually based on our knowledge of the experiment

    # Here, we'll pretend the subject is the 1st subject, the imaging methodology
    # was fMRI BOLD, and the datatype is func, representing a functional run
    publicMeta.update({'subject': '01', 'suffix': 'bold', 'datatype': 'func'})

    # Now, the incremental's creation will succeed
    incremental = BidsIncremental(image, publicMeta)
    print('Created Incremental:', incremental)
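
The kind of check that produced the error above can be sketched in a few lines. This is an illustration under stated assumptions, not the `BidsIncremental` implementation: `MissingFieldsError` and `check_required` are hypothetical names, and the required list contains only the three fields named in the example, whereas the real list may be longer.

```python
class MissingFieldsError(ValueError):
    """Hypothetical error type used only in this sketch."""


# Assumption: only the three fields named in the example above are checked.
REQUIRED_FIELDS = ("subject", "suffix", "datatype")


def check_required(metadata, required=REQUIRED_FIELDS):
    """Raise MissingFieldsError listing every required field that is absent or empty."""
    missing = [field for field in required if not metadata.get(field)]
    if missing:
        raise MissingFieldsError("Missing required metadata: " + ", ".join(missing))


# Made-up metadata resembling what a DICOM header might provide.
dicom_like = {"RepetitionTime": 1.5, "ProtocolName": "func_run1"}
try:
    check_required(dicom_like)
except MissingFieldsError as err:
    print(err)

# Supplying the fields by hand lets the check pass.
dicom_like.update({"subject": "01", "suffix": "bold", "datatype": "func"})
check_required(dicom_like)
```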

# BIDS Incremental: Querying Incremental

A `BIDS Incremental` is the basic unit of data transfer in RT-Cloud, and your scripts will often interact directly with an Incremental and the data within it. This part of the tutorial will show you how to obtain different parts of the Incremental's data.

In [None]:
# Getting, setting, and removing metadata
fields = ['subject', 'task', 'RepetitionTime', 'ProtocolName']
print('-------- Getting Fields --------')
for field in fields:
    print(field + ': ' + str(incremental.getMetadataField(field)))
    
print('\n-------- After Setting Fields --------')
for field in fields:
    incremental.setMetadataField(field, 'test')
for field in fields:
    print(field + ': ' + str(incremental.getMetadataField(field)))
    
print('\n-------- Removing Fields --------')
for field in fields:
    # Note that required fields can only be changed, not removed
    try:
        incremental.removeMetadataField(field)
    except ValueError as e:
        print(str(e))
for field in fields:
    print(field + ': ' + str(incremental.getMetadataField(field)))

In [None]:
print('\n-------- Full Metadata Dictionary --------')
print(incremental.imageMetadata)
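
The rule seen above, that required fields can be changed but not removed, can be modeled with a small container. `ToyMetadata` is hypothetical and written only for this sketch; treating 'subject' as the sole required field is a simplifying assumption.

```python
class ToyMetadata:
    """Hypothetical container: required fields may be changed but not removed."""

    def __init__(self, fields, required):
        self._fields = dict(fields)
        self._required = set(required)

    def get(self, name):
        return self._fields.get(name)

    def set(self, name, value):
        self._fields[name] = value

    def remove(self, name):
        # Refuse to drop anything the container was told is required.
        if name in self._required:
            raise ValueError(f"'{name}' is required and cannot be removed")
        self._fields.pop(name, None)


meta = ToyMetadata({"subject": "01", "ProtocolName": "func"}, required={"subject"})
meta.set("subject", "test")      # changing a required field is allowed
meta.remove("ProtocolName")      # optional fields can be removed
try:
    meta.remove("subject")       # removing a required field raises ValueError
except ValueError as err:
    print(err)
```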

In addition to these methods, several properties expose particular entities or data related to the NIfTI image contained within the Incremental.

In [None]:
# Entities
print('Suffix:', incremental.suffix)
print('Datatype:', incremental.datatype)
print('BIDS Entities:', incremental.entities)

In [None]:
# Image properties
print('Image dimensions:', incremental.imageDimensions)
print('\nImage header:', incremental.imageHeader)
print('\nImage data:', incremental.imageData)

Because each `BIDS Incremental` can also be written out as a fully valid, on-disk BIDS Archive, it exposes several properties describing how its data would be laid out on disk in folders and files.

In [None]:
print('\n-------- Directory Names and Paths --------')
print('Dataset directory name:', incremental.datasetName)
print('Data directory path:', incremental.dataDirPath)

print('\n-------- File Names --------')
print('Image file name:', incremental.imageFileName)
print('Metadata file name:', incremental.metadataFileName)
print('Events file name:', incremental.eventsFileName)

print('\n-------- File Paths --------')
print('Image file path:', incremental.imageFilePath)
print('Metadata file path:', incremental.metadataFilePath)
print('Events file path:', incremental.eventsFilePath)

# BIDS Incremental: Writing to Disk

One of the key features of a `BIDS Incremental` is that it is also a valid, 1-image `BIDS Archive`. Thus, a `BIDS Incremental` can be written out to an archive on disk and navigated on the file system.

In [26]:
with tempfile.TemporaryDirectory() as td:
    incremental.writeToArchive(td)

    archiveFromIncremental = BidsArchive(td)
    print('Archive:', archiveFromIncremental)
    print('\nBIDS Files in Archive from Incremental:', archiveFromIncremental.get())

Archive: Root: ...61t76xtwh00000gn/T/tmpa3rpi9b5 | Subjects: 1 | Sessions: 1 | Runs: 1

BIDS Files in Archive from Incremental: [<BIDSJSONFile filename='/var/folders/3j/db49krhd0sq5n961t76xtwh00000gn/T/tmpa3rpi9b5/dataset_description.json'>, <BIDSFile filename='/var/folders/3j/db49krhd0sq5n961t76xtwh00000gn/T/tmpa3rpi9b5/README'>, <BIDSJSONFile filename='/var/folders/3j/db49krhd0sq5n961t76xtwh00000gn/T/tmpa3rpi9b5/sub-test/ses-01/func/sub-test_ses-01_task-test_run-1_bold.json'>, <BIDSImageFile filename='/var/folders/3j/db49krhd0sq5n961t76xtwh00000gn/T/tmpa3rpi9b5/sub-test/ses-01/func/sub-test_ses-01_task-test_run-1_bold.nii'>, <BIDSDataFile filename='/var/folders/3j/db49krhd0sq5n961t76xtwh00000gn/T/tmpa3rpi9b5/sub-test/ses-01/func/sub-test_ses-01_task-test_run-1_events.tsv'>]


# BIDS Incremental: Sending Over a Network