This notebook shows how to use the `BidsArchive` class.

# Preliminaries

There are a few terms that are important to understand before starting to use BIDS Archive.

## BIDS Entities

BIDS Entities, referred to later as just 'entities', represent metadata about files in the archive. You may already be familiar with common ones, like 'subject', 'task', and 'run'. 

Most entities are used in a key-value form and have their name and value present wherever they are used. They have three main representations. The first is the entity itself, the one-word, all lowercase string (e.g., 'subject'). The second is the entity's name, which may be several words (e.g., 'Contrast Enhancing Agent' is the name for the 'ceagent' entity). The third is the entity's key, which is typically shorter and used in file names (e.g., 'ce' for the 'ceagent' entity). A few entities and their multiple representations are shown in the table below:

| Entity | Name | Key |
| --- | --- | --- |
| subject | Subject | sub |
| session | Session | ses |
| run | Run | run |
| ceagent | Contrast Enhancing Agent | ce |

Some entities aren't used in key-value format and have only one representation. Examples include 'datatype' (e.g., 'func' or 'anat'), 'extension' (e.g., '.nii', '.nii.gz', or '.json'), and 'suffix' (e.g., 'bold'). The exercise below shows how these appear in file naming and archive organization.

Together, these entities provide a unique and consistent way to name files and organize the BIDS dataset.
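
To make the key-value naming concrete, here is a minimal sketch that assembles a BIDS-style file name from entity values. The helper `buildFilename` and the `ENTITY_KEYS` list are hypothetical names used only for illustration (they are not part of `BidsArchive`), and the list covers only the entities from the table above.

```python
# Hypothetical lookup covering only the entities listed in the table above,
# in the order they appear in a file name
ENTITY_KEYS = [('subject', 'sub'), ('session', 'ses'),
               ('ceagent', 'ce'), ('run', 'run')]


def buildFilename(suffix, extension, **entities):
    """Join 'key-value' pairs with '_', then add the suffix and extension."""
    parts = ['{}-{}'.format(key, entities[entity])
             for entity, key in ENTITY_KEYS if entity in entities]
    parts.append(suffix)
    return '_'.join(parts) + extension


print(buildFilename('T1w', '.nii.gz', subject='02', session='01'))
# sub-02_ses-01_T1w.nii.gz
```

Entities that are not supplied are simply left out of the name, which is why a dataset without sessions has no `ses-` component in its file names.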

#### Exercise: What entities are present in the path `sub-01/func/sub-01_task-languageproduction_run-01_bold.nii.gz`, and what are the entity values? 

##### Answer: 
| Entity | Value |
| --- | --- |
| subject | 01 |
| datatype | func |
| task | languageproduction |
| run | 01 |
| suffix | bold |
| extension | .nii.gz |
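
The answer above can also be reproduced with a few lines of plain Python. This is only a sketch under simplifying assumptions: the file sits directly inside a datatype directory, and only the keys in the hypothetical `KEY_TO_ENTITY` table are translated to entity names. `parseBidsPath` is an illustrative helper, not a `BidsArchive` method.

```python
import posixpath

# Hypothetical key -> entity table for the entities discussed in this notebook
# ('task' and 'run' use the same string for key and entity)
KEY_TO_ENTITY = {'sub': 'subject', 'ses': 'session', 'task': 'task',
                 'run': 'run', 'ce': 'ceagent'}


def parseBidsPath(path):
    """Split a '/'-separated BIDS-style path into an entity -> value dict."""
    entities = {}
    directory, filename = posixpath.split(path)
    # Assumption: the directory holding the file is the datatype directory
    if directory:
        entities['datatype'] = posixpath.basename(directory)
    # Everything from the first '.' onward is the extension (covers '.nii.gz')
    stem, dot, ext = filename.partition('.')
    entities['extension'] = dot + ext
    # The stem is a series of 'key-value' pairs joined by '_', then the suffix
    *pairs, suffix = stem.split('_')
    entities['suffix'] = suffix
    for pair in pairs:
        key, _, value = pair.partition('-')
        entities[KEY_TO_ENTITY.get(key, key)] = value
    return entities


EXAMPLE = 'sub-01/func/sub-01_task-languageproduction_run-01_bold.nii.gz'
for entity, value in parseBidsPath(EXAMPLE).items():
    print(entity, '=', value)
```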

# BIDS Archive: Opening Existing Dataset

Objective: Learn how to create a BIDS Archive pointing to a specific dataset on disk.

Procedure:
1. Download a small, sample dataset from OpenNeuro to use with `BidsArchive`.
2. Open the dataset using `BidsArchive` and print out some summary data about it.

In [1]:
""" Add rtCommon to the path """
import os
import sys
currPath = os.path.dirname(os.path.realpath(os.getcwd()))
rootPath = os.path.dirname(currPath)
sys.path.append(rootPath)


""" Download the dataset """
import subprocess

# https://openneuro.org/datasets/ds002014/versions/1.0.1/download -- <40MB dataset
TARGET_DIR = 'dataset'
# Pass the command as an argument list so a TARGET_DIR containing spaces still works
command = ['aws', 's3', 'sync', '--no-sign-request', 's3://openneuro.org/ds002014', TARGET_DIR]
if subprocess.call(command) == 0:
    print("Dataset successfully downloaded")
else:
    print("Error in calling download command")
    

""" Open downloaded dataset """
from rtCommon.bidsArchive import BidsArchive

archive = BidsArchive(TARGET_DIR)
print('Archive: ', archive)

Dataset successfully downloaded
Archive:  Root: ...t-cloud/docs/tutorials/dataset | Subjects: 1 | Sessions: 0 | Runs: 1
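
The download step above depends on the `aws` command-line tool being installed; if it is missing, `subprocess.call` raises `FileNotFoundError` instead of returning a nonzero exit code. A minimal sketch of checking for the tool first is shown below. The helper names `cliAvailable` and `buildSyncCommand` are hypothetical and exist only for this example.

```python
import shutil


def cliAvailable(name):
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


def buildSyncCommand(targetDir, datasetId='ds002014'):
    """Build the same 'aws s3 sync' call used above, as an argument list."""
    return ['aws', 's3', 'sync', '--no-sign-request',
            's3://openneuro.org/' + datasetId, targetDir]


if not cliAvailable('aws'):
    print('The aws command-line tool was not found on the PATH')
print(buildSyncCommand('dataset'))
```

The resulting list can be passed to `subprocess.call` exactly as in the cell above.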


# BIDS Archive: Querying Dataset

Objective: Learn how to extract information and files from the `BidsArchive`.

Procedure:

1. Search for images in the dataset.
2. Search for sidecar metadata for the images in the dataset.

In [2]:
# Values of any BIDS entity can be listed with the matching getter (e.g., getSubjects(), getRuns(), getTasks())
print('Dataset info: Subjects: {subjects} | Runs: {runs} | Tasks: {tasks}\n'
      .format(subjects=archive.getSubjects(), runs=archive.getRuns(), tasks=archive.getTasks()))

# Arguments can be passed as keywords or using a dictionary with equivalent results
entityDict = {'subject': archive.getSubjects()[0], 'run': archive.getRuns()[0]}
imagesUsingDict = archive.getImages(**entityDict)
imagesUsingKeywords = archive.getImages(subject=archive.getSubjects()[0], run=archive.getRuns()[0])
assert imagesUsingDict == imagesUsingKeywords

print('Number of image files associated with Subject {}, Run {}: {}'.format(
    entityDict['subject'], entityDict['run'], len(imagesUsingDict)))

# Get all images from the functional runs
images = archive.getImages(datatype='func')
print('Number of functional images: {}'.format(len(images)))

# Anatomical images can be retrieved too
images = archive.getImages(datatype='anat')
print('Number of anatomical images: {}'.format(len(images)))

Dataset info: Subjects: ['01'] | Runs: [1] | Tasks: ['languageproduction']

Number of image files associated with Subject 01, Run 1: 1
Number of functional images: 1
Number of anatomical images: 1


In [3]:
# No images are returned if matches aren't found
subjectName='invalidSubject'
images = archive.getImages(subject=subjectName)
print('Number of image files associated with Subject "{}": {}'.format(subjectName, len(images)))

ERROR:rtCommon.bidsArchive:No images have all provided entities: {'subject': 'invalidSubject'}


Number of image files associated with Subject "invalidSubject": 0
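
As the error message suggests, a file is returned only when it has all of the provided entities. The matching rule can be pictured with the toy sketch below. It is not how `BidsArchive` is implemented; `filterByEntities` and the dictionaries standing in for image files are assumptions made for illustration.

```python
def filterByEntities(files, **entities):
    """Keep only the files whose entities match every requested value."""
    return [f for f in files
            if all(f.get(name) == value for name, value in entities.items())]


# Toy stand-ins for the two image files in this dataset
toyFiles = [
    {'subject': '01', 'datatype': 'anat', 'suffix': 'T1w'},
    {'subject': '01', 'datatype': 'func', 'suffix': 'bold',
     'task': 'languageproduction', 'run': 1},
]

print(len(filterByEntities(toyFiles, subject='01')))              # 2
print(len(filterByEntities(toyFiles, datatype='func', run=1)))    # 1
print(len(filterByEntities(toyFiles, subject='invalidSubject')))  # 0
```

Adding more entities to a query can only narrow the result, and a query with no entities matches every file.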


Now that we've seen how to get images from an archive, we'll look at how to get metadata for images we've retrieved from the archive.

To get metadata for an image, the path to the image file is required. Every `BIDSImageFile` returned from `getImages` has a `path` property you can use to obtain this path.
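
For intuition about where this metadata comes from: in a BIDS dataset, image metadata is stored in JSON sidecar files. The sketch below handles only the simplest case, where the sidecar sits next to the image and shares its file name stem; it ignores the BIDS inheritance principle, under which sidecars at higher directory levels also apply. `sidecarPath` and `readSidecar` are hypothetical helpers, not part of `BidsArchive`.

```python
import json
import os
import tempfile


def sidecarPath(imagePath):
    """Swap the image extension ('.nii.gz' or '.nii') for '.json'."""
    for ext in ('.nii.gz', '.nii'):
        if imagePath.endswith(ext):
            return imagePath[:-len(ext)] + '.json'
    raise ValueError('Not a NIfTI path: ' + imagePath)


def readSidecar(imagePath):
    """Load the JSON sidecar next to the image, or {} if there is none."""
    path = sidecarPath(imagePath)
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


# Demonstrate with a throwaway sidecar rather than the real dataset
with tempfile.TemporaryDirectory() as td:
    image = os.path.join(td, 'sub-01_T1w.nii.gz')
    with open(sidecarPath(image), 'w') as f:
        json.dump({'EchoTime': 0.0025, 'FlipAngle': 9}, f)
    print(readSidecar(image))
```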

In [4]:
import json

# Get all image files, then create a dictionary mapping each image file's path to its metadata dictionary
imageFiles = archive.getImages()
metadata = {i.path: archive.getMetadata(i.path) for i in imageFiles}
for path, metaDict in metadata.items():
    print('Metadata for:', path, "is:\n", json.dumps(metaDict, indent=4, sort_keys=True), "\n")

Metadata for: /Users/stephen/Documents/princeton/fall2020/cos4978 thesis/rt-cloud/docs/tutorials/dataset/sub-01/anat/sub-01_T1w.nii.gz is:
 {
    "AcquisitionMatrixPE": 320,
    "AcquisitionNumber": 1,
    "AcquisitionTime": "16:23:42.600000",
    "ConversionSoftware": "dcm2niix",
    "ConversionSoftwareVersion": "v1.0.20190410  GCC4.8.2",
    "DeviceSerialNumber": "40720",
    "EchoTime": 0.0025,
    "FlipAngle": 9,
    "ImageOrientationPatientDICOM": [
        0.998291,
        0.0584448,
        0.000264799,
        -0.0340637,
        0.58551,
        -0.809949
    ],
    "ImageType": [
        "DERIVED",
        "SECONDARY",
        "MPR",
        "CSA",
        "MPR",
        "CSAPARALLEL",
        "M",
        "ND",
        "NORM"
    ],
    "ImagingFrequency": 123.188,
    "InPlanePhaseEncodingDirectionDICOM": "ROW",
    "InstitutionAddress": "Maraweg_21_Bielefeld_District_DE_33617",
    "InstitutionName": "EVKB_Mara_1",
    "InstitutionalDepartmentName": "Department",
    "Inv

The last piece of data we'll see how to get from an archive is the events file corresponding to a particular scanning run.

In [5]:
# Event files can be filtered by entities, as with getImages
events = archive.getEvents(subject='01', 
                           task='languageproduction', run=1)

# All event files are retrieved when no entities are specified
events = archive.getEvents()

# Event files are returned as BIDSDataFile objects
# See the PyBids documentation for more information on those
eventsFile = events[0]
print('Events file type: ', type(eventsFile))

# One method of the BIDSDataFile object returns
# a Pandas data frame of the events file
eventsDF = eventsFile.get_df()

print("Sample data: \n", eventsDF.head())

Events file type:  <class 'bids.layout.models.BIDSDataFile'>
Sample data: 
    onset  duration   trial_type
0      0        30         rest
1     30        30  occupations
2     60        30         rest
3     90        30      animals
4    120        30         rest


# BIDS Archive: Getting & Appending Incrementals

One of the most important functions that a BIDS Archive enables in the context of RT-Cloud is working with BIDS Incrementals. Using a BIDS Archive, you can extract data and package it into a BIDS Incremental using `getIncremental`, and you can append new data in BIDS Incrementals to the archive using `appendIncremental`.

For example, if you have a complete dataset that you want to test a new real-time experiment on, you can use `getIncremental` repeatedly to iterate over your entire dataset, streaming the resulting BIDS Incrementals to RT-Cloud and the new experimental script you want to try out. Then, when you're running your new experiment in RT-Cloud for real, as BIDS Incremental files are streamed from the scanner to your script, you can build up an archive of your entire experiment by calling `appendIncremental` for each BIDS Incremental you receive.

In [6]:
# Set up an archive and get an incremental
firstIncremental = archive.getIncremental(subject='01', task='languageproduction')

# Iterate through each time slice of the 4-D NIfTI file and 
# store the incrementals in order
entityFilterDict = {'subject': '01', 'task': 'languageproduction'}
NUM_SLICES = firstIncremental.imageDimensions[3]
incrementals = []
for i in range(NUM_SLICES):
    incrementals.append(archive.getIncremental(**entityFilterDict))
    
# Append them in the same order as they were retrieved to a new archive,
# then show that the two archives have the same data for that subject & task
import tempfile

with tempfile.TemporaryDirectory() as td:
    # Create new archive
    newArchive = BidsArchive(td)
    for inc in incrementals:
        newArchive.appendIncremental(inc)
        
    # Compare incrementals in original and new archive
    for i in range(NUM_SLICES):
        assert archive.getIncremental(**entityFilterDict) == \
               newArchive.getIncremental(**entityFilterDict)
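
The get-then-append round trip can be pictured with a toy model that leaves out everything BIDS-specific (metadata, file naming, NIfTI headers). `ToyArchive` and its methods are hypothetical and deliberately named differently from the `BidsArchive` API; the sketch only shows that appending volumes in the order they were read rebuilds the same run.

```python
class ToyArchive:
    """Hypothetical stand-in for an archive holding one run's volumes."""

    def __init__(self):
        self.volumes = []

    def appendVolume(self, volume):
        # New data always goes at the end of the run
        self.volumes.append(list(volume))

    def getVolume(self, index):
        # Return a copy so callers cannot modify the archive's data
        return list(self.volumes[index])


# A "run" of four tiny volumes, each flattened to a list of voxel values
source = ToyArchive()
for t in range(4):
    source.appendVolume([t, t + 10, t + 20])

# Stream volume by volume into an empty archive, as a scanner would
rebuilt = ToyArchive()
for i in range(len(source.volumes)):
    rebuilt.appendVolume(source.getVolume(i))

assert rebuilt.volumes == source.volumes
print('Rebuilt {} volumes'.format(len(rebuilt.volumes)))
```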

# BIDS Incremental: Creating Incremental

# BIDS Incremental: Querying Incremental

# BIDS Incremental: Writing to Disk

# BIDS Incremental: Sending Over a Network