## Data input for BIDS datasets
`DataGrabber` and `SelectFiles` work well for generic datasets with arbitrary organization. However, if your data follow the Brain Imaging Data Structure (BIDS), whether you organized them yourself or obtained a BIDS dataset, you can take advantage of the formal structure that BIDS imposes. This short tutorial shows how.
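
To make the "formal structure" concrete: BIDS file names are built from `key-value` pairs separated by underscores, followed by a suffix and an extension. The sketch below uses only the standard library to split one file name into those parts. The helper `parse_bids_entities` is a hypothetical name introduced for illustration; it is not part of `pybids`, which handles this (and many more cases) for you.

```python
# Hypothetical helper for illustration only; pybids does this far more robustly.
def parse_bids_entities(filename):
    """Split a BIDS file name into key-value entities, suffix and extension."""
    # Everything after the first dot is treated as the extension (e.g. "nii.gz").
    stem, _, extension = filename.partition(".")
    parts = stem.split("_")
    # The final chunk carries no hyphen and names the kind of data (e.g. "bold").
    suffix = parts.pop() if "-" not in parts[-1] else None
    entities = dict(part.split("-", 1) for part in parts)
    entities["suffix"] = suffix
    entities["extension"] = "." + extension if extension else ""
    return entities


example = "sub-01_ses-test_task-fingerfootlips_bold.nii.gz"
print(parse_bids_entities(example))
# {'sub': '01', 'ses': 'test', 'task': 'fingerfootlips', 'suffix': 'bold', 'extension': '.nii.gz'}
```

Because every file name can be decomposed this way, a tool can answer questions such as "all `bold` files of subject `01`" without any dataset-specific templates.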

## `pybids` - a Python API for working with BIDS datasets
`pybids` is a lightweight Python API for querying a BIDS folder structure for specific files and metadata. You can install it from PyPI:
```
pip install pybids
```
Note that it should already be installed in the tutorial Docker image.

## The `layout` object and simple queries
To begin working with `pybids`, we need to initialize a layout object. We will use it for all of our queries.

In [2]:
from bids.layout import BIDSLayout
layout = BIDSLayout("/home/neuro/Data/ds000114/")

Let's find out which subject labels this dataset contains.

In [5]:
layout.get_subjects()

['06', '10', '04', '09', '08', '05', '01', '02', '07', '03']

What datatypes are included in this dataset?

In [6]:
layout.get_datatypes()

['dwi', 'func', 'anat']

Which suffixes are used by the functional (`func`) data in this dataset?

In [7]:
layout.get_suffixes(datatype='func')

['events', 'bold']

What are the different tasks included in this dataset?

In [9]:
layout.get_tasks()

['covertverbgeneration',
 'overtverbgeneration',
 'fingerfootlips',
 'linebisection',
 'overtwordrepetition']

We can also ask for all of the data for a particular subject, session, and datatype.

In [10]:
layout.get(subject='01', datatype="anat", session="test")

[<BIDSFile filename='sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz'>]

We can also ask for a specific subset of data. Note that we use the extension filter to get just the imaging data (BIDS allows both `.nii` and `.nii.gz`, so we need to include both).

In [11]:
layout.get(subject='01', suffix='bold', extensions=['nii', 'nii.gz'])

[<BIDSFile filename='sub-01/ses-retest/func/sub-01_ses-retest_task-covertverbgeneration_bold.nii.gz'>,
 <BIDSFile filename='sub-01/ses-retest/func/sub-01_ses-retest_task-fingerfootlips_bold.nii.gz'>,
 <BIDSFile filename='sub-01/ses-retest/func/sub-01_ses-retest_task-linebisection_bold.nii.gz'>,
 <BIDSFile filename='sub-01/ses-retest/func/sub-01_ses-retest_task-overtverbgeneration_bold.nii.gz'>,
 <BIDSFile filename='sub-01/ses-retest/func/sub-01_ses-retest_task-overtwordrepetition_bold.nii.gz'>,
 <BIDSFile filename='sub-01/ses-test/func/sub-01_ses-test_task-covertverbgeneration_bold.nii.gz'>,
 <BIDSFile filename='sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz'>,
 <BIDSFile filename='sub-01/ses-test/func/sub-01_ses-test_task-linebisection_bold.nii.gz'>,
 <BIDSFile filename='sub-01/ses-test/func/sub-01_ses-test_task-overtverbgeneration_bold.nii.gz'>,
 <BIDSFile filename='sub-01/ses-test/func/sub-01_ses-test_task-overtwordrepetition_bold.nii.gz'>]

You probably noticed that this method returns not just file paths but `BIDSFile` objects that carry the relevant query fields. To get plain file paths instead, pass `return_type='file'`.

In [12]:
layout.get(subject='01', suffix='bold', extensions=['nii', 'nii.gz'], return_type='file')

['/home/neuro/Data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-covertverbgeneration_bold.nii.gz',
 '/home/neuro/Data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-fingerfootlips_bold.nii.gz',
 '/home/neuro/Data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-linebisection_bold.nii.gz',
 '/home/neuro/Data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-overtverbgeneration_bold.nii.gz',
 '/home/neuro/Data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-overtwordrepetition_bold.nii.gz',
 '/home/neuro/Data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-covertverbgeneration_bold.nii.gz',
 '/home/neuro/Data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz',
 '/home/neuro/Data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-linebisection_bold.nii.gz',
 '/home/neuro/Data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-overtverbgeneration_bold.nii.gz',
 '/home/neuro/Data/ds000114/sub-01/ses-test/func/sub-01_ses-test
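
The queries in this section combine their filters: a file is returned only if it satisfies every filter, while a list of values (as used for the extension) matches any of its members. The following standard-library sketch mimics that logic on a made-up list of file names. `simple_get` and `FILES` are hypothetical names for illustration and do not reflect how `pybids` is implemented.

```python
# Made-up file list for the demo; a real layout would index the dataset on disk.
FILES = [
    "sub-01/ses-test/func/sub-01_ses-test_task-linebisection_bold.nii.gz",
    "sub-01/ses-test/func/sub-01_ses-test_task-linebisection_events.tsv",
    "sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz",
    "sub-02/ses-test/func/sub-02_ses-test_task-linebisection_bold.nii.gz",
]


def simple_get(files, subject=None, suffix=None, extensions=None):
    """Return the files that satisfy every filter that was given."""
    matches = []
    for path in files:
        name = path.rsplit("/", 1)[-1]
        stem, _, ext = name.partition(".")
        if subject is not None and not name.startswith(f"sub-{subject}_"):
            continue  # wrong subject
        if suffix is not None and stem.rsplit("_", 1)[-1] != suffix:
            continue  # wrong kind of data
        if extensions is not None and ext not in extensions:
            continue  # none of the accepted extensions
        matches.append(path)
    return matches


bold_images = simple_get(FILES, subject="01", suffix="bold", extensions=["nii", "nii.gz"])
print(bold_images)
# ['sub-01/ses-test/func/sub-01_ses-test_task-linebisection_bold.nii.gz']
```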

### Exercise 1:
List all files for the "linebisection" task for subject 02.

In [16]:
#write your solution here
layout.get(subject='02', suffix='bold', task='linebisection', return_type='file')

['/home/neuro/Data/ds000114/sub-02/ses-retest/func/sub-02_ses-retest_task-linebisection_bold.nii.gz',
 '/home/neuro/Data/ds000114/sub-02/ses-test/func/sub-02_ses-test_task-linebisection_bold.nii.gz']

In [15]:
from bids.layout import BIDSLayout
layout = BIDSLayout("/home/neuro/Data/ds000114/")

layout.get(subject='02', return_type='file', task="linebisection")

['/home/neuro/Data/ds000114/sub-02/ses-retest/func/sub-02_ses-retest_task-linebisection_bold.nii.gz',
 '/home/neuro/Data/ds000114/sub-02/ses-retest/func/sub-02_ses-retest_task-linebisection_events.tsv',
 '/home/neuro/Data/ds000114/sub-02/ses-test/func/sub-02_ses-test_task-linebisection_bold.nii.gz',
 '/home/neuro/Data/ds000114/sub-02/ses-test/func/sub-02_ses-test_task-linebisection_events.tsv']

## `BIDSDataGrabber`: Including `pybids` in your `nipype` workflow
This is great, but what we really want is to include this in our nipype workflows. To do this, we can import `BIDSDataGrabber`, which provides an `Interface` for `BIDSLayout.get`.

In [17]:
from nipype.interfaces.io import BIDSDataGrabber
from nipype.pipeline import Node, MapNode, Workflow
from nipype.interfaces.utility import Function

bg = Node(BIDSDataGrabber(), name='bids-grabber')
bg.inputs.base_dir = '/home/neuro/Data/ds000114'



You can define static filters that apply to all queries by setting the corresponding input.

In [18]:
bg.inputs.subject = '01'
res = bg.run()
res.outputs

211102-16:19:41,286 nipype.workflow INFO:
	 [Node] Setting-up "bids-grabber" in "/tmp/tmpu7u1eyql/bids-grabber".
211102-16:19:41,289 nipype.workflow INFO:
	 [Node] Running "bids-grabber" ("nipype.interfaces.io.BIDSDataGrabber")
211102-16:19:41,488 nipype.workflow INFO:
	 [Node] Finished "bids-grabber".



T1w = ['/home/neuro/Data/ds000114/sub-01/ses-retest/anat/sub-01_ses-retest_T1w.nii.gz', '/home/neuro/Data/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz']
bold = ['/home/neuro/Data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-covertverbgeneration_bold.nii.gz', '/home/neuro/Data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-fingerfootlips_bold.nii.gz', '/home/neuro/Data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-linebisection_bold.nii.gz', '/home/neuro/Data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-overtverbgeneration_bold.nii.gz', '/home/neuro/Data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-overtwordrepetition_bold.nii.gz', '/home/neuro/Data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-covertverbgeneration_bold.nii.gz', '/home/neuro/Data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz', '/home/neuro/Data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-linebisection_bold.nii.gz', '/

Note that by default `BIDSDataGrabber` fetches NIfTI files matching the datatypes `func` and `anat` and returns them in two output fields, `bold` and `T1w`.

To define custom output fields, pass the arguments for `BIDSLayout.get` as a dictionary keyed by field name, like so:

In [29]:
bg.inputs.output_query = {'bolds': dict(suffix='bold', task='linebisection')}
res = bg.run()
res.outputs

211102-16:36:56,651 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.bids-grabber" in "/tmp/tmpu7u1eyql/bids-grabber".
211102-16:36:56,655 nipype.workflow INFO:
	 [Node] Running "bids-grabber" ("nipype.interfaces.io.BIDSDataGrabber")
211102-16:36:56,844 nipype.workflow INFO:
	 [Node] Finished "bids_demo.bids-grabber".



bolds = ['/home/neuro/Data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-linebisection_bold.nii.gz', '/home/neuro/Data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-linebisection_bold.nii.gz']

This results in a single output field, `bolds`, which contains all files with `suffix:bold` and `task:linebisection` for `subject:"01"`.
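
One way to picture what happens here: each entry of `output_query` becomes one output field, and the static filters (such as `subject`) are added to every entry before the query runs. The sketch below is a plain-Python approximation under that assumption; `fake_layout_get` and `collect_outputs` are hypothetical names, and this is not nipype's actual code.

```python
# Hypothetical stand-in for BIDSLayout.get: it only reports the filters it received.
def fake_layout_get(**filters):
    return sorted(f"{key}={value}" for key, value in filters.items())


def collect_outputs(static_filters, output_query):
    """Build one output field per entry of output_query."""
    outputs = {}
    for field_name, query in output_query.items():
        # Assumption: shared filters are merged into each per-field query.
        combined = {**query, **static_filters}
        outputs[field_name] = fake_layout_get(**combined)
    return outputs


outputs = collect_outputs(
    static_filters={"subject": "01"},
    output_query={"bolds": dict(suffix="bold", task="linebisection")},
)
print(outputs)
# {'bolds': ['subject=01', 'suffix=bold', 'task=linebisection']}
```

The name of the output field is simply the dictionary key, which is why a downstream node has to be connected to that exact name.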

Now, let's put it in a workflow. We are not going to analyze any data, but for demonstration purposes, we will add a node that pretends to analyze its inputs.

In [30]:
def printMe(paths):
    print("\n\nanalyzing " + str(paths) + "\n\n")
    
analyzeBOLD = Node(Function(function=printMe, input_names=["paths"],
                            output_names=[]), name="analyzeBOLD")

In [32]:
wf = Workflow(base_dir='/home/neuro/Result/Nipype_tutorial/working_dir', name="bids_demo")
wf.connect(bg, "bolds", analyzeBOLD, "paths")
wf.run()

211102-16:51:36,652 nipype.workflow INFO:
	 Workflow bids_demo settings: ['check', 'execution', 'logging', 'monitoring']
211102-16:51:36,670 nipype.workflow INFO:
	 Running serially.
211102-16:51:36,671 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.bids-grabber" in "/tmp/tmpu7u1eyql/bids-grabber".
211102-16:51:36,674 nipype.workflow INFO:
	 [Node] Running "bids-grabber" ("nipype.interfaces.io.BIDSDataGrabber")
211102-16:51:36,860 nipype.workflow INFO:
	 [Node] Finished "bids_demo.bids-grabber".
211102-16:51:36,861 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.analyzeBOLD" in "/home/neuro/Result/Nipype_tutorial/working_dir/bids_demo/analyzeBOLD".
211102-16:51:36,869 nipype.workflow INFO:
	 [Node] Cached "bids_demo.analyzeBOLD" - collecting precomputed outputs
211102-16:51:36,870 nipype.workflow INFO:
	 [Node] "bids_demo.analyzeBOLD" found cached.


<networkx.classes.digraph.DiGraph at 0x7f086329dd90>

### Exercise 2:
Modify the `BIDSDataGrabber` and the workflow to collect T1w images for subject `10`.

In [None]:
# write your solution here

In [34]:
from nipype.pipeline import Node, MapNode, Workflow
from nipype.interfaces.io import BIDSDataGrabber

ex2_BIDSDataGrabber = BIDSDataGrabber()
ex2_BIDSDataGrabber.inputs.base_dir = '/home/neuro/Data/ds000114'
ex2_BIDSDataGrabber.inputs.subject = '10'
ex2_BIDSDataGrabber.inputs.output_query = {'T1w': dict(datatype='anat')}

ex2_res = ex2_BIDSDataGrabber.run()
ex2_res.outputs


T1w = ['/home/neuro/Data/ds000114/sub-10/ses-retest/anat/sub-10_ses-retest_T1w.nii.gz', '/home/neuro/Data/ds000114/sub-10/ses-test/anat/sub-10_ses-test_T1w.nii.gz']

## Iterating over subject labels
In the previous example, we demonstrated how to use `pybids` to "analyze" one subject. How can we scale this to all subjects? Easy: by using `iterables` (more in [Iteration/Iterables](basic_iteration.ipynb)).
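
Conceptually, setting `iterables` on a node makes the workflow run once per listed value, each in its own branch. The plain-Python sketch below illustrates only that idea; `run_pipeline` is a hypothetical stand-in for the grabber and analysis nodes, and the branch naming follows the `_subject_10` pattern that nipype prints in its working-directory paths.

```python
def run_pipeline(subject):
    """Hypothetical stand-in for one grabber + analysis run."""
    return f"analyzing bold files of sub-{subject}"


# Comparable in spirit to: node.iterables = ("subject", ["01", "02"])
iterable_field, values = ("subject", ["01", "02"])

# One independent branch per value, keyed the way nipype names its sub-folders.
results = {f"_{iterable_field}_{value}": run_pipeline(value) for value in values}

for branch, message in results.items():
    print(branch, "->", message)
# _subject_01 -> analyzing bold files of sub-01
# _subject_02 -> analyzing bold files of sub-02
```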

In [35]:
bg_all = Node(BIDSDataGrabber(), name='bids-grabber')
bg_all.inputs.base_dir = '/home/neuro/Data/ds000114'
bg_all.inputs.output_query = {'bolds': dict(suffix='bold')}
bg_all.iterables = ('subject', layout.get_subjects()[:2])
wf = Workflow(name="bids_demo")
wf.connect(bg_all, "bolds", analyzeBOLD, "paths")
wf.run()

211102-16:58:28,500 nipype.workflow INFO:
	 Workflow bids_demo settings: ['check', 'execution', 'logging', 'monitoring']
211102-16:58:28,505 nipype.workflow INFO:
	 Running serially.
211102-16:58:28,506 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.bids-grabber" in "/tmp/tmpqahi6sy3/bids_demo/_subject_10/bids-grabber".
211102-16:58:28,508 nipype.workflow INFO:
	 [Node] Running "bids-grabber" ("nipype.interfaces.io.BIDSDataGrabber")
211102-16:58:28,694 nipype.workflow INFO:
	 [Node] Finished "bids_demo.bids-grabber".
211102-16:58:28,695 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.analyzeBOLD" in "/tmp/tmpaz8e_t8y/bids_demo/_subject_10/analyzeBOLD".
211102-16:58:28,697 nipype.workflow INFO:
	 [Node] Running "analyzeBOLD" ("nipype.interfaces.utility.wrappers.Function")


analyzing ['/home/neuro/Data/ds000114/sub-10/ses-retest/func/sub-10_ses-retest_task-covertverbgeneration_bold.nii.gz', '/home/neuro/Data/ds000114/sub-10/ses-retest/func/sub-10_ses-retest_task-fingerfootlip

<networkx.classes.digraph.DiGraph at 0x7f08632df130>

## Accessing additional metadata
Querying different files is nice, but sometimes you want to access more metadata, for example `RepetitionTime`. `pybids` can help with that as well.

In [36]:
layout.get_metadata('/home/neuro/Data/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz')

{'EchoTime': 0.05,
 'FlipAngle': 90,
 'RepetitionTime': 2.5,
 'SliceTiming': [0.0,
  1.2499999999999998,
  0.08333333333333333,
  1.333333333333333,
  0.16666666666666666,
  1.4166666666666663,
  0.25,
  1.4999999999999996,
  0.3333333333333333,
  1.5833333333333328,
  0.41666666666666663,
  1.666666666666666,
  0.5,
  1.7499999999999993,
  0.5833333333333333,
  1.8333333333333326,
  0.6666666666666666,
  1.9166666666666659,
  0.75,
  1.9999999999999991,
  0.8333333333333333,
  2.083333333333332,
  0.9166666666666666,
  2.1666666666666656,
  1.0,
  2.249999999999999,
  1.0833333333333333,
  2.333333333333332,
  1.1666666666666665,
  2.416666666666665],
 'TaskName': 'finger_foot_lips'}
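
In BIDS, metadata such as these live in JSON "sidecar" files, and a sidecar placed higher in the folder tree applies to every matching file below it unless a more specific sidecar overrides a value (the inheritance principle). The standard-library sketch below builds a tiny temporary dataset to illustrate this. `read_metadata` is a hypothetical, heavily simplified helper, and the override value is made up for the demo.

```python
import json
import tempfile
from pathlib import Path


def read_metadata(dataset_root, image_path):
    """Merge applicable JSON sidecars, from the dataset root down to the file's folder."""
    dataset_root, image_path = Path(dataset_root), Path(image_path)
    image_parts = set(image_path.name.split(".")[0].split("_"))
    # Folders from the dataset root down to the one that contains the image.
    folders = [dataset_root]
    for part in image_path.parent.relative_to(dataset_root).parts:
        folders.append(folders[-1] / part)
    metadata = {}
    for folder in folders:
        for sidecar in sorted(folder.glob("*.json")):
            # A sidecar applies if all of its name parts also occur in the image name.
            if set(sidecar.stem.split("_")) <= image_parts:
                metadata.update(json.loads(sidecar.read_text()))
    return metadata


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    func_dir = root / "sub-01" / "ses-test" / "func"
    func_dir.mkdir(parents=True)
    image = func_dir / "sub-01_ses-test_task-fingerfootlips_bold.nii.gz"
    image.touch()
    # Dataset-level sidecar shared by every fingerfootlips run.
    (root / "task-fingerfootlips_bold.json").write_text(
        json.dumps({"RepetitionTime": 2.5, "EchoTime": 0.05})
    )
    # File-level sidecar that overrides one value (made-up number).
    (func_dir / "sub-01_ses-test_task-fingerfootlips_bold.json").write_text(
        json.dumps({"EchoTime": 0.03})
    )
    metadata = read_metadata(root, image)

print(metadata)
# {'RepetitionTime': 2.5, 'EchoTime': 0.03}
```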

Can we incorporate this into our pipeline? Yes, we can! To do so, let's wrap a function that uses `BIDSLayout` in a `Function` interface and run it in a `MapNode`, once per file.
(More about `MapNode` in [MapNode](basic_mapnodes.ipynb).)
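
As a rough mental model, a `MapNode` calls the wrapped function once for every element of the input named in `iterfield`, while all other inputs stay the same for each call. The sketch below mimics that behaviour in plain Python; `map_over` and `describe_file` are hypothetical names, and the file names are placeholders.

```python
def describe_file(path, data_dir):
    """Hypothetical per-file function standing in for the one wrapped by the node."""
    return f"analyzing {path} (dataset: {data_dir})"


def map_over(function, iterfield, inputs):
    """Call function once per element of inputs[iterfield], keeping other inputs fixed."""
    fixed = {key: value for key, value in inputs.items() if key != iterfield}
    return [function(**{iterfield: item}, **fixed) for item in inputs[iterfield]]


results = map_over(
    describe_file,
    iterfield="path",
    inputs={
        "path": ["run-1_bold.nii.gz", "run-2_bold.nii.gz"],  # placeholder names
        "data_dir": "/some/dataset",
    },
)
for line in results:
    print(line)
# analyzing run-1_bold.nii.gz (dataset: /some/dataset)
# analyzing run-2_bold.nii.gz (dataset: /some/dataset)
```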

In [37]:
def printMetadata(path, data_dir):
    from bids.layout import BIDSLayout
    layout = BIDSLayout(data_dir)
    print("\n\nanalyzing " + path + "\nTR: "+ str(layout.get_metadata(path)["RepetitionTime"]) + "\n\n")
    
analyzeBOLD2 = MapNode(Function(function=printMetadata, input_names=["path", "data_dir"],
                             output_names=[]), name="analyzeBOLD2", iterfield="path")
analyzeBOLD2.inputs.data_dir = "/home/neuro/Data/ds000114/"

In [38]:
wf = Workflow(name="bids_demo")
wf.connect(bg, "bolds", analyzeBOLD2, "path")
wf.run()

211102-17:01:22,830 nipype.workflow INFO:
	 Workflow bids_demo settings: ['check', 'execution', 'logging', 'monitoring']
211102-17:01:22,834 nipype.workflow INFO:
	 Running serially.
211102-17:01:22,835 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.bids-grabber" in "/tmp/tmpu7u1eyql/bids-grabber".
211102-17:01:22,838 nipype.workflow INFO:
	 [Node] Running "bids-grabber" ("nipype.interfaces.io.BIDSDataGrabber")
211102-17:01:23,24 nipype.workflow INFO:
	 [Node] Finished "bids_demo.bids-grabber".
211102-17:01:23,25 nipype.workflow INFO:
	 [Node] Setting-up "bids_demo.analyzeBOLD2" in "/tmp/tmpzdm85xwg/bids_demo/analyzeBOLD2".
211102-17:01:23,27 nipype.workflow INFO:
	 [Node] Setting-up "_analyzeBOLD20" in "/tmp/tmpzdm85xwg/bids_demo/analyzeBOLD2/mapflow/_analyzeBOLD20".
211102-17:01:23,29 nipype.workflow INFO:
	 [Node] Running "_analyzeBOLD20" ("nipype.interfaces.utility.wrappers.Function")


analyzing /home/neuro/Data/ds000114/sub-01/ses-retest/func/sub-01_ses-retest_task-linebise

<networkx.classes.digraph.DiGraph at 0x7f0863272640>

### Exercise 3:
Modify the `printMetadata` function to also print `EchoTime`.

In [None]:
# write your solution here

In [None]:
from nipype.pipeline import Node, MapNode, Workflow
from nipype.interfaces.io import BIDSDataGrabber
from nipype.interfaces.utility import Function

ex3_BIDSDataGrabber = Node(BIDSDataGrabber(), name='bids-grabber')
ex3_BIDSDataGrabber.inputs.base_dir = '/home/neuro/Data/ds000114'
ex3_BIDSDataGrabber.inputs.subject = '01'
ex3_BIDSDataGrabber.inputs.output_query = {'bolds': dict(suffix='bold')}

In [None]:
# and now modify analyzeBOLD2
def printMetadata_et(path, data_dir):
    from bids.layout import BIDSLayout
    layout = BIDSLayout(data_dir)
    # look up the metadata once and reuse it for both fields
    metadata = layout.get_metadata(path)
    print("\n\nanalyzing " + path +
          "\nTR: " + str(metadata["RepetitionTime"]) +
          "\nTE: " + str(metadata["EchoTime"]) + "\n\n")
    
ex3_analyzeBOLD2 = MapNode(Function(function=printMetadata_et, 
                                    input_names=["path", "data_dir"],
                                    output_names=[]), 
                           name="ex3", iterfield="path")
ex3_analyzeBOLD2.inputs.data_dir = "/home/neuro/Data/ds000114/"

# and create a new workflow
ex3_wf = Workflow(name="ex3")
ex3_wf.connect(ex3_BIDSDataGrabber, "bolds", ex3_analyzeBOLD2, "path")
ex3_wf.run()