### Test-building tutorial: CPAC.utils.bids_utils.bids_parse_sidecar

In this example, we're taking [an existing function](https://github.com/FCP-INDI/C-PAC/blob/4046b303bc3426d8fae3311a7c0fe0b540371c4e/CPAC/utils/bids_utils.py#L165-L268) and creating a test.

Our function definition:

In [None]:
def bids_parse_sidecar(config_dict, dbg=False):
    # type: (dict, bool) -> dict
    """Uses the BIDS principle of inheritance to build a data structure that
    maps parameters in side car .json files to components in the names of
    corresponding nifti files.

    :param config_dict: dictionary that maps paths of sidecar json files
       (the key) to a dictionary containing the contents of the files (the values)
    :param dbg: boolean flag that indicates whether or not debug statements
       should be printed
    :return: a dictionary that maps parameters to components from BIDS filenames
       such as sub, sess, run, acq, and scan type
    """

    # we are going to build a large-scale data structure, consisting of many
    # levels of dictionaries to hold the data.
    bids_config_dict = {}

    # initialize 'default' entries, this essentially is a pointer traversal
    # of the dictionary
    t_dict = bids_config_dict
    for level in ['scantype', 'site', 'sub', 'ses', 'task',
                  'acq', 'rec', 'dir', 'run']:
        key = '-'.join([level, 'none'])
        t_dict[key] = {}
        t_dict = t_dict[key]

    if dbg:
        print(bids_config_dict)

    # get the paths to the json yaml files in config_dict, the paths contain
    # the information needed to map the parameters from the jsons (the vals
    # of the config_dict) to corresponding nifti files. We sort the list
    # by the number of path components, so that we can iterate from the outer
    # most path to inner-most, which will help us address the BIDS inheritance
    # principle
    config_paths = sorted(
        list(config_dict.keys()),
        key=lambda p: len(p.split('/'))
    )

    if dbg:
        print(config_paths)

    for cp in config_paths:

        if dbg:
            print("processing %s" % (cp))

        # decode the filepath into its various components as defined by  BIDS
        f_dict = bids_decode_fname(cp)

        # handling inheritance is a complete pain, we will try to handle it by
        # build the key from the bottom up, starting with the most
        # parsimonious possible, incorporating configuration information that
        # exists at each level

        # first lets try to find any parameters that already apply at this
        # level using the information in the json's file path
        t_params = bids_retrieve_params(t_dict, f_dict)

        # now populate the parameters
        bids_config = {}
        if t_params:
            bids_config.update(t_params)

        # add in the information from this config file
        t_config = config_dict[cp]
        if t_config is list:
            t_config = t_config[0]

        try:
            bids_config.update(t_config)
        except ValueError:
            err = "\n[!] Could not properly parse the AWS S3 path provided " \
                  "- please double-check the bucket and the path.\n\nNote: " \
                  "This could either be an issue with the path or the way " \
                  "the data is organized in the directory. You can also " \
                  "try providing a specific site sub-directory.\n\n"
            raise ValueError(err)

        # now put the configuration in the data structure, by first iterating
        # to the location of the key, and then inserting it. When a key isn't
        # defined we use the "none" value. A "none" indicates that the
        # corresponding parameters apply to all possible settings of that key
        # e.g. run-1, run-2, ... will all map to run-none if no jsons
        # explicitly define values for those runs
        t_dict = bids_config_dict  # pointer to current dictionary
        for level in ['scantype', 'site', 'sub', 'ses', 'task', 'acq',
                      'rec', 'dir', 'run']:
            if level in f_dict:
                key = "-".join([level, f_dict[level]])
            else:
                key = "-".join([level, "none"])

            if key not in t_dict:
                t_dict[key] = {}

            t_dict = t_dict[key]

        t_dict.update(bids_config)

    return (bids_config_dict)
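
Two idioms in this function are easier to follow in isolation: the "pointer traversal" that builds the nested `'<level>-none'` defaults, and the sort that orders sidecar paths from outermost to innermost. The following is a minimal sketch rather than C-PAC code; `build_default_tree` and the two paths are hypothetical names invented for illustration.

```python
LEVELS = ['scantype', 'site', 'sub', 'ses', 'task',
          'acq', 'rec', 'dir', 'run']


def build_default_tree(levels):
    """Return nested dicts keyed '<level>-none', one dict per level.

    Hypothetical helper that mirrors the initialization loop above.
    """
    root = {}
    cursor = root  # the "pointer" that walks down into the structure
    for level in levels:
        key = '-'.join([level, 'none'])
        cursor[key] = {}
        cursor = cursor[key]  # step one level deeper
    return root


# Only three levels here, to keep the printed structure readable.
tree = build_default_tree(LEVELS[:3])
print(tree)
# {'scantype-none': {'site-none': {'sub-none': {}}}}

# Hypothetical sidecar paths: fewer path components sort first, so
# dataset-level sidecars are handled before subject-level ones.
paths = ['/data/sub-01/func/sub-01_task-rest_bold.json',
         '/data/task-rest_bold.json']
by_depth = sorted(paths, key=lambda p: len(p.split('/')))
print(by_depth[0])
# /data/task-rest_bold.json
```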

We need to import a couple of functions into this notebook. In the real module they sit in the same file as the function we're discussing, so no import is needed there.

In [None]:
from CPAC.utils.bids_utils import bids_decode_fname, bids_retrieve_params

We need a "dictionary that maps paths of sidecar json files (the key) to a dictionary containing the contents of the files (the values)" to pass to `config_dict`.
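
To make that shape concrete, here is a small hand-built dictionary of the same kind. It is only an illustration: the paths and sidecar contents are made up, and the exact form of the keys that C-PAC produces (relative or absolute paths) is not assumed here.

```python
import json

# Hypothetical sidecar files, written as the JSON text they might contain.
sidecar_text = {
    'task-rest_bold.json': '{"TaskName": "Rest", "RepetitionTime": 2.5}',
    'sub-01/func/sub-01_task-rest_bold.json': '{"RepetitionTime": 2.0}',
}

# Keys are sidecar paths; values are the parsed contents of each file.
example_config_dict = {
    path: json.loads(text) for path, text in sidecar_text.items()
}

print(example_config_dict['task-rest_bold.json'])
# {'TaskName': 'Rest', 'RepetitionTime': 2.5}
```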

First, we'll grab some sample data. `get_BIDS_examples_dir` will return a path to a local copy of [bids-examples](https://github.com/bids-standard/bids-examples), cloning that repository if necessary.

In [None]:
from CPAC.utils.tests import get_BIDS_examples_dir

examples_dir = get_BIDS_examples_dir()

`collect_bids_files_configs` creates the kind of dictionary we need, so we can import that function and some sample data to get our one required parameter. Prefixing a function or method with `?` in Jupyter will pull up the signature and docstring.

In [None]:
from CPAC.utils.bids_utils import collect_bids_files_configs
?collect_bids_files_configs

We don't need the `file_paths` list that `collect_bids_files_configs` returns, so we can just assign the second returned item (`collect_bids_files_configs(bids_dir)[1]`) to a variable.
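
Indexing with `[1]` is one way to discard the first returned item; tuple unpacking is another. The sketch below uses `fake_collect`, a hypothetical stand-in that only mimics the two-item return described above, so it runs without C-PAC installed.

```python
def fake_collect(bids_dir):
    """Hypothetical stand-in returning (file_paths, config_dict)."""
    file_paths = [bids_dir + '/sub-01/func/sub-01_task-rest_bold.nii.gz']
    config_dict = {bids_dir + '/task-rest_bold.json': {'RepetitionTime': 2.5}}
    return file_paths, config_dict


# Indexing keeps only the second returned item...
by_index = fake_collect('/data')[1]

# ...and unpacking does the same while naming what is thrown away.
_, by_unpacking = fake_collect('/data')

print(by_index == by_unpacking)
# True
```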

In [None]:
import os

synthetic_data = os.path.join(examples_dir, 'synthetic')
config_dict = collect_bids_files_configs(bids_dir=synthetic_data)[1]

Now we have the required parameter:

In [None]:
config_dict

so we can see what our function actually returns:

In [None]:
bids_parse_sidecar(config_dict)

So with our synthetic data, we get a bunch of levels specified at `'none'` plus some deeply nested `'TaskName'`s and `'RepetitionTime'`s. I'm not sure that's what we really want. With a little refactoring,

In [None]:
LEVEL_HIERARCHY = ['scantype', 'site', 'sub', 'ses', 'task', 'acq', 'rec',
                   'dir', 'run']

def bids_parse_sidecar(config_dict, dbg=False):
    """Uses the BIDS principle of inheritance to build a data structure that
    maps parameters in sidecar .json files to components in the names of
    corresponding nifti files.

    :param config_dict: dictionary that maps paths of sidecar JSON files
        (the key) to a dictionary containing the contents of the files
        (the values)
    :param dbg: boolean flag that indicates whether or not debug statements
         should be printed
    :return: a dictionary that maps parameters to components from BIDS filenames
       such as sub, sess, run, acq, and scan type
    """
    # we are going to build a large-scale data structure, consisting of many
    # levels of dictionaries to hold the data.
    bids_config_dict = {}

    # initialize
    t_dict = bids_config_dict

    # get the paths to the json yaml files in config_dict, the paths contain
    # the information needed to map the parameters from the jsons (the vals
    # of the config_dict) to corresponding nifti files. We sort the list
    # by the number of path components, so that we can iterate from the outer
    # most path to inner-most, which will help us address the BIDS inheritance
    # principle
    config_paths = sorted(
        list(config_dict.keys()),
        key=lambda p: len(p.split('/'))
    )

    if dbg:
        print(config_paths)

    for cp in config_paths:

        if dbg:
            print("processing %s" % (cp))

        # decode the filepath into its various components as defined by  BIDS
        f_dict = bids_decode_fname(cp)

        # handling inheritance is a complete pain, we will try to handle it by
        # build the key from the bottom up, starting with the most
        # parsimonious possible, incorporating configuration information that
        # exists at each level

        # first lets try to find any parameters that already apply at this
        # level using the information in the json's file path
        t_params = bids_retrieve_params(t_dict, f_dict)

        # now populate the parameters
        bids_config = {}
        if t_params:
            bids_config.update(t_params)

        # add in the information from this config file
        t_config = config_dict[cp]
        if isinstance(t_config, list):
            t_config = t_config[0]

        try:
            bids_config.update(t_config)
        except ValueError:
            err = "\n[!] Could not properly parse the AWS S3 path provided " \
                  "- please double-check the bucket and the path.\n\nNote: " \
                  "This could either be an issue with the path or the way " \
                  "the data is organized in the directory. You can also " \
                  "try providing a specific site sub-directory.\n\n"
            raise ValueError(err)

        # now put the configuration in the data structure, by first iterating
        # to the location of the key, and then inserting it. Levels that are
        # missing from the filename (or whose value is "none") are skipped,
        # so the parameters are nested only under the levels that the
        # filename actually defines
        t_dict = bids_config_dict  # pointer to current dictionary
        for level in LEVEL_HIERARCHY:
            key = None
            if level in f_dict:
                key = "-".join([level, f_dict[level]])

            if key is not None and key[-5:] != '-none':
                if key not in t_dict:
                    t_dict[key] = {}

                t_dict = t_dict[key]

        t_dict.update(bids_config)

    return (bids_config_dict)

we can have just the relevant levels:

In [None]:
bids_parse_sidecar(config_dict)

Now that we have a known input and output, we can set up a docstring test. [Doctests](https://docs.python.org/3.8/library/doctest.html) are, as you might guess from the name, tests that exist in docstrings. Most major Python testing libraries support doctests.

To set a doctest up, simply
* prefix each line to run with `>>> `
* prefix line continuations with `... ` (plus as many indents as you need)
* write the expected output (what would be printed to stdout) on the following lines without any prefix

To test the above, for example, we could do

```Python
>>> import os
>>> from CPAC.utils.tests import get_BIDS_examples_dir
>>> synth = os.path.join(get_BIDS_examples_dir(), 'synthetic')
>>> sidecar_dict = bids_parse_sidecar(
...     collect_bids_files_configs(synth)[1])
>>> sidecar_dict.get('scantype-bold').get('task-rest')
{'TaskName': 'Rest', 'RepetitionTime': 2.5}
```

This asserts that `sidecar_dict['scantype-bold']['task-rest']` equals `{'TaskName': 'Rest', 'RepetitionTime': 2.5}`.
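
The three prefix rules are easier to see on a function with no external dependencies. This is a minimal sketch: `scale` is a toy function invented for illustration, and the finder and runner calls are just one of several ways the standard `doctest` module lets us execute the examples.

```python
import doctest


def scale(values, factor=2):
    """Multiply every value by ``factor``.

    >>> scale([1, 2, 3])
    [2, 4, 6]
    >>> scale(
    ...     [1.5],
    ...     factor=4)
    [6.0]
    """
    return [v * factor for v in values]


# Collect the examples from the docstring and run them quietly.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(scale, name='scale', globs={'scale': scale}):
    runner.run(test)

# ``results`` carries the number of failed and attempted examples.
results = runner.summarize()
```

Each `>>> ` line (together with its `... ` continuations) counts as one example, so `results.attempted` is 2 here and `results.failed` is 0.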

If we plug that test into our docstring

In [None]:
LEVEL_HIERARCHY = ['scantype', 'site', 'sub', 'ses', 'task', 'acq', 'rec',
                   'dir', 'run']

def bids_parse_sidecar(config_dict, dbg=False):
    """Uses the BIDS principle of inheritance to build a data structure that
    maps parameters in sidecar .json files to components in the names of
    corresponding nifti files.

    :param config_dict: dictionary that maps paths of sidecar JSON files
        (the key) to a dictionary containing the contents of the files
        (the values)
    :param dbg: boolean flag that indicates whether or not debug statements
         should be printed
    :return: a dictionary that maps parameters to components from BIDS filenames
       such as sub, sess, run, acq, and scan type

    >>> import os
    >>> from CPAC.utils.tests import get_BIDS_examples_dir
    >>> synth = os.path.join(get_BIDS_examples_dir(), 'synthetic')
    >>> sidecar_dict = bids_parse_sidecar(
    ...     collect_bids_files_configs(synth)[1])
    >>> sidecar_dict.get('scantype-bold').get('task-rest')
    {'TaskName': 'Rest', 'RepetitionTime': 2.5}
    """
    # we are going to build a large-scale data structure, consisting of many
    # levels of dictionaries to hold the data.
    bids_config_dict = {}

    # initialize
    t_dict = bids_config_dict

    # get the paths to the json yaml files in config_dict, the paths contain
    # the information needed to map the parameters from the jsons (the vals
    # of the config_dict) to corresponding nifti files. We sort the list
    # by the number of path components, so that we can iterate from the outer
    # most path to inner-most, which will help us address the BIDS inheritance
    # principle
    config_paths = sorted(
        list(config_dict.keys()),
        key=lambda p: len(p.split('/'))
    )

    if dbg:
        print(config_paths)

    for cp in config_paths:

        if dbg:
            print("processing %s" % (cp))

        # decode the filepath into its various components as defined by  BIDS
        f_dict = bids_decode_fname(cp)

        # handling inheritance is a complete pain, we will try to handle it by
        # build the key from the bottom up, starting with the most
        # parsimonious possible, incorporating configuration information that
        # exists at each level

        # first lets try to find any parameters that already apply at this
        # level using the information in the json's file path
        t_params = bids_retrieve_params(t_dict, f_dict)

        # now populate the parameters
        bids_config = {}
        if t_params:
            bids_config.update(t_params)

        # add in the information from this config file
        t_config = config_dict[cp]
        if isinstance(t_config, list):
            t_config = t_config[0]

        try:
            bids_config.update(t_config)
        except ValueError:
            err = "\n[!] Could not properly parse the AWS S3 path provided " \
                  "- please double-check the bucket and the path.\n\nNote: " \
                  "This could either be an issue with the path or the way " \
                  "the data is organized in the directory. You can also " \
                  "try providing a specific site sub-directory.\n\n"
            raise ValueError(err)

        # now put the configuration in the data structure, by first iterating
        # to the location of the key, and then inserting it. Levels that are
        # missing from the filename (or whose value is "none") are skipped,
        # so the parameters are nested only under the levels that the
        # filename actually defines
        t_dict = bids_config_dict  # pointer to current dictionary
        for level in LEVEL_HIERARCHY:
            key = None
            if level in f_dict:
                key = "-".join([level, f_dict[level]])

            if key is not None and key[-5:] != '-none':
                if key not in t_dict:
                    t_dict[key] = {}

                t_dict = t_dict[key]

        t_dict.update(bids_config)

    return (bids_config_dict)

and run the doctest

In [None]:
from doctest import run_docstring_examples
run_docstring_examples(
    bids_parse_sidecar, locals(), name="bids_parse_sidecar",
    verbose=True)

we see the test succeeds!

We can include as many tests as we want, and make them as complicated as we want, so long as each test asserts a specific output to stdout.

In [None]:
LEVEL_HIERARCHY = ['scantype', 'site', 'sub', 'ses', 'task', 'acq', 'rec',
                   'dir', 'run']

def bids_parse_sidecar(config_dict, dbg=False):
    """Uses the BIDS principle of inheritance to build a data structure that
    maps parameters in sidecar .json files to components in the names of
    corresponding nifti files.

    :param config_dict: dictionary that maps paths of sidecar JSON files
        (the key) to a dictionary containing the contents of the files
        (the values)
    :param dbg: boolean flag that indicates whether or not debug statements
         should be printed
    :return: a dictionary that maps parameters to components from BIDS filenames
       such as sub, sess, run, acq, and scan type

    >>> import os
    >>> from CPAC.utils.tests import get_BIDS_examples_dir
    >>> examples = get_BIDS_examples_dir()
    >>> synth = os.path.join(examples, 'synthetic')
    >>> sidecar_dict = bids_parse_sidecar(
    ...     collect_bids_files_configs(synth)[1])
    >>> sidecar_dict.get('scantype-bold').get('task-rest')
    {'TaskName': 'Rest', 'RepetitionTime': 2.5}
    >>> sidecar_dict  # doctest: +NORMALIZE_WHITESPACE
    {'scantype-bold':
        {'task-nback': {'TaskName': 'N-Back', 'RepetitionTime': 2.5},
         'task-rest': {'TaskName': 'Rest', 'RepetitionTime': 2.5}}}
    >>> bids_parse_sidecar(collect_bids_files_configs(
    ...     os.path.join(examples, 'ds107')
    ... )[1])  # doctest: +NORMALIZE_WHITESPACE
    {'scantype-bold': {'task-onebacktask':
        {'RepetitionTime': 3.0, 'TaskName': 'one-back task'}}}
    """
    # we are going to build a large-scale data structure, consisting of many
    # levels of dictionaries to hold the data.
    bids_config_dict = {}

    # initialize
    t_dict = bids_config_dict

    # get the paths to the json yaml files in config_dict, the paths contain
    # the information needed to map the parameters from the jsons (the vals
    # of the config_dict) to corresponding nifti files. We sort the list
    # by the number of path components, so that we can iterate from the outer
    # most path to inner-most, which will help us address the BIDS inheritance
    # principle
    config_paths = sorted(
        list(config_dict.keys()),
        key=lambda p: len(p.split('/'))
    )

    if dbg:
        print(config_paths)

    for cp in config_paths:

        if dbg:
            print("processing %s" % (cp))

        # decode the filepath into its various components as defined by  BIDS
        f_dict = bids_decode_fname(cp)

        # handling inheritance is a complete pain, we will try to handle it by
        # build the key from the bottom up, starting with the most
        # parsimonious possible, incorporating configuration information that
        # exists at each level

        # first lets try to find any parameters that already apply at this
        # level using the information in the json's file path
        t_params = bids_retrieve_params(t_dict, f_dict)

        # now populate the parameters
        bids_config = {}
        if t_params:
            bids_config.update(t_params)

        # add in the information from this config file
        t_config = config_dict[cp]
        if isinstance(t_config, list):
            t_config = t_config[0]

        try:
            bids_config.update(t_config)
        except ValueError:
            err = "\n[!] Could not properly parse the AWS S3 path provided " \
                  "- please double-check the bucket and the path.\n\nNote: " \
                  "This could either be an issue with the path or the way " \
                  "the data is organized in the directory. You can also " \
                  "try providing a specific site sub-directory.\n\n"
            raise ValueError(err)

        # now put the configuration in the data structure, by first iterating
        # to the location of the key, and then inserting it. Levels that are
        # missing from the filename (or whose value is "none") are skipped,
        # so the parameters are nested only under the levels that the
        # filename actually defines
        t_dict = bids_config_dict  # pointer to current dictionary
        for level in LEVEL_HIERARCHY:
            key = None
            if level in f_dict:
                key = "-".join([level, f_dict[level]])

            if key is not None and key[-5:] != '-none':
                if key not in t_dict:
                    t_dict[key] = {}

                t_dict = t_dict[key]

        t_dict.update(bids_config)

    return (bids_config_dict)

In [None]:
run_docstring_examples(
    bids_parse_sidecar, locals(), name="bids_parse_sidecar",
    verbose=True)
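
The `# doctest: +NORMALIZE_WHITESPACE` directive used above is what lets us wrap a long expected dictionary across several lines: runs of whitespace, including newlines, are treated as equal when comparing expected and actual output. Here is a minimal sketch on a toy function (`nested` is hypothetical and only returns a fixed dictionary).

```python
import doctest


def nested():
    """Return a small nested dictionary.

    >>> nested()  # doctest: +NORMALIZE_WHITESPACE
    {'scantype-bold':
        {'task-rest': {'TaskName': 'Rest', 'RepetitionTime': 2.5}}}
    """
    return {'scantype-bold': {
        'task-rest': {'TaskName': 'Rest', 'RepetitionTime': 2.5}}}


finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(nested, name='nested', globs={'nested': nested}):
    runner.run(test)
results = runner.summarize()
```

Without the directive, the same example would fail, because the expected text contains a line break that the actual one-line `repr` does not.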

Once you have doctests in place, you can configure an IDE or continuous integration service to run the tests automatically and to alert you to any breaking changes.
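
Automated runners generally rely on a nonzero exit status to flag a breaking change. As a rough sketch of that idea using only the standard library, the hypothetical helper `doctest_exit_code` below returns 1 as soon as any doctest fails; `double` and `stale` are toy functions, and `stale` deliberately has an out-of-date docstring.

```python
import doctest


def double(x):
    """
    >>> double(2)
    4
    """
    return x * 2


def stale(x):
    """Docstring whose expected output no longer matches the code.

    >>> stale(2)
    4
    """
    return x * 3


def doctest_exit_code(*functions):
    """Return 0 if every doctest in ``functions`` passes, else 1."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = 0
    for func in functions:
        globs = {func.__name__: func}
        for test in finder.find(func, name=func.__name__, globs=globs):
            # Discard the failure report; only the counts matter here.
            failed += runner.run(test, out=lambda text: None).failed
    return 1 if failed else 0


print(doctest_exit_code(double))
# 0
print(doctest_exit_code(double, stale))
# 1
```

In practice, existing tools cover this without custom code: `python -m doctest -v some_module.py` runs a file's doctests, and pytest collects them when invoked with `--doctest-modules`.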