
Testing: Datasets

Mathieu Guay-Paquet edited this page Apr 24, 2024 · 11 revisions

This page is about the sample data available to work with when debugging and writing test functions. Guidelines for writing the test functions themselves can be found on the Tests Wiki page.

SCT provides two types of data for use in testing and debugging:

  1. Real anatomical images
  2. Dummy data

Note: The organization and usage of test data is currently an active point of discussion in Issue #3136.

Real anatomical images

The sct_testing_data/ folder contains cropped MRI images for several different modalities/contrasts (t1, t2, dmri, etc.). It also contains processed versions of these images (segmented spinal cords, labeled spinal cords, etc.) so that you don't have to run time-consuming SCT CLI commands to set up your test data.

It is downloaded at the start of each test session using a session-scoped Pytest fixture inside conftest.py. Alternatively, you can manually download this folder using the command sct_download_data -d sct_testing_data.

The sct_example_data/ folder is just like sct_testing_data/, except that the images are uncropped, so they are larger and take longer to process. To keep your tests light, use only images from sct_testing_data/. However, sct_example_data/ is still useful for debugging or reproducing issues.

Notably, data from this folder is also used by the batch_processing.sh script, because that script is meant to demonstrate typical workflows for SCT.

You can manually download this folder using the command sct_download_data -d sct_example_data.
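Both download commands above follow the same pattern, so they can be wrapped in a small helper. The sketch below is only an illustration: the helper names (build_download_command, download_dataset) are hypothetical and not part of SCT, and only the -d option mentioned on this page is used.

```python
import shutil
import subprocess


def build_download_command(dataset):
    """Return the CLI call for fetching one dataset (uses only the -d option)."""
    return ["sct_download_data", "-d", dataset]


def download_dataset(dataset):
    """Run the download if SCT is on the PATH; return True if the command was run.

    Hypothetical helper for illustration, not part of SCT itself.
    """
    if shutil.which("sct_download_data") is None:
        # SCT is not installed (or not on the PATH), so there is nothing to run.
        return False
    # check=True raises CalledProcessError if the download command fails.
    subprocess.run(build_download_command(dataset), check=True)
    return True
```

For example, download_dataset("sct_example_data") would run the second command shown above, provided SCT is installed.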

This is a NeuroPoly-internal repository of code and data for SCT tests that we only run around release time. For example, this could be because:

  • The test uses large data files.
  • The test uses private data files.
  • The test runs for a long time.
  • The test needs access to a GPU.

The repository's README.md contains instructions on how to run the tests.

Legacy datasets (duke, django)

The following datasets are collections of files stored on older NeuroPoly servers, before the NeuroPoly lab switched to data.neuro.polymtl.ca. These datasets should not be actively relied on or added to. They are primarily useful when referencing old SCT issues, but any data that is relevant to modern SCT usage should be taken out of these locations and stored somewhere new.

duke:sct_testing/large

A large, BIDS-compatible database of data from different sites, vendors, pathologies, and quality levels. Data related to an SCT issue are named sub-issueXXXX (where XXXX is the issue number). Data from SCT users (isolated cases) are named sub-userXXXX (where XXXX is an incrementing number). Do not forget to add the entry to the .tsv file, with the information from the site. For file naming conventions, please refer to sct_testing/large/README.txt.
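When looking up data for an old issue, it can help to parse these subject names programmatically. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the pattern assumes that XXXX is simply one or more digits, since this page does not say whether the number is zero-padded.

```python
import re

# Assumption: XXXX is one or more digits (zero-padding is not specified on this page).
SUBJECT_NAME = re.compile(r"^sub-(issue|user)(\d+)$")


def parse_subject_name(name):
    """Return (kind, number) for names such as 'sub-issue3136', or None if the name does not match."""
    match = SUBJECT_NAME.match(name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```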

To get the data, connect to duke.neuro.polymtl.ca with your PolyGRAMES account (using SMB/AFP, instructions here). (SSH access is not enabled for duke.)

To generate ground truth data, use this script to go faster: 132.207.65.40/Public_JCA/sct_testing/sct_testing/batch_generate_ground_truth_data.sh

duke:sct_testing/issues

This directory contains private, user-shared image files that were passed on to Julien for the purposes of reproducing/debugging various issues.

To get the data, connect to duke.neuro.polymtl.ca with your PolyGRAMES account (using SMB/AFP). (SSH access is not enabled for duke.)

If a user provides image files, be sure to upload them here, especially if the issue has not been solved yet. This allows future developers to retrieve the files and debug the issue later.

django:folder_shared/sct_issue

This directory no longer exists, and the corresponding data has been migrated to duke:sct_testing/issues.

Dataset organization/structure

If you are creating a new dataset or modifying an existing one, please respect the following path/file naming:

- subject_x/
    - t1/
        - t1.nii.gz
    - t2/
        - t2.nii.gz
    - dmri/
        - dmri.nii.gz
        - bvecs.txt
        - bvals.txt
    - mt/
        - mt0.nii.gz
        - mt1.nii.gz
        - mtr.nii.gz
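A layout like the one above can be checked with a few lines of Python. This is a minimal sketch, not an SCT utility: the function name is hypothetical, and it assumes that a subject does not need every contrast, so only the contrast folders that exist are checked.

```python
from pathlib import Path

# Expected files per contrast folder, copied from the layout above.
EXPECTED_LAYOUT = {
    "t1": ["t1.nii.gz"],
    "t2": ["t2.nii.gz"],
    "dmri": ["dmri.nii.gz", "bvecs.txt", "bvals.txt"],
    "mt": ["mt0.nii.gz", "mt1.nii.gz", "mtr.nii.gz"],
}


def find_missing_files(subject_dir):
    """List expected files that are absent, skipping contrast folders that do not exist.

    Hypothetical helper for illustration; not part of SCT.
    """
    subject_dir = Path(subject_dir)
    missing = []
    for contrast, filenames in EXPECTED_LAYOUT.items():
        contrast_dir = subject_dir / contrast
        if not contrast_dir.is_dir():
            # Assumption: contrasts are optional, so an absent folder is not an error.
            continue
        for filename in filenames:
            if not (contrast_dir / filename).is_file():
                missing.append(f"{contrast}/{filename}")
    return missing
```

An empty list means that every contrast folder present in the subject directory contains the files named in the layout.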

Dummy data

Functions and fixtures

If you'd like to create your own data from scratch, there are dummy data functions that can help you:

  • Functions inside spinalcordtoolbox/testing/create_test_data.py
    • Check this first to see if you can import and re-use any of the dummy functions for your tests.
    • Used by unit_testing/test_math.py, unit_testing/test_centerline.py, and unit_testing/test_deepseg_sc.py.
  • Fixtures inside test modules
    • Used on a per-module basis, e.g. in unit_testing/test_image.py, unit_testing/test_reports.py.

Here is an example of a dummy data fixture:

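import nibabel
import numpy as np
import pytest


@pytest.fixture
def fake_3dimage_custom(data):
    """
    :return: a Nifti1Image (3D) in RAS+ space
    """
    # `data` is resolved by pytest, so it must be supplied by another fixture or by parametrization.
    # An identity affine maps voxel indices directly to RAS+ world coordinates.
    affine = np.eye(4)
    return nibabel.nifti1.Nifti1Image(data, affine)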

When writing tests, if possible, try to re-use dummy data functions rather than creating new ones.
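To illustrate the kind of helper meant here, the sketch below builds a small synthetic volume with NumPy. The function name make_dummy_volume and its parameters are hypothetical and are not taken from create_test_data.py; check that module for the helpers SCT actually provides.

```python
import numpy as np


def make_dummy_volume(shape=(7, 8, 9), seed=0):
    """Return a reproducible 3D float array with a bright, straight 'cord' along the last axis.

    Hypothetical helper for illustration; not part of SCT.
    """
    rng = np.random.default_rng(seed)
    # Low-intensity background noise in [0, 0.1).
    volume = rng.random(shape) * 0.1
    # Set the central voxel of every slice to 1.0 so the data has a known structure.
    volume[shape[0] // 2, shape[1] // 2, :] = 1.0
    return volume
```

An array like this could serve as the data argument when building a Nifti1Image, as in the fixture above. Fixing the seed keeps the data identical between test runs.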

Using fixtures with pytest.mark.parametrize

Note: The section below is provisional and depends on Pull Request #3152.

Parametrization involves defining multiple sets of arguments for a single test, and is a compact way of producing multiple test cases from a single test function. Typically, the @pytest.mark.parametrize decorator would be used to parametrize test functions. However, it has a big limitation: fixtures can't be used as parameter values. (https://github.com/pytest-dev/pytest/issues/349)
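For comparison, plain parametrization with literal values might look like the sketch below. The test name and the cases are made up for illustration and do not come from the SCT test suite.

```python
import pytest


# Each tuple is one test case: (shape, expected number of voxels).
@pytest.mark.parametrize("shape, expected_size", [
    ((2, 3, 4), 24),
    ((7, 8, 9), 504),
])
def test_voxel_count(shape, expected_size):
    size = 1
    for dim in shape:
        size *= dim
    assert size == expected_size
```

Pytest runs this function once per tuple. Every argument here is a literal value; passing the return value of a fixture as one of the cases is what the built-in decorator does not support directly.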

To work around this limitation, we use @parametrize from the pytest_cases plugin instead. Example usage can be seen below:

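import numpy as np
import pytest
import pytest_cases
from pytest_cases import fixture_ref

from spinalcordtoolbox.image import Image


# Define a fixture function to generate artificial data
# (the fake_3dimage fixture it depends on is assumed to be defined elsewhere)
@pytest.fixture
def fake_3dimage_sct(fake_3dimage):
    """
    :return: an Image (3D) in RAS+ (aka SCT LPI) space
    shape = (7,8,9)
    """
    i = fake_3dimage
    # get_data() was removed in nibabel 5.0; np.asanyarray(i.dataobj) is its replacement
    img = Image(np.asanyarray(i.dataobj), hdr=i.header,
                orientation="LPI",
                dim=i.header.get_data_shape(),
                )
    return img


# Use "fixture_ref" to use the fixture as input for the test_image argument
@pytest_cases.parametrize("test_image", [fixture_ref(fake_3dimage_sct)])
def test_check_missing_label(test_image):
    # Test function goes here
    pass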

Pytest devs have commented that they want to include pytest_cases in pytest's core functionality. If/when this happens, we should update our parametrization decorators accordingly.
