Breaking BidsDataset API Changes #236

Open
2 of 3 tasks
pvandyken opened this issue Feb 11, 2023 · 0 comments
Labels
breaking New feature that breaks compatibility with previous versions

pvandyken (Contributor) commented Feb 11, 2023

This issue picks up from #209. There are going to be a few steps involved in establishing the breaking BidsDataset API, so we can use this issue to track them.

The relevant portion of the API proposal is copied here:

API

  • BidsDataset.path

    The root path of the dataset.

  • BidsDataset.wildcards[<one or more entities>]

    Returns {"wildcard": "{snakemake_wildcard}"} pairings. Any selected entities not found in any component would be silently ignored, allowing a generic version of the current BidsDataset.subj_wildcards.

  • BidsDataset.entities[<one or more entities>]

    An extension of BidsComponent.entities. In the simple case, with one entity in the selector, the entity values across all components that have the entity will be returned in a list. With multiple entities in the selector, a dict[entity, list[values]] will be returned. If an entity is not found in any component, either an error could be raised or the entity could be ignored.

    If used as an iterator, or if .items, .values, or .keys is called, any entity appearing in at least one component will be considered. dict(BidsDataset.entities) will be equivalent to selecting every single available entity.

  • BidsDataset.zip_lists[<one or more entities>]

    Returns the entity group consensus across all components.

    itertools.product(*BidsDataset.entities[*selected_entities].values()) will be used as the baseline. In other words, all possible combinations of all values of the selected entities found across all components. Each such combination will be called a row. From this baseline, rows with values missing in one or more components will be filtered out. Components with just one of the selected entities will filter out all rows with entity values not found in the component. Components with multiple of the selected entities will filter all rows with entity combinations not found in the component. Components not containing any of the selected entities will not be considered.

    Lists are automatically de-duplicated prior to return. This is necessary because different components may have different numbers of entities, making meaningful comparison without de-duplication impossible:

    inputs.zip_lists['subject', 'session'] == [
      ['001', '001', '002', '002'],
      ['01', '02', '01', '02']
    ] != [
      ['001', '001', '002', '002', '001', '001', '002', '002'],
      ['01', '02', '01', '02', '01', '02', '01', '02']
    ]

    Because of this, note that:

    entity = "my_entity"
    assert inputs.zip_lists[entity] == inputs.entities[entity]

    dict(BidsDataset.zip_lists) will be equivalent to BidsDataset.zip_lists[<every single entity...>].
    If used as an iterator, or if any of .keys, .values, or .items is called with no selection made, it will be treated as in the dict case above.
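The BidsDataset.wildcards selector described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the snakebids implementation: each component is modelled only by the set of entity names it contains, select_wildcards is a hypothetical helper name, and each wildcard is assumed to be spelled "{entity}" (the real library may use different wildcard names).

```python
from collections.abc import Iterable


def select_wildcards(
    components: Iterable[Iterable[str]], *entities: str
) -> dict[str, str]:
    """Hypothetical sketch of BidsDataset.wildcards[<entities>].

    Each component is modelled only by the entity names it contains.
    Selected entities that no component has are silently skipped.
    """
    present: set[str] = set()
    for comp in components:
        present.update(comp)
    # Assumption: the snakemake wildcard is just the entity name in braces.
    return {entity: "{" + entity + "}" for entity in entities if entity in present}


components = [{"subject", "session"}, {"subject"}]
print(select_wildcards(components, "subject", "run"))
# {'subject': '{subject}'}
```

Because "run" appears in no component, it is dropped rather than raising, which is what lets a single call generalize the current BidsDataset.subj_wildcards.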
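The BidsDataset.entities behaviour described above (a single entity gives a list, several give a dict, and iteration covers every entity found in at least one component) maps naturally onto a read-only Mapping. The sketch below is hypothetical: DatasetEntities is not a snakebids class, components are assumed to be plain dicts from entity name to a zip-list of values, values are returned in first-seen order, and it picks the "raise an error" option for unknown entities.

```python
from collections.abc import Iterator, Mapping

# Hypothetical component representation: entity name -> zip-list of values.
Component = dict[str, list[str]]


class DatasetEntities(Mapping):
    """Sketch of BidsDataset.entities; not the real snakebids class."""

    def __init__(self, components: list[Component]) -> None:
        self._components = components

    def _values(self, entity: str) -> list[str]:
        # Unique values of `entity` across every component that has it,
        # kept in first-seen order.
        values: dict[str, None] = {}
        found = False
        for comp in self._components:
            if entity in comp:
                found = True
                values.update(dict.fromkeys(comp[entity]))
        if not found:
            # One of the two options in the proposal: raise rather than ignore.
            raise KeyError(entity)
        return list(values)

    def __getitem__(self, key):
        # A single entity gives a list; several (a tuple) give a dict.
        if isinstance(key, str):
            return self._values(key)
        return {entity: self._values(entity) for entity in key}

    def __iter__(self) -> Iterator[str]:
        # Every entity that appears in at least one component.
        names: dict[str, None] = {}
        for comp in self._components:
            names.update(dict.fromkeys(comp))
        return iter(names)

    def __len__(self) -> int:
        return sum(1 for _ in self)


entities = DatasetEntities([
    {"subject": ["001", "001", "002"], "session": ["01", "02", "01"]},
    {"subject": ["003"]},
])
print(entities["subject"])             # ['001', '002', '003']
print(entities["subject", "session"])  # {'subject': ['001', '002', '003'], 'session': ['01', '02']}
print(dict(entities) == entities["subject", "session"])  # True
```

Subclassing collections.abc.Mapping means .keys, .values, .items and dict() all fall out of __iter__ and __getitem__, so dict(entities) is automatically the same as selecting every available entity.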
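The zip_lists consensus rule described above (start from itertools.product of all values, then let each component filter the rows using whichever selected entities it has) can be sketched as below. This is a minimal illustration, not the snakebids implementation: consensus_zip_lists is a hypothetical name, components are assumed to be dicts from entity name to a zip-list of values, and the result uses the list-of-columns layout from the proposal's example.

```python
import itertools

# Hypothetical component representation: entity name -> zip-list of values.
Component = dict[str, list[str]]


def consensus_zip_lists(components: list[Component], *entities: str) -> list[list[str]]:
    """Sketch of the zip_lists consensus rule; not the snakebids implementation."""
    # Baseline: every value each selected entity takes in any component
    # (first-seen order), combined with itertools.product.
    all_values = []
    for entity in entities:
        seen: dict[str, None] = {}
        for comp in components:
            seen.update(dict.fromkeys(comp.get(entity, [])))
        all_values.append(list(seen))
    rows = list(itertools.product(*all_values))

    for comp in components:
        shared = [i for i, entity in enumerate(entities) if entity in comp]
        if not shared:
            continue  # components with none of the selected entities are skipped
        # Combinations of the shared entities that actually occur in this
        # component; this covers both the one-entity and multi-entity cases.
        present = set(zip(*(comp[entities[i]] for i in shared)))
        rows = [row for row in rows if tuple(row[i] for i in shared) in present]

    # Transpose the surviving rows back into one list per entity.
    return [[row[i] for row in rows] for i in range(len(entities))]


full = {"subject": ["001", "001", "002", "002"], "session": ["01", "02", "01", "02"]}
partial = {"subject": ["001", "001", "002"], "session": ["01", "02", "01"]}
subject_only = {"subject": ["001", "002"]}
print(consensus_zip_lists([full, partial, subject_only], "subject", "session"))
# [['001', '001', '002'], ['01', '02', '01']]
```

Here partial lacks the 002/02 combination, so that row is dropped, while subject_only filters on subject alone. Because the baseline comes from a product, surviving rows are already unique. In this sketch, a selected entity that no component contains empties the baseline; the proposal does not specify that case.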
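The de-duplication step from the zip_lists example above can be illustrated in a few lines. This is a minimal sketch, assuming zip lists are stored as a list of equal-length columns as in the proposal's example; dedup_zip_lists is a hypothetical helper name.

```python
def dedup_zip_lists(columns: list[list[str]]) -> list[list[str]]:
    """Drop repeated rows from zip-list columns, keeping first-seen order."""
    # zip(*columns) turns parallel lists into row tuples; dict.fromkeys
    # removes repeats while preserving insertion order (Python 3.7+).
    unique_rows = list(dict.fromkeys(zip(*columns)))
    # Transpose back into one list per entity.
    return [[row[i] for row in unique_rows] for i in range(len(columns))]


doubled = [
    ["001", "001", "002", "002", "001", "001", "002", "002"],
    ["01", "02", "01", "02", "01", "02", "01", "02"],
]
print(dedup_zip_lists(doubled))
# [['001', '001', '002', '002'], ['01', '02', '01', '02']]
```

De-duplicating whole rows (rather than each column independently) is what keeps the subject and session lists aligned.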

pvandyken added the breaking label on Feb 16, 2023