Breaking `BidsDataset` API Changes #236

pvandyken · 2023-02-11T20:41:37Z

This issue picks up from #209. There are going to be a few steps involved in establishing the breaking BidsDataset API, so we can use this issue for tracking.

The relevant portion of the API proposal is copied here:

API

BidsDataset.path

The root path of the dataset.
BidsDataset.wildcards[<one or more entities>]

Return {"wildcard": "{snakemake_wildcard}"} pairings. Any selected entities not found in any component would be silently ignored, allowing a generic version of the current BidsDataset.subj_wildcards.
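
A minimal sketch of the selection behaviour described above, not the actual implementation. It assumes each component can be modelled as a plain dict mapping entity names to value lists; `select_wildcards` and the `components` data are hypothetical names used only for illustration.

```python
def select_wildcards(components, *selected):
    """Return {"entity": "{entity}"} pairs for the selected entities.

    Hypothetical helper: `components` maps component names to plain
    dicts of entity -> values, standing in for the real components.
    """
    # Selected entities not found in any component are silently ignored.
    found = [e for e in selected if any(e in comp for comp in components.values())]
    return {e: f"{{{e}}}" for e in found}


# Illustrative data only.
components = {
    "t1w": {"subject": ["001", "002"], "session": ["01", "01"]},
    "bold": {"subject": ["001", "002"], "run": ["1", "1"]},
}

wildcards = select_wildcards(components, "subject", "session", "acquisition")
print(wildcards)
```

Here "acquisition" appears in no component, so it is dropped from the result rather than raising an error.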
BidsDataset.entities[<one or more entities>]

An extension of BidsComponent.entities. In the simple case, with one entity in the selector, the entity values across all components which have the entity will be returned in a list. With multiple entities in the selector, a dict[entity, list[values]] will be returned. If an entity is not found in any component, it could raise an error, or the entity could be ignored.

If used as an iterator, or if .items, .values, or .keys is called, any entity appearing in at least one component will be considered. dict(BidsDataset.entities) will be equivalent to selecting every single available entity.
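
The selector semantics could look roughly like the sketch below. It is only an illustration under stated assumptions: components are modelled as plain dicts, `select_entities` is a hypothetical name, and of the two options left open above, the sketch arbitrarily picks "ignore entities that are not found".

```python
def select_entities(components, *selected):
    """Collect unique entity values across all components.

    Hypothetical helper: one selected entity returns a list, several
    return a dict of entity -> list of values.
    """
    merged = {}
    for comp in components.values():
        for entity, values in comp.items():
            if entity in selected:
                # A dict is used as an ordered set to drop duplicates.
                merged.setdefault(entity, {}).update(dict.fromkeys(values))
    # Assumption: entities found in no component are ignored, not an error.
    result = {e: list(merged[e]) for e in selected if e in merged}
    if len(selected) == 1:
        return result.get(selected[0], [])
    return result


# Illustrative data only.
components = {
    "t1w": {"subject": ["001", "002"], "session": ["01", "01"]},
    "bold": {"subject": ["002", "003"]},
}

subjects = select_entities(components, "subject")
both = select_entities(components, "subject", "session")
print(subjects)
print(both)
```

"subject" is merged from both components, while "session" comes only from the one component that has it.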
BidsDataset.zip_lists[<one or more entities>]

Returns the entity group consensus across all components.

itertools.product(*BidsDataset.entities[*selected_entities].values()) will be used as the baseline: in other words, every possible combination of the values of the selected entities found across all components. Each such combination will be called a row. From this baseline, rows with values missing in one or more components will be filtered out. A component with just one of the selected entities will filter out all rows whose value for that entity is not found in the component. A component with more than one of the selected entities will filter out all rows whose combination of those entities is not found in the component. Components not containing any of the selected entities will not be considered.
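
The filtering procedure above can be sketched as follows. This is a rough illustration, not the proposed implementation: it assumes each component is a plain dict of aligned value lists (one column per entity), and `consensus_zip_lists` and the sample data are hypothetical.

```python
import itertools


def consensus_zip_lists(components, selected):
    """Return entity -> column for the rows every relevant component allows.

    Hypothetical helper: `components` maps component names to dicts of
    aligned lists, where index i of each list describes the same file.
    """
    # Only entities present in at least one component take part.
    found = [e for e in selected if any(e in c for c in components.values())]
    # Unique values per entity across all components, in first-seen order.
    values = {
        e: list(dict.fromkeys(v for c in components.values() for v in c.get(e, [])))
        for e in found
    }
    # Baseline: every combination of the selected entity values.
    rows = list(itertools.product(*values.values()))
    for comp in components.values():
        shared = [e for e in found if e in comp]
        if not shared:
            continue  # components with none of the entities are not considered
        # Combinations of the shared entities actually present in this component.
        allowed = set(zip(*(comp[e] for e in shared)))
        idx = [found.index(e) for e in shared]
        rows = [r for r in rows if tuple(r[i] for i in idx) in allowed]
    return {e: [r[i] for r in rows] for i, e in enumerate(found)}


# Illustrative data only: "bold" has two runs per session and lacks (002, 02).
components = {
    "t1w": {
        "subject": ["001", "001", "002", "002"],
        "session": ["01", "02", "01", "02"],
    },
    "bold": {
        "subject": ["001", "001", "001", "001", "002", "002"],
        "session": ["01", "01", "02", "02", "01", "01"],
        "run": ["1", "2", "1", "2", "1", "2"],
    },
    "mask": {"subject": ["001", "002"]},
}

consensus = consensus_zip_lists(components, ["subject", "session"])
print(consensus)
```

Because the rows start from a product of unique values, the result contains no duplicates even though "bold" repeats each subject/session pair once per run.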

Lists are automatically de-duplicated prior to return. This is necessary because different components may have different numbers of entities, making meaningful comparison without de-duplication impossible:
```
inputs.zip_lists['subject', 'session'] == [
  ['001', '001', '002', '002'],
  ['01', '02', '01', '02']
] != [
  ['001', '001', '002', '002', '001', '001', '002', '002'],
  ['01', '02', '01', '02', '01', '02', '01', '02']
]
```
Because of this, note that:
```
entity = "my_entity"
assert inputs.zip_lists[entity] == inputs.entities[entity]
```
dict(BidsDataset.zip_lists) will be equivalent to BidsDataset.zip_lists[<every single entity...>].
If used as an iterator, or if any of .keys, .values, or .items is called, and no selection is made, it shall be treated as the dict case above.

pvandyken mentioned this issue Feb 11, 2023

Transition to breaking BidsDataset API #239

Open

pvandyken added the breaking New feature that breaks compatibility with previous versions label Feb 16, 2023
