Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-34340: implement RFC-834 deprecations #903

Merged
merged 16 commits into from
Nov 22, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
19 changes: 19 additions & 0 deletions doc/changes/DM-34340.api.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
Deprecate most public APIs that use Dimension or DimensionElement objects.

This implements [RFC-834](https://jira.lsstcorp.org/browse/RFC-834), deprecating the `DimensionGraph` class (in favor of the new, similar `DimensionGroup`) and a large number of `DataCoordinate` methods and attributes, including its `collections.abc.Mapping` interface.

This includes:

- use `DataCoordinate.dimensions` instead of `DataCoordinate.graph` (likewise for arguments to `DataCoordinate.standardize`);
- use `dict(DataCoordinate.required)` as a drop-in replacement for `DataCoordinate.byName()`, but consider whether you want `DataCoordinate.required` (a `Mapping` view rather than a `dict`) or `DataCoordinate.mapping` (a `Mapping` with all *available* key-value pairs, not just the required ones);
- also use `DataCoordinate.mapping` or `DataCoordinate.required` instead of treating `DataCoordinate` itself as a `Mapping`, *except* square-bracket indexing, which is still very much supported;
- use `DataCoordinate.dimensions.required.names` or `DataCoordinate.required.keys()` as a drop-in replacement for `DataCoordinate.keys().names` or `DataCoordinate.names`, but consider whether you actually want `DataCoordinate.dimensions.names` or `DataCoordinate.mapping.keys` instead.

`DimensionGroup` is almost identical to `DimensionGraph`, but it and its subset attributes are not directly iterable (since those iterate over `Dimension` and `DimensionElement` objects); use the `.names` attribute to iterate over names instead (just as names could be iterated over in `DimensionGraph`).

`DimensionGraph` is still used in some `lsst.daf.butler` APIs (most prominently `DatasetType.dimensions`) that may be accessed without deprecation warnings being emitted, but iterating over that object or its subset attributes *will* yield deprecation warnings.
And `DimensionGraph` is still accepted along with `DimensionGroup` without warning in most public APIs.
When `DimensionGraph` is removed, methods and properties that return `DimensionGraph` will start returning `DimensionGroup` instead.

Rare code (mostly in downstream middleware packages) that does need access to `Dimension` or `DimensionElement` objects should obtain them directly from the `DimensionUniverse`.
For the pattern of checking whether a dimension is a skypix level, test whether its name is in `DimensionUniverse.skypix_dimensions` or `DimensionGroup.skypix` instead of obtaining a `Dimension` instance and calling `isinstance(dimension, SkyPixDimension)`.
18 changes: 7 additions & 11 deletions doc/lsst.daf.butler/dimensions.rst
Original file line number Diff line number Diff line change
Expand Up @@ -31,15 +31,11 @@ There are two kinds of relationships:
For example, the visit dimension has an implied dependency on the physical filter dimension, because a visit is observed through exactly one filter and hence each visit ID determines a filter name.
When both dimensions are associated with database tables, an implied dependency involves having a foreign key field in the dependent table that is not part of a primary key in the dependent table.

A `DimensionGraph` is an immutable, set-like container of dimensions that is guaranteed to (recursively) include all dependencies of any dimension in the graph.
It also categorizes those dimensions into `~DimensionGraph.required` and `~DimensionGraph.implied` subsets, which have roughly the same meaning for a set of dimensions as they do for a single dimension: once the primary key values of all of the required dimensions are known, the primary key values of all implied dimensions are known as well.
`DimensionGraph` also guarantees a deterministic and topological sort order for its elements.
A `DimensionGroup` is an immutable, set-like container of dimensions that is guaranteed to (recursively) include all dependencies of any dimension in the set.
It also categorizes those dimensions into `~DimensionGroup.required` and `~DimensionGroup.implied` subsets, which have roughly the same meaning for a set of dimensions as they do for a single dimension: once the primary key values of all of the required dimensions are known, the primary key values of all implied dimensions are known as well.
`DimensionGroup` also guarantees a deterministic and topological sort order for its elements.

Because `Dimension` instances have a `~Dimension.name` attribute, we typically
use `~lsst.daf.butler.NamedValueSet` and `~lsst.daf.butler.NamedKeyDict` as containers when immutability is needed or the guarantees of `DimensionGraph`.
This allows the string names of dimensions to be used as well in most places where `Dimension` instances are expected.

The complete set of all compatible dimensions is held by a special subclass of `DimensionGraph`, `DimensionUniverse`.
The complete set of all recognized dimensions is managed by a `DimensionUniverse`.
A dimension universe is constructed from configuration, and is responsible for constructing all `Dimension` and `DimensionElement` instances; within a universe, there is exactly one `Dimension` instance that is always used to represent a particular dimension.

`DimensionUniverse` instances themselves are held in a global map keyed by the version number in the configuration used for construction, so they behave somewhat like singletons.
Expand All @@ -50,15 +46,15 @@ Data IDs
--------

The most common way butler users encounter dimensions is as the keys in a *data ID*, a dictionary that maps dimensions to their primary key values.
Different datasets with the same `DatasetType` are always identified by the same set of dimensions (i.e. the same set of data ID keys), and hence a `DatasetType` instance holds a `DimensionGraph` that contains exactly those keys.
Different datasets with the same `DatasetType` are always identified by the same set of dimensions (i.e. the same set of data ID keys), and hence a `DatasetType` instance holds a `DimensionGroup` that contains exactly those keys.

Many data IDs are simply Python dictionaries that use the string names of dimensions or actual `Dimension` instances as keys.
Most `Butler` and `Registry` APIs that accept data IDs as input accept both dictionaries and keyword arguments that are added to these dictionaries automatically.

The data IDs returned by the `Butler` or `Registry` (and most of those used internally) are usually instances of the `DataCoordinate` class.
`DataCoordinate` instances can have different states of knowledge about the dimensions they identify.
They always contain at least the key-value pairs that correspond to its `DimensionGraph`\ 's `~DimensionGraph.required` subset -- that is, the minimal set of keys needed to fully identify all other dimensions in the graph.
They can also contain key-value pairs for the `~DimensionGraph.implied` subset (a state indicated by `DataCoordinate.hasFull()` returning `True`).
They always contain at least the key-value pairs that correspond to its `DimensionGroup`\ 's `~DimensionGroup.required` subset -- that is, the minimal set of keys needed to fully identify all other dimensions in the set.
They can also contain key-value pairs for the `~DimensionGroup.implied` subset (a state indicated by `DataCoordinate.hasFull()` returning `True`).
And if `DataCoordinate.hasRecords` returns `True`, the data ID also holds all of the metadata records associated with its dimensions, both as a mapping in
the `~DataCoordinate.records` attribute and via dynamic attribute access, e.g.
``data_id.exposure.day_obs``.
Expand Down
4 changes: 1 addition & 3 deletions python/lsst/daf/butler/_column_categorization.py
Original file line number Diff line number Diff line change
Expand Up @@ -63,9 +63,7 @@ def from_iterable(cls, iterable: Iterable[Any]) -> ColumnCategorization:

def filter_skypix(self, universe: DimensionUniverse) -> Iterator[SkyPixDimension]:
return (
dimension
for name in self.dimension_keys
if isinstance(dimension := universe[name], SkyPixDimension)
dimension for name in self.dimension_keys if (dimension := universe.skypix_dimensions.get(name))
)

def filter_governors(self, universe: DimensionUniverse) -> Iterator[GovernorDimension]:
Expand Down
32 changes: 12 additions & 20 deletions python/lsst/daf/butler/_config_support.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,11 +36,11 @@
from collections.abc import Iterable, Mapping
from typing import TYPE_CHECKING, Any

from .dimensions import DimensionGraph
from .dimensions import DimensionGroup

if TYPE_CHECKING:
from ._config import Config
from .dimensions import Dimension, DimensionUniverse
from .dimensions import DimensionUniverse

log = logging.getLogger(__name__)

Expand All @@ -58,23 +58,23 @@ class LookupKey:
name : `str`, optional
Primary index string for lookup. If this string looks like it
represents dimensions (via ``dim1+dim2+dim3`` syntax) the name
is converted to a `DimensionGraph` and stored in ``dimensions``
is converted to a `DimensionGroup` and stored in ``dimensions``
property.
dimensions : `DimensionGraph`, optional
dimensions : `DimensionGroup`, optional
Dimensions that are relevant for lookup. Should not be specified
if ``name`` is also specified.
dataId : `dict`, optional
Keys and values from a dataId that should control lookups.
universe : `DimensionUniverse`, optional
Set of all known dimensions, used to expand and validate ``name`` or
``dimensions``. Required if the key represents dimensions and a
full `DimensionGraph` is not provided.
full `DimensionGroup` is not provided.
"""

def __init__(
self,
name: str | None = None,
dimensions: Iterable[str | Dimension] | None = None,
dimensions: DimensionGroup | None = None,
dataId: dict[str, Any] | None = None,
*,
universe: DimensionUniverse | None = None,
Expand All @@ -100,7 +100,7 @@ def __init__(
# indicate this but have to filter out the empty value
dimension_names = [n for n in name.split("+") if n]
try:
self._dimensions = universe.extract(dimension_names)
self._dimensions = universe.conform(dimension_names)
except KeyError:
# One or more of the dimensions is not known to the
# universe. This could be a typo or it could be that
Expand All @@ -122,15 +122,7 @@ def __init__(
self._name = name

elif dimensions is not None:
if not isinstance(dimensions, DimensionGraph):
if universe is None:
raise ValueError(
f"Cannot construct LookupKey for dimensions={dimensions} without universe."
)
else:
self._dimensions = universe.extract(dimensions)
else:
self._dimensions = dimensions
self._dimensions = dimensions
else:
# mypy cannot work this out on its own
raise ValueError("Name was None but dimensions is also None")
Expand Down Expand Up @@ -181,8 +173,8 @@ def name(self) -> str | None:
return self._name

@property
def dimensions(self) -> DimensionGraph | None:
"""Dimensions associated with lookup (`DimensionGraph`)."""
def dimensions(self) -> DimensionGroup | None:
"""Dimensions associated with lookup (`DimensionGroup`)."""
return self._dimensions

@property
Expand All @@ -203,7 +195,7 @@ def __hash__(self) -> int:
def clone(
self,
name: str | None = None,
dimensions: DimensionGraph | None = None,
dimensions: DimensionGroup | None = None,
dataId: dict[str, Any] | None = None,
) -> LookupKey:
"""Clone the object, overriding some options.
Expand All @@ -216,7 +208,7 @@ def clone(
name : `str`, optional
Primary index string for lookup. Will override ``dimensions``
if ``dimensions`` are set.
dimensions : `DimensionGraph`, optional
dimensions : `DimensionGroup`, optional
Dimensions that are relevant for lookup. Will override ``name``
if ``name`` is already set.
dataId : `dict`, optional
Expand Down
4 changes: 2 additions & 2 deletions python/lsst/daf/butler/_dataset_ref.py
Original file line number Diff line number Diff line change
Expand Up @@ -173,7 +173,7 @@ def makeDatasetId(
else:
raise ValueError(f"Unexpected ID generation mode: {idGenerationMode}")

for name, value in sorted(dataId.byName().items()):
for name, value in sorted(dataId.required.items()):
items.append((name, str(value)))
data = ",".join(f"{key}={value}" for key, value in items)
return uuid.uuid5(self.NS_UUID, data)
Expand Down Expand Up @@ -328,7 +328,7 @@ def __init__(
):
self.datasetType = datasetType
if conform:
self.dataId = DataCoordinate.standardize(dataId, graph=datasetType.dimensions)
self.dataId = DataCoordinate.standardize(dataId, dimensions=datasetType.dimensions)
else:
self.dataId = dataId
self.run = run
Expand Down