Description
The xattree decorator, which combines attrs and xarray, works for demonstrations, but it is full of shortcuts.
Vision good
- attrs for a clean, self-describing object model
- xarray data tree as the "skeleton" of the simulation
- xarray dimension handling and coordinate inheritance
Implementation bad
- Stealing from __dict__ means we can't use slotted classes and breaks typical Python object model expectations. Violates principle of least surprise and makes debugging harder.
- Too magical and fragile. Everything is implicit in the xattree decorator, which hijacks the object lifecycle. No separation of responsibilities.
- Dimension lookups are slow and error-prone, and there are no clear precedence rules. These should be registered explicitly.
- Proxying attributes to the data tree is slow, complicated, and surprising, and it makes debugging harder.
- Type-checking/intellisense support is patchy.
Fixing it
Tentatively thinking
- A runtime-checkable protocol through which a component declares which dimensions it defines and how to build a dataset/tree from itself.
- A mixin to manage the tree. Checks protocol compliance, lazily builds the tree on first access, exposes a typed attribute, manages parent/child relationships (delegated to the tree).
This way everything is explicit: no magic and no special semantics to learn, just standard attrs fields and dataclass patterns, with explicit method calls (or forwarding dictionary-style access) for tree operations. Components still manage their own data, and each can be a standalone tree or attach to another tree. We get full intellisense/type-checking and a cleaner separation of concerns.
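To make the protocol-plus-mixin idea concrete, here is a rough sketch of how the pieces might fit together. It is not an implementation proposal: every name in it (`Node`, `TreeComponent`, `TreeMixin`, `dims`, `to_node`, `invalidate`, `attach`) is hypothetical. `Node` is a standard-library stand-in for an xarray tree node, and dataclasses stand in for attrs classes, so the example runs without dependencies.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Protocol, runtime_checkable


@dataclass
class Node:
    """Hypothetical stand-in for an xarray tree node (NOT the xarray API)."""

    name: str
    data: dict[str, Any] = field(default_factory=dict)
    parent: Node | None = field(default=None, repr=False, compare=False)
    children: dict[str, Node] = field(default_factory=dict)

    def attach(self, child: Node) -> None:
        # Parent/child bookkeeping lives in the tree, not in the components.
        child.parent = self
        self.children[child.name] = child


@runtime_checkable
class TreeComponent(Protocol):
    """What a component must declare explicitly (hypothetical protocol)."""

    def dims(self) -> dict[str, int]: ...

    def to_node(self) -> Node: ...


class TreeMixin:
    """Checks protocol compliance and lazily builds the tree on first access."""

    _tree: Node | None = None

    @property
    def tree(self) -> Node:
        if self._tree is None:
            if not isinstance(self, TreeComponent):
                raise TypeError(f"{type(self).__name__} must define dims() and to_node()")
            self._tree = self.to_node()  # one-time build cost
        return self._tree

    def invalidate(self) -> None:
        # Drop the cached tree so the next access rebuilds it.
        self._tree = None

    def attach(self, child: TreeMixin) -> None:
        # Explicit method call, delegated to the tree.
        self.tree.attach(child.tree)


@dataclass
class Grid(TreeMixin):
    nrow: int
    ncol: int

    def dims(self) -> dict[str, int]:
        return {"row": self.nrow, "col": self.ncol}

    def to_node(self) -> Node:
        return Node("grid", {"dims": self.dims()})


@dataclass
class Model(TreeMixin):
    name: str

    def dims(self) -> dict[str, int]:
        return {}

    def to_node(self) -> Node:
        return Node(self.name)


model = Model(name="sim")
grid = Grid(nrow=2, ncol=3)
model.attach(grid)  # builds both trees on first access, then links them
```

In this sketch `grid.nrow` is an ordinary attribute read with no proxying, and the tree is only touched through `tree`, `attach`, and `invalidate`. A component that mutates its data would be responsible for calling `invalidate()`, which is one of the design questions the sketch leaves open.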
The performance profile will also be better suited for the general case (manipulating/accessing data), with faster attribute access. There is a one-time cost to (re)build the tree but presumably that will be a rarer operation.
I think the original idea of avoiding duplicated data by proxying attrs to the tree was misguided, since xarray already just wraps an array rather than copying it. An xarray dataset/tree on top of an attrs class should not add much memory pressure, since the xarray objects just hold a view of the array variables.
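A quick way to check the no-copy claim, assuming the field holds a plain NumPy array (other inputs, such as lists or some dtypes, may be converted and therefore copied). The variable and dimension names here are made up for illustration.

```python
import numpy as np
import xarray as xr

# Array as it might be stored on an attrs field (hypothetical name).
head = np.zeros((2, 3))

# Wrap it in a dataset; for a plain ndarray this is expected to be a view.
ds = xr.Dataset({"head": (("row", "col"), head)})

# True if the dataset variable and the original array share a buffer.
shares = np.shares_memory(head, ds["head"].values)

# A write through the original array should be visible through the dataset.
head[0, 0] = 1.5
seen = ds["head"].values[0, 0]
```

If `shares` is true and `seen` reflects the write, the dataset is only holding a reference to the component's array, which is the behaviour the argument above relies on.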