Data model refactor #167

@wpbonelli

Description

The xattree decorator, which combines attrs and xarray, works for demonstrations but is full of shortcuts.

Vision good

  • attrs for clean, self-describing object model
  • xarray data tree as the "skeleton" of the simulation
  • xarray dimension handling and coordinate inheritance

Implementation bad

  • Stealing from __dict__ rules out slotted classes and breaks typical Python object model expectations. It violates the principle of least surprise and makes debugging harder.
  • Too magical and fragile. Everything is implicit in the xattree decorator, which hijacks the object lifecycle. No separation of responsibilities.
  • Dimension lookups are slow and error-prone, and there are no clear precedence rules. These should be registered explicitly.
  • Proxying attributes to the data tree is slow, complicated, and surprising, and it makes debugging harder.
  • Type-checking/intellisense support is patchy.

Fixing it

Tentatively thinking

  • A runtime-checkable protocol through which a component declares the dimensions it defines and how to build a dataset/tree from itself
  • A mixin to manage the tree. Checks protocol compliance, lazily builds the tree on first access, exposes a typed attribute, manages parent/child relationships (delegated to the tree).
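The two pieces above could look roughly like the sketch below. It is only an illustration under loud assumptions: the names `TreeComponent`, `TreeMixin`, `dims`, `to_node`, `attach` and `invalidate` are hypothetical, stdlib dataclasses stand in for attrs classes, and a plain nested dict stands in for the xarray dataset/tree so the example runs without extra dependencies.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Protocol, runtime_checkable


@runtime_checkable
class TreeComponent(Protocol):
    """Hypothetical protocol: what a component declares explicitly."""

    def dims(self) -> Dict[str, int]:
        """Dimensions this component defines, as name -> size."""
        ...

    def to_node(self) -> Dict[str, Any]:
        """Build this component's own node data (stand-in for a dataset)."""
        ...


class TreeMixin:
    """Hypothetical mixin: checks the protocol and builds the tree lazily."""

    # Cache for the built tree; a nested dict stands in for an xarray tree.
    _tree: Optional[Dict[str, Any]] = None

    @property
    def tree(self) -> Dict[str, Any]:
        if self._tree is None:
            # Protocol compliance is checked once, on first access.
            if not isinstance(self, TreeComponent):
                raise TypeError(f"{type(self).__name__} must define dims() and to_node()")
            self._tree = {"dims": self.dims(), "data": self.to_node(), "children": {}}
        return self._tree

    def attach(self, name: str, child: "TreeMixin") -> None:
        """Parent/child bookkeeping lives in the tree, not on the component."""
        self.tree["children"][name] = child.tree

    def invalidate(self) -> None:
        """Drop the cached tree so the next access rebuilds it."""
        self._tree = None


@dataclass
class Grid(TreeMixin):
    nrow: int
    ncol: int

    def dims(self) -> Dict[str, int]:
        return {"row": self.nrow, "col": self.ncol}

    def to_node(self) -> Dict[str, Any]:
        return {"shape": (self.nrow, self.ncol)}


@dataclass
class Model(TreeMixin):
    name: str

    def dims(self) -> Dict[str, int]:
        return {}

    def to_node(self) -> Dict[str, Any]:
        return {"name": self.name}


grid = Grid(nrow=2, ncol=3)
model = Model(name="demo")
model.attach("grid", grid)  # builds both trees on first access
print(model.tree["children"]["grid"]["dims"])
```

In this sketch, field access such as `grid.nrow` is an ordinary attribute lookup, and the tree is only touched through the explicit `tree` property or `attach` call.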

This way everything is explicit: no magic and no special semantics to learn, just standard attrs fields and dataclass patterns, with explicit method calls (or forwarded dictionary-style access) for tree operations. Components still manage their own data, and each can be a standalone tree or attach to another tree. We get full intellisense/type-checking and a cleaner separation of concerns.

The performance profile will also be better suited for the general case (manipulating/accessing data), with faster attribute access. There is a one-time cost to (re)build the tree but presumably that will be a rarer operation.

I think the original idea of avoiding data duplication by proxying attributes to the tree was misguided, since xarray just wraps an array rather than copying it. An xarray dataset/tree on top of an attrs class should not add much memory pressure, since the xarray objects only hold a view of the array variables.
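A quick way to check the no-copy claim is sketched below. It assumes a plain float NumPy array and uses an illustrative variable name (`head`) and dimension names that are not from the project; other dtypes or array types may behave differently.

```python
import numpy as np
import xarray as xr

# A component-owned array (hypothetical name and shape).
head = np.zeros((2, 3))

# Wrap it in a dataset; for a plain NumPy array this should not copy.
ds = xr.Dataset({"head": (("row", "col"), head)})

# True if the dataset variable and the original array share a buffer.
shares = np.shares_memory(ds["head"].values, head)

# A write through the original array is visible through the dataset.
head[0, 0] = 1.5
seen = float(ds["head"].values[0, 0])

print(shares, seen)
```

If the buffers are shared, the dataset costs only the wrapper objects and metadata on top of the arrays the component already holds.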

Metadata

Labels

backend (Related to the in-memory data model/store), refactor, requirement (Core requirement)
