Skip to content

Conversation

@wpbonelli
Copy link
Member

@wpbonelli wpbonelli commented Dec 7, 2024

Demo how an attrs-based component hierarchy might expose a xarray.DataTree view of itself.

The idea is to provide a .data attribute on input component classes which is either a data tree or a data set, depending whether the item is a node or a leaf in the hierarchy.

This is implemented with a datatree decorator which is applied on top of attrs.define().

@datatree
@define(slots=False)
class Npf:
    ...

It could also be implemented as a base class, or a decorator wrapping attrs.define() directly, though I think the attrs maintainers recommend composing instead of wrapping.

The decorator intercepts the __attrs_post_init__ hook and runs some extra code after it, including registering parent/child relationships so the full tree is accessible from any item in it.

This pattern assumes simulations are constructed from the top down as usual, i.e. first a simulation, which is passed into the model's initializer, which is passed into the package's initializer, etc

sim = Sim()
tdis = Tdis(sim=sim, nper=1, perioddata=[Tdis.PeriodData()])
gwf = Gwf(sim=sim)
dis = Dis(model=gwf)
ic = Ic(model=gwf, strt=1.0)
oc = Oc(model=gwf, perioddata=[Oc.Steps()])
npf = Npf(model=gwf, icelltype=0, k=1.0)

Here is a sample view from the top (simulation) level:

sim.data
<xarray.DataTree 'sim'>
Group: /
├── Group: /tdis
│       Dimensions:     (nper: 1)
│       Dimensions without coordinates: nper
│       Data variables:
│           perioddata  (nper) object 8B Tdis.PeriodData(perlen=1.0, nstp=1, tsmult=1.0)
└── Group: /gwf
    ├── Group: /gwf/dis
    │       Dimensions:  (ncol: 2, nrow: 2, nlay: 1)
    │       Dimensions without coordinates: ncol, nrow, nlay
    │       Data variables:
    │           delr     (ncol) float64 16B 1.0 1.0
    │           delc     (nrow) float64 16B 1.0 1.0
    │           top      (ncol, nrow) float64 32B 1.0 1.0 1.0 1.0
    │           botm     (ncol, nrow, nlay) float64 32B 0.0 0.0 0.0 0.0
    ├── Group: /gwf/ic
    │       Dimensions:  (nodes: 4)
    │       Dimensions without coordinates: nodes
    │       Data variables:
    │           strt     (nodes) float64 32B 1.0 1.0 1.0 1.0
    ├── Group: /gwf/oc
    │       Dimensions:     (nper: 1)
    │       Dimensions without coordinates: nper
    │       Data variables:
    │           perioddata  (nper) object 8B Oc.Steps(first='first', last=None, all=None,...
    └── Group: /gwf/npf
            Dimensions:    (nodes: 4)
            Dimensions without coordinates: nodes
            Data variables:
                icelltype  (nodes) int64 32B 0 0 0 0
                k          (nodes) float64 32B 1.0 1.0 1.0 1.0

Note there are several ways we could represent certain record types (e.g. for period data), there is some flexibility in the conversion from mf6 definitions. This is just a sketch of one option.

Dimensions so far are just dis and tdis i.e. nlay, ncol, nrow, nper.

I think a similar pattern could work for output as well. Output would also need space/time dimensions/coordinates.

I discovered xarray can only represent list data if records are classes — it treats tuples as an additional dimension in the array, instead of considering the tuple the item itself. This is true even of NamedTuple subclasses. This is more reason we should use proper classes for record types I suppose.

There are several shortcuts taken in this demo just to get it working, e.g.

  • the way subcomponents discover dis/tdis is a hack and may need a better implementation
  • shape parsing is currently only done at init time, but it should run as a validation step whenever the attribute is set

@wpbonelli wpbonelli force-pushed the datatree branch 2 times, most recently from c2cc5bc to 66e79a9 Compare December 12, 2024 16:02
@wpbonelli wpbonelli marked this pull request as ready for review December 12, 2024 20:30
@deltamarnix
Copy link
Contributor

mypy is giving quite some errors in the code.

For example this piece

def _parse_dim_names(s: str) -> tuple[str]:
    return tuple( # returns a tuple[str, ...], instead of tuple[str]
        [
            ss.strip()
            for ss in s.strip().replace("(", "").replace(")", "").split(",")
            if any(ss)
        ]
    )

@wpbonelli wpbonelli force-pushed the datatree branch 2 times, most recently from f473fc3 to db6ec10 Compare February 7, 2025 00:51
@wpbonelli
Copy link
Member Author

Ok, I got attrs to proxy xarray, more or less. Very hacky and no testing beyond "hey it prints the tree". But the basics work.

mypy is giving quite some errors in the code.

We should set up mypy on the new prototype branch we discussed, when we make that.

@wpbonelli wpbonelli merged commit e85c0b0 into modflowpy:develop Feb 7, 2025
2 checks passed
@wpbonelli wpbonelli deleted the datatree branch February 7, 2025 15:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants