From 278924f2b643a2b5f69b2ddbc360c9e3b2d6465e Mon Sep 17 00:00:00 2001 From: wpbonelli Date: Thu, 12 Jun 2025 07:28:05 -0400 Subject: [PATCH] update srs and sdd --- docs/dev/sdd.md | 193 +++++++++++++----------------------------------- docs/dev/srs.md | 39 ++++++++++ 2 files changed, 91 insertions(+), 141 deletions(-) diff --git a/docs/dev/sdd.md b/docs/dev/sdd.md index 81fc6598..5fc0863a 100644 --- a/docs/dev/sdd.md +++ b/docs/dev/sdd.md @@ -24,196 +24,107 @@ This is the Software Design Description (SDD) document for FloPy 4, also called *the product*. -This document describes a tentative design, focusing on "core" functional requirements. Some attention may be given to architecture, but non-functional requirements are largely out of scope here. +This document describes a tentative design, focusing on functional requirements. Some attention may be given to architecture, but non-functional requirements are largely out of scope. ## Conceptual model This document follows MODFLOW 6 terminology where applicable, with modifications/translations where appropriate. -MODFLOW 6 is designed as a hierarchy of modular **components**. +A MODFLOW 6 simulation is as a hierarchy of modular **components**. Components encapsulate related data and functionality. -Components encapsulate related functionality and data. Components may have user-specified configuration and/or data **variables**. +Components may have zero or more user-specified **variables** — we use this term interchangeably with **field**, with the latter preferred due to "variable"'s genericity. A field might be a model parameter, e.g. a numeric scalar or array value. Fields which configure non-numerical features of the simulation are called **options**. A field can be required or optional. -Most components must have one particular parent (e.g., models are children of a simulation), but some relax this requirement. - -Component types include: +Components come in several subtypes: -- **simulation**: MF6's "unit of work", consisting of 1+ (possibly coupled) hydrologic processes +- **simulation**: the fundamental "unit of work" in MF6, consisting of 1+ (possibly coupled) hydrologic process(es) - **model**: a simulated hydrological process - **package**: a subcomponent of a model or simulation -Certain subsets of packages have distinguishing characteristics. A **stress package** represents a forcing. A **basic package** contains only input variables applying statically to the entire simulation. An **advanced package** contains time-variable (i.e. transient) input data. Usually only a single instance of a package is expected — when arbitrarily many are permitted, the package is called a **multi-package**. A **subpackage** is a concept only recognized by the product, not by MODFLOW 6 — a package linked to its parent not by a separate input file, but directly (i.e., subpackage data provided to the parent's initializer method). Subpackages may be attached to packages, models, or simulations. +The simulation is the root of the tree, with models and packages under it, each of which itself might have other packages. + +Most components must have one particular parent (e.g., models are children of a simulation), but some relax this requirement. + +Packages come in several flavors, not necessarily mutually exclusive. + +- A **stress package** represents a forcing. +- A **basic package** contains only input variables applying statically to the entire simulation. +- An **advanced package** contains time-varying input variables. +- Most packages are singular — the parent component may have one and only one instance. When arbitrarily many are permitted, the package is called a **multi-package**. +- A **subpackage** is a package whose parent is another package. ```mermaid classDiagram Simulation *-- "1+" Package Simulation *-- "1+" Model Simulation *-- "1+" Variable - Simulation *-- "1+" Subpackage Model *-- "1+" Package - Model *-- "1+" Subpackage Model *-- "1+" Variable Package *-- "1+" Subpackage Package *-- "1+" Variable Subpackage *-- "1+" Variable ``` -Components are specified by **definition files**. A **definition** specifies input variables for a single MF6 component. A **block** is a named collection of input variables. A definition file specifies exactly one component. A component may contain zero or more blocks. Each block must contain at least one variable. +Components are specified by **definition files**. A definition file specifies a single component and its fields. A definition file consists of top-level metadata and a collection of **blocks**. A block is a named collection of fields. A component may contain zero or more blocks. Each block must contain at least one variable. Most components will have a block named "Options" — see the [MODFLOW 6 DFN file specification](https://modflow6.readthedocs.io/en/latest/_dev/dfn.html) for more info. ## Object model -The product's main use cases will include creating, manipulating, running, and inspecting MODFLOW 6 simulations (FUNC-3, FUNC-4). It is natural to provide an object-oriented interface in which every MF6 component module will generally have a corresponding class. - -Component classes will provide access to both **specification** and **data** — that is, to **form** and **content**, respectively. It should be straightforward to read a component's specification off its class definition, or to inspect it programmatically (FUNC-21). Likewise it should be easy to retrieve the value of a variable from an instance of a component (FUNC-4). - -There are many ways to implement an object model in Python: dictionaries, named tuples, plain classes, `dataclasses`, etc. There are fewer options if the object model must be self-describing. - -### `attrs` - -`dataclasses` are derived from an older project called [`attrs`](https://www.attrs.org/en/stable/) which has [extra powers](https://threeofwands.com/why-i-use-attrs-instead-of-pydantic/). - -Our first proof of concept demonstrated a nested hierarchy of hand-rolled classes forming a component tree. Each component stored its data in-house. The specification was attached via metaclass magic. - -We propose to follow the same general pattern, using `attrs` instead for introspection and data access. - -### `xarray` - -XArray provides abstractions for working with multiple datasets related in a hierarchical context, some of which may share the same spatial/temporal indices and/or coordinate systems. - -We propose to follow [imod-python](https://github.com/Deltares/imod-python) in adopting [xarray](https://docs.xarray.dev/en/stable/index.html) as our components' onboard data store. - -### `attrs` + `xarray` - -Combining these patterns naively would result in several challenges, involving duplication, synchronization, and a more general problem reminiscent of [object-relational impedance mismatch](https://en.wikipedia.org/wiki/Object%E2%80%93relational_impedance_mismatch), where the list-oriented and array-oriented paradigms conflict. - -Ultimately, we'd like a mapping between an abstract hierarchy of components and variables, as defined in MF6 definition files, to a Python representation which is self-describing (courtesy of `attrs`) and self-aligning (courtesy of `xarray`). - -The [`DataTree`](https://docs.xarray.dev/en/stable/generated/xarray.DataTree.html) is a recently developed `xarray` extension, now in the core package, which provides [many of the features we want from a hierarchical data store](https://docs.xarray.dev/en/stable/user-guide/hierarchical-data.html). - -## Data types - -MODFLOW 6 defines a [type system for input variables](https://github.com/MODFLOW-USGS/modflow6/tree/develop/doc/mf6io/mf6ivar#variable-types). We adapt this for Python. - -A variable is either a scalar or a composite. - -Scalars include integer, double precision, boolean, string, and path types. The product can represent scalars as builtin Python primitives. +The product's main use cases will include creating, manipulating, running, and inspecting MODFLOW 6 simulations. It is natural to provide an object-oriented interface in which every MF6 component module will generally have a corresponding class. -Composites include array, list, product (record), and sum (union) types. +There are many ways to implement an object model in Python: dictionaries, named tuples, plain classes, `dataclasses`, etc. -Translating from MF6: +Two requirements in particular motivate the design described below: 1) the object model is a tree, and 2) it must be self-describing. -- A "keystring" is a union. -- A "recarray" is a list. +Component classes must provide access to both **specification** and **data** — form and content, respectively. A component's specification should be legible from its class definition, to people and programs. -MF6 places some constraints on composite variables. These are explained below. +Moreover, MODFLOW 6 components are situated in a hierarchy, with the simulation at the root, a branch for each model, and so on for packages, etc. This is true of both specification and data — the specification tree defines how components may be connected together, while a simulation instantiates some subset of the specification. -### Records +A third motivation is consistency with [`imod-python`](https://github.com/Deltares/imod-python), which the product follows in several ways including: -MODFLOW 6 requires that records contain only scalar fields. A record may not contain another record. +- Using [`xarray`](https://docs.xarray.dev/en/stable/index.html) for the underlying data model +- Providing dictionary-style access and modification -In MF6 input files, a record appears as a whitespace-delimited line of text. While in principle a record's fields are named, MF6 may or may not expect particular values to be "tagged" (as indicated in definition files). "Tagging" is a concern of the product's MF6 IO layer, not the core object model. +Components in `imod-python` encode parent/child relations in a dictionary, which is filtered as needed for subcomponents of a particular type. The structure of a simulation (or of any component with respect to its children) is thus flexible. "Structural" checks (i.e., what may be attached to what?) run in a separate validation step. -We expect the product to represent records as full-fledged `attrs` classes. +The product aims instead for typed components, where children can be read off the class definition. This pulls structural validation from runtime to type-checking time, so invalid arrangements are visible in e.g. IDEs with Intellisense. -### Unions +The product adopts the standard library `dataclasses` paradigm for class definitions. The `dataclasses` module is derived from a project called [`attrs`](https://www.attrs.org/en/stable/) with [more power](https://threeofwands.com/why-i-use-attrs-instead-of-pydantic/). `attrs` permits terse class definitions, e.g. -MODFLOW 6 requires that unions contain only records. Unions may not contain scalars directly. A union may not contain another union. - -To represent unions, the product can simply use `typing.Union`. - -### Arrays - -MODFLOW 6 supports N-dimensional arrays of homogeneous (scalar) type, where 1 <= N <= 3. - -We can accept any `numpy.typing.ArrayLike` value, whether a standard `ndarray` or some other flavor (i.e. "duck arrays"). A common case will be lazy (e.g. dask) arrays for larger-than-memory operations. We can [implement custom array-likes](https://numpy.org/doc/stable/user/basics.interoperability.html) if there is a good case for it. - -Arrays can be type hinted in full detail in component classes, e.g. `NDArray[np.floating]`, while methods can generally have more lenient type hints (e.g. `ArrayLike`) and perform any necessary type checks at runtime. - -### Lists - -MODFLOW 6 lists may contain records or unions of records. Lists may not contain raw scalars. Collections of scalars should be provided as arrays. - -A list of records is regular, i.e. tabular. A list of unions can be irregular (i.e. rows can have different element counts) and cannot be treated as tabular data. - -For instance: - -- A `packagedata` block is typically a list of records of a single type (thus regular/tabular). -- A `period` block is typically a list of unions, where each item may be a different record type (thus irregular). - -The product can accept regular list data as Python builtin collections, NumPy arrays of dtype `np.object_`, `xarray.DataArray` or other duck arrays, or tabular data structures, e.g. `np.recarray`, `pd.DataFrame`. - -**Note**: If storing a regular list in a tabular data structure, the product should avoid columns of dtype `np.object_` — e.g. prefer to store grid cell indices as separate columns `i`, `j`, `k`, not a single column. - -The product can accept irregular lists as builtin collections, NumPy arrays of dtype `np.object_`, or `xarray.DataArray` or other duck arrays. - -## Developer workflow - -The product is a core element in day-to-day MODFLOW 6 development. Most critically, the product must be able to generate a MODFLOW 6 interface layer from a specification (FUNC-20). - -Typically, a MODFLOW 6 developer will write a new component specification and module in MODFLOW 6, run the product's code-generation utilities, and use the regenerated MODFLOW 6 interface layer to write integration tests for the ew component. - - - -```mermaid -C4Container - title [Containers] Code generation workflow - - Boundary(mf6, "MODFLOW 6"){ - SystemDb(dfn, "Specification") - } - - Boundary(flopy, "FloPy") { - Boundary(devs, "Developer APIs") { - System(fpycore, "Core framework") - System(fpycodegen, "Code generation") - } - Boundary(users, "User APIs") { - System(fpymf6, "MF6 module") - } - Rel(fpymf6, fpycore, "imports") - - Rel(fpycodegen, dfn, "inspects") - Rel(fpycodegen, fpymf6, "generates") - } - - Person(dev, "Developer", "") - Person(user, "User", "") - - Rel(dev, dfn, "develops") - Rel(dev, fpycore, "develops") - Rel(dev, fpycodegen, "develops/uses") - Rel(user, fpymf6, "uses") - UpdateRelStyle(dev, dfn, $lineColor="blue", $offsetX="-20" $offsetY="-30") - UpdateRelStyle(dev, fpycore, $lineColor="blue", $offsetY="90") - UpdateRelStyle(dev, fpycodegen, $lineColor="blue", $offsetY="50") - UpdateRelStyle(user, fpymf6, $lineColor="blue", $offsetY="50") - UpdateRelStyle(user, fpycore, $lineColor="blue", $offsetX="-20" $offsetY="-10") +```python +from flopy.mf6.gwf import Ic +from attrs import define, field +from numpy.typing import NDArray +import numpy as np +@define +class Ic(Package): + """Initial conditions package""" + strt: NDArray[np.floating] = field(...) + export_array_ascii: bool = field(...) + export_array_netcdf: bool = field(...) ``` -From the MODFLOW 6 developer's perspective, the product's code generation workflow will remain more or less unchanged. +Minimal class definitions are easier to read and to generate from definition files. The trick is in mapping the MODFLOW 6 input specification to the Python type system. With this transformation defined, the original specification can be derived in reverse from the class definition. -We propose a few changes to the underlying implementation: +The product bolts on dictionary-style behavior by implementing `MutableMapping` in a component base class. -1. Use `Jinja2` for code generation -2. Keep the specification in MODFLOW 6 only -3. Distribute the specification with MODFLOW 6 +Where `imod-python` components expose their fields via [a `Dataset`](https://github.com/Deltares/imod-python/blob/master/imod/common/interfaces/ipackagebase.py), components in the product expose a `DataTree` node. The [`DataTree`](https://docs.xarray.dev/en/stable/generated/xarray.DataTree.html) is a recently developed `xarray` feature implementing [a hierarchical data store](https://docs.xarray.dev/en/stable/user-guide/hierarchical-data.html). Components in the product are an [experimental hybrid](https://github.com/modflowpy/xattree) of `attrs` and `xarray` where `attrs` properties, as well as parent/child references, are proxied through the `DataTree`. -Item 1 is an implementation detail. +Combining `attrs` and `xarray` in this way presents challenges involving duplication (`xarray` prefers copies to in-place updates) and synchronization. Some careful management of parent/child links is still required, even though `DataTree` does the majority of the work. -Item 2 will deduplicate the MODFLOW 6 specification and reduce the maintenance/synchronization burden for the product's developers. - -Item 3 will allow the product to generate a corresponding MODFLOW 6 interface layer when a new MF6 executable is installed. +The sparse, record-based list input format used by MODFLOW 6 is also in some tension with `xarray`, where it is natural to disaggregate tables into an array for each constituent column — this requires a nontrivial mapping between data as read from input files and the values eventually accessible through `xarray` APIs. ## IO -TODO - -### Reading input files +IO is at the boundary of the product. Details of any particular input or output format should not contaminate the product's object model. +The product provides an IO framework with which de/serializers can be registered for arbitrary components and formats. -### Writing input files +The product will allow IO to be configured globally, on a per-simulation basis, or at read/write time via method parameters. +IO is implemented in several layers: -### Reading output files +- IO operations, implemented as descriptors, backing `load` and `write` methods on the base component class +- `cattrs` converters to map the object model to/from Python primitives and containers (i.e. un/structuring) +- Encoders/decoders for any number of serialization formats, which translate primitives/containers to strings +In particular, the product will implement a conversion layer and a serialization layer for the MODFLOW 6 input file format. The serialization layer implements a file writer via `Jinja2` templates and a file parser via a `lark` parser generated from an EBNF language specification. diff --git a/docs/dev/srs.md b/docs/dev/srs.md index 9421b8eb..5e316dee 100644 --- a/docs/dev/srs.md +++ b/docs/dev/srs.md @@ -67,6 +67,8 @@ Other libraries and tools may build upon the product to offer more advanced, dom - A MODFLOW developer is setting up a worked example to demonstrate how to use a new feature... +- A MODFLOW 6 developer writes a new component specification and corresponding module, generates a compatible Python interface, and uses it to write integration tests for the new component... + ```mermaid C4Context title [Context] Product use cases @@ -94,6 +96,43 @@ C4Context ``` +```mermaid +C4Context + title [Context] Code generation workflow + + Boundary(mf6, "MODFLOW 6"){ + SystemDb(dfn, "Specification") + } + + Boundary(flopy, "FloPy") { + Boundary(devs, "Developer APIs") { + System(fpycore, "Core framework") + System(fpycodegen, "Code generation") + } + Boundary(users, "User APIs") { + System(fpymf6, "MF6 module") + } + Rel(fpymf6, fpycore, "imports") + + Rel(fpycodegen, dfn, "inspects") + Rel(fpycodegen, fpymf6, "generates") + } + + Person(dev, "Developer", "") + Person(user, "User", "") + + Rel(dev, dfn, "develops") + Rel(dev, fpycore, "develops") + Rel(dev, fpycodegen, "develops/uses") + Rel(user, fpymf6, "uses") + UpdateRelStyle(dev, dfn, $lineColor="blue", $offsetX="-20" $offsetY="-30") + UpdateRelStyle(dev, fpycore, $lineColor="blue", $offsetY="90") + UpdateRelStyle(dev, fpycodegen, $lineColor="blue", $offsetY="50") + UpdateRelStyle(user, fpymf6, $lineColor="blue", $offsetY="50") + UpdateRelStyle(user, fpycore, $lineColor="blue", $offsetX="-20" $offsetY="-10") + +``` + ## Requirements Broadly, the product should