Improve reading model output #194

Merged
merged 8 commits into from
Jul 20, 2023

Collaborator

@dbrakenhoff dbrakenhoff commented Jun 21, 2023

Use dask.delayed to load model output. With delayed=True, the data is not read into memory up front. Optionally, the data array can be chunked to allow memory-efficient calculations on large arrays.

Adds two kwargs to output methods:

  • delayed, if True, do not load data into memory, default is False
  • chunked, if True, chunk data array using da.chunk("auto"), default is False

The default behavior is the same as before. For memory-intensive output, use delayed=True and optionally chunked=True, e.g.:

heads_orig = nlmod.gwf.output.get_heads_da(ds)  # read all data into memory
heads_delayed = nlmod.gwf.output.get_heads_da(ds, delayed=True)  # memory efficient
heads_chunked = nlmod.gwf.output.get_heads_da(ds, delayed=True, chunked=True)  # chunked
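
To illustrate the idea behind delayed=True, here is a minimal sketch of lazy loading with dask.delayed. It is not the code in this PR: read_timestep is a hypothetical stand-in for reading one time step from a binary head file, and the array shapes are made up for the example.

```python
import dask
import dask.array as da
import numpy as np

calls = []  # records which time steps were actually read


def read_timestep(i, shape=(2, 3)):
    """Hypothetical reader: pretend to load one time step from disk."""
    calls.append(i)
    return np.full(shape, float(i))


# Wrap each read in dask.delayed so that no data is read when the array is built.
lazy_steps = [
    da.from_delayed(dask.delayed(read_timestep)(i), shape=(2, 3), dtype=float)
    for i in range(4)
]
heads = da.stack(lazy_steps)  # lazy array with shape (4, 2, 3)

reads_before_compute = list(calls)  # still empty, nothing has been read yet

# Reading happens only when values are requested.
last = heads[-1].compute()
```

Selecting a single time step before calling compute() means dask only needs the read for that step, which is what keeps memory use low for large output files.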

- add kwarg delayed, if False, load data into memory, else return data array with delayed dask arrays
- add kwarg chunked, if True, chunk data array with chunks="auto"
- add x,y data to vertex grid data array
Collaborator

@OnnoEbbens OnnoEbbens left a comment

Good job!

I was wondering whether we should make these functions cacheable using the cache_netcdf decorator, but I think it is too hard (with too little gain) because cache_netcdf requires a dataset as input, and for these functions the dataset is optional. However, if these functions do become annoyingly slow we could re-evaluate, because it is certainly possible to use the cache here.

- fix comments
- add docstrings
- fix where command for dry/noflow
- improve support for loading output without gwf or ds
- new folder mfoutput
- add new flopy binary read functions that support multithreading (binaryfile.py)
- separate reading budget and head output
- modify gwf.output and gwt.output to use new methods
- split logic into multiple reusable functions
- add support for grb files
- add/improve tests
- add method to obtain dims, coords from modelgrid object
@dbrakenhoff
Collaborator Author

Alright, the idea is still the same but I refactored the code significantly to increase readability and simplify things.

The idea now is that the binary output file (HeadFile or CellBudgetFile) and the modelgrid should contain enough information to construct a DataArray. If you pass in only a filename, e.g. get_heads_da(fname=fname), you will receive a warning that the grid information is missing. You can still load the data, but the grid will be a default grid definition that does not contain the correct spatial coordinates. The grid information can be provided by passing the grbfile=<path to binary grid file> keyword argument.
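
As a rough sketch of this fallback behavior (not the actual nlmod implementation): resolve_grid_source is a hypothetical helper, and the priority order ds, gwf, grbfile is an assumption made only for the example.

```python
import warnings


def resolve_grid_source(ds=None, gwf=None, grbfile=None):
    """Hypothetical helper: report where the grid information comes from."""
    if ds is not None:
        return "ds"
    if gwf is not None:
        return "gwf"
    if grbfile is not None:
        return "grbfile"
    # Nothing describes the grid: warn, but still allow the data to be loaded.
    warnings.warn(
        "No grid information provided; using a default grid without correct "
        "spatial coordinates. Pass grbfile=<path to binary grid file>.",
        UserWarning,
    )
    return "default"


# Only a filename is known, so the fallback triggers a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    source_without_grid = resolve_grid_source()

# The file name below is a made-up example path.
source_with_grb = resolve_grid_source(grbfile="model.dis.grb")
```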

  • The file mfoutput/mfoutput.py contains general logic for converting data from flopy binary file objects (HeadFile and CellBudgetFile) to data arrays. I defined a bunch of helper functions to reduce duplication and keep functions short.
  • The file mfoutput/binaryfile.py contains code to read data from binary output files with support for multithreading. The code is copied from flopy, but trimmed down to only the necessary parts. We do not support the same range of data access options as flopy.
  • gwt/output.py has been modified to use these new general functions for concentration data
  • gwf/output.py has been modified to use these new general functions for head and budget data
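
For readers wondering why reading with multiple threads needs separate code, here is a small self-contained sketch of the general pattern. It does not use the MODFLOW or flopy binary format: the record layout (one time stamp followed by NCELLS values) and all function names are assumptions for illustration only. The key point is that every read opens its own file handle, so threads never share a file position.

```python
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

NCELLS = 5  # made-up number of cells per record
RECORD = struct.Struct(f"<d{NCELLS}d")  # time stamp followed by NCELLS doubles


def write_demo_file(path, nrecords):
    """Write a small demo file with fixed-size records."""
    with open(path, "wb") as f:
        for k in range(nrecords):
            values = [k + 0.1 * c for c in range(NCELLS)]
            f.write(RECORD.pack(float(k), *values))


def read_record(path, k):
    """Read record k using a private file handle, safe to call from any thread."""
    with open(path, "rb") as f:
        f.seek(k * RECORD.size)
        time, *values = RECORD.unpack(f.read(RECORD.size))
    return time, values


def read_all(path, nrecords, max_workers=4):
    """Read all records concurrently; results are returned in record order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda k: read_record(path, k), range(nrecords)))


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.bin"
    write_demo_file(demo, nrecords=3)
    records = read_all(demo, nrecords=3)
```

A reader object that keeps one open file handle and seeks on it cannot be shared safely between threads, which is presumably why a trimmed-down reader was needed here.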

Collaborator

@OnnoEbbens OnnoEbbens left a comment

see my comments

- codacy
- fix comments @OnnoEbbens
- some additional fixes
@dbrakenhoff dbrakenhoff merged commit 945e302 into dev Jul 20, 2023
@dbrakenhoff dbrakenhoff deleted the mfoutput branch July 20, 2023 11:59