
CESM support #30

Merged: 36 commits merged into marbl-ecosys:main on Sep 12, 2022
Conversation

mnlevy1981 (Collaborator):

This branch cleans up the offline driver to make it easy to set up FEISTY runs from netCDF forcing / IC files. Improvements include:

  1. Improved interpolation, so we no longer need daily forcing fields
  2. A notebook that generates forcing files from post-processed CESM output -- a future PR to add the post-processing to this repo would be great
  3. Updated default settings to match the FOSI run, with a clear example of how to override the defaults for the previous two test cases
  4. Performance improvements to bring runtime in line with the MATLAB code -- relying on numpy instead of xarray for more computations, using the assume_sorted flag in the scipy interpolation routine, and refactoring how the offline driver stores the model state throughout the run

1. Instead of passing the YAML file name and key to _domain_from_netcdf() and
   _forcing_from_netcdf(), we read the YAML in config_from_netcdf() and pass
   the dictionary to both routines
2. Added a biomass_init option to the constructor for feisty_instance_type --
   currently 'constant' is the only acceptable value, but this will let us read
   initial conditions from netCDF as well (needed for FOSI forcing)
The biggest enhancement is that users can now specify a netCDF file containing
initial conditions (split into fish_ic and bent_ic variables). I also did a
little more performance tuning, relying on numpy instead of xarray in a few
more spots in the code.
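For illustration, reading such an IC file might look like this (a minimal sketch; the filename is hypothetical, and the fish_ic / bent_ic names come from the commit message):

import xarray as xr

ds_ic = xr.open_dataset('FOSI_ic.nc')  # hypothetical filename
fish_ic_data = ds_ic['fish_ic'].data   # .data hands back plain numpy arrays,
bent_ic_data = ds_ic['bent_ic'].data   # in line with the numpy tuning above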
Modified the interpolation routine to use the first forcing date when the model
date is earlier than that, and the last forcing date when the model date is
later (rather than extrapolating). This new approach to forcing allowed me to
remove _forcing_t() from the offline_driver class.
Also included a big refactor of matlab-comparison.ipynb to make it easier to
add additional tests (e.g. we will want a FOSI_spinup option)
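A minimal sketch of the clamping behavior (illustrative names, not the actual driver code):

import numpy as np

def interp_forcing(forcing, t):
    """Interpolate forcing to model time t, clamping rather than extrapolating."""
    t_min = forcing.time.data[0]   # first forcing date
    t_max = forcing.time.data[-1]  # last forcing date
    return forcing.interp(time=np.clip(t, t_min, t_max))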
1. Use assume_sorted=True in xr.Dataset.interp()
2. Use .data in more copies so that we're not spending cycles copying metadata
Also added some primitive timing info
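Both tweaks, for illustration (variable names are placeholders): assume_sorted=True tells scipy the time coordinate is already monotonic, skipping a check, and .data copies raw numpy values instead of rebuilding DataArray metadata.

forcing_t = forcing.interp(time=t, assume_sorted=True)  # no sortedness check
state_array[:, :] = forcing_t['zoo_mort'].data          # numpy copy, no metadata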
The offline driver now has self._max_output_time_dim (default=365), and _ds is
a list of datasets whose time dimension is no larger than that setting. After
running, self.gen_ds() calls xr.concat() to replace the list with a single
dataset.
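The concatenation step, roughly (a sketch of the described behavior; concatenating along a 'time' dimension is an assumption):

import xarray as xr

def gen_ds(self):
    # stitch the list of time-limited datasets back into a single dataset
    self._ds = xr.concat(self._ds, dim='time')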
The offline driver can now create an initial condition file from the last
timestep of the existing state datasets.

Also added support for passing the max_output_time_dim argument through
config_testcase() and config_from_netcdf(), and am committing an updated
matlab-comparison.ipynb to show how to use several of the new driver options.
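A rough sketch of the IC-file creation described two paragraphs up (the overwrite handling and method body are guesses, not the actual implementation):

import os

def create_ic_file_from_final_state(self, ic_file, overwrite=False):
    if os.path.exists(ic_file) and not overwrite:
        raise FileExistsError(f'{ic_file} exists; pass overwrite=True to replace it')
    self.gen_ds()                              # collapse _ds to a single dataset
    self._ds.isel(time=-1).to_netcdf(ic_file)  # keep only the final timestep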
Used the monthly fields from Kristen's post-processed output. The notebook
contains raw cells showing how I converted .mat output to initial conditions,
but we also have a FOSI_spinup option that should be able to generate new IC
files instead.
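Those raw cells presumably do something along these lines (a hedged sketch; the .mat filename, key names, and dimensions are all made up):

import scipy.io
import xarray as xr

mat = scipy.io.loadmat('matlab_spinup.mat')  # hypothetical MATLAB output file
ds = xr.Dataset(
    {
        'fish_ic': (('group', 'X'), mat['fish_biomass']),   # assumed keys/dims
        'bent_ic': (('X',), mat['bent_biomass'].squeeze()),
    }
)
ds.to_netcdf('FOSI_ic.nc')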
mnlevy1981 (Collaborator, Author):

This PR will remain a draft until I update the documentation

Needed to add a call to testcase.gen_ds()
1. Added missing testcase.gen_ds() calls
2. Updated the API in the test routines
3. Made sure the test_feisty.py tests were using the same parameters for
   testing as were used when generating the baseline
4. food_web.get_consumption() returns a numpy array, not a DataArray

Also made sure driver-example.ipynb used old settings (for consistency with
prior commits)
When building documentation, we don't want to re-run FOSI-forcing.ipynb
mnlevy1981 (Collaborator, Author):

The only current test failure is

Traceback (most recent call last):
  File "/home/runner/.cache/pre-commit/repopwsd7jk3/py_env-python3.10/bin/black", line 8, in <module>
    sys.exit(patched_main())
  File "/home/runner/.cache/pre-commit/repopwsd7jk3/py_env-python3.10/lib/python3.10/site-packages/black/__init__.py", line 1282, in patched_main
    patch_click()
  File "/home/runner/.cache/pre-commit/repopwsd7jk3/py_env-python3.10/lib/python3.10/site-packages/black/__init__.py", line 1268, in patch_click
    from click import _unicodefun  # type: ignore
ImportError: cannot import name '_unicodefun' from 'click' (/home/runner/.cache/pre-commit/repopwsd7jk3/py_env-python3.10/lib/python3.10/site-packages/click/__init__.py)

It looks like this is due to an incompatibility between black and click that has since been fixed upstream, but it's not clear to me how the environment used by pre-commit in linting.yaml is created, so I don't know where to either pin click to an older version or update to the newest black.

api-driver.rst now references the offline_driver class (previously the
simulation class), and api-internal.rst no longer references the
compute_benthic_biomass_update() function, since that was removed a while ago
mnlevy1981 (Collaborator, Author):

Assuming testing passes (except for the failure due to black / click), this is ready to be merged, and I believe the documentation will be updated once this is available on main.

@mnlevy1981 mnlevy1981 marked this pull request as ready for review April 6, 2022 22:03
1. Add the matlab-comparison notebook to the docs
2. Update matlab-comparison to include more details on what is in the notebook
3. Update various environments so hopefully everything builds correctly
The new black version flagged a formatting issue that was previously okay
mnlevy1981 (Collaborator, Author):

> It looks like this is due to an incompatibility between black and click that has since been fixed upstream, but it's not clear to me how the environment used by pre-commit in linting.yaml is created

I ran pre-commit autoupdate and that brought in newer versions of these packages, so we're passing CI again

matt-long (Contributor) left a comment:

Looks great!

@@ -274,8 +276,9 @@ def is_small(self):
 class zooplankton_type(object):
     """Data structure containing zooplankton parameters."""

-    def __init__(self, name, **kwargs):
+    def __init__(self, glob_id, name, **kwargs):
matt-long (Contributor):

can we change glob_id to global_id or group_id?

@@ -53,6 +55,7 @@ def __init__(
     settings_dict={},
     fish_ic_data=None,
     benthic_prey_ic_data=None,
+    biomass_init='constant',
matt-long (Contributor):

ensure that this plays nicely with fish_ic_data

matt-long (Contributor):

another option:

  • biomass_init --> biomass_init_file, default None

    if biomass_init_file is not None:
        ds_ic = xr.open...
        assert varnames for all taxa in ds_ic
        # use it
    else:
        # use fish_ic_data
        # use benthic_ic_data
        ....
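Fleshed out, that suggestion might look like this (a sketch; the fish_ic / bent_ic variable names and the scalar defaults are taken from elsewhere in this PR):

import xarray as xr

def init_biomass(biomass_init_file=None, fish_ic_data=1e-5, benthic_prey_ic_data=2e-3):
    if biomass_init_file is not None:
        ds_ic = xr.open_dataset(biomass_init_file)
        for varname in ('fish_ic', 'bent_ic'):
            assert varname in ds_ic, f'{varname} missing from {biomass_init_file}'
        return ds_ic['fish_ic'].data, ds_ic['bent_ic'].data
    # fall back to the scalar initial conditions
    return fish_ic_data, benthic_prey_ic_data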

@@ -53,6 +55,7 @@ def __init__(
settings_dict={},
fish_ic_data=None,
matt-long (Contributor):

Suggested change:
-    fish_ic_data=None,
+    fish_ic_data=1e-5,
+    benthic_ic_data=2e-3,

self.tendency_data.predation_rate[:, :] = self.tendency_data.predation_flux.data / np.maximum(
    self.biomass.data[self.ndx_fish, :], 1e-300
)
matt-long (Contributor):

should 1e-300 be replaced by eps?

Perhaps we should also introduce a parameter for minimum biomass?
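For illustration, the requested change would look roughly like this (constants.eps standing in for the package's epsilon constant named later in this review):

self.tendency_data.predation_rate[:, :] = self.tendency_data.predation_flux.data / np.maximum(
    self.biomass.data[self.ndx_fish, :], constants.eps
)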

@@ -229,24 +227,28 @@ def compute_rescale_zoo_consumption(
 for zoo_i in food_web.zoo_names:

     link_ndx = food_web.prey_link_ndx[zoo_i]
-    biomass_zoo_pred = food_web._get_biomass_zoo_pred(biomass, zoo_i)
+    biomass_zoo_pred = np.maximum(food_web._get_biomass_zoo_pred(biomass, zoo_i).data, 1e-300)
matt-long (Contributor):

use constants.eps

@@ -110,6 +109,10 @@ def __init__(
     settings_in={},
     fish_ic_data=None,
     benthic_prey_ic_data=None,
+    biomass_init='constant',
+    allow_negative_forcing=False,
     diagnostic_names=[],
matt-long (Contributor):

do we want to provide a utility to query available diags? Or have a minimal set of defaults?
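One possible shape for such a utility (hypothetical; neither the function nor the registry exists in this PR -- the two diagnostic names are taken from the snippet below):

_available_diagnostics = [
    'consumption_rate_max_pred',
    'consumption_rate_link',
]

def available_diagnostics():
    """Return the diagnostics that can be requested via diagnostic_names."""
    return sorted(_available_diagnostics)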

'consumption_rate_max_pred',
'consumption_rate_link',
]
self._diagnostic_names = diagnostic_names
matt-long (Contributor):

change _diagnostic_names to _add_diagnostic_names?

"""
self._ds = []
start_dates = [self.start_date]
days_per_year = [np.min([nt, self._max_output_time_dim])]
matt-long (Contributor):

days_per_year ---> time_levs_per_ds

"""Return the feisty time tendency."""
gcm_data_t = self._forcing_t(t)
gcm_data_t = self.forcing.interp(time=forcing_t, assume_sorted=True)
if not self.allow_negative_forcing:
matt-long (Contributor):

might this logic belong in feisty_instance?

else:
print(f'Finished _solve at {time.strftime("%H:%M:%S")}')

def create_ic_file_from_final_state(self, ic_file, overwrite=False):
matt-long (Contributor):

Suggested change:
-def create_ic_file_from_final_state(self, ic_file, overwrite=False):
+def create_restart_file(self, ic_file, overwrite=False):

Moved some functions into a new utils module, and then flake8 helped me clean
up bugs in those functions that hadn't been caught in the notebook

Also removed prettier from pre-commit, because it wasn't playing nicely on
cheyenne (CalledProcessError with node because GLIBC versions weren't found)
Prettier is back in the pre-commit suite, and the default_language_version
option lets it install without the GLIBC errors
The forcing_time array wasn't being set properly for non-cyclic forcing

Also cleaned up matlab-comparison.ipynb following a code walkthrough:

1. renamed "testcase" -> "feisty_driver"
2. set matlab_script in one of the top cells (easy for others to change)
Instead of a cell that prints matlab_script, add that output to the cell that
calls .run(); also, FOSI_cesm should use initial conditions spun up in Python
rather than MATLAB
1. Renamed glob_id -> group_ind
2. Clean up how we initialize biomass (biomass_init_file = None => fallback to
   fish_ic_data and benthic_prey_ic_data)
3. 1e-300 -> constants.eps
4. days_per_year -> time_levs_per_ds
5. create_ic_file_from_final_state -> create_restart_file
mnlevy1981 (Collaborator, Author):

071770d cleans up the straightforward comments @matt-long made, but the following are NOT yet addressed:

  1. better logic surrounding which diagnostics are included in the output
  2. moving the forcing logic into feisty_instance

mnlevy1981 (Collaborator, Author):

Also still working on parallelizing matlab-comparison.ipynb

domain.py and ecosystem.py no longer use module (global) variables; instead,
the interface class initializes self.domain_params and self.ecosys_params and
passes those objects to functions that rely on the previously-global variables.
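Schematically (the attribute names come from the commit message; the method body and helper types are illustrative):

class feisty_instance_type:
    def __init__(self, domain_dict, settings_dict):
        # these were previously module-level globals in domain.py / ecosystem.py
        self.domain_params = domain_params_type(domain_dict)    # hypothetical type
        self.ecosys_params = ecosys_params_type(settings_dict)  # hypothetical type

    def compute_tendencies(self, state):
        # parameter objects are now passed explicitly to the compute routines
        return _compute_tendencies(state, self.domain_params, self.ecosys_params)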
mnlevy1981 (Collaborator, Author):

ea12b57 addresses #20 because we can't parallelize (part of #21) with global variables in some of the modules

When I changed glob_id -> group_ind, I missed all the occurrences in
interface.py. I also need to add docs/__init__.py to the repository so that
matlab-examples.ipynb can import functions from utils.py
When dimensions don't match in interface.py, include the expected / received
dims in the exception
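For example (a sketch of the improved message, with illustrative names):

if da.dims != expected_dims:
    raise ValueError(f'dimension mismatch: expected {expected_dims}, received {da.dims}')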
Needed to rename time -> forcing_time in the forcing dataset so that dask
wouldn't get confused when the template for the output had a time dimension
that didn't match what was in the forcing
Also cleaned up some of the driver code when testing uncovered small bugs
Just a little more re-organizing
Also removed python3.9 from the environment because it was overriding the
python_version specified in the CI matrix. I'll test the notebooks with 3.10
again, and if dask is still not playing nicely I'll figure out a good way to
have users downgrade to 3.9
Also updated docs/_config.yml to hopefully pass CI
Remove the "with dask" matlab comparison notebook and rename the "with dask2"
comparison notebook
Refactored offline_driver to combine config and run into a single function,
which can then use xr.map_blocks() to wrap a call to a function that actually
creates the offline_driver object and steps through time in parallel
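The pattern, roughly (a sketch; only xr.map_blocks, gen_ds(), and the forcing_time rename are from the commits above -- the function and variable names are assumptions):

import xarray as xr

def _config_and_run(forcing_block):
    # build an offline_driver for this block of the domain, run it,
    # and return the block's output dataset
    driver = config_and_run_from_dataset(forcing_block)  # hypothetical factory
    driver.gen_ds()
    return driver._ds

# template describes the output's shape/coords so dask can schedule the blocks;
# renaming the forcing's time dimension to forcing_time (see above) keeps it
# from colliding with the template's time dimension
ds_out = xr.map_blocks(_config_and_run, forcing, template=template)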
Contains a notebook for processing FOSI, as well as a .py script (that uses
dask-mpi) and a shell script that submits the .py script to casper

Also added the "pip install -e" to environment.yml (along with dask-mpi and its
dependencies), and hopefully cleaned up the CI without breaking anything
Also added a README to describe the variables (the prettier pre-commit check
made it hard to do this with comments in the YAML itself)
FOSI_cesm.ipynb has FOSI-specific parameters hard-coded, but with the .py
script using YAML it makes sense to give it a more generic name.
Re-organize / add comments in the YAML file; only write an output file in the
Python driver if one is provided in the YAML
1. Support for reading raw POP output instead of requiring post-processing
2. Support for reading multiple files when generating the forcing dataset
   (e.g. provide a list of multiple POP time series files)
3. Support for mapping output back to the POP grid (see the sketch after this
   list)
   -- read in POP data on the nlat, nlon grid
   -- map to a single horizontal dimension (X)
   -- run FEISTY
   -- map output back to the nlat, nlon grid
Also, spelled "high" correctly in the highres_companion.yaml file name and
updated a notebook to only provide 2D output if requested in the YAML file
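One way to express the grid round trip from item 3 in xarray (a sketch, not the PR's code; the filename is hypothetical and feisty_output stands for the model's output dataset):

import xarray as xr

ds_pop = xr.open_dataset('pop_forcing.nc')   # data on the (nlat, nlon) grid
ds_X = ds_pop.stack(X=('nlat', 'nlon'))      # collapse to one horizontal dim
# ... run FEISTY on the X dimension ...
ds_out = feisty_output.unstack('X')          # map back to (nlat, nlon)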
mnlevy1981 (Collaborator, Author) commented Sep 9, 2022:

@matt-long I think this is ready for review. The examples/ directory shows how to set up YAML files to point FEISTY to forcing data, and I'm happy to walk through that with you.

Looking back at previous comments, I'm not sure I ever addressed this:

> 071770d cleans up the straightforward comments @matt-long made, but the following are NOT yet addressed:
>
>   • better logic surrounding which diagnostics are included in the output
>   • moving the forcing logic into feisty_instance

@mnlevy1981 mnlevy1981 merged commit e411e79 into marbl-ecosys:main Sep 12, 2022
@mnlevy1981 mnlevy1981 deleted the cesm_support branch January 17, 2023 17:39