CESM support #30
Conversation
1. Instead of passing the YAML file name and key to `_domain_from_netcdf()` and `_forcing_from_netcdf()`, we read the YAML in `config_from_netcdf()` and pass the dictionary to both routines
2. Added a `biomass_init` option to the constructor for `feisty_instance_type` -- currently `'constant'` is the only acceptable value, but this will let us read initial conditions from netCDF as well (needed for FOSI forcing)
The biggest enhancement is that users can now specify a netCDF file containing initial conditions (split into `fish_ic` and `bent_ic` variables). I also did a little more performance tuning, relying on numpy instead of xarray in a few more spots in the code.
Modified the interpolation routine to use the first forcing date when the model date is earlier than that, and the last forcing date when the model date is beyond it (rather than extrapolating). This new approach to forcing let me remove `_forcing_t()` from the `offline_driver` class.
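A minimal illustration of that clamping behaviour (the forcing dates and values here are made up): `np.interp` holds endpoint values by default rather than extrapolating.

```python
import numpy as np

# hypothetical forcing dates and values
forcing_time = np.array([0.0, 10.0, 20.0])
forcing_vals = np.array([1.0, 2.0, 3.0])

# model date before the first forcing date -> first forcing value
early = np.interp(-5.0, forcing_time, forcing_vals)

# model date beyond the last forcing date -> last forcing value
late = np.interp(25.0, forcing_time, forcing_vals)
```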
Also included big refactor of matlab-comparison.ipynb to make it easier to add additional tests (e.g. we will want a FOSI_spinup option)
1. Use `assume_sorted=True` in `xr.Dataset.interp()`
2. Use `.data` in more copies so that we're not spending cycles copying metadata
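A rough sketch of both points (the dataset here is a toy stand-in, not FEISTY's actual forcing): `assume_sorted=True` skips re-sorting an already-monotonic coordinate, and `.data` pulls out the bare numpy array.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {'zooC': ('forcing_time', np.array([1.0, 2.0, 4.0]))},
    coords={'forcing_time': np.array([0.0, 1.0, 2.0])},
)

# assume_sorted=True tells the underlying scipy interpolator the
# coordinate is already monotonic, avoiding a sort on every call
interped = ds.interp(forcing_time=0.5, assume_sorted=True)

# .data gives the bare numpy array, skipping xarray metadata handling
val = interped['zooC'].data
```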
Also added some primitive timing info
The offline driver now has `self._max_output_time_dim` (default: 365), and `_ds` is a list of datasets, each with a time dimension no larger than that setting. After running, `self.gen_ds()` will run `xr.concat()` to replace the list with a single dataset.
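A toy sketch of that pattern (the dataset contents, variable names, and chunk size here are made up; the real driver accumulates state datasets as it steps through time):

```python
import numpy as np
import xarray as xr

def split_then_concat(nt, max_output_time_dim=3):
    """Accumulate small datasets, then concatenate once at the end."""
    pieces = []
    for start in range(0, nt, max_output_time_dim):
        n = min(max_output_time_dim, nt - start)
        pieces.append(
            xr.Dataset(
                {'biomass': ('time', np.arange(start, start + n, dtype=float))},
                coords={'time': np.arange(start, start + n)},
            )
        )
    # one concat at the end is cheaper than growing a dataset every step
    return xr.concat(pieces, dim='time')

ds = split_then_concat(7)
```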
The offline driver can now create an initial condition file from the last timestep of the existing state datasets. Also added support to pass the argument `max_output_time_dim` through `config_testcase()` and `config_from_netcdf()`, and am committing an updated matlab-comparison.ipynb to show how to use several new options in the driver.
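The final-state-to-IC step might look roughly like this (toy dataset and variable names; in the driver the result would then be written out with `.to_netcdf(ic_file)`):

```python
import numpy as np
import xarray as xr

# toy state dataset standing in for the driver's accumulated output
ds = xr.Dataset(
    {'fish_ic': (('time', 'X'), np.arange(6.0).reshape(3, 2))},
    coords={'time': np.arange(3)},
)

# the initial condition is just the final time level of the state;
# selecting with an integer index drops the time dimension
ic = ds.isel(time=-1)
```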
Used monthly fields from Kristen's post-processed output. Contains raw cells that show how I converted .mat output to initial conditions, but we also have a `FOSI_spinup` option that should be able to generate new IC files instead.
This PR will remain a draft until I update the documentation.
Needed to add a call to `testcase.gen_ds()`
1. Added missing `testcase.gen_ds()` calls
2. Updated the API in test routines
3. Made sure the `test_feisty.py` tests were using the same parameters for testing as were used when generating the baseline
4. `food_web.get_consumption()` returns a numpy array, not a DataArray

Also made sure driver-example.ipynb used the old settings (for consistency with prior commits)
When building documentation, we don't want to re-run FOSI-forcing.ipynb
The only current test failure is:

```
Traceback (most recent call last):
  File "/home/runner/.cache/pre-commit/repopwsd7jk3/py_env-python3.10/bin/black", line 8, in <module>
    sys.exit(patched_main())
  File "/home/runner/.cache/pre-commit/repopwsd7jk3/py_env-python3.10/lib/python3.10/site-packages/black/__init__.py", line 1282, in patched_main
    patch_click()
  File "/home/runner/.cache/pre-commit/repopwsd7jk3/py_env-python3.10/lib/python3.10/site-packages/black/__init__.py", line 1268, in patch_click
    from click import _unicodefun  # type: ignore
ImportError: cannot import name '_unicodefun' from 'click' (/home/runner/.cache/pre-commit/repopwsd7jk3/py_env-python3.10/lib/python3.10/site-packages/click/__init__.py)
```

It looks like this is due to an incompatibility between …
api-driver.rst now references the offline_driver class (previously the simulation class), and api-internal.rst no longer references the compute_benthic_biomass_update() function since that was removed a while ago
Assuming the testing passes (except the failure due to …
1. Add the matlab-comparison notebook to the docs
2. Update matlab-comparison to include more details on what is in the notebook
3. Update various environments so hopefully everything builds correctly
The new black version flagged a formatting issue that was previously okay
I ran …
Looks great!
feisty/core/ecosystem.py (outdated):

```diff
@@ -274,8 +276,9 @@ def is_small(self):
 class zooplankton_type(object):
     """Data structure containing zooplankton parameters."""

-    def __init__(self, name, **kwargs):
+    def __init__(self, glob_id, name, **kwargs):
```
can we change `glob_id` to `global_id` or `group_id`?
feisty/core/interface.py (outdated):

```diff
@@ -53,6 +55,7 @@ def __init__(
     settings_dict={},
     fish_ic_data=None,
     benthic_prey_ic_data=None,
+    biomass_init='constant',
```
ensure that this plays nicely with `fish_ic_data`
another option: rename `biomass_init` to `biomass_init_file`, default `None`:

```python
if biomass_init_file is not None:
    ds_ic = xr.open...
    # assert varnames for all taxa in ds_ic
    # use it
else:
    # use fish_ic_data
    # use benthic_ic_data
    ...
```
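Fleshed out as a runnable sketch of this suggestion (the variable names follow the comment above; the constant defaults of `1e-5` and `2e-3` are illustrative, echoing values suggested elsewhere in this review):

```python
import xarray as xr

def init_biomass(biomass_init_file=None, fish_ic_data=1e-5, benthic_ic_data=2e-3):
    """If an IC file is given it wins; otherwise fall back to constants."""
    if biomass_init_file is not None:
        ds_ic = xr.open_dataset(biomass_init_file)
        # require IC variables for all taxa before using the file
        assert {'fish_ic', 'bent_ic'} <= set(ds_ic.data_vars)
        return ds_ic['fish_ic'].data, ds_ic['bent_ic'].data
    return fish_ic_data, benthic_ic_data

fish, bent = init_biomass()  # no file -> constant fallback
```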
feisty/core/interface.py (outdated):

```diff
@@ -53,6 +55,7 @@ def __init__(
     settings_dict={},
     fish_ic_data=None,
```
Suggested change:

```diff
-    fish_ic_data=None,
+    fish_ic_data=1e-5,
+    benthic_ic_data=2e-3,
```
feisty/core/interface.py (outdated):

```python
self.tendency_data.predation_rate[
    :, :
] = self.tendency_data.predation_flux.data / np.maximum(
    self.biomass.data[self.ndx_fish, :], 1e-300
)
```
should `1e-300` be replaced by `eps`? Perhaps we should also introduce a parameter for minimum biomass?
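For reference, a floor like this keeps the division finite even when biomass underflows to zero (the choice of `np.finfo('float64').eps` as the `eps` value is an assumption; the project's `constants.eps` may be defined differently):

```python
import numpy as np

eps = np.finfo('float64').eps  # one plausible choice for constants.eps

biomass = np.array([0.0, 1e-320, 0.5])
predation_flux = np.array([1.0, 1.0, 1.0])

# flooring the denominator keeps the rate finite where biomass vanishes
predation_rate = predation_flux / np.maximum(biomass, eps)
```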
feisty/core/process.py (outdated):

```diff
@@ -229,24 +227,28 @@ def compute_rescale_zoo_consumption(
     for zoo_i in food_web.zoo_names:

         link_ndx = food_web.prey_link_ndx[zoo_i]
-        biomass_zoo_pred = food_web._get_biomass_zoo_pred(biomass, zoo_i)
+        biomass_zoo_pred = np.maximum(food_web._get_biomass_zoo_pred(biomass, zoo_i).data, 1e-300)
```
use `constants.eps`
```diff
@@ -110,6 +109,10 @@ def __init__(
     settings_in={},
     fish_ic_data=None,
     benthic_prey_ic_data=None,
+    biomass_init='constant',
+    allow_negative_forcing=False,
+    diagnostic_names=[],
```
do we want to provide a utility to query available diags? Or have a minimal set of defaults?
```python
    'consumption_rate_max_pred',
    'consumption_rate_link',
]
self._diagnostic_names = diagnostic_names
```
change `_diagnostic_names` to `_add_diagnostic_names`?
feisty/offline_driver.py (outdated):

```python
    """
    self._ds = []
    start_dates = [self.start_date]
    days_per_year = [np.min([nt, self._max_output_time_dim])]
```
`days_per_year` --> `time_levs_per_ds`
"""Return the feisty time tendency.""" | ||
gcm_data_t = self._forcing_t(t) | ||
gcm_data_t = self.forcing.interp(time=forcing_t, assume_sorted=True) | ||
if not self.allow_negative_forcing: |
might this logic belong in `feisty_instance`?
feisty/offline_driver.py (outdated):

```python
    else:
        print(f'Finished _solve at {time.strftime("%H:%M:%S")}')

def create_ic_file_from_final_state(self, ic_file, overwrite=False):
```
Suggested change:

```diff
-def create_ic_file_from_final_state(self, ic_file, overwrite=False):
+def create_restart_file(self, ic_file, overwrite=False):
```
Moved some functions into a new utils module, and then flake8 helped me clean up bugs in those functions that hadn't been caught in the notebook. Also removed prettier from pre-commit, because it wasn't playing nicely on cheyenne (CalledProcessError with node because GLIBC versions weren't found)
Prettier is back in the pre-commit suite, and the `default_language_version` option lets it install without the GLIBC errors
The `forcing_time` array wasn't being set properly for non-cyclic forcing. Also cleaned up matlab-comparison.ipynb following a code walkthrough:
1. rename "testcase" -> "feisty_driver"
2. set `matlab_script` in one of the top cells (easy for others to change)
Instead of a cell that prints `matlab_script`, add that output to the cell that calls `.run()`; also, FOSI_cesm should use initial conditions spun up in Python rather than MATLAB
1. Renamed `glob_id` -> `group_ind`
2. Cleaned up how we initialize biomass (`biomass_init_file = None` => fall back to `fish_ic_data` and `benthic_prey_ic_data`)
3. `1e-300` -> `constants.eps`
4. `days_per_year` -> `time_levs_per_ds`
5. `create_ic_file_from_final_state` -> `create_restart_file`
071770d cleans up the straightforward comments @matt-long made, but the following are NOT yet addressed: …

Also still working on parallelizing …
domain.py and ecosystem.py no longer use module-level (global) variables; instead, the interface class initializes `self.domain_params` and `self.ecosys_params` and passes those objects to the functions that relied on the previously global variables.
When I changed `glob_id` -> `group_ind`, I missed all the occurrences in interface.py. I also need to add docs/__init__.py to the repository so that matlab-examples.ipynb can import functions from utils.py
When dimensions don't match in interface.py, include the expected / received dims in the exception
Needed to rename `time` -> `forcing_time` in the forcing dataset so that dask wouldn't get confused when the template for the output had a time dimension that didn't match what was in the forcing
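The rename itself is a one-liner in xarray (the forcing dataset and variable name here are a toy stand-in):

```python
import numpy as np
import xarray as xr

forcing = xr.Dataset(
    {'poc_flux': ('time', np.arange(4.0))},
    coords={'time': np.arange(4)},
)

# give the forcing its own dimension name so it cannot collide with the
# 'time' dimension of the output template when dask builds the task graph
forcing = forcing.rename({'time': 'forcing_time'})
```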
also cleaned up some of the driver code when testing uncovered small bugs
Just a little more re-organizing
Also removed python3.9 from the environment because it was overriding the `python_version` specified in the CI matrix. I'll test the notebooks with 3.10 again, and if dask is still not playing nicely I'll figure out a good way to have users downgrade to 3.9
also updated docs/_config.yml to hopefully pass CI
Remove the "with dask" matlab comparison notebook and rename the "with dask2" comparison notebook
Refactored offline_driver to combine config and run into a single function, which can then use xr.map_blocks() to wrap a call to a function that actually creates the offline_driver object and steps through time in parallel
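A minimal sketch of that `xr.map_blocks()` pattern (the per-block function here is a trivial stand-in for the real one, which would build an `offline_driver` for its block of columns and step it through time):

```python
import numpy as np
import xarray as xr

forcing = xr.Dataset(
    {'zooC': (('forcing_time', 'X'), np.ones((2, 4)))},
    coords={'X': np.arange(4)},
)

def run_feisty_block(ds_block):
    """Stand-in for the per-block driver: here it just averages the forcing."""
    return ds_block.mean('forcing_time')

# with a chunked dataset (e.g. forcing.chunk({'X': 2})) each block runs as a
# separate dask task; on an unchunked dataset map_blocks applies func eagerly
out = xr.map_blocks(run_feisty_block, forcing)
```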
Contains a notebook for processing FOSI, as well as a .py script (that uses dask-mpi) and a shell script that submits the .py script to casper. Also added the `pip install -e` to environment.yml (along with dask-mpi and its dependencies), and hopefully cleaned up the CI without breaking anything
also added a README to describe the variables (the prettier pre-commit check made it hard to do this with comments in the YAML itself)
FOSI_cesm.ipynb has FOSI-specific parameters hard-coded, but with the .py script using YAML it makes sense to give it a more generic name.
Re-organize / add comments to yaml file; only write output file in the python driver if output file is provided in yaml
1. Support for reading raw POP output instead of requiring post-processing
2. Support for reading multiple files when generating the forcing dataset (e.g. provide a list of multiple POP time series files)
3. Support for mapping output back to the POP grid:
   - read in POP data on the (nlat, nlon) grid
   - map to a single horizontal dimension (X)
   - run FEISTY
   - map output back to the (nlat, nlon) grid
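The grid round-trip in item 3 can be sketched with xarray's `stack`/`unstack` (toy data; the real code operates on POP model output):

```python
import numpy as np
import xarray as xr

# toy field standing in for POP output on the (nlat, nlon) grid
pop = xr.Dataset({'TEMP': (('nlat', 'nlon'), np.arange(12.0).reshape(3, 4))})

# collapse the 2-D horizontal grid to a single column dimension X ...
column = pop.stack(X=('nlat', 'nlon'))

# ... FEISTY would run along X here, then output is mapped back
back = column.unstack('X')
```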
Also, spelled "high" correctly in highres_companion.yaml file name and updated a notebook to only provide 2D output if requested in the YAML file
@matt-long I think this is ready for review. … Looking back at previous comments, I'm not sure I ever addressed this: …
This branch cleans up the offline driver to make it easy to set up FEISTY runs from netCDF forcing / IC files. Improvements include using `numpy` instead of `xarray` for more computations, using the `assume_sorted` flag in the `scipy` interpolation routine, and refactoring how the offline driver stores the model state throughout the run.