Limitation: Different number of tasks change answers #256

Open
ekluzek opened this issue Jan 20, 2022 · 5 comments
Labels
cesm-coupling For cesm coupling question

Comments

@ekluzek
Collaborator

ekluzek commented Jan 20, 2022

I'm not sure whether this is expected, but I found that mizuRoute changes answers when a different number of tasks is used. The test I ran for this is:

PEM_Ld10.nldas2_rMERIT_mnldas2.I2000Clm50SpMizGs.cheyenne_gnu.mizuroute-default

The test has no answer changes for CTSM or CPL, but mizuRoute fields change as follows...

 RMS basRunoff                        3.0671E-05            NORMALIZED  2.6883E+00
 RMS instRunoff                       1.8783E+00            NORMALIZED  3.6466E+00
 RMS dlayRunoff                       1.2893E+00            NORMALIZED  3.4618E+00
 RMS sumUpstreamRunoff                6.4705E+02            NORMALIZED  1.4161E+01
 RMS KWTroutedRunoff                  6.5100E+02            NORMALIZED  1.5828E+01
 RMS IRFroutedRunoff                  2.6097E+02            NORMALIZED  9.0326E+00
 RMS volume                           2.1904E+06            NORMALIZED  1.1718E+01
@ekluzek ekluzek added question cesm-coupling For cesm coupling labels Jan 20, 2022
ekluzek added a commit to nmizukami/mizuRoute that referenced this issue Jan 20, 2022
@nmizukami
Collaborator

Hi @ekluzek, yes, this is expected. A different number of MPI tasks changes the order of basinID (hru dimension) and reachID (seg dimension) in the history file.

If basinID and reachID in one history file are sorted to match those in a history file from a run with a different number of MPI tasks, the two files should be identical. I have a Python script to sort the netCDF.

Roughly speaking, the array order along hru and seg is determined by the independent river basins, which are defined based on the number of MPI tasks.
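To make the explanation above concrete, here is a toy sketch of how the same per-reach values can land in different array positions when the task count changes. It is not mizuRoute's actual domain decomposition: the basin list, the round-robin assignment of basins to tasks, and the names `BASINS`, `flow`, and `history_order` are all assumptions made up for illustration.

```python
# Toy illustration only: NOT mizuRoute's actual domain decomposition.
# Hypothetical independent basins, each given as a list of reach IDs.
BASINS = [[101, 102, 103], [201, 202], [301], [401, 402]]


def flow(reach_id):
    """Stand-in for a routed value that depends only on the reach ID."""
    return reach_id * 0.5


def history_order(n_tasks):
    """Assign whole basins to tasks round-robin (an assumption), then
    concatenate each task's reaches in task order, mimicking gathered output."""
    per_task = [[] for _ in range(n_tasks)]
    for i, basin in enumerate(BASINS):
        per_task[i % n_tasks].extend(basin)
    reach_ids = [rid for task in per_task for rid in task]
    values = [flow(rid) for rid in reach_ids]
    return reach_ids, values


ids_2, vals_2 = history_order(2)
ids_3, vals_3 = history_order(3)

# A position-by-position comparison sees different data...
positions_differ = ids_2 != ids_3
# ...but the value attached to each reach ID is unchanged.
same_by_id = dict(zip(ids_2, vals_2)) == dict(zip(ids_3, vals_3))
```

In this toy setup a tool that compares arrays element by element reports differences, even though every reach carries the same value in both runs.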

@ekluzek
Collaborator Author

ekluzek commented Jan 20, 2022

Ahhh, OK. So right now we could run the python script to show that answers are identical.

In the short term I think it would be good to have that script in the repository so that it would be easy to run the PEM test and then verify that answers are identical with the script. Could you add the script somewhere in the repo?

In the medium to longer term it would be good to have a mode where the sorting is done inside the code, so that you don't need the external sorting script. In the coupler we have a trigger like this that slows things down a little bit, but ensures that answers will be identical on different numbers of processors. That trigger is already turned on for testing; we just need something similar for mizuRoute. A slow, simple solution would be fine to start with. A future advancement could be to speed it up. Since you wouldn't normally run in this mode, it's OK for it to be slower.

How hard would it be to add such a flag that would do the sorting needed to make sure answers are identical?

@ekluzek
Collaborator Author

ekluzek commented May 18, 2022

We talked about this and want to first look into using the Python script to do this checking, and then add some automatic tests that run the script so that we know answers don't change. If we can do that, we wouldn't need to get this into the code. @nmizukami did work on this, but it was really slow.

@ekluzek ekluzek changed the title Different number of tasks change answers Limitation: Different number of tasks change answers Jun 28, 2023
@nmizukami
Collaborator

nmizukami commented Jan 26, 2024

Hi Erik (@ekluzek)

I am wondering if we can use this to reorder history netcdfs before comparing the two.

/glade/u/home/mizukami/hydro_nm/python_general/nc_reorder.py

usage: nc_reorder.py [-h] nc1 nc2 var dim

Script to reorder netcdf all the variables with specified dimension based on desired ordered variable in the 2nd netcdf

positional arguments:
  nc1         input netcdf to be ordered
  nc2         second netcdf containing desired ordered variables
  var         name of nc1 and nc2 common variable (e.g., hruId) used for reordering
  dim         name of nc1 and nc2 dimension, along which variable is reordered

A mizuRoute history file may have vars(time, seg) with reachID(seg) and vars(time, hru) with basinID(hru), so typically I do

nc_reorder.py in.nc ref.nc reachID seg

This creates in.reorder.nc, which can be compared with ref.nc (only variables with the seg dimension).

This sorting is not terribly slow.
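The reordering idea described above can be sketched in a few lines. This is not the code of nc_reorder.py, which is not shown in this thread; it is a minimal standard-library illustration that assumes the ID variable and one (time, seg) variable have already been read into plain lists. The function name `reorder_to_reference` and the sample numbers are hypothetical.

```python
def reorder_to_reference(in_ids, ref_ids, in_data):
    """Permute the columns of in_data so that its IDs line up with ref_ids.

    in_ids  : IDs (e.g. reachID) in the order found in the file to reorder
    ref_ids : the same IDs in the desired (reference) order
    in_data : list of rows, one per time step, each ordered like in_ids
    """
    position = {rid: i for i, rid in enumerate(in_ids)}
    if len(position) != len(in_ids):
        raise ValueError("duplicate IDs in the file to be reordered")
    if sorted(in_ids) != sorted(ref_ids):
        raise ValueError("the two files do not contain the same set of IDs")
    # perm[k] is the column of in_data holding the ID stored at ref position k.
    perm = [position[rid] for rid in ref_ids]
    return [[row[j] for j in perm] for row in in_data]


# Hypothetical data: three reaches, two time steps.
in_ids = [30, 10, 20]
ref_ids = [10, 20, 30]
in_data = [[3.0, 1.0, 2.0],
           [3.5, 1.5, 2.5]]

reordered = reorder_to_reference(in_ids, ref_ids, in_data)
```

With NumPy arrays the same permutation could be applied along the seg axis with integer-array indexing instead of the nested list comprehension.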

@ekluzek
Collaborator Author

ekluzek commented Jan 27, 2024

Yes, this would be great. Having this as a supported tool would allow us to check by hand, which is a good start.

We would need to add its usage to the test system in order for the tests to use it automatically.

One way to do that would be to add mizuRoute-specific system tests for some exact restart tests that do this as an additional step after the cases are run, for example by extending the ERP and PEM test classes for mizuRoute to contain this step. So it might not be too hard to do.
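As a rough idea of what such an additional step could check, the sketch below aligns two runs by ID and reports which fields still differ. It does not use the real test-system API (how the ERP and PEM classes would be extended is not shown in this thread); the function name `fields_that_differ`, the dict-based stand-in for a history file, and the sample values are all hypothetical.

```python
def fields_that_differ(run_a, run_b, id_name="reachID"):
    """Return names of variables that differ once run_b is aligned to run_a.

    run_a, run_b : dicts mapping variable name -> list of values along one
                   dimension (e.g. seg); each must include the ID variable.
    """
    if sorted(run_a[id_name]) != sorted(run_b[id_name]):
        raise ValueError("the two runs do not contain the same set of IDs")
    pos_b = {rid: i for i, rid in enumerate(run_b[id_name])}
    # perm[k] is the position in run_b of the ID stored at position k in run_a.
    perm = [pos_b[rid] for rid in run_a[id_name]]
    differing = []
    for name, a_vals in run_a.items():
        b_aligned = [run_b[name][j] for j in perm]
        if a_vals != b_aligned:  # exact comparison: answers must be identical
            differing.append(name)
    return differing


# Hypothetical runs with the same values stored in a different order.
run_a = {"reachID": [1, 2, 3], "IRFroutedRunoff": [0.1, 0.2, 0.3]}
run_b = {"reachID": [3, 1, 2], "IRFroutedRunoff": [0.3, 0.1, 0.2]}
no_diffs = fields_that_differ(run_a, run_b)

# Same ordering as run_b, but one value genuinely changed.
run_c = {"reachID": [3, 1, 2], "IRFroutedRunoff": [0.3, 0.1, 0.25]}
real_diffs = fields_that_differ(run_a, run_c)
```

A test would pass when the returned list is empty and fail otherwise, so a genuine answer change is still caught while a pure reordering is not.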

Anyway, the bottom line is I'd love to have this script supported as part of the mizuRoute code base.
