Limitation: Different number of tasks change answers #256

ekluzek · 2022-01-20T07:05:38Z

I'm not sure whether this is expected, but I found that mizuRoute changes answers when a different number of tasks is used. The test I ran is:

PEM_Ld10.nldas2_rMERIT_mnldas2.I2000Clm50SpMizGs.cheyenne_gnu.mizuroute-default

The test has no answer changes for CTSM or CPL, but mizuRoute fields change as follows...

 RMS basRunoff                        3.0671E-05            NORMALIZED  2.6883E+00
 RMS instRunoff                       1.8783E+00            NORMALIZED  3.6466E+00
 RMS dlayRunoff                       1.2893E+00            NORMALIZED  3.4618E+00
 RMS sumUpstreamRunoff                6.4705E+02            NORMALIZED  1.4161E+01
 RMS KWTroutedRunoff                  6.5100E+02            NORMALIZED  1.5828E+01
 RMS IRFroutedRunoff                  2.6097E+02            NORMALIZED  9.0326E+00
 RMS volume                           2.1904E+06            NORMALIZED  1.1718E+01


nmizukami · 2022-01-20T15:08:02Z

Hi @ekluzek, yes, this is expected. A different number of MPI tasks changes the order of basinID (hru dimension) and reachID (seg dimension) in the history file.

If basinID and reachID in one history file are sorted to match their order in a history file from a run with a different number of MPI tasks, the two files should be identical. I have a Python script to sort the netCDF file.

Roughly speaking, the array order along hru and seg is determined by the independent river basins, which are defined based on the number of MPI tasks.
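To illustrate the point, here is a minimal sketch in plain Python. The IDs and values are made up, and `reorder_to_reference` is a hypothetical helper, not part of mizuRoute. It only shows why an element-by-element comparison reports differences while an ID-aligned comparison does not.

```python
# Hypothetical data: the same three reaches written in two different orders,
# as might happen when the MPI task count changes the array ordering.
run_a = {"reachID": [101, 102, 103], "IRFroutedRunoff": [5.0, 7.5, 1.25]}
run_b = {"reachID": [103, 101, 102], "IRFroutedRunoff": [1.25, 5.0, 7.5]}


def reorder_to_reference(ids, values, ref_ids):
    """Return values rearranged so that they follow the order of ref_ids."""
    if sorted(ids) != sorted(ref_ids):
        raise ValueError("the two files do not contain the same set of IDs")
    position = {rid: i for i, rid in enumerate(ids)}
    return [values[position[rid]] for rid in ref_ids]


# Comparing position by position reports a difference...
naive_match = run_a["IRFroutedRunoff"] == run_b["IRFroutedRunoff"]

# ...but the values agree once run_b is put in run_a's reachID order.
aligned = reorder_to_reference(
    run_b["reachID"], run_b["IRFroutedRunoff"], run_a["reachID"]
)
aligned_match = aligned == run_a["IRFroutedRunoff"]

print(naive_match, aligned_match)
```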

ekluzek · 2022-01-20T21:51:08Z

Ahhh, OK. So right now we could run the python script to show that answers are identical.

In the short term I think it would be good to have that script in the repository so that it would be easy to run the PEM test and then verify that answers are identical with the script. Could you add the script somewhere in the repo?

In the medium to longer term it would be good to have a mode where the sorting is done inside the code, so that you don't need the external sorting script. In the coupler we have a trigger like this that slows things down a little bit, but ensures that answers will be identical on different numbers of processors. That trigger is already turned on for testing; we just need something similar for mizuRoute. A slow, simple solution would be fine to start with. A future advancement could be to speed it up. Since you wouldn't normally run in this mode, it's OK for it to be slower.

How hard would it be to add such a flag that would do the sorting needed to make sure answers are identical?

ekluzek · 2022-05-18T20:58:53Z

We talked about this, and we want to look into first using the Python script just to be able to do this checking, and then adding some automatic tests that run the script so that we know answers don't change. If we can do that, we wouldn't need to get this into the code. @nmizukami did work on this, but it was really slow.

nmizukami · 2024-01-26T15:41:41Z

Hi Erik (@ekluzek)

I am wondering if we can use this to reorder history netcdfs before comparing the two.

/glade/u/home/mizukami/hydro_nm/python_general/nc_reorder.py

usage: nc_reorder.py [-h] nc1 nc2 var dim

Script to reorder netcdf all the variables with specified dimension based on desired ordered variable in the 2nd netcdf

positional arguments:
  nc1         input netcdf to be ordered
  nc2         second netcdf containing desired ordered variables
  var         name of nc1 and nc2 common variable (e.g., hruId) used for reordering
  dim         name of nc1 and nc2 dimension, along which variable is reordered

A mizuRoute history file may have vars(time, seg) with reachID(seg) and may have vars(time, hru) with basinID(hru), so typically I do

nc_reorder.py in.nc ref.nc reachID seg

This creates in.reorder.nc, which can be compared with ref.nc (only variables with the seg dimension).

This sorting is not terribly slow.
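For readers without access to the script, the reordering it describes can be sketched roughly as follows. This is not the code of nc_reorder.py; it is a plain-Python illustration in which a dictionary of `(dims, data)` pairs stands in for a netCDF file, and the helper names and sample values are hypothetical.

```python
def permutation(ids, ref_ids):
    """Indices into ids that put it in the same order as ref_ids."""
    position = {value: i for i, value in enumerate(ids)}
    return [position[value] for value in ref_ids]


def reorder_variables(variables, ref_ids, var, dim):
    """Reorder every variable that has dimension dim; leave the others untouched.

    variables maps a name to (dims, data), a hypothetical in-memory stand-in
    for a netCDF file: data is a flat list for 1-D variables and a list of
    rows (one per time step) for (time, dim) variables.
    """
    order = permutation(variables[var][1], ref_ids)
    result = {}
    for name, (dims, data) in variables.items():
        if dim not in dims:
            result[name] = (dims, data)
        elif dims == (dim,):
            result[name] = (dims, [data[i] for i in order])
        elif dims == ("time", dim):
            result[name] = (dims, [[row[i] for i in order] for row in data])
        else:
            raise NotImplementedError(f"unsupported dimensions for {name}: {dims}")
    return result


# Made-up stand-in for in.nc; ref_reach_ids plays the role of reachID in ref.nc.
in_nc = {
    "time": (("time",), [0, 1]),
    "reachID": (("seg",), [30, 10, 20]),
    "volume": (("time", "seg"), [[3.0, 1.0, 2.0], [3.5, 1.5, 2.5]]),
    "basinID": (("hru",), [7, 8]),
}
ref_reach_ids = [10, 20, 30]

reordered = reorder_variables(in_nc, ref_reach_ids, "reachID", "seg")
print(reordered["reachID"][1])
print(reordered["volume"][1])
```

Only variables carrying the seg dimension are permuted here, which mirrors the note above that the comparison with ref.nc applies to seg variables only.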

ekluzek · 2024-01-27T04:03:04Z

Yes, this would be great. Having this as a supported tool would allow us to check by hand, which is a good start.

We would need to add its usage to the test system in order for the tests to use it automatically.

One way to do that would be to add mizuRoute-specific system tests for some exact restart tests that do this as an additional step after cases are run, for example by extending the ERP and PEM test classes for mizuRoute to contain this step. So it might not be too hard to do.
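As a rough idea of what such an additional step might invoke, the sketch below only builds the command line shown earlier in this thread. `reorder_command` is a hypothetical helper, the file names are placeholders, and nothing here uses or represents the actual test-class API.

```python
import shlex


def reorder_command(in_nc, ref_nc, var, dim, script="nc_reorder.py"):
    """Build the argument list for one nc_reorder.py call (nc1 nc2 var dim)."""
    return [script, in_nc, ref_nc, var, dim]


# The reachID/seg call from the usage example in this thread.
cmd = reorder_command("in.nc", "ref.nc", "reachID", "seg")
print(shlex.join(cmd))
```

A test step could pass such a list to `subprocess.run(cmd, check=True)` and then compare the reordered file against the reference; that part is left out because it depends on how the test classes are extended.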

Anyway, the bottom line is I'd love to have this script supported as part of the mizuRoute code base.

ekluzek added the question and cesm-coupling labels Jan 20, 2022

ekluzek added a commit to nmizukami/mizuRoute that referenced this issue Jan 20, 2022

Remove extra PEM tests as they are showing answer change see ESCOMP#256

1e8aa46

ekluzek changed the title ~~Different number of tasks change answers~~ Limitation: Different number of tasks change answers Jun 28, 2023

ekluzek mentioned this issue Jun 28, 2023

ERP tests failing to run #406

Closed
