Limitation: Different number of tasks change answers #256
Comments
Hi @ekluzek, yes, this is expected. Running with a different number of MPI tasks changes the order of basinID (hru dimension) and reachID (seg dimension) in the history file. If the basinID and reachID in one history file are reordered to match those in a history file from a run with a different number of MPI tasks, the two files should be identical. I have a Python script that reorders the netCDF file. Roughly speaking, the array order along hru and seg is determined by the independent river basins, which are defined based on the number of MPI tasks.
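To make the reordering idea above concrete, here is a minimal sketch in plain Python. It is not the script mentioned in this thread; the function name, the IDs, and the values are hypothetical, and it assumes each ID appears exactly once in both runs.

```python
def reorder_to_reference(ids, values, ref_ids):
    """Reorder `values` (one entry per ID in `ids`) to follow the order of `ref_ids`.

    Assumes every ID is unique and that both runs contain the same set of IDs.
    """
    if sorted(ids) != sorted(ref_ids):
        raise ValueError("the two runs do not contain the same set of IDs")
    # Map each ID to its position in the run being reordered.
    position = {id_: i for i, id_ in enumerate(ids)}
    # Pick the values in the order the reference run lists the IDs.
    return [values[position[id_]] for id_ in ref_ids]


# Hypothetical reachID order and one per-reach value from two runs
# that used different numbers of MPI tasks.
ref_reach_id = [101, 102, 103, 104]
ref_flow = [1.5, 2.5, 3.5, 4.5]
run_reach_id = [103, 101, 104, 102]
run_flow = [3.5, 1.5, 4.5, 2.5]

reordered = reorder_to_reference(run_reach_id, run_flow, ref_reach_id)
print(reordered == ref_flow)
```

The same permutation would be applied along the seg axis for reachID and along the hru axis for basinID.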
Ahhh, OK. So right now we could run the Python script to show that answers are identical. In the short term I think it would be good to have that script in the repository so that it would be easy to run the PEM test and then verify with the script that answers are identical. Could you add the script somewhere in the repo? In the medium to longer term it would be good to have a mode where the sorting is done inside the code, so that you don't need the external sorting script. In the coupler we have a trigger like this that slows things down a little bit, but ensures that answers will be identical on different numbers of processors. That trigger is already turned on for testing; we just need something similar for mizuRoute. A slow, simple solution would be fine to start with, and a future advancement could be to speed it up. Since you wouldn't normally run in this mode, it's OK for it to be slower. How hard would it be to add a flag that does the sorting needed to make sure answers are identical?
We talked about this, and we want to look into using the Python script first, just to be able to do this checking, and then add some automatic tests that run the script so that we know answers don't change. If we can do that, we wouldn't need to get this into the code. @nmizukami did work on this, but it was really slow.
Hi Erik (@ekluzek), I am wondering if we can use this to reorder history netCDF files before comparing the two: /glade/u/home/mizukami/hydro_nm/python_general/nc_reorder.py
A mizuRoute history file may have variables with dimensions (time, seg) along with reachID(seg), and may have variables with dimensions (time, hru) along with basinID(hru), so typically I do
This creates in.reorder.nc, which can be compared with ref.nc (only for variables with the seg dimension). This sorting is not terribly slow.
Yes, this would be great. Having this as a supported tool would allow us to check by hand, which is a good start. We would need to add its usage to the test system in order for the tests to use it automatically. One way to do that would be to add mizuRoute-specific system tests for some exact restart tests that do this as an additional step after the cases are run, for example by extending the ERP and PEM test classes for mizuRoute to contain this step. So it might not be too hard to do. Anyway, the bottom line is that I'd love to have this script supported as part of the mizuRoute code base.
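As a rough illustration of what such an additional test step could check, the sketch below compares a (time, seg) field from two runs after aligning the seg axis by ID. It is plain Python with hypothetical names and data, and it does not use any CIME test-class API or the reorder script itself; a real test would read the fields from the history files.

```python
def fields_match(ref_ids, ref_field, run_ids, run_field):
    """Return True if a (time, seg) field is identical in both runs once the
    run's seg axis is aligned to the reference ID order.

    Fields are lists of rows, one row per time step, one entry per ID.
    Assumes IDs are unique within each run.
    """
    index_of = {id_: i for i, id_ in enumerate(run_ids)}
    if len(run_ids) != len(ref_ids) or set(index_of) != set(ref_ids):
        return False  # different set of reaches
    if len(ref_field) != len(run_field):
        return False  # different number of time steps
    # Column of the run that corresponds to each reference ID.
    columns = [index_of[id_] for id_ in ref_ids]
    for ref_row, run_row in zip(ref_field, run_field):
        if [run_row[c] for c in columns] != ref_row:
            return False
    return True


# Hypothetical data: two time steps, three reaches, different ID order.
ref_ids = [11, 12, 13]
ref_field = [[0.1, 0.2, 0.3], [1.1, 1.2, 1.3]]
run_ids = [13, 11, 12]
run_field = [[0.3, 0.1, 0.2], [1.3, 1.1, 1.2]]

print(fields_match(ref_ids, ref_field, run_ids, run_field))
```

The comparison is exact equality on purpose, since the goal is to show answers are identical. Note that NaN values never compare equal, so missing values stored as NaN would need separate handling.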
I'm not sure if this is expected or not, but I did find that mizuRoute changes answers if a different number of tasks is used. The test I ran for this is:
PEM_Ld10.nldas2_rMERIT_mnldas2.I2000Clm50SpMizGs.cheyenne_gnu.mizuroute-default
The test has no answer changes for CTSM or CPL, but mizuRoute fields change as follows...