[BUG/ISSUE] Time interpolation error for leaf area index data, dependent on # of cores #132
The issue might lie within routine Get_TimeIdx.
I put some more debug in Get_TimeIdx. It looks like for some reason the run with 5 cores is flagging the LAI data as not discontinuous, but the run with 8 cores is.

5 cores:

```
@@@ xlai42 : origYMDhm : 201907010000.00000
@@@ xlai42 : prefYMDhm : 201907010000.00000
@@@ xlai42 : tidx1a after availYMDhm: 23
@@@ xlai42 : isClosest 1, prefymdhm : 201907010000.00000
@@@ xlai42 : isClosest 1, availymdhm : 201906270000.00000
@@@ xlai42 : IsClosest 1, ntime : 46
@@@ xlai42 : IsClosest 1, tidx1a : 23
@@@ xlai42 : after IsClosest 1, exitSearch: F
@@@ xlai42 : after 1st or last timestep, : 23
@@@ xlai42 : after test for cont search, tidx1a: 23
@@@ xlai42 : after test for cont search, exit : T
@@@ xlai42 : data discontinuous : F
```

8 cores:

```
@@@ xlai42 : origYMDhm : 201907010000.00000
@@@ xlai42 : prefYMDhm : 201907010000.00000
@@@ xlai42 : tidx1a after availYMDhm: 23
@@@ xlai42 : isClosest 1, prefymdhm : 201907010000.00000
@@@ xlai42 : isClosest 1, availymdhm : 201906270000.00000
@@@ xlai42 : IsClosest 1, ntime : 46
@@@ xlai42 : IsClosest 1, tidx1a : 23
@@@ xlai42 : after IsClosest 1, exitSearch: F
@@@ xlai42 : after 1st or last timestep, : 23
@@@ xlai42 : after test for cont search, tidx1a: 23
@@@ xlai42 : after test for cont search, exit : F
@@@ xlai42 : data discontinuous : T
```
---

```
@@@ xlai42 : after adjust, ntime : 46
@@@ xlai42 : after adjust, availYmdhm : 201912280000.00000
@@@ xlai42 : after adjust, prefymdhm : 201907010000.00000
@@@ xlai42 : after adjust, cnt : 1
@@@ xlai42 : after adjust, tidx1a : 23
@@@ xlai42 : after check_availymdhm 201907010000.00000 1 23
@@@ xlai42 : after is_closest 2 201907010000.00000 46 23 -1
@@@ xlai42 : after adjust, ntime : 46
@@@ xlai42 : after adjust, availYmdhm : 201912280000.00000
@@@ xlai42 : after adjust, prefymdhm : 201906010000.00000
@@@ xlai42 : after adjust, cnt : 2
@@@ xlai42 : after adjust, tidx1a : 23
@@@ xlai42 : after check_availymdhm 201906010000.00000 2 19
@@@ xlai42 : after is_closest 2 201906010000.00000 46 19 -1
@@@ xlai42 : after adjust, ntime : 46
@@@ xlai42 : after adjust, availYmdhm : 201912280000.00000
@@@ xlai42 : after adjust, prefymdhm : 201906070000.00000
@@@ xlai42 : after adjust, cnt : 3
@@@ xlai42 : after adjust, tidx1a : 19
@@@ xlai42 : after check_availymdhm 201906070000.00000 3 20
@@@ xlai42 : after is_closest 2 201906070000.00000 46 20 20
```

It may be that the run on 5 cores is correct because it is pulling the date 20190627 as the closest date, whereas the run on 8 cores keeps iterating back in time until it pulls 20190607 as the closest date (which is wrong).
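To make the correct behavior concrete, here is a minimal, hypothetical sketch (not HEMCO's actual Get_TimeIdx) of picking the last available slice at or before the preferred date, using three illustrative YYYYMMDDhhmm values. When the discontinuous flag is spuriously .TRUE., HEMCO instead steps the preferred date back in time and re-searches, which is how the 8-core run lands on 20190607:

```fortran
! Hypothetical sketch, not HEMCO's actual Get_TimeIdx: select the last
! available time slice at or before the preferred date.
PROGRAM closest_slice
   IMPLICIT NONE
   REAL(8) :: avail(3), pref
   INTEGER :: i, idx

   avail = (/ 201906190000d0, 201906270000d0, 201907050000d0 /)  ! YYYYMMDDhhmm
   pref  = 201907010000d0

   idx = 1                              ! fall back to first slice
   DO i = 1, SIZE(avail)
      IF ( avail(i) <= pref ) idx = i   ! last slice at or before pref
   ENDDO

   PRINT *, 'selected slice:', idx      ! prints 2 -> 20190627
END PROGRAM closest_slice
```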
The XLAI is tagged with "I" (Interpolated), which should force time interpolation between slices:

```
* XLAI42 $ROOT/Yuan_XLAI/v2021-06/Yuan_proc_MODIS_XLAI.025x025.$YYYY.nc XLAI42 2000-2020/1-12/1-31/0 I xy cm2/cm2 * - 1 1
```
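For background, the "I" cycle flag makes HEMCO interpolate between the two time slices bracketing the simulation time, rather than holding the nearest slice constant. A minimal sketch of that kind of linear weighting (illustrative values and variable names, not HEMCO's internals):

```fortran
! Illustrative linear time interpolation between the bracketing slices
! tidx1 (at time t1) and tidx2 (at time t2); not HEMCO's actual routine.
PROGRAM lai_time_interp
   IMPLICIT NONE
   REAL(8) :: lai1, lai2, t, t1, t2, w2, lai_t

   lai1 = 1.8d0      ! LAI at t1, e.g. the 2019-06-27 slice (cm2/cm2)
   lai2 = 2.4d0      ! LAI at t2, e.g. the 2019-07-05 slice
   t1   = 0.0d0      ! slice times expressed as elapsed days
   t2   = 8.0d0
   t    = 4.0d0      ! simulation time, e.g. 2019-07-01

   w2    = ( t - t1 ) / ( t2 - t1 )           ! weight of the later slice
   lai_t = ( 1.0d0 - w2 ) * lai1 + w2 * lai2  ! -> 2.1
   PRINT *, 'interpolated LAI:', lai_t
END PROGRAM lai_time_interp
```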
Found the issue. There are two nearly identical IF blocks in HEMCO/src/Core/hco_config_mod.F90 (at commit 7274396): the first at lines 960 to 1056, and the second at lines 1298 to 1393.
Note that at the top of the IF block starting at line 960, we initialize some fields of the data container object:

```
Dta%MustFind      = .FALSE.
Dta%UseSimYear    = .FALSE.
Dta%Discontinuous = .FALSE.
```

but at the top of the IF block starting at line 1298, we only have:

```
Dta%MustFind      = .FALSE.
Dta%UseSimYear    = .FALSE.
```

so for the 2nd IF block, Dta%Discontinuous is never initialized and keeps whatever value happens to be in memory. By adding Dta%Discontinuous = .FALSE. atop the IF block starting at around line 1298, the issue is now fixed. I will prepare a corresponding PR.
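As an aside, this is the classic Fortran uninitialized-variable trap: a LOGICAL component read before assignment yields whatever bytes happen to sit at that address, so the result can flip with anything that changes the memory layout, such as the number of OpenMP threads. A standalone illustration (not HEMCO code):

```fortran
! Standalone illustration, not HEMCO code: "discontinuous" is read
! before it is assigned, so its value is formally undefined -- it may
! print T or F depending on compiler, flags, and prior stack contents.
PROGRAM uninitialized_flag
   IMPLICIT NONE
   TYPE :: container
      LOGICAL :: discontinuous   ! never initialized, like Dta%Discontinuous
   END TYPE container
   TYPE(container) :: dta

   PRINT *, 'discontinuous = ', dta%discontinuous   ! undefined value
END PROGRAM uninitialized_flag
```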
This commit fixes the issue reported in geoschem/HEMCO #132. We had observed a difference in output in runs using different numbers of computational cores. We had thought this was a parallelization issue, but in fact the interpolation of leaf area indices was being done incorrectly. The root cause of the issue was that the Dta%Discontinuous field was being used before it was initialized. In some instances, this was inadvertently assigned a TRUE value when it should have been FALSE. Adding a Dta%Discontinuous = .FALSE. statement above the IF block where it is used fixes the problem. Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
Nice find! That code looks identical in both blocks, which is pretty dangerous, as this bug shows. Could it be abstracted somehow so we don't need to duplicate it?
Thanks Bob, this is incredible detective work. I was also wondering whether this code repetition can somehow be avoided, especially in an often-updated place like the interpretation of time cycling flags, because new flags may be added fairly frequently. This can probably be abstracted into a subroutine or into a separate file.
I can try to abstract that into a subroutine. I've also added some extra initialization of variables in other routines, so I'll create a PR with the updated HEMCO. I will test against the prior version; we should just get numerical-noise diffs, if anything.
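A hypothetical shape for that refactor (the subroutine name is illustrative, and the FileData type and module names are assumptions about the HEMCO source, not the actual PR):

```fortran
! Hypothetical helper: initialize every default field of a HEMCO data
! container in one place, so the two IF blocks in hco_config_mod.F90
! share the same code and a newly added field can never be set in only
! one branch.
SUBROUTINE Dta_InitDefaults( Dta )
   USE HCO_FileData_Mod, ONLY : FileData   ! assumed HEMCO container type
   TYPE(FileData), POINTER :: Dta

   Dta%MustFind      = .FALSE.
   Dta%UseSimYear    = .FALSE.
   Dta%Discontinuous = .FALSE.
   ! ... all other defaults live here, and only here ...
END SUBROUTINE Dta_InitDefaults
```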
Also see HEMCO issue #133 |
We can close this issue: PR #134, which resolves it, has been merged into the HEMCO 3.4.0 / GEOS-Chem 13.4.0 development stream.
Remaining parallelization issues in GEOS-Chem Classic simulations have now been solved in PR geoschem/geos-chem#1190. |
While debugging geoschem/geos-chem#1103, I stumbled upon an indexing error that is selecting the wrong time slice for the leaf area index variable XLAI42. I added debug output to HEMCO/src/Core/hcoio_read_std_mod.F90 and, with the exact same executable, compared the output from GC-Classic runs on 1 core, 5 cores, and 8 cores. The time slice indices tidx1 and tidx2 (the lower and upper time bounds of the data) are the same with 1 and 8 cores, but with 5 cores HEMCO is pulling a different time slice.
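For anyone reproducing this, the "@@@ xlai42" lines quoted earlier come from plain WRITE statements of roughly this form (illustrative only; the variable names are taken from the log labels, not the exact statements added to hcoio_read_std_mod.F90):

```fortran
! Illustrative debug prints in the style of the "@@@ xlai42" log lines;
! not the exact statements added to hcoio_read_std_mod.F90.
PROGRAM debug_print_demo
   IMPLICIT NONE
   REAL(8) :: prefYMDhm = 201907010000.0d0
   INTEGER :: tidx1a    = 23

   WRITE( 6, '(a,f20.5)' ) '@@@ xlai42 : prefYMDhm : ', prefYMDhm
   WRITE( 6, '(a,i4)'    ) '@@@ xlai42 : tidx1a after availYMDhm: ', tidx1a
END PROGRAM debug_print_demo
```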
Wondering if it is a parallelization issue upstream. Any ideas, @christophkeller @jimmielin @msulprizio @lizziel?