
[BUG/ISSUE] Differences in output when splitting a run into consecutive shorter runs #57

Closed
lizziel opened this issue Nov 4, 2020 · 15 comments
Assignees: lizziel
Labels: category: Bug, never stale
Milestone: 14.0.0

@lizziel (Contributor) commented Nov 4, 2020

It has been a known issue for a long time that GCHP does not give exactly the same final result for a single long run and the identical run split up into shorter segments. This has been true for both the transport tracer and full chemistry simulations.

This is especially problematic for GCHP because currently the only way to output monthly mean diagnostics is to break a run up into 1-month segments. A monthly mean capability was planned for inclusion in MAPL for the 13.0.0 release, but that update is not yet available in a MAPL release. Since we output monthly means in GCHP 1-year benchmarks, I have been looking more closely at this issue to find fixes before we do the 13.0.0 benchmark.

Recent updates going into GEOS-Chem 13.0.0 correct this problem for transport tracers. Bug fixes in the GEOS-Chem and HEMCO submodules resolved the issue, and that simulation now gives zero diffs regardless of how the run is split up. See the related posts on GitHub for more information on these updates.

Differences persist in the full chemistry simulation and I am actively looking into them.

lizziel added the "category: Bug" label Nov 4, 2020
lizziel self-assigned this Nov 4, 2020
@lizziel (Contributor, Author) commented Nov 4, 2020

One issue I have found is that the input.geos option to initialize stratospheric H2O is set to True by default for all runs. This introduces small changes to the meteorology in subroutine SET_H2O_TRAC. When splitting a run into multiple segments, the init strat H2O setting should be True only for the very first segment. This is now fixed in GCHP commit #4d100d4, which is a change to the GCHP run directory in the GEOS-Chem submodule.

Note that this input.geos setting should also be updated if GEOS-Chem Classic runs are split up. We do not have scripts to automate job submission and config file updates for consecutive GEOS-Chem Classic runs, so no changes are needed in the GEOS-Chem repository for that case. A sketch of how a driver script might toggle the flag is shown below.
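
For anyone scripting consecutive segments by hand, here is a minimal sketch of how such a script might flip the flag between segments. This is not part of any GEOS-Chem run directory, and the exact key text in input.geos is an assumption, so the pattern below would need to be adjusted to match the actual line.

```python
# Sketch: set the stratospheric-H2O initialization flag to T only for the
# first segment of a multi-segment run. The key text matched by the regex
# is a placeholder; check your own input.geos for the real wording.
import re
from pathlib import Path

def set_init_strat_h2o(rundir, first_segment):
    """Write T for the first segment, F for all later segments."""
    cfg = Path(rundir) / "input.geos"
    text = cfg.read_text()
    flag = "T" if first_segment else "F"
    # Hypothetical pattern: a line like "Initialize strat. H2O? : T"
    new_text, n = re.subn(
        r"(?im)^(.*strat.*H2O.*?:\s*)[TF]\s*$",
        lambda m: m.group(1) + flag,
        text,
    )
    if n != 1:
        raise RuntimeError(f"Expected exactly one matching line, found {n}")
    cfg.write_text(new_text)

# Example: consecutive 1-month segments of a year-long run
# set_init_strat_h2o("run_202001", first_segment=True)
# set_init_strat_h2o("run_202002", first_segment=False)
```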

Following this update, the only multi-run vs. single-run differences in the GCHP full chemistry simulation occur in chemistry. There is currently a parallelization bug that is preventing further progress in identifying the source of the differences.

@sdeastham (Contributor) commented

Great to see this progress, and I'm excited at the prospect of finally getting parity between single-segment and multi-segment runs! Can you clarify the parallelization bug, though? Is something only being initialized on the root CPU and not propagating to the others?

@lizziel (Contributor, Author) commented Nov 5, 2020

My understanding is that the parallelization bug is this one: geoschem/geos-chem#392. @msulprizio has been looking at it more closely lately.

@sdeastham (Contributor) commented

Got it. That would only affect GC-Classic though, right (being an OpenMP parallelization bug)?

@lizziel (Contributor, Author) commented Nov 5, 2020

Actually, we have OpenMP on by default if using Intel compilers. Due to the way the CMake files were written, OpenMP has been on for a while for all compilers despite building with OMP=n. As of a very recent 13.0 commit, OpenMP is now off when using gfortran, to avoid a slow-down with that compiler. It is still on with Intel due to a separate bug. Both of these problems are on the to-do list for 13.0.0. We should in theory be able to switch between OMP=y and OMP=n without issue, but that is not yet the case.
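
As a side note, one rough way to spot-check whether a given executable was actually built with OpenMP (independent of the OMP=y/n setting passed to CMake) is to look for an OpenMP runtime among its linked libraries. This is just a hedged sketch for Linux systems, not part of the build system, and the executable path is a placeholder.

```python
# Rough check: does the built executable link an OpenMP runtime library?
# The path below is a placeholder, not the actual build layout.
import subprocess

def links_openmp(executable: str) -> bool:
    """Return True if `ldd` reports a known OpenMP runtime for the binary."""
    out = subprocess.run(["ldd", executable], capture_output=True, text=True).stdout
    return any(lib in out for lib in ("libgomp", "libiomp5", "libomp"))

if __name__ == "__main__":
    print(links_openmp("./build/bin/gcclassic"))
```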

I plan to revisit the diffs in chemistry after spending some time on 13.0 documentation, to make sure I can get rid of the parallelization bug signature by turning off OpenMP. I'm also curious whether what I am seeing goes away with heterogeneous chemistry off.

@sdeastham (Contributor) commented

I assume, though, that even with OpenMP on we are still only using one OpenMP thread, such that we shouldn't be affected by OpenMP parallelization errors? That having been said, it sounds like there's something even weirder than a pure parallelization error going on in there, so I see your point about waiting to see what happens there first!

stale bot commented Mar 19, 2021

This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days it will be closed. You can add the "never stale" tag to prevent the Stale bot from closing this issue.

stale bot added the "stale" label Mar 19, 2021
lizziel added the "never stale" label and removed the "stale" label Mar 19, 2021
@lizziel (Contributor, Author) commented Apr 8, 2021

Quick update on this. The current dev branch for GC-Classic still has a parallelization bug. It is not present in GCHP, so it is most likely an OpenMP issue. However, I did a quick multi vs. single run test for GCHP 13.0 (a 2-hr run vs. two 1-hr runs) and there are still differences in the full chemistry simulation when chemistry is turned on. I am keeping this issue open to keep it on the radar. Fortunately we will soon switch over to using MAPL History monthly collections, so we will no longer need to run 1-year benchmarks in 1-month segments.
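
For context, the multi vs. single run check boils down to comparing the end state of the two workflows field by field. Here is a minimal sketch of that kind of zero-diff comparison; the file paths are placeholders and do not reflect the actual test setup.

```python
# Sketch: compare the final restart from a single 2-hr run against the restart
# produced by two chained 1-hr runs. Paths/filenames are placeholders.
import numpy as np
import xarray as xr

ref = xr.open_dataset("single_2hr/GEOSChem.Restart.20190701_0200z.nc4")
dev = xr.open_dataset("two_1hr/GEOSChem.Restart.20190701_0200z.nc4")

# Collect variables that differ (or are missing from the dev restart)
nonzero_diff = []
for name in ref.data_vars:
    if name in dev.data_vars and np.array_equal(ref[name].values, dev[name].values):
        continue
    nonzero_diff.append(name)

if nonzero_diff:
    print(f"{len(nonzero_diff)} variables differ, e.g. {nonzero_diff[:5]}")
else:
    print("Zero diffs: restarts are bit-for-bit identical")
```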

@yantosca (Contributor) commented Apr 8, 2021

Also see issue geoschem/HEMCO#78. I discovered what might be an issue in the HEMCO MEGAN extension; I am not sure yet whether it is related. I am going to be looking into this parallelization issue and hope to solve it soon.

@lizziel (Contributor, Author) commented Apr 8, 2021

I don't think this is related. The differences only appear when chemistry is turned on; running with HEMCO on but chemistry off shows no differences.

@lizziel (Contributor, Author) commented Dec 15, 2021

This issue has not been looked into for a while, but I am keeping it open to bring attention to it as a long-standing problem. It will be revisited in the future.

@lizziel (Contributor, Author) commented Jan 31, 2022

I plan to re-assess this issue in 14.0.

lizziel added this to the 14.0.0 milestone Feb 15, 2022
@lizziel (Contributor, Author) commented Mar 15, 2022

I am revisiting this issue after work by @christophkeller to try to eliminate differences seen in GEOS due to GEOS-Chem. He added 60+ variables to the internal state, mostly State_Chm arrays used in ISORROPIA, and reported that this fixed the issue in 13.3 but not in 13.4. Having zero diffs across runs, regardless of how they are split up in time, is a requirement for GEOS.

I am doing tests with 13.4 using the additional internal state variables to home in on the remaining source of differences, presumably further missing internal state variables. The differences only appear when chemistry is turned on. I am finding a remaining bias near the surface when comparing a 2-hr run against two 1-hr runs. For example, there is a negative bias in the ozone zonal mean:
[Figure: O3_dev — zonal mean ozone difference (Dev minus Ref) showing a negative bias near the surface]
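
For reference, a zonal-mean difference plot like the one above can be sketched with xarray, assuming lat-lon (e.g., regridded) SpeciesConc output; the file names below are placeholders, and SpeciesConc_O3 follows standard GEOS-Chem diagnostic naming.

```python
# Minimal sketch of a Dev-minus-Ref zonal mean plot for ozone.
# File paths are placeholders; output is assumed to be on a lat-lon grid.
import xarray as xr
import matplotlib.pyplot as plt

ref = xr.open_dataset("single_2hr/GEOSChem.SpeciesConc.20190701_0200z.nc4")
dev = xr.open_dataset("two_1hr/GEOSChem.SpeciesConc.20190701_0200z.nc4")

# Average over longitude to get the zonal mean difference (lev x lat)
diff = (dev["SpeciesConc_O3"] - ref["SpeciesConc_O3"]).mean(dim="lon").squeeze()

diff.plot(x="lat", y="lev")
plt.title("O3 zonal mean, Dev (two 1-hr runs) minus Ref (single 2-hr run)")
# Flip the vertical axis if needed so the surface appears at the bottom
plt.show()
```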

It is not good to have 60+ additional 3D arrays in the internal state. This significantly increases the memory requirement and is particularly costly at high resolutions. I will look into whether we can adjust the order of operations so that carrying the ISORROPIA fields across timesteps is not necessary, or is at least minimized.

In summary, the to do list for this work is:

  1. Find and fix remaining differences in chemistry when splitting up a run.
  2. Examine order of operations to reduce required set of arrays that need to be carried across timesteps and thus stored in the internal state and included in the restart file.
  3. Apply the GCHP updates to GC-Classic once GC-Classic no longer has parallelization problems, and assess whether differences are still seen when splitting up runs into multiple segments.

Items 1 and 2 are motivated by strict requirements in GEOS with benefit to GCHP. All items will improve GEOS-Chem 1-year fullchem benchmark accuracy since we currently break up the 1-year benchmark runs into multiple months.

Fixes and updates related to this will go into 14.0.

@lizziel (Contributor, Author) commented Mar 18, 2022

I found and fixed a problem in wet scavenging limited to H2O2. See geoschem/geos-chem#1178. The fix is going into 13.4.

@lizziel (Contributor, Author) commented May 11, 2022

All remaining differences when splitting up a GCHP benchmark simulation will be removed by geoschem/geos-chem#1229, which is going into 14.0.0.

lizziel closed this as completed May 25, 2022