[BUG/ISSUE] GCHP c48 run with gfortran 8.2 on Odyssey hangs before end of run #13
Was the restart file successfully written? Also, is this only gfortran 8.2, or earlier versions as well?
Lizzie Lundgren
Scientific Programmer
GEOS-Chem Support Team
geos-chem-support@as.harvard.edu
http://wiki.geos-chem.org/GEOS-Chem_Support_Team
Please direct all GEOS-Chem support issues to the entire GEOS-Chem Support Team at geos-chem-support@as.harvard.edu. This will allow us to serve you better.
From: Bob Yantosca <notifications@github.com>
Date: Wednesday, December 19, 2018 at 4:21 PM
Subject: [geoschem/gchp] GCHP c48 run with gfortran 8.2 on Odyssey hangs before end of run (#13)
I tried running a GCHP C48 run on Odyssey but the job hung right after printing out the GIGCenv timer results.
AGCM Date: 2016/07/01 Time: 01:00:00
Writing: 11592 Slices ( 1 Nodes, 1 PartitionRoot) to File: OutputDir/GCHP.SpeciesConc_avg.20160701_0030z.nc4
Writing: 11592 Slices ( 1 Nodes, 1 PartitionRoot) to File: OutputDir/GCHP.SpeciesConc_inst.20160701_0100z.nc4
Writing: 72 Slices ( 1 Nodes, 1 PartitionRoot) to File: OutputDir/GCHP.StateMet_avg.20160701_0030z.nc4
Writing: 72 Slices ( 1 Nodes, 1 PartitionRoot) to File: OutputDir/GCHP.StateMet_inst.20160701_0100z.nc4
Times for GIGCenv
TOTAL : 1.069
INITIALIZE : 0.000
RUN : 0.418
GenInitTot : 0.650
--GenInitMine : 0.650
GenRunTot : 0.000
--GenRunMine : 0.000
GenFinalTot : 0.000
--GenFinalMine : 0.000
GenRecordTot : 0.001
--GenRecordMine : 0.001
GenRefreshTot : 0.000
--GenRefreshMine : 0.000
HEMCO::Finalize... OK.
Chem::State_Diag Finalize... OK.
Chem::State_Chm Finalize... OK.
Chem::State_Met Finalize... OK.
Chem::Input_Opt Finalize... OK.
Using parallel NetCDF for file: gcchem_internal_checkpoint_c48.nc
The script I used to submit the job is:
gchp.run.txt (https://github.com/geoschem/gchp/files/2696506/gchp.run.txt)
And here is the full log:
gchp.log.txt (https://github.com/geoschem/gchp/files/2696507/gchp.log.txt)
A similar run (done by @lizziel) with ifort 17.0.4 instead of gfortran 8.2 finished OK. I am wondering if the gfortran compiler is not totally compatible with MAPL (or at least it seems to produce issues that we don't see when using ifort).
This is gfortran 8.2; I did not test earlier versions. I think the restart files were written OK.
Hmm, it is odd that the end-of-run restart file is so much smaller than the initial checkpoint file. How does it compare to the initial restart file? If you view the output restart file, does it look okay? I'd say the file size looks suspicious.
From: Bob Yantosca <notifications@github.com>
Date: Wednesday, December 19, 2018 at 4:33 PM
Subject: Re: [geoschem/gchp] GCHP c48 run with gfortran 8.2 on Odyssey hangs before end of run (#13)
256980 2018-12-19 15:41 gcchem_internal_checkpoint_c48.nc
2059258836 2018-12-19 15:38 gcchem_internal_checkpoint_c48.nc.20160701_0000z.bin
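To put "so much smaller" into numbers, here is a rough back-of-the-envelope check in Python. It assumes a C48 cubed sphere (6 faces of 48 x 48 columns), 72 vertical levels (consistent with the 72-slice StateMet output and 11592 = 161 x 72 SpeciesConc slices in the log above), and uncompressed 4-byte floats. The actual precision, compression, and number of variables in the checkpoint are not stated in this thread, so treat the result as an order-of-magnitude sanity check only.

```python
# Back-of-the-envelope size check for a C48 GCHP restart file.
# Assumptions (not stated in the thread): 72 levels, float32, no compression.

FACES = 6            # a cubed-sphere grid has 6 faces
RES = 48             # C48: each face is 48 x 48 columns
LEVELS = 72          # assumed; matches the 72-slice 3-D outputs in the log
BYTES_PER_VALUE = 4  # assumed single precision, uncompressed


def bytes_per_3d_field(res=RES, levels=LEVELS, nbytes=BYTES_PER_VALUE):
    """Uncompressed size in bytes of one 3-D field on a C{res} cubed sphere."""
    return FACES * res * res * levels * nbytes


# File sizes copied from the directory listing above.
observed = {
    "gcchem_internal_checkpoint_c48.nc": 256980,
    "gcchem_internal_checkpoint_c48.nc.20160701_0000z.bin": 2059258836,
}

if __name__ == "__main__":
    field = bytes_per_3d_field()
    print(f"one 3-D field: {field} bytes")
    for name, size in observed.items():
        print(f"{name}: {size / field:.2f} x one 3-D field")
```

Under these assumptions a single 3-D field needs about 3.98 MB, so the 256980-byte end-of-run file is smaller than even one uncompressed field, which is consistent with the suspicion that the checkpoint was not written completely.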
The restart file doesn't have any coordinates. So it looks like something is messed up in the restart file output. If there is an out-of-bounds error, maybe that's what is causing it.
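One way to confirm a "no coordinates" observation from a terminal is to inspect the header printed by `ncdump -h` (part of the netCDF utilities). By netCDF convention, a coordinate variable has the same name as a dimension and is indexed only by that dimension. The Python sketch below applies that rule to `ncdump -h` text. The header in the example, including the names `lon`, `lat`, `lev`, `time`, and `SPC_O3` and their sizes, is hypothetical and may not match what GCHP actually writes.

```python
import re


def missing_coordinate_variables(header):
    """List dimensions in `ncdump -h` output that lack a coordinate variable.

    netCDF convention: a coordinate variable shares its name with a dimension
    and is indexed only by that dimension, e.g. "double lon(lon) ;".
    """
    dims, coord_vars, section = [], set(), None
    for raw in header.splitlines():
        line = raw.strip()
        if line in ("dimensions:", "variables:", "data:"):
            section = line.rstrip(":")
            continue
        if section == "dimensions":
            m = re.match(r"(\w+)\s*=", line)  # e.g. "lev = 72 ;"
            if m:
                dims.append(m.group(1))
        elif section == "variables":
            # Declarations look like "float SPC_O3(time, lev, lat, lon) ;";
            # attribute lines ("SPC_O3:units = ...") do not match this pattern.
            m = re.match(r"\w+\s+(\w+)\s*\(([^)]*)\)\s*;", line)
            if m:
                name = m.group(1)
                args = [d.strip() for d in m.group(2).split(",")]
                if args == [name]:
                    coord_vars.add(name)
    return [d for d in dims if d not in coord_vars]


# Hypothetical header: a data variable is present, but no coordinate variables.
SAMPLE_HEADER = """\
netcdf restart_example {
dimensions:
\tlon = 48 ;
\tlat = 288 ;
\tlev = 72 ;
\ttime = UNLIMITED ; // (1 currently)
variables:
\tfloat SPC_O3(time, lev, lat, lon) ;
\t\tSPC_O3:units = "mol mol-1" ;
}
"""

if __name__ == "__main__":
    print(missing_coordinate_variables(SAMPLE_HEADER))
```

A healthy file would return an empty list; a file like the one described here would list every dimension.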
Have you tried compiling with debug flags? I think BOPT in GCHP/Shared/Config/ESMA_base.mk is the variable to configure; set it to 'g'.
And I think you can add additional flags to the Fortran flags section of GIGC.mk in the run directory. Some ideas for what to use are at https://stackoverflow.com/questions/3676322/what-flags-do-you-set-for-your-gfortran-debugger-compiler-to-catch-faulty-code. Maybe this will help with geoschem/GCHP#11 and geoschem/GCHP#14 as well.
This issue (and also #14) appears to have been caused by an out-of-bounds error in the Olson landmap module. The variable maxFracInd was zero but should not have been. I added a quick fix in the GEOS-Chem "Classic" repo in GeosCore/olson_landmap_mod.F90.
With this fix, a C48 simulation finished properly on Odyssey, printing out all timing info. It appears the Olson land map data is not being read in properly, which is the root cause of this issue. I am investigating this.
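The thread does not show the actual fix, but the failure mode it describes (an index that stays at zero when no land-type data is present, and is then used to index a 1-based Fortran array) can be illustrated with a hypothetical Python stand-in. `dominant_type_index` and `checked_lookup` are illustrative names, not GEOS-Chem routines, and this is not the code in olson_landmap_mod.F90. The explicit range check mimics, in spirit, the run-time check that gfortran's `-fcheck=bounds` adds.

```python
# Hypothetical illustration only; not the GEOS-Chem olson_landmap_mod.F90 code.


def dominant_type_index(fractions):
    """Return the 1-based index of the largest fraction (Fortran-style).

    The index starts at 0 and is updated only when a fraction exceeds the
    running maximum, so an all-zero input (for example, land-map data that
    was never read) leaves it at 0, which is invalid for a 1-based array.
    """
    max_frac, max_frac_ind = 0.0, 0
    for i, frac in enumerate(fractions, start=1):
        if frac > max_frac:
            max_frac, max_frac_ind = frac, i
    return max_frac_ind


def checked_lookup(table, index):
    """1-based lookup that fails loudly, similar in spirit to -fcheck=bounds."""
    if not 1 <= index <= len(table):
        raise IndexError(f"index {index} is outside the bounds 1..{len(table)}")
    return table[index - 1]


if __name__ == "__main__":
    land_types = ["water", "forest", "grassland"]  # hypothetical categories
    print(checked_lookup(land_types, dominant_type_index([0.1, 0.7, 0.2])))
    try:
        checked_lookup(land_types, dominant_type_index([0.0, 0.0, 0.0]))
    except IndexError as err:
        print("caught:", err)
```

Without the guard, plain `table[index - 1]` with index 0 would silently return the last element in Python, much as an unchecked Fortran build may access memory outside the array instead of stopping. That is why a debug build with bounds checking helps locate this kind of bug.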
This bug fix was submitted for use in GEOS CTM by Kyle Gerheiser (GMAO) and is relevant to GCHP. See GEOS-ESM/GEOSgcm_GridComp issue #13 (GEOS-ESM/GEOSgcm_GridComp#13). Without this fix, concentrations in GCHP will blow up in advection at our previous default timesteps. Until the bug was fixed, we decreased the default dynamic timesteps as a workaround. Updating the default timesteps in the run directory creation template will be in a later commit.
Signed-off-by: Lizzie Lundgren <elundgren@seas.harvard.edu>