[BUG/ISSUE] CH4 simulation: Infinity in DO_CLOUD_CONVECTION #954
Line 1885 of tpcore_fvdas_mod.F90 is the declaration of DQ1 in the code below:

```fortran
SUBROUTINE Qckxyz( dq1, J1P, J2P, JU1_GL, J2_GL, &
                   ILO, IHI, JULO, JHI, I1,      &
                   I2,  JU1, J2,  K1,  K2 )
!
! !INPUT PARAMETERS:
!
    ! Global latitude indices at the edges of the S/N polar caps
    ! J1P=JU1_GL+1; J2P=J2_GL-1 for a polar cap of 1 latitude band
    ! J1P=JU1_GL+2; J2P=J2_GL-2 for a polar cap of 2 latitude bands
    INTEGER, INTENT(IN) :: J1P, J2P

    ! Global min & max latitude (J) indices
    INTEGER, INTENT(IN) :: JU1_GL, J2_GL

    ! Local min & max longitude (I), latitude (J), altitude (K) indices
    INTEGER, INTENT(IN) :: I1,  I2
    INTEGER, INTENT(IN) :: JU1, J2
    INTEGER, INTENT(IN) :: K1,  K2

    ! Local min & max longitude (I) and latitude (J) indices
    INTEGER, INTENT(IN) :: ILO,  IHI
    INTEGER, INTENT(IN) :: JULO, JHI
!
! !INPUT/OUTPUT PARAMETERS:
!
    ! Species density [hPa]
    REAL(fp), INTENT(INOUT) :: dq1(ILO:IHI, JULO:JHI, K1:K2)
```

One thing that is puzzling is that Qckxyz is called at line 927, but the traceback says line 828:

```fortran
if (FILL) then
   ! ===========
   call Qckxyz &
   ! ===========
        (dq1, &
         j1p, j2p, 1, jm, &
         1, im, 1, jm, 1, im, 1, jm, 1, km)
end if
```
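As a diagnostic, it may help to verify dq1 on entry to Qckxyz and report the first non-finite value. Below is a minimal sketch, not part of the stock code, using the standard ieee_arithmetic module; the loop bounds mirror the declaration above, and dq1 and the bounds are assumed to come from the host subroutine:

```fortran
! Hypothetical diagnostic, not in the stock code: report the first
! non-finite element of dq1 on entry to Qckxyz, then stop.
USE, INTRINSIC :: ieee_arithmetic, ONLY : ieee_is_finite

INTEGER :: i, j, k

DO k = K1, K2
DO j = JULO, JHI
DO i = ILO, IHI
   IF ( .not. ieee_is_finite( dq1(i,j,k) ) ) THEN
      WRITE( 6, '(a,3i6,es13.5)' ) &
         'Non-finite dq1 at (i,j,k) = ', i, j, k, dq1(i,j,k)
      STOP 1
   ENDIF
ENDDO
ENDDO
ENDDO
```

If this fires on entry, the bad value is coming in from upstream (e.g. the met fields) rather than being created inside Qckxyz.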
|
@yanglibj can you also try using gfortran (such as gfortran 10) with debugging? That might trap the error better than ifort. |
Hi Bob,
I used the following to load ifort on Cannon: `. ~/init/init.gc-classic.ifort17.centos7`
For gfortran, which of the following would you suggest using?

```
init.gc-classic.gfortran71.centos7*  init.gc-classic.gfortran93.centos7*  init.gc-classic.ifort17.centos7*
init.gc-classic.gfortran82.centos7*  init.gc-classic.ifort11.centos7*     init.gc-classic.ifort18.centos7*
init.gc-classic.gfortran92.centos7*  init.gc-classic.ifort15.centos7*     init.gc-classic.ifort19.centos7*
```
Yang
|
@yanglibj, try loading this environment file: https://github.com/Harvard-ACMG/cannon-env/blob/main/envs/gcc.gfortran10.2_cannon.env |
@yantosca I tried this environment but ran into different problems when compiling. Since the DO_CLOUD_CONVECTION error occurs with the original 12.9.3 code, could you please try from your side? Thank you! |
To provide more info: when I used gfortran to compile, I got the following error. However, using ifort was fine...

```
/n/helmod/apps/centos7/MPI/intel/17.0.4-fasrc01/openmpi/2.1.0-fasrc02/netcdf-fortran/4.4.0-fasrc03/lib/libnetcdff.so: undefined reference to `iso_c_binding_mp_c_null_ptr_'
collect2: error: ld returned 1 exit status
make[8]: *** [exe] Error 1
make[8]: Leaving directory `/n/home07/yanglibj/GC/Code.12.9.3/HEMCO/src/Interfaces'
make[7]: *** [all] Error 2
make[7]: Leaving directory `/n/home07/yanglibj/GC/Code.12.9.3/HEMCO/src/Interfaces'
make[6]: *** [check] Error 2
make[6]: Leaving directory `/n/home07/yanglibj/GC/Code.12.9.3/HEMCO'
make[5]: *** [lib] Error 2
make[5]: Leaving directory `/n/home07/yanglibj/GC/Code.12.9.3/HEMCO'
make[4]: *** [libhemco] Error 2
make[4]: Leaving directory `/n/home07/yanglibj/GC/Code.12.9.3/GeosCore'
make[3]: *** [lib] Error 2
make[3]: Leaving directory `/n/home07/yanglibj/GC/Code.12.9.3/GeosCore'
make[2]: *** [all] Error 2
make[2]: Leaving directory `/n/home07/yanglibj/GC/Code.12.9.3/GeosCore'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/n/home07/yanglibj/GC/Code.12.9.3'
cp -f ./CodeDir/bin/geos geos
cp: cannot stat ‘./CodeDir/bin/geos’: No such file or directory
make: *** [build] Error 1
```
|
@yanglibj did you make clean before compiling with gfortran? |
Yes I did. I always make clean before compiling. |
Also another tip: open a new terminal window and then load the gfortran environment. Then make realclean and try to compile again. Sometimes if you change modules in the same terminal session that could lead to issues (if leftover modules aren't purged). |
I still couldn't get the model running with gcc.gfortran10.2_cannon.env, but I was able to use gfortran through init.gc-classic.gfortran71.centos7. I also tried an earlier version (12.7.1) and got the same NaN error. Below is the message:
|
@yanglibj, where is your code & rundir on Cannon? I can try to take a look. Make sure the folders have world-readable permissions (i.e. chmod 755 for folders or executables, chmod 644 for files). |
Also, @yanglibj, is your code "out-of-the-box" or does it contain any modifications? |
@yanglibj You wrote:
If the run crashes consistently at the same date and time, that may indicate a bad met field. I would recommend looking into the met fields for that date. If you see any issues, let us know and we can try to reprocess the meteorology fields. |
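For reference, checking a met-field variable for non-finite values could look like the sketch below. This is a standalone illustration, not GEOS-Chem code; the file name, variable name, and grid dimensions are placeholders, and error checking of the netCDF return codes is omitted:

```fortran
! Minimal sketch: scan one variable of a met-field file for NaN/Inf.
! File name, variable name, and dimensions below are placeholders.
PROGRAM scan_met
  USE, INTRINSIC :: ieee_arithmetic, ONLY : ieee_is_finite
  USE netcdf
  IMPLICIT NONE

  INTEGER           :: ncid, varid, rc
  REAL, ALLOCATABLE :: q(:,:,:,:)   ! lon x lat x lev x time

  rc = nf90_open( 'GEOSFP.20210404.A3mstE.4x5.nc', NF90_NOWRITE, ncid )
  rc = nf90_inq_varid( ncid, 'CMFMC', varid )

  ! Assumed dimensions: 4x5 grid (72x46), 73 level edges, 8 time slices
  ALLOCATE( q(72,46,73,8) )
  rc = nf90_get_var( ncid, varid, q )

  PRINT *, 'Non-finite values found: ', COUNT( .not. ieee_is_finite( q ) )

  rc = nf90_close( ncid )
END PROGRAM scan_met
```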
I didn't see anything weird related to the met field for 2021-4-4, so I didn't mention it earlier. Maybe I missed something. If you could reprocess the data, I can try the simulation again and see whether it fixes the problem. Also, I have shared my code and rundir on Cannon with your team in my private email. It would be helpful if you could take a look for me. Thank you! |
Btw, if you prefer to look into a clean code version, you can download clean v12.9.3 code; it should reproduce the same problem. |
Thanks @yanglibj. From the error output you sent via direct message, I noticed this
which would imply that you are running GEOS-Chem Classic with MPI, which may be part of the problem. Could you also send or post the run script that you are using? |
Thanks for catching this. This is the error from `./geos > GC_12.9.3.log`, with no MPI:

```
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x2aaaac1ec3ff in ???
#1  0x1164338 in __ncdf_mod_MOD_nc_read_arr
        at /home/liy/GC/Code.12.9.3/NcdfUtil/ncdf_mod.F90:1156
#2  0xcc2d6c in __hcoio_read_std_mod_MOD_hcoio_read_std
        at /home/liy/GC/Code.12.9.3/HEMCO/src/Core/hcoio_read_std_mod.F90:748
#3  0xd7e9ea in __hcoio_dataread_mod_MOD_hcoio_dataread
        at /home/liy/GC/Code.12.9.3/HEMCO/src/Core/hcoio_dataread_mod.F90:257
#4  0xc7f844 in readlist_fill
        at /home/liy/GC/Code.12.9.3/HEMCO/src/Core/hco_readlist_mod.F90:510
#5  0xc80b1d in __hco_readlist_mod_MOD_readlist_read
        at /home/liy/GC/Code.12.9.3/HEMCO/src/Core/hco_readlist_mod.F90:327
#6  0xc5be75 in __hco_driver_mod_MOD_hco_run
        at /home/liy/GC/Code.12.9.3/HEMCO/src/Core/hco_driver_mod.F90:163
#7  0x57aba5 in __hcoi_gc_main_mod_MOD_hcoi_gc_run
        at /home/liy/GC/Code.12.9.3/GeosCore/hcoi_gc_main_mod.F90:828
#8  0x78d6d8 in __emissions_mod_MOD_emissions_run
        at /home/liy/GC/Code.12.9.3/GeosCore/emissions_mod.F90:203
#9  0x4ff54c in geos_chem
        at /home/liy/GC/Code.12.9.3/GeosCore/main.F90:990
#10 0x503e20 in main
        at /home/liy/GC/Code.12.9.3/GeosCore/main.F90:32

Floating point exception (core dumped)
```

Have you had a chance to try a run based on a clean 12.9.3 code? Did you see the same problem when running for 2021-4-4? Thank you! |
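The backtrace above points into nc_read_arr, i.e. the exception is raised while HEMCO is reading and processing input data. To trap the first bad floating-point operation without rebuilding with compiler flags (gfortran's -ffpe-trap=invalid,zero does the equivalent at build time), one could temporarily enable IEEE halting early in the code. This is a sketch using the standard ieee_exceptions module; where to place the call is an assumption left to the user:

```fortran
! Sketch: abort on the first IEEE "invalid" or "divide-by-zero"
! exception raised after this point (standard Fortran 2003,
! supported by both gfortran and ifort).
USE, INTRINSIC :: ieee_exceptions, ONLY : ieee_set_halting_mode, &
                                          IEEE_INVALID, IEEE_DIVIDE_BY_ZERO

CALL ieee_set_halting_mode( IEEE_INVALID,        .TRUE. )
CALL ieee_set_halting_mode( IEEE_DIVIDE_BY_ZERO, .TRUE. )
```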
Hi @yanglibj, I was finally able to confirm that the issue happens with a clean 12.9.3 CH4 run (using 2021 GEOS_FP met) as well as with a 13.2.1 run for the same time period. The run always dies at 18z on 2021-04-04.

In 12.9.3:

```
---> DATE: 2021/04/04 UTC: 17:50 X-HRS: 89.833336
---> DATE: 2021/04/04 UTC: 18:00 X-HRS: 90.000000
 - Found all A1 met fields for 2021/04/04 18:30
 - Found all A3cld met fields for 2021/04/04 19:30
 - Found all A3dyn met fields for 2021/04/04 19:30
 - Found all A3mstC met fields for 2021/04/04 19:30
 - LightNOX extension is off. Skipping FLASH_DENS and CONV_DEPTH fields in FlexGrid_Read_A3mstE.
 - Found all A3mstE met fields for 2021/04/04 19:30
 - Found all I3 met fields for 2021/04/04 21:00
Infinity in DO_CLOUD_CONVECTION!
Infinity in DO_CLOUD_CONVECTION!
Infinity in DO_CLOUD_CONVECTION!
Infinity in DO_CLOUD_CONVECTION!
K, IC, Q(K,IC): 8 1 NaN
K, IC, Q(K,IC): 8 1 NaN
```

And again in 13.2.1:

```
---> DATE: 2021/04/04 UTC: 17:50 X-HRS: 89.833336
---> DATE: 2021/04/04 UTC: 18:00 X-HRS: 90.000000
 - Found all A1 met fields for 2021/04/04 18:30
 - Found all A3cld met fields for 2021/04/04 19:30
 - Found all A3dyn met fields for 2021/04/04 19:30
 - Found all A3mstC met fields for 2021/04/04 19:30
 - LightNOX extension is off. Skipping FLASH_DENS and CONV_DEPTH fields in FlexGrid_Read_A3mstE.
 - Found all A3mstE met fields for 2021/04/04 19:30
 - Found all I3 met fields for 2021/04/04 21:00
Infinity in DO_CLOUD_CONVECTION!
K, IC, Q(K,IC): 4 1 NaN CH4
NaN NaN NaN NaN NaN NaN 300.00000000000000 NaN
```

So I would lean towards a bad met field at that date & time. At 18:00z, met fields are being read in for the next 3-hr period. I've traced it to the

@YanshunLi-washu, would you be able to reprocess the 2021/04/04 GEOS-FP met data? |
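Until the met data is corrected, a defensive guard before the convection call could at least localize the bad input. This is a hypothetical sketch (Q, nLev, and nSpc are stand-ins for the actual convection arguments; this is not the stock DO_CLOUD_CONVECTION code), mirroring the K, IC, Q(K,IC) printout in the logs above:

```fortran
! Hypothetical pre-convection check: report every non-finite tracer
! value with its level (K) and species (IC) indices.
USE, INTRINSIC :: ieee_arithmetic, ONLY : ieee_is_finite

INTEGER :: K, IC

DO IC = 1, nSpc
DO K  = 1, nLev
   IF ( .not. ieee_is_finite( Q(K,IC) ) ) THEN
      WRITE( 6, '(a,2i6,es13.5)' ) 'K, IC, Q(K,IC): ', K, IC, Q(K,IC)
   ENDIF
ENDDO
ENDDO
```

Note that once a single NaN enters Q, ordinary arithmetic propagates it (e.g. 300.0 + NaN = NaN), which is why whole rows of the 13.2.1 output above show NaN.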
@yantosca Thank you for confirming the issue. I will try again after @YanshunLi-washu reprocesses the met data. |
@yantosca Will do! |
Hi @YanshunLi-washu and @yantosca, I finally got some time to rerun the model. But using the new data, I still got the same error.
@yantosca Bob, have you got a chance to try this data?
@YanshunLi-washu I noticed the last few lines actually include a few times for 2021/04/05. Could you also reprocess the data for this date?
> On Sun, Nov 7, 2021 at 1:13 AM Yanshun Li wrote:
>
> Hi @yanglibj and @yantosca, the 2021/04/04 data has been reprocessed! Hope that will help.
|
@yantosca @YanshunLi-washu Hi Bob and Yanshun, I am checking in to see whether there is any update or anything I can do from my side. Thank you! |
Hi @YanshunLi-washu, thanks for your reply. Bob also tried the simulations and identified the same error, so the error was not related to how I use the code. Instead, it seems to be a general issue when running GEOS-Chem for recent months (April 2021). See Bob's earlier reply. That's why I wanted to see whether Bob tried your new met data as well. @yantosca |
This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days it will be closed. You can add the "never stale" tag to prevent the Stale bot from closing this issue. |
I changed the time of the |
Description of the problem
Using version 12.9.3, I got the "Infinity in DO_CLOUD_CONVECTION" error starting at the 2021/04/04 19:30 time step.
Description of troubleshooting performed
Tried an earlier fix that was added to UCX_MOD (commit e4a632b), but this didn't resolve the problem.
Turned on debugging to trace the problem, but didn't find anything wrong in tpcore_fvdas_mod.F90.
GEOS-Chem version
12.9.3