Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG/ISSUE] Typo in CloudHet #906

Closed
viral211 opened this issue Sep 21, 2021 · 16 comments · Fixed by #912
Closed

[BUG/ISSUE] Typo in CloudHet #906

viral211 opened this issue Sep 21, 2021 · 16 comments · Fixed by #912
Assignees
Labels
category: Bug Something isn't working topic: Chemical Mechanisms Related to KPP and/or GEOS-Chem chemistry mechanisms
Milestone

Comments

@viral211
Copy link

There is a typo in CloudHet in gckpp_Hetrates.F90. The line

ff = max( ff, 1.0e30_dp )

should be:
ff = min( ff, 1.0e30_dp )

I have run it by Chris Holmes @cdholmes. I think this would change the benchmark output significantly.

@viral211 viral211 added the category: Bug Something isn't working label Sep 21, 2021
@yantosca
Copy link
Contributor

Thanks Viral. Also tagging @msl3v on this. We can try to get this into 13.3.0 and benchmark.

@yantosca yantosca self-assigned this Sep 21, 2021
@yantosca yantosca added this to the 13.3.0 milestone Sep 21, 2021
@yantosca yantosca added the topic: Chemical Mechanisms Related to KPP and/or GEOS-Chem chemistry mechanisms label Sep 21, 2021
yantosca added a commit that referenced this issue Sep 24, 2021
This update addresses the issue from geoschem/geos-chem #906.
The MAX( ff, 1.0e30 ) expression in CloudHet, CloudHet1R, and
CloudHet2R functions should be MIN( ff, 1.0e30 ).

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
@yantosca
Copy link
Contributor

This has been fixed with commit c6adf4b. Awaiting PR review and merge.

@yantosca
Copy link
Contributor

This has now been merged into the 13.3.0 development branch and tagged with 13.3.0-alpha.4.

@yantosca
Copy link
Contributor

Hi @viral211 and @cdholmes. I merged this PR as 13.3.0-alpha.4 but @msulprizio and I noticed that it caused fullchem sims to choke on the 2nd chemistry timestep with a could not integrate error:

---> DATE: 2019/07/01  UTC: 00:20  X-HRS:      0.333333
 Forced exit from Rosenbrock due to the following error:
 --> Step size too small: T + 10*H = T or H < Roundoff
 T=   962.73884987080180      and H=   5.0591750291364415E-013
 ### INTEGRATE RETURNED ERROR AT:           57          39          31
 Forced exit from Rosenbrock due to the following error:
 --> Step size too small: T + 10*H = T or H < Roundoff
 T=   966.10882298003025      and H=   5.1668987654245023E-013
 ## INTEGRATE FAILED TWICE !!! 
===============================================================================
GEOS-CHEM ERROR: Integrator error code : -7
STOP at INTEGRATE_KPP
===============================================================================

I am wondering if that MIN value is even needed. In my new hetrates code I now use a SafeDiv function to set the result at 1e30 if the division cannot be done. So we may be able to get away with this:

    ! Ratio of volume inside to outside cloud
    ! ff has a range [0,+inf], so cap it at 1e30
    ff = SafeDiv( H%CldFr, H%ClearFr, 1.0e+30_dp )

@yantosca yantosca reopened this Sep 27, 2021
@yantosca
Copy link
Contributor

Also, this was not caught in integration tests because the integration tests only do 1 chemistry timestep, and stopped before the timestep of the error. We should probably extend integration tests to 1hr of simulation runtime.

@cdholmes
Copy link
Contributor

Omitting the the MIN line, as you have done, should be fine. Does that fix the integration error?

@yantosca
Copy link
Contributor

It does not fix the integration error, I just treid.

The original code in the old gckpp_HetRates.F90 was:

! Ratio (in cloud) of heterogeneous loss to detrainment, s/s
kk = kI * tauc
! Ratio of volume inside to outside cloud
! ff has a range [0,+inf], so cap it at 1e30
ff = safe_div( fc, (1.0_dp - fc), 1.0e30_dp )
ff = max( ff, 1.0e30_dp )
! Ratio of mass inside to outside cloud
! xx has range [0,+inf], but ff is capped at 1e30, so this shouldn't overflow
xx = ( ff - kk - 1.0_dp ) / 2.0_dp + &
sqrt( 1.0_fp + ff*ff + kk*kk + &
2.0_dp*ff + 2.0_dp*kk - 2.0_dp*ff*kk ) / 2.0_dp
! Overall heterogeneous loss rate, grid average, 1/s
! kHet = kI * xx / ( 1d0 + xx )
! Since the expression ( xx / (1+xx) ) may behave badly when xx>>1,
! use the equivalent 1 / (1 + 1/x) with an upper bound on 1/x
kHet = kI / ( 1.0_dp + safe_div( 1.0_dp, xx, 1e30_dp ) )

If you print out some of the values of ff, xx, and kHet, there are sometimes negative xx and negative kHet:

 @@@ ff, xx, khet:    3.5772598845170236E-004   1.7895701188219704E-004   1.9458553343091576E-016
 @@@ ff, xx, khet:   0.15213769712086028       -2.7886791897719920E-002  -3.4170700869882024E-006
 @@@ ff, xx, khet:    3.8540670748149521E-002  -7.5296433906717475E-003  -1.2375804544089306E-007

I would think that you wouldn't want to have negative reaction rates. Should these be flushed to zero?

@yantosca
Copy link
Contributor

@cdholmes, flushing negative xx to zero gets us past the integration error:

    !------------------------------------------------------------------------
    ! Grid-average loss frequency
    !
    ! EXACT expression for entrainment-limited uptake
    !------------------------------------------------------------------------

    ! Ratio (in cloud) of heterogeneous loss to detrainment, s/s
    kk = kI * tauc

    ! Ratio of volume inside to outside cloud
    ! ff has a range [0,+inf], so cap it at 1e30
    ff = SafeDiv( H%CldFr, H%ClearFr, 1.0e+30_dp )
    !ff = MAX( ff, 1.0e+30_dp )
    ff = MIN( ff, 1.0e+30_dp )
    
    ! Ratio of mass inside to outside cloud
    ! xx has range [0,+inf], but ff is capped at 1e30, so this shouldn't overflow
    xx =     ( ff - kk - 1.0_dp ) / 2.0_dp +                                 &
         SQRT( 1e0_dp + ff**2 + kk**2 + 2*ff**2 + 2*kk**2 - 2*ff*kk ) / 2.0_dp

    ! Prevent negative xx and negative kHet (bmy, 9/27/21)
    xx = MAX( xx, 0.0_dp )
    
    ! Overall heterogeneous loss rate, grid average, 1/s
    ! kHet = kI * xx / ( 1d0 + xx )
    !  Since the expression ( xx / (1+xx) ) may behave badly when xx>>1,
    !  use the equivalent 1 / (1 + 1/x) with an upper bound on 1/x
    kHet = kI / ( 1.0_dp + SafeDiv( 1.0_dp, xx, 1.0e+30_dp ) )

    ! Overall loss rate in a particular reaction branch, 1/s
    kHet = kHet * branch

@cdholmes
Copy link
Contributor

  1. As a temporary fix, you can flush xx and khet to zero.
  2. Could you also print out kI and fc in addition to ff, xx, khet? I want to look into this further. (Might be roundoff error and catastrophic cancellation in xx calculation.)

@viral211
Copy link
Author

Hi @yantosca. The calculation of xx in the code you just pasted is from an older version. It had an error that we had fixed sometime back. The file on GitHub has the correct calculation. It should be:

xx = ( ff- kk- 1.0_dp ) / 2.0_dp + sqrt( 1.0_fp + ff*ff + kk*kk + 2.0_dp*ff + 2.0_dp*kk - 2.0_dp*ff*kk ) / 2.0_dp

I wonder if that's he reason for negative values of xx.

@yantosca
Copy link
Contributor

@cdholmes, here is printout from a few locations where xx and khet are zero:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@ kI    :    5.8539284960727474E-006
@@@ fc    :    1.2736767530441284E-003
@@@ 1-fc  :   0.99872632324695587     
@@@ kk    :    2.1074142585861891E-002
@@@ tauc  :    3600.0000000000000     
@@@ ff    :    1.2753010743756928E-003
@@@ xx    :   -9.5786521776447531E-003
@@@ khet  :   -5.6615040719761695E-008
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@ kI    :    4.3832103667774089E-005
@@@ fc    :    1.2736767530441284E-003
@@@ 1-fc  :   0.99872632324695587     
@@@ kk    :   0.15779557320398671     
@@@ tauc  :    3600.0000000000000     
@@@ ff    :    1.2753010743756928E-003
@@@ xx    :   -6.0017737611452016E-002
@@@ khet  :   -2.7986737645513250E-006
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@ kI    :    8.6356085558678892E-007
@@@ fc    :    1.2736767530441284E-003
@@@ 1-fc  :   0.99872632324695587     
@@@ kk    :    3.1088190801124402E-003
@@@ tauc  :    3600.0000000000000     
@@@ ff    :    1.2753010743756928E-003
@@@ xx    :   -9.1027302341539773E-004
@@@ khet  :   -7.8679234676646512E-010
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@ kI    :    3.9933645909351163E-005
@@@ fc    :    1.2736767530441284E-003
@@@ 1-fc  :   0.99872632324695587     
@@@ kk    :   0.14376112527366419     
@@@ tauc  :    3600.0000000000000     
@@@ ff    :    1.2753010743756928E-003
@@@ xx    :   -5.6063335358846111E-002
@@@ khet  :   -2.3717834750793083E-006
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@ kI    :    8.8885181978303368E-007
@@@ fc    :    1.2736767530441284E-003
@@@ 1-fc  :   0.99872632324695587     
@@@ kk    :    3.1998665512189212E-003
@@@ tauc  :    3600.0000000000000     
@@@ ff    :    1.2753010743756928E-003
@@@ xx    :   -9.5542402800685711E-004
@@@ khet  :   -8.5004253702303078E-010
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@ kI    :    3.9933645909351163E-005
@@@ fc    :    1.2736767530441284E-003
@@@ 1-fc  :   0.99872632324695587     
@@@ kk    :   0.14376112527366419     
@@@ tauc  :    3600.0000000000000     
@@@ ff    :    1.2753010743756928E-003
@@@ xx    :   -5.6063335358846111E-002
@@@ khet  :   -2.3717834750793083E-006
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@ kI    :    3.9044794089568130E-005
@@@ fc    :    1.2736767530441284E-003
@@@ 1-fc  :   0.99872632324695587     
@@@ kk    :   0.14056125872244526     
@@@ tauc  :    3600.0000000000000     
@@@ ff    :    1.2753010743756928E-003
@@@ xx    :   -5.5124085660023581E-002
@@@ khet  :   -2.2778743127078089E-006
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@ kI    :    3.9044794089568130E-005
@@@ fc    :    1.2736767530441284E-003
@@@ 1-fc  :   0.99872632324695587     
@@@ kk    :   0.14056125872244526     
@@@ tauc  :    3600.0000000000000     
@@@ ff    :    1.2753010743756928E-003
@@@ xx    :   -5.5124085660023581E-002
@@@ khet  :   -2.2778743127078089E-006
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@ kI    :    7.7444165929824418E-005
@@@ fc    :   0.11744181811809540     
@@@ 1-fc  :   0.88255818188190460     
@@@ kk    :   0.27879899734736791     
@@@ tauc  :    3600.0000000000000     
@@@ ff    :   0.13306977435489950     
@@@ xx    :   -2.2385298085750183E-002
@@@ khet  :   -1.7733067392980751E-006
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@ kI    :    4.3817520052903658E-005
@@@ fc    :   0.10060337185859680     
@@@ 1-fc  :   0.89939662814140320     
@@@ kk    :   0.15774307219045317     
@@@ tauc  :    3600.0000000000000     
@@@ ff    :   0.11185651436840825     
@@@ xx    :   -4.0754996429750445E-003
@@@ khet  :   -1.7930906134716327E-007
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@ kI    :    4.2738640622205602E-005
@@@ fc    :   0.10060337185859680     
@@@ 1-fc  :   0.89939662814140320     
@@@ kk    :   0.15385910623994017     
@@@ tauc  :    3600.0000000000000     
@@@ ff    :   0.11185651436840825     
@@@ xx    :   -2.7993019017954479E-003
@@@ khet  :   -1.1997420198567723E-007
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@ kI    :    4.2738640622205602E-005
@@@ fc    :   0.10060337185859680     
@@@ 1-fc  :   0.89939662814140320     
@@@ kk    :   0.15385910623994017     
@@@ tauc  :    3600.0000000000000     
@@@ ff    :   0.11185651436840825     
@@@ xx    :   -2.7993019017954479E-003
@@@ khet  :   -1.1997420198567723E-007
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@ kI    :    8.3063899387568819E-006
@@@ fc    :    1.3083992525935173E-002
@@@ 1-fc  :   0.98691600747406483     
@@@ kk    :    2.9903003779524776E-002
@@@ tauc  :    3600.0000000000000     
@@@ ff    :    1.3257452941129854E-002
@@@ xx    :   -7.7188966513230817E-003
@@@ khet  :   -6.4614921383144082E-008

I have a larger log file (75 MB) if you'd liek to see the whole thing.

@yantosca
Copy link
Contributor

@viral211 thanks. I was sure that I had fixed that in the new code but I guess it fell thru the cracks. Sorry for the bother.

@cdholmes
Copy link
Contributor

It looks like the bug in the xx definition was causing the negative values. I checked a few examples from the log excerpt that Bob sent. Can you confirm that the negatives are gone when xx is defined correctly?

@yantosca
Copy link
Contributor

I was able to run a 1-day fullchem_benchmark run and I didn't get the integration error. With the negatives the run bombed out before 00:30. So I think we are OK now.

@msulprizio
Copy link
Contributor

msulprizio commented Sep 29, 2021

I ran an internal 1-month benchmark to evaluate the impact of these changes (tagged as 13.3.0-alpha.4). The benchmark plots and tables may be viewed at http://ftp.as.harvard.edu/gcgrid/geos-chem/validation/InternalBenchmarks/GCC_13.3.0-alpha.4/. The OH metrics are included below for quick review:

###############################################################################
### OH Metrics
### Ref = GCC_13.3.0-alpha.3; Dev = GCC_13.3.0-alpha.4
###############################################################################

------------------------------------------------------------
Global mass-weighted OH concentration [10^5 molec cm^-3]
------------------------------------------------------------
Ref      : 11.05722637197
Dev      : 11.79245633477
Abs diff :  0.73522996280
%   diff :  6.649316

------------------------------------------------------------
CH3CCl3 (aka MCF) lifetime w/r/t tropospheric OH [years]
------------------------------------------------------------
Ref      :  5.634232
Dev      :  5.246796
Abs diff : -0.387436
%   diff : -6.876466

------------------------------------------------------------
CH4 lifetime w/r/t tropospheric OH [years]
------------------------------------------------------------
Ref      :  9.487550
Dev      :  8.841421
Abs diff : -0.646129
%   diff : -6.810283

Changes are considerably large, as suspected by Viral in the original post above. I will close out this issue for now since this will officially be benchmarked in 13.3.0, but @viral211, @cdholmes, or others, please feel free to comment here on any of the changes to model output.

@viral211
Copy link
Author

Thanks @msulprizio. I was expecting a general increase in ozone because NOx loss on clouds is now much slower. The decrease in ozone over the southern ocean was unexpected. It is caused by the increase in Br, but I am not sure what drives it. I spoke with Xuan Wang and we suspect that it is because the loss of Br through the HOBr+S(IV) reaction in clouds is now slower (along with other HOBr cloud reactions) and more HOBr is available to react with HBr/Br-/Cl- on aerosols and release radicals. Xuan is looking into this further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: Bug Something isn't working topic: Chemical Mechanisms Related to KPP and/or GEOS-Chem chemistry mechanisms
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants