Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG/ISSUE] Olson landmap data is being read in as all zeroes for GCHP simulations with gfortran (but not ifort) #15

Closed
yantosca opened this issue Dec 21, 2018 · 14 comments

Comments

@yantosca
Copy link
Contributor

Long story short, the Olson land map data seems to be coming in as all zeroes from the Import State for GCHP simulations that use gfortran. This issue is most certainly the root cause of previously-mentioned issues #13 and #14.

My GC-classic and GCHP code were at the following commits:

GEOS-Chem repository
   Path        : /local/ryantosca/GCHP/gf82/Code.12.1.1
   Branch      : bugfix/GCHP_issues
   Last commit : Add a better error trap in Compute_Olson_Landmap_GCHP
   Date        : Fri Dec 21 12:03:57 2018 -0500
   User        : Bob Yantosca
   Hash        : 7db80f35
   Git status  : 
GCHP repository
   Path        : /local/ryantosca/GCHP/gf82/Code.12.1.1/GCHP
   Branch      : bugfix/GCHP_issues
   Last commit : Now exit if Compute_Olson_Landmap_GCHP returns with failure
   Date        : Fri Dec 21 12:07:09 2018 -0500
   User        : Bob Yantosca
   Hash        : 7bd7110
   Git status  : M Chem_GridCompMod.F90

Note that I modified the code to put in an error trap that will exit if all elements of State_Met%LandTypeFrac are zero (or more precisely, when the variable maxFracInd is zero).

I have narrowed down the issue to this code section of GCHP/Chem_GridCompMod.F90, where the Olson land map data is obtained from the Import State and copied into State_Met%LandFracType.

       IF ( FIRST ) THEN
       
          ! Set Olson fractional land type from import (ewl)
          If (am_I_Root) Write(6,'(a)') 'Initializing land type ' // &
                           'fractions from Olson imports'
          Ptr2d => NULL()
          DO T = 1, NSURFTYPE
       
             ! Create two-char string for land type
             landTypeInt = T-1
             IF ( landTypeInt < 10 ) THEN
                WRITE ( landTypeStr, "(A1,I1)" ) '0', landTypeInt
             ELSE
                WRITE ( landTypeStr, "(I2)" ) landTypeInt  
             ENDIF
             importName = 'OLSON' // TRIM(landTypeStr)
       
             ! Get pointer and set populate State_Met variable
             CALL MAPL_GetPointer ( IMPORT, Ptr2D, TRIM(importName),  &
                                    notFoundOK=.TRUE., __RC__ )
             If ( Associated(Ptr2D) ) Then
                If (am_I_Root) Write(6,*)                                &
                     ' ### Reading ' // TRIM(importName) // ' from imports'
                State_Met%LandTypeFrac(:,:,T) = Ptr2D(:,:)
             ELSE
                WRITE(6,*) TRIM(importName) // ' pointer is not associated'
             ENDIF
       
             ! Nullify pointer
             Ptr2D => NULL()
          ENDDO
          
          ! Compute State_Met variables IREG, ILAND, IUSE, and FRCLND
          CALL Compute_Olson_Landmap_GCHP( am_I_Root, State_Met, RC=STATUS )
          VERIFY_(STATUS)    ! <--- I added this error trap
       ENDIF

Also not shown above are some debug print statements.

I compiled and ran a C24 GCHP Rn-Pb-Be simulation with gfortran 8.2, using 6 cores of Odyssey. The modules were:

Currently Loaded Modules:
  1) git/2.17.0-fasrc01   5) gcc/8.2.0-fasrc01       9) hdf5/1.8.12-fasrc12
  2  gmp/6.1.2-fasrc01    6) openmpi/3.1.1-fasrc01  10) netcdf/4.1.3-fasrc02
  3) mpfr/3.1.5-fasrc01   7) zlib/1.2.8-fasrc07
  4) mpc/1.0.3-fasrc06    8) szip/2.1-fasrc02

And I got the following output in the gchp.log file:

  ### Reading OLSON01 from imports
 %%%LTF:            0           5          25   0.0000000000000000        0.0000000000000000     
  ### Reading OLSON02 from imports
  ### Reading OLSON03 from imports
 %%%LTF:            0           5          25   0.0000000000000000        0.0000000000000000     
 ... etc...
  ### Reading OLSON72 from imports
 %%%LTF:            0           5          25   0.0000000000000000        0.0000000000000000     
===============================================================================
GEOS-Chem ERROR: Invalid value of maxFracInd: 0!  This can indicate a problem 
reading Olson data from the Import State, and that State_Met%LandTypeFrac 
array has all zeroes.
 -> at Compute_Olson_Landmap_GCHP (in module GeosCore/olson_landmap_mod.F90)
===============================================================================

GIGCchem::Run_                                1863
GIGCchem::Run2                                1281
GCHP::Run                                      420
MAPL_Cap                                       792
===> Run ended at Fri Dec 21 12:25:57 EST 2018

The error message is from the new error trap that I committed to the bugfix/GCHP_issues branches of the gchp and geos-chem repos on Github. The numbers in each line beginning with "%%%LTF" indicate the core number, I & J value, and the sum of State_Met%LandTypeFrac for that I,J and Olson type. As you can see all of the Olson values are coming into State_Met%LandTypeFrac as zeroes.

It seems that the issue is happening somewhere in MAPL, as reading in the Olson data uses a new feature of MAPL to return the fraction of the grid box that is covered by land type N, where N is an integer.

In MAPL_ExtDataGridCompMod.F90, then there is this snippet, where there are calls down to MAPL_CFIORead

     if (present(field)) then
        if (trans == MAPL_HorzTransOrderBilinear) then
           call MAPL_CFIORead(name, file, time, field, &
                time_is_cyclic=.false., time_interp=.false., ignoreCase = ignoreCase, &
                __RC__) 
        else if (trans == MAPL_HorzTransOrderBinning) then
           call MAPL_CFIORead(name, file, time, field, &
                time_is_cyclic=.false., time_interp=.false., ignoreCase = ignoreCase, &
                Conservative = .true., __RC__) 
        else if (trans == MAPL_HorzTransOrderSample) then
           call MAPL_CFIORead(name, file, time, field, &
                time_is_cyclic=.false., time_interp=.false., ignoreCase = ignoreCase, &
                Conservative = .true., Voting = .true., __RC__) 
        else if (trans == MAPL_HorzTransOrderFraction) then
           call MAPL_CFIORead(name, file, time, field, &
                time_is_cyclic=.false., time_interp=.false., ignoreCase = ignoreCase, &
                Conservative = .true., getFrac = val, __RC__) 
        end if

     else if (present(bundle)) then

        if (trans == MAPL_HorzTransOrderBilinear) then
           call MAPL_CFIORead(file, time, bundle, &
                time_is_cyclic=.false., time_interp=.false., ignoreCase = ignoreCase, &
                __RC__)
        else if (trans == MAPL_HorzTransOrderBinning) then
           call MAPL_CFIORead(file, time, bundle, &
                time_is_cyclic=.false., time_interp=.false., ignoreCase = ignoreCase, &
                Conservative = .true., __RC__)
        else if (trans == MAPL_HorzTransOrderSample) then
           call MAPL_CFIORead(file, time, bundle, &
                time_is_cyclic=.false., time_interp=.false., ignoreCase = ignoreCase, &
                Conservative = .true., Voting = .true., __RC__)
        else if (trans == MAPL_HorzTransOrderFraction) then
           call MAPL_CFIORead(file, time, bundle, &
                time_is_cyclic=.false., time_interp=.false., ignoreCase = ignoreCase, &
                Conservative = .true., getFrac = val, __RC__)
        end if

     end if

The relevant calls are the ones where trans == MAPL_HorzTransOrderFraction. In MAPL_CFIO, there are further calls to MAPL_HorzTransformRun, which is where I suspect the error may be happening. MAPL_HorzTransformRun is an overloaded interface for several other module procedures

Unfortunately, at this point my knowledge of the innards of MAPL is not very comprehensive. If anyone has any other suggestions to try, then please let me know. My guess is that deep into MAPL there is some code that gfortran isn't parsing properly, or for which an unexpected side-effect is occurring.

NOTE: This could potentially be caused by the ESMF version which is 5.2. ESMF 5.2 pre-dates the newest versions of gfortran, so there could conceivably be some incompatibility. But who knows.

THE UPSHOT: Until we find & fix this issue, we should not use gfortran for GCHP simulations. While GCHP can run on the AWS cloud in tutorial mode, the error is still present and you will get erroneous output.

I verifiied that compiling and running GCHP using ifort 17 correctly read the Olson land map values from the Import State into State_Met%LandFracType. So this issue only happens with GNU Fortran.

Also, I will mark #13 and #14 as closed, as this issue is the root cause.

@JiaweiZhuang
Copy link
Contributor

JiaweiZhuang commented Dec 21, 2018

Thanks for this deep diagnosis.

If the core issue is

It seems that the issue is happening somewhere in MAPL, as reading in the Olson data uses a new feature of MAPL to return the fraction of the grid box that is covered by land type N, where N is an integer.

Some dirty fixes to correct this:

  • Regrid Olson data offline and read it directly (like for restart files). Do not use the option MAPL_HorzTransOrderFraction
  • Use ifort on AWS... (yes I am not kidding). I believe the license is needed only at compile time, not at run-time. Can distribute pre-compiled executable.

This would be like the last resort. It is indeed best to make gfortran working properly.

@lizziel
Copy link
Contributor

lizziel commented Dec 21, 2018

I am going to do a benchmark run with fortran8.2 on Odyssey in January. There may be other problems beyond this one that we don't know about since all GCHP benchmarks have been with ifort.

@yantosca
Copy link
Contributor Author

yantosca commented Dec 21, 2018

I poked around in MAPL_RegridConservative Run a bit. There is this code:

   FF = 0.0
    do N = 1, size(TT%OUT%II)
       II = TT%IN%II(N)
       JI = TT%IN%JJ(N)
!@       if (II<1 .or. II>size(INPUT,1) .or.JI<1 .or. JI>size(INPUT,2)) then
!@          print *,'we have a problem'
!@       endif
       VALI = INPUT(II,JI)
       if(VALI/=MAPL_UNDEF) then
          IO = TT%OUT%II(N)
          JO = TT%OUT%JJ(N)
          W  = TT%OUT%W(N)
          if (uSAMPLE) then
             if (W > FF(IO,JO)) then
                OUTPUT(IO,JO) = VALI
                FF    (IO,JO) = W
             end if
          %%% THIS IS WHERE THE FRACTION OF LANDTYPE PER GRIDBOX IS COMPUTED
          %%% GETFRAC= the desired land type value (0..72)
          %%% VALI = the land type read in from the file
          %%% OUTPUT = the fraction of grid box with land type GETFRAC
          %%% FF = contains the sum of the mapping weights on the ouptut grid 
          else if (doFrac) then
             if (VALI .gt. GetFrac_-eps .and. VALI .lt. GetFrac_+eps) then
                OUTPUT(IO,JO) = OUTPUT(IO,JO)+W
             end if
             FF(IO,JO)=FF(IO,JO)+W
          else
             OUTPUT(IO,JO) = OUTPUT(IO,JO) + W * VALI
             FF    (IO,JO) = FF    (IO,JO) + W
          end if
       endif
    end do

    if(.not. uSAMPLE) then
       where(FF /= 0.0)
          OUTPUT = OUTPUT / FF
       elsewhere 
          OUTPUT = MAPL_Undef
       end where
    end if

But this code seems to be working OK. I put in some debug print and as far as I can tell the output array is as expected.

The only thing I would do differently would be to replace the

WHERE( FF /= 0.0 ) 

with

WHERE( FF > 0.0 .or. FF < 0.0 )

since sometimes an equality test for zero will fail due to roundoff. But the values of FF always seem to be close to 1, so I don't think it's an issue. The error may be upstream from there.

Maybe in January we can forward this to the MAPL dev team.

@JiaweiZhuang
Copy link
Contributor

I am going to do a benchmark run with fortran8.2 on Odyssey in January.

@lizziel Consider checking the 4 cases:

  1. ifort standard
  2. gfortran standard
  3. ifort with DryDep turned off (to remove landmap effect)
  4. gfortran with DryDep turned off

Looks at surface ozone. (3) and (4) should be very close if there are no extra bugs besides this landmap one.

@lizziel
Copy link
Contributor

lizziel commented Jan 3, 2019

Reading land type binary masks instead of the land map should bypass this issue. I generated a new file and confirmed that the two methods give identical results when using ifort. Tests with gfortran are in progress.

@lizziel
Copy link
Contributor

lizziel commented Jan 4, 2019

Using an alternative regridding method does not correct this issue. Further testing is in progress.

@yantosca
Copy link
Contributor Author

I have created a better error trap (one that only requires modifications to GCHP/Chem_GridCompMod.F90). We now get this error message if the Olson values are all zeroes:

  ### Reading OLSON01 from imports
  ... etc ...
  ### Reading OLSON72 from imports 
State_Met%LandTypeFrac contains all zeroes!  This error is a known issue in MAPL 
when using gfortran. This should not happen if you compiled with ifort.
GIGCchem::Run_                                1857
GIGCchem::Run2                                1277
GCHP::Run                                      420
MAPL_Cap                                       792

@lizziel
Copy link
Contributor

lizziel commented Jan 11, 2019

Investigation of MAPL CFIO shows that the reading and regridding of the Olson land map file is correct for all regridding methods. The issue thus occurs somewhere between reading/regridding and retrieving the pointer to the import.

Until this issue is fixed we recommend not using gfortran with GCHP. The documentation is updated for this but there is also now a forced stop if the error occurs (see above comment from Bob).

@lizziel
Copy link
Contributor

lizziel commented Jan 15, 2019

This issue will not be fixed in GCHP 12.2.0. I will post here again when a fix is found and slated for an upcoming version.

@lizziel
Copy link
Contributor

lizziel commented Feb 14, 2019

GMAO tested our Olson land map file with their current integration branch of GEOS using gfortran. They did not encounter the problem we have using the old MAPL in GCHP. Updating GCHP to use a new version of MAPL, even newer than GMAO tested with, is in progress. I am currently working under the assumption that it will resolve this bug but it is too soon to verify this.

@JiaweiZhuang
Copy link
Contributor

@lizziel Thanks for the update! That's great to know and seems like new MAPL will solve a lot of crucial problems (gfortran bug, inefficient I/O).

@lizziel
Copy link
Contributor

lizziel commented Mar 19, 2019

I have confirmed that implementation of the new MAPL version in GCHP resolves this issue. The land mask imports are no longer all zero. It is still uncertain which version of GCHP the new MAPL will go into but I will announce it here once it is decided.

@JiaweiZhuang
Copy link
Contributor

I have confirmed that implementation of the new MAPL version in GCHP resolves this issue.

This is wonderful news! We should finally switch to the GNU stack for benchmarks to see if there are still undiscovered issues like this.

@lizziel
Copy link
Contributor

lizziel commented Jun 3, 2019

I am closing this issue since it is fixed with MAPL update going into 12.5.

@lizziel lizziel closed this as completed Jun 3, 2019
@msulprizio msulprizio changed the title Olson landmap data is being read in as all zeroes for GCHP simulations with gfortran (but not ifort) [BUG/ISSUE] Olson landmap data is being read in as all zeroes for GCHP simulations with gfortran (but not ifort) Sep 5, 2019
JiaweiZhuang referenced this issue in JiaweiZhuang/gchp Oct 30, 2019
Mainly for reproducing cloud-GCHP paper

Do not include IntelMPI + gfortran support as added by geoschem/GCHP@075d47d
because version 12.3.2 still has run-time errors with gfortran anyways (e.g. geoschem/GCHP#15)
sdeastham pushed a commit to sdeastham/gchp_legacy that referenced this issue Jun 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants