[BUG/ISSUE] Surface CH4 fields causing full chem simulation runs to fail if not starting on first hr of first day of month #304

Closed
lizziel opened this issue May 11, 2020 · 17 comments

Comments

@lizziel
Contributor

lizziel commented May 11, 2020

Using GEOS-Chem Classic in both 12.8.0 and dev/12.8.1 I get the following run-time error when running the benchmark simulation starting at any day and time other than YYYYMM01 000000.

=====================================================================
GEOS-Chem ERROR: Cannot get pointer to NOAA_GMD_CH4 or CMIP6_Sfc_CH4 in 
SET_CH4! Make sure the data source corresponds to your emissions year in 
HEMCO_Config.rc (NOAA GMD for 1978 and later; else CMIP6).
 -> at SET_CH4 (in module GeosCore/set_global_ch4_mod.F90)
=====================================================================

=====================================================================
GEOS-CHEM ERROR: Error encountered in call to "SET_CH4"!
STOP at  -> at GEOS-Chem (in GeosCore/main.F90)
=====================================================================

This error occurs during the very first timestep. I am looking into a fix for 12.8.1.

@lizziel lizziel added the category: Bug Something isn't working label May 11, 2020
@lizziel lizziel self-assigned this May 11, 2020
@lizziel
Contributor Author

lizziel commented May 11, 2020

It appears this behavior is a result of the time slice selection flag for NOAA_GMD_CH4 and CMIP6_Sfc_CH4 being EY. From the HEMCO users guide:

**E (exact):** Fields are only used if the time stamp on the field exactly
matches the current simulation datetime. In all other cases, data is
ignored but HEMCO does not return an error. For example, if the
source time attribute is set to 2000-2013/1-12/1-31/0 E, every time the
simulation enters a new day HEMCO will attempt to find a data field for
the current simulation date. If no such field can be found on the file, the
data is ignored (and a warning is prompted). This setting is particularly
useful for data that is highly sensitive to date and time, e.g. restart variables.

**EF (exact, forced):** same as E, but HEMCO stops with an error if no data
field can be found for the current simulation date and time.
(v1.1.011 and higher)

**EC (exact, read/query continuously)**

**ECF (exact, forced, read/query continuously)**

**EY (exact, use simulation year):** Same as E, except that the Emission
year setting is not allowed to override the year value.

Since these files are time-stamped at the first hour of the first day of each month, and the flag is Exact, no data is found in the first timestep of any run that does not start at midnight on the first day of a month. If data is not found for either NOAA_GMD_CH4 or CMIP6_Sfc_CH4, the run fails with an error.
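To make the failure mode concrete, here is a minimal Python sketch of exact time-stamp matching against a monthly file. It is not HEMCO code; the function names `monthly_timestamps` and `find_exact` are hypothetical, and the year 2016 is an arbitrary choice for illustration.

```python
from datetime import datetime


def monthly_timestamps(year):
    """Time stamps of a monthly file: 00:00 on the first day of each month."""
    return [datetime(year, month, 1) for month in range(1, 13)]


def find_exact(timestamps, sim_time):
    """Return the time stamp equal to sim_time, or None if none matches exactly."""
    return sim_time if sim_time in timestamps else None


stamps = monthly_timestamps(2016)

# A run starting at midnight on the first of the month finds a field.
print(find_exact(stamps, datetime(2016, 7, 1, 0)))

# A run starting mid-month finds nothing in its first timestep.
print(find_exact(stamps, datetime(2016, 7, 15, 0)))  # None
```

Under this simplified model, any start time other than the first hour of the first day of a month yields no field, which matches the symptom reported above.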

Using the range flag (R) avoids this issue. This flag is described as follows:

**R (range):** Data are only considered as long as the simulation time is
within the time range specified in attribute sourceTime. The provided
range does not necessarily need to match the time stamps of the input
file. If it is outside of the range of the netCDF time stamps, the closest
available date will be used. For instance, if a file contains data for years
2003 to 2010 and the provided range is set to 2006-2010/1/1/0 R, the file
will only be considered between simulation years 2006-2010. For
simulation years 2006 through 2009, the corresponding field on the file is
used. For all years beyond 2009, data of year 2010 is used. If the simulation
date is outside the provided time range, the data is ignored but HEMCO
does not return an error - the field is simply treated as empty (a corresponding
warning is issued in the HEMCO log file). For example, if the source time
attribute is set to 2000-2002/1-12/1/0 R, the data will be used for simulation
years 2000 to 2002 and ignored for all other years.
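As a rough illustration, the sketch below encodes one possible reading of the R description at year granularity: data is used only inside the sourceTime range, and inside that range the closest year on the file is chosen. This is an assumption about the semantics, not HEMCO's implementation, and `select_year_range` is a hypothetical name.

```python
def select_year_range(sim_year, range_start, range_end, file_years):
    """Return the file year to read, or None when the field is treated as empty.

    One reading of the R flag: outside [range_start, range_end] the data is
    ignored; inside it, the closest year available on the file is used.
    """
    if not (range_start <= sim_year <= range_end):
        return None  # outside the sourceTime range: ignored, warning only
    return min(file_years, key=lambda year: abs(year - sim_year))


file_years = range(2003, 2011)  # file holds years 2003 through 2010

print(select_year_range(2007, 2006, 2010, file_years))  # 2007
print(select_year_range(2012, 2006, 2010, file_years))  # None
print(select_year_range(2012, 2000, 2012, file_years))  # 2010
```

Because the range is checked on whole dates rather than exact time stamps, a run starting mid-month still falls inside the range and picks up data.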

Another option is to use background values if the exact date is not found.

@tsherwen , @sdeastham , @ltmurray : do any of you have comments on the intended behavior? We are using the exact flag since that is how the update was submitted. However, it seems the range flag is more appropriate. Thoughts?

@lizziel lizziel changed the title [BUG/ISSUE] Benchmark simulation run fails if not starting on first hr of first day of month [BUG/ISSUE] Full chem simulation runs fail if not starting on first hr of first day of month May 11, 2020
@sdeastham
Contributor

The range behavior seems like it would be fine to me, but I'd say @tsherwen and @ltmurray would be the real arbiters on this one. That having been said, I'd love to unify the handling of CH4 with the other long-lived species (#287), so ideally we'd have the same behavior for all of them.

@ltmurray
Contributor

ltmurray commented May 12, 2020 via email

@sdeastham
Contributor

sdeastham commented May 12, 2020

That makes sense to me. It's true for the other long-lived source gases too, although to a much lesser extent, that having it recycle the last year might not yield great results. I'd suggest RF for all of them, then. From the HEMCO manual:

  • RF (range, forced): same as R, but HEMCO stops with an error if the simulation date is outside the provided range. (v1.1.011 and higher)

EDIT: Removed RY - realized it doesn't have the requisite forcing behavior.
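The difference between R and RF can be sketched as follows. This is a simplified model of the manual's wording, not HEMCO code; `check_range` and `OutOfRangeError` are hypothetical names, and the 2000-2002 range is taken from the manual's own example.

```python
class OutOfRangeError(RuntimeError):
    """Hypothetical stand-in for HEMCO stopping with an error."""


def check_range(sim_year, range_start, range_end, forced):
    """Return True if the data should be used for sim_year.

    forced=False models R: outside the range the data is ignored.
    forced=True models RF: outside the range the run stops with an error.
    """
    if range_start <= sim_year <= range_end:
        return True
    if forced:
        raise OutOfRangeError(
            f"simulation year {sim_year} is outside {range_start}-{range_end}"
        )
    return False


print(check_range(2001, 2000, 2002, forced=True))   # True
print(check_range(2005, 2000, 2002, forced=False))  # False
```

With `forced=True`, calling `check_range(2005, 2000, 2002, forced=True)` raises instead of silently ignoring the data, which is the behavior suggested here for long-lived species.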

@ltmurray
Contributor

ltmurray commented May 12, 2020 via email

@tsherwen
Contributor

I am OK with this for now too.

However, I think that explicitly telling HEMCO to use emission dataset X for certain years and Y for others would be preferable. For CH4, that would allow NOAA to override CMIP6 for the years the data is available. @christophkeller and I talked about this when I implemented the current setup for CH4, but I think I remember that this would require changes to HEMCO?

I think what you suggested in issue #287 is a good step. Maybe we can add the year-based selection in HEMCO as part of this?

@lizziel
Contributor Author

lizziel commented May 12, 2020

We have been using EY not EF so that emission year setting does not override the year value. It therefore seems like RY is the better choice, unless you think we need a new flag equivalent to RYF.

The implementation of checking which source to use has changed since the original submission due to issues reliably accessing the HEMCO clock (#250). The new implementation always checks to see if NOAA_GMD_CH4 data is available in HEMCO. If it is not, then CMIP6_Sfc_CH4 data retrieval is attempted. If that is also not found, then the model exits with an error message. This is in line with our work towards reducing GEOS-Chem dependencies on HEMCO internals.

This method works because there is no overlap in years between the two datasets. Will that change in the future?
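The fallback order described above can be sketched in Python as follows. This is an illustration only, not the Fortran in `set_global_ch4_mod.F90`; `get_surface_ch4` is a hypothetical name, and a plain dictionary stands in for the HEMCO data containers, with `None` meaning no data was found.

```python
def get_surface_ch4(fields):
    """Return (name, data) for the first surface CH4 source that has data.

    fields maps container names to data, or to None when HEMCO found nothing.
    NOAA_GMD_CH4 is tried first, then CMIP6_Sfc_CH4; otherwise raise an error.
    """
    for name in ("NOAA_GMD_CH4", "CMIP6_Sfc_CH4"):
        data = fields.get(name)
        if data is not None:
            return name, data
    raise RuntimeError(
        "Cannot get pointer to NOAA_GMD_CH4 or CMIP6_Sfc_CH4 in SET_CH4!"
    )


# Placeholder values; real fields would be gridded arrays.
print(get_surface_ch4({"NOAA_GMD_CH4": None, "CMIP6_Sfc_CH4": [1.0]}))
# ('CMIP6_Sfc_CH4', [1.0])
```

If both datasets covered the same year, this order would always pick NOAA_GMD_CH4, so the scheme depends on the two ranges not overlapping or on NOAA being the preferred source.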

@sdeastham
Contributor

It would be fantastic if HEMCO had a way of seamlessly switching between datasets depending on the year (this will cause problems with MAPL/ExtData in GEOS and GCHP, but that's a topic for another time). I wanted to mention that we also already have at least one more data source as an option for CH4 surface VMRs - specifically, the WMO 2018 projections. These overlap with both the NOAA and CMIP6 projections. For my part, I will also be running with additional CH4 estimates and predictions which overlap, such as the various RCPs.

If there's a way to transparently handle different sources of data for different times to fill the same item, that seems ideal to me. But for the average GEOS-Chem user, it seems like having some kind of RFY option be standard for the NOAA dataset would be reasonable.

@lizziel
Contributor Author

lizziel commented May 13, 2020

@msulprizio told me that you can use the hierarchies in HEMCO to prioritize data application both spatially and temporally. This could therefore be used in theory to handle datasets that overlap in time.

Regarding the immediate fix to the issue of not being able to start beyond the first of the month, I want to get a quick fix into 12.8.1 which will be released very soon. I can put in a feature request for RYF functionality, but in the meantime we should pick one of the available options, RY or RF.

One thing that confuses me is why we would want the model to stop if the data is outside of the specified range (RF). I thought we wanted the model to keep going and try the other dataset to see if the simulation year is applicable to it instead. My understanding of the R flag is that outside of the specified range the HEMCO data container is empty, so using RY would work for this.

@christophkeller , is my understanding of this flag correct? I think we need to update the language in the HEMCO user's manual on this since it seems to give conflicting information, specifically the two sentences below in bold:

R (range): Data are only considered as long as the simulation time is within the time range specified in attribute sourceTime. The provided range does not necessarily need to match the time stamps of the input file. If it is outside of the range of the netCDF time stamps, the closest available date will be used. For instance, if a file contains data for years 2003 to 2010 and the provided range is set to 2006-2010/1/1/0 R, the file will only be considered between simulation years 2006-2010. For simulation years 2006 through 2009, the corresponding field on the file is used. For all years beyond 2009, data of year 2010 is used. If the simulation date is outside the provided time range, the data is ignored but HEMCO does not return an error - the field is simply treated as empty (a corresponding warning is issued in the HEMCO log file). For example, if the source time attribute is set to 2000-2002/1-12/1/0 R, the data will be used for simulation years 2000 to 2002 and ignored for all other years.

@ltmurray
Contributor

ltmurray commented May 13, 2020 via email

@lizziel
Contributor Author

lizziel commented May 13, 2020

It sounds like this needs a new issue beyond the one I created here. How about I use the RF flag for now so that we can release 12.8.1 without this bug? I will then create a new issue asking for a recommendation on the default handling of surface CH4, since the current handling, both before and after this bug fix, is not necessarily what we want the default to be.

(Aside: If you get this in an email please click the link at the bottom to go to github before responding if possible. Comments added via email include the last comment in the new comment which clutters up the issue page.)

@lizziel
Contributor Author

lizziel commented May 18, 2020

After discussion offline, we are going to use the RY flag for NOAA_GMD_CH4 in 12.8.1 as a temporary solution to fix this bug in GEOS-Chem Classic. The new behavior is to default to the nearest NOAA_GMD_CH4 year when the simulation date is outside of the range specified in HEMCO_Config.rc. In that case, a warning is printed to HEMCO.log.

The warning printed to the log is buried amongst many other messages and warnings. It is therefore unlikely users will note that a surface CH4 year other than their simulation year is being used. For this reason there will be further discussion of this issue with a likely update to change the behavior again in a future version. This will be documented in a separate GitHub issue which I will link to here once it is created.

@lizziel lizziel closed this as completed May 19, 2020
@msulprizio msulprizio changed the title [BUG/ISSUE] Full chem simulation runs fail if not starting on first hr of first day of month [BUG/ISSUE] Surface CH4 fields causing full chem simulation runs to fail if not starting on first hr of first day of month Jun 2, 2020
@tsherwen
Contributor

tsherwen commented Oct 6, 2020

@lizziel I've been running some historical runs in v12.9.1 and I noted that this update was only applied to the NOAA_GMD_CH4 collection. Please could we also use the RF flag for the CMIP6_Sfc_CH4 data collection too?

This dataset is primarily for historical runs outside of the available meteorology, when NOAA data would not be used in preference. Limiting the year used for this collection to the simulation year may therefore not be the best choice, as I would expect users to use the HEMCO Emission year variable to apply this dataset to a specific historical year.

@lizziel
Contributor Author

lizziel commented Oct 6, 2020

Hi @tsherwen, I think Melissa will be making updates in 13.0 to restrict the years of certain input datasets. @msulprizio, are any changes planned to the year rules for the CMIP6_Sfc_CH4 inventory?

@lizziel lizziel reopened this Oct 6, 2020
@lizziel
Contributor Author

lizziel commented Oct 6, 2020

Also, looks like I dropped the ball on creating a new issue for this. Apologies!

@msulprizio
Contributor

Yes, I can change the time cycle flag to RF flag for CMIP6_Sfc_CH4 too. I will do this as part of a larger cleanup in HEMCO_Config.rc to ensure users know when they are attempting to use data that is outside of the available time range. I just added a new issue (#475) to track the progress of that update.

@tsherwen
Contributor

tsherwen commented Oct 7, 2020

Excellent, this sounds like the right route forward. Thanks @lizziel & @msulprizio

@lizziel lizziel closed this as completed Oct 7, 2020