[BUG/ISSUE] HISTORY diagnostics only saves 23 hours of data to diagnostic files (except for Budget) #269

ktravis213 · 2020-04-03T12:51:17Z

Hi support team,
I am running version 12.6.3 at 0.25x0.3125 resolution for the China grid. I need hourly time series, which I output with the HISTORY.rc diagnostic. I can only run two days at a time at this resolution on my computing system. For some reason, the budget diagnostic outputs all 24 hours in a day to the same file, but the other diagnostics (species concentration, prod/loss) output 23 hours and put the 24th hour in a new file. I am not sure which is correct and how to fix this.
Thank you!

lizziel · 2020-04-03T14:41:28Z

Hi Katie, could you provide your HISTORY.rc file? You can drag and drop it into a comment box if you add extension .txt. Thanks!

yantosca · 2020-04-03T14:54:48Z

Hi Katie, thanks for bringing this to our attention. @msulprizio and I just looked at some timeseries output that she had handy for another project.

Long story short: What happens is that at 0Z on the last day of your run, a file is created with one timestamp (e.g. 2018-07-01, 00Z). But then when you start the next run stage, the existing file gets overwritten with data from e.g. 2018-07-01 00 to 2018-07-01 23z.

We think the solution may be simple. In 12.8.1, we can add a test to see if the file about to be created already exists and, if it does, open it in netCDF append mode instead of netCDF write mode.

In the meantime, there are a couple of things you can do as a workaround.

Add a line into your run script to rename the diagnostic files that are created at the last timestep of your run, e.g.

mv GEOSChem.SpeciesConc.20180701_0000z.nc4 GEOSChem.SpeciesConc.20180701_0000z.1.nc4

then you can combine those files with the new files that are created at the end of the month.

Or even better, at the end of the run, rename the OutputDir folder, and then create a new blank OutputDir for the next run. This will avoid having files from a previous run being overwritten. Then you can either merge the files with nco or cdo, or read them into Python with xr.open_mfdataset.
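The folder-rotation workaround above could be scripted between run stages. The following is a minimal sketch using only the Python standard library; the function name `rotate_output_dir` and the `stage_label` suffix are hypothetical and are not part of GEOS-Chem or its run scripts.

```python
import shutil
from pathlib import Path


def rotate_output_dir(run_dir, stage_label, name="OutputDir"):
    """Rename run_dir/name to run_dir/name.<stage_label> and recreate it empty.

    Returns the path of the archived directory.
    """
    run_dir = Path(run_dir)
    current = run_dir / name
    archived = run_dir / f"{name}.{stage_label}"
    if archived.exists():
        # Refuse to clobber an earlier archive; that would defeat the purpose.
        raise FileExistsError(f"{archived} already exists")
    # Move the finished stage's output out of the way ...
    shutil.move(str(current), str(archived))
    # ... and give the next stage a blank directory to write into.
    current.mkdir()
    return archived
```

Calling `rotate_output_dir(".", "stage1")` after a run stage finishes leaves the old files under `OutputDir.stage1`, so the next stage cannot overwrite them.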

Anyway, thanks again and stay tuned for the fix!

yantosca · 2020-04-06T14:21:46Z

Also -- to answer your question about why the budget diagnostics were not affected, that is probably because you archived the budget diagnostics as time-averaged. The bug above only affects instantaneous collections.

This commit fixes an issue raised by Katie Travis on Github: #269

The fix is:
(1) For instantaneous collections (except Restart), test if a file already exists. This often happens at the start of a new day.
(2) If the file exists, then open it for appending, and get the number of timestamps already in the file.
(3) Then increment the timestamp counter to account for the number of existing timestamps. (For example, if there was one time point in the file, then write the next time slice with time index=2 instead of time index=1.)

NOTE: This fix also required the addition of a new subroutine (NC_APPEND) to NcdfUtil/ncdf_mod.F90.

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
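The decision described in steps (1) to (3) can be illustrated with a small sketch. This is not the Fortran code from the commit; it is a Python paraphrase of the commit message, and the function name `next_write` and its arguments are hypothetical. Time indices are 1-based, as in the commit message.

```python
def next_write(existing_times, is_instantaneous, is_restart=False):
    """Return (mode, time_index) for the next time slice written to a file.

    existing_times: list of timestamps already in the file,
                    or None if the file does not exist yet.
    """
    if is_instantaneous and not is_restart and existing_times is not None:
        # File already exists: append after the time points it holds.
        return "append", len(existing_times) + 1
    # Otherwise create the file and start at the first time index.
    return "write", 1
```

For a file that already holds the single 00Z time point, `next_write(["2018-07-01 00"], True)` gives append mode and time index 2, which matches the example in step (3).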

yantosca · 2020-04-08T15:51:26Z

I have pushed commit 28109a7, which is now on the bugfix/NcWrite branch. This will go into 12.8.1 once that version is ready (but you can check out this update if you need it).

I did a couple of test runs where I ran the transport tracer simulation for 1 day at a time and then saved out instantaneous 3-hourly output. I then checked to see if the file for the 2nd day had the first timestep (which was written out at the end of the previous day's run) preserved. Indeed it is!

netcdf GEOSChem.SpeciesConc.20160102_0000z {
dimensions:
        time = UNLIMITED ; // (8 currently)
        lev = 72 ;
        ilev = 73 ;
. . .
data:

 time = "2016-01-02", "2016-01-02 03", "2016-01-02 06", "2016-01-02 09", 
    "2016-01-02 12", "2016-01-02 15", "2016-01-02 18", "2016-01-02 21" ;

Without the fix, the list of timestamps would have started with 2016-01-02 03 (for 3-hourly output) instead of 2016-01-02 00, and there would have been 7 timesteps instead of the correct value of 8.
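A check like the one above can be automated by comparing the timestamps found in a file against the ones expected for the archiving frequency. The sketch below uses only the standard library and assumes the timestamps have already been read from the file into `datetime` objects; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def expected_times(day, freq_hours):
    """Instantaneous time points expected in one daily file (00Z onward)."""
    start = datetime.strptime(day, "%Y-%m-%d")
    return [start + timedelta(hours=h) for h in range(0, 24, freq_hours)]


def missing_times(found, day, freq_hours):
    """Expected time points that are absent from the list read from a file."""
    found_set = set(found)
    return [t for t in expected_times(day, freq_hours) if t not in found_set]
```

With 3-hourly output, `expected_times("2016-01-02", 3)` has 8 entries. A file affected by the bug would have `missing_times` report the 00Z time point.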

yantosca · 2020-04-13T19:46:56Z

Commit 28109a7 has now been merged up to version 12.8.1

Commit 28109a7, as described in this Github issue (#269), originally fixed an error where instantaneous files having data at 00 UTC were being overwritten by the next day's output. In that commit, we moved the statement Container%FirstInt=.FALSE. lower in the subroutine so that it could be used again, but we forgot to add this line in the IF block that returns out of the subroutine. Due to this bug, any existing instantaneous file would get overwritten inadvertently. This new bug fix prevents this and also sets the netCDF reference time properly. Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>

yantosca · 2020-04-16T21:35:44Z

It was discovered that the prior commit 28109a7 had a bug that was caused by a missing Container%Set_Inst = .FALSE. statement. This error was causing instantaneous files to always use the reference time at the start of the simulation. As a result, the code would try to append to existing files that should not have been appended to (e.g. the initial restart file).

This behavior is now corrected in commit 8003e32.

NOTE: We may add a further update to decide which instantaneous files should be appended to and which should not.

yantosca · 2020-04-20T13:35:12Z

Also note, a further commit 1a160e1 was made in order to make sure that instantaneous collections where frequency=duration are written to disk with the proper reference time.

Also, we removed "UTC" from the "time:units" string, as this sometimes causes ncdump -t not to display the proper time values.

yantosca · 2020-04-20T20:20:03Z

Also added further commit 81c7126, which adds extra updates so that instantaneous collections with multiple time points are now archived properly.

Also, we now throw an error if we try to append to an instantaneous netCDF file with more than one time point. This is now in 12.8.1.

yantosca · 2020-04-21T18:01:18Z

After further consideration, we have decided to remove the need for appending to instantaneous collection files due to the danger of clobbering data. This was updated in commit 94d78a7.

Instantaneous collection with more than one time point per file

First file timestamp: 1st day of simulation @ 01:00Z
Last file timestamp: last day of simulation @ 00:00Z.
No appending needed.

For example, a 2-day run with hourly archiving saved to daily files creates this output:

GEOSChem.SpeciesConc.20160101_0010z.nc4
GEOSChem.SpeciesConc.20160102_0000z.nc4
GEOSChem.SpeciesConc.20160102_0010z.nc4
GEOSChem.SpeciesConc.20160103_0000z.nc4

Instantaneous collection with only one time point per file

The file name uses the date/time when the data is written to disk.
No appending needed.

For example, a multi-day run with hourly archiving saved to hourly files now creates this output:

GEOSChem.SpeciesConc.20160101_0100z.nc4
GEOSChem.SpeciesConc.20160101_0200z.nc4
GEOSChem.SpeciesConc.20160101_0300z.nc4
...
GEOSChem.SpeciesConc.20160101_2000z.nc4
GEOSChem.SpeciesConc.20160101_2100z.nc4
GEOSChem.SpeciesConc.20160101_2200z.nc4
GEOSChem.SpeciesConc.20160101_2300z.nc4
GEOSChem.SpeciesConc.20160102_0000z.nc4
GEOSChem.SpeciesConc.20160102_0100z.nc4
... etc ...
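The naming pattern in the listing above can be reproduced with a short sketch, which may help when scripting checks for missing files. It assumes a run starting at 00Z with hourly archiving to hourly files, so the first file carries the 01:00Z write time; the function name `hourly_file_names` is hypothetical.

```python
from datetime import datetime, timedelta


def hourly_file_names(start, n_hours, collection="SpeciesConc"):
    """File names for n_hours hourly writes after a run start (YYYYMMDD, 00Z)."""
    t0 = datetime.strptime(start, "%Y%m%d")
    names = []
    for h in range(1, n_hours + 1):
        # Each file is named for the date/time at which it is written to disk.
        t = t0 + timedelta(hours=h)
        names.append(f"GEOSChem.{collection}.{t:%Y%m%d_%H%M}z.nc4")
    return names
```

For a run starting on 20160101, the first name produced is GEOSChem.SpeciesConc.20160101_0100z.nc4 and the 24th is GEOSChem.SpeciesConc.20160102_0000z.nc4, as in the listing.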

ktravis213 added the category: Bug Something isn't working label Apr 3, 2020

yantosca changed the title ~~[BUG/ISSUE]~~ [BUG/ISSUE] HISTORY diagnostics only saves 23 hours of data to diagnostic files (except for Budget) Apr 3, 2020

yantosca added this to the 12.8.1 milestone Apr 3, 2020

yantosca self-assigned this Apr 3, 2020

yantosca closed this as completed Apr 8, 2020

yantosca reopened this Apr 16, 2020

yantosca closed this as completed Apr 20, 2020

msulprizio mentioned this issue May 15, 2020

[BUG/ISSUE] Using 3-hourly boundary conditions for the GEOS-Chem nested simulation #248

Closed

yantosca mentioned this issue Sep 9, 2020

[BUG/ISSUE] Restart file generated for unexpected date #86

Closed

rgstevens mentioned this issue Apr 21, 2021

[BUG/ISSUE] Boundary conditions not updating in nested grid simulations #700

Closed

tscarter mentioned this issue Jun 28, 2021

Excessive fall velocity? [BUG/ISSUE] #762

Closed
