Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Obspack start+count exceeds dimension bound error #2170

Closed
alli-moon opened this issue Feb 29, 2024 · 21 comments
Closed

Obspack start+count exceeds dimension bound error #2170

alli-moon opened this issue Feb 29, 2024 · 21 comments
Assignees
Labels
category: Bug Something isn't working topic: Diagnostics Related to output diagnostic data

Comments

@alli-moon
Copy link

alli-moon commented Feb 29, 2024

Name and Institution (Required)

Name: Alli Moon
Institution: University of Washington

Confirm you have reviewed the following documentation

Description of your issue or question

Please provide as much detail as possible. Always include the GEOS-Chem version number and any relevant configuration and log files.

Hi there,
I am having issues trying to run Obspack in version 14.2.3. We previously successfully used a Python script to generate the obspack input files in version 13.4.1. We are trying to get hourly output from a specific grid box during two field campaigns, and then generate monthly averages for the whole globe during the rest of the simulation. The error occurs when Obspack is activated during the simulation in version 14.2.3. It produces the following error message when Obspack is activated 6 months into the simulation:

The error says

In Ncrd_1d_Char#2: NetCDF: Start+count exceeds dimension bound
65536    6
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Code stopped from DO_ERR_OUT (in module NcdfUtil/m_do_err_out.F90
This is an error that was encountered in one of the netCDF I/O modules, which indicates an error in writing to or reading from a net CDF file
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!".

After many attempts to debug, the script below is the current version we have for 14.2.3. We are still not able to get the simulation to run when Obspack is activated. I am just including the summer info for simplicity but we run the same loop using SAMPLING_DATES_WINTER to generate the input files for that campaign. Can you please help us figure out what we need to change for the simulation to run?
Thank you for your time,
Alli

#!/usr/bin/env python3

import datetime as dt
import numpy as np
import xarray as xr
import pandas as pd

# Configurations
#OUTPUT_DIR = '/home/disk/horton/allimoon/GCClassic_14.2.3/my_code_dir/BLEACH_prelim_tags_4x5_merra2_fullchem/generated_GC-obspack_inputs/'
DATASET_ID = 'GC-MODEL'
OUTPUT_DIR = './generated_GC-obspack_inputs/'

SAMPLING_DATES_SUMMER = pd.date_range(start='2022-05-31',end='2022-07-01', freq='d')
#SAMPLING_DATES_WINTER = pd.date_range(start='2023-01-13',end='2023-2-13', freq='d')

SITE_LAT = 32.2644
SITE_LON = -64.8767
SITE_ALT = 0 
ASSUMED_INLET_HEIGHT_ABOVE_GROUND = 30 # Unit:m

for idx_date, i_date in enumerate(SAMPLING_DATES_SUMMER):
    i_num_location = 1
    sites_info_dict = {}
    for idx_site in range(i_num_location):
        sites_info_dict[idx_site] = {}
        sites_info_dict[idx_site]['lat'] = SITE_LAT
        sites_info_dict[idx_site]['lon'] = SITE_LON
        sites_info_dict[idx_site]['alt'] = SITE_ALT + ASSUMED_INLET_HEIGHT_ABOVE_GROUND
    i_num_obs = i_num_location*24
    i_coords = {'calendar_':np.arange(6).astype('int8'),
                'string_of_200chars':np.arange(200).astype('int16'),
                'string_of_500chars':np.arange(500).astype('int16'),
                'obs':np.arange(i_num_obs),}
    i_lat_arr = np.ones(i_num_obs)*np.nan
    i_lon_arr = np.ones(i_num_obs)*np.nan
    i_alt_arr = np.ones(i_num_obs)*np.nan
    i_time_arr = np.zeros([i_num_obs]).astype('int32')
    i_samplemethod = np.ones(i_num_obs).astype('int8')*2 # Options: 2 = 1-hour avg; 4= instantaneous 
    #i_obspackid_arr = np.chararray([i_num_obs, 200]) 
    i_obspackid_arr = np.chararray([i_num_obs], itemsize=200) 
    idx_obs_tdy = 0
    i_date_str = i_date.strftime('%Y-%m-%d')
    i_date_24hour_index = pd.date_range(start=i_date_str+' 00:30',end=i_date_str+' 23:30', freq='h')
    i_date_in_dt_unit_arr = (i_date_24hour_index - dt.datetime(1970,1,1)).total_seconds()
    for i_hour in range(24):
        for idx_site in sites_info_dict.keys():  
            i_lat = sites_info_dict[idx_site]['lat']
            i_lon = sites_info_dict[idx_site]['lon']            
            i_lat_arr[idx_obs_tdy] = i_lat
            i_lon_arr[idx_obs_tdy] = i_lon
            i_alt_arr[idx_obs_tdy] = sites_info_dict[idx_site]['alt']            
            i_time_arr[idx_obs_tdy] = i_date_in_dt_unit_arr[i_hour]      
            i_obspackid_text = '{}_{:02}30_BermudaTudorHill_{}'.format(i_date.strftime('%Y-%m-%d'),i_hour,DATASET_ID)
            #i_obspackid_arr[idx_obs_tdy] = list('{:<200}'.format(i_obspackid_text))
            i_obspackid_arr[idx_obs_tdy] = '{:<200}'.format(i_obspackid_text)
            idx_obs_tdy +=1

    lat_data_var = xr.DataArray(i_lat_arr,dims=['obs'],coords={'obs':i_coords['obs']},name='latitude')
    lat_data_var.attrs['units'] = 'degrees_north'
    lat_data_var.attrs['_FillValue'] = -1.e+34
    lat_data_var.attrs['long_name'] = 'Sample latitude'
    lat_data_var.attrs['_Storage'] = 'chunked'
    lat_data_var.attrs['_ChunkSizes'] = 778
    lat_data_var.attrs['_Endianness'] = "little" ;

    lon_data_var = xr.DataArray(i_lon_arr,dims=['obs'],coords={'obs':i_coords['obs']},name='longitude')
    lon_data_var.attrs['units'] = 'degrees_east'
    lon_data_var.attrs['_FillValue'] = -1.e+34
    lon_data_var.attrs['long_name'] = 'Sample latitude'
    lon_data_var.attrs['_Storage'] = 'chunked'
    lon_data_var.attrs['_ChunkSizes'] = 778
    lon_data_var.attrs['_Endianness'] = "little" ;

    alt_data_var = xr.DataArray(i_alt_arr,dims=['obs'],coords={'obs':i_coords['obs']},name='altitude')
    alt_data_var.attrs['units'] = 'meters'
    alt_data_var.attrs['_FillValue'] = -1.e+34
    alt_data_var.attrs['long_name'] = 'sample altitude in meters above sea level'
    alt_data_var.attrs['comment'] = 'Altitude is surface elevation plus sample intake height in meters above sea level'
    alt_data_var.attrs['_Storage'] = 'chunked'
    alt_data_var.attrs['_ChunkSizes'] = 778
    alt_data_var.attrs['_Endianness'] = "little" ;

    time_data_var = xr.DataArray(i_time_arr,dims=['obs'],
                                           coords={'obs':i_coords['obs']}
                                           ,name='time')
    time_data_var.attrs['units'] = 'Seconds since 1970-01-01 00:00:00 UTC'
    time_data_var.attrs['_FillValue'] = -999999999
    time_data_var.attrs['long_name'] = 'Seconds since 1970-01-01 00:00:00 UTC'
    time_data_var.attrs['_Storage'] = 'chunked'
    time_data_var.attrs['_ChunkSizes'] = 778
    time_data_var.attrs['_DeflateLevel'] = 5
    time_data_var.attrs['_Endianness'] = "little" ;

    obspack_id_data_var = xr.DataArray(i_obspackid_arr,dims=['obs'],
                                           coords={'obs':i_coords['obs'],}
                                           ,name='obspack_id')
    obspack_id_data_var.attrs['long_name'] = "Unique ObsPack observation id"
    obspack_id_data_var.attrs['comment'] = "Unique observation id string that includes obs_id, dataset_id and obspack_num."
    obspack_id_data_var.attrs['_Storage'] = 'chunked'
    obspack_id_data_var.attrs['_ChunkSizes'] = [1,200]
    obspack_id_data_var.attrs['_DeflateLevel'] = 5

    samplemethod_data_var = xr.DataArray(i_samplemethod,dims=['obs'],coords={'obs':i_coords['obs']},name='CT_sampling_strategy')
    samplemethod_data_var.attrs['_FillValue'] = -9
    samplemethod_data_var.attrs['long_name'] = 'model sampling strategy'
    samplemethod_data_var.attrs['values'] = 'How to sample model. 1=4-hour avg; 2=1-hour avg; 3=90-min avg; 4=instantaneous'
    samplemethod_data_var.attrs['_ChunkSizes'] = 778
    samplemethod_data_var.attrs['_DeflateLevel'] = 5
    samplemethod_data_var.attrs['_Endianness'] = "little" ;

    i_config_ds = xr.merge([lat_data_var,
                            lon_data_var,
                            alt_data_var,
                            time_data_var,
                            obspack_id_data_var,
                            samplemethod_data_var])
    
    # Name the char dim name as 'string_of_200chars' to make GEOS-Chem happy. 
    # See the xarray GitHub link above for more info on .update({'char_dim_name': 'xxxxx'})
    i_config_ds.obspack_id.encoding.update({"char_dim_name": "string_of_200chars"})
    i_config_ds = i_config_ds.assign_coords(i_coords)

    i_output_filename = 'ObsPack_config_{}.nc'.format(i_date.strftime('%Y%m%d'))
    i_output_filepath = OUTPUT_DIR+i_output_filename
    
    i_config_ds.to_netcdf(i_output_filepath,unlimited_dims='obs')
Obspack_geos_config
@yantosca
Copy link
Contributor

Thanks for writing @alli-moon. Would you be able to run the following command and post the output here?

$ ncdump -cts filename.nc

where filename.nc is your ObsPack file. I wonder if there is an issue with the dimensions. You can also try opening the file in a netCDF file viewer such as ncview or panoply. If the file can't be opened then there is a problem that we will have to dig into.

Also if we can get this script to work, would you mind if we could include it in the GCPy package? That would be a useful tool for others in the GC community.

@yantosca
Copy link
Contributor

yantosca commented Feb 29, 2024

@alli-moon, I was able to run your python code locally and was able to generate some sample files. They have this format (i.e. output of ncdump -cts):

netcdf ObsPack_config_20220531 {
dimensions:
        obs = UNLIMITED ; // (24 currently)
        string_of_200chars = 200 ;
        calendar_ = 6 ;
        string_of_500chars = 500 ;
variables:
        int64 obs(obs) ;
                obs:_Storage = "chunked" ;
                obs:_ChunkSizes = 512 ;
                obs:_Endianness = "little" ;
        double latitude(obs) ;
                latitude:_FillValue = -1.e+34 ;
                latitude:units = "degrees_north" ;
                latitude:long_name = "Sample latitude" ;
                latitude:_Storage = "chunked" ;
                latitude:_ChunkSizes = 778LL ;
                latitude:_Endianness = "little" ;
                latitude:_Storage = "chunked" ;
                latitude:_ChunkSizes = 24 ;
                latitude:_Endianness = "little" ;
        double longitude(obs) ;
                longitude:_FillValue = -1.e+34 ;
                longitude:units = "degrees_east" ;
                longitude:long_name = "Sample latitude" ;
                longitude:_Storage = "chunked" ;
                longitude:_ChunkSizes = 778LL ;
                longitude:_Endianness = "little" ;
                longitude:_Storage = "chunked" ;
                longitude:_ChunkSizes = 24 ;
                longitude:_Endianness = "little" ;
        double altitude(obs) ;
                altitude:_FillValue = -1.e+34 ;
                altitude:units = "meters" ;
                altitude:long_name = "sample altitude in meters above sea level" ;
                altitude:comment = "Altitude is surface elevation plus sample intake height in meters above sea level" ;
                altitude:_Storage = "chunked" ;
                altitude:_ChunkSizes = 778LL ;
                altitude:_Endianness = "little" ;
                altitude:_Storage = "chunked" ;
                altitude:_ChunkSizes = 24 ;
                altitude:_Endianness = "little" ;
        int time(obs) ;
                time:_FillValue = -999999999 ; // "1938-04-24 22:13:21"
                time:units = "Seconds since 1970-01-01 00:00:00 UTC" ;
                time:long_name = "Seconds since 1970-01-01 00:00:00 UTC" ;
                time:_Storage = "chunked" ;
                time:_ChunkSizes = 778LL ; // "1970-01-01 00:12:58"
                time:_DeflateLevel = 5LL ; // "1970-01-01 00:00:05"
                time:_Endianness = "little" ;
                time:_Storage = "chunked" ;
                time:_ChunkSizes = 24 ;
                time:_Endianness = "little" ;
        char obspack_id(obs, string_of_200chars) ;
                obspack_id:long_name = "Unique ObsPack observation id" ;
                obspack_id:comment = "Unique observation id string that includes obs_id, dataset_id and obspack_num." ;
                obspack_id:_Storage = "chunked" ;
                obspack_id:_ChunkSizes = 1LL, 200LL ;
                obspack_id:_DeflateLevel = 5LL ;
                obspack_id:_Storage = "chunked" ;
                obspack_id:_ChunkSizes = 1, 200 ;
        byte CT_sampling_strategy(obs) ;
                CT_sampling_strategy:_FillValue = -9b ;
                CT_sampling_strategy:long_name = "model sampling strategy" ;
                CT_sampling_strategy:values = "How to sample model. 1=4-hour avg; 2=1-hour avg; 3=90-min avg; 4=instantaneous" ;
                CT_sampling_strategy:_ChunkSizes = 778LL ;
                CT_sampling_strategy:_DeflateLevel = 5LL ;
                CT_sampling_strategy:_Endianness = "little" ;
                CT_sampling_strategy:_Storage = "chunked" ;
                CT_sampling_strategy:_ChunkSizes = 24 ;
        byte calendar_(calendar_) ;
                calendar_:_Storage = "contiguous" ;
        short string_of_200chars(string_of_200chars) ;
                string_of_200chars:_Storage = "contiguous" ;
                string_of_200chars:_Endianness = "little" ;
        short string_of_500chars(string_of_500chars) ;
                string_of_500chars:_Storage = "contiguous" ;
                string_of_500chars:_Endianness = "little" ;

// global attributes:
                :units = "degrees_north" ;
                :_FillValue = -1.e+34 ;
                :long_name = "Sample latitude" ;
                :_Storage = "chunked" ;
                :_ChunkSizes = 778LL ;
                :_Endianness = "little" ;
                :_NCProperties = "version=2,netcdf=4.8.1,hdf5=1.12.1" ;
                :_SuperblockVersion = 2 ;
                :_IsNetcdf4 = 1 ;
                :_Format = "netCDF-4" ;
data:

 obs = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
    20, 21, 22, 23 ;

 calendar_ = 0, 1, 2, 3, 4, 5 ;
 
 // global attributes:
                :units = "degrees_north" ;
                :_FillValue = -1.e+34 ;
                :long_name = "Sample latitude" ;
                :_Storage = "chunked" ;
                :_ChunkSizes = 778LL ;
                :_Endianness = "little" ;
                :_NCProperties = "version=2,netcdf=4.8.1,hdf5=1.12.1" ;
                :_SuperblockVersion = 2 ;
                :_IsNetcdf4 = 1 ;
                :_Format = "netCDF-4" ;
data:

 obs = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 
    20, 21, 22, 23 ;

 calendar_ = 0, 1, 2, 3, 4, 5 ;
 
 string_of_200chars = 0, 1, 2, ... 200
 
 string_of_500chars = 0, 1, 2, ... 500

I happen to have a sample obspack_co2_1_GLOBALVIEWplus_v4.1_2018-10-29.20080417.nc file on hand. When I do ncdump -cts, the output is:

netcdf obspack_co2_1_GLOBALVIEWplus_v4.1_2018-10-29.20080417 {
dimensions:
        obs = UNLIMITED ; // (3834 currently)
        calendar_components = 6 ;
        string_of_200chars = 200 ;
variables:
        int obs(obs) ;
                obs:long_name = "obs" ;
                obs:_Storage = "chunked" ;
                obs:_ChunkSizes = 1024 ;
                obs:_Endianness = "little" ;
        int time(obs) ;
                time:units = "Seconds since 1970-01-01 00:00:00 UTC" ;
                time:_FillValue = -999999999 ; // "1938-04-24 22:13:21"
                time:long_name = "Seconds since 1970-01-01 00:00:00 UTC" ;
                time:_Storage = "chunked" ;
                time:_ChunkSizes = 959 ;
                time:_DeflateLevel = 5 ;
                time:_Endianness = "little" ;
        int model_sample_window_start(obs) ;
                model_sample_window_start:units = "Seconds since 1970-01-01 00:00:00 UTC" ;
                model_sample_window_start:_FillValue = -999999999 ; // "1938-04-24 22:13:21"
                model_sample_window_start:long_name = "Model sampling window start time" ;
                model_sample_window_start:_Storage = "chunked" ;
                model_sample_window_start:_ChunkSizes = 959 ;
                model_sample_window_start:_DeflateLevel = 5 ;
                model_sample_window_start:_Endianness = "little" ;
        int model_sample_window_end(obs) ;
                model_sample_window_end:units = "Seconds since 1970-01-01 00:00:00 UTC" ;
                model_sample_window_end:_FillValue = -999999999 ; // "1938-04-24 22:13:21"
                model_sample_window_end:long_name = "Model sampling window end time" ;
                model_sample_window_end:_Storage = "chunked" ;
                model_sample_window_end:_ChunkSizes = 959 ;
                model_sample_window_end:_DeflateLevel = 5 ;
                model_sample_window_end:_Endianness = "little" ;
        float latitude(obs) ;
                latitude:units = "degrees_north" ;
                latitude:_FillValue = -1.e+34f ;
                latitude:long_name = "Sample latitude" ;
                latitude:_Storage = "chunked" ;
                latitude:_ChunkSizes = 959 ;
                latitude:_DeflateLevel = 5 ;
                latitude:_Endianness = "little" ;
        float longitude(obs) ;
                longitude:units = "degrees_east" ;
                longitude:_FillValue = -1.e+34f ;
                longitude:long_name = "Sample longitude" ;
                longitude:_Storage = "chunked" ;
                longitude:_ChunkSizes = 959 ;
                longitude:_DeflateLevel = 5 ;
                longitude:_Endianness = "little" ;
        float altitude(obs) ;
                altitude:units = "meters" ;
                altitude:_FillValue = -1.e+34f ;
                altitude:long_name = "sample altitude in meters above sea level" ;
                altitude:comment = "Altitude is elevation plus sample intake height in meters above sea level." ;
                altitude:_Storage = "chunked" ;
                altitude:_ChunkSizes = 959 ;
                altitude:_DeflateLevel = 5 ;
                altitude:_Endianness = "little" ;
        float value(obs) ;
                value:units = "mol mol-1" ;
                value:_FillValue = -1.e+34f ;
                value:long_name = "measured mole fraction of carbon dioxide in air" ;
                value:comment = "Mole per mole of dry air." ;
                value:_Storage = "chunked" ;
                value:_ChunkSizes = 959 ;
                value:_DeflateLevel = 5 ;
                value:_Endianness = "little" ;
        double time_decimal(obs) ;
                time_decimal:units = "years" ;
                time_decimal:_FillValue = -1.e+34 ;
                time_decimal:long_name = "sample decimal year" ;
                time_decimal:comment = "Decimal time in UTC." ;
                time_decimal:_Storage = "chunked" ;
                time_decimal:_ChunkSizes = 480 ;
                time_decimal:_DeflateLevel = 5 ;
                time_decimal:_Endianness = "little" ;
        int time_components(obs, calendar_components) ;
                time_components:_FillValue = -9 ;
                time_components:long_name = "Calendar time components as integers.  Times and dates are UTC." ;
                time_components:order = "year, month, day, hour, minute, second" ;
                time_components:comment = "Calendar time components as integers.  Times and dates are UTC." ;
                time_components:_Storage = "chunked" ;
                time_components:_ChunkSizes = 1, 6 ;
                time_components:_DeflateLevel = 5 ;
                time_components:_Endianness = "little" ;
        char obspack_id(obs, string_of_200chars) ;
                obspack_id:long_name = "Unique ObsPack observation id" ;
                obspack_id:comment = "Unique observation id string that includes obs_id, dataset_id and obspack_num." ;
                obspack_id:_Storage = "chunked" ;
                obspack_id:_ChunkSizes = 1, 200 ;
                obspack_id:_DeflateLevel = 5 ;
        int obs_flag(obs) ;
                obs_flag:units = "binary" ;
                obs_flag:_FillValue = -9 ;
                obs_flag:long_name = "obspack flag" ;
                obs_flag:comment = "Determined by data provider (1: large spatial scale representation; 0: local/regional influence)" ;
                obs_flag:_Storage = "chunked" ;
                obs_flag:_ChunkSizes = 959 ;
                obs_flag:_DeflateLevel = 5 ;
                obs_flag:_Endianness = "little" ;
        int CT_sampling_strategy(obs) ;
                CT_sampling_strategy:_FillValue = -9 ;
                CT_sampling_strategy:long_name = "model sampling strategy" ;
                CT_sampling_strategy:values = "How to sample model. 1=4-hour avg; 2=1-hour avg; 3=90-min avg; 4=instantaneous" ;
                CT_sampling_strategy:_Storage = "chunked" ;
                CT_sampling_strategy:_ChunkSizes = 959 ;
                CT_sampling_strategy:_DeflateLevel = 5 ;
                CT_sampling_strategy:_Endianness = "little" ;
        float CT_MDM(obs) ;
                CT_MDM:units = "mol mol-1" ;
                CT_MDM:_FillValue = -1.e+34f ;
                CT_MDM:long_name = "CT2018 PROVISIONAL model-data mismatch, version 1" ;
                CT_MDM:warning = "Provisional CT2018 model-data-mismatch, subject to revision" ;
                CT_MDM:_Storage = "chunked" ;
                CT_MDM:_ChunkSizes = 959 ;
                CT_MDM:_DeflateLevel = 5 ;
                CT_MDM:_Endianness = "little" ;
        float CT_RMSE(obs) ;
                CT_RMSE:units = "mol mol-1" ;
                CT_RMSE:_FillValue = -1.e+34f ;
                CT_RMSE:long_name = "CT2018 PROVISIONAL RMSE, version 1" ;
                CT_RMSE:source = "RMSE from which CT2018 model-data-mismatch is generated." ;
                CT_RMSE:_Storage = "chunked" ;
                CT_RMSE:_ChunkSizes = 959 ;
                CT_RMSE:_DeflateLevel = 5 ;
                CT_RMSE:_Endianness = "little" ;
        int CT_assim(obs) ;
                CT_assim:units = "NA" ;
                CT_assim:_FillValue = -9 ;
                CT_assim:long_name = "assimilation flag (0=assim,1=no,2=withheld)" ;
                CT_assim:values = "0=assimilate; 1=do-not-assimilate; 2=assimilable but withheld for cross-validation" ;
                CT_assim:_Storage = "chunked" ;
                CT_assim:_ChunkSizes = 959 ;
                CT_assim:_DeflateLevel = 5 ;
                CT_assim:_Endianness = "little" ;
        int CT_may_reject(obs) ;
                CT_may_reject:_FillValue = -9 ;
                CT_may_reject:long_name = "may_reject EnKF flag" ;
                CT_may_reject:values = "CT internal EnKF flag, logical: 0 (FALSE)=may not reject; 1 (TRUE)=may reject" ;
                CT_may_reject:_Storage = "chunked" ;
                CT_may_reject:_ChunkSizes = 959 ;
                CT_may_reject:_DeflateLevel = 5 ;
                CT_may_reject:_Endianness = "little" ;
        int CT_may_localize(obs) ;
                CT_may_localize:_FillValue = -9 ;
                CT_may_localize:long_name = "may_localize EnKF flag" ;
                CT_may_localize:values = "CT internal EnKF flag, logical: 0 (FALSE)=may not localize; 1 (TRUE)=may localize" ;
                CT_may_localize:_Storage = "chunked" ;
                CT_may_localize:_ChunkSizes = 959 ;
                CT_may_localize:_DeflateLevel = 5 ;
                CT_may_localize:_Endianness = "little" ;
// global attributes:
                :notes = "This file contains CarbonTracker time averaged mole fraction information at select sampling locations derived from observations subset from ObsPack." ;
                :warning = "Site moves already applied. Latitude/longitude here may not be that of real site location." ;
                :disclaimer = "CarbonTracker is an open product of the NOAA Earth System Research Laboratory using data from the Global Monitoring Division greenhouse as observational network and collaborating institutions.  Model results including figures and tabular material found on the CarbonTracker website may be used for non-commercial purposes without restriction, but we request that the following acknowledgement text be included in documents or publications made using CarbonTracker results: CarbonTracker results provided by NOAA/ESRL, Boulder, Colorado, USA, http://carbontracker.noaa.gov Since we expect to continuously update the CarbonTracker product, it is important to identify which version you are using.  To provide accurate citation, please include the version of the CarbonTracker release in any use of these results. The CarbonTracker team welcomes special requests for data products not offered by default on this website, and encourages proposals for collaborative activities.  Contact us at carbontracker.team@noaa.gov." ;
                :email = "carbontracker.team@noaa.gov" ;
                :url = "http://carbontracker.noaa.gov" ;
                :institution = "NOAA Earth System Research Laboratory" ;
                :conventions = "CF-1.5" ;
                :history = "Created on Wed Oct 31 2018 03:41:19 UTC\nby script \'Time-stamp: <tfe05:/home/Andy.Jacobson/co2/input/co2_obs/obspacks/obspack_co2_1_GLOBALVIEWplus_v4.1_2018-10-29/data/andy/collate_into_dailies.r: 30 Oct 2018 (Tue) 23:39:08 UTC>\'" ;
                :output_flask_inwindow = "4" ;
                :time_window_start = "2008-04-17 00:00:00 UTC" ;
                :time_window_end = "2008-04-18 00:00:00 UTC" ;
                :_SuperblockVersion = 2 ;
                :_IsNetcdf4 = 1 ;
                :_Format = "netCDF-4" ;
data:

data:

 obs = 1, 2, 3, 4, ...
}

GEOS-Chem only needs to read in the latitude, longitude, altitude, obspack_id, time fields. I think your file has double longitude and latitude whereas GEOS-Chem is expecting to read in float. In Headers/state_diag_mod.F90, the variables that the ObsPack file variables are read into are declared as:

     !----------------------------------------------------------------------
     ! Variables for the ObsPack diagnostic
     ! NOTE: ObsPack archives point data, so don't register these
     ! as the ObsPack file format won't be COARDS-compliant!
     !----------------------------------------------------------------------

     ! ObsPack File variables
     LOGICAL                     :: Do_ObsPack
     INTEGER                     :: ObsPack_fId
     CHARACTER(LEN=1024)         :: ObsPack_InFile
     CHARACTER(LEN=1024)         :: ObsPack_OutFile

     ! ObsPack Inputs
     INTEGER                     :: ObsPack_nObs
     CHARACTER(LEN=200), POINTER :: ObsPack_Id           (:  )
     INTEGER,            POINTER :: ObsPack_nSamples     (:  )
     INTEGER,            POINTER :: ObsPack_Strategy     (:  )
     REAL(f4),           POINTER :: ObsPack_Latitude     (:  )    
     REAL(f4),           POINTER :: ObsPack_Longitude    (:  ) 
     REAL(f4),           POINTER :: ObsPack_Altitude     (:  )

You may also have a similar issue the obs dimension, which in your file is int64 (i.e. INTEGER*8) but GEOS_Chem assumes and int32 aka INTEGER.

@yantosca yantosca self-assigned this Feb 29, 2024
@yantosca yantosca added category: Bug Something isn't working topic: Diagnostics Related to output diagnostic data labels Feb 29, 2024
@alli-moon
Copy link
Author

Hi @yantosca,
Thank you for your quick response. I got the script from Yuk Chun Chan (@yc-chan) and he has agreed to have it included in GCPy once we get it working.

We added the following lines to the script.

i_coords = {'calendar_':np.arange(6).astype('int8'), 'string_of_200chars':np.arange(200).astype('int16'),
'string_of_500chars':np.arange(500).astype('int16'), 'obs':np.arange(i_num_obs).astype('int32'),}
i_lat_arr = (np.ones(i_num_obs)*np.nan).astype('float32')
i_lon_arr = (np.ones(i_num_obs)*np.nan).astype('float32')
i_alt_arr = (np.ones(i_num_obs)*np.nan).astype('float32')
i_time_arr = np.zeros([i_num_obs]).astype('int32')
i_samplemethod = np.ones(i_num_obs).astype('int8')*2 # Options: 2 = 1-hour avg; 4= instantaneous

Here is one of the resulting netCDF files:

netcdf ObsPack_config_20220531 {
dimensions:
obs = UNLIMITED ; // (24 currently)
string_of_200chars = 200 ;
calendar_ = 6 ;
string_of_500chars = 500 ;
variables:
int obs(obs) ;
float latitude(obs) ;
latitude:_FillValue = -1.e+34f ;
latitude:units = "degrees_north" ;
latitude:long_name = "Sample latitude" ;
latitude:_Storage = "chunked" ;
latitude:_ChunkSizes = 778LL ;
latitude:_Endianness = "little" ;
float longitude(obs) ;
longitude:_FillValue = -1.e+34f ;
longitude:units = "degrees_east" ;
longitude:long_name = "Sample latitude" ;
longitude:_Storage = "chunked" ;
longitude:_ChunkSizes = 778LL ;
longitude:_Endianness = "little" ;
float altitude(obs) ;
altitude:_FillValue = -1.e+34f ;
altitude:units = "meters" ;
altitude:long_name = "sample altitude in meters above sea level" ;
altitude:comment = "Altitude is surface elevation plus sample intake height in meters above sea level" ;
altitude:_Storage = "chunked" ;
altitude:_ChunkSizes = 778LL ;
altitude:_Endianness = "little" ;
int time(obs) ;
time:_FillValue = -999999999 ;
time:units = "Seconds since 1970-01-01 00:00:00 UTC" ;
time:long_name = "Seconds since 1970-01-01 00:00:00 UTC" ;
time:_Storage = "chunked" ;
time:_ChunkSizes = 778LL ;
time:_DeflateLevel = 5LL ;
time:_Endianness = "little" ;
char obspack_id(obs, string_of_200chars) ;
obspack_id:long_name = "Unique ObsPack observation id" ;
obspack_id:comment = "Unique observation id string that includes obs_id, dataset_id and obspack_num." ;
obspack_id:_Storage = "chunked" ;
obspack_id:_ChunkSizes = 1LL, 200LL ;
obspack_id:_DeflateLevel = 5LL ;
byte CT_sampling_strategy(obs) ;
CT_sampling_strategy:_FillValue = -9b ;
CT_sampling_strategy:long_name = "model sampling strategy" ;
CT_sampling_strategy:values = "How to sample model. 1=4-hour avg; 2=1-hour avg; 3=90-min avg; 4=instantaneous" ;
CT_sampling_strategy:_ChunkSizes = 778LL ;
CT_sampling_strategy:DeflateLevel = 5LL ;
CT_sampling_strategy:Endianness = "little" ;
byte calendar
(calendar
) ;
short string_of_200chars(string_of_200chars) ;
short string_of_500chars(string_of_500chars) ;
// global attributes:
:units = "degrees_north" ;
:_FillValue = -1.e+34 ;
:long_name = "Sample latitude" ;
:_Storage = "chunked" ;
:_ChunkSizes = 778LL ;
:_Endianness = "little" ;`

We still get the same error when the simulation begins time stepping on May 31st. Do you have any other recommendations?
Thanks!
Alli

@yantosca
Copy link
Contributor

yantosca commented Mar 1, 2024

Thanks for your patience @alli-moon. I am able to replicate this issue on my end. Stay tuned for advice.

@yantosca
Copy link
Contributor

yantosca commented Mar 1, 2024

Hi @alli-moon. The issue is in the obspack_id variable. Although the python script seems to be creating the variable properly, there seems to be an issue reading it into GEOS-Chem. I haven't figured it out yet. You may want to dig into this.

@yantosca
Copy link
Contributor

yantosca commented Mar 1, 2024

It could be that python is not saving out 200 characters per string and that is messing up the count.

@yantosca
Copy link
Contributor

yantosca commented Mar 4, 2024

Hi @alli-moon. I found a couple more things. First, the obspack_mod.F90 module was still using some references to the netCDF-F77 interface instead of the netCDF-F90 interface. I'll need to push a bugfix branch to get the fixes into 14.3.1

Secondly, the netCDF assumes that you are reading a string as an array of bytes. I was able to test this with:

 CHARACTER(LEN=1)     :: test(200,24)
  ...
 
    !----------------------------
    ! Read ID string
    !----------------------------
    st2d = (/ 1,            1    /)
    ct2d = (/ CHAR_LEN_OBS, nObs /)

    varName = 'obspack_id'
    !CALL NcRd( State_Diag%ObsPack_Id, fId, TRIM(varName), st2d, ct2d )
    CALL NcRd( test, fId, TRIM(varName), st2d, ct2d )
    print*, '### read obspack_id'
    print*, test(:,1)
    stop

which prints out:

 2022-05-31_0030_BermudaTudorHill_GC-MODEL

I think the solution is to read the obspack_id variable into a temporary array and then pack the string into State_Diag%ObsPack_id. If you want you can try to make this fix in your code but I will try to push a bugfix branch with this ASAP.

@yantosca yantosca added this to the 14.3.1 milestone Mar 5, 2024
yantosca added a commit that referenced this issue Mar 5, 2024
This merge brings PR #2188 (ObsPack bug fixes: (1) Use netCDF-F90 library
routines (2) Read obspack_id variable into a character array, by @yantosca)
into the GEOS-Chem "no-diff-to-benchmark" development stream.

This PR fixes a number of issues in ObsPack I/O as originally described
in issue #2170.

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
@yantosca
Copy link
Contributor

yantosca commented Mar 5, 2024

@alli-moon: I have pushed fixes for this issue into the dev/no-diff-to-benchmark branch. This will be merged into GEOS-Chem 14.3.1. I will close out this issue now as resolved.

@yantosca yantosca closed this as completed Mar 5, 2024
@alli-moon
Copy link
Author

Hi @yantosca ,
Thank you for getting the patch together so quickly! I believe I applied the patch but I am still getting the same Start+count error. Did you need to modify the Obspack input files at all in addition to making the patch? Do you have the revised Obspack script that I can try to make sure that isn't the issue?
Thanks!
Alli

@yantosca
Copy link
Contributor

yantosca commented Mar 8, 2024

Hi @alli-moon, yes, I did have to make a modification to the script. Here it is!

create_obspack_coords.py.txt.txt

@alli-moon
Copy link
Author

Hi @yantosca,
Thank you for your continued assistance to get Obspack up and running. I added the patch and used your script to generate my ObsPack inputs and I still have the same start+count error.

I am attaching my geos_chem_config, Obspack_config, and the script to generate the input files. I am also including the obspack_mod.F90 in case the patch did not get installed correctly.

Could you please take a look to see if you can figure out the issue / reproduce the error with my files?

Thanks!
Alli

Bermuda_Generate_ObsPack_bugfix.txt
geoschem_config.yml.txt
ObsPack_config_20220531.nc.txt
obspack_mod.F90.txt

@yantosca yantosca reopened this Mar 14, 2024
@yantosca
Copy link
Contributor

Thanks @alli-moon. I've reopened this issue for now. I used your netCDF file and was able to reproduce your error. Will dig a bit further.

@yantosca
Copy link
Contributor

yantosca commented Mar 14, 2024

Hi again @alli-moon. I think I see the difference. In your Python code you have:

        i_config_ds = i_config_ds.assign_coords(i_coords)
        i_config_ds.obspack_id.encoding.update({"char_dim_name": "string_of_200chars"})

which produces a variable with these dimensions:

        char obspack_id(obs, string_of_200chars) ;
                obspack_id:long_name = "Unique ObsPack observation id" ;
                obspack_id:comment = "Unique observation id string that includes obs_id, dataset_id and obspack_num." ;
                obspack_id:_Storage = "chunked" ;
                obspack_id:_ChunkSizes = 1, 200 ;

In my python code I have

        # Name the char dim name as 'string_of_200chars' to make GEOS-Chem happy.
        # See the xarray GitHub link above for more info on
        # .update({'char_dim_name': 'xxxxx'})
        #i_config_ds.obspack_id.encoding.update({
        #    "char_dim_name": "string_of_200chars"
        #})
        i_config_ds = i_config_ds.assign_coords(i_coords)

so I get

        char obspack_id(obs, string200) ;
                obspack_id:long_name = "Unique ObsPack observation id" ;
                obspack_id:comment = "Unique observation id string that includes obs_id, dataset_id and obspack_num." ;
                obspack_id:_Storage = "chunked" ;
                obspack_id:_ChunkSizes = 1, 200 ;
        byte CT_sampling_strategy(obs) ;
                CT_sampling_strategy:_FillValue = -9b ;
                CT_sampling_strategy:long_name = "model sampling strategy" ;
                CT_sampling_strategy:values = "How to sample model. 1=4-hour avg; 2=1-hour avg; 3=90-min avg; 4=instantaneous" ;

I'm not 100% sure why, but letting Python set the dimension for the string width seems to work, but if you try to declare that manually, it doesn't.

@yantosca
Copy link
Contributor

yantosca commented Mar 14, 2024

@alli-moon, maybe Python (or xarray) has some special handling baked in for this, I don't know...

@alli-moon
Copy link
Author

alli-moon commented Mar 14, 2024

Hi @yantosca ,
Thanks for reopening! I tried it initially without
i_config_ds.obspack_id.encoding.update({"char_dim_name": "string_of_200chars"}).
I get the start/ count error either way. Here is the generated ObsPack file when I comment that line out. I'll also attatch the script as a sanity check. Can you see any issues with it now?
Thanks!
Alli
ObsPack_config_20220531.nc.txt
Bermuda_Generate_ObsPack_bugfix.ipynb.txt

`

@yantosca
Copy link
Contributor

Hi @alli-moon, thanks for your patience. I took your most recent Obspack_config_20220531.nc file and was able to read it successfully with the updates in these modules:

You can diff these files w/ yours and add the changes to your code.

Here is the GEOS-Chem log file from my simulation (I used the default species CO, NO, O3).

And here is the netCDF dump of the file generated by GEOS-Chem

netcdf GEOSChem.ObsPack.20220531_0000z {
dimensions:
	obs = UNLIMITED ; // (24 currently)
	species = 3 ;
	char_len_obs = 200 ;
variables:
	char obspack_id(obs, char_len_obs) ;
		obspack_id:long_name = "obspack_id" ;
		obspack_id:units = "1" ;
		obspack_id:_Storage = "chunked" ;
		obspack_id:_ChunkSizes = 1, 200 ;
		obspack_id:_NoFill = "true" ;
	int nsamples(obs) ;
		nsamples:long_name = "no. of model samples" ;
		nsamples:units = "1" ;
		nsamples:comment = "Number of discrete model samples in average" ;
		nsamples:_Storage = "chunked" ;
		nsamples:_ChunkSizes = 1024 ;
		nsamples:_Endianness = "little" ;
		nsamples:_NoFill = "true" ;
	int averaging_interval(obs) ;
		averaging_interval:long_name = "Amount of model time over which this observation is averaged" ;
		averaging_interval:units = "seconds" ;
		averaging_interval:_Storage = "chunked" ;
		averaging_interval:_ChunkSizes = 1024 ;
		averaging_interval:_Endianness = "little" ;
		averaging_interval:_NoFill = "true" ;
	int averaging_interval_start(obs) ;
		averaging_interval_start:long_name = "Start of averaging interval" ;
		averaging_interval_start:units = "seconds since 1970-01-01 00:00:00 UTC" ;
		averaging_interval_start:calendar = "standard" ;
		averaging_interval_start:_Storage = "chunked" ;
		averaging_interval_start:_ChunkSizes = 1024 ;
		averaging_interval_start:_Endianness = "little" ;
		averaging_interval_start:_NoFill = "true" ;
	int averaging_interval_end(obs) ;
		averaging_interval_end:long_name = "End of averaging interval" ;
		averaging_interval_end:units = "seconds since 1970-01-01 00:00:00 UTC" ;
		averaging_interval_end:calendar = "standard" ;
		averaging_interval_end:_Storage = "chunked" ;
		averaging_interval_end:_ChunkSizes = 1024 ;
		averaging_interval_end:_Endianness = "little" ;
		averaging_interval_end:_NoFill = "true" ;
	float lon(obs) ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
		lon:_Storage = "chunked" ;
		lon:_ChunkSizes = 1024 ;
		lon:_Endianness = "little" ;
		lon:_NoFill = "true" ;
	float lat(obs) ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
		lat:_Storage = "chunked" ;
		lat:_ChunkSizes = 1024 ;
		lat:_Endianness = "little" ;
		lat:_NoFill = "true" ;
	float u(obs) ;
		u:long_name = "Zonal component of wind" ;
		u:units = "m s^-1" ;
		u:_Storage = "chunked" ;
		u:_ChunkSizes = 1024 ;
		u:_Endianness = "little" ;
		u:_NoFill = "true" ;
	float v(obs) ;
		v:long_name = "Meridional component of wind" ;
		v:units = "m s^-1" ;
		v:_Storage = "chunked" ;
		v:_ChunkSizes = 1024 ;
		v:_Endianness = "little" ;
		v:_NoFill = "true" ;
	float blh(obs) ;
		blh:long_name = "Boundary layer height" ;
		blh:units = "m" ;
		blh:_Storage = "chunked" ;
		blh:_ChunkSizes = 1024 ;
		blh:_Endianness = "little" ;
		blh:_NoFill = "true" ;
	float q(obs) ;
		q:long_name = "mass_fraction_of_water_inair" ;
		q:units = "g water (kg air)^-1" ;
		q:_Storage = "chunked" ;
		q:_ChunkSizes = 1024 ;
		q:_Endianness = "little" ;
		q:_NoFill = "true" ;
	float pressure(obs) ;
		pressure:long_name = "pressure" ;
		pressure:units = "hPa" ;
		pressure:_Storage = "chunked" ;
		pressure:_ChunkSizes = 1024 ;
		pressure:_Endianness = "little" ;
		pressure:_NoFill = "true" ;
	float temperature(obs) ;
		temperature:long_name = "temperature" ;
		temperature:units = "K" ;
		temperature:_Storage = "chunked" ;
		temperature:_ChunkSizes = 1024 ;
		temperature:_Endianness = "little" ;
		temperature:_NoFill = "true" ;
	float CO(obs) ;
		CO:long_name = "Carbon monoxide" ;
		CO:units = "mol mol-1" ;
		CO:_FillValue = -1.e+34f ;
		CO:_Storage = "chunked" ;
		CO:_ChunkSizes = 1024 ;
		CO:_Endianness = "little" ;
		CO:_NoFill = "true" ;
	float NO(obs) ;
		NO:long_name = "Nitrogen oxide" ;
		NO:units = "mol mol-1" ;
		NO:_FillValue = -1.e+34f ;
		NO:_Storage = "chunked" ;
		NO:_ChunkSizes = 1024 ;
		NO:_Endianness = "little" ;
		NO:_NoFill = "true" ;
	float O3(obs) ;
		O3:long_name = "Ozone" ;
		O3:units = "mol mol-1" ;
		O3:_FillValue = -1.e+34f ;
		O3:_Storage = "chunked" ;
		O3:_ChunkSizes = 1024 ;
		O3:_Endianness = "little" ;
		O3:_NoFill = "true" ;

// global attributes:
		:history = "GEOS-Chem simulation at 2024/03/18 17:52" ;
		:conventions = "CF-1.4" ;
		:references = "www.geos-chem.org; wiki.geos-chem.org" ;
		:model_start_date = "2022/05/31 00:00:00 UTC" ;
		:model_end_date = "2022/06/01 00:00:00 UTC" ;
		:_NCProperties = "version=2,netcdf=4.8.0,hdf5=1.10.7" ;
		:_SuperblockVersion = 2 ;
		:_IsNetcdf4 = 0 ;
		:_Format = "netCDF-4 classic model" ;
data:
}

@alli-moon
Copy link
Author

Hi @yantosca ,
I updated my Obspack and state_diag files to match yours and still have the start+count error. The model runs fine when I turn Obspack off. Are there any other modules I need to update?
Thanks!
Alli

@yantosca
Copy link
Contributor

Hi @alli-moon! Yes, I believe I've forgotten Headers/charpak_mod.F90. That has a couple of functions that converts a character array to a string and back.

@msulprizio msulprizio removed this from the 14.3.1 milestone Apr 2, 2024
@yantosca
Copy link
Contributor

yantosca commented Apr 9, 2024

Hi @alli-moon! Were you able to resolve this issue? I wasn't able to reproduce your error with the obspack file that I produced (and with the modification in charpak_mod.F90).

@alli-moon
Copy link
Author

Hi @yantosca,
Thanks for following up and all of your obspack help. I was not able to get it to run without the start+count error. My current solution is to run hourly simulations for each campaign and slice output for the gridbox and level I need. This may actually be beneficial since the additional diagnostics might be helpful for our project (though storage space is an issue). I am happy to try other fixes you may have, especially if it turns out that other people have this obspack problem. Thank you again for your help!

Copy link
Contributor

yantosca commented Apr 9, 2024

Thanks @alli-moon. Sorry it didn't fix your issue. I'll close this issue for now.

You can also look into subsetting the History diagnostics with LON_RANGE and LAT_RANGE. For example, if you don't need the southern hemisphere you can cut all that out and only save the northern hemisphere.

@yantosca yantosca closed this as completed Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: Bug Something isn't working topic: Diagnostics Related to output diagnostic data
Projects
None yet
3 participants