Problems when array of coordinate bounds is 2D #667

Closed
spencerahill opened this issue Nov 25, 2015 · 4 comments

@spencerahill
Contributor

Most of the netCDF data I work with stores, in addition to the coordinates themselves, the bounds of each coordinate value. Often these bounds are stored as arrays with shape Nx2, where N is the number of points for that coordinate. For example:

$ ncdump -c /archive/Spencer.Hill/am3/am3clim_hurrell/gfdl.ncrc2-intel-prod-openmp/pp/atmos/ts/monthly/1yr/atmos.201001-201012.t_surf.nc
netcdf atmos.201001-201012.t_surf {
dimensions:
    time = UNLIMITED ; // (12 currently)
    lat = 90 ;
    bnds = 2 ;
    lon = 144 ;
variables:
    double average_DT(time) ;
        average_DT:long_name = "Length of average period" ;
        average_DT:units = "days" ;
        average_DT:missing_value = 1.e+20 ;
        average_DT:_FillValue = 1.e+20 ;
    double average_T1(time) ;
        average_T1:long_name = "Start time for average period" ;
        average_T1:units = "days since 1980-01-01 00:00:00" ;
        average_T1:missing_value = 1.e+20 ;
        average_T1:_FillValue = 1.e+20 ;
    double average_T2(time) ;
        average_T2:long_name = "End time for average period" ;
        average_T2:units = "days since 1980-01-01 00:00:00" ;
        average_T2:missing_value = 1.e+20 ;
        average_T2:_FillValue = 1.e+20 ;
    double lat(lat) ;
        lat:long_name = "latitude" ;
        lat:units = "degrees_N" ;
        lat:cartesian_axis = "Y" ;
        lat:bounds = "lat_bnds" ;
    double lat_bnds(lat, bnds) ;
        lat_bnds:long_name = "latitude bounds" ;
        lat_bnds:units = "degrees_N" ;
        lat_bnds:cartesian_axis = "Y" ;
    double lon(lon) ;
        lon:long_name = "longitude" ;
        lon:units = "degrees_E" ;
        lon:cartesian_axis = "X" ;
        lon:bounds = "lon_bnds" ;
    double lon_bnds(lon, bnds) ;
        lon_bnds:long_name = "longitude bounds" ;
        lon_bnds:units = "degrees_E" ;
        lon_bnds:cartesian_axis = "X" ;
    float t_surf(time, lat, lon) ;
        t_surf:long_name = "surface temperature" ;
        t_surf:units = "deg_k" ;
        t_surf:valid_range = 100.f, 400.f ;
        t_surf:missing_value = 1.e+20f ;
        t_surf:_FillValue = 1.e+20f ;
        t_surf:cell_methods = "time: mean" ;
        t_surf:time_avg_info = "average_T1,average_T2,average_DT" ;
        t_surf:interp_method = "conserve_order2" ;
    double time(time) ;
        time:long_name = "time" ;
        time:units = "days since 1980-01-01 00:00:00" ;
        time:cartesian_axis = "T" ;
        time:calendar_type = "JULIAN" ;
        time:calendar = "JULIAN" ;
        time:bounds = "time_bounds" ;
    double time_bounds(time, bnds) ;
        time_bounds:long_name = "time axis boundaries" ;
        time_bounds:units = "days" ;
        time_bounds:missing_value = 1.e+20 ;
        time_bounds:_FillValue = 1.e+20 ;

// global attributes:
        :filename = "atmos.201001-201012.t_surf.nc" ;
        :title = "am3clim_hurrell" ;
        :grid_type = "mosaic" ;
        :grid_tile = "1" ;
        :comment = "pressure level interpolator, version 3.0, precision=double" ;
        :history = "fregrid --input_mosaic atmos_mosaic.nc --input_file 20100101.atmos_month --interp_method conserve_order2 --remap_file .fregrid_remap_file_144_by_90 --nlon 144 --nlat 90 --scalar_field (**please see the field list in this file**)" ;
        :code_version = "$Name: fre-nctools-bronx-7 $" ;
data:

 lat = -89, -87, -85, -83, -81, -79, -77, -75, -73, -71, -69, -67, -65, -63,
    -61, -59, -57, -55, -53, -51, -49, -47, -45, -43, -41, -39, -37, -35,
    -33, -31, -29, -27, -25, -23, -21, -19, -17, -15, -13, -11, -9, -7, -5,
    -3, -1, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33,
    35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69,
    71, 73, 75, 77, 79, 81, 83, 85, 87, 89 ;

 lon = 1.25, 3.75, 6.25, 8.75, 11.25, 13.75, 16.25, 18.75, 21.25, 23.75,
    26.25, 28.75, 31.25, 33.75, 36.25, 38.75, 41.25, 43.75, 46.25, 48.75,
    51.25, 53.75, 56.25, 58.75, 61.25, 63.75, 66.25, 68.75, 71.25, 73.75,
    76.25, 78.75, 81.25, 83.75, 86.25, 88.75, 91.25, 93.75, 96.25, 98.75,
    101.25, 103.75, 106.25, 108.75, 111.25, 113.75, 116.25, 118.75, 121.25,
    123.75, 126.25, 128.75, 131.25, 133.75, 136.25, 138.75, 141.25, 143.75,
    146.25, 148.75, 151.25, 153.75, 156.25, 158.75, 161.25, 163.75, 166.25,
    168.75, 171.25, 173.75, 176.25, 178.75, 181.25, 183.75, 186.25, 188.75,
    191.25, 193.75, 196.25, 198.75, 201.25, 203.75, 206.25, 208.75, 211.25,
    213.75, 216.25, 218.75, 221.25, 223.75, 226.25, 228.75, 231.25, 233.75,
    236.25, 238.75, 241.25, 243.75, 246.25, 248.75, 251.25, 253.75, 256.25,
    258.75, 261.25, 263.75, 266.25, 268.75, 271.25, 273.75, 276.25, 278.75,
    281.25, 283.75, 286.25, 288.75, 291.25, 293.75, 296.25, 298.75, 301.25,
    303.75, 306.25, 308.75, 311.25, 313.75, 316.25, 318.75, 321.25, 323.75,
    326.25, 328.75, 331.25, 333.75, 336.25, 338.75, 341.25, 343.75, 346.25,
    348.75, 351.25, 353.75, 356.25, 358.75 ;

 time = 10973.5, 11003, 11032.5, 11063, 11093.5, 11124, 11154.5, 11185.5,
    11216, 11246.5, 11277, 11307.5 ;
}
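
To make the N×2 layout concrete, here is a minimal sketch that rebuilds bounds arrays of the same shape from the cell centres listed in the dump. It assumes uniformly spaced cells, and `make_bounds` is a hypothetical helper written for illustration, not something xray or the netCDF tools provide; the real `lat_bnds` and `lon_bnds` values in the file may differ.

```python
import numpy as np

# Cell centres matching the ncdump above: 90 latitudes spaced 2 degrees apart
# and 144 longitudes spaced 2.5 degrees apart.
lat = np.arange(-89.0, 90.0, 2.0)
lon = 1.25 + 2.5 * np.arange(144)


def make_bounds(centers):
    """Hypothetical helper: return an (N, 2) array of [lower, upper] edges.

    Assumes the centres are uniformly spaced, which holds for this grid.
    """
    half_width = 0.5 * (centers[1] - centers[0])
    return np.column_stack([centers - half_width, centers + half_width])


lat_bnds = make_bounds(lat)  # one row per latitude, two columns (the "bnds" dimension)
lon_bnds = make_bounds(lon)

print(lat_bnds.shape, lon_bnds.shape)
print(lat_bnds[:2])
```

Each row holds the lower and upper edge of one cell, which is why the bounds variables carry the extra `bnds = 2` dimension.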

These 2-D bounding arrays lead to the "Buffer has wrong number of dimensions" error in #665. In the case of #665, only the time coordinate has this 2-D bounds array; here other coordinates (namely lat and lon) have it as well.

Conceptually, these bounds arrays represent coordinates, but when the file is read in as a Dataset, they become data variables rather than coordinates. Perhaps this is part of the problem?

@shoyer
Member

shoyer commented Nov 25, 2015

Is it possible for you to share a netcdf file that reproduces the issue? Based on the ncdump result, I think xray should handle it fine...

@jhamman
Member

jhamman commented Nov 25, 2015

I think ultimately, these are going to end up as Variables, not Coordinates. The CF convention refers to them as "Boundary Variables", and although they are essentially metadata for the Coordinates, I don't think that sort of complexity makes sense for xray right now.

It shouldn't be too hard to fix the 2-d bounds problem though.
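
For anyone who does want the bounds treated as coordinates on the user side, one option is to promote them after loading. The sketch below uses a small synthetic dataset rather than the file from this issue, and it imports the package under its later name, xarray (`xr`); treat it as an illustration under those assumptions, not as a statement of how the library handles boundary variables internally.

```python
import numpy as np
import xarray as xr  # xray was later renamed to xarray

# Synthetic stand-in for the latitude axis and its (lat, bnds) bounds array.
lat = np.arange(-89.0, 90.0, 2.0)
lat_bnds = np.column_stack([lat - 1.0, lat + 1.0])

# As in the issue, the bounds start out as an ordinary data variable.
ds = xr.Dataset(
    {"lat_bnds": (("lat", "bnds"), lat_bnds)},
    coords={"lat": lat},
)

# set_coords moves the named variable from data variables to coordinates.
promoted = ds.set_coords("lat_bnds")

print("lat_bnds" in ds.data_vars)
print("lat_bnds" in promoted.coords)
```

The promoted variable keeps its `(lat, bnds)` dimensions; only its classification within the Dataset changes.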

@spencerkclark
Member

I think the reason one gets the "Buffer has wrong number of dimensions" error here is still because of the presence of the time_bounds variable. If we drop time_bounds upon reading the file in, I think things work OK.

In [2]: xray.open_dataset('/archive/Spencer.Hill/am3/am3clim_hurrell/gfdl.ncrc2-intel-prod-openmp/pp/atmos/ts/monthly/1yr/atmos.201001-201012.t_surf.nc', drop_variables='time_bounds')
Out[2]:
<xray.Dataset>
Dimensions:     (bnds: 2, lat: 90, lon: 144, time: 12)
Coordinates:
  * lat         (lat) float64 -89.0 -87.0 -85.0 -83.0 -81.0 -79.0 -77.0 ...
  * lon         (lon) float64 1.25 3.75 6.25 8.75 11.25 13.75 16.25 18.75 ...
  * time        (time) datetime64[ns] 2010-01-16T12:00:00 2010-02-15 ...
  * bnds        (bnds) int64 0 1
Data variables:
    average_DT  (time) timedelta64[ns] 31 days 28 days 31 days 30 days ...
    average_T1  (time) datetime64[ns] 2010-01-01 2010-02-01 2010-03-01 ...
    average_T2  (time) datetime64[ns] 2010-02-01 2010-03-01 2010-04-01 ...
    lat_bnds    (lat, bnds) float64 -90.0 -88.0 -88.0 -86.0 -86.0 -84.0 ...
    lon_bnds    (lon, bnds) float64 0.0 2.5 2.5 5.0 5.0 7.5 7.5 10.0 10.0 ...
    t_surf      (time, lat, lon) float64 245.9 245.9 245.8 245.7 245.7 245.6 ...
Attributes:
    filename: atmos.201001-201012.t_surf.nc
    title: am3clim_hurrell
    grid_type: mosaic
    grid_tile: 1
    comment: pressure level interpolator, version 3.0, precision=double
    history: fregrid --input_mosaic atmos_mosaic.nc --input_file 20100101.atmos_month --interp_method conserve_order2 --remap_file .fregrid_remap_file_144_by_90 --nlon 144 --nlat 90 --scalar_field (**please see the field list in this file**)
    code_version: $Name: fre-nctools-bronx-7 $

@spencerahill
Contributor Author

Sorry, @spencerkclark is right: the ValueError we hit was due solely to the 2-D time bounds array. For example (the netCDF file used below is also available at ftp://ftp.gfdl.noaa.gov/pub/s1h/atmos.201001-201012.t_surf.nc):

In [1]: ds = xray.open_dataset('/archive/Spencer.Hill/am3/am3clim_hurrell/gfdl.ncrc2-intel-prod-openmp/pp/atmos/ts/monthly/1yr/atmos.201001-201012.t_surf.nc')

In [2]: print(ds)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-4d24098ddece> in <module>()
----> 1 print(ds)

/home/s1h/anaconda/lib/python2.7/site-packages/xray/core/dataset.pyc in __repr__(self)
    885
    886     def __repr__(self):
--> 887         return formatting.dataset_repr(self)
    888
    889     @property

...

/home/s1h/anaconda/lib/python2.7/site-packages/pandas/tseries/timedeltas.pyc in _convert_listlike(arg, box, unit, name)
     47             value = arg.astype('timedelta64[{0}]'.format(unit)).astype('timedelta64[ns]', copy=False)
     48         else:
---> 49             value = tslib.array_to_timedelta64(_ensure_object(arg), unit=unit, errors=errors)
     50             value = value.astype('timedelta64[ns]', copy=False)
     51

pandas/tslib.pyx in pandas.tslib.array_to_timedelta64 (pandas/tslib.c:47046)()

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

In [3]: ds2 = ds.drop('time_bounds')

In [4]: print(ds2)
<xray.Dataset>
Dimensions:     (bnds: 2, lat: 90, lon: 144, time: 12)
Coordinates:
  * lat         (lat) float64 -89.0 -87.0 -85.0 -83.0 -81.0 -79.0 -77.0 ...
  * lon         (lon) float64 1.25 3.75 6.25 8.75 11.25 13.75 16.25 18.75 ...
  * time        (time) datetime64[ns] 2010-01-16T12:00:00 2010-02-15 ...
  * bnds        (bnds) int64 0 1
Data variables:
    average_DT  (time) timedelta64[ns] 31 days 28 days 31 days 30 days ...
    average_T1  (time) datetime64[ns] 2010-01-01 2010-02-01 2010-03-01 ...
    average_T2  (time) datetime64[ns] 2010-02-01 2010-03-01 2010-04-01 ...
    lat_bnds    (lat, bnds) float64 -90.0 -88.0 -88.0 -86.0 -86.0 -84.0 ...
    lon_bnds    (lon, bnds) float64 0.0 2.5 2.5 5.0 5.0 7.5 7.5 10.0 10.0 ...
    t_surf      (time, lat, lon) float64 245.9 245.9 245.8 245.7 245.7 245.6 ...
...

The errors I had in mind relating to the lat and lon bounds were ultimately due to errors in my own code. My mistakes appear to be the unifying theme here! Sorry for the confusion.
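
The traceback above ends in pandas' timedelta conversion, which expects a 1-D array. `time_bounds` has units of plain "days", so it is apparently decoded as a timedelta, and its `(time, bnds)` shape then triggers the error. One way to sidestep that, sketched here with synthetic values, is to flatten the array before conversion and restore the shape afterwards. `to_timedelta_nd` is a hypothetical helper, and this is not necessarily how the problem was fixed in xray.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for time_bounds: two monthly cells, values in days.
time_bounds = np.array([[10958.0, 10989.0],
                        [10989.0, 11017.0]])


def to_timedelta_nd(values, unit="D"):
    """Hypothetical helper: convert an N-D numeric array to timedelta64.

    pandas.to_timedelta is given a flattened 1-D view of the data, and the
    original shape is restored on the converted result.
    """
    flat = pd.to_timedelta(values.ravel(), unit=unit)
    return flat.to_numpy().reshape(values.shape)


decoded = to_timedelta_nd(time_bounds)
print(decoded.shape)
print(decoded[:, 1] - decoded[:, 0])  # length of each averaging period
```

Until such handling is in place, dropping `time_bounds` as shown earlier in the thread remains the simpler workaround.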
