'time.units' being dropped if differs across files being loaded #212
@spencerahill might you be able to address this with a […]?
@spencerkclark, thanks, yes, I sort of forgot about that. But once I get that working I'd like to adapt the logic as necessary to create a permanent internal fix. If this particular CMIP run is processed like this, it is likely others are too (although none of the other 19 runs that I'm working with are).
@spencerkclark here's what I came up with as the workaround:

```python
def _ensure_matching_time_units(ds, **kwargs):
    """Force all datasets to have matching time units attribute."""
    target_units_yr = 1979  # arbitrary
    time = ds[TIME_STR]
    units = time.attrs['units']
    units_yr = int(units.split(' since ')[1].split('-')[0])
    # The logic below would likely fail for many edge cases, so make sure
    # the data adheres precisely to my expectations before attempting this
    # workaround.
    do_shift_year = (units.startswith('days since') and
                     units.endswith('-01-01 00:00:00') and
                     time.attrs['calendar'] == 'noleap' and
                     units_yr != target_units_yr)
    if do_shift_year:
        days_diff = 365. * (units_yr - target_units_yr)
        for var_str in [TIME_STR, TIME_BOUNDS_STR, 'time_bnds']:
            if var_str in ds:
                var = ds[var_str]
                var += days_diff
                var.attrs['units'] = units.replace(str(units_yr),
                                                   str(target_units_yr))
    return ds
```

There are some immediate fixes that could account for some of the edge cases. But before digging any deeper I'm wondering if you have any comments. This has addressed my immediate need already, so no rush; I know you have a lot on your plate right now.
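For context, a per-file function like this can be applied before concatenation via xarray's `preprocess` hook on `open_mfdataset`. A minimal usage sketch (illustrative, not from the thread; the file names are placeholders):

```python
import xarray as xr

# Hypothetical usage: apply the per-file fix before open_mfdataset
# concatenates along time.  decode_times=False keeps the raw numeric time
# values and their 'units' attrs intact for _ensure_matching_time_units.
paths = ['ta_Amon_197901-198312.nc', 'ta_Amon_198401-198812.nc']  # placeholders
ds = xr.open_mfdataset(paths, preprocess=_ensure_matching_time_units,
                       decode_times=False)
```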
@spencerahill sorry for taking a while to get back to you on this! What you have above is a good start and captures the essence of what needs to be done. In a perfect world something like the following would work in general:

```python
def _ensure_matching_time_units(ds, **kwargs):
    target_units = 'days since 1979-01-01'
    for name in [TIME_STR, TIME_BOUNDS_STR]:
        if name in ds:
            units = ds[name].attrs['units']
            calendar = ds[name].attrs['calendar']
            var = xr.apply_ufunc(nc4.num2date, ds[name], units, calendar)
            var = xr.apply_ufunc(nc4.date2num, var, target_units, calendar)
            var.attrs['units'] = target_units
            var.attrs['calendar'] = calendar
            ds[name] = var
    return ds
```

But there are complications associated with standard (!) calendars. The issue is that in DataArrays, xarray automatically converts arrays of `datetime.datetime` objects to arrays of `np.datetime64` objects, which breaks the second `apply_ufunc` step above. Doing the decode and re-encode inside a single function, so that the intermediate dates never pass through a DataArray, gets around this:

```python
def _rebase_times(values, input_units, calendar, output_units):
    dates = nc4.num2date(values, input_units, calendar)
    return nc4.date2num(dates, output_units, calendar)


def _ensure_matching_time_units(ds, **kwargs):
    target_units = 'days since 1979-01-01'
    for name in [TIME_STR, TIME_BOUNDS_STR]:
        if name in ds:
            units = ds[name].attrs['units']
            calendar = ds[name].attrs['calendar']
            var = xr.apply_ufunc(_rebase_times, ds[name], units,
                                 calendar, target_units)
            var.attrs['units'] = target_units
            var.attrs['calendar'] = calendar
            ds[name] = var
    return ds
```

Lastly, I've found that the units on the time bounds variable cannot always be trusted to match those on the time coordinate, so it is safer to read the units and calendar from the time coordinate and apply them to both:

```python
def _rebase_times(values, input_units, calendar, output_units):
    dates = nc4.num2date(values, input_units, calendar)
    return nc4.date2num(dates, output_units, calendar)


def _ensure_matching_time_units(ds, **kwargs):
    target_units = 'days since 1979-01-01'
    units = ds[TIME_STR].attrs['units']
    calendar = ds[TIME_STR].attrs['calendar']
    for name in [TIME_STR, TIME_BOUNDS_STR]:
        if name in ds:
            var = xr.apply_ufunc(_rebase_times, ds[name], units,
                                 calendar, target_units)
            var.attrs['units'] = target_units
            var.attrs['calendar'] = calendar
            ds[name] = var
    return ds
```

So still not as clean as we might like, but at least it delegates the parsing of the date units and time rebasing to upstream libraries, and is calendar-type agnostic. Really the ideal way to address this is to decode the times before concatenating the files (this is indeed what xarray does in `open_mfdataset` […]).
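To make the rebasing approach concrete, here is a minimal, self-contained sketch (illustrative, not from the thread; plain `'time'` stands in for aospy's `TIME_STR` constant, and the values are invented). It rebases two in-memory datasets with mismatched epochs onto a common one, after which the attributes survive concatenation:

```python
import netCDF4 as nc4
import xarray as xr


def _rebase_times(values, input_units, calendar, output_units):
    dates = nc4.num2date(values, input_units, calendar)
    return nc4.date2num(dates, output_units, calendar)


# Two fake "files" whose raw time values use different epochs.
ds1 = xr.Dataset(coords={'time': ('time', [15.0, 45.0],
                                  {'units': 'days since 1979-01-01',
                                   'calendar': 'noleap'})})
ds2 = xr.Dataset(coords={'time': ('time', [15.0, 45.0],
                                  {'units': 'days since 1979-03-01',
                                   'calendar': 'noleap'})})

target_units = 'days since 1979-01-01'
for ds in (ds1, ds2):
    attrs = ds['time'].attrs
    rebased = _rebase_times(ds['time'].values, attrs['units'],
                            attrs['calendar'], target_units)
    ds['time'] = ('time', rebased,
                  {'units': target_units, 'calendar': attrs['calendar']})

combined = xr.concat([ds1, ds2], dim='time')
# Mar 1 is day 59 of a noleap year, so ds2's values land at 74 and 104.
print(combined['time'].values)   # [ 15.  45.  74. 104.]
print(combined['time'].attrs)    # matching units/calendar are preserved
```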
@spencerkclark thanks for this excellent explanation and code snippets. I agree using the upstream, calendar-agnostic methods is the way to go. Will proceed (whenever it is I come back to this!) starting from your last snippet.

> Really the ideal way to address this is to decode the times before concatenating the files

This statement is true for so many parts of our codebase! If only somebody came up with an upstream fix... 😜
@spencerkclark I was thinking about returning to this today, but now that the xarray […]. More generally, would it be useful for us to start tracking the (many) things we want to do once that's in? Could be helpful in organizing our efforts, but maybe not worth the time?
@spencerahill assuming that it's not too tricky to add a version of […].

Yes, it would be helpful to create an issue to organize our efforts. I'll try and get to it over the next few days.
OK sounds good.
Thanks, I appreciate that. I figure you're better equipped to do it than I am. No rush; after my blitz of activity today, I'll likely go dormant on this for another couple weeks or so. I'm finding lately the best way to make time for this without overdoing it is to block out a day or two once every few weeks. (I still check notifications nightly, though, for bug reports etc.)
Original issue description:

The data for a CMIP5 simulation I am trying to use is split into two netCDF files along the time dimension, as is common. But, annoyingly, they have encoded a different `units` attribute for each file. I think, as a result, the attribute is being dropped when `open_mfdataset` is called, based on this partial traceback:

[traceback not captured]

The place where the attribute is being dropped is in the `xr.open_mfdataset` call in `_load_data_from_disk`. So it would appear we need to add some logic that checks that the time units attrs are the same across all netCDF files to be loaded. If not, we'll need to somehow force them to have the same units. Or something else? @spencerkclark, what do you think?
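To illustrate the dropping behavior described above, here is a minimal sketch (illustrative, not from the issue) of what happens when xarray concatenates two datasets whose time coordinates carry conflicting `units` attributes, as `open_mfdataset` does under the hood:

```python
import xarray as xr

# Two datasets standing in for the two CMIP5 files, with raw (undecoded)
# time values and conflicting 'units' attributes.
a = xr.Dataset(coords={'time': ('time', [0.0, 31.0],
                                {'units': 'days since 1979-01-01'})})
b = xr.Dataset(coords={'time': ('time', [0.0, 31.0],
                                {'units': 'days since 1984-01-01'})})

combined = xr.concat([a, b], dim='time')
# In xarray versions contemporary with this issue, attrs that conflict are
# removed during concatenation, so 'units' silently vanishes here; newer
# versions may instead keep the first dataset's attrs.
print(combined['time'].attrs)
```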
Here's an `ncdump -h` for each:

[output not captured]

and

[output not captured]

For those w/ GFDL access, the files in question are […].
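As a sketch of the "check that the time units attrs are the same" logic proposed above (my own illustration; `time_units_match` and its arguments are hypothetical names, not aospy API):

```python
import xarray as xr


def time_units_match(paths, time_name='time'):
    """Return True if all files encode the same raw time units string."""
    units = set()
    for path in paths:
        # decode_times=False keeps the raw numeric times and their
        # 'units' attribute instead of decoding to datetimes.
        with xr.open_dataset(path, decode_times=False) as ds:
            units.add(ds[time_name].attrs.get('units'))
    return len(units) <= 1
```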