Resample not working when time coordinate is timezone aware #1490

Open
benoit-fuentes opened this issue Jul 26, 2017 · 4 comments
Comments

benoit-fuentes commented Jul 26, 2017

Hi all,
here is the code to reproduce the bug:

import numpy as np
import pandas as pd
import xarray as xr

time1 = pd.date_range('2000-01-01', freq='H', periods=365 * 24)  # timezone naïve
time2 = pd.date_range('2000-01-01', freq='H', periods=365 * 24, tz='UTC')  # timezone aware
ds1 = xr.Dataset({'foo': ('time', np.arange(365 * 24)), 'time': time1})
ds2 = xr.Dataset({'foo': ('time', np.arange(365 * 24)), 'time': time2})
ds1.resample('3H', 'time', how='mean')  # works fine
ds2.resample('3H', 'time', how='mean')  # raises an error

This last line returns the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-0de4b0d703bd> in <module>()
      4 ds2 = xr.Dataset({'foo': ('time', np.arange(365 * 24)), 'time': time2})
      5 ds1.resample('3H', 'time', how='mean')
----> 6 ds.resample('3H', 'time', how='mean')

~/.virtualenvs/planck3/lib/python3.5/site-packages/xarray/core/common.py in resample(self, freq, dim, how, skipna, closed, label, base, keep_attrs)
    546         time_grouper = pd.TimeGrouper(freq=freq, how=how, closed=closed,
    547                                       label=label, base=base)
--> 548         gb = self._groupby_cls(self, group, grouper=time_grouper)
    549         if isinstance(how, basestring):
    550             f = getattr(gb, how)

~/.virtualenvs/planck3/lib/python3.5/site-packages/xarray/core/groupby.py in __init__(self, obj, group, squeeze, grouper, bins, cut_kwargs)
    243                 raise ValueError('index must be monotonic for resampling')
    244             s = pd.Series(np.arange(index.size), index)
--> 245             first_items = s.groupby(grouper).first()
    246             if first_items.isnull().any():
    247                 full_index = first_items.index

~/.virtualenvs/planck3/lib/python3.5/site-packages/pandas/core/generic.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, **kwargs)
   4414         return groupby(self, by=by, axis=axis, level=level, as_index=as_index,
   4415                        sort=sort, group_keys=group_keys, squeeze=squeeze,
-> 4416                        **kwargs)
   4417 
   4418     def asfreq(self, freq, method=None, how=None, normalize=False,

~/.virtualenvs/planck3/lib/python3.5/site-packages/pandas/core/groupby.py in groupby(obj, by, **kwds)
   1697         raise TypeError('invalid type: %s' % type(obj))
   1698 
-> 1699     return klass(obj, by, **kwds)
   1700 
   1701 

~/.virtualenvs/planck3/lib/python3.5/site-packages/pandas/core/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, **kwargs)
    390                                                     level=level,
    391                                                     sort=sort,
--> 392                                                     mutated=self.mutated)
    393 
    394         self.obj = obj

~/.virtualenvs/planck3/lib/python3.5/site-packages/pandas/core/groupby.py in _get_grouper(obj, key, axis, level, sort, mutated)
   2605     # a passed-in Grouper, directly convert
   2606     if isinstance(key, Grouper):
-> 2607         binner, grouper, obj = key._get_grouper(obj)
   2608         if key.key is None:
   2609             return grouper, [], obj

~/.virtualenvs/planck3/lib/python3.5/site-packages/pandas/core/resample.py in _get_grouper(self, obj)
   1093     def _get_grouper(self, obj):
   1094         # create the resampler and return our binner
-> 1095         r = self._get_resampler(obj)
   1096         r._set_binner()
   1097         return r.binner, r.grouper, r.obj

~/.virtualenvs/planck3/lib/python3.5/site-packages/pandas/core/resample.py in _get_resampler(self, obj, kind)
   1089         raise TypeError("Only valid with DatetimeIndex, "
   1090                         "TimedeltaIndex or PeriodIndex, "
-> 1091                         "but got an instance of %r" % type(ax).__name__)
   1092 
   1093     def _get_grouper(self, obj):

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'

My config:
xarray==0.9.6
pandas==0.20.3
numpy==1.13.1
python-dateutil==2.6.1
six==1.10.0
pytz==2017.2

Tested on Python 2.7 and Python 3.5.2.

benoit-fuentes changed the title from "Resample not working when time coordonate is timezone aware" to "Resample not working when time coordinate is timezone aware" Jul 26, 2017
darothen commented

Did some digging.

Note here that the dtypes of time1 and time2 are different; the first is a datetime64[ns] but the second is a datetime64[ns, UTC]. For the sake of illustration, I'm going to change the timezone to EST. If we print time2, we get something that looks like this:

>>> time2
DatetimeIndex(['2000-01-01 00:00:00-05:00', '2000-01-01 01:00:00-05:00',
               '2000-01-01 02:00:00-05:00', '2000-01-01 03:00:00-05:00',
               '2000-01-01 04:00:00-05:00', '2000-01-01 05:00:00-05:00',
               '2000-01-01 06:00:00-05:00', '2000-01-01 07:00:00-05:00',
               '2000-01-01 08:00:00-05:00', '2000-01-01 09:00:00-05:00',
               ...
               '2000-12-30 14:00:00-05:00', '2000-12-30 15:00:00-05:00',
               '2000-12-30 16:00:00-05:00', '2000-12-30 17:00:00-05:00',
               '2000-12-30 18:00:00-05:00', '2000-12-30 19:00:00-05:00',
               '2000-12-30 20:00:00-05:00', '2000-12-30 21:00:00-05:00',
               '2000-12-30 22:00:00-05:00', '2000-12-30 23:00:00-05:00'],
              dtype='datetime64[ns, EST]', length=8760, freq='H')

But, if we directly print its values, we get something slightly different:

>>> time2.values
array(['2000-01-01T05:00:00.000000000', '2000-01-01T06:00:00.000000000',
       '2000-01-01T07:00:00.000000000', ...,
       '2000-12-31T02:00:00.000000000', '2000-12-31T03:00:00.000000000',
       '2000-12-31T04:00:00.000000000'], dtype='datetime64[ns]')

The difference is that the timezone offset has been applied automatically: each value in time2 is stored as its UTC equivalent, five hours ahead of the EST wall time. This brings up something to note: if you construct your Dataset using time1.values and time2.values, there is no problem:

import numpy as np
import pandas as pd
import xarray as xr

time1 = pd.date_range('2000-01-01', freq='H', periods=365 * 24)  # timezone naïve
time2 = pd.date_range('2000-01-01', freq='H', periods=365 * 24, tz='UTC')  # timezone aware
ds1 = xr.Dataset({'foo': ('time', np.arange(365 * 24)), 'time': time1.values})
ds2 = xr.Dataset({'foo': ('time', np.arange(365 * 24)), 'time': time2.values})
ds1.resample('3H', 'time', how='mean')  # works fine
ds2.resample('3H', 'time', how='mean')  # works fine

Both time1 and time2 are instances of pd.DatetimeIndex, which is a subclass of pd.Index. When xarray turns them into Variables, it ultimately uses a PandasIndexAdapter to decode the contents of time1 and time2, and this is where the trouble happens. The PandasIndexAdapter tries to safely cast the dtype of the array it is passed, which works just fine for time1. But NumPy does not recognize datetime dtypes that carry timezone information. That is, this will work:

>>> np.dtype('datetime64[ns]')
dtype('<M8[ns]')

But this won't:

>>> np.dtype('datetime64[ns, UTC]')
TypeError: Invalid datetime unit in metadata string "[ns, UTC]"

Moreover, the type of time2.dtype is a pandas.types.dtypes.DatetimeTZDtype, which NumPy doesn't know what to do with: it has no way to map that type onto its own datetime64.
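
For illustration, here is a minimal check of that mismatch. This is only a sketch; is_datetime64tz_dtype from pandas.api.types is used instead of an isinstance check because the module path of DatetimeTZDtype has moved between pandas versions, and the exact wording of the error varies by NumPy version:

import numpy as np
import pandas as pd
from pandas.api.types import is_datetime64tz_dtype

time2 = pd.date_range('2000-01-01', freq='H', periods=24, tz='UTC')

print(time2.dtype)                         # datetime64[ns, UTC], a pandas extension dtype
print(is_datetime64tz_dtype(time2.dtype))  # True
try:
    np.dtype(time2.dtype)                  # NumPy cannot interpret the tz-aware dtype
except TypeError as err:
    print('NumPy rejects it:', err)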

So the resulting Variable that defines the time coordinate on your ds2 holds an array with the correct values, but it is explicitly given the dtype object. When that array is later decoded for resampling, it fails with the TypeError shown above.
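
Following that reasoning, the symptom should be visible directly on the dataset from the original report. This is a sketch assuming the xarray 0.9.x behaviour described above; on versions that coerce tz-aware indexes differently, the printed dtype may differ:

import numpy as np
import pandas as pd
import xarray as xr

time2 = pd.date_range('2000-01-01', freq='H', periods=365 * 24, tz='UTC')
ds2 = xr.Dataset({'foo': ('time', np.arange(365 * 24)), 'time': time2})

# Per the analysis above, the tz-aware index ends up stored with dtype object,
# which is what later trips up the pandas resampling machinery.
print(ds2['time'].dtype)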

One solution would be to catch this potential glitch in either is_valid_numpy_dtype() or the PandasIndexAdapter constructor. Alternatively, we could eagerly coerce arrays with type pandas.types.dtypes.DatetimeTZDtype into numpy-compliant types at some earlier point.
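
Until something like that lands, a user-side workaround in the same spirit is to strip the timezone from the index before building the Dataset. This is only a sketch against the xarray 0.9.x resample API used above; tz_convert('UTC').tz_localize(None) keeps the UTC wall times, which matches what passing time2.values does:

import numpy as np
import pandas as pd
import xarray as xr

time2 = pd.date_range('2000-01-01', freq='H', periods=365 * 24, tz='UTC')

# Convert to UTC and drop the timezone so the coordinate becomes plain datetime64[ns].
naive_utc = time2.tz_convert('UTC').tz_localize(None)

ds2 = xr.Dataset({'foo': ('time', np.arange(365 * 24)), 'time': naive_utc})
ds2.resample('3H', 'time', how='mean')  # resamples without the TypeError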

shoyer commented Jul 26, 2017

NumPy doesn't support timezones, but pandas does. This puts things in a slightly tricky position for xarray.

We do manage to get things to work for pandas dtypes stored in indexes, in most cases. Given that our resampling behavior also relies on pandas, I think we should be able to get this to work, probably by tweaking our PandasIndexAdapter, as @darothen notes.

It's borderline whether this is a bug or a new feature, but it would certainly be nice to fix if possible, so I'm marking this as "Contributions welcome".

stale bot commented Jun 26, 2019

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

stale bot added the stale label Jun 26, 2019
stale bot closed this as completed Jul 26, 2019
jhamman reopened this Jul 26, 2019
stale bot removed the stale label Jul 26, 2019
stale bot commented Jul 2, 2021

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

stale bot added the stale label Jul 2, 2021
dcherian removed the stale label Jul 2, 2021
dcherian added this to To do in Explicit Indexes via automation Jul 13, 2021
dcherian moved this from To do to Would enable this in Explicit Indexes Jul 13, 2021
erialC-P added a commit to GeoscienceAustralia/dea-intertidal that referenced this issue Dec 23, 2022
Progress to end of 2022:  Trying to incorporate temporal filtering into exposure tidal monitoring. Prototype workflow used pandas. Current approach using xarray.  See the following issue for discussion of timezone aware datetimes in xarray
pydata/xarray#1490
Projects: Explicit Indexes (Would enable this)
Development: No branches or pull requests
5 participants