Resample discards timezone information #13238

Closed
jeremywhelchel opened this Issue May 20, 2016 · 11 comments

jeremywhelchel commented May 20, 2016

It appears that resample is now dropping timezone information on the index. Is this expected?

import pandas as pp
import pytz

a = pp.Series(index=pp.DatetimeIndex(['2013-01-01 06:00', '2013-01-01 07:00', '2013-01-02 06:00'],
                                     tz=pytz.timezone('America/Los_Angeles')),
              data=[1, 2, 3])
# Old-style resample call (pre-0.18 syntax)
b = a.resample('D', how='max')
print(repr(a.index.tz))
print(repr(b.index.tz))
print(a.index.tz == b.index.tz)

This produces the following in 0.16.1. Notably, the timezones for the original series and the resampled series are the same.

<DstTzInfo 'America/Los_Angeles' LMT-1 day, 16:07:00 STD>
<DstTzInfo 'America/Los_Angeles' LMT-1 day, 16:07:00 STD>
True

However, that's not the case in 0.18.1. One caveat: there we're using the new resample syntax, b = a.resample('D').max()

<DstTzInfo 'America/Los_Angeles' LMT-1 day, 16:07:00 STD>
<DstTzInfo 'America/Los_Angeles' PST-1 day, 16:00:00 STD>
False
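
For anyone hitting the same comparison failure: the two reprs above name the same zone and differ only in which offset instance (LMT vs. PST) the tzinfo object carries. A minimal sketch of comparing by zone name instead of by tzinfo object follows. The helper `same_zone` is hypothetical (not part of pandas), and relying on `str(tz)` returning the zone name is an assumption that holds for pytz and zoneinfo objects.

```python
import pandas as pd


def same_zone(left, right):
    """Hypothetical helper: compare two tz-aware indexes by zone name,
    not by tzinfo object equality."""
    return str(left.tz) == str(right.tz)


idx = pd.DatetimeIndex(
    ["2013-01-01 06:00", "2013-01-01 07:00", "2013-01-02 06:00"],
    tz="America/Los_Angeles",
)
a = pd.Series([1, 2, 3], index=idx)
b = a.resample("D").max()

# Zone names agree even if the underlying tzinfo instances differ.
print(same_zone(a.index, b.index))
```

This only sidesteps the comparison; it does not change what resample returns.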

Contributor

jreback commented May 20, 2016

Rewriting your example the way we test:

In [10]: s = pd.Series(index=pd.DatetimeIndex(['2013-01-01 06:00', '2013-01-01 07:00', '2013-01-02 06:00'],
                                     tz='America/Los_Angeles'),
              data=[1, 2, 3])

In [11]: s
Out[11]: 
2013-01-01 06:00:00-08:00    1
2013-01-01 07:00:00-08:00    2
2013-01-02 06:00:00-08:00    3
dtype: int64

In [12]: s.index
Out[12]: DatetimeIndex(['2013-01-01 06:00:00-08:00', '2013-01-01 07:00:00-08:00', '2013-01-02 06:00:00-08:00'], dtype='datetime64[ns, America/Los_Angeles]', freq=None)

In [13]: s.resample('D').max()
Out[13]: 
2013-01-01 00:00:00-08:00    2
2013-01-02 00:00:00-08:00    3
Freq: D, dtype: int64

In [14]: s.resample('D').max().index
Out[14]: DatetimeIndex(['2013-01-01 00:00:00-08:00', '2013-01-02 00:00:00-08:00'], dtype='datetime64[ns, America/Los_Angeles]', freq='D')

Datetimes with a tz and resampling are not tested very much. Pull requests to do more of this are welcome (the fix is very straightforward).

jeremywhelchel commented May 20, 2016

Thanks for the quick look!

Note that your re-written example doesn't display the problem. You need to look at repr(...index.tz) to see the difference (LMT-1 day, 16:07:00 STD vs PST-1 day, 16:00:00 STD)

Contributor

jreback commented May 20, 2016

Have another look at [12] vs. [14]: the time changed (look at the hour). The issue is that the re-localization is off; the actual tz is correct.

jreback modified the milestones: 0.19.0, Next Major Release (Sep 28, 2016)

Contributor

jaredsnyder commented May 22, 2017

I'm going to attempt fixing this one

Contributor

jaredsnyder commented May 22, 2017

Wouldn't we expect the index values to be at midnight on each day given that we're aggregating up to the day level? If we run the code above without the timezones we get a similar output:

>>> s_notz = pd.Series(index=pd.DatetimeIndex(['2013-01-01 06:00', '2013-01-01 07:00', '2013-01-02 06:00']),
...               data=[1, 2, 3])
>>> s_notz.index
DatetimeIndex(['2013-01-01 06:00:00', '2013-01-01 07:00:00',
               '2013-01-02 06:00:00'],
              dtype='datetime64[ns]', freq=None)
>>> s_notz.resample("D").max()
2013-01-01    2
2013-01-02    3
Freq: D, dtype: int64
>>> s_notz.resample("D").max().index
DatetimeIndex(['2013-01-01', '2013-01-02'], dtype='datetime64[ns]', freq='D')
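
The comparison above can be made explicit with a small sketch: resample the tz-aware and the naive series, strip the zone from the aware result with `tz_localize(None)` (which keeps the local wall-clock time), and check that the daily labels line up. The variable names here are illustrative only.

```python
import pandas as pd

stamps = ["2013-01-01 06:00", "2013-01-01 07:00", "2013-01-02 06:00"]
s_tz = pd.Series([1, 2, 3], index=pd.DatetimeIndex(stamps, tz="America/Los_Angeles"))
s_notz = pd.Series([1, 2, 3], index=pd.DatetimeIndex(stamps))

daily_tz = s_tz.resample("D").max()
daily_notz = s_notz.resample("D").max()

# Drop the zone but keep the local wall-clock time of each daily label.
wall_clock = daily_tz.index.tz_localize(None)

# Both results should label the bins at local midnight with the same values.
labels_match = bool((wall_clock == daily_notz.index).all())
values_match = daily_tz.tolist() == daily_notz.tolist()
print(labels_match, values_match)
```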

Contributor

rockg commented Jan 17, 2018

Any progress on this? Had to spend about 20 minutes figuring this out and it impacts all join operations as the timezones are not treated as equal despite having the same zone name. So all binary operations will result in losing timezone information ([13] below). Using tz_convert brings the timezone back in line, but shouldn't be necessary. Some more examples:

In [11]: s = pd.Series([2], index=pd.date_range('2017-01-01', periods=48, freq="H", tz="US/Eastern"))

In [12]: ss = s / s.resample("D").mean()

In [13]: ss.dropna()
Out[13]: 
2017-01-01 05:00:00+00:00    1.0
2017-01-02 05:00:00+00:00    1.0
dtype: float64

In [14]: ss1 = s / s.resample("D").mean().tz_convert("US/Eastern")

In [15]: ss1.dropna()
Out[15]: 
2017-01-01 00:00:00-05:00    1.0
2017-01-02 00:00:00-05:00    1.0
dtype: float64

In [16]: s.resample("D").mean()
Out[16]: 
2017-01-01 00:00:00-05:00    2
2017-01-02 00:00:00-05:00    2

In [22]: s.resample("D").mean().index.tz
Out[22]: <DstTzInfo 'US/Eastern' EST-1 day, 19:00:00 STD>

In [23]: s.index.tz
Out[23]: <DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>
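
A minimal sketch of the tz_convert workaround described above, written so the target zone is taken from the original index rather than repeated as a string. It uses a "60min" frequency and a scalar fill value purely for illustration; on versions where the underlying issue is already fixed the conversion should be a no-op.

```python
import pandas as pd

s = pd.Series(
    2.0,
    index=pd.date_range("2017-01-01", periods=48, freq="60min", tz="US/Eastern"),
)

# Re-align the resampled index to the source tz before the binary operation.
daily_mean = s.resample("D").mean().tz_convert(s.index.tz)

# Only the two midnight labels exist in both indexes; the rest become NaN.
ratio = (s / daily_mean).dropna()
print(ratio)
```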

Contributor

jreback commented Jan 17, 2018

I think #18596 will fix this, if you want to give that a try.

Contributor

rockg commented Jan 17, 2018

Yes, that will fix the join issue. Thanks.

Contributor

jreback commented Jan 17, 2018

Oh, great.

Let me rebase that tomorrow.

I suspect this will close a number of other issues as well.
If you could have a look through, that would be great.

Contributor

jreback commented Jan 17, 2018

@rockg xref #19281, though I don't think this actually fixes it; the tz is still converted incorrectly. An investigation would be welcome.

Contributor

rockg commented Jan 17, 2018

Yeah, I agree it doesn't fix the resample, just the join issue ([13] in my example). It works despite resample returning a different tzinfo, so half the battle.

jreback modified the milestones: Next Major Release, 0.23.0 (Mar 28, 2018)
