AmbiguousTimeError on groupby when including a DST change #14682

Closed
j-santander opened this issue Nov 17, 2016 · 2 comments

@j-santander

A small, complete example of the issue

#!/usr/bin/env python
import pandas as pd
df = pd.DataFrame([1477786980, 1477790580], columns=['ts'])
df['date'] = pd.to_datetime(df.ts, unit='s').dt.tz_localize('UTC').dt.tz_convert('Europe/Madrid')
df.set_index('date', inplace=True)

dfo = df.groupby(pd.TimeGrouper('5min'))  # raises AmbiguousTimeError on pandas 0.19.1
print(dfo.count())

Expected Output

                           ts
date                         
2016-10-30 02:20:00+02:00   1
2016-10-30 02:25:00+02:00   0
2016-10-30 02:30:00+02:00   0
2016-10-30 02:35:00+02:00   0
2016-10-30 02:40:00+02:00   0
2016-10-30 02:45:00+02:00   0
2016-10-30 02:50:00+02:00   0
2016-10-30 02:55:00+02:00   0
2016-10-30 02:00:00+01:00   0
2016-10-30 02:05:00+01:00   0
2016-10-30 02:10:00+01:00   0
2016-10-30 02:15:00+01:00   0
2016-10-30 02:20:00+01:00   1

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-47-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 28.6.1
Cython: 0.25.1
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: 1.4.8
patsy: None
dateutil: 2.4.2
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.1
matplotlib: None
openpyxl: 2.2.6
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.999
httplib2: 0.9.1
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

The code above raises an AmbiguousTimeError when grouping by a datetime index that spans a DST change. The Unix timestamps in the example straddle the October 30, 2016 DST change in Europe, when clocks in Madrid went back from 03:00 CEST to 02:00 CET.
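A quick way to see the collision (a sketch assuming a current pandas, not part of the original report) is to convert the two epoch values: they are one hour apart in UTC but map to the same Madrid wall-clock time, 02:23, first with the summer offset and then with the winter offset.

```python
import pandas as pd

# The two Unix timestamps (seconds) used in the report.
stamps = pd.to_datetime([1477786980, 1477790580], unit="s", utc=True)
local = stamps.tz_convert("Europe/Madrid")

for utc_ts, madrid_ts in zip(stamps, local):
    # Same local wall time, different UTC offsets (+02:00, then +01:00).
    print(utc_ts, "->", madrid_ts)

# Dropping the time zone keeps only the local wall time, and the two values collide.
naive = local.tz_localize(None)
print(naive[0] == naive[1])
```

Once the offset is gone, nothing distinguishes the two instants, which is exactly the situation a bin-edge computation in naive local time runs into.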

The stack trace is:

Traceback (most recent call last):
  File "./t.py", line 7, in <module>
    dfo = df.groupby(pd.TimeGrouper('5min'))
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 3984, in groupby
    **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 1501, in groupby
    return klass(obj, by, **kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 370, in __init__
    mutated=self.mutated)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 2382, in _get_grouper
    binner, grouper, obj = key._get_grouper(obj)
  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.py", line 1062, in _get_grouper
    r._set_binner()
  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.py", line 237, in _set_binner
    self.binner, self.grouper = self._get_binner()
  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.py", line 245, in _get_binner
    binner, bins, binlabels = self._get_binner_for_time()
  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.py", line 660, in _get_binner_for_time
    return self.groupby._get_time_bins(self.ax)
  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.py", line 1118, in _get_time_bins
    base=self.base)
  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.py", line 1262, in _get_range_edges
    closed=closed, base=base)
  File "/usr/local/lib/python2.7/dist-packages/pandas/tseries/resample.py", line 1326, in _adjust_dates_anchored
    return (Timestamp(fresult).tz_localize(first_tzinfo),
  File "pandas/tslib.pyx", line 621, in pandas.tslib.Timestamp.tz_localize (pandas/tslib.c:13694)
  File "pandas/tslib.pyx", line 4308, in pandas.tslib.tz_localize_to_utc (pandas/tslib.c:74816)
pytz.exceptions.AmbiguousTimeError: Cannot infer dst time from Timestamp('2016-10-30 02:20:00'), try using the 'ambiguous' argument

The code works if the series does not span a DST change (e.g., one day earlier):

#!/usr/bin/env python
import pandas as pd
df = pd.DataFrame([1477700580, 1477704180], columns=['ts'])
df['date'] = pd.to_datetime(df.ts, unit='s').dt.tz_localize('UTC').dt.tz_convert('Europe/Madrid')
df.set_index('date', inplace=True)

dfo = df.groupby(pd.TimeGrouper('5min'))

print(dfo.count())

Output:

                           ts
date                         
2016-10-29 02:20:00+02:00   1
2016-10-29 02:25:00+02:00   0
2016-10-29 02:30:00+02:00   0
2016-10-29 02:35:00+02:00   0
2016-10-29 02:40:00+02:00   0
2016-10-29 02:45:00+02:00   0
2016-10-29 02:50:00+02:00   0
2016-10-29 02:55:00+02:00   0
2016-10-29 03:00:00+02:00   0
2016-10-29 03:05:00+02:00   0
2016-10-29 03:10:00+02:00   0
2016-10-29 03:15:00+02:00   0
2016-10-29 03:20:00+02:00   1
@jreback

jreback Nov 17, 2016

Contributor

xref #10668 (though this looks separate).

Yeah, we probably need to specify ambiguous when creating the bins. A pull request to fix this would make the fix happen sooner.
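For context, tz_localize takes an `ambiguous` argument that tells pandas which side of a repeated hour a naive wall time belongs to. A minimal sketch on a single Timestamp, assuming a current pandas; the wall time is the bin edge named in the traceback:

```python
import pandas as pd

wall = pd.Timestamp("2016-10-30 02:20:00")  # occurs twice in Europe/Madrid that night

# Default ambiguous="raise": pandas refuses to guess.
try:
    wall.tz_localize("Europe/Madrid")
    raised = False
except Exception as exc:  # the exact exception class depends on the pandas version
    raised = True
    print("raise:", type(exc).__name__)

# True selects the DST (summer) reading, False the standard-time reading.
summer = wall.tz_localize("Europe/Madrid", ambiguous=True)
winter = wall.tz_localize("Europe/Madrid", ambiguous=False)
print(summer)  # 2016-10-30 02:20:00+02:00
print(winter)  # 2016-10-30 02:20:00+01:00

# "NaT" maps an ambiguous wall time to NaT instead of raising.
missing = wall.tz_localize("Europe/Madrid", ambiguous="NaT")
print(missing)
```

Which flag is right for a given bin edge is the hard part: it depends on which occurrence of 02:20 the bin is meant to represent.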

jreback added this to the Next Major Release milestone Nov 17, 2016

@j-santander

j-santander Nov 17, 2016

I've been trying to debug the above issue.

I tried adding the ambiguous keyword to the constructors of the Timestamps, but I wasn't sure how to set it, since infer didn't seem to be a valid option.
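One plausible explanation for `infer` looking invalid: a single Timestamp has no neighbours to infer from, so pandas rejects it, while a DatetimeIndex can locate the transition from the order of its values. A sketch, assuming a current pandas and an illustrative (made-up) sequence of wall times that walks through the repeated hour:

```python
import pandas as pd

# Scalar: 'infer' has nothing to compare against, so pandas rejects it.
try:
    pd.Timestamp("2016-10-30 02:20:00").tz_localize("Europe/Madrid", ambiguous="infer")
    scalar_ok = True
except ValueError as exc:
    scalar_ok = False
    print("scalar:", exc)

# Sequence: the wall clock jumps back from 02:40 to 02:00, which marks the switch
# from summer time (+02:00) to winter time (+01:00).
walls = pd.DatetimeIndex(
    ["2016-10-30 02:20", "2016-10-30 02:40", "2016-10-30 02:00", "2016-10-30 02:20"]
)
inferred = walls.tz_localize("Europe/Madrid", ambiguous="infer")
print(inferred)
```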

The code raising the exception seems to have been modified in commit dcc68d7, where the _adjust_dates_anchored() function in the pandas.tseries.resample module drops the tz information at the beginning of the function and adds it back in the return statement.
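The failure mode can be reproduced outside resample with a simplified sketch (not the actual pandas internals; it assumes a current pandas): take the first timestamp, drop its time zone, snap it to the 5-minute grid in naive local time, and localize it again. The result is the same 02:20 wall time the traceback complains about.

```python
import pandas as pd

first = pd.Timestamp(1477786980, unit="s", tz="UTC").tz_convert("Europe/Madrid")
print(first)  # 2016-10-30 02:23:00+02:00

# Drop tz (keeping the local wall time) and snap to the 5-minute grid.
edge = first.tz_localize(None).floor("5min")
print(edge)  # 2016-10-30 02:20:00

# Re-attaching the zone is ambiguous: 02:20 happens twice in Madrid that night.
try:
    edge.tz_localize("Europe/Madrid")
    failed = False
except Exception as exc:  # exception class varies across pandas/pytz versions
    failed = True
    print(type(exc).__name__)

# Doing the arithmetic in UTC and converting back never produces an ambiguous value.
utc_edge = first.tz_convert("UTC").floor("5min").tz_convert("Europe/Madrid")
print(utc_edge)  # 2016-10-30 02:20:00+02:00
```

Flooring in UTC lands on the same local grid here only because Madrid's offsets are whole hours; for zones with odd offsets, or for day-sized bins, the two approaches can differ.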

I've modified the code so that it no longer does that, but then I had to modify an assert in the pandas.tseries.index module that checks for equality of time zones: it turns out that Europe/Madrid on DST is considered different from Europe/Madrid not on DST.
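The mismatch plausibly comes from how pytz (the library behind the AmbiguousTimeError in the traceback) represents localized times: each offset of a zone gets its own tzinfo instance, so the summer and winter Europe/Madrid objects do not compare equal even though they share a zone name. A sketch, assuming pytz is installed:

```python
from datetime import datetime

import pytz

madrid = pytz.timezone("Europe/Madrid")
wall = datetime(2016, 10, 30, 2, 20)

# is_dst resolves the ambiguous wall time, much like pandas' `ambiguous` flag.
summer = madrid.localize(wall, is_dst=True)
winter = madrid.localize(wall, is_dst=False)

print(repr(summer.tzinfo))
print(repr(winter.tzinfo))

# Different tzinfo objects for the same zone: comparing objects says "different",
# comparing zone names says "same".
print(summer.tzinfo == winter.tzinfo)
print(summer.tzinfo.zone == winter.tzinfo.zone)
```

Comparing zone names rather than tzinfo objects is one way such an equality check could be made DST-agnostic.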

I'll try to create a pull request with my changes so that you can comment.

jreback modified the milestones: 0.19.2, Next Major Release Nov 21, 2016

jreback closed this in 9f2e453 Nov 22, 2016
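For readers on a newer pandas: pd.TimeGrouper was later deprecated (pandas 0.21) and subsequently removed, with pd.Grouper(freq=...) as its replacement. A sketch of the original reproduction rewritten that way, assuming a pandas release that includes this fix; the counts it produces should line up with the expected output in the report.

```python
import pandas as pd

df = pd.DataFrame([1477786980, 1477790580], columns=["ts"])
df["date"] = (
    pd.to_datetime(df.ts, unit="s").dt.tz_localize("UTC").dt.tz_convert("Europe/Madrid")
)
df = df.set_index("date")

# pd.Grouper(freq=...) replaces the removed pd.TimeGrouper.
result = df.groupby(pd.Grouper(freq="5min")).count()
print(result)
```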

amolkahat added a commit to amolkahat/pandas that referenced this issue Nov 26, 2016

BUG: Avoid AmbiguousTimeError on groupby
closes #14682

Author: Julian Santander <julian.santander@nokia.com>
Author: Julian Santander <jsantander2@gmail.com>

Closes #14683 from j-santander/master and squashes the following commits:

d90afaf [Julian Santander] Addressing additional code inspection comments
817ed97 [Julian Santander] Addressing code inspections comments
99a5367 [Julian Santander] Fix unittest error and lint warning
940fb22 [Julian Santander] Avoid AmbiguousTimeError on groupby

jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this issue Dec 14, 2016

[Backport #14683] BUG: Avoid AmbiguousTimeError on groupby
closes #14682

Author: Julian Santander <julian.santander@nokia.com>
Author: Julian Santander <jsantander2@gmail.com>

Closes #14683 from j-santander/master and squashes the following commits:

d90afaf [Julian Santander] Addressing additional code inspection comments
817ed97 [Julian Santander] Addressing code inspections comments
99a5367 [Julian Santander] Fix unittest error and lint warning
940fb22 [Julian Santander] Avoid AmbiguousTimeError on groupby

(cherry picked from commit 9f2e453)