On a long clock-change day in Cuba, e.g. 2018-11-04, midnight local time is an ambiguous timestamp. pd.Grouper does not handle this as I expect. More precisely, the call to groupby in the code above raises an AmbiguousTimeError.
This issue is of a similar nature to #23742 but it seems #23742 was fixed in 0.24 whereas this was not.
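To make the ambiguity concrete, here is a minimal sketch (not the code sample from the report) that resolves the same wall-clock midnight both ways using the `ambiguous` argument of `Timestamp.tz_localize`. The variable names are illustrative only.

```python
import pandas as pd

# Wall-clock midnight on the clock-change day, with no timezone attached yet.
naive_midnight = pd.Timestamp("2018-11-04 00:00:00")

# ambiguous=True selects the daylight-saving reading (before the clocks go back),
# ambiguous=False selects the standard-time reading (after the clocks go back).
first_midnight = naive_midnight.tz_localize("America/Havana", ambiguous=True)
second_midnight = naive_midnight.tz_localize("America/Havana", ambiguous=False)

print(first_midnight)                    # first occurrence of local midnight
print(second_midnight)                   # second occurrence, same wall-clock time
print(second_midnight - first_midnight)  # elapsed time between the two
```

Both results show the same local time but are distinct instants, which is why localizing midnight without saying which one is meant cannot succeed.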
Expected Output
The call to groupby should return three groups (one for each day: the 3rd, 4th, and 5th of November). The group for the 4th of November should be labelled '2018-11-04 00:00:00-04:00' (that is, the first midnight, before the clock change) and it should contain the 25 hourly data points for this day.
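As a rough illustration of this expected result, the sketch below groups hourly points by local calendar date by hand. It is not the code from the report and not a fix for pd.Grouper; the date range and the names `values`, `local_day`, `counts`, and `labels` are assumptions made for the example.

```python
import pandas as pd

# Hourly points spanning the clock change. They are built in UTC, where nothing is
# ambiguous, and then viewed in Havana local time. The range is an illustrative choice.
index = pd.date_range(
    "2018-11-03 12:00", "2018-11-05 12:00", freq=pd.Timedelta(hours=1), tz="UTC"
).tz_convert("America/Havana")
values = pd.Series(range(len(index)), index=index)

# Drop the timezone to get wall-clock times, then truncate each to its calendar date.
# The keys are timezone-naive dates, so no ambiguous midnight is ever localized.
local_day = index.tz_localize(None).normalize()

# Number of hourly points falling on each local calendar day.
counts = values.groupby(local_day).size()

# Label each day with its first timezone-aware timestamp.
labels = {day: index[local_day == day][0] for day in local_day.unique()}

print(counts)
for day, label in labels.items():
    print(day.date(), "->", label)
```

With this range the 4th of November gets 25 points, and its first timestamp is the midnight that precedes the clock change.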
Thanks for the follow-up, and sorry for the late reply.
@mroeschke I don't know if we need to expose these new keywords in the API, as I don't think the meaning of freq='1D' is ambiguous here.
The "calendar day" in Cuba on this date starts at 2018-11-04 00:00:00-04:00 (and this day lasts 25 hours). This means that of the two possible behaviours you suggest, only the first seems correct to me.
If for some reason someone wants to aggregate the data between 2018-11-04 00:00:00-04:00 and 2018-11-04 00:00:00-05:00 with the data from the previous day (November 3rd), they would have to write a custom grouper (but I would be curious to know why exactly someone would want to do this as those 60 minutes unambiguously belong to November 4th in Cuba).
Output of pd.show_versions()
pandas: 0.24.2
pytest: 3.3.2
pip: None
setuptools: 40.6.3
Cython: 0.29.6
numpy: 1.15.4
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None