BUG: timezone lost in groupby-agg with cython functions #15426
Comments
|
Could you try it out on a more recent version of pandas, or add a copy-pastable example so that someone else can check? Might have been fixed already. |
|
further use |
|
Actually, here's a repro:
In [63]: ts = pd.Series(pd.date_range('2016', periods=12, freq='H').tz_localize("UTC").tz_convert("US/Eastern"))
In [64]: ts
Out[64]:
0 2015-12-31 19:00:00-05:00
1 2015-12-31 20:00:00-05:00
2 2015-12-31 21:00:00-05:00
3 2015-12-31 22:00:00-05:00
4 2015-12-31 23:00:00-05:00
...
7 2016-01-01 02:00:00-05:00
8 2016-01-01 03:00:00-05:00
9 2016-01-01 04:00:00-05:00
10 2016-01-01 05:00:00-05:00
11 2016-01-01 06:00:00-05:00
dtype: datetime64[ns, US/Eastern]
In [65]: ts.groupby(level=0).agg(np.min)
Out[65]:
0 2016-01-01 00:00:00-05:00
1 2016-01-01 01:00:00-05:00
2 2016-01-01 02:00:00-05:00
3 2016-01-01 03:00:00-05:00
4 2016-01-01 04:00:00-05:00
...
7 2016-01-01 07:00:00-05:00
8 2016-01-01 08:00:00-05:00
9 2016-01-01 09:00:00-05:00
10 2016-01-01 10:00:00-05:00
11 2016-01-01 11:00:00-05:00
dtype: datetime64[ns, US/Eastern]
In [66]: ts.groupby(level=0).min()
Out[66]:
0 2016-01-01 00:00:00-05:00
1 2016-01-01 01:00:00-05:00
2 2016-01-01 02:00:00-05:00
3 2016-01-01 03:00:00-05:00
4 2016-01-01 04:00:00-05:00
...
7 2016-01-01 07:00:00-05:00
8 2016-01-01 08:00:00-05:00
9 2016-01-01 09:00:00-05:00
10 2016-01-01 10:00:00-05:00
11 2016-01-01 11:00:00-05:00
dtype: datetime64[ns, US/Eastern]
my thought too, but |
|
I think the expected output there is identical to the input (since the index is already unique). |
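For anyone re-checking this on a newer pandas, here is a minimal sketch of that expectation as assertions. It rebuilds the same series (passing `freq` as a `Timedelta` and `tz="UTC"` directly are my choices, not part of the original repro) and uses the string alias `"min"` in place of `np.min`, on the assumption that both route to the same built-in groupby reduction. On an affected version the assertions should fail.

```python
import pandas as pd

# Same shape as the repro: 12 hourly stamps created in UTC, shown in US/Eastern.
idx = pd.date_range("2016-01-01", periods=12, freq=pd.Timedelta(hours=1), tz="UTC")
ts = pd.Series(idx.tz_convert("US/Eastern"))

# The index is already unique, so every group holds exactly one row and
# the per-group minimum should simply be that row.
via_method = ts.groupby(level=0).min()
via_agg = ts.groupby(level=0).agg("min")

assert list(via_method) == list(ts)
assert list(via_agg) == list(ts)
assert via_method.dtype == ts.dtype
```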
TomAugspurger
added Bug Groupby Timezones
labels
Feb 16, 2017
|
|
dupe of this: #10668 though I like this example. |
jreback
closed this
Feb 16, 2017
jreback
added the
Duplicate
label
Feb 16, 2017
jreback
added this to the
No action
milestone
Feb 16, 2017
|
actually, let's leave this one open instead. |
jreback
reopened this
Feb 16, 2017
jreback
removed the
Duplicate
label
Feb 16, 2017
jreback
modified the milestone: 0.20.0, No action
Feb 16, 2017
jreback
referenced
this issue
Feb 16, 2017
Closed
Taking first row from each group in groupby sometimes strips tzinfo #10668
jreback
changed the title from
Odd timezone behavior using groupby/agg in pandas to BUG: timezone lost in groupby-agg with cython functions
Feb 16, 2017
jreback
added Difficulty Intermediate Effort Medium
labels
Feb 16, 2017
|
@munierSalem if you'd like to debug, that would be great! The groupby tz support is a bit buggy. Basically, since these are converted to i8 under the hood to actually do the operations, we need to:
roughly here: |
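The list of steps did not survive on this page, so the following is only a sketch of the general idea and not the actual patch: it shows, with public pandas and NumPy calls, how a tz-aware column can make a round trip through i8 and where the timezone can go wrong on the way back. The variable names are illustrative. The `wrong` result reproduces the symptom in the repro above (UTC wall-clock times labelled as Eastern).

```python
import numpy as np
import pandas as pd

tz = "US/Eastern"
idx = pd.date_range("2016-01-01", periods=4, freq=pd.Timedelta(hours=1), tz="UTC")
ts = pd.Series(idx.tz_convert(tz))

# Going in: drop the timezone and view the UTC instants as plain integers.
utc_naive = ts.dt.tz_convert("UTC").dt.tz_localize(None).to_numpy()
i8 = utc_naive.astype("int64")

# The reduction itself only ever sees integers.
result_i8 = np.array([i8.min()])

# Coming back: the integers are UTC instants, so they must be localized
# to UTC first and only then converted to the original timezone.
restored = pd.Series(result_i8.astype(utc_naive.dtype))
correct = restored.dt.tz_localize("UTC").dt.tz_convert(tz)

# Localizing straight to the target timezone keeps the UTC wall-clock
# time and just attaches the Eastern offset to it.
wrong = restored.dt.tz_localize(tz)

print(correct.iloc[0])
print(wrong.iloc[0])
```

Both results carry the `US/Eastern` dtype, which is why the output in the report looks plausible at first glance; only the underlying instants differ.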
munierSalem
commented
Feb 16, 2017
|
@jreback I can fix in my local repo, but I'll need to wait to do so from home to push back ... working behind a draconian corporate firewall :( |
|
sure np |
stephenrauch
referenced
this issue
Feb 16, 2017
Closed
BUG: GH15426 timezone lost in groupby-agg with cython functions #15433
jreback
closed this
in 6c17f67
Feb 27, 2017
AnkurDedania
added a commit
to AnkurDedania/pandas
that referenced
this issue
Mar 21, 2017
|
|
stephenrauch + AnkurDedania |
8d90d6c
|
munierSalem commented Feb 16, 2017
•
edited by jreback
xref #10668 (for more examples)
Hello!
I'm running into some odd behavior trying to group rows of a pandas dataframe by ID and then selecting out max/min datetimes (w/ timezones). This is with python 2.7, pandas 0.18.1 and numpy 1.11.1 (I saw in earlier posts a similar problem was apparently fixed w/ pandas 0.15).
Specifically, if I try:
print orders.groupby('OrderID')['start_time'].agg(np.min).iloc[:5]
I get:
The raw data had times closer to 8 am (US/Eastern). In other words, the values reverted to UTC wall-clock times, even though the output labels them as Eastern time with a UTC-4 offset.
But if I instead try:
print orders.groupby('OrderID')['start_time'].agg(lambda x: np.min(x)).iloc[:5]
I now get:
Which is the behavior I intended. This second method is vastly slower, and I would have assumed the two approaches would yield identical results ...
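A small self-contained way to compare the two approaches is sketched below. The `orders` frame here is made-up data (the real one is not shown in this report); only the column names `OrderID` and `start_time` come from the snippets above. It uses `.min()` for the fast path, assuming it goes through the same built-in reduction as `agg(np.min)`. On a pandas version without the bug, both paths should agree and keep the 8 am Eastern times.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real orders data.
orders = pd.DataFrame({
    "OrderID": [1, 1, 2, 2],
    "start_time": pd.to_datetime(
        ["2016-06-01 08:00", "2016-06-01 08:05",
         "2016-06-01 08:10", "2016-06-01 08:20"]
    ).tz_localize("US/Eastern"),
})

grouped = orders.groupby("OrderID")["start_time"]

fast = grouped.min()                      # built-in (fast) reduction
slow = grouped.agg(lambda x: np.min(x))   # Python-level call per group

print(fast)
print(slow)
```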