Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG/TST: fix and test for timezone drop in GroupBy.shift/bfill/ffill #27992

Merged
merged 7 commits into from Sep 9, 2019

Conversation

@noritada
Copy link
Contributor

noritada commented Aug 18, 2019

closes #19995

Timezone info is dropped in GroupBy.shift, bfill, and ffill
since index calculated by Cythonized functions are applied to
NumPy representation of values without timezone.

Could you please find and review this fix?

Many thanks,

-noritada

In [1]: import pandas as pd 
   ...: pd.__version__ 
Out[1]: '0.24.2'

In [2]: df = pd.DataFrame([ 
   ...:     ['a', pd.Timestamp('2019-01-01 00:00:00+09:00')], 
   ...:     ['b', pd.Timestamp('2019-01-01 00:05:00+09:00')], 
   ...:     ['a', None], 
   ...:     ['b', None], 
   ...:     ['a', pd.Timestamp('2019-01-01 00:20:00+09:00')], 
   ...:     ['b', pd.Timestamp('2019-01-01 00:25:00+09:00')], 
   ...: ], columns=['id', 'time']) 

In [3]: df['time'].shift(1) 
Out[3]: 
0                         NaT
1   2019-01-01 00:00:00+09:00
2   2019-01-01 00:05:00+09:00
3                         NaT
4                         NaT
5   2019-01-01 00:20:00+09:00
Name: time, dtype: datetime64[ns, pytz.FixedOffset(540)]

In [4]: df.groupby('id')['time'].shift(1) 
Out[4]: 
0                   NaT
1                   NaT
2   2018-12-31 15:00:00
3   2018-12-31 15:05:00
4                   NaT
5                   NaT
Name: time, dtype: datetime64[ns]

In [5]: df.groupby('id')['time'].bfill() 
Out[5]: 
0   2018-12-31 15:00:00
1   2018-12-31 15:05:00
2   2018-12-31 15:20:00
3   2018-12-31 15:25:00
4   2018-12-31 15:20:00
5   2018-12-31 15:25:00
Name: time, dtype: datetime64[ns]

In [6]: df.groupby('id')['time'].ffill() 
Out[6]: 
0   2018-12-31 15:00:00
1   2018-12-31 15:05:00
2   2018-12-31 15:00:00
3   2018-12-31 15:05:00
4   2018-12-31 15:20:00
5   2018-12-31 15:25:00
Name: time, dtype: datetime64[ns]
  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
@@ -120,6 +120,7 @@ Groupby/resample/rolling
- Bug in :meth:`pandas.core.groupby.DataFrameGroupBy.transform` where applying a timezone conversion lambda function would drop timezone information (:issue:`27496`)
- Bug in windowing over read-only arrays (:issue:`27766`)
- Fixed segfault in `pandas.core.groupby.DataFrameGroupBy.quantile` when an invalid quantile was passed (:issue:`27470`)
- Bug in :meth:`pandas.core.groupby.GroupBy.shift`, :meth:`pandas.core.groupby.GroupBy.bfill` and :meth:`pandas.core.groupby.GroupBy.ffill` where timezone information would be dropped (:issue:`xxxxx`)

This comment has been minimized.

Copy link
@mroeschke

mroeschke Aug 18, 2019

Member

If there is not an existing issue for this bug, use the number of this PR.

),
],
)
def test_shift_bfill_ffill_tz(data, expected_shift, expected_bfill, expected_ffill):

This comment has been minimized.

Copy link
@mroeschke

mroeschke Aug 18, 2019

Member

Can you put this test in test_groupby.py

Also you can parameterize over shift, bfill, and ffill

@noritada

This comment has been minimized.

Copy link
Contributor Author

noritada commented Aug 19, 2019

@mroeschke
Thank you very much for your reviews.
I've updated as per your suggestions.
Could you please check again?

Many thanks,

-noritada

Copy link
Member

mroeschke left a comment

Very nice. One small comment otherwise LGTM.

@@ -1882,3 +1882,69 @@ def test_groupby_axis_1(group_name):
results = df.groupby(group_name, axis=1).sum()
expected = df.T.groupby(group_name).sum().T
assert_frame_equal(results, expected)


@pytest.mark.parametrize("tz", [None, "Asia/Tokyo"])

This comment has been minimized.

Copy link
@mroeschke

mroeschke Aug 19, 2019

Member

We have a tz_naive_fixture you can use in the function signature that will parameterize over more timezones.

def test_shift_bfill_ffill_tz(tz_naive_fixture, op, expected):
    tz = tz_naive_fixture
    ...
@noritada

This comment has been minimized.

Copy link
Contributor Author

noritada commented Aug 19, 2019

@mroeschke
Thank you so much again.
I should have found this function.
Test code updated again.

Many thanks,

-noritada

@@ -2220,26 +2220,28 @@ def _get_cythonized_result(
base_func = getattr(libgroupby, how)

for name, obj in self._iterate_slices():
values = obj._data._values

This comment has been minimized.

Copy link
@WillAyd

WillAyd Aug 23, 2019

Member

Hmm I'm not sure relying on underlying block values is the best way to go about this. Is it possible to just work with the obj here instead?

This comment has been minimized.

Copy link
@noritada

noritada Sep 9, 2019

Author Contributor

Sorry for my lack of understanding about underlying block values.
Should I leave the code using obj.values except for the last result = algorithms.take_nd(obj.values, result)?

@WillAyd

This comment has been minimized.

Copy link
Member

WillAyd commented Aug 23, 2019

Can you update whatsnew for 0.25.2 as well?

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

TomAugspurger commented Aug 23, 2019

Copy link
Contributor

jreback left a comment

looks good. can you do an issue search to see if we have anything about this (IIRC we do). if so, pls reference these issues.

@@ -120,6 +120,7 @@ Groupby/resample/rolling
- Bug in :meth:`pandas.core.groupby.DataFrameGroupBy.transform` where applying a timezone conversion lambda function would drop timezone information (:issue:`27496`)
- Bug in windowing over read-only arrays (:issue:`27766`)
- Fixed segfault in `pandas.core.groupby.DataFrameGroupBy.quantile` when an invalid quantile was passed (:issue:`27470`)
- Bug in :meth:`pandas.core.groupby.GroupBy.shift`, :meth:`pandas.core.groupby.GroupBy.bfill` and :meth:`pandas.core.groupby.GroupBy.ffill` where timezone information would be dropped (:issue:`27992`)

This comment has been minimized.

Copy link
@jreback

jreback Sep 8, 2019

Contributor

can you move to 0.25.2

@jreback jreback added this to the 0.25.2 milestone Sep 8, 2019
@jreback

This comment has been minimized.

Copy link
Contributor

jreback commented Sep 8, 2019

also pls merge master

@noritada

This comment has been minimized.

Copy link
Contributor Author

noritada commented Sep 9, 2019

@WillAyd @TomAugspurger @jreback
Thanks for comments and sorry to be absent for 3 weeks.
I've merged master, move a whatsnew entry to v0.25.2 and document #19995 .

Could you please check this again?

Many thanks,

-noritada

@jreback
jreback approved these changes Sep 9, 2019
@jreback jreback merged commit e0c63b4 into pandas-dev:master Sep 9, 2019
13 checks passed
13 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
pandas-dev.pandas Build #20190909.8 succeeded
Details
pandas-dev.pandas (Checks) Checks succeeded
Details
pandas-dev.pandas (Docs) Docs succeeded
Details
pandas-dev.pandas (Linux py35_compat) Linux py35_compat succeeded
Details
pandas-dev.pandas (Linux py36_32bit) Linux py36_32bit succeeded
Details
pandas-dev.pandas (Linux py36_locale_slow) Linux py36_locale_slow succeeded
Details
pandas-dev.pandas (Linux py36_locale_slow_old_np) Linux py36_locale_slow_old_np succeeded
Details
pandas-dev.pandas (Linux py37_locale) Linux py37_locale succeeded
Details
pandas-dev.pandas (Linux py37_np_dev) Linux py37_np_dev succeeded
Details
pandas-dev.pandas (Windows py36_np15) Windows py36_np15 succeeded
Details
pandas-dev.pandas (Windows py37_np141) Windows py37_np141 succeeded
Details
pandas-dev.pandas (macOS py35_macos) macOS py35_macos succeeded
Details
@jreback

This comment has been minimized.

Copy link
Contributor

jreback commented Sep 9, 2019

thanks @noritada

@meeseeksdev

This comment has been minimized.

Copy link

meeseeksdev bot commented Sep 9, 2019

Owee, I'm MrMeeseeks, Look at me.

There seem to be a conflict, please backport manually. Here are approximate instructions:

  1. Checkout backport branch and update it.
$ git checkout 0.25.x
$ git pull
  1. Cherry pick the first parent branch of the this PR on top of the older branch:
$ git cherry-pick -m1 e0c63b4cfaa821dfe310f4a8a1f84929ced5f5bd
  1. You will likely have some merge/cherry-pick conflict here, fix them and commit:
$ git commit -am 'Backport PR #27992: BUG/TST: fix and test for timezone drop in GroupBy.shift/bfill/ffill'
  1. Push to a named branch :
git push YOURFORK 0.25.x:auto-backport-of-pr-27992-on-0.25.x
  1. Create a PR against branch 0.25.x, I would have named this PR:

"Backport PR #27992 on branch 0.25.x"

And apply the correct labels and milestones.

Congratulation you did some good work ! Hopefully your backport PR will be tested by the continuous integration and merged soon!

If these instruction are inaccurate, feel free to suggest an improvement.

@TomAugspurger

This comment has been minimized.

Copy link
Contributor

TomAugspurger commented Sep 9, 2019

@jreback I think the plan was for 0.25.2 to just be Python 3.8 compat and 0.25 regressions: #28110. I don't believe this was a regression in 0.25.

@jreback

This comment has been minimized.

Copy link
Contributor

jreback commented Sep 9, 2019

ok sure np

alright then prob just need to move the actual release note

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.