DEPR: resample with PeriodIndex #58021

Open. jbrockmendel wants to merge 2 commits into base: main

jbrockmendel
Member

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@mroeschke added the Period (Period data type), Resample (resample method), and Clean labels on Mar 27, 2024
@mroeschke
Member

Based on #57033 and #56588, should we consider waiting on this deprecation?

@jbrockmendel
Member Author

I don't think #56588 is relevant; I will take a closer look at #57033.

@ChadFulton

In line with my comments on those issues, I think it would be great if we could slow down these deprecations. There are several people who are interested in this issue, including me, and so I wonder if we could step back and see if there is now a path forward to fixing period resampling rather than removing it (and without just putting the burden on you to fix it).

I would like to get into a position to help by understanding more about what @jbrockmendel meant when he described the downsampler as very broken.

For example, #53481 points out that nunique gives wrong answers when downsampling (or even when passing the same frequency):

import pandas as pd

pi = pd.period_range(start="2000", periods=3, freq="M")
ser = pd.Series(["foo", "bar", "baz"], index=pi)
rs = ser.resample("M")  # should be a no-op!

>>> rs.nunique()
2000-01    foo
2000-02    bar
2000-03    baz
Freq: M, dtype: object

but nonetheless the actual resample operation works as I expect it to:

>>> rs.first()
2000-01    foo
2000-02    bar
2000-03    baz
Freq: M, dtype: object

So I was thinking it would be helpful if we could start by collecting the cases where the resampler is not acting correctly, and then define what the behavior ought to be. As we do this, I can try to get up to speed with the current resampling code to help add some person-hours to fixing the problems, rather than removing the feature.
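
One way to pin down "what the behavior ought to be" is a reference computation that avoids resample entirely. The following is a minimal sketch, not pandas' implementation: `reference_nunique` is a hypothetical helper name, and it assumes each source period should be assigned to the target period that contains it (here via `PeriodIndex.asfreq`).

```python
import pandas as pd


def reference_nunique(ser: pd.Series, freq: str) -> pd.Series:
    """Count distinct values per target period without calling resample.

    Hypothetical helper for illustration only. Assumes the series has a
    PeriodIndex and that `freq` is the same or a coarser period frequency.
    """
    # Map every source period to the target-frequency period containing it.
    target = ser.index.asfreq(freq)
    # Group on the mapped periods and count distinct values in each group.
    return ser.groupby(target).nunique()


pi = pd.period_range(start="2000", periods=3, freq="M")
ser = pd.Series(["foo", "bar", "baz"], index=pi)

same_freq = reference_nunique(ser, "M")   # one distinct value per month
quarterly = reference_nunique(ser, "Q")   # all three months fall in 2000Q1

print(same_freq.tolist())
print(quarterly.tolist())
```

The result can then be compared against `ser.resample(freq).nunique()` on whichever pandas version is being investigated.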

@ChadFulton

> So I was thinking it would be helpful if we could start by collecting the cases where the resampler is not acting correctly, and then define what the behavior ought to be. As we do this, I can try to get up to speed with the current resampling code to help add some person-hours to fixing the problems, rather than removing the feature.

To be a little more explicit here, I use period indexes and resampling pretty frequently, and I do not recall a case where the resampling operation produced incorrect results. As a result, I am having a hard time getting a handle on what is so broken and therefore needs to be fixed, and I think that seeing the examples of incorrect behavior will help me "dig in" to help fix the code.

@jbrockmendel
Member Author

I'm totally fine with holding off on enforcing this deprecation until we get resolution in the linked discussions. LMK if you have a preference about where to continue the discussion.

> I think that seeing the examples of incorrect behavior

Easiest to point you toward two xfailed tests: test_resample_nat_index_series (reason="Don't know why this fails") and test_monthly_convention_span (reason="Commented out for more than 3 years. Should this work?").

I suspect that the problem is in PeriodResampler._downsample, where we return self.asfreq() in several cases, discarding "how" and **kwargs. That looks really sketchy.
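
To illustrate why returning the data unchanged regardless of "how" would be a problem, here is a small sketch that uses groupby as a stand-in for a same-frequency resample. It does not reproduce PeriodResampler._downsample; treating a no-op resample as equivalent to grouping on the index is an assumption made for illustration.

```python
import pandas as pd

pi = pd.period_range(start="2000", periods=3, freq="M")
ser = pd.Series(["foo", "bar", "baz"], index=pi)

# Stand-in for a shortcut that hands back the data untouched whenever the
# target frequency matches, ignoring which aggregation was requested.
passthrough = ser.copy()

# Assumed reference: group on the existing monthly index and aggregate.
first = ser.groupby(level=0).first()
nunique = ser.groupby(level=0).nunique()

# For first(), the shortcut happens to agree with the reference ...
print(first.tolist() == passthrough.tolist())
# ... but for nunique() it does not: the reference yields counts, not values.
print(nunique.tolist() == passthrough.tolist())
```

Under these assumptions, aggregations that return one of the original values would look correct with such a shortcut, while aggregations that compute something else (such as counts) would not.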

Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

Labels
Clean, Period (Period data type), Resample (resample method), Stale

3 participants