QST: Roadmap for deprecations of Period types #56588

Open
2 tasks done
ChadFulton opened this issue Dec 21, 2023 · 8 comments
Labels
Needs Info Clarification about behavior needed to assess issue Period Period data type Usage Question

Comments

@ChadFulton

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

NA

Question about pandas

Background

There have been several issues raised related to the Period types, such as:

And the latter deprecation of the business-day period has already been implemented.

This is of course related to the enhancement allowing non-ns units to datetimes, see e.g.:

Questions

  1. Given that deprecations of Period types have already begun, it would be useful to understand the expected roadmap for Period. Is it expected to be removed, as suggested in DEPR: Period, PeriodFoo #54235? Is there a plan for how to move forward on the "sticking points" listed there (especially the missing units: Week-with-anchor, quarter, Year-with-anchor)?

  2. The current deprecation of business-day periods in DEPR: Period[B] #53446 now requires bifurcating code between Period and Timestamp in the special case of business days: if you are using Periods, you can't switch wholesale to Timestamps yet (at least if you have e.g. Quarters), but you can no longer stick with just Periods either.

Tentative request

To me, it would make sense to revert the deprecation of the BDay Period dtype until there is a more comprehensive roadmap for Period types and a path to make a more wholesale switch. But my apologies if I missed something fundamental here about why that deprecation is important in itself.

@ChadFulton ChadFulton added Needs Triage Issue that has not been reviewed by a pandas team member Usage Question labels Dec 21, 2023
@lithomas1
Member

cc @jbrockmendel

@lithomas1 lithomas1 added the Period Period data type label Dec 21, 2023
@jbrockmendel
Member

I don't think there is a clear roadmap for deprecating Period entirely; not much interest in #54235. We did recently deprecate support for PeriodIndex in resample.

@ChadFulton can you elaborate a bit on how the deprecation of Period[B] is a pain point?

@lithomas1 lithomas1 added Needs Info Clarification about behavior needed to assess issue and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 22, 2023
@ChadFulton
Author

Thanks @jbrockmendel. Currently I use Period objects almost exclusively, since for the economic data that I mostly work with, it doesn't make sense to have a time component. And I have to say that Pandas does a really excellent job of handling Periods, so a big thanks for all the work that goes into that.

So one practical example of how I use the information in Period: when plotting a line of data associated with a period, there are different conventions for where to place the mark, with the most common being start, end, and center. A toy example of this would be something like:

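import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

ix = pd.period_range(start='2000-01-01', end='2000-01-31', freq='B')
dta = pd.Series(np.arange(len(ix)), index=ix)

# Place each mark at the midpoint of its period
center = ix.start_time + (ix.end_time - ix.start_time) / 2
plt.plot(center, dta)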

Overall, my hope is that we don't lose Periods, because I think that they represent a distinct and useful concept that is not captured by non-ns units for datetime. But I understand that it's a maintenance burden.

So more specifically here, my general feeling was just that because Period is not the same as "Timestamp + non-ns unit" (with one simple example being start_time and end_time attributes), it didn't necessarily make sense to me to remove the freq='B' Periods (at least in isolation, but also, I hope, at all 😊).
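To illustrate the start_time and end_time point with a frequency that is not deprecated, here is a minimal sketch using a quarterly Period. It is an illustration only, not code from the thread; the variable names are hypothetical, and it assumes a pandas version (2.0 or later) in which Timestamp carries no span attributes.

```python
import pandas as pd

# A quarterly Period spans an interval, so it knows both of its edges.
q = pd.Period("2000Q1", freq="Q")
start, end = q.start_time, q.end_time

# Midpoint of the span, using the same arithmetic as the plotting example.
center = start + (end - start) / 2

# A Timestamp marks a single instant and has no span edges to query.
ts = pd.Timestamp("2000-01-01")
timestamp_has_span = hasattr(ts, "start_time")

print(start, end, center, timestamp_has_span)
```

The start, end, and center conventions all fall out of the two edges, which is the information a bare Timestamp does not carry.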

@jbrockmendel
Member

Thanks for fleshing this out. The plotting example is useful. In fact, the dt64 plotting code currently converts to Period internally, which we need to change before the Period[B] deprecation can be enforced. That is a non-trivial rewrite that isn't likely to happen without funding, so it may not happen for 3.0.

it didn't necessarily make sense to me to remove the freq='B' Periods (at least in isolation, but also, I hope, at all 😊)

"B" is an outlier in the Period code; getting rid of it will allow non-trivial (though not-huge) code simplifications. More importantly from my perspective, getting rid of it is a blocker to moving Period from using freq to unit, which I'm hopeful will put a stop to a long history of user confusion (see issues listed in #53446). So deprecating "B" is not necessarily a prelude to removing all Periods.

The immediate question is whether to revert that deprecation. Plotting is one use case that is inconvenienced here. Are there others?

(The rest here is not super-relevant, just things that came to mind while reading your comment)

because I think that they represent a distinct and useful concept that is not captured by non-ns units for datetime

This wording reminds me of a years-old disagreement I used to have with jreback. My position was that for units where we have both, Timestamp[unit] is isomorphic to Period[unit]. IIRC the context for that disagreement was a request for a date dtype where I thought Period[D] handled the use case just fine. (Not sure if this paragraph is relevant to this discussion, but it came to mind.)

with one simple example being start_time and end_time attributes

With the introduction of non-nano, there is now a question of what datetime64/Timestamp unit to return for Period properties. Keeping it as nanos is fine in most cases, but ATM that will raise for very-large Periods, which can be alleviated by returning a lower-resolution unit. This is a rare enough corner case that I haven't been interested in bothering with it.
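A small sketch of the corner case described here, for readers who want to try it. It is an illustration under assumptions, not a statement of what any given pandas version does: the year-3000 Period and the variable names are hypothetical, and the code records whether the conversion fails instead of assuming that it does.

```python
import pandas as pd

# Periods can represent spans well outside the nanosecond Timestamp range.
far_future = pd.Period("3000-01-01", freq="D")

# Upper Timestamp bound as reported by pandas (nanosecond-based in pandas 2.x).
ns_limit = pd.Timestamp.max

# Converting the Period edge to a Timestamp may fail if the result is
# forced to nanoseconds; the outcome is recorded rather than assumed.
try:
    edge = far_future.start_time
    conversion_failed = False
except (pd.errors.OutOfBoundsDatetime, OverflowError):
    edge = None
    conversion_failed = True

print(far_future, ns_limit, conversion_failed)
```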

@ChadFulton
Author

Thanks again for your thoughts on this. Following up with a second specific use case that may help the conversation:

The immediate question is whether to revert that deprecation. Plotting is one use case that is inconvenienced here. Are there others?

Another use case that shows up as a problem is that Timestamp elements do not allow for a freq attribute and 'B' is not a valid unit. So, as far as I can tell, there is currently (i.e. after the deprecation) no way to identify a single object as having a 'B' frequency.

This leads to several secondary problems related to calling reset_index() or to_series() on a DatetimeIndex and losing the frequency:

import pandas as pd

index = pd.date_range('2000', periods=10, freq='B', name='B_index')
x = pd.DataFrame(0, columns=['a', 'b'], index=index)
print(x.index.freq)
# Round-trip the index through a regular column
y = x.reset_index().set_index('B_index')
print(y.index.freq)

yields

<BusinessDay>
None

while when using Periods:

index = pd.period_range('2000', periods=10, freq='B', name='B_index')
x = pd.DataFrame(0, columns=['a', 'b'], index=index)
print(x.index.freq)
y = x.reset_index().set_index('B_index')
print(y.index.freq)

yields

<BusinessDay>
<BusinessDay>

Of course this same behavior would happen with any frequency, not just 'B', but my point is just that with any other frequency I could simply use the PeriodIndex to ensure the frequency information is not lost. With this deprecation, that is no longer possible with 'B' frequencies.
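For comparison, one possible way to re-attach the frequency on the DatetimeIndex side is sketched below. This is an assumption-laden workaround, not something proposed in the thread: it relies on pd.infer_freq, which needs a regular, gap-free index with at least three entries, and it does not address the concern that a single Timestamp carries no frequency. The names inferred and restored are hypothetical.

```python
import pandas as pd

index = pd.date_range("2000", periods=10, freq="B", name="B_index")
x = pd.DataFrame(0, columns=["a", "b"], index=index)

# The round trip drops the frequency, as in the example above.
y = x.reset_index().set_index("B_index")

# Re-derive the frequency from the values; this only works for a regular,
# gap-free index with at least three entries.
inferred = pd.infer_freq(y.index)

# Conform to that frequency; the resulting index carries it again.
restored = y.asfreq(inferred)

print(y.index.freq, inferred, restored.index.freq)
```

With irregular or very short data the inference step returns None or raises, which is the situation where a PeriodIndex would have kept the information without any extra work.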

@ChadFulton
Author

Also, related to the eventual freq -> unit conversion, I would argue that until this is complete, deprecations like #9586 are also perhaps not ideal since they make the behavior of the freq attribute inconsistent:

For example:

import pandas as pd
ix = pd.date_range(start='2000', periods=10, freq='YE')
ix.to_period()
# ^ works and gives Y-DEC

pd.period_range(start='2000', periods=10, freq=ix.freq)
# ^ works and gives Y-DEC

pd.period_range(start='2000', periods=10, freq=ix.freqstr) 
# ^ raises ValueError: Invalid frequency: YE-DEC, failed to parse with error
#          message: ValueError("for Period, please use 'Y-DEC' instead of 'YE-DEC'")
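Until the aliases are unified, one way to bridge the two spellings is to let pandas do the translation through to_period() and read the alias back from the result. The sketch below is illustrative only: period_alias is a hypothetical name, and the index is built from an offset object so that the snippet does not depend on which string alias a given pandas version accepts.

```python
import pandas as pd

# Year-end DatetimeIndex built from an offset object; on pandas 2.2 its
# freqstr is the Timestamp-side alias reported above ('YE-DEC').
ix = pd.date_range(start="2000", periods=3, freq=pd.offsets.YearEnd())

# Let pandas translate to the Period-side spelling via to_period().
period_alias = ix.to_period().freqstr

# The Period-side alias is accepted by period_range.
periods = pd.period_range(start="2000", periods=3, freq=period_alias)

print(ix.freqstr, period_alias, periods)
```

This avoids hand-mapping alias strings, at the cost of an extra conversion whenever a frequency string crosses from the Timestamp side to the Period side.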

@jerryqhyu

Are Period and PeriodIndex being deprecated or not? I've gone through the various issues on this topic and do not have a good idea of where the core devs stand on this issue. If I start a new project today, should I avoid Period?

@jbrockmendel
Member

Period and PeriodIndex are not deprecated and unlikely to be.
