Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Make 'D' & offsets.Day always operate as calendar day instead of 24 Hour #41943

Open
2 tasks
mroeschke opened this issue Jun 11, 2021 · 4 comments · May be fixed by #55502
Open
2 tasks

API: Make 'D' & offsets.Day always operate as calendar day instead of 24 Hour #41943

mroeschke opened this issue Jun 11, 2021 · 4 comments · May be fixed by #55502
Labels
API Design Enhancement Frequency DateOffsets Needs Discussion Requires discussion from core team before further action
Milestone

Comments

@mroeschke
Copy link
Member

mroeschke commented Jun 11, 2021

From 2021-06-09 dev call, new discussion of #22864

Problem: Frequency string 'D' and pd.offsets.Day is defined to be a fixed 24 hour period since it's a subclass of Tick.

In the context of timezones with a DST crossing, 'D' acts as a calendar day (23/24/25H) instead for the following operations:

  • pd.date_range(start, end, freq="D")
  • df.resample("D")...

Original Settled Solution: Deprecate all behavior where Frequency string 'D' and pd.offsets.Day is a fixed 24 hours in favor of "24H". A private _Day offset would be used where appropriate internally and swapped out once the deprecation is enforced.

(Note: I lost steam last time catching all the warnings issued in the testing suite given the above solution touches datetimes, timedeltas, offsets, methods, etc)

Other Solutions

  1. Implement a new offset, e.g. "'DayDST'"/pd.offsets.CalendarDay", that users can migrate to.
  2. Deprecate Ticks (Day is a subclass) all together since they are redundant with Timedeltas

cc @pandas-dev/pandas-core

(@jbrockmendel) Updating with checkboxes to keep track of issues I think this would resolve:

@jbrockmendel
Copy link
Member

Giving this another shot, still have 563 failing tests on the current branch. Some points to figure out and/or decide:

  1. ATM multiplication/division with a Day can return a sub-day Tick. What do we do if Day ceases to be a Tick?

  2. What do we do when adding a Timedelta-like to a Day?

  3. Do we allow adding Day to TimedeltaArray/Index by treating it as 24H? If we do that might make dt64tz + day + tdi behavior ambiguous

  4. In Rolling._validate we do freq.nanos which raises if we make Day non-tick. Do we check for Day and convert it to 24H in this case? Or will that cause problems in tzaware cases?

  5. ATM Tick comparison considers Day(1) == Hour(24) and similar with other Tick subclasses. If Day is no longer a Tick, then Day(1) != Hour(24) and many existing tests break. Do we want to update the tests or patch __eq__. (For dev purposes ive patched __eq__ calls inside tm, am assuming this isnt a long-term soluton)

  6. when passing a freq to TimedeltaIndex, or timedelta_range, I think disallow Day objects but if users pass "D"-like strings we can call that close enough and cast that to 24H.

Will update this as I go.

@mroeschke
Copy link
Member Author

Not sure if #22864 (comment) provides any answers to the above

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Mar 13, 2023

What's the suggestion for when the result would fall on an ambiguous or non-existent timestamp?

Like

Timestamp('2015-03-28T2:30').tz_localize('Europe/Warsaw') + pd.tseries.offsets.Day(1)

Just raise?

@jbrockmendel
Copy link
Member

i think so, yes.

@jbrockmendel jbrockmendel linked a pull request Oct 12, 2023 that will close this issue
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Enhancement Frequency DateOffsets Needs Discussion Requires discussion from core team before further action
Projects
None yet
3 participants