Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: DatetimeIndex/TimedeltaIndex constructor keywords #55499

Open
jbrockmendel opened this issue Oct 12, 2023 · 5 comments
Open

DEPR: DatetimeIndex/TimedeltaIndex constructor keywords #55499

jbrockmendel opened this issue Oct 12, 2023 · 5 comments
Labels
API Design Datetime Datetime data dtype Deprecate Functionality to remove in pandas Index Related to the Index class or subclasses

Comments

@jbrockmendel
Copy link
Member

jbrockmendel commented Oct 12, 2023

DatetimeIndex.__new__ has constructor keywords (excluding ones present in the base class and ones already deprecated) freq, tz, ambiguous, dayfirst, yearfirst. dayfirst and yearfirst overlap with the pd.to_datetime keywords.

Could we deprecate some/all of these keywords and point users to pd.to_datetime instead?

  • freq we could either exempt or make a post-construction thing by making _with_freq public.
  • to_datetime has utc instead of tz. I generally like tz better than utc as a keyword, but deprecating utc might be painful. We could add tz to to_datetime and disallow passing tz-and-utc.
  • Adding ambiguous to to_datetime wouldn't be too bad aside from general keyword fatigue.

(similarly TimedeltaIndex has unit and freq, unit being present in pd.to_timedelta)

cc @MarcoGorelli @mroeschke @jorisvandenbossche

@jbrockmendel jbrockmendel added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 12, 2023
@lithomas1 lithomas1 added Datetime Datetime data dtype API Design Deprecate Functionality to remove in pandas Index Related to the Index class or subclasses and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 12, 2023
@MarcoGorelli
Copy link
Member

sounds good, would favour moving towards to_datetime as being the way to parse into datetime

@mroeschke
Copy link
Member

Agreed, would be simpler if DTI/TDI just accepted correctly typed arrays

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Oct 20, 2023

See some related notes in #50894 and #50411 (comment), where it is also brought to let DatetimeIndex only accept ISO8601 strings, and the first issue also has some notes on differences in keywords between DatetimeIndex and to_datetime.

There is the more general question about what to support as input (only correctly typed arrays, still strings but only ISO formatted, anything that works with the Series/Index constructor when specifying the dtype, ..), although I think that for most of the keywords that's actually not that relevant.

I think dayfirst and yearfirst are solely related to string parsing, and we generally agree that users should use to_datetime for that if they want fine-grained control, so I agree those can certainly be removed.

freq we could either exempt or make a post-construction thing by making _with_freq public.

To me this feels like the one keyword that actually makes sense to keep in the DatetimeIndex constructor, since it is a DatetimeIndex specific attribute? (like you can also set name in the Index constructors)

to_datetime has utc instead of tz. I generally like tz better than utc as a keyword, but deprecating utc might be painful. We could add tz to to_datetime and disallow passing tz-and-utc.

I also like the tz keyword. From what I said in #50894 (comment), the reason that we went with utc=True/False in the past for to_datetime might be that a tz keyword can be a ambiguous on how to interpret the input data? (are it localized or utc values? i.e. do you do a tz_localize or tz_convert with the specified timezone?)

Adding ambiguous to to_datetime wouldn't be too bad aside from general keyword fatigue.

I think ambiguous is purely linked to tz (since that is essentially needed for the tz_localize logic), so I think the fate of this keyword can depend on what we decide for tz

@jbrockmendel
Copy link
Member Author

the reason that we went with utc=True/False in the past for to_datetime might be that a tz keyword can be a ambiguous on how to interpret the input data?

I've looked into this and determined the following:

In the DatetimeIndex constructor ambiguous is only used in cases where 1) the data is dt64-naive or 2) the data is object and array_to_datetime does not infer a tz. In both of these cases, DatetimeIndex(data, tz=tz, ambiguous=ambig) is equivalent to DatetimeIndex(data).tz_localize(tz, ambiguous=ambig) and also to pd.to_datetime(data).tz_localize(tz, ambiguous=ambig).

We could push ambiguous into the cython array_to_datetime_with_tz, which would allow us to respect ambiguous in some mixed-type data cases, but those cases would be pretty contrived. Moreover, we could only do this for a subset of the ambiguouss that we support, e.g. not "infer"` (allowing an arraylike would be possible but a PITA).

So i think we are safe:

  1. deprecating the ambiguous keyword in DatetimeIndex, telling users to call .tz_localize after construction instead
  2. Adding a tz keyword to to_datetime without a ambiguous keyword (mutually exclusive with utc, which we could deprecated)
  3. deprecating dayfirst, yearfirst in DatetimeIndex.

@jbrockmendel
Copy link
Member Author

Tried deprecating ambiguous, dayfirst, yearfirst in DatetimeIndex and hit an unexpected snag: when a freq is passed, we currently pass ambiguous to _validate_frequency, which appears to need it in test_resample_origin_with_day_freq_on_dst. Doesn't look insurmountable.

rapids-bot bot pushed a commit to rapidsai/cudf that referenced this issue Feb 8, 2024
For comparison:

pandas-dev/pandas#55856
pandas-dev/pandas#55895
pandas-dev/pandas#55499

The `errors="ignore"` parameter is the only one that is implemented so just added a test for that deprecation

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #14984
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Datetime Datetime data dtype Deprecate Functionality to remove in pandas Index Related to the Index class or subclasses
Projects
None yet
Development

No branches or pull requests

5 participants