Support `freq` in DatetimeIndex #14593
Conversation
Can we add some tests to this PR?
@@ -2142,6 +2141,8 @@ def __init__(
        if yearfirst is not False:
            raise NotImplementedError("yearfirst == True is not yet supported")

        self._freq = _validate_freq(freq)
While looking at adding `freq` support before, I found that some APIs manipulate `freq` (to new values) and return new results. (I vaguely remember... but I think that happens in binops?) Should we add a TODO comment here noting that this is not fully functional yet and that `freq` support needs to be added in the rest of the code-base?
Yes, although maybe the default behaviour could be for `DatetimeIndex` to infer `freq` from its values. Then this should just work.
Also, we should probably only do that in compatibility mode for perf reasons.
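The kind of inference described above can be sketched in pure Python (pandas exposes the real thing as `DatetimeIndex.inferred_freq`; this toy stand-in works on raw nanosecond integers and is not cudf's implementation):

```python
def infer_freq_ns(values_ns):
    """Return the constant step (in nanoseconds) between consecutive
    values, or None if the values are not equally spaced.

    Toy sketch of freq inference over an already-sorted datetime index
    stored as integer nanoseconds.
    """
    if len(values_ns) < 2:
        return None
    step = values_ns[1] - values_ns[0]
    # A single O(n) scan: every consecutive gap must equal the first.
    if all(b - a == step for a, b in zip(values_ns, values_ns[1:])):
        return step
    return None
```

The O(n) scan over the whole column is also why gating this behind compatibility mode, as suggested above, makes sense for performance.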
Co-authored-by: Bradley Dice <bdice@bradleydice.com>
            }
        )
    ),
    reason="Nanosecond offsets being dropped by pandas, which is "
Is this better solved by fixing the condition on the parameter, which should be "pandas < 2.0"?
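The suggested version gate could look something like the following sketch (names are illustrative, not the test suite's actual helpers; version tuples stand in for a parsed `pandas.__version__`):

```python
def nanosecond_xfail_expected(pandas_version):
    """Return True when the nanosecond-offset xfail should apply.

    Hypothetical predicate: only pandas < 2.0 drops nanosecond
    offsets, so the xfail condition would be version-gated rather
    than applied unconditionally to the parametrized case.
    """
    return pandas_version < (2, 0)
```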
I wanted to do that but it happens only for a few parameter combinations and we currently xpass/xfail strictly. That's the reason for the current approach.
I know we have two diverging approaches at the same place but I plan on dropping these in pandas-2.0 feature branch.
Okay. We can clean it up later.
@@ -463,13 +463,19 @@ class DateOffset:
    }

    _CODES_TO_UNITS = {
        "N": "nanoseconds",
I have some vague recollection that we left these out on purpose... hmm. I think there was some pandas behavior for which `"L"` and `"ms"` were okay but `"N"`, `"U"`, `"T"`, etc. were not supported. We'd probably be able to tell if there are any newly failing pandas tests? I'd just check to see where `_CODES_TO_UNITS` is used and if there are any inconsistencies with this across different APIs.
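For reference, the legacy pandas offset aliases under discussion map to units like this (the exact subset cudf's `_CODES_TO_UNITS` carries is what this thread is checking; the lookup helper here is illustrative):

```python
# Legacy (pre-pandas-2.2) single-letter offset aliases and their units.
_CODES_TO_UNITS = {
    "N": "nanoseconds",
    "U": "microseconds", "us": "microseconds",
    "L": "milliseconds", "ms": "milliseconds",
    "S": "seconds",
    "T": "minutes", "min": "minutes",
    "H": "hours",
    "D": "days",
}

def units_for(code):
    """Resolve an offset alias to its unit name, or raise ValueError."""
    try:
        return _CODES_TO_UNITS[code]
    except KeyError:
        raise ValueError(f"Unsupported offset alias: {code}") from None
```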
There were a bunch of failing tests without these changes; adding these units made the cudf pytests pass.
There is only a slight increase in pandas-pytest failures:
# This PR:
= 12094 failed, 174794 passed, 3850 skipped, 3314 xfailed, 8 xpassed, 21406 warnings, 102 errors in 1516.39s (0:25:16) =
# `branch-24.02`:
= 11607 failed, 175286 passed, 3849 skipped, 3312 xfailed, 11 xpassed, 21414 warnings, 97 errors in 1493.35s (0:24:53) =
Sounds good, thanks for checking.
Approving with a few final comments.
I am confused by some of the validation steps.
Fix the repr -- then this is good from my side.
/merge
When a `DatetimeIndex` has a fixed frequency offset, pandas defaults to it having a `.freq` attribute. Because we don't support that, we raise in pandas compatible mode. Thus, working with datetimes is practically impossible in pandas compatible mode, because so many datetime operations involve setting a datetime column as an index (resample, groupby).

This PR adds rudimentary support for the `freq` attribute.
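The pre-PR behavior described above can be modeled with a small sketch (class and error names here are hypothetical stand-ins, not cudf's actual types):

```python
class CompatibilityError(NotImplementedError):
    """Stand-in for the error raised in pandas-compatible mode."""

class DatetimeIndexSketch:
    """Toy model of an index whose .freq is only usable when set.

    Before this PR, pandas-compatible mode raised on any operation
    that needed freq; with freq stored, the attribute can be served.
    """
    def __init__(self, values, freq=None, pandas_compatible=False):
        self.values = values
        self._freq = freq
        self._pandas_compatible = pandas_compatible

    @property
    def freq(self):
        if self._freq is None and self._pandas_compatible:
            # Compat mode promises pandas semantics; without a stored
            # freq we cannot honor them, so raise instead of guessing.
            raise CompatibilityError("freq is not supported")
        return self._freq
```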