PERF: Speed up Period construction #50149

Merged: 12 commits, Jan 18, 2023

Member

@lithomas1 lithomas1 commented Dec 9, 2022

ASVs.

       before           after         ratio
     [1d5ce5b3]       [b47aa45a]
     <main>           <period-speedup>
-     8.18±0.06ms       5.50±0.1ms     0.67  period.PeriodIndexConstructor.time_from_ints_daily('D', False)
-     8.06±0.06ms       5.36±0.1ms     0.67  period.PeriodIndexConstructor.time_from_ints_daily('D', True)
-     8.94±0.08ms      5.31±0.04ms     0.59  period.PeriodIndexConstructor.time_from_ints('D', False)
-     8.77±0.09ms       5.20±0.1ms     0.59  period.PeriodIndexConstructor.time_from_ints('D', True)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
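
As a quick sanity check on the table above, the ratio column can be reproduced from the rounded timings. This is only a sketch: the numbers are copied from the ASV output, and the dictionary keys are shortened labels, not real ASV identifiers.

```python
# Timings in milliseconds copied from the ASV table: (before, after).
timings = {
    "time_from_ints_daily('D', False)": (8.18, 5.50),
    "time_from_ints_daily('D', True)": (8.06, 5.36),
    "time_from_ints('D', False)": (8.94, 5.31),
    "time_from_ints('D', True)": (8.77, 5.20),
}

# The ratio column is after / before, so values below 1 mean a speedup.
ratios = {
    name: round(after / before, 2)
    for name, (before, after) in timings.items()
}

for name, ratio in ratios.items():
    print(f"{name}: ratio={ratio:.2f}")
```

A ratio of 0.59 to 0.67 corresponds to roughly a third less time per construction call in these benchmarks.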

parse_time_string already calls string_to_dts (parse_iso_8601_datetime), which I think is the only function that can currently parse nanoseconds (dateutil can't do it, IIRC).
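
To illustrate why nanosecond strings need a dedicated parser, here is a minimal stdlib-only sketch. The helper `split_fraction` is hypothetical and is not part of pandas or dateutil; it only shows that a standard `datetime` has no place to store digits beyond microseconds.

```python
from datetime import datetime


def split_fraction(frac_digits: str) -> tuple[int, int]:
    """Split fractional-second digits into (microseconds, leftover nanoseconds).

    Hypothetical helper for illustration only; digits past the ninth are dropped.
    """
    padded = frac_digits.ljust(9, "0")[:9]
    return int(padded[:6]), int(padded[6:])


us, ns = split_fraction("123456789")
dt = datetime(2022, 12, 9, 12, 0, 0, us)

# datetime stores at most microseconds, so the trailing nanoseconds are lost.
print(dt.isoformat(), "leftover ns:", ns)
print("has nanosecond attribute:", hasattr(dt, "nanosecond"))
```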

@lithomas1 lithomas1 added the Performance and Period labels Dec 9, 2022
if dt is NaT:
    ordinal = NPY_NAT
    # Doesn't matter what this is, we just need to have it
Member

Instead of assigning an arbitrary value here, would it make more sense to just not have it fail later on?

@@ -421,6 +421,10 @@ cdef parse_datetime_string_with_reso(
parsed = datetime(
dts.year, dts.month, dts.day, dts.hour, dts.min, dts.sec, dts.us
)
if out_bestunit == NPY_DATETIMEUNIT.NPY_FR_ns:
Member

How does this impact the period construction? It feels a bit strange to reassign this variable.

if out_bestunit == NPY_DATETIMEUNIT.NPY_FR_ns:
    # No picoseconds, so no nanoseconds.
    # Have to have seen microseconds, in order to have "seen" nanoseconds
    out_bestunit = NPY_DATETIMEUNIT.NPY_FR_us
Member

I think doing this will mess up Timestamp inference in a follow-up to #49737.

Member Author

@lithomas1 lithomas1 Dec 12, 2022

Yeah, I only added this block because CI was somehow failing tests only on Linux (maybe a locale problem).
(The issue was that a datetime object was returned when reso was nanosecond, and datetimes don't have nanosecond support.)

if dts.ps != 0 or out_local:
seems buggy.

IMO, we should be checking out_bestunit == NPY_DATETIMEUNIT.NPY_FR_ns. I don't know if that will break anything, though.
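
One way the two conditions could disagree is sketched below in plain Python. Everything here is an assumption made for illustration: `parse_fraction` is a hypothetical stand-in for the C parser, the best unit is assumed to be inferred from the number of fractional digits written, and the two predicates only mirror the shape of the conditions quoted above.

```python
def parse_fraction(frac_digits: str) -> tuple[int, int, str]:
    """Return (us, ps, best_unit) for a fractional-seconds string.

    Hypothetical stand-in for the C parser: us holds the first six digits,
    ps holds the remainder in picoseconds, and best_unit is guessed from
    how many digits were written (an assumption of this sketch).
    """
    padded = frac_digits.ljust(12, "0")[:12]
    us, ps = int(padded[:6]), int(padded[6:])
    n = len(frac_digits)
    best_unit = "s" if n == 0 else "ms" if n <= 3 else "us" if n <= 6 else "ns"
    return us, ps, best_unit


def needs_timestamp_old(ps: int, out_local: bool) -> bool:
    # Shape of the original check: look at the picosecond field.
    return ps != 0 or out_local


def needs_timestamp_new(best_unit: str, out_local: bool) -> bool:
    # Shape of the proposed check: look at the inferred resolution.
    return best_unit == "ns" or out_local


for frac in ("123456", "123456789", "123456000"):
    us, ps, unit = parse_fraction(frac)
    old = needs_timestamp_old(ps, False)
    new = needs_timestamp_new(unit, False)
    print(f"frac={frac} unit={unit} old={old} new={new}")
```

Under these assumptions, a string written with nine fractional digits that end in zeros has nanosecond resolution but a zero picosecond field, so only the resolution-based check picks the nanosecond-aware path.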

Member Author

OK, I've updated this block to check the bestunit.

@jbrockmendel @WillAyd Would this work fine?

@@ -386,7 +386,7 @@ cdef parse_datetime_string_with_reso(
&out_tzoffset, False
)
if not string_to_dts_failed:
- if dts.ps != 0 or out_local:
+ if out_bestunit == NPY_DATETIMEUNIT.NPY_FR_ns or out_local:
Member

if out_bestunit == NPY_DATETIMEUNIT.NPY_FR_ns is handled here, should it be removed from the dict on L399-L409?

Member Author

No, both blocks go through the dict to get the string representation of the reso.

This is just to force the returned datetime object to be a Timestamp, which has a nanosecond attribute.
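
The point can be sketched in plain Python as follows. The `Unit` enum, `RESO_NAMES` dict, and `describe` function are hypothetical stand-ins, not the actual names in parsing.pyx; the sketch only shows that the object type and the resolution string are chosen independently, so the nanosecond entry is still needed in the lookup.

```python
from enum import Enum


class Unit(Enum):
    """Hypothetical stand-in for the NPY_DATETIMEUNIT values used here."""

    us = "us"
    ns = "ns"


# Hypothetical lookup from unit to resolution string.
RESO_NAMES = {Unit.us: "microsecond", Unit.ns: "nanosecond"}


def describe(best_unit: Unit, out_local: bool = False) -> tuple[str, str]:
    # The branch only decides which kind of object would be returned ...
    kind = "Timestamp" if best_unit is Unit.ns or out_local else "datetime"
    # ... while both branches still look up the resolution string.
    return kind, RESO_NAMES[best_unit]


print(describe(Unit.us))
print(describe(Unit.ns))
```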

@jbrockmendel
Member

Are the improvements in parsing.pyx independent of the improvements in period.pyx?

Member

@WillAyd WillAyd left a comment

LGTM except for @jbrockmendel's comments. Nice work!

@jbrockmendel
Member

needs rebase, otherwise LGTM

@lithomas1
Member Author

Thanks for the reviews.

This PR depends on #50417, though, so someone needs to review/re-review there.

@jbrockmendel
Member

ping on green

@lithomas1 lithomas1 merged commit d2d1797 into pandas-dev:main Jan 18, 2023
@lithomas1 lithomas1 deleted the period-speedup branch January 18, 2023 19:25
Labels
Performance (memory or execution speed performance), Period (Period data type)
Development

Successfully merging this pull request may close these issues.

PERF: Period constructor parses input twice
3 participants