Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: Speed up Period construction #50149

Merged
merged 12 commits into from
Jan 18, 2023
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.0.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -680,6 +680,7 @@ Performance improvements
- Performance improvement in :func:`merge` when not merging on the index - the new index will now be :class:`RangeIndex` instead of :class:`Int64Index` (:issue:`49478`)
- Performance improvement in :meth:`DataFrame.to_dict` and :meth:`Series.to_dict` when using any non-object dtypes (:issue:`46470`)
- Performance improvement in :func:`read_html` when there are multiple tables (:issue:`49929`)
- Performance improvement in :class:`Period` constructor when constructing from a string or integer (:issue:`38312`)

.. ---------------------------------------------------------------------------
.. _whatsnew_200.bug_fixes:
Expand Down
4 changes: 4 additions & 0 deletions pandas/_libs/tslibs/parsing.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -421,6 +421,10 @@ cdef parse_datetime_string_with_reso(
parsed = datetime(
dts.year, dts.month, dts.day, dts.hour, dts.min, dts.sec, dts.us
)
if out_bestunit == NPY_DATETIMEUNIT.NPY_FR_ns:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this impact the period construction? Feels a bit strange to re-assign this variable

# No picoseconds, so no nanoseconds.
# Have to have seen microseconds, in order to have "seen" nanoseconds
out_bestunit = NPY_DATETIMEUNIT.NPY_FR_us
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think doing this will mess up Timestamp inference in a follow-up to #49737

Copy link
Member Author

@lithomas1 lithomas1 Dec 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I only added this block because somehow CI was failing tests only on Linux(maybe a locale problem).
(Issue was that datetime object was returned, when reso was nanosecond, and datetimes don't have nanosecond support).

if dts.ps != 0 or out_local:
seems buggy.

IMO, we should be checking out_bestunit == NPY_DATETIMEUNIT.NPY_FR_ns. don't know if that will break anything, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've updated this block now to checking the bestunit.

@jbrockmendel @WillAyd Would this work fine?

reso = {
NPY_DATETIMEUNIT.NPY_FR_Y: "year",
NPY_DATETIMEUNIT.NPY_FR_M: "month",
Expand Down
15 changes: 7 additions & 8 deletions pandas/_libs/tslibs/period.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -2574,16 +2574,15 @@ class Period(_Period):
value = str(value)
value = value.upper()
dt, reso = parse_time_string(value, freq)
try:
ts = Timestamp(value)
except ValueError:
nanosecond = 0
else:
nanosecond = ts.nanosecond
if nanosecond != 0:
reso = "nanosecond"
if reso == "nanosecond":
nanosecond = dt.nanosecond

if dt is NaT:
ordinal = NPY_NAT
# Doesn't matter what this is, we just need to have it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of assigning a random value here would it make more sense to just not have it fail later on?

# so that we don't error in block below. We get converted
# to NaT later on anyways
reso = "nanosecond"

if freq is None:
try:
Expand Down