PERF: Speed up Period construction #50149

Merged on Jan 18, 2023 (12 commits)
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.0.0.rst
@@ -865,6 +865,7 @@ Performance improvements
 - Performance improvement in :func:`merge` when not merging on the index - the new index will now be :class:`RangeIndex` instead of :class:`Int64Index` (:issue:`49478`)
 - Performance improvement in :meth:`DataFrame.to_dict` and :meth:`Series.to_dict` when using any non-object dtypes (:issue:`46470`)
 - Performance improvement in :func:`read_html` when there are multiple tables (:issue:`49929`)
+- Performance improvement in :class:`Period` constructor when constructing from a string or integer (:issue:`38312`)
 - Performance improvement in :func:`to_datetime` when using ``'%Y%m%d'`` format (:issue:`17410`)
 - Performance improvement in :func:`to_datetime` when format is given or can be inferred (:issue:`50465`)
 - Performance improvement in :func:`read_csv` when passing :func:`to_datetime` lambda-function to ``date_parser`` and inputs have mixed timezone offsets (:issue:`35296`)
10 changes: 6 additions & 4 deletions pandas/_libs/tslibs/parsing.pyx
@@ -377,7 +377,11 @@ def parse_datetime_string_with_reso(
         &out_tzoffset, False
     )
     if not string_to_dts_failed:
-        if out_bestunit == NPY_DATETIMEUNIT.NPY_FR_ns or out_local:
+        timestamp_units = {NPY_DATETIMEUNIT.NPY_FR_ns,
+                           NPY_DATETIMEUNIT.NPY_FR_ps,
+                           NPY_DATETIMEUNIT.NPY_FR_fs,
+                           NPY_DATETIMEUNIT.NPY_FR_as}
+        if out_bestunit in timestamp_units or out_local:
             # TODO: the not-out_local case we could do without Timestamp;
             # avoid circular import
             from pandas import Timestamp
@@ -389,9 +393,7 @@ def parse_datetime_string_with_reso(
         # Match Timestamp and drop picoseconds, femtoseconds, attoseconds
         # The new resolution will just be nano
         # GH 50417
-        if out_bestunit in {NPY_DATETIMEUNIT.NPY_FR_ps,
-                            NPY_DATETIMEUNIT.NPY_FR_fs,
-                            NPY_DATETIMEUNIT.NPY_FR_as}:
+        if out_bestunit in timestamp_units:
             out_bestunit = NPY_DATETIMEUNIT.NPY_FR_ns
         reso = {
             NPY_DATETIMEUNIT.NPY_FR_Y: "year",
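
For readers who want to follow the unit handling in `parse_datetime_string_with_reso` without reading Cython, here is a rough pure-Python sketch of the idea. It is not the pandas implementation: `Unit` is a hypothetical enum standing in for a few `NPY_DATETIMEUNIT` members, and `normalize_unit` is an illustrative helper name that does not exist in pandas. It only shows how one shared set of nanosecond-or-finer units can drive the collapse to nanosecond resolution.

```python
from enum import Enum, auto


class Unit(Enum):
    """Hypothetical stand-in for a few NPY_DATETIMEUNIT members."""

    us = auto()
    ns = auto()
    ps = auto()
    fs = auto()
    as_ = auto()  # trailing underscore because `as` is a Python keyword


# Units at nanosecond precision or finer; mirrors `timestamp_units` in the diff.
TIMESTAMP_UNITS = {Unit.ns, Unit.ps, Unit.fs, Unit.as_}


def normalize_unit(best_unit: Unit) -> Unit:
    """Collapse nanosecond-or-finer units to nanoseconds; leave others alone."""
    if best_unit in TIMESTAMP_UNITS:
        return Unit.ns
    return best_unit
```

Reusing one set for both checks means `ns` itself is also "collapsed" to `ns`, which is a no-op, so the single membership test can replace the separate three-member set.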
13 changes: 4 additions & 9 deletions pandas/_libs/tslibs/period.pyx
@@ -2592,18 +2592,13 @@ class Period(_Period):
 
             freqstr = freq.rule_code if freq is not None else None
             dt, reso = parse_datetime_string_with_reso(value, freqstr)
-            try:
-                ts = Timestamp(value)
-            except ValueError:
-                nanosecond = 0
-            else:
-                nanosecond = ts.nanosecond
-                if nanosecond != 0:
-                    reso = "nanosecond"
+            if reso == "nanosecond":
+                nanosecond = dt.nanosecond
             if dt is NaT:
                 ordinal = NPY_NAT
 
-            if freq is None:
+            if freq is None and ordinal != NPY_NAT:
+                # Skip NaT, since it doesn't have a resolution
                 try:
                     freq = attrname_to_abbrevs[reso]
                 except KeyError:
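
The control flow of the new `Period` branch can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the actual constructor: `Parsed`, `NAT`, `RESO_TO_FREQ`, and `resolve_fields` are hypothetical names, and the frequency mapping is a two-entry illustrative subset. The point is that the string is parsed only once, the nanosecond component is read from that single result, and frequency inference is skipped for NaT.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

NPY_NAT = -(2**63)  # minimum int64, the sentinel pandas uses for NaT
NAT = object()      # hypothetical stand-in for pandas.NaT

# Illustrative subset of a resolution -> frequency-abbreviation mapping.
RESO_TO_FREQ = {"month": "M", "day": "D"}


@dataclass
class Parsed:
    """Hypothetical stand-in for the parsed datetime-like result."""

    nanosecond: int


def resolve_fields(
    dt: object, reso: str, freq: Optional[str] = None
) -> Tuple[int, Optional[int], Optional[str]]:
    """Return (nanosecond, ordinal, freq) following the branch in the diff."""
    nanosecond = 0
    ordinal = None
    if reso == "nanosecond":
        # Reuse the already-parsed value instead of parsing a second time.
        nanosecond = dt.nanosecond
    if dt is NAT:
        ordinal = NPY_NAT
    if freq is None and ordinal != NPY_NAT:
        # Skip NaT, since it has no resolution to infer a frequency from.
        freq = RESO_TO_FREQ[reso]
    return nanosecond, ordinal, freq
```

For example, `resolve_fields(Parsed(0), "day")` infers the frequency from the resolution, while `resolve_fields(NAT, "")` returns the NaT ordinal and leaves the frequency unset.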