PERF: significant speedups in tz-aware operations #24491

qwhelan · 2018-12-30T01:41:05Z

Operations involving tz-aware data currently incur a pretty substantial penalty:

[ 93.04%] ··· timeseries.DatetimeAccessor.time_dt_accessor_year                                                                                                           ok
[ 93.04%] ··· ============ =============
                   t
              ------------ -------------
                  None      2.41±0.07ms
             **US/Eastern     150±2ms**
                  UTC       2.64±0.07ms
                tzutc()     2.70±0.06ms
              ============ =============

[ 93.19%] ··· timeseries.DatetimeIndex.time_add_timedelta                                                                                                                 ok
[ 93.19%] ··· ============ ============
               index_type
              ------------ ------------
                  dst          n/a
                repeated       n/a
              **tz_aware     305±7ms**
                tz_naive    3.38±0.2ms
              ============ ============

This PR improves the performance of tz-aware operations to near that of tz-naive ones through a couple approaches:

Eliminate a duplicative validation check by setting _freq directly
- freq.setter calls a validation check that compares the input value against to_offset(self.inferred_freq), which is exactly what we are passing
- For tz-aware data, .inferred_freq requires converting the entire array to the appropriate tz. The time to do so dominates the runtime for our benchmark. Simply eliminating 1 of 2 calls cuts runtime by 50%.
Improve the performance of pandas._libs.tslibs.conversion functions by batching searchsorted calls rather than doing piecewise.
- In theory, both approaches should be O(n log(k)). However, call overhead appears to be substantially larger than the actual search time for the N of our existing benchmark.
- The existing comment suggests operating piecewise is beneficial in the presence of lots of iNaTs, but this ignores the fact that searchsorted has some optimizations for when the needle array is also sorted (which allows it to incrementally shrink the search region).
Minor speedup due to minimizing is_tzlocal()

Here's the same comparison with the PR:

[ 26.38%] ··· timeseries.DatetimeAccessor.time_dt_accessor_year                                                                                                           ok
[ 26.38%] ··· ============ =============
                   t
              ------------ -------------
                  None      2.31±0.06ms
               US/Eastern   3.70±0.08ms
                  UTC       2.49±0.03ms
                tzutc()      2.43±0.1ms
              ============ =============

[ 26.52%] ··· timeseries.DatetimeIndex.time_add_timedelta                                                                                                                 ok
[ 26.52%] ··· ============ =============
               index_type
              ------------ -------------
                  dst           n/a
                repeated        n/a
                tz_aware     4.01±0.2ms
                tz_naive    2.33±0.04ms
              ============ =============

And asv output:

$ asv compare upstream/master HEAD -s --sort ratio --only-changed
       before           after         ratio
     [02a97c0a]       [5a5ed18b]
     <tz_aware_op_speedup~2>       <tz_aware_op_speedup>
-      3.38±0.2ms      2.33±0.04ms     0.69  timeseries.DatetimeIndex.time_add_timedelta('tz_naive')
-      21.0±0.2ms       14.4±0.3ms     0.68  inference.DateInferOps.time_timedelta_plus_datetime
-        700±80ns         366±20ns     0.52  timestamp.TimestampProperties.time_dayofweek(<UTC>, 'B')
-         181±2ms       30.6±0.7ms     0.17  timeseries.DatetimeAccessor.time_dt_accessor_time('US/Eastern')
-        180±10ms       29.1±0.9ms     0.16  timeseries.DatetimeAccessor.time_dt_accessor_date('US/Eastern')
-         178±3ms       26.7±0.6ms     0.15  timeseries.DatetimeAccessor.time_dt_accessor_day_name('US/Eastern')
-         179±2ms       26.3±0.4ms     0.15  timeseries.DatetimeIndex.time_to_time('tz_aware')
-         172±4ms       24.8±0.7ms     0.14  timeseries.DatetimeAccessor.time_dt_accessor_month_name('US/Eastern')
-         175±5ms       24.1±0.9ms     0.14  timeseries.DatetimeIndex.time_to_date('tz_aware')
-         150±2ms      3.70±0.08ms     0.02  timeseries.DatetimeAccessor.time_dt_accessor_year('US/Eastern')
-        95.3±5ms      2.02±0.07ms     0.02  indexing.NonNumericSeriesIndexing.time_getitem_label_slice('datetime', 'nonunique_monotonic_inc')
-         149±2ms      2.85±0.02ms     0.02  timeseries.DatetimeIndex.time_timeseries_is_month_start('tz_aware')
-         305±7ms       4.01±0.2ms     0.01  timeseries.DatetimeIndex.time_add_timedelta('tz_aware')
-        95.3±3ms          356±9μs     0.00  indexing.NonNumericSeriesIndexing.time_get_value('datetime', 'nonunique_monotonic_inc')
       before           after         ratio
     [02a97c0a]       [5a5ed18b]
     <tz_aware_op_speedup~2>       <tz_aware_op_speedup>
+         323±6ms          386±9ms     1.19  timeseries.ToDatetimeISO8601.time_iso8601_tz_spaceformat
+         165±2ms          191±4ms     1.15  timeseries.ToDatetimeCache.time_dup_string_tzoffset_dates(False)
+         151±5μs          170±2μs     1.13  indexing.NonNumericSeriesIndexing.time_getitem_scalar('datetime', 'nonunique_monotonic_inc')

closes #xxxx
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

codecov · 2018-12-30T02:17:24Z

Codecov Report

Merging #24491 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #24491   +/-   ##
=======================================
  Coverage   92.31%   92.31%           
=======================================
  Files         166      166           
  Lines       52412    52412           
=======================================
  Hits        48382    48382           
  Misses       4030     4030

Flag	Coverage Δ
#multiple	`90.73% <100%> (ø)`	⬆️
#single	`43.05% <0%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/arrays/timedeltas.py	`87.68% <100%> (ø)`	⬆️
pandas/core/arrays/datetimes.py	`97.74% <100%> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 02a97c0...6ffad7c. Read the comment docs.

codecov · 2018-12-30T02:17:25Z

Codecov Report

Merging #24491 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #24491   +/-   ##
=======================================
  Coverage   92.38%   92.38%           
=======================================
  Files         166      166           
  Lines       52478    52478           
=======================================
  Hits        48483    48483           
  Misses       3995     3995

Flag	Coverage Δ
#multiple	`90.81% <100%> (ø)`	⬆️
#single	`43.04% <0%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/arrays/timedeltas.py	`87.99% <100%> (ø)`	⬆️
pandas/core/arrays/datetimes.py	`98.01% <100%> (ø)`	⬆️
pandas/core/internals/blocks.py	`94.49% <0%> (ø)`	⬆️
pandas/core/indexes/accessors.py	`91% <0%> (ø)`	⬆️
pandas/core/indexes/timedeltas.py	`90.22% <0%> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 62506ca...360415c. Read the comment docs.

doc/source/whatsnew/v0.24.0.rst

pandas/_libs/tslibs/conversion.pyx

mroeschke · 2018-12-30T05:31:17Z

Overall LGTM. Could you also add tzlocal in the relevant benchmarks `that parameterize over timezones?

jbrockmendel · 2018-12-30T18:01:25Z

There are I think 7 places in tslibs where we do this iteration. Can they all be optimized?

jbrockmendel · 2018-12-30T18:09:31Z

Another perf improvement waiting in these files is calling searchsorted via cnp.PyArray_SearchSorted instead of arr.searchsorted, since the latter is a python-space call. Doing this is at least 3 kinds of a hassle so I never implemented it myself. Feel free to ping me if you're interested in giving it a go.

jreback · 2018-12-30T18:31:59Z

a bit -1 on actually using the numpy c-api for searchsorted as it adds complexity; I suspect it won't actually speed things up for us (and is out of scope for this PR anyhow).

jreback

lgtm. ex comments about adding some additional benchamarks including tzlocal.

pandas/_libs/tslibs/conversion.pyx

jbrockmendel · 2018-12-30T18:48:09Z

a bit -1 on actually using the numpy c-api for searchsorted as it adds complexity; I suspect it won't actually speed things up for us (and is out of scope for this PR anyhow).

Definitely agree on the scope, and like I said it is a PITA. But it's also one of the few python-space calls left in these files that we do at each stage of a loop.

jreback · 2018-12-30T18:52:30Z

Definitely agree on the scope, and like I said it is a PITA. But it's also one of the few python-space calls left in these files that we do at each stage of a loop.

yep.

qwhelan · 2018-12-30T20:27:50Z

@jbrockmendel Not all, but at least one or two more. I just got back into town and have some errands to run, so will have new asv results this evening.

I have an (unsubmitted) PR for np.searchsorted to fix the needle/haystack-have-different-dtypes performance issue, so might be in a position to investigate switching to the c-api after I finish cleaning that up.

jreback · 2019-01-01T16:28:21Z

@qwhelan can you update (small comment), ping on green.

jreback · 2019-01-03T00:57:56Z

@qwhelan can you merge master and update that comment

qwhelan · 2019-01-03T18:25:46Z

@mroeschke Looks like the tzlocal benchmark improves a fair bit:

       before           after         ratio
     [02a97c0a]       [d20f6fac]
     <tz_aware_op_speedup~4>       <tz_aware_op_speedup>
-      11.9±0.01s       5.77±0.01s     0.48  timeseries.DatetimeIndex.time_add_timedelta('tz_local')

But it's still ~200x slower:

[ 53.12%] ··· timeseries.DatetimeIndex.time_add_timedelta                                                                                                                 ok
[ 53.12%] ··· ============ =============
               index_type
              ------------ -------------
                  dst         471±8μs
                repeated    3.52±0.05ms
                tz_aware    3.78±0.07ms
              **tz_local     5.77±0.01s**
                tz_naive    1.98±0.03ms
              ============ =============

@jreback digging into a regression I thought was spurious, will have an update in an hour or two

jreback · 2019-01-03T18:27:19Z

sure!

rather than piecewise

qwhelan · 2019-01-03T23:20:21Z

@jreback Good to merge, can't reproduce the spurious regression.

jreback · 2019-01-04T00:26:48Z

thanks @qwhelan keep em coming!

qwhelan force-pushed the tz_aware_op_speedup branch from 5a5ed18 to 6ffad7c Compare December 30, 2018 01:47

gfyoung added Timeseries Performance Memory or execution speed performance Timezones Timezone data dtype labels Dec 30, 2018

gfyoung reviewed Dec 30, 2018

View reviewed changes

doc/source/whatsnew/v0.24.0.rst Outdated Show resolved Hide resolved

gfyoung reviewed Dec 30, 2018

View reviewed changes

pandas/_libs/tslibs/conversion.pyx Outdated Show resolved Hide resolved

gfyoung reviewed Dec 30, 2018

View reviewed changes

pandas/_libs/tslibs/conversion.pyx Show resolved Hide resolved

gfyoung requested review from jreback and mroeschke December 30, 2018 04:44

mroeschke reviewed Dec 30, 2018

View reviewed changes

pandas/_libs/tslibs/conversion.pyx Show resolved Hide resolved

qwhelan force-pushed the tz_aware_op_speedup branch from 6ffad7c to 5996d17 Compare December 30, 2018 05:03

gfyoung reviewed Dec 30, 2018

View reviewed changes

pandas/_libs/tslibs/conversion.pyx Outdated Show resolved Hide resolved

jreback approved these changes Dec 30, 2018

View reviewed changes

pandas/_libs/tslibs/conversion.pyx Outdated Show resolved Hide resolved

jreback added this to the 0.24.0 milestone Dec 30, 2018

PERF: speed up tz-aware datetimearray ops by skipping useless validation

c1a9049

qwhelan mentioned this pull request Jan 3, 2019

PERF: investigate performance impact of iNaTs on tz-aware operations #24603

Closed

PERF: speed up tz-aware operations by making searchsorted call in bulk,

360415c

rather than piecewise

qwhelan force-pushed the tz_aware_op_speedup branch from 5996d17 to 360415c Compare January 3, 2019 20:52

jreback merged commit c21f1a6 into pandas-dev:master Jan 4, 2019

jorisvandenbossche mentioned this pull request Jan 4, 2019

BLD: 32-bit builds failing #24613

Closed

qwhelan mentioned this pull request Jan 12, 2019

PERF: leverage tzlocal package to provide 2000x speedup for dateutil.tz.tzlocal operations #24737

Closed

4 tasks

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

PERF: significant speedups in tz-aware operations (pandas-dev#24491)

93c315c

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

PERF: significant speedups in tz-aware operations (pandas-dev#24491)

f427066

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PERF: significant speedups in tz-aware operations #24491

PERF: significant speedups in tz-aware operations #24491

qwhelan commented Dec 30, 2018 •

edited

Loading

codecov bot commented Dec 30, 2018

codecov bot commented Dec 30, 2018 •

edited

Loading

mroeschke commented Dec 30, 2018

jbrockmendel commented Dec 30, 2018

jbrockmendel commented Dec 30, 2018

jreback commented Dec 30, 2018

jreback left a comment

jbrockmendel commented Dec 30, 2018

jreback commented Dec 30, 2018

qwhelan commented Dec 30, 2018

jreback commented Jan 1, 2019

jreback commented Jan 3, 2019

qwhelan commented Jan 3, 2019

jreback commented Jan 3, 2019

qwhelan commented Jan 3, 2019

jreback commented Jan 4, 2019

PERF: significant speedups in tz-aware operations #24491

PERF: significant speedups in tz-aware operations #24491

Conversation

qwhelan commented Dec 30, 2018 • edited Loading

codecov bot commented Dec 30, 2018

Codecov Report

codecov bot commented Dec 30, 2018 • edited Loading

Codecov Report

mroeschke commented Dec 30, 2018

jbrockmendel commented Dec 30, 2018

jbrockmendel commented Dec 30, 2018

jreback commented Dec 30, 2018

jreback left a comment

Choose a reason for hiding this comment

jbrockmendel commented Dec 30, 2018

jreback commented Dec 30, 2018

qwhelan commented Dec 30, 2018

jreback commented Jan 1, 2019

jreback commented Jan 3, 2019

qwhelan commented Jan 3, 2019

jreback commented Jan 3, 2019

qwhelan commented Jan 3, 2019

jreback commented Jan 4, 2019

qwhelan commented Dec 30, 2018 •

edited

Loading

codecov bot commented Dec 30, 2018 •

edited

Loading