Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: significant speedups in tz-aware operations #24491

Merged
merged 2 commits into from Jan 4, 2019

Conversation

Projects
None yet
5 participants
@qwhelan
Copy link
Contributor

commented Dec 30, 2018

Operations involving tz-aware data currently incur a pretty substantial penalty:

[ 93.04%] ··· timeseries.DatetimeAccessor.time_dt_accessor_year                                                                                                           ok
[ 93.04%] ··· ============ =============
                   t
              ------------ -------------
                  None      2.41±0.07ms
             **US/Eastern     150±2ms**
                  UTC       2.64±0.07ms
                tzutc()     2.70±0.06ms
              ============ =============

[ 93.19%] ··· timeseries.DatetimeIndex.time_add_timedelta                                                                                                                 ok
[ 93.19%] ··· ============ ============
               index_type
              ------------ ------------
                  dst          n/a
                repeated       n/a
              **tz_aware     305±7ms**
                tz_naive    3.38±0.2ms
              ============ ============

This PR improves the performance of tz-aware operations to near that of tz-naive ones through a couple approaches:

  • Eliminate a duplicative validation check by setting _freq directly
    • freq.setter calls a validation check that compares the input value against to_offset(self.inferred_freq), which is exactly what we are passing
    • For tz-aware data, .inferred_freq requires converting the entire array to the appropriate tz. The time to do so dominates the runtime for our benchmark. Simply eliminating 1 of 2 calls cuts runtime by 50%.
  • Improve the performance of pandas._libs.tslibs.conversion functions by batching searchsorted calls rather than doing piecewise.
    • In theory, both approaches should be O(n log(k)). However, call overhead appears to be substantially larger than the actual search time for the N of our existing benchmark.
    • The existing comment suggests operating piecewise is beneficial in the presence of lots of iNaTs, but this ignores the fact that searchsorted has some optimizations for when the needle array is also sorted (which allows it to incrementally shrink the search region).
  • Minor speedup due to minimizing is_tzlocal()

Here's the same comparison with the PR:

[ 26.38%] ··· timeseries.DatetimeAccessor.time_dt_accessor_year                                                                                                           ok
[ 26.38%] ··· ============ =============
                   t
              ------------ -------------
                  None      2.31±0.06ms
               US/Eastern   3.70±0.08ms
                  UTC       2.49±0.03ms
                tzutc()      2.43±0.1ms
              ============ =============

[ 26.52%] ··· timeseries.DatetimeIndex.time_add_timedelta                                                                                                                 ok
[ 26.52%] ··· ============ =============
               index_type
              ------------ -------------
                  dst           n/a
                repeated        n/a
                tz_aware     4.01±0.2ms
                tz_naive    2.33±0.04ms
              ============ =============

And asv output:

$ asv compare upstream/master HEAD -s --sort ratio --only-changed
       before           after         ratio
     [02a97c0a]       [5a5ed18b]
     <tz_aware_op_speedup~2>       <tz_aware_op_speedup>
-      3.38±0.2ms      2.33±0.04ms     0.69  timeseries.DatetimeIndex.time_add_timedelta('tz_naive')
-      21.0±0.2ms       14.4±0.3ms     0.68  inference.DateInferOps.time_timedelta_plus_datetime
-        700±80ns         366±20ns     0.52  timestamp.TimestampProperties.time_dayofweek(<UTC>, 'B')
-         181±2ms       30.6±0.7ms     0.17  timeseries.DatetimeAccessor.time_dt_accessor_time('US/Eastern')
-        180±10ms       29.1±0.9ms     0.16  timeseries.DatetimeAccessor.time_dt_accessor_date('US/Eastern')
-         178±3ms       26.7±0.6ms     0.15  timeseries.DatetimeAccessor.time_dt_accessor_day_name('US/Eastern')
-         179±2ms       26.3±0.4ms     0.15  timeseries.DatetimeIndex.time_to_time('tz_aware')
-         172±4ms       24.8±0.7ms     0.14  timeseries.DatetimeAccessor.time_dt_accessor_month_name('US/Eastern')
-         175±5ms       24.1±0.9ms     0.14  timeseries.DatetimeIndex.time_to_date('tz_aware')
-         150±2ms      3.70±0.08ms     0.02  timeseries.DatetimeAccessor.time_dt_accessor_year('US/Eastern')
-        95.3±5ms      2.02±0.07ms     0.02  indexing.NonNumericSeriesIndexing.time_getitem_label_slice('datetime', 'nonunique_monotonic_inc')
-         149±2ms      2.85±0.02ms     0.02  timeseries.DatetimeIndex.time_timeseries_is_month_start('tz_aware')
-         305±7ms       4.01±0.2ms     0.01  timeseries.DatetimeIndex.time_add_timedelta('tz_aware')
-        95.3±3ms          356±9μs     0.00  indexing.NonNumericSeriesIndexing.time_get_value('datetime', 'nonunique_monotonic_inc')
       before           after         ratio
     [02a97c0a]       [5a5ed18b]
     <tz_aware_op_speedup~2>       <tz_aware_op_speedup>
+         323±6ms          386±9ms     1.19  timeseries.ToDatetimeISO8601.time_iso8601_tz_spaceformat
+         165±2ms          191±4ms     1.15  timeseries.ToDatetimeCache.time_dup_string_tzoffset_dates(False)
+         151±5μs          170±2μs     1.13  indexing.NonNumericSeriesIndexing.time_getitem_scalar('datetime', 'nonunique_monotonic_inc')
  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@qwhelan qwhelan force-pushed the qwhelan:tz_aware_op_speedup branch from 5a5ed18 to 6ffad7c Dec 30, 2018

@codecov

This comment has been minimized.

Copy link

commented Dec 30, 2018

Codecov Report

Merging #24491 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #24491   +/-   ##
=======================================
  Coverage   92.31%   92.31%           
=======================================
  Files         166      166           
  Lines       52412    52412           
=======================================
  Hits        48382    48382           
  Misses       4030     4030
Flag Coverage Δ
#multiple 90.73% <100%> (ø) ⬆️
#single 43.05% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/arrays/timedeltas.py 87.68% <100%> (ø) ⬆️
pandas/core/arrays/datetimes.py 97.74% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 02a97c0...6ffad7c. Read the comment docs.

@codecov

This comment has been minimized.

Copy link

commented Dec 30, 2018

Codecov Report

Merging #24491 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #24491   +/-   ##
=======================================
  Coverage   92.38%   92.38%           
=======================================
  Files         166      166           
  Lines       52478    52478           
=======================================
  Hits        48483    48483           
  Misses       3995     3995
Flag Coverage Δ
#multiple 90.81% <100%> (ø) ⬆️
#single 43.04% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/arrays/timedeltas.py 87.99% <100%> (ø) ⬆️
pandas/core/arrays/datetimes.py 98.01% <100%> (ø) ⬆️
pandas/core/internals/blocks.py 94.49% <0%> (ø) ⬆️
pandas/core/indexes/accessors.py 91% <0%> (ø) ⬆️
pandas/core/indexes/timedeltas.py 90.22% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 62506ca...360415c. Read the comment docs.

@gfyoung gfyoung requested review from jreback and mroeschke Dec 30, 2018

@qwhelan qwhelan force-pushed the qwhelan:tz_aware_op_speedup branch from 6ffad7c to 5996d17 Dec 30, 2018

@mroeschke

This comment has been minimized.

Copy link
Member

commented Dec 30, 2018

Overall LGTM. Could you also add tzlocal in the relevant benchmarks `that parameterize over timezones?

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Dec 30, 2018

There are I think 7 places in tslibs where we do this iteration. Can they all be optimized?

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Dec 30, 2018

Another perf improvement waiting in these files is calling searchsorted via cnp.PyArray_SearchSorted instead of arr.searchsorted, since the latter is a python-space call. Doing this is at least 3 kinds of a hassle so I never implemented it myself. Feel free to ping me if you're interested in giving it a go.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Dec 30, 2018

a bit -1 on actually using the numpy c-api for searchsorted as it adds complexity; I suspect it won't actually speed things up for us (and is out of scope for this PR anyhow).

@jreback
Copy link
Contributor

left a comment

lgtm. ex comments about adding some additional benchamarks including tzlocal.

Show resolved Hide resolved pandas/_libs/tslibs/conversion.pyx Outdated

@jreback jreback added this to the 0.24.0 milestone Dec 30, 2018

@jbrockmendel

This comment has been minimized.

Copy link
Member

commented Dec 30, 2018

a bit -1 on actually using the numpy c-api for searchsorted as it adds complexity; I suspect it won't actually speed things up for us (and is out of scope for this PR anyhow).

Definitely agree on the scope, and like I said it is a PITA. But it's also one of the few python-space calls left in these files that we do at each stage of a loop.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Dec 30, 2018

Definitely agree on the scope, and like I said it is a PITA. But it's also one of the few python-space calls left in these files that we do at each stage of a loop.

yep.

@qwhelan

This comment has been minimized.

Copy link
Contributor Author

commented Dec 30, 2018

@jbrockmendel Not all, but at least one or two more. I just got back into town and have some errands to run, so will have new asv results this evening.

I have an (unsubmitted) PR for np.searchsorted to fix the needle/haystack-have-different-dtypes performance issue, so might be in a position to investigate switching to the c-api after I finish cleaning that up.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jan 1, 2019

@qwhelan can you update (small comment), ping on green.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jan 3, 2019

@qwhelan can you merge master and update that comment

@qwhelan

This comment has been minimized.

Copy link
Contributor Author

commented Jan 3, 2019

@mroeschke Looks like the tzlocal benchmark improves a fair bit:

       before           after         ratio
     [02a97c0a]       [d20f6fac]
     <tz_aware_op_speedup~4>       <tz_aware_op_speedup>
-      11.9±0.01s       5.77±0.01s     0.48  timeseries.DatetimeIndex.time_add_timedelta('tz_local')

But it's still ~200x slower:

[ 53.12%] ··· timeseries.DatetimeIndex.time_add_timedelta                                                                                                                 ok
[ 53.12%] ··· ============ =============
               index_type
              ------------ -------------
                  dst         471±8μs
                repeated    3.52±0.05ms
                tz_aware    3.78±0.07ms
              **tz_local     5.77±0.01s**
                tz_naive    1.98±0.03ms
              ============ =============

@jreback digging into a regression I thought was spurious, will have an update in an hour or two

@jreback

This comment has been minimized.

Copy link
Contributor

commented Jan 3, 2019

sure!

@qwhelan qwhelan force-pushed the qwhelan:tz_aware_op_speedup branch from 5996d17 to 360415c Jan 3, 2019

@qwhelan

This comment has been minimized.

Copy link
Contributor Author

commented Jan 3, 2019

@jreback Good to merge, can't reproduce the spurious regression.

@jreback jreback merged commit c21f1a6 into pandas-dev:master Jan 4, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jreback

This comment has been minimized.

Copy link
Contributor

commented Jan 4, 2019

thanks @qwhelan keep em coming!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.