Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: significant speedups in tz-aware operations #24491

Merged
merged 2 commits into from
Jan 4, 2019

Conversation

qwhelan
Copy link
Contributor

@qwhelan qwhelan commented Dec 30, 2018

Operations involving tz-aware data currently incur a pretty substantial penalty:

[ 93.04%] ··· timeseries.DatetimeAccessor.time_dt_accessor_year                                                                                                           ok
[ 93.04%] ··· ============ =============
                   t
              ------------ -------------
                  None      2.41±0.07ms
             **US/Eastern     150±2ms**
                  UTC       2.64±0.07ms
                tzutc()     2.70±0.06ms
              ============ =============

[ 93.19%] ··· timeseries.DatetimeIndex.time_add_timedelta                                                                                                                 ok
[ 93.19%] ··· ============ ============
               index_type
              ------------ ------------
                  dst          n/a
                repeated       n/a
              **tz_aware     305±7ms**
                tz_naive    3.38±0.2ms
              ============ ============

This PR improves the performance of tz-aware operations to near that of tz-naive ones through a couple approaches:

  • Eliminate a duplicative validation check by setting _freq directly
    • freq.setter calls a validation check that compares the input value against to_offset(self.inferred_freq), which is exactly what we are passing
    • For tz-aware data, .inferred_freq requires converting the entire array to the appropriate tz. The time to do so dominates the runtime for our benchmark. Simply eliminating 1 of 2 calls cuts runtime by 50%.
  • Improve the performance of pandas._libs.tslibs.conversion functions by batching searchsorted calls rather than doing piecewise.
    • In theory, both approaches should be O(n log(k)). However, call overhead appears to be substantially larger than the actual search time for the N of our existing benchmark.
    • The existing comment suggests operating piecewise is beneficial in the presence of lots of iNaTs, but this ignores the fact that searchsorted has some optimizations for when the needle array is also sorted (which allows it to incrementally shrink the search region).
  • Minor speedup due to minimizing is_tzlocal()

Here's the same comparison with the PR:

[ 26.38%] ··· timeseries.DatetimeAccessor.time_dt_accessor_year                                                                                                           ok
[ 26.38%] ··· ============ =============
                   t
              ------------ -------------
                  None      2.31±0.06ms
               US/Eastern   3.70±0.08ms
                  UTC       2.49±0.03ms
                tzutc()      2.43±0.1ms
              ============ =============

[ 26.52%] ··· timeseries.DatetimeIndex.time_add_timedelta                                                                                                                 ok
[ 26.52%] ··· ============ =============
               index_type
              ------------ -------------
                  dst           n/a
                repeated        n/a
                tz_aware     4.01±0.2ms
                tz_naive    2.33±0.04ms
              ============ =============

And asv output:

$ asv compare upstream/master HEAD -s --sort ratio --only-changed
       before           after         ratio
     [02a97c0a]       [5a5ed18b]
     <tz_aware_op_speedup~2>       <tz_aware_op_speedup>
-      3.38±0.2ms      2.33±0.04ms     0.69  timeseries.DatetimeIndex.time_add_timedelta('tz_naive')
-      21.0±0.2ms       14.4±0.3ms     0.68  inference.DateInferOps.time_timedelta_plus_datetime
-        700±80ns         366±20ns     0.52  timestamp.TimestampProperties.time_dayofweek(<UTC>, 'B')
-         181±2ms       30.6±0.7ms     0.17  timeseries.DatetimeAccessor.time_dt_accessor_time('US/Eastern')
-        180±10ms       29.1±0.9ms     0.16  timeseries.DatetimeAccessor.time_dt_accessor_date('US/Eastern')
-         178±3ms       26.7±0.6ms     0.15  timeseries.DatetimeAccessor.time_dt_accessor_day_name('US/Eastern')
-         179±2ms       26.3±0.4ms     0.15  timeseries.DatetimeIndex.time_to_time('tz_aware')
-         172±4ms       24.8±0.7ms     0.14  timeseries.DatetimeAccessor.time_dt_accessor_month_name('US/Eastern')
-         175±5ms       24.1±0.9ms     0.14  timeseries.DatetimeIndex.time_to_date('tz_aware')
-         150±2ms      3.70±0.08ms     0.02  timeseries.DatetimeAccessor.time_dt_accessor_year('US/Eastern')
-        95.3±5ms      2.02±0.07ms     0.02  indexing.NonNumericSeriesIndexing.time_getitem_label_slice('datetime', 'nonunique_monotonic_inc')
-         149±2ms      2.85±0.02ms     0.02  timeseries.DatetimeIndex.time_timeseries_is_month_start('tz_aware')
-         305±7ms       4.01±0.2ms     0.01  timeseries.DatetimeIndex.time_add_timedelta('tz_aware')
-        95.3±3ms          356±9μs     0.00  indexing.NonNumericSeriesIndexing.time_get_value('datetime', 'nonunique_monotonic_inc')
       before           after         ratio
     [02a97c0a]       [5a5ed18b]
     <tz_aware_op_speedup~2>       <tz_aware_op_speedup>
+         323±6ms          386±9ms     1.19  timeseries.ToDatetimeISO8601.time_iso8601_tz_spaceformat
+         165±2ms          191±4ms     1.15  timeseries.ToDatetimeCache.time_dup_string_tzoffset_dates(False)
+         151±5μs          170±2μs     1.13  indexing.NonNumericSeriesIndexing.time_getitem_scalar('datetime', 'nonunique_monotonic_inc')
  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@codecov
Copy link

codecov bot commented Dec 30, 2018

Codecov Report

Merging #24491 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #24491   +/-   ##
=======================================
  Coverage   92.31%   92.31%           
=======================================
  Files         166      166           
  Lines       52412    52412           
=======================================
  Hits        48382    48382           
  Misses       4030     4030
Flag Coverage Δ
#multiple 90.73% <100%> (ø) ⬆️
#single 43.05% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/arrays/timedeltas.py 87.68% <100%> (ø) ⬆️
pandas/core/arrays/datetimes.py 97.74% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 02a97c0...6ffad7c. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 30, 2018

Codecov Report

Merging #24491 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #24491   +/-   ##
=======================================
  Coverage   92.38%   92.38%           
=======================================
  Files         166      166           
  Lines       52478    52478           
=======================================
  Hits        48483    48483           
  Misses       3995     3995
Flag Coverage Δ
#multiple 90.81% <100%> (ø) ⬆️
#single 43.04% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/arrays/timedeltas.py 87.99% <100%> (ø) ⬆️
pandas/core/arrays/datetimes.py 98.01% <100%> (ø) ⬆️
pandas/core/internals/blocks.py 94.49% <0%> (ø) ⬆️
pandas/core/indexes/accessors.py 91% <0%> (ø) ⬆️
pandas/core/indexes/timedeltas.py 90.22% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 62506ca...360415c. Read the comment docs.

@gfyoung gfyoung added Timeseries Performance Memory or execution speed performance Timezones Timezone data dtype labels Dec 30, 2018
@mroeschke
Copy link
Member

Overall LGTM. Could you also add tzlocal in the relevant benchmarks `that parameterize over timezones?

@jbrockmendel
Copy link
Member

There are I think 7 places in tslibs where we do this iteration. Can they all be optimized?

@jbrockmendel
Copy link
Member

Another perf improvement waiting in these files is calling searchsorted via cnp.PyArray_SearchSorted instead of arr.searchsorted, since the latter is a python-space call. Doing this is at least 3 kinds of a hassle so I never implemented it myself. Feel free to ping me if you're interested in giving it a go.

@jreback
Copy link
Contributor

jreback commented Dec 30, 2018

a bit -1 on actually using the numpy c-api for searchsorted as it adds complexity; I suspect it won't actually speed things up for us (and is out of scope for this PR anyhow).

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. ex comments about adding some additional benchamarks including tzlocal.

pandas/_libs/tslibs/conversion.pyx Outdated Show resolved Hide resolved
@jreback jreback added this to the 0.24.0 milestone Dec 30, 2018
@jbrockmendel
Copy link
Member

a bit -1 on actually using the numpy c-api for searchsorted as it adds complexity; I suspect it won't actually speed things up for us (and is out of scope for this PR anyhow).

Definitely agree on the scope, and like I said it is a PITA. But it's also one of the few python-space calls left in these files that we do at each stage of a loop.

@jreback
Copy link
Contributor

jreback commented Dec 30, 2018

Definitely agree on the scope, and like I said it is a PITA. But it's also one of the few python-space calls left in these files that we do at each stage of a loop.

yep.

@qwhelan
Copy link
Contributor Author

qwhelan commented Dec 30, 2018

@jbrockmendel Not all, but at least one or two more. I just got back into town and have some errands to run, so will have new asv results this evening.

I have an (unsubmitted) PR for np.searchsorted to fix the needle/haystack-have-different-dtypes performance issue, so might be in a position to investigate switching to the c-api after I finish cleaning that up.

@jreback
Copy link
Contributor

jreback commented Jan 1, 2019

@qwhelan can you update (small comment), ping on green.

@jreback
Copy link
Contributor

jreback commented Jan 3, 2019

@qwhelan can you merge master and update that comment

@qwhelan
Copy link
Contributor Author

qwhelan commented Jan 3, 2019

@mroeschke Looks like the tzlocal benchmark improves a fair bit:

       before           after         ratio
     [02a97c0a]       [d20f6fac]
     <tz_aware_op_speedup~4>       <tz_aware_op_speedup>
-      11.9±0.01s       5.77±0.01s     0.48  timeseries.DatetimeIndex.time_add_timedelta('tz_local')

But it's still ~200x slower:

[ 53.12%] ··· timeseries.DatetimeIndex.time_add_timedelta                                                                                                                 ok
[ 53.12%] ··· ============ =============
               index_type
              ------------ -------------
                  dst         471±8μs
                repeated    3.52±0.05ms
                tz_aware    3.78±0.07ms
              **tz_local     5.77±0.01s**
                tz_naive    1.98±0.03ms
              ============ =============

@jreback digging into a regression I thought was spurious, will have an update in an hour or two

@jreback
Copy link
Contributor

jreback commented Jan 3, 2019

sure!

@qwhelan
Copy link
Contributor Author

qwhelan commented Jan 3, 2019

@jreback Good to merge, can't reproduce the spurious regression.

@jreback jreback merged commit c21f1a6 into pandas-dev:master Jan 4, 2019
@jreback
Copy link
Contributor

jreback commented Jan 4, 2019

thanks @qwhelan keep em coming!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance Timeseries Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants