Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: Use RangeIndex properties to compute max/min #17607

Closed
jschendel opened this issue Sep 20, 2017 · 2 comments · Fixed by #17611
Closed

PERF: Use RangeIndex properties to compute max/min #17607

jschendel opened this issue Sep 20, 2017 · 2 comments · Fixed by #17611
Labels
Performance Memory or execution speed performance
Milestone

Comments

@jschendel
Copy link
Member

Problem description

Currently RangeIndex.max and RangeIndex.min fallback to nanops.nanmax and nanops.nanmin, but it's possible to determine these more efficiently using properties of RangeIndex.

I don't imagine that these are used very frequently, but the implementation is straightforward and appears to yield a good performance boost, so seems worthwhile. Wanted to check here first before putting in too much effort though.

Did a preliminary implementation and created some asv benchmarks:

      before           after         ratio
     [b59f107a]       [85c7fef1]
-        29.3±0ms      1.74±0.06μs     0.00  index_object.Range.time_max
-        27.3±1ms      1.39±0.06μs     0.00  index_object.Range.time_min
-        24.6±0ms      1.18±0.08μs     0.00  index_object.Range.time_min_trivial
-      27.3±0.7ms      1.15±0.04μs     0.00  index_object.Range.time_max_trivial

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

Where the benchmarks are generated from:

class Range(object):
    goal_time = 0.2

    def setup(self):
        self.idx_inc = RangeIndex(start=0, stop=10**7, step=3)
        self.idx_dec = RangeIndex(start=10**7, stop=-1, step=-3)

    def time_max(self):
        self.idx_inc.max()

    def time_max_trivial(self):
        self.idx_dec.max()

    def time_min(self):
        self.idx_dec.min()

    def time_min_trivial(self):
        self.idx_inc.min()

Note that the _trivial suffix denotes a fastpath for when the max/min are just the _start value (e.g. the minimum value of an increasing RangeIndex is just _start). This isn't necessarily the case with _stop, as it may not be included if misaligned with _step (e.g. RangeIndex(0, 10, 3)).

@TomAugspurger TomAugspurger added the Performance Memory or execution speed performance label Sep 20, 2017
@TomAugspurger
Copy link
Contributor

TomAugspurger commented Sep 20, 2017

Sounds good. I think ∞x performance improvements are usually worthwhile :)

It should be a matter of overriding those methods on RangeIndex, correct?

@jschendel
Copy link
Member Author

It should be a matter of overriding those methods on RangeIndex, correct?

Yeah, that's what I did in my preliminary implementation, and don't expect it to change.

Still need to formally write up some tests, and add a whatsnew entry. Maybe try a few tweaks, but mostly there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants