Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: Implement RangeIndex min/max using RangeIndex properties #17611

Merged
merged 3 commits into from
Sep 22, 2017

Conversation

jschendel
Copy link
Member

Benchmarks are essentially the same as what's referenced in the issue. Maybe a few ns slower since I combined logic into a helper function vs having identical code (other than elif) for both min/max, which was the case I did the benchmarks posted in the issue. Updated benchmarks:

      before           after         ratio
     [fedf9228]       [4cb0de8d]
-        25.4±0ms      1.77±0.06μs     0.00  index_object.Range.time_min
-      26.0±0.9ms      1.71±0.06μs     0.00  index_object.Range.time_max
-        25.4±0ms         1.51±0μs     0.00  index_object.Range.time_max_trivial
-        25.4±0ms         1.30±0μs     0.00  index_object.Range.time_min_trivial

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

@codecov
Copy link

codecov bot commented Sep 21, 2017

Codecov Report

Merging #17611 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17611      +/-   ##
==========================================
- Coverage    91.2%   91.18%   -0.02%     
==========================================
  Files         163      163              
  Lines       49637    49648      +11     
==========================================
+ Hits        45269    45271       +2     
- Misses       4368     4377       +9
Flag Coverage Δ
#multiple 88.97% <100%> (ø) ⬆️
#single 40.18% <27.27%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/range.py 92.83% <100%> (+0.24%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.77% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8276a42...5379fc2. Read the comment docs.

@topper-123
Copy link
Contributor

I'm wondering it something similar, but more general, couldn't be done that works with both indices and series using is_monotonic_*.

E.g.:

def max(self):
    if self.is_monotonic_increasing:
        return self[-1]
    elif self.is_monotonic_decreasing:
        return self[0]
    else:
        return nanops.nanmax(self.values)

Seems to me this should work. What do you think, have I missed something nonobvious?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc comment otherwise lgtm

@@ -472,6 +472,7 @@ Performance Improvements
- Improved performance of :meth:`Categorical.set_categories` by not materializing the values (:issue:`17508`)
- :attr:`Timestamp.microsecond` no longer re-computes on attribute access (:issue:`17331`)
- Improved performance of the :class:`CategoricalIndex` for data that is already categorical dtype (:issue:`17513`)
- Improved performance of ``RangeIndex.min`` and ``RangeIndex.max`` by using ``RangeIndex`` properties to perform the computations (:issue:`17607`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u use :meth: here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will using :meth: work in this case? It doesn't look like RangeIndex is in api.rst, and I can't find any RangeIndex related items in the existing HTML API reference. Not super well versed on how the docs work though, so could be mistaken.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so should prob add RangeIndex to api.rst :>

(and prob any other Indexes that are missing). We don't need to repeat the methods though (they are inherited).

@@ -994,3 +994,22 @@ def test_append(self):
# Append single item rather than list
result2 = indices[0].append(indices[1])
tm.assert_index_equal(result2, expected, exact=True)

def test_max_min(self):
params = [(0, 400, 3), (500, 0, -6), (-10**6, 10**6, 4),
Copy link
Member

@gfyoung gfyoung Sep 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • pytest.mark.parametrize this
  • Reference issue number above this

@gfyoung gfyoung added Indexing Related to indexing on series/frames, not to indexes themselves Performance Memory or execution speed performance and removed Indexing Related to indexing on series/frames, not to indexes themselves labels Sep 21, 2017
@jschendel
Copy link
Member Author

@topper-123 : I think that will work in general, but I wonder if there are cases where the additional overhead of computing is_monotonic_* could outweigh performance gains. Would be interesting to look into.

@jreback jreback added this to the 0.21.0 milestone Sep 21, 2017
Implemented RangeIndex min/max in terms of RangeIndex properties.

.. autosummary::
:toctree: generated/
:template: autosummary/class_without_autosummary.rst
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just list them all in a single autosummary directive?

@jreback jreback merged commit 26681db into pandas-dev:master Sep 22, 2017
@jreback
Copy link
Contributor

jreback commented Sep 22, 2017

thanks @jschendel

@@ -473,6 +473,7 @@ Performance Improvements
- Improved performance of :meth:`Categorical.set_categories` by not materializing the values (:issue:`17508`)
- :attr:`Timestamp.microsecond` no longer re-computes on attribute access (:issue:`17331`)
- Improved performance of the :class:`CategoricalIndex` for data that is already categorical dtype (:issue:`17513`)
- Improved performance of :meth:`RangeIndex.min` and :meth:`RangeIndex.max` by using ``RangeIndex`` properties to perform the computations (:issue:`17607`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those links won't work because the pages are not generated (only the class docstring is included in the api).
To solve this, we would need to find a way to specify which of the methods to include of which not (or list them manually in the api.rst, that is probably easier)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PERF: Use RangeIndex properties to compute max/min
5 participants