PERF: accessing sliced indexes with populated indexing engines #51738

topper-123 · 2023-03-02T08:11:12Z

Improves performance of indexes that are sliced from indexes with already-built indexing engines by copying the relevant data from the existing indexing engine, thereby avoiding recomputation.

Performance example:

>>> import numpy as np
>>> import pandas as pd
>>>
>>> idx = pd.Index(np.arange(1_000_000))
>>> idx.is_unique, idx.is_monotonic_increasing  # building the engine
(True, True)
>>> %timeit idx[:].is_unique
13.9 ms ± 78.8 µs per loop  # main
2.76 µs ± 9.74 ns per loop  # this PR
>>> %timeit idx[:].is_monotonic_increasing
4.26 ms ± 1.21 µs per loop  # main
2.7 µs ± 3.9 ns per loop  # this PR
>>> %timeit idx[:].get_loc(999_999)
4.26 ms ± 1.49 µs per loop  # main
3.77 µs ± 41.7 ns per loop  # this PR
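
The idea in the description can be illustrated with a toy model in plain Python. This is only a sketch of the general technique, not the code in this PR: `ToyIndex`, its `cache` dict, and the key names `"unique"`, `"mono_inc"`, and `"mono_dec"` are hypothetical stand-ins for the real Cython engine state. It assumes that only facts already known to be true are carried over to the slice, and that a negative step swaps the two monotonicity directions.

```python
class ToyIndex:
    """Hypothetical stand-in for an index with lazily computed, cached properties."""

    def __init__(self, values):
        self.values = list(values)
        self.cache = {}  # plays the role of the populated engine

    @property
    def is_unique(self):
        if "unique" not in self.cache:  # O(n) scan, done once
            self.cache["unique"] = len(set(self.values)) == len(self.values)
        return self.cache["unique"]

    @property
    def is_monotonic_increasing(self):
        if "mono_inc" not in self.cache:
            v = self.values
            self.cache["mono_inc"] = all(a <= b for a, b in zip(v, v[1:]))
        return self.cache["mono_inc"]

    @property
    def is_monotonic_decreasing(self):
        if "mono_dec" not in self.cache:
            v = self.values
            self.cache["mono_dec"] = all(a >= b for a, b in zip(v, v[1:]))
        return self.cache["mono_dec"]

    def __getitem__(self, slobj):
        result = ToyIndex(self.values[slobj])
        reverse = slobj.step is not None and slobj.step < 0
        # A slice of a unique sequence is unique. A False value is not copied,
        # because a slice of a non-unique sequence may still be unique.
        if self.cache.get("unique"):
            result.cache["unique"] = True
        # Monotonicity survives slicing; a negative step flips its direction.
        if self.cache.get("mono_inc"):
            result.cache["mono_dec" if reverse else "mono_inc"] = True
        if self.cache.get("mono_dec"):
            result.cache["mono_inc" if reverse else "mono_dec"] = True
        return result


idx = ToyIndex(range(10))
idx.is_unique, idx.is_monotonic_increasing  # populate the cache
forward = idx[2:5]   # inherits "unique" and "mono_inc" without rescanning
backward = idx[::-1]  # inherits "unique" and "mono_dec"
```

In this sketch the sliced objects answer `is_unique` and the inherited monotonicity check from the copied cache, which is the kind of saving the timings above measure.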

I'm not sure how to test this, as the relevant attributes live in Cython code, and I don't think we currently have tests for indexing engines?

jbrockmendel · 2023-03-02T16:48:22Z

nice! this has been on my todo list for ages but was always intimidating

.pre-commit-config.yaml

jbrockmendel

LGTM

mroeschke · 2023-03-07T23:41:37Z

pandas/core/indexes/base.py

-        return type(self)._simple_new(res, name=self._name)
+        result = type(self)._simple_new(res, name=self._name)
+        if "_engine" in self._cache:
+            reverse = slobj.step is not None and slobj.step < 0


Just confirming this is still valid if slobj is empty, e.g. slice(0, 0)?

topper-123 · 2023-03-08

Yes, this is still valid. For slice(None), slobj.step is always None, and for slice(0, 0) the .step attribute is also None, so there is no problem there.
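
The point about `.step` can be checked with plain Python slice objects. The helper name `is_reversed` below is hypothetical; its body simply repeats the `reverse` expression from the diff above.

```python
def is_reversed(slobj: slice) -> bool:
    # Same expression as `reverse` in the diff: True only for a negative step.
    return slobj.step is not None and slobj.step < 0


examples = {
    "slice(None)": slice(None),
    "slice(0, 0)": slice(0, 0),
    "slice(None, None, 2)": slice(None, None, 2),
    "slice(None, None, -1)": slice(None, None, -1),
}

# An omitted step is stored as None, so the first two slices are not reversed.
results = {label: is_reversed(s) for label, s in examples.items()}
```

Only the slice with an explicit negative step is treated as reversed; the empty slice `slice(0, 0)` takes the same path as `slice(None)`.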

mroeschke · 2023-03-08T17:51:04Z

Thanks @topper-123

topper-123 force-pushed the index_slice_perf branch from fff62b1 to 8543351 Compare March 2, 2023 08:18

topper-123 force-pushed the index_slice_perf branch 2 times, most recently from 30d8e4e to 7b50a89 Compare March 4, 2023 21:40

mroeschke added the Indexing (related to indexing on series/frames, not to indexes themselves) and Performance (memory or execution speed performance) labels Mar 6, 2023

mroeschke reviewed Mar 6, 2023

topper-123 added 6 commits March 6, 2023 23:18

PERF: accessing sliced indexes with populated indexing engines

1d9f0e7

update

369215a

also use for RangeIndex

8045a70

pre-commit issues

2ed96cb

pre-commit issues

19ed52c

add back pylint

989b081

topper-123 force-pushed the index_slice_perf branch from 7b50a89 to 989b081 Compare March 6, 2023 23:21

jbrockmendel approved these changes Mar 7, 2023

mroeschke reviewed Mar 7, 2023

mroeschke added this to the 2.1 milestone Mar 8, 2023

mroeschke approved these changes Mar 8, 2023

mroeschke merged commit 9b4cffc into pandas-dev:main Mar 8, 2023

topper-123 deleted the index_slice_perf branch March 8, 2023 19:39

kandersolar mentioned this pull request Nov 1, 2023

BUG: infer_freq has stateful behavior #55794

Open

3 tasks

rob-sil mentioned this pull request Mar 26, 2024

BUG: Fix is_unique regression for slices of Indexes #57958

Open

5 tasks
