PERF: Index.__getitem__ performance issue #6370

Closed
immerrr opened this issue Feb 16, 2014 · 2 comments · Fixed by #6440
Labels
Indexing (Related to indexing on series/frames, not to indexes themselves) · Performance (Memory or execution speed performance)
Milestone
0.14.0
Comments

immerrr (Contributor) commented Feb 16, 2014

Once again, this came up during the #6328 investigation.

There's something very strange with how Index objects handle slices:

```
In [1]: import pandas.util.testing as tm

In [2]: idx = tm.makeStringIndex(1000000)

In [3]: timeit idx[:-1]
100000 loops, best of 3: 2 µs per loop

In [4]: timeit idx[slice(None,-1)]
100 loops, best of 3: 6.5 ms per loop
```

Obviously, this happens because Index doesn't override the __getslice__ provided by ndarray: idx[:-1] is executed via ndarray.__getslice__ -> Index.__array_finalize__, while idx[slice(None, -1)] goes via Index.__getitem__ -> Index.__new__.

__getitem__ is ~1000x slower because it tries to infer the data type of the sliced result and convert it to a different Index subclass. The problem is that the interactive invocation idx[:-1], which is where the milliseconds-vs-microseconds difference doesn't matter, is likely to bypass this feature entirely because it's dispatched via __getslice__, while the programmatic invocation idx[slice(None, -1)] is the one that hits this soft spot. I'd argue that this type-conversion magic is not necessary there at all.
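For reference, here's a minimal Python 2 sketch of the dispatch difference (a toy class, not pandas code; it requires Python 2 since __getslice__ was removed in Python 3): the syntax obj[i:j] prefers __getslice__ when it's defined, while an explicit slice object always goes through __getitem__.

```python
# Toy Python 2 example (not pandas code) illustrating the two dispatch paths.
class Demo(object):
    def __len__(self):
        return 10

    def __getslice__(self, i, j):
        # hit by d[:-1] -- Python 2 translates the negative bound using len()
        print "__getslice__(%d, %d)" % (i, j)

    def __getitem__(self, key):
        # hit by d[slice(None, -1)] -- the slice object is passed through untouched
        print "__getitem__(%r)" % (key,)

d = Demo()
d[:-1]              # -> __getslice__(0, 9)
d[slice(None, -1)]  # -> __getitem__(slice(None, -1, None))
```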

Is there a rationale behind this?

jreback (Contributor) commented Feb 16, 2014

I think this originally had to do with some compat with ndarray (e.g. trying to preserve the interface), but since __getslice__ is deprecated even in ndarray, it seems silly to call it. Why don't you change it and see if the tests pass and you get the perf gain (you may need to add a vbench for this).
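For reference, a rough sketch of what such a vbench entry could look like, following the vb_suite Benchmark pattern (the benchmark name and setup below are placeholders, not an existing benchmark):

```python
from vbench.benchmark import Benchmark

setup = """
import pandas.util.testing as tm
idx = tm.makeStringIndex(1000000)
"""

# time the __getitem__ path that currently pays the type-inference cost
index_getitem_slice = Benchmark("idx[slice(None, -1)]", setup,
                                name='index_getitem_slice')
```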

@jreback jreback added this to the 0.14.0 milestone Feb 16, 2014
immerrr (Contributor, Author) commented Feb 17, 2014

As for __getslice__, the problem is how Python 2 is implemented: when you do obj[i:j] it first looks for obj.__getslice__(i, j), falling back to obj.__getitem__(slice(i, j)). Since ndarray provides __getslice__ (at least in 1.7), the only thing we can do to make our custom __getitem__ apply is to override __getslice__ and reroute those "basic slicing" requests.
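Something along these lines, as a rough sketch of that rerouting (a toy ndarray subclass, not the actual Index patch):

```python
import numpy as np

class MyIndex(np.ndarray):
    """Toy ndarray subclass showing the rerouting idea (not the real Index)."""

    def __getitem__(self, key):
        # single place for any subclass-specific slicing/inference logic
        return np.ndarray.__getitem__(self, key)

    def __getslice__(self, i, j):
        # Python 2 routes obj[i:j] here first; forward it to __getitem__
        # so both spellings take the same code path
        return self.__getitem__(slice(i, j))
```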

I've rewritten this a bit, but it caused some test breakage. Will look into it later.
