
Speed up slice object processing #63013

Closed
scoder opened this issue Aug 22, 2013 · 16 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@scoder (Contributor) commented Aug 22, 2013

BPO 18813
Nosy @mdickinson, @pitrou, @scoder, @serhiy-storchaka
Files
  • faster_PyEval_SliceIndex.patch
  • use_locals_in_PySlice_GetIndicesEx.patch
  • cache_slice_indices.patch
  • cache_slice_indices.patch: updated patch that normalises C step=1 to Python step=None
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2015-01-29.07:12:55.113>
    created_at = <Date 2013-08-22.20:52:51.965>
    labels = ['interpreter-core', 'performance']
    title = 'Speed up slice object processing'
    updated_at = <Date 2015-01-29.07:12:55.112>
    user = 'https://github.com/scoder'

    bugs.python.org fields:

    activity = <Date 2015-01-29.07:12:55.112>
    actor = 'scoder'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-01-29.07:12:55.113>
    closer = 'scoder'
    components = ['Interpreter Core']
    creation = <Date 2013-08-22.20:52:51.965>
    creator = 'scoder'
    dependencies = []
    files = ['31422', '31423', '31431', '37205']
    hgrepos = []
    issue_num = 18813
    keywords = ['patch']
    message_count = 16.0
    messages = ['195918', '195919', '195920', '195922', '195925', '195944', '231215', '231219', '231235', '231237', '231238', '231244', '231247', '231256', '234884', '234949']
    nosy_count = 4.0
    nosy_names = ['mark.dickinson', 'pitrou', 'scoder', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'rejected'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue18813'
    versions = ['Python 3.5']

    @scoder (Contributor Author) commented Aug 22, 2013

    The get/set/delitem slicing protocol has replaced the old Py2.x get/set/delslice protocol in Py3.x. This change introduces a substantial overhead due to processing indices as Python objects rather than plain Py_ssize_t values. This overhead should be reduced as much as possible.

    This ticket can be seen as a more general follow-up to bpo-10227.
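    To illustrate the protocol change this refers to: in Python 3, both simple and extended subscripting go through `__getitem__` with a slice object, so the bounds arrive as Python objects rather than plain `Py_ssize_t` values. A minimal sketch:

```python
# Sketch: in Python 3 there is no __getslice__ fast path; simple and
# extended slicing both call __getitem__ with a slice object.
class Probe:
    def __getitem__(self, key):
        return key  # return the key so we can inspect what was passed

p = Probe()
print(p[1:3])   # slice(1, 3, None) -- the indices are Python objects
print(p[::2])   # slice(None, None, 2)
print(p[5])     # 5 -- plain indexing passes the bare index
```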

    @scoder scoder added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Aug 22, 2013
    @scoder (Contributor Author) commented Aug 22, 2013

    This tiny patch adds a fast-path to _PyEval_SliceIndex() that speeds up the slicing-heavy "fannkuch" benchmark by 8% for me.
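    For orientation (the patch itself is attached above), `_PyEval_SliceIndex()` converts one slice component to a `Py_ssize_t`, falling back to `__index__` for non-int objects. A rough Python rendering of that logic — names and structure are illustrative, not the actual patch:

```python
def slice_index(v, default):
    """Rough Python model of _PyEval_SliceIndex: convert a slice
    component to an integer, treating None as 'use the default'."""
    if v is None:
        return default
    if type(v) is int:       # fast path: exact int, skip the protocol call
        return v
    return v.__index__()     # general path for index-like objects

print(slice_index(None, 0))  # 0 -- missing bound falls back to the default
print(slice_index(7, 0))     # 7 -- exact int taken directly
print(slice_index(True, 0))  # 1 -- non-exact types go through __index__
```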

    @scoder (Contributor Author) commented Aug 22, 2013

    Sorry, broken patch. Here's a new one (same results).

    @scoder (Contributor Author) commented Aug 22, 2013

    Another patch, originally proposed in bpo-10227 by Kristján Valur Jónsson. It uses local variables instead of pointers in PySlice_GetIndicesEx(). I see no performance difference whatsoever (Linux/gcc 4.7), but also I can't see a reason not to do this. I think it cleans up the code a bit.
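    For reference, the clamping and defaulting that `PySlice_GetIndicesEx()` performs at the C level is exposed at the Python level as `slice.indices()`:

```python
# slice.indices(length) mirrors PySlice_GetIndicesEx(): it resolves None
# defaults and clamps the bounds to a sequence of the given length.
print(slice(None, None, -1).indices(5))  # (4, -1, -1): full reverse traversal
print(slice(2, 100).indices(5))          # (2, 5, 1): stop clamped to length
```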

    @scoder (Contributor Author) commented Aug 22, 2013

    And in fact, callgrind shows an *increase* in the number of instructions executed for the "use locals" patch (898M with vs. 846M without that patch when running the fannkuch benchmark twice). That speaks against making that change, I guess.

    @scoder (Contributor Author) commented Aug 23, 2013

    Here is another patch that remembers the Py_ssize_t slice indices if they are known at instantiation time. It only makes a very small difference for the "fannkuch" benchmark, so that's no reason to add both the complexity and the (IMHO ignorable) memory overhead.

    However, it also adds a public C-API function PySlice_FromIndices() that allows setting the values from C code at instantiation time, thus avoiding the overhead of having to do the conversion back again.

    It might also be worth exploring if we can't instantiate the Python object indices at first request using properties, iff the slice was created with integer indices. Meaning, we'd leave the PyObject* fields NULL in that case until they are being used. That would reduce the overhead of creating the slice object in the first place. Since we added the one-instance cache, it's now dominated by the creation of the index value objects when _PySlice_FromIndices() is being used internally (e.g. when calling PySequence_Get/Set/DelSlice()).
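    The lazy-instantiation idea could be modelled in Python as a slice-like object that stores plain integers and exposes start/stop/step as properties, so no Python attribute objects need to exist unless they are actually read. This is a sketch of the idea only, not CPython's implementation:

```python
class LazySlice:
    """Sketch: keep C-style integer indices and expose start/stop/step
    as properties computed on demand."""
    __slots__ = ("_start", "_stop", "_step")

    def __init__(self, start, stop, step=1):
        self._start, self._stop, self._step = start, stop, step

    @property
    def start(self):
        return self._start

    @property
    def stop(self):
        return self._stop

    @property
    def step(self):
        # mirror the updated patch's normalisation: C step 1 -> Python None
        return None if self._step == 1 else self._step

s = LazySlice(0, 10)
print((s.start, s.stop, s.step))  # (0, 10, None)
```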

    @serhiy-storchaka (Member) commented:

    Are there any benchmarks?

    @scoder (Contributor Author) commented Nov 15, 2014

    As mentioned, the fannkuch benchmark benefits quite a bit from the fast path: by 8%. It was a while ago that I tested, though, so I don't have exact numbers at the moment.

    @scoder (Contributor Author) commented Nov 16, 2014

    Here's another idea. There could be two implementations of "slice", one that uses Python object indices (as currently) and one that has Py_ssize_t indices (and properties for the start/stop/step attributes). Similar to what Unicode strings do, the slice type would decide which to use at instantiation time.

    The internals of PySliceObject are not part of the stable ABI, and the normal way to figure out the indices is PySlice_GetIndicesEx(), which would be adapted accordingly.

    This breaks compatibility with some C extensions that access the slice attributes directly, but that should be rare, given how complex index calculations are.

    This change is certainly more invasive than adding Py_ssize_t fields to the existing object, though.

    @scoder (Contributor Author) commented Nov 16, 2014

    Thanks for the review, Serhiy. There isn't really a reason not to 'normalise' the Python step object to None when 1 is passed from C code, so I updated the patch to do that.

    Regarding on-demand creation of the Python values, the problem is that C code that accesses the struct fields directly would not be prepared to find NULL values in them, so this would be a visible change and potential source of crashes. However, my guess is that this would rarely be a problem (see my last comment on changing the slice type internally).

    @serhiy-storchaka (Member) commented:

    There could be two implementations of "slice", one that
    uses Python object indices (as currently) and one that has Py_ssize_t
    indices (and properties for the start/stop/step attributes).

    I agree, this idea LGTM. A single boolean flag should be enough to switch
    between implementations. This shouldn't break well-written code. The
    PySliceObject structure is not part of the stable API.

    @scoder (Contributor Author) commented Nov 16, 2014

    I reran the benchmarks with the fast path in _PyEval_SliceIndex() and the results are not conclusive. There is no clear indication that it improves the performance.

    The problem is that most slices have no step, many slices are open ended on at least one side (i.e. usually have one or two fields set to None) and they are only used once, so integer index optimisations can only have a limited impact compared to instantiating the slice object in the first place.

    @serhiy-storchaka (Member) commented:

    Indeed, PySequence_GetSlice() is used in only a few performance-non-critical
    places in the stdlib, and PySequence_Set/DelSlice() are not used at all. So it
    looks as if this approach doesn't speed up any code.

    As for faster_PyEval_SliceIndex.patch, see bpo-17576. Maybe such an
    optimization is not always correct (only for exact PyLong). And why do you
    take the fast path only if -1 <= Py_SIZE(v) <= 1?

    @scoder (Contributor Author) commented Nov 16, 2014

    why do you take the fast path only if -1 <= Py_SIZE(v) <= 1?

    It's usually enough, it's the fastest fast path, and it doesn't even need
    error handling.

    Any slicing where the slice calculation time matters is going to be of
    fairly limited size, often involving the values (!) 1 or -1 in the slice
    object fields, but definitely small integers.
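    For context on that check: Py_SIZE of a PyLong is its signed internal digit count, so -1 <= Py_SIZE(v) <= 1 covers exactly zero and the ints that fit in one digit — on a typical build with 30-bit digits, absolute values below 2**30. A hedged Python-level model of the condition (SHIFT and the function name are illustrative):

```python
SHIFT = 30  # PyLong digits are 30 bits wide on typical 64-bit CPython builds

def fits_one_digit(v):
    """Model of the -1 <= Py_SIZE(v) <= 1 test: true when the int needs
    at most one internal digit (or is zero)."""
    return -(1 << SHIFT) < v < (1 << SHIFT)

print(fits_one_digit(1), fits_one_digit(-1))  # small slice bounds qualify
print(fits_one_digit(1 << 30))                # first value needing two digits
```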

    @serhiy-storchaka (Member) commented:

    So maybe we should close this issue, as it doesn't affect performance?

    @scoder (Contributor Author) commented Jan 29, 2015

    Closing as not worth doing.

    @scoder scoder closed this as completed Jan 29, 2015
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022