
Speed up slice object processing #63013

Closed
scoder opened this issue Aug 22, 2013 · 16 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@scoder (Contributor) commented Aug 22, 2013

BPO 18813
Nosy @mdickinson, @pitrou, @scoder, @serhiy-storchaka
Files
  • faster_PyEval_SliceIndex.patch
  • use_locals_in_PySlice_GetIndicesEx.patch
  • cache_slice_indices.patch
  • cache_slice_indices.patch: updated patch that normalises C step=1 to Python step=None
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2015-01-29.07:12:55.113>
    created_at = <Date 2013-08-22.20:52:51.965>
    labels = ['interpreter-core', 'performance']
    title = 'Speed up slice object processing'
    updated_at = <Date 2015-01-29.07:12:55.112>
    user = 'https://github.com/scoder'

    bugs.python.org fields:

    activity = <Date 2015-01-29.07:12:55.112>
    actor = 'scoder'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-01-29.07:12:55.113>
    closer = 'scoder'
    components = ['Interpreter Core']
    creation = <Date 2013-08-22.20:52:51.965>
    creator = 'scoder'
    dependencies = []
    files = ['31422', '31423', '31431', '37205']
    hgrepos = []
    issue_num = 18813
    keywords = ['patch']
    message_count = 16.0
    messages = ['195918', '195919', '195920', '195922', '195925', '195944', '231215', '231219', '231235', '231237', '231238', '231244', '231247', '231256', '234884', '234949']
    nosy_count = 4.0
    nosy_names = ['mark.dickinson', 'pitrou', 'scoder', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'rejected'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue18813'
    versions = ['Python 3.5']

    @scoder (Contributor Author) commented Aug 22, 2013

    The get/set/delitem slicing protocol has replaced the old Py2.x get/set/delslice protocol in Py3.x. This change introduces a substantial overhead due to processing indices as Python objects rather than plain Py_ssize_t values. This overhead should be reduced as much as possible.

    This ticket can be seen as a more general follow-up to bpo-10227.
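    To illustrate the protocol change this refers to: in Python 3, both simple and extended subscripting go through `__getitem__` with a slice object, so the bounds arrive as Python objects rather than plain `Py_ssize_t` values. A minimal sketch:

```python
# Sketch: in Python 3 there is no __getslice__ fast path; simple and
# extended slicing both call __getitem__ with a slice object.
class Probe:
    def __getitem__(self, key):
        return key  # return the key so we can inspect what was passed

p = Probe()
print(p[1:3])   # slice(1, 3, None) -- the indices are Python objects
print(p[::2])   # slice(None, None, 2)
print(p[5])     # 5 -- plain indexing passes the bare index
```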

    @scoder scoder added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Aug 22, 2013
    @scoder (Contributor Author) commented Aug 22, 2013

    This tiny patch adds a fast-path to _PyEval_SliceIndex() that speeds up the slicing-heavy "fannkuch" benchmark by 8% for me.
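    For orientation (the patch itself is attached above), `_PyEval_SliceIndex()` converts one slice component to a `Py_ssize_t`, falling back to `__index__` for non-int objects. A rough Python rendering of that logic — names and structure are illustrative, not the actual patch:

```python
def slice_index(v, default):
    """Rough Python model of _PyEval_SliceIndex: convert a slice
    component to an integer, treating None as 'use the default'."""
    if v is None:
        return default
    if type(v) is int:       # fast path: exact int, skip the protocol call
        return v
    return v.__index__()     # general path for index-like objects

print(slice_index(None, 0))  # 0 -- missing bound falls back to the default
print(slice_index(7, 0))     # 7 -- exact int taken directly
print(slice_index(True, 0))  # 1 -- non-exact types go through __index__
```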

    @scoder (Contributor Author) commented Aug 22, 2013

    Sorry, broken patch. Here's a new one (same results).

    @scoder (Contributor Author) commented Aug 22, 2013

    Another patch, originally proposed in bpo-10227 by Kristján Valur Jónsson. It uses local variables instead of pointers in PySlice_GetIndicesEx(). I see no performance difference whatsoever (Linux/gcc 4.7), but also I can't see a reason not to do this. I think it cleans up the code a bit.
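    For reference, the clamping and defaulting that `PySlice_GetIndicesEx()` performs at the C level is exposed at the Python level as `slice.indices()`:

```python
# slice.indices(length) mirrors PySlice_GetIndicesEx(): it resolves None
# defaults and clamps the bounds to a sequence of the given length.
print(slice(None, None, -1).indices(5))  # (4, -1, -1): full reverse traversal
print(slice(2, 100).indices(5))          # (2, 5, 1): stop clamped to length
```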

    @scoder (Contributor Author) commented Aug 22, 2013

    And in fact, callgrind shows an *increase* in the number of instructions executed for the "use locals" patch (898M with vs. 846M without that patch when running the fannkuch benchmark twice). That speaks against making that change, I guess.

    @scoder (Contributor Author) commented Aug 23, 2013

    Here is another patch that remembers the Py_ssize_t slice indices if they are known at instantiation time. It only makes a very small difference for the "fannkuch" benchmark, so that's no reason to add both the complexity and the (IMHO ignorable) memory overhead.

    However, it also adds a public C-API function PySlice_FromIndices() that allows setting the values from C code at instantiation time, thus avoiding the overhead of having to do the conversion back again.

    It might also be worth exploring if we can't instantiate the Python object indices at first request using properties, iff the slice was created with integer indices. Meaning, we'd leave the PyObject* fields NULL in that case until they are being used. That would reduce the overhead of creating the slice object in the first place. Since we added the one-instance cache, it's now dominated by the creation of the index value objects when _PySlice_FromIndices() is being used internally (e.g. when calling PySequence_Get/Set/DelSlice()).
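    The lazy-instantiation idea could be modelled in Python as a slice-like object that stores plain integers and exposes start/stop/step as properties, so no Python attribute objects need to exist unless they are actually read. This is a sketch of the idea only, not CPython's implementation:

```python
class LazySlice:
    """Sketch: keep C-style integer indices and expose start/stop/step
    as properties computed on demand."""
    __slots__ = ("_start", "_stop", "_step")

    def __init__(self, start, stop, step=1):
        self._start, self._stop, self._step = start, stop, step

    @property
    def start(self):
        return self._start

    @property
    def stop(self):
        return self._stop

    @property
    def step(self):
        # mirror the updated patch's normalisation: C step 1 -> Python None
        return None if self._step == 1 else self._step

s = LazySlice(0, 10)
print((s.start, s.stop, s.step))  # (0, 10, None)
```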

    @serhiy-storchaka (Member) commented:

    Are there any benchmarks?

    @scoder (Contributor Author) commented Nov 15, 2014

    As mentioned, the fannkuch benchmark benefits quite a bit from the fast path: by 8%. It was a while ago that I tested, though, so I don't have exact numbers at the moment.

    @scoder (Contributor Author) commented Nov 16, 2014

    Here's another idea. There could be two implementations of "slice", one that uses Python object indices (as currently) and one that has Py_ssize_t indices (and properties for the start/stop/step attributes). Similar to what Unicode strings do, the slice type would decide which to use at instantiation time.

    The internals of PySliceObject are not part of the stable ABI, and the normal way to figure out the indices is PySlice_GetIndicesEx(), which would be adapted accordingly.

    This breaks compatibility with some C extensions that access the slice attributes directly, but that should be rare, given how complex index calculations are.

    This change is certainly more invasive than adding Py_ssize_t fields to the existing object, though.

    @scoder (Contributor Author) commented Nov 16, 2014

    Thanks for the review, Serhiy. There isn't really a reason not to 'normalise' the Python step object to None when 1 is passed from C code, so I updated the patch to do that.

    Regarding on-demand creation of the Python values, the problem is that C code that accesses the struct fields directly would not be prepared to find NULL values in them, so this would be a visible change and potential source of crashes. However, my guess is that this would rarely be a problem (see my last comment on changing the slice type internally).

    @serhiy-storchaka (Member) commented:

    There could be two implementations of "slice", one that
    uses Python object indices (as currently) and one that has Py_ssize_t
    indices (and properties for the start/stop/step attributes).

    I agree, this idea LGTM. A single boolean flag should be enough to switch
    between implementations. This shouldn't break well-written code. The
    PySliceObject structure is not part of the stable API.

    @scoder (Contributor Author) commented Nov 16, 2014

    I reran the benchmarks with the fast path in _PyEval_SliceIndex() and the results are not conclusive. There is no clear indication that it improves the performance.

    The problem is that most slices have no step, many slices are open ended on at least one side (i.e. usually have one or two fields set to None) and they are only used once, so integer index optimisations can only have a limited impact compared to instantiating the slice object in the first place.

    @serhiy-storchaka (Member) commented:

    Indeed, PySequence_GetSlice() is used in only a few performance-non-critical
    places in the stdlib, and PySequence_Set/DelSlice() are not used at all. So it
    looks as if this approach doesn't speed up any code.

    As for faster_PyEval_SliceIndex.patch, see bpo-17576. Maybe such an
    optimization is not always correct (only for exact PyLong). And why do you
    take the fast path only if -1 <= Py_SIZE(v) <= 1?

    @scoder (Contributor Author) commented Nov 16, 2014

    why do you take the fast path only if -1 <= Py_SIZE(v) <= 1?

    It's usually enough, it's the fastest fast path, and it doesn't even need
    error handling.

    Any slicing where the slice calculation time matters is going to be of
    fairly limited size, often involving the values (!) 1 or -1 in the slice
    object fields, but definitely small integers.
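    For context on that check: Py_SIZE of a PyLong is its signed internal digit count, so -1 <= Py_SIZE(v) <= 1 covers exactly zero and the ints that fit in one digit — on a typical build with 30-bit digits, absolute values below 2**30. A hedged Python-level model of the condition (SHIFT and the function name are illustrative):

```python
SHIFT = 30  # PyLong digits are 30 bits wide on typical 64-bit CPython builds

def fits_one_digit(v):
    """Model of the -1 <= Py_SIZE(v) <= 1 test: true when the int needs
    at most one internal digit (or is zero)."""
    return -(1 << SHIFT) < v < (1 << SHIFT)

print(fits_one_digit(1), fits_one_digit(-1))  # small slice bounds qualify
print(fits_one_digit(1 << 30))                # first value needing two digits
```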

    @serhiy-storchaka (Member) commented:

    So maybe we should close this issue, as it doesn't affect performance?

    @scoder (Contributor Author) commented Jan 29, 2015

    Closing as not worth doing.

    @scoder scoder closed this as completed Jan 29, 2015
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022