Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace_interleave can be optimized for single character byte strings #70761

Closed
JoshSnider mannequin opened this issue Mar 16, 2016 · 8 comments
Closed

replace_interleave can be optimized for single character byte strings #70761

JoshSnider mannequin opened this issue Mar 16, 2016 · 8 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@JoshSnider
Copy link
Mannequin

JoshSnider mannequin commented Mar 16, 2016

BPO 26574
Nosy @vstinner, @serhiy-storchaka
Files
  • bytes.patch
  • bytes-2.patch
  • bench.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-03-21.09:40:19.278>
    created_at = <Date 2016-03-16.23:28:10.591>
    labels = ['library', 'performance']
    title = 'replace_interleave can be optimized for single character byte strings'
    updated_at = <Date 2016-03-21.10:39:44.000>
    user = 'https://bugs.python.org/JoshSnider'

    bugs.python.org fields:

    activity = <Date 2016-03-21.10:39:44.000>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-03-21.09:40:19.278>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2016-03-16.23:28:10.591>
    creator = 'Josh Snider'
    dependencies = []
    files = ['42179', '42180', '42229']
    hgrepos = []
    issue_num = 26574
    keywords = ['patch']
    message_count = 8.0
    messages = ['261870', '261871', '261873', '262111', '262112', '262113', '262114', '262115']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'python-dev', 'serhiy.storchaka', 'Josh Snider']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue26574'
    versions = ['Python 3.6']

    @JoshSnider
    Copy link
    Mannequin Author

    JoshSnider mannequin commented Mar 16, 2016

    replace_interleave in Objects/bytesobject.c and Objects/bytearrayobject.c can be optimized for the special case where the interleaving byte string is a single character.

    Here's some quick results from timeit showing that it's about three times faster for the special case.
    * Before (cold start):
    >>> timeit.timeit('(b"x" * 2000000).replace(b"", b".")', number=1000)
    7.619218342995737
    * After (cold start):
    >>> timeit.timeit('(b"x" * 2000000).replace(b"", b".")', number=1000)
    2.7605581780080684

    For the non-special case, running timeit.timeit('(b"x" * 2000000).replace(b"", b".0")', number=10000) takes ~173 seconds on both versions.

    @JoshSnider JoshSnider mannequin added stdlib Python modules in the Lib dir performance Performance or resource usage labels Mar 16, 2016
    @vstinner
    Copy link
    Member

    vstinner commented Mar 16, 2016

    I reviewed your patch on Rietveld (you should get an email notification).

    @JoshSnider
    Copy link
    Mannequin Author

    JoshSnider mannequin commented Mar 17, 2016

    Addresses review comments.

    @vstinner
    Copy link
    Member

    vstinner commented Mar 21, 2016

    I wrote a microbenchmark with my benchmark.py tool.

    The patch always make bytes.replace(b'', char) and bytearray.replace(b'', char) faster even for strings of 10 bytes, the speedup on string of 1000 bytes or more is very interesting, even I never used this Python instruction :-)

    -------------+-------------+---------------
    type bytes | orig | patch
    -------------+-------------+---------------

    length=10    |  250 ns (*) |  211 ns (-15%)
    length=10**3 | 4.67 us (*) | 1.07 us (-77%)
    length=10**5 |  441 us (*) | 78.2 us (-82%)
    -------------+-------------+

    Total | 446 us (*) | 79.5 us (-82%)
    -------------+-------------+---------------

    ---------------+-------------+---------------
    type bytearray | orig | patch
    ---------------+-------------+---------------

    length=10      |  266 ns (*) |  224 ns (-16%)
    length=10**3   | 4.67 us (*) | 1.08 us (-77%)
    length=10**5   |  441 us (*) | 78.3 us (-82%)
    ---------------+-------------+

    Total | 446 us (*) | 79.6 us (-82%)
    ---------------+-------------+---------------

    ---------------+------------+---------------
    Summary | orig | patch
    ---------------+------------+---------------
    type bytes | 446 us () | 79.5 us (-82%)
    type bytearray | 446 us (
    ) | 79.6 us (-82%)
    ---------------+------------+---------------
    Total | 892 us (*) | 159 us (-82%)
    ---------------+------------+---------------

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 21, 2016

    New changeset 62e3b7af0697 by Victor Stinner in branch 'default':
    Optimize bytes.replace(b'', b'.')
    https://hg.python.org/cpython/rev/62e3b7af0697

    @vstinner
    Copy link
    Member

    vstinner commented Mar 21, 2016

    I pushed my latest patches, thanks for your contribution Josh.

    @serhiy-storchaka
    Copy link
    Member

    serhiy-storchaka commented Mar 21, 2016

    Is it worth to optimize this pretty rare special case?

    @vstinner
    Copy link
    Member

    vstinner commented Mar 21, 2016

    Is it worth to optimize this pretty rare special case?

    There was a TODO in the code, so I guess that the author wanted to write specialized code for 1-char replacement. Since the patch is short (adds 8 lines of C code), I consider that it's ok to optimize it.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants