Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add memcmp into unicode_compare for optimizing comparisons #57488

Closed
RichIsMyName mannequin opened this issue Oct 27, 2011 · 12 comments
Closed

Add memcmp into unicode_compare for optimizing comparisons #57488

RichIsMyName mannequin opened this issue Oct 27, 2011 · 12 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@RichIsMyName
Copy link
Mannequin

RichIsMyName mannequin commented Oct 27, 2011

BPO 13279
Nosy @loewis, @rhettinger, @pitrou, @scoder, @vstinner, @ashemedai, @ezio-melotti, @akheron
Files
  • unicode_with_memcmp.patch: Patch for unicode_compare changes, plus added tests in test_unicode.py
  • unicode_with_memcmp_and_ucs_specialization.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2011-11-01.06:56:14.401>
    created_at = <Date 2011-10-27.17:52:41.105>
    labels = ['interpreter-core', 'performance']
    title = 'Add memcmp into unicode_compare for optimizing comparisons'
    updated_at = <Date 2011-11-01.06:56:36.519>
    user = 'https://bugs.python.org/RichIsMyName'

    bugs.python.org fields:

    activity = <Date 2011-11-01.06:56:36.519>
    actor = 'loewis'
    assignee = 'none'
    closed = True
    closed_date = <Date 2011-11-01.06:56:14.401>
    closer = 'loewis'
    components = ['Interpreter Core']
    creation = <Date 2011-10-27.17:52:41.105>
    creator = 'RichIsMyName'
    dependencies = []
    files = ['23537', '23574']
    hgrepos = []
    issue_num = 13279
    keywords = ['patch']
    message_count = 12.0
    messages = ['146508', '146511', '146540', '146541', '146730', '146732', '146733', '146742', '146744', '146747', '146755', '146759']
    nosy_count = 9.0
    nosy_names = ['loewis', 'rhettinger', 'pitrou', 'scoder', 'vstinner', 'asmodai', 'ezio.melotti', 'petri.lehtinen', 'RichIsMyName']
    pr_nums = []
    priority = 'normal'
    resolution = 'wont fix'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue13279'
    versions = ['Python 3.3']

    @RichIsMyName
    Copy link
    Mannequin Author

    RichIsMyName mannequin commented Oct 27, 2011

    In discussions of memcmp performance, (http://www.picklingtools.com/study.pdf)
    it was noted how well Python 2.7 can take advantage of faster memcmps (indeed, the rich comparisons are all memcmp calls).
    There have been some discussion on python-dev@python.org as well.

    With unicode and Python 3.3 (and anyPython 3.x) there are a
    few places we could call memcmp to make string comparisons faster, but they are not completely trivial.

    Basically, if the unicode strings are "1 byte kind", then memcmp can be used almost as is. If the unicode strings are the same kind, they can at least use memcmp to compare for equality or inequality.

    There is also a minor optimization laying in unicode_compare: if you
    are comparing two strings for equality/inequality, there is no reason to look at the entire string if the lengths are different.

    These 3 minor optimizations can make unicode_compare faster.

    @RichIsMyName RichIsMyName mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Oct 27, 2011
    @RichIsMyName
    Copy link
    Mannequin Author

    RichIsMyName mannequin commented Oct 27, 2011

    This is a potential patch:
    I believe it follows the C-style of PEP-7

    There is a test as well, testing 1 and 2 byte kinds.

    I have run it through the python tests and have added no new breakages
    (there were some tests that failed, but they failed with and without the patch)

    @RichIsMyName RichIsMyName mannequin changed the title Add memcmp into unicode_compare for optimizing compares Add memcmp into unicode_compare for optimizing comparisons Oct 27, 2011
    @vstinner
    Copy link
    Member

    I would be nice to have a third path for inegality with kind1==kind2, something like:

    else if (kind1 == PyUnicode_2BYTE_KIND && kind2 == PyUnicode_2BYTE_KIND)
    {
    /* use Py_UCS2* pointers */
    }
    else if (kind1 == PyUnicode_4BYTE_KIND && kind2 == PyUnicode_4BYTE_KIND)
    {
    /* use Py_UCS4* pointers */
    }

    Inegality comparaisons are used to sort Unicode lists for example.

    @vstinner
    Copy link
    Member

    These 3 minor optimizations can make unicode_compare faster.

    Can you please try to write a short benchmark script? (or just run a benchmark using ./python -m timeit)

    @RichIsMyName
    Copy link
    Mannequin Author

    RichIsMyName mannequin commented Oct 31, 2011

    Here's a test demonstrating the memcmp optimization effect:
    -----------------------------------------------------------
    more ~/PickTest5/Python/string_test3.py

    a = []
    b = []
    c = []
    d = []
    for x in range(0,1000) :
        a.append("the quick brown fox"+str(x))
        b.append("the wuick brown fox"+str(x))
        c.append("the quick brown fox"+str(x))
        d.append("the wuick brown fox"+str(x))
    
    count = 0
    for x in range(0,200000) :
        if a==c : count += 1
        if a==c : count += 2
        if a==d : count += 3
        if b==c : count += 5
        if b==d : count += 7
        if c==d : count += 11
    
    print(count)

    # plain Python 3.3 no memcmp, no UCS specialization

    % time ./python ~/PickTest5/Python/string_test3.py
    2000000
    28.918u 0.014s 0:29.02 99.6% 0+0k 0+0io 0pf+0w

    % setenv CFLAGS -fno-builtin-memcmp # (reconfigure and remake)

    % time ./python ~/PickTest5/Python/string_test3.py
    2000000
    29.783u 0.017s 0:29.88 99.6% 0+0k 0+0io 0pf+0w

    -------------------------------------------------------------------------
    # Python 3.3 with memcmp optimizations and UCS2/4 specialization (no CFLAGS)

    % time ./python ~/PickTest5/Python/string_test3.py
    2000000
    37.126u 0.046s 0:37.35 99.4% 0+0k 0+0io 0pf+0w

    % setenv CFLAGS -fno-builtin-memcmp # (reconfigure and remake)

    % time ./python ~/PickTest5/Python/string_test3.py
    2000000
    18.621u 0.013s 0:18.69 99.6% 0+0k 0+0io 0pf+0w

    ---------------------------------------------------------------------

    Note we only really see the effect if we make sure that gcc
    isn't emitting its "special" memcmp: that's why the -fno-builtin-memcmp
    is SO important on gcc builds!
    See http://www.picklingtools.com/study.pdf

    I am currently working with Bob Arendt (a colleague who works
    frequently with Fedora) to try to put the
    -fno-builtin-memcmp in the .spec file for their Python

    @pitrou
    Copy link
    Member

    pitrou commented Oct 31, 2011

    I am currently working with Bob Arendt (a colleague who works
    frequently with Fedora) to try to put the
    -fno-builtin-memcmp in the .spec file for their Python

    Wouldn't it be better to have it enabled by default in their gcc?

    @RichIsMyName
    Copy link
    Mannequin Author

    RichIsMyName mannequin commented Oct 31, 2011

    Added branches for specializing for UCS2 and UCS4 types

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Oct 31, 2011

    Note we only really see the effect if we make sure that gcc
    isn't emitting its "special" memcmp: that's why the -fno-builtin-memcmp
    is SO important on gcc builds!

    I'd rather infer the opposite: given how GCC generates code, this patch
    is not worthwhile. Given that it actually slows down Python in the
    default system configuration, I'm -1 on applying it. This is really not
    a route we should take; it leads to maintenance pain.

    @loewis loewis mannequin changed the title Add memcmp into unicode_compare for optimizing comparisons Add memcmp into unicode_compare for optimizing comparisons Oct 31, 2011
    @rhettinger
    Copy link
    Contributor

    I concur with Martin.

    @RichIsMyName
    Copy link
    Mannequin Author

    RichIsMyName mannequin commented Oct 31, 2011

    Some more information:

    Bob Arendt and I have been playing with the Fedora Core .spec file
    for python on Fedora Core 15:
    the compile options we found seem to automatically (as we did non invoke
    this option) invoke '-fno-builtin-memcmp' somehow? We disassembled the
    Python binary we built for the machine ourselves (via the spec file)
    code and saw, yes, it was calling memcmp on the system, even though we
    didn't bypass it explicitly.

    Our conjecture is that the -m32 or -mtune=atom automatically turns the builtin memcmp off, but we're not sure (we're still playing with it).

    However, perhaps Fedora builds on a more generic machine: that generic build keeps the 'rep cmbsb'?

    Frustrating.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 31, 2011

    > Note we only really see the effect if we make sure that gcc
    > isn't emitting its "special" memcmp: that's why the -fno-builtin-memcmp
    > is SO important on gcc builds!

    I'd rather infer the opposite: given how GCC generates code, this patch
    is not worthwhile. Given that it actually slows down Python in the
    default system configuration, I'm -1 on applying it. This is really not
    a route we should take; it leads to maintenance pain.

    I agree with Martin. This patch would be very nice if there wasn't the
    memcmp() perf problem. A possible solution would be to write a simple
    optimized memcmp()-alike for our own purposes.
    But I'm not sure it's really worth the hassle: comparing long unicode
    strings doesn't strike me as a very common operation. Finding a short
    substring, conatenating strings together, are all much more common.

    @pitrou pitrou changed the title Add memcmp into unicode_compare for optimizing comparisons Add memcmp into unicode_compare for optimizing comparisons Oct 31, 2011
    @loewis loewis mannequin closed this as completed Nov 1, 2011
    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Nov 1, 2011

    Ok, closing the issue.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants