Add memcmp into unicode_compare for optimizing comparisons #57488

RichIsMyName · 2011-10-27T17:52:41Z

BPO	13279
Nosy	@loewis, @rhettinger, @pitrou, @scoder, @vstinner, @ashemedai, @ezio-melotti, @akheron
Files	unicode_with_memcmp.patch: Patch for unicode_compare changes, plus added tests in test_unicode.py unicode_with_memcmp_and_ucs_specialization.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2011-11-01.06:56:14.401>
created_at = <Date 2011-10-27.17:52:41.105>
labels = ['interpreter-core', 'performance']
title = 'Add memcmp into unicode_compare for optimizing comparisons'
updated_at = <Date 2011-11-01.06:56:36.519>
user = 'https://bugs.python.org/RichIsMyName'

bugs.python.org fields:

activity = <Date 2011-11-01.06:56:36.519>
actor = 'loewis'
assignee = 'none'
closed = True
closed_date = <Date 2011-11-01.06:56:14.401>
closer = 'loewis'
components = ['Interpreter Core']
creation = <Date 2011-10-27.17:52:41.105>
creator = 'RichIsMyName'
dependencies = []
files = ['23537', '23574']
hgrepos = []
issue_num = 13279
keywords = ['patch']
message_count = 12.0
messages = ['146508', '146511', '146540', '146541', '146730', '146732', '146733', '146742', '146744', '146747', '146755', '146759']
nosy_count = 9.0
nosy_names = ['loewis', 'rhettinger', 'pitrou', 'scoder', 'vstinner', 'asmodai', 'ezio.melotti', 'petri.lehtinen', 'RichIsMyName']
pr_nums = []
priority = 'normal'
resolution = 'wont fix'
stage = 'patch review'
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue13279'
versions = ['Python 3.3']

RichIsMyName · 2011-10-27T17:52:41Z

In discussions of memcmp performance, (http://www.picklingtools.com/study.pdf)
it was noted how well Python 2.7 can take advantage of faster memcmps (indeed, the rich comparisons are all memcmp calls).
There have been some discussion on python-dev@python.org as well.

With unicode and Python 3.3 (and anyPython 3.x) there are a
few places we could call memcmp to make string comparisons faster, but they are not completely trivial.

Basically, if the unicode strings are "1 byte kind", then memcmp can be used almost as is. If the unicode strings are the same kind, they can at least use memcmp to compare for equality or inequality.

There is also a minor optimization laying in unicode_compare: if you
are comparing two strings for equality/inequality, there is no reason to look at the entire string if the lengths are different.

These 3 minor optimizations can make unicode_compare faster.

RichIsMyName · 2011-10-27T17:58:54Z

This is a potential patch:
I believe it follows the C-style of PEP-7

There is a test as well, testing 1 and 2 byte kinds.

I have run it through the python tests and have added no new breakages
(there were some tests that failed, but they failed with and without the patch)

vstinner · 2011-10-28T08:52:45Z

I would be nice to have a third path for inegality with kind1==kind2, something like:

else if (kind1 == PyUnicode_2BYTE_KIND && kind2 == PyUnicode_2BYTE_KIND)
{
/* use Py_UCS2* pointers */
}
else if (kind1 == PyUnicode_4BYTE_KIND && kind2 == PyUnicode_4BYTE_KIND)
{
/* use Py_UCS4* pointers */
}

Inegality comparaisons are used to sort Unicode lists for example.

vstinner · 2011-10-28T08:54:09Z

These 3 minor optimizations can make unicode_compare faster.

Can you please try to write a short benchmark script? (or just run a benchmark using ./python -m timeit)

RichIsMyName · 2011-10-31T19:00:00Z

Here's a test demonstrating the memcmp optimization effect:
-----------------------------------------------------------
more ~/PickTest5/Python/string_test3.py

a = []
b = []
c = []
d = []
for x in range(0,1000) :
    a.append("the quick brown fox"+str(x))
    b.append("the wuick brown fox"+str(x))
    c.append("the quick brown fox"+str(x))
    d.append("the wuick brown fox"+str(x))

count = 0
for x in range(0,200000) :
    if a==c : count += 1
    if a==c : count += 2
    if a==d : count += 3
    if b==c : count += 5
    if b==d : count += 7
    if c==d : count += 11

print(count)

# plain Python 3.3 no memcmp, no UCS specialization

% time ./python ~/PickTest5/Python/string_test3.py
2000000
28.918u 0.014s 0:29.02 99.6% 0+0k 0+0io 0pf+0w

% setenv CFLAGS -fno-builtin-memcmp # (reconfigure and remake)

% time ./python ~/PickTest5/Python/string_test3.py
2000000
29.783u 0.017s 0:29.88 99.6% 0+0k 0+0io 0pf+0w

-------------------------------------------------------------------------
# Python 3.3 with memcmp optimizations and UCS2/4 specialization (no CFLAGS)

% time ./python ~/PickTest5/Python/string_test3.py
2000000
37.126u 0.046s 0:37.35 99.4% 0+0k 0+0io 0pf+0w

% setenv CFLAGS -fno-builtin-memcmp # (reconfigure and remake)

% time ./python ~/PickTest5/Python/string_test3.py
2000000
18.621u 0.013s 0:18.69 99.6% 0+0k 0+0io 0pf+0w

---------------------------------------------------------------------

Note we only really see the effect if we make sure that gcc
isn't emitting its "special" memcmp: that's why the -fno-builtin-memcmp
is SO important on gcc builds!
See http://www.picklingtools.com/study.pdf

I am currently working with Bob Arendt (a colleague who works
frequently with Fedora) to try to put the
-fno-builtin-memcmp in the .spec file for their Python

pitrou · 2011-10-31T19:03:37Z

I am currently working with Bob Arendt (a colleague who works
frequently with Fedora) to try to put the
-fno-builtin-memcmp in the .spec file for their Python

Wouldn't it be better to have it enabled by default in their gcc?

RichIsMyName · 2011-10-31T19:03:47Z

Added branches for specializing for UCS2 and UCS4 types

loewis · 2011-10-31T20:49:33Z

Note we only really see the effect if we make sure that gcc
isn't emitting its "special" memcmp: that's why the -fno-builtin-memcmp
is SO important on gcc builds!

I'd rather infer the opposite: given how GCC generates code, this patch
is not worthwhile. Given that it actually slows down Python in the
default system configuration, I'm -1 on applying it. This is really not
a route we should take; it leads to maintenance pain.

rhettinger · 2011-10-31T20:56:52Z

I concur with Martin.

RichIsMyName · 2011-10-31T22:30:03Z

Some more information:

Bob Arendt and I have been playing with the Fedora Core .spec file
for python on Fedora Core 15:
the compile options we found seem to automatically (as we did non invoke
this option) invoke '-fno-builtin-memcmp' somehow? We disassembled the
Python binary we built for the machine ourselves (via the spec file)
code and saw, yes, it was calling memcmp on the system, even though we
didn't bypass it explicitly.

Our conjecture is that the -m32 or -mtune=atom automatically turns the builtin memcmp off, but we're not sure (we're still playing with it).

However, perhaps Fedora builds on a more generic machine: that generic build keeps the 'rep cmbsb'?

Frustrating.

pitrou · 2011-10-31T23:47:44Z

> Note we only really see the effect if we make sure that gcc
> isn't emitting its "special" memcmp: that's why the -fno-builtin-memcmp
> is SO important on gcc builds!

I'd rather infer the opposite: given how GCC generates code, this patch
is not worthwhile. Given that it actually slows down Python in the
default system configuration, I'm -1 on applying it. This is really not
a route we should take; it leads to maintenance pain.

I agree with Martin. This patch would be very nice if there wasn't the
memcmp() perf problem. A possible solution would be to write a simple
optimized memcmp()-alike for our own purposes.
But I'm not sure it's really worth the hassle: comparing long unicode
strings doesn't strike me as a very common operation. Finding a short
substring, conatenating strings together, are all much more common.

loewis · 2011-11-01T06:56:36Z

Ok, closing the issue.

RichIsMyName mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Oct 27, 2011

RichIsMyName mannequin changed the title ~~Add memcmp into unicode_compare for optimizing compares~~ Add memcmp into unicode_compare for optimizing comparisons Oct 27, 2011

loewis mannequin changed the title ~~Add memcmp into unicode_compare for optimizing comparisons~~ Add memcmp into unicode_compare for optimizing comparisons Oct 31, 2011

pitrou changed the title ~~Add memcmp into unicode_compare for optimizing comparisons~~ Add memcmp into unicode_compare for optimizing comparisons Oct 31, 2011

loewis mannequin closed this as completed Nov 1, 2011

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add memcmp into unicode_compare for optimizing comparisons #57488

Add memcmp into unicode_compare for optimizing comparisons #57488

RichIsMyName mannequin commented Oct 27, 2011

RichIsMyName mannequin commented Oct 27, 2011

RichIsMyName mannequin commented Oct 27, 2011

vstinner commented Oct 28, 2011

vstinner commented Oct 28, 2011

RichIsMyName mannequin commented Oct 31, 2011

pitrou commented Oct 31, 2011

RichIsMyName mannequin commented Oct 31, 2011

loewis mannequin commented Oct 31, 2011

rhettinger commented Oct 31, 2011

RichIsMyName mannequin commented Oct 31, 2011

pitrou commented Oct 31, 2011

loewis mannequin commented Nov 1, 2011

Add memcmp into unicode_compare for optimizing comparisons #57488

Add memcmp into unicode_compare for optimizing comparisons #57488

Comments

RichIsMyName mannequin commented Oct 27, 2011

RichIsMyName mannequin commented Oct 27, 2011

RichIsMyName mannequin commented Oct 27, 2011

vstinner commented Oct 28, 2011

vstinner commented Oct 28, 2011

RichIsMyName mannequin commented Oct 31, 2011

pitrou commented Oct 31, 2011

RichIsMyName mannequin commented Oct 31, 2011

loewis mannequin commented Oct 31, 2011

rhettinger commented Oct 31, 2011

RichIsMyName mannequin commented Oct 31, 2011

pitrou commented Oct 31, 2011

loewis mannequin commented Nov 1, 2011