
speed up function calls #43303

Closed
nnorwitz mannequin opened this issue May 1, 2006 · 12 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments


nnorwitz mannequin commented May 1, 2006

BPO 1479611
Nosy @loewis, @rhettinger, @etrepum, @abalkin, @pitrou
Files
  • speed.diff: v1
  • s2.diff: v2
  • func-speed.diff: v3
  • funcall.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2012-05-16.16:45:57.301>
    created_at = <Date 2006-05-01.06:58:24.000>
    labels = ['interpreter-core', 'performance']
    title = 'speed up function calls'
    updated_at = <Date 2012-05-16.16:45:57.299>
    user = 'https://bugs.python.org/nnorwitz'

    bugs.python.org fields:

    activity = <Date 2012-05-16.16:45:57.299>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-05-16.16:45:57.301>
    closer = 'pitrou'
    components = ['Interpreter Core']
    creation = <Date 2006-05-01.06:58:24.000>
    creator = 'nnorwitz'
    dependencies = []
    files = ['7219', '7220', '7221', '9154']
    hgrepos = []
    issue_num = 1479611
    keywords = ['patch', 'needs review']
    message_count = 12.0
    messages = ['50158', '50159', '50160', '50161', '50162', '50163', '50164', '50165', '59872', '110446', '110448', '160886']
    nosy_count = 10.0
    nosy_names = ['loewis', 'nnorwitz', 'collinwinter', 'rhettinger', 'bob.ippolito', 'belopolsky', 'pitrou', 'jyasskin', 'BreamoreBoy', 'pas']
    pr_nums = []
    priority = 'low'
    resolution = 'out of date'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue1479611'
    versions = ['Python 3.3']


    nnorwitz mannequin commented May 1, 2006

    Results: 2.86% for 1 arg (len), 11.8% for 2 args
    (min), and 1.6% for pybench.

    trunk-speed$ ./python.exe -m timeit 'for x in
    xrange(10000): len([])'
    100 loops, best of 3: 4.74 msec per loop
    trunk-speed$ ./python.exe -m timeit 'for x in
    xrange(10000): min(1,2)'
    100 loops, best of 3: 8.03 msec per loop

    trunk-clean$ ./python.exe -m timeit 'for x in
    xrange(10000): len([])'
    100 loops, best of 3: 4.88 msec per loop
    trunk-clean$ ./python.exe -m timeit 'for x in
    xrange(10000): min(1,2)'
    100 loops, best of 3: 9.09 msec per loop

    pybench goes from 5688.00 down to 5598.00

    Details about the patch:

There are two unrelated changes. They both seem to provide
equal benefits for calling varargs C functions. One is very
simple and just inlines calling a varargs C function rather
than calling PyCFunction_Call(), which does extra checks
whose results are already known. This moves meth and self
up one block and breaks the C_TRACE into two. (When
looking at the patch, this will make sense, I hope.)

The other change is more dangerous. It modifies
load_args() to hold on to tuples so they aren't repeatedly
allocated and deallocated. The initialization is done once
in the new function _PyEval_Init().

It allocates 64 tuples of size 8 that are never
deallocated. The idea is that there usually won't be more
than 64 frames with 8 or fewer parameters active on the
stack at any one time (stack depth). There are cases where
this can degenerate, but for the most part it should be
only marginally slower at worst, and generally a fair
amount faster by skipping the alloc, the dealloc, and some
extra work. Decrementing _last_index inside the
needs_free blocks could improve this behaviour.

    This really needs comments added to the code. But I'm
    not gonna get there tonight. I'd be interested in
    comments about the code.

    @nnorwitz nnorwitz mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) labels May 1, 2006
    @nnorwitz
    Copy link
    Mannequin Author

    nnorwitz mannequin commented May 1, 2006


    I should note the numbers 64 and 8 are total guesses. It
    might be good to try and determine values based on empirical
    data.


    loewis mannequin commented May 1, 2006


    The tuples should get deallocated when Py_Finalize is called.

It would be good if there were (conditional) statistical
analysis showing how often no tuple was found because the
number of arguments was too large, and how often no tuple
was found because the candidate was in use.

I think it should be more stack-like, starting off with no
tuples allocated, then returning them inside the needs_free
blocks only if the refcount is 1 (or 2?). This would avoid
degenerate cases where some function holds onto its
argument tuple indefinitely, thus consuming all 64 tuples.

    For the other part, I think it would make the code more
    readable if it inlined PyCFunction_Call even more: the test
    for NOARGS|O could be integrated into the switch statement
    (one case for each), VARARGS and VARARGS|KEYWORDS would both
    load the arguments, then call the function directly
    (possibly with NULL keywords). OLDARGS should goto either
    METH_NOARGS, METH_O, or METH_VARARGS depending on na (if you
    don't like goto, modifying flags would work as well).
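The suggested dispatch could look roughly like the following Python sketch (the `METH_*` values mirror CPython's constants; `call_cfunction` is a hypothetical stand-in for the inlined C switch):

```python
# Hypothetical Python sketch of the inlined dispatch suggested above.
# Flag values mirror CPython's METH_* constants.
METH_OLDARGS = 0x0
METH_VARARGS = 0x1
METH_KEYWORDS = 0x2
METH_NOARGS = 0x4
METH_O = 0x8

def call_cfunction(func, flags, args, kwargs=None):
    if flags == METH_NOARGS:
        return func()                 # one switch case: no arguments
    if flags == METH_O:
        return func(args[0])          # one switch case: exactly one argument
    if flags in (METH_VARARGS, METH_VARARGS | METH_KEYWORDS):
        # VARARGS and VARARGS|KEYWORDS both load the argument tuple and
        # call directly, possibly with NULL (here None) keywords.
        return func(args, kwargs)
    if flags == METH_OLDARGS:
        # OLDARGS re-dispatches to NOARGS, O, or VARARGS depending on na,
        # mirroring the suggested goto (or flag rewrite).
        na = len(args)
        if na == 0:
            return call_cfunction(func, METH_NOARGS, args)
        if na == 1:
            return call_cfunction(func, METH_O, args)
        return call_cfunction(func, METH_VARARGS, args)
    raise TypeError("unsupported calling convention")
```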


    nnorwitz mannequin commented May 5, 2006


v2 attached. You might not want to review it yet. I mostly
did the first part of your suggestion (stats, _Fini, and
making it stack-like, if I understood you correctly). I
didn't do anything on the second part about inlining
PyCFunction_Call.

Perf seems to be about the same. I'm not entirely sure the
patch is correct yet. I found one or two problems in the
original. I added some more comments.


    nnorwitz mannequin commented May 11, 2006


    This version actually works (in both normal and debug
    builds). It adds some stats which are useful and updates
    Misc/SpecialBuilds.txt.

I modified it to not preallocate, and to only hold a ref
when the function didn't keep one.

    I still need to inline more of PyCFunction_Call. Speed is
    still the same as before.

    I'm not sure if I'll finish this before the sprint next
    week. Anyone there feel free to check this in if you finish it.


    etrepum mannequin commented May 22, 2006


    The performance gain for this patch (as-is) on Mac OS X i386 with a release
    build seems totally negligible. I'm not getting any consistent win with any of the
    timeit or pybench benchmarks.


    nnorwitz mannequin commented May 23, 2006


Interesting. I did the original work for this on an amd64
(gcc 3.4, I think) and continued on a ppc Mac laptop
(gcc 4.0, I think). Both showed improvements. I assume you
tested with v3? What about v1?


    etrepum mannequin commented May 23, 2006


    This was v3 on a MacBook Pro running 10.4.6 (gcc 4, of course, since that's the
    only Apple-distributed i386 GCC for OS X).

    @tiran tiran added type-feature A feature request or enhancement labels Jan 12, 2008

    pitrou commented Jan 13, 2008

Here is a patch applicable to SVN trunk.
However, like Bob, I have mixed results with it. For
example, functions with a variable parameter count have
become slower:

    # With patch
    $ ./python -m timeit -s "def f(*x): pass" 'for x in xrange(10000): f(1)'
    100 loops, best of 3: 4.92 msec per loop
    $ ./python -m timeit -s "def f(*x): pass" 'for x in xrange(10000): f()'
    100 loops, best of 3: 4.07 msec per loop
    $ ./python -m timeit -s "def f(*x): pass" 'for x in xrange(10000): f(1,2)'
    100 loops, best of 3: 5.04 msec per loop

    # Without patch
    $ ./python-orig -m timeit -s "def f(*x): pass" 'for x in xrange(10000):
    f(1)'
    100 loops, best of 3: 4.22 msec per loop
    $ ./python-orig -m timeit -s "def f(*x): pass" 'for x in xrange(10000): f()'
    100 loops, best of 3: 3.5 msec per loop
    $ ./python-orig -m timeit -s "def f(*x): pass" 'for x in xrange(10000):
    f(1,2)'
    100 loops, best of 3: 4.46 msec per loop


    BreamoreBoy mannequin commented Jul 16, 2010

    I'm not sure if this is worth pursuing given the way performance is so often governed by networking and/or IO issues today, bearing in mind comments like msg50163 and msg59872. I'd certainly like to see more comments from core developers. Could someone in the know please put them on the nosy list.


    abalkin commented Jul 16, 2010

    I think Raymond might be interested. Since this is not a bug fix, it can only be considered for 3.x.

    @terryjreedy terryjreedy added performance Performance or resource usage and removed type-feature A feature request or enhancement labels Aug 9, 2010
    @rhettinger rhettinger removed their assignment Mar 22, 2011

    pitrou commented May 16, 2012

    Closing as terribly outdated (and not very promising).

    @pitrou pitrou closed this as completed May 16, 2012
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022