
speed up function calls #43303

Closed
nnorwitz mannequin opened this issue May 1, 2006 · 12 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments


nnorwitz mannequin commented May 1, 2006

BPO 1479611
Nosy @loewis, @rhettinger, @etrepum, @abalkin, @pitrou
Files
  • speed.diff: v1
  • s2.diff: v2
  • func-speed.diff: v3
  • funcall.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2012-05-16.16:45:57.301>
    created_at = <Date 2006-05-01.06:58:24.000>
    labels = ['interpreter-core', 'performance']
    title = 'speed up function calls'
    updated_at = <Date 2012-05-16.16:45:57.299>
    user = 'https://bugs.python.org/nnorwitz'

    bugs.python.org fields:

    activity = <Date 2012-05-16.16:45:57.299>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-05-16.16:45:57.301>
    closer = 'pitrou'
    components = ['Interpreter Core']
    creation = <Date 2006-05-01.06:58:24.000>
    creator = 'nnorwitz'
    dependencies = []
    files = ['7219', '7220', '7221', '9154']
    hgrepos = []
    issue_num = 1479611
    keywords = ['patch', 'needs review']
    message_count = 12.0
    messages = ['50158', '50159', '50160', '50161', '50162', '50163', '50164', '50165', '59872', '110446', '110448', '160886']
    nosy_count = 10.0
    nosy_names = ['loewis', 'nnorwitz', 'collinwinter', 'rhettinger', 'bob.ippolito', 'belopolsky', 'pitrou', 'jyasskin', 'BreamoreBoy', 'pas']
    pr_nums = []
    priority = 'low'
    resolution = 'out of date'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue1479611'
    versions = ['Python 3.3']


    nnorwitz mannequin commented May 1, 2006

    Results: 2.86% for 1 arg (len), 11.8% for 2 args
    (min), and 1.6% for pybench.

    trunk-speed$ ./python.exe -m timeit 'for x in
    xrange(10000): len([])'
    100 loops, best of 3: 4.74 msec per loop
    trunk-speed$ ./python.exe -m timeit 'for x in
    xrange(10000): min(1,2)'
    100 loops, best of 3: 8.03 msec per loop

    trunk-clean$ ./python.exe -m timeit 'for x in
    xrange(10000): len([])'
    100 loops, best of 3: 4.88 msec per loop
    trunk-clean$ ./python.exe -m timeit 'for x in
    xrange(10000): min(1,2)'
    100 loops, best of 3: 9.09 msec per loop

    pybench goes from 5688.00 down to 5598.00

    Details about the patch:

There are two unrelated changes. They both seem to provide
equal benefits for calling varargs C functions. One is very
simple and just inlines calling a varargs C function rather
than calling PyCFunction_Call(), which does extra checks
whose results are already known. This moves meth and self
up one block and breaks the C_TRACE into two. (When
looking at the patch, this will make sense, I hope.)

The other change is more dangerous. It modifies
load_args() to hold on to tuples so they aren't repeatedly
allocated and deallocated. The initialization is done once
in the new function _PyEval_Init().

It allocates 64 tuples of size 8 that are never
deallocated. The idea is that there usually won't be more
than 64 frames with 8 or fewer parameters active on the
stack at any one time (stack depth). There are cases where
this can degenerate, but for the most part it should be
only marginally slower at worst, and generally a fair
amount faster by skipping the alloc, the dealloc, and some
extra work. Decrementing _last_index inside the
needs_free blocks could improve this behaviour.

    This really needs comments added to the code. But I'm
    not gonna get there tonight. I'd be interested in
    comments about the code.

    @nnorwitz nnorwitz mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) labels May 1, 2006
    @nnorwitz
    Copy link
    Mannequin Author

    nnorwitz mannequin commented May 1, 2006


    I should note the numbers 64 and 8 are total guesses. It
    might be good to try and determine values based on empirical
    data.


    loewis mannequin commented May 1, 2006


    The tuples should get deallocated when Py_Finalize is called.

It would be good if there were (conditional) statistical
analysis showing how often no tuple was found because the
number of arguments was too large, and how often no tuple
was found because the candidate was in use.

I think it should be more stack-like, starting off with no
tuples allocated, then returning them inside the needs_free
blocks only if the refcount is 1 (or 2?). This would avoid
degenerate cases where some function holds onto its
argument tuple indefinitely, thus consuming all 64 tuples.

    For the other part, I think it would make the code more
    readable if it inlined PyCFunction_Call even more: the test
    for NOARGS|O could be integrated into the switch statement
    (one case for each), VARARGS and VARARGS|KEYWORDS would both
    load the arguments, then call the function directly
    (possibly with NULL keywords). OLDARGS should goto either
    METH_NOARGS, METH_O, or METH_VARARGS depending on na (if you
    don't like goto, modifying flags would work as well).
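The suggested dispatch could look roughly like the following Python sketch (the `METH_*` values mirror CPython's constants; `call_cfunction` is a hypothetical stand-in for the inlined C switch):

```python
# Hypothetical Python sketch of the inlined dispatch suggested above.
# Flag values mirror CPython's METH_* constants.
METH_OLDARGS = 0x0
METH_VARARGS = 0x1
METH_KEYWORDS = 0x2
METH_NOARGS = 0x4
METH_O = 0x8

def call_cfunction(func, flags, args, kwargs=None):
    if flags == METH_NOARGS:
        return func()                 # one switch case: no arguments
    if flags == METH_O:
        return func(args[0])          # one switch case: exactly one argument
    if flags in (METH_VARARGS, METH_VARARGS | METH_KEYWORDS):
        # VARARGS and VARARGS|KEYWORDS both load the argument tuple and
        # call directly, possibly with NULL (here None) keywords.
        return func(args, kwargs)
    if flags == METH_OLDARGS:
        # OLDARGS re-dispatches to NOARGS, O, or VARARGS depending on na,
        # mirroring the suggested goto (or flag rewrite).
        na = len(args)
        if na == 0:
            return call_cfunction(func, METH_NOARGS, args)
        if na == 1:
            return call_cfunction(func, METH_O, args)
        return call_cfunction(func, METH_VARARGS, args)
    raise TypeError("unsupported calling convention")
```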


    nnorwitz mannequin commented May 5, 2006


v2 attached. You might not want to review it yet. I mostly
did the first part of your suggestion (stats, _Fini, and
making it stack-like, if I understood you correctly). I
didn't do anything on the second part about inlining
PyCFunction_Call.

Perf seems to be about the same. I'm not entirely sure the
patch is correct yet. I found one or two problems in the
original. I added some more comments.


    nnorwitz mannequin commented May 11, 2006


    This version actually works (in both normal and debug
    builds). It adds some stats which are useful and updates
    Misc/SpecialBuilds.txt.

I modified it to not preallocate, and to only hold a ref
when the function didn't keep one.

    I still need to inline more of PyCFunction_Call. Speed is
    still the same as before.

    I'm not sure if I'll finish this before the sprint next
    week. Anyone there feel free to check this in if you finish it.


    etrepum mannequin commented May 22, 2006


    The performance gain for this patch (as-is) on Mac OS X i386 with a release
    build seems totally negligible. I'm not getting any consistent win with any of the
    timeit or pybench benchmarks.


    nnorwitz mannequin commented May 23, 2006


Interesting. I did the original work for this on an amd64
(gcc 3.4, I think) and continued on a ppc Mac laptop
(gcc 4.0, I think). Both showed improvements. I assume you
tested with v3? What about v1?


    etrepum mannequin commented May 23, 2006


    This was v3 on a MacBook Pro running 10.4.6 (gcc 4, of course, since that's the
    only Apple-distributed i386 GCC for OS X).

    @tiran tiran added type-feature A feature request or enhancement labels Jan 12, 2008

    pitrou commented Jan 13, 2008

Here is a patch applicable to SVN trunk.
However, like Bob, I have mixed results with it. For
example, functions with a variable parameter count have
become slower:

    # With patch
    $ ./python -m timeit -s "def f(*x): pass" 'for x in xrange(10000): f(1)'
    100 loops, best of 3: 4.92 msec per loop
    $ ./python -m timeit -s "def f(*x): pass" 'for x in xrange(10000): f()'
    100 loops, best of 3: 4.07 msec per loop
    $ ./python -m timeit -s "def f(*x): pass" 'for x in xrange(10000): f(1,2)'
    100 loops, best of 3: 5.04 msec per loop

    # Without patch
    $ ./python-orig -m timeit -s "def f(*x): pass" 'for x in xrange(10000):
    f(1)'
    100 loops, best of 3: 4.22 msec per loop
    $ ./python-orig -m timeit -s "def f(*x): pass" 'for x in xrange(10000): f()'
    100 loops, best of 3: 3.5 msec per loop
    $ ./python-orig -m timeit -s "def f(*x): pass" 'for x in xrange(10000):
    f(1,2)'
    100 loops, best of 3: 4.46 msec per loop


    BreamoreBoy mannequin commented Jul 16, 2010

    I'm not sure if this is worth pursuing given the way performance is so often governed by networking and/or IO issues today, bearing in mind comments like msg50163 and msg59872. I'd certainly like to see more comments from core developers. Could someone in the know please put them on the nosy list.


    abalkin commented Jul 16, 2010

    I think Raymond might be interested. Since this is not a bug fix, it can only be considered for 3.x.

    @terryjreedy terryjreedy added performance Performance or resource usage and removed type-feature A feature request or enhancement labels Aug 9, 2010
    @rhettinger rhettinger removed their assignment Mar 22, 2011

    pitrou commented May 16, 2012

    Closing as terribly outdated (and not very promising).

    @pitrou pitrou closed this as completed May 16, 2012
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022