Optimize wrapper descriptors using FASTCALL #75724

vstinner · 2017-09-21T13:25:31Z

BPO	31543
Nosy	@rhettinger, @pitrou, @vstinner, @methane, @serhiy-storchaka
PRs	[WIP] bpo-31543: Optimize wrapper descriptors #3685

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2017-11-24.01:10:38.616>
created_at = <Date 2017-09-21.13:25:31.091>
labels = ['interpreter-core', '3.7', 'performance']
title = 'Optimize wrapper descriptors using FASTCALL'
updated_at = <Date 2017-11-24.01:10:38.614>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2017-11-24.01:10:38.614>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2017-11-24.01:10:38.616>
closer = 'vstinner'
components = ['Interpreter Core']
creation = <Date 2017-09-21.13:25:31.091>
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 31543
keywords = ['patch']
message_count = 8.0
messages = ['302692', '302697', '302709', '302710', '302717', '302724', '302733', '306865']
nosy_count = 5.0
nosy_names = ['rhettinger', 'pitrou', 'vstinner', 'methane', 'serhiy.storchaka']
pr_nums = ['3685']
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue31543'
versions = ['Python 3.7']

vstinner · 2017-09-21T13:25:31Z

Attached pull request adds a fastpath for wrapper descriptors to use the FASTCALL calling convention. It's a follow up of bpo-31410 and all my work on FASTCALL during Python 3.6 and 3.7 development cycles.

Microbenchmark:

./python -m perf timeit -s 'import array; obj=array.array("b"); wrap=array.array.__len__' 'wrap(obj)'

Result:

haypo@selma$ ./python -m perf compare_to ref.json patch.json
Mean +- std dev: [ref] 59.2 ns +- 0.6 ns -> [patch] 28.2 ns +- 0.9 ns: 2.10x faster (-52%)

It removes 31 nanoseconds on such very fast C function, array_length().

Attached PR is still a work-in-progress. First I would like to know if it's worth it because working on polishing the actual code.

pitrou · 2017-09-21T15:44:16Z

First I would like to know if it's worth it

I don't know. What's the point of optimizing array.array.__len__(obj)? People usually call len(obj) for that...

vstinner · 2017-09-21T16:46:42Z

I don't know. What's the point of optimizing array.array.__len__(obj)? People usually call len(obj) for that...

Right, type.method(self) is less than common than self.method().

I looked at the stdlib. I found that the following method are called using wrapper descriptors:

object.__repr__(self)
object.__getattribute__(self, name)
_asyncio.Future.__del__(self)
int.__str__(obj)
etc.

But it seems like such calls are rare compared to other kinds of function calls.

--

By the way, _PyMethodDescr_FastCallKeywords() is only called from call_function() in Python/ceval.c. It's not used in Objects/call.c. Maybe we should use it there as well? It seems like this is a question about tracing. But maybe we can copy/paste the code from call_function()?

serhiy-storchaka · 2017-09-21T16:52:11Z

It optimizes the same cases as bpo-31410. bpo-31410 removed a half of the overhead for wrapper descriptors, Victor's patch removes the remaining half. Actually it makes calling the descriptor faster than calling the corresponding builtin! But bpo-31410 changes were simple, just few lines, and PR 3685 looks much more complex.

Are non-fast wrappers still needed? If the patch replaces the code instead of adding it this would decrease its cost. AFAIK the only descriptors that should support non-fast calling convention are __new__, __init__ and __call__.

As for practicality, this change should slightly speed up the code that directly calls __eq__, __lt__. For example classes decorated with total_ordering. Victor, can you provide microbenchmarks with more practical code?

rhettinger · 2017-09-21T19:40:37Z

This seems like a straight-forward win. I don't think we really need to see microbenchmarks before going forward with this one.

vstinner · 2017-09-21T21:09:37Z

The most complex slots are __new__, __init__ and __call__ because they
accept keywords. It's hard to design a calling convention which optimize
all cases because of the backward compatibility. The risk is to convert a
dict to args + kwnames (tuple) and then back to a dict: two expensive and
useless conversions.

It's my 3rd or 4th attempt to optimize new/init/call :-) My previous
attempt was to add new tp_fastXXX slots to types, but again thr backward
compatibilty was a major pain.
https://bugs.python.org/issue29358

Here the scope is much small and backward compatibilty shouldn't affect us.

I will try to find a way to optimize these slots as well and complete my
patch.

serhiy-storchaka · 2017-09-21T22:01:12Z

These slots are the only slots that accept keyword arguments and arbitrary number of positional parameters. Other slots can use fast calls.

vstinner · 2017-11-24T01:10:38Z

It seems like these specific descriptors are rare, so the added complexity is not worth it. I close my issue as rejected.

vstinner added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Sep 21, 2017

vstinner closed this as completed Nov 24, 2017

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize wrapper descriptors using FASTCALL #75724

Optimize wrapper descriptors using FASTCALL #75724

vstinner commented Sep 21, 2017

vstinner commented Sep 21, 2017

pitrou commented Sep 21, 2017

vstinner commented Sep 21, 2017

serhiy-storchaka commented Sep 21, 2017

rhettinger commented Sep 21, 2017

vstinner commented Sep 21, 2017

serhiy-storchaka commented Sep 21, 2017

vstinner commented Nov 24, 2017

Optimize wrapper descriptors using FASTCALL #75724

Optimize wrapper descriptors using FASTCALL #75724

Comments

vstinner commented Sep 21, 2017

vstinner commented Sep 21, 2017

pitrou commented Sep 21, 2017

vstinner commented Sep 21, 2017

serhiy-storchaka commented Sep 21, 2017

rhettinger commented Sep 21, 2017

vstinner commented Sep 21, 2017

serhiy-storchaka commented Sep 21, 2017

vstinner commented Nov 24, 2017