Reduce the overhead of tracing, profiling and quickening checks for calls. #112

markshannon · 2021-11-08T14:28:45Z

When entering a Python function we need to check for tracing/profiling and check to see if the function needs to be quickened.
We should be able to eliminate these checks for calls in most cases by:

Adding a START_FUNCTION instruction which does the above checks.
Quickening the START_FUNCTION to a NOP.
When specializing Python-to-Python calls, set f_lasti to 0 not -1, thus skipping the entry sequence entirely.
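The three steps above can be sketched with a toy interpreter. This is only an illustrative model, not CPython's eval loop: the opcode names `START_FUNCTION` and `NOP` come from the proposal, while `ToyCode`, `run`, `entry_checks` and the `lasti` parameter are hypothetical stand-ins.

```python
# Toy model only: the interpreter, ToyCode and run() are hypothetical.
START_FUNCTION, NOP, LOAD_CONST, RETURN_VALUE = range(4)


class ToyCode:
    def __init__(self, instructions):
        self.instructions = list(instructions)  # mutable, so it can be quickened
        self.entry_checks = 0  # how many times the slow entry checks ran


def run(code, tracing=False, lasti=-1):
    """Execute toy code; lasti is the index of the last executed instruction."""
    pc = lasti + 1  # lasti == -1 starts at instruction 0, lasti == 0 skips it
    value = None
    while True:
        op, arg = code.instructions[pc]
        pc += 1
        if op == START_FUNCTION:
            # Slow path: tracing/profiling/quickening checks would go here.
            code.entry_checks += 1
            if not tracing:
                # Quicken: later entries execute a NOP instead of the checks.
                code.instructions[pc - 1] = (NOP, None)
        elif op == NOP:
            pass
        elif op == LOAD_CONST:
            value = arg
        elif op == RETURN_VALUE:
            return value


code = ToyCode([(START_FUNCTION, None), (LOAD_CONST, 42), (RETURN_VALUE, None)])
first = run(code)            # runs the entry checks once, then quickens
second = run(code)           # entry instruction is now a NOP
third = run(code, lasti=0)   # specialized call: entry sequence skipped entirely
```

In this model the entry checks cost nothing after the first call, and a caller that already knows the callee is quickened can start past the entry instruction.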

When exiting a function, we still need to check for tracing and profiling. This can be eliminated by adding a RETURN_VALUE_QUICK bytecode that skips the checks.

markshannon · 2021-11-16T11:36:20Z

python/cpython#29575 is prep for this.

lpereira · 2021-12-01T00:59:15Z

I took a stab at implementing this. I'm not yet sure that I haven't made a silly mistake: I don't have an intuitive understanding of the whole compilation/evaluation machinery at this point, and I need to play with it a bit more to see whether the changes in quickening actually work as I expect them to.

Changes can be seen in my personal fork. (NB: There are a few commits that aren't strictly related to this change, but they're just cleaning some stuff up.)

One thing still missing at this point is skipping the START_FUNCTION instruction on Python-to-Python calls (@markshannon's third bullet point). Setting f_lasti to 0 when CALL_FUNCTION_PY_SIMPLE is being evaluated doesn't seem to be sufficient. I need to understand this in more detail.

gvanrossum · 2021-12-01T19:19:02Z

Cool to see this. The issue with setting f_lasti = 0 might be that in some cases there are some opcodes generated before START_FUNCTION, specifically related to cells (free vars and/or cell vars)? Depends of course on what symptoms you see.
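One way to check for such leading opcodes is to list what the compiler emits before the function-entry instruction. The sketch below uses the standard `dis` module and assumes a CPython version that has the RESUME instruction mentioned later in this thread. On versions without RESUME the helper simply returns every opcode name. `prefix_opnames`, `outer` and `inner` are names invented for this example.

```python
import dis


def outer():
    x = 1  # x becomes a cell variable because inner() closes over it

    def inner():
        return x  # x is a free variable here

    return inner


def prefix_opnames(func):
    """Return the opcode names that precede the first RESUME in func."""
    names = []
    for instruction in dis.get_instructions(func):
        if instruction.opname == "RESUME":
            break
        names.append(instruction.opname)
    return names


outer_prefix = prefix_opnames(outer)
inner_prefix = prefix_opnames(outer())
print(outer_prefix, inner_prefix)
```

If either list is non-empty, entering the function at a fixed offset of 0 would land on cell setup rather than on the entry check.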

gvanrossum · 2021-12-08T23:54:29Z

Should we assign this issue to @lpereira and move it to her column on the project board?

markshannon · 2022-01-18T14:31:57Z

We now have a RESUME instruction which is executed every time a Python function is entered. This is the ideal place to insert checks for tracing, rather than performing them on each dispatch.
The general idea (which underpins PEP 669 as well) is to check whether the instrumentation for a code object matches the expected value (a quick int equality test) on every RESUME.
There is no need to check when returning from calls, as we can walk the stack when sys.settrace() (or similar) is called and eagerly perform the instrumentation.
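The version check described above can be modelled as follows. This is a minimal sketch under assumed names: `InterpreterState`, `CodeState`, `set_trace` and `resume` are hypothetical and only illustrate comparing two integers on function entry.

```python
class InterpreterState:
    """Hypothetical holder for the globally expected instrumentation version."""

    def __init__(self):
        self.instrumentation_version = 0
        self.tracer = None

    def set_trace(self, tracer):
        # Changing the tracer bumps the version so stale code objects are noticed.
        self.tracer = tracer
        self.instrumentation_version += 1


class CodeState:
    """Hypothetical per-code-object record of the version it was instrumented for."""

    def __init__(self):
        self.instrumentation_version = 0
        self.times_instrumented = 0


def resume(interp, code):
    """Model of the entry check: return True if the slow path was taken."""
    if code.instrumentation_version == interp.instrumentation_version:
        return False  # fast path: one int comparison, nothing else to do
    # Slow path: (re)instrument the code object, then record the new version.
    code.times_instrumented += 1
    code.instrumentation_version = interp.instrumentation_version
    return True


interp = InterpreterState()
code = CodeState()
before = resume(interp, code)        # versions match, fast path
interp.set_trace(lambda *args: None)
after = resume(interp, code)         # mismatch, code gets instrumented
again = resume(interp, code)         # versions match again, fast path
```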

Instrumentation should be precise and quick if we want tracing and profiling to perform well.
Initially, however, it can be as simple as setting all instructions in the quickened code to DO_TRACING. This gets a speedup in the non-tracing case and probably no slowdown for the tracing case. We can then refine the instrumentation to improve the performance of coverage.py and (if accepted) support PEP 669.
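A rough model of that initial strategy: keep the original instructions untouched and overwrite the quickened copy with DO_TRACING, so the non-tracing path never pays for a check. Only the DO_TRACING name comes from the comment above. `instrument_all`, `execute` and the list-of-strings representation are assumptions made for this sketch.

```python
DO_TRACING = "DO_TRACING"


def instrument_all(original):
    """Return a quickened copy in which every instruction is DO_TRACING."""
    return [DO_TRACING for _ in original]


def execute(original, quickened, on_trace):
    """Run straight-line toy code; return the opcodes that were really executed."""
    executed = []
    for index, op in enumerate(quickened):
        if op == DO_TRACING:
            on_trace(index)        # report the event to the tracer
            op = original[index]   # then fall back to the real instruction
        executed.append(op)
    return executed


original = ["RESUME", "LOAD_CONST", "RETURN_VALUE"]

untraced_events = []
untraced = execute(original, list(original), untraced_events.append)

traced_events = []
traced = execute(original, instrument_all(original), traced_events.append)
```

Both runs execute the same instructions; only the instrumented copy produces trace events.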

ericsnowcurrently · 2022-01-19T16:29:15Z

> There is no need to check when returning from calls, as we can walk the stack when sys.settrace() (or similar) is called and eagerly perform the instrumentation.

Good point. This would keep the semantics the same, right?

markshannon · 2022-04-04T10:13:52Z

Deferring the remainder of this issue (eliminating the overhead of tracing when not tracing) until 3.12a0, which is only a month away now.

There are two issues that will need a bit of wider discussion:

1. We will need to add a "compare and swap" operation to the PyAtomic API, in order to merge the check for instrumentation into the eval breaker check.
2. To implement this correctly, we need to instrument all code objects currently being executed when changing the profiling or tracing function. This will slow down sys.settrace() and sys.setprofile() a lot, even if it does speed up the subsequent profiling.
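To give a sense of what "all code objects currently being executed" involves, here is a small sketch that collects them from Python using `sys._current_frames()`, which the `sys` documentation describes as intended for internal and specialized purposes. The helper names `code_objects_on_all_stacks` and `probe` are made up for this example, and a real implementation would do this in C inside the interpreter.

```python
import sys


def code_objects_on_all_stacks():
    """Collect the code object of every frame on every thread's stack."""
    codes = set()
    for top_frame in sys._current_frames().values():
        frame = top_frame
        while frame is not None:
            codes.add(frame.f_code)
            frame = frame.f_back  # walk towards the bottom of the stack
    return codes


def probe():
    # probe() is on the stack while the walk runs, so its code object is found.
    return code_objects_on_all_stacks()


found = probe()
```

The cost is proportional to the total stack depth across all threads, which is why changing the tracing function would become slower under this scheme.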

gvanrossum · 2022-04-04T15:54:06Z

> 2. instrument all code objects currently being executed

I suppose you need to crawl the stack to find them all. But do you need to do anything for generators that are currently suspended (i.e. waiting for yield to resume)?

Would it be crazy to do the patchup for regular functions upon returning rather than at the time sys.settrace() etc. is called?

markshannon · 2022-04-04T16:03:51Z

> Would it be crazy to do the patchup for regular functions upon returning rather than at the time sys.settrace() etc. is called?

Not crazy at all, but I couldn't make it efficient as it requires a check in too many places.
My initial strategy was to instrument all the code objects on the stack, but that doesn't work with re-entrant code.

gvanrossum · 2022-04-04T17:08:21Z

What's re-entrant code in this context? Generators?

markshannon · 2022-04-04T17:35:12Z

Any function can be re-entrant.
If we walk up the stack marking the return points, there is no guarantee that we will hit the return point in the call we expect or even in the same thread.
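A small illustration of the re-entrancy problem: one code object can have several live activations at once, so a mark placed on a return point in the code object is shared by all of them. The helpers `activations_of` and `recurse` are hypothetical names for this sketch, and `sys._getframe()` is a CPython implementation detail.

```python
import sys


def activations_of(code):
    """Count how many frames on the current stack are running `code`."""
    count = 0
    frame = sys._getframe()
    while frame is not None:
        if frame.f_code is code:
            count += 1
        frame = frame.f_back
    return count


def recurse(depth):
    if depth == 0:
        # Innermost call: count the live activations of recurse itself.
        return activations_of(recurse.__code__)
    return recurse(depth - 1)


live = recurse(3)  # depths 3, 2, 1 and 0 are all active at the innermost call
```

Whichever of those activations returns first would hit the marked return point, not necessarily the one the stack walk had in mind.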

I think it should be possible to make it work by just marking certain instructions as "pending instrumentation", but there are too many corner cases for me to be confident that it would be correct.
Instrumenting all code objects present on any stack is a bit inefficient, but it is correct.

gvanrossum · 2022-04-04T17:40:01Z

Got it. So then I ask again, what about suspended generators? They're not on any stack IIUC.

markshannon · 2022-04-05T09:54:31Z

Suspended generators aren't on any stack, so we ignore them.
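This can be observed from Python. The sketch below suspends a generator and checks whether its frame is reachable by walking the current stack. `frames_on_current_stack` and `numbers` are names invented for this example, and `sys._getframe()` is a CPython implementation detail.

```python
import sys


def frames_on_current_stack():
    """Return every frame reachable from the caller via f_back."""
    frames = []
    frame = sys._getframe()
    while frame is not None:
        frames.append(frame)
        frame = frame.f_back
    return frames


def numbers():
    yield 1
    yield 2


gen = numbers()
next(gen)  # run to the first yield; the generator is now suspended
suspended_frame = gen.gi_frame

# The suspended generator's frame is not part of the current call stack.
on_stack = any(f is suspended_frame for f in frames_on_current_stack())
```

A stack walk therefore never sees a suspended generator, so its code would only be checked the next time it is resumed.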

gvanrossum · 2022-04-05T22:24:54Z

Okay, I guess I don't understand the reason we need to "instrument all code objects currently being executed when changing the profiling or tracing function." Probably because I haven't thought enough about the requirements that lead to the implementation. I'll come back when I've had time to think about that.

markshannon · 2022-08-15T15:31:45Z

I'm postponing this until PEP 669 has been accepted or rejected.

markshannon · 2023-01-07T11:58:29Z

PEP 669 has been accepted, so instrumentation will be implemented as part of that PEP.

markshannon self-assigned this Nov 16, 2021

markshannon added this to To do in Old Faster CPython Project Board via automation Nov 16, 2021

markshannon moved this from To do to Doing (Mark) in Old Faster CPython Project Board Nov 16, 2021

markshannon mentioned this issue Nov 18, 2021

bpo-45753: Interpreter internal tweaks python/cpython#29575

Merged

lpereira mentioned this issue Dec 1, 2021

Reduce the overhead of tracing, profiling, and quickening checks for calls faster-cpython/cpython#8

Closed

gramster moved this from Doing (Mark) to In Progress in Old Faster CPython Project Board Dec 9, 2021

gramster moved this from In Progress to To do in Old Faster CPython Project Board Dec 13, 2021

markshannon mentioned this issue Jan 18, 2022

bpo-46420: Use NOTRACE_DISPATCH() in specialized opcodes python/cpython#30652

Merged

markshannon mentioned this issue Apr 4, 2022

Things to do before 3.11 beta #320

Closed

10 tasks

markshannon added the deferred label Aug 15, 2022

markshannon closed this as completed Jan 7, 2023
