Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce the overhead of tracing, profiling and quickening checks for calls. #112

Closed
markshannon opened this issue Nov 8, 2021 · 16 comments
Closed

Comments

@markshannon
Copy link
Member

When entering a Python function we need to check for tracing/profiling and check to see if the function needs to be quickened.
We should be able to eliminate these checks for calls in most cases by:

  1. Adding a START_FUNCTION instruction which does the above checks.
  2. Quickening the START_FUNCTION to a NOP.
  3. When specializing Python-to-Python calls, set f_lasti to 0 not -1, thus skipping the entry sequence entirely.

When exiting a function, we still need to check for tracing and profiling. This can be eliminated by adding a RETURN_VALUE_QUICK bytecode that skips the checks.

@markshannon markshannon self-assigned this Nov 16, 2021
@markshannon markshannon added this to To do in Old Faster CPython Project Board via automation Nov 16, 2021
@markshannon markshannon moved this from To do to Doing (Mark) in Old Faster CPython Project Board Nov 16, 2021
@markshannon
Copy link
Member Author

python/cpython#29575 is prep for this.

@lpereira
Copy link

lpereira commented Dec 1, 2021

I took a stab at implementing this; not sure yet if I haven't made a silly mistake (as I don't have an intuitive understanding of the whole compilation/evaluation machinery at this point, and need to play with this thing a little bit more to see if the changes in quickening are actually working as I expect them to.)

Changes can be seen in my personal fork. (NB: There are a few commits that aren't strictly related to this change, but they're just cleaning some stuff up.)

One of the things that are missing at this point is avoiding evaluating the START_FUNCTION instruction on Python-to-Python calls (@markshannon's third bullet point). Setting f_lasti to 0 when CALL_FUNCTION_PY_SIMPLE is being evaluated doesn't seem to be sufficient. I need to understand this thing a bit more in detail.

@gvanrossum
Copy link
Collaborator

Cool to see this. The issue with setting f_lasti = 0 might be that in some cases there are some opcodes generated before START_FUNCTION, specifically related to cells (free vars and/or cell vars)? Depends of course on what symptoms you see.

@gvanrossum
Copy link
Collaborator

Should we assign this issue to @lpereira and move it to her column on the project board?

@gramster gramster moved this from Doing (Mark) to In Progress in Old Faster CPython Project Board Dec 9, 2021
@gramster gramster moved this from In Progress to To do in Old Faster CPython Project Board Dec 13, 2021
@markshannon
Copy link
Member Author

We now have a RESUME instruction which is executed every time a Python function is entered. This is the ideal place to insert checks for tracing, rather than performing them on each dispatch.
The general idea (which underpins PEP 669 as well) is to check whether the instrumentation for a code object matches the expected value (a quick int equality test) on every RESUME.
There is no need to check when returning from calls, as we can walk the stack when sys.settrace() (or similar) is called and eagerly perform the instrumentation.

Instrumentation should be precise and quick if we want tracing and profiling to perform well.
Initially, however, it can be a simple as setting all instructions in the quickened code to DO_TRACING. This gets a speedup in the non-tracing case and probably no slowdown for tracing case. We can them refine the instrumentation to improve the performance of coverage.py and (if accepted) support PEP 669.

@ericsnowcurrently
Copy link
Collaborator

There is no need to check when returning from calls, as we can walk the stack when sys.settrace() (or similar) is called and eagerly perform the instrumentation.

Good point. This would keep the semantics the same, right?

@markshannon
Copy link
Member Author

Deferring the remainder of this issue (eliminating overhead of tracing, when not tracing) until 3.12a0 (which is only a month way now).

There are two issues that will need a bit of wider discussion:

  1. We will we need to add a "compare and swap" operation to the PyAtomic API, in order to merge the check for instrumentation into the eval breaker checker.
  2. To implement this correctly, we need to instrument all code objects currently being executed when changing the profiling or tracing function. This will slow down sys.settrace() and sys.setprofile() a lot, even if it does speed up the subsequent profiling.

@gvanrossum
Copy link
Collaborator

2. instrument all code objects currently being executed

I suppose you need to crawl the stack to find them all. But do you need to do anything for generators that are currently suspended (i.e. waiting for yield to resume)?

Would it be crazy to do the patchup for regular functions upon returning rather than at the time sys.settrace() etc. is called?

@markshannon
Copy link
Member Author

Would it be crazy to do the patchup for regular functions upon returning rather than at the time sys.settrace() etc. is called?

Not crazy at all, but I couldn't make it efficient as it requires a check in too many places.
My initial strategy was to instrument all the code objects on the stack, but that doesn't work with re-entrant code.

@gvanrossum
Copy link
Collaborator

What's re-entrant code in this context? Generators?

@markshannon
Copy link
Member Author

markshannon commented Apr 4, 2022

Any function can be re-entrant.
If we walk up the stack marking the return points, there is no guarantee that we will hit the return point in the call we expect or even in the same thread.

I think it should be possible to make it work just marking certain instructions as "pending instrumentation", but it there are too many corner cases for me to be confident that it would be correct.
Instrumenting all code objects present on any stack is a bit inefficient, but it is correct.

@gvanrossum
Copy link
Collaborator

Got it. So then I ask again, what about suspended generators? They're not on any stack IIUC.

@markshannon
Copy link
Member Author

Suspended generators aren't on any stack, so we ignore them.

@gvanrossum
Copy link
Collaborator

Okay, I guess I don't understand the reason we need to "instrument all code objects currently being executed when changing the profiling or tracing function." Probably because I haven't thought enough about the requirements that lead to the implementation. I'll come back when I've had time to think about that.

@markshannon
Copy link
Member Author

I'm postponing this until PEP 669 has been accepted or rejected.

@markshannon
Copy link
Member Author

PEP 669 has been accepted, so instrumentation will be implemented as part of that PEP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Development

No branches or pull requests

4 participants