
Make it easier to traverse the frame stack for third party tools. #100987

Open · markshannon opened this issue Jan 12, 2023 · 49 comments
Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs), type-feature (A feature request or enhancement)

@markshannon
Member

markshannon commented Jan 12, 2023

Profilers and debuggers need to traverse the frame stack, but the layout of the stack is an internal implementation detail.
However, we can make some limited promises to make porting tools between Python versions a bit easier.

In order to traverse the stack, the offset of the previous pointer needs to be known. To understand the frame, more information is needed.

@pablogsal and @Yhg1s expressed interest in this.

@markshannon markshannon added the type-feature A feature request or enhancement label Jan 12, 2023
@markshannon
Member Author

Initially, I propose to refactor the _PyInterpreterFrame struct so that it starts:

typedef struct _PyInterpreterFrame {
    PyCodeObject *f_code;
    struct _PyInterpreterFrame *previous;
    ...

Currently f_code must be a code object, but we could generalize it to allow other objects.
For example, the shim frame inserted on entry to _PyEval_EvalFrameDefault could have that field set to None indicating it should be skipped in tracebacks, etc.

The order of f_code and previous doesn't really matter, but having f_code first makes #100719 a bit simpler.
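
For illustration, here is a minimal sketch (assuming the proposed layout above, and assuming f_code may be set to None for shim frames) of how an in-process tool could walk the stack while skipping shims:

/* Sketch only: assumes the refactored layout proposed above
   (requires the internal pycore_frame.h definitions). */
static void
walk_stack(struct _PyInterpreterFrame *frame)
{
    while (frame != NULL) {
        PyObject *code = (PyObject *)frame->f_code;
        if (code != NULL && code != Py_None) {
            /* A real Python frame: report filename, name, etc. */
        }
        /* Otherwise it is an internal shim; skip it in tracebacks. */
        frame = frame->previous;
    }
}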

@pablogsal pablogsal self-assigned this Jan 12, 2023
@pablogsal
Member

Let me collect some feedback from maintainers of debuggers and profilers; I will comment here with the requirements so we can think of solutions.

markshannon added a commit that referenced this issue Feb 13, 2023
GH-100987: Refactor `_PyInterpreterFrame` a bit, to assist generator improvement. (GH-100988)

carljm added a commit to carljm/cpython that referenced this issue Feb 13, 2023
* main:
  pythongh-101810: Remove duplicated st_ino calculation (pythonGH-101811)
  pythongh-92547: Purge sqlite3_enable_shared_cache() detection from configure (python#101873)
  pythonGH-100987: Refactor `_PyInterpreterFrame` a bit, to assist generator improvement. (pythonGH-100988)
  pythonGH-87849: Simplify stack effect of SEND and specialize it for generators and coroutines. (pythonGH-101788)
  Correct trivial grammar in reset_mock docs (python#101861)
  pythongh-101845: pyspecific: Fix i18n for availability directive (pythonGH-101846)
  pythongh-89792: Limit test_tools freeze test build parallelism based on the number of cores (python#101841)
  pythongh-85984: Utilize new "winsize" functions from termios in pty tests. (python#101831)
  pythongh-89792: Prevent test_tools from copying 1000M of "source" in freeze test (python#101837)
  Fix typo in test_fstring.py (python#101823)
  pythonGH-101797: allocate `PyExpat_CAPI` capsule on heap (python#101798)
  pythongh-101390: Fix docs for `imporlib.util.LazyLoader.factory` to properly call it a class method (pythonGH-101391)
@markshannon
Member Author

@pablogsal Any feedback?

@markshannon
Member Author

We can further improve traversal of the _PyInterpreterFrame stack for debugging and introspection by allowing C extensions to create frames without the rigmarole of creating a code object.

We should rename the f_code field to f_executable, and allow any object.

typedef struct _PyVMFrame {
    PyObject *f_executable;
    struct _PyVMFrame *previous;
} PyVMFrame;

Although tools and the VM should tolerate any object, we should in practice only allow a few classes:

  • CodeObject: Implies that the PyVMFrame is a full _PyInterpreterFrame. Only the VM should make this kind of frame
  • Builtin function, method descriptor, slot wrapper, etc. The frame represents a call to the given object.
  • None: Internal shim. Tools should skip this frame.
  • Tuple: First three items should be name, filename, flags where flags determine the meaning of additional entries.

The tuple form is for tools like Cython, Nanobind, etc. Creating a tuple of strs and ints is much simpler and faster than creating a fake code object.

C extensions can link themselves into the frame stack at the cost of about 4 memory writes and 3 reads:

    PyVMFrame frame;
    frame.previous = tstate->current_frame.frame;
    frame.f_executable = &EXECUTABLE_OBJECT;
    tstate->current_frame.frame = &frame;
    /* body of function goes here */
    tstate->current_frame.frame = frame.previous;

We can do this for builtin functions by modifying the vectorcall function assigned to the builtin function/method descriptor.
We would need to benchmark this to see the performance impact, but it will be much cheaper than sys.activate_stack_trampoline().
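
To make the pattern above concrete, here is a hedged sketch of what a hypothetical extension function could look like. PyVMFrame, f_executable and tstate->current_frame.frame are names from this proposal, not an existing API, and do_the_actual_work() is a placeholder:

/* Sketch only: builds the tuple form of the executable once, then links a
   PyVMFrame into the stack around the real work. */
static PyObject *
my_extension_func(PyObject *self, PyObject *args)
{
    static PyObject *executable = NULL;   /* (name, filename, flags) */
    if (executable == NULL) {
        executable = Py_BuildValue("(ssi)", "my_extension_func",
                                   "my_extension.c", 0);
        if (executable == NULL) {
            return NULL;
        }
    }

    PyThreadState *tstate = PyThreadState_Get();
    PyVMFrame frame;
    frame.previous = tstate->current_frame.frame;
    frame.f_executable = executable;
    tstate->current_frame.frame = &frame;

    PyObject *result = do_the_actual_work(args);   /* body of the function */

    tstate->current_frame.frame = frame.previous;
    return result;
}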

@pablogsal
Member

@pablogsal Any feedback?

I have reached out again to tool authors, give me a couple of days to gather comments. Apologies for the delay

@markshannon
Member Author

No problem.

markshannon added a commit that referenced this issue Mar 13, 2023
GH-100987: Don't cache references to the names and consts array in `_PyEval_EvalFrameDefault`. (#102640)

* Rename local variables, names and consts, from the interpreter loop. Will allow non-code objects in frames for better introspection of C builtins and extensions.

* Remove unused dummy variables.
carljm added a commit to carljm/cpython that referenced this issue Mar 14, 2023
* main: (50 commits)
  pythongh-102674: Remove _specialization_stats from Lib/opcode.py (python#102685)
  pythongh-102660: Handle m_copy Specially for the sys and builtins Modules (pythongh-102661)
  pythongh-102354: change python3 to python in docs examples (python#102696)
  pythongh-81057: Add a CI Check for New Unsupported C Global Variables (pythongh-102506)
  pythonGH-94851: check unicode consistency of static strings in debug mode (python#102684)
  pythongh-100315: clarification to `__slots__` docs. (python#102621)
  pythonGH-100227: cleanup initialization of global interned dict (python#102682)
  doc: Remove a duplicate 'versionchanged' in library/asyncio-task (pythongh-102677)
  pythongh-102013: Add PyUnstable_GC_VisitObjects (python#102014)
  pythonGH-102670: Use sumprod() to simplify, speed up, and improve accuracy of statistics functions (pythonGH-102649)
  pythongh-102627: Replace address pointing toward malicious web page (python#102630)
  pythongh-98831: Use DECREF_INPUTS() more (python#102409)
  pythongh-101659: Avoid Allocation for Shared Exceptions in the _xxsubinterpreters Module (pythongh-102659)
  pythongh-101524: Fix the ChannelID tp_name (pythongh-102655)
  pythongh-102069: Fix `__weakref__` descriptor generation for custom dataclasses (python#102075)
  pythongh-98169 dataclasses.astuple support DefaultDict (python#98170)
  pythongh-102650: Remove duplicate include directives from multiple source files (python#102651)
  pythonGH-100987: Don't cache references to the names and consts array in `_PyEval_EvalFrameDefault`. (python#102640)
  pythongh-87092: refactor assemble() to a number of separate functions, which do not need the compiler struct (python#102562)
  pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives (python#102631)
  ...
@itamarst

@benfred -^

@markshannon
Member Author

I've made a branch that adds "lightweight" frames (just a pointer to a "code" object and a link pointer), and inserts one for each call to a builtin function in the interpreter. The performance impact is negligible and all builtin function and class calls are present in the frame stack.

Branch: https://github.com/python/cpython/compare/main...faster-cpython:cpython:allow-non-python-frames?expand=1

Performance: https://github.com/faster-cpython/benchmarking-public/blob/main/results/bm-20230315-3.12.0a6%2B-3e2c3ab/bm-20230315-linux-x86_64-faster%252dcpython-allow_non_python_fra-3.12.0a6%2B-3e2c3ab-vs-base.png

@pablogsal
Member

I've made a branch that adds "lightweight" frames (just a pointer to a "code" object and a link pointer), and inserts one for each call to a builtin function in the interpreter. The performance impact is negligible and all builtin function and class calls are present in the frame stack.

Branch: https://github.com/python/cpython/compare/main...faster-cpython:cpython:allow-non-python-frames?expand=1

Performance: https://github.com/faster-cpython/benchmarking-public/blob/main/results/bm-20230315-3.12.0a6%2B-3e2c3ab/bm-20230315-linux-x86_64-faster%252dcpython-allow_non_python_fra-3.12.0a6%2B-3e2c3ab-vs-base.png

We still need the concept of entry frames for tools that merge native and Python stacks. Why did you remove _PyFrame_IsEntryFrame in your branch?

@markshannon
Member Author

It's a proof of concept, it was easier to remove _PyFrame_IsEntryFrame than re-implement it.
_PyFrame_IsEntryFrame can be added back, if necessary.

@pablogsal
Member

It's a proof of concept, it was easier to remove _PyFrame_IsEntryFrame than re-implement it. _PyFrame_IsEntryFrame can be added back, if necessary.

👍

@carljm
Member

carljm commented Mar 17, 2023

We are also very interested in this proposal from the Cinder JIT perspective.

One difference I see with our use case compared to what the draft PR so far aims to support is that we would like to be able to link in "minimal frames" that are still considered "complete": they are fetched by _PyFrame_GetFirstComplete() and can be materialized into a full PyFrameObject. (In the draft PR here, only _PyInterpreterFrame frames are considered "complete".) We don't want to constantly keep a _PyInterpreterFrame (localsplus contents) up to date while the JIT is running (this is expensive), so we'd need to get some kind of callback to reify our minimal frame into a PyFrameObject on-demand (i.e. some hook into _PyFrame_GetFrameObject).

In the tuple form of f_executable, could there be a well-known bit-flag in the third element that tells the interpreter "this should be considered a 'complete' frame, and the next word in its struct is a function pointer that will take the VMFrame struct and return a PyFrameObject."?
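
A hedged sketch of the shape being described here; every name below is hypothetical and nothing here exists in CPython:

/* Hypothetical sketch only: a "minimal" frame that is still "complete",
   with a callback to materialize a full PyFrameObject on demand. */
struct MinimalJITFrame;
typedef PyFrameObject *(*reify_func)(struct MinimalJITFrame *);

struct MinimalJITFrame {
    PyObject *f_executable;              /* e.g. (name, filename, flags, ...) */
    struct MinimalJITFrame *previous;
    reify_func materialize;              /* invoked by the frame-object hook */
};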

@pablogsal
Member

We should rename the f_code field to f_executable, and allow any object.

In general, the feedback is that this will make introspection tools much harder to implement. Currently the fact that this can only be a code object makes it very easy to traverse the frame stack. If you allow any Python object, it becomes harder or in some cases even impossible.

If we restrict this to a finite set of possibilities it is still harder, but if we add some kind of enumeration to the frame that tells the tool what's going to be there, it becomes a bit easier.

In general I don't think that this proposal helps introspection tools; it actually makes the implementation harder and less efficient, because more pointers need to be copied and more logic needs to be included.

@pablogsal
Member

Some comments from authors:

I feel it won't be too easy to decipher the type of the object remotely. This would likely increase the number of private structures that we need to copy over from Python headers to parse this information (e.g. tuples), making things more complex. Of course one could just try treating the object as a PyCodeObject and check for failures, but this would now imply a potential loss of captured information, unless all the other object types that can appear here are also handled. Perhaps an extra int field that specifies the type of the object being passed with f_executable might help in this direction, to some extent. But perhaps one simplification that depends on a positive answer to the following question could be adopted: is the value f_executable crucial for the actual execution, or is it just added to carry the frame's metadata (e.g. filename, function name, line number, ...)? If that is added just for the metadata, perhaps that could be added directly to the _PyVMFrame structure in the form of extra fields? There could be a core set of fields that are common to all object types (filename, function qualname, location data), plus a generic PyObject reference that can be consumed easily by in-process tools. However, I can see the downside being that the cost would probably end up being slightly more than just 4 W and 3 R operations in general.

@pablogsal
Member

In general the sentiment is that the more regular the structure is, the easier it is for profilers and debuggers to traverse the stack. The more variations and Python-isms (as in, using PyObject* instead of C structs), the harder it is for these tools to properly traverse the stack, which goes against (partially) what we are trying to do here.

@markshannon
Member Author

Feedback from who? Which operations for which tools become harder?
It is hard to take vague and anonymous feedback seriously.

No one is forcing tools to handle all possible frames. They can skip frames that have "executable" objects other than code objects. The presence of additional information that tools ignore cannot be worse than that information not being present in the first place.
And not all tools will ignore it; the PR already gives better tracebacks in the faulthandler module.

In general the sentiment is that the more regular the structure is, the easier it is for profilers and debuggers to traverse the stack. The more variations and Python-isms (as in, using PyObject* instead of C structs), the harder it is for these tools to properly traverse the stack, which goes against (partially) what we are trying to do here.

Two fields, one pointing to the next frame, and one pointing to the executable object, seems quite regular to me.
Traversal of the stack is trivial. Just follow the previous pointer.

@markshannon
Member Author

It might be informative to compare this with adding perf support:

  • Adding perf frames causes a slowdown of 8%. The PR above has negligible performance impact.
  • This (unlike perf) works on Windows and on any machine that does not have perf installed.
  • It works with PEP 523 or PEP 669.

Fidget-Spinner pushed a commit to Fidget-Spinner/cpython that referenced this issue Mar 27, 2023
GH-100987: Don't cache references to the names and consts array in `_PyEval_EvalFrameDefault`. (python#102640)

* Rename local variables, names and consts, from the interpreter loop. Will allow non-code objects in frames for better introspection of C builtins and extensions.

* Remove unused dummy variables.
@pablogsal
Member

pablogsal commented Mar 27, 2023

Feedback from who?

Authors of profilers and debuggers. For now authors of Austin, py-spy, scygraph and fil, and myself (memray/pystack). I collected feedback from them but if you prefer that they comment here directly individually I can also ask for that.

Which operations for which tools become harder?

Getting the Python stack from a remote process reading memory.

It might be informative to compare this with adding perf support:

Informative how? perf support is optional; it doesn't affect profiling or debugging tools other than allowing perf to work, and it doesn't collide with this work. I am failing to see the argument here. This issue is called "Make it easier to traverse the frame stack for third party tools" and we are literally discussing changes that will achieve the opposite for some tools, so I am not sure how perf support is involved.

@markshannon
Member Author

if you prefer that they comment here directly individually I can also ask for that.

Yes, please.

perf support is optional

Is it really? In that case let's drop it now before it causes trouble for 3.13.
We aren't going to support perf in any future JIT compiler.

@markshannon
Member Author

Let's make it clear. The choice isn't between the proposed ABI and the status quo. The choice is between a well defined, if minimal, ABI and no ABI guarantees whatsoever.
The _PyInterpreterFrame struct is internal and will change.

For example, we need to insert frames for shims at the exit from __init__ functions in order to inline them. We might want to do the same for calls to __setitem__ and __setattr__ as well. Creating code objects for these shims is a waste of effort.
We might want to replace calls to tuple with a surrogate that constructs the tuple in bytecode. We will want to have the tuple object as the "executable", so that the frame stack looks the same.

And don't forget the producers of frames, as well as the consumers.
JIT compilers may want to insert frames. Cinder does. I suspect we won't, but we might.
Cython and pybind11 may want to insert frames, if not all the time, at least if an error occurs. Creating fake code objects is unnecessary overhead.

If the ABI I'm suggesting is not a good one, then you need to suggest a better one.

I am failing to see the argument here (Contrasting with perf support)

The purpose of adding perf support is that tools can see the mixed C/Python stack.
perf support does that by faking Python frames on the C stack. I propose adding frames for C callables (and other things) to the Python stack.
Adding it to the Python stack means that it is easily accessible to in-process tools, and with very low overhead.
Adding to the C stack requires support for the native debugging format and ABI, and has a large cost.

@markshannon
Member Author

So it isn't really optional for us, and we are stuck with it?
In that case it needs a PEP.

Supporting perf in a JIT compiler is going to be a lot of extra work, with a benefit for a few large corps that have the infrastructure to support perf profiling in production, and a cost (worse performance and/or reliability) for many.

@pablogsal
Member

pablogsal commented Mar 27, 2023

So it doesn't mean optional for us and we are stuck with it?
In that case it needs a PEP.

The feature has landed already in 3.12 and is already released in alpha. I respect your position but I disagree with it. I suggest that if you want to discuss this, we can do it in a more real-time channel other than a GitHub issue.

@pablogsal
Member

The tighter the restrictions, the easier it is for introspection tools. The looser the restrictions, the easier it is for producers of frames.

Refocusing the discussion on the original issue at hand: I think that if we add some kind of metadata to the frame that tells the tool what kind of frame this is (so basically what's going to be in the "f_executable" field), that's already a win. As you mention, tools may want to skip some of these frames if they cannot be handled.

On the other hand, if we support simple structs or simple Python objects (as opposed to custom classes or even dictionaries) in f_executable, that's also a win.

Additionally, I would like it if we keep the current structure as preserved as possible (with this I mean that most frames will have an f_executable that points to a well-defined code object).

Also, I think having these fields as you propose:

typedef struct _PyInterpreterFrame {
    PyCodeObject *f_code;
    struct _PyInterpreterFrame *previous;
    ...

is a big win as tools don't need to update these definitions every single time.

@markshannon
Member Author

The "f_executable" field is its own metadata, as Python objects are self-describing.
Additional data adds bulk to the frame, and slows down frame creation.

We can restrict the number of classes that are officially supported. If the "f_executable" object is one of those, then tools can use that information. If it is something else, they can just ignore it.

In the minimal case of just supporting code objects:

while (frame) {
    if (frame->f_executable->ob_type == &PyCode_Type) {
        do_stuff_with_frame(frame);
    }
    frame = frame->previous;
}

As for what should be supported:

  • Code objects
  • Classes
  • Builtin functions
  • Method descriptors
  • (name, filename, lineno) tuples.
  • (Maybe other C callables, like slot wrappers)

I'd like to get rid of slot wrappers and other oddities and merge them into builtin functions, but that's another issue.

The VM might create frames for other objects, but tools should ignore them.
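
As a hedged illustration, an in-process tool could dispatch over that (proposed, not finalized) set of classes like this, where f_executable is the field name from this proposal:

/* Sketch only: classify one frame's executable object. */
static void
handle_executable(PyObject *executable)
{
    if (PyCode_Check(executable)) {
        /* Full interpreter frame: filename/name come from the code object. */
    }
    else if (PyType_Check(executable) || PyCFunction_Check(executable)) {
        /* A call to a class or to a builtin function. */
    }
    else if (PyTuple_Check(executable) && PyTuple_GET_SIZE(executable) >= 3) {
        /* (name, filename, lineno) tuple from Cython, nanobind, etc. */
    }
    else {
        /* None (internal shim) or anything unrecognised: ignore it. */
    }
}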

@pablogsal
Member

The "f_executable" field is its own metadata, as Python objects are self-describing. Additional data adds bulk to the frame, and slows down frame creation.

We can restrict the number of classes that are officially supported. If "f_executable" object is one of those, then tools can use that information. It is something else, they can just ignore it.

In the minimal case of just supporting code objects:

I understand, but this makes life for inspection tools harder because it forces them to copy much more stuff (the class and the name at least) instead of inspecting an enum. I would like to have an indication of what is in f_executable in the frame itself. I understand that you don't, but I want to state that I do :)

As for what should be supported:

  • Code objects
  • Classes
  • Builtin functions
  • Method descriptors
  • (name, filename, lineno) tuples.
  • (Maybe other C callables, like slot wrappers)

I see where you are coming from, but allowing all these things is going to make implementing these tools a nightmare, because supporting all these possibilities is a lot. I would like to restrict this list to just simple stuff like tuples, code objects and maybe some C-like struct that can be used for more exotic stuff. This is basically what you said here:

The tighter the restrictions, the easier it is for introspection tools. The looser the restrictions, the easier it is for producers of frames.

I am advocating for much tighter restrictions, but I understand that's not the direction that you want to go and I respect that.

@carljm
Member

carljm commented Mar 27, 2023

We could use a tagged pointer in f_executable to provide an easy-to-read "flag" indicating the frame type without adding any additional bulk to frames, and not much extra cost to frame creation. This is what Cinder shadowframes does today: https://github.com/facebookincubator/cinder/blob/cinder/3.10/Include/internal/pycore_shadow_frame_struct.h#L42-L60

In the minimal form, we can leave the low bit 0 to indicate "normal frame, pointer to code object" (then there is also zero overhead in normal interpreter frame creation) and set it to 1 to mean "pointer to something new and different." Then existing tools that want to just handle normal code objects like they already do only need a single bit test to discard frames they don't want to deal with.

There are another two bits we could play with if we want to provide streamlined indication of any other common cases (builtin function, tuple form, maybe?).
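
A hedged sketch of the bit test this would enable, assuming the low-bit tagging scheme sketched above (not an adopted design):

#include <stdint.h>

/* Sketch only: low bit 0 = ordinary code object pointer, 1 = something else. */
static void
inspect_executable(void *f_executable)
{
    uintptr_t word = (uintptr_t)f_executable;
    if ((word & 1) == 0) {
        PyCodeObject *code = (PyCodeObject *)word;   /* normal frame */
        (void)code;
    }
    else {
        PyObject *other = (PyObject *)(word & ~(uintptr_t)1);
        (void)other;   /* a tool that only wants Python frames can skip it */
    }
}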

I hope we can make life easier for existing inspection tools by making it really easy to detect the common cases they want to care about, but I also hope (from the Cinder JIT perspective) that at least one of the valid options for f_executable is "extensible". E.g. if a (name, filename, lineno) tuple is allowed, it's also valid to have a longer tuple carrying additional payload, with the first three elements interpreted as name, filename, lineno.

@carljm
Member

carljm commented Mar 27, 2023

Classes

I'm curious, why would we want frames to hold a pointer to a class (I assume while executing the class body?) rather than to the code object of the class body?

@markshannon
Member Author

I'm curious, why would we want frames to hold a pointer to a class

class C: pass

c = C()  # This is a call to a class

@markshannon
Member Author

@pablogsal
Is checking for five or six distinct values, rather than two or three, that big a deal?
Also, why is comparing to an address a problem, whereas comparing to an int is not?
frame->f_kind == CODE_OBJECT_KIND is a 32-bit comparison.
frame->f_executable == &PyCode_Type is a 64-bit comparison.
Fetching the address of PyCode_Type will need the symbol table, but you'll need that anyway.

@markshannon
Member Author

@carljm
I don't see the value in tagging bits. The tag you propose holds no additional information, as the same value can be obtained with the simple comparison f_executable == &PyCodeObject

@pablogsal
Member

pablogsal commented Mar 28, 2023

Fetching the address of PyCode_Type will need the symbol table, but you'll need that anyway.

Not necessarily. These tools sometimes need to work with stripped binaries or core files, and requiring the symbol table there can be a huge pain compared with just checking against an integer, as we are vendoring the headers anyway. In particular, as an example (please don't focus too much on this), core files are a huge pain because the address may not be in the core if it is in the .rodata segment.


Currently, once you get the frame, you access the f_code pointer and you KNOW it is a code object, so once you have the layout for it (because we vendor the headers) you know how to extract the function name and the filename.

If we have an integer in the frame that tells us what f_executable will hold, then we can compare against it directly and know what we are going to find. No extra information or copies are required.

If we need to compare against a pointer instead, we require:

  • Copying the pointers/structures of all the possible types that can appear (that is, PyCode_Type and the same for tuples, functions...). This is already problematic because they may not be in the core, OR we may not have symbols, so we may be unable to locate them even in a live process.
  • Once you find the address of PyCode_Type and friends, you need to relocate it to find the real address. Quite simple to do, but it is more operations.
  • If we allow random classes then the tools are unable to even compare against pointers, because we don't even know where they are.

But an enum describing what it contains allows us to KNOW what the pointer will contain and, for instance, be certain that the pointer is some custom stuff that we won't be able to understand, instead of having to "guess" based on "oh, this pointer is not one of the ones we know about (code, tuples, ...)".
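
For concreteness, a hedged sketch of the enum-based scheme being advocated here; the f_kind field and the enum values are hypothetical and do not exist in CPython:

#include <stdint.h>

/* Hypothetical sketch only. */
enum frame_kind {
    FRAME_KIND_CODE  = 0,   /* f_executable points to a PyCodeObject */
    FRAME_KIND_SHIM  = 1,   /* internal shim frame: skip it */
    FRAME_KIND_TUPLE = 2,   /* (name, filename, ...) tuple */
};

/* Fields a remote tool would copy out of the target process with
   process_vm_readv() or read from a core file. */
struct remote_frame {
    uintptr_t f_executable;
    uintptr_t previous;
    uint32_t  f_kind;
};

/* A single integer comparison then says what f_executable holds, without
   needing the address of PyCode_Type in the target. */
static int
is_code_frame(const struct remote_frame *f)
{
    return f->f_kind == FRAME_KIND_CODE;
}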

@markshannon
Member Author

I appreciate that having extra information will make life easier for a few tool authors, but it might make things a little bit slower for very many Python users.

How do you get the frame without any symbols?

If we allow random classes...

Any class outside of the fixed set (whatever that ends up being) should be ignored, so no "random" classes.

@pablogsal
Member

pablogsal commented Mar 28, 2023

How do you get the frame without any symbols?

Find the interpreter state and, having the headers vendored, we know the offsets of the pointers in every struct and we know what we are going to find, because at the moment it is fully determined. The interpreter state can be found because we (CPython) place the runtime structure in its own section, so it can be found without symbols:

__attribute__ ((section (".PyRuntime")))

Although this is technically not needed, because it can also be found by locating the interpreter state <-> thread state cycle when scanning the BSS, which is what py-spy does.

@carljm
Member

carljm commented Mar 28, 2023

I don't see the value in tagging bits. The tag you propose holds no additional information, as the same value can be got with the simple comparison f_executable == &PyCodeObject

Sure, if you have &PyCodeObject available.

Tagging bits could "make life easier for a few tool authors" in the scenarios @pablogsal is mentioning without "making things slower for very many Python users."

EDIT: also, it's not f_executable == &PyCodeObject, it's f_executable->ob_type == &PyCodeObject, so it's adding an extra pointer chase for every frame also.

@markshannon
Member Author

Tagging might solve the performance issue. But we need to support 32-bit machines, so we only have 2 bits to play with, which is not enough.

@markshannon
Member Author

If tools can find the runtime, then we can put an array of pointers there. No runtime overhead at the cost of ~40 bytes.

PyObject *callable_types[] = {
   (PyObject *)&PyCode_Type,
   ...
};
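
A hedged sketch of how an out-of-process tool on Linux could copy such an array once it has located the runtime; the callable_types array is only a proposal here, and array_addr would come from the tool's usual runtime-location logic:

#define _GNU_SOURCE
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Sketch only: copy `count` remote pointers starting at array_addr. */
static int
read_callable_types(pid_t pid, uintptr_t array_addr,
                    uintptr_t *out, size_t count)
{
    struct iovec local = { out, count * sizeof(uintptr_t) };
    struct iovec remote = { (void *)array_addr, count * sizeof(uintptr_t) };
    if (process_vm_readv(pid, &local, 1, &remote, 1, 0) < 0) {
        return -1;
    }
    /* out[0] is now the target's &PyCode_Type; comparing a remote frame's
       ob_type pointer against it needs no symbol lookup. */
    return 0;
}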

@pablogsal
Member

If tools can find the runtime, then we can put an array of pointers there.

That would be an acceptable compromise I think.

@markshannon
Member Author

OK, let's go with that then.

FTR, one other reason not to use an enum is this: what happens when the enumeration and the executable don't match?
We can be fairly sure it won't happen in our code, but it would be an easy mistake to make in third-party code.

By allowing objects of any class, but designating a small set of "approved" classes, the system is much more robust.

@markshannon
Member Author

@pablogsal
Where should the array go, exactly?

@markshannon
Member Author

@carljm

I hope we can make life easier for existing inspection tools by making it really easy to detect the common cases they want to care about, but I also hope (from the Cinder JIT perspective) that at least one of the valid options for f_executable is "extensible". E.g. if a (name, filename, lineno) tuple is allowed, it's also valid to have a longer tuple carrying additional payload, with the first three elements interpreted as name, filename, lineno.

I don't see a problem with that.
Tools should check the length of the tuple before extracting the contents, for safety.
We could allow a tuple of any length, specifying only that the first three elements, if they exist, should be the name, filename and line number.
("foo",) and ("foo", "foo.py", 121, "special-data-34.8") should both be acceptable.

@pablogsal Would this be OK, or is this too complex for your tastes?

@P403n1x87
Contributor

Some comments from authors:

I feel it won't be too easy to decipher the type of the object remotely. This would likely increase the number of private structures that we need to copy over from Python headers to parse this information (e.g. tuples), making things more complex. Of course one could just try treating the object as a PyCodeObject and check for failures, but this would now imply a potential loss of captured information, unless all the other object types that can appear here are also handled. Perhaps an extra int field that specifies the type of the object being passed with f_executable might help in this direction, to some extent. But perhaps one simplification that depends on a positive answer to the following question could be adopted: is the value f_executable crucial for the actual execution, or is it just added to carry the frame's metadata (e.g. filename, function name, line number, ...)? If that is added just for the metadata, perhaps that could be added directly to the _PyVMFrame structure in the form of extra fields? There could be a core set of fields that are common to all object types (filename, function qualname, location data), plus a generic PyObject reference that can be consumed easily by in-process tools. However, I can see the downside being that the cost would probably end up being slightly more than just 4 W and 3 R operations in general.

This would be me, maintainer of Austin. For context, Austin uses system calls like process_vm_readv to read memory out of process.

@P403n1x87
Contributor

How do you get the frame without any symbols?

Austin uses symbols to locate _PyRuntime, but if those are not available, there is a fallback on BSS scan to locate something that looks like _PyRuntime or an interpreter state. So symbols are not strictly required (but good to have of course!).

@P403n1x87
Contributor

Apologies if I slightly derail the conversation, but I wanted to express the following thought. Based on my experience with Austin, I would regard frame stack unwinding as just one aspect of the more general topic of observability into the Python VM. For example, one other thing that Austin tries to do is to sample the GC state to give an idea of how much CPU time is being spent on GC. Or detect who is holding the GIL to give a better estimate of RSS allocations. Therefore, I would tend to view frame stacks as just a part of what can be observed out of process. So the way I see a tool like Austin extracting this information in the future is by looking into an "observability entry point", much like _PyRuntime, but specifically engineered for out-of-process tools. From there one can rely on an ever growing (in an ideally backwards-compatible fashion) list of things one can observe, e.g.

.section _PyRuntimeStateABI

runtime_state {
  interpreter_state {
    thread_count: int,
    threads: [{
      top_frame: { ... },
      ....,
    ]
  },
  gc_state: ...,
  gil_state: ...,
}

warsaw pushed a commit to warsaw/cpython that referenced this issue Apr 11, 2023
GH-100987: Don't cache references to the names and consts array in `_PyEval_EvalFrameDefault`. (python#102640)

* Rename local variables, names and consts, from the interpreter loop. Will allow non-code objects in frames for better introspection of C builtins and extensions.

* Remove unused dummy variables.
markshannon added a commit that referenced this issue Jun 14, 2023
…of an internal frame. (GH-105727)

* Add table describing possible executable classes for out-of-process debuggers.

* Remove shim code object creation code as it is no longer needed.

* Make lltrace a bit more robust w.r.t. non-standard frames.
@iritkatriel iritkatriel added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 27, 2023