-
-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C API] Add public getter functions for the internal PyFrameObject structure #84601
Comments
Similarly to bpo-39573 (make PyObject opaque) and bpo-39947 (make PyThreadState opaque), I propose to add getter functions to access PyFrameObject members without exposing the PyFrameObject structure in the C API. The first step is to identify common usage of the PyFrameObject structure inside CPython code base to add getter functions, and maybe a few setter functions as well. frameobject.h is not part of Python.h, but it's part of the public C API. The long term plan is to move PyFrameObject structure to the internal C API to hide implementation details from the public C API. |
New changeset b8f704d by Victor Stinner in branch 'master': |
I looked how Cython uses PyFrameObject:
Details:
// Initialised by module init code.
static size_t __pyx_pyframe_localsplus_offset = 0;
#include "frameobject.h"
// This is the long runtime version of
// #define __Pyx_PyFrame_GetLocalsplus(frame) ((frame)->f_localsplus)
// offsetof(PyFrameObject, f_localsplus) differs between regular C-Python and Stackless Python.
// Therefore the offset is computed at run time from PyFrame_type.tp_basicsize. That is feasible,
// because f_localsplus is the last field of PyFrameObject (checked by Py_BUILD_ASSERT_EXPR below).
#define __Pxy_PyFrame_Initialize_Offsets() \
((void)__Pyx_BUILD_ASSERT_EXPR(sizeof(PyFrameObject) == offsetof(PyFrameObject, f_localsplus) + Py_MEMBER_SIZE(PyFrameObject, f_localsplus)), \
(void)(__pyx_pyframe_localsplus_offset = ((size_t)PyFrame_Type.tp_basicsize) - Py_MEMBER_SIZE(PyFrameObject, f_localsplus)))
#define __Pyx_PyFrame_GetLocalsplus(frame) \
(assert(__pyx_pyframe_localsplus_offset), (PyObject **)(((char *)(frame)) + __pyx_pyframe_localsplus_offset))
|
"maybe a few setter functions as well" Also, could you remove PyFrame_GetLastInstr(PyFrameObject *frame)? Perhaps Stefan can tell us why Cython needs to access the internals of the frame object. |
There is no such "major change". PyFrameObject structure was fully exposed in the public C API since the earliest Python version. I don't see how adding setter is a major change, since it's already possible to directly modify *any* field of PyFrameObject without any kind of protection. My plan is to have two milestones: A) Make the structure opaque, so it's not longer possible to directly access it. B) Deprecate or remove a few getter or setter functions, or move them to the internal C API. I don't think that moving directly to step (B) is a realistic migration plan. So far, nobody analyzed all C extensions on PyPI to see how PyFrameObject is accessed. I prefer to move slowly, step by step. Breaking C extensions which currently *modify* directly PyFrameObject on purpose doesn't seem like a reasonable option to me. -- I'm trying to distinguish functions which should be "safe"/"portable" from the ones which may be too "CPython specific":
Note: Compared to accessing directly PyFrameObject.f_code, PyFrame_GetCode() also avoids the issue of borrowed references since it returns a strong reference. PyFrame_GetBack() looks specific to the current implementation of CPython. Another implementation might decide to not chain frames. Or maybe don't provide an easy way to traverse the chain of frames. Or at least, it might be a different API than PyFrame_GetBack(). --
I didn't add it :-) It's the pending PR 19764. I didn't merge it since it's unclear to me if it's a good idea to directly expose it or not. Cython uses it, but Cython also abuses CPython internals in general, usually for best performances :-) *Maybe* one compromise would be to add a private _PyFrame_GetLastInstr() to ease migration to step (A) (replace direct access to the structure with function calls). |
Adding to the list above: "f_lasti" is used in "libpython.py", which is an almost exact copy of the one in CPython's "Tools/gdb/" for debugging Python code in gdb. If that implementation changes, we can probably adapt, especially since it uses runtime generated code. I don't find a C function for this necessary at this point, given that CPython will also need to access its own internals here in some way. "f_localsplus" is used in the FASTCALL implementation and seems specific to Py3.6/7(?) which lacks the vectorcall protocol. So I guess it won't impact 3.8+ anymore. (The reason why its access is so complex is that StacklessPython has a slightly different frame struct, but we need to be binary(!) compatible with both.) "f_back" is used in the coroutine implementation for correct exception traceback building, and the usage is pretty much copied from CPython. I doubt that there is much use for setting it outside of "code that tries to reimplement CPython behaviour", but Cython has to do exactly that here. "f_lineno" is also used for traceback generation, because Cython creates frames when an exception occurs and needs to tell it what the current code line is. It's only used on newly created frames, although I can't guarantee that that will never change. Being able to read and set it seems reasonable. "f_trace" is obviously used for tracing/profiling, and there isn't currently a good C-API for that, besides talking to frame struct fields (which is quite efficient when it comes to performance). |
The coverage project has a ctrace C extension which access PyFrameObject.f_lasti which is gone in Python 3.11. It uses MyFrame_lasti() helper to handle Python 3.10 lasti change: // The f_lasti field changed meaning in 3.10.0a7. It had been bytes, but
// now is instructions, so we need to adjust it to use it as a byte index.
#if PY_VERSION_HEX >= 0x030A00A7
#define MyFrame_lasti(f) (f->f_lasti * 2)
#else
#define MyFrame_lasti(f) f->f_lasti
#endif // 3.10.0a7 f_lasti is used for two things in coverage/ctracer/tracer.c: (1) get the last opcode: /* Need to distinguish between RETURN_VALUE and YIELD_VALUE. Read
* the current bytecode to see what it is. In unusual circumstances
* (Cython code), co_code can be the empty string, so range-check
* f_lasti before reading the byte.
*/
int bytecode = RETURN_VALUE;
PyObject * pCode = MyFrame_GetCode(frame)->co_code;
int lasti = MyFrame_lasti(frame);
if (lasti < PyBytes_GET_SIZE(pCode)) {
bytecode = PyBytes_AS_STRING(pCode)[lasti];
}
if (bytecode != YIELD_VALUE) {
int first = MyFrame_GetCode(frame)->co_firstlineno;
if (CTracer_record_pair(self, self->pcur_entry->last_line, -first) < 0) {
goto error;
}
} (2) get the line number, with a special case for generator which is not started yet (lasti < 0) /* A call event is really a "start frame" event, and can happen for
* re-entering a generator also. f_lasti is -1 for a true call, and a
* real byte offset for a generator re-entry.
*/
if (frame->f_lasti < 0) {
self->pcur_entry->last_line = -MyFrame_GetCode(frame)->co_firstlineno;
}
else {
self->pcur_entry->last_line = PyFrame_GetLineNumber(frame);
} Since Python 3.10.0a3, PyFrame_GetLineNumber() handles the case of negative f_lasti, thanks to the commit 877df85 related to the PEP-626 implementation: int
PyCode_Addr2Line(PyCodeObject *co, int addrq)
{
if (addrq < 0) {
return co->co_firstlineno;
}
...
} => coverage would need an abstraction to get the last opcode: use case (1). I recall that an old version of asyncio also had to get latest opcode, in pure Python, to workaround the CPython bpo-21209 bug: +# Check for CPython issue bpo-21209 (...) + if _YIELD_FROM_BUG: Hopefully, this code could be be removed from asyncio, since the bug was fixed (and asyncio is now only maintained in the Python stdlib, it's not longer a third party module). |
See also bpo-45247: [C API] Add explicit support for Cython to the C API. |
I went ahead and changed the coverage.py code to this: #if PY_VERSION_HEX >= 0x030B00A0
// 3.11 moved f_lasti into an internal structure. This is totally the wrong way
// to make this work, but it's all I've got until https://bugs.python.org/issue40421
// is resolved.
#include <internal/pycore_frame.h>
#define MyFrame_lasti(f) ((f)->f_frame->f_lasti * 2)
#elif PY_VERSION_HEX >= 0x030A00A7
// The f_lasti field changed meaning in 3.10.0a7. It had been bytes, but
// now is instructions, so we need to adjust it to use it as a byte index.
#define MyFrame_lasti(f) ((f)->f_lasti * 2)
#else
#define MyFrame_lasti(f) ((f)->f_lasti)
#endif If we had an API that let me distinguish between call and resume and between return and yield, I could get rid of this. |
The docs for PyFrame_GetCode say it's returning an "int". "Get the frame code." is not very clear. Could this link to https://docs.python.org/3/c-api/code.html ? |
The gevent project is not compatible with Python 3.11: it gets and sets directly PyFrameObject.f_code member which has been removed in Python 3.11 (moved to PyFrameObject.f_frame.f_code internal C API). gevent issue: gevent/gevent#1867 |
I created bpo-46836: "[C API] Move PyFrameObject to the internal C API". |
Oh. I missed your comment. I created #75716 to fix the return type in the documentation. |
Ned Batchelder:
I proposed a coverage PR using PyObject_GetAttrString(frame, "f_lasti") which should works on all Python versions: |
Mark Shannon:
I found the uwsgi project on PyPI which uses f_lasti with PyCode_Addr2Line(). I wrote #75717 to suggest using PyFrame_GetLineNumber() in What's New in Python 3.11. |
Oh, plugins/python/profiler.c uses that to define PyFrame_GetLineNumber() on Python older than 2.7, Python 3.0 and Python 3.1. In 2022, it's no longer relevant. But well, there might be other code in the wild using PyCode_Addr2Line() with f_lasti, so IMO it's worth it to document the suggestion ;-) |
I searched for "\<_f" regex in Cython (0.29.x) branch. I found the following code getting or setting PyFrameObject fields. == Set f_back == __Pyx_Coroutine_SendEx() at Utility/Coroutine.c:721:
__Pyx_Coroutine_ResetFrameBackpointer() at Utility/Coroutine.c:775: Py_CLEAR(f->f_back); == Set f_lineno == __Pyx_PyFrame_SetLineNumber() at Utility/ModuleSetupCode.c:543: #define __Pyx_PyFrame_SetLineNumber(frame, lineno) (frame)->f_lineno = (lineno) __Pyx_PyFrame_SetLineNumber() is called by 3 functions:
__Pyx_AddTraceback() pseudo-code: static void __Pyx_AddTraceback(const char *funcname, int c_line,
int py_line, const char *filename)
{
py_frame = PyFrame_New(..., py_code, ...);
__Pyx_PyFrame_SetLineNumber(py_frame, py_line);
...
} == f_localsplus == If the CYTHON_FAST_PYCALL macro is defined, sizeof(PyFrameObject) is used to get the f_localsplus member. __Pyx_PyFrame_GetLocalsplus() at Utility/ObjectHandling.c:1996: // Initialised by module init code.
static size_t __pyx_pyframe_localsplus_offset = 0;
#include "frameobject.h"
#define __Pxy_PyFrame_Initialize_Offsets() \
((void)__Pyx_BUILD_ASSERT_EXPR(sizeof(PyFrameObject) == offsetof(PyFrameObject, f_localsplus) + Py_MEMBER_SIZE(PyFrameObject, f_localsplus)), \
(void)(__pyx_pyframe_localsplus_offset = ((size_t)PyFrame_Type.tp_basicsize) - Py_MEMBER_SIZE(PyFrameObject, f_localsplus)))
#define __Pyx_PyFrame_GetLocalsplus(frame) \
(assert(__pyx_pyframe_localsplus_offset), (PyObject **)(((char *)(frame)) + __pyx_pyframe_localsplus_offset)) == Get f_trace == __Pyx_TraceLine() at Utility/Profile.c:238:
== Set f_frame == __Pyx_TraceSetupAndCall() at Utility/Profile.c:299:
|
I've outlined the requirements for a frame stack API at faster-cpython/ideas#309. The problem with adding an API for PyFrameObject (beyond simple getters) is that it makes assumptions about the frame stack that aren't valid. A stack of frames is not just a linked list of frames. It never was, and it certainly isn't now. |
I created bpo-47092: [C API] Add PyFrame_GetVar(frame, name) function. |
Issue title: "[C API] Add getter functions for PyFrameObject and maybe move PyFrameObject to the internal C API" bpo-46836 moved PyFrameObject to the internal C API. I update the issue title. |
IMO the initial goal is now reached. I close the issue. Thanks to everyone who helped implementing these changes! The PyFrameObject structure is now opaque in Python 3.11. New getter functions of the Python 3.11 C API:
Finally, the PyFrameObject structure now has its own page in the C API documentation: As explained in previous comments, the work is not done: Cython, greenlet, gevent and coverage still need more getter and/or setter functions. Adding more functions is being discussed in this external issue: I propose to open new specific issues to add new functions. For example, open an issue to add a getter for the f_lasti member. |
The PyFrameObject structure was made opaque by bpo-46836. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: