Interpreter state overview #633
Comments
@brandtbucher, @markshannon, @iritkatriel -- please review. (Note that the "JIT considerations" section came out of a conversation I had with Brandt this afternoon.)
A good start.
than this:
Instruction formats
The tier 2 instruction format is an implementation detail. We should keep this at a more semantic level. Just say that tier 2 instruction may include an
We probably don't want to be executing the same format as we use for optimization. A large, regular format is great for optimization, but a smaller, irregular format is likely better for execution.
The eval-breaker will probably move to the thread state soon, as part of the work on PEP 703.
The frame also holds a pointer to the current function, globals and builtins. (And "slow" locals, but that's relatively unimportant).
Frame state shadowed in locals
Frame state shadowed in C locals. "locals" has many meanings.
Tier 2 considerations and JIT considerations.
The evaluation stack
Rather than describing operations in terms of stack pointer, I think it better to describe them in terms of the stack itself, and how it is implemented.
The canonical, in-memory, representation of the stack is an array of values in the frame, with
The interpreters share an implementation which uses the same memory but caches the depth (as a pointer) in a C local.
Instruction pointer
The canonical, in-memory, representation of the instruction pointer is
Tier 2:
The tier 2 instruction pointer is strictly internal to the tier 2 interpreter, so isn't visible to any other part of the code.
Unwinding
Unwinding uses exception tables to find the next point at which normal execution can occur, or fail if there are no exception handlers. During unwinding both the stack and the instruction pointer should be in their canonical, in-memory representation.
Jumps in bytecode
The implementation of jumps within a single tier 2 superblock/trace is just that, an implementation. The implementation in the JIT and in the tier 2 interpreter will necessarily be different.
We need the following types of jumps:
Currently, we don't have patchable exits.
I realize I should have started a PR proposing specific contents for a file from the start; I wrote some free-flowing text and received free-flowing feedback that's hard to turn into a concrete new version of the text. Please review python/cpython#111621 instead.
I'm just going to land that PR once tests pass.
I have a question in mind. Is it possible to use AOT compiler techniques in CPython in the future? I am not well versed in JIT and AOT; this is just a question from a curious mind.
Our plans for JIT compilation are built almost entirely on in-process runtime profiling data. So AOT compilation isn't really on our roadmap.
I've been asked to write up an overview of the interpreter state. Let me start writing it up here; we can transfer it to a .md file in the code tree later.
Definition of Tiers
Tier 1 is the classic Python bytecode interpreter. This includes the specializing adaptive interpreter described in PEP 659 and introduced in Python 3.11.
Tier 2, also known as the micro-instruction ("uop") interpreter, is a new interpreter with a different instruction format. It will be introduced in Python 3.13, and also forms the basis for a JIT using copy-and-patch technology that is likely to be introduced at the same time (but, unlike the Tier 2 interpreter, hasn't landed in the main branch yet).
Tier 2 instruction format
Tier 2 instructions are all the same size; there is no equivalent to EXTENDED_ARG or trailing inline cache entries. Each instruction is a struct with the following fields:
int32_t opcode; -- the opcode. Sometimes the same as a Tier 1 opcode, sometimes a separate micro opcode. Tier 2 opcodes are 9 bits (as opposed to Tier 1 opcodes, which fit in 8 bits). By convention, Tier 2 opcode names start with _.
int32_t oparg; -- the argument. Usually the same as the Tier 1 oparg after expansion of EXTENDED_ARG prefixes.
int64_t operand; -- a second argument, typically the value of one cache item from the Tier 1 inline cache.
The Tier 2 instruction format is also the basis for hypothetical optimization passes.
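The three fields listed above can be pictured as a plain C struct. The following is a minimal sketch, not the actual CPython definition: the type name uop_instruction and the helper uop_at() are hypothetical and exist only for illustration; only the field names and types come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type name; field names and types follow the text above. */
typedef struct {
    int32_t opcode;   /* Tier 1 opcode or a separate micro opcode */
    int32_t oparg;    /* argument, after expansion of EXTENDED_ARG prefixes */
    int64_t operand;  /* second argument, typically one inline cache value */
} uop_instruction;

/* All instructions have the same size: two 32-bit fields and one 64-bit field. */
static_assert(sizeof(uop_instruction) == 16, "fixed-size instruction");

/* Hypothetical helper: because every instruction has the same size and there
   are no trailing cache entries, instruction number `index` is trace[index]. */
static inline const uop_instruction *
uop_at(const uop_instruction *trace, size_t index)
{
    return &trace[index];
}
```

Because the layout is regular, stepping to the next instruction is a plain pointer increment, unlike Tier 1, where the step depends on the number of inline cache entries that follow the instruction.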
Frame state
Almost all interpreter state is nominally stored in the frame structure. A pointer to the current frame is held in frame. It contains:
There are some other fields in the frame structure of less importance; notably, frames are linked together in a singly-linked list via the previous pointer, pointing from callee to caller.
Frame state shadowed in locals
A few items above are not kept up to date continuously, but are shadowed by C local variables (hopefully kept in registers by the C compiler) in _PyEval_EvalFrameDefault():
stack_pointer: points just past the top item on the stack (the stack top is stack_pointer[-1]). Loaded from the frame using LOAD_SP(), i.e. stack_pointer = _PyFrame_GetStackPointer(frame). Stored using STORE_SP(), i.e. _PyFrame_SetStackPointer(frame, stack_pointer). Note that the frame stores an integer, stacktop, and that _PyFrame_GetStackPointer() invalidates stacktop by overwriting it with -1.
next_instr: corresponds at times to frame->instr_ptr. Loaded using LOAD_IP(offset) (i.e., next_instr = frame->instr_ptr + offset). Stored using frame->instr_ptr = next_instr or equivalent.
Thread state and interpreter state
Another important piece of state is the thread state, held in tstate. The current frame pointer, frame, is always equal to tstate->current_frame. The thread state also holds the exception state (tstate->exc_info) and the recursion counters (tstate->c_recursion_remaining and tstate->py_recursion_remaining).
The thread state is also used to access the interpreter state (tstate->interp), which is important since the "eval breaker" flags are stored there (tstate->interp->ceval.eval_breaker, an "atomic" variable), as well as the "PEP 523 function" (tstate->interp->eval_frame). The interpreter state also holds the optimizer state (optimizer and some counters).
Tier 2 considerations
The Tier 2 interpreter uses frame and tstate exactly as the Tier 1 interpreter does.
It happens to use stack_pointer the same way as well, but by convention when switching between tiers the stack pointer is stored and reloaded using the STORE_SP() and LOAD_SP() macros (it just so happens that they share the local variable stack_pointer, and the sequence STORE_SP(); LOAD_SP() is effectively a no-op, even preserving the invariant that when the stack pointer is held in stack_pointer, frame->stacktop is invalidated by storing -1 there).
The instruction pointer is handled entirely differently between the tiers. In Tier 1, next_instr is used to decode each instruction, and also used to access inline cache values. Whenever an instruction is decoded, frame->instr_ptr is made to point to the start of the instruction (excluding EXTENDED_ARG prefixes -- the important invariant is that frame->instr_ptr->op.code is the opcode of the current instruction). Simultaneously, next_instr is incremented so that it either points at the next instruction (for simple instructions), or at the first inline cache entry for the current instruction. Inline cache values are accessed as next_instr[0].cache, next_instr[1].cache, etc. (For cache values larger than 16 bits, helper functions like read_u32() exist.)
However, in Tier 2, there are actually two "instruction pointers". First, there is a pointer to the (current or next) Tier 2 micro-op. This tells the Tier 2 interpreter the location of the next micro-instruction. Separately, the Tier 1 instruction pointer must still be available, because it (or, actually, the corresponding index into the Tier 1 bytecode array) is used for exception handling. It is used by the exception handling table to find the exception handler if an exception occurs, and to report the line and column numbers if a traceback is produced. By convention, these processes use the instruction pointer from frame->instr_ptr, not the contents of the (only roughly corresponding) next_instr variable.
The translation process from Tier 1 bytecode to Tier 2 micro-ops outputs a _SET_IP micro-op at the start of each translated bytecode. This is a special uop that directs the Tier 2 interpreter to store a specific value into frame->instr_ptr. Because this produces a lot of _SET_IP uops, a quick post-processing pass removes _SET_IP uops whose original Tier 1 bytecode cannot produce an exception (e.g., LOAD_FAST or STORE_FAST).
JIT considerations
The copy-and-patch JIT is considered a variant of the Tier 2 interpreter. The frame, stack_pointer and tstate values are treated exactly the same. This means that it also automatically updates frame->instr_ptr the same way whenever it executes a _SET_IP uop. The only difference is that in the JIT world, the array of Tier 2 uops is irrelevant, and there is no Tier 2 instruction pointer. Instead, there is just the hardware CPU's program counter.
There are a few complications that will probably change:
oparg). Instead, we should introduce a macro that, in the Tier 2 interpreter, updates the Tier 2 program counter (the Tier 1 program counter will be taken care of by an explicit _SET_IP uop at the jump target), whereas in the JIT the macro will just set a flag that is tested by the JIT template. Note that for uops that can't jump, the C compiler will optimize away this flag and its testing (as it does for the current hack).
The _SET_IP uop currently uses a helper variable ip_offset which is set to point to the start of the bytecode array whenever the Tier 2 interpreter switches frames (i.e., at the top, and in the _PUSH_FRAME and _POP_FRAME uops). We should get rid of the helper variable and just replace its (few) uses with _PyCode_CODE(_PyFrame_GetCode(frame)).
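The last point can be illustrated with a toy model. This is only a sketch under stated assumptions: every type and function name below (toy_codeunit, toy_code, toy_frame, toy_code_start, toy_set_ip) is hypothetical, toy_code_start() merely plays the role of _PyCode_CODE(_PyFrame_GetCode(frame)), and it is assumed that the _SET_IP argument is an offset in code units from the start of the bytecode array.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for the real structures; all names here are hypothetical. */
typedef uint16_t toy_codeunit;

typedef struct {
    toy_codeunit *code;        /* start of the Tier 1 bytecode array */
} toy_code;

typedef struct {
    toy_code *code_obj;        /* code object executed by this frame */
    toy_codeunit *instr_ptr;   /* canonical Tier 1 instruction pointer */
} toy_frame;

/* Plays the role of _PyCode_CODE(_PyFrame_GetCode(frame)) in the text. */
static toy_codeunit *
toy_code_start(toy_frame *frame)
{
    return frame->code_obj->code;
}

/* _SET_IP without a cached helper variable: the base address is recomputed
   from the frame on each use, so nothing has to be refreshed when the
   interpreter switches frames. Assumes `offset` counts code units. */
static void
toy_set_ip(toy_frame *frame, int32_t offset)
{
    assert(offset >= 0);
    frame->instr_ptr = toy_code_start(frame) + offset;
}
```

In this model, dropping the helper variable costs two extra pointer loads per _SET_IP, in exchange for not having to update any cached state in the frame-switching uops.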