Numba recently gained support for compiling generator functions. This document explains some of the implementation choices.
For clarity, we distinguish between generator functions and generators. A generator function is a function containing one or several yield
statements. A generator (sometimes also called "generator iterator") is the return value of a generator function; it resumes execution inside its frame each time :pynext
is called.
A yield point is the place where a yield
statement is called. A resumption point is the place just after a yield point where execution is resumed when :pynext
is called again.
Suppose we have the following simple generator function:
def gen(x, y):
yield x + y
yield x - y
Here is its CPython bytecode, as printed out using :pydis.dis
:
7 0 LOAD_FAST 0 (x)
3 LOAD_FAST 1 (y)
6 BINARY_ADD
7 YIELD_VALUE
8 POP_TOP
8 9 LOAD_FAST 0 (x)
12 LOAD_FAST 1 (y)
15 BINARY_SUBTRACT
16 YIELD_VALUE
17 POP_TOP
18 LOAD_CONST 0 (None)
21 RETURN_VALUE
When compiling this function with NUMBA_DUMP_IR
set to 1, the following information is printed out:
----------------------------------IR DUMP: gen----------------------------------
label 0:
x = arg(0, name=x) ['x']
y = arg(1, name=y) ['y']
$0.3 = x + y ['$0.3', 'x', 'y']
$0.4 = yield $0.3 ['$0.3', '$0.4']
del $0.4 []
del $0.3 []
$0.7 = x - y ['$0.7', 'x', 'y']
del y []
del x []
$0.8 = yield $0.7 ['$0.7', '$0.8']
del $0.8 []
del $0.7 []
$const0.9 = const(NoneType, None) ['$const0.9']
$0.10 = cast(value=$const0.9) ['$0.10', '$const0.9']
del $const0.9 []
return $0.10 ['$0.10']
------------------------------GENERATOR INFO: gen-------------------------------
generator state variables: ['$0.3', '$0.7', 'x', 'y']
yield point #1: live variables = ['x', 'y'], weak live variables = ['$0.3']
yield point #2: live variables = [], weak live variables = ['$0.7']
What does it mean? The first part is the Numba IR, as already seen in arch_generate_numba_ir
. We can see the two yield points (yield $0.3
and yield $0.7
).
The second part shows generator-specific information. To understand it we have to understand what suspending and resuming a generator means.
When suspending a generator, we are not merely returning a value to the caller (the operand of the yield
statement). We also have to save the generator's current state in order to resume execution. In trivial use cases, perhaps the CPU's register values or stack slots would be preserved until the next call to next(). However, any non-trivial case will hopelessly clobber those values, so we have to save them in a well-defined place.
What are the values we need to save? Well, in the context of the Numba Intermediate Representation, we must save all live variables at each yield point. These live variables are computed thanks to the control flow graph.
Once live variables are saved and the generator is suspended, resuming the generator simply involves the inverse operation: the live variables are restored from the saved generator state.
Note
It is the same analysis which helps insert Numba del
instructions where appropriate.
Let's go over the generator info again:
generator state variables: ['$0.3', '$0.7', 'x', 'y']
yield point #1: live variables = ['x', 'y'], weak live variables = ['$0.3']
yield point #2: live variables = [], weak live variables = ['$0.7']
Numba has computed the union of all live variables (denoted as "state variables"). This will help define the layout of the generator
structure <generator-structure>
. Also, for each yield point, we have computed two sets of variables:
- the live variables are the variables which are used by code following the resumption point (i.e. after the
yield
statement) - the weak live variables are variables which are del'ed immediately after the resumption point; they have to be saved in
object mode
, to ensure proper reference cleanup
Function analysis helps us gather enough information to define the layout of the generator structure, which will store the entire execution state of a generator. Here is a sketch of the generator structure's layout, in pseudo-code:
struct gen_struct_t {
int32_t resume_index;
struct gen_args_t {
arg_0_t arg0;
arg_1_t arg1;
...
arg_N_t argN;
}
struct gen_state_t {
state_0_t state_var0;
state_1_t state_var1;
...
state_N_t state_varN;
}
}
Let's describe those fields in order.
- The first member, the resume index, is an integer telling the generator at which resumption point execution must resume. By convention, it can have two special values: 0 means execution must start at the beginning of the generator (i.e. the first time :py
next
is called); -1 means the generator is exhausted and resumption must immediately raise StopIteration. Other values indicate the yield point's index starting from 1 (corresponding to the indices shown in the generator info above). - The second member, the arguments structure is read-only after it is first initialized. It stores the values of the arguments the generator function was called with. In our example, these are the values of
x
andy
. - The third member, the state structure, stores the live variables as computed above.
Concretely, our example's generator structure (assuming the generator function is called with floating-point numbers) is then:
struct gen_struct_t {
int32_t resume_index;
struct gen_args_t {
double arg0;
double arg1;
}
struct gen_state_t {
double $0.3;
double $0.7;
double x;
double y;
}
}
Note that here, saving x
and y
is redundant: Numba isn't able to recognize that the state variables x
and y
have the same value as arg0
and arg1
.
How does Numba ensure the generator structure is preserved long enough? There are two cases:
- When a Numba-compiled generator function is called from a Numba-compiled function, the structure is allocated on the stack by the callee. In this case, generator instantiation is practically costless.
- When a Numba-compiled generator function is called from regular Python code, a CPython-compatible wrapper is instantiated that has the right amount of allocated space to store the structure, and whose :c
~PyTypeObject.tp_iternext
slot is a wrapper around the generator's native code.
When compiling a generator function, three native functions are actually generated by Numba:
- An initialization function. This is the function corresponding to the generator function itself: it receives the function arguments and stores them inside the generator structure (which is passed by pointer). It also initialized the resume index to 0, indicating that the generator hasn't started yet.
- A next() function. This is the function called to resume execution inside the generator. Its single argument is a pointer to the generator structure and it returns the next yielded value (or a special exit code is used if the generator is exhausted, for quick checking when called from Numba-compiled functions).
- An optional finalizer. In object mode, this function ensures that all live variables stored in the generator state are decref'ed, even if the generator is destroyed without having been exhausted.
The next() function is the least straight-forward of the three native functions. It starts with a trampoline which dispatches execution to the right resume point depending on the resume index stored in the generator structure. Here is how the function start may look like in our example:
define i32 @"__main__.gen.next"(
double* nocapture %retptr,
{ i8*, i32 }** nocapture readnone %excinfo,
i8* nocapture readnone %env,
{ i32, { double, double }, { double, double, double, double } }* nocapture %arg.gen)
{
entry:
%gen.resume_index = getelementptr { i32, { double, double }, { double, double, double, double } }* %arg.gen, i64 0, i32 0
%.47 = load i32* %gen.resume_index, align 4
switch i32 %.47, label %stop_iteration [
i32 0, label %B0
i32 1, label %generator_resume1
i32 2, label %generator_resume2
]
; rest of the function snipped
(uninteresting stuff trimmed from the LLVM IR to make it more readable)
We recognize the pointer to the generator structure in %arg.gen
. The trampoline switch has three targets (one for each resume index 0, 1 and 2), and a fallback target label named stop_iteration
. Label B0
represents the function's start, generator_resume1
(resp. generator_resume2
) is the resumption point after the first (resp. second) yield point.
After generation by LLVM, the whole native assembly code for this function may look like this (on x86-64):
.globl __main__.gen.next
.align 16, 0x90
- __main__.gen.next:
movl (%rcx), %eax cmpl $2, %eax je .LBB1_5 cmpl $1, %eax jne .LBB1_2 movsd 40(%rcx), %xmm0 subsd 48(%rcx), %xmm0 movl $2, (%rcx) movsd %xmm0, (%rdi) xorl %eax, %eax retq
- .LBB1_5:
movl $-1, (%rcx) jmp .LBB1_6
- .LBB1_2:
testl %eax, %eax jne .LBB1_6 movsd 8(%rcx), %xmm0 movsd 16(%rcx), %xmm1 movaps %xmm0, %xmm2 addsd %xmm1, %xmm2 movsd %xmm1, 48(%rcx) movsd %xmm0, 40(%rcx) movl $1, (%rcx) movsd %xmm2, (%rdi) xorl %eax, %eax retq
- .LBB1_6:
movl $-3, %eax retq
Note the function returns 0 to indicate a value is yielded, -3 to indicate StopIteration. %rcx
points to the start of the generator structure, where the resume index is stored.