gh-112354: Add executor for less-taken branch #112902

Closed
wants to merge 41 commits into from

Commits (41)
36feeb1
Skip ENTER_EXECUTOR as deopt target (use vm_data)
gvanrossum Nov 24, 2023
d12533b
Add an array of 'extras' to UOpExecutor
gvanrossum Nov 23, 2023
f21f2d8
Count side exits per uop loc and print if >= 10
gvanrossum Nov 24, 2023
8463965
Add _PyOptimizer_Anywhere (not yet used)
gvanrossum Nov 25, 2023
6403752
Only jump in ENTER_EXECUTOR if overwriting JUMP_BACKWARD
gvanrossum Nov 25, 2023
329dead
Assert base opcode in _Py_Specialize_ForIter
gvanrossum Nov 25, 2023
f1998c0
Disable curses tests in --fast-ci mode (make test)
gvanrossum Nov 25, 2023
b0944e6
Improve (?) check for executor recursion
gvanrossum Nov 25, 2023
26b5f89
Only generate extra executors for branches
gvanrossum Nov 25, 2023
649581c
Fix Uop -> UOp
gvanrossum Dec 1, 2023
835bf13
WIP
gvanrossum Dec 1, 2023
256b156
Fix where next_instr points upon E_E avoidance
gvanrossum Dec 2, 2023
75c7c32
Allow executors with oparg >= 256
gvanrossum Dec 2, 2023
a94c7f1
Don't try to optimize with default optimizer
gvanrossum Dec 2, 2023
747a3f0
Use separate 'counters' and 'executors' arrays
gvanrossum Dec 11, 2023
682cf5a
Jump directly to side-exit executors
gvanrossum Dec 12, 2023
359c6fc
Remove progress check; clean up the rest a bit
gvanrossum Dec 13, 2023
ca6ed3a
Ensure array of executor pointers is 64-bit aligned
gvanrossum Dec 13, 2023
e2a26b5
Check at least two uops; further cleanup
gvanrossum Dec 13, 2023
38c7aab
Move exit_trace up, since it is smaller
gvanrossum Dec 13, 2023
0f64231
Use configured threshold and exp. backoff for counter
gvanrossum Dec 13, 2023
83297df
Add API to access sub-interpreters
gvanrossum Dec 13, 2023
d065a94
Move optimizer/executor tests to new file test_capi/test_opt.py
gvanrossum Dec 13, 2023
0f71a03
Merge branch 'test-opt' into uops-extras
gvanrossum Dec 13, 2023
075ab91
In _PyOptimizer_Unanchored, assert not ENTER_EXECUTOR, accept JUMP_BA…
gvanrossum Dec 13, 2023
c54daef
Call DISPATCH() directly from exit_trace
gvanrossum Dec 13, 2023
934a115
Correct comment on deoptimize
gvanrossum Dec 13, 2023
52c49eb
Merge branch 'main' into uops-extras
gvanrossum Dec 13, 2023
8f5e623
Remove enter_tier_one label
gvanrossum Dec 13, 2023
10b98f1
Add test
gvanrossum Dec 13, 2023
1450ca6
Fix memory leak
gvanrossum Dec 13, 2023
dcde4d3
Clear sub-executors array upon dealloc
gvanrossum Dec 13, 2023
15df63f
Add blurb
gvanrossum Dec 13, 2023
c786418
Avoid redundant stack frame saves/restores
gvanrossum Dec 13, 2023
ee0734b
Revert "Disable curses tests in --fast-ci mode (make test)"
gvanrossum Dec 14, 2023
655a841
Merge branch 'main' into uops-extras
gvanrossum Dec 14, 2023
32e36fa
Merge branch 'main' into uops-extras
gvanrossum Dec 14, 2023
f5b317a
Fix compiler warning about int/Py_ssize_t
gvanrossum Dec 14, 2023
4804a3c
Be less casual about incref/decref current executor
gvanrossum Dec 16, 2023
46c7d26
Slightly nicer way to handle refcounts
gvanrossum Dec 16, 2023
b991279
Silence compiler warning
gvanrossum Dec 17, 2023
4 changes: 2 additions & 2 deletions Include/cpython/optimizer.h
@@ -60,8 +60,8 @@ PyAPI_FUNC(_PyOptimizerObject *) PyUnstable_GetOptimizer(void);

PyAPI_FUNC(_PyExecutorObject *) PyUnstable_GetExecutor(PyCodeObject *code, int offset);

int
_PyOptimizer_BackEdge(struct _PyInterpreterFrame *frame, _Py_CODEUNIT *src, _Py_CODEUNIT *dest, PyObject **stack_pointer);
int _PyOptimizer_BackEdge(struct _PyInterpreterFrame *frame, _Py_CODEUNIT *src, _Py_CODEUNIT *dest, PyObject **stack_pointer);
int _PyOptimizer_Unanchored(struct _PyInterpreterFrame *frame, _Py_CODEUNIT *instr, _PyExecutorObject **pexecutor, PyObject **stack_pointer);

extern _PyOptimizerObject _PyOptimizer_Default;

3 changes: 3 additions & 0 deletions Include/internal/pycore_uops.h
@@ -21,6 +21,9 @@ typedef struct {

typedef struct {
_PyExecutorObject base;
// Auxiliary arrays, allocated after trace[base.ob_size]
uint16_t *counters; // An array of counters
_PyExecutorObject **executors; // An array of executors
_PyUOpInstruction trace[1];
} _PyUOpExecutorObject;
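
A minimal sketch of the layout these fields imply (an editorial reading of the tp_itemsize change and the pointer setup in Python/optimizer.c further down, not text from the patch): for an executor holding N = Py_SIZE(self) uops, a single variable-size allocation is assumed to contain

/* layout sketch only -- exact sizes and alignment are not spelled out in the diff */
_PyExecutorObject    base;           /* object header */
_PyUOpInstruction    trace[N];       /* the uop trace itself */
_PyExecutorObject   *executors[N];   /* per-uop side-exit sub-executor, or NULL */
uint16_t             counters[N];    /* per-uop side-exit counter */

so 'counters' and 'executors' simply point into the tail of the same allocation, which is why tp_itemsize grows by sizeof(uint16_t) + sizeof(_PyExecutorObject *) per uop.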

21 changes: 21 additions & 0 deletions Lib/test/test_capi/test_opt.py
@@ -539,6 +539,27 @@ def testfunc(n):
# too much already.
self.assertEqual(count, 1)

def test_side_exits(self):
def testfunc():
for _ in range(100):
for i in range(100):
if i >= 70:
i = 0

opt = _testinternalcapi.get_uop_optimizer()
with temporary_optimizer(opt):
testfunc()

ex = get_first_executor(testfunc)
self.assertIsNotNone(ex)
uops = {opname for opname, _, _ in ex}
self.assertIn("_GUARD_IS_FALSE_POP", uops)
subs = [sub for sub in ex.sub_executors() if sub is not None]
self.assertGreater(len(subs), 0)
sub = subs[0]
sub_uops = {opname for opname, _, _ in sub}
self.assertIn("_GUARD_IS_TRUE_POP", sub_uops)


if __name__ == "__main__":
unittest.main()
@@ -0,0 +1,2 @@
In the Tier 2 interpreter, add side exits to sub-executors for certain
micro-opcodes (currently only conditional branches).
113 changes: 110 additions & 3 deletions Python/ceval.c
@@ -755,6 +755,7 @@
next_instr = frame->instr_ptr;
resume_frame:
stack_pointer = _PyFrame_GetStackPointer(frame);
resume_frame_using_stack_pointer:

#ifdef LLTRACE
lltrace = maybe_lltrace_resume_frame(frame, &entry_frame, GLOBALS());
@@ -1063,17 +1064,123 @@

// Jump here from DEOPT_IF()
deoptimize:
next_instr = next_uop[-1].target + _PyCode_CODE(_PyFrame_GetCode(frame));
frame->instr_ptr = next_uop[-1].target + _PyCode_CODE(_PyFrame_GetCode(frame));
DPRINTF(2, "DEOPT: [UOp %d (%s), oparg %d, operand %" PRIu64 ", target %d @ %d -> %s]\n",
uopcode, _PyUOpName(uopcode), next_uop[-1].oparg, next_uop[-1].operand, next_uop[-1].target,
(int)(next_uop - current_executor->trace - 1),
_PyOpcode_OpName[frame->instr_ptr->op.code]);
OPT_HIST(trace_uop_execution_counter, trace_run_length_hist);
UOP_STAT_INC(uopcode, miss);
Py_DECREF(current_executor);
DISPATCH();
frame->return_offset = 0; // Don't leave this random

// Check if there is a side-exit executor here already.
int pc = next_uop - 1 - current_executor->trace;

Check warning on line 1077 in Python/ceval.c (GitHub Actions: Windows and Windows free-threaded, x64 build-and-test and arm64 build):
'initializing': conversion from '__int64' to 'int', possible loss of data [D:\a\cpython\cpython\PCbuild\pythoncore.vcxproj]
_PyExecutorObject **pexecutor = current_executor->executors + pc;
if (*pexecutor != NULL) {
#ifdef Py_DEBUG
PyCodeObject *code = _PyFrame_GetCode(frame);
DPRINTF(2, "Jumping to new executor for %s (%s:%d) at byte offset %d\n",
PyUnicode_AsUTF8(code->co_qualname),
PyUnicode_AsUTF8(code->co_filename),
code->co_firstlineno,
2 * (int)(frame->instr_ptr - _PyCode_CODE(_PyFrame_GetCode(frame))));
#endif
_PyUOpExecutorObject *new_executor = (_PyUOpExecutorObject *)Py_NewRef(*pexecutor);
Py_DECREF(current_executor);
current_executor = new_executor;
goto enter_tier_two;
}

// Increment and check side exit counter.
// (Even though we only need it for certain opcodes.)
next_instr = frame->instr_ptr;
uint16_t *pcounter = current_executor->counters + pc;
*pcounter += 1 << OPTIMIZER_BITS_IN_COUNTER;
/* We are using unsigned values, but we really want signed values, so
* do the 2s complement comparison manually */
uint16_t ucounter = *pcounter + (1 << 15);
uint16_t threshold = tstate->interp->optimizer_resume_threshold + (1 << 15);
if (ucounter <= threshold)
{
Py_DECREF(current_executor);
goto resume_frame_using_stack_pointer;
}

// Decode instruction to look past EXTENDED_ARG.
opcode = next_instr[0].op.code;
if (opcode == EXTENDED_ARG) {
opcode = next_instr[1].op.code;
}

// For selected opcodes build a new executor and enter it now.

Member:
Why "selected opcodes", why not everywhere?

Member Author (gvanrossum):
In an earlier version that somehow didn't work. Right now the check that the new trace isn't going to immediately deopt again relies on these opcodes. I figured that once we have the side-exit machinery working we could gradually widen the scope to other deoptimizations. Also, not all deoptimizations are worth the effort (e.g. the PEP 523 test).

Member:
No special cases, please; it just makes the code more complicated and slower.
If we want to treat some exits differently, let's do it properly (faster-cpython/ideas#638), not here.

Member Author (gvanrossum):
There are several reasons. First, as I explain below, for bytecodes other than branches I can't promise an exact check for whether the newly created sub-executor doesn't just repeat the same deoptimizing uop that triggered its creation (in which case the sub-executor would always deopt immediately, if it is entered at all).

Second, for most bytecodes other than branches, deoptimization paths are relatively rare (IIRC this is apparent from the pystats data, with the exception of some LOAD_ATTR specializations).

For branches, we expect many cases where the "common" path is not much more common than the "uncommon" path (e.g. 60/40 or 70/30). Now, it might make sense to have a different special case here, where if e.g. _GUARD_IS_TRUE_POP has a hot side exit, we know that the branch goes the other way, so we can simply create a sub-executor starting at the less-common branch target. The current approach doesn't do this (mostly because I'd have to thread the special case all the way through the various optimizer functions) but just creates a new executor starting at the original Tier 1 branch bytecode, in the expectation that, if the counters are tuned just right, we will have executed the less-common branch in Tier 1 while taking the common branch in Tier 2, so that Tier 1's shift register has changed state and now indicates that the less-common branch is actually taken more frequently. The checks below (at L1161 and ff.) are a safeguard in case that hasn't happened yet (there are all kinds of interesting scenarios, e.g. loops that don't iterate much; remember that the first iteration each time we enter a loop is done in Tier 1, where we stay until we hit the JUMP_BACKWARD bytecode at the end of the loop).

I propose this PR as a starting point for further iterations, not as the ultimate design for side exits. Let's discuss this Monday.
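
[Editorial note] For orientation while reading the interleaved diff and comments, the overall shape of the new deoptimize handling can be sketched as follows. This is an editorial summary using the diff's own variable names, not part of the patch:

/*
 * Sketch of the deoptimize path (summary of the surrounding diff):
 *
 *   pc = next_uop - 1 - current_executor->trace        // which uop deopted
 *   if executors[pc] != NULL:
 *       enter that sub-executor directly (Tier 2 -> Tier 2 hand-off)
 *   else if counters[pc], bumped by one exit, is still <= the resume threshold:
 *       return to Tier 1 at frame->instr_ptr
 *   else if the Tier 1 opcode here is POP_JUMP_IF_{FALSE,TRUE,NONE,NOT_NONE}:
 *       project a fresh, unanchored trace starting at that branch;
 *       if its leading guard differs from the uop that just deopted,
 *           store it in executors[pc] and enter it;
 *       otherwise discard it
 *   if nothing was entered:
 *       apply exponential backoff to counters[pc] and resume Tier 1
 */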

if (opcode == POP_JUMP_IF_FALSE ||
opcode == POP_JUMP_IF_TRUE ||
opcode == POP_JUMP_IF_NONE ||
opcode == POP_JUMP_IF_NOT_NONE)
{
DPRINTF(2, "--> %s @ %d in %p has %d side exits\n",
_PyUOpName(uopcode), pc, current_executor, (int)(*pcounter));
DPRINTF(2, " T1: %s\n", _PyOpcode_OpName[opcode]);

_PyExecutorObject *tmp_executor = NULL;
int optimized = _PyOptimizer_Unanchored(frame, next_instr, &tmp_executor, stack_pointer);
if (optimized < 0) {
goto error_tier_two;
}

if (!optimized) {
DPRINTF(2, "--> Failed to optimize %s @ %d in %p\n",
_PyUOpName(uopcode), pc, current_executor);
}
else {
#ifdef Py_DEBUG
DPRINTF(1, "--> Optimized %s @ %d in %p\n",
_PyUOpName(uopcode), pc, current_executor);
PyCodeObject *code = _PyFrame_GetCode(frame);
DPRINTF(2, "Jumping to fresh executor for %s (%s:%d) at byte offset %d\n",
PyUnicode_AsUTF8(code->co_qualname),
PyUnicode_AsUTF8(code->co_filename),
code->co_firstlineno,
2 * (int)(frame->instr_ptr - _PyCode_CODE(_PyFrame_GetCode(frame))));
#endif
_PyUOpExecutorObject *new_executor = (_PyUOpExecutorObject *)Py_NewRef(tmp_executor);

// Reject trace if it repeats the uop that just deoptimized.

Member:
Why?

Member Author (gvanrossum):
This test may be a bit imprecise (*), but it tries to discard the case where, even though the counter in the executor indicated that this side exit is "hot", the Tier 1 bytecode hasn't been re-specialized yet. In that case the trace projection will just repeat the uop that just took a deopt side exit, causing it to immediately deopt again. This seems a waste of time and executors -- eventually the sub-executor's deopt counter will also indicate it is hot, and then we'll try again, but it seems better (if we can catch it) to avoid creating the sub-executor in the first place, relying instead on exponential backoff for the side-exit counter (implemented below at L1180 and ff.).

For various reasons, the side-exit counters and the Tier 1 deopt counters don't run in sync, so it's possible that the side-exit counter triggers before the Tier 1 counter has re-specialized. This check gives that another chance.

The test I would really like to use here is to check whether the Tier 1 opcode is still unchanged (i.e., not re-specialized), but the executor doesn't record that information (and it would take up a lot of space -- we'd need an extra byte for each uop that can deoptimize, at least).

(*) The test I wrote is exact for the conditional branches I special-cased above (that's why there's a further special case here for _IS_NONE). For other opcodes it may miss a few cases, e.g. when a single Tier 1 bytecode translates to multiple guards and the failing guard is not the first uop in the translation (this would always happen for calls, whose translation always starts with _PEP_523, which never deopts in the cases we care about). In those cases we can produce a sub-executor that immediately deoptimizes. (And we never try to re-create executors, no matter how often one deoptimizes -- that's a general flaw in the current executor architecture that we should probably file separately.)

int jump_opcode = new_executor->trace[0].opcode;
if (jump_opcode == _IS_NONE) {
jump_opcode = new_executor->trace[1].opcode;
}
if (jump_opcode != uopcode) {
*pexecutor = tmp_executor;
*pcounter &= ((1 << OPTIMIZER_BITS_IN_COUNTER) - 1);
Py_DECREF(current_executor);
current_executor = new_executor;
goto enter_tier_two; // All systems go!
}

// The trace is guaranteed to deopt again; forget about it.

Member:
Is it? Why?

Member Author (gvanrossum):
See explanation above.

DPRINTF(2, "Alas, it's the same uop again (%s) -- discarding trace\n",
_PyUOpName(jump_opcode));
Py_DECREF(tmp_executor);
Py_DECREF(new_executor);
}
}

// Exponential backoff if we didn't optimize.
int backoff = *pcounter & ((1 << OPTIMIZER_BITS_IN_COUNTER) - 1);
if (backoff < MINIMUM_TIER2_BACKOFF) {
backoff = MINIMUM_TIER2_BACKOFF;
}
else if (backoff < 15 - OPTIMIZER_BITS_IN_COUNTER) {
backoff++;
}
assert(backoff <= 15 - OPTIMIZER_BITS_IN_COUNTER);
*pcounter = ((1 << 16) - ((1 << OPTIMIZER_BITS_IN_COUNTER) << backoff)) | backoff;

Py_DECREF(current_executor);
goto resume_frame_using_stack_pointer;
}
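
To make the counter arithmetic above concrete, here is a small, self-contained editorial sketch (not part of the patch). It assumes OPTIMIZER_BITS_IN_COUNTER == 4, a resume threshold of 16 << 4 (matching what PyUnstable_Optimizer_NewUOpOptimizer sets further down), and a minimum backoff of 4; these constants are stand-ins, not values quoted from this diff. The real code keeps the backoff exponent in the low bits of the per-uop counter; a separate variable is used here for clarity.

#include <stdint.h>
#include <stdio.h>

#define BITS 4                          /* stand-in for OPTIMIZER_BITS_IN_COUNTER */
#define RESUME_THRESHOLD (16 << BITS)   /* stand-in for optimizer_resume_threshold */

int main(void)
{
    uint16_t counter = 0;               /* upper bits: exit count; low bits: backoff */
    int backoff = 4;                    /* stand-in for MINIMUM_TIER2_BACKOFF */
    for (int attempt = 1; attempt <= 3; attempt++) {
        int exits = 0;
        for (;;) {
            counter += 1 << BITS;       /* one more side exit */
            exits++;
            /* Same trick as the diff: adding 1 << 15 turns the unsigned
             * 16-bit comparison into a signed one. */
            uint16_t ucounter = (uint16_t)(counter + (1 << 15));
            uint16_t threshold = (uint16_t)(RESUME_THRESHOLD + (1 << 15));
            if (ucounter > threshold) {
                break;                  /* hot enough: attempt to optimize */
            }
        }
        printf("attempt %d after %d side exits\n", attempt, exits);
        /* Pretend the attempt failed and apply the exponential backoff,
         * mirroring the reset in the diff above. */
        counter = (uint16_t)(((1 << 16) - ((1 << BITS) << backoff)) | backoff);
        if (backoff < 15 - BITS) {
            backoff++;
        }
    }
    return 0;
}

Compiled on its own, this prints attempts after 17, 32 and 48 side exits: with these stand-in values the first optimization attempt needs about 17 exits, and each failed attempt adds roughly 16 + 2**backoff more, which is the exponential backoff the reset formula encodes.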

#if defined(__GNUC__)
# pragma GCC diagnostic pop
#elif defined(_MSC_VER) /* MS_WINDOWS */
69 changes: 62 additions & 7 deletions Python/optimizer.c
@@ -159,7 +159,7 @@ int
_PyOptimizer_BackEdge(_PyInterpreterFrame *frame, _Py_CODEUNIT *src, _Py_CODEUNIT *dest, PyObject **stack_pointer)
{
assert(src->op.code == JUMP_BACKWARD);
PyCodeObject *code = (PyCodeObject *)frame->f_executable;
PyCodeObject *code = _PyFrame_GetCode(frame);
assert(PyCode_Check(code));
PyInterpreterState *interp = _PyInterpreterState_GET();
if (!has_space_for_executor(code, src)) {
@@ -189,6 +189,27 @@ _PyOptimizer_BackEdge(_PyInterpreterFrame *frame, _Py_CODEUNIT *src, _Py_CODEUNI
return 1;
}

// Return an unanchored executor. The caller owns the executor when returning 1.
// No ENTER_EXECUTOR is inserted, nor is the executor added to the code object.
int
_PyOptimizer_Unanchored(
_PyInterpreterFrame *frame,
_Py_CODEUNIT *instr,
_PyExecutorObject **pexecutor,
PyObject **stack_pointer)
{
assert(instr->op.code != ENTER_EXECUTOR);
PyCodeObject *code = _PyFrame_GetCode(frame);
assert(PyCode_Check(code));
PyInterpreterState *interp = _PyInterpreterState_GET();
_PyOptimizerObject *opt = interp->optimizer;
if (strcmp(opt->ob_base.ob_type->tp_name, "uop_optimizer") != 0) {
return 0;
}
*pexecutor = NULL;
return opt->optimize(opt, code, instr, pexecutor, (int)(stack_pointer - _PyFrame_Stackbase(frame)));
}
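
For clarity, a minimal caller sketch follows (editorial, not part of the patch; the real caller is the deoptimize path in Python/ceval.c above, and the helper name here is hypothetical). It illustrates the ownership contract stated in the comment: on a return of 1 the caller owns the new reference in *pexecutor and must either stash it somewhere that owns it (such as current_executor->executors[pc]) or Py_DECREF it.

/* Caller sketch only; assumes the declarations from Include/cpython/optimizer.h. */
static int
try_project_side_exit(_PyInterpreterFrame *frame, _Py_CODEUNIT *instr,
                      PyObject **stack_pointer, _PyExecutorObject **out)
{
    _PyExecutorObject *tmp = NULL;
    int optimized = _PyOptimizer_Unanchored(frame, instr, &tmp, stack_pointer);
    if (optimized <= 0) {
        return optimized;   /* -1: error, exception set; 0: no trace produced */
    }
    *out = tmp;             /* transfer the new strong reference to the caller */
    return 1;
}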

_PyExecutorObject *
PyUnstable_GetExecutor(PyCodeObject *code, int offset)
{
@@ -321,6 +342,11 @@ PyUnstable_Optimizer_NewCounter(void)
static void
uop_dealloc(_PyUOpExecutorObject *self) {
_Py_ExecutorClear((_PyExecutorObject *)self);
if (self->executors != NULL) {
for (Py_ssize_t i = Py_SIZE(self); --i >= 0; ) {
Py_XDECREF(self->executors[i]);
}
}
PyObject_Free(self);
}

@@ -375,15 +401,41 @@ PySequenceMethods uop_as_sequence = {
.sq_item = (ssizeargfunc)uop_item,
};

static PyObject *
sub_executors(PyObject *self, PyObject *Py_UNUSED(ignored))
{
_PyUOpExecutorObject *executor = (_PyUOpExecutorObject *)self;
Py_ssize_t len = uop_len(executor);
PyObject *list = PyList_New(len);
if (list == NULL) {
return NULL;
}
for (Py_ssize_t i = 0; i < len; i++) {
PyObject *sub = (PyObject *)executor->executors[i];
if (sub == NULL) {
sub = Py_None;
}
Py_INCREF(sub);
PyList_SET_ITEM(list, i, (PyObject *)sub);
}
return list;
}

static PyMethodDef uop_executor_methods[] = {
{ "is_valid", is_valid, METH_NOARGS, NULL },
{ "sub_executors", sub_executors, METH_NOARGS, NULL },
{ NULL, NULL },
};

PyTypeObject _PyUOpExecutor_Type = {
PyVarObject_HEAD_INIT(&PyType_Type, 0)
.tp_name = "uop_executor",
.tp_basicsize = sizeof(_PyUOpExecutorObject) - sizeof(_PyUOpInstruction),
.tp_itemsize = sizeof(_PyUOpInstruction),
.tp_itemsize = sizeof(_PyUOpInstruction) + sizeof(uint16_t) + sizeof(_PyExecutorObject *),
.tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_DISALLOW_INSTANTIATION,
.tp_dealloc = (destructor)uop_dealloc,
.tp_as_sequence = &uop_as_sequence,
.tp_methods = executor_methods,
.tp_methods = uop_executor_methods,
};

/* TO DO -- Generate these tables */
@@ -499,7 +551,7 @@ translate_bytecode_to_trace(
code = trace_stack[trace_stack_depth].code; \
instr = trace_stack[trace_stack_depth].instr;

DPRINTF(4,
DPRINTF(2,
"Optimizing %s (%s:%d) at byte offset %d\n",
PyUnicode_AsUTF8(code->co_qualname),
PyUnicode_AsUTF8(code->co_filename),
@@ -825,6 +877,10 @@ make_executor_from_uops(_PyUOpInstruction *buffer, _PyBloomFilter *dependencies)
if (executor == NULL) {
return NULL;
}
executor->executors = (_PyExecutorObject **)(&executor->trace[length]);
executor->counters = (uint16_t *)(&executor->executors[length]);
memset(executor->executors, 0, sizeof(_PyExecutorObject *) * length);
memset(executor->counters, 0, sizeof(uint16_t) * length);
int dest = length - 1;
/* Scan backwards, so that we see the destinations of jumps before the jumps themselves. */
for (int i = _Py_UOP_MAX_TRACE_LENGTH-1; i >= 0; i--) {
@@ -933,9 +989,8 @@ PyUnstable_Optimizer_NewUOpOptimizer(void)
return NULL;
}
opt->optimize = uop_optimize;
opt->resume_threshold = INT16_MAX;
// Need at least 3 iterations to settle specializations.
// A few lower bits of the counter are reserved for other flags.
// The lower bits are reserved for exponential backoff.
opt->resume_threshold = 16 << OPTIMIZER_BITS_IN_COUNTER;
opt->backedge_threshold = 16 << OPTIMIZER_BITS_IN_COUNTER;
return (PyObject *)opt;
}
1 change: 1 addition & 0 deletions Python/specialize.c
@@ -2353,6 +2353,7 @@ int
void
_Py_Specialize_ForIter(PyObject *iter, _Py_CODEUNIT *instr, int oparg)
{
assert(_PyOpcode_Deopt[instr->op.code] == FOR_ITER);

Member Author (gvanrossum):
We should really add such asserts to many specialization functions; I ran into this one during an intense debugging session.

Member:
The assert can be instr->op.code == FOR_ITER and it shouldn't be necessary, as _Py_Specialize_ForIter is only called from FOR_ITER.

Member Author (gvanrossum):
I tried that and I get a Bus error. And of course it's not supposed to be called with something else! But a logic error in my early prototype caused that to happen, and it took me quite a while to track it down.

assert(ENABLE_SPECIALIZATION);
assert(_PyOpcode_Caches[FOR_ITER] == INLINE_CACHE_ENTRIES_FOR_ITER);
_PyForIterCache *cache = (_PyForIterCache *)(instr + 1);