Pluggable optimizer API #104584

Open
markshannon opened this issue May 17, 2023 · 3 comments
Labels
3.13 new features, bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments


markshannon commented May 17, 2023

We need an API for optimizers to be plugged in to CPython.

The proposed model is client-server: the VM is the client and the optimizer is the server.
The optimizer registers with the VM, then the VM calls the optimizer when it detects hotspots.

The API:

typedef struct _PyExecutorObject PyExecutorObject;
struct _PyExecutorObject {
    OBJECT_HEADER;
    _PyInterpreterFrame *(*execute)(PyExecutorObject *self, _PyInterpreterFrame *frame, PyObject **stack_pointer);
    /* Data needed by the executor goes here, but is opaque to the VM */
};

/* This would be nicer as an enum, but C doesn't define the size of enums */
#define PY_OPTIMIZE_FUNCTION_ENTRY 1
#define PY_OPTIMIZE_RESUME_AFTER_YIELD 2
#define PY_OPTIMIZE_BACK_EDGE 4
typedef uint32_t PyOptimizerCapabilities;

typedef struct _PyOptimizerObject PyOptimizerObject;
struct _PyOptimizerObject {
    OBJECT_HEADER;
    PyExecutorObject *(*compile)(PyOptimizerObject *self, PyCodeObject *code, int offset);
    PyOptimizerCapabilities capabilities;
    float optimization_cost;
    float run_cost;
    /* Data needed by the compiler goes here, but is opaque to the VM */
};

void _Py_Executor_Replace(PyCodeObject *code, int offset, PyExecutorObject *executor);

int _Py_Optimizer_Register(PyOptimizerObject* optimizer);

The semantics of a PyExecutorObject are that, upon return from its execute function, the VM state will have advanced N instructions, where N is a non-negative integer.
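To make the client-server flow concrete, here is a minimal, self-contained sketch of how a VM might register an optimizer, consult its capability bits on a hot back edge, and run the resulting executor. It does not use CPython headers: `ToyFrame`, `ToyCode`, `ToyExecutor`, `ToyOptimizer`, the `TOY_` constants and every `toy_` function are hypothetical stand-ins invented for illustration, and the executor here takes no `stack_pointer` argument. It is not the proposed implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for CPython internals, for illustration only. */
typedef struct { int instr_index; } ToyFrame;  /* role of _PyInterpreterFrame */
typedef struct { int n_instrs; } ToyCode;      /* role of PyCodeObject */

#define TOY_OPTIMIZE_FUNCTION_ENTRY 1
#define TOY_OPTIMIZE_RESUME_AFTER_YIELD 2
#define TOY_OPTIMIZE_BACK_EDGE 4
typedef uint32_t ToyCapabilities;

typedef struct ToyExecutor ToyExecutor;
struct ToyExecutor {
    ToyFrame *(*execute)(ToyExecutor *self, ToyFrame *frame);
    int advance;  /* executor-private data, opaque to the VM */
};

typedef struct ToyOptimizer ToyOptimizer;
struct ToyOptimizer {
    ToyExecutor *(*compile)(ToyOptimizer *self, ToyCode *code, int offset);
    ToyCapabilities capabilities;
    ToyExecutor slot;  /* storage for the one executor this toy hands out */
};

/* Server side: an executor that advances the frame by N >= 0 instructions. */
ToyFrame *toy_execute(ToyExecutor *self, ToyFrame *frame) {
    assert(self->advance >= 0);
    frame->instr_index += self->advance;
    return frame;
}

/* Server side: "compile" everything from offset to the end of the code. */
ToyExecutor *toy_compile(ToyOptimizer *self, ToyCode *code, int offset) {
    if (offset < 0 || offset > code->n_instrs) {
        return NULL;
    }
    self->slot.execute = toy_execute;
    self->slot.advance = code->n_instrs - offset;
    return &self->slot;
}

/* Client side: the VM remembers a single registered optimizer. */
ToyOptimizer *toy_registered = NULL;

int toy_register(ToyOptimizer *opt) {
    if (opt == NULL || opt->compile == NULL) {
        return -1;
    }
    toy_registered = opt;
    return 0;
}

/* Client side: called when the VM sees a hot back edge at offset.
 * Returns 1 if an executor ran, 0 if the VM should keep interpreting. */
int toy_vm_hot_back_edge(ToyCode *code, ToyFrame *frame, int offset) {
    if (toy_registered == NULL ||
        !(toy_registered->capabilities & TOY_OPTIMIZE_BACK_EDGE)) {
        return 0;
    }
    ToyExecutor *ex = toy_registered->compile(toy_registered, code, offset);
    if (ex == NULL) {
        return 0;
    }
    frame->instr_index = offset;
    ex->execute(ex, frame);
    return 1;
}

/* Driver: returns N, the number of instructions the executor advanced. */
int toy_demo(void) {
    ToyOptimizer opt = { toy_compile, TOY_OPTIMIZE_BACK_EDGE, { NULL, 0 } };
    ToyCode code = { 10 };
    ToyFrame frame = { 0 };
    int advanced = -1;
    if (toy_register(&opt) == 0 && toy_vm_hot_back_edge(&code, &frame, 4)) {
        advanced = frame.instr_index - 4;
    }
    toy_registered = NULL;  /* opt is about to go out of scope */
    return advanced;
}
```

In this sketch the VM only ever sees the two function pointers and the capability mask; everything else in the optimizer and executor is private, which mirrors the "opaque to the VM" comments in the proposed structs.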

Full discussion here: faster-cpython/ideas#380

This is not a replacement for PEP 523; replacing it would need a PEP of its own. We should get this working first, before we consider replacing PEP 523.

@markshannon markshannon added performance Performance or resource usage 3.13 new features, bugs and security fixes labels May 17, 2023

markshannon commented May 18, 2023

Note that the above API is just the initial version to support our work on speeding up Python 3.13.
It will probably need to be extended to support PyTorch Dynamo and other users of PEP 523 that cannot use PEP 669, but that is for another issue.

markshannon added a commit that referenced this issue Jun 19, 2023
* Add test for long loops

* Clear ENTER_EXECUTOR when deopting code objects.
gvanrossum added a commit that referenced this issue Jun 27, 2023
Added a new, experimental, tracing optimizer and interpreter (a.k.a. "tier 2"). This currently pessimizes, so don't use yet -- this is infrastructure so we can experiment with optimizing passes. To enable it, pass ``-Xuops`` or set ``PYTHONUOPS=1``. To get debug output, set ``PYTHONUOPSDEBUG=N`` where ``N`` is a debug level (0-4, where 0 is no debug output and 4 is excessively verbose).

All of this code is likely to change dramatically before the 3.13 feature freeze. But this is a first step.
gvanrossum added a commit that referenced this issue Jun 27, 2023
This effectively reverts bb578a0, restoring the original DEOPT_IF() macro in ceval_macros.h, and redefining it in the Tier 2 interpreter. We can get rid of the PREDICTED() macros there as well!
vstinner added a commit to vstinner/cpython that referenced this issue Jun 28, 2023
test_counter_optimizer() and test_long_loop() of test_capi now create
a new function at each call. Otherwise, the optimizer counters are
not the expected values when the test is run more than once.
vstinner added a commit that referenced this issue Jun 28, 2023
…6171)

test_counter_optimizer() and test_long_loop() of test_capi now create
a new function at each call. Otherwise, the optimizer counters are
not the expected values when the test is run more than once.
gvanrossum added a commit that referenced this issue Jun 28, 2023
This produces longer traces (superblocks?).

Also improved debug output (uop names are now printed instead of numeric opcodes). This would be simpler if the numeric opcode values were generated by generate_cases.py, but that's another project.

Refactored some code in generate_cases.py so the essential algorithm for cache effects is only run once. (Deciding which effects are used and what the total cache size is, regardless of what's used.)
markshannon added a commit that referenced this issue Jul 3, 2023
* Check eval-breaker in ENTER_EXECUTOR.

* Make sure that frame->prev_instr is set before entering executor.
gvanrossum added a commit that referenced this issue Jul 6, 2023
When `_PyOptimizer_BackEdge` returns `NULL`, we should restore `next_instr` (and `stack_pointer`). To accomplish this we should jump to `resume_with_error` instead of just `error`.

The problem this causes is subtle -- the only repro I have is in PR gh-106393, at commit d7df54b. But the fix is real (as shown later in that PR).
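The difference between the two jump targets can be illustrated with a toy model. This is only a sketch under stated assumptions, not CPython's code: `MiniFrame`, `mini_back_edge` and `mini_run` are hypothetical names, and the lines that mark the local copies as stale after the call are a modeling assumption made so the effect is visible. The only parts taken from the commit message are that the optimizer call signals failure with `NULL` and that the failure path should restore `next_instr` and `stack_pointer` before reaching the error handler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature frame holding the saved interpreter state. */
typedef struct {
    int saved_instr;  /* saved instruction index */
    int saved_depth;  /* saved stack depth */
} MiniFrame;

/* Stand-in for an optimizer entry point that reports failure with NULL. */
MiniFrame *mini_back_edge(MiniFrame *frame, int fail) {
    return fail ? NULL : frame;
}

/* Returns the instruction index visible when the function finishes. */
int mini_run(int fail, int restore_on_error) {
    MiniFrame frame = { 0, 0 };
    int next_instr = 7;
    int stack_depth = 2;

    /* Save the local copies into the frame before the call. */
    frame.saved_instr = next_instr;
    frame.saved_depth = stack_depth;

    MiniFrame *result = mini_back_edge(&frame, fail);

    /* Modeling assumption: after the call the local copies are stale. */
    next_instr = -1;
    stack_depth = -1;

    if (result == NULL) {
        if (restore_on_error) {
            goto resume_with_error;
        }
        goto error;
    }

    /* Success: reload the locals from the frame that was returned. */
    next_instr = result->saved_instr;
    stack_depth = result->saved_depth;
    (void)stack_depth;
    return next_instr;

resume_with_error:
    /* Reload the locals from the frame, then fall through to error. */
    next_instr = frame.saved_instr;
    stack_depth = frame.saved_depth;
error:
    (void)stack_depth;
    return next_instr;
}
```

With `restore_on_error` set, the error handler sees the saved instruction index; without it, the handler sees whatever stale value the locals held.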

While we're at it, also improve the debug output: the offsets at which traces are identified are now measured in bytes, and always show the start offset. This makes it easier to correlate executor calls with optimizer calls, and either with `dis` output.

* Issue: gh-104584
gvanrossum added a commit that referenced this issue Jul 6, 2023
The uops test wasn't testing anything by default,
and was failing when run with -Xuops.

Made the two executor-related context managers global,
so TestUops can use them (notably `with temporary_optimizer(opt)`).

Made clear_executor() a little more thorough.

Fixed a crash upon finalizing a uop optimizer,
by adding a `tp_dealloc` handler.
gvanrossum added a commit to gvanrossum/cpython that referenced this issue Jul 6, 2023
…6492)

The uops test wasn't testing anything by default,
and was failing when run with -Xuops.

Made the two executor-related context managers global,
so TestUops can use them (notably `with temporary_optimizer(opt)`).

Made clear_executor() a little more thorough.

Fixed a crash upon finalizing a uop optimizer,
by adding a `tp_dealloc` handler.
gvanrossum added a commit that referenced this issue Jul 7, 2023
Instead of special-casing specific instructions,
we add a few more special values to the 'size' field of expansions,
so in the future we can automatically handle
additional super-instructions in the generator.
gvanrossum added a commit that referenced this issue Jul 7, 2023
This adds several unspecialized opcodes to superblocks:

TO_BOOL, BINARY_SUBSCR, STORE_SUBSCR,
UNPACK_SEQUENCE, LOAD_GLOBAL, LOAD_ATTR,
COMPARE_OP, BINARY_OP.

While we may not want that eventually, for now this helps find bugs.

There is a rudimentary test checking for UNPACK_SEQUENCE.

Once we're ready to undo this, that would be simple:
just replace the call to variable_used_unspecialized
with a call to variable_used (as shown in a comment).
Or add individual opcodes to FORBIDDEN_NAMES_IN_UOPS.
@iritkatriel iritkatriel added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 27, 2023