Feature request: Expose optimisation configuration into user space. #8430

stuartarchibald · 2022-09-09T15:08:14Z

Following on from numerous reports and at least two lengthy discussions at different times during the weekly Numba public meetings, this ticket is a meta-issue to promote and record discussion on the following:

1. The "default" optimisation level Numba should use for compilation.
2. How to expose more and more fine grained optimisation options into user space through
   the existing Numba APIs.

At present, item 1 is implemented approximately as follows:

1. Run a "cheap" optimisation pass with a view to inlining as much as possible.
   This is so as to expose as many Numba reference counting operations as
   possible to Numba's custom reference count pruning pass. This currently
   comprises:
   - Running something like -O0 with "loop rotation", "loop invariant code
     motion" and "CFG simplification" passes added. It turns out in practice
     that these are commonly needed to help transform the LLVM IR that Numba
     generates into something that will perform well under the "expensive"
     optimisation pass, particularly with respect to vectorisation.
   - Running the aforementioned reference count pruning pass.
2. Run an "expensive" optimisation pass, which is something like -O3 (cf.
   clang -O3), with loop and SLP vectorisation enabled.
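
To make the ordering of the stages above easier to see at a glance, here is a minimal sketch that models the described pipeline as plain data. `Stage` and `default_pipeline` are hypothetical names invented for illustration; this is not how Numba represents its pipeline internally.

```python
from typing import NamedTuple, Tuple


class Stage(NamedTuple):
    """One step of the pipeline as described in this issue (illustrative only)."""
    name: str
    base_level: str          # "O0"/"O3"-like starting point, "" for a custom pass
    extras: Tuple[str, ...]  # passes or features added on top of the base level


def default_pipeline():
    """Return the stages in the order the issue text describes them."""
    return [
        Stage("cheap", "O0",
              ("loop rotation", "loop invariant code motion", "CFG simplification")),
        Stage("reference count pruning", "", ("Numba reference count pruning",)),
        Stage("expensive", "O3", ("loop vectorisation", "SLP vectorisation")),
    ]


for stage in default_pipeline():
    base = stage.base_level or "custom"
    print(f"{stage.name}: {base} + {', '.join(stage.extras)}")
```

The point of the ordering is that pruning sits between the two optimisation stages, so the "expensive" stage sees code with as few reference counting operations as possible.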

Historically (prior to Numba 0.55) what eventually became the "cheap" pass was running at -O3 and there was a less sophisticated reference count pruner running (it could only analyse operations within a basic block). Essentially, Numba ran the -O3 passes over the code twice!

The reason for the change between 0.54 and 0.55 was that a new pass for pruning Numba reference count operations was developed. These reference count operations a) impact runtime performance and b) prevent certain classes of optimisations. Therefore, a strategy to do as much as possible to remove Numba-specific reference counting operations was employed, as described above. Further, it was observed that for a lot of code running -O3 twice had little benefit: it can make a negligible difference to performance at a much increased compilation cost. This further informed the strategy above.

As has been noted in various open issues, there have been cases where a single -O3 pass has missed optimisations which can be undertaken by running a subsequent -O3 pass. See issues: #8398, #8172, #8314, #6547.
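
For anyone wanting to experiment with the "-O3 twice" behaviour on the affected issues, Numba documents a `NUMBA_OPT` environment variable that takes 0 to 3. To my understanding, recent releases also accept a Numba-specific value `max` that runs at level 3 both before and after reference count pruning; please check the documentation for your installed version before relying on it. The sketch below only builds an environment for a child process. `numba_env` and `my_script.py` are hypothetical names.

```python
import os


def numba_env(opt="max", base=None):
    """Return a copy of an environment mapping with NUMBA_OPT set.

    Hypothetical helper: `opt` is passed through as a string, so whether a
    given value is accepted depends on the installed Numba version.
    """
    env = dict(os.environ if base is None else base)
    env["NUMBA_OPT"] = str(opt)
    return env


env = numba_env("max", base={"PATH": "/usr/bin"})
print(env["NUMBA_OPT"])

# To apply it, run the workload in a child process so the variable is set
# before Numba is imported there, e.g.:
#   subprocess.run([sys.executable, "my_script.py"], env=numba_env("max"))
```

Using a child process keeps the experiment isolated from the current interpreter, which may already have imported Numba with a different setting.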

Input on what a "better" default would be is welcomed!

With regard to 2., here is a brief summary from prior discussions (this is from memory, so please do correct as needed).

Commonly described use cases:

- Users who are perhaps not explicitly concerned about a particular performance characteristic; they want something that is "reasonable" in terms of compilation and execution time by default.
- Users in HPC/high-performance situations where any compilation cost is accepted if it reduces run time, i.e. compilation time is dwarfed by run time.
- Users that are compilation-time constrained and know which functions are worth optimising, e.g. dynamic code generation situations or "interactive" applications.
- Users wanting to do incredibly fine-grained tuning of optimisation pipelines for some purpose.
- Users researching compilers who want fine-grained specification of the optimisation pipelines.

Previously discussed options (not mutually exclusive):

- Expose pre-canned defaults like O0/O1/O2/O3 etc.
- Permit use of a custom optimisation pipeline spelled via the LLVM pass names.
- Expose some more colloquial terms like "hot", "cold" etc. to govern the amount of effort Numba should put in to compiling a given function.
- Expose an option for "keep trying until the optimisation passes have no further meaningful effect".
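
The last option in the list above is essentially a fixed-point loop. A minimal sketch on a toy instruction list, assuming "no further meaningful effect" means "the output stopped changing"; the passes and `optimise_until_stable` are hypothetical and unrelated to real LLVM passes:

```python
def remove_nops(ops):
    """Toy pass: drop every "nop" instruction."""
    return [op for op in ops if op != "nop"]


def fold_double_neg(ops):
    """Toy pass: cancel adjacent "neg", "neg" pairs."""
    out = []
    for op in ops:
        if op == "neg" and out and out[-1] == "neg":
            out.pop()          # the pair cancels out
        else:
            out.append(op)
    return out


def optimise_until_stable(ops, passes, max_rounds=10):
    """Re-run `passes` in order until a round changes nothing.

    Returns the final instruction list and the number of rounds executed.
    `max_rounds` bounds compilation cost if the passes never settle.
    """
    current = list(ops)
    rounds = 0
    while rounds < max_rounds:
        before = current
        for run_pass in passes:
            current = run_pass(current)
        rounds += 1
        if current == before:
            break
    return current, rounds


program = ["neg", "nop", "neg", "load"]
passes = [fold_double_neg, remove_nops]
print(optimise_until_stable(program, passes, max_rounds=1))
print(optimise_until_stable(program, passes))
```

With a single round the two `neg` operations are still separated by the `nop` when folding runs, so they survive; a second round removes them. This mirrors, in miniature, why a subsequent -O3 pass can find optimisations the first one missed. A real implementation would likely need a cheaper or fuzzier stopping criterion than exact equality.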

Previously discussed method(s) of exposing the options (not mutually exclusive):

- Setting a new global default.
- Setting the option per-function as part of the @njit decoration options.
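
To illustrate how the two methods of exposure could combine with the "hot"/"cold" idea, here is a rough sketch in which a per-function setting overrides a global default. Everything in it is hypothetical: `jit_sketch`, the `opt` keyword, `DEFAULT_OPT` and the mapping of "hot"/"cold" to levels are made up for discussion and are not part of the @njit API.

```python
DEFAULT_OPT = 3  # hypothetical global default

# Hypothetical mapping from colloquial effort names to O-levels.
EFFORT_ALIASES = {"cold": 0, "hot": 3}


def resolve_opt(per_function=None, global_default=DEFAULT_OPT):
    """Pick the per-function value if given, else the global default."""
    choice = global_default if per_function is None else per_function
    if isinstance(choice, str):
        if choice not in EFFORT_ALIASES:
            raise ValueError(f"unknown effort name: {choice!r}")
        choice = EFFORT_ALIASES[choice]
    if choice not in (0, 1, 2, 3):
        raise ValueError(f"unsupported optimisation level: {choice!r}")
    return choice


def jit_sketch(opt=None):
    """Stand-in decorator that only records the resolved level on the function."""
    def decorator(func):
        func.resolved_opt = resolve_opt(opt)
        return func
    return decorator


@jit_sketch(opt="cold")
def rarely_called(x):
    return x + 1


@jit_sketch()
def uses_default(x):
    return x * 2


print(rarely_called.resolved_opt, uses_default.resolved_opt)
```

Resolving at decoration time keeps the precedence rule simple (function setting first, then global default), though a real design would also need to decide how a later change to the global default affects already-decorated functions.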

Input is welcomed on use cases, optimisation options, and their method of exposure into user space.


dlee992 · 2024-05-15T00:00:17Z

> Expose some more colloquial terms like "hot", "cold" etc to govern the amount
> of effort Numba should put in to compiling a given function.

> Setting the option per-function as part of the @njit decoration options.

At first glance, allowing control over the compilation effort spent on each function looks useful! That way we can fine-tune compilation speed while trying not to compromise runtime performance.

stuartarchibald added the feature_request and discussion labels Sep 9, 2022

stuartarchibald mentioned this issue Sep 9, 2022

50% performance drop from py3.8/numba 0.51.2 to py3.9/numba 0.55.1 #8398

Open

gmarkall mentioned this issue Sep 12, 2023

[Discuss] generates extra branches in LLVM IR 0.57.1 #9186

Closed

dlee992 mentioned this issue May 15, 2024

Gain more vectorization opportunities #9570

Closed
