Feature request: Expose optimisation configuration into user space. #8430

Open
stuartarchibald opened this issue Sep 9, 2022 · 1 comment
Labels
discussion An issue requiring discussion feature_request

Comments

@stuartarchibald
Contributor

Following on from numerous reports and at least two lengthy discussions at different times during the weekly Numba public meetings, this ticket is a meta-issue to promote and record discussion on the following:

  1. The "default" optimisation level Numba should use for compilation.
  2. How to expose increasingly fine-grained optimisation options to user space through
    the existing Numba APIs.

At present 1. is implemented approximately as follows:

  • Run a "cheap" optimisation pass with a view to inlining as much as possible.
    This is so as to expose as many Numba reference counting operations as
    possible to Numba's custom reference count pruning pass. This currently
    comprises:

    • Running something like -O0 with "loop rotation", "loop invariant code
      motion" and "CFG simplification" passes added. It turns out in practice
      that these are commonly needed to help transform the LLVM IR that Numba
      generates into something that will perform well under the "expensive"
      optimisation pass, particularly with respect to vectorisation.
    • Running the aforementioned reference count pruning pass.
  • Run an "expensive" optimisation pass, which is something like -O3 (cf.
    clang -O3), with loop and SLP vectorisation enabled.
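To make the shape of the current default concrete, here is a minimal Python model of the three stages described above. It only records stage names and pass names; it does not drive LLVM. The `Stage` class and `default_pipeline()` function are hypothetical illustrations, not Numba internals. The LLVM pass names are the standard ones for the listed transforms, and `refprune` is just a placeholder label for Numba's reference count pruning pass.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stage:
    """One stage in a hypothetical model of Numba's default optimisation pipeline."""
    name: str
    base_level: Optional[int]  # LLVM-style -O level the stage starts from; None if not applicable
    extra_passes: List[str] = field(default_factory=list)


def default_pipeline() -> List[Stage]:
    """Return the three stages, in order, as described in this issue."""
    return [
        # "Cheap": roughly -O0 plus passes that help the later vectorisers.
        Stage("cheap", 0, ["loop-rotate", "licm", "simplifycfg"]),
        # Numba's custom reference count pruning ("refprune" is a placeholder label).
        Stage("refcount-prune", None, ["refprune"]),
        # "Expensive": roughly clang -O3 with loop and SLP vectorisation enabled.
        Stage("expensive", 3, ["loop-vectorize", "slp-vectorizer"]),
    ]


if __name__ == "__main__":
    for stage in default_pipeline():
        level = "n/a" if stage.base_level is None else f"-O{stage.base_level}"
        print(f"{stage.name:15} {level:4} {', '.join(stage.extra_passes)}")
```

Keeping the stages as plain data like this makes the discussion below easier to frame: each proposed option amounts to changing a base level, the extra passes, or how many times the stages run.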

Historically (prior to Numba 0.55), what eventually became the "cheap" pass ran at -O3, and a less sophisticated reference count pruner was used (it could only analyse operations within a single basic block). Essentially, Numba ran the -O3 passes over the code twice!
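The limitation of the older, block-local pruner is easiest to see on a toy example. The sketch below assumes a made-up instruction format of `(opcode, value)` tuples and implements only the block-local idea: within a single basic block, an `incref` of a value followed later by a `decref` of the same value is cancelled. It deliberately ignores aliasing and whether intervening instructions depend on the count, which a real pruner must handle, and `prune_block` is a hypothetical name rather than Numba's API.

```python
from typing import List, Tuple

Op = Tuple[str, str]  # (opcode, value name), e.g. ("incref", "a")


def prune_block(ops: List[Op]) -> List[Op]:
    """Cancel incref/decref pairs on the same value within ONE basic block.

    Toy model: any op that is not incref/decref is treated as opaque and kept.
    Each incref is paired with the first later, still-unpaired decref of the
    same value; both are dropped. Aliasing and intervening uses are ignored.
    """
    removed = set()
    for i, (opcode, value) in enumerate(ops):
        if opcode != "incref":
            continue
        for j in range(i + 1, len(ops)):
            if j not in removed and ops[j] == ("decref", value):
                removed.update((i, j))
                break
    return [op for k, op in enumerate(ops) if k not in removed]
```

Because the function sees one block at a time, an `incref` at the end of one block and its matching `decref` at the start of a successor block both survive; that cross-block case is what the newer pruning pass was developed to analyse.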

The reason for the change between 0.54 and 0.55 was that a new pass for pruning Numba's reference count operations was developed. These reference count operations a) impact runtime performance and b) prevent certain classes of optimisation, so the strategy described above was adopted to remove as many Numba-specific reference counting operations as possible. Further, for a lot of code it was observed that running -O3 twice had little benefit: it often made a negligible difference to performance, but at a much increased compilation cost. This observation also informed the strategy above.

As has been noted in various open issues, there have been cases where a single -O3 pass missed optimisations that a subsequent -O3 pass then performed. See issues: #8398, #8172, #8314, #6547.

Input on what a "better" default would be is welcomed!


With regard to 2., here is a brief summary of prior discussions (this is from memory, so please do correct as needed).

Commonly described use cases:

  1. Users who are perhaps not explicitly concerned about any particular performance characteristic; they want something "reasonable" in terms of both compilation and execution time by default.
  2. Users in HPC/high performance situations where any compilation cost is accepted if it reduces runtime, i.e. compilation time is dwarfed by run time.
  3. Users that are compilation time constrained and know which functions are worth optimising. e.g. dynamic code generation situations/"interactive" applications.
  4. Users wanting to do very fine-grained tuning of optimisation pipelines for some purpose.
  5. Users researching compilers who want fine-grained specification of the optimisation pipelines.

Previously discussed options (not mutually exclusive):

  1. Expose pre-canned defaults like O0/O1/O2/O3 etc.
  2. Permit use of a custom optimisation pipeline spelled via the LLVM pass names.
  3. Expose some more colloquial terms like "hot", "cold", etc. to govern the amount
    of effort Numba should put into compiling a given function.
  4. Expose an option for "keep trying until the optimisation passes have no further meaningful effect".
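Option 4, and the observation that a second -O3 run can find optimisations the first one missed, both amount to iterating a pipeline to a fixed point. A minimal sketch, assuming a toy IR made of instruction strings and two made-up peephole passes (`cancel_push_pop` and `remove_nops`, which are neither LLVM nor Numba passes): the whole pipeline is rerun until the IR stops changing, with a hypothetical `max_rounds` cap bounding compilation time.

```python
from typing import Callable, List, Sequence, Tuple

IR = List[str]
Pass = Callable[[IR], IR]


def cancel_push_pop(ir: IR) -> IR:
    """Toy peephole: delete adjacent "push"/"pop" pairs, including pairs
    that become adjacent as inner pairs are removed."""
    out: IR = []
    for instr in ir:
        if instr == "pop" and out and out[-1] == "push":
            out.pop()  # the adjacent pair cancels out
        else:
            out.append(instr)
    return out


def remove_nops(ir: IR) -> IR:
    """Toy dead-code pass: drop "nop" instructions."""
    return [instr for instr in ir if instr != "nop"]


def run_to_fixpoint(ir: IR, passes: Sequence[Pass], max_rounds: int = 10) -> Tuple[IR, int]:
    """Rerun the whole pipeline until the IR stops changing or max_rounds is hit.

    Returns the final IR and the number of pipeline rounds actually run.
    """
    rounds = 0
    while rounds < max_rounds:
        before = ir
        for p in passes:
            ir = p(ir)
        rounds += 1
        if ir == before:  # no further effect: fixed point reached
            break
    return ir, rounds
```

With `["push", "nop", "pop"]` and `max_rounds=1` (analogous to a single -O3 run) the result is `["push", "pop"]`, because removing the `nop` only exposes the pair after `cancel_push_pop` has already run. Without the cap, the second round removes the pair and a third round is spent confirming nothing changed; that extra confirmation round is part of the compilation cost such an option would carry.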

Previously discussed method(s) of exposing the options (not mutually exclusive):

  1. Setting a new global default.
  2. Setting the option per-function as part of the @njit decoration options.
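One way the two exposure methods could fit together is for a per-function setting to take precedence over a global default. The sketch below is entirely hypothetical: `opt_hint`, `set_global_default`, `effective_level`, `normalise`, the `_opt_level` attribute and the mapping of "cold"/"hot" onto -O levels are illustrative assumptions, not existing Numba APIs. It accepts either pre-canned names such as `"O2"` or colloquial names, and normalises both to an integer level.

```python
from typing import Callable, TypeVar, Union

F = TypeVar("F", bound=Callable)
Level = Union[int, str]

# Assumed mapping from colloquial effort names to -O levels (illustrative only).
_COLLOQUIAL = {"cold": 1, "hot": 3}

# Hypothetical process-wide default used when a function has no hint.
_global_default = 3


def normalise(level: Level) -> int:
    """Map 0-3, "O0".."O3" or a colloquial name to an integer level."""
    if isinstance(level, int) and not isinstance(level, bool) and 0 <= level <= 3:
        return level
    if isinstance(level, str):
        if level in _COLLOQUIAL:
            return _COLLOQUIAL[level]
        if len(level) == 2 and level[0] in "Oo" and level[1] in "0123":
            return int(level[1])
    raise ValueError(f"unrecognised optimisation level: {level!r}")


def set_global_default(level: Level) -> None:
    """Change the fallback level used by functions without their own hint."""
    global _global_default
    _global_default = normalise(level)


def opt_hint(level: Level) -> Callable[[F], F]:
    """Decorator recording a per-function level (validated eagerly)."""
    resolved = normalise(level)

    def wrap(func: F) -> F:
        func._opt_level = resolved  # a compiler could consult this attribute
        return func

    return wrap


def effective_level(func: Callable) -> int:
    """Per-function hint wins; otherwise use the current global default."""
    return getattr(func, "_opt_level", _global_default)
```

For example, decorating a function with `@opt_hint("cold")` makes `effective_level` return 1 for it regardless of later `set_global_default` calls, while undecorated functions follow whatever the global default currently is.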

Input is welcomed on use cases, optimisation options, and their method of exposure into user space.

@dlee992
Contributor

dlee992 commented May 15, 2024

> Expose some more colloquial terms like "hot", "cold" etc to govern the amount
> of effort Numba should put in to compiling a given function.

> Setting the option per-function as part of the @njit decoration options.

At first thought, this feature of allowing control over the compilation effort for each function could be useful! It would let us fine-tune compilation speed while trying not to compromise runtime performance.
