
Split optimisation passes. #6335

Merged

Conversation

stuartarchibald
Contributor

This splits up the module-level optimisation passes as follows:

  1. Runs a cheap pass to inline across the module, in an attempt
     to bring as many refops into the same function as possible.
  2. Runs the reference count pruning pass.
  3. Runs the full optimisation suite; this should discover many
     more opportunities for optimisation as a result of the inlining
     and refop pruning.

Closes #5033
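The three-phase structure above can be sketched with a toy pipeline. This is a hypothetical illustration only: Numba's real implementation drives LLVM pass managers, and the op names and helper functions here are invented to show why the ordering (inline, then prune refops, then fully optimise) matters.

```python
# Toy sketch of the three-phase pipeline (hypothetical; op names and
# helpers are invented for illustration, not Numba's actual code).
# A "module" is a dict mapping function names to lists of ops.

def inline_calls(module):
    """Phase 1: cheap inlining -- splice callee bodies into callers so
    that matching incref/decref pairs end up in the same function."""
    out = {}
    for name, ops in module.items():
        body = []
        for op in ops:
            if op[0] == "call" and op[1] in module:
                body.extend(module[op[1]])  # splice in the callee's ops
            else:
                body.append(op)
        out[name] = body
    return out

def prune_refops(module):
    """Phase 2: drop balanced incref/decref pairs on the same value."""
    out = {}
    for name, ops in module.items():
        body = []
        for op in ops:
            if (op[0] == "decref" and body
                    and body[-1] == ("incref", op[1])):
                body.pop()  # incref directly followed by decref: drop both
            else:
                body.append(op)
        out[name] = body
    return out

def full_optimise(module):
    """Phase 3: stand-in for the full suite -- here it just removes
    no-ops left behind by the earlier phases."""
    return {name: [op for op in ops if op[0] != "nop"]
            for name, ops in module.items()}

# The caller's refops on "y" only become prunable once "callee" is
# inlined between them; the leftover nop then falls to phase 3.
module = {
    "callee": [("incref", "x"), ("decref", "x")],
    "caller": [("nop",), ("incref", "y"), ("call", "callee"),
               ("decref", "y")],
}
module = full_optimise(prune_refops(inline_calls(module)))
# both functions optimise away to empty bodies
```

Run in the opposite order, the prune would miss the caller's pair entirely, which is the motivation given in the description for inlining first.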

Comment on lines 1045 to 1046
self._mpm_cheap = self._module_pass_manager(loop_vectorize=False,
                                            opt=2)
Member


Just curious... why O2 and not O1?

Also, do we need to override the inlining_threshold? e.g. the cheap and full runs have different thresholds.

Contributor Author


Was thinking that producing more optimised code might let the refpruner run quicker and also permit more inlining if the complexity is reduced. Turns out in some checks @esc did that O2 massively increases compile time, whereas O1 increases it a small bit, with both cases leading to huge performance gains, so I think O1 is probably the way to go for now. RE the inlining threshold, I've been thinking lately that it'd be a good idea to put more of these "trade-off" options into the hands of users: some will want to optimise something as much as possible regardless of the compilation cost, others will want to optimise for short compilation times, and others may be in between!

Contributor Author


4cf27a9 moves to O1

@esc esc mentioned this pull request Oct 12, 2020
As title; results in runtime performance similar to O2 but at a much
cheaper compile-time cost.
@stuartarchibald stuartarchibald added 4 - Waiting on reviewer Waiting for reviewer to respond to author and removed 3 - Ready for Review labels Oct 12, 2020
@stuartarchibald
Contributor Author

Build fail is due to #6356

@sklam
Member

sklam commented Oct 13, 2020

@stuartarchibald, just some minor suggestions to simplify the code a little.

@stuartarchibald
Contributor Author

@stuartarchibald, just some minor suggestions to simplify the code a little.

Thanks, done in 449d0f5

Member

@sklam sklam left a comment


LGTM

@sklam sklam added 5 - Ready to merge Review and testing done, is ready to merge and removed 4 - Waiting on reviewer Waiting for reviewer to respond to author labels Oct 13, 2020
@stuartarchibald stuartarchibald merged commit 790373e into numba:master Oct 13, 2020
Successfully merging this pull request may close these issues.

SIMD vectorization failure with nditer