Split optimisation passes. #6335
This splits up the module level optimisation passes as follows:
1. Runs a cheap pass to inline across the module, in an attempt to bring as many refops into the same function as possible.
2. Runs the reference count pruning pass.
3. Runs the full optimisation suite, which should discover many more opportunities for optimisation as a result of the inline and refop prune.

Closes numba#5033
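To illustrate why the ordering of the three stages matters, here is a minimal toy sketch in plain Python. It is not Numba's or LLVM's implementation: the "module" is just a dict of instruction lists, and the names `inline_calls`, `prune_refops`, `full_optimise` and `run_pipeline` are hypothetical, invented for this example. It only shows that an incref/decref pair split across two functions becomes prunable once the callee is inlined.

```python
# Toy model (assumption): a module maps function names to lists of
# instruction strings such as "incref a", "decref a", "call f", "work".

def inline_calls(module, entry):
    """Stage 1 (cheap): replace each 'call f' with f's body.

    Assumes no recursion in the toy module.
    """
    out = []
    for op in module[entry]:
        kind, _, arg = op.partition(" ")
        if kind == "call":
            out.extend(inline_calls(module, arg))
        else:
            out.append(op)
    return out


def prune_refops(ops):
    """Stage 2: drop incref/decref pairs on the same variable
    that appear within a single function body."""
    pending = {}  # variable -> indices of increfs not yet matched
    drop = set()
    for i, op in enumerate(ops):
        kind, _, var = op.partition(" ")
        if kind == "incref":
            pending.setdefault(var, []).append(i)
        elif kind == "decref" and pending.get(var):
            drop.add(pending[var].pop())
            drop.add(i)
    return [op for i, op in enumerate(ops) if i not in drop]


def full_optimise(ops):
    """Stage 3: stand-in for the full optimisation suite (identity here)."""
    return list(ops)


def run_pipeline(module, entry):
    return full_optimise(prune_refops(inline_calls(module, entry)))


module = {
    "main": ["incref a", "call helper", "work"],
    "helper": ["work", "decref a"],
}

# Without inlining, neither function contains a matching pair.
print(prune_refops(module["main"]))
print(prune_refops(module["helper"]))
# With inlining first, the pair lands in one body and is removed.
print(run_pipeline(module, "main"))
```

In this toy, pruning each function separately removes nothing, whereas running the inline stage first lets the prune stage eliminate both refops, leaving less work for the final stage.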
numba/core/codegen.py
Outdated
self._mpm_cheap = self._module_pass_manager(loop_vectorize=False,
                                            opt=2)
Just curious... why O2 and not O1?
Also, do we need to override the inlining_threshold, e.g. so that the cheap and full runs have different thresholds?
I was thinking that producing more optimised code might let the refpruner run quicker and also permit more inlining if the complexity is reduced. It turns out, from some checks @esc did, that O2 massively increases compile time, whereas O1 increases it a small bit, with both cases leading to huge performance gains, so I think O1 is probably the way to go for now.
RE inline threshold, I've been thinking lately that it'd be a good idea to put more of these "trade-off" options into the hands of users: some will want to optimise something as much as possible regardless of the compilation cost, others will want to optimise for short compilation times, and others may be in between!
4cf27a9 moves to O1
As title; this results in similar runtime performance to O2 but at a much cheaper compile time cost.
The build failure is due to #6356.
@stuartarchibald, just some minor suggestions to simplify the code a little.
Thanks, done in 449d0f5
LGTM