Performance hit with local temporary variables #3980
Comments
It looks like the array-expression fusion is not working across assignments.
After reviewing the implementation of the array-expression rewrite, I noticed it is designed to work on a single basic block at a time. It cannot fuse across basic blocks and cannot determine whether a temporary variable (i.e. …). On the other hand, the wanted optimization is already provided by the ParallelAccelerator suite of optimizations. By enabling it (i.e. …). @stuartarchibald, @ehsantn, is there a way to enable the loop fusion using sequential lowering? Maybe we can make that the default behavior.
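For readers unfamiliar with the term: the optimization being discussed is loop fusion. A minimal pure-NumPy sketch of what fusion buys (the names `f2_unfused` and `f2_fused` are mine, not from the thread):

```python
import numpy as np

def f2_unfused(x):
    # As written by the reporter: each line materializes a full temporary
    # array, so the data is traversed three times.
    s = np.sin(x)
    c = np.cos(x)
    return s ** 2 + c ** 2

def f2_fused(x):
    # What loop fusion would generate: one pass, no array temporaries.
    # (Slow in pure Python; Numba compiles this pattern to fast code.)
    out = np.empty_like(x)
    for i in range(x.size):
        out[i] = np.sin(x[i]) ** 2 + np.cos(x[i]) ** 2
    return out

a = np.arange(1.e4)
assert np.allclose(f2_unfused(a), f2_fused(a))  # both equal 1 everywhere
```

The fused form avoids allocating the `s` and `c` arrays entirely, which is exactly what the single-expression version already gets from the array-expression rewrite.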
Somewhat suspected this was the case, thanks for confirming. This is not something for production use, but yes, sequential lowering will work, I think, as the fusion passes will be run:

```python
import numpy as np
import numba
import numexpr

from IPython import get_ipython
ipython = get_ipython()

def f1(x):
    return np.sin(x) ** 2 + np.cos(x) ** 2

def f2(x):
    s = np.sin(x)
    c = np.cos(x)
    return s ** 2 + c ** 2

a = np.arange(1.e4)

from numba import parfor
parfor.sequential_parfor_lowering = True

f2_numba = numba.njit(parallel=True)(f2)
f2_numba(a)
print(f2_numba.parallel_diagnostics(level=4))
```

this gives the diagnostic:
and then inspecting the IR post legalization gives:
block …
It looks like we should consider moving …
I think if you can apply this optimisation by default with …
Why not do this optimization by default?
Thank you for all your work on Numba, it is wonderful!
Yes, I think this should be a jit option. ParallelAccelerator's optimizations, including fusion, are quite useful in general even without the threading backend.
It looks like Numba cannot optimise simple functions well if they contain local temporary variables on multiple lines, instead of a long expression on a single line.
With the latest Numba 0.43.1 on macOS from Anaconda and the `sin(x) ** 2 + cos(x) ** 2` example from http://numba.pydata.org/numba-doc/latest/user/performance-tips.html I see this:

Why is `f2` a bit slower than `f1`? I thought NumPy creates temp variables and it doesn't matter how I write my code. Why is `f2_numba` so much slower than `f1_numba`? Shouldn't both compile to the same function and give identical performance?

I couldn't find any information on http://numba.pydata.org/numba-doc/latest/user/performance-tips.html or anywhere else in the Numba docs on whether using local temporary variables is OK when one wants good performance.
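The timing output from the original report did not survive extraction; a minimal pure-NumPy reconstruction of the `f1` vs `f2` comparison (the Numba-compiled variants are left out to keep this self-contained) might look like:

```python
import timeit
import numpy as np

def f1(x):
    # single expression: eligible for the array-expression rewrite
    return np.sin(x) ** 2 + np.cos(x) ** 2

def f2(x):
    # same math, but split across local temporaries
    s = np.sin(x)
    c = np.cos(x)
    return s ** 2 + c ** 2

a = np.arange(1.e4)

# Both compute an array of ones; f2 allocates two extra temporary arrays.
assert np.allclose(f1(a), f2(a))

t1 = timeit.timeit(lambda: f1(a), number=200)
t2 = timeit.timeit(lambda: f2(a), number=200)
print(f"f1: {t1:.4f}s  f2: {t2:.4f}s")
```

In plain NumPy the gap between the two is small for arrays of this size; the much larger gap the report describes appears only once Numba's array-expression fusion applies to `f1` but not to `f2`.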
Could you please add some documentation giving advice / information on this point?
Or - if possible - improve Numba to make this question void by generating the same good performance one gets from a single expression?