-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposed changes to enable vectorization for parfors. #2709
Conversation
…the dimensionality of the input array. If neighborhood not specified, then calculate it only once and store it in StencilFunc and can then be queried through neighborhood attribute.
…run this example without it though.
…e img object when done.
…ing section that stencil decorator returns StencilFunc type and that that contains a neighborhood attribute.
…ction about the various checks performed and exceptions raised.
… have access to attributes. Added a check to stencil decorator to check for unknown stencil options. Added support for standard_indexing option.
… memory layout at runtime
and fixes for stride 1
…axis have 1s or 0s dims
in case of any layout array.
Enh/ascontiguousarray
# Conflicts: # numba/targets/arrayobj.py
1) C layout array can also be F contiguous. Need to check this in the is_contig checks 2) Numba is now smarter about contiguousness. Update test to reflect that.
Lots of travis problems. Think I may know the reason but if I'm correct the solution would also prevent vectorization. I will investigate. |
So, it seems that with "noalias nocapture" on parfor gufunc arrays, the LLVM LoopVectorizer will report that vectorization is possible but not profitable for the pseudo-blackscholes with transcendentals removed. I'm not an expert on analyzing this output but the LAA: Bad stride looks a likely culprit. I did some experimenting and we get these same messages whether ascontiguousarray is used, not used, or used only for input params. So, I'm not sure it is doing for us what we thought it should. LV: Checking a loop in "_ZN5numba8npyufunc6parfor39__numba_parfor_gufunc_0x7f162ddc471$242E5ArrayIxLi1E1C7mutable7alignedE5ArrayIdLi1E1C7mutable7alignedE5ArrayIdLi1E1C7mutable7alignedE5ArrayIdLi1E1C7mutable7alignedE5ArrayIdLi1E1C7mutable7alignedE5ArrayIdLi1E1C7mutable7alignedE5ArrayIdLi1E1C7mutable7alignedE" from gufunc_no_sqrt.ll LAA: Accesses(6): |
I wonder if turning on fastmath will help vectorize the code. |
When I did these tests, I did put fastmath on for the main function. At that time I wondered whether that flag would be carried through to where the gufunc was compiled. Later, I think I did some printing of the flags at the gufunc compilation point and I believe that fastmath was not defined there. So, I could add it back in there and give it a try. |
Some comments on that LLVM opt output from an expert. The -debug dump below details the estimated cost of the original scalar loop body “For VF=1”, vs. the estimated cost of vectorizing “For VF=2”, which clearly favors the scalar version: 136 < 180. |
…ot check for negative array indices which means that LLVM can figure out it is unit stride and vectorize.
…y aliases are found.
Codecov Report
@@ Coverage Diff @@
## master #2709 +/- ##
==========================================
+ Coverage 86.23% 86.25% +0.01%
==========================================
Files 321 323 +2
Lines 66192 66591 +399
Branches 7378 7426 +48
==========================================
+ Hits 57081 57436 +355
- Misses 7932 7962 +30
- Partials 1179 1193 +14 |
I have confirmed with these changes that a version of blackscholes with transcendentals removed will vectorize with these changes. There are probably many options to transmit this noalias flag to the relevant spot. Please review and if you want a different mechanism let's discuss.