Missed SLP vectorization #49277
Comments
OK, with typedef unsigned short uint8_t; LLVM produces good codegen, so this looks like a cost model issue for (U)INT8.
I'm not sure it's purely a cost model issue; it may also be related to 8 x uint8_t (64 bits) being smaller than the target's 128-bit minimum vector width.
But according to llvm-mca, ICC's codegen is much better: Block RThroughput is 4 for ICC versus 9.8 for LLVM.
With -mllvm -slp-min-reg-size=64 we get this nice codegen:
Block RThroughput: 1.8
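To make the width argument concrete, here is a minimal sketch of the kind of straight-line code the SLP vectorizer targets. It is a hypothetical kernel, not the reporter's code (that is only available through the Godbolt link), and the name add8_u8 is made up for illustration. The eight statements together touch 8 x 8 = 64 bits per operand, which is below the 128-bit minimum mentioned above.

```c
#include <stdint.h>

/* Hypothetical example, not the code from the report.
 * Eight independent 8-bit additions: 8 x 8 = 64 bits of data per operand,
 * i.e. half of a 128-bit vector register. The body is unrolled by hand so
 * that any vectorization would have to come from SLP, not the loop
 * vectorizer. */
void add8_u8(uint8_t *restrict dst,
             const uint8_t *restrict a,
             const uint8_t *restrict b)
{
    /* Each sum is computed in int and truncated back to 8 bits,
     * so results wrap modulo 256. */
    dst[0] = (uint8_t)(a[0] + b[0]);
    dst[1] = (uint8_t)(a[1] + b[1]);
    dst[2] = (uint8_t)(a[2] + b[2]);
    dst[3] = (uint8_t)(a[3] + b[3]);
    dst[4] = (uint8_t)(a[4] + b[4]);
    dst[5] = (uint8_t)(a[5] + b[5]);
    dst[6] = (uint8_t)(a[6] + b[6]);
    dst[7] = (uint8_t)(a[7] + b[7]);
}
```

One way to experiment is to compile such a function with clang at -O3 for an AVX2 target, once without and once with -mllvm -slp-min-reg-size=64, and compare the assembly in llvm-mca. Whether a given LLVM version vectorizes this particular sketch depends on its cost model, so no specific output is claimed here.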
The best way to fix this bug is to wait for this patch to land: https://reviews.llvm.org/D57059 ("non-power-of-2 vectors"). I've checked that it works better:
Although this vectorized codegen is different compared to This difference comes from using
Added test to track issue: https://reviews.llvm.org/rGdd028c359e09
mentioned in issue llvm/llvm-bugzilla-archive#49934
Fixed in 9dc4ced
Extended Description
ICC:
LLVM does not vectorize it with AVX/AVX2/AVX-512. Is this a cost model issue?
https://godbolt.org/z/Kheeec4cG