[X86] Prefer trunc(reduce(x)) over reduce(trunc(x)) #81469
@llvm/issue-subscribers-backend-x86 Author: Simon Pilgrim (RKSimon)
Reported here: https://discourse.llvm.org/t/avx2-popcount-regression/76926
#include <stdint.h>

int popcount8(uint64_t data[8]) {
  int count = 0;
  for (int i = 0; i < 8; ++i)
    count += __builtin_popcountll(data[i]);
  return count;
}

define i32 @popcount8(ptr %data) {
entry:
  %0 = load <8 x i64>, ptr %data, align 8
  %1 = tail call <8 x i64> @llvm.ctpop.v8i64(<8 x i64> %0)
  %2 = trunc <8 x i64> %1 to <8 x i32>
  %3 = tail call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> %2)
  ret i32 %3
}
declare <8 x i64> @llvm.ctpop.v8i64(<8 x i64>)
declare i32 @llvm.vector.reduce.add.v8i32(<8 x i32>)

We can avoid the vector truncation, replacing it with a free scalar truncation, if we perform the reduction on the v8i64:

define i32 @popcount8(ptr %data) {
entry:
  %0 = load <8 x i64>, ptr %data, align 8
  %1 = tail call <8 x i64> @llvm.ctpop.v8i64(<8 x i64> %0)
  %2 = tail call i64 @llvm.vector.reduce.add.v8i64(<8 x i64> %1)
  %3 = trunc i64 %2 to i32
  ret i32 %3
}
declare <8 x i64> @llvm.ctpop.v8i64(<8 x i64>)
declare i64 @llvm.vector.reduce.add.v8i64(<8 x i64>)

Godbolt: https://simd.godbolt.org/z/ooK497x7s

We might be best off attempting this in vector-combine.

Alive2: https://alive2.llvm.org/ce/z/phx0Lp

AFAICT we can do this for add/mul/and/or/xor reductions.
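For intuition on why these five reductions qualify: truncation keeps only the low bits, and the low 32 bits of a wrapping add, mul, and, or, or xor depend only on the low 32 bits of the operands. The following is a minimal C sketch that spot-checks this on pseudo-random 8-lane inputs. It is not a proof (the Alive2 link is the formal check), and the names `red_op`, `reduce_of_trunc`, `trunc_of_reduce`, and `count_mismatches` are made up for illustration; they are not LLVM APIs.

```c
#include <stdint.h>

#define LANES 8

/* The five reduction kinds named in the issue. */
typedef enum { OP_ADD, OP_MUL, OP_AND, OP_OR, OP_XOR } red_op;

/* One reduction step in 64-bit arithmetic (wraps modulo 2^64). */
static uint64_t step64(red_op op, uint64_t acc, uint64_t v) {
  switch (op) {
  case OP_ADD: return acc + v;
  case OP_MUL: return acc * v;
  case OP_AND: return acc & v;
  case OP_OR:  return acc | v;
  default:     return acc ^ v;
  }
}

/* One reduction step in 32-bit arithmetic (wraps modulo 2^32). */
static uint32_t step32(red_op op, uint32_t acc, uint32_t v) {
  switch (op) {
  case OP_ADD: return acc + v;
  case OP_MUL: return acc * v;
  case OP_AND: return acc & v;
  case OP_OR:  return acc | v;
  default:     return acc ^ v;
  }
}

/* reduce(trunc(x)): truncate every lane to 32 bits, then reduce. */
static uint32_t reduce_of_trunc(red_op op, const uint64_t x[LANES]) {
  uint32_t acc = (uint32_t)x[0];
  for (int i = 1; i < LANES; ++i)
    acc = step32(op, acc, (uint32_t)x[i]);
  return acc;
}

/* trunc(reduce(x)): reduce in 64 bits, then truncate the scalar once. */
static uint32_t trunc_of_reduce(red_op op, const uint64_t x[LANES]) {
  uint64_t acc = x[0];
  for (int i = 1; i < LANES; ++i)
    acc = step64(op, acc, x[i]);
  return (uint32_t)acc;
}

/* xorshift64 generator, used only to produce deterministic test inputs. */
static uint64_t next_rand(uint64_t *state) {
  uint64_t x = *state;
  x ^= x << 13;
  x ^= x >> 7;
  x ^= x << 17;
  return *state = x;
}

/* Counts inputs where the two orderings disagree, over all five ops. */
static int count_mismatches(int trials) {
  uint64_t state = 0x9E3779B97F4A7C15ULL;
  int mismatches = 0;
  for (int t = 0; t < trials; ++t) {
    uint64_t x[LANES];
    for (int i = 0; i < LANES; ++i)
      x[i] = next_rand(&state);
    for (int op = OP_ADD; op <= OP_XOR; ++op)
      if (reduce_of_trunc((red_op)op, x) != trunc_of_reduce((red_op)op, x))
        ++mismatches;
  }
  return mismatches;
}
```

Calling `count_mismatches(1000)` from a small `main` should return 0. Reductions such as min/max are deliberately left out of the sketch, since they compare whole values and are not in the list given above.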
RKSimon added a commit to RKSimon/llvm-project that referenced this issue on Feb 15, 2024:
…fective
Vector truncations can be pretty expensive, especially on X86, whilst scalar truncations are often free. If the cost of performing the add/mul/and/or/xor reduction is cheap enough on the pre-truncated type, then avoid the vector truncation entirely.
Fixes llvm#81469
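The "cheap enough" condition in the commit message can be read as a cost comparison between the two forms. The following C sketch shows one way such a decision might look. The struct `reduction_costs`, the function `prefer_trunc_of_reduce`, and any cost values passed to it are hypothetical placeholders, not the patch's code or numbers from LLVM's cost model.

```c
#include <stdbool.h>

/* Hypothetical per-target costs; a real cost model supplies these. */
typedef struct {
  unsigned vector_trunc;  /* e.g. trunc <8 x i64> to <8 x i32> */
  unsigned narrow_reduce; /* reduction on the truncated type, e.g. <8 x i32> */
  unsigned wide_reduce;   /* reduction on the pre-truncated type, e.g. <8 x i64> */
  unsigned scalar_trunc;  /* e.g. trunc i64 to i32, often free */
} reduction_costs;

/* Returns true when trunc(reduce(x)) is strictly cheaper than reduce(trunc(x)). */
static bool prefer_trunc_of_reduce(reduction_costs c) {
  unsigned old_cost = c.vector_trunc + c.narrow_reduce; /* reduce(trunc(x)) */
  unsigned new_cost = c.wide_reduce + c.scalar_trunc;   /* trunc(reduce(x)) */
  return new_cost < old_cost;
}
```

With made-up costs of 4 for the vector truncation, 3 for the narrow reduction, 4 for the wide reduction, and 0 for the scalar truncation, the function returns true. If the wide reduction were much more expensive than the narrow one, it would return false and the original form would be kept.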
RKSimon added two further commits with the same message to RKSimon/llvm-project that referenced this issue, on Feb 16, 2024 and Feb 19, 2024.