-
Notifications
You must be signed in to change notification settings - Fork 10.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Instruction pmaxsw not generated #55013
Comments
@llvm/issue-subscribers-backend-x86 |
Regressed with new intrinsics? |
Godbolt: https://godbolt.org/z/7vxqbMzaa This looks to be generating pmaxsw instructions to me? |
Here is what I was seeing: https://godbolt.org/z/hxz1xhMPh |
hm, lul. Difference is whether we are in C or C++ mode. |
Interestingly, Clang in C mode creates phi i32, in C++ phi i16. Caused by some language rules, @AaronBallman ? |
@llvm/issue-subscribers-clang-codegen |
Independent of the front-end question, there seems to be an optimizer enhancement opportunity. With the phi i32 (seen in the 'C' source IR example), both incoming values are sexts initially, but -ipsccp manages to convert one into a zext. This patch made ipsccp smarter, but I'm not sure if that's where the difference on this example shows up (might be in some underlying analysis too): Instcombine / ValueTracking then can't reconcile the difference when used in an icmp, and so we miss recognizing the min/max pattern. So we either need to make ipsccp smarter again or fix ValueTracking (isKnownNonNegative) + instcombine...or both. There's a potentially-related ValueTracking improvement proposed here: Ie, maybe we can use matchSimpleRecurrence to recognize that a phi value is never negative. |
If a value is known to be non-negative and zexted, that's the same thing as sexted. So for the purpose of looking past the casts with an icmp, treat it as if it was a sext: https://alive2.llvm.org/ce/z/_BDsGV This is necessary, but not enough to solve the motivating problem: #55013 Differential Revision: https://reviews.llvm.org/D124419
Strangely enough, changing the |
This PR fixes #55013 : the max intrinsics is not generated for this simple loop case : https://godbolt.org/z/hxz1xhMPh. This is caused by a ICMP not being folded into a select, thus not generating the max intrinsics. For the story : Since LLVM 14, SCCP pass got smarter by folding sext into zext for positive ranges : https://reviews.llvm.org/D81756. After this change, InstCombine was sometimes unable to fold ICMP correctly as both of the arguments pointed to mismatched zext/sext. To fix this, @rotateright implemented this fix : https://reviews.llvm.org/D124419 that tries to resolve the mismatch by knowing if the argument of a zext is positive (in which case, it is like a sext) by using ValueTracking, however ValueTracking is not smart enough to infer that the value is positive in some cases. Recently, @nikic implemented #67982 which keeps the information that a zext is non-negative. This PR simply uses this information to do the folding accordingly. TLDR : This PR uses the recent nneg tag on zext to fold the icmp accordingly in instcombine. This PR also contains test cases for sext/zext folding with InstCombine as well as a x86 regression tests for the max/min case.
This PR fixes llvm#55013 : the max intrinsics is not generated for this simple loop case : https://godbolt.org/z/hxz1xhMPh. This is caused by a ICMP not being folded into a select, thus not generating the max intrinsics. For the story : Since LLVM 14, SCCP pass got smarter by folding sext into zext for positive ranges : https://reviews.llvm.org/D81756. After this change, InstCombine was sometimes unable to fold ICMP correctly as both of the arguments pointed to mismatched zext/sext. To fix this, @rotateright implemented this fix : https://reviews.llvm.org/D124419 that tries to resolve the mismatch by knowing if the argument of a zext is positive (in which case, it is like a sext) by using ValueTracking, however ValueTracking is not smart enough to infer that the value is positive in some cases. Recently, @nikic implemented llvm#67982 which keeps the information that a zext is non-negative. This PR simply uses this information to do the folding accordingly. TLDR : This PR uses the recent nneg tag on zext to fold the icmp accordingly in instcombine. This PR also contains test cases for sext/zext folding with InstCombine as well as a x86 regression tests for the max/min case.
The code generated by 'x86-64 clang 13.0.1' in /O3 will generate pmaxsw instructions. With 'x86-64 clang 14.0.0' and 'x86-64 clang (trunk)', the pmaxw is not generated, and the code size is slightly bigger.
The text was updated successfully, but these errors were encountered: