Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Instruction pmaxsw not generated #55013

Closed
mariannems opened this issue Apr 21, 2022 · 10 comments · Fixed by #70845
Closed

Instruction pmaxsw not generated #55013

mariannems opened this issue Apr 21, 2022 · 10 comments · Fixed by #70845

Comments

@mariannems
Copy link
Contributor

mariannems commented Apr 21, 2022

The code generated by 'x86-64 clang 13.0.1' in /O3 will generate pmaxsw instructions. With 'x86-64 clang 14.0.0' and 'x86-64 clang (trunk)', the pmaxw is not generated, and the code size is slightly bigger.

short vecreduce_smax_v2i16(int n, short* v)
{
  short p = 0;
  for (int i = 0; i < n; ++i)
    p = p < v[i] ? v[i] : p;
  return p;
}
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 21, 2022

@llvm/issue-subscribers-backend-x86

@davidbolvansky
Copy link
Collaborator

Regressed with new intrinsics?

@rotateright @nikic

@RKSimon
Copy link
Collaborator

RKSimon commented Apr 21, 2022

Godbolt: https://godbolt.org/z/7vxqbMzaa

This looks to be generating pmaxsw instructions to me?

@mariannems
Copy link
Contributor Author

Here is what I was seeing: https://godbolt.org/z/hxz1xhMPh

@davidbolvansky
Copy link
Collaborator

hm, lul. Difference is whether we are in C or C++ mode.

https://godbolt.org/z/8zeqzoff9

@davidbolvansky
Copy link
Collaborator

Interestingly, Clang in C mode creates phi i32, in C++ phi i16.

Caused by some language rules, @AaronBallman ?

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 22, 2022

@llvm/issue-subscribers-clang-codegen

@rotateright
Copy link
Contributor

Independent of the front-end question, there seems to be an optimizer enhancement opportunity. With the phi i32 (seen in the 'C' source IR example), both incoming values are sexts initially, but -ipsccp manages to convert one into a zext.

This patch made ipsccp smarter, but I'm not sure if that's where the difference on this example shows up (might be in some underlying analysis too):
https://reviews.llvm.org/D81756

Instcombine / ValueTracking then can't reconcile the difference when used in an icmp, and so we miss recognizing the min/max pattern.

So we either need to make ipsccp smarter again or fix ValueTracking (isKnownNonNegative) + instcombine...or both.

There's a potentially-related ValueTracking improvement proposed here:
https://reviews.llvm.org/D123748

Ie, maybe we can use matchSimpleRecurrence to recognize that a phi value is never negative.

@rotateright
Copy link
Contributor

rotateright added a commit that referenced this issue Apr 26, 2022
If a value is known to be non-negative and zexted,
that's the same thing as sexted.

So for the purpose of looking past the casts with
an icmp, treat it as if it was a sext:
https://alive2.llvm.org/ce/z/_BDsGV

This is necessary, but not enough to solve the
motivating problem:
#55013

Differential Revision: https://reviews.llvm.org/D124419
@leo-ard
Copy link
Contributor

leo-ard commented Sep 15, 2023

Strangely enough, changing the < into a > in the test unrolls the loop correctly with calls to pminsw... It seems to be a regression of the 13.0.1 version: https://godbolt.org/z/cPzWjT86T

dtcxzyw pushed a commit that referenced this issue Nov 12, 2023
This PR fixes #55013 : the
max intrinsics is not generated for this simple loop case :
https://godbolt.org/z/hxz1xhMPh. This is caused by a ICMP not being
folded into a select, thus not generating the max intrinsics.

For the story :

Since LLVM 14, SCCP pass got smarter by folding sext into zext for
positive ranges : https://reviews.llvm.org/D81756. After this change,
InstCombine was sometimes unable to fold ICMP correctly as both of the
arguments pointed to mismatched zext/sext. To fix this, @rotateright
implemented this fix : https://reviews.llvm.org/D124419 that tries to
resolve the mismatch by knowing if the argument of a zext is positive
(in which case, it is like a sext) by using ValueTracking, however
ValueTracking is not smart enough to infer that the value is positive in
some cases. Recently, @nikic implemented #67982 which keeps the
information that a zext is non-negative. This PR simply uses this
information to do the folding accordingly.

TLDR : This PR uses the recent nneg tag on zext to fold the icmp
accordingly in instcombine.

This PR also contains test cases for sext/zext folding with InstCombine
as well as a x86 regression tests for the max/min case.
zahiraam pushed a commit to zahiraam/llvm-project that referenced this issue Nov 20, 2023
This PR fixes llvm#55013 : the
max intrinsics is not generated for this simple loop case :
https://godbolt.org/z/hxz1xhMPh. This is caused by a ICMP not being
folded into a select, thus not generating the max intrinsics.

For the story :

Since LLVM 14, SCCP pass got smarter by folding sext into zext for
positive ranges : https://reviews.llvm.org/D81756. After this change,
InstCombine was sometimes unable to fold ICMP correctly as both of the
arguments pointed to mismatched zext/sext. To fix this, @rotateright
implemented this fix : https://reviews.llvm.org/D124419 that tries to
resolve the mismatch by knowing if the argument of a zext is positive
(in which case, it is like a sext) by using ValueTracking, however
ValueTracking is not smart enough to infer that the value is positive in
some cases. Recently, @nikic implemented llvm#67982 which keeps the
information that a zext is non-negative. This PR simply uses this
information to do the folding accordingly.

TLDR : This PR uses the recent nneg tag on zext to fold the icmp
accordingly in instcombine.

This PR also contains test cases for sext/zext folding with InstCombine
as well as a x86 regression tests for the max/min case.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants