Instruction pmaxsw not generated #55013

mariannems · 2022-04-21T13:25:40Z

The code generated by 'x86-64 clang 13.0.1' in /O3 will generate pmaxsw instructions. With 'x86-64 clang 14.0.0' and 'x86-64 clang (trunk)', the pmaxw is not generated, and the code size is slightly bigger.

short vecreduce_smax_v2i16(int n, short* v)
{
  short p = 0;
  for (int i = 0; i < n; ++i)
    p = p < v[i] ? v[i] : p;
  return p;
}

llvmbot · 2022-04-21T14:06:42Z

@llvm/issue-subscribers-backend-x86

davidbolvansky · 2022-04-21T16:06:32Z

Regressed with new intrinsics?

@rotateright @nikic

RKSimon · 2022-04-21T16:12:38Z

Godbolt: https://godbolt.org/z/7vxqbMzaa

This looks to be generating pmaxsw instructions to me?

mariannems · 2022-04-21T16:40:22Z

Here is what I was seeing: https://godbolt.org/z/hxz1xhMPh

davidbolvansky · 2022-04-21T17:25:25Z

hm, lul. Difference is whether we are in C or C++ mode.

https://godbolt.org/z/8zeqzoff9

davidbolvansky · 2022-04-22T20:08:40Z

Interestingly, Clang in C mode creates phi i32, in C++ phi i16.

Caused by some language rules, @AaronBallman ?

llvmbot · 2022-04-22T20:59:19Z

@llvm/issue-subscribers-clang-codegen

rotateright · 2022-04-25T15:17:40Z

Independent of the front-end question, there seems to be an optimizer enhancement opportunity. With the phi i32 (seen in the 'C' source IR example), both incoming values are sexts initially, but -ipsccp manages to convert one into a zext.

This patch made ipsccp smarter, but I'm not sure if that's where the difference on this example shows up (might be in some underlying analysis too):
https://reviews.llvm.org/D81756

Instcombine / ValueTracking then can't reconcile the difference when used in an icmp, and so we miss recognizing the min/max pattern.

So we either need to make ipsccp smarter again or fix ValueTracking (isKnownNonNegative) + instcombine...or both.

There's a potentially-related ValueTracking improvement proposed here:
https://reviews.llvm.org/D123748

Ie, maybe we can use matchSimpleRecurrence to recognize that a phi value is never negative.

rotateright · 2022-04-25T21:41:30Z

https://reviews.llvm.org/D124419

If a value is known to be non-negative and zexted, that's the same thing as sexted. So for the purpose of looking past the casts with an icmp, treat it as if it was a sext: https://alive2.llvm.org/ce/z/_BDsGV This is necessary, but not enough to solve the motivating problem: #55013 Differential Revision: https://reviews.llvm.org/D124419

leo-ard · 2023-09-15T18:50:13Z

Strangely enough, changing the < into a > in the test unrolls the loop correctly with calls to pminsw... It seems to be a regression of the 13.0.1 version: https://godbolt.org/z/cPzWjT86T

@rotateright

This PR fixes #55013 : the max intrinsics is not generated for this simple loop case : https://godbolt.org/z/hxz1xhMPh. This is caused by a ICMP not being folded into a select, thus not generating the max intrinsics. For the story : Since LLVM 14, SCCP pass got smarter by folding sext into zext for positive ranges : https://reviews.llvm.org/D81756. After this change, InstCombine was sometimes unable to fold ICMP correctly as both of the arguments pointed to mismatched zext/sext. To fix this, @rotateright implemented this fix : https://reviews.llvm.org/D124419 that tries to resolve the mismatch by knowing if the argument of a zext is positive (in which case, it is like a sext) by using ValueTracking, however ValueTracking is not smart enough to infer that the value is positive in some cases. Recently, @nikic implemented #67982 which keeps the information that a zext is non-negative. This PR simply uses this information to do the folding accordingly. TLDR : This PR uses the recent nneg tag on zext to fold the icmp accordingly in instcombine. This PR also contains test cases for sext/zext folding with InstCombine as well as a x86 regression tests for the max/min case.

@rotateright

This PR fixes llvm#55013 : the max intrinsics is not generated for this simple loop case : https://godbolt.org/z/hxz1xhMPh. This is caused by a ICMP not being folded into a select, thus not generating the max intrinsics. For the story : Since LLVM 14, SCCP pass got smarter by folding sext into zext for positive ranges : https://reviews.llvm.org/D81756. After this change, InstCombine was sometimes unable to fold ICMP correctly as both of the arguments pointed to mismatched zext/sext. To fix this, @rotateright implemented this fix : https://reviews.llvm.org/D124419 that tries to resolve the mismatch by knowing if the argument of a zext is positive (in which case, it is like a sext) by using ValueTracking, however ValueTracking is not smart enough to infer that the value is positive in some cases. Recently, @nikic implemented llvm#67982 which keeps the information that a zext is non-negative. This PR simply uses this information to do the folding accordingly. TLDR : This PR uses the recent nneg tag on zext to fold the icmp accordingly in instcombine. This PR also contains test cases for sext/zext folding with InstCombine as well as a x86 regression tests for the max/min case.

github-actions bot added the new issue label Apr 21, 2022

EugeneZelenko added backend:X86 and removed new issue labels Apr 21, 2022

RKSimon added vectorization llvm:SLPVectorizer llvm:optimizations and removed backend:X86 labels Apr 22, 2022

RKSimon added the clang:codegen label Apr 22, 2022

leo-ard mentioned this issue Sep 27, 2023

[SCCP] [Transform] Adding ICMP folding for zext and sext in SCCPSolver #67594

Closed

leo-ard mentioned this issue Oct 31, 2023

[InstCombine] Use zext's nneg flag for icmp folding #70845

Merged

dtcxzyw closed this as completed in #70845 Nov 12, 2023

EugeneZelenko added llvm:instcombine and removed clang:codegen vectorization llvm:SLPVectorizer llvm:optimizations labels Nov 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Instruction pmaxsw not generated #55013

Instruction pmaxsw not generated #55013

mariannems commented Apr 21, 2022 •

edited by VoltrexKeyva

Loading

llvmbot commented Apr 21, 2022

davidbolvansky commented Apr 21, 2022

RKSimon commented Apr 21, 2022

mariannems commented Apr 21, 2022

davidbolvansky commented Apr 21, 2022

davidbolvansky commented Apr 22, 2022

llvmbot commented Apr 22, 2022

rotateright commented Apr 25, 2022

rotateright commented Apr 25, 2022

leo-ard commented Sep 15, 2023 •

edited

Loading

Instruction pmaxsw not generated #55013

Instruction pmaxsw not generated #55013

Comments

mariannems commented Apr 21, 2022 • edited by VoltrexKeyva Loading

llvmbot commented Apr 21, 2022

davidbolvansky commented Apr 21, 2022

RKSimon commented Apr 21, 2022

mariannems commented Apr 21, 2022

davidbolvansky commented Apr 21, 2022

davidbolvansky commented Apr 22, 2022

llvmbot commented Apr 22, 2022

rotateright commented Apr 25, 2022

rotateright commented Apr 25, 2022

leo-ard commented Sep 15, 2023 • edited Loading

mariannems commented Apr 21, 2022 •

edited by VoltrexKeyva

Loading

leo-ard commented Sep 15, 2023 •

edited

Loading