[InstCombine] Swap out range metadata to range attribute for cttz/ctlz/ctpop #88776

andjo403 · 2024-04-15T18:47:23Z

From what I can find all optimizations that use range metadata now also handle range attribute so think that we can start to replace writes of range metadata for call instructions to range attributes.

CC @nikic

llvmbot · 2024-04-15T18:47:56Z

@llvm/pr-subscribers-backend-systemz
@llvm/pr-subscribers-clang
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-ir

Author: Andreas Jonson (andjo403)

Changes

From what I can find all optimizations that use range metadata now also handle range attribute so think that we can start to replace writes of range metadata for call instructions to range attributes.

CC @nikic

Patch is 223.57 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/88776.diff

45 Files Affected:

(modified) llvm/include/llvm/IR/Attributes.h (+5)
(modified) llvm/include/llvm/IR/InstrTypes.h (+5)
(modified) llvm/lib/IR/Attributes.cpp (+7)
(modified) llvm/lib/IR/Instruction.cpp (+6)
(modified) llvm/lib/IR/Operator.cpp (+5)
(modified) llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp (+13-18)
(modified) llvm/test/Transforms/InstCombine/bit_ceil.ll (+14-14)
(modified) llvm/test/Transforms/InstCombine/bit_floor.ll (+8-8)
(modified) llvm/test/Transforms/InstCombine/cmp-intrinsic.ll (+14-14)
(modified) llvm/test/Transforms/InstCombine/ctlz-cttz-bitreverse.ll (+6-6)
(modified) llvm/test/Transforms/InstCombine/ctlz-cttz-shifts.ll (+9-16)
(modified) llvm/test/Transforms/InstCombine/ctpop-bswap-bitreverse.ll (+4-4)
(modified) llvm/test/Transforms/InstCombine/ctpop-cttz.ll (+9-9)
(modified) llvm/test/Transforms/InstCombine/ctpop-pow2.ll (+2-2)
(modified) llvm/test/Transforms/InstCombine/ctpop.ll (+32-32)
(modified) llvm/test/Transforms/InstCombine/cttz-abs.ll (+17-17)
(modified) llvm/test/Transforms/InstCombine/cttz-negative.ll (+7-7)
(modified) llvm/test/Transforms/InstCombine/cttz.ll (+18-18)
(modified) llvm/test/Transforms/InstCombine/ffs-1.ll (+3-3)
(modified) llvm/test/Transforms/InstCombine/fls.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/fold-ctpop-of-not.ll (+16-16)
(modified) llvm/test/Transforms/InstCombine/fold-log2-ceil-idiom.ll (+27-30)
(modified) llvm/test/Transforms/InstCombine/freeze-integer-intrinsics.ll (+8-8)
(modified) llvm/test/Transforms/InstCombine/freeze.ll (+1-2)
(modified) llvm/test/Transforms/InstCombine/icmp-ne-pow2.ll (+10-10)
(modified) llvm/test/Transforms/InstCombine/intrinsic-select.ll (+9-9)
(modified) llvm/test/Transforms/InstCombine/intrinsics.ll (+12-12)
(modified) llvm/test/Transforms/InstCombine/ispow2.ll (+83-83)
(modified) llvm/test/Transforms/InstCombine/known-non-zero.ll (+9-9)
(modified) llvm/test/Transforms/InstCombine/known-phi-recurse.ll (+5-5)
(modified) llvm/test/Transforms/InstCombine/minmax-fold.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/reduction-add-sext-zext-i1.ll (+7-7)
(modified) llvm/test/Transforms/InstCombine/reduction-xor-sext-zext-i1.ll (+7-7)
(modified) llvm/test/Transforms/InstCombine/select-cmp-cttz-ctlz.ll (+47-47)
(modified) llvm/test/Transforms/InstCombine/select-ctlz-to-cttz.ll (+16-16)
(modified) llvm/test/Transforms/InstCombine/select.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/sext.ll (+3-3)
(modified) llvm/test/Transforms/InstCombine/shift-cttz-ctlz.ll (+6-9)
(modified) llvm/test/Transforms/InstCombine/shift.ll (+2-2)
(modified) llvm/test/Transforms/InstCombine/sub-xor.ll (+1-1)
(modified) llvm/test/Transforms/InstCombine/xor.ll (+4-4)
(modified) llvm/test/Transforms/InstCombine/zext-ctlz-trunc-to-ctlz-add.ll (+7-7)
(modified) llvm/test/Transforms/LoopVectorize/reduction-inloop.ll (+3-3)
(modified) llvm/test/Transforms/PhaseOrdering/X86/loop-idiom-vs-indvars.ll (+1-1)
(modified) llvm/test/Transforms/PhaseOrdering/lower-table-based-cttz.ll (+1-1)

diff --git a/llvm/include/llvm/IR/Attributes.h b/llvm/include/llvm/IR/Attributes.h
index 7dd8a329029a34..5e3ba1f32e6ab0 100644
--- a/llvm/include/llvm/IR/Attributes.h
+++ b/llvm/include/llvm/IR/Attributes.h
@@ -747,6 +747,11 @@ class AttributeList {
   addDereferenceableOrNullParamAttr(LLVMContext &C, unsigned ArgNo,
                                     uint64_t Bytes) const;
 
+  /// Add the range attribute to the attribute set at the return value index.
+  /// Returns a new list because attribute lists are immutable.
+  [[nodiscard]] AttributeList addRangeRetAttr(LLVMContext &C,
+                                              const ConstantRange &CR) const;
+
   /// Add the allocsize attribute to the attribute set at the given arg index.
   /// Returns a new list because attribute lists are immutable.
   [[nodiscard]] AttributeList
diff --git a/llvm/include/llvm/IR/InstrTypes.h b/llvm/include/llvm/IR/InstrTypes.h
index cfe1b11ade5a4e..b66c0c28c86c85 100644
--- a/llvm/include/llvm/IR/InstrTypes.h
+++ b/llvm/include/llvm/IR/InstrTypes.h
@@ -1914,6 +1914,11 @@ class CallBase : public Instruction {
     Attrs = Attrs.addDereferenceableRetAttr(getContext(), Bytes);
   }
 
+  /// adds the dereferenceable attribute to the list of attributes.
+  void addRangeRetAttr(const ConstantRange &CR) {
+    Attrs = Attrs.addRangeRetAttr(getContext(), CR);
+  }
+
   /// Determine whether the return value has the given attribute.
   bool hasRetAttr(Attribute::AttrKind Kind) const {
     return hasRetAttrImpl(Kind);
diff --git a/llvm/lib/IR/Attributes.cpp b/llvm/lib/IR/Attributes.cpp
index b2d9992cdc0258..9c48a481de1ff6 100644
--- a/llvm/lib/IR/Attributes.cpp
+++ b/llvm/lib/IR/Attributes.cpp
@@ -1530,6 +1530,13 @@ AttributeList::addDereferenceableOrNullParamAttr(LLVMContext &C, unsigned Index,
   return addParamAttributes(C, Index, B);
 }
 
+AttributeList AttributeList::addRangeRetAttr(LLVMContext &C,
+                                             const ConstantRange &CR) const {
+  AttrBuilder B(C);
+  B.addRangeAttr(CR);
+  return addRetAttributes(C, B);
+}
+
 AttributeList AttributeList::addAllocSizeParamAttr(
     LLVMContext &C, unsigned Index, unsigned ElemSizeArg,
     const std::optional<unsigned> &NumElemsArg) {
diff --git a/llvm/lib/IR/Instruction.cpp b/llvm/lib/IR/Instruction.cpp
index b9efe9cdcfe310..da36a9e9dfb46f 100644
--- a/llvm/lib/IR/Instruction.cpp
+++ b/llvm/lib/IR/Instruction.cpp
@@ -13,7 +13,9 @@
 #include "llvm/IR/Instruction.h"
 #include "llvm/ADT/DenseSet.h"
 #include "llvm/IR/AttributeMask.h"
+#include "llvm/IR/Attributes.h"
 #include "llvm/IR/Constants.h"
+#include "llvm/IR/InstrTypes.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Intrinsics.h"
@@ -450,6 +452,10 @@ void Instruction::dropPoisonGeneratingFlags() {
     cast<TruncInst>(this)->setHasNoUnsignedWrap(false);
     cast<TruncInst>(this)->setHasNoSignedWrap(false);
     break;
+
+  case Instruction::Call:
+  case Instruction::Invoke:
+    cast<CallBase>(this)->removeRetAttr(Attribute::Range);
   }
 
   if (isa<FPMathOperator>(this)) {
diff --git a/llvm/lib/IR/Operator.cpp b/llvm/lib/IR/Operator.cpp
index ccc624d854429c..47ee5ed635ee89 100644
--- a/llvm/lib/IR/Operator.cpp
+++ b/llvm/lib/IR/Operator.cpp
@@ -11,8 +11,10 @@
 //===----------------------------------------------------------------------===//
 
 #include "llvm/IR/Operator.h"
+#include "llvm/IR/Attributes.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/GetElementPtrTypeIterator.h"
+#include "llvm/IR/InstrTypes.h"
 #include "llvm/IR/Instructions.h"
 
 #include "ConstantsContext.h"
@@ -49,6 +51,9 @@ bool Operator::hasPoisonGeneratingFlags() const {
     if (auto *NNI = dyn_cast<PossiblyNonNegInst>(this))
       return NNI->hasNonNeg();
     return false;
+  case Instruction::Call:
+  case Instruction::Invoke:
+    return cast<CallBase>(this)->hasRetAttr(Attribute::Range);
   default:
     if (const auto *FP = dyn_cast<FPMathOperator>(this))
       return FP->hasNoNaNs() || FP->hasNoInfs();
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
index 20f51c8af617de..554e50a2890af1 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -607,14 +607,13 @@ static Instruction *foldCttzCtlz(IntrinsicInst &II, InstCombinerImpl &IC) {
       return IC.replaceOperand(II, 1, IC.Builder.getTrue());
   }
 
-  // Add range metadata since known bits can't completely reflect what we know.
-  auto *IT = cast<IntegerType>(Op0->getType()->getScalarType());
-  if (IT && IT->getBitWidth() != 1 && !II.getMetadata(LLVMContext::MD_range)) {
-    Metadata *LowAndHigh[] = {
-        ConstantAsMetadata::get(ConstantInt::get(IT, DefiniteZeros)),
-        ConstantAsMetadata::get(ConstantInt::get(IT, PossibleZeros + 1))};
-    II.setMetadata(LLVMContext::MD_range,
-                   MDNode::get(II.getContext(), LowAndHigh));
+  // Add range attribute since known bits can't completely reflect what we know.
+  unsigned BitWidth = Op0->getType()->getScalarSizeInBits();
+  if (BitWidth != 1 && !II.hasRetAttr(Attribute::Range) &&
+      !II.getMetadata(LLVMContext::MD_range)) {
+    ConstantRange Range(APInt(BitWidth, DefiniteZeros),
+                        APInt(BitWidth, PossibleZeros + 1));
+    II.addRangeRetAttr(Range);
     return &II;
   }
 
@@ -686,16 +685,12 @@ static Instruction *foldCtpop(IntrinsicInst &II, InstCombinerImpl &IC) {
                                                   Constant::getNullValue(Ty)),
                             Ty);
 
-  // Add range metadata since known bits can't completely reflect what we know.
-  auto *IT = cast<IntegerType>(Ty->getScalarType());
-  unsigned MinCount = Known.countMinPopulation();
-  unsigned MaxCount = Known.countMaxPopulation();
-  if (IT->getBitWidth() != 1 && !II.getMetadata(LLVMContext::MD_range)) {
-    Metadata *LowAndHigh[] = {
-        ConstantAsMetadata::get(ConstantInt::get(IT, MinCount)),
-        ConstantAsMetadata::get(ConstantInt::get(IT, MaxCount + 1))};
-    II.setMetadata(LLVMContext::MD_range,
-                   MDNode::get(II.getContext(), LowAndHigh));
+  // Add range attribute since known bits can't completely reflect what we know.
+  if (BitWidth != 1 && !II.hasRetAttr(Attribute::Range) &&
+      !II.getMetadata(LLVMContext::MD_range)) {
+    ConstantRange Range(APInt(BitWidth, Known.countMinPopulation()),
+                        APInt(BitWidth, Known.countMaxPopulation() + 1));
+    II.addRangeRetAttr(Range);
     return &II;
   }
 
diff --git a/llvm/test/Transforms/InstCombine/bit_ceil.ll b/llvm/test/Transforms/InstCombine/bit_ceil.ll
index 52e70c78ba5428..16631afa4878da 100644
--- a/llvm/test/Transforms/InstCombine/bit_ceil.ll
+++ b/llvm/test/Transforms/InstCombine/bit_ceil.ll
@@ -5,7 +5,7 @@
 define i32 @bit_ceil_32(i32 %x) {
 ; CHECK-LABEL: @bit_ceil_32(
 ; CHECK-NEXT:    [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0:![0-9]+]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
 ; CHECK-NEXT:    [[TMP1:%.*]] = sub nsw i32 0, [[CTLZ]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = and i32 [[TMP1]], 31
 ; CHECK-NEXT:    [[SEL:%.*]] = shl nuw i32 1, [[TMP2]]
@@ -24,7 +24,7 @@ define i32 @bit_ceil_32(i32 %x) {
 define i64 @bit_ceil_64(i64 %x) {
 ; CHECK-LABEL: @bit_ceil_64(
 ; CHECK-NEXT:    [[DEC:%.*]] = add i64 [[X:%.*]], -1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i64 @llvm.ctlz.i64(i64 [[DEC]], i1 false), !range [[RNG1:![0-9]+]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i64 0, 65) i64 @llvm.ctlz.i64(i64 [[DEC]], i1 false)
 ; CHECK-NEXT:    [[TMP1:%.*]] = sub nsw i64 0, [[CTLZ]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = and i64 [[TMP1]], 63
 ; CHECK-NEXT:    [[SEL:%.*]] = shl nuw i64 1, [[TMP2]]
@@ -44,7 +44,7 @@ define i32 @bit_ceil_32_minus_1(i32 %x) {
 ; CHECK-LABEL: @bit_ceil_32_minus_1(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[SUB:%.*]] = add i32 [[X:%.*]], -2
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false)
 ; CHECK-NEXT:    [[TMP0:%.*]] = sub nsw i32 0, [[CTLZ]]
 ; CHECK-NEXT:    [[TMP1:%.*]] = and i32 [[TMP0]], 31
 ; CHECK-NEXT:    [[SEL:%.*]] = shl nuw i32 1, [[TMP1]]
@@ -64,7 +64,7 @@ entry:
 ; std::bit_ceil<uint32_t>(x + 1)
 define i32 @bit_ceil_32_plus_1(i32 %x) {
 ; CHECK-LABEL: @bit_ceil_32_plus_1(
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[X:%.*]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[X:%.*]], i1 false)
 ; CHECK-NEXT:    [[TMP1:%.*]] = sub nsw i32 0, [[CTLZ]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = and i32 [[TMP1]], 31
 ; CHECK-NEXT:    [[SEL:%.*]] = shl nuw i32 1, [[TMP2]]
@@ -84,7 +84,7 @@ define i32 @bit_ceil_plus_2(i32 %x) {
 ; CHECK-LABEL: @bit_ceil_plus_2(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[SUB:%.*]] = add i32 [[X:%.*]], 1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false)
 ; CHECK-NEXT:    [[TMP0:%.*]] = sub nsw i32 0, [[CTLZ]]
 ; CHECK-NEXT:    [[TMP1:%.*]] = and i32 [[TMP0]], 31
 ; CHECK-NEXT:    [[SEL:%.*]] = shl nuw i32 1, [[TMP1]]
@@ -105,7 +105,7 @@ define i32 @bit_ceil_32_neg(i32 %x) {
 ; CHECK-LABEL: @bit_ceil_32_neg(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[SUB:%.*]] = xor i32 [[X:%.*]], -1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false)
 ; CHECK-NEXT:    [[TMP0:%.*]] = sub nsw i32 0, [[CTLZ]]
 ; CHECK-NEXT:    [[TMP1:%.*]] = and i32 [[TMP0]], 31
 ; CHECK-NEXT:    [[SEL:%.*]] = shl nuw i32 1, [[TMP1]]
@@ -127,7 +127,7 @@ define i32 @bit_ceil_not(i32 %x) {
 ; CHECK-LABEL: @bit_ceil_not(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[SUB:%.*]] = sub i32 -2, [[X:%.*]]
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false)
 ; CHECK-NEXT:    [[TMP0:%.*]] = sub nsw i32 0, [[CTLZ]]
 ; CHECK-NEXT:    [[TMP1:%.*]] = and i32 [[TMP0]], 31
 ; CHECK-NEXT:    [[SEL:%.*]] = shl nuw i32 1, [[TMP1]]
@@ -147,7 +147,7 @@ entry:
 define i32 @bit_ceil_commuted_operands(i32 %x) {
 ; CHECK-LABEL: @bit_ceil_commuted_operands(
 ; CHECK-NEXT:    [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    ret i32 [[SHL]]
@@ -165,7 +165,7 @@ define i32 @bit_ceil_commuted_operands(i32 %x) {
 define i32 @bit_ceil_wrong_select_constant(i32 %x) {
 ; CHECK-LABEL: @bit_ceil_wrong_select_constant(
 ; CHECK-NEXT:    [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[UGT_INV:%.*]] = icmp ult i32 [[X]], 2
@@ -185,7 +185,7 @@ define i32 @bit_ceil_wrong_select_constant(i32 %x) {
 define i32 @bit_ceil_32_wrong_cond(i32 %x) {
 ; CHECK-LABEL: @bit_ceil_32_wrong_cond(
 ; CHECK-NEXT:    [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[UGT:%.*]] = icmp ugt i32 [[X]], 2
@@ -205,7 +205,7 @@ define i32 @bit_ceil_32_wrong_cond(i32 %x) {
 define i32 @bit_ceil_wrong_sub_constant(i32 %x) {
 ; CHECK-LABEL: @bit_ceil_wrong_sub_constant(
 ; CHECK-NEXT:    [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 33, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[UGT:%.*]] = icmp ugt i32 [[X]], 1
@@ -225,7 +225,7 @@ define i32 @bit_ceil_wrong_sub_constant(i32 %x) {
 define i32 @bit_ceil_32_shl_used_twice(i32 %x, ptr %p) {
 ; CHECK-LABEL: @bit_ceil_32_shl_used_twice(
 ; CHECK-NEXT:    [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[UGT:%.*]] = icmp ugt i32 [[X]], 1
@@ -247,7 +247,7 @@ define i32 @bit_ceil_32_shl_used_twice(i32 %x, ptr %p) {
 define i32 @bit_ceil_32_sub_used_twice(i32 %x, ptr %p) {
 ; CHECK-LABEL: @bit_ceil_32_sub_used_twice(
 ; CHECK-NEXT:    [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[UGT:%.*]] = icmp ugt i32 [[X]], 1
@@ -269,7 +269,7 @@ define i32 @bit_ceil_32_sub_used_twice(i32 %x, ptr %p) {
 define <4 x i32> @bit_ceil_v4i32(<4 x i32> %x) {
 ; CHECK-LABEL: @bit_ceil_v4i32(
 ; CHECK-NEXT:    [[DEC:%.*]] = add <4 x i32> [[X:%.*]], <i32 -1, i32 -1, i32 -1, i32 -1>
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) <4 x i32> @llvm.ctlz.v4i32(<4 x i32> [[DEC]], i1 false)
 ; CHECK-NEXT:    [[TMP1:%.*]] = sub nsw <4 x i32> zeroinitializer, [[CTLZ]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = and <4 x i32> [[TMP1]], <i32 31, i32 31, i32 31, i32 31>
 ; CHECK-NEXT:    [[SEL:%.*]] = shl nuw <4 x i32> <i32 1, i32 1, i32 1, i32 1>, [[TMP2]]
diff --git a/llvm/test/Transforms/InstCombine/bit_floor.ll b/llvm/test/Transforms/InstCombine/bit_floor.ll
index 9daa8eee8969c0..bd8aabf4431c0a 100644
--- a/llvm/test/Transforms/InstCombine/bit_floor.ll
+++ b/llvm/test/Transforms/InstCombine/bit_floor.ll
@@ -5,7 +5,7 @@ define i32 @bit_floor_32(i32 %x) {
 ; CHECK-LABEL: @bit_floor_32(
 ; CHECK-NEXT:    [[EQ0:%.*]] = icmp eq i32 [[X:%.*]], 0
 ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 [[X]], 1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false), !range [[RNG0:![0-9]+]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 1, 33) i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[EQ0]], i32 0, i32 [[SHL]]
@@ -24,7 +24,7 @@ define i64 @bit_floor_64(i64 %x) {
 ; CHECK-LABEL: @bit_floor_64(
 ; CHECK-NEXT:    [[EQ0:%.*]] = icmp eq i64 [[X:%.*]], 0
 ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i64 [[X]], 1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i64 @llvm.ctlz.i64(i64 [[LSHR]], i1 false), !range [[RNG1:![0-9]+]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i64 1, 65) i64 @llvm.ctlz.i64(i64 [[LSHR]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i64 64, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i64 1, [[SUB]]
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[EQ0]], i64 0, i64 [[SHL]]
@@ -44,7 +44,7 @@ define i32 @bit_floor_commuted_operands(i32 %x) {
 ; CHECK-LABEL: @bit_floor_commuted_operands(
 ; CHECK-NEXT:    [[NE0_NOT:%.*]] = icmp eq i32 [[X:%.*]], 0
 ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 [[X]], 1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 1, 33) i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[NE0_NOT]], i32 0, i32 [[SHL]]
@@ -64,7 +64,7 @@ define i32 @bit_floor_lshr_used_twice(i32 %x, ptr %p) {
 ; CHECK-LABEL: @bit_floor_lshr_used_twice(
 ; CHECK-NEXT:    [[EQ0:%.*]] = icmp eq i32 [[X:%.*]], 0
 ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 [[X]], 1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 1, 33) i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[EQ0]], i32 0, i32 [[SHL]]
@@ -86,7 +86,7 @@ define i32 @bit_floor_ctlz_used_twice(i32 %x, ptr %p) {
 ; CHECK-LABEL: @bit_floor_ctlz_used_twice(
 ; CHECK-NEXT:    [[EQ0:%.*]] = icmp eq i32 [[X:%.*]], 0
 ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 [[X]], 1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 1, 33) i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[EQ0]], i32 0, i32 [[SHL]]
@@ -108,7 +108,7 @@ define i32 @bit_floor_sub_used_twice(i32 %x, ptr %p) {
 ; CHECK-LABEL: @bit_floor_sub_used_twice(
 ; CHECK-NEXT:    [[EQ0:%.*]] = icmp eq i32 [[X:%.*]], 0
 ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 [[X]], 1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 1, 33) i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[EQ0]], i32 0, i32 [[SHL]]
@@ -130,7 +130,7 @@ define i32 @bit_floor_shl_used_twice(i32 %x, ptr %p) {
 ; CHECK-LABEL: @bit_floor_shl_used_twice(
 ; CHECK-NEXT:    [[EQ0:%.*]] = icmp eq i32 [[X:%.*]], 0
 ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 [[X]], 1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 1, 33) i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[EQ0]], i32 0, i32 [[SHL]]
@@ -152,7 +152,7 @@ define <4 x i32> @bit_floor_v4i32(<4 x i32> %x) {
 ; CHECK-LABEL: @bit_floor_v4i32(
 ; CHECK-NEXT:    [[EQ0:%.*]] = icmp eq <4 x i32> [[X:%.*]], zeroinitializer
 ; CHECK-NEXT:    [[LSHR:%.*]] = lshr <4 x i32> [[X]], <i32 1, i32 1, i32 1, i32 1>
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> [[LSHR]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 1, 33) <4 x i32> @llvm.ctlz.v4i32(<4 x i32> [[LSHR]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw <4 x i32> <i32 32, i32 32, i32 32, i32 32>, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw <4 x i32> <i32 1, i32 1, i32 1, i32 1>, [[SUB]]
 ; CHECK-NEXT:    [[SEL:%.*]] = select <4 x i1> [[EQ0]], <4 x i32> zeroinitializer, <4 x i32> [[SHL]]
diff --git a/llvm/test/Transforms/InstCombine/cmp-intrinsic.ll b/llvm/test/Transforms/InstCombine/cmp-intrinsic.ll
index 66cbb2636cbc2b..9a9f359fa80b4a 100644
--- a/llvm/test/Transforms/InstCombine/cmp-intrinsic.ll
+++ b/llvm/test/Transforms/InstCombine/cmp-intrinsic.ll
@@ -125,7 +125,7 @@ define <2 x ...
[truncated]

andjo403 · 2024-04-15T18:55:48Z

llvm/lib/IR/Instruction.cpp

+
+  case Instruction::Call:
+  case Instruction::Invoke:
+    cast<CallBase>(this)->removeRetAttr(Attribute::Range);


this only remove the attribute from the Call/Invoke instruction but Operator::hasPoisonGeneratingFlags() will also look at the callee for the attribute I do not know how to handle this as there is an assert in the end of this function that they shall be in sync.

In hasPoisonGeneratingFlags you can explicitly fetch the AttributeList of the call and do the check there.

I was thinking that as all other places we call hasRetAttr so the instruction is still causing poison.

This is all a bit tricky, but the tl;dr is that we only care about call-site attributes in this context. Any attributes we'd have on an intrinsic function declaration would hold unconditionally and would not actually make the result more poisonous. E.g. if you have a ctlz.i32 we could add range(0, 32) to the declaration, but that will never make the result more poisonous. Non-intrinsics are currently not relevant in this context, because the surrounding analysis wouldn't support them anyway.

nikic

Could you please split out the changes to flag clearing into a separate PR? We'll also want to explicitly test them using some call-site of dropPoisonGeneratingFlagsAndMetadata().

And on that note, I don't think this change should be in hasPoisonGeneratingFlags/dropPoisonGeneratingFlags itself, which should stay just about flags. We should handle this in the "AndMetadata" variants -- possibly with a rename of the method, if we want to be pedantic (in the interest of not getting dropPoisonGeneratingFlagsAndMetadataAndAttributes, maybe something like dropPoisonGeneratingAnnotations?). It may make sense to handle the nonnull and align metadata at the same time, if we can come up with some examples of miscompiles for it (maybe something involving ptrmask?)

andjo403 · 2024-04-16T06:31:08Z

will try to split out the poison handling, I tried to do that first but was not able come up with some test that miscompiled.
there is test that fails in Transforms/InstCombine/ispow2.ll if the poison handling is removed from here so then the poison handling PR needs to be done first

nikic · 2024-04-16T07:48:08Z

will try to split out the poison handling, I tried to do that first but was not able come up with some test that miscompiled.

The test modified in cb6240d should work.

dtcxzyw · 2024-04-16T07:54:24Z

Failed Tests (2):
LLVM :: Transforms/InstCombine/ffs-i16.ll
LLVM :: Transforms/InstCombine/fls-i16.ll

…z/ctpop

andjo403 · 2024-04-18T20:14:20Z

rebased to get the poison handling and fixed the test (noticed that I had only had the x86 target in my build config) so this is ready for review again.

nikic

There are clang test failures:


�_bk;t=1713477292662�Failed Tests (7):

�_bk;t=1713477292662�  Clang :: CodeGen/SystemZ/builtins-systemz-zvector.c

�_bk;t=1713477292662�  Clang :: CodeGen/SystemZ/builtins-systemz-zvector2.c

�_bk;t=1713477292662�  Clang :: CodeGen/builtins-wasm.c

�_bk;t=1713477292662�  Clang :: CodeGen/ms-intrinsics-other.c

�_bk;t=1713477292662�  Clang :: CodeGen/ms-intrinsics.c

�_bk;t=1713477292662�  Clang :: CodeGenOpenCL/builtins-generic-amdgcn.cl

�_bk;t=1713477292662�  Clang :: Headers/wasm.c

…ttz/ctlz/ctpop

andjo403 · 2024-04-19T20:18:02Z

This time it seems like I have managed to get all the tests to pass at least on linux, windows have not executed yet.

nikic

LGTM

andjo403 · 2024-04-24T17:24:05Z

I still have not asked for rights to merge do not know when there is a good time to do that feels like there is a lot of discussions about if it is needed to have it or not.

Revamp the NVVMIntrRange pass making the following updates: - Use range attributes over range metadata. This is what instcombine has move to for ranges on intrinsics in #88776 and it seems a bit cleaner. - Consider the `!"maxntid{x,y,z}"` and `!"reqntid{x,y,z}"` function metadata when adding ranges for `tid` srge instrinsics. This can allow for smaller ranges and more optimization. - When range attributes are already present, use the intersection of the old and new range. This complements the metadata change by allowing ranges to be shrunk when an intrinsic is in a function which is inlined into a kernel with metadata. While we don't call this more then once yet, we should consider adding a second call after inlining, once this has had a chance to soak for a while and no issues have arisen. I've also re-enabled this pass in the TM, it was disabled years ago due to "numerical discrepancies" https://reviews.llvm.org/D96166. In our testing we haven't seen any issues with adding ranges to intrinsics, and I cannot find any further info about what issues were encountered.

andjo403 requested a review from nikic as a code owner April 15, 2024 18:47

llvmbot added llvm:ir llvm:transforms labels Apr 15, 2024

andjo403 commented Apr 15, 2024

View reviewed changes

nikic reviewed Apr 16, 2024

View reviewed changes

andjo403 mentioned this pull request Apr 17, 2024

Take poison-generating attributes into account #89138

Merged

[InstCombine] Swap out range metadata to range attribute for cttz/ctl…

bd3e83f

…z/ctpop

andjo403 force-pushed the rangeOnCttzCtlzandCtpop branch from 9db3377 to bd3e83f Compare April 18, 2024 20:09

nikic reviewed Apr 19, 2024

View reviewed changes

fixup! [InstCombine] Swap out range metadata to range attribute for c…

ac3b0a5

…ttz/ctlz/ctpop

llvmbot added clang Clang issues not falling into any other category backend:SystemZ labels Apr 19, 2024

nikic approved these changes Apr 24, 2024

View reviewed changes

dtcxzyw merged commit b8f3024 into llvm:main Apr 24, 2024
5 of 6 checks passed

andjo403 deleted the rangeOnCttzCtlzandCtpop branch April 24, 2024 19:19

AlexMaclean mentioned this pull request Jun 5, 2024

[NVPTX] Revamp NVVMIntrRange pass #94422

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[InstCombine] Swap out range metadata to range attribute for cttz/ctlz/ctpop #88776

[InstCombine] Swap out range metadata to range attribute for cttz/ctlz/ctpop #88776

andjo403 commented Apr 15, 2024

llvmbot commented Apr 15, 2024 •

edited

andjo403 Apr 15, 2024

nikic Apr 16, 2024

andjo403 Apr 16, 2024

nikic Apr 16, 2024

nikic left a comment

andjo403 commented Apr 16, 2024

nikic commented Apr 16, 2024

dtcxzyw commented Apr 16, 2024

andjo403 commented Apr 18, 2024

nikic left a comment

andjo403 commented Apr 19, 2024

nikic left a comment

andjo403 commented Apr 24, 2024

[InstCombine] Swap out range metadata to range attribute for cttz/ctlz/ctpop #88776

[InstCombine] Swap out range metadata to range attribute for cttz/ctlz/ctpop #88776

Conversation

andjo403 commented Apr 15, 2024

llvmbot commented Apr 15, 2024 • edited

andjo403 Apr 15, 2024

Choose a reason for hiding this comment

nikic Apr 16, 2024

Choose a reason for hiding this comment

andjo403 Apr 16, 2024

Choose a reason for hiding this comment

nikic Apr 16, 2024

Choose a reason for hiding this comment

nikic left a comment

Choose a reason for hiding this comment

andjo403 commented Apr 16, 2024

nikic commented Apr 16, 2024

dtcxzyw commented Apr 16, 2024

andjo403 commented Apr 18, 2024

nikic left a comment

Choose a reason for hiding this comment

andjo403 commented Apr 19, 2024

nikic left a comment

Choose a reason for hiding this comment

andjo403 commented Apr 24, 2024

llvmbot commented Apr 15, 2024 •

edited