
[InstCombine] Swap out range metadata to range attribute for cttz/ctlz/ctpop #88776

Merged
2 commits merged into llvm:main on Apr 24, 2024

Conversation

andjo403
Contributor

From what I can find, all optimizations that use range metadata now also handle the range attribute, so I think we can start to replace writes of range metadata on call instructions with range attributes.

CC @nikic
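
For illustration, the encoding change in IR: before this patch InstCombine recorded the known result range of e.g. ctlz as !range metadata on the call (!{i32 0, i32 33} for i32), and with this patch it becomes a call-site return attribute, range(i32 0, 33), as the updated tests below show. The standalone C++ sketch below (setup and names are illustrative; it assumes an LLVM build that already contains this patch's CallBase::addRangeRetAttr) shows the new API attaching such an attribute:

#include "llvm/IR/ConstantRange.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

int main() {
  LLVMContext Ctx;
  Module M("range-attr-demo", Ctx);
  IRBuilder<> B(Ctx);

  // define i32 @f(i32 %x)
  Function *F = Function::Create(
      FunctionType::get(B.getInt32Ty(), {B.getInt32Ty()}, /*isVarArg=*/false),
      Function::ExternalLinkage, "f", M);
  B.SetInsertPoint(BasicBlock::Create(Ctx, "entry", F));

  // %ctlz = call i32 @llvm.ctlz.i32(i32 %x, i1 false)
  CallInst *Ctlz = B.CreateIntrinsic(Intrinsic::ctlz, {B.getInt32Ty()},
                                     {F->getArg(0), B.getFalse()});

  // ctlz on i32 always yields a value in [0, 32], i.e. the half-open range
  // [0, 33). Attach that as a call-site return attribute instead of !range
  // metadata.
  Ctlz->addRangeRetAttr(ConstantRange(APInt(32, 0), APInt(32, 33)));

  B.CreateRet(Ctlz);
  M.print(outs(), nullptr); // call range(i32 0, 33) i32 @llvm.ctlz.i32(...)
  return 0;
}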

@llvmbot
Collaborator

llvmbot commented Apr 15, 2024

@llvm/pr-subscribers-backend-systemz
@llvm/pr-subscribers-clang
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-ir

Author: Andreas Jonson (andjo403)

Changes

From what I can find, all optimizations that use range metadata now also handle the range attribute, so I think we can start to replace writes of range metadata on call instructions with range attributes.

CC @nikic


Patch is 223.57 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/88776.diff

45 Files Affected:

  • (modified) llvm/include/llvm/IR/Attributes.h (+5)
  • (modified) llvm/include/llvm/IR/InstrTypes.h (+5)
  • (modified) llvm/lib/IR/Attributes.cpp (+7)
  • (modified) llvm/lib/IR/Instruction.cpp (+6)
  • (modified) llvm/lib/IR/Operator.cpp (+5)
  • (modified) llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp (+13-18)
  • (modified) llvm/test/Transforms/InstCombine/bit_ceil.ll (+14-14)
  • (modified) llvm/test/Transforms/InstCombine/bit_floor.ll (+8-8)
  • (modified) llvm/test/Transforms/InstCombine/cmp-intrinsic.ll (+14-14)
  • (modified) llvm/test/Transforms/InstCombine/ctlz-cttz-bitreverse.ll (+6-6)
  • (modified) llvm/test/Transforms/InstCombine/ctlz-cttz-shifts.ll (+9-16)
  • (modified) llvm/test/Transforms/InstCombine/ctpop-bswap-bitreverse.ll (+4-4)
  • (modified) llvm/test/Transforms/InstCombine/ctpop-cttz.ll (+9-9)
  • (modified) llvm/test/Transforms/InstCombine/ctpop-pow2.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/ctpop.ll (+32-32)
  • (modified) llvm/test/Transforms/InstCombine/cttz-abs.ll (+17-17)
  • (modified) llvm/test/Transforms/InstCombine/cttz-negative.ll (+7-7)
  • (modified) llvm/test/Transforms/InstCombine/cttz.ll (+18-18)
  • (modified) llvm/test/Transforms/InstCombine/ffs-1.ll (+3-3)
  • (modified) llvm/test/Transforms/InstCombine/fls.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/fold-ctpop-of-not.ll (+16-16)
  • (modified) llvm/test/Transforms/InstCombine/fold-log2-ceil-idiom.ll (+27-30)
  • (modified) llvm/test/Transforms/InstCombine/freeze-integer-intrinsics.ll (+8-8)
  • (modified) llvm/test/Transforms/InstCombine/freeze.ll (+1-2)
  • (modified) llvm/test/Transforms/InstCombine/icmp-ne-pow2.ll (+10-10)
  • (modified) llvm/test/Transforms/InstCombine/intrinsic-select.ll (+9-9)
  • (modified) llvm/test/Transforms/InstCombine/intrinsics.ll (+12-12)
  • (modified) llvm/test/Transforms/InstCombine/ispow2.ll (+83-83)
  • (modified) llvm/test/Transforms/InstCombine/known-non-zero.ll (+9-9)
  • (modified) llvm/test/Transforms/InstCombine/known-phi-recurse.ll (+5-5)
  • (modified) llvm/test/Transforms/InstCombine/minmax-fold.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/reduction-add-sext-zext-i1.ll (+7-7)
  • (modified) llvm/test/Transforms/InstCombine/reduction-xor-sext-zext-i1.ll (+7-7)
  • (modified) llvm/test/Transforms/InstCombine/select-cmp-cttz-ctlz.ll (+47-47)
  • (modified) llvm/test/Transforms/InstCombine/select-ctlz-to-cttz.ll (+16-16)
  • (modified) llvm/test/Transforms/InstCombine/select.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/sext.ll (+3-3)
  • (modified) llvm/test/Transforms/InstCombine/shift-cttz-ctlz.ll (+6-9)
  • (modified) llvm/test/Transforms/InstCombine/shift.ll (+2-2)
  • (modified) llvm/test/Transforms/InstCombine/sub-xor.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/xor.ll (+4-4)
  • (modified) llvm/test/Transforms/InstCombine/zext-ctlz-trunc-to-ctlz-add.ll (+7-7)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-inloop.ll (+3-3)
  • (modified) llvm/test/Transforms/PhaseOrdering/X86/loop-idiom-vs-indvars.ll (+1-1)
  • (modified) llvm/test/Transforms/PhaseOrdering/lower-table-based-cttz.ll (+1-1)
diff --git a/llvm/include/llvm/IR/Attributes.h b/llvm/include/llvm/IR/Attributes.h
index 7dd8a329029a34..5e3ba1f32e6ab0 100644
--- a/llvm/include/llvm/IR/Attributes.h
+++ b/llvm/include/llvm/IR/Attributes.h
@@ -747,6 +747,11 @@ class AttributeList {
   addDereferenceableOrNullParamAttr(LLVMContext &C, unsigned ArgNo,
                                     uint64_t Bytes) const;
 
+  /// Add the range attribute to the attribute set at the return value index.
+  /// Returns a new list because attribute lists are immutable.
+  [[nodiscard]] AttributeList addRangeRetAttr(LLVMContext &C,
+                                              const ConstantRange &CR) const;
+
   /// Add the allocsize attribute to the attribute set at the given arg index.
   /// Returns a new list because attribute lists are immutable.
   [[nodiscard]] AttributeList
diff --git a/llvm/include/llvm/IR/InstrTypes.h b/llvm/include/llvm/IR/InstrTypes.h
index cfe1b11ade5a4e..b66c0c28c86c85 100644
--- a/llvm/include/llvm/IR/InstrTypes.h
+++ b/llvm/include/llvm/IR/InstrTypes.h
@@ -1914,6 +1914,11 @@ class CallBase : public Instruction {
     Attrs = Attrs.addDereferenceableRetAttr(getContext(), Bytes);
   }
 
+  /// Adds the range attribute to the list of return attributes.
+  void addRangeRetAttr(const ConstantRange &CR) {
+    Attrs = Attrs.addRangeRetAttr(getContext(), CR);
+  }
+
   /// Determine whether the return value has the given attribute.
   bool hasRetAttr(Attribute::AttrKind Kind) const {
     return hasRetAttrImpl(Kind);
diff --git a/llvm/lib/IR/Attributes.cpp b/llvm/lib/IR/Attributes.cpp
index b2d9992cdc0258..9c48a481de1ff6 100644
--- a/llvm/lib/IR/Attributes.cpp
+++ b/llvm/lib/IR/Attributes.cpp
@@ -1530,6 +1530,13 @@ AttributeList::addDereferenceableOrNullParamAttr(LLVMContext &C, unsigned Index,
   return addParamAttributes(C, Index, B);
 }
 
+AttributeList AttributeList::addRangeRetAttr(LLVMContext &C,
+                                             const ConstantRange &CR) const {
+  AttrBuilder B(C);
+  B.addRangeAttr(CR);
+  return addRetAttributes(C, B);
+}
+
 AttributeList AttributeList::addAllocSizeParamAttr(
     LLVMContext &C, unsigned Index, unsigned ElemSizeArg,
     const std::optional<unsigned> &NumElemsArg) {
diff --git a/llvm/lib/IR/Instruction.cpp b/llvm/lib/IR/Instruction.cpp
index b9efe9cdcfe310..da36a9e9dfb46f 100644
--- a/llvm/lib/IR/Instruction.cpp
+++ b/llvm/lib/IR/Instruction.cpp
@@ -13,7 +13,9 @@
 #include "llvm/IR/Instruction.h"
 #include "llvm/ADT/DenseSet.h"
 #include "llvm/IR/AttributeMask.h"
+#include "llvm/IR/Attributes.h"
 #include "llvm/IR/Constants.h"
+#include "llvm/IR/InstrTypes.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Intrinsics.h"
@@ -450,6 +452,10 @@ void Instruction::dropPoisonGeneratingFlags() {
     cast<TruncInst>(this)->setHasNoUnsignedWrap(false);
     cast<TruncInst>(this)->setHasNoSignedWrap(false);
     break;
+
+  case Instruction::Call:
+  case Instruction::Invoke:
+    cast<CallBase>(this)->removeRetAttr(Attribute::Range);
   }
 
   if (isa<FPMathOperator>(this)) {
diff --git a/llvm/lib/IR/Operator.cpp b/llvm/lib/IR/Operator.cpp
index ccc624d854429c..47ee5ed635ee89 100644
--- a/llvm/lib/IR/Operator.cpp
+++ b/llvm/lib/IR/Operator.cpp
@@ -11,8 +11,10 @@
 //===----------------------------------------------------------------------===//
 
 #include "llvm/IR/Operator.h"
+#include "llvm/IR/Attributes.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/GetElementPtrTypeIterator.h"
+#include "llvm/IR/InstrTypes.h"
 #include "llvm/IR/Instructions.h"
 
 #include "ConstantsContext.h"
@@ -49,6 +51,9 @@ bool Operator::hasPoisonGeneratingFlags() const {
     if (auto *NNI = dyn_cast<PossiblyNonNegInst>(this))
       return NNI->hasNonNeg();
     return false;
+  case Instruction::Call:
+  case Instruction::Invoke:
+    return cast<CallBase>(this)->hasRetAttr(Attribute::Range);
   default:
     if (const auto *FP = dyn_cast<FPMathOperator>(this))
       return FP->hasNoNaNs() || FP->hasNoInfs();
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
index 20f51c8af617de..554e50a2890af1 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -607,14 +607,13 @@ static Instruction *foldCttzCtlz(IntrinsicInst &II, InstCombinerImpl &IC) {
       return IC.replaceOperand(II, 1, IC.Builder.getTrue());
   }
 
-  // Add range metadata since known bits can't completely reflect what we know.
-  auto *IT = cast<IntegerType>(Op0->getType()->getScalarType());
-  if (IT && IT->getBitWidth() != 1 && !II.getMetadata(LLVMContext::MD_range)) {
-    Metadata *LowAndHigh[] = {
-        ConstantAsMetadata::get(ConstantInt::get(IT, DefiniteZeros)),
-        ConstantAsMetadata::get(ConstantInt::get(IT, PossibleZeros + 1))};
-    II.setMetadata(LLVMContext::MD_range,
-                   MDNode::get(II.getContext(), LowAndHigh));
+  // Add range attribute since known bits can't completely reflect what we know.
+  unsigned BitWidth = Op0->getType()->getScalarSizeInBits();
+  if (BitWidth != 1 && !II.hasRetAttr(Attribute::Range) &&
+      !II.getMetadata(LLVMContext::MD_range)) {
+    ConstantRange Range(APInt(BitWidth, DefiniteZeros),
+                        APInt(BitWidth, PossibleZeros + 1));
+    II.addRangeRetAttr(Range);
     return &II;
   }
 
@@ -686,16 +685,12 @@ static Instruction *foldCtpop(IntrinsicInst &II, InstCombinerImpl &IC) {
                                                   Constant::getNullValue(Ty)),
                             Ty);
 
-  // Add range metadata since known bits can't completely reflect what we know.
-  auto *IT = cast<IntegerType>(Ty->getScalarType());
-  unsigned MinCount = Known.countMinPopulation();
-  unsigned MaxCount = Known.countMaxPopulation();
-  if (IT->getBitWidth() != 1 && !II.getMetadata(LLVMContext::MD_range)) {
-    Metadata *LowAndHigh[] = {
-        ConstantAsMetadata::get(ConstantInt::get(IT, MinCount)),
-        ConstantAsMetadata::get(ConstantInt::get(IT, MaxCount + 1))};
-    II.setMetadata(LLVMContext::MD_range,
-                   MDNode::get(II.getContext(), LowAndHigh));
+  // Add range attribute since known bits can't completely reflect what we know.
+  if (BitWidth != 1 && !II.hasRetAttr(Attribute::Range) &&
+      !II.getMetadata(LLVMContext::MD_range)) {
+    ConstantRange Range(APInt(BitWidth, Known.countMinPopulation()),
+                        APInt(BitWidth, Known.countMaxPopulation() + 1));
+    II.addRangeRetAttr(Range);
     return &II;
   }
 
diff --git a/llvm/test/Transforms/InstCombine/bit_ceil.ll b/llvm/test/Transforms/InstCombine/bit_ceil.ll
index 52e70c78ba5428..16631afa4878da 100644
--- a/llvm/test/Transforms/InstCombine/bit_ceil.ll
+++ b/llvm/test/Transforms/InstCombine/bit_ceil.ll
@@ -5,7 +5,7 @@
 define i32 @bit_ceil_32(i32 %x) {
 ; CHECK-LABEL: @bit_ceil_32(
 ; CHECK-NEXT:    [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0:![0-9]+]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
 ; CHECK-NEXT:    [[TMP1:%.*]] = sub nsw i32 0, [[CTLZ]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = and i32 [[TMP1]], 31
 ; CHECK-NEXT:    [[SEL:%.*]] = shl nuw i32 1, [[TMP2]]
@@ -24,7 +24,7 @@ define i32 @bit_ceil_32(i32 %x) {
 define i64 @bit_ceil_64(i64 %x) {
 ; CHECK-LABEL: @bit_ceil_64(
 ; CHECK-NEXT:    [[DEC:%.*]] = add i64 [[X:%.*]], -1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i64 @llvm.ctlz.i64(i64 [[DEC]], i1 false), !range [[RNG1:![0-9]+]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i64 0, 65) i64 @llvm.ctlz.i64(i64 [[DEC]], i1 false)
 ; CHECK-NEXT:    [[TMP1:%.*]] = sub nsw i64 0, [[CTLZ]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = and i64 [[TMP1]], 63
 ; CHECK-NEXT:    [[SEL:%.*]] = shl nuw i64 1, [[TMP2]]
@@ -44,7 +44,7 @@ define i32 @bit_ceil_32_minus_1(i32 %x) {
 ; CHECK-LABEL: @bit_ceil_32_minus_1(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[SUB:%.*]] = add i32 [[X:%.*]], -2
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false)
 ; CHECK-NEXT:    [[TMP0:%.*]] = sub nsw i32 0, [[CTLZ]]
 ; CHECK-NEXT:    [[TMP1:%.*]] = and i32 [[TMP0]], 31
 ; CHECK-NEXT:    [[SEL:%.*]] = shl nuw i32 1, [[TMP1]]
@@ -64,7 +64,7 @@ entry:
 ; std::bit_ceil<uint32_t>(x + 1)
 define i32 @bit_ceil_32_plus_1(i32 %x) {
 ; CHECK-LABEL: @bit_ceil_32_plus_1(
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[X:%.*]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[X:%.*]], i1 false)
 ; CHECK-NEXT:    [[TMP1:%.*]] = sub nsw i32 0, [[CTLZ]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = and i32 [[TMP1]], 31
 ; CHECK-NEXT:    [[SEL:%.*]] = shl nuw i32 1, [[TMP2]]
@@ -84,7 +84,7 @@ define i32 @bit_ceil_plus_2(i32 %x) {
 ; CHECK-LABEL: @bit_ceil_plus_2(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[SUB:%.*]] = add i32 [[X:%.*]], 1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false)
 ; CHECK-NEXT:    [[TMP0:%.*]] = sub nsw i32 0, [[CTLZ]]
 ; CHECK-NEXT:    [[TMP1:%.*]] = and i32 [[TMP0]], 31
 ; CHECK-NEXT:    [[SEL:%.*]] = shl nuw i32 1, [[TMP1]]
@@ -105,7 +105,7 @@ define i32 @bit_ceil_32_neg(i32 %x) {
 ; CHECK-LABEL: @bit_ceil_32_neg(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[SUB:%.*]] = xor i32 [[X:%.*]], -1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false)
 ; CHECK-NEXT:    [[TMP0:%.*]] = sub nsw i32 0, [[CTLZ]]
 ; CHECK-NEXT:    [[TMP1:%.*]] = and i32 [[TMP0]], 31
 ; CHECK-NEXT:    [[SEL:%.*]] = shl nuw i32 1, [[TMP1]]
@@ -127,7 +127,7 @@ define i32 @bit_ceil_not(i32 %x) {
 ; CHECK-LABEL: @bit_ceil_not(
 ; CHECK-NEXT:  entry:
 ; CHECK-NEXT:    [[SUB:%.*]] = sub i32 -2, [[X:%.*]]
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false)
 ; CHECK-NEXT:    [[TMP0:%.*]] = sub nsw i32 0, [[CTLZ]]
 ; CHECK-NEXT:    [[TMP1:%.*]] = and i32 [[TMP0]], 31
 ; CHECK-NEXT:    [[SEL:%.*]] = shl nuw i32 1, [[TMP1]]
@@ -147,7 +147,7 @@ entry:
 define i32 @bit_ceil_commuted_operands(i32 %x) {
 ; CHECK-LABEL: @bit_ceil_commuted_operands(
 ; CHECK-NEXT:    [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    ret i32 [[SHL]]
@@ -165,7 +165,7 @@ define i32 @bit_ceil_commuted_operands(i32 %x) {
 define i32 @bit_ceil_wrong_select_constant(i32 %x) {
 ; CHECK-LABEL: @bit_ceil_wrong_select_constant(
 ; CHECK-NEXT:    [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[UGT_INV:%.*]] = icmp ult i32 [[X]], 2
@@ -185,7 +185,7 @@ define i32 @bit_ceil_wrong_select_constant(i32 %x) {
 define i32 @bit_ceil_32_wrong_cond(i32 %x) {
 ; CHECK-LABEL: @bit_ceil_32_wrong_cond(
 ; CHECK-NEXT:    [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[UGT:%.*]] = icmp ugt i32 [[X]], 2
@@ -205,7 +205,7 @@ define i32 @bit_ceil_32_wrong_cond(i32 %x) {
 define i32 @bit_ceil_wrong_sub_constant(i32 %x) {
 ; CHECK-LABEL: @bit_ceil_wrong_sub_constant(
 ; CHECK-NEXT:    [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 33, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[UGT:%.*]] = icmp ugt i32 [[X]], 1
@@ -225,7 +225,7 @@ define i32 @bit_ceil_wrong_sub_constant(i32 %x) {
 define i32 @bit_ceil_32_shl_used_twice(i32 %x, ptr %p) {
 ; CHECK-LABEL: @bit_ceil_32_shl_used_twice(
 ; CHECK-NEXT:    [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[UGT:%.*]] = icmp ugt i32 [[X]], 1
@@ -247,7 +247,7 @@ define i32 @bit_ceil_32_shl_used_twice(i32 %x, ptr %p) {
 define i32 @bit_ceil_32_sub_used_twice(i32 %x, ptr %p) {
 ; CHECK-LABEL: @bit_ceil_32_sub_used_twice(
 ; CHECK-NEXT:    [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[UGT:%.*]] = icmp ugt i32 [[X]], 1
@@ -269,7 +269,7 @@ define i32 @bit_ceil_32_sub_used_twice(i32 %x, ptr %p) {
 define <4 x i32> @bit_ceil_v4i32(<4 x i32> %x) {
 ; CHECK-LABEL: @bit_ceil_v4i32(
 ; CHECK-NEXT:    [[DEC:%.*]] = add <4 x i32> [[X:%.*]], <i32 -1, i32 -1, i32 -1, i32 -1>
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 0, 33) <4 x i32> @llvm.ctlz.v4i32(<4 x i32> [[DEC]], i1 false)
 ; CHECK-NEXT:    [[TMP1:%.*]] = sub nsw <4 x i32> zeroinitializer, [[CTLZ]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = and <4 x i32> [[TMP1]], <i32 31, i32 31, i32 31, i32 31>
 ; CHECK-NEXT:    [[SEL:%.*]] = shl nuw <4 x i32> <i32 1, i32 1, i32 1, i32 1>, [[TMP2]]
diff --git a/llvm/test/Transforms/InstCombine/bit_floor.ll b/llvm/test/Transforms/InstCombine/bit_floor.ll
index 9daa8eee8969c0..bd8aabf4431c0a 100644
--- a/llvm/test/Transforms/InstCombine/bit_floor.ll
+++ b/llvm/test/Transforms/InstCombine/bit_floor.ll
@@ -5,7 +5,7 @@ define i32 @bit_floor_32(i32 %x) {
 ; CHECK-LABEL: @bit_floor_32(
 ; CHECK-NEXT:    [[EQ0:%.*]] = icmp eq i32 [[X:%.*]], 0
 ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 [[X]], 1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false), !range [[RNG0:![0-9]+]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 1, 33) i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[EQ0]], i32 0, i32 [[SHL]]
@@ -24,7 +24,7 @@ define i64 @bit_floor_64(i64 %x) {
 ; CHECK-LABEL: @bit_floor_64(
 ; CHECK-NEXT:    [[EQ0:%.*]] = icmp eq i64 [[X:%.*]], 0
 ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i64 [[X]], 1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i64 @llvm.ctlz.i64(i64 [[LSHR]], i1 false), !range [[RNG1:![0-9]+]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i64 1, 65) i64 @llvm.ctlz.i64(i64 [[LSHR]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i64 64, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i64 1, [[SUB]]
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[EQ0]], i64 0, i64 [[SHL]]
@@ -44,7 +44,7 @@ define i32 @bit_floor_commuted_operands(i32 %x) {
 ; CHECK-LABEL: @bit_floor_commuted_operands(
 ; CHECK-NEXT:    [[NE0_NOT:%.*]] = icmp eq i32 [[X:%.*]], 0
 ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 [[X]], 1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 1, 33) i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[NE0_NOT]], i32 0, i32 [[SHL]]
@@ -64,7 +64,7 @@ define i32 @bit_floor_lshr_used_twice(i32 %x, ptr %p) {
 ; CHECK-LABEL: @bit_floor_lshr_used_twice(
 ; CHECK-NEXT:    [[EQ0:%.*]] = icmp eq i32 [[X:%.*]], 0
 ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 [[X]], 1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 1, 33) i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[EQ0]], i32 0, i32 [[SHL]]
@@ -86,7 +86,7 @@ define i32 @bit_floor_ctlz_used_twice(i32 %x, ptr %p) {
 ; CHECK-LABEL: @bit_floor_ctlz_used_twice(
 ; CHECK-NEXT:    [[EQ0:%.*]] = icmp eq i32 [[X:%.*]], 0
 ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 [[X]], 1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 1, 33) i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[EQ0]], i32 0, i32 [[SHL]]
@@ -108,7 +108,7 @@ define i32 @bit_floor_sub_used_twice(i32 %x, ptr %p) {
 ; CHECK-LABEL: @bit_floor_sub_used_twice(
 ; CHECK-NEXT:    [[EQ0:%.*]] = icmp eq i32 [[X:%.*]], 0
 ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 [[X]], 1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 1, 33) i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[EQ0]], i32 0, i32 [[SHL]]
@@ -130,7 +130,7 @@ define i32 @bit_floor_shl_used_twice(i32 %x, ptr %p) {
 ; CHECK-LABEL: @bit_floor_shl_used_twice(
 ; CHECK-NEXT:    [[EQ0:%.*]] = icmp eq i32 [[X:%.*]], 0
 ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 [[X]], 1
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 1, 33) i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
 ; CHECK-NEXT:    [[SEL:%.*]] = select i1 [[EQ0]], i32 0, i32 [[SHL]]
@@ -152,7 +152,7 @@ define <4 x i32> @bit_floor_v4i32(<4 x i32> %x) {
 ; CHECK-LABEL: @bit_floor_v4i32(
 ; CHECK-NEXT:    [[EQ0:%.*]] = icmp eq <4 x i32> [[X:%.*]], zeroinitializer
 ; CHECK-NEXT:    [[LSHR:%.*]] = lshr <4 x i32> [[X]], <i32 1, i32 1, i32 1, i32 1>
-; CHECK-NEXT:    [[CTLZ:%.*]] = tail call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> [[LSHR]], i1 false), !range [[RNG0]]
+; CHECK-NEXT:    [[CTLZ:%.*]] = tail call range(i32 1, 33) <4 x i32> @llvm.ctlz.v4i32(<4 x i32> [[LSHR]], i1 false)
 ; CHECK-NEXT:    [[SUB:%.*]] = sub nuw nsw <4 x i32> <i32 32, i32 32, i32 32, i32 32>, [[CTLZ]]
 ; CHECK-NEXT:    [[SHL:%.*]] = shl nuw <4 x i32> <i32 1, i32 1, i32 1, i32 1>, [[SUB]]
 ; CHECK-NEXT:    [[SEL:%.*]] = select <4 x i1> [[EQ0]], <4 x i32> zeroinitializer, <4 x i32> [[SHL]]
diff --git a/llvm/test/Transforms/InstCombine/cmp-intrinsic.ll b/llvm/test/Transforms/InstCombine/cmp-intrinsic.ll
index 66cbb2636cbc2b..9a9f359fa80b4a 100644
--- a/llvm/test/Transforms/InstCombine/cmp-intrinsic.ll
+++ b/llvm/test/Transforms/InstCombine/cmp-intrinsic.ll
@@ -125,7 +125,7 @@ define <2 x ...
[truncated]


case Instruction::Call:
case Instruction::Invoke:
cast<CallBase>(this)->removeRetAttr(Attribute::Range);
Contributor Author

This only removes the attribute from the Call/Invoke instruction, but Operator::hasPoisonGeneratingFlags() will also look at the callee for the attribute. I do not know how to handle this, as there is an assert at the end of that function checking that the two stay in sync.

Contributor

In hasPoisonGeneratingFlags you can explicitly fetch the AttributeList of the call and do the check there.
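
A small standalone sketch of the distinction being discussed (the helper names are mine, not from the patch):

#include "llvm/IR/Attributes.h"
#include "llvm/IR/InstrTypes.h"

using namespace llvm;

// CallBase::hasRetAttr() also consults the called function's attributes, so a
// range attribute on the callee declaration keeps this returning true even
// after removeRetAttr() has stripped the attribute from the call site.
static bool hasRangeOnCallOrCallee(const CallBase &CB) {
  return CB.hasRetAttr(Attribute::Range);
}

// Checking only the call-site AttributeList, as suggested above, looks at
// exactly what dropPoisonGeneratingFlags() is able to remove, so the two
// functions stay in sync.
static bool hasRangeOnCallSiteOnly(const CallBase &CB) {
  return CB.getAttributes().hasRetAttr(Attribute::Range);
}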

Contributor Author

I was thinking that, since all the other places call hasRetAttr, the instruction would still be treated as poison-generating.

Contributor

This is all a bit tricky, but the tl;dr is that we only care about call-site attributes in this context. Any attributes we'd have on an intrinsic function declaration would hold unconditionally and would not actually make the result more poisonous. E.g., for ctlz.i32 we could add range(i32 0, 33) to the declaration, but that will never make the result more poisonous. Non-intrinsics are currently not relevant in this context, because the surrounding analysis wouldn't support them anyway.

Contributor

@nikic left a comment

Could you please split out the changes to flag clearing into a separate PR? We'll also want to explicitly test them using some call-site of dropPoisonGeneratingFlagsAndMetadata().

And on that note, I don't think this change should be in hasPoisonGeneratingFlags/dropPoisonGeneratingFlags itself, which should stay just about flags. We should handle this in the "AndMetadata" variants -- possibly with a rename of the method, if we want to be pedantic (in the interest of not getting dropPoisonGeneratingFlagsAndMetadataAndAttributes, maybe something like dropPoisonGeneratingAnnotations?). It may make sense to handle the nonnull and align metadata at the same time, if we can come up with some examples of miscompiles for it (maybe something involving ptrmask?)
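
A rough sketch of the shape such a combined helper could take; the name follows the suggestion above, and the body is an assumption pieced together from the existing flag/metadata helpers, not the code that eventually landed:

#include "llvm/IR/Attributes.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"

using namespace llvm;

// Hypothetical free-function version of dropPoisonGeneratingAnnotations().
static void dropPoisonGeneratingAnnotations(Instruction &I) {
  I.dropPoisonGeneratingFlags();    // nuw/nsw, exact, inbounds, nneg, ...
  I.dropPoisonGeneratingMetadata(); // !range, !nonnull, !align
  if (auto *CB = dyn_cast<CallBase>(&I))
    CB->removeRetAttr(Attribute::Range); // the new call-site return attribute
}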

@andjo403
Contributor Author

I will try to split out the poison handling; I tried to do that first but was not able to come up with a test that miscompiled.
There is a test in Transforms/InstCombine/ispow2.ll that fails if the poison handling is removed from here, so the poison-handling PR needs to land first.

@nikic
Contributor

nikic commented Apr 16, 2024

I will try to split out the poison handling; I tried to do that first but was not able to come up with a test that miscompiled.

The test modified in cb6240d should work.

@dtcxzyw
Member

dtcxzyw commented Apr 16, 2024

Failed Tests (2):
LLVM :: Transforms/InstCombine/ffs-i16.ll
LLVM :: Transforms/InstCombine/fls-i16.ll

@andjo403
Contributor Author

Rebased to pick up the poison handling and fixed the tests (I noticed that I had only the X86 target in my build config), so this is ready for review again.

Contributor

@nikic left a comment

There are clang test failures:


Failed Tests (7):
  Clang :: CodeGen/SystemZ/builtins-systemz-zvector.c
  Clang :: CodeGen/SystemZ/builtins-systemz-zvector2.c
  Clang :: CodeGen/builtins-wasm.c
  Clang :: CodeGen/ms-intrinsics-other.c
  Clang :: CodeGen/ms-intrinsics.c
  Clang :: CodeGenOpenCL/builtins-generic-amdgcn.cl
  Clang :: Headers/wasm.c

llvmbot added the clang (Clang issues not falling into any other category) and backend:SystemZ labels on Apr 19, 2024
@andjo403
Contributor Author

This time it seems I have managed to get all the tests to pass, at least on Linux; Windows has not run yet.

Contributor

@nikic left a comment

LGTM

@andjo403
Contributor Author

I still have not asked for merge rights; I do not know when a good time to do that is, as there seems to be a lot of discussion about whether it is needed or not.

dtcxzyw merged commit b8f3024 into llvm:main on Apr 24, 2024
5 of 6 checks passed
andjo403 deleted the rangeOnCttzCtlzandCtpop branch on April 24, 2024 at 19:19
AlexMaclean added a commit that referenced this pull request Jun 6, 2024
Revamp the NVVMIntrRange pass, making the following updates:
- Use range attributes over range metadata. This is what InstCombine has
moved to for ranges on intrinsics in #88776, and it seems a bit cleaner.
- Consider the `!"maxntid{x,y,z}"` and `!"reqntid{x,y,z}"` function
metadata when adding ranges for the `tid` sreg intrinsics. This can allow
for smaller ranges and more optimization.
- When range attributes are already present, use the intersection of the
old and new range. This complements the metadata change by allowing
ranges to be shrunk when an intrinsic is in a function that is inlined
into a kernel with metadata. While we don't call this more than once
yet, we should consider adding a second call after inlining, once this
has had a chance to soak for a while and no issues have arisen.

I've also re-enabled this pass in the TM; it was disabled years ago due
to "numerical discrepancies" (https://reviews.llvm.org/D96166). In our
testing we haven't seen any issues with adding ranges to intrinsics, and
I cannot find any further info about what issues were encountered.
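
For the "use the intersection of the old and new range" point, a sketch of what that can look like on top of the API added in this PR (the helper name is illustrative and this is not the NVVMIntrRange code itself):

#include "llvm/IR/Attributes.h"
#include "llvm/IR/ConstantRange.h"
#include "llvm/IR/InstrTypes.h"

using namespace llvm;

// Attach NewCR as the return range of CB; if a range attribute is already
// present, keep only the intersection of the old and new ranges.
static void addOrIntersectRangeRetAttr(CallBase &CB, ConstantRange NewCR) {
  Attribute OldAttr =
      CB.getAttributes().getRetAttrs().getAttribute(Attribute::Range);
  if (OldAttr.isValid())
    NewCR = NewCR.intersectWith(OldAttr.getRange());
  CB.removeRetAttr(Attribute::Range); // avoid stacking two range attributes
  CB.addRangeRetAttr(NewCR);          // API introduced in this PR
}
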
Labels: backend:SystemZ, clang, llvm:ir, llvm:transforms