[InstCombine] Swap out range metadata to range attribute for cttz/ctlz/ctpop #88776
@llvm/pr-subscribers-backend-systemz @llvm/pr-subscribers-llvm-ir

Author: Andreas Jonson (andjo403)

Changes

From what I can find, all optimizations that use range metadata now also handle the range attribute, so I think we can start replacing writes of range metadata on call instructions with range attributes.

CC @nikic

Patch is 223.57 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/88776.diff

45 Files Affected:
diff --git a/llvm/include/llvm/IR/Attributes.h b/llvm/include/llvm/IR/Attributes.h
index 7dd8a329029a34..5e3ba1f32e6ab0 100644
--- a/llvm/include/llvm/IR/Attributes.h
+++ b/llvm/include/llvm/IR/Attributes.h
@@ -747,6 +747,11 @@ class AttributeList {
addDereferenceableOrNullParamAttr(LLVMContext &C, unsigned ArgNo,
uint64_t Bytes) const;
+ /// Add the range attribute to the attribute set at the return value index.
+ /// Returns a new list because attribute lists are immutable.
+ [[nodiscard]] AttributeList addRangeRetAttr(LLVMContext &C,
+ const ConstantRange &CR) const;
+
/// Add the allocsize attribute to the attribute set at the given arg index.
/// Returns a new list because attribute lists are immutable.
[[nodiscard]] AttributeList
diff --git a/llvm/include/llvm/IR/InstrTypes.h b/llvm/include/llvm/IR/InstrTypes.h
index cfe1b11ade5a4e..b66c0c28c86c85 100644
--- a/llvm/include/llvm/IR/InstrTypes.h
+++ b/llvm/include/llvm/IR/InstrTypes.h
@@ -1914,6 +1914,11 @@ class CallBase : public Instruction {
Attrs = Attrs.addDereferenceableRetAttr(getContext(), Bytes);
}
+ /// adds the range attribute to the list of attributes.
+ void addRangeRetAttr(const ConstantRange &CR) {
+ Attrs = Attrs.addRangeRetAttr(getContext(), CR);
+ }
+
/// Determine whether the return value has the given attribute.
bool hasRetAttr(Attribute::AttrKind Kind) const {
return hasRetAttrImpl(Kind);
diff --git a/llvm/lib/IR/Attributes.cpp b/llvm/lib/IR/Attributes.cpp
index b2d9992cdc0258..9c48a481de1ff6 100644
--- a/llvm/lib/IR/Attributes.cpp
+++ b/llvm/lib/IR/Attributes.cpp
@@ -1530,6 +1530,13 @@ AttributeList::addDereferenceableOrNullParamAttr(LLVMContext &C, unsigned Index,
return addParamAttributes(C, Index, B);
}
+AttributeList AttributeList::addRangeRetAttr(LLVMContext &C,
+ const ConstantRange &CR) const {
+ AttrBuilder B(C);
+ B.addRangeAttr(CR);
+ return addRetAttributes(C, B);
+}
+
AttributeList AttributeList::addAllocSizeParamAttr(
LLVMContext &C, unsigned Index, unsigned ElemSizeArg,
const std::optional<unsigned> &NumElemsArg) {
diff --git a/llvm/lib/IR/Instruction.cpp b/llvm/lib/IR/Instruction.cpp
index b9efe9cdcfe310..da36a9e9dfb46f 100644
--- a/llvm/lib/IR/Instruction.cpp
+++ b/llvm/lib/IR/Instruction.cpp
@@ -13,7 +13,9 @@
#include "llvm/IR/Instruction.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/IR/AttributeMask.h"
+#include "llvm/IR/Attributes.h"
#include "llvm/IR/Constants.h"
+#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"
@@ -450,6 +452,10 @@ void Instruction::dropPoisonGeneratingFlags() {
cast<TruncInst>(this)->setHasNoUnsignedWrap(false);
cast<TruncInst>(this)->setHasNoSignedWrap(false);
break;
+
+ case Instruction::Call:
+ case Instruction::Invoke:
+ cast<CallBase>(this)->removeRetAttr(Attribute::Range);
}
if (isa<FPMathOperator>(this)) {
diff --git a/llvm/lib/IR/Operator.cpp b/llvm/lib/IR/Operator.cpp
index ccc624d854429c..47ee5ed635ee89 100644
--- a/llvm/lib/IR/Operator.cpp
+++ b/llvm/lib/IR/Operator.cpp
@@ -11,8 +11,10 @@
//===----------------------------------------------------------------------===//
#include "llvm/IR/Operator.h"
+#include "llvm/IR/Attributes.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/GetElementPtrTypeIterator.h"
+#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instructions.h"
#include "ConstantsContext.h"
@@ -49,6 +51,9 @@ bool Operator::hasPoisonGeneratingFlags() const {
if (auto *NNI = dyn_cast<PossiblyNonNegInst>(this))
return NNI->hasNonNeg();
return false;
+ case Instruction::Call:
+ case Instruction::Invoke:
+ return cast<CallBase>(this)->hasRetAttr(Attribute::Range);
default:
if (const auto *FP = dyn_cast<FPMathOperator>(this))
return FP->hasNoNaNs() || FP->hasNoInfs();
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
index 20f51c8af617de..554e50a2890af1 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -607,14 +607,13 @@ static Instruction *foldCttzCtlz(IntrinsicInst &II, InstCombinerImpl &IC) {
return IC.replaceOperand(II, 1, IC.Builder.getTrue());
}
- // Add range metadata since known bits can't completely reflect what we know.
- auto *IT = cast<IntegerType>(Op0->getType()->getScalarType());
- if (IT && IT->getBitWidth() != 1 && !II.getMetadata(LLVMContext::MD_range)) {
- Metadata *LowAndHigh[] = {
- ConstantAsMetadata::get(ConstantInt::get(IT, DefiniteZeros)),
- ConstantAsMetadata::get(ConstantInt::get(IT, PossibleZeros + 1))};
- II.setMetadata(LLVMContext::MD_range,
- MDNode::get(II.getContext(), LowAndHigh));
+ // Add range attribute since known bits can't completely reflect what we know.
+ unsigned BitWidth = Op0->getType()->getScalarSizeInBits();
+ if (BitWidth != 1 && !II.hasRetAttr(Attribute::Range) &&
+ !II.getMetadata(LLVMContext::MD_range)) {
+ ConstantRange Range(APInt(BitWidth, DefiniteZeros),
+ APInt(BitWidth, PossibleZeros + 1));
+ II.addRangeRetAttr(Range);
return &II;
}
@@ -686,16 +685,12 @@ static Instruction *foldCtpop(IntrinsicInst &II, InstCombinerImpl &IC) {
Constant::getNullValue(Ty)),
Ty);
- // Add range metadata since known bits can't completely reflect what we know.
- auto *IT = cast<IntegerType>(Ty->getScalarType());
- unsigned MinCount = Known.countMinPopulation();
- unsigned MaxCount = Known.countMaxPopulation();
- if (IT->getBitWidth() != 1 && !II.getMetadata(LLVMContext::MD_range)) {
- Metadata *LowAndHigh[] = {
- ConstantAsMetadata::get(ConstantInt::get(IT, MinCount)),
- ConstantAsMetadata::get(ConstantInt::get(IT, MaxCount + 1))};
- II.setMetadata(LLVMContext::MD_range,
- MDNode::get(II.getContext(), LowAndHigh));
+ // Add range attribute since known bits can't completely reflect what we know.
+ if (BitWidth != 1 && !II.hasRetAttr(Attribute::Range) &&
+ !II.getMetadata(LLVMContext::MD_range)) {
+ ConstantRange Range(APInt(BitWidth, Known.countMinPopulation()),
+ APInt(BitWidth, Known.countMaxPopulation() + 1));
+ II.addRangeRetAttr(Range);
return &II;
}
diff --git a/llvm/test/Transforms/InstCombine/bit_ceil.ll b/llvm/test/Transforms/InstCombine/bit_ceil.ll
index 52e70c78ba5428..16631afa4878da 100644
--- a/llvm/test/Transforms/InstCombine/bit_ceil.ll
+++ b/llvm/test/Transforms/InstCombine/bit_ceil.ll
@@ -5,7 +5,7 @@
define i32 @bit_ceil_32(i32 %x) {
; CHECK-LABEL: @bit_ceil_32(
; CHECK-NEXT: [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0:![0-9]+]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
; CHECK-NEXT: [[TMP1:%.*]] = sub nsw i32 0, [[CTLZ]]
; CHECK-NEXT: [[TMP2:%.*]] = and i32 [[TMP1]], 31
; CHECK-NEXT: [[SEL:%.*]] = shl nuw i32 1, [[TMP2]]
@@ -24,7 +24,7 @@ define i32 @bit_ceil_32(i32 %x) {
define i64 @bit_ceil_64(i64 %x) {
; CHECK-LABEL: @bit_ceil_64(
; CHECK-NEXT: [[DEC:%.*]] = add i64 [[X:%.*]], -1
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call i64 @llvm.ctlz.i64(i64 [[DEC]], i1 false), !range [[RNG1:![0-9]+]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i64 0, 65) i64 @llvm.ctlz.i64(i64 [[DEC]], i1 false)
; CHECK-NEXT: [[TMP1:%.*]] = sub nsw i64 0, [[CTLZ]]
; CHECK-NEXT: [[TMP2:%.*]] = and i64 [[TMP1]], 63
; CHECK-NEXT: [[SEL:%.*]] = shl nuw i64 1, [[TMP2]]
@@ -44,7 +44,7 @@ define i32 @bit_ceil_32_minus_1(i32 %x) {
; CHECK-LABEL: @bit_ceil_32_minus_1(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[SUB:%.*]] = add i32 [[X:%.*]], -2
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false), !range [[RNG0]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false)
; CHECK-NEXT: [[TMP0:%.*]] = sub nsw i32 0, [[CTLZ]]
; CHECK-NEXT: [[TMP1:%.*]] = and i32 [[TMP0]], 31
; CHECK-NEXT: [[SEL:%.*]] = shl nuw i32 1, [[TMP1]]
@@ -64,7 +64,7 @@ entry:
; std::bit_ceil<uint32_t>(x + 1)
define i32 @bit_ceil_32_plus_1(i32 %x) {
; CHECK-LABEL: @bit_ceil_32_plus_1(
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[X:%.*]], i1 false), !range [[RNG0]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[X:%.*]], i1 false)
; CHECK-NEXT: [[TMP1:%.*]] = sub nsw i32 0, [[CTLZ]]
; CHECK-NEXT: [[TMP2:%.*]] = and i32 [[TMP1]], 31
; CHECK-NEXT: [[SEL:%.*]] = shl nuw i32 1, [[TMP2]]
@@ -84,7 +84,7 @@ define i32 @bit_ceil_plus_2(i32 %x) {
; CHECK-LABEL: @bit_ceil_plus_2(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[SUB:%.*]] = add i32 [[X:%.*]], 1
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false), !range [[RNG0]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false)
; CHECK-NEXT: [[TMP0:%.*]] = sub nsw i32 0, [[CTLZ]]
; CHECK-NEXT: [[TMP1:%.*]] = and i32 [[TMP0]], 31
; CHECK-NEXT: [[SEL:%.*]] = shl nuw i32 1, [[TMP1]]
@@ -105,7 +105,7 @@ define i32 @bit_ceil_32_neg(i32 %x) {
; CHECK-LABEL: @bit_ceil_32_neg(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[SUB:%.*]] = xor i32 [[X:%.*]], -1
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false), !range [[RNG0]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false)
; CHECK-NEXT: [[TMP0:%.*]] = sub nsw i32 0, [[CTLZ]]
; CHECK-NEXT: [[TMP1:%.*]] = and i32 [[TMP0]], 31
; CHECK-NEXT: [[SEL:%.*]] = shl nuw i32 1, [[TMP1]]
@@ -127,7 +127,7 @@ define i32 @bit_ceil_not(i32 %x) {
; CHECK-LABEL: @bit_ceil_not(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[SUB:%.*]] = sub i32 -2, [[X:%.*]]
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false), !range [[RNG0]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[SUB]], i1 false)
; CHECK-NEXT: [[TMP0:%.*]] = sub nsw i32 0, [[CTLZ]]
; CHECK-NEXT: [[TMP1:%.*]] = and i32 [[TMP0]], 31
; CHECK-NEXT: [[SEL:%.*]] = shl nuw i32 1, [[TMP1]]
@@ -147,7 +147,7 @@ entry:
define i32 @bit_ceil_commuted_operands(i32 %x) {
; CHECK-LABEL: @bit_ceil_commuted_operands(
; CHECK-NEXT: [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
; CHECK-NEXT: [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
; CHECK-NEXT: [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
; CHECK-NEXT: ret i32 [[SHL]]
@@ -165,7 +165,7 @@ define i32 @bit_ceil_commuted_operands(i32 %x) {
define i32 @bit_ceil_wrong_select_constant(i32 %x) {
; CHECK-LABEL: @bit_ceil_wrong_select_constant(
; CHECK-NEXT: [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
; CHECK-NEXT: [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
; CHECK-NEXT: [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
; CHECK-NEXT: [[UGT_INV:%.*]] = icmp ult i32 [[X]], 2
@@ -185,7 +185,7 @@ define i32 @bit_ceil_wrong_select_constant(i32 %x) {
define i32 @bit_ceil_32_wrong_cond(i32 %x) {
; CHECK-LABEL: @bit_ceil_32_wrong_cond(
; CHECK-NEXT: [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
; CHECK-NEXT: [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
; CHECK-NEXT: [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
; CHECK-NEXT: [[UGT:%.*]] = icmp ugt i32 [[X]], 2
@@ -205,7 +205,7 @@ define i32 @bit_ceil_32_wrong_cond(i32 %x) {
define i32 @bit_ceil_wrong_sub_constant(i32 %x) {
; CHECK-LABEL: @bit_ceil_wrong_sub_constant(
; CHECK-NEXT: [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
; CHECK-NEXT: [[SUB:%.*]] = sub nuw nsw i32 33, [[CTLZ]]
; CHECK-NEXT: [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
; CHECK-NEXT: [[UGT:%.*]] = icmp ugt i32 [[X]], 1
@@ -225,7 +225,7 @@ define i32 @bit_ceil_wrong_sub_constant(i32 %x) {
define i32 @bit_ceil_32_shl_used_twice(i32 %x, ptr %p) {
; CHECK-LABEL: @bit_ceil_32_shl_used_twice(
; CHECK-NEXT: [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
; CHECK-NEXT: [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
; CHECK-NEXT: [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
; CHECK-NEXT: [[UGT:%.*]] = icmp ugt i32 [[X]], 1
@@ -247,7 +247,7 @@ define i32 @bit_ceil_32_shl_used_twice(i32 %x, ptr %p) {
define i32 @bit_ceil_32_sub_used_twice(i32 %x, ptr %p) {
; CHECK-LABEL: @bit_ceil_32_sub_used_twice(
; CHECK-NEXT: [[DEC:%.*]] = add i32 [[X:%.*]], -1
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i32 0, 33) i32 @llvm.ctlz.i32(i32 [[DEC]], i1 false)
; CHECK-NEXT: [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
; CHECK-NEXT: [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
; CHECK-NEXT: [[UGT:%.*]] = icmp ugt i32 [[X]], 1
@@ -269,7 +269,7 @@ define i32 @bit_ceil_32_sub_used_twice(i32 %x, ptr %p) {
define <4 x i32> @bit_ceil_v4i32(<4 x i32> %x) {
; CHECK-LABEL: @bit_ceil_v4i32(
; CHECK-NEXT: [[DEC:%.*]] = add <4 x i32> [[X:%.*]], <i32 -1, i32 -1, i32 -1, i32 -1>
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> [[DEC]], i1 false), !range [[RNG0]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i32 0, 33) <4 x i32> @llvm.ctlz.v4i32(<4 x i32> [[DEC]], i1 false)
; CHECK-NEXT: [[TMP1:%.*]] = sub nsw <4 x i32> zeroinitializer, [[CTLZ]]
; CHECK-NEXT: [[TMP2:%.*]] = and <4 x i32> [[TMP1]], <i32 31, i32 31, i32 31, i32 31>
; CHECK-NEXT: [[SEL:%.*]] = shl nuw <4 x i32> <i32 1, i32 1, i32 1, i32 1>, [[TMP2]]
diff --git a/llvm/test/Transforms/InstCombine/bit_floor.ll b/llvm/test/Transforms/InstCombine/bit_floor.ll
index 9daa8eee8969c0..bd8aabf4431c0a 100644
--- a/llvm/test/Transforms/InstCombine/bit_floor.ll
+++ b/llvm/test/Transforms/InstCombine/bit_floor.ll
@@ -5,7 +5,7 @@ define i32 @bit_floor_32(i32 %x) {
; CHECK-LABEL: @bit_floor_32(
; CHECK-NEXT: [[EQ0:%.*]] = icmp eq i32 [[X:%.*]], 0
; CHECK-NEXT: [[LSHR:%.*]] = lshr i32 [[X]], 1
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false), !range [[RNG0:![0-9]+]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i32 1, 33) i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false)
; CHECK-NEXT: [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
; CHECK-NEXT: [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
; CHECK-NEXT: [[SEL:%.*]] = select i1 [[EQ0]], i32 0, i32 [[SHL]]
@@ -24,7 +24,7 @@ define i64 @bit_floor_64(i64 %x) {
; CHECK-LABEL: @bit_floor_64(
; CHECK-NEXT: [[EQ0:%.*]] = icmp eq i64 [[X:%.*]], 0
; CHECK-NEXT: [[LSHR:%.*]] = lshr i64 [[X]], 1
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call i64 @llvm.ctlz.i64(i64 [[LSHR]], i1 false), !range [[RNG1:![0-9]+]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i64 1, 65) i64 @llvm.ctlz.i64(i64 [[LSHR]], i1 false)
; CHECK-NEXT: [[SUB:%.*]] = sub nuw nsw i64 64, [[CTLZ]]
; CHECK-NEXT: [[SHL:%.*]] = shl nuw i64 1, [[SUB]]
; CHECK-NEXT: [[SEL:%.*]] = select i1 [[EQ0]], i64 0, i64 [[SHL]]
@@ -44,7 +44,7 @@ define i32 @bit_floor_commuted_operands(i32 %x) {
; CHECK-LABEL: @bit_floor_commuted_operands(
; CHECK-NEXT: [[NE0_NOT:%.*]] = icmp eq i32 [[X:%.*]], 0
; CHECK-NEXT: [[LSHR:%.*]] = lshr i32 [[X]], 1
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false), !range [[RNG0]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i32 1, 33) i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false)
; CHECK-NEXT: [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
; CHECK-NEXT: [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
; CHECK-NEXT: [[SEL:%.*]] = select i1 [[NE0_NOT]], i32 0, i32 [[SHL]]
@@ -64,7 +64,7 @@ define i32 @bit_floor_lshr_used_twice(i32 %x, ptr %p) {
; CHECK-LABEL: @bit_floor_lshr_used_twice(
; CHECK-NEXT: [[EQ0:%.*]] = icmp eq i32 [[X:%.*]], 0
; CHECK-NEXT: [[LSHR:%.*]] = lshr i32 [[X]], 1
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false), !range [[RNG0]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i32 1, 33) i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false)
; CHECK-NEXT: [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
; CHECK-NEXT: [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
; CHECK-NEXT: [[SEL:%.*]] = select i1 [[EQ0]], i32 0, i32 [[SHL]]
@@ -86,7 +86,7 @@ define i32 @bit_floor_ctlz_used_twice(i32 %x, ptr %p) {
; CHECK-LABEL: @bit_floor_ctlz_used_twice(
; CHECK-NEXT: [[EQ0:%.*]] = icmp eq i32 [[X:%.*]], 0
; CHECK-NEXT: [[LSHR:%.*]] = lshr i32 [[X]], 1
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false), !range [[RNG0]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i32 1, 33) i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false)
; CHECK-NEXT: [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
; CHECK-NEXT: [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
; CHECK-NEXT: [[SEL:%.*]] = select i1 [[EQ0]], i32 0, i32 [[SHL]]
@@ -108,7 +108,7 @@ define i32 @bit_floor_sub_used_twice(i32 %x, ptr %p) {
; CHECK-LABEL: @bit_floor_sub_used_twice(
; CHECK-NEXT: [[EQ0:%.*]] = icmp eq i32 [[X:%.*]], 0
; CHECK-NEXT: [[LSHR:%.*]] = lshr i32 [[X]], 1
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false), !range [[RNG0]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i32 1, 33) i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false)
; CHECK-NEXT: [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
; CHECK-NEXT: [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
; CHECK-NEXT: [[SEL:%.*]] = select i1 [[EQ0]], i32 0, i32 [[SHL]]
@@ -130,7 +130,7 @@ define i32 @bit_floor_shl_used_twice(i32 %x, ptr %p) {
; CHECK-LABEL: @bit_floor_shl_used_twice(
; CHECK-NEXT: [[EQ0:%.*]] = icmp eq i32 [[X:%.*]], 0
; CHECK-NEXT: [[LSHR:%.*]] = lshr i32 [[X]], 1
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false), !range [[RNG0]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i32 1, 33) i32 @llvm.ctlz.i32(i32 [[LSHR]], i1 false)
; CHECK-NEXT: [[SUB:%.*]] = sub nuw nsw i32 32, [[CTLZ]]
; CHECK-NEXT: [[SHL:%.*]] = shl nuw i32 1, [[SUB]]
; CHECK-NEXT: [[SEL:%.*]] = select i1 [[EQ0]], i32 0, i32 [[SHL]]
@@ -152,7 +152,7 @@ define <4 x i32> @bit_floor_v4i32(<4 x i32> %x) {
; CHECK-LABEL: @bit_floor_v4i32(
; CHECK-NEXT: [[EQ0:%.*]] = icmp eq <4 x i32> [[X:%.*]], zeroinitializer
; CHECK-NEXT: [[LSHR:%.*]] = lshr <4 x i32> [[X]], <i32 1, i32 1, i32 1, i32 1>
-; CHECK-NEXT: [[CTLZ:%.*]] = tail call <4 x i32> @llvm.ctlz.v4i32(<4 x i32> [[LSHR]], i1 false), !range [[RNG0]]
+; CHECK-NEXT: [[CTLZ:%.*]] = tail call range(i32 1, 33) <4 x i32> @llvm.ctlz.v4i32(<4 x i32> [[LSHR]], i1 false)
; CHECK-NEXT: [[SUB:%.*]] = sub nuw nsw <4 x i32> <i32 32, i32 32, i32 32, i32 32>, [[CTLZ]]
; CHECK-NEXT: [[SHL:%.*]] = shl nuw <4 x i32> <i32 1, i32 1, i32 1, i32 1>, [[SUB]]
; CHECK-NEXT: [[SEL:%.*]] = select <4 x i1> [[EQ0]], <4 x i32> zeroinitializer, <4 x i32> [[SHL]]
diff --git a/llvm/test/Transforms/InstCombine/cmp-intrinsic.ll b/llvm/test/Transforms/InstCombine/cmp-intrinsic.ll
index 66cbb2636cbc2b..9a9f359fa80b4a 100644
--- a/llvm/test/Transforms/InstCombine/cmp-intrinsic.ll
+++ b/llvm/test/Transforms/InstCombine/cmp-intrinsic.ll
@@ -125,7 +125,7 @@ define <2 x ...
[truncated]
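The range written by foldCttzCtlz and foldCtpop is the half-open interval [minimum count, maximum count + 1), which is why the updated tests print an exclusive upper bound such as range(i32 0, 33). The following is a minimal standalone model of that arithmetic, not LLVM code: Known32, Range, and the helper functions are hypothetical names. The visible hunk uses DefiniteZeros and PossibleZeros without showing how they are computed, so the sketch assumes they are the minimum and maximum number of leading (or trailing) zeros implied by the operand's known bits, analogous to how Known.countMinPopulation()/countMaxPopulation() bound ctpop in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::KnownBits on an i32 (not the LLVM class):
// a bit set in Zero is known to be 0, a bit set in One is known to be 1.
struct Known32 {
  uint32_t Zero;
  uint32_t One;
};

// Half-open [Lo, Hi), the same convention as ConstantRange and range(i32 Lo, Hi).
struct Range {
  unsigned Lo;
  unsigned Hi;
};

// Count consecutive set bits starting from the most significant bit.
unsigned leadingOnes(uint32_t V) {
  unsigned N = 0;
  for (uint32_t Bit = 0x80000000u; Bit != 0 && (V & Bit) != 0; Bit >>= 1)
    ++N;
  return N;
}

// Count consecutive set bits starting from the least significant bit.
unsigned trailingOnes(uint32_t V) {
  unsigned N = 0;
  for (uint32_t Bit = 1u; Bit != 0 && (V & Bit) != 0; Bit <<= 1)
    ++N;
  return N;
}

// Number of set bits (clears the lowest set bit each iteration).
unsigned popCount(uint32_t V) {
  unsigned N = 0;
  for (; V != 0; V &= V - 1)
    ++N;
  return N;
}

// ctlz: leading known-zero bits are definite zeros; every leading bit that is
// not known to be one could still be zero.
Range ctlzRange(Known32 K) {
  unsigned DefiniteZeros = leadingOnes(K.Zero);
  unsigned PossibleZeros = leadingOnes(~K.One);
  return Range{DefiniteZeros, PossibleZeros + 1};
}

// cttz: the same reasoning starting from the low end.
Range cttzRange(Known32 K) {
  unsigned DefiniteZeros = trailingOnes(K.Zero);
  unsigned PossibleZeros = trailingOnes(~K.One);
  return Range{DefiniteZeros, PossibleZeros + 1};
}

// ctpop: known ones are a lower bound; bits not known to be zero are an upper bound.
Range ctpopRange(Known32 K) {
  unsigned MinCount = popCount(K.One);
  unsigned MaxCount = 32 - popCount(K.Zero);
  return Range{MinCount, MaxCount + 1};
}
```

With nothing known about the operand, ctlzRange returns [0, 33), matching the bit_ceil expectations; with the top bit known zero (the `lshr x, 1` in the bit_floor tests) it returns [1, 33), matching range(i32 1, 33). For the <4 x i32> tests the attribute is still written against the element type, because the patch uses getScalarSizeInBits(). For i1 the exclusive upper bound would be 2, which does not fit in one bit; that is presumably one reason the patch skips BitWidth == 1.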
llvm/lib/IR/Instruction.cpp (outdated)

case Instruction::Call:
case Instruction::Invoke:
cast<CallBase>(this)->removeRetAttr(Attribute::Range);
This only removes the attribute from the Call/Invoke instruction, but Operator::hasPoisonGeneratingFlags() also looks at the callee for the attribute. I do not know how to handle this, as there is an assert at the end of this function requiring that the two stay in sync.
In hasPoisonGeneratingFlags you can explicitly fetch the AttributeList of the call and do the check there.
I was thinking that, since all other places call hasRetAttr (which also sees the callee's attributes), the instruction would still be treated as producing poison.
This is all a bit tricky, but the tl;dr is that we only care about call-site attributes in this context. Any attributes we'd have on an intrinsic function declaration would hold unconditionally and would not actually make the result more poisonous. E.g., for ctlz.i32 we could add range(i32 0, 33) to the declaration, but that would never make the result more poisonous. Non-intrinsics are currently not relevant in this context, because the surrounding analysis wouldn't support them anyway.
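To make the declaration-versus-call-site distinction concrete, here is a standalone sketch (not LLVM code; Range, ctlz32, callWithRange, and neverPoison are hypothetical names) that models a ctlz.i32 call whose result becomes poison when it falls outside a range attribute. A declaration-level range(i32 0, 33) holds for every input, so it never introduces poison. A call-site range(i32 1, 33), justified only because the operand was `lshr x, 1` as in the bit_floor tests, does produce poison if, for example, a transform lets an operand with the top bit set reach the call; that is why such an attribute has to be dropped together with poison-generating flags.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Half-open [Lo, Hi), as in range(i32 Lo, Hi).
struct Range {
  uint32_t Lo;
  uint32_t Hi;
};

// Reference ctlz on i32 with is_zero_poison == false, so ctlz32(0) == 32.
uint32_t ctlz32(uint32_t X) {
  uint32_t N = 0;
  for (uint32_t Bit = 0x80000000u; Bit != 0 && (X & Bit) == 0; Bit >>= 1)
    ++N;
  return N;
}

// A call carrying a range return attribute: a result outside [Lo, Hi) is
// poison, modelled here as std::nullopt.
std::optional<uint32_t> callWithRange(uint32_t X, Range R) {
  uint32_t V = ctlz32(X);
  if (V < R.Lo || V >= R.Hi)
    return std::nullopt;
  return V;
}

// ctlz depends only on the position of the highest set bit, so 0 and the 32
// powers of two together produce every possible result (0..32).
bool neverPoison(Range R) {
  if (!callWithRange(0u, R).has_value())
    return false;
  for (uint32_t K = 0; K < 32; ++K)
    if (!callWithRange(1u << K, R).has_value())
      return false;
  return true;
}
```

Under these assumptions, neverPoison(Range{0, 33}) is true, so the declaration-level attribute adds nothing that could be "dropped", while neverPoison(Range{1, 33}) is false: the call-site attribute is only safe as long as the reasoning that produced it still holds.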
Could you please split out the changes to flag clearing into a separate PR? We'll also want to explicitly test them using some call-site of dropPoisonGeneratingFlagsAndMetadata().
And on that note, I don't think this change should be in hasPoisonGeneratingFlags/dropPoisonGeneratingFlags itself, which should stay just about flags. We should handle this in the "AndMetadata" variants -- possibly with a rename of the method, if we want to be pedantic (in the interest of not getting dropPoisonGeneratingFlagsAndMetadataAndAttributes, maybe something like dropPoisonGeneratingAnnotations?). It may make sense to handle the nonnull and align metadata at the same time, if we can come up with some examples of miscompiles for it (maybe something involving ptrmask?)
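The layering suggested in this comment can be sketched with a toy model (hypothetical types and functions, not LLVM's implementation, which may differ): the flag-only pair stays about flags, and a separate annotation-level pair, using the dropPoisonGeneratingAnnotations name floated in the comment, builds on it to also clear metadata and the call-site range attribute, so the "has" query and the "drop" routine stay in sync. Whether nonnull and align metadata belong in the set is left open in the discussion; the sketch includes them only to show where they would go.

```cpp
#include <cassert>

// Hypothetical, heavily simplified model of an instruction's annotations.
struct ToyInst {
  bool NoUnsignedWrap = false;     // poison-generating flag (e.g. nuw)
  bool NoSignedWrap = false;       // poison-generating flag (e.g. nsw)
  bool RangeMetadata = false;      // !range
  bool NonNullMetadata = false;    // !nonnull (inclusion still under discussion)
  bool AlignMetadata = false;      // !align (inclusion still under discussion)
  bool CallSiteRangeAttr = false;  // range(...) on the call's return value
};

// Flag-only query and drop: these know nothing about metadata or attributes.
bool hasPoisonGeneratingFlags(const ToyInst &I) {
  return I.NoUnsignedWrap || I.NoSignedWrap;
}
void dropPoisonGeneratingFlags(ToyInst &I) {
  I.NoUnsignedWrap = false;
  I.NoSignedWrap = false;
}

// Annotation-level query and drop, built on top of the flag-only pair.
bool hasPoisonGeneratingAnnotations(const ToyInst &I) {
  return hasPoisonGeneratingFlags(I) || I.RangeMetadata || I.NonNullMetadata ||
         I.AlignMetadata || I.CallSiteRangeAttr;
}
void dropPoisonGeneratingAnnotations(ToyInst &I) {
  dropPoisonGeneratingFlags(I);
  I.RangeMetadata = false;
  I.NonNullMetadata = false;
  I.AlignMetadata = false;
  I.CallSiteRangeAttr = false;
  // The kind of "has/drop stay in sync" check discussed in the review.
  assert(!hasPoisonGeneratingAnnotations(I));
}
```

In this model, calling dropPoisonGeneratingFlags on a call with a call-site range attribute leaves the attribute in place, and only the annotation-level routine removes it; that separation is the point of keeping the flag functions narrow.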
I will try to split out the poison handling. I tried to do that first but was not able to come up with a test that miscompiled.
The test modified in cb6240d should work.
Force-pushed from 9db3377 to bd3e83f.
Rebased to pick up the poison handling and fixed the test (I noticed that I had only the x86 target enabled in my build config), so this is ready for review again.
There are clang test failures:
Failed Tests (7):
  Clang :: CodeGen/SystemZ/builtins-systemz-zvector.c
  Clang :: CodeGen/SystemZ/builtins-systemz-zvector2.c
  Clang :: CodeGen/builtins-wasm.c
  Clang :: CodeGen/ms-intrinsics-other.c
  Clang :: CodeGen/ms-intrinsics.c
  Clang :: CodeGenOpenCL/builtins-generic-amdgcn.cl
  Clang :: Headers/wasm.c
This time it seems that I have managed to get all the tests to pass, at least on Linux; the Windows build has not run yet.
LGTM
I still have not asked for merge rights. I don't know when a good time to do that is; it feels like there is a lot of discussion about whether it is needed or not.
Revamp the NVVMIntrRange pass, making the following updates:
- Use range attributes instead of range metadata. This is what InstCombine has moved to for ranges on intrinsics in #88776, and it seems a bit cleaner.
- Consider the `!"maxntid{x,y,z}"` and `!"reqntid{x,y,z}"` function metadata when adding ranges for `tid` sreg intrinsics. This can allow for smaller ranges and more optimization.
- When range attributes are already present, use the intersection of the old and new range. This complements the metadata change by allowing ranges to be shrunk when an intrinsic is in a function which is inlined into a kernel with metadata. While we don't call this more than once yet, we should consider adding a second call after inlining, once this has had a chance to soak for a while and no issues have arisen.

I've also re-enabled this pass in the TM; it was disabled years ago due to "numerical discrepancies" https://reviews.llvm.org/D96166. In our testing we haven't seen any issues with adding ranges to intrinsics, and I cannot find any further info about what issues were encountered.
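The "intersect old and new range" step can be sketched as follows, assuming non-wrapping half-open ranges; Range, intersect, and combine are hypothetical names, and the numbers in the usage note are illustrative rather than taken from the NVVMIntrRange pass. LLVM's own ConstantRange::intersectWith also handles wrapped ranges, which this sketch does not.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Non-wrapping half-open range [Lo, Hi) with Lo < Hi.
struct Range {
  uint32_t Lo;
  uint32_t Hi;
};

// Intersect two non-wrapping ranges; std::nullopt means the intersection is
// empty (how the pass itself treats that case is not covered here).
std::optional<Range> intersect(Range A, Range B) {
  uint32_t Lo = std::max(A.Lo, B.Lo);
  uint32_t Hi = std::min(A.Hi, B.Hi);
  if (Lo >= Hi)
    return std::nullopt;
  return Range{Lo, Hi};
}

// Combine an optional existing annotation with a newly computed range:
// keep the new range if nothing was there, otherwise intersect the two.
std::optional<Range> combine(std::optional<Range> Existing, Range New) {
  if (!Existing)
    return New;
  return intersect(*Existing, New);
}
```

For example, an existing range(i32 0, 1024) combined with a tighter bound [0, 256) derived from a hypothetical maxntid value of 256 yields [0, 256). Because intersection can only shrink a range, running the pass a second time after inlining could never widen an annotation placed earlier.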