New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Clang] Correct __builtin_dynamic_object_size for subobject types #78526
Conversation
The second argument of __builtin_dynamic_object_size controls whether it returns the size of the whole object or the closest surrounding object. For this struct: struct s { int foo; char bar[2][40]; int baz; int qux; }; int main(int argc, char **argv) { struct s f; #define report(x) fprintf(stderr, #x ": %zu\n", x) argc = 1; report(__builtin_dynamic_object_size(f.bar[argc], 0)); report(__builtin_dynamic_object_size(f.bar[argc], 1)); return 0; } should return: __builtin_dynamic_object_size(f.bar[argc], 0): 48 __builtin_dynamic_object_size(f.bar[argc], 1): 40 determined by the least significant bit of the TYPE. Add a new parameter to the llvm.objectsize intrinsic to control which size it should return.
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-clang-codegen Author: Bill Wendling (bwendling) ChangesThe second argument of __builtin_dynamic_object_size controls whether it
should return:
determined by the least significant bit of the TYPE. Add a new parameter to the llvm.objectsize intrinsic to control which Patch is 97.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/78526.diff 40 Files Affected:
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index de48b15645b1ab..513a007aa009b7 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -1103,12 +1103,17 @@ CodeGenFunction::emitBuiltinObjectSize(const Expr *E, unsigned Type,
Function *F =
CGM.getIntrinsic(Intrinsic::objectsize, {ResType, Ptr->getType()});
+ // If the least significant bit is clear, objects are whole variables. If
+ // it's set, a closest surrounding subobject is considered the object a
+ // pointer points to.
+ Value *WholeObj = Builder.getInt1((Type & 1) == 0);
+
// LLVM only supports 0 and 2, make sure that we pass along that as a boolean.
Value *Min = Builder.getInt1((Type & 2) != 0);
// For GCC compatibility, __builtin_object_size treat NULL as unknown size.
Value *NullIsUnknown = Builder.getTrue();
Value *Dynamic = Builder.getInt1(IsDynamic);
- return Builder.CreateCall(F, {Ptr, Min, NullIsUnknown, Dynamic});
+ return Builder.CreateCall(F, {Ptr, WholeObj, Min, NullIsUnknown, Dynamic});
}
namespace {
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index d12e85b48d0b00..7149e459a1390e 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -744,11 +744,13 @@ void CodeGenFunction::EmitTypeCheck(TypeCheckKind TCK, SourceLocation Loc,
// FIXME: Get object address space
llvm::Type *Tys[2] = { IntPtrTy, Int8PtrTy };
llvm::Function *F = CGM.getIntrinsic(llvm::Intrinsic::objectsize, Tys);
+ llvm::Value *WholeObj = Builder.getTrue();
llvm::Value *Min = Builder.getFalse();
llvm::Value *NullIsUnknown = Builder.getFalse();
llvm::Value *Dynamic = Builder.getFalse();
llvm::Value *LargeEnough = Builder.CreateICmpUGE(
- Builder.CreateCall(F, {Ptr, Min, NullIsUnknown, Dynamic}), Size);
+ Builder.CreateCall(F, {Ptr, WholeObj, Min, NullIsUnknown, Dynamic}),
+ Size);
Checks.push_back(std::make_pair(LargeEnough, SanitizerKind::ObjectSize));
}
}
diff --git a/clang/test/CodeGen/catch-undef-behavior.c b/clang/test/CodeGen/catch-undef-behavior.c
index af37ef9e8565b1..41cbe6db881bf0 100644
--- a/clang/test/CodeGen/catch-undef-behavior.c
+++ b/clang/test/CodeGen/catch-undef-behavior.c
@@ -35,7 +35,7 @@
void foo(void) {
union { int i; } u;
- // CHECK-COMMON: %[[SIZE:.*]] = call i64 @llvm.objectsize.i64.p0(ptr %[[PTR:.*]], i1 false, i1 false, i1 false)
+ // CHECK-COMMON: %[[SIZE:.*]] = call i64 @llvm.objectsize.i64.p0(ptr %[[PTR:.*]], i1 true, i1 false, i1 false, i1 false)
// CHECK-COMMON-NEXT: %[[OK:.*]] = icmp uge i64 %[[SIZE]], 4
// CHECK-UBSAN: br i1 %[[OK]], {{.*}} !prof ![[WEIGHT_MD:.*]], !nosanitize
diff --git a/clang/test/CodeGen/pass-object-size.c b/clang/test/CodeGen/pass-object-size.c
index c7c505b0fb3e78..cbc54db301a5c8 100644
--- a/clang/test/CodeGen/pass-object-size.c
+++ b/clang/test/CodeGen/pass-object-size.c
@@ -85,16 +85,16 @@ void test1(unsigned long sz) {
char *ptr = (char *)malloc(sz);
- // CHECK: [[REG:%.*]] = call i64 @llvm.objectsize.i64.p0({{.*}}, i1 false, i1 true, i1 true)
+ // CHECK: [[REG:%.*]] = call i64 @llvm.objectsize.i64.p0({{.*}}, i1 true, i1 false, i1 true, i1 true)
// CHECK: call i32 @DynamicObjectSize0(ptr noundef %{{.*}}, i64 noundef [[REG]])
gi = DynamicObjectSize0(ptr);
// CHECK: [[WITH_OFFSET:%.*]] = getelementptr
- // CHECK: [[REG:%.*]] = call i64 @llvm.objectsize.i64.p0(ptr [[WITH_OFFSET]], i1 false, i1 true, i1 true)
+ // CHECK: [[REG:%.*]] = call i64 @llvm.objectsize.i64.p0(ptr [[WITH_OFFSET]], i1 true, i1 false, i1 true, i1 true)
// CHECK: call i32 @DynamicObjectSize0(ptr noundef {{.*}}, i64 noundef [[REG]])
gi = DynamicObjectSize0(ptr+10);
- // CHECK: [[REG:%.*]] = call i64 @llvm.objectsize.i64.p0({{.*}}, i1 true, i1 true, i1 true)
+ // CHECK: [[REG:%.*]] = call i64 @llvm.objectsize.i64.p0({{.*}}, i1 true, i1 true, i1 true, i1 true)
// CHECK: call i32 @DynamicObjectSize2(ptr noundef {{.*}}, i64 noundef [[REG]])
gi = DynamicObjectSize2(ptr);
}
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index d881deb30049a2..054333fe52fcd5 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -26520,8 +26520,8 @@ Syntax:
::
- declare i32 @llvm.objectsize.i32(ptr <object>, i1 <min>, i1 <nullunknown>, i1 <dynamic>)
- declare i64 @llvm.objectsize.i64(ptr <object>, i1 <min>, i1 <nullunknown>, i1 <dynamic>)
+ declare i32 @llvm.objectsize.i32(ptr <object>, i1 <wholeobj>, i1 <min>, i1 <nullunknown>, i1 <dynamic>)
+ declare i64 @llvm.objectsize.i64(ptr <object>, i1 <wholeobj>, i1 <min>, i1 <nullunknown>, i1 <dynamic>)
Overview:
"""""""""
@@ -26535,18 +26535,26 @@ class, structure, array, or other object.
Arguments:
""""""""""
-The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
-pointer to or into the ``object``. The second argument determines whether
-``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size is
-unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
-in address space 0 is used as its pointer argument. If it's ``false``,
-``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, if
-the ``null`` is in a non-zero address space or if ``true`` is given for the
-third argument of ``llvm.objectsize``, we assume its size is unknown. The fourth
-argument to ``llvm.objectsize`` determines if the value should be evaluated at
-runtime.
+The ``llvm.objectsize`` intrinsic takes five arguments:
+
+- The first argument is a pointer to or into the ``object``.
+- The second argument controls which size ``llvm.objectsize`` returns:
+ - If it's ``false``, ``llvm.objectsize`` returns the size of the closest
+ surrounding subobject.
+ - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object.
+- The third argument controls which value to return when the size is unknown:
+ - If it's ``false``, ``llvm.objectsize`` returns ``-1``.
+ - If it's ``true``, ``llvm.objectsize`` returns ``0``.
+- The fourth argument controls how ``llvm.objectsize`` acts when ``null`` in
+ address space 0 is used as its pointer argument:
+ - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given
+ ``null``.
+ - If it's ``true``, or the ``null`` pointer is in a non-zero address space,
+ the size is assumed to be unknown.
+- The fifth argument to ``llvm.objectsize`` determines if the value should be
+ evaluated at runtime.
-The second, third, and fourth arguments only accept constants.
+The second, third, fourth, and fifth arguments accept only constants.
Semantics:
""""""""""
diff --git a/llvm/include/llvm/Analysis/MemoryBuiltins.h b/llvm/include/llvm/Analysis/MemoryBuiltins.h
index 37ce1518f00c08..ee5cc950788678 100644
--- a/llvm/include/llvm/Analysis/MemoryBuiltins.h
+++ b/llvm/include/llvm/Analysis/MemoryBuiltins.h
@@ -157,6 +157,11 @@ struct ObjectSizeOpts {
/// Whether to round the result up to the alignment of allocas, byval
/// arguments, and global variables.
bool RoundToAlign = false;
+ /// If this is true, return the whole size of the object. Otherwise, return
+ /// the size of the closest surrounding subobject.
+ /// FIXME: The default before being added was to return the whole size of the
+ /// object. Review if this is the correct default.
+ bool WholeObjectSize = true;
/// If this is true, null pointers in address space 0 will be treated as
/// though they can't be evaluated. Otherwise, null is always considered to
/// point to a 0 byte region of memory.
@@ -231,6 +236,7 @@ class ObjectSizeOffsetVisitor
APInt Zero;
SmallDenseMap<Instruction *, SizeOffsetAPInt, 8> SeenInsts;
unsigned InstructionsVisited;
+ const StructType *AllocaTy = nullptr;
APInt align(APInt Size, MaybeAlign Align);
@@ -242,6 +248,8 @@ class ObjectSizeOffsetVisitor
SizeOffsetAPInt compute(Value *V);
+ const StructType *getAllocaType() const { return AllocaTy; }
+
// These are "private", except they can't actually be made private. Only
// compute() should be used by external users.
SizeOffsetAPInt visitAllocaInst(AllocaInst &I);
@@ -313,6 +321,7 @@ class ObjectSizeOffsetEvaluator
PtrSetTy SeenVals;
ObjectSizeOpts EvalOpts;
SmallPtrSet<Instruction *, 8> InsertedInstructions;
+ const StructType *AllocaTy = nullptr;
SizeOffsetValue compute_(Value *V);
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index b54c697296b20a..cef4c756d5a31e 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -1071,7 +1071,7 @@ def int_maximum : DefaultAttrsIntrinsic<[llvm_anyfloat_ty],
// Internal interface for object size checking
def int_objectsize : DefaultAttrsIntrinsic<[llvm_anyint_ty],
- [llvm_anyptr_ty, llvm_i1_ty,
+ [llvm_anyptr_ty, llvm_i1_ty, llvm_i1_ty,
llvm_i1_ty, llvm_i1_ty],
[IntrNoMem, IntrSpeculatable, IntrWillReturn,
ImmArg<ArgIndex<1>>, ImmArg<ArgIndex<2>>,
diff --git a/llvm/lib/Analysis/MemoryBuiltins.cpp b/llvm/lib/Analysis/MemoryBuiltins.cpp
index 46a7a921d86d3d..56d6313905865e 100644
--- a/llvm/lib/Analysis/MemoryBuiltins.cpp
+++ b/llvm/lib/Analysis/MemoryBuiltins.cpp
@@ -615,12 +615,15 @@ Value *llvm::lowerObjectSizeCall(
assert(ObjectSize->getIntrinsicID() == Intrinsic::objectsize &&
"ObjectSize must be a call to llvm.objectsize!");
- bool MaxVal = cast<ConstantInt>(ObjectSize->getArgOperand(1))->isZero();
ObjectSizeOpts EvalOptions;
EvalOptions.AA = AA;
+ EvalOptions.WholeObjectSize =
+ cast<ConstantInt>(ObjectSize->getArgOperand(1))->isOne();
+
// Unless we have to fold this to something, try to be as accurate as
// possible.
+ bool MaxVal = cast<ConstantInt>(ObjectSize->getArgOperand(2))->isZero();
if (MustSucceed)
EvalOptions.EvalMode =
MaxVal ? ObjectSizeOpts::Mode::Max : ObjectSizeOpts::Mode::Min;
@@ -628,10 +631,10 @@ Value *llvm::lowerObjectSizeCall(
EvalOptions.EvalMode = ObjectSizeOpts::Mode::ExactSizeFromOffset;
EvalOptions.NullIsUnknownSize =
- cast<ConstantInt>(ObjectSize->getArgOperand(2))->isOne();
+ cast<ConstantInt>(ObjectSize->getArgOperand(3))->isOne();
auto *ResultType = cast<IntegerType>(ObjectSize->getType());
- bool StaticOnly = cast<ConstantInt>(ObjectSize->getArgOperand(3))->isZero();
+ bool StaticOnly = cast<ConstantInt>(ObjectSize->getArgOperand(4))->isZero();
if (StaticOnly) {
// FIXME: Does it make sense to just return a failure value if the size won't
// fit in the output and `!MustSucceed`?
@@ -726,6 +729,25 @@ SizeOffsetAPInt ObjectSizeOffsetVisitor::computeImpl(Value *V) {
if (!IndexTypeSizeChanged && Offset.isZero())
return SOT;
+ if (!Options.WholeObjectSize && AllocaTy) {
+ // At this point, SOT.Size is the size of the whole struct. However, we
+ // want the size of the sub-object.
+ const StructLayout &SL = *DL.getStructLayout(
+ const_cast<StructType *>(AllocaTy));
+
+ unsigned Idx = SL.getElementContainingOffset(Offset.getLimitedValue());
+
+ // Get the size of the sub-object.
+ TypeSize ElemSize = DL.getTypeAllocSize(AllocaTy->getTypeAtIndex(Idx));
+ APInt Size(InitialIntTyBits, ElemSize.getKnownMinValue());
+
+ // Adjust the offset to reflect the sub-object's offset.
+ TypeSize ElemOffset = SL.getElementOffset(Idx);
+ Offset -= ElemOffset.getKnownMinValue();
+
+ SOT = SizeOffsetAPInt(Size, Offset);
+ }
+
// We stripped an address space cast that changed the index type size or we
// accumulated some constant offset (or both). Readjust the bit width to match
// the argument index type size and apply the offset, as required.
@@ -781,9 +803,12 @@ SizeOffsetAPInt ObjectSizeOffsetVisitor::visitAllocaInst(AllocaInst &I) {
TypeSize ElemSize = DL.getTypeAllocSize(I.getAllocatedType());
if (ElemSize.isScalable() && Options.EvalMode != ObjectSizeOpts::Mode::Min)
return ObjectSizeOffsetVisitor::unknown();
+
APInt Size(IntTyBits, ElemSize.getKnownMinValue());
- if (!I.isArrayAllocation())
+ if (!I.isArrayAllocation()) {
+ AllocaTy = dyn_cast<StructType>(I.getAllocatedType());
return SizeOffsetAPInt(align(Size, I.getAlign()), Zero);
+ }
Value *ArraySize = I.getArraySize();
if (const ConstantInt *C = dyn_cast<ConstantInt>(ArraySize)) {
@@ -1086,6 +1111,8 @@ SizeOffsetValue ObjectSizeOffsetEvaluator::compute(Value *V) {
SizeOffsetValue ObjectSizeOffsetEvaluator::compute_(Value *V) {
ObjectSizeOffsetVisitor Visitor(DL, TLI, Context, EvalOpts);
SizeOffsetAPInt Const = Visitor.compute(V);
+ AllocaTy = Visitor.getAllocaType();
+
if (Const.bothKnown())
return SizeOffsetValue(ConstantInt::get(Context, Const.Size),
ConstantInt::get(Context, Const.Offset));
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 1a9e474911b39c..0fafe91762d162 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -4393,12 +4393,43 @@ void llvm::UpgradeIntrinsicCall(CallBase *CI, Function *NewFn) {
break;
case Intrinsic::objectsize: {
- Value *NullIsUnknownSize =
- CI->arg_size() == 2 ? Builder.getFalse() : CI->getArgOperand(2);
- Value *Dynamic =
- CI->arg_size() < 4 ? Builder.getFalse() : CI->getArgOperand(3);
- NewCall = Builder.CreateCall(
- NewFn, {CI->getArgOperand(0), CI->getArgOperand(1), NullIsUnknownSize, Dynamic});
+ // The default behavior before the addition of the '<wholeobj>' argument
+ // was to return the size of the whole object.
+ Value *WholeObj = nullptr;
+ Value *UnknownVal = nullptr;
+ Value *NullIsUnknownSize = nullptr;
+ Value *Dynamic = nullptr;
+
+ switch (CI->arg_size()) {
+ case 2:
+ WholeObj = Builder.getTrue();
+ UnknownVal = CI->getArgOperand(1);
+ NullIsUnknownSize = Builder.getFalse();
+ Dynamic = Builder.getFalse();
+ break;
+ case 3:
+ WholeObj = Builder.getTrue();
+ UnknownVal = CI->getArgOperand(1);
+ NullIsUnknownSize = CI->getArgOperand(2);
+ Dynamic = Builder.getFalse();
+ break;
+ case 4:
+ WholeObj = Builder.getTrue();
+ UnknownVal = CI->getArgOperand(1);
+ NullIsUnknownSize = CI->getArgOperand(2);
+ Dynamic = CI->getArgOperand(3);
+ break;
+ case 5:
+ WholeObj = CI->getArgOperand(1);
+ UnknownVal = CI->getArgOperand(2);
+ NullIsUnknownSize = CI->getArgOperand(3);
+ Dynamic = CI->getArgOperand(4);
+ break;
+ }
+
+ NewCall =
+ Builder.CreateCall(NewFn, {CI->getArgOperand(0), WholeObj, UnknownVal,
+ NullIsUnknownSize, Dynamic});
break;
}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
index 5e73411cae9b70..3bb203442dd513 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
@@ -1486,8 +1486,8 @@ bool AMDGPUPromoteAllocaImpl::tryPromoteAllocaToLDS(AllocaInst &I,
PointerType::get(Context, AMDGPUAS::LOCAL_ADDRESS)});
CallInst *NewCall = Builder.CreateCall(
- ObjectSize,
- {Src, Intr->getOperand(1), Intr->getOperand(2), Intr->getOperand(3)});
+ ObjectSize, {Src, Intr->getOperand(1), Intr->getOperand(2),
+ Intr->getOperand(3), Intr->getOperand(4)});
Intr->replaceAllUsesWith(NewCall);
Intr->eraseFromParent();
continue;
diff --git a/llvm/test/Analysis/CostModel/X86/free-intrinsics.ll b/llvm/test/Analysis/CostModel/X86/free-intrinsics.ll
index 3c39e456eed5f9..3d2e836b97517a 100644
--- a/llvm/test/Analysis/CostModel/X86/free-intrinsics.ll
+++ b/llvm/test/Analysis/CostModel/X86/free-intrinsics.ll
@@ -18,7 +18,7 @@ define i32 @trivially_free() {
; CHECK-SIZE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %a4 = call i1 @llvm.is.constant.i32(i32 undef)
; CHECK-SIZE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: call void @llvm.lifetime.start.p0(i64 1, ptr undef)
; CHECK-SIZE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: call void @llvm.lifetime.end.p0(i64 1, ptr undef)
-; CHECK-SIZE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %a5 = call i64 @llvm.objectsize.i64.p0(ptr undef, i1 true, i1 true, i1 true)
+; CHECK-SIZE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %a5 = call i64 @llvm.objectsize.i64.p0(ptr undef, i1 true, i1 true, i1 true, i1 true)
; CHECK-SIZE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %a6 = call ptr @llvm.ptr.annotation.p0.p0(ptr undef, ptr undef, ptr undef, i32 undef, ptr undef)
; CHECK-SIZE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: call void @llvm.var.annotation.p0.p0(ptr undef, ptr undef, ptr undef, i32 undef, ptr undef)
; CHECK-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
@@ -38,7 +38,7 @@ define i32 @trivially_free() {
; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %a4 = call i1 @llvm.is.constant.i32(i32 undef)
; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: call void @llvm.lifetime.start.p0(i64 1, ptr undef)
; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: call void @llvm.lifetime.end.p0(i64 1, ptr undef)
-; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %a5 = call i64 @llvm.objectsize.i64.p0(ptr undef, i1 true, i1 true, i1 true)
+; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %a5 = call i64 @llvm.objectsize.i64.p0(ptr undef, i1 true, i1 true, i1 true, i1 true)
; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %a6 = call ptr @llvm.ptr.annotation.p0.p0(ptr undef, ptr undef, ptr undef, i32 undef, ptr undef)
; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: call void @llvm.var.annotation.p0.p0(ptr undef, ptr undef, ptr undef, i32 undef, ptr undef)
; CHECK-THROUGHPUT-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
@@ -58,7 +58,7 @@ define i32 @trivially_free() {
%a4 = call i1 @llvm.is.constant.i32(i32 undef)
call void @llvm.lifetime.start.p0(i64 1, ptr undef)
call void @llvm.lifetime.end.p0(i64 1, ptr undef)
- %a5 = call i64 @llvm.objectsize.i64.p0(ptr undef, i1 1, i1 1, i1 1)
+ %a5 = call i64 @llvm.objectsize.i64.p0(ptr undef, i1 1, i1 1, i1 1, i1 1)
%a6 = call ptr @llvm.ptr.annotation.p0(ptr undef, ptr undef, ptr undef, i32 undef, ptr undef)
call void @llvm.var.annotation(ptr undef, ptr undef, ptr undef, i32 undef, ptr undef)
ret i32 undef
@@ -79,7 +79,7 @@ declare ptr @llvm.strip.invariant.group.p0(ptr)
declare i1 @llvm.is.constant.i32(i32)
declare void @llvm.lifetime.start.p0(i64, ptr)
declare void @llvm.lifetime.end.p0(i64, ptr)
-declare i64 @llvm.objectsize.i64.p0(ptr, i1, i1, i1)
+declare i64 @llvm.objectsize.i64.p0(ptr, i1, i1, i1, i1)
declare ptr @llvm.ptr.annotation.p0(ptr, ptr, ptr, i32, ptr)
declare void @llvm.var.annotation(ptr, ptr, ptr, i32, ptr)
diff --git a/llvm/test/Analysis/CostModel/free-intrinsics-datalayout.ll b/llvm/test/Analysis/CostModel/free-intrinsics-datalayout.ll
index b5f204c2d9b26e..3674e87a6fc05b 100644
--- a/llvm/test/Analysis/CostModel/free-intrinsics-datalayout.ll
+++ b/llvm/test/Analysis/CostModel/free-intrinsics-datalayout.ll
@@ -20,7 +20,7 @@ define i32 @trivially_free() {
; CHECK-SIZE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %a4 = call i1 @llvm.is.constant.i32(i32 undef)
; CHECK-SIZE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: call void @llvm.lifetime.start.p0(i64 1, ptr undef)
; CHECK-SIZE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: call void @llvm.lifetime.end.p0(i64 1, ptr undef)
-; CHECK-SIZE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %a5 = call i64 @llvm.objectsize.i64.p0(ptr undef, i1 true, i1 true, i1 true)
+; CHECK-SIZE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %a...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've discussed this before, years ago. Previously, the sticking point was: how is LLVM going to know what the frontend considers the closest surrounding subobject to be? The LLVM type doesn't give you that information, and it seems like there's nowhere else that LLVM can get it from either, so this flag ends up not being useful and the best we can do is to give a conservatively-correct answer -- the complete object size if we want an upper bound, or 0 if we want a lower bound.
Clang does respect the "subobject" flag if it can symbolically evaluate the operand of __builtin_object_size
sufficiently to determine which subobject is being referenced. Previously we've thought that that was the best we could do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using anything but the size and alignment of the alloca type in a way that affects program semantics is illegal.
Right now we're giving a wrong answer (see below), so that's no good. If we're going to err on the side of caution, then we should return the default "can't calculate the size" value. When you say that we can't detect what the front-end considers the "closest surrounding subobject" to be, is that mostly due to corner cases or is it a more general concern? (Note, we're really only interested in supporting this for C structs. C++ structs / classes would require therapy.)
This is why the current behavior is wrong, in my opinion. The motivating example is below:
The output with this change is now:
Without the change, the last line is:
Which isn't correct according to GNU's documentation. So if we can't honor the TYPE bit, then we should return |
I'm sorry, but I don't understand your comment here. Could you supply some context? |
It's a more general concern: LLVM simply has no idea what the frontend considers to be a subobject. The LLVM type doesn't carry that information.
Perhaps according to the GCC documentation as written. But mode 0 and 1 are in general asking for an upper bound on the accessible bytes (that is, an N so any.access beyond N bytes is definitely out of bounds), so it seems to me that returning -1 is strictly worse than returning 48. Do you have a use case for which -1 is a better answer? I suspect the only change we're missing here is a change to our documentation to explicitly say that we give an upper/lower bound when we can't compute an exact size. |
It's the second bit that controls whether it's an upper bound or lower bound that's returned, not the least significant bit.
I'm sure one could be constructed (for instance, someone wanting to check if it's okay to We're trying to implement a GNU builtin, and the only defined semantics we have to go on are GNU's documentation. I can't see how we can deviate from their documentation unless it's to say "we can't determine this value" and so return Lastly, returning From what I can tell, GNU's idea of what constitutes a "closest surrounding sub-object" is one of a handful of things: the structure containing the field, the array being pointed to with a struct, or the field being pointed into. Example of the last one:
All of these are explicit in the LLVM IR. Is the worry that they've been changed from some transformations? Or are there others I'm missing? |
@bwendling I think you are reading the GCC docs too pedantically. In particular, they also say
which makes it abundantly clear that what you get is an upper bound or lower bound, respectively. -1 and 0 are just the upper and lower bounds if you have no useful information at all. If you want to check whether the size is exactly known, you'll have to compare both bounds. Outside of doing that, you can never assume that the bound is precise. Btw, it looks like your initial example gets 48 for both modes on GCC as well? https://c.godbolt.org/z/EfGWv4Wrh
Apart from the fact that what you are doing is simply illegal under our IR semantics, a practical case where this will likely compute incorrect results are unions. For unions, clang will use the type of the union member with the largest size as the alloca type, regardless of which union member is active. I haven't tried, but your patch will probably compute the subobject size based on that arbitrarily picked member, rather than the one being accessed. |
Another concrete case where using the IR type won't work is arrays whose initializer is mostly-zeroes. For example, for
we give |
Isn't this referring to pointing to an object in a union? That's only way this makes sense (in C).
I'm still not convinced that it's abundantly clear... but if we want to compare max vs. min, things get worse because we return
If you remove the
?? I think it's a bit much to say that this is "illegal." We may not get every instance of something, but we can certainly determine the size of a struct. If by "illegal" you mean determining what a sub-object is, I don't agree but if it's so there might be a way to retain whatever information is needed to determine what a sub-object is.
We don't seem to support unions correctly now:
|
My patch only works with structs. |
Right, that's what I said: modes 0 and 1 ask for an upper bound. (Modes 2 and 3 ask for a lower bound.)
We're not really deviating from the documented rules. By the time we get to LLVM IR lowering of the builtin, we have lost the precise frontend information. But we know the pointer might point into the complete object (where it would be able to write to 48 bytes) or to some subobject of that complete object (where it would be able to write to 48 bytes or less). Therefore the correct answer to return is 48. In the same way, it would be correct to return 48 here:
... though we miscompile this and return 40 regardless of whether we're in upper bound or lower bound mode. :-(
No, that is not what we are saying when we return 48. We're saying "a write past 48 bytes is definitely bad". If you want a lower bound on the number of bytes that you can write, then you need to use mode 2 or 3 instead. |
Taking a step back, while this patch is not the right direction, we can and should do better for the original example. Probably the best way to do that is to analyze the operand to
A possibly simpler alternative would be for the frontend to pass an upper bound on the result to the LLVM builtin in mode 1, so Clang could say "I know the result will never be more than 40" and LLVM could provide either that size or the complete object size, whichever is smaller. That wouldn't give as tight a bound for the argc == 2 case, though. |
Perhaps we need clarification on what GCC means by "may point to multiple objects" in this instance. To me that means either "get me the size of the largest of these multiple objects" or "size of the smallest." In my eyes, that means pointing to a union field.
But in the example, writing up to 48 bytes is definitely bad. And as I mentioned, we disagree on what's meant by "upper" and "lower" bounds in the case of a pointer to a non-union field. For me, that's managed by the first bit. I know that we lose precise struct information going to LLVM IR. If that's what's needed here, there are ways to pass this information along. We retain this information via DWARF. We could use similar metadata for this instance. Would that be acceptable? |
It might be possible to do something like this. If retaining precise type information from the front-end is the sticking issue, it could be as simple as casting this pointer top the type of the sub-object. I'll see what I can do. |
Per @nikic's example, it seems reasonably clear to me that GCC's intended semantics are to get either the best upper bound or the best lower bound that the compiler is able to compute. I mean sure, we can ask, and it does no harm to do so, but it is not reasonable to expect that a quantity like this coming from the vagaries of GCC's implementation details will be exactly the same for all examples when computed by a different implementation. (It's not even the same across optimization levels in GCC.)
In theory, yes, we could preserve enough information in metadata to compute this in the middle-end. But we would need to emit a lot of metadata, just on the off-chance that after optimization we happen to have a builtin_object_size query that points to each object that we emit, so in practice I don't think there's any chance we can do this. We can't use DWARF, because it won't necessarily be available (and typically won't be available in the interesting case where we find the object size only after optimization), and in any case, the presence or absence of DWARF isn't supposed to affect the executable code. Also, we'd need to annotate things that simply don't exist at all in the LLVM IR:
Here, |
I completely agree that using GCC's implementation and documentation is definitely lacking. That's the pitfalls of relying upon features added not by the standards committee, but by compiler developers. They're likely to be buggy as hell for unforeseen corner cases (despite some developers being on the standards committee). My answer for the question "what's the semantics of GCC's builtin X?" has always been "whatever GCC does." It's the best we can rely upon. But then we get into situations like this, where you and @nikic have one interpretation of their documentation and I have another. I can point to their behavior to back up my claim, but in the end it's probably not exactly clear even to GCC. My concern is that we want to use this for code hardening. Without precise object sizes, we're hampered in our goal. The unfortunate reality is that we can only get that size via these
The sub-object for For the record, I didn't mean that we should use DWARF. I'm not that much of a masochist. :-) |
@nikic demonstrated that our current behavior is already compatible with GCC's behavior. If GCC's behavior is the spec, then we are allowed to return 48 rather than only 40 or -1 (or presumably 0 if
That's a totally understandable desire, but I think it's not realistic to expect precise subobject sizes in the same cases that GCC can provide them, due to the different architectural choices in the two compilers. If we had a mid-level IR for Clang that still had frontend information, we could do better by evaluating BOS there, so maybe that's one long term path forward to consider. And in the short term, while there are cases where we won't be able to match GCC, I think Clang should do better than it currently does in the frontend, specifically in cases like the one in the bug report where there's an obvious better answer that doesn't require any sophisticated analysis to discover.
GCC disagrees with you: https://godbolt.org/z/s4P74oEqx |
No, he exposed a fluke in their implementation. I gave examples of where clang gets it wrong, and yet I apparently haven't "proven" anything according to this logic?
We're not talking about something like inline asm, where the internals of their register allocator is more-or-less exposed to the user, and therefore it's next to impossible for us to replicate all of it. This feature is far simpler, and should be something we can determine based on all of the information we're given.
What "obvious better answer" is that? |
It's not just a union field, it could literally point to one of many potential objects, e.g.:
Here, In general, we want to allow returning a conservative estimate whenever possible and not fail. This is a deliberate design decision because the key usage of I obviously have no comments on the patch itself, but hopefully this gives enough context to interpret the GCC implementation and/or documentation. It would be ideal to always return a precise object size expression, but falling back to an estimate is better than simply bailing out. |
Trying to summarize my understanding here:
|
Thanks for summarizing, that matches my understanding. As to how to implement this "properly", that would probably be https://discourse.llvm.org/t/exploring-the-effects-and-uses-of-the-memory-region-declaration-intrinsic/72756. The memory.region.decl intrinsics effectively encode subobjects. However, doing this has some fairly substantial externalities. |
It matches my understanding too. There are more issues with I'm exploring some ways of fixing this issue:
|
I'm not sure I follow; if you know the object's type, doesn't that mean you also know its size?
Any such transform has to reason about all the uses, so the llvm.objectsize call itself would prevent the transform from happening. |
Not necessarily. If you have something like this: struct x {
int a;
char foo[2][40];
int b;
int c;
};
size_t f(struct x *p, int idx) {
return __builtin_dynamic_object_size(&p->foo[idx], 1);
} But in general, it would be tricky for something like: __builtin_dynamic_object_size(&((char *)p)[idx], 1); which I'm not sure if that's something GCC can handle. The documentation says that if the pointer can point to multiple objects, then we can return the whole object value (or something to that affect).
Bon! :-) |
If I'm following correctly, the return here is 0, 40, or 80, depending on the value of idx? That's not a constant, but the computation is entirely syntactic; it doesn't matter what "p" actually points to. So clang can lower the builtin itself. Currently it doesn't, I think, because all the relevant code is in ExprConstant, but the code could be adapted. The problem, really, is that we can't easily extend that approach to stuff like the following: size_t f(struct x *p, int idx) {
char *c = &p->foo[idx];
return __builtin_dynamic_object_size(c, 1);
} |
Right. That's what I want to add to the front-end.
Yup! I've been forbidden from doing this in the back-end, so I have to jump through hoops now and do partial solutions and hope that it works for most people and that when we get it wrong it doesn't hurt security (spoilers: it will). |
It might be academic at this point, but for what it's worth, |
The second argument of __builtin_dynamic_object_size controls whether it
returns the size of the whole object or the closest surrounding object.
For this struct:
should return:
determined by the least significant bit of the TYPE.
Add a new parameter to the llvm.objectsize intrinsic to control which
size it should return.