Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Clang] Correct __builtin_dynamic_object_size for subobject types #78526

Closed
wants to merge 6 commits into from

Conversation

bwendling
Copy link
Collaborator

The second argument of __builtin_dynamic_object_size controls whether it
returns the size of the whole object or the closest surrounding object.
For this struct:

  struct s {
    int  foo;
    char bar[2][40];
    int  baz;
    int  qux;
  };

  int main(int argc, char **argv) {
    struct s f;

  #define report(x) fprintf(stderr, #x ": %zu\n", x)

    argc = 1;
    report(__builtin_dynamic_object_size(f.bar[argc], 0));
    report(__builtin_dynamic_object_size(f.bar[argc], 1));
    return 0;
  }

should return:

  __builtin_dynamic_object_size(f.bar[argc], 0): 48
  __builtin_dynamic_object_size(f.bar[argc], 1): 40

determined by the least significant bit of the TYPE.

Add a new parameter to the llvm.objectsize intrinsic to control which
size it should return.

The second argument of __builtin_dynamic_object_size controls whether it
returns the size of the whole object or the closest surrounding object.
For this struct:

  struct s {
    int  foo;
    char bar[2][40];
    int  baz;
    int  qux;
  };

  int main(int argc, char **argv) {
    struct s f;

  #define report(x) fprintf(stderr, #x ": %zu\n", x)

    argc = 1;
    report(__builtin_dynamic_object_size(f.bar[argc], 0));
    report(__builtin_dynamic_object_size(f.bar[argc], 1));
    return 0;
  }

should return:

  __builtin_dynamic_object_size(f.bar[argc], 0): 48
  __builtin_dynamic_object_size(f.bar[argc], 1): 40

determined by the least significant bit of the TYPE.

Add a new parameter to the llvm.objectsize intrinsic to control which
size it should return.
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 18, 2024

@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-llvm-globalisel
@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-clang-codegen

Author: Bill Wendling (bwendling)

Changes

The second argument of __builtin_dynamic_object_size controls whether it
returns the size of the whole object or the closest surrounding object.
For this struct:

  struct s {
    int  foo;
    char bar[2][40];
    int  baz;
    int  qux;
  };

  int main(int argc, char **argv) {
    struct s f;

  #define report(x) fprintf(stderr, #x ": %zu\n", x)

    argc = 1;
    report(__builtin_dynamic_object_size(f.bar[argc], 0));
    report(__builtin_dynamic_object_size(f.bar[argc], 1));
    return 0;
  }

should return:

  __builtin_dynamic_object_size(f.bar[argc], 0): 48
  __builtin_dynamic_object_size(f.bar[argc], 1): 40

determined by the least significant bit of the TYPE.

Add a new parameter to the llvm.objectsize intrinsic to control which
size it should return.


Patch is 97.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/78526.diff

40 Files Affected:

  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+6-1)
  • (modified) clang/lib/CodeGen/CGExpr.cpp (+3-1)
  • (modified) clang/test/CodeGen/catch-undef-behavior.c (+1-1)
  • (modified) clang/test/CodeGen/pass-object-size.c (+3-3)
  • (modified) llvm/docs/LangRef.rst (+21-13)
  • (modified) llvm/include/llvm/Analysis/MemoryBuiltins.h (+9)
  • (modified) llvm/include/llvm/IR/Intrinsics.td (+1-1)
  • (modified) llvm/lib/Analysis/MemoryBuiltins.cpp (+31-4)
  • (modified) llvm/lib/IR/AutoUpgrade.cpp (+37-6)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp (+2-2)
  • (modified) llvm/test/Analysis/CostModel/X86/free-intrinsics.ll (+4-4)
  • (modified) llvm/test/Analysis/CostModel/free-intrinsics-datalayout.ll (+4-4)
  • (modified) llvm/test/Analysis/CostModel/free-intrinsics-no_info.ll (+4-4)
  • (modified) llvm/test/Assembler/auto_upgrade_intrinsics.ll (+4-4)
  • (modified) llvm/test/Bitcode/objectsize-upgrade-7.0.ll (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/memcpy_chk_no_tail.ll (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/memsize-remarks.ll (+16-16)
  • (modified) llvm/test/CodeGen/AMDGPU/promote-alloca-mem-intrinsics.ll (+4-4)
  • (modified) llvm/test/Other/cgscc-libcall-update.ll (+2-2)
  • (modified) llvm/test/Transforms/InferAddressSpaces/AMDGPU/debug-info.ll (+4-4)
  • (modified) llvm/test/Transforms/InferAddressSpaces/AMDGPU/intrinsics.ll (+6-6)
  • (modified) llvm/test/Transforms/InferAlignment/propagate-assume.ll (+2-2)
  • (modified) llvm/test/Transforms/Inline/call-intrinsic-objectsize.ll (+3-3)
  • (modified) llvm/test/Transforms/InstCombine/allocsize.ll (+5-5)
  • (modified) llvm/test/Transforms/InstCombine/builtin-dynamic-object-size.ll (+13-13)
  • (modified) llvm/test/Transforms/InstCombine/builtin-object-size-custom-dl.ll (+3-3)
  • (modified) llvm/test/Transforms/InstCombine/builtin-object-size-strdup-family.ll (+5-5)
  • (modified) llvm/test/Transforms/InstCombine/invoke.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/memset_chk-1.ll (+5-5)
  • (modified) llvm/test/Transforms/InstCombine/objsize.ll (+38-38)
  • (modified) llvm/test/Transforms/InstCombine/stpcpy_chk-1.ll (+4-4)
  • (modified) llvm/test/Transforms/InstCombine/strcpy_chk-1.ll (+5-5)
  • (modified) llvm/test/Transforms/LowerConstantIntrinsics/builtin-object-size-load.ll (+2-2)
  • (modified) llvm/test/Transforms/LowerConstantIntrinsics/builtin-object-size-phi.ll (+5-5)
  • (modified) llvm/test/Transforms/LowerConstantIntrinsics/builtin-object-size-posix-memalign.ll (+8-8)
  • (modified) llvm/test/Transforms/LowerConstantIntrinsics/constant-intrinsics.ll (+2-2)
  • (modified) llvm/test/Transforms/LowerConstantIntrinsics/crash-on-large-allocas.ll (+2-2)
  • (modified) llvm/test/Transforms/LowerConstantIntrinsics/objectsize_basic.ll (+18-19)
  • (modified) llvm/test/Transforms/LowerConstantIntrinsics/stale-worklist-phi.ll (+2-2)
  • (modified) llvm/test/Transforms/SCCP/issue59602-assume-like-call-users.ll (+5-5)
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index de48b15645b1ab..513a007aa009b7 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -1103,12 +1103,17 @@ CodeGenFunction::emitBuiltinObjectSize(const Expr *E, unsigned Type,
   Function *F =
       CGM.getIntrinsic(Intrinsic::objectsize, {ResType, Ptr->getType()});
 
+  // If the least significant bit is clear, objects are whole variables. If
+  // it's set, a closest surrounding subobject is considered the object a
+  // pointer points to.
+  Value *WholeObj = Builder.getInt1((Type & 1) == 0);
+
   // LLVM only supports 0 and 2, make sure that we pass along that as a boolean.
   Value *Min = Builder.getInt1((Type & 2) != 0);
   // For GCC compatibility, __builtin_object_size treat NULL as unknown size.
   Value *NullIsUnknown = Builder.getTrue();
   Value *Dynamic = Builder.getInt1(IsDynamic);
-  return Builder.CreateCall(F, {Ptr, Min, NullIsUnknown, Dynamic});
+  return Builder.CreateCall(F, {Ptr, WholeObj, Min, NullIsUnknown, Dynamic});
 }
 
 namespace {
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index d12e85b48d0b00..7149e459a1390e 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -744,11 +744,13 @@ void CodeGenFunction::EmitTypeCheck(TypeCheckKind TCK, SourceLocation Loc,
       // FIXME: Get object address space
       llvm::Type *Tys[2] = { IntPtrTy, Int8PtrTy };
       llvm::Function *F = CGM.getIntrinsic(llvm::Intrinsic::objectsize, Tys);
+      llvm::Value *WholeObj = Builder.getTrue();
       llvm::Value *Min = Builder.getFalse();
       llvm::Value *NullIsUnknown = Builder.getFalse();
       llvm::Value *Dynamic = Builder.getFalse();
       llvm::Value *LargeEnough = Builder.CreateICmpUGE(
-          Builder.CreateCall(F, {Ptr, Min, NullIsUnknown, Dynamic}), Size);
+          Builder.CreateCall(F, {Ptr, WholeObj, Min, NullIsUnknown, Dynamic}),
+          Size);
       Checks.push_back(std::make_pair(LargeEnough, SanitizerKind::ObjectSize));
     }
   }
diff --git a/clang/test/CodeGen/catch-undef-behavior.c b/clang/test/CodeGen/catch-undef-behavior.c
index af37ef9e8565b1..41cbe6db881bf0 100644
--- a/clang/test/CodeGen/catch-undef-behavior.c
+++ b/clang/test/CodeGen/catch-undef-behavior.c
@@ -35,7 +35,7 @@
 void foo(void) {
   union { int i; } u;
 
-  // CHECK-COMMON: %[[SIZE:.*]] = call i64 @llvm.objectsize.i64.p0(ptr %[[PTR:.*]], i1 false, i1 false, i1 false)
+  // CHECK-COMMON: %[[SIZE:.*]] = call i64 @llvm.objectsize.i64.p0(ptr %[[PTR:.*]], i1 true, i1 false, i1 false, i1 false)
   // CHECK-COMMON-NEXT: %[[OK:.*]] = icmp uge i64 %[[SIZE]], 4
 
   // CHECK-UBSAN: br i1 %[[OK]], {{.*}} !prof ![[WEIGHT_MD:.*]], !nosanitize
diff --git a/clang/test/CodeGen/pass-object-size.c b/clang/test/CodeGen/pass-object-size.c
index c7c505b0fb3e78..cbc54db301a5c8 100644
--- a/clang/test/CodeGen/pass-object-size.c
+++ b/clang/test/CodeGen/pass-object-size.c
@@ -85,16 +85,16 @@ void test1(unsigned long sz) {
 
   char *ptr = (char *)malloc(sz);
 
-  // CHECK: [[REG:%.*]] = call i64 @llvm.objectsize.i64.p0({{.*}}, i1 false, i1 true, i1 true)
+  // CHECK: [[REG:%.*]] = call i64 @llvm.objectsize.i64.p0({{.*}}, i1 true, i1 false, i1 true, i1 true)
   // CHECK: call i32 @DynamicObjectSize0(ptr noundef %{{.*}}, i64 noundef [[REG]])
   gi = DynamicObjectSize0(ptr);
 
   // CHECK: [[WITH_OFFSET:%.*]] = getelementptr
-  // CHECK: [[REG:%.*]] = call i64 @llvm.objectsize.i64.p0(ptr [[WITH_OFFSET]], i1 false, i1 true, i1 true)
+  // CHECK: [[REG:%.*]] = call i64 @llvm.objectsize.i64.p0(ptr [[WITH_OFFSET]], i1 true, i1 false, i1 true, i1 true)
   // CHECK: call i32 @DynamicObjectSize0(ptr noundef {{.*}}, i64 noundef [[REG]])
   gi = DynamicObjectSize0(ptr+10);
 
-  // CHECK: [[REG:%.*]] = call i64 @llvm.objectsize.i64.p0({{.*}}, i1 true, i1 true, i1 true)
+  // CHECK: [[REG:%.*]] = call i64 @llvm.objectsize.i64.p0({{.*}}, i1 true, i1 true, i1 true, i1 true)
   // CHECK: call i32 @DynamicObjectSize2(ptr noundef {{.*}}, i64 noundef [[REG]])
   gi = DynamicObjectSize2(ptr);
 }
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index d881deb30049a2..054333fe52fcd5 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -26520,8 +26520,8 @@ Syntax:
 
 ::
 
-      declare i32 @llvm.objectsize.i32(ptr <object>, i1 <min>, i1 <nullunknown>, i1 <dynamic>)
-      declare i64 @llvm.objectsize.i64(ptr <object>, i1 <min>, i1 <nullunknown>, i1 <dynamic>)
+      declare i32 @llvm.objectsize.i32(ptr <object>, i1 <wholeobj>, i1 <min>, i1 <nullunknown>, i1 <dynamic>)
+      declare i64 @llvm.objectsize.i64(ptr <object>, i1 <wholeobj>, i1 <min>, i1 <nullunknown>, i1 <dynamic>)
 
 Overview:
 """""""""
@@ -26535,18 +26535,26 @@ class, structure, array, or other object.
 Arguments:
 """"""""""
 
-The ``llvm.objectsize`` intrinsic takes four arguments. The first argument is a
-pointer to or into the ``object``. The second argument determines whether
-``llvm.objectsize`` returns 0 (if true) or -1 (if false) when the object size is
-unknown. The third argument controls how ``llvm.objectsize`` acts when ``null``
-in address space 0 is used as its pointer argument. If it's ``false``,
-``llvm.objectsize`` reports 0 bytes available when given ``null``. Otherwise, if
-the ``null`` is in a non-zero address space or if ``true`` is given for the
-third argument of ``llvm.objectsize``, we assume its size is unknown. The fourth
-argument to ``llvm.objectsize`` determines if the value should be evaluated at
-runtime.
+The ``llvm.objectsize`` intrinsic takes five arguments:
+
+- The first argument is a pointer to or into the ``object``.
+- The second argument controls which size ``llvm.objectsize`` returns:
+  - If it's ``false``, ``llvm.objectsize`` returns the size of the closest
+    surrounding subobject.
+  - If it's ``true``, ``llvm.objectsize`` returns the size of the whole object.
+- The third argument controls which value to return when the size is unknown:
+  - If it's ``false``, ``llvm.objectsize`` returns ``-1``.
+  - If it's ``true``, ``llvm.objectsize`` returns ``0``.
+- The fourth argument controls how ``llvm.objectsize`` acts when ``null`` in
+  address space 0 is used as its pointer argument:
+  - If it's ``false``, ``llvm.objectsize`` reports 0 bytes available when given
+    ``null``.
+  - If it's ``true``, or the ``null`` pointer is in a non-zero address space,
+    the size is assumed to be unknown.
+- The fifth argument to ``llvm.objectsize`` determines if the value should be
+  evaluated at runtime.
 
-The second, third, and fourth arguments only accept constants.
+The second, third, fourth, and fifth arguments accept only constants.
 
 Semantics:
 """"""""""
diff --git a/llvm/include/llvm/Analysis/MemoryBuiltins.h b/llvm/include/llvm/Analysis/MemoryBuiltins.h
index 37ce1518f00c08..ee5cc950788678 100644
--- a/llvm/include/llvm/Analysis/MemoryBuiltins.h
+++ b/llvm/include/llvm/Analysis/MemoryBuiltins.h
@@ -157,6 +157,11 @@ struct ObjectSizeOpts {
   /// Whether to round the result up to the alignment of allocas, byval
   /// arguments, and global variables.
   bool RoundToAlign = false;
+  /// If this is true, return the whole size of the object. Otherwise, return
+  /// the size of the closest surrounding subobject.
+  /// FIXME: The default before being added was to return the whole size of the
+  /// object. Review if this is the correct default.
+  bool WholeObjectSize = true;
   /// If this is true, null pointers in address space 0 will be treated as
   /// though they can't be evaluated. Otherwise, null is always considered to
   /// point to a 0 byte region of memory.
@@ -231,6 +236,7 @@ class ObjectSizeOffsetVisitor
   APInt Zero;
   SmallDenseMap<Instruction *, SizeOffsetAPInt, 8> SeenInsts;
   unsigned InstructionsVisited;
+  const StructType *AllocaTy = nullptr;
 
   APInt align(APInt Size, MaybeAlign Align);
 
@@ -242,6 +248,8 @@ class ObjectSizeOffsetVisitor
 
   SizeOffsetAPInt compute(Value *V);
 
+  const StructType *getAllocaType() const { return AllocaTy; }
+
   // These are "private", except they can't actually be made private. Only
   // compute() should be used by external users.
   SizeOffsetAPInt visitAllocaInst(AllocaInst &I);
@@ -313,6 +321,7 @@ class ObjectSizeOffsetEvaluator
   PtrSetTy SeenVals;
   ObjectSizeOpts EvalOpts;
   SmallPtrSet<Instruction *, 8> InsertedInstructions;
+  const StructType *AllocaTy = nullptr;
 
   SizeOffsetValue compute_(Value *V);
 
diff --git a/llvm/include/llvm/IR/Intrinsics.td b/llvm/include/llvm/IR/Intrinsics.td
index b54c697296b20a..cef4c756d5a31e 100644
--- a/llvm/include/llvm/IR/Intrinsics.td
+++ b/llvm/include/llvm/IR/Intrinsics.td
@@ -1071,7 +1071,7 @@ def int_maximum : DefaultAttrsIntrinsic<[llvm_anyfloat_ty],
 
 // Internal interface for object size checking
 def int_objectsize : DefaultAttrsIntrinsic<[llvm_anyint_ty],
-                               [llvm_anyptr_ty, llvm_i1_ty,
+                               [llvm_anyptr_ty, llvm_i1_ty, llvm_i1_ty,
                                 llvm_i1_ty, llvm_i1_ty],
                                [IntrNoMem, IntrSpeculatable, IntrWillReturn,
                                 ImmArg<ArgIndex<1>>, ImmArg<ArgIndex<2>>,
diff --git a/llvm/lib/Analysis/MemoryBuiltins.cpp b/llvm/lib/Analysis/MemoryBuiltins.cpp
index 46a7a921d86d3d..56d6313905865e 100644
--- a/llvm/lib/Analysis/MemoryBuiltins.cpp
+++ b/llvm/lib/Analysis/MemoryBuiltins.cpp
@@ -615,12 +615,15 @@ Value *llvm::lowerObjectSizeCall(
   assert(ObjectSize->getIntrinsicID() == Intrinsic::objectsize &&
          "ObjectSize must be a call to llvm.objectsize!");
 
-  bool MaxVal = cast<ConstantInt>(ObjectSize->getArgOperand(1))->isZero();
   ObjectSizeOpts EvalOptions;
   EvalOptions.AA = AA;
 
+  EvalOptions.WholeObjectSize =
+      cast<ConstantInt>(ObjectSize->getArgOperand(1))->isOne();
+
   // Unless we have to fold this to something, try to be as accurate as
   // possible.
+  bool MaxVal = cast<ConstantInt>(ObjectSize->getArgOperand(2))->isZero();
   if (MustSucceed)
     EvalOptions.EvalMode =
         MaxVal ? ObjectSizeOpts::Mode::Max : ObjectSizeOpts::Mode::Min;
@@ -628,10 +631,10 @@ Value *llvm::lowerObjectSizeCall(
     EvalOptions.EvalMode = ObjectSizeOpts::Mode::ExactSizeFromOffset;
 
   EvalOptions.NullIsUnknownSize =
-      cast<ConstantInt>(ObjectSize->getArgOperand(2))->isOne();
+      cast<ConstantInt>(ObjectSize->getArgOperand(3))->isOne();
 
   auto *ResultType = cast<IntegerType>(ObjectSize->getType());
-  bool StaticOnly = cast<ConstantInt>(ObjectSize->getArgOperand(3))->isZero();
+  bool StaticOnly = cast<ConstantInt>(ObjectSize->getArgOperand(4))->isZero();
   if (StaticOnly) {
     // FIXME: Does it make sense to just return a failure value if the size won't
     // fit in the output and `!MustSucceed`?
@@ -726,6 +729,25 @@ SizeOffsetAPInt ObjectSizeOffsetVisitor::computeImpl(Value *V) {
   if (!IndexTypeSizeChanged && Offset.isZero())
     return SOT;
 
+  if (!Options.WholeObjectSize && AllocaTy) {
+    // At this point, SOT.Size is the size of the whole struct. However, we
+    // want the size of the sub-object.
+    const StructLayout &SL = *DL.getStructLayout(
+        const_cast<StructType *>(AllocaTy));
+
+    unsigned Idx = SL.getElementContainingOffset(Offset.getLimitedValue());
+
+    // Get the size of the sub-object.
+    TypeSize ElemSize = DL.getTypeAllocSize(AllocaTy->getTypeAtIndex(Idx));
+    APInt Size(InitialIntTyBits, ElemSize.getKnownMinValue());
+
+    // Adjust the offset to reflect the sub-object's offset.
+    TypeSize ElemOffset = SL.getElementOffset(Idx);
+    Offset -= ElemOffset.getKnownMinValue();
+
+    SOT = SizeOffsetAPInt(Size, Offset);
+  }
+
   // We stripped an address space cast that changed the index type size or we
   // accumulated some constant offset (or both). Readjust the bit width to match
   // the argument index type size and apply the offset, as required.
@@ -781,9 +803,12 @@ SizeOffsetAPInt ObjectSizeOffsetVisitor::visitAllocaInst(AllocaInst &I) {
   TypeSize ElemSize = DL.getTypeAllocSize(I.getAllocatedType());
   if (ElemSize.isScalable() && Options.EvalMode != ObjectSizeOpts::Mode::Min)
     return ObjectSizeOffsetVisitor::unknown();
+
   APInt Size(IntTyBits, ElemSize.getKnownMinValue());
-  if (!I.isArrayAllocation())
+  if (!I.isArrayAllocation()) {
+    AllocaTy = dyn_cast<StructType>(I.getAllocatedType());
     return SizeOffsetAPInt(align(Size, I.getAlign()), Zero);
+  }
 
   Value *ArraySize = I.getArraySize();
   if (const ConstantInt *C = dyn_cast<ConstantInt>(ArraySize)) {
@@ -1086,6 +1111,8 @@ SizeOffsetValue ObjectSizeOffsetEvaluator::compute(Value *V) {
 SizeOffsetValue ObjectSizeOffsetEvaluator::compute_(Value *V) {
   ObjectSizeOffsetVisitor Visitor(DL, TLI, Context, EvalOpts);
   SizeOffsetAPInt Const = Visitor.compute(V);
+  AllocaTy = Visitor.getAllocaType();
+
   if (Const.bothKnown())
     return SizeOffsetValue(ConstantInt::get(Context, Const.Size),
                            ConstantInt::get(Context, Const.Offset));
diff --git a/llvm/lib/IR/AutoUpgrade.cpp b/llvm/lib/IR/AutoUpgrade.cpp
index 1a9e474911b39c..0fafe91762d162 100644
--- a/llvm/lib/IR/AutoUpgrade.cpp
+++ b/llvm/lib/IR/AutoUpgrade.cpp
@@ -4393,12 +4393,43 @@ void llvm::UpgradeIntrinsicCall(CallBase *CI, Function *NewFn) {
     break;
 
   case Intrinsic::objectsize: {
-    Value *NullIsUnknownSize =
-        CI->arg_size() == 2 ? Builder.getFalse() : CI->getArgOperand(2);
-    Value *Dynamic =
-        CI->arg_size() < 4 ? Builder.getFalse() : CI->getArgOperand(3);
-    NewCall = Builder.CreateCall(
-        NewFn, {CI->getArgOperand(0), CI->getArgOperand(1), NullIsUnknownSize, Dynamic});
+    // The default behavior before the addition of the '<wholeobj>' argument
+    // was to return the size of the whole object.
+    Value *WholeObj = nullptr;
+    Value *UnknownVal = nullptr;
+    Value *NullIsUnknownSize = nullptr;
+    Value *Dynamic = nullptr;
+
+    switch (CI->arg_size()) {
+    case 2:
+      WholeObj = Builder.getTrue();
+      UnknownVal = CI->getArgOperand(1);
+      NullIsUnknownSize = Builder.getFalse();
+      Dynamic = Builder.getFalse();
+      break;
+    case 3:
+      WholeObj = Builder.getTrue();
+      UnknownVal = CI->getArgOperand(1);
+      NullIsUnknownSize = CI->getArgOperand(2);
+      Dynamic = Builder.getFalse();
+      break;
+    case 4:
+      WholeObj = Builder.getTrue();
+      UnknownVal = CI->getArgOperand(1);
+      NullIsUnknownSize = CI->getArgOperand(2);
+      Dynamic = CI->getArgOperand(3);
+      break;
+    case 5:
+      WholeObj = CI->getArgOperand(1);
+      UnknownVal = CI->getArgOperand(2);
+      NullIsUnknownSize = CI->getArgOperand(3);
+      Dynamic = CI->getArgOperand(4);
+      break;
+    }
+
+    NewCall =
+        Builder.CreateCall(NewFn, {CI->getArgOperand(0), WholeObj, UnknownVal,
+                                   NullIsUnknownSize, Dynamic});
     break;
   }
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
index 5e73411cae9b70..3bb203442dd513 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
@@ -1486,8 +1486,8 @@ bool AMDGPUPromoteAllocaImpl::tryPromoteAllocaToLDS(AllocaInst &I,
            PointerType::get(Context, AMDGPUAS::LOCAL_ADDRESS)});
 
       CallInst *NewCall = Builder.CreateCall(
-          ObjectSize,
-          {Src, Intr->getOperand(1), Intr->getOperand(2), Intr->getOperand(3)});
+          ObjectSize, {Src, Intr->getOperand(1), Intr->getOperand(2),
+                       Intr->getOperand(3), Intr->getOperand(4)});
       Intr->replaceAllUsesWith(NewCall);
       Intr->eraseFromParent();
       continue;
diff --git a/llvm/test/Analysis/CostModel/X86/free-intrinsics.ll b/llvm/test/Analysis/CostModel/X86/free-intrinsics.ll
index 3c39e456eed5f9..3d2e836b97517a 100644
--- a/llvm/test/Analysis/CostModel/X86/free-intrinsics.ll
+++ b/llvm/test/Analysis/CostModel/X86/free-intrinsics.ll
@@ -18,7 +18,7 @@ define i32 @trivially_free() {
 ; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %a4 = call i1 @llvm.is.constant.i32(i32 undef)
 ; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: call void @llvm.lifetime.start.p0(i64 1, ptr undef)
 ; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: call void @llvm.lifetime.end.p0(i64 1, ptr undef)
-; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %a5 = call i64 @llvm.objectsize.i64.p0(ptr undef, i1 true, i1 true, i1 true)
+; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %a5 = call i64 @llvm.objectsize.i64.p0(ptr undef, i1 true, i1 true, i1 true, i1 true)
 ; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %a6 = call ptr @llvm.ptr.annotation.p0.p0(ptr undef, ptr undef, ptr undef, i32 undef, ptr undef)
 ; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: call void @llvm.var.annotation.p0.p0(ptr undef, ptr undef, ptr undef, i32 undef, ptr undef)
 ; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret i32 undef
@@ -38,7 +38,7 @@ define i32 @trivially_free() {
 ; CHECK-THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %a4 = call i1 @llvm.is.constant.i32(i32 undef)
 ; CHECK-THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: call void @llvm.lifetime.start.p0(i64 1, ptr undef)
 ; CHECK-THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: call void @llvm.lifetime.end.p0(i64 1, ptr undef)
-; CHECK-THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %a5 = call i64 @llvm.objectsize.i64.p0(ptr undef, i1 true, i1 true, i1 true)
+; CHECK-THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %a5 = call i64 @llvm.objectsize.i64.p0(ptr undef, i1 true, i1 true, i1 true, i1 true)
 ; CHECK-THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %a6 = call ptr @llvm.ptr.annotation.p0.p0(ptr undef, ptr undef, ptr undef, i32 undef, ptr undef)
 ; CHECK-THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: call void @llvm.var.annotation.p0.p0(ptr undef, ptr undef, ptr undef, i32 undef, ptr undef)
 ; CHECK-THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
@@ -58,7 +58,7 @@ define i32 @trivially_free() {
   %a4 = call i1 @llvm.is.constant.i32(i32 undef)
   call void @llvm.lifetime.start.p0(i64 1, ptr undef)
   call void @llvm.lifetime.end.p0(i64 1, ptr undef)
-  %a5 = call i64 @llvm.objectsize.i64.p0(ptr undef, i1 1, i1 1, i1 1)
+  %a5 = call i64 @llvm.objectsize.i64.p0(ptr undef, i1 1, i1 1, i1 1, i1 1)
   %a6 = call ptr @llvm.ptr.annotation.p0(ptr undef, ptr undef, ptr undef, i32 undef, ptr undef)
   call void @llvm.var.annotation(ptr undef, ptr undef, ptr undef, i32 undef, ptr undef)
   ret i32 undef
@@ -79,7 +79,7 @@ declare ptr @llvm.strip.invariant.group.p0(ptr)
 declare i1 @llvm.is.constant.i32(i32)
 declare void @llvm.lifetime.start.p0(i64, ptr)
 declare void @llvm.lifetime.end.p0(i64, ptr)
-declare i64 @llvm.objectsize.i64.p0(ptr, i1, i1, i1)
+declare i64 @llvm.objectsize.i64.p0(ptr, i1, i1, i1, i1)
 declare ptr @llvm.ptr.annotation.p0(ptr, ptr, ptr, i32, ptr)
 declare void @llvm.var.annotation(ptr, ptr, ptr, i32, ptr)
 
diff --git a/llvm/test/Analysis/CostModel/free-intrinsics-datalayout.ll b/llvm/test/Analysis/CostModel/free-intrinsics-datalayout.ll
index b5f204c2d9b26e..3674e87a6fc05b 100644
--- a/llvm/test/Analysis/CostModel/free-intrinsics-datalayout.ll
+++ b/llvm/test/Analysis/CostModel/free-intrinsics-datalayout.ll
@@ -20,7 +20,7 @@ define i32 @trivially_free() {
 ; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %a4 = call i1 @llvm.is.constant.i32(i32 undef)
 ; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: call void @llvm.lifetime.start.p0(i64 1, ptr undef)
 ; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: call void @llvm.lifetime.end.p0(i64 1, ptr undef)
-; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %a5 = call i64 @llvm.objectsize.i64.p0(ptr undef, i1 true, i1 true, i1 true)
+; CHECK-SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %a...
[truncated]

Copy link

github-actions bot commented Jan 18, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@zygoloid zygoloid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've discussed this before, years ago. Previously, the sticking point was: how is LLVM going to know what the frontend considers the closest surrounding subobject to be? The LLVM type doesn't give you that information, and it seems like there's nowhere else that LLVM can get it from either, so this flag ends up not being useful and the best we can do is to give a conservatively-correct answer -- the complete object size if we want an upper bound, or 0 if we want a lower bound.

Clang does respect the "subobject" flag if it can symbolically evaluate the operand of __builtin_object_size sufficiently to determine which subobject is being referenced. Previously we've thought that that was the best we could do.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using anything but the size and alignment of the alloca type in a way that affects program semantics is illegal.

@bwendling
Copy link
Collaborator Author

We've discussed this before, years ago. Previously, the sticking point was: how is LLVM going to know what the frontend considers the closest surrounding subobject to be? The LLVM type doesn't give you that information, and it seems like there's nowhere else that LLVM can get it from either, so this flag ends up not being useful and the best we can do is to give a conservatively-correct answer -- the complete object size if we want an upper bound, or 0 if we want a lower bound.

Right now we're giving a wrong answer (see below), so that's no good. If we're going to err on the side of caution, then we should return the default "can't calculate the size" value.

When you say that we can't detect what the front-end considers the "closest surrounding subobject" to be, is that mostly due to corner cases or is it a more general concern? (Note, we're really only interested in supporting this for C structs. C++ structs / classes would require therapy.)

Clang does respect the "subobject" flag if it can symbolically evaluate the operand of __builtin_object_size sufficiently to determine which subobject is being referenced. Previously we've thought that that was the best we could do.

This is why the current behavior is wrong, in my opinion. The motivating example is below:

struct suspend_stats {
        int     success;
        int     fail;
        int     failed_freeze;
        int     failed_prepare;
        int     failed_suspend;
        int     failed_suspend_late;
        int     failed_suspend_noirq;
        int     failed_resume;
        int     failed_resume_early;
        int     failed_resume_noirq;
#define REC_FAILED_NUM  2
        int     last_failed_dev;
        char    failed_devs[REC_FAILED_NUM][40]; /* offsetof(struct suspend_stats, failed_devs) == 44 */
        int     last_failed_errno;
        int     bar;
};

#define report(x) __builtin_printf(#x ": %zu\n", x)

int main(int argc, char *argv[])
{
    struct suspend_stats foo;

    report(sizeof(foo.failed_devs[1]));
    report(sizeof(foo.failed_devs[argc]));
    report(__builtin_dynamic_object_size(&foo.fail, 0));
    report(__builtin_dynamic_object_size(&foo.fail, 1));
    report(__builtin_dynamic_object_size(&foo.failed_freeze, 0));
    report(__builtin_dynamic_object_size(foo.failed_devs[1], 0));
    report(__builtin_dynamic_object_size(foo.failed_devs[1], 1));
    report(__builtin_dynamic_object_size(foo.failed_devs[argc], 0));
    report(__builtin_dynamic_object_size(foo.failed_devs[argc], 1));

    return 0;
}

The output with this change is now:

__builtin_dynamic_object_size(&foo.fail, 0): 128
__builtin_dynamic_object_size(&foo.fail, 1): 4
__builtin_dynamic_object_size(&foo.failed_freeze, 0): 124
__builtin_dynamic_object_size(foo.failed_devs[1], 0): 48
__builtin_dynamic_object_size(foo.failed_devs[1], 1): 40
__builtin_dynamic_object_size(foo.failed_devs[argc], 0): 48
__builtin_dynamic_object_size(foo.failed_devs[argc], 1): 40

Without the change, the last line is:

__builtin_dynamic_object_size(foo.failed_devs[argc], 1): 48

Which isn't correct according to GNU's documentation. So if we can't honor the TYPE bit, then we should return -1 / 0 here, right?

@bwendling
Copy link
Collaborator Author

Using anything but the size and alignment of the alloca type in a way that affects program semantics is illegal.

I'm sorry, but I don't understand your comment here. Could you supply some context?

@zygoloid
Copy link
Collaborator

When you say that we can't detect what the front-end considers the "closest surrounding subobject" to be, is that mostly due to corner cases or is it a more general concern?

It's a more general concern: LLVM simply has no idea what the frontend considers to be a subobject. The LLVM type doesn't carry that information.

struct suspend_stats {
        //...
        char    failed_devs[REC_FAILED_NUM][40];
        int     last_failed_errno;
        int     bar;
};
//...

Without the change, the last line is:

__builtin_dynamic_object_size(foo.failed_devs[argc], 1): 48

Which isn't correct according to GNU's documentation. So if we can't honor the TYPE bit, then we should return -1 / 0 here, right?

Perhaps according to the GCC documentation as written. But mode 0 and 1 are in general asking for an upper bound on the accessible bytes (that is, an N so any.access beyond N bytes is definitely out of bounds), so it seems to me that returning -1 is strictly worse than returning 48. Do you have a use case for which -1 is a better answer?

I suspect the only change we're missing here is a change to our documentation to explicitly say that we give an upper/lower bound when we can't compute an exact size.

@bwendling
Copy link
Collaborator Author

Perhaps according to the GCC documentation as written. But mode 0 and 1 are in general asking for an upper bound on the accessible bytes (that is, an N so any.access beyond N bytes is definitely out of bounds), so it seems to me that returning -1 is strictly worse than returning 48.

It's the second bit that controls whether it's an upper bound or lower bound that's returned, not the least significant bit.

Do you have a use case for which -1 is a better answer?

I'm sure one could be constructed (for instance, someone wanting to check if it's okay to strcpy a string of size 48 to a pointer to failed_devs[argc] in the example above), but that's not really the point.

We're trying to implement a GNU builtin, and the only defined semantics we have to go on are GNU's documentation. I can't see how we can deviate from their documentation unless it's to say "we can't determine this value" and so return -1 instead of an answer that might be wildly wrong and potentially cause a memory leak of some sort. In my made-up example, if we said, "Yes you can write up to 48 bytes into p->failed_devs[argc], then a user may overwrite the two fields after field_devs. If we return -1, they'll have to take the "slow", but ideally more secure, route.

Lastly, returning 48 in the made-up case is different from the case when an immediate is used rather than a variable, which is certainly unexpected. We could document that we can't support GNU's implementation exactly, but I don't think that's good enough.

From what I can tell, GNU's idea of what constitutes a "closest surrounding sub-object" is one of a handful of things: the structure containing the field, the array being pointed to with a struct, or the field being pointed into. Example of the last one:

struct foo {
    int a;
    int b;
    int c;
};

size_t foo(struct foo *p, int index) {
    return __builtin_dynamic_object_size(&((char *)p->b)[idx], 1); // b is the sub-object.
}

All of these are explicit in the LLVM IR. Is the worry that they've been changed from some transformations? Or are there others I'm missing?

@nikic
Copy link
Contributor

nikic commented Jan 19, 2024

@bwendling I think you are reading the GCC docs too pedantically. In particular, they also say

If there are multiple objects ptr can point to and all of them are known at compile time, the returned number is the maximum of remaining byte counts in those objects if type & 2 is 0 and minimum if nonzero.

which makes it abundantly clear that what you get is an upper bound or lower bound, respectively. -1 and 0 are just the upper and lower bounds if you have no useful information at all. If you want to check whether the size is exactly known, you'll have to compare both bounds. Outside of doing that, you can never assume that the bound is precise.

Btw, it looks like your initial example gets 48 for both modes on GCC as well? https://c.godbolt.org/z/EfGWv4Wrh

All of these are explicit in the LLVM IR. Is the worry that they've been changed from some transformations? Or are there others I'm missing?

Apart from the fact that what you are doing is simply illegal under our IR semantics, a practical case where this will likely compute incorrect results are unions. For unions, clang will use the type of the union member with the largest size as the alloca type, regardless of which union member is active. I haven't tried, but your patch will probably compute the subobject size based on that arbitrarily picked member, rather than the one being accessed.

@zygoloid
Copy link
Collaborator

For unions, clang will use the type of the union member with the largest size as the alloca type, regardless of which union member is active. I haven't tried, but your patch will probably compute the subobject size based on that arbitrarily picked member, rather than the one being accessed.

Another concrete case where using the IR type won't work is arrays whose initializer is mostly-zeroes. For example, for

int arr[20] = {[5] = 6};

we give arr the IR type <{ i32, i32, i32, i32, i32, i32, [14 x i32] }> so that we can use a zeroinitializer to fill the trailing portion of the array.

@bwendling
Copy link
Collaborator Author

@bwendling I think you are reading the GCC docs too pedantically. In particular, they also say

If there are multiple objects ptr can point to and all of them are known at compile time, the returned number is the maximum of remaining byte counts in those objects if type & 2 is 0 and minimum if nonzero.

Isn't this referring to pointing to an object in a union? That's only way this makes sense (in C).

which makes it abundantly clear that what you get is an upper bound or lower bound, respectively. -1 and 0 are just the upper and lower bounds if you have no useful information at all. If you want to check whether the size is exactly known, you'll have to compare both bounds. Outside of doing that, you can never assume that the bound is precise.

I'm still not convinced that it's abundantly clear... but if we want to compare max vs. min, things get worse because we return 0 when the type is 3 (https://godbolt.org/z/4f5onM3W7). So even with checking upper and lower bounds, we come away knowing nothing about the size of the object (probably a deficiency in Clang).

Btw, it looks like your initial example gets 48 for both modes on GCC as well? https://c.godbolt.org/z/EfGWv4Wrh

If you remove the argc = 1;, you get the difference. Your example might be a GCC bug.

All of these are explicit in the LLVM IR. Is the worry that they've been changed from some transformations? Or are there others I'm missing?

Apart from the fact that what you are doing is simply illegal under our IR semantics, a practical case where this will likely compute incorrect results are unions.

?? I think it's a bit much to say that this is "illegal." We may not get every instance of something, but we can certainly determine the size of a struct. If by "illegal" you mean determining what a sub-object is, I don't agree but if it's so there might be a way to retain whatever information is needed to determine what a sub-object is.

For unions, clang will use the type of the union member with the largest size as the alloca type, regardless of which union member is active. I haven't tried, but your patch will probably compute the subobject size based on that arbitrarily picked member, rather than the one being accessed.

We don't seem to support unions correctly now:

$ cat ~/llvm/__bdos3.c
struct suspend_stats {
        int     failed_resume_noirq;
        int     last_failed_dev;
        union {
          char    failed_devs[2][40];
          char    failed_devs2[38];
        };
        int     last_failed_errno;
        int     bar;
};

#define report(x) __builtin_printf(#x ": %zu\n", x)

int main(int argc, char *argv[])
{
    struct suspend_stats foo;

    report(__builtin_dynamic_object_size(&foo.failed_devs2[argc], 0));
    report(__builtin_dynamic_object_size(&foo.failed_devs2[argc], 1));

    report(__builtin_dynamic_object_size(&foo.failed_devs2[argc], 2));
    report(__builtin_dynamic_object_size(&foo.failed_devs2[argc], 3));


    return 0;
}
$ clang -O2 ~/llvm/__bdos3.c && ./a.out
__builtin_dynamic_object_size(&foo.failed_devs2[argc], 0): 87
__builtin_dynamic_object_size(&foo.failed_devs2[argc], 1): 87
__builtin_dynamic_object_size(&foo.failed_devs2[argc], 2): 87
__builtin_dynamic_object_size(&foo.failed_devs2[argc], 3): 0
$ gcc -O2 ~/llvm/__bdos3.c && ./a.out
__builtin_dynamic_object_size(&foo.failed_devs2[argc], 0): 87
__builtin_dynamic_object_size(&foo.failed_devs2[argc], 1): 37
__builtin_dynamic_object_size(&foo.failed_devs2[argc], 2): 87
__builtin_dynamic_object_size(&foo.failed_devs2[argc], 3): 37

@bwendling
Copy link
Collaborator Author

For unions, clang will use the type of the union member with the largest size as the alloca type, regardless of which union member is active. I haven't tried, but your patch will probably compute the subobject size based on that arbitrarily picked member, rather than the one being accessed.

Another concrete case where using the IR type won't work is arrays whose initializer is mostly-zeroes. For example, for

int arr[20] = {[5] = 6};

we give arr the IR type <{ i32, i32, i32, i32, i32, i32, [14 x i32] }> so that we can use a zeroinitializer to fill the trailing portion of the array.

My patch only works with structs.

@zygoloid
Copy link
Collaborator

But mode 0 and 1 are in general asking for an upper bound on the accessible bytes (that is, an N so any.access beyond N bytes is definitely out of bounds), so it seems to me that returning -1 is strictly worse than returning 48.

It's the second bit that controls whether it's an upper bound or lower bound that's returned, not the least significant bit.

Right, that's what I said: modes 0 and 1 ask for an upper bound. (Modes 2 and 3 ask for a lower bound.)

We're trying to implement a GNU builtin, and the only defined semantics we have to go on are GNU's documentation. I can't see how we can deviate from their documentation unless it's to say "we can't determine this value" and so return -1 instead of an answer that might be wildly wrong and potentially cause a memory leak of some sort.

We're not really deviating from the documented rules. By the time we get to LLVM IR lowering of the builtin, we have lost the precise frontend information. But we know the pointer might point into the complete object (where it would be able to write to 48 bytes) or to some subobject of that complete object (where it would be able to write to 48 bytes or less). Therefore the correct answer to return is 48.

In the same way, it would be correct to return 48 here:

char a[48];
char b[40];
bool always_false = false; // happens to never be modified
int n = __builtin_dynamic_object_size(always_false ? a  : b, 1);

... though we miscompile this and return 40 regardless of whether we're in upper bound or lower bound mode. :-(

In my made-up example, if we said, "Yes you can write up to 48 bytes into p->failed_devs[argc], then a user may overwrite the two fields after field_devs. If we return -1, they'll have to take the "slow", but ideally more secure, route.

No, that is not what we are saying when we return 48. We're saying "a write past 48 bytes is definitely bad". If you want a lower bound on the number of bytes that you can write, then you need to use mode 2 or 3 instead.

@zygoloid
Copy link
Collaborator

Taking a step back, while this patch is not the right direction, we can and should do better for the original example. Probably the best way to do that is to analyze the operand to __builtin_[dynamic_]object_size in the frontend and compute a better bound based on the form of the expression. It looks like it should be feasible to produce a tighter bound for an array-to-pointer-decay expression like f.bar[argc] in subobject mode, as:

  • select llvm.is.constant(argc) and (argc < 0 or argc >= 2), 0, sizeof(f.bar[argc]) for the non-dynamic case, and just
  • select (argc < 0 or argc >= 2), 0, sizeof(f.bar[argc]) for the dynamic case.

A possibly simpler alternative would be for the frontend to pass an upper bound on the result to the LLVM builtin in mode 1, so Clang could say "I know the result will never be more than 40" and LLVM could provide either that size or the complete object size, whichever is smaller. That wouldn't give as tight a bound for the argc == 2 case, though.

@bwendling
Copy link
Collaborator Author

But mode 0 and 1 are in general asking for an upper bound on the accessible bytes (that is, an N so any.access beyond N bytes is definitely out of bounds), so it seems to me that returning -1 is strictly worse than returning 48.

It's the second bit that controls whether it's an upper bound or lower bound that's returned, not the least significant bit.

Right, that's what I said: modes 0 and 1 ask for an upper bound. (Modes 2 and 3 ask for a lower bound.)

Perhaps we need clarification on what GCC means by "may point to multiple objects" in this instance. To me that means either "get me the size of the largest of these multiple objects" or "size of the smallest." In my eyes, that means pointing to a union field.

We're trying to implement a GNU builtin, and the only defined semantics we have to go on are GNU's documentation. I can't see how we can deviate from their documentation unless it's to say "we can't determine this value" and so return -1 instead of an answer that might be wildly wrong and potentially cause a memory leak of some sort.

We're not really deviating from the documented rules. By the time we get to LLVM IR lowering of the builtin, we have lost the precise frontend information. But we know the pointer might point into the complete object (where it would be able to write to 48 bytes) or to some subobject of that complete object (where it would be able to write to 48 bytes or less). Therefore the correct answer to return is 48.

In the same way, it would be correct to return 48 here:

char a[48];
char b[40];
bool always_false = false; // happens to never be modified
int n = __builtin_dynamic_object_size(always_false ? a  : b, 1);

... though we miscompile this and return 40 regardless of whether we're in upper bound or lower bound mode. :-(

In my made-up example, if we said, "Yes you can write up to 48 bytes into p->failed_devs[argc], then a user may overwrite the two fields after field_devs. If we return -1, they'll have to take the "slow", but ideally more secure, route.

No, that is not what we are saying when we return 48. We're saying "a write past 48 bytes is definitely bad". If you want a lower bound on the number of bytes that you can write, then you need to use mode 2 or 3 instead.

But in the example, writing up to 48 bytes is definitely bad. And as I mentioned, we disagree on what's meant by "upper" and "lower" bounds in the case of a pointer to a non-union field. For me, that's managed by the first bit.

I know that we lose precise struct information going to LLVM IR. If that's what's needed here, there are ways to pass this information along. We retain this information via DWARF. We could use similar metadata for this instance. Would that be acceptable?

@bwendling
Copy link
Collaborator Author

Taking a step back, while this patch is not the right direction, we can and should do better for the original example. Probably the best way to do that is to analyze the operand to __builtin_[dynamic_]object_size in the frontend and compute a better bound based on the form of the expression. It looks like it should be feasible to produce a tighter bound for an array-to-pointer-decay expression like f.bar[argc] in subobject mode, as:

  • select llvm.is.constant(argc) and (argc < 0 or argc >= 2), 0, sizeof(f.bar[argc]) for the non-dynamic case, and just
  • select (argc < 0 or argc >= 2), 0, sizeof(f.bar[argc]) for the dynamic case.

A possibly simpler alternative would be for the frontend to pass an upper bound on the result to the LLVM builtin in mode 1, so Clang could say "I know the result will never be more than 40" and LLVM could provide either that size or the complete object size, whichever is smaller. That wouldn't give as tight a bound for the argc == 2 case, though.

It might be possible to do something like this. If retaining precise type information from the front-end is the sticking issue, it could be as simple as casting this pointer top the type of the sub-object. I'll see what I can do.

@zygoloid
Copy link
Collaborator

Perhaps we need clarification on what GCC means by "may point to multiple objects" in this instance. To me that means either "get me the size of the largest of these multiple objects" or "size of the smallest." In my eyes, that means pointing to a union field.

Per @nikic's example, it seems reasonably clear to me that GCC's intended semantics are to get either the best upper bound or the best lower bound that the compiler is able to compute. I mean sure, we can ask, and it does no harm to do so, but it is not reasonable to expect that a quantity like this coming from the vagaries of GCC's implementation details will be exactly the same for all examples when computed by a different implementation. (It's not even the same across optimization levels in GCC.)

I know that we lose precise struct information going to LLVM IR. If that's what's needed here, there are ways to pass this information along. We retain this information via DWARF. We could use similar metadata for this instance. Would that be acceptable?

In theory, yes, we could preserve enough information in metadata to compute this in the middle-end. But we would need to emit a lot of metadata, just on the off-chance that after optimization we happen to have a builtin_object_size query that points to each object that we emit, so in practice I don't think there's any chance we can do this. We can't use DWARF, because it won't necessarily be available (and typically won't be available in the interesting case where we find the object size only after optimization), and in any case, the presence or absence of DWARF isn't supposed to affect the executable code. Also, we'd need to annotate things that simply don't exist at all in the LLVM IR:

typedef struct X { int a, b; } X;
int f(void) {
  X *p = malloc(sizeof(X));
  int *q = &p->a;
  return __builtin_object_size(q, 1);
}

Here, f ideally would return 4, but at the LLVM IR level, p and q are identical values and the &p->a operation is a no-op. In cases like this, the best we can realistically do is to return 8.

@bwendling
Copy link
Collaborator Author

Perhaps we need clarification on what GCC means by "may point to multiple objects" in this instance. To me that means either "get me the size of the largest of these multiple objects" or "size of the smallest." In my eyes, that means pointing to a union field.

Per @nikic's example, it seems reasonably clear to me that GCC's intended semantics are to get either the best upper bound or the best lower bound that the compiler is able to compute. I mean sure, we can ask, and it does no harm to do so, but it is not reasonable to expect that a quantity like this coming from the vagaries of GCC's implementation details will be exactly the same for all examples when computed by a different implementation. (It's not even the same across optimization levels in GCC.)

I completely agree that using GCC's implementation and documentation is definitely lacking. That's the pitfalls of relying upon features added not by the standards committee, but by compiler developers. They're likely to be buggy as hell for unforeseen corner cases (despite some developers being on the standards committee).

My answer for the question "what's the semantics of GCC's builtin X?" has always been "whatever GCC does." It's the best we can rely upon. But then we get into situations like this, where you and @nikic have one interpretation of their documentation and I have another. I can point to their behavior to back up my claim, but in the end it's probably not exactly clear even to GCC.

My concern is that we want to use this for code hardening. Without precise object sizes, we're hampered in our goal. The unfortunate reality is that we can only get that size via these __builtin_[dynamic_]object_size functions. Thankfully, Apple has a way to help alleviate some of these issues, which they're push out soon-ish.

I know that we lose precise struct information going to LLVM IR. If that's what's needed here, there are ways to pass this information along. We retain this information via DWARF. We could use similar metadata for this instance. Would that be acceptable?

In theory, yes, we could preserve enough information in metadata to compute this in the middle-end. But we would need to emit a lot of metadata, just on the off-chance that after optimization we happen to have a builtin_object_size query that points to each object that we emit, so in practice I don't think there's any chance we can do this. We can't use DWARF, because it won't necessarily be available (and typically won't be available in the interesting case where we find the object size only after optimization), and in any case, the presence or absence of DWARF isn't supposed to affect the executable code. Also, we'd need to annotate things that simply don't exist at all in the LLVM IR:

typedef struct X { int a, b; } X;
int f(void) {
  X *p = malloc(sizeof(X));
  int *q = &p->a;
  return __builtin_object_size(q, 1);
}

Here, f ideally would return 4, but at the LLVM IR level, p and q are identical values and the &p->a operation is a no-op. In cases like this, the best we can realistically do is to return 8.

The sub-object for &p->a and even &p->b is struct X, not the integers themselves. If you want that, you'll have to use casts: &((char *)p->b)[2];. (I had to take care to get that correct.) So f should return 8 (note it's likely to get 8 from the alloc_size attribute on malloc in your example).

For the record, I didn't mean that we should use DWARF. I'm not that much of a masochist. :-)

@zygoloid
Copy link
Collaborator

My answer for the question "what's the semantics of GCC's builtin X?" has always been "whatever GCC does." It's the best we can rely upon. But then we get into situations like this, where you and @nikic have one interpretation of their documentation and I have another. I can point to their behavior to back up my claim, but in the end it's probably not exactly clear even to GCC.

@nikic demonstrated that our current behavior is already compatible with GCC's behavior. If GCC's behavior is the spec, then we are allowed to return 48 rather than only 40 or -1 (or presumably 0 if argc is out of bounds) for the original example, because in some cases GCC does so.

My concern is that we want to use this for code hardening. Without precise object sizes, we're hampered in our goal. The unfortunate reality is that we can only get that size via these __builtin_[dynamic_]object_size functions.

That's a totally understandable desire, but I think it's not realistic to expect precise subobject sizes in the same cases that GCC can provide them, due to the different architectural choices in the two compilers. If we had a mid-level IR for Clang that still had frontend information, we could do better by evaluating BOS there, so maybe that's one long term path forward to consider. And in the short term, while there are cases where we won't be able to match GCC, I think Clang should do better than it currently does in the frontend, specifically in cases like the one in the bug report where there's an obvious better answer that doesn't require any sophisticated analysis to discover.

Here, f ideally would return 4, but at the LLVM IR level, p and q are identical values and the &p->a operation is a no-op. In cases like this, the best we can realistically do is to return 8.

The sub-object for &p->a and even &p->b is struct X, not the integers themselves. If you want that, you'll have to use casts: &((char *)p->b)[2];. (I had to take care to get that correct.) So f should return 8 (note it's likely to get 8 from the alloc_size attribute on malloc in your example).

GCC disagrees with you: https://godbolt.org/z/s4P74oEqx

@bwendling
Copy link
Collaborator Author

My answer for the question "what's the semantics of GCC's builtin X?" has always been "whatever GCC does." It's the best we can rely upon. But then we get into situations like this, where you and @nikic have one interpretation of their documentation and I have another. I can point to their behavior to back up my claim, but in the end it's probably not exactly clear even to GCC.

@nikic demonstrated that our current behavior is already compatible with GCC's behavior. If GCC's behavior is the spec, then we are allowed to return 48 rather than only 40 or -1 (or presumably 0 if argc is out of bounds) for the original example, because in some cases GCC does so.

No, he exposed a fluke in their implementation. I gave examples of where clang gets it wrong, and yet I apparently haven't "proven" anything according to this logic?

My concern is that we want to use this for code hardening. Without precise object sizes, we're hampered in our goal. The unfortunate reality is that we can only get that size via these __builtin_[dynamic_]object_size functions.

That's a totally understandable desire, but I think it's not realistic to expect precise _sub_object sizes in the same cases that GCC can provide them, due to the different architectural choices in the two compilers. If we had a mid-level IR for Clang that still had frontend information, we could do better by evaluating BOS there, so maybe that's one long term path forward to consider. And in the short term, while there are cases where we won't be able to match GCC, I think Clang should do better than it currently does in the frontend,

We're not talking about something like inline asm, where the internals of their register allocator is more-or-less exposed to the user, and therefore it's next to impossible for us to replicate all of it. This feature is far simpler, and should be something we can determine based on all of the information we're given.

specifically in cases like the one in the bug report where there's an obvious better answer that doesn't require any sophisticated analysis to discover.

What "obvious better answer" is that?

@siddhesh
Copy link
Contributor

Perhaps we need clarification on what GCC means by "may point to multiple objects" in this instance. To me that means either "get me the size of the largest of these multiple objects" or "size of the smallest." In my eyes, that means pointing to a union field.

It's not just a union field, it could literally point to one of many potential objects, e.g.:

  struct A *obj1 = ...;
  char *obj2 = ...;
  int *obj3 = ...;

  void *ptr = cond1 == satisfied ? (cond2 == satisfied ? obj1 : obj2) : obj3;

  return __builtin_object_size (ptr, 1);

Here, __builtin_object_size will return the maximum estimate it can compute among the three potential objects that ptr could point to. __builtin_dynamic_object_size is special in that it can handle this case, i.e. return an expression that will compute the size of the object returned through that condition but I retained the 'estimate' wording for the specification in that documentation because of reasons that have already been touched upon in this discussion.

In general, we want to allow returning a conservative estimate whenever possible and not fail. This is a deliberate design decision because the key usage of __builtin_object_size (and consequently, __builtin_dynamic_object_size) is exploit mitigation and we're much better off providing a conservative return value than completely bailing out. In the above case, returning the whole object size where the subobject size is unavailable is better than bailing out because while it cannot do the ideal thing of protecting the precise bounds, it at least bounds any potential overflows to the whole object., thus limiting what can be done in an exploit that uses this overflow.

I obviously have no comments on the patch itself, but hopefully this gives enough context to interpret the GCC implementation and/or documentation. It would be ideal to always return a precise object size expression, but falling back to an estimate is better than simply bailing out.

@efriedma-quic
Copy link
Collaborator

Trying to summarize my understanding here:

  • Using the type of an alloca in LLVM IR is wrong, for a variety of reasons. (At this point, the type of an alloca is basically just leftover junk from before the opaque pointer transition; I expect that we'll kill it off completely at some point.)
  • There's some dispute over what it means to compute the size of a subobject "conservatively", in particular what's supposed to happen if we can't identify a subobject.
  • clang returns significantly worse results than gcc in many cases because we don't encode subobject information into LLVM IR, and there's no IR between clang ASTs and LLVM IR we can use for this sort of analysis. Encoding the information into LLVM IR (metadata on allocas, or something like that) might be feasible, but tricky to design well. Doing dataflow analysis directly on the clang AST is extremely complicated and bug-prone because we don't directly encode control flow.
  • Both gcc and LLVM compute object sizes after optimization in some cases; it's impossible to completely replicate that behavior because it depends on unrelated optimizations driven by heuristics. But that's not really relevant for the cases we're discussing here.

@nikic
Copy link
Contributor

nikic commented Jan 23, 2024

Thanks for summarizing, that matches my understanding.

As to how to implement this "properly", that would probably be https://discourse.llvm.org/t/exploring-the-effects-and-uses-of-the-memory-region-declaration-intrinsic/72756. The memory.region.decl intrinsics effectively encode subobjects. However, doing this has some fairly substantial externalities.

@bwendling
Copy link
Collaborator Author

It matches my understanding too. There are more issues with __bdos that arose due to this discussion and related discussions with the GCC maintainers.

I'm exploring some ways of fixing this issue:

  • As @efriedma-quic mentioned, generating the size calculation in the front end (thus eliminating the need for the intrinsic) would be difficult in the general case, but may be possible for some simpler ones. One benefit of generating the calculation instead of the intrinsic is a potential win for inlining.

  • If the main issue is that we don't know the subtype information, maybe we could add the subtype as part of the llvm.objectsize intrinsic and use that instead of grappling with the whole object's type. I'm unsure if this will work correctly for all cases though. (I don't readily know of any transformation that changes a structure's layout. Does it exist?) It's similar to the llvm.memory.region.decl proposal, but less intrusive.

@efriedma-quic
Copy link
Collaborator

maybe we could add the subtype as part of the llvm.objectsize intrinsic and use that instead of grappling with the whole object's type

I'm not sure I follow; if you know the object's type, doesn't that mean you also know its size?

(I don't readily know of any transformation that changes a structure's layout. Does it exist?)

Any such transform has to reason about all the uses, so the llvm.objectsize call itself would prevent the transform from happening.

@bwendling
Copy link
Collaborator Author

maybe we could add the subtype as part of the llvm.objectsize intrinsic and use that instead of grappling with the whole object's type

I'm not sure I follow; if you know the object's type, doesn't that mean you also know its size?

Not necessarily. If you have something like this:

struct x {
    int a;
    char foo[2][40];
    int b;
    int c;
};

size_t f(struct x *p, int idx) {
    return __builtin_dynamic_object_size(&p->foo[idx], 1);
}

But in general, it would be tricky for something like:

__builtin_dynamic_object_size(&((char *)p)[idx], 1);

which I'm not sure if that's something GCC can handle. The documentation says that if the pointer can point to multiple objects, then we can return the whole object value (or something to that affect).

(I don't readily know of any transformation that changes a structure's layout. Does it exist?)

Any such transform has to reason about all the uses, so the llvm.objectsize call itself would prevent the transform from happening.

Bon! :-)

@efriedma-quic
Copy link
Collaborator

struct x {
    int a;
    char foo[2][40];
    int b;
    int c;
};

size_t f(struct x *p, int idx) {
    return __builtin_dynamic_object_size(&p->foo[idx], 1);
}

If I'm following correctly, the return here is 0, 40, or 80, depending on the value of idx? That's not a constant, but the computation is entirely syntactic; it doesn't matter what "p" actually points to. So clang can lower the builtin itself. Currently it doesn't, I think, because all the relevant code is in ExprConstant, but the code could be adapted.

The problem, really, is that we can't easily extend that approach to stuff like the following:

size_t f(struct x *p, int idx) {
    char *c = &p->foo[idx];
    return __builtin_dynamic_object_size(c, 1);
}

@bwendling
Copy link
Collaborator Author

struct x {
    int a;
    char foo[2][40];
    int b;
    int c;
};

size_t f(struct x *p, int idx) {
    return __builtin_dynamic_object_size(&p->foo[idx], 1);
}

If I'm following correctly, the return here is 0, 40, or 80, depending on the value of idx? That's not a constant, but the computation is entirely syntactic; it doesn't matter what "p" actually points to. So clang can lower the builtin itself. Currently it doesn't, I think, because all the relevant code is in ExprConstant, but the code could be adapted.

Right. That's what I want to add to the front-end.

The problem, really, is that we can't easily extend that approach to stuff like the following:

size_t f(struct x *p, int idx) {
    char *c = &p->foo[idx];
    return __builtin_dynamic_object_size(c, 1);
}

Yup! I've been forbidden from doing this in the back-end, so I have to jump through hoops now and do partial solutions and hope that it works for most people and that when we get it wrong it doesn't hurt security (spoilers: it will).

@arsenm arsenm removed their request for review January 30, 2024 06:02
@zygoloid
Copy link
Collaborator

It might be academic at this point, but for what it's worth, __builtin_dynamic_object_size is not a GCC builtin that clang copied, it's a clang builtin that GCC copied.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants