Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Clang] Implement the 'counted_by' attribute #76348

Merged
merged 11 commits into from
Jan 10, 2024

Conversation

bwendling
Copy link
Collaborator

@bwendling bwendling commented Dec 25, 2023

The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:

  struct bar;
  struct foo {
    int count;
    struct inner {
      struct {
        int count; /* The 'count' referenced by 'counted_by' */
      };
      struct {
        /* ... */
        struct bar *array[] __attribute__((counted_by(count)));
      };
    } baz;
  };

This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':

  struct bar;
  struct foo {
    size_t count;
     /* ... */
    struct bar *array[] __attribute__((counted_by(count)));
  };

This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have at least 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.

In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not not being detected:

  struct foo *p;

  void foo_alloc(size_t count) {
    p = malloc(MAX(sizeof(struct foo),
                   offsetof(struct foo, array[0]) + count *
                       sizeof(struct bar *)));
    p->count = count + 42;
  }

The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:

  void use_foo(int index, int val) {
    p->count += 42;
    p->array[index] = val; /* The sanitizer can't properly check this access */
  }

In this example, an update to 'p->count' maintains the relationship
requirement:

  void use_foo(int index, int val) {
    if (p->count == 0)
      return;
    --p->count;
    p->array[index] = val;
  }

The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member.

This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':

  struct bar;
  struct foo {
    size_t count;
     /* ... */
    struct bar *array[] __attribute__((counted_by(count)));
  };

This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.

In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not not being detected:

  struct foo *p;

  void foo_alloc(size_t count) {
    p = malloc(MAX(sizeof(struct foo),
                   offsetof(struct foo, array[0]) + count *
                       sizeof(struct bar *)));
    p->count = count + 42;
  }

The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:

  void use_foo(int index, int val) {
    p->count += 42;
    p->array[index] = val; /* The sanitizer can't properly check this access */
  }

In this example, an update to 'p->count' maintains the relationship
requirement:

  void use_foo(int index, int val) {
    if (p->count == 0)
      return;
    --p->count;
    p->array[index] = val;
  }
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen labels Dec 25, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Dec 25, 2023

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-clang

Author: Bill Wendling (bwendling)

Changes

The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:

struct bar;
struct foo {
int count;
struct inner {
struct {
int count; /* The 'count' referenced by 'counted_by' /
};
struct {
/
... */
struct bar *array[] attribute((counted_by(count)));
};
} baz;
};

This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':

struct bar;
struct foo {
size_t count;
/* ... */
struct bar *array[] attribute((counted_by(count)));
};

This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have at least 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.

In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not not being detected:

struct foo *p;

void foo_alloc(size_t count) {
p = malloc(MAX(sizeof(struct foo),
offsetof(struct foo, array[0]) + count *
sizeof(struct bar *)));
p->count = count + 42;
}

The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:

void use_foo(int index, int val) {
p->count += 42;
p->array[index] = val; /* The sanitizer can't properly check this access */
}

In this example, an update to 'p->count' maintains the relationship
requirement:

void use_foo(int index, int val) {
if (p->count == 0)
return;
--p->count;
p->array[index] = val;
}


Patch is 183.31 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/76348.diff

20 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+5)
  • (modified) clang/include/clang/AST/DeclBase.h (+10)
  • (modified) clang/include/clang/Basic/Attr.td (+18)
  • (modified) clang/include/clang/Basic/AttrDocs.td (+78)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+13)
  • (modified) clang/include/clang/Sema/Sema.h (+3)
  • (modified) clang/include/clang/Sema/TypoCorrection.h (+8-4)
  • (modified) clang/lib/AST/ASTImporter.cpp (+13)
  • (modified) clang/lib/AST/DeclBase.cpp (+73-1)
  • (modified) clang/lib/AST/Expr.cpp (+10-73)
  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+191)
  • (modified) clang/lib/CodeGen/CGExpr.cpp (+345-6)
  • (modified) clang/lib/CodeGen/CodeGenFunction.h (+21)
  • (modified) clang/lib/Sema/SemaDecl.cpp (+6)
  • (modified) clang/lib/Sema/SemaDeclAttr.cpp (+133)
  • (modified) clang/lib/Sema/SemaExpr.cpp (+9-7)
  • (added) clang/test/CodeGen/attr-counted-by.c (+1815)
  • (modified) clang/test/CodeGen/bounds-checking.c (+9-1)
  • (modified) clang/test/Misc/pragma-attribute-supported-attributes-list.test (+1)
  • (added) clang/test/Sema/attr-counted-by.c (+64)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index ee211c16a48ac8..cfd89e32c3d0b9 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -202,6 +202,11 @@ C Language Changes
 - Enums will now be represented in TBAA metadata using their actual underlying
   integer type. Previously they were treated as chars, which meant they could
   alias with all other types.
+- Clang now supports the C-only attribute ``counted_by``. When applied to a
+  struct's flexible array member, it points to the struct field that holds the
+  number of elements in the flexible array member. This information can improve
+  the results of the array bound sanitizer and the
+  ``__builtin_dynamic_object_size`` builtin.
 
 C23 Feature Support
 ^^^^^^^^^^^^^^^^^^^
diff --git a/clang/include/clang/AST/DeclBase.h b/clang/include/clang/AST/DeclBase.h
index 10dcbdb262d84e..5b1038582bc674 100644
--- a/clang/include/clang/AST/DeclBase.h
+++ b/clang/include/clang/AST/DeclBase.h
@@ -19,6 +19,7 @@
 #include "clang/AST/SelectorLocationsKind.h"
 #include "clang/Basic/IdentifierTable.h"
 #include "clang/Basic/LLVM.h"
+#include "clang/Basic/LangOptions.h"
 #include "clang/Basic/SourceLocation.h"
 #include "clang/Basic/Specifiers.h"
 #include "llvm/ADT/ArrayRef.h"
@@ -488,6 +489,15 @@ class alignas(8) Decl {
   // Return true if this is a FileContext Decl.
   bool isFileContextDecl() const;
 
+  /// Whether it resembles a flexible array member. This is a static member
+  /// because we want to be able to call it with a nullptr. That allows us to
+  /// perform non-Decl specific checks based on the object's type and strict
+  /// flex array level.
+  static bool isFlexibleArrayMemberLike(
+      ASTContext &Context, const Decl *D, QualType Ty,
+      LangOptions::StrictFlexArraysLevelKind StrictFlexArraysLevel,
+      bool IgnoreTemplateOrMacroSubstitution);
+
   ASTContext &getASTContext() const LLVM_READONLY;
 
   /// Helper to get the language options from the ASTContext.
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index db17211747b17d..f8f55ef513d07e 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -4349,3 +4349,21 @@ def CodeAlign: StmtAttr {
     static constexpr int MaximumAlignment = 4096;
   }];
 }
+
+def CountedBy : InheritableAttr {
+  let Spellings = [Clang<"counted_by">];
+  let Subjects = SubjectList<[Field]>;
+  let Args = [IdentifierArgument<"CountedByField">];
+  let Documentation = [CountedByDocs];
+  let LangOpts = [COnly];
+  // FIXME: This is ugly. Let using a DeclArgument would be nice, but a Decl
+  // isn't yet available due to the fact that we're still parsing the
+  // structure. Maybe that code could be changed sometime in the future.
+  code AdditionalMembers = [{
+    private:
+      SourceRange CountedByFieldLoc;
+    public:
+      SourceRange getCountedByFieldLoc() const { return CountedByFieldLoc; }
+      void setCountedByFieldLoc(SourceRange Loc) { CountedByFieldLoc = Loc; }
+  }];
+}
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 98a7ecc7fd7df3..a795adc770f061 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -7722,3 +7722,81 @@ Both coroutines and coroutine wrappers are part of this analysis.
 .. _`CRT`: https://clang.llvm.org/docs/AttributeReference.html#coro-return-type
 }];
 }
+
+def CountedByDocs : Documentation {
+  let Category = DocCatField;
+  let Content = [{
+Clang supports the ``counted_by`` attribute on the flexible array member of a
+structure in C. The argument for the attribute is the name of a field member
+holding the count of elements in the flexible array. This information can be
+used to improve the results of the array bound sanitizer and the
+``__builtin_dynamic_object_size`` builtin. The ``count`` field member must be
+within the same non-anonymous, enclosing struct as the flexible array member.
+
+This example specifies that the flexible array member ``array`` has the number
+of elements allocated for it in ``count``:
+
+.. code-block:: c
+
+  struct bar;
+
+  struct foo {
+    size_t count;
+    char other;
+    struct bar *array[] __attribute__((counted_by(count)));
+  };
+
+This establishes a relationship between ``array`` and ``count``. Specifically,
+``array`` must have at least ``count`` number of elements available. It's the
+user's responsibility to ensure that this relationship is maintained through
+changes to the structure.
+
+In the following example, the allocated array erroneously has fewer elements
+than what's specified by ``p->count``. This would result in an out-of-bounds
+access not being detected.
+
+.. code-block:: c
+
+  #define SIZE_INCR 42
+
+  struct foo *p;
+
+  void foo_alloc(size_t count) {
+    p = malloc(MAX(sizeof(struct foo),
+                   offsetof(struct foo, array[0]) + count * sizeof(struct bar *)));
+    p->count = count + SIZE_INCR;
+  }
+
+The next example updates ``p->count``, but breaks the relationship requirement
+that ``p->array`` must have at least ``p->count`` number of elements available:
+
+.. code-block:: c
+
+  #define SIZE_INCR 42
+
+  struct foo *p;
+
+  void foo_alloc(size_t count) {
+    p = malloc(MAX(sizeof(struct foo),
+                   offsetof(struct foo, array[0]) + count * sizeof(struct bar *)));
+    p->count = count;
+  }
+
+  void use_foo(int index, int val) {
+    p->count += SIZE_INCR + 1; /* 'count' is now larger than the number of elements of 'array'. */
+    p->array[index] = val;     /* The sanitizer can't properly check this access. */
+  }
+
+In this example, an update to ``p->count`` maintains the relationship
+requirement:
+
+.. code-block:: c
+
+  void use_foo(int index, int val) {
+    if (p->count == 0)
+      return;
+    --p->count;
+    p->array[index] = val;
+  }
+  }];
+}
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index aebb7d9b945c33..26b7416b6e5c09 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -6441,6 +6441,19 @@ def warn_superclass_variable_sized_type_not_at_end : Warning<
   "field %0 can overwrite instance variable %1 with variable sized type %2"
   " in superclass %3">, InGroup<ObjCFlexibleArray>;
 
+def err_flexible_array_count_not_in_same_struct : Error<
+  "'counted_by' field %0 isn't within the same struct as the flexible array">;
+def err_counted_by_attr_not_on_flexible_array_member : Error<
+  "'counted_by' only applies to C99 flexible array members">;
+def err_counted_by_attr_refers_to_flexible_array : Error<
+  "'counted_by' cannot refer to the flexible array %0">;
+def err_counted_by_must_be_in_structure : Error<
+  "field %0 in 'counted_by' not inside structure">;
+def err_flexible_array_counted_by_attr_field_not_integer : Error<
+  "field %0 in 'counted_by' must be a non-boolean integer type">;
+def note_flexible_array_counted_by_attr_field : Note<
+  "field %0 declared here">;
+
 let CategoryName = "ARC Semantic Issue" in {
 
 // ARC-mode diagnostics.
diff --git a/clang/include/clang/Sema/Sema.h b/clang/include/clang/Sema/Sema.h
index 5e3b57ea33220b..40b8133202e23e 100644
--- a/clang/include/clang/Sema/Sema.h
+++ b/clang/include/clang/Sema/Sema.h
@@ -4799,6 +4799,8 @@ class Sema final {
   bool CheckAlwaysInlineAttr(const Stmt *OrigSt, const Stmt *CurSt,
                              const AttributeCommonInfo &A);
 
+  bool CheckCountedByAttr(Scope *Scope, const FieldDecl *FD);
+
   /// Adjust the calling convention of a method to be the ABI default if it
   /// wasn't specified explicitly.  This handles method types formed from
   /// function type typedefs and typename template arguments.
@@ -5642,6 +5644,7 @@ class Sema final {
                       CorrectionCandidateCallback &CCC,
                       TemplateArgumentListInfo *ExplicitTemplateArgs = nullptr,
                       ArrayRef<Expr *> Args = std::nullopt,
+                      DeclContext *LookupCtx = nullptr,
                       TypoExpr **Out = nullptr);
 
   DeclResult LookupIvarInObjCMethod(LookupResult &Lookup, Scope *S,
diff --git a/clang/include/clang/Sema/TypoCorrection.h b/clang/include/clang/Sema/TypoCorrection.h
index e0f8d152dbe554..09de164297e7ba 100644
--- a/clang/include/clang/Sema/TypoCorrection.h
+++ b/clang/include/clang/Sema/TypoCorrection.h
@@ -282,7 +282,7 @@ class CorrectionCandidateCallback {
 public:
   static const unsigned InvalidDistance = TypoCorrection::InvalidDistance;
 
-  explicit CorrectionCandidateCallback(IdentifierInfo *Typo = nullptr,
+  explicit CorrectionCandidateCallback(const IdentifierInfo *Typo = nullptr,
                                        NestedNameSpecifier *TypoNNS = nullptr)
       : Typo(Typo), TypoNNS(TypoNNS) {}
 
@@ -319,7 +319,7 @@ class CorrectionCandidateCallback {
   /// this method.
   virtual std::unique_ptr<CorrectionCandidateCallback> clone() = 0;
 
-  void setTypoName(IdentifierInfo *II) { Typo = II; }
+  void setTypoName(const IdentifierInfo *II) { Typo = II; }
   void setTypoNNS(NestedNameSpecifier *NNS) { TypoNNS = NNS; }
 
   // Flags for context-dependent keywords. WantFunctionLikeCasts is only
@@ -345,13 +345,13 @@ class CorrectionCandidateCallback {
            candidate.getCorrectionSpecifier() == TypoNNS;
   }
 
-  IdentifierInfo *Typo;
+  const IdentifierInfo *Typo;
   NestedNameSpecifier *TypoNNS;
 };
 
 class DefaultFilterCCC final : public CorrectionCandidateCallback {
 public:
-  explicit DefaultFilterCCC(IdentifierInfo *Typo = nullptr,
+  explicit DefaultFilterCCC(const IdentifierInfo *Typo = nullptr,
                             NestedNameSpecifier *TypoNNS = nullptr)
       : CorrectionCandidateCallback(Typo, TypoNNS) {}
 
@@ -365,6 +365,10 @@ class DefaultFilterCCC final : public CorrectionCandidateCallback {
 template <class C>
 class DeclFilterCCC final : public CorrectionCandidateCallback {
 public:
+  explicit DeclFilterCCC(const IdentifierInfo *Typo = nullptr,
+                         NestedNameSpecifier *TypoNNS = nullptr)
+      : CorrectionCandidateCallback(Typo, TypoNNS) {}
+
   bool ValidateCandidate(const TypoCorrection &candidate) override {
     return candidate.getCorrectionDeclAs<C>();
   }
diff --git a/clang/lib/AST/ASTImporter.cpp b/clang/lib/AST/ASTImporter.cpp
index 949310856562cd..571572fe65aecf 100644
--- a/clang/lib/AST/ASTImporter.cpp
+++ b/clang/lib/AST/ASTImporter.cpp
@@ -9021,6 +9021,10 @@ class AttrImporter {
 public:
   AttrImporter(ASTImporter &I) : Importer(I), NImporter(I) {}
 
+  // Useful for accessing the imported attribute.
+  template <typename T> T *castAttrAs() { return cast<T>(ToAttr); }
+  template <typename T> const T *castAttrAs() const { return cast<T>(ToAttr); }
+
   // Create an "importer" for an attribute parameter.
   // Result of the 'value()' of that object is to be passed to the function
   // 'importAttr', in the order that is expected by the attribute class.
@@ -9234,6 +9238,15 @@ Expected<Attr *> ASTImporter::Import(const Attr *FromAttr) {
                   From->args_size());
     break;
   }
+  case attr::CountedBy: {
+    AI.cloneAttr(FromAttr);
+    const auto *CBA = cast<CountedByAttr>(FromAttr);
+    Expected<SourceRange> SR = Import(CBA->getCountedByFieldLoc()).get();
+    if (!SR)
+      return SR.takeError();
+    AI.castAttrAs<CountedByAttr>()->setCountedByFieldLoc(SR.get());
+    break;
+  }
 
   default: {
     // The default branch works for attributes that have no arguments to import.
diff --git a/clang/lib/AST/DeclBase.cpp b/clang/lib/AST/DeclBase.cpp
index 5e03f0223d311c..e4d7169752bc85 100644
--- a/clang/lib/AST/DeclBase.cpp
+++ b/clang/lib/AST/DeclBase.cpp
@@ -29,7 +29,6 @@
 #include "clang/AST/Type.h"
 #include "clang/Basic/IdentifierTable.h"
 #include "clang/Basic/LLVM.h"
-#include "clang/Basic/LangOptions.h"
 #include "clang/Basic/Module.h"
 #include "clang/Basic/ObjCRuntime.h"
 #include "clang/Basic/PartialDiagnostic.h"
@@ -411,6 +410,79 @@ bool Decl::isFileContextDecl() const {
   return DC && DC->isFileContext();
 }
 
+bool Decl::isFlexibleArrayMemberLike(
+    ASTContext &Ctx, const Decl *D, QualType Ty,
+    LangOptions::StrictFlexArraysLevelKind StrictFlexArraysLevel,
+    bool IgnoreTemplateOrMacroSubstitution) {
+  // For compatibility with existing code, we treat arrays of length 0 or
+  // 1 as flexible array members.
+  const auto *CAT = Ctx.getAsConstantArrayType(Ty);
+  if (CAT) {
+    using FAMKind = LangOptions::StrictFlexArraysLevelKind;
+
+    llvm::APInt Size = CAT->getSize();
+    if (StrictFlexArraysLevel == FAMKind::IncompleteOnly)
+      return false;
+
+    // GCC extension, only allowed to represent a FAM.
+    if (Size.isZero())
+      return true;
+
+    if (StrictFlexArraysLevel == FAMKind::ZeroOrIncomplete && Size.uge(1))
+      return false;
+
+    if (StrictFlexArraysLevel == FAMKind::OneZeroOrIncomplete && Size.uge(2))
+      return false;
+  } else if (!Ctx.getAsIncompleteArrayType(Ty)) {
+    return false;
+  }
+
+  if (const auto *OID = dyn_cast_if_present<ObjCIvarDecl>(D))
+    return OID->getNextIvar() == nullptr;
+
+  const auto *FD = dyn_cast_if_present<FieldDecl>(D);
+  if (!FD)
+    return false;
+
+  if (CAT) {
+    // GCC treats an array memeber of a union as an FAM if the size is one or
+    // zero.
+    llvm::APInt Size = CAT->getSize();
+    if (FD->getParent()->isUnion() && (Size.isZero() || Size.isOne()))
+      return true;
+  }
+
+  // Don't consider sizes resulting from macro expansions or template argument
+  // substitution to form C89 tail-padded arrays.
+  if (IgnoreTemplateOrMacroSubstitution) {
+    TypeSourceInfo *TInfo = FD->getTypeSourceInfo();
+    while (TInfo) {
+      TypeLoc TL = TInfo->getTypeLoc();
+
+      // Look through typedefs.
+      if (TypedefTypeLoc TTL = TL.getAsAdjusted<TypedefTypeLoc>()) {
+        const TypedefNameDecl *TDL = TTL.getTypedefNameDecl();
+        TInfo = TDL->getTypeSourceInfo();
+        continue;
+      }
+
+      if (auto CTL = TL.getAs<ConstantArrayTypeLoc>()) {
+        if (const Expr *SizeExpr =
+                dyn_cast_if_present<IntegerLiteral>(CTL.getSizeExpr());
+            !SizeExpr || SizeExpr->getExprLoc().isMacroID())
+          return false;
+      }
+
+      break;
+    }
+  }
+
+  // Test that the field is the last in the structure.
+  RecordDecl::field_iterator FI(
+      DeclContext::decl_iterator(const_cast<FieldDecl *>(FD)));
+  return ++FI == FD->getParent()->field_end();
+}
+
 TranslationUnitDecl *Decl::getTranslationUnitDecl() {
   if (auto *TUD = dyn_cast<TranslationUnitDecl>(this))
     return TUD;
diff --git a/clang/lib/AST/Expr.cpp b/clang/lib/AST/Expr.cpp
index a90f92d07f86d2..b125fc676da841 100644
--- a/clang/lib/AST/Expr.cpp
+++ b/clang/lib/AST/Expr.cpp
@@ -205,85 +205,22 @@ bool Expr::isKnownToHaveBooleanValue(bool Semantic) const {
 }
 
 bool Expr::isFlexibleArrayMemberLike(
-    ASTContext &Context,
+    ASTContext &Ctx,
     LangOptions::StrictFlexArraysLevelKind StrictFlexArraysLevel,
     bool IgnoreTemplateOrMacroSubstitution) const {
-
-  // For compatibility with existing code, we treat arrays of length 0 or
-  // 1 as flexible array members.
-  const auto *CAT = Context.getAsConstantArrayType(getType());
-  if (CAT) {
-    llvm::APInt Size = CAT->getSize();
-
-    using FAMKind = LangOptions::StrictFlexArraysLevelKind;
-
-    if (StrictFlexArraysLevel == FAMKind::IncompleteOnly)
-      return false;
-
-    // GCC extension, only allowed to represent a FAM.
-    if (Size == 0)
-      return true;
-
-    if (StrictFlexArraysLevel == FAMKind::ZeroOrIncomplete && Size.uge(1))
-      return false;
-
-    if (StrictFlexArraysLevel == FAMKind::OneZeroOrIncomplete && Size.uge(2))
-      return false;
-  } else if (!Context.getAsIncompleteArrayType(getType()))
-    return false;
-
   const Expr *E = IgnoreParens();
+  const Decl *D = nullptr;
 
-  const NamedDecl *ND = nullptr;
-  if (const auto *DRE = dyn_cast<DeclRefExpr>(E))
-    ND = DRE->getDecl();
-  else if (const auto *ME = dyn_cast<MemberExpr>(E))
-    ND = ME->getMemberDecl();
+  if (const auto *ME = dyn_cast<MemberExpr>(E))
+    D = ME->getMemberDecl();
+  else if (const auto *DRE = dyn_cast<DeclRefExpr>(E))
+    D = DRE->getDecl();
   else if (const auto *IRE = dyn_cast<ObjCIvarRefExpr>(E))
-    return IRE->getDecl()->getNextIvar() == nullptr;
-
-  if (!ND)
-    return false;
+    D = IRE->getDecl();
 
-  // A flexible array member must be the last member in the class.
-  // FIXME: If the base type of the member expr is not FD->getParent(),
-  // this should not be treated as a flexible array member access.
-  if (const auto *FD = dyn_cast<FieldDecl>(ND)) {
-    // GCC treats an array memeber of a union as an FAM if the size is one or
-    // zero.
-    if (CAT) {
-      llvm::APInt Size = CAT->getSize();
-      if (FD->getParent()->isUnion() && (Size.isZero() || Size.isOne()))
-        return true;
-    }
-
-    // Don't consider sizes resulting from macro expansions or template argument
-    // substitution to form C89 tail-padded arrays.
-    if (IgnoreTemplateOrMacroSubstitution) {
-      TypeSourceInfo *TInfo = FD->getTypeSourceInfo();
-      while (TInfo) {
-        TypeLoc TL = TInfo->getTypeLoc();
-        // Look through typedefs.
-        if (TypedefTypeLoc TTL = TL.getAsAdjusted<TypedefTypeLoc>()) {
-          const TypedefNameDecl *TDL = TTL.getTypedefNameDecl();
-          TInfo = TDL->getTypeSourceInfo();
-          continue;
-        }
-        if (ConstantArrayTypeLoc CTL = TL.getAs<ConstantArrayTypeLoc>()) {
-          const Expr *SizeExpr = dyn_cast<IntegerLiteral>(CTL.getSizeExpr());
-          if (!SizeExpr || SizeExpr->getExprLoc().isMacroID())
-            return false;
-        }
-        break;
-      }
-    }
-
-    RecordDecl::field_iterator FI(
-        DeclContext::decl_iterator(const_cast<FieldDecl *>(FD)));
-    return ++FI == FD->getParent()->field_end();
-  }
-
-  return false;
+  return Decl::isFlexibleArrayMemberLike(Ctx, D, E->getType(),
+                                         StrictFlexArraysLevel,
+                                         IgnoreTemplateOrMacroSubstitution);
 }
 
 const ValueDecl *
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 5081062da2862e..f43c7a0b53d953 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -25,6 +25,7 @@
 #include "clang/AST/Attr.h"
 #include "clang/AST/Decl.h"
 #include "clang/AST/OSLog.h"
+#include "clang/AST/OperationKinds.h"
 #include "clang/Basic/TargetBuiltins.h"
 #include "clang/Basic/TargetInfo.h"
 #include "clang/Basic/TargetOptions.h"
@@ -818,6 +819,189 @@ CodeGenFunction::evaluateOrEmitBuiltinObjectSize(const Expr *E, unsigned Type,
   return ConstantInt::get(ResType, ObjectSize, /*isSigned=*/true);
 }
 
+const FieldDecl *CodeGenFunction::FindFlexibleArrayMemberField(
+    ASTContext &Ctx, const RecordDecl *RD, uint64_t &Offset) {
+  const LangOptions::StrictFlexArraysLevelKind StrictFlexArraysLevel =
+      getLangOpts().getStrictFlexArraysLevel();
+  unsigned FieldNo = 0;
+
+  for (const Decl *D : RD->decls()) {
+    if (const auto *Field = dyn_cast<FieldDecl>(D);
+        Field && Decl::isFlexibleArrayMemberLike(
+                     Ctx, Field, Field->getType(), StrictFlexArraysLevel,
+                     /*IgnoreTemplateOrMacroSubstitution=*/true)) {
+      const ASTRecordLayout &Layout = Ctx.getASTRecordLayout(RD);
+      Offset += Layout.getFieldOffset(FieldNo);
+      return Field;
+    }
+
+    if (const auto *Record = dyn_cast<RecordDecl>(D))
+      if (const FieldDecl *Field =
+              FindFlexibleArrayMemberField(Ctx, Record, Offset)) {
+        const ASTRecordLayout &Layout = Ctx.getASTRecordLayout(RD);
+        Offset += Layout.getFieldOffset(FieldNo);
+        return Field;
+      }
+
+    if (isa<FieldDecl>(D))
+      ++FieldNo;
+  }
+
+  return nullptr;
+}
+
+llvm::Value *
+CodeGenFunction::emitFlexibleArrayMemberSize(const Expr *E, unsigned Type,
+                                             llvm::IntegerType *ResType) {
+  // The code generated here calculates the size of a struct with a flexible
+  // array member that uses the counted_by attribute. There are two instances
+  // we handle:
+  //
+  //       struct s {
+  //         unsigned long flags;
+  //         int count;
+  //         int array...
[truncated]

@bwendling
Copy link
Collaborator Author

bwendling commented Dec 27, 2023

Some (hopefully) helpful comments on this admittedly very large patch.

General Comment:

I pulled back on allowing the count to be in a non-anonymous struct that doesn't contain the flexible array. So this is no longer allowed:

struct count_not_in_substruct {
  unsigned long flags;
  unsigned char count; // expected-note {{field 'count' declared here}}
  struct A {
    int dummy;
    int array[] __counted_by(count); // expected-error {{'counted_by' field 'count' isn't within the same struct as the flexible array}}
  } a;
};

This should align with @rapidsna's bounds checking work and get rid of the "different behaviors for a pointer and non-pointer" that we had in the previous PR (#73730 (comment)).

SEMA:

The SEMA code is a bit different from the previous PR. Most notable is that it's not using the SEMA "lookup" utility the same way. There's no clear way to prevent lookup from climbing from the current scope up through the top-level scope. That leads to false negatives, duplicate warnings, and other horrors. Also, the Scope for each sub-structure is deleted by the time the entire structure's parsing is completed, making checking the entire struct after parsing more difficult. And the isAnonymousStructOrUnion flag isn't set during SEMA.

All of these combined to make the use of the "lookup" utility very frustrating. There are probably many improvements that could be made to the lookup, but I'd prefer to do those in a follow-up patch.

CodeGen:

One observation: the getNonTransparentContext() (sp?) method doesn't appear to work. In particular, if called on an anonymous struct, it will return that struct and not the enclosing "non-transparent" one. I may be misunderstanding what's meant by non-transparent in this case.

There are two ways that the counted_by attribute are used during code generation: for sanitizer checks, and with __builtin_dynamic_object_size().

  • Sanitizer check: This uses @rjmccall's suggestion ([Clang] Generate the GEP instead of adding AST nodes #73730 (comment)) of calculating a byte offset from the flexible array to the "count." It works very well (barring some issues with negative offsets being represented by an unsigned ~1U value). The only bit of ugliness was determining the offsets of fields that aren't in the top-level of a RecordDecl. (I'm curious why this hasn't come up before, but I digress.) I created getFieldOffsetInBits() to do this. It's not quite a hack, but could probably be a bit more elegant. (For instance, there could be a RecordLayout method indicating if a FieldDecl exists in the RecordDecl, but that wouldn't take sub-RecordDecls into account, and so on, and so on...)

  • __builtin_dynamic_object_size(): Side-effects aren't allowed with this builtin. So we can work only with pointers that have either already been emitted or that have no side-effects. The "visitor" tries to find the "base" of the expression. If it's a DeclRefExpr, we should be able to emit a load of that and continue. If not, we look at LocalDeclMap to see if the base is already available. If not, we check to see if it has side-effects and emit it if it doesn't. Assuming one of those three happen, we grab the indices to the count and build a series of GEPs to them (which seems to be how Clang generates GEPs to nested fields). Again, there didn't appear to be an existing method to get these indices. (I assume that the general case isn't as easy as when working with C structures.)

So that's basically it. I have a program on my machine that runs most of the examples in the test/CodeGen/attr-counted-by.c and it works well. I don't know if there's a unittest like place that runs examples such as these, but I'd be happy to include that test there.

Thank you all for taking the time to review this!

Share and enjoy!

@bwendling
Copy link
Collaborator Author

Pinging for after-holidays visibility. :-)

clang/include/clang/Sema/TypoCorrection.h Show resolved Hide resolved
Comment on lines 822 to 851
const FieldDecl *CodeGenFunction::FindFlexibleArrayMemberField(
ASTContext &Ctx, const RecordDecl *RD, uint64_t &Offset) {
const LangOptions::StrictFlexArraysLevelKind StrictFlexArraysLevel =
getLangOpts().getStrictFlexArraysLevel();
unsigned FieldNo = 0;

for (const Decl *D : RD->decls()) {
if (const auto *Field = dyn_cast<FieldDecl>(D);
Field && Decl::isFlexibleArrayMemberLike(
Ctx, Field, Field->getType(), StrictFlexArraysLevel,
/*IgnoreTemplateOrMacroSubstitution=*/true)) {
const ASTRecordLayout &Layout = Ctx.getASTRecordLayout(RD);
Offset += Layout.getFieldOffset(FieldNo);
return Field;
}

if (const auto *Record = dyn_cast<RecordDecl>(D))
if (const FieldDecl *Field =
FindFlexibleArrayMemberField(Ctx, Record, Offset)) {
const ASTRecordLayout &Layout = Ctx.getASTRecordLayout(RD);
Offset += Layout.getFieldOffset(FieldNo);
return Field;
}

if (isa<FieldDecl>(D))
++FieldNo;
}

return nullptr;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FindFieldInTopLevelOrAnonymousStruct looks somewhat similar. Any chance to reuse code more between the two?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, and they're just differently enough to not be too compatible. The best(?) we could do would be to have a generic "deep" iterator through all of the fields in a struct. (As it is, the iterators are only for the surface level.)

In fact, I think there should be methods added to ASTRecordLayout / CGRecordLayout to iterate through more than just the first-level decls. I'd like to do that as a follow-up commit though.

Comment on lines +1040 to +1041
if (Value *V = emitFlexibleArrayMemberSize(E, Type, ResType))
return V;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should emitFlexibleArrayMemberSize be made infallible (via asserts rather than early returns of nullptr)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Over the weekend, I thought about emitting a -1 / 0 in the cases where we weren't able to generate the BDOS calculation. Thoughts?

Copy link
Contributor

@rapidsna rapidsna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments. Looking good overall but LocalDeclMap doesn't seem to serve the purpose for this PR. And I'm not sure the code block using it would be actually necessary. Please see my inlined review.

clang/lib/CodeGen/CGExpr.cpp Show resolved Hide resolved
clang/lib/CodeGen/CGBuiltin.cpp Show resolved Hide resolved
clang/lib/CodeGen/CGExpr.cpp Outdated Show resolved Hide resolved
clang/lib/CodeGen/CGExpr.cpp Outdated Show resolved Hide resolved
clang/lib/CodeGen/CGExpr.cpp Outdated Show resolved Hide resolved
clang/lib/CodeGen/CGBuiltin.cpp Outdated Show resolved Hide resolved
clang/lib/CodeGen/CGBuiltin.cpp Show resolved Hide resolved
clang/lib/CodeGen/CGExpr.cpp Show resolved Hide resolved
@kees
Copy link
Contributor

kees commented Jan 8, 2024

Possibly due to bug #72032 , I can get this tree to crash using the latest array-bounds.c test from
https://github.com/kees/kernel-tools/tree/trunk/fortify

Specifically:

struct annotated {
        unsigned long flags;
        long count;
        int array[] __counted_by(count);
};

struct composite {
        unsigned stuff;
        struct annotated inner;
};

static struct composite * noinline alloc_composite(int index)
{
        struct composite *p;

        p = malloc(sizeof(*p) + index * sizeof(*p->inner.array));
        p->inner.count = index;

        return p;
}

       struct composite *c;
       c = alloc_composite(index);
       ... actions on c->inner.array ...
3.      array-bounds.c:363:1 <Spelling=array-bounds.c:363:6>: Generating code for declaration 'counted_by_seen_by_bdos'
4.      array-bounds.c:405:2 <Spelling=array-bounds.c:23:32>: LLVM IR generation of compound statement ('{}')
...
 #4 0x0000556574d5b858 clang::CodeGen::CodeGenTBAA::getAccessInfo(clang::QualType)
 #5 0x000055657489e25d clang::CodeGen::CodeGenModule::getTBAAAccessInfo(clang::QualType)
 #6 0x00005565748a9c20 clang::CodeGen::CodeGenModule::getNaturalTypeAlignment(clang::QualType, clang::CodeGen::LValueBaseInfo*, clang::CodeGen::TBAAAccessInfo*, bool)
 #7 0x00005565749fc4a2 EmitPointerWithAlignment(clang::Expr const*, clang::CodeGen::LValueBaseInfo*, clang::CodeGen::TBAAAccessInfo*, clang::CodeGen::KnownNonNull_t, clang::CodeGen::CodeGenFunction&) CGExpr.cpp:0:0
 #8 0x00005565749f94bd clang::CodeGen::CodeGenFunction::EmitCountedByFieldExpr(clang::Expr const*, clang::FieldDecl const*, clang::FieldDecl const*)

@bwendling
Copy link
Collaborator Author

@kees Should be fixed now.

…le the types ourselves throughout the function.
@kees
Copy link
Contributor

kees commented Jan 9, 2024

Thanks! The update fixes the anon struct issue I hit. I've found one more issue, though this appears to be a miscalculation with a pathological count value (i.e. count has a signed type and contains a negative value):

struct annotated {
    unsigned long flags;
    int count;
    int array __counted_by(count);
};

static struct annotated * noinline alloc_annotated(int index)
{
        struct annotated *p;

        p = malloc(sizeof(*p) + index * sizeof(*p->array));
        p->count = index;

        return p;
}

...
       struct annotated *a;

       c = alloc_annotated(index);
       c->count = -1;
       printf("%zu\n", __builtin_dynamic_object_size(p->array, 1));

This prints a wrapped calculation instead of the expected "0".

@rapidsna
Copy link
Contributor

rapidsna commented Jan 9, 2024

This prints a wrapped calculation instead of the expected "0".

@kees Shouldn't getDefaultBuiltinObjectSizeResult return -1? Have you tried c->count = -2 to see if it's returning the negative count value or it happens to be equal to the default value?

@kees
Copy link
Contributor

kees commented Jan 9, 2024

This prints a wrapped calculation instead of the expected "0".

Shouldn't getDefaultBuiltinObjectSizeResult return -1? Have you tried -2 to see if it's returning the negative count value or it happens to be equal to the default value?

Well, maybe it's not wrapped, but it should absolutely return 0, not -1. The return value of -1 means "I don't know how large this is". In this case we do know (there is an associated counted_by variable), but the value is nonsense, so we must return 0 so that anything checking lengths will not write anything to the array.

@rapidsna
Copy link
Contributor

rapidsna commented Jan 9, 2024

but the value is nonsense, so we must return 0 so that anything checking lengths will not write anything to the array.

@kees Oh, I see. I did not know such the convention but it makes sense. Is it documented somewhere?

@kees
Copy link
Contributor

kees commented Jan 9, 2024

but the value is nonsense, so we must return 0 so that anything checking lengths will not write anything to the array.

@kees Oh, I see. I did not know such the convention but it makes sense. Is it documented somewhere?

This is new territory (having a multiplier for finding size that may be negative), so there's nothing to document it beyond FORTIFY users needing to maintain safe checks. The only safe size to return for "impossible size" is 0 in this case, otherwise a confused state (negative count) can lead to FORTIFY bypasses.

Before counted_by there were only 2 possible states for __bdos: 1) I know the size exactly, 2) I have no information with which to calculate the size (return maximum possible size to avoid blocking writes). With counted_by allowing signed variables (which is a requirement for adoption), there is a new state: 3) I know how to calculate the size but the result is nonsensical. This last state needs to return 0 to fail safe.

…e value is unknown. In the negative bounds case that isn't the case.
@bwendling
Copy link
Collaborator Author

@kees, the issue should be fixed with the newest push.

@bwendling bwendling deleted the counted-by-attribute branch January 10, 2024 23:21
nico added a commit that referenced this pull request Jan 11, 2024
This reverts commit fefdef8.

Breaks check-clang, see
#76348 (comment)

Also revert follow-on "[Clang] Update 'counted_by' documentation"

This reverts commit 4a3fb9c.
@nico
Copy link
Contributor

nico commented Jan 11, 2024

Reverted in 2dce772 for now.

bwendling added a commit that referenced this pull request Jan 11, 2024
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:

```
  struct bar;
  struct foo {
    int count;
    struct inner {
      struct {
        int count; /* The 'count' referenced by 'counted_by' */
      };
      struct {
        /* ... */
        struct bar *array[] __attribute__((counted_by(count)));
      };
    } baz;
  };
```

This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':

```
  struct bar;
  struct foo {
    size_t count;
     /* ... */
    struct bar *array[] __attribute__((counted_by(count)));
  };
```

This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.

In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not not being detected:

```
  struct foo *p;

  void foo_alloc(size_t count) {
    p = malloc(MAX(sizeof(struct foo),
                   offsetof(struct foo, array[0]) + count *
                       sizeof(struct bar *)));
    p->count = count + 42;
  }
```

The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:

```
  void use_foo(int index, int val) {
    p->count += 42;
    p->array[index] = val; /* The sanitizer can't properly check this access */
  }
```

In this example, an update to 'p->count' maintains the relationship
requirement:

```
  void use_foo(int index, int val) {
    if (p->count == 0)
      return;
    --p->count;
    p->array[index] = val;
  }
```
@bwendling
Copy link
Collaborator Author

Resubmitted with fix. Sorry about the failures:

To https://github.com/llvm/llvm-project.git
   e8790027b169..164f85db876e  main -> main

t-rasmud added a commit that referenced this pull request Jan 16, 2024
bwendling added a commit that referenced this pull request Jan 16, 2024
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:

```
  struct bar;
  struct foo {
    int count;
    struct inner {
      struct {
        int count; /* The 'count' referenced by 'counted_by' */
      };
      struct {
        /* ... */
        struct bar *array[] __attribute__((counted_by(count)));
      };
    } baz;
  };
```

This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':

```
  struct bar;
  struct foo {
    size_t count;
     /* ... */
    struct bar *array[] __attribute__((counted_by(count)));
  };
```

This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.

In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not not being detected:

```
  struct foo *p;

  void foo_alloc(size_t count) {
    p = malloc(MAX(sizeof(struct foo),
                   offsetof(struct foo, array[0]) + count *
                       sizeof(struct bar *)));
    p->count = count + 42;
  }
```

The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:

```
  void use_foo(int index, int val) {
    p->count += 42;
    p->array[index] = val; /* The sanitizer can't properly check this access */
  }
```

In this example, an update to 'p->count' maintains the relationship
requirement:

```
  void use_foo(int index, int val) {
    if (p->count == 0)
      return;
    --p->count;
    p->array[index] = val;
  }
```
@seanm
Copy link

seanm commented Jan 20, 2024

@bwendling is there any plan / possibility for simple expressions (with no side effects)? Like:

struct libusb_bos_dev_capability_descriptor {
	uint8_t  bLength;
	uint8_t  bDescriptorType;
	uint8_t  bDevCapabilityType;
	uint8_t  dev_capability_data[] __attribute__((counted_by(bLength - 3)));
};

@bwendling
Copy link
Collaborator Author

@bwendling is there any plan / possibility for simple expressions (with no side effects)? Like:

struct libusb_bos_dev_capability_descriptor {
	uint8_t  bLength;
	uint8_t  bDescriptorType;
	uint8_t  bDevCapabilityType;
	uint8_t  dev_capability_data[] __attribute__((counted_by(bLength - 3)));
};

Not right now. Apple is in the process of expanding the bounds checking code beyond checking only flexible array members. We will most likely follow their lead on constexpr-like expressions. But the utility of such a feature would need to be explored.

@seanm
Copy link

seanm commented Jan 22, 2024

@bwendling thanks for your reply. No idea how representative it is of other projects, but 3 of the 6 flexible arrays in libusb structures have these kinds of non-trivial counts.

The GCC folks seem to have considered adding such support, but it's not clear what, if anything, they decided.

@bwendling
Copy link
Collaborator Author

@bwendling thanks for your reply. No idea how representative it is of other projects, but 3 of the 6 flexible arrays in libusb structures have these kinds of non-trivial counts.

Can you point them out?

The GCC folks seem to have considered adding such support, but it's not clear what, if anything, they decided.

I know about that. They haven't made a decision on it as far as I know.

@seanm
Copy link

seanm commented Jan 22, 2024

Can you point them out?

Certainly. Here: libusb/libusb@28394f4

@rapidsna
Copy link
Contributor

@seanm Apple's -fbounds-safety model will allow arithmetic expressions to be used for 'counted_by' attribute: https://github.com/llvm/llvm-project/blob/main/clang/docs/BoundsSafety.rst#external-bounds-annotations

We are currently working on extending the 'counted_by' attribute argument to be parsed as an expression: #78000. While this patch will initially restrict the count argument to be "identification" only as it currently works (there's more work to do to actually allow an arbitrary expression), this will definitely help extend the model to support your use cases in the future.

justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:

```
  struct bar;
  struct foo {
    int count;
    struct inner {
      struct {
        int count; /* The 'count' referenced by 'counted_by' */
      };
      struct {
        /* ... */
        struct bar *array[] __attribute__((counted_by(count)));
      };
    } baz;
  };
```

This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':

```
  struct bar;
  struct foo {
    size_t count;
     /* ... */
    struct bar *array[] __attribute__((counted_by(count)));
  };
```

This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.

In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not not being detected:

```
  struct foo *p;

  void foo_alloc(size_t count) {
    p = malloc(MAX(sizeof(struct foo),
                   offsetof(struct foo, array[0]) + count *
                       sizeof(struct bar *)));
    p->count = count + 42;
  }
```

The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:

```
  void use_foo(int index, int val) {
    p->count += 42;
    p->array[index] = val; /* The sanitizer can't properly check this access */
  }
```

In this example, an update to 'p->count' maintains the relationship
requirement:

```
  void use_foo(int index, int val) {
    if (p->count == 0)
      return;
    --p->count;
    p->array[index] = val;
  }
```
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
This reverts commit fefdef8.

Breaks check-clang, see
llvm#76348 (comment)

Also revert follow-on "[Clang] Update 'counted_by' documentation"

This reverts commit 4a3fb9c.
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:

```
  struct bar;
  struct foo {
    int count;
    struct inner {
      struct {
        int count; /* The 'count' referenced by 'counted_by' */
      };
      struct {
        /* ... */
        struct bar *array[] __attribute__((counted_by(count)));
      };
    } baz;
  };
```

This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':

```
  struct bar;
  struct foo {
    size_t count;
     /* ... */
    struct bar *array[] __attribute__((counted_by(count)));
  };
```

This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.

In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not not being detected:

```
  struct foo *p;

  void foo_alloc(size_t count) {
    p = malloc(MAX(sizeof(struct foo),
                   offsetof(struct foo, array[0]) + count *
                       sizeof(struct bar *)));
    p->count = count + 42;
  }
```

The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:

```
  void use_foo(int index, int val) {
    p->count += 42;
    p->array[index] = val; /* The sanitizer can't properly check this access */
  }
```

In this example, an update to 'p->count' maintains the relationship
requirement:

```
  void use_foo(int index, int val) {
    if (p->count == 0)
      return;
    --p->count;
    p->array[index] = val;
  }
```
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member holding the
count of elements in the flexible array. This information is used to
improve the results of the array bound sanitizer and the
'__builtin_dynamic_object_size' builtin. The 'count' field member must
be within the same non-anonymous, enclosing struct as the flexible array
member. For example:

```
  struct bar;
  struct foo {
    int count;
    struct inner {
      struct {
        int count; /* The 'count' referenced by 'counted_by' */
      };
      struct {
        /* ... */
        struct bar *array[] __attribute__((counted_by(count)));
      };
    } baz;
  };
```

This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':

```
  struct bar;
  struct foo {
    size_t count;
     /* ... */
    struct bar *array[] __attribute__((counted_by(count)));
  };
```

This establishes a relationship between 'array' and 'count';
specifically that 'p->array' must have *at least* 'p->count' number of
elements available. It's the user's responsibility to ensure that this
relationship is maintained throughout changes to the structure.

In the following, the allocated array erroneously has fewer elements
than what's specified by 'p->count'. This would result in an
out-of-bounds access not not being detected:

```
  struct foo *p;

  void foo_alloc(size_t count) {
    p = malloc(MAX(sizeof(struct foo),
                   offsetof(struct foo, array[0]) + count *
                       sizeof(struct bar *)));
    p->count = count + 42;
  }
```

The next example updates 'p->count', breaking the relationship
requirement that 'p->array' must have at least 'p->count' number of
elements available:

```
  void use_foo(int index, int val) {
    p->count += 42;
    p->array[index] = val; /* The sanitizer can't properly check this access */
  }
```

In this example, an update to 'p->count' maintains the relationship
requirement:

```
  void use_foo(int index, int val) {
    if (p->count == 0)
      return;
    --p->count;
    p->array[index] = val;
  }
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants