Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang][dataflow] Correctly handle InitListExpr of union type. #82348

Merged
merged 2 commits into from
Feb 21, 2024

Conversation

martinboehme
Copy link
Contributor

No description provided.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang:analysis labels Feb 20, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 20, 2024

@llvm/pr-subscribers-clang-analysis

@llvm/pr-subscribers-clang

Author: None (martinboehme)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/82348.diff

5 Files Affected:

  • (modified) clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h (+6-3)
  • (modified) clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp (+14-4)
  • (modified) clang/lib/Analysis/FlowSensitive/Transfer.cpp (+15-11)
  • (modified) clang/unittests/Analysis/FlowSensitive/TestingSupport.h (+19)
  • (modified) clang/unittests/Analysis/FlowSensitive/TransferTest.cpp (+12-2)
diff --git a/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h b/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
index 0aecc749bf415c..b3dc940705f870 100644
--- a/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
+++ b/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
@@ -753,9 +753,12 @@ RecordStorageLocation *getImplicitObjectLocation(const CXXMemberCallExpr &MCE,
 RecordStorageLocation *getBaseObjectLocation(const MemberExpr &ME,
                                              const Environment &Env);
 
-/// Returns the fields of `RD` that are initialized by an `InitListExpr`, in the
-/// order in which they appear in `InitListExpr::inits()`.
-std::vector<FieldDecl *> getFieldsForInitListExpr(const RecordDecl *RD);
+/// Returns the fields of a `RecordDecl` that are initialized by an
+/// `InitListExpr`, in the order in which they appear in
+/// `InitListExpr::inits()`.
+/// `Init->getType()` must be a record type.
+std::vector<const FieldDecl *>
+getFieldsForInitListExpr(const InitListExpr *InitList);
 
 /// Associates a new `RecordValue` with `Loc` and returns the new value.
 RecordValue &refreshRecordValue(RecordStorageLocation &Loc, Environment &Env);
diff --git a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
index d487944ce92111..0cfc26ea952cda 100644
--- a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
+++ b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
@@ -361,8 +361,8 @@ getFieldsGlobalsAndFuncs(const Stmt &S, FieldSet &Fields,
     if (const auto *FD = dyn_cast<FieldDecl>(VD))
       Fields.insert(FD);
   } else if (auto *InitList = dyn_cast<InitListExpr>(&S)) {
-    if (RecordDecl *RD = InitList->getType()->getAsRecordDecl())
-      for (const auto *FD : getFieldsForInitListExpr(RD))
+    if (InitList->getType()->isRecordType())
+      for (const auto *FD : getFieldsForInitListExpr(InitList))
         Fields.insert(FD);
   }
 }
@@ -1104,12 +1104,22 @@ RecordStorageLocation *getBaseObjectLocation(const MemberExpr &ME,
   return Env.get<RecordStorageLocation>(*Base);
 }
 
-std::vector<FieldDecl *> getFieldsForInitListExpr(const RecordDecl *RD) {
+std::vector<const FieldDecl *>
+getFieldsForInitListExpr(const InitListExpr *InitList) {
+  const RecordDecl *RD = InitList->getType()->getAsRecordDecl();
+  assert(RD != nullptr);
+
+  std::vector<const FieldDecl *> Fields;
+
+  if (InitList->getType()->isUnionType()) {
+    Fields.push_back(InitList->getInitializedFieldInUnion());
+    return Fields;
+  }
+
   // Unnamed bitfields are only used for padding and do not appear in
   // `InitListExpr`'s inits. However, those fields do appear in `RecordDecl`'s
   // field list, and we thus need to remove them before mapping inits to
   // fields to avoid mapping inits to the wrongs fields.
-  std::vector<FieldDecl *> Fields;
   llvm::copy_if(
       RD->fields(), std::back_inserter(Fields),
       [](const FieldDecl *Field) { return !Field->isUnnamedBitfield(); });
diff --git a/clang/lib/Analysis/FlowSensitive/Transfer.cpp b/clang/lib/Analysis/FlowSensitive/Transfer.cpp
index fe13e919bddcd8..e793e8fb593dc3 100644
--- a/clang/lib/Analysis/FlowSensitive/Transfer.cpp
+++ b/clang/lib/Analysis/FlowSensitive/Transfer.cpp
@@ -663,14 +663,7 @@ class TransferVisitor : public ConstStmtVisitor<TransferVisitor> {
   void VisitInitListExpr(const InitListExpr *S) {
     QualType Type = S->getType();
 
-    if (Type->isUnionType()) {
-      // FIXME: Initialize unions properly.
-      if (auto *Val = Env.createValue(Type))
-        Env.setValue(*S, *Val);
-      return;
-    }
-
-    if (!Type->isStructureOrClassType()) {
+    if (!Type->isRecordType()) {
       // Until array initialization is implemented, we don't need to care about
       // cases where `getNumInits() > 1`.
       if (S->getNumInits() == 1)
@@ -688,10 +681,9 @@ class TransferVisitor : public ConstStmtVisitor<TransferVisitor> {
     llvm::DenseMap<const ValueDecl *, StorageLocation *> FieldLocs;
 
     // This only contains the direct fields for the given type.
-    std::vector<FieldDecl *> FieldsForInit =
-        getFieldsForInitListExpr(Type->getAsRecordDecl());
+    std::vector<const FieldDecl *> FieldsForInit = getFieldsForInitListExpr(S);
 
-    // `S->inits()` contains all the initializer epressions, including the
+    // `S->inits()` contains all the initializer expressions, including the
     // ones for direct base classes.
     auto Inits = S->inits();
     size_t InitIdx = 0;
@@ -731,6 +723,18 @@ class TransferVisitor : public ConstStmtVisitor<TransferVisitor> {
       FieldLocs.insert({Field, &Loc});
     }
 
+    // In the case of a union, we don't in general have initializers for all
+    // of the fields. Create storage locations for the remaining fields (but
+    // don't associate them with values).
+    if (Type->isUnionType()) {
+      for (const FieldDecl *Field :
+           Env.getDataflowAnalysisContext().getModeledFields(Type)) {
+        if (!FieldLocs.contains(Field))
+          FieldLocs.insert(
+              {Field, &Env.createStorageLocation(Field->getType())});
+      }
+    }
+
     // Check that we satisfy the invariant that a `RecordStorageLoation`
     // contains exactly the set of modeled fields for that type.
     // `ModeledFields` includes fields from all the bases, but only the
diff --git a/clang/unittests/Analysis/FlowSensitive/TestingSupport.h b/clang/unittests/Analysis/FlowSensitive/TestingSupport.h
index 0d36d2802897fd..b7cf6cc966edb0 100644
--- a/clang/unittests/Analysis/FlowSensitive/TestingSupport.h
+++ b/clang/unittests/Analysis/FlowSensitive/TestingSupport.h
@@ -432,6 +432,8 @@ llvm::Error checkDataflowWithNoopAnalysis(
         {});
 
 /// Returns the `ValueDecl` for the given identifier.
+/// The returned pointer is guaranteed to be non-null; the function asserts if
+/// no `ValueDecl` with the given name is found.
 ///
 /// Requirements:
 ///
@@ -475,6 +477,15 @@ ValueT &getValueForDecl(ASTContext &ASTCtx, const Environment &Env,
   return *cast<ValueT>(Env.getValue(*VD));
 }
 
+/// Returns the storage location for the field called `Name` of `Loc`.
+/// Optionally casts the field storage location to `T`.
+template <typename T = StorageLocation>
+std::enable_if_t<std::is_base_of_v<StorageLocation, T>, T &>
+getFieldLoc(const RecordStorageLocation &Loc, llvm::StringRef Name,
+            ASTContext &ASTCtx) {
+  return *cast<T>(Loc.getChild(*findValueDecl(ASTCtx, Name)));
+}
+
 /// Returns the value of a `Field` on the record referenced by `Loc.`
 /// Returns null if `Loc` is null.
 inline Value *getFieldValue(const RecordStorageLocation *Loc,
@@ -487,6 +498,14 @@ inline Value *getFieldValue(const RecordStorageLocation *Loc,
   return Env.getValue(*FieldLoc);
 }
 
+/// Returns the value of a `Field` on the record referenced by `Loc.`
+/// Returns null if `Loc` is null.
+inline Value *getFieldValue(const RecordStorageLocation *Loc,
+                            llvm::StringRef Name, ASTContext &ASTCtx,
+                            const Environment &Env) {
+  return getFieldValue(Loc, *findValueDecl(ASTCtx, Name), Env);
+}
+
 /// Creates and owns constraints which are boolean values.
 class ConstraintContext {
   unsigned NextAtom = 0;
diff --git a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
index a65b0446ac7818..e7d74581865a32 100644
--- a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
@@ -2377,14 +2377,24 @@ TEST(TransferTest, InitListExprAsUnion) {
       } F;
 
      public:
-      constexpr target() : F{nullptr} {}
+      constexpr target() : F{nullptr} {
+        int *null = nullptr;
+        F.b;  // Make sure we reference 'b' so it is modeled.
+        // [[p]]
+      }
     };
   )cc";
   runDataflow(
       Code,
       [](const llvm::StringMap<DataflowAnalysisState<NoopLattice>> &Results,
          ASTContext &ASTCtx) {
-        // Just verify that it doesn't crash.
+        const Environment &Env = getEnvironmentAtAnnotation(Results, "p");
+
+        auto &FLoc = getFieldLoc<RecordStorageLocation>(
+            *Env.getThisPointeeStorageLocation(), "F", ASTCtx);
+        auto *AVal = cast<PointerValue>(getFieldValue(&FLoc, "a", ASTCtx, Env));
+        ASSERT_EQ(AVal, &getValueForDecl<PointerValue>(ASTCtx, Env, "null"));
+        ASSERT_EQ(getFieldValue(&FLoc, "b", ASTCtx, Env), nullptr);
       });
 }
 

const RecordDecl *RD = InitList->getType()->getAsRecordDecl();
assert(RD != nullptr);

std::vector<const FieldDecl *> Fields;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if a reserve call for this vector would be beneficial. Even if we do not know the exact number of fields we want to insert, probably relatively few of them are unnamed bit fields in most code bases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Unfortunately, RecordDecl doesn't have a way (that I know of) to query the number of fields. There's no getNumFields() or similar, and field_end()-field_begin() also doesn't work (because the iterator doesn't support operator-. So we'd have to do a complete iteration through the fields to determine the number, and at that point, I think we're getting into diminishing returns.

for (const FieldDecl *Field :
Env.getDataflowAnalysisContext().getModeledFields(Type)) {
if (!FieldLocs.contains(Field))
FieldLocs.insert(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we could save a lookup if we did something like

if (auto [it, inserted] = FieldLocs.insert({Field, nullptr}); inserted)
  it->second = &Env.createStorageLocation(Field->getType());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea -- done!

@martinboehme
Copy link
Contributor Author

CI failure appears to be an infrastructure issue. Patch cleanly builds and tests locally.

@martinboehme martinboehme merged commit 4725993 into llvm:main Feb 21, 2024
3 of 4 checks passed
martinboehme added a commit to martinboehme/llvm-project that referenced this pull request Feb 26, 2024
This fixes a crash introduced by llvm#82348
but also adds additional handling to make sure that we treat empty initializer
lists for both unions and structs/classes correctly (see tests added in this
patch).
martinboehme pushed a commit that referenced this pull request Feb 26, 2024
…pe." (#82856)

Reverts #82348, which caused crashes when analyzing
empty InitListExprs for unions, e.g.

```cc
union U {
  double double_value;
  int int_value;
};

void target() {
  U value;
  value = {};
}
```

Co-authored-by: Samira Bazuzi <bazuzi@users.noreply.github.com>
martinboehme added a commit to martinboehme/llvm-project that referenced this pull request Feb 27, 2024
This fixes a crash introduced by llvm#82348
but also adds additional handling to make sure that we treat empty initializer
lists for both unions and structs/classes correctly (see tests added in this
patch).
martinboehme added a commit that referenced this pull request Mar 1, 2024
…#82986)

This fixes a crash introduced by
#82348
but also adds additional handling to make sure that we treat empty
initializer
lists for both unions and structs/classes correctly (see tests added in
this
patch).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:analysis clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants