Skip to content

[clang][dataflow] Handle CXXInheritedCtorInitExpr in ResultObjectVisitor. #99616

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 26, 2024

Conversation

Pask00
Copy link
Contributor

@Pask00 Pask00 commented Jul 19, 2024

CXXInheritedCtorInitExpr is another of the node kinds that should be considered an "original initializer". An assertion failure in assert(Children.size() == 1) happens without this fix.

…tor.

`CXXInheritedCtorInitExpr` is another of the node kinds that should be considered an "original initializer". An assertion failure in `assert(Children.size() == 1)` happens without this fix.
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang:analysis labels Jul 19, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 19, 2024

@llvm/pr-subscribers-clang-analysis

@llvm/pr-subscribers-clang

Author: Pasquale Riello (Pask00)

Changes

CXXInheritedCtorInitExpr is another of the node kinds that should be considered an "original initializer". An assertion failure in assert(Children.size() == 1) happens without this fix.


Full diff: https://github.com/llvm/llvm-project/pull/99616.diff

2 Files Affected:

  • (modified) clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp (+1-1)
  • (modified) clang/unittests/Analysis/FlowSensitive/DataflowEnvironmentTest.cpp (+40)
diff --git a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
index f734168e647bd..3307981ce7e0f 100644
--- a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
+++ b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
@@ -416,7 +416,7 @@ class ResultObjectVisitor : public AnalysisASTVisitor<ResultObjectVisitor> {
     // below them can initialize the same object (or part of it).
     if (isa<CXXConstructExpr>(E) || isa<CallExpr>(E) || isa<LambdaExpr>(E) ||
         isa<CXXDefaultArgExpr>(E) || isa<CXXStdInitializerListExpr>(E) ||
-        isa<AtomicExpr>(E) ||
+        isa<AtomicExpr>(E) || isa<CXXInheritedCtorInitExpr>(E) ||
         // We treat `BuiltinBitCastExpr` as an "original initializer" too as
         // it may not even be casting from a record type -- and even if it is,
         // the two objects are in general of unrelated type.
diff --git a/clang/unittests/Analysis/FlowSensitive/DataflowEnvironmentTest.cpp b/clang/unittests/Analysis/FlowSensitive/DataflowEnvironmentTest.cpp
index a4ac597bb06d6..8481026f7c1fc 100644
--- a/clang/unittests/Analysis/FlowSensitive/DataflowEnvironmentTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/DataflowEnvironmentTest.cpp
@@ -442,6 +442,46 @@ TEST_F(EnvironmentTest, CXXDefaultInitExprResultObjIsWrappedExprResultObj) {
             &Env.getResultObjectLocation(*DefaultInit->getExpr()));
 }
 
+TEST_F(EnvironmentTest, ResultObjectLocationForInheritedCtorInitExpr) {
+  using namespace ast_matchers;
+
+  std::string Code = R"(
+    struct Base {
+      Base(int b) {}
+    };
+    struct Derived : Base {
+      using Base::Base;
+    };
+
+    Derived d = Derived(0);
+  )";
+
+  auto Unit =
+      tooling::buildASTFromCodeWithArgs(Code, {"-fsyntax-only", "-std=c++20"});
+  auto &Context = Unit->getASTContext();
+
+  ASSERT_EQ(Context.getDiagnostics().getClient()->getNumErrors(), 0U);
+
+  auto Results =
+      match(cxxConstructorDecl(
+                hasAnyConstructorInitializer(cxxCtorInitializer(
+                    withInitializer(expr().bind("inherited_ctor_init_expr")))))
+                .bind("ctor"),
+            Context);
+  const auto *Constructor = selectFirst<CXXConstructorDecl>("ctor", Results);
+  const auto *InheritedCtorInit =
+      selectFirst<CXXInheritedCtorInitExpr>("inherited_ctor_init_expr", Results);
+
+  // Verify that `inherited_ctor_init_expr` has no children.
+  ASSERT_EQ(InheritedCtorInit->child_begin(), InheritedCtorInit->child_end());
+
+  Environment Env(DAContext, *Constructor);
+  Env.initialize();
+
+  RecordStorageLocation &Loc = Env.getResultObjectLocation(*InheritedCtorInit);
+  ASSERT_NE(&Loc, nullptr);
+}
+
 TEST_F(EnvironmentTest, Stmt) {
   using namespace ast_matchers;
 

Copy link
Contributor

@martinboehme martinboehme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on!

Looks good apart from the fact that the test should be in a different file (and done slightly differently, see detailed comment in the code).

@@ -442,6 +442,46 @@ TEST_F(EnvironmentTest, CXXDefaultInitExprResultObjIsWrappedExprResultObj) {
&Env.getResultObjectLocation(*DefaultInit->getExpr()));
}

TEST_F(EnvironmentTest, ResultObjectLocationForInheritedCtorInitExpr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tend to put tests for getResultObjectLocation() in TransferTest. See an example for AtomicExpr here, which you can also use as a model for a test for CXXInheritedCtorInitExpr.

Doing this in TransferTest.cpp has the benefit that the test is smaller, plus you can make the assertion stronger (see the test for AtomicExpr linked above which checks the exact value returned by getResultObjectLocation() instead of just testing that it is non-null).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking at this Martin!

I tried doing that, but in that way I can't have access to the right Environment.
The problem is that the specific part of the AST I am interested to (a CXXConstructorDecl) is implicitly "generated" (because it is an inherited constructor), so I can't put an annotation inside of it.
I may be completely wrong, but I think that in this case the only way is to explicitly initialize the environment as it is usually done in EnvironmentTest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the specific part of the AST I am interested to (a CXXConstructorDecl) is implicitly "generated" (because it is an inherited constructor), so I can't put an annotation inside of it.

Ah, thanks for the explanation, I understand now. Yes, agree that you can't do this in the way we do other tests for getResultObjectLocation() in TransferTest.cpp.

Can you add a comment to the test that explains this (and points out that most of the tests for getResultObjectLocation() are in TransferTest.cpp)? I'd like to avoid other tests for getResultObjectLocation() getting added next to this test just because it looks like the right place for them.

@@ -442,6 +442,46 @@ TEST_F(EnvironmentTest, CXXDefaultInitExprResultObjIsWrappedExprResultObj) {
&Env.getResultObjectLocation(*DefaultInit->getExpr()));
}

TEST_F(EnvironmentTest, ResultObjectLocationForInheritedCtorInitExpr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the specific part of the AST I am interested to (a CXXConstructorDecl) is implicitly "generated" (because it is an inherited constructor), so I can't put an annotation inside of it.

Ah, thanks for the explanation, I understand now. Yes, agree that you can't do this in the way we do other tests for getResultObjectLocation() in TransferTest.cpp.

Can you add a comment to the test that explains this (and points out that most of the tests for getResultObjectLocation() are in TransferTest.cpp)? I'd like to avoid other tests for getResultObjectLocation() getting added next to this test just because it looks like the right place for them.

auto Results =
match(cxxConstructorDecl(
hasAnyConstructorInitializer(cxxCtorInitializer(
withInitializer(expr().bind("inherited_ctor_init_expr")))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
withInitializer(expr().bind("inherited_ctor_init_expr")))))
withInitializer(cxxInheritedCtorInitExpr().bind("inherited_ctor_init_expr")))))

to make this more explicit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be perfect, but actually there are no matchers for constructor initializer expressions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I specifically checked ASTMatchers.h for this and could have sworn I found this, but on looking again, I see that it doesn't exist.

Co-authored-by: martinboehme <mboehme@google.com>
@haoNoQ
Copy link
Collaborator

haoNoQ commented Jul 24, 2024

I have a high-level question not directly related to the patch. We have probably even talked about it at a conference a few years ago but I don't remember 😅

ResultObjectVisitor

I briefly looked at the implementation and I suspect that you folks might be reinventing ConstructionContext which is typically¹ already available in the CFG. (Under an off-by-default CFG build option.) (I'm in favor of turning it on by default.) (Together with a few other options that improve precision, eg. copy elision and RVO support.)

__
¹ Not all of the 30+ possible construction contexts are currently implemented. But your visitor doesn't seem to implement them either.

@paulsemel paulsemel merged commit 49cb170 into llvm:main Jul 26, 2024
7 checks passed
Copy link

@Pask00 Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested
by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as
the builds can include changes from many authors. It is not uncommon for your
change to be included in a build that fails due to someone else's changes, or
infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself.
This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

@martinboehme
Copy link
Contributor

I have a high-level question not directly related to the patch. We have probably even talked about it at a conference a few years ago but I don't remember 😅

I don't recall -- maybe you talked to someone else?

ResultObjectVisitor

I briefly looked at the implementation and I suspect that you folks might be reinventing ConstructionContext which is typically¹ already available in the CFG. (Under an off-by-default CFG build option.) (I'm in favor of turning it on by default.) (Together with a few other options that improve precision, eg. copy elision and RVO support.)

__ ¹ Not all of the 30+ possible construction contexts are currently implemented. But your visitor doesn't seem to implement them either.

Thanks for the info -- I wasn't aware of ConstructionContext. Will take a look. Can you point me to some examples of how this is used?

Agree that we should use ConstructionContext in favor of ResultObjectVisitor if feasible.

@haoNoQ
Copy link
Collaborator

haoNoQ commented Jul 31, 2024

Thanks for the info -- I wasn't aware of ConstructionContext. Will take a look. Can you point me to some examples of how this is used?

The initial motivation was here: https://github.com/llvm/llvm-project/blob/llvmorg-19-init/clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp#L130 - this is the place where the clang static analyzer's path-sensitive engine looks at the CXXConstructExpr in the CFG and utilizes the additional information to foresee the storage in which the prvalue is about to be constructed. So that when the constructor call is modeled, we knew what the this value is. Which, yeah, naturally allows us to map every field of the object to the field in the storage, if necessary. Or just to model the value of this if the code ever tries to use it directly.

Additionally, it allows us to keep track of that storage location for the purposes of future events (see the next function at https://github.com/llvm/llvm-project/blob/llvmorg-19-init/clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp#L408 where we write the evaluated MemRegion * back into our abstract state). This is particularly useful for temporary destructors in cases like this:

foo(x ? MyClass(1, 2, 3) : MyClass(4, 5, 6));

where the CFG looks like a double diamond: we may construct one of the two objects, we may destroy one of the two objects, but we merge in the middle in order to call foo() unconditionally. So you need to figure out at runtime which object gets destroyed. And by the time we reach the destructor at the end of the full-expression (basically ExprWithCleanups) we already forgot which of these two objects we needed to destroy. There's no longer any expression that evaluates to the location of that object at this point, so if we don't keep track of it separately, we don't remember it at all. Or, when foo() accepts the object as const MyClass &, some branches don't even require a temporary destructor at all, but the CFG is still the same double diamond:

MyClass existing_object(4, 5, 6);
foo(x ? MyClass(1, 2, 3) : existing_object);

Thanks to Manuel Klimek's hard work, the magic CFG terminator at the beginning of the bottom diamond (which corresponds to the runtime branch responsible for conditionally invoking the destructor) remembers a CXXBindTemporaryExpr * that points to the exact object that we need to destroy. Because this CXXBindTemporaryExpr is foretold by the ConstructionContext, we can write it down in the abstract state at construction time and then make the decision on the magic terminator based on the contents of the abstract state. (Another way to organize this would be to write down this info in the abstract state slightly later, when we reach the actual CXXBindTemporaryExpr. But in our case we'd probably still need to write down the exact MemRegion, so it'd still require the extra annotations in the CFG to compute that region, so we might as well do that at construction where these annotations have already been made available available. Especially given that we needed to foretell the respective MaterializeTemporaryExpr too, and then possibly copy elision on top of it, which could be another reason why a temporary destructor may suddenly become unnecessary.)

So, yeah, the construction context helps us discover facts about the program such as "if x is false then this invocation of foo() doesn't cause ~MyClass() to be invoked". Which is something you can probably discover in simple cases by simply looking at the AST in a top-down manner ("if it never constructs the object then it never destroys it, makes sense right?") but if you're doing a principled one-step-at-a-time analysis over the CFG then the ability to "foretell" the destructor makes your life much easier, simplifies the code by a lot.

I made a tiny conference talk about this (the second half of https://www.youtube.com/watch?v=4n3l-ZcDJNY&t=692s). The interesting part is probably where I enumerate all the ~30 cases I've found, that produce substantially different AST, which probably require a different ConstructionContext subclass for each of them, to properly foretell everything we needed it to foretell in each case:
Screenshot 2024-07-30 at 7 21 27 PM
(A lot of these also produce completely different ASTs before and after C++17.)

I'm very open to changes in ConstructionContext to accommodate your needs, like provide extra information that we didn't need but you did, or provide it in a different way. It's probably not dramatically different from what you're doing (i.e. your RecordStorageLocation appears to be very similar, albeit less abstract), but I suspect that it's slightly more mature at this point, and I'm fairly confident in the design:

  • ConstructionContext looks flexible enough to handle even the most obscure cases, assuming that the tool knows how to interpret it;
  • I still think that it's a very good idea to separate construction contexts into different subclasses because in many cases the tool needs to handle them differently even when the information they carry may be the same... in particular, a lot of the things wouldn't out work for us if we tried to reduce ConstructionContext to a single Stmt *;
  • In particular, I still think that there's a finite amount of ConstructionContexts in the language: even though you're allowed to put an arbitrary amount of ?: operators between the CXXConstructExpr and its respective MaterializeTemporaryExpr, we're still able cover all of these patterns with the same ConstructionContext subclass that carries a fixed amount of AST pointers.

@martinboehme
Copy link
Contributor

@haoNoQ Thank you for going to the trouble to explain all of this in detail!

I'm very open to changes in ConstructionContext to accommodate your needs, like provide extra information that we didn't need but you did, or provide it in a different way.

I haven't looked in detail yet, but I think all of the information we need is there.

The main thing I'm wondering is whether we can make it so there are fewer case distinctions that the code in FlowSensitive would need to perform -- i.e. so that we wouldn't need to handle all of the possible cases in ConstructionContext::Kind individually.

As far as I can tell, the various "intermediate" classes in the class hierarchy look as if they provide a mechanism for this. I.e. instead of distinguishing between all of the possible leaf classes, we would merely handle VariableConstructionContext, ConstructorInitializerConstructionContext, TemporaryObjectConstructionContext, and ReturnedValueConstructionContext. I think that for our purposes (i.e. identifying the result object that a record prvalue initializes), these intermediate classes should provide all of the information we need. (I believe the finer-grained information is only necessary for the additional analysis that you do, e.g. for temporary destructors, as you explain.) Does this sound right?

If you agree that this works, then the main question to me is how we make sure that we make sure we update our code if more cases are added to ConstructionContext. This is easy to do with a switch-case over ConstructionContext::Kind, as the compiler will warn if any of the cases are not handled, but we don't get any such warning when doing a series of dyn_casts over the various "intermediate" classes in the hierarchy (i.e. VariableConstructionContext and so on). We could, of course, still use a switch case but then cast to the appropriate intermediate class. I see that you also do in this in some places in the code, e.g. in ExprEngineCXX.cpp. Is this the approach that you would recommend?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:analysis clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants