Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang][dataflow] Treat comma operator correctly in getResultObjectLocation(). #78427

Merged
merged 1 commit into from
Jan 22, 2024

Conversation

martinboehme
Copy link
Contributor

No description provided.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang:analysis labels Jan 17, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 17, 2024

@llvm/pr-subscribers-clang-analysis

@llvm/pr-subscribers-clang

Author: None (martinboehme)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/78427.diff

2 Files Affected:

  • (modified) clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp (+7-2)
  • (modified) clang/unittests/Analysis/FlowSensitive/TransferTest.cpp (+16-6)
diff --git a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
index a50ee57a3c11b4..99ed55fee779a3 100644
--- a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
+++ b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
@@ -781,8 +781,13 @@ Environment::getResultObjectLocation(const Expr &RecordPRValue) const {
     return Val->getLoc();
   }
 
-  // Expression nodes that propagate a record prvalue should have exactly one
-  // child.
+  if (auto *Op = dyn_cast<BinaryOperator>(&RecordPRValue);
+      Op && Op->isCommaOp()) {
+    return getResultObjectLocation(*Op->getRHS());
+  }
+
+  // All other expression nodes that propagate a record prvalue should have
+  // exactly one child.
   llvm::SmallVector<const Stmt *> children(RecordPRValue.child_begin(),
                                            RecordPRValue.child_end());
   assert(children.size() == 1);
diff --git a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
index 056c4f3383d832..9eabbdfb423a1f 100644
--- a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
@@ -2642,14 +2642,17 @@ TEST(TransferTest, ResultObjectLocation) {
     };
 
     void target() {
-      A();
+      0, A();
       (void)0; // [[p]]
     }
   )";
+  using ast_matchers::binaryOperator;
   using ast_matchers::cxxBindTemporaryExpr;
   using ast_matchers::cxxTemporaryObjectExpr;
   using ast_matchers::exprWithCleanups;
   using ast_matchers::has;
+  using ast_matchers::hasOperatorName;
+  using ast_matchers::hasRHS;
   using ast_matchers::match;
   using ast_matchers::selectFirst;
   using ast_matchers::traverse;
@@ -2659,26 +2662,33 @@ TEST(TransferTest, ResultObjectLocation) {
          ASTContext &ASTCtx) {
         const Environment &Env = getEnvironmentAtAnnotation(Results, "p");
 
-        // The expresssion `A()` in the code above produces the following
-        // structure, consisting of three prvalues of record type.
+        // The expression `0, A()` in the code above produces the following
+        // structure, consisting of four prvalues of record type.
         // `Env.getResultObjectLocation()` should return the same location for
         // all of these.
         auto MatchResult = match(
             traverse(TK_AsIs,
                      exprWithCleanups(
-                         has(cxxBindTemporaryExpr(
-                                 has(cxxTemporaryObjectExpr().bind("toe")))
-                                 .bind("bte")))
+                         has(binaryOperator(
+                                 hasOperatorName(","),
+                                 hasRHS(cxxBindTemporaryExpr(
+                                            has(cxxTemporaryObjectExpr().bind(
+                                                "toe")))
+                                            .bind("bte")))
+                                 .bind("comma")))
                          .bind("ewc")),
             ASTCtx);
         auto *TOE = selectFirst<CXXTemporaryObjectExpr>("toe", MatchResult);
         ASSERT_NE(TOE, nullptr);
+        auto *Comma = selectFirst<BinaryOperator>("comma", MatchResult);
+        ASSERT_NE(Comma, nullptr);
         auto *EWC = selectFirst<ExprWithCleanups>("ewc", MatchResult);
         ASSERT_NE(EWC, nullptr);
         auto *BTE = selectFirst<CXXBindTemporaryExpr>("bte", MatchResult);
         ASSERT_NE(BTE, nullptr);
 
         RecordStorageLocation &Loc = Env.getResultObjectLocation(*TOE);
+        EXPECT_EQ(&Loc, &Env.getResultObjectLocation(*Comma));
         EXPECT_EQ(&Loc, &Env.getResultObjectLocation(*EWC));
         EXPECT_EQ(&Loc, &Env.getResultObjectLocation(*BTE));
       });

Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change itself looks good to me, but I start to doubt whether we actually need getResultObjectLocation the way it is currently implemented. One of the red flags for me seeing some getResultObjectLocation calls in UncheckedOptionalAccessModel.cpp. I think most (all?) of those calls are redundant, the type of the expression already guarantees that we have the right location. Moreover, I think the propagation should happen internally in the built-in transfers, so authors of user code like UncheckedOptionalAccessModel should never need to think about these implementation details.

I think it might make sense to review all of the calls and after we removed the redundant ones, we might want to reconsider if it makes sense to restrict the use of this function somewhat.

Let me know if I am missing something.

@martinboehme
Copy link
Contributor Author

martinboehme commented Jan 19, 2024

This change itself looks good to me, but I start to doubt whether we actually need getResultObjectLocation the way it is currently implemented. One of the red flags for me seeing some getResultObjectLocation calls in UncheckedOptionalAccessModel.cpp. I think most (all?) of those calls are redundant, the type of the expression already guarantees that we have the right location. Moreover, I think the propagation should happen internally in the built-in transfers, so authors of user code like UncheckedOptionalAccessModel should never need to think about these implementation details.

I think it might make sense to review all of the calls and after we removed the redundant ones, we might want to reconsider if it makes sense to restrict the use of this function somewhat.

Let me know if I am missing something.

The missing context here, I think, is that the current implementation of getResultObjectLocation() is deficient and I want to fix it (hopefully soon). This fix should be transparent to the existing callers of getResultObjectLocation() (but it does mean that we currently call getResultObjectLocation() in places where it looks as if we could do something simpler).

Let me expand. Currently, getResultObjectLocation() propagates the location from the "original record constructor" up towards any prvalue ancestors of the original record constructor and, ultimately, the result object. The location for the "original record constructor" is simply pulled out of thin air by the transfer function for that node.

But this is wrong. For example, consider the following code:

  A a = some_condition()? A(1) : A(2, 3);

When either the A(1) or A(2, 3) constructor are called, the this pointer that is passed to them should be &a. But our current implementation doesn't do this. When visiting A(1) and A(2, 3), it independently creates storage locations for them ("out of thin air") and tries to propagate them upwards (towards the root). This poses an unsolvable problem when processing the conditional operator; the "result object location" for the conditional operator should be identical to the result object location for both branches, but this is currently impossible to achieve. Instead, we currently don't really model the conditional operator at all -- we just invent a fresh location for it, and hence we claim (erroneously) that the conditional operator is an "original record constructor". See also this comment.

What we should instead be doing is propagating the result object location downwards from the result object to any prvalues that might initialize it. This is described in a FIXME here. (By the way, this is what CodeGen does too -- not surprisingly.)

The current implementation is also terrible in other ways. We do actually get the following right:

  A a = A(1);

If you query getResultObjectLocation() for A(1), you'll find it gives you the same location that you get if we call Environment::getStorageLocation() for a. But the way we arrive here is a terrible hack: When processing the VarDecl for a, we retrieve the RecordValue for its initializer, look at its RecordValue::getLoc(), and "steal" this location as the location to use for the storage of the variable. But we can't actually always do this stealing; for example, when processing a member initializer, the location for the member variable has already been determined, and so we need to perform a copy that should not actually be happening.

A nice side effect of fixing getResultObjectLocation() to do the right thing is that it will make RecordValue::getLoc() obsolete -- and because this is the last remaining reason for RecordValue to exist at all, this will allow us to eliminate RecordValue entirely, something I've been working towards for a while (see for example this discussion).

Getting back to your original point: Yes, many of the existing getResultObjectLocation() calls are redundant today -- they could be replaced with code that retrieves the RecordValue, then obtains the location from RecordValue::getLoc(). But the getResultObjectLocation() calls future-proof the code -- they will allow the code to continue to work when we change getResultObjectLocation() to propagate locations from result objects down towards the "original record constructors", rather than upwards as we do today.

This has been pretty lengthy, but I hope it gives more context.

@Xazax-hun
Copy link
Collaborator

Xazax-hun commented Jan 19, 2024

This has been pretty lengthy, but I hope it gives more context.

Yeah, this was really helpful, thanks! This makes a lot of sense to me, I agree that this is a facility that we need. On the other hand, I am still wondering whether it is a good idea to expose this publicly to the checks or we could do it automatically when necessary in the engine code. That being said, I think the "publicness" of this API is not that important to solve at this point.

@martinboehme
Copy link
Contributor Author

On the other hand, I am still wondering whether it is a good idea to expose this publicly to the checks or we could do it automatically when necessary in the engine code. That being said, I think the "publicness" of this API is not that important to solve at this point.

You mean the publicness of getResultObjectLocation()?

I don't really see a good alternative. Consider the UncheckedOptionalAccessModel. It needs to provide its own custom transfer function for the constructor of an optional, which initializes the state of the optional object, and to do so, it needs to answer the question "in which location will this end up getting stored". Put differently, it needs to know what the constructor's implicit this argument is. That's what getResultObjectLocation() returns.

Maybe you're reacting to the verbose and slightly opaque name? I don't love the name either, but it kind of follows from the terms that the standard uses. "Result object" is the term that the standard uses for "what is the object that this prvalue ends up initializing", and given that term, getResultObjectLocation() is a pretty obvious (though verbose) name for the function.

But we'll certainly keep this on the table -- maybe we come up with a good alternative in the future.

@martinboehme martinboehme merged commit a2caa49 into llvm:main Jan 22, 2024
7 checks passed
@Xazax-hun
Copy link
Collaborator

You mean the publicness of getResultObjectLocation()?

Yeah, I was wondering if it would be more user friendly if something like State.Env.get<RecordStorageLocation>(Expr); would automatically recognize that the user actually wants the result location and returned that without the user having to consider this special case explicitly.

@martinboehme
Copy link
Contributor Author

You mean the publicness of getResultObjectLocation()?

Yeah, I was wondering if it would be more user friendly if something like State.Env.get<RecordStorageLocation>(Expr); would automatically recognize that the user actually wants the result location and returned that without the user having to consider this special case explicitly.

Thanks, I understand now.

I see the attraction, but I think it would dilute the semantics of get<derived_from<StorageLocation>>. Currently, this method may only be called on glvalues, and that makes it very clear what it will return -- as a glvalue is a storage location.

Extending get<derived_from<StorageLocation>> to do what getResultObjectLocation() does today means that it would become permissible to call this method on prvalues (of record type), and the semantics of what this operation does would be quite different from what it does on glvalues, and more involved -- because it would now be returning the storage location of a different AST node (the result object), and connecting the AST node for the result object to the prvalue is not entirely trivial.

I think it would therefore be better to make this difference in semantics clear by separating out the "get the result object location for a record prvalue" operation into a separate method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:analysis clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants