[clang][dataflow] Factor out built-in boolean model into an explicit module. #82950

ymand · 2024-02-26T02:42:27Z

Draft to demo how we can pull out the boolean model. Let's discuss specifics of
naming, location, etc.

The purpose of this refactoring is to enable us to compare the performance of
different boolean models. In particular, we're interested in investigating a
very simple semantic domain of just the booleans (and Top).
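One way to picture "just the booleans (and Top)" is a three-element domain per boolean variable. The sketch below is illustrative only: `BoolValue` and `join` are hypothetical names, not part of the clang dataflow API, and `std::nullopt` stands in for Top.

```cpp
#include <cassert>
#include <optional>

// Hypothetical per-variable domain: a definite truth value, or Top.
// std::nullopt represents Top ("could be either").
using BoolValue = std::optional<bool>;

// Join at a control-flow merge: keep a value only if both predecessors
// agree on it; any disagreement or unknown input goes to Top.
inline BoolValue join(BoolValue A, BoolValue B) {
  if (A.has_value() && B.has_value() && *A == *B)
    return A;
  return std::nullopt;
}
```

One caveat with `std::optional<bool>`: `if (V)` tests for presence, not truth, which is why the code spells out `has_value()` and `*V`.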

In the process, the PR drastically simplifies the handling of terminators. This
cleanup can be pulled out into its own PR, to precede the refactoring work.

martinboehme · 2024-02-26T08:57:37Z

> Draft to demo how we can pull out the boolean model. Let's discuss specifics of naming, location, etc.

Not sure -- do you mean let's wordsmith names now, or do you mean we should discuss naming and location, but that should happen after we've talked about the general approach?

> The purpose of this refactoring is to enable us to compare the performance of different boolean models. In particular, we're interested in investigating a very simple semantic domain of just the booleans (and Top).

Can you expand on how we would swap in a different boolean model?

- Just put #ifdefs in the various functions in bool_model?
- Provide different namespaces containing different boolean models, then in namespace bool_model, do using namespace my_desired_bool_model?
- Something else?

I would favour a model that's as simple as possible -- I don't think we want to use template parameters, for example -- and what you have here looks like it's intended to be simple. I'm just not sure exactly where this is intended to go?

> In the process, the PR drastically simplifies the handling of terminators. This cleanup can be pulled out into its own PR, to precede the refactoring work.

I like the cleanup, and I think pulling it out into a separate patch is a good idea because a) it's unrelated to the rest of this patch, and b) it can land today, without further discussion needed (IMO).

ymand · 2024-03-08T14:56:29Z

>> Draft to demo how we can pull out the boolean model. Let's discuss specifics of naming, location, etc.

> Not sure -- do you mean let's wordsmith names now, or do you mean we should discuss naming and location, but that should happen after we've talked about the general approach?

Either way -- I just meant that I'm not tied to the particulars included in the draft.

>> The purpose of this refactoring is to enable us to compare the performance of different boolean models. In particular, we're interested in investigating a very simple semantic domain of just the booleans (and Top).

> Can you expand on how we would swap in a different boolean model?
>
> - Just put #ifdefs in the various functions in bool_model?
> - Provide different namespaces containing different boolean models, then in namespace bool_model, do using namespace my_desired_bool_model?
> - Something else?

For starters, the namespace approach. That will give us a simple way to experiment with alternatives. But the next step is to turn this namespace into a class derived from DataflowModel -- we'll just need to extend that interface to support transferBranch.
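A minimal sketch of the namespace approach, assuming each model exposes the same set of free functions. The name `simple_bool_model` echoes the one used in the draft, but its body here is invented; `no_bool_model`, `branchFeasible`, and `shouldVisitSuccessor` are hypothetical. Only the `using namespace` selection inside `bool_model` comes from the discussion.

```cpp
#include <cassert>
#include <optional>

namespace simple_bool_model {
// A branch edge is infeasible only when the condition has a definite value
// that contradicts the edge being taken.
inline bool branchFeasible(std::optional<bool> Cond, bool TakenEdge) {
  return !Cond.has_value() || *Cond == TakenEdge;
}
} // namespace simple_bool_model

namespace no_bool_model {
// Hypothetical baseline that tracks nothing: every edge stays feasible.
inline bool branchFeasible(std::optional<bool>, bool) { return true; }
} // namespace no_bool_model

// The single place that selects the active model.
namespace bool_model {
using namespace simple_bool_model;
} // namespace bool_model

// Core code calls only through bool_model::, so swapping models means
// editing the using-directive above and nothing else.
inline bool shouldVisitSuccessor(std::optional<bool> Cond, bool TakenEdge) {
  return bool_model::branchFeasible(Cond, TakenEdge);
}
```

Qualified lookup of `bool_model::branchFeasible` follows the using-directive, so the core never names a concrete model. The trade-off is that the choice is fixed at compile time; moving to a `DataflowModel`-style interface would make it a runtime choice.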

Additional alternatives:

- template parameter -- make the boolean model a static parameter of the overall system. This sounds like a nightmare, to be blunt: it would force a huge amount of code into templates and require massive rewriting. Let's not.
- multiple models -- the boolean model actions occur inside functions that are already parameterized. If we supported multiple models, we could simply remove boolean modeling from the core altogether and bundle it up as a standalone model; that is, the core would not concern itself with boolean modeling at all. I like this for the long term, but don't want to block on it for now.
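The "multiple models" option could look roughly like this: the core only forwards branch events to a list of models, and boolean reasoning lives in one of them. `Model`, `BoolModel`, `State`, and `onBranch` are hypothetical names for illustration; this is not the existing `DataflowModel` interface.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical analysis state shared by all models: variables with a
// definite boolean value.
struct State {
  std::map<std::string, bool> Known;
};

// Hypothetical plug-in interface; the core forwards branch events to
// every registered model and knows nothing about booleans itself.
class Model {
public:
  virtual ~Model() = default;
  // Called when the analysis follows the edge on which `Var` is `Taken`.
  virtual void transferBranch(const std::string &Var, bool Taken,
                              State &S) = 0;
};

// Boolean modeling, packaged as one standalone model among many.
class BoolModel : public Model {
public:
  void transferBranch(const std::string &Var, bool Taken,
                      State &S) override {
    S.Known[Var] = Taken;
  }
};

// Core driver: no boolean logic of its own.
inline void onBranch(const std::vector<std::unique_ptr<Model>> &Models,
                     const std::string &Var, bool Taken, State &S) {
  for (const auto &M : Models)
    M->transferBranch(Var, Taken, S);
}
```

With no `BoolModel` registered, the core simply records no boolean facts, which is the "core will not concern itself with boolean modeling" end state.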

> I would favour a model that's as simple as possible -- I don't think we want to use template parameters, for example -- and what you have here looks like it's intended to be simple. I'm just not sure exactly where this is intended to go?

>> In the process, the PR drastically simplifies the handling of terminators. This cleanup can be pulled out into its own PR, to precede the refactoring work.

> I like the cleanup, and I think pulling it out into a separate patch is a good idea because a) it's unrelated to the rest of this patch, and b) it can land today, without further discussion needed (IMO).

Will do!

ymand · 2024-03-08T15:23:00Z

Terminator cleanup split out into #84499

github-actions · 2024-03-11T17:06:36Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff 734026347cca85cf0e242ef5f04896f55e0ac113 ff9537d374ba3062874d7b64aaa6947c860e0c79 -- clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h clang/include/clang/Analysis/FlowSensitive/Transfer.h clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp clang/lib/Analysis/FlowSensitive/Transfer.cpp clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp clang/unittests/Analysis/FlowSensitive/TransferTest.cpp clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp

View the diff from clang-format here.

diff --git a/clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp b/clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
index f840ccd382..57a5524b13 100644
--- a/clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
+++ b/clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
@@ -652,9 +652,8 @@ const Formula &evaluateEquality(Arena &A, const Formula &EqVal,
   // b) (!LHS & !RHS) => EqVal
   //    If neither is set, then they are equal.
   // We rewrite b) as !EqVal => (LHS v RHS), for a more compact formula.
-  return A.makeAnd(
-      A.makeImplies(EqVal, A.makeEquals(LHS, RHS)),
-      A.makeImplies(A.makeNot(EqVal), A.makeOr(LHS, RHS)));
+  return A.makeAnd(A.makeImplies(EqVal, A.makeEquals(LHS, RHS)),
+                   A.makeImplies(A.makeNot(EqVal), A.makeOr(LHS, RHS)));
 }
 
 void transferOptionalAndOptionalCmp(const clang::CXXOperatorCallExpr *CmpExpr,
diff --git a/clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp b/clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
index 7011345053..6dedbe17f2 100644
--- a/clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
@@ -1382,7 +1382,8 @@ protected:
 
 INSTANTIATE_TEST_SUITE_P(
     UncheckedOptionalUseTestInst, UncheckedOptionalAccessTest,
-    ::testing::Values(OptionalTypeIdentifier{"std", "optional"}// ,
+    ::testing::Values(OptionalTypeIdentifier{"std", "optional"}
+                      // ,
                       // OptionalTypeIdentifier{"absl", "optional"},
                       // OptionalTypeIdentifier{"base", "Optional"}
                       ),
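The comment in `evaluateEquality` rewrites b) `(!LHS & !RHS) => EqVal` as `!EqVal => (LHS v RHS)`. That is the contrapositive of b), with De Morgan applied to the premise, so the two are equivalent. The stand-alone check below enumerates all eight assignments using plain `bool`s rather than the `Arena`/`Formula` API; the helper names are hypothetical.

```cpp
#include <cassert>

inline bool implies(bool P, bool Q) { return !P || Q; }

// Original form of b): if neither side is set, they are equal.
inline bool formB(bool EqVal, bool LHS, bool RHS) {
  return implies(!LHS && !RHS, EqVal);
}

// Rewritten form used in the code: !EqVal => (LHS v RHS).
inline bool formBRewritten(bool EqVal, bool LHS, bool RHS) {
  return implies(!EqVal, LHS || RHS);
}

// True if the two forms agree on every assignment of the three variables.
inline bool rewriteIsEquivalent() {
  for (int Bits = 0; Bits < 8; ++Bits) {
    bool EqVal = Bits & 1, LHS = Bits & 2, RHS = Bits & 4;
    if (formB(EqVal, LHS, RHS) != formBRewritten(EqVal, LHS, RHS))
      return false;
  }
  return true;
}
```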

martinboehme · 2024-03-12T10:42:52Z

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp

+  auto V = simple_bool_model::getLiteralValue(F, *this);
+  return V.has_value() && *V;
+}
+#endif

 bool Environment::allows(const Formula &F) const {
  return DACtx->flowConditionAllows(FlowConditionToken, F);


Wouldn't this also need to be changed? I think this can just do return proves(F);?

Yes, but not to proves: it should be value_or(true) -- that is, if we lack any definite setting, we "allow" it to be anything.
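The difference between the two queries can be sketched against a model that answers with an optional truth value. `SimpleEnv`, its atom map, and its `getLiteralValue` are hypothetical stand-ins; only the `V.has_value() && *V` shape of `proves` (from the snippet) and the `value_or(true)` shape of `allows` (from the reply) come from this thread.

```cpp
#include <cassert>
#include <map>
#include <optional>

using Atom = unsigned;

// Hypothetical stand-in for the environment: definite truth values for atoms.
struct SimpleEnv {
  std::map<Atom, bool> Values;

  std::optional<bool> getLiteralValue(Atom A) const {
    auto It = Values.find(A);
    if (It == Values.end())
      return std::nullopt;
    return It->second;
  }

  // Provable only if the model has definitely recorded `true`.
  bool proves(Atom A) const {
    auto V = getLiteralValue(A);
    return V.has_value() && *V;
  }

  // Allowed unless the model has definitely recorded `false`:
  // with no information, the atom may be anything.
  bool allows(Atom A) const { return getLiteralValue(A).value_or(true); }
};
```

Implementing `allows` as `proves` would treat every atom without a recorded value as impossible, pruning paths the model knows nothing about.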

martinboehme · 2024-03-12T10:46:53Z

clang/lib/Analysis/FlowSensitive/Transfer.cpp

+    return Env.getAtomValue(F.getAtom());
+  case Formula::Literal:
+    return F.literal();
+  case Formula::Not: {


Why do we need this case? Is it not covered by neOp() above? (I.e. are there ever any cases where we actually have Formula::Not formulas?)

If we do need this, don't we also need corresponding cases for Formula::And and Formula::Or?

Sorry, this got clobbered when I pushed after a rebase. But this comment was on the function getLiteralValue (now in DataflowEnvironment.cpp), and I don't think the original concerns apply, since we now support And, Or, etc.
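For readers following the getLiteralValue thread, a three-valued evaluator over a formula tree might look like the sketch below, returning `std::nullopt` when the result is undetermined. The `Formula` struct, its `Kind` enumerators, and the helper constructors are hypothetical simplifications, not clang's `Formula` class, and this is one plausible handling of Not/And/Or rather than the PR's actual code. And/Or follow the usual Kleene rules: one definite `false` (respectively `true`) operand decides the result even if the other is unknown.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <utility>

// Hypothetical, simplified formula tree (not clang's Formula class).
struct Formula;
using FormulaPtr = std::shared_ptr<const Formula>;

struct Formula {
  enum Kind { AtomVar, Literal, Not, And, Or } K = AtomVar;
  unsigned Atom = 0;   // used by AtomVar
  bool Lit = false;    // used by Literal
  FormulaPtr LHS, RHS; // operands; Not uses only LHS
};

inline FormulaPtr node(Formula::Kind K, FormulaPtr L, FormulaPtr R = nullptr) {
  auto F = std::make_shared<Formula>();
  F->K = K;
  F->LHS = std::move(L);
  F->RHS = std::move(R);
  return F;
}
inline FormulaPtr atom(unsigned A) {
  auto F = std::make_shared<Formula>();
  F->K = Formula::AtomVar;
  F->Atom = A;
  return F;
}
inline FormulaPtr lit(bool B) {
  auto F = std::make_shared<Formula>();
  F->K = Formula::Literal;
  F->Lit = B;
  return F;
}
inline FormulaPtr notF(FormulaPtr X) { return node(Formula::Not, std::move(X)); }
inline FormulaPtr andF(FormulaPtr X, FormulaPtr Y) {
  return node(Formula::And, std::move(X), std::move(Y));
}
inline FormulaPtr orF(FormulaPtr X, FormulaPtr Y) {
  return node(Formula::Or, std::move(X), std::move(Y));
}

// Definite truth values for atoms; missing atoms are unknown.
using Assignment = std::map<unsigned, bool>;

// Three-valued evaluation: std::nullopt means "not determined".
inline std::optional<bool> getLiteralValue(const Formula &F,
                                           const Assignment &Env) {
  switch (F.K) {
  case Formula::AtomVar: {
    auto It = Env.find(F.Atom);
    if (It == Env.end())
      return std::nullopt;
    return It->second;
  }
  case Formula::Literal:
    return F.Lit;
  case Formula::Not: {
    auto V = getLiteralValue(*F.LHS, Env);
    if (!V)
      return std::nullopt;
    return !*V;
  }
  case Formula::And: {
    auto A = getLiteralValue(*F.LHS, Env), B = getLiteralValue(*F.RHS, Env);
    if ((A && !*A) || (B && !*B))
      return false; // one definite false operand decides it
    if (A && B)
      return true;
    return std::nullopt;
  }
  case Formula::Or: {
    auto A = getLiteralValue(*F.LHS, Env), B = getLiteralValue(*F.RHS, Env);
    if ((A && *A) || (B && *B))
      return true; // one definite true operand decides it
    if (A && B)
      return false;
    return std::nullopt;
  }
  }
  return std::nullopt;
}
```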

ymand · 2024-03-21T17:58:59Z

Martin, I've thoroughly updated the refactoring, exactly as you suggested -- all of the interesting differences are actually just in how we handle the logical operations, so most of the changes are now in DataflowEnvironment.cpp. I've left the factoring in Transfer because we may want it for the future -- these two models both use formulae, but other implementations could differ.

The test failures are down to 35, and they are all WAI (working as intended): places where we have genuine differences between the models, primarily around encoding custom API properties with formulae.
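A toy illustration of why encoding custom API properties with formulae is where the models diverge, assuming such a property is expressed as an implication between two atoms (say, "a check returned true" implies "the access is safe"). An environment that can only accumulate truth values for atoms has nowhere to store the implication, so the conclusion stays unknown. Everything here is hypothetical; `ImplicationEnv` is a tiny stand-in for a flow condition and does not reflect how the framework actually decides satisfiability.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <utility>
#include <vector>

using Atom = unsigned;
using AtomValues = std::map<Atom, bool>;

inline std::optional<bool> lookup(const AtomValues &Values, Atom A) {
  auto It = Values.find(A);
  if (It == Values.end())
    return std::nullopt;
  return It->second;
}

// Hypothetical environment in the spirit of the simple model: it only
// accumulates definite truth values for atoms.
struct AtomOnlyEnv {
  AtomValues Values;
  void assume(Atom A, bool V) { Values[A] = V; }
  // "A => B" is not a truth value for any single atom, so there is nowhere
  // to record it; the constraint is silently dropped.
  void assumeImplies(Atom, Atom) {}
  std::optional<bool> get(Atom A) const { return lookup(Values, A); }
};

// Hypothetical richer environment that also keeps implications
// (no SAT solving, no contradiction checks).
struct ImplicationEnv {
  AtomValues Values;
  std::vector<std::pair<Atom, Atom>> Implications;

  void assume(Atom A, bool V) {
    Values[A] = V;
    propagate();
  }
  void assumeImplies(Atom A, Atom B) {
    Implications.emplace_back(A, B);
    propagate();
  }
  std::optional<bool> get(Atom A) const { return lookup(Values, A); }

private:
  // Forward-chain: when a premise is known true, mark an unknown conclusion
  // true. Terminates because every pass that repeats has added a new atom.
  void propagate() {
    bool Changed = true;
    while (Changed) {
      Changed = false;
      for (const auto &[A, B] : Implications) {
        if (lookup(Values, A) == std::optional<bool>(true) &&
            !Values.count(B)) {
          Values[B] = true;
          Changed = true;
        }
      }
    }
  }
};
```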

…oves) and drop distinction between boolean operations. This commit drastically simplifies the original refactoring. We keep the boolean model separate, but we only maintain one version, since there turned out to be no meaningful difference between them. Instead, the difference lies in the logical operations, so we've abstracted those. We're down to 35 failing tests, all with clear explanations based on the limitations of this approach; primarily, the inability to encode custom API/operator meanings using logical formulae.

ymand requested a review from martinboehme February 26, 2024 02:42

ymand force-pushed the better-bools branch from e0c5196 to 16cdcfa March 11, 2024 17:04

martinboehme reviewed Mar 12, 2024

clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp (outdated)

martinboehme reviewed Mar 12, 2024

ymand added 2 commits March 21, 2024 12:55

[clang][dataflow] Factor out built-in boolean model into an explicit …

33f753d

…module. In the process, drastically simplify the handling of terminators.

[clang][dataflow] Add implementation of a simple boolean model

ed9bb21

The new model still uses atomic variables in boolean formulas, but it limits the environment to accumulating truth values for atomic variables, rather than the arbitrary formula allowed by the flow condition.

ymand force-pushed the better-bools branch from 16cdcfa to 486686c March 21, 2024 17:27

ymand force-pushed the better-bools branch from 486686c to 3a6266e March 21, 2024 17:59

ymand force-pushed the better-bools branch from 3a6266e to ff9537d March 21, 2024 18:00

cpovirk mentioned this pull request Apr 5, 2024

False positive with field nullability stored in local variable uber/NullAway#98

Open
