Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang][dataflow] Factor out built-in boolean model into an explicit module. #82950

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ymand
Copy link
Collaborator

@ymand ymand commented Feb 26, 2024

Draft to demo how we can pull out the boolean model. Let's discuss specifics of
namings, location, etc.

The purpose of this refactoring is to enable us to compare the performance of
different boolean models. In particular, we're interested in investigating a
very simple semantic domain of just the booleans (and Top).

In the process, the PR drastically simplifies the handling of terminators. This
cleanup can be pulled out into its own PR, to precede the refactoring work.

@martinboehme
Copy link
Contributor

Draft to demo how we can pull out the boolean model. Let's discuss specifics of namings, location, etc.

Not sure -- do you mean let's wordsmith names now, or do you mean we should discuss naming and location, but that should happen after we've talked about the general approach?

The purpose of this refactoring is to enable us to compare the performance of different boolean models. In particular, we're interested in investigating a very simple semantic domain of just the booleans (and Top).

Can you expand on how we would swap in a different boolean model?

  • Just put #ifdefs in the various functions in bool_model?

  • Provide different namespaces containing different boolean models, then in namespace bool_model, do using namespace my_desired_bool_model?

  • Something else?

I would favour a model that's as simple as possible -- I don't think we want to use template parameters, for example -- and what you have here looks like it's intended to be simple. I'm just not sure exactly where this is intended to go?

In the process, the PR drastically simplifies the handling of terminators. This cleanup can be pulled out into its own PR, to precede the refactoring work.

I like the cleanup, and I think pulling it out into a separate patch is a good idea because a) it's unrelated to the rest of this patch, and b) it can land today, without further discussion needed (IMO).

@ymand
Copy link
Collaborator Author

ymand commented Mar 8, 2024

Draft to demo how we can pull out the boolean model. Let's discuss specifics of namings, location, etc.

Not sure -- do you mean let's wordsmith names now, or do you mean we should discuss naming and location, but that should happen after we've talked about the general approach?

Either way -- I just meant that I'm not tied to the particulars included in the draft.

The purpose of this refactoring is to enable us to compare the performance of different boolean models. In particular, we're interested in investigating a very simple semantic domain of just the booleans (and Top).

Can you expand on how we would swap in a different boolean model?

  • Just put #ifdefs in the various functions in bool_model?
  • Provide different namespaces containing different boolean models, then in namespace bool_model, do using namespace my_desired_bool_model?
  • Something else?

For starters, the namespace approach. That will give us a simple way to experiment with alternatives. But, next step is to turn this namespace into a derivative of DataflowModel -- we'll just need to extend that interface to support transferBranch.

Additional alternatives:

  1. template parameter -- make the boolean model a static parameter of the overall system. This sounds like a nightmare, to be blunt, forcing a huge amount of code into templates and massive rewriting. Let's not.
  2. multiple models -- the boolean model actions occur inside functions that are already parameterized. If we would support multiple models, then we could simply remove the boolean modeling altogether from the core and bundle it up as a standalone model. That is, the core will not concern itself with boolean modeling. I like this for the long term, but don't want to block on this for now.

I would favour a model that's as simple as possible -- I don't think we want to use template parameters, for example -- and what you have here looks like it's intended to be simple. I'm just not sure exactly where this is intended to go?

In the process, the PR drastically simplifies the handling of terminators. This cleanup can be pulled out into its own PR, to precede the refactoring work.

I like the cleanup, and I think pulling it out into a separate patch is a good idea because a) it's unrelated to the rest of this patch, and b) it can land today, without further discussion needed (IMO).

Will do!

@ymand
Copy link
Collaborator Author

ymand commented Mar 8, 2024

Terminator cleanup split out into #84499

Copy link

github-actions bot commented Mar 11, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 734026347cca85cf0e242ef5f04896f55e0ac113 ff9537d374ba3062874d7b64aaa6947c860e0c79 -- clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h clang/include/clang/Analysis/FlowSensitive/Transfer.h clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp clang/lib/Analysis/FlowSensitive/Transfer.cpp clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp clang/unittests/Analysis/FlowSensitive/TransferTest.cpp clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
View the diff from clang-format here.
diff --git a/clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp b/clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
index f840ccd382..57a5524b13 100644
--- a/clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
+++ b/clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp
@@ -652,9 +652,8 @@ const Formula &evaluateEquality(Arena &A, const Formula &EqVal,
   // b) (!LHS & !RHS) => EqVal
   //    If neither is set, then they are equal.
   // We rewrite b) as !EqVal => (LHS v RHS), for a more compact formula.
-  return A.makeAnd(
-      A.makeImplies(EqVal, A.makeEquals(LHS, RHS)),
-      A.makeImplies(A.makeNot(EqVal), A.makeOr(LHS, RHS)));
+  return A.makeAnd(A.makeImplies(EqVal, A.makeEquals(LHS, RHS)),
+                   A.makeImplies(A.makeNot(EqVal), A.makeOr(LHS, RHS)));
 }
 
 void transferOptionalAndOptionalCmp(const clang::CXXOperatorCallExpr *CmpExpr,
diff --git a/clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp b/clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
index 7011345053..6dedbe17f2 100644
--- a/clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/UncheckedOptionalAccessModelTest.cpp
@@ -1382,7 +1382,8 @@ protected:
 
 INSTANTIATE_TEST_SUITE_P(
     UncheckedOptionalUseTestInst, UncheckedOptionalAccessTest,
-    ::testing::Values(OptionalTypeIdentifier{"std", "optional"}// ,
+    ::testing::Values(OptionalTypeIdentifier{"std", "optional"}
+                      // ,
                       // OptionalTypeIdentifier{"absl", "optional"},
                       // OptionalTypeIdentifier{"base", "Optional"}
                       ),

auto V = simple_bool_model::getLiteralValue(F, *this);
return V.has_value() && *V;
}
#endif

bool Environment::allows(const Formula &F) const {
return DACtx->flowConditionAllows(FlowConditionToken, F);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this also need to be changed? I think this can just do return proves(F);?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but not to proves: it should be value_or(true) -- that is, if we lack any definite setting, we "allow" it to be anything.

return Env.getAtomValue(F.getAtom());
case Formula::Literal:
return F.literal();
case Formula::Not: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this case? Is it not covered by neOp() above? (I.e. are there ever any cases where we actually have Formula::Not formulas?)

If we do need this, don't we also need corresponding cases for Formula::And and Formula::Or?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this got clobbered when I pushed after a rebase. But, this was on the function getLiteralValue (now in DataflowEnvironment.cpp) and I don't think the original concerns apply, since we now support And and Or, etc.

…module.

In the process, drastically simplify the handling of terminators.
The new model still uses atomic variables in boolean formulas, but it limits the
environment to accumulating truth values for atomic variables, rather than
the arbitrary formula allowed by the flow condition.
@ymand
Copy link
Collaborator Author

ymand commented Mar 21, 2024

Martin, I've thoroughly updated the refactoring, exactly as you suggested -- all of the interesting differences are actually just in how we handle the logical operations, so most of the changes are now in DataflowEnvironment.cpp. I've left the factoring in Transfer because we may want it for the future -- these two models both use formulae, but other implementations could differ.

The test failures are down to 35 and all they are all WAI -- places where we have genuine differences between the models, primarily around encoding custom API properties with formulae.

…oves) and drop distinction between boolean operations.

This commit drastically simplifies the original refactoring. We keep the boolean
model separately, but we only maintain one version, since there turned out to be
no meaningful difference between them. Instead, the difference lies in the
logical operations, so we've abstacted those.

We're down to 35 failing tests, all with clear explanations based on the
limitations of this approach; primarily, the inability to encode custom
API/operator meanings using logical formulae.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants