Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dataflow] Add global condition to DataflowAnalysisContext #65949

Merged
merged 4 commits into from
Sep 14, 2023

Conversation

sam-mccall
Copy link
Collaborator

This records facts that are not sensitive to the current flow condition,
and should apply to all environments.

The motivating case is recording information about where a Value
originated, such as nullability:

  • we may see the same Value for multiple expressions (e.g. reads of the
    same field) in multiple environments (multiple blocks or iterations)
  • we want to record information only when we first see the Value
    (e.g. Nullability annotations on fields only add information if we
    don't know where the value came from)
  • this information should be expressible as a SAT condition
  • we must add this SAT condition to every environment where the
    Value may appear

We solve this by recording the information in the global condition.
This doesn't seem particularly elegant, but solves the problem and is
a fairly small and natural extension of the Environment.

Alternatives considered:

  • store the constraint directly as a property on the Value.
    But it's more composable for such properties to always be variables
    (AtomicBoolValue), and constrain them with SAT conditions.
  • add a hook whenever values are created, giving the analysis the
    chance to populate them.
    However the framework relies on/provides the ability to construct
    values in arbitrary places without providing the context such a hook
    would need, this would be a very invasive change.

This records facts that are not sensitive to the current flow condition,
and should apply to all environments.

The motivating case is recording information about where a Value
originated, such as nullability:
 - we may see the same Value for multiple expressions (e.g. reads of the
   same field) in multiple environments (multiple blocks or iterations)
 - we want to record information only when we first see the Value
   (e.g. Nullability annotations on fields only add information if we
   don't know where the value came from)
 - this information should be expressible as a SAT condition
 - we must add this SAT condition to every environment where the
   Value may appear

We solve this by recording the information in the global condition.
This doesn't seem particularly elegant, but solves the problem and is
a fairly small and natural extension of the Environment.

Alternatives considered:
 - store the constraint directly as a property on the Value.
   But it's more composable for such properties to always be variables
   (AtomicBoolValue), and constrain them with SAT conditions.
 - add a hook whenever values are created, giving the analysis the
   chance to populate them.
   However the framework relies on/provides the ability to construct
   values in arbitrary places without providing the context such a hook
   would need, this would be a very invasive change.
@sam-mccall sam-mccall requested review from a team as code owners September 11, 2023 11:45
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang:analysis labels Sep 11, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Sep 11, 2023

@llvm/pr-subscribers-clang

Changes

This records facts that are not sensitive to the current flow condition,
and should apply to all environments.

The motivating case is recording information about where a Value
originated, such as nullability:

  • we may see the same Value for multiple expressions (e.g. reads of the
    same field) in multiple environments (multiple blocks or iterations)
  • we want to record information only when we first see the Value
    (e.g. Nullability annotations on fields only add information if we
    don't know where the value came from)
  • this information should be expressible as a SAT condition
  • we must add this SAT condition to every environment where the
    Value may appear

We solve this by recording the information in the global condition.
This doesn't seem particularly elegant, but solves the problem and is
a fairly small and natural extension of the Environment.

Alternatives considered:

  • store the constraint directly as a property on the Value.
    But it's more composable for such properties to always be variables
    (AtomicBoolValue), and constrain them with SAT conditions.
  • add a hook whenever values are created, giving the analysis the
    chance to populate them.
    However the framework relies on/provides the ability to construct
    values in arbitrary places without providing the context such a hook
    would need, this would be a very invasive change.

--
Full diff: https://github.com/llvm/llvm-project/pull/65949.diff

3 Files Affected:

  • (modified) clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h (+14-6)
  • (modified) clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp (+36-24)
  • (modified) clang/unittests/Analysis/FlowSensitive/DataflowAnalysisContextTest.cpp (+16)
diff --git a/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h b/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
index 685bbe486b54e1f..8ff7e584887241a 100644
--- a/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
+++ b/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h
@@ -108,6 +108,14 @@ class DataflowAnalysisContext {
   /// A null `PointeeType` can be used for the pointee of `std::nullptr_t`.
   PointerValue &getOrCreateNullPointerValue(QualType PointeeType);
 
+  /// Adds `Constraint` to current and future flow conditions in this context.
+  ///
+  /// This must be used with care: the constraint must be truly global across
+  /// the analysis, and should not invalidate the result of past logic.
+  ///
+  /// Adding constraints on freshly created boolean variables is generally safe.
+  void addGlobalConstraint(const Formula &Constraint);
+
   /// Adds `Constraint` to the flow condition identified by `Token`.
   void addFlowConditionConstraint(Atom Token, const Formula &Constraint);
 
@@ -174,12 +182,11 @@ class DataflowAnalysisContext {
   void addModeledFields(const FieldSet &Fields);
 
   /// Adds all constraints of the flow condition identified by `Token` and all
-  /// of its transitive dependencies to `Constraints`. `VisitedTokens` is used
-  /// to track tokens of flow conditions that were already visited by recursive
-  /// calls.
-  void addTransitiveFlowConditionConstraints(
-      Atom Token, llvm::SetVector &Constraints,
-      llvm::DenseSet &VisitedTokens);
+  /// of its transitive dependencies to `Constraints`.
+  void
+  addTransitiveFlowConditionConstraints(Atom Token,
+                                        llvm::SetVector &Out);
+
 
   /// Returns true if the solver is able to prove that there is no satisfying
   /// assignment for `Constraints`
@@ -224,6 +231,7 @@ class DataflowAnalysisContext {
   // dependencies is stored in the `FlowConditionDeps` map.
   llvm::DenseMap> FlowConditionDeps;
   llvm::DenseMap FlowConditionConstraints;
+  const Formula *GlobalConstraints = nullptr;
 
   llvm::DenseMap FunctionContexts;
 
diff --git a/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp b/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
index 47a994f4bbdb6a1..f2b3207609801e4 100644
--- a/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
+++ b/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
@@ -104,6 +104,13 @@ DataflowAnalysisContext::getOrCreateNullPointerValue(QualType PointeeType) {
   return *Res.first->second;
 }
 
+void DataflowAnalysisContext::addGlobalConstraint(const Formula &Constraint) {
+  if (GlobalConstraints == nullptr)
+    GlobalConstraints = &Constraint;
+  else
+    GlobalConstraints = &arena().makeAnd(*GlobalConstraints, Constraint);
+}
+
 void DataflowAnalysisContext::addFlowConditionConstraint(
     Atom Token, const Formula &Constraint) {
   auto Res = FlowConditionConstraints.try_emplace(Token, &Constraint);
@@ -150,7 +157,7 @@ bool DataflowAnalysisContext::flowConditionImplies(Atom Token,
   Constraints.insert(&arena().makeAtomRef(Token));
   Constraints.insert(&arena().makeNot(Val));
   llvm::DenseSet VisitedTokens;
-  addTransitiveFlowConditionConstraints(Token, Constraints, VisitedTokens);
+  addTransitiveFlowConditionConstraints(Token, Constraints);
   return isUnsatisfiable(std::move(Constraints));
 }
 
@@ -160,7 +167,7 @@ bool DataflowAnalysisContext::flowConditionIsTautology(Atom Token) {
   llvm::SetVector Constraints;
   Constraints.insert(&arena().makeNot(arena().makeAtomRef(Token)));
   llvm::DenseSet VisitedTokens;
-  addTransitiveFlowConditionConstraints(Token, Constraints, VisitedTokens);
+  addTransitiveFlowConditionConstraints(Token, Constraints);
   return isUnsatisfiable(std::move(Constraints));
 }
 
@@ -172,27 +179,33 @@ bool DataflowAnalysisContext::equivalentFormulas(const Formula &Val1,
 }
 
 void DataflowAnalysisContext::addTransitiveFlowConditionConstraints(
-    Atom Token, llvm::SetVector &Constraints,
-    llvm::DenseSet &VisitedTokens) {
-  auto Res = VisitedTokens.insert(Token);
-  if (!Res.second)
-    return;
-
-  auto ConstraintsIt = FlowConditionConstraints.find(Token);
-  if (ConstraintsIt == FlowConditionConstraints.end()) {
-    Constraints.insert(&arena().makeAtomRef(Token));
-  } else {
-    // Bind flow condition token via `iff` to its set of constraints:
-    // FC <=> (C1 ^ C2 ^ ...), where Ci are constraints
-    Constraints.insert(&arena().makeEquals(arena().makeAtomRef(Token),
-                                           *ConstraintsIt->second));
-  }
+    Atom Token, llvm::SetVector &Constraints) {
+  llvm::DenseSet AddedTokens;
+  std::vector Remaining = {Token};
+
+  if (GlobalConstraints)
+    Constraints.insert(GlobalConstraints);
+  // Define all the flow conditions that might be referenced in constraints.
+  while (!Remaining.empty()) {
+    auto Token = Remaining.back();
+    Remaining.pop_back();
+    if (!AddedTokens.insert(Token).second)
+      continue;
+
+    auto ConstraintsIt = FlowConditionConstraints.find(Token);
+    if (ConstraintsIt == FlowConditionConstraints.end()) {
+      Constraints.insert(&arena().makeAtomRef(Token));
+    } else {
+      // Bind flow condition token via `iff` to its set of constraints:
+      // FC <=> (C1 ^ C2 ^ ...), where Ci are constraints
+      Constraints.insert(&arena().makeEquals(arena().makeAtomRef(Token),
+                                             *ConstraintsIt->second));
+    }
 
-  auto DepsIt = FlowConditionDeps.find(Token);
-  if (DepsIt != FlowConditionDeps.end()) {
-    for (Atom DepToken : DepsIt->second) {
-      addTransitiveFlowConditionConstraints(DepToken, Constraints,
-                                            VisitedTokens);
+    if (auto DepsIt = FlowConditionDeps.find(Token);
+        DepsIt != FlowConditionDeps.end()) {
+      for (Atom A : DepsIt->second)
+        Remaining.push_back(A);
     }
   }
 }
@@ -201,8 +214,7 @@ void DataflowAnalysisContext::dumpFlowCondition(Atom Token,
                                                 llvm::raw_ostream &OS) {
   llvm::SetVector Constraints;
   Constraints.insert(&arena().makeAtomRef(Token));
-  llvm::DenseSet VisitedTokens;
-  addTransitiveFlowConditionConstraints(Token, Constraints, VisitedTokens);
+  addTransitiveFlowConditionConstraints(Token, Constraints);
 
   // TODO: have formulas know about true/false directly instead
   Atom True = arena().makeLiteral(true).getAtom();
diff --git a/clang/unittests/Analysis/FlowSensitive/DataflowAnalysisContextTest.cpp b/clang/unittests/Analysis/FlowSensitive/DataflowAnalysisContextTest.cpp
index f6e8b30d898e892..e624f6cfd0f6b6f 100644
--- a/clang/unittests/Analysis/FlowSensitive/DataflowAnalysisContextTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/DataflowAnalysisContextTest.cpp
@@ -46,6 +46,22 @@ TEST_F(DataflowAnalysisContextTest, AddFlowConditionConstraint) {
   EXPECT_TRUE(Context.flowConditionImplies(FC, C));
 }
 
+TEST_F(DataflowAnalysisContextTest, AddGlobalConstraint) {
+  Atom FC = A.makeFlowConditionToken();
+  auto &C = A.makeAtomRef(A.makeAtom());
+  Context.addGlobalConstraint(C);
+  EXPECT_TRUE(Context.flowConditionImplies(FC, C));
+}
+
+TEST_F(DataflowAnalysisContextTest, GlobalAndFCConstraintInteract) {
+  Atom FC = A.makeFlowConditionToken();
+  auto &C = A.makeAtomRef(A.makeAtom());
+  auto &D = A.makeAtomRef(A.makeAtom());
+  Context.addGlobalConstraint(A.makeImplies(C, D));
+  Context.addFlowConditionConstraint(FC, C);
+  EXPECT_TRUE(Context.flowConditionImplies(FC, D));
+}
+
 TEST_F(DataflowAnalysisContextTest, ForkFlowCondition) {
   Atom FC1 = A.makeFlowConditionToken();
   auto &C1 = A.makeAtomRef(A.makeAtom());

@@ -108,6 +108,16 @@ class DataflowAnalysisContext {
/// A null `PointeeType` can be used for the pointee of `std::nullptr_t`.
PointerValue &getOrCreateNullPointerValue(QualType PointeeType);

/// Adds `Constraint` to current and future flow conditions in this context.
///
/// The global condition must contain only flow-insensitive information, i.e.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if GlobalConstraint is a bit vague. How about something like addFlowInsensitiveInvariant. This makes part of the comment unnecessary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a much more accurate descriptor, but IME "flow insensitive" is not good as an actual name. We used "non-flow-sensitive" extensively in nullability and gradually phasing it out in favor of the "type" metaphor, as after many months reading it still requires "deciphering". Mostly it's too long, it's a negative definition, and too technical I think.
The parallel between addFlowConditionConstraint and addFlowInsensitiveInvariant is also IMO not clear enough. (I'd be in favor of dropping "Constraint" from both, but that's a different change)

I've tried Common instead of Global which might more clearly emphasize the sharing that happens here - WDYT?
If you feel strongly about FlowInsensitive I can use that, it might be more consistent with the framework's style.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps just addInvariant. The term "invariant" kind of says it all and (at least in the literature) types are frequently referrered to as specifying program invariants.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have strong feelings about either of the options. I think I have a slight preference of Invariant over Constraint. But I am not opposed to any of the alternatives.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invariant sounds good! Done

Drop unused
Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG!

@sam-mccall sam-mccall merged commit 21ab252 into llvm:main Sep 14, 2023
2 checks passed
ZijunZhaoCCK pushed a commit to ZijunZhaoCCK/llvm-project that referenced this pull request Sep 19, 2023
…lvm#65949)

This records facts that are not sensitive to the current flow condition,
and should apply to all environments.

The motivating case is recording information about where a Value
originated, such as nullability:
 - we may see the same Value for multiple expressions (e.g. reads of the
   same field) in multiple environments (multiple blocks or iterations)
 - we want to record information only when we first see the Value
   (e.g. Nullability annotations on fields only add information if we
   don't know where the value came from)
 - this information should be expressible as a SAT condition
 - we must add this SAT condition to every environment where the
   Value may appear

We solve this by recording the information in the global condition.
This doesn't seem particularly elegant, but solves the problem and is
a fairly small and natural extension of the Environment.

Alternatives considered:
 - store the constraint directly as a property on the Value.
   But it's more composable for such properties to always be variables
   (AtomicBoolValue), and constrain them with SAT conditions.
 - add a hook whenever values are created, giving the analysis the
   chance to populate them.
   However the framework relies on/provides the ability to construct
   values in arbitrary places without providing the context such a hook
   would need, this would be a very invasive change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:analysis clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants