[dataflow] Add dedicated representation of boolean formulas
This is the first step in untangling the two current jobs of BoolValue.

=== Desired end-state: ===

- BoolValue will model C++ booleans, e.g. those held in StorageLocations.
  This includes describing uncertainty (e.g. "top" is a Value concern).
- Formula describes analysis-level assertions in terms of SAT atoms.

These can still be linked together: a BoolValue may have a corresponding
SAT atom which is constrained by formulas.

=== Done in this patch: ===

BoolValue is left intact, Formula is just the input type to the
SAT solver, and we build formulas as needed to invoke the solver.
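
A minimal sketch of what "build formulas as needed" can look like, using only the standard library. ToyValue, ToyFormula and ToyArena are hypothetical stand-ins invented for this illustration; they are not types from clang, and the real translation in this patch (Arena::getFormula) switches over Value::Kind and allocates from a BumpPtrAllocator. The point shown is the caching: each value is translated once, so it always maps to the same formula and the same atom.

```cpp
#include <cassert>
#include <deque>
#include <unordered_map>

// Hypothetical stand-in for BoolValue: an atomic boolean or a conjunction.
struct ToyValue {
  enum Kind { Atomic, Conjunction } K;
  const ToyValue *LHS; // Only used by Conjunction.
  const ToyValue *RHS; // Only used by Conjunction.
};

// Hypothetical stand-in for Formula: an atom reference or an 'and' node.
struct ToyFormula {
  enum Kind { AtomRef, And } K;
  unsigned Atom;         // Only meaningful for AtomRef.
  const ToyFormula *LHS; // Only used by And.
  const ToyFormula *RHS; // Only used by And.
};

class ToyArena {
public:
  // Translates a value into a formula on demand and caches the result.
  const ToyFormula &getFormula(const ToyValue &V) {
    auto It = ValToFormula.find(&V);
    if (It != ValToFormula.end())
      return *It->second;
    if (V.K == ToyValue::Atomic) {
      // Atoms are numbered in the order in which values are first translated.
      Storage.push_back({ToyFormula::AtomRef, NextAtom++, nullptr, nullptr});
    } else {
      // Translate operands first, left to right.
      const ToyFormula *L = &getFormula(*V.LHS);
      const ToyFormula *R = &getFormula(*V.RHS);
      Storage.push_back({ToyFormula::And, 0, L, R});
    }
    // std::deque::push_back keeps references to existing elements valid.
    const ToyFormula &F = Storage.back();
    ValToFormula.emplace(&V, &F);
    return F;
  }

private:
  std::deque<ToyFormula> Storage;
  std::unordered_map<const ToyValue *, const ToyFormula *> ValToFormula;
  unsigned NextAtom = 0;
};
```

Because atom numbers are handed out at translation time in this sketch, they depend on the order of queries, which is the same caveat the TODO in Arena::getFormula raises.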

=== Incidental changes to debug string printing: ===

- Variables are renamed from B0 etc. to V0 etc.
  B0 collides with the names of basic blocks, which is confusing when
  debugging flow conditions.
- Debug printing of formulas (Formula and Atom) now uses operator<<
  rather than debugString(), so it works with gtest.
  The printing code therefore moves out of DebugSupport.h.
- Solver::Result gets the same treatment, along with some helper changes
  to SolverTest, so that unit test failures produce useful messages.
- Formulas are now printed as infix expressions on one line, rather than
  as wrapped/indented S-exprs. In my experience this makes flow conditions
  (FCs) easier to scan for small examples, and large ones are unreadable
  either way.
- Most of the several debugString() functions for constraints/results
  were unused, so they are removed rather than having their tests updated.
  The one that was actually used is inlined into its call site.
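
To illustrate the one-line infix style and the operator<< approach described above, here is a small self-contained sketch. MiniFormula and toString are hypothetical names made up for this example, and std::ostream stands in for llvm::raw_ostream; the output format is modelled on the example given in the Formula.h comment, "(V3 | !(V1 & V2))", and is not copied from the patch's Formula.cpp.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical miniature formula tree; not the Formula class from the patch.
struct MiniFormula {
  enum Kind { AtomRef, Not, And, Or } K;
  unsigned Atom;          // Atom number, used by AtomRef only.
  const MiniFormula *LHS; // Sole operand of Not, left operand of And/Or.
  const MiniFormula *RHS; // Right operand of And/Or.
};

// Prints the formula on a single line in infix form, naming atoms V0, V1, ...
std::ostream &operator<<(std::ostream &OS, const MiniFormula &F) {
  switch (F.K) {
  case MiniFormula::AtomRef:
    return OS << 'V' << F.Atom;
  case MiniFormula::Not:
    return OS << '!' << *F.LHS;
  case MiniFormula::And:
    return OS << '(' << *F.LHS << " & " << *F.RHS << ')';
  case MiniFormula::Or:
    return OS << '(' << *F.LHS << " | " << *F.RHS << ')';
  }
  return OS; // Not reached for valid kinds.
}

// Convenience wrapper for logging and test expectations.
std::string toString(const MiniFormula &F) {
  std::ostringstream OS;
  OS << F;
  return OS.str();
}
```

A type that is streamable this way can be printed by a test framework's failure messages without a separate debugString() overload per type.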

Differential Revision: https://reviews.llvm.org/D153366
sam-mccall committed Jul 4, 2023
1 parent 61bcaae commit 2fd614e
Showing 15 changed files with 587 additions and 918 deletions.
14 changes: 14 additions & 0 deletions clang/include/clang/Analysis/FlowSensitive/Arena.h
@@ -8,6 +8,7 @@
#ifndef LLVM_CLANG_ANALYSIS_FLOWSENSITIVE__ARENA_H
#define LLVM_CLANG_ANALYSIS_FLOWSENSITIVE__ARENA_H

#include "clang/Analysis/FlowSensitive/Formula.h"
#include "clang/Analysis/FlowSensitive/StorageLocation.h"
#include "clang/Analysis/FlowSensitive/Value.h"
#include <vector>
@@ -104,7 +105,17 @@ class Arena {
return create<AtomicBoolValue>();
}

/// Gets the boolean formula equivalent of a BoolValue.
/// Each unique Top value is translated to an Atom.
/// TODO: migrate to storing Formula directly in Values instead.
const Formula &getFormula(const BoolValue &);

/// Returns a new atomic boolean variable, distinct from any other.
Atom makeAtom() { return static_cast<Atom>(NextAtom++); };

private:
llvm::BumpPtrAllocator Alloc;

// Storage for the state of a program.
std::vector<std::unique_ptr<StorageLocation>> Locs;
std::vector<std::unique_ptr<Value>> Vals;
@@ -122,6 +133,9 @@ class Arena {
llvm::DenseMap<std::pair<BoolValue *, BoolValue *>, BiconditionalValue *>
BiconditionalVals;

llvm::DenseMap<const BoolValue *, const Formula *> ValToFormula;
unsigned NextAtom = 0;

AtomicBoolValue &TrueVal;
AtomicBoolValue &FalseVal;
};
44 changes: 0 additions & 44 deletions clang/include/clang/Analysis/FlowSensitive/DebugSupport.h
@@ -19,7 +19,6 @@

#include "clang/Analysis/FlowSensitive/Solver.h"
#include "clang/Analysis/FlowSensitive/Value.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/StringRef.h"

namespace clang {
@@ -28,52 +27,9 @@ namespace dataflow {
/// Returns a string representation of a value kind.
llvm::StringRef debugString(Value::Kind Kind);

/// Returns a string representation of a boolean assignment to true or false.
llvm::StringRef debugString(Solver::Result::Assignment Assignment);

/// Returns a string representation of the result status of a SAT check.
llvm::StringRef debugString(Solver::Result::Status Status);

/// Returns a string representation for the boolean value `B`.
///
/// Atomic booleans appearing in the boolean value `B` are assigned to labels
/// either specified in `AtomNames` or created by default rules as B0, B1, ...
///
/// Requirements:
///
/// Names assigned to atoms should not be repeated in `AtomNames`.
std::string debugString(
const BoolValue &B,
llvm::DenseMap<const AtomicBoolValue *, std::string> AtomNames = {{}});

/// Returns a string representation for `Constraints` - a collection of boolean
/// formulas.
///
/// Atomic booleans appearing in the boolean value `Constraints` are assigned to
/// labels either specified in `AtomNames` or created by default rules as B0,
/// B1, ...
///
/// Requirements:
///
/// Names assigned to atoms should not be repeated in `AtomNames`.
std::string debugString(
const llvm::ArrayRef<BoolValue *> Constraints,
llvm::DenseMap<const AtomicBoolValue *, std::string> AtomNames = {{}});

/// Returns a string representation for `Constraints` - a collection of boolean
/// formulas and the `Result` of satisfiability checking.
///
/// Atomic booleans appearing in `Constraints` and `Result` are assigned to
/// labels either specified in `AtomNames` or created by default rules as B0,
/// B1, ...
///
/// Requirements:
///
/// Names assigned to atoms should not be repeated in `AtomNames`.
std::string debugString(
ArrayRef<BoolValue *> Constraints, const Solver::Result &Result,
llvm::DenseMap<const AtomicBoolValue *, std::string> AtomNames = {{}});

} // namespace dataflow
} // namespace clang

137 changes: 137 additions & 0 deletions clang/include/clang/Analysis/FlowSensitive/Formula.h
@@ -0,0 +1,137 @@
//===- Formula.h - Boolean formulas -----------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_CLANG_ANALYSIS_FLOWSENSITIVE_FORMULA_H
#define LLVM_CLANG_ANALYSIS_FLOWSENSITIVE_FORMULA_H

#include "clang/Basic/LLVM.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseMapInfo.h"
#include "llvm/ADT/STLFunctionalExtras.h"
#include "llvm/Support/Allocator.h"
#include "llvm/Support/raw_ostream.h"
#include <cassert>
#include <string>
#include <type_traits>

namespace clang::dataflow {

/// Identifies an atomic boolean variable such as "V1".
///
/// This often represents an assertion that is interesting to the analysis but
/// cannot immediately be proven true or false. For example:
/// - V1 may mean "the program reaches this point",
/// - V2 may mean "the parameter was null"
///
/// We can use these variables in formulas to describe relationships we know
/// to be true: "if the parameter was null, the program reaches this point".
/// We also express hypotheses as formulas, and use a SAT solver to check
/// whether they are consistent with the known facts.
enum class Atom : unsigned {};

/// A boolean expression such as "true" or "V1 & !V2".
/// Expressions may refer to boolean atomic variables. These should take a
/// consistent true/false value across the set of formulas being considered.
///
/// (Formulas are always expressions in terms of boolean variables rather than
/// e.g. integers because our underlying model is SAT rather than e.g. SMT).
///
/// Simple formulas such as "true" and "V1" are self-contained.
/// Compound formulas connect other formulas, e.g. "(V1 & V2) || V3" is an 'or'
/// formula, with pointers to its operands "(V1 & V2)" and "V3" stored as
/// trailing objects.
/// For this reason, Formulas are Arena-allocated and over-aligned.
class Formula;
class alignas(const Formula *) Formula {
public:
enum Kind : unsigned {
/// A reference to an atomic boolean variable.
/// We name these e.g. "V3", where 3 == atom identity == Value.
AtomRef,
// FIXME: add const true/false rather than modeling them as variables

Not, /// True if its only operand is false

// These kinds connect two operands LHS and RHS
And, /// True if LHS and RHS are both true
Or, /// True if either LHS or RHS is true
Implies, /// True if LHS is false or RHS is true
Equal, /// True if LHS and RHS have the same truth value
};
Kind kind() const { return FormulaKind; }

Atom getAtom() const {
assert(kind() == AtomRef);
return static_cast<Atom>(Value);
}

ArrayRef<const Formula *> operands() const {
return ArrayRef(reinterpret_cast<Formula *const *>(this + 1),
numOperands(kind()));
}

using AtomNames = llvm::DenseMap<Atom, std::string>;
// Produce a stable human-readable representation of this formula.
// For example: (V3 | !(V1 & V2))
// If AtomNames is provided, these override the default V0, V1... names.
void print(llvm::raw_ostream &OS, const AtomNames * = nullptr) const;

// Allocate Formulas using Arena rather than calling this function directly.
static Formula &create(llvm::BumpPtrAllocator &Alloc, Kind K,
ArrayRef<const Formula *> Operands,
unsigned Value = 0);

private:
Formula() = default;
Formula(const Formula &) = delete;
Formula &operator=(const Formula &) = delete;

static unsigned numOperands(Kind K) {
switch (K) {
case AtomRef:
return 0;
case Not:
return 1;
case And:
case Or:
case Implies:
case Equal:
return 2;
}
llvm_unreachable("Unhandled Formula::Kind enum");
}

Kind FormulaKind;
// Some kinds of formula have scalar values, e.g. AtomRef's atom number.
unsigned Value;
};

// The default names of atoms are V0, V1 etc in order of creation.
inline llvm::raw_ostream &operator<<(llvm::raw_ostream &OS, Atom A) {
return OS << 'V' << static_cast<unsigned>(A);
}
inline llvm::raw_ostream &operator<<(llvm::raw_ostream &OS, const Formula &F) {
F.print(OS);
return OS;
}

} // namespace clang::dataflow
namespace llvm {
template <> struct DenseMapInfo<clang::dataflow::Atom> {
using Atom = clang::dataflow::Atom;
using Underlying = std::underlying_type_t<Atom>;

static inline Atom getEmptyKey() { return Atom(Underlying(-1)); }
static inline Atom getTombstoneKey() { return Atom(Underlying(-2)); }
static unsigned getHashValue(const Atom &Val) {
return DenseMapInfo<Underlying>::getHashValue(Underlying(Val));
}
static bool isEqual(const Atom &LHS, const Atom &RHS) { return LHS == RHS; }
};
} // namespace llvm
#endif
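
The Formula comment above says operands are stored as trailing objects and that Formula is over-aligned for this reason. The following sketch shows one way such a layout can work, under stated assumptions: Node, create, destroy and operand are hypothetical names for this illustration, and malloc/free stand in for the arena allocator. It is not the body of Formula::create, which is not part of the hunks shown here.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical node type illustrating the layout; not the patch's Formula.
// alignas makes the address just past the header suitably aligned for an
// array of pointers.
class alignas(const void *) Node {
public:
  // Allocates the header and NumOperands pointers in a single block.
  // The caller releases the block with destroy().
  static Node *create(unsigned Value, const Node *const *Operands,
                      unsigned NumOperands) {
    void *Mem = std::malloc(sizeof(Node) + NumOperands * sizeof(const Node *));
    if (!Mem)
      throw std::bad_alloc();
    Node *N = new (Mem) Node(Value, NumOperands);
    // Pointers are trivial types, so copying bytes into the tail is enough.
    if (NumOperands != 0)
      std::memcpy(static_cast<void *>(N + 1), Operands,
                  NumOperands * sizeof(const Node *));
    return N;
  }
  static void destroy(Node *N) { std::free(N); }

  unsigned value() const { return Value; }
  unsigned numOperands() const { return NumOperands; }

  // Operand I lives in the pointer array that directly follows *this.
  const Node *operand(unsigned I) const {
    assert(I < NumOperands);
    return reinterpret_cast<const Node *const *>(this + 1)[I];
  }

private:
  Node(unsigned Value, unsigned NumOperands)
      : Value(Value), NumOperands(NumOperands) {}

  unsigned Value;
  unsigned NumOperands;
};
```

The design choice is that a node and its operand pointers share one allocation, so leaf nodes pay for no operand storage at all. The sketch stores an explicit operand count, whereas the Formula class above derives it from the kind via numOperands().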
27 changes: 10 additions & 17 deletions clang/include/clang/Analysis/FlowSensitive/Solver.h
@@ -14,12 +14,10 @@
#ifndef LLVM_CLANG_ANALYSIS_FLOWSENSITIVE_SOLVER_H
#define LLVM_CLANG_ANALYSIS_FLOWSENSITIVE_SOLVER_H

#include "clang/Analysis/FlowSensitive/Value.h"
#include "clang/Analysis/FlowSensitive/Formula.h"
#include "clang/Basic/LLVM.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/Support/Compiler.h"
#include <optional>
#include <vector>

@@ -49,8 +47,7 @@ class Solver {

/// Constructs a result indicating that the queried boolean formula is
/// satisfiable. The result will hold a solution found by the solver.
static Result
Satisfiable(llvm::DenseMap<AtomicBoolValue *, Assignment> Solution) {
static Result Satisfiable(llvm::DenseMap<Atom, Assignment> Solution) {
return Result(Status::Satisfiable, std::move(Solution));
}

@@ -68,19 +65,17 @@ class Solver {

/// Returns a truth assignment to boolean values that satisfies the queried
/// boolean formula if available. Otherwise, an empty optional is returned.
std::optional<llvm::DenseMap<AtomicBoolValue *, Assignment>>
getSolution() const {
std::optional<llvm::DenseMap<Atom, Assignment>> getSolution() const {
return Solution;
}

private:
Result(
enum Status SATCheckStatus,
std::optional<llvm::DenseMap<AtomicBoolValue *, Assignment>> Solution)
Result(Status SATCheckStatus,
std::optional<llvm::DenseMap<Atom, Assignment>> Solution)
: SATCheckStatus(SATCheckStatus), Solution(std::move(Solution)) {}

Status SATCheckStatus;
std::optional<llvm::DenseMap<AtomicBoolValue *, Assignment>> Solution;
std::optional<llvm::DenseMap<Atom, Assignment>> Solution;
};

virtual ~Solver() = default;
@@ -91,14 +86,12 @@ class Solver {
/// Requirements:
///
/// All elements in `Vals` must not be null.
virtual Result solve(llvm::ArrayRef<BoolValue *> Vals) = 0;

LLVM_DEPRECATED("Pass ArrayRef for determinism", "")
virtual Result solve(llvm::DenseSet<BoolValue *> Vals) {
return solve(ArrayRef(std::vector<BoolValue *>(Vals.begin(), Vals.end())));
}
virtual Result solve(llvm::ArrayRef<const Formula *> Vals) = 0;
};

llvm::raw_ostream &operator<<(llvm::raw_ostream &, const Solver::Result &);
llvm::raw_ostream &operator<<(llvm::raw_ostream &, Solver::Result::Assignment);

} // namespace dataflow
} // namespace clang
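
With this change a solution maps atoms, rather than AtomicBoolValue pointers, to truth assignments. As a rough illustration of that interface shape, the sketch below checks a set of formulas by brute force and returns an assignment keyed by atom. Everything here (ToyAtom, ToyExpr, ToyResult, evaluate, collectAtoms, solve) is hypothetical and exists only for this example; WatchedLiteralsSolver uses a far more efficient algorithm, and this exhaustive search is only usable for a handful of atoms.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

enum class ToyAtom : unsigned {};

// Hypothetical formula node; operands are null where unused.
struct ToyExpr {
  enum Kind { AtomRef, Not, And, Or, Implies } K;
  ToyAtom A;          // Used by AtomRef only.
  const ToyExpr *LHS; // Sole operand of Not, left operand otherwise.
  const ToyExpr *RHS; // Right operand of And/Or/Implies.
};

using Assignment = std::map<ToyAtom, bool>;

struct ToyResult {
  bool Satisfiable;
  Assignment Solution; // Meaningful only when Satisfiable is true.
};

// Evaluates a formula under an assignment that covers all of its atoms.
bool evaluate(const ToyExpr &E, const Assignment &Vals) {
  switch (E.K) {
  case ToyExpr::AtomRef:
    return Vals.at(E.A);
  case ToyExpr::Not:
    return !evaluate(*E.LHS, Vals);
  case ToyExpr::And:
    return evaluate(*E.LHS, Vals) && evaluate(*E.RHS, Vals);
  case ToyExpr::Or:
    return evaluate(*E.LHS, Vals) || evaluate(*E.RHS, Vals);
  case ToyExpr::Implies:
    return !evaluate(*E.LHS, Vals) || evaluate(*E.RHS, Vals);
  }
  return false; // Not reached for valid kinds.
}

// Records every atom mentioned in E, initially assigned false.
void collectAtoms(const ToyExpr &E, Assignment &Vals) {
  if (E.K == ToyExpr::AtomRef) {
    Vals[E.A] = false;
    return;
  }
  collectAtoms(*E.LHS, Vals);
  if (E.RHS)
    collectAtoms(*E.RHS, Vals);
}

// Tries every assignment in turn: exponential in the number of atoms.
ToyResult solve(const std::vector<const ToyExpr *> &Formulas) {
  Assignment Vals;
  for (const ToyExpr *F : Formulas)
    collectAtoms(*F, Vals);
  assert(Vals.size() < 64 && "too many atoms for exhaustive search");
  const std::uint64_t Count = std::uint64_t(1) << Vals.size();
  for (std::uint64_t Bits = 0; Bits < Count; ++Bits) {
    unsigned Bit = 0;
    for (auto &Entry : Vals) {
      Entry.second = ((Bits >> Bit) & 1) != 0;
      ++Bit;
    }
    bool AllHold = true;
    for (const ToyExpr *F : Formulas)
      AllHold = AllHold && evaluate(*F, Vals);
    if (AllHold)
      return {true, Vals};
  }
  return {false, Assignment()};
}
```

For example, taking V2 as "the parameter was null" and V1 as "the program reaches this point", the known fact V2 => V1 is satisfiable on its own, while adding the hypothesis V2 & !V1 makes the set unsatisfiable, which is how a hypothesis is shown to be inconsistent with the known facts.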

clang/include/clang/Analysis/FlowSensitive/WatchedLiteralsSolver.h
@@ -14,8 +14,8 @@
#ifndef LLVM_CLANG_ANALYSIS_FLOWSENSITIVE_WATCHEDLITERALSSOLVER_H
#define LLVM_CLANG_ANALYSIS_FLOWSENSITIVE_WATCHEDLITERALSSOLVER_H

#include "clang/Analysis/FlowSensitive/Formula.h"
#include "clang/Analysis/FlowSensitive/Solver.h"
#include "clang/Analysis/FlowSensitive/Value.h"
#include "llvm/ADT/ArrayRef.h"
#include <limits>

@@ -46,7 +46,7 @@ class WatchedLiteralsSolver : public Solver {
explicit WatchedLiteralsSolver(std::int64_t WorkLimit)
: MaxIterations(WorkLimit) {}

Result solve(llvm::ArrayRef<BoolValue *> Vals) override;
Result solve(llvm::ArrayRef<const Formula *> Vals) override;
};

} // namespace dataflow
46 changes: 46 additions & 0 deletions clang/lib/Analysis/FlowSensitive/Arena.cpp
@@ -76,4 +76,50 @@ IntegerValue &Arena::makeIntLiteral(llvm::APInt Value) {
return *It->second;
}

const Formula &Arena::getFormula(const BoolValue &B) {
auto It = ValToFormula.find(&B);
if (It != ValToFormula.end())
return *It->second;
Formula &F = [&]() -> Formula & {
switch (B.getKind()) {
case Value::Kind::Integer:
case Value::Kind::Reference:
case Value::Kind::Pointer:
case Value::Kind::Struct:
llvm_unreachable("not a boolean");
case Value::Kind::TopBool:
case Value::Kind::AtomicBool:
// TODO: assign atom numbers on creation rather than in getFormula(), so
// they will be deterministic and maybe even meaningful.
return Formula::create(Alloc, Formula::AtomRef, {},
static_cast<unsigned>(makeAtom()));
case Value::Kind::Conjunction:
return Formula::create(
Alloc, Formula::And,
{&getFormula(cast<ConjunctionValue>(B).getLeftSubValue()),
&getFormula(cast<ConjunctionValue>(B).getRightSubValue())});
case Value::Kind::Disjunction:
return Formula::create(
Alloc, Formula::Or,
{&getFormula(cast<DisjunctionValue>(B).getLeftSubValue()),
&getFormula(cast<DisjunctionValue>(B).getRightSubValue())});
case Value::Kind::Negation:
return Formula::create(Alloc, Formula::Not,
{&getFormula(cast<NegationValue>(B).getSubVal())});
case Value::Kind::Implication:
return Formula::create(
Alloc, Formula::Implies,
{&getFormula(cast<ImplicationValue>(B).getLeftSubValue()),
&getFormula(cast<ImplicationValue>(B).getRightSubValue())});
case Value::Kind::Biconditional:
return Formula::create(
Alloc, Formula::Equal,
{&getFormula(cast<BiconditionalValue>(B).getLeftSubValue()),
&getFormula(cast<BiconditionalValue>(B).getRightSubValue())});
}
}();
ValToFormula.try_emplace(&B, &F);
return F;
}

} // namespace clang::dataflow
1 change: 1 addition & 0 deletions clang/lib/Analysis/FlowSensitive/CMakeLists.txt
@@ -3,6 +3,7 @@ add_clang_library(clangAnalysisFlowSensitive
ControlFlowContext.cpp
DataflowAnalysisContext.cpp
DataflowEnvironment.cpp
Formula.cpp
HTMLLogger.cpp
Logger.cpp
RecordOps.cpp