Skip to content

Commit

Permalink
[clang][dataflow] Add synthetic fields to RecordStorageLocation (#7…
Browse files Browse the repository at this point in the history
…3860)

Synthetic fields are intended to model the internal state of a class
(e.g. the value stored in a `std::optional`) without having to depend on
that class's implementation details.

Today, this is typically done with properties on `RecordValue`s, but
these have several drawbacks:

* Care must be taken to call `refreshRecordValue()` before modifying a
property so that the modified property values aren’t seen by other
environments that may have access to the same `RecordValue`.

* Properties aren’t associated with a storage location. If an analysis
needs to associate a location with the value stored in a property (e.g.
to model the reference returned by `std::optional::value()`), it needs
to manually add an indirection using a `PointerValue`. (See for example
the way this is done in UncheckedOptionalAccessModel.cpp, specifically
in `maybeInitializeOptionalValueMember()`.)

* Properties don’t participate in the builtin compare, join, and widen
operations. If an analysis needs to apply these operations to
properties, it needs to override the corresponding methods of
`ValueModel`.

* Longer-term, we plan to eliminate `RecordValue`, as by-value
operations on records aren’t really “a thing” in C++ (see
https://discourse.llvm.org/t/70086#changed-structvalue-api-14). This
would obviously eliminate the ability to set properties on
`RecordValue`s.

To demonstrate the advantages of synthetic fields, this patch converts
UncheckedOptionalAccessModel.cpp to synthetic fields. This greatly
simplifies the implementation of the check.

This PR is pretty big; to make it easier to review, I have broken it
down into a stack of three commits, each of which contains a set of
logically related changes. I considered submitting each of these as a
separate PR, but the commits only really make sense when taken together.

To review, I suggest first looking at the changes in
UncheckedOptionalAccessModel.cpp. This gives a flavor for how the
various API changes work together in the context of an analysis. Then,
review the rest of the changes.
  • Loading branch information
martinboehme committed Dec 4, 2023
1 parent 0cdacd5 commit 71f2ec2
Show file tree
Hide file tree
Showing 18 changed files with 448 additions and 384 deletions.
18 changes: 17 additions & 1 deletion clang/include/clang/Analysis/FlowSensitive/DataflowAnalysis.h
Original file line number Diff line number Diff line change
Expand Up @@ -234,6 +234,22 @@ runDataflowAnalysis(
return std::move(BlockStates);
}

// Create an analysis class that is derived from `DataflowAnalysis`. This is an
// SFINAE adapter that allows us to call two different variants of constructor
// (either with or without the optional `Environment` parameter).
// FIXME: Make all classes derived from `DataflowAnalysis` take an `Environment`
// parameter in their constructor so that we can get rid of this abomination.
template <typename AnalysisT>
auto createAnalysis(ASTContext &ASTCtx, Environment &Env)
-> decltype(AnalysisT(ASTCtx, Env)) {
return AnalysisT(ASTCtx, Env);
}
template <typename AnalysisT>
auto createAnalysis(ASTContext &ASTCtx, Environment &Env)
-> decltype(AnalysisT(ASTCtx)) {
return AnalysisT(ASTCtx);
}

/// Runs a dataflow analysis over the given function and then runs `Diagnoser`
/// over the results. Returns a list of diagnostics for `FuncDecl` or an
/// error. Currently, errors can occur (at least) because the analysis requires
Expand Down Expand Up @@ -262,7 +278,7 @@ llvm::Expected<llvm::SmallVector<Diagnostic>> diagnoseFunction(
const WatchedLiteralsSolver *Solver = OwnedSolver.get();
DataflowAnalysisContext AnalysisContext(std::move(OwnedSolver));
Environment Env(AnalysisContext, FuncDecl);
AnalysisT Analysis(ASTCtx);
AnalysisT Analysis = createAnalysis<AnalysisT>(ASTCtx, Env);
llvm::SmallVector<Diagnostic> Diagnostics;
if (llvm::Error Err =
runTypeErasedDataflowAnalysis(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,10 @@ using FieldSet = llvm::SmallSetVector<const FieldDecl *, 4>;
/// Returns the set of all fields in the type.
FieldSet getObjectFields(QualType Type);

/// Returns whether `Fields` and `FieldLocs` contain the same fields.
bool containsSameFields(const FieldSet &Fields,
const RecordStorageLocation::FieldToLoc &FieldLocs);

struct ContextSensitiveOptions {
/// The maximum depth to analyze. A value of zero is equivalent to disabling
/// context-sensitive analysis entirely.
Expand Down Expand Up @@ -92,11 +96,39 @@ class DataflowAnalysisContext {
/*Logger=*/nullptr});
~DataflowAnalysisContext();

/// Sets a callback that returns the names and types of the synthetic fields
/// to add to a `RecordStorageLocation` of a given type.
/// Typically, this is called from the constructor of a `DataflowAnalysis`
///
/// To maintain the invariant that all `RecordStorageLocation`s of a given
/// type have the same fields:
/// * The callback must always return the same result for a given type
/// * `setSyntheticFieldCallback()` must be called before any
// `RecordStorageLocation`s are created.
void setSyntheticFieldCallback(
std::function<llvm::StringMap<QualType>(QualType)> CB) {
assert(!RecordStorageLocationCreated);
SyntheticFieldCallback = CB;
}

/// Returns a new storage location appropriate for `Type`.
///
/// A null `Type` is interpreted as the pointee type of `std::nullptr_t`.
StorageLocation &createStorageLocation(QualType Type);

/// Creates a `RecordStorageLocation` for the given type and with the given
/// fields.
///
/// Requirements:
///
/// `FieldLocs` must contain exactly the fields returned by
/// `getModeledFields(Type)`.
/// `SyntheticFields` must contain exactly the fields returned by
/// `getSyntheticFields(Type)`.
RecordStorageLocation &createRecordStorageLocation(
QualType Type, RecordStorageLocation::FieldToLoc FieldLocs,
RecordStorageLocation::SyntheticFieldMap SyntheticFields);

/// Returns a stable storage location for `D`.
StorageLocation &getStableStorageLocation(const ValueDecl &D);

Expand Down Expand Up @@ -169,6 +201,15 @@ class DataflowAnalysisContext {
/// context.
FieldSet getModeledFields(QualType Type);

/// Returns the names and types of the synthetic fields for the given record
/// type.
llvm::StringMap<QualType> getSyntheticFields(QualType Type) {
assert(Type->isRecordType());
if (SyntheticFieldCallback)
return SyntheticFieldCallback(Type);
return {};
}

private:
friend class Environment;

Expand Down Expand Up @@ -250,6 +291,11 @@ class DataflowAnalysisContext {
FieldSet ModeledFields;

std::unique_ptr<Logger> LogOwner; // If created via flags.

std::function<llvm::StringMap<QualType>(QualType)> SyntheticFieldCallback;

/// Has any `RecordStorageLocation` been created yet?
bool RecordStorageLocationCreated = false;
};

} // namespace dataflow
Expand Down
25 changes: 23 additions & 2 deletions clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
Original file line number Diff line number Diff line change
Expand Up @@ -166,6 +166,15 @@ class Environment {
/// with a symbolic representation of the `this` pointee.
Environment(DataflowAnalysisContext &DACtx, const DeclContext &DeclCtx);

/// Assigns storage locations and values to all parameters, captures, global
/// variables, fields and functions referenced in the function currently being
/// analyzed.
///
/// Requirements:
///
/// The function must have a body.
void initialize();

/// Returns a new environment that is a copy of this one.
///
/// The state of the program is initially the same, but can be mutated without
Expand Down Expand Up @@ -283,7 +292,15 @@ class Environment {
/// Returns the storage location assigned to the `this` pointee in the
/// environment or null if the `this` pointee has no assigned storage location
/// in the environment.
RecordStorageLocation *getThisPointeeStorageLocation() const;
RecordStorageLocation *getThisPointeeStorageLocation() const {
return ThisPointeeLoc;
}

/// Sets the storage location assigned to the `this` pointee in the
/// environment.
void setThisPointeeStorageLocation(RecordStorageLocation &Loc) {
ThisPointeeLoc = &Loc;
}

/// Returns the location of the result object for a record-type prvalue.
///
Expand Down Expand Up @@ -367,7 +384,8 @@ class Environment {
/// storage locations and values for indirections until it finds a
/// non-pointer/non-reference type.
///
/// If `Type` is a class, struct, or union type, calls `setValue()` to
/// If `Type` is a class, struct, or union type, creates values for all
/// modeled fields (including synthetic fields) and calls `setValue()` to
/// associate the `RecordValue` with its storage location
/// (`RecordValue::getLoc()`).
///
Expand Down Expand Up @@ -570,6 +588,9 @@ class Environment {
return dyn_cast<FunctionDecl>(getDeclCtx());
}

/// Returns the size of the call stack.
size_t callStackSize() const { return CallStack.size(); }

/// Returns whether this `Environment` can be extended to analyze the given
/// `Callee` (i.e. if `pushCall` can be used), with recursion disallowed and a
/// given `MaxDepth`.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,7 @@ struct UncheckedOptionalAccessModelOptions {
class UncheckedOptionalAccessModel
: public DataflowAnalysis<UncheckedOptionalAccessModel, NoopLattice> {
public:
UncheckedOptionalAccessModel(ASTContext &Ctx);
UncheckedOptionalAccessModel(ASTContext &Ctx, dataflow::Environment &Env);

/// Returns a matcher for the optional classes covered by this model.
static ast_matchers::DeclarationMatcher optionalClassDecl();
Expand All @@ -54,17 +54,6 @@ class UncheckedOptionalAccessModel

void transfer(const CFGElement &Elt, NoopLattice &L, Environment &Env);

ComparisonResult compare(QualType Type, const Value &Val1,
const Environment &Env1, const Value &Val2,
const Environment &Env2) override;

bool merge(QualType Type, const Value &Val1, const Environment &Env1,
const Value &Val2, const Environment &Env2, Value &MergedVal,
Environment &MergedEnv) override;

Value *widen(QualType Type, Value &Prev, const Environment &PrevEnv,
Value &Current, Environment &CurrentEnv) override;

private:
CFGMatchSwitch<TransferState<NoopLattice>> TransferMatchSwitch;
};
Expand Down
17 changes: 9 additions & 8 deletions clang/include/clang/Analysis/FlowSensitive/RecordOps.h
Original file line number Diff line number Diff line change
Expand Up @@ -21,10 +21,10 @@ namespace dataflow {

/// Copies a record (struct, class, or union) from `Src` to `Dst`.
///
/// This performs a deep copy, i.e. it copies every field and recurses on
/// fields of record type. It also copies properties from the `RecordValue`
/// associated with `Src` to the `RecordValue` associated with `Dst` (if these
/// `RecordValue`s exist).
/// This performs a deep copy, i.e. it copies every field (including synthetic
/// fields) and recurses on fields of record type. It also copies properties
/// from the `RecordValue` associated with `Src` to the `RecordValue` associated
/// with `Dst` (if these `RecordValue`s exist).
///
/// If there is a `RecordValue` associated with `Dst` in the environment, this
/// function creates a new `RecordValue` and associates it with `Dst`; clients
Expand All @@ -47,10 +47,11 @@ void copyRecord(RecordStorageLocation &Src, RecordStorageLocation &Dst,
/// retrieved from `Env2`. A convenience overload retrieves values for `Loc1`
/// and `Loc2` from the same environment.
///
/// This performs a deep comparison, i.e. it compares every field and recurses
/// on fields of record type. Fields of reference type compare equal if they
/// refer to the same storage location. If `RecordValue`s are associated with
/// `Loc1` and `Loc2`, it also compares the properties on those `RecordValue`s.
/// This performs a deep comparison, i.e. it compares every field (including
/// synthetic fields) and recurses on fields of record type. Fields of reference
/// type compare equal if they refer to the same storage location. If
/// `RecordValue`s are associated with `Loc1` and Loc2`, it also compares the
/// properties on those `RecordValue`s.
///
/// Note on how to interpret the result:
/// - If this returns true, the records are guaranteed to be equal at runtime.
Expand Down
32 changes: 26 additions & 6 deletions clang/include/clang/Analysis/FlowSensitive/StorageLocation.h
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,13 @@ class ScalarStorageLocation final : public StorageLocation {
///
/// Contains storage locations for all modeled fields of the record (also
/// referred to as "children"). The child map is flat, so accessible members of
/// the base class are directly accesible as children of this location.
/// the base class are directly accessible as children of this location.
///
/// Record storage locations may also contain so-called synthetic fields. These
/// are typically used to model the internal state of a class (e.g. the value
/// stored in a `std::optional`) without having to depend on that class's
/// implementation details. All `RecordStorageLocation`s of a given type should
/// have the same synthetic fields.
///
/// The storage location for a field of reference type may be null. This
/// typically occurs in one of two situations:
Expand All @@ -88,12 +94,12 @@ class ScalarStorageLocation final : public StorageLocation {
class RecordStorageLocation final : public StorageLocation {
public:
using FieldToLoc = llvm::DenseMap<const ValueDecl *, StorageLocation *>;
using SyntheticFieldMap = llvm::StringMap<StorageLocation *>;

explicit RecordStorageLocation(QualType Type)
: RecordStorageLocation(Type, FieldToLoc()) {}

RecordStorageLocation(QualType Type, FieldToLoc TheChildren)
: StorageLocation(Kind::Record, Type), Children(std::move(TheChildren)) {
RecordStorageLocation(QualType Type, FieldToLoc TheChildren,
SyntheticFieldMap TheSyntheticFields)
: StorageLocation(Kind::Record, Type), Children(std::move(TheChildren)),
SyntheticFields(std::move(TheSyntheticFields)) {
assert(!Type.isNull());
assert(Type->isRecordType());
assert([this] {
Expand Down Expand Up @@ -133,6 +139,19 @@ class RecordStorageLocation final : public StorageLocation {
return It->second;
}

/// Returns the storage location for the synthetic field `Name`.
/// The synthetic field must exist.
StorageLocation &getSyntheticField(llvm::StringRef Name) const {
StorageLocation *Loc = SyntheticFields.lookup(Name);
assert(Loc != nullptr);
return *Loc;
}

llvm::iterator_range<SyntheticFieldMap::const_iterator>
synthetic_fields() const {
return {SyntheticFields.begin(), SyntheticFields.end()};
}

/// Changes the child storage location for a field `D` of reference type.
/// All other fields cannot change their storage location and always retain
/// the storage location passed to the `RecordStorageLocation` constructor.
Expand All @@ -151,6 +170,7 @@ class RecordStorageLocation final : public StorageLocation {

private:
FieldToLoc Children;
SyntheticFieldMap SyntheticFields;
};

} // namespace dataflow
Expand Down
40 changes: 39 additions & 1 deletion clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -68,11 +68,38 @@ StorageLocation &DataflowAnalysisContext::createStorageLocation(QualType Type) {
else
FieldLocs.insert({Field, &createStorageLocation(
Field->getType().getNonReferenceType())});
return arena().create<RecordStorageLocation>(Type, std::move(FieldLocs));

RecordStorageLocation::SyntheticFieldMap SyntheticFields;
for (const auto &Entry : getSyntheticFields(Type))
SyntheticFields.insert(
{Entry.getKey(),
&createStorageLocation(Entry.getValue().getNonReferenceType())});

return createRecordStorageLocation(Type, std::move(FieldLocs),
std::move(SyntheticFields));
}
return arena().create<ScalarStorageLocation>(Type);
}

// Returns the keys for a given `StringMap`.
// Can't use `StringSet` as the return type as it doesn't support `operator==`.
template <typename T>
static llvm::DenseSet<llvm::StringRef> getKeys(const llvm::StringMap<T> &Map) {
return llvm::DenseSet<llvm::StringRef>(Map.keys().begin(), Map.keys().end());
}

RecordStorageLocation &DataflowAnalysisContext::createRecordStorageLocation(
QualType Type, RecordStorageLocation::FieldToLoc FieldLocs,
RecordStorageLocation::SyntheticFieldMap SyntheticFields) {
assert(Type->isRecordType());
assert(containsSameFields(getModeledFields(Type), FieldLocs));
assert(getKeys(getSyntheticFields(Type)) == getKeys(SyntheticFields));

RecordStorageLocationCreated = true;
return arena().create<RecordStorageLocation>(Type, std::move(FieldLocs),
std::move(SyntheticFields));
}

StorageLocation &
DataflowAnalysisContext::getStableStorageLocation(const ValueDecl &D) {
if (auto *Loc = DeclToLoc.lookup(&D))
Expand Down Expand Up @@ -367,3 +394,14 @@ clang::dataflow::FieldSet clang::dataflow::getObjectFields(QualType Type) {
getFieldsFromClassHierarchy(Type, Fields);
return Fields;
}

bool clang::dataflow::containsSameFields(
const clang::dataflow::FieldSet &Fields,
const clang::dataflow::RecordStorageLocation::FieldToLoc &FieldLocs) {
if (Fields.size() != FieldLocs.size())
return false;
for ([[maybe_unused]] auto [Field, Loc] : FieldLocs)
if (!Fields.contains(cast_or_null<FieldDecl>(Field)))
return false;
return true;
}

0 comments on commit 71f2ec2

Please sign in to comment.