Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang][dataflow] Fix an issue with Environment::getResultObjectLocation(). #75483

Merged
merged 4 commits into from
Dec 18, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
33 changes: 8 additions & 25 deletions clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h
Original file line number Diff line number Diff line change
Expand Up @@ -325,7 +325,8 @@ class Environment {
///
/// Requirements:
/// `E` must be a prvalue of record type.
RecordStorageLocation &getResultObjectLocation(const Expr &RecordPRValue);
RecordStorageLocation &
getResultObjectLocation(const Expr &RecordPRValue) const;

/// Returns the return value of the current function. This can be null if:
/// - The function has a void return type
Expand Down Expand Up @@ -434,24 +435,14 @@ class Environment {

/// Assigns `Val` as the value of the prvalue `E` in the environment.
///
/// If `E` is not yet associated with a storage location, associates it with
/// a newly created storage location. In any case, associates the storage
/// location of `E` with `Val`.
///
/// Once the migration to strict handling of value categories is complete
/// (see https://discourse.llvm.org/t/70086), this function will be renamed to
/// `setValue()`. At this point, prvalue expressions will be associated
/// directly with `Value`s, and the legacy behavior of associating prvalue
/// expressions with storage locations (as described above) will be
/// eliminated.
///
/// Requirements:
///
/// `E` must be a prvalue
/// If `Val` is a `RecordValue`, its `RecordStorageLocation` must be the
/// same as that of any `RecordValue` that has already been associated with
/// `E`. This is to guarantee that the result object initialized by a prvalue
/// `RecordValue` has a durable storage location.
/// - `E` must be a prvalue
/// - If `Val` is a `RecordValue`, its `RecordStorageLocation` must be
/// `getResultObjectLocation(E)`. An exception to this is if `E` is an
/// expression that originally creates a `RecordValue` (such as a
/// `CXXConstructExpr` or `CallExpr`), as these establish the location of
/// the result object in the first place.
void setValue(const Expr &E, Value &Val);

/// Returns the value assigned to `Loc` in the environment or null if `Loc`
Expand Down Expand Up @@ -608,14 +599,6 @@ class Environment {
// The copy-constructor is for use in fork() only.
Environment(const Environment &) = default;

/// Internal version of `setStorageLocation()` that doesn't check if the
/// expression is a prvalue.
void setStorageLocationInternal(const Expr &E, StorageLocation &Loc);

/// Internal version of `getStorageLocation()` that doesn't check if the
/// expression is a prvalue.
StorageLocation *getStorageLocationInternal(const Expr &E) const;

/// Creates a value appropriate for `Type`, if `Type` is supported, otherwise
/// return null.
///
Expand Down
79 changes: 58 additions & 21 deletions clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -726,27 +726,70 @@ void Environment::setStorageLocation(const Expr &E, StorageLocation &Loc) {
// so allow these as an exception.
assert(E.isGLValue() ||
E.getType()->isSpecificBuiltinType(BuiltinType::BuiltinFn));
setStorageLocationInternal(E, Loc);
const Expr &CanonE = ignoreCFGOmittedNodes(E);
assert(!ExprToLoc.contains(&CanonE));
ExprToLoc[&CanonE] = &Loc;
}

StorageLocation *Environment::getStorageLocation(const Expr &E) const {
// See comment in `setStorageLocation()`.
assert(E.isGLValue() ||
E.getType()->isSpecificBuiltinType(BuiltinType::BuiltinFn));
return getStorageLocationInternal(E);
auto It = ExprToLoc.find(&ignoreCFGOmittedNodes(E));
return It == ExprToLoc.end() ? nullptr : &*It->second;
}

// Returns whether a prvalue of record type is the one that originally
// constructs the object (i.e. it doesn't propagate it from one of its
// children).
static bool isOriginalRecordConstructor(const Expr &RecordPRValue) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether it is better to enumerate the "transparent" AST nodes where we need to do the propagation.
My questions are:

  • Roughly how many such transparent nodes do we have?
  • In case we miss something, is it less desirable to erroneously propagate something or to erroneously create a new PRValue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether it is better to enumerate the "transparent" AST nodes where we need to do the propagation. My questions are:

  • Roughly how many such transparent nodes do we have?

This function in the Crubit lifetime analysis gives a rough idea. It does essentially what the FIXME in the documentation for getResultObjectLocation() says we eventually want to do also in the framework.

From this, we see that there are more "transparent" nodes than "is original record constructor" nodes. Also, the "is original record constructor" nodes are the ones for which we know that the framework produces a RecordValue -- so it seemed to make more sense to enumerate those.

  • In case we miss something, is it less desirable to erroneously propagate something or to erroneously create a new PRValue?

I think it's a wash. In both cases:

  • We will hit one of the assertions (or will perform the fallback behavior in non-assertion compiles)
  • We will model the behavior of the code incorrectly. I don't think one of the two variants of wrong behavior will clearly be more benign in all cases -- I think there are some cases where we would prefer one variant and some cases where we would prefer the other, but of course ultimately we should fix such errors anyway.

if (auto *Init = dyn_cast<InitListExpr>(&RecordPRValue))
return !Init->isSemanticForm() || !Init->isTransparent();
return isa<CXXConstructExpr>(RecordPRValue) || isa<CallExpr>(RecordPRValue) ||
isa<LambdaExpr>(RecordPRValue) ||
// The framework currently does not propagate the objects created in
// the two branches of a `ConditionalOperator` because there is no way
// to reconcile their storage locations, which are different. We
// therefore claim that the `ConditionalOperator` is the expression
// that originally constructs the object.
// Ultimately, this will be fixed by propagating locations down from
// the result object, rather than up from the original constructor as
// we do now (see also the FIXME in the documentation for
// `getResultObjectLocation()`).
isa<ConditionalOperator>(RecordPRValue);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this deserve a FIXME?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't add one originally because there's already an extensive FIXME in the documentation for getResultObjectLocation() that describes this in more detail, but your comment made me realize we should refer to that FIXME here. Done.

}

RecordStorageLocation &
Environment::getResultObjectLocation(const Expr &RecordPRValue) {
Environment::getResultObjectLocation(const Expr &RecordPRValue) const {
assert(RecordPRValue.getType()->isRecordType());
assert(RecordPRValue.isPRValue());

if (StorageLocation *ExistingLoc = getStorageLocationInternal(RecordPRValue))
return *cast<RecordStorageLocation>(ExistingLoc);
auto &Loc = cast<RecordStorageLocation>(
DACtx->getStableStorageLocation(RecordPRValue));
setStorageLocationInternal(RecordPRValue, Loc);
return Loc;
// Returns a storage location that we can use if assertions fail.
auto FallbackForAssertFailure =
[this, &RecordPRValue]() -> RecordStorageLocation & {
return cast<RecordStorageLocation>(
DACtx->getStableStorageLocation(RecordPRValue));
};

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the special handling of assertion failures? We don't typically do this, IIUC?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what's unusual about this code is that it's handling a lot of expression node kinds generically.

I think I've enumerated all the "is original record constructor" nodes that we need, but it felt sensible to put some kind of fallback in place in case there is some node kind I haven't discovered that doesn't have any children and would therefore return an invalid reference if we didn't have this fallback.

if (isOriginalRecordConstructor(RecordPRValue)) {
auto *Val = cast_or_null<RecordValue>(getValue(RecordPRValue));
// The builtin transfer function should have created a `RecordValue` for all
// original record constructors.
assert(Val);
if (!Val)
return FallbackForAssertFailure();
return Val->getLoc();
}

// Expression nodes that propagate a record prvalue should have exactly one
// child.
llvm::SmallVector<const Stmt *> children(RecordPRValue.child_begin(),
RecordPRValue.child_end());
assert(children.size() == 1);
if (children.empty())
return FallbackForAssertFailure();

return getResultObjectLocation(*cast<Expr>(children[0]));
}

PointerValue &Environment::getOrCreateNullPointerValue(QualType PointeeType) {
Expand All @@ -760,6 +803,11 @@ void Environment::setValue(const StorageLocation &Loc, Value &Val) {
}

void Environment::setValue(const Expr &E, Value &Val) {
if (auto *RecordVal = dyn_cast<RecordValue>(&Val)) {
assert(isOriginalRecordConstructor(E) ||
&RecordVal->getLoc() == &getResultObjectLocation(E));
}

assert(E.isPRValue());
ExprToVal[&E] = &Val;
}
Expand Down Expand Up @@ -799,18 +847,6 @@ Value *Environment::createValue(QualType Type) {
return Val;
}

void Environment::setStorageLocationInternal(const Expr &E,
StorageLocation &Loc) {
const Expr &CanonE = ignoreCFGOmittedNodes(E);
assert(!ExprToLoc.contains(&CanonE));
ExprToLoc[&CanonE] = &Loc;
}

StorageLocation *Environment::getStorageLocationInternal(const Expr &E) const {
auto It = ExprToLoc.find(&ignoreCFGOmittedNodes(E));
return It == ExprToLoc.end() ? nullptr : &*It->second;
}

Value *Environment::createValueUnlessSelfReferential(
QualType Type, llvm::DenseSet<QualType> &Visited, int Depth,
int &CreatedValuesCount) {
Expand Down Expand Up @@ -1044,6 +1080,7 @@ RecordValue &refreshRecordValue(const Expr &Expr, Environment &Env) {
if (auto *ExistingVal = cast_or_null<RecordValue>(Env.getValue(Expr))) {
auto &NewVal = Env.create<RecordValue>(ExistingVal->getLoc());
Env.setValue(Expr, NewVal);
Env.setValue(NewVal.getLoc(), NewVal);
return NewVal;
}

Expand Down
9 changes: 8 additions & 1 deletion clang/lib/Analysis/FlowSensitive/Transfer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -489,7 +489,6 @@ class TransferVisitor : public ConstStmtVisitor<TransferVisitor> {
if (S->getType()->isRecordType()) {
auto &InitialVal = *cast<RecordValue>(Env.createValue(S->getType()));
Env.setValue(*S, InitialVal);
copyRecord(InitialVal.getLoc(), Env.getResultObjectLocation(*S), Env);
}

transferInlineCall(S, ConstructorDecl);
Expand Down Expand Up @@ -582,6 +581,14 @@ class TransferVisitor : public ConstStmtVisitor<TransferVisitor> {
Env.setValue(*S, *ArgVal);
} else if (const FunctionDecl *F = S->getDirectCallee()) {
transferInlineCall(S, F);

// If this call produces a prvalue of record type, make sure that we have
// a `RecordValue` for it. This is required so that
// `Environment::getResultObjectLocation()` is able to return a location
// for this `CallExpr`.
if (S->getType()->isRecordType() && S->isPRValue())
if (Env.getValue(*S) == nullptr)
refreshRecordValue(*S, Env);
}
}

Expand Down
49 changes: 49 additions & 0 deletions clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2635,6 +2635,55 @@ TEST(TransferTest, BindTemporary) {
});
}

TEST(TransferTest, ResultObjectLocation) {
std::string Code = R"(
struct A {
virtual ~A() = default;
};

void target() {
A();
(void)0; // [[p]]
}
)";
using ast_matchers::cxxBindTemporaryExpr;
using ast_matchers::cxxTemporaryObjectExpr;
using ast_matchers::exprWithCleanups;
using ast_matchers::has;
using ast_matchers::match;
using ast_matchers::selectFirst;
using ast_matchers::traverse;
runDataflow(
Code,
[](const llvm::StringMap<DataflowAnalysisState<NoopLattice>> &Results,
ASTContext &ASTCtx) {
const Environment &Env = getEnvironmentAtAnnotation(Results, "p");

// The expresssion `A()` in the code above produces the following
// structure, consisting of three prvalues of record type.
// `Env.getResultObjectLocation()` should return the same location for
// all of these.
auto MatchResult = match(
traverse(TK_AsIs,
exprWithCleanups(
has(cxxBindTemporaryExpr(
has(cxxTemporaryObjectExpr().bind("toe")))
.bind("bte")))
.bind("ewc")),
ASTCtx);
auto *TOE = selectFirst<CXXTemporaryObjectExpr>("toe", MatchResult);
ASSERT_NE(TOE, nullptr);
auto *EWC = selectFirst<ExprWithCleanups>("ewc", MatchResult);
ASSERT_NE(EWC, nullptr);
auto *BTE = selectFirst<CXXBindTemporaryExpr>("bte", MatchResult);
ASSERT_NE(BTE, nullptr);

RecordStorageLocation &Loc = Env.getResultObjectLocation(*TOE);
EXPECT_EQ(&Loc, &Env.getResultObjectLocation(*EWC));
EXPECT_EQ(&Loc, &Env.getResultObjectLocation(*BTE));
});
}

TEST(TransferTest, StaticCast) {
std::string Code = R"(
void target(int Foo) {
Expand Down