-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[clang][dataflow] Strengthen pointer comparison. #75170
Conversation
@llvm/pr-subscribers-clang-analysis @llvm/pr-subscribers-clang Author: None (martinboehme) Changes
On an internal codebase, this change reduces SAT solver timeouts by over 20% and @Xazax-hun inspired me to think about this with his The one thing where the new code currently does the wrong thing is when I propose that this is acceptable because is unlikely to affect the behavior of With this change, the test TransferTest.DifferentReferenceLocInJoin started to Full diff: https://github.com/llvm/llvm-project/pull/75170.diff 3 Files Affected:
diff --git a/clang/lib/Analysis/FlowSensitive/Transfer.cpp b/clang/lib/Analysis/FlowSensitive/Transfer.cpp
index bbf5f12359bc70..55db15e03b39cb 100644
--- a/clang/lib/Analysis/FlowSensitive/Transfer.cpp
+++ b/clang/lib/Analysis/FlowSensitive/Transfer.cpp
@@ -60,6 +60,11 @@ static BoolValue &evaluateBooleanEquality(const Expr &LHS, const Expr &RHS,
if (auto *RHSBool = dyn_cast_or_null<BoolValue>(RHSValue))
return Env.makeIff(*LHSBool, *RHSBool);
+ if (auto *LHSPtr = dyn_cast_or_null<PointerValue>(LHSValue))
+ if (auto *RHSPtr = dyn_cast_or_null<PointerValue>(RHSValue))
+ return Env.getBoolLiteralValue(&LHSPtr->getPointeeLoc() ==
+ &RHSPtr->getPointeeLoc());
+
return Env.makeAtomicBoolValue();
}
diff --git a/clang/unittests/Analysis/FlowSensitive/DataflowEnvironmentTest.cpp b/clang/unittests/Analysis/FlowSensitive/DataflowEnvironmentTest.cpp
index 003434a58b1075..de18f7de02c12d 100644
--- a/clang/unittests/Analysis/FlowSensitive/DataflowEnvironmentTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/DataflowEnvironmentTest.cpp
@@ -24,6 +24,7 @@ namespace {
using namespace clang;
using namespace dataflow;
+using ::clang::dataflow::test::findValueDecl;
using ::clang::dataflow::test::getFieldValue;
using ::testing::Contains;
using ::testing::IsNull;
@@ -152,6 +153,47 @@ TEST_F(EnvironmentTest, JoinRecords) {
}
}
+TEST_F(EnvironmentTest, DifferentReferenceLocInJoin) {
+ // This tests the case where the storage location for a reference-type
+ // variable is different for two states being joined. We used to believe this
+ // could not happen and therefore had an assertion disallowing this; this test
+ // exists to demonstrate that we can handle this condition without a failing
+ // assertion. See also the discussion here:
+ // https://discourse.llvm.org/t/70086/6
+
+ using namespace ast_matchers;
+
+ std::string Code = R"cc(
+ void f(int &ref) {}
+ )cc";
+
+ auto Unit =
+ tooling::buildASTFromCodeWithArgs(Code, {"-fsyntax-only", "-std=c++11"});
+ auto &Context = Unit->getASTContext();
+
+ ASSERT_EQ(Context.getDiagnostics().getClient()->getNumErrors(), 0U);
+
+ const ValueDecl *Ref = findValueDecl(Context, "ref");
+
+ Environment Env1(DAContext);
+ StorageLocation &Loc1 = Env1.createStorageLocation(Context.IntTy);
+ Env1.setStorageLocation(*Ref, Loc1);
+
+ Environment Env2(DAContext);
+ StorageLocation &Loc2 = Env2.createStorageLocation(Context.IntTy);
+ Env2.setStorageLocation(*Ref, Loc2);
+
+ EXPECT_NE(&Loc1, &Loc2);
+
+ Environment::ValueModel Model;
+ Environment EnvJoined = Environment::join(Env1, Env2, Model);
+
+ // Joining environments with different storage locations for the same
+ // declaration results in the declaration being removed from the joined
+ // environment.
+ EXPECT_EQ(EnvJoined.getStorageLocation(*Ref), nullptr);
+}
+
TEST_F(EnvironmentTest, InitGlobalVarsFun) {
using namespace ast_matchers;
diff --git a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
index 8da55953a32986..a4b47f25f36908 100644
--- a/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
+++ b/clang/unittests/Analysis/FlowSensitive/TransferTest.cpp
@@ -3893,6 +3893,66 @@ TEST(TransferTest, BooleanInequality) {
});
}
+TEST(TransferTest, PointerEquality) {
+ std::string Code = R"(
+ void target() {
+ int i = 0;
+ int i_other = 0;
+ int *p1 = &i;
+ int *p2 = &i;
+ int *p_other = &i_other;
+ int *null = nullptr;
+
+ bool p1_eq_p1 = (p1 == p1);
+ bool p1_eq_p2 = (p1 == p2);
+ bool p1_eq_p_other = (p1 == p_other);
+
+ bool p1_eq_null = (p1 == null);
+ bool p1_eq_nullptr = (p1 == nullptr);
+ bool null_eq_nullptr = (null == nullptr);
+ bool nullptr_eq_nullptr = (nullptr == nullptr);
+
+ // We won't duplicate all of the tests above with `!=`, as we know that
+ // the implementation simply negates the result of the `==` comparison.
+ // Instaed, just spot-check one case.
+ bool p1_ne_p_other = (p1 != p_other);
+
+ (void)0; // [[p]]
+ }
+ )";
+ runDataflow(
+ Code,
+ [](const llvm::StringMap<DataflowAnalysisState<NoopLattice>> &Results,
+ ASTContext &ASTCtx) {
+ const Environment &Env = getEnvironmentAtAnnotation(Results, "p");
+
+ // Check the we have indeed set things up so that `p1` and `p2` have
+ // different pointer values.
+ EXPECT_NE(&getValueForDecl<PointerValue>(ASTCtx, Env, "p1"),
+ &getValueForDecl<PointerValue>(ASTCtx, Env, "p2"));
+
+ EXPECT_EQ(&getValueForDecl<BoolValue>(ASTCtx, Env, "p1_eq_p1"),
+ &Env.getBoolLiteralValue(true));
+ EXPECT_EQ(&getValueForDecl<BoolValue>(ASTCtx, Env, "p1_eq_p2"),
+ &Env.getBoolLiteralValue(true));
+ EXPECT_EQ(&getValueForDecl<BoolValue>(ASTCtx, Env, "p1_eq_p_other"),
+ &Env.getBoolLiteralValue(false));
+
+ EXPECT_EQ(&getValueForDecl<BoolValue>(ASTCtx, Env, "p1_eq_null"),
+ &Env.getBoolLiteralValue(false));
+ EXPECT_EQ(&getValueForDecl<BoolValue>(ASTCtx, Env, "p1_eq_nullptr"),
+ &Env.getBoolLiteralValue(false));
+ EXPECT_EQ(&getValueForDecl<BoolValue>(ASTCtx, Env, "null_eq_nullptr"),
+ &Env.getBoolLiteralValue(true));
+ EXPECT_EQ(
+ &getValueForDecl<BoolValue>(ASTCtx, Env, "nullptr_eq_nullptr"),
+ &Env.getBoolLiteralValue(true));
+
+ EXPECT_EQ(&getValueForDecl<BoolValue>(ASTCtx, Env, "p1_ne_p_other"),
+ &Env.getBoolLiteralValue(true));
+ });
+}
+
TEST(TransferTest, IntegerLiteralEquality) {
std::string Code = R"(
void target() {
@@ -6315,48 +6375,4 @@ TEST(TransferTest, LambdaCaptureThis) {
});
}
-TEST(TransferTest, DifferentReferenceLocInJoin) {
- // This test triggers a case where the storage location for a reference-type
- // variable is different for two states being joined. We used to believe this
- // could not happen and therefore had an assertion disallowing this; this test
- // exists to demonstrate that we can handle this condition without a failing
- // assertion. See also the discussion here:
- // https://discourse.llvm.org/t/70086/6
- std::string Code = R"(
- namespace std {
- template <class T> struct initializer_list {
- const T* begin();
- const T* end();
- };
- }
-
- void target(char* p, char* end) {
- while (p != end) {
- if (*p == ' ') {
- p++;
- continue;
- }
-
- auto && range = {1, 2};
- for (auto b = range.begin(), e = range.end(); b != e; ++b) {
- }
- (void)0;
- // [[p]]
- }
- }
- )";
- runDataflow(
- Code,
- [](const llvm::StringMap<DataflowAnalysisState<NoopLattice>> &Results,
- ASTContext &ASTCtx) {
- const Environment &Env = getEnvironmentAtAnnotation(Results, "p");
-
- // Joining environments with different storage locations for the same
- // declaration results in the declaration being removed from the joined
- // environment.
- const ValueDecl *VD = findValueDecl(ASTCtx, "range");
- ASSERT_EQ(Env.getStorageLocation(*VD), nullptr);
- });
-}
-
} // namespace
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a massive improvement! Thanks! Looks great to me, I have some not too important comments inline.
@@ -152,6 +153,47 @@ TEST_F(EnvironmentTest, JoinRecords) { | |||
} | |||
} | |||
|
|||
TEST_F(EnvironmentTest, DifferentReferenceLocInJoin) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While running analysis on C++ code is indirect, it has the advantage of documenting what code can trigger such behavior. I think having direct tests is fine, I just wanted to make this decision explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. The original test case was minimized from a failing assertion, but it would likely be fiddly to try and recreate a test case that triggers this again. Testing directly also has the benefit of being less brittle, so I think it's an acceptable tradeoff in this case.
@@ -3893,6 +3893,66 @@ TEST(TransferTest, BooleanInequality) { | |||
}); | |||
} | |||
|
|||
TEST(TransferTest, PointerEquality) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be great to add a test case with union members to document the limitation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point -- done.
An alternative approach would be to do hash consing. Since pointer values are completely determined by the pointee (for some definition of the pointee, e.g., we might have That being said, this would be a bit of a departure from the current architecture, so I think even if we want to do this, it should be a separate PR. |
270172e
to
dd4aa8b
Compare
Thanks, these are good points. I think we may want to revisit multiple aspects of pointer values and storage locations at some point. For example, storage locations always have a type, and so storage locations of different types necessarily compare different. For example: struct Base {};
struct Derived : public Base {};
Derived d;
Derived *p_derived = &d;
Base *p_base = d;
// `p_derived == p_base` should hold, but the framework assumes `p_derived != p_base`. It's probably a good thing for storage locations to have types (because we want to be able to model the dynamic type of an object), but this probably means that we want to model addresses separately from storage locations. This would then also help in other cases, such as union members and pointer interconvertibility. But all of this would be a major undertaking, I think. |
Our knowledge of the dynamic type might change during the analysis. Consider the following snippet:
So, we either need to mutate storage locations, or we need to store the dynamic types on the side. The Clang Static Analyzer (CSA) is doing the latter. While CSA currently has types associated with the storage locations, we are considering undoing this decision and make storage locations only represented by sizes and offsets. We plan to always rely on the AST to determine the static types, and use a data structure on the side to track the dynamic type information. For some frameworks having typed storage locations might be a good idea, it is not obvious to me where is the sweet spot and it will need careful consideration. (And as far as I understand, we are not even sure whether we want to switch to a different representation like access paths in the future.)
Absolutely agree. The Clang Static Analyzer still has problems with type casts in some cases, it is a really challenging problem. That being said, the CSA is already immensely useful even with the partial modeling it has, so this is probably one of the less pressing problems to solve in the near/medium term. |
Heads up: I'm holding off on merging this patch because it makes some tests in the Crubit nullability analysis fail. I'm still investigating exactly why this is. |
Finally getting back to this. The reason this was causing tests to fail is really pretty obvious in hindsight: Different Here's an example: void f(int *p1, int *p2) {
if (p1 == p2) {
do_something();
}
} The framework initializes In summary, if the storage locations for two pointers are the same, we can return a true literal for the comparison, but if the storage locations are different, we need to return an atom. I'll change the PR accordingly and will let you know once it's ready for another look. |
- Instead of comparing the identity of the `PointerValue`s, compare the underlying `StorageLocation`s. - If the `StorageLocation`s are the same, return a definite "true" as the result of the comparison. Before, if the `PointerValue`s were different, we would return an atom, even if the storage locations themselves were the same. - If the `StorageLocation`s are different, return an atom (as before). Pointers that have different storage locations may still alias, so we can't return a definite "false" in this case. The application-level gains from this are relatively modest. For the Crubit nullability check running on an internal codebase, this change reduces the number of functions on which the SAT solver times out from 223 to 221; the number of "pointer expression not modeled" errors reduces from 3815 to 3778. Still, it seems that the gain in precision is generally worthwhile. @Xazax-hun inspired me to think about this with his [comments](llvm#73860 (review)) on a different PR.
dd4aa8b
to
0ec8426
Compare
Sorry, this completely fell off my radar. I've finally revisited this now, and the PR is ready for review again. PTAL. |
I am wondering, if it would make sense to "fix up" the state when we discover aliasing? I.e., "merging" the two storage locations. Consider:
|
This sounds great -- but I think getting this to work wouldn't be trivial. At least, I don't yet have any good ideas on how to make it work. I think we'd have to do something like this:
Would require some more thought and definitely quite a bit of effort to implement, I think. |
CI failure (DataFlowSanitizer-x86_64 :: release_shadow_space.c) looks unrelated. |
See the TODOs that have now been fixed. The dataflow framework now considers pointers with the same storage location to compare equal (see upstream [#75170](llvm/llvm-project#75170), and this allows us to directly compare pointers in the tests. PiperOrigin-RevId: 635791000 Change-Id: I1351d76eee76668b2a3be0efbce829c8c65874ee
Instead of comparing the identity of the
PointerValue
s, compare theunderlying
StorageLocation
s.If the
StorageLocation
s are the same, return a definite "true" as theresult of the comparison. Before, if the
PointerValue
s were different, wewould return an atom, even if the storage locations themselves were the same.
If the
StorageLocation
s are different, return an atom (as before). Pointersthat have different storage locations may still alias, so we can't return a
definite "false" in this case.
The application-level gains from this are relatively modest. For the Crubit
nullability check running on an internal codebase, this change reduces the
number of functions on which the SAT solver times out from 223 to 221; the
number of "pointer expression not modeled" errors reduces from 3815 to 3778.
Still, it seems that the gain in precision is generally worthwhile.
@Xazax-hun inspired me to think about this with his
comments
on a different PR.