-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[analyzer] CStringChecker: do not branch assuming the arguments of str(n)cmp
are equal
#161644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…tr(n)cmp` are equal
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-static-analyzer-1 Author: None (guillem-bartrina-sonarsource) ChangesUpon running into This behavior is quite counterintuitive, as well as noisy, and differs from that of the similar function Note that the behavior I am modifying was not tested at all in the tests, as none broke due to the change. Full diff: https://github.com/llvm/llvm-project/pull/161644.diff 2 Files Affected:
diff --git a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
index 0ae784c000f60..c15464e9a713f 100644
--- a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
@@ -2418,15 +2418,12 @@ void CStringChecker::evalStrcmpCommon(CheckerContext &C, const CallEvent &Call,
// If the two arguments might be the same buffer, we know the result is 0,
// and we only need to check one size.
- if (StSameBuf) {
+ if (StSameBuf && !StNotSameBuf) {
StSameBuf =
StSameBuf->BindExpr(Call.getOriginExpr(), LCtx,
svalBuilder.makeZeroVal(Call.getResultType()));
C.addTransition(StSameBuf);
-
- // If the two arguments are GUARANTEED to be the same, we're done!
- if (!StNotSameBuf)
- return;
+ return;
}
assert(StNotSameBuf);
diff --git a/clang/test/Analysis/string.c b/clang/test/Analysis/string.c
index cdd36275568e3..45b81297a026e 100644
--- a/clang/test/Analysis/string.c
+++ b/clang/test/Analysis/string.c
@@ -955,6 +955,14 @@ void strcmp_fn_l(char *x) {
strcmp((char*)&strcmp_null_argument, x); // expected-warning{{Argument to string comparison function is the address of the function 'strcmp_null_argument', which is not a null-terminated string}}
}
+void strcmp_assumptions(char *a, char *b) {
+ if (strcmp(a, b) == 0) {
+ // ...
+ }
+
+ clang_analyzer_eval(a == b); // expected-warning{{FALSE}}
+}
+
//===----------------------------------------------------------------------===
// strncmp()
//===----------------------------------------------------------------------===
@@ -1070,6 +1078,14 @@ int strncmp_null_argument(char *a, size_t n) {
return strncmp(a, b, n); // expected-warning{{Null pointer passed as 2nd argument to string comparison function}}
}
+void strncmp_assumptions(char *a, char *b, size_t n) {
+ if (strncmp(a, b, n) == 0) {
+ // ...
+ }
+
+ clang_analyzer_eval(a == b); // expected-warning{{FALSE}}
+}
+
//===----------------------------------------------------------------------===
// strcasecmp()
//===----------------------------------------------------------------------===
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that the old behavior had problems, but the new code also has a problematic aspect: in the case when the pointers may or may not be same (that is, when StSameBuf && StNotSameBuf
is true) it always assumes that the pointers are not equal (because it assigns state = StNotSameBuf;
in the line just below the diff context displayed by github).
To test this, you should replace the // ...
placeholders in your new tests with clang_analyzer_eval(a == b); // expected-warning{{UNKNOWN}}
. Ideally these checks should pass (if the strings are identical, the pointers may or may not be equal) but with your code they will produce FALSE (because the StSameBuf
branch is silently discarded).
To avoid losing the "pointers may be equal" case you should remove the statement state = StNotSameBuf;
. Unfortunately, deleting this statement also has a serious side effect: after it clang_analyzer_eval(a == b);
would also be UNKNOWN
on the branch where strcmp
is nonzero. (However, this side effect is just "not recording knowledge" which is IMO less problematic than "arbitrarily discarding a case that would be possible".)
This side effect can be easily mitigated in the case when the analyzer can evaluate the comparison of known strings (just re-add state = StNotSameBuf;
on the branch where the checker records the assumption that the result is GT or LT compared to zero).
Mitigating the side effect is also possible in the case when the analyzer cannot determine the comparison result (which includes e.g. the tests added in this PR): to do this you should introduce the following steps (on the branch where the analyzer didn't determine the comparison result):
- call
auto [StResultEqual, StResultNonEqual] = state->assume(resultVal)
(whereresultVal
is the freshly conjured otherwise unconstrained result value)- these states will be both non-null because the value was unconstrained;
- add a transition to
StResultEqual
(i.e. the string values equal, pointers may or may not be equal); - starting from
StResultNotEqual
(when the string values differ) assume that the two pointers are different, then add a second transition to the resulting state.
This second mitigation introduces a state split, but IMO this is a much more "benign" state split than the one that was introduced by the old code. Separating "strcmp
returns zero" and "strcmp
returns nonzero" is a reasonable after a call to strcmp
(in fact, this is very similar to the eagerly-assume
heuristic, which works correctly and is enabled by default) -- while the old "pointers are equal" and "pointers are different" state split was indeed a surprising and unjustified consequence of calling strcmp
.
I admit that these suggestions are a bit complex, feel free to ask for details if you don't understand something...
Upon running into
str(n)cmp
with unknown symbolic arguments,CStringChecker
forks the analysis assuming that the two arguments are EQUAL in one branch. There is no explicit motivation for this behavior in the original commit message c026370, which leads me to believe it was not intentional.This behavior is quite counterintuitive, as well as noisy, and differs from that of the similar function
memcmp
(here). I suggest doing the same as in the latter case: not forking the analysis and proceeding to one branch or the other, whether we know for certain that the two arguments are EQUAL or not.Note that the behavior I am modifying was not tested at all in the tests, as none broke due to the change.