[analyzer] CStringChecker: do not branch assuming the arguments of `str(n)cmp` are equal #161644

guillem-bartrina-sonarsource · 2025-10-02T09:52:56Z

Upon running into str(n)cmp with unknown symbolic arguments, CStringChecker forks the analysis assuming that the two arguments are EQUAL in one branch. There is no explicit motivation for this behavior in the original commit message c026370, which leads me to believe it was not intentional.

This behavior is quite counterintuitive, as well as noisy, and differs from that of the similar function memcmp (here). I suggest doing the same as in the latter case: not forking the analysis and proceeding to one branch or the other, whether we know for certain that the two arguments are EQUAL or not.

Note that the behavior I am modifying was not tested at all in the tests, as none broke due to the change.

…tr(n)cmp` are equal

llvmbot · 2025-10-02T09:53:31Z

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: None (guillem-bartrina-sonarsource)

Changes

Upon running into str(n)cmp with unknown symbolic arguments, CStringChecker forks the analysis assuming that the two arguments are EQUAL in one branch. There is no explicit motivation for this behavior in the original commit message c026370, which leads me to believe it was not intentional.

This behavior is quite counterintuitive, as well as noisy, and differs from that of the similar function memcmp (here). I suggest doing the same as in the latter case: not forking the analysis and proceeding to one branch or the other, whether we know for certain that the two arguments are EQUAL or not.

Note that the behavior I am modifying was not tested at all in the tests, as none broke due to the change.

Full diff: https://github.com/llvm/llvm-project/pull/161644.diff

2 Files Affected:

(modified) clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp (+2-5)
(modified) clang/test/Analysis/string.c (+16)

diff --git a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
index 0ae784c000f60..c15464e9a713f 100644
--- a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
@@ -2418,15 +2418,12 @@ void CStringChecker::evalStrcmpCommon(CheckerContext &C, const CallEvent &Call,
 
   // If the two arguments might be the same buffer, we know the result is 0,
   // and we only need to check one size.
-  if (StSameBuf) {
+  if (StSameBuf && !StNotSameBuf) {
     StSameBuf =
         StSameBuf->BindExpr(Call.getOriginExpr(), LCtx,
                             svalBuilder.makeZeroVal(Call.getResultType()));
     C.addTransition(StSameBuf);
-
-    // If the two arguments are GUARANTEED to be the same, we're done!
-    if (!StNotSameBuf)
-      return;
+    return;
   }
 
   assert(StNotSameBuf);
diff --git a/clang/test/Analysis/string.c b/clang/test/Analysis/string.c
index cdd36275568e3..45b81297a026e 100644
--- a/clang/test/Analysis/string.c
+++ b/clang/test/Analysis/string.c
@@ -955,6 +955,14 @@ void strcmp_fn_l(char *x) {
   strcmp((char*)&strcmp_null_argument, x); // expected-warning{{Argument to string comparison function is the address of the function 'strcmp_null_argument', which is not a null-terminated string}}
 }
 
+void strcmp_assumptions(char *a, char *b) {
+  if (strcmp(a, b) == 0) {
+    // ...
+  }
+
+  clang_analyzer_eval(a == b); // expected-warning{{FALSE}}
+}
+
 //===----------------------------------------------------------------------===
 // strncmp()
 //===----------------------------------------------------------------------===
@@ -1070,6 +1078,14 @@ int strncmp_null_argument(char *a, size_t n) {
   return strncmp(a, b, n); // expected-warning{{Null pointer passed as 2nd argument to string comparison function}}
 }
 
+void strncmp_assumptions(char *a, char *b, size_t n) {
+  if (strncmp(a, b, n) == 0) {
+    // ...
+  }
+
+  clang_analyzer_eval(a == b); // expected-warning{{FALSE}}
+}
+
 //===----------------------------------------------------------------------===
 // strcasecmp()
 //===----------------------------------------------------------------------===

NagyDonat

I see that the old behavior had problems, but the new code also has a problematic aspect: in the case when the pointers may or may not be same (that is, when StSameBuf && StNotSameBuf is true) it always assumes that the pointers are not equal (because it assigns state = StNotSameBuf; in the line just below the diff context displayed by github).

To test this, you should replace the // ... placeholders in your new tests with clang_analyzer_eval(a == b); // expected-warning{{UNKNOWN}}. Ideally these checks should pass (if the strings are identical, the pointers may or may not be equal) but with your code they will produce FALSE (because the StSameBuf branch is silently discarded).

To avoid losing the "pointers may be equal" case you should remove the statement state = StNotSameBuf;. Unfortunately, deleting this statement also has a serious side effect: after it clang_analyzer_eval(a == b); would also be UNKNOWN on the branch where strcmp is nonzero. (However, this side effect is just "not recording knowledge" which is IMO less problematic than "arbitrarily discarding a case that would be possible".)

This side effect can be easily mitigated in the case when the analyzer can evaluate the comparison of known strings (just re-add state = StNotSameBuf; on the branch where the checker records the assumption that the result is GT or LT compared to zero).

Mitigating the side effect is also possible in the case when the analyzer cannot determine the comparison result (which includes e.g. the tests added in this PR): to do this you should introduce the following steps (on the branch where the analyzer didn't determine the comparison result):

call auto [StResultEqual, StResultNonEqual] = state->assume(resultVal) (where resultVal is the freshly conjured otherwise unconstrained result value)
- these states will be both non-null because the value was unconstrained;
add a transition to StResultEqual (i.e. the string values equal, pointers may or may not be equal);
starting from StResultNotEqual (when the string values differ) assume that the two pointers are different, then add a second transition to the resulting state.

This second mitigation introduces a state split, but IMO this is a much more "benign" state split than the one that was introduced by the old code. Separating "strcmp returns zero" and "strcmp returns nonzero" is a reasonable after a call to strcmp (in fact, this is very similar to the eagerly-assume heuristic, which works correctly and is enabled by default) -- while the old "pointers are equal" and "pointers are different" state split was indeed a surprising and unjustified consequence of calling strcmp.

I admit that these suggestions are a bit complex, feel free to ask for details if you don't understand something...

[analyzer] CStringChecker: do not branch assuming the arguments of `s…

9a9e8fd

…tr(n)cmp` are equal

llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Oct 2, 2025

NagyDonat requested review from balazske and NagyDonat October 2, 2025 12:05

NagyDonat reviewed Oct 2, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[analyzer] CStringChecker: do not branch assuming the arguments of `str(n)cmp` are equal #161644

[analyzer] CStringChecker: do not branch assuming the arguments of `str(n)cmp` are equal #161644

guillem-bartrina-sonarsource commented Oct 2, 2025 •

edited

Loading

Uh oh!

llvmbot commented Oct 2, 2025 •

edited

Loading

Uh oh!

NagyDonat left a comment •

edited

Loading

Uh oh!

Uh oh!

[analyzer] CStringChecker: do not branch assuming the arguments of str(n)cmp are equal #161644

Are you sure you want to change the base?

[analyzer] CStringChecker: do not branch assuming the arguments of str(n)cmp are equal #161644

Conversation

guillem-bartrina-sonarsource commented Oct 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Oct 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NagyDonat left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

[analyzer] CStringChecker: do not branch assuming the arguments of `str(n)cmp` are equal #161644

[analyzer] CStringChecker: do not branch assuming the arguments of `str(n)cmp` are equal #161644

guillem-bartrina-sonarsource commented Oct 2, 2025 •

edited

Loading

llvmbot commented Oct 2, 2025 •

edited

Loading

NagyDonat left a comment •

edited

Loading