Skip to content

Conversation

guillem-bartrina-sonarsource
Copy link
Contributor

@guillem-bartrina-sonarsource guillem-bartrina-sonarsource commented Oct 2, 2025

Upon running into str(n)cmp with unknown symbolic arguments, CStringChecker forks the analysis assuming that the two arguments are EQUAL in one branch. There is no explicit motivation for this behavior in the original commit message c026370, which leads me to believe it was not intentional.

This behavior is quite counterintuitive, as well as noisy, and differs from that of the similar function memcmp (here). I suggest doing the same as in the latter case: not forking the analysis and proceeding to one branch or the other, whether we know for certain that the two arguments are EQUAL or not.

Note that the behavior I am modifying was not tested at all in the tests, as none broke due to the change.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Oct 2, 2025
@llvmbot
Copy link
Member

llvmbot commented Oct 2, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: None (guillem-bartrina-sonarsource)

Changes

Upon running into str(n)cmp with unknown symbolic arguments, CStringChecker forks the analysis assuming that the two arguments are EQUAL in one branch. There is no explicit motivation for this behavior in the original commit message c026370, which leads me to believe it was not intentional.

This behavior is quite counterintuitive, as well as noisy, and differs from that of the similar function memcmp (here). I suggest doing the same as in the latter case: not forking the analysis and proceeding to one branch or the other, whether we know for certain that the two arguments are EQUAL or not.

Note that the behavior I am modifying was not tested at all in the tests, as none broke due to the change.


Full diff: https://github.com/llvm/llvm-project/pull/161644.diff

2 Files Affected:

  • (modified) clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp (+2-5)
  • (modified) clang/test/Analysis/string.c (+16)
diff --git a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
index 0ae784c000f60..c15464e9a713f 100644
--- a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
@@ -2418,15 +2418,12 @@ void CStringChecker::evalStrcmpCommon(CheckerContext &C, const CallEvent &Call,
 
   // If the two arguments might be the same buffer, we know the result is 0,
   // and we only need to check one size.
-  if (StSameBuf) {
+  if (StSameBuf && !StNotSameBuf) {
     StSameBuf =
         StSameBuf->BindExpr(Call.getOriginExpr(), LCtx,
                             svalBuilder.makeZeroVal(Call.getResultType()));
     C.addTransition(StSameBuf);
-
-    // If the two arguments are GUARANTEED to be the same, we're done!
-    if (!StNotSameBuf)
-      return;
+    return;
   }
 
   assert(StNotSameBuf);
diff --git a/clang/test/Analysis/string.c b/clang/test/Analysis/string.c
index cdd36275568e3..45b81297a026e 100644
--- a/clang/test/Analysis/string.c
+++ b/clang/test/Analysis/string.c
@@ -955,6 +955,14 @@ void strcmp_fn_l(char *x) {
   strcmp((char*)&strcmp_null_argument, x); // expected-warning{{Argument to string comparison function is the address of the function 'strcmp_null_argument', which is not a null-terminated string}}
 }
 
+void strcmp_assumptions(char *a, char *b) {
+  if (strcmp(a, b) == 0) {
+    // ...
+  }
+
+  clang_analyzer_eval(a == b); // expected-warning{{FALSE}}
+}
+
 //===----------------------------------------------------------------------===
 // strncmp()
 //===----------------------------------------------------------------------===
@@ -1070,6 +1078,14 @@ int strncmp_null_argument(char *a, size_t n) {
   return strncmp(a, b, n); // expected-warning{{Null pointer passed as 2nd argument to string comparison function}}
 }
 
+void strncmp_assumptions(char *a, char *b, size_t n) {
+  if (strncmp(a, b, n) == 0) {
+    // ...
+  }
+
+  clang_analyzer_eval(a == b); // expected-warning{{FALSE}}
+}
+
 //===----------------------------------------------------------------------===
 // strcasecmp()
 //===----------------------------------------------------------------------===

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that the old behavior had problems, but the new code also has a problematic aspect: in the case when the pointers may or may not be same (that is, when StSameBuf && StNotSameBuf is true) it always assumes that the pointers are not equal (because it assigns state = StNotSameBuf; in the line just below the diff context displayed by github).

To test this, you should replace the // ... placeholders in your new tests with clang_analyzer_eval(a == b); // expected-warning{{UNKNOWN}}. Ideally these checks should pass (if the strings are identical, the pointers may or may not be equal) but with your code they will produce FALSE (because the StSameBuf branch is silently discarded).

To avoid losing the "pointers may be equal" case you should remove the statement state = StNotSameBuf;. Unfortunately, deleting this statement also has a serious side effect: after it clang_analyzer_eval(a == b); would also be UNKNOWN on the branch where strcmp is nonzero. (However, this side effect is just "not recording knowledge" which is IMO less problematic than "arbitrarily discarding a case that would be possible".)

This side effect can be easily mitigated in the case when the analyzer can evaluate the comparison of known strings (just re-add state = StNotSameBuf; on the branch where the checker records the assumption that the result is GT or LT compared to zero).

Mitigating the side effect is also possible in the case when the analyzer cannot determine the comparison result (which includes e.g. the tests added in this PR): to do this you should introduce the following steps (on the branch where the analyzer didn't determine the comparison result):

  • call auto [StResultEqual, StResultNonEqual] = state->assume(resultVal) (where resultVal is the freshly conjured otherwise unconstrained result value)
    • these states will be both non-null because the value was unconstrained;
  • add a transition to StResultEqual (i.e. the string values equal, pointers may or may not be equal);
  • starting from StResultNotEqual (when the string values differ) assume that the two pointers are different, then add a second transition to the resulting state.

This second mitigation introduces a state split, but IMO this is a much more "benign" state split than the one that was introduced by the old code. Separating "strcmp returns zero" and "strcmp returns nonzero" is a reasonable after a call to strcmp (in fact, this is very similar to the eagerly-assume heuristic, which works correctly and is enabled by default) -- while the old "pointers are equal" and "pointers are different" state split was indeed a surprising and unjustified consequence of calling strcmp.

I admit that these suggestions are a bit complex, feel free to ask for details if you don't understand something...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants