Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[analyzer] Improve handling of unsigned values in ArrayBoundCheckerV2 #81034

Merged

Conversation

NagyDonat
Copy link
Contributor

A memory access is an out of bounds error if the offset is < the extent of the memory region. Notice that here "<" is a mathematical comparison between two numbers and NOT a C/C++ operator that compares two typed C++ values: for example -1 < 1000 is true in mathematics, but if the -1 is an int and the 1000 is a size_t value, then evaluating the C/C++ operator < will return false because the -1 will be converted to SIZE_MAX by the automatic type conversions.

This means that it's incorrect to perform a bounds check with evalBinOpNN(State, BO_LT, ...) which performs automatic conversions and can produce wildly incorrect results.

ArrayBoundsCheckerV2 already had a special case where it avoided calling evalBinOpNN in a situation where it would have performed an automatic conversion; this commit replaces that code with a more general one that covers more situations. (It's still not perfect, but it's better than the previous version and I think it will cover practically all real-world code.)

Note that this is not a limitation/bug of the simplification algorithm defined in getSimplifedOffsets(): the simplification is not applied in the test case test_comparison_with_extent_symbol (because the Extent is not a concrete int), but without the new code it would still run into a -1 < UNSIGNED comparison that evaluates to false because evalBinOpNN performs an automatic type conversion.

A memory access is an out of bounds error if the offset is < the extent
of the memory region. Notice that here "<" is a _mathematical_
comparison between two numbers and NOT a C/C++ operator that compares
two typed C++ values: for example -1 < 1000 is true in mathematics, but
if the `-1` is an `int` and the 1000 is a `size_t` value, then
evaluating the C/C++ operator `<` will return false because the `-1`
will be converted to `SIZE_MAX` by the automatic type conversions.

This means that it's incorrect to perform a bounds check with
`evalBinOpNN(State, BO_LT, ...)` which performs automatic conversions
and can produce wildly incorrect results.

ArrayBoundsCheckerV2 already had a special case where it avoided calling
`evalBinOpNN` in a situation where it would have performed an automatic
conversion; this commit replaces that code with a more general one that
covers more situations. (It's still not perfect, but it's better than
the previous version and I think it will cover practically all
real-world code.)

Note that this is not a limitation/bug of the simplification algorithm
defined in `getSimplifedOffsets()`: the simplification is not applied in
the test case `test_comparison_with_extent_symbol` (because the `Extent`
is not a concrete int), but without the new code it would still run into
a `-1 < UNSIGNED` comparison that evaluates to false because
`evalBinOpNN` performs an automatic type conversion.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Feb 7, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 7, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: None (NagyDonat)

Changes

A memory access is an out of bounds error if the offset is < the extent of the memory region. Notice that here "<" is a mathematical comparison between two numbers and NOT a C/C++ operator that compares two typed C++ values: for example -1 < 1000 is true in mathematics, but if the -1 is an int and the 1000 is a size_t value, then evaluating the C/C++ operator &lt; will return false because the -1 will be converted to SIZE_MAX by the automatic type conversions.

This means that it's incorrect to perform a bounds check with evalBinOpNN(State, BO_LT, ...) which performs automatic conversions and can produce wildly incorrect results.

ArrayBoundsCheckerV2 already had a special case where it avoided calling evalBinOpNN in a situation where it would have performed an automatic conversion; this commit replaces that code with a more general one that covers more situations. (It's still not perfect, but it's better than the previous version and I think it will cover practically all real-world code.)

Note that this is not a limitation/bug of the simplification algorithm defined in getSimplifedOffsets(): the simplification is not applied in the test case test_comparison_with_extent_symbol (because the Extent is not a concrete int), but without the new code it would still run into a -1 &lt; UNSIGNED comparison that evaluates to false because evalBinOpNN performs an automatic type conversion.


Full diff: https://github.com/llvm/llvm-project/pull/81034.diff

3 Files Affected:

  • (modified) clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h (+8-4)
  • (modified) clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp (+33-9)
  • (modified) clang/test/Analysis/out-of-bounds.c (+8)
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h
index d7cff49036cb81..a560f274c43ccd 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h
@@ -110,12 +110,16 @@ class SValBuilder {
   /// that value is returned. Otherwise, returns NULL.
   virtual const llvm::APSInt *getKnownValue(ProgramStateRef state, SVal val) = 0;
 
-  /// Tries to get the minimal possible (integer) value of a given SVal. If the
-  /// constraint manager cannot provide an useful answer, this returns NULL.
+  /// Tries to get the minimal possible (integer) value of a given SVal. This
+  /// always returns the value of a ConcreteInt, but may return NULL if the
+  /// value is symbolic and the constraint manager cannot provide a useful
+  /// answer.
   virtual const llvm::APSInt *getMinValue(ProgramStateRef state, SVal val) = 0;
 
-  /// Tries to get the maximal possible (integer) value of a given SVal. If the
-  /// constraint manager cannot provide an useful answer, this returns NULL.
+  /// Tries to get the maximal possible (integer) value of a given SVal. This
+  /// always returns the value of a ConcreteInt, but may return NULL if the
+  /// value is symbolic and the constraint manager cannot provide a useful
+  /// answer.
   virtual const llvm::APSInt *getMaxValue(ProgramStateRef state, SVal val) = 0;
 
   /// Simplify symbolic expressions within a given SVal. Return an SVal
diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index 05fc00a990d524..fdcc46e58580b4 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -268,6 +268,16 @@ getSimplifiedOffsets(NonLoc offset, nonloc::ConcreteInt extent,
   return std::pair<NonLoc, nonloc::ConcreteInt>(offset, extent);
 }
 
+static bool isNegative(SValBuilder &SVB, ProgramStateRef State, NonLoc Value) {
+  const llvm::APSInt *MaxV = SVB.getMaxValue(State, Value);
+  return MaxV && MaxV->isNegative();
+}
+
+static bool isUnsigned(SValBuilder &SVB, NonLoc Value) {
+  QualType T = Value.getType(SVB.getContext());
+  return T->isUnsignedIntegerType();
+}
+
 // Evaluate the comparison Value < Threshold with the help of the custom
 // simplification algorithm defined for this checker. Return a pair of states,
 // where the first one corresponds to "value below threshold" and the second
@@ -281,18 +291,32 @@ compareValueToThreshold(ProgramStateRef State, NonLoc Value, NonLoc Threshold,
   if (auto ConcreteThreshold = Threshold.getAs<nonloc::ConcreteInt>()) {
     std::tie(Value, Threshold) = getSimplifiedOffsets(Value, *ConcreteThreshold, SVB);
   }
-  if (auto ConcreteThreshold = Threshold.getAs<nonloc::ConcreteInt>()) {
-    QualType T = Value.getType(SVB.getContext());
-    if (T->isUnsignedIntegerType() && ConcreteThreshold->getValue().isNegative()) {
-      // In this case we reduced the bound check to a comparison of the form
-      //   (symbol or value with unsigned type) < (negative number)
-      // which is always false. We are handling these cases separately because
-      // evalBinOpNN can perform a signed->unsigned conversion that turns the
-      // negative number into a huge positive value and leads to wildly
-      // inaccurate conclusions.
+
+  // We want to perform a _mathematical_ comparison between the numbers `Value`
+  // and `Threshold`; but `evalBinOpNN` evaluates a C/C++ operator that may
+  // perform automatic conversions. For example the number -1 is less than the
+  // number 1000, but -1 < `1000ull` will evaluate to `false` because the `int`
+  // -1 is converted to ULONGLONG_MAX.
+  // To avoid automatic conversions, we evaluate the "obvious" cases without
+  // calling `evalBinOpNN`:
+  if (isNegative(SVB, State, Value) && isUnsigned(SVB, Threshold)) {
+    if (CheckEquality) {
+      // negative_value == unsigned_value is always false
       return {nullptr, State};
     }
+    // negative_value < unsigned_value is always false
+    return {State, nullptr};
   }
+  if (isUnsigned(SVB, Value) && isNegative(SVB, State, Threshold)) {
+    // unsigned_value == negative_value and unsigned_value < negative_value are
+    // both always false
+    return {nullptr, State};
+  }
+  // FIXME: these special cases are sufficient for handling real-world
+  // comparisons, but in theory there could be contrived situations where
+  // automatic conversion of a symbolic value (which can be negative and can be
+  // positive) leads to incorrect results.
+
   const BinaryOperatorKind OpKind = CheckEquality ? BO_EQ : BO_LT;
   auto BelowThreshold =
       SVB.evalBinOpNN(State, OpKind, Value, Threshold, SVB.getConditionType())
diff --git a/clang/test/Analysis/out-of-bounds.c b/clang/test/Analysis/out-of-bounds.c
index ed457e86960063..1f771c2b3bd138 100644
--- a/clang/test/Analysis/out-of-bounds.c
+++ b/clang/test/Analysis/out-of-bounds.c
@@ -186,3 +186,11 @@ void test_assume_after_access2(unsigned long x) {
   clang_analyzer_eval(x <= 99); // expected-warning{{TRUE}}
 }
 
+struct incomplete;
+char test_comparison_with_extent_symbol(struct incomplete *p) {
+  // Previously this was reported as a (false positive) overflow error because
+  // the extent symbol of the area pointed by `p` was an unsigned and the '-1'
+  // was converted to its type by `evalBinOpNN`.
+  return ((char *)p)[-1]; // no-warning
+}
+

@NagyDonat
Copy link
Contributor Author

NagyDonat commented Feb 7, 2024

I found yet another situation where unsigned numbers cause stupid errors in the analyzer 😓. This patch is a conservative, minimal solution to "plug the hole", but perhaps it would be better to eliminate unsigned numbers from the out-of-bound calculations and ensure that all comparisons are performed on ssize_t values (which can sufficiently represent all realistic offset and extent values).

@NagyDonat
Copy link
Contributor Author

NagyDonat commented Feb 19, 2024

I did an open source evaluation of this commit and there are surprisingly many changes:

Project With this commit Without this commit
memcached View View
tmux No new reports No resolved reports
curl View View
twin View View
vim View View
openssl View View
sqlite View View
ffmpeg View View
postgres View View
tinyxml2 View View
libwebm No new reports No resolved reports
xerces No new reports View
bitcoin View View
protobuf View View
qtbase View View
contour View View
openrct2 View View

Among the "New" reports there is a single division by zero issue from some very complex code (might be a FP, but I don't understand it; probably it's on a code path where we previously reported an out of bounds error); and LOTS of alpha.security.ArrayBound reports.

Note that alpha.security.ArrayBoundV2 activates at a point that's just before the activation of alpha.security.ArrayBound, so it's natural that ArrayBound (V1) reports will appear in situations when ArrayBoundV2 reports are silenced/removed (and no longer introduce a sink node).

To evaluate the effects of this commit I'll study the "Resolved" reports, which are in theory false positives that are eliminated by the more accurate handling of unsigned values. (This may take some time, e.g. on ffmpeg there are 362 "Resolved" reports.)

@NagyDonat
Copy link
Contributor Author

The main effect of this commit is that it eliminates ~300-400 ArrayBoundV2 reports. I didn't review each of them, but I checked dozens of them and those were all "Out of bound access to memory after the end of the region" false positives that tried to access something at index -1 through an opaque pointer.

This is a logical consequence of the commit:

  • for pointers that are not understood by the analyzer getDynamicExtent() conjures an extent symbol which is handled as a size_t value;
  • before this commit this extent symbol was blindly compared with the offset;
  • when the offset is -1 it was converted to SIZE_MAX by evalBinOpNN which evaluates the comparison according to the C++ rules;
  • even if the extent symbol was totally unconstrained, evalBinOpNN was able to determine that it's not larger than SIZE_MAX;
  • so it reported an overflow error when -1 was used as an array index.
  • Now the new shortcut realizes that -1 is negative, so it's certainly smaller than an unsigned value (the extent symbol), and these situations are not reported as overflow errors.
    (Note that -1 is very significant here: for other negative numbers we only get an assumption that the extent symbol must be huge to accommodate that index.)

I'm surprised that these false positives were so prevalent previously; but then it's good news that this change eliminates them.

There are also lots of lost ArrayBound (V1) reports (e.g. 120 of them on ffmpeg, there are a few others on other projects as well), but I didn't analyze those too closely, because I hope that I can remove that checker in the foreseeable future. The few ones that I checked were difficult to understand / potentially false positives.

There were also two CastSize reports that no longer appear after this commit (1), (2), but they were confusing, messy reports on complicated code, so I don't think that losing them is a significant effect.

@gamesh411
Copy link
Contributor

I checked many false positives, even those that were not FP at first sight; I think we can live without them.
I also agree that we should strive to handle comparison evaluation more uniformly and preferably behind the API barrier of the constraint manager.
LGTM

@NagyDonat
Copy link
Contributor Author

[...] we should strive to handle comparison evaluation more uniformly and preferably behind the API barrier of the constraint manager.

Actually this commit is a step towards the opposite direction -- it adds some tricky workaround logic to one particular checker (and not the constraint manager). Eventually it would be good to move this bounds checking into the constraint manager (or a related, but separate library file) -- but it's very hard to be clear, elegant and general in this area.

The main issue is that:

  • due to the automatic conversion there is is a subtle difference between the C/C++ operator< and the mathematical "less than",
  • so it is not entirely correct to use evalBinOpNN to represent and check the preconditions of memory access,
  • but we want to use the same symbolic expression to represent the comparisons expressed in the code (e.g. in an if (idx < arraySize) {...} branch) and the logical preconditions (that a given value must be in bounds)
  • ...because otherwise a constraint coming from the conditional expression couldn't be used to prove the precondition.

This is why I need to use this hacky "early return for the trivial cases, then use evalBinOpNN in the common case" strategy.

@NagyDonat NagyDonat merged commit fa8a211 into llvm:main Feb 22, 2024
7 checks passed
@NagyDonat NagyDonat deleted the arrayboundv2_generalize_negative_vs_unsigned branch February 22, 2024 13:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants