-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[analyzer] Improve handling of unsigned values in ArrayBoundCheckerV2 #81034
[analyzer] Improve handling of unsigned values in ArrayBoundCheckerV2 #81034
Conversation
A memory access is an out of bounds error if the offset is < the extent of the memory region. Notice that here "<" is a _mathematical_ comparison between two numbers and NOT a C/C++ operator that compares two typed C++ values: for example -1 < 1000 is true in mathematics, but if the `-1` is an `int` and the 1000 is a `size_t` value, then evaluating the C/C++ operator `<` will return false because the `-1` will be converted to `SIZE_MAX` by the automatic type conversions. This means that it's incorrect to perform a bounds check with `evalBinOpNN(State, BO_LT, ...)` which performs automatic conversions and can produce wildly incorrect results. ArrayBoundsCheckerV2 already had a special case where it avoided calling `evalBinOpNN` in a situation where it would have performed an automatic conversion; this commit replaces that code with a more general one that covers more situations. (It's still not perfect, but it's better than the previous version and I think it will cover practically all real-world code.) Note that this is not a limitation/bug of the simplification algorithm defined in `getSimplifedOffsets()`: the simplification is not applied in the test case `test_comparison_with_extent_symbol` (because the `Extent` is not a concrete int), but without the new code it would still run into a `-1 < UNSIGNED` comparison that evaluates to false because `evalBinOpNN` performs an automatic type conversion.
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-static-analyzer-1 Author: None (NagyDonat) ChangesA memory access is an out of bounds error if the offset is < the extent of the memory region. Notice that here "<" is a mathematical comparison between two numbers and NOT a C/C++ operator that compares two typed C++ values: for example -1 < 1000 is true in mathematics, but if the This means that it's incorrect to perform a bounds check with ArrayBoundsCheckerV2 already had a special case where it avoided calling Note that this is not a limitation/bug of the simplification algorithm defined in Full diff: https://github.com/llvm/llvm-project/pull/81034.diff 3 Files Affected:
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h
index d7cff49036cb81..a560f274c43ccd 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h
@@ -110,12 +110,16 @@ class SValBuilder {
/// that value is returned. Otherwise, returns NULL.
virtual const llvm::APSInt *getKnownValue(ProgramStateRef state, SVal val) = 0;
- /// Tries to get the minimal possible (integer) value of a given SVal. If the
- /// constraint manager cannot provide an useful answer, this returns NULL.
+ /// Tries to get the minimal possible (integer) value of a given SVal. This
+ /// always returns the value of a ConcreteInt, but may return NULL if the
+ /// value is symbolic and the constraint manager cannot provide a useful
+ /// answer.
virtual const llvm::APSInt *getMinValue(ProgramStateRef state, SVal val) = 0;
- /// Tries to get the maximal possible (integer) value of a given SVal. If the
- /// constraint manager cannot provide an useful answer, this returns NULL.
+ /// Tries to get the maximal possible (integer) value of a given SVal. This
+ /// always returns the value of a ConcreteInt, but may return NULL if the
+ /// value is symbolic and the constraint manager cannot provide a useful
+ /// answer.
virtual const llvm::APSInt *getMaxValue(ProgramStateRef state, SVal val) = 0;
/// Simplify symbolic expressions within a given SVal. Return an SVal
diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index 05fc00a990d524..fdcc46e58580b4 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -268,6 +268,16 @@ getSimplifiedOffsets(NonLoc offset, nonloc::ConcreteInt extent,
return std::pair<NonLoc, nonloc::ConcreteInt>(offset, extent);
}
+static bool isNegative(SValBuilder &SVB, ProgramStateRef State, NonLoc Value) {
+ const llvm::APSInt *MaxV = SVB.getMaxValue(State, Value);
+ return MaxV && MaxV->isNegative();
+}
+
+static bool isUnsigned(SValBuilder &SVB, NonLoc Value) {
+ QualType T = Value.getType(SVB.getContext());
+ return T->isUnsignedIntegerType();
+}
+
// Evaluate the comparison Value < Threshold with the help of the custom
// simplification algorithm defined for this checker. Return a pair of states,
// where the first one corresponds to "value below threshold" and the second
@@ -281,18 +291,32 @@ compareValueToThreshold(ProgramStateRef State, NonLoc Value, NonLoc Threshold,
if (auto ConcreteThreshold = Threshold.getAs<nonloc::ConcreteInt>()) {
std::tie(Value, Threshold) = getSimplifiedOffsets(Value, *ConcreteThreshold, SVB);
}
- if (auto ConcreteThreshold = Threshold.getAs<nonloc::ConcreteInt>()) {
- QualType T = Value.getType(SVB.getContext());
- if (T->isUnsignedIntegerType() && ConcreteThreshold->getValue().isNegative()) {
- // In this case we reduced the bound check to a comparison of the form
- // (symbol or value with unsigned type) < (negative number)
- // which is always false. We are handling these cases separately because
- // evalBinOpNN can perform a signed->unsigned conversion that turns the
- // negative number into a huge positive value and leads to wildly
- // inaccurate conclusions.
+
+ // We want to perform a _mathematical_ comparison between the numbers `Value`
+ // and `Threshold`; but `evalBinOpNN` evaluates a C/C++ operator that may
+ // perform automatic conversions. For example the number -1 is less than the
+ // number 1000, but -1 < `1000ull` will evaluate to `false` because the `int`
+ // -1 is converted to ULONGLONG_MAX.
+ // To avoid automatic conversions, we evaluate the "obvious" cases without
+ // calling `evalBinOpNN`:
+ if (isNegative(SVB, State, Value) && isUnsigned(SVB, Threshold)) {
+ if (CheckEquality) {
+ // negative_value == unsigned_value is always false
return {nullptr, State};
}
+ // negative_value < unsigned_value is always false
+ return {State, nullptr};
}
+ if (isUnsigned(SVB, Value) && isNegative(SVB, State, Threshold)) {
+ // unsigned_value == negative_value and unsigned_value < negative_value are
+ // both always false
+ return {nullptr, State};
+ }
+ // FIXME: these special cases are sufficient for handling real-world
+ // comparisons, but in theory there could be contrived situations where
+ // automatic conversion of a symbolic value (which can be negative and can be
+ // positive) leads to incorrect results.
+
const BinaryOperatorKind OpKind = CheckEquality ? BO_EQ : BO_LT;
auto BelowThreshold =
SVB.evalBinOpNN(State, OpKind, Value, Threshold, SVB.getConditionType())
diff --git a/clang/test/Analysis/out-of-bounds.c b/clang/test/Analysis/out-of-bounds.c
index ed457e86960063..1f771c2b3bd138 100644
--- a/clang/test/Analysis/out-of-bounds.c
+++ b/clang/test/Analysis/out-of-bounds.c
@@ -186,3 +186,11 @@ void test_assume_after_access2(unsigned long x) {
clang_analyzer_eval(x <= 99); // expected-warning{{TRUE}}
}
+struct incomplete;
+char test_comparison_with_extent_symbol(struct incomplete *p) {
+ // Previously this was reported as a (false positive) overflow error because
+ // the extent symbol of the area pointed by `p` was an unsigned and the '-1'
+ // was converted to its type by `evalBinOpNN`.
+ return ((char *)p)[-1]; // no-warning
+}
+
|
I found yet another situation where unsigned numbers cause stupid errors in the analyzer 😓. This patch is a conservative, minimal solution to "plug the hole", but perhaps it would be better to eliminate unsigned numbers from the out-of-bound calculations and ensure that all comparisons are performed on |
I did an open source evaluation of this commit and there are surprisingly many changes:
Among the "New" reports there is a single division by zero issue from some very complex code (might be a FP, but I don't understand it; probably it's on a code path where we previously reported an out of bounds error); and LOTS of Note that To evaluate the effects of this commit I'll study the "Resolved" reports, which are in theory false positives that are eliminated by the more accurate handling of unsigned values. (This may take some time, e.g. on ffmpeg there are 362 "Resolved" reports.) |
The main effect of this commit is that it eliminates ~300-400 ArrayBoundV2 reports. I didn't review each of them, but I checked dozens of them and those were all "Out of bound access to memory after the end of the region" false positives that tried to access something at index This is a logical consequence of the commit:
I'm surprised that these false positives were so prevalent previously; but then it's good news that this change eliminates them. There are also lots of lost ArrayBound (V1) reports (e.g. 120 of them on ffmpeg, there are a few others on other projects as well), but I didn't analyze those too closely, because I hope that I can remove that checker in the foreseeable future. The few ones that I checked were difficult to understand / potentially false positives. There were also two CastSize reports that no longer appear after this commit (1), (2), but they were confusing, messy reports on complicated code, so I don't think that losing them is a significant effect. |
I checked many false positives, even those that were not FP at first sight; I think we can live without them. |
Actually this commit is a step towards the opposite direction -- it adds some tricky workaround logic to one particular checker (and not the constraint manager). Eventually it would be good to move this bounds checking into the constraint manager (or a related, but separate library file) -- but it's very hard to be clear, elegant and general in this area. The main issue is that:
This is why I need to use this hacky "early return for the trivial cases, then use |
A memory access is an out of bounds error if the offset is < the extent of the memory region. Notice that here "<" is a mathematical comparison between two numbers and NOT a C/C++ operator that compares two typed C++ values: for example -1 < 1000 is true in mathematics, but if the
-1
is anint
and the1000
is asize_t
value, then evaluating the C/C++ operator<
will return false because the-1
will be converted toSIZE_MAX
by the automatic type conversions.This means that it's incorrect to perform a bounds check with
evalBinOpNN(State, BO_LT, ...)
which performs automatic conversions and can produce wildly incorrect results.ArrayBoundsCheckerV2 already had a special case where it avoided calling
evalBinOpNN
in a situation where it would have performed an automatic conversion; this commit replaces that code with a more general one that covers more situations. (It's still not perfect, but it's better than the previous version and I think it will cover practically all real-world code.)Note that this is not a limitation/bug of the simplification algorithm defined in
getSimplifedOffsets()
: the simplification is not applied in the test casetest_comparison_with_extent_symbol
(because theExtent
is not a concrete int), but without the new code it would still run into a-1 < UNSIGNED
comparison that evaluates to false becauseevalBinOpNN
performs an automatic type conversion.