[analyzer] Normalize sub-array indices in RegionStore initializer res…#200044
[analyzer] Normalize sub-array indices in RegionStore initializer res…#200044gamesh411 wants to merge 3 commits into
Conversation
…olution After llvm#198346, alpha.unix.cstring.UninitializedRead reports a false positive when a pointer into a fully-initialized const multidimensional array is advanced past an inner dimension boundary and used as a source argument to memcpy. The root cause is in convertOffsetsFromSvalToUnsigneds in RegionStore, which returned UndefinedVal for any element index exceeding its sub-array extent, conflating pointer arithmetic legality with memory initializedness. This patch separates the two concerns. The RegionStore now normalizes indices that overflow an inner dimension by carrying into the outer dimension via divmod, the same way arr[0][5] in int arr[4][3] denotes the same memory as arr[1][2]. UndefinedVal is returned only when the computed flat offset exceeds the total array allocation. Whether cross-subobject pointer arithmetic constitutes undefined behavior per C/C++ standards is a separate concern for individual checkers to diagnose. No existing checker flags sub-array boundary crossing as UB, verified both before and after llvm#198346. Fixes llvm#199271
|
@llvm/pr-subscribers-clang-static-analyzer-1 @llvm/pr-subscribers-clang Author: Endre Fülöp (gamesh411) Changes…olution After #198346, alpha.unix.cstring.UninitializedRead reports a false positive when a pointer into a fully-initialized const multidimensional array is advanced past an inner dimension boundary and used as a source argument to memcpy. The root cause is in convertOffsetsFromSvalToUnsigneds in RegionStore, which returned UndefinedVal for any element index exceeding its sub-array extent, conflating pointer arithmetic legality with memory initializedness. This patch separates the two concerns. The RegionStore now normalizes indices that overflow an inner dimension by carrying into the outer dimension via divmod, the same way arr[0][5] in int arr[4][3] denotes the same memory as arr[1][2]. UndefinedVal is returned only when the computed flat offset exceeds the total array allocation. Whether cross-subobject pointer arithmetic constitutes undefined behavior per C/C++ standards is a separate concern for individual checkers to diagnose. No existing checker flags sub-array boundary crossing as UB, verified both before and after #198346. Fixes #199271 Full diff: https://github.com/llvm/llvm-project/pull/200044.diff 4 Files Affected:
diff --git a/clang/lib/StaticAnalyzer/Core/RegionStore.cpp b/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
index 6a426331c93c3..7f747b0b1eceb 100644
--- a/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
+++ b/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
@@ -1804,65 +1804,56 @@ getElementRegionOffsetsWithBase(const ElementRegion *ER) {
/// This is a helper function for `getConstantValFromConstArrayInitializer`.
///
-/// Convert array of offsets from `SVal` to `uint64_t` in consideration of
-/// respective array extents.
-/// \param SrcOffsets [in] The array of offsets of type `SVal` in reversed
-/// order (expectedly received from `getElementRegionOffsetsWithBase`).
-/// \param ArrayExtents [in] The array of extents.
-/// \param DstOffsets [out] The array of offsets of type `uint64_t`.
+/// Flatten per-dimension SVal offsets into a linear index, bounds-check
+/// against the total allocation, and decompose back into per-dimension
+/// uint64_t indices.
+///
+/// \param SrcOffsets [in] Per-dimension offsets in reversed order
+/// (as received from `getElementRegionOffsetsWithBase`).
+/// \param ArrayExtents [in] Extents per dimension (outer to inner).
+/// \param DstOffsets [out] Normalized per-dimension indices.
/// \returns:
-/// - `std::nullopt` for successful convertion.
-/// - `UndefinedVal` or `UnknownVal` otherwise. It's expected that this SVal
-/// will be returned as a suitable value of the access operation.
-/// which should be returned as a correct
+/// - `std::nullopt` on success.
+/// - `UndefinedVal` if the flat offset is out of bounds.
+/// - `UnknownVal` if any index is symbolic.
///
/// \example:
/// const int arr[10][20][30] = {}; // ArrayExtents { 10, 20, 30 }
-/// int x1 = arr[4][5][6]; // SrcOffsets { NonLoc(6), NonLoc(5), NonLoc(4) }
-/// // DstOffsets { 4, 5, 6 }
-/// // returns std::nullopt
-/// int x2 = arr[42][5][-6]; // returns UndefinedVal
-/// int x3 = arr[4][5][x2]; // returns UnknownVal
+/// int x1 = arr[4][5][6]; // DstOffsets { 4, 5, 6 }, returns std::nullopt
+/// int x2 = arr[0][0][35]; // DstOffsets { 0, 1, 5 }, returns std::nullopt
+/// int x3 = arr[0][25][-5]; // DstOffsets { 1, 4, 25 }, returns std::nullopt
+/// int x4 = arr[10][0][0]; // returns UndefinedVal (flat offset >= total)
+/// int x5 = arr[4][5][x1]; // returns UnknownVal
static std::optional<SVal>
convertOffsetsFromSvalToUnsigneds(const SmallVector<SVal, 2> &SrcOffsets,
const SmallVector<uint64_t, 2> ArrayExtents,
SmallVector<uint64_t, 2> &DstOffsets) {
- // Check offsets for being out of bounds.
- // C++20 [expr.add] 7.6.6.4 (excerpt):
- // If P points to an array element i of an array object x with n
- // elements, where i < 0 or i > n, the behavior is undefined.
- // Dereferencing is not allowed on the "one past the last
- // element", when i == n.
- // Example:
- // const int arr[3][2] = {{1, 2}, {3, 4}};
- // arr[0][0]; // 1
- // arr[0][1]; // 2
- // arr[0][2]; // UB
- // arr[1][0]; // 3
- // arr[1][1]; // 4
- // arr[1][-1]; // UB
- // arr[2][0]; // 0
- // arr[2][1]; // 0
- // arr[-2][0]; // UB
- DstOffsets.resize(SrcOffsets.size());
+ // Flatten to a linear offset so that both positive overflow and negative
+ // indices across sub-array boundaries resolve to the correct element.
+ int64_t FlatOffset = 0;
auto ExtentIt = ArrayExtents.begin();
- auto OffsetIt = DstOffsets.begin();
- // Reverse `SValOffsets` to make it consistent with `ArrayExtents`.
for (SVal V : llvm::reverse(SrcOffsets)) {
- if (auto CI = V.getAs<nonloc::ConcreteInt>()) {
- // When offset is out of array's bounds, result is UB.
- const llvm::APSInt &Offset = CI->getValue();
- if (Offset.isNegative() || Offset.uge(*(ExtentIt++)))
- return UndefinedVal();
- // Store index in a reversive order.
- *(OffsetIt++) = Offset.getZExtValue();
- continue;
- }
- // Symbolic index presented. Return Unknown value.
- // FIXME: We also need to take ElementRegions with symbolic indexes into
- // account.
- return UnknownVal();
+ auto CI = V.getAs<nonloc::ConcreteInt>();
+ if (!CI)
+ return UnknownVal();
+ FlatOffset = FlatOffset * static_cast<int64_t>(*(ExtentIt++)) +
+ CI->getValue()->getExtValue();
}
+
+ int64_t TotalSize = 1;
+ for (uint64_t E : ArrayExtents)
+ TotalSize *= static_cast<int64_t>(E);
+
+ if (FlatOffset < 0 || FlatOffset >= TotalSize)
+ return UndefinedVal();
+
+ DstOffsets.resize(ArrayExtents.size());
+ uint64_t Remaining = static_cast<uint64_t>(FlatOffset);
+ for (int I = DstOffsets.size() - 1; I >= 0; --I) {
+ DstOffsets[I] = Remaining % ArrayExtents[I];
+ Remaining /= ArrayExtents[I];
+ }
+
return std::nullopt;
}
diff --git a/clang/test/Analysis/bstring_UninitRead.c b/clang/test/Analysis/bstring_UninitRead.c
index 8c79f73569015..25ff4d82f441b 100644
--- a/clang/test/Analysis/bstring_UninitRead.c
+++ b/clang/test/Analysis/bstring_UninitRead.c
@@ -152,3 +152,10 @@ void ff_mov_lang_to_iso639() {
memcpy(ff_mov_lang_to_iso639_to,
mov_mdhd_language_map[ff_mov_lang_to_iso639_code], 4);
}
+
+void memcpy_multidim_crossing_subarray(void *dst) {
+ const long b[4][1200] = {};
+ const long *f = &b[0][0];
+ f = f + 2001; // crosses into b[1][801]
+ memcpy(dst, f, sizeof(long)); // no-warning
+}
diff --git a/clang/test/Analysis/initialization.c b/clang/test/Analysis/initialization.c
index 32bdf733bcb98..fb75fb9739759 100644
--- a/clang/test/Analysis/initialization.c
+++ b/clang/test/Analysis/initialization.c
@@ -73,10 +73,10 @@ void glob_arr_index3(void) {
void negative_index(void) {
int x = 2, y = -2;
- clang_analyzer_eval(glob_arr2[x][y] == 5); // expected-warning{{UNDEFINED}}
+ clang_analyzer_eval(glob_arr2[x][y] == 5); // expected-warning{{TRUE}}
x = 3;
y = -3;
- clang_analyzer_eval(glob_arr2[x][y] == 7); // expected-warning{{UNDEFINED}}
+ clang_analyzer_eval(glob_arr2[x][y] == 7); // expected-warning{{TRUE}}
}
void glob_invalid_index3(void) {
diff --git a/clang/test/Analysis/initialization.cpp b/clang/test/Analysis/initialization.cpp
index d064dd6172d63..100cd89431e94 100644
--- a/clang/test/Analysis/initialization.cpp
+++ b/clang/test/Analysis/initialization.cpp
@@ -87,9 +87,9 @@ void glob_ptr_index2() {
int const *ptr = glob_arr5[1];
clang_analyzer_eval(ptr[0] == 3); // expected-warning{{TRUE}}
clang_analyzer_eval(ptr[1] == 4); // expected-warning{{TRUE}}
- clang_analyzer_eval(ptr[2] == 5); // expected-warning{{UNDEFINED}}
- clang_analyzer_eval(ptr[3] == 0); // expected-warning{{UNDEFINED}}
- clang_analyzer_eval(ptr[4] == 0); // expected-warning{{UNDEFINED}}
+ clang_analyzer_eval(ptr[2] == 5); // expected-warning{{TRUE}}
+ clang_analyzer_eval(ptr[3] == 0); // expected-warning{{TRUE}}
+ clang_analyzer_eval(ptr[4] == 0); // expected-warning{{TRUE}}
}
void glob_invalid_index5() {
@@ -256,3 +256,86 @@ void initlistWithinInitlist() {
// no-crash
clang_analyzer_dump(glob[0]); // expected-warning-re {{reg_${{[0-9]+}}<enum E Element{glob,0 S64b,enum E}>}}
}
+
+const int glob_arr_cross[4][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12}};
+void glob_array_cross_subarray_boundary() {
+ const int *p = &glob_arr_cross[0][0];
+ clang_analyzer_eval(p[4] == 5); // expected-warning{{TRUE}}
+ clang_analyzer_eval(p[7] == 8); // expected-warning{{TRUE}}
+ clang_analyzer_eval(p[11] == 12); // expected-warning{{TRUE}}
+}
+
+const int glob_arr_sparse[4][3] = {{1, 2}, {0, 0, 7}};
+void glob_array_sparse_cross_boundary() {
+ const int *p = &glob_arr_sparse[0][0];
+ clang_analyzer_eval(p[0] == 1); // expected-warning{{TRUE}}
+ clang_analyzer_eval(p[1] == 2); // expected-warning{{TRUE}}
+ clang_analyzer_eval(p[2] == 0); // expected-warning{{TRUE}}
+ clang_analyzer_eval(p[3] == 0); // expected-warning{{TRUE}}
+ clang_analyzer_eval(p[5] == 7); // expected-warning{{TRUE}}
+ clang_analyzer_eval(p[6] == 0); // expected-warning{{TRUE}}
+ clang_analyzer_eval(p[11] == 0); // expected-warning{{TRUE}}
+}
+
+const int glob_arr_3d[2][3][4] = {
+ {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}},
+ {{13, 14, 15, 16}, {17, 18, 19, 20}, {21, 22, 23, 24}}};
+void glob_array_3d_cross_boundary() {
+ const int *p = &glob_arr_3d[0][0][0];
+ clang_analyzer_eval(p[0] == 1); // expected-warning{{TRUE}}
+ clang_analyzer_eval(p[5] == 6); // expected-warning{{TRUE}}
+ clang_analyzer_eval(p[11] == 12); // expected-warning{{TRUE}}
+ clang_analyzer_eval(p[12] == 13); // expected-warning{{TRUE}}
+ clang_analyzer_eval(p[23] == 24); // expected-warning{{TRUE}}
+}
+
+// Limitation: type-punned access through incompatible pointer type.
+const int glob_arr_pun[2][3] = {{1, 2, 3}, {4, 5, 6}};
+void glob_array_type_pun_unresolved() {
+ const unsigned *p = (const unsigned *)&glob_arr_pun[0][0];
+ clang_analyzer_dump(p[0]); // expected-warning-re{{reg_${{[0-9]+}}<unsigned int Element{glob_arr_pun,0 S64b,unsigned int}>}}
+}
+
+const int glob_arr_recast[4][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12}};
+void glob_array_recast_resolves_against_declared_type() {
+ const int (*q)[6] = (const int (*)[6])&glob_arr_recast[0][0];
+ clang_analyzer_eval(q[0][4] == 5); // expected-warning{{TRUE}}
+ clang_analyzer_eval(q[1][0] == 4); // expected-warning{{TRUE}}
+}
+
+void glob_array_write_then_read_cross_boundary() {
+ int arr[4][3];
+ arr[0][0] = 10;
+ arr[1][1] = 42;
+ arr[2][2] = 99;
+ arr[3][2] = 77;
+
+ int *p = &arr[0][0];
+ clang_analyzer_eval(p[0] == 10); // expected-warning{{TRUE}}
+ clang_analyzer_eval(p[4] == 42); // expected-warning{{TRUE}}
+ clang_analyzer_eval(p[8] == 99); // expected-warning{{TRUE}}
+ clang_analyzer_eval(p[11] == 77); // expected-warning{{TRUE}}
+}
+
+void glob_array_write_then_read_before_start() {
+ int arr[4][3];
+ arr[0][0] = 10;
+ int *p = &arr[0][0];
+ int x = p[-1]; // expected-warning{{Assigned value is uninitialized}}
+ (void)x;
+}
+
+// Negative inner index from a non-zero outer base lands within the allocation.
+const int glob_arr_neg[4][3] = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12}};
+void glob_array_negative_inner_index() {
+ const int *p = &glob_arr_neg[2][0];
+ clang_analyzer_eval(p[-1] == 6); // expected-warning{{TRUE}}
+}
+
+void glob_array_write_then_read_past_end() {
+ int arr[4][3];
+ arr[3][2] = 77;
+ int *p = &arr[0][0];
+ int x = p[12]; // expected-warning{{Assigned value is uninitialized}}
+ (void)x;
+}
|
|
My idea is to move the encoding of UB thanks to cross-subobject indexing out from the Reasons for putting this change in the
|
There was a problem hiding this comment.
Overall LGTM, my only remark is about comment clarity.
The root cause is in convertOffsetsFromSvalToUnsigneds in RegionStore, which returned UndefinedVal for any element index exceeding its sub-array extent, conflating pointer arithmetic legality with memory initializedness.
I completely agree with this diagnosis.
Whether cross-subobject pointer arithmetic constitutes undefined behavior per C/C++ standards is a separate concern for individual checkers to diagnose. No existing checker flags sub-array boundary crossing as UB, verified both before and after #198346.
In fact I'm planning to change the behavior of security.ArrayBound to let it report situations like referencing arr[1][5] in an array declared as int arr[4][3] -- because it is technically UB and the array bound checker should be able to report this.
However if the array bound checker does not report it directly (because it is disabled or the situation is more complex and some heuristic silences the warning) then the RegionStore should not report it indirectly (by returning an Undefined value which can trigger reports from various "use of undefined value" checkers).
More generally, I think UndefinedVal should be reserved for representing the contents of memory that should not be read (e.g uninitialized local, area returned by malloc, perhaps a moved-from object etc.) and we should never use it as the result of operations that trigger undefined behavior (e.g. out-of-bounds access, bad bitwise shift, zero division).
When the analyzer thinks that an undefined operation happens (on a certain execution path), it should immediately produce a bug report (which is focused on the root cause, can be cleanly suppressed or disabled etc.) instead of returning UndefinedVal (which will probably trigger some use-of-undefined checker later, and can be hard to understand or suppress, but may as well go unnoticed).
Returning UndefinedVal as a "second line of defense" is a very counter-productive approach: if the execution path wasn't terminated by an error outright, then the analyzer should admit that it doesn't understand the situation and use UnknownVal.
|
I would like to suggest to add multidimensional test into lit list: To test situation of skip across multiple dimensions. |
|
What releases are affected? |
If you mean #198346, AFAIK, it is not part of any release yet, as it was merged last week. (unless I'm unaware of a branch off that happened for the next release). I have checked that LLVM 22.1.6 does not contain it. |
|
LGTM as of now. |
…olution
After #198346, alpha.unix.cstring.UninitializedRead reports a false positive when a pointer into a fully-initialized const multidimensional array is advanced past an inner dimension boundary and used as a source argument to memcpy. The root cause is in
convertOffsetsFromSvalToUnsignedsin RegionStore, which returned UndefinedVal for any element index exceeding its sub-array extent, conflating pointer arithmetic legality with memory initializedness.This patch separates the two concerns. The RegionStore now normalizes indices that overflow an inner dimension by carrying into the outer dimension via divmod, the same way
arr[0][5]inint arr[4][3]denotes the same memory asarr[1][2]. UndefinedVal is returned only when the computed flat offset exceeds the total array allocation. Whether cross-subobject pointer arithmetic constitutes undefined behavior per C/C++ standards is a separate concern for individual checkers to diagnose. No existing checker flags sub-array boundary crossing as UB, verified both before and after #198346.Fixes #199271