[analyzer] Bring unix.cstring.UninitializedRead checker out of alpha #196292

Open
gamesh411 wants to merge 1 commit into llvm:main from gamesh411:cstring-uninitialized-read-dealpha
Conversation

@gamesh411
Contributor

There have been recent improvements (#186802) and fixes (#191061) related to this checker. The reports are no longer noisy, as evaluated on 14 OS projects.

@gamesh411 gamesh411 marked this pull request as ready for review May 12, 2026 13:20
@gamesh411 gamesh411 requested a review from NagyDonat May 12, 2026 13:21
@llvmorg-github-actions llvmorg-github-actions Bot added the clang (Clang issues not falling into any other category) and clang:static analyzer labels May 12, 2026
@gamesh411 gamesh411 requested a review from cor3ntin May 12, 2026 13:21
@llvmorg-github-actions

llvmorg-github-actions Bot commented May 12, 2026

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Endre Fülöp (gamesh411)

Changes

There have been recent improvements (#186802) and fixes (#191061) related to this checker. The reports are no longer noisy, as evaluated on 14 OS projects.


Full diff: https://github.com/llvm/llvm-project/pull/196292.diff

11 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+6)
  • (modified) clang/docs/analyzer/checkers.rst (+34-33)
  • (modified) clang/include/clang/StaticAnalyzer/Checkers/Checkers.td (+5-5)
  • (modified) clang/test/Analysis/analyzer-enabled-checkers.c (+1)
  • (modified) clang/test/Analysis/bstring.c (+4-4)
  • (modified) clang/test/Analysis/bstring.cpp (+2-1)
  • (modified) clang/test/Analysis/bstring_UninitRead.c (+1-1)
  • (modified) clang/test/Analysis/cstring-uninitread-notes.c (+2-2)
  • (modified) clang/test/Analysis/infeasible-crash.c (+1-1)
  • (modified) clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c (+1)
  • (modified) clang/test/Analysis/wstring.c (+2-2)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index cb19b80b7e994..b4fe7f2ace1c1 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -764,6 +764,12 @@ Crash and bug fixes
     - Improvements
     - Moved checkers
 
+
+Moved checkers
+^^^^^^^^^^^^^^
+
+- The checker ``unix.cstring.UninitializedRead`` is now out of alpha.
+
 .. _release-notes-sanitizers:
 
 Sanitizers
diff --git a/clang/docs/analyzer/checkers.rst b/clang/docs/analyzer/checkers.rst
index 61f591916018e..a7b1e1c882e17 100644
--- a/clang/docs/analyzer/checkers.rst
+++ b/clang/docs/analyzer/checkers.rst
@@ -2353,6 +2353,40 @@ Check for null pointers being passed as arguments to C string functions:
    return strlen(0); // warn
  }
 
+.. _unix-cstring-UninitializedRead:
+
+unix.cstring.UninitializedRead (C)
+""""""""""""""""""""""""""""""""""""""""
+Check for uninitialized reads from common memory copy/manipulation functions such as:
+ ``memcpy, mempcpy, memmove, memcmp, strcmp, strncmp, strcpy, strlen, strsep`` and many more.
+
+.. code-block:: c
+
+ void test() {
+  char src[10];
+  char dst[5];
+  memcpy(dst,src,sizeof(dst)); // warn: Bytes string function accesses uninitialized/garbage values
+ }
+
+Limitations:
+
+   - Due to limitations of the memory modeling in the analyzer, one can likely
+     observe some false-positives of the following kind:
+
+      .. code-block:: c
+
+        void false_positive() {
+          int src[] = {1, 2, 3, 4};
+          int dst[5] = {0};
+          memcpy(dst, src, 4 * sizeof(int)); // false-positive:
+          // The 'src' buffer was correctly initialized, yet we cannot conclude
+          // that since the analyzer could not see a direct initialization of the
+          // very last byte of the source buffer.
+        }
+
+     More details at the corresponding `GitHub issue <https://github.com/llvm/llvm-project/issues/43459>`_.
+
+
 .. _unix-StdCLibraryFunctions:
 
 unix.StdCLibraryFunctions (C)
@@ -3701,39 +3735,6 @@ the analyzer cannot detect embedded NULL characters when determining the string
    memcpy(buffer, str, sizeof(str)); // warn
  }
 
-.. _alpha-unix-cstring-UninitializedRead:
-
-alpha.unix.cstring.UninitializedRead (C)
-""""""""""""""""""""""""""""""""""""""""
-Check for uninitialized reads from common memory copy/manipulation functions such as:
- ``memcpy, mempcpy, memmove, memcmp, strcmp, strncmp, strcpy, strlen, strsep`` and many more.
-
-.. code-block:: c
-
- void test() {
-  char src[10];
-  char dst[5];
-  memcpy(dst,src,sizeof(dst)); // warn: Bytes string function accesses uninitialized/garbage values
- }
-
-Limitations:
-
-   - Due to limitations of the memory modeling in the analyzer, one can likely
-     observe a lot of false-positive reports like this:
-
-      .. code-block:: c
-
-        void false_positive() {
-          int src[] = {1, 2, 3, 4};
-          int dst[5] = {0};
-          memcpy(dst, src, 4 * sizeof(int)); // false-positive:
-          // The 'src' buffer was correctly initialized, yet we cannot conclude
-          // that since the analyzer could not see a direct initialization of the
-          // very last byte of the source buffer.
-        }
-
-     More details at the corresponding `GitHub issue <https://github.com/llvm/llvm-project/issues/43459>`_.
-
 alpha.WebKit
 ^^^^^^^^^^^^
 
diff --git a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
index 6b9e0b50e1f59..84c152cd72bd1 100644
--- a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
+++ b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
@@ -460,6 +460,11 @@ def CStringSyntaxChecker : Checker<"BadSizeArg">,
   Dependencies<[CStringModeling]>,
   Documentation<HasDocumentation>;
 
+def CStringUninitializedRead : Checker<"UninitializedRead">,
+  HelpText<"Checks if the string manipulation function would read uninitialized bytes">,
+  Dependencies<[CStringModeling]>,
+  Documentation<HasDocumentation>;
+
 } // end "unix.cstring"
 
 let ParentPackage = CStringAlpha in {
@@ -474,11 +479,6 @@ def CStringBufferOverlap : Checker<"BufferOverlap">,
   Dependencies<[CStringModeling]>,
   Documentation<HasDocumentation>;
 
-def CStringUninitializedRead : Checker<"UninitializedRead">,
-  HelpText<"Checks if the string manipulation function would read uninitialized bytes">,
-  Dependencies<[CStringModeling]>,
-  Documentation<HasDocumentation>;
-
 } // end "alpha.unix.cstring"
 
 let ParentPackage = Unix in {
diff --git a/clang/test/Analysis/analyzer-enabled-checkers.c b/clang/test/Analysis/analyzer-enabled-checkers.c
index c1ed882069073..8371b0e7a410a 100644
--- a/clang/test/Analysis/analyzer-enabled-checkers.c
+++ b/clang/test/Analysis/analyzer-enabled-checkers.c
@@ -55,6 +55,7 @@
 // CHECK-NEXT: unix.cstring.BadSizeArg
 // CHECK-NEXT: unix.cstring.NotNullTerminated
 // CHECK-NEXT: unix.cstring.NullArg
+// CHECK-NEXT: unix.cstring.UninitializedRead
 
 int main() {
   int i;
diff --git a/clang/test/Analysis/bstring.c b/clang/test/Analysis/bstring.c
index 810241accffa2..b337c71eb02c7 100644
--- a/clang/test/Analysis/bstring.c
+++ b/clang/test/Analysis/bstring.c
@@ -1,32 +1,32 @@
 // RUN: %clang_analyze_cc1 -verify %s \
 // RUN:   -analyzer-checker=core \
 // RUN:   -analyzer-checker=unix.cstring \
+// RUN:   -analyzer-disable-checker=unix.cstring.UninitializedRead \
 // RUN:   -analyzer-checker=alpha.unix.cstring \
-// RUN:   -analyzer-disable-checker=alpha.unix.cstring.UninitializedRead \
 // RUN:   -analyzer-checker=debug.ExprInspection \
 // RUN:   -analyzer-config eagerly-assume=false
 //
 // RUN: %clang_analyze_cc1 -verify %s -DUSE_BUILTINS \
 // RUN:   -analyzer-checker=core \
 // RUN:   -analyzer-checker=unix.cstring \
+// RUN:   -analyzer-disable-checker=unix.cstring.UninitializedRead \
 // RUN:   -analyzer-checker=alpha.unix.cstring \
-// RUN:   -analyzer-disable-checker=alpha.unix.cstring.UninitializedRead \
 // RUN:   -analyzer-checker=debug.ExprInspection \
 // RUN:   -analyzer-config eagerly-assume=false
 //
 // RUN: %clang_analyze_cc1 -verify %s -DVARIANT \
 // RUN:   -analyzer-checker=core \
 // RUN:   -analyzer-checker=unix.cstring \
+// RUN:   -analyzer-disable-checker=unix.cstring.UninitializedRead \
 // RUN:   -analyzer-checker=alpha.unix.cstring \
-// RUN:   -analyzer-disable-checker=alpha.unix.cstring.UninitializedRead \
 // RUN:   -analyzer-checker=debug.ExprInspection \
 // RUN:   -analyzer-config eagerly-assume=false
 //
 // RUN: %clang_analyze_cc1 -verify %s -DUSE_BUILTINS -DVARIANT \
 // RUN:   -analyzer-checker=core \
 // RUN:   -analyzer-checker=unix.cstring \
+// RUN:   -analyzer-disable-checker=unix.cstring.UninitializedRead \
 // RUN:   -analyzer-checker=alpha.unix.cstring \
-// RUN:   -analyzer-disable-checker=alpha.unix.cstring.UninitializedRead \
 // RUN:   -analyzer-checker=debug.ExprInspection \
 // RUN:   -analyzer-config eagerly-assume=false
 
diff --git a/clang/test/Analysis/bstring.cpp b/clang/test/Analysis/bstring.cpp
index 2f1712648d8e1..a5fd56a19eb1a 100644
--- a/clang/test/Analysis/bstring.cpp
+++ b/clang/test/Analysis/bstring.cpp
@@ -19,12 +19,13 @@
 // RUN: %{analyzer} \
 // RUN:     -analyzer-checker=alpha.unix.cstring.BufferOverlap \
 // RUN:     -analyzer-checker=unix.cstring.NotNullTerminated \
+// RUN:     -analyzer-disable-checker=unix.cstring.UninitializedRead \
 // RUN:     -verify=expected,no-oob %s
 
 // UninitializedRead enabled without OutOfBounds: verifies that
 // UninitializedRead works independently of OutOfBounds.
 // RUN: %{analyzer} \
-// RUN:     -analyzer-checker=alpha.unix.cstring.UninitializedRead \
+// RUN:     -analyzer-checker=unix.cstring.UninitializedRead \
 // RUN:     -verify=expected,no-oob,uninit %s
 
 #include "Inputs/system-header-simulator-cxx.h"
diff --git a/clang/test/Analysis/bstring_UninitRead.c b/clang/test/Analysis/bstring_UninitRead.c
index 45e38dd316298..7557c9641781c 100644
--- a/clang/test/Analysis/bstring_UninitRead.c
+++ b/clang/test/Analysis/bstring_UninitRead.c
@@ -1,5 +1,5 @@
 // RUN: %clang_analyze_cc1 -verify %s \
-// RUN: -analyzer-checker=core,alpha.unix.cstring
+// RUN: -analyzer-checker=core,unix.cstring.UninitializedRead
 
 //===----------------------------------------------------------------------===//
 // mempcpy() using character array. This is the easiest case, as memcpy
diff --git a/clang/test/Analysis/cstring-uninitread-notes.c b/clang/test/Analysis/cstring-uninitread-notes.c
index b62519a85c8cc..6a934078566c3 100644
--- a/clang/test/Analysis/cstring-uninitread-notes.c
+++ b/clang/test/Analysis/cstring-uninitread-notes.c
@@ -1,5 +1,5 @@
 // RUN: %clang_analyze_cc1 -verify %s \
-// RUN:   -analyzer-checker=core,alpha.unix.cstring \
+// RUN:   -analyzer-checker=unix.cstring.UninitializedRead \
 // RUN:   -analyzer-output=text
 
 #include "Inputs/system-header-simulator.h"
@@ -19,7 +19,7 @@ void returning_without_writing_to_memcpy(const char *src, unsigned size) {
   maybeWrite(src, size, block); // expected-note{{Returning from 'maybeWrite'}}
 
   int buf[8 * 8];
-  memcpy(buf, &block[0], 8); // expected-warning{{The first element of the 2nd argument is undefined [alpha.unix.cstring.UninitializedRead]}}
+  memcpy(buf, &block[0], 8); // expected-warning{{The first element of the 2nd argument is undefined [unix.cstring.UninitializedRead]}}
                              // expected-note@-1{{The first element of the 2nd argument is undefined}}
                              // expected-note@-2{{Other elements might also be undefined}}
 }
diff --git a/clang/test/Analysis/infeasible-crash.c b/clang/test/Analysis/infeasible-crash.c
index d4e6a66f85bcf..062d13f6fe63b 100644
--- a/clang/test/Analysis/infeasible-crash.c
+++ b/clang/test/Analysis/infeasible-crash.c
@@ -1,6 +1,6 @@
 // RUN: %clang_analyze_cc1 %s \
 // RUN:   -analyzer-checker=core \
-// RUN:   -analyzer-checker=alpha.unix.cstring.OutOfBounds,alpha.unix.cstring.UninitializedRead \
+// RUN:   -analyzer-checker=alpha.unix.cstring.OutOfBounds,unix.cstring.UninitializedRead \
 // RUN:   -analyzer-config eagerly-assume=false \
 // RUN:   -verify
 
diff --git a/clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c b/clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
index 4de004e00687a..e1f365cdfbcf6 100644
--- a/clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
+++ b/clang/test/Analysis/std-c-library-functions-arg-enabled-checkers.c
@@ -63,6 +63,7 @@
 // CHECK-NEXT: unix.cstring.BadSizeArg
 // CHECK-NEXT: unix.cstring.NotNullTerminated
 // CHECK-NEXT: unix.cstring.NullArg
+// CHECK-NEXT: unix.cstring.UninitializedRead
 
 int main() {
   int i;
diff --git a/clang/test/Analysis/wstring.c b/clang/test/Analysis/wstring.c
index 9c60d39ff502e..340a01c047a7e 100644
--- a/clang/test/Analysis/wstring.c
+++ b/clang/test/Analysis/wstring.c
@@ -2,7 +2,7 @@
 // RUN:   -analyzer-checker=core \
 // RUN:   -analyzer-checker=unix.cstring \
 // RUN:   -analyzer-checker=alpha.unix.cstring \
-// RUN:   -analyzer-disable-checker=alpha.unix.cstring.UninitializedRead \
+// RUN:   -analyzer-disable-checker=unix.cstring.UninitializedRead \
 // RUN:   -analyzer-checker=debug.ExprInspection \
 // RUN:   -analyzer-config eagerly-assume=false  
 //
@@ -10,7 +10,7 @@
 // RUN:   -analyzer-checker=core \
 // RUN:   -analyzer-checker=unix.cstring \
 // RUN:   -analyzer-checker=alpha.unix.cstring \
-// RUN:   -analyzer-disable-checker=alpha.unix.cstring.UninitializedRead \
+// RUN:   -analyzer-disable-checker=unix.cstring.UninitializedRead \
 // RUN:   -analyzer-checker=debug.ExprInspection \
 // RUN:   -analyzer-config eagerly-assume=false
 

@NagyDonat NagyDonat requested a review from steakhal May 12, 2026 14:08
@steakhal steakhal removed the request for review from cor3ntin May 13, 2026 09:16
@steakhal
Contributor

At first glance, it has problems similar to those of ArrayBound. It's frequently difficult to keep track of the buffer and where it points, and it is easy to incidentally trick the engine into believing that the pointer points outside the buffer (I speculate).

I looked at some of the samples, and for example:

  1. sqlite3.c: It's not clear to me whether this is a TP, and it wasn't classified. When I looked at the trace, I didn't find it easy to decide myself. It doesn't seem actionable to me.
  2. ffmpeg/libavcodec/cinepakenc.c: Similar to (1).
  3. vim/src/syntax.c: It's somewhat funny that the report says "'buf_chartab' initialized here" about char_u buf_chartab[32], while in fact it was just declared there without any initialization, and that was the point of the report. I think this will confuse our users.

As a generic note, the error message could and probably should be improved, because right now "The first element of the 2nd argument is undefined" is not too helpful.
The problem is that it could mean two things in the engine:

  • The location it refers to was never initialized (like in example 3).
  • The location might have been initialized, but we formed an out-of-bounds pointer (such as a pointer to the end of a buffer, i.e. one past the last element), and we pass that to something that will dereference it. I find these cases a lot more difficult to decipher in practice.

If you all still believe that these issues should not block this move, I could look at the other examples to form a grounded opinion, but right now I'm a bit concerned.

@NagyDonat
Contributor

NagyDonat commented May 13, 2026

Quick replies to the quick reply (I will try to give a proper review later 😅):

> 3. vim/src/syntax.c: It's somewhat funny in the example that it reports that char_u buf_chartab[32] was "'buf_chartab' initialized here" - while in fact it was just declared there without any initialization and that was the point of the report. I think this will be confusing to our users.

IIRC that message comes from the bug reporter visitors, and we get it automagically via trackExpressionValue(). It definitely deserves a simple patch, and I think it is feasible to fix this without getting mobbed by the skeletons in the closet of the bug reporter visitor code.

> As a generic note, the error message could/should be probably improved, because right now "The first element of the 2nd argument is undefined" is not too helpful. The problem is that it could mean two things in the engine:
>
>   • The location it refers to was never initialized. (like in example 3)
>   • The location might have been initialized, but we formed an out-of-bounds pointer (such as a pointer to the end of a buffer, aka. 1 past last element), and we pass that to something that will dereference it. -- I find these cases a lot more difficult to decipher in practice.

EDIT: I misremembered the internals of the engine, and the stuff below is complete nonsense: the engine doesn't perform repeated bounds checking. The undefined values come from some other bug in the RegionStore.

The problem is that the engine is repeating the job of the security.ArrayBound checker and produces Undefined values when it sees an out-of-bounds access that was somehow not detected by the ArrayBound checker. In practice this discrepancy may occur in two situations:

  1. The ArrayBound checker is disabled by the user. In this case it is reasonable to say that they don't want to see these indirect out of bounds reports either.
  2. Either ArrayBound or the logic in the engine is buggy. As I spent lots of time on improving ArrayBound but AFAIK the engine still uses the logic of ArrayBoundV1, I would rather trust the ArrayBound checker.

To fix this, we could ensure that the engine also invokes the up-to-date logic that is used by the ArrayBound checker, but that still doesn't resolve the "doing the same job twice" situation. Therefore I propose that the engine should return UnknownVal instead of UndefinedVal when it thinks that it is reading out-of-bounds memory. This behavior is only relevant if ArrayBound is not enabled, and in that case I think it is better to not report these indirect reports either (which are IMO significantly harder to understand than the direct out-of-bounds reports).

@steakhal @gamesh411 What do you think about this proposal?

@NagyDonat
Contributor

I looked at a few of the analysis reports together with Endre (= @gamesh411) and we found that:

  • Some previously undiagnosed reports [1] [2] are actually true positives because the code copies a buffer which is partially uninitialized. (It also saves the size of the initialized segment, so later code won't actually read from the uninitialized part; but still, the thing reported by the checker does happen.)
  • We found a small false positive which we were able to reproduce on godbolt. This is a regression that wasn't present in clang 22.1; Endre is working on fixing it right now.

Endre is also working on changing the "was initialized here" message to "was left uninitialized here" in the notes.

We will revisit this PR (and the inspection of the rest of the analysis results) when Endre is done with these two subtasks.

@gamesh411
Contributor Author

I have found the potential cause of the false positive and have a patch here: #198346.
The notes are enhanced in this PR: #198345.

@gamesh411
Contributor Author

> At first glance it has similar problems as the ArrayBound. It's frequently difficult to keep track of the buffer and where it points to, and incidentally trick the engine to believe that it points to outside of the buffer - I speculate.
>
> I looked at some of the samples, and for example:
>
>   1. sqlite3.c: it's not clear to me if this is a TP, and it wasn't classified. When I looked at the trace, I didn't find it easy to decide myself. It doesn't seem to be actionable to me.
>   2. ffmpeg/libavcodec/cinepakenc.c Similar as (1).
>   3. vim/src/syntax.c: It's somewhat funny in the example that it reports that char_u buf_chartab[32] was "'buf_chartab' initialized here" - while in fact it was just declared there without any initialization and that was the point of the report. I think this will be confusing to our users.
>
> As a generic note, the error message could/should be probably improved, because right now "The first element of the 2nd argument is undefined" is not too helpful. The problem is that it could mean two things in the engine:
>
>   • The location it refers to was never initialized. (like in example 3)
>   • The location might have been initialized, but we formed an out-of-bounds pointer (such as a pointer to the end of a buffer, aka. 1 past last element), and we pass that to something that will dereference it. -- I find these cases a lot more difficult to decipher in practice.
>
> If you all still believe that these issues should not block this move, I could look at the other examples to form a grounded opinion, but right now I'm a bit concerned.

The error message seems to deliberately mention only whether the beginning or the end of the read buffer is uninitialized. Tracking a more precise location is possible, but it would need a linear scan over that range of the buffer. I think this is why the decision was made not to do it, even though a scan would pinpoint where the read of uninitialized values starts.
I can trivially fix the note generation, and have done so in #198345.
With the FP fix (#198346) applied, I'll rerun the analysis and report back with the new results.
