[clang][analyzer] Change default value of checker option in unix.StdCLibraryFunctions. #80457

balazske · 2024-02-02T16:14:11Z

Default value of checker option ModelPOSIX is changed to true. Documentation is updated.


llvmbot · 2024-02-02T16:14:38Z

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balázs Kéri (balazske)

Changes

Default value of checker option ModelPOSIX is changed to true. Documentation is updated.

Full diff: https://github.com/llvm/llvm-project/pull/80457.diff

3 Files Affected:

(modified) clang/docs/analyzer/checkers.rst (+15-4)
(modified) clang/include/clang/StaticAnalyzer/Checkers/Checkers.td (+1-1)
(modified) clang/test/Analysis/analyzer-config.c (+1-1)

diff --git a/clang/docs/analyzer/checkers.rst b/clang/docs/analyzer/checkers.rst
index bb637cf1b8007..24522e56501e5 100644
--- a/clang/docs/analyzer/checkers.rst
+++ b/clang/docs/analyzer/checkers.rst
@@ -1299,10 +1299,21 @@ range of the argument.
 
 **Parameters**
 
-The checker models functions (and emits diagnostics) from the C standard by
-default. The ``ModelPOSIX`` option enables modeling (and emit diagnostics) of
-additional functions that are defined in the POSIX standard. This option is
-disabled by default.
+The ``ModelPOSIX`` option controls if functions from the POSIX standard are
+recognized by the checker. If ``true``, a big amount of POSIX functions is
+modeled according to the
+`POSIX standard`_. This
+includes ranges of parameters and possible return values. Furthermore the
+behavior related to ``errno`` in the POSIX case is often that ``errno`` is set
+only if a function call fails, and it becomes undefined after a successful
+function call.
+If ``false``, functions are modeled according to the C99 language standard.
+This includes far less functions than the POSIX case. It is possible that the
+same functions are modeled differently in the two cases because differences in
+the standards. The C standard specifies less aspects of the functions, for
+example exact ``errno`` behavior is often unspecified (and not modeled by the
+checker).
+Default value of the option is ``true``.
 
 .. _osx-checkers:
 
diff --git a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
index e7774e5a9392d..a224b81c33a62 100644
--- a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
+++ b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
@@ -578,7 +578,7 @@ def StdCLibraryFunctionsChecker : Checker<"StdCLibraryFunctions">,
                   "ModelPOSIX",
                   "If set to true, the checker models additional functions "
                   "from the POSIX standard.",
-                  "false",
+                  "true",
                   InAlpha>
   ]>,
   WeakDependencies<[CallAndMessageChecker, NonNullParamChecker]>,
diff --git a/clang/test/Analysis/analyzer-config.c b/clang/test/Analysis/analyzer-config.c
index 373017f4b18bf..2167a2b32f596 100644
--- a/clang/test/Analysis/analyzer-config.c
+++ b/clang/test/Analysis/analyzer-config.c
@@ -129,7 +129,7 @@
 // CHECK-NEXT: unix.DynamicMemoryModeling:Optimistic = false
 // CHECK-NEXT: unix.Errno:AllowErrnoReadOutsideConditionExpressions = true
 // CHECK-NEXT: unix.StdCLibraryFunctions:DisplayLoadedSummaries = false
-// CHECK-NEXT: unix.StdCLibraryFunctions:ModelPOSIX = false
+// CHECK-NEXT: unix.StdCLibraryFunctions:ModelPOSIX = true
 // CHECK-NEXT: unroll-loops = false
 // CHECK-NEXT: verbose-report-filename = false
 // CHECK-NEXT: widen-loops = false

steakhal · 2024-02-05T17:16:48Z

I'm excited to see this change.
I've not reviewed this yet.

balazske · 2024-02-09T12:07:57Z

The change was evaluated on the following projects. "Lost reports" lists the results that disappear when the ModelPOSIX option is changed to true, and "New reports" lists the results that appear. Many of the new results come from the large number of newly modeled functions. The lost reports are more interesting (some of them are in postgres); the analysis probably changes because the preconditions of the modeled functions are applied when the option is turned on.

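| Project | Lost Reports | New Reports |
|---|---|---|
| memcached | link | link |
| tmux | link | link |
| curl | link | link |
| twin | link | link |
| vim | link | link |
| openssl | link | link |
| sqlite | link | link |
| ffmpeg | link | link |
| postgres | link | link |
| xerces | link | link |
| bitcoin | link | link |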

NagyDonat · 2024-02-12T16:27:03Z

I analyzed the results uploaded by @balazske and found the following:

memcached

With the new ModelPOSIX=true, there are two new bug reports: (1) one assuming that fileno() can fail and (2) one about errno being undefined after close(). These are arguably true positives, although it's unclear whether fileno() can fail or not (e.g. the manpage on my Linux system claims both that it should not fail and that it can fail: "These functions should not fail and do not set the external variable errno. (However, in case fileno() detects that its argument is not a valid stream, it must return -1 and set errno to EBADF.)").

tmux

With ModelPOSIX=true there is yet another "errno is undefined after close()" report and a case where the checker assumes that opening "/dev/null" can fail. The first is a TP; the second is an FP in practice but is a reasonable report.
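For illustration, a sketch of code that handles the "/dev/null" case so that this kind of report does not apply. It assumes a POSIX system, and `redirect_to_devnull` is a hypothetical helper rather than the code tmux uses:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: points target_fd at /dev/null.
   Returns 0 on success and -1 on failure. Both open() and dup2() are
   checked, because with ModelPOSIX=true the checker assumes either
   call can fail. */
int redirect_to_devnull(int target_fd) {
  int fd = open("/dev/null", O_RDWR);
  if (fd == -1)
    return -1;
  if (fd != target_fd) {
    if (dup2(fd, target_fd) == -1) {
      close(fd); /* do not leak the temporary descriptor */
      return -1;
    }
    close(fd); /* target_fd now refers to /dev/null */
  }
  return 0;
}
```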

curl

There are 9 new reports with ModelPOSIX=true:

one very confusing report on an extraordinarily ugly macro -- probably FP, but the author "asked for it" with this mess,
two BitwiseShift reports on ugly black magic that breaks if we assume that fileno() returns -1,
one that looks like a straightforward TP caught among confusing code branches,
a 71-step monster that's also probably TP, but hard to understand,
two straightforward "failure of open() is not checked" reports (1) (2) in test code; these seem to be TPs,
an isatty(fileno()) issue,
a fstat(fileno(), ...) issue.

twin

Two new reports with ModelPOSIX=true: one tricky mmap issue that appears to be a TP if we consider the function in isolation and assume that its len argument can be 0, and yet another report where the checker assumes that opening "/dev/null" can fail (FP in practice).
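A sketch of guarding the length before calling mmap(), assuming a POSIX system where a zero len makes mmap() fail with EINVAL; `map_readonly` is a hypothetical helper, not code from twin:

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: maps the first len bytes of fd read-only and
   returns NULL on failure. A zero length is rejected before the call,
   because POSIX specifies that mmap() fails with EINVAL when len is 0;
   this is the precondition behind "second argument of mmap is 0". */
void *map_readonly(int fd, size_t len) {
  if (len == 0)
    return NULL;
  void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
  return p == MAP_FAILED ? NULL : p;
}
```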

vim

7 new reports with ModelPOSIX=true:

three fstat(fileno(), ..) issues (1), (2), (3),
two fchown(fileno(), ...) issues (1), (2),
one report that's impossible to understand because the relevant things happen in a function that was pruned (but probably FP),
an "errno becomes undefined after successful call" TP.
Note that vim is paranoid enough to handle the case when opening "/dev/null" fails.

openssl

3 new reports with ModelPOSIX=true:

two fstat(fileno(), ...) issues: (1) and (2),
one issue where the failure of fdopen() is not handled.

sqlite

One new report with ModelPOSIX=true, where the checker assumes that ftell() returns -1 and this leads to a malloc(0) call.
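A sketch of checking the ftell() result before it is used as a size. `stream_size` is a hypothetical helper, not the sqlite code behind the report, and only standard C stdio calls are used:

```c
#include <stdio.h>

/* Hypothetical helper: returns the size in bytes of a seekable stream and
   moves it back to the start, or returns -1 on error. ftell() returns -1L
   on failure, so its result is checked before the caller can use it as
   an allocation size. */
long stream_size(FILE *f) {
  if (fseek(f, 0L, SEEK_END) != 0)
    return -1L;
  long size = ftell(f);
  if (size == -1L)
    return -1L;
  if (fseek(f, 0L, SEEK_SET) != 0)
    return -1L;
  return size;
}
```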

ffmpeg

The old ModelPOSIX=false produced one FP that disappeared for unknown reasons. This seems to be an "honest mistake" of the analyzer (it doesn't know that ff_neterrno() cannot return 0 = success); I don't know how ModelPOSIX affected it.
On the other hand, the new ModelPOSIX=true produces a "second argument of mmap is 0" error that is almost surely a false positive. The root cause is probably the rough / incorrect modeling of regions and subregions.

postgres

Two lost reports (that no longer appear with ModelPOSIX=true) and 33 (!!) new reports:

a straightforward leak of a string returned by strdup() is lost and I don't know why. Perhaps turn this into a unit test to examine what happens?
a low-quality FP is also lost -- here the FP originates from the usual problems with loop handling and an ugly macro; I don't know why it disappeared but I won't miss it.
among the new results, half of them are isatty(fileno()) reports: (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (12), (13), (14), (15), (16),
there are also two dup(fileno()) failures (1), (2),
and four new dup2(..., fileno()) failures: (1), (2), (3), (4),
and five fstat(fileno(), ...) issues: (1), (2), (3), (4) and (5);
in addition to all these unhandled fileno() failures, we also have two new fdopen(dup(), ...) issues: (1), (2),
one new "errno is not checked after rewind()" TP,
one new "second argument of mmap is 0" FP that appears because the analyzer assumed a bad constraint in a loop (the usual "if there is a loop, handle zero iterations as a separate branch" bug),
one issue where it's unclear if an error reporting function is noreturn or not (if it's noreturn, this is a FP, I'd guess that the analyzer can't determine this without CTU),
one TP where NULL is used as a filename string,
one FP where I think that the engine mishandles a cast and assumes that (send(tmpsock, (char *) &crp, sizeof(crp), 0) != (int) sizeof(crp)) can be true even if send succeeds and returns the size of crp (which is a struct variable).
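Since rewind() has no return value, the POSIX application usage notes for rewind() suggest clearing errno, calling rewind(), and treating a non-zero errno as an error. A minimal sketch of that pattern; `rewind_checked` is a hypothetical helper, not the postgres code:

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: rewinds the stream and returns 0 on success or -1
   if rewind() reported an error through errno. errno is cleared first,
   because rewind() itself returns nothing that could signal failure. */
int rewind_checked(FILE *f) {
  errno = 0;
  rewind(f);
  return errno == 0 ? 0 : -1;
}
```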

xerces

ModelPOSIX=true introduces two new reports: one unhandled failure of ftell() (with a surprising but essentially correct error message) and an fdopen(dup()) report.
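The fdopen(dup()) reports here and in postgres come from nesting two calls that can both fail. A sketch of the checked form, assuming a POSIX system; `reopen_dup` is a hypothetical helper:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: opens a second stdio stream on a duplicate of fd.
   dup() and fdopen() are checked separately, and the duplicated
   descriptor is closed if fdopen() fails so that it is not leaked. */
FILE *reopen_dup(int fd, const char *mode) {
  int copy = dup(fd);
  if (copy == -1)
    return NULL;
  FILE *stream = fdopen(copy, mode);
  if (stream == NULL)
    close(copy);
  return stream;
}
```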

bitcoin

We have three new reports: a good old isatty(fileno()) issue, a false positive where it seems that the analyzer wasn't able to handle an opaque "Status" type, and an fdatasync(fileno()) report.

Conclusion

Apparently there are many projects that use fileno() without handling its failure, so reporting each of these calls is a bit too noisy. I'm not familiar with the relevant parts of the posix standard, but purely reasoning from the observed usage I'd say that we should hide this "strict" fileno-may-fail modeling behind an off-by-default flag (or eliminate it completely).

Apart from this question, the change seems to be reasonable and there are several situations where it produces valuable reports.
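To show why the strict fileno-may-fail modeling is noisy: satisfying it means checking every fileno() result before passing it on, as in this sketch. It assumes a POSIX system, and `stream_is_tty` is a hypothetical helper, not code from the evaluated projects:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the stream refers to a terminal and
   0 otherwise. The fileno() result is checked before isatty() is called,
   which is the pattern the strict fileno-may-fail modeling demands. */
int stream_is_tty(FILE *stream) {
  int fd = fileno(stream);
  if (fd == -1)
    return 0; /* not a valid stream; fileno() sets errno (e.g. EBADF) */
  return isatty(fd);
}
```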

balazske · 2024-02-13T08:10:36Z

The newly appearing bug reports should be similar to the ones observed when StdCLibraryFunctionsChecker was moved out of alpha (those tests had the option turned on, so these reports were probably already reviewed once).
A different solution could be to add a Linux mode to the checker (changing the ModelPOSIX option to an enumeration such as "C", "POSIX", "Linux"). The POSIX standard does not state explicitly that fileno fails only if the file descriptor is invalid. The man pages are probably more detailed about the error cases of other functions too, so the available information increases from C to POSIX to Linux. It may be possible to detect Linux-specific source code automatically by checking some macros.

balazske · 2024-02-13T15:35:24Z

> a straightforward leak of a string returned by strdup() is lost and I don't know why. Perhaps turn this into a unit test to examine what happens?

This may happen because the "controlled environment" analyzer option may be set to true (I did not check it). Without ModelPOSIX the getenv call may or may not fail (it is not modeled), but with ModelPOSIX it is modeled by the checker and assumed not to fail (the environment variable always exists). In this case the branch with strdup is not executed at all.
Additionally, this may not be a true positive: the string is passed to putenv and probably should not be freed by the program.
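Background for the putenv point: POSIX says the string passed to putenv() itself becomes part of the environment, so a strdup() copy handed to it must stay allocated. A minimal sketch, assuming an XSI-conforming system; `add_env_entry` is a hypothetical helper, not the postgres code:

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: adds a "NAME=value" entry to the environment.
   After a successful putenv() the copy is referenced by the environment,
   so freeing it would leave a dangling pointer there. */
int add_env_entry(const char *entry) {
  char *copy = strdup(entry);
  if (copy == NULL)
    return -1;
  if (putenv(copy) != 0) {
    free(copy); /* not added to the environment, so still ours to free */
    return -1;
  }
  return 0; /* copy is now owned by the environment: do not free it */
}
```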

balazske · 2024-02-13T15:49:55Z

Because of the many cases with fileno, I'm fine with changing the summary so that we assume it never fails. Probably another checker could find the cases where the passed file handle is invalid because it was not initialized or the file was already closed (StreamChecker should find this).

NagyDonat · 2024-02-13T16:01:58Z

> a straightforward leak of a string returned by strdup() is lost and I don't know why. Perhaps turn this into a unit test to examine what happens?

> This may happen because the "controlled environment" analyzer option may be set to true (I did not check it). Without ModelPOSIX the getenv call may or may not fail (it is not modeled), but with ModelPOSIX it is modeled by the checker and assumed not to fail (the environment variable always exists). In this case the branch with strdup is not executed at all. Additionally, this may not be a true positive: the string is passed to putenv and probably should not be freed by the program.

You're right that the string passed to putenv should not be freed, so this was a false positive. Let's just ignore the disappearance of this report; investigating it would provide negligible benefits but could be difficult.

> Because of the many cases with fileno, I'm fine with changing the summary so that we assume it never fails.

Thanks, that would be a good way forward. Ping me if you have a commit for changing the summary, I'll review it quickly.

> Probably another checker could find the cases where the passed file handle is invalid because it was not initialized or the file was already closed (StreamChecker should find this).

Good idea, that would be very nice as a separate longer-term solution :)

balazske · 2024-02-28T17:02:08Z

The behavior of fileno has already been changed in #81842.
Regarding the

> separate long-term solution

in your last comment: I think this is already existing functionality (in StreamChecker and other invalid pointer checkers).
Should we run the evaluation again (only the modeling of fileno was changed), or is this change acceptable now?

NagyDonat

LGTM. I think it isn't necessary to re-evaluate the change, because it's clear that the fileno issue is handled and the other reports are good.

I have one very minor suggestion to slightly improve the documentation, but the change is also acceptable without that.

NagyDonat · 2024-02-29T12:35:00Z

clang/docs/analyzer/checkers.rst

+The ``ModelPOSIX`` option controls if functions from the POSIX standard are
+recognized by the checker. If ``true``, a big amount of POSIX functions is
+modeled according to the
+`POSIX standard`_. This
+includes ranges of parameters and possible return values. Furthermore the
+behavior related to ``errno`` in the POSIX case is often that ``errno`` is set
+only if a function call fails, and it becomes undefined after a successful
+function call.
+If ``false``, functions are modeled according to the C99 language standard.
+This includes far less functions than the POSIX case. It is possible that the
+same functions are modeled differently in the two cases because differences in
+the standards. The C standard specifies less aspects of the functions, for
+example exact ``errno`` behavior is often unspecified (and not modeled by the
+checker).
+Default value of the option is ``true``.


Suggested change

The ``ModelPOSIX`` option controls if functions from the POSIX standard are
recognized by the checker.

With ``ModelPOSIX=true``, lots of POSIX functions are modeled according to the
`POSIX standard`_. This includes ranges of parameters and possible return
values. Furthermore the behavior related to ``errno`` in the POSIX case is
often that ``errno`` is set only if a function call fails, and it becomes
undefined after a successful function call.

With ``ModelPOSIX=false``, this checker follows the C99 language standard and
only models the functions that are described there. It is possible that the
same functions are modeled differently in the two cases because differences in
the standards. The C standard specifies less aspects of the functions, for
example exact ``errno`` behavior is often unspecified (and not modeled by the
checker).

Default value of the option is ``true``.


…unctions (second try). (#80457) Default value of checker option `ModelPOSIX` is changed to `true`. Documentation is updated. This is a re-apply of commit 7af4e8b that was reverted because of a test failure (this is fixed now).

{clang][analyzer] Change default value of checker option in unix.StdC…

1f65abd

…LibraryFunctions. Default value of checker option `ModelPOSIX` is changed to `true`. Documentation is updated.

llvmbot added the clang and clang:static analyzer labels Feb 2, 2024

balazske requested review from steakhal and NagyDonat February 5, 2024 17:07

Merge branch 'main' into stdclibraryfunctions_modelposix

07263b7

NagyDonat approved these changes Feb 29, 2024


changed the documentation

136b4c6

balazske merged commit 7af4e8b into llvm:main Mar 4, 2024
5 checks passed

balazske added a commit that referenced this pull request Mar 4, 2024

Revert "[clang][analyzer] Change default value of checker option in u…

da5966e

…nix.StdCLibraryFunctions. (#80457)" This reverts commit 7af4e8b.
