Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang][analyzer] Change default value of checker option in unix.StdCLibraryFunctions. #80457

Merged
merged 3 commits into from
Mar 4, 2024

Conversation

balazske
Copy link
Collaborator

@balazske balazske commented Feb 2, 2024

Default value of checker option ModelPOSIX is changed to true. Documentation is updated.

…LibraryFunctions.

Default value of checker option `ModelPOSIX` is changed to `true`.
Documentation is updated.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Feb 2, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 2, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balázs Kéri (balazske)

Changes

Default value of checker option ModelPOSIX is changed to true. Documentation is updated.


Full diff: https://github.com/llvm/llvm-project/pull/80457.diff

3 Files Affected:

  • (modified) clang/docs/analyzer/checkers.rst (+15-4)
  • (modified) clang/include/clang/StaticAnalyzer/Checkers/Checkers.td (+1-1)
  • (modified) clang/test/Analysis/analyzer-config.c (+1-1)
diff --git a/clang/docs/analyzer/checkers.rst b/clang/docs/analyzer/checkers.rst
index bb637cf1b8007..24522e56501e5 100644
--- a/clang/docs/analyzer/checkers.rst
+++ b/clang/docs/analyzer/checkers.rst
@@ -1299,10 +1299,21 @@ range of the argument.
 
 **Parameters**
 
-The checker models functions (and emits diagnostics) from the C standard by
-default. The ``ModelPOSIX`` option enables modeling (and emit diagnostics) of
-additional functions that are defined in the POSIX standard. This option is
-disabled by default.
+The ``ModelPOSIX`` option controls if functions from the POSIX standard are
+recognized by the checker. If ``true``, a big amount of POSIX functions is
+modeled according to the
+`POSIX standard`_. This
+includes ranges of parameters and possible return values. Furthermore the
+behavior related to ``errno`` in the POSIX case is often that ``errno`` is set
+only if a function call fails, and it becomes undefined after a successful
+function call.
+If ``false``, functions are modeled according to the C99 language standard.
+This includes far less functions than the POSIX case. It is possible that the
+same functions are modeled differently in the two cases because differences in
+the standards. The C standard specifies less aspects of the functions, for
+example exact ``errno`` behavior is often unspecified (and not modeled by the
+checker).
+Default value of the option is ``true``.
 
 .. _osx-checkers:
 
diff --git a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
index e7774e5a9392d..a224b81c33a62 100644
--- a/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
+++ b/clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
@@ -578,7 +578,7 @@ def StdCLibraryFunctionsChecker : Checker<"StdCLibraryFunctions">,
                   "ModelPOSIX",
                   "If set to true, the checker models additional functions "
                   "from the POSIX standard.",
-                  "false",
+                  "true",
                   InAlpha>
   ]>,
   WeakDependencies<[CallAndMessageChecker, NonNullParamChecker]>,
diff --git a/clang/test/Analysis/analyzer-config.c b/clang/test/Analysis/analyzer-config.c
index 373017f4b18bf..2167a2b32f596 100644
--- a/clang/test/Analysis/analyzer-config.c
+++ b/clang/test/Analysis/analyzer-config.c
@@ -129,7 +129,7 @@
 // CHECK-NEXT: unix.DynamicMemoryModeling:Optimistic = false
 // CHECK-NEXT: unix.Errno:AllowErrnoReadOutsideConditionExpressions = true
 // CHECK-NEXT: unix.StdCLibraryFunctions:DisplayLoadedSummaries = false
-// CHECK-NEXT: unix.StdCLibraryFunctions:ModelPOSIX = false
+// CHECK-NEXT: unix.StdCLibraryFunctions:ModelPOSIX = true
 // CHECK-NEXT: unroll-loops = false
 // CHECK-NEXT: verbose-report-filename = false
 // CHECK-NEXT: widen-loops = false

@steakhal
Copy link
Contributor

steakhal commented Feb 5, 2024

I'm excited to see this change.
I've not reviewed this yet.

@balazske
Copy link
Collaborator Author

balazske commented Feb 9, 2024

The change was evaluated on the following projects. "Lost reports" shows results that disappear if the ModelPOSIX option is changed to true. "New reports" shows the new results. Many of the new results come from the large number of modeled functions. The lost reports are more interesting (some are at project postgres), probably the analysis changes because preconditions of functions are applied (if the option is turned on).

Project Lost Reports New Reports
memcached link link
tmux link link
curl link link
twin link link
vim link link
openssl link link
sqlite link link
ffmpeg link link
postgres link link
xerces link link
bitcoin link link

@NagyDonat
Copy link
Contributor

NagyDonat commented Feb 12, 2024

I analyzed the results uploaded by @balazske and found the following:

memcached

The new ModelPosix=true produces two new bug reports (1) assuming that fileno() can fail and (2) errno is undefined after close(). These are arguably true positives, although it's unclear whether fileno() can fail or not (e.g. the manpage on my linux claims both that it should not fail and that it can fail: "These functions should not fail and do not set the external variable errno. (However, in case fileno() detects that its argument is not a valid stream, it must return -1 and set errno to EBADF.)").

tmux

The new ModelPosix=true produces yet another errno undefined after close() and a case where the checker assumes that opening "/dev/null" can fail. The first is a TP, the second is FP in practice but is a reasonable report.

curl

There are 9 new reports with ModelPosix=true:

twin

Two new reports with ModelPosix=true, one tricky mmap issue that appears to be TP if we consider the function in isolation and assume that its len argument can be 0 and yet another checker assumes that opening "/dev/null" can fail report (FP in practice).

vim

7 new reports with ModelPosix=true:

  • three fstat(fileno(), ..) issues (1), (2), (3),
  • two fchown(fileno(), ...) issues (1), (2),
  • one report that's impossible to understand because the relevant things happen in a function that was pruned (but probably FP),
  • an "errno becomes undefined after successful call" TP.
    Note that vim is paranoid enough to handle the case when opening "/dev/null" fails.

openssl

3 new reports with ModelPosix=true:

sqlite

One new report with ModelPosix=true where the checker assumes that ftell() returns -1 and this leads to a malloc(0) call.

ffmpeg

  • the old ModelPosix=false produced one FP that disappeared for unknown reasons. This seems to be a "honest mistake" of the analyzer (it doesn't know that ff_neterrno() cannot return 0 = success), I don't know how ModelPosix affected it.
  • on the other hand the new ModelPosix=true produces a second argument of mmap is 0 error that is almost surely a false positive. The root cause is probably the rough / incorrect modeling of regions and subregions.

postgres

Two lost reports (that no longer appear with ModelPosix=true) and 33 (!!) new reports:

xerces

ModelPosix=true introduces two new reports: one unhandled failure of ftell (with a surprising but essentially correct error message) and an fdopen(dup()) report.

bitcoin

We have three new reports: a good old isatty(fileno()) issue, a false positive where it seems that the analyzer wasn't able to handle an opaque "Status" type, and a fdatasync(fileno() report.

Conclusion

Apparently there are many projects that use fileno() without handling its failure, so reporting each of these calls is a bit too noisy. I'm not familiar with the relevant parts of the posix standard, but purely reasoning from the observed usage I'd say that we should hide this "strict" fileno-may-fail modeling behind an off-by-default flag (or eliminate it completely).

Apart from this question, the change seems to be reasonable and there are several situations where it produces valuable reports.

@balazske
Copy link
Collaborator Author

The new appeared bug reports should be similar to the ones that were observed when StdCLibraryFunctionsChecker was made non-alpha (and probably were checked already one time) (because the option was turned on in those tests).
A different solution can be to add a Linux-mode for the checker (change option ModelPOSIX to an enumeration like "C", "POSIX", "Linux"). The strict POSIX standard does not tell exactly that fileno fails only if the file descriptor is invalid. Probably for other functions too the man pages are more detailed about error cases, so the information increases from C to POSIX to Linux. It may be possible to automatically detect presence of Linux source code by checking some macros.

@balazske
Copy link
Collaborator Author

balazske commented Feb 13, 2024

This may happen because the "controlled environment" analyzer option may be set to true (but I did not check it). Without ModelPOSIX the getenv call can fail or not (it is not modeled), but with ModelPOSIX it is modeled by the checker and it is assumed that it can not fail (environment variable exists always). In this case the branch with strdup is not executed at all.
Additionally this is maybe not a true positive. The string is passed to putenv and probably should not be freed by the program.

@balazske
Copy link
Collaborator Author

Because the many cases with fileno I can agree to change the summary so we assume that it never fails. Probably an other checker may find a case if the passed file handle is invalid because it was not initialized, or the file was already closed (StreamChecker should find this).

@NagyDonat
Copy link
Contributor

This may happen because the "controlled environment" analyzer option may be set to true (but I did not check it). Without ModelPOSIX the getenv call can fail or not (it is not modeled), but with ModelPOSIX it is modeled by the checker and it is assumed that it can not fail (environment variable exists always). In this case the branch with strdup is not executed at all. Additionally this is maybe not a true positive. The string is passed to putenv and probably should not be freed by the program.

You're right that the string passed to putenv should not be freed, so this was a false positive. Let's just ignore the disappearance of this report, investigating it provides negligible benefits but could be difficult.

Because the many cases with fileno I can agree to change the summary so we assume that it never fails.

Thanks, that would be a good way forward. Ping me if you have a commit for changing the summary, I'll review it quickly.

Probably an other checker may find a case if the passed file handle is invalid because it was not initialized, or the file was already closed (StreamChecker should find this).

Good idea, that would be very nice as a separate longer-term solution :)

@balazske
Copy link
Collaborator Author

Behavior of fileno is already changed in #81842.
I was thinking about that

separate long-term solution

in last comment that it is already existing functionality (in StreamChecker and other invalid pointer checkers).
Should we run again the checks (only modeling of fileno was changed), or is this change acceptable now?

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I think that it isn't necessary to re-evaluate the change, because it's clear that the fileno issue is handled and the other reports are good.

I have one very minor suggestion to slightly improve the documentation, but the change is also acceptable without that.

Comment on lines 1302 to 1316
The ``ModelPOSIX`` option controls if functions from the POSIX standard are
recognized by the checker. If ``true``, a big amount of POSIX functions is
modeled according to the
`POSIX standard`_. This
includes ranges of parameters and possible return values. Furthermore the
behavior related to ``errno`` in the POSIX case is often that ``errno`` is set
only if a function call fails, and it becomes undefined after a successful
function call.
If ``false``, functions are modeled according to the C99 language standard.
This includes far less functions than the POSIX case. It is possible that the
same functions are modeled differently in the two cases because differences in
the standards. The C standard specifies less aspects of the functions, for
example exact ``errno`` behavior is often unspecified (and not modeled by the
checker).
Default value of the option is ``true``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The ``ModelPOSIX`` option controls if functions from the POSIX standard are
recognized by the checker. If ``true``, a big amount of POSIX functions is
modeled according to the
`POSIX standard`_. This
includes ranges of parameters and possible return values. Furthermore the
behavior related to ``errno`` in the POSIX case is often that ``errno`` is set
only if a function call fails, and it becomes undefined after a successful
function call.
If ``false``, functions are modeled according to the C99 language standard.
This includes far less functions than the POSIX case. It is possible that the
same functions are modeled differently in the two cases because differences in
the standards. The C standard specifies less aspects of the functions, for
example exact ``errno`` behavior is often unspecified (and not modeled by the
checker).
Default value of the option is ``true``.
The ``ModelPOSIX`` option controls if functions from the POSIX standard are
recognized by the checker.
With ``ModelPOSIX=true``, lots of POSIX functions are modeled according to the
`POSIX standard`_. This includes ranges of parameters and possible return
values. Furthermore the behavior related to ``errno`` in the POSIX case is
often that ``errno`` is set only if a function call fails, and it becomes
undefined after a successful function call.
With ``ModelPOSIX=false``, this checker follows the C99 language standard and
only models the functions that are described there. It is possible that the
same functions are modeled differently in the two cases because differences in
the standards. The C standard specifies less aspects of the functions, for
example exact ``errno`` behavior is often unspecified (and not modeled by the
checker).
Default value of the option is ``true``.

@balazske balazske merged commit 7af4e8b into llvm:main Mar 4, 2024
5 checks passed
balazske added a commit that referenced this pull request Mar 4, 2024
balazske added a commit that referenced this pull request Mar 4, 2024
…unctions (second try). (#80457)

Default value of checker option `ModelPOSIX` is changed to `true`.
Documentation is updated.

This is a re-apply of commit 7af4e8b
that was reverted because a test failure (this is fixed now).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants