Clang's static analyzer should be able to warn when memset()/memmove()/memcpy() are optimized away by dead store elimination #61098
Comments
@llvm/issue-subscribers-clang-static-analyzer |
Yes that's a very reasonable check to have (MSC06-C. Beware of compiler optimizations). It sounds like after many years, this is still a real issue. It requires an analysis technique that the static analyzer isn't exactly built for, as it requires reasoning about all possible execution paths in the program (proving that the write is dead on all paths). Because the engine doesn't have any completeness guarantees, the checker can't rely on the engine to produce such a proof, so it's largely on its own. The only checker we have of this kind is the "dead stores" checker, and it's pretty obscure and fairly hard to duplicate. So I think this is a good candidate for the new FlowSensitive engine that's being brought up recently in clang, specifically to deal with these issues (cc @ymand). Also somewhat relevant to the ClangIR effort, as it's a nice borderline example where bug-finding meets optimizations (cc @bcardosolopes). Side note, it's probably a bad idea to have the static analyzer checker talk to the LLVM optimizer. The LLVM optimizer is highly unpredictable and you probably don't want analysis output to depend on -O levels; that's a known anti-pattern in compiler development, you'd rather know about a potential problem regardless of the current optimization level.
This check seems to actively recommend
They both cannot be optimized out, and also provide additional bounds safety through the extra parameter, but are otherwise more or less identical. I'm really curious what you think about that alternative, like is it applicable in your case, or does it still look like an overkill in most situations? |
In the case of OpenZFS, it is written using C99, which does not have I consider advice to unconditionally replace I would prefer to see warnings from uses that are almost certainly unsafe, such as when a program zeros stack memory right before a function returns, which will cause dead store elimination to remove |
Aha, yeah, then I guess it was the right call to make it an off-by-default check. If people actually have them, they can enable it.
I suspect that this is somewhat irrelevant to the problem at hand. Pretty much any memset() invocation can be optimized in weird ways, but we don't intend to warn on every memset() invocation because of that. The problem we've identified is that the developer uses this memset() with the intention to securely erase the buffer. If we can somehow prove the developer's intention, at least in some cases, then it'd be sufficient for us to emit a valuable warning. And one way to prove the developer's intention is to prove that the memset() call literally cannot serve any other purpose. In this case it either serves no purpose at all (dead code, a defect on its own), or it serves the security hardening purpose (again, a defect on its own, because the call can be optimized away). So even though we didn't find out which one it is, we know for a fact that the code doesn't make sense. That's a fairly common approach to problems in static analysis: we don't try to predict what the correct fix is, we simply try to point out that no matter how you look at it, the code under analysis doesn't make sense. So I think this could be quite a reasonable source-based or ClangIR-based flow-sensitive check this way.
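To make the pattern concrete, here is a minimal C99 sketch of the kind of code such a check could flag. Everything in it is illustrative and not taken from OpenZFS or clang: `checksum_with_secret` is a made-up function whose final memset() writes to a stack buffer that is never read again, and `secure_zero` is a hypothetical helper that writes through a volatile pointer, a commonly used workaround that compilers generally honor but that should not be read as a guarantee.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration): zero a buffer
 * through a volatile pointer so the individual stores are less likely
 * to be treated as dead.
 */
static void secure_zero(void *buf, size_t len)
{
    volatile unsigned char *p = buf;

    while (len--)
        *p++ = 0;
}

/* Hypothetical function showing the pattern under discussion. */
static unsigned char checksum_with_secret(const unsigned char *in, size_t len)
{
    unsigned char secret[32];
    unsigned char sum = 0;
    size_t i;

    /* Stand-in for real key material. */
    for (i = 0; i < sizeof(secret); i++)
        secret[i] = (unsigned char)(i * 7 + 1);

    for (i = 0; i < len; i++)
        sum ^= in[i] ^ secret[i % sizeof(secret)];

    /*
     * `secret` is never read after this call, so an optimizer may treat
     * the memset() as a dead store and remove it. The call cannot serve
     * any purpose other than scrubbing, which is the situation the
     * proposed check would report.
     */
    memset(secret, 0, sizeof(secret));
    return sum;
}
```

Replacing the final memset() with `secure_zero(secret, sizeof(secret))` is one way a C99 codebase might respond to such a warning; whether that is the right fix for a given project is a separate question.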
If it helps, CodeQL already has a check for this. The CodeQL check is open source, so it could probably be skimmed for ideas. CodeQL had been run on OpenZFS PRs, but unfortunately, it failed to catch the issue in OpenZFS because the The vulnerable OpenZFS code probably could be the basis for a test case for a checker. The issue was in both |
Yeah it looks like interprocedural analysis would still be useful for finding some of those bugs. In order to find this specific bug, the checker needs to realize that This sounds quite advanced, so I wouldn't expect us to implement something of this kind quickly. IIUC there are no plans for the FlowSensitive framework to provide generic support for interprocedural analysis. And the static analyzer's generic path sensitive interprocedural analysis can't be used because the problem is not path-sensitive. It's not that hard to build interprocedural flow-sensitive analysis specifically for this problem, but we're not sure whether this is the way we want to go (do we really want to build a custom solution for every checker? so, have a hundred competing solutions to a very non-trivial problem?) so it's a tough call. But in any case, covering the basic case of plain |
In the short term, it might be wise to make Upon deeper inspection, the
Unless you are using the output of https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Format/SnprintfOverflow.ql That is not to mention that you can do unsafe loops with the "safe" I should note that it is possible to figure out the maximum length of the string that would be written by a loop that feeds the output of The complaint about
This is provably safe since the maximum possible length string printed is always written to a buffer that is more than adequate to store it. Since the buffer is stack allocated, the static analyzer should be able to determine that this usage is safe. I did an audit of the OpenZFS codebase not that long ago and switched all unsafe uses of That said, if you are printing a floating point or double value, the worst case size can be over 300 characters in length, which few developers expect. Coincidentally, CodeQL also has a check for this: https://github.com/github/codeql/blob/main/cpp/ql/src/Security/CWE/CWE-120/OverrunWriteFloat.ql Calculation of the worst case buffer usage by a clang static analyzer check should take that into account. These remarks probably also apply to the v variations of those functions. In specific, |
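The floating point remark can be checked with a small C99 sketch. `formatted_length` is a hypothetical helper introduced only for this illustration; it relies on the standard behavior that snprintf() with a NULL buffer and a size of 0 writes nothing and returns the number of characters the output would need.

```c
#include <float.h>
#include <stdio.h>

/*
 * Hypothetical helper: number of characters "%f" needs for `value`,
 * excluding the terminating NUL. Passing NULL and 0 makes snprintf()
 * compute the length without writing anything (C99).
 */
static int formatted_length(double value)
{
    return snprintf(NULL, 0, "%f", value);
}
```

With an IEEE 754 double, `formatted_length(-DBL_MAX)` comes out above 300, while `formatted_length(1.5)` is 8. A check that computes worst-case buffer usage would need to assume the former unless it can bound the value.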
Hello, I do a lot of very quick looks at a wide variety of programs and have a few thoughts to add; if they're not very useful or applicable then I'm sorry for the trouble.
Some of the patterns that I think would be far more useful to report:
These patterns are potentially-unsafe uses of reasonable functions. Actually analyzing them takes time and effort, and I can't expect the compiler to solve the halting problem in an effort to tell the difference. But, some help here would be appreciated. Thanks |
On Fri, Mar 03, 2023 at 01:55:43PM -0800, Richard Yao wrote:
`p = malloc(a * sizeof (*p))` should be excluded from such a check. I
But if `a` is large enough (which obviously depends upon context) then
this multiplication can overflow and allocate vastly less memory than
intended. The usual next step is to write `a` objects into the array
and stomp all over unrelated memory.
`p = calloc(a, sizeof(*p));` would be the usual solution for this case.
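The overflow described above can be guarded against explicitly. The following is a minimal sketch; `alloc_array_checked` is a hypothetical helper made up for this example, not an existing API.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate room for `count` objects of `size`
 * bytes each, returning NULL instead of letting the product wrap.
 */
static void *alloc_array_checked(size_t count, size_t size)
{
    /* If count > SIZE_MAX / size, then count * size would wrap around. */
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;
    return malloc(count * size);
}
```

A call such as `p = alloc_array_checked(a, sizeof(*p));` fails cleanly for an oversized `a` instead of returning a buffer that is far smaller than intended. Unlike calloc(), this sketch does not zero the memory.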
Good point, although I would prefer to see a static analyzer check that explicitly checks for the possibility of this issue rather than one that blindly reports every instance of a multiplication. CodeQL has an explicit check for that: https://github.com/github/codeql/blob/main/cpp/ql/src/Likely%20Bugs/Arithmetic/IntMultToLong.ql That said, I understand that the Linux kernel has been slowly moving to using helper functions (i.e. |
Very good points raised on this thread! On the clangIR bits mentioned by @haoNoQ
Definitely something that can be done as a ClangIR pass, though the hard part is coming up with the proper heuristics / good user experience. Since ClangIR is decoupled from LLVM IR level optimization, we wouldn't be able to write such warnings/remarks in the current scenario, but probably in one where we'd have written the same optimization on top of clangir, which will likely happen at some point. |
memset() is often used for data sanitization in encryption code. In a project that I regularly scan with Clang's static analyzer, we recently found some memset() operations meant to protect against information leaks in encryption code were optimized away by dead store elimination.

The optional security.insecureAPI.DeprecatedOrUnsafeBufferHandling (C) check will warn about any usage of memset()/memmove()/memcpy(), but this seems extreme since there are no safe alternatives to memmove()/memcpy() and for initialization, memset() is perfectly safe to use. It would be more useful if a check were made that warns whenever dead store optimization should eliminate a memset()/memcpy()/memmove() operation. The commercial static analyzer, PVS Studio, already does this, but it is prohibitively expensive for open source projects.

https://pvs-studio.com/en/docs/warnings/v570/