CPP: Better handling of %s/%c/%S/%C in Printf/FormattingFunction.qll#1119
jbj merged 11 commits into github:master
jbj
left a comment
Impressive work! I have just some minor comments.
...t/query-tests/Likely Bugs/Format/WrongTypeFormatArguments/Linux_mixed_byte_wprintf/tests.cpp
    (
      c.getAnArgument() = "--microsoft" or
      c.getAnArgument().matches("%\\\\cl.exe")
    )
Matching on \cl.exe seems fragile to me; for example, what if it's written in upper case or with a forward slash? @ian-semmle, is this the best we can do? Why don't we always get a --microsoft argument?
We discussed the matter on Slack a few days ago, and agreed that the right way would be for the extractor to tell us explicitly whether a compilation was Microsoft or not.
Right now we might get slightly better accuracy by looking for the existence of the _MSC_VER macro anywhere in the snapshot, though we'll just have to assume that all files are compiled as Microsoft if we see it (so we'll do worse in the probably very rare case of mixed Microsoft and non-Microsoft compilations). Another suggestion was looking for paths beginning C: or similar (which detects a Microsoft file system, rather than compiler, and may work poorly with test path normalization).
I'm happy to implement the _MSC_VER thing if you're convinced it would be preferable. Or just make the cl.exe clause a bit more robust?
In that case, it sounds better to make the cl.exe test more robust. Is it only relevant to test for cl.exe when there's also a --mimic argument?
Yes, I think so. My logic was that by also matching the \, it's highly unlikely we'll get false results for cl.exe.
A quick Google search shows that some people like to spell it CL.exe, so turning the match into a case-insensitive regex might be necessary. What Compilation argument do we get if the compiler is just invoked as cl, without the .exe? Or does the --mimic option only work if .exe is included? I suppose this argument is only ever produced by our own tracer, so your current match might be fine if the tracer normalises everything.
I've made it slash- and case-insensitive; that seems pretty uncontroversial. Removing the need for .exe seems riskier to me, as cl by itself is a very short string that could coincidentally match something else (e.g. some command line flag that happens to be /CL).
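To illustrate the slash- and case-insensitive check discussed above, here is a minimal C++ sketch. It is not the QL from this PR: the helper name `looksLikeClExe` is hypothetical, and the sketch assumes the rule is "the argument ends in `cl.exe` preceded by either kind of path separator, compared case-insensitively".

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not part of the PR): returns true if `arg` ends with
// "cl.exe" directly preceded by a path separator, ignoring letter case and
// treating '\' and '/' as equivalent.
bool looksLikeClExe(const std::string& arg) {
    // Normalise: lower-case every character and turn backslashes into slashes.
    std::string norm(arg.size(), '\0');
    std::transform(arg.begin(), arg.end(), norm.begin(), [](unsigned char ch) {
        return ch == '\\' ? '/' : static_cast<char>(std::tolower(ch));
    });

    // Requiring the separator keeps names such as "mycl.exe" from matching.
    const std::string suffix = "/cl.exe";
    return norm.size() >= suffix.size() &&
           norm.compare(norm.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```

Under these assumptions, `C:\tools\CL.EXE` and `C:/tools/cl.exe` both match, while a bare `/CL` flag or `mycl.exe` does not, which is the trade-off described in the comment above.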
a414c13 to 1336e5b
Merge conflict fixed.
There's a failing test and a merge conflict.
Yep, it's yet another conflict on the change notes file - fixed. The test result changes are addressed in https://git.semmle.com/Semmle/code/pull/31148/files
jbj
left a comment
This PR and the internal one LGTM. The internal one just needs a submodule bump before we can merge them.
0d2edb3 to 35e68ff
A customer pointed out a few weeks ago that the meanings of the %s/%S format characters are not 'reversed' in wide printf functions on Linux as they are on Microsoft platforms - i.e. %s is always char and %S is always wide on Linux, whereas %s always matches the format string's character type and %S is always the opposite in Microsoft-land. %c/%C are similarly affected.
I've run some tests using VS on my Windows laptop and g++ on a remote Linux box and this appears to be an accurate summary - but I wouldn't be surprised if at some point we find edge cases (compiler arguments? C++ versions? MinGW?) to account for as well.
This PR fixes the above, improves the Microsoft-detection logic, and cleans up a bit more duplicated logic (see prior work in #1008).
Results and performance are unaffected for most real-world projects (wide character printfs are relatively rare). Sadly I do not have access to the example which motivated this work, so we have to rely on tests.
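The %s/%S/%c/%C rule summarised in this description can be written down as a small decision function. The following C++ sketch is only a model of that summary, not the QL in FormattingFunction.qll: the names `CharWidth` and `expectedWidth` are hypothetical, and length modifiers such as `h` and `l` are deliberately ignored.

```cpp
// Hypothetical model of the rule described above (not code from this PR).
enum class CharWidth { Narrow, Wide };

// conv:       one of 's', 'S', 'c', 'C' (length modifiers are not modelled)
// wideFormat: true if the format string itself is wide (e.g. wprintf)
// microsoft:  true if the compilation follows Microsoft semantics
CharWidth expectedWidth(char conv, bool wideFormat, bool microsoft) {
    const bool upper = (conv == 'S' || conv == 'C');

    if (!microsoft) {
        // Linux: lower case is always char, upper case is always wide,
        // regardless of the format string's own character type.
        return upper ? CharWidth::Wide : CharWidth::Narrow;
    }

    // Microsoft: lower case matches the format string's character type,
    // upper case is the opposite of it.
    const bool wide = upper ? !wideFormat : wideFormat;
    return wide ? CharWidth::Wide : CharWidth::Narrow;
}
```

For example, under this model `%s` in a wide format string expects a narrow argument on Linux but a wide one on Microsoft platforms, which is the mismatch the customer reported.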