Skip to content

CPP: Better handling of %s/%c/%S/%C in Printf/FormattingFunction.qll#1119

Merged
jbj merged 11 commits intogithub:masterfrom
geoffw0:wprintf2
Apr 1, 2019
Merged

CPP: Better handling of %s/%c/%S/%C in Printf/FormattingFunction.qll#1119
jbj merged 11 commits intogithub:masterfrom
geoffw0:wprintf2

Conversation

@geoffw0
Copy link
Contributor

@geoffw0 geoffw0 commented Mar 15, 2019

A customer pointed out a few weeks ago that the meanings of the %s / %S format characters are not 'reversed' in wide printf functions on Linux as they are on Microsoft platforms - i.e. %s is always char and %S is always wide on Linux, whereas %s always matches the format string's character type and %S is always the opposite in Microsoft-land. %c / %C are similarly affected.

I've run some tests using VS on my Windows laptop and g++ on a remote Linux box and this appears to be an accurate summary - but I wouldn't be surprised if at some point we find edge cases (compiler arguments? C++ versions? MinGW?) to account for as well.

This PR fixes the above, improves the Microsoft-detection logic, and cleans up a bit more duplicated logic (see prior work in #1008).

Results and performance are unaffected for most real-world projects (wide character printfs are relatively rare). Sadly I do not have access to the example which motivated this work, so we have to rely on tests.

@geoffw0 geoffw0 added the C++ label Mar 15, 2019
@geoffw0 geoffw0 requested a review from a team as a code owner March 15, 2019 14:17
Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impressive work! I have just some minor comments.

(
c.getAnArgument() = "--microsoft" or
c.getAnArgument().matches("%\\\\cl.exe")
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matching on \cl.exe seems fragile to me; for example, what if it's written in upper case or with a forward slash? @ian-semmle, is this the best we can do? Why don't we always get a --microsoft argument?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed the matter on Slack a few days ago, and agreed that the right way would be for the extractor to tell us explicitly whether a compilation was Microsoft or not.

Right now we might get slightly better accuracy by looking for the existence of the _MSC_VER macro anywhere in the snapshot, though we'll just have to assume that all files are compiled as Microsoft if we see it (so we'll do worse in the probably very rare case of mixed Microsoft and non-Microsoft compilations). Another suggestion was looking for paths beginning C: or similar (which detects a Microsoft file system, rather than compiler, and may work poorly with test path normalization).

I'm happy to implement the _MSC_VER thing if you're convinced it would be preferable. Or just make the cl.exe clause a bit more robust?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, it sounds better to make the cl.exe test more robust. Is it only relevant to test for cl.exe when there's also a --mimic argument?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think so. My logic was that with matching the \, it's highly unlikely we'll get false results for cl.exe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A quick Google search shows that some people like to spell it CL.exe, so turning the match into a case-insensitive regex might be necessary. What Compilation argument do we get if the compiler is just invoked as cl, without the .exe? Or the --mimic option only work if .exe is included? I suppose this argument is only ever produced by our own tracer, so your current match might be fine if the tracer normalises everything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made it slash/case insensitive, that seems pretty uncontroversial. Removing the need for .exe seems riskier to me as cl by itself is a very short string that could coincidentally match something else (e.g. some command line flag that happens to be /CL).

@jbj jbj added the depends on internal PR This PR should only be merged in sync with an internal Semmle PR label Mar 18, 2019
@geoffw0 geoffw0 force-pushed the wprintf2 branch 4 times, most recently from a414c13 to 1336e5b Compare March 22, 2019 12:39
@geoffw0
Copy link
Contributor Author

geoffw0 commented Mar 22, 2019

Merge conflict fixed.

@jbj
Copy link
Contributor

jbj commented Mar 26, 2019

There's a failing test and a merge conflict.

@geoffw0
Copy link
Contributor Author

geoffw0 commented Mar 27, 2019

Yep, it's yet another conflict on the change notes file - fixed.

The test result changes are addressed in https://git.semmle.com/Semmle/code/pull/31148/files

jbj
jbj previously approved these changes Mar 27, 2019
Copy link
Contributor

@jbj jbj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR and the internal one LGTM. The internal one just needs a submodule bump before we can merge them.

@geoffw0 geoffw0 force-pushed the wprintf2 branch 2 times, most recently from 0d2edb3 to 35e68ff Compare March 28, 2019 12:24
@jbj jbj merged commit 76caad0 into github:master Apr 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C++ depends on internal PR This PR should only be merged in sync with an internal Semmle PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments