Skip to content

C++: IR'ify cpp/uninitialized-local and fix FPs #14704

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Nov 7, 2023

Conversation

MathiasVP
Copy link
Contributor

We've seen a lot of complaints recently about this query being very noisy. This is my attempt at fixing this on a relatively short-term basis. In the future, this query should be made smarter and handle the new missing results.

Commit-by-commit review recommended:

  • The first commit does most of the query changes: it replaces the use of AST-based control-flow analysis with the IR-based "Must flow" library. This should get rid of the kind of FPs we've seen reported to us.
  • The next commit fixes a bug in the "Must flow" library to handle the case where the source equals the sink.
  • Finally, the last commit fixes the last remaining FPs in the query by adding a barrier.

@MathiasVP MathiasVP requested a review from a team as a code owner November 7, 2023 09:27
@github-actions github-actions bot added the C++ label Nov 7, 2023
Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QL and test results LGTM. Notably we're not getting any? results where the variable is only uninitialized in some paths now - but given the complaints from users that seems like a reasonable compromise for the time being.

I'd like to have a look at some real world differences, either on DCA (if there's enough data to see) or perhaps through MRVA.

@MathiasVP
Copy link
Contributor Author

QL and test results LGTM. Notably we're not getting any? results where the variable is only uninitialized in some paths now - but given the complaints from users that seems like a reasonable compromise for the time being.

Indeed, since we're now using the "must flow" library we won't have alerts on conditionally assigned variables. It's a bit of a shame, and we have an internal issue for how to bring back such results.

@MathiasVP
Copy link
Contributor Author

DCA just came back, and there are plenty of results to look at 😂. I plan on 👀'ing some of them later today. If you're also interested in doing so we can split up the working on a per-project basis @geoffw0.

@geoffw0
Copy link
Contributor

geoffw0 commented Nov 7, 2023

Sounds good, which projects would you like me to look at?

@MathiasVP
Copy link
Contributor Author

MathiasVP commented Nov 7, 2023

Sounds good, which projects would you like me to look at?

If you take the offenders (wireshark__wireshark, erlang__otp and php__php-src) then I can take the rest. How does that sound? I haven't done the math to see if that's a fair distribution. So if you think I'm being unfair please just take a subset of those projects 😅

@geoffw0
Copy link
Contributor

geoffw0 commented Nov 7, 2023

wireshark__wireshark

1 new result:

  • this, likely FP, but certainly one I can live with. It appears to be initialized via:
decrypt_info.status = &status;
...
decrypt_ieee802154_payload(... &decrypt_info ...)

124 lost results. I sampled some:

  • the first result was a FP due to correlated write and read if statements.
  • the second result I believe is FP, written to in every branch, but you have to figure out that a switch statement without a default covers all possibilities to be sure.
  • the third result was a FP due to correlated write and read if statements.
  • this is again likely an FP due to correlated write and read if statements.
  • this is probably a FP, written to in every case. Involves another switch that may or may not be exhaustive.
  • this is FP, to be sure you have to figure out that a while loop always takes at least one iteration to due to initial state of some variables.

erlang__otp

5 new results:

  • the first result I'm not sure. argp is made to point to res, and is written in some branches of the switch, but we're in a loop so there could be some guarantee that one is always reached.
  • the second result is similarly fiddly, though I think it's slightly more likely that resp is maybe written in every case.

132 lost results:

  • the first result I think is FP. Every branch writes to it, but only because of an if statement that triggers in the case the switch doesn't cover.
  • the second result I think is FP. If n == 0 we don't reach the read, if n > 0 we would have assigned to it.

php__php-src

387 lost results:

  • the first result may be FP. Z_PARAM_LONG(y) appears to be a PHP macro that "Receives the integer number argument and stores the value of the actual parameter in a C variable of type zend_long". The implementation? is fiddly and it's not clear to me whether y would be written in the event of an error.
  • a randomly picked result looks similar, this time the macro that presumably writes to it is Z_PARAM_PATH.
  • the last result is FP, correlated conditions.

In summary:

  • we lose a lot of results; all that I looked at were FP or suspected FP. 🎉
  • we gained a very few results; all that I looked at were FP or suspected FP as well, but it's a much smaller number of results.
  • I didn't see anything I was confident was a TP (but they may be hiding as results that both versions of the query found).
  • I'm happy with the effect of this change overall.

Other thoughts:

  • correlated if statements are important cases, there are several common patterns, but we test these quite well.
  • switch statements are also common but I can't see any tests for these.
    • perhaps we should add tests with an exhaustive and non-exhaustive switch; and a non-exhaustive one where the read occurs in a correlated if such that the variable was always written.
  • one day having analysis of exhaustive / non-exhaustive switch statements might be nice for queries like this.

@MathiasVP
Copy link
Contributor Author

  • perhaps we should add tests with an exhaustive and non-exhaustive switch; and a non-exhaustive one where the read occurs in a correlated if such that the variable was always written.

Done in 69502d0

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. 👍

@MathiasVP MathiasVP merged commit a8eed6b into github:main Nov 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants