Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reland - Report coverage 0 of dead blocks #85385

Merged
merged 1 commit into from
Jun 4, 2021

Conversation

richkadel
Copy link
Contributor

@richkadel richkadel commented May 16, 2021

Fixes: #84018

With -Z instrument-coverage, coverage reporting of dead blocks
(for example, blocks dropped because a conditional branch is dropped,
based on const evaluation) is now supported.

Note, this PR relands an earlier, reverted PR that failed when compiling
generators. The prior issues with generators has been resolved and a new
test was added to prevent future regressions.

Check out the resulting changes to test coverage of dead blocks in the
test coverage reports in this PR.

r? @tmandry
fyi: @wesleywiser

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 16, 2021
@richkadel
Copy link
Contributor Author

@tmandry - I ran several experiments, trying different things, and tested compiling and coverage report generation on Fuchsia tests that were not working under the reverted PR. I did identify the original problem and corrected for it.

@richkadel
Copy link
Contributor Author

converted to draft because I found a new case with the same ICE caused by the reverted PR; so this is still not 100% fixed.

@richkadel
Copy link
Contributor Author

As alluded to in my previous comment, I managed to trigger the same assertion failure that forced us to revert the previous PR for dead block coverage (though much more rare):

thread 'rustc' panicked at 'No counters provided the source_hash for used function: Instance { def: Item(WithOptConstParam { did: DefId(0:1199 ~ ffx_emulator[8787]::types::__mock_MockFuchsiaPaths_FuchsiaPaths::__get_image_path::{impl#2}::matches), const_param_did: None }), substs: [] }', compiler/rustc_codegen_ssa/src/coverageinfo/map.rs:156:9

Initially, just to capture the stack trace and see where the problem was coming from, I added an assert to save_unreachable_coverage() that checks if there is at least one Counter still in the MIR after dropping dead blocks. As predicted, the assert failed, on the same function:

query stack during panic:
#0 [mir_drops_elaborated_and_const_checked] elaborating drops for types::__mock_MockFuchsiaPaths_FuchsiaPaths::__get_image_path::<impl at ../../src/developer/ffx/plugins/emulator/src/types.rs:34:1: 34:12>::matches

The stack trace (not shown here) showed that the problem occurs during the SimplifyCfg pass itself. I have to save unreachable coverage during that pass in most cases, so I'm now ensuring dead block coverage is only saved if at least one Counter still remains in a live block.

So I replaced the assert with a simple check, and if the result of dropping the dead blocks removes all Counters, I simply don't try to save them, for that function. (This is the same as the current behavior, without this PR.)

@richkadel richkadel marked this pull request as ready for review May 20, 2021 23:58
@richkadel
Copy link
Contributor Author

@wesleywiser - I think Tyler is out for a few days. No rust really, but if you are willing to review this one, it should be ready to land. FYI, I've been validating the edge cases that failed before no longer fail by compiling all of Fuchsia OS (kitchen-sink) with coverage. It's pretty extensive, and the new PR resolves the known failures from the previous iteration.

@wesleywiser
Copy link
Member

@richkadel I'd be happy to take a look later today!

@richkadel
Copy link
Contributor Author

@wesleywiser @tmandry - Thanks for your initial reviews.

Please let me know what I need to do to get your approval.

I can't simplify the logic any further.

I could remove the AssertNone option, and "hoping" no one violates those assertions, or I could change it to use debug_assert!(), but from my experience dealing with this issue, I believe the assertions should stay, and should not be debug-only: The unexpected issues I've encountered in the past happen in rare edge cases that I've found difficult to replicate in a unit test, and if the assertion is violated in the future, we likely won't catch it until the problem manifests itself in some weird downstream way, like broken coverage reports that no one can explain.

I can add more comments in the code, if that would help?

Thanks,
Rich

@tmandry
Copy link
Member

tmandry commented May 26, 2021

I've been thinking about this, and my concern is that we need to assert that other passes never remove counters to catch bugs, but I don't see a reason why it would always be true. For example, unreachable_prop is unlikely to be the only pass that ends up culling blocks as unreachable. Therefore we're passing the complexity burden onto the authors of future MIR passes to understand this issue and deal with it appropriately.

Instead of creating unreachable counters at the time they're removed, is there some bookkeeping we can do to notice those counters are indeed unreachable at codegen time? If we could then the logic could be self-contained and everything would just "fall out" of the final MIR, regardless of the passes that run on it. I realize this might require a pre-pass over the MIR at codegen time, but it could be a linear scan of all statements in all blocks (since we would have already simplified it), and I think that would be preferable.

@richkadel
Copy link
Contributor Author

Instead of creating unreachable counters at the time they're removed, is there some bookkeeping we can do to notice those counters are indeed unreachable at codegen time?

Some of this is probably obvious, but to summarize, once the SimplifyCfg pass runs, and deletes BasicBlocks with coverage statements, the coverage data is gone.

I think that any "bookkeeping" would have to be an additional data structure in the mir::Body, to store a duplicate data structure of code regions, just in case some subset of code regions is removed. Can you think of any other way?

Also, keep in mind this would only be relevant when blocks are removed due to const evaluation, and that's exceptionally rare. So the additional data structure and the added lookup logic during codegen seems to outweigh the benefit.

So let's go back to your initial concern:

that we need to assert that other passes never remove counters to catch bugs, but I don't see a reason why it would always be true

OK, I can see that argument.

It gives me a different idea. Let me see if I can change this to just simply be an opt-in kind of approach that leaves almost everything as it was before, and only implement the special handling (to save unreachable code regions) when simplifying after specific actions like (maybe only) const eval.

This sounds too obvious to me now, so I wonder if I already tried this and his a roadblock, but I'll give it another shot.

@richkadel
Copy link
Contributor Author

Is this any better? Much less changed.

@richkadel richkadel force-pushed the simpler-simplify-with-coverage branch from cb0082c to 74b8e38 Compare May 26, 2021 05:38
@richkadel
Copy link
Contributor Author

I just thought of a better way that addresses your concern, and overall should be more reliable, I believe. I will want to test it by compiling everything against fuchsia kitchen sink again.

@richkadel richkadel marked this pull request as draft May 27, 2021 02:19
@tmandry
Copy link
Member

tmandry commented May 27, 2021

I think I'd be fine with another data structure on the mir body, for what it's worth.

@richkadel richkadel force-pushed the simpler-simplify-with-coverage branch from 74b8e38 to 702c8df Compare June 1, 2021 01:37
@richkadel
Copy link
Contributor Author

I think I'd be fine with another data structure on the mir body, for what it's worth.

No need for that. In fact, what I just committed is even better than what I had planned to do.

I found the root cause of the problems that plagued the fuchsia_async (and similar) functions when trying to implement dead block coverage! (Not the macro issues, but the problem with the generator pass causing malformed coverage.)

So I removed the stuff you were concerned with, and cleaned up some unneeded checks, now that the generator CFG simplification is handled more directly (and still generally).

And as a bonus, I also was able to add a new test that I confirmed caused the same kind of malformed coverage issues, without the fixes I just made. That makes me much more comfortable that we should be able to catch similar issues or compiler regressions early.

This is once again ready for review.

@richkadel richkadel marked this pull request as ready for review June 1, 2021 02:45
@richkadel
Copy link
Contributor Author

@wesleywiser -

FYI, I think Tyler is OOO this week, but I think you should be able to confirm that this revised PR addresses both your concerns and Tyler's concerns, and if so, maybe you would be willing to approve this?

The new version of the PR removes the somewhat controversial directive for how to handle dropped coverage (Assert, Allow, or Save), and now makes the right choice atomically, in an efficient way.

And it finally includes a new test that confirms the new logic addresses the edge case that was giving me problems!

I was able to finally deduce the improved approach by trying many different approaches, experimentally, until I not only had a solution that fully compiles and generates correct coverage on a very large codebase of complex Rust (including many third party crates), but in doing so, also finally revealed the root cause of prior failures (edge case with some generator implementations).

From this, I was able to generate a new test (generator.rs) that reproduces the conditions that lead to prior workarounds, and is now fixed with this new, simpler implementation.

Copy link
Member

@wesleywiser wesleywiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I like this a lot more than the previous approach 🙂

@richkadel richkadel force-pushed the simpler-simplify-with-coverage branch from 702c8df to 65d3e60 Compare June 1, 2021 20:26
Fixes: rust-lang#84018

With `-Z instrument-coverage`, coverage reporting of dead blocks
(for example, blocks dropped because a conditional branch is dropped,
based on const evaluation) is now supported.

Note, this PR relands an earlier, reverted PR that failed when compiling
generators. The prior issues with generators has been resolved and a new
test was added to prevent future regressions.

Check out the resulting changes to test coverage of dead blocks in the
test coverage reports in this PR.
@bors
Copy link
Contributor

bors commented Jun 2, 2021

⌛ Testing commit f4f76e6 with merge 82407804c56e80a6bff84821f015c59e527e7171...

@bors
Copy link
Contributor

bors commented Jun 2, 2021

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jun 2, 2021
@rust-log-analyzer
Copy link
Collaborator

A job failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

@richkadel
Copy link
Contributor Author

I don't see any error messages?

Possible spurious infra failure on:

auto (dist-aarch64-apple, ./x.py dist --stage 2, --build=x86_64-apple-darwin --host=aarch64-apple...

@richkadel
Copy link
Contributor Author

@wesleywiser - I think we just need a "retry" (reason: spurious), unless you see something I don't.

Thanks!

@wesleywiser
Copy link
Member

@bors retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 2, 2021
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Jun 3, 2021
…erage, r=wesleywiser

Reland - Report coverage `0` of dead blocks

Fixes: rust-lang#84018

With `-Z instrument-coverage`, coverage reporting of dead blocks
(for example, blocks dropped because a conditional branch is dropped,
based on const evaluation) is now supported.

Note, this PR relands an earlier, reverted PR that failed when compiling
generators. The prior issues with generators has been resolved and a new
test was added to prevent future regressions.

Check out the resulting changes to test coverage of dead blocks in the
test coverage reports in this PR.

r? `@tmandry`
fyi: `@wesleywiser`
@bors
Copy link
Contributor

bors commented Jun 3, 2021

⌛ Testing commit f4f76e6 with merge 555e88f95f60af84045bdcf8127a1a3df688675f...

@rust-log-analyzer
Copy link
Collaborator

A job failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

@bors
Copy link
Contributor

bors commented Jun 3, 2021

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jun 3, 2021
@richkadel
Copy link
Contributor Author

@wesleywiser - Any idea what's going on here? Is this PR just unlucky? (2 spurious failures in 2 attempts?)

I can't see any details on the failure, and the changes should not have any Mac-specific impact AFAICT.

Do we need to "try" instead of "retry" (I'm not sure if there's a difference, but wondering if the failure yesterday left some bad state that "retry" may be using)?

Confused.

@wesleywiser
Copy link
Member

I think the macOS builders are just pretty flaky. I ran into the same thing in another PR today.

Do we need to "try" instead of "retry" (I'm not sure if there's a difference, but wondering if the failure yesterday left some bad state that "retry" may be using)?

try tells bors to produce build artifacts for Linux suitable for downloading via rustup or use on perf.rlo but it won't actually run the complete CI loop or merge the results. AFAIK retry should be sufficient.

@bors retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 3, 2021
@richkadel
Copy link
Contributor Author

try tells bors to produce build artifacts for Linux

Oh, yes, sorry... I meant to say, "does it need a fresh @bors r+ or something (not try).

Anyway, hopefully this time it will work. Thank you!

@bors
Copy link
Contributor

bors commented Jun 4, 2021

⌛ Testing commit f4f76e6 with merge 289ada5...

@bors
Copy link
Contributor

bors commented Jun 4, 2021

☀️ Test successful - checks-actions
Approved by: wesleywiser
Pushing 289ada5 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jun 4, 2021
@bors bors merged commit 289ada5 into rust-lang:master Jun 4, 2021
@rustbot rustbot added this to the 1.54.0 milestone Jun 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Show zero coverage for unreachable blocks (e.g., dropped after const evaluation)
7 participants