
Investigate assess performance regression with MIR linker #1828

Open
tedinski opened this issue Nov 1, 2022 · 7 comments
Labels
[E] Performance: Track performance improvement (Time / Memory / CPU)

Comments

tedinski (Contributor) commented Nov 1, 2022

From #1816:

  1. Using the MIR linker significantly increases the "problem sizes" and verification times for my goto test crate (rand). We go from usually under 1k properties and ~10s verification times to 50k properties and 60s verification times for each test harness. I have not yet investigated why.
  2. Two tests from that crate go from unwind failures to unsupported_construct failures (try and inline asm). It's curious that this happened as a result (presumably) of linking the standard library. I think this should also be investigated.

This is a tracking issue for investigating these behavior changes.

@tedinski tedinski self-assigned this Nov 1, 2022
@tedinski tedinski added this to the Create Kani Assess tool milestone Nov 1, 2022
fzaiser (Contributor) commented Nov 1, 2022

May be related to #1818

tedinski (Contributor, Author) commented Nov 3, 2022

So far I believe I've found two issues:

  1. When using --reachability=pub_fns and --profile test, we end up with a single "root" symbol: e.g. _RNvCscWIzLuMkfbr_12assess_works4main (which I believe is the generated main for the test binary, and which presumably brings in the whole parallel test runner machinery)
  2. Even accounting for this, there still appear to be symbols included in the goto binary that are unreachable from this root. I haven't found a culprit yet, and am working out how to find them...

So:

  1. I'm surprised by main being the sole root of the reachability, since I thought we previously saw the test harnesses themselves get marked "public" so that each would be a root. I'm wondering if there's a tweak we can make here to restore that behavior for "pub_fns".
  2. Otherwise, I suspect "pub_fns" isn't useful here and we'll have to add a different reachability mode for "all harness types" (or something like that, to get both test and proof harnesses as roots)
  3. More investigation needed on that extra stuff that seems to be getting pulled in.
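One way to find that "extra stuff" is to do a plain reachability walk over the call graph and diff it against everything in the binary. The sketch below is a minimal, hypothetical illustration of that idea (the call-graph shape and symbol names are invented, not Kani's actual data structures):

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Given a call graph (caller -> callees), return the symbols that appear
/// somewhere in the graph but are not reachable from `root`.
fn unreachable_from(root: &str, calls: &HashMap<&str, Vec<&str>>) -> HashSet<String> {
    // Breadth-first walk from the root.
    let mut seen: HashSet<&str> = HashSet::new();
    let mut queue = VecDeque::from([root]);
    while let Some(sym) = queue.pop_front() {
        if seen.insert(sym) {
            if let Some(callees) = calls.get(sym) {
                queue.extend(callees.iter().copied());
            }
        }
    }
    // Every symbol mentioned anywhere in the graph.
    let all: HashSet<&str> = calls
        .iter()
        .flat_map(|(caller, callees)| std::iter::once(*caller).chain(callees.iter().copied()))
        .collect();
    all.difference(&seen).map(|s| s.to_string()).collect()
}

fn main() {
    // Toy graph: `orphan` is in the binary but has no path from main.
    let calls = HashMap::from([
        ("main", vec!["harness_a"]),
        ("harness_a", vec!["helper"]),
        ("orphan", vec!["helper"]),
    ]);
    println!("{:?}", unreachable_from("main", &calls)); // only "orphan"
}
```

Running something like this over the symbol table would surface candidates for the unreachable symbols mentioned above.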

celinval (Contributor) commented Nov 3, 2022

Interesting, here are a few thoughts:

  1. We explicitly bring in the main function as an entry point when using pub_fns. This was done to fix bookrunner, since we use --function main, and main might not be marked public (Fix MIR Linker handling of --function main #1775). I would also expect the legacy linker to bring in main, though. Do you know if that's the case?
  2. The pub_fns mode is still required for --function, and it might be useful for testing code generation of a crate, but it's possible that what you want is another option that pulls in the test harnesses instead. It would definitely be more efficient.
  3. I'm curious what is being brought in that is not reachable from the root. One possibility is the implementation of some traits: whenever we encounter a DST coercion of a concrete type into a trait object, we add to the reachability analysis all methods needed to generate the virtual table, even if those methods are never invoked.
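The vtable point in thought 3 can be seen in a small standalone example. Coercing a concrete type to `dyn Trait` requires a vtable entry for every trait method, so reachability must include methods that no caller ever invokes (the trait and type names below are illustrative):

```rust
trait Report {
    fn summary(&self) -> String;
    // Never called anywhere, but the unsizing coercion below still forces
    // this method into the vtable, so codegen must include it.
    fn never_called(&self) -> String {
        "unused, but in the vtable".to_string()
    }
}

struct Concrete;

impl Report for Concrete {
    fn summary(&self) -> String {
        "ok".to_string()
    }
}

fn main() {
    // The DST coercion happens here: &Concrete -> &dyn Report.
    // From this point on, any method of Report could be dispatched
    // dynamically, so all of them look reachable.
    let r: &dyn Report = &Concrete;
    println!("{}", r.summary()); // prints "ok"
}
```

This is why a single coercion in a dependency can pull in far more code than the harness appears to use.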

@tedinski tedinski added the [E] Performance Track performance improvement (Time / Memory / CPU) label Nov 14, 2022
tedinski (Contributor, Author) commented Jan 4, 2023

Current status: The addition of the test reachability mode back in November cut the performance problem by 50%, but we still see many goto binaries "blow up" in size.

Probable course of action:

  1. We can investigate to see if we can find a specific set of functions that cause this ballooning, and perhaps stub them out (even by default). We have a hypothesis that some of this size problem is caused by code that triggers reachability for the entire backtrace machinery in Rust. If we can prevent that, this would largely solve the problem where tiny, trivial harnesses balloon into 200+MB of code (and GBs of memory).
  2. Introduce project and stop merging all files generated by the compiler #1956 made cargo build all targets serially instead of in parallel. We could fix this, but in practice we're blocked on first reducing the memory consumption and ballooning problem above: if all we did was restore parallelism, we'd just run machines out of memory as they try to build 16 symtabs that each consume 10+GB of RAM.
  3. Longer term, we can generate goto binaries directly, instead of generating symtab json and translating them with symtab2gb.

There are a lot of alternative options here: for instance, we could try to optimize the memory consumption of symtab2gb. Or break up symtab generation into multiple files. Or we could try having a pre-built goto binary for the already monomorphic parts of std. But the above seems like the most compelling course of action.

celinval (Contributor) commented Jan 5, 2023

I think we stopped building in parallel before that. We did that so we could use cargo rustc.

tedinski (Contributor, Author) commented Jan 6, 2023

My mistake; I checked the history this time.

tedinski (Contributor, Author) commented Feb 2, 2023

This trivial harness:

use anyhow::*;

#[kani::proof]
fn proof_using_anyhow() {
    let _ = Ok(()).context("words");
}

Produces 286MB of symtab.json. With a few different hacky experiments in visit_fn in the reachability code:

  40M with: if pretty_name.contains("std::backtrace") { return }
  67M with: if pretty_name.contains("backtrace_rs::") { return }

So there's a pretty significant reduction in bloat there (220+MB eliminated). Unfortunately, I have not yet been able to identify any single function, or small set of functions, that would accomplish the same effect if stubbed. I think that because the backtrace gets passed around and potentially printed, a lot of this code looks reachable.
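For concreteness, the hacky experiments above amount to a name-based cutoff early in the reachability visitor. A minimal sketch of that idea, with invented names (this is not Kani's actual visit_fn API, just the shape of the filter):

```rust
/// Hypothetical deny-list of module prefixes known to drag in large
/// amounts of code; the two entries match the experiments above.
const DENY_LIST: &[&str] = &["std::backtrace", "backtrace_rs::"];

/// Returns true if reachability should stop at this function instead of
/// descending into its body. In the experiment this was an early `return`
/// at the top of the visitor.
fn should_skip(pretty_name: &str) -> bool {
    DENY_LIST.iter().any(|prefix| pretty_name.contains(prefix))
}

fn main() {
    assert!(should_skip("backtrace_rs::symbolize::resolve"));
    assert!(!should_skip("anyhow::Context::context"));
    println!("filter behaves as expected");
}
```

The drawback, as noted above, is that a substring filter is a blunt instrument: it can't distinguish backtrace code that is genuinely needed from code that only looks reachable.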

I tried applying the latter hack to an assess run, but I still saw pretty significant symtab2gb memory usage in a few crates, so that's probably where to investigate next (not an exhaustive list):

anyhow 12G
textwrap 13G
unicode-normalization 10G
regex 16G

@tedinski tedinski removed their assignment Feb 10, 2023