Skip to content
This repository has been archived by the owner on Nov 1, 2023. It is now read-only.

Optimize the output side of srcview #2515

Merged
merged 15 commits into from
Nov 2, 2022

Conversation

DrChat
Copy link
Member

@DrChat DrChat commented Oct 13, 2022

Summary of the Pull Request

This PR contains performance improvements for the srcview crate.

PR Checklist

  • Applies to work item: #xxx
  • CLA signed. If not, go over here and sign the CLI.
  • Tests added/passed
  • Requires documentation to be updated
  • I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

Info on Pull Request

  1. Comparing PathBuf is more expensive than comparing raw strings, so we should strive to use strings while building the PdbCache.
  2. It's more expensive to allocate temporary backing memory for the XML cobertura file than it is to simply write directly to a handle provided by the caller.
  3. Swap to quick-xml and try to avoid allocating any memory in the hot path when outputting cobertura XML output.

Validation Steps Performed

srcview binary with a large PDB (ntoskrnl) and modoff file.

@codecov-commenter
Copy link

codecov-commenter commented Oct 13, 2022

Codecov Report

Merging #2515 (7dfac96) into main (8be6e3c) will decrease coverage by 0.23%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main    #2515      +/-   ##
==========================================
- Coverage   29.89%   29.65%   -0.24%     
==========================================
  Files         289      282       -7     
  Lines       35771    34209    -1562     
==========================================
- Hits        10692    10145     -547     
+ Misses      25079    24064    -1015     
Impacted Files Coverage Δ
src/agent/srcview/src/bin/srcview.rs 0.97% <0.00%> (-0.12%) ⬇️
src/agent/srcview/src/pdbcache.rs 0.00% <0.00%> (ø)
src/agent/srcview/src/report.rs 0.00% <0.00%> (ø)
src/agent/onefuzz/src/env.rs 0.00% <0.00%> (-83.34%) ⬇️
src/agent/onefuzz/src/libfuzzer.rs 12.23% <0.00%> (-57.87%) ⬇️
src/agent/onefuzz/src/memory.rs 0.00% <0.00%> (-25.00%) ⬇️
src/agent/coverage/src/test.rs 85.71% <0.00%> (-14.29%) ⬇️
src/agent/onefuzz/src/monitor.rs 67.12% <0.00%> (-13.70%) ⬇️
src/agent/onefuzz/src/machine_id.rs 51.72% <0.00%> (-5.12%) ⬇️
src/agent/onefuzz/src/expand.rs 62.97% <0.00%> (-3.52%) ⬇️
... and 24 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@DrChat DrChat changed the title Optimize srcview Optimize the output side of srcview Oct 13, 2022
@DrChat DrChat marked this pull request as ready for review October 13, 2022 19:57
src/agent/srcview/src/pdbcache.rs Show resolved Hide resolved
src/agent/srcview/src/report.rs Show resolved Hide resolved
@Porges Porges force-pushed the user/jusmoore/srcview_perf branch 2 times, most recently from 193aeef to 660231a Compare October 13, 2022 20:57
@DrChat
Copy link
Member Author

DrChat commented Oct 13, 2022

Of particular note: When writing a file, callers will need to wrap the file with a BufWriter before passing it into Report::cobertura or their performance will drastically suffer.
While I don't think it should be the job of srcview to try and buffer any writes, it might be appropriate to update the example with one that does this.

@Porges
Copy link
Member

Porges commented Oct 13, 2022

@DrChat Of particular note: When writing a file, callers will need to wrap the file with a BufWriter before passing it into Report::cobertura or their performance will drastically suffer. While I don't think it should be the job of srcview to try and buffer any writes, it might be appropriate to update the example with one that does this.

Big Idea:

   pub fn cobertura(
        &self,
        filter_regex: Option<&str>,
        output: &mut std::io::BufWriter<impl Write>,
    ) -> Result<()> {
        self.cobertura_slowly(filter_regex, output)
    }

    pub fn cobertura_slowly(
        &self,
        filter_regex: Option<&str>,
        output: &mut impl Write,
    ) -> Result<()> {

@DrChat
Copy link
Member Author

DrChat commented Oct 13, 2022

Haha, I mean we might as well as just leave the &mut impl Write function and nudge callers to use a BufWriter if they're writing a file.
Maybe if it becomes a problem, we could do something like just wrap it internally?

@Porges
Copy link
Member

Porges commented Oct 13, 2022

@DrChat Haha, I mean we might as well as just leave the &mut impl Write function and nudge callers to use a BufWriter if they're writing a file. Maybe if it becomes a problem, we could do something like just wrap it internally?

For now I'll just update our example executable, and the NOTE should hopefully point the way.

@DrChat
Copy link
Member Author

DrChat commented Oct 14, 2022

Also fyi - I'll probably follow up with another few PRs with further refactoring.
There's definitely a lot that can be done to reduce memory use and improve performance.

@ranweiler
Copy link
Member

In general, for incremental perf optimization PRs, we should include some before/after benchmarks of the relevant metrics (time, memory). This is helpful for understanding what one gets in exchange for using (sometimes) less obvious / more complex implementation strategies. Will also help diagnose regressions and guide future optimization attempts.

@DrChat
Copy link
Member Author

DrChat commented Oct 17, 2022

@ranweiler totally agree - any recommendations on measuring and reporting the perf?
I'm using WPA/ETW personally, but I'm not very disciplined with it and it doesn't account for warm-up runs or anything of the sort. This perf improvement was big enough to be obvious from my traces at least.
criterion has been a good crate to use for perf testing in the past, though it will require some setup.

@ranweiler
Copy link
Member

@DrChat, I don't think we need to get too strict with it or anything, especially with more obvious big wins. If you're using WPA, even just noting the changes you're seeing at a high level would be useful. That'd help share effect magnitude, and ideally give others a sense of how you were measuring it / how to repeat on their own.

@DrChat
Copy link
Member Author

DrChat commented Oct 17, 2022

Sounds good!
Before (symbols are a bit mangled, but you can see SrcView and Report functions in the 5 entries past the highlighted frame):
wpa_4SgF6m3fcF

After:
wpa_WgKwYIiLiG

Overall, from 13.637s to 8.937s (roughly 1.5x speedup). For our use-case, much higher because we call the output side more than SrcView::insert.
I may work on a criterion benchmark in any case because I'm planning on doing further optimizations (and especially looking at cutting down on memory use, because there's a lot of opportunity to do so).

@ranweiler ranweiler merged commit d655fcd into microsoft:main Nov 2, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Dec 2, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants