Parallelize dwarfdump per-unit #285

Merged 1 commit into gimli-rs:master on Mar 12, 2018

Conversation

rocallahan (Contributor)

Maybe you won't like the memory usage increase, but this is a pretty nice speedup when you're not writing the results to a file. Fearless concurrency FTW.

I have a followup patch that adds a dwarfdump option to drop the output for compilation units where the output doesn't match a given regex, which makes this more useful.

@rocallahan (Contributor, Author)

Forgot to mention: that 4x speedup is basically a linear speedup on my laptop (4 cores, 8 hyperthreads).

@coveralls commented Mar 9, 2018

Coverage decreased (-0.01%) to 92.506% when pulling fef4e3d on rocallahan:parallel-dump into 71a1138 on gimli-rs:master.

@philipc (Collaborator) commented Mar 11, 2018

> Forgot to mention: that 4x speedup is basically a linear speedup on my laptop (4 cores, 8 hyperthreads).

There's an initial speedup due to writing to a Vec instead of using a BufWriter. So I'm seeing 80s before this PR, 60s with 1 worker, 23s with 4 workers.
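
A minimal illustration of the two output paths being compared here (a rough sketch, not the actual dwarfdump code):

    use std::io::{self, BufWriter, Write};

    fn demo() -> io::Result<()> {
        // Before the PR (roughly): stream each unit's output through a
        // BufWriter wrapping stdout.
        let mut w = BufWriter::new(io::stdout());
        writeln!(w, "...unit output...")?;
        w.flush()?;

        // With the PR: render each unit into its own Vec<u8> first, then
        // hand stdout one large contiguous write per unit.
        let mut buf = Vec::new();
        writeln!(buf, "...unit output...")?;
        io::stdout().lock().write_all(&buf)
    }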

@philipc (Collaborator) left a comment

Thanks, nice speedup! I think the memory usage is fine. It would be nice if some other library could make parallel_output simpler, but I'm not aware of anything that does (I saw your comments in rayon about converting to a serial iterator).
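
For readers who want a feel for the shape of parallel_output without reading the full diff, here is a hedged sketch of the approach discussed in this PR. It uses std::thread::scope (which postdates this 2018 patch) for brevity, uses io::Result instead of dwarfdump's own Result alias, and, unlike the shared (iterator, counter, result) state in the quoted diff, it tracks no extra bookkeeping and makes no attempt to keep the per-unit outputs in input order. It is an illustration, not the merged code.

    use std::io::Write;
    use std::sync::Mutex;
    use std::thread;

    fn parallel_output<I, F>(max_workers: usize, iter: I, f: F) -> std::io::Result<()>
    where
        I: Iterator + Send,
        F: Sync + Fn(I::Item, &mut Vec<u8>) -> std::io::Result<()>,
    {
        // Shared work queue: a fused iterator behind a mutex.
        let iter = Mutex::new(iter.fuse());
        let stdout = std::io::stdout();
        thread::scope(|s| {
            let mut handles = Vec::new();
            for _ in 0..max_workers {
                handles.push(s.spawn(|| -> std::io::Result<()> {
                    loop {
                        // Pull the next unit while holding the lock only briefly.
                        let item = match iter.lock().unwrap().next() {
                            Some(item) => item,
                            None => return Ok(()),
                        };
                        // Render into a private buffer, then emit it in one
                        // locked write so the unit's output stays contiguous.
                        let mut buf = Vec::new();
                        f(item, &mut buf)?;
                        stdout.lock().write_all(&buf)?;
                    }
                }));
            }
            // Propagate the last error any worker hit, if there was one.
            let mut result = Ok(());
            for handle in handles {
                if let Err(e) = handle.join().unwrap() {
                    result = Err(e);
                }
            }
            result
        })
    }

The design trade-off matches the memory note in the commit message: each in-flight unit holds its whole rendered output in memory so that it can be written out in one piece.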

@@ -65,6 +71,57 @@ impl From<io::Error> for Error {

pub type Result<T> = result::Result<T, Error>;

fn parallel_output<II: IntoIterator, F>(max_workers: usize, iter: II, f: F) -> Result<()>
@philipc (Collaborator) commented Mar 11, 2018

Any benefit to using IntoIterator instead of Iterator? Also please move the bound into the where clause.

@rocallahan (Contributor, Author)

We could use Iterator here and move into_iter to the caller, but specifying IntoIterator here makes this function more reusable with, I think, no net increase in complexity.
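
For reference, a sketch of the same signature with all bounds moved into the where clause, as requested above (the body is elided; this only shows where the bounds would live):

    fn parallel_output<II, F>(max_workers: usize, iter: II, f: F) -> Result<()>
    where
        II: IntoIterator,
        II::IntoIter: Send,
        F: Sync + Fn(II::Item, &mut Vec<u8>) -> Result<()>,
    {
        // ...
    }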

fn parallel_output<II: IntoIterator, F>(max_workers: usize, iter: II, f: F) -> Result<()>
    where F: Sync + Fn(II::Item, &mut Vec<u8>) -> Result<()>,
          II::IntoIter: Send {
    let state = Mutex::new((iter.into_iter().fuse(), 0, Ok(())));
@philipc (Collaborator)

Having the fuse doesn't hurt, but is it actually needed for the current caller?

@philipc (Collaborator)

I think this code is complex enough that using a local struct type for state would improve readability.
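
A rough sketch of what such a state struct could look like (field names are invented for illustration; the comments only restate the three slots of the tuple in the quoted code):

    struct OutputState<I: Iterator> {
        iter: std::iter::Fuse<I>, // the shared, fused work iterator
        counter: usize,           // the `0` in the original tuple
        result: Result<()>,       // the `Ok(())` in the original tuple
    }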

@rocallahan (Contributor, Author)

We depend on fused behavior. If we remove fuse here then I would want to make parallel_output take a Vec<T> instead of a generic Iterator/IntoIterator. Is that preferable?
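
For context, the guarantee that fuse() buys (a standalone example, not from the patch): once next() has returned None, every later call is also guaranteed to return None, which matters when multiple workers keep polling the shared iterator after it is exhausted.

    let mut it = (0..2).fuse();
    assert_eq!(it.next(), Some(0));
    assert_eq!(it.next(), Some(1));
    assert_eq!(it.next(), None);
    assert_eq!(it.next(), None); // stays None; plain iterators don't promise this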

@philipc (Collaborator)

No, it's fine how it is.

src/reader.rs Outdated
@@ -179,7 +179,7 @@ impl ReaderOffset for usize {
///
/// All read operations advance the section offset of the reader
/// unless specified otherwise.
-pub trait Reader: Debug + Clone {
+pub trait Reader: Debug + Clone + Send + Sync {
@philipc (Collaborator)

I'm not sure about this (and same for Endianity). Nothing in gimli requires this. I think this should be up to the consumer to decide, but this isn't something I've had much experience with. We're already defining a Reader trait in dwarfdump that could specify this requirement for its use.
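
A hedged sketch of the kind of consumer-side bound being suggested here; the trait name and blanket impl are illustrative, not the actual dwarfdump code, and as the follow-up notes, threading this through the generic Endian parameter is where it gets tricky:

    // dwarfdump could layer the threading bounds onto its own trait
    // instead of changing gimli::Reader itself.
    trait DwarfdumpReader: gimli::Reader<Offset = usize> + Send + Sync {}

    // Blanket impl: anything that already satisfies the bounds qualifies.
    impl<R> DwarfdumpReader for R where R: gimli::Reader<Offset = usize> + Send + Sync {}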

@rocallahan (Contributor, Author)

Unfortunately this is tricky because there doesn't seem to be a way for dwarfdump::Reader to constrain its Endian to be Send + Sync. This is related to rust-lang/rust#38738.

@rocallahan (Contributor, Author)

I figured something out.

 if flags.eh_frame {
-    dump_eh_frame(w, eh_frame)?;
+    dump_eh_frame(&mut BufWriter::new(out.lock()), eh_frame)?;
@philipc (Collaborator)

These are a bit ugly, but I don't have a better idea. This relies on the drop to flush, right?

@rocallahan (Contributor, Author)

Yes.
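
For what it's worth, a small sketch of making the flush explicit instead of relying on Drop (reusing the names from the quoted diff; BufWriter's destructor flushes but silently discards any write error):

    use std::io::{BufWriter, Write};

    let mut w = BufWriter::new(out.lock());
    dump_eh_frame(&mut w, eh_frame)?;
    w.flush()?; // surface write errors here instead of losing them in Drop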

@rocallahan (Contributor, Author)

> There's an initial speedup due to writing to a Vec instead of using a BufWriter. So I'm seeing 80s before this PR, 60s with 1 worker, 23s with 4 workers.

Well spotted. That's interesting.

Before:

time ( target/release/examples/dwarfdump -i ~/mozilla-central/obj-ff-opt/dist/bin/libxul.so >& /dev/null )
real    1m39.153s
user    1m37.714s
sys     0m1.320s

After:

time ( target/release/examples/dwarfdump -i ~/mozilla-central/obj-ff-opt/dist/bin/libxul.so >& /dev/null )
real	0m25.641s
user	2m3.328s
sys	0m1.087s

This increases memory usage. We buffer the output; the maximum memory usage increases by roughly the combined size of the N largest per-compilation-unit outputs, where N is min(16, num_cpus::get()). The larger compilation units in Firefox's libxul.so produce tens to hundreds of megabytes of output each. Then again, the speedup when processing such large files is important.
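
A one-line sketch of the worker count described in the commit message, assuming the num_cpus crate supplies the CPU count:

    // Cap the number of concurrently buffered per-unit outputs at 16.
    let max_workers = std::cmp::min(16, num_cpus::get());
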
@philipc merged commit 5183af0 into gimli-rs:master on Mar 12, 2018
@rocallahan deleted the parallel-dump branch on March 22, 2018 at 21:48
3 participants