
Optimize FileSerializationSink by using parking_lot::Mutex and avoiding heap allocations in write_atomic. #88

Merged
merged 3 commits into rust-lang:master from michaelwoerister:opt-file-sink on Nov 22, 2019

Conversation

@michaelwoerister (Contributor) commented Nov 15, 2019

This PR makes the FileSerializationSink exactly as fast as the MmapSerializationSink in the benchmarks we have. But I have only tested on Linux, and I also remember that the benchmarks were not a good indication of performance when used in rustc.

It would be nice if we could get rid of the MmapSerializationSink because it keeps everything in memory until the end.

I wonder how much work it would be to have benchmarks that actually run rustc. It's a hassle to test this manually.

@michaelwoerister (Contributor, Author) commented Nov 15, 2019

I think our benchmarks generate too little data to be realistic (less than a megabyte, I think). I'll look into that later. Also, a multi-threaded benchmark would be good.

@wesleywiser (Member) commented Nov 15, 2019

> But I have only tested on Linux, and I also remember that the benchmarks were not a good indication of performance when used in rustc.

FYI, as I recall, the issue was that mmap was quite a bit slower on Windows.

@michaelwoerister (Contributor, Author) commented Nov 15, 2019

Here are some benchmark numbers with and without the patch (numbers in milliseconds):

| Windows         | before | after | improvement |
| --------------- | ------ | ----- | ----------- |
| file, 1 thread  | 897    | 773   | ~15%        |
| file, 8 threads | 921    | 663   | ~30%        |
| mmap, 1 thread  | 909    | 939   | -           |
| mmap, 8 threads | 839    | 830   | -           |

| Linux           | before | after | improvement |
| --------------- | ------ | ----- | ----------- |
| file, 1 thread  | 605    | 510   | 15-20%      |
| file, 8 threads | 765    | 752   | -           |
| mmap, 1 thread  | 512    | 519   | -           |
| mmap, 8 threads | 378    | 349   | -           |

The patch should be very good for Windows, especially for the multithreaded case. On Linux, mmap is as fast (1 thread) or much faster (8 threads) than the file sink. I don't have any numbers for macOS.

@wesleywiser (Member) commented Nov 15, 2019

I can gather some macOS numbers if you're interested.

@wesleywiser self-assigned this on Nov 15, 2019
@michaelwoerister (Contributor, Author) commented Nov 15, 2019

If you have time, I'd be interested, yeah.

I got the numbers by running cargo +nightly bench. The "after" numbers are from this PR as it is, and the "before" numbers are from this PR, but with commit 8367ec6 removed. The last commit has to stay in because it modifies the benchmarks.

@michaelwoerister (Contributor, Author) commented Nov 15, 2019

Update: I just pushed the "baseline" version to https://github.com/michaelwoerister/measureme/tree/opt-file-sink-ref.

@wesleywiser (Member) commented Nov 18, 2019

Here's what I'm seeing on my 4-core/8-thread MBP:

| macOS           | before | after | improvement |
| --------------- | ------ | ----- | ----------- |
| file, 1 thread  | 759    | 556   | ~26%        |
| file, 8 threads | 1,666  | 559   | ~66%        |
| mmap, 1 thread  | 670    | 622   | ~7%         |
| mmap, 8 threads | 394    | 386   | ~2%         |
@michaelwoerister (Contributor, Author) commented Nov 18, 2019

Thanks, @wesleywiser! Looks like we actually might want to switch to FileSerializationSink on macOS too for the non-parallel case.

@wesleywiser (Member) left a comment

Is the second commit still WIP? If not, this looks good to me overall.

Review threads: measureme/Cargo.toml (resolved), measureme/src/file_serialization_sink.rs (outdated, resolved)
@michaelwoerister (Contributor, Author) commented Nov 19, 2019

The second commit is still work-in-progress, yes. I want to update the code in testing_common to handle multiple threads, which shouldn't be too hard. Thanks for the review!

@michaelwoerister (Contributor, Author) commented Nov 20, 2019

I did some new Windows measurements of parking_lot vs std::sync. It looks like parking_lot is a bit faster indeed. Let's keep it.

| Windows         | before | std::sync | parking_lot |
| --------------- | ------ | --------- | ----------- |
| file, 1 thread  | 744    | 614       | 589         |
| file, 8 threads | 862    | 637       | 582         |
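
For reference, the user-visible difference between the two lock types is small. Here is a minimal sketch of the swap (hypothetical helper functions, assuming the parking_lot dependency this PR adds): parking_lot::Mutex has no poisoning, so lock() returns the guard directly instead of a Result.

```rust
use parking_lot::Mutex as PlMutex;
use std::sync::Mutex as StdMutex;

fn bump_std(counter: &StdMutex<u64>) {
    // std's lock() returns a Result because of lock poisoning.
    let mut guard = counter.lock().unwrap();
    *guard += 1;
}

fn bump_parking_lot(counter: &PlMutex<u64>) {
    // parking_lot has no poisoning: lock() returns the guard directly, and the
    // mutex itself is smaller and often faster under contention.
    let mut guard = counter.lock();
    *guard += 1;
}
```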
@michaelwoerister force-pushed the michaelwoerister:opt-file-sink branch from 3a6a0df to 58dbbd9 on Nov 20, 2019
@michaelwoerister force-pushed the michaelwoerister:opt-file-sink branch from 58dbbd9 to b9d0111 on Nov 20, 2019
@michaelwoerister (Contributor, Author) commented Nov 20, 2019

This is ready for review now.

In measureme/src/file_serialization_sink.rs:

@@ -16,7 +23,12 @@ impl SerializationSink for FileSerializationSink {
        let file = fs::File::create(path)?;

        Ok(FileSerializationSink {
            data: Mutex::new((BufWriter::new(file), 0)),

@andjo403 (Contributor) commented Nov 20, 2019

What is the gain from removing the BufWriter? The new code feels similar to the BufWriter code, so would BufWriter::with_capacity(1024 * 512, file) give the same result?

@michaelwoerister (Author, Contributor) commented Nov 21, 2019

The difference is that BufWriter does not allow directly writing to its buffer, so we are basically re-implementing BufWriter here. The interface of write_atomic requires there to be a writable output buffer.
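
To illustrate the point, here is a heavily simplified sketch of a sink with a directly writable buffer and a write_atomic-style method. The type and field names are made up for the example, and the real sink uses parking_lot::Mutex and a different error-handling strategy:

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;
use std::sync::Mutex;

const BUFFER_SIZE: usize = 1024 * 512;

// Hypothetical simplification of the sink: a plain byte buffer that the
// caller's closure fills in place. BufWriter cannot offer this because it
// only exposes write(&[u8]), which forces serializing into a temporary
// allocation first.
struct BufferedSink {
    data: Mutex<Inner>,
}

struct Inner {
    file: File,
    buffer: Vec<u8>,
    buffer_pos: usize,
    total_bytes: usize,
}

impl BufferedSink {
    fn from_path(path: &Path) -> io::Result<BufferedSink> {
        Ok(BufferedSink {
            data: Mutex::new(Inner {
                file: fs::File::create(path)?,
                buffer: vec![0; BUFFER_SIZE],
                buffer_pos: 0,
                total_bytes: 0,
            }),
        })
    }

    // write_atomic-style interface: reserve `num_bytes` and let the closure
    // write directly into the buffer, returning the record's start offset.
    // (This sketch assumes a single record never exceeds BUFFER_SIZE.)
    fn write_atomic<W: FnOnce(&mut [u8])>(&self, num_bytes: usize, write: W) -> io::Result<usize> {
        let mut guard = self.data.lock().unwrap();
        let inner = &mut *guard;
        let addr = inner.total_bytes;

        // Slow path: flush the buffer to disk when the record does not fit.
        if inner.buffer_pos + num_bytes > inner.buffer.len() {
            inner.file.write_all(&inner.buffer[..inner.buffer_pos])?;
            inner.buffer_pos = 0;
        }

        // Fast path: hand out a slice of the in-memory buffer, no allocation.
        let start = inner.buffer_pos;
        write(&mut inner.buffer[start..start + num_bytes]);
        inner.buffer_pos += num_bytes;
        inner.total_bytes += num_bytes;
        Ok(addr)
    }
}
```

With a BufWriter, the closure would instead have to serialize into a temporary Vec and then copy it in via write(), which is presumably the heap allocation the PR title refers to avoiding.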

@andjo403 (Contributor) commented Nov 20, 2019

Has someone looked at how much the results differ between the "fast path", where only the buffer is updated, and the "slow path", where the file is written? Since we block all threads from writing during the file write, this can affect many events.
Also, by increasing the buffer size from 8 KB to 512 KB, the variance of the measurements may get bigger, as each write takes longer.
I was thinking that maybe some of the variance seen in #67 is due to the file writes.

@michaelwoerister (Contributor, Author) commented Nov 21, 2019

@andjo403 I have not investigated variance. The bigger buffer should reduce the number of writes while making each write larger. So the writes are less evenly distributed, but the fixed overhead might amortize better.

It would be nice to do the actual file writing in a background thread via some kind of double buffering scheme. I haven't tried to implement something like that though.
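
One possible shape of such a scheme, as a purely hypothetical sketch (not something this PR or measureme implements): the foreground fills one buffer while a background thread flushes the other, and full buffers are handed over through a channel.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread::{self, JoinHandle};

const BUFFER_SIZE: usize = 1024 * 512;

// Hypothetical double-buffered sink: full buffers are shipped to a background
// thread that does the actual file I/O and sends the emptied buffers back.
struct BackgroundWriter {
    full_tx: SyncSender<Vec<u8>>,
    empty_rx: Receiver<Vec<u8>>,
    writer: JoinHandle<io::Result<()>>,
    current: Vec<u8>,
}

impl BackgroundWriter {
    fn new(mut file: File) -> BackgroundWriter {
        let (full_tx, full_rx) = sync_channel::<Vec<u8>>(1);
        let (empty_tx, empty_rx) = sync_channel::<Vec<u8>>(1);

        // Pre-load the recycling channel with the second buffer.
        empty_tx.send(Vec::with_capacity(BUFFER_SIZE)).unwrap();

        let writer = thread::spawn(move || {
            // Flush each full buffer, then hand it back for reuse.
            for mut buf in full_rx {
                file.write_all(&buf)?;
                buf.clear();
                let _ = empty_tx.send(buf); // the foreground may already be gone
            }
            file.flush()
        });

        BackgroundWriter {
            full_tx,
            empty_rx,
            writer,
            current: Vec::with_capacity(BUFFER_SIZE),
        }
    }

    fn write(&mut self, bytes: &[u8]) {
        if self.current.len() + bytes.len() > BUFFER_SIZE {
            // Swap buffers: ship the full one, take back a recycled empty one.
            let full = std::mem::replace(&mut self.current, self.empty_rx.recv().unwrap());
            self.full_tx.send(full).unwrap();
        }
        self.current.extend_from_slice(bytes);
    }

    fn finish(mut self) -> io::Result<()> {
        // Send the last partial buffer, close the channel, and wait for the writer.
        let last = std::mem::take(&mut self.current);
        self.full_tx.send(last).unwrap();
        drop(self.full_tx);
        self.writer.join().unwrap()
    }
}
```

The trade-off in a design like this is that file writes no longer block the producing threads, but I/O errors from the background thread only surface when the sink is finished.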

@wesleywiser merged commit 665b384 into rust-lang:master on Nov 22, 2019
1 check passed: Travis CI - Pull Request Build Passed