
Introduce a StreamingBuffer #348

Closed
wants to merge 4 commits

Conversation

bjorn3
Contributor

@bjorn3 bjorn3 commented Aug 7, 2021

This streams an object file to the given writer without keeping it all in memory at once. This avoids an unnecessary copy of all the data when you want to write the object file to disk immediately afterwards anyway. I also made the API contract of WritableBuffer a bit stricter, for example to allow writing to a file using memory mapping, by requiring that the size given to .reserve() matches the final size of the object file. Finally, I pre-allocated a couple of vecs with the final size to prevent re-allocations.
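The core idea can be sketched as follows. This is a minimal illustration, not the actual trait from the `object` crate: the trait shape, method signatures, and the `StreamingBuffer` fields here are simplified assumptions.

```rust
use std::io::Write;

// Hypothetical, simplified version of the WritableBuffer trait being
// discussed; the real trait in the `object` crate differs in detail.
trait WritableBuffer {
    fn len(&self) -> usize;
    // Under the stricter contract proposed here, this is called exactly
    // once, before any writes, with the final size of the object file.
    fn reserve(&mut self, size: usize) -> Result<(), ()>;
    fn write_bytes(&mut self, val: &[u8]);
}

// Streams bytes straight to an inner writer instead of buffering the
// whole object file in memory first.
struct StreamingBuffer<W> {
    writer: W,
    len: usize,
}

impl<W: Write> WritableBuffer for StreamingBuffer<W> {
    fn len(&self) -> usize {
        self.len
    }

    fn reserve(&mut self, _size: usize) -> Result<(), ()> {
        // A streaming writer has nothing to pre-allocate; an mmap-based
        // implementation would map `_size` bytes here instead, which is
        // why the stricter contract requires `_size` to be the final size.
        Ok(())
    }

    fn write_bytes(&mut self, val: &[u8]) {
        // Error handling is elided in this sketch; the review comments
        // below discuss why unwrapping here is not acceptable.
        self.writer.write_all(val).unwrap();
        self.len += val.len();
    }
}
```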

@bjorn3
Contributor Author

bjorn3 commented Aug 7, 2021

As for the rationale for this change: a non-trivial portion of the time needed to create the metadata object file in rustc is spent in memcpy. A bigger portion of the time is spent copying the data into the Object during append_section_data, though, which is probably harder to avoid.

@philipc
Contributor

philipc commented Aug 7, 2021

I don't see the reason for all of these API changes. Although it isn't documented well, I believe the current API already does what you want. So all that is needed is a StreamingBuffer implementation.

@bjorn3
Contributor Author

bjorn3 commented Aug 8, 2021

The change from the resize method to the align method is necessary to be able to implement the padding using a single write call per 1024 alignment bytes, instead of one write call for each alignment byte.

Pre-allocating several vecs with their exact final size is just a perf optimization to avoid unnecessary reallocations, which copy the full contents of the vec over to the new allocation and are expensive for bigger vecs.

The stricter API contract is not necessary for StreamingBuffer, but it is necessary if you want to implement, say, an mmap-based writer.

@philipc
Contributor

philipc commented Aug 8, 2021

> The change from the resize to align method is necessary to be able to implement it using a single write call per 1024 alignment bytes instead of one write call for each alignment byte.

I don't see why that optimisation can't be done with the old resize. The first thing you do is calculate the new length, which is effectively the same as having the caller pass the new length to resize. The reason I prefer not to have this change is that it requires the implementation to do the alignment calculation.
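The alternative philipc is describing could look roughly like this. The `Buffer` trait and the `write_align` helper below are hypothetical, simplified names for illustration: the caller does the alignment calculation, so the buffer implementation only needs a plain resize.

```rust
// Hypothetical, simplified buffer trait: the implementation only has to
// resize, and never needs to know about alignment.
trait Buffer {
    fn len(&self) -> usize;
    fn resize(&mut self, new_len: usize);
}

impl Buffer for Vec<u8> {
    fn len(&self) -> usize {
        Vec::len(self)
    }
    fn resize(&mut self, new_len: usize) {
        // Fully-qualified call to the inherent Vec method, padding with 0.
        Vec::resize(self, new_len, 0)
    }
}

// Caller-side alignment calculation; `align` must be a power of two.
fn write_align(buf: &mut impl Buffer, align: usize) {
    let new_len = (buf.len() + align - 1) & !(align - 1);
    buf.resize(new_len);
}
```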

> The exact allocation of several vecs is just a perf optimization to avoid unnecessary reallocations that copy the full contents of the vec over to the new allocation, which is expensive for bigger vecs.

Those changes are fine.

> The stricter api contract is not necessary for StreamingBuffer, but it is necessary if you want to implement say an mmap based writer.

yurydelendik already used it for something like this in wasmtime (although I'm not sure those changes ever landed). It was the reason this trait was added. It isn't documented properly, but the contract has always been that we do a single reserve before writing.

@bjorn3
Contributor Author

bjorn3 commented Aug 8, 2021

> I don't see why that optimisation can't be done with the old resize. The first thing you do is calculate the new length, which is effectively the same as having the caller pass in the new length to resize. The reason I prefer to not have this change is because it requires the implementation to do the alignment calculation.

How do you then cheaply create a slice to write? You can't slice a promoted &'static [u8; 1024], as the caller may pass in any byte value, not just 0. Pretty much the only option is to put it on the stack and memset it to the required size, which is more expensive than a single slice operation.
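The chunked padding write being discussed could be sketched like this. The function name `pad_to_alignment` is hypothetical; the point is that slicing a constant zero array yields one `write` call per 1024 padding bytes without zeroing stack memory on every call.

```rust
use std::io::Write;

// Pad the output stream to `align` (a power of two), one write call per
// 1024 padding bytes. `len` tracks the number of bytes written so far.
fn pad_to_alignment<W: Write>(
    writer: &mut W,
    len: &mut usize,
    align: usize,
) -> std::io::Result<()> {
    // `[0; 1024]` is a constant, so this slice borrows static memory
    // rather than a stack buffer that would need a memset each call.
    const ZEROES: [u8; 1024] = [0; 1024];
    let new_len = (*len + align - 1) & !(align - 1);
    let mut remaining = new_len - *len;
    while remaining > 0 {
        let chunk = remaining.min(ZEROES.len());
        writer.write_all(&ZEROES[..chunk])?;
        remaining -= chunk;
    }
    *len = new_len;
    Ok(())
}
```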

@philipc
Contributor

philipc commented Aug 8, 2021

> How do you then cheaply create a slice to write? You can't slice a promoted &'static [u8; 1024] as the caller may pass in any byte value, not just 0. Pretty much the only option is to put it on the stack and memset it to the required size which is more expensive than a single slice operation.

Ok, so delete the value parameter to resize, since nothing used it.

Commit messages:
- This avoids unnecessary reallocations that need to copy the already written data over.
- In preparation for adding a streaming file writer
@bjorn3 bjorn3 force-pushed the writable_buffer_improvements branch from 5bc3945 to 4ecc720 Compare August 8, 2021 09:36
@bjorn3
Contributor Author

bjorn3 commented Aug 8, 2021

> Ok, so delete the value parameter to resize, since nothing used it.

Done

    fn reserve(&mut self, additional: usize) -> Result<(), ()> {
        self.reserve(additional);

    fn reserve(&mut self, size: usize) -> Result<(), ()> {
        assert_eq!(self.len(), 0, "Buffer must be empty");
Contributor
Suggested change:
-    assert_eq!(self.len(), 0, "Buffer must be empty");
+    debug_assert!(self.is_empty());


    #[inline]
    fn write_bytes(&mut self, val: &[u8]) {
        self.writer.write_all(val).unwrap();
Contributor
I'm not sure that unwrap is correct here.

Contributor Author
It isn't. I forgot to change this function to return std::io::Result<()>.

Contributor
It'd be nice if instead of returning a std::io::Result here, we cached the error value and ignored all future writes, and then returned the error value at the end.
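That error-caching idea could be sketched as below. `ErrorCachingWriter` and its method names are hypothetical, not the API that was eventually merged: the first io::Error is stored, later writes become no-ops, and the error is reported once at the end.

```rust
use std::io::{self, Write};

// Remembers the first write error; subsequent writes are ignored and the
// cached error is surfaced when the caller finishes.
struct ErrorCachingWriter<W> {
    writer: W,
    error: Option<io::Error>,
}

impl<W: Write> ErrorCachingWriter<W> {
    fn new(writer: W) -> Self {
        ErrorCachingWriter { writer, error: None }
    }

    // Infallible from the caller's point of view; failures are cached.
    fn write_bytes(&mut self, val: &[u8]) {
        if self.error.is_none() {
            if let Err(e) = self.writer.write_all(val) {
                self.error = Some(e);
            }
        }
    }

    // Returns the inner writer, or the first error encountered.
    fn finish(self) -> io::Result<W> {
        match self.error {
            Some(e) => Err(e),
            None => Ok(self.writer),
        }
    }
}
```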

@philipc
Contributor

philipc commented Aug 8, 2021

> Pretty much the only option is to put it on the stack and memset it to the required size which is more expensive than a single slice operation.

By the way, it becomes a memset anyway when using a BufWriter.

@bjorn3
Contributor Author

bjorn3 commented Aug 8, 2021

> By the way, it becomes a memset anyway when using a BufWriter.

In `&[0; 1024][..write_amt]`, the `[0; 1024]` should be promoted to a global, I think, so the slice references memory that is part of the .bss section.

@philipc
Contributor

philipc commented Aug 8, 2021

> In the &[0; 1024][..write_amt] the [0; 1024] should be promoted to a global I think, so the slice references memory part of the .bss section.

Yes, but if that is then immediately copied into a BufWriter's buffer, it optimises that memcpy into a memset, so the .bss is unused. It doesn't matter, I just wasn't aware of the .bss optimisation so I went exploring.

@philipc
Contributor

philipc commented Aug 13, 2021

This will probably conflict with #350 (please don't do the write result change until after it is merged).

@philipc philipc closed this in #369 Aug 30, 2021
@bjorn3 bjorn3 deleted the writable_buffer_improvements branch August 30, 2021 06:42