ReadBuf's API #94741

nrc · 2022-03-08T16:49:04Z

ReadBuf is unstable, part of the read-buf feature. I believe the API might be improved by some of the following ways (I've grouped methods by theme):

Construction

remove new (it should create an empty ReadBuf, but that is useless, I think)
Should implement From<&'a mut [u8]> for ReadBuf<'a> to create a ReadBuf from a byte slice (rather than new)
Should implement From<&'a mut [MaybeUninit<u8>]> for ReadBuf<'a> to create a ReadBuf from a byte slice (rather than uninit)

Length functions

Keep capacity
~~filled_len -> len to match Vec etc.~~
remove filled_len (use filled().len())
remove remaining (can use ~~capacity() - len() or~~ unfilled().len())
remove initialized_len (can use initialized().len())

Slicing functions

Keep filled, initialized, unfilled, uninitialized, and mut variants
~~Remove filled and filled_mut, replace by implementing Into<&'a mut [u8]> for ReadBuf<'a>~~

Initializing data

Remove initialize_unfilled and initialize_unfilled_to, unless these are super useful
- rationale, getting unfilled slice is easy, initialising is easy, so it seems the only benefit is tracking this
keep assume_init

Misc

Keep clear and append
Remove add_filled
Rename set_filled to assume_filled (to match assume_init)

The current API is documented at https://doc.rust-lang.org/nightly/std/io/struct.ReadBuf.html

This issue continues a discussion from Zulip (https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/ReadBuf.20API), there was not much discussion relevant to the points above.

The text was updated successfully, but these errors were encountered:

djc · 2022-03-08T16:55:09Z

filled_len -> len to match Vec etc.

Not sure about this one. If I convert a &mut [u8] to ReadBuf (with let mut buf = ReadBuf::from(slice);, say), suddenly the buf.len() is shorter than the slice.len(). IMO we should either stick with filled_len() or ditch it in favor of filled().len() (which is nicely symmetric with unfilled().len()).

Remove filled and filled_mut, replace by implementing Into<&'a mut [u8]> for ReadBuf<'a>

This one also doesn't make sense to me. filled() and filled_mut() reborrow the buffer from the ReadBuf, which is very different from giving away ownership of the ReadBuf in order to get a slice. IIRC it's at least somewhat useful to be able to get access to filled() while keeping the ReadBuf intact for further use.

joshtriplett · 2022-03-08T17:49:41Z

I agree that we shouldn't have unqualified methods like len affect part (not all) of the buffer; too much potential for confusion.

nrc · 2022-03-08T17:51:51Z

suddenly the buf.len() is shorter than the slice.len()

Do you think this will have practical problems? To me it seems that creating a ReadBuf from a slice means we change the semantics of how we interpret the slice and thus len could have a different value.

filled() and filled_mut() reborrow the buffer from the ReadBuf, which is very different from giving away ownership of the ReadBuf in order to get a slice. IIRC it's at least somewhat useful to be able to get access to filled() while keeping the ReadBuf intact for further use.

That makes sense, does Into<&'a mut [u8]> for &'a mut ReadBuf<'a> work better? (I had also previously suggested using Deref, though I think that is too magical) My underlying philosophy here is that only the user wants to get access to the filled slice, and if the user wants the filled slice, then the filled slice is the part the 'useful' part.

I agree that we shouldn't have unqualified methods like len affect part (not all) of the buffer; too much potential for confusion.

My analogy here is Vec where len is only the 'useful' part of the available buffer, and capacity is the whole part.

beepster4096 · 2022-03-08T19:40:37Z

I don't like having From/Into be the only way to do this. I think it's always better to have explicit methods of doing things, in addition to the conversion traits.

djc · 2022-03-13T15:45:18Z

Do you think this will have practical problems? To me it seems that creating a ReadBuf from a slice means we change the semantics of how we interpret the slice and thus len could have a different value.

I guess my concern is that there are two lengths in play here, the filled length and the initialized length, and because there are two it would be better to disambiguate between them explicitly. The same is not true for Vec, where all filled values are also initialized.

That makes sense, does Into<&'a mut [u8]> for &'a mut ReadBuf<'a> work better? (I had also previously suggested using Deref, though I think that is too magical) My underlying philosophy here is that only the user wants to get access to the filled slice, and if the user wants the filled slice, then the filled slice is the part the 'useful' part.

IMO this proposed Into argument loogs pretty convoluted and not at all intuitive. I don't think conversions into slices like this are very common? They would almost always be represented by a method instead. And to turn it the other way around, what is gained by changing this into an Into impl? Having methods is perfectly fine, their documentation is generally easier to locate and their name provides an explicit hint about what's going on.

My analogy here is Vec where len is only the 'useful' part of the available buffer, and capacity is the whole part.

See my point above about there being a third boundary in play in this case, vs two for Vec.

nrc · 2022-03-14T09:34:44Z

OK, I've removed the controversial changes: lets keep filled and not implement deref/into for getting the the filled part. I've suggested removing filled_len and using filled().len(). (fwiw, I think the distinction between filled and unfilled should not be important for the user (only to a function which is filling the buffer), so I think the analogy to Vec holds, but operating directly on the slices seems fine).

I'm still unsure about initialize_unfilled - I'd like to remove it since it seems like a weird mashing together of functionality, however, I can also believe it might be convenient.

std::io: Modify some ReadBuf method signatures to return `&mut Self` This allows using `ReadBuf` in a builder-like style and to setup a `ReadBuf` and pass it to `read_buf` in a single expression, e.g., ``` // With this PR: reader.read_buf(ReadBuf::uninit(buf).assume_init(init_len))?; // Previously: let mut buf = ReadBuf::uninit(buf); buf.assume_init(init_len); reader.read_buf(&mut buf)?; ``` r? `@sfackler` cc rust-lang#78485, rust-lang#94741

std::io: migrate ReadBuf to BorrowBuf/BorrowCursor This PR replaces `ReadBuf` (used by the `Read::read_buf` family of methods) with `BorrowBuf` and `BorrowCursor`. The general idea is to split `ReadBuf` because its API is large and confusing. `BorrowBuf` represents a borrowed buffer which is mostly read-only and (other than for construction) deals only with filled vs unfilled segments. a `BorrowCursor` is a mostly write-only view of the unfilled part of a `BorrowBuf` which distinguishes between initialized and uninitialized segments. For `Read::read_buf`, the caller would create a `BorrowBuf`, then pass a `BorrowCursor` to `read_buf`. In addition to the major API split, I've made the following smaller changes: * Removed some methods entirely from the API (mostly the functionality can be replicated with two calls rather than a single one) * Unified naming, e.g., by replacing initialized with init and assume_init with set_init * Added an easy way to get the number of bytes written to a cursor (`written` method) As well as simplifying the API (IMO), this approach has the following advantages: * Since we pass the cursor by value, we remove the 'unsoundness footgun' where a malicious `read_buf` could swap out the `ReadBuf`. * Since `read_buf` cannot write into the filled part of the buffer, we prevent the filled part shrinking or changing which could cause underflow for the caller or unexpected behaviour. ## Outline ```rust pub struct BorrowBuf<'a> impl Debug for BorrowBuf<'_> impl<'a> From<&'a mut [u8]> for BorrowBuf<'a> impl<'a> From<&'a mut [MaybeUninit<u8>]> for BorrowBuf<'a> impl<'a> BorrowBuf<'a> { pub fn capacity(&self) -> usize pub fn len(&self) -> usize pub fn init_len(&self) -> usize pub fn filled(&self) -> &[u8] pub fn unfilled<'this>(&'this mut self) -> BorrowCursor<'this, 'a> pub fn clear(&mut self) -> &mut Self pub unsafe fn set_init(&mut self, n: usize) -> &mut Self } pub struct BorrowCursor<'buf, 'data> impl<'buf, 'data> BorrowCursor<'buf, 'data> { pub fn clone<'this>(&'this mut self) -> BorrowCursor<'this, 'data> pub fn capacity(&self) -> usize pub fn written(&self) -> usize pub fn init_ref(&self) -> &[u8] pub fn init_mut(&mut self) -> &mut [u8] pub fn uninit_mut(&mut self) -> &mut [MaybeUninit<u8>] pub unsafe fn as_mut(&mut self) -> &mut [MaybeUninit<u8>] pub unsafe fn advance(&mut self, n: usize) -> &mut Self pub fn ensure_init(&mut self) -> &mut Self pub unsafe fn set_init(&mut self, n: usize) -> &mut Self pub fn append(&mut self, buf: &[u8]) } ``` ## TODO * ~~Migrate non-unix libs and tests~~ * ~~Naming~~ * ~~`BorrowBuf` or `BorrowedBuf` or `SliceBuf`? (We might want an owned equivalent for the async IO traits)~~ * ~~Should we rename the `readbuf` module? We might keep the name indicate it includes both the buf and cursor variations and someday the owned version too. Or we could change it. It is not publicly exposed, so it is not that important~~. * ~~`read_buf` method: we read into the cursor now, so the `_buf` suffix is a bit weird.~~ * ~~Documentation~~ * Tests are incomplete (I adjusted existing tests, but did not add new ones). cc rust-lang#78485, rust-lang#94741 supersedes: rust-lang#95770, rust-lang#93359 fixes rust-lang#93305

std::io: Modify some ReadBuf method signatures to return `&mut Self` This allows using `ReadBuf` in a builder-like style and to setup a `ReadBuf` and pass it to `read_buf` in a single expression, e.g., ``` // With this PR: reader.read_buf(ReadBuf::uninit(buf).assume_init(init_len))?; // Previously: let mut buf = ReadBuf::uninit(buf); buf.assume_init(init_len); reader.read_buf(&mut buf)?; ``` r? `@sfackler` cc rust-lang/rust#78485, rust-lang/rust#94741

std::io: migrate ReadBuf to BorrowBuf/BorrowCursor This PR replaces `ReadBuf` (used by the `Read::read_buf` family of methods) with `BorrowBuf` and `BorrowCursor`. The general idea is to split `ReadBuf` because its API is large and confusing. `BorrowBuf` represents a borrowed buffer which is mostly read-only and (other than for construction) deals only with filled vs unfilled segments. a `BorrowCursor` is a mostly write-only view of the unfilled part of a `BorrowBuf` which distinguishes between initialized and uninitialized segments. For `Read::read_buf`, the caller would create a `BorrowBuf`, then pass a `BorrowCursor` to `read_buf`. In addition to the major API split, I've made the following smaller changes: * Removed some methods entirely from the API (mostly the functionality can be replicated with two calls rather than a single one) * Unified naming, e.g., by replacing initialized with init and assume_init with set_init * Added an easy way to get the number of bytes written to a cursor (`written` method) As well as simplifying the API (IMO), this approach has the following advantages: * Since we pass the cursor by value, we remove the 'unsoundness footgun' where a malicious `read_buf` could swap out the `ReadBuf`. * Since `read_buf` cannot write into the filled part of the buffer, we prevent the filled part shrinking or changing which could cause underflow for the caller or unexpected behaviour. ## Outline ```rust pub struct BorrowBuf<'a> impl Debug for BorrowBuf<'_> impl<'a> From<&'a mut [u8]> for BorrowBuf<'a> impl<'a> From<&'a mut [MaybeUninit<u8>]> for BorrowBuf<'a> impl<'a> BorrowBuf<'a> { pub fn capacity(&self) -> usize pub fn len(&self) -> usize pub fn init_len(&self) -> usize pub fn filled(&self) -> &[u8] pub fn unfilled<'this>(&'this mut self) -> BorrowCursor<'this, 'a> pub fn clear(&mut self) -> &mut Self pub unsafe fn set_init(&mut self, n: usize) -> &mut Self } pub struct BorrowCursor<'buf, 'data> impl<'buf, 'data> BorrowCursor<'buf, 'data> { pub fn clone<'this>(&'this mut self) -> BorrowCursor<'this, 'data> pub fn capacity(&self) -> usize pub fn written(&self) -> usize pub fn init_ref(&self) -> &[u8] pub fn init_mut(&mut self) -> &mut [u8] pub fn uninit_mut(&mut self) -> &mut [MaybeUninit<u8>] pub unsafe fn as_mut(&mut self) -> &mut [MaybeUninit<u8>] pub unsafe fn advance(&mut self, n: usize) -> &mut Self pub fn ensure_init(&mut self) -> &mut Self pub unsafe fn set_init(&mut self, n: usize) -> &mut Self pub fn append(&mut self, buf: &[u8]) } ``` ## TODO * ~~Migrate non-unix libs and tests~~ * ~~Naming~~ * ~~`BorrowBuf` or `BorrowedBuf` or `SliceBuf`? (We might want an owned equivalent for the async IO traits)~~ * ~~Should we rename the `readbuf` module? We might keep the name indicate it includes both the buf and cursor variations and someday the owned version too. Or we could change it. It is not publicly exposed, so it is not that important~~. * ~~`read_buf` method: we read into the cursor now, so the `_buf` suffix is a bit weird.~~ * ~~Documentation~~ * Tests are incomplete (I adjusted existing tests, but did not add new ones). cc rust-lang/rust#78485, rust-lang/rust#94741 supersedes: rust-lang/rust#95770, rust-lang/rust#93359 fixes #93305

nrc added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Mar 8, 2022

nrc mentioned this issue Apr 7, 2022

std::io: Modify some ReadBuf method signatures to return &mut Self #95770

Merged

nrc mentioned this issue May 13, 2022

std::io: migrate ReadBuf to BorrowedBuf/BorrowedCursor #97015

Merged

Enselic added the A-io Area: std::io, std::fs, std::net and std::path label Mar 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ReadBuf's API #94741

ReadBuf's API #94741

nrc commented Mar 8, 2022 •

edited

Loading

djc commented Mar 8, 2022

joshtriplett commented Mar 8, 2022

nrc commented Mar 8, 2022

beepster4096 commented Mar 8, 2022

djc commented Mar 13, 2022

nrc commented Mar 14, 2022

ReadBuf's API #94741

ReadBuf's API #94741

Comments

nrc commented Mar 8, 2022 • edited Loading

Construction

Length functions

Slicing functions

Initializing data

Misc

djc commented Mar 8, 2022

joshtriplett commented Mar 8, 2022

nrc commented Mar 8, 2022

beepster4096 commented Mar 8, 2022

djc commented Mar 13, 2022

nrc commented Mar 14, 2022

nrc commented Mar 8, 2022 •

edited

Loading