Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReadBuf's API #94741

Open
nrc opened this issue Mar 8, 2022 · 6 comments
Open

ReadBuf's API #94741

nrc opened this issue Mar 8, 2022 · 6 comments
Labels
A-io Area: std::io, std::fs, std::net and std::path T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@nrc
Copy link
Member

nrc commented Mar 8, 2022

cc #78485

ReadBuf is unstable, part of the read-buf feature. I believe the API might be improved by some of the following ways (I've grouped methods by theme):

Construction

  • remove new (it should create an empty ReadBuf, but that is useless, I think)
  • Should implement From<&'a mut [u8]> for ReadBuf<'a> to create a ReadBuf from a byte slice (rather than new)
  • Should implement From<&'a mut [MaybeUninit<u8>]> for ReadBuf<'a> to create a ReadBuf from a byte slice (rather than uninit)

Length functions

  • Keep capacity
  • filled_len -> len to match Vec etc.
  • remove filled_len (use filled().len())
  • remove remaining (can use capacity() - len() or unfilled().len())
  • remove initialized_len (can use initialized().len())

Slicing functions

  • Keep filled, initialized, unfilled, uninitialized, and mut variants
  • Remove filled and filled_mut, replace by implementing Into<&'a mut [u8]> for ReadBuf<'a>

Initializing data

  • Remove initialize_unfilled and initialize_unfilled_to, unless these are super useful
    • rationale, getting unfilled slice is easy, initialising is easy, so it seems the only benefit is tracking this
  • keep assume_init

Misc

  • Keep clear and append
  • Remove add_filled
  • Rename set_filled to assume_filled (to match assume_init)

The current API is documented at https://doc.rust-lang.org/nightly/std/io/struct.ReadBuf.html

This issue continues a discussion from Zulip (https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/ReadBuf.20API), there was not much discussion relevant to the points above.

@nrc nrc added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Mar 8, 2022
@djc
Copy link
Contributor

djc commented Mar 8, 2022

filled_len -> len to match Vec etc.

Not sure about this one. If I convert a &mut [u8] to ReadBuf (with let mut buf = ReadBuf::from(slice);, say), suddenly the buf.len() is shorter than the slice.len(). IMO we should either stick with filled_len() or ditch it in favor of filled().len() (which is nicely symmetric with unfilled().len()).

Remove filled and filled_mut, replace by implementing Into<&'a mut [u8]> for ReadBuf<'a>

This one also doesn't make sense to me. filled() and filled_mut() reborrow the buffer from the ReadBuf, which is very different from giving away ownership of the ReadBuf in order to get a slice. IIRC it's at least somewhat useful to be able to get access to filled() while keeping the ReadBuf intact for further use.

@joshtriplett
Copy link
Member

I agree that we shouldn't have unqualified methods like len affect part (not all) of the buffer; too much potential for confusion.

@nrc
Copy link
Member Author

nrc commented Mar 8, 2022

suddenly the buf.len() is shorter than the slice.len()

Do you think this will have practical problems? To me it seems that creating a ReadBuf from a slice means we change the semantics of how we interpret the slice and thus len could have a different value.

filled() and filled_mut() reborrow the buffer from the ReadBuf, which is very different from giving away ownership of the ReadBuf in order to get a slice. IIRC it's at least somewhat useful to be able to get access to filled() while keeping the ReadBuf intact for further use.

That makes sense, does Into<&'a mut [u8]> for &'a mut ReadBuf<'a> work better? (I had also previously suggested using Deref, though I think that is too magical) My underlying philosophy here is that only the user wants to get access to the filled slice, and if the user wants the filled slice, then the filled slice is the part the 'useful' part.

I agree that we shouldn't have unqualified methods like len affect part (not all) of the buffer; too much potential for confusion.

My analogy here is Vec where len is only the 'useful' part of the available buffer, and capacity is the whole part.

@beepster4096
Copy link
Contributor

I don't like having From/Into be the only way to do this. I think it's always better to have explicit methods of doing things, in addition to the conversion traits.

@djc
Copy link
Contributor

djc commented Mar 13, 2022

Do you think this will have practical problems? To me it seems that creating a ReadBuf from a slice means we change the semantics of how we interpret the slice and thus len could have a different value.

I guess my concern is that there are two lengths in play here, the filled length and the initialized length, and because there are two it would be better to disambiguate between them explicitly. The same is not true for Vec, where all filled values are also initialized.

That makes sense, does Into<&'a mut [u8]> for &'a mut ReadBuf<'a> work better? (I had also previously suggested using Deref, though I think that is too magical) My underlying philosophy here is that only the user wants to get access to the filled slice, and if the user wants the filled slice, then the filled slice is the part the 'useful' part.

IMO this proposed Into argument loogs pretty convoluted and not at all intuitive. I don't think conversions into slices like this are very common? They would almost always be represented by a method instead. And to turn it the other way around, what is gained by changing this into an Into impl? Having methods is perfectly fine, their documentation is generally easier to locate and their name provides an explicit hint about what's going on.

My analogy here is Vec where len is only the 'useful' part of the available buffer, and capacity is the whole part.

See my point above about there being a third boundary in play in this case, vs two for Vec.

@nrc
Copy link
Member Author

nrc commented Mar 14, 2022

OK, I've removed the controversial changes: lets keep filled and not implement deref/into for getting the the filled part. I've suggested removing filled_len and using filled().len(). (fwiw, I think the distinction between filled and unfilled should not be important for the user (only to a function which is filling the buffer), so I think the analogy to Vec holds, but operating directly on the slices seems fine).

I'm still unsure about initialize_unfilled - I'd like to remove it since it seems like a weird mashing together of functionality, however, I can also believe it might be convenient.

compiler-errors added a commit to compiler-errors/rust that referenced this issue Jun 9, 2022
std::io: Modify some ReadBuf method signatures to return `&mut Self`

This allows using `ReadBuf` in a builder-like style and to setup a `ReadBuf` and
pass it to `read_buf` in a single expression, e.g.,

```
// With this PR:
reader.read_buf(ReadBuf::uninit(buf).assume_init(init_len))?;

// Previously:
let mut buf = ReadBuf::uninit(buf);
buf.assume_init(init_len);
reader.read_buf(&mut buf)?;
```

r? `@sfackler`

cc rust-lang#78485, rust-lang#94741
bors added a commit to rust-lang-ci/rust that referenced this issue Jun 10, 2022
std::io: Modify some ReadBuf method signatures to return `&mut Self`

This allows using `ReadBuf` in a builder-like style and to setup a `ReadBuf` and
pass it to `read_buf` in a single expression, e.g.,

```
// With this PR:
reader.read_buf(ReadBuf::uninit(buf).assume_init(init_len))?;

// Previously:
let mut buf = ReadBuf::uninit(buf);
buf.assume_init(init_len);
reader.read_buf(&mut buf)?;
```

r? `@sfackler`

cc rust-lang#78485, rust-lang#94741
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 28, 2022
std::io: migrate ReadBuf to BorrowBuf/BorrowCursor

This PR replaces `ReadBuf` (used by the `Read::read_buf` family of methods) with `BorrowBuf` and `BorrowCursor`.

The general idea is to split `ReadBuf` because its API is large and confusing. `BorrowBuf` represents a borrowed buffer which is mostly read-only and (other than for construction) deals only with filled vs unfilled segments. a `BorrowCursor` is a mostly write-only view of the unfilled part of a `BorrowBuf` which distinguishes between initialized and uninitialized segments. For `Read::read_buf`, the caller would create a `BorrowBuf`, then pass a `BorrowCursor` to `read_buf`.

In addition to the major API split, I've made the following smaller changes:

* Removed some methods entirely from the API (mostly the functionality can be replicated with two calls rather than a single one)
* Unified naming, e.g., by replacing initialized with init and assume_init with set_init
* Added an easy way to get the number of bytes written to a cursor (`written` method)

As well as simplifying the API (IMO), this approach has the following advantages:

* Since we pass the cursor by value, we remove the 'unsoundness footgun' where a malicious `read_buf` could swap out the `ReadBuf`.
* Since `read_buf` cannot write into the filled part of the buffer, we prevent the filled part shrinking or changing which could cause underflow for the caller or unexpected behaviour.

## Outline

```rust
pub struct BorrowBuf<'a>

impl Debug for BorrowBuf<'_>

impl<'a> From<&'a mut [u8]> for BorrowBuf<'a>
impl<'a> From<&'a mut [MaybeUninit<u8>]> for BorrowBuf<'a>

impl<'a> BorrowBuf<'a> {
    pub fn capacity(&self) -> usize
    pub fn len(&self) -> usize
    pub fn init_len(&self) -> usize
    pub fn filled(&self) -> &[u8]
    pub fn unfilled<'this>(&'this mut self) -> BorrowCursor<'this, 'a>
    pub fn clear(&mut self) -> &mut Self
    pub unsafe fn set_init(&mut self, n: usize) -> &mut Self
}

pub struct BorrowCursor<'buf, 'data>

impl<'buf, 'data> BorrowCursor<'buf, 'data> {
    pub fn clone<'this>(&'this mut self) -> BorrowCursor<'this, 'data>
    pub fn capacity(&self) -> usize
    pub fn written(&self) -> usize
    pub fn init_ref(&self) -> &[u8]
    pub fn init_mut(&mut self) -> &mut [u8]
    pub fn uninit_mut(&mut self) -> &mut [MaybeUninit<u8>]
    pub unsafe fn as_mut(&mut self) -> &mut [MaybeUninit<u8>]
    pub unsafe fn advance(&mut self, n: usize) -> &mut Self
    pub fn ensure_init(&mut self) -> &mut Self
    pub unsafe fn set_init(&mut self, n: usize) -> &mut Self
    pub fn append(&mut self, buf: &[u8])
}
```

## TODO

* ~~Migrate non-unix libs and tests~~
* ~~Naming~~
  * ~~`BorrowBuf` or `BorrowedBuf` or `SliceBuf`? (We might want an owned equivalent for the async IO traits)~~
  * ~~Should we rename the `readbuf` module? We might keep the name indicate it includes both the buf and cursor variations and someday the owned version too. Or we could change it. It is not publicly exposed, so it is not that important~~.
  * ~~`read_buf` method: we read into the cursor now, so the `_buf` suffix is a bit weird.~~
* ~~Documentation~~
* Tests are incomplete (I adjusted existing tests, but did not add new ones).

cc rust-lang#78485, rust-lang#94741
supersedes: rust-lang#95770, rust-lang#93359
fixes rust-lang#93305
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 28, 2022
std::io: migrate ReadBuf to BorrowBuf/BorrowCursor

This PR replaces `ReadBuf` (used by the `Read::read_buf` family of methods) with `BorrowBuf` and `BorrowCursor`.

The general idea is to split `ReadBuf` because its API is large and confusing. `BorrowBuf` represents a borrowed buffer which is mostly read-only and (other than for construction) deals only with filled vs unfilled segments. a `BorrowCursor` is a mostly write-only view of the unfilled part of a `BorrowBuf` which distinguishes between initialized and uninitialized segments. For `Read::read_buf`, the caller would create a `BorrowBuf`, then pass a `BorrowCursor` to `read_buf`.

In addition to the major API split, I've made the following smaller changes:

* Removed some methods entirely from the API (mostly the functionality can be replicated with two calls rather than a single one)
* Unified naming, e.g., by replacing initialized with init and assume_init with set_init
* Added an easy way to get the number of bytes written to a cursor (`written` method)

As well as simplifying the API (IMO), this approach has the following advantages:

* Since we pass the cursor by value, we remove the 'unsoundness footgun' where a malicious `read_buf` could swap out the `ReadBuf`.
* Since `read_buf` cannot write into the filled part of the buffer, we prevent the filled part shrinking or changing which could cause underflow for the caller or unexpected behaviour.

## Outline

```rust
pub struct BorrowBuf<'a>

impl Debug for BorrowBuf<'_>

impl<'a> From<&'a mut [u8]> for BorrowBuf<'a>
impl<'a> From<&'a mut [MaybeUninit<u8>]> for BorrowBuf<'a>

impl<'a> BorrowBuf<'a> {
    pub fn capacity(&self) -> usize
    pub fn len(&self) -> usize
    pub fn init_len(&self) -> usize
    pub fn filled(&self) -> &[u8]
    pub fn unfilled<'this>(&'this mut self) -> BorrowCursor<'this, 'a>
    pub fn clear(&mut self) -> &mut Self
    pub unsafe fn set_init(&mut self, n: usize) -> &mut Self
}

pub struct BorrowCursor<'buf, 'data>

impl<'buf, 'data> BorrowCursor<'buf, 'data> {
    pub fn clone<'this>(&'this mut self) -> BorrowCursor<'this, 'data>
    pub fn capacity(&self) -> usize
    pub fn written(&self) -> usize
    pub fn init_ref(&self) -> &[u8]
    pub fn init_mut(&mut self) -> &mut [u8]
    pub fn uninit_mut(&mut self) -> &mut [MaybeUninit<u8>]
    pub unsafe fn as_mut(&mut self) -> &mut [MaybeUninit<u8>]
    pub unsafe fn advance(&mut self, n: usize) -> &mut Self
    pub fn ensure_init(&mut self) -> &mut Self
    pub unsafe fn set_init(&mut self, n: usize) -> &mut Self
    pub fn append(&mut self, buf: &[u8])
}
```

## TODO

* ~~Migrate non-unix libs and tests~~
* ~~Naming~~
  * ~~`BorrowBuf` or `BorrowedBuf` or `SliceBuf`? (We might want an owned equivalent for the async IO traits)~~
  * ~~Should we rename the `readbuf` module? We might keep the name indicate it includes both the buf and cursor variations and someday the owned version too. Or we could change it. It is not publicly exposed, so it is not that important~~.
  * ~~`read_buf` method: we read into the cursor now, so the `_buf` suffix is a bit weird.~~
* ~~Documentation~~
* Tests are incomplete (I adjusted existing tests, but did not add new ones).

cc rust-lang#78485, rust-lang#94741
supersedes: rust-lang#95770, rust-lang#93359
fixes rust-lang#93305
workingjubilee pushed a commit to tcdi/postgrestd that referenced this issue Sep 15, 2022
std::io: Modify some ReadBuf method signatures to return `&mut Self`

This allows using `ReadBuf` in a builder-like style and to setup a `ReadBuf` and
pass it to `read_buf` in a single expression, e.g.,

```
// With this PR:
reader.read_buf(ReadBuf::uninit(buf).assume_init(init_len))?;

// Previously:
let mut buf = ReadBuf::uninit(buf);
buf.assume_init(init_len);
reader.read_buf(&mut buf)?;
```

r? `@sfackler`

cc rust-lang/rust#78485, rust-lang/rust#94741
workingjubilee pushed a commit to tcdi/postgrestd that referenced this issue Sep 15, 2022
std::io: migrate ReadBuf to BorrowBuf/BorrowCursor

This PR replaces `ReadBuf` (used by the `Read::read_buf` family of methods) with `BorrowBuf` and `BorrowCursor`.

The general idea is to split `ReadBuf` because its API is large and confusing. `BorrowBuf` represents a borrowed buffer which is mostly read-only and (other than for construction) deals only with filled vs unfilled segments. a `BorrowCursor` is a mostly write-only view of the unfilled part of a `BorrowBuf` which distinguishes between initialized and uninitialized segments. For `Read::read_buf`, the caller would create a `BorrowBuf`, then pass a `BorrowCursor` to `read_buf`.

In addition to the major API split, I've made the following smaller changes:

* Removed some methods entirely from the API (mostly the functionality can be replicated with two calls rather than a single one)
* Unified naming, e.g., by replacing initialized with init and assume_init with set_init
* Added an easy way to get the number of bytes written to a cursor (`written` method)

As well as simplifying the API (IMO), this approach has the following advantages:

* Since we pass the cursor by value, we remove the 'unsoundness footgun' where a malicious `read_buf` could swap out the `ReadBuf`.
* Since `read_buf` cannot write into the filled part of the buffer, we prevent the filled part shrinking or changing which could cause underflow for the caller or unexpected behaviour.

## Outline

```rust
pub struct BorrowBuf<'a>

impl Debug for BorrowBuf<'_>

impl<'a> From<&'a mut [u8]> for BorrowBuf<'a>
impl<'a> From<&'a mut [MaybeUninit<u8>]> for BorrowBuf<'a>

impl<'a> BorrowBuf<'a> {
    pub fn capacity(&self) -> usize
    pub fn len(&self) -> usize
    pub fn init_len(&self) -> usize
    pub fn filled(&self) -> &[u8]
    pub fn unfilled<'this>(&'this mut self) -> BorrowCursor<'this, 'a>
    pub fn clear(&mut self) -> &mut Self
    pub unsafe fn set_init(&mut self, n: usize) -> &mut Self
}

pub struct BorrowCursor<'buf, 'data>

impl<'buf, 'data> BorrowCursor<'buf, 'data> {
    pub fn clone<'this>(&'this mut self) -> BorrowCursor<'this, 'data>
    pub fn capacity(&self) -> usize
    pub fn written(&self) -> usize
    pub fn init_ref(&self) -> &[u8]
    pub fn init_mut(&mut self) -> &mut [u8]
    pub fn uninit_mut(&mut self) -> &mut [MaybeUninit<u8>]
    pub unsafe fn as_mut(&mut self) -> &mut [MaybeUninit<u8>]
    pub unsafe fn advance(&mut self, n: usize) -> &mut Self
    pub fn ensure_init(&mut self) -> &mut Self
    pub unsafe fn set_init(&mut self, n: usize) -> &mut Self
    pub fn append(&mut self, buf: &[u8])
}
```

## TODO

* ~~Migrate non-unix libs and tests~~
* ~~Naming~~
  * ~~`BorrowBuf` or `BorrowedBuf` or `SliceBuf`? (We might want an owned equivalent for the async IO traits)~~
  * ~~Should we rename the `readbuf` module? We might keep the name indicate it includes both the buf and cursor variations and someday the owned version too. Or we could change it. It is not publicly exposed, so it is not that important~~.
  * ~~`read_buf` method: we read into the cursor now, so the `_buf` suffix is a bit weird.~~
* ~~Documentation~~
* Tests are incomplete (I adjusted existing tests, but did not add new ones).

cc rust-lang/rust#78485, rust-lang/rust#94741
supersedes: rust-lang/rust#95770, rust-lang/rust#93359
fixes #93305
@Enselic Enselic added the A-io Area: std::io, std::fs, std::net and std::path label Mar 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-io Area: std::io, std::fs, std::net and std::path T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants