Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for Read::initializer #42788

Open
sfackler opened this issue Jun 21, 2017 · 20 comments · May be fixed by #81156
Open

Tracking issue for Read::initializer #42788

sfackler opened this issue Jun 21, 2017 · 20 comments · May be fixed by #81156

Comments

@sfackler
Copy link
Member

@sfackler sfackler commented Jun 21, 2017

Implemented in #42002.

Open questions / blockers

  • The code currently creates references to uninitialized integers (grep for 42788), can we avoid that?
@alexcrichton
Copy link
Member

@alexcrichton alexcrichton commented Aug 29, 2017

I'm personally no longer sold on the infrastructure that we've got here I think. I was initially hoping that we'd have some safe way of working with these APIs and such, but in the end it all turned up unsafe. In light of all that I'd personally advocate for much simpler API:

impl Read {
    #[inline]
    unsafe fn may_read_buffer(&self) -> bool { true }
}

I don't think the complexity we have today is worth the cost currently, and I'd imagine that we could start making abstractions like:

fn initialize_buffer_for(r: &Read, buf: &mut [u8]) {
    // ...
}
@tbu-
Copy link
Contributor

@tbu- tbu- commented Dec 13, 2017

I wrote a tiny library for reading into uninitialized buffers: https://crates.io/crates/buffer/

Maybe its ideas could be adapted for the standard library. The following is safe code using the library:

let mut vec = Vec::with_capacity(1024);
if reader.read_buffer(&mut vec)?.len() != 0 {
    if vec[0] == 0 {
        // ...
    }
}

Adoption of something like this in the standard library would probably also help with adoption by third party crates, see e.g. tokio-rs/mio#628.

@seanmonstar
Copy link
Contributor

@seanmonstar seanmonstar commented Feb 21, 2018

An alternative to the current API is to go along the path that TrustedLen did, which is just to have an unsafe trait.

unsafe trait ReadUninit: Read {}

With specialization landing, anyone who cares to can take advantage of ReadUinit:

impl<T: Read> ReadBuf for T {
    fn read_buf<B: BufMut>(&mut self, buf: &mut B) {
        self.read(buf.zeroed())
    }
}

impl<T: ReadUninit> ReadBuf for T {
    fn read_buf<B: BufMut>(&mut self, buf: &mut B) {
        self.read(buf.bytes())
    }
}
@sfackler
Copy link
Member Author

@sfackler sfackler commented Feb 21, 2018

That's incompatible with trait objects which are used pretty heavily with Read in particular.

@seanmonstar
Copy link
Contributor

@seanmonstar seanmonstar commented Feb 21, 2018

Specialization is incompatible with trait objects?

@sfackler
Copy link
Member Author

@sfackler sfackler commented Feb 21, 2018

If I have a &mut Read, I have no way of checking if it implements ReadUninit or not.

@jethrogb
Copy link
Contributor

@jethrogb jethrogb commented Jan 28, 2019

@RalfJung suggested a function that takes &mut [MaybeUninit<u8>]

@alexcrichton
Copy link
Member

@alexcrichton alexcrichton commented Apr 10, 2019

@sfackler do you recall if we have a tracking issue for the alternative method of implementing this where we hint to the compiler that a buffer shouldn't be considered as undefined? (the strategy we discussed at the last work week)

@sfackler
Copy link
Member Author

@sfackler sfackler commented Apr 10, 2019

There's a bunch of discussion in the PR adding it: #58363.

It turns out that we explicitly decided to not go in that direction in the lead up to 1.0, so I think it's worth an RFC that I haven't gotten around to writing. In particular, that approach can change behavior for user-provided Read implementations that are passed to e.g. io::copy even before it's stable.

@abonander
Copy link
Contributor

@abonander abonander commented Jul 10, 2019

I second the proposal for &mut [MaybeUninit<u8>] especially since the API design with MaybeUninit seems to be going in that direction anyway; there's just a few methods that I think need to be added to make this work nicely:

// std::mem
impl<T> MaybeUninit<T> {
    pub fn init_all(uninit: &mut [MaybeUninit<T>], val: T) -> &mut [T]  where T: Copy { // `T` could even be `Clone` since this method is panic-safe except for leaks
        ... // write `val` to all elements
        Self::as_init_all(uninit)
    }

    pub fn init_copy(uninit: &mut [MaybeUninit<T>], data: &[T]) -> &mut [T] where T: Copy { // `T` could also be `Clone`
        assert_eq!(uninit.len(), data.len());
        unsafe {
            Self::as_first_ptr(uninit).copy(data, uninit.len());
        }
        Self::as_init_all(uninit)
    }

    pub unsafe fn as_init_all(uninit: &mut [MaybeUninit<T>] -> &mut [T] { 
        ... // assume-initialized intrinsic
        mem::transmute(uninit)
    }
}

// std::io
pub trait Read {
    // ... existing methods

    fn read_uninit(&mut self, uninit: &mut [MaybeUninit<u8>]) -> io::Result<()> {
        self.read(MaybeUninit::init_all(uninit, 0))
    }
}

Then implementations that don't read the buffer can override read_uninit() and use MaybeUninit::as_first_ptr() to pass it to a syscall or copy to it from another buffer with init_copy().

@xfix
Copy link
Contributor

@xfix xfix commented Jul 31, 2019

I wonder about having a method like the following in Read.

fn read_uninit<'a>(&mut self, buf: &'a mut [MaybeUninit<u8>]) -> Result<&'a mut [u8]> {
    for byte in &mut *buf {
        unsafe {
            *byte.as_mut_ptr() = 0;
        }
    }
    let initialized = unsafe { mem::transmute::<&mut [MaybeUninit<u8>], &mut [u8]>(buf) };
    let size = self.read(initialized)?;
    Ok(&mut initialized[..size])
}

The reason for it returning Result<&'a mut [u8]> is to allow using this in safe code as well as preventing malicious implementations - if Result<usize> was returned, as it is a case for other Read methods, an implementation could just say Ok(42) and watch the world burn when trying to interpret the uninitialized bytes. It is the responsibility of an implementation to ensure the mutable slice it returns is initialized.

The choice of 0 assignment is somewhat arbitrary. It can be some other constant or frozen uninitialized memory, as long it doesn't outright cause UB it's fine.

Note that such an implementation doesn't ensure that that result.as_ptr() == MaybeUninit::first_ptr(buf). Unsafe implementations that depend on this should check for this manually. This is necessary, as implementations could return values like Box::leak(vec![1, 2, 3].into_boxed_slice()) otherwise, which isn't verified by the compiler. Optimizer should be able to remove this assertion most of the time.

@sfackler
Copy link
Member Author

@sfackler sfackler commented Jul 31, 2019

The default implementation will make that pretty hard to use in practice, unless you already know that your Read implementation overrides it. You're commonly calling read with a large buffer and don't expect it to be entirely filled, so writing to the entire thing before the raw read call imposes a very large performance penalty. There's an issue somewhere on Tokio's issue tracker about a similar issue with their AsyncRead::read_buf method.

@SimonSapin
Copy link
Contributor

@SimonSapin SimonSapin commented Aug 1, 2019

Maybe this, together with no_std compat, makes it worth considering an io2 module with entirely separate traits?

@seanmonstar
Copy link
Contributor

@seanmonstar seanmonstar commented Aug 1, 2019

I've been thinking for a few months now that if I could redo them, I wouldn't pass byte slices, but some trait object probably (like &mut dyn Buf), so that new concepts can be built and still be compatible.

@SimonSapin
Copy link
Contributor

@SimonSapin SimonSapin commented Aug 1, 2019

What examples of new concepts do you have in mind?

@seanmonstar
Copy link
Contributor

@seanmonstar seanmonstar commented Aug 2, 2019

So far, there's 2 that we use extensively: vectored IO, and uninitialized buffers. Using these features require an extra method to be implemented on Read/Write, and both are critical for performance. A problem we've seen is that any read/write wrappers that forget to forward the extra methods suddenly lose on performance, and it's not obvious why.

If there were a single read and write method, where a wrapper would forward to an inner type, the ability to use uninitialized memory and vectored IO would only depend on the dyn Buf and the innermost Read/Write. For similar inspiration, there's ByteBuffer in Java's NIO and ByteBuf in Netty, which allow for new buffer strategies to be added instead of requiring &[u8].

@abonander
Copy link
Contributor

@abonander abonander commented Aug 28, 2019

Can't we address the issue of wrappers not forwarding with a lint? It's not bulletproof but anyone who cares should see the warning. The bonus of a lint is that it can be applied to other traits with similar situations, like Iterator (and even AsyncRead if it's exposed).

The problem with introducing a new trait/module is that it fragments the ecosystem and early adopters suffer from a lack of widespread support. There's plenty of examples of this. I still don't see a whole lot of use of Java's NIO myself, but then again I don't really work in enterprise software. I'm not entirely opposed to the idea myself, but it would definitely need a concerted push for adoption.

Now, if Rust ever decides to bring async I/O into the stdlib, it could definitely make the transition then as there would be no competing implementation.

@danielhenrymantilla
Copy link
Contributor

@danielhenrymantilla danielhenrymantilla commented Sep 3, 2019

Getting rid of all references to uninitialized bytes in Read

Uninitialized buffers should now be using [MaybeUninit<u8>]. An example of a Read-like API with them is featured in the ::uninit crate. Together with Vec::reserve_uninit it leads to a safe and readable pattern.

The only thing lacking in my crate is an added try_downcast method for Read:

#![allow(bad_style)] // These would not be the final names

trait Read {
    // ...
    //! Added method:

    /// To be overriden by implementors of `ReadIntoUninit`
    #[inline]
    fn try_downcast_to_ReadIntoUninit (self: &'_ mut Self)
        -> Option<&'_ mut dyn ReadIntoUninit>
    {
        None
    }
}

This way, for instance, the unsafe code at https://doc.rust-lang.org/1.37.0/src/std/io/mod.rs.html#355-374 could be removed, and we could have something using DEFAULT_BUF_SIZE for a specialized (R : ReadIntoUninit + ?Sized) version of fn read_to_end<R: Read + ?Sized>(r: &mut R, buf: &mut Vec<u8>) -> Result<usize> { which would avoid using small allocations for the specialized case, and would fall back to zero-initialized small vec extentions in the current case (i.e., also getting rid of the unsafe, since the unsafe block would simply become g.buf.resize(g.buf.len() + 32, 0).

Users with a dyn Read could then try to upgrade it to dyn ReadIntoUninit to use the improved versions of .read_to_end().

Centril added a commit to Centril/rust that referenced this issue Sep 4, 2019
…ut_uninit_integers, r=Centril

Added warning around code with reference to uninit bytes

Officially, uninitialized integers, and therefore, Rust references to them are _invalid_ (note that this may evolve into official defined behavior (_c.f._, rust-lang/unsafe-code-guidelines#71)).

However, `::std` uses references to uninitialized integers when working with the `Read::initializer` feature (rust-lang#42788), since it relies on this unstably having defined behavior with the current implementation of the compiler (IIUC).

Hence the comment to disincentivize people from using this pattern outside the standard library.
Centril added a commit to Centril/rust that referenced this issue Sep 5, 2019
…ut_uninit_integers, r=Centril

Added warning around code with reference to uninit bytes

Officially, uninitialized integers, and therefore, Rust references to them are _invalid_ (note that this may evolve into official defined behavior (_c.f._, rust-lang/unsafe-code-guidelines#71)).

However, `::std` uses references to uninitialized integers when working with the `Read::initializer` feature (rust-lang#42788), since it relies on this unstably having defined behavior with the current implementation of the compiler (IIUC).

Hence the comment to disincentivize people from using this pattern outside the standard library.
Centril added a commit to Centril/rust that referenced this issue Sep 5, 2019
…ut_uninit_integers, r=Centril

Added warning around code with reference to uninit bytes

Officially, uninitialized integers, and therefore, Rust references to them are _invalid_ (note that this may evolve into official defined behavior (_c.f._, rust-lang/unsafe-code-guidelines#71)).

However, `::std` uses references to uninitialized integers when working with the `Read::initializer` feature (rust-lang#42788), since it relies on this unstably having defined behavior with the current implementation of the compiler (IIUC).

Hence the comment to disincentivize people from using this pattern outside the standard library.
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Sep 3, 2020
Read: adjust a FIXME reference

There's already another reference to rust-lang#42788 for basically the same problem, so lets reuse it here:
https://github.com/rust-lang/rust/blob/5e208efaa850efaa97495e81c49cf0f5767e8f49/library/std/src/io/mod.rs#L369-L376

r? @Dylan-DPC
@Kestrer
Copy link
Contributor

@Kestrer Kestrer commented Jan 10, 2021

Can this be closed now that the reading into unintialized buffers RFC is merged and #78485 exists? Since the API is going to be removed anyway, it seems pointless to keep this open.

@sfackler
Copy link
Member Author

@sfackler sfackler commented Jan 10, 2021

It will be closed when the API is removed.

@DrMeepster DrMeepster linked a pull request that will close this issue Jan 18, 2021
iceiix added a commit to iceiix/fs-web that referenced this issue Jan 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.