Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Reading into uninitialized buffers #2930

Merged
merged 22 commits into from Oct 28, 2020
Merged

Conversation

sfackler
Copy link
Member

@sfackler sfackler commented May 18, 2020

@sfackler sfackler added the T-libs-api label May 18, 2020
text/0000-read-buf.md Outdated Show resolved Hide resolved
sfackler and others added 2 commits May 18, 2020
Co-authored-by: Taiki Endo <te316e89@gmail.com>
Copy link
Member

@shepmaster shepmaster left a comment

Has any exploration been done to see how implementing this in the standard library would be able to replace existing similar usages such as those mentioned as prior art?

A big point of the standard library is to have reusable abstractions, so it would be nice to have some confidence that it is an abstraction that would work outside of just the usage of Read.

text/0000-read-buf.md Outdated Show resolved Hide resolved
@sfackler
Copy link
Member Author

@sfackler sfackler commented May 18, 2020

Yeah, this should be applicable to AsyncRead without any real issue.


The [`tokio::io::AsyncRead`](https://docs.rs/tokio/0.2.21/tokio/io/trait.AsyncRead.html) trait has a somewhat similar
approach, with a `prepare_uninitialized_buffer` method which takes a `&mut [MaybeUninit<u8>]` slice and initializes it
if necessary.
Copy link
Member

@shepmaster shepmaster May 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to explicitly state that it's believed that this RFC could replace this usage (even if the crate itself decided not to for whatever reason).

pub struct ReadBuf<'a> {
buf: &'a mut [MaybeUninit<u8>],
written: usize,
initialized: usize,
}
Copy link
Member

@shepmaster shepmaster May 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the risk of opening a can of worms, this feels very similar to arrayvec. This makes me wonder a few points:

  1. Could this be made generic over u8, so it could be used for other similar "partial initialization" cases?

  2. Could there be an owning variant of this so that we could do something like

    let mut buf = ReadBufOwned::<const 8192>();

    I'm using unstable const generics here, sadly, but I think the ergonomics are nice.

Because of #2, I'd also like to bikeshed about the name ReadBuf a little. We have Path and PathBuf, where *Buf is the owned version, so I originally wanted to type ReadBufBuf...

Copy link
Member Author

@sfackler sfackler May 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1. Could this be made generic over u8, so it could be used for other similar "partial initialization" cases?

It could be made generic, but I'm not sure the equivalence is really tight enough. The API of ReadBuf is centered entirely around progressive initialization, which doesn't necessarily align with the API for a general purpose vec-style data structure.

2. Could there be an owning variant of this so that we could do something like

That's covered briefly in the future possibilities section.

Copy link
Member

@shepmaster shepmaster May 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's covered briefly in the future possibilities section.

Are you referring to this section?

There is probably some kind of abstraction that could be defined to encapsulate that logic.

If so, perhaps that section could be extended a bit; I've re-read it a few times and still can't see a connection to an owning version.

Copy link

@golddranks golddranks May 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I, too, saw some avenues for possible generalizations: currently one is unable to use Vec in no_std crates. It's often preferable for unopinionated libraries to let the caller to decide the "backing store" of a buffer. Slices fit the bill for some cases, but in cases where there is input-dependent or otherwise unknown amount of data stored in the buffer, it's generally desirable to have a type that keeps track the "written part" and "uninitialized part", for both ergonomics and correctness (less book-keeping for the user) and performance (skipping initialization). In this sense, Vec is pretty awesome, but it comes at the cost of an allocation and dependence on allocator.

It's also desirable to have such type as a part of the "common vocabulary" of the ecosystem, but alas, std doesn't have a type that fits the bill at the moment.

Thus, I think it would be valuable to provide a type that is generalized over the contained type, and keeps track the "written"/"uninitialized" split. We could then only allow Reading, but Writeing to it, and consider it as a generic buffer type for some other uses, such as a replacement for Vec for libraries that need to work in no_std contexts.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Vec-like abstraction that uses a fixed-size slice of uninitialized memory as backing storage would indeed be a useful abstraction, but requires basically reimplementing the entire Vec if done outside the standard library. I've written at length about this issue here. That writeup may be a bit outdated, but I'm glad to elaborate if you believe this is relevant to this discussion.

@programmerjake
Copy link
Member

@programmerjake programmerjake commented May 19, 2020

One related thing to think about is how well this proposal would mesh with async reads where the kernel is the buffer owner, not the caller. See https://vorner.github.io/2019/11/03/io-uring-mental-experiments.html for some reasoning.

@sfackler
Copy link
Member Author

@sfackler sfackler commented May 19, 2020

I believe this is basically orthogonal to an io_uring interface.

@burdges
Copy link

@burdges burdges commented May 19, 2020

Is there an existing crate to develop ReadBuf and friends outside std for people who want to use this right now? I'm thinking like other "std v2" projects like futures before and concurrently no_std Error?

@joshtriplett
Copy link
Member

@joshtriplett joshtriplett commented May 19, 2020

I think it'd be valuable to specify a feature (even if for internal use only) that allows a trait to specify a list of lists of methods that constitute a minimum implementation, such that an impl that doesn't provide at least one of the minimum implementations will get a warning.

@sfackler
Copy link
Member Author

@sfackler sfackler commented May 20, 2020

@burdges I have some of the code in https://github.com/sfackler/read-buf, but it currently uses an older version of ReadBuf that doesn't track the written cursor internally.

monomorphization or specialization.
4. It needs to work with both normal and vectored IO (via `read_vectored`).
5. It needs to be composable. Readers are very commonly nested (e.g. `GzipReader<TlsStream<TcpStream>>`), and wrapper
readers should be able to opt-in to fast paths supported by their inner reader.
Copy link

@abonander abonander May 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every proposal for reading into uninitialized memory that revolved around adding a new method to the Read trait has been criticized for potentially leaving performance on the table if a wrapper type forgets to forward the new method.

I've never seen it seriously discussed to have a warn-by-default lint for trait methods that are meant to be overridden. This could apply to Iterator::size_hint() as well, which also leaves performance on the table if not forwarded.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd gladly take the task of implementing such a lint myself, in fact.

Copy link
Contributor

@withoutboats withoutboats May 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, "potentially leaving performance on the table" is fine - and preferable to overwhelming users new to these interfaces with all of the information they need to implement the absolute perfect interface in every case. We need to worry about information overload and ensure we gradually onboard users to exploiting the full power of these interfaces.

Any lints like this should be opt-in.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a clippy lint, then?

@rpjohnst
Copy link

@rpjohnst rpjohnst commented May 20, 2020

Regarding the unresolved question of whether to include the written field... what sort of performance impact would we expect there, from the two choices?

With the written field included in ReadBuf, there can be fewer asserts, but the struct itself is larger, on top of becoming a by-reference argument where there once were direct by-value argument and return value. These are probably both mitigated by #[inline], but then there's the question of the impact on compile times.

Without the written field, the struct is smaller, but there are more asserts checking impl correctness. How easy is it to optimize out those asserts? Are there idioms people will learn similar to those that make bounds checks more optimizable?

Or is this difference lost in the noise because we're doing I/O? Are there cases where this might not be true?

@djc
Copy link
Contributor

@djc djc commented May 20, 2020

It might be nice if there was a safe interface for writing to the uninitialized part of a ReadBuf from an already existing slice of buffered data (that is, is there a useful intersection of "safe interface" and "no need to unnecessarily initialize").

I did not read the Dropbox Paper document now (and I think I would feel slightly better if the RFC didn't rely on linking to it). Does it cover instead having a ReadBuf trait, with the Read impl taking that type as an argument (with default)? This could potentially allow some abstraction over things that could be wrapped, as some others have mentioned upthread.

@programmerjake
Copy link
Member

@programmerjake programmerjake commented May 20, 2020

It might be nice if there was a safe interface for writing to the uninitialized part of a ReadBuf from an already existing slice of buffered data (that is, is there a useful intersection of "safe interface" and "no need to unnecessarily initialize").

Seems like you want ReadBuf::append, which is already included.

text/0000-read-buf.md Outdated Show resolved Hide resolved
Co-authored-by: Bastian Kauschke <bastian_kauschke@hotmail.de>
text/0000-read-buf.md Outdated Show resolved Hide resolved
Co-authored-by: Ralf Jung <post@ralfj.de>
///
/// The caller must ensure that `n` unwritten bytes of the buffer have already been initialized.
#[inline]
pub unsafe fn assume_initialized(&mut self, n: usize) { ... }
Copy link
Member

@RalfJung RalfJung May 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assume_init would be more consistent with MaybeUninit::assume_init. But I guess at that point all "initialized" functions here should be called "init" instead. Maybe this is worth putting down as an open question to be decided later?

/// buffer that has been logically written to, a region that has been initialized at some point but not yet logically
/// written to, and a region at the end that is fully uninitialized. The written-to region is guaranteed to be a
/// subset of the initialized region.
pub struct ReadBuf<'a> {
Copy link
Member

@RalfJung RalfJung May 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe explicitly state that written <= initialized <= buf.len() is an invariant of this type?

Copy link
Member Author

@sfackler sfackler May 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "the written-to region is guaranteed to be a subset of the initialized region" bit is trying to say that. Maybe it should be clarified?

Copy link
Member

@RalfJung RalfJung May 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just feel like this is worth repeating in math/Rust notation -- kind of the ultimate clarification. ;)

Copy link
Contributor

@pickfire pickfire May 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a while to understand that, having the mathematical notation would make it easier to understand.

///
/// Panics if `self.remaining()` is less than `n`.
#[inline]
pub fn initialize_unwritten_to(&mut self, n: usize) -> &mut [u8] { ... }
Copy link
Member

@RalfJung RalfJung May 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, the "to" here sounds like it defines the target of the operation, like "write X to Y". Also the relation with "up to index n" is a bit weak since this is not index n in the buffer, it is index n in the uninitialized part of the buffer (I think).

Maybe initialize_unwritten_n or initialize_unwritten_len or initialize_unwritten_with_len or so?

Copy link
Member

@RalfJung RalfJung May 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a better set of names for the three methods that give access to the unwritten part would be unwritten, unwritten_len (both of which return &mut [u8]), and unwritten_maybe_uninit (returning &mut [MaybeUninit<u8>]).

Copy link
Member Author

@sfackler sfackler May 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think it's somewhat important for the names of these methods to indicate that they're potentially doing significant amounts of work.

Copy link
Member

@RalfJung RalfJung May 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that's fair. OTOH, having the most dangerous method have the shortest name (unwritten_mut) also doesn't seem great.

@Amanieu
Copy link
Member

@Amanieu Amanieu commented May 20, 2020

Have you considered renaming written to filled? I found written a bit confusing at first glance since this type is used when reading data from a file, not writing.

@sfackler
Copy link
Member Author

@sfackler sfackler commented May 20, 2020

Ooh yeah, filled is much cleaner than written! I will update with that and @RalfJung's method name suggestions tonight.

@sfackler
Copy link
Member Author

@sfackler sfackler commented Sep 8, 2020

It does not - Box::leak will produce a reference with any lifetime.

@Thomasdezeeuw
Copy link

@Thomasdezeeuw Thomasdezeeuw commented Oct 15, 2020

I would like to propose a different API based on a trait rather than a new type. It allows &mut [u8] to be used in the same way it is now, but allows easier use of Vec<u8> and any custom buffer types.

The trait is following.

/// Trait to make easier to work with slices of (uninitialised) bytes.
///
/// This is implemented for common types such as `&mut[u8]` and `Vec<u8>`.
pub trait Bytes {
    /// Returns itself as a slice of bytes that may or may not be initialised.
    /// 
    /// # Unsafety
    /// 
    /// Caller must ensure to only write valid bytes to the returned slice.
    unsafe fn as_bytes(&mut self) -> &mut [MaybeUninit<u8>];

    /// Update the length of the byte slice.
    ///
    /// # Safety
    ///
    /// The caller must ensure that at least `n` of the bytes returned by
    /// [`Bytes::as_bytes`] are initialised.
    unsafe fn update_length(&mut self, n: usize);
}

It should be used in the following way:

  1. The read implementation calls Byte::as_bytes and writes bytes into the returned slice.
  2. The read implementation calls Bytes::update_length updating the buffers length.
  3. The caller now has a correctly sized buffer.

This my implementation for TcpStream::try_recv for example:

impl TcpStream {
    pub fn try_recv<B>(&mut self, mut buf: B) -> io::Result<usize>
    where
        B: Bytes,
    {
        let dst = buf.as_bytes();
        debug_assert!(
            !dst.is_empty(),
            "called `TcpStream::try_recv with an empty buffer"
        );
        syscall!(recv(
            self.socket.as_raw_fd(),
            dst.as_mut_ptr().cast(),
            dst.len(),
            0, // Flags.
        ))
        .map(|read| {
            let read = read as usize;
            // Safety: just read the bytes.
            unsafe { buf.update_length(read) }
            read
        })
    }
}

Some implementations of Bytes:

For slices, both using [u8] and [MaybeUninit<u8>].

impl Bytes for [MaybeUninit<u8>] {
    fn as_bytes(&mut self) -> &mut [MaybeUninit<u8>] {
        self
    }

    unsafe fn update_length(&mut self, _: usize) {
        // Can't update the length of a slice.
    }
}

impl Bytes for [u8] {
    fn as_bytes(&mut self) -> &mut [MaybeUninit<u8>] {
        // Safety: `MaybeUninit<u8>` is guaranteed to have the same layout as
        // `u8` so it same to cast the pointer.
        unsafe { &mut *(self as *mut [u8] as *mut [MaybeUninit<u8>]) }
    }

    unsafe fn update_length(&mut self, _: usize) {
        // Can't update the length of a slice.
    }
}

This results is basically the same usage as Read::read currently because the user still needs to slice the buffer properly themselves, but does allow the use of MaybeUninit<u8>. However the implement for Vec<u8> is much nicer (see usage below).

impl Bytes for Vec<u8> {
    fn as_bytes(&mut self) -> &mut [MaybeUninit<u8>] {
        // Safety: `Vec` ensures the pointer is correct for us. The pointer is
        // at least valid for start + `Vec::capacity` bytes, a range we stay
        // within.
        unsafe {
            let len = self.len();
            let data_ptr = self.as_mut_ptr().add(len).cast();
            // NOTE: `Vec` ensure capacity >= len, so this is always >= 0.
            let capacity_left = self.capacity() - len;
            slice::from_raw_parts_mut(data_ptr, capacity_left)
        }
    }

    unsafe fn update_length(&mut self, n: usize) {
        // Safety: caller must ensure that `n` is valid.
        self.set_len(self.len() + n);
    }
}

This allow the following to just work as expected.

let mut stream = TcpStream::connect(address)?;
let mut buf = Vec::with_capacity(4 * 1024); // 4 KB.
let n = stream.recv(&mut buf)?;
// `buf` now contains `n` bytes with it length set to `buf.len() + n`.
// Tthe bytes are added to the `Vec`, not overwritten.

@WaffleLapkin
Copy link
Contributor

@WaffleLapkin WaffleLapkin commented Oct 15, 2020

@Thomasdezeeuw please note that your impl Bytes for [u8] is unsound. While converting Box<[T]> -> Box<[MaybeUninit<T>]> or &[T] -> &[MaybeUninit<T>] is ok, converting &mut [T] to &mut [MaybeUninit<T>] is UB as you may write uninit values to T.

let mut buf = [0u8];
let maybe = unsafe { &mut *((&mut buf) as *mut [u8] as *mut [MaybeUninit<u8>]) };
// After this line `buf` will contain uninitialized value, which is UB
maybe[0] = MaybeUninit::uninit();

See also: rust-lang/rust#66699

@danielhenrymantilla
Copy link

@danielhenrymantilla danielhenrymantilla commented Oct 15, 2020

But note, @Thomasdezeeuw, that your suggestion can be fixed the same way an actual crate with a similar API to your suggestion did: inventing (and using) an &out [u8] reference abstraction: https://docs.rs/uninit/0.4.0/uninit/out_ref/struct.Out.html

It's basically &mut [u8], but for the ability to write garbage to the pointee, which is what makes &mut [u8] -> &mut [MU<u8>] unsound.

@Thomasdezeeuw
Copy link

@Thomasdezeeuw Thomasdezeeuw commented Oct 15, 2020

@WaffleLapkin, @danielhenrymantilla good points. I didn't think about that case. I could be somewhat easily fixed by making Bytes::as_bytes unsafe and requiring the caller to only write valid bytes (I've updated my comment #2930 (comment)). Or as @danielhenrymantilla suggested some kind of wrapper type.

@sfackler
Copy link
Member Author

@sfackler sfackler commented Oct 15, 2020

trait Bytes is not compatible with dyn Read: https://github.com/rust-lang/rfcs/pull/2930/files#diff-ae79fbabbd969ac5cf26c1ec56a7e690310543267ad0c240f20c6f642dbb4eaaR592

unsound unsafe code) is to fail to actually write useful data into the buffer. Code using a `BrokenReader` may see bad
data in the buffer, but the bad data at least has defined contents now!

Note that `read` is still a required method of the `Read` trait. It can be easily written to delegate to `read_buf`:
Copy link
Member

@cramertj cramertj Oct 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm personally imagining that I'll very quickly pull out this macro:

macro_rules! default_read_impl {
    () => {
        fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
            let mut buf = std::io::ReadBuf::new(buf);
            std::io::Read::read_buf(self, &mut buf)?;
            std::result::Result::Ok(buf.filled().len())
        }
    }
}

impl Read for SomeReader {
    default_read_impl!();
    fn read_buf(&mut self, buf: &mut ReadBuf<'_>) -> io::Result<()> {
        ...
    }
}

I'm not sure whether that would be worth including in std. My gut feeling is "no", but that it's going to be a very mild annoyance to copy-paste into every new crate that implements lots of Read types.

@programmerjake
Copy link
Member

@programmerjake programmerjake commented Oct 15, 2020

maybe Read::read should have a default implementation and also have an annotation making rustc require at least one of Read::read, Read::read_buf, or the vectored read functions must be implemented.

@sfackler
Copy link
Member Author

@sfackler sfackler commented Oct 15, 2020

That's described in the Future Possibilities section: https://github.com/rust-lang/rfcs/pull/2930/files#diff-ae79fbabbd969ac5cf26c1ec56a7e690310543267ad0c240f20c6f642dbb4eaaR653-R655

/// # Safety
///
/// The caller must not de-initialize portions of the buffer that have already been initialized.
#[inline]
pub unsafe fn unfilled_mut(&mut self) -> &mut [MaybeUninit<u8>] { ... }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to propose that the safety section is expanded to the following:

The caller must not de-initialize portions of the buffer that have already been initialized.
This includes any bytes in the region marked as uninitialized by ReadBuf.

Without a comment like this, it is unclear whether it is sound to put an UninitSlice from the bytes crate into an ReadBuf without first initializing the memory.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or simply change that to return an Out<[u8]> to get rid of the unsafe altogether: even though I like the ReadBuf abstraction since it tracks the number of initialized elements (which an Out<[u8]> does not), I still think that an out-references abstraction is both intuitive and less error-prone than &mut MaybeUninit<...>s, and something that is very tied to "writing into potentially-uninit stuff", so the &mut MU part of the ReadBuf API ought to be replaced with that. Being able to, for instance, feature a non-unsafe version of this method is just one example of that (and then, if people were to need exactly an &mut [MU<u8>], they could call the unsafe .as_uninit_mut() method, who has a well-detailed section about how and why writing uninit bytes to the pointee is unsound).

@nikomatsakis
Copy link
Contributor

@nikomatsakis nikomatsakis commented Oct 28, 2020

Huzzah! The @rust-lang/libs team decided in September to accept this RFC. I have merged it into the repository. The tracking issue is rust-lang/rust#78485, if you would like to follow along from there.

@Blub
Copy link

@Blub Blub commented Jan 1, 2021

Isn't the example in the But how bad are undefined values really? section just wrong in the sense that it uses both unsafe and does exactly the thing that is documented to always be wrong?
Instead of unint().assume_init(), using uninit() and as_mut_ptr() to write, then perform checks and only then finish with assume_init() it should be fine, no?
From what I can tell, ReadBuf seems to be a tool against compiler bug-omizations caused by documented-as-wrong unsafe code. I really don't like how this API feels... Is there an example of code that fails without this which doesn't use unsafe or isn't wrong already? With tokio now I feel like I can't even make the safest code not use unsafe blocks if I don't want to nedlessly pre-initialize something that gets immediately overwritten afterwards anyway...

@danielhenrymantilla
Copy link

@danielhenrymantilla danielhenrymantilla commented Jan 1, 2021

Isn't the example in the But how bad are undefined values really? section just wrong in the sense that it uses both unsafe and does exactly the thing that is documented to always be wrong?

Well, that's the point of that section: it shows how uninit values are not "just random bytes", but that they are treated specially by the compiler. It's a "how bad can uninit-values-based-UB be"? Granted, most people already know that UB is just UB, that there is no more need to dig deeper, but examples of bugs such as the one in that section can show this is not just some theoretical concern, it is a real issue present in current Rust.

Is there an example of code that fails without this which doesn't use unsafe or isn't wrong already?

Technically not yet, precisely due to the lack of an API that would allow it:

  • we currently have an API / a trait that does not let us correctly / soundly read into an uninitialized buffer.

  • So any attempt to try and do that nevertheless is indeed doomed to be unsound.

  • So that's the more reason to try and improve the API to allow this!

The other option would have been to always require an initialized buffer to perform reads to (i.e., keep the current status quo as is), but this can have a non-negligible performance impact in some performance-sensitive code, which would mean that Rust wouldn't live to the zero-cost abstraction ideal anymore, despite OS primitives that allow reading into uninit buffers and languages in which it can be done (e.g., C, C++). That's where the current code "fails".

So this whole RFC is suggesting a way to:

  • perform reads into potentially uninitialized buffers,

  • all without unsafe (at the cost of tracking some lengths at runtime) for most of the API.

The missing piece of this API, which has been disregarded despite my remarks, is that there is still an unsafe part of the API (pub unsafe fn unfilled_mut(&mut self) -> &mut [MaybeUninit<u8>] { ... }), which deals with &mut [MaybeUninit<u8>] "out buffers", instead of having embraced the more primitive abstraction that completely fits this whole domain: out pointers that support both init and uninit buffers.

@algesten
Copy link

@algesten algesten commented Jan 1, 2021

@Blub

Instead of unint().assume_init(), using uninit() and as_mut_ptr() to write, then perform checks and only then finish with assume_init() it should be fine, no?

The problem is that BrokenReader doesn't write, so the memory remains uninitialized. That means the u32 value constructed from that memory is undefined (it's NOT a random value, see Ralf's blog post linked in the RFC). Moving the assume_init doesn't change anything.

Unsure what you mean "perform checks. AFAIK, there is no check to know whether memory was initialized, but it doesn't matter, BrokenReader2 writes, and still is wrong since it reads the uninitialized memory first.

Is there an example of code that fails without this which doesn't use unsafe or isn't wrong already?

The example does illustrate this. BrokenReader and 2 is safe when used with inited mem. The use of uninited memory with BrokenReader appears to be an OK optimization at first glance. These two parts could be implemented in different libraries, work fine for most cases, but the combination is UB.

I feel like I can't even make the safest code not use unsafe blocks if I don't want to nedlessly pre-initialize something that gets immediately overwritten afterwards anyway...

That's exactly what ReadBuf does, no? It gives you a safe way to read into uninited memory – no unsafe needed.

@Darksonn
Copy link

@Darksonn Darksonn commented Jan 1, 2021

Is there an example of code that fails without this which doesn't use unsafe or isn't wrong already?

No of course not. If you don't use unsafe, you cannot trigger undefined behavior. If you don't want to be using unsafe blocks, the price is instead being forced to pre-initialize something that gets immediately overwritten afterwards anyway.

Instead of unint().assume_init(), using uninit() and as_mut_ptr() to write, then perform checks and only then finish with assume_init() it should be fine, no?

Sure, that would be a way to write correct unsafe code that writes to uninitialized memory. The purpose of ReadBuf is two-fold:

  • Act as a type similar to &mut [MaybeUninit<u8>] that can soundly exist while the target memory is uninitialized.
  • Provide a way to tell the caller how much memory was initialized by the call.

The second point is important because the Read trait is not unsafe to implement, so if a bad Read impl claims to have initialized the entire buffer without doing so, and the caller then uses unsafe code to read from that buffer, then that triggers undefined behavior, and since the caller is the one who used unsafe, the caller is at fault. By using ReadBuf we avoid this issue, because to tell the caller that you have initialized more memory in the buffer without actually doing so, you have to call an unsafe method on ReadBuf, moving the fault to the Read impl rather than the caller.

With tokio now I feel like I can't even make the safest code not use unsafe blocks if I don't want to nedlessly pre-initialize something that gets immediately overwritten afterwards anyway...

For Tokio specifically, if you are using the ReadBuf api directly, then you must be writing a poll_* method, as we avoided exposing them in other contexts. For this case, you can use the utility poll_read_buf, which lets you read into uninitialized memory without unsafe code. There's also a companion poll_write_buf.

This method also exists for non-poll methods in the form of AsyncReadExt::read_buf.

Edit: Did we all wake up at the same time?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-io A-raw-pointers disposition-merge finished-final-comment-period Libs-Tracked T-libs-api to-announce
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet