Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revamp unstable MaybeUninit APIs #122

Open
SUPERCILEX opened this issue Oct 17, 2022 · 42 comments
Open

Revamp unstable MaybeUninit APIs #122

SUPERCILEX opened this issue Oct 17, 2022 · 42 comments
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@SUPERCILEX
Copy link

SUPERCILEX commented Oct 17, 2022

Goals

Many of the new MaybeUninit APIs deal with slices or arrays. The current state of affairs is unfortunate because it leads to API duplication. An API may need to be duplicated 3 times: once on MaybeUninit and then another two times for slices and arrays. In an ideal world, the type system could express some sort of container API where if something contains a MaybeUninit then certain methods are available. Since we don't live in that world, the primary goal is to make as many MaybeUninit methods available as possible on slices and arrays without duplication.

Other goals include guiding users towards the right APIs and making unsafely more visible.

Problematic APIs

MaybeUninit::{uninit_array,array_assume_init}

These APIs let you create MaybeUninit arrays, but do not let you manipulate them. Any APIs on MaybeUninit would need to be duplicated on arrays or accessed via slices (requiring borrowing).

Proposal

An API that converts between [MaybeUninit<T>; N] <-> MaybeUninit<[T; N]>. Since you can get a MaybeUninit with your array inside it, anything on MaybeUninit will work for arrays.

Currently it is named transpose but there may be clearer names.

Alternatives

Implement index traits on MaybeUninit<[T; N]>

My two primary concerns are that this prevents the natural use of array initialization syntax and would require bounds checking when that may not be desirable.

Furthermore, I don't think this works with MaybeUninits of non-Copy types unless you call the index method directly?

slice_assume_init variants

Assuming a transpose-like API is not possible for slices, these methods need to be available.

Proposal

Currently they aren't inherent methods but should be.

as_bytes variants

Based on the discussion below we should keep these methods to circumvent unsafety around invalid pointers.

Proposal

Currently they aren't inherent methods but should be. Add documentation warning users to be extra careful about meeting the invariants of the type they are mucking around with.

slice_as_ptr variants

These methods are questionable because they combine the removal of bounds checking with raw pointer conversion. You may want one but not the other, and in either case, you should be required to use separate unsafes.

Proposal

Get rid of them and instead provide safe MaybeUninit pointer conversion to its underlying type (T). That is, *{const,mut} MaybeUninit<T> -> *{const,mut} T.

write_slice variants

Based on discussion in the tracking issue, the name of these methods has not conveyed the fact that write_cloned does not drop old items. I believe we should just use consistent names with slices and document this behavior in bold or something. And at worst, memory leaks are not unsafe in rust so eh.

Proposal

  • Rename to {copy,clone}_from_slice
  • Move to inherent methods

uninit

This method cannot be used directly in array initializers if T is not Copy. However, using a const gets around this.

Proposal

Add an UNINIT const that does the same thing as uninit.

Drawbacks

API duplication. This also isn't necessary with transpose, though it would look prettier.

API diff

impl<T> MaybeUninit<T> {
    // Removed method. Can be replaced with `array.transpose().assume_init()`
-    pub unsafe fn array_assume_init<const N: usize>(array: [Self; N]) -> [T; N]
    // Removed method. Can be replaced with `MaybeUninit::<[T; N]>::uninit().transpose()`
-    pub fn uninit_array<const N: usize>() -> [MaybeUninit<T>; N]

    // Removed methods. These can be replaced with `slice.as_{,_mut}ptr().inner()`.
-    pub fn slice_as_mut_ptr(this: &mut [MaybeUninit<T>]) -> *mut T
-    pub fn slice_as_ptr(this: &[MaybeUninit<T>]) -> *const T

    // Moved methods.
-    pub unsafe fn slice_assume_init_mut(slice: &mut [Self]) -> &mut [T]
-    pub unsafe fn slice_assume_init_ref(slice: &[Self]) -> &[T]
-    pub fn write_slice<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T]where T: Copy
-    pub fn write_slice_cloned<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T] where T: Clone,
-    pub fn slice_as_bytes(this: &[MaybeUninit<T>]) -> &[MaybeUninit<u8>]
-    pub fn slice_as_bytes_mut(this: &mut [MaybeUninit<T>]) -> &mut [MaybeUninit<u8>]
}

// Moved slice methods
impl<T> [MaybeUninit<T>] {
+    pub unsafe fn assume_init_ref(&self) -> &[T]
+    pub unsafe fn assume_init_mut(&mut self) -> &mut [T]
+    pub fn copy_from_slice(&mut self, src: &[T]) -> &mut [T] where T: Copy
+    pub fn clone_from_slice(&mut self, src: &[T]) -> &mut [T] where T: Clone
+    pub fn as_bytes(&self) -> &[MaybeUninit<u8>]
+    pub fn as_bytes_mut(&mut self) -> &mut [MaybeUninit<u8>]
}

// Completely new methods

impl<T, const N: usize> MaybeUninit<[T; N]> {
+    pub fn transpose(self) -> [MaybeUninit<T>; N]
}
impl<T, const N: usize> [MaybeUninit<T>; N] {
+    pub fn transpose(self) -> MaybeUninit<[T; N]>;
}

impl<T> *const MaybeUninit<T> {
+    pub const fn inner(self) -> *const T
}
impl<T> *mut MaybeUninit<T> {
+    pub const fn inner(self) -> *mut T
}
@SUPERCILEX SUPERCILEX added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Oct 17, 2022
@the8472
Copy link
Member

the8472 commented Oct 17, 2022

Please list all the API changes you intend to make. And the motivation doesn't explain why this is better, why we'd want this.

@SUPERCILEX SUPERCILEX changed the title Remove MaybeUninit::{uninit_array,array_assume_init} now that transpose APIs are available Remove MaybeUninit::{uninit_array,array_assume_init} now that transpose APIs are available Oct 17, 2022
@SUPERCILEX
Copy link
Author

Done. The motivation is from #110.

@the8472
Copy link
Member

the8472 commented Oct 17, 2022

I'm seeing additional changes on rustc PRs that all are related to MaybeUninit. I think a larger overview of what you're aiming for overall rather than piecemeal changes would be helpful.

@SUPERCILEX
Copy link
Author

I'm not sure I agree since these changes can go through independently, but the big theme is "generalize":

  • Transpose is a superset of anything you could do with MaybeUninit::{uninit_array,array_assume_init}.
  • The slice_as{,_mut}_ptr methods can be replaced with a combination of array indexing (or get_unchecked if you don't want bounds checking which should be an explicit choice) and then a safe cast to the inner type.
    • I'm going to deal with this last because I'm not sure I like my solution, but I also really hate the current methods because they force you to combine elided bounds checking with reference to pointer conversion.

And then unrelated:

  • Move a bunch of methods that couldn't be on slices to be on slices since that's possible now.

@ChrisDenton
Copy link

ChrisDenton commented Oct 18, 2022

Quick summary, as far as I understand it. I'm including transpose here even though it's already merged because its use is what makes this work. Sorry if I've missed anything.

EDIT: Updated after feedback from SUPERCILEX
EDIT EDIT: Removed as_bytes methods, see below
EDIT EDIT EDIT: This is now outdated. See SUPERCILEX opening comment

impl<T, const N: usize> MaybeUninit<[T; N]> {
    /// MaybeUninit<[T; N]> into a [MaybeUninit<T>; N]
    pub fn transpose(self) -> [MaybeUninit<T>; N]
}
impl<T> MaybeUninit<T> {
    // Remove `as_bytes` methods. See below for discussion.
    //pub fn as_bytes(&self) -> &[MaybeUninit<u8>]
    //pub fn as_bytes_mut(&mut self) -> &mut [MaybeUninit<u8>]

    // Removed method. Can be replaced with `array.transpose().assume_init()`
    //pub unsafe fn array_assume_init<const N: usize>(array: [Self; N]) -> [T; N]
    // Removed method. Can be replaced with `MaybeUninit::<[T; N]>::uninit().transpose()`
    //pub fn uninit_array<const N: usize>() -> [MaybeUninit<T>; N]

    // Removed methods. These can be replaced with `slice.as_{,_mut}ptr().inner()`.
    //pub fn slice_as_mut_ptr(this: &mut [MaybeUninit<T>]) -> *mut T
    //pub fn slice_as_ptr(this: &[MaybeUninit<T>]) -> *const T

    // Moved methods.
    //pub unsafe fn slice_assume_init_mut(slice: &mut [Self]) -> &mut [T]
    //pub unsafe fn slice_assume_init_ref(slice: &[Self]) -> &[T]
    //pub fn write_slice<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T]where T: Copy
    //pub fn write_slice_cloned<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T] where T: Clone,
}

// New array method
impl<T, const N: usize> [MaybeUninit<T>; N] {
    /// Transposes a [MaybeUninit<T>; N] into a MaybeUninit<[T; N]>.
    pub fn transpose(self) -> MaybeUninit<[T; N]>;
}

// Moved slice methods
impl<T> [MaybeUninit<T>] {
    pub fn write_slice(&mut self, src: &[T]) -> &mut [T] where T: Copy
    pub fn write_slice_cloned(&mut self, src: &[T]) -> &mut [T] where T: Clone
    pub unsafe fn assume_init_ref(&self) -> &[T]
    pub unsafe fn assume_init_mut(&mut self) -> &mut [T]
}

// Convert a MaybeUninit pointer to a pointer to its underlying type.
// This is not a `From` implementation because that could be surprising in unsafe code.
impl<T> *const MaybeUninit<T> {
    pub const fn inner(self) -> *const T {
        self as *const T
    }
}
impl<T> *mut MaybeUninit<T> {
    pub const fn inner(self) -> *mut T {
        self as *mut T
    }
}

@SUPERCILEX
Copy link
Author

SUPERCILEX commented Oct 18, 2022

as{,_mut}_bytes should be under [MaybeUninit<T>] instead of [MaybeUninit<T>; N]. Only transpose will ever be needed under [MaybeUninit<T>; N] because you have access to all of MaybeUninit after the transpose. Note that the only reason any of the [MaybeUninit<T>] methods exist is because you can't transpose references AFAIK (&[MaybeUninit<u8>] to MaybeUninit<&[u8]> makes no sense).


impl From<*const MaybeUninit<T>> for *const T;
impl From<*mut MaybeUninit<T>> for *mut T;

It was decided that these guys should not be From impls since that's too implicit. I'll tackle slice_as{,_mut}_ptr (and the safe pointer casts) last since that will need bikeshedding + more thought.


Just a general note, I'd hesitate to say the "new" methods were "removed." While true from the API diff standpoint, they were just moved. Anyway, thank you!

@ChrisDenton
Copy link

Thanks, I've edited my comment to reflect that.

@SUPERCILEX
Copy link
Author

Nice! Slight tweak:

+   // Removed methods. These can be replaced with `array.as_{,_mut}ptr().inner()`.
    //pub unsafe fn slice_assume_init_mut(slice: &mut [Self]) -> &mut [T]
    //pub unsafe fn slice_assume_init_ref(slice: &[Self]) -> &[T]

@SUPERCILEX
Copy link
Author

Oh wait no sorry misread stuff. I think you're fixing it as we speak. Writing out reply rn.

@ChrisDenton
Copy link

Yeah, sorry my copy/pasting went askew and I got in a muddle fixing it. Should be sorted now.

@SUPERCILEX
Copy link
Author

Still not quite cuz I goofed, should be:

impl<T> MaybeUninit<T> {
    // Renamed from `as_bytes_mut`
    pub fn as_mut_bytes(&mut self) -> &mut [MaybeUninit<u8>];

    // Removed method. Can be replaced with `array.transpose().assume_init()`
    //pub unsafe fn array_assume_init<const N: usize>(array: [Self; N]) -> [T; N]
    // Removed method. Can be replaced with `MaybeUninit::<[T; N]>::uninit().transpose()`
    //pub fn uninit_array<const N: usize>() -> [MaybeUninit<T>; N]

    // Removed methods. These can be replaced with `slice.as_{,_mut}ptr().inner()`.
    //pub fn slice_as_mut_ptr(this: &mut [MaybeUninit<T>]) -> *mut T
    //pub fn slice_as_ptr(this: &[MaybeUninit<T>]) -> *const T

    // Moved methods.
    //pub fn slice_as_bytes(this: &[MaybeUninit<T>]) -> &[MaybeUninit<u8>]
    //pub fn slice_as_bytes_mut(this: &mut [MaybeUninit<T>]) -> &mut [MaybeUninit<u8>]
    //pub unsafe fn slice_assume_init_mut(slice: &mut [Self]) -> &mut [T]
    //pub unsafe fn slice_assume_init_ref(slice: &[Self]) -> &[T]
    //pub fn write_slice<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T]where T: Copy
    //pub fn write_slice_cloned<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T] where T: Clone,
}

@ChrisDenton
Copy link

Ok, I think I got there in the end! Thanks.

@SUPERCILEX
Copy link
Author

SUPERCILEX commented Oct 18, 2022

Lol, thanks! Looks good.

Maybe also worth pointing out that there's discussion around the write_slice method naming: rust-lang/rust#79995.


Seeing all of this listed out is really helpful. It's making me wonder about the as_bytes methods. They feel like a weird restricted transmute. as_bytes could become as_r and output any type R to generalize, but then we've basically re-invented transmutes. Poking around, I couldn't find usages of the non-slice as_bytes methods. The slice ones seem to be used to read data from a byte stream into a repr(C) type. I'm starting to lean towards getting rid of those methods and requiring an explict transmute. The tracking issue talks about padding bytes being safe, but that's not the big pitfall, the implicit transmute is:

#![feature(maybe_uninit_as_bytes)]
use core::mem::MaybeUninit;

#[derive(Debug)]
enum Variant { A(u64), B(String) }

fn main() {
    let mut u = MaybeUninit::new(Variant::A(42));

    // How are variants represented again? Who knows, but 69 is better than 42
    u.as_bytes_mut()[0].write(69);
    
    // SAFETY: look how safe this is! It's intialized, so nothing could go wrong right?
    let u = unsafe { u.assume_init() };
    
    // Ahem
    println!("{u:?}");
}

Actually, now that I've written the example, scratch the "starting to lean." I'm completely against all the as_bytes methods and they should be removed ASAP since they can cause UB without violating any of the MaybeUninit invariants. cc @RalfJung for confirmation that I understood this correctly.

I desperately need to get some other stuff done, but I'll open a PR to nuke the as_bytes methods in a bit.

@thomcc
Copy link
Member

thomcc commented Oct 18, 2022

Actually, now that I've written the example, scratch the "starting to lean." I'm completely against all the as_bytes methods and they should be removed ASAP since they can cause UB without violating any of the MaybeUninit invariants. cc @RalfJung for confirmation that I understood this correctly.

I think it just means we need to be clearer about the invariants that assume_init() must uphold (perhaps... I actually think we're clear about this already) -- it's already true that it must not just be initialized, it must uphold the validity invariants of a T as well.

More specifically, we've been pretty clear for a while now that MaybeUninit<T> can hold initialized values aside from T (it's not just T | undef, IOW).

@scottmcm
Copy link
Member

Right, assume_init means "assume initialized as a T", not just "the bits were set" -- otherwise MaybeUninit::zeroed would make no sense. And yes, MaybeUninit holds sizeof(T) bytes that may or may not be initialized. rust-lang/rust#97712 is interesting reading related to that.

I would encourage moving the as_bytes discussions to a different issue. They're more around https://github.com/rust-lang/project-safe-transmute kinds of scenarios, in my head, rather than the uninit/assume_init things that this one seems to mostly be talking about.

Naming feedback: I'm really not convinced by the name inner for those pointer methods. That might be ok for safe stuff, where it being unclear might be fine, but like Ralf was saying in https://github.com/rust-lang/rust/pull/101179/files#r996682951, around unsafe code -- as these necessarily are, because pointers -- it's worth being more specific about what they're doing. Maybe #51 (comment) can provide some inspiration? Pointers now have cast, cast_const, and cast_mut, so maybe cast_init or something?

Said otherwise, I think "rework all the MaybeUninit functions" is too big a scope for the issue to feasibly get consensus on. I suspect that there's something smaller and separable but still valuable. For example, I might phrase my initial impression of a bunch of these things as being about "rather than needing different-named creation and consumption methods in MaybeUninit for a bunch of different types, there should be a couple extra transformers or methods on those other types to be able to just use uninit and assume_init".

@SUPERCILEX
Copy link
Author

@thomcc Sure, I agree with you that MaybeUninit can hold "initialized" values like MaybeUninit::<NonZeroUsize>::zeroed() that violate the invariants of the type and it's therefore UB to assume_init even though the memory is initialized. That said, AFAIK once a MaybeUninit value becomes properly initialized, it can never be uninitialized. Please correct me if I'm wrong as then I don't object to the as_bytes methods. However, if you really can't uninitialize MaybeUninit once it's been initialized, then the as_bytes methods are the only ones that let you safely perform uninitialization, hence my strong objection. They weaken MaybeUninit from a diode allowing initialization to a bidirectional circuit that supports both initialization and uninitialization (very sneakily too).

I would encourage moving the as_bytes discussions to a different issue.

Happy to open another FCP, but it sounds like there's a general preference for consolidating all the MaybeUninit changes?

Naming feedback: I'm really not convinced by the name inner

Fully agree. Do we want this issue to be a catch-all MaybeUninit nightly API overhaul? If so I'll close the other FCP and work on making this issue encompass all the relevant changes + discussion.

@scottmcm
Copy link
Member

Happy to open another FCP, but it sounds like there's a general preference for consolidating all the MaybeUninit changes?

Whatever libs-api says is best is what you should do 😄

It's definitely good to consolidate the array_assume_init/slice_assume_init_mut/slice_assume_init_ref/uninit_array changes into one. But whether that's the same overhaul conversation as as_bytes isn't obvious to me.

@thomcc
Copy link
Member

thomcc commented Oct 19, 2022

That said, AFAIK once a MaybeUninit value becomes properly initialized, it can never be uninitialized.

It can. Consider a case like:

pub fn make_uninitialized<T>(mu: &mut MaybeUninit<T>) {
    *mu = MaybeUninit::uninit();
}

@SUPERCILEX
Copy link
Author

SUPERCILEX commented Oct 19, 2022

@scottmcm Sounds good! :) I closed #123.

TODO(me):

  • Move the Chris signatures to the top
  • Talk about as_bytes
  • Talk about potential UNINIT constant
  • Explain overall API duplication reduction goals
  • Talk about Index{,Mut} traits

Should be able to get to this in a few days.


@thomcc Good point, that lessens my concerns. That said, I remain unconvinced:

  • I would consider that to be replacing the value not modifying it, but I agree that from the API contract perspective that this distinction is irrelevant.
  • as_bytes is still the only safe method that I know of (you'll probs prove me wrong again 😆) that lets you transmute a type and then mess around with it. I guess you could argue that it's similar to doing a chain of raw pointer casts, but I tend to think of raw pointers as the ultimate form of unsafety. If I see a raw pointer, I automatically assume the code is doing something insane. But if I see slices, I assume that at least the types are sound.
  • Related to my previous point, I feel like this turns assume_init into a double unsafe. There's the unsafe for making sure that the type is initialized and its invariants are met, and then as_bytes introduces the unsafe of making sure your individual byte manipulations of the type are sound. This feels different to me than "meet the invariants" because there's no way to individually manipulate part of a struct without using raw pointers: you have to use write which is an all-or-nothing "initialize me" function.
    • For example, this makes it trivially easy to forget about endianness. My argument for requiring a transmute in addition to an assume_init is that it forces you to cleanly separate documenting the safety invariants of "I initialized this" vs. "I know my data will be in big-endian and I did the requisite byte manipulations to make the type valid." Basically I'm saying that if you ignore the uninit parts, we've given people an easy way to avoid thinking about what it means to mess around with the raw bytes of their type. If would be nice if after you see a write, you know that the type will be valid because it was created from safe rust. as_bytes feels like a hidden raw pointer to me.

@m-ou-se m-ou-se added the I-libs-api-nominated Indicates that an issue has been nominated for discussion during a team meeting. label Oct 19, 2022
@RalfJung
Copy link
Member

RalfJung commented Oct 19, 2022

That said, I remain unconvinced:

MaybeUninit<u8> is a pretty special type, it's our version of the C++ std::byte. IMO it makes a lot of sense that MaybeUninit would have special methods for it.

But anyway, that's off-topic for this PR. I am totally on-board with overhauling the unstable parts of the MaybeUninit API. I would probably not be much involved in that overhaul though -- I am not a libs API person, once the basic language primitives are available and in principle all things can be written correctly somehow I am really bad at figuring out how to best package them up for people and love to leave that work to others. But that sounds like an ACP/RFC kind of document, not a PR. Oh I didn't realize this is a t-libs design issue. I got pinged into 2 or 3 discussions on the same topic here and completely lost track...

@RalfJung
Copy link
Member

Also, transpose is a rather undescriptive name. So I am not very happy about this MaybeUninit::uninit().transpose()... I feel like the name should indicate somehow that this is about arrays/slices.

(I know it has precedence for Option/Result swapping, but already there I found the same pretty unhelpful. The only technical meaning of 'transpose' I am away of is to mirror a matrix along its diagonal, and that doesn't really have any connection with what happens here.)

@SUPERCILEX
Copy link
Author

C++

I'm not sure we should be using C++ as the role model. 😜 But actually, maybe what I'm advocating for then is to make the as_bytes methods unsafe. I just realized a simple transmute doesn't cut it since you need to set up the slice metadata (and we don't want people to need to deal with that). Anyway, I'll think about this more and put some thoughts in the summary.

I feel like the name should indicate somehow that this is about arrays/slices.

Why? It's defined as an inherent method on arrays... the type system should yell at you if you goof.

And yeah, happy to bikeshed on the name: how about bellybutton terminology? Innie and outie? Ok but more seriously maybe eversion?

@RalfJung
Copy link
Member

What I meant is that the fact that writing the bytes is safe is a very explicit and useful bit of invariant documentation.

@SUPERCILEX
Copy link
Author

the fact that writing the bytes is safe is a very explicit and useful bit of invariant documentation.

Sorry, I haven't quite been able to figure out what you mean by this. Are you saying that because writing the bytes is safe, you know you won't cause UB while writing the bytes? As opposed to pointers where writing can cause UB?

If so, I can potentially see how that's a good thing though I'm still bothered that it's pushing the unsafety of manipulating a type's raw bytes downstream. I guess what I've been saying is that I don't see much value in being able to write the bytes safely. Crashing while creating the type vs while using the type doesn't feel different to me. Actually, thinking about this some more I think what you're trying to say is that as_bytes eliminates the possibility of UBing while creating the type? Whereas pointers get the privilege of UBing after creating a garbage type AND while creating the type?

If I'm understanding your argument correctly, then I'm mostly convinced of the value of the as_bytes methods. Perhaps all the mut variants of as_bytes just need extra loud yelling in the docs that reminds people to ensure the bytes they're putting in the type match its invariants. (Though based on my sidenote below, if everyone is in agreement on which invariants must be matched where, then extra docs could be a nice reminder but probably aren't necessarily.)


As a sidenote, this thought process has changed my understanding of safety with regards to raw pointers. I previously thought writing to the pointer was the right place to document invariants about the type's bytes and how they were being met. However, it sounds like the correct place for that documentation is in the places you read the pointer. That is, writing should focus on the validity of the pointer itself whereas reading should focus on the validity of the data the pointer points to. Seeing that written out it seems somewhat obvious, but this is a newsflash to me lol. I'll reread the pointer docs because I don't remember them talking about this and I think it'd be valuable to mention.

@RalfJung
Copy link
Member

I am saying that writing uninit bytes into (parts of) a MaybeUninit is definitely allowed by its safety invariant. Any code that assumes otherwise is wrong. This needs to be documented in the English text that accompanies MaybeUninit (probably our existing docs could be improved), but I claim that having as_bytes also serves as part of that documentation.

Removing as_bytes would not change the safety invariant, and people could still soundly do the exact same thing that as_bytes does. So the only thing we achieve by removing as_bytes is making it less obvious that something like as_bytes is sound for MaybeUninit.

Some of what you say sound like you actually intend to not just remove the method, but also change the safety invariant. For once, that would be a breaking change. Second, given that MaybeUninit::uninit() is safe and can be assigned into any MaybeUninit, I don't think there even is a choice here. Which safety invariant are you suggesting, concretely?

What you say about raw pointers make sense. It's still worth being careful on writes though, since the read might be unsuspecting safe code reading through a reference.

@SUPERCILEX
Copy link
Author

I am saying that writing uninit bytes into (parts of) a MaybeUninit is definitely allowed by its safety invariant.

Gotya, that makes sense.

Some of what you say sound like you actually intend to not just remove the method, but also change the safety invariant.

I was more advocating for forcing people to document why their writes meet a type's invariants, but yeah that doesn't make much sense when you're writing uninit bytes or zeros.

since the read might be unsuspecting safe code reading through a reference.

I thought this wasn't allowed? The pointer docs say this:

The result of casting a reference to a pointer is valid for as long as the underlying object is live and no reference (just raw pointers) is used to access the same memory.

That makes it sound like you need to make a new reference after using a raw pointer. Or is it just saying you have to stack your usage (so create and drop the pointer before accessing the original reference again)?

@RalfJung
Copy link
Member

RalfJung commented Oct 31, 2022

I thought this wasn't allowed? The pointer docs say this:

You can always do something like

fn foo(x: &mut NonZeroI32)  {
  let ptr = x as *mut NonZeroI32;
  unsafe { ptr.cast::<i32>().write(0); } // no UB here
  let _val = *x; // but UB here!
}

So yes as long as it's properly nested, the ptr access is fine, and even the write itself is fine here, but the read can occur in safe code.

@SUPERCILEX
Copy link
Author

Makes sense, thank you!


TODO(me):

  • Clarify ptr docs about stacked borrows.
  • Add section to ptr docs talking about write vs read invariants, noting reference interactions.
  • Update this issue to keep as_bytes and add docs with reminder about invariants.

@ClydeHobart
Copy link

ClydeHobart commented Nov 1, 2022

Unrelated to the uninit byte methods, but is it reasonable to include either

impl<T, const N: usize> [MaybeUninit<T>; N] {
    pub unsafe fn array_assume_init_ref(&self) -> &[T; N];
    pub unsafe fn array_assume_init_mut(&mut self) -> &mut [T; N];
}

or

impl<T, const N: usize> [MaybeUninit<T>; N] {
    pub fn transpose_ref(&self) -> &MaybeUninit<[T; N]>;
    pub fn transpose_mut(&mut self) -> &mut MaybeUninit<[T; N]>;
}

as part of this proposal? The copy produced by transpose() could be expensive if the array is exceedingly large. The slice methods still exist, but that loses array size information which could be useful.

Edited for tone.

@SUPERCILEX
Copy link
Author

@ClydeHobart I'm not quite sure I understand the use case of those methods.

The copy produced by transpose() could be expensive if the array is exceedingly large.

Are you talking about the copy due to transpose taking in a self as opposed to a &self? If so, that shouldn't be a concern as long as transpose is inlined (which should basically always happen, I'd consider it a bug if not). We could potentially add an llvm ir test to enforce this.


Everyone else: ok, I've updated the issue with the latest proposals. I'm pretty happy with everything (minus some needed bikeshedding) except for the UNINIT const which I'm still unsure about. What are the next steps?

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Nov 13, 2022
Add small clarification around using pointers derived from references

r? `@RalfJung`

One question about your example from rust-lang/libs-team#122: at what point does UB arise? If writing 0 does not cause UB and the reference `x` is never read or written to (explicitly or implicitly by being wrapped in another data structure) after the call to `foo`, does UB only arise when dropping the value? I don't really get that since I thought references were always supposed to point to valid data?

```rust
fn foo(x: &mut NonZeroI32)  {
  let ptr = x as *mut NonZeroI32;
  unsafe { ptr.cast::<i32>().write(0); } // no UB here
  // What now? x is considered garbage when?
}
```
RalfJung pushed a commit to RalfJung/miri that referenced this issue Nov 15, 2022
Add small clarification around using pointers derived from references

r? `@RalfJung`

One question about your example from rust-lang/libs-team#122: at what point does UB arise? If writing 0 does not cause UB and the reference `x` is never read or written to (explicitly or implicitly by being wrapped in another data structure) after the call to `foo`, does UB only arise when dropping the value? I don't really get that since I thought references were always supposed to point to valid data?

```rust
fn foo(x: &mut NonZeroI32)  {
  let ptr = x as *mut NonZeroI32;
  unsafe { ptr.cast::<i32>().write(0); } // no UB here
  // What now? x is considered garbage when?
}
```
@SUPERCILEX
Copy link
Author

From @scottmcm: rust-lang/rust#104475 (comment)

Under the following assumptions:

  • Inline const will be stabilized (or we're fine telling people they need to use that feature)
  • The slice methods need to stick around (or can somehow become part of the field projection rfc)

Then I think the new proposal would be the same as what we have right now except:

  • Keep array_assume_init (but as an inherent method aka Scott's PR)
  • Delete the transpose methods
  • Don't add an UNINIT const

Then people will be expected to create arrays the normal way and can therefore access individual elements. To perform array-wise operations, they will convert the array to a slice and use the slice methods.

Thoughts?

@m-ou-se
Copy link
Member

m-ou-se commented Jan 17, 2023

This pattern where we want some functionality on some type X but also on [X] seems to occur in various places. For example, AtomicU8::get_mut() goes from a &mut AtomicU8 to an &mut u8, but we also want to do that for &mut [AtomicU8] to &mut [u8] (for which we currently have the unstable get_mut_slice). Similarly for Cell, where we have a 'transpose' method named as_slice_of_cells (in only one direction).

Adding functionality using on [X] as Self (like impl<T> [MaybeUninit<T>] { … }) has some problems. The methods will appear on the documentation of [], and tools like auto completion will suggest them as methods on []. Not using self can also be annoying, having to spell out e.g. AtomicU8::get_mut_slice. And impl [MyType] { … } is not allowed outside of core, and therefore should probably be avoided inside core as much as possible.

I'm hoping we might be able to find an ergonomic solution that works for more than just MaybeUninit. Maybe just an API convention that we consistently use, but I think we're going to need some new language feature. (Perhaps something related to rust-lang/rfcs#3318?)

@m-ou-se m-ou-se removed the I-libs-api-nominated Indicates that an issue has been nominated for discussion during a team meeting. label Jan 17, 2023
@SUPERCILEX
Copy link
Author

I think you're right: we shouldn't be special casing MaybeUninit.

So that leaves two things from this ACP:

  • Remove slice_as_ptr. The proposed solution adds a conversion to unwrap a MaybeUninit pointer into just a T pointer. I doubt whatever general container solution we come up with would be able to handle this, so I think it's still relevant.
  • Make slice_as_bytes inherent. Same as above, this is a transmute, so probs not handled by the solution we come up with.

I'm just speculating here, so I'd also be ok with closing everything and saying we'll wait to find out what the general solution looks like.

thomcc pushed a commit to tcdi/postgrestd that referenced this issue Feb 10, 2023
Add small clarification around using pointers derived from references

r? `@RalfJung`

One question about your example from rust-lang/libs-team#122: at what point does UB arise? If writing 0 does not cause UB and the reference `x` is never read or written to (explicitly or implicitly by being wrapped in another data structure) after the call to `foo`, does UB only arise when dropping the value? I don't really get that since I thought references were always supposed to point to valid data?

```rust
fn foo(x: &mut NonZeroI32)  {
  let ptr = x as *mut NonZeroI32;
  unsafe { ptr.cast::<i32>().write(0); } // no UB here
  // What now? x is considered garbage when?
}
```
@nuttycom
Copy link

nuttycom commented Mar 8, 2023

Also, transpose is a rather undescriptive name. So I am not very happy about this MaybeUninit::uninit().transpose()... I feel like the name should indicate somehow that this is about arrays/slices.

(I know it has precedence for Option/Result swapping, but already there I found the same pretty unhelpful. The only technical meaning of 'transpose' I am away of is to mirror a matrix along its diagonal, and that doesn't really have any connection with what happens here.)

IMO transpose is a better name than the operation that it specializes usually gets: https://hackage.haskell.org/package/base-4.18.0.0/docs/Prelude.html#v:sequenceA

@SUPERCILEX
Copy link
Author

Adding functionality using on [X] as Self (like impl [MaybeUninit] { … }) has some problems. The methods will appear on the documentation of [], and tools like auto completion will suggest them as methods on []. Not using self can also be annoying, having to spell out e.g. AtomicU8::get_mut_slice. And impl [MyType] { … } is not allowed outside of core, and therefore should probably be avoided inside core as much as possible.

I think we can strike a good balance by making the important methods inherent. I think that means slice_assume_init (and mut) should be inherent while the others can remain static methods. This will hit the most common use cases and make the assume_init API look consistent. The copy slice methods are arguably like ptr::copy_nonoverlapping so that's consistent.

Thoughts?

@SUPERCILEX
Copy link
Author

https://github.com/rust-lang/rust/pull/103131/files shows just how common the assume_init methods are and they end up being much more concise.

bors added a commit to rust-lang-ci/rust that referenced this issue Feb 16, 2024
Rename MaybeUninit::write_slice

A step to push rust-lang#79995 forward.

rust-lang/libs-team#122 also suggested to make them inherent methods, but they can't be — they'd conflict with slice's regular methods.
github-actions bot pushed a commit to rust-lang/miri that referenced this issue Feb 17, 2024
Rename MaybeUninit::write_slice

A step to push #79995 forward.

rust-lang/libs-team#122 also suggested to make them inherent methods, but they can't be — they'd conflict with slice's regular methods.
lnicola pushed a commit to lnicola/rust-analyzer that referenced this issue Apr 7, 2024
Rename MaybeUninit::write_slice

A step to push #79995 forward.

rust-lang/libs-team#122 also suggested to make them inherent methods, but they can't be — they'd conflict with slice's regular methods.
RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this issue Apr 27, 2024
Rename MaybeUninit::write_slice

A step to push #79995 forward.

rust-lang/libs-team#122 also suggested to make them inherent methods, but they can't be — they'd conflict with slice's regular methods.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api
Projects
None yet
Development

No branches or pull requests

9 participants