Revamp unstable MaybeUninit APIs #122

SUPERCILEX · 2022-10-17T19:18:49Z

Goals

Many of the new MaybeUninit APIs deal with slices or arrays. The current state of affairs is unfortunate because it leads to API duplication. An API may need to be duplicated 3 times: once on MaybeUninit and then another two times for slices and arrays. In an ideal world, the type system could express some sort of container API where if something contains a MaybeUninit then certain methods are available. Since we don't live in that world, the primary goal is to make as many MaybeUninit methods available as possible on slices and arrays without duplication.

Other goals include guiding users towards the right APIs and making unsafely more visible.

Problematic APIs

`MaybeUninit::{uninit_array,array_assume_init}`

These APIs let you create MaybeUninit arrays, but do not let you manipulate them. Any APIs on MaybeUninit would need to be duplicated on arrays or accessed via slices (requiring borrowing).

Proposal

An API that converts between [MaybeUninit<T>; N] <-> MaybeUninit<[T; N]>. Since you can get a MaybeUninit with your array inside it, anything on MaybeUninit will work for arrays.

Currently it is named transpose but there may be clearer names.

Alternatives

Implement index traits on `MaybeUninit<[T; N]>`

My two primary concerns are that this prevents the natural use of array initialization syntax and would require bounds checking when that may not be desirable.

Furthermore, I don't think this works with MaybeUninits of non-Copy types unless you call the index method directly?

`slice_assume_init` variants

Assuming a transpose-like API is not possible for slices, these methods need to be available.

Proposal

Currently they aren't inherent methods but should be.

`as_bytes` variants

Based on the discussion below we should keep these methods to circumvent unsafety around invalid pointers.

Proposal

Currently they aren't inherent methods but should be. Add documentation warning users to be extra careful about meeting the invariants of the type they are mucking around with.

`slice_as_ptr` variants

These methods are questionable because they combine the removal of bounds checking with raw pointer conversion. You may want one but not the other, and in either case, you should be required to use separate unsafes.

Proposal

Get rid of them and instead provide safe MaybeUninit pointer conversion to its underlying type (T). That is, *{const,mut} MaybeUninit<T> -> *{const,mut} T.

`write_slice` variants

Based on discussion in the tracking issue, the name of these methods has not conveyed the fact that write_cloned does not drop old items. I believe we should just use consistent names with slices and document this behavior in bold or something. And at worst, memory leaks are not unsafe in rust so eh.

Proposal

Rename to {copy,clone}_from_slice
Move to inherent methods

`uninit`

This method cannot be used directly in array initializers if T is not Copy. However, using a const gets around this.

Proposal

Add an UNINIT const that does the same thing as uninit.

Drawbacks

API duplication. This also isn't necessary with transpose, though it would look prettier.

API diff

impl<T> MaybeUninit<T> {
    // Removed method. Can be replaced with `array.transpose().assume_init()`
-    pub unsafe fn array_assume_init<const N: usize>(array: [Self; N]) -> [T; N]
    // Removed method. Can be replaced with `MaybeUninit::<[T; N]>::uninit().transpose()`
-    pub fn uninit_array<const N: usize>() -> [MaybeUninit<T>; N]

    // Removed methods. These can be replaced with `slice.as_{,_mut}ptr().inner()`.
-    pub fn slice_as_mut_ptr(this: &mut [MaybeUninit<T>]) -> *mut T
-    pub fn slice_as_ptr(this: &[MaybeUninit<T>]) -> *const T

    // Moved methods.
-    pub unsafe fn slice_assume_init_mut(slice: &mut [Self]) -> &mut [T]
-    pub unsafe fn slice_assume_init_ref(slice: &[Self]) -> &[T]
-    pub fn write_slice<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T]where T: Copy
-    pub fn write_slice_cloned<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T] where T: Clone,
-    pub fn slice_as_bytes(this: &[MaybeUninit<T>]) -> &[MaybeUninit<u8>]
-    pub fn slice_as_bytes_mut(this: &mut [MaybeUninit<T>]) -> &mut [MaybeUninit<u8>]
}

// Moved slice methods
impl<T> [MaybeUninit<T>] {
+    pub unsafe fn assume_init_ref(&self) -> &[T]
+    pub unsafe fn assume_init_mut(&mut self) -> &mut [T]
+    pub fn copy_from_slice(&mut self, src: &[T]) -> &mut [T] where T: Copy
+    pub fn clone_from_slice(&mut self, src: &[T]) -> &mut [T] where T: Clone
+    pub fn as_bytes(&self) -> &[MaybeUninit<u8>]
+    pub fn as_bytes_mut(&mut self) -> &mut [MaybeUninit<u8>]
}

// Completely new methods

impl<T, const N: usize> MaybeUninit<[T; N]> {
+    pub fn transpose(self) -> [MaybeUninit<T>; N]
}
impl<T, const N: usize> [MaybeUninit<T>; N] {
+    pub fn transpose(self) -> MaybeUninit<[T; N]>;
}

impl<T> *const MaybeUninit<T> {
+    pub const fn inner(self) -> *const T
}
impl<T> *mut MaybeUninit<T> {
+    pub const fn inner(self) -> *mut T
}

The text was updated successfully, but these errors were encountered:

the8472 · 2022-10-17T19:30:36Z

Please list all the API changes you intend to make. And the motivation doesn't explain why this is better, why we'd want this.

SUPERCILEX · 2022-10-17T19:33:10Z

Done. The motivation is from #110.

the8472 · 2022-10-17T19:38:36Z

I'm seeing additional changes on rustc PRs that all are related to MaybeUninit. I think a larger overview of what you're aiming for overall rather than piecemeal changes would be helpful.

SUPERCILEX · 2022-10-17T20:00:19Z

I'm not sure I agree since these changes can go through independently, but the big theme is "generalize":

Transpose is a superset of anything you could do with MaybeUninit::{uninit_array,array_assume_init}.
The slice_as{,_mut}_ptr methods can be replaced with a combination of array indexing (or get_unchecked if you don't want bounds checking which should be an explicit choice) and then a safe cast to the inner type.
- I'm going to deal with this last because I'm not sure I like my solution, but I also really hate the current methods because they force you to combine elided bounds checking with reference to pointer conversion.

And then unrelated:

Move a bunch of methods that couldn't be on slices to be on slices since that's possible now.

ChrisDenton · 2022-10-18T02:38:27Z

Quick summary, as far as I understand it. I'm including transpose here even though it's already merged because its use is what makes this work. Sorry if I've missed anything.

EDIT: Updated after feedback from SUPERCILEX
EDIT EDIT: Removed as_bytes methods, see below
EDIT EDIT EDIT: This is now outdated. See SUPERCILEX opening comment

impl<T, const N: usize> MaybeUninit<[T; N]> {
    /// MaybeUninit<[T; N]> into a [MaybeUninit<T>; N]
    pub fn transpose(self) -> [MaybeUninit<T>; N]
}
impl<T> MaybeUninit<T> {
    // Remove `as_bytes` methods. See below for discussion.
    //pub fn as_bytes(&self) -> &[MaybeUninit<u8>]
    //pub fn as_bytes_mut(&mut self) -> &mut [MaybeUninit<u8>]

    // Removed method. Can be replaced with `array.transpose().assume_init()`
    //pub unsafe fn array_assume_init<const N: usize>(array: [Self; N]) -> [T; N]
    // Removed method. Can be replaced with `MaybeUninit::<[T; N]>::uninit().transpose()`
    //pub fn uninit_array<const N: usize>() -> [MaybeUninit<T>; N]

    // Removed methods. These can be replaced with `slice.as_{,_mut}ptr().inner()`.
    //pub fn slice_as_mut_ptr(this: &mut [MaybeUninit<T>]) -> *mut T
    //pub fn slice_as_ptr(this: &[MaybeUninit<T>]) -> *const T

    // Moved methods.
    //pub unsafe fn slice_assume_init_mut(slice: &mut [Self]) -> &mut [T]
    //pub unsafe fn slice_assume_init_ref(slice: &[Self]) -> &[T]
    //pub fn write_slice<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T]where T: Copy
    //pub fn write_slice_cloned<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T] where T: Clone,
}

// New array method
impl<T, const N: usize> [MaybeUninit<T>; N] {
    /// Transposes a [MaybeUninit<T>; N] into a MaybeUninit<[T; N]>.
    pub fn transpose(self) -> MaybeUninit<[T; N]>;
}

// Moved slice methods
impl<T> [MaybeUninit<T>] {
    pub fn write_slice(&mut self, src: &[T]) -> &mut [T] where T: Copy
    pub fn write_slice_cloned(&mut self, src: &[T]) -> &mut [T] where T: Clone
    pub unsafe fn assume_init_ref(&self) -> &[T]
    pub unsafe fn assume_init_mut(&mut self) -> &mut [T]
}

// Convert a MaybeUninit pointer to a pointer to its underlying type.
// This is not a `From` implementation because that could be surprising in unsafe code.
impl<T> *const MaybeUninit<T> {
    pub const fn inner(self) -> *const T {
        self as *const T
    }
}
impl<T> *mut MaybeUninit<T> {
    pub const fn inner(self) -> *mut T {
        self as *mut T
    }
}

SUPERCILEX · 2022-10-18T02:56:19Z

as{,_mut}_bytes should be under [MaybeUninit<T>] instead of [MaybeUninit<T>; N]. Only transpose will ever be needed under [MaybeUninit<T>; N] because you have access to all of MaybeUninit after the transpose. Note that the only reason any of the [MaybeUninit<T>] methods exist is because you can't transpose references AFAIK (&[MaybeUninit<u8>] to MaybeUninit<&[u8]> makes no sense).

impl From<*const MaybeUninit<T>> for *const T;
impl From<*mut MaybeUninit<T>> for *mut T;

It was decided that these guys should not be From impls since that's too implicit. I'll tackle slice_as{,_mut}_ptr (and the safe pointer casts) last since that will need bikeshedding + more thought.

Just a general note, I'd hesitate to say the "new" methods were "removed." While true from the API diff standpoint, they were just moved. Anyway, thank you!

ChrisDenton · 2022-10-18T03:06:24Z

Thanks, I've edited my comment to reflect that.

SUPERCILEX · 2022-10-18T03:09:17Z

Nice! Slight tweak:

+   // Removed methods. These can be replaced with `array.as_{,_mut}ptr().inner()`.
    //pub unsafe fn slice_assume_init_mut(slice: &mut [Self]) -> &mut [T]
    //pub unsafe fn slice_assume_init_ref(slice: &[Self]) -> &[T]

SUPERCILEX · 2022-10-18T03:12:14Z

Oh wait no sorry misread stuff. I think you're fixing it as we speak. Writing out reply rn.

ChrisDenton · 2022-10-18T03:13:20Z

Yeah, sorry my copy/pasting went askew and I got in a muddle fixing it. Should be sorted now.

SUPERCILEX · 2022-10-18T03:15:54Z

Still not quite cuz I goofed, should be:

impl<T> MaybeUninit<T> {
    // Renamed from `as_bytes_mut`
    pub fn as_mut_bytes(&mut self) -> &mut [MaybeUninit<u8>];

    // Removed method. Can be replaced with `array.transpose().assume_init()`
    //pub unsafe fn array_assume_init<const N: usize>(array: [Self; N]) -> [T; N]
    // Removed method. Can be replaced with `MaybeUninit::<[T; N]>::uninit().transpose()`
    //pub fn uninit_array<const N: usize>() -> [MaybeUninit<T>; N]

    // Removed methods. These can be replaced with `slice.as_{,_mut}ptr().inner()`.
    //pub fn slice_as_mut_ptr(this: &mut [MaybeUninit<T>]) -> *mut T
    //pub fn slice_as_ptr(this: &[MaybeUninit<T>]) -> *const T

    // Moved methods.
    //pub fn slice_as_bytes(this: &[MaybeUninit<T>]) -> &[MaybeUninit<u8>]
    //pub fn slice_as_bytes_mut(this: &mut [MaybeUninit<T>]) -> &mut [MaybeUninit<u8>]
    //pub unsafe fn slice_assume_init_mut(slice: &mut [Self]) -> &mut [T]
    //pub unsafe fn slice_assume_init_ref(slice: &[Self]) -> &[T]
    //pub fn write_slice<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T]where T: Copy
    //pub fn write_slice_cloned<'a>(this: &'a mut [MaybeUninit<T>], src: &[T]) -> &'a mut [T] where T: Clone,
}

ChrisDenton · 2022-10-18T03:18:31Z

Ok, I think I got there in the end! Thanks.

SUPERCILEX · 2022-10-18T04:02:49Z

Lol, thanks! Looks good.

Maybe also worth pointing out that there's discussion around the write_slice method naming: rust-lang/rust#79995.

Seeing all of this listed out is really helpful. It's making me wonder about the as_bytes methods. They feel like a weird restricted transmute. as_bytes could become as_r and output any type R to generalize, but then we've basically re-invented transmutes. Poking around, I couldn't find usages of the non-slice as_bytes methods. The slice ones seem to be used to read data from a byte stream into a repr(C) type. I'm starting to lean towards getting rid of those methods and requiring an explict transmute. The tracking issue talks about padding bytes being safe, but that's not the big pitfall, the implicit transmute is:

#![feature(maybe_uninit_as_bytes)]
use core::mem::MaybeUninit;

#[derive(Debug)]
enum Variant { A(u64), B(String) }

fn main() {
    let mut u = MaybeUninit::new(Variant::A(42));

    // How are variants represented again? Who knows, but 69 is better than 42
    u.as_bytes_mut()[0].write(69);
    
    // SAFETY: look how safe this is! It's intialized, so nothing could go wrong right?
    let u = unsafe { u.assume_init() };
    
    // Ahem
    println!("{u:?}");
}

Actually, now that I've written the example, scratch the "starting to lean." I'm completely against all the as_bytes methods and they should be removed ASAP since they can cause UB without violating any of the MaybeUninit invariants. cc @RalfJung for confirmation that I understood this correctly.

I desperately need to get some other stuff done, but I'll open a PR to nuke the as_bytes methods in a bit.

thomcc · 2022-10-18T06:35:55Z

Actually, now that I've written the example, scratch the "starting to lean." I'm completely against all the as_bytes methods and they should be removed ASAP since they can cause UB without violating any of the MaybeUninit invariants. cc @RalfJung for confirmation that I understood this correctly.

I think it just means we need to be clearer about the invariants that assume_init() must uphold (perhaps... I actually think we're clear about this already) -- it's already true that it must not just be initialized, it must uphold the validity invariants of a T as well.

More specifically, we've been pretty clear for a while now that MaybeUninit<T> can hold initialized values aside from T (it's not just T | undef, IOW).

scottmcm · 2022-10-18T07:18:33Z

Right, assume_init means "assume initialized as a T", not just "the bits were set" -- otherwise MaybeUninit::zeroed would make no sense. And yes, MaybeUninit holds sizeof(T) bytes that may or may not be initialized. rust-lang/rust#97712 is interesting reading related to that.

I would encourage moving the as_bytes discussions to a different issue. They're more around https://github.com/rust-lang/project-safe-transmute kinds of scenarios, in my head, rather than the uninit/assume_init things that this one seems to mostly be talking about.

Naming feedback: I'm really not convinced by the name inner for those pointer methods. That might be ok for safe stuff, where it being unclear might be fine, but like Ralf was saying in https://github.com/rust-lang/rust/pull/101179/files#r996682951, around unsafe code -- as these necessarily are, because pointers -- it's worth being more specific about what they're doing. Maybe #51 (comment) can provide some inspiration? Pointers now have cast, cast_const, and cast_mut, so maybe cast_init or something?

Said otherwise, I think "rework all the MaybeUninit functions" is too big a scope for the issue to feasibly get consensus on. I suspect that there's something smaller and separable but still valuable. For example, I might phrase my initial impression of a bunch of these things as being about "rather than needing different-named creation and consumption methods in MaybeUninit for a bunch of different types, there should be a couple extra transformers or methods on those other types to be able to just use uninit and assume_init".

SUPERCILEX · 2022-10-18T07:46:28Z

@thomcc Sure, I agree with you that MaybeUninit can hold "initialized" values like MaybeUninit::<NonZeroUsize>::zeroed() that violate the invariants of the type and it's therefore UB to assume_init even though the memory is initialized. That said, AFAIK once a MaybeUninit value becomes properly initialized, it can never be uninitialized. Please correct me if I'm wrong as then I don't object to the as_bytes methods. However, if you really can't uninitialize MaybeUninit once it's been initialized, then the as_bytes methods are the only ones that let you safely perform uninitialization, hence my strong objection. They weaken MaybeUninit from a diode allowing initialization to a bidirectional circuit that supports both initialization and uninitialization (very sneakily too).

I would encourage moving the as_bytes discussions to a different issue.

Happy to open another FCP, but it sounds like there's a general preference for consolidating all the MaybeUninit changes?

Naming feedback: I'm really not convinced by the name inner

Fully agree. Do we want this issue to be a catch-all MaybeUninit nightly API overhaul? If so I'll close the other FCP and work on making this issue encompass all the relevant changes + discussion.

scottmcm · 2022-10-18T08:00:08Z

Happy to open another FCP, but it sounds like there's a general preference for consolidating all the MaybeUninit changes?

Whatever libs-api says is best is what you should do 😄

It's definitely good to consolidate the array_assume_init/slice_assume_init_mut/slice_assume_init_ref/uninit_array changes into one. But whether that's the same overhaul conversation as as_bytes isn't obvious to me.

thomcc · 2022-10-19T01:24:53Z

That said, AFAIK once a MaybeUninit value becomes properly initialized, it can never be uninitialized.

It can. Consider a case like:

pub fn make_uninitialized<T>(mu: &mut MaybeUninit<T>) {
    *mu = MaybeUninit::uninit();
}

SUPERCILEX · 2022-10-19T03:26:28Z

@scottmcm Sounds good! :) I closed #123.

TODO(me):

Move the Chris signatures to the top
Talk about as_bytes
Talk about potential UNINIT constant
Explain overall API duplication reduction goals
Talk about Index{,Mut} traits

Should be able to get to this in a few days.

@thomcc Good point, that lessens my concerns. That said, I remain unconvinced:

I would consider that to be replacing the value not modifying it, but I agree that from the API contract perspective that this distinction is irrelevant.
as_bytes is still the only safe method that I know of (you'll probs prove me wrong again 😆) that lets you transmute a type and then mess around with it. I guess you could argue that it's similar to doing a chain of raw pointer casts, but I tend to think of raw pointers as the ultimate form of unsafety. If I see a raw pointer, I automatically assume the code is doing something insane. But if I see slices, I assume that at least the types are sound.
Related to my previous point, I feel like this turns assume_init into a double unsafe. There's the unsafe for making sure that the type is initialized and its invariants are met, and then as_bytes introduces the unsafe of making sure your individual byte manipulations of the type are sound. This feels different to me than "meet the invariants" because there's no way to individually manipulate part of a struct without using raw pointers: you have to use write which is an all-or-nothing "initialize me" function.
- For example, this makes it trivially easy to forget about endianness. My argument for requiring a transmute in addition to an assume_init is that it forces you to cleanly separate documenting the safety invariants of "I initialized this" vs. "I know my data will be in big-endian and I did the requisite byte manipulations to make the type valid." Basically I'm saying that if you ignore the uninit parts, we've given people an easy way to avoid thinking about what it means to mess around with the raw bytes of their type. If would be nice if after you see a write, you know that the type will be valid because it was created from safe rust. as_bytes feels like a hidden raw pointer to me.

RalfJung · 2022-10-19T12:22:47Z

That said, I remain unconvinced:

MaybeUninit<u8> is a pretty special type, it's our version of the C++ std::byte. IMO it makes a lot of sense that MaybeUninit would have special methods for it.

But anyway, that's off-topic for this PR. I am totally on-board with overhauling the unstable parts of the MaybeUninit API. I would probably not be much involved in that overhaul though -- I am not a libs API person, once the basic language primitives are available and in principle all things can be written correctly somehow I am really bad at figuring out how to best package them up for people and love to leave that work to others. But that sounds like an ACP/RFC kind of document, not a PR. Oh I didn't realize this is a t-libs design issue. I got pinged into 2 or 3 discussions on the same topic here and completely lost track...

RalfJung · 2022-10-19T12:30:37Z

Also, transpose is a rather undescriptive name. So I am not very happy about this MaybeUninit::uninit().transpose()... I feel like the name should indicate somehow that this is about arrays/slices.

(I know it has precedence for Option/Result swapping, but already there I found the same pretty unhelpful. The only technical meaning of 'transpose' I am away of is to mirror a matrix along its diagonal, and that doesn't really have any connection with what happens here.)

SUPERCILEX · 2022-10-20T01:29:33Z

C++

I'm not sure we should be using C++ as the role model. 😜 But actually, maybe what I'm advocating for then is to make the as_bytes methods unsafe. I just realized a simple transmute doesn't cut it since you need to set up the slice metadata (and we don't want people to need to deal with that). Anyway, I'll think about this more and put some thoughts in the summary.

I feel like the name should indicate somehow that this is about arrays/slices.

Why? It's defined as an inherent method on arrays... the type system should yell at you if you goof.

And yeah, happy to bikeshed on the name: how about bellybutton terminology? Innie and outie? Ok but more seriously maybe eversion?

RalfJung · 2022-10-26T06:25:26Z

What I meant is that the fact that writing the bytes is safe is a very explicit and useful bit of invariant documentation.

SUPERCILEX · 2022-10-28T20:46:46Z

the fact that writing the bytes is safe is a very explicit and useful bit of invariant documentation.

Sorry, I haven't quite been able to figure out what you mean by this. Are you saying that because writing the bytes is safe, you know you won't cause UB while writing the bytes? As opposed to pointers where writing can cause UB?

If so, I can potentially see how that's a good thing though I'm still bothered that it's pushing the unsafety of manipulating a type's raw bytes downstream. I guess what I've been saying is that I don't see much value in being able to write the bytes safely. ~~Crashing while creating the type vs while using the type doesn't feel different to me.~~ Actually, thinking about this some more I think what you're trying to say is that as_bytes eliminates the possibility of UBing while creating the type? Whereas pointers get the privilege of UBing after creating a garbage type AND while creating the type?

If I'm understanding your argument correctly, then I'm mostly convinced of the value of the as_bytes methods. Perhaps all the mut variants of as_bytes just need extra loud yelling in the docs that reminds people to ensure the bytes they're putting in the type match its invariants. (Though based on my sidenote below, if everyone is in agreement on which invariants must be matched where, then extra docs could be a nice reminder but probably aren't necessarily.)

As a sidenote, this thought process has changed my understanding of safety with regards to raw pointers. I previously thought writing to the pointer was the right place to document invariants about the type's bytes and how they were being met. However, it sounds like the correct place for that documentation is in the places you read the pointer. That is, writing should focus on the validity of the pointer itself whereas reading should focus on the validity of the data the pointer points to. Seeing that written out it seems somewhat obvious, but this is a newsflash to me lol. I'll reread the pointer docs because I don't remember them talking about this and I think it'd be valuable to mention.

RalfJung · 2022-10-29T11:44:55Z

I am saying that writing uninit bytes into (parts of) a MaybeUninit is definitely allowed by its safety invariant. Any code that assumes otherwise is wrong. This needs to be documented in the English text that accompanies MaybeUninit (probably our existing docs could be improved), but I claim that having as_bytes also serves as part of that documentation.

Removing as_bytes would not change the safety invariant, and people could still soundly do the exact same thing that as_bytes does. So the only thing we achieve by removing as_bytes is making it less obvious that something like as_bytes is sound for MaybeUninit.

Some of what you say sound like you actually intend to not just remove the method, but also change the safety invariant. For once, that would be a breaking change. Second, given that MaybeUninit::uninit() is safe and can be assigned into any MaybeUninit, I don't think there even is a choice here. Which safety invariant are you suggesting, concretely?

What you say about raw pointers make sense. It's still worth being careful on writes though, since the read might be unsuspecting safe code reading through a reference.

SUPERCILEX · 2022-10-29T17:14:04Z

I am saying that writing uninit bytes into (parts of) a MaybeUninit is definitely allowed by its safety invariant.

Gotya, that makes sense.

Some of what you say sound like you actually intend to not just remove the method, but also change the safety invariant.

I was more advocating for forcing people to document why their writes meet a type's invariants, but yeah that doesn't make much sense when you're writing uninit bytes or zeros.

since the read might be unsuspecting safe code reading through a reference.

I thought this wasn't allowed? The pointer docs say this:

The result of casting a reference to a pointer is valid for as long as the underlying object is live and no reference (just raw pointers) is used to access the same memory.

That makes it sound like you need to make a new reference after using a raw pointer. Or is it just saying you have to stack your usage (so create and drop the pointer before accessing the original reference again)?

RalfJung · 2022-10-31T10:48:41Z

I thought this wasn't allowed? The pointer docs say this:

You can always do something like

fn foo(x: &mut NonZeroI32)  {
  let ptr = x as *mut NonZeroI32;
  unsafe { ptr.cast::<i32>().write(0); } // no UB here
  let _val = *x; // but UB here!
}

So yes as long as it's properly nested, the ptr access is fine, and even the write itself is fine here, but the read can occur in safe code.

SUPERCILEX · 2022-11-01T01:01:25Z

Makes sense, thank you!

TODO(me):

Clarify ptr docs about stacked borrows.
Add section to ptr docs talking about write vs read invariants, noting reference interactions.
Update this issue to keep as_bytes and add docs with reminder about invariants.

ClydeHobart · 2022-11-01T04:29:20Z

Unrelated to the uninit byte methods, but is it reasonable to include either

impl<T, const N: usize> [MaybeUninit<T>; N] {
    pub unsafe fn array_assume_init_ref(&self) -> &[T; N];
    pub unsafe fn array_assume_init_mut(&mut self) -> &mut [T; N];
}

or

impl<T, const N: usize> [MaybeUninit<T>; N] {
    pub fn transpose_ref(&self) -> &MaybeUninit<[T; N]>;
    pub fn transpose_mut(&mut self) -> &mut MaybeUninit<[T; N]>;
}

as part of this proposal? The copy produced by transpose() could be expensive if the array is exceedingly large. The slice methods still exist, but that loses array size information which could be useful.

Edited for tone.

SUPERCILEX · 2022-11-05T05:12:42Z

@ClydeHobart I'm not quite sure I understand the use case of those methods.

The copy produced by transpose() could be expensive if the array is exceedingly large.

Are you talking about the copy due to transpose taking in a self as opposed to a &self? If so, that shouldn't be a concern as long as transpose is inlined (which should basically always happen, I'd consider it a bug if not). We could potentially add an llvm ir test to enforce this.

Everyone else: ok, I've updated the issue with the latest proposals. I'm pretty happy with everything (minus some needed bikeshedding) except for the UNINIT const which I'm still unsure about. What are the next steps?

Add small clarification around using pointers derived from references r? `@RalfJung` One question about your example from rust-lang/libs-team#122: at what point does UB arise? If writing 0 does not cause UB and the reference `x` is never read or written to (explicitly or implicitly by being wrapped in another data structure) after the call to `foo`, does UB only arise when dropping the value? I don't really get that since I thought references were always supposed to point to valid data? ```rust fn foo(x: &mut NonZeroI32) { let ptr = x as *mut NonZeroI32; unsafe { ptr.cast::<i32>().write(0); } // no UB here // What now? x is considered garbage when? } ```

SUPERCILEX · 2022-11-19T20:10:33Z

From @scottmcm: rust-lang/rust#104475 (comment)

Under the following assumptions:

Inline const will be stabilized (or we're fine telling people they need to use that feature)
The slice methods need to stick around (or can somehow become part of the field projection rfc)

Then I think the new proposal would be the same as what we have right now except:

Keep array_assume_init (but as an inherent method aka Scott's PR)
Delete the transpose methods
Don't add an UNINIT const

Then people will be expected to create arrays the normal way and can therefore access individual elements. To perform array-wise operations, they will convert the array to a slice and use the slice methods.

Thoughts?

m-ou-se · 2023-01-17T13:16:31Z

This pattern where we want some functionality on some type X but also on [X] seems to occur in various places. For example, AtomicU8::get_mut() goes from a &mut AtomicU8 to an &mut u8, but we also want to do that for &mut [AtomicU8] to &mut [u8] (for which we currently have the unstable get_mut_slice). Similarly for Cell, where we have a 'transpose' method named as_slice_of_cells (in only one direction).

Adding functionality using on [X] as Self (like impl<T> [MaybeUninit<T>] { … }) has some problems. The methods will appear on the documentation of [], and tools like auto completion will suggest them as methods on []. Not using self can also be annoying, having to spell out e.g. AtomicU8::get_mut_slice. And impl [MyType] { … } is not allowed outside of core, and therefore should probably be avoided inside core as much as possible.

I'm hoping we might be able to find an ergonomic solution that works for more than just MaybeUninit. Maybe just an API convention that we consistently use, but I think we're going to need some new language feature. (Perhaps something related to rust-lang/rfcs#3318?)

SUPERCILEX · 2023-01-17T20:34:01Z

I think you're right: we shouldn't be special casing MaybeUninit.

So that leaves two things from this ACP:

Remove slice_as_ptr. The proposed solution adds a conversion to unwrap a MaybeUninit pointer into just a T pointer. I doubt whatever general container solution we come up with would be able to handle this, so I think it's still relevant.
Make slice_as_bytes inherent. Same as above, this is a transmute, so probs not handled by the solution we come up with.

I'm just speculating here, so I'd also be ok with closing everything and saying we'll wait to find out what the general solution looks like.

Add small clarification around using pointers derived from references r? `@RalfJung` One question about your example from rust-lang/libs-team#122: at what point does UB arise? If writing 0 does not cause UB and the reference `x` is never read or written to (explicitly or implicitly by being wrapped in another data structure) after the call to `foo`, does UB only arise when dropping the value? I don't really get that since I thought references were always supposed to point to valid data? ```rust fn foo(x: &mut NonZeroI32) { let ptr = x as *mut NonZeroI32; unsafe { ptr.cast::<i32>().write(0); } // no UB here // What now? x is considered garbage when? } ```

nuttycom · 2023-03-08T20:17:18Z

Also, transpose is a rather undescriptive name. So I am not very happy about this MaybeUninit::uninit().transpose()... I feel like the name should indicate somehow that this is about arrays/slices.

(I know it has precedence for Option/Result swapping, but already there I found the same pretty unhelpful. The only technical meaning of 'transpose' I am away of is to mirror a matrix along its diagonal, and that doesn't really have any connection with what happens here.)

IMO transpose is a better name than the operation that it specializes usually gets: https://hackage.haskell.org/package/base-4.18.0.0/docs/Prelude.html#v:sequenceA

SUPERCILEX · 2023-06-17T19:03:47Z

Adding functionality using on [X] as Self (like impl [MaybeUninit] { … }) has some problems. The methods will appear on the documentation of [], and tools like auto completion will suggest them as methods on []. Not using self can also be annoying, having to spell out e.g. AtomicU8::get_mut_slice. And impl [MyType] { … } is not allowed outside of core, and therefore should probably be avoided inside core as much as possible.

I think we can strike a good balance by making the important methods inherent. I think that means slice_assume_init (and mut) should be inherent while the others can remain static methods. This will hit the most common use cases and make the assume_init API look consistent. The copy slice methods are arguably like ptr::copy_nonoverlapping so that's consistent.

Thoughts?

SUPERCILEX · 2023-06-17T19:06:08Z

https://github.com/rust-lang/rust/pull/103131/files shows just how common the assume_init methods are and they end up being much more concise.

Rename MaybeUninit::write_slice A step to push rust-lang#79995 forward. rust-lang/libs-team#122 also suggested to make them inherent methods, but they can't be — they'd conflict with slice's regular methods.

Rename MaybeUninit::write_slice A step to push #79995 forward. rust-lang/libs-team#122 also suggested to make them inherent methods, but they can't be — they'd conflict with slice's regular methods.

SUPERCILEX added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Oct 17, 2022

SUPERCILEX mentioned this issue Oct 17, 2022

Kill array_assume_init rust-lang/rust#103134

Open

SUPERCILEX changed the title ~~Remove MaybeUninit::{uninit_array,array_assume_init} now that transpose APIs are available~~ Remove MaybeUninit::{uninit_array,array_assume_init} now that transpose APIs are available Oct 17, 2022

SUPERCILEX mentioned this issue Oct 17, 2022

Nuke slice_as{,_mut}_ptr methods of MaybeUninit rust-lang/rust#103133

Closed

SUPERCILEX mentioned this issue Oct 18, 2022

Tidy up maybe_uninit_as_bytes API rust-lang/rust#103128

Closed

SUPERCILEX mentioned this issue Oct 19, 2022

Deprecate uninit_array rust-lang/rust#101179

Open

m-ou-se added the I-libs-api-nominated Indicates that an issue has been nominated for discussion during a team meeting. label Oct 19, 2022

beepster4096 mentioned this issue Nov 3, 2022

Tracking Issue for maybe_uninit_write_slice rust-lang/rust#79995

Open

4 tasks

SUPERCILEX mentioned this issue Nov 5, 2022

Add small clarification around using pointers derived from references rust-lang/rust#103996

Merged

SUPERCILEX mentioned this issue Nov 19, 2022

What if we just add <[MaybeUninit<T>; N]>::assume_init directly? rust-lang/rust#104475

Closed

ajwock mentioned this issue Dec 30, 2022

&mut [MaybeUninit<T>]: Fill with value or with closure. #156

Closed

m-ou-se removed the I-libs-api-nominated Indicates that an issue has been nominated for discussion during a team meeting. label Jan 17, 2023

SUPERCILEX mentioned this issue May 6, 2023

Tracking issue for #![feature(maybe_uninit_slice)] rust-lang/rust#63569

Open

1 task

SUPERCILEX mentioned this issue May 17, 2023

Tidy up maybe_uninit_write_slice API rust-lang/rust#103130

Closed

SUPERCILEX mentioned this issue Jun 26, 2023

Remove MaybeUninit::slice_as_(mut_)ptr and optionally add *const/*mut MaybeUninit<T> -> *const/*mut T type safe conversions #245

Closed

kornelski mentioned this issue Oct 3, 2023

Rename MaybeUninit::write_slice rust-lang/rust#116385

Merged

Revamp unstable MaybeUninit APIs #122

Revamp unstable MaybeUninit APIs #122

Comments

SUPERCILEX commented Oct 17, 2022 • edited

Goals

Problematic APIs

MaybeUninit::{uninit_array,array_assume_init}

Proposal

Alternatives

Implement index traits on MaybeUninit<[T; N]>

slice_assume_init variants

Proposal

as_bytes variants

Proposal

slice_as_ptr variants

Proposal

write_slice variants

Proposal

uninit

Proposal

Drawbacks

API diff

the8472 commented Oct 17, 2022

SUPERCILEX commented Oct 17, 2022

the8472 commented Oct 17, 2022

SUPERCILEX commented Oct 17, 2022

ChrisDenton commented Oct 18, 2022 • edited

SUPERCILEX commented Oct 18, 2022 • edited

ChrisDenton commented Oct 18, 2022

SUPERCILEX commented Oct 18, 2022

SUPERCILEX commented Oct 18, 2022

ChrisDenton commented Oct 18, 2022

SUPERCILEX commented Oct 18, 2022

ChrisDenton commented Oct 18, 2022

SUPERCILEX commented Oct 18, 2022 • edited

thomcc commented Oct 18, 2022 • edited

scottmcm commented Oct 18, 2022

SUPERCILEX commented Oct 18, 2022

scottmcm commented Oct 18, 2022

thomcc commented Oct 19, 2022

SUPERCILEX commented Oct 19, 2022 • edited

RalfJung commented Oct 19, 2022 • edited

RalfJung commented Oct 19, 2022

SUPERCILEX commented Oct 20, 2022

RalfJung commented Oct 26, 2022

SUPERCILEX commented Oct 28, 2022

RalfJung commented Oct 29, 2022

SUPERCILEX commented Oct 29, 2022

RalfJung commented Oct 31, 2022 • edited

SUPERCILEX commented Nov 1, 2022

ClydeHobart commented Nov 1, 2022 • edited

SUPERCILEX commented Nov 5, 2022

SUPERCILEX commented Nov 19, 2022

m-ou-se commented Jan 17, 2023

SUPERCILEX commented Jan 17, 2023

nuttycom commented Mar 8, 2023

SUPERCILEX commented Jun 17, 2023

SUPERCILEX commented Jun 17, 2023

SUPERCILEX commented Oct 17, 2022 •

edited

`MaybeUninit::{uninit_array,array_assume_init}`

Implement index traits on `MaybeUninit<[T; N]>`

`slice_assume_init` variants

`as_bytes` variants

`slice_as_ptr` variants

`write_slice` variants

`uninit`

ChrisDenton commented Oct 18, 2022 •

edited

SUPERCILEX commented Oct 18, 2022 •

edited

SUPERCILEX commented Oct 18, 2022 •

edited

thomcc commented Oct 18, 2022 •

edited

SUPERCILEX commented Oct 19, 2022 •

edited

RalfJung commented Oct 19, 2022 •

edited

RalfJung commented Oct 31, 2022 •

edited

ClydeHobart commented Nov 1, 2022 •

edited