Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for vec_into_raw_parts #65816

Open
2 tasks
shepmaster opened this issue Oct 25, 2019 · 31 comments
Open
2 tasks

Tracking issue for vec_into_raw_parts #65816

shepmaster opened this issue Oct 25, 2019 · 31 comments
Labels
A-collections A-raw-pointers B-unstable C-tracking-issue Libs-Tracked T-libs-api

Comments

@shepmaster
Copy link
Member

@shepmaster shepmaster commented Oct 25, 2019

#65705 adds:

impl String {
    pub fn into_raw_parts(self) -> (*mut u8, usize, usize) {…}
}
impl<T> Vec<T> {
    pub fn into_raw_parts(self) -> (*mut T, usize, usize) {…}
}

Things to evaluate before stabilization

@shepmaster shepmaster added A-collections T-libs-api C-tracking-issue labels Oct 25, 2019
@jonas-schievink jonas-schievink added the B-unstable label Oct 25, 2019
@SimonSapin
Copy link
Contributor

@SimonSapin SimonSapin commented Oct 25, 2019

Should the returned pointer be a NonNull instead? Box::raw doesn’t do that because it was stabilized before NonNull was.

Centril added a commit to Centril/rust that referenced this issue Oct 25, 2019
Add {String,Vec}::into_raw_parts

Aspects to address:

- [x] Create a tracking issue
  - rust-lang#65816
JohnTitor added a commit to JohnTitor/rust that referenced this issue Oct 25, 2019
Add {String,Vec}::into_raw_parts

Aspects to address:

- [x] Create a tracking issue
  - rust-lang#65816
@shepmaster
Copy link
Member Author

@shepmaster shepmaster commented Oct 25, 2019

Should these functions be associated functions like Box::into_raw? The Box case is required because we want to avoid colliding with methods exposed by Deref / DerefMut. This isn't a concern for String or Vec<T>, but it may be nice to be consistent across this family of similar functions.

@matklad
Copy link
Member

@matklad matklad commented Oct 28, 2019

Here's an example where this might have helped with avoiding UB: confio/go-rust-demo#1 (comment). There, the vector was destructed manually and mem::forgetted, however, len was used in place of both length and capacity. If Vec::from_raw_parts was a thing, it might have prevented this issue, as, with cap on your hands, you are more likely to do the right thing with it.

danielhenrymantilla added a commit to danielhenrymantilla/rust that referenced this issue Nov 24, 2019
Use `Vec::into_raw_parts` instead of a manual implementation of
`Vec::transmute`.

If `Vec::into_raw_parts` uses `NonNull` instead, then the code here
will need to be adjusted to take it into account (issue rust-lang#65816)
danielhenrymantilla added a commit to danielhenrymantilla/rust that referenced this issue Feb 4, 2020
Updated tracking issue number

Added safeguards for transmute_vec potentially being factored out elsewhere

Clarified comment about avoiding mem::forget

Removed unneeded unstable guard

Added back a stability annotation for CI

Minor documentation improvements

Thanks to @Centril's code review

Co-Authored-By: Mazdak Farrokhzad <twingoow@gmail.com>

Improved layout checks, type annotations and removed unaccurate comment

Removed unnecessary check on array layout

Adapt the stability annotation to the new 1.41 milestone

Co-Authored-By: Mazdak Farrokhzad <twingoow@gmail.com>

Simplify the implementation.

Use `Vec::into_raw_parts` instead of a manual implementation of
`Vec::transmute`.

If `Vec::into_raw_parts` uses `NonNull` instead, then the code here
will need to be adjusted to take it into account (issue rust-lang#65816)

Reduce the whitespace of safety comments
@Lokathor
Copy link
Contributor

@Lokathor Lokathor commented Feb 15, 2020

If we view this as being the opposite of Vec::from_raw_parts the answers become fairly obvious:

  • The pointer shoud be a plain raw pointer, because that's the type that from_raw_parts takes.
  • The method should be associated so that both visually it matches the style used for from_raw_parts (in a sense), and also because this is a "weird" method that editors don't need to be suggesting with auto-complete.

@SimonSapin
Copy link
Contributor

@SimonSapin SimonSapin commented Feb 15, 2020

I think it’s not at all obvious that deliberately making this method "weird" is a good thing.

@Lokathor
Copy link
Contributor

@Lokathor Lokathor commented Feb 15, 2020

The method is weird either way. "This uses the vec by self, but won't free the memory" is sufficiently out there.

@SimonSapin
Copy link
Contributor

@SimonSapin SimonSapin commented Feb 15, 2020

I don’t see how that makes it weird. Vec::into_boxed_slice is the same. It’s pretty normal for conversions to take self by value.

@Lokathor
Copy link
Contributor

@Lokathor Lokathor commented Feb 15, 2020

But into_boxed_slice doesn't leak the memory

@ghost
Copy link

@ghost ghost commented Feb 17, 2020

Well,the documentation does a great job telling you how to avoid a leak and also leaking it is not unsafe.

@Lokathor
Copy link
Contributor

@Lokathor Lokathor commented Feb 17, 2020

I don't disagree with any of that, but also my opinion is unchanged.

It's not a big deal to have this as associated function vs method, but shepmaster is right that the associated version function makes this meta-family of functions feel consistent across the alloc crate.

But again it's a totally unimportant difference either way, and we should just stabilize either form and get on with the day.

@ghost
Copy link

@ghost ghost commented Feb 17, 2020

In my opinion functions that start with into_ should be methods to match the definition of other methods and the Into trait unless can collide with methods on the Deref::Target while the rest can be left as associated functions if it is conventional.

@hsivonen

This comment has been minimized.

@RustyYato

This comment has been minimized.

@shepmaster

This comment has been minimized.

@SimonSapin
Copy link
Contributor

@SimonSapin SimonSapin commented Apr 27, 2020

@rust-lang/libs Any thoughts on the unresolved questions in the issue description? I’d be inclined to say:

  • This method should return a NonNull pointer: the conversion to a raw pointer is easy and safe, the reverse in either unsafe or has an unnecessary branch
  • It should not be changed to an associated function: Deref::Target is not to an arbitrary type, so unlike Box there is no risk of shadowing another method.

@BurntSushi
Copy link
Member

@BurntSushi BurntSushi commented Apr 27, 2020

@SimonSapin SGTM. The missing parallelism between definitions of into_raw_parts and from_raw_parts is a little unfortunate, but I think NonNull is the right choice for the reasons you stated.

@shepmaster
Copy link
Member Author

@shepmaster shepmaster commented Apr 27, 2020

I never stated my preference: I do think that it should be an associated function. My rationale is purely based on consistency. I will probably always use this function as Vec::into_raw_parts(some_vec) so that I don't have to think "oh, is this something that implements Deref so I should use it like this or that".

Arguing against myself, if the team goes with a method, the associated function syntax will still work, so I'll be able to do what I want.

In that case the methods won't be consistent across types, and to me will be a wart.

The missing parallelism [...] is a little unfortunate

To be clear, there's no technical reason that prevents the team from choosing to make this an associated function, AFAIK. There's just no technical reason that forces it to be an associated function.

@BurntSushi
Copy link
Member

@BurntSushi BurntSushi commented Apr 27, 2020

To clarify, by "missing parallelism," I was referring to the fact that from_raw_parts takes a raw pointer and we probably want into_raw_parts to return NonNull.

@KodrAus
Copy link
Contributor

@KodrAus KodrAus commented Jul 29, 2020

NonNull seems like the right type to me too. It exists now and we should use it. But I do think it would be good to look at how we might be able to offer some consistency across these container types before stabilizing this one either way. Since we're stuck with stable APIs effectively forever I think we need to be mindful of keeping things consistent over slowly shifting idioms.

@KodrAus KodrAus added Libs-Tracked A-raw-pointers labels Jul 29, 2020
@cmazakas
Copy link

@cmazakas cmazakas commented Jan 1, 2021

Why would the returned pointer be marked NonNull? Default constructed Vec doesn't allocate so it should return a null pointer in that case.

Edit: Just remembered the stdlib hacks the pointer into being something like 4 so it's technically NonNull. Very gross and I don't see much benefit but NonNull does make more sense now. Still feels disingenuous for an implementation to do when it specifically brags about never allocating until it needs to.

@Lokathor
Copy link
Contributor

@Lokathor Lokathor commented Jan 1, 2021

The vec pointer being NonNull gives Vec a niche, so that types like Option<Vec<T>> are the same size as Vec<T>

I don't see how it's disingenuous. It really doesn't allocate.

@cmazakas
Copy link

@cmazakas cmazakas commented Jan 6, 2021

Ah, that's an interesting implementation detail. I wasn't aware Rust made that kind of optimization so thank you for the correction.

What does a user gain if the result of into_raw_parts() returns a NonNull if they'd still have to actively handle the dangling case?

In the case of returning a simple *mut T, it's at least apparent that this pointer can be completely invalid (i.e. NULL). With a NonNull, a user most now infer its validity by checking the returned capacity.

Imo, it's much more apparent to check a raw pointer for being null than it is to check a NonNull for dangling.

In the case of Vec, using a NonNull internally enables nice space optimizations under common Rust idioms but I'm not sure that applies to this particular function. Moreso, into_raw_parts() better composes when pointers are easier to cast:

let (ptr, len, cap) = vec_of_uninit_bytes.into_raw_parts();
let vec = Vec::<u8>::from_raw_parts(ptr as *mut u8, len, cap);

I'm not sure if NonNull enables such easy casting for pointer convertible types.

@SimonSapin
Copy link
Contributor

@SimonSapin SimonSapin commented Jan 6, 2021

You don’t need to check the pointer for being dangling. The valid region of memory you can access through the pointer has size capacity * size_of::<T>(). The pointer is dangling if and only if the capacity is zero, which means you cannot access any memory through the pointer regardless of its address.

NonNull has a cast method. But note, whether you use cast or as, transmuting a Vec<T> to Vec<U> in this way is only valid if:

size_of::<T>() * old_len == size_of::<U>() * new_len && 
size_of::<T>() * old_capacity == size_of::<U>() * new_capacity &&
align_of::<T>() == align_of::<U>()

@DomantasJ
Copy link

@DomantasJ DomantasJ commented Apr 8, 2021

Vec::into_raw_parts is currently defined in impl<T, A: Allocator> Vec<T, A>. Isn't this a potential footgun? The pointer you get is generally only usable with that specific allocator, so a function that throws it away seems to encourage doing the wrong thing.

@SimonSapin
Copy link
Contributor

@SimonSapin SimonSapin commented Apr 8, 2021

That’s a good point. We could limit Vec::into_raw_parts to A = Global and have an additional method that also returns the allocator handle. It still wouldn’t really prevent misusing the raw pointer but would nudge into considering the allocator compatibility.

@DomantasJ
Copy link

@DomantasJ DomantasJ commented Apr 8, 2021

That method, Vec::into_raw_parts_with_alloc, already exists. I suspect this problem was just an oversight when adding allocator support.

@mina86
Copy link
Contributor

@mina86 mina86 commented May 12, 2021

NonNull has a cast method. But note, whether you use cast or as, transmuting a Vec<T> to Vec<U> in this way is only valid if:

size_of::<T>() * old_len == size_of::<U>() * new_len && 
size_of::<T>() * old_capacity == size_of::<U>() * new_capacity &&
align_of::<T>() == align_of::<U>()

Could I ask for clarification on that? The documentation of Vec::from_raw_parts indicates that size of T and U must be the same as well, cf. ‘T needs to have the same size and alignment as what ptr was allocated with.’ Is the documentation invalid, my understanding of it incorrect or requirements you’ve listed insufficient?

@SimonSapin
Copy link
Contributor

@SimonSapin SimonSapin commented May 12, 2021

Changing the variable name to match my previous example:

U needs to have the same size and alignment as what ptr was allocated with.

This is not the same as “U needs to have the same size and alignment as T”. In fact I think this wording is slightly wrong, since ptr was allocated not with the layout (size and alignment) of a single T but with the layout of old_capacity times consecutive Ts. And similarly, in the Vec<U> about to be created, what matters as far as the allocator is concerned is the layout of new_capacity times consecutive Us, not the layout of a single U.

@lopopolo
Copy link
Contributor

@lopopolo lopopolo commented Jan 14, 2022

Hey folks, I don't mean to add noise to an old tracking issue, but I was hoping to add some input on the usability of returning the triple as a tuple.

I never remember what order capacity and length are returned in and always have to look at the docs. I have a small utility crate that makes this a bit easier by returning a struct with named fields. Is returning a struct instead of a tuple that has two fields of the same type too wild of a change to consider? I see that this thread already is considering breaking parallelism with Vec::from_raw_parts to return NonNull<T>.

https://crates.io/crates/raw-parts

@LegionMammal978
Copy link
Contributor

@LegionMammal978 LegionMammal978 commented Mar 14, 2022

To put in my own two cents (whatever they're worth), I really don't think that using NonNull<T> here would be worth the symmetry hit. Currently:

  • The standard library has 6 stable into_raw functions that return a non-null raw pointer: {Arc, Rc}::into_raw, Box::into_raw, CString::into_raw, and {rc, sync}::Weak::into_raw.
  • It has 12 stable from_raw-type functions that take a non-null raw pointer: {Arc, Rc}::from_raw, Box::from_raw, CString::from_raw, ptr::slice_from_raw_parts[_mut], {rc, sync}::Weak::from_raw, slice::from_raw_parts[_mut], String::from_raw_parts, and Vec::from_raw_parts.
  • All of the unstable variants which use the Allocator API also use *mut T exclusively, but those can always be modified. Finally, there is NonNull::slice_from_raw_parts which takes a NonNull<T>, but that is also unstable and explicitly part of the NonNull interface.

Especially since we already have String::from_raw_parts and Vec::from_raw_parts that take a *mut T, it would be very confusing to have these be the only two into_raw-type functions that return a NonNull<T>. It would require permanently explaining the historical baggage that kept Arc, Box, CString, Rc, and Weak from using the type for their own into_raw functions.

Meanwhile, a NonNull<T> has limited benefit over a *mut T in this context. By the definition of into_raw_parts, we know that the returned ptr is valid for writes up to capacity, and that it can be deallocated by a matching from_raw_parts. The further knowledge that it is non-null does not give us any more expressive power. The sole advantage of making ptr a NonNull<T> is to free up the all-zeroes niche for Option and option-like enums. If this is ever desired, NonNull::new and NonNull::new_unchecked are only a single function call away, and the non-null precondition for the latter is trivially satisfied by the documentation of into_raw_parts.

For code that really needs NonNull without the extra unsafe or unwrap, an alternative solution can be found by extending the existing unstable function NonNull::slice_from_raw_parts. We can add additional associated functions to the NonNull type that produce NonNull<T> values from Box, CString, String, Vec, etc. This way, the NonNull interface is self-contained, and we do not need to disrupt the existing symmetry found in the from_raw and into_raw methods of slices and heap-allocated types.

For these reasons, my preference is to continue to return (*mut T, usize, usize) for both of these functions, and possibly to add new associated functions to NonNull if they are found necessary. Keep in mind that code manipulating NonNull pointers will always require unsafe blocks and precondition checks in order to access them, so a trivially sound NonNull::new_unchecked call upon creation is a small ask in comparison.

@SypoolProtocol
Copy link

@SypoolProtocol SypoolProtocol commented Apr 27, 2022

I think we just need a raw pointer here, does not really need a NonNull wrapper over it since we probably would also make it as a raw pointer.
And I don't think we should make it an associated function, it should consume a vector in a method way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-collections A-raw-pointers B-unstable C-tracking-issue Libs-Tracked T-libs-api
Projects
None yet
Development

No branches or pull requests