Add support for in-place map for `Vec`s of types with same size #15302

tbu- · 2014-07-01T13:16:24Z

This is implemented using a new struct PartialVec which implements the proper
drop semantics in case the conversion is interrupted by an unwind.

huonw · 2014-07-01T13:20:51Z

src/libcollections/vec.rs

+    ///
+    /// Fails if not all `T`s were popped, also fails if not the same amount of
+    /// `U`s was pushed before calling `unwrap`.
+    pub fn unwrap(self) -> Vec<U> {


This should probably be called into_vec.

tbu- · 2014-07-01T22:28:59Z

@huonw I addressed your comment.

tbu- · 2014-07-03T13:29:27Z

The use case for this is something along the lines of being able to typesafely convert a read BigEndianU32 to u32, like this C code (for little-endian machines).

uint32_t *array;
size_t array_length;
for(size_t i = 0; i < array_length; i++)
{
    uint8_t *bytes = &array[i];
    array[i] = (bytes[0]<<24) | (bytes[1]<<16) | (bytes[2]<<8) | (bytes[3])
}

In Rust with that change it would be:

struct BigEndianU32(...);
let array: Vec<BigEndianU32>;
let array: Vec<u32> = array.map_inplace(|x| x.to_u32());

alexcrichton · 2014-07-05T20:20:46Z

While this seems like a very real problem, adding a specific solution for only the Vec container and no other seems like it's not the best solution to take here. Additionally, as noted in the FIXME, there is no static assertion that the two types are of the same size, encouraging dynamic failure, which the standard library in general avoids.

Do you know if it would be possible to extend this to the map().collect() pattern in general? This, for example, doesn't work with hash maps, hash sets, tree maps, etc. (which all suffer the same problem).

thestinger · 2014-07-05T20:22:53Z

There's always the option of just using mut_iter().

heinrich5991 · 2014-07-05T23:49:28Z

@thestinger This only works if the source and the target type are the same, Vec<BigEndianU32> to Vec<u32> wouldn't work.

tbu- · 2014-07-06T18:59:01Z

@alexcrichton I thought about extending it to other containers, and it looks like this isn't easily possible, as the container's internal memory layout could differ between a Container<uint> and a Container<&uint>. This could be illustrated by the SmallIntMap: Although uint and &uint have the same size, the representation of the entries in the SmallIntMap does not, as the SmallIntMap uses Option<T> internally, making SmallIntMap<uint> not convertible in place to SmallIntMap<&uint>.

That being said, the general pattern could be applied to different containers as well, such as for the keys of a hashmap, the entries of a linked list, the keys in a tree, etc.

bluss · 2014-07-06T23:30:59Z

Is it useful to expose the PartialVec as part of public API? Can it be enough to only provide map-inplace?

tbu- · 2014-07-07T14:01:53Z

Yes, that would work too.

lilyball · 2014-07-21T07:44:52Z

src/libcollections/vec.rs

+
+impl<T,U> PartialVec<T,U> {
+    /// Creates a `PartialVec` from a `Vec`.
+    pub fn new(mut vec: Vec<T>) -> PartialVec<T,U> {


This should perhaps be called from_vec() instead.

lilyball · 2014-07-21T08:26:26Z

I'm not a fan of the usage of pop() to remove the first element of the vector. It should perhaps be shift() instead.

I'd also love to see the zero-sized type restriction lifted. An if condition testing for a zero-size type on every function should allow you to deal with this, e.g. by repurposing the pointers as counters instead. It should be compiled away during monomorphization so each instance of PartialVec only contains the correct code (the zero-sized type vs the non-zero-sized type).

lilyball · 2014-07-21T09:04:19Z

As for generalizing this to other containers, it could actually theoretically work for any container that has the following invariants:

The Drop impl for the container drops all the elements and deallocates any allocated memory, and does nothing else.
The container provides some equivalent of set_len(0) that causes the Drop impl to merely deallocate allocated memory.
The container provides an Iterator<&mut T> iterator. The method that provides it does not mutate the container, and calling .next() on the iterator does not mutate the container.
The mutable iterator yields the same sequence of elements every time, if the container has not been otherwise mutated

Given this, the PartialVec impl could use a counter to keep track of how many elements it's popped or pushed, and maintain two parallel mutable iterators (which unfortunately does violate the lifetime system, but I believe it's safe given the above invariants), one for elements to read from and one for slots to write to. Given the counters and the iterators, it can then handle traversing arbitrary container structures to rewrite the elements, and similarly handle traversing the container to destruct the elements.

This does require less obviously-safe code, of course. And I am a bit uncomfortable about maintaining parallel iterators like that, because if the iterator uses a &mut reference internally then it's technically a violation of the aliasing rules, but since it never actually writes through that &mut reference then I think it's not UB. The individual &mut T references yielded by the iterator would in fact be ensured to remain unique. This whole issue could also be avoided by using RandomAccessIterator instead, but that would severely limit the collections this would work with.

And of course there's no way for the compiler to enforce the above invariants. A trait would have to be written that documents the invariants (and provides the relevant methods), and all implementors would have to be sure they meet said invariants.

For the trait itself, HKTs would be extremely helpful in representing the type signature, but they're not actually necessary. If the trait takes two type parameters, the "from" container type and the "to" container type, then it can work. In fact, this would actually let you map between two disparate container definitions if they use the same memory layout under the hood (and opted in to this via the trait conformance). Unfortunately, due to the need for defining a mut_iter() function, the type signature of the trait gets extremely complicated, and would look something like:

pub trait InPlaceIterable<'a, T, It: Iterator<&'a mut T>> {
    fn in_place_mut_iter(&'a mut self) -> It;
}

/// Implementors of this trait must follow the documented invariants.
/// Any implementor of this trait will also need to implement
/// InPlaceIterable to actually be useful
pub trait InPlaceMappable {
    fn ensure_drop_no_elements(&mut self); // set_len(0)
}

// C_T is the "source" collection, e.g. Vec<T>
// C_U is the "destination" collection e.g. Vec<U>
pub struct PartialMapper<C_T, C_U> { ... }

impl<'a, T, U, T_It: Iterator<&'a mut T>,
        C_T: InPlaceIterable<'a, T, T_It> + InPlaceMappable,
        // C_U doesn't need InPlaceIterable because we only construct iterators from
        // C_T, and transmute the &mut T to &mut U as necessary.
        // Similarly it doesn't need InPlaceMappable
        C_U> PartialMapper<C_T, C_U> {
    pub fn new(source: C_T) -> PartialMapper<C_T, C_U> { ... }
    pub fn to_dest(self) -> C_U { ... }
    pub fn shift(&mut self) -> Option<T> { ... }
    pub fn push(&mut self, val: U) { ... }
}

tbu- · 2014-07-21T14:23:53Z

@kballard I have explained above why it can't work unless we know the internal memory layout of the container, see for example SmallIntMap. If you add to your requirements that the elements of T are laid out in memory as simple instances of T, that's fine, but once it's something like Option<T> it won't work anymore.

Otherwise, this still has some problems. What about containers that keep their elements sorted, or hashed, like HashSet, some kind of binary tree or else. In the end it looks like this can almost only meaningfully be supported for Vec-like structures.

lilyball · 2014-07-21T18:27:20Z

@tbu- Well, it can work for any container that can vend an Iterator<&mut T>. You're right that things like SmallIntMap and containers that sort based on value can't do that, but you could e.g. use this to map the value type of a HashMap<K,V> (you wouldn't be able to use the key in the mapping though). Similarly, you could use this for things like DList.

That said, I think you're right in assuming that Vec is the primary use-case for this kind of functionality.

tbu- · 2014-07-21T19:33:24Z

@kballard It doesn't really work for SmallIntMap which can (could) give out an Iterator<&mut T> (the Option thing is not preventing it, you can get a Option<&mut T> from a Option<T>.

But you're right that it could be useful for DList or even on the values of a HashMap.

lilyball · 2014-07-21T20:37:12Z

@tbu- I was making assumptions about SmallIntMap based on your claim that it doesn't work, but looking at how SmallIntMap works, well, theoretically in a perfect world it would work. But AFAIK, it can't work only due to the null-pointer optimization, e.g. if either T XOR U are null-pointer-optimized then the layout of Option<T> and Option would differ. This could theoretically be asserted at runtime by an InPlaceMappable method that is designed for this purpose, but the addition of yet more runtime failure is really unfortunate (especially as the runtime failure would differ for types that care about null-pointer-optimization vs types that don't).

lilyball · 2014-07-21T20:39:53Z

Incidentally, the size_of assertions are actually wrong, they need to be min_align_of instead, as that's what Vec actually uses for the stride between two elements.

tbu- · 2014-07-21T22:13:42Z

@kballard I don't think so:

pub fn with_capacity(capacity: uint) -> Vec<T> {
    [...] {
        let size = capacity.checked_mul(&mem::size_of::<T>())
                           .expect("capacity overflow");
        let ptr = unsafe { allocate(size, mem::min_align_of::<T>()) };
        Vec { len: 0, cap: capacity, ptr: ptr as *mut T }
    }
}

The mem::min_align_of::<T>() parameter of allocate seems to be the alignment:

pub unsafe fn allocate(size: uint, align: uint) -> *mut u8;

tbu- · 2014-07-21T22:41:22Z

@kballard It's not just null-pointer optimization btw, we have also have unspecified struct layout, so other enum optimizations could come.

lilyball · 2014-07-21T23:46:11Z

@tbu- Err yeah, you're right, I misinterpreted. It's not used as the stride between two elements. But it is used. So I guess you need to assert that both size_of and min_align_of are the same for T and U, otherwise deallocation of the resulting Vec will be wrong.

lilyball · 2014-07-21T23:53:54Z

@tbu- I'm actually unclear on how the unspecified struct layout is supposed to behave in the face of type parameters. On the one hand, transmuting between differently-parameterized variants of the same type assumes that the layout is stable for all type parameters, but on the other hand, allowing the layout to differ depending on the parameters is a potential optimization that we'd like to have.

Assuming that the layout is not required to be stable for all possible parameters to a type, it's actually technically not even correct to transmute between Vec<T> and Vec. There's no good reason for the layouts to differ as the type parameter is hidden behind a pointer, but that doesn't mean it's correct to assume the layout is stable.

Given that, I think that you need to verify what guarantees are made about the layout of generic types. If the layout of Vec<T> and Vec cannot actually be guaranteed to be stable (according to the rules, rather than the current compiler implementation), then I think you're going to have to submit an RFC to modify the struct layout guarantees to guarantee that the layout is the same for all pairs of instances Type<T> and Type where size_of::<T>() = size_of::() and min_align_of::<T>() == min_align_of::(). Those two requirements are the ones you're relying on already (well, you're relying on size right now, but you need to rely on min align as well). They're also required in order to transmute between two versions of the struct where the type is used directly, although I'm pretty sure that right now it's a compiler error to use transmute() when a type parameter can influence the size of the overall type.

huonw · 2014-07-21T23:55:34Z

I think it would be very reasonable for Vec to have a fixed layout. There's not much reordering the fields of a Vec will do in terms of optimisation.

tbu- · 2014-07-22T00:00:31Z

@kballard That Vec transmute can be fixed, get the raw parts (not possible with destructuring unfortunately, needs to be done manually), forget the old Vec and construct a new one.

lilyball · 2014-07-22T00:28:41Z

@tbu- True, you can build from raw parts. If you're doing that as a transmute-equivalent you will need to make sure that e.g. .as_mut_ptr() actually always returns the underlying pointer. Right now it does, but until very recently it actually returned different values depending on whether the element was a zero size. I promoted getting this fixed, but if you're going to rely on it you'll probably want to add a comment to the as_ptr() and as_mut_ptr() methods documenting that they must return the pointer field as-is.

lilyball · 2014-07-22T00:30:28Z

Of course, you could also have InPlaceMappable take the destination type and provide fn xmute(self) -> C_U; itself, thus requiring that the type ensure that the transformation is valid.

At this point I'm going to stop talking about how we could theoretically do this for arbitrary collections. I do like exploring the topic, but I'm uncomfortable with the number of invariants that the compiler is unable to check, and as you pointed out most collections actually can't take advantage of this.

This is implemented using a new struct `PartialVec` which implements the proper drop semantics in case the conversion is interrupted by an unwind.

This specifically includes: - Fix of the tests - Remove `transmute` between `Vec`s of different types

alexcrichton · 2014-08-06T14:38:36Z

Closing due to inactivity, but feel free to reopen with a rebase!

This is implemented using a new struct PartialVec which implements the proper drop semantics in case the conversion is interrupted by an unwind. For the old pull requests, see #15302, #16369.

minor: Sync from rust

huonw reviewed Jul 1, 2014
View reviewed changes

lilyball reviewed Jul 21, 2014
View reviewed changes

tbu- added 3 commits August 3, 2014 12:37

Add support for in-place map for Vecs of types with same size

33d5288

This is implemented using a new struct `PartialVec` which implements the proper drop semantics in case the conversion is interrupted by an unwind.

PartialVec: Remove TODOs and rename unwrap to into_vec

87f510c

Fix some of the issues mentioned in the PR on Github

d871604

This specifically includes: - Fix of the tests - Remove `transmute` between `Vec`s of different types

alexcrichton closed this Aug 6, 2014

tbu- mentioned this pull request Aug 8, 2014

Add support for in-place map for Vecs of types with same size #16369

Closed

tbu- mentioned this pull request Aug 29, 2014

Add support for in-place map for Vecs of types with same size (take 3) #16853

Merged

bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 17, 2023

Auto merge of rust-lang#15302 - lnicola:sync-from-rust, r=lnicola

8174450

minor: Sync from rust

tbu- mentioned this pull request Jul 25, 2023

Comments: Learn Unsafe Rust from My Mistakes geo-ant/blog#49

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for in-place map for `Vec`s of types with same size #15302

Add support for in-place map for `Vec`s of types with same size #15302

tbu- commented Jul 1, 2014

huonw Jul 1, 2014

tbu- commented Jul 1, 2014

tbu- commented Jul 3, 2014

alexcrichton commented Jul 5, 2014

thestinger commented Jul 5, 2014

heinrich5991 commented Jul 5, 2014

tbu- commented Jul 6, 2014

bluss commented Jul 6, 2014

tbu- commented Jul 7, 2014

lilyball Jul 21, 2014

lilyball commented Jul 21, 2014

lilyball commented Jul 21, 2014

tbu- commented Jul 21, 2014

lilyball commented Jul 21, 2014

tbu- commented Jul 21, 2014

lilyball commented Jul 21, 2014

lilyball commented Jul 21, 2014

tbu- commented Jul 21, 2014

tbu- commented Jul 21, 2014

lilyball commented Jul 21, 2014

lilyball commented Jul 21, 2014

huonw commented Jul 21, 2014

tbu- commented Jul 22, 2014

lilyball commented Jul 22, 2014

lilyball commented Jul 22, 2014

alexcrichton commented Aug 6, 2014

Add support for in-place map for Vecs of types with same size #15302

Add support for in-place map for Vecs of types with same size #15302

Conversation

tbu- commented Jul 1, 2014

huonw Jul 1, 2014

Choose a reason for hiding this comment

tbu- commented Jul 1, 2014

tbu- commented Jul 3, 2014

alexcrichton commented Jul 5, 2014

thestinger commented Jul 5, 2014

heinrich5991 commented Jul 5, 2014

tbu- commented Jul 6, 2014

bluss commented Jul 6, 2014

tbu- commented Jul 7, 2014

lilyball Jul 21, 2014

Choose a reason for hiding this comment

lilyball commented Jul 21, 2014

lilyball commented Jul 21, 2014

tbu- commented Jul 21, 2014

lilyball commented Jul 21, 2014

tbu- commented Jul 21, 2014

lilyball commented Jul 21, 2014

lilyball commented Jul 21, 2014

tbu- commented Jul 21, 2014

tbu- commented Jul 21, 2014

lilyball commented Jul 21, 2014

lilyball commented Jul 21, 2014

huonw commented Jul 21, 2014

tbu- commented Jul 22, 2014

lilyball commented Jul 22, 2014

lilyball commented Jul 22, 2014

alexcrichton commented Aug 6, 2014

Add support for in-place map for `Vec`s of types with same size #15302

Add support for in-place map for `Vec`s of types with same size #15302