Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for in-place map for Vecs of types with same size #15302

Closed
wants to merge 3 commits into from

Conversation

tbu-
Copy link
Contributor

@tbu- tbu- commented Jul 1, 2014

This is implemented using a new struct PartialVec which implements the proper
drop semantics in case the conversion is interrupted by an unwind.

///
/// Fails if not all `T`s were popped, also fails if not the same amount of
/// `U`s was pushed before calling `unwrap`.
pub fn unwrap(self) -> Vec<U> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be called into_vec.

@tbu-
Copy link
Contributor Author

tbu- commented Jul 1, 2014

@huonw I addressed your comment.

@tbu-
Copy link
Contributor Author

tbu- commented Jul 3, 2014

The use case for this is something along the lines of being able to typesafely convert a read BigEndianU32 to u32, like this C code (for little-endian machines).

uint32_t *array;
size_t array_length;
for(size_t i = 0; i < array_length; i++)
{
    uint8_t *bytes = &array[i];
    array[i] = (bytes[0]<<24) | (bytes[1]<<16) | (bytes[2]<<8) | (bytes[3])
}

In Rust with that change it would be:

struct BigEndianU32(...);
let array: Vec<BigEndianU32>;
let array: Vec<u32> = array.map_inplace(|x| x.to_u32());

@alexcrichton
Copy link
Member

While this seems like a very real problem, adding a specific solution for only the Vec container and no other seems like it's not the best solution to take here. Additionally, as noted in the FIXME, there is no static assertion that the two types are of the same size, encouraging dynamic failure, which the standard library in general avoids.

Do you know if it would be possible to extend this to the map().collect() pattern in general? This, for example, doesn't work with hash maps, hash sets, tree maps, etc. (which all suffer the same problem).

@thestinger
Copy link
Contributor

There's always the option of just using mut_iter().

@heinrich5991
Copy link

@thestinger This only works if the source and the target type are the same, Vec<BigEndianU32> to Vec<u32> wouldn't work.

@tbu-
Copy link
Contributor Author

tbu- commented Jul 6, 2014

@alexcrichton I thought about extending it to other containers, and it looks like this isn't easily possible, as the container's internal memory layout could differ between a Container<uint> and a Container<&uint>. This could be illustrated by the SmallIntMap: Although uint and &uint have the same size, the representation of the entries in the SmallIntMap does not, as the SmallIntMap uses Option<T> internally, making SmallIntMap<uint> not convertible in place to SmallIntMap<&uint>.

That being said, the general pattern could be applied to different containers as well, such as for the keys of a hashmap, the entries of a linked list, the keys in a tree, etc.

@bluss
Copy link
Member

bluss commented Jul 6, 2014

Is it useful to expose the PartialVec as part of public API? Can it be enough to only provide map-inplace?

@tbu-
Copy link
Contributor Author

tbu- commented Jul 7, 2014

Yes, that would work too.


impl<T,U> PartialVec<T,U> {
/// Creates a `PartialVec` from a `Vec`.
pub fn new(mut vec: Vec<T>) -> PartialVec<T,U> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should perhaps be called from_vec() instead.

@lilyball
Copy link
Contributor

I'm not a fan of the usage of pop() to remove the first element of the vector. It should perhaps be shift() instead.

I'd also love to see the zero-sized type restriction lifted. An if condition testing for a zero-size type on every function should allow you to deal with this, e.g. by repurposing the pointers as counters instead. It should be compiled away during monomorphization so each instance of PartialVec only contains the correct code (the zero-sized type vs the non-zero-sized type).

@lilyball
Copy link
Contributor

As for generalizing this to other containers, it could actually theoretically work for any container that has the following invariants:

  1. The Drop impl for the container drops all the elements and deallocates any allocated memory, and does nothing else.
  2. The container provides some equivalent of set_len(0) that causes the Drop impl to merely deallocate allocated memory.
  3. The container provides an Iterator<&mut T> iterator. The method that provides it does not mutate the container, and calling .next() on the iterator does not mutate the container.
  4. The mutable iterator yields the same sequence of elements every time, if the container has not been otherwise mutated

Given this, the PartialVec impl could use a counter to keep track of how many elements it's popped or pushed, and maintain two parallel mutable iterators (which unfortunately does violate the lifetime system, but I believe it's safe given the above invariants), one for elements to read from and one for slots to write to. Given the counters and the iterators, it can then handle traversing arbitrary container structures to rewrite the elements, and similarly handle traversing the container to destruct the elements.

This does require less obviously-safe code, of course. And I am a bit uncomfortable about maintaining parallel iterators like that, because if the iterator uses a &mut reference internally then it's technically a violation of the aliasing rules, but since it never actually writes through that &mut reference then I think it's not UB. The individual &mut T references yielded by the iterator would in fact be ensured to remain unique. This whole issue could also be avoided by using RandomAccessIterator instead, but that would severely limit the collections this would work with.

And of course there's no way for the compiler to enforce the above invariants. A trait would have to be written that documents the invariants (and provides the relevant methods), and all implementors would have to be sure they meet said invariants.

For the trait itself, HKTs would be extremely helpful in representing the type signature, but they're not actually necessary. If the trait takes two type parameters, the "from" container type and the "to" container type, then it can work. In fact, this would actually let you map between two disparate container definitions if they use the same memory layout under the hood (and opted in to this via the trait conformance). Unfortunately, due to the need for defining a mut_iter() function, the type signature of the trait gets extremely complicated, and would look something like:

pub trait InPlaceIterable<'a, T, It: Iterator<&'a mut T>> {
    fn in_place_mut_iter(&'a mut self) -> It;
}

/// Implementors of this trait must follow the documented invariants.
/// Any implementor of this trait will also need to implement
/// InPlaceIterable to actually be useful
pub trait InPlaceMappable {
    fn ensure_drop_no_elements(&mut self); // set_len(0)
}

// C_T is the "source" collection, e.g. Vec<T>
// C_U is the "destination" collection e.g. Vec<U>
pub struct PartialMapper<C_T, C_U> { ... }

impl<'a, T, U, T_It: Iterator<&'a mut T>,
        C_T: InPlaceIterable<'a, T, T_It> + InPlaceMappable,
        // C_U doesn't need InPlaceIterable because we only construct iterators from
        // C_T, and transmute the &mut T to &mut U as necessary.
        // Similarly it doesn't need InPlaceMappable
        C_U> PartialMapper<C_T, C_U> {
    pub fn new(source: C_T) -> PartialMapper<C_T, C_U> { ... }
    pub fn to_dest(self) -> C_U { ... }
    pub fn shift(&mut self) -> Option<T> { ... }
    pub fn push(&mut self, val: U) { ... }
}

@tbu-
Copy link
Contributor Author

tbu- commented Jul 21, 2014

@kballard I have explained above why it can't work unless we know the internal memory layout of the container, see for example SmallIntMap. If you add to your requirements that the elements of T are laid out in memory as simple instances of T, that's fine, but once it's something like Option<T> it won't work anymore.

Otherwise, this still has some problems. What about containers that keep their elements sorted, or hashed, like HashSet, some kind of binary tree or else. In the end it looks like this can almost only meaningfully be supported for Vec-like structures.

@lilyball
Copy link
Contributor

@tbu- Well, it can work for any container that can vend an Iterator<&mut T>. You're right that things like SmallIntMap and containers that sort based on value can't do that, but you could e.g. use this to map the value type of a HashMap<K,V> (you wouldn't be able to use the key in the mapping though). Similarly, you could use this for things like DList.

That said, I think you're right in assuming that Vec is the primary use-case for this kind of functionality.

@tbu-
Copy link
Contributor Author

tbu- commented Jul 21, 2014

@kballard It doesn't really work for SmallIntMap which can (could) give out an Iterator<&mut T> (the Option thing is not preventing it, you can get a Option<&mut T> from a Option<T>.

But you're right that it could be useful for DList or even on the values of a HashMap.

@lilyball
Copy link
Contributor

@tbu- I was making assumptions about SmallIntMap based on your claim that it doesn't work, but looking at how SmallIntMap works, well, theoretically in a perfect world it would work. But AFAIK, it can't work only due to the null-pointer optimization, e.g. if either T XOR U are null-pointer-optimized then the layout of Option<T> and Option<U> would differ. This could theoretically be asserted at runtime by an InPlaceMappable method that is designed for this purpose, but the addition of yet more runtime failure is really unfortunate (especially as the runtime failure would differ for types that care about null-pointer-optimization vs types that don't).

@lilyball
Copy link
Contributor

Incidentally, the size_of assertions are actually wrong, they need to be min_align_of instead, as that's what Vec actually uses for the stride between two elements.

@tbu-
Copy link
Contributor Author

tbu- commented Jul 21, 2014

@kballard I don't think so:

pub fn with_capacity(capacity: uint) -> Vec<T> {
    [...] {
        let size = capacity.checked_mul(&mem::size_of::<T>())
                           .expect("capacity overflow");
        let ptr = unsafe { allocate(size, mem::min_align_of::<T>()) };
        Vec { len: 0, cap: capacity, ptr: ptr as *mut T }
    }
}

The mem::min_align_of::<T>() parameter of allocate seems to be the alignment:

pub unsafe fn allocate(size: uint, align: uint) -> *mut u8;

@tbu-
Copy link
Contributor Author

tbu- commented Jul 21, 2014

@kballard It's not just null-pointer optimization btw, we have also have unspecified struct layout, so other enum optimizations could come.

@lilyball
Copy link
Contributor

@tbu- Err yeah, you're right, I misinterpreted. It's not used as the stride between two elements. But it is used. So I guess you need to assert that both size_of and min_align_of are the same for T and U, otherwise deallocation of the resulting Vec will be wrong.

@lilyball
Copy link
Contributor

@tbu- I'm actually unclear on how the unspecified struct layout is supposed to behave in the face of type parameters. On the one hand, transmuting between differently-parameterized variants of the same type assumes that the layout is stable for all type parameters, but on the other hand, allowing the layout to differ depending on the parameters is a potential optimization that we'd like to have.

Assuming that the layout is not required to be stable for all possible parameters to a type, it's actually technically not even correct to transmute between Vec<T> and Vec<U>. There's no good reason for the layouts to differ as the type parameter is hidden behind a pointer, but that doesn't mean it's correct to assume the layout is stable.

Given that, I think that you need to verify what guarantees are made about the layout of generic types. If the layout of Vec<T> and Vec<U> cannot actually be guaranteed to be stable (according to the rules, rather than the current compiler implementation), then I think you're going to have to submit an RFC to modify the struct layout guarantees to guarantee that the layout is the same for all pairs of instances Type<T> and Type<U> where size_of::<T>() = size_of::<U>() and min_align_of::<T>() == min_align_of::<U>(). Those two requirements are the ones you're relying on already (well, you're relying on size right now, but you need to rely on min align as well). They're also required in order to transmute between two versions of the struct where the type is used directly, although I'm pretty sure that right now it's a compiler error to use transmute() when a type parameter can influence the size of the overall type.

@huonw
Copy link
Member

huonw commented Jul 21, 2014

I think it would be very reasonable for Vec to have a fixed layout. There's not much reordering the fields of a Vec will do in terms of optimisation.

@tbu-
Copy link
Contributor Author

tbu- commented Jul 22, 2014

@kballard That Vec transmute can be fixed, get the raw parts (not possible with destructuring unfortunately, needs to be done manually), forget the old Vec and construct a new one.

@lilyball
Copy link
Contributor

@tbu- True, you can build from raw parts. If you're doing that as a transmute-equivalent you will need to make sure that e.g. .as_mut_ptr() actually always returns the underlying pointer. Right now it does, but until very recently it actually returned different values depending on whether the element was a zero size. I promoted getting this fixed, but if you're going to rely on it you'll probably want to add a comment to the as_ptr() and as_mut_ptr() methods documenting that they must return the pointer field as-is.

@lilyball
Copy link
Contributor

Of course, you could also have InPlaceMappable take the destination type and provide fn xmute(self) -> C_U; itself, thus requiring that the type ensure that the transformation is valid.

At this point I'm going to stop talking about how we could theoretically do this for arbitrary collections. I do like exploring the topic, but I'm uncomfortable with the number of invariants that the compiler is unable to check, and as you pointed out most collections actually can't take advantage of this.

This is implemented using a new struct `PartialVec` which implements the proper
drop semantics in case the conversion is interrupted by an unwind.
This specifically includes:
- Fix of the tests
- Remove `transmute` between `Vec`s of different types
@alexcrichton
Copy link
Member

Closing due to inactivity, but feel free to reopen with a rebase!

bors added a commit that referenced this pull request Sep 15, 2014
This is implemented using a new struct PartialVec which implements the proper
drop semantics in case the conversion is interrupted by an unwind.

For the old pull requests, see #15302, #16369.
bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants