Permalink
Switch branches/tags
Find file Copy path
404 lines (302 sloc) 16.4 KB

Summary

Introduce new APIs to libcore / libstd to serve as safe abstractions for data which cannot be safely moved around.

Motivation

A longstanding problem for Rust has been dealing with types that should not be moved. A common motivation for this is when a struct contains a pointer into its own representation - moving that struct would invalidate that pointer. This use case has become especially important recently with work on generators. Because generators essentially reify a stackframe into an object that can be manipulated in code, it is likely for idiomatic usage of a generator to result in such a self-referential type, if it is allowed.

This proposal adds an API to std which would allow you to guarantee that a particular value will never move again, enabling safe APIs that rely on self-references to exist.

Guide-level explanation

The core goal of this RFC is to provide a reference type where the referent is guaranteed to never move before being dropped. We want to do this with a minimum disruption to the type system, and in fact, this RFC shows that we can achieve the goal without any type system changes.

Let's take that goal apart, piece by piece, from the perspective of the futures (i.e. async/await) use case:

  • Reference type. The reason we need a reference type is that, when working with things like futures, we generally want to combine smaller futures into larger ones, and only at the top level put an entire resulting future into some immovable location. Thus, we need a reference type for methods like poll, so that we can break apart a large future into its smaller components, while retaining the guarantee about immobility.

  • Never to move before being dropped. Again looking at the futures case, once we being polling a future, we want it to be able to store references into itself, which is possible if we can guarantee that the whole future will never move. We don't try to track whether such references exist at the type level, since that would involve cumbersome typestate; instead, we simply decree that by the time you initially poll, you promise to never move an immobile future again.

At the same time, we want to support futures (and iterators, etc.) that can move. While it's possible to do so by providing two distinct Future (or Iterator, etc) traits, such designs incur unacceptable ergonomic costs.

The key insight of this RFC is that we can create a new library type, Pin<'a, T>, which encompasses both movable and immobile referents. The type is paired with a new auto trait, Unpin, which determines the meaning of Pin<'a, T>:

  • If T: Unpin (which is the default), then Pin<'a, T> is entirely equivalent to &'a mut T.
  • If T: !Unpin, then Pin<'a, T> provides a unique reference to a T with lifetime 'a, but only provides &'a T access safely. It also guarantees that the referent will never be moved. However, getting &'a mut T access is unsafe, because operations like mem::replace mean that &mut access is enough to move data out of the referent; you must promise not to do so.

To be clear: the sole function of Unpin is to control the meaning of Pin. Making Unpin an auto trait means that the vast majority of types are automatically "movable", so Pin degenerates to &mut. In the case that you need immobility, you opt out of Unpin, and then Pin becomes meaningful for your type.

Putting this all together, we arrive at the following definition of Future:

trait Future {
    type Item;
    type Error;

    fn poll(self: Pin<Self>, cx: &mut task::Context) -> Poll<Self::Item, Self::Error>;
}

By default when implementing Future for a struct, this definition is equivalent to today's, which takes &mut self. But if you want to allow self-referencing in your future, you just opt out of Unpin, and Pin takes care of the rest.

This RFC also provides a pinned analogy to Box called PinBox<T>. It works along the same lines as the Pin type discussed here - if the type implements Unpin, it functions the same as the unpinned Box; if the type has opted out of Unpin, it guarantees that they type behind the reference will not be moved again.

Reference-level explanation

The Unpin auto trait

This new auto trait is added to the core::marker and std::marker modules:

pub unsafe auto trait Unpin { }

A type that implements Unpin can be moved out of one of the pinned reference types discussed later in this RFC. Otherwise, they do not expose a safe API which allows you to move a value out of them. Because Unpin is an auto trait, most types in Rust implement Unpin. The types which don't are primarily self-referential types, like certain generators.

This trait is a lang item, but only to generate negative impls for certain generators. Unlike previous ?Move proposals, and unlike some traits like Sized and Copy, this trait does not impose any compiler-based semantics types that do or don't implement it. Instead, the semantics are entirely enforced through library APIs which use Unpin as a marker.

Pin

The Pin struct is added to both core::mem and std::mem. It is a new kind of reference, with stronger requirements than &mut T

#[fundamental]
pub struct Pin<'a, T: ?Sized + 'a> {
    data: &'a mut T,
}

Safe APIs

Pin implements Deref, but only implements DerefMut if the type it references implements Unpin. This way, it is not safe to call mem::swap or mem::replace when the type referenced does not implement Unpin.

impl<'a, T: ?Sized> Deref for Pin<'a, T> { ... }

impl<'a, T: Unpin + ?Sized> DerefMut for Pin<'a, T> { ... }

It can only be safely constructed from references to types that implement Unpin:

impl<'a, T: Unpin + ?Sized> Pin<'a, T> {
    pub fn new(reference: &'a mut T) -> Pin<'a, T> { ... }
}

It also has a function called borrow, which allows it to be transformed to a pin of a shorter lifetime:

impl<'a, T: ?Sized> Pin<'a, T> {
    pub fn borrow<'b>(this: &'b mut Pin<'a, T>) -> Pin<'b, T> { ... }
}

It may also implement additional APIs as is useful for type conversions, such as AsRef, From, and so on. Pin implements CoerceUnsized as necessary to make coercing them into trait objects possible.

Unsafe APIs

Pin can be unsafely constructed from mutable references to types that may not implement Unpin. Users who use this constructor must know that the type they are passing a reference to will never be moved again after the Pin is constructed, even after the lifetime of the reference has ended. (For example, it is always unsafe to construct a Pin from a reference you did not create, because you don't know what will happen once the lifetime of that reference ends.)

impl<'a, T: ?Sized> Pin<'a, T> {
    pub unsafe fn new_unchecked(reference: &'a mut T) -> Pin<'a, T> { ... }
}

Pin also has an API which allows it to be converted into a mutable reference for a type that doesn't implement Unpin. Users who use this API must guarantee that they never move out of the mutable reference they receive.

impl<'a, T: ?Sized> Pin<'a, T> {
    pub unsafe fn get_mut<'b>(this: &'b mut Pin<'a, T>) -> &'b mut T { ... }
}

Finally, as a convenience, Pin implements an unsafe map function, which makes it easier to project through a field. Users calling this function must guarantee that the value returned will not move as long as the referent of this pin doesn't move (e.g. it is a private field of the value). They also must not move out of the mutable reference they receive as the closure argument:

impl<'a, T: ?Sized> Pin<'a, T> {
    pub unsafe fn map<'b, U, F>(this: &'b mut Pin<'a, T>, f: F) -> Pin<'b, U>
	where F: FnOnce(&mut T) -> &mut U
    { ... }
}

// for example:
struct Foo {
    bar: Bar,
}

let foo_pin: Pin<Foo>;

let bar_pin: Pin<Bar> = unsafe { Pin::map(&mut foo_pin, |foo| &mut foo.bar) };
// Equivalent to:
let bar_pin: Pin<Bar> = unsafe {
    let foo: &mut Foo = Pin::get_mut(&mut foo_pin);
    Pin::new_unchecked(&mut foo.bar)
};

PinBox

The PinBox type is added to alloc::boxed and std::boxed. It is analogous to the Box type in the same way that Pin is analogous to the reference types, and it has a similar API.

#[fundamental]
pub struct PinBox<T: ?Sized> {
    inner: Box<T>,
}

Safe API

Unlike Pin, it is safe to construct a PinBox from a T and from a Box<T>, even if the type does not implement Unpin:

impl<T> PinBox<T> {
    pub fn new(data: T) -> PinBox<T> { ... }
}

impl<T: ?Sized> From<Box<T>> for PinBox<T> {
    fn from(boxed: Box<T>) -> PinBox<T> { ... }
}

It also provides the same Deref impls as Pin does:

impl<T: ?Sized> Deref for PinBox<T> { ... }
impl<T: Unpin + ?Sized> DerefMut for PinBox<T> { ... }

If the data implements Unpin, its also safe to convert a PinBox into a Box:

impl<T: Unpin + ?Sized> From<PinBox<T>> for Box<T> { ... }

Finally, it is safe to get a Pin from borrows of PinBox:

impl<T: ?Sized> PinBox<T> {
    fn as_pin<'a>(&'a mut self) -> Pin<'a, T> { ... }
}

These APIs make PinBox a reasonable way of handling data which does not implement Unpin. Once you heap allocate that data inside of a PinBox, you know that it will never change address again, and you can hand out Pin references to that data.

Unsafe API

PinBox can be unsafely converted into &mut T and Box<T> even if the type it references does not implement Unpin, similar to Pin:

impl<T: ?Sized> PinBox<T> {
    pub unsafe fn get_mut<'a>(this: &'a mut PinBox<T>) -> &'a mut T { ... }
    pub unsafe fn into_inner(this: PinBox<T>) -> Box<T> { ... }
}

Immovable generators

Today, the unstable generators feature has an option to create generators which contain references that live across yield points - these are, in effect, internal references into the generator's state machine. Because internal references are invalidated if the type is moved, these kinds of generators ("immovable generators") are currently unsafe to create.

Once the arbitrary_self_types feature becomes object safe, we will make three changes to the generator API:

  1. We will change the resume method to take self by self: Pin<Self> instead of &mut self.
  2. We will implement !Unpin for the anonymous type of an immovable generator.
  3. We will make it safe to define an immovable generator.

This is an example of how the APIs in this RFC allow for self-referential data types to be created safely.

Drawbacks

This adds additional APIs to std, including an auto trait. Such additions should not be taken lightly, and only included if they are well-justified by the abstractions they express.

Rationale and alternatives

Comparison to ?Move

One previous proposal was to add a built-in Move trait, similar to Sized. A type that did not implement Move could not be moved after it had been referenced.

This solution had some problems. First, the ?Move bound ended up "infecting" many different APIs where it wasn't relevant, and introduced a breaking change in several cases where the API bound changed in a non-backwards compatible way.

In a certain sense, this proposal is a much more narrowly scoped version of ?Move. With ?Move, any reference could act as the "Pin" reference does here. However, because of this flexibility, the negative consequences of having a type that can't be moved had a much broader impact.

Instead, we require APIs to opt into supporting immovability (a niche case) by operating with the Pin type, avoiding "infecting" the basic reference type with concerns around immovable types.

Comparison to using unsafe APIs

Another alternative we've considered was to just have the APIs which require immovability be unsafe. It would be up to the users of these APIs to review and guarantee that they never moved the self-referential types. For example, generator would look like this:

trait Generator {
    type Yield;
    type Return;

    unsafe fn resume(&mut self) -> CoResult<Self::Yield, Self::Return>;
}

This would require no extensions to the standard library, but would place the burden on every user who wants to call resume to guarantee (at the risk of memory unsafety) that their types were not moved, or that they were movable. This seemed like a worse trade off than adding these APIs.

Anchor as a wrapper type and StableDeref

In a previous iteration of this RFC, there was a wrapper type called Anchor that could "anchor" any smart pointer, and there was a hierarchy of traits relating to the stability of the referent of different pointer types. This has been replaced with PinBox.

The primary benefit of this approach was that it was partially integrated with crates like owning-ref and rental, which also use a hierarchy of stability traits. However, because of differences in the requirements, the traits used by owning-ref et al. ended up being a non-overlapping subset of the traits proposed by this RFC from the traits used by the Anchor type. Merging these into a single hierarchy provided relatively little benefit.

And the only types that implemented all of the necessary traits to be put into an Anchor before were Box<T> and Vec<T>. Because you cannot get mutable access to the smart pointer (unless the referent implements Unpin), an Anchor<Vec<T>> was not really any different from an Anchor<Box<[T]>> in the previous iteration of the RFC. For this reason, replacing Anchor with PinBox and just supporting PinBox<[T]> reduced the API complexity without losing any expressiveness.

Stack pinning API (potential future extension)

This API supports pinning !Unpin types in the heap. However, they can also be safely held in place in the stack, allowing a safe API for creating a Pin referencing a stack allocated !Unpin type.

This API is small, and does not become a part of anyone's public API. For that reason, we'll start by allowing it to grow out of tree, in third party crates, before including it in std. Here a version of the API, for reference purposes:

pub fn pinned(data: T) -> PinTemporary<'a, T> {
    PinTemporary { data, _marker: PhantomData }
}

struct PinTemporary<'a, T: 'a> {
    data: T,
    _marker: PhantomData<&'a &'a mut ()>,
}

impl<'a, T> PinTemporary<'a, T> {
    pub fn into_pin(&'a mut self) -> Pin<'a, T> {
        unsafe { Pin::new_unchecked(&mut self.data) }
    }
}

Making Pin a built-in type (potential future extension)

The Pin type could instead be a new kind of first-class reference - &'a pin T. This would have some advantages - it would be trivial to project through fields, for example, and "stack pinning" would not require an API, it would be natural. However, it has the downside of adding a new reference type, a very big language change.

For now, we're happy to stick with the Pin struct in std, and if this type is ever added, turn the Pin type into an alias for the reference type.

Having both Pin and PinMut

Instead of just having Pin, the type called Pin could instead be PinMut, and we could have a type called Pin, which is like PinMut, but only contains a shared, immutable reference.

Because we've determined that it should be safe to immutably dereference Pin/PinMut, this Pin type would not provide significant guarantees that a normal immutable reference does not. If a user needs to pass around references to data stored pinned, an &Pin (under the definition of Pin provided in this RFC) would suffice. For this reason, the Pin/PinMut distinction introduced extra types and complexity without any impactful benefit.

Unresolved questions

In addition to the future extensions discussed above, the APIs of the three pin types in std will grow over time as they implement more common conversion traits and so on.

We may also choose to require that Pin uphold stricter guarantees, requiring that Unpin data inside the Pin not leak unless the memory remains valid for the remainder of the program lifetime. This would make the stack API documented above unsound, but might also enable other APIs to make use of these guarantees to ensure that a destructor always runs if the memory becomes invalid.