New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Pointer metadata & VTable #2580

Open
wants to merge 20 commits into
base: master
from

Conversation

Projects
None yet
@SimonSapin
Contributor

SimonSapin commented Oct 27, 2018

Add generic APIs that allow manipulating the metadata of fat pointers:

  • Naming the metadata’s type (as an associated type)
  • Extracting metadata from a pointer
  • Reconstructing a pointer from a data pointer and metadata
  • Representing vtables, the metadata for trait objects, as a type with some limited API

This RFC does not propose a mechanism for defining custom dynamically-sized types, but tries to stay compatible with future proposals that do.

HTML view

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Oct 27, 2018

A previous iteration of this RFC is also visible at #2579. It was based on @Gankro’s proposal https://gist.github.com/Gankro/b053cb4d1cb3bcaec070de89734720f7

Show resolved Hide resolved text/0000-ptr-meta.md Outdated
@mikeyhew

This comment has been minimized.

mikeyhew commented Oct 27, 2018

First of all, thanks for opening this rfc. It's the right way to fix the raw::TraitObject API, and is a big step toward custom DST.

My only criticism is I think the Vtable type should be generic over the trait object type, as mentioned in the alternatives section. Having different Metadata types for each trait object type would help catch errors at compile time. Also, it seems like &'static would preclude us from implementing trait object types like dyn Trait1 + Trait2 using multiple vtable pointers, and i think it would be nice to avoid making that decision in this RFC.

I like the name Pointee, I think it's an improvement over Referent because it's more clear what it means.

@gnzlbg

This comment has been minimized.

Contributor

gnzlbg commented Oct 27, 2018

cc @ubsan

Show resolved Hide resolved text/0000-ptr-meta.md Outdated
Show resolved Hide resolved text/0000-ptr-meta.md Outdated
Show resolved Hide resolved text/0000-ptr-meta.md Outdated
Show resolved Hide resolved text/0000-ptr-meta.md Outdated
Show resolved Hide resolved text/0000-ptr-meta.md Outdated
Show resolved Hide resolved text/0000-ptr-meta.md Outdated
///
/// [dst]: https://doc.rust-lang.org/nomicon/exotic-sizes.html#dynamically-sized-types-dsts
#[lang = "pointee"]
pub trait Pointee {

This comment has been minimized.

@oli-obk

oli-obk Oct 27, 2018

Contributor

so... I'm assuming the compiler implements

default impl<T: ?Sized> Pointee for T {
    type Metadata = &'static Vtable;
}
impl<T: Sized> Pointee for T {
    type Metadata = ();
}
impl Pointee for str {
    type Metadata = usize;
}
impl<T: Sized> Pointee for [T] {
    type Metadata = usize;
}

Which means theoretically we could make Vtable generic over T allowing the drop_in_place method to take a raw pointer with the correct pointee type?

This comment has been minimized.

@SimonSapin

SimonSapin Oct 28, 2018

Contributor

These impls would be accurate in current Rust, but what I had in mind instead was that the compiler would automatically generate impls, similar to what it does for the std::marker::Unsize trait. As far as the standard library is concerned these impls would be "magic", not based on specialization.

Regardless, yes, making VTable generic with a type parameter for the trait object type is possible.

Show resolved Hide resolved text/0000-ptr-meta.md Outdated
(Answer: they can use a different metadata type like `[&'static VTable; N]`.)
`VTable` could be made generic with a type parameter for the trait object type that it describes.
This would avoid forcing that the size, alignment, and destruction pointers

This comment has been minimized.

@oli-obk

oli-obk Oct 27, 2018

Contributor

How would that avoid forcing this? Can you elaborate?

This comment has been minimized.

@SimonSapin

SimonSapin Oct 28, 2018

Contributor

Without a type parameter, x.size() with x: &'static VTable necessarily executes the same code for any vtable. With a type parameter, x: &'static VTable<dyn Foo> and x: &'static VTable<dyn Bar> are different types and could execute different code. (For example, do table lookup with different offsets.) However, keeping the offset of size the same within all vtables might be desirable regardless of this API.

`VTable` could be made generic with a type parameter for the trait object type that it describes.
This would avoid forcing that the size, alignment, and destruction pointers
be in the same location (offset) for every vtables.
But keeping them in the same location is probaly desirable anyway to keep code size

This comment has been minimized.

@scottmcm

scottmcm Oct 28, 2018

Member

Missing the end of a sentence?

type Metadata;
}
/// Pointers to types implementing this trait alias are

This comment has been minimized.

@scottmcm

scottmcm Oct 28, 2018

Member

Missing the end of a sentence?

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Oct 28, 2018

@mikeyhew I’ve very open to adding a type parameter to VTable, I’ll just wait to get some more feedback on the RFC as-is before making significant changes.

As to supporting super-fat pointers with multiple vtable pointers, as mentioned in the alternatives section I believe this design doesn’t prevent it. Types that don’t exist yet and are added to the language in the future (possibly custom DSTs) can have a different metadata type. For dyn A + B that could be [&'static VTable; 2], for example. This proposal does however force dyn C with trait C: A + B {} to keep only having a single vtable pointer.

pub unsafe fn drop_in_place(&self, data: *mut ()) { ... }
}
```

This comment has been minimized.

@Centril

Centril Oct 28, 2018

Contributor

No drawbacks section...?

This comment has been minimized.

@SimonSapin

SimonSapin Oct 28, 2018

Contributor

I came up short trying to think of a reason not to do this at all (as opposed to doing it differently). Suggestions welcome.

and (hopefully) more compatible with future custom DSTs proposals,
this RFC resolves the question of what happens
if trait objects with super-fat pointers with multiple vtable pointers are ever added.
(Answer: they can use a different metadata type like `[&'static VTable; N]`.)

This comment has been minimized.

@Centril

Centril Oct 28, 2018

Contributor

Should then [&'static VTable; 1] for dyn SomeTrait be used to make that transition smoother and to fit better with const generics?

This comment has been minimized.

@SimonSapin

SimonSapin Oct 28, 2018

Contributor

This would make some sense if we were definitely gonna have super-fat pointers with multiple separate vtable pointers as fat pointer metadata. But if we don’t and end up with a different solution to upcasting, we’ll end up with a always-single-item arrays for no reason. This isn’t really the thread to get into that discussion, but my opinion is that super-fat pointer have a significant enough size cost that I’d much prefer a different solution.

Perhaps an alternative for this RFC, more neutral with respect super-fat pointers v.s. not, would be to have type Metadata = VTable<Self>; for trait objects. (See other comments about VTable’s possible type paramater.) With the pointer/reference indirection hidden away in private fields of the VTable type, this design would be compatible with having VTable<dyn A + B> contain two pointers in the future.

This comment has been minimized.

@SimonSapin

SimonSapin Oct 28, 2018

Contributor

Also, is there a use case for generic code that accepts any trait object with any number of vtable pointer but not other kinds of DSTs?

and (hopefully) more compatible with future custom DSTs proposals,
this RFC resolves the question of what happens
if trait objects with super-fat pointers with multiple vtable pointers are ever added.
(Answer: they can use a different metadata type like `[&'static VTable; N]`.)

This comment has been minimized.

@Centril

Centril Oct 28, 2018

Contributor

Are we doing the proposals in the right order? Shouldn't we focus on dealing with dyn A + B + C, upcasting, and such things first? Also, cc #2035.

This comment has been minimized.

@SimonSapin

SimonSapin Oct 28, 2018

Contributor

I believe the design proposed here is compatible enough with various options for multi-traits trait objects and upcasting such that there isn’t a strong dependency, and we don’t need to block this RFC on everything else being settled.

* The name of `Pointee`. [Internals thread #6663][6663] used `Referent`.
* The location of `VTable`. Is another module more appropriate than `std::ptr`?

This comment has been minimized.

@Centril

Centril Oct 28, 2018

Contributor

and should it be called Dictionary instead? ("type class dictionary")

This comment has been minimized.

@kennytm

kennytm Oct 28, 2018

Member

Big -1 to calling it Dictionary since this typically means a key-value map (example: C#, Swift, Python).

Furthermore, here in Rust the VTable is implemented as an array of function pointers, not a HashMap (unlike e.g. Python where it is really implemented as a dict), so calling it Dictionary obscures the alleged complexity.

This comment has been minimized.

@SimonSapin

SimonSapin Oct 28, 2018

Contributor

Even if the implementation happened to use HashMap, I’d prefer VTable since it’s more descriptive of the role of this type. (As opposed to: dictionary of what?) I believe that vtable is a well-enough established term of art.

kennytm and others added some commits Oct 28, 2018

Pointer metadata: add bounds on the associated type
Co-Authored-By: SimonSapin <simon.sapin@exyr.org>
Pointer metadata: typo fix
Co-Authored-By: SimonSapin <simon.sapin@exyr.org>
Pointer metadata: typo fix
Co-Authored-By: SimonSapin <simon.sapin@exyr.org>
Pointer metadata: typo fix
Co-Authored-By: SimonSapin <simon.sapin@exyr.org>
Pointer metadata: typo fix
Co-Authored-By: SimonSapin <simon.sapin@exyr.org>
Pointer metadata: grammar
Co-Authored-By: SimonSapin <simon.sapin@exyr.org>
Pointer metadata: grammar
Co-Authored-By: SimonSapin <simon.sapin@exyr.org>
Pointer metadata: grammar
Co-Authored-By: SimonSapin <simon.sapin@exyr.org>
@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Oct 28, 2018

Regarding making VTable generic, if it looks like this:

struct VTable<Dyn> { … }

… then it could be used with any type as a parameter. What does VTable<u32> mean? If we want to restrict VTable’s parameter to only types where it makes sense (dyn SomeTrait, dyn SomeTrait + SomeAutoTrait, etc.), we’ll need a dedicated public trait:

struct VTable<Dyn> where Dyn: ?Sized + std::marker::DynTrait { … }

Do we want such a trait?

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Oct 28, 2018

Hmm, maybe we could get away with this?

struct VTable<Dyn> where Dyn: ?Sized + Pointee<Metadata=Self> { … }
@Gankro

This comment has been minimized.

Contributor

Gankro commented Oct 28, 2018

Is there any way to ensure minimal compiler time is wasted performing monomorphization on useless vtable type params? If not, I would rather the API just be less safe.

@mikeyhew

This comment has been minimized.

mikeyhew commented Oct 28, 2018

@Gankro the word "useless" sounds a little strong there... the purpose it to make sure you can't use a vtable from a *const dyn Trait to make a *const dyn SomeOtherTrait, at least not without doing something obviously hacky like transmute.

@Gankro

This comment has been minimized.

Contributor

Gankro commented Oct 28, 2018

You're supposing a situation where I somehow am writing code with two vtable types floating around, in which case I have two data pointers, and nothing can stop me from swapping the data pointers, producing the exact same effect.

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Oct 31, 2018

I’ve pushed some more changes, including code for ThinBox which hopefully makes a more compelling example.

Interestingly, the only reason ThinBox is restricted to trait objects rather than supporting any DST is in order to use VTable::align: knowing the value’s alignment is required in order to construct a pointer to it.

This problem is not specific to ThinBox. It was brought up before by @kennytm in the context of a custom DSTs discussion:

https://internals.rust-lang.org/t/pre-erfc-lets-fix-dsts/6663/2

DynAligned doesn’t work. The minimal promise a type can give about alignment, if any, is AlignFromMeta . This is because in a DST struct

struct X<T: DynAligned> {
    a: u8,
    b: T,
}

if you need to read &x.b , you’ll first need to find the offset of b , which in turn requires knowing the alignment of b without knowing its pointer value.

Since being able to compute the alignment based based on only a (DST) type and the pointer metadata is already a constraint of the language, how about exposing an API doing that?

// std::mem
// Very much subject to bikeshed
pub fn align_for_metadata::<T: ?Sized>(meta: <T as Pointee>::Metadata) -> usize {…}

Or

trait Pointee {
    // Already in the RFC:
    type Metadata: Copy + Send + Sync + Ord + Hash + 'static;

    // Added:
    fn align(meta: Self::Metadata) -> usize;
}
@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Oct 31, 2018

@F001 Which trait do you mean? There is no trait named TraitObject in this RFC.

@kennytm

This comment has been minimized.

Member

kennytm commented Oct 31, 2018

@SimonSapin The first one (pub fn std::mem::align_for_meta) doesn't seem compatible with Custom DST since external code cannot override a free function.

BTW, the alignment of an extern type is undefined. Though I'd implement <extern type>::align() to panic 🙂.

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Oct 31, 2018

Is it undefined? What about code like this, then? (Though this is getting into discussion for rust-lang/rust#43467 rather than here…)

#![feature(extern_types)]
extern { pub type Foo; }
pub struct Bar(u8, Foo);
pub fn baz(x: &Bar) -> &Foo {
    &x.1
}

(Playground)

@kennytm

This comment has been minimized.

Member

kennytm commented Oct 31, 2018

@SimonSapin Yeah that's getting off-topic, the extern-type-in-a-struct discussed before starting from rust-lang/rust#43467 (comment) I think. This code works today because we're still emitting the fallback values size_of_val(x) == 0 and align_of_val(x) == 1 instead of abort()/panic!().

What I mean is that Pointee::align may diverge, but that's a very special case and is simpler than non-panicking alternatives like returning an Option<usize>.

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Oct 31, 2018

@kennytm So you don’t think that this a reason for Pointee::align not to exist for every type?

@mikeyhew

This comment has been minimized.

mikeyhew commented Nov 1, 2018

I think we're getting off topic here: this RFC shouldn't provide an API for calculating the alignment, because there is not as much agreement on how to do so. Some of us would prefer to have traits like AlignFromMeta and AlignFromRef, whereas others would be OK with having a function that panics.

/// Pointers to types implementing this trait alias are “thin”:
///
/// ```rust
/// fn always_true<T: std::prt::Thin>() -> bool {

This comment has been minimized.

@mzji

mzji Nov 1, 2018

std::prt::Thin -> std::ptr::Thin ?

This comment has been minimized.

@SimonSapin

SimonSapin Nov 14, 2018

Contributor

Fixed, thanks.

@Gankro

This comment has been minimized.

Contributor

Gankro commented Nov 1, 2018

I think we're getting off topic here: this RFC shouldn't provide an API for calculating the alignment, because there is not as much agreement on how to do so. Some of us would prefer to have traits like AlignFromMeta and AlignFromRef, whereas others would be OK with having a function that panics.

This RFC is literally useless without alignment, as noted many times in the text and examples

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Nov 1, 2018

Right, both ThinBox and VecOfDst absolutely require knowing the alignment of values.

#[lang = "pointee"]
pub trait Pointee {
/// The type for metadata in pointers and references to `Self`.
type Metadata: Copy + Send + Sync + Ord + Hash + 'static;

This comment has been minimized.

@kennytm

kennytm Nov 10, 2018

Member

Sorry I missed the Pin API. Could you also add Unpin to the list?

This comment has been minimized.

@SimonSapin

SimonSapin Nov 10, 2018

Contributor

It’s somewhat worrying if this list of bounds is growing. Could there be another trait in the future that we’d want to add here, but can’t because the trait is stable?

(In an hypothetical custom-DST future where user-defined impls can exist. In this RFC I think they cannot because they would always overlap with impls provided by the compiler through "magic".)

This comment has been minimized.

@SimonSapin

SimonSapin Nov 10, 2018

Contributor

What are the consequences for generic code not to be able to assume T::Metadata: Unpin specifically?

This comment has been minimized.

@kennytm

kennytm Nov 10, 2018

Member

If we introduce any core trait which is implemented on references or pointers, then the trait should appear on the list. The issue is that &T is equivalent to (*const (), T::Metadata), and we have

impl<'a, T: ?Sized> Unpin for &'a T {}

which forces T::Metadata must always be Unpin, otherwise we could move pinned data by smuggling it through a custom DST. For Unpin there's no way to safely construct a custom DST reference with pinned metadata so it is arguably fine, the main issue comes from traits which are implemented on pointers (e.g. Hash and Copy) where no unsafe code will be involved.

Yes the list will grow until we stabilize Pointee. After that, the impls cannot assume the metadata is "sane" and must write

impl<'a, T: ?Sized> Unpin for &'a T
where 
    T::Metadata: Unpin,  // <----
{}

or accept that T::Metadata: !Unpin is allowed.

The list doesn't affect built-in DSTs where the only possible types are usize or &'static VTable.

This comment has been minimized.

@plietar

plietar Nov 11, 2018

Can types even be Copy but not Unpin? Can we add Unpin as a super trait for Copy, in a backwards compatible way? I think so, because Unpin is an auto trait.

This comment has been minimized.

This comment has been minimized.

@ubsan

ubsan Nov 11, 2018

Contributor

I think that it would be reasonable to also require Unpin of T::Metadata.

This comment has been minimized.

@SimonSapin

SimonSapin Nov 14, 2018

Contributor

Added.

@ubsan ubsan referenced this pull request Nov 11, 2018

Open

RFC: Custom DSTs #2594

@kennytm kennytm referenced this pull request Nov 11, 2018

Open

Custom DST #2

@withoutboats

This comment has been minimized.

Contributor

withoutboats commented Nov 14, 2018

A few questions:

  • I think that the metadata type of trait objects in this RFC is &'static VTable, not VTable. Should VTable be an extern type so that users can't assume anything about its representation?
  • I'm not certain the bounds on Metadata are useful. When would you want to write code generic over pointees with different metadatas that manipulates the metadata?
@kennytm

This comment has been minimized.

Member

kennytm commented Nov 14, 2018

The point of the bounds is to provide forward compatibility with Custom DSTs. For sure you can't impl Pointee for X with this RFC alone, and with built-in types the list would be irrelevant. But once we get Custom DST the same list will be needed anyway, so I don't see any harm being more specific upfront.

@withoutboats

This comment has been minimized.

Contributor

withoutboats commented Nov 14, 2018

@kennytm why is the list needed? what relies on those guarantees?

@kennytm

This comment has been minimized.

Member

kennytm commented Nov 14, 2018

@withoutboats This is needed for Custom DSTs because *T should be equivalent to (*(), T::Metadata) (also note that all from_raw_parts functions in this RFC are safe).

For example, if Metadata does not require Copy, a Custom DST could make Vec<usize> a Metadata and then we can "copy" the Vec via:

// all safe!
let ptr = <*const CustomDst>::from_raw_parts(&(), vec![1,2,3]);
let ptr2 = ptr;
let ptr3 = ptr;
let vec2 = metadata(ptr);
let vec3 = metadata(ptr);

The whole list Sized + Copy + Send + Sync + Ord + Hash + Unpin + 'static is derived from various such impls on pointers and references.

We could restrict the impl<T: ?Sized> Copy for *const T to where T::Metadata: Copy, but this will break existing generic code. We could prevent Custom DSTs from having non-Copy metadata but that's equivalent to adding this list.

Again, the list is irrelevant if no one can impl Pointee, it is mostly for future compatibility with Custom DST.

(A copy of the relevant section from kennytm#2)

When manipulating generic DSTs, we often need to use its metadata type. This can be exposed via associated type if the DST implements some trait.

type Pointee {
    type Metadata: Sized;
}

That is, every type &Dst will be represented as the tuple (&Something, Dst::Metadata) in memory.

We need to ensure all existing traits implemented for pointers and references (&T, *T, Box<T> etc) are not affected by custom DST. This imposes many restrictions on Metadata:

  • Copy&T and *T are both Copy

  • Send&T is Send as long as T is Sync, without considering T::Metadata.

  • Sync&T is Sync as long as T is Sync, without considering T::Metadata.

  • Ord*T is ordered by the pointer value + metadata together. Demonstration:

    let a: &[u8] = &[1, 2, 3];
    let b: &[u8] = &a[..2];
    let a_ptr: *const [u8] = a;
    let b_ptr: *const [u8] = b;
    // the thin-pointer part are equal...
    assert_eq!(a_ptr as *const u8, b_ptr as *const u8);
    // but together with the metadata, they are different.
    assert!(a_ptr > b_ptr);
  • Hash*T hashes the pointer value + metadata together.

  • Unpin&T is always Unpin (Copy does not imply Unpin).

  • 'static*T should outlive T, thus T::Metadata should outlive T. There is no 'self lifetime, nor it makes sense to complicate the matters by making Metadata a GAT, thus the 'static bound.

Thus, the final constraint would be:

trait Pointee {
    type Metadata: Sized + Copy + Send + Sync + Ord + Hash + Unpin + 'static;
}

In principle Metadata should be further bound by UnwindSafe and RefUnwindSafe, but these traits are defined in libstd instead of libcore, so they cannot be included in the bound. One may need to explicitly opt-out of RefUnwindSafe according to the metadata type.

impl<T: ?Sized> !RefUnwindSafe for T {}
impl<T: ?Sized> RefUnwindSafe for T where T::Metadata: UnwindSafe {}

Fortunately, RefUnwindSafe is only excluded for UnsafeCell<T>, and UnwindSafe is only excluded for &mut T, both of which are handled by the Copy bound already.

The Metadata type should also be bound by Freeze (i.e. cell-free), but Freeze is a private trait, thus cannot be exposed to public. Again, Copy already eliminated the possibility of having cells.

@SimonSapin

This comment has been minimized.

Contributor

SimonSapin commented Nov 15, 2018

When would you want to write code generic over pointees with different metadatas that manipulates the metadata?

It’s rather niche, and likely we’ll end up with a small number of library that do that, but for example ThinBox and VecOfDst both could (almost #2580 (comment)) be generalized to work with any DST.

@withoutboats

This comment has been minimized.

Contributor

withoutboats commented Nov 15, 2018

@kennytm your comments make sense, thanks! and it is concerning that this represents a forward compat hazard with adding new autotraits like Unpin, something we'll have to keep in mind.

However, I don't think the metadata influences the implementations of Ord and Hash, do they? So why are those necessary? (I'm also surprised to learn that vtables have an ordering.)

EDIT: And I'm not so convinced that Unpin is necessary either (not that this should ever matter in practice). We haven't guaranteed that Pin pins the metadata of a wide pointer at all.

Biascally, I think Copy + Send + Sync + 'static captures the real fact that we need to guarantee about metadata: its POD with no more restrictions than an address has.

@kennytm

This comment has been minimized.

Member

kennytm commented Nov 15, 2018

@withoutboats For now the both Hash and Ord impl for pointers compare their content as if opaque bytes (e.g. for Hash the implementation just transmutes to (usize, usize) and hash the tuple). I don't think transmute works when Custom DST is added.

Since we need to keep these impls without adding new where bounds:

impl<T: ?Sized> Hash for *const T { ... }
impl<T: ?Sized> Ord for *const T { ... }

and we need to keep comparing the metadata to maintain the runtime behavior, we'll need to add the Ord + Hash constraints.


For Unpin — interesting. I believe Pin<&mut CustomDst> not capturing the metadata in the pin is irrelevant. The interesting case is

Pin<&mut &CustomDst>  // this is a _thin_ pointer to a fat pointer

The inner reference is always Unpin currently, meaning we could safely Pin::get_mut to obtain the &mut &CustomDst, and we could mem::swap it with a different &CustomDst. If CustomDst::Metadata: !Unpin, this would have violated the pin guarantee.

Fortunately there's no safe way convert an arbitrary Pin<&mut T> to a Pin<&mut &CustomDst> (unlike Copy) so the damage is quite limited if Unpin is not on the list.

@withoutboats

This comment has been minimized.

Contributor

withoutboats commented Nov 15, 2018

@kennytm I was thinking of reference types, not pointer types. The behavior of pointer types is interesting, maybe even unfortunate..

@bjorn3

This comment has been minimized.

bjorn3 commented Nov 16, 2018

Because of impl<T: ?Sized> Eq for *const T the metadata has to be Eq too.

@kennytm

This comment has been minimized.

Member

kennytm commented Nov 16, 2018

Ord already implies PartialEq + PartialOrd + Eq.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment