New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom Dynamically Sized Types for Rust #1524

Closed
wants to merge 11 commits into
base: master
from

Conversation

Projects
None yet
@ubsan
Contributor

ubsan commented Mar 2, 2016

I believe this fixes #813, and is a nicer, and far more powerful, solution than #709.

@dgrunwald

This comment has been minimized.

Show comment
Hide comment
@dgrunwald

dgrunwald Mar 2, 2016

Contributor

&CStr was originally supposed to be compatible with *mut c_char for FFI purposes. (with CStr being an unsized type)
It's only a fat pointer because of Rust's current limitations regarding DSTs.

Contributor

dgrunwald commented Mar 2, 2016

&CStr was originally supposed to be compatible with *mut c_char for FFI purposes. (with CStr being an unsized type)
It's only a fat pointer because of Rust's current limitations regarding DSTs.

@ubsan

This comment has been minimized.

Show comment
Hide comment
@ubsan

ubsan Mar 2, 2016

Contributor

@dgrunwald Ah, well, this RFC does support that.

struct CStr([libc::c_char]);
impl Dst for CStr {
    type Meta = ();
}
impl Sizeable for CStr {
    fn size_of_val(&self) -> usize {
        libc::strlen(self.as_ptr())
    }
}
impl CStr {
    pub unsafe fn from_ptr(p: *const libc::c_char) -> &CStr {
        // either
        std::mem::transmute::<*const libc::c_char, &CStr>(p)
        // or
        std::ptr::make_fat_ptr(p as *const (), ())
    }
}
Contributor

ubsan commented Mar 2, 2016

@dgrunwald Ah, well, this RFC does support that.

struct CStr([libc::c_char]);
impl Dst for CStr {
    type Meta = ();
}
impl Sizeable for CStr {
    fn size_of_val(&self) -> usize {
        libc::strlen(self.as_ptr())
    }
}
impl CStr {
    pub unsafe fn from_ptr(p: *const libc::c_char) -> &CStr {
        // either
        std::mem::transmute::<*const libc::c_char, &CStr>(p)
        // or
        std::ptr::make_fat_ptr(p as *const (), ())
    }
}
@Aatch

This comment has been minimized.

Show comment
Hide comment
@Aatch

Aatch Mar 3, 2016

Contributor

I'm not sure about this. My main concern is around how actually useful this is compared to the complexity. There is a lot of code that assumes fat pointers are two words and two words only. This would completely change that. It's a very large increase in complexity.

Another issue is how this interacts with generics and unsizing. Currently Foo<[T; N]> will coerce into Foo<[T]> and everything will work properly because we know how to do that. There's also the case of people trying to use a generic Meta type, it's unclear how that might work.

Right now, my thought is that it's a nice feature (like most that get proposed), but not worth the increased complexity.

Contributor

Aatch commented Mar 3, 2016

I'm not sure about this. My main concern is around how actually useful this is compared to the complexity. There is a lot of code that assumes fat pointers are two words and two words only. This would completely change that. It's a very large increase in complexity.

Another issue is how this interacts with generics and unsizing. Currently Foo<[T; N]> will coerce into Foo<[T]> and everything will work properly because we know how to do that. There's also the case of people trying to use a generic Meta type, it's unclear how that might work.

Right now, my thought is that it's a nice feature (like most that get proposed), but not worth the increased complexity.

@ubsan

This comment has been minimized.

Show comment
Hide comment
@ubsan

ubsan Mar 3, 2016

Contributor

@Aatch We need something like this. Look at the documentation for CStr:

Note: This operation is intended to be a 0-cost cast but it is currently implemented with an up-front calculation of the length of the string.

This is just in the standard library. Out there in real code, we have to deal with C interop, where stuff like

typedef struct _TOKEN_GROUPS {
  DWORD              GroupCount;
  SID_AND_ATTRIBUTES Groups[ANYSIZE_ARRAY];
} TOKEN_GROUPS, *PTOKEN_GROUPS;

has to be translated into

struct TokenGroups {
    group_count: u32,
    groups: [SID_AND_ATTRIBUTES; 0],
}

And traits like Index and IndexMut only return references, so what if you have a 3D world which wants to return 2D slices into the world? You can't. You just can't. You have to define getters which return a

struct 2DSlice<'a, T> {
    width: usize,
    height: usize,
    ptr: *const T,
    lifetime: PhantomData<&'a [T]>,
}
struct 2DSliceMut<'a, T> {
    width: usize,
    height: usize,
    ptr: *mut T,
    lifetime: PhantomData<&'a mut [T]>,
}
Contributor

ubsan commented Mar 3, 2016

@Aatch We need something like this. Look at the documentation for CStr:

Note: This operation is intended to be a 0-cost cast but it is currently implemented with an up-front calculation of the length of the string.

This is just in the standard library. Out there in real code, we have to deal with C interop, where stuff like

typedef struct _TOKEN_GROUPS {
  DWORD              GroupCount;
  SID_AND_ATTRIBUTES Groups[ANYSIZE_ARRAY];
} TOKEN_GROUPS, *PTOKEN_GROUPS;

has to be translated into

struct TokenGroups {
    group_count: u32,
    groups: [SID_AND_ATTRIBUTES; 0],
}

And traits like Index and IndexMut only return references, so what if you have a 3D world which wants to return 2D slices into the world? You can't. You just can't. You have to define getters which return a

struct 2DSlice<'a, T> {
    width: usize,
    height: usize,
    ptr: *const T,
    lifetime: PhantomData<&'a [T]>,
}
struct 2DSliceMut<'a, T> {
    width: usize,
    height: usize,
    ptr: *mut T,
    lifetime: PhantomData<&'a mut [T]>,
}
@ubsan

This comment has been minimized.

Show comment
Hide comment
@ubsan

ubsan Mar 3, 2016

Contributor

@Aatch Unsizing isn't touched in this RFC. We really need integer generics to start doing stuff with unsizing. However, once we do get integer generics, it'll be quite easy to add that to this and we can have unsizing for everybody :)

Contributor

ubsan commented Mar 3, 2016

@Aatch Unsizing isn't touched in this RFC. We really need integer generics to start doing stuff with unsizing. However, once we do get integer generics, it'll be quite easy to add that to this and we can have unsizing for everybody :)

@ubsan

This comment has been minimized.

Show comment
Hide comment
@ubsan

ubsan Mar 3, 2016

Contributor

@Aatch A generic Meta type isn't a problem, because of this error: "the type parameter T is not constrained by the impl trait, self type, or predicates".

Contributor

ubsan commented Mar 3, 2016

@Aatch A generic Meta type isn't a problem, because of this error: "the type parameter T is not constrained by the impl trait, self type, or predicates".

@reem

This comment has been minimized.

Show comment
Hide comment
@reem

reem Mar 3, 2016

@ubsan struct Whatever<T> { inner: Vec<T>> } impl<T> Whatever<T> Dst { type Meta = T; }?

reem commented Mar 3, 2016

@ubsan struct Whatever<T> { inner: Vec<T>> } impl<T> Whatever<T> Dst { type Meta = T; }?

@ubsan

This comment has been minimized.

Show comment
Hide comment
@ubsan

ubsan Mar 3, 2016

Contributor

@reem 1) That's not an unsized type, 2) that wouldn't actually be a problem because you know what Meta is (as long as Whatever<T: Copy>)

Contributor

ubsan commented Mar 3, 2016

@reem 1) That's not an unsized type, 2) that wouldn't actually be a problem because you know what Meta is (as long as Whatever<T: Copy>)

@bluss

This comment has been minimized.

Show comment
Hide comment
@bluss

bluss Mar 3, 2016

@japaric Didn't you work on user-defined DSTs before? Do you have a link to your old effort?

bluss commented Mar 3, 2016

@japaric Didn't you work on user-defined DSTs before? Do you have a link to your old effort?

@solson

This comment has been minimized.

Show comment
Hide comment
@solson

solson Mar 3, 2016

Bikeshed: size_of_prelude could be called min_size_of or a variation of that.

solson commented on ca35a3e Mar 3, 2016

Bikeshed: size_of_prelude could be called min_size_of or a variation of that.

@thepowersgang

This comment has been minimized.

Show comment
Hide comment
@thepowersgang

thepowersgang Mar 3, 2016

Contributor

How would indexing the inner [T] of a custom DST be handled? It doesn't have the element count metadata from a normal slice. Would it be an unsafe operation? Or would the normal impl [T] methods be removed/hidden by the compiler when the slice is accessed via a custom DST pointer.

Idea: Add fn element_count(&self) -> usize to the Dst trait

Contributor

thepowersgang commented Mar 3, 2016

How would indexing the inner [T] of a custom DST be handled? It doesn't have the element count metadata from a normal slice. Would it be an unsafe operation? Or would the normal impl [T] methods be removed/hidden by the compiler when the slice is accessed via a custom DST pointer.

Idea: Add fn element_count(&self) -> usize to the Dst trait

@ubsan

This comment has been minimized.

Show comment
Hide comment
@ubsan

ubsan Mar 3, 2016

Contributor

@thepowersgang It's an unsafe operation. The inner [T] isn't "real"; it's only there to get the base pointer of the unsized data.

Contributor

ubsan commented Mar 3, 2016

@thepowersgang It's an unsafe operation. The inner [T] isn't "real"; it's only there to get the base pointer of the unsized data.

@eddyb

This comment has been minimized.

Show comment
Hide comment
@eddyb

eddyb Mar 3, 2016

Member

I originally proposed using [T; 0] for anything slice-like because it has the right alignment and can serve as a base pointer.

@Aatch Fat pointers being hardcoded to one pointer-sized piece of metadata is a mistake. There are important extensions to slice and traits we need to support and that hardcoding has to go (which is easier with the MIR).
cc @nikomatsakis about his desire to have TraitA+TraitB work as a trait object with 2 vtable pointers.

Member

eddyb commented Mar 3, 2016

I originally proposed using [T; 0] for anything slice-like because it has the right alignment and can serve as a base pointer.

@Aatch Fat pointers being hardcoded to one pointer-sized piece of metadata is a mistake. There are important extensions to slice and traits we need to support and that hardcoding has to go (which is easier with the MIR).
cc @nikomatsakis about his desire to have TraitA+TraitB work as a trait object with 2 vtable pointers.

@ubsan

This comment has been minimized.

Show comment
Hide comment
@ubsan

ubsan Mar 3, 2016

Contributor

I liked [T] because it seemed nicer; this should be up for bikeshedding.

Contributor

ubsan commented Mar 3, 2016

I liked [T] because it seemed nicer; this should be up for bikeshedding.

@japaric

This comment has been minimized.

Show comment
Hide comment
@japaric

japaric Mar 3, 2016

Member

Didn't you work on user-defined DSTs before? Do you have a link to your old effort?

Yes, I was working on it last year. I don't have time right now to analyze/comment on the design proposed in this RFC. So I'm just going to drop a few links here:

Member

japaric commented Mar 3, 2016

Didn't you work on user-defined DSTs before? Do you have a link to your old effort?

Yes, I was working on it last year. I don't have time right now to analyze/comment on the design proposed in this RFC. So I'm just going to drop a few links here:

@Aatch

This comment has been minimized.

Show comment
Hide comment
@Aatch

Aatch Mar 3, 2016

Contributor

@thepowersgang actually brings up a good point, if you can get at the [T], then you'd expect to be able to work with it like a regular [T]. I'd prefer a way of indicating "this is unsized" that doesn't use [T] in that case.

Either use a [T;0] or something else that isn't already a DST. I don't think this really needs a marker for the "unsized portion" anyway, since a more trait-like DST would probably want to use (). The trait itself should be enough. It might be possible to treat an implementation of Dst as an implied impl !Sized, which would handle most of the issues I can think of.

Contributor

Aatch commented Mar 3, 2016

@thepowersgang actually brings up a good point, if you can get at the [T], then you'd expect to be able to work with it like a regular [T]. I'd prefer a way of indicating "this is unsized" that doesn't use [T] in that case.

Either use a [T;0] or something else that isn't already a DST. I don't think this really needs a marker for the "unsized portion" anyway, since a more trait-like DST would probably want to use (). The trait itself should be enough. It might be possible to treat an implementation of Dst as an implied impl !Sized, which would handle most of the issues I can think of.

@ubsan

This comment has been minimized.

Show comment
Hide comment
@ubsan

ubsan Mar 3, 2016

Contributor

Hmm, yeah, I do think [T; 0] is better now that it's been discussed.

Contributor

ubsan commented Mar 3, 2016

Hmm, yeah, I do think [T; 0] is better now that it's been discussed.

@ubsan

This comment has been minimized.

Show comment
Hide comment
@ubsan

ubsan Mar 3, 2016

Contributor

Although, I'm thinking about how it would work with integer generics and Unsizing coercions... I think that [T] is the best way, because we would have some sort of way of saying "This PascalStr is a Sized PascalStr (which can be unsized)", or whatever, which would look something like

struct PascalStr {
    len: usize,
    buf: [u8],
}
// &PascalStr == ptr -> { len, [buf; len] }
// &PascalStr<N> == ptr -> { N, [buf; N] }
// PascalStr<N> == { N, [buf; N] }
Contributor

ubsan commented Mar 3, 2016

Although, I'm thinking about how it would work with integer generics and Unsizing coercions... I think that [T] is the best way, because we would have some sort of way of saying "This PascalStr is a Sized PascalStr (which can be unsized)", or whatever, which would look something like

struct PascalStr {
    len: usize,
    buf: [u8],
}
// &PascalStr == ptr -> { len, [buf; len] }
// &PascalStr<N> == ptr -> { N, [buf; N] }
// PascalStr<N> == { N, [buf; N] }
@Aatch

This comment has been minimized.

Show comment
Hide comment
@Aatch

Aatch Mar 3, 2016

Contributor

@ubsan eh, I'm not sure that's really worth it here. My main concern is that the compiler now has to be aware of a lot more context when handling field access. The main advantage of [T; 0] (or a semantically-similar marker type) is that you can't safely access any data through a pointer to it, but it has all the relevant properties of T. I'm not sure I like the idea of &foo.bar being invalid just because foo implements Dst.

Unsizing is probably better handed via separate types. This is already the case, [T; N] might look similar to [T], but they're different types completely. It's more obvious when you consider trait object unsizing. I think the same is going to be true here, if you want unsizing, you're gonna have to have a separate Sized type like everybody else.

Contributor

Aatch commented Mar 3, 2016

@ubsan eh, I'm not sure that's really worth it here. My main concern is that the compiler now has to be aware of a lot more context when handling field access. The main advantage of [T; 0] (or a semantically-similar marker type) is that you can't safely access any data through a pointer to it, but it has all the relevant properties of T. I'm not sure I like the idea of &foo.bar being invalid just because foo implements Dst.

Unsizing is probably better handed via separate types. This is already the case, [T; N] might look similar to [T], but they're different types completely. It's more obvious when you consider trait object unsizing. I think the same is going to be true here, if you want unsizing, you're gonna have to have a separate Sized type like everybody else.

@ubsan

This comment has been minimized.

Show comment
Hide comment
@ubsan

ubsan Mar 3, 2016

Contributor

@Aatch but they're not completely different types. You need them to be exactly the same type, in fact, behind the pointer, for unsizing coercions to work.

Contributor

ubsan commented Mar 3, 2016

@Aatch but they're not completely different types. You need them to be exactly the same type, in fact, behind the pointer, for unsizing coercions to work.

@Aatch

This comment has been minimized.

Show comment
Hide comment
@Aatch

Aatch Mar 3, 2016

Contributor

@ubsan no, they're the same representation, not the same type.

Contributor

Aatch commented Mar 3, 2016

@ubsan no, they're the same representation, not the same type.

@eddyb

This comment has been minimized.

Show comment
Hide comment
@eddyb

eddyb Mar 3, 2016

Member
struct PascalStrBuffer<B> {
    len: usize,
    buf: B
}
type PascalStrArray<const N: usize> = PascalStrBuffer<[u8; N]>;
type PascalStr = PascalStrBuffer<[u8]>;

This might be an okay compromise of sorts.

I've also considered having a single optional integer parameter, which denotes the minimal contained size and defaults to 0 (which would mean [u8; 0], i.e. "any size").
The problem with that, though, is you lose the sized aspect for which you wanted the flexibility in the first place, so it's probably pointless.

Member

eddyb commented Mar 3, 2016

struct PascalStrBuffer<B> {
    len: usize,
    buf: B
}
type PascalStrArray<const N: usize> = PascalStrBuffer<[u8; N]>;
type PascalStr = PascalStrBuffer<[u8]>;

This might be an okay compromise of sorts.

I've also considered having a single optional integer parameter, which denotes the minimal contained size and defaults to 0 (which would mean [u8; 0], i.e. "any size").
The problem with that, though, is you lose the sized aspect for which you wanted the flexibility in the first place, so it's probably pointless.

@ubsan

This comment has been minimized.

Show comment
Hide comment
@ubsan

ubsan Mar 3, 2016

Contributor

@Aatch fine; you can argue whether they're the same type or not. They are the same representation, definitely, and if you had to write separate Sized versions of each type, imagine the really big types:

#[repr(C)]
struct CLAIM_SECURITY_ATTRIBUTE_RELATIVE_V1 {
    Name: DWORD,
    ValueType: WORD,
    Reserved: WORD,
    Flags: DWORD,
    ValueCount: DWORD,
    union Values {
        pInt64: [DWORD],
        pUint64: [DWORD],
        ppString: [DWORD],
        pFqbn: [DWORD],
        pOctetString: [DWORD],
    }
}

imagine writing two of these types, and now for each of the Windows flexible array structs. It gets ridiculous.

Contributor

ubsan commented Mar 3, 2016

@Aatch fine; you can argue whether they're the same type or not. They are the same representation, definitely, and if you had to write separate Sized versions of each type, imagine the really big types:

#[repr(C)]
struct CLAIM_SECURITY_ATTRIBUTE_RELATIVE_V1 {
    Name: DWORD,
    ValueType: WORD,
    Reserved: WORD,
    Flags: DWORD,
    ValueCount: DWORD,
    union Values {
        pInt64: [DWORD],
        pUint64: [DWORD],
        ppString: [DWORD],
        pFqbn: [DWORD],
        pOctetString: [DWORD],
    }
}

imagine writing two of these types, and now for each of the Windows flexible array structs. It gets ridiculous.

@eddyb

This comment has been minimized.

Show comment
Hide comment
@eddyb

eddyb Mar 3, 2016

Member

It would be almost impossible to ensure the same layout for two different definitions though.
And I'm not sure how well #[repr(C)] would work with the DST stuff, although I reckon that it seems useful to have C interop defined for non-trait DSTs anyway.

Member

eddyb commented Mar 3, 2016

It would be almost impossible to ensure the same layout for two different definitions though.
And I'm not sure how well #[repr(C)] would work with the DST stuff, although I reckon that it seems useful to have C interop defined for non-trait DSTs anyway.

@Aatch

This comment has been minimized.

Show comment
Hide comment
@Aatch

Aatch Mar 3, 2016

Contributor

In that case how about we restrict this to slice-like DSTs and use @thepowersgang's suggestion of having an element_count method?

Contributor

Aatch commented Mar 3, 2016

In that case how about we restrict this to slice-like DSTs and use @thepowersgang's suggestion of having an element_count method?

@ubsan

This comment has been minimized.

Show comment
Hide comment
@ubsan

ubsan Mar 3, 2016

Contributor

@Aatch that seems overly restrictive. For example, if you have a

struct 2dSlice<T>([T]);
impl<T> Dst for 2dSlice<T> {
    type Meta = (usize, usize);
    fn element_count(&self) -> usize {
         // (meta.0 * meta.1)
    }
}
impl<T> Sizeable for 2dSlice<T> {...}

fn main() { 
    let x = 2dSlice::<(), (std::usize::MAX, std::usize::MAX)>::new(); // not possible with `element_count`
}
Contributor

ubsan commented Mar 3, 2016

@Aatch that seems overly restrictive. For example, if you have a

struct 2dSlice<T>([T]);
impl<T> Dst for 2dSlice<T> {
    type Meta = (usize, usize);
    fn element_count(&self) -> usize {
         // (meta.0 * meta.1)
    }
}
impl<T> Sizeable for 2dSlice<T> {...}

fn main() { 
    let x = 2dSlice::<(), (std::usize::MAX, std::usize::MAX)>::new(); // not possible with `element_count`
}
@Aatch

This comment has been minimized.

Show comment
Hide comment
@Aatch

Aatch Mar 3, 2016

Contributor

@ubsan that's not possible anyway! You can't have something that large. And it's not like the same reasoning doesn't apply to the size_of_val case.

Oh, and something I just thought of: destructors/dropping. It's all well and good having something that only contains Copy` types, but what if you have something non-Copy in your custom DST?

Contributor

Aatch commented Mar 3, 2016

@ubsan that's not possible anyway! You can't have something that large. And it's not like the same reasoning doesn't apply to the size_of_val case.

Oh, and something I just thought of: destructors/dropping. It's all well and good having something that only contains Copy` types, but what if you have something non-Copy in your custom DST?

@ubsan

This comment has been minimized.

Show comment
Hide comment
@ubsan

ubsan Mar 3, 2016

Contributor

@Aatch yes you can :) look at the type inside!

You only have a pointer to the values, and aren't the owner. You could impl Drop to drop the inside values, and that would work for something like Box.

Contributor

ubsan commented Mar 3, 2016

@Aatch yes you can :) look at the type inside!

You only have a pointer to the values, and aren't the owner. You could impl Drop to drop the inside values, and that would work for something like Box.

@ubsan

This comment has been minimized.

Show comment
Hide comment
@ubsan

ubsan Dec 21, 2016

Contributor

ping @nikomatsakis

status?

Contributor

ubsan commented Dec 21, 2016

ping @nikomatsakis

status?

@nikomatsakis

This comment has been minimized.

Show comment
Hide comment
@nikomatsakis

nikomatsakis Jan 11, 2017

Contributor

On priorities

So @ubsan and I had a good chat about this RFC the other day. Let me lay out my personal perspective. I think that the core ideas of this RFC are really solid. I also think this RFC pretty clearly doesn't fit well into the roadmap priorities. What I would like to see happen is something where we can accept this RFC on a highly provisional basis -- basically, for experimentation in tree, since there is no way to experiment on this out of tree. We don't really have a good process for experimentation right now. We have accepted some RFCs in this kind of "very tentative" fashion (e.g., naked fns), but it never feels like a 'fully proper' thing to me. I have some thoughts on this point in particular (trying to adjust the process to make a different pathway for "explicitly experimental"), but I think this PR isn't the proper place for them. I'll try to write up something to internals. So, let's presume we work that out, and decide to accept this RFC on some kind of provisional basis. (I'm honestly not sure what the rest of @rust-lang/lang thinks about this whole obsession of mine.)

One thing I worry about is that even reviewing and landing PRs related to this RFC might be a lot of work. The changes to the compiler will cause churn, merge conficts, and accidental regressions, all of which take a toll. But I'm not sure how much in practice.

On the RFC itself

But speaking more to the RFC itself: fundamentally, I think that the motivation is pretty strong. As @ubsan pointed out to me, the Deref and Index traits bake in references into the trait itself. In some ways, this seems like a good thing to me. It ensures that we can syntactically recognize lvalues like *foo and foo[10] -- in contrast, if those sometimes generated wholly new values that had ownership (e.g. IndexGet, DerefGet), we'd have to introduce temporaries to take ownership, which would affect the lifetimes, and you'd only know this by evaluating the various traits. So, I find the idea of custom DST to be an appealing middle road here. It preserves the fact that Deref/Index always returns a borrow, while allowing for enough flexibility to enable a lot of useful use cases that don't work today:

  • subslices of multidimensional arrays;
  • modeling C and Pascal strings;
  • are there others? I feel confident people will come up with some. =)

The RFC's design itself seems about right. Whatever we do, it's going to look something like this: We need a way to declare an unsized type that combines "data" and "meta" parts. The meta part must be copy so that &T: Copy continues to hold. We need some intrinsics or lang items or something for breaking apart and reassembling fat pointers. I see three overall areas of uncertainty here:

  1. How hard will this be to implement? Will that reveal horrible flaws?
  2. Will we be able to write matrix libraries? Model CStr? Perhaps do some other clever things?
  3. What syntax should we use (more on this later).

The first two seem like things that will be hard to answer without implementing this and giving it a spin. Given that this is not really part of the roadmap, I think this would have to be something that is done done by motivated people in the community. It's a good way to learn about the Rust compiler. The role of the teams would ideally be a mentorship role. @nrc can tell you how awesome DST is as a way to learn everything you might want to know about trans. :P

I also expect that the RFC will likely change shape over the course of implementation. That's fine and expected.

That said, there are some particulars I'd still like to change in the RFC. To some extent these are "surface" syntactic issues, but I think they also get at the mental model people should have. I think we should move away from an Unsized or !Sized trait, which seems to pertain only to unsized types, and instead think of a Referent trait, which pertains to all types. It defines what it means for a type to be a reference. This would also move away from the idea of "fat and thin" pointers, and instead say that all references have a "data" and a "meta" part. But for "thin" pointers (i.e., reference to sized types) the Meta is always ().

Here is the trait I propose, roughly. You will notice a certain resemblance to the !Sized trait:

pub trait Referent {
    type Data;
    type Meta: Copy;

    /// Compute size/alignment of the data behind a reference.
    fn size_of_val(&self) -> usize; 
    fn align_of_val(&self) -> usize;
    
    /// Make a reference from its constituent parts.
    fn assemble(data: *const Self::Data, meta: Self::Meta) -> *const Self;

    /// Break a reference down into its constituent parts.
    fn disassemble(data: *const Self) -> (*const Self, Self::Meta);

    /// Convenience functions.
    fn data(r: &Self) -> &Self::Data { /* defined in terms of disassemble */ }
    fn meta(r: &Self) -> Self::Meta { let (_, meta) = Self::disassemble(r); meta }
}

This trait would be deeply built into the compiler, so it wouldn't be something you can implement by hand. For all existing types, the compiler would provide the implementation. For illustrative purposes, you might imagine that the impl for all sized types would look something like this:

impl<T> Referent for T {
    type Data = T;
    type Meta: Copy = ();

    fn size_of_val(&self) -> usize { size_of::<T>() }
    fn align_of_val(&self) -> usize { align_of::<T>() }
}

Ok, so how do we declare custom DST? The current RFC sort of "puns", using a single struct to represent both the unsized type and a prefix of the data that it points at, and pairing that with an impl:

// Current RFC
struct Pixels([f32; 0]);
struct CStr([u8; 0]);
impl !Sized for CStr { ... }

The unfortunate part of this pun is that it means that a raw pointer to the unsized type is a fat pointer. So, e.g., *mut PascalStr would be a fat pointer (admittedly, one with a () metadata). We now have no good way to refer to just the data pointer, devoid of its metadata. The only way we can do it is with a *const (), effectively.

In contrast, if we declare the unsized type explicitly, and then connect it to a separate data/meta type, we have a name for each constituent part. Here I'm going to go wild using some syntax. The idea is that you declare an unsized type and, together with it, you declare some parts of the Referent trait:

/// Declare an unsized type and some of its parameters.
/// All of these members are required.
unsized type Pixels {
    type Data = f32; // data pointer is a `*const f32`
    type Meta = [usize; 2]; // meta is two `usize` values
    fn size_of_val(&self) -> usize { .. }
    fn align_of_val(&self) -> usize { .. }
}

Here, the Data field basically tells us what type the data has. It is really a convenience for users, to help them write more type-safe code. It might describe some prefix of the data, or, as with Pixels, it describes the type of a repeating segment (but not the bounds).

Concerns

Finally, some of my concerns:

  • If we accept this, and implement it, will it just linger in a quasi-supported, perma-unstable state? (This applies equally well to many other experiments, e.g. naked fns)
    • A counter-pressure here might come from libraries like ndarray, if they decided they could make good use of this feature.
  • What about unsize coercions? How do they fit in here? Perhaps they just come naturally from deref coercions?
  • Does all the complexity and mechanism here pay for itself? I don't think you can really argue that this makes language simpler, even if it does expose some previously baked-in parts to customization.
  • Even reviewing and landing PRs related to this RFC might be a lot of work, which might delay other things (as I mentioned earlier).
  • We might find we want some more general descriptive power. This RFC cannot (by itself) describe thin trait objects, for example, they would need some reflective mechanism. That's probably an orthogonal concern.
Contributor

nikomatsakis commented Jan 11, 2017

On priorities

So @ubsan and I had a good chat about this RFC the other day. Let me lay out my personal perspective. I think that the core ideas of this RFC are really solid. I also think this RFC pretty clearly doesn't fit well into the roadmap priorities. What I would like to see happen is something where we can accept this RFC on a highly provisional basis -- basically, for experimentation in tree, since there is no way to experiment on this out of tree. We don't really have a good process for experimentation right now. We have accepted some RFCs in this kind of "very tentative" fashion (e.g., naked fns), but it never feels like a 'fully proper' thing to me. I have some thoughts on this point in particular (trying to adjust the process to make a different pathway for "explicitly experimental"), but I think this PR isn't the proper place for them. I'll try to write up something to internals. So, let's presume we work that out, and decide to accept this RFC on some kind of provisional basis. (I'm honestly not sure what the rest of @rust-lang/lang thinks about this whole obsession of mine.)

One thing I worry about is that even reviewing and landing PRs related to this RFC might be a lot of work. The changes to the compiler will cause churn, merge conficts, and accidental regressions, all of which take a toll. But I'm not sure how much in practice.

On the RFC itself

But speaking more to the RFC itself: fundamentally, I think that the motivation is pretty strong. As @ubsan pointed out to me, the Deref and Index traits bake in references into the trait itself. In some ways, this seems like a good thing to me. It ensures that we can syntactically recognize lvalues like *foo and foo[10] -- in contrast, if those sometimes generated wholly new values that had ownership (e.g. IndexGet, DerefGet), we'd have to introduce temporaries to take ownership, which would affect the lifetimes, and you'd only know this by evaluating the various traits. So, I find the idea of custom DST to be an appealing middle road here. It preserves the fact that Deref/Index always returns a borrow, while allowing for enough flexibility to enable a lot of useful use cases that don't work today:

  • subslices of multidimensional arrays;
  • modeling C and Pascal strings;
  • are there others? I feel confident people will come up with some. =)

The RFC's design itself seems about right. Whatever we do, it's going to look something like this: We need a way to declare an unsized type that combines "data" and "meta" parts. The meta part must be copy so that &T: Copy continues to hold. We need some intrinsics or lang items or something for breaking apart and reassembling fat pointers. I see three overall areas of uncertainty here:

  1. How hard will this be to implement? Will that reveal horrible flaws?
  2. Will we be able to write matrix libraries? Model CStr? Perhaps do some other clever things?
  3. What syntax should we use (more on this later).

The first two seem like things that will be hard to answer without implementing this and giving it a spin. Given that this is not really part of the roadmap, I think this would have to be something that is done done by motivated people in the community. It's a good way to learn about the Rust compiler. The role of the teams would ideally be a mentorship role. @nrc can tell you how awesome DST is as a way to learn everything you might want to know about trans. :P

I also expect that the RFC will likely change shape over the course of implementation. That's fine and expected.

That said, there are some particulars I'd still like to change in the RFC. To some extent these are "surface" syntactic issues, but I think they also get at the mental model people should have. I think we should move away from an Unsized or !Sized trait, which seems to pertain only to unsized types, and instead think of a Referent trait, which pertains to all types. It defines what it means for a type to be a reference. This would also move away from the idea of "fat and thin" pointers, and instead say that all references have a "data" and a "meta" part. But for "thin" pointers (i.e., reference to sized types) the Meta is always ().

Here is the trait I propose, roughly. You will notice a certain resemblance to the !Sized trait:

pub trait Referent {
    type Data;
    type Meta: Copy;

    /// Compute size/alignment of the data behind a reference.
    fn size_of_val(&self) -> usize; 
    fn align_of_val(&self) -> usize;
    
    /// Make a reference from its constituent parts.
    fn assemble(data: *const Self::Data, meta: Self::Meta) -> *const Self;

    /// Break a reference down into its constituent parts.
    fn disassemble(data: *const Self) -> (*const Self, Self::Meta);

    /// Convenience functions.
    fn data(r: &Self) -> &Self::Data { /* defined in terms of disassemble */ }
    fn meta(r: &Self) -> Self::Meta { let (_, meta) = Self::disassemble(r); meta }
}

This trait would be deeply built into the compiler, so it wouldn't be something you can implement by hand. For all existing types, the compiler would provide the implementation. For illustrative purposes, you might imagine that the impl for all sized types would look something like this:

impl<T> Referent for T {
    type Data = T;
    type Meta: Copy = ();

    fn size_of_val(&self) -> usize { size_of::<T>() }
    fn align_of_val(&self) -> usize { align_of::<T>() }
}

Ok, so how do we declare custom DST? The current RFC sort of "puns", using a single struct to represent both the unsized type and a prefix of the data that it points at, and pairing that with an impl:

// Current RFC
struct Pixels([f32; 0]);
struct CStr([u8; 0]);
impl !Sized for CStr { ... }

The unfortunate part of this pun is that it means that a raw pointer to the unsized type is a fat pointer. So, e.g., *mut PascalStr would be a fat pointer (admittedly, one with a () metadata). We now have no good way to refer to just the data pointer, devoid of its metadata. The only way we can do it is with a *const (), effectively.

In contrast, if we declare the unsized type explicitly, and then connect it to a separate data/meta type, we have a name for each constituent part. Here I'm going to go wild using some syntax. The idea is that you declare an unsized type and, together with it, you declare some parts of the Referent trait:

/// Declare an unsized type and some of its parameters.
/// All of these members are required.
unsized type Pixels {
    type Data = f32; // data pointer is a `*const f32`
    type Meta = [usize; 2]; // meta is two `usize` values
    fn size_of_val(&self) -> usize { .. }
    fn align_of_val(&self) -> usize { .. }
}

Here, the Data field basically tells us what type the data has. It is really a convenience for users, to help them write more type-safe code. It might describe some prefix of the data, or, as with Pixels, it describes the type of a repeating segment (but not the bounds).

Concerns

Finally, some of my concerns:

  • If we accept this, and implement it, will it just linger in a quasi-supported, perma-unstable state? (This applies equally well to many other experiments, e.g. naked fns)
    • A counter-pressure here might come from libraries like ndarray, if they decided they could make good use of this feature.
  • What about unsize coercions? How do they fit in here? Perhaps they just come naturally from deref coercions?
  • Does all the complexity and mechanism here pay for itself? I don't think you can really argue that this makes language simpler, even if it does expose some previously baked-in parts to customization.
  • Even reviewing and landing PRs related to this RFC might be a lot of work, which might delay other things (as I mentioned earlier).
  • We might find we want some more general descriptive power. This RFC cannot (by itself) describe thin trait objects, for example, they would need some reflective mechanism. That's probably an orthogonal concern.
@Ericson2314

This comment has been minimized.

Show comment
Hide comment
@Ericson2314

Ericson2314 Jan 18, 2017

Contributor

Cross-posting from https://internals.rust-lang.org/t/recent-change-to-make-exhaustiveness-and-uninhabited-types-play-nicer-together/4602/56?u=ericson2314

Earlier in this thread I mentioned a DynSized trait to be used in a Sized: DynSized: Referent hierarchy. The point of this is some DSTs have unknown size at runtime too. The classic example of this are:

  1. Opaque structs in C. These are, quite operational, unknown types of unknown size.

  2. The void in C's void*. This is not the void for zero arguments or an () return! This is in fact exactly like the above, except for C's coercion rules, which make it act like a supertype of many types.

  3. Functions themselves (see #883). Then we can use Option<&'static fn-itself-type>, *const fun-itself type, and &'a fn-itself-type for weak linking and run-time (not init-time), dynamic linking, and JIT compilers.

  4. CStr, if we consider the implicit O(n) size computation a poor fit for a systems language.

Contributor

Ericson2314 commented Jan 18, 2017

Cross-posting from https://internals.rust-lang.org/t/recent-change-to-make-exhaustiveness-and-uninhabited-types-play-nicer-together/4602/56?u=ericson2314

Earlier in this thread I mentioned a DynSized trait to be used in a Sized: DynSized: Referent hierarchy. The point of this is some DSTs have unknown size at runtime too. The classic example of this are:

  1. Opaque structs in C. These are, quite operational, unknown types of unknown size.

  2. The void in C's void*. This is not the void for zero arguments or an () return! This is in fact exactly like the above, except for C's coercion rules, which make it act like a supertype of many types.

  3. Functions themselves (see #883). Then we can use Option<&'static fn-itself-type>, *const fun-itself type, and &'a fn-itself-type for weak linking and run-time (not init-time), dynamic linking, and JIT compilers.

  4. CStr, if we consider the implicit O(n) size computation a poor fit for a systems language.

@Ericson2314 Ericson2314 referenced this pull request Jan 21, 2017

Merged

extern types #1861

@ubsan

This comment has been minimized.

Show comment
Hide comment
@ubsan

ubsan Feb 6, 2017

Contributor

I can no longer keep this open in good faith. Someone else can take it from me, but ... personal issues.

Contributor

ubsan commented Feb 6, 2017

I can no longer keep this open in good faith. Someone else can take it from me, but ... personal issues.

@ubsan ubsan closed this Feb 6, 2017

@AtheMathmo AtheMathmo referenced this pull request Feb 11, 2017

Closed

Range-based indexing #149

@nikomatsakis

This comment has been minimized.

Show comment
Hide comment
@nikomatsakis

nikomatsakis Feb 17, 2017

Contributor

So, for some time now I've been wanting to write a followup to my previous comment sharing my thoughts about this work. Even though @ubsan has closed the RFC, I figured I should leave it anyway. Hopefully it is clear from my previous comments that I think the ideas in here are good in a technical sense, and I'd like to do some experimentation with them. However, it's probably also clear that I feel somewhat hestitant about this. My concerns here are not technical, but rather all about prioritization and motiviation.

Rust is an ambitious language. Making it successul is a big job and there is a lot of work ahead for us. When I ask myself "how would Custom DST help Rust grow over the next year", I don't feel like I hear a convincing answer. The main use case that I see is helping make matrix libraries more ergonomic. Now, admittedly, this is somewhat compelling, but otoh lack of custom dst isn't, I don't think, blocking work on Matrix-like things, it's just a way to make them nicer to use. (I'd be interested to hear otherwise; are there any applications where the presence of custom DST would dramatically change the design in some way that it's not even worth doing it otherwise?)

Earlier I had thoughts, which I mentioned briefly in my comment, about the idea of trying to land this work in a very provisional way, to enable hacking. I am still interested in establishing a process for this. I think there are a number of things that might benefit from having a more "experimentation friendly" process (e.g., naked fns, inline asm, and interrupt ABIs come to mind).

But I haven't gone and publicly worked on defining such a process because I also have concerns: the fact is that any time there is significant churn in the codebase, it has the potential to take a lot of time for core developers who are trying to focus on other things. I think the work around ! has proven this point quite dramatically! By any measure, ! is not a core feature, but working out all the details on it and how it interacts with the unsafe code guidelines, match statements, and so forth has taken and will continue to take quite a lot of time and energy. (I think that much of the work around ! is work we would have to do anyway, as it happens, but perhaps with less urgency and more at a time of our choosing.) And even the most innocent and orthogonal of feature can cause ICEs and regressions which are a big distraction.

(On the other hand, there are some areas where we are adopting a looser process. We did land naked fns in a "provisional state", for example, and that has largely been a non-event. There is some experimentation around the embedded area, particularly with ABIs. But all of these are quite narrow and targeted compared to custom dst, which affects the code that handles every single reference.)

Before @ubsan closed this RFC, I was working up my courage to move to postpone (close) -- but I kept hesitating, because I think it's a good idea and I like it. Yet still I have the feeling that on balance it just isn't a good investment of resources. I regret that hesitating perhaps sends a more frustrating message than actually moving to close. So @ubsan, I'm sorry about that.

Contributor

nikomatsakis commented Feb 17, 2017

So, for some time now I've been wanting to write a followup to my previous comment sharing my thoughts about this work. Even though @ubsan has closed the RFC, I figured I should leave it anyway. Hopefully it is clear from my previous comments that I think the ideas in here are good in a technical sense, and I'd like to do some experimentation with them. However, it's probably also clear that I feel somewhat hestitant about this. My concerns here are not technical, but rather all about prioritization and motiviation.

Rust is an ambitious language. Making it successul is a big job and there is a lot of work ahead for us. When I ask myself "how would Custom DST help Rust grow over the next year", I don't feel like I hear a convincing answer. The main use case that I see is helping make matrix libraries more ergonomic. Now, admittedly, this is somewhat compelling, but otoh lack of custom dst isn't, I don't think, blocking work on Matrix-like things, it's just a way to make them nicer to use. (I'd be interested to hear otherwise; are there any applications where the presence of custom DST would dramatically change the design in some way that it's not even worth doing it otherwise?)

Earlier I had thoughts, which I mentioned briefly in my comment, about the idea of trying to land this work in a very provisional way, to enable hacking. I am still interested in establishing a process for this. I think there are a number of things that might benefit from having a more "experimentation friendly" process (e.g., naked fns, inline asm, and interrupt ABIs come to mind).

But I haven't gone and publicly worked on defining such a process because I also have concerns: the fact is that any time there is significant churn in the codebase, it has the potential to take a lot of time for core developers who are trying to focus on other things. I think the work around ! has proven this point quite dramatically! By any measure, ! is not a core feature, but working out all the details on it and how it interacts with the unsafe code guidelines, match statements, and so forth has taken and will continue to take quite a lot of time and energy. (I think that much of the work around ! is work we would have to do anyway, as it happens, but perhaps with less urgency and more at a time of our choosing.) And even the most innocent and orthogonal of feature can cause ICEs and regressions which are a big distraction.

(On the other hand, there are some areas where we are adopting a looser process. We did land naked fns in a "provisional state", for example, and that has largely been a non-event. There is some experimentation around the embedded area, particularly with ABIs. But all of these are quite narrow and targeted compared to custom dst, which affects the code that handles every single reference.)

Before @ubsan closed this RFC, I was working up my courage to move to postpone (close) -- but I kept hesitating, because I think it's a good idea and I like it. Yet still I have the feeling that on balance it just isn't a good investment of resources. I regret that hesitating perhaps sends a more frustrating message than actually moving to close. So @ubsan, I'm sorry about that.

@AtheMathmo

This comment has been minimized.

Show comment
Hide comment
@AtheMathmo

AtheMathmo Feb 17, 2017

@nikomatsakis Thank you for your comment.

I agree with what you're saying - there are more pressing things to work on right now. I also wanted to comment on the following:

Now, admittedly, this is somewhat compelling, but otoh lack of custom dst isn't, I don't think, blocking work on Matrix-like things

I think this is an accurate statement. There are a few things that I cannot do without custom DSTs, some of which don't even have ugly workarounds, but these are only mechanisms to tidy up the code and make things easier for users. Here are some example issues that we had to put off:

AtheMathmo/rulinalg#149 (I think custom DSTs help this)
AtheMathmo/rulinalg#14

It would be really nice to parallel the Vec<T>-&[T] relationship in the core library but Matrix-like things are still workable without this (to me).

AtheMathmo commented Feb 17, 2017

@nikomatsakis Thank you for your comment.

I agree with what you're saying - there are more pressing things to work on right now. I also wanted to comment on the following:

Now, admittedly, this is somewhat compelling, but otoh lack of custom dst isn't, I don't think, blocking work on Matrix-like things

I think this is an accurate statement. There are a few things that I cannot do without custom DSTs, some of which don't even have ugly workarounds, but these are only mechanisms to tidy up the code and make things easier for users. Here are some example issues that we had to put off:

AtheMathmo/rulinalg#149 (I think custom DSTs help this)
AtheMathmo/rulinalg#14

It would be really nice to parallel the Vec<T>-&[T] relationship in the core library but Matrix-like things are still workable without this (to me).

@SSheldon SSheldon referenced this pull request Feb 21, 2017

Closed

RFC: truly unsized types #813

@nikomatsakis

This comment has been minimized.

Show comment
Hide comment
@nikomatsakis

nikomatsakis Feb 21, 2017

Contributor

@AtheMathmo interestingly, I was planning to come and post a comment here anyhow just to mention I myself ran into a use-case for this just yesterday when hacking on my NLL prototype, specifically the bitset code. Really it's just the "2-D matrix use case" in disguise, but in trying to play with it, I did appreciate that there are things that are hard to do "just with methods". Though maybe I didn't find the best answer.

In particular, I have a BitSet type that supports, for each graph node, a bit set of N bits. I wanted the ability to also create a BitBuf (one row of N bits) and a BitSlice (a view into either BitSet or BitBuf). This all worked ok but it seemed like I had to copy-and-paste various bits of logic and couldn't easily eliminate the redundancies. I remember thinking that if I could have Deref and DerefMut impls I would have wound up with less redundancy. Sorry not very specific. I have to revisit it now and try to re-examine.

In any case, I definitely think that this conversation -- to what extent is this "just sugar" vs unlocking some fundamental capability -- is the critical one to deciding how to prioritize this change.

Contributor

nikomatsakis commented Feb 21, 2017

@AtheMathmo interestingly, I was planning to come and post a comment here anyhow just to mention I myself ran into a use-case for this just yesterday when hacking on my NLL prototype, specifically the bitset code. Really it's just the "2-D matrix use case" in disguise, but in trying to play with it, I did appreciate that there are things that are hard to do "just with methods". Though maybe I didn't find the best answer.

In particular, I have a BitSet type that supports, for each graph node, a bit set of N bits. I wanted the ability to also create a BitBuf (one row of N bits) and a BitSlice (a view into either BitSet or BitBuf). This all worked ok but it seemed like I had to copy-and-paste various bits of logic and couldn't easily eliminate the redundancies. I remember thinking that if I could have Deref and DerefMut impls I would have wound up with less redundancy. Sorry not very specific. I have to revisit it now and try to re-examine.

In any case, I definitely think that this conversation -- to what extent is this "just sugar" vs unlocking some fundamental capability -- is the critical one to deciding how to prioritize this change.

@nikomatsakis

This comment has been minimized.

Show comment
Hide comment
@nikomatsakis

nikomatsakis Feb 21, 2017

Contributor

Another thing that I was thinking about was this question: is this the sort of capability that, when added, would mean that the interface for existing libraries would want to be completely reworked to take advantage of it?

Contributor

nikomatsakis commented Feb 21, 2017

Another thing that I was thinking about was this question: is this the sort of capability that, when added, would mean that the interface for existing libraries would want to be completely reworked to take advantage of it?

@AtheMathmo

This comment has been minimized.

Show comment
Hide comment
@AtheMathmo

AtheMathmo Feb 21, 2017

is this the sort of capability that, when added, would mean that the interface for existing libraries would want to be completely reworked to take advantage of it?

I cannot speak for the others who are looking to make use of this feature, but for me I would be making significant changes.

Right now we have the following types in rulinalg: Matrix, MatrixSlice, MatrixSliceMut. And we have BaseMatrix and BaseMatrixMut traits to let us be generic over these. If we had custom DSTs then we could use Deref and DerefMut and (probably) wouldn't want these traits any more - we just implement everything for the slice types and let Deref do the work.

Another particularly difficult area is with operator overloading. Right now we have an explosion of overloading implementations for all combinations of matrix types. I think that custom DSTs could help here too but I haven't explored the idea too much. (Hoping Deref lets us cover all &MatType op &OtherMatType etc. ops).

I'll try to spend some time thinking about whether there is anything else that I can add to this discussion. I've got quite used to working around the lack of custom DSTs and I don't think there are many features missing for the user. There are however quite a few ways we could make things nicer - not needing to import traits everywhere, being able to have &MatrixSlice return types in their code and with Index etc., and others.

AtheMathmo commented Feb 21, 2017

is this the sort of capability that, when added, would mean that the interface for existing libraries would want to be completely reworked to take advantage of it?

I cannot speak for the others who are looking to make use of this feature, but for me I would be making significant changes.

Right now we have the following types in rulinalg: Matrix, MatrixSlice, MatrixSliceMut. And we have BaseMatrix and BaseMatrixMut traits to let us be generic over these. If we had custom DSTs then we could use Deref and DerefMut and (probably) wouldn't want these traits any more - we just implement everything for the slice types and let Deref do the work.

Another particularly difficult area is with operator overloading. Right now we have an explosion of overloading implementations for all combinations of matrix types. I think that custom DSTs could help here too but I haven't explored the idea too much. (Hoping Deref lets us cover all &MatType op &OtherMatType etc. ops).

I'll try to spend some time thinking about whether there is anything else that I can add to this discussion. I've got quite used to working around the lack of custom DSTs and I don't think there are many features missing for the user. There are however quite a few ways we could make things nicer - not needing to import traits everywhere, being able to have &MatrixSlice return types in their code and with Index etc., and others.

@nikomatsakis

This comment has been minimized.

Show comment
Hide comment
@nikomatsakis

nikomatsakis Feb 22, 2017

Contributor

@AtheMathmo it seems like it would be a fruitful exercise to try and use e.g. the design from my comment and sketch out how the types in your library would work (in contrast to now) and throw it in a gist. (I'd like to see how you envision the operator overloading working, too.) Apologies if you've already done it and I missed it.

(The design from the RFC seems mostly fine too, I don't recall there being any major differences.)

Contributor

nikomatsakis commented Feb 22, 2017

@AtheMathmo it seems like it would be a fruitful exercise to try and use e.g. the design from my comment and sketch out how the types in your library would work (in contrast to now) and throw it in a gist. (I'd like to see how you envision the operator overloading working, too.) Apologies if you've already done it and I missed it.

(The design from the RFC seems mostly fine too, I don't recall there being any major differences.)

@AtheMathmo

This comment has been minimized.

Show comment
Hide comment
@AtheMathmo

AtheMathmo Feb 22, 2017

@nikomatsakis that sounds like a good idea!

I'm a little busy for the rest of this week but hopefully will be able to sketch something out early next week.

AtheMathmo commented Feb 22, 2017

@nikomatsakis that sounds like a good idea!

I'm a little busy for the rest of this week but hopefully will be able to sketch something out early next week.

@nikomatsakis

This comment has been minimized.

Show comment
Hide comment
@nikomatsakis

nikomatsakis Feb 23, 2017

Contributor

@AtheMathmo great, I'd love to see it. Since this RFC is closed, and it feels like we're still a bit in "design discussion" here, I'm going to open a thread on internals to carry on the conversation:

https://internals.rust-lang.org/t/custom-dst-discussion/4842

Contributor

nikomatsakis commented Feb 23, 2017

@AtheMathmo great, I'd love to see it. Since this RFC is closed, and it feels like we're still a bit in "design discussion" here, I'm going to open a thread on internals to carry on the conversation:

https://internals.rust-lang.org/t/custom-dst-discussion/4842

@stjepang

This comment has been minimized.

Show comment
Hide comment
@stjepang

stjepang Jun 17, 2017

Contributor

For some more motivation, here's what I need in Crossbeam:

  1. Heap-allocated Array<T>. The head consists of an usize (length), and the tail is [T].
  2. Heap-allocated SkiplistNode<K, V>. The head consists of (K, V, usize) (key, value, height), and the tail is [AtomicPtr<SkiplistNode<K, V>>].
  3. Other similar situations with heap-allocated objects. The head is some arbitrary data and the length. The tail is an array of that length.

Pointers to such objects must be thin because atomically manipulating multiple words is a pain to do portably and performantly. Some support from the language for such use of DSTs would be great to have.

Contributor

stjepang commented Jun 17, 2017

For some more motivation, here's what I need in Crossbeam:

  1. Heap-allocated Array<T>. The head consists of an usize (length), and the tail is [T].
  2. Heap-allocated SkiplistNode<K, V>. The head consists of (K, V, usize) (key, value, height), and the tail is [AtomicPtr<SkiplistNode<K, V>>].
  3. Other similar situations with heap-allocated objects. The head is some arbitrary data and the length. The tail is an array of that length.

Pointers to such objects must be thin because atomically manipulating multiple words is a pain to do portably and performantly. Some support from the language for such use of DSTs would be great to have.

@SimonSapin

This comment has been minimized.

Show comment
Hide comment
@SimonSapin

SimonSapin Aug 7, 2017

Contributor

Heap-allocating (HeaderIncludingLength, [Item]) (possibly with not all items initialized) is a common pattern that could use some more built-in support. But does it need to be language-level with DSTs, or could it be a library-level?

Contributor

SimonSapin commented Aug 7, 2017

Heap-allocating (HeaderIncludingLength, [Item]) (possibly with not all items initialized) is a common pattern that could use some more built-in support. But does it need to be language-level with DSTs, or could it be a library-level?

@bergus

This comment has been minimized.

Show comment
Hide comment
@bergus

bergus Jun 14, 2018

@SimonSapin Looking for exactly that as well. Have you found a suitable workaround yet?

bergus commented Jun 14, 2018

@SimonSapin Looking for exactly that as well. Have you found a suitable workaround yet?

@SimonSapin

This comment has been minimized.

Show comment
Hide comment
@SimonSapin

SimonSapin Jun 15, 2018

Contributor

@bergus Today, the best you can do is manual memory layout computation, raw allocation, and unsafe initialization. See for example:

https://github.com/rust-lang/rust/blob/1.26.2/src/liballoc/rc.rs#L714-L727
https://github.com/rust-lang/rust/blob/1.26.2/src/liballoc/rc.rs#L664-L680

Contributor

SimonSapin commented Jun 15, 2018

@bergus Today, the best you can do is manual memory layout computation, raw allocation, and unsafe initialization. See for example:

https://github.com/rust-lang/rust/blob/1.26.2/src/liballoc/rc.rs#L714-L727
https://github.com/rust-lang/rust/blob/1.26.2/src/liballoc/rc.rs#L664-L680

@eddyb

This comment has been minimized.

Show comment
Hide comment
@eddyb

eddyb Jun 15, 2018

Member

There's also this pattern for a slice with a header, and !Sized thin pointers:
https://github.com/rust-lang/rust/blob/5205ae8bdc2991eecf3bfbb58ed8f56c0673e738/src/librustc/ty/mod.rs#L603-L706

Member

eddyb commented Jun 15, 2018

There's also this pattern for a slice with a header, and !Sized thin pointers:
https://github.com/rust-lang/rust/blob/5205ae8bdc2991eecf3bfbb58ed8f56c0673e738/src/librustc/ty/mod.rs#L603-L706

@bergus

This comment has been minimized.

Show comment
Hide comment
@bergus

bergus Jun 15, 2018

@SimonSapin Thanks
@eddyb That's pretty much what I ended up with (needed a bit until I figured out #[repr(C)] was necessary…) though without the OpaqueSliceContents. I had seen extern type mentioned in #813, but the guide explanation from the RFC was not exactly helpful. Does this basically prevent us from allocating the dynamically-sized slice on the stack and from statically determining its size, so that we are forced to use it only through pointers?

bergus commented Jun 15, 2018

@SimonSapin Thanks
@eddyb That's pretty much what I ended up with (needed a bit until I figured out #[repr(C)] was necessary…) though without the OpaqueSliceContents. I had seen extern type mentioned in #813, but the guide explanation from the RFC was not exactly helpful. Does this basically prevent us from allocating the dynamically-sized slice on the stack and from statically determining its size, so that we are forced to use it only through pointers?

@eddyb

This comment has been minimized.

Show comment
Hide comment
@eddyb

eddyb Jun 15, 2018

Member

Does this basically prevent us from allocating the dynamically-sized slice on the stack and from statically determining its size, so that we are forced to use it only through pointers?

Yes, and if you need mutable references, it prevents someone from (mis-)using mem::swap.

Member

eddyb commented Jun 15, 2018

Does this basically prevent us from allocating the dynamically-sized slice on the stack and from statically determining its size, so that we are forced to use it only through pointers?

Yes, and if you need mutable references, it prevents someone from (mis-)using mem::swap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment