Unify and nest structs and enums #24

Closed

@nrc

Another alternative to RFC #5 and an extension/variant of RFC #11.

Unify enums and structs by allowing enums to have fields, and structs to have
variants. Allow nested enums/structs. Virtual dispatch of methods on struct/enum
pointers. Remove struct variants. Treat enum variants as first class. Possibly
remove nullary structs and tuple structs.

@cmr

@nick29581 you didn't modify the template

@nrc nrc Unify and nest structs and enums
ef2ff45
@nrc

Whoops! Thanks @cmr , fixed now.

@brendanzab

Not sure if this is the right place to express this, but I really think our sum types should use a different keyword to C-style enums. Perhaps the union keyword?

union Foo<T, U> {
    A(T, U),
    B { t: T },
    C,
}
enum Foo: c_uint {
    A = 0x01,
    B,
    C = 0x10,
    D,
}

Whilst the intention behind using enum was to make C and C++ developers more at home I feel like it causes too much confusion to be worth it. The only precedent I have found is in Haxe. Using the union keyword would at least still retain the 'friendliness factor'.
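
A minimal sketch of the two shapes in present-day Rust, where both are still spelled `enum`. The names `Shape`, `Flag`, `describe` and `flag_value` are hypothetical, and `#[repr(u32)]` is only an assumed stand-in for the `: c_uint` annotation above:

```rust
// Sum type: each variant may carry its own data
// (the shape spelled `union Foo` in the comment above).
#[derive(Debug, PartialEq)]
enum Shape<T, U> {
    A(T, U),
    B { t: T },
    C,
}

// C-style enumeration: no payloads, only integer discriminants.
#[repr(u32)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Flag {
    A = 0x01,
    B, // implicit discriminant: previous + 1
    C = 0x10,
    D, // implicit discriminant: previous + 1
}

// A sum type has to be inspected with `match` before its data can be used.
fn describe(s: &Shape<i32, &str>) -> String {
    match s {
        Shape::A(n, name) => format!("A({}, {})", n, name),
        Shape::B { t } => format!("B {{ t: {} }}", t),
        Shape::C => "C".to_string(),
    }
}

// A C-style enum can simply be cast to its integer value.
fn flag_value(f: Flag) -> u32 {
    f as u32
}
```

Only the data-free form supports the `as` cast, which is one place where the two concepts behave differently despite sharing a keyword.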

Below is an exchange on #ada I had a couple of days ago. Whilst I think the person in question was being overly antagonistic and somewhat close minded, it is a good example of the constant confusion I am greeted with when outsiders first encounter Rust's enums. I know I was certainly confused at first.

<Lucretia> had a skim through the rust tutorial, not convinced
<bjz> what do you mean?
<Lucretia> enums as Lists? what drugs are they on?
<Lucretia> see the tutorial
<bjz> oh the linked list tutorial
<Lucretia> the way to implement a list with an enum
<bjz> have you used haskell?
<Lucretia> at uni
<bjz> or an ML?
<Lucretia> didn't get it
<Lucretia> ope
<bjz> an enum is a sum type
<Lucretia> was given a haskell tutorial from here
<bjz> or a 'variant type'
<bjz> or tagged union
<bjz> lots of names for it
<bjz> http://en.wikipedia.org/wiki/Tagged_union
<bjz> using enums for lists isn't really idiomatic in Rust
<bjz> but they are very useful for things like abstract syntax trees
<Lucretia> yeah, I wouldn't say an enum is that at all and neither does that link
<bjz> you can think of them as mini dynamic type systems, but you have to check what type it is before you can do operations on them
<bjz> what do you mean?
<Lucretia> you say an enum is a tagged_union, that link does not say that at all, just searched for enum on that page - it uses an enum to determine the type of a variant record
<bjz> yeah, I don't like the use of the 'enum' keyword either
<bjz> but the semantics are the interesting bit
<Lucretia> but to say that an enum and a list go together in the way they do is just wrong
<Lucretia> I remember "Cons" from uni - the term only, the meaning, not at all
<bjz> think of it like this: you can express the semantics of a C or Java enum with rust's enum
<bjz> but you can also express a lot more
<bjz> C/Java's enum is a subset of what you can do with Rust's enum
<Lucretia> but an enumeration is a set of values, that's it, it's not a list, no matter how you twist things, it's just not
<Lucretia> it's an orthogonal and separate concept
<bjz> this is an enumerated set of *types*
<bjz> I would highly recommend learning some haskell
<bjz> it would probably make lots of this stuff more clear
<bjz> lists and trees spring naturally out of sum types
<bjz> (no matter what you call them)
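
To make the "lists spring naturally out of sum types" point concrete, here is a small sketch in present-day Rust. It is not the tutorial code the log refers to; `List`, `sum` and `from_slice` are hypothetical names:

```rust
// A list is either empty, or one value followed by another list.
// That "either/or" is exactly what a sum type expresses.
enum List {
    Nil,
    Cons(i32, Box<List>),
}

// Consuming the list means checking which variant we have first.
fn sum(list: &List) -> i32 {
    match list {
        List::Nil => 0,
        List::Cons(head, tail) => head + sum(tail),
    }
}

// Build a list from a slice, one `Cons` cell per element.
fn from_slice(xs: &[i32]) -> List {
    match xs {
        [] => List::Nil,
        [head, rest @ ..] => List::Cons(*head, Box::new(from_slice(rest))),
    }
}
```

A C-style enum corresponds to the special case where no variant carries data.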

Edit: Perhaps I should make a separate RFC for this - sorry if I am derailing things.

@cmr

On first (and second) read-through, it sits really unwell with me, and the number of open questions is worrying. I feel this is a drastic increase in complexity for a feature that honestly should be rare.

@brendanzab

@cmr Yeah, I am rather confused by the RFC :/

I understand if structs and enums were unified because structs are basically just single-variant enums, but I am still unclear about the motivation for this specific proposal.

@SimonSapin

@bjz I agree that the name "enum" is not great for full sum types (since almost everybody else is using that name for (roughly) sums of unit types), but that discussion does not belong here. I think you should write another RFC.

@brendanzab

@SimonSapin Yeah, sorry. Glad to hear that at least somebody feels the same way though.

@nrc

@cmr - yeah, it is a pretty big change, and the change is complex, but I'm not sure it adds complexity - it certainly removes features from the language and in that way removes complexity. Although I can see that it does make enums harder to grok. The number of open questions is indicative that it is a big change and I wanted to get early feedback (I could have sat on this for a week and not had any questions, but I don't think that is a good approach).

I hope that as well as addressing the inheritance motivation we also address the motivation for refinement types, so we are killing more than one bird here. Also, I think that if it is not used a lot, that is more motivation to fit into existing structures, which I think this does (somewhat) rather than adding new structures, even if the new ones are simpler.

@nrc

@bjz the motivation is the same as PR #5 - to efficiently support the DOM and COM-like stuff. I get to that in a very roundabout way, sorry. I hope this is a more Rust-y way to provide a solution than PR #5 is. I.e., I generalise existing features until things work, rather than adding new features.

@cmr
@nrc

@cmr - no, by using nested enums you can specify a subset, albeit only subsets specified when defining the enum, not any subset (so it is not a total replacement, but I hope can be used to satisfy the common case).

@bill-myers

I think this is better than #5, but I must say I like my own #11 better, which shares the core ideas of using enums, having first class variants, etc.

An issue I see in this RFC (and not in #11) is that structs can be both instantiated and inherited. This makes the language less expressive because there is no easy way to distinguish the types "exactly struct S1" and "struct S1 or any derived struct".

My proposal in #11 is to instead make structs non-inheritable, which means that one has to create an enum with an empty variant instead of a struct with a struct variant as in this RFC, which makes it possible to naturally distinguish the types above (the former is denoted by the empty variant name, and the latter by the enum name).

The other major difference with #11 is that this RFC uses virtual methods, and allows overriding non-abstract methods while #11 exclusively uses traits for inheritance, and only allows overriding abstract methods.

To sum it up, the idea of #11 is that traits can be implemented using the "impl as match" syntax, which means "derive a trait implementation whose methods match on all variants and call the corresponding method in the impl for the variant" (where the match is likely implemented as vtable dispatch) or explicitly, but you cannot implement a trait explicitly if a base enum also implements it explicitly.

I think that leads to an easier to understand and cleaner language, because it forces one to give names to sets of virtual methods, unifies virtual dispatch and enum matching, allows external implementations of virtual methods, and, by only allowing abstract functions to be overridden, makes method lookup far simpler.

A key insight in this area is that the compiler can convert a match in the same crate as the type into a virtual method dispatch by extracting match arms into functions and assigning a vtable slot, and vice versa can implement virtual functions by matching on type tags. Aside from external ABI interoperability constraints, which of the two to use is in fact an implementation detail; thus, we should unify those notions.
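
A rough sketch of that equivalence in present-day Rust, written by hand rather than derived by the compiler. All names (`Node`, `Describe`, `Text`, `Element`, and the two `describe_by_*` functions) are hypothetical:

```rust
// Dispatch by matching on the tag of a closed set of variants.
enum Node {
    Text(String),
    Element { tag: String, children: usize },
}

fn describe_by_match(n: &Node) -> String {
    match n {
        Node::Text(s) => format!("text:{}", s),
        Node::Element { tag, children } => format!("<{}> with {} children", tag, children),
    }
}

// The same behaviour with each match arm extracted into a method
// and selected through a vtable (a trait object).
trait Describe {
    fn describe(&self) -> String;
}

struct Text(String);
struct Element {
    tag: String,
    children: usize,
}

impl Describe for Text {
    fn describe(&self) -> String {
        format!("text:{}", self.0)
    }
}

impl Describe for Element {
    fn describe(&self) -> String {
        format!("<{}> with {} children", self.tag, self.children)
    }
}

fn describe_by_vtable(n: &dyn Describe) -> String {
    n.describe()
}
```

Both functions produce the same strings for corresponding inputs; what differs is whether the set of cases is closed (the enum) or open to new implementors (the trait).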

@nrc

@bill-myers re inheritable structs - you don't need a struct with a struct variant, just a struct - in fact struct variants would disappear. We could tweak this so that structs were not inheritable, but I think there is value in being able to instantiate non-leaf 'classes' in an inheritance hierarchy - we definitely need this for the DOM. In fact, if we take #11, I think we would have to change this.

re virtual methods, again, I think it is a requirement to allow overriding of non-abstract methods. The rest is just a different syntax really. I'm not really sure if involving traits is an advantage or disadvantage - in particular, it is not clear to me where we would get thin pointers and where fat pointers. I think it is important for it to be clear from the syntax when you fall off a fast path.

I agree that match and virtual dispatch are the same from the implementation point of view. I toyed with the idea of only doing dispatch via match, but I think the syntax would be cumbersome. Unless I misunderstand your proposal, #11 does not really unify since you still have separate match statements and impl ... as match? I guess I don't feel too strongly that we should not have two mechanisms here since the use cases are kind of different, but I could be persuaded.

@bill-myers

To put it with an example, regarding struct inheritance:

struct A
{
    struct B {...}
}

Is equivalent to:

enum A
{
    struct AStruct,
    struct B {...}
}

So you don't need to be able to override structs.

But in the former syntax there is no way to distinguish between "exactly A" and "A or B", making the language less expressive.

In the latter syntax, the former is called AStruct, and the latter is called A.

Now of course you could introduce syntax like "&struct A" to make the distinction, but that complicates the language unnecessarily.

That's why I think allowing structs to be inherited is bad.
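
The "exactly A" versus "A or B" distinction can be approximated in present-day Rust, which has neither struct inheritance nor first-class variants, by wrapping separate structs in an enum. This is only a sketch under that assumption; the fields and the functions `only_exact_a` and `any_a` are hypothetical:

```rust
// The two concrete "classes".
struct AStruct {
    id: u32,
}

struct B {
    id: u32,
    extra: u32,
}

// "A or anything derived from A" is the enum type.
enum A {
    AStruct(AStruct),
    B(B),
}

// Accepts exactly the base case, never a B.
fn only_exact_a(a: &AStruct) -> u32 {
    a.id
}

// Accepts any member of the hierarchy and inspects which one it got.
fn any_a(a: &A) -> u32 {
    match a {
        A::AStruct(s) => s.id,
        A::B(b) => b.id + b.extra,
    }
}
```

The two function signatures name different types, so the distinction is visible without extra syntax such as "&struct A".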

@bill-myers

Regarding overriding methods, the pseudocode:

class A
{
    virtual void foo() {...}    
}

class A1 : A {}
class A2 : A {}

class B : A
{
    override void foo() {...}
}

Is equivalent to:

abstract class A
{
    void a_foo() {...}
    void foo();
}

class AStruct : A
{
    void foo() {a_foo();}
}

class A1 : A {void foo() {a_foo();}}
class A2 : A {void foo() {a_foo();}}

class B : A
{
    void foo() {...}
}

So there is no need to allow overriding non-abstract methods.

In the first snippet, calling "foo" on B could technically refer to both the "foo" on A and the "foo" on B and you now need an explicit notion of virtual dispatch to distinguish between them, while in the second it can only refer to the "foo" on B because the one on "A" is abstract.

The implication here is that a human reader cannot get confused and think that the "foo" on A is being called rather than the "foo" on B, because the one on A is abstract.

Plus, you need an "override" keyword and concept.

Calling the version of foo in A from B is easily done with "a_foo" in the second snippet without needing to introduce "super.foo()" or "A::foo()".

The second snippet is more verbose, but one could add some syntax sugar to make it less verbose (namely, allowing A1 and A2 to be implemented at once).

This is to some extent a matter of taste, but I think the second snippet makes a simpler language and fits more with current Rust.
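
A sketch of the second snippet in present-day Rust, with the abstract class modelled as a trait whose only method has no default body. The returned strings and the `call` helper are hypothetical, added so the dispatch is observable:

```rust
// The shared behaviour, callable by name without any `super` syntax.
fn a_foo() -> &'static str {
    "A::foo"
}

// `foo` is abstract: every concrete type must state what it does.
trait A {
    fn foo(&self) -> &'static str;
}

struct AStruct;
struct A1;
struct A2;
struct B;

impl A for AStruct {
    fn foo(&self) -> &'static str {
        a_foo()
    }
}

impl A for A1 {
    fn foo(&self) -> &'static str {
        a_foo()
    }
}

impl A for A2 {
    fn foo(&self) -> &'static str {
        a_foo()
    }
}

// B supplies its own behaviour; nothing is being overridden.
impl A for B {
    fn foo(&self) -> &'static str {
        "B::foo"
    }
}

// Dynamic dispatch through a trait object.
fn call(x: &dyn A) -> &'static str {
    x.foo()
}
```

The repetition in the `AStruct`, `A1` and `A2` impls is the verbosity mentioned above.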

@bill-myers

Unless I misunderstand your proposal, #11 does not really unify since you still have separate match statements and impl ... as match? I guess I don't feel too strongly that we should not have two mechanisms here since the use cases are kind of different, but I could be persuaded.

"impl as match" is proposed to be syntax sugar for implementing each method by doing a match on all variants and calling the corresponding method, plus the exception allowing you to override a trait implemented as "impl as match".

I must say I don't like the exception, but I'm not sure how to do it otherwise; the idea is that the exception is fine, because "impl as match" guarantees that there is no difference between calling the function on the parent or on the derived class, since the one in the parent just redirects using match to the one on the derived class.

I suppose we could instead specify that the compiler detects when a trait is implemented using a straight redirecting match, and treats it as "impl as match", although that's not so great either.

[of course, the idea is that the compiler then optimizes the matches to use a vtable in most cases]

I'm not really sure if involving traits is an advantage or disadvantage - in particular, it is not clear to me where we would get thin pointers and where fat pointers.

There's no difference: enum pointers are thin, and trait object pointers are fat.

The idea of invoking traits is that "virtual methods" are put in a trait instead, which is separately implemented on each variant, and where the implementation on the enum does "virtual dispatch" to the impl for the variant corresponding to the dynamic type (either as a built-in language concept of virtual dispatch, or using "impl as match" syntax sugar).

This makes it possible to give a name to sets of virtual methods that must be implemented or overridden together, and makes it naturally possible to define things like "impl as match" that would otherwise have to take raw method names.
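
The thin/fat distinction can be observed directly in present-day Rust. A minimal sketch, assuming a typical target where a trait-object reference is a data pointer plus a vtable pointer; `Shape`, `Area` and the helper functions are hypothetical:

```rust
use std::mem::size_of;

enum Shape {
    Circle(f64),
    Square(f64),
}

trait Area {
    fn area(&self) -> f64;
}

// The impl on the enum dispatches to the right case with a match.
impl Area for Shape {
    fn area(&self) -> f64 {
        match self {
            Shape::Circle(r) => std::f64::consts::PI * r * r,
            Shape::Square(s) => s * s,
        }
    }
}

// A reference to the enum is a single machine word (thin).
fn thin_pointer_size() -> usize {
    size_of::<&Shape>()
}

// A reference to a trait object carries a data pointer and a vtable pointer (fat).
fn fat_pointer_size() -> usize {
    size_of::<&dyn Area>()
}

fn area_via_trait_object(a: &dyn Area) -> f64 {
    a.area()
}
```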

@bill-myers

Here is a motivating example for forcing virtual methods to be in traits and not allowing them to be overridden.

Let's say you have a web browser with an object hierarchy that supports renderToOpenGL and renderToPixmap, which are supposed to render the same image, but one as an OpenGL texture, and the other as an array of bytes.

You are currently printing by printing the pixmap, but that sucks, so you add a renderToPostscript function, hook it so the postscript is sent to the printer, and implement it for a base class.

With this RFC, or if you were using C++ or Java, your application now compiles, but it is totally broken, because you forgot to implement renderToPostscript for derived classes, so printing a document now no longer looks the same as the on-screen document (since you are instead overriding renderToPixmap).

If instead one were forced to put those methods in a Render trait, then you will be immediately faced with the prospect of changing a trait, and if you do so, all impls will fail to compile until you provide an implementation of the new method.

Let's say you decide instead to add a new RenderToPostscript trait and implement it for the base class.

If overriding trait impls is allowed, then your application will once again compile, and once again be totally broken, since you forgot to implement it for derived classes.

If overriding concrete impls is not allowed, then your implementation will only be for one concrete class, and your program will not compile because you forgot to implement it for the other concrete classes.
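
A sketch of the trait-based variant of this scenario in present-day Rust. The method bodies and the types `Paragraph` and `Image` are hypothetical placeholders, not a real rendering API:

```rust
trait Render {
    fn render_to_opengl(&self) -> String;
    fn render_to_pixmap(&self) -> Vec<u8>;
    // Adding `fn render_to_postscript(&self) -> String;` here, with no default
    // body, makes every impl below fail to compile until it supplies the method.
}

struct Paragraph {
    text: String,
}

struct Image {
    pixels: Vec<u8>,
}

impl Render for Paragraph {
    fn render_to_opengl(&self) -> String {
        format!("texture({})", self.text)
    }
    fn render_to_pixmap(&self) -> Vec<u8> {
        self.text.as_bytes().to_vec()
    }
}

impl Render for Image {
    fn render_to_opengl(&self) -> String {
        format!("texture({} bytes)", self.pixels.len())
    }
    fn render_to_pixmap(&self) -> Vec<u8> {
        self.pixels.clone()
    }
}

// Works over any mix of renderable objects.
fn total_pixmap_bytes(items: &[Box<dyn Render>]) -> usize {
    items.iter().map(|item| item.render_to_pixmap().len()).sum()
}
```

The forgotten-implementation bug described above then surfaces as a compile error on each impl rather than as wrong output at print time.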

@jaredly jaredly commented on an outdated diff Apr 1, 2014
0000-enum-struct.md
+virtual call and is only necessary if the static type implements `Drop`.
+
+## Initialisers
+
+Need to think a bit about struct initialisers. We should require all fields to
+be specified. We should support constructors too. I'm not sure how we support
+'struct' initialisers for enums - which should not be instantiable. Since there
+is no kind of cross-module inheritance, perhaps it is not an issue since fields
+can always be accessed.
+
+## Calling overridden methods
+
+If a method is overridden, we should still be able to call it. C++ uses `::`
+syntax to allow this. In the example above we use `Foo::bar(self)` to indicate
+static dispatch of an overridden method. I'm not sure if this is currently
+valid Rust or if it is the optimal tsolution. But it looks nice to me and we
@jaredly
jaredly added a note Apr 1, 2014

typo tsolution

@nrc
nrc commented Apr 1, 2014

Hmm, I fear I have missed something here - what I propose adds behaviour for pointer-to-enum that did not previously exist, but I neglected the enum value case. We certainly don't want all the variants of an enum like this to be the same size, so then matching on an enum value could not be supported. That seems bad. I'm not sure if there is a solution. I guess that is a distinction between enums and inheritance, and perhaps makes me feel less bad about the duplication of behaviour there with an approach like #5.

@nikomatsakis

@nick29581 I haven't followed the entire conversation. I had a hard time understanding the proposal, I fear, but my biggest fear was precisely what you seemed to be hitting on here -- I didn't quite get how the by-value enum case fits in.

@esummers

We certainly don't want all the variants of an enum like this to be the same size, so then matching on an enum value could not be supported.

@nick29581 Maybe enums could be fixed size or unsized based on context. A pointer to an enum that doesn't allow the variant to be changed could be unsized. All other instances could be sized. Basically the variant tag is immutable, but the fields inside could be mutable or immutable.

@dobkeratops

Interesting to see all this.

[1] would any of this facilitate a future enhancement (or unsafe hacks) where immutable enums could be compacted (the tag implies the size of the type, by type-specific lookup; different sized variants can be placed back to back, e.g. tree leaves and nodes, reducing the number of pointers required to do that sort of thing)? of course i can do that now in C or in rust unsafe code.

[2] if you lose 'tuple-structs' (i don't mind, tuples and real structs are more useful), are the enum variants still going to be able to look the same, i.e. a tag/variant name and a tuple? i think those are very handy, even though i haven't wanted to use 'tuple-structs' elsewhere

[3] are you going to be declaring actual virtual functions like in C++ classes? it looked to me like you could keep the idea of traits describing vtables, and just use inheritance of structs to say a trait can assume some fields (and vice versa, perhaps inherit or embed the trait in a struct and it would check what's compatible)

[4] I liked the idea of keeping vtables more general, e.g. adding sugar for accessing them and composing with a struct pointer for a specific call - allowing for layouts and uses beyond what's been formalized in various languages (like 'class-objects' that hold a vtable and metadata applied to a collection of other objects, or using vtable swaps as state-machines.. that could all be done safely if you had proper types for them). I was pleased with the hacks one seemed to be able to do already with transmute. (i saw eddyb's 'Many' and had a bash at emulating c++ layout myself)

i'm definitely keen on plain struct single inheritance, that's just shortening the paths to the most common data.

@nrc
nrc commented Apr 1, 2014

@esummers I don't think that addresses the issue - the problem is with values only (pointer-to-variant is not really an issue). The problem is that some variants might be very small and some very large and we don't want to pad the smaller ones to the size of the larger ones. We must have a size for values to be able to compile, so unsized there isn't an option.

@nrc
nrc commented Apr 1, 2014

@dobkeratops 1 - I don't think this would facilitate that, but we would probably need something like that to enable this, that is the downside I noted a few comments up and which I didn't think about initially.

2 - yes, enum variants could still be a name + tuple combo. Struct variants could still be used, they would just be the same as regular structs, so it's not that the idea disappears, only that it is redundant. The syntactic change to a program would just be adding the struct keyword.

3 - Yes, we add virtual fns to impls. We explicitly wanted to avoid traits for this because using traits requires a fat pointer and we want thin ones here. Having this as an optimisation is against Rust's guiding principle of predictable performance. Also, having fields in traits (in any way) further blurs the distinction between data and behaviour. Since we already allow functions (behaviour) for impls for structs, we don't make things worse this way.

4 - this is probably a matter of taste. It is certainly flexible and in some ways elegant. But I am not a fan, I prefer a language to be easier to use and present abstractions for that kind of thing. Having to use unsafe code/transmute for a relatively common and safe use case seems bad to me. It's not clear to me if you can guarantee the performance characteristics we require that way either, but perhaps you can.

@dobkeratops

4- well with just a bit of safe sugar - it wouldn't be an unsafe codepath to do this. Some intrinsic functions..

fn vtable<St,Tr>() -> VTable<St,Tr>;
fn as_trait_obj<St,Tr>(s:&St,v:&VTable<St,Tr>)->&Tr;   // St=Struct, Tr=Trait

maybe sleeker syntax is possible (... for .. ) gets the vtable - symmetrical with 'impl for..' .... and could any tuple (&St, &VTable) be vcallable, '&Trait' is just something that coerces to..
https://gist.github.com/dobkeratops/9841737 this was my own hack to make C++ style vtables as it stands now, (i'm sure others have done similar and there might be better ways to do it)

I would see adding this type of sugar as leveraging more of Rust's existing character rather than retrofitting a completely different vtable system centred on structs

@esummers

The problem is that some variants might be very small and some very large and we don't want to pad the smaller ones to the size of the larger ones. We must have a size for values to be able to compile, so unsized there isn't an option.

@nick29581 I guess I didn't really mean unsized. I meant sized to the variant instead of sized to the enum. A pointer to an enum could be sized to the variant and everything else sized with padding to the largest variant. I think that once we just have a reference we don't care about the other variants because we can never become one of those. Basically the size is statically determined when it is constructed based on the size of the variant (but only when it is a pointer). When using virtual inheritance, you will always pass by reference.

Maybe I have a flaw in my reasoning somewhere, but I mean sized to variant when passed by reference.

EDIT: I was assuming heap allocations when using virtual inheritance (so size on stack doesn't matter), but maybe that is a bad assumption.

@nrc
nrc commented Apr 1, 2014

@esummers - the problematic case is given an enum E and a function fn(x: E) you need to know how much space to leave for x on the stack. We solve this at the moment by leaving the same amount of space for every variant, that is the maximum amount. But if we want to allocate lots of the small variants and they are all padded to the size of the largest variant, then that is a waste. You can't size items differently if they are values or pointers-to-values, since you can dereference the pointer to get a value and then you need the size to be the same as if you had started with a value.
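
The padding cost can be seen with `size_of` in present-day Rust. A minimal sketch, with a hypothetical `Node` enum standing in for a hierarchy that has one tiny and one large variant:

```rust
use std::mem::size_of;

enum Node {
    Small(u8),
    Large([u8; 1024]),
}

// Every value of type `Node` reserves room for the largest variant,
// so even a `Node::Small` occupies at least 1024 bytes.
fn value_size() -> usize {
    size_of::<Node>()
}

// A pointer to a `Node` is one machine word regardless of the variant.
fn boxed_pointer_size() -> usize {
    size_of::<Box<Node>>()
}

// Passing by value needs the full size to be known at compile time.
fn payload_len(n: Node) -> usize {
    match n {
        Node::Small(_) => 1,
        Node::Large(bytes) => bytes.len(),
    }
}
```

Boxing hides the size behind a pointer, but the heap allocation for a `Box<Node>` is still the full padded size, which is the waste being described.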

@nrc
nrc commented Apr 1, 2014

@dobkeratops I'm afraid this is just going to come down to taste. You are right that we can avoid adding a language feature this way, but I don't think it is worth it in exchange for lots of ugly boilerplate all over the place.

If I understand your example correctly, you are still passing a tuple - so it is two words per pointer, not one.

@dobkeratops

for lots of ugly boilerplate
Something similar could be given in standard library code - and other features(vtable sugar/single inheritance) would streamline them

If I understand your example correctly, you are still passing a tuple
not quite: - in my example, the structure layout is the same as a C++ class: the vtable is "hacked out" by cast::transmute in make_class! and stored in the member 'vtable'. You reference the whole with one pointer.

then a temporary 'trait object' is created for a vcall, by the member function '.as_trait_obj()'. I'm assuming that will inline. (I should add #[inline]). TBAA would cover opt.
That could be 'deref' to streamline the calls visually, but i'd already used that for field access (fields are behind '.data')

eddyb's sample is more interesting, it creates a type "Many" along similar lines that has multiple vtable interfaces carried for one struct .. mine could be seen as a special case of that.

Well I don't know what's easier to implement in the compiler, or what would get more demand. I guess people are familiar with C++ behaviour, and virtuals, single-inheritance + traits wouldn't be so different to virtuals + multiple-inheritance... but this method would keep one vtable concept and make it more versatile

@bill-myers

Regarding sizing, the simplest and default approach should be to have a fixed size like current enums (and thus pad the smaller variants).

As an extension, one can add an "unsized" keyword that makes enums unsized (which of course requires DST to have been implemented first).

However, note that with unsized enums, you must either disallow assigning to an &mut or ~ of an enum, or throw a run-time error if the run-time variant is different (since assignment is impossible if the new variant is of a different size).

This is the same restriction that languages like Java or C# have (note that Java or C# also disallow assigning non-overridable classes, which is unnecessary and a bad idea in Rust).

You can allow unsized enums to be passed by value by padding them, if inheritance is closed; if inheritance is open, then you cannot pass them by value (unless you autobox, but I guess we don't want that).

@nrc
nrc commented Apr 2, 2014

Ah, that might be nice. I think we would indeed prevent dereferencing of DST and pointers to struct objects, so that side of things would all work. We would just need to add the keyword as you suggest to indicate the unsized-ness and forbid referring to such values by their enum (as opposed to variant) type.

Padding (even for closed inheritance, as here) is a non-starter in general, since some variants might be hugely bigger than others (e.g., in the DOM).

@nrc
nrc commented Apr 2, 2014

As a note (which I'll incorporate into the RFC later), JDM pointed out that having all 'classes' in one lexically nested block is impractical. We also need to allow specifying 'classes' in sub-modules (so they can be in different files). Both problems are solvable, but need to be addressed.

@nrc

Superseded by #142

@nrc nrc closed this Jun 26, 2014