
RFC: Structural Records #2584

Status: Open. Wants to merge 15 commits into base: master.
@Centril
Contributor

Centril commented Nov 2, 2018

🖼️ Rendered

📝 Summary

Introduce structural records of the form { foo: 1u8, bar: true } of type { foo: u8, bar: bool } into the language. Another way to understand these sorts of objects is to think of them as "tuples with named fields", "unnamed structs", or "anonymous structs".

💖 Thanks

To @kennytm, @alexreg, @Nemo157, and @tinaun for reviewing the draft version of this RFC.
To @varkor and @pnkfelix for good and helpful discussions.

@Diggsey


Contributor

Diggsey commented Nov 2, 2018

This looks like a super well thought out and comprehensive RFC.

It might be worth clarifying the behaviour of #[derive(...)] with respect to types like:

struct RectangleTidy {
    dimensions: {
        width: u64,
        height: u64,
    },
    color: {
        red: u8,
        green: u8,
        blue: u8,
    },
}

Presumably, there will be no "magic" here, and you will only be able to derive traits which are implemented for the anonymous structs themselves.

Another question is around trait implementations: at the moment, 3rd party crates can provide automatic implementations of their traits for "all" tuples by using macro expansion to implement them for 0..N element tuples. With anonymous structs this will not be possible. Especially with the likely arrival of const generics in the not too distant future, negating the need for macros entirely, anonymous structs will become second class citizens. Is this a problem, and are there any possible solutions to allow implementing traits for all anonymous structs?
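The macro-expansion pattern described here can be sketched as follows. `Describe` and the macro name are hypothetical, purely for illustration; real crates apply the same pattern to their own traits, usually up to arity 12:

```rust
// Hypothetical trait; crates use this macro pattern today to cover
// tuples of each arity with a single macro definition.
trait Describe {
    fn describe() -> String;
}

impl Describe for u8 {
    fn describe() -> String { "u8".to_string() }
}

impl Describe for bool {
    fn describe() -> String { "bool".to_string() }
}

// One impl per listed arity, all generated from one macro.
macro_rules! impl_describe_for_tuples {
    ($( ( $($T:ident),+ ) )+) => {
        $(
            impl<$($T: Describe),+> Describe for ($($T,)+) {
                fn describe() -> String {
                    let parts: Vec<String> = vec![$(<$T>::describe()),+];
                    format!("({})", parts.join(", "))
                }
            }
        )+
    };
}

impl_describe_for_tuples! {
    (A)
    (A, B)
    (A, B, C)
}

fn main() {
    assert_eq!(<(u8, bool)>::describe(), "(u8, bool)");
}
```

The point of the comment is that no analogous "cover all arities" trick exists for structural records, because there is no way to abstract over field names.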

(fragment of the RFC example under review:)

other_stuff(color.1);
...
yet_more_stuff(color.2);
}


@petrochenkov

petrochenkov Nov 2, 2018

Contributor
fn do_stuff_with((red, green, blue): (u8, u8, u8)) {
    some_stuff(red);
    other_stuff(green);
    yet_more_stuff(blue);
}


@Centril

Centril Nov 2, 2018

Contributor

Even better, with #2522:

fn do_stuff_with((red: u8, green: u8, blue: u8)) {
    some_stuff(red);
    other_stuff(green);
    yet_more_stuff(blue);
}

However, while if you write it in this way it is clear what each tuple component means, it is not clear at the call site...

let color = (255, 0, 0);
// I can guess that this is (red, green, blue) because that's
// the usual for "color" but it isn't clear in the general case.
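The call-site clarity being discussed can be demonstrated with today's nominal structs, at the cost of declaring and naming the type up front (a runnable sketch; `Color` and `do_stuff_with` are illustrative names):

```rust
// A nominal struct makes the call site self-documenting today,
// but requires a named declaration somewhere.
struct Color {
    red: u8,
    green: u8,
    blue: u8,
}

fn do_stuff_with(color: Color) -> u8 {
    color.red
}

fn main() {
    // The field names are visible right at the call site:
    let r = do_stuff_with(Color { red: 255, green: 0, blue: 0 });
    assert_eq!(r, 255);
}
```

Structural records would give the same call-site labels without the up-front declaration.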


@petrochenkov

petrochenkov Nov 2, 2018

Contributor

This syntax looks very similar to another already accepted RFC, but semantics seems different - #2102.


@Centril

Centril Nov 2, 2018

Contributor

True; I have to think about if some unification can be done in some way here.

@Centril


Contributor

Centril commented Nov 2, 2018

@Diggsey

This looks like a super well thought out and comprehensive RFC.

Thanks!

It might be worth clarifying the behaviour of #[derive(...)] with respect to types like:

struct RectangleTidy {
    dimensions: {
        width: u64,
        height: u64,
    },
    color: {
        red: u8,
        green: u8,
        blue: u8,
    },
}

Presumably, there will be no "magic" here, and you will only be able to derive traits which are implemented for the anonymous structs themselves.

This bit is not currently well specified, but it should be, so I will fix that (EDIT: fixed)... I see two different ways to do it:

  1. Use the structural records in there as types; this means that #[derive(Default)] would produce:

    impl Default for RectangleTidy {
        fn default() -> Self {
            RectangleTidy {
                dimensions: Default::default(),
                color: Default::default()
            }
        }
    }

    The main benefit of this approach is that it is quite simple to implement. It will also work for all derivable standard library traits because implementations are auto-provided for structural records.

  2. Use the structural records in there as syntax; this means that #[derive(Default)] would produce:

    impl Default for RectangleTidy {
        fn default() -> Self {
            RectangleTidy {
                dimensions: {
                   width: Default::default(),
                   height: Default::default(),
                },
                color: {
                    red: Default::default(),
                    green: Default::default(),
                    blue: Default::default(),
                }
            }
        }
    }

    This would need to be done recursively in the macro but it shouldn't be very hard to implement.

    The main benefit of this is flexibility. More types would be derivable.

    However, due to the automatically provided implementations as noted in 1. there is no benefit to this approach for the standard library so I think the sane behaviour is 1.

    That said, a crate like serde can benefit from using the "magical" approach in 2. and we certainly cannot mandate what a procedural macro must do in the language so it will be up to each derive macro to decide what to do.
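Approach 1's behaviour can be previewed with today's nominal types: `derive(Default)` simply delegates to each field type's own `Default` impl. This is a sketch in which the named nested structs stand in for the anonymous record types:

```rust
// Stand-ins for the structural record types in RectangleTidy.
#[derive(Default, Debug, PartialEq)]
struct Dimensions {
    width: u64,
    height: u64,
}

#[derive(Default, Debug, PartialEq)]
struct Color {
    red: u8,
    green: u8,
    blue: u8,
}

// Approach 1: the derived impl just calls Default::default() per field,
// relying entirely on each field type's own implementation.
#[derive(Default, Debug, PartialEq)]
struct RectangleTidy {
    dimensions: Dimensions,
    color: Color,
}

fn main() {
    let r = RectangleTidy::default();
    assert_eq!(r.dimensions, Dimensions { width: 0, height: 0 });
    assert_eq!(r.color, Color { red: 0, green: 0, blue: 0 });
}
```

Under approach 1, structural record fields would work the same way, since the records would receive auto-provided impls of the derivable standard library traits.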

Another question is around trait implementations: at the moment, 3rd party crates can provide automatic implementations of their traits for "all" tuples by using macro expansion to implement them for 0..N element tuples.

Yup. Usually up to 12, emulating the way the standard library does this.

Especially with the likely arrival of const generics in the not too distant future,
negating the need for macros entirely, anonymous structs will become second
class citizens.

Nit: to not have to use macros for implementing traits for tuples you need variadic generics, not const generics. :)

Is this a problem, and are there any possible solutions to allow implementing traits for all anonymous structs?

I think it might be possible technically; I've written down some thoughts about it in the RFC. However, the changes needed to make it possible might not be what folks want.

However, not solving the issue might also be good; by not solving it you add a certain pressure to gradually move towards nominal typing once there are enough operations and structure that you want on the type.

standard traits that are implemented for [tuples]. These traits are: `Clone`,
`Copy`, `PartialEq`, `Eq`, `PartialOrd`, `Ord`, `Debug`, `Default`, and `Hash`.
Each of these traits will only be implemented if all the field types of a struct
implement the trait.


@petrochenkov

petrochenkov Nov 2, 2018

Contributor

That's... quite a bit of magic.


@Centril

Centril Nov 2, 2018

Contributor

I agree; this is noted in the drawbacks.


@Centril

Centril Nov 3, 2018

Contributor

Ideas for possible magic reduction discussed below in #2584 (comment).

+ For `PartialEq`, each field is compared with the same field in `other: Self`.
+ For `PartialOrd` and `Ord`, lexicographic ordering is used based on
the name of the fields and not the order given because structural records


@petrochenkov

petrochenkov Nov 2, 2018

Contributor

With macros fields can have names with same textual representation, but different hygienic contexts.


@Centril

Centril Nov 2, 2018

Contributor

Does that change anything wrt. the implementations provided, though? I don't see any problems with hygiene intuitively, but maybe I could, given some elaboration?


@petrochenkov

petrochenkov Nov 2, 2018

Contributor

This is probably relevant to other places where the order of fields needs to be "normalized" as well.


@Centril

Centril Nov 2, 2018

Contributor

Sure. :) But I'm not sure if you are pointing out a problem or just noting...
Any sorting / "normalization" is done post expansion.


@petrochenkov

petrochenkov Nov 2, 2018

Contributor

It's not clear to me how to sort hygienic contexts in stable order, especially in cross-crate scenarios. That probably can be figured out somehow though.

```rust
ty ::= ... | ty_srec ;
ty_srec ::= "{" (ty_field ",")+ (ty_field ","?)? "}" ;
```


@petrochenkov

petrochenkov Nov 2, 2018

Contributor

I don't think types can start with { syntactically.
There are multiple already existing ambiguities where parser assumes that types cannot start with {, and upcoming const generics introduce one more big ambiguity ({} is used to disambiguate in favor of const generic arguments).
#2102 uses struct { field: Type, ... } to solve this (and also to discern between struct { ... } and union { ... }).


@petrochenkov

petrochenkov Nov 2, 2018

Contributor

I'm not sure about expressions/patterns.
Unlimited lookahead may also be required, need to check more carefully.


@Centril

Centril Nov 2, 2018

Contributor

There are multiple already existing ambiguities where parser assumes that types cannot start with {

Such as? (examples please...)

Const generics allow literals and variables as expressions but anything else needs to be in { ... }.
This is not ambiguous with structural records (because the one-field-record requires a comma...) but requires lookahead.

Unlimited lookahead may also be required, need to check more carefully.

Scroll down ;) It's discussed in the sub-section "Backtracking".


@petrochenkov

petrochenkov Nov 2, 2018

Contributor

It's discussed in the sub-section "Backtracking".

It's a huge drawback.


@petrochenkov

petrochenkov Nov 2, 2018

Contributor

Such as? (examples please...)

Nothing that can't be solved by infinite lookahead, but here's "how to determine where where clause ends":

fn f() where PREDICATE1, PREDICATE2, { a: ...

Does { a: ... belong to the function body or to the third predicate?


@Centril

Centril Nov 4, 2018

Contributor

@H2CO3 oh sure; the parser must handle it, but the peril of backtracking is that it could considerably slow down parsing or produce bad error messages. What I'm saying is that the code paths in the parser that would handle this are pathological, so slowdowns are unlikely.

There's also no ambiguity here at all, but lookahead / backtracking / GLL may be required.

If we ever use the GLL crate in the compiler then we don't even need backtracking... it provides a correct algorithm for parsing context-free grammars and can produce good error messages for these sorts of things, because it produces a parse forest on which you can catch certain patterns.


@petrochenkov

petrochenkov Nov 4, 2018

Contributor

The fact that advanced formalisms exist doesn't necessarily mean that we should use them.
In many respects, the simpler the better.
Officially abandoning LL(n) is a separate decision that shouldn't be done in comments to a mostly unrelated RFC.

(That said, for this RFC disambiguation can also be done using cover grammar TYPE_OR_EXPR.
We parse { IDENT: TYPE_OR_EXPR and look at the next token, if it's , then we have a struct record type.)

(Nit: GLL is a backtracking of sorts, just with fancy memoization making it O(n^3) rather than exponential.)


@Centril

Centril Nov 4, 2018

Contributor

The fact that advanced formalisms exist doesn't necessarily mean that we should use them.
In many respects, the simpler the better.

Well there are advantages to both, and we should decide what trade-off is best for Rust.

Officially abandoning LL(n) is a separate decision that shouldn't be done in comments to a mostly unrelated RFC.

Abandoning LL(k) must be made for a reason, and that reason would be some feature or change we would like to make that forces us to do it. I don't think (correct me if that is inaccurate) we ever stated explicitly that LL(k) is a design goal so if so it doesn't need explicit and official abandonment either.

So I think either we abandon LL(k) on this RFC or in the #2544 RFC, or we accept neither and keep LL(k).

(That said, for this RFC disambiguation can also be done using cover grammar TYPE_OR_EXPR.
We parse { IDENT: TYPE_OR_EXPR and look at the next token, if it's , then we have a struct record type.)

Yeah, that's the idea I had also.

(Nit: GLL is a backtracking of sorts, just with fancy memoization making it O(n^3) rather than exponential.)

Fair enough ;)


@Laaas

Laaas Nov 7, 2018

Even if you disregard the fact that this syntax would necessitate backtracking, it is confusing for readers IMO. Why not require a struct keyword or something?


@Centril

Centril Nov 7, 2018

Contributor

@Laaas Because it is inconsistent with Tuples/Tuple-structs and also less ergonomic.

I don't think it's confusing for readers, the one-field case is clearly disambiguated by an extra comma and that is consistent with how tuples are treated. In other words, if { x, } is confusing, then so is (x,).
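The tuple analogy is checkable on stable Rust today: a trailing comma is exactly what distinguishes a one-element tuple from a parenthesized expression, which is the same disambiguation the RFC proposes for `{ x, }`:

```rust
fn main() {
    let x = 5;
    let parenthesized = (x); // just `x` with redundant parentheses: i32
    let one_tuple = (x,);    // a one-element tuple: (i32,)
    assert_eq!(parenthesized, 5);
    assert_eq!(one_tuple.0, 5);
}
```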

```rust
field_init ::= ident | field ":" expr ;
```
Note that this grammar permits `{ 0: x, 1: y }`.


@petrochenkov

petrochenkov Nov 2, 2018

Contributor

#1595 was closed, so I'd expect this to be rejected.
This saves us from stuff like "checking that there are no gaps in the positional fields" as well.


@Centril

Centril Nov 2, 2018

Contributor

Yeah probably; I've left an unresolved question about it.


@varkor

varkor Nov 3, 2018

Member
struct A(u8);
let _ = A { 0: 0u8 };

is accepted, so this is currently inconsistent as it stands anyway.


@Centril

Centril Nov 4, 2018

Contributor

@varkor yeah that was accepted to make macro writing easier (and it totally makes it easier...!)

@petrochenkov


Contributor

petrochenkov commented Nov 2, 2018

I think the language should be frozen for large additions like this in general, for a couple of years at least.
This is not a crucial feature, nor the completion of another crucial feature; the motivation seems really insufficient for a change of this scale.

@scottmcm


Member

scottmcm commented Nov 3, 2018

The trait auto-implementation magic worries me, mostly from the point of view of things other than the compiler wanting to implement traits on these types. Today, crates can do a macro-based implementation for the important tuples, and I can easily see a path to variadic generics where the impls could easily support any tuple. But I can imagine serde really wanting to be able to support these, so I wouldn't want to accept them without some sort of plan -- especially as we're getting closer to finally having good stories for our two other structural type families.

I'd also be tempted to block this on changing struct literal syntax to using = -- I really don't like the lookahead requirement here.

Overall, I think I feel that this is cool, but not necessary.

@clarcharr


Contributor

clarcharr commented Nov 3, 2018

I'm very concerned with properties of the struct explicitly relying on lexicographic ordering. I don't think that Ord or PartialOrd should be implemented for structural records-- rather, they should be treated as having unordered fields. If someone wishes to enforce Ord or PartialOrd, I feel that they should be required to either name the struct or use a tuple.

I think that defining lexicographic behaviour for Hash and Debug, however, is completely fine. Hash really only needs to uphold that x == y => hash(x) == hash(y), and any ordering (as long as it's consistent) will satisfy this. Additionally, lexicographic ordering makes the most sense for debugging.

But as a user, I would be surprised to find the below code fail:

let first = { foo: 1, bar: 2 };
let second = { foo: 2, bar: 1};
assert!(first < second);
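The surprise can be contrasted with today's behaviour using a nominal struct: derived `PartialOrd` compares fields in *declaration* order, so the equivalent assert passes. Under the RFC's field-*name* ordering, `bar` would be compared before `foo` and the assert above would fail (a runnable sketch; `S` is an illustrative name):

```rust
// Derived ordering on a nominal struct is lexicographic in declaration
// order, so `foo` is compared first here and the assert passes. The RFC
// proposes name order for records (`bar` before `foo`), which is exactly
// the surprise described above.
#[derive(PartialEq, PartialOrd)]
struct S {
    foo: u8,
    bar: u8,
}

fn main() {
    let first = S { foo: 1, bar: 2 };
    let second = S { foo: 2, bar: 1 };
    assert!(first < second); // decided by foo: 1 < 2
}
```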
@Centril


Contributor

Centril commented Nov 3, 2018

@clarcharr

I've added an unresolved question to be resolved prior to merging (if we do that...) about whether (Partial)Ord should be provided or not. I'm certainly open to your suggestion.

@scottmcm

I'd also be tempted to block this on changing struct literal syntax to using = -- I really don't like the lookahead requirement here.

I would have loved if we had used = instead... but blocking on that seems like blocking on a thing that no one will agree to changing because it would cause churn in more or less every single Rust program/library ever written... ;)

The trait auto-implementation magic worries me, mostly from the point of view of things not the compiler wanting to implement traits on things.

Sure; I entirely agree with that sentiment; it is by far the biggest drawback.

Today crates can do a macro-based implementation for the important tuples, and I can easily see a path to variadic generics where the impls could easily support any tuple. But I can imagine serde really wanting to be able to support these, so I wouldn't want to accept them without some sort of plan -- especially as we're getting closer to finally having good stories for our two other structural type families.

I noted somewhere in the RFC that with the combination of const generics and variadic generics, you may be able to extend this to structural records as well. For example (please note that this is 100% a sketch), using a temporary syntax due to Yato:

impl
    <*(T: serde::Serialize, const F: meta::Field)>
    // quantify a list of pairs with:
    // - a) type variables T, all bound by `serde::Serialize`,
    // - b) const generic variables F standing in for fields of type meta::Field
    //       where `meta` is the crate we reserved and `Field` is a compile time
    //       reflection / polymorphism mechanism for fields.
    serde::Serialize
for
   { *(F: T) }
   // The structural record type; (Yes, you need to extend the type grammar *somehow*..)
{
    // logic...
}

This idea is inspired by Glasgow Haskell's Symbol.

@varkor


Member

varkor commented Nov 3, 2018

While this would address a few pain points (such as defining nested anonymous structs), I tend to agree with @petrochenkov and @scottmcm in that I don't think there's sufficient need here to justify the feature at the moment.


That said, at any rate, I think the unit type equivalent needs to be mentioned here. Is {} a unit type? It stands to reason that it should be, but the proposed grammar doesn't account for it at the moment and I'm not sure if there is extra ambiguity if it is. (It technically already functions like a unit expression, so that's not an issue.)

@mark-i-m


Contributor

mark-i-m commented Nov 4, 2018

Sorry if this has been discussed already... I haven't been able to keep up.

One possible alternative would be to just allow naming the fields of a tuple. The names don't affect the types at all. In every other respect, they are ordinary tuples that we already have.

For example, one might write the following:

fn foo() -> (x: u8, y: u8) { // same as (u8, u8)
  (0, 0)
}

fn main() {
  let baz: (u8, u8) = foo();
  println!("{} {}", baz.x, baz.y);
  println!("{} {}", baz.0, baz.1);
}
@iopq


Contributor

iopq commented Nov 4, 2018

The problem with that is that it's going to be equivalent to (y: u8, x: u8), because those tuples are the same type.

So how does the compiler know which one is x and which one is y? It can't, since those types are equivalent.

@rodrimati1992


rodrimati1992 commented Nov 4, 2018

If this RFC gets postponed you could emulate the polymorphic aspect of this for parameters using custom derive + const generics.

This is how it would look:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2015&gist=85d6c16e6ac3c66c297cbfbcc763a00c

The obvious downsides are that this requires every struct to be annotated with #[derive(Structural)], and that the syntax for accessing a field is not very ergonomic.

@Centril


Contributor

Centril commented Nov 4, 2018

@varkor

That said, at any rate, I think the unit type equivalent needs to be mentioned here. Is {} a unit type? It stands to reason that it should be, but the proposed grammar doesn't account for it at the moment and I'm not sure if there is extra ambiguity if it is. (It technically already functions like a unit expression, so that's not an issue.)

No, {} is not a unit type; it's not legal in the grammar and if we made it legal I think you'd have ambiguity with const generics... sadly. That is, const generics is banking on { X } being a legal expression, but if you also made it a type, then it's ambiguous.
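The expression side of the observation is checkable today: an empty block already evaluates to `()`, which is precisely why also making `{}` a type would stack a new ambiguity on top of the const-generics `{ X }` expression position:

```rust
fn main() {
    // `{}` is already a block expression of type `()` on stable Rust...
    let unit: () = {};
    assert_eq!(unit, ());
    // ...so reusing `{}` as a *type* would make `{ X }` ambiguous between
    // a const-generic expression argument and a type argument.
}
```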

@H2CO3


H2CO3 commented Nov 4, 2018

Agree with @petrochenkov, @scottmcm and @varkor here. While there are several well-thought-out and well-designed parts of this RFC (great job, @Centril!), most of the remaining questions seem hard to resolve nicely (especially those around syntactic and typing ambiguities), and I'd prefer energies to be focussed on more substantial features.

@leonardo-m


leonardo-m commented Nov 4, 2018

Code like this is quite bad: field names are usually longer than just three letters, so this will often lead to overly large signatures:

fn do_stuff_with(color: { red: u8, green: u8, blue: u8 }) {

As we can see, the current situation is inconsistent. While the language provides for unit types and positional product types of both the nominal and structural flavour, the structural variant of structs with named fields is missing while the nominal type exists. A consistent programming language is a beautiful language, but it's not an end in itself. Instead, the main benefit is to reduce surprises for learners.

This is a bogus argument. It's not like searching for symmetries in quantum physics. In language design symmetries can't be used to justify adding a feature. If a feature is useful then it could be added, but not just because a cell in a feature matrix is still empty. So please remove this part from the proposal.

@Centril


Contributor

Centril commented Nov 4, 2018

@leonardo-m


leonardo-m commented Nov 4, 2018

Reducing-complexity-with-uniformity: right, sometimes filling in missing parts reduces the number of things you have to remember and allows you to use the language in a smoother way. In the RFC proposed here, removing the asymmetry increases the language complexity, increases the number of things to know and remember, and increases the compiler complexity, so it's not one of those cases.

@Centril


Contributor

Centril commented Nov 4, 2018

In the RFC proposed here removing the asymmetry increases the language complexity, increases the number of things to know and remember, increases the compiler complexity, so it's not one of those cases.

Reasonable people can disagree here. I personally view it as a "you can infer this from other features in the language / expect it to be there" and this is then backed up by other people actually saying so (i.e. I didn't get this from nowhere...). Also, by complexity we almost never mean compiler implementation complexity but rather the "complexity budget" for users. In any case, this is a minor motivation (that is situated last on purpose and for a reason: because it's the least important argument) and you don't have to agree with everything said in it.

From text/0000-structural-records.md:
Indeed, we would much prefer to use a less magical approach,
but providing these traits without compiler magic would require
significantly more complexity such as polymorphism over field names


@oli-obk

oli-obk Nov 6, 2018

Contributor

I think you underestimate the compiler complexity needed to support this as written in the RFC. Additionally this will cause not insignificant code bloat for all the impls required.

I think the most reasonable way to support this RFC, plus loads of default trait impls, plus custom trait impls for users is to desugar these types (and their field accesses) to tuples (and tuple field accesses).

From the user perspective, this means that if you implement a trait for (T, U), then it is also available for {bar: T, foo: U} assuming that foo is sorted after bar. It is left to the one who implements a trait for tuples to ensure that the impls are order independent.

From the compiler perspective, we'd create a new TyKind:

    /// The second field is equivalent to the field for the `Tuple` variant
    /// The first field is a list of names for the tuple's fields
    StructuralRecord(&[Name], &[Ty<'tcx>]),

(Or we just add this first field to Tuple in general, filling it in with ["0", "1", ...] for normal tuples, and do everything below for the Tuple variant.)

At this point we guarantee that the memory layout of structural records is equivalent to that of their corresponding tuple. This means the compiler may transmute freely between the two. So calling <{bar: u32, foo: String} as PartialEq>::eq will actually invoke <(u32, String) as PartialEq>::eq by effectively transmuting each argument (&self and other: &{bar: u32, foo: String}) to &(u32, String).

Field accesses via .foo or .bar can be lowered to .1 and .0 respectively after typechecking.

The only downside I see with this is that <{bar: u32, foo: String} as Debug>::fmt will print a tuple and not the fields, and that <{bar: u32, foo: String} as PartialEq<(u32, String)>> actually typechecks.

Also for serde, all structural records would be serialized as tuples, losing the advantage of named fields in human readable serialization formats.


@burdges

burdges Nov 6, 2018

I'd agree the code works exactly like tuples, and code deduplicating rocks, but we need structural records to be different types from tuples if we want this to serve much purpose.


@oli-obk

oli-obk Nov 6, 2018

Contributor

With this scheme they'd be different types up to the point of trait resolution. So you can't pass a (u32, String) where a {bar: u32, foo: String} is expected and vice versa. But when you use trait methods, then they are the same type.


@eddyb

eddyb Nov 6, 2018

Member

I'm not sure this is a good idea. Things like comparisons have lexicographic ordering, which would observe an order we don't want to exist.

Also, implementing something like field-name/row polymorphism isn't as hard as coming up with a syntax for it.


@eddyb

eddyb Nov 6, 2018

Member

Note that Copy & Clone are already automatically generated by rustc for tuples and closures bypassing derive, so I don't think extending anything that works for closures to records is fundamentally problematic in any way.

@Centril basically, feel free to rely on #2132 and #2133 as precedent.
However, note that Default is not included, so you'd have to probably make an RFC to add compiler-provided Default to tuples, if not closures.


@oli-obk

oli-obk Nov 6, 2018

Contributor

I think the most reasonable way to support this RFC

That statement did not mean that I thought it was a reasonable way.

Note that Copy & Clone are already automatically generated by rustc for tuples and closures bypassing derive, so I don't think extending anything that works for closures to records is fundamentally problematic in any way.

This does not address the serde use case at all. I don't think it's reasonable to have anonymous structs without an ability to generate trait impls for them.


@eddyb

eddyb Nov 6, 2018

Member

I agree about serde. I don't think there's a clean solution without #2584 (comment).

At this point, const generics and/or VG are required for experimenting with any kind of anonymous record scheme, that can actually handle serde, if we don't want to outright add subtyping(-like) constructs to the language.


@Centril

Centril Nov 7, 2018

Contributor

@oli-obk

I think you underestimate the compiler complexity needed to support this as written in the RFC. Additionally this will cause not insignificant code bloat for all the impls required.

I don't think I've estimated the compiler complexity at all; I've provided a specification of the behavior, not of exactly how it would be implemented. I fully expect some complexity to come from this. There is an infinite number of possible implementations, so they will need to be provided lazily.
If de-duplication is the concern, then some sort of de-duplication of impls could ostensibly be provided where the fields are of no concern.

From the user perspective, this means that if you implement a trait for (T, U), then it is also available for {bar: T, foo: U} assuming that foo is sorted after bar. It is left to the one who implements a trait for tuples to ensure that the impls are order independent.

This would make that behavior observable in the type system such that (T, U) and {bar: T, foo: U} are effectively the same type.

At this point we guarantee that the memory layout of structural records is equivalent to that of their corresponding tuple.

The RFC already states that this is the case (in the section on dynamic semantics) but equivalent memory layout is not a sufficient condition to permit free transmutation. I think this would need to be stated separately that the compiler may freely transmute them.

The only downside I see with this is that <{bar: u32, foo: String} as Debug>::fmt will print a tuple and not the fields and that <{bar: u32, foo: String} as PartialEq<(u32, String)> actually typechecks.

This is quite the drawback and quite surprising... It would also mean that implementations are only provided for records of size up to 12 (pending VG). I think the behavior wrt. PartialEq is fine as an implementation strategy as long as it is not observable.

Also for serde, all structural records would be serialized as tuples, losing the advantage of named fields in human readable serialization formats.

Also not a great drawback.

@eddyb

At this point, const generics and/or VG are required for experimenting with any kind of anonymous record scheme, that can actually handle serde, if we don't want to outright add subtyping(-like) constructs to the language.

I don't think we want to add subtyping-like constructs (row polymorphism is not sub-typing so it doesn't count).

With respect to const generics and VG, I already noted this in #2584 (comment). This is not unheard of... GHC allows you to sort of do this, so that structural records can be provided as a library solution.

rfc/structural-records: fix typo.
Co-Authored-By: Centril <twingoow@gmail.com>
@eddyb


Member

eddyb commented Nov 6, 2018

I love records, but I'm not sure they work well in a language without subtyping that allows combining {a:T} and {b:U} in one way to get {a:T, b:U} and in another to get {a:T} | {b:U}.

Probably the best example of what I think would work great, is TinyBang & friends's "onions".
A more mainstream example is TypeScript, although I don't know how sound it is.

@bestouff

Contributor

bestouff commented Nov 7, 2018

One great aspect of this is it's way easier to create substructures (the RectangleTidy example) which is sometimes needed to be able to borrow different parts of a struct separately.

@iopq

Contributor

iopq commented Nov 7, 2018

I wouldn't expect {x: u8, y: u8, z: u8} to implement the same traits as {r: u8, g: u8, b: u8}

Just because something is hard to do right, doesn't mean it's reasonable to do it wrong

For example, if `A` and `B` are `Send`, then so is `{ x: A, y: B }`.
A structural record is `Sized` if all field types are.
If the lexicographically last field of a structural record is `!Sized`,

@fbenkstein

fbenkstein Nov 7, 2018

I feel like forcing a particular name is unnecessarily restrictive. Couldn't the unsized field just be reordered to be last? What's the imagined use case even for allowing unsized fields, or is it just for completeness? Wouldn't it be more parsimonious to disallow them initially?

@Laaas

Laaas Nov 7, 2018

I think disallowing unsized fields would be a shame, since they are allowed in tuples. Although even in tuples they are awkward to use, since you must place them last, and if they are not, you get an odd error about the field not being sized when you try to use it.

@eddyb

eddyb Nov 7, 2018

Member

We kind of want to allow any tuple field to be unsized, FWIW.
The current limitation is almost entirely an implementation limitation.

@Centril

Centril Nov 7, 2018

Contributor

I feel like forcing a particular name by in unnecessarily restrictive. Couldn't the unsized field just be reordered to be last?

I don't think so, no; the type system needs to see these in lexicographic ordering in all respects so that type identity can be determined by sorting fields lexicographically. Otherwise I think the well-formedness requirements would become difficult to reason about. This is also how Hash, Debug, and the behavior wrt. coherence are meant to work.

What's the imagined use case even for allowing unsized fields or is it just for completeness? Wouldn't it be more parsimonious to disallow them initially?

Purely for consistency with tuples. I want to achieve parity with them.

We kind of want to allow any tuple field to be unsized, FWIW.
The current limitation is almost entirely an implementation limitation.

If and when we do that then we can also lift the same restrictions for structural records but I think it would be premature to do it only for structural records.

@Nemo157

Nemo157 Nov 7, 2018

Contributor

The type system needs to see these in lexicographic ordering in all respects so that type identity can be determined by sorting fields lexicographically.

This doesn't seem to require lexicographic ordering specifically. As long as there is a canonical order that can be determined from the type definition, ignoring the syntactic order of field declarations, that would work. The simplest change from what's specified could be lexicographic ordering, except that a singular unsized field is moved to the end.

@eddyb

eddyb Nov 7, 2018

Member

@Nemo157 is correct here, the problem already exists with trait objects, that are e.g. dyn X + Y + Z, and they're equal no matter the order they were written in.

I'd say disallow unsized records until we can support all fields being unsized.

@Centril

Centril Nov 7, 2018

Contributor

At its simplest change from what's specified it could be something like lexicographic ordering, except a singular unsized field is moved to the end.

I suppose; just to make sure... how would this handle polymorphic cases where we have a T: ?Sized...? It might be Sized at instantiation but we don't know that yet...
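One relevant observation here is that today's rustc already decouples declaration order from memory layout for default (`repr(Rust)`) types, so a canonical field order used for type identity need not constrain layout. A small sketch demonstrating this with nominal structs:

```rust
use std::mem::size_of;

// Default repr: the compiler is free to reorder fields (in practice it
// places `b` first, packing the two u8s after it).
struct Reordered {
    a: u8,
    b: u32,
    c: u8,
}

// repr(C) fixes the declaration order: 1 + 3 (pad) + 4 + 1 + 3 (pad) = 12.
#[repr(C)]
struct Declared {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    assert_eq!(size_of::<Declared>(), 12);
    // The reorderable layout can never be larger than the fixed one;
    // with current rustc it is 8 bytes, though that exact size is not
    // guaranteed by the language.
    assert!(size_of::<Reordered>() <= size_of::<Declared>());
}
```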

+ For `PartialEq`, each field is compared with the same field in `other: Self`.
+ For `PartialOrd` and `Ord`, lexicographic ordering is used based on

@fbenkstein

fbenkstein Nov 7, 2018

Is it necessary to commit to lexicographic ordering? Maybe it would be better to say there's some ordering based on the (name, type) tuples, telling the users they shouldn't rely on any particular ordering? That way rustc could decide, e.g. to compare integer fields before str fields as an optimization.

@eddyb

eddyb Nov 7, 2018

Member

I don't think records should implement PartialOrd and Ord, because of this.

@Centril

Centril Nov 7, 2018

Contributor

That way rustc could decide, e.g. to compare integer fields before str fields as an optimization.

That would be semantically surprising imo since it affects observable behavior and thus it isn't really an optimization; if we implement PartialOrd and Ord then it should be done lexicographically.
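For comparison, derived `Ord` on nominal structs already commits to comparing fields in declaration order; the observable behavior is part of the contract, which is the same reason a record impl would have to commit to lexicographic field-name order. A small example:

```rust
// Derived comparison proceeds field by field in declaration order:
// `major` is compared first, so (1, 0) > (0, 9).
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u8,
    minor: u8,
}

fn main() {
    assert!(Version { major: 1, minor: 0 } > Version { major: 0, minor: 9 });
    assert!(Version { major: 0, minor: 1 } < Version { major: 0, minor: 2 });
}
```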

@tikue

tikue commented Nov 8, 2018

Would Rust ever consider structural enums? If so, there'd be consistency in reusing the struct and enum keywords in the types:

let a: enum { A(i32), B(String) } = A(1);
let a: struct { a: i32, b: String } = { a: 1, b: "".into() };

Edit: then you could also have a unit type, struct {}. Actually, I guess you still wouldn't be able to construct it unless the initialization syntax were also struct { }.

type Bigger = { foo: u8, bar: bool, baz: char };
let b: Bigger = { foo: 1, bar: true, baz: 'a' };
let s: Smaller = b; // OK! `b` has all the fields `Smaller` needs.

@sighoya

sighoya Nov 8, 2018

This causes similar problems to subtyping or implicit conversion in general. A more explicit solution is recommended, for instance with the as operator:
let s = b as Smaller;

The same with nominal<->structural conversions.

@Centril

Centril Nov 8, 2018

Contributor

Not exactly; implicit conversions have problems (mainly in terms of giving type inference too many options to pick from sometimes as well as not being as clear about semantics), but they are nothing like subtyping which can affect and constrain memory layout / representation.

The main difference between subtyping and implicit coercion is that the latter only affects top level constructions while the former (when covariance occurs) makes it such that:

Vec<{ alpha: T, }> <: Vec<{ alpha: T, beta: U }>

For a systems language this would be quite problematic.

That said, because implicit conversions do have problems, I have left it out of the main proposal at the moment.
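In today's Rust, the explicit "width change" between two record-like types can be spelled out with a `From` impl between nominal stand-ins (the names `Bigger`/`Smaller` follow the example above; the impl itself is a sketch, not part of the proposal):

```rust
struct Bigger {
    foo: u8,
    bar: bool,
    baz: char,
}

struct Smaller {
    foo: u8,
    bar: bool,
}

impl From<Bigger> for Smaller {
    fn from(b: Bigger) -> Self {
        // `baz` is dropped explicitly; nothing happens implicitly.
        Smaller { foo: b.foo, bar: b.bar }
    }
}

fn main() {
    let b = Bigger { foo: 1, bar: true, baz: 'a' };
    let s: Smaller = b.into();
    assert_eq!(s.foo, 1);
    assert!(s.bar);
}
```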

@jswrenn

jswrenn commented Nov 8, 2018

This RFC is really well put together @Centril!

I am very much in favor of this proposal! I prototyped Pyret's initial language-level support for "data science" and came away from the experience with an enormous appreciation for what can be achieved with a combination of just anonymous record types, FRU, and the three basic higher-order functions, map, filter and fold. For a while, we basically had LINQ implemented as very simple syntactic sugar! (I tried to implement the same sugar in Rust using macros, and was immediately stymied by the lack of anonymous record types.)

In the two years since then, I've done virtually all of the data analysis work for my research in Rust. Serde makes it a breeze to parse enormous JSON datafiles into data structures with meaningful types, and Rust's rich iterator support (along with Itertools) lets me confidently and performantly implement transformations of that data. However, the inability to ergonomically create ad-hoc types at intermediate stages of those transformations significantly increases the complexity of my codebase, reduces readability, and hinders Rust's potential as a language for safe, performant and ergonomic data processing. I am very invested in this proposal, and I basically support everything about it.

However, I'd really like to see row polymorphism discussed as a Future Possibility. I don't think it's just a nicety; I think it's actually pretty fundamental for using anonymous record types because without it, you can't write abstractions.


A few previous comments have mentioned row polymorphism, but nobody's given a concrete example of what it lets you express, and I'm worried passing readers of this issue might assume it's weird type theory nonsense. Row polymorphism lets you express that a record has fields x, y, z that you care about and are going to manipulate, plus possibly some other fields and values ρ with types Α. For instance:

// Transform a stream of things with euclidian coordinates into a stream of
// things with polar coordinates.
//
// `Ξ‘` denotes "some collection of additional fields and types"
//
// this type signature conveys that the _only_ thing altered between the
// input and output types is the `pos` field.
fn convert<Ξ‘>(input: impl IntoIterator<Item={pos: EuclidianCoordinate, ..Ξ‘}>)
                      -> impl Iterator<Item={pos: PolarCoordinate,     ..Ξ‘}>
{
    input
        .into_iter()
        .map(| {pos, ..ρ} : {pos: EuclidianCoordinate, ..Α} |
            {pos: pos.into_polar_coordinate(), ..ρ})
}
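The row-polymorphic `convert` above can be roughly approximated in today's Rust by turning "the rest of the row" Ξ‘ into an ordinary type parameter carried alongside the field being rewritten. All names in this sketch are invented; it loses the flat-record ergonomics but keeps the key property that only `pos` changes:

```rust
struct Euclidean {
    x: f64,
    y: f64,
}

struct Polar {
    r: f64,
    theta: f64,
}

// `Rest` plays the role of the row variable Ξ‘: whatever extra data each
// item carries is passed through untouched.
struct WithPos<P, Rest> {
    pos: P,
    rest: Rest,
}

fn convert<Rest>(
    input: impl IntoIterator<Item = WithPos<Euclidean, Rest>>,
) -> impl Iterator<Item = WithPos<Polar, Rest>> {
    input.into_iter().map(|item| WithPos {
        pos: Polar {
            r: item.pos.x.hypot(item.pos.y),
            theta: item.pos.y.atan2(item.pos.x),
        },
        rest: item.rest,
    })
}

fn main() {
    let out: Vec<_> = convert(vec![WithPos {
        pos: Euclidean { x: 3.0, y: 4.0 },
        rest: "label",
    }])
    .collect();
    assert!((out[0].pos.r - 5.0).abs() < 1e-9);
    assert_eq!(out[0].rest, "label");
}
```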
@ssokolow

ssokolow commented Nov 8, 2018

Might it be a good idea to more explicitly position this as being to structs as closures are to functions?

I was concerned about this at first, because of the potential for abuse, but realizing that closures present analogous risks and benefits really helped me to get on the same page.

@iopq

Contributor

iopq commented Nov 9, 2018

@ssokolow I see the resemblance, but what exact parallels can you draw? It's actually easy to name the type of a record, as opposed to a closure.

@ssokolow

ssokolow commented Nov 9, 2018

I was actually referring more to how they represent a similar trade-off:

Functions serve as a boundary to type inference, to aid in maintaining a stable API and ABI, while closures provide a means of declaring things inline and allow a controlled exception to the type inference requirement, making constructs like .filter(|x| ...) more useful when full-blown functions would just be a lot of extra boilerplate at best.

Traditional structs have a clear definition with their own identity, which serves to aid in using the type system to communicate a stable interface to others working on the codebase and enforce it, while structural records provide a means of declaring things inline and allow a controlled exception to the notion of identity beyond mere structure, making the various constructs listed in the RFC more useful when full-blown structs would just be a lot of extra boilerplate at best.

@iopq

Contributor

iopq commented Nov 9, 2018

If anything, the example by @jswrenn is the most convincing. Looks like something that's not convenient to do right now.

I can also read and understand exactly what that code is doing, it seems like a natural extension of other Rust features

@scottmcm

Member

scottmcm commented Nov 9, 2018

@ssokolow I think there's a few differences between closures and what's proposed here, though. For one, closures are anonymous, not structural -- your || {} isn't the same type as my || {}, but this proposal has your {a:i32,} the same as my {a:i32,}. Also, you only interact with the closure through traits, whereas this proposal is all about interacting directly.

The last few comments here make me think more about the fields-in-traits proposal -- with closures you have impl Fn() + Clone; here you could hypothetically have some impl Fields{a:i32} + Clone. For prior art, see anonymous types in C#, where new { A = 4 } gives you an instance of a type you cannot name, but which has a property named A. As such, it's mostly used for inside-a-method stuff, particularly in conjunction with iterator adapters -- exactly that scenario would also work great for Rust, and in fact even better given our type inference that means we need to name things less often anyway.
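The "interact only through traits" analogy can be sketched in today's Rust with a getter trait playing the role of the hypothetical `Fields { a: i32 }` bound (all names here are invented for illustration):

```rust
// A getter trait standing in for the hypothetical `Fields { a: i32 }`.
trait HasA {
    fn a(&self) -> i32;
}

#[derive(Clone)]
struct Point {
    a: i32,
    b: i32,
}

impl HasA for Point {
    fn a(&self) -> i32 {
        self.a
    }
}

// Usable the way `impl Fields{a:i32} + Clone` would be: the caller only
// knows the value has an `a` field and is cloneable.
fn show(v: impl HasA + Clone) -> i32 {
    v.clone().a()
}

fn main() {
    assert_eq!(show(Point { a: 4, b: 0 }), 4);
}
```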

@bbatha

bbatha commented Nov 9, 2018

The last few comments here make me think more about the fields-in-traits proposal -- with closures you have impl Fn() + Clone; here you could hypothetically have some impl Fields{a:i32} + Clone

This also plays nicely with defaults for optional and default parameters. For instance fn foo(bar: { a: i32 = 3, b: u32 }) can desugar into something like impl Field{b: u32} + DefaultField{a: i32, fn default() -> i32 { 3 }}
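The default-parameter idea can be approximated today with a derived or hand-written `Default` plus functional record update, giving call sites close to the proposed `foo({ b: 7, .. })` shape (the `FooArgs` name and the desugaring are illustrative, not part of the RFC):

```rust
struct FooArgs {
    a: i32,
    b: u32,
}

impl Default for FooArgs {
    fn default() -> Self {
        // `a` defaults to 3, mirroring `a: i32 = 3` in the sketch above.
        FooArgs { a: 3, b: 0 }
    }
}

fn foo(args: FooArgs) -> i64 {
    args.a as i64 + args.b as i64
}

fn main() {
    // Only `b` is supplied; `a` falls back to its default of 3.
    assert_eq!(foo(FooArgs { b: 7, ..Default::default() }), 10);
}
```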

@ssokolow

ssokolow commented Nov 9, 2018

@scottmcm I never intended to say that they are similar on a technical level.

My intent was to more effectively communicate the value and purpose of having structural records in addition to structs by drawing an analogy to when and why people use (and don't use) closures with only explicit arguments when we already have functions.

(ie. As an alternative or complement to throwing a list of example use cases at people.)

@Centril Centril added the data-types label Nov 10, 2018
