Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spec: document cycle restrictions for alias type declarations #25187

Open
griesemer opened this issue Apr 30, 2018 · 7 comments

Comments

@griesemer
Copy link
Contributor

commented Apr 30, 2018

Per the original type-alias design doc from @rsc, it must be possible to "expand out" type alias declarations; this wouldn't be possible for type T = *T without expanding endlessly.

The spec doesn't say anything in this regard; and is generally vague or silent on the subject of cycles.

@mdempsky

This comment has been minimized.

Copy link
Member

commented Apr 30, 2018

Terminology: I use "type alias cycles" to refer to things like type T = *T and "interface cycles" to refer to things like type U interface { F() interface { U } }.

I see three options here:

  1. Revise the spec to disallow type alias cycles, but continue allowing interface cycles.
  2. Revise the spec to disallow both type alias and interface cycles.
  3. Decide to allow both, and leave the spec as-is.

I think with the type alias proposal we had decided on option 1 (and subsequently mostly implemented it), but I'm thinking now it was done without considering the larger scope. As far as I can tell, the problem that type alias cycles introduce shouldn't be any fundamentally trickier than interface cycles. (E.g., we already have types that cannot be expanded out: https://play.golang.org/p/8KQfNRbGwRV)

For consistency, I'd be inclined to favor options 2 or 3 over option 1.

Option 2 is definitely the simplest (other than how to explain it in the spec), but may risk breaking backwards compatibility.

I expect option 3 is necessary if we're interested in accepting #8082.

--

I think we could perhaps pursue option 3 by defining a type serialization format that avoids needing to "expand out" the types. For example, if we use $N to denote anonymous cycles (where N measures how many steps "outward" in the type literal it refers), then a few examples:

  • type T = *T can be described as type T = *$1
  • type U interface { F() interface { U } } as type U interface { F() $1 } (or maybe $2 if we count the method signature as a separate type).
  • type V = map[V][]V can be described as type V = map[$1][]$2

We'd need to make sure there's a canonical mapping though. For example, []V could be written either []map[$1]$2 or []map[$1][]$2, but it has to always be one or the other. (I'd lean towards the former; i.e., always use the shortest type descriptor.)

@griesemer

This comment has been minimized.

Copy link
Contributor Author

commented May 2, 2018

@mdempsky Since such cycles can only be created via names in the first place, why not keep the names? Specifically, that might mean that in the compiler we would have to keep an Alias type node around (and always go to the aliased type before we use it). But it would also allow better error messages (that may refer to the alias name used in the error scenario, if any).

@mdempsky

This comment has been minimized.

Copy link
Member

commented May 3, 2018

@griesemer I think keeping extra type information around at compile-time is fine, but I'm concerned about what we do at link-time and run-time.

There are a lot of ways that we depend on identical types mapping to the same runtime type descriptor. For example, type assertions, interface comparisons, or comparing reflect.Type values with ==.

So if we have type t1 = *t1 and type t2 = *t2 in separate packages, they need to reuse the same runtime type descriptor. What should runtime.Type.String() return for this shared runtime type descriptor?

We could arbitrarily pick one, but that doesn't seem any better to me than creating an artificial type descriptor using synthetic placeholders like $1, $2, ....

Alternatively, we could possibly modify the runtime code to not depend on a correspondence between type identity and type descriptor uniqueness, but that seems invasive and likely to (mildly, but negatively) impact even Go programs that don't use anonymous cycles. I'm also not even certain we could keep == working on reflect.Type if we get rid of type descriptor uniqueness.

@griesemer

This comment has been minimized.

Copy link
Contributor Author

commented May 3, 2018

@mdempsky Good point. A numbering scheme seems like a better choice. Instead of counting "backwards" (or outwards), one could also just number the types that are referenced from left to right (when written out); that might be a bit easier to understand. So T = *T would be *$1 (as before); V = map[V][]V would be map[$1][]$1. This would have the advantage that a given number always refers to the same type in a given type string. And T = *T; M = map[T]M would be *$1 for T as before, and map[*$2]$1 for M.

@griesemer

This comment has been minimized.

Copy link
Contributor Author

commented May 3, 2018

On the other hand, the inside out numbering may allow simpler composition: An outer type (M) in my last example above would start its numbering with the max + 1 number of the inner types (loosely speaking).

@mdempsky

This comment has been minimized.

Copy link
Member

commented May 4, 2018

I see 4 different possible encoding schemes. For each one, I'll use map[T][]T (where type T = *T) as the example for encoding.

  1. map[*$1][]*$1. This is the "outward" scheme I suggested above.

  2. map[*$2][]$2. This is simple left-to-right numbering of types, referring back to the first occurrence of an identical type. (I think this is what you suggested.)

  3. map[*$2][]*$4. This is 2, but disallowing references except to break cycles.

  4. map[*$2][]*$3. This is 3, but resetting the numbering each time we 'pop' type context. (Alternatively, think of this as 1, but counting inward from the root type, rather than outward from the cycle.)

I think all of these should work, so it seems like a matter of preference on which to use.

I'd argue we shouldn't use 2, because it would be inconsistent with how we handle non-cyclic DAG types. For example, I don't imagine we'd start encoding map[*int][]*int as map[*int][]$2 instead. This makes it trickier (but not impossible) to implement correctly.

I'd also argue we shouldn't use 3, because it makes it too easy for people to accidentally implement 2 instead, with it working most of the time, but not always. (Again, trickier/subtler, but not impossible.)

And lastly, I'd argue that 1 is nicer than 4 because it has the property that each time T occurred, 1 uses the same description (*$1), whereas 4 used two different descriptions (*$2 and *$3).

@bcmills

This comment has been minimized.

Copy link
Member

commented May 8, 2018

The “inside out” encodings are similar to de Bruijn indices, which similarly keep terms local and composable. (I'm not sure whether they correspond to (1) or (4) above, but it doesn't really matter.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.