Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spec: generics: use type sets to remove type keyword in constraints #45346

Closed
ianlancetaylor opened this issue Apr 1, 2021 · 365 comments
Closed

spec: generics: use type sets to remove type keyword in constraints #45346

ianlancetaylor opened this issue Apr 1, 2021 · 365 comments

Comments

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Apr 1, 2021

We propose clarifications for the semantics of constraint satisfaction in the generics proposal. We also propose changing the syntax of type lists to remove the type keyword and to explicitly specify when type arguments should match on underlying types.

The changes this would make to the current generics proposal document can be seen in https://golang.org/cl/306689.

Background

The current generics proposal proposes a new syntax for type lists within interfaces. A type list within an interface is the keyword type followed by a list of types separated by commas. Type lists are only permitted in interface types that are used as type constraints. For example:

// SignedInteger is a type constraint that permits any
// signed integer type.
type SignedInteger interface {
	type int, int8, int16, int32, int64
}

A type argument matches a constraint with a type list if

  1. The type argument implements the interface ignoring the type list, and
  2. either the type argument or its underlying type is identical to one of the types in the type list.

This rule was adopted in part to support permitting type lists in ordinary interface types, not only in constraints. However, discussion has made clear that the rule is too subtle. This suggests that it is too subtle not just for use in ordinary interface types, but also for use in constraints.

The behavior when embedding interfaces with type lists is also subtle.

We can do better.

Type sets

We start by defining a type set for all types. We will define what it means for a type to implement an interface in terms of type sets, resulting in a behavior that is equivalent to the current definition based on method sets.

Every type has an associated type set. The type set of an ordinary non-interface type T is simply the set {T} which contains just T itself. The type set of an interface type (in this section we only discuss ordinary interface types, without type lists) is the set of all types that declare all the methods of the interface.

Note that the type set of an interface type is an infinite set. For any given type T and interface type IT it's easy to tell whether T is in the type set of IT (by checking whether all methods of IT are declared by T), but there is no reasonable way to enumerate all the types in the type set of IT. The type IT is a member of its own type set because an interface inherently declares all of its own methods. The type set of the empty interface interface{} is the set of all possible types.

With this idea of type sets, we can restate what it means for a type T to implement an interface type IT: T implements IT if T is a member of the type set of IT. Since the type set of IT is the set of all types that declare all the methods of the interface, T is a member of the type set of IT if and only if the method set of T is a (possibly improper) superset of the method set of IT, which is the standard definition of implementing an interface.

Now let's consider embedded interfaces. For a case like type O1 interface{ E }, the type set of O1 is the same as the type set of E. The case type O2 interface{ E1; E2 } is more interesting: the type set of O2 is the intersection of the type sets of E1 and E2. To see this, observe that the type set of E1 is the set of all types that implement all the methods of E1, and similarly for E2. What we want for the type set of O2 is the set of all types that implement all the methods of O2. The methods of O2 are all of the methods of E1 combined with all of the methods of E2. The set of types that implement all the methods of both E1 and E2 is the intersection of the type sets of E1 and E2.

Note that listing a method in an interface type definition in the usual way is, from a type set perspective, indistinguishable from embedding an interface that declares just that method. Although a method by itself is not a type, for our purposes we can say that the type set for a method listed explicitly in an interface type definition is exactly the type set of an interface type with only that method: the set of all types that implement that method. The advantage of doing this is that we can now say that the type set of an interface type is exactly the intersection of the type sets of each element listed in the interface.

We've now described type sets, and we've explained the meaning of implementing an interface in terms of type sets. None of this changes the language in any way, but it serves as background and motivation for the next steps.

Proposal

We propose to replace type lists as defined by the generics proposal with three new, simpler, ideas.

An interface type that is used as a constraint, or that is embedded in a constraint, is permitted to embed some additional constructs that we will call interface elements. An interface element can be:

  1. Any type, not just an interface type.
  2. A new syntactic construct called an approximation element.
  3. A new syntactic construct called a union element.

With these new elements we will be able to state simply that a type argument A satisfies a constraint C exactly when A implements the interface type C, or, in terms of type sets, exactly when A is a member of the type set of C.

First, we propose that an interface type used as a constraint is permitted to embed a non-interface type. For example: type Integer interface{ int }. As discussed in the previous section, the type set of an interface type is the intersection of the type sets of the elements of the interface. The type set of int is simply {int}. This means that the type set of Integer is also {int}.
This constraint can be satisfied by any type that is a member of the set {int}. There is exactly one such type: int.

Of course, that is useless by itself. For constraint satisfaction, we want to be able to say not just int, but "any type whose underlying type is int." To implement this, we propose a new syntactic construct, which may be embedded in an interface type used as a constraint. This is an approximation element, written as ~T. The type set of an approximation ~T is the set of all types whose underlying type is T. An approximation ~T is only valid if the underlying type of T is itself T; this is discussed in more detail below.

For example: type AnyInt interface{ ~int }. The type set of ~int, and therefore the type set of AnyInt, is the set of all types whose underlying type is int. For example, if MyInt is defined as type MyInt int, then MyInt used as a type argument will satisfy the constraint AnyInt.

The final step is another new syntactic construct that may be embedded in an interface type used as a constraint: a union element. A union element is written as a sequence of types or approximation elements separated by vertical bars (|). For example: int | float32 or ~int8 | ~int16 | ~int32 | ~int64. The type set of a union element is the union of the type sets of each element in the sequence. The types and elements listed in a union must all be different: no two types may be identical, and no two approximation elements ~T1 and ~T2 may have T1 identical to T2. For example:

type PredeclaredSignedInteger interface {
	int | int8 | int16 | int32 | int64
}

The type set of this union element is the set {int, int8, int16, int32, int64}. Since the union is the only element of PredeclaredSignedInteger, that is also the type set of PredeclaredSignedInteger. This constraint can be satisfied by any of those five types.

Here is an example using approximation elements:

type SignedInteger interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64
}

The type set of this constraint is the set of all types whose underlying type is one of int, int8, int16, int32, or int64.
Any of those types will satisfy this constraint. This is the equivalent of the notation used in the generics proposal

interface {
	type int, int8, int16, int32, int64
}

The use of explicit approximation elements clarifies when we are matching on underlying types, the use of | instead of , emphasizes that this is a union of elements, and the type keyword can be omitted by permitting constraints to embed non-interface elements.

The purpose of introducing type lists in the generics proposal was to specify the operations available to type parameters in parameterized functions. This is easy to define based on the idea of type sets. Given a type parameter P with a constraint C, a parameterized function is permitted to use an operation with a value of type P if the operation is permitted for every member of the type set of C.

That is the complete proposal: a conceptual change to use type sets, and three new syntax changes. We will now mention some details and ramifications.

Approximation elements

The new ~T syntax will be the first use of ~ as a token in Go.

Since ~T means the set of all types whose underlying type is T, it will be an error to use ~T with a type T whose underlying type is not itself. Types whose underlying types are themselves are:

  1. Type literals, such as []byte or struct{ f int }.
  2. Predeclared types, such as int or string.

We do not permit ~P where P is a type parameter.

The type set of ~T is an infinite set of types.

The ~ will bind more tightly than |.
~T1 | T2 means (~T1) | (T2), not ~(T1 | T2) (note that ~(T1 | T2) is not syntactically valid)..

The new syntax is

InterfaceType = "interface" "{" { ( MethodSpec | InterfaceTypeName | ConstraintElem ) ";" } "}" .
ConstraintElem = ConstraintTerm { "|" ConstraintTerm } .
ConstraintTerm = [ "~" ] Type .

Embedding constraints

A constraint can embed another constraint. Union elements can include constraints.

// Signed is a constraint whose type set is any signed integer type.
type Signed interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64
}

// Unsigned is a constraint whose type set is any unsigned integer type.
type Unsigned interface {
	~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr
}

// Float is a constraint whose type set is any floating point type.
type Float interface {
	~float32 | ~float64
}

// Ordered is a constraint whose type set is any ordered type.
// That is, any type that supports the < operator.
type Ordered interface {
	Signed | Unsigned | Float | ~string
}

Interface types in union constraint elements

The type set of a union element is the union of the type sets of all elements in the union. For most types T the type set of T is simply T itself. For interface types (and approximation elements), however, this is not the case.

The type set of an interface type that does not embed a non-interface element is the set of all types that implement the interface, including the interface type itself. Using such an interface type in a union element will add that type set to the union. For example:

type Stringish interface {
	string | fmt.Stringer
}

The type set of Stringish will be the type string and all types that implement fmt.Stringer. Any of those types (including fmt.Stringer itself) will be permitted as a type argument for this constraint. No operations will be permitted for a value of a type parameter that uses Stringish as a constraint (other than operations supported by all types). This is because fmt.Stringer is in the type set of Stringish, and fmt.Stringer, an interface type, does not support any type-specific operations. The operations permitted by Stringish are those operations supported by all the types in the type set, including fmt.Stringer, so in this case there are no operations other than those supported by all types. A parameterized function that uses this constraint will have to use type assertions or reflection in order to use the values. Still, this may be useful in some cases for stronger static type checking. The main point is that it follows directly from the definition of type sets and constraint satisfaction.

Combining embedded non-interfaces with methods

A constraint can embed a constraint element and also list methods.

type StringableSignedInteger interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64
	String() string
}

The rules for type sets define what this means. The type set of the union element is the set of all types whose underlying type is one of the predeclared signed integer types. The type set of String() string is the set of all types that declare that method. The type set of StringableSignedInteger is the intersection of those two type sets. The result is the set of all types whose underlying type is one of the predeclared signed integer types and that declare the method String() string. A function that uses a parameterized type P that uses StringableSignedInteger as a constraint may use the operations permitted for any integer type (+, *, and so forth) on a value of type P. It may also call the String method on a value of type P to get back a string.

Empty type sets

It is possible to write a constraint with an empty type set. There is no type argument that will satisfy such a constraint. The compiler should give an error whenever it detects such an unsatisfiable constraint. However, in general a compiler may not be able to detect all such cases. It is not feasible to detect all such cases, though they can't be used with any type argument. It may be appropriate to have vet give an error for cases that it can detect.

// Unsatisfiable is an unsatisfiable constraint with an empty type set.
// No predeclared types have any methods.
// If this used ~int | ~float32 the type set would not be empty.
type Unsatisfiable interface {
	int | float32
	String() string
}

Method sets of constraint elements

Much as the type set of an interface type is the intersection of the type sets of the elements of the interface, the method set of an interface type can be defined as the union of the method sets of the elements of the interface. In most cases, an embedded element will have no methods, and as such will not contribute any methods to the interface type. That said, for completeness, we'll note that the method set of ~T is the method set of T. The method set of a union element is the intersection of the method sets of the elements of the union. These rules are implied by the definition of type sets, but they are not needed for understanding the behavior of constraints.

Possible future step: permitting constraints as ordinary interface types

We have proposed that constraints can embed some additional elements. With this proposal, any interface type that embeds anything other than an interface type can only be used as a constraint or as an embedded element in another constraint. A natural next step would be to permit using interface types that embed any type, or that embed these new elements, as an ordinary type, not just as a constraint.

We are not proposing that today. But the rules for type sets and methods set above describe how they would behave.
Any type that is an element of the type set could be assigned to such an interface type. A value of such an interface type would permit calling any member of the corresponding method set.

This would permit a version of what other languages call sum types or union types. It would be a Go interface type to which only specific types could be assigned. Such an interface type could still take the value nil, of course, so it would not be quite the same as a typical sum type.

In any case, this is something to consider in a future proposal, not this one.

@ianlancetaylor ianlancetaylor added this to the Proposal milestone Apr 1, 2021
@gopherbot
Copy link

@gopherbot gopherbot commented Apr 1, 2021

Change https://golang.org/cl/306689 mentions this issue: design: update type parameters design for type sets

Loading

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Apr 1, 2021
@Merovius
Copy link

@Merovius Merovius commented Apr 2, 2021

At first glance, this seems okay to me. Some comments:

  1. It makes me a bit sad that the default is exact-matching, with underlying-type-matching needing the ~ token, given that we introduce this for constraints only first and ~all usages in that context should probably use ~. It seems easy to accidentally use the unadorned types and then get locked into an unnecessarily restrictive constraint. I don't know a good alternative though - the obvious analogy would be to require =int for exact matching and it seems less clear that doesn't lead to parsing ambiguities.
  2. There recently have been some questions about incomplete, implementation-defined validity checks: One example on Twitter and one example on golang-nuts. In both cases, the spec allows the compiler to reject certain programs, but doesn't require it. These questions convinced me that this is a bad idea to put into the spec. We don't want the validity of Go programs to be implementation-defined, IMO. If nothing else, it makes it very hard to change the used heuristic in the future. So, if we can't give a clear rule as to which unsatisfyable constraints to disallow (and I don't think we can, in general), I would personally advocate to simply allow them. I don't see a lot of disadvantages - after all, checking if a given type fulfills a constraint is still easy, so an impossibly constrained function can not be instantiated. So even the most basic test would surface the problem, it literally can't be called. And we can always do this as a vet check, which can use any heuristic it likes and can be progressively improved over time.
  3. Even though you don't want to bind this to a future extension to sum types, I still think it's reasonable to consider how such an extension could happen - given that this change is prompted by that consideration. Most importantly, such an extension should allow to switch on the matched type. I don't know if you thought about this yet? My first impression is that it would be possible to allow type-assertions/switches of the form x.(~string), which would assert that xs underlying type is string and if so, evaluate to a string. It seems like a natural syntax which would allow everything we'd need and I don't see anything immediately wrong with it. So, ISTM that it's possible to do that extension somewhat naturally - but of course, I've only thought about it for two hours :) [edit] I guess i should've looked at the actual updated design doc first - this possibility is already mentioned there [/edit]

Loading

@kortschak
Copy link
Contributor

@kortschak kortschak commented Apr 2, 2021

Part 3. of @Merovius's comment nicely addresses the issue in Identifying the matched predeclared type of the Type Parameters Proposal.

Loading

@DmitriyMV
Copy link

@DmitriyMV DmitriyMV commented Apr 2, 2021

Since ~T means the set of all types whose underlying type is T, it will be an error to use ~T with a type T whose underlying type is not itself.

Does this mean that we can't define something like:

type Dictionary[K any, V any] interface {
    ~map[K]V
}

or

type GenChan[T any] interface {
    ~chan T
}

?

Loading

@griesemer
Copy link
Contributor

@griesemer griesemer commented Apr 2, 2021

Thanks, @Merovius for the excellent initial feedback. Some comments to your points:

  1. Ack. We've looked at = and other options, but, to turn the viewpoint around, it would also be somewhat sad to have to say =int when we exactly mean just int; and to have int mean something different than just int in the context of a constraint. But I'm sure you gathered as much as well. I suspect that we won't often write ~T elements in practice because many times we will just use constraints already declared elsewhere, e.g., constraints.Ordered or the like. Maybe that's a consolation.

  2. It's an excellent point. One might also say that "implementation restrictions" are de-facto parts of the language, and perhaps we need to stop being a "chicken" (my apologies to chickens) and admit as much. With respect to unsatisfiable constraints, we've considered both, not reporting an error, or reporting an error. I agree that if there's no clear rule, we should probably not promise that the compiler reports an error. (Consider also parameterized constraints that may only become unsatisfiable upon instantiation.)

  3. I'll let @iant chime in on this one as he's spent more time thinking about this.

Loading

@griesemer
Copy link
Contributor

@griesemer griesemer commented Apr 2, 2021

@DmitriyMV No, to the contrary. Such constraints are explicitly permitted because the underlying type of map[K]V is itself (same for chan T).

Loading

@DmitriyMV
Copy link

@DmitriyMV DmitriyMV commented Apr 2, 2021

Thanks! So what are those

T whose underlying type is not itself

types for example?

Loading

@zephyrtronium
Copy link
Contributor

@zephyrtronium zephyrtronium commented Apr 2, 2021

As I described in #43651 (comment) and #44235, I think it creates a great pedagogical load to reuse the name "interface" for concepts that are not actually types. So, to me, the strength of this proposal is in addressing the concerns raised in #41716. I believe that the right decision is to leave as few releases as possible, ideally zero, between the release of generics and the "possible future step" of using this syntax (or another, if this is ultimately rejected) for sum types. The longer we go without being able to instantiate every type parameter with its own definition, the more tutorials, books, guidelines, and style documents will be written based on it being impossible.


@Merovius

  1. It makes me a bit sad that the default is exact-matching, with underlying-type-matching needing the ~ token, given that we introduce this for constraints only first and ~all usages in that context should probably use ~. It seems easy to accidentally use the unadorned types and then get locked into an unnecessarily restrictive constraint. I don't know a good alternative though - the obvious analogy would be to require =int for exact matching and it seems less clear that doesn't lead to parsing ambiguities.

Conversely, I like that the default behavior here preserves Go's strong one-to-one relationship between names and types (aside from type aliases, which were introduced late and are usually discouraged). ~ being a new syntactic element meaning "approximately" leaves open a number of future options, including but not limited to type switches and assertions on underlying types as you mentioned.

Loading

@griesemer
Copy link
Contributor

@griesemer griesemer commented Apr 2, 2021

@DmitriyMV Any defined type has an underlying type that is not itself. Look also for the definition of underlying types. For instance, myint declared as

type myint int

is a defined type whose underlying type is int. Writing ~myint is likely a programmer error because there's no type whose underlying type is myint, thus the type set of ~myint is empty. One could permit it, but that seems like an opportunity missed to catch a bug.

It's important to fully understand the notion of underlying type for this proposal.

Loading

@Merovius
Copy link

@Merovius Merovius commented Apr 2, 2021

@griesemer To be clear: I'm advocating not for "not promising to return an error", but for "promising not to return an error". That is, the compiler should consider an empty type-set valid, but vet might flag it :)

Loading

@ajwerner
Copy link

@ajwerner ajwerner commented Apr 2, 2021

I'm wondering whether this proposal has any rough edges related to instantiation of constraints using interface types. Say I do something like below:

type I[T any] interface {
    T
}

Are either of the following valid:

func ToString[A fmt.Stringer, B I[A]](b B) string { 
    return b.String() 
}
func ToString[A I[fmt.Stringer]](a A) string {
     return a.String()
}

edit: my reading is yes and they that are both valid.

Loading

@DeedleFake
Copy link

@DeedleFake DeedleFake commented Apr 2, 2021

It's kind of a bias from other languages, but my first inclination is to read ~int as not int. I'm sure that I'd get used to it, though, and I agree that it's better than =int. I think it makes more sense for it to default to exact, rather than approximate, as it makes embedding a concrete type the same as embedding an interface, as every line just defines a type set and, as stated in the proposal, the type set of a concrete type can contain just itself. Switching that default requires there to be a difference between the two because it would conflict with existing usage outside of interfaces. As proposed, you can think of any usage as a type set, despite some being illegal under the current proposal, at least for now:

var Interface fmt.Stringer // Valid values limited to those in the set of fmt.Stringer.
var Concrete int // Valid values limited to those in the set of int.

The second would conflict with the embedding of types if ~int was replaced with =int, though, as int would have a different type set in different contexts.

Kind of makes me wonder about the potential legality in the future of var Underlying ~int or var Union ~int | ~float64. If they were allowed, interfaces would become exactly the same as a named type set, although something like type Signed ~int | ~int8 | ~int16 | ~int32 | ~int64 would also probably make sense at that point.

Disclaimer: I am not necessarily advocating for ~int being usable outside of interfaces.

Loading

@qualidafial
Copy link

@qualidafial qualidafial commented Apr 2, 2021

Speaking as just a random developer who is pretty new to Go--so take this with a grain of salt--I can't help but feel like the whole constraint interfaces thing is just a complicated end run to avoid having to add operator overloading to the language.

If we had the ability to express that an interface includes specific operators (with whatever syntactic form makes it unambiguous), then we don't need type sets / constraint interfaces to enumerate types.

The generics proposal, with or without this type sets proposal, takes Go further away from structural typing.

Edit: and honestly, we don't even need operator overloading to keep the language unambiguous. We just need a way to express that an interface includes certain operators. Then Ordered could just be the interface that includes <, >, and ==. And it could apply to any primitive number, string, or any custom type with an ordered underlying type. Operator overloading could come later (if ever)

Loading

@bserdar
Copy link

@bserdar bserdar commented Apr 2, 2021

Is there a practical use for having both T and ~T? If not, ~ can be dropped.

That is: T means "all types derived from T", and there would be no support for "type is T"

Loading

@kortschak
Copy link
Contributor

@kortschak kortschak commented Apr 2, 2021

For numerical work you would like to implement float64 and float32 cases but probably not derived types (i.e. only interface{ float32 | float64 }), while if you wanted to implement an orderable container you would want all orderable types including their derivatives, so interface{ ~int | ~int8 | ... | ~string }.

Loading

@bserdar
Copy link

@bserdar bserdar commented Apr 2, 2021

@kortschak

For numerical work you would like to implement float64 and float32 cases but probably not derived types (i.e. only interface{ float32 | float64 })

I still can't see the benefit of excluding derived types from such an implementation. If a type has the underlying type of float64, is there a case where you don't want to treat it like one? I don't have much experience with numerical work, but I am thinking of a type Temperature float64 and can't see the benefit of excluding it from a generic function requiring float64 instead of ~float64.

Loading

@kortschak
Copy link
Contributor

@kortschak kortschak commented Apr 2, 2021

Yeah, I can see your point. The ~ does allow the potential of underlying type switching though, which would be very useful.

Loading

@fzipp
Copy link
Contributor

@fzipp fzipp commented Apr 2, 2021

  1. It makes me a bit sad that the default is exact-matching, with underlying-type-matching needing the ~ token, given that we introduce this for constraints only first and ~all usages in that context should probably use ~. It seems easy to accidentally use the unadorned types and then get locked into an unnecessarily restrictive constraint.

@Merovius Isn't it better to be accidentally too restricting and have the possibility to lift the restriction later than to be accidentally too permissive? Are there situations where changing T to ~T is a breaking change for the consumer of a parametrized type or function?

Loading

@zephyrtronium
Copy link
Contributor

@zephyrtronium zephyrtronium commented Apr 2, 2021

The case for including both T and ~T comes primarily from discussion on #41716 about sum types using the previous proposed type list syntax. Essentially, for generics, ~T is usually the correct choice, but for a sum type, T is usually better. It would be contrary to the philosophy of Go to use different syntaxes for a list of types as generic constraints and a list of types as union options, but having different semantics for the same syntax based on where that syntax appears is even more contrary. This proposal keeps the desirable behavior for generics without obstructing the eventual possibility of sum types by allowing users to choose the appropriate option for each respective use case.

Loading

@tooolbox
Copy link

@tooolbox tooolbox commented Apr 2, 2021

At the risk of sounding +1, I would like to say that this seems to be a strong step forward, and that Ian and Robert have my thanks for their ongoing efforts to make generics a reality in a way that fits with Go’s ethos and that gophers can learn easily and use effectively. No objections to the proposal, although I agree with @zephyrtronium that it would be ideal to narrow the gap between regular interfaces and constraint interfaces, hopefully to nothing, before generics are released, so I look forward to further iteration.

Loading

@urandom
Copy link

@urandom urandom commented Apr 2, 2021

I'm curious whether it would be possible to allow approximations of structs to match not only the underlying type of a particular struct, but also to allow matches for structs that have at least the exact fields that are listed as the approximate struct element?

type Fooer interface {
    ~struct { Foo int; Bar string } 
} 

type MyFoo struct {
    Foo int
    Bar string
    Baz float64
}

In the above snippet, there is a potential for allowing types like MyFoo to satisfy the Fooer constraint. This could be quite useful, as structs are the only class of types that vary wildly. I know of at least one other language that allows for a similar expression

Loading

@DmitriyMV
Copy link

@DmitriyMV DmitriyMV commented Apr 2, 2021

@griesemer

Any defined type has an underlying type that is not itself.

Maybe this should be added to proposal in a form of explanation "aka having ~Type where Type is a defined type aka type MyType Type is invalid. Or something like that?

Loading

@Merovius
Copy link

@Merovius Merovius commented Apr 2, 2021

@bserdar AFAICT the main reason to have both is the possibility of expanding to sum types. Constraints almost always want to use ~T and sums almost always want to use T. Having both is a form of future-proofing (though as I said, it makes me a bit sad that if we don't do that expanding, we'll be stuck with extra ~ all over the place. But only a bit).

@fzipp I'm not sure. It might not be, especially if we only consider the proposal as-is with this change. Note that it's also possible to use constraints defined by other packages for your own function, so it's not just the consumers of a generic function that are affected, but also authors. A function that uses a constraint and type-asserts an argument would be an example of a breakage, when this constraint is relaxed. But this is shot from the hip - I have no idea how real/practically relevant this is. It becomes more relevant if we expand to sum types though.

@urandom It's definitely possible. I'm not sure it's a good idea though. It's a pretty significant change in the meaning of ~ based on the context.

Loading

@DmitriyMV
Copy link

@DmitriyMV DmitriyMV commented Apr 2, 2021

@urandom I agree with @Merovius. I think that T1 which embeds T should have a different syntax.

Loading

@markusheukelom
Copy link

@markusheukelom markusheukelom commented Apr 2, 2021

Some questions / remarks:

  1. Instead of prefix "~", what about suffix "+"?
type Floats interface {
	float32+ | float64+
}

This has the benefit of not introducing a new token and does not confuses to read as "not float32 or not float64". Also T- could at some point refer to the underlying type of T, which might be useful to express at some point.

  1. Why a new list operater "|" instead of ","

Do this not work?

type Floats interface {
	float32+, float64+
} 
  1. What's the use case of mixin T~ with S in a single list?

Ie.

type Floats interface {
	int,~int32	
}

When would you ever need that?

Instead, would it be possible / sensible to move the "~" to the parameter type side? I.e.

type Floats interface {float32, float64}
func Min[F ~Floats](a, b F) F 		// accept all floats, or all types with float as underlying type
  1. Why ramming it into the interface construct?

Could something like this work to declare a type set/list?

type[] Floats {float32,float34} 		
type[] Strings  {string, fmt.Stringer}		

To me, an interface is used when you care about functionality only (methods) and not about representation (actual type). So the word "interface" does not resonate very well to me with a type list (plus the interface concept of Go is already quite involved).

Loading

@akavel
Copy link
Contributor

@akavel akavel commented Apr 2, 2021

There's an interesting comment on r/golang, bringing up a parallel of this proposed syntax resembling the (also newline-based) boolean/set logic syntax of // +build directives, which was eventually found to be confusing and is expected to be replaced by a more "traditional" syntax for //go:build. Although this technically qualifies as 🚲 🏠 🖌️ -ing, in light of the //go:build situation I believe it might be worth calling out at least in passing a consideration of the currently proposed vs. more explicit syntax somewhere in the document.

Loading

@Merovius
Copy link

@Merovius Merovius commented Apr 2, 2021

@markusheukelom As for question 3: type Stringish interface { ~string | fmt.Stringer }.

Loading

@Merovius
Copy link

@Merovius Merovius commented May 14, 2021

FWIW I'm still thinking about the question of when a generic function/type should be able to be instantiated with a type argument. That is, how should we decide if this code is legal:

func F[T C1]() {}

func G[T C2]() {
    F[T]()
}

The natural definitions in terms of type sets are all hard to prove for the compiler. The natural definition in terms of operation sets ("the operation set of C1 must be a superset of the operation set of C2") is easy to prove, but it's awkward:

type C1 interface { ~int }

type C2 interface { ~int8 }

This shouldn't be allowed, really, but it's not entirely clear from the "operation set" perspective. If an "operation" is something like "adding two values of type T and getting back a T", then they should have the same operation set, so it should be legal. If, OTOH, we distinguish that by saying "the result of addition is an int8/int, which are different types, so these are different operations", then the operation set of ~int | ~int8 should be (relatively) empty.

Related to this question, though not really to the question of the constraint language we use (so arguably independent of this proposal) @rogpeppe recently brought up another question about instantiating generic functions with type arguments:

func F[T any]() {
    F[[2]T]()
}

Obviously this code doesn't do anything useful. But it illustrates a problem, namely: How is a purely stenciling implementation supposed to compile this? There is an infinite set of instantiations needed. For generic types, this is already excluded by requiring any recursive instantiations to use the same type arguments in the same order - I think that rule must be extended to functions, including mutual recursion. I don't know if that prevents us from writing some generic functions we might want to write, though. It might be, for example, useful to annotate a type argument by wrapping it in a struct and store the result in some data structure, i.e.

func F[T any]() {
    type val[T any] struct {
        v T
        annotation int
    }
    s := stack.New[val[T]]()
    // use s
}

Type lists as in the accepted design don't really have the first problem, I think. At least as long as they don't allow interfaces as elements. So, we might be able to solve some of these problems for this proposal by also disallowing interfaces in union elements - though that would again remove some of the elegance of the proposal. But maybe that's necessary.

I'm not sure any of this really is a blocker, per se. I think it would be possible to accept this as a general "we like the new syntax" and then solve these problems when they come up naturally. But I do think we still need to think about the question of how to handle type arguments in instantiations.

Loading

@ianlancetaylor
Copy link
Contributor Author

@ianlancetaylor ianlancetaylor commented May 14, 2021

For generic types, this is already excluded by requiring any recursive instantiations to use the same type arguments in the same order - I think that rule must be extended to functions, including mutual recursion.

Yes, this is the intent. I unintentionally left that out of the current proposal doc, but that was always the plan.

Loading

@gopherbot
Copy link

@gopherbot gopherbot commented May 22, 2021

Change https://golang.org/cl/321689 mentions this issue: [dev.typeparams] cmd/compile/internal/types2: accept embedded interface elements

Loading

@griesemer
Copy link
Contributor

@griesemer griesemer commented May 22, 2021

Clarification: The above CL 321689 is an initial, but not complete (!), implementation of this proposal. For now, it doesn't permit interfaces in ~ or union terms. This seems ok for now as we didn't really support this for type lists, either. See the CL description for more details if curious. In short, this CL supports what we had supported so far, but using the "type" keyword-free notation, and can be viewed as an "approximation" if true type sets (and hopefully where the implementation works it behaves no different from a true type set behavior).

Loading

gopherbot pushed a commit that referenced this issue May 24, 2021
…ce elements

Accept embedded interface elements of the form ~T or A|B and
treat them like type lists: for now the elements of a union
cannot be interfaces. Also, translate existing style "type"-
lists in interfaces into interface elements: "type a, b, c"
becomes a union element "~a|~b|~c" which in turn is handled
internally like a type list.

For now, "~" is still ignored and type lists are mapped to
Sum types as before, thus ensuring that all existing tests
work as before (with some minor adjustments).

Introduced a new Union type to represent union elements.
For now they don't make it past interface completion where
they are represented as a Sum type. Thus, except for printing
(and the respective tests) and substitution for interfaces,
the various type switches ignore Union types. In a next step,
we'll replace Sum types with union types and then consider
the ~ functionality as well.

Because union elements are no different from embedded interfaces
we don't need a separate Interface.types field anymore. Removed.

For #45346.

Change-Id: I98ac3286aea9d706e98aee80241d4712ed99af08
Reviewed-on: https://go-review.googlesource.com/c/go/+/321689
Trust: Robert Griesemer <gri@golang.org>
Reviewed-by: Robert Findley <rfindley@google.com>
@asv7c2
Copy link

@asv7c2 asv7c2 commented May 25, 2021

Just curious, type switch will be exhaustive? If yes, will nil be forced to handle in type switch?

Loading

@Merovius
Copy link

@Merovius Merovius commented May 25, 2021

@asv7c2 There are no changes to type-switches here. They will work as they do now, you type-switch on an interface value to determine it's dynamic type.

Loading

@asv7c2
Copy link

@asv7c2 asv7c2 commented May 25, 2021

@Merovius
What benefit then have type sets in ordinary interface types?

Loading

@Merovius
Copy link

@Merovius Merovius commented May 25, 2021

@asv7c2 Type sets don't replace interfaces, they replace type lists from the accepted proposal and have the same purpose.

Loading

@griesemer
Copy link
Contributor

@griesemer griesemer commented Jun 16, 2021

Based on our current experience with a concrete implementation, for the initial generics release we are planning to proceed with a restricted version of this type set proposal. We believe this will leave open the door for a more complete implementation while still providing more expressive power than type lists because of approximation elements.

The restriction is that interfaces containing methods (including the predeclared interface comparable) may only be embedded "stand-alone", i.e., such interfaces cannot be used as terms in union elements (unions for short). Note that ~T where T is an interface is not permitted.

For example, the following remains valid:

type Stringer {
	String() string
}

type Number {
	~int | ~float64
}

type C1 interface {
	Number | ~string	// Number doesn't declare any methods
	Stringer		// Stringer is embedded "stand-alone"
m()
}

type C2 interface {
	C1			// C1 embedded "stand-alone"
}

but the following is invalid:

type C2 interface {
	~int | Stringer		// invalid: Stringer contains methods
}

With this restriction, we can compute the relevant type, method, and operation sets relatively easily by considering the types and methods separately, similarly to what we do with type lists. To see this let's consider the various cases. In the following, for readability, we reason "by example", but we believe that the examples can easily be generalized (i.e., the examples do not restrict the generality of the arguments).

First off, an interface that is embedded "stand-alone" (not part of a union) can always be "inlined", i.e., the embedded interface's effect is as if the contents of that interface (methods and other embedded elements) were added manually to the embedding interface. This is the usual Go embedding behavior.

This leaves us with "flat" interfaces that declare methods, or embed approximation elements or unions. Approximation elements cannot contain interfaces, and let's ignore unions containing interfaces for a moment. Furthermore, an approximation element and any non-interface type can be viewed as a single-term union. Thus we're left with a (newline-separated) list of unions and methods.

Each union defines a type set which corresponds to the types listed in the union, adjusted for the presence of ~. For instance, int | float32 is the type set { int, float32 }, and ~int | int is the type set ~int, which is the set of all types with underlying type int. More interestingly, myInt | ~int describes the type set ~int if myInt's underlying type is int.

In general, whenever we have a single type T with underlying type U and we have a term ~U in the same union, we can simplify the union by removing the T: T | ~U == ~U because the very definition of ~U includes T. And of course T | T simplifies to T, and ~T | ~T simplifies to ~T.

Thus, for each union we can compute a canonical (minimal) representation, which is a list of types with ~ annotations. This representation describes the type set expressed by the union.

In general we don't just have a single union element, we may have several, each describing a type set. To get the final type set, per the type set proposal, we need to intersect the individual union type sets, which means we need to intersect the respective canonical representations. For this purpose, we pair individual matching types from two canonical representations and compute their intersections (using & to denote intersection): for instance, int & int == int, myInt & ~int == myInt, int & string == {} (empty set), etc. Again, it's not too hard to see how this generalizes.

Eventually we end up with a single canonical representation and a list of methods for an interface. The type set described this way is the set of types that can be found in the union representation and which implement all the methods. It's easy to test whether a given type T is in that type set: we simply check if T is included in the canonical representation and whether T implements all the methods.

A more complex situation occurs if we want to test if a type parameter P1 with constraint C1 satisfies the constraint C2 of another type parameter P2. This amounts to a subset test: is the type set described by C1 a subset of the type set described by C2? Or in other words, is every type described by C1 permissible (included) by C2? Because the canonical representation for a union enumerates all the types described by that union, we can simply check each type of C1 against the types in C2 (while taking the ~ annotation into account). The additionally required superset test for the method sets is the same as for ordinary Go.

Finally, we need to consider the previously excluded case of a union containing an interface term. For instance a | b | interface{ … } | c (where each of a, b, c may be an approximation element). Per the proposed implementation restriction, that interface cannot have methods, only unions. After canonicalization, we're left with a single union U describing a type set for which we have a precise internal representation, for instance U == p | q | r. Altogether we have a | b | interface{ p | q | r } | c. Again, it is not hard to see that this is equivalent to a | b | p | q | r | c (which may be simplified through canonicalization, depending on the actual terms).

To summarize, by disallowing interfaces declaring methods as terms of union elements, it is possible to implement a restricted but sufficiently powerful version of the type set proposal for the first version of generics.

The representation we have described is a simplified disjunctive normal form where each term consists of a single type plus ~ annotation, with the associated methods factored out and handled separately (because there is only a single set of methods for each term).

Without the proposed restriction, union terms may specify methods. As a consequence, the canonical disjunctive normal form will become more complicated: methods cannot be factored out anymore. Instead, each term in the canonical form will consist of a type, a ~ annotation, and a (possibly empty) set of methods. Initial implementation experiments suggest that computing that representation should be possible but that it is significantly more subtle to get right. Removing the restriction matters should we allow constraint interfaces for general use (to model sum types) at some point in the future; we don't need to address this now.

Loading

@rsc
Copy link
Contributor

@rsc rsc commented Jul 14, 2021

This was pending implementation. The implementation is (nearly) done and people seem very happy with the previous comment, so we will move this to likely accept. As with all of generics, we expect we will find details that need fixing, and they can be filed as separate proposals.

Loading

@rsc rsc moved this from Active to Likely Accept in Proposals Jul 14, 2021
@rsc
Copy link
Contributor

@rsc rsc commented Jul 14, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

Loading

@rsc rsc moved this from Likely Accept to Accepted in Proposals Jul 21, 2021
@rsc
Copy link
Contributor

@rsc rsc commented Jul 21, 2021

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Proposals
Accepted
Linked pull requests

Successfully merging a pull request may close this issue.

None yet