Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: go 2: (Lightweight) generics in Go using type lists #33604

Open
markusheukelom opened this issue Aug 12, 2019 · 7 comments

Comments

@markusheukelom
Copy link

commented Aug 12, 2019

Introduction

This proposal contains an idea to define generics in Go (2) using (named) type lists. A possible way to define a type list in Go is sketched below, after which the use of the type list to write generic code is discussed. The second half contains a couple of thoughts on the details of type lists / generics that are less essential to the 'main' idea.

(Lightweight) generics in Go using type lists

A type list is a comma separated list of types enclosed in parenthesis. A type list can be named like normal Go types:

type T (float32,float64)	// T is a named type list that contains the types float32 and float64
type ColorElemType T 		// ColorElemType is a named type list that contains the same types as T 

Besides explicit type lists, Go provides two implicit type list:

type A ()					// the empty list '()' denotes all assignable types; 
type E (==)					// (==) denotes the list of all types that are comparable (on equivalence, including !=)

A type list is used to specify type parameters for generic functions, structs and interfaces:

func add(a, b T) T // add is a generic function with a single type parameter T that is used for 3 variables: a, b and the return value
func sum(a ...T) T // a generic, variadic function with a single type parameter T

type struct Vertex { // Vertex is a generic type (struct) with two type parameters T and ColorElemType used in 3 fields
	Coords [3]T
	Normal [3]T
	Color  [3]ColorElemType
}

type K (==) 
type V ()
type struct Bimap  {	// Bimap is a generic container with two type parameters: K an V 
	Forward map[K]V
	Reverse map[V]K
}

type interface Vector {	// Vector is a generic interface with a single type parameter
	Dot(a, b T[]) T
}

func multMatrix(a, b []T, dot Vector(T)) []T 	// a more complex generic function signature

Note that each named type list accounts for a single type parameter. This is the central theme in this proposal. In particular, it was chosen to lift type parameter definitions out of generic function signature and generic struct/interface definitions to increase their readability while also making them reusable.

Unnamed type lists are allowed as well but will always get their own type parameter:

type ABC struct {	// ABS is a generic struct with three type parameters
	A      (string,int)
	B, C (string,int)
}

func foo(a, b (==)) (==) {	// a generic function with 3 type parameters
	//	
}

To use a generic function, struct or interface, the type parameters must be bound to specific types by providing a list of type arguments:

addf := add(float32)   // addf is a 'normal' Go function with signature (float32, float32) float32 
addi := add(int)   // error: int not in T (float32, float32)
addi := add(float32,float32)	// error: too many type arguments 

addf(1, 1)	  // "2";
add(float32)(2, 2)  // "4"; both type arguments and call arguments can be specified in a single statement

type Vertexf Vertex(float32,float64)	
type Vertexd Vertex(float64,float32)

Note that the type arguments are specified in the order of appearance of the type parameters.

A type argument satisfies a type parameter's type list if the type is directly in the list, or if the underlying type is in the list:

type MyFloat float32
var x, y MyFloat
add(MyFloat)(a, b)	// ok because the underlying type of MyFloat is in T 

As can be expected, type arguments can be omitted whenever they are clear from context:

var a, b float32
add(a, b)	// ok
add(1.0, 2.0)	// error: T cannot be inferred from const
add(float32(1.0), float32(2.0)) // ok

When "compiling" a generic function or type definition, the compiler checks that the operations used are supported by all types in the type lists specified for the type parameters:

type K ()
type V K
type struct Bimap {			// error: cannot used K () as key type: K () does not support the '==' operation 
	Forward map[K]V
	Forward map[V]K
}

type K (==)	// required comparable types for K
type V K
type struct Bimap {...}	// ok

type T (==)
func sum(values T...) {
	r := T{}
	for _, v := range values {
		r += v   // error (when compiling package): T (==) does not support the '+' operator
	}
	return r
}

// package stats
type T (int,int8,int15,int32,int64,uint,uint8,uint15,uint32,uint64,float32,float64,complex64,complex128)

// T is reused for many functions (implementations omitted for brevity), all ok
func Sum(values T...) T
func Percentile(p float32, values T...) T 	
func Median(values T...) 
func Average(values T...) 

// ... but:
func Min(a, b T) T {
	if a < b {	 // error (when compiling package): complex64 does not support the '<' operator
		return a 		// (the author of stats must decide to drop the function or complex type support) 
	}
	return b
}

When a generic function or type is instantiated in a program, the compiler checks that the type arguments satisfy the type parameters:

type MyBimap Bimap(string,func()) // error: func() is not in V (==) (operator == is not supported)

m := stats.Min(complex64)	// error: complex is not in T (int,...)     (the author decided to drop support for complex types)

Because the compiler only needs to check whether a type is an explicit list, or in a limited set of built-in implicit type lists, type argument validation should be quick and the error messages clear to the programmer that writes the code that uses the generics.

Using the compiler provided implicit type lists and package author provide explicit type lists, many useful gerenic functions and types can be written. The idea of contract/concepts as found in other languages are explicitly avoided: generic code must either be constrainted to the Golang type 'contracts' (), (==) (and possibly (struct)), or just provide an explicit list.

Details

To get to a specification many details need to be worked out, and I've probably overseen many and it might turned out the whole idea doesn't work. That being said, below is a list of additional thoughts.

Unexported type parameters

It is well possible that the type parameter for an exported type is used only in unexported fields:

// package caching
type T

type Cache struct {
	cache map[string]T
}

// package main
type Calc struct {...}
type myCach Cache(Calc)  // how does the coder of main knowns about this type parameter? Does the compiler 'export' the type parameters? 

A solution for this problem could be to allow 'ghost fields' that are always exported by the compiler. A ghost field does not consume any bits and cannot be addressed, but is used to express existence of unexported type parameters:

type Cache struct {
	_ T 	// a ghost field
	// contains unexported types
}

// if needed ghostfield can also be used to order the type parameters in a more senseful order
type Bimap struct {
	_ K 						
	_ V
	// contains unexported types
}

type MyCache Cache(string)			// ok
type MyMain Bimap(string,int)		// ok

Using ghost fields, the type parameters order can also be kept the same if even the field order changes.

Type parameter type assertion

When writing generic code at some point it will because necessary to branch code depending on specific type arguments. The normal Go type assertion could possibly be reused for this, which has slightly different semantics in the generic code:

type F (float32,float64)
type I (uint16,uint32)
type Triangles struct {
	Indices []I	
	Coords	[][3]F
}

func (t *Triangles) draw() {
	
	// note: all code is compiled for all F, so gl.DrawTrianglef and gl.DrawTriangled would receive a interface{} or something unsafe.
	switch F.(type) {
	case float32:
		gl.DrawTrianglef(...)

	case float64:
		gl.DrawTriangled(...)

	}
}

Generic type methods

A named type list used as type parameter in a generic method body are bound the same type if the type list is used in the type definition:

type T (int,string)
type MyCache struct {
	_ T
}

func (m *MyCache) Foot(value T) { 
	// m.T is the same type a T
}

Because instantition of generic types are different types, method specialisation for subsets of the parameter type list could be allowed:

func (t *Triangles(float32)) draw() {
	// code is not compiled for float64
}

func (t *Triangles(float64)) draw() {
	// code is not compiled for float32
}

func (t *Triangles(float64)) Buffer() {
	// buffer is available only fro float64
}

Whether is useful is to be seen. It could always scraped as part of a first version. An alternative way could be so allow some kind of static_if that can be used for conditional compilation. However, both ideas are very advanced, and the impliciation are hard to oversee at this point.

Implicit type list (struct) and others

For marshalling and orm code it would be beneficial to express that a type parameter must be of kind struct:

type T (struct) // all type that are of kind struct (useful for marshalling and ORM for example)

func Fetch(db *sql.DB, table string) *T {
	// SELECTs and maps columns to fields of T
}

package main

type User struct {...}

// ..
var user *User
user = Fetch(db, "users") // T is inferred from the assigned result variable

Implicit type lists for other innate Go types are not as necessary, as most can expressed directly:

func FanIn(chans chan T...) chan T {
	// ...	
}

func Sort(list []T, less func(a, b T) bool) {
	// ...
}

@gopherbot gopherbot added this to the Proposal milestone Aug 12, 2019

@gopherbot gopherbot added the Proposal label Aug 12, 2019

@deanveloper

This comment has been minimized.

Copy link

commented Aug 15, 2019

In type declarations, the right side is always a valid type.

type T (int,string) assumes that (int, string) is a valid type. If you're proposing that this becomes a valid type, then that'd be a huge breaking change, as any functions that take advantage of multiple return would cease to work.
Even if a special case was added for generics, I also think that sharing the same syntax as multiple return in general is just a bad idea.

Not to mention that now generic functions do not look like generic functions anymore. When I see func Sum(a []T) T, I am not sure if T is a pre-existing type in your package, or if T is a generic type that I need to be aware of.

Because instantition of generic types are different types, method specialisation for subsets of the parameter type list could be allowed

This was something that the original generics draft was explicitly trying to avoid, to quote the original draft:

It is also not a goal to enable specialized implementations of polymorphic definitions, such as defining a general vector<T> and a special-case vector<bool> using bit-packing.

This however doesn't mean that it's a bad idea, but it may reflect that it isn't a problem that Go wants to solve.

I do think that this brings some interesting ideas though. Having things like struct meaning "any struct" and (==) meaning "anything comparable" I think would be useful.

@markusheukelom

This comment has been minimized.

Copy link
Author

commented Aug 15, 2019

Thanks for taking the time to read this.

In type declarations, the right side is always a valid type.

type T (int,string) assumes that (int, string) is a valid type. If you're proposing that this becomes a valid type, then that'd be a huge breaking change, as any functions that take advantage of multiple return would cease to work.

I am not aware that type T (int,string) is currently valid syntax; can you explain why it would be a huge breaking change? Cause I am probably overseeing something... Btw, T would not be instantiable in non-generic code (making typelist different from other types), instead it would denote valid type arguments in generic code. But of course, this syntax is not essential to the idea of typelist: another syntax could be used as well, like typelist T (int,string).

Not to mention that now generic functions do not look like generic functions anymore. When I see func Sum(a []T) T, I am not sure if T is a pre-existing type in your package, or if T is a generic type that I need to be aware of.

Well you have to lookup non-generic type definitions too when use a function. So going to the definition of T you'd see it's a typelist and you'd immediately see the types that are allowed. The fact that that makes Sum a generic function is something you would not even need to realise to use the function. The benefit is that T may be reused and that packages should (generally) only declare a couple (think about a statistical package or a package with channel algorithms). Also, a coding style consensus of using single capital letters for type lists can be encouraged, so that they stand out immediately as type parameters. (Note that this is already a defacto stand in Go-lang documentation, where some function argument types are denote with 'T' and they listed what they maybe be). Another major benefit is that it leads to simpler syntax, because in other proposals, any type contraints on the type parameters have to be supplied for every function signature.

Because instantition of generic types are different types, method specialisation for subsets of the parameter type list could be allowed

This was something that the original generics draft was explicitly trying to avoid, to quote the original draft:

Thanks for mentioning this, I wasn't aware of this. I wholeheartedly agree with this; I also rather keep generics as 'lightweight' as possible.

@deanveloper

This comment has been minimized.

Copy link

commented Aug 15, 2019

I am not aware that type T (int,string) is currently valid syntax; can you explain why it would be a huge breaking change? Cause I am probably overseeing something... Btw, T would not be instantiable in non-generic code (making typelist different from other types), instead it would denote valid type arguments in generic code. But of course, this syntax is not essential to the idea of typelist: another syntax could be used as well, like typelist T (int,string).

What I was saying is that in type declarations, the right side is always the underlying type of the left side. If we want to continue this rule, then that would mean that something like type T (int, string) would meant that T's underlying type is (int, string), meaning that (int, string) would be a valid type. Unfortunately it shares the same syntax as the return values of a declared function's return value, so it would be ambiguous if func Foo() (Bar, Baz) returned a (Bar, Baz) meaning "a Bar and a Baz" or a (Bar, Baz) meaning "a generic type that is either Bar or Baz". Granted this is just syntax stuff that could be pretty easily changed.

Well you have to lookup non-generic type definitions too when use a function.

This is actually a fair point

Another major benefit is that it leads to simpler syntax, because in other proposals, any type contraints on the type parameters have to be supplied for every function signature.

While it's a simpler syntax, it's not as clear. For instance:

type T (int, string)
func AddThree(t1 T, t2 T, t3 T) T

The syntax doesn't make it clear the difference between "t1, t2, t3, and the return value are all (int, string) and are the same type" and "t1, t2, t3 and the return value are all (int, string) but are all different types". In the syntax shown by the draft and the blog post, it is very clear:

contract IntString(T) {
    T int, string
}
func AddThree(type T IntString)(t1 T, t2 T, t3 T) T

This syntax very clearly shows that all 4 must be the same type, since it illustrates that T is a parameter to the function definition.

@markusheukelom

This comment has been minimized.

Copy link
Author

commented Aug 16, 2019

While it's a simpler syntax, it's not as clear. ...

While I agree that the explicit type parameter in you example makes it somewhat clearer that all T must be of the same type, I think in practise it is really of not such a big deal. For example, take a look at the documentation of the builtin functions (https://golang.org/pkg/builtin)[https://golang.org/pkg/builtin]: the documentation only expresses that Type must be of the same type in the type definition but never mentions is any further with function definition. Have you ever looked at any of those functions and got confused whether 'Type' got be different types? The thing with generics is that when you use it you already have an idea of the types you want to use, so for

type (int,string) 
func Min(a, b T) T

I don't think there is a risk of confusion about if T may be int for a and string for b and the return type: it's just not natural to think this when you're using the function. However if you would, you'd just get the compile error cannot not use string and int for T at the same time (or something similar).

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Aug 16, 2019

Seems like a fairly subtle change in the type definition might accidentally change the order of the type parameters, and therefore the meaning of the type arguments in different packages.

@deanveloper

This comment has been minimized.

Copy link

commented Aug 17, 2019

I think that what I should have originally gone for was not the clarity of function definitions being parametric, but the idea of type T (int, string) means that T is a generic type.

It's already partially confusing because (int, string) means something entirely different in a different context, and it doesn't hint at all that T is a generic type. The only way to know it's a generic type is that you know the generic type syntax, which so happens to be the same syntax as multiple return syntax.

Granted I'm just arguing about syntax at this point, the problems you are solving and the way you solve them are really what matter, which is something I've addressed (and you've responded to) already. So I'm not sure if I have any other feedback

@markusheukelom

This comment has been minimized.

Copy link
Author

commented Aug 17, 2019

@ianlancetaylor

Seems like a fairly subtle change in the type definition might accidentally change the order of the type parameters, and therefore the meaning of the type arguments in different packages.

If I understand you correctly, you mean

type K (==)
type V K
type Bimap struct {
   Forward map[K][V]
   Reverse map[V][K]
}
// changing into:
type Bimap struct {
   Reverse map[K][V]   // Note: Reverse is now the first field
   Forward map[V][K]
}

right? This is indeed a drawback, as are for example type parameters that are used only in non-exported fields. For structs, please see the section 'Unexported type parameters' in my proposal for an idea that might alleviate these issues for structs. For functions this problem is probably less of an issue because changing the order of arguments affects calling code directly anyway (in contrast to changing struct members). That being said, it is a sort of a weakness in the proposal that comes with not explicitly stating the type parameters with every declaration.

I'd like to take the opportunity to explain that an inspiration for this proposal is how the builtin in functions are documented in go. I though a bit about how I would like to see generics used and I imagined packages likes 'stats or 'linalg', that provide functions on T where T is float32,float64 and possible the complex types. Now if the function definitions in those package where like:

// T is a contract or something
func (type T)Median(values ...T) T
func (type T)Avg(values ...T) T
func (type T)Sum(values ...T) T
//... etc

The type paremeter signature (type T) starts for feel really redundant here, because the caller knows that 'right, T is always a float32 or float64`. Moreover, in combination with (if these are allowed) generic methods will have a lot of paren groups:

type  Stats struct {
   Total float64
   Avg float64
   Median float64
}

func (s *Stats) Add(type T)(values ...T) (bool, error) { 
...
}

As another example, for a package 'channels' that would provide channel algorithms, T is just any type:

type T ()

func FanIn(... chan T) chan T 
func Broadcast(... chan T) chan T 

And in package 'containers' we could have two type parameters: type T () for data types and type K (==) for any type parameter used as a key in a map.

The pattern here is that any package with generic functions/types will (or maybe should) only use a couple of type parameters kinds (contracts) anyway to keep things simple and in the spirit and practicality of Go. When using the package (ie not being the author), you should be able to quickly learn and easily lookup the one or two type parameter kinds that are used in the package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.