Permalink
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
1694 lines (1425 sloc) 61.8 KB

Type Parameters in Go

This is a proposal for adding generics to Go, written by Ian Lance Taylor in December, 2013. This proposal will not be adopted. It is being presented as an example for what a complete generics proposal must cover.

Introduction

This document describes a possible implementation of type parameters in Go. We permit top-level types and functions to use type parameters: types that are not known at compile time. Types and functions that use parameters are called parameterized, as in "a parameterized function."

Some goals, borrowed from Garcia et al:

  • Do not require an explicit relationship between a definition of a parameterized function and its use. The function should be callable with any suitable type.
  • Permit interfaces to express relationships between types of methods, as in a comparison method that takes two values of the same parameterized type.
  • Given a type parameter, make it possible to use related types, such as a slice of that type.
  • Do not require explicit instantiation of parameterized functions.
  • Permit type aliasing of parameterized types.

Background

My earlier proposal for generalized types had some flaws.

This document is similar to my October 2013 proposal, but with a different terminology and syntax, and many more details on implementation.

People expect parameterized functions to be fast. They do not want a reflection based implementation in all cases. The question is how to support that without excessively slowing down the compiler.

People want to be able to write simple parameterized functions like Sum(v []T) T, a function that returns the sum of the values in the slice v. They are prepared to assume that T is a numeric type. They don’t want to have to write a set of methods simply to implement Sum or the many other similar functions for every numeric type, including their own named numeric types.

People want to be able to write the same function to work on both []byte and string, without requiring the bytes to be copied to a new buffer.

People want to parameterize functions on types that support simple operations like comparisons. That is, they want to write a function that uses a type parameter and compares a value of that type to another value of the same type. That was awkward in my earlier proposal: it required using a form of the curiously recurring template pattern.

Go’s use of structural typing means that a program can use any type to meet an interface without an explicit declaration. Type parameters should work similarly.

Proposal

We permit package-level type and func declarations to use type parameters. There are no restrictions on how these parameters may be used within their scope. At compile time each actual use of a parameterized type or function is instantiated by replacing each type parameter with an ordinary type, called a type argument. A type or function may be instantiated multiple times with different type arguments. A particular type argument is only permitted if all the operations used with the corresponding type parameter are permitted for the type argument. How to implement this efficiently is discussed below.

Syntax

Any package-scope type or func may be followed by one or more type parameter names in square brackets.

type [T] List struct { element T; next *List[T] }

This defines T as a type parameter for the parameterized type List.

Every use of a parameterized type must provide specific type arguments to use for the type parameters. This is done using square brackets following the type name. In List, the next field is a pointer to a List instantiated with the same type parameter T.

Examples in this document typically use names like T and T1 for type parameters, but the names can be any identifier. The scope of the type parameter name is only the body of the type or func declaration. Type parameter names are not exported. It is valid, but normally useless, to write a parameterized type or function that does not actually use the type parameter; the effect is that every instantiation is the same.

Some more syntax examples:

type ListInt List[int]
var v1 List[int]
var v2 List[float]
type (
[T1, T2] MyMap map[T1]T2
[T3] MyChan chan T3
)
var v3 MyMap[int, string]

Using a type parameter with a function is similar.

func [T] Push(l *List[T], e T) *List[T] {
	return &List[T]{e, l}
}

As with parameterized types, we must specify the type arguments when we refer to a parameterized function (but see the section on type deduction, below).

var PushInt = Push[int] // Type is func(*List[int], int) *List[int]

A parameterized type can have methods.

func [T] (v *List[T]) Push(e T) {
	*v = &List[T]{e, v}
}

A method of a parameterized type must use the same number of type parameters as the type itself. When a parameterized type is instantiated, all of its methods are automatically instantiated too, with the same type arguments.

We do not permit a parameterized method for a non-parameterized type. We do not permit a parameterized method to a non-parameterized interface type.

A note on syntax

The use of square brackets to mark type parameters and the type arguments to use in instantiations is new to Go. We considered a number of different approaches:

  • Use angle brackets, as in Vector<int>. This has the advantage of being familiar to C++ and Java programmers. Unfortunately, it means that f<T>(true) can be parsed as either a call to function f<T> or a comparison of f<T (an expression that tests whether f is less than T) with (true). While it may be possible to construct complex resolution rules, the Go syntax avoids that sort of ambiguity for good reason.
  • Overload the dot operator again, as in Vector.int or Map.(int, string). This becomes confusing when we see Vector.(int), which could be a type assertion.
  • We considered using dot but putting the type first, as in int.Vector or (int, string).Map. It might be possible to make that work without ambiguity, but putting the types first seems to make the code harder to read.
  • An earlier version of this proposal used parentheses for names after types, as in Vector(int). However, that proposal was flawed because there was no way to specify types for parameterized functions, and extending the parentheses syntax led to MakePair(int, string)(1, "") which seems less than ideal.
  • We considered various different characters, such as backslash, dollar sign, at-sign or sharp. The square brackets grouped the parameters nicely and provide an acceptable visual appearance.
  • We considered a new keyword, gen, with parenthetical grouping of parameterized types and functions within the scope of a single gen. The grouping seemed un-Go-like and made indentation confusing. The current syntax is a bit more repetitive for methods of parameterized types, but is easier to understand.

Semantics

There are no restrictions on how parameterized types may be used in a parameterized function. However, the function can only be instantiated with type arguments that support the uses. In some cases the compiler will give an error for a parameterized function that can not be instantiated by any type argument, as described below.

Consider this example, which provides the boilerplate for sorting any slice type with a Less method.

type [T] SortableSlice []T
func [T] (v SortableSlice[T]) Len() int { return len(v) }
func [T] (v SortableSlice[T]) Swap(i, j int) {
	v[i], v[j] = v[j], v[i]
}
func [T] (v SortableSlice[T]) Less(i, j int) bool {
	return v[i].Less(v[j])
}
func [T] (v SortableSlice[T]) Sort() {
	sort.Sort(v)
}

We don’t have to declare anywhere that the type parameter T has a method Less. However, the call of the Less method tells the compiler that the type argument to SortableSlice must have a Less method. This means that trying to use SortableSlice[int] would be a compile-time error, since int does not have a Less method.

We can sort types that implement the < operator, like int, with a different vector type:

type [T] PSortableSlice []T
func [T] (v PSortableSlice[T]) Len() int { return len(v) }
func [T] (v PSortableSlice[T]) Swap(i, j int) {
	v[i], v[j] = v[j], v[i]
}
func [T] (v PSortableSlice[T]) Less(i, j int) bool {
	return v[i] < v[j]
}
func [T] (v PSortableSlice[T]) Sort() {
	sort.Sort(v)
}

The PSortableSlice type may only be instantiated with types that can be used with the < operator: numeric or string types. It may not be instantiated with a struct type, even if the struct type has a Less method.

Can we merge SortableSlice and PSortableSlice to have the best of both worlds? Not quite; there is no way to write a parameterized function that supports either a type with a Less method or a builtin type. The problem is that SortableSlice.Less can not be instantiated for a type without a Less method, and there is no way to only instantiate a method for some types but not others.

(Technical aside: it may seem that we could merge SortableSlice and PSortableSlice by having some mechanism to only instantiate a method for some type arguments but not others. However, the result would be to sacrifice compile-time type safety, as using the wrong type would lead to a runtime panic. In Go one can already use interface types and methods and type assertions to select behavior at runtime. There is no need to provide another way to do this using type parameters.)

All that said, one can at least write this:

type [T] Lessable T
func [T] (a Lessable[T]) Less(b T) bool {
	return a < b
}

Now one can use SortableSlice with a slice v of some builtin type by writing

	SortableSlice([]Lessable(v))

Note that although Lessable looks sort of like an interface, it is really a parameterized type. It may be instantiated by any type for which Lessable.Less can be compiled. In other words, one can write []Lessable(v) for a slice of any type that supports the < operator.

As mentioned above parameterized types can be used just like any other type. In fact, there is a minor enhancement. Ordinarily type assertions and type switches are only permitted for interface types. When writing a parameterized function, type assertions and type switches are permitted for type parameters. This is true even if the function is instantiated with a type argument that is not an interface type. Also, a type switch is permitted to have multiple parameterized type cases even if some of them are the same type after instantiation. The first matching case is used.

Cycles

The instantiation of a parameterized function may not require the instantiation of the same parameterized function with different type parameters. This means that a parameterized function may call itself recursively with the same type parameters, but it may not call itself recursively with different type parameters. This rule applies to both direct and indirect recursion.

For example, the following is invalid. If it were valid, it would require the construction of a type at runtime.

type [T] S struct { f T }
func [T] L(n int, e T) interface{} {
	if n == 0 {
		return e
	}
	return L(n-1, S[T]{e})
}

Type Deduction

When calling a parameterized function, as opposed to referring to it without calling it, the specific types to use may be omitted in some cases. A function call may omit the type arguments when every type parameter is used for a regular parameter, or, in other words, there are no type parameters that are used only for results. When a call is made to a parameterized function without specifying the type arguments, the compiler will walk through the arguments from left to right, comparing the actual type of the argument A with the type of the parameter P. If P contains type parameters, then A and P must be identical. The first time we see a type parameter in P, it will be set to the appropriate portion of A. If the type parameter appears again, it must be identical to the actual type at that point.

Note that at compile time the type argument may itself be a parameterized type, when one parameterized function calls another. The type deduction algorithm is the same. A type parameter of P may match a type parameter of A. Once this match is made, then every subsequent instance of the P type parameter must match the same A type parameter.

When doing type deduction with an argument that is an untyped constant, the constant does not determine anything about the type argument. The deduction proceeds with the remaining function arguments. If at the end of the deduction the type argument has not been determined, the constants that correspond to unknown type arguments are re-examined and given the type int, rune, float64, or complex128 as usual. Type deduction does not support passing an untyped nil constant; nil may only be used with an explicit type conversion (or, of course, the type arguments may be written explicitly).

When passing a parameterized function F1 to a non-parameterized function F2, type deduction runs the other way around: the type of the corresponding argument of F2 is used to deduce the type of F1.

When passing a parameterized function F1 to a parameterized function F2, the type of F1 is compared to the type of the corresponding argument of F2. This may yield specific types for F1 and/or F2 type parameters, for which the compiler proceeds as usual. If any of the type arguments of F1 are not determined, type deduction proceeds with the remaining arguments. At the end of the deduction, the compiler reconsiders F1 with the final set of types. At that point it is an error if all the type parameters of F1 are not determined. This is not an iterative algorithm; the compiler only reconsiders F1 once, it does not build a stack of retries if multiple parameterized functions are passed.

Type deduction also applies to composite literals, in which the type arguments for a parameterized composite type are deduced from the types of the literals.

Type deduction also applies to type conversions to a parameterized type. The type arguments for the type are deduced from the type of the expression being converted.

Examples:

func [T] Sum(a, b T) T { return a + b }
var v1 = Sum[int](0, 0)
var v2 = Sum(0, 0)			// [int] deduced
type [T] Cons struct { car, cdr T }
var v3 = Cons{0, 0}			// [int] deduced
type [T] Opaque T
func [T] (a Opaque[T]) String() string {
	return "opaque"
}
var v4 = []Opaque([]int{1})	// Opaque[int] deduced

var i int
var m1 = Sum(i, 0)		// i causes T to be deduced as int, 0 is
				// passed as int.
var m2 = Sum(0, i)		// 0 ignored on first pass, i causes T
				// to be deduced as int, 0 passed as int.
var m3 = Sum(1, 2.5)		// 1 and 2.5 ignored on first pass. On
				// second pass 1 causes T to be deduced as
				// int. Passing 2.5 is an error.
var m4 = Sum(2.5, 1)		// 2.5 and 1 ignored on first pass. On
				// second pass 2.5 causes T to be deduced
				// as float64. 1 converted to float64.

func [T1, T2] Transform(s []T1, f func(T1) T2) []T2
var s1 = []int{0, 1, 2}
	// Below, []int matches []T1 deducing T1 as int.
	// strconv.Itoa matches T1 as int as required,
	// T2 deduced as string. Type of s2 is []string.
var s2 = Transform(s1, strconv.Itoa)

func [T1, T2] Apply(f func(T1) T2, v T1) T2 { return f(v) }
func [T] Ident(v T) T { return v }
	// Below, Ident matches func(T1) T2, but neither T1 nor T2
	// are known. The compiler continues. Next i, type int,
	// matches T1, so T1 is int. The compiler returns to Ident.
	// T matches T1, which is int, so T is int. Then T matches
	// T2, so T2 is int. All type arguments are deduced.
func F(i int) int { return Apply(Ident, i) }

Note that type deduction requires types to be identical. This is stronger than the usual requirement when calling a function, namely that the types are assignable.

func [T] Find(s []T, e T) bool
type E interface{}
var f1 = 0
	// Below does not compile.  The first argument means that T is
	// deduced as E. f1 is type int, not E. f1 is assignable to
	// E, but not identical to it.
var f2 = Find([]E{f1}, f1)
	// Below does compile. Explicit type specification for Find
	// means that type deduction is not performed.
var f3 = Find[E]([]E{f1}, f1)

Requiring identity rather than assignability is to avoid any possible confusion about the deduced type. If different types are required when calling a function it is always possible to specify the types explicitly using the square bracket notation.

Examples

A hash table.

package hashmap

type [K, V] bucket struct {
	next *bucket
	key K
	val V
}

type [K] Hashfn func(K) uint
type [K] Eqfn func(K, K) bool

type [K, V] Hashmap struct {
	hashfn Hashfn[K]
	eqfn Eqfn[K]
	buckets []bucket[K, V]
	entries int
}

// This function must be called with explicit type arguments, as
// there is no way to deduce the value type.  For example,
// h := hashmap.New[int, string](hashfn, eqfn)
func [K, V] New(hashfn Hashfn[K], eqfn Eqfn[K]) *Hashmap[K, V] {
	// Type parameters of Hashmap deduced as [K, V].
	return &Hashmap{hashfn, eqfn, make([]bucket[K, V], 16), 0}
}

func [K, V] (p *Hashmap[K, V]) Lookup(key K) (val V, found bool) {
	h := p.hashfn(key) % len(p.buckets)
	for b := p.buckets[h]; b != nil; b = b.next {
		if p.eqfn(key, b.key) {
			return b.val, true
		}
	}
	return
}

func [K, V] (p *Hashmap[K, V]) Insert(key K, val V) (inserted bool) {
	// Implementation omitted.
}

Using the hash table.

package sample

import (
	"fmt"
	"hashmap"
	"os"
)

func hashint(i int) uint {
	return uint(i)
}

func eqint(i, j int) bool {
	return i == j
}

var v = hashmap.New[int, string](hashint, eqint)

func Add(id int, name string) {
	if !v.Insert(id, name) {
		fmt.Println(“duplicate id”, id)
		os.Exit(1)
	}
}

func Find(id int) string {
	val, found := v.Lookup(id)
	if !found {
		fmt.Println(“missing id”, id)
		os.Exit(1)
	}
	return val
}

Sorting a slice given a comparison function.

func [T] SortSlice(s []T, less func(T, T) bool) {
	sort.Sort(&sorter{s, less})
}

type [T] sorter struct {
	s []T
	less func(T, T) bool
}

func [T] (s *sorter[T]) Len() int { return len(s.s) }
func [T] (s *sorter[T]) Less(i, j int) bool {
	return s.less(s[i], s[j])
}
func [T] (s *sorter[T]) Swap(i, j int) {
	s.s[i], s.s[j] = s.s[j], s.s[i]
}

Sorting a numeric slice (also works for string).

// This can be successfully instantiated for any type T that can be
// used with <.
func [T] SortNumericSlice(s []T) {
	SortSlice(s, func(a, b T) bool { return a < b })
}

Merging two channels into one.

func [T] Merge(a, b <-chan T) <-chan T {
	c := make(chan T)
	go func() {
		for a != nil && b != nil {
			select {
			case v, ok := <-a:
				if ok {
					c <- v
				} else {
					a = nil
				}
			case v, ok := <-b:
				if ok {
					c <- v
				} else {
					b = nil
				}
			}
		}
		close(c)
	}()
	return c
}

Summing a slice.

// Works with any type that supports +.
func [T] Sum(a []T) T {
	var s T
	for _, v := range a {
		s += v
	}
	return s
}

A generic interface.

type [T] Equaler interface {
	Equal(T) bool
}

// Return the index in s of v1, or -1 if not found.
func [T] Find(s []T, v1 T) int {
	eq, eqok := v1.(Equaler[T])
	for i, v2 := range s {
		if eqok {
			if eq.Equal(v2) {
				return i
			}
		} else if reflect.DeepEqual(v1, v2) {
			return i
		}
	}
	return -1
}

type S []int

// Slice equality that treats nil and S{} as equal.
func (s1 S) Equal(s2 S) bool {
	if len(s1) != len(s2) {
		return false
	}
	for i, v1 := range s1 {
		if v1 != s2[i] {
			return false
		}
	}
	return true
}

var i = Find([]S{S{1, 2}}, S{1, 2})

Joining sequences; works for any T that supports len, copy to []byte, and conversion from []byte; in other words, works for []byte and string.

func [T] Join(a []T, sep T) T {
	if len(a) == 0 {
		return T([]byte{})
	}
	if len(a) == 1 {
		return a[0]
	}
	n := len(sep) * (len(a) - 1)
	for _, v := range a {
		n += len(v)
	}
	b := make([]byte, n)
	bp := copy(b, a[0])
	for _, v := range a[1:] {
		bp += copy(b[bp:], sep)
		bp += copy(b[bp:], v)
	}
	return T(b)
}

Syntax/Semantics Summary

That completes the description of the language changes. We now turn to implementation details. When considering this language proposal, consider it in two parts. First, make sure the syntax and semantics are clean, useful, orthogonal, and in the spirit of Go. Second, make sure that the implementation is doable and acceptably efficient. I want to stress the two different parts because the implementation proposal is complex. Do not let the complexity of the implementation influence your view of the syntax and semantics. Most users of Go will not need to understand the implementation.

Comparison to other languages

C

Type parameters in C are implemented via preprocessor macros. The system described here can be seen as a macro system. However, unlike in C, each parameterized function must be complete and compilable by itself. The result is in some ways less powerful than C preprocessor macros, but does not suffer from problems of namespace conflict and does not require a completely separate language (the preprocessor language) for implementation.

C++

The system described here can be seen as a subset of C++ templates. Go’s very simple name lookup rules mean that there is none of the confusion of dependent vs. non-dependent names. Go’s lack of function overloading removes any concern over just which instance of a name is being used. Together these permit the explicit accumulation of constraints when compiling a generalized function, whereas in C++ where it’s nearly impossible to determine whether a type may be used to instantiate a template without effectively compiling the instantiated template and looking for errors (or using concepts, proposed for later addition to the language). Also, since instantiating a parameterized types always instantiates all methods, there can’t be any surprises as can arise in C++ when code separate from both the template and the instantiation calls a previously uncalled method.

C++ template metaprogramming uses template specialization, non-type template parameters, variadic templates, and SFINAE to implement a Turing complete language accessible at compile time. This is very powerful but at the same time has significant complexities: the template metaprogramming language has a baroque syntax, no variables or non-recursive loops, and is in general completely different from non-template C++. The system described here does not support anything similar to template metaprogramming for Go. I believe this is a feature. I think the right way to implement such features in Go would be to add support in the go tool for writing Go code to generate Go code, most likely using the go/ast package and friends, which is in turn compiled into the final program. This would mean that the metaprogramming language in Go is itself Go.

Java

I believe this system is slightly more powerful than Java generics, in that it permits direct operations on basic types without requiring explicit methods (that is, the methods are in effect generated automatically). This system also does not use type erasure. Type boxing is minimized. On the other hand there is of course no function overloading, and there is nothing like covariant return types.

Type Checking

A parameterized type is valid if there is at least one set of type arguments that can instantiate the parameterized type into a valid non-parameterized type. This means that type checking a parameterized type is the same as type checking a non-parameterized type, but the type parameters are assumed to be valid.

type [T] M1 map[T][]byte		// Valid.
type [T] M2 map[[]byte]T		// Invalid.  Slices can not
					// be map keys.

A parameterized function is valid if the values of parameterized types are used consistently. Here we describe consistency checking that may be performed while compiling the parameterized function. Further type checking will occur when the function is instantiated.

It is not necessary to understand the details of these type checking rules in order to use parameterized functions. The basic idea is simple: a parameterized function can be used by replacing all the type parameters with type arguments. I originally thought it would be useful to describe an exact set of rules so that all compilers would be consistent in rejecting parameterized functions that can never be instantiated by any type argument. However, I now think this becomes too strict. We don’t want to say that a future, smarter, compiler must accept a parameterized function that can never be instantiated even if this set of rules permits it. I don’t think complete consistency of handling of invalid programs is essential.

These rules are still useful as a guide to compiler writers.

Each parameterized function will use a set of types unknown at compile time. The initial set of those types will be the type parameters. Analyzing the function will add new unknown types. Each unknown type will be annotated to indicate how it is determined from the type parameters.

In the following discussion an unknown type will start with U, a known type with K, either known or unknown with T, a variable or expression of unknown type will start with v, an expression with either known or unknown type will start with e.

Type literals that use unknown types produce unknown types. Each identical type literal produces the same unknown type, different type literals produce different unknown types.

The new unknown types will be given the obvious annotation: []U is the type of a slice of the already identified type U, and so forth. Each unknown type may have one or more restrictions, listed below.

  • []U
    • indexable with value type U
    • sliceable with result type U
  • [N]U (for some constant expression N)
    • indexable with value type U
    • sliceable with result type []U ([]U is a new unknown type)
  • *U
    • points to type U
  • map[T1]T2 (assuming either T1 or T2 is unknown)
    • indexable with value type T2
    • map type with value type T2
  • struct { ... f U ... }
    • has field or method f of type U
    • composite
  • interface { ... F(anything) U ... }
  • func ( ... U ... ) (anything) or func (anything) ( ... U ...)
    • callable
  • chan U
    • chan of type U

Each use of an unknown type as the type of a composite literal adds the restriction composite.

Each expression using a value v of unknown type U may produce a value of some known type, some previously seen unknown type, or a new unknown type. A use of a value that produces a value of a new unknown type may add a restriction to U.

  • v.F, U.F
    • If U has the restriction has field or method F of type U2 then the type of this expression is U2.
    • Otherwise a new unknown type U2 is created annotated as the type of U.F, U gets the restriction has field or method F of type U2, and the type of the expression is U2.
  • v[e]
    • If U has the restriction indexable with value type U2, then the type of the expression is U2.
    • If the type of e is known, and it is not integer, then a new unknown type U2 is created, U gets the restrictions indexable with value type U2 and map type with value type U2 and the type of the result is U2.
    • Otherwise a new unknown type U2 is created annotated as the element type of U, U gets the restriction indexable with value type U2, and the type of the result is U2.
  • e[v] (where the type of e is known)
    • If the type of e is slice, string, array, or pointer to array, then U gets the restriction integral.
    • If the type of e is a map type, then U gets the restriction comparable.
    • Otherwise this is an error, as usual.
  • v[e1:e2] or v[e1:e2:e3]
    • If any of the index expressions have unknown type, those unknown types get the restriction integral.
    • If U has the restriction sliceable with result type U2, then the type of the result is U2.
    • Otherwise a new unknown type U2 is created annotated as the slice type of U, U gets the restriction sliceable with result type U2, and the type of the result is U2. (In many cases U2 is the same as U, but not if U is an array type.)
  • v.(T)
    • Does not introduce any restrictions; type of value is T.
  • v1(e2)
    • This is a function call, not a type conversion.
    • U1 gets the restriction callable.
    • Does not introduce any restrictions on the arguments.
    • If necessary, new unknown types are introduced for the result types, annotated as the type of the corresponding result parameter.
  • e1(v2)
    • This is a function call, not a type conversion.
    • If e1 is known to be a parameterized function, and any of the arguments have unknown type, then any restrictions on e1’s type parameters are copied to the unknown types of the corresponding arguments.
  • e1(v2...)
    • This is the case with an actual ellipsis in the source code.
    • e1 is handled as though the ellipsis were not present.
    • If U2 does not already have the restriction sliceable, a new unknown type U3 is created, annotated as the element type of U2, and U2 gets the restriction sliceable with result type U3.
  • v1 + e2, e2 + v1
    • U1 gets the restriction addable.
    • As usual, the type of the expression is the type of the first operand.
  • v1 {-,*,/} e2, e2 {-,*,/} v1
    • U1 gets the restriction numeric.
    • Type of expression is type of first operand.
  • v1 {%,&,|,^,&^,<<,>>} e2, e2 {%,&,|,^,&^,<<,>>} v1
    • U1 gets the restriction integral.
    • Type of expression is type of first operand.
  • v1 {==,!=} e2, e2 {==,!=} v1
    • U1 gets the restriction comparable; expression has untyped boolean value.
  • v1 {<,<=,>,>=} e2, e2 {<,<=,>,>=} v1
    • U1 gets the restriction ordered; expression has untyped boolean value.
  • v1 {&&,||} e2, e2 {&&,||} v1
    • U1 gets the restriction boolean; type of expression is type of first operand.
  • !v
    • U gets the restriction boolean; type of expression is U.
  • &v
    • Does not introduce any restrictions on U.
    • Type of expression is new unknown type as for type literal *U.
  • *v
    • If U has the restriction points to type U2, then the type of the expression is U2.
    • Otherwise a new unknown type U2 is created annotated as the element type of U, U gets the restriction points to type U2, and the type of the result is U2.
  • <-v
    • If U has the restriction chan of type U2, then the type of the expression is U2.
    • Otherwise a new unknown type U2 is created annotated as the element type of U, U gets the restriction chan of type U2, and the type of the result is U2.
  • U(e)
    • This is a type conversion, not a function call.
    • If e has a known type K, U gets the restriction convertible from K.
    • The type of the expression is U.
  • T(v)
    • This is a type conversion, not a function call.
    • If T is a known type, U gets the restriction convertible to T.
    • The type of the expression is T.

Some statements introduce restrictions on the types of the expressions that appear in them.

  • v <- e
    • If U does not already have a restriction chan of type U2, then a new type U2 is created, annotated as the element type of U, and U gets the restriction chan of type U2.
  • v++, v--
    • U gets the restriction numeric.
  • v = e (may be part of tuple assignment)
    • If e has a known type K, U gets the restriction assignable from K.
  • e = v (may be part of tuple assignment)
    • If e has a known type K, U gets the restriction assignable to K.
  • e1 op= e2
    • Treated as e1 = e1 op e2.
  • return e
    • If return type is known, treated as an assignment to a value of the return type.

The goal of the restrictions listed above is not to try to handle every possible case. It is to provide a reasonable and consistent approach to type checking of parameterized functions and preliminary type checking of types used to instantiate those functions.

It’s possible that future compilers will become more restrictive; a parameterized function that can not be instantiated by any type argument is invalid even if it is never instantiated, but we do not require that every compiler diagnose it. In other words, it’s possible that even if a package compiles successfully today, it may fail to compile in the future if it defines an invalid parameterized function.

The complete list of possible restrictions is:

  • addable
  • integral
  • numeric
  • boolean
  • comparable
  • ordered
  • callable
  • composite
  • points to type U
  • indexable with value type U
  • sliceable with value type U
  • map type with value type U
  • has field or method F of type U
  • chan of type U
  • convertible from U
  • convertible to U
  • assignable from U
  • assignable to U

Some restrictions may not appear on the same type. If some unknown type has an invalid pair of restrictions, the parameterized function is invalid.

  • addable, integral, numeric are invalid if combined with any of
    • boolean, callable, composite, points to, indexable, sliceable, map type, chan of.
  • boolean is invalid if combined with any of
    • comparable, ordered, callable, composite, points to, indexable, sliceable, map type, chan of.
  • comparable is invalid if combined with callable.
  • ordered is invalid if combined with any of
    • callable, composite, points to, map type, chan of.
  • callable is invalid if combined with any of
    • composite, points to, indexable, sliceable, map type, chan of.
  • composite is invalid if combined with any of
    • points to, chan of.
  • points to is invalid if combined with any of
    • indexable, sliceable, map type, chan of.
  • indexable, sliceable, map type are invalid if combined with chan of.

If one of the type parameters, not some generated unknown type, has the restriction assignable from T or assignable to T, where T is a known named type, then the parameterized function is invalid. This restriction is intended to catch simple errors, since in general there will be only one possible type argument. If necessary such code can be written using a type assertion.

As mentioned earlier, type checking an instantiation of a parameterized function is conceptually straightforward: replace all the type parameters with the type arguments and make sure that the result type checks correctly. That said, the set of restrictions computed for the type parameters can be used to produce more informative error messages at instantiation time. In fact, not all the restrictions are used when compiling the parameterized function, but they will still be useful at instantiation time.

Implementation

This section describes a possible implementation that yields a good balance between compilation time and execution time. The proposal in this section is only a suggestion.

In general there are various possible implementations that yield the same syntax and semantics. For example, it is always possible to implement parameterized functions by generating a new copy of the function for each instantiation, where the new function is created by replacing the type parameters with the type arguments. This approach would yield the most efficient execution time at the cost of considerable extra compile time and increased code size. It’s likely to be a good choice for parameterized functions that are small enough to inline, but it would be a poor tradeoff in most other cases. This section describes one possible implementation with better tradeoffs.

Type checking a parameterized function produces a list of unknown types, as described above. Create a new interface type for each unknown type. For each use of a value of that unknown type, add a method to the interface, and rewrite the use to be a call to the method. Compile the resulting function.

Callers of the function will see a list of unknown types with corresponding interfaces, with a description for each method. The unknown types will all be annotated to indicate how they are derived from the type arguments. Given the type arguments used to instantiate the function, the annotations are sufficient to determine the real type corresponding to each unknown type.

For each unknown type, the caller will construct a new copy of the type argument. For each method description for that unknown type, the caller will compile a method for the new type. The resulting type will satisfy the interface type that corresponds to the unknown type.

If the type argument is itself an interface type, the new copy of the type will be a struct type with a single member that is the type argument, so that the new copy can have its own methods. (This will require slight but obvious adjustments in the instantiation templates shown below.) If the type argument is a pointer type, we grant a special exception to permit its copy to have methods.

The call to the parameterized function will be compiled as a conversion from the arguments to the corresponding new types, and a type assertion of the results from the interface types to the type arguments.

We will call the unknown types Un, the interface types created while compiling the parameterized function In, the type arguments used in the instantiation An, and the newly created corresponding types Bn. Each Bn will be created as though the compiler saw type Bn An followed by appropriate method definitions (modified as described above for interface and pointer types).

To show that this approach will work, we need to show the following:

  • Each operation using a value of unknown type can be implemented as a call to a method M on an interface type I.
  • We can describe each M for each I in such a way that we can instantiate the methods for any valid type argument; for simplicity we can describe these methods as templates in the form of Go code, and we call them instantiation templates.
  • All valid type arguments will yield valid method implementations.
  • All invalid type arguments will yield some invalid method implementation, thus causing an appropriate compilation error. (Frankly this description does not really show that; I’d be happy to see counter-examples.)

Simple expressions

Simple expressions turn out to be easy. For example, consider the expression v.F where v has some unknown type U1, and the expression has the unknown type U2. Compiling the original function will generate interface types I1 and I2.

Add a method $FieldF to I1 (here I’m using $ to indicate that this is not a user-callable method; the actual name will be generated by the compiler and never seen by the user). Compile v.F as v.$FieldF() (while compiling the code, v has type I1). Write out an instantiation template like this:

func (b1 *B1) $FieldF() I2 { return B2(A1(*b1).F) }

When the compiler instantiates the parameterized function, it knows the type arguments that correspond to U1 and U2. It has defined new names for those type arguments, B1 and B2, so that it has something to attach methods to. The instantiation template is used to define the method $FieldF by simply compiling the method in a scope such that A1, B1, and B2 refer to the appropriate types.

The conversion of *b1 (type B1) will always succeed, as B1 is simply a new name for A1.

The reference to field (or method) F will succeed exactly when B1 has a field (or method) F; that is the correct semantics for the expression v.F in the original parameterized function. The conversion to type B2 will succeed when F has the type A2. The conversion of the return value from type B2 to type I2 will always succeed, as B2 implements I2 by construction.

Returning to the parameterized function, the type of v.$FieldF() is I2, which is correct since all references to the unknown type U2 are compiled to use the interface type I2.

An expression that uses two operands will take the second operand as a parameter of the appropriate interface type. The instantiation template will use a type assertion to convert the interface type to the appropriate type argument. For example, v1[v2], where both expressions have unknown type, will be converted to v1.$Index(v2) and the instantiation template will be

func (b1 *B1) $Index(i2 I2) I3 { return B3(A1(*b1)[A2(*i2.(*B2))]) }

The type conversions get admittedly messy, but the basic idea is as above: convert the Bn values to the type arguments An, perform the operation, convert back to Bn, and finally return as type In. The method takes an argument of type I2 as that is what the parameterized function will use; the type assertion to *B2 will always succeed.

This same general procedure works for all simple expressions: index expressions, slice expressions, relational operators, arithmetic operators, indirection expressions, channel receives, method expressions, method values, conversions.

To be clear, each expression is handled independently, regardless of how it appears in the original source code. That is, a + b - c will be translated into two method calls, something like a.$Plus(b).$Minus(c) and each method will have its own instantiation template.

Untyped constants

Expressions involving untyped constants may be implemented by creating a specific method for the specific constants. That is, we can compile v + 10 as v.$Add10(), with an instantiation template

func (b1 *B1) $Add10() I1 { return B1(A1(*b1) + 10) }

Another possibility would be to compile it as v.$AddX(10) and

func (b1 *B1) $AddX(x int64) { return B1(A1(*b1) + A1(x)) }

However, this approach in general will require adding some checks in the instantiation template so that code like v + 1.5 is rejected if the type argument of v is not a floating point or complex type.

Logical operators

The logical operators && and || will have to be expanded in the compiled form of the parameterized function so that the operands will be evaluated only when appropriate. That is, we can not simply replace && and || of values of unknown types with method calls, but must expand them into if statements while retaining the correct order of evaluation for the rest of the expression. In the compiler this can be done by rewriting them using a compiler-internal version of the C ?: ternary operator.

Address operator

The address operator requires some additional attention. It must be combined with the expression whose address is being taken. For example, if the parameterized function has the expression &v[i], the compiler must generate a $AddrI method, with an instantiation template like

func (b1 *B1) $AddrI(i2 I2) I3 { return B3(&A1(*b1)[A2(i2.(*B2))]) }

Type assertions

Type assertions are conceptually simple, but as they are permitted for values of unknown type they require some additional attention in the instantiation template. Code like v.(K), where K is a known type, will be compiled to a method call with no parameters, and the instantiation template will look like

func (b1 B1) $ConvK() K {
	a1 := A1(b1)
	var e interface{} = a1
	return e.(K)
}

Introducing e avoids an invalid type assertion of a non-interface type.

For v.(U2) where U2 is an unknown type, the instantiation template will be similar:

func (b1 B1) $ConvU() I2 {
	a1 := A1(b1)
	var e interface{} = a1
	return B2(e.(A2))
}

This will behave correctly whether A2 is an interface or a non-interface type.

Function calls

A call to a function of known type requires adding implicit conversions from the unknown types to the known types. Those conversions will be implemented by method calls as described above. Only conversions valid for function calls should be accepted; these are the set of conversions valid for assignment statements, described below.

A call to a function of unknown type can be implemented as a method call on the interface type holding the function value. Multiple methods may be required if the function is called multiple times with different unknown types, or with different numbers of arguments for a variadic function. In each case the instantiation template will simply be a call of the function, with the appropriate conversions to the type arguments of the arguments of unknown type.

A function call of the form F1(F2()) where neither function is known may need a method all by itself, since there is no way to know how many results F2 returns.

Composite literals

A composite literal of a known type with values of an unknown type can be handled by inserting implicit type conversions to the appropriate known type.

A composite literal of an unknown type can not be handled using the mechanisms described above. The problem is that there is no interface type where we can attach a method to create the composite literal. We need some value of type Bn with a method for us to call, but in the general case there may not be any such value.

To implement this we require that the instantiation place a value of an appropriate interface type in the function’s closure. This can always be done as generalized functions only occur at top-level, so they do not have any other closure (function literals are discussed below). We compile the code to refer to a value $imaker in the closure, with type Imaker. The instantiation will place a value with the appropriate type Bmaker in the function instantiation's closure. The value is irrelevant as long as it has the right type. The methods of Bmaker will, of course, be those of Imaker. Each different composite literal in the parameterized function will be a method of Imaker.

A composite literal of an unknown type without keys can then be implemented as a method of Imaker whose instantiation template simply returns the composite literal, as though it were an operator with a large number of operands.

A composite literal of an unknown type with keys is trickier. The compiler must examine all the keys.

  • If any of the keys are expressions or constants rather than simple names, this can not be a struct literal. We can generate a method that passes all the keys and values, and the instantiation template can be the composite literal using those keys and values. In this case if one of the keys is an undefined name, we can give an error while compiling the parameterized function.
  • Otherwise, if any of the names are not defined, this must be a struct literal. We can generate a method that passes the values, and the instantiation template can be the composite literal with the literal names and the value arguments.
  • Otherwise, we call a method passing all the keys and values. The instantiation template is the composite literal with the key and value arguments. If the type argument is a struct, the generated method will ignore the key values passed in.

For example, if the parameterized function uses the composite literal U{f: g} and there is a local variable named f, this is compiled into imaker.$CompLit1(f, g), and the instantiation template is

func (bm Bmaker) $CompLit1(f I1, g I2) I3 {
	return bm.$CompLit2(A1(f.(B1)), A2(g.(B2)))
}
func (Bmaker) $CompLit2(f A1, g A2) I3 { return B3(A3{f: g}) }

If A3, the type argument for U, is a struct, then the parameter f is unused and the f in the composite literal refers to the field f of A3 (it is an error if no such field exists). If A3 is not a struct, then f must be an appropriate key type for A3 and the value is used.

Function literals

A function literal of known type may be compiled just like any other parameterized function. If a maker variable is required for constructs like composite literals, it may be passed from the enclosing function’s closure to the function literal’s closure.

A function literal of unknown type requires that the function have a maker variable, as for composite literals, above. The function literal is compiled as a parameterized function, and parameters of unknown type are received as interface types as we are describing. The type of the function literal will itself be an unknown type, and will have corresponding real and interface types just like any other unknown type. Creating the function literal value requires calling a method on the maker variable. That method will create a function literal of known type that simply calls the compiled form of the function literal.

For example:

func [T] Counter() func() T {
	var c T
	return func() T {
		c++
		return c
	}
}

This is compiled using a maker variable in the closure. The unknown type T will get an interface type, called here I1; the unknown type func() T will get the interface type I2. The compiled form of the function will call a method on the maker variable, passing a closure, something along the lines of

type CounterTClosure struct { c *I1 }
func CounterT() I2 {
	var c I1
	closure := CounterTClosure{&c}
	return $bmaker.$fnlit1(closure)
}

The function literal will get its own compiled form along the lines of

func fnlit1(closure CounterTClosure) I1 {
	(*closure.c).$Inc()
	return *closure.c
}

The compiled form of the function literal does not have to correspond to any particular function signature, so it’s fine to pass the closure as an ordinary parameter.

The compiler will also generate instantiation templates for callers of Counter.

func (Bmaker) $fnlit1(closure struct { c *I1}) I2 {
	return func() A1 {
		i1 := fnlit1(closure)
		b1 := i1.(B1)
		return A1(b1)
	}
}

func (b1 *B1) $Inc() {
	a1 := A1(*b1)
	a1++
	*b1 = B1(a1)
}

This instantiation template will be compiled with the type argument A1 and its method-bearing copy B1. The call to Counter will use an automatically inserted type assertion to convert from I2 to the type argument B2 aka func() A1. This gives us a function literal of the required type, and tracing through the calls above shows that the function literal behaves as it should.

Statements

Many statements require no special attention when compiling a parameterized function. A send statement is compiled as a method on the channel, much like a receive expression. An increment or decrement statement is compiled as a method on the value, as shown above. A switch statement may require calling a method for equality comparison, just like the == operator.

Assignment statements

Assignment statements are straightforward to implement but require a bit of care to implement the proper type checking. When compiling the parameterized function it's impossible to know which types may be assigned to any specific unknown type. The type checking could be done using annotations of the form U1 must be assignable to U2, but here I’ll outline a method that requires only instantiation templates.

Assignment from a value of one unknown type to the same unknown type is just an ordinary interface assignment.

Otherwise assignment is a method on the left-hand-side value (which must of course be addressable), where the method is specific to the type on the right hand side.

func (b1 *B1) $AssignI2(i2 I2) {
	var a1 A1 = A2(i2.(B2))
	*b1 = B1(a1)
}

The idea here is to convert the unknown type on the right hand side back to its type argument A2, and then assign it to a variable of the type argument A1. If that assignment is not valid, the instantiation template can not be compiled with the type arguments, and the compiler will give an error. Otherwise the assignment is made.

Return statements are implemented similarly, assigning values to result parameters. The code that calls the parameterized function will handle the type conversions at the point of the call.

Range clauses

A for statement with a range clause may not know anything about the type over which it is ranging. This means that range clauses must in general be implemented using compiler built-in functions that are not accessible to ordinary programs. These will be similar to the runtime functions that the compiler already uses. A statement:

	for v1 := range v2 {}

could be compiled as something like:

	for v1, it, f := v2.$init(); !f; v1, f = v2.$next(it) {}

with instantiation templates that invoke compiler built-in functions:

func (b2 B2) $init() (I1, I3, bool) {
	return $iterinit(A2(b2))
}
func (b2 B2) $next(I3) (I1, bool) {
	return $iternext(A2(b2), I3.(A3))
}

Here I’ve introduced another unknown type I3 to represent the current iteration state.

If the compiler knows something specific about the unknown type, then more efficient techniques can be used. For example, a range over a slice could be written using $Len and $Index methods.

Type switches

Type switches, like type assertions, require some attention because the value being switched on may have a non-interface type argument. The instantiation method will implement the type switch proper, and pass back the index of the select case. The parameterized function will do a switch on that index to choose the code to execute.

func [T] Classify(v T) string {
	switch v.(type) {
	case []byte:
		return “slice”
	case string:
		return “string”
	default:
		return “unknown”
	}
}

The parameterized function is compiled as

func ClassifyT(v I1) string {
	switch v.$Type1() {
	case 0:
		return “slice”
	case 1:
		return “string”
	case 2:
		return “unknown”
	}
}

The instantiation template will be

func (b1 B1) $Type1() int {
	var e interface{} = A1(b1)
	switch e.(type) {
	case []byte:
		return 0
	case string
		return 1
	default
		return 2
	}
}

The instantiation template will have to be compiled in an unusual way: it will have to permit duplicate types. That is because a type switch that uses unknown types in the cases may wind up with the same type in multiple cases. If that happens the first matching case should be used.

Select statements

Select statements will be implemented much like type switches. The select statement proper will be in the instantiation template. It will accept channels and values to send as required. It will return an index indicating which case was chosen, and a receive value (an empty interface) and a bool value. The effect will be fairly similar to reflect.Select.

Built-in functions

Most built-in functions when called with unknown types are simply methods on their first argument: append, cap, close, complex, copy, delete, imag, len, real. Other built-in functions require no special handling for parameterized functions: panic, print, println, recover.

The built-in functions make and new will be implemented as methods on a special maker variable, as described above under composite literals.

Methods of parameterized types

A parameterized type may have methods, and those methods may have arguments and results of unknown type. Any instantiation of the parameterized type must have methods with the appropriate type arguments. That means that the compiler must generate instantiation templates that will serve as the methods of the type instantiation. Those templates will call the compiled form of the method with the appropriate interface types.

type [T] Vector []T
func [T] (v Vector[T]) Len() int { return len(v) }
func [T] (v Vector[T]) Index(i int) T { return v[i] }

type Readers interface {
	Len() int
	Index(i int) io.Reader
}

type VectorReader struct { Vector[io.Reader] }
var _ = VectorReader{}.(Readers)

In this example, the type VectorReader inherits the methods of the embedded field Vector[io.Reader] and therefore implements the non-parameterized interface type Readers. When implementing this, the compiler will assign interface types for the unknown types T and []T; here those types will be I1 and I2, respectively. The methods of the parameterized type Vector will be compiled as ordinary functions:

func $VectorLen(i2 I2) int { return i2.$len() }
func $VectorIndex(i2 I2, i int) I1 { return i2.$index(i) }

The compiler will generate instantiation templates for the methods:

func (v Vector) Len() int { return $VectorLen(I2(v)) }
func (v Vector) Index(i int) A1 {
return A1($VectorIndex(I2(v), i).(B1))
}

The compiler will also generate instantiation templates for the methods of the type B2 that corresponds to the unknown type []T.

func (b2 B2) $len() int { return len(A2(b2)) }
func (b2 B2) $index(i int) I1 { return B1(A2(b2)[i]) }

With an example this simple there is a lot of effort for no real gain, but this does show how the compiler can use the instantiation templates to define methods of the correct instantiated type while the bulk of the work is still done by the parameterized code using interface types.

Implementation summary

I believe that covers all aspects of the language and shows how they may be implemented in a manner that is reasonably efficient both in compile time and execution time. There will be code bloat in that instantiation templates may be compiled multiple times for the same type, but the templates are, in general, small. Most are only a few instructions. There will be run time cost in that many operations will require a method call rather than be done inline. This cost will normally be small. Where it is significant, it will always be possible to manually instantiate the function for the desired type argument.

While the implementation technique described here is general and covers all cases, real compilers are likely to implement a blend of techniques. Small parameterized functions will simply be inlined whenever they are called. Parameterized functions that only permit a few types, such as the Sum or Join examples above, may simply be compiled once for each possible type in the package where they are defined, with callers being compiled to simply call the appropriate instantiation.

Implementing type parameters using interface methods shows that type parameters can be viewed as implicit interfaces. Rather than explicitly defining the methods of a type and then calling those methods, type parameters implicitly define an interface by the way in which values of that type are used.

In order to get good stack tracebacks and a less confusing implementation of runtime.Caller, it will probably be desirable to, by default, ignore the methods generated from instantiation templates when unwinding the stack. However, it might be best if they could influence the reporting of the parameterized function in a stack backtrace, so that it could indicate that types being used. I don’t yet know if that would be helpful or feasible.

Deployment

This proposal is backward compatible with Go 1, in that all Go 1 programs will continue to compile and run identically if this proposal is adopted. That leads to the following proposal.

  • Add support for type parameters to a future Go release 1.n, but require a command line option to use them. This will let people experiment with the new facility.
  • Add easy support for that command line option to the go tool.
  • Add a // +build constraint for the command line option.
  • Try out modified versions of standard packages where it seems useful, putting the new versions under the exp directory.
  • Decide whether to keep the facility for Go 2, in which the standard packages would be updated.

In the standard library, the most obvious place where type parameters would be used is to introduce compile-time-type-safe containers, like container/list but with the type of the elements known at compile time. It would also be natural to add to the sort package to make it easier to sort slices with less boilerplate. Other new packages would be algorithms (find the max/min/average of a collection, transform a collection using a function), channels (merge channels into one, multiplex one channel into many), maps (copy a map).

Type parameters could be used to partially unify the bytes and strings packages. However, the implementation would be based on using an unknown type that could be either []byte or string. Values of unknown type are passed as interface values. Neither []byte nor string fits in an interface value, so the values would have be passed by taking their address. Most of the functions in the package are fairly simple; one would only want to unify them if they could be inlined, or if escape analysis were smart enough to avoid pushing the values into the heap, or if the compiler were smart enough to see that only two types would work and to compile both separately.

Similar considerations apply to supporting a parameterized Writer interface that accepts either []byte or string. On the other hand, if the compiler has the appropriate optimizations, it would be convenient to write unified implementations for Write and WriteString methods.

The perhaps surprising conclusion is that type parameters permit new kinds of packages, but need not lead to significant changes in existing packages. Go does after all already support generalized programming, using interfaces, and the existing packages were designed around that fact. In general they already work well. Adding type parameters does not change that. It opens up the ability to write new kinds of packages, ones that have not been written to date because they are not well supported by interfaces.

Summary

I think this is the best proposal so far. However, it will not be adopted.

The syntax still needs work. A type is defined as type [T] Vector []T but is used as Vector[int], which means that the brackets are on the left in the definition but on the right in the use. It would be much nicer to write type Vector[T] []T, but that is ambiguous with an array declaration. That suggests the possibility of using double square brackets, as in Vector[[int]], or perhaps some other character(s).

The type deduction rules are too complex. We want people to be able to easily use a Transform function, but the rules required to make that work without explicitly specifying type parameters are very complex. The rules for untyped constants are also rather hard to follow. We need type deduction rules that are clear and obvious, so that there is no confusion as to which type is being used.

The implementation description is interesting but very complicated. Is any compiler really going to implement all that? It seems likely that any initial implementation would just use macro expansion, and unclear whether it would ever move beyond that. The result would be increased compile times and code bloat.