Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Go 2: spec: generic parameterization of array sizes #44253

Open
ajwerner opened this issue Feb 13, 2021 · 19 comments
Open

proposal: Go 2: spec: generic parameterization of array sizes #44253

ajwerner opened this issue Feb 13, 2021 · 19 comments

Comments

@ajwerner
Copy link

@ajwerner ajwerner commented Feb 13, 2021

See detailed design here. The recent acceptance of #43651, we can now begin discussing filling in the gaps and omissions of that proposal. This issue provides a proposal to fulfill the following stated omission in the type parameter proposal:

No parameterization on non-type values such as constants. This arises most obviously for arrays, where it might sometimes be convenient to write type Matrix[n int] [n][n]float64. It might also sometimes be useful to specify significant values for a container type, such as a default value for elements.

The structure of the proposal boils down to two extensions:

  1. Expression of all array types inside of type lists

Today one can specify a type constraint that a type is an array of some specific, constant size. For example:

type Array8[T any] interface {
    type [8]T
}

Similarly one can enumerate a set of array sizes:

type ArraysOfSomeSizes[T any] interface {
    type [2]T, [4]T, [8]T, [16]T
}

But, there's no way to capture the idea of all arrays of a given type. This proposal proposes the following syntax to capture that idea:

type Array[T any] interface {
    type [...]T
}

This syntax is familiar as it is used for length inference in array literals.

  1. Permit constant expressions to determine the length of an array type using len([...]T)

I'm not wedded in any way to the function with this behavior actually being len. It could be some
new thing like ArrayLen([...]T) in some package somewhere.

Today go provides both constant expressions which take types (see unsafe.Sizeof) and more generally
builtin functions which take types. Secondly, the go compiler already statically computes array lengths
at compile time when it can.

Constant expressions can be already be used to size arrays.

Taken together, one can imagine using this constant expression over some parameterized array type to
size other array types.

A detailed discussion with examples, background and motivation can be found in this proposal.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Feb 13, 2021

Thanks. I'm not quite sure about the constant expression bit; it's important to not constrain the compiler implementation. In any case I think we have our hands full with the current proposal. I'm going to put this on hold for now for consideration as a future language extension.

@ajwerner
Copy link
Author

@ajwerner ajwerner commented Feb 13, 2021

Thanks for giving it a look.

I'm not quite sure about the constant expression bit; it's important to not constrain the compiler implementation.

Look forward to working out the implications here for the compiler when the time comes. Will give it a rest for 6-9mo and see how the rest of the type parameter implementation shakes out.

ajwerner added a commit to ajwerner/proposal that referenced this issue Feb 15, 2021
@gopherbot
Copy link

@gopherbot gopherbot commented Mar 12, 2021

Change https://golang.org/cl/301290 mentions this issue: Add proposal for parameterized generic array sizes

gopherbot pushed a commit to golang/proposal that referenced this issue Mar 16, 2021
Detailed proposal for golang/go#44253.

Change-Id: I9e37363a4d5c41de100def1222d50ff6ad09d078
GitHub-Last-Rev: 8e3bc88
GitHub-Pull-Request: #33
Reviewed-on: https://go-review.googlesource.com/c/proposal/+/301290
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@rogpeppe
Copy link
Contributor

@rogpeppe rogpeppe commented Apr 16, 2021

A few thoughts:

  • I came up with the exact same [...]T notation independently, which seems like a reasonable indicator that it's a natural fit.
  • there's no need for a new len([...]T) syntax: len(T{}) is already valid and sufficient (as is len((*T)(nil))). In general, len takes a value expression, not a type expression, so I think that's fine.
  • I don't think that making len(T{}) into a const expression will constrain the compiler implementation. The generated code already needs to know the sizes of all type parameters so it can compute field and slice offsets, and I don't think this is any different.

With regard to the last point, I don't see a huge difference between:

type Array[T any] interface {
	[...]T
}
func concatArrays[T any, A, B Array[T]](a A, b B) [len(A{}) + len(B{})]T {
	var r[len(A{}] + len(B{})]T
	copy(r[:], a[:])
	copy(r[len(a):], b[:])
	return r
}

and:

func mkPair[A, B any](a A, b B) struct{T0 A; T1 B} {
	return struct{T0 A; T1 B}{
		T0: a,
		T1: b,
	}
}

They're both forming a new type with a size dependent on the sizes of two type parameters.

The main thing that concerns me about this proposal is that there is considerable potential for code bloat if the feature is used a lot, particularly if the stenciling or GC stenciling implementation approach is used. But there's also potential for the same kind of issue even without generic arrays so maybe that's not a good argument.

It's just a bit easier with arrays. For example:

const n = 1<<30

func F[T interface {[...]byte}]() int {
	if len(T{}) >= n {
		return len(T{})
	}
	return F[[len(T{})+1]byte]()
}

That particular kind of abuse could be avoided by prohibiting recursive generic functions where the type parameters of the recursive call aren't identical. I haven't yet seen a good use case for allowing that yet, and it avoids a bunch of potential problems.

In general, I'm +1 on this proposal as long as we can be happy that it's not open to serious abuse (e.g. computation in the type system).

@Merovius
Copy link

@Merovius Merovius commented Apr 16, 2021

With regard to the last point, I don't see a huge difference between: […] and […]

I agree, but it should be noted that this is not the only kind of code you can write. In particular, you can use len(a) in any expression. This leads to problems.

I think it's generally important to keep the type-checking of the body a generic function independent of the type-argument. And AFAIK that's the plan, with sentences like "a generic function can use an operation if its valid for all possible type-arguments". (Note: This is what's in the current design). However, in general, that's not necessarily feasible. A simple example of the kinds of problems we might encounter is

func F[A interface{ [...]T}, T any]() {
    const _ uint8 = uint8(len(A{}))
}

This does not type-check if A is [256]int, for example. Off-the-cuff, this seems to imply that type-checking generic functions is, again, equivalent to solving SAT and thus NP complete. That is, you can encode any binary formula by

  • Using type-parameters with constraint interface{ [...]struct{} } as variables and map them using len
  • Using [0]struct{} to encode FALSE and [256]struct{} to encode TRUE
  • Using | to encode , & to encode , and 256 ^ len(A{}) to encode ¬A

My example would thus be encoded as (using the type-list syntax for now):

type (
    Var interface { type […]struct{} }
    True [256]struct{}
    False [0]struct{}
)

// (A∨¬B∨¬C) ∧ (¬A∨C∨D)
func F[A, B, C, D Var]() {
    const (
        // for readability
        a, b, c = len(A), len(B), len(C)
        notA, notB, notC = 256^a, 256^b, 256^c
    )
    const _ = uint8( (a | notB | notC) & (notA | c | d) )
}

This fails type-checking if and only if the formula is satisfiable - as long as we limit type-checking to the generic function, using the constraints on the type-parameters only.

Of course, if we are willing to delay type-checking of constant expressions until instantiation, this problem goes away.

e.g. computation in the type system

I think this is excluded by the initialization order section of the spec, which effectively limits the kinds of recursion you can do. But I'm not 100% sure. I believe this at least requires some thinking, because I do think we have some recursion at least.


FTR, this is not limited to len(A{}) though. unsafe.Sizeof has exactly the same problem. That is, you don't need generic arrays to run into this, you can use struct{} and [256]byte for FALSE and TRUE respectively and use unsafe.Sizeof instead of len and get the same argument.

So, I actually think we need likely can't make len and unsafe.Sizeof (and friends) constant expressions if they are applied to generic variables in any case.

@Merovius
Copy link

@Merovius Merovius commented Apr 16, 2021

@ianlancetaylor highlighting you to be sure - see my last comment, I think this requires some refinement of the generics proposal as a whole, in terms of what it means for constant expressions.

@ajwerner
Copy link
Author

@ajwerner ajwerner commented Apr 16, 2021

I think this is excluded by the initialization order section of the spec, which effectively limits the kinds of recursion you can do. But I'm not 100% sure. I believe this at least requires some thinking, because I do think we have some recursion at least.

I'm not sure I understand how instantiation of generic types relates to initialization order. Isn't initialization order a runtime property? Are you suggesting that the generic type checking should have to work on the infinite set of types implied by constraints at compile time? I don't see why that has to be the case.


Just to make sure I understand, is this any different without [...]struct{} support? Say instead the constraint were as follows, wouldn't we have the same problem?

Var interface { type [0]struct{}, [256]struct{} }
@Merovius
Copy link

@Merovius Merovius commented Apr 16, 2021

I'm not sure I understand how instantiation of generic types relates to initialization order.

Maybe nothing. I was thinking about stuff like this and other recursive type definitions. I'm not quite sure where in the spec this is dealt with right now.

But ultimately, the computation question is off-topic right now :) I haven't found a specific problem (yet). So feel free to disregard that part :)

Just to make sure I understand, is this any different without [...]struct{} support? Say instead the constraint were as follows, wouldn't we have the same problem?

Yes. The problem is specific to len and unsafe.Sizeof and friends being constant expressions under certain circumstances. The problem isn't even restricted to arrays - you can even use byte, struct{}, unsafe.Sizeof and add a couple of bitshifts, to get to the same problem. So it doesn't matter for discussing the introduction of [...]T per se, it only immediately matters for the idea of having len (et al) be constant expressions.

Unfortunately, I think len (et al) not being constant expressions limits the usefulness of array type constraints. You could still write a function that works on any array type, but you could not, for example, write a generic concat function for arrays. At least as far as I can tell. Ultimately, I think you'd be served just as well using slices in that situation - though I'd be ready to be proven wrong.

@ajwerner
Copy link
Author

@ajwerner ajwerner commented Apr 16, 2021

I think the big learning of the day is that it's particularly hard to understand whether something is going to (1) overflow or (b) produce a value that is larger than the address space. As @Merovius has pointed out, we're in a real pickle if we want to determine whether an expression could overflow. Indeed, we've now discussed two different classes of errors which might occur with these generic constant expressions:

  • (1) Deciding whether an illegal overflow will occur
  • (2) Deciding whether an implied type will fit in the memory space

(1) is squarely a problem of the type system and is something we deal with to make this proposal coherent. (2), however, is not a concern of the type checker but rather a concern of compilation; it is a platform-specific concern. As I see it, the type system, in a sense, does not care about the bounds of the memory space (and, relatedly, the word size used for int/uintptr). The type system assumes those could be infinite.

So, below find my thought process and proposal to try to address the problems in a way while attempting to come up with rules that are small, simple, and coherent with the concepts of the language.

Rejected Straw-Man: Resolve overflows upon instantiation

One straw-man proposal might be to just ignore the possibility for overflow problems until later in compilation/linking when all instantiations are known. For a given instantiation, it's trivial to determine if overflow occurs. This approach, however, violates a key principle of the Type Parameters Proposal as laid out in the comparison to C++:

This means that changing template code can accidentally break far-off instantiations. It also means that error messages are reported only at instantiation time, and can be deeply nested and difficult to understand. This design avoids these problems through mandatory and explicit constraints.

If we did this checking later and some constant expression deep in the bowels of the library changed, it may break code which had previously worked. Such changes would be hard to detect, but, in a sense, might be considered part of the API.

Proposal: Reject typed constant expressions referencing type parameters

The succinct rule to ward off problem (1) and @Merovius's concerns about detecting overflow is to prevent a constant expression which references a type parameter from ever becoming typed. This, with perhaps minor tweaks to the spec, should work well for the motivating use case. While the language spec requires that the array length must be a constant expression "representable by int", the type checker, as far as I understand, doesn't need to care about the maximum size of an int [1]. This solution does not address memory capacity overflow problems, but, then again, we are plagued by those today [2]. As far as the type system is concerned, we may have infinite memory. This proposal would allow generic data structures but would not allow these constant expressions to be directly assigned to concrete numeric types.

The go spec has some langauge describing when a constant expression becomes typed. The extension to the spec would say something to the effect of: all constant expressions which reference type parameters cannot be typed, cast to a type, used in an assignment to a typed constant, or assigned to a variable. The other additional extension would be to note that len of a typed expression whose type is a type parameter is not a constant when used as part of an expression which must be typed.

Let's explore the implications of this proposal:

type Var interface { type [0]struct, [256]struct } 

func F[A Var]() {
    const _ = uint8(len(A{})) // illegal, cannot convert untyped constant referencing type params to uint8
    var _ uint8 = len(A{}) // illegal, cannot convert untyped constant referencing type params to uint8
    var _ = uint(len(A{})) // legal cast but not guaranteed to be evaluated at compile time (though certainly can be).
}

This proposal also allows a generic concatenation function (assuming we have [...]T syntax) as in #44253 (comment).

[1] I would like to believe that we'd say that the below program type checks in the go programming language even though it doesn't compile on 32-bit platforms.

func main() {
	fmt.Println(len((*[math.MaxInt32+1]byte)(nil)))
}

[2] consider the following silly program:

type Arr[T any] interface {
	type [1]T, [math.MaxUint32]T
}

type Big2DArr[T any, A Arr[T]] [math.MaxUint32 >> 14]A

The expression len(s) is constant if s is a string constant. The expressions len(s) and cap(s) are constants if the type of s is an array or pointer to an array and the expression s does not contain channel receives or (non-constant) function calls; in this case s is not evaluated. Otherwise, invocations of len and cap are not constant and s is evaluated.

I did not realize when I wrote this that len is already a constant expression! That changes the nature of this proposal a bit I suspect. Mea culpa for not reading the spec closely enough. I'll update this proposal accordingly at some point as it is currently inaccurate.

@Merovius
Copy link

@Merovius Merovius commented Apr 16, 2021

To repeat what I just said on slack: I only used constant overflow because it's an easy to trigger compilation failure. But any compilation failure that depends on an integer constant can be used for this.

For example, indexing an array would be suitable, and it doesn't require a typed constant.

Similarly, you could declare an array type and assign that to a statically known array type:

type (
	Var   interface{ type True, False}
	True  [256]struct{}
	False [0]struct{}
)

func F[A, B, C, D Var]() {
	const (
		// for readability
		a, b, c          = len(A), len(B), len(C)
		notA, notB, notC = 256 ^ a, 256 ^ b, 256 ^ c
	)
	type X [(a | notB | notC) & (notA | c | d)]int
	type Y [0]int
	var _ Y = Y(X{}) // valid if and only if the formula always evaluates to false
}

This one is especially interesting, because "use a constant expression derived from one array types length to define another array type" is the entire point of making len constant expressions. But it can still be used to break type-checking. So ISTM that the goals of this proposal are relatively fundamentally at odds with the design criteria of the generics proposal. But, of course, I might be wrong and it wouldn't be the first time :)

I think the core problem is that it's not entirely clear what the rule "an operation is valid, if it is valid for every type fulfilling the constraints" means, in many circumstances. For example, this last example could just reject the code no matter if the formula is satisfiable or not - but that would violate this rule. That might be okay, but we need to make sure it's understandable in what circumstances we violate it for which classes of operations. The spec must clearly state that the compiler rejects this program and why.

I'm also not entirely sure, how we would distinguish "cases like this" from more reasonable cases. i.e. I'm not sure what differentiates type X [len(A{})+1]int from type X [((len(A{}) | (256^len(B{}) | (256^len(C{}))) & ((256^len(A{})) | len(C) | len(D))]int to the compiler. Like, it's clear to me, as a human, that one of them is a reasonable formula used to implement a B+ tree or something while the other is a wild attempt to trick the compiler into doing unforeseen things. But I couldn't draw a line of what makes one okay and the other not.

@ajwerner
Copy link
Author

@ajwerner ajwerner commented Apr 16, 2021

Indexing into an array by a constant seems seems like a non-issue. If the type is constrained to be the set of all array lengths, then no constant value is safe to iterate over all of those lengths. Indexing by a variable into an array is a runtime check.

The casting case, however, is more interesting. Hopefully there's a nice out there too. I'll think on it. Thanks!

@Merovius
Copy link

@Merovius Merovius commented Apr 16, 2021

Indexing into an array by a constant seems seems like a non-issue. If the type is constrained to be the set of all array lengths, then no constant value is safe to iterate over all of those lengths.

To clarify the same way I did on slack, this is the real idea:

func F[A, B, C, D Var]() {
	const (
		// for readability
		a, b, c          = len(A), len(B), len(C)
		notA, notB, notC = 256 ^ a, 256 ^ b, 256 ^ c
	)
	type X [10]int
	var _ = X[(a|notB|notC)&(notA|c|d)] // valid if and only if the formula always evaluates to false
}

The array bounds are fixed - the index is varied.

@ajwerner
Copy link
Author

@ajwerner ajwerner commented Apr 16, 2021

Fortunately the spec and the above constraint prevents this one too.

The spec says:

the index x must be of integer type or an untyped constant

As noted above, anything that's a constant expr involving a type parameter must be an untyped constant.

The spec also says:

a constant index that is untyped is given type int

We've already noted that typing an untyped expression such as this is disallowed.

https://golang.org/ref/spec#Index_expressions

@rogpeppe
Copy link
Contributor

@rogpeppe rogpeppe commented Apr 16, 2021

This proposal also allows a generic concatenation

I'm fairly sure it doesn't unless the return type is a slice rather than an array (which is fine - array concat is not necessarily a useful operation).

Allowing array lengths as const expressions (and hence arithmetic) does seem to open the gateway to a world of potential pain, regardless of this proposal. Without that, I don't think this proposal is so useful, as you can't form any array types other than the original (for example an array of bool to mark validity of elements).

In the end, this is just a performance issue and perhaps the consequent overhead of using slices rather than arrays is worth it for the avoidance of all those language issues.

@ajwerner
Copy link
Author

@ajwerner ajwerner commented Apr 16, 2021

@rogpeppe I think it the proposal would permit the following:

type Array[T any] interface{ [...]T }
type ArrayConcat[T any, A, B Array[T]] interface { [len(A{}) + len(B{})]T }

func Concat[T any, A, B Array[T], R ArrayConcat[T, A, B]](a A, b B) R {
    var r R
    copy(r[:], a[:])
    copy(r[len(a):], b[:])
    return r
}
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Apr 18, 2021

Personally I tend to think that len called on a value of parameterized array type should return a non-constant value of type int. That is what happens if you call len on a slice, after all. The rules for constants in Go are complex enough that I think we need to keep them out of generic code.

The same applies to unsafe.Sizeof and unsafe.Offsetof.

If we did this, it would make it harder to write certain kinds of code, like defining a new type that was an array whose length was the same as that of some type parameter. But I think that we can live with that.

@ajwerner
Copy link
Author

@ajwerner ajwerner commented Apr 19, 2021

My short term preference would be to disallow unsafe.Sizeof, unsafe.Offsetof and len on type parameter expressions altogether. For arrays if you want to call len you could be required to have a [:] to make it a slice or something. The reason to disallow it is that it preserves future flexibility. Once you commit to those functions being runtime non-constant expressions then it seems that you're sort of stuck.

@bcmills
Copy link
Member

@bcmills bcmills commented Apr 19, 2021

@ajwerner, is it not backward-compatible to change a non-constant expression to a constant of the same type?

(It isn't backward-compatible to change a variable to a constant, because the variable can be reassigned or have its address taken — but a call-expression for len or unsafe.Sizeof is not addressable and nothing can be assigned to it.)

@ajwerner
Copy link
Author

@ajwerner ajwerner commented Apr 19, 2021

@bcmills that's a good point. In fact, all 3 of these functions already return typed values in constant expressions. If effect, I guess just making them non-constant expressions when involving type parameters preserves flexibility.

but a call-expression for len or unsafe.Sizeof is not addressable

Yet. 👀 on #45624.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants