Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: spec: permit conversion of []A to []B if A and B have same underlying type modulo struct tags #71183

Open
adonovan opened this issue Jan 8, 2025 · 34 comments
Labels
LanguageChange Suggested changes to the Go language LanguageChangeReview Discussed by language change review committee LanguageProposal Issues describing a requested change to the Go language specification. Proposal
Milestone

Comments

@adonovan
Copy link
Member

adonovan commented Jan 8, 2025

Background: It is not uncommon to need to convert a value of slice type []A to type []B, where A and B have the same representation (e.g. string). Unfortunately, one must allocate a copy.

Proposal: We propose to relax the restrictions so that, given var ( a A; aa []A; b B; bb []B), the conversion ([]B)(aa) is legal so long as both B(a) and A(b) are legal A and B have the same underlying type, ignoring struct tags. The result would be a slice of the same len, cap, and pointer, but a different type. In other words, the operation creates an alias, it does not allocate an array.

The requirement for the mutual assignability same underlying type is that if aa aliases bb, then assigning to aa[0] and reading from bb[0] effects a conversion from A to B, and vice versa. Therefore both conversions had better be legal. Consequently, the two types must have the same memory layout (among other requirements).

[Edited: I mistakenly started with "mutually convertible" but this is not sufficient for e.g. A=float32 B=float64.]

Prior art:

Loosely related:

@ianlancetaylor @griesemer

@gopherbot gopherbot added this to the Proposal milestone Jan 8, 2025
@ianlancetaylor ianlancetaylor added LanguageChange Suggested changes to the Go language LanguageChangeReview Discussed by language change review committee labels Jan 8, 2025
@ianlancetaylor

This comment was marked as resolved.

@adonovan

This comment was marked as resolved.

@adonovan adonovan moved this to Incoming in Proposals Jan 8, 2025
@Jorropo
Copy link
Member

Jorropo commented Jan 8, 2025

float32float64 is valid, so is float64float32
Yet I do not see any reasonable (aka without introducing dynamic tying to everything) way to implement the following code in constant time using aliases:

func f(x []float32) []float64 { return []float64(x) }

same for differently sized integers, floats ←→ integers, ...

what am I missing ?

Edit:
This part tries address this

Consequently, the two types must have the same memory layout (among other requirements).

But I think it's unclear for []float64 ←→ []uint64. It seems to me like we need a formal « same underlying bit representation » and only {,u}int{,64,32,16,8} would end up working ?
And everything based on the same underlying type.

@ianlancetaylor

This comment was marked as resolved.

@ianlancetaylor
Copy link
Member

Agreed, I think mutual convertibility is not enough. I think we need to say that the underlying type is the same. (We could perhaps add some special language about channels, if we care.)

@jimmyfrasche
Copy link
Member

Another case where one would want something similar is, for example, converting a []fmt.Stringer to an []any. That's not mutually convertible so only one direction makes sense but, when it does make sense, the source and target do have the same representation so it could be aliased without a copy.

@jimmyfrasche
Copy link
Member

Since this came up in #69264, I'll use my example from there. In that I asked whether

func Mean(x []float64) float64

should be

func Mean[Slice ~[]E, E ~float64](x Slice) E

so that if you had a slice of type []Temp where type Temp float64 you could use it directly.

This proposal allows the API without generics to avoid a copy by allowing:

var temps []Temp = f()
tbar := Mean([]float64(temps))

But that doesn't do quite the same thing as the generic version which returns a value of type Temp so you'd need to write

Temp(Mean([]float64(temps)))

Further the generics version would accept a value with a type like type Temps []Temp and return a Temp. Would you be able to use that last line to do that or would it require another conversion first like:

Temp(Mean([]float64([]Temp(temps))))

?

If it were instead some function F that takes and returns a slice would it be

Temps([]Temp(F([]float64([]Temp(temps)))))
// or
Temps(F([]float64(temps)))

?

This would be a useful change but I don't know if it makes API design as nice and simple as one would hope so much as push it down the road.

@ianlancetaylor
Copy link
Member

@jimmyfrasche Actually, fmt.Stringer and any don't have the same representation. The empty interface any looks like https://go.googlesource.com/go/+/refs/heads/master/src/runtime/runtime2.go#184. A non-empty interface looks like https://go.googlesource.com/go/+/refs/heads/master/src/runtime/runtime2.go#179.

And in fact different non-empty interface types don't have the same representation, because they have a different itab layout.

@adonovan
Copy link
Member Author

adonovan commented Jan 8, 2025

Agreed, I think mutual convertibility is not enough. I think we need to say that the underlying type is the same.

Ah, thanks for the examples, I agree. I was trying to avoid having to say "same underlying type modulo struct tags", but that's what I thought I was getting at.

Another case where one would want something similar is, for example, converting a []fmt.Stringer to an []any. That's not mutually convertible so only one direction makes sense but, when it does make sense, the source and target do have the same representation so it could be aliased without a copy.

The aliased 1-element slice is a portal through which values can be pushed in both directions, hence the requirement that the constraint on A and B be symmetric. That means conversion from one interface type to another (or from concrete to interface) is not allowed.

@adonovan adonovan changed the title proposal: spec: permit conversion of []A to []B if A and B are mutually convertible proposal: spec: permit conversion of []A to []B if A and B have same underlying type module struct tags Jan 8, 2025
@adonovan adonovan changed the title proposal: spec: permit conversion of []A to []B if A and B have same underlying type module struct tags proposal: spec: permit conversion of []A to []B if A and B have same underlying type modulo struct tags Jan 8, 2025
@aclements
Copy link
Member

@adonovan and I were accidentally working on this proposal concurrently, so I'll add my details here.

Proposal details

We propose allowing explicit conversion from []T to []U if T and U have the same underlying type.

Specifically, update the Conversions section of the Go language spec with the following:

A non-constant value x can be converted to type T in any of these cases:

  • ignoring struct tags (see below), x's type and T are slice types that are not named types, and their slice element types are not type parameters but have identical underlying types.

This language exactly parallels the language on converting between pointer types.

Rationale

This comes up with some regularity on the golang-nuts@ mailing list (e.g., 1, 2, 3, 4, 5) and StackOverflow (e.g., 1, 2, 3, 4). In our experience, it often comes up in data analysis code, when the code wants to represent data using []T where T has the same underlying type as float64 (or int, etc) but call functions—often mathematical functions—that take []float64 ([]int, etc). Using []T confers additional type safety (it can’t accidentally be mixed up with other float-typed data), and the opportunity to add type-dependent methods like a custom String() method for printing.

Today, this problem can be partially solved using generics. A function like Sum(x []float64) float64 could instead be written func Sum[Slice []Elem, Elem ~float64](x Slice) Elem. However, we do not want to encourage this style. Furthermore, it doesn’t apply to the huge number of existing slice APIs.

This conversion is always safe and can be done in O(1) time. Without explicit support, this conversion requires either O(n) time or the use of unsafe. Similar arguments applied to slice-to-array-pointer conversion, which was added in Go 1.17..

This exact conversion we’re proposing to add is mentioned in the language FAQ, but the justification given for not supporting this conversion is primarily that “Go requires you to be explicit about type conversions.” This issue is proposing an explicit type conversion. It’s true that, unlike *T to *U, slice conversion does not change the method set of a value, but it does help change the method set of the contents of a slice. We consider that a feature of this proposal.

This conversion was briefly discussed in #29864, but in that issue, it was treated as a bug that the language doesn’t allow this. In contrast, the issue is a language change proposal that the language should allow this.

On #29864, there’s some concern that some slice conversions would require allocation and copying. We only propose conversions if the element types have the same underlying type, which will never require copying. Because Go already allows an explicit conversion from *T1 to *T2 if T1 and T2 have the same underlying type, it must be possible to do the equivalent slice conversion without copying.

Alternatives

Should we allow “deeper” conversions, like []*T1 to []*T2 where T1 and T2 have the same underlying type? We argue that we should not. While there are use cases for this, stopping at “one level deep” is a much simpler rule, almost always sufficient, and consistent with Go’s other explicit type conversion rules.

Should we allow similar conversions for the component types of map, chan, or func? We argue no simply because this doesn’t seem to come up nearly as often as slice conversions, making it unjustified complexity. Similarly, we could allow conversions from [N]T to [N]U. This likewise doesn’t seem to carry its weight, and with the addition of slice conversions, this can be expressed as (*[N]U)(([]U)(x[:])), which is not pretty, but will run in O(1) time.

@adonovan
Copy link
Member Author

adonovan commented Jan 8, 2025

@ianlancetaylor points out that there's no fundamental reason the same approach couldn't be generalized to other composite types such as arrays, maps, functions, structs, and channels, so long as the corresponding pairs of key/value, param/result, field, or element types have identical underlying types. So, for example, conversions betweenmap[int]string and map[http.ConnState]http.Dir would both preserve type safety.

The earlier proposal #19778 is even more general, relying on the notion of deep structural equality and "memory layout", which is not really a concept in the spec; also it is plainly unsafe w.r.t. interfaces. (An io.Reader and an io.Writer both have the same memory layout---an interface--but being able to freely mutually convert one to another would lead to the wrong methods being called, or crashes.)

[Update: the map and chan data structures might retain the type used in the call to make, which is a principled reason not to extend this proposal to those types.]

@josharian
Copy link
Contributor

josharian commented Jan 8, 2025

Another case where one would want something similar is, for example, converting a []fmt.Stringer to an []any

Actually, fmt.Stringer and any don't have the same representation.

And even if they did, we wouldn't want to prevent future alterations to the underlying representations.

I can't think offhand of cases in which we'd want different memory layouts for values with the same underlying type, because of the ability to take the address of any given element. But worth double-checking to make sure this doesn't create any future implementation restrictions.

@josharian
Copy link
Contributor

josharian commented Jan 8, 2025

So, for example, conversions betweenmap[int]string and map[http.ConnState]http.Dir would both preserve type safety.

I can't think offhand of cases in which we'd want different memory layouts for values with the same underlying type, because of the ability to take the address of any given element.

hmm. map elements aren't addressable. so one could store e.g. slices in a compressed format (combined len/cap) until that didn't work any more for that particular map, at which point we'd set a flag in the map header and change the representation throughout. we considered doing something not dissimilar for small readonly maps, so this isn't hypothetical.

ditto for values in a channel.

so at least in theory, doing this with maps and channels could end up posing an implementation restriction.

@adonovan
Copy link
Member Author

adonovan commented Jan 8, 2025

hmm. map elements aren't addressable. so one could store e.g. slices in a compressed format (combined len/cap) until that didn't work any more for that particular map, at which point we'd set a flag in the map header and change the representation throughout. we considered doing something not dissimilar for small readonly maps, so this isn't hypothetical.

I didn't quite follow that. Do you mean that the representation of maps might globally vary based on the rtypes of its keys and values?

Your comment made me wonder whether the type descriptor used to make a map is recorded within the map. It doesn't seem to be, but perhaps that may change in future. For a channel, the element type used to make it is recorded within it. That could be an obstacle.

@josharian
Copy link
Contributor

josharian commented Jan 8, 2025

Do you mean that the representation of maps might globally vary based on the rtypes of its keys and values?

Yeah, seems implausible.

Your comment made me wonder whether the type descriptor used to make a map is recorded within the map. It doesn't seem to be,

There are generated algs recorded there (eq, hash), and last I was aware we don't fully canonicalize them in the compiler. (I tried, long ago, and got stuck.) So there are at least artifacts of the original types in the headers.

@jimmyfrasche
Copy link
Member

This would also mean you could not change the underlying type of A or B except in unison or such conversions would fail to compile

@adonovan
Copy link
Member Author

adonovan commented Jan 9, 2025

This would also mean you could not change the underlying type of A or B except in unison or such conversions would fail to compile

That's true, but it's equally true of many other pairs of types related by conversions, such as concrete types and interfaces.

@Merovius
Copy link
Contributor

Merovius commented Jan 9, 2025

@jimmyfrasche

This would also mean you could not change the underlying type of A or B except in unison or such conversions would fail to compile

Changing the underlying type is always a breaking change. Consider that if A has underlying type X, you could have func F[T ~X]() and a call F[A]().

@robpike
Copy link
Contributor

robpike commented Jan 9, 2025

I think this is a bad idea because of the door it opens. It doesn't go far enough for all the questions it suggests, and going any further could lead to extremely complicated problems. The decision to not do this goes right back to the beginning of the language and our desire to keep things simple to explain. Please do not make this change.

@jimmyfrasche
Copy link
Member

I think using generics provides a nicer API than this language change but I don't think that solves all the problem presented here either.

For general slices, there have been some calls (forgive me not tracking down the old issues) to let copy do a copy and conversion which covers more than this, albeit with a copy. Perhaps that should be reconsidered?

There are probably some times when you'd like to do this in performance sensitive code without a copy. Maybe this conversion should go in unsafe? Something like unsafe.AliasSliceAs[T](slice)?

@AndrewHarrisSPU
Copy link

AndrewHarrisSPU commented Jan 9, 2025

There are probably some times when you'd like to do this in performance sensitive code without a copy. Maybe this conversion should go in unsafe? Something like unsafe.AliasSliceAs[T](slice)?

I'm a bit sympathetic to the cases (and less so to other cases) where this zero or fixed cost conversion is desired for slices of floats and ints going into standard / third-party math packages. But would unsafe be required by callers of these packages to accomplish the efficient conversion, and is that worth it? Is there something for the numerical types/math set of cases that could be done by callee packages (like math)?

@adonovan
Copy link
Member Author

@jimmyfrasche: There are probably some times when you'd like to do this in performance-sensitive code without a copy. Maybe this conversion should go in unsafe? Something like unsafe.AliasSliceAsT?

The proposed operation preserves type safety whereas unsafe.AliasSliceAs does not. Let's keep unsafe ideas out-of-scope here.

@robpike: I think this is a bad idea because of the door it opens. It doesn't go far enough for all the questions it suggests, and going any further could lead to extremely complicated problems. The decision to not do this goes right back to the beginning of the language and our desire to keep things simple to explain.

Could you elaborate on what door it opens? Are you concerned that similar conversions for other data types (such as maps and channels) will follow, despite the qualitative difference that those data structures may internally retain knowledge of the type used to make them? What kind of problem do you anticipate?

@zigo101
Copy link

zigo101 commented Jan 10, 2025

How about [][]A <-> [][]B conversions?

@aclements
Copy link
Member

aclements commented Jan 10, 2025

I think this is a bad idea because of the door it opens. It doesn't go far enough for all the questions it suggests, and going any further could lead to extremely complicated problems.

@robpike, this is why we're proposing a very limited change and specifically proposing that we not support more general convertability. Slice conversion appears to come up with some regularity; others much less so. To me, this seems like the sweet spot.

The alternative I'm seeing more and more in the wild is that people use generics to solve this problem. So instead of simply writing Sum(x []float64) float64, they write func Sum[Slice []Elem, Elem ~float64](x Slice) Elem. That does not seem like a direction we want the ecosystem to go in and this proposal aims to forestall that. (It may be valuable to do an analysis of how often this pattern comes up. I don't have that data right now.)

@aclements
Copy link
Member

How about [][]A <-> [][]B conversions?

No existing conversions are "deep" like this right now. To @robpike's concern, I don't think we want to open that door.

@gabyhelp gabyhelp added the LanguageProposal Issues describing a requested change to the Go language specification. label Jan 10, 2025
@jimmyfrasche
Copy link
Member

The generics version the library author writes once.

The conversions are needed for every call and can involve 4 conversions in the defined-slice-of-defined-types to defined-slice-of-defined-types case.

Even if this conversion is added there are still cases where it makes more sense for the library to handle it.

The example of Sum is one where I would expect generics even if this conversion existed. I'm not sure I could articulate precise criteria for when it's needed without a lot more thought, but I'm sure the stats package and Sum are cases where, if it were up to me, I would write the generic version irrespective of the outcome of this issue.

@zigo101
Copy link

zigo101 commented Jan 10, 2025

How about [][]A <-> [][]B conversions?

No existing conversions are "deep" like this right now. To @robpike's concern, I don't think we want to open that door.

The doors to []A <-> []B and [][]A <-> [][]B are the same to me. :D

@aclements
Copy link
Member

The generics version the library author writes once.

That's fair. Though there are also a lot of libraries out there that don't use the generic pattern for this.

The example of Sum is one where I would expect generics even if this conversion existed. I'm not sure I could articulate precise criteria for when it's needed without a lot more thought ...

One argument for doing the type parameterized Sum is that the caller probably wants the result of Sum to be the same type as the slice element type. There are cases where this applies and cases where it doesn't.

@jimmyfrasche
Copy link
Member

jimmyfrasche commented Jan 10, 2025

there are also a lot of libraries out there that don't use the generic pattern for this.

Do they not do it because they shouldn't or because they couldn't as it was written before generics? I do think there are a lot of places where it makes sense to just take, say, a []string but things like the stats functions and the hypothetical Sum are perhaps the best example of the kind of thing that should be generic.

Sum is an especially bad example as it could be defined over all numeric types quite naturally.

One argument for doing the type parameterized Sum is that the caller probably wants the result of Sum to be the same type as the slice element type. There are cases where this applies and cases where it doesn't.

If you have a []E you'd want an E back more often than not and when you don't you need a conversion somewhere no matter what and in that case the natural place for that, imo, would be float64(Sum(es)) instead of Sum([]float64(es)).

I'm not opposed to adding this conversion but I disagree about the extent of its impact on API design.

@earthboundkid
Copy link
Contributor

I worry that people would mistakenly think this copies the slice and not realize it keeps the pointer. It feels more like an unsafe helper function than a general tool.

@apparentlymart
Copy link

apparentlymart commented Jan 13, 2025

I share the concern that it would not be clear to an unfamiliar reader whether this is creating a copy or an alias. There aren't many other places in Go where you get to alias a memory location as two different types, and the subtlety of "underlying types" is something that folks don't often encounter until later in their Go journey (in my experience, at least).

I'm sure I'd go so far as to say it ought to be an unsafe function -- as far as I can reason, there's no inherent memory unsafety in the operation itself although subsequent uses of the result might not be safe -- but having it be an explicit function call somewhere, rather than just an implicit part of assignment, would tip the scales enough to "explicit" for me: the function can be named and documented in a way that communicates that it's aliasing and invites curiosity about the potential implications of that aliasing.

EDIT: The comment below makes me realize both that I initially made a typo here (I wrote "memory safety" instead of "memory unsafety", inverting the meaning of my sentence) and that my followup wasn't clear. Overall I should not have used the word "safe" to describe what "subsequent uses of the result" might not be. Specifically, I meant that someone not realizing that they've created an alias might inadvertently cause a data race in subsequent code, but that's a problem with the subsequent code rather than of the conversion operation, and we don't usually use "safety" in the context of data races.


Of course, we don't currently have any way to write a generic function that can be statically checked as having type parameters with the same underlying type and so I don't think we can actually spell the signature of such a function in today's language without adding something to the language.

I'm not going to get into lots of weeds here since I realize I'm now effectively penning a counterproposal, and I don't really feel strongly enough about this to warrant that, but my first thought is to offer a predeclared generic interface type assignable[T any] whose type set contains everything that can be assigned to T, giving a signature like:

package slices

func ReinterpretAs[D any, S assignable[D]](s []S) []D

...and an implementation that itself does some unsafe fiddling to return a differently-typed slice header aliasing the same memory.

An immediate downside of that signature is that I believe the current inference rules would require the caller to explicitly specify both D and S, which isn't very ergonomic. Perhaps this situation is rare enough that it doesn't need to be so ergonomic, but the original proposal was calling for this to be the default explicit conversion behavior between the types -- basically as ergonomic as it can possibly get -- so I don't think this initial attempt is satisfactory as a replacement.

(It's also not immediately obvious to me what other uses assignable[T] would have; this use-case alone doesn't seem enough to justify such a generic solution.)

@aclements
Copy link
Member

I worry that people would mistakenly think this copies the slice and not realize it keeps the pointer. It feels more like an unsafe helper function than a general tool.

I agree that that's a possible confusion. On the other hand, there are a lot of operations in Go that alias and it's necessary to understand that. On the gripping hand, aliasing is also a common source of bugs.

I'm sure I'd go so far as to say it ought to be an unsafe function -- as far as I can reason, there's no inherent memory safety in the operation itself although subsequent uses of the result might not be safe

This operation is inherently memory-safe. The underlying type determines the memory representation, so simply changing the name by which we refer to the type can't change the memory representation. This is already baked into the language rules. For example, you can convert from *A to *B if A and B have the same underlying type. Since these pointers must point to the same object, A and B must have the same memory representation.

Note that your ReinterpretAs example uses "assignability". We're not proposing any changes to assignability.

@varungandhi-src
Copy link

In case an example of a use case would help, at Sourcegraph, I'd love to have this operation be available for use at the DB layer. In particular, there are some cases where we want to:

  • Write several GBs of data into the DB as fast as reasonably possible
  • AND Constrain memory consumption
  • AND Preserve domain-specific types until the very end (instead of having to resort to using primitive types to avoid the extra allocations/memory copies)

I'm not too tied to the spelling for the operation -- whether it be a magic built-in function, a direct type conversion syntax (which, as fairly pointed out, might be misinterpreted as a copy), some library function (ideally not in unsafe) or something else.

For this use case, some additional verbosity in having to explicitly specify both the From and To types is also OK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
LanguageChange Suggested changes to the Go language LanguageChangeReview Discussed by language change review committee LanguageProposal Issues describing a requested change to the Go language specification. Proposal
Projects
None yet
Development

No branches or pull requests