proposal: unsafe: add Slice(ptr *T, len anyIntegerType) []T #19367

Open
mdempsky opened this issue Mar 2, 2017 · 100 comments

@mdempsky
Member

@mdempsky mdempsky commented Mar 2, 2017

reflect.SliceHeader and reflect.StringHeader are clumsy to use because their Data fields have type uintptr instead of unsafe.Pointer.

This proposal is to add types unsafe.Slice and unsafe.String as replacements. They would be declared just like their package reflect analogs, except with unsafe.Pointer-typed Data fields:

type Slice struct {
    Data Pointer
    Len int
    Cap int
}

type String struct {
    Data Pointer
    Len int
}

Additionally, I suggest that, for the purposes of type conversion, we treat string and unsafe.String as having the same underlying type, and likewise []T and unsafe.Slice. For example, these would be valid:

func makestring(p *byte, n int) string {
    // Direct conversion of unsafe.String to string.
    return string(unsafe.String{unsafe.Pointer(p), n})
}

func memslice(p *byte, n int) (res []byte) {
    // Direct conversion of *[]byte to *unsafe.Slice, without using unsafe.Pointer.
    s := (*unsafe.Slice)(&res)
    s.Data = unsafe.Pointer(p)
    s.Len = n
    s.Cap = n
    return
}

While the same results can be achieved using unsafe.Pointer conversions, by using direct conversions the compiler can provide a little extra type safety.
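
For contrast, here is a minimal sketch of the status quo that the proposal calls clumsy: building the same slice with the existing reflect.SliceHeader, whose Data field is a uintptr. The function name memsliceOld and the sample buffer are illustrative assumptions, not part of the proposal.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// memsliceOld (hypothetical name) builds a []byte over n bytes starting at p
// using reflect.SliceHeader, whose Data field has type uintptr.
func memsliceOld(p *byte, n int) (res []byte) {
	// The header has to be viewed through a pointer to a real slice;
	// a free-standing SliceHeader value would hide p from the garbage collector.
	h := (*reflect.SliceHeader)(unsafe.Pointer(&res))
	h.Data = uintptr(unsafe.Pointer(p)) // the pointer passes through uintptr here
	h.Len = n
	h.Cap = n
	return res
}

func main() {
	buf := []byte("hello, world")
	s := memsliceOld(&buf[0], 5)
	fmt.Println(string(s), len(s), cap(s)) // hello 5 5
}
```

Two unsafe.Pointer conversions are needed for one slice, and nothing stops a caller from storing an unrelated integer in Data, which is the type-safety gap the Pointer-typed fields are meant to close.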

@ianlancetaylor
Contributor

@ianlancetaylor ianlancetaylor commented Mar 2, 2017

If we do this, we should figure out a way to exempt these new types from the Go 1 compatibility guarantee, so that we can change the representation of strings and slices in the future. I'm not sure how best to do that.

@cespare
Contributor

@cespare cespare commented Mar 2, 2017

@ianlancetaylor reflect.SliceHeader and reflect.StringHeader already try:

It cannot be used safely or portably and its representation may change in a later release.

but the compat doc itself gives a strong exemption for all of unsafe:

Packages that import unsafe may depend on internal properties of the Go implementation. We reserve the right to make changes to the implementation that may break such programs.

ISTM that unsafe.{Slice,String} would already be exempted sufficiently.

@rsc
Contributor

@rsc rsc commented Mar 6, 2017

Go 2 seems like the time to think about this (and reflect.SliceHeader etc).

-rsc for @golang/proposal-review

@bcmills
Member

@bcmills bcmills commented Mar 23, 2017

This proposal seems a bit redundant with #13656.

How much of the use-case is "create a string or slice aliasing C memory" vs. "manipulate existing strings and slices by tweaking header fields unsafely"?

@mdempsky
Member Author

@mdempsky mdempsky commented Oct 18, 2019

I'd like to suggest renewing consideration of this proposal for Go 1.14. I think it will be useful for users trying to address issues flagged by -d=checkptr.

I'll also offer a counter-proposal that I think better addresses most end user needs in a more ergonomic manner:

package unsafe

func Slice(ptr *ArbitraryType, len, cap int) []ArbitraryType
func String(ptr *byte, len int) string

[Edit: As discussed below, I'm now in favor of combining Slice's len/cap parameter into a single parameter.]

This is a little less versatile than exposing the Header types, but I think it will minimize typing for most users, while also providing better type safety.

We could also do both this proposal and my original one, if we want to still offer the full flexibility of the Header types. In that case, I would suggest renaming the types to SliceHeader and StringHeader, and reserve the shorter Slice and String identifiers for the constructor functions.
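
A minimal sketch of how the two functions from the original post shrink under the counter-proposal, assuming the single-length form of Slice described in the edit above. That form matches what later shipped as unsafe.Slice in Go 1.17, with unsafe.String following in Go 1.20, so the sketch needs a toolchain at least that new. The sample buffer is an assumption for illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// memslice returns a []byte covering n bytes starting at p.
func memslice(p *byte, n int) []byte {
	return unsafe.Slice(p, n)
}

// makestring returns a string covering n bytes starting at p.
// The bytes must not be modified for as long as the string is in use.
func makestring(p *byte, n int) string {
	return unsafe.String(p, n)
}

func main() {
	buf := []byte("hello, world")
	fmt.Println(string(memslice(&buf[0], 5))) // hello
	fmt.Println(makestring(&buf[7], 5))       // world
}
```

Neither function mentions unsafe.Pointer or a header type, and the element type of the result is taken from the pointer rather than restated by the caller.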

@bradfitz
Contributor

@bradfitz bradfitz commented Oct 18, 2019

I like that counter proposal API.

@mdempsky
Member Author

@mdempsky mdempsky commented Oct 18, 2019

A few additional thoughts to add to my counter proposal:

  1. We should decide what happens when len < 0 or cap < len. I'm leaning towards panic, but maybe we should just leave it unspecified/undefined.

    Edit: ptr == nil && len > 0 is another case to consider.

    Edit 2: Also, len > MAXWIDTH / unsafe.Sizeof(*ptr).

  2. The functions would be builtins; in particular, users can't write f := unsafe.String; f(...).

  3. The cap argument to unsafe.Slice can be optional; if it's omitted, the len argument is used. (Just like make([]T, n) is shorthand for make([]T, n, n).)

  4. Perhaps the int parameters should actually follow the same goofy semantics that make([]T, n, m) follows. (I.e., make([]T, uint64(10), int8(20)) is valid, even though uint64 and int8 aren't normally assignable to int.)

  5. Since unsafe.String would be a builtin, it could evaluate to an untyped string.
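
The edge cases in item 1 and the make semantics in item 4 can be sketched as follows. checkedSlice is a hypothetical user-level wrapper, not a proposed API; it only illustrates the checks a caller would have to write if the builtin left these cases unspecified. The use of ^uintptr(0) as the size limit is an assumption standing in for the compiler's MAXWIDTH.

```go
package main

import (
	"errors"
	"fmt"
	"unsafe"
)

// checkedSlice (hypothetical) validates its arguments before building a slice.
func checkedSlice[T any](ptr *T, n int) ([]T, error) {
	switch {
	case n < 0:
		return nil, errors.New("checkedSlice: negative length")
	case ptr == nil && n > 0:
		return nil, errors.New("checkedSlice: nil pointer with non-zero length")
	case ptr == nil:
		return nil, nil
	}
	// Reject lengths whose total byte size would overflow the address space.
	// ^uintptr(0) is an assumed stand-in for the compiler's MAXWIDTH.
	if size := unsafe.Sizeof(*ptr); size > 0 && uintptr(n) > ^uintptr(0)/size {
		return nil, errors.New("checkedSlice: length too large for element size")
	}
	return unsafe.Slice(ptr, n), nil
}

func main() {
	// make accepts any integer type for its size arguments (item 4).
	m := make([]int, uint64(10), int8(20))
	fmt.Println(len(m), cap(m)) // 10 20

	arr := [4]int32{1, 2, 3, 4}
	s, err := checkedSlice(&arr[0], 3)
	fmt.Println(s, err) // [1 2 3] <nil>

	_, err = checkedSlice(&arr[0], -1)
	fmt.Println(err)

	_, err = checkedSlice[int32](nil, 2)
	fmt.Println(err)
}
```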

@bcmills
Member

@bcmills bcmills commented Oct 18, 2019

That API is closer to what I had suggested in https://golang.org/issue/13656#issuecomment-303216308, and we've been using that variant within Google for a couple of years now without complaints.

If the type desired for the slice does not match the pointer that the user has (for example, if one is a cgo-generated type and the other is a native Go type), I'm assuming that the caller could do something like:

	var s = unsafe.Slice((*someGoType)(unsafe.Pointer(cPtr)), len, cap)

to set the element type?

@bcmills
Member

@bcmills bcmills commented Oct 18, 2019

We should decide what happens when len < 0 or cap < len. I'm leaning towards panic, but maybe we should just leave it unspecified/undefined.

I would leave it unspecified, but panic is a fine implementation of “unspecified”.

Perhaps the int parameters should actually follow the same goofy semantics that make([]T, n, m) follows.

That would certainly smooth out the call site in the (overwhelmingly common) case that len and/or cap is a C.size_t.

@mdempsky
Member Author

@mdempsky mdempsky commented Oct 18, 2019

If the type desired for the slice does not match the pointer that the user has (for example, if one is a cgo-generated type and the other is a native Go type), I'm assuming that the caller could do something like:

	var s = unsafe.Slice((*someGoType)(unsafe.Pointer(cPtr)), len, cap)

to set the element type?

Yeah, that's my thought. If a user wants to convert *T into []U, then I think it's reasonable to require an explicit conversion there.
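
A minimal sketch of that explicit conversion, using the single-length form of unsafe.Slice. Because cgo types cannot appear in a stand-alone example, cElem is a hypothetical stand-in for a cgo-generated type and goElem for the native Go type with the same memory layout; both names are assumptions.

```go
package main

import (
	"fmt"
	"unsafe"
)

// cElem stands in for a cgo-generated element type (hypothetical).
type cElem int32

// goElem is the native Go type with the same size and layout (hypothetical).
type goElem int32

// asGoSlice converts *cElem to *goElem explicitly, then bundles the
// pointer with a length. The element type comes from the conversion.
func asGoSlice(cPtr *cElem, n int) []goElem {
	return unsafe.Slice((*goElem)(unsafe.Pointer(cPtr)), n)
}

func main() {
	data := [3]cElem{10, 20, 30}
	s := asGoSlice(&data[0], len(data))
	fmt.Println(s) // [10 20 30]
}
```

The conversion is only sound when the two element types really do share a layout; nothing in the call checks that.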

@mdempsky
Member Author

@mdempsky mdempsky commented Oct 18, 2019

I would leave it unspecified, but panic is a fine implementation of “unspecified”.

Ack, though my concern is if we panic by default, then users might come to rely on it panicking and not write their own checking.

It would be easy to put the panic behind -d=checkptr though.

@bradfitz
Contributor

@bradfitz bradfitz commented Oct 18, 2019

func Slice(ptr *ArbitraryType, len, cap int) []ArbitraryType

Can we instead do:

func Slice(ptr *ArbitraryType, len int[, cap int]) []ArbitraryType

... with an optional cap. Where omitting cap means cap == len?

@mdempsky
Member Author

@mdempsky mdempsky commented Oct 18, 2019

@bradfitz Yeah, that's my additional thought #3 above. :)

@bcmills
Member

@bcmills bcmills commented Oct 18, 2019

if we panic by default, then users might come to rely on it panicking and not write their own checking.

Hmm, good point. We could make it a throw! 😉

Or we could make it a panic in ordinary code but a throw under -race or -d=checkptr. (The important thing, I think, is to vary it just enough that it causes tests to fail in some reasonably-common configuration.)

@mdempsky
Member Author

@mdempsky mdempsky commented Jul 28, 2020

some more direct "add unsafe.Pointer + uintptr -> Pointer" operation (can't find a number for that at the moment).

There's my 2014 safe pointer arithmetic proposal: https://docs.google.com/a/dempsky.org/document/d/1yyCMzE4YPfsXvnZNjhszaYNqavxHhvbY-OWPqdzZK30/pub

Parts of it have been independently incorporated into Go since then. E.g., the pointer arithmetic rule is now one of the unsafe.Pointer safety rules, and the compiler instrumentation was implemented as checkptr.

@rsc
Contributor

@rsc rsc commented Jul 29, 2020

Thanks @mdempsky. I created #40481 for unsafe.Add.

@rsc rsc moved this from Active to Likely Accept in Proposals Jul 30, 2020
@rsc
Contributor

@rsc rsc commented Aug 5, 2020

Forgot to say: Based on the comments, this seems like a likely accept.
Going to leave this in likely accept until the others are ready, just in case further discussion there leads to some kind of general solution that covers this one too.

@smasher164
Member

@smasher164 smasher164 commented Aug 5, 2020

With the "anyIntegerType" requirement, under the current generics draft, the signature looks like this:

type integer interface {
	type int, int8, int16, int32, int64, uint, uint8, uint16, uint32, uint64, uintptr
}

func Slice[type T interface{}, I integer](p *T, len I) []T

@mdempsky
Member Author

@mdempsky mdempsky commented Aug 5, 2020

@smasher164 Note that's a very close approximation, but not 100% precise. E.g., make([]int, 1.0) works fine with make as a builtin; but as a generic function, it would fail because the type argument would be inferred as float64 due to 1.0's default type.

(I'm not aware of any use case for passing untyped float/complex constants to make, but that's what the Go spec today allows, and how I'd define/implement unsafe.Slice.)

@smasher164
Member

@smasher164 smasher164 commented Aug 5, 2020

@smasher164 but as a generic function, it would fail because the type argument would be inferred as float64

I see. And extending the constraint to allow floats wouldn't work either, since it compares the underlying type. This would require you to panic at runtime, instead of just statically rejecting untyped constants.

Maybe that's okay? Since people don't pass floating-point literals into builtins, changing the signature in a future release would only be backwards incompatible in a draconian sense.

@bcmills
Member

@bcmills bcmills commented Aug 5, 2020

Given that unsafe.Slice does not yet exist, I think it would also be fine to have the compiler reject unsafe.Slice(p, 1.0) from the outset, and just have that behavior be slightly different from make([]int, 1.0) until and unless we tighten up make itself.

@rsc
Contributor

@rsc rsc commented Aug 12, 2020

Still likely accept; waiting on others.

@rsc
Contributor

@rsc rsc commented Aug 26, 2020

For what it's worth, I spent a while looking through the corpus for possible uses here, and I was surprised how little it would apply directly. Most of the time the construction of a slice was starting with an unsafe.Pointer and not a *T, so code would look like:

slice := (*[1000]T)(p)[:n]

and it would become

slice := unsafe.Slice((*T)(p), n)

which is a bit better but it's odd to have to write *T when you are trying to produce a []T, especially compared with something like unsafe.Slice([]T, p, n).

I was doing this survey as part of gathering data for #395, and I haven't written up the results in full yet, but I wanted to note this. It does make me wonder whether there's some other conversion we should be thinking about that would target both this issue and #395.
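
The two spellings above can be compared side by side in a small sketch. The views helper and the 1000-element backing array are assumptions chosen so that the array-pointer conversion stays within the allocation; int32 stands in for T.

```go
package main

import (
	"fmt"
	"unsafe"
)

// views (hypothetical) starts from an unsafe.Pointer, as most surveyed code
// does, and builds the same n-element slice with both spellings.
func views(backing *[1000]int32, n int) (old, neu []int32) {
	p := unsafe.Pointer(backing)

	// Existing idiom: convert to a pointer to a large array, then slice it.
	// The array bound must not exceed the memory actually available at p.
	old = (*[1000]int32)(p)[:n]

	// Proposed form: convert to *T, then supply the length.
	neu = unsafe.Slice((*int32)(p), n)
	return old, neu
}

func main() {
	var backing [1000]int32
	for i := range backing {
		backing[i] = int32(i)
	}
	old, neu := views(&backing, 4)
	fmt.Println(old, len(old), cap(old)) // [0 1 2 3] 4 1000
	fmt.Println(neu, len(neu), cap(neu)) // [0 1 2 3] 4 4
}
```

Besides the spelling, the capacities differ: the array idiom inherits the array bound unless written as [:n:n], while unsafe.Slice ties the capacity to the requested length.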

@seebs
Contributor

@seebs seebs commented Aug 26, 2020

If memory serves, the [:n] form, right now, is likely to provoke checkptr in some cases, if the cap is too large for the space pointed to, so that's another advantage of the unsafe.Slice() form.

Vaguely reminded of the C++ism of placement-new, maybe we need make([]T, n,[ n,] ptr). ... On looking at it, I think I'm going to vote against it.

I do sort of like the explicit provision of the type as a parameter, maybe. Consider:

b := make([]byte, 64)
u = unsafe.Slice([]uint64, &b[0], 8)

Options would be (1) this is an error, because the pointer is the wrong type, or (2) this is a valid call and works like type-punning.

Okay, madness: Imagine that we have, effectively, three signatures:

unsafe.Slice(ptr, len[, cap]) => uses pointer's type
unsafe.Slice(T, ptr, len[, cap]) => uses specified type
unsafe.Slice(T, slice, len[, cap]) => uses specified type, and the pointer from slice, and verifies that provided len and cap are valid.

I would be inclined to say that T should be specified as the slice type, not the member type, for consistency with make.

So these would be exactly equivalent:

unsafe.Slice((*T)(p), n)
unsafe.Slice([]T, p, n)

except that in the second case, p could be a pointer of any type, and/or possibly a slice. Having a slice work like a pointer to its first member isn't a completely novel notion -- %p does it.

@bcmills
Member

@bcmills bcmills commented Aug 26, 2020

it's odd to have to write *T when you are trying to produce a []T, especially compared with something like unsafe.Slice([]T, p, n).

I could imagine an implementation of generics, not too far from the current design draft but with a couple of additions, that would allow callers to elide the slice type when it can be inferred but to specify it otherwise.

Specifically, with a convertible.To[T] constraint and an inference algorithm that can infer a default T from convertible.To[T], then

package unsafe
func [T any, P convertible.To[*T]] Slice(ptr P, n int) []T

should allow both of the calls in

	type X struct { … }

	var u unsafe.Pointer
	s1 := unsafe.Slice[X](u, n)

	var p *X
	s2 := unsafe.Slice(p, n)

@bcmills
Member

@bcmills bcmills commented Aug 26, 2020

On the other hand, I don't think it's unreasonable to have to convert an unsafe.Pointer to a specific pointer type in order to use it, even for another unsafe API.

@ianlancetaylor
Contributor

@ianlancetaylor ianlancetaylor commented Aug 26, 2020

I would not want to infer T from convertible.To[T]. But that's OK; I don't think we need convertible at all here. It would be OK to require people to write their own conversion for the uncommon case of wanting a different type.

@rsc
Contributor

@rsc rsc commented Aug 26, 2020

I don't think it's unreasonable to have to convert an unsafe.Pointer to a specific pointer type in order to use it,

This wasn't my point. My point was that it's weird to write a conversion to *T when what you are trying to produce is []T.

If you were already holding a *T then there'd be no conversion at all; but in most of the cases I found, you're holding an unsafe.Pointer, so a conversion is needed, but it's a conversion to not quite the type you want.

@bcmills
Member

@bcmills bcmills commented Aug 27, 2020

The conversion from *T to []T requires the caller to supply only one item of additional data:

  • the number of elements in the slice.

The conversion from unsafe.Pointer to []T requires two:

  • the number of elements in the slice, and
  • the element type.

unsafe.Slice as proposed provides the element type first (via the conversion to *T), and then the number of elements in the slice (via the call to unsafe.Slice). However, I could imagine some alternatives.

If array types with run-time lengths were allowed, then the conversion from unsafe.Pointer could supply both the type and the number of elements as a single conversion:

	slice := (*[n]T)(p)[:]

If array types with indeterminate lengths were allowed, then the conversion would look more-or-less the same, but the slice operation would have to supply an explicit length and capacity:

	slice := (*[...]T)(p)[:n:n]

With generics, I could imagine spelling [...]T differently, perhaps as unsafe.Array[T]. Either way, the semantics would be “an array of T for which the size is unknown and not checked during slicing and indexing operations”.

However, with either of those approaches, we would either still need an unsafe.Slice function or equivalent, or else callers would lose substantial type-safety when they already have a *T: because they would need to convert the *T to unsafe.Pointer, dropping the information about the element type, and then convert the unsafe.Pointer back to a *[...]T or *[n]T, redundantly supplying that same information.

On the other hand, if we add some form of generics to the language it will likely be possible to implement both of those conversions as a library.

@rsc rsc moved this from Likely Accept to Active in Proposals Sep 2, 2020
@rsc
Contributor

@rsc rsc commented Sep 2, 2020

I still don't have the data I wanted to present about this, but it doesn't seem settled even so. Moving back to Active.

@jimmyfrasche
Member

@jimmyfrasche jimmyfrasche commented Sep 2, 2020

With unsafe.Slice(ptr *anyType, len anyInteger) and generics, you could write

func SliceOf[T any](ptr unsafe.Pointer, len int) []T {
  return unsafe.Slice((*T)(ptr), len)
}

(This could probably take another parameter to accept something like anyInteger as well)

Conversely, if there were something like unsafe.Slice(T typeArg, ptr unsafe.Pointer, len anyInteger)

func SliceFrom[T any](ptr *T, len int) []T {
  return unsafe.Slice(T, unsafe.Pointer(ptr), len)
}

but that would require defining the type parameter of unsafe.Slice in some way that works with generics. The former seems simpler overall even if it often requires a conversion in practice.
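
A runnable version of the first helper, as a sketch under the generics syntax that shipped in Go 1.18 and the single-length unsafe.Slice from Go 1.17. firstN and the sample array are hypothetical additions that give the helper something to operate on.

```go
package main

import (
	"fmt"
	"unsafe"
)

// SliceOf layers the "type first, untyped pointer second" style on top of
// the pointer-typed unsafe.Slice.
func SliceOf[T any](ptr unsafe.Pointer, n int) []T {
	return unsafe.Slice((*T)(ptr), n)
}

// firstN (hypothetical) is a caller that holds only an unsafe.Pointer
// and names the element type explicitly.
func firstN(words *[8]uint64, n int) []uint64 {
	return SliceOf[uint64](unsafe.Pointer(words), n)
}

func main() {
	words := [8]uint64{1, 2, 3, 4, 5, 6, 7, 8}
	fmt.Println(firstN(&words, 3)) // [1 2 3]
}
```

The wrapper is a single line, which supports the point that the pointer-typed builtin is the simpler primitive to standardize.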

@rsc what's the split when you say most code would require the conversion—are we talking roughly 51% or 99% here?

@mdempsky
Member Author

@mdempsky mdempsky commented Sep 3, 2020

My point was that it's weird to write a conversion to *T when what you are trying to produce is []T.

I don't think it's weird. If I want to read a T value from the memory location pointed to by an unsafe.Pointer, I have to convert it to *T and then dereference the result. There's no direct "load T value from unsafe.Pointer" operation.

As I see it, a []T slice is just a 3-tuple consisting of a *T pointer and two int values, length and capacity. Analogously, I don't see why we would add a direct "create []T slice from unsafe.Pointer operation." Converting unsafe.Pointer to *T and then bundling that *T along with a length/capacity as a slice are two logically distinct operations.

I think there's a reasonable alternative view that a []T slice is actually a 3-tuple with an unsafe.Pointer instead of a *T pointer. I can see this as slightly cleaner when it comes to empty-but-non-nil slices, where the *T pointer is non-nil yet doesn't necessarily actually point to a T variable. But for this to be consistent, I think the length and capacity values should be byte counts, rather than T-element counts. unsafe.Pointer is an element-type-less pointer, and byte counts are the only element-type-less measure of memory.

I'll also remind that a direct unsafe.Pointer-to-[]T conversion operation was already previously suggested (as an extension of make, rather than a new unsafe.Slice function), and I even made the same observation that generally the *T pointers were converted from unsafe.Pointer. Yet after consideration, consensus still favored the current proposal.

Edit: I'll caveat, though, that in that "consensus" comment I argued that I expected *T pointers to be more common in user code than unsafe.Pointer due to cgo, and Russ's data reportedly refutes that expectation. However, in the unsafe.Pointer vs *T scenarios, we're weighing between these two spellings:

// ptr has type unsafe.Pointer
unsafe.Slice((*T)(ptr), len)  // current proposal
unsafe.Slice([]T, ptr, len)   // alternate proposal

// ptr has type *T
unsafe.Slice(ptr, len)                       // current proposal
unsafe.Slice([]T, unsafe.Pointer(ptr), len)  // alternate proposal

For unsafe.Pointers, I think the proposals are very comparably ergonomic. But for *Ts, the current proposal is considerably simpler.

but it doesn't seem settled even so.

I'd appreciate it if folks who were previously settled but are now unsettled would affirmatively voice that (preferably with their concerns). There have been several comments since the "likely accept" update, but I only see one that clearly expresses withdrawn support for the unsafe.Slice proposal. The rest, as I read them, seem to still favor the proposal and are just responding to the counter-proposal.

@bcmills
Member

@bcmills bcmills commented Sep 14, 2020

I'm still in favor, but I do think we should be careful to ensure that the call sites can be also expressed as some likely form of generics.

unsafe.Slice((*T)(ptr), len) has that property — I can envision a lot of possible type-inference algorithms in which that could be expressed.

However, unsafe.Slice([]T, …) does not have that property — I think it's relatively unlikely that the final form of generics will allow intermixing types and values in the run-time argument list.

@ianlancetaylor
Contributor

@ianlancetaylor ianlancetaylor commented Sep 14, 2020

I also think that unsafe.Slice[T any](*T, int) []T is the right declaration. I understand why @rsc suggests that it is odd to write *T when you want a []T. However, you only call unsafe.Slice when you have a pointer, and you want to get a slice. Programs that are operating at this level are intimately familiar with the fact that a slice is in effect a bounded pointer. It seems to me to be entirely reasonable that there is an operation that takes a pointer and a bound and returns a slice. I don't think the fact that typical uses will explicitly say *T will lead to any confusion; the pointer is type *T, and the resulting bounded pointer is type []T.
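
The "bounded pointer" view can be illustrated with a short sketch: a slice rebuilt from nothing but a pointer to its first element and a bound aliases the same memory as the original. rebuild is a hypothetical helper, written with the generics syntax that shipped in Go 1.18.

```go
package main

import (
	"fmt"
	"unsafe"
)

// rebuild (hypothetical) reconstructs s from a *T and a length alone.
func rebuild[T any](s []T) []T {
	if len(s) == 0 {
		return nil // no first element to point at
	}
	return unsafe.Slice(&s[0], len(s))
}

func main() {
	orig := []string{"a", "b", "c"}
	again := rebuild(orig)
	again[1] = "B" // writes through to orig: same backing memory

	fmt.Println(orig, &orig[0] == &again[0]) // [a B c] true
}
```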

@elichai

@elichai elichai commented Sep 15, 2020

I'm afraid that involving generics in the API here will mean that this won't be added for at least the next year.

@ianlancetaylor
Contributor

@ianlancetaylor ianlancetaylor commented Sep 15, 2020

@elichai The actual implementation of this function would be entirely in the compiler, so generics are not really involved. It's just a way of picturing the declaration.
