spec: use (*[4]int)(x) to convert slice x into array pointer #395

Open
rogpeppe opened this issue Dec 8, 2009 · 87 comments

Comments

@rogpeppe
Contributor

@rogpeppe rogpeppe commented Dec 8, 2009

Currently, it's possible to convert from an array or array pointer to a slice, but there's no way of reversing this.

A possible syntax could be similar to the current notation for type assertions:

ArrayAssertion  = "." "[" Expression ":" Expression "]" .

where the operands on either side of the ":" are constant expressions.

One motivation for doing this is that using an array pointer allows the compiler to
range check constant indices at compile time.

A function like this:

func foo(a []int) int {
    return a[0] + a[1] + a[2] + a[3]
}

could be turned into:

func foo(a []int) int {
    b := a.[0:4] // proposed syntax, not valid Go
    return b[0] + b[1] + b[2] + b[3]
}

allowing the compiler to check all the bounds only once and give compile-time errors
about out-of-range indices.
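For comparison, here is how the same function can be written with the conversion this issue eventually accepted (milestoned for Go 1.17 at the end of the thread). It is a minimal sketch, assuming a Go 1.17 or later toolchain; whether the compiler then drops the individual index checks is an optimization detail, not something the spec promises.

```go
package main

import "fmt"

// foo sums the first four elements of a. The conversion to *[4]int is
// checked once at run time and panics if len(a) < 4; after that, the
// constant indices into b are in range by construction, and an index
// such as b[4] would be rejected at compile time.
func foo(a []int) int {
	b := (*[4]int)(a)
	return b[0] + b[1] + b[2] + b[3]
}

func main() {
	fmt.Println(foo([]int{1, 2, 3, 4, 5})) // 10
}
```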
@robpike
Contributor

@robpike robpike commented Dec 9, 2009

Comment 1:

Labels changed: added languagechange.

Status changed to Thinking.

@rsc
Contributor

@rsc rsc commented Dec 10, 2009

Comment 2:

I think this functionality would be nice.
Personally I would rather not assume
that the compiler can subtract arbitrary
expressions (as in b := a.[0:4]) but instead
say explicitly what type I want:
b := (*[4]int)(a[0:4])
The argument against this is that we hoped
introducing x[:] would let us get rid of the
implicit conversion from *[4]int to slice.
Maybe it still does, but we allow the explicit one.
There are certainly compelling cases (mostly
in low-level things like jpeg or sha1 block
processing) where converting a slice to *[N]int
for some N would eliminate many bounds checks
for cheap.
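The block-processing case mentioned above can be sketched as a loop that converts each fixed-size chunk of the input to an array pointer. This assumes Go 1.17+; `blockSize`, `processBlock` and `sumBlocks` are hypothetical names for illustration and are not taken from the jpeg or sha1 packages.

```go
package main

import "fmt"

const blockSize = 4 // hypothetical block size for illustration

// processBlock takes a fixed-size block, so every constant index below
// is known to be in range at compile time.
func processBlock(b *[blockSize]byte) uint32 {
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

// sumBlocks walks p in blockSize chunks, converting each chunk to an
// array pointer with a single length check. A trailing partial block
// is ignored.
func sumBlocks(p []byte) uint32 {
	var sum uint32
	for len(p) >= blockSize {
		sum += processBlock((*[blockSize]byte)(p))
		p = p[blockSize:]
	}
	return sum
}

func main() {
	fmt.Println(sumBlocks([]byte{1, 0, 0, 0, 2, 0, 0, 0, 9})) // 3
}
```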

Owner changed to r...@golang.org.

@rogpeppe
Contributor Author

@rogpeppe rogpeppe commented Dec 10, 2009

Comment 3:

> b := (*[4]int)(a[0:4])
I thought about this. The problem is that it looks like other type conversions, and
currently no type conversion can fail at runtime.
Actually, I can't see any reason why we couldn't just use the normal type assertion
syntax:
b := a[0:4].(*[4]int)
@rsc
Contributor

@rsc rsc commented Dec 10, 2009

Comment 4:

Yes, that's a disadvantage of (*[4]int)(a[0:4]).
A disadvantage of a[0:4].(*[4]int) is that it takes over
syntax currently reserved for interface values. At one
point conversion syntax and type guard syntax were
interchangeable. It clarified things quite a bit
to require that in x.(T), x must be an interface value
and that in T(x), the conversion must be statically
guaranteed to succeed.
Unfortunately, this particular conversion doesn't fit 
into either of those categories.
We've got enough going on that it's going to be a while
before we do anything with this.
@rogpeppe
Contributor Author

@rogpeppe rogpeppe commented May 11, 2011

Comment 5:

I just encountered a nice example of when this functionality might be useful. To quote
from a reddit user on why Go "needs" pointer arithmetic: 
"One well-used example is making classes as small as possible for tree nodes or linked
list nodes so you can cram as many of them into L1 cache lines as possible. This is done
by each node having a single pointer to a left sub-node, and the right sub-node being
accessed by the pointer to the left sub-node + 1. This saves the 8-bytes for the
right-node pointer. To do this you have to pre-allocate all the nodes in a vector or
array so they're laid out in memory sequentially, but it's worth it when you need it for
performance. (This also has the added benefit of the prefetchers being able to help
things along performance-wise - at least in the linked list case)."
You can *almost* do this in Go with 
   type node struct {
      value int
      children *[2]node
   }
except that there's no way of getting a *[2]node from the underlying slice.
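With the conversion accepted later in this thread, the missing step can be written directly. A minimal sketch, assuming Go 1.17+; `buildTree` is a hypothetical helper, and the `node` type is the one from the comment above.

```go
package main

import "fmt"

// node follows the layout in the quoted comment: a single pointer to a
// pair of children that live next to each other in memory.
type node struct {
	value    int
	children *[2]node
}

// buildTree preallocates every node in one slice and points the root at
// its two neighbors by converting a two-element subslice to *[2]node.
func buildTree() *node {
	nodes := make([]node, 3)
	for i := range nodes {
		nodes[i].value = i + 1
	}
	nodes[0].children = (*[2]node)(nodes[1:3])
	return &nodes[0]
}

func main() {
	root := buildTree()
	fmt.Println(root.value, root.children[0].value, root.children[1].value) // 1 2 3
}
```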
@gopherbot

@gopherbot gopherbot commented May 17, 2011

Comment 6 by nmessenger:

If neither syntax is ideal, perhaps a new unslice builtin?
    ary, ok := unslice([n]T, slc)
Though should ary have type [n]T or *[n]T? If n is large and the unslice fails, a large
zeroed array might not be ideal. Anything wrong with this that I'm not seeing? Well,
besides it being another new builtin.
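The `ary, ok` form can be approximated in ordinary code by checking the length before converting. This sketch assumes Go 1.17+; `unslice4` is a hypothetical helper fixed to n = 4 and element type int, not a builtin, and it returns the `*[n]T` variant so a failed call does not produce a large zeroed array.

```go
package main

import "fmt"

// unslice4 mimics the proposed "ary, ok := unslice([n]T, slc)" for
// n = 4. It returns nil, false instead of panicking when s is too short.
func unslice4(s []int) (*[4]int, bool) {
	if len(s) < 4 {
		return nil, false
	}
	return (*[4]int)(s), true
}

func main() {
	if a, ok := unslice4([]int{1, 2, 3, 4}); ok {
		fmt.Println(a[3]) // 4
	}
	_, ok := unslice4([]int{1, 2})
	fmt.Println(ok) // false
}
```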
@gopherbot
Copy link

@gopherbot gopherbot commented May 17, 2011

Comment 7 by robpike:

This is a big deal because ary would not have a static type.
@rsc
Contributor

@rsc rsc commented May 17, 2011

Comment 8:

If we were going to do it - which is far from even up for debate - I think
the syntax x.(*[10]int) works well (x has type []int).  You can't .(T) a slice
type right now, so it would not be overloading anything, and like an
interface type assertion it can fail or be checked at run time.
You can even think of it as []int containing a *[10]int the same way
an interface value holds a concrete type, and you're extracting the
underlying array pointer.
That said, I don't think this is important enough to worry about now.
There is enough else going on.
@rsc
Contributor

@rsc rsc commented Dec 9, 2011

Comment 9:

Labels changed: added priority-later.

@mdempsky
Member

@mdempsky mdempsky commented Jan 29, 2013

Comment 11:

Since this is something that's been bugging me lately too, I thought I'd add a few
random thoughts I had that don't seem to have been mentioned:
Allowing conversions from slices to array-pointers means pointers can now refer to
partially overlapping objects.  I don't believe that's currently possible in the
language.  Slices can already overlap though, so it's not a big change overall.
Like comment #8 says, []T is sort-of an interface type for *[N]T, so type assertions are
arguably suitable syntax.  Except that cap(x.(*[N]T)) might give a different value than
cap(x), which isn't true for other interfaces/implementor-type relations.  Seems like an
open question whether this inconsistency is worth accepting into the language, and
really since there's already a way to convert a *[N]T into a []T, just the ability to
turn a []T into a *[N]T is the relevant missing feature.
It would be nice if an expression like x[e1:e2] could actually have a static type of
*[e2-e1]T (assuming e1 and e2 are constant expressions), then you could write something
like *dst[16:24] = *src[136:144] and the compiler can verify that the array bounds match
up.  Unfortunately, the expression can't actually be x[e1:e2] since existing code might
rely on cap(x[e1:e2]) == cap(x)-e1, and that would be a backwards incompatible change. 
The x.[e1:e2] syntax suggested originally would solve this issue.
If you want a range like x.[n:n+4], instead of requiring the language to recognize this
pattern somehow, x[n:].[:4] is equivalent and has static indices at the expense of
clunkier notation.  A short-hand notation like x.[n:+4] might be nice to indicate that 4
is a length not an end position, but not strictly necessary and complicates the
language. (Also, +4 is technically ambiguous here: it could mean length 4 or end
position "+4", so again some new notation would be necessary.)
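Both points can be seen with the conversion that was eventually accepted. In this sketch (assuming Go 1.17+; `window4` is a hypothetical helper), converting `x[n:]` to `*[4]int` plays the role of the proposed `x[n:].[:4]`: only `len(x[n:]) >= 4` is checked, and the resulting pointer has len and cap 4 regardless of `cap(x)`.

```go
package main

import "fmt"

// window4 plays the role of the proposed x[n:].[:4]: the conversion
// only requires len(x[n:]) >= 4 and panics otherwise.
func window4(x []int, n int) *[4]int {
	return (*[4]int)(x[n:])
}

func main() {
	x := make([]int, 6, 10)
	p := window4(x, 1)
	fmt.Println(cap(x), cap(p)) // 10 4: the array type fixes len and cap of p
	p[0] = 42
	fmt.Println(x[1]) // 42: p points into x's backing array
}
```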
@gopherbot
Copy link

@gopherbot gopherbot commented Jun 8, 2013

Comment 12 by peter.waller:

I just want to note that we have a use case for this at go-gl. More information:
go-gl/gl#111
The underlying OpenGL API only accepts the equivalent of, e.g., a *[4]float32, so it is
nice to have this in the type system on our side. OTOH, a consumer of this API might be
holding a []float32 they want to pass to us. So it would be great to find a solution to
this, as the current solutions are a bit messy or require the use of unsafe.
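A sketch of how a caller holding a `[]float32` could satisfy such an API once the conversion exists (Go 1.17+). `setVec4` is a hypothetical stand-in for a binding that takes exactly four floats; it is not the go-gl API.

```go
package main

import "fmt"

// setVec4 is a hypothetical binding that, like the underlying OpenGL
// call, takes exactly four floats. Here it just sums them.
func setVec4(v *[4]float32) float32 {
	return v[0] + v[1] + v[2] + v[3]
}

func main() {
	data := []float32{0.5, 0.25, 0.125, 0.125, 99}
	// Pass the first four elements without copying or using unsafe;
	// the conversion panics if len(data) < 4.
	fmt.Println(setVec4((*[4]float32)(data))) // 1
}
```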
@rsc
Contributor

@rsc rsc commented Nov 27, 2013

Comment 13:

Labels changed: added go1.3maybe.

@rsc
Contributor

@rsc rsc commented Dec 4, 2013

Comment 14:

Labels changed: added release-none, removed go1.3maybe.

@rsc
Contributor

@rsc rsc commented Dec 4, 2013

Comment 15:

Labels changed: added repo-main.

@ncw
Contributor

@ncw ncw commented Jan 20, 2014

Comment 16:

An alternative which occurred to me: in a function which only indexes a slice with
constant values (e.g. the JPEG routines or unrolled FFTs, which are what I'm working on)
and doesn't modify the slice, the compiler could bounds check the slice just once at the
start of the function with min(constants) and max(constants).
This would achieve the removal of the bounds checking without a language change.  It
wouldn't allow the compiler to do range checking at compile time though.
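A manual version of this idea already works without a language change: access the largest constant index once up front, which gives the compiler a fact it can use to drop the later checks. The standard library's encoding/binary uses the same `_ = b[3]` pattern. This is a sketch, and whether the later checks are actually eliminated depends on the compiler version; like the suggestion above, it gives no compile-time range errors.

```go
package main

import "fmt"

// sum4 indexes a only with constants. The early access to a[3] panics
// if a is too short; after it, the gc compiler can often prove the
// remaining indices are in range and omit their checks.
func sum4(a []int) int {
	_ = a[3] // bounds-check hint: one check up front
	return a[0] + a[1] + a[2] + a[3]
}

func main() {
	fmt.Println(sum4([]int{1, 2, 3, 4})) // 10
}
```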
@gopherbot

@gopherbot gopherbot commented Mar 19, 2014

Comment 17 by matthieu.riou:

Another use case is to be able to use a small slice of known length as a map key. The Go
APIs use slices heavily even where a fixed length is expected. Being able to get the
underlying array would allow using such slices as map keys.
Two examples I've run into recently are hashes (SHA-256) and IP addresses (as 16-byte
slices). It seems rather wasteful to have to copy them or convert them to strings just
to use them as map keys.
@nerdatmath
Contributor

@nerdatmath nerdatmath commented Apr 11, 2015

FWIW, something similar to unslice() above can be implemented with reflect and unsafe. Despite being implemented in terms of unsafe, I believe unslice itself is safe. I don't know whether it violates any assumptions made by the GC, however.

http://play.golang.org/p/DixtgwxXUH

@daviddengcn

@daviddengcn daviddengcn commented Apr 11, 2015

Actually I believe the compiler could easily be smart enough to make a single range check for a statement like this:

return a[0] + a[1] + a[2] + a[3]
@rsc rsc removed release-none labels Apr 14, 2015
@chalonga

@chalonga chalonga commented Apr 14, 2015

Is there a good way to do this currently?

For example, if one function gives me a slice as output and I need to use that output in another function that wants a fixed array as input, what is the best way to coerce the slice into an array that is the current size of the slice and contains its current members?

@ianlancetaylor
Contributor

@ianlancetaylor ianlancetaylor commented Apr 14, 2015

Currently there is no safe way to convert from a slice type to an array type (that is the point of this issue).

You can do it using unsafe by writing code like

(*[10]byte)(unsafe.Pointer(&b[0]))
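The unsafe conversion itself checks nothing, so a careful caller has to guard the length. The sketch below puts that guarded workaround next to the safe conversion accepted later in this thread (Go 1.17+); both return a pointer to the same backing array. `viaUnsafe` and `viaConversion` are hypothetical names.

```go
package main

import (
	"fmt"
	"unsafe"
)

// viaUnsafe is the pre-Go 1.17 workaround. The unsafe conversion does
// no checking of its own, so this sketch adds an explicit length check;
// without it, a short b would let callers read past the slice.
func viaUnsafe(b []byte) *[10]byte {
	if len(b) < 10 {
		panic("slice too short")
	}
	return (*[10]byte)(unsafe.Pointer(&b[0]))
}

// viaConversion uses the conversion accepted in this issue; the runtime
// checks len(b) >= 10 and panics otherwise.
func viaConversion(b []byte) *[10]byte {
	return (*[10]byte)(b)
}

func main() {
	b := make([]byte, 12)
	fmt.Println(viaUnsafe(b) == viaConversion(b)) // true: same backing array
}
```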
@bcmills
Member

@bcmills bcmills commented Aug 3, 2020

@josharian, note that returning nil still probably wouldn't address @FiloSottile's concern, since that could provoke a new panic at arbitrarily many later points in the execution of the program. (That is, from the perspective of code auditing, returning nil for a conversion from a non-nil value is likely just as disruptive as panicking directly.)

@josharian
Contributor

@josharian josharian commented Aug 3, 2020

@bcmills I’d love to get Filippo’s take on that. That’s not obvious to me. But it’s true that it might break existing nil pointer analysis tools, in the same way that panicking might break existing control flow analysis tools.

@beoran

@beoran beoran commented Aug 5, 2020

One approach which I haven't seen mentioned is to have an unslice() builtin which simply does the "right thing": if the slice is too small, it allocates a new array and copies over the available elements from the slice, leaving the remaining entries at the zero value. If I have a []int{1, 2} that I want to convert to a [4]int, then [1,2,0,0] is, IMHO, the correct result of the conversion. On the other hand, if the slice is big enough, the copy doesn't happen and we get a pointer to the underlying array of the slice instead.

slice := []int{}
arr := unslice([4]int, slice) // unslice is the proposed builtin; it does not exist today
// arr is now [4]int{0,0,0,0}; it is newly allocated and doesn't point into slice.

slice = []int{1, 2, 3, 4, 5}
arr = unslice([4]int, slice)
// arr is now [4]int{1,2,3,4} and points into slice.
@rsc rsc changed the title proposal: spec: derive array pointer from slice proposal: spec: use (*[4]int)(x) to convert slice x into array pointer Aug 5, 2020
@rsc
Contributor

@rsc rsc commented Aug 5, 2020

The security argument here seems pretty weak. It's already the case that x.f can panic, as can x.m(), and lots of other expressions. No one is actually mentally checking every single one of those. That's a job for tools.

Keith's point about printing the short length is a great reason to panic instead of silently coercing to nil. I've really enjoyed the new index and slice panics that show the out-of-range index and length. It would be very helpful, in the rare case when a conversion does fail, to see that without having to add the kind of print that @jimmyfrasche showed, preemptively at every conversion site.

From the reactions to Ian's comment above and the discussion after,
it sounds like we are converging on using
(*[4]int)(x) for the checked conversion, with a panic if x is too short.

Does anyone object to that? (Retitled as well.)

The one thing that I'm still not sure about is how often this comes up at all. I still need to check the corpus I have, but if anyone can chime in with anecdotal stories about when you would have needed this, that would be helpful too.

Thanks.

@jimmyfrasche
Member

@jimmyfrasche jimmyfrasche commented Aug 5, 2020

@rsc I think this is a problem worth solving. I don't think it's a common enough problem to deserve weakening an invariant of existing syntax (which affects tools as well as mental checks). Overloading the type assertion syntax seems a better fit as it can already panic, but a builtin or even something in unsafe seems best. I ultimately wouldn't mind too much if conversion syntax is used but it certainly doesn't feel right to me.

@beoran this is for getting a pointer to the same underlying array as the slice uses: if you want a copy of the contents of that array you can use the copy builtin.
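The distinction between a view and a copy can be shown side by side. In this sketch (assuming Go 1.17+; `view4` and `padTo4` are hypothetical helpers), the conversion aliases the slice and panics on a short input, while `copy` into a zeroed array gives the padded [1,2,0,0] behavior described above without any new builtin.

```go
package main

import "fmt"

// view4 returns a pointer into s's backing array; it panics if
// len(s) < 4 rather than padding.
func view4(s []int) *[4]int {
	return (*[4]int)(s)
}

// padTo4 copies up to four elements of s into a fresh array; missing
// elements stay at the zero value.
func padTo4(s []int) [4]int {
	var a [4]int
	copy(a[:], s)
	return a
}

func main() {
	s := []int{1, 2, 3, 4, 5}
	p := view4(s)
	c := padTo4(s)
	s[0] = 100
	fmt.Println(p[0], c[0])          // 100 1
	fmt.Println(padTo4([]int{1, 2})) // [1 2 0 0]
}
```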

@martisch
Contributor

@martisch martisch commented Aug 5, 2020

Anecdotal story for when I would like to have used this:

I was working on an optimization for a hot library code path. It was using a string as a key to a map. That string got constructed by a strings.Join from a []string. For all practical purposes the slices were always 4 elements long. For historical reasons and generality, the functions involved took a []string as input, and changing that widely used API and config would have introduced a lot of effort and churn in a large codebase.

Using [4]string as a map key (with a fallback for other lengths: joining the elements down into only the first element of the [4]string with a special terminal symbol) was a bit faster as it avoided an allocation to construct the string key. However, there is no safe way to convert []string to [4]string (or *[4]string) without copying into a [4]string (or allocating), which made the new code a bit slower than it needed to be.

The code I would ideally have written for performance would have been (simplified):

if len(strs) == 4 {
   key := (*[4]string)(strs)
   ... 
   v := m[*key]
}

Googlers: Search for the phrase "In practice, entries always have four components" to find the unoptimized code location in the mono repo.

@randall77
Contributor

@randall77 randall77 commented Aug 5, 2020

> key := (*[4]string)(strs)
> ...
> v := m[*key]

That seems not a very compelling motivator for this issue, as you could do:

    var key [4]string
    copy(key[:], strs)
    v := m[key]

instead. One more line, but not really any more expensive (in an ideal compiler world).
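Both forms can be put in one runnable program for comparison. This is a sketch assuming Go 1.17+; the map contents and the `lookupCopy`/`lookupConv` names are hypothetical, and how many loads and stores each form compiles to depends on the compiler version, as the next comment discusses.

```go
package main

import "fmt"

var m = map[[4]string]int{{"a", "b", "c", "d"}: 1}

// lookupCopy is the copy-based form: build a [4]string key, then look it up.
func lookupCopy(strs []string) (int, bool) {
	if len(strs) != 4 {
		return 0, false
	}
	var key [4]string
	copy(key[:], strs)
	v, ok := m[key]
	return v, ok
}

// lookupConv is the conversion-based form. The length guard means the
// conversion cannot panic here.
func lookupConv(strs []string) (int, bool) {
	if len(strs) != 4 {
		return 0, false
	}
	v, ok := m[*(*[4]string)(strs)]
	return v, ok
}

func main() {
	strs := []string{"a", "b", "c", "d"}
	fmt.Println(lookupCopy(strs)) // 1 true
	fmt.Println(lookupConv(strs)) // 1 true
}
```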

@martisch
Contributor

@martisch martisch commented Aug 5, 2020

Sure, it's already possible to use a copy. I didn't mean to imply it was compelling.
The compiler currently zeros key, then copies in strs with a typedslicecopy then moves the contents to a tmp and passes a pointer to the tmp copy to the runtime map routine. All in all 3 times storing to memory and 2 times loading from it that can be avoided. Ideally (which is easier to optimize by the compiler with the explicit conversion) the generated binary would just pass a pointer to the backing array of the strs slice.

https://godbolt.org/z/57ejs6

@rsc
Contributor

@rsc rsc commented Aug 12, 2020

I still need to search for conversions in the corpus.

@rsc
Contributor

@rsc rsc commented Aug 26, 2020

I started on this, and I commented on #19367 with some of what I found, but I don't have full results yet. Downloading and type-checking lots of code to look for specific use case possibilities.

@rsc
Contributor

@rsc rsc commented Oct 7, 2020

I didn't find many such conversions, but it's likely that (1) people have worked around the lack and (2) there will never be too many of these, but when it's needed it will be important. The comments above have lots of potential uses.

The latest version of the proposal is #395 (comment):

(*[4]int)(x) for the checked conversion, with a panic if x is too short.

And the panic will print both the target length and the too-short actual length.
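To see that message without crashing, the panic can be recovered. A failed conversion panics with a value that implements `runtime.Error`, like other bounds failures. This sketch assumes Go 1.17+; `convert4` is a hypothetical helper, and the exact wording of the message varies between Go versions, so the example does not print an expected string. In practice, checking `len` before converting is usually clearer than recovering.

```go
package main

import (
	"fmt"
	"runtime"
)

// convert4 converts s and reports a failed conversion as an error
// instead of letting the panic propagate.
func convert4(s []int) (p *[4]int, err error) {
	defer func() {
		if r := recover(); r != nil {
			re, ok := r.(runtime.Error)
			if !ok {
				panic(r) // not a runtime error: re-panic
			}
			err = re
		}
	}()
	return (*[4]int)(s), nil
}

func main() {
	_, err := convert4([]int{1, 2})
	fmt.Println(err) // the message names both lengths
}
```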

I asked for objections on Aug 7 and no one had any (except me).
Based on the discussion, then, this seems like a likely accept.

@bcmills
Member

@bcmills bcmills commented Oct 8, 2020

Is it possible for [4]int to have a different alignment from int?

If so, would the conversion from []int to *[4]int also panic if the slice is not properly aligned for the array type?

@randall77
Contributor

@randall77 randall77 commented Oct 8, 2020

> Is it possible for [4]int to have a different alignment from int?

No. The alignment of an array is the same as the alignment of its element.

@rsc rsc modified the milestones: Unplanned, Proposal Oct 8, 2020
@rsc rsc added Proposal-FinalCommentPeriod and removed Go2 labels Oct 8, 2020
@rsc rsc moved this from Active to Likely Accept in Proposals Oct 8, 2020
@SlyMarbo

@SlyMarbo SlyMarbo commented Oct 10, 2020

Would it be possible for the syntax to require an explicit slice? For example, (*[4]int)(x[2:6]). Explicit slicing will be necessary anyway when you're not using the entire slice and this way any panic would happen during the slice (as it would today), meaning the conversion itself can't fail. The compiler can assert that the slice bounds are constant expressions and produce the same range as the array length. I think this would be less surprising than the conversion panicking or returning nil, plus any tooling looking for slices that could panic will still find this.

@josharian
Contributor

@josharian josharian commented Oct 11, 2020

@SlyMarbo this was discussed at some length in earlier comments.

@rsc
Contributor

@rsc rsc commented Oct 14, 2020

No change in consensus, so accepted.
It's a bit late to do this for Go 1.16, so will milestone to Go 1.17.

@rsc rsc moved this from Likely Accept to Accepted in Proposals Oct 14, 2020
@rsc rsc modified the milestones: Proposal, Go1.17 Oct 14, 2020
@rsc rsc changed the title proposal: spec: use (*[4]int)(x) to convert slice x into array pointer spec: use (*[4]int)(x) to convert slice x into array pointer Oct 17, 2020
Projects
Proposals
Accepted