Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: spec: derive array pointer from slice #395

Open
rogpeppe opened this issue Dec 8, 2009 · 45 comments

Comments

@rogpeppe
Copy link
Contributor

commented Dec 8, 2009

Currently, it's possible to convert from an array or array pointer to a slice, but
there's no way of reversing 
this.

A possible syntax could be similar to the current notation for type assertions:

ArrayAssertion  = "." "[" Expression ":" Expression
"]" .

where the operands either side of the ":" are constant expressions.

One motivation for doing this is that using an array pointer allows the compiler to
range check constant 
indices at compile time.

a function like this:

func foo(a []int) int
{
    return a[0] + a[1] + a[2] + a[3];
}

could be turned into:

func foo(a []int) int
{
    b := a.[0:4];
    return b[0] + b[1] + b[2] + b[3];
}

allowing the compiler to check all the bounds once only and give compile-time errors
about out of range 
indices.
@robpike

This comment has been minimized.

Copy link
Contributor

commented Dec 9, 2009

Comment 1:

Labels changed: added languagechange.

Status changed to Thinking.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Dec 10, 2009

Comment 2:

I think this functionality would be nice.
Personally I would rather not assume
that the compiler can subtract arbitrary
expressions (as in b := a.[0:4]) but instead
say explicitly what type I want:
b := (*[4]int)(a[0:4])
The argument against this is that we hoped
introducing x[:] would let us get rid of the
implicit conversion from *[4]int to slice.
Maybe it still does, but we allow the explicit one.
There are certainly compelling cases (mostly
in low-level things like jpeg or sha1 block
processing) where converting a slice to *[N]int
for some N would eliminate many bounds checks
for cheap.

Owner changed to r...@golang.org.

@rogpeppe

This comment has been minimized.

Copy link
Contributor Author

commented Dec 10, 2009

Comment 3:

> b := (*[4]int)(a[0:4])
i thought about this. the problem is that it looks like other type conversions, and
currently no 
type conversion can fail at runtime.
actually, i can't see any reason why we couldn't just use the normal type coercion
syntax:
b := a[0:4]).(*[4]int)
@rsc

This comment has been minimized.

Copy link
Contributor

commented Dec 10, 2009

Comment 4:

Yes, that's a disadvantage of (*[4]int)(a[0:4]).
A disadvantage of a[0:4].(*[4]int) is that it takes over
syntax currently reserved for interface values.  At one
point conversion syntax and type guard syntax was 
interchangeable.  It clarified things quite a bit
to require that in x.(T), x must be an interface value
and that in T(x), the conversion must be statically
guaranteed to succeed.
Unfortunately, this particular conversion doesn't fit 
into either of those categories.
We've got enough going on that's it going to be a while
before we do anything with this.
@rogpeppe

This comment has been minimized.

Copy link
Contributor Author

commented May 11, 2011

Comment 5:

I just encountered a nice example of when this functionality might be useful. To quote
from a reddit user on why Go "needs" pointer arithmetic: 
"One well-used example is making classes as small as possible for tree nodes or linked
list nodes so you can cram as many of them into L1 cache lines as possible. This is done
by each node having a single pointer to a left sub-node, and the right sub-node being
accessed by the pointer to the left sub-node + 1. This saves the 8-bytes for the
right-node pointer. To do this you have to pre-allocate all the nodes in a vector or
array so they're laid out in memory sequentially, but it's worth it when you need it for
performance. (This also has the added benefit of the prefetchers being able to help
things along performance-wise - at least in the linked list case).''
You can *almost* do this in Go with 
   type node struct {
      value int
      children *[2]node
   }
except that there's no way of getting a *[2]node from the underlying slice.
@gopherbot

This comment has been minimized.

Copy link

commented May 17, 2011

Comment 6 by nmessenger:

If neither syntax is ideal, perhaps a new unslice builtin?
    ary, ok := unslice([n]T, slc)
Though should ary have type [n]T or *[n]T? If n is large and the unslice fails, a large
zeroed array might not be ideal. Anything wrong with this that I'm not seeing? Well,
besides it being another new builtin.
@gopherbot

This comment has been minimized.

Copy link

commented May 17, 2011

Comment 7 by robpike:

This is a big deal because ary would not have static type.
@rsc

This comment has been minimized.

Copy link
Contributor

commented May 17, 2011

Comment 8:

If we were going to do it - which is far from even up for debate - I think
the syntax x.(*[10]int) works well (x has type []int).  You can't .(T) a slice
type right now, so it would not be overloading anything, and like an
interface type assertion it can fail or be checked at run time.
You can even think of it as []int containing a *[10]int the same way
an interface value holds a concrete type, and you're extracting the
underlying array pointer.
That said, I don't think this is important enough to worry about now.
There is enough else going on.
@rsc

This comment has been minimized.

Copy link
Contributor

commented Dec 9, 2011

Comment 9:

Labels changed: added priority-later.

@mdempsky

This comment has been minimized.

Copy link
Member

commented Jan 29, 2013

Comment 11:

Since this is something that's been bugging me lately too, I thought I'd add a few
random thoughts I had that don't seem to have been mentioned:
Allowing conversions from slices to array-pointers means pointers can now refer to
partially overlapping objects.  I don't believe that's currently possible in the
language.  Slices can already overlap though, so it's not a big change overall.
Like #8 says, []T is sort-of an interface type for *[N]T, so type assertions are
arguably suitable syntax.  Except that cap(x.(*[N]T)) might give a different value than
cap(x), which isn't true for other interfaces/implementor-type relations.  Seems like an
open question whether this inconsistency is worth accepting into the language, and
really since there's already a way to convert a *[N]T into a []T, just the ability to
turn a []T into a *[N]T is the relevant missing feature.
It would be nice if an expression like x[e1:e2] could actually have a static type of
*[e2-e1]T (assuming e1 and e2 are constant expressions), then you could write something
like *dst[16:24] = *src[136:144] and the compiler can verify that the array bounds match
up.  Unfortunately, the expression can't actually be x[e1:e2] since existing code might
rely on cap(x[e1:e2]) == cap(x)-e1, and that would be a backwards incompatible change. 
The x.[e1:e2] syntax suggested originally would solve this issue.
If you want a range like x.[n:n+4], instead of requiring the language to recognize this
pattern somehow, x[n:].[:4] is equivalent and has static indices at the expense of
clunkier notation.  A short-hand notation like x.[n:+4] might be nice to indicate that 4
is a length not an end position, but not strictly necessary and complicates the
language.  (Also +4 here is technically ambiguous here whether it's length 4 or end
position "+4", so again some new notation would be necessary.)
@gopherbot

This comment has been minimized.

Copy link

commented Jun 8, 2013

Comment 12 by peter.waller:

I just want to note that we have a use case for this at go-gl. More information:
go-gl/gl#111
The underlying OpenGL API only accepts the equivalent of, e.g, a *[4]float32, so it is
nice to have this in the type system on our side. OTOH, a consumer of this API might be
holding a []float32 they want to pass to us. So it would be great to find a solution to
this, as the current solutions are a bit messy or require the use of unsafe.
@rsc

This comment has been minimized.

Copy link
Contributor

commented Nov 27, 2013

Comment 13:

Labels changed: added go1.3maybe.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Dec 4, 2013

Comment 14:

Labels changed: added release-none, removed go1.3maybe.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Dec 4, 2013

Comment 15:

Labels changed: added repo-main.

@ncw

This comment has been minimized.

Copy link
Contributor

commented Jan 20, 2014

Comment 16:

An alternative which occurred to me was in a function which only indexes a slice with
constants values (eg the JPEG routines or unrolled FFTs which are what I'm working on)
and the slice doesn't change the compiler could bounds check the slice just once at the
start of the function with min(constants) and max(constants).
This would achieve the removal of the bounds checking without a language change.  It
wouldn't allow the compiler to do range checking at compile time though.
@gopherbot

This comment has been minimized.

Copy link

commented Mar 19, 2014

Comment 17 by matthieu.riou:

Another use case is to be able to use a small slice of known length as a map key. The Go
API uses slices heavily even though the same length is expected. Being able to get the
underlying array would allow usage as map keys.
Two examples I've run into recently are hashes (SHA-256) and IP addresses (as 16 byte
slices). It seems rather wasteful to have to copy them or transform them to strings to
have to use them as map keys.
@nerdatmath

This comment has been minimized.

Copy link
Contributor

commented Apr 11, 2015

FWIW, something similar to unslice() above can be implemented with reflect and unsafe. Despite being implemented in terms of unsafe, I believe unslice itself is safe. I don't know whether it violates any assumptions made by the GC, however.

http://play.golang.org/p/DixtgwxXUH

@daviddengcn

This comment has been minimized.

Copy link

commented Apr 11, 2015

Actually I believe the compiler could easily be smart enough to make a single range check for statement like this:

return a[0] + a[1] + a[2] + a[3]

@rsc rsc removed release-none labels Apr 14, 2015

@chalonga

This comment has been minimized.

Copy link

commented Apr 14, 2015

Is there a good way to do this currently?

For example if one function gives me a slice as output and I need to use that output in another function that wants a fixed array as input. What is the best way to coerce the slice into an array that is the current size of the slice and containing it's current members?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Apr 14, 2015

Currently there is no safe way to convert from a slice type to an array type (that is the point of this issue).

You can do it using unsafe by writing code like
([10]byte)unsafe.Pointer(&b[0])

@rsc rsc added the Go2 label Jun 20, 2017

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Nov 29, 2017

https://golang.org/cl/21008 demonstrates a different mechanism for early bounds checks: add _ = s[n] at the start of the function. If that indirection passes the compiler does not do further bounds checks on constant indexes.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Nov 29, 2017

Converting a slice to an array in order to use it as a map key still seems like a valid use case.

@rogpeppe

This comment has been minimized.

Copy link
Contributor Author

commented Jan 29, 2018

Another valid use case, I believe, is when some function expects a pointer to an array (for example, if I want to use an existing slice of key bytes to pass to secretbox.Open).

Currently it's not possible to do this without making a copy.

Converting a slice to an array in order to use it as a map key still seems like a valid use case.

I'm not convinced that converting a slice to an array in order to use it as a map key is that useful, because we'd need to make a copy anyway, so we can already do that with:

var a [N]T
copy(a[:], slice)

for presumably much the same runtime cost as the proposed a.([N]T), albeit with more syntactic expense.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Jan 29, 2018

I take the point of this proposal to be converting a slice to an array without making a copy, in effect reversing the way [:] can be used to convert an array to a slice. Either way you would wind up with two values: an array A, and a slice whose underlying array was A.

And so this would theoretically be useful for looking up an existing value in a map without requiring a copy.

@rogpeppe

This comment has been minimized.

Copy link
Contributor Author

commented Jan 29, 2018

I take the point of this proposal to be converting a slice to an array without making a copy

Not quite. The proposal is to be able to convert a slice to an array pointer without making a copy. I don't see how you could convert a slice to an array without copying, because then the array would need to refer to the contents of the slice and I don't think array values can work like that.

For example, how would it be possible to specify that in the expression:

a := slice.([N]T)

a now somehow is the same as the underlying backing array of the slice?
I don't think we can.

However, for looking up an existing value in a map without a copy, we could optimize the expression:

m[*slice.(*[N]T)]

so that it doesn't take a copy (much as I believe m[string(byteSlice)] avoids making a copy currently).

I suppose that we could then make slice.([N]T) syntactic sugar for *slice.(*[N]T), although I'm not sure how much it would be used.

@Elfansoer

This comment has been minimized.

Copy link

commented Mar 20, 2018

So, it's still discussed? I intuitively thought there is a method for this, getting the underlying array of a slice.

I mean, if by using slices one can modify the array, why can't you do it without? If passing a slice includes passing the array pointer, why one can't obtain that very pointer?

There's append and its friends, of course. Perhaps security problems may occur from using a single big array for slices with different purposes?

I use CGO to reuse C libraries; passing arrays back and forth as array pointers. I guess I have to directly use array rather than slices...

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Jun 5, 2018

I see two reasonable syntaxes here to convert from s[i:j] to *[4]int:

  1. s[i:j].(*[4]int) (also permits comma-ok form)
  2. (*[4]int)(s[i:j][:4])

The difference, of course, is that version 1 permits the type assertion to fail. In version 2 we only permit constant indexes, and require that the difference in indexes match the array length, so that the conversion can not fail.

The problem with case 1 is that if v is an interface type, then v.(*[4]int) starts to look slightly ambiguous, in that it might mean a type assertion of v to *[4]int or it might mean a type assertion of v to a slice of four integers that is then converted to an array pointer.

We also have to consider named array types. Is this OK?

type A [4]int
var s = (*A)[]int{1, 2, 3, 4} // Is an explicit slice expression required here?  Probably not.
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Jun 5, 2018

@griesemer and I prefer option 2 above. We require constant indexes in a slice expression, or a slice composite literal with a known number of elements.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Jun 6, 2018

I'm thrilled to see this getting attention again.

One issue with option 2 is that there's a long tail of ways to write code in which the slice is "obviously" big enough. For example:

x := s[i:j]
if len(x) >= 4 {
  a := (*[4]int)(x)
  // use a
}

(Also worth noting, to spare future readers from having to wonder why, just s[i:i+4] is not enough, because i+4 can overflow...at least unless Go 2 also brings arbitrary precision ints.)

Another way:

a := (*[4]int)(s[i:j][1:5:5])

Another:

x := s[i:j]
_ = x[3]
a := (*[4]int)(x)
// use a

And of course, clever compilers can prove much more. If we go with option 2, it'd be nice to have a somewhat principled understanding of where to draw the line.

The problem with case 1 is that if v is an interface type, then v.(*[4]int) starts to look slightly ambiguous, in that it might mean a type assertion of v to *[4]int or it might mean a type assertion of v to a slice of four integers that is then converted to an array pointer.

I have to say, this seems pretty unambiguous to me. Interfaces store a concrete type. Type assertions retrieve values of that concrete type. Thus the first reading is the correct one.

The big advantage, to my mind, of option 1, is the comma-ok form. Without comma-ok and with option 2 requiring a slice with constant indices, I worry we'll see a lot of code like:

x := s[i:j]
if len(x) >= 4 {
  a := (*[4]int)(x[:4])
  // use a
} else {
  // handle insufficient size
}

This is a lot more ceremony than:

if a, ok := s[i:j].(*[4]int); ok {
  // use a
} else {
  // handle insufficient size
}

Another way of putting this is: Option 2 blends a lot more with the language, by finding a clever mechanism to write array pointer extraction as a conversion by ensuring that that conversion cannot fail. But from the perspective of the programmer, the conversion can effectively fail, so it'd be nicer to be able to just express that directly.

Note that allowing explicit length checks to "count" for option 2 fixes this problem, but at the cost of (probably unacceptable) spec complexity.

Here's an option 3: Introduce the notion of a conversion that can fail. It works like type assertions: If used in any but a comma-ok context, it panics on failure.

a, ok := (*[4]int)(s[i:j])

Then you keep the clean conversion syntax, avoid the [:4] ceremony, avoid any perceived ambiguity with type assertions, and keep comma-ok support. Extend comma-ok support to all conversions; it just so happens that only this particular conversion fails. This might also turn out to be helpful in other places, e.g. if Go 2 has arbitrary precision integers: You might want int64(i) to fail if int i is out of range.

I'm not really sure which option is best, just want to stir the pot a bit.

We also have to consider named array types. Is this OK?

Why would it not be?

@griesemer

This comment has been minimized.

Copy link
Contributor

commented Jun 6, 2018

@josharian Regarding option 1: Given a variable x of interface type where x holds the concrete value &[]int{1, 2, 3, 4} (a pointer to a slice []int of length 4), currently the type assertion x.(*[]int) will succeed. As you say ("the first reading is the correct one"), x.(*[4]int) is a type assertion and thus succeeds if the type of the concrete value stored in x is *[4]int. It shouldn't succeed if there's a pointer to a slice of length 4 in there (otherwise, how would one be able to type switch between a *[4]int and a *[]int concrete value in x ?). So using the existing type assertion notation seems not workable at all. What am I missing?

@randall77

This comment has been minimized.

Copy link
Contributor

commented Jun 6, 2018

For option 1, we could define a.(T) to be a standard type assertion if a is statically an interface type, and one of these new casts only if a is statically a concrete type.

You'd have to do i.([]int).(*[4]int) if you had an interface with a []int in it, and wanted a *[4]int out. A direct i.(*[4]int) would not work.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Jun 6, 2018

Thanks, Keith. That was exactly what I had in mind.

@rogpeppe

This comment has been minimized.

Copy link
Contributor Author

commented Jun 6, 2018

The suggestions above from @randall77 and @josharian was what I had in mind in comment 3 above.
But on reflection, I quite like option 2 because it keeps .() specific to interfaces, and keeps the usual semantics for slicing; it's a little more verbose but I don't think this will be a particularly common operation.

However, requiring an expression of the form (*[n]T)(slice[n]) seems unusual from the p.o.v. of language design. I can't think of any other place where Go requires a particular form of expression as an operand. In Go expressions, you can always rewrite an statement of the form expression(x) to t := x; expression(t) but this suggestion would break that invariant, because I think it would be hard to write a rule that specifically allows:

y := x[0:4]
a := (*[4]int)(y)

but not:

i := 4
y := x[0:i]
a := (*[4]int)(y)

Another possibility is just to let the type conversion panic if the slice bounds are exceeded, and allow it to work with any compatible array type. That is, the type conversion operation implies the slice too. I know that no type conversions can currently panic, but this particular case seems to me like it might be intuitively OK, and to my mind it's probably better than requiring a slice expression as an argument.

I think I'd prefer that.

@btracey

This comment has been minimized.

Copy link
Contributor

commented Jun 6, 2018

@josharian @ianlancetaylor I don't understand why (*[4]int)(s[i:i+4]) is not okay, but (*[4]int)(s[i:j][:4]) is. It's true that i+4 can overflow, and so the index is out of bounds, but it's also true that (*[4]int)(s[:4]) can panic because the slice has less than length 4. Why is the panic "mid-conversion" okay in the later case but not the former? Both conversions work if and only if the indexes are in-bounds.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Jun 6, 2018

@btracey The issue is that currently conversion expressions never panic. I see your point about [i:i+4] but let's consider instead [i:j]. Permitting (*[4]int)(s[i:j]) would imply that conversion expressions can panic. So the simplest approach seems to be: we require constant indexes.

...or we could just say that conversion expressions are permitted to panic in this case. The general guideline for the comma-ok form is that it gives programs a way to test a condition that can not be tested in some other way. But in this case it's easy for a program to test whether the conversion will succeed, so there is no real need for a comma-ok form, so arguably it's not so bad to just panic if the length is wrong.

Along those lines, @josharian, I have to say that despite what I just wrote I am not really worried about pushing the bounds test into the surrounding code as in

    if len(x) >= 4 { /* convert */ } else { /* oh no */ }

I think that in practice anybody who wants to use this conversion is going to know ahead of time that their slice is the right size.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Jun 6, 2018

...or we could just say that conversion expressions are permitted to panic in this case.

It is worth noting that some tools would need updating if we do this. For example, the bool vet check assumes that conversions have no side-effects. There are similar assumptions in the compiler. (On the other hand, implementing this in the compiler seems very straightforward, which is a positive sign.)

That said, this is currently my favorite of the options.

@ainar-g

This comment has been minimized.

Copy link
Contributor

commented Jun 14, 2018

Whatever happened to the originally proposed form?

arrayPointer, ok := slice.[N]
  • It doesn't interfere with .(T), leaving it for interfaces only.
  • It doesn't create a world where a T(v) conversion may panic.
  • It doesn't repeat the type unnecessarily.

Yes, it's a new syntax, but it has a new semantic behind it as well. T(v) panicking sounds like bad news.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Jun 15, 2018

I think it's hard to justify introducing new syntax for this quite unusual case.

@martisch

This comment has been minimized.

Copy link
Member

commented May 27, 2019

Here is a new Idea for syntax of converting to an array slice[1..2] is the underlying array value with 2 elements. Wanting a pointer one can use &slice[1..2]. If the slicing does not work it throws a panic like slice[1:2] does.

@josharian

This comment has been minimized.

Copy link
Contributor

commented May 27, 2019

...or we could just say that conversion expressions are permitted to panic in this case.
[...]
But in this case it's easy for a program to test whether the conversion will succeed, so there is no real need for a comma-ok form...

Another option is to require a comma-ok when converting a slice to an array pointer. Tools may still need updating to understand the notion of a comma-ok conversion, but those failures should be more obvious, unlike quietly removing the invariant that conversions cannot panic.

@josharian

This comment has been minimized.

Copy link
Contributor

commented May 27, 2019

And yet another option that I don't think we've explicitly discussed yet is to add API somewhere to do this, perhaps package reflect. After all, you can do this with unsafe now. Here's a very rough cut to start discussion:

package reflect

// ArrayPtr returns a Value containing a pointer to an array with length n and the same element type as v.
// The returned pointer points to the first element of v.
// ArrayPtr panics if v is not of type Slice, if n <= 0, or if len(v) < n.
// TODO: allow n == 0 and point to runtime.zerobase?
func (v Value) ArrayPtr(n int) Value

Sample usage:

   s := make([]byte, 4)
   p := reflect.ValueOf(s).ArrayPointer(4).Interface().(*[4]byte)
@rogpeppe

This comment has been minimized.

Copy link
Contributor Author

commented May 28, 2019

I don't think I'd support adding this just to the reflect package - this capability is likely to be most useful in very low level code, and reflect seems inappropriately heavyweight to me.

I guess it's just about possible that the compiler could inline the above statement into a no-op, but it seems a big ask, and it would be nice to be able to use the construct in places where an import of reflect is not allowed, or with less sophisticated compilers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.