New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

language: Go 2: allow setting slice cap < len, prohibiting writes #25725

Closed
dantoye opened this Issue Jun 4, 2018 · 18 comments

Comments

Projects
None yet
7 participants
@dantoye

dantoye commented Jun 4, 2018

I'm not sure why there is no way to create a slice's from another, with a capacity lower than it's length.

It has a few uses. For example, I could pass in a copy of my slice set to 0 capacity to benefit from copy-on-write semantics. The length would still appear to have N items, but trying to write to any of them would trigger the slice to re-grow. This would allow for immutable, lightweight slices in general.

I'm not certain if it would break any code, I've seen almost noone calling "cap" in my experience.

@rogpeppe

This comment has been minimized.

Contributor

rogpeppe commented Jun 4, 2018

You can do this already. See "Full slice expressions" in https://golang.org/ref/spec#Slice_expressions.

@rogpeppe rogpeppe closed this Jun 4, 2018

@dantoye

This comment has been minimized.

dantoye commented Jun 4, 2018

@rogpeppe a bit quick on the trigger there!

"The indices are in range if 0 <= low <= high <= max <= cap(a)"

I'm discussing relaxing the last constraint, and allowing cap to be smaller than the max/length. Then normal semantics kick in, where the slice will grow if you append outside it's length. However, the only new behavior is that writing to an index that is below the cap is meaningless and won't actually do anything, or could even panic. If a function tried to write to index 0, when my cap was 0, it should panic.

It's a method of creating simple, effective readonly slices.

Here's an example where a function will, once in a blue moon, edit one of the underlying array items. I don't want to have to pass in a copy, because it would most likely be wasting memory/cpu to make a copy each time. However, I do want to ensure my slice is readonly. Right now I have no choices. However, it intuitively makes sense that if I create a new slice with 0 capacity, then no one can shove anything new into my slice. Anything people try to do to my slice would be over it's capacity, and so internally would need to grow. The length would still be 10, so they can still read all my items, just can't write to them.

https://play.golang.org/p/xf8F6_-4usD

@cespare

This comment has been minimized.

Contributor

cespare commented Jun 4, 2018

It doesn't seem intuitive to me to allow the slice's capacity to be smaller than its length. It especially doesn't seem intuitive that s[i] could panic when i < len(s).

There are other discussions around immutability that seem more promising to me. For example, #22876 and #20443 seem relevant, and there are other proposals and mailing list discussions.

@dantoye

This comment has been minimized.

dantoye commented Jun 4, 2018

Why not, @cespare ?

All the other proposals I've seen include adding new keywords and changing types entirely.

Also, I believe it makes perfect sense for the capacity to be 0, how else would you describe an item that cannot hold anything? Slices with zero capacity but a positive length are simply saying "I have something, but I can't hold anything", which is precisely the definition of readonly.

To me it's a natural extension.

@randall77

This comment has been minimized.

Contributor

randall77 commented Jun 5, 2018

I agree with @cespare , cap < len is weird.
Bounds checks are now different for a[i] = x and x = a[i] (the former also has an i < cap(a) check).

What does slicing even mean now?
x = a[:i] used to always succeed if i<cap(a), and sets len(a)=i. Now what happens? Can this still set the length if i>cap(a) but i < len(a)?

Plus all the readonly problems, like ranging over a slice of large objects without copying them:

for i := 0; i < len(a); i++ {
    p := &a[i]  //oops! if cap(a)<len(a)
    ... do something with p ...
}

Or needing multiple versions of library functions like bytes.Index, one to search only up to cap(a) because the caller wants to do a modification, and one which searches up to len(a) because the caller is only reading.

@dantoye

This comment has been minimized.

dantoye commented Jun 5, 2018

@randall77

I appreciate your feedback, but I'm not sure you understand the proposal.

Slicing would be the same as it is now - the cap would work as it does now, where the new slice will have zero capacity and len(a)=1, as you put it.

Your loop is a good example, as getting a reference to a specific element is a unique bit of syntactic sugar that would have to be modified to check for cap, not just len. It is the compilers job, in that situation, to determine if your code can safely use a reference. Alternatively, try take a reference, and when the caller passes you a cap<len slice it will panic, and the caller will now know it needs write-access to your slice. Then you might get curious why it wants write-access, and investigate, in case you misunderstood what the function was going to do.

I'm not sure where the multiple versions comment comes from. bytes.Index would remain unchanged, like most functions that only use "len". The only functions that would "change" are functions that edit a slice in-place, and even then, the change would be to panic only if passed a cap<len slice. It's 100% backwards compatible.

@randall77

This comment has been minimized.

Contributor

randall77 commented Jun 5, 2018

Your loop is a good example, as getting a reference to a specific element is a unique bit of syntactic sugar that would have to be modified to check for cap, not just len. It is the compilers job, in that situation, to determine if your code can safely use a reference. Alternatively, try take a reference, and when the caller passes you a cap<len slice it will panic, and the caller will now know it needs write-access to your slice. Then you might get curious why it wants write-access, and investigate, in case you misunderstood what the function was going to do.

But what if the ... do something with p... is only reading from p? Somehow you would now want two different &a[i] operations, one of which gives you back a writeable pointer (and panics if i>=len(a) || i >= cap(a)) and one of which gives you back a read-only pointer (and panics if i>=len(a)).
It would be unfortunate if you had to give write permission to a function which only reads a slice, solely to allow it to construct pointers to the elements. As a result, I don't think the cap<len solution can work in isolation; a more holistic read-only data structure solution is required.

For library functions, I think you need multiple versions because the "length" of the operation now depends on what you're going to do with the result. For example:

    a[bytes.IndexByte('a')] = 'b' // replace an 'a' with a 'b' 
    a[bytes.IndexByte('a') + 1] // return the character after 'a'

So does IndexByte stop searching at len(a), or min(len(a),cap(a))?
(This is maybe not the best example, I will try and think of a better one.)

@dantoye

This comment has been minimized.

dantoye commented Jun 5, 2018

@randall77 I think maybe what's not clear is that this does not change the default behavior of anything. There is no "give write permissions", there is only removing write permissions.

If a function wants a reference to a specific index, Go would simply add i<cap to the bounds check. Only if I explicitly pass a cap<len slice would it be any different than it usually is. I would only pass in a cap<len slice because, now or in the future, I'm not anticipating this function changing my slice elements in any way. There's no read/writeable pointer differentiation, it simply would not allow you to create a reference to an element outside of a slice's cap, because that value does not exist. In can not be referenced, nor dereferenced, only read. This makes complete sense when you remember that you're looking at an element the slice does not have the capacity to be holding.

Also, for all those situations (and all existing code), you would use len as normal. Using slices never changes in any scenario. The only real change is that you can use a cap<len for slice creation, and if you choose to do so, some functions may reveal themselves as not-just-reading from your slices.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jun 5, 2018

@dantoye What do you mean when you say "trying to write to any of them would trigger the slice to re-grow"?

@dantoye

This comment has been minimized.

dantoye commented Jun 5, 2018

@ianlancetaylor By "write to", I did mean append to. Apologies.

Writing to an array would be changed in one way: perform a cap check first, then a len check. In most situations it would have equal performance, but would have an extra check if writing to an index between len and cap. If the cap check fails, but the len check passes, a new panic is raised: "runtime error: index not within slice capacity".

For appending, however, the standard mechanics apply, though I'm sure some changes must be made if the append logic assumes the len<cap invariant.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jun 5, 2018

It seems to me to be fairly unfortunate to have to do an extra memory load and comparison for every write to a slice, but I agree that this seems to work. I'm going to turn into a language change proposal. Fair warning, though: I do not think it will be accepted.

@ianlancetaylor ianlancetaylor reopened this Jun 5, 2018

@ianlancetaylor ianlancetaylor changed the title from Allow setting slice capacity to language: Go 2: allow setting slice cap < len, prohibiting writes Jun 5, 2018

@ianlancetaylor ianlancetaylor added this to the Proposal milestone Jun 5, 2018

@stemar94

This comment has been minimized.

stemar94 commented Jun 5, 2018

This would allow for immutable, lightweight slices in general.

All the other proposals I've seen include adding new keywords and changing types entirely.

Having a clear way of expressing read-only/immutability - be it via keywords and types - helps developers and compilers/tooling.

@dantoye

This comment has been minimized.

dantoye commented Jun 5, 2018

@ianlancetaylor we don't have to do an extra load often. I'm not too certain about the growth factor for slices, but an average chance of needing the extra load could be calculated fairly precisely.

It would add the extra load/cmp for cases where len < idx < cap, which would be equal to half of the growth factor, which is a function of cap. Probably something like log(len)/2 more load/cmp's on average, not a linear growth.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Jun 5, 2018

@dantoye I'm probably missing something, but it seems to me that for a write to a slice we must always check against both cap and len. If we skip the cap check, we will mutate an immutable slice. If we skip the len check, we will do the wrong thing when cap > len. In the normal case both comparisons pass, so I don't see how we can skip either one.

@dantoye

This comment has been minimized.

dantoye commented Jun 6, 2018

@ianlancetaylor true, now that I think about it.

However, maybe it could be determined at slice creation time. Internally setting a hidden "max" property to max(len,cap) and use that for bounds checking?

Implementation aside, I'm more curious of your thoughts about it's use. As it's just a slice of any type, it automatically can be used for readonly values of anything.

@josharian

This comment has been minimized.

Contributor

josharian commented Jun 6, 2018

Internally setting a hidden "max" property to max(len,cap) and use that for bounds checking?

Increasing the size of a slice from 3 words to 4 would probably have more far-reaching consequences (performance and otherwise) than adding the extra bounds check, since the compiler could probably eliminate a bunch of them. Although I suspect it'd still be expensive enough to be quite noticeable.

Implementation aside, I'm more curious of your thoughts about it's use. As it's just a slice of any type, it automatically can be used for readonly values of anything.

My 2c, FWIW: It is an interesting idea. But (a) I find it rather unintuitive and (b) like many of the readonly proposals, it enforces one particular kind of readonly-ness. Note that in a readonly slice []*T, you can't change which T a particular element points to, but you can modify the T that is pointed to. Relatedly, some of the blog posts linked to in #22876 are useful reading. If Go 2 is going to include some kind of immutability, I'd personally rather see it be more deeply integrated, rather than a clever but awkward trick that applies only to slices.

@dantoye

This comment has been minimized.

dantoye commented Jun 6, 2018

@josharian Fair points.

It seems there's a lot more to cap<len than meets the eye, as well as readability in general.

Your []*T example, and all the points above, show that for all intents and purposes it could function well as a backwards-compatible shim for readonly semantics. However, as with all shims, it would not serve well as a permanent fixture.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Sep 25, 2018

We aren't going to make this change. It will make all slice writes more costly. It's not a general answer to issues about immutability, as it only applies to slices. It's a special case with relatively few applications, and we don't need another special case.

(By the way, I'm not really clear on what happens when an index outside of the cap is written. The initial message suggests that the slice would "re-grow", but where would the newly grown slice be stored? Later messages suggest a panic on write, which would be easier to implement.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment