-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bytes: appending to a single slice from Split output can affect other slices of the output #21149
Comments
That'd make the result "append-safe", but at the cost of worse performance for code that does the same while aware of/insensitive to the behind-the-scenes append semantics. Modifying the backing array through a view of it must be always well though out when that backing array is shared. It is an easy trap to fall for in the beginning, admitted, but hopefully the first lesson is enough to learn the proper technique. |
Yes, it is true it would result in worse performance in the cases you mention. However, in this case I would say it's a decent trade of. Actually using the current behaviour consciously, seems almost impossible to me, especially because a new slice will be allocated anyway if too much is appended. Cases where it's insensitive are definitely possible, when only using one of the slices for instance. However, these are very vulnerable to the same issue when in future iterations other slices are also used. Furthermore, it makes sense for regular slice views to have the full capacity for performance reasons, because there's no reason to believe that there will be other living slices to the same buffer (expecting not to reuse the original slice). For If eventually it is decided that the current behaviour is the desired behaviour (which I hope won't be the case). I would still consider this a documentation bug in the
|
IMO this works as expected, if you want copy of slice, then copy it explicitly (or via setting capacity as second example) |
Wrt: 'NeedsDecision'. The proposed "fix" is not Go1 guarantee compatible and the current behavior is not a bug.
Nor it says it can lead to nuking an allied country. It's not possible to exhaustively document what will not happen. The behavior is common to all slices unless explicitly treated as in s[:len(a):len(s)]. IMO definitely working as intended. It's a programmers mistake to mutate the shared backing array. |
@djadala, the proposed solution doesn't make copies it still make slices. The only difference is that it would make them safe to append to. @cznic From what I understand it's compatible with the Go1 guarantee as this is an implementation detail:
Otherwise it also wouldn't be possible for go to change the algorithm with which the growth factor of new slices is determined.
With |
Have you any examples of such code in mind? Did you benchmark the append-safe version against the append-unsafe one?
Setting the capacities inside |
Implementation detail of what? Your quote mentions UB and unsafe.Pointer, neither seems relevant here. What I'm talking about is, I believe, specified behavior, particularly
The proposal can break existing Go programs and above I tried to explain why I think this does fall under the compatibility guarantee.
I don't. Of course that does not mean no such code exists in the wild.
No need to benchmark. Appending one byte to a byte slice is O(1) if len(s) < cap(s), O(n) otherwise. The propsal can change the class of the append operation on the result element, and n can be arbitrary large. |
I meant that it is unspecified behaviour of
I think this reasoning is flawed. Even without the three slicing operation the stdlib could do this, by modifying the capacity in the slice header using unsafe.
What programs would it break? Ones doing this on purpose? I cannot imagine what these would be. These theoretical programs would be super error prone anyway because depending on how much you append you get different behaviour. If you append too much it doesn't overwrite the other the following slice: https://play.golang.org/p/EoSShvpBgr
This is true, but I believe this is a fair tradeof. Because fast but wrong behaviour is worse than slow but correct behaviour. |
The compatibility guarantee is about correct, valid programs continue to do what they did before unless - Anyway, code like for _, v := range bytes.Split(s, []byte{foo}) {
if condition {
return append(v, bar...)
}
} seems perfectly normal to me.
I concur, but making all correct code, like above, slower to bypass a bug in someone else's code is IMO not reasonable. |
I think you are right about this. (although maybe you could put it as a security issue, because it could cause buffer overflows in user buffers) However, the code you posted would not be broken. It would still return the same thing, it would just be a bit slower. Performance is not part of the of the Go1 guarantee:
I think this final statement is a matter of opinion, and I do not agree with yours in this case. I would love to hear some other opinions. Finally if it's not deemed possible or desired to change the
Maybe a new function could even be added where the output is |
It's not specific to the results of Split/SplitN. In the general case, every append statement in a program can cause a change in some other slice viewing the underlying backing array of append's first argument. Documenting it wherever a slice appears in the stdlib is probably a non-starter. Append has side effects. Side effect can be welcome when intended and evil when unconsidered. I'm afraid the only help is to know and understand the side effects well. For sure, I too was bitten by append's semantics. I guess more than once ;-) |
This is misleading, at best. Yes, Join() (and other generic functions) can't assume that the slice views it gets are non-overlapping and/or alterable. However Split/SplitN/Fields create all of the sliced views, so why not control them so altering doesn't break? Realistically the standard recommendation will be to create code like Join() does where you never alter the data returned by Split(), because it can't be trusted ... so I don't see how fixing this is a performance regression.
Nobody is asking for that. The only other ones I can find are csv/suffixarray and both are safe to alter. |
I think it would be very useful to see examples of real programs where the current behavior of I'm somewhat sympathetic to the change; we might have designed |
@ianlancetaylor The case where I ran into it went something like this:
|
Yep, it should be |
@cznic There's like a 10 different solutions. That wasn't really the point. The point was to show what kind of bug this caused for me. |
That was just a "for the record" for future readers. BTW: Wrt appending to a slice which backing array must not be mutated. I know of only one reasonable solution. Can you please share some others? |
Ok, 10 was a bit exagarated but:
|
This is quite hard to search for, you need usage of bytes.Split() followed by using append() on the return value ... then it needs to be live (or you find the commit of when it was fixed). I found: https://stackoverflow.com/questions/37281626/strange-behavior-when-appending-to-a-2d-slice This looks bad, but might be currently safe: This fails, sometimes, if line 34 is removed (is that an accident or is it known to have a dual purpose): This works because they do it on the last element: https://github.com/kyoh86/richgo/blob/b5351b59507f266319400d92db9ed195c6aec9b3/editor/editor.go#L42
This is a subset of the first, where it's an intentional use of the feature. For what it's worth I couldn't find anything, although lots of code doesn't hit the bug by converting to strings, using an empty slice as an initial result or calling something like bytes.Replace() ... all of which hint that very few people are micro-optimizing in this way. |
@james-antill Thanks very much for doing this research. |
@james-antill good finds! Do I understand correctly that you also find commits that fix bugs related to this, because those might still be relevant. Since even though they're fixed now people still had bugs because of the current behaviour. |
No, I tried to search for commits that fixed things ... but had a hard time even getting just commits in the results from google. codesearch doesn't do it AFAIK. github makes that easy but the language selector doesn't work for commits and it has the terrible feature of showing you the same commit for each fork/branch ... which doesn't help when golang itself has 4k forks. |
May I chime in with a silly suggestion. The bug report makes a valid point. It is the nature of Yet the counter arguments are equally valid. Performance has always been a primary goal in Go, and so has the compatibility promise. So as a compromise, how about introducing a new set of functions, |
@christophberger this solution would be fine by me as well. I also like it more than just putting the current behaviour in the documentation, since now the documentation could also point to an easy to use solution. @everyone This is my first issue at the Go repo, so what is the correct way to preceed now? It doesn't seem there are any other arguments to be made. So I think a decision should be made between the 4 choices (I put emojis behind them so it's possible to vote on them):
|
OK, let's try this and roll it back if people start reporting ways that their code depends on the old behavior that we didn't expect. |
CL welcome but note that we're up against the Go 1.10 freeze deadline. |
I'm up for providing a CL, but I don't think it will be in time for the 1.10 freeze. |
Change https://golang.org/cl/74530 mentions this issue: |
Actually, fixing it was quite simple so I created a CL and included some
tests. It would be really nice if it could be included for 1.10.
…On Tue, Oct 31, 2017, 00:10 GopherBot ***@***.***> wrote:
Change https://golang.org/cl/74530 mentions this issue: bytes: return
slices where capacities don't overlap
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#21149 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABG8Jn4qne5bToHkiErv-l9x8WYMOUN6ks5sxldhgaJpZM4OhsFE>
.
|
Change https://golang.org/cl/74510 mentions this issue: |
What version of Go are you using (
go version
)?Go 1.8.3
What operating system and processor architecture are you using (
go env
)?Linux
What did you do?
https://play.golang.org/p/_6HotFbp1V
What did you expect to see?
To see that variable
b
remained unchanged and was still a slice containing theb
character.What did you see instead?
The slice in variable
b
has changed to contain the charactera
.Proposed solution
A core solution to this problem would be setting the
capacity
on the slices coming out ofbytes.Split
, such that the resulting capacites:https://play.golang.org/p/VonjdLUymU
This was proposed by https://www.reddit.com/r/golang/comments/6p88m9/golang_slices_gotcha/dknvsxp/
The text was updated successfully, but these errors were encountered: