bytes: make Buffer.String avoid copy if possible #18990
This package doesn't sound magical enough to require compiler/runtime help,
and it would be generally useful.
Perhaps add it to the strings package instead?
one possible implementation:
// invariant: with all grow methods, we must make sure len(bytes) < cap(bytes)
// When String is called, we freeze the byte slice by reslicing it so that len == cap
// and then just return the String itself as a string header using unsafe (this relies
// on the fact that the first two fields in a slice header are the same as a string header).
type String struct {
	bytes []byte
}
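To make the idea above concrete, here is a minimal sketch of such a type. It assumes the slice-header/string-header layout trick described in the comment; the `WriteString` method and the `main` function are illustrative additions, not part of the proposal, and the sketch omits the len/cap bookkeeping.

```go
package main

import (
	"fmt"
	"unsafe"
)

// String is a hypothetical append-only builder, named after the type in the
// comment above. It is a sketch, not the implementation that was adopted.
type String struct {
	bytes []byte
}

// WriteString appends p to the internal buffer (illustrative helper).
func (s *String) WriteString(p string) (int, error) {
	s.bytes = append(s.bytes, p...)
	return len(p), nil
}

// String reinterprets the slice header as a string header without copying.
// This relies on the first two words of both headers (data pointer, length)
// having the same layout, which is an implementation detail.
func (s *String) String() string {
	return *(*string)(unsafe.Pointer(&s.bytes))
}

func main() {
	var sb String
	sb.WriteString("A7")
	sb.WriteString("ABC")
	fmt.Println(sb.String())
}
```

The returned string shares memory with the buffer, so the type must never expose or overwrite bytes it has already handed out.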
|
We can even make the compiler do the transform for us (like Java's
StringBuilder).
e.g.
transform all variable string concatenations like:
s := fmt.Sprintf("A%d", n) + "ABC"
into:
var sb builder.String
fmt.Fprintf(&sb, "A%d", n)
sb.WriteString("ABC")
s := sb.String()
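The `builder.String` type above is hypothetical. As a rough illustration of the hand-written form of that transform, the sketch below uses `strings.Builder`, the type that eventually shipped in Go 1.10 as a result of this discussion; the function names `concat` and `built` are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// concat builds the string with ordinary concatenation.
func concat(n int) string {
	return fmt.Sprintf("A%d", n) + "ABC"
}

// built produces the same string by writing every piece into one builder,
// which is the shape the proposed compiler transform would generate.
func built(n int) string {
	var sb strings.Builder
	fmt.Fprintf(&sb, "A%d", n) // *strings.Builder implements io.Writer
	sb.WriteString("ABC")
	return sb.String()
}

func main() {
	fmt.Println(concat(7) == built(7))
}
```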
|
+1 on this as I'm actually using something quite similar already. |
I realized that we don't need to maintain any invariant: the underlying
[]byte is append-only, so we can convert that to a string at any time
without worrying whether the later updates could modify the string's
underlying buffer.
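A small sketch of why append-only growth is enough, assuming the same unsafe header conversion discussed earlier: `append` either writes past the current length or moves to a new array, so bytes already exposed through a string are never overwritten. The function name `snapshotThenAppend` is invented for this example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// snapshotThenAppend takes a zero-copy string view of a buffer, appends more
// data within the existing capacity, and returns both views.
func snapshotThenAppend() (snapshot, final string) {
	buf := make([]byte, 0, 16)
	buf = append(buf, "hello"...)

	// Zero-copy view of the first 5 bytes.
	snapshot = *(*string)(unsafe.Pointer(&buf))

	// Fits in the spare capacity, so the same backing array is reused,
	// but only indexes 5 and up are written.
	buf = append(buf, ", world"...)
	final = *(*string)(unsafe.Pointer(&buf))
	return snapshot, final
}

func main() {
	s, f := snapshotThenAppend()
	fmt.Printf("%q %q\n", s, f)
}
```

This only holds while nothing truncates, resets, or otherwise rewrites the already-written prefix.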
|
bytes.Buffer already does no allocations up to 64 bytes, but then to make a string of course there is one allocation. It could be smarter. I would prefer we find a way not to add more ways to create strings. There are several already. |
I would love a publicly available API for this rather than an internal package |
Java has had this same problem since its start: There is no way for any caller outside the It would be nice to have a method on As the original description here notes,
A method like |
What's the reason and effect of Edit: The effect is that it would prevent code outside of GOROOT to import the package. |
I have a number of questions and concerns.
Doing this right in the compiler seems tractable, it avoids additional API, it avoids internal-only API, and it avoids forcing programmers to make unsafe decisions. It seems better on every axis than introducing a new package. But maybe there is some consideration I am missing. |
@rsc,
See #6714 and the bugs referencing it. The justification is to reduce allocations and thus make the GC run less often and thus reduce CPU usage.
Only because I figured it would be less controversial to start internal and see if we like the API first. We could export it later.
No, the API would be designed such that it's impossible to use incorrectly or unsafely. It would only use unsafe itself, like reflect does, even though reflect is a safe part of the language. I tried to convey that in the top post.
Significantly? In any case, that is #6714, filed in 2013. We can keep waiting, of course. |
Yes, but
If we did that, we'd need all the safety bookkeeping in that type anyway, so at that point I'd rather just make But only if we decide that |
The Determining there are no live aliases to a A |
Sorry, @bradfitz, I missed that the []byte readers were omitted. I agree that the API is safe, and I agree that that's preferred over unsafe methods on bytes.Buffer. That resolves a big concern I had.
There's still the question of what the benefit would be. What important benchmarks would get faster, and by how much? I remember that there have been past conversations, but #6714 in particular is pretty light on detail, and the other optimizations in the system have changed since then anyway.
@crawshaw, I'm not sure that's quite true. If you call func f(io.Writer) with f(w), and we know from escape analysis that f does not retain a pointer to w, and we also know from looking at w's methods that none of them expose w.buf (or even know that f doesn't call the ones that do), then we know that there are no new aliases to w.buf due to the call to f. The compiler could plausibly put that together today for fmt.Fprintf(&buf, ...) where buf is a bytes.Buffer.
I believe I understand this problem much better than I did six months ago, certainly much better than in 2013 when Brad filed #6714 originally. But I have other things on my plate ahead of this. |
Related: #18822 (re: immutability and live references to []byte) |
After discussion with @golang/proposal-review, it sounds like maybe we can make bytes.Buffer's String method avoid the copy (provided there has been no call to Bytes) but remember that it gave out a string and make its own copy on any future mutation. Let's start with that "String is just more efficient" and go from there. |
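One way to picture the "remember that it gave out a string" idea is the sketch below. It is an assumption-laden illustration, not how bytes.Buffer works: the type `cowBuffer`, its `shared` flag, and its reduced method set are all hypothetical, and there is deliberately no `Bytes` method.

```go
package main

import (
	"fmt"
	"unsafe"
)

// cowBuffer is a hypothetical buffer whose String method avoids the copy and
// which abandons its backing array before any operation that could overwrite
// bytes already handed out.
type cowBuffer struct {
	buf    []byte
	shared bool // true once String has exposed buf's memory as a string
}

// WriteString only appends, so it never touches bytes a string already sees.
func (b *cowBuffer) WriteString(s string) {
	b.buf = append(b.buf, s...)
}

// String returns a zero-copy view and records that the memory is now shared.
func (b *cowBuffer) String() string {
	b.shared = true
	return *(*string)(unsafe.Pointer(&b.buf))
}

// Reset would let later writes start again at index 0. If the array is
// shared with a string, drop it instead of reusing it.
func (b *cowBuffer) Reset() {
	if b.shared {
		b.buf = nil
		b.shared = false
		return
	}
	b.buf = b.buf[:0]
}

func main() {
	var b cowBuffer
	b.WriteString("hello")
	s := b.String()
	b.Reset()
	b.WriteString("WORLD")
	fmt.Println(s, b.String())
}
```

The cost of the bookkeeping is one flag check on the mutating paths; the saving is the copy in `String`.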
@bradfitz et al., mind if I take this? |
@cespare, I already have the start of a CL. I've been waiting for years for this, so I'd like to give it a shot. :-) |
CL https://golang.org/cl/37767 mentions this issue. |
Mirroring my comments from https://go-review.googlesource.com/c/37767/:
It turns out that the |
Another reason to keep the API identical (but more restricted) is to allow us to use it in the compiler and other programs that must run with old Go versions, by doing something like:
// +build !go1.10
type Buffer struct {
	bytes.Buffer
}
// +build go1.10
type Buffer struct {
	strings.Buffer
}
|
@bradfitz is it OK to send a CL for the new API? We're already using our own Builder implementation that is pretty much what @rsc described in #18990 (comment) so it's essentially ready to go. |
Change https://golang.org/cl/74931 mentions this issue: |
Thanks; I sent https://golang.org/cl/74931. I have a few questions about the API which we can discuss on the CL. |
Change https://golang.org/cl/83255 mentions this issue: |
…adFrom
The Builder's ReadFrom method allows the underlying unsafe slice to escape, and for callers to subsequently modify memory that had been unsafely converted into an immutable string.
In the original proposal for Builder (#18990), I'd noted there should be no Read methods:
> There would be no Reset or Bytes or Truncate or Read methods.
> Nothing that could mutate the []byte once it was unsafely converted
> to a string.
And in my prototype (https://golang.org/cl/37767), I handled ReadFrom properly, but when https://golang.org/cl/74931 arrived, I missed that it had a ReadFrom method and approved it.
Because we're so close to the Go 1.10 release, just remove the ReadFrom method rather than think about possible fixes. It has marginal utility in a Builder anyway.
Also, fix a separate bug that also allowed mutation of a slice's backing array after it had been converted into a slice by disallowing copies of the Builder by value.
Updates #18990
Fixes #23083
Fixes #23084
Change-Id: Id1f860f8a4f5f88b32213cf85108ebc609acb95f
Reviewed-on: https://go-review.googlesource.com/83255
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
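The copy-by-value problem mentioned in this commit arises because two copies share one backing array: a write through one copy can land in spare capacity that the other copy has already exposed as a string. A rough sketch of a self-pointer check that rejects such copies follows; the type `builder`, its fields, and the helper `writeToCopy` are hypothetical names for illustration, and the sketch ignores the escape-analysis concerns a real implementation would have.

```go
package main

import "fmt"

// builder is a hypothetical append-only builder that refuses to be used
// after being copied by value.
type builder struct {
	addr *builder // set to the receiver's own address on first use
	buf  []byte
}

// copyCheck panics if this value is a copy of a builder that was already used.
func (b *builder) copyCheck() {
	if b.addr == nil {
		b.addr = b
	} else if b.addr != b {
		panic("illegal use of non-zero builder copied by value")
	}
}

func (b *builder) WriteString(s string) {
	b.copyCheck()
	b.buf = append(b.buf, s...)
}

// writeToCopy reports whether writing to a by-value copy panics.
func writeToCopy() (panicked bool) {
	defer func() { panicked = recover() != nil }()
	var a builder
	a.WriteString("x")
	c := a // c.addr still points at a, so the check below fires
	c.WriteString("y")
	return false
}

func main() {
	fmt.Println("copy rejected:", writeToCopy())
}
```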
The String method at 3058d38#diff-4c939e2b8a761c673f94b9505cdc2d08L22 relies on the assumption that the layout of the first two struct fields of the string header and slice header are the same. At what point does the warning from the reflect package about "may change in a later release" become untenable? |
The standard lib can rely on implementation details that other packages can't, since it is distributed with that implementation. |
Yup.
If the layout changes, unit tests will fail and we'll make the String method a bit uglier (but likely still a single line). I've always doubted we'd be able to ever change StringHeader or SliceHeader, though. |
Are there benchmarks available for I'm not finding any as part of CL 74931. I figure there's a good chance there are some existing benchmarks, perhaps somewhere else. I'm implementing |
…chitecture. Can't use package unsafe, so use a simple type conversion from []byte to string in Builder.String method. Further optimizations are deferred for later (most likely not needed because Go->wasm will be ready by the time they're needed). Related to golang/go#18990.
Not that I'm aware of. |
Change https://golang.org/cl/96980 mentions this issue: |
For posterity, the CL mentioned above adds some nice benchmarks. I also found from https://talks.godoc.org/github.com/campoy/gotalks/go1.10/talk.slide#29 that @campoy made some benchmarks for that talk at |
…ilder
Despite the existing test that locks in the allocation behavior, people really want a benchmark. So:
BenchmarkBuildString_Builder/1Write_NoGrow-4       20000000    60.4 ns/op     48 B/op    1 allocs/op
BenchmarkBuildString_Builder/3Write_NoGrow-4       10000000    230 ns/op     336 B/op    3 allocs/op
BenchmarkBuildString_Builder/3Write_Grow-4         20000000    102 ns/op     112 B/op    1 allocs/op
BenchmarkBuildString_ByteBuffer/1Write_NoGrow-4    10000000    125 ns/op     160 B/op    2 allocs/op
BenchmarkBuildString_ByteBuffer/3Write_NoGrow-4     5000000    339 ns/op     400 B/op    3 allocs/op
BenchmarkBuildString_ByteBuffer/3Write_Grow-4       5000000    316 ns/op     336 B/op    3 allocs/op
I don't think these allocate-as-fast-as-you-can benchmarks are very interesting because they're effectively just GC benchmarks, but sure. If one wants to see that there's 1 fewer allocation, there it is. The ns/op and B/op numbers will change as the built string size changes.
Updates #18990
Change-Id: Ifccf535bd396217434a0e6989e195105f90132ae
Reviewed-on: https://go-review.googlesource.com/96980
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
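For readers who want to check the allocation difference themselves, here is a small sketch that counts allocations with `testing.AllocsPerRun` instead of running a full benchmark. It is not the benchmark from the CL: the `chunk` constant, the `sink` variable, and the function names are assumptions made for this example, and the exact counts may differ between Go versions.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"testing"
)

// chunk is an arbitrary payload; sink keeps the results from being optimized away.
const chunk = "0123456789abcdefghijklmnopqrstuvwxyz0123"

var sink string

// buildWithBuilder pre-sizes a strings.Builder and writes three chunks.
func buildWithBuilder() {
	var b strings.Builder
	b.Grow(3 * len(chunk))
	for i := 0; i < 3; i++ {
		b.WriteString(chunk)
	}
	sink = b.String() // no copy: the string shares the builder's buffer
}

// buildWithBuffer does the same with bytes.Buffer.
func buildWithBuffer() {
	var b bytes.Buffer
	b.Grow(3 * len(chunk))
	for i := 0; i < 3; i++ {
		b.WriteString(chunk)
	}
	sink = b.String() // copies the buffer contents into a new string
}

func main() {
	fmt.Println("strings.Builder allocs/op:", testing.AllocsPerRun(1000, buildWithBuilder))
	fmt.Println("bytes.Buffer allocs/op:   ", testing.AllocsPerRun(1000, buildWithBuffer))
}
```

As the commit message notes, numbers like these mostly measure the allocator; the point of interest is only that the final string conversion no longer allocates.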
Change https://golang.org/cl/102479 mentions this issue: |
I grepped for "bytes.Buffer" and "buf.String" and mostly ignored test files. I skipped a few on purpose and probably missed a few others, but otherwise I think this should be most of them.
Updates #18990
Change-Id: I5a6ae4296b87b416d8da02d7bfaf981d8cc14774
Reviewed-on: https://go-review.googlesource.com/102479
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Years ago I filed #6714, hoping that it would one day be possible to construct a string efficiently without wasted allocations.
I hereby propose that we stop waiting and just do it explicitly with a new type in a new internal package.
The (*String).String method would use unsafe to make a string header out of an internal []byte slice header.
The Write methods could even recycle old too-small []byte backing arrays as they grow.
There would be no Reset or Bytes or Truncate or Read methods. Nothing that could mutate the []byte once it was unsafely converted to a string.
The implementation would have to take care to use a new []byte backing array on any resize after an unsafe string was constructed.