Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bytes: make Buffer.String avoid copy if possible #18990

Closed
bradfitz opened this issue Feb 8, 2017 · 36 comments
Closed

bytes: make Buffer.String avoid copy if possible #18990

bradfitz opened this issue Feb 8, 2017 · 36 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Proposal-Accepted
Milestone

Comments

@bradfitz
Copy link
Contributor

bradfitz commented Feb 8, 2017

Years ago I filed #6714, hoping that it would one day be possible to construct a string efficiently without wasted allocations.

I hereby propose that we stop waiting and just do it explicitly with a new type in a new internal package.

package builder // import "internal/builder"

// String is like a bytes.Buffer but never allows its internal
// byte slice to escape. The zero value is usable.
type String struct {
     // ...
}

func (s *String) Len() int { ... }
func (s *String) Grow(n int) { ... }
func (s *String) WriteByte(b byte) { ... }
func (s *String) WriteString(s string) { ... }
func (s *String) Write(b []byte) { ... }
func (s *String) String() string { ... }

The (*String).String method would use unsafe to make a string header out of an internal []byte slice header.

The Write methods could even recycle old too-small []byte backing arrays as they grow.

There would be no Reset or Bytes or Truncate or Read methods. Nothing that could mutate the []byte once it was unsafely converted to a string.

The implementation would have to take care to use a new []byte backing array on any resize after an unsafe string was constructed.

@minux
Copy link
Member

minux commented Feb 8, 2017 via email

@minux
Copy link
Member

minux commented Feb 8, 2017 via email

@cznic
Copy link
Contributor

cznic commented Feb 8, 2017

+1 on this as I'm actually using something quite similar already.

@minux
Copy link
Member

minux commented Feb 8, 2017 via email

@robpike
Copy link
Contributor

robpike commented Feb 8, 2017

bytes.Buffer already does no allocations up to 64 bytes, but then to make a string of course there is one allocation.

It could be smarter.

I would prefer we find a way not to add more ways to create strings. There are several already.

@luna-duclos
Copy link

I would love a publicly available API for this rather than an internal package

@seh
Copy link
Contributor

seh commented Feb 8, 2017

Java has had this same problem since its start: There is no way for any caller outside the java.lang package to create a String that doesn't copy the backing character array. As noted here, even Go's bytes.Buffer makes a copy before exposing the accumulated string, in fear that subsequent operations against the bytes.Buffer would change the would-be-shared backing array.

It would nice to have a method on bytes.Buffer (or something similar) like Take, which returns a string using the backing array—without copying it—and then invalidates or resets the Buffer, so that subsequent operations against it would either panic or use a fresh backing array.

As the original description here notes,

The implementation would have to take care to use a new []byte backing array on any resize after an unsafe string was constructed.

A method like Take makes that need explicit, though it does impose unnecessary cost when the last operation against the buffer is to take a string from it.

@dmitshur
Copy link
Contributor

dmitshur commented Feb 8, 2017

a new internal package

What's the reason and effect of internal being involved in this proposal?

Edit: The effect is that it would prevent code outside of GOROOT to import the package.

@rsc
Copy link
Contributor

rsc commented Feb 8, 2017

I have a number of questions and concerns.

  • Why? This proposal gives no justification or intended use cases, making it very hard to evaluate as an engineering decision.
  • Why only internal? Is there some compelling reason that the standard library is special and needs this magic capability but code outside the standard library does not? It's hard to believe people won't just copy it into their own projects, which is not something we want to encourage.
  • This is an unsafe operation and will be overused and used incorrectly.
  • I still believe the compiler can do significantly better. My recent reading about other systems with immutability (in particular, Midori) suggests that it's actually not too hard to identify the string(buf) conversions in which buf is not retained elsewhere and not used after the conversion; those can reuse buf for the storage when appropriate.

Doing this right in the compiler seems tractable, it avoids additional API, it avoids internal-only API, and it avoids forcing programmers to make unsafe decisions. It seems better on every axis than introducing a new package. But maybe there is some consideration I am missing.

@bradfitz
Copy link
Contributor Author

bradfitz commented Feb 8, 2017

@rsc,

Why? This proposal gives no justification or intended use cases, making it very hard to evaluate as an engineering decision.

See #6714 and the bugs referencing it. The justification is to reduce allocations and thus make the GC run less often and thus reduce CPU usage.

Why only internal? Is there some compelling reason that the standard library is special and needs this magic capability but code outside the standard library does not? It's hard to believe people won't just copy it into their own projects, which is not something we want to encourage.

Only because I figured it would be less controversial to start internal and see if we like the API first. We could export it later.

This is an unsafe operation and will be overused and used incorrectly.

No, the API would be designed such that it's impossible to use incorrectly or unsafely. It would only use unsafe itself, like reflect does, even though reflect is a safe part of the language. I tried to convey that in the top post.

I still believe the compiler can do significantly better. My recent reading about other systems with immutability (in particular, Midori) suggests that it's actually not too hard to identify the string(buf) conversions in which buf is not retained elsewhere and not used after the conversion; those can reuse buf for the storage when appropriate.

Significantly? In any case, that is #6714, filed in 2013. We can keep waiting, of course.

@bradfitz
Copy link
Contributor Author

bradfitz commented Feb 8, 2017

@robpike, @seh,

bytes.Buffer already does no allocations up to 64 bytes, but then to make a string of course there is one allocation.

Yes, but bytes.Buffer exposes its innards via (*Buffer).Bytes() etc. It would need to keep track more state to use unsafe safely.

A method like Take makes that need explicit, though it does impose unnecessary cost when the last operation against the buffer is to take a string from it.

If we did that, we'd need all the safety bookkeeping in that type anyway, so at that point I'd rather just make (*bytes.Buffer).String be the Take function, so there's no new API surface.

But only if we decide that bytes.Buffer wants to get into this business and use unsafe.

@crawshaw
Copy link
Member

crawshaw commented Feb 8, 2017

The builder.String type proposed here is significantly more useful than the compiler optimization described in #6714.

Determining there are no live aliases to a []byte (which is what #6714 proposes) will fail the moment it is wrapped and passed as an io.Writer. This is a very common thing to do when constructing a string via repeated calls to fmt.Fprintf.

A builder.String can be used as an io.Writer and then invalidate its internal []byte after the String method is called.

@rsc
Copy link
Contributor

rsc commented Feb 8, 2017

Sorry, @bradfitz, I missed that the []byte readers were omitted. I agree that the API is safe, and I agree that that's preferred over unsafe methods on bytes.Buffer. That resolves a big concern I had. There's still the question of what the benefit would be. What important benchmarks would get faster, and by how much? I remember that there have been past conversations, but #6714 in particular is pretty light on detail, and the other optimizations in the system have changed since then anyway.

@crawshaw, I'm not sure that's quite true. If you call func f(io.Writer) with f(w), and we know from escape analysis that f does not retain a pointer to w, and we also know from looking at w's methods that none of them expose w.buf (or even know that f doesn't call the ones that do), then we know that there are no new aliases to w.buf due to the call to f. The compiler could plausibly put that together today for fmt.Fprintf(&buf, ...) where buf is a bytes.Buffer. I believe I understand this problem much better than I did six months ago, certainly much better than in 2013 when Brad filed #6714 originally. But I have other things on my plate ahead of this.

@philhofer
Copy link
Contributor

philhofer commented Feb 8, 2017

Related: #18822 (re: immutability and live references to []byte)

@rsc
Copy link
Contributor

rsc commented Feb 13, 2017

After discussion with @golang/proposal-review, it sounds like maybe we can make bytes.Buffer's String method avoid the copy (provided there has been no call to Bytes) but remember that it gave out a string and make its own copy on any future mutation. Let's start with that "String is just more efficient" and go from there.

@rsc rsc changed the title proposal: add internal string builder type bytes: make Buffer.String avoid copy if possible Feb 13, 2017
@rsc rsc modified the milestones: Go1.9, Proposal Feb 13, 2017
@cespare
Copy link
Contributor

cespare commented Feb 13, 2017

@bradfitz et al., mind if I take this?

@bradfitz
Copy link
Contributor Author

@cespare, I already have the start of a CL. I've been waiting for years for this, so I'd like to give it a shot. :-)

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/37767 mentions this issue.

@bradfitz
Copy link
Contributor Author

Mirroring my comments from https://go-review.googlesource.com/c/37767/:

Yeah, I don't like this CL.

I don't think we can reuse the bytes.Buffer type for this magic and still have readable code.

But if we made a new type that didn't permit any escapes, then the magic would be very tiny and readable.

It turns out that the bytes.Buffer type is too large and flexible to add this magic transparently.

@bradfitz bradfitz modified the milestones: Go1.10, Go1.9 May 23, 2017
@rsc rsc added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Jun 5, 2017
@josharian
Copy link
Contributor

Another reason to keep the API identical (but more restricted) is to allow us to use it in the compiler and other programs that must run with old Go versions, by doing something like:

// +build !go1.10

type Buffer struct {
  bytes.Buffer
}
// +build go1.10

type Buffer struct {
  strings.Buffer
}

@cespare
Copy link
Contributor

cespare commented Oct 31, 2017

@bradfitz is it OK to send a CL for the new API?

We're already using our own Builder implementation that is pretty much what @rsc described in #18990 (comment) so it's essentially ready to go.

@ianlancetaylor
Copy link
Member

@cespare Go for it (@bradfitz is still more or less on leave).

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/74931 mentions this issue: strings: add Builder

@cespare
Copy link
Contributor

cespare commented Nov 1, 2017

Thanks; I sent https://golang.org/cl/74931. I have a few questions about the API which we can discuss on the CL.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/83255 mentions this issue: strings: fix two Builder bugs allowing mutation of strings, remove ReadFrom

gopherbot pushed a commit that referenced this issue Dec 11, 2017
…adFrom

The Builder's ReadFrom method allows the underlying unsafe slice to
escape, and for callers to subsequently modify memory that had been
unsafely converted into an immutable string.

In the original proposal for Builder (#18990), I'd noted there should
be no Read methods:

> There would be no Reset or Bytes or Truncate or Read methods.
> Nothing that could mutate the []byte once it was unsafely converted
> to a string.

And in my prototype (https://golang.org/cl/37767), I handled ReadFrom
properly, but when https://golang.org/cl/74931 arrived, I missed that
it had a ReadFrom method and approved it.

Because we're so close to the Go 1.10 release, just remove the
ReadFrom method rather than think about possible fixes. It has
marginal utility in a Builder anyway.

Also, fix a separate bug that also allowed mutation of a slice's
backing array after it had been converted into a slice by disallowing
copies of the Builder by value.

Updates #18990
Fixes #23083
Fixes #23084

Change-Id: Id1f860f8a4f5f88b32213cf85108ebc609acb95f
Reviewed-on: https://go-review.googlesource.com/83255
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@dgryski
Copy link
Contributor

dgryski commented Dec 11, 2017

The string method at 3058d38#diff-4c939e2b8a761c673f94b9505cdc2d08L22 relies on the assumption that the layout of the first two struct fields of the string header and slice header are the same. At what point does warning from the reflect package about "may change in a later release" become untenable?

@cespare
Copy link
Contributor

cespare commented Dec 11, 2017

The standard lib can rely on implementation details that other packages can't, since it is distributed with that implementation.

@bradfitz
Copy link
Contributor Author

The standard lib can rely on implementation details that other packages can't, since it is distributed with that implementation.

Yup.

At what point does warning from the reflect package about "may change in a later release" become untenable?

If the layout changes, unit tests will fail and we'll make the String method a big uglier (but likely still a single line).

I've always doubted we'd be able to ever change StringHeader or SliceHeader, though.

@dmitshur
Copy link
Contributor

dmitshur commented Dec 15, 2017

Are there benchmarks available for strings.Builder anywhere?

I'm not finding any as part of CL 74931. I figure there's a good chance there are some existing benchmarks, perhaps somewhere else. I'm implementing strings.Builder for another architecture, and I wanted to reuse existing benchmarks, if possible, instead of spending time recreating them myself. Or is TestBuilderAllocs the only one?

dmitshur added a commit to gopherjs/gopherjs that referenced this issue Dec 15, 2017
…chitecture.

Can't use package unsafe, so use a simple type conversion from []byte
to string in Builder.String method.

Further optimizations are deferred for later (most likely not needed
because Go->wasm will be ready by the time they're needed).

Related to golang/go#18990.
@cespare
Copy link
Contributor

cespare commented Dec 15, 2017

Are there benchmarks available for strings.Builder anywhere?

Not that I'm aware of.

dmitshur added a commit to gopherjs/gopherjs that referenced this issue Dec 19, 2017
…chitecture.

Can't use package unsafe, so use a simple type conversion from []byte
to string in Builder.String method.

Further optimizations are deferred for later (most likely not needed
because Go->wasm will be ready by the time they're needed).

Related to golang/go#18990.
dmitshur added a commit to gopherjs/gopherjs that referenced this issue Dec 31, 2017
…chitecture.

Can't use package unsafe, so use a simple type conversion from []byte
to string in Builder.String method.

Further optimizations are deferred for later (most likely not needed
because Go->wasm will be ready by the time they're needed).

Related to golang/go#18990.
dmitshur added a commit to gopherjs/gopherjs that referenced this issue Jan 25, 2018
…chitecture.

Can't use package unsafe, so use a simple type conversion from []byte
to string in Builder.String method.

Further optimizations are deferred for later (most likely not needed
because Go->wasm will be ready by the time they're needed).

Related to golang/go#18990.
dmitshur added a commit to gopherjs/gopherjs that referenced this issue Feb 9, 2018
…chitecture.

Can't use package unsafe, so use a simple type conversion from []byte
to string in Builder.String method.

Further optimizations are deferred for later (most likely not needed
because Go->wasm will be ready by the time they're needed).

Related to golang/go#18990.
dmitshur added a commit to gopherjs/gopherjs that referenced this issue Feb 16, 2018
…chitecture.

Can't use package unsafe, so use a simple type conversion from []byte
to string in Builder.String method.

Further optimizations are deferred for later (most likely not needed
because Go->wasm will be ready by the time they're needed).

Related to golang/go#18990.
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/96980 mentions this issue: strings: add Builder benchmarks comparing bytes.Buffer and strings.Builder

@dmitshur
Copy link
Contributor

dmitshur commented Feb 26, 2018

Are there benchmarks available for strings.Builder anywhere?

For posterity, the CL mentioned above adds some nice benchmarks.

I also found from https://talks.godoc.org/github.com/campoy/gotalks/go1.10/talk.slide#29 that @campoy made some benchmarks for that talk at github.com/campoy/gotalks/go1.10/strings.

gopherbot pushed a commit that referenced this issue Feb 26, 2018
…ilder

Despite the existing test that locks in the allocation behavior, people
really want a benchmark. So:

BenchmarkBuildString_Builder/1Write_NoGrow-4    20000000  60.4 ns/op   48 B/op  1 allocs/op
BenchmarkBuildString_Builder/3Write_NoGrow-4    10000000   230 ns/op  336 B/op  3 allocs/op
BenchmarkBuildString_Builder/3Write_Grow-4      20000000   102 ns/op  112 B/op  1 allocs/op
BenchmarkBuildString_ByteBuffer/1Write_NoGrow-4 10000000   125 ns/op  160 B/op  2 allocs/op
BenchmarkBuildString_ByteBuffer/3Write_NoGrow-4  5000000   339 ns/op  400 B/op  3 allocs/op
BenchmarkBuildString_ByteBuffer/3Write_Grow-4    5000000   316 ns/op  336 B/op  3 allocs/op

I don't think these allocate-as-fast-as-you-can benchmarks are very
interesting because they're effectively just GC benchmarks, but sure.
If one wants to see that there's 1 fewer allocation, there it is. The
ns/op and B/op numbers will change as the built string size changes.

Updates #18990

Change-Id: Ifccf535bd396217434a0e6989e195105f90132ae
Reviewed-on: https://go-review.googlesource.com/96980
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/102479 mentions this issue: all: use strings.Builder instead of bytes.Buffer where appropriate

gopherbot pushed a commit that referenced this issue Mar 26, 2018
I grepped for "bytes.Buffer" and "buf.String" and mostly ignored test
files. I skipped a few on purpose and probably missed a few others,
but otherwise I think this should be most of them.

Updates #18990

Change-Id: I5a6ae4296b87b416d8da02d7bfaf981d8cc14774
Reviewed-on: https://go-review.googlesource.com/102479
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@golang golang locked and limited conversation to collaborators Mar 26, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Proposal-Accepted
Projects
None yet
Development

No branches or pull requests