cmd/compile, bytes: bootstrap array causes bytes.Buffer to always be heap-allocated #7921

Open
lukescott opened this Issue May 1, 2014 · 18 comments

lukescott commented May 1, 2014

What steps reproduce the problem?

1. Download playground.zip
2. Run go build -gcflags=-m bytes.go
3. Run go build -gcflags=-m bytes2.go

playground/bytes: a simplified version of Go's bytes.Buffer, except that NewBuffer returns Buffer{} instead of &Buffer{}.
playground/bytes2: a different implementation that avoids re-slicing the buffer.
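Roughly, the two packages differ as in the following sketch (a hypothetical reconstruction for readers without the attachment; the real code and line numbers are in playground.zip and will not match exactly). The playground/bytes style grows by re-slicing the receiver's own buf field, while the playground/bytes2 style tracks a write offset so it never assigns a slice of b.buf back into b.buf:

package bufsketch

// Buffer mimics the playground/bytes style: Write grows by re-slicing the
// receiver's own buf. The self re-slice (b.buf = b.buf[:m+len(p)]) is the
// kind of assignment the escape analysis of that era treated as escaping.
type Buffer struct {
	buf []byte
	off int // read offset
}

func NewBuffer(b []byte) Buffer { return Buffer{buf: b} }

func (b *Buffer) Write(p []byte) (int, error) {
	m := len(b.buf)
	if m+len(p) > cap(b.buf) {
		buf := make([]byte, m, 2*cap(b.buf)+len(p))
		copy(buf, b.buf)
		b.buf = buf
	}
	b.buf = b.buf[:m+len(p)]
	return copy(b.buf[m:], p), nil
}

// Buffer2 mimics the playground/bytes2 style: it keeps explicit read/write
// offsets instead of re-slicing, so no slice of b.buf is assigned back into
// b.buf until the backing array actually has to be replaced.
type Buffer2 struct {
	buf  []byte
	roff int // read offset
	woff int // write offset
}

func (b *Buffer2) Write(p []byte) (int, error) {
	if b.woff+len(p) > len(b.buf) {
		buf := make([]byte, 2*len(b.buf)+len(p))
		copy(buf, b.buf[:b.woff])
		b.buf = buf
	}
	n := copy(b.buf[b.woff:], p)
	b.woff += n
	return n, nil
}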

What happened?

$ go build -gcflags=-m bytes.go
# playground/bytes
bytes/buffer.go:12: can inline NewBuffer
bytes/buffer.go:57: can inline (*Buffer).Read
bytes/buffer.go:69: can inline (*Buffer).Bytes
bytes/buffer.go:12: leaking param: b to result ~anon1
bytes/buffer.go:16: leaking param: b
bytes/buffer.go:16: leaking param: b
bytes/buffer.go:34: make([]byte, 2 * cap(b.buf) + n) escapes to heap
bytes/buffer.go:16: leaking param: b
bytes/buffer.go:44: leaking param: b
bytes/buffer.go:44: leaking param: b
bytes/buffer.go:52: leaking param: b
bytes/buffer.go:52: (*Buffer).Write p does not escape
bytes/buffer.go:57: (*Buffer).Read b does not escape
bytes/buffer.go:57: (*Buffer).Read p does not escape
bytes/buffer.go:69: (*Buffer).Bytes b does not escape
# command-line-arguments
./bytes.go:9: inlining call to bytes.NewBuffer
./bytes.go:13: inlining call to Read
./bytes.go:18: inlining call to Bytes
./bytes.go:9: moved to heap: myBuf
./bytes.go:11: myBuf escapes to heap
./bytes.go:8: make([]byte, 4) escapes to heap
./bytes.go:14: myBuf escapes to heap
./bytes.go:8: make([]byte, 4) escapes to heap
./bytes.go:11: main []byte literal does not escape
./bytes.go:12: main make([]byte, 2) does not escape
./bytes.go:13: main myBuf does not escape
./bytes.go:14: main []byte literal does not escape
./bytes.go:18: main myBuf does not escape


$ go build -gcflags=-m bytes2.go
# playground/bytes2
bytes2/buffer.go:13: can inline NewBuffer
bytes2/buffer.go:61: can inline (*Buffer).Read
bytes2/buffer.go:73: can inline (*Buffer).Bytes
bytes2/buffer.go:13: leaking param: b to result ~anon1
bytes2/buffer.go:38: make([]byte, size) escapes to heap
bytes2/buffer.go:17: (*Buffer).grow b does not escape
bytes2/buffer.go:47: (*Buffer).Grow b does not escape
bytes2/buffer.go:54: (*Buffer).Write b does not escape
bytes2/buffer.go:54: (*Buffer).Write p does not escape
bytes2/buffer.go:61: (*Buffer).Read b does not escape
bytes2/buffer.go:61: (*Buffer).Read p does not escape
bytes2/buffer.go:73: (*Buffer).Bytes b does not escape
# command-line-arguments
./bytes2.go:9: inlining call to bytes2.NewBuffer
./bytes2.go:13: inlining call to Read
./bytes2.go:18: inlining call to Bytes
./bytes2.go:8: main make([]byte, 4) does not escape
./bytes2.go:11: main myBuf does not escape
./bytes2.go:11: main []byte literal does not escape
./bytes2.go:12: main make([]byte, 2) does not escape
./bytes2.go:13: main myBuf does not escape
./bytes2.go:14: main myBuf does not escape
./bytes2.go:14: main []byte literal does not escape
./bytes2.go:18: main myBuf does not escape


What should have happened instead?

This shouldn't be happening with playground/bytes:

./bytes.go:9: moved to heap: myBuf
./bytes.go:11: myBuf escapes to heap
./bytes.go:8: make([]byte, 4) escapes to heap
./bytes.go:14: myBuf escapes to heap
./bytes.go:8: make([]byte, 4) escapes to heap

Re-slicing shouldn't cause playground/bytes's Buffer to always be heap-allocated.

It should behave like playground/bytes2, which avoids heap allocation until a resize happens:

bytes2/buffer.go:38: make([]byte, size) escapes to heap

Please provide any additional information below.

If this is working as intended, the implementation of bytes.Buffer should change so that
it can be completely stack-allocated. The playground/bytes2 implementation stays entirely
on the stack until the initial buffer needs to be resized, which avoids a heap allocation
where possible.

Attachments:

  1. playground.zip (5470 bytes)

lukescott commented May 1, 2014

Comment 1:

Another thing to note: If I change "NewBuffer" in playground/bytes2 to return &Buffer{}
then buf (in main()) escapes to the heap even though NewBuffer is inlined:
./bytes2.go:9: inlining call to bytes2.NewBuffer
./bytes2.go:13: inlining call to Read
./bytes2.go:18: inlining call to Bytes
./bytes2.go:8: make([]byte, 4) escapes to heap
./bytes2.go:9: <S> &bytes2.Buffer literal does not escape
./bytes2.go:11: main []byte literal does not escape
./bytes2.go:12: main make([]byte, 2) does not escape
./bytes2.go:14: main []byte literal does not escape
Otherwise, if I return Buffer{} directly, nothing gets allocated on the heap unless a resize happens.

ianlancetaylor (Contributor) commented May 2, 2014

Comment 2:

Labels changed: added repo-main, release-go1.4.

rsc (Contributor) commented Sep 15, 2014

Comment 3:

Labels changed: added release-none, removed release-go1.4.

Status changed to Accepted.

dvyukov added a commit that referenced this issue Jan 28, 2015

cmd/gc: ignore re-slicing in escape analysis
Escape analysis treats everything assigned to OIND/ODOTPTR as escaping.
As the result b escapes in the following code:

	func (b *Buffer) Foo() {
		n, m := ...
		b.buf = b.buf[n:m]
	}

This change recognizes such assignments and ignores them.

Update issue #9043.
Update issue #7921.

There are two similar cases in std lib that benefit from this optimization.
First is in archive/zip:

type readBuf []byte
func (b *readBuf) uint32() uint32 {
	v := binary.LittleEndian.Uint32(*b)
	*b = (*b)[4:]
	return v
}

Second is in time:

type data struct {
	p     []byte
	error bool
}

func (d *data) read(n int) []byte {
	if len(d.p) < n {
		d.p = nil
		d.error = true
		return nil
	}
	p := d.p[0:n]
	d.p = d.p[n:]
	return p
}

benchmark                         old ns/op     new ns/op     delta
BenchmarkCompressedZipGarbage     32431724      32217851      -0.66%

benchmark                         old allocs     new allocs     delta
BenchmarkCompressedZipGarbage     153            143            -6.54%

Change-Id: Ia6cd32744e02e36d6d8c19f402f8451101711626
Reviewed-on: https://go-review.googlesource.com/3162
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>

rsc added this to the Unplanned milestone Apr 10, 2015

rsc removed the release-none label Apr 10, 2015

cespare (Contributor) commented Mar 11, 2017

With Go 1.8, the provided sample code (the playground/bytes package) no longer escapes:

GOPATH=/tmp/gopath go build -gcflags -m bytes.go
# playground/bytes
/tmp/gopath/src/playground/bytes/buffer.go:12: can inline NewBuffer
/tmp/gopath/src/playground/bytes/buffer.go:57: can inline (*Buffer).Read
/tmp/gopath/src/playground/bytes/buffer.go:69: can inline (*Buffer).Bytes
/tmp/gopath/src/playground/bytes/buffer.go:12: leaking param: b to result ~r1 level=0
/tmp/gopath/src/playground/bytes/buffer.go:21: (*Buffer).grow ignoring self-assignment to b.buf
/tmp/gopath/src/playground/bytes/buffer.go:40: (*Buffer).grow ignoring self-assignment to b.buf
/tmp/gopath/src/playground/bytes/buffer.go:16: leaking param content: b
/tmp/gopath/src/playground/bytes/buffer.go:34: make([]byte, 2 * cap(b.buf) + n) escapes to heap
/tmp/gopath/src/playground/bytes/buffer.go:49: (*Buffer).Grow ignoring self-assignment to b.buf
/tmp/gopath/src/playground/bytes/buffer.go:44: leaking param content: b
/tmp/gopath/src/playground/bytes/buffer.go:52: leaking param content: b
/tmp/gopath/src/playground/bytes/buffer.go:52: leaking param content: p
/tmp/gopath/src/playground/bytes/buffer.go:57: leaking param content: b
/tmp/gopath/src/playground/bytes/buffer.go:57: (*Buffer).Read p does not escape
/tmp/gopath/src/playground/bytes/buffer.go:69: leaking param: b to result ~r0 level=1
# command-line-arguments
./bytes.go:9: inlining call to bytes.NewBuffer
./bytes.go:13: inlining call to (*bytes.Buffer).Read
./bytes.go:18: inlining call to (*bytes.Buffer).Bytes
./bytes.go:8: make([]byte, 4) escapes to heap
./bytes.go:11: main myBuf does not escape
./bytes.go:11: main ([]byte)("test") does not escape
./bytes.go:12: main make([]byte, 2) does not escape
./bytes.go:13: main myBuf does not escape
./bytes.go:14: main myBuf does not escape
./bytes.go:14: main ([]byte)("00") does not escape
./bytes.go:16: main string(buf) does not escape
./bytes.go:18: main myBuf does not escape
./bytes.go:18: main string(([]byte)(~r0)) does not escape

However, a bytes.Buffer still seems to always escape if you call its WriteXxx methods, so the root issue persists.
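A minimal illustration of that claim (a hypothetical snippet, not taken from the issue attachments); per this comment, building it with go build -gcflags=-m on Go 1.8 should still report the buffer being moved to the heap:

package main

import "bytes"

func main() {
	// Per the comment above, as of Go 1.8 any Write* call still forces the
	// Buffer itself to be heap-allocated (check with: go build -gcflags=-m).
	var b bytes.Buffer
	b.WriteString("hello")
	_ = b.Len()
}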

cespare changed the title from cmd/gc, bytes: re-slicing causes bytes.Buffer to always allocate on heap to cmd/compile, bytes: bootstrap array causes bytes.Buffer to always be heap-allocated Mar 11, 2017

cespare (Contributor) commented Mar 11, 2017

It's the use of the bootstrap array that's forcing bytes.Buffer to always escape now, from what I can tell. Here's a very simple repro:

package main

type B struct {
	buf     []byte
	storage [64]byte
}

func (b *B) X() {
	b.buf = b.storage[:10]
}

func main() {
	var b B
	b.X()
}

./test.go:8: can inline (*B).X
./test.go:12: can inline main
./test.go:14: inlining call to (*B).X
./test.go:9: b.storage escapes to heap
./test.go:8: leaking param: b
./test.go:14: b.storage escapes to heap
./test.go:14: b escapes to heap
./test.go:13: moved to heap: b

Is it the case that any self-referencing pointers foil escape analysis? For instance, it also escapes if you do this:

type B struct {
	p *int
	n int
}

func (b *B) X() {
	b.p = &b.n
}

cc @randall77 for escape analysis thoughts

@lukescott I took the liberty of retitling the issue given that the re-slicing thing was fixed but the underlying problem of bytes.Buffer always being heap-allocated was not (you also discussed this in the now-closed #7661).

randall77 (Contributor) commented Mar 14, 2017

I don't know why this would cause an escape. Probably a tricky corner case? Cycles in the escapes-to graph?
@drchase might know.

dr2chase (Contributor) commented Mar 14, 2017

I don't "know", but I suspect. Pretty sure that forms a cycle in the graph, which can look like an escape for various reasons.

gopherbot commented Jan 9, 2018

Change https://golang.org/cl/86976 mentions this issue: strings: prevent copyCheck from forcing Builder to escape and allocate

gopherbot pushed a commit that referenced this issue Jan 9, 2018

strings: prevent copyCheck from forcing Builder to escape and allocate
All credit and blame goes to Ian for this suggestion, copied from the
runtime.

Fixes #23382
Updates #7921

Change-Id: I3d5a9ee4ab730c87e0f3feff3e7fceff9bcf9e18
Reviewed-on: https://go-review.googlesource.com/86976
Reviewed-by: Ian Lance Taylor <iant@golang.org>
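For context, here is a hedged sketch of the trick that CL applies (illustrative only, not the verbatim strings package source): the self-pointer stored by copyCheck is laundered through a uintptr round-trip, modelled on the runtime's noescape helper, so escape analysis no longer sees the Builder storing a pointer to itself and forcing itself onto the heap:

package main

import "unsafe"

// noescape hides a pointer from escape analysis via a uintptr round-trip.
// This mirrors the runtime helper the commit message says the fix was copied
// from; go vet may warn about the uintptr-to-Pointer conversion, which is
// exactly the part that hides the pointer.
//
//go:nosplit
func noescape(p unsafe.Pointer) unsafe.Pointer {
	x := uintptr(p)
	return unsafe.Pointer(x ^ 0)
}

// builder is an illustrative stand-in for strings.Builder, not its source.
type builder struct {
	addr *builder // of receiver, to detect copies by value
	buf  []byte
}

func (b *builder) copyCheck() {
	if b.addr == nil {
		// Storing b directly would make every builder escape; storing the
		// laundered pointer keeps the copy check without the allocation.
		b.addr = (*builder)(noescape(unsafe.Pointer(b)))
	} else if b.addr != b {
		panic("illegal use of non-zero builder copied by value")
	}
}

func main() {
	var b builder
	b.copyCheck()
	_ = b.buf
}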

gopherbot commented Sep 4, 2018

Change https://golang.org/cl/133375 mentions this issue: cmd/compile/internal/gc: handle array slice self-assign in esc.go

Quasilyte (Contributor) commented Sep 4, 2018

It's hard to solve this issue purely with improvements to escape analysis, without any API changes.
However, https://golang.org/cl/133375 does make it possible to implement a "proper" bytes.Buffer outside of the Go standard library.

See https://github.com/intel-go/bytebuf.

The only change to buffer.go:

type Buffer struct {
	buf       []byte   // contents are the bytes buf[off : len(buf)]
	off       int      // read at &buf[off], write at &buf[len(buf)]
- 	bootstrap [64]byte // memory to hold first slice; helps small buffers avoid allocation.
+ 	bootstrap *[64]byte // memory to hold first slice; helps small buffers avoid allocation.
	lastRead  readOp   // last read operation, so that Unread* can work correctly.
}

And we get these numbers:

name            old time/op    new time/op    delta
String/empty-8     138ns ±13%      24ns ± 0%   -82.94%  (p=0.000 n=10+8)
String/5-8         186ns ±11%      60ns ± 1%   -67.82%  (p=0.000 n=10+10)
String/64-8        225ns ±10%     108ns ± 6%   -52.26%  (p=0.000 n=10+10)
String/128-8       474ns ±17%     338ns ±13%   -28.57%  (p=0.000 n=10+10)
String/1024-8      889ns ± 0%     740ns ± 1%   -16.78%  (p=0.000 n=9+10)

name            old alloc/op   new alloc/op   delta
String/empty-8      112B ± 0%        0B       -100.00%  (p=0.000 n=10+10)
String/5-8          117B ± 0%        5B ± 0%   -95.73%  (p=0.000 n=10+10)
String/64-8         176B ± 0%       64B ± 0%   -63.64%  (p=0.000 n=10+10)
String/128-8        368B ± 0%      256B ± 0%   -30.43%  (p=0.000 n=10+10)
String/1024-8     2.16kB ± 0%    2.05kB ± 0%    -5.19%  (p=0.000 n=10+10)

name            old allocs/op  new allocs/op  delta
String/empty-8      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
String/5-8          2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
String/64-8         2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
String/128-8        3.00 ± 0%      2.00 ± 0%   -33.33%  (p=0.000 n=10+10)
String/1024-8       3.00 ± 0%      2.00 ± 0%   -33.33%  (p=0.000 n=10+10)

rasky (Member) commented Sep 5, 2018

@Quasilyte there's something I don't understand. The theory behind the bootstrap array was to avoid an allocation by having the initial slice point to it. Unfortunately, our escape analysis is unable to prove that the buffer does not escape (would you be able to explain why, and what the limitation is?), so we always got one allocation (the bytes.Buffer instance).

Your change makes the bootstrap buffer a pointer to an array, so that the bytes.Buffer does not escape, but it forces an allocation for the bootstrap array.

So shouldn't the allocation count be the same?

Quasilyte (Contributor) commented Sep 5, 2018

@rasky the trick is that a bootstrap array allocated with new can be proven non-escaping by the current escape analysis. So it does achieve the initial goal of avoiding heap allocations for local buffers that can benefit from the small-buffer optimization.
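A hedged sketch of that trick (illustrative names, not the intel-go/bytebuf source). Because the array is allocated with new and only ever stored into the receiver, the claim is that escape analysis can keep both the buffer and the array on the stack whenever the buffer itself does not escape (check with go build -gcflags=-m):

package main

// buf is an illustrative stand-in for the pointer-bootstrap variant of
// bytes.Buffer discussed above.
type buf struct {
	data      []byte
	bootstrap *[64]byte // pointer-to-array instead of an inline [64]byte
}

func (b *buf) grow(n int) {
	if b.data == nil && n <= 64 {
		// Allocated with new and stored only into b; per this thread, the
		// escape analysis can prove it non-escaping when b does not escape.
		b.bootstrap = new([64]byte)
		b.data = b.bootstrap[:0]
		return
	}
	nd := make([]byte, len(b.data), 2*cap(b.data)+n)
	copy(nd, b.data)
	b.data = nd
}

func (b *buf) Write(p []byte) (int, error) {
	if len(b.data)+len(p) > cap(b.data) {
		b.grow(len(p))
	}
	b.data = append(b.data, p...)
	return len(p), nil
}

func main() {
	var b buf
	b.Write([]byte("hi"))
	_ = b.data
}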

opennota commented Sep 6, 2018

Without applying CL 133375, and with this patch to buffer.go (keeping the zero Buffer value usable), I get these numbers in the bytebuf benchmarks:

name                         old time/op    new time/op    delta
String/bytes.Buffer/empty-4    64.1ns ± 0%    10.9ns ± 1%   -83.07%  (p=0.000 n=8+9)
String/bytes.Buffer/5-4        89.3ns ± 2%    66.5ns ± 1%   -25.52%  (p=0.000 n=10+10)
String/bytes.Buffer/64-4        103ns ± 0%      83ns ± 1%   -19.78%  (p=0.000 n=10+10)
String/bytes.Buffer/128-4       235ns ± 1%     175ns ± 2%   -25.57%  (p=0.000 n=9+10)
String/bytes.Buffer/1024-4      591ns ± 1%     547ns ± 2%    -7.56%  (p=0.000 n=8+9)

name                         old alloc/op   new alloc/op   delta
String/bytes.Buffer/empty-4      112B ± 0%        0B       -100.00%  (p=0.000 n=10+10)
String/bytes.Buffer/5-4          117B ± 0%       69B ± 0%   -41.03%  (p=0.000 n=10+10)
String/bytes.Buffer/64-4         176B ± 0%      128B ± 0%   -27.27%  (p=0.000 n=10+10)
String/bytes.Buffer/128-4        368B ± 0%      256B ± 0%   -30.43%  (p=0.000 n=10+10)
String/bytes.Buffer/1024-4     2.16kB ± 0%    2.05kB ± 0%    -5.19%  (p=0.000 n=10+10)

name                         old allocs/op  new allocs/op  delta
String/bytes.Buffer/empty-4      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
String/bytes.Buffer/5-4          2.00 ± 0%      2.00 ± 0%      ~     (all equal)
String/bytes.Buffer/64-4         2.00 ± 0%      2.00 ± 0%      ~     (all equal)
String/bytes.Buffer/128-4        3.00 ± 0%      2.00 ± 0%   -33.33%  (p=0.000 n=10+10)
String/bytes.Buffer/1024-4       3.00 ± 0%      2.00 ± 0%   -33.33%  (p=0.000 n=10+10)

The bytes package benchmarks aren't so rosy, though.

reezer commented Sep 6, 2018

Since this seems to be a big improvement for certain scenarios but requires an API change, maybe it would be a candidate for Go 2.

Quasilyte (Contributor) commented Sep 6, 2018

@reezer, please have some more patience.

What @opennota described is close to optimal for Go 1.
I'll be back soon with explanations and, probably, a CL.

gopherbot commented Sep 6, 2018

Change https://golang.org/cl/133715 mentions this issue: bytes: remove bootstrap array from Buffer

Quasilyte (Contributor) commented Sep 6, 2018

@opennota what you're doing is essentially a make([]byte), but in a slightly more convoluted way.
We can remove the bootstrap field completely and replace it with make.

The main benefit of not having a bootstrap array (or having it as a pointer) is that the b receiver
stops leaking from the Buffer.grow method. Since Buffer.grow can't be inlined, the receiver must
not leak in order for a stack-allocated buffer to be possible.

It also opens some new optimization possibilities described below.


Benchmark results from the bytebuf package (comparing only the old and new bytes.Buffer):

name            old time/op    new time/op    delta
String/empty-8     132ns ±14%      21ns ± 0%   -84.30%  (p=0.000 n=10+10)
String/5-8         190ns ±12%     130ns ± 5%   -31.62%  (p=0.000 n=10+9)
String/64-8        205ns ±15%     166ns ±13%   -18.79%  (p=0.000 n=10+10)
String/1024-8      831ns ± 1%     780ns ± 0%    -6.09%  (p=0.000 n=8+7)

name            old alloc/op   new alloc/op   delta
String/empty-8      112B ± 0%        0B       -100.00%  (p=0.000 n=10+10)
String/5-8          117B ± 0%       69B ± 0%   -41.03%  (p=0.000 n=10+10)
String/64-8         176B ± 0%      128B ± 0%   -27.27%  (p=0.000 n=10+10)
String/1024-8     2.16kB ± 0%    2.05kB ± 0%    -5.19%  (p=0.000 n=10+10)

name            old allocs/op  new allocs/op  delta
String/empty-8      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
String/5-8          2.00 ± 0%      2.00 ± 0%      ~     (all equal)
String/64-8         2.00 ± 0%      2.00 ± 0%      ~     (all equal)
String/1024-8       3.00 ± 0%      2.00 ± 0%   -33.33%  (p=0.000 n=10+10)

With the array self-assignment removed, we no longer have a leaking param in bytes.Buffer.grow.
This makes it possible to do a pre-allocation that resides on the stack for buffers that do not escape:

- var buf bytes.Buffer
+ buf := bytes.NewBuffer(make([]byte, 0, 64))
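For example, a self-contained sketch of that usage pattern (encodePair is a hypothetical helper, not an API mentioned elsewhere in this thread):

package main

import "bytes"

// encodePair hands NewBuffer a pre-sized slice; when neither the Buffer nor
// the slice escapes, both can stay on the stack (check with: go build -gcflags=-m).
func encodePair(k, v string) string {
	buf := bytes.NewBuffer(make([]byte, 0, 64))
	buf.WriteString(k)
	buf.WriteByte('=')
	buf.WriteString(v)
	return buf.String()
}

func main() {
	_ = encodePair("key", "value")
}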

Note that one can use a different capacity, so we can effectively have a bootstrap "array" of more than 64 bytes.
And given that the receiver no longer leaks, it really does improve things:

name            old time/op    new time/op    delta
String/empty-8     132ns ±14%      26ns ± 0%   -80.47%  (p=0.000 n=10+8)
String/5-8         190ns ±12%      53ns ± 1%   -71.95%  (p=0.000 n=10+10)
String/64-8        205ns ±15%      99ns ± 9%   -51.62%  (p=0.000 n=10+10)
String/1024-8      831ns ± 1%     931ns ±20%      ~     (p=0.500 n=8+10)

name            old alloc/op   new alloc/op   delta
String/empty-8      112B ± 0%        0B       -100.00%  (p=0.000 n=10+10)
String/5-8          117B ± 0%        5B ± 0%   -95.73%  (p=0.000 n=10+10)
String/64-8         176B ± 0%       64B ± 0%   -63.64%  (p=0.000 n=10+10)
String/1024-8     2.16kB ± 0%    2.18kB ± 0%    +0.74%  (p=0.000 n=10+10)

name            old allocs/op  new allocs/op  delta
String/empty-8      1.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
String/5-8          2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
String/64-8         2.00 ± 0%      1.00 ± 0%   -50.00%  (p=0.000 n=10+10)
String/1024-8       3.00 ± 0%      2.00 ± 0%   -33.33%  (p=0.000 n=10+10)

Note that we do fewer allocations for workloads that fit within the slice capacity.

Key points of the new implementation:

  • Zero value bytes.Buffer performs better than before
  • You can have a truly stack-allocated buffer, and it's not even limited to 64 bytes
  • The unsafe.Sizeof(bytes.Buffer{}) is reduced significantly
  • Empty writes don't cause allocations

There were no big reasons to use bytes.NewBuffer before. Now there are.


bytes.Buffer benchmark results (go test -v -count=10 -bench='ReadString|WriteByte|WriteRune|Buffer' -benchmem bytes):

name                       old time/op    new time/op    delta
ReadString-8                 9.20µs ± 1%    9.22µs ± 1%     ~     (p=0.148 n=10+10)
WriteByte-8                  28.1µs ± 0%    26.2µs ± 0%   -6.78%  (p=0.000 n=10+10)
WriteRune-8                  64.9µs ± 0%    65.0µs ± 0%   +0.16%  (p=0.000 n=10+10)
BufferNotEmptyWriteRead-8     469µs ± 0%     461µs ± 0%   -1.76%  (p=0.000 n=9+10)
BufferFullSmallReads-8        108µs ± 0%     108µs ± 0%   -0.21%  (p=0.000 n=10+10)

name                       old speed      new speed      delta
ReadString-8               3.56GB/s ± 1%  3.55GB/s ± 1%     ~     (p=0.165 n=10+10)
WriteByte-8                 146MB/s ± 0%   156MB/s ± 0%   +7.26%  (p=0.000 n=9+10)
WriteRune-8                 189MB/s ± 0%   189MB/s ± 0%   -0.16%  (p=0.000 n=10+10)

name                       old alloc/op   new alloc/op   delta
ReadString-8                 32.8kB ± 0%    32.8kB ± 0%     ~     (all equal)
WriteByte-8                   0.00B          0.00B          ~     (all equal)
WriteRune-8                   0.00B          0.00B          ~     (all equal)
BufferNotEmptyWriteRead-8    4.72kB ± 0%    4.67kB ± 0%   -1.02%  (p=0.000 n=10+10)
BufferFullSmallReads-8       3.44kB ± 0%    3.33kB ± 0%   -3.26%  (p=0.000 n=10+10)

name                       old allocs/op  new allocs/op  delta
ReadString-8                   1.00 ± 0%      1.00 ± 0%     ~     (all equal)
WriteByte-8                    0.00           0.00          ~     (all equal)
WriteRune-8                    0.00           0.00          ~     (all equal)
BufferNotEmptyWriteRead-8      3.00 ± 0%      3.00 ± 0%     ~     (all equal)
BufferFullSmallReads-8         3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.000 n=10+10)

gopherbot pushed a commit that referenced this issue Sep 6, 2018

bytes: remove bootstrap array from Buffer
Rationale: small buffer optimization does not work and it has
made things slower since 2014. Until we can make it work,
we should prefer simpler code that also turns out to be more
efficient.

With this change, it's possible to use
NewBuffer(make([]byte, 0, bootstrapSize)) to get the desired
stack-allocated initial buffer since escape analysis can
prove the created slice to be non-escaping.

New implementation key points:

    - Zero value bytes.Buffer performs better than before
    - You can have a truly stack-allocated buffer, and it's not even limited to 64 bytes
    - The unsafe.Sizeof(bytes.Buffer{}) is reduced significantly
    - Empty writes don't cause allocations

Buffer benchmarks from bytes package:

    name                       old time/op    new time/op    delta
    ReadString-8                 9.20µs ± 1%    9.22µs ± 1%     ~     (p=0.148 n=10+10)
    WriteByte-8                  28.1µs ± 0%    26.2µs ± 0%   -6.78%  (p=0.000 n=10+10)
    WriteRune-8                  64.9µs ± 0%    65.0µs ± 0%   +0.16%  (p=0.000 n=10+10)
    BufferNotEmptyWriteRead-8     469µs ± 0%     461µs ± 0%   -1.76%  (p=0.000 n=9+10)
    BufferFullSmallReads-8        108µs ± 0%     108µs ± 0%   -0.21%  (p=0.000 n=10+10)

    name                       old speed      new speed      delta
    ReadString-8               3.56GB/s ± 1%  3.55GB/s ± 1%     ~     (p=0.165 n=10+10)
    WriteByte-8                 146MB/s ± 0%   156MB/s ± 0%   +7.26%  (p=0.000 n=9+10)
    WriteRune-8                 189MB/s ± 0%   189MB/s ± 0%   -0.16%  (p=0.000 n=10+10)

    name                       old alloc/op   new alloc/op   delta
    ReadString-8                 32.8kB ± 0%    32.8kB ± 0%     ~     (all equal)
    WriteByte-8                   0.00B          0.00B          ~     (all equal)
    WriteRune-8                   0.00B          0.00B          ~     (all equal)
    BufferNotEmptyWriteRead-8    4.72kB ± 0%    4.67kB ± 0%   -1.02%  (p=0.000 n=10+10)
    BufferFullSmallReads-8       3.44kB ± 0%    3.33kB ± 0%   -3.26%  (p=0.000 n=10+10)

    name                       old allocs/op  new allocs/op  delta
    ReadString-8                   1.00 ± 0%      1.00 ± 0%     ~     (all equal)
    WriteByte-8                    0.00           0.00          ~     (all equal)
    WriteRune-8                    0.00           0.00          ~     (all equal)
    BufferNotEmptyWriteRead-8      3.00 ± 0%      3.00 ± 0%     ~     (all equal)
    BufferFullSmallReads-8         3.00 ± 0%      2.00 ± 0%  -33.33%  (p=0.000 n=10+10)

The most notable thing in go1 benchmarks is reduced allocs in HTTPClientServer (-1 alloc):

    HTTPClientServer-8           64.0 ± 0%      63.0 ± 0%  -1.56%  (p=0.000 n=10+10)

For more explanations and benchmarks see the referenced issue.

Updates #7921

Change-Id: Ica0bf85e1b70fb4f5dc4f6a61045e2cf4ef72aa3
Reviewed-on: https://go-review.googlesource.com/133715
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

cespare (Contributor) commented Sep 6, 2018

I had retitled this bug to be about bytes.Buffer's bootstrap array, so now that we've gotten rid of that, I suppose we can close it.

The underlying issue of self-referential structures tricking escape analysis (as demonstrated in #7921 (comment)) remains, however.

gopherbot pushed a commit that referenced this issue Sep 17, 2018

cmd/compile/internal/gc: handle array slice self-assign in esc.go
Instead of skipping all OSLICEARR, skip only ones with non-pointer
array type. For pointers to arrays, it's safe to apply the
self-assignment slicing optimizations.

Refactored the matching code into separate function for readability.

This is an extension to already existing optimization.

On its own, it does not improve any code under std, but
it opens some new optimization opportunities. One
of them is described in the referenced issue.

Updates #7921

Change-Id: I08ac660d3ef80eb15fd7933fb73cf53ded9333ad
Reviewed-on: https://go-review.googlesource.com/133375
Run-TryBot: Iskander Sharipov <iskander.sharipov@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>