
cmd/compile: self assigned append should not escape #51462

Open
dsnet opened this issue Mar 3, 2022 · 8 comments
Labels
compiler/runtime, NeedsInvestigation, Performance
Milestone
Backlog
Comments

@dsnet
Member

dsnet commented Mar 3, 2022

Using go1.17.5.

In the following snippet:

bb := &Buffer{buf: make([]byte, 0, 64)}
bb.buf = append(bb.buf)

the buffer allocated by make([]byte, 0, 64) escapes to the heap, because escape analysis apparently cannot determine that bb.buf = append(bb.buf) does not cause buf to escape.

I expect the code above to be able to stack allocate make([]byte, 0, 64).
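For reference, a minimal, self-contained sketch of the reproduction (the Buffer type and the selfAppend function are hypothetical stand-ins, not code from the report); building it with go build -gcflags=-m on go1.17.5 is expected to report the make([]byte, 0, 64) result escaping to the heap:

// Hypothetical repro sketch; Buffer mirrors the struct described above.
package main

type Buffer struct {
	buf []byte
}

//go:noinline
func selfAppend() int {
	bb := &Buffer{buf: make([]byte, 0, 64)} // should be stack-allocatable
	bb.buf = append(bb.buf)                 // self-assigned append makes it escape
	return cap(bb.buf)
}

func main() {
	println(selfAppend())
}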

/cc @josharian @mdempsky

@josharian theorizes that we might need a special case in https://github.com/golang/go/blame/master/src/cmd/compile/internal/escape/utils.go

@randall77
Contributor

Dup of #27772 ?

@dsnet
Member Author

dsnet commented Mar 3, 2022

Dup of #27772 ?

Possibly. #27772 seems like a more general issue, while this issue targets a specific case that seems more tractable to fix in the short term.

@beoran

beoran commented Mar 5, 2022

In this case, though, the append does nothing. I feel this is a case of incorrect use of append, so it doesn't matter much.

@josharian
Contributor

I believe that it also reproduces with non-trivial uses of append, and that making it non-trivial doesn't alter the escape analysis. This showed up in practice in https://go-review.googlesource.com/c/go/+/349994.
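For instance (a sketch reusing the hypothetical Buffer from the earlier repro), a non-trivial self-assigned append would look like this, and per the comment above it reportedly triggers the same escape-analysis result:

func selfAppendNonTrivial(data []byte) int {
	bb := &Buffer{buf: make([]byte, 0, 64)}
	bb.buf = append(bb.buf, data...) // non-trivial append, same self-assignment shape
	return len(bb.buf)
}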

@cagedmantis added the NeedsInvestigation label Mar 7, 2022
@cagedmantis added this to the Backlog milestone Mar 7, 2022
@Jorropo
Member

Jorropo commented Mar 8, 2022

In this case, though, the append does nothing. I feel this is a case of incorrect use of append, so it doesn't matter much.

I sometimes see things like this:

// assume b, c and d sizes are known and small enough
a := make([]byte, len(b)+len(c)+len(d))[:0]
a = append(a, b...)
a = append(a, c...)
a = append(a, d...)

It is measurably faster than initializing like this:

var a []byte // = nil

It is also more readable and less error-prone than a copy-based version (even though copy would be somewhat faster here).
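For comparison, a copy-based version of the same concatenation might look like this (a sketch assuming the same b, c, and d as above); the offsets must be tracked by hand, which is where the extra room for error comes from:

a := make([]byte, len(b)+len(c)+len(d))
n := copy(a, b)
n += copy(a[n:], c)
copy(a[n:], d)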

@gopherbot
Contributor

Change https://go.dev/cl/349994 mentions this issue: bytes: rely on runtime.growslice for growing

gopherbot pushed a commit that referenced this issue Mar 10, 2022
Rather than naively making a slice of capacity 2*c+n,
rely on the append(..., make(...)) pattern to allocate a
slice that aligns up to the closest size class.

Performance:
	name                          old time/op    new time/op    delta
	BufferWriteBlock/N4096       3.03µs ± 6%    2.04µs ± 6%  -32.60%  (p=0.000 n=10+10)
	BufferWriteBlock/N65536      47.8µs ± 6%    28.1µs ± 2%  -41.32%  (p=0.000 n=9+8)
	BufferWriteBlock/N1048576     844µs ± 7%     510µs ± 5%  -39.59%  (p=0.000 n=8+9)

	name                          old alloc/op   new alloc/op   delta
	BufferWriteBlock/N4096       12.3kB ± 0%     7.2kB ± 0%  -41.67%  (p=0.000 n=10+10)
	BufferWriteBlock/N65536       258kB ± 0%     130kB ± 0%  -49.60%  (p=0.000 n=10+10)
	BufferWriteBlock/N1048576    4.19MB ± 0%    2.10MB ± 0%  -49.98%  (p=0.000 n=10+8)

	name                          old allocs/op  new allocs/op  delta
	BufferWriteBlock/N4096         3.00 ± 0%      3.00 ± 0%     ~     (all equal)
	BufferWriteBlock/N65536        7.00 ± 0%      7.00 ± 0%     ~     (all equal)
	BufferWriteBlock/N1048576      11.0 ± 0%      11.0 ± 0%     ~     (all equal)

The performance is faster since the growth rate is capped at 2x,
while previously it could grow by amounts potentially much greater than 2x,
leading to significant amounts of memory waste and extra copying.

Credit goes to Martin Möhrmann for suggesting the
append(b, make([]T, n)...) pattern.

Fixes #42984
Updates #51462

Change-Id: I7b23f75dddbf53f8b8b93485bb1a1fff9649b96b
Reviewed-on: https://go-review.googlesource.com/c/go/+/349994
Trust: Joseph Tsai <joetsai@digital-static.net>
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
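As an illustration (a rough sketch, not the actual bytes.Buffer code from the CL), the append(..., make(...)) idiom the commit message describes hands the capacity choice to runtime.growslice, which rounds up to the nearest size class:

// grow returns b extended by n bytes, growing the backing array if needed.
func grow(b []byte, n int) []byte {
	if n <= cap(b)-len(b) {
		return b[:len(b)+n] // enough spare capacity already
	}
	// Append a zero-valued run and let the runtime pick the new capacity.
	return append(b, make([]byte, n)...)
}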
@quasilyte
Contributor

Leaving it here for completeness: https://go-review.googlesource.com/c/go/+/370578

@gopherbot added the compiler/runtime label Jul 13, 2022
@mknyszek moved this to Triage Backlog in Go Compiler / Runtime Jul 15, 2022
@gopherbot
Contributor

Change https://go.dev/cl/469556 mentions this issue: encoding/json: unify encodeState.string and encodeState.stringBytes

gopherbot pushed a commit that referenced this issue Feb 24, 2023
This is part of the effort to reduce direct reliance on bytes.Buffer
so that we can use a buffer with better pooling characteristics.

Unify these two methods as a single version that uses generics
to reduce duplicated logic. Unfortunately, we lack a generic
version of utf8.DecodeRune (see #56948), so we cast []byte to string.
The []byte variant is slightly slower for multi-byte Unicode since
casting results in a stack-allocated copy operation.
Fortunately, this code path is used only for TextMarshalers.
We can also delete TestStringBytes, which exists to ensure
that the two duplicate implementations remain in sync.

Performance:

    name              old time/op    new time/op    delta
    CodeEncoder          399µs ± 2%     409µs ± 2%   +2.59%  (p=0.000 n=9+9)
    CodeEncoderError     450µs ± 1%     451µs ± 2%     ~     (p=0.684 n=10+10)
    CodeMarshal          553µs ± 2%     562µs ± 3%     ~     (p=0.075 n=10+10)
    CodeMarshalError     733µs ± 3%     737µs ± 2%     ~     (p=0.400 n=9+10)
    EncodeMarshaler     24.9ns ±12%    24.1ns ±13%     ~     (p=0.190 n=10+10)
    EncoderEncode       12.3ns ± 3%    14.7ns ±20%     ~     (p=0.315 n=8+10)

    name              old speed      new speed      delta
    CodeEncoder       4.87GB/s ± 2%  4.74GB/s ± 2%   -2.53%  (p=0.000 n=9+9)
    CodeEncoderError  4.31GB/s ± 1%  4.30GB/s ± 2%     ~     (p=0.684 n=10+10)
    CodeMarshal       3.51GB/s ± 2%  3.46GB/s ± 3%     ~     (p=0.075 n=10+10)
    CodeMarshalError  2.65GB/s ± 3%  2.63GB/s ± 2%     ~     (p=0.400 n=9+10)

    name              old alloc/op   new alloc/op   delta
    CodeEncoder          327B ±347%     447B ±232%  +36.93%  (p=0.034 n=9+10)
    CodeEncoderError      142B ± 1%      143B ± 0%     ~     (p=1.000 n=8+7)
    CodeMarshal         1.96MB ± 2%    1.96MB ± 2%     ~     (p=0.468 n=10+10)
    CodeMarshalError    2.04MB ± 3%    2.03MB ± 1%     ~     (p=0.971 n=10+10)
    EncodeMarshaler      4.00B ± 0%     4.00B ± 0%     ~     (all equal)
    EncoderEncode        0.00B          0.00B          ~     (all equal)

    name              old allocs/op  new allocs/op  delta
    CodeEncoder           0.00           0.00          ~     (all equal)
    CodeEncoderError      4.00 ± 0%      4.00 ± 0%     ~     (all equal)
    CodeMarshal           1.00 ± 0%      1.00 ± 0%     ~     (all equal)
    CodeMarshalError      6.00 ± 0%      6.00 ± 0%     ~     (all equal)
    EncodeMarshaler       1.00 ± 0%      1.00 ± 0%     ~     (all equal)
    EncoderEncode         0.00           0.00          ~     (all equal)

There is a very slight performance degradation for CodeEncoder
due to an increase in allocation sizes. However, the number of allocations
did not change. This is likely due to remote effects of the growth rate
differences between bytes.Buffer and the builtin append function.
We shouldn't rely too heavily on the growth rate of bytes.Buffer anyway,
since that is subject to change in #51462.
As the benchtime increases, the alloc/op goes down indicating
that the amortized memory cost is fixed.

Updates #27735

Change-Id: Ie35e480e292fe082d7986e0a4d81212c1d4202b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/469556
Run-TryBot: Joseph Tsai <joetsai@digital-static.net>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Auto-Submit: Joseph Tsai <joetsai@digital-static.net>
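As an aside, a rough sketch of the kind of unification this commit describes (hypothetical names, not the actual encoding/json code): one generic function accepts both []byte and string, and the multi-byte path converts to string so that utf8.DecodeRuneInString can serve both instantiations:

package main

import (
	"fmt"
	"unicode/utf8"
)

// countRunes is an illustrative stand-in for logic shared by the
// string and []byte encoders.
func countRunes[Bytes []byte | string](src Bytes) int {
	n := 0
	for i := 0; i < len(src); {
		if src[i] < utf8.RuneSelf {
			i++
		} else {
			// There is no generic utf8.DecodeRune (see #56948), so convert
			// to string; for []byte inputs this conversion may copy.
			_, size := utf8.DecodeRuneInString(string(src[i:]))
			i += size
		}
		n++
	}
	return n
}

func main() {
	fmt.Println(countRunes("héllo"), countRunes([]byte("héllo")))
}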