-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile,runtime: introduce growbyteslice? #49480
Comments
Seems reasonable. I wonder if there are other byte-specific cases for other builtins (maybe
|
Encoders (at least those who encode to buffer) will greatly benefit from this optimization, because they are in general something like that: func (e *Encoder) EncodeT(v T) {
e.buf = append(e.buf, start)
// ...
e.buf = append(e.buf, value...)
e.buf = append(e.buf, end)
} Upd: example |
Not sure how much performance/size win this still is, we can generally partition growslice into 3 use cases with reduced code path sizes and case switches internally:
|
Change https://golang.org/cl/363095 mentions this issue: |
Only tangentially related to the specific issue but, since |
I dont think Line 1252 in 6f327f7
|
Initially, I only thought about
This part of the change should be relatively safe and there should be no performance regressions. I did one extra step that Keith suggested and added a I tried to avoid other ideas that can be related or useful, but they require a separate investigation each, and the results can be less obvious (more tradeoff-like). I wonder if these ideas should be dug upon and reported as a separate issue (anyone is welcome to do that, I don't think that I'll go too deep in this one). In my tests, |
It seems like I rediscovered this idea independently and did a quick experiment: https://go-review.googlesource.com/c/go/+/493836
Note: that implementation doesn't omit the element type, so there might be some additional win by omitting it. Adding similar optimization for all the Line 211 in 8d5065c
[]byte , []int , []*T and "power of two" would be the important ones.
|
Unfortunately
Either way, seems promising, but definitely needs some analysis why things get slower. |
@quasilyte btw, what did you use to create the initial top 20? |
@egonelbre IIRC, it was a bincmp tool executed on a |
@quasilyte I didn't mean the difference between the binaries, but more of the output from:
Or does bincmp do that as well? |
I used a patched Go compiler to collect these metrics; there is no tool that would give you this output. If you run the same test now, you won't get the same numbers due to the Go source code changes. Nonetheless, I would guess that it wouldn't affect the distribution by a whole lot, a |
@egonelbre this is a really common tactic for stuff like this. Even easier than maintaining counters is to print every instance to stdout and use a tool like https://github.com/josharian/pct to accumulate them. (See the README there for another compiler contributor re-creating a similar tool. This is clearly a local optimum. Maybe even a global one. :P) |
This is actually the exact thing I was doing, but I was too embarrassed to admit that. :D |
Anyways some rough stats from different things:
Roughly speaking, it seems that handling |
Change https://go.dev/cl/495878 mentions this issue: |
Change https://go.dev/cl/496141 mentions this issue: |
Change https://go.dev/cl/495876 mentions this issue: |
Change https://go.dev/cl/495877 mentions this issue: |
Change https://go.dev/cl/497318 mentions this issue: |
Ok, created the necessary changes for them... the benchmark results: https://gist.github.com/egonelbre/34efef3f5c4aef2035396a2f0a90cc96 The benefit from []byte, []string seems significant. I still need to dig into the benchmarks, because from my initial testing it seems the append related tests are sensitive to code layout on my processor. |
This allows to reuse the slice cap computation across specialized growslice funcs. Updates #49480 Change-Id: Ie075d9c3075659ea14c11d51a9cd4ed46aa0e961 Reviewed-on: https://go-review.googlesource.com/c/go/+/495876 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Egon Elbre <egonelbre@gmail.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Auto-Submit: Ian Lance Taylor <iant@golang.org>
Move growslice generation to a separate func so that specialization logic can be shared. Updates #49480 Change-Id: I9ea5bb898753622d2d767546a46b4db6410dc725 Reviewed-on: https://go-review.googlesource.com/c/go/+/495877 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com>
When
append()
is compiled, this call is emited:While building
go
tool, I found out that ~24% calls togrowslice
have first argument of[]byte
.16.5% is for
[]string
. Every other combination is less frequent and is not as interesting.Here is a top20 results for
go
tool (total=10416
):Most other cases are in
1-5
range.If we add a specialized wrapper
growbyteslice
like this one:...then we can avoid passing the first argument for
[]byte
.Let's see how it affects an isolated case:
Asm listings are below, for completeness.
Why this can be a worthwhile thing even if we add an extra call in slice growing path?
I believe that slice growing is not a hot path. It we're about to make allocations and
that would dominate the execution time. If we're following the happy path when
newlen < cap
,then smaller code could help to be more code cache-friendly.
Now to some more real-world cases:
Different apps yield varying results, but the
.text
segmentsize of the binary typically goes down by 0.22%-0.61%.
The main question: is adding a new runtime function
growbyteslice
too much for this case?From one point of view, more functions -> bigger runtime code. At the same time, this is a simple
wrapper that decreases the code size more often than not without apparent performance issues.
All tests from
all.bash
seem to be passing for my prototype. I could send the CL isthis idea looks reasonable.
Before
growbyteslice
:After:
Update: makebyteslice
As pointed in the comments, it could be also worthwhile to see whether a specialized makeslice can be useful.
Top
makeslice
usages found during./make.bash
(total=4487):All 1-byte
makeslice
usages found during./make.bash
(note thatuint8
should be added tobyte
here):So, ~26% of
makeslice
calls have a byte type argument, most of the 1-byte sizes are byte/uint8 directly (named byte-sized types are a rarity).More analysis
A detailed bincmp result for a simple web server is presented below (under a spoiler).
One may notice that for some symbols, the size has actually increased.
The reason for that, I assume, is that sometimes relevant arguments happen to be in the matching registers already. That could play in both directions though. It should be treated as a random noise.
Detailed bincmp output (A LOT OF TEXT)
The text was updated successfully, but these errors were encountered: