bytes: bytes.Clone(), using copy is faster than append. #55905
Comments
CC @icholy, @martisch, @ianlancetaylor.
The title seems to suggest that |
Generally, append has to find the smallest allocation size class that fits the slice. It may then allocate a slice with more capacity than the original and may need to zero the additional space. So it will often do more work than make+copy with the same length, because it provides bonus capacity that would otherwise be unused memory. In addition, make+copy is optimized by the compiler: https://go-review.git.corp.google.com/c/go/+/146719 While we can make Same issue just for slices.Clone: #53643
"make+copy" will also find the smallest allocation size class. So the actual reason why "append" is slower is that it zeroes the additional space. Also, the "make+copy" optimization has many restrictions. Calling
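To make the capacity difference discussed above concrete, here is a minimal sketch that clones a slice both ways and prints the resulting capacities. The helper name `capacities` is hypothetical, and the exact capacities that append returns depend on the Go version and allocator, so no output is shown.

```go
package main

import "fmt"

// capacities (hypothetical helper) clones an n-byte slice with append and
// with make+copy, and returns the capacity of each clone.
func capacities(n int) (appendCap, makeCopyCap int) {
	src := make([]byte, n)

	// append may round the allocation up, leaving spare capacity.
	viaAppend := append([]byte{}, src...)

	// make+copy requests exactly len(src) elements, so cap == len.
	viaMakeCopy := make([]byte, len(src))
	copy(viaMakeCopy, src)

	return cap(viaAppend), cap(viaMakeCopy)
}

func main() {
	for _, n := range []int{1, 5, 33, 85, 96, 1000} {
		a, m := capacities(n)
		fmt.Printf("len=%4d  cap(append)=%4d  cap(make+copy)=%4d\n", n, a, m)
	}
}
```

Any gap between the two capacities is the bonus space that append obtained and that make+copy never exposes to the caller.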
For both "make+copy" and "append", the allocator finds the smallest allocation size class. In addition, "append" does this explicitly before calling the allocator, because the allocator has no mode for allocating as many elements of the slice's type as will fit and reporting back how much was allocated. "make+copy" just uses the length as is to compute the size passed to mallocgc: Line 51 in 223a563
"append" first calculates the size class explicitly and the length that fits in it before calling mallocgc: Line 213 in 223a563
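The explicit rounding step can be pictured with a simplified sketch. It is not the runtime's code: `roundUpSize` and `sizeClasses` are hypothetical names, and the table is a short excerpt assumed to match the low end of the Go runtime's size classes, which may differ between versions.

```go
package main

import "fmt"

// sizeClasses is a short, illustrative excerpt of allocator size classes in
// bytes. Assumption: these mirror the smallest classes of the Go runtime.
var sizeClasses = []int{8, 16, 24, 32, 48, 64, 80, 96, 112, 128}

// roundUpSize (hypothetical, simplified) returns the smallest size class that
// fits n bytes. Sizes beyond the excerpt are returned unchanged.
func roundUpSize(n int) int {
	for _, c := range sizeClasses {
		if n <= c {
			return c
		}
	}
	return n
}

func main() {
	for _, n := range []int{5, 33, 85, 96} {
		c := roundUpSize(n)
		// For a []byte, the rounded size is also the capacity append can offer.
		fmt.Printf("requested %3d bytes -> class %3d, spare %2d\n", n, c, c-n)
	}
}
```

With "make+copy" the same rounding still happens inside the allocator, but the caller only ever sees the requested length.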
Using roundupsize, "append" explicitly searches for the size class before calling the allocator, which "make+copy" does not. In general, growslice, which is used for append, is less specialized than "make+copy" (in the common cases where it is optimized): it computes more parameters and has to handle different cases (no additional capacity or additional capacity, appended items fitting in the existing slice) that "make+copy" does not. This is both because the compiler specializes "make+copy" more for different cases and because it generally needs to do less. If this is important enough and does not degrade performance in the general case, the following can be evaluated:
At a high level, I think we first need consensus on whether the issue is to be:
I would think we can deduplicate this into issue #53643 and add a comment there that it would also benefit bytes.Clone.
My idea is that since the internal implementation of Clone() uses append, it should be optimized as much as possible, because this function will be called a lot.
Benchmark_clone
Benchmark_clone-6 35159740 74.59 ns/op 96 B/op 1 allocs/op
Benchmark_cloneV2
Benchmark_cloneV2-6 43229692 59.33 ns/op 96 B/op 1 allocs/op
Benchmark_cloneV3
Benchmark_cloneV3-6 39171712 71.48 ns/op 96 B/op 1 allocs/op