
cmd/compile: optimize append with string concatenation #56442

Open
dsnet opened this issue Oct 26, 2022 · 2 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance

dsnet (Member) commented Oct 26, 2022

Consider the following benchmark:

const prefix = "some sufficiently long prefix "
var something = "something"
const suffix = " some sufficiently long suffix"

var sink []byte

func BenchmarkNaive(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = append(sink[:0], prefix+something+suffix...)
	}
}

This performs several string concatenations, some involving variables and others constants,
and runs in:

BenchmarkNaive-24        	23387558	        45.20 ns/op	      80 B/op	       1 allocs/op

However, the benchmark is semantically equivalent to:

func BenchmarkOptimized(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = append(append(append(sink[:0], prefix...), something...), suffix...)
	}
}

which runs in:

BenchmarkOptimized-24    	191343406	         6.218 ns/op	       0 B/op	       0 allocs/op

That is a 100% reduction in allocations, and a dramatic speedup.

The compiler should implicitly perform this transformation for me.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Oct 26, 2022
@dsnet dsnet added Performance and removed compiler/runtime Issues related to the Go compiler and/or runtime. labels Oct 26, 2022
@randall77 randall77 added this to the Unplanned milestone Oct 27, 2022

go101 commented Oct 27, 2022

@dsnet
If the capacity of sink is large enough, then the optimized way really is more performant.
Otherwise, it might not be.

You use a global sink, so the benchmark is not very fair.

[edit]: the two ways are only semantically equivalent if the capacity of sink is large enough.

@heschi heschi added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 27, 2022
dsnet (Member, Author) commented Oct 27, 2022

The purpose of a global sink is to work around compiler optimizations that would get rid of the entire append. It still demonstrates my point: appending to a slice with sufficient capacity is dramatically faster as a series of individual appends than as a single append of a concatenated string.

Also, whether multiple small appends are faster for empty slices is an implementation detail. The transformation can compute the total size of all the appended strings and call runtime.growslice once before the series of appends. The first-time allocation is then identical to BenchmarkNaive, but the allocation from constructing the concatenated string is avoided.
