Description
CL123256 started a discussion on which strings concatenation specialization we should choose (if any).
String concatenation can be made faster if we specialize concatenation with escaping result and len(args)<=5 (N) arguments. Escaping result means that we can avoid passing buf
that is always nil
as well as use rawstring
instead of rawstringtmp
. Known N means that we can unroll loops.
There are several ways (and these are not all of them):
- Specialize for N=2
x+y
. Less code, seems like most frequent, the boost is maximal, but does not improve any other concatenation (likex+y+z
which usesconcatstring3
). - Specialize for all N, but for marginal performance boost that may not even worth it.
- Specialize for N={2,3,4,5}. More code.
- Specialize for N={2,3} covers most concatenations. Speeds up concat2 and concat3 measurably.
I've started from (1) since it's the easiest change that requires less amount of changes and gives significant performance gain. But in order to have decision extra feedback is required.
Here is comparison of (1) against unoptimized concat:
name old time/op new time/op delta
Concat2-8 74.2ns ± 0% 53.5ns ± 1% -27.95% (p=0.000 n=9+15)
Concat3-8 94.9ns ± 0% 94.8ns ± 0% ~ (p=0.082 n=14+15)
Concat2Empty-8 21.4ns ± 1% 14.2ns ± 0% -33.75% (p=0.000 n=15+14)
Concat3Empty-8 23.9ns ± 1% 23.9ns ± 1% ~ (p=0.756 n=15+15)
[Geo mean] 43.6ns 36.2ns -16.88%
Comparison for (4) implementation against go tip:
name old time/op new time/op delta
Concat2-8 74.2ns ± 0% 66.1ns ± 1% -10.95% (p=0.000 n=9+15)
Concat3-8 94.9ns ± 0% 71.9ns ± 1% -24.22% (p=0.000 n=14+15)
Concat2Empty-8 21.4ns ± 1% 21.1ns ± 0% -1.56% (p=0.000 n=15+15)
Concat3Empty-8 23.9ns ± 1% 16.6ns ± 1% -30.63% (p=0.000 n=15+12)
[Geo mean] 43.6ns 35.9ns -17.61%
Note that these numbers represent overhead savings, not general case strings concatenation performance boost, since length of strings determine these timings significantly.
Benchmarks:
package foo
import "testing"
//go:noinline
func concat2(x, y string) string {
return x + y
}
//go:noinline
func concat3(x, y, z string) string {
return x + y + z
}
var x = "foor"
var y = "2ews"
var z = ""
func BenchmarkConcat2(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = concat2(x, "abc")
}
}
func BenchmarkConcat3(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = concat3(x, "abc", y)
}
}
func BenchmarkConcat2Empty(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = concat2(x, "")
}
}
func BenchmarkConcat3Empty(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = concat3("", x, z)
}
}
I would like to know:
- That this kind of optimization is welcome.
- If it is desired, we need to choose which approach to take.
CC @martisch