Skip to content

runtime,cmd/compile: string concatenation specializations for performance #27801

Open
@quasilyte

Description

@quasilyte

CL123256 started a discussion on which strings concatenation specialization we should choose (if any).

String concatenation can be made faster if we specialize concatenation with escaping result and len(args)<=5 (N) arguments. Escaping result means that we can avoid passing buf that is always nil as well as use rawstring instead of rawstringtmp. Known N means that we can unroll loops.

There are several ways (and these are not all of them):

  1. Specialize for N=2 x+y. Less code, seems like most frequent, the boost is maximal, but does not improve any other concatenation (like x+y+z which uses concatstring3).
  2. Specialize for all N, but for marginal performance boost that may not even worth it.
  3. Specialize for N={2,3,4,5}. More code.
  4. Specialize for N={2,3} covers most concatenations. Speeds up concat2 and concat3 measurably.

I've started from (1) since it's the easiest change that requires less amount of changes and gives significant performance gain. But in order to have decision extra feedback is required.

Here is comparison of (1) against unoptimized concat:

name            old time/op    new time/op    delta
Concat2-8         74.2ns ± 0%    53.5ns ± 1%  -27.95%  (p=0.000 n=9+15)
Concat3-8         94.9ns ± 0%    94.8ns ± 0%     ~     (p=0.082 n=14+15)
Concat2Empty-8    21.4ns ± 1%    14.2ns ± 0%  -33.75%  (p=0.000 n=15+14)
Concat3Empty-8    23.9ns ± 1%    23.9ns ± 1%     ~     (p=0.756 n=15+15)
[Geo mean]        43.6ns         36.2ns       -16.88%

Comparison for (4) implementation against go tip:

name            old time/op    new time/op    delta
Concat2-8         74.2ns ± 0%    66.1ns ± 1%  -10.95%  (p=0.000 n=9+15)
Concat3-8         94.9ns ± 0%    71.9ns ± 1%  -24.22%  (p=0.000 n=14+15)
Concat2Empty-8    21.4ns ± 1%    21.1ns ± 0%   -1.56%  (p=0.000 n=15+15)
Concat3Empty-8    23.9ns ± 1%    16.6ns ± 1%  -30.63%  (p=0.000 n=15+12)
[Geo mean]        43.6ns         35.9ns       -17.61%

Note that these numbers represent overhead savings, not general case strings concatenation performance boost, since length of strings determine these timings significantly.

Benchmarks:

package foo

import "testing"

//go:noinline
func concat2(x, y string) string {
	return x + y
}

//go:noinline
func concat3(x, y, z string) string {
	return x + y + z
}

var x = "foor"
var y = "2ews"
var z = ""

func BenchmarkConcat2(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = concat2(x, "abc")
	}
}

func BenchmarkConcat3(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = concat3(x, "abc", y)
	}
}

func BenchmarkConcat2Empty(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = concat2(x, "")
	}
}

func BenchmarkConcat3Empty(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = concat3("", x, z)
	}
}

I would like to know:

  1. That this kind of optimization is welcome.
  2. If it is desired, we need to choose which approach to take.

CC @martisch

Metadata

Metadata

Assignees

No one assigned

    Labels

    NeedsInvestigationSomeone must examine and confirm this is a valid issue and not a duplicate of an existing one.Performancecompiler/runtimeIssues related to the Go compiler and/or runtime.

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions