New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime,cmd/compile: string concatenation specializations for performance #27801

Open
Quasilyte opened this Issue Sep 21, 2018 · 4 comments

Comments

Projects
None yet
5 participants
@Quasilyte
Contributor

Quasilyte commented Sep 21, 2018

CL123256 started a discussion on which strings concatenation specialization we should choose (if any).

String concatenation can be made faster if we specialize concatenation with escaping result and len(args)<=5 (N) arguments. Escaping result means that we can avoid passing buf that is always nil as well as use rawstring instead of rawstringtmp. Known N means that we can unroll loops.

There are several ways (and these are not all of them):

  1. Specialize for N=2 x+y. Less code, seems like most frequent, the boost is maximal, but does not improve any other concatenation (like x+y+z which uses concatstring3).
  2. Specialize for all N, but for marginal performance boost that may not even worth it.
  3. Specialize for N={2,3,4,5}. More code.
  4. Specialize for N={2,3} covers most concatenations. Speeds up concat2 and concat3 measurably.

I've started from (1) since it's the easiest change that requires less amount of changes and gives significant performance gain. But in order to have decision extra feedback is required.

Here is comparison of (1) against unoptimized concat:

name            old time/op    new time/op    delta
Concat2-8         74.2ns ± 0%    53.5ns ± 1%  -27.95%  (p=0.000 n=9+15)
Concat3-8         94.9ns ± 0%    94.8ns ± 0%     ~     (p=0.082 n=14+15)
Concat2Empty-8    21.4ns ± 1%    14.2ns ± 0%  -33.75%  (p=0.000 n=15+14)
Concat3Empty-8    23.9ns ± 1%    23.9ns ± 1%     ~     (p=0.756 n=15+15)
[Geo mean]        43.6ns         36.2ns       -16.88%

Comparison for (4) implementation against go tip:

name            old time/op    new time/op    delta
Concat2-8         74.2ns ± 0%    66.1ns ± 1%  -10.95%  (p=0.000 n=9+15)
Concat3-8         94.9ns ± 0%    71.9ns ± 1%  -24.22%  (p=0.000 n=14+15)
Concat2Empty-8    21.4ns ± 1%    21.1ns ± 0%   -1.56%  (p=0.000 n=15+15)
Concat3Empty-8    23.9ns ± 1%    16.6ns ± 1%  -30.63%  (p=0.000 n=15+12)
[Geo mean]        43.6ns         35.9ns       -17.61%

Note that these numbers represent overhead savings, not general case strings concatenation performance boost, since length of strings determine these timings significantly.

Benchmarks:

package foo

import "testing"

//go:noinline
func concat2(x, y string) string {
	return x + y
}

//go:noinline
func concat3(x, y, z string) string {
	return x + y + z
}

var x = "foor"
var y = "2ews"
var z = ""

func BenchmarkConcat2(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = concat2(x, "abc")
	}
}

func BenchmarkConcat3(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = concat3(x, "abc", y)
	}
}

func BenchmarkConcat2Empty(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = concat2(x, "")
	}
}

func BenchmarkConcat3Empty(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = concat3("", x, z)
	}
}

I would like to know:

  1. That this kind of optimization is welcome.
  2. If it is desired, we need to choose which approach to take.

CC @martisch

@Quasilyte

This comment has been minimized.

Contributor

Quasilyte commented Sep 21, 2018

@TocarIP

This comment has been minimized.

Contributor

TocarIP commented Sep 24, 2018

Just adding data from CL:

For go executable, there are:

	501 concatstring2
	194 concatstring3
	55  concatstring4
@odeke-em

This comment has been minimized.

Member

odeke-em commented Nov 9, 2018

Kindly paging @randall77 @ianlancetaylor in case of some more ideas too

@randall77

This comment has been minimized.

Contributor

randall77 commented Nov 26, 2018

I'm not particularly worried about the extra code. If this means duplicating concatstrings and concatstring[2345] for the escape vs. nonescape case, that's not a big deal.

Are you also planning on unrolling the loop in concatstrings? Also here I'm not worried about binary code size, but I worry about about maintainability of lots of copy-pasted inner loop bodies.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment