Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strings: 10-30% speed regression in Contains from 1.13 to tip #35686

Open
zikaeroh opened this issue Nov 19, 2019 · 11 comments

Comments

@zikaeroh
Copy link

@zikaeroh zikaeroh commented Nov 19, 2019

What version of Go are you using (go version)?

$ go version
go version go1.13.4 linux/amd64
go version devel +8cf5293c Tue Nov 19 06:10:03 2019 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/jake/.cache/go-build"
GOENV="/home/jake/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/jake/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/jake/testproj/strtest/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build998582668=/tmp/go-build -gno-record-gcc-switches"

What did you do?

package strtest

import (
	"strconv"
	"strings"
	"testing"
)

var sink1 bool

func BenchmarkContains(b *testing.B) {
	tests := []string{
		"This is just a cool test.",
		"This is just a cool test. (Or is it?)",
		"Hello, (_USER_)!",
		"RIP count reset to (_VARS_RIP-(_GAME_CLEAN_)_SET_0_), neat.",
		"(_USER_) rubs on (_PARAMETER_) 's booty! PogChamp / (_(_|",
	}

	for i, test := range tests {
		b.Run(strconv.Itoa(i), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				sink1 = strings.Contains(test, "(_")
			}
		})
	}
}

Benchmark this against 1.13 and tip.

What did you expect to see?

No difference in performance (or hopefully better).

What did you see instead?

Contains is consistently slower. benchstat-ing the above with go test -run=- -bench . -cpu=1,4 -count=10 .:

name          old time/op  new time/op   delta
Contains/0    10.1ns ± 2%   11.4ns ± 1%  +12.94%  (p=0.000 n=10+10)
Contains/0-4  10.1ns ± 0%   11.4ns ± 2%  +12.57%  (p=0.000 n=7+10)
Contains/1    12.1ns ± 1%   13.0ns ± 1%   +7.56%  (p=0.000 n=9+9)
Contains/1-4  12.1ns ± 2%   12.8ns ± 2%   +5.48%  (p=0.000 n=8+10)
Contains/2    7.70ns ± 3%   9.82ns ± 6%  +27.52%  (p=0.000 n=10+10)
Contains/2-4  7.82ns ± 3%   9.64ns ± 2%  +23.25%  (p=0.000 n=9+10)
Contains/3    9.31ns ± 1%  11.01ns ± 1%  +18.22%  (p=0.000 n=9+10)
Contains/3-4  9.45ns ± 3%  11.11ns ± 2%  +17.56%  (p=0.000 n=9+9)
Contains/4    7.61ns ± 1%   9.87ns ± 1%  +29.60%  (p=0.000 n=8+9)
Contains/4-4  7.81ns ± 3%   9.73ns ± 3%  +24.69%  (p=0.000 n=10+9)

This has come out of some performance testing between 1.13 and tip in a project I'm working on which shows a consistent 6% regression for a hand-written parser (https://github.com/hortbot/hortbot/blob/master/internal/cbp/cbp_test.go), and this was the first thing I noticed while comparing the profiles. The first thing the parser does is check for one of two tokens before doing any work, with the assumption that if neither are present, there's no work to be done.

In the grand scheme of things, maybe a few extra nanoseconds isn't such a big deal, but I haven't tested other functions in strings quite yet.

@mvdan

This comment has been minimized.

Copy link
Member

@mvdan mvdan commented Nov 19, 2019

Perhaps bisect what commit introduced this performance regression? It might have been a change in strings or bytes, or something in cmd/compile.

@ianlancetaylor ianlancetaylor added this to the Go1.14 milestone Nov 19, 2019
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 19, 2019

@zikaeroh

This comment has been minimized.

Copy link
Author

@zikaeroh zikaeroh commented Nov 19, 2019

I tried to bisect this and got CL 206938, but I don't feel confident that my bisect was high quality. I'm going to try and retest later on a more consistent machine to make sure I'm not measuring this wrong.

Newer test was:

package strtest

import (
	"strconv"
	"strings"
	"testing"
)

var (
	sink1 bool
	sink2 bool
)

func BenchmarkContains(b *testing.B) {
	tests := []string{
		"This is just a cool test.",
		"This is just a cool test. (Or is it?)",
		"RIP count reset to (_VARS_RIP-(_GAME_CLEAN_)_SET_0_), neat.",
	}

	for i, test := range tests {
		b.Run(strconv.Itoa(i), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				sink1 = strings.Contains(test, "(_")
				sink2 = strings.Contains(test, "_)")
			}
		})
	}
}
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 19, 2019

I'm reasonably confident that CL 206938 is not the cause of a slowdown in strings.Contains.

@opennota

This comment has been minimized.

Copy link

@opennota opennota commented Nov 20, 2019

I don't see any regression. If anything, time/op on tip is better (compared to 1.13.4):

name          old time/op  new time/op  delta
Contains/0-4  18.4ns ± 1%  18.0ns ± 0%  -2.12%  (p=0.000 n=10+7)
Contains/1-4  20.7ns ± 1%  20.5ns ± 0%  -0.87%  (p=0.000 n=10+10)
Contains/2-4  20.6ns ± 0%  20.5ns ± 0%  -0.27%  (p=0.029 n=9+9)
@zikaeroh

This comment has been minimized.

Copy link
Author

@zikaeroh zikaeroh commented Nov 20, 2019

I too can't reproduce this on another machine, but it's very consistent on my main dev machine. I'm working on bisecting this and trying to eliminate any outside variables that could be affecting things, and I'll reply with results (and close this if it turns out to be nothing after all).

@zikaeroh

This comment has been minimized.

Copy link
Author

@zikaeroh zikaeroh commented Nov 20, 2019

Another benchmark, made longer in runtime to try and make it more consistent:

package strtest

import (
	"strings"
	"testing"
)

var sink bool

func BenchmarkContains(b *testing.B) {
	vs := []string{"(_", "_)", " ", ".", "?"}
	const str = "This is just a cool test. (Or is it?)"
	for i := 0; i < b.N; i++ {
		for _, v := range vs {
			sink = strings.Contains(str, v)
		}
	}
}

Between Go 1.13.4 and 15bff20 (Oct 31st):

name        old time/op  new time/op  delta
Contains-4  45.4ns ± 2%  42.1ns ± 2%  -7.16%  (p=0.000 n=10+10)

But between Go 1.13.4 and the current tip (ce7829f):

name        old time/op  new time/op  delta
Contains-4  45.4ns ± 2%  46.7ns ± 3%  +2.84%  (p=0.000 n=10+10)
@zikaeroh

This comment has been minimized.

Copy link
Author

@zikaeroh zikaeroh commented Nov 20, 2019

@ianlancetaylor Unbelievably, I bisected again with the above code, and again got the same CL:

2d8c1995b9c4123ab7ac35a1ba1497b290755d4d is the first bad commit
commit 2d8c1995b9c4123ab7ac35a1ba1497b290755d4d
Author: Ian Lance Taylor <iant@golang.org>
Date:   Fri Nov 15 10:05:13 2019 -0800

    runtime: release timersLock while running timer

I too doubt that this is related directly to strings.Contains, but maybe there's something else going on (maybe in the testing library itself? some other goroutine?). It being a timers CL is what made me think my bisect was bad, but getting it again is suspicious.

What is also interesting is that within the same test run, I get similar results, but immediately running it again will give me another set of different but still consistent (according to benchstat) times.

At 2d8c199:

name        old time/op  new time/op  delta
Contains-4  45.4ns ± 2%  49.4ns ± 2%  +8.90%  (p=0.000 n=10+10)
name        old time/op  new time/op  delta
Contains-4  45.4ns ± 2%  47.0ns ± 1%  +3.45%  (p=0.000 n=10+9)
name        old time/op  new time/op  delta
Contains-4  45.4ns ± 2%  47.5ns ± 2%  +4.58%  (p=0.000 n=10+10)

At 2d8c199~1:

name        old time/op  new time/op  delta
Contains-4  45.4ns ± 2%  45.6ns ± 2%   ~     (p=0.432 n=10+9)
ame        old time/op  new time/op  delta
Contains-4  45.4ns ± 2%  45.0ns ± 2%   ~     (p=0.088 n=10+9)
name        old time/op  new time/op  delta
Contains-4  45.4ns ± 2%  45.3ns ± 3%   ~     (p=0.838 n=10+10)

Bisect log:

git bisect start
# bad: [ce7829fdb2f9acc6a228e0fc8093afe6d2ca989c] doc: add section for major version suffixes to module reference
git bisect bad ce7829fdb2f9acc6a228e0fc8093afe6d2ca989c
# good: [3f21c2381d9b0f0977f388cc89104f557a7d2c88] os/signal: don't ignore SIGINT in TestAtomicStop child process
git bisect good 3f21c2381d9b0f0977f388cc89104f557a7d2c88
# good: [73d20f8186a091c8d7e81b621136770981cf8e44] runtime: always use Go signal stack in non-cgo program
git bisect good 73d20f8186a091c8d7e81b621136770981cf8e44
# bad: [1046a9cb1b348a91e551ed2d1714e0926a1b04b1] cmd/link,cmd/internal/obj/riscv: implement archreloc for riscv64
git bisect bad 1046a9cb1b348a91e551ed2d1714e0926a1b04b1
# good: [fff450256a541decd2bb2060c46b788d92692faa] cmd/go: convert TestUserOverrideFlags to a script test
git bisect good fff450256a541decd2bb2060c46b788d92692faa
# good: [9bba63bbbe43ee6370367220feb46717ec50ea9c] cmd/compile: make a better bogus line for empty infinite loops
git bisect good 9bba63bbbe43ee6370367220feb46717ec50ea9c
# good: [3efa09f3862b68f0dfa5012455f475836e040e91] time: deflake TestTicker
git bisect good 3efa09f3862b68f0dfa5012455f475836e040e91
# bad: [2d8c1995b9c4123ab7ac35a1ba1497b290755d4d] runtime: release timersLock while running timer
git bisect bad 2d8c1995b9c4123ab7ac35a1ba1497b290755d4d
# first bad commit: [2d8c1995b9c4123ab7ac35a1ba1497b290755d4d] runtime: release timersLock while running timer

Feel free to retitle this issue appropriately, if needed.

@zikaeroh

This comment has been minimized.

Copy link
Author

@zikaeroh zikaeroh commented Nov 20, 2019

Hm, my mistake, that's not the same CL, but another timers related CL... I'm not sure what this means, then.

@martisch

This comment has been minimized.

Copy link
Member

@martisch martisch commented Nov 20, 2019

What CPU does run on your dev machine and at what CL are you building at tip?

Any CL can potentially cause the benchmark code to align differently and then benchmark the effects of branch alignment. I have seen in the past that it can matter a lot where the benchmark loop is placed in the file which can cause different alignment.

With all the side channel attacks on caching/branch prediction there can also be benchmark differences due to different microcode versions of the same CPU: https://www.phoronix.com/scan.php?page=article&item=intel-jcc-microcode&num=1

@zikaeroh

This comment has been minimized.

Copy link
Author

@zikaeroh zikaeroh commented Nov 20, 2019

i7-6600U on Linux 5.3.11, all software mitigations disabled. But, I'm on Arch where the ucode version is dated 2019-11-15, so I can roll that back and see what happens.

I'll have another crack at benchmarking and trying things this afternoon when I have time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.