cmd/compile: optimize len([]rune(string)) to just count runes #24923

inancgumus · 2018-04-18T15:51:12Z

What version of Go are you using (`go version`)?

go1.10.1

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (`go env`)?

darwin/amd64

What did you do?

I've benchmarked len([]rune(string)) and utf8.RuneCountInString(string) and I saw that the latter performs better.

Here's the benchmark code.

Benchmark Results:

BenchmarkRunCountInString16ascii-8      100000000               11.9 ns/op             0 B/op          0 allocs/op
BenchmarkRunCountInString16multi-8      20000000                63.5 ns/op             0 B/op          0 allocs/op
BenchmarkRunCountInString32ascii-8      100000000               18.2 ns/op             0 B/op          0 allocs/op
BenchmarkRunCountInString32multi-8      10000000               120 ns/op               0 B/op          0 allocs/op
BenchmarkCastToRuneArray16ascii-8       50000000                26.2 ns/op             0 B/op          0 allocs/op
BenchmarkCastToRuneArray16multi-8       10000000               171 ns/op               0 B/op          0 allocs/op
BenchmarkCastToRuneArray32ascii-8       30000000                46.1 ns/op             0 B/op          0 allocs/op
BenchmarkCastToRuneArray32multi-8        5000000               322 ns/op

What did you expect to see?

Actually, I wasn't expecting len([]rune(string)) to be faster compared to utf8.RuneCountInString, then again I wanted to open this issue. I noticed that there are a lot people are using this pattern.

The text was updated successfully, but these errors were encountered:

martisch · 2018-04-18T16:06:57Z

If its used often and decided to be optimized a possible improvement would be to detect the pattern in compiler walk pass and replace it with a new runecount runtime builtin function that is:

func runeCount(s string) int {
    n := 0
    for range s {
        n++
    }
    return n
}

For even better speed on non-ascii input the decoderune function could be inlined into runeCount and tuned to not need to store the runes. Then utf8.RuneCountInString could be made to return len([]rune(s)) to use the same code. Also see: https://go-review.googlesource.com/c/go/+/33637

/cc @randall77

ianlancetaylor · 2018-04-18T16:24:26Z

In the absence of a compiler optimization, it seems clear that len([]rune(s)) would be slower, since it has to do strictly more work: create a new slice and fill in the values.

gopherbot · 2018-04-24T13:26:12Z

Change https://golang.org/cl/108985 mentions this issue: cmd/compile: optimize len([]rune(string)) to use a runtime function

perillo · 2018-05-31T15:25:54Z

I modified the original benchmark (https://play.golang.org/p/M2HHTtuHMI-) to use constant strings: https://play.golang.org/p/_sdu0fy97dm

Here is the benchmark result

goos: linux
goarch: amd64
BenchmarkRunCountInString16ascii-4   	100000000	        13.2 ns/op
BenchmarkRunCountInString16multi-4   	20000000	        82.5 ns/op
BenchmarkRunCountInString32ascii-4   	100000000	        22.7 ns/op
BenchmarkRunCountInString32multi-4   	10000000	       145 ns/op
BenchmarkCastToRuneArray16ascii-4    	1000000000	         2.37 ns/op
BenchmarkCastToRuneArray16multi-4    	1000000000	         2.37 ns/op
BenchmarkCastToRuneArray32ascii-4    	300000000	         4.15 ns/op
BenchmarkCastToRuneArray32multi-4    	300000000	         4.15 ns/op

I find the result surprising, since, after b9a59d9, the correct method to count runes in a string now results in worse performance compared to the incorrect method.

Using utf8.RuneCountInString on a constant string is probably rare, so I'm not sure if optimizing this case is worth the effort. But from b9a59d9 the code is already here.

martisch · 2018-05-31T17:15:03Z

I find the result surprising, since, after b9a59d9, the correct method to count runes in a string now results in worse performance compared to the incorrect method.

That len([]rune()) is faster on constant strings has been the case before b9a59d9 since at least go1.4 (see benchmarks) and has not been changed by the cl/108985 :
BenchmarkRunCountInString16ascii 20000000 71.2 ns/op
BenchmarkRunCountInString16multi 5000000 255 ns/op
BenchmarkRunCountInString32ascii 10000000 139 ns/op
BenchmarkRunCountInString32multi 3000000 495 ns/op
BenchmarkCastToRuneArray16ascii 500000000 3.70 ns/op
BenchmarkCastToRuneArray16multi 500000000 3.69 ns/op
BenchmarkCastToRuneArray32ascii 200000000 6.40 ns/op
BenchmarkCastToRuneArray32multi 200000000 6.97 ns/op

And is due to the compiler optimizing []rune(constantstring) at compile time in:

go/src/cmd/compile/internal/gc/typecheck.go

Line 1720 in 4258b43

case OSTRARRAYRUNE:

Update:
looked a bit further in the code and that optimization has been there since the conversion from string to int slice was introduced: 3910161 26 Feb 2010

Special casing utf8.RuneCountInString in the same manner by detecting the function name in the compiler would mean that copying the utf8 code to another package will result in performance loss which would also be surprising.

perillo · 2018-05-31T20:22:52Z

@martisch Thanks for the clarification. Special cases are no good, but also the fact that len([]rune(s)) and utf8.RuneCountInString performances with constant and non constant strings are so different is probably no good.

By the way, it seems there is a typo in the comment for the isRuneCount function.
In These are optimized into a call to runtime.runecount, runecount should be replaced by countrunes. Should I submit a new issue?

gopherbot · 2018-06-01T17:50:22Z

Change https://golang.org/cl/115836 mentions this issue: cmd/compile: fix comment to reference runtime.countrunes

Updates #24923 Change-Id: Ie5a1b54b023381b58df618080f3d742a50d46d8b Reviewed-on: https://go-review.googlesource.com/115836 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com> Run-TryBot: Martin Möhrmann <moehrmann@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>

dominikh mentioned this issue Apr 18, 2018

staticcheck: recommend RuneCountInString over len([]rune(s)) dominikh/go-tools#279

Closed

mvdan changed the title ~~len([]rune(string)) performs worst than utf8.RuneCountInString~~ cmd/compile: len([]rune(string)) performs worst than utf8.RuneCountInString Apr 18, 2018

mvdan added the Performance label Apr 18, 2018

ianlancetaylor changed the title ~~cmd/compile: len([]rune(string)) performs worst than utf8.RuneCountInString~~ cmd/compile: optimize len([]rune(string)) to just count runes Apr 18, 2018

ianlancetaylor added this to the Unplanned milestone Apr 18, 2018

ianlancetaylor added the NeedsFix The path to resolution is known, but the work has not been done. label Apr 18, 2018

gopherbot closed this as completed in b9a59d9 May 6, 2018

golang locked and limited conversation to collaborators Jun 1, 2019

gopherbot added the FrozenDueToAge label Jun 1, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/compile: optimize len([]rune(string)) to just count runes #24923

cmd/compile: optimize len([]rune(string)) to just count runes #24923

inancgumus commented Apr 18, 2018 •

edited

Loading

martisch commented Apr 18, 2018 •

edited

Loading

ianlancetaylor commented Apr 18, 2018

gopherbot commented Apr 24, 2018

perillo commented May 31, 2018

martisch commented May 31, 2018 •

edited

Loading

perillo commented May 31, 2018

gopherbot commented Jun 1, 2018

cmd/compile: optimize len([]rune(string)) to just count runes #24923

cmd/compile: optimize len([]rune(string)) to just count runes #24923

Comments

inancgumus commented Apr 18, 2018 • edited Loading

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

What did you do?

What did you expect to see?

martisch commented Apr 18, 2018 • edited Loading

ianlancetaylor commented Apr 18, 2018

gopherbot commented Apr 24, 2018

perillo commented May 31, 2018

martisch commented May 31, 2018 • edited Loading

perillo commented May 31, 2018

gopherbot commented Jun 1, 2018

inancgumus commented Apr 18, 2018 •

edited

Loading

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?

martisch commented Apr 18, 2018 •

edited

Loading

martisch commented May 31, 2018 •

edited

Loading