cmd/compile: optimize len([]rune(string)) to just count runes #24923
Comments
If it is used often and we decide to optimize it, a possible improvement would be to detect the pattern in the compiler's walk pass and replace it with a new runecount runtime builtin function that is:
For even better speed on non-ASCII input, the decoderune function could be inlined into runeCount and tuned so that it does not need to store the runes. Then utf8.RuneCountInString could be made to return len([]rune(s)) to use the same code. Also see: https://go-review.googlesource.com/c/go/+/33637 /cc @randall77
In the absence of a compiler optimization, it seems clear that len([]rune(s)) would be slower, since it has to do strictly more work: create a new slice and fill in the values.
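To make the "strictly more work" point concrete, here is a rough sketch that measures allocations with testing.AllocsPerRun. All names are hypothetical. The slice is stored in a package-level variable on purpose: a compiler that implements the optimization discussed in this issue may skip the allocation for a bare len([]rune(s)), so the sketch forces the slice to be materialized.

```go
package main

import (
	"fmt"
	"testing"
	"unicode/utf8"
)

var (
	text      = "héllo, wörld" // a variable, not a constant
	runeSink  []rune           // keeps the converted slice alive
	countSink int              // keeps the count alive
)

// allocsMaterialize reports the average number of allocations needed to
// build the []rune and keep it.
func allocsMaterialize() float64 {
	return testing.AllocsPerRun(100, func() { runeSink = []rune(text) })
}

// allocsCount reports the average number of allocations needed to count
// the runes without building a slice.
func allocsCount() float64 {
	return testing.AllocsPerRun(100, func() { countSink = utf8.RuneCountInString(text) })
}

func main() {
	fmt.Println("allocations, []rune(text):", allocsMaterialize())
	fmt.Println("allocations, utf8.RuneCountInString(text):", allocsCount())
}
```

Building the slice should cost at least one allocation per call, while counting in place should cost none.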
Change https://golang.org/cl/108985 mentions this issue:
I modified the original benchmark (https://play.golang.org/p/M2HHTtuHMI-) to use constant strings: https://play.golang.org/p/_sdu0fy97dm Here is the benchmark result
I find the result surprising, since, after b9a59d9, the correct method to count runes in a string now performs worse than the incorrect method. Using utf8.RuneCountInString on a constant string is probably rare, so I'm not sure if optimizing this case is worth the effort. But as of b9a59d9 the code is already there.
len([]rune()) being faster on constant strings predates b9a59d9, going back to at least go1.4 (see benchmarks), and was not changed by cl/108985. It is due to the compiler evaluating []rune(constantstring) at compile time in: go/src/cmd/compile/internal/gc/typecheck.go Line 1720 in 4258b43
Update: Special-casing utf8.RuneCountInString in the same manner, by detecting the function name in the compiler, would mean that copying the utf8 code to another package results in a performance loss, which would also be surprising.
@martisch Thanks for the clarification. Special cases are no good, but the fact that the performance of len([]rune(s)) and utf8.RuneCountInString differs so much between constant and non-constant strings is probably no good either. By the way, it seems there is a typo in the comment for the isRuneCount function.
Change https://golang.org/cl/115836 mentions this issue:
Updates #24923
Change-Id: Ie5a1b54b023381b58df618080f3d742a50d46d8b
Reviewed-on: https://go-review.googlesource.com/115836
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
What version of Go are you using (go version)?
go1.10.1
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
darwin/amd64
What did you do?
I've benchmarked len([]rune(string)) and utf8.RuneCountInString(string) and I saw that the latter performs better. Here's the benchmark code.
Benchmark Results:
What did you expect to see?
Actually, I wasn't expecting len([]rune(string)) to be faster than utf8.RuneCountInString, but I wanted to open this issue anyway. I noticed that a lot of people are using this pattern.