Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: performance issue with ranging over string #13162

mpvl opened this issue Nov 5, 2015 · 12 comments

cmd/compile: performance issue with ranging over string #13162

mpvl opened this issue Nov 5, 2015 · 12 comments


Copy link

mpvl commented Nov 5, 2015

Looping over a string with range is significantly slower than using utf8.DecodeRuneInString.

So using
for i, r := range s { ... }

is significantly slower than

var size int
for i := 0; i < len(s); {
        r := rune(s[i])
        if r < RuneSelf {
        } else {
            r, size = DecodeRune(s[i:])
            i += size

An (approximate) example of is one gets by comparing the benchmarks of the implementations for bytes and strings in unicode/utf8:

BenchmarkRuneCountTenASCIIChars-8               100000000           16.6 ns/op
BenchmarkRuneCountInStringTenASCIIChars-8       30000000            43.1 ns/op

BenchmarkValidTenASCIIChars-8                   100000000           13.8 ns/op
BenchmarkValidStringTenASCIIChars-8             30000000            52.5 ns/op

The benchmarks for the *String versions are considerably slower than their bytes counterparts, when processing ASCII.

Copy link

CL mentions this issue.

mpvl added a commit that referenced this issue Nov 16, 2015
Ranging over string is much slower than using DecodeRuneInString.

Replacing ranging over a string with the implementation of the Bytes
counterpart results in the following performance improvements:

RuneCountInStringTenASCIIChars-8     43.0ns ± 1%  16.4ns ± 2%  -61.80%  (p=0.000 n=7+8)
RuneCountInStringTenJapaneseChars-8   161ns ± 2%   154ns ± 2%   -4.58%  (p=0.000 n=8+8)
ValidStringTenASCIIChars-8           52.2ns ± 1%  13.2ns ± 1%  -74.62%  (p=0.001 n=7+7)
ValidStringTenJapaneseChars-8         173ns ± 2%   153ns ± 2%  -11.78%  (p=0.000 n=7+8)

Update #13162

Change-Id: Ifc40a6a94bb3317f1f2d929d310bd2694645e9f6
Reviewed-by: Russ Cox <>
@rsc rsc added this to the Go1.7Early milestone Dec 28, 2015
@bradfitz bradfitz modified the milestones: Go1.8, Go1.7Early May 5, 2016
Copy link

martisch commented Aug 26, 2016

To avoid duplicate work:

I am working to improve this by a compiler patch that removes stringiter and handles
ascii chars directly and calls charntorune when needed.

quick test results:

with old range loop and without patch:
BenchmarkRuneCountTenASCIIChars-4               100000000           11.8 ns/op
BenchmarkRuneCountTenJapaneseChars-4            20000000            60.6 ns/op
BenchmarkRuneCountInStringTenASCIIChars-4       30000000            43.0 ns/op
BenchmarkRuneCountInStringTenJapaneseChars-4    20000000            88.3 ns/op

with old range loop and with patch:
BenchmarkRuneCountTenASCIIChars-4               100000000           11.9 ns/op
BenchmarkRuneCountTenJapaneseChars-4            20000000            61.4 ns/op
BenchmarkRuneCountInStringTenASCIIChars-4       100000000           18.9 ns/op
BenchmarkRuneCountInStringTenJapaneseChars-4    20000000            72.2 ns/op

Copy link

CL mentions this issue.

gopherbot pushed a commit that referenced this issue Aug 30, 2016
Generate a for loop for ranging over strings that only needs to call
the runtime function charntorune for non ASCII characters.

This provides faster iteration over ASCII characters and slightly
faster iteration for other characters.

The runtime function charntorune is changed to take an index from where
to start decoding and returns the index after the last byte belonging
to the decoded rune.

All call sites of charntorune in the runtime are replaced by a for loop
that will be transformed by the compiler instead of calling the charntorune
function directly.

go binary size decreases by 80 bytes.
godoc binary size increases by around 4 kilobytes.


name                           old time/op  new time/op  delta
RuneIterate/range/ASCII-4      43.7ns ± 3%  10.3ns ± 4%  -76.33%  (p=0.000 n=44+45)
RuneIterate/range/Japanese-4   72.5ns ± 2%  62.8ns ± 2%  -13.41%  (p=0.000 n=49+50)
RuneIterate/range1/ASCII-4     43.5ns ± 2%  10.4ns ± 3%  -76.18%  (p=0.000 n=50+50)
RuneIterate/range1/Japanese-4  72.5ns ± 2%  62.9ns ± 2%  -13.26%  (p=0.000 n=50+49)
RuneIterate/range2/ASCII-4     43.5ns ± 3%  10.3ns ± 2%  -76.22%  (p=0.000 n=48+47)
RuneIterate/range2/Japanese-4  72.4ns ± 2%  62.7ns ± 2%  -13.47%  (p=0.000 n=50+50)


name                 old time/op    new time/op    delta
IndexRune-4            64.7ns ± 5%    22.4ns ± 3%  -65.43%  (p=0.000 n=25+21)
MapNoChanges-4          269ns ± 2%     157ns ± 2%  -41.46%  (p=0.000 n=23+24)
Fields-4               23.0ms ± 2%    19.7ms ± 2%  -14.35%  (p=0.000 n=25+25)
FieldsFunc-4           23.1ms ± 2%    19.6ms ± 2%  -14.94%  (p=0.000 n=25+24)

name                 old speed      new speed      delta
Fields-4             45.6MB/s ± 2%  53.2MB/s ± 2%  +16.87%  (p=0.000 n=24+25)
FieldsFunc-4         45.5MB/s ± 2%  53.5MB/s ± 2%  +17.57%  (p=0.000 n=25+24)

Updates #13162

Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701
TryBot-Result: Gobot Gobot <>
Reviewed-by: Keith Randall <>
Reviewed-by: Josh Bleecher Snyder <>
Run-TryBot: Martin Möhrmann <>
@quentinmit quentinmit added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 11, 2016
Copy link

martisch commented Oct 19, 2016

at tip on my macbookpro with intel ivy bridge where:

func RuneCountInString(s string) (n int) {
    for range s {
    return n
name                                 time/op
RuneCountTenASCIIChars-4             11.1ns ± 3%
RuneCountTenJapaneseChars-4          65.0ns ± 2%
RuneCountInStringTenASCIIChars-4     10.9ns ± 1%
RuneCountInStringTenJapaneseChars-4  62.4ns ± 1%

is the speed of ranging over a string good enough on other machines too now?

Copy link

Here are numbers changing unicode/utf8.RuneCountInString from its current implementation to:

func RuneCountInString(s string) (n int) {
    for range s {
    return n

Looks like it's now only 13-19% slower.

name                                 old time/op  new time/op   delta
RuneCountInStringTenASCIIChars-4     9.05ns ± 1%  10.27ns ± 1%  +13.51%  (p=0.000 n=10+10)
RuneCountInStringTenJapaneseChars-4  52.5ns ± 1%   62.4ns ± 1%  +18.86%   (p=0.000 n=10+9)

It's at least good enough for Go 1.8. I'll kick this to Unplanned at this point since it's a performance thing, which we don't generally track for releases unless it's really bad or important.

@bradfitz bradfitz modified the milestones: Unplanned, Go1.8 Oct 20, 2016
@bradfitz bradfitz removed the NeedsFix The path to resolution is known, but the work has not been done. label Oct 20, 2016
Copy link

I actually get better performance (~10%) using string range. Brad, I'm not sure why you see something different.

I'll mail a CL to use string range for RuneCountInString. We can argue on that CL.

Copy link

CL mentions this issue.

Copy link

@randall77, I think some compiler changes landed after my comment there. I don't see d295174 mentioned here, but it also apparently landed 3 days before my comment. So not sure which one.

@bradfitz bradfitz modified the milestones: Unplanned, Go1.11 Mar 5, 2018
Copy link

bradfitz commented Mar 5, 2018

Marking release-blocker to at least figure out whether we accept .

Ideally we would accept it, because it makes the code so much nicer, but it seems slower now with Go tip than it did in past releases.


Copy link

Remeasuring, RuneCountInString using range is no faster for ascii, and it is slower for non-ascii. So I don't think we should check it in at this point.
I'd like to see #22234 fixed first, looking at the assembly we're tripping on that issue.

Copy link

Even with the #22234 fix in, this is still slower.

I measured two variants of the CL above: first with an unnamed result parameter, and second with a named result parameter. (ideally they shouldn't generate different code, but alas.)

bradfitz@gdev:~/go/src/unicode/utf8$ benchstat before after     
name                                 old time/op  new time/op  delta
RuneCountTenASCIIChars-4             11.8ns ± 1%  12.7ns ± 3%   +7.61%   (p=0.000 n=9+9)
RuneCountTenJapaneseChars-4          55.7ns ± 2%  55.8ns ± 1%     ~     (p=0.921 n=10+9)
RuneCountInStringTenASCIIChars-4     11.7ns ± 1%  12.2ns ± 2%   +4.52%  (p=0.000 n=9+10)
RuneCountInStringTenJapaneseChars-4  55.1ns ± 1%  71.3ns ± 3%  +29.56%  (p=0.000 n=9+10)

bradfitz@gdev:~/go/src/unicode/utf8$ benchstat before after2
name                                 old time/op  new time/op  delta
RuneCountTenASCIIChars-4             11.8ns ± 1%  13.6ns ±13%  +15.04%   (p=0.000 n=9+10)
RuneCountTenJapaneseChars-4          55.7ns ± 2%  56.1ns ± 2%     ~     (p=0.196 n=10+10)
RuneCountInStringTenASCIIChars-4     11.7ns ± 1%  12.3ns ± 1%   +4.69%   (p=0.000 n=9+10)
RuneCountInStringTenJapaneseChars-4  55.1ns ± 1%  70.1ns ± 1%  +27.25%   (p=0.000 n=9+10)

I'll kick this to Go 1.12 for @randall77 et al to investigate more.

@bradfitz bradfitz modified the milestones: Go1.11, Go1.12 Jun 19, 2018
Copy link

odeke-em commented Feb 4, 2019

Punting more to Go1.13 but @randall77 sent a CL in late 2016. I'll also cc @josharian

@odeke-em odeke-em removed this from the Go1.12 milestone Feb 4, 2019
@odeke-em odeke-em added this to the Go1.13 milestone Feb 4, 2019
@randall77 randall77 modified the milestones: Go1.13, Go1.14 Jun 3, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet

No branches or pull requests

8 participants