New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: performance issue with ranging over string #13162

Open
mpvl opened this Issue Nov 5, 2015 · 11 comments

Comments

Projects
None yet
7 participants
@mpvl
Member

mpvl commented Nov 5, 2015

Looping over a string with range is significantly slower than using utf8.DecodeRuneInString.

So using
for i, r := range s { ... }

is significantly slower than

var size int
for i := 0; i < len(s); {
        r := rune(s[i])
        if r < RuneSelf {
            i++
        } else {
            r, size = DecodeRune(s[i:])
            i += size
        }
}

An (approximate) example of is one gets by comparing the benchmarks of the implementations for bytes and strings in unicode/utf8:

BenchmarkRuneCountTenASCIIChars-8               100000000           16.6 ns/op
BenchmarkRuneCountInStringTenASCIIChars-8       30000000            43.1 ns/op

BenchmarkValidTenASCIIChars-8                   100000000           13.8 ns/op
BenchmarkValidStringTenASCIIChars-8             30000000            52.5 ns/op

The benchmarks for the *String versions are considerably slower than their bytes counterparts, when processing ASCII.

@gopherbot

This comment has been minimized.

gopherbot commented Nov 5, 2015

CL https://golang.org/cl/16695 mentions this issue.

mpvl added a commit that referenced this issue Nov 16, 2015

unicode/utf8: removed uses of ranging over string
Ranging over string is much slower than using DecodeRuneInString.
See golang.org/issue/13162.

Replacing ranging over a string with the implementation of the Bytes
counterpart results in the following performance improvements:

RuneCountInStringTenASCIIChars-8     43.0ns ± 1%  16.4ns ± 2%  -61.80%  (p=0.000 n=7+8)
RuneCountInStringTenJapaneseChars-8   161ns ± 2%   154ns ± 2%   -4.58%  (p=0.000 n=8+8)
ValidStringTenASCIIChars-8           52.2ns ± 1%  13.2ns ± 1%  -74.62%  (p=0.001 n=7+7)
ValidStringTenJapaneseChars-8         173ns ± 2%   153ns ± 2%  -11.78%  (p=0.000 n=7+8)

Update #13162

Change-Id: Ifc40a6a94bb3317f1f2d929d310bd2694645e9f6
Reviewed-on: https://go-review.googlesource.com/16695
Reviewed-by: Russ Cox <rsc@golang.org>

@rsc rsc added this to the Go1.7Early milestone Dec 28, 2015

@bradfitz bradfitz added the Performance label May 5, 2016

@bradfitz bradfitz modified the milestones: Go1.8, Go1.7Early May 5, 2016

@martisch

This comment has been minimized.

Member

martisch commented Aug 26, 2016

To avoid duplicate work:

I am working to improve this by a compiler patch that removes stringiter and handles
ascii chars directly and calls charntorune when needed.

quick test results:

with old range loop and without patch:
BenchmarkRuneCountTenASCIIChars-4               100000000           11.8 ns/op
BenchmarkRuneCountTenJapaneseChars-4            20000000            60.6 ns/op
BenchmarkRuneCountInStringTenASCIIChars-4       30000000            43.0 ns/op
BenchmarkRuneCountInStringTenJapaneseChars-4    20000000            88.3 ns/op

with old range loop and with patch:
BenchmarkRuneCountTenASCIIChars-4               100000000           11.9 ns/op
BenchmarkRuneCountTenJapaneseChars-4            20000000            61.4 ns/op
BenchmarkRuneCountInStringTenASCIIChars-4       100000000           18.9 ns/op
BenchmarkRuneCountInStringTenJapaneseChars-4    20000000            72.2 ns/op
@gopherbot

This comment has been minimized.

gopherbot commented Aug 28, 2016

CL https://golang.org/cl/27853 mentions this issue.

gopherbot pushed a commit that referenced this issue Aug 30, 2016

cmd/compile: improve string iteration performance
Generate a for loop for ranging over strings that only needs to call
the runtime function charntorune for non ASCII characters.

This provides faster iteration over ASCII characters and slightly
faster iteration for other characters.

The runtime function charntorune is changed to take an index from where
to start decoding and returns the index after the last byte belonging
to the decoded rune.

All call sites of charntorune in the runtime are replaced by a for loop
that will be transformed by the compiler instead of calling the charntorune
function directly.

go binary size decreases by 80 bytes.
godoc binary size increases by around 4 kilobytes.

runtime:

name                           old time/op  new time/op  delta
RuneIterate/range/ASCII-4      43.7ns ± 3%  10.3ns ± 4%  -76.33%  (p=0.000 n=44+45)
RuneIterate/range/Japanese-4   72.5ns ± 2%  62.8ns ± 2%  -13.41%  (p=0.000 n=49+50)
RuneIterate/range1/ASCII-4     43.5ns ± 2%  10.4ns ± 3%  -76.18%  (p=0.000 n=50+50)
RuneIterate/range1/Japanese-4  72.5ns ± 2%  62.9ns ± 2%  -13.26%  (p=0.000 n=50+49)
RuneIterate/range2/ASCII-4     43.5ns ± 3%  10.3ns ± 2%  -76.22%  (p=0.000 n=48+47)
RuneIterate/range2/Japanese-4  72.4ns ± 2%  62.7ns ± 2%  -13.47%  (p=0.000 n=50+50)

strings:

name                 old time/op    new time/op    delta
IndexRune-4            64.7ns ± 5%    22.4ns ± 3%  -65.43%  (p=0.000 n=25+21)
MapNoChanges-4          269ns ± 2%     157ns ± 2%  -41.46%  (p=0.000 n=23+24)
Fields-4               23.0ms ± 2%    19.7ms ± 2%  -14.35%  (p=0.000 n=25+25)
FieldsFunc-4           23.1ms ± 2%    19.6ms ± 2%  -14.94%  (p=0.000 n=25+24)

name                 old speed      new speed      delta
Fields-4             45.6MB/s ± 2%  53.2MB/s ± 2%  +16.87%  (p=0.000 n=24+25)
FieldsFunc-4         45.5MB/s ± 2%  53.5MB/s ± 2%  +17.57%  (p=0.000 n=25+24)

Updates #13162

Change-Id: I79ffaf828d82bf9887592f08e5cad883e9f39701
Reviewed-on: https://go-review.googlesource.com/27853
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Martin Möhrmann <martisch@uos.de>

@quentinmit quentinmit added the NeedsFix label Oct 11, 2016

@martisch

This comment has been minimized.

Member

martisch commented Oct 19, 2016

at tip on my macbookpro with intel ivy bridge where:

func RuneCountInString(s string) (n int) {
    for range s {
        n++
    }
    return n
}
name                                 time/op
RuneCountTenASCIIChars-4             11.1ns ± 3%
RuneCountTenJapaneseChars-4          65.0ns ± 2%
RuneCountInStringTenASCIIChars-4     10.9ns ± 1%
RuneCountInStringTenJapaneseChars-4  62.4ns ± 1%

is the speed of ranging over a string good enough on other machines too now?

@bradfitz

This comment has been minimized.

Member

bradfitz commented Oct 20, 2016

Here are numbers changing unicode/utf8.RuneCountInString from its current implementation to:

func RuneCountInString(s string) (n int) {
    for range s {
        n++
    }
    return n
}

Looks like it's now only 13-19% slower.

name                                 old time/op  new time/op   delta
RuneCountInStringTenASCIIChars-4     9.05ns ± 1%  10.27ns ± 1%  +13.51%  (p=0.000 n=10+10)
RuneCountInStringTenJapaneseChars-4  52.5ns ± 1%   62.4ns ± 1%  +18.86%   (p=0.000 n=10+9)

It's at least good enough for Go 1.8. I'll kick this to Unplanned at this point since it's a performance thing, which we don't generally track for releases unless it's really bad or important.

@bradfitz bradfitz modified the milestones: Unplanned, Go1.8 Oct 20, 2016

@bradfitz bradfitz removed the NeedsFix label Oct 20, 2016

@randall77

This comment has been minimized.

Contributor

randall77 commented Nov 28, 2016

I actually get better performance (~10%) using string range. Brad, I'm not sure why you see something different.

I'll mail a CL to use string range for RuneCountInString. We can argue on that CL.

@gopherbot

This comment has been minimized.

gopherbot commented Nov 28, 2016

CL https://golang.org/cl/33637 mentions this issue.

@bradfitz

This comment has been minimized.

Member

bradfitz commented Nov 28, 2016

@randall77, I think some compiler changes landed after my comment there. I don't see d295174 mentioned here, but it also apparently landed 3 days before my comment. So not sure which one.

@bradfitz bradfitz modified the milestones: Unplanned, Go1.11 Mar 5, 2018

@bradfitz

This comment has been minimized.

Member

bradfitz commented Mar 5, 2018

Marking release-blocker to at least figure out whether we accept https://go-review.googlesource.com/c/go/+/33637 .

Ideally we would accept it, because it makes the code so much nicer, but it seems slower now with Go tip than it did in past releases.

@randall77?

@randall77

This comment has been minimized.

Contributor

randall77 commented Mar 5, 2018

Remeasuring, RuneCountInString using range is no faster for ascii, and it is slower for non-ascii. So I don't think we should check it in at this point.
I'd like to see #22234 fixed first, looking at the assembly we're tripping on that issue.

@bradfitz

This comment has been minimized.

Member

bradfitz commented Jun 19, 2018

Even with the #22234 fix in, this is still slower.

I measured two variants of the CL above: first with an unnamed result parameter, and second with a named result parameter. (ideally they shouldn't generate different code, but alas.)

bradfitz@gdev:~/go/src/unicode/utf8$ benchstat before after     
name                                 old time/op  new time/op  delta
RuneCountTenASCIIChars-4             11.8ns ± 1%  12.7ns ± 3%   +7.61%   (p=0.000 n=9+9)
RuneCountTenJapaneseChars-4          55.7ns ± 2%  55.8ns ± 1%     ~     (p=0.921 n=10+9)
RuneCountInStringTenASCIIChars-4     11.7ns ± 1%  12.2ns ± 2%   +4.52%  (p=0.000 n=9+10)
RuneCountInStringTenJapaneseChars-4  55.1ns ± 1%  71.3ns ± 3%  +29.56%  (p=0.000 n=9+10)

bradfitz@gdev:~/go/src/unicode/utf8$ benchstat before after2
name                                 old time/op  new time/op  delta
RuneCountTenASCIIChars-4             11.8ns ± 1%  13.6ns ±13%  +15.04%   (p=0.000 n=9+10)
RuneCountTenJapaneseChars-4          55.7ns ± 2%  56.1ns ± 2%     ~     (p=0.196 n=10+10)
RuneCountInStringTenASCIIChars-4     11.7ns ± 1%  12.3ns ± 1%   +4.69%   (p=0.000 n=9+10)
RuneCountInStringTenJapaneseChars-4  55.1ns ± 1%  70.1ns ± 1%  +27.25%   (p=0.000 n=9+10)

I'll kick this to Go 1.12 for @randall77 et al to investigate more.

@bradfitz bradfitz modified the milestones: Go1.11, Go1.12 Jun 19, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment