New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: emit optimized assembly for checking the first chars before `{strings|bytes}.IndexByte` call #25844

Open
valyala opened this Issue Jun 12, 2018 · 3 comments

Comments

Projects
None yet
3 participants
@valyala
Contributor

valyala commented Jun 12, 2018

The issue

Currently strings.IndexByte (and bytes.IndexByte) call overhead makes these functions slower comparing to a plain check for the first few bytes on amd64. The following benchmark shows the issue:

package indexbyte_test

import (
        "strings"
        "testing"
)

func stdFunc(s string, ch byte) int {
        // Just call strings.IndexByte
        return strings.IndexByte(s, ch)
}

func prefixCheckFunc(s string, ch byte) int {
        if len(s) >= 4 {
                // Search for ch in the first few bytes of s before resorting
                // to strings.IndexByte call.
                // The compiler may substitute this by compact optimized assembly in order
                // to reduce overhead when ch isn't found in the first few bytes.
                if s[0] == ch {
                        return 0
                }
                if s[1] == ch {
                        return 1
                }
                if s[2] == ch {
                        return 2
                }
                if s[3] == ch {
                        return 3
                }
                return strings.IndexByte(s, ch)
        }

        // The code for short s completely skips strings.IndexByte call overhead.
        if len(s) == 0 {
                return -1
        }
        if len(s) == 1 {
                if s[0] == ch {
                        return 0
                }
                return -1
        }
        if len(s) == 2 {
                if s[0] == ch {
                        return 0
                }
                if s[1] == ch {
                        return 1
                }
                return -1
        }
        if len(s) == 3 {
                if s[0] == ch {
                        return 0
                }
                if s[1] == ch {
                        return 1
                }
                if s[2] == ch {
                        return 2
                }
                return -1
        }
        return -1
}

func BenchmarkCommaSearch(b *testing.B) {
        for _, s := range []string{
                        "",
                        "1",
                        "12",
                        "123",
                        "1234",
                        "12345",
                        "123456",
                        ",",
                        "1,",
                        "12,",
                        "123,",
                        "1234,",
                        "12345,",
                        "123456,",
                        ", 123, 4567890 qwertyuiop",
                        "1, 23, 4567890 qwertyuiop",
                        "12, 34, 567890 qwertyuiop",
                        "123, 4, 567890 qwertyuiop",
                        "1234, 5, 67890 qwertyuiop",
                        "12345, 6, 7890 qwertyuiop",
                        "123456, 7, 890 qwertyuiop",
                } {
                b.Run(s, func(b *testing.B) {
                        b.Run("std", func(b *testing.B) {
                                for i := 0; i < b.N; i++ {
                                        Sink += stdFunc(s, ',')
                                }
                        })
                        b.Run("prefixCheck", func(b *testing.B) {
                                for i := 0; i < b.N; i++ {
                                        Sink += prefixCheckFunc(s, ',')
                                }
                        })
                })
        }
}

var Sink int

Benchmark results:

$ go test -bench=CommaSearch
goos: linux
goarch: amd64
BenchmarkCommaSearch/#00/std-4     	300000000	         5.39 ns/op
BenchmarkCommaSearch/#00/prefixCheck-4         	500000000	         3.04 ns/op
BenchmarkCommaSearch/1/std-4                   	300000000	         5.72 ns/op
BenchmarkCommaSearch/1/prefixCheck-4           	500000000	         3.24 ns/op
BenchmarkCommaSearch/12/std-4                  	300000000	         5.88 ns/op
BenchmarkCommaSearch/12/prefixCheck-4          	500000000	         3.46 ns/op
BenchmarkCommaSearch/123/std-4                 	300000000	         5.52 ns/op
BenchmarkCommaSearch/123/prefixCheck-4         	500000000	         3.94 ns/op
BenchmarkCommaSearch/1234/std-4                	300000000	         5.92 ns/op
BenchmarkCommaSearch/1234/prefixCheck-4        	200000000	         6.22 ns/op
BenchmarkCommaSearch/12345/std-4               	200000000	         5.68 ns/op
BenchmarkCommaSearch/12345/prefixCheck-4       	200000000	         6.32 ns/op
BenchmarkCommaSearch/123456/std-4              	300000000	         5.97 ns/op
BenchmarkCommaSearch/123456/prefixCheck-4      	200000000	         6.24 ns/op
BenchmarkCommaSearch/,/std-4                   	300000000	         6.17 ns/op
BenchmarkCommaSearch/,/prefixCheck-4           	500000000	         3.30 ns/op
BenchmarkCommaSearch/1,/std-4                  	200000000	         6.49 ns/op
BenchmarkCommaSearch/1,/prefixCheck-4          	500000000	         3.76 ns/op
BenchmarkCommaSearch/12,/std-4                 	300000000	         6.23 ns/op
BenchmarkCommaSearch/12,/prefixCheck-4         	500000000	         3.74 ns/op
BenchmarkCommaSearch/123,/std-4                	200000000	         6.08 ns/op
BenchmarkCommaSearch/123,/prefixCheck-4        	300000000	         4.04 ns/op
BenchmarkCommaSearch/1234,/std-4               	200000000	         6.23 ns/op
BenchmarkCommaSearch/1234,/prefixCheck-4       	200000000	         6.55 ns/op
BenchmarkCommaSearch/12345,/std-4              	300000000	         5.95 ns/op
BenchmarkCommaSearch/12345,/prefixCheck-4      	200000000	         6.55 ns/op
BenchmarkCommaSearch/123456,/std-4             	200000000	         6.47 ns/op
BenchmarkCommaSearch/123456,/prefixCheck-4     	200000000	         7.08 ns/op
BenchmarkCommaSearch/,_123,_4567890_qwertyuiop/std-4     	200000000	         6.28 ns/op
BenchmarkCommaSearch/,_123,_4567890_qwertyuiop/prefixCheck-4         	500000000	         3.35 ns/op
BenchmarkCommaSearch/1,_23,_4567890_qwertyuiop/std-4                 	200000000	         6.68 ns/op
BenchmarkCommaSearch/1,_23,_4567890_qwertyuiop/prefixCheck-4         	500000000	         3.60 ns/op
BenchmarkCommaSearch/12,_34,_567890_qwertyuiop/std-4                 	200000000	         6.87 ns/op
BenchmarkCommaSearch/12,_34,_567890_qwertyuiop/prefixCheck-4         	300000000	         3.73 ns/op
BenchmarkCommaSearch/123,_4,_567890_qwertyuiop/std-4                 	200000000	         6.26 ns/op
BenchmarkCommaSearch/123,_4,_567890_qwertyuiop/prefixCheck-4         	300000000	         4.32 ns/op
BenchmarkCommaSearch/1234,_5,_67890_qwertyuiop/std-4                 	200000000	         6.26 ns/op
BenchmarkCommaSearch/1234,_5,_67890_qwertyuiop/prefixCheck-4         	200000000	         6.96 ns/op
BenchmarkCommaSearch/12345,_6,_7890_qwertyuiop/std-4                 	200000000	         6.25 ns/op
BenchmarkCommaSearch/12345,_6,_7890_qwertyuiop/prefixCheck-4         	200000000	         7.47 ns/op
BenchmarkCommaSearch/123456,_7,_890_qwertyuiop/std-4                 	200000000	         6.76 ns/op
BenchmarkCommaSearch/123456,_7,_890_qwertyuiop/prefixCheck-4         	200000000	         6.94 ns/op

The benchmark shows that the prefixCheckFunc outperforms strings.IndexByte for short strings and when the required char is located in the short string prefix.

Solution

Emit optimized assembly for checking the first chars before {strings|bytes}.IndexByte call.

This may speed up parsers extracting small tokens such as short numbers, ids and strings.

cc'ing @TocarIP and @randall77 , who may be interested in the implementation.

@valyala

This comment has been minimized.

Contributor

valyala commented Jun 12, 2018

cc'ing @Quasilyte and @josharian , since it looks like this may help CLs like https://go-review.googlesource.com/c/go/+/115616 .

@bcmills bcmills added the Performance label Jun 12, 2018

@bcmills bcmills added this to the Unplanned milestone Jun 12, 2018

@josharian

This comment has been minimized.

Contributor

josharian commented Jun 12, 2018

We could also do something similar for HasPrefix and HasSuffix, particularly when the sought byte/prefix/suffix is a constant. However, it seems to me these are perfect candidates for mid-stack or partial inlining, rather than hand-coded compiler optimizations.

@valyala

This comment has been minimized.

Contributor

valyala commented Jun 12, 2018

However, it seems to me these are perfect candidates for mid-stack or partial inlining, rather than hand-coded compiler optimizations.

But as I know mid-stack inlining cannot inline yet functions written in assembly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment