
cmd/compile: named returns are faster than inline returns #40638

Closed
prashantv opened this issue Aug 7, 2020 · 3 comments

prashantv (Contributor) commented Aug 7, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.7 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/prashant/.cache/go-build"
GOENV="/home/prashant/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/prashant/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/prashant/.gimme/versions/go1.14.7.linux.amd64"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/prashant/.gimme/versions/go1.14.7.linux.amd64/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build010078908=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Returned a medium-sized struct (two byte slices and two ints in the repro below) in similar but slightly different ways (e.g., using named returns vs. constructing the value inside the return statement), and saw unexpectedly slow performance for the variants that do not use named returns.

E.g., this approach was slower:

//go:noinline
func (s largeStruct) ReturnDirectly() (largeStruct, error) {
        return largeStruct{
                bs1: s.bs2,
                bs2: s.bs1,
                bs3: s.bs4,
                bs4: s.bs3,
        }, nil
}

as compared to using named returns:

//go:noinline
func (s largeStruct) NamedReturnExplicit() (ls largeStruct, _ error) {
        ls = largeStruct{
                bs1: s.bs2,
                bs2: s.bs1,
                bs3: s.bs4,
                bs4: s.bs3,
        }
        return ls, nil
}

Full repro:

package main

import "testing"

type largeStruct struct {
	bs1      []byte
	bs2      []byte
	bs3, bs4 int
}

func BenchmarkFoo(b *testing.B) {
	tests := []struct {
		msg string
		f   func(s largeStruct) (largeStruct, error)
	}{
		{"ReturnDirectly", largeStruct.ReturnDirectly},
		{"ReturnIntermediate", largeStruct.ReturnIntermediate},
		{"NamedReturnExplicit", largeStruct.NamedReturnExplicit},
		{"NamedReturnImplicit", largeStruct.NamedReturnImplicit},
	}

	for _, tt := range tests {
		b.Run(tt.msg, func(b *testing.B) {
			var s largeStruct
			var err error
			for i := 0; i < b.N; i++ {
				s, err = tt.f(s)
			}
			_ = err
		})
	}
}

//go:noinline
func (s largeStruct) ReturnDirectly() (largeStruct, error) {
        return largeStruct{
                bs1: s.bs2,
                bs2: s.bs1,
                bs3: s.bs4,
                bs4: s.bs3,
        }, nil
}

//go:noinline
func (s largeStruct) ReturnIntermediate() (largeStruct, error) {
        ls := largeStruct{
                bs1: s.bs2,
                bs2: s.bs1,
                bs3: s.bs4,
                bs4: s.bs3,
        }
        return ls, nil
}

//go:noinline
func (s largeStruct) NamedReturnExplicit() (ls largeStruct, _ error) {
        ls = largeStruct{
                bs1: s.bs2,
                bs2: s.bs1,
                bs3: s.bs4,
                bs4: s.bs3,
        }
        return ls, nil
}

//go:noinline
func (s largeStruct) NamedReturnImplicit() (ls largeStruct, err error) {
        ls = largeStruct{
                bs1: s.bs2,
                bs2: s.bs1,
                bs3: s.bs4,
                bs4: s.bs3,
        }
        return
}

What did you expect to see?

Expected the performance to be the same for the different approaches.

What did you see instead?

Named returns have a performance advantage over constructing the value inside the return statement.
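The rewrite can be shown in isolation. The sketch below is a hypothetical reduction of the repro (the `pair` struct stands in for `largeStruct`): both functions are semantically identical, but on go1.14 the named-result form lets the compiler build the value directly in the result slot rather than in a local slot that is then copied out.

```go
package main

import "fmt"

// pair stands in for the issue's largeStruct; any multi-word
// struct returned by value shows the same effect.
type pair struct {
	a, b []byte
}

// inlineReturn builds the result inside the return statement:
// the compiler assembles the struct locally, then copies it to
// the result slot.
//go:noinline
func inlineReturn(p pair) (pair, error) {
	return pair{a: p.b, b: p.a}, nil
}

// namedReturn assigns to the named result parameter, so the
// compiler can construct the value in the result slot directly.
//go:noinline
func namedReturn(p pair) (res pair, _ error) {
	res = pair{a: p.b, b: p.a}
	return res, nil
}

func main() {
	in := pair{a: []byte("x"), b: []byte("y")}
	r1, _ := inlineReturn(in)
	r2, _ := namedReturn(in)
	// Both forms swap the fields identically.
	fmt.Println(string(r1.a), string(r2.a))
}
```

The two functions always return equal values; only the generated code differs, which is why the benchmark gap was unexpected.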

prashantv added a commit to uber/tchannel-go that referenced this issue Aug 7, 2020
As part of #792, we moved to typed.ReadBuffer which is much more
readable than direct binary translations, but we lost a little
bit of performance on the KV iterator.

Original:
```
BenchmarkKeyValIterator-12      52558426                66.7 ns/op
```

With typed.ReadBuffer:
```
BenchmarkKeyValIterator-12      34918740               101 ns/op
```

With these optimizations, performance is close to the original:
```
BenchmarkKeyValIterator-12      46994628                76.1 ns/op
```

This change has the following optimizations:

 * Make ReadBuffer smaller by 2 words (replace []byte with int). This
   gets rid of the Fill functionality, which was never used in any
   production code anyway.
 * Don't create a read buffer for single Uint16 read when creating KeyValueIterator
 * Instead of using io.EOF to check for end of all headers, add a Remaining method
   which returns a bool (vs the Next method which returns > 10 words). We were
   spending time just copying the large struct values on the stack after the last
   header read.
 * Simplify how the KeyValueIterator.Next method creates remaining by adding method
   directly to the ReadBuffer.
 * Use named returns in KeyValueIterator.Next which give an unexpected ~6% performance
   improvement. Filed golang/go#40638 since this was unexpected.
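The `Remaining` bullet above can be sketched as follows. The `readBuffer` here is a hypothetical reduction of `typed.ReadBuffer` after the change (the real tchannel-go API differs): it tracks a remaining byte count instead of a second slice, so the end-of-headers check returns a single bool instead of forcing a copy of a large struct after the last read.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// readBuffer is a hypothetical sketch of typed.ReadBuffer after
// the change: one slice plus a remaining count (two words smaller
// than carrying a second []byte).
type readBuffer struct {
	buf       []byte
	remaining int
}

// Remaining reports whether any bytes are left, so callers can
// stop iterating without waiting for an io.EOF from Next.
func (r *readBuffer) Remaining() bool { return r.remaining > 0 }

// ReadUint16 reads a big-endian uint16 and advances the buffer.
func (r *readBuffer) ReadUint16() uint16 {
	v := binary.BigEndian.Uint16(r.buf)
	r.buf = r.buf[2:]
	r.remaining -= 2
	return v
}

// ReadString reads n bytes as a string and advances the buffer.
func (r *readBuffer) ReadString(n int) string {
	s := string(r.buf[:n])
	r.buf = r.buf[n:]
	r.remaining -= n
	return s
}

func main() {
	// Wire format: length-prefixed key, length-prefixed value, repeated.
	data := []byte{0, 1, 'k', 0, 1, 'v'}
	rb := &readBuffer{buf: data, remaining: len(data)}
	for rb.Remaining() {
		key := rb.ReadString(int(rb.ReadUint16()))
		val := rb.ReadString(int(rb.ReadUint16()))
		fmt.Println(key, val)
	}
}
```

The loop condition is a cheap bool check; the expensive multi-word return only happens for headers that actually exist.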
@ianlancetaylor ianlancetaylor changed the title cmd/go: Named returns are faster than inline returns cmd/compile: named returns are faster than inline returns Aug 7, 2020
ianlancetaylor (Contributor) commented Aug 7, 2020

It's a pretty small effect, but it does seem that using a local variable causes the compiler to assemble the value on the stack and then copy it to the result parameter stack location. Using a named result parameter causes the compiler to assemble the value directly in the result parameter stack location.

When using gccgo I see that ReturnIntermediate is slightly slower, and the others are all about the same speed.

CC @randall77

@ianlancetaylor ianlancetaylor added this to the Backlog milestone Aug 7, 2020
dgryski (Contributor) commented Aug 7, 2020

Duplicate of #20859?

ianlancetaylor (Contributor) commented Aug 7, 2020

Thanks. Closing as dup.
