Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: function argument value escapes no matter the run-time condition #42781

Open
Julio-Guerra opened this issue Nov 23, 2020 · 8 comments
Open

Comments

@Julio-Guerra
Copy link

@Julio-Guerra Julio-Guerra commented Nov 23, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15.5 linux/amd64

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/julio/.cache/go-build"
GOENV="/home/julio/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/julio/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/julio/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go-1.15"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go-1.15/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build392370816=/tmp/go-build -gno-record-gcc-switches"

What did you do?

While benchmarking a time histogram implementation, I found an unexpected allocation in a IncrementOrStore() method implementation, happening directly in the prolog code the compiler generated, where the return value is allocated regardless of whether it's needed or not.

I could reproduce the issue with the following dummy function:

func foo(cond bool, v int) *int {
    if cond {
        return &v
    }
    return nil
}

The compiled function of the previous example will allocate the int value in its function prolog rather than in the true-condition path:

TEXT    "".foo(SB), ABIInternal, $24-24
MOVQ    (TLS), CX
CMPQ    SP, 16(CX)
PCDATA  $0, $-2
JLS     102
PCDATA  $0, $-1
SUBQ    $24, SP
MOVQ    BP, 16(SP)
LEAQ    16(SP), BP
FUNCDATA        $0, gclocals·54241e171da8af6ae173d69da0236748(SB)
FUNCDATA        $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
LEAQ    type.int(SB), AX
MOVQ    AX, (SP)
PCDATA  $1, $0
CALL    runtime.newobject(SB) // <--- The int value is allocated in the function prolog, no matter the condition
MOVQ    8(SP), AX
MOVQ    "".v+40(SP), CX
MOVQ    CX, (AX)
MOVBLZX "".cond+32(SP), CX
NOP
TESTB   CL, CL
JEQ     83
MOVQ    AX, "".~r2+48(SP)
MOVQ    16(SP), BP
ADDQ    $24, SP
RET
MOVQ    $0, "".~r2+48(SP)
MOVQ    16(SP), BP
ADDQ    $24, SP
RET
NOP
PCDATA  $1, $-1
PCDATA  $0, $-2
CALL    runtime.morestack_noctxt(SB)
PCDATA  $0, $-1
JMP     0

The workaround I use for now is to rather return the value from the condition block scope:

func foo(cond bool, v int) *int {
    if cond {
+       v := v // new declaration in this scope to avoid using function scope
        return &v
    }
    return nil
}

Resulting in:

TEXT    "".foo(SB), ABIInternal, $24-24
MOVQ    (TLS), CX
CMPQ    SP, 16(CX)
PCDATA  $0, $-2
JLS     101
PCDATA  $0, $-1
SUBQ    $24, SP
MOVQ    BP, 16(SP)
LEAQ    16(SP), BP
FUNCDATA        $0, gclocals·54241e171da8af6ae173d69da0236748(SB)
FUNCDATA        $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
MOVBLZX "".cond+32(SP), AX
TESTB   AL, AL
JEQ     82
LEAQ    type.int(SB), AX
MOVQ    AX, (SP)
PCDATA  $1, $0
CALL    runtime.newobject(SB)  // <--- The int value is no longer allocated by the function prolog, and according to the condition
MOVQ    8(SP), AX
MOVQ    "".v+40(SP), CX
MOVQ    CX, (AX)
MOVQ    AX, "".~r2+48(SP)
MOVQ    16(SP), BP
ADDQ    $24, SP
RET
MOVQ    $0, "".~r2+48(SP)
MOVQ    16(SP), BP
ADDQ    $24, SP
RET
NOP
PCDATA  $1, $-1
PCDATA  $0, $-2
CALL    runtime.morestack_noctxt(SB)
PCDATA  $0, $-1
JMP     0
@cuonglm
Copy link
Member

@cuonglm cuonglm commented Nov 23, 2020

Would you mind elaborating more details? Both your programs above will be heap allocated the v parameter.

@Julio-Guerra
Copy link
Author

@Julio-Guerra Julio-Guerra commented Nov 23, 2020

Would you mind elaborating more details? Both your programs above will be heap allocated the v parameter.

I update the issue with the generated asm, and highlighting the change in the allocation position: one is always, while the other is conditional.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 23, 2020

Yes, the escape analysis is per-variable and is not flow-sensitive. Making it flow sensitive would require identifying when a variable can safely be split--basically, automatically inserting the v := v that you use in your second example. This seems doable in principle, but it doesn't seem easy. I'm not sure that it would help many real programs.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 23, 2020

@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Nov 23, 2020

This is one case where if we do escape analysis in SSA it probably can be done fairly easily. (We've been thinking about it for a while. But it needs a lot of work.)

@dr2chase
Copy link
Contributor

@dr2chase dr2chase commented Nov 24, 2020

I'd prefer you didn't call it "wrong".
Lots of code is slower than it could be if we focused our attention on that one single piece of code.

@rogpeppe
Copy link
Contributor

@rogpeppe rogpeppe commented Mar 12, 2021

I'm not sure that it would help many real programs.

I found this issue (or at least a related issue) when looking at a real-world scenario that might be quite common.

It's common to have tracing functions that only need to record occasionally. When tracing isn't
enabled, we'd prefer a low overhead and no allocations.

Here's an example, derived from some real-world code:

https://play.golang.org/p/z4O9gg8UOkp

func f(t *Tracer) {
	t.Trace(1, 2, 3, 4)
}

type ExternalTracer interface {
	Trace(...int)
}

type Tracer struct {
	enabled bool
	tracer  ExternalTracer
}

func (t *Tracer) Trace(args ...int) {
	if !t.enabled {
		return
	}
	t.tracer.Trace(args...)
}

When I run the above in a benchmark, I see this:

BenchmarkArgs-8         	57396238	        22.91 ns/op	      32 B/op	       1 allocs/op
BenchmarkArgsWithIf-8   	1000000000	         0.2498 ns/op	       0 B/op	       0 allocs/op

That is, the argument slice is always being allocated, even when tracing is not enabled.

Currently the only way of working around this issue a bit tedious: put an explicit if statement around
each call.

@rogpeppe
Copy link
Contributor

@rogpeppe rogpeppe commented Mar 12, 2021

Currently the only way of working around this issue a bit tedious: put an explicit if statement around
each call.

Actually, I've just realised that's not the case. We can work around it by adding an explicit allocation:

func (t *Tracer) Trace(args ...int) {
	if !t.enabled {
		return
	}
	args1 := append([]int(nil), args...)
	t.tracer.Trace(args1...)
}

It's not totally ideal (if Trace is itself called via an interface, then we'll be copying an already-copied arg list), but it's a nice workaround for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants