New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: CSE some function calls args/results #20934

josharian opened this Issue Jul 6, 2017 · 5 comments


None yet
4 participants

josharian commented Jul 6, 2017

package p

var b bool

func f(x, y int) int {
	var r int
	if b {
		r = g(x, y)
	} else {
		r = h(x, y)
	return r

func g(x, y int) int { return 0 }

func h(x, y int) int { return 0 }

This kind of code ("call either g or h with the same args") is not uncommon. It is compiled suboptimally; there is lots of duplicate code to set up and read the stack:

"".f STEXT size=122 args=0x18 locals=0x20
	0x0000 00000 (x.go:5)	TEXT	"".f(SB), $32-24
	0x0000 00000 (x.go:5)	MOVQ	(TLS), CX
	0x0009 00009 (x.go:5)	CMPQ	SP, 16(CX)
	0x000d 00013 (x.go:5)	JLS	115
	0x000f 00015 (x.go:5)	SUBQ	$32, SP
	0x0013 00019 (x.go:5)	MOVQ	BP, 24(SP)
	0x0018 00024 (x.go:5)	LEAQ	24(SP), BP
	0x001d 00029 (x.go:5)	FUNCDATA	$0, gclocals·54241e171da8af6ae173d69da0236748(SB)
	0x001d 00029 (x.go:5)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x001d 00029 (x.go:7)	MOVBLZX	"".b(SB), AX
	0x0024 00036 (x.go:7)	TESTB	AL, AL
	0x0026 00038 (x.go:7)	JEQ	84
	0x0028 00040 (x.go:7)	MOVQ	"".x+40(SP), AX
	0x002d 00045 (x.go:8)	MOVQ	AX, (SP)
	0x0031 00049 (x.go:8)	MOVQ	"".y+48(SP), AX
	0x0036 00054 (x.go:8)	MOVQ	AX, 8(SP)
	0x003b 00059 (x.go:8)	PCDATA	$0, $0
	0x003b 00059 (x.go:8)	CALL	"".g(SB)
	0x0040 00064 (x.go:8)	MOVQ	16(SP), AX
	0x0045 00069 (x.go:12)	MOVQ	AX, "".~r2+56(SP)
	0x004a 00074 (x.go:12)	MOVQ	24(SP), BP
	0x004f 00079 (x.go:12)	ADDQ	$32, SP
	0x0053 00083 (x.go:12)	RET
	0x0054 00084 (x.go:12)	MOVQ	"".x+40(SP), AX
	0x0059 00089 (x.go:10)	MOVQ	AX, (SP)
	0x005d 00093 (x.go:10)	MOVQ	"".y+48(SP), AX
	0x0062 00098 (x.go:10)	MOVQ	AX, 8(SP)
	0x0067 00103 (x.go:10)	PCDATA	$0, $0
	0x0067 00103 (x.go:10)	CALL	"".h(SB)
	0x006c 00108 (x.go:10)	MOVQ	16(SP), AX
	0x0071 00113 (x.go:7)	JMP	69
	0x0073 00115 (x.go:7)	NOP
	0x0073 00115 (x.go:5)	PCDATA	$0, $-1
	0x0073 00115 (x.go:5)	CALL	runtime.morestack_noctxt(SB)
	0x0078 00120 (x.go:5)	JMP	0

Setting up the stack for the call and processing the return could be CSE'd to dominating and post-dominating blocks. In pseudo code:

func f(x, y int) int {
	var r int
        arg0 = x
        arg1 = y
	if b {
		call g
	} else {
		call h
        r = ret0
	return r

I suspect that at least for function arguments, our current CSE pass would work, but for the prohibition on CSEing values of memory type.

At the end of the generic cse pass, the code above looks like:

f <T>
    v1 = InitMem <mem>
    v2 = SP <uintptr>
    v3 = SB <uintptr>
    v6 = Addr <*int> {~r2} v2
    v7 = Arg <int> {x}
    v8 = Arg <int> {y}
    v10 = Addr <*bool> {"".b} v3
    v11 = Load <bool> v10 v1
    v13 = OffPtr <*int> [0] v2
    v17 = OffPtr <*int> [8] v2
    v20 = OffPtr <*int> [16] v2
    If v11 -> b2 b4
  b2: <- b1
    v15 = Store <mem> {int} v13 v7 v1
    v18 = Store <mem> {int} v17 v8 v15
    v19 = StaticCall <mem> {"".g} [24] v18
    v21 = Load <int> v20 v19
    Plain -> b3
  b3: <- b2 b4
    v29 = Phi <int> v21 v28
    v30 = Phi <mem> v19 v27
    v31 = VarDef <mem> {~r2} v30
    v32 = Store <mem> {int} v6 v29 v31
    Ret v32
  b4: <- b1
    v24 = Store <mem> {int} v13 v7 v1
    v26 = Store <mem> {int} v17 v8 v24
    v27 = StaticCall <mem> {"".h} [24] v26
    v28 = Load <int> v20 v27
    Plain -> b3

Observe that v15/v24 and v18/v26 certainly look ripe for CSE.

Rolling back the prohibition on CSEing memory values would require care, though, since that helped a lot with CSE toolspeed problems. Continuing to prohibit CSE of calls might be a workable middle ground.

cc @randall77 @tzneal @dr2chase

@josharian josharian added this to the Go1.10 milestone Jul 6, 2017


This comment has been minimized.


randall77 commented Jul 7, 2017

Our calling convention allows callees to modify their input args on the stack. So I don't think you can CSE v15 & v24, as the first call may modify what v15 put on the stack.
We could change the calling convention, of course. I don't think many callees rely on this ability. But nonetheless it's not a trivial change.


This comment has been minimized.


josharian commented Jul 7, 2017

So I don't think you can CSE v15 & v24, as the first call may modify what v15 put on the stack.

Can you explain a bit more? I still don't see how this could cause trouble, at least as long as the sequence of CSE'd writes are all setting up the stack before a call.


This comment has been minimized.


randall77 commented Jul 7, 2017

Sorry, I was misinterpreting what you were suggesting. The calls are in different branches, not in serial.

Yes, we could CSE those writes. We haven't been CSEing any pair where one instance does not dominate the other (memory type or otherwise). In general, this is to avoid putting work at the dominator that doesn't need to be done on some path out of the dominator. But when that work needs to be done on every path out of that dominator, we could move the values up with no chance of doing unneeded work.


This comment has been minimized.


randall77 commented Jul 7, 2017

For a simpler example, try

func f(x int, p *int, b bool) {
	if b {
		*p = x * 5
	} else {
		*p = x * 5

Currently we don't CSE the x*5 computation. It would be nice if we could, and such an optimization is probably a prerequisite for the optimization you're talking about in this issue.


This comment has been minimized.


dr2chase commented Jul 7, 2017

In doing the "all dominated paths share expression" CSE, we should exclude panic edges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment