Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: inefficient assembly code generated by simple loop&branch code #48882

Open
flypig opened this issue Oct 9, 2021 · 5 comments
Open

cmd/compile: inefficient assembly code generated by simple loop&branch code #48882

flypig opened this issue Oct 9, 2021 · 5 comments
Labels
NeedsFix Performance
Milestone

Comments

@flypig
Copy link

@flypig flypig commented Oct 9, 2021

What version of Go are you using (go version)?

$ go version
go version go1.17 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

centos8 & amd64

What did you do?

Given the code:

//go:noinline
func test_loop(arr []int) (resp int){
        max := 0
        for i := 0; i < len(arr); i++ {
                if (arr[i] > max) {
                        max = arr[i]
                }
        }
        return max
}

What did you expect to see?

TEXT "".test_loop2(SB), ABIInternal
00001 (28) FUNCDATA $0, gclocals·1a65e721a2ccc325b382662e7ffee780(SB)
00002 (28) FUNCDATA $1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
00003 (28) FUNCDATA $5, "".test_loop2.arginfo1(SB)
00004 (31) XORL CX, CX
00005 (31) XORL DX, DX
00006 (30) JMP 8
00007 (+30) INCQ CX
00008 (+30) CMPQ BX, CX
00009 (30) JLE 15
00010 (+31) MOVQ (AX)(CX*8), SI
00011 (31) CMPQ DX, SI
00012 (31) JGE 7
00013 (31) MOVQ SI, DX
00014 (+34) JMP 7
00015 (+36) MOVQ DX, AX
00016 (36) RET
00017 (?) END

What did you see instead?

00000 (40) TEXT "".test_loop3(SB), ABIInternal
00001 (40) FUNCDATA $0, gclocals·1a65e721a2ccc325b382662e7ffee780(SB)
00002 (40) FUNCDATA $1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
00003 (40) FUNCDATA $5, "".test_loop3.arginfo1(SB)
00004 (43) XORL CX, CX
00005 (43) XORL DX, DX
00006 (42) JMP 9
00007 (+42) INCQ CX
00008 (43) MOVQ SI, DX
00009 (+42) CMPQ BX, CX
00010 (42) JLE 16
00011 (+43) MOVQ (AX)(CX*8), SI
00012 (43) CMPQ DX, SI
00013 (43) JLE 7
00014 (43) MOVQ DX, SI
00015 (43) JMP 7
00016 (+47) MOVQ DX, AX
00017 (47) RET
00018 (?) END

The above assembly code has 2 more invalid move operations than the first code. Can someone help explain the reason?

@seankhliao seankhliao changed the title cmd:compile: inefficient assembly code generated by simple loop&branch code cmd/compile: inefficient assembly code generated by simple loop&branch code Oct 9, 2021
@seankhliao seankhliao added the NeedsInvestigation label Oct 9, 2021
@randall77
Copy link
Contributor

@randall77 randall77 commented Oct 9, 2021

This is probably the register allocator making a nonoptimal choice.

There's a phi operation which is merging arr[i] and max. arr[i] is in SI and max is in DX. It decides to put the phi result in SI. It would be better to put in in DX because it needs to end up in DX for the next iteration. The code is just arbitrarily picking between the registers of the phi inputs, and chooses wrong. Looks like phi register allocation could use some lookahead when there are multiple good choices for the phi register.

The real question is why this code isn't generating conditional moves. Looks like branchelim doesn't quite work because prove has left some dead code around that confuses branchelim. Hmmm...

@randall77 randall77 added Performance NeedsFix and removed NeedsInvestigation labels Oct 9, 2021
@randall77 randall77 added this to the Unplanned milestone Oct 9, 2021
@flypig
Copy link
Author

@flypig flypig commented Oct 10, 2021

@randall77 thx for answering, it helps me a lot.

If i rewrite the code like this, it will be OK.

func test_loop2(arr []int) (resp int) {
	max := 0
	for i := 0; i < len(arr); i++ {
		if (arr[i] <= max) {
			continue
		}
		max = arr[i]
	}
	return max
}

Look into the ssa.html ->
screenshot-20211010-110427

The problematic code ssa.html ->
screenshot-20211010-110441

What confuses me is that,these two codes are almost the same in the schedule phase. Why are different registers allocated during the RegAlloc phase?

@renthraysk
Copy link

@renthraysk renthraysk commented Oct 12, 2021

The only method I've found that consistently generates conditional moves is to use a helper function.

func tern(b bool, t,f int) int {
    if b { return t }
    return f
}

func test_loop(arr []int) (resp int) {
	max := 0
	for i := 0; i < len(arr); i++ {
            max = tern(arr[i] > max, arr[i], max)
	}
	return max
}

https://go.godbolt.org/z/ox57Pvobr

@gopherbot
Copy link

@gopherbot gopherbot commented Nov 2, 2021

Change https://golang.org/cl/360674 mentions this issue: cmd/compile: regalloc reuse the register of the phi argument that is referenced more in the phi arguments

@hanchaoqun
Copy link
Contributor

@hanchaoqun hanchaoqun commented Nov 2, 2021

Complete comparison of results after applying CL https://golang.org/cl/360674 compilation for go binary itself:

  old new diff
addr2line 3613104 3604912 -8192
api 5058320 5041936 -16384
asm 4526816 4514528 -12288
buildid 2395120 2391024 -4096
cgo 4251120 4238832 -12288
compile 22882048 22775552 -106496
cover 4423616 4415424 -8192
dist 3115360 3111264 -4096
doc 3602576 3594384 -8192
fix 3045712 3037520 -8192
link 6247472 6235184 -12288
nm 3583904 3575712 -8192
objdump 3965696 3957504 -8192
pack 2152400 2148304 -4096
pprof 13390864 13358096 -32768
test2json 2385296 2381200 -4096
trace 10123216 10106832 -16384
vet 6840944 6824560 -16384

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix Performance
Projects
None yet
Development

No branches or pull requests

6 participants