Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: inefficient assembly code generated by simple loop&branch code #48882

Open
flypig opened this issue Oct 9, 2021 · 3 comments
Open

cmd/compile: inefficient assembly code generated by simple loop&branch code #48882

flypig opened this issue Oct 9, 2021 · 3 comments

Comments

@flypig
Copy link

@flypig flypig commented Oct 9, 2021

What version of Go are you using (go version)?

$ go version
go version go1.17 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

centos8 & amd64

What did you do?

Given the code:

//go:noinline
func test_loop(arr []int) (resp int){
        max := 0
        for i := 0; i < len(arr); i++ {
                if (arr[i] > max) {
                        max = arr[i]
                }
        }
        return max
}

What did you expect to see?

TEXT "".test_loop2(SB), ABIInternal
00001 (28) FUNCDATA $0, gclocals·1a65e721a2ccc325b382662e7ffee780(SB)
00002 (28) FUNCDATA $1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
00003 (28) FUNCDATA $5, "".test_loop2.arginfo1(SB)
00004 (31) XORL CX, CX
00005 (31) XORL DX, DX
00006 (30) JMP 8
00007 (+30) INCQ CX
00008 (+30) CMPQ BX, CX
00009 (30) JLE 15
00010 (+31) MOVQ (AX)(CX*8), SI
00011 (31) CMPQ DX, SI
00012 (31) JGE 7
00013 (31) MOVQ SI, DX
00014 (+34) JMP 7
00015 (+36) MOVQ DX, AX
00016 (36) RET
00017 (?) END

What did you see instead?

00000 (40) TEXT "".test_loop3(SB), ABIInternal
00001 (40) FUNCDATA $0, gclocals·1a65e721a2ccc325b382662e7ffee780(SB)
00002 (40) FUNCDATA $1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
00003 (40) FUNCDATA $5, "".test_loop3.arginfo1(SB)
00004 (43) XORL CX, CX
00005 (43) XORL DX, DX
00006 (42) JMP 9
00007 (+42) INCQ CX
00008 (43) MOVQ SI, DX
00009 (+42) CMPQ BX, CX
00010 (42) JLE 16
00011 (+43) MOVQ (AX)(CX*8), SI
00012 (43) CMPQ DX, SI
00013 (43) JLE 7
00014 (43) MOVQ DX, SI
00015 (43) JMP 7
00016 (+47) MOVQ DX, AX
00017 (47) RET
00018 (?) END

The above assembly code has 2 more invalid move operations than the first code. Can someone help explain the reason?

@seankhliao seankhliao changed the title cmd:compile: inefficient assembly code generated by simple loop&branch code cmd/compile: inefficient assembly code generated by simple loop&branch code Oct 9, 2021
@randall77
Copy link
Contributor

@randall77 randall77 commented Oct 9, 2021

This is probably the register allocator making a nonoptimal choice.

There's a phi operation which is merging arr[i] and max. arr[i] is in SI and max is in DX. It decides to put the phi result in SI. It would be better to put in in DX because it needs to end up in DX for the next iteration. The code is just arbitrarily picking between the registers of the phi inputs, and chooses wrong. Looks like phi register allocation could use some lookahead when there are multiple good choices for the phi register.

The real question is why this code isn't generating conditional moves. Looks like branchelim doesn't quite work because prove has left some dead code around that confuses branchelim. Hmmm...

@flypig
Copy link
Author

@flypig flypig commented Oct 10, 2021

@randall77 thx for answering, it helps me a lot.

If i rewrite the code like this, it will be OK.

func test_loop2(arr []int) (resp int) {
	max := 0
	for i := 0; i < len(arr); i++ {
		if (arr[i] <= max) {
			continue
		}
		max = arr[i]
	}
	return max
}

Look into the ssa.html ->
screenshot-20211010-110427

The problematic code ssa.html ->
screenshot-20211010-110441

What confuses me is that,these two codes are almost the same in the schedule phase. Why are different registers allocated during the RegAlloc phase?

@renthraysk
Copy link

@renthraysk renthraysk commented Oct 12, 2021

The only method I've found that consistently generates conditional moves is to use a helper function.

func tern(b bool, t,f int) int {
    if b { return t }
    return f
}

func test_loop(arr []int) (resp int) {
	max := 0
	for i := 0; i < len(arr); i++ {
            max = tern(arr[i] > max, arr[i], max)
	}
	return max
}

https://go.godbolt.org/z/ox57Pvobr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants