New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: less optimized AMD64 code #24537

Open
benshi001 opened this Issue Mar 26, 2018 · 3 comments

Comments

Projects
None yet
4 participants
@benshi001
Member

benshi001 commented Mar 26, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

the newest master branch on March 25, 2018

Does this issue reproduce with the latest release?

not tried

What operating system and processor architecture are you using (go env)?

amd64/ubuntu16.04

What did you do?

test commit 4b00d3f#diff-6b1f55ed5d66ec08fb9cbe82259791e9

func ssa(a []int) bool {
        return a[0] > a[1]
}

is compiled to

        00000 (2)       TEXT    "".ssa(SB)
        00001 (2)       FUNCDATA        $0, gclocals·4032f753396f2012ad1784f398b170f4(SB)
        00002 (2)       FUNCDATA        $1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
 v13    00003 (2)       MOVQ    "".a+8(SP), AX
 v9     00004 (3)       TESTQ   AX, AX
 b1     00005 (3)       JLS     13
 v25    00006 (3)       MOVQ    "".a(SP), CX
 v16    00007 (3)       MOVQ    (CX), DX
 v3     00008 (3)       CMPQ    AX, $1
 b2     00009 (3)       JLS     13
 v4     00010 (3)       CMPQ    8(CX), DX
 v27    00011 (3)       SETLT   "".~r1+24(SP)
 b4     00012 (3)       RET
 v12    00013 (3)       PCDATA  $0, $1
 v12    00014 (3)       CALL    runtime.panicindex(SB)
 b3     00015 (3)       UNDEF
        00016 (?)       END

which is expected. but

func ssb(a []int) (bool, bool) {
        return a[0] > a[1], a[2] != a[3]
}

is compiled to

        00000 (6)       TEXT    "".ssb(SB)
    	00001 (6)       FUNCDATA        $0, gclocals·4032f753396f2012ad1784f398b170f4(SB)
    	00002 (6)       FUNCDATA        $1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
v7     00003 (6)       MOVQ    "".a+8(SP), AX
v4     00004 (7)       TESTQ   AX, AX
b1     00005 (7)       JLS     24
v38    00006 (7)       MOVQ    "".a(SP), CX
v17    00007 (7)       MOVQ    (CX), DX
v16    00008 (7)       CMPQ    AX, $1
b2     00009 (7)       JLS     24
v10    00010 (7)       MOVQ    8(CX), BX
v12    00011 (7)       CMPQ    BX, DX
v24    00012 (7)       CMPQ    AX, $2
b4     00013 (7)       JLS     24
v34    00014 (7)       MOVQ    16(CX), SI
v33    00015 (7)       CMPQ    AX, $3
b5     00016 (7)       JLS     24
v43    00017 (7)       MOVQ    24(CX), AX
v8     00018 (7)       CMPQ    AX, SI
v40    00019 (7)       CMPQ    BX, DX
v46    00020 (7)       SETLT   "".~r1+24(SP)
v14    00021 (7)       CMPQ    AX, SI
v48    00022 (7)       SETNE   "".~r2+25(SP)
b6     00023 (7)       RET
v13    00024 (7)       PCDATA  $0, $1
v13    00025 (7)       CALL    runtime.panicindex(SB)
 b3     00026 (7)       UNDEF
		00027 (?)       END

The issue here

  1. a dead "v8 00018 (7) CMPQ AX, SI" is not eliminated
  2. CMPQmem are not used as expected
@andybons

This comment has been minimized.

Member

andybons commented Mar 26, 2018

@TocarIP

This comment has been minimized.

Contributor

TocarIP commented Mar 26, 2018

Looks like after https://go-review.googlesource.com/86035 flagalloc pass can generate DEAD ops and we don't run late deadcode after flagalloc, this explains 1.

@randall77

This comment has been minimized.

Contributor

randall77 commented Mar 28, 2018

For your first point, it has always been the case that when flagalloc needs to "spill" and "restore" flags, it leaves the old flag computation behind. It shouldn't be too hard to eliminate those comparisons. It wouldn't even require a full deadcode pass, because we know the args of the comparison are still used.

For your second point, this is working as expected, sort of. CMPQmem are generated, but they need to be undone during flagalloc. The problem is that we first schedule the CMPQmem, then a CMPQ for the bounds check. Because the bounds check clobbers flags, flagalloc must move the flag computation later, but it doesn't think it is safe to move the load later (it is safe, in this case, but it isn't in general).
I don't know whether we should fix this by making the scheduling smarter (moving the bounds checks earlier, maybe?) or teaching flagalloc to know when it is ok to move the load later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment