Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile/internal/ssa: plan9-amd64 is broken #15001

Closed
mdempsky opened this issue Mar 28, 2016 · 11 comments

Comments

@mdempsky
Copy link
Member

commented Mar 28, 2016

https://go-review.googlesource.com/#/c/21005 broke plan9-amd64. This can be reproduced with

$ GOARCH=amd64 GOOS=plan9 go build syscall
# syscall
./zsysnum_plan9.go:50: invalid instruction: 00322 ($GOROOT/src/syscall/exec_plan9.go:108)   MOVWLZX "".buf+121(SP), DI
./zsysnum_plan9.go:50: invalid instruction: 00416 ($GOROOT/src/syscall/exec_plan9.go:115)   MOVWLZX "".buf+121(SP), R8

Standalone mostly minimized test case at https://gist.github.com/mdempsky/8e17da862a0f3a2c1036. This test case repros even with GOOS=linux, so the failure isn't plan9 specific at all, except that package syscall's plan9-specific code happens to tickle the failure.

Also, it seems to be sensitive to inlining, because removing the definition of Pread causes it to stop failing.

/cc @randall77 @dr2chase

@mdempsky

This comment has been minimized.

Copy link
Member Author

commented Mar 28, 2016

cmd/asm seems to accept:

#include "textflag.h"

TEXT repro(SB),NOSPLIT,$0-0
    MOVWLZX ·buf+121(SP), DI
    MOVWLZX ·buf+121(SP), R8
    RET

Not sure why cmd/internal/obj would be unhappy with cmd/compile generating it.

@randall77

This comment has been minimized.

Copy link
Contributor

commented Mar 28, 2016

I can generate that exact instruction (with different output register) with:

func f() uint16 {
    var b [128]byte
    return uint16(b[121]) | uint16(b[122])<<8
}

and it works fine.
Bootstrapping with SSA off and using that compiler to compile your test case also fails, so it isn't a bug SSA introduces in the compiler itself.
Weird...

@randall77

This comment has been minimized.

Copy link
Contributor

commented Mar 28, 2016

Ah, I see what is happening. Small repro:

func f(i int) uint16 {
    var b [128]byte
    x := b[i:]
    return uint16(x[121]) | uint16(x[122])<<8
}

We're trying to generate MOVWLZX ·buf+121(CX), CX but that seems disallowed, although we could in principle assemble such a thing. The printer assumes and prints a base register of SP when it sees a stack local name, which is why the error is not obvious.
One fix is to disable rewrites that allow non-SP bases in memory loads/stores with symbolic offsets.
Another fix would be to allow instructions like the one above.

@mdempsky

This comment has been minimized.

Copy link
Member Author

commented Mar 29, 2016

I note that trying to assemble:

TEXT repro(SB),1,$0-0
    MOVWLZX buf+121(SP)(CX), CX
    RET

fails with:

asm: invalid instruction: 00000 (repro.s:2) MOVWLZX buf+121(SP), CX

Definitely confusing that cmd/internal/obj isn't printing the (CX) index.

Also notably, changing it to (CX*1) makes it assemble successfully. Is that an easy fix? Is the SSA backend not setting the index scale correctly?

@randall77

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2016

I think this is a separate issue. buf+121(SP)(CX*1) is ok, SSA generates it fine and the assembler assembles it fine. I'd considier buf+121(SP)(CX) as a syntax error, I'm surprised it gets as far as it does through the assembler.

@gopherbot

This comment has been minimized.

Copy link

commented Mar 29, 2016

CL https://golang.org/cl/21245 mentions this issue.

@mdempsky

This comment has been minimized.

Copy link
Member Author

commented Mar 29, 2016

I've uploaded https://go-review.googlesource.com/#/c/21245 which fixes the compilation failure. The result now is:

0x0000 00000 (/tmp/repro.go:3)  TEXT    "".f(SB), $128-16
0x0000 00000 (/tmp/repro.go:3)  MOVQ    (TLS), CX
0x0009 00009 (/tmp/repro.go:3)  CMPQ    SP, 16(CX)
0x000d 00013 (/tmp/repro.go:3)  JLS 115
0x000f 00015 (/tmp/repro.go:3)  SUBQ    $128, SP
0x0016 00022 (/tmp/repro.go:3)  FUNCDATA    $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
0x0016 00022 (/tmp/repro.go:3)  FUNCDATA    $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0016 00022 (/tmp/repro.go:4)  LEAQ    "".b(SP), DI
0x001a 00026 (/tmp/repro.go:4)  XORPS   X0, X0
0x001d 00029 (/tmp/repro.go:4)  DUFFZERO    $266
0x0022 00034 (/tmp/repro.go:5)  MOVQ    "".i+136(FP), CX
0x002a 00042 (/tmp/repro.go:5)  CMPQ    CX, $128
0x0031 00049 (/tmp/repro.go:5)  JHI $0, 108
0x0033 00051 (/tmp/repro.go:5)  LEAQ    -128(CX), AX
0x0037 00055 (/tmp/repro.go:5)  NEGQ    AX
0x003a 00058 (/tmp/repro.go:5)  CMPQ    AX, $0
0x003e 00062 (/tmp/repro.go:5)  JEQ $0, 104
0x0040 00064 (/tmp/repro.go:6)  CMPQ    AX, $121
0x0044 00068 (/tmp/repro.go:6)  JLS $0, 97
0x0046 00070 (/tmp/repro.go:6)  MOVWLZX "".b+121(SP)(CX*1), CX
0x004b 00075 (/tmp/repro.go:6)  CMPQ    AX, $122
0x004f 00079 (/tmp/repro.go:6)  JLS $0, 97
0x0051 00081 (/tmp/repro.go:6)  MOVW    CX, "".~r1+144(FP)
0x0059 00089 (/tmp/repro.go:6)  ADDQ    $128, SP
0x0060 00096 (/tmp/repro.go:6)  RET
0x0061 00097 (/tmp/repro.go:6)  PCDATA  $0, $0
0x0061 00097 (/tmp/repro.go:6)  CALL    runtime.panicindex(SB)
0x0066 00102 (/tmp/repro.go:6)  UNDEF
0x0068 00104 (/tmp/repro.go:5)  MOVQ    $0, CX
0x006a 00106 (/tmp/repro.go:6)  JMP 64
0x006c 00108 (/tmp/repro.go:5)  PCDATA  $0, $0
0x006c 00108 (/tmp/repro.go:5)  CALL    runtime.panicslice(SB)
0x0071 00113 (/tmp/repro.go:5)  UNDEF
0x0073 00115 (/tmp/repro.go:5)  NOP
0x0073 00115 (/tmp/repro.go:3)  CALL    runtime.morestack_noctxt(SB)
0x0078 00120 (/tmp/repro.go:3)  JMP 0

which translates to roughly:

cx = i
if cx > 128 { panicslice() }
ax = 128 - cx
if ax == 0 { cx = 0 }
if ax <= 121 { panicindex() }
cx = *(sp + 121 + cx)
if ax <= 122 { panicindex() }
return cx

Notably the load is occurring before we've validated that the slice length is >122. It's possible this could result in triggering a SIGSEGV instead of a panicindex failure.

Also, I can't quite make sense of the if ax == 0 { cx = 0 } logic, but that's present even before CL 21005.

@mdempsky

This comment has been minimized.

Copy link
Member Author

commented Mar 29, 2016

Notably, if you remove the b = b[:X:len(b)] "hints" from encoding/binary.littleEndian.UintXX, you get the same subtle miscompilation: only the len(b) > 0 check happens before the combined load.

So all that to say, I think CL 21245 fixes the immediate problems with CL 21005 (namely that plan9-amd64 fails to build), but there's still a subtle correctness issue with the load combining rules.

@randall77

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2016

Yuck, yes, we need to be a bit more careful with these rules, don't we. Maybe we need to put the resulting load in a block that is dominated by or equal to the blocks of the original loads.

Now that I look at it, I think we need uses==1 for all of the byte loads as well.

@mdempsky

This comment has been minimized.

Copy link
Member Author

commented Mar 29, 2016

(For the record, plan9-amd64 is actually still broken at da19a0c, but we're at least it's now back to the older #14894 failure.)

@0intro

This comment has been minimized.

Copy link
Member

commented Mar 29, 2016

Thanks for fixing this issue.

@golang golang locked and limited conversation to collaborators Mar 29, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
4 participants
You can’t perform that action at this time.