Note: as far as I know, this issue is not reachable with any Go source today, but it may be a roadblock as we do more with SIMD.
When performing register allocation, rematerializable values are not issued immediately.
Instead, when the value is used as an input to a later op, `allocValToReg` copies the value and then assigns the copy a register chosen from its `mask` argument.
This `mask` argument is the register constraint on the consuming op's input at the point of use. But the rematerializable value also has output register constraints, and those are ignored entirely. Thus register allocation may assign a register that the value's op cannot actually write.
As far as I know, this isn't reachable today because only a few ops are rematerializable and only a few ops with conflicting input constraints can take them as an input. The obvious case is a rematerializable op with a GPR output whose users require an FP input.
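To make the failure mode concrete, here is a toy model in Go. It is not the compiler's code: `regMask`, `pickReg`, `buggyRemat` and the four-register numbering are all hypothetical. It only mimics the behavior described above, where the register for a rematerialized value is picked from the consumer's input mask while the producing op's output mask is ignored, so a GP-only constant can land in an FP register.

```go
package main

import (
	"fmt"
	"math/bits"
)

// regMask is a hypothetical bitset of allocatable registers for this toy
// model; it is not the compiler's own type.
type regMask uint64

// Hypothetical register numbering: bits 0-1 are general-purpose registers,
// bits 2-3 are floating-point/vector registers.
const (
	AX regMask = 1 << iota
	BX
	X0
	X1
)

var names = map[regMask]string{AX: "AX", BX: "BX", X0: "X0", X1: "X1"}

const (
	gpMask regMask = AX | BX
	fpMask regMask = X0 | X1
)

// pickReg returns the lowest-numbered register in mask that is not in used,
// or 0 if no such register is free.
func pickReg(mask, used regMask) regMask {
	free := mask &^ used
	if free == 0 {
		return 0
	}
	return regMask(1) << bits.TrailingZeros64(uint64(free))
}

// buggyRemat mimics the reported behavior: only the consumer's input mask
// is consulted when placing the rematerialized value.
func buggyRemat(inputMask, outputMask, used regMask) regMask {
	_ = outputMask // deliberately ignored; this is the bug being modeled
	return pickReg(inputMask, used)
}

func main() {
	// A MOVQconst-like value can only be written to a GP register, but its
	// POR-like consumer wants an FP register. X0 is already occupied.
	r := buggyRemat(fpMask, gpMask, X0)
	fmt.Println(names[r], r&gpMask != 0)
}
```

Running it prints `X1 false`: the constant is placed in X1 even though its op can only write a GP register, the same shape as the invalid `XORL X1, X1` instructions in the build failure for the CL.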
https://go.dev/cl/629815 demonstrates this issue by manually using `POR` with an int constant. You can build it with `GOARCH=amd64 GOAMD64=v2 go build runtime`, which fails with:

```
# runtime
<autogenerated>:1: runtime.(*pageBits).popcntRange: invalid instruction: 00077 (/usr/local/google/home/mpratt/src/go/src/runtime/mpallocbits.go:116) XORL X1, X1
<autogenerated>:1: runtime.(*pageBits).popcntRange: invalid instruction: 00135 (/usr/local/google/home/mpratt/src/go/src/runtime/mpallocbits.go:113) XORL X0, X0
<autogenerated>:1: runtime.(*mspan).countAlloc: invalid instruction: 00038 (/usr/local/google/home/mpratt/src/go/src/runtime/mbitmap.go:1418) XORL X1, X1
<autogenerated>:1: runtime.sweepLocked.countAlloc: invalid instruction: 00048 (<autogenerated>:1) XORL X1, X1
<autogenerated>:1: runtime.(*liveUserArenaChunk).countAlloc: invalid instruction: 00050 (<autogenerated>:1) XORL X1, X1
<autogenerated>:1: runtime.liveUserArenaChunk.countAlloc: invalid instruction: 00053 (<autogenerated>:1) XORL X1, X1
<autogenerated>:1: runtime.(*sweepLocked).countAlloc: invalid instruction: 00050 (<autogenerated>:1) XORL X1, X1
<autogenerated>:1: go:(**mspan).runtime.countAlloc: invalid instruction: 00059 (<autogenerated>:1) XORL X1, X1
<autogenerated>:1: runtime.(*pageCache).allocN: invalid instruction: 00175 (/usr/local/google/home/mpratt/src/go/src/runtime/mpagecache.go:63) XORL X1, X1
<autogenerated>:1: runtime.(*pageAlloc).allocToCache: invalid instruction: 00506 (/usr/local/google/home/mpratt/src/go/src/runtime/mpagecache.go:171) XORL X1, X1
<autogenerated>:1: too many errors
```
Looking at the SSA of `runtime.sweepLocked.countAlloc`, we see:

```
(+1418) v47 = POPCNTQ <int> v46 : SI
(-1418) v34 = Copy <int> v47 : X0
(-1418) v38 = MOVQconst <int> [0] : X1
(1418) v48 = POR <int> v34 v38 : X0
...
(-1418) v33 = Copy <int> v48 : SI
(1418) v50 = ADDQ <int> v58 v33 : BX (count[int])
```
The problem here is that `MOVQconst` is assigned register X1, even though its output is constrained to GP registers only.
It is unclear to me what we want to happen here. Probably one of:

1. Register allocation emits an extra `Copy` when a rematerializable value's output constraint is incompatible with its consumer's input constraint, similar to how v34 and v33 above copy between GP and FP registers.
2. Or, earlier passes should prevent incompatible values from ever becoming arguments to later ops; e.g., some pass would convert `MOVQconst` to `MOVSDconst`.
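Option 1 could look roughly like the following, again in terms of a hypothetical toy model rather than the real allocator: first try a register that satisfies both the output mask of the rematerializable op and the input mask of the consumer; if the two masks are disjoint, materialize the value in a register its op can write and add a copy into a register the consumer accepts. `regMask`, `pickReg`, `placement` and `rematFor` are illustrative names, not existing compiler functions.

```go
package main

import (
	"fmt"
	"math/bits"
)

// regMask is a hypothetical register bitset for this toy model.
type regMask uint64

// Hypothetical registers: AX, BX are GP; X0, X1 are FP/vector.
const (
	AX regMask = 1 << iota
	BX
	X0
	X1
)

const (
	gpMask regMask = AX | BX
	fpMask regMask = X0 | X1
)

// pickReg returns the lowest-numbered free register in mask, or 0.
func pickReg(mask, used regMask) regMask {
	free := mask &^ used
	if free == 0 {
		return 0
	}
	return regMask(1) << bits.TrailingZeros64(uint64(free))
}

// placement records where a rematerialized value is produced and where its
// consumer reads it from.
type placement struct {
	def  regMask // register the rematerializable op writes
	use  regMask // register the consumer reads (equal to def if no copy)
	copy bool    // whether an extra Copy from def to use is needed
}

// rematFor honors both the op's output mask and the consumer's input mask,
// inserting a copy only when the two masks do not overlap. It reports false
// if no suitable register is free.
func rematFor(inputMask, outputMask, used regMask) (placement, bool) {
	if r := pickReg(inputMask&outputMask, used); r != 0 {
		return placement{def: r, use: r}, true
	}
	def := pickReg(outputMask, used)
	if def == 0 {
		return placement{}, false
	}
	use := pickReg(inputMask, used|def)
	if use == 0 {
		return placement{}, false
	}
	return placement{def: def, use: use, copy: true}, true
}

func main() {
	// GP-only constant feeding an FP consumer, with X0 already occupied.
	p, ok := rematFor(fpMask, gpMask, X0)
	fmt.Println(ok, p.def == AX, p.use == X1, p.copy)
}
```

For the `MOVQconst`/`POR` case this yields a GP definition in AX plus a copy into X1, analogous to the GP/FP copies v34 and v33 in the SSA dump. When the masks overlap, no copy is added, so the common case stays as cheap as it is today.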
Option (2) hints at why I don't think this can be triggered from Go code today: the main incompatibility is between GP and FP registers, but FP registers are currently used almost exclusively for floating-point operations, so their inputs are already floats. If we start doing more with SIMD, this will be less true, as many operations treat the FP registers as vectors of non-float types.
cc @golang/compiler @randall77