Recently, I have been working on bitfield optimizations for arm64 and found that the existing bitfield optimizations cause the compiler to generate worse code in some cases.
For example, in the following case the bitfield rewrite rule (SLLconst [sc] (MOVWUreg x)) && isARM64BFMask(sc, 1<<32-1, 0) => (UBFIZ [armBFAuxInt(sc, 32)] x) rewrites uint64(hi) << 18 to UBFIZ. However, we also have the rewrite rule (OR x0 x1:(SLLconst [c] y)) && clobberIfDead(x1) => (ORshiftLL x0 y [c]), which would fold the shift into ORR (shifted register). The latter is clearly better, because it saves one instruction.
e.g.
func or(hi, lo uint32) uint64 {
	return uint64(hi)<<18 | uint64(lo)
}
// the master assembly code:
or STEXT size=32 args=0x10 locals=0x0 funcid=0x0 leaf
0x0000 00000 TEXT or(SB), LEAF|NOFRAME|ABIInternal, $0-16
0x0000 00000 MOVWU hi(FP), R0
0x0004 00004 UBFIZ $18, R0, $32, R0
0x0008 00008 MOVWU lo+4(FP), R1
0x000c 00012 ORR R1, R0, R0
0x0010 00016 MOVD R0, r2+8(FP)
0x0014 00020 RET (R30)
// without the UBFIZ rewrite rule, the assembly code:
or STEXT size=32 args=0x10 locals=0x0 funcid=0x0 leaf
0x0000 00000 TEXT or(SB), LEAF|NOFRAME|ABIInternal, $0-16
0x0000 00000 MOVWU hi(FP), R0
0x0004 00004 MOVWU lo+4(FP), R1
0x0008 00008 ORR R0<<18, R1, R0
0x000c 00012 MOVD R0, r2+8(FP)
0x0010 00016 RET (R30)
We know that the following rewrite rules merge zero/sign extensions into bitfield ops. In the case shown after the rules, they eliminate a zero/sign extension instruction.
- (SRAconst [rc] (MOVWreg x)) && rc < 32 => (SBFX [armBFAuxInt(rc, 32-rc)] x) ...
- (SLLconst [sc] (MOVWUreg x)) && isARM64BFMask(sc, 1<<32-1, 0) => (UBFIZ [armBFAuxInt(sc, 32)] x) ...
- (SRLconst [sc] (MOVWUreg x)) && isARM64BFMask(sc, 1<<32-1, sc) => (UBFX [armBFAuxInt(sc, arm64BFWidth(1<<32-1, sc))] x)
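To make the three rules concrete, here is a minimal Go sketch that models what SBFX, UBFIZ and UBFX compute and checks that each matches "extend, then shift" for a shift of 18. The helper names (modelSBFX, modelUBFIZ, modelUBFX) are hypothetical and only illustrate the instruction semantics. They are not compiler code, and they assume width < 64 and lsb+width <= 64.

```go
package main

import "fmt"

// modelUBFIZ models UBFIZ: keep the low `width` bits of x (zero-extended)
// and shift them left by lsb. Hypothetical helper, assumes width < 64.
func modelUBFIZ(x uint64, lsb, width uint) uint64 {
	mask := uint64(1)<<width - 1
	return (x & mask) << lsb
}

// modelUBFX models UBFX: extract `width` bits starting at bit lsb,
// zero-extended. Hypothetical helper, assumes width < 64.
func modelUBFX(x uint64, lsb, width uint) uint64 {
	mask := uint64(1)<<width - 1
	return (x >> lsb) & mask
}

// modelSBFX models SBFX: extract `width` bits starting at bit lsb,
// sign-extended. The field is moved to the top of the register and then
// shifted back down arithmetically. Assumes lsb+width <= 64.
func modelSBFX(x uint64, lsb, width uint) int64 {
	return int64(x<<(64-lsb-width)) >> (64 - width)
}

func main() {
	// The upper 32 bits are deliberately garbage: the bitfield ops must
	// ignore them, just as MOVWreg/MOVWUreg followed by a shift would.
	x := uint64(0xdeadbeef89abcdef)

	// (SRAconst [18] (MOVWreg x)) => SBFX with lsb=18, width=32-18
	fmt.Println(modelSBFX(x, 18, 14) == int64(int32(uint32(x)))>>18)
	// (SLLconst [18] (MOVWUreg x)) => UBFIZ with lsb=18, width=32
	fmt.Println(modelUBFIZ(x, 18, 32) == uint64(uint32(x))<<18)
	// (SRLconst [18] (MOVWUreg x)) => UBFX with lsb=18, width=32-18
	fmt.Println(modelUBFX(x, 18, 14) == uint64(uint32(x))>>18)
}
```

Each comparison prints true, which is the sense in which one bitfield instruction can stand in for an extension followed by a shift.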
e.g.
func sbfx(a, b int32) int64 {
	hi := a + b
	return int64(hi) >> 18
}
func ubfiz(a, b uint32) uint64 {
	hi := a + b
	return uint64(hi) << 18
}
But in the following case, comparing the assembly code with and without these rewrite rules, both versions need only one instruction for the shift, so the rules bring no benefit. Without these rewrite rules, no zero/sign extension is generated because codegen already performs this optimization: if the value is a properly typed load, it is already zero/sign-extended and is not extended again.
e.g.
func shiftR(hi int32) int64 {
	return int64(hi) >> 18
}
func shiftL(hi uint32) uint64 {
	return uint64(hi) << 18
}
// the master assembly code:
shiftR STEXT size=16 args=0x10 locals=0x0 funcid=0x0 leaf
0x0000 00000 TEXT shiftR(SB), LEAF|NOFRAME|ABIInternal, $0-16
0x0000 00000 MOVW hi(FP), R0
0x0004 00004 SBFX $18, R0, $14, R0
0x0008 00008 MOVD R0, r1+8(FP)
0x000c 00012 RET (R30)
shiftL STEXT size=16 args=0x10 locals=0x0 funcid=0x0 leaf
0x0000 00000 TEXT shiftL(SB), LEAF|NOFRAME|ABIInternal, $0-16
0x0000 00000 MOVWU hi(FP), R0
0x0004 00004 UBFIZ $18, R0, $32, R0
0x0008 00008 MOVD R0, r1+8(FP)
0x000c 00012 RET (R30)
// without the above rules, the assembly code:
shiftR STEXT size=16 args=0x10 locals=0x0 funcid=0x0 leaf
0x0000 00000 TEXT shiftR(SB), LEAF|NOFRAME|ABIInternal, $0-16
0x0000 00000 MOVW hi(FP), R0
0x0004 00004 ASR $18, R0, R0
0x0008 00008 MOVD R0, r1+8(FP)
0x000c 00012 RET (R30)
shiftL STEXT size=16 args=0x10 locals=0x0 funcid=0x0 leaf
0x0000 00000 TEXT shiftL(SB), LEAF|NOFRAME|ABIInternal, $0-16
0x0000 00000 MOVWU hi(FP), R0
0x0004 00004 LSL $18, R0, R0
0x0008 00008 MOVD R0, r1+8(FP)
0x000c 00012 RET (R30)
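The equivalence of the two listings can be checked with a small Go model. This is only a sketch under assumptions: loadW and loadWU stand in for the extending loads MOVW and MOVWU, and asr, lsl, sbfxReg and ubfizReg are hypothetical helpers that model the instructions on a 64-bit register value. None of them are compiler functions.

```go
package main

import "fmt"

// loadW models MOVW: a 32-bit load that sign-extends into 64 bits.
func loadW(v int32) uint64 { return uint64(int64(v)) }

// loadWU models MOVWU: a 32-bit load that zero-extends into 64 bits.
func loadWU(v uint32) uint64 { return uint64(v) }

// asr and lsl model plain 64-bit shifts on a register value.
func asr(r uint64, n uint) uint64 { return uint64(int64(r) >> n) }
func lsl(r uint64, n uint) uint64 { return r << n }

// sbfxReg models SBFX (signed bitfield extract); assumes lsb+width <= 64.
func sbfxReg(r uint64, lsb, width uint) uint64 {
	return uint64(int64(r<<(64-lsb-width)) >> (64 - width))
}

// ubfizReg models UBFIZ (unsigned bitfield insert in zero); assumes width < 64.
func ubfizReg(r uint64, lsb, width uint) uint64 {
	return (r & (uint64(1)<<width - 1)) << lsb
}

func main() {
	for _, v := range []uint32{0, 1, 0x7fffffff, 0x80000000, 0x89abcdef, 0xffffffff} {
		rs := loadW(int32(v)) // register after MOVW
		ru := loadWU(v)       // register after MOVWU
		// Once the load has extended the value, the plain shift and the
		// bitfield op produce the same result.
		fmt.Println(asr(rs, 18) == sbfxReg(rs, 18, 14), lsl(ru, 18) == ubfizReg(ru, 18, 32))
	}
}
```

Every line prints true true: when the operand comes straight from an extending load, ASR/LSL and SBFX/UBFIZ are interchangeable, so the rewrite does not save an instruction.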