cmd/internal/obj/arm64, cmd/compile: improve offset folding on ARM64
The ARM64 assembler backend only accepts loads and stores with a small
or aligned offset, so the compiler can only fold small or aligned
offsets into loads and stores. For locals and args, the offsets from
SP are not known until very late, so the compiler conservatively
declines to fold some of them. In most cases, however, the offset is
in fact small or aligned and could be folded into the load or store,
but is not.

This CL adds support for loads and stores with large or unaligned
offsets. When the offset doesn't fit into the instruction, the
assembler uses two instructions and (for very large offsets) the
constant pool. This way, the compiler doesn't need to be conservative,
and can simply fold the offset.
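
For example (the decoding can be read off the test encodings added
below), MOVD 0x8008(R1), R2 is assembled into an ADD of the high part
of the offset into the temp register, followed by a load with the
small remainder:

	ADD  $0x8000, R1, R27  // R27 is REGTMP
	MOVD 8(R27), R2        // remaining offset fits the instruction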

To make this work, the assembler's optab matching rules need to
change. Before, MOVD accepted C_UAUTO32K, which matches multiples of 8
between 0 and 32K, and also C_UAUTO16K, whose offsets may not be
multiples of 8 and thus may not fit into a MOVD instruction; the
assembler errored out in the latter case. This change makes MOVD match
only multiples of 8 (or offsets within ±256, which also fit in the
instruction), and uses the large-or-unaligned-offset rule for anything
that doesn't fit, without erroring. The rules for the other sized
moves are changed similarly.
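
As a sketch (mine, not the assembler's actual optab code), the new
matching rule for MOVD offsets amounts to:

	// fitsMOVD sketches the new MOVD offset rule: multiples of 8 in
	// [0, 32760] fit the scaled unsigned 12-bit immediate, and values
	// in [-256, 255] fit the unscaled signed 9-bit immediate; anything
	// else takes the large-or-unaligned path.
	func fitsMOVD(off int64) bool {
		if -256 <= off && off <= 255 {
			return true
		}
		return 0 <= off && off <= 32760 && off%8 == 0
	}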

The classes C_UAUTO64K and C_UOREG64K are removed, as they are never
used.

In shared library mode, loads and stores of globals are rewritten to
go through the GOT using the temp register, which conflicts with using
the temp register to assemble large offsets. So the folding is
disabled for globals in shared library mode.
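
In the rewrite rules below, this shows up as an extra guard on every
fold. A minimal Go sketch of the combined condition (parameter names
are mine):

	// canFold mirrors is32Bit(off1+off2) && (ptr.Op != OpSB ||
	// !config.ctxt.Flag_shared) from the new rules.
	func canFold(off1, off2 int64, baseIsSB, sharedMode bool) bool {
		off := off1 + off2
		return off == int64(int32(off)) && (!baseIsSB || !sharedMode)
	}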

Reduce cmd/go binary size by 2%.

name                     old time/op    new time/op    delta
BinaryTree17-8              8.67s ± 0%     8.61s ± 0%   -0.60%  (p=0.000 n=9+10)
Fannkuch11-8                6.24s ± 0%     6.19s ± 0%   -0.83%  (p=0.000 n=10+9)
FmtFprintfEmpty-8           116ns ± 0%     116ns ± 0%     ~     (all equal)
FmtFprintfString-8          196ns ± 0%     192ns ± 0%   -1.89%  (p=0.000 n=10+10)
FmtFprintfInt-8             199ns ± 0%     198ns ± 0%   -0.35%  (p=0.001 n=9+10)
FmtFprintfIntInt-8          294ns ± 0%     293ns ± 0%   -0.34%  (p=0.000 n=8+8)
FmtFprintfPrefixedInt-8     318ns ± 1%     318ns ± 1%     ~     (p=1.000 n=10+10)
FmtFprintfFloat-8           537ns ± 0%     531ns ± 0%   -1.17%  (p=0.000 n=9+10)
FmtManyArgs-8              1.19µs ± 1%    1.18µs ± 1%   -1.41%  (p=0.001 n=10+10)
GobDecode-8                17.2ms ± 1%    17.3ms ± 2%     ~     (p=0.165 n=10+10)
GobEncode-8                14.7ms ± 1%    14.7ms ± 2%     ~     (p=0.631 n=10+10)
Gzip-8                      837ms ± 0%     836ms ± 0%   -0.14%  (p=0.006 n=9+10)
Gunzip-8                    141ms ± 0%     139ms ± 0%   -1.24%  (p=0.000 n=9+10)
HTTPClientServer-8          256µs ± 1%     253µs ± 1%   -1.35%  (p=0.000 n=10+10)
JSONEncode-8               40.1ms ± 1%    41.3ms ± 1%   +3.06%  (p=0.000 n=10+9)
JSONDecode-8                157ms ± 1%     156ms ± 1%   -0.83%  (p=0.001 n=9+8)
Mandelbrot200-8            8.94ms ± 0%    8.94ms ± 0%   +0.02%  (p=0.000 n=9+9)
GoParse-8                  8.69ms ± 0%    8.54ms ± 1%   -1.69%  (p=0.000 n=8+10)
RegexpMatchEasy0_32-8       227ns ± 1%     228ns ± 1%   +0.48%  (p=0.016 n=10+9)
RegexpMatchEasy0_1K-8      1.92µs ± 0%    1.63µs ± 0%  -15.08%  (p=0.000 n=10+9)
RegexpMatchEasy1_32-8       256ns ± 0%     251ns ± 0%   -2.19%  (p=0.000 n=10+9)
RegexpMatchEasy1_1K-8      2.38µs ± 0%    2.09µs ± 0%  -12.49%  (p=0.000 n=10+9)
RegexpMatchMedium_32-8      352ns ± 0%     354ns ± 0%   +0.39%  (p=0.002 n=10+9)
RegexpMatchMedium_1K-8      106µs ± 0%     106µs ± 0%   -0.05%  (p=0.005 n=10+9)
RegexpMatchHard_32-8       5.92µs ± 0%    5.89µs ± 0%   -0.40%  (p=0.000 n=9+8)
RegexpMatchHard_1K-8        180µs ± 0%     179µs ± 0%   -0.14%  (p=0.000 n=10+9)
Revcomp-8                   1.20s ± 0%     1.13s ± 0%   -6.29%  (p=0.000 n=9+8)
Template-8                  159ms ± 1%     154ms ± 1%   -3.14%  (p=0.000 n=9+10)
TimeParse-8                 800ns ± 3%     769ns ± 1%   -3.91%  (p=0.000 n=10+10)
TimeFormat-8                826ns ± 2%     817ns ± 2%   -1.04%  (p=0.050 n=10+10)
[Geo mean]                  145µs          143µs        -1.79%

Change-Id: I5fc42087cee9b54ea414f8ef6d6d020b80eb5985
Reviewed-on: https://go-review.googlesource.com/42172
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
cherrymui committed May 9, 2017
1 parent 5e0bcb3 commit fb0ccc5
Showing 8 changed files with 562 additions and 258 deletions.
54 changes: 54 additions & 0 deletions src/cmd/asm/internal/asm/testdata/arm64.s
@@ -85,6 +85,60 @@ TEXT foo(SB), DUPOK|NOSPLIT, $-8
MOVD $1, R1
MOVD ZR, (R1)

// small offset fits into instructions
MOVB 1(R1), R2 // 22048039
MOVH 1(R1), R2 // 22108078
MOVH 2(R1), R2 // 22048079
MOVW 1(R1), R2 // 221080b8
MOVW 4(R1), R2 // 220480b9
MOVD 1(R1), R2 // 221040f8
MOVD 8(R1), R2 // 220440f9
FMOVS 1(R1), F2 // 221040bc
FMOVS 4(R1), F2 // 220440bd
FMOVD 1(R1), F2 // 221040fc
FMOVD 8(R1), F2 // 220440fd
MOVB R1, 1(R2) // 41040039
MOVH R1, 1(R2) // 41100078
MOVH R1, 2(R2) // 41040079
MOVW R1, 1(R2) // 411000b8
MOVW R1, 4(R2) // 410400b9
MOVD R1, 1(R2) // 411000f8
MOVD R1, 8(R2) // 410400f9
FMOVS F1, 1(R2) // 411000bc
FMOVS F1, 4(R2) // 410400bd
FMOVD F1, 1(R2) // 411000fc
FMOVD F1, 8(R2) // 410400fd
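// for reference, 220440f9 above (MOVD 8(R1), R2) is LDR with base R1,
// destination R2, and scaled 12-bit immediate 1, i.e. byte offset 8 --
// the single-instruction aligned-offset form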

// large aligned offset, use two instructions
MOVB 0x1001(R1), R2 // MOVB 4097(R1), R2 // 3b04409162078039
MOVH 0x2002(R1), R2 // MOVH 8194(R1), R2 // 3b08409162078079
MOVW 0x4004(R1), R2 // MOVW 16388(R1), R2 // 3b104091620780b9
MOVD 0x8008(R1), R2 // MOVD 32776(R1), R2 // 3b204091620740f9
FMOVS 0x4004(R1), F2 // FMOVS 16388(R1), F2 // 3b104091620740bd
FMOVD 0x8008(R1), F2 // FMOVD 32776(R1), F2 // 3b204091620740fd
MOVB R1, 0x1001(R2) // MOVB R1, 4097(R2) // 5b04409161070039
MOVH R1, 0x2002(R2) // MOVH R1, 8194(R2) // 5b08409161070079
MOVW R1, 0x4004(R2) // MOVW R1, 16388(R2) // 5b104091610700b9
MOVD R1, 0x8008(R2) // MOVD R1, 32776(R2) // 5b204091610700f9
FMOVS F1, 0x4004(R2) // FMOVS F1, 16388(R2) // 5b104091610700bd
FMOVD F1, 0x8008(R2) // FMOVD F1, 32776(R2) // 5b204091610700fd

// very large or unaligned offset uses constant pool
// the encoding cannot be checked as the address of the constant pool is unknown.
// here we only test that they can be assembled.
MOVB 0x44332211(R1), R2 // MOVB 1144201745(R1), R2
MOVH 0x44332211(R1), R2 // MOVH 1144201745(R1), R2
MOVW 0x44332211(R1), R2 // MOVW 1144201745(R1), R2
MOVD 0x44332211(R1), R2 // MOVD 1144201745(R1), R2
FMOVS 0x44332211(R1), F2 // FMOVS 1144201745(R1), F2
FMOVD 0x44332211(R1), F2 // FMOVD 1144201745(R1), F2
MOVB R1, 0x44332211(R2) // MOVB R1, 1144201745(R2)
MOVH R1, 0x44332211(R2) // MOVH R1, 1144201745(R2)
MOVW R1, 0x44332211(R2) // MOVW R1, 1144201745(R2)
MOVD R1, 0x44332211(R2) // MOVD R1, 1144201745(R2)
FMOVS F1, 0x44332211(R2) // FMOVS F1, 1144201745(R2)
FMOVD F1, 0x44332211(R2) // FMOVD F1, 1144201745(R2)
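// a plausible expansion (an assumption, not shown in this listing): the
// assembler materializes the offset in REGTMP via the literal pool and
// uses a register-offset access, e.g.
//	MOVD $0x44332211, R27
//	MOVD (R1)(R27), R2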

//
// MOVK
//
10 changes: 10 additions & 0 deletions src/cmd/compile/internal/gc/asm_test.go
@@ -1424,6 +1424,16 @@ var linuxARM64Tests = []*asmTest{
`,
[]string{"\tAND\t"},
},
{
// make sure offsets are folded into load and store.
`
func f36(_, a [20]byte) (b [20]byte) {
b = a
return
}
`,
[]string{"\tMOVD\t\"\"\\.a\\+[0-9]+\\(RSP\\), R[0-9]+", "\tMOVD\tR[0-9]+, \"\"\\.b\\+[0-9]+\\(RSP\\)"},
},
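// the regexps above should match SP-relative moves with the offset
// folded in, e.g. (offsets illustrative):
//	MOVD "".a+16(RSP), R3
//	MOVD R3, "".b+8(RSP)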
}

var linuxMIPSTests = []*asmTest{
115 changes: 76 additions & 39 deletions src/cmd/compile/internal/ssa/gen/ARM64.rules
@@ -525,103 +525,141 @@
(ADDconst [off1] (MOVDaddr [off2] {sym} ptr)) -> (MOVDaddr [off1+off2] {sym} ptr)

// fold address into load/store
-(MOVBload [off1] {sym} (ADDconst [off2] ptr) mem) && fitsARM64Offset(off1+off2, 1, sym) ->
+(MOVBload [off1] {sym} (ADDconst [off2] ptr) mem) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVBload [off1+off2] {sym} ptr mem)
-(MOVBUload [off1] {sym} (ADDconst [off2] ptr) mem) && fitsARM64Offset(off1+off2, 1, sym) ->
+(MOVBUload [off1] {sym} (ADDconst [off2] ptr) mem) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVBUload [off1+off2] {sym} ptr mem)
-(MOVHload [off1] {sym} (ADDconst [off2] ptr) mem) && fitsARM64Offset(off1+off2, 2, sym) ->
+(MOVHload [off1] {sym} (ADDconst [off2] ptr) mem) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVHload [off1+off2] {sym} ptr mem)
-(MOVHUload [off1] {sym} (ADDconst [off2] ptr) mem) && fitsARM64Offset(off1+off2, 2, sym) ->
+(MOVHUload [off1] {sym} (ADDconst [off2] ptr) mem) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVHUload [off1+off2] {sym} ptr mem)
-(MOVWload [off1] {sym} (ADDconst [off2] ptr) mem) && fitsARM64Offset(off1+off2, 4, sym) ->
+(MOVWload [off1] {sym} (ADDconst [off2] ptr) mem) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVWload [off1+off2] {sym} ptr mem)
-(MOVWUload [off1] {sym} (ADDconst [off2] ptr) mem) && fitsARM64Offset(off1+off2, 4, sym) ->
+(MOVWUload [off1] {sym} (ADDconst [off2] ptr) mem) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVWUload [off1+off2] {sym} ptr mem)
-(MOVDload [off1] {sym} (ADDconst [off2] ptr) mem) && fitsARM64Offset(off1+off2, 8, sym) ->
+(MOVDload [off1] {sym} (ADDconst [off2] ptr) mem) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVDload [off1+off2] {sym} ptr mem)
-(FMOVSload [off1] {sym} (ADDconst [off2] ptr) mem) && fitsARM64Offset(off1+off2, 4, sym) ->
+(FMOVSload [off1] {sym} (ADDconst [off2] ptr) mem) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(FMOVSload [off1+off2] {sym} ptr mem)
-(FMOVDload [off1] {sym} (ADDconst [off2] ptr) mem) && fitsARM64Offset(off1+off2, 8, sym) ->
+(FMOVDload [off1] {sym} (ADDconst [off2] ptr) mem) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(FMOVDload [off1+off2] {sym} ptr mem)

-(MOVBstore [off1] {sym} (ADDconst [off2] ptr) val mem) && fitsARM64Offset(off1+off2, 1, sym) ->
+(MOVBstore [off1] {sym} (ADDconst [off2] ptr) val mem) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVBstore [off1+off2] {sym} ptr val mem)
-(MOVHstore [off1] {sym} (ADDconst [off2] ptr) val mem) && fitsARM64Offset(off1+off2, 2, sym) ->
+(MOVHstore [off1] {sym} (ADDconst [off2] ptr) val mem) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVHstore [off1+off2] {sym} ptr val mem)
-(MOVWstore [off1] {sym} (ADDconst [off2] ptr) val mem) && fitsARM64Offset(off1+off2, 4, sym) ->
+(MOVWstore [off1] {sym} (ADDconst [off2] ptr) val mem) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVWstore [off1+off2] {sym} ptr val mem)
-(MOVDstore [off1] {sym} (ADDconst [off2] ptr) val mem) && fitsARM64Offset(off1+off2, 8, sym) ->
+(MOVDstore [off1] {sym} (ADDconst [off2] ptr) val mem) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVDstore [off1+off2] {sym} ptr val mem)
-(FMOVSstore [off1] {sym} (ADDconst [off2] ptr) val mem) && fitsARM64Offset(off1+off2, 4, sym) ->
+(FMOVSstore [off1] {sym} (ADDconst [off2] ptr) val mem) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(FMOVSstore [off1+off2] {sym} ptr val mem)
-(FMOVDstore [off1] {sym} (ADDconst [off2] ptr) val mem) && fitsARM64Offset(off1+off2, 8, sym) ->
+(FMOVDstore [off1] {sym} (ADDconst [off2] ptr) val mem) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(FMOVDstore [off1+off2] {sym} ptr val mem)
-(MOVBstorezero [off1] {sym} (ADDconst [off2] ptr) mem) && fitsARM64Offset(off1+off2, 1, sym) ->
+(MOVBstorezero [off1] {sym} (ADDconst [off2] ptr) mem) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVBstorezero [off1+off2] {sym} ptr mem)
-(MOVHstorezero [off1] {sym} (ADDconst [off2] ptr) mem) && fitsARM64Offset(off1+off2, 2, sym) ->
+(MOVHstorezero [off1] {sym} (ADDconst [off2] ptr) mem) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVHstorezero [off1+off2] {sym} ptr mem)
-(MOVWstorezero [off1] {sym} (ADDconst [off2] ptr) mem) && fitsARM64Offset(off1+off2, 4, sym) ->
+(MOVWstorezero [off1] {sym} (ADDconst [off2] ptr) mem) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVWstorezero [off1+off2] {sym} ptr mem)
-(MOVDstorezero [off1] {sym} (ADDconst [off2] ptr) mem) && fitsARM64Offset(off1+off2, 8, sym) ->
+(MOVDstorezero [off1] {sym} (ADDconst [off2] ptr) mem) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVDstorezero [off1+off2] {sym} ptr mem)

(MOVBload [off1] {sym1} (MOVDaddr [off2] {sym2} ptr) mem)
-	&& canMergeSym(sym1,sym2) && fitsARM64Offset(off1+off2, 1, mergeSym(sym1, sym2)) ->
+	&& canMergeSym(sym1,sym2) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVBload [off1+off2] {mergeSym(sym1,sym2)} ptr mem)
(MOVBUload [off1] {sym1} (MOVDaddr [off2] {sym2} ptr) mem)
-	&& canMergeSym(sym1,sym2) && fitsARM64Offset(off1+off2, 1, mergeSym(sym1, sym2)) ->
+	&& canMergeSym(sym1,sym2) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVBUload [off1+off2] {mergeSym(sym1,sym2)} ptr mem)
(MOVHload [off1] {sym1} (MOVDaddr [off2] {sym2} ptr) mem)
-	&& canMergeSym(sym1,sym2) && fitsARM64Offset(off1+off2, 2, mergeSym(sym1, sym2)) ->
+	&& canMergeSym(sym1,sym2) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVHload [off1+off2] {mergeSym(sym1,sym2)} ptr mem)
(MOVHUload [off1] {sym1} (MOVDaddr [off2] {sym2} ptr) mem)
-	&& canMergeSym(sym1,sym2) && fitsARM64Offset(off1+off2, 2, mergeSym(sym1, sym2)) ->
+	&& canMergeSym(sym1,sym2) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVHUload [off1+off2] {mergeSym(sym1,sym2)} ptr mem)
(MOVWload [off1] {sym1} (MOVDaddr [off2] {sym2} ptr) mem)
-	&& canMergeSym(sym1,sym2) && fitsARM64Offset(off1+off2, 4, mergeSym(sym1, sym2)) ->
+	&& canMergeSym(sym1,sym2) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVWload [off1+off2] {mergeSym(sym1,sym2)} ptr mem)
(MOVWUload [off1] {sym1} (MOVDaddr [off2] {sym2} ptr) mem)
-	&& canMergeSym(sym1,sym2) && fitsARM64Offset(off1+off2, 4, mergeSym(sym1, sym2)) ->
+	&& canMergeSym(sym1,sym2) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVWUload [off1+off2] {mergeSym(sym1,sym2)} ptr mem)
(MOVDload [off1] {sym1} (MOVDaddr [off2] {sym2} ptr) mem)
-	&& canMergeSym(sym1,sym2) && fitsARM64Offset(off1+off2, 8, mergeSym(sym1, sym2)) ->
+	&& canMergeSym(sym1,sym2) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVDload [off1+off2] {mergeSym(sym1,sym2)} ptr mem)
(FMOVSload [off1] {sym1} (MOVDaddr [off2] {sym2} ptr) mem)
-	&& canMergeSym(sym1,sym2) && fitsARM64Offset(off1+off2, 4, mergeSym(sym1, sym2)) ->
+	&& canMergeSym(sym1,sym2) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(FMOVSload [off1+off2] {mergeSym(sym1,sym2)} ptr mem)
(FMOVDload [off1] {sym1} (MOVDaddr [off2] {sym2} ptr) mem)
-	&& canMergeSym(sym1,sym2) && fitsARM64Offset(off1+off2, 8, mergeSym(sym1, sym2)) ->
+	&& canMergeSym(sym1,sym2) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(FMOVDload [off1+off2] {mergeSym(sym1,sym2)} ptr mem)

(MOVBstore [off1] {sym1} (MOVDaddr [off2] {sym2} ptr) val mem)
-	&& canMergeSym(sym1,sym2) && fitsARM64Offset(off1+off2, 1, mergeSym(sym1, sym2)) ->
+	&& canMergeSym(sym1,sym2) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVBstore [off1+off2] {mergeSym(sym1,sym2)} ptr val mem)
(MOVHstore [off1] {sym1} (MOVDaddr [off2] {sym2} ptr) val mem)
-	&& canMergeSym(sym1,sym2) && fitsARM64Offset(off1+off2, 2, mergeSym(sym1, sym2)) ->
+	&& canMergeSym(sym1,sym2) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVHstore [off1+off2] {mergeSym(sym1,sym2)} ptr val mem)
(MOVWstore [off1] {sym1} (MOVDaddr [off2] {sym2} ptr) val mem)
-	&& canMergeSym(sym1,sym2) && fitsARM64Offset(off1+off2, 4, mergeSym(sym1, sym2)) ->
+	&& canMergeSym(sym1,sym2) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVWstore [off1+off2] {mergeSym(sym1,sym2)} ptr val mem)
(MOVDstore [off1] {sym1} (MOVDaddr [off2] {sym2} ptr) val mem)
-	&& canMergeSym(sym1,sym2) && fitsARM64Offset(off1+off2, 8, mergeSym(sym1, sym2)) ->
+	&& canMergeSym(sym1,sym2) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVDstore [off1+off2] {mergeSym(sym1,sym2)} ptr val mem)
(FMOVSstore [off1] {sym1} (MOVDaddr [off2] {sym2} ptr) val mem)
-	&& canMergeSym(sym1,sym2) && fitsARM64Offset(off1+off2, 4, mergeSym(sym1, sym2)) ->
+	&& canMergeSym(sym1,sym2) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(FMOVSstore [off1+off2] {mergeSym(sym1,sym2)} ptr val mem)
(FMOVDstore [off1] {sym1} (MOVDaddr [off2] {sym2} ptr) val mem)
-	&& canMergeSym(sym1,sym2) && fitsARM64Offset(off1+off2, 8, mergeSym(sym1, sym2)) ->
+	&& canMergeSym(sym1,sym2) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(FMOVDstore [off1+off2] {mergeSym(sym1,sym2)} ptr val mem)
(MOVBstorezero [off1] {sym1} (MOVDaddr [off2] {sym2} ptr) mem)
-	&& canMergeSym(sym1,sym2) && fitsARM64Offset(off1+off2, 1, mergeSym(sym1, sym2)) ->
+	&& canMergeSym(sym1,sym2) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVBstorezero [off1+off2] {mergeSym(sym1,sym2)} ptr mem)
(MOVHstorezero [off1] {sym1} (MOVDaddr [off2] {sym2} ptr) mem)
-	&& canMergeSym(sym1,sym2) && fitsARM64Offset(off1+off2, 2, mergeSym(sym1, sym2)) ->
+	&& canMergeSym(sym1,sym2) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVHstorezero [off1+off2] {mergeSym(sym1,sym2)} ptr mem)
(MOVWstorezero [off1] {sym1} (MOVDaddr [off2] {sym2} ptr) mem)
-	&& canMergeSym(sym1,sym2) && fitsARM64Offset(off1+off2, 4, mergeSym(sym1, sym2)) ->
+	&& canMergeSym(sym1,sym2) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVWstorezero [off1+off2] {mergeSym(sym1,sym2)} ptr mem)
(MOVDstorezero [off1] {sym1} (MOVDaddr [off2] {sym2} ptr) mem)
-	&& canMergeSym(sym1,sym2) && fitsARM64Offset(off1+off2, 8, mergeSym(sym1, sym2)) ->
+	&& canMergeSym(sym1,sym2) && is32Bit(off1+off2)
+	&& (ptr.Op != OpSB || !config.ctxt.Flag_shared) ->
	(MOVDstorezero [off1+off2] {mergeSym(sym1,sym2)} ptr mem)
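// For reference, is32Bit (defined in rewrite.go) reports whether the
// combined offset is representable as a signed 32-bit constant, i.e.
// n == int64(int32(n)); offsets the instruction cannot encode are now
// materialized by the assembler instead of blocking the fold.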

// store zero
@@ -1211,7 +1249,6 @@
y0:(MOVDnop x0:(MOVBUload [i1] {s} p mem))
y1:(MOVDnop x1:(MOVBUload [i0] {s} p mem)))
&& i1 == i0+1
-&& fitsARM64Offset(i0, 2, s)
&& x0.Uses == 1 && x1.Uses == 1
&& y0.Uses == 1 && y1.Uses == 1
&& mergePoint(b,x0,x1) != nil
14 changes: 0 additions & 14 deletions src/cmd/compile/internal/ssa/rewrite.go
@@ -283,20 +283,6 @@ func isAuto(s interface{}) bool {
return ok
}

func fitsARM64Offset(off, align int64, sym interface{}) bool {
// only small offset (between -256 and 256) or offset that is a multiple of data size
// can be encoded in the instructions
// since this rewriting takes place before stack allocation, the offset to SP is unknown,
// so don't do it for args and locals with unaligned offset
if !is32Bit(off) {
return false
}
if align == 1 {
return true
}
return !isArg(sym) && (off%align == 0 || off < 256 && off > -256 && !isAuto(sym))
}

// isSameSym returns whether sym is the same as the given named symbol
func isSameSym(sym interface{}, name string) bool {
s, ok := sym.(fmt.Stringer)
