New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: merge sequential memory moves into bigger memory moves #25866

Open
valyala opened this Issue Jun 13, 2018 · 2 comments

Comments

Projects
None yet
3 participants
@valyala
Contributor

valyala commented Jun 13, 2018

The issue

Go tip compiles the following code into the following assembly:

func fff(s string) int {
        return strings.IndexByte(s, ' ')
}
EXT fff(SB)
func fff(s string) int {
  0x50f060              64488b0c25f8ffffff      MOVQ FS:0xfffffff8, CX
  0x50f069              483b6110                CMPQ 0x10(CX), SP
  0x50f06d              763f                    JBE 0x50f0ae
  0x50f06f              4883ec28                SUBQ $0x28, SP
  0x50f073              48896c2420              MOVQ BP, 0x20(SP)
  0x50f078              488d6c2420              LEAQ 0x20(SP), BP
        return strings.IndexByte(s, ' ')
  0x50f07d              488b442430              MOVQ 0x30(SP), AX
  0x50f082              48890424                MOVQ AX, 0(SP)
  0x50f086              488b442438              MOVQ 0x38(SP), AX
  0x50f08b              4889442408              MOVQ AX, 0x8(SP)
  0x50f090              c644241020              MOVB $0x20, 0x10(SP)
  0x50f095              e8b635efff              CALL strings.IndexByte(SB)
  0x50f09a              488b442418              MOVQ 0x18(SP), AX
  0x50f09f              4889442440              MOVQ AX, 0x40(SP)
  0x50f0a4              488b6c2420              MOVQ 0x20(SP), BP
  0x50f0a9              4883c428                ADDQ $0x28, SP
  0x50f0ad              c3                      RET
func fff(s string) int {
  0x50f0ae              e85d7df4ff              CALL runtime.morestack_noctxt(SB)
  0x50f0b3              ebab                    JMP fff(SB)

The assembly copies sequential stack arguments from fff to IndexByte using the following instructions:

  0x50f07d              488b442430              MOVQ 0x30(SP), AX
  0x50f082              48890424                MOVQ AX, 0(SP)
  0x50f086              488b442438              MOVQ 0x38(SP), AX
  0x50f08b              4889442408              MOVQ AX, 0x8(SP)

These instructions may be substituted by shorter (and, probably, faster) assembly:

MOVO 0x30(SP), Xn
MOVO Xn, 0(SP)

Solution

Add the corresponding arch-specific rewrite rules for merging sequential memory moves into bigger memory moves

cc'ing @randall77 and @TocarIP .

@TocarIP

This comment has been minimized.

Contributor

TocarIP commented Jun 13, 2018

We've tried this in https://go-review.googlesource.com/c/go/+/57130 but it caused performance regression #23424 and had to be reverted

@valyala

This comment has been minimized.

Contributor

valyala commented Jun 15, 2018

@TocarIP , probably, the CL must be restricted only to aligned load / store pairs as noted by @ulikunitz at #23424 (comment) ?

Now we can avoid regressions from #23424 by testing the CL against benchmarks from the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment