New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: unexpected memequal call in short string comparison #24765

Open
bradfitz opened this Issue Apr 8, 2018 · 11 comments

Comments

Projects
None yet
5 participants
@bradfitz
Member

bradfitz commented Apr 8, 2018

(Using Go tip, 8818b4d)

Compiling this code on amd64,

type BoxType [4]byte

func (t BoxType) EqualString(s string) bool {
        return string(t[:]) == s
}

I would expect that to compile to something pretty minimal, since the memory comparison must always be 4 bytes, but instead I see a call to memequal:

"".BoxType.EqualString STEXT size=103 args=0x20 locals=0x28
        0x0000 00000 (./bmff/bmff.go:62)        TEXT    "".BoxType.EqualString(SB), $40-32
        0x0000 00000 (./bmff/bmff.go:62)        MOVQ    (TLS), CX
        0x0009 00009 (./bmff/bmff.go:62)        CMPQ    SP, 16(CX)
        0x000d 00013 (./bmff/bmff.go:62)        JLS     96
        0x000f 00015 (./bmff/bmff.go:62)        SUBQ    $40, SP
        0x0013 00019 (./bmff/bmff.go:62)        MOVQ    BP, 32(SP)
        0x0018 00024 (./bmff/bmff.go:62)        LEAQ    32(SP), BP
        0x001d 00029 (./bmff/bmff.go:62)        FUNCDATA        $0, gclocals·a20105803dd226ab8faa525f9ceddf12(SB)
        0x001d 00029 (./bmff/bmff.go:62)        FUNCDATA        $1, gclocals·69c1753bd5f81501d95132d08af04464(SB)
        0x001d 00029 (./bmff/bmff.go:63)        MOVQ    "".s+64(SP), AX
        0x0022 00034 (./bmff/bmff.go:63)        CMPQ    AX, $4
        0x0026 00038 (./bmff/bmff.go:63)        JEQ     56
        0x0028 00040 (./bmff/bmff.go:63)        XORL    AX, AX
        0x002a 00042 (./bmff/bmff.go:63)        MOVB    AL, "".~r1+72(SP)
        0x002e 00046 (./bmff/bmff.go:63)        MOVQ    32(SP), BP
        0x0033 00051 (./bmff/bmff.go:63)        ADDQ    $40, SP
        0x0037 00055 (./bmff/bmff.go:63)        RET
        0x0038 00056 (./bmff/bmff.go:63)        LEAQ    "".t+48(SP), AX
        0x003d 00061 (./bmff/bmff.go:63)        MOVQ    AX, (SP)
        0x0041 00065 (./bmff/bmff.go:63)        MOVQ    "".s+56(SP), AX
        0x0046 00070 (./bmff/bmff.go:63)        MOVQ    AX, 8(SP)
        0x004b 00075 (./bmff/bmff.go:63)        MOVQ    $4, 16(SP)
        0x0054 00084 (./bmff/bmff.go:63)        PCDATA  $0, $1
        0x0054 00084 (./bmff/bmff.go:63)        CALL    runtime.memequal(SB)
        0x0059 00089 (./bmff/bmff.go:63)        MOVBLZX 24(SP), AX
        0x005e 00094 (./bmff/bmff.go:63)        JMP     42
        0x0060 00096 (./bmff/bmff.go:63)        NOP
        0x0060 00096 (./bmff/bmff.go:62)        PCDATA  $0, $-1
        0x0060 00096 (./bmff/bmff.go:62)        CALL    runtime.morestack_noctxt(SB)
        0x0065 00101 (./bmff/bmff.go:62)        JMP     0

Ideally we'd just do a 4 byte (potentially unaligned) load from the string into a register and compare with the receiver?

/cc @josharian @randall77 @rasky

@bradfitz bradfitz added the Performance label Apr 8, 2018

@bradfitz bradfitz added this to the Unplanned milestone Apr 8, 2018

@josharian

This comment has been minimized.

Contributor

josharian commented Apr 11, 2018

In case someone wants to look into this, the relevant bit of code to soup up is from walk.go, func walkexpr, near the beginning of case OCMPSTR:

		// Rewrite comparisons to short constant strings as length+byte-wise comparisons.
		var cs, ncs *Node // const string, non-const string
		switch {
		case Isconst(n.Left, CTSTR) && Isconst(n.Right, CTSTR):
			// ignore; will be constant evaluated
		case Isconst(n.Left, CTSTR):
			cs = n.Left
			ncs = n.Right
		case Isconst(n.Right, CTSTR):
			cs = n.Right
			ncs = n.Left
		}

This code tries to detect strings with known length by looking for constant strings. Fixing this issue would involve adding detection of strings converted from byte (and rune) slices of known length.

Might (maybe) be a decent starter compiler issue.

@ChrisALiles

This comment has been minimized.

Contributor

ChrisALiles commented Apr 17, 2018

I’m continuing to try to get started (or maybe continually trying) so I’d like to have a look at this.

@josharian

This comment has been minimized.

Contributor

josharian commented Apr 18, 2018

It might make sense to do this via SSA rewrite rules. See the memmove rewrite rules for reference. Without some digging, I couldn't say offhand which is best.

One argument for putting it somewhere suitable in the front-end is that this code should boil down to returning a constant:

package p

func f() int {
	var a [4]byte
	return len(string(a[:]))
}
@TocarIP

This comment has been minimized.

Contributor

TocarIP commented Jun 28, 2018

@ChrisALiles are you working on this? I've encountered this pattern in internal code, so I was thinking on implementing this optimization.

@bradfitz bradfitz added the NeedsFix label Jun 28, 2018

@ChrisALiles

This comment has been minimized.

Contributor

ChrisALiles commented Jun 28, 2018

@TocarIP

This comment has been minimized.

Contributor

TocarIP commented Jun 29, 2018

@ChrisALiles are you using walk.go or ssa rules based approach? I've implemented something to see if it will speed-up internal code and going to mail it for trybot testing on other architectures, but please continue working on your prototype, as there is plenty of time left until code unfreeze.

@gopherbot

This comment has been minimized.

gopherbot commented Jun 29, 2018

Change https://golang.org/cl/121697 mentions this issue: cmd/compile: inline runtime.memequal if possible

@ChrisALiles

This comment has been minimized.

Contributor

ChrisALiles commented Jun 30, 2018

@gopherbot

This comment has been minimized.

gopherbot commented Aug 21, 2018

Change https://golang.org/cl/130255 mentions this issue: cmd/compile: optimise some small equality comparisons

@ChrisALiles

This comment has been minimized.

Contributor

ChrisALiles commented Aug 21, 2018

CL 130255 is probably only of academic interest at best seeing that CL 121697 has already been submitted, but I thought I would send it anyway because it cost me a fair amount of effort. A couple of notes: 1. I originally tried to find a way to fix this in walk and found a way to handle Josharian's "len" example above, so I left that in. 2. Don't send it to the bots - it fails in test/codegen because I wasn't brave/silly enough to try anything with architectures other than amd64/386.

@TocarIP

This comment has been minimized.

Contributor

TocarIP commented Aug 23, 2018

@ChrisALiles CL 121697 is not submitted, due to trybot failures. My understanding is that main problem is that replacing call to memequal with store to the stack may change stack layout and cause write after stack end. I didn't had time to fix that.

CAFxX added a commit to CAFxX/go that referenced this issue Oct 19, 2018

cmd/compile: inline runtime.memequal if possible
For calls to runtime.memequal with known size, we can just
generate single compare instruction. Do this on all architectures, that
have appropriate sized unaligned load and compare instructions.

Shaves ~8kb from go tool:
go_old 12092977
/localdisk/itocar/golang/bin/go 12080689 [-12288 bytes]

section differences
global text (code) = -5904 bytes (-0.134493%)
read-only data = -2839 bytes (-0.098694%)
Total difference -8743 bytes (-0.112717%)

Fixes golang#24765

Change-Id: I13fcb7ac046df6915521340ec6321465c7f4e93c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment