Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: emit (MAX|MIN)SD instead of UCOMISD when possible #31272

Closed
agnivade opened this issue Apr 5, 2019 · 4 comments

Comments

Projects
None yet
3 participants
@agnivade
Copy link
Member

commented Apr 5, 2019

What version of Go are you using (go version)?

$ go version
go version devel +4091cf972a Sun Mar 31 23:35:35 2019 +0000 linux/amd64

Consider this:

func max(x, y float64) float64 {
	if x > y {
		return x
	}
	return y
}

This gets compiled to

0x0000 00000 (max.go:5) MOVSD   "".x+8(SP), X0
0x0006 00006 (max.go:5) MOVSD   "".y+16(SP), X1
0x000c 00012 (max.go:5) UCOMISD X1, X0
0x0010 00016 (max.go:5) JLS     25
0x0012 00018 (max.go:6) MOVSD   X0, "".~r2+24(SP)
0x0018 00024 (max.go:6) RET
0x0019 00025 (max.go:8) MOVSD   X1, "".~r2+24(SP)
0x001f 00031 (max.go:8) RET

Whereas, it could have been written as

MOVSD x+8(SP), X0
MOVSD y+16(SP), X1
MAXSD X1, X0
MOVSD X0, ret+24(SP)
RET

Benchmarks show a 12.6% improvement, when the assembly call overhead is ignored.

I am wondering if it is possible to detect patterns like these and insert a (MAX/MIN)SD instead of a UCOMISD and a Jump ? Might also be possible to use a FCMOV, but that is also not implemented yet.

Is this possible with a simple rewrite rule or does it require deeper compiler knowledge to achieve this ? Happy to work on it if it is the former :)

FYI, the SSA during the lower pass is

b1:-
v1 (?) = InitMem <mem>
v2 (?) = SP <uintptr>
v6 (?) = LEAQ <*float64> {~r2} v2
v7 (4) = Arg <float64> {x} (x[float64])
v8 (4) = Arg <float64> {y} (y[float64])
v11 (+5) = UCOMISD <flags> v7 v8
v10 (+5) = SETGF <bool> v11
v12 (+5) = TESTB <flags> v10 v10
UGT v11 → b3 b2 (5)
b2: ← b1-
v17 (8) = VarDef <mem> {~r2} v1
v18 (+8) = MOVSDstore <mem> {~r2} v2 v8 v17
Ret v18 (+8)
b3: ← b1-
v13 (6) = VarDef <mem> {~r2} v1
v14 (+6) = MOVSDstore <mem> {~r2} v2 v7 v13
Ret v14 (+6)

@josharian @randall77 @martisch

@josharian

This comment has been minimized.

Copy link
Contributor

commented Apr 5, 2019

The change probably isn’t too intricate, although one can never be sure without actually implementing.

However, floating point operations have a bunch of weird corner cases. It’d be good to double-check that those instructions handle all the corner cases the same way Go does. @dr2chase might be able to weigh in usefully here.

@dr2chase

This comment has been minimized.

Copy link
Contributor

commented Apr 5, 2019

The corner cases involve Nan and -0, and using the instruction probably will get us closer to best practices (in general, NaN op x is the same as x op NaN is NaN, and I did not say "equals" because NaN == NaN is false).

Oops -- if we have a defined Max function, yes, the instruction is probably better. But as a replacement here, nope, NaNs I think mess things up.

Double oops, that instruction is weird. I'm sure there's a reason, but it's not IEEE.

@agnivade

This comment has been minimized.

Copy link
Member Author

commented Apr 5, 2019

So you are suggesting not to use this instruction due to NaN complications ?

The instruction does have some documentation on how it handles NaN:

If the values being compared are both 0.0s (of either sign), the value in the second source operand is returned. If a value in the second source operand is an SNaN, that SNaN is returned unchanged to the destination (that is, a QNaN version of the SNaN is not returned).

If only one value is a NaN (SNaN or QNaN) for this instruction, the second source operand, either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN of either source operand be returned, the action of MAXSD can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.

https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf (4-18 Vol2B)

It seems for the NaN cases, always the second operand is returned (which may or may not be a Nan). So it looks like we don't want such a behavior ?

@dr2chase

This comment has been minimized.

Copy link
Contributor

commented Apr 5, 2019

For legal programs, optimizations are supposed to not change behavior, so no.

@agnivade agnivade closed this Apr 6, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.