Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: intrinsify more of math/bits on amd64 #28273

Closed
bradfitz opened this issue Oct 18, 2018 · 6 comments

Comments

Projects
None yet
4 participants
@bradfitz
Copy link
Member

commented Oct 18, 2018

Now that more API has been added to math/bits (#24813), this is a tracking bug to intrinsify more of it in the compiler for amd64.

As @randall77 said in #24813 (comment) ...

Only Mul has been intrinsified so far.
I think we want Add/Sub intrinsified also (or better, recognize the Go code and generate the add-with-carry instruction).

@bmkessler

This comment has been minimized.

Copy link
Contributor

commented Oct 19, 2018

I originally had added the Div64 intrinsic for amd64, but there were concerns during review about the behavior when the quotient won't fit into 64-bits. In particular, the pure go implementation ported from math/big returns a value of 1<<64 - 1, 1<<64 - 1 in this case.

if hi >= y {
	return 1<<64 - 1, 1<<64 - 1
}

whereas the DIVQ instruction will cause a divide by zero exception. Additionally, the 32-bit pure go version just truncates quo to 32-bits, which is not consistent either.

quo, rem = uint32(z/uint64(y)), uint32(z%uint64(y))

Currently, the documentation from the original proposal states:

hi must be < y otherwise the behavior is undefined (the quotient won't fit into quo).

The concern was that "undefined" shouldn't encompass both returning a value in one implementation and panic in another.

I propose that the behavior when hi >= y should be well-defined so that an intrinsic can be implemented consistently across platforms; either to panic or return a sensible value. I recommend changing the definition to panic when hi >= y since returning a value when the actual result does not fit in the output variable could be the source of subtle bugs. A panic will force users to handle hi >= y correctly if that is a possibility in their code. It also aligns with the behavior of the DIVQ instruction on amd64.

@griesemer @randall77

@randall77

This comment has been minimized.

Copy link
Contributor

commented Oct 22, 2018

I'm ok with defining it to panic.
Divides are expensive enough that a compare and branch before each isn't a big overhead.

We should fix this for 1.12 before the API is set in stone.

@randall77 randall77 modified the milestones: Unplanned, Go1.12 Oct 22, 2018

@randall77

This comment has been minimized.

Copy link
Contributor

commented Oct 22, 2018

Actually, probably deserves its own issue.

@gopherbot

This comment has been minimized.

Copy link

commented Oct 23, 2018

Change https://golang.org/cl/144257 mentions this issue: cmd/compile: intrinsify math/bits.Add on amd64

@gopherbot

This comment has been minimized.

Copy link

commented Oct 23, 2018

Change https://golang.org/cl/144258 mentions this issue: cmd/compile: intrinsify math/bits.Sub on amd64

@gopherbot

This comment has been minimized.

Copy link

commented Oct 24, 2018

Change https://golang.org/cl/144378 mentions this issue: cmd/compile: intirnsify math/bits.Div on amd64

gopherbot pushed a commit that referenced this issue Oct 25, 2018

cmd/compile: intrinsify math/bits.Add on amd64
name             old time/op  new time/op  delta
Add-8            1.11ns ± 0%  1.18ns ± 0%   +6.31%  (p=0.029 n=4+4)
Add32-8          1.02ns ± 0%  1.02ns ± 1%     ~     (p=0.333 n=4+5)
Add64-8          1.11ns ± 1%  1.17ns ± 0%   +5.79%  (p=0.008 n=5+5)
Add64multiple-8  4.35ns ± 1%  0.86ns ± 0%  -80.22%  (p=0.000 n=5+4)

The individual ops are a bit slower (but still very fast).
Using the ops in carry chains is very fast.

Update #28273

Change-Id: Id975f76df2b930abf0e412911d327b6c5b1befe5
Reviewed-on: https://go-review.googlesource.com/c/144257
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>

@gopherbot gopherbot closed this in dd78955 Oct 25, 2018

gopherbot pushed a commit that referenced this issue Nov 27, 2018

cmd/compile: intrinsify math/bits.Div on amd64
Note that the intrinsic implementation panics separately for overflow and
divide by zero, which matches the behavior of the pure go implementation.
There is a modest performance improvement after intrinsic implementation.

name     old time/op  new time/op  delta
Div-4    53.0ns ± 1%  47.0ns ± 0%  -11.28%  (p=0.008 n=5+5)
Div32-4  18.4ns ± 0%  18.5ns ± 1%     ~     (p=0.444 n=5+5)
Div64-4  53.3ns ± 0%  47.5ns ± 4%  -10.77%  (p=0.008 n=5+5)

Updates #28273

Change-Id: Ic1688ecc0964acace2e91bf44ef16f5fb6b6bc82
Reviewed-on: https://go-review.googlesource.com/c/144378
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.