{{ message }}

# runtime: timediv should use |= instead of += since we are adding bitsets to an originally cleared value #27529

Closed
opened this issue Sep 6, 2018 · 13 comments
Closed

# runtime: timediv should use |= instead of += since we are adding bitsets to an originally cleared value#27529

opened this issue Sep 6, 2018 · 13 comments
Labels
Milestone

### odeke-em commented Sep 6, 2018

Right now just before bed, I have been studying `runtime/os_darwin.go` in relation to https://go-review.googlesource.com/c/go/+/133655 to try to write a regression/end-to-end test after that CL is merged.

I noticed that in the code for `timediv`

Lines 414 to 421 in 22afb35

 func timediv(v int64, div int32, rem *int32) int32 { res := int32(0) for bit := 30; bit >= 0; bit-- { if v >= int64(div)<

we try to reduce the original timeunit by at most ((1<<31)-1) = 0x7FFFFFFF if (v >= conversionUnit<<bits)
where:

• bits is in the reverse domain of [0, 30]
• conversionUnit is "div" aka the number of units converting from the original to the target unit e..g for ns to ms conversion, div = 1e6, ns to us, div = 1e3

In particular notice that we are doing a `+=` for values that are essentially bit sets for a value that started at 0

Line 419 in 22afb35

 res += 1 << uint(bit)

i.e. we are only doing power of 2 increments/toggling bits.

We could replace that by `|=` and that code should be faster by a couple of nanoseconds and to justify the change:

• That code is used is converting from nanotime() `ns` so at most ((1<<31)-1) * 1e9 ns (since max epoch second = (1<<31)-1) which by any measure will mean that every single invocation of `timediv`
with a system nanotime that began later than September 8th 2001 ~7PM, will run through that loop at least once -- from `ns` to at best `s`, with smaller units it is even more guaranteed to run since (1e9<<30) (div=1e9, ns to s)

I believe that we could shave off a couple of nanoseconds and take advantage of registers.
My simple & dirty benchmark at https://play.golang.org/p/tWy9l5oH0tG when run shows an 18% decrement in time

```name  old time/op  new time/op  delta
-8    37.7ns ± 2%  30.7ns ± 4%  -18.37%  (p=0.008 n=5+5)```

`timediv` is in a very hot path throughout Go and the ecosystem; when we perform any semaphore operations e.g. while locking and unlocking a mutex on many OSes, of which Go's bread and butter for dealing with concurrent accesses are mutex operations and those are also used all around the standard library.

I am filing this issue because the comment for that function says that is is a very special function

Lines 410 to 411 in 22afb35

 // This is a very special function, do not use it if you are not sure what you are doing. // int64 division is lowered into _divv() call on 386, which does not fit into nosplit functions.

and I also wonder if perhaps there was an intention of using `+=`?

Please feel free to correct me or provide some ideas, my apologies for any mistakes #~4AMChecks. Thank you!

### ianlancetaylor commented Sep 6, 2018

 Looks like `|=` is faster on amd64 because it replaces `mov; shl; add` with `bts`. On 386 we don't use `bts`, so both versions are about the same speed. I can't think of any reason why `|=` would be slower than `+=` on any plausible hardware, so the change seems reasonable to me. Thanks.
added this to the Unplanned milestone Sep 6, 2018

### rasky commented Sep 6, 2018 • edited

 If `timediv` is just a 64/32=32 division, it should probably be converted to use `math/bits.Div32` once CL123157 goes in. That would speed it up a lot more than just using `|=` instead of `+=`, on both 32-bit and 64-bit platforms.

### odeke-em commented Sep 10, 2018

If timediv is just a 64/32=32 division, it should probably be converted to use math/bits.Div32 once CL123157 goes in. That would speed it up a lot more than just using |= instead of +=, on both 32-bit and 64-bit platforms.

Thanks for the chime in @rasky! Interesting and cool that bits.Div* is being added. Perhaps I am misunderstanding how the new API will be used but I think it'll involve a bit of retrofitting given a few restrictions in the doc comments. I just skimmed real quick over it and thought to plugin and play one of the sanity checks in the runtime

Line 208 in 22afb35

 if timediv(12345*1000000000+54321, 1000000000, &e) != 12345 || e != 54321 {

but the results are unexpected

```package main_test

import (
"math/bits"
"testing"
)

func TestDiv(t *testing.T) {
v := int64(12345*1000000000+54321)
div := int32(1000000000)

hi, lo := uint(v>>32), uint(v&0xFFFFFFFF)
quo, rem := bits.Div(hi, lo, uint(div))
if g, w := quo, uint(12345); g != w {
t.Errorf("Quotient:\n\tgot  %d (0x%X)\n\twant %d (0x%X)", g, g, w, w)
}
if g, w := rem, uint(54321); g != w {
t.Errorf("Remainder :\n\tgot  %d (0x%X)\n\twant %d (0x%X)", g, g, w, w)
}
}```

with results

```\$ go test -run=TestDiv
--- FAIL: TestDiv (0.00s)
div_test.go:15: Quotient:
got  53015942467842 (0x3037BC6B1102)
want 12345 (0x3039)
div_test.go:18: Remainder :
got  515390001 (0x1EB83A31)
want 54321 (0xD431)
FAIL
exit status 1
FAIL	_/Users/emmanuelodeke/Desktop/openSrc/bugs/golang/runtime-timediv	0.005s```

With my point being that: perhaps let's file a separate bug as a reminder to retrofit "math/bits".Div into the runtime once https://golang.org/cl/123157 is landed as you suggested. For now I think we can easily address using `|=` :)

### gopherbot commented Sep 10, 2018

 Change https://golang.org/cl/134231 mentions this issue: `runtime: convert initial timediv quotient increments to bitsets`

### cherrymui commented Sep 10, 2018

 We should not use timediv on 64-bit machines. We should just use the division operator, as os_linux.go does.

### rasky commented Sep 10, 2018

 @odeke-em which arch are you testing that on? Try using Div32 instead of Div to make it behave the same on both 32-bit and 64-bit archs (your code would fail on 64 bit because of the right shift sign extending the 64 bit input). @cherrymui using bits.Div is a portable way of doing a 64 bit division without two different code paths

### odeke-em commented Sep 10, 2018

 @rasky I was on AMD64, Darwin. Thank you for the tip of switching to Div32, that runs correctly and is magnitudes faster than all the others in deed ```\$ go test -v -run=^\$ -bench=As goos: darwin goarch: amd64 BenchmarkAsPlus-8 50000000 27.8 ns/op BenchmarkAsOr-8 50000000 26.0 ns/op BenchmarkAsDiv-8 2000000000 0.39 ns/op PASS ok _/Users/emmanuelodeke/Desktop/openSrc/bugs/golang/runtime-timediv 3.568s```

### cherrymui commented Sep 10, 2018

 I'm not sure we want the runtime package to import math/bits, although it is probably a possibility. And, with the current implementation in CL 123157, it probably doesn't work. math/bits.Div32 is implemented simply with 64-bit division. On 32-bit systems, this lowers to runtime.uint64div. As the comment in timediv says, this may not fit into nosplit stack limit (maybe it does now, I don't know, at least it didn't when timediv was introduced). And not all 32-bit machines have 64-bit-divided-by-32-bit intrinsic. Of course we could probably have a different implementation of Div32 that uses very small stack space.

### randall77 commented Sep 10, 2018

 @odeke-em : That last benchmark result looks like the entire operation was optimized away. It's unlikely that the chip is doing a divide in less than a nanosecond. (Or maybe it's just really pipelined?)

### cherrymui commented Sep 10, 2018

 If 64-bit division now fits into nosplit stack limit on all machines, I suggest changing back to using the division operator.

### odeke-em commented Sep 10, 2018

 @odeke-em : That last benchmark result looks like the entire operation was optimized away. @randall77 aha, yeah it looked too good to be true, but alas seems to actually be running alright ```package main import ( "math/bits" "testing" ) func BenchmarkAsDiv(b *testing.B) { v := int64(12345*1000000000 + 54321) div := int32(1000000000) var quo, rem uint32 for i := 0; i < b.N; i++ { hi, lo := uint32(v>>32), uint32(v&0xFFFFFFFF) quo, rem = 0, 0 quo, rem = bits.Div32(hi, lo, uint32(div)) } if g, w := quo, uint32(12345); g != w { b.Fatalf("Got %d\nWant %d", g, w) } if g, w := rem, uint32(54321); g != w { b.Fatalf("Got %d\nWant %d", g, w) } }``` still giving ```\$ go test -run=\$^ -bench=As goos: darwin goarch: amd64 BenchmarkAsPlus-8 50000000 24.5 ns/op BenchmarkAsOr-8 100000000 22.8 ns/op BenchmarkAsDiv-8 2000000000 0.33 ns/op PASS ok _/Users/emmanuelodeke/Desktop/openSrc/bugs/golang/runtime-timediv 4.252s``` ```\$ uname -a Darwin Emmanuels-MacBook-Pro-2.local 17.7.0 Darwin Kernel Version 17.7.0: Thu Jun 21 22:53:14 PDT 2018; root:xnu-4570.71.2~1/RELEASE_X86_64 x86_64```

### rasky commented Sep 10, 2018

 I'm not sure we want the runtime package to import math/bits, although it is probably a possibility. We'll have to figure this out, as there are many other points where runtime could benefit from `math/bits` operations; there are for instances some multiplication overflow checks that could be simplified and made faster using `bits.Mul`. And, with the current implementation in CL 123157, it probably doesn't work. Right, I was implying that an intrinsification CL would eventually land (like CL129415, but for 32-bit archs). And not all 32-bit machines have 64-bit-divided-by-32-bit intrinsic. Oh that's right, I guess ARM doesn't.
closed this in ``` 178a609 ``` Sep 12, 2018

### odeke-em commented Sep 12, 2018

 @rasky would you mind perhaps filing a reminder issue to examine what we can do around the standard library with math/bits.Div* when the CL lands? I ask because that conversation dominated this issue(which is now closed), but requires more thoughts and investigation, but also the conversations are very valuable and would be nice to keep it going on a fresh issue. Thank you!
locked and limited conversation to collaborators Sep 12, 2019
added the label Sep 12, 2019