Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

math/big: arm64 assembly code for shlVU is incorrect #31084

Open
ALTree opened this Issue Mar 27, 2019 · 23 comments

Comments

Projects
None yet
7 participants
@ALTree
Copy link
Member

commented Mar 27, 2019

The TestFloat64SpecialCases test in math/big (ratconv_test.go) is currently broken on the android/arm64 builder, with log:

https://build.golang.org/log/fbc581637a0375532f0d819ae67bc9b9abbe8ba9

Breakage started at e4ba400 (math/big: accept non-decimal floats with Rat.SetString).

cc @griesemer

@ALTree ALTree added this to the Go1.13 milestone Mar 27, 2019

@griesemer griesemer self-assigned this Mar 27, 2019

@griesemer

This comment has been minimized.

Copy link
Contributor

commented Mar 27, 2019

Hm. It's only broken on android-arm-wikofever. @eliasnaur, it looks like you're the owner of that builder. Is there anything special about that arm device? It's a Cortex-A53 CPU?

@eliasnaur

This comment has been minimized.

Copy link
Contributor

commented Mar 27, 2019

I don't think it's a special device, other than it being an Android phone.

The test also broke on darwin/arm64 (iPhone):

https://build.golang.org/log/0b86dd35833a34f28785d335153946ed9a5a3691

which I assume uses an entirely different CPU.

@griesemer

This comment has been minimized.

Copy link
Contributor

commented Mar 27, 2019

Indeed, and that appears to be an iPhone 7+. Interesting.

@griesemer

This comment has been minimized.

Copy link
Contributor

commented Mar 27, 2019

Here's a smaller test case: https://play.golang.org/p/7OUzcEgoVGk

In the playground, and elsewhere, running this code produces the correct result:

r = 1/100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 (true)
len(r) = 308
f = 1e-305 (false)
e = 1e-305

On android-arm-wikofever (and presumably darwin/arm64) we get:

r = 1/96541187637860552075819821540903887880024095043179244007497829804714413865809870901655967120022719342752763287637146850362464306041721929996152029995888654682648990460561537640800770715572834310810731676381392326036599721803873644105423265719091992037347983408132957025607386110373485836090285769528130906264626714751533056 (true)
len(r) = 325
f = 1e-323 (false)
e = 1e-305

That is the fraction is completely wrong.

@griesemer

This comment has been minimized.

Copy link
Contributor

commented Mar 27, 2019

The fraction is correct for 1e-164, and then wrong for 1e165.

@gopherbot

This comment has been minimized.

Copy link

commented Mar 27, 2019

Change https://golang.org/cl/169721 mentions this issue: math/big: temporarily disable buggy shlVU assembly for arm64

gopherbot pushed a commit that referenced this issue Mar 27, 2019

math/big: temporarily disable buggy shlVU assembly for arm64
This addresses the failures we have seen in #31084. The correct
fix is to find the actual bug in the assembly code.

Updates #31084.

Change-Id: I437780c53d0c4423d742e2e3b650b899ce845372
Reviewed-on: https://go-review.googlesource.com/c/go/+/169721
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

@griesemer griesemer changed the title math/big: ratconv tests broken on android arm64 builder math/big: arm64 assembly code for shlVU is incorrect Mar 28, 2019

@griesemer

This comment has been minimized.

Copy link
Contributor

commented Mar 28, 2019

I've tracked this down a bit more. Here is a stand-alone test file (to be added to the math/big test files) that exposes the problem:

package big

import (
	"strings"
	"testing"
)

func TestSHLVU(t *testing.T) {
	// compute 10^n via 5^n << n
	const n = 165
	p := nat(nil).expNN(nat{5}, nat{n}, nil)
	p = p.shl(p, uint(n)) // <<< this doesn't work using ·shlVU
	// p = nat(nil).shl(p, uint(n)) // <<< this works

	got := string(p.utoa(10))
	want := "1" + strings.Repeat("0", n)
	if got != want {
		t.Errorf("got  %s\nwant %s\n", got, want)
	}
}

When run with the ·shlVU assembly routine (in arith_arm64.s) enabled, this test fails on arm:

$ gomote run -path '$PATH,$WORKDIR/go/bin' $MOTE go/bin/go test -run SHLVU ../src/math/big
go_android_exec: adb wait-for-device exec-out while [[ -z $(getprop sys.boot_completed) ]]; do sleep 1; done;
go_android_exec: adb push /var/folders/f6/d2bhfqss2716nxm8gkv1fmb80000gn/T/workdir-host-darwin-amd64-eliasnaur-android/tmp/go-build825267701/b001/big.test /data/local/tmp/go_android_exec/big.test-89906/big.test
[  1%] /data/local/tmp/go_android_exec/big.test-89906/big.test
...
[100%] /data/local/tmp/go_android_exec/big.test-89906/big.test
/var/folders/f6/d2bhfqss2716nxm8gkv1fmb80000gn/T/workdir-host-darwin-amd64-eliasnaur-android/tmp/go-build825267701/b001/big.test: 1 file pushed. 10.1 MB/s (6329512 bytes in 0.597s)
go_android_exec: adb exec-out export TMPDIR="/data/local/tmp/go_android_exec/big.test-89906"; export GOROOT="/data/local/tmp/go_android_exec/goroot"; export GOPATH="/data/local/tmp/go_android_exec/big.test-89906/gopath"; export CGO_ENABLED=0; export GOPROXY=; export GOCACHE="/data/local/tmp/go_android_exec/gocache"; export PATH=$PATH:"/data/local/tmp/go_android_exec/goroot/bin"; cd "/data/local/tmp/go_android_exec/goroot/src/math/big"; '/data/local/tmp/go_android_exec/big.test-89906/big.test' -test.run=SHLVU; echo -n exitcode=$?
--- FAIL: TestSHLVU (0.00s)
    shl_test.go:18: got  999999999999999999999999999999999999997461262869164377797676543286813257059020658007941683889923711150197932805888983316150450477342246492361833965695145908050591744
        want 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
FAIL
exitcode=1go_android_exec: adb exec-out rm -rf /data/local/tmp/go_android_exec/big.test-89906
FAIL	math/big	0.988s
Error running run: exit status 1

When the assembly routine is disabled, the code works fine.

When the assembly routine is enabled but the test case is tweaked slightly such that it uses a nat(nil) receiver rather than p, then the code also seems to work. This is likely an indicator as to what might be wrong with the assembly code.

When fixing this, one should also look into ·shrVU and verify that it doesn't suffer from the same problem.

cc: @erifan @josharian who seem to have written some of this code

@josharian

This comment has been minimized.

Copy link
Contributor

commented Mar 28, 2019

Looks like this routine was written by @erifan. The problem is definitely not obvious at a first read through the code. Let me know if you'd like me to take a closer look; otherwise, I will (tentatively) leave this for @erifan.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Mar 28, 2019

Also, the shift==0 case is missing an optimization. If z.ptr == x.ptr, there's no need to do a copy at all; you can simply return.

The pure Go code uses the copy built-in to do the copy, which does this optimization under the hood. amd64 is missing the shift==0 optimization entirely. I'll file a separate issue for that.

@griesemer

This comment has been minimized.

Copy link
Contributor

commented Mar 28, 2019

It's definitively subtle. The problem doesn't appear for n < 165, at least for https://play.golang.org/p/7OUzcEgoVGk code.

@erifan

This comment has been minimized.

Copy link
Contributor

commented Mar 28, 2019

Sorry, I'll fix this bug.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Mar 28, 2019

The problem is that the loop in the assembler code goes forward through the arrays. It has to go backward. Otherwise the output overwrites the input.

The same is true of shrVU.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Mar 28, 2019

Of course I'm wrong about shrVU, which has to go forward. Only shlVU has to go backward.

@eliasnaur

This comment has been minimized.

Copy link
Contributor

commented Mar 28, 2019

@bradfitz I can't imagine this doesn't fail on all arm64 CPUs, yet I see no arm64 builders on build.golang.org apart from the mobiles. Are they gone or am I blind?
With generic arm64 builders, gri wouldn't have to go through that painful debugging session.

@erifan

This comment has been minimized.

Copy link
Contributor

commented Mar 28, 2019

The code has been written and I will submit it immediately.

@gopherbot

This comment has been minimized.

Copy link

commented Mar 28, 2019

Change https://golang.org/cl/169780 mentions this issue: math/big: fix the bug in assembly implementation of shlVU on arm64

@erifan

This comment has been minimized.

Copy link
Contributor

commented Mar 28, 2019

I found that the implementation of shlVU on amd64 does not consider the case where the high byte of z overlaps with the low byte of x, so I also did not consider this point. Do we need to consider this situation?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Mar 28, 2019

We only have to consider word level overlap, not byte level overlap, if that is what you mean. But we do have to consider the case where the high word of z overlaps with the low word of x. I don't see any obvious problem in the amd64 implementation but maybe there is one.

@griesemer

This comment has been minimized.

Copy link
Contributor

commented Mar 28, 2019

Byte-overlap should never happen, it's either one or more words, or none - otherwise we'd have a serious bug elsewhere. You can safely assume that all pointers are word-aligned.

The assembly code should behave essentially the same as shlVU_g in arith.go.

@erifan

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2019

Yes I mean Word level overlap. And I can construct an negative case for this situation, which can prove that the current pure Go implementation are defective.

func TestShlVUOverlap(t *testing.T) {
        a := argVW{nat{2, 2, 2, 2, 2, _M << 1 &_M, 1}, nat(nil).set(nat{1, 1, 1, 1, 1, 1, 1, 1, 1, _M}), 1, 1}
        // save a.x for error message because it will be overrided.
        b := nat(nil).make(10)
        copy(b, a.x)
        got := a.x[0:6].shl(a.x[4:10], uint(a.y))
        for i, zi := range got {
                if zi != a.z[i] {
                        t.Errorf("special shl(%v, %v)\n\tgot z[%d] = %#x; want %#x", b[4:10], a.y, i, zi, a.z[i])
                        break
                }
        }
}

Add this test to file math/big/arith_test.go, and run
$ ../../../bin/go test -run TestShlVU

With the pure Go implementation, we get:

--- FAIL: TestShlVUOverlap (0.00s)
arith_test.go:260: special shl([1 1 1 1 1 18446744073709551615], 1)
got z[0] = 0x4; want 0x2
FAIL
exit status 1
FAIL math/big 0.072s

The current use cases of function shl didn't expose this bug, and as function nat.shl is not a public function, so I wonder if we need to consider this situation when implementing this function or is there some mechanisms that can guarantee this situation never happen?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2019

shlVU is an internal function, so we don't need to consider all possible ways to call it. We only need to consider uses of the exported methods. We can trust that the code in the math/big package understands the limitations of shlVU.

@josharian

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2019

We can trust that the code in the math/big package understands the limitations of shlVU.

The code might, but I couldn't tell you those limitations offhand. It'd be good to document them.

Once they're documented, we can start fuzzing them using those documented limitations. Since there are multiple implementations, these are good fuzz candidates. (I have started on this.)

@gopherbot

This comment has been minimized.

Copy link

commented Apr 4, 2019

Change https://golang.org/cl/168257 mentions this issue: math/big: add comprehensive aliasing tests (and minor fixes to Exp, Rand)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.