-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
math/big: zero divide in ProbablyPrime (garbage collection bug?) #4869
Comments
Comment 1 by huseby@lindenlab.com: I attached the executable. |
There really is a zero on the stack as the argument to divWW (it's not a divide by -1, for example), but that word is the most significant word of the divisor, which is never zero. If the divisor itself were zero it would have len(v) == 0, so that v[len(v)-1] would have panicked. Worse, in the context of the called code the divisor is the modulus passed to expNN, which remains constant throughout the execution of expNNWindowed. And yet the code died in the div at line 1358: 1339 for i := len(y) - 1; i >= 0; i-- { 1340 yi := y[i] 1341 for j := 0; j < _W; j += n { 1342 if i != len(y)-1 || j != 0 { 1343 // Unrolled loop for significant performance 1344 // gain. Use go test -bench=".*" in crypto/rsa 1345 // to check performance before making changes. 1346 zz = zz.mul(z, z) 1347 zz, z = z, zz 1348 zz, r = zz.div(r, z, m) 1349 z, r = r, z 1350 1351 zz = zz.mul(z, z) 1352 zz, z = z, zz 1353 zz, r = zz.div(r, z, m) 1354 z, r = r, z 1355 1356 zz = zz.mul(z, z) 1357 zz, z = z, zz 1358 zz, r = zz.div(r, z, m) 1359 z, r = r, z 1360 1361 zz = zz.mul(z, z) 1362 zz, z = z, zz 1363 zz, r = zz.div(r, z, m) 1364 z, r = r, z 1365 } 1366 1367 zz = zz.mul(z, powers[yi>>(_W-n)]) 1368 zz, z = z, zz 1369 zz, r = zz.div(r, z, m) 1370 z, r = r, z 1371 1372 yi <<= n 1373 } 1374 } If m is a denormalized zero at 1358, why is not a denormalized zero at 1353 or 1348? My hypothesis is that this is a garbage collection bug, that the underlying array backing m has been collected and zeroed prematurely. That would resolve all the mysteries. It's easy to believe there are garbage collection bugs at tip, sadly. The garbage collector has been having some work done recently and I've seen a couple strange crashes I haven't tracked down yet. If you need stability my suggestion would be to stay at the release version. How many times have you seen this crash? How long does it typically take to happen? Labels changed: added priority-later, go1.1, removed priority-triage, go1.1maybe. Status changed to Accepted. |
Comment 3 by huseby@lindenlab.com: This crash usually takes anywhere between 5 minutes and an hour to happen. I've never seen this exe run longer than an hour before failing with this exact crash. |
Comment 5 by huseby@lindenlab.com: I updated to version: go version devel +e0b23d56fd6a Wed Feb 27 21:17:53 2013 +0200 linux/386 And now it's failing with: panic: runtime error: invalid memory address or nil pointer dereference [signal 0xb code=0x1 addr=0x1 pc=0x80a7bbf] I'm not sure which callstack it was so here's all of them: goroutine 3 [running]: math/big.nat.setBytes(0x1, 0x10, 0x14, 0x18411880, 0x40, ...) /home/huseby/Projects/go/src/pkg/math/big/nat.go:1494 +0xbf math/big.(*Int).SetBytes(0x184f4b60, 0x18411880, 0x40, 0x40, 0x40, ...) /home/huseby/Projects/go/src/pkg/math/big/int.go:567 +0x48 crypto/rand.Prime(0x1834b400, 0x1834b020, 0x200, 0x184f4b60, 0x0, ...) /home/huseby/Projects/go/src/pkg/crypto/rand/util.go:69 +0x22c crypto/rsa.GenerateMultiPrimeKey(0x1834b400, 0x1834b020, 0x2, 0x400, 0x1840fde0, ...) /home/huseby/Projects/go/src/pkg/crypto/rsa/rsa.go:168 +0x1af crypto/rsa.GenerateKey(0x1834b400, 0x1834b020, 0x400, 0x0, 0x0, ...) /home/huseby/Projects/go/src/pkg/crypto/rsa/rsa.go:127 +0x51 main.CreateCert(0x18335510) /home/huseby/Projects/onion_finder/of.go:35 +0x4f created by main.main /home/huseby/Projects/onion_finder/of.go:121 +0x155 goroutine 1 [chan receive]: main.main() /home/huseby/Projects/onion_finder/of.go:128 +0x19f goroutine 0 [syscall]: goroutine 4 [running]: created by main.main /home/huseby/Projects/onion_finder/of.go:121 +0x155 goroutine 5 [running]: math/big.nat.divLarge(0x184bb750, 0x10, 0x24, 0x184bb7e0, 0x21, ...) /home/huseby/Projects/go/src/pkg/math/big/nat.go:584 +0xbf4 math/big.nat.div(0x18461b40, 0x10, 0x24, 0x18501630, 0x20, ...) /home/huseby/Projects/go/src/pkg/math/big/nat.go:519 +0x26f created by main.main /home/huseby/Projects/onion_finder/of.go:121 +0x155 goroutine 6 [running]: created by main.main /home/huseby/Projects/onion_finder/of.go:121 +0x155 goroutine 7 [running]: math/big.nat.divLarge(0x1839bab0, 0x10, 0x24, 0x1839bab0, 0x21, ...) /home/huseby/Projects/go/src/pkg/math/big/nat.go:582 +0xa36 math/big.nat.div(0x1839bab0, 0x10, 0x24, 0x1839bab0, 0x20, ...) /home/huseby/Projects/go/src/pkg/math/big/nat.go:519 +0x26f math/big.nat.expNNWindowed(0x1839bab0, 0x20, 0x24, 0x1840e780, 0x10, ...) /home/huseby/Projects/go/src/pkg/math/big/nat.go:1363 +0xa4e math/big.nat.expNN(0x1840e7d0, 0x10, 0x14, 0x1840e780, 0x10, ...) /home/huseby/Projects/go/src/pkg/math/big/nat.go:1257 +0x3f7 math/big.nat.probablyPrime(0x184f1be0, 0x10, 0x14, 0x14, 0x0, ...) /home/huseby/Projects/go/src/pkg/math/big/nat.go:1442 +0x573 math/big.(*Int).ProbablyPrime(0x184d9270, 0x14, 0x35) /home/huseby/Projects/go/src/pkg/math/big/int.go:726 +0x42 crypto/rand.Prime(0x1834b400, 0x1834b020, 0x200, 0x184d9270, 0x0, ...) /home/huseby/Projects/go/src/pkg/crypto/rand/util.go:97 +0x337 crypto/rsa.GenerateMultiPrimeKey(0x1834b400, 0x1834b020, 0x2, 0x400, 0x1840f4e0, ...) /home/huseby/Projects/go/src/pkg/crypto/rsa/rsa.go:168 +0x1af crypto/rsa.GenerateKey(0x1834b400, 0x1834b020, 0x400, 0x0, 0x0, ...) /home/huseby/Projects/go/src/pkg/crypto/rsa/rsa.go:127 +0x51 main.CreateCert(0x18335510) /home/huseby/Projects/onion_finder/of.go:35 +0x4f created by main.main /home/huseby/Projects/onion_finder/of.go:121 +0x155 goroutine 8 [running]: math/big.nat.divLarge(0x183c2e40, 0x10, 0x15, 0x183c2e40, 0x21, ...) /home/huseby/Projects/go/src/pkg/math/big/nat.go:541 +0x2bc math/big.nat.div(0x184f23f0, 0x10, 0x24, 0x184f2360, 0x10, ...) /home/huseby/Projects/go/src/pkg/math/big/nat.go:519 +0x26f math/big.nat.expNNWindowed(0x18493140, 0x20, 0x25, 0x1841deb0, 0x10, ...) /home/huseby/Projects/go/src/pkg/math/big/nat.go:1353 +0x7f8 math/big.nat.expNN(0x1841df00, 0x10, 0x14, 0x1841deb0, 0x10, ...) /home/huseby/Projects/go/src/pkg/math/big/nat.go:1257 +0x3f7 math/big.nat.probablyPrime(0x18402370, 0x10, 0x14, 0x14, 0x0, ...) /home/huseby/Projects/go/src/pkg/math/big/nat.go:1442 +0x573 math/big.(*Int).ProbablyPrime(0x184d9130, 0x14, 0x35) /home/huseby/Projects/go/src/pkg/math/big/int.go:726 +0x42 crypto/rand.Prime(0x1834b400, 0x1834b020, 0x200, 0x184d9130, 0x0, ...) /home/huseby/Projects/go/src/pkg/crypto/rand/util.go:97 +0x337 crypto/rsa.GenerateMultiPrimeKey(0x1834b400, 0x1834b020, 0x2, 0x400, 0x183adea0, ...) /home/huseby/Projects/go/src/pkg/crypto/rsa/rsa.go:168 +0x1af crypto/rsa.GenerateKey(0x1834b400, 0x1834b020, 0x400, 0x0, 0x0, ...) /home/huseby/Projects/go/src/pkg/crypto/rsa/rsa.go:127 +0x51 main.CreateCert(0x18335510) /home/huseby/Projects/onion_finder/of.go:35 +0x4f created by main.main /home/huseby/Projects/onion_finder/of.go:121 +0x155 goroutine 9 [running]: math/big.nat.divLarge(0x183ae870, 0x10, 0x24, 0x183ae870, 0x21, ...) /home/huseby/Projects/go/src/pkg/math/big/nat.go:596 +0xf55 created by main.main /home/huseby/Projects/onion_finder/of.go:121 +0x155 goroutine 10 [running]: created by main.main /home/huseby/Projects/onion_finder/of.go:121 +0x155 goroutine 11 [running]: created by main.main /home/huseby/Projects/onion_finder/of.go:121 +0x155 goroutine 12 [chan receive]: main.HashCert(0xbfd854ca, 0x6, 0x18335510, 0x183354e0) /home/huseby/Projects/onion_finder/of.go:54 +0xff created by main.main /home/huseby/Projects/onion_finder/of.go:125 +0x189 |
Comment 6 by huseby@lindenlab.com: I'll re-run it with a single producer and a single consumer goroutine and wait for it to fail. |
I created a patch that turns off some code in the garbage collector: https://golang.org/cl/7557044/ The idea is to gradually extend the patch until the cause of the issue is pinpointed. I am unable to reproduce the failure on my notebook. Please apply https://golang.org/cl/7557044/ to Go tip on a computer on which the issue is reproducible and report the results. |
I have managed to reproduce this with: go build mal.go && file ./mal && GOGCTRACE=1 GOMAXPROCS=10 ./mal -ncpu=9 gc329284(8): 0+1+0 ms, 1 -> 0 MB 11545 -> 581 (3746016176-3746015595) objects, 14(497) handoff, 47(697) steal, 327/166/9 yields SIGSEGV: segmentation violation PC=0x8065892 Segmentation fault I've also seen something suspect while running the time unit tests (nil pointer dereference). Investigating to see if they are related. |
issue #5007 describes some more linux/386 issues. this might be related to that. |
mal.go is http://play.golang.org/p/05udxnXbLH |
Comment 15 by huseby@lindenlab.com: I applied the patch (https://golang.org/cl/7557044/) and re-ran the code. This time it failed with: unexpected fault address 0x4a983dc9 fatal error: fault [signal 0xb code=0x1 addr=0x4a983dc9 pc=0x805178b] several goroutines: panic: runtime error: invalid memory address or nil pointer dereference I'll try GOGCTRACE=1 next using the latest tip without the patch. |
Comment 16 by huseby@lindenlab.com: Running the latest tip, without the patch above, and running it with GOGCTRACE=1, I get the crash. These are the last few lines of output before the crash: gc60339(8): 0+0+0 ms, 1 -> 0 MB 10721 -> 683 (128080926-128080243) objects, 2(10) handoff, 32(333) steal, 283/138/6 yields gc60340(8): 0+0+0 ms, 0 -> 0 MB 684 -> 682 (128080928-128080246) objects, 12(491) handoff, 42(827) steal, 315/131/4 yields gc60341(8): 0+0+0 ms, 0 -> 0 MB 686 -> 664 (128080934-128080270) objects, 7(433) handoff, 42(702) steal, 309/148/9 yields gc60342(8): 0+0+0 ms, 0 -> 0 MB 670 -> 665 (128080942-128080277) objects, 12(474) handoff, 29(648) steal, 318/154/7 yields gc60343(8): 0+0+0 ms, 0 -> 0 MB 675 -> 665 (128080953-128080288) objects, 3(20) handoff, 34(744) steal, 278/106/4 yields gc60344(8): 0+0+0 ms, 0 -> 0 MB 709 -> 652 (128080997-128080345) objects, 16(499) handoff, 31(559) steal, 365/186/9 yields unexpected fault address 0xe7f7d752 fatal error: fault [signal 0xb code=0x1 addr=0xe7f7d752 pc=0x805178b] |
Comment 17 by huseby@lindenlab.com: BTW, running this using a single producer and single consumer thread doesn't cause the panic. I had my test app successfully run for days without a panic when I run it with -ncpu=1. |
Here's one from a recent net test failure. Might be insightful. go test -v -cpu 1,2,4 -timeout=30m std === RUN TestSelfConnect-4 scanstack inconsistent: g1269#0 sp=0xf4d79e84 not in [0xffffff01,0x1] fatal error: scanstack goroutine 150 [running]: [fp=0x18557250] runtime.throw(0x82a453d) /build/go.386/go/src/pkg/runtime/panic.c:465 +0x66 [fp=0x18557284] addstackroots(0x18441000) /build/go.386/go/src/pkg/runtime/mgc0.c:1340 +0xf4 [fp=0x185572ac] addroots() /build/go.386/go/src/pkg/runtime/mgc0.c:1413 +0x1f3 [fp=0x185573dc] gc(0xf4e35fa4) /build/go.386/go/src/pkg/runtime/mgc0.c:1850 +0x1f0 ----- stack segment boundary ----- [fp=0xf4e35fb0] runtime.gc(0x1) /build/go.386/go/src/pkg/runtime/mgc0.c:1788 +0xea [fp=0xf4e35fdc] runfinq() /build/go.386/go/src/pkg/runtime/mgc0.c:2056 +0x166 [fp=0xf4e35fe0] runtime.goexit() /build/go.386/go/src/pkg/runtime/proc.c:1160 created by runtime.gc /build/go.386/go/src/pkg/runtime/mgc0.c:1749 go version devel +d533352b414d Sat Mar 09 05:39:15 2013 +0100 linux/386 |
Here is one more: [signal 0xb code=0x1 addr=0xb572fde4 pc=0x805157b] #0: 0x8057d59 runtime.startpanic src/pkg/runtime/panic.c:406 #1: 0x8058025 runtime.throw src/pkg/runtime/panic.c:481 #2: 0x8060fe7 runtime.sigpanic src/pkg/runtime/thread_linux.c:223 #3: 0x805157b runtime.MCentral_AllocList src/pkg/runtime/mcentral.c:64 #4: 0x80511b7 runtime.MCache_Alloc src/pkg/runtime/mcache.c:24 #5: 0x8062291 runtime.mallocgc src/pkg/runtime/zmalloc_linux_386.c:47 #6: 0x805ea30 makeslice1 src/pkg/runtime/slice.c:61 #7: 0x805e88d runtime.makeslice src/pkg/runtime/slice.c:36 #8: 0x809ff7f math/big.nat.make src/pkg/math/big/nat.go:76 #9: 0x80a1c1f math/big.nat.mul src/pkg/math/big/nat.go:400 #10: 0x80a73bc math/big.nat.expNNWindowed src/pkg/math/big/nat.go:1332 #11: 0x80a6b47 math/big.nat.expNN src/pkg/math/big/nat.go:1257 #12: 0x80a81c3 math/big.nat.probablyPrime src/pkg/math/big/nat.go:1442 #13: 0x809e782 math/big.(*Int).ProbablyPrime src/pkg/math/big/int.go:726 #14: 0x80667b7 crypto/rand.Prime src/pkg/crypto/rand/util.go:97 #15: 0x80671bf crypto/rsa.GenerateMultiPrimeKey src/pkg/crypto/rsa/rsa.go:168 #16: 0x8066ff1 crypto/rsa.GenerateKey src/pkg/crypto/rsa/rsa.go:127 #17: 0x8048c4f main.CreateCert /tmp/mal.go:35 #18: 0x805b110 runtime.goexit src/pkg/runtime/proc.c:1160 |
Malloc/free deadlocks seem to be reincarnation of the same problem: fatal error: malloc/free - deadlock #0: 0x8057d59 runtime.startpanic src/pkg/runtime/panic.c:406 #1: 0x8058025 runtime.throw src/pkg/runtime/panic.c:481 #2: 0x8062217 runtime.mallocgc src/pkg/runtime/zmalloc_linux_386.c:33 #3: 0x8063453 runtime.mal src/pkg/runtime/zmalloc_linux_386.c:611 #4: 0x804fd97 copyin src/pkg/runtime/iface.c:152 #5: 0x804ff4e runtime.convT2E src/pkg/runtime/iface.c:220 #6: 0x8049c99 runtime.newErrorString src/pkg/runtime/error.go:74 #7: 0x80580f1 runtime.panicstring src/pkg/runtime/panic.c:497 #8: 0x806105a runtime.sigpanic src/pkg/runtime/thread_linux.c:216 #9: 0x80511fc runtime.MCache_Alloc src/pkg/runtime/mcache.c:33 #10: 0x8062291 runtime.mallocgc src/pkg/runtime/zmalloc_linux_386.c:47 #11: 0x805ea30 makeslice1 src/pkg/runtime/slice.c:61 #12: 0x805e88d runtime.makeslice src/pkg/runtime/slice.c:36 #13: 0x80a2e2c math/big.nat.divLarge src/pkg/math/big/nat.go:541 #14: 0x80a2b1f math/big.nat.div src/pkg/math/big/nat.go:519 #15: 0x80a7958 math/big.nat.expNNWindowed src/pkg/math/big/nat.go:1353 #16: 0x80a6b47 math/big.nat.expNN src/pkg/math/big/nat.go:1257 #17: 0x80a81c3 math/big.nat.probablyPrime src/pkg/math/big/nat.go:1442 #18: 0x809e782 math/big.(*Int).ProbablyPrime src/pkg/math/big/int.go:726 #19: 0x80667b7 crypto/rand.Prime src/pkg/crypto/rand/util.go:97 #20: 0x80671bf crypto/rsa.GenerateMultiPrimeKey src/pkg/crypto/rsa/rsa.go:168 #21: 0x8066ff1 crypto/rsa.GenerateKey src/pkg/crypto/rsa/rsa.go:127 #22: 0x8048c4f main.CreateCert /tmp/mal.go:35 #23: 0x805b110 runtime.goexit src/pkg/runtime/proc.c:1160 |
Here's one that might be easier to reproduce go test image/jpeg go version devel +16e0e01c2e9b Sat Mar 09 18:14:00 2013 -0800 linux/386 GOMAXPROCS=50 panic: runtime error: invalid memory address or nil pointer dereference [recovered] panic: runtime error: invalid memory address or nil pointer dereference [signal 0xb code=0x1 addr=0x19 pc=0x80728dd] goroutine 5 [running]: testing.func·004() /build/go.386/go/src/pkg/testing/testing.go:341 +0xf7 image/jpeg.(*decoder).ensureNBits(0x1841e000, 0x1, 0x810c2c0, 0x46) /build/go.386/go/src/pkg/image/jpeg/huffman.go:38 +0x3d image/jpeg.(*decoder).decodeHuffman(0x1841e000, 0x1841e078, 0x0, 0x46, 0x0, ...) /build/go.386/go/src/pkg/image/jpeg/huffman.go:173 +0xa1 image/jpeg.(*decoder).processSOS(0x1841e000, 0xa, 0x18423498, 0x2) /build/go.386/go/src/pkg/image/jpeg/scan.go:196 +0x14a7 image/jpeg.(*decoder).decode(0x1841e000, 0x18335860, 0x183001d0, 0xf7292c00, 0xc, ...) /build/go.386/go/src/pkg/image/jpeg/reader.go:296 +0x51e image/jpeg.Decode(0x18335860, 0x183001d0, 0x183001d0, 0x0, 0x0, ...) /build/go.386/go/src/pkg/image/jpeg/reader.go:320 +0x4a image/jpeg.decodeFile(0x1835d450, 0x2e, 0x0, 0x0, 0x0, ...) /build/go.386/go/src/pkg/image/jpeg/reader_test.go:78 +0xba image/jpeg.TestDecodeProgressive(0x1837f180) /build/go.386/go/src/pkg/image/jpeg/reader_test.go:35 +0x278 testing.tRunner(0x1837f180, 0x81fa3ec) /build/go.386/go/src/pkg/testing/testing.go:346 +0x87 created by testing.RunTests /build/go.386/go/src/pkg/testing/testing.go:426 +0x684 goroutine 1 [chan receive]: testing.RunTests(0x817100c, 0x81fa3e0, 0x5, 0x5, 0x1, ...) /build/go.386/go/src/pkg/testing/testing.go:427 +0x69f testing.Main(0x817100c, 0x81fa3e0, 0x5, 0x5, 0x81fa420, ...) /build/go.386/go/src/pkg/testing/testing.go:358 +0x69 main.main() image/jpeg/_test/_testmain.go:61 +0x81 FAIL image/jpeg 0.075s |
Just 2 days of debugging to discover that it's... drumroll... unaligned 64-bit atomic... again! Episodically GC sweeps some spans twice, and episodically it leads to double free. It perfectly explains all the crashes we see, a lot of crashes in freelists, crashes only on 386 and only with GOMAXPROCS>1. Owner changed to @dvyukov. |
This issue was closed by revision 433824d. Status changed to Fixed. |
by huseby@lindenlab.com:
Attachments:
The text was updated successfully, but these errors were encountered: