This repository was archived by the owner on Nov 24, 2018. It is now read-only.

@btracey btracey commented Sep 10, 2015

The condition number routines are used to check the condition number of a matrix before attempting a linear solve.

@kortschak
Member

I've seen the dlantr failure on my machine at home too, but sporadically. For posterity:

./cgo
fatal error: unexpected signal during runtime execution
[signal 0xb code=0x1 addr=0xc820500028 pc=0x7f718a6dce80]
runtime stack:
runtime.throw(0x66cd60, 0x2a)
    /usr/local/go/src/runtime/panic.go:527 +0x90
runtime.sigpanic()
    /usr/local/go/src/runtime/sigpanic_unix.go:12 +0x5a
runtime.systemstack(0xc800000065)
    /usr/local/go/src/runtime/asm_amd64.s:279 +0xab
runtime.mHeap_AllocStack(0x5, 0x49, 0x55)
    /usr/local/go/src/runtime/mheap.go:498 +0x25
goroutine 6 [syscall, locked to thread]:
runtime.cgocall(0x42a660, 0xc820041a78, 0xc800000000)
    /usr/local/go/src/runtime/cgocall.go:120 +0x11b fp=0xc820041a30 sp=0xc820041a00
github.com/gonum/lapack/cgo/clapack._Cfunc_LAPACKE_dlantr(0x554c4900000065, 0xa00000005, 0xc8204ffdc0, 0x7f710000000b, 0x0)
    github.com/gonum/lapack/cgo/clapack/_obj/_cgo_gotypes.go:6622 +0x3a fp=0xc820041a78 sp=0xc820041a30
github.com/gonum/lapack/cgo/clapack.Dlantr(0x49, 0x4c, 0x55, 0x5, 0xa, 0xc8204ffdc0, 0x37, 0x37, 0xb, 0x80)
    /home/travis/gopath/src/github.com/gonum/lapack/cgo/clapack/clapack.go:3112 +0xa1 fp=0xc820041ab8 sp=0xc820041a78
github.com/gonum/lapack/cgo.Implementation.Dlantr(0x7f714ad6c449, 0x7a, 0x84, 0x5, 0xa, 0xc8204ffdc0, 0x37, 0x37, 0xb, 0xc8204f7600, ...)
    github.com/gonum/lapack/cgo/_test/_obj_test/lapack.go:167 +0x277 fp=0xc820041b20 sp=0xc820041ab8
github.com/gonum/lapack/cgo.(*Implementation).Dlantr(0x95db98, 0x49, 0x7a, 0x84, 0x5, 0xa, 0xc8204ffdc0, 0x37, 0x37, 0xb, ...)
    <autogenerated>:3 +0x12c fp=0xc820041b90 sp=0xc820041b20
github.com/gonum/lapack/testlapack.DlantrTest(0xc820020120, 0x7f714ad6c728, 0x95db98)
    /home/travis/gopath/src/github.com/gonum/lapack/testlapack/dlantr.go:73 +0x607 fp=0xc820041f28 sp=0xc820041b90
github.com/gonum/lapack/cgo.TestDlantr(0xc820020120)
    /home/travis/gopath/src/github.com/gonum/lapack/cgo/lapack_test.go:21 +0x76 fp=0xc820041f68 sp=0xc820041f28
testing.tRunner(0xc820020120, 0x927a98)
    /usr/local/go/src/testing/testing.go:456 +0x98 fp=0xc820041fa0 sp=0xc820041f68
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1 fp=0xc820041fa8 sp=0xc820041fa0
created by testing.RunTests
    /usr/local/go/src/testing/testing.go:561 +0x86d
goroutine 1 [chan receive]:
testing.RunTests(0x6841e8, 0x927a80, 0xe, 0xe, 0xc82007c301)
    /usr/local/go/src/testing/testing.go:562 +0x8ad
testing.(*M).Run(0xc820043f08, 0x93a180)
    /usr/local/go/src/testing/testing.go:494 +0x70
main.main()
    github.com/gonum/lapack/cgo/_test/_testmain.go:126 +0x252
goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1696 +0x1

@kortschak
Member

LGTM

@btracey
Member Author

btracey commented Sep 11, 2015

Interesting. I have never seen that failure. Is there a way to tell cgo problems from lapack problems? Go is throwing the error, but, for example, would Xerbla give a Go throw like this? It's in allocStack, which makes me think it's a Go problem, but I don't know enough to make an educated guess.

btracey added a commit that referenced this pull request Sep 11, 2015
Add the Dxxcon routines to lapack64.
@btracey btracey merged commit fcf1e18 into master Sep 11, 2015
@btracey btracey deleted the addlapackcon branch September 11, 2015 00:09
@kortschak
Member

I would say it's lapacke.

@kortschak
Member

It's reliably reproduced with go test -run Dlantr -cpu 2,2,2,2. Fewer than three 2s doesn't trigger it; three does, with lower probability. It may be a threading problem with OpenBLAS or some cgo thing.

@btracey
Member Author

btracey commented Sep 11, 2015

Nice find! Reproduced on my computer as well.

@kortschak
Member

I think I've figured it out, but would you take a look?

I think that LAPACKE_dtr_nancheck fails when m < n because m is not passed to it, and the caller assumes that a triangular matrix is never trapezoidal, even though LAPACKE_dlantr may take such a matrix. That leads to stack corruption, and the bang comes later when it is uncovered.

Does this look right to you?
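The suspected overrun can be sketched with plain index arithmetic. The values below (m = 5, n = 10, lda = 11, a backing slice of 55 elements, row-major storage) are my reading of the hex arguments in the stack trace and are an assumption, as is the `maxTriIndex` helper; this is not the LAPACKE code itself.

```go
package main

import "fmt"

// maxTriIndex returns the largest flat index touched when scanning a k×k
// triangle (diagonal included) stored row-major with row stride lda.
// Hypothetical helper for illustration only.
func maxTriIndex(k, lda int) int {
	return (k-1)*lda + (k - 1)
}

func main() {
	// Assumed values, read from the hex arguments in the trace.
	m, n, lda := 5, 10, 11
	length := m * lda // 55 elements back the 5×10 matrix

	// Scanning an n×n triangle ignores m and runs past the end of the data.
	fmt.Println("n×n scan in bounds:", maxTriIndex(n, lda) < length)

	// Scanning a min(m,n)×min(m,n) triangle stays inside the data.
	k := m
	if n < k {
		k = n
	}
	fmt.Println("min(m,n) scan in bounds:", maxTriIndex(k, lda) < length)
}
```

Under these assumptions the n×n scan reaches index 108 of a 55-element buffer, while the min(m,n) scan stops at index 48.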

@kortschak
Member

I've confirmed that this fixes the real case, but I don't think it is the proper fix.

diff --git a/lapack-netlib/lapacke/src/lapacke_dlantr.c b/lapack-netlib/lapacke/
index 2cde1eb..9d44972 100644
--- a/lapack-netlib/lapacke/src/lapacke_dlantr.c
+++ b/lapack-netlib/lapacke/src/lapacke_dlantr.c
@@ -46,7 +46,7 @@ double LAPACKE_dlantr( int matrix_order, char norm, char uplo,
     }
 #ifndef LAPACK_DISABLE_NAN_CHECK
     /* Optionally check input matrices for NaNs */
-    if( LAPACKE_dtr_nancheck( matrix_order, uplo, diag, n, a, lda ) ) {
+    if( LAPACKE_dtr_nancheck( matrix_order, uplo, diag, MIN(m,n), a, lda ) ) {
         return -7;
     }
 #endif

The issue now is that dlantr does work on an m by n triangular matrix, and the NaN check covers only the triangular component, so there is the possibility of a NaN in the rectangle outside that. What do you think?

@btracey
Member Author

btracey commented Sep 12, 2015

I agree with your analysis. LAPACKE_dtr_nancheck should be modified to look at trapezoidal matrices or there should be a dtrap which does.

@btracey
Member Author

btracey commented Sep 12, 2015

I checked a couple of other functions that have failed on Travis, and I didn't see a similar bug arising.

@kortschak
Member

kortschak commented Sep 12, 2015 via email
