Sage crashes trying to find kernel of large rational matrices #11581

Closed
sagetrac-hdevalence mannequin opened this issue Jul 7, 2011 · 16 comments

sagetrac-hdevalence mannequin commented Jul 7, 2011

On Ubuntu 11.04 64-bit, Sage 4.7 crashes trying to find the kernel of a large rational matrix.
This happens every time, and on two machines -- one a Core 2 Duo T5250 and the other a

The crashes happen on my Core 2 Duo machine when the matrix is larger than 101x101.

hdevalence@hdevalence-laptop:~$ sage -gdb
----------------------------------------------------------------------
| Sage Version 4.7, Release Date: 2011-05-23                         |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
/opt/sage-4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/local/bin/sage-ipython
GNU gdb (Ubuntu/Linaro 7.2-1ubuntu11) 7.2
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/sage-4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/local/bin/python...done.
[Thread debugging using libthread_db enabled]
Python 2.6.4 (r264, May 23 2011, 18:54:18) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
sage: version()
'Sage Version 4.7, Release Date: 2011-05-23'
sage: M = random_matrix(QQ,600,600)
sage: M.kernel()

Program received signal SIGSEGV, Segmentation fault.
0x00007fffec1a518d in ATL_dJIK40x40x40TN40x40x0_a1_b1 () from /opt/sage-4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/local/lib/libatlas.so
(gdb) bt
#0  0x00007fffec1a518d in ATL_dJIK40x40x40TN40x40x0_a1_b1 () from /opt/sage-4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/local/lib/libatlas.so
#1  0x00007fffec28734a in ATL_dmmJIK2 () from /opt/sage-4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/local/lib/libatlas.so
#2  0x00007fffec287dea in ATL_dmmJIK () from /opt/sage-4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/local/lib/libatlas.so
#3  0x00007fffec27f366 in ATL_dgemm () from /opt/sage-4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/local/lib/libatlas.so
#4  0x00007fffd925c261 in RowEchelonTransform_rec (p=<value optimized out>, A=0x7fffc4dbb010, n=<value optimized out>, m=600, m1=<value optimized out>, 
    m2=<value optimized out>, k=0, ks=0, frows=1, lrows=1, redflag=0, eterm=0, P=0x94cb500, rp=0x94cc7d0, d=0x7fffffffc248, mp_a=0x7fffffffc0c0, 
    mp_p=0x7fffffffc0b0) at mtrans.c:233
#5  0x00007fffd925b629 in RowEchelonTransform_rec (p=<value optimized out>, A=0x7fffc4dbb010, n=600, m=600, m1=40, m2=<value optimized out>, k=0, ks=0, 
    frows=1, lrows=1, redflag=0, eterm=0, P=0x94cb500, rp=0x94cc7d0, d=0x7fffffffc248, mp_a=0x7fffffffc0c0, mp_p=0x7fffffffc0b0) at mtrans.c:220
#6  0x00007fffd925b629 in RowEchelonTransform_rec (p=<value optimized out>, A=0x7fffc4dbb010, n=600, m=600, m1=40, m2=<value optimized out>, k=0, ks=0, 
    frows=1, lrows=1, redflag=0, eterm=0, P=0x94cb500, rp=0x94cc7d0, d=0x7fffffffc248, mp_a=0x7fffffffc0c0, mp_p=0x7fffffffc0b0) at mtrans.c:220
#7  0x00007fffd925c6c5 in RowEchelonTransform (p=387977, A=0x7fffc4dbb010, n=600, m=600, frows=<value optimized out>, lrows=<value optimized out>, 
    redflag=0, eterm=0, Q=0x94cb500, rp=0x94cc7d0, d=0x7fffffffc248) at mtrans.c:148
#8  0x00007fffd92652fa in nullspaceMP (n=<value optimized out>, m=600, A=0x57e40, mp_N_pass=<value optimized out>) at nullspace.c:237
#9  0x00007fffd94b8406 in __pyx_pf_4sage_6matrix_20matrix_integer_dense_20Matrix_integer_dense_62_rational_kernel_iml (__pyx_v_self=<value optimized out>, 
    unused=<value optimized out>) at sage/matrix/matrix_integer_dense.c:26267
#10 0x00007ffff7a69163 in PyObject_Call (func=0x46d2f80, arg=0x28, kw=0x28) at Objects/abstract.c:2492
#11 0x00007fffd8de5f9e in __pyx_pf_4sage_6matrix_21matrix_rational_dense_21Matrix_rational_dense_27right_kernel (__pyx_v_self=0x7ffff7ed72f0, 
    __pyx_args=<value optimized out>, __pyx_kwds=<value optimized out>) at sage/matrix/matrix_rational_dense.c:13253
#12 0x00007ffff7a69163 in PyObject_Call (func=0x7cbf38, arg=0x28, kw=0x28) at Objects/abstract.c:2492
#13 0x00007ffff7b076c3 in PyEval_CallObjectWithKeywords (func=0x7cbf38, arg=0x7ffff7f90050, kw=0x28) at Python/ceval.c:3575
#14 0x00007fffdaa999dd in __pyx_pf_4sage_6matrix_7matrix2_6Matrix_35left_kernel (__pyx_v_self=0x7ffff7ed7248, __pyx_args=0x7ffff7f90050, 
    __pyx_kwds=<value optimized out>) at sage/matrix/matrix2.c:14300
#15 0x00007ffff7a69163 in PyObject_Call (func=0x46d2e18, arg=0x28, kw=0x28) at Objects/abstract.c:2492
#16 0x00007ffff7b076c3 in PyEval_CallObjectWithKeywords (func=0x46d2e18, arg=0x7ffff7f90050, kw=0x28) at Python/ceval.c:3575
#17 0x00007fffdaa99d91 in __pyx_pf_4sage_6matrix_7matrix2_6Matrix_33kernel (__pyx_v_self=<value optimized out>, __pyx_args=0x7ffff7f90050, 
    __pyx_kwds=<value optimized out>) at sage/matrix/matrix2.c:13075
#18 0x00007ffff7b0d592 in call_function (f=0x73d30f0, throwflag=<value optimized out>) at Python/ceval.c:3706
#19 PyEval_EvalFrameEx (f=0x73d30f0, throwflag=<value optimized out>) at Python/ceval.c:2389
#20 0x00007ffff7b0f26d in PyEval_EvalCodeEx (co=0x47b0558, globals=<value optimized out>, locals=<value optimized out>, args=0x0, 
    argcount=<value optimized out>, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#21 0x00007ffff7b0f342 in PyEval_EvalCode (co=0x28, globals=0x28, locals=0x28) at Python/ceval.c:522
#22 0x00007ffff7b0e619 in exec_statement (f=0x6d3000, throwflag=<value optimized out>) at Python/ceval.c:4401
#23 PyEval_EvalFrameEx (f=0x6d3000, throwflag=<value optimized out>) at Python/ceval.c:1717
#24 0x00007ffff7b0f26d in PyEval_EvalCodeEx (co=0xaa37b0, globals=<value optimized out>, locals=<value optimized out>, args=0x6d2900, 
    argcount=<value optimized out>, kws=0x2, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#25 0x00007ffff7b0d60d in fast_function (f=0x6d2760, throwflag=<value optimized out>) at Python/ceval.c:3802
#26 call_function (f=0x6d2760, throwflag=<value optimized out>) at Python/ceval.c:3727
#27 PyEval_EvalFrameEx (f=0x6d2760, throwflag=<value optimized out>) at Python/ceval.c:2389
#28 0x00007ffff7b0f26d in PyEval_EvalCodeEx (co=0xaa36c0, globals=<value optimized out>, locals=<value optimized out>, args=0x2, 
    argcount=<value optimized out>, kws=0xb40348, kwcount=0, defs=0xb40338, defcount=2, closure=0x0) at Python/ceval.c:2968
#29 0x00007ffff7b0d60d in fast_function (f=0x47f1a70, throwflag=<value optimized out>) at Python/ceval.c:3802
#30 call_function (f=0x47f1a70, throwflag=<value optimized out>) at Python/ceval.c:3727
#31 PyEval_EvalFrameEx (f=0x47f1a70, throwflag=<value optimized out>) at Python/ceval.c:2389
#32 0x00007ffff7b0de88 in fast_function (f=0xb9d030, throwflag=<value optimized out>) at Python/ceval.c:3792
#33 call_function (f=0xb9d030, throwflag=<value optimized out>) at Python/ceval.c:3727
#34 PyEval_EvalFrameEx (f=0xb9d030, throwflag=<value optimized out>) at Python/ceval.c:2389
#35 0x00007ffff7b0f26d in PyEval_EvalCodeEx (co=0xaa3288, globals=<value optimized out>, locals=<value optimized out>, args=0xbad6d8, 
    argcount=<value optimized out>, kws=0x2, kwcount=0, defs=0xb00fa8, defcount=1, closure=0x0) at Python/ceval.c:2968
#36 0x00007ffff7b0d60d in fast_function (f=0xbad550, throwflag=<value optimized out>) at Python/ceval.c:3802
#37 call_function (f=0xbad550, throwflag=<value optimized out>) at Python/ceval.c:3727
#38 PyEval_EvalFrameEx (f=0xbad550, throwflag=<value optimized out>) at Python/ceval.c:2389
#39 0x00007ffff7b0f26d in PyEval_EvalCodeEx (co=0xa9af30, globals=<value optimized out>, locals=<value optimized out>, args=0xbaad00, 
    argcount=<value optimized out>, kws=0x2, kwcount=0, defs=0xb00f68, defcount=1, closure=0x0) at Python/ceval.c:2968
#40 0x00007ffff7b0d60d in fast_function (f=0xbaab70, throwflag=<value optimized out>) at Python/ceval.c:3802
#41 call_function (f=0xbaab70, throwflag=<value optimized out>) at Python/ceval.c:3727
#42 PyEval_EvalFrameEx (f=0xbaab70, throwflag=<value optimized out>) at Python/ceval.c:2389
#43 0x00007ffff7b0f26d in PyEval_EvalCodeEx (co=0x7ffff7ea5eb8, globals=<value optimized out>, locals=<value optimized out>, args=0x2, 
    argcount=<value optimized out>, kws=0x8bab28, kwcount=2, defs=0x8bab18, defcount=2, closure=0x0) at Python/ceval.c:2968
#44 0x00007ffff7b0d60d in fast_function (f=0x6bdc30, throwflag=<value optimized out>) at Python/ceval.c:3802
#45 call_function (f=0x6bdc30, throwflag=<value optimized out>) at Python/ceval.c:3727
#46 PyEval_EvalFrameEx (f=0x6bdc30, throwflag=<value optimized out>) at Python/ceval.c:2389
#47 0x00007ffff7b0f26d in PyEval_EvalCodeEx (co=0x7ffff7efcbe8, globals=<value optimized out>, locals=<value optimized out>, args=0x0, 
    argcount=<value optimized out>, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2968
#48 0x00007ffff7b0f342 in PyEval_EvalCode (co=0x28, globals=0x28, locals=0x28) at Python/ceval.c:522
#49 0x00007ffff7b2f4e0 in run_mod (fp=0x692620, filename=<value optimized out>, start=<value optimized out>, globals=<value optimized out>, locals=0x63d270, 
    closeit=0, flags=0x7fffffffd690) at Python/pythonrun.c:1335
#50 PyRun_FileExFlags (fp=0x692620, filename=<value optimized out>, start=<value optimized out>, globals=<value optimized out>, locals=0x63d270, closeit=0, 
    flags=0x7fffffffd690) at Python/pythonrun.c:1321
#51 0x00007ffff7b2f6ac in PyRun_SimpleFileExFlags (fp=<value optimized out>, 
    filename=0x7fffffffe975 "/opt/sage-4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux/local/bin/sage-ipython", closeit=0, flags=0x7fffffffd690)
    at Python/pythonrun.c:931
#52 0x00007ffff7b3bb77 in RunStartupFile (argc=<value optimized out>, argv=<value optimized out>) at Modules/main.c:142
#53 Py_Main (argc=<value optimized out>, argv=<value optimized out>) at Modules/main.c:558
#54 0x00007ffff6e02eff in __libc_start_main (main=0x4007a0 <main>, argc=2, ubp_av=0x7fffffffd7b8, init=<value optimized out>, fini=<value optimized out>, 
    rtld_fini=<value optimized out>, stack_end=0x7fffffffd7a8) at libc-start.c:226
#55 0x00000000004006d9 in _start ()
(gdb) q

CC: @jdemeyer

Component: linear algebra

Keywords: rational matrix segmentation fault ATLAS

Reviewer: Leif Leonhardy

Issue created by migration from https://trac.sagemath.org/ticket/11581


rbeezer mannequin commented Jul 7, 2011

comment:1

Thanks for the report.

Works for me on 64-bit Ubuntu 10.10 on Intel i7-2600 with the latest Sage alpha release. The matrix kernel routines were refactored between 4.7 and now (#10746), but I suspect that is totally unrelated (famous last words).

----------------------------------------------------------------------
| Sage Version 4.7.1.alpha4, Release Date: 2011-07-03                |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
**********************************************************************
*                                                                    *
* Warning: this is a prerelease version, and it may be unstable.     *
*                                                                    *
**********************************************************************
sage: M = random_matrix(QQ,4000,4000)
sage: M.kernel()
Vector space of degree 4000 and dimension 0 over Rational Field
Basis matrix:
0 x 4000 dense matrix over Rational Field

It looks like the failure is in the Integer Matrix Library (IML) code (the nullspaceMP routine), which appears to be using ATLAS.

Are you constrained for memory?

Some suggestions you could try:

(a) Looks like you have a pre-built binary? You could build a copy of Sage from source, which is quite easy (but takes hours). See http://groups.google.com/group/sage-release for links to the latest alpha release.

(b) Install the upcoming improved ATLAS spkg at #10226 (positive review).

(c) You could apply #10746, but I think that would be irrelevant in this case.


sagetrac-hdevalence mannequin commented Jul 12, 2011

comment:2

Hi, I tried building Sage (the release, not the alpha) on the Core 2 machine. It solved the problem. I also tried the alpha on the other machine; it also solved the problem and performance seems to be better as well. Thanks for the help.


rbeezer mannequin commented Jul 12, 2011

comment:3

Replying to @sagetrac-hdevalence:

Hi, I tried building Sage (the release, not the alpha) on the Core 2 machine. It solved the problem. I also tried the alpha on the other machine; it also solved the problem and performance seems to be better as well. Thanks for the help.

Great! Glad to hear building from source worked for you. Thanks for following up.

For Jeroen Demeyer (release manager):

Should we close this? Seems like maybe a one-off problem, until we hear of other similar experiences?

@jdemeyer

comment:4

Replying to @rbeezer:

Should we close this? Seems like maybe a one-off problem, until we hear of other similar experiences?

This sounds like a problem with binary releases so I don't think we should simply ignore this report.


rbeezer mannequin commented Jul 12, 2011

comment:5

Replying to @jdemeyer:

This sounds like a problem with binary releases so I don't think we should simply ignore this report.

Right. Here's a confirmation of sorts. My configuration is not too different from that of the original poster.

64-bit Ubuntu 10.10 on Intel i7-2600

Using latest binary, which unpacks to directory: sage-4.7-linux-64bit-ubuntu_10.04.1_lts-x86_64-Linux

sage: A=random_matrix(QQ, 101, 101)
sage: A.kernel()
Vector space of degree 101 and dimension 0 over Rational Field
Basis matrix:
0 x 101 dense matrix over Rational Field
sage: A=random_matrix(QQ, 102, 102)
sage: A.kernel()
<boom: segmentation fault>

I'll be useless doing much more to debug this unless led by the hand. But I could call for more testing, via IRC or sage-devel, with any reports posted here. If so, is the gdb trace the best thing to post (besides OS)?

What would you advise? That's probably about as much more as I can contribute.

Rob
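One way to pin down the 101/102 boundary reported above on other machines is to bisect over the matrix size, running each attempt in a fresh process so that a segfault does not take the driver down with it. The sketch below is a hypothetical helper, not part of Sage, and needs Python 3.5+: it assumes a `sage` executable on PATH that accepts `-c <code>`, and it treats any nonzero exit status as a crash, since Sage may intercept SIGSEGV instead of dying from the raw signal. It also assumes the failure is monotone in the size (every size past the threshold keeps crashing), which the reports suggest but do not establish.

```python
import subprocess


def kernel_crashes(n, sage="sage"):
    """Hypothetical probe: compute the kernel of a random n x n rational
    matrix in a fresh Sage process and report whether that process failed.

    Assumes a `sage` executable on PATH that accepts `-c <code>`.
    Any nonzero exit status counts as a crash, because Sage may catch
    SIGSEGV itself rather than being killed by the signal.
    """
    code = "M = random_matrix(QQ, %d, %d); M.kernel()" % (n, n)
    result = subprocess.run([sage, "-c", code],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode != 0


def first_crashing_size(crashes, lo, hi):
    """Return the smallest n in (lo, hi] for which crashes(n) is True.

    Requires crashes(lo) to be False and crashes(hi) to be True, and
    assumes monotonicity: once a size crashes, all larger sizes do too.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(mid):
            hi = mid   # threshold is at mid or below
        else:
            lo = mid   # threshold is above mid
    return hi
```

With the sizes from this thread (101 worked, 600 crashed), `first_crashing_size(kernel_crashes, 101, 600)` needs about nine Sage runs; on the affected 4.7 binary it would presumably report 102.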

@jdemeyer

comment:6

Replying to @rbeezer:

What would you advise? That's probably about as much more as I can contribute.

Could you try sage-4.7.1.alpha1 (the last version without #10746) and check whether that works?

If NO, try the released sage-4.7.

If YES, try sage-4.7.1.alpha2 or later.

@jdemeyer

comment:7

I just realized I misinterpreted the report. The problematic version is sage-4.7, so you should try the most recent alpha version instead. Currently, this is sage-4.7.1.alpha4.


rbeezer mannequin commented Aug 9, 2011

comment:8

A similar problem is discussed at:

http://groups.google.com/group/sage-support/browse_thread/thread/4b421d04fb2d2297


nexttime mannequin commented Aug 9, 2011

comment:9

It would perhaps be helpful to know on what kind of machine the binaries were built.

On the other hand, unsupported instructions should raise SIGILL rather than cause a segfault. Also, why should this depend on the size of the matrix? (A different code path might be executed, but IMHO that's unlikely given how small the difference is between the matrix sizes that work and those that don't.)
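To check which signal actually ended the process, and so whether the SIGILL explanation can be ruled out, the failing command can be run as a child process and its exit status inspected. A minimal sketch, assuming a POSIX system and Python 3.5+; the helper names are hypothetical, and the distinction only holds if the child is killed by the signal itself. Sage installs its own signal handlers, so for a Sage process the message it prints may be the more reliable indicator.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short description.

    On POSIX, subprocess reports a child killed by signal N as
    returncode == -N, so SIGSEGV (invalid memory access) and SIGILL
    (instruction the CPU does not support) can be told apart.
    """
    if returncode >= 0:
        return "exited with status %d" % returncode
    return "killed by %s" % signal.Signals(-returncode).name


def run_and_describe(argv):
    """Run argv to completion and describe how it terminated."""
    return describe_exit(subprocess.run(argv).returncode)
```

For example, `run_and_describe([sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"])` returns `'killed by SIGSEGV'` on Linux, whereas a binary built for a newer CPU would be expected to end with `killed by SIGILL`.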


nexttime mannequin commented Aug 10, 2011

Changed keywords from rational matrix segmentation fault to rational matrix segmentation fault ATLAS


nexttime mannequin commented Aug 10, 2011

comment:10

Another segfault, presumably caused by ATLAS with large matrices (this time in NumPy, 512x512), and this one clearly in libpthread: #11674


nexttime mannequin commented Aug 10, 2011

comment:11

Ping, because trac notifications seem to work again.

@jdemeyer

comment:12

Let's assume this is fixed...

@jdemeyer

Reviewer: Leif Leonhardy


nexttime mannequin commented Jun 12, 2013

comment:14

I think this was also just caused by the binary distribution being built on redhawk, which ran the obsolete Ubuntu 10.04.1 LTS, with an apparently incompatible / broken libpthread. (It's now at 10.04.4, so this should no longer happen.)
