Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AbstractLinearCode.canonical_representative not properly wrapped in sig_on and sig_off #21628

Closed
johanrosenkilde opened this issue Oct 3, 2016 · 20 comments

Comments

@johanrosenkilde
Copy link
Contributor

sage: C = LinearCode(random_matrix(GF(47), 12, 43))
sage: C.canonical_representative()
<wait half an eternity or try to use C-c>

Pressing C-c either seems to have no effect, throws an exception which is not caught and computation continues, or segfaults.

This is very unfortunate because this is a function commonly tried but interrupted due to long run-time.

Depends on #21651

CC: @sagetrac-tfeulner

Component: coding theory

Keywords: linear_code

Branch/Commit: u/jsrn/21628_canonical_representative_interrupting @ 40269b9

Reviewer: Johan Rosenkilde, Jeroen Demeyer

Issue created by migration from https://trac.sagemath.org/ticket/21628

@johanrosenkilde
Copy link
Contributor Author

@johanrosenkilde
Copy link
Contributor Author

Commit: 40269b9

@johanrosenkilde
Copy link
Contributor Author

Author: Johan Rosenkilde

@johanrosenkilde
Copy link
Contributor Author

comment:2

I've added calls to sig_check in the constructor LinearCodeAutGroupCanLabel (which might call itself recursively), as well as sig_on/off the call to PartitionRefinementLinearCode. The latter is, AFAICT, the main work-horse of the algorithm. Both are Cython, but there's more C-like code in PartitionRefinementLinearCode (in codecan.pyx), so this might be why it used to behave so badly when interrupted.

I'm not exactly sure how the interruption works, but it now seems to me to be responsive and not crash. I also don't know if there might be memory leaks left after the interruption :-S


New commits:

40269b9Insert sig_on/off around expensive check and sig_check on recursive calls

@jdemeyer
Copy link

jdemeyer commented Oct 3, 2016

comment:3

The call to PartitionRefinementLinearCode is a Python call, so sig_on() is inappropriate. sig_on() is only meant to wrap calls to external libraries that you have no control over.

In this case, you should use sig_check() in the Cython code instead.

@johanrosenkilde
Copy link
Contributor Author

comment:4

Replying to @jdemeyer:

The call to PartitionRefinementLinearCode is a Python call, so sig_on() is inappropriate. sig_on() is only meant to wrap calls to external libraries that you have no control over.

In this case, you should use sig_check() in the Cython code instead.

Thanks. Of course you're the mastermind behind cysignals, but the documentation of cysignals does say: "You should put sig_on() before and sig_off() after any Cython code which could potentially take a long time.". Which is why I thought it would be OK to do that.

Further, removing that sig_on/sig_off pair, I get the following segfault when interrupting:

------------------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------

Further leading me to believe sig_on/sig_off was the way to go.

I've tried to look closer at PartitionRefinementLinearCode: it quickly calls into sage/groups/perm_gps/partn_ref2/refinement_generic.pyx which is a very complicated, mutually recursive backtracking algorithm, and it's not very clear what the "expensive loop" is. Exactly why is it not OK to use sig_on/sig_off in such a case?

Also, it's not clear to me why Sage segfaults on C-c since all the code I can see is just Cython code. If this is expected behaviour, why is the following message printed if sig_on/sig_off is not the right way to solve such an issue?

Best,
Johan

@jdemeyer
Copy link

jdemeyer commented Oct 4, 2016

comment:5

Well, the cysignals documentation also has a warning box with

The code inside sig_on() should be pure C or Cython code. If you call any Python code or manipulate any Python object (even something trivial like x = []), an interrupt can mess up Python’s internal state. When in doubt, try to use sig_check() instead.

@jdemeyer
Copy link

jdemeyer commented Oct 4, 2016

comment:6

Replying to @johanrosenkilde:

Further, removing that sig_on/sig_off pair, I get the following segfault when interrupting:

------------------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
------------------------------------------------------------------------

That's probably an independent problem. I strongly suggest that you first fix this segfault before moving on. It is pointless to build on top of a broken foundation.

Do you have a GDB backtrace (if you have gdb installed, such a backtrace will be shown automatically) or a way for me to reproduce that segfault?

@johanrosenkilde
Copy link
Contributor Author

comment:7

Replying to @jdemeyer:

Well, the cysignals documentation also has a warning box with

The code inside sig_on() should be pure C or Cython code. If you call any Python code or manipulate any Python object (even something trivial like x = []), an interrupt can mess up Python’s internal state. When in doubt, try to use sig_check() instead.

Haha, good point :-)

That's probably an independent problem. I strongly suggest that you first fix this segfault before moving on. It is pointless to build on top of a broken foundation.

Do you have a GDB backtrace (if you have gdb installed, such a backtrace will be shown automatically) or a way for me to reproduce that segfault?

Hmm, OK. What I run to produce the segfault is the following:

sage: C = LinearCode(random_matrix(GF(47), 25, 35))
sage: C.canonical_representative()
< Hit C-c and watch the fireworks >

@johanrosenkilde
Copy link
Contributor Author

comment:8

The backtrace from gdb is:

/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/cysignals/signals.so(+0x43c5)[0x7f872e4e73c5]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/cysignals/signals.so(+0x4415)[0x7f872e4e7415]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/cysignals/signals.so(+0x6f07)[0x7f872e4e9f07]
/usr/lib/libpthread.so.0(+0x11080)[0x7f87369d8080]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/sage/coding/codecan/codecan.so(+0xcec0)[0x7f856f699ec0]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/sage/groups/perm_gps/partn_ref2/refinement_generic.so(+0x10150)[0x7f857b071150]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/sage/coding/codecan/codecan.so(+0x2caf1)[0x7f856f6b9af1]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(+0xbe6b5)[0x7f8736ca26b5]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/sage/coding/codecan/autgroup_can_label.so(+0x5c90)[0x7f856f8f8c90]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/sage/coding/codecan/autgroup_can_label.so(+0x2a74f)[0x7f856f91d74f]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f8736c36e53]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(+0x6403c)[0x7f8736c4803c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f8736c36e53]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_CallObjectWithKeywords+0x47)[0x7f8736ce69d7]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyInstance_New+0x6c)[0x7f8736c4fe1c]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/sage/coding/codecan/autgroup_can_label.so(+0x5c90)[0x7f856f8f8c90]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/sage/coding/codecan/autgroup_can_label.so(+0x34d19)[0x7f856f927d19]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f8736c36e53]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(+0x6403c)[0x7f8736c4803c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f8736c36e53]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_CallObjectWithKeywords+0x47)[0x7f8736ce69d7]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyInstance_New+0x6c)[0x7f8736c4fe1c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f8736c36e53]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x3adf)[0x7f8736ceaa7f]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8bc)[0x7f8736cedb3c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(+0x8390d)[0x7f8736c6790d]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/sage/misc/cachefunc.so(+0xf563)[0x7f872ce17563]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/sage/misc/cachefunc.so(+0xaeb0)[0x7f872ce12eb0]
/home/jsrn/local/sage/sage_devel/local/lib/python2.7/site-packages/sage/misc/cachefunc.so(+0x1f946)[0x7f872ce27946]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f8736c36e53]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x3adf)[0x7f8736ceaa7f]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8bc)[0x7f8736cedb3c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5075)[0x7f8736cec015]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8bc)[0x7f8736cedb3c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x19)[0x7f8736cedc39]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5d60)[0x7f8736cecd00]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8bc)[0x7f8736cedb3c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5075)[0x7f8736cec015]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8bc)[0x7f8736cedb3c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5075)[0x7f8736cec015]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8bc)[0x7f8736cedb3c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5075)[0x7f8736cec015]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8bc)[0x7f8736cedb3c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5075)[0x7f8736cec015]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8bc)[0x7f8736cedb3c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5075)[0x7f8736cec015]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8bc)[0x7f8736cedb3c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x5075)[0x7f8736cec015]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x8bc)[0x7f8736cedb3c]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x19)[0x7f8736cedc39]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyRun_FileExFlags+0x8a)[0x7f8736d1227a]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xd5)[0x7f8736d13655]
/home/jsrn/local/sage/sage_devel/local/lib/libpython2.7.so.1.0(Py_Main+0xc6f)[0x7f8736d29daf]
/usr/lib/libc.so.6(__libc_start_main+0xf1)[0x7f8735f3e291]
python(_start+0x2a)[0x40067a]

@jdemeyer
Copy link

jdemeyer commented Oct 4, 2016

comment:9

Replying to @johanrosenkilde:

The backtrace from gdb is

No, it's not :-)

That's the backtrace from glibc. The backtrace from gdb should be below that.

@johanrosenkilde
Copy link
Contributor Author

comment:10

Replying to @jdemeyer:

Replying to @johanrosenkilde:

The backtrace from gdb is

No, it's not :-)

That's the backtrace from glibc. The backtrace from gdb should be below that.

Ah ok. It says "Attaching gdb to process id 10318." and then "Saved trace to ". But that log file is empty.

@jdemeyer
Copy link

jdemeyer commented Oct 4, 2016

comment:11

Replying to @johanrosenkilde:

Ah ok. It says "Attaching gdb to process id 10318." and then "Saved trace to ". But that log file is empty.

Probably you don't have gdb or it's the wrong version...

@johanrosenkilde
Copy link
Contributor Author

comment:12

I do have gdb since gdb launches when I type gdb. It's not hard to test.

How do I know if it's "the right version"? It's 7.11.1. My gcc is 6.1.1. Presumably they fit together since they're in my package manager?

@jdemeyer
Copy link

jdemeyer commented Oct 5, 2016

Dependencies: #21651

@jdemeyer
Copy link

jdemeyer commented Oct 5, 2016

comment:14

I created #21651 for the existing segfault issue.

@jdemeyer
Copy link

comment:15

As far as I can tell, this issue is completely fixed by #21651.

@johanrosenkilde
Copy link
Contributor Author

comment:16

Yes, I think so. I tried some days ago to make it crash again, but was unable. I wanted to get back to it and try harder, but never got around to it. Since you also tried since then, it's probably true that the issue got fixed.

Let's close it.

@johanrosenkilde johanrosenkilde removed this from the sage-7.4 milestone Oct 14, 2016
@jdemeyer
Copy link

Reviewer: Johan Rosenkilde, Jeroen Demeyer

@jdemeyer
Copy link

Changed author from Johan Rosenkilde to none

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants