Trying completion list of maxima_lib crashes Sage #22766

Closed
rwst opened this issue Apr 6, 2017 · 43 comments

Comments

@rwst

rwst commented Apr 6, 2017

If Maxima's command list is not cached, then initialising Maxima/ECL and hitting TAB after maxima_lib. crashes Sage, as shown below. Similar crashes can be triggered in other ways; see e.g. #23956.

The reason for these crashes is the design of tab completion in IPython 5+ using
prompt_toolkit, which uses Python threading, and does tab completion in a separate thread.
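
A minimal sketch of that threading situation, using only the standard library. The `complete` function is a hypothetical stand-in for a completion callback, not the IPython or prompt_toolkit API; it only records which thread executed it.

```python
import threading

def complete(prefix, seen):
    # Hypothetical stand-in for a tab-completion callback.
    # Record the thread that actually ran the completion.
    seen.append(threading.current_thread())
    return [name for name in ("maxima_lib", "maxima") if name.startswith(prefix)]

seen = []
worker = threading.Thread(target=complete, args=("max", seen))
worker.start()
worker.join()

# Compare the thread that ran the completion with the thread that started it.
ran_in_caller = seen[0] is threading.current_thread()
print("completion ran in calling thread:", ran_in_caller)
```

Any library that was initialised in the calling thread and is then entered from `complete` is in the position described in this report.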

$ rm -f ~/.sage/maxima_commandlist_cache.sobj
$ sage
┌────────────────────────────────────────────────────────────────────┐
│ SageMath version 8.0.beta0, Release Date: 2017-03-30               │
│ Type "notebook()" for the browser-based notebook interface.        │
│ Type "help()" for help.                                            │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a prerelease version, and it may be unstable.     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: from sage.interfaces.maxima_lib import maxima_lib
sage: maxima_lib.
Building Maxima command completion list (this takes
a few seconds only the first time you do it).
To force rebuild later, delete /home/ralf/.sage//maxima_commandlist_cache.sobj.
A
;;;
;;; Stack overflow.
;;; Jumping to the outermost toplevel prompt
;;;


Internal or unrecoverable error in:

;;;
;;; No frame to jump to
;;; Aborting ECL
;;;

;;; ECL C Backtrace
;;; /home/ralf/sage/local/lib/libecl.so.16.1(si_dump_c_backtrace+0x26) [0x7f012]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(ecl_internal_error+0x3f) [0x7f0127]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(FEerror+0) [0x7f0127706f20]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(+0x1adb1a) [0x7f012772eb1a]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(+0x12b3c5) [0x7f01276ac3c5]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(cl_funcall+0x70) [0x7f01276e7c90]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(si_serror+0xd9) [0x7f0127708299]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(ecl_cs_overflow+0xac) [0x7f012772e]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(+0x12b3c5) [0x7f01276ac3c5]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(cl_funcall+0x70) [0x7f01276e7c90]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(si_serror+0xd9) [0x7f0127708299]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(ecl_cs_overflow+0xac) [0x7f012772e]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(ecl_interpret+0x1d67) [0x7f01276ea]
;;; /home/ralf/sage/local/lib/libecl.so.16.1(cl_apply+0x145) [0x7f01276e7e95]
;;; /home/ralf/sage/local/lib/python2.7/site-packages/sage/libs/ecl.so(+0xcf1d)]
;;; /home/ralf/sage/local/lib/python2.7/site-packages/sage/libs/ecl.so(+0x15511]
;;; /home/ralf/sage/local/lib/python2.7/site-packages/sage/libs/ecl.so(+0x15d6b]
;;; /home/ralf/sage/local/lib/python2.7/site-packages/sage/libs/ecl.so(+0x16d47]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(+0xc0c7f) [0x7f038ea92c7f]
;;; /home/ralf/sage/local/lib/python2.7/site-packages/sage/libs/ecl.so(+0xcad8)]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f038e]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x56da) [0]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x81c) [0x7]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8020) [0]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x81c) [0x7]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8020) [0]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x81c) [0x7]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8020) [0]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x81c) [0x7]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x8020) [0]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x81c) [0x7]
;;; /home/ralf/sage/local/lib/libpython2.7.so.1.0(+0x87ecc) [0x7f038ea59ecc]
Aborted (core dumped)

Depends on #23956

CC: @jdemeyer

Component: interfaces

Reviewer: Dima Pasechnik

Issue created by migration from https://trac.sagemath.org/ticket/22766

@rwst rwst added this to the sage-8.0 milestone Apr 6, 2017
@rwst
Author

rwst commented Apr 6, 2017

comment:1

I think the crash happens in line 281 in /home/ralf/sage/local/var/tmp/sage/build/ecl-16.1.2.p2/src/src/c/interpreter.d.

@dimpase
Member

dimpase commented Oct 2, 2017

comment:2

This happens because, since IPython 5.*, tab completion runs in a different thread.
Here you initialise Maxima in the main thread, but then run Maxima (by triggering tab completion) in a separate thread.

See also #23700 and #23956 for a different way to trigger the same bug.

@jhpalmieri
Member

comment:3

On OS X (10.12.6, Xcode 8.3.3, Sage 8.1.beta6), I don't get this crash, and I also don't see the message "Building Maxima command completion list ...": I just immediately get a list of completions.

@dimpase
Member

dimpase commented Oct 2, 2017

comment:4

Replying to @jhpalmieri:

On OS X (10.12.6, Xcode 8.3.3, Sage 8.1.beta6), I don't get this crash, and I also don't see the message "Building Maxima command completion list ...": I just immediately get a list of completions.

It's because you already have this list built and stored somewhere in ~/.sage, I suppose. Try moving it out of the way first.

@jhpalmieri
Member

comment:5

Okay, I get the crash after deleting .sage/maxima_commandlist_cache.sobj. Do you know why starting Sage and then doing maxima.<TAB> doesn't trigger the crash, but instead rebuilds this file?

@dimpase
Member

dimpase commented Oct 2, 2017

comment:6

Replying to @jhpalmieri:

Okay, I get the crash after deleting .sage/maxima_commandlist_cache.sobj. Do you know why starting Sage and then doing maxima.<TAB> doesn't trigger the crash, but instead rebuilds this file?

Yes, it is because in this case everything happens in the same thread (the one of the tab completion).

@nbruin
Contributor

nbruin commented Oct 3, 2017

comment:7

Making maxima_lib "thread-safe" would consist of locking it to one thread. The signal management switching that happens upon entering/exiting ecllib makes it fundamentally incompatible with multi-threading, because signal handlers are process-specific, not thread-specific.

Sage installs special signal handlers (for SIGINT, for instance), and so does ECL. If ECL runs with multi-threading, ECL even goes further with signal handling (it makes a dedicated signal handling thread), and it uses signals to synchronize threads for critical GC operations.

If you want to get ecllib to a state where it can safely be used in a multi-threaded environment, I think one would have to unify the signal management of sage and ecl.

The result would not actually make maxima_lib threadsafe, because maxima itself is rather fundamentally not thread-safe.
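
A small sketch of the process-wide nature of signal handlers, using only Python's standard `signal` and `threading` modules (nothing here touches ECL or Sage). It assumes CPython, where installing a handler outside the main thread raises `ValueError`.

```python
import signal
import threading

def inspect_from_thread(out):
    # Signal dispositions belong to the process, so a worker thread sees
    # the same SIGINT handler as the thread that started it.
    out["handler"] = signal.getsignal(signal.SIGINT)
    try:
        # CPython refuses to install handlers outside the main thread.
        signal.signal(signal.SIGINT, signal.SIG_IGN)
        out["install_error"] = None
    except ValueError as exc:
        out["install_error"] = exc

result = {}
t = threading.Thread(target=inspect_from_thread, args=(result,))
t.start()
t.join()

same_handler = result["handler"] == signal.getsignal(signal.SIGINT)
print("same SIGINT handler in both threads:", same_handler)
print("installing from worker thread failed:", result["install_error"] is not None)
```

A C library that swaps handlers on entry and exit is under no such restriction, which is why the swap affects every thread in the process.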

@rwst
Author

rwst commented Oct 3, 2017

comment:8

Removing dependencies seems so much more promising, see https://trac.sagemath.org/wiki/symbolics/maxima

@dimpase
Member

dimpase commented Oct 3, 2017

comment:9

Here is another scenario with two threads, only leading to an abort, not to a segfault.
Here I tab-complete from sage.libs.ecl import to force initialisation in non-main thread.

sage: from sage.libs.ecl import                        
 at  init_ecl
 thread id  <Thread(Thread-32, started 139989416191744)> 
 active threads  [<_MainThread(MainThread, started 140000333952768)>, <HistorySavingThread(IPythonHistorySavingThread, started 140000088241920)>, <Thread(Thread-32, started 139989416191744)>] 
sage: from sage.libs.ecl import *
sage: from sage.interfaces.maxima_lib import *
 at  ecl_eval
 thread id  <_MainThread(MainThread, started 140000333952768)> 
 active threads  [<_MainThread(MainThread, started 140000333952768)>, <HistorySavingThread(IPythonHistorySavingThread, started 140000088241920)>] 
 at  ecl_safe_eval
 thread id  <_MainThread(MainThread, started 140000333952768)> 
 active threads  [<_MainThread(MainThread, started 140000333952768)>, <HistorySavingThread(IPythonHistorySavingThread, started 140000088241920)>] 
 at  ecl_eval
 thread id  <_MainThread(MainThread, started 140000333952768)> 
 active threads  [<_MainThread(MainThread, started 140000333952768)>, <HistorySavingThread(IPythonHistorySavingThread, started 140000088241920)>] 
 at  ecl_safe_eval
 thread id  <_MainThread(MainThread, started 140000333952768)> 
 active threads  [<_MainThread(MainThread, started 140000333952768)>, <HistorySavingThread(IPythonHistorySavingThread, started 140000088241920)>] 
Collecting from unknown thread
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-5e6d4a068396> in <module>()
----> 1 from sage.interfaces.maxima_lib import *

/home/dima/Sage/sage-dev/local/lib/python2.7/site-packages/sage/interfaces/maxima_lib.py in <module>()
    102 ## i.e. loading it into ECL
    103 ecl_eval("(setf *load-verbose* NIL)")
--> 104 ecl_eval("(require 'maxima)")
    105 ecl_eval("(in-package :maxima)")
    106 ecl_eval("(setq $nolabels t))")

/home/dima/Sage/sage-dev/src/sage/libs/ecl.pyx in sage.libs.ecl.ecl_eval (build/cythonized/sage/libs/ecl.c:10977)()
   1328 
   1329 #convenience routine to more easily evaluate strings
-> 1330 cpdef EclObject ecl_eval(bytes s):
   1331     """
   1332     Read and evaluate string in Lisp and return the result

/home/dima/Sage/sage-dev/src/sage/libs/ecl.pyx in sage.libs.ecl.ecl_eval (build/cythonized/sage/libs/ecl.c:10916)()
   1344     cdef cl_object o
   1345     o=ecl_safe_read_string(s)
-> 1346     o=ecl_safe_eval(o)
   1347     return ecl_wrap(o)
   1348 

/home/dima/Sage/sage-dev/src/sage/libs/ecl.pyx in sage.libs.ecl.ecl_safe_eval (build/cythonized/sage/libs/ecl.c:5710)()
    343     report_threading_status("ecl_safe_eval")
    344     cdef cl_object s
--> 345     ecl_sig_on()
    346     cl_funcall(2,safe_eval_clobj,form)
    347     ecl_sig_off()

RuntimeError: Aborted
sage: 

with the following prints inserted into ecl.pyx:

diff --git a/src/sage/libs/ecl.pyx b/src/sage/libs/ecl.pyx
index 20e937876d..879d405d78 100644
--- a/src/sage/libs/ecl.pyx
+++ b/src/sage/libs/ecl.pyx
@@ -240,6 +240,7 @@ def init_ecl():
     cdef sigaction_t sage_action[32]
     cdef int i
 
+    report_threading_status("init_ecl")
     if ecl_has_booted:
         raise RuntimeError("ECL is already initialized")
 
@@ -339,6 +340,7 @@ cdef cl_object ecl_safe_eval(cl_object form) except NULL:
         ...
         RuntimeError: ECL says: Console interrupt.
     """
+    report_threading_status("ecl_safe_eval")
     cdef cl_object s
     ecl_sig_on()
     cl_funcall(2,safe_eval_clobj,form)
@@ -1318,6 +1320,12 @@ cdef EclObject ecl_wrap(cl_object o):
     obj.set_obj(o)
     return obj
 
+cpdef report_threading_status(s):
+    import threading
+    print("\n at ", s)
+    print("\n thread id ", threading.current_thread(), "\n")
+    print(" active threads ", threading.enumerate(), "\n")
+
 #convenience routine to more easily evaluate strings
 cpdef EclObject ecl_eval(bytes s):
     """
@@ -1332,6 +1340,7 @@ cpdef EclObject ecl_eval(bytes s):
         <ECL: (1 1 2 3 5 8 13)>
 
     """
+    report_threading_status("ecl_eval")
     cdef cl_object o
     o=ecl_safe_read_string(s)
     o=ecl_safe_eval(o)
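
The same kind of instrumentation can be tried in plain Python, without rebuilding the Cython module. This is only a sketch: `report_thread` and `evaluate` are hypothetical names, and `evaluate` is a stand-in for a call into an embedded interpreter, not the real `ecl_eval`.

```python
import functools
import threading

calls = []

def report_thread(func):
    # Pure-Python analogue of the debugging prints in the diff above:
    # note which thread entered the wrapped function.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append((func.__name__, threading.current_thread().name))
        return func(*args, **kwargs)
    return wrapper

@report_thread
def evaluate(expr):
    # Hypothetical stand-in for a call into an embedded interpreter.
    return expr.upper()

t = threading.Thread(target=evaluate, args=("(require 'maxima)",), name="completer")
t.start()
t.join()
evaluate("(in-package :maxima)")
print(calls)
```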

@dimpase
Member

dimpase commented Oct 4, 2017

comment:10

I think what we see here is ECL being initialised in a thread (number 32) that later is shut down, and then maxima_lib import breaks, as ECL isn't available to run.

It seems that indeed we must make sure that ECL is always started in the main thread, which does not disappear.
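
One way to make such a requirement explicit is a guard at the start of the initialisation routine. The following is a sketch under assumptions: `require_main_thread` is a hypothetical helper, not something that exists in sage.libs.ecl, and it relies on `threading.main_thread()`, which is available in Python 3.

```python
import threading

def require_main_thread(what):
    # Hypothetical guard: refuse to proceed outside the main thread.
    if threading.current_thread() is not threading.main_thread():
        raise RuntimeError(what + " must be called from the main thread")

def try_init(errors):
    # Simulates an initialisation attempt made from a completion thread.
    try:
        require_main_thread("init_ecl")
    except RuntimeError as exc:
        errors.append(str(exc))

errors = []
t = threading.Thread(target=try_init, args=(errors,))
t.start()
t.join()
print(errors)
```

A guard like this turns a later abort into an immediate, readable error, but it does not by itself make the library usable from other threads.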

@dimpase
Member

dimpase commented Oct 5, 2017

comment:11

A part of the relevant discussion is on #23956, which I'll close as duplicate.

@dimpase
Member

dimpase commented Oct 5, 2017

Dependencies: #23956

@dimpase
Member

dimpase commented Oct 5, 2017

comment:14

I wonder whether any other extension (apart from ECL/Maxima) is affected by this issue.

@nbruin
Contributor

nbruin commented Oct 6, 2017

comment:15

Reiterating from #23956:

The effect of ecl_sig_on and ecl_sig_off is NOT thread-local. Thus during the clock time that ecl_sig_on is active (i.e., that ecl code is being executed), signals that are supposed to be handled by the sage signal handler will be handled in the wrong way.

That means it is NOT safe to execute sage code in a thread parallel to a thread that is executing ecl code (properly).

So, if we allow for multiple threads in sage, we'd strictly have to halt all the other threads upon executing ecl_sig_on, and start them again when the corresponding ecl_sig_off is entered.
That, or cross your fingers no signals destined for python arrive during that time period.

In addition, we're running ECL with threading support on their end disabled. I would be surprised if, with that configuration, it is still possible to have multiple threads configured to be able to execute ECL (ECL cares a lot about knowing which threads might be executing ECL code, because they need to be stopped during critical GC events. I expect that all of that is turned off when threading support is turned off).

Given that IPython apparently runs tab completion in a separate thread, I think the most straightforward way of solving the immediate problem here is to avoid running ECL code during tab completion. That can be done by building the completion cache at build time, rather than on demand.
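
A rough sketch of that idea, under stated assumptions: `load_command_list` and `build_command_list` are hypothetical names, and JSON is used only to keep the example self-contained (Sage's actual cache is an .sobj file). The point is that the completion path only ever reads the cache and never computes it.

```python
import json
import os
import tempfile

def load_command_list(cache_path):
    # Hypothetical loader used at completion time: only read a cache
    # produced ahead of time. If it is missing, return no completions
    # rather than starting the computation on demand.
    if not os.path.exists(cache_path):
        return []
    with open(cache_path) as f:
        return json.load(f)

def build_command_list(cache_path, commands):
    # Stand-in for the build-time step; `commands` would come from Maxima.
    with open(cache_path, "w") as f:
        json.dump(sorted(commands), f)

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "commandlist_cache.json")
before = load_command_list(path)
build_command_list(path, ["integrate", "diff", "expand"])
after = load_command_list(path)
print(before, after)
```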

@dimpase
Member

dimpase commented Oct 6, 2017

comment:16

Replying to @nbruin:

The effect of ecl_sig_on and ecl_sig_off is NOT thread-local. Thus during the clock time that ecl_sig_on is active (i.e., that ecl code is being executed), signals that are supposed to be handled by the sage signal handler will be handled in the wrong way.

That means it is NOT safe to execute sage code in a thread parallel to a thread that is executing ecl code (properly).

Right, I think I finally understand your point about signals---sorry for being thick.

It's even worse, I think - apart from signals, ecllib does non-thread-safe things to global variables...
It's known that in such a case the GIL does not suffice; you also need a lock from Python's threading module:

import threading

lock = threading.Lock()
with lock:
    pass  # do unsafe (non-atomic) stuff here

That is, we might still get hit by several threads here, even when something seemingly innocent happens.

To me it looks like disabling threads in tab completion is a more robust solution, and it will also make sure that other extensions are safe and sound in this respect, not only ECL/Maxima.
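
For reference, a self-contained version of the locking pattern mentioned above. It is a generic illustration with a shared counter, not ecllib code: the read-modify-write on the dictionary entry is the kind of non-atomic step that the lock serialises.

```python
import threading

counter = {"value": 0}
lock = threading.Lock()

def bump(n):
    for _ in range(n):
        # The read-modify-write below is not atomic; the lock serialises it.
        with lock:
            counter["value"] += 1

threads = [threading.Thread(target=bump, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter["value"])
```

A lock of this kind only protects Python-level state; it does nothing about process-wide signal handlers.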

@nbruin
Contributor

nbruin commented Oct 6, 2017

comment:17

Replying to @dimpase:

It's even worse, I think - apart from signals, ecllib does non-thread-safe things to global variables...

After initialization that should pretty much be limited to the modifications that are made to the ECL doubly linked list *SAGE-LIST-OF-OBJECTS*. The modifications run in ECL whenever an EclObject is made or deleted (so that should lock, probably). Otherwise I think the signal stuff is the main obstruction to thread-safety.

maximalib is a different issue: maxima is just not thread-safe in its design at all. So I don't think it's worth investing in making ecllib thread-safe (and the signals are a real obstruction), because our main application doesn't allow it anyway.

@jdemeyer

jdemeyer commented Oct 6, 2017

comment:18

Replying to @nbruin:

It uses signals to synchronize threads for critical GC operations.

It seems that ecl_sig_on() changes SIGINT, SIGBUS and SIGSEGV. Does it really use one of those standard signals to deal with GC operations? Because if none of those 3 signals are involved, the issue can't be signal handlers.

It is true that signals and threads generally do not mix well. Signal handlers are set on the level of the process, not threads.

@jdemeyer

jdemeyer commented Oct 6, 2017

comment:20

I don't think that signal handling has anything to do with this bug here. I don't see any signals being raised in a strace dump and also the error message from ECL says "Stack overflow".

@jdemeyer

jdemeyer commented Oct 6, 2017

comment:21

Wait a minute... the "stack overflow" reminds me of a very similar issue that affected PARI/GP: #17773

@dimpase
Member

dimpase commented Oct 6, 2017

comment:22

Boehm GC (which is used by ECL) installs its own signal handlers in order to be able to scan for garbage. Thus if another thread does something to signals, then GC and thus ECL might go belly up.

"Stack overflow" might be due to GC being given data to work on from another thread it does not know about.

The more I think about it the more inclined I become towards disabling tab completion in a separate thread.

@jdemeyer

jdemeyer commented Oct 6, 2017

comment:23

Replying to @dimpase:

Boehm GC (which is used by ECL) installs its own signal handlers in order to be able to scan for garbage.

I understand what you are saying but I don't believe that this has anything to do with this ticket.

@dimpase
Member

dimpase commented Oct 6, 2017

comment:24

Replying to @jdemeyer:

Replying to @dimpase:

Boehm GC (which is used by ECL) installs its own signal handlers in order to be able to scan for garbage.

I understand what you are saying but I don't believe that this has anything to do with this ticket.

As Nils explains, Maxima is not thread-safe, and thus invoking it from non-main thread (e.g. from the tab-completion one) is prone to errors. Thus invoking ECL from non-main thread does not need to be allowed.
Assuming this, indeed, signals issue has nothing to do with this ticket, at least if limited to ECL/Maxima scope.

@jdemeyer

jdemeyer commented Oct 6, 2017

comment:25

Right. There are two issues:

  1. The signal switching that Sage does for ECL is not thread-safe.

  2. The ECL check for "stack overflow" is broken if run in a different thread.

This ticket is about the second issue. It can be fixed independently.

@jdemeyer

jdemeyer commented Oct 6, 2017

comment:26

It turns out that both issues are actually relevant. After fixing the second issue (in plain IPython, not Sage):

In [1]: import sage.all; from sage.interfaces.maxima_lib import maxima_lib

In [2]: maxima_lib.
Building Maxima command completion list (this takes
a few seconds only the first time you do it).
To force rebuild later, delete /home/jdemeyer/.sage//maxima_commandlist_cache.sobj.
AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPower failure

The Power failure is clearly due to an ECL signal. When fixing also this, it still doesn't work:

Building Maxima command completion list (this takes
a few seconds only the first time you do it).
To force rebuild later, delete /home/jdemeyer/.sage//maxima_commandlist_cache.sobj.
AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoCollecting from unknown thread

So it seems that running ECL only in the main thread is the only solution.
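
A generic way to keep all calls in one thread is to have other threads hand their work over through a queue. The sketch below shows that pattern with the standard library only; `run_in_owner_thread` and `serve` are hypothetical names, and nothing like this is claimed to exist in Sage or IPython. It has no error handling: if `func` raises, the waiting thread is never released.

```python
import queue
import threading

requests = queue.Queue()

def run_in_owner_thread(func):
    # Called from any thread: hand the work to the owning thread and wait.
    done = threading.Event()
    box = {}
    requests.put((func, box, done))
    done.wait()
    return box["result"]

def serve(n):
    # Runs in the owning thread: execute n queued requests, one at a time.
    for _ in range(n):
        func, box, done = requests.get()
        box["result"] = func()
        done.set()

results = []

def completer():
    # Hypothetical completion thread; it never runs the work itself.
    results.append(run_in_owner_thread(threading.current_thread))

owner = threading.current_thread()
t = threading.Thread(target=completer)
t.start()
serve(1)
t.join()
print(results[0] is owner)
```

In an interactive shell the owning thread is usually busy running the prompt, so wiring this in would need cooperation from the event loop.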

@dimpase
Member

dimpase commented Oct 6, 2017

comment:27

Replying to @jdemeyer:

So it seems that running ECL only in the main thread is the only solution.

ECL does have facilities for registering/de-registering threads,
ecl_import_current_thread and ecl_release_current_thread,
only available if you build it with --enable-threads, and with somewhat unclear usage rules. Not even sure if they are compatible with our gc version, or whether they would work at all in our setting - I tried with gc-7.6.0 from #23700, it didn't work - perhaps due to the signals trouble you mention?

IMHO making this work seems to be a tough call, and in particular in the upcoming ECL 16.2 this code (and the signals-handling code) is being changed, so what works for 16.1.2 might break in the next version.

@nbruin
Contributor

nbruin commented Oct 8, 2017

comment:28

Replying to @dimpase:

IMHO making this work seems to be a tough call, and in particular in the upcoming ECL 16.2 this code (and the signals-handling code) is being changed, so what works for 16.1.2 might break in the next version.

In that case: perhaps downgrade from blocker and solve later? The issue is a serious one, but the symptoms seem to be easily avoided (it's a rather specific tab completion).

Concerning threading: at least a while ago, enabling threading in ECL meant that ECL would start up a dedicated signal handling thread, and really start using (very strange!) signals to signal GC events to other threads. That setup looked very hard to make compatible with sage. That's why I think we want to stick with ECL without threading (and hope they keep supporting that! They really should if they want to keep the "embeddable" a serious option, because in many embedding scenarios having the library take control of signal handling in such an invasive way will be very hard to work with.)

@dimpase
Member

dimpase commented Oct 8, 2017

comment:29

I was just pointed to a resolved IPython issue that I had missed, which removes the reliance on prompt_toolkit and brings back the single-threaded behaviour of IPython:
ipython/ipython#10364

There is another reason for avoiding prompt_toolkit - it does multi-threaded importing of Python modules, and given how fragile Sage is in its dependencies handling, this is something to avoid, unless we want more mysterious crashes to happen.

@dimpase
Member

dimpase commented Oct 8, 2017

comment:30

By the way, FriCAS is another (soon to be optional (see #23847), currently experimental) Sage package dependent on ECL in a substantial way.

@nbruin
Contributor

nbruin commented Oct 9, 2017

comment:31

Replying to @dimpase:

By the way, FriCAS is another (soon to be optional (see #23847), currently experimental) Sage package dependent on ECL in a substantial way.

That shouldn't interact with the issue here at all, as long as FriCas runs via a proper expect interface. If people start running FriCas in ecllib, I expect bigger trouble, because I don't expect that one can run maxima and fricas in the same lisp without special measures -- both are legacy applications that were originally designed to have the world (or at least their process) to themselves.

Getting rid of multi-threading in IPython sounds like a very good idea.

@dimpase
Member

dimpase commented Oct 9, 2017

comment:32

Here is one way to try the old new IPython prompt---this gets rid of crashes for me.
This is only an IPython hack; I don't know how to force Sage's IPython to switch to this.
One can do the following (I also removed ~/.sage/ for good measure, not sure if this is needed; also not sure if it really needs 5.5.0, as it also seems to work with the IPython 5.0 that we ship):

$ ./sage --pip install git+https://github.com/ipython/ipython/@5.5.0
$ ./sage --pip install rlipython
$ ./sage --ipython

once at IPython prompt, type

import rlipython; rlipython.install()

and quit. Then IPython will use readline for completion, as in good old days of version 4.x.
To test that this fixes the bug: start IPython as ./sage --ipython, and run

from sage.all import *
from sage.interfaces.maxima_lib import maxima_lib
maxima_lib.

(notice the 1st import---needed to initialise Sage).

@embray
Contributor

embray commented Oct 9, 2017

comment:33

Replying to @nbruin:

Replying to @dimpase:

IMHO making this work seems to be a tough call, and in particular in the upcoming ECL 16.2 this code (and the signals-handling code) is being changed, so what works for 16.1.2 might break in the next version.

In that case: perhaps downgrade from blocker and solve later? The issue is a serious one, but the symptoms seem to be easily avoided (it's a rather specific tab completion).

Concerning threading: at least a while ago, enabling threading in ECL meant that ECL would start up a dedicated signal handling thread, and really start using (very strange!) signals to signal GC events to other threads. That setup looked very hard to make compatible with sage. That's why I think we want to stick with ECL without threading (and hope they keep supporting that! They really should if they want to keep the "embeddable" a serious option, because in many embedding scenarios having the library take control of signal handling in such an invasive way will be very hard to work with.)

One thing I've been investigating--the relevance of which I'm not sure--is that we compile libgc with threading support but ECL without. The implications of this are complicated enough that I don't fully understand yet, but it makes me wonder if this can lead to bugs (this is possibly related to #23973).

@dimpase
Member

dimpase commented Oct 9, 2017

comment:34

Replying to @embray:

Replying to @nbruin:

Replying to @dimpase:

...

Concerning threading: at least a while ago, enabling threading in ECL meant that ECL would start up a dedicated signal handling thread, and really start using (very strange!) signals to signal GC events to other threads. That setup looked very hard to make compatible with sage.

IMHO one takes care of this in ecl.pyx:

    ecl_set_option(ECL_OPT_SIGNAL_HANDLING_THREAD, 0)
    cl_boot(1, argv)

making sure that signals are not handled in a separate thread, no?

That's why I think we want to stick with ECL without threading (and hope they keep supporting that! They really should if they want to keep the "embeddable" a serious option, because in many embedding scenarios having the library take control of signal handling in such an invasive way will be very hard to work with.)

One thing I've been investigating--the relevance of which I'm not sure--is that we compile libgc with threading support but ECL without. The implications of this are complicated enough that I don't fully understand yet, but it makes me wonder if this can lead to bugs (this is possibly related to #23973).

Mind you, I came to this ticket via #23956 via #22679; on the latter I had a lot of trouble with threads (yes, in docbuilding with -jx, x>1 too), until finding out that GC folks have not supplied a complete multithreading interface for FreeBSD---now fixed in ivmai/bdwgc#180
and only then realising that some multithreading-related segfaults happen on Linux too :-)

I am not sure what "multithreading for GC" really means; it can be any combination of 2 things:

1) GC using threads to speed itself up

2) GC properly handles the situation of being initialised/called from a multithreaded application.

IMHO 1) is disabled by --disable-parallel-mark (perhaps only in versions more recent than 7.2f...).

@jdemeyer

comment:35

I don't think that this should be a blocker issue. It's an annoying bug which crashes Sage, but it's not very likely to appear since the TAB completion is cached.

@dimpase
Member

dimpase commented Nov 13, 2017

comment:36

I already mentioned in comment 9 above that even with the TAB cache present, one can get an annoying runtime error; to replicate,

sage: from sage.libs.ecl import <TAB>

and choose something from the list that pops up, then the following input leads to a runtime error.

sage: from sage.interfaces.maxima_lib import *
Collecting from unknown thread
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-5e6d4a068396> in <module>()
----> 1 from sage.interfaces.maxima_lib import *

/home/dima/Sage/sage-dev/local/lib/python2.7/site-packages/sage/interfaces/maxima_lib.py in <module>()
    102 ## i.e. loading it into ECL
    103 ecl_eval("(setf *load-verbose* NIL)")
--> 104 ecl_eval("(require 'maxima)")
    105 ecl_eval("(in-package :maxima)")
    106 ecl_eval("(setq $nolabels t))")

/home/dima/Sage/sage-dev/src/sage/libs/ecl.pyx in sage.libs.ecl.ecl_eval (build/cythonized/sage/libs/ecl.c:10787)()
   1320 
   1321 #convenience routine to more easily evaluate strings
-> 1322 cpdef EclObject ecl_eval(bytes s):
   1323     """
   1324     Read and evaluate string in Lisp and return the result

/home/dima/Sage/sage-dev/src/sage/libs/ecl.pyx in sage.libs.ecl.ecl_eval (build/cythonized/sage/libs/ecl.c:10726)()
   1335     cdef cl_object o
   1336     o=ecl_safe_read_string(s)
-> 1337     o=ecl_safe_eval(o)
   1338     return ecl_wrap(o)
   1339 

/home/dima/Sage/sage-dev/src/sage/libs/ecl.pyx in sage.libs.ecl.ecl_safe_eval (build/cythonized/sage/libs/ecl.c:5716)()
    341     """
    342     cdef cl_object s
--> 343     ecl_sig_on()
    344     cl_funcall(2,safe_eval_clobj,form)
    345     ecl_sig_off()

RuntimeError: Aborted

On the positive side, ipython folks are apparently going to disable tab completion in a separate thread, once a version of prompt_toolkit with the relevant option is released.

@dimpase
Member

dimpase commented Feb 17, 2018

comment:37

It appears that this has nuked GAP pexpect interface, too:
hitting Tab at
sage: gap.
leads to
Warning: this should never happen printed ABOVE the line

sage: gap.

and Sage becomes unresponsive and has to be killed.
(this is with Sage 8.2.beta5)

@jhpalmieri
Member

comment:38

Replying to @dimpase:

It appears that this has nuked GAP pexpect interface, too:
hitting Tab at
sage: gap.
leads to
Warning: this should never happen printed ABOVE the line

sage: gap.

and Sage becomes unresponsive and has to be killed.
(this is with Sage 8.2.beta5)

This doesn't happen to me with Sage 8.2.beta6, or maybe I don't understand the necessary steps. If I run Sage and then immediately run "gap.", it works fine (OS X 10.13.3). Am I missing some aspect of triggering this?

@rwst
Author

rwst commented Mar 1, 2018

comment:39

Confirmed that gap. works on OpenSuSE with beta6. But maxima_lib. as in the ticket description still crashes.

@rwst
Author

rwst commented Mar 1, 2018

comment:40

Moreover if I rm -f ~/.sage/giac_commandlist_cache.sobj then giac.<TAB> does not crash. Probably the peculiar properties of ECL make maxima a special case.

@dimpase
Member

dimpase commented Mar 1, 2018

comment:41

Replying to @rwst:

Confirmed that gap. works on OpenSuSE with beta6. But maxima_lib. as in the ticket description still crashes.

Sorry, it appears that in case of gap. I have been barking up the wrong tree.

@dimpase
Member

dimpase commented Apr 11, 2021

comment:42

all is good in Sage 9.3.rc1

@dimpase dimpase removed this from the sage-8.1 milestone Apr 11, 2021
@dimpase
Member

dimpase commented Apr 11, 2021

Reviewer: Dima Pasechnik
