Fix sage-cleaner #14055
Comments
comment:1
I am working on a patch. |
Author: Jeroen Demeyer |
comment:4
In the line
would it make sense to use (Is there any good reason for having both directories |
comment:5
THERE ARE STILL SOME INSTANCES OF UPPERCASE SAGE. I don't like the mixture of tmp, temp, TMP and TEMP or whatever.
"Deleting I don't like Command line option handling is certainly suboptimal. ( Likewise, if |
Changed keywords from none to orphans |
Reviewer: Punarbasu Purkayastha, Leif Leonhardy |
comment:6
Ooops. |
Changed reviewer from Punarbasu Purkayastha, Leif Leonhardy to John Palmieri, Leif Leonhardy |
comment:7
Leif: the subject of this ticket is not "Fix all possible issues in sage-cleaner", the goal is to just make it work. |
comment:8
Replying to @jhpalmieri:
Absolutely not, but at least the directory used in |
comment:9
I am having a problem with this patch, but I don't know why. I am repeatedly running
Without the patch, the file Also, if I delete the directory On the bright side, it seems to be cleaning out the appropriate files, directories, and processes. |
comment:10
Maybe the problem is that
diff --git a/sage-cleaner b/sage-cleaner
--- a/sage-cleaner
+++ b/sage-cleaner
@@ -115,6 +115,7 @@
     else:
         wait = 10
+    time.sleep(wait)
     # Initial cleanup, ignore time
     running_sages = cleanup()
     cleanup_time = 0.0
I don't know if the initial sleep should be for |
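To make the effect of the proposed initial sleep concrete, here is a minimal sketch of a cleaner loop that waits once before its first cleanup pass. It is not the actual sage-cleaner code: run_cleaner, its parameters, and the max_rounds cap are hypothetical names introduced only for illustration, and cleanup is assumed to return the number of Sage processes still running.

```python
import time


def run_cleaner(cleanup, wait=10, sleep=time.sleep, max_rounds=3):
    """Hypothetical cleaner loop (not the real sage-cleaner).

    Sleeps once *before* the initial cleanup, so a freshly started
    process has time to register itself, then keeps cleaning while
    `cleanup()` reports running processes.
    """
    sleep(wait)              # the delay added by the diff above
    running = cleanup()      # initial cleanup
    rounds = 1
    while running and rounds < max_rounds:
        sleep(wait)
        running = cleanup()
        rounds += 1
    return rounds
```

Passing sleep as a parameter is only a convenience so that the ordering can be checked without actually waiting.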
comment:11
Replying to @jdemeyer:
Where is |
comment:12
I've applied this patch to SAGE_ROOT/local/bin, ran "make ptestlong" and still got two ecl processes eating RAM when tests were finished without errors. |
comment:13
jhpalmieri: I can totally reproduce problems with |
comment:14
If you look at the output from the doctest command, I find it suspicious that the message "sage-cleaner is finished" appears before any doctests are run. |
comment:15
Replying to @jhpalmieri:
Sure, but I don't see how that could lead to doctest failures. It could be that your problem is also an instance of #14323, let's first fix that. |
comment:16
Replying to @jhpalmieri:
I wouldn't rely on some wall time. Unfortunately the cleaner cannot If there's already another Sage cleaner instance running, it could of course exit immediately (modulo the problem mentioned before). |
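One common way to let a second cleaner instance exit immediately is an advisory lock on a well-known file. The sketch below illustrates that general technique only; it is an assumption, not how sage-cleaner detects a running instance, and try_acquire_lock and the lock path are hypothetical.

```python
import fcntl
import os


def try_acquire_lock(path):
    """Return a file descriptor holding an exclusive lock on `path`,
    or None if another holder already has it (Unix only).

    Hypothetical helper: the lock is released when the descriptor is
    closed or the process exits.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB makes flock fail at once instead of waiting.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd
```

A cleaner using this would call try_acquire_lock at startup and exit if it returns None. Because the kernel drops the lock when the holder dies, a crashed cleaner does not leave a stale lock behind.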
comment:17
I also get the same problem with |
comment:18
In case it helps, I can duplicate this problem on bsd.math.washington.edu. |
comment:19
John: please check whether the new spkg at #14323 fixes your problem. |
comment:20
Replying to @jhpalmieri:
I can indeed reproduce the problem and #14323 doesn't help. |
comment:21
On boxen, with the spkg from #14323, I see that the issue from that ticket seems to be fixed, but then I can reproduce the issue here. |
comment:22
I found the problem with |
Changed merged from sage-5.10.beta1 to none |
comment:92
I am still seeing the |
comment:93
I've been running repeated doctests and the only thing that I see on taurus is an occasional failure (about 5% of doctest runs) with the maxima pty interface. The trouble always starts with the assertion failure where input != echo:
And that has nothing to do with this ticket... |
comment:94
For the record, this is the comparison that fails:
|
Attachment: trac_14055_maxima_debug.patch.gz Initial patch |
comment:95
I don't know why there are random spaces sprinkled in, but it certainly doesn't have anything to do with this ticket. I haven't found any other failures on taurus. |
comment:96
Volker, I don't see how your patch would affect the error reported in [comment:82]. |
comment:97
What I'm saying is, I could not reproduce (on taurus) the error reported in comment:82. It might be a miscompile or a transient hardware error as far as I can tell. |
comment:98
See also https://groups.google.com/d/msg/sage-release/07skjCnmRCI/Oyxm1EIquMEJ |
comment:99
Running with
|
strace of failed ecl startup (from taurus) |
comment:100
Attachment: log.10693.gz
I've attached an strace from a failed ECL startup on taurus. It seems that during dlopen of ECL an unrelated thread (id 16046) happens to finish, which causes a SIGCHLD. If the ECL signal handler is not turned off during init then that would explain things going south. Nils, since you wrote it maybe you have an opinion on that? |
comment:101
The last time I looked at it, ECL as a library with multithread support is rather fundamentally incompatible with the way we use it. As far as I know, if thread support is enabled then ECL expects one (dedicated?) thread to handle asynchronous signals, and I think that thread may even be split off when there's otherwise only one thread running. I don't see how to reconcile that with our own signal management. The solution up to now has been to only use ECL in single-threaded mode, which may mean building it without thread support.
I know ECL has been moving towards multithreadedness. However, as an "embeddable" Lisp there's a good argument they should also keep a strictly single-threaded version as an option, and I hope they're continuing with that (perhaps sending a message that we really appreciate single-threadedness might help).
Our signal switching code has been a little behind the times for a while already. Some improvements were suggested here: http://comments.gmane.org/gmane.lisp.ecl.general/8211 That code might be outdated already as well.
What is the hypothesis here, by the way? That an event triggered by sage-cleaner produces a signal meant for our own signal handler that happens to end up with the ECL signal handler? If that's the problem I see no way around it. Then we just need to write our own signal handler to always be in control, keeping flags on whether some signals might have to be dispatched to ECL, and do so for signals we deem appropriate for that. That's major surgery (I've actually been surprised we were able to avoid that for so long and I hope we can continue).
I guess normally, ECL expects SIGCHLD to be ignored. We apparently set a different mask? Changing to ECL's signal mask would include ignoring a SIGCHLD if it happens to be in control when the signal arrives. Can we afford to? Do we already try to do that? In that case, is the problem that the signal arrives in a small window where we've switched handlers but not masks?
Then we should be a little more careful about the switch, i.e., perhaps disable signals during the switch completely. |
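The idea of disabling signals during the handler switch can be sketched in Python with the standard signal module. This is only an illustration of the pattern: the real switch happens in C inside the ECL interface, and signals_blocked and switch_handler are hypothetical helper names.

```python
import signal
from contextlib import contextmanager


@contextmanager
def signals_blocked(signums):
    """Block the given signals for the duration of the with-block (Unix only)."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, signums)
    try:
        yield
    finally:
        # Restoring the old mask lets a signal that arrived in the
        # meantime be delivered now, to whichever handler is installed.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


def switch_handler(signum, new_handler):
    """Install new_handler for signum while signum is blocked.

    Returns the previously installed handler. Must be called from
    the main thread, as required by signal.signal.
    """
    with signals_blocked({signum}):
        return signal.signal(signum, new_handler)
```

With the signal blocked, a SIGCHLD that arrives mid-switch stays pending instead of reaching a half-configured handler, which closes the small window described above.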
comment:102
For the record, we build ECL with I don't think the sage-cleaner has anything to do with the problem. It just slightly changes timings so that we happen to trigger a rather unlikely race on taurus/skynet. It certainly should not send any signals except for term/kill. The new doctest framework seems to be the only thing that uses SIGCHLD in the Sage library. |
comment:103
Replying to @vbraun:
OK, good.
So perhaps this should go on a different ticket then?
I took a quick look at unixint.d and it seems ECL does install a handler for SIGCHLD. Perhaps you're just unlucky that the SIGCHLD gets delivered at a very inopportune moment when ECL hasn't quite finished setting up its handlers? Or are we just sloppy in how we change handlers? The handler they use is defined in unixsys.d as |
comment:104
Switching off ECL_OPT_TRAP_SIGCHLD seems to fix it (knock on wood). 50x testing linear_algebra.rst on taurus works with the new spkg (you need to recompile maxima after ecl). |
diff for review only: ecl-12.12.1.p2 -> ecl-12.12.1.p3 |
comment:105
Attachment: ecl-p3.diff.gz
Volker, if this works, it's a very nice piece of debugging. Testing right away... |
comment:107
I ran 500 iterations for |
Merged: sage-5.10.beta2 |
sage-cleaner got completely broken due to incompatible filenames between sage-cleaner and the Sage library.
Apply:
local/bin.
local/bin.
Update ECL spkg with http://boxen.math.washington.edu/home/vbraun/spkg/ecl-12.12.1.p3.spkg
CC: @ppurka @nbruin
Component: scripts
Author: Jeroen Demeyer, Volker Braun
Reviewer: John Palmieri, Leif Leonhardy, Volker Braun
Merged: sage-5.10.beta2
Issue created by migration from https://trac.sagemath.org/ticket/14055