Trac #21800: Better error handling in sage-cleaner
My `sage-cleaner` instance is randomly killing jobs. Reason:
{{{
Checking PIDs [18654]
Process 18654 is no longer running, so we clean up
Killing 18654's spawned jobs
--> Killing 'gp' with PID 18743 and parent PID 18654
--> Killing 'gp' with PID 18759 and parent PID 18654
--> Killing 'gp' with PID 18841 and parent PID 18654
--> Killing 'gp' with PID 18851 and parent PID 18654
--> Killing 'gp' with PID 18868 and parent PID 18654
--> Killing 'gp' with PID 18878 and parent PID 18654
--> Killing 'gp' with PID 18982 and parent PID 18654
--> Killing 'gp' with PID 19333 and parent PID 18654
Exception while cleaning up PID 18654:
Traceback (most recent call last):
  File "/usr/local/src/sage-config/src/bin/sage-cleaner", line 94, in cleanup
    or kill_spawned_jobs(spawned_processes, parent_pid):
  File "/usr/local/src/sage-config/src/bin/sage-cleaner", line 106, in kill_spawned_jobs
    pid, cmd = job.strip().split(' ', 1)
ValueError: need more than 1 value to unpack
}}}

Probably the `jobfile` got corrupted somehow and we need to handle this
gracefully.
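
For illustration, the unpacking failure in the traceback can be reproduced on its own. This is only a sketch, assuming a jobfile line that lacks the space separating the PID from the command (for example a blank or truncated line). The actual contents of the corrupted file are not known, and the helper name `split_job_line` is hypothetical, not part of `sage-cleaner`.

```python
def split_job_line(job):
    """Split a jobfile line the way the old code did (hypothetical helper)."""
    return job.strip().split(' ', 1)

# A well-formed line yields two fields: the PID and the command.
good = split_job_line("18743 gp\n")

# A blank line yields a single field, so unpacking it into
# `pid, cmd` raises ValueError.
bad = split_job_line("\n")

try:
    pid, cmd = bad
    unpack_failed = False
except ValueError:
    unpack_failed = True
```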

URL: https://trac.sagemath.org/21800
Reported by: jdemeyer
Ticket author(s): Jeroen Demeyer
Reviewer(s): Frédéric Chapoton
Release Manager authored and vbraun committed Nov 9, 2016
2 parents e07cea8 + 7940870 commit 2fc6073
Showing 1 changed file with 7 additions and 3 deletions.
10 changes: 7 additions & 3 deletions src/bin/sage-cleaner
@@ -103,9 +103,13 @@ def kill_spawned_jobs(jobfile, parent_pid):
     logger.info("Killing %s's spawned jobs", parent_pid)
     killed_them_all = True
     for job in open(jobfile).readlines():
-        pid, cmd = job.strip().split(' ', 1)
-        logger.info("--> Killing '%s' with PID %s and parent PID %s", cmd, pid, parent_pid)
-        pid = int(pid)
+        try:
+            pid, cmd = job.strip().split(' ', 1)
+            pid = int(pid)
+            logger.info("--> Killing %r with PID %s and parent PID %s", cmd, pid, parent_pid)
+        except Exception:
+            logger.error("Exception while processing job %r from %s, ignoring", job, jobfile)
+            continue
         try:
             pgrp = os.getpgid(pid)
             logger.info("--> Killing process group %s", pgrp)
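
The pattern in this change is to parse each line defensively, log what cannot be parsed, and move on to the next line. It can be sketched in isolation. This is a minimal standalone sketch, not the code from `sage-cleaner`: the function name `parse_jobs`, the logger name, and the sample input are assumptions, and it catches `ValueError` specifically where the patch catches `Exception`.

```python
import logging

logger = logging.getLogger("jobfile-sketch")  # hypothetical logger name

def parse_jobs(lines, source="<jobfile>"):
    """Return (pid, cmd) pairs, skipping lines that cannot be parsed."""
    jobs = []
    for job in lines:
        try:
            # Fails with ValueError if the line has no space separator ...
            pid, cmd = job.strip().split(' ', 1)
            # ... or if the PID field is not an integer.
            pid = int(pid)
        except ValueError:
            logger.error("Cannot parse job %r from %s, ignoring", job, source)
            continue
        jobs.append((pid, cmd))
    return jobs

# Made-up input mixing valid lines with three kinds of malformed ones.
sample = ["18743 gp\n", "\n", "garbage\n", "abc gp\n", "18759 gp --quiet\n"]
parsed = parse_jobs(sample)
```

With this approach one bad line costs a log entry rather than aborting the cleanup of the remaining jobs.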