
resubmitted tasks sometimes stuck as pending #450

Closed
minrk opened this Issue May 17, 2011 · 2 comments

3 participants

@minrk
IPython member
minrk commented May 17, 2011

@kaazoo reported the following issue, which occurs under some circumstances when tasks are resubmitted: one task is not queued properly, so it stays listed as 'pending'.

The branch where this was reported (#413) has been merged (or will be this afternoon), so I'm opening a new issue here to track its resolution.

quoth @kaazoo (#413 (comment)):

I did some testing with my wrapper methods to create a DrQueue job (IPython session) and some tasks. See https://github.com/kaazoo/DrQueueIPython/blob/master/DrQueue/client.py for details.

First step: create tasks of job

python2.6 sendjob_ipython.py -s 1 -e 5 -b 1 -r blender -f /usr/local/drqueue/tmp/icetest.blend -n "job032" -o "{'rendertype':'animation'}" --owner "foobar"

Have a look at them. They are pending:

python2.6 listjobs_ipython.py
Tasks of job job032:
msg_id                                 status    owner       completed at
0a348798-77ce-480e-a264-a726aa8d3c37   pending   foobar      2011-05-05 23:35:47
5e60e1ed-c531-4477-80e5-0ae7c760cc57   pending   foobar      2011-05-05 23:35:47
01915017-fcee-4b28-8e90-e50a364e8f96   pending   foobar      2011-05-05 23:35:47
f2158540-c58a-44b7-8e81-e47d6e828ece   pending   foobar      2011-05-05 23:35:47
4358e073-5641-49e4-b273-b58ed39e3d00   pending   foobar      2011-05-05 23:35:47

Wait a while. Now they are completed:

python2.6 listjobs_ipython.py
Tasks of job job032:
msg_id                                 status    owner       completed at
0a348798-77ce-480e-a264-a726aa8d3c37   ok        foobar      2011-05-05 23:42:19
5e60e1ed-c531-4477-80e5-0ae7c760cc57   ok        foobar      2011-05-05 23:42:19
01915017-fcee-4b28-8e90-e50a364e8f96   ok        foobar      2011-05-05 23:42:45
f2158540-c58a-44b7-8e81-e47d6e828ece   ok        foobar      2011-05-05 23:42:45
4358e073-5641-49e4-b273-b58ed39e3d00   ok        foobar      2011-05-05 23:42:58

Second step: requeue all tasks of job

python2.6 controljob_ipython.py -r -n job032
requeuing 0a348798-77ce-480e-a264-a726aa8d3c37
requeuing 5e60e1ed-c531-4477-80e5-0ae7c760cc57
requeuing 01915017-fcee-4b28-8e90-e50a364e8f96
requeuing f2158540-c58a-44b7-8e81-e47d6e828ece
requeuing 4358e073-5641-49e4-b273-b58ed39e3d00
Job job032 is running another time.

Have a look again. They are pending:

python2.6 listjobs_ipython.py
Tasks of job job032:
msg_id                                 status    owner       completed at
0a348798-77ce-480e-a264-a726aa8d3c37   pending   foobar      2011-05-05 23:35:47
5e60e1ed-c531-4477-80e5-0ae7c760cc57   pending   foobar      2011-05-05 23:35:47
01915017-fcee-4b28-8e90-e50a364e8f96   pending   foobar      2011-05-05 23:35:47
f2158540-c58a-44b7-8e81-e47d6e828ece   pending   foobar      2011-05-05 23:35:47
4358e073-5641-49e4-b273-b58ed39e3d00   pending   foobar      2011-05-05 23:35:47

Wait a while. Hmm, one task hasn't finished, but the engines are idle:

python2.6 listjobs_ipython.py
Tasks of job job032:
msg_id                                 status    owner       completed at
0a348798-77ce-480e-a264-a726aa8d3c37   ok        foobar      2011-05-05 23:45:33
5e60e1ed-c531-4477-80e5-0ae7c760cc57   ok        foobar      2011-05-05 23:45:33
01915017-fcee-4b28-8e90-e50a364e8f96   ok        foobar      2011-05-05 23:45:58
f2158540-c58a-44b7-8e81-e47d6e828ece   ok        foobar      2011-05-05 23:45:58
4358e073-5641-49e4-b273-b58ed39e3d00   pending   foobar      2011-05-05 23:45:58

Third step: requeue again

python2.6 controljob_ipython.py -r -n job032
requeuing 0a348798-77ce-480e-a264-a726aa8d3c37
requeuing 5e60e1ed-c531-4477-80e5-0ae7c760cc57
requeuing 01915017-fcee-4b28-8e90-e50a364e8f96
requeuing f2158540-c58a-44b7-8e81-e47d6e828ece
Traceback (most recent call last):
  File "controljob_ipython.py", line 62, in <module>
    main()
  File "controljob_ipython.py", line 53, in main
    client.job_rerun(options.name)
  File "/Users/kaazoo/Documents/Entwicklung/drqueue-entwicklung/drqueue-zmq/DrQueue/client.py", line 217, in job_rerun
    self.task_requeue(task['msg_id'])
  File "/Users/kaazoo/Documents/Entwicklung/drqueue-entwicklung/drqueue-zmq/DrQueue/client.py", line 198, in task_requeue
    self.ip_client.resubmit(task_id)
  File "<string>", line 2, in resubmit
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/IPython/parallel/client/client.py", line 48, in spin_first
    return f(self, *args, **kwargs)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/IPython/parallel/client/client.py", line 1098, in resubmit
    raise self._unwrap_exception(content)
IPython.parallel.error.RemoteError: ValueError(Task u'4358e073-5641-49e4-b273-b58ed39e3d00' appears to be inflight)
Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/IPython/parallel/controller/hub.py", line 1133, in resubmit_task
    raise ValueError("Task %r appears to be inflight"%(msg_id))
ValueError: Task u'4358e073-5641-49e4-b273-b58ed39e3d00' appears to be inflight

What could be the cause of this? There's a pending task that can't be run by an engine and can't be requeued.
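
For reference, a client-side requeue loop like the one in the traceback can be written so that it continues past a task the hub rejects, rather than aborting on the first error. The following is only a minimal sketch, not the DrQueue or IPython implementation: `requeue_all` and `fake_resubmit` are hypothetical names, and `fake_resubmit` merely imitates the hub's "appears to be inflight" error so the example runs without a cluster. With a real client, one would presumably pass its `resubmit` method (the call shown in the traceback) instead. This does not address why the task is stuck; it only reports which tasks could not be requeued.

```python
def requeue_all(resubmit, msg_ids):
    """Call resubmit(msg_id) for each id and collect failures instead of aborting.

    `resubmit` is any callable taking a msg_id (hypothetical helper signature).
    Returns (requeued_ids, {failed_id: error_message}).
    """
    requeued, failed = [], {}
    for msg_id in msg_ids:
        try:
            resubmit(msg_id)
        except Exception as exc:  # e.g. the RemoteError seen in the traceback
            failed[msg_id] = str(exc)
        else:
            requeued.append(msg_id)
    return requeued, failed


# Stand-in for the hub's check, for illustration only (assumption, not real API).
STUCK = {"4358e073-5641-49e4-b273-b58ed39e3d00"}


def fake_resubmit(msg_id):
    if msg_id in STUCK:
        raise ValueError("Task %r appears to be inflight" % msg_id)


ids = [
    "0a348798-77ce-480e-a264-a726aa8d3c37",
    "5e60e1ed-c531-4477-80e5-0ae7c760cc57",
    "4358e073-5641-49e4-b273-b58ed39e3d00",
]
requeued, failed = requeue_all(fake_resubmit, ids)
for msg_id in requeued:
    print("requeued", msg_id)
for msg_id, reason in failed.items():
    print("could not requeue", msg_id, "-", reason)
```

Collecting the failures keeps the remaining tasks moving and leaves a list of msg_ids to inspect on the controller side.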

@minrk minrk was assigned May 17, 2011
@fperez
IPython member
fperez commented May 25, 2011

Just commenting so I'm in the loop, because this seems pretty serious for real production work... I don't have any ideas right now though...

@kaazoo
kaazoo commented May 26, 2011

I tried it again today after pulling from https://github.com/ipython/ipython.git, which already had your latest commits related to this topic. The error described above no longer seems to occur. Thanks.

@minrk minrk closed this Jun 12, 2011