AsyncResult.wait(0) can hang waiting for the client to get results? #2215

Closed
cgevans opened this issue Jul 27, 2012 · 5 comments · Fixed by #2255

cgevans commented Jul 27, 2012

I have some code that runs multiple maps on a load balanced view. I run the code on a remote ipengine that I'm connected to via ssh; the client is running locally in an ipython notebook. The individual tasks take some time to run, perhaps a minute or two each.

It appears that AsyncResult instances can end up in a situation where all of their outstanding jobs are finished, but the outputs are not actually ready for the client yet. Thus _ready is not set, and somehow wait(0) ends up at self._wait_for_outputs(10) (line 161 of asyncresult.py). This hangs for ten seconds because of the hard-coded ten-second timeout.

As a result, any function which uses wait(0) will hang for ten seconds before returning, including AsyncResult.progress.

While I'm not exactly sure why this is happening, it seems like it can be solved reasonably easily by changing 10 to timeout, thus causing the timeout there to honor the timeout given to wait.
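As a rough illustration of the proposed change, here is a minimal toy model. It is not the IPython code: `ToyAsyncResult`, `relay_timeout`, `hardcoded`, and `timed_wait` are hypothetical names invented for this sketch, and only the names `wait`, `_ready`, and `_wait_for_outputs` come from the report above. It assumes the result has arrived while the outputs have not, and compares a fixed inner timeout with one that honors the timeout passed to `wait`.

```python
import time


class ToyAsyncResult:
    """Hypothetical stand-in for AsyncResult; not the IPython implementation."""

    def __init__(self, outputs_ready=False):
        self._ready = True                   # assumed: the results have arrived
        self._outputs_ready = outputs_ready  # assumed: output messages may lag

    def _wait_for_outputs(self, timeout):
        """Poll until outputs arrive or `timeout` seconds have elapsed."""
        deadline = time.monotonic() + timeout
        while not self._outputs_ready and time.monotonic() < deadline:
            time.sleep(0.01)

    def wait(self, timeout, relay_timeout=True, hardcoded=10):
        if self._ready:
            # relay_timeout=False mimics the fixed inner timeout from the report;
            # relay_timeout=True passes the caller's timeout through instead.
            self._wait_for_outputs(timeout if relay_timeout else hardcoded)
        return self._ready and self._outputs_ready


def timed_wait(ar, **kwargs):
    """Return how long ar.wait(0, ...) blocks, in seconds."""
    start = time.monotonic()
    ar.wait(0, **kwargs)
    return time.monotonic() - start
```

In this toy, `timed_wait(ToyAsyncResult())` returns almost immediately, while `timed_wait(ToyAsyncResult(), relay_timeout=False)` blocks for the full fixed timeout. That matches the ten-second hang described above, under the stated assumptions.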

Member

minrk commented Aug 4, 2012

I don't think it's possible for _wait_for_outputs to be called if _ready is not set, since it is inside an if self._ready: block.

Can you post code to reproduce this? I have seen it a while ago, but can't reproduce it well enough to actually test and find a fix.

I agree that relaying the timeout would be a decent band-aid, but there's a more serious bug causing this that I would like to actually find and fix.

minrk added a commit to minrk/ipython that referenced this issue Aug 4, 2012
requesting metadata (e.g. ar.data or ar.stdout) will result in flushing iopub if the outputs are incomplete, so separate wait(0) need not be called.

This also applies the workaround discussed in ipython#2215

Author

cgevans commented Aug 8, 2012

So here's the problem with reproducing this. I have minimal code right now that will reproduce it in my configuration:

import time

import numpy
from IPython.parallel import Client

rc = Client()
lv = rc.load_balanced_view()

def stupidfunction(r):
    import numpy.random  # imported here so that it is available on the engine
    return numpy.random.randn(100)  # You may need to increase this to see the problem.

# numpy.zeros is spelled out; a bare zeros() only exists in a pylab session.
a = lv.map_async(stupidfunction, numpy.zeros(10))

time.sleep(0.5)  # Run wait too quickly and the problem won't have time to arise.

a.wait(0)

However, this doesn't trigger the problem on a local cluster, or at least not noticeably. I'm running my client at home over SSH through a VPN to the controller at work, which is in turn connected to the engines through fast SSH; the overall link is therefore not incredibly fast. I haven't yet tried it from my office, which has a 100Mb line to the controller but still goes through SSH. If I run this with a local controller and engines, it's fine. This is why I believe the problem lies in transferring results to either the controller or the client, probably the client.

Actually, I've now also checked with the engines running on the same machine as the controller, and the problem still occurs, so it seems that the client-controller connection is the culprit. From what I can tell, self._ready is set to True by self._client.wait, but 'outputs_ready' in self._metadata are still False, which causes the loop in _wait_for_outputs.
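To make that state easier to spot, here is a small diagnostic sketch. It assumes the per-task metadata can be treated as a list of dict-like objects carrying an 'outputs_ready' key, as described above. The helper name `tasks_missing_outputs` and the sample `snapshot` are hypothetical and operate on plain dicts, not on a live client.

```python
def tasks_missing_outputs(metadata):
    """Return the indices of tasks whose 'outputs_ready' flag is still false."""
    return [i for i, md in enumerate(metadata)
            if not md.get("outputs_ready", False)]


# Hypothetical snapshot: all three results have arrived,
# but the outputs of the second task are still in flight.
snapshot = [
    {"outputs_ready": True},
    {"outputs_ready": False},
    {"outputs_ready": True},
]

print(tasks_missing_outputs(snapshot))  # prints [1]
```

A non-empty list while the result itself is already marked ready would correspond to the situation in which the output-waiting loop keeps running.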

Author

cgevans commented Aug 8, 2012

I can also confirm, actually, that this does not happen now that #2255 has been merged, but I'm not sure if that's just because of the timeout changes.

Member

minrk commented Aug 8, 2012

Thanks for the test case. It should behave the same as before if you do wait(10), but I will also try to reproduce the underlying issue myself by rolling back the fix.

You can set client.debug=True to see all messages as they come through.

Member

minrk commented Jul 4, 2013

closed by #2255.

minrk closed this as completed Jul 4, 2013