hang in parallel.Client when using SSHAgent #4717
Comments
Note: Discovered this bug while working with @slinderman, mentioning here so he gets updates.
It shouldn't wait for the sleep to finish, the
I'm using ssh-agent, yes. After doing some debugging, it seems that
Sure, it's just never come up before. If you can figure out a workaround before I do, a PR would be welcome. It may be a while before I can create an environment that fails in the right way.
This is due to a bug in OpenSSH. First workaround: add
`ssh -f` is [broken in OpenSSH](https://bugzilla.mindrot.org/show_bug.cgi?id=1948) when a ControlMaster multiplexed connection is used. `-S none` disables the multiplexed connection. closes ipython#4717
I opened #4720 with the workaround I mentioned, but definitely feel free to submit a better fix if you find one.
It would also be helpful if you could confirm that #4720 actually fixes the issue for you, just in case I am wrong and/or my tests have not been sufficient.
never use ssh multiplexer in tunnels
`ssh -f` is [broken in OpenSSH](https://bugzilla.mindrot.org/show_bug.cgi?id=1948) when a ControlMaster multiplexed connection is used. `-S none` disables the multiplexed connection. closes #4717
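To make the `-S none` workaround concrete, here is a minimal sketch of how a tunnel command could be assembled with multiplexing disabled. The function name `build_tunnel_command` and its parameters are hypothetical and are not taken from `tunnel.py`; only the `ssh` flags `-f`, `-S none`, and `-L` are real OpenSSH options.

```python
def build_tunnel_command(lport, rport, server, remoteip="127.0.0.1",
                         timeout=60, disable_multiplexing=True):
    """Return an argv list for a backgrounded SSH port forward.

    Hypothetical helper for illustration; not the code in tunnel.py.
    """
    # -f: ssh goes to the background just before running the remote command
    cmd = ["ssh", "-f"]
    if disable_multiplexing:
        # -S none: do not share an existing ControlMaster connection
        cmd += ["-S", "none"]
    cmd += [
        # forward local port lport to remoteip:rport as seen from the server
        "-L", "127.0.0.1:%i:%s:%i" % (lport, remoteip, rport),
        server,
        # the remote command that keeps the tunnel open
        "sleep", str(timeout),
    ]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_tunnel_command(10101, 10101, "user@cluster.example.org")))
```

Passing the command as a list (for example to `subprocess`) avoids shell quoting problems with the server string; whether the real code does so is not something this thread confirms.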
TL;DR: When IPython Parallel connects to a controller over OpenSSH, it blocks on a `sleep` command whose duration cannot be customized (and whose use is questionable), so a connection can take minutes when only seconds are needed.
In order for `IPython.external.ssh.tunnel` to detect whether to ask the user interactively for an SSH password (using getpass), it issues a `sleep %i` command over SSH, and waits for it either to end or to ask for a password. This code is at https://github.com/ipython/ipython/blob/master/IPython/external/ssh/tunnel.py#L218 in the function `openssh_tunnel`. Somewhat bizarrely, instead of using the timeout argument as the total amount of time to wait for the spawned SSH instance to become responsive, it uses this argument as the amount of time for the sleep command, the `%i` mentioned above.

While there are almost certainly better ways to solve this, it would not be a problem if a timeout argument was actually provided to `openssh_tunnel`. However, when a Client is created in https://github.com/ipython/ipython/blob/master/IPython/parallel/client/client.py#L462, no timeout is passed, either as a kwarg or otherwise. And the default argument is 60 seconds (!). Additionally, Client creation creates multiple such tunnels in series, one for each type of communication it needs.

Therefore, in the case where your controller's server only accepts keypair-based logins, which is the case for many academic clusters, it can spend minutes blocking on absolutely nothing, just to detect that a password wasn't needed, when this lack-of-a-password could have been specified (indeed, elsewhere in the code there are special cases for `Client(..., password=False)` that could be reused here!).

My workaround at the moment is monkey-patching `tunnel.py` to have a different default argument.

I could put together a simple pull request which passes the user-specified timeout as an argument to `sleep`, but that seems to be a bandage over the larger problem, which is that this loop to wait for SSH to finish is even used regardless of whether the user wants to specify a password! Any thoughts on what the correct course of action would be?
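The monkey-patch itself is not shown in this thread. As a rough sketch of the idea only, the following replaces a function's default `timeout` by wrapping it. The `openssh_tunnel` below is a hypothetical stand-in that just returns the command string it would run, and `tunnel` is a stand-in namespace rather than the real module; the sketch also assumes callers pass `timeout` by keyword or not at all, which matches the Client behaviour described above.

```python
import functools
import types


def openssh_tunnel(lport, rport, server, timeout=60):
    """Hypothetical stand-in: returns the command it would run."""
    return "ssh -f -L 127.0.0.1:%i:127.0.0.1:%i %s sleep %i" % (
        lport, rport, server, timeout)


# Stand-in for the imported tunnel module (an assumption for this sketch).
tunnel = types.SimpleNamespace(openssh_tunnel=openssh_tunnel)


def patch_default_timeout(module, seconds):
    """Wrap module.openssh_tunnel so an omitted timeout becomes `seconds`."""
    original = module.openssh_tunnel

    @functools.wraps(original)
    def patched(*args, **kwargs):
        # Only fill in the timeout when the caller did not supply one.
        kwargs.setdefault("timeout", seconds)
        return original(*args, **kwargs)

    module.openssh_tunnel = patched
    return original  # returned so the patch can be undone


original = patch_default_timeout(tunnel, 5)
```

With the patch applied, `tunnel.openssh_tunnel(10101, 10101, "user@host")` uses a 5-second sleep, while an explicit `timeout=30` is still honoured. This shortens the wait but does not address the underlying design question raised above.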