hang in parallel.Client when using SSHAgent #4717

Closed
bpartridge opened this issue Dec 19, 2013 · 7 comments · Fixed by #4720

@bpartridge

TL;DR: When IPython Parallel connects to a controller over OpenSSH, it blocks on a non-customizable (and questionably used) "sleep" command, so connecting can take minutes when only seconds are needed.

In order for IPython.external.ssh.tunnel to detect whether to ask the user interactively for an SSH password (using getpass), it issues a sleep %i command over SSH, and waits for it either to end or to ask for a password. This code is at https://github.com/ipython/ipython/blob/master/IPython/external/ssh/tunnel.py#L218 in the function openssh_tunnel. Somewhat bizarrely, instead of using the timeout argument as the total amount of time to wait for the spawned SSH instance to become responsive, it uses this argument as the amount of time for the sleep command, the %i mentioned above.
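
For illustration, the command being issued has roughly the shape below. This is a minimal sketch, not the actual code in tunnel.py: build_tunnel_command, the host name, and the ports are hypothetical, and the real function may assemble its command differently.

```python
def build_tunnel_command(lport, remoteip, rport, server, timeout=60):
    """Return the argv for a backgrounded OpenSSH port forward.

    Hypothetical helper for illustration only; not the code in tunnel.py.
    """
    return [
        "ssh",
        "-f",  # ask ssh to go to the background just before running the command
        "-L", "127.0.0.1:%i:%s:%i" % (lport, remoteip, rport),
        server,
        "sleep", str(timeout),  # the timeout ends up as the sleep duration
    ]


# Example host and ports are made up.
cmd = build_tunnel_command(5555, "10.0.0.5", 5555, "user@cluster.example.org")
print(" ".join(cmd))
```

The point is only that timeout is forwarded to the remote sleep rather than bounding how long the local call waits.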

While there are almost certainly better ways to solve this, it would not be a problem if a timeout argument were actually provided to openssh_tunnel. However, when a Client is created in https://github.com/ipython/ipython/blob/master/IPython/parallel/client/client.py#L462, no timeout is passed, either as a kwarg or otherwise, so the default of 60 seconds (!) applies. Additionally, Client creation creates multiple such tunnels in series, one for each type of communication it needs.

Therefore, when your controller's server only accepts keypair-based logins, which is the case for many academic clusters, the client can spend minutes blocking on absolutely nothing, just to detect that a password wasn't needed. This lack of a password could have been specified up front (indeed, elsewhere in the code there are special cases for Client(..., password=False) that could be reused here!).

My workaround at the moment is monkey-patching tunnel.py to have a different default argument:

from IPython.external.ssh import tunnel
from IPython.external.ssh.tunnel import open_tunnel

def short_tunnel_connection(socket, addr, server, keyfile=None, password=None, paramiko=None, timeout=1):
    """Connect a socket to an address via an ssh tunnel.

    This is a wrapper for socket.connect(addr), when addr is not accessible
    from the local machine.  It simply creates an ssh tunnel using the remaining args,
    and calls socket.connect('tcp://localhost:lport') where lport is the randomly
    selected local port of the tunnel.

    """
    new_url, tunnel = open_tunnel(addr, server, keyfile=keyfile, password=password, paramiko=paramiko, timeout=timeout)
    socket.connect(new_url)
    return tunnel

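# Replace the module-level function so that later lookups of
# tunnel.tunnel_connection pick up the shorter default timeout.
tunnel.tunnel_connection = short_tunnel_connection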

I could put together a simple pull request which passes the user-specified timeout as an argument to sleep, but that seems to be a bandage over the larger problem, which is that the loop waiting for SSH to finish runs regardless of whether the user wants to specify a password! Any thoughts on what the correct course of action would be?

@bpartridge
Author

Note: Discovered this bug while working with @slinderman, mentioning here so he gets updates.

@minrk
Member

minrk commented Dec 19, 2013

It shouldn't wait for the sleep to finish; the -f flag should send the process into the background and return as soon as a connection is successfully established. The sleep is actually there as a timeout for initial traffic over the tunnel, to prevent SSH from closing the tunnel prematurely, and also to keep it from being left open indefinitely. It should have no effect on the actual responsiveness of the openssh_tunnel function. I suspect there is some SSH config that is causing -f to be ignored. Are you using ssh-agent, or any other more advanced ssh config?

@bpartridge
Author

I'm using ssh-agent, yes. After doing some debugging, it seems that ssh -f server fails if I use ControlMaster auto and already have a connection open to server (as I almost always do). In that case, ssh -f server sleep 60 reports "auto-mux: Trying existing master" and then waits for the command to complete. If this is expected behavior for ControlMaster, then shouldn't openssh_tunnel be detecting that and working around it, or at least showing a message to the user?
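
One possible way to detect this situation, sketched under assumptions: OpenSSH's `ssh -O check <host>` asks an existing master process whether it is running and exits with status 0 if so. The helper name has_control_master and the injectable run parameter are hypothetical, and nothing here is taken from tunnel.py.

```python
import subprocess


def has_control_master(server, run=subprocess.run):
    """Return True if `ssh -O check` reports a live master for `server`.

    Hypothetical helper. `run` is injectable so the logic can be
    exercised without a real ssh binary or network access.
    """
    result = run(
        ["ssh", "-O", "check", server],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # ssh exits non-zero when no master is reachable or no ControlPath is set.
    return result.returncode == 0
```

A caller could use something like has_control_master("server") to decide whether to warn the user or to bypass multiplexing before opening the tunnel.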

@minrk
Member

minrk commented Dec 19, 2013

Sure, it's just never come up before. If you can figure out a workaround before I do, a PR would be welcome. It may be a while before I can create an environment that fails in the right way.

@minrk
Member

minrk commented Dec 19, 2013

This is due to a bug in OpenSSH.

First workaround: add -S none to disable using the multiplexer for tunnels. It has obvious disadvantages, but I can't think of a better way right now. Obviously, the ideal action to take would be to just add the tunnel to the already existing connection, but I can't see a way to do that. What do you think?
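
As a rough sketch of what that workaround amounts to (not the actual patch): `-S none` is the OpenSSH option that disables connection sharing for a single invocation. The helper name without_multiplexing and the example command are hypothetical.

```python
def without_multiplexing(ssh_argv):
    """Return a copy of an ssh argv with `-S none` right after the program name.

    Hypothetical helper for illustration; `-S none` tells OpenSSH not to
    use a ControlMaster socket for this invocation.
    """
    return [ssh_argv[0], "-S", "none"] + list(ssh_argv[1:])


# Example host and ports are made up.
original = [
    "ssh", "-f",
    "-L", "127.0.0.1:5555:10.0.0.5:5555",
    "user@cluster.example.org",
    "sleep", "60",
]
patched = without_multiplexing(original)
print(" ".join(patched))
```

The trade-off is the one noted above: the tunnel opens its own connection instead of reusing the existing one.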

minrk added a commit to minrk/ipython that referenced this issue Dec 19, 2013
`ssh -f` is [broken in OpenSSH](https://bugzilla.mindrot.org/show_bug.cgi?id=1948)
when a ControlMaster multiplexed connection is used.

`-S none` disables the multiplexed connection.

closes ipython#4717
@minrk
Member

minrk commented Dec 19, 2013

I opened #4720 with the workaround I mentioned, but definitely feel free to submit a better fix if you find one.

@minrk
Member

minrk commented Dec 19, 2013

It would also be helpful if you could confirm that #4720 actually fixes the issue for you, just in case I am wrong and/or my tests have not been sufficient.

minrk added a commit that referenced this issue Dec 29, 2013
never use ssh multiplexer in tunnels

`ssh -f` is [broken in OpenSSH](https://bugzilla.mindrot.org/show_bug.cgi?id=1948) when a ControlMaster multiplexed connection is used.

`-S none` disables the multiplexed connection.

closes #4717
minrk added a commit that referenced this issue Jan 28, 2014
`ssh -f` is [broken in OpenSSH](https://bugzilla.mindrot.org/show_bug.cgi?id=1948) when a ControlMaster multiplexed connection is used.

`-S none` disables the multiplexed connection.

closes #4717