
SSH agent forwarding error #562

Closed
jravetch opened this issue Feb 16, 2012 · 33 comments

@jravetch

I enabled ssh agent forwarding in fab 1.4 with env.forward_agent = 'True' and all seems fine, except I receive an odd error during code shipment:

out: Authentication response too long: 3577407571
out: fatal: The remote end hung up unexpectedly

Fab does properly forward the agent when pulling code from GitHub, so I'm not sure what's going on.

Before fab 1.4, I used this function to get around the key forwarding issue:

from fabric.api import env, local

def sshagent_run(cmd):
    # wrap the command in a local "ssh -A" call so the local agent is forwarded
    local('ssh -A %s@%s "%s"' % (env.user, env.host, cmd))
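(For reference, a minimal sketch of the Fabric 1.4 setup under discussion, with hypothetical host values rather than anything from the actual fabfile:)

from fabric.api import env, run

env.forward_agent = True                    # the built-in forwarding added in Fabric 1.4
env.hosts = ['deploy@staging.example.com']  # hypothetical host

def ship():
    # remote git operation that relies on the forwarded local agent
    run('git pull origin master')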

Any idea what's going on?

@bitprophet
Member

Not sure what you mean by "code shipment" exactly. Can you please provide links to a Gist of your fabfile and the full output from when you run it and get this error? Might help. I haven't seen this error before myself, but the feature is brand new and has only been tested in a few situations.

@ghost assigned bitprophet Feb 16, 2012
@jravetch
Author

Thanks for looking into this. When I say "code shipment", I mean pulling a repo, archiving the previous release, and creating a new directory with new code. Please refer to my fabfile and shipper library here: https://gist.github.com/1849832.

The full output from one of the servers is as follows (I put xxx in place of the server and module names for privacy's sake):

[xxx] out: Loading staging settings ...
[xxx] out: Switching assets hash for xxx (release: milestone_latest-20120217.01) to 2ae3e41
[xxx] out: done!
[xxx] out: Shipped. Deploy when ready.
[xxx] out: Loading staging settings ...
[xxx] out: Shipping env: staging, module: xxx, branch: milestone_latest
[xxx] out: Switching from branch master to milestone_latest
[xxx] out: Switched to branch 'milestone_latest'
[xxx] out: Authentication response too long: 3577407571
[xxx] out: fatal: The remote end hung up unexpectedly

@bitprophet
Member

Sorry, I meant literally all the output from the fab task, not just the lines displaying the remote server's stdout :) If you're explicitly hiding the rest with e.g. hide(), I'd recommend removing those temporarily, and in fact using show('debug') or fab --show=debug.

I can say that the error there appears to come not from Fabric itself but from whatever is being invoked remotely, but without more info (see above) I can't tell which run/sudo call actually generated it :)

@osi

osi commented Mar 27, 2012

I've seen this as well. I have a git checkout on a remote server, and if I do 'git submodule update' and there are multiple sub-modules to update, the 2nd one always fails.

@gosusnp

gosusnp commented Apr 2, 2012

I have a similar issue using Mercurial:

  • when I clone a repository which has multiple subrepos, the first one passes but the second hangs,
  • when I use a buildout which pulls multiple Mercurial repos, same story.

@bitprophet
Member

Thanks a lot for the feedback, guys. That's really interesting, especially how it affects hg too -- @gosusnp, is hg also using the SSH protocol in this instance (from your target to the repository host)? I assume 'yes' but want to make sure.

So, sounds like there may be some problem with initiating multiple agent-using requests on the remote box -- I'll see if even the base case for this (two simple ssh calls, using interactivity) triggers it.

EDIT: Well, that didn't replicate the problem -- two subsequent run("ssh xxx") calls to two different hosts (from the same single remote target) worked fine. Don't have time to set up any kind of realistic multi-submodule Git repo right now; if @osi is using a public set of repos I could try using that one sometime.
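(For illustration, the base case just described might look roughly like this; hostnames are placeholders:)

from fabric.api import env, run

env.forward_agent = True

def base_case():
    # two agent-forwarded hops from the same remote target, but in separate
    # run() calls -- this variant did NOT reproduce the problem
    run('ssh hostA uname')
    run('ssh hostB uname')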

@gosusnp

gosusnp commented Apr 3, 2012

@bitprophet, your assumption is correct.

If it helps, my current workaround is to execute the whole command using local("ssh -A %(host)s .... buildout") instead of run("buildout").

@jravetch
Author

jravetch commented Apr 3, 2012

Just as I initially reported ;)

@stevenmcdonald

I have a similar problem to gosusnp, though in my case with svn (svn+ssh protocol). I'm not clear that it's exactly the same issue as the original report, though. I'm running Python 2.7.1.

This hangs after the output from updating 'foo'. A plain "svn up" fails the same way: one line of output, then it hangs.

def doesnotwork():
  with cd('$PATH'):
    run('svn update foo bar')

This works as expected:

def works():
  with cd('$PATH'):
    run('svn update foo')

As another commenter mentioned, this workaround seems to work:

def workaround():
  local('ssh %(host)s cd $PATH && svn up foo bar' % env)

Running with --show=debug doesn't seem to provide any help.

@stevenmcdonald

Did a little more digging, and I think the issue I'm seeing is the same as #613 rather than this one. It does seem to be related to agent forwarding.

FWIW the minimal test case to show the problem I've found is:

def test():
  run('ssh server ls && ssh server ls')

Running the 2 ssh operations in separate run() calls doesn't have any problems.
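(For contrast, a sketch of that working variant:)

def no_problem():
    # same two hops, each in its own run() call -- no hang
    run('ssh server ls')
    run('ssh server ls')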

@bitprophet
Member

Yes, this and #613 are almost definitely the same root cause, namely some sort of deficiency in the agent forwarding feature. (The perils of pull requests -- I don't know how it actually works, only that it passed tests & worked in a few basic real world test situations.)

I had tried triggering this issue with two run calls but didn't think to try doing them both within the same remote session -- that's a good catch. I'll try to recreate that version myself which might help me debug. Thanks!

@stevenmcdonald

I don't know a lot about how ssh works its magic, and my python is pretty rusty, but I thought I'd take a stab at debugging the issue.

One thing I noticed is that in ssh.transport._parse_channel_open(), the two lines that test kind == 'auth-agent@openssh.com' (2011 and 2090) are indented with tabs while the rest of the file is indented with spaces. Correcting this has no effect on the bug though :)

Looking at the logging, I see that when auth is requested, it makes a connection to the AgentClientProxy. Then when the second ssh call makes an auth request, it fires up another AgentClientProxy to handle that, though the first AgentClientProxy is never closed.

Again, I don't know a lot about how this is supposed to work, but it seems like the ideal case would be to reuse the first AgentClientProxy; or, if it does need to be shut down, maybe it's the fact that it's not being closed before another instance tries to connect that's causing the problem.

@bitprophet
Member

Actually, in my explorations, AgentClientProxy is only instantiated once. Are you referring to the "Secsh channel 3 (auth-agent@openssh.com) opened" style lines (of which there are two)?

Anyway, have done some digging but not done yet. Mostly focusing on "what might be different in the 2nd agent request that's causing it to upset the remote sshd". Found two spots already where state is created in a non-idempotent manner:

  • AgentClientProxy._forward_agent_handler sets (and starts) self.thread.
    • Making this idempotent (so only one thread is created) does impact the forwarding, in the sense that the forwarding simply does not occur the 2nd time -- I am prompted for my passphrase (but it works fine otherwise, if I enter it correctly.) Not very useful.
  • AgentClientProxy.connect sets and returns self._conn, a socket object hooked up to the local SSH_AUTH_SOCK.
    • Making this idempotent (returning self._conn immediately if it's not None) has no measurable effect.
    • This was kind of expected -- the local socket isn't changing any so any socket connection to it would theoretically behave identically to any other, even if they're different Python objects.

Still digging around; also thinking based on the above that the agent handler thread might be a place to dig deeper if no other spots in the code seem likely.
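(A rough sketch of the idempotent-connect experiment described above -- not the library's actual code, just the shape of the tweak, assuming a standard Unix-socket connection to SSH_AUTH_SOCK:)

import os
import socket

def connect(self):
    # hypothetical idempotent variant: reuse the existing local agent socket
    if getattr(self, '_conn', None) is not None:
        return self._conn
    conn = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    conn.connect(os.environ['SSH_AUTH_SOCK'])  # local agent socket
    self._conn = conn
    return self._conn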

@bitprophet
Member

I don't believe this is the cause of the problem, but I realized we're never closing the AgentClientProxy that the Fabric layer instantiates, which could theoretically be problematic for large agent-activated runs (there'd be a ton of extra open auth SSH channels as well as local socket connections to SSH_AUTH_SOCK.)

I've committed something to tie that off.

Minor issue with this (though it would still be a problem before my change) is that this combo of local socket + extra SSH channel is being created (and now at least partly torn down) for every command run. So again in large fabfiles this could potentially be inefficient.

However, from what little I know of how forwarding works, we probably can't hold open the local->remote channel persistently, and the local socket connection is a file/Unix socket, so while we could cache that OK, it's not likely a big deal.

Finally, we'd want this granular behavior in any case where somebody only wanted to enable agent forwarding for a subset of their execution run, though I'd expect that to be a pretty uncommon use case.
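(Roughly, the tie-off described above amounts to something like the following on Fabric's side; names and call shapes are illustrative, not the literal commit:)

proxy = None
try:
    if env.forward_agent:
        proxy = AgentClientProxy(channel)  # one forwarding proxy per run()/sudo()
    # ... execute the remote command over the channel ...
finally:
    if proxy is not None:
        proxy.close()  # tear down the extra SSH channel and the local agent socket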

@bitprophet
Member

Sanity check against normal SSH client behavior with the same invocation (two agent-forwarded connections in a single remote command execution). Looks similar to what we're doing at first blush, re: a new agent forwarding channel per remote 'ssh' invoke:

Authenticated to localhost ([127.0.0.1]:22).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: Requesting authentication agent forwarding.
debug1: Sending environment.
debug1: Sending env LANG = en_US.UTF-8
debug1: Sending command: ssh localhost uname && ssh localhost uptime
debug1: client_input_channel_open: ctype auth-agent@openssh.com rchan 2 win 65536 max 16384
debug1: channel 1: new [authentication agent connection]
debug1: confirm auth-agent@openssh.com
debug1: channel 1: FORCE input drain
debug1: channel 1: free: authentication agent connection, nchannels 2
Darwin
debug1: client_input_channel_open: ctype auth-agent@openssh.com rchan 2 win 65536 max 16384
debug1: channel 1: new [authentication agent connection]
debug1: confirm auth-agent@openssh.com
debug1: channel 1: FORCE input drain
debug1: channel 1: free: authentication agent connection, nchannels 2
 9:03  up 6 days, 15:02, 3 users, load averages: 0.30 0.41 0.47
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: client_input_channel_req: channel 0 rtype eow@openssh.com reply 0
debug1: channel 0: free: client-session, nchannels 1
Connection to localhost closed.

I notice it's apparently freeing up the auth connection each time (running with ssh -vv confirms this, channel is "dead", gets gc'd and is then freed); unclear if our patch does the same thing, will try to check. Channel open/close seems to be ultimately handled in Transport; I can trace the opening of the agent forward channel in Transport._parse_channel_open but after that the channel is stored in state.

I'll have to go find out where closing occurs and make sure it's logging; I don't see any obvious channel close events from Paramiko at debug level, as I do from ssh -v -- for any channel type, not just this newly added agent forward channel.

@stevenmcdonald

I just looked through my saved logs again... and you're right, I'm only seeing AgentClientProxy init'd once, but it's connecting twice. Probably not especially helpful, but here's the relevant portion of my logs (with some logging added to agent.py):

DEB [20120502-15:26:55.194] thr=1   ssh.transport: Ciphers agreed: local=aes128-ctr, remote=aes128-ctr
DEB [20120502-15:26:55.194] thr=1   ssh.transport: using kex diffie-hellman-group1-sha1; server key type ssh-rsa; cipher: local aes128-ctr, remote aes128-ctr; mac: local hmac-sha1, remote hmac-sha1; compression: local none, remote none
DEB [20120502-15:26:55.211] thr=1   ssh.transport: Switch to new keys ...
DEB [20120502-15:26:55.534] thr=2   ssh.AgentClientProxy: AgentClientProxy init
DEB [20120502-15:26:55.573] thr=1   ssh.transport: [chan 1] Sesch channel 1 request ok
DEB [20120502-15:26:56.369] thr=1   ssh.transport: Incoming forward agent connection
DEB [20120502-15:26:56.369] thr=1   ssh.transport: [chan 2] Max packet in: 34816 bytes
DEB [20120502-15:26:56.370] thr=1   ssh.transport: [chan 2] Max packet out: 16384 bytes
INF [20120502-15:26:56.370] thr=1   ssh.transport: Secsh channel 2 (auth-agent@openssh.com) opened.
DEB [20120502-15:26:56.370] thr=1   ssh.AgentClientProxy: forward agent handler start
DEB [20120502-15:26:56.371] thr=3   ssh.AgentClientProxy: connect
DEB [20120502-15:26:56.545] thr=1   ssh.transport: [chan 2] EOF received (2)
DEB [20120502-15:26:57.113] thr=1   ssh.transport: Incoming forward agent connection
DEB [20120502-15:26:57.113] thr=1   ssh.transport: [chan 3] Max packet in: 34816 bytes
DEB [20120502-15:26:57.113] thr=1   ssh.transport: [chan 3] Max packet out: 16384 bytes
INF [20120502-15:26:57.114] thr=1   ssh.transport: Secsh channel 3 (auth-agent@openssh.com) opened.
DEB [20120502-15:26:57.114] thr=1   ssh.AgentClientProxy: forward agent handler start
DEB [20120502-15:26:57.114] thr=4   ssh.AgentClientProxy: connect
DEB [20120502-15:26:57.120] thr=1   ssh.transport: [chan 2] EOF sent (2)
DEB [20120502-15:27:06.411] thr=1   ssh.transport: EOF in transport thread

(that final EOF log message is from me control-C'ing it after the hang)

Good to know the "one agent connection per request" is the same thing OpenSSH is doing.

I'm suspicious about it logging EOF messages on chan 2 (the first agent channel) and nothing on chan 3 (the second agent connection), but that may just essentially be a symptom.

@stevenmcdonald

I found it!

AgentClientProxy.connect() makes a connection to the ssh-agent and stores that in self._conn.

AgentProxyThread (the superclass of AgentRemoteProxy) selects on and reads from this connection in the AgentClientProxy instance. So when the second AgentProxyThread starts up, it causes the AgentClientProxy to overwrite self._conn with the second connection to the ssh-agent.

The first AgentProxyThread is still running and selecting on self._agent._conn, so it actually receives the message before the second thread can get it. In other words, we have 2 AgentProxyThread instances listening on the same socket, when really they should be listening on separate sockets.

So, in AgentProxyThread, in the run() method, if you have it save agent._conn in self._conn, and likewise change the _communicate() method to use self._conn instead of self._agent._conn, it fixes the problem.

Here's my diff:

--- /Library/Python/2.7/site-packages/ssh/agent.py  2012-04-30 11:34:04.000000000 -0700
+++ agent.py    2012-05-04 16:03:30.000000000 -0700
@@ -105,6 +105,7 @@
         threading.Thread.__init__(self, target=self.run)
         self._agent = agent
         self._exit = False
+        self._conn = None

     def run(self):
         try:
@@ -112,6 +113,7 @@
             self.__inr = r
             self.__addr = addr
             self._agent.connect()
+            self._conn = self._agent._conn
             self._communicate()
         except:
             #XXX Not sure what to do here ... raise or pass ?
@@ -122,10 +124,10 @@
         oldflags = fcntl.fcntl(self.__inr, fcntl.F_GETFL)
         fcntl.fcntl(self.__inr, fcntl.F_SETFL, oldflags | os.O_NONBLOCK)
         while not self._exit:
-            events = select([self._agent._conn, self.__inr], [], [], 0.5)
+            events = select([self._conn, self.__inr], [], [], 0.5)
             for fd in events[0]:
                 if self._agent._conn == fd:
-                    data = self._agent._conn.recv(512)
+                    data = self._conn.recv(512)
                     if len(data) != 0:
                         self.__inr.send(data)
                     else:
@@ -133,7 +135,7 @@
                 elif self.__inr == fd:
                     data = self.__inr.recv(512)
                     if len(data) != 0:
-                        self._agent._conn.send(data)
+                        self._conn.send(data)
                     else:
                         break
             time.sleep(io_sleep)

I'm not sure that's really the "right" way to fix it, since AgentClientProxy is still having the value of self._conn overwritten with each new connection... but at least it demonstrates the general idea of how to fix it.

@bitprophet
Member

So if I read what you're saying right, I was moving in the right direction re: the poor state handling of the local socket connection to the local SSH agent? And I think what you're saying is that this works well, and the only problem is the value of self._conn is getting overwritten "unnecessarily" (the same behavior I saw re: the agent socket being recreated 2x in a row when it could possibly just be cached once per session somehow.)

I'll try to verify your changes above and then think about whether/how to try and memoize/cache this socket. Thanks a lot for the 2nd look, no idea if I would've run into that specific angle myself :)

@stevenmcdonald

The state handling of the local socket seems to be OK; it's the overwriting problem that causes the issue (unless I'm misunderstanding what you mean). Take a look at the loop in AgentProxyThread._communicate(). It's selecting on self._agent._conn. So when the second AgentRemoteProxy instance is created, it overwrites agent._conn with a new socket, so now we have two instances of AgentRemoteProxy (in separate threads) trying to read data from the socket that should be exclusive to the second instance.

I found the problem by putting logging in Connection.send() and recv(). The first instance, where things worked, would send a message and receive a response. On the second run, the new instance would send a message, but the first instance would receive the response.

This is kind of messy, but here are my logs. First the one that works:

DEB [20120504-15:25:50.259] thr=1   ssh.AgentClientProxy: forward agent handler start, chan: <ssh.Channel 2 (open) window=65536 -> <ssh.Transport at 0x8691810L (cipher aes128-ctr, 128 bits) (active; 2 open channel(s))>>
DEB [20120504-15:25:50.259] thr=1   ssh.AgentRemoteProxy: init agent: <ssh.agent.AgentClientProxy object at 0x10869e950> chan: <ssh.Channel 2 (open) window=65536 -> <ssh.Transport at 0x8691810L (cipher aes128-ctr, 128 bits) (active; 2 open channel(s))>>
DEB [20120504-15:25:50.260] thr=3   ssh.AgentClientProxy: connect
DEB [20120504-15:25:50.260] thr=3   ssh.AgentClientProxy: connection: <socket._socketobject object at 0x10868f280>
DEB [20120504-15:25:50.261] thr=3   ssh.AgentProxyThread: recv from: <ssh.Channel 2 (open) window=65536 -> <ssh.Transport at 0x8691810L (cipher aes128-ctr, 128 bits) (active; 2 open channel(s))>>
DEB [20120504-15:25:50.261] thr=3   ssh.AgentProxyThread: send to: <socket._socketobject object at 0x10868f280>
DEB [20120504-15:25:50.272] thr=3   ssh.AgentProxyThread: recv from: <socket._socketobject object at 0x10868f280>
DEB [20120504-15:25:50.273] thr=3   ssh.AgentProxyThread: send to: <ssh.Channel 2 (open) window=65536 -> <ssh.Transport at 0x8691810L (cipher aes128-ctr, 128 bits) (active; 2 open channel(s))>>

Then when the second ssh happens:

DEB [20120504-15:25:50.884] thr=1   ssh.AgentClientProxy: forward agent handler start, chan: <ssh.Channel 3 (open) window=65536 -> <ssh.Transport at 0x8691810L (cipher aes128-ctr, 128 bits) (active; 3 open channel(s))>>
DEB [20120504-15:25:50.884] thr=1   ssh.AgentRemoteProxy: init agent: <ssh.agent.AgentClientProxy object at 0x10869e950> chan: <ssh.Channel 3 (open) window=65536 -> <ssh.Transport at 0x8691810L (cipher aes128-ctr, 128 bits) (active; 3 open channel(s))>>
DEB [20120504-15:25:50.885] thr=4   ssh.AgentClientProxy: connect
DEB [20120504-15:25:50.885] thr=4   ssh.AgentClientProxy: connection: <socket._socketobject object at 0x10868f210>
DEB [20120504-15:25:50.886] thr=4   ssh.AgentProxyThread: recv from: <ssh.Channel 3 (open) window=65536 -> <ssh.Transport at 0x8691810L (cipher aes128-ctr, 128 bits) (active; 3 open channel(s))>>
DEB [20120504-15:25:50.886] thr=4   ssh.AgentProxyThread: send to: <socket._socketobject object at 0x10868f210>
DEB [20120504-15:25:50.895] thr=3   ssh.AgentProxyThread: recv from: <socket._socketobject object at 0x10868f210>
DEB [20120504-15:25:50.895] thr=3   ssh.AgentProxyThread: send to: <ssh.Channel 2 (EOF received) (open) window=64936 -> <ssh.Transport at 0x8691810L (cipher aes128-ctr, 128 bits) (active; 3 open channel(s))>>

The request comes in via chan 3 on thread 4, but it's thread 3 that gets the response and then tries to pass it along to chan 2. From there it was just a matter of digging through the code doing the send and recv to spot the problem. You can see that thread 3 starts out working with socket 0x10868f280, but after thread 4 starts up, thread 3 is now working with socket 0x10868f210, the one created for thread 4.

I hope that all makes sense :)

FWIW, I tried running a script that makes a bunch of ssh connections (via svn, i.e. a non-trivial test) and it ran to completion.

@bitprophet
Member

Yea, by "state handling" I meant "incorrectly overwrote self._conn", i.e. how data is handled. Same class of problem (IMO) as not closing files/sockets correctly, really inefficient repeated access instead of caching, etc.

I think I get what you mean, but again I'll try to verify on my end soon. Thanks again!

@stevenmcdonald

Ah yes, then that is the problem :)

My patch simply copies the conn object from the AgentClientProxy into the AgentProxyThread instance... which smooths over this problem, but doesn't leave the code in any better shape to correctly handle state.

@bitprophet
Member

I've reproduced your findings -- I am also seeing the older thread select() on the (newer/overwritten) local auth socket fd before the newer thread can. (Ironically your fix doesn't fix it for me -- because I also, locally, started memoizing the socket fd. Which results in the same problem -- one socket, multiple read threads. Whoops! With that removed, it works great.)

Trying to think of the best way to "really" solve this, though I'd honestly be fine with your solution since AFAICT it's no worse than the original patch re: how it handles the socket objects.

I was earlier thinking about caching the socket fd, but that's not the real problem now, the problem is the multiple read threads. Need to either convince the older one to terminate when the new one shows up, or (if possible) have the first thread created "switch" to newly created channels. (This might leave stale remote channels lying around too, so really what we want is for all three items -- fd, thread, channel -- to be reused whenever possible. Not sure if that meshes with the SSH spec though.)

@lynxis

lynxis commented May 6, 2012

I've created a new class which creates a new AgentClientProxy every time a new ssh-agent channel arrives. It works for me.
lynxis/ssh@37f04a2
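(In sketch form, the pattern lynxis describes is roughly the following: a small wrapper that hands every incoming auth-agent@openssh.com channel its own AgentClientProxy, and therefore its own local socket and proxy thread. Names are an approximation of the linked commit, not a verbatim copy:)

class AgentRequestHandler(object):
    def __init__(self, chan_client):
        self._proxies = []
        # ask the transport to call us back whenever the remote side opens
        # an auth-agent@openssh.com channel on this session
        chan_client.request_forward_agent(self._forward_agent_handler)

    def _forward_agent_handler(self, chan_remote):
        # fresh proxy (and agent socket + thread) per incoming channel
        self._proxies.append(AgentClientProxy(chan_remote))

    def close(self):
        for proxy in self._proxies:
            proxy.close()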

@bitprophet
Member

Thanks for the patch, @lynxis. This looks like exactly what I was talking about above re: ensuring a single local-agent-socket=>remote-fake-agent connection per Fabric-level statement. I'll test it out in a bit!

EDIT: Confirmed patch works. Want to double check whether I can re-apply an earlier tweak that should .close() the remote agent forwarding channel after Fabric's _execute() terminates -- I worry that otherwise we'll end up with one new channel per run()/sudo() statement which is probably not good. After that I think this is hopefully licked.

@lynxis

lynxis commented May 7, 2012

I think it's a good idea to close all channels. Added a commit to my fork. One question about these agent channels: do we close them in a clean way after we receive EOF?

@bitprophet
Member

@lynxis re: EOFs triggering closures, I don't know, actually -- I'd have to read the spec or the openssh source to figure that out.

As long as we close the channels automatically at the end of the Fabric level _execute (which your commit + another tweak on fabric's end should accomplish) it's good enough for now, but it would be nice to solve it for the long term / for people using ssh without Fabric.

Will try to test that new change out today sometime if I can.

@techtonik

The connection drop during operation is quite repeatable here: svn up + svn+ssh + env.forward_agent = True reliably gives "Network connection closed unexpectedly" with Fabric 1.4.1 and ssh (library) 1.7.13. Looking forward to seeing this fixed, so if you have a version with the fix, I am eager to test.

@bitprophet
Member

@techtonik I'm pretty sure that error condition is related to the changes under discussion, yeah, though I never experienced it myself (I only got the "Authentication response too long" and/or freeze-up behaviors when I tested). I'll post here when I have something I'm happy with pushed into master and/or the 1.4 branch.

@bitprophet
Member

This is up as ssh 1.7.14 (out on PyPI now, hopefully not broken...) and Fabric branches 1.4 and master. I'll try to release actual tags/PyPI releases for Fabric tomorrow.

@techtonik if you upgrade your Python ssh lib and try nabbing fabric==dev that should let you test whether this change fixes your version of the problem. Please report back!

@bitprophet
Member

I lied, they're on PyPI now, not tomorrow :) Still curious if this fixed @techtonik's problem. Also really enjoy spamming his github inbox, apparently :D

@techtonik

Well, I've tried to use Fabric 1.4.1 with ssh from checkout, unsuccessfully. I need to log out and try once again with the latest checkout of Fabric, because it seems the GNOME agent crashed behind the scenes: I started being prompted for my private key password. The most evident bug was garbled output:

Before:

[remote] run: svn st
...
[remote] out: !       htdocs
[remote] out: ?       htdocs/signin/test.php
[remote] out: !       htdocs/global

After:

[remote] run: svn st
[remote] out: !?!!       htdocs/global

@jravetch
Author

jravetch commented May 8, 2012

That fixed it for me! Thanks a lot @bitprophet @lynxis @stevenmcdonald for your help to troubleshoot and solve this one.

@techtonik

I confirm. Fixed with Fabric 1.4.2 / ssh 1.7.14. ;)

$ PYTHONPATH="ssh:fabric" fab --version
Fabric 1.4.2
ssh (library) 1.7.14

jperkin pushed a commit to TritonDataCenter/pkgsrc-legacy that referenced this issue Dec 9, 2013
Upstream changes:
-----------------

## ssh 1.7.14 (2012-05-07)

* #15: Implemented parameter substitution in SSHConfig, matching the
  implementation of `ssh_config(5)`. Thanks to Olle Lundberg for the patch.
* #24: Switch some internal type checking to use `isinstance` to help prevent
  problems with client libraries using subclasses of builtin types. Thanks to
  Alex Morega for the patch.
* [Fabric #562](fabric/fabric#562): Agent forwarding
  would error out (with `Authentication response too long`) or freeze, when more
  than one remote connection to the local agent was active at the same time.
  This has been fixed. Thanks to Steven McDonald for assisting in
  troubleshooting/patching, and to GitHub user `@lynxis` for providing the
  final version of the patch.
bendavis78 pushed a commit to bendavis78/paramiko that referenced this issue Jan 30, 2014
jsonn pushed a commit to jsonn/pkgsrc that referenced this issue Oct 11, 2014
lukeseawalker added a commit to lukeseawalker/aws-parallelcluster that referenced this issue Jul 25, 2019
We are getting "paramiko.ssh_exception.AuthenticationException: Unable to connect to SSH agent" when we try to run an SSH command inside an SSH session,
e.g.
RemoteCommandExecutor.run_remote_command("ssh {0}".format(mpi_module, remote_host))

Possibly related to fabric/fabric#562

Signed-off-by: Luca Carrogu <carrogu@amazon.com>
lukeseawalker added a commit to aws/aws-parallelcluster that referenced this issue Jul 25, 2019
We are getting "paramiko.ssh_exception.AuthenticationException: Unable to connect to SSH agent" when we try to run an SSH command inside an SSH session,
e.g.
RemoteCommandExecutor.run_remote_command("ssh {0}".format(mpi_module, remote_host))

Possibly related to fabric/fabric#562

Signed-off-by: Luca Carrogu <carrogu@amazon.com>