fix hang if using proxycommand #252
The problem was introduced by commit 068bf63: when using a ProxyCommand, the self.is_alive() loop in stop_thread never ends.
Signed-off-by: gza <github.guillaume@zitta.fr>
Ping? There are quite a few Fabric users suffering from this.
@mgedmin I just tried with this fix and it didn't seem to solve the problem I was having with Fabric. Does it work for you?
This PR fixes my problem. (Which is a hang during disconnect if I use Fabric to talk to a server using ProxyCommand.) If I ^C the fab process after it hangs (while using paramiko master without this fix), I get the following traceback, which is the same as described in the original comment:
Thanks for suggesting this! I feel like I've either seen this change before, or we already merged something that moved the
Yea, I was right, this was modified in #156 (which came out in 1.11.2, which is when users report the bug started to appear, as per Fabric #1020 and Fabric #1042). I'm still digging into the implications, as this PR would not completely reverse that patch but might still cause issues (and they may be speaking to slightly different use cases as well).
I'd done something wrong in my earlier testing; my previous comment is incorrect - this does indeed fix Fabric #1020 for me.
Thanks @mgedmin - that's exactly what I was expecting, and my (ongoing) investigation definitely implies that shuffling the socket close is not the right approach here. So I won't merge this actual PR (sorry @gza!) but I will keep it open as the Paramiko-level discussion of the issue. Will comment again in a sec w/ a condensed version of what I have so far. Think I'm close.
More digging, still no solution:
The crux then seems to be that in non-ProxyCommand situations, we need to rely on that socket timeout to tell when the remote end is truly done sending us data. Something else needs to happen in the ProxyCommand scenario to the same effect:
Could try faking a timeout by using
The 'select + manually timeout' approach works but is prone to banner errors unless the timeout is inflated (implying math's off or something re: manual timing out?) which then probably makes things a bit slow. But I'm way past my time/stress threshold for such a specific feature, especially one which has a workaround (the gateway/use-nested-channels approach) so I'm tempted to push out a branch w/ this code in it & merge if I get sufficient +1s.
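For readers following along, the "select + manually timeout" idea can be sketched roughly as below. This is not the code on the branch: `recv_with_deadline` is a hypothetical helper, it takes a raw file descriptor (for a subprocess, something like `proc.stdout.fileno()`), and it assumes a Unix-like platform where `select` works on pipes.

```python
import os
import socket
import time
from select import select


def recv_with_deadline(fd, size, timeout):
    """Read up to `size` bytes from file descriptor `fd`.

    Hypothetical helper (not part of Paramiko). Raises socket.timeout
    if `size` bytes have not arrived within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    chunks = []
    received = 0
    while received < size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Pipes have no built-in timeout, so we fake one ourselves.
            raise socket.timeout()
        # Block in select() only for as long as the deadline allows.
        readable, _, _ = select([fd], [], [], remaining)
        if readable:
            data = os.read(fd, size - received)
            if not data:
                break  # EOF: the other end closed the pipe
            chunks.append(data)
            received += len(data)
    return b"".join(chunks)
```

Note that in this sketch any bytes already collected in `chunks` are thrown away when the timeout fires, because they only live in a local variable.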
https://github.com/paramiko/paramiko/tree/fix-proxycommand-infinite-loop-252 - this is that branch. Works for me in the trivial base case.
Fails here (Fabric 1.8.0, paramiko commit 58489c8):
Filed #268 for that NameError (it also affects master, not just this branch). I'm guessing that code shouldn't be trying to convert a socket.timeout into a BadProxyCommand, so with this diff I get a bit further:
diff --git a/paramiko/proxy.py b/paramiko/proxy.py
index abdd157..6262b39 100644
--- a/paramiko/proxy.py
+++ b/paramiko/proxy.py
@@ -101,6 +101,8 @@ class ProxyCommand(object):
                     read.append(b)
             result = ''.join(read)
             return result
+        except socket.timeout:
+            raise
         except IOError, e:
             raise BadProxyCommand(' '.join(self.cmd), e.strerror)
Unfortunately "further" is this:
So key authentication stops working. For the record, the ProxyCommand I use for muskatas is
I'm using key auth in my base case and it works fine there, so I wonder what else in your situation could be killing it. I also tried your invocation (though AFAIK Feels to me like your slightly uncommon use case (gateway == target) may be complicating things? Are you able to try a more vanilla use case (gateway != target) temporarily so we can compare behaviors on your end? EDIT: also, enabling debug-level stdlib logging in your fabfile would help us see what Paramiko is trying to log.
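A minimal way to turn on debug-level stdlib logging at the top of a fabfile might look like the following. It assumes Paramiko's loggers live under the `paramiko` name in the standard `logging` hierarchy; adjust the name if your setup differs.

```python
import logging

# Send all log records at DEBUG and above to stderr.
logging.basicConfig(level=logging.DEBUG)

# Assumption: Paramiko's loggers are children of the "paramiko" logger,
# so setting the level here covers its transport/channel output too.
logging.getLogger("paramiko").setLevel(logging.DEBUG)
```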
@mgedmin oh, I see - socket.error is a subclass of IOError (something I wasn't cognizant of). I'll add that bit in and see if it helps - I am still getting banner errors occasionally and it feels really grody to keep pumping up the timeout. EDIT: didn't help, but I think the banner specific stuff has its own timeout which may not be getting transmitted correctly. Digging.
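The subclass relationship is why the order of the `except` clauses in the diff matters. A small illustration in modern Python 3 syntax, where `BadProxyCommand` is a local stand-in for the Paramiko exception and `guarded_read` is a hypothetical wrapper:

```python
import socket


class BadProxyCommand(Exception):
    """Local stand-in for the exception named in the diff (hypothetical)."""


def guarded_read(read):
    """Call `read()`, letting timeouts through and wrapping other I/O errors."""
    try:
        return read()
    except socket.timeout:
        # Must come first: socket.timeout is itself an IOError subclass,
        # so the clause below would otherwise swallow it.
        raise
    except IOError as e:
        raise BadProxyCommand(str(e))
```

If the two clauses were swapped, a plain timeout would be reported as a broken proxy command.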
This banner error is pretty strange but I'm gonna operate as if it's not a coincidence. When things work, the data gotten back from the initial read is:
When it's broken:
When broken, it keeps sucking down more, I think trying to find the banner, but it looks like the front (the part being tested for - this code just wants to see This data always comes from So suspicion is that our inner timeout is potentially reading that header, but then timing out & raising, then gets called again by the outer loop - thus "eating" the data. Unsure why the
Yup, keeping a buffer around persistently (again very similar to This also likely means other situations that would goof up, will work now. Who knows if that's the extent of this class' non-real-socket-ness though. @mgedmin, I am pushing this update momentarily, please give it a shot. EDIT: Also realized this explains the oddness where I had to bump up the inner timeout to avoid errors. So that can be removed now. Sweet.
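The persistent-buffer idea can be sketched as follows. This is an illustration of the technique rather than the code that was pushed: `BufferedPipeReader` is a hypothetical class, it reads from a raw file descriptor, and it assumes a Unix-like platform where `select` works on pipes.

```python
import os
import socket
import time
from select import select


class BufferedPipeReader:
    """Hypothetical reader that keeps partial data across timeouts."""

    def __init__(self, fd, timeout):
        self.fd = fd
        self.timeout = timeout
        self.buffer = b""  # survives between recv() calls

    def recv(self, size):
        deadline = time.monotonic() + self.timeout
        while len(self.buffer) < size:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                # Bytes read so far stay in self.buffer, so the next
                # call resumes instead of "eating" them.
                raise socket.timeout()
            readable, _, _ = select([self.fd], [], [], remaining)
            if readable:
                data = os.read(self.fd, size - len(self.buffer))
                if not data:
                    break  # EOF
                self.buffer += data
        result, self.buffer = self.buffer[:size], self.buffer[size:]
        return result
```

With this shape, a caller that retries after a timeout sees the start of the banner on the second attempt, which is how a real socket behaves.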
Tested latest code out with a less trivial fabfile (does some runs, some with cd + sudos, then some more sudos, then a put, then more runs.) Works fine. Added in some interactivity ( Hopefully means this is stable, potentially excepting more specific-to-proxying-itself things like @mgedmin's situation.
I've pulled the latest changes in the fix-proxycommand-infinite-loop-252 branch and things work fine now. Perhaps that timeout-drops-data bug you fixed was randomly breaking pubkey auth for me. 🎉 🍰 🎆 BTW note that no proxying itself was involved in #252 (comment). The can't-proxy-itself case was with the native gateway mode, and it came up only because there's no way to specify different gateways for different hosts in Fabric, AFAIK. (Being able to specify different gateways would be more useful than being able to gateway to itself, IMHO.) (I've been using netcat in my ProxyCommand since before ssh -W existed. Thanks for bringing it to my attention!)
Noted re: gateways, and yea, I have plans for Fab 2 to be far more object oriented which will enable per-host settings for everything - gateway, etc. Glad this fixes it for you (also see comments in the Fab ticket). Will release today hopefully. Thanks again for the assist!
Merged my branch for this into 1.11 -> 1.12 -> master.
[Maintainer note: this patch can't be used as-is but see comments for eventual diagnosis & fixes.]
The problem was introduced by commit 068bf63: when using a ProxyCommand, the self.is_alive() loop in stop_thread never ends.
I don't know if this is the right way to do it, but it works for me.