Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative to ssh's ServerAliveInterval and ServerAliveCountMax client options #918

Open
praiskup opened this issue Mar 10, 2017 · 3 comments

Comments

@praiskup
Copy link

praiskup commented Mar 10, 2017

I'm not sure whether this is implemented or not .. Based on the code, it looks like it could work but it does not. When I do conn.get_transport().set_keepalive(5) and conn.exec_command('...', timeout=10), my code hangs indefinitely even if I kill the remote ssh server.

Can this be fixed/implemented?

@praiskup praiskup changed the title Alternative to ssh's ServerAliveInterval=3 and ServerAliveCountMax Alternative to ssh's ServerAliveInterval and ServerAliveCountMax client options Mar 10, 2017
@praiskup
Copy link
Author

Btw., I use Fedora's python2-paramiko-2.1.1-2.fc25.noarch.

@bitprophet
Copy link
Member

Thanks for the report!

Not a feature I use myself (& not the original author) so I haven't really touched it; glancing at it, set_keepalive seems like it does roughly map to ServerAliveInterval (vs e.g. TCPKeepAlive) going by its immediate implementation (sends a real SSH message).

The actual keeping-alive timer functionality also looks implemented, there's a timestamp that gets updated on writes, or on reads when a timeout occurs (in the same spot that triggers the message send, when the time since last transmission exceeds the interval, as one would expect).

There's a chance for bugs in there, of course. But I think the real problem is...there's no equivalent of ServerAliveCountMax that I can see. Even assuming the keepalive packet sending works great, there's nothing tracking how often it's happened and excepting/shutting down after a threshold.

Been a while since I dealt with "remote server went away" issues so I don't remember our normal behavior in that case, but assuming it wouldn't trigger socket/etc errors in your situation, certainly seems the current setup would run forever, and implementing the equiv ServerAliveCountMax would be handy.

Afraid I don't have time to deep dive into it now but I'd certainly entertain a PR.

@radssh
Copy link
Contributor

radssh commented Mar 11, 2017

I have seen issues with the existing keepalive logic not being able to detect some duff connections. Never found the root cause, but what I was able to see was that the client keepalive messages were reported as succeeding, but only buffered in the socket Send-Q monitored via netstat. In order to track the server replies, the sent ssh "global-request" message needs to set the want_reply flag to true. Enabling that in the current code would cause the client to hang, since the flag causes a blocking read on the expected reply, which doesn't ever come back.

I implemented a makeshift solution that never got enough polish on it to submit back as a PR, but did seem to properly detect the issues that I was encountering. Lingering doubt as to the proper handling of the Transport.completion_event being subject to altered outside the scope, and whether to explicitly alter the Transport state when the keepalive failure is detected (if so, in what way?).

See code here and if @bitprophet can provide guidance, I can work on converting that into a PR if it looks good.

FrostyX pushed a commit to FrostyX/copr that referenced this issue Apr 17, 2017
This allows us to detect SSH connection issues more earlier
instead of hanging forever.  Paramiko has issuees with keep-alive
packets and timeouts:
paramiko/paramiko#918

So now, reschedule build in case of SSH issues.  Also, heavily
depend on openssh client configuration.
haridsv added a commit to haridsv/fabric that referenced this issue Jul 14, 2017
Currently, the documentation states that keepalive maps to ClientAliveInterval, which is a server side setting, unlike what the name indicates. The client side setting is called ServerAliveInterval. You can see references to this in the below two discussions:

paramiko/paramiko#918
https://unix.stackexchange.com/a/3027/6475
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants