Alternative to ssh's ServerAliveInterval and ServerAliveCountMax client options #918

praiskup · 2017-03-10T09:00:36Z

I'm not sure whether this is implemented. Based on the code it looks like it should work, but it does not. When I call conn.get_transport().set_keepalive(5) and then conn.exec_command('...', timeout=10), my code hangs indefinitely, even if I kill the remote SSH server.

Can this be fixed/implemented?


praiskup · 2017-03-10T09:11:18Z

Btw., I use Fedora's python2-paramiko-2.1.1-2.fc25.noarch.

bitprophet · 2017-03-11T17:21:21Z

Thanks for the report!

Not a feature I use myself (and I'm not the original author), so I haven't really touched it. Glancing at it, set_keepalive does seem to map roughly to ServerAliveInterval (vs e.g. TCPKeepAlive), going by its immediate implementation (it sends a real SSH message).

The actual keepalive timer functionality also looks implemented: there's a timestamp that gets updated on writes, or on reads when a timeout occurs (the same spot that triggers the message send once the time since the last transmission exceeds the interval, as one would expect).

There's a chance of bugs in there, of course. But I think the real problem is...there's no equivalent of ServerAliveCountMax that I can see. Even assuming the keepalive packet sending works great, nothing tracks how often it has happened, and nothing raises an exception or shuts down after a threshold.

It's been a while since I dealt with "remote server went away" issues, so I don't remember our normal behavior in that case. Assuming it wouldn't trigger socket or similar errors in your situation, it certainly seems the current setup would run forever, and implementing an equivalent of ServerAliveCountMax would be handy.

Afraid I don't have time to deep dive into it now but I'd certainly entertain a PR.
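To make the missing piece concrete, here is a minimal sketch of the counting logic that a ServerAliveCountMax equivalent would need. It is not paramiko code: KeepaliveTracker, note_activity, tick and the send_probe callable are hypothetical names made up for illustration, and the exact threshold behavior of OpenSSH is not reproduced here.

```python
import time


class KeepaliveTracker:
    """Counts consecutive unanswered keepalive probes (hypothetical helper)."""

    def __init__(self, interval, count_max=3, clock=time.monotonic):
        self.interval = interval      # seconds of silence before a probe is sent
        self.count_max = count_max    # unanswered probes tolerated before giving up
        self._clock = clock           # injectable so the logic can be tested
        self._last_event = clock()
        self.unanswered = 0

    def note_activity(self):
        """Call whenever any data arrives from the peer."""
        self._last_event = self._clock()
        self.unanswered = 0

    def tick(self, send_probe):
        """Call periodically; returns False once the peer is presumed dead."""
        if self._clock() - self._last_event < self.interval:
            return True               # not idle long enough, nothing to do
        if self.unanswered >= self.count_max:
            return False              # threshold reached, caller should close
        send_probe()                  # assumed to return without blocking
        self.unanswered += 1
        self._last_event = self._clock()
        return True
```

The caller would close the connection or raise once tick() returns False. Where such a check would hook into paramiko's read loop is left open; this only shows the bookkeeping.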

radssh · 2017-03-11T23:05:58Z

I have seen issues with the existing keepalive logic not being able to detect some duff connections. I never found the root cause, but what I could see was that the client keepalive messages were reported as succeeding while actually only being buffered in the socket Send-Q (as monitored via netstat). In order to track the server replies, the sent SSH "global-request" message needs to set the want_reply flag to true. Enabling that in the current code would cause the client to hang, since the flag causes a blocking read on the expected reply, which never comes back.

I implemented a makeshift solution that never got enough polish to submit back as a PR, but it did seem to properly detect the issues I was encountering. I have lingering doubts about the proper handling of Transport.completion_event, which is subject to being altered outside that scope, and about whether to explicitly alter the Transport state when a keepalive failure is detected (and if so, in what way).

See code here and if @bitprophet can provide guidance, I can work on converting that into a PR if it looks good.
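As a rough illustration of the want_reply problem described above, the sketch below waits for a keepalive reply with a bounded timeout instead of a blocking read. It is not the linked code and not paramiko's API: probe_with_deadline, send_probe and reply_event are hypothetical names, and the assumption is that some reader thread sets the event when the reply arrives.

```python
import threading


def probe_with_deadline(send_probe, reply_event, timeout):
    """Send one keepalive and wait at most `timeout` seconds for its reply.

    send_probe:  hypothetical callable that writes the request and returns at once.
    reply_event: threading.Event set by whatever thread reads the peer's reply.
    Returns True if a reply was seen in time, False otherwise.
    """
    reply_event.clear()   # forget any reply left over from an earlier probe
    send_probe()
    return reply_event.wait(timeout)
```

A False result means the probe went unanswered within the deadline, which is the signal a failure counter would consume. What to do with the Transport state at that point remains the open question raised above.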

This allows us to detect SSH connection issues earlier instead of hanging forever. Paramiko has issues with keep-alive packets and timeouts: paramiko/paramiko#918. So now, reschedule the build in case of SSH issues. Also, depend heavily on the openssh client configuration.

Currently, the documentation states that keepalive maps to ClientAliveInterval, which, unlike what the name indicates, is a server-side setting. The client-side setting is called ServerAliveInterval. You can see references to this in the following two discussions: paramiko/paramiko#918 and https://unix.stackexchange.com/a/3027/6475

praiskup changed the title ~~Alternative to ssh's ServerAliveInterval=3 and ServerAliveCountMax~~ Alternative to ssh's ServerAliveInterval and ServerAliveCountMax client options Mar 10, 2017

bitprophet added Feature Needs investigation Needs patch labels Mar 11, 2017

haridsv mentioned this issue Jul 14, 2017

keepalive maps to ServerAliveInterval fabric/fabric#1631

Closed

userlocalhost mentioned this issue Dec 20, 2021

(Feature request) Enable to set "ServerAliveInterval" configuration for ParamikoSSHClient StackStorm/st2#5511

Open
