fix: Prevent SSH idle disconnects via proper keepalive wiring #175
Merged
Idle SSH sessions were disconnecting inconsistently - sometimes after a few minutes, sometimes after ~10 minutes. Three underlying issues:

1. russh's default Config::inactivity_timeout is 10 minutes and was inherited verbatim by to_russh_config(), imposing a hard ceiling on every session regardless of keepalive liveness. Now set to None when keepalive is enabled, so the keepalive mechanism alone decides when a peer is dead.

2. No TCP-level SO_KEEPALIVE was set on the underlying socket, so NAT and stateful firewall conntrack entries could expire silently between SSH keepalive packets. connect_with_config now builds the TcpStream manually, applies socket2::TcpKeepalive derived from the SSH keepalive config, and hands it to russh::client::connect_stream.

3. The exec-mode path threaded an SshConnectionConfig through the executor but dropped it at ConnectionConfig (the field was marked dead_code), so user-configured server_alive_interval never reached Client::connect_with_ssh_config. The field is now live and flows through connect_direct / connect_via_jump_hosts / the jump chain.

Adds socket2 as a direct dependency (already transitive via tokio).
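The derivation behind issues 1 and 2 can be sketched with the standard library alone. This is a minimal illustration, not the code from this PR: `SshKeepalive`, `TcpKeepaliveParams`, `inactivity_timeout`, and `tcp_keepalive_params` are hypothetical names, the mapping from the SSH interval to the TCP idle/probe values is an assumption, and keeping the 10-minute default when keepalive is disabled is also an assumption. The actual change applies `socket2::TcpKeepalive` to the socket and passes the stream to `russh::client::connect_stream`, which this sketch does not do.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the SSH keepalive part of the connection config.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SshKeepalive {
    /// Time between SSH keepalive requests; `None` (or zero) disables keepalive.
    pub interval: Option<Duration>,
    /// Unanswered keepalives tolerated before the peer is considered dead.
    pub max_unanswered: u32,
}

/// Hypothetical plain-data mirror of the usual TCP keepalive knobs
/// (idle time, probe interval, probe count).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TcpKeepaliveParams {
    pub idle: Duration,
    pub probe_interval: Duration,
    pub probe_count: u32,
}

/// The 10-minute default inactivity timeout described in the PR text.
pub const DEFAULT_INACTIVITY_TIMEOUT: Duration = Duration::from_secs(600);

/// Returns the keepalive interval only when keepalive is actually enabled.
fn enabled_interval(ka: &SshKeepalive) -> Option<Duration> {
    ka.interval.filter(|d| !d.is_zero())
}

/// Issue 1: with keepalive enabled, no inactivity ceiling is imposed, so the
/// keepalive mechanism alone decides when a peer is dead.
/// Assumption: with keepalive disabled, the 10-minute default stays in place.
pub fn inactivity_timeout(ka: &SshKeepalive) -> Option<Duration> {
    match enabled_interval(ka) {
        Some(_) => None,
        None => Some(DEFAULT_INACTIVITY_TIMEOUT),
    }
}

/// Issue 2: derive TCP keepalive parameters from the SSH keepalive config.
/// Assumption: idle time and probe interval both reuse the SSH interval,
/// and at least one probe is always sent.
pub fn tcp_keepalive_params(ka: &SshKeepalive) -> Option<TcpKeepaliveParams> {
    let interval = enabled_interval(ka)?;
    Some(TcpKeepaliveParams {
        idle: interval,
        probe_interval: interval,
        probe_count: ka.max_unanswered.max(1),
    })
}
```

With a 30-second interval, `inactivity_timeout` returns `None` and `tcp_keepalive_params` returns 30-second idle and probe values; with keepalive disabled, the timeout falls back to 600 seconds and no TCP keepalive parameters are produced.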
Summary
Idle SSH sessions disconnected inconsistently — sometimes after a few minutes, sometimes after ~10 minutes. Fixes three underlying issues:
- `inactivity_timeout` ceiling: `to_russh_config()` inherited russh's 10-minute default via `..Default::default()`, imposing a hard upper bound regardless of SSH keepalive. Now set to `None` when keepalive is enabled.
- No TCP-level SO_KEEPALIVE: NAT and stateful firewall conntrack entries could expire silently between SSH keepalive packets. `connect_with_config` now builds the `TcpStream` manually, applies `socket2::TcpKeepalive` derived from the SSH keepalive config, and hands it to `russh::client::connect_stream`.
- Exec-mode config dropped: `SshConnectionConfig` was threaded through the executor but the `ConnectionConfig` field was `#[allow(dead_code)]`, so user-configured `server_alive_interval` never reached `Client::connect_with_ssh_config`. The field is now wired end-to-end through `connect_direct` / `connect_via_jump_hosts` / the jump chain.

Changes
- `src/ssh/tokio_client/connection.rs`: override `inactivity_timeout`, add `to_tcp_keepalive()`, rewrite `connect_with_config` around `connect_stream` with SO_KEEPALIVE.
- `src/ssh/client/{config,connection,command,file_transfer}.rs`: thread `ssh_connection_config` through the exec path.
- `src/executor/{connection_manager,parallel}.rs`: forward `SshConnectionConfig` into `ConnectionConfig`; remove `dead_code`.
- `Cargo.toml`: add `socket2 = "0.6"` (already transitive via tokio).

Test plan
- `cargo build` / `cargo clippy --all-targets` clean
- `cargo test --lib`: 1193 passing (3 pre-existing keychain tests fail due to macOS keychain auth prompt, unrelated)
- `server_alive_interval` in `~/.config/bssh/config.yaml` now affects exec mode