fix: Prevent SSH idle disconnects via proper keepalive wiring #175

Merged: inureyes merged 1 commit into main from fix/ssh-idle-disconnect on Apr 13, 2026

@inureyes
Member

Summary

Idle SSH sessions disconnected inconsistently — sometimes after a few minutes, sometimes after ~10 minutes. Fixes three underlying issues:

  1. russh inactivity_timeout ceiling: to_russh_config() inherited russh's 10-minute default via ..Default::default(), imposing a hard upper bound regardless of SSH keepalive. Now set to None when keepalive is enabled.
  2. No TCP SO_KEEPALIVE: the underlying TCP socket had no kernel-level keepalive, so NAT/firewall conntrack entries could silently expire between SSH keepalive packets. connect_with_config now builds the TcpStream manually, applies socket2::TcpKeepalive derived from the SSH keepalive config, and hands it to russh::client::connect_stream.
  3. Exec-mode config dropped: SshConnectionConfig was threaded through the executor but the ConnectionConfig field was #[allow(dead_code)], so user-configured server_alive_interval never reached Client::connect_with_ssh_config. The field is now wired end-to-end through connect_direct / connect_via_jump_hosts / the jump chain.
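
The first two fixes can be sketched as a pair of small derivations from the SSH keepalive interval. This is a minimal, std-only model and not the code from this PR: `KeepaliveSettings`, `TcpKeepaliveParams`, and both function names are hypothetical, the fallback to the 10-minute default when keepalive is off is an assumption, and reusing the SSH interval for both the TCP idle time and the probe interval is an assumed mapping. In the real change the resulting values would feed russh's inactivity_timeout and a socket2::TcpKeepalive.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the SSH keepalive settings (not the real bssh type).
#[derive(Debug, Clone, Copy)]
struct KeepaliveSettings {
    /// Time between SSH keepalive probes; `None` (or zero) means keepalive is off.
    interval: Option<Duration>,
}

/// Hypothetical plain-data view of the values a TCP keepalive would be built from.
#[derive(Debug, PartialEq, Eq)]
struct TcpKeepaliveParams {
    /// Idle time before the kernel sends its first probe.
    idle: Duration,
    /// Time between subsequent kernel probes.
    probe_interval: Duration,
}

/// The 10-minute inactivity default described in the PR summary.
const DEFAULT_INACTIVITY: Duration = Duration::from_secs(600);

/// With keepalive on, no inactivity ceiling is applied, so keepalive alone
/// decides when a peer is dead. With keepalive off, this sketch assumes the
/// 10-minute default stays in place.
fn inactivity_timeout(cfg: &KeepaliveSettings) -> Option<Duration> {
    match cfg.interval {
        Some(d) if !d.is_zero() => None,
        _ => Some(DEFAULT_INACTIVITY),
    }
}

/// Derives TCP keepalive values from the SSH interval (assumed mapping:
/// both the idle time and the probe interval reuse the SSH interval).
fn tcp_keepalive(cfg: &KeepaliveSettings) -> Option<TcpKeepaliveParams> {
    let interval = cfg.interval.filter(|d| !d.is_zero())?;
    Some(TcpKeepaliveParams {
        idle: interval,
        probe_interval: interval,
    })
}
```

Returning `None` from `tcp_keepalive` when keepalive is disabled keeps the socket at the OS default instead of forcing kernel probes the user did not ask for; whether the actual to_tcp_keepalive() behaves this way is not stated in this PR.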

Changes

  • src/ssh/tokio_client/connection.rs: override inactivity_timeout, add to_tcp_keepalive(), rewrite connect_with_config around connect_stream with SO_KEEPALIVE.
  • src/ssh/client/{config,connection,command,file_transfer}.rs: thread ssh_connection_config through the exec path.
  • src/executor/{connection_manager,parallel}.rs: forward SshConnectionConfig into ConnectionConfig; remove dead_code.
  • Cargo.toml: add socket2 = "0.6" (already transitive via tokio).

Test plan

  • cargo build / cargo clippy --all-targets clean
  • cargo test --lib — 1193 passing (3 pre-existing keychain tests fail due to macOS keychain auth prompt, unrelated)
  • Manual: leave interactive session idle >10 min on a NAT'd path
  • Manual: verify server_alive_interval in ~/.config/bssh/config.yaml now affects exec mode

Idle SSH sessions were disconnecting inconsistently - sometimes after a
few minutes, sometimes after ~10 minutes. Three underlying issues:

1. russh's default Config::inactivity_timeout is 10 minutes and was
   inherited verbatim by to_russh_config(), imposing a hard ceiling on
   every session regardless of keepalive liveness. Now set to None when
   keepalive is enabled, so the keepalive mechanism alone decides when
   a peer is dead.

2. No TCP-level SO_KEEPALIVE was set on the underlying socket, so NAT
   and stateful firewall conntrack entries could expire silently
   between SSH keepalive packets. connect_with_config now builds the
   TcpStream manually, applies socket2::TcpKeepalive derived from the
   SSH keepalive config, and hands it to russh::client::connect_stream.

3. The exec-mode path threaded an SshConnectionConfig through the
   executor but dropped it at ConnectionConfig (the field was marked
   dead_code), so user-configured server_alive_interval never reached
   Client::connect_with_ssh_config. The field is now live and flows
   through connect_direct / connect_via_jump_hosts / the jump chain.
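
The third issue is easiest to see as a before/after of what the connect call actually receives. The sketch below is a simplified model under stated assumptions: the struct shapes are reduced to a single field, and `effective_config_before_fix` / `effective_config_after_fix` are hypothetical helpers, not functions from the codebase.

```rust
/// Hypothetical, reduced shape of the user-facing SSH settings.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
struct SshConnectionConfig {
    /// Seconds between keepalive probes, as set by `server_alive_interval`.
    server_alive_interval: Option<u64>,
}

/// Hypothetical, reduced shape of the per-connection config built by the executor.
#[derive(Debug, Clone)]
struct ConnectionConfig {
    ssh_connection_config: SshConnectionConfig,
}

/// Modelled "before": the field is carried along but never read,
/// so the connect call only ever sees default settings.
fn effective_config_before_fix(_cfg: &ConnectionConfig) -> SshConnectionConfig {
    SshConnectionConfig::default()
}

/// Modelled "after": the user's settings are forwarded to the connect call.
fn effective_config_after_fix(cfg: &ConnectionConfig) -> SshConnectionConfig {
    cfg.ssh_connection_config.clone()
}
```

With `server_alive_interval` set to 15, the "before" helper yields `None` and the "after" helper yields `Some(15)`, which mirrors the symptom: the setting was accepted but had no effect in exec mode.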

Adds socket2 as a direct dependency (already transitive via tokio).
@inureyes added the type:bug and priority:high labels on Apr 13, 2026
@inureyes merged commit 291bafb into main on Apr 13, 2026
2 checks passed
@inureyes deleted the fix/ssh-idle-disconnect branch on April 13, 2026 05:15