step ssh proxycommand hangs ~60s when server closes connection before stdin closes #1641

@techwolf359

Description

step ssh proxycommand hangs for approximately 60 seconds when the SSH server closes the connection before the client has closed stdin. This occurs when the server rejects the connection mid-session (e.g. a PAM account check failure after certificate authentication succeeds), sending an SSH2_MSG_USERAUTH_BANNER followed by a disconnect.

The process eventually exits due to an OS-level timeout, but the hang makes it appear to the user that the connection is still in progress.

Suspected Cause

The deadlock is in proxyDirect / proxyDirectWithIO in command/ssh/proxycommand.go:

var wg sync.WaitGroup
wg.Add(1)
go func() {
    io.Copy(conn, os.Stdin)   // goroutine 1: blocks reading stdin
    conn.CloseWrite()
    wg.Done()
}()
wg.Add(1)
go func() {
    io.Copy(os.Stdout, conn)  // goroutine 2: exits when server closes
    conn.CloseRead()
    wg.Done()
}()
wg.Wait()                     // waits for both — never returns

When the server closes the TCP connection:

  1. Goroutine 2 exits and calls conn.CloseRead()
  2. Goroutine 1 is blocked reading from os.Stdin
  3. os.Stdin is a pipe from the SSH client process, which hasn't closed because it's waiting for the ProxyCommand to exit
  4. The ProxyCommand is waiting for both goroutines — deadlock

Calling os.Stdin.Close() from goroutine 2 does not reliably interrupt a blocked read() syscall on macOS when stdin is a pipe.

Reproduction

// Start a TCP server that sends data and immediately closes
ln, _ := net.Listen("tcp", "127.0.0.1:0")
go func() {
    conn, _ := ln.Accept()
    conn.Write([]byte("hello"))
    conn.Close()
}()

// Port assigned to the listener (assumes proxyDirectWithIO takes host and port as strings)
_, port, _ := net.SplitHostPort(ln.Addr().String())

// Simulate a stdin that never closes (SSH client waiting for ProxyCommand)
stdinR, _ := io.Pipe()  // write end intentionally left open

// This hangs indefinitely
proxyDirectWithIO("127.0.0.1", port, stdinR, io.Discard)
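
For anyone without the step source checked out, the hang can probably be approximated in a standalone program. This is only a sketch: proxyWaitBoth and waitBothTimesOut are hypothetical names that mirror the wait-for-both pattern quoted under Suspected Cause, not the actual step CLI functions, and the io.Pipe stands in for the stdin pipe that ssh keeps open.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"sync"
	"time"
)

// proxyWaitBoth is a hypothetical stand-in for the wait-for-both pattern
// quoted in this issue; it is not the actual step CLI function.
func proxyWaitBoth(conn *net.TCPConn, in io.Reader, out io.Writer) {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		io.Copy(conn, in) // blocks for as long as `in` stays open
		conn.CloseWrite()
	}()
	go func() {
		defer wg.Done()
		io.Copy(out, conn) // returns once the server closes the connection
		conn.CloseRead()
	}()
	wg.Wait()
}

// waitBothTimesOut reports whether proxyWaitBoth is still blocked after
// timeoutMs milliseconds against a server that writes and then closes.
func waitBothTimesOut(timeoutMs int) bool {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		c.Write([]byte("hello"))
		c.Close()
	}()

	conn, err := net.DialTCP("tcp", nil, ln.Addr().(*net.TCPAddr))
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// The write end stays open until this function returns, so reads never
	// see EOF, like an ssh client that is waiting for ProxyCommand to exit.
	stdinR, stdinW := io.Pipe()
	defer stdinW.Close()

	returned := make(chan struct{})
	go func() {
		proxyWaitBoth(conn, stdinR, io.Discard)
		close(returned)
	}()

	select {
	case <-returned:
		return false
	case <-time.After(time.Duration(timeoutMs) * time.Millisecond):
		return true
	}
}

func main() {
	fmt.Println("still blocked after 500ms:", waitBothTimesOut(500))
}
```

The timeout is there only so the demo terminates; the deferred stdinW.Close() releases the blocked copy once the check is done.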

Fix

Return as soon as either goroutine completes. When the server closes the connection, the function returns, the process exits, and the goroutine still blocked on stdin is torn down with it. This is safe: the ProxyCommand's only job is to proxy bytes, so once one side closes there is nothing more to do.

done := make(chan struct{}, 2)

go func() {
    io.Copy(conn, in)
    conn.CloseWrite()
    done <- struct{}{}
}()

go func() {
    io.Copy(out, conn)
    conn.CloseRead()
    done <- struct{}{}
}()

<-done
return nil
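
The same idea as a standalone sketch, under the same assumptions as the snippet above: proxyFirstDone and firstDoneReturns are hypothetical names, not the step CLI code, and the pipe whose write end stays open plays the role of ssh's stdin. It should return shortly after the server closes, even though the stdin side never reaches EOF.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// proxyFirstDone is a hypothetical helper illustrating the proposed fix:
// it returns as soon as either copy direction finishes.
func proxyFirstDone(conn *net.TCPConn, in io.Reader, out io.Writer) {
	// Capacity 2 lets the slower goroutine send later without a receiver,
	// so it never blocks on the channel after this function has returned.
	done := make(chan struct{}, 2)
	go func() {
		io.Copy(conn, in)
		conn.CloseWrite()
		done <- struct{}{}
	}()
	go func() {
		io.Copy(out, conn)
		conn.CloseRead()
		done <- struct{}{}
	}()
	<-done // first direction to finish wins
}

// firstDoneReturns reports whether proxyFirstDone returned within timeoutMs
// milliseconds when the server closes first and stdin never closes.
func firstDoneReturns(timeoutMs int) bool {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		c.Write([]byte("hello"))
		c.Close()
	}()

	conn, err := net.DialTCP("tcp", nil, ln.Addr().(*net.TCPAddr))
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Write end held open for the duration of the check: stdin never hits EOF.
	stdinR, stdinW := io.Pipe()
	defer stdinW.Close()

	returned := make(chan struct{})
	go func() {
		proxyFirstDone(conn, stdinR, io.Discard)
		close(returned)
	}()

	select {
	case <-returned:
		return true
	case <-time.After(time.Duration(timeoutMs) * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("returned before timeout:", firstDoneReturns(2000))
}
```

In a long-lived process the goroutine still reading stdin would linger until its reader is closed; for a ProxyCommand that exits right after returning, that should not matter.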

Test

A regression test is included in the linked PR that fails before the fix and passes after.

Environment

  • macOS arm64 (Apple Silicon)
  • step installed via Homebrew
  • OpenSSH 9.9
