Fix zombie SSH processes with --sftp-ssh#8930
Conversation
… only once The issue was that cmd.Wait() was being called multiple times on the same process - once in the background goroutine and once in Close(). This could lead to zombie processes because only the first call to Wait() properly reaps the process. The fix uses sync.Once to ensure Wait() is only called once per SSH process, storing and returning the result on subsequent calls. Added tests to verify the fix works correctly. Co-authored-by: ncw <536803+ncw@users.noreply.github.com>
Remove unnecessary exited() check - we should always call cmd.Wait() in the sync.Once block to properly reap the process and capture its exit status. The sync.Once ensures it's only called once. Co-authored-by: ncw <536803+ncw@users.noreply.github.com>
ncw
left a comment
There was a problem hiding this comment.
I think your analysis is wrong. The ssh zombies are caused by Wait not being called at all, rather than multiple times.
I tested your fix and unfortunately it does not fix the original problem.
However I think the implementation idea is good as are the tests, I just think there needs to be more to it.
There seems to be a problem with session in the sshClientExternal structure - it is never assigned to which means that sshClientExternal.Wait() will never actually call Wait. Maybe this is the problem. Please investigate further.
| } | ||
| return err | ||
| // Use sync.Once to ensure we only wait for the process once | ||
| // This prevents zombie processes that occur when Wait() is called multiple times |
There was a problem hiding this comment.
This comment is wrong - zombies do not ocurr when Wait() is called multiple times.
There was a problem hiding this comment.
Corrected in 46bc876 - the comment now accurately reflects that sync.Once provides thread-safety, not zombie prevention.
|
@copilot did you see my review? |
…ernal The real issue was that sshClientExternal.session was never assigned, so Wait() always returned nil without waiting for the SSH process to exit. This caused zombie processes because the process was never reaped. The fix: - Store the first session created in NewSession() to s.session - This allows Wait() to actually wait for the SSH process - The sync.Once pattern is still useful for thread-safety - Updated comments to reflect the correct behavior Fixes the zombie process issue reported in #8929 Co-authored-by: ncw <536803+ncw@users.noreply.github.com>
You're absolutely right! The real issue was that
Yes, addressed in 46bc876 - the session is now properly stored so Wait() actually waits for the SSH process. |
ncw
left a comment
There was a problem hiding this comment.
This is looking good now.
I will merge now - thank you.
When using external SSH binaries via
--sftp-ssh, zombie processes accumulate becausesshClientExternal.Wait()is never actually called. The root cause was thatsshClientExternal.sessionwas never assigned inNewSession(), soWait()always returnednilimmediately without waiting for the SSH process to exit.Changes
NewSession()to assign the created session tos.sessionsoWait()can actually wait for the SSH processsync.Once: Addedsync.OnceinsshSessionExternal.Wait()to ensurecmd.Wait()executes exactly once (thread-safety)waitErrfield to store and return the wait result on subsequent callsWait()calls don't cause process leaksFixes #8929
Original prompt
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.