ProxyCommand objects need the socket-like-obj _closed fix too #789

Closed
bitprophet opened this Issue Jul 30, 2016 · 9 comments

@bitprophet
Member

See #774 (comment); also commutative to #520.

@bitprophet
Member

Interestingly, ProxyCommand doesn't appear to expose any 'closed' attributes whatsoever - no closed, no _closed, etc.

It only implements ClosingContextManager which calls self.close, but no state is ever tracked for whether it's closed or not.

Since it wraps a subprocess.Popen I think the "most correct" analogue to "is the socket closed" would be if process.returncode is not None?
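A minimal sketch of that idea, assuming a hypothetical wrapper class (`ProcessSocketSketch` is illustrative only and is not Paramiko's `ProxyCommand`). It exposes `closed` and `_closed` derived from the child's `returncode`. One caveat worth noting: `Popen.returncode` stays `None` until something like `poll()` or `wait()` has observed the exit, so the sketch waits on the child in `close()`.

```python
import subprocess
import sys


class ProcessSocketSketch:
    """Hypothetical socket-like wrapper around a child process."""

    def __init__(self, argv):
        self.process = subprocess.Popen(
            argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE
        )

    @property
    def closed(self):
        # "Closed" here means the child's exit status has been collected.
        return self.process.returncode is not None

    @property
    def _closed(self):
        # Alias for callers that look for the private socket-style name.
        return self.closed

    def close(self):
        self.process.kill()
        # wait() blocks until the child exits and sets returncode.
        self.process.wait()
        self.process.stdin.close()
        self.process.stdout.close()


if __name__ == "__main__":
    # Stand-in child process; a real proxy command would go here.
    wrapper = ProcessSocketSketch(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    print(wrapper.closed)
    wrapper.close()
    print(wrapper.closed)
```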

@bitprophet bitprophet added a commit that referenced this issue Jul 30, 2016
@bitprophet bitprophet Untested fix re #789 228ed87
@bitprophet
Member

@nvgoldin If possible, please try cherry-picking or just applying 228ed87 to your local Paramiko. It's pretty basic so I think it'll work, but I haven't actually tested it myself yet. Hope to later.

@nvgoldin

The exception is gone, thanks!!
But there is another problem, which I'm not sure is directly related to this (tell me if I should open another issue). It seems the proxy command process isn't killed properly and leaves zombie processes behind. Running the following for several loops:

import logging
import os
import time

import paramiko

ssh_host = 'localhost'
# Tunnel through a local ssh process; -W forwards stdin/stdout to localhost:22.
proxy_cmd = 'ssh -o StrictHostKeyChecking=no -W localhost:22 localhost'
logging.basicConfig(level=logging.DEBUG)

while True:
    logging.debug('running PID %s', os.getpid())
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh_proxy = paramiko.ProxyCommand(proxy_cmd)
    ssh.connect(sock=ssh_proxy, hostname=ssh_host)
    sftp = ssh.open_sftp()
    # Close in order: SFTP session, SSH client, then the proxy process.
    sftp.close()
    ssh.close()
    ssh_proxy.close()
    logging.debug('going to sleep')
    time.sleep(3)

Results in (20125 is the program PID):

>ps  xao pid,ppid,pgid,sid,comm | grep 20125

20126 20125 20125 15530 ssh <defunct>
20169 20125 20125 15530 ssh <defunct>
20228 20125 20125 15530 ssh <defunct>
20271 20125 20125 15530 ssh <defunct>
20330 20125 20125 15530 ssh <defunct>
20373 20125 20125 15530 ssh <defunct>
20416 20125 20125 15530 ssh <defunct>
20459 20125 20125 15530 ssh <defunct>
20502 20125 20125 15530 ssh <defunct>
20545 20125 20125 15530 ssh <defunct>
20588 20125 20125 15530 ssh <defunct>

The logs after authentication look like this:

DEBUG:paramiko.transport:[chan 0] Max packet in: 32768 bytes
DEBUG:paramiko.transport:[chan 0] Max packet out: 32768 bytes
DEBUG:paramiko.transport:Secsh channel 0 opened.
DEBUG:paramiko.transport:[chan 0] Sesch channel 0 request ok
INFO:paramiko.transport.sftp:[chan 0] Opened sftp connection (server version 3)
INFO:paramiko.transport.sftp:[chan 0] sftp session closed.
DEBUG:paramiko.transport:[chan 0] EOF sent (0)
DEBUG:root:going to sleep
DEBUG:paramiko.transport:EOF in transport thread

I tried changing the 'close' method of ProxyCommand from os.kill to Popen's kill, with no success (from the docs, that looks like the recommended method, though it's not related to this).

Maybe I should use a different order of closing (i.e. sftp/ssh/proxy)?

@nvgoldin

Update: changing proxy.py close method to:

    def close(self):
        self.process.kill()
        self.process.poll()

Resolves the issue (no zombie process leftovers), though I'm not sure whether this has any side effects.
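For illustration, here is a hedged sketch of the kill-then-reap pattern on a plain `subprocess.Popen`, outside of Paramiko. The helper name `kill_and_reap` and its `timeout` parameter are assumptions made for this example. It uses `wait()` rather than `poll()` because `poll()` does not block: called immediately after `kill()`, it can return `None` if the child has not finished exiting yet, in which case the exit status is not collected at that moment.

```python
import subprocess
import sys


def kill_and_reap(process, timeout=5.0):
    """Kill a child process and collect its exit status.

    Returns the exit status, or None if the child did not exit
    within `timeout` seconds.
    """
    process.kill()
    try:
        # wait() blocks until the exit status is available, which is
        # what removes the <defunct> entry from the process table.
        return process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # Stand-in child process; a real proxy command would go here.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    status = kill_and_reap(child)
    print("exit status:", status)
```

Whether blocking in `close()` is acceptable depends on the caller; the timeout is there so a child that refuses to die does not hang the caller indefinitely.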

@bitprophet bitprophet added this to the 1.16.4 et al milestone Jul 31, 2016
@nvgoldin
nvgoldin commented Aug 18, 2016

Ping. I wonder if we can get this going. I've been testing it for the past few weeks and the above fix seems to be working (no zombie processes).
Want me to create a new PR (based on 228ed87, adding process.poll() to update the exit status)?

@bitprophet
Member

Thanks for #811, I'll try verifying it on my end when I get to the next bugfix release. (May need to bump it to a feature since we're manipulating the public API, but either way it'll get looked at.)

@ktbyers ktbyers referenced this issue in ktbyers/netmiko Nov 15, 2016
Open

netmiko : errors when using the ssh proxy #313

@bitprophet
Member
bitprophet commented Dec 6, 2016

Starting to wonder if I should investigate using Invoke's Runner for this stuff, sigh (as it, too, has to handle all sorts of subprocess shutdowns and suchlike). Not worth it in the short term though, it's not explicitly designed for wholly-noninteractive byte-forwarding (even though that SHOULD work fine and I very much want it to if it does not).

Poking #811 now...

@bitprophet
Member

I can:

  • confirm the before/after re: my old branch about this that adds closed to ProxyCommand
  • NOT confirm any zombie processes as stated, though my methodology is a bit different (using unpublished fabric 2 code, and using netcat as the gateway instead of ssh).
  • Both "normal" (no explicit close) and closer to your style of execution (close on the connection obj in each loop iteration) work fine, no zombies.
  • FWIW I'm doing this on OSX 10.11.

I also double-checked, and the implementation of ProxyCommand.close has been this way since 2012, so it's unlikely to be badly incorrect or I'd have expected tickets about zombies before now. Makes me wonder if some extra factor is at work in your case?

Either way I don't think it should block the basic attribute-error-fixing commit from merging so I'm gonna do that and we can spin this discussion into a new ticket if it's still affecting you. Let me know. Thanks!

@bitprophet bitprophet closed this Dec 6, 2016
@bitprophet bitprophet added a commit that referenced this issue Dec 6, 2016
@bitprophet bitprophet Changelog re #789 9c3e555
@nvgoldin

@bitprophet - thanks for looking into this. I'll open a new issue if this happens again.
