threading bugfix for socket closure #156

Closed
wants to merge 2 commits into
from

Projects

None yet

3 participants

@tycho
Contributor
tycho commented Apr 11, 2013

Submitting a bugfix on behalf of a coworker on my team.

Make sure the Thread.run() method has terminated before closing the
socket. Currently, the socket is closed through Packetizer.close(),
which happens too early. Move the socket.close() into Transport.close()
and after the Thread.join() call.

While at it, modify the stop_thread() method and use it in
Transport.close() to avoid code duplication. Use join() with a timeout
to make it possible to terminate the main thread with KeyboardInterrupt.
Also, remove the now obsolete socket.close() from Transport.atfork().

This fixes a potential infinite loop if paramiko.SSHClient is connected
through a paramiko.Channel instead of a regular socket (tunneling).

Details:

Using a debug patch to dump the current stack of the thread every
couple of seconds while trying to close it, I've seen the following
over and over again:

Thread could not be stopped, still running.
Current traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 524, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File ".../paramiko/transport.py", line 1564, in run
    self._channel_handler_table[ptype](chan, m)
  File ".../paramiko/channel.py", line 1102, in _handle_close
    self.transport._send_user_message(m)
  File ".../paramiko/transport.py", line 1418, in _send_user_message
    self._send_message(data)
  File ".../paramiko/transport.py", line 1398, in _send_message
    self.packetizer.send_message(data)
  File ".../paramiko/packet.py", line 319, in send_message
    self.write_all(out)
  File ".../paramiko/packet.py", line 248, in write_all
    n = self.__socket.send(out)
  File ".../paramiko/channel.py", line 732, in send
    self.lock.release()

The thread was running Packetizer.write_all() in an endless loop:

while len(out) > 0:
    ...
    n = Channel.send(out) # n == 0 because channel got closed
    ...
    out = out[n:]         # essentially out = out

Signed-off-by: Frank Arnold <farnold@amazon.com>
Frank Arnold transport: Wait for thread termination before closing the socket
Make sure the Thread.run() method has terminated before closing the
socket. Currently, the socket is closed through Packetizer.close(),
which happens too early. Move the socket.close() into Transport.close()
and after the Thread.join() call.

While at it, modify the stop_thread() method and use it in
Transport.close() to avoid code duplication. Use join() with a timeout
to make it possible to terminate the main thread with KeyboardInterrupt.
Also, remove the now obsolete socket.close() from Transport.atfork().

This fixes a potential infinite loop if paramiko.SSHClient is connected
through a paramiko.Channel instead of a regular socket (tunneling).

Details:

Using a debug patch to dump the current stack of the thread every
couple of seconds while trying to close it, I've seen the following
over and over again:

Thread could not be stopped, still running.
Current traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 524, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File ".../paramiko/transport.py", line 1564, in run
    self._channel_handler_table[ptype](chan, m)
  File ".../paramiko/channel.py", line 1102, in _handle_close
    self.transport._send_user_message(m)
  File ".../paramiko/transport.py", line 1418, in _send_user_message
    self._send_message(data)
  File ".../paramiko/transport.py", line 1398, in _send_message
    self.packetizer.send_message(data)
  File ".../paramiko/packet.py", line 319, in send_message
    self.write_all(out)
  File ".../paramiko/packet.py", line 248, in write_all
    n = self.__socket.send(out)
  File ".../paramiko/channel.py", line 732, in send
    self.lock.release()

The thread was running Packetizer.write_all() in an endless loop:

while len(out) > 0:
    ...
    n = Channel.send(out) # n == 0 because channel got closed
    ...
    out = out[n:]         # essentially out = out

Signed-off-by: Frank Arnold <farnold@amazon.com>
068bf63
@tycho

There seems to be a documentation bug for Python. The "is_alive" here needs to be changed to "isAlive" for Python 2.5 compatibility.

Steven Noonan un-break Python 2.5 compatibility by using isAlive() instead of is_al…
…ive()

Python's documentation has a bug[1], in that it doesn't correctly annotate
is_alive as being a function introduced in Python 2.6.

[1] http://bugs.python.org/issue15126

Signed-off-by: Steven Noonan <snoonan@amazon.com>
5c124cb
@bitprophet
Member

Thanks to you & Frank for this! I acknowledge the bug & fix, though I'd love it if you could provide more details on how to reproduce. The the tunnel-friendly sock option is recent, but I don't recall seeing this issue when I was testing out the new code, nor have I run into it since (though I personally do not use the feature in my daily work).

Also, minor, but please add a changelog entry in NEWS under the 1.11 section (or whichever is latest when you get to it) & crediting whoever wants credit (it can be both of course).

@frank-arnold

With the following script and paramiko.git/master I'm able to reproduce the
issue once in a while. It's a race condition though, so it doesn't trigger every
time.

The underlying problem is, that the send() method of regular socket objects
will raise an exception. This exception cancels the loop in Packetizer.write_all().
The send() method of paramiko.Channel objects just returns 0 if the channel
got closed. This can lead to the race condition resulting in an infinite loop.

#!/usr/bin/env python
import sys
from optparse import OptionParser

sys.path.insert(0, "./paramiko")
import paramiko

def get_option_parser():
    parser = OptionParser(
        usage="%prog --target <target ip> --gateway "
            "<gateway ip> --user <username> --keyfile <key file> <command>",
        description="Reproducer for possible Paramiko infinite loop while "
            "using tunneling (Paramiko pull request #156). Execute a command "
            "on a target host via a tunneled SSH connection through gateway.")
    parser.add_option("--target", action="store", dest="target", default=None,
        help="Name or IP of the target host")
    parser.add_option("--gateway", action="store", dest="gateway", default=None,
        help="Name or IP of the gateway")
    parser.add_option("--user", action="store", dest="username", default=None,
        help="The username to be used for the target host and gateway")
    parser.add_option("--keyfile", action="store", dest="keyfile", default=None,
        help="The keyfile to be used for the target host and gateway")

    return parser

def main():
    parser = get_option_parser()
    (options, args) = parser.parse_args()

    command = " ".join(args)

    failed = False
    if not command:
        sys.stderr.write("Need command to execute on the target.\n")
        failed = True
    if not options.target:
        sys.stderr.write("No target IP given.\n")
        failed = True
    if not options.gateway:
        sys.stderr.write("No gateway IP given.\n")
        failed = True
    if not options.username:
        sys.stderr.write("No username given.\n")
        failed = True
    if not options.keyfile:
        sys.stderr.write("No keyfile given.\n")
        failed = True
    if failed:
        parser.print_usage()
        sys.exit(2)

    # Gateway setup
    gateway_client = paramiko.SSHClient()
    gateway_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    gateway_client.connect(
        options.gateway,
        username=options.username,
        key_filename=options.keyfile,
        allow_agent=True,
        look_for_keys=False
    )
    transport = gateway_client.get_transport()
    transport.set_keepalive(30)
    channel = transport.open_channel(
        "direct-tcpip",
        (options.target, 22),
        ("dummy", 31337)
    )

    # Target setup
    target_client = paramiko.SSHClient()
    target_client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    target_client.connect(
        options.target,
        username=options.username,
        key_filename=options.keyfile,
        allow_agent=True,
        look_for_keys=False,
        sock=channel
    )

    # Execute command on target
    stdout = target_client.exec_command(command)[1]
    output = "".join(stdout.readlines())

    # Close target and gateway
    target_client.close()
    gateway_client.close()

    # Print stdout from target
    sys.stdout.write(output)

if __name__ == "__main__":
    main()
@bitprophet
Member

@frank-arnold Thanks - I can replicate with that (happened on my 2nd try, even.) Using this snippet I can reliably get it to appear & tell me how many runs it took:

$ while true; do gtimeout 5 python test.py --target <host> --gateway <otherhost> --user <name> --keyfile ~
/.ssh/id_rsa uname || break; done | wc -l

Prints out 20 or 21 usually, so getting it on my 2nd try was some luck.

@bitprophet
Member

Sweet deal, when I switch to your branch that script ran for literally an hour without timing out. I'd say that indicates a fix! Merged into the 1.11 branch (figure that's the best target since that's what you would've based off of) and will publish tonight.

@bitprophet bitprophet added a commit that referenced this pull request Sep 28, 2013
@bitprophet bitprophet Changelog re #156, closes #156 e185178
@bitprophet bitprophet added a commit that closed this pull request Sep 28, 2013
@bitprophet bitprophet Changelog re #156, closes #156 e185178
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment