One 'bad' Transport thread hangs indefinitely at shutdown when multiple Transports are active #520

Open
colinmcintosh opened this Issue Apr 30, 2015 · 50 comments

@colinmcintosh
colinmcintosh commented Apr 30, 2015 edited

[MAINTAINER NOTE: a variant of this issue under Python 3 has been worked around as per this comment but it's presumably still at large for Python 2 in some scenarios.]

When running SSH connections to multiple devices, the script will sometimes hang indefinitely once sys.exit(0) is reached. It doesn't happen every time, but the more devices the script runs against, the more likely it is to happen. It seems to be related to the amount of time the script takes to run.

The last log message paramiko outputs is DEBUG:EOF in transport thread

Using the faulthandler module I dumped a stacktrace for when it hangs:

Fatal Python error: Aborted

Thread 0x00007f47faff5700 (most recent call first):
  File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 204 in read_all
  File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 341 in read_message
  File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1590 in run
  File "/usr/local/lib/python2.7/threading.py", line 810 in __bootstrap_inner
  File "/usr/local/lib/python2.7/threading.py", line 783 in __bootstrap

Thread 0x00007f47fbff7700 (most recent call first):
  File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 204 in read_all
  File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 341 in read_message
  File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1590 in run
  File "/usr/local/lib/python2.7/threading.py", line 810 in __bootstrap_inner
  File "/usr/local/lib/python2.7/threading.py", line 783 in __bootstrap

Thread 0x00007f47fcff9700 (most recent call first):
  File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 204 in read_all
  File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 341 in read_message
  File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1590 in run
  File "/usr/local/lib/python2.7/threading.py", line 810 in __bootstrap_inner
  File "/usr/local/lib/python2.7/threading.py", line 783 in __bootstrap

Thread 0x00007f47fdffb700 (most recent call first):
  File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 204 in read_all
  File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 341 in read_message
  File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1590 in run
  File "/usr/local/lib/python2.7/threading.py", line 810 in __bootstrap_inner
  File "/usr/local/lib/python2.7/threading.py", line 783 in __bootstrap

Thread 0x00007f47feffd700 (most recent call first):
  File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 204 in read_all
  File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 341 in read_message
  File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1590 in run
  File "/usr/local/lib/python2.7/threading.py", line 810 in __bootstrap_inner
  File "/usr/local/lib/python2.7/threading.py", line 783 in __bootstrap

Thread 0x00007f480590f700 (most recent call first):
  File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 204 in read_all
  File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 341 in read_message
  File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1590 in run
  File "/usr/local/lib/python2.7/threading.py", line 810 in __bootstrap_inner
  File "/usr/local/lib/python2.7/threading.py", line 783 in __bootstrap

Thread 0x00007f48050ce700 (most recent call first):
  File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 204 in read_all
  File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 341 in read_message
  File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1590 in run
  File "/usr/local/lib/python2.7/threading.py", line 810 in __bootstrap_inner
  File "/usr/local/lib/python2.7/threading.py", line 783 in __bootstrap

Thread 0x00007f4807368700 (most recent call first):
  File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 204 in read_all
  File "build/bdist.linux-x86_64/egg/paramiko/packet.py", line 341 in read_message
  File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1590 in run
  File "/usr/local/lib/python2.7/threading.py", line 810 in __bootstrap_inner
  File "/usr/local/lib/python2.7/threading.py", line 783 in __bootstrap

Current thread 0x00007f480f7c4700 (most recent call first):
  File "/usr/local/lib/python2.7/threading.py", line 358 in wait
  File "/usr/local/lib/python2.7/threading.py", line 960 in join
  File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 1419 in stop_thread
  File "build/bdist.linux-x86_64/egg/paramiko/transport.py", line 558 in close
  File "build/bdist.linux-x86_64/egg/paramiko/resource.py", line 60 in callback
Aborted

It looks like all of the resources except the last one close correctly, but the last one doesn't. The same thing happens if .close() is explicitly called on the Transport object: it will sometimes hang at that .close() call indefinitely.

There is not one specific device it happens for, either. I have tried many hosts and OSes with no specific one standing out as a problem.
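
For reference, a minimal sketch of the kind of script that triggers this (hostnames, credentials, and the command are placeholders rather than the actual script):

    import sys
    import faulthandler

    import paramiko

    faulthandler.enable()  # dumps Python tracebacks on fatal signals; the dump above came from aborting the hung process

    # Placeholder device list -- the real script runs against many more hosts.
    hosts = ["device1.example.com", "device2.example.com", "device3.example.com"]

    clients = []
    for host in hosts:
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        client.connect(host, username="user", password="secret")
        stdin, stdout, stderr = client.exec_command("show version")
        print(stdout.read())
        clients.append(client)

    # No explicit close() on the clients/transports; the hang shows up
    # intermittently once the interpreter starts shutting down.
    sys.exit(0)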

@colinmcintosh

After further investigation it looks like it's hanging on line 204 under read_all in packet.py, which is

x = self.__socket.recv(n)

It looks like it never makes it past this line after the close() method on the Transport object is called.

If I explicitly call

del transport_object

before the end of the script it properly closes the Transport object and cleans up as expected.
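
In other words, the workaround looks roughly like this (a sketch built around SSHClient for illustration; the real script's structure isn't shown here):

    import sys

    import paramiko

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect("device1.example.com", username="user", password="secret")  # placeholders
    transport_object = client.get_transport()

    # ... do work over the connection ...

    # Workaround: explicitly delete the Transport reference before the script
    # ends, instead of relying on cleanup at interpreter shutdown.
    del transport_object

    sys.exit(0)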

@rustyscottweber

I have this same issue and believe it has to do with calling the close method from a __del__ method. Have you tried not calling close on paramiko from a __del__ method and letting Python's garbage collection take care of the thread?

@colinmcintosh

I don't have any __del__ methods in my classes

@colinmcintosh

I've altered the library for debugging purposes by changing the following lines in transport.py@1420:

    def stop_thread(self):
        self.active = False
        self.packetizer.close()
        while self.is_alive() and (self is not threading.current_thread()):
            print("Trying to kill thread.")
            self.join(10)
            print("It's alive: {}".format(self.is_alive()))

The result of this is the script outputs the following once it reaches the sys.exit(0):

Trying to kill thread.
It's alive: True
Trying to kill thread.
It's alive: True
Trying to kill thread.
It's alive: True
Trying to kill thread.
It's alive: True
Trying to kill thread.
It's alive: True
Trying to kill thread.
It's alive: True
Trying to kill thread.
It's alive: True
Trying to kill thread.
It's alive: True
Trying to kill thread.
It's alive: True
Trying to kill thread.
It's alive: True
Trying to kill thread.
It's alive: True
Trying to kill thread.
It's alive: True
Trying to kill thread.
It's alive: True
Trying to kill thread.
It's alive: True
Trying to kill thread.
It's alive: True
[truncated, it goes on forever]

It seems the thread is deadlocked somehow and can't close. I'm not sure what the point of this loop in transport.py is, as nothing changes between iterations. Doing

while thread.is_alive():
    thread.join(10)

does the same thing, AFAIK, as

thread.join()

I could see value in the loop if there was a debug log there to let you know it's locked up or if there was a loop counter that eventually skips locked threads.

Ideally I would propose that it be changed to

    def stop_thread(self):
        self.active = False
        self.packetizer.close()
        if self is not threading.current_thread():
            self.join(10)
            if self.is_alive():
                raise Exception("Timed out while trying to kill thread")

This doesn't solve whatever is causing the overall locked up Thread issue but this is still a good solution to catch and kill any leftover threads.

@rustyscottweber

What version of paramiko are you using? I think that is a change that has already been made, more or less.

@colinmcintosh

What do you mean by more or less? I'm almost positive I'm on the master but if not it's 1.15.1. I took a look at the master and it looks like it will still hang the same way if a thread is hung.

@colinmcintosh

I checked, it's running on version 1.15.2

@Offlein
Offlein commented Jul 24, 2015

I'm experiencing what might be the same issue on Paramiko 1.15.2, although I don't think it's because the Transport object closed. At least, I've set a breakpoint in the Transport object's close() and it is not being hit. (So... maybe it's not?)

Still, though, I'm otherwise getting stuck in the x = self.__socket.recv(n) area as well. This line keeps triggering the socket.timeout exception, I guess, and hence the if got_timeout: section will run, looping back to the beginning of the "while" statement eternally.

Strangely, this script runs just fine when I test it using most servers. This only occurs when I'm connecting to the servers I definitely 100% need to connect to in my office. As well, I can connect to these servers properly from other SFTP software like FileZilla. It's something in conjunction with paramiko and this server...

@stas
stas commented Nov 13, 2015

Please try changing the paramiko.sftp_file.SFTPFile.MAX_REQUEST_SIZE value.
Might be just one of those servers.
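
For example (the value here is arbitrary; the point is just to make each request smaller):

    import paramiko.sftp_file

    # Smaller SFTP requests can help with servers that misbehave on large ones;
    # 16384 is just an example value.
    paramiko.sftp_file.SFTPFile.MAX_REQUEST_SIZE = 16384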

@dmitrytokarev

This is possibly related to #109.

@sanseihappa
Contributor
sanseihappa commented May 26, 2016 edited

#698 Looks like the same issue (Hang in packet.py Packetizer.read_all())

@sanseihappa
Contributor

@colinmcintosh thanks for your initial post using faulthandler, I'm duplicating this issue on a regular basis and dumping the stack traces to confirm I'm hitting the same thing using 2.0.0.

@botanicvelious mentions a workaround put in place further up the call stack in transport.py Transport.run(), I'll try this locally, but...

  • Is this workaround appropriate? (And should I generate a PR to get this in?)
  • Is there something else that should be done instead, that might be considered the actual fix?

Thanks!

@botanicvelious

I wouldn't use my work-around as it doesn't work in all cases, it just works for our specific use case.

@bitprophet bitprophet added the Hangs label Jun 11, 2016
@bitprophet
Member

This ticket accurately describes what I'm getting with fabric v2's integration suite, and is what made me generate fabric/fabric#1473.

Doesn't matter how many actual hosts are involved (it can be a bunch of connections to the same host), but the more independent Client objects generated, the more likely it is that one of them will hang.

Explicitly calling .close() (which percolates down into Transport.close) on all such objects reliably seems to avoid the problem. This is (probably) why client libs like Fabric have historically always had to explicitly close client/transport objects before interpreter shutdown.

Having that be a hard requirement of using that API, or Paramiko itself, feels super crummy to me, so I'd like this to get solved eventually. Going to poke at this somewhat this weekend myself to see if I can get to the root of it.

Difficulty is, threading is often fraught with peril: changes that seem to fix one issue can easily spawn others, and in code this old there are always fun landmines. But this has been an intermittent issue forever so I'd like to at least try fixing it.

@bitprophet
Member
  • Firstly, what is calling stop_thread? Either Transport.close() or the module-level transport._join_lingering_threads.
    • which itself is pretty eyebrow-raising and ancient - dates from 2003/4. Is called via atexit. commit is c565d66 - not very confidence inspiring.
    • In testing, which of those two avenues generates the stop_thread call for a given Transport thread seems pretty randomized.
    • However, the stuck thread always seems to end up there due to a call to Transport.close, never _join_lingering_threads.
  • The stuck thread last executes/gets stuck at self.packetizer.read_message() within run(), with the next activity about that thread being the stop_thread call.
  • The EOF in transport thread stuff seems like it might be orthogonal, as it frequently-to-always is logged by the other two Transports that aren't hanging, and not by the hanging thread.
  • This kinda jives with the core problem: the hung thread never receives any sort of "final" message from the network, not even anything that would make it except, which is why it hangs out forever on packetizer.read_message
    • The "welp, dunno how we got here, but let's just exit-via-excepting on join() timeout" approach taken by @colinmcintosh seems like an okay backstop in general - I assume there are multiple ways to end up in this situation. May add it regardless of whether I find out why these Transports end up in this state.
  • Digging deeper re: connection lifecycles, the stuck threads appear to do:
    • channel-data
    • channel-request (of type exit-status, so this would be the higher level recv_exit_status result most likely)
    • channel-eof
    • channel-close
  • That looks in line with normal behavior going by https://tools.ietf.org/html/rfc4254#section-5.3 and it is also the same order of operations seen in the non-hung threads.
  • So how are the other threads successfully exiting their run while the hung one is not? The answer seems to be the EOFErrors, actually; an exception raising is one of the few ways to exit the while loop, given that the threads tend to fall into blocking on read_message.
    • Which raises the question of why this code even bothers with while self.active, but meh.
    • Anyway, my suspicion is that the EOFErrors are generated by the call to self.packetizer.close() in stop_thread(), but I still need to verify this (& then, again, figure out what's different in the hang situation.) (If that is true, then this seems an awful roundabout way to cause the loop to exit...guessing not intended?)
  • The EOFError raised is always on line 272 of Packetizer.read_all, which is within if got_timeout and triggered by if self.__closed (again, yup, seems like...)
  • Instrumenting further, Packetizer.read_all is encountering socket.timeout when it sets got_timeout, so in a "normal" situation, the order of operations is:
    • Packetizer.close is called by Transport.stop_thread, which sets __closed on the packetizer.
    • The read_all loop encounters socket.timeout (presumably, constantly, anytime recv takes longer than the configured timeout - this is probably key) and sets got_timeout
    • Then hits line 272, sees that self.__closed is True, raises EOFError, breaks the transport out of its run loop, happy days.
  • The key difference is that the hanging thread hits socket.timeout same as the others, but when it gets to the self.__closed check, the answer is False, so it loops again - and then sits on the self.__socket.recv forever (see the simplified sketch at the end of this list).
    • So, first, this does seem to be a race condition - if we assume that all threads would end up blocking on that final recv, it comes down to whether self.__closed gets set fast enough or not.
    • Second, why is socket.timeout not being re-raised? I need to doublecheck exactly how it is supposed to work (and then whether there are bugs in it) but if the loop was a "proper" recv -> timeout -> recv -> timeout semi-hot loop, this problem would not exist, because the "hung" thread's packetizer would simply timeout another 0.1s later, see __closed set, and EOFError.
    • And as mentioned earlier, the fact that this is how the whole construct terminates, seems pretty ridiculous anyways - these errors are always ugly looking in the logs and frequently mislead people. (and at least in my testing, they always appear during normal operation - I don't know if this was always the case, but...)
  • The docs for the socket module certainly back up my memory re: how this is all supposed to behave. So I'm now digging around to see what could cause the timeout to not fire, e.g. the socket being closed elsewhere, or whatnot.
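
To make the race easier to follow, here's a heavily simplified stand-in for the recv loop described above (paraphrased for illustration, not the actual Packetizer code):

    import socket

    def read_all_sketch(sock, n, is_closed):
        # Simplified stand-in for Packetizer.read_all's inner loop. `sock` has a
        # short timeout set via sock.settimeout(0.1); `is_closed` is a callable
        # standing in for the packetizer's __closed flag.
        out = b""
        while n > 0:
            got_timeout = False
            try:
                x = sock.recv(n)
                if len(x) == 0:
                    raise EOFError()
            except socket.timeout:
                got_timeout = True
            if got_timeout:
                if is_closed():
                    # Packetizer.close() has run: raise EOFError so that
                    # Transport.run() falls out of its loop.
                    raise EOFError()
                # Race: if close() hasn't landed yet we simply loop and recv()
                # again. In the hung case that next recv() blocks forever
                # instead of timing out, so the closed check is never revisited.
                continue
            out += x
            n -= len(x)
        return out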

Also of note is that this is all exactly the same stuff #380 describes, though they never got resolution. The only extra wrinkle I see is the assertion that the close call comes from "a different thread"; I don't know that this is the case in my setup, but it does come back to "it only pops up when the closure occurs via Transport.close and not via _join_lingering_threads".

Still unclear if that's germane, tho as noted I need to doublecheck the different treatment of the various objects involved in the two scenarios. I'm 99% sure I've seen non-hanging threads also terminate via Transport.close() (especially if the top level client code does it explicitly.)

@bitprophet
Member

MOAR:

  • If I insert a sleep(1) in the "was not closed on timeout" part of the code, then check __closed after, it is indeed then closed, confirming the guess that this is part of the race condition in question.
  • A slight modification to the previous comment: sometimes the non-hung threads also fulfil part of the race and get the "isn't closed yet" status; however, they do receive the expected subsequent socket.timeout, and then on that second time through the timeout+closed test, they pass & EOFError.
  • So it still comes down to, why isn't this hung thread's socket raising the timeout?
  • Upon the first "post-hang" iteration of the stop_thread join loop, the bad thread's socket object is in a closed state, for whatever good that does us. (This could be another modification on the "backstop" idea, if somehow the straight up "got past 10 seconds without successful join()" condition were not stringent enough.)
  • This specific StackOverflow comment is the first mention I can find of socket.recv not always honoring socket.settimeout, due to a race condition in how Python implements the timeout.
    • The poster seems to walk this back, but I think they're walking back the specific explanation of the race condition, not the fact that it exists?
    • I read the relevant parts of https://github.com/python/cpython/blob/8707d182ea722da19ace7fe994f3785cb77a679d/Modules/socketmodule.c and it seems to jive with other explanations of how this works (presence of timeout sets the socket to nonblocking mode; select() used with the given timeout as the select interval; then actual [again, non-blocking] socket call executed; loop).
    • What I don't see there (but granted, I barely know C or network programming at this level) is where a race condition would pop up, because the socket should be in nonblocking mode, so sock_func() should return and then the loop should continue.
  • Both that thread and this one occasionally mention implementing the "nonblocking socket + select" approach at user-level, though none of them say this as a workaround for bugs in settimeout.

Took a different tack and looked at how exactly these bad threads are having .close() called on them; it's via ResourceManager (from resource.py).

  • If I comment out the bit in Client that does that manager registration -- no hangs!
    • But also no third EOFError or other normal shutdown; the last debug message out of the would-have-been-closed-via-ResourceManager thread is the "about to recv()" one it normally hangs on.
    • No obvious ill effects, but with this sort of thing, I'd still worry it would cause issues in some situations. (Whether they are worse than an infinite hang...arguably not?)
  • In some of the success/no-race-condition scenarios, the ResourceManager is still in play - i.e. it's not the sole cause (or symptom) of the issue.
  • The 'bad' thread is almost always the "middle" thread in my "execute 3 commands in sequence" test script. Not sure what that means exactly.
  • Checked timing to figure out what exactly is causing some of these to end up in the lingering-threads list and the 'bad' one to be resource-managered, in case that is a clue...
    • Bad thread: 1465705574.463436
    • Good thread 1: 1465705574.41101
    • Good thread 2: 1465705574.342657
    • So the bad thread's stop_thread is being called 0.12s after the later "good" thread.
    • In subsequent runs, the difference is smaller - 0.04s and 0.05s - but it is still always last by a decent margin. Not that this tells me much.

Other random notes:

  • Slapping a time.sleep at the end of my test script has the expected effect - all 3 threads' Packetizer.read_all are in a hot loop with the recv timeout working great, until the script truly exits, at which time we're at the mercy of the race condition again (though it seems a little less likely to pop up in this case.)
  • Triple checked that the timeout was the one I expected (0.1s) - it is, according to gettimeout. Including just prior to the "bad" recv.
  • Recreated the issue identically on Debian 8, so it's not an OS X wrinkle. (But, as on OS X, the issue only appears present on Python 3.)
  • Python 2 has the same "two threads get linger-killed, one gets resource-managed" behavior - it's just that the thread resource-managed doesn't encounter the race condition / incorrectly-blocking recv.
  • Crawled all over the Python bug tracker, nothing directly relevant seemed to pop up.
    • One bug (https://bugs.python.org/issue23863) did tip me off that Python 3.5 changed a lot of socket timeout related things, so I wonder if, somehow, that is related.
      • Made a Python 3.4.4 venv (thanks, pyenv!) but...nope! same issue there. So unlikely to be PEP-0475's fault.
    • https://bugs.python.org/issue22043 was another possibly related change, but that one was also targeted for 3.5, so.
  • In the interests of narrowing things further, I also tested the issue under Python 3.3.6; it's present there too. So it's apparently a general Python 3 issue.
    • I scanned some of the Python 3.x changelogs but nothing useful turned up there either; at this point this is diminishing returns.
@bitprophet
Member
bitprophet commented Jun 12, 2016 edited

Ways forward brainstorm:

  • Still perplexed by the discrepancy between the atexit hook and the ResourceManager closing; while clearly the latter isn't the direct cause, I still wonder if it's a clue. May poke that further; is the ResourceManager firing while the atexit hook loop is running? Or is the 'bad' thread somehow being evicted from _active_threads? If so, why?
  • I'm not super happy leaving things at "use the join timeout as an ultimatum" because it still means the timeout - whatever we set it to - will be an annoyingly long, unexpected wait time at the end of the interpreter session for anybody on Python 3. Including Fabric 2's integration test suite.
    • But, it's still an option, and waiting, say, 1-5s is better than waiting forever. Especially if accompanied by "erm...had to timeout joining the transport thread. sorry! you may have hit bug #520...if you were NOT at the end of your session, please file a bug!".
  • Given that an explicit user-driven close() always seems to work (still unclear why - perhaps the race condition with the socket object has to do with end-of-interpreter shenanigans?) and is arguably good practice, we could list this as "documentation only"...

Another idea occurred to me which I think I like better: keep the ultimatum-style join timeout, but set it to a much shorter value if - by the time we're calling it - the transport's socket and packetizer both appear to have entered their closed states.

That detects the symptoms of this problem, lets the interpreter exit in a reasonable-to-humans amount of time, but limits the possibility of accidentally terminating "too early" in scenarios unlike the one I am testing under.

A couple more wrinkles on this could also be:

  • A while-loop checking those closed states, whose body is join(0.1) (or similar); that would ostensibly work even better for "taking a while to shut down" scenarios as they would take as much time as needed;
  • Use threadsafe signaling within Packetizer, around the recv call, so we can truly know whether or not we're in this particular scenario (socket closed, packetizer closed, packetizer "I was reading!" flag still set). I doubt this is necessary but it could be a (IMO pretty stupid) sanity check, if edge cases pop up.
@bitprophet
Member

Yea, mutating the loop to be "not current thread + not socket.closed + not packetizer.closed" and turning the join timeout down to 0.1s seems to do the trick pretty well. I don't have a great way of testing unusually-slow server endpoints right now (something I'd like to get sometime...) but I'm probably going to at least trial this change while I continue hacking on other things.

Minor side note, socket objects have no public "is closed?" flag that I can see, but _closed is a) what's used by __repr__ and b) available in Python 2 and 3, so it'll have to do for now. I'm not crazy worried about that changing anytime soon.
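
The change amounts to something like the following sketch (assuming the packetizer exposes its closed flag; see commit 0e54d0f for the real thing):

    import threading

    def stop_thread(self):
        self.active = False
        self.packetizer.close()
        # join() in short slices until the thread exits on its own, or until
        # both the socket and the packetizer report closed -- at which point
        # any further waiting is almost certainly the stuck-recv race above.
        while (
            self.is_alive()
            and self is not threading.current_thread()
            and not self.sock._closed          # no public "is closed?" flag on sockets
            and not self.packetizer.closed
        ):
            self.join(0.1)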

@bitprophet bitprophet added a commit that referenced this issue Jun 12, 2016
@bitprophet bitprophet Experimental fix re #520 0e54d0f
@bitprophet
Member
bitprophet commented Jun 12, 2016 edited

FWIW problem + fix both seem present/workable on 1.16 too (so, this is in no way related to the switch to pyca/crypto - not that I thought it would've been). I committed a cleaned-up version of what I was testing with and forward-ported (1.16, 1.17, 2.0, master) - it's live now.

If anyone watching this can give one of those branches a shot and give a thumbs-up (both re: fixing the issue, if they have it; or at least proving it's not making things worse for them) that would be cool.

@bitprophet
Member

Was reminded by tef on twitter that I never chased down the assertions made in #380 about the issue potentially being how the socket in question is closed from a different thread during the recv call (re this SO thread). So if this needs more love that's probably the next place to look.

Offhand (recalling that threading is not my expertise) if that's the true race condition, it would mean we do want "I'm recving here!" locking in Packetizer.read_all, which is honored by Packetizer.close() when it calls self.__sock.close. And/or in the other 2-3 places where that same socket, since it is also Transport.sock, is closed by Transport methods such as close or the end of run...sigh. But pretty sure in this case the race would be between the two Packetizer methods.
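
For illustration, the locking idea would look roughly like this (__recv_lock is a hypothetical attribute, not something Packetizer has; and per the next comment, it didn't pan out):

    import threading

    class LockedRecvSketch(object):
        # Illustration only: guard recv() so close() can't pull the socket out
        # from under it mid-call.
        def __init__(self, sock):
            self.__socket = sock
            self.__recv_lock = threading.Lock()   # hypothetical attribute

        def recv(self, n):
            # The lock is held only for the duration of each (short-timeout)
            # recv() call, so close() normally gets its turn between attempts.
            with self.__recv_lock:
                return self.__socket.recv(n)

        def close(self):
            with self.__recv_lock:
                self.__socket.close()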

@bitprophet
Member
bitprophet commented Jun 12, 2016 edited

Sadly doesn't seem like a workable avenue:

  • (With my above 'fix' reverted temporarily,) I added locking around recv and self.__socket.close in Packetizer, but we still end up in the blocking recv unfortunately. Implies that Packetizer.__socket.close (from .close) is not actually the trigger.
  • Confirmed this by throwing a 0.5s sleep before the socket close in the original code; changes nothing about the behavior of the race condition, the recv fires and blocks.
    • Also confirmed that this is the right place and the socket isn't closing via some other avenue: if I comment out the socket close entirely, the recv still blocks, and printing the socket object in the join loop shows it staying open indefinitely.
  • So I don't think that's the root cause here (seems more likely it's something to do with this all happening at interpreter shutdown perhaps?), but at least I took a crack at it.
@bitprophet
Member

Hrm, could swear last night that Python 2 socket objects had _closed, but seems not the case this morning. Poking =/

@bitprophet
Member
bitprophet commented Jun 12, 2016 edited

Yea, nope. There are still ways I see to detect whether the socket has had close() called on it, but they're more fragile than just testing _closed (would need to e.g. test isinstance(self.socket._sock, socket._closedsocket)).

Wondering if "just" testing Packetizer's own closed flag is sufficient here...then again, given my instance of the issue only ever seemed to pop up on Python 3, perhaps we need a branch instead.

This fix may never have been good on Python 2 anyways - I occasionally get more hilarious shutdown errors (such as the socket module disappearing inside Transport.run - which I thought had been "solved" a long time ago...) which I presume are due to the now much faster join timeout.

Given I was unable to recreate the problem under Python 2 anyways, going to suck it up and test PY2 for now.

@bitprophet
Member
bitprophet commented Jun 12, 2016 edited

I'm not going to outright close this given at least @colinmcintosh was encountering similar symptoms under Python 2.7 (though in his case, even explicit Transport.close calls weren't fixing it? Ugh.) and my fix only works for 3, but I am done banging my head on it for now since I got past my personal pain point.

Ideally, someone else who can reproduce the issue under Python 2 will do similar horrible debugging as I did above, or at least post details for their reproduction so I or someone else can do so later.

@bitprophet bitprophet changed the title from Paramiko hangs with final message 'EOF in transport thread' to One 'bad' Transport thread hangs indefinitely at shutdown when multiple Transports are active Jun 12, 2016
@colinmcintosh

@bitprophet Interestingly enough, I can only seem to reproduce my issue on <=2.7. Py3 seemed to fix my issue. Can you post the snippet that you use to reproduce this on Python3? I'll try to find the Python2 code I used when I originally had the issue.

@SanderP
SanderP commented Jun 14, 2016

Hi,

This issue has been plaguing me with Py3. I create a channel with invoke_shell, interact with it and then close it. Even though the channel object is local to a method and should be deleted automatically, I explicitly del it on all exit paths. Still, a paramiko.Transport thread is left alive, preventing the script from exiting. I had to add a loop that looks at all threads and explicitly calls stop_thread on all paramiko.Transport threads. That is pretty bad and does not give me a lot of confidence in using paramiko for our test automation needs.

If an explicit stop_thread is killing these threads properly it appears the internal paramiko cleanup isn't working right.
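
Roughly, the cleanup loop looks like this (a sketch; stop_thread is an internal API, so this is very much a workaround):

    import threading

    import paramiko

    def stop_leftover_transports():
        # Transport subclasses threading.Thread, so any still-running transport
        # threads show up in threading.enumerate(); ask each one to stop.
        for thread in threading.enumerate():
            if isinstance(thread, paramiko.Transport) and thread.is_alive():
                thread.stop_thread()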

Sander

@SanderP
SanderP commented Jun 14, 2016

The issue was that my SSHClient() object was not explicitly being closed and garbage collection wasn't doing it either. This may be related to creating these objects as part of unit tests using a unittest.TestCase-derived class and calling unittest.main(). I now explicitly close the SSHClient and that resolves the issue. I should have closed it explicitly regardless of garbage collection doing that; a bug in my code.

@bitprophet
Member

@SanderP Thanks for the extra info. Definitely sounds like Paramiko's docs should highlight the need to close the channel more strongly, or at least have an FAQ for it.

@bitprophet
Member

I did both - FAQ added and a warning block added to the close method docstring.

@sanseihappa
Contributor

I'm seeing my hang case cleared up with the latest changes (is there a PyPI release coming? I'm just doing a local package build and serving on a local dev PyPI for now).

My hang case is rather unusual: basically the other end of the connection is dropping off the network. @bitprophet not sure if there is anything else I can provide?

@bitprophet
Member

@sanseihappa Meant to pop out a bugfix release last week but forgot to. May just do that now; waiting until there's "enough" for a release is a crappy habit I need to break.

Once that's out and you've tried it, I'd ask the following:

  • Which Python version are you on exactly?
  • Did that bugfix work for you?
  • If not, are you explicitly calling .close on your Transport and/or SSHClient objects? If not, does doing so fix the issue for you?
@bitprophet
Member

OK, 1.16.2/1.17.1/2.0.1 are out with this fix (and #537 which is another deadlock fix).

@sanseihappa
Contributor
sanseihappa commented Jun 28, 2016 edited

@bitprophet thanks for the 2.0.1 release. I'm using Python 2.7.11. Still hitting a hang. Here is the faulthandler ABRT dump:

27-Jun-2016 15:54:06    Thread 0x00007fd60a831700 (most recent call first):
27-Jun-2016 15:54:06      File "/usr/local/lib/python2.7/dist-packages/paramiko/packet.py", line 254 in read_all
27-Jun-2016 15:54:06      File "/usr/local/lib/python2.7/dist-packages/paramiko/packet.py", line 391 in read_message
27-Jun-2016 15:54:06      File "/usr/local/lib/python2.7/dist-packages/paramiko/transport.py", line 1754 in run
27-Jun-2016 15:54:06      File "/usr/lib/python2.7/threading.py", line 810 in __bootstrap_inner
27-Jun-2016 15:54:06      File "/usr/lib/python2.7/threading.py", line 783 in __bootstrap
27-Jun-2016 15:54:06
27-Jun-2016 15:54:06    Thread 0x00007fd610b29700 (most recent call first):
27-Jun-2016 15:54:06      File "/usr/local/lib/python2.7/dist-packages/xxxxxxxxxxx/xxxxxxxxxx.py", line 643 in xxxxxxxxxxxxxx
27-Jun-2016 15:54:06      File "/usr/lib/python2.7/threading.py", line 763 in run
27-Jun-2016 15:54:06      File "/usr/lib/python2.7/threading.py", line 810 in __bootstrap_inner
27-Jun-2016 15:54:06      File "/usr/lib/python2.7/threading.py", line 783 in __bootstrap
27-Jun-2016 15:54:06
27-Jun-2016 15:54:06    Thread 0x00007fd60b2f3700 (most recent call first):
27-Jun-2016 15:54:06      File "/usr/local/lib/python2.7/dist-packages/xxxxxxxxxxx/xxxxxxxxxx.py", line 563 in xxxxxxxxxxxxx
27-Jun-2016 15:54:06      File "/usr/lib/python2.7/threading.py", line 763 in run
27-Jun-2016 15:54:06      File "/usr/lib/python2.7/threading.py", line 810 in __bootstrap_inner
27-Jun-2016 15:54:06      File "/usr/lib/python2.7/threading.py", line 783 in __bootstrap
27-Jun-2016 15:54:06
27-Jun-2016 15:54:06    Thread 0x00007fd60bfff700 (most recent call first):
27-Jun-2016 15:54:06      File "/usr/local/lib/python2.7/dist-packages/xxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxx.py", line 197 in xxxxxxxxxxxx
27-Jun-2016 15:54:06      File "/usr/local/lib/python2.7/dist-packages/xxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxx.py", line 155 in xxxxxx
27-Jun-2016 15:54:06      File "/usr/lib/python2.7/threading.py", line 763 in run
27-Jun-2016 15:54:06      File "/usr/lib/python2.7/threading.py", line 810 in __bootstrap_inner
27-Jun-2016 15:54:06      File "/usr/lib/python2.7/threading.py", line 783 in __bootstrap
27-Jun-2016 15:54:06
27-Jun-2016 15:54:06    Current thread 0x00007fd61581d740 (most recent call first):
27-Jun-2016 15:54:06      File "/usr/lib/python2.7/threading.py", line 339 in wait
27-Jun-2016 15:54:06      File "/usr/local/lib/python2.7/dist-packages/paramiko/buffered_pipe.py", line 156 in read
27-Jun-2016 15:54:06      File "/usr/local/lib/python2.7/dist-packages/paramiko/channel.py", line 613 in recv
27-Jun-2016 15:54:06      File "/usr/local/lib/python2.7/dist-packages/paramiko/channel.py", line 1234 in _read
27-Jun-2016 15:54:06      File "/usr/local/lib/python2.7/dist-packages/paramiko/file.py", line 192 in read
27-Jun-2016 15:54:06      File "/usr/local/lib/python2.7/dist-packages/xxxxxxxxxxxxxx/xxxxxxxxxxxx.py", line 208 in xxxxxxxxxxxxx
27-Jun-2016 15:54:06      File "/mnt/work/test/lib/utils.py", line 776 in xxxxxxxxxxxxxxxxxxxxx
27-Jun-2016 15:54:06      File "features/environment.py", line 222 in after_scenario
27-Jun-2016 15:54:06      File "/usr/local/lib/python2.7/dist-packages/behave/runner.py", line 405 in run_hook
27-Jun-2016 15:54:06      File "/usr/local/lib/python2.7/dist-packages/behave/model.py", line 919 in run
27-Jun-2016 15:54:06      File "/usr/local/lib/python2.7/dist-packages/behave/model.py", line 523 in run
27-Jun-2016 15:54:06      File "/usr/local/lib/python2.7/dist-packages/behave/runner.py", line 483 in run_model

[edit] I am explicitly calling .close on the SSHClient object in its containing class destructor. However, the hang above is not during interpreter shutdown.

@sanseihappa
Contributor

Sorry for the churn here, but I noticed that a call to SSHClient.exec_command I was using incorrectly passed the timeout as the second positional argument -- which was being interpreted as bufsize. After fixing that call to use the kwarg timeout=timeout, I am now observing a hang that seems to stem from the call to Transport.start_client() from within SSHClient.connect():

28-Jun-2016 08:44:47    Thread 0x00007f72f24cc700 (most recent call first):
28-Jun-2016 08:44:47      File "/usr/local/lib/python2.7/dist-packages/paramiko/packet.py", line 254 in read_all
28-Jun-2016 08:44:47      File "/usr/local/lib/python2.7/dist-packages/paramiko/packet.py", line 391 in read_message
28-Jun-2016 08:44:47      File "/usr/local/lib/python2.7/dist-packages/paramiko/transport.py", line 1754 in run
28-Jun-2016 08:44:47      File "/usr/lib/python2.7/threading.py", line 810 in __bootstrap_inner
28-Jun-2016 08:44:47      File "/usr/lib/python2.7/threading.py", line 783 in __bootstrap
28-Jun-2016 08:44:47    
28-Jun-2016 08:44:47    Thread 0x00007f72f3cdc700 (most recent call first):
28-Jun-2016 08:44:47      File "/usr/local/lib/python2.7/dist-packages/xxxxxxxxxxx/xxxxxxxxxx.py", line 643 in xxxxxxxxxxxxxx
28-Jun-2016 08:44:47      File "/usr/lib/python2.7/threading.py", line 763 in run
28-Jun-2016 08:44:47      File "/usr/lib/python2.7/threading.py", line 810 in __bootstrap_inner
28-Jun-2016 08:44:47      File "/usr/lib/python2.7/threading.py", line 783 in __bootstrap
28-Jun-2016 08:44:47    
28-Jun-2016 08:44:47    Thread 0x00007f72f1a0a700 (most recent call first):
28-Jun-2016 08:44:47      File "/usr/local/lib/python2.7/dist-packages/xxxxxxxxxxx/xxxxxxxxxx.py", line 563 in xxxxxxxxxxxxx
28-Jun-2016 08:44:47      File "/usr/lib/python2.7/threading.py", line 763 in run
28-Jun-2016 08:44:47      File "/usr/lib/python2.7/threading.py", line 810 in __bootstrap_inner
28-Jun-2016 08:44:47      File "/usr/lib/python2.7/threading.py", line 783 in __bootstrap
28-Jun-2016 08:44:47    
28-Jun-2016 08:44:47    Thread 0x00007f72f34db700 (most recent call first):
28-Jun-2016 08:44:47      File "/usr/local/lib/python2.7/dist-packages/xxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxx.py", line 197 in xxxxxxxxxxxxx
28-Jun-2016 08:44:47      File "/usr/local/lib/python2.7/dist-packages/xxxxxxxxxxxxxxxxxx/xxxxxxxxxxxxxxxxxxxx.py", line 155 in xxxxxx
28-Jun-2016 08:44:47      File "/usr/lib/python2.7/threading.py", line 763 in run
28-Jun-2016 08:44:47      File "/usr/lib/python2.7/threading.py", line 810 in __bootstrap_inner
28-Jun-2016 08:44:47      File "/usr/lib/python2.7/threading.py", line 783 in __bootstrap
28-Jun-2016 08:44:47    
28-Jun-2016 08:44:47    Current thread 0x00007f72f89d0740 (most recent call first):
28-Jun-2016 08:44:47      File "/usr/lib/python2.7/threading.py", line 358 in wait
28-Jun-2016 08:44:47      File "/usr/lib/python2.7/threading.py", line 620 in wait
28-Jun-2016 08:44:47      File "/usr/local/lib/python2.7/dist-packages/paramiko/transport.py", line 489 in start_client
28-Jun-2016 08:44:47      File "/usr/local/lib/python2.7/dist-packages/paramiko/client.py", line 338 in connect
28-Jun-2016 08:44:47      File "/usr/local/lib/python2.7/dist-packages/xxxxxxxxxxxxx/xxxxxxxxxxxx.py", line 141 in xxxxxxx
28-Jun-2016 08:44:47      File "/usr/local/lib/python2.7/dist-packages/xxxxxxxxxxxxx/xxxxxxxxxxxx.py", line 278 in xxxxxxxxx
28-Jun-2016 08:44:47      File "/mnt/work/test/lib/utils.py", line 774 in xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
28-Jun-2016 08:44:47      File "features/environment.py", line 222 in after_scenario
28-Jun-2016 08:44:47      File "/usr/local/lib/python2.7/dist-packages/behave/runner.py", line 405 in run_hook
28-Jun-2016 08:44:47      File "/usr/local/lib/python2.7/dist-packages/behave/model.py", line 919 in run
28-Jun-2016 08:44:47      File "/usr/local/lib/python2.7/dist-packages/behave/model.py", line 523 in run
28-Jun-2016 08:44:47      File "/usr/local/lib/python2.7/dist-packages/behave/runner.py", line 483 in run_model
28-Jun-2016 08:44:47      File "/usr/local/lib/python2.7/dist-packages/behave/runner.py", line 693 in run_with_paths
28-Jun-2016 08:44:47      File "/usr/local/lib/python2.7/dist-packages/behave/runner.py", line 672 in run
28-Jun-2016 08:44:47      File "/usr/local/lib/python2.7/dist-packages/behave/__main__.py", line 109 in main
28-Jun-2016 08:44:47      File "/usr/local/bin/behave", line 11 in <module>
@sanseihappa
Contributor

Just to confirm: I added an optional timeout argument to Transport.start_client() that takes the timeout from SSHClient.connect(). This doesn't seem to have any adverse side effects, and keeps the client process from hanging with the signature described in my last comment.
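
Conceptually it boils down to bounding the wait on negotiation; the same idea can be sketched at the caller level using the existing event argument to start_client (illustrative only, not the actual patch):

    import threading

    import paramiko

    def start_client_with_timeout(transport, timeout=30.0):
        # start_client() returns immediately when given an event, and sets the
        # event once negotiation finishes (or fails); bound the wait ourselves.
        event = threading.Event()
        transport.start_client(event)
        if not event.wait(timeout):
            transport.close()
            raise paramiko.SSHException("Timed out waiting for SSH negotiation")
        if not transport.is_active():
            raise paramiko.SSHException("SSH negotiation failed")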

@openstack-gerrit openstack-gerrit pushed a commit to openstack/fuel-qa that referenced this issue Jul 6, 2016
@metacoma @theilluminate metacoma + theilluminate Run octane fuel-restore in silent mode
Since 9.0 there is no docker containers and now we should apply all
changes to the Fuel node by applying puppet manifests. Unfortunately,
it's generating tons of lines in stdout and paramiko can hangs on
reading this lines. Disable writting output to stdout/err but keeping
logs in /var/log/octane.log can fix this issue.
For details about paramiko issues see
paramiko/paramiko#520

Change-Id: If3fd0e6f3490d37e486ce70c97de92f83cd6741d
f53db43
@openstack-gerrit openstack-gerrit pushed a commit to openstack/fuel-qa that referenced this issue Jul 6, 2016
@metacoma @theilluminate metacoma + theilluminate Run octane fuel-restore in silent mode
Since 9.0 there is no docker containers and now we should apply all
changes to the Fuel node by applying puppet manifests. Unfortunately,
it's generating tons of lines in stdout and paramiko can hangs on
reading this lines. Disable writting output to stdout/err but keeping
logs in /var/log/octane.log can fix this issue.
For details about paramiko issues see
paramiko/paramiko#520

Change-Id: If3fd0e6f3490d37e486ce70c97de92f83cd6741d
c791aa8
@daboshh
daboshh commented Jul 7, 2016 edited

It looks like I'm having the same problem and the fix didn't work for me.
I'm new to Python and programming, so I ask for your patience.
The code below works on some devices, but sadly not on Alcatel-Lucent.
I hope this helps.

Win 10
Python 3.5.1
Paramiko 2.0.1

import paramiko
paramiko.common.logging.basicConfig(level=paramiko.common.DEBUG)

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('x.x.x.x', username='x', password='x')
stdin, stdout, stderr = ssh.exec_command('show uptime')
stdout = stdout.read()
print(stdout)


DEBUG:paramiko.transport:starting thread (client mode): 0x2b714518
DEBUG:paramiko.transport:Local version/idstring: SSH-2.0-paramiko_2.0.1
DEBUG:paramiko.transport:Remote version/idstring: SSH-2.0-OpenSSH_3.5p1
INFO:paramiko.transport:Connected (version 2.0, client OpenSSH_3.5p1)
DEBUG:paramiko.transport:kex algos: TOO LONG TO COPY PASTE
DEBUG:paramiko.transport:Kex agreed: diffie-hellman-group1-sha1
DEBUG:paramiko.transport:Cipher agreed: aes128-cbc
DEBUG:paramiko.transport:MAC agreed: hmac-md5
DEBUG:paramiko.transport:Compression agreed: none
DEBUG:paramiko.transport:kex engine KexGroup1 specified hash_algo <built-in function openssl_sha1>
DEBUG:paramiko.transport:Switch to new keys ...
DEBUG:paramiko.transport:Adding ssh-rsa host key for X.X.X.X: b'ad920529adabf592990dcc7d9236881c'
DEBUG:paramiko.transport:userauth is OK
INFO:paramiko.transport:Authentication (password) successful!
DEBUG:paramiko.transport:[chan 0] Max packet in: 32768 bytes
DEBUG:paramiko.transport:[chan 0] Max packet out: 32768 bytes
DEBUG:paramiko.transport:Secsh channel 0 opened.
DEBUG:paramiko.transport:EOF in transport thread
Traceback (most recent call last):
  File "C:/Users/od/PycharmProjects/ssh/ssh.py", line 7, in <module>
    stdin, stdout, stderr = ssh.exec_command('show uptime')
  File "C:\Program Files\Python 3.5.1\lib\site-packages\paramiko\client.py", line 424, in exec_command
    chan.exec_command(command)
  File "C:\Program Files\Python 3.5.1\lib\site-packages\paramiko\channel.py", line 60, in _check
    return func(self, *args, **kwds)
  File "C:\Program Files\Python 3.5.1\lib\site-packages\paramiko\channel.py", line 234, in exec_command
    self._wait_for_event()
  File "C:\Program Files\Python 3.5.1\lib\site-packages\paramiko\channel.py", line 1103, in _wait_for_event
    raise e
  File "C:\Program Files\Python 3.5.1\lib\site-packages\paramiko\transport.py", line 1754, in run
    ptype, m = self.packetizer.read_message()
  File "C:\Program Files\Python 3.5.1\lib\site-packages\paramiko\packet.py", line 391, in read_message
    header = self.read_all(self.__block_size_in, check_rekey=True)
  File "C:\Program Files\Python 3.5.1\lib\site-packages\paramiko\packet.py", line 256, in read_all
    raise EOFError()
EOFError

And line 256 corresponds to the previously reported code:

                x = self.__socket.recv(n)
                if len(x) == 0:
                    raise EOFError()
@sanseihappa
Contributor

@daboshh I believe your issue is different than the hang being discussed here. In the cases here, no exception is ever being thrown, much less the EOFError you report. You might look at #687 and see if perhaps you're having a similar issue with the other end of the SSH connection?

@sanseihappa
Contributor

@bitprophet Should I file a separate issue for the hang that can occur in Transport.start_client()?

@bitprophet
Member
bitprophet commented Jul 20, 2016 edited

@sanseihappa Yea, that sounds like a good idea; seems orthogonal to me offhand, & anything we can do to empower users to get exceptions instead of hangs would be useful. Please file a PR - thanks!

EDIT: if it wasn't obvious, please drop a ref to "#520" in the ticket body somewhere (not the title, GH doesn't scan those for some reason).

@daboshh
daboshh commented Jul 22, 2016 edited

@sanseihappa I tried switching exchange algorithms, but with no luck. Both sides agree on both exchange algorithms, but the same errors come back.

@cool-RR
cool-RR commented Dec 7, 2016

I'm using 2.0.2 and I have hangs which I think are caused by this problem. Is there a workaround until a solution is released?

@bitprophet
Member

I'm planning to pop out 2.0.3 today, which has a couple related fixes in it.

@cool-RR
cool-RR commented Dec 9, 2016
@bitprophet
Member

FTR this issue isn't marked as solved yet because it feels like one of those "many causes, similar symptoms" things. We'll see how 2.0.3 and friends do re: fixing it for involved users :)

@daboshh
daboshh commented Dec 12, 2016

It didn't work for me. Looks like I'm having the same errors as previously.

import paramiko
paramiko.common.logging.basicConfig(level=paramiko.common.DEBUG)

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('x.x.x.x', username='x', password='x')
stdin, stdout, stderr = ssh.exec_command('show uptime')
stdout = stdout.read()
print(stdout)

DEBUG:paramiko.transport:starting thread (client mode): 0x4225c668
DEBUG:paramiko.transport:Local version/idstring: SSH-2.0-paramiko_2.1.0
DEBUG:paramiko.transport:Remote version/idstring: SSH-2.0-OpenSSH_3.5p1
INFO:paramiko.transport:Connected (version 2.0, client OpenSSH_3.5p1)
DEBUG:paramiko.transport:kex algos:TOO LONG TO COPY PASTE
DEBUG:paramiko.transport:Kex agreed: diffie-hellman-group1-sha1
DEBUG:paramiko.transport:Cipher agreed: aes128-cbc
DEBUG:paramiko.transport:MAC agreed: hmac-md5
DEBUG:paramiko.transport:Compression agreed: none
DEBUG:paramiko.transport:kex engine KexGroup1 specified hash_algo <built-in function openssl_sha1>
DEBUG:paramiko.transport:Switch to new keys ...
DEBUG:paramiko.transport:Adding ssh-rsa host key for X.X.X.X: b'e412c1b06a2f4c5606c4252390064731'
DEBUG:paramiko.transport:userauth is OK
INFO:paramiko.transport:Authentication (password) successful!
DEBUG:paramiko.transport:[chan 0] Max packet in: 32768 bytes
DEBUG:paramiko.transport:[chan 0] Max packet out: 32768 bytes
DEBUG:paramiko.transport:Secsh channel 0 opened.
DEBUG:paramiko.transport:EOF in transport thread
Traceback (most recent call last):
  File "C:X\X\X\X", line 8, in <module>
    stdin, stdout, stderr = ssh.exec_command('show uptime')
  File "C:\Program Files\Python 3.5.1\lib\site-packages\paramiko\client.py", line 441, in exec_command
    chan.exec_command(command)
  File "C:\Program Files\Python 3.5.1\lib\site-packages\paramiko\channel.py", line 60, in _check
    return func(self, *args, **kwds)
  File "C:\Program Files\Python 3.5.1\lib\site-packages\paramiko\channel.py", line 234, in exec_command
    self._wait_for_event()
  File "C:\Program Files\Python 3.5.1\lib\site-packages\paramiko\channel.py", line 1161, in _wait_for_event
    raise e
  File "C:\Program Files\Python 3.5.1\lib\site-packages\paramiko\transport.py", line 1760, in run
    ptype, m = self.packetizer.read_message()
  File "C:\Program Files\Python 3.5.1\lib\site-packages\paramiko\packet.py", line 391, in read_message
    header = self.read_all(self.__block_size_in, check_rekey=True)
  File "C:\Program Files\Python 3.5.1\lib\site-packages\paramiko\packet.py", line 256, in read_all
    raise EOFError()
EOFError
@rustyscottweber

I ran into a similar problem with this that was related to the Linux kernel I was working on refusing to close the socket, which would hang the transport thread. Maybe also check the OS you are running and the kernel version.

@daboshh
daboshh commented Dec 14, 2016

I ran into a similar problem with this that was related to the Linux kernel I was working on refusing to close the socket, which would hang the transport thread. Maybe also check the OS you are running and the kernel version.

I'm using updated win 10. :(

@andreycizov

It seems I am running into the same issue when using paramiko.SFTPClient.open(). paramiko.sftp_file.SFTPFile.close() would hang indefinitely when called. I have solved this by calling _close(async=True). Not sure if it solves the issue of properly closing the file, but it definitely solves the issue of hangups.

@urban-1
urban-1 commented Dec 23, 2016 edited

Hi all,

I am not sure that I have exactly the same problem, but paramiko hangs in packet.py, line 276, in read_all. This is happening only in Python 3; details:

$ python3 --version
Python 3.4.3
$ pip3 freeze | grep paramiko
paramiko==1.17.0
$ lsb_release -a
LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description:    Red Hat Enterprise Linux Server release 6.7 (Santiago)
Release:        6.7
Codename:       Santiago

Now, in order to investigate this I made a small function that walks all the threads in sys._current_frames(). The funny thing is that when I run that function, the problem disappears!! The code that works is:

            # Close client or transport here ****

            import sys
            import threading
            import traceback
            for thread_id, frame in sys._current_frames().items():
                for thread in threading.enumerate():
                    if thread.ident == thread_id:
                        name = thread.name
                ''.join(traceback.format_stack(frame))

EDIT: explicit .close() is required

The most interesting thing is that if you remove the last line you get the symptoms back... I have absolutely no clue or explanation why traceback.format_stack(frame) fixes the problem... I am posting this in case someone has the same issue.

Let me know if I should create a new Issue

Cheers,

Andreas

@tyler-8
tyler-8 commented Dec 28, 2016

Found this thread through Google. I'm running Python 2.7.13 and running into this issue with a multiprocessed script that uses paramiko 2.1.1. Here's my log output where it hangs:

INFO:paramiko.transport:Authentication (password) successful!
DEBUG:paramiko.transport:[chan 0] Max packet in: 32768 bytes
DEBUG:paramiko.transport:[chan 0] Max packet out: 32768 bytes
DEBUG:paramiko.transport:Secsh channel 0 opened.
DEBUG:paramiko.transport:[chan 0] Sesch channel 0 request ok
DEBUG:paramiko.transport:[chan 0] Sesch channel 0 request ok
DEBUG:paramiko.transport:EOF in transport thread
DEBUG:paramiko.transport:EOF in transport thread
DEBUG:paramiko.transport:EOF in transport thread