I believe I have identified and fixed the problem of SFTP gets hanging (probably the cause of at least issues #560 and #515 ). I was experiencing this problem when retrieving large files (15G +) over long haul WAN links (low bandwidth and large latency) most reliably.
I diagnosed this using python-2.7.9 on CentOS 5.8 using paramiko 1.15.2 although the fix should more or less apply for any recent version; until recently I had been using paramiko 1.9.0 with python-2.5.1 without encountering this error.
Here is a simple program to demonstrate the problem:
from __future__ import with_statement
lock = threading.Lock()
progress = None
def callback(sofar, total):
global lock, progress
progress = (sofar, total)
if len(sys.argv) != 5:
print >>sys.stderr, 'usage: %s host username remote local' % sys.argv
host = sys.argv
username = sys.argv
remote = sys.argv
local = sys.argv
transport = paramiko.Transport((host,22))
path = os.path.join(os.environ['HOME'], '.ssh', 'id_dsa')
key = paramiko.DSSKey.from_private_key_file(path)
sftp = paramiko.SFTPClient.from_transport(transport)
last = 0
t = threading.Thread(target=sftp.get, args=(remote,local, callback))
if progress is not None:
if progress == progress:
if progress == last:
print >>sys.stderr, 'stalled %d of %d' % progress
print '%d of %d' % progress
last = progress
With 1.15 HEAD an example run (you'll need to use your own appropriate remote host, user, and file as mine are all internal to a corporate WAN) looks like this:
% ./test-sftp-stall remote-host remote-user remote-file local-file
20348928 of 15472925029
23199744 of 15472925029
23855104 of 15472925029
stalled 23855104 of 15472925029
I will submit a pull request for the fix shortly, but the synopsis is thus: when the file to get is large (i.e. very many async prefetch requests) and the network slow (slow to receive async results), it is very likely that the window will become full causing the prefetch thread to block on a send waiting for the window to open up. As written, the SFTPFile._prefetch_lock is held by the prefetch thread while attempting to write. But this then blocks the reader thread (typically some client application thread, e.g. main) from reading result packets, such that the window can never open up. A violation of the principal that one should always attempt to read before attempting to write.
The proposed fix avoids holding the SFTPFile._prefetch lock (or any other lock) in SFTPFile._prefetch_thread() while attempting to send the prefetch request; it requires some extra spinning when receiving a response to ensure that the SFTP._prefetch_extents has been updated with the request to avoid a possible race condition. Additionally, SFTPClient._lock must similarly not be held in SFTPClient._async_request() when sending a request (there is no requirement that the send operation itself be part of the protected region). Finally, SFTPClient._read_response() was incorrectly accessing and modifying SFTPClient._expecting without holding SFTPClient._lock; while this in itself was not contributing to the problem, it is clearly incorrect in terms of thread safety.
With the proposed fix, I am able to reliably get very large files over the long, slow WAN without hanging.
my apologies - being a github nube I didn't realize that the pull request would create a whole new issue. I thought I had to create a new issue first. Oh well ...
Yea, Github still needs some feature love there IMHO, don't worry about it :) thanks!