I believe I have identified and fixed the problem of SFTP gets hanging (probably the cause of at least issues #560 and #515). I hit this problem most reliably when retrieving large files (15G+) over long-haul WAN links (low bandwidth and high latency).
I diagnosed this with python-2.7.9 and paramiko 1.15.2 on CentOS 5.8, although the fix should more or less apply to any recent version. Until recently I had been using paramiko 1.9.0 with python-2.5.1 without encountering this error.
Here is a simple program to demonstrate the problem:
With 1.15 HEAD an example run (you'll need to use your own appropriate remote host, user, and file as mine are all internal to a corporate WAN) looks like this:
I will submit a pull request for the fix shortly, but the synopsis is this: when the file to get is large (i.e. very many async prefetch requests) and the network is slow (slow to receive async results), it is very likely that the window will become full, causing the prefetch thread to block on a send while waiting for the window to open up. As written, SFTPFile._prefetch_lock is held by the prefetch thread while it attempts to write. This in turn blocks the reader thread (typically some client application thread, e.g. main) from reading result packets, so the window can never open up. This violates the principle that one should always attempt to read before attempting to write.
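The hang described above can be reproduced in miniature with a toy model. This is only a sketch and not paramiko code: the bounded queue stands in for the SSH window, `prefetch_lock` stands in for SFTPFile._prefetch_lock, and every name here (`simulate_locked_prefetch`, `window`, `drained`) is hypothetical. A timeout on the send is assumed so that the demonstration terminates instead of hanging forever.

```python
import queue
import threading


def simulate_locked_prefetch(num_requests=8, window_size=2, timeout=0.5):
    """Toy model of the hang. Returns (stalled, drained_items)."""
    window = queue.Queue(maxsize=window_size)  # stands in for the SSH window
    prefetch_lock = threading.Lock()           # stands in for _prefetch_lock
    lock_taken = threading.Event()
    stalled = threading.Event()
    drained = []

    def prefetch_thread():
        # Problem pattern: the lock is held across a send that can block.
        with prefetch_lock:
            lock_taken.set()
            for req_id in range(num_requests):
                try:
                    # Without the timeout this would block forever once the
                    # window is full, because the reader cannot drain it.
                    window.put(req_id, timeout=timeout)
                except queue.Full:
                    stalled.set()
                    return

    def reader_thread():
        lock_taken.wait()
        # The reader needs the same lock before it can consume responses,
        # so it cannot open the window while the sender is blocked.
        with prefetch_lock:
            while True:
                try:
                    drained.append(window.get_nowait())
                except queue.Empty:
                    return

    threads = [
        threading.Thread(target=prefetch_thread, daemon=True),
        threading.Thread(target=reader_thread, daemon=True),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    return stalled.is_set(), drained


if __name__ == "__main__":
    print(simulate_locked_prefetch())
```

In this model the sender stalls as soon as the window is full, and the reader only gets to drain the queued items after the sender gives up and releases the lock.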
The proposed fix avoids holding SFTPFile._prefetch_lock (or any other lock) in SFTPFile._prefetch_thread() while attempting to send the prefetch request. Because the request is now sent before it is recorded, the receiving side needs some extra spinning when a response arrives to ensure that SFTPFile._prefetch_extents has been updated with the request, which avoids a possible race condition. Additionally, SFTPClient._lock must similarly not be held in SFTPClient._async_request() when sending a request (there is no requirement that the send operation itself be part of the protected region). Finally, SFTPClient._read_response() was incorrectly accessing and modifying SFTPClient._expecting without holding SFTPClient._lock; while this in itself was not contributing to the problem, it is clearly incorrect in terms of thread safety.
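The shape of the fix can be sketched with a toy model. This is not the actual patch: the bounded queue stands in for the SSH window, `extents` stands in for the prefetch extents table, and all names (`simulate_unlocked_prefetch`, `chunk`, `received`) are hypothetical. The sender blocks on the send without holding any lock and records the extent afterwards, and the reader spins briefly until the extent for a response has been recorded.

```python
import queue
import threading
import time


def simulate_unlocked_prefetch(num_requests=8, window_size=2, chunk=32768):
    """Toy model of the fix. Returns the (offset, length) extents received."""
    window = queue.Queue(maxsize=window_size)  # stands in for the SSH window
    lock = threading.Lock()                    # protects extents only
    extents = {}                               # request id -> (offset, length)
    received = []

    def prefetch_thread():
        for req_id in range(num_requests):
            # The send may block on a full window, but no lock is held,
            # so the reader is free to drain responses and open the window.
            window.put(req_id)
            with lock:
                extents[req_id] = (req_id * chunk, chunk)

    def reader_thread():
        while len(received) < num_requests:
            req_id = window.get()
            # The response can arrive before the sender has recorded the
            # extent, so spin until the entry shows up.
            while True:
                with lock:
                    if req_id in extents:
                        received.append(extents.pop(req_id))
                        break
                time.sleep(0.001)

    threads = [
        threading.Thread(target=prefetch_thread, daemon=True),
        threading.Thread(target=reader_thread, daemon=True),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    return received


if __name__ == "__main__":
    print(len(simulate_unlocked_prefetch()))
```

The lock now covers only the bookkeeping, never the blocking send, which is the property the proposed change relies on.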
With the proposed fix, I am able to reliably get very large files over the long, slow WAN without hanging.