fix sftp stalls #578

Closed
wants to merge 2 commits into
from

Projects

None yet

4 participants

@stephencpope
Contributor

Proposed fix for #577. Implemented as a branch off 1.15 HEAD as I was not interested in the extra credit of determining exactly when the issue arose earlier in the history. However, it would apply in principal beginning at the point where sftp prefetch was implemented (sometime after 1.9.0).

@stephencpope
Contributor

Bah! I don't do no stinkin' Python 3. My submission is failing the various Python 3 variations (no dict.has_key() method). I am sure some simple syntactic change can fix that.

@stephencpope
Contributor

So I fixed the has_key problem. Python 3.2 passes. Python 3.3 and Python 3.4 both timeout, apparently hanging up in the SSHClient tests, which do not involve SFTPClient in any way; I suspect the problem is not caused by me. I got and built Python 3.4.3, ran the tests manually myself, and they worked fine. Don't know what to say ...

@bitprophet
Member

This is most likely covered by some older tickets as well, need to dig those up sometime to make sure use case coverage is better understood. (I'm paranoid about fixes breaking other areas of code...:D)

@stephencpope
Contributor

we've been using the fix here for lots of very large downloads (files as big as 35G thanks to recent market turmoil) as well as plenty of small ones with no adverse effects. Not that that will give you warm fuzzies!

@bitprophet
Member

It's still helpful though! So thanks :) and mostly I just want to make sure we link related tickets for closure or noting that "hey we're merging this and so um watch out for side effects".

@stephencpope
Contributor

So one thing to look out for: it seemed to me that the SFTP code wasn't safe to use for both getting and putting simultaneously. While my fixes address some locking issues, I didn't even attempt to address this since it is not a use case of mine.

@Tatsh
Tatsh commented Feb 6, 2016

Status on this? I'm getting 'Garbage packet' exceptions when retrieving some large files (like anything above 50 MiB it seems, but sometimes things randomly work).

Other times the connection is extremely slow. It might be the server (as always can be the case), but lftp is able to retrieve such files very easily at around 2 MiB/s.

Is this a useful fix for that? I am using Python 3 and it appears this needs patching for 3.4?

@shuhaowu

Any status on this? I think I just ran into this problem, although I am not 100% positive. I used pdb to trace it to hang on: https://github.com/paramiko/paramiko/blob/1.16/paramiko/sftp_file.py#L477.

PDB shows:

(Pdb) p self._prefetch_lock.locked()
True

When I step over that line it hangs.

When I straced one of the processes, I get

poll([{fd=3, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=3, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=3, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=3, events=POLLIN}], 1, 100)   = 0 (Timeout)
...

fd=3 is the socket to the SFTP server.

@Tatsh
Tatsh commented Mar 10, 2016

I apply this patch on the version of Paramiko I'm using now (in my virtualenv). It works really well. Using Python 3.4. My project really depends on it. So I hope it does get merged at some point.

@shuhaowu

Applied this patch on top of 1.16 today. Seems to have fixed this issue.

Is it possible we get some project member to look into merging this? I can help solve the merge conflict and re PR if OP is not going to do that.

@bitprophet bitprophet modified the milestone: 1.16.2, 1.16.1 Apr 25, 2016
@bitprophet
Member

Noting for the record that I wasn't able to recreate this (using the description in #577) even across a (not slow...) WAN connection, but thanks to @stephencpope's excellent analysis (thank you!) and reading the code, I'm reasonably sure I grok what it's doing.

More importantly, I'm not able to detect any serious regression in my integration branch :)

@bitprophet bitprophet added a commit that closed this pull request Apr 25, 2016
@bitprophet bitprophet Changelog closes #578 30dabff
@stephencpope
Contributor

Thank you!

@bitprophet
Member

@stephencpope it's remotely possible this patchset is causing #730. I'll be trying to confirm/deny that tomorrow; if you happen to have time before then & want to try reproducing it (it's probably intermittent and mostly affecting Python 3.5, hooray) please comment on #730 with any findings.

@bitprophet
Member

As seen on #730 I was wholly unable to recreate the occurrence, so, hopefully a false alarm. glares suspiciously at all computers everywhere

@Tatsh
Tatsh commented Apr 26, 2016

I am glad because I have wanted this merged for a long time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment