FIX: correct highWaterMark default issue for large sftp downloads. #42
Conversation
The highest number should be 65535, not 65536. The reason is that we are dealing with unsigned values: 65535 in binary is 0b1111111111111111 (the full 16-bit integer with all 1s), while 65536 is just one more, 0b10000000000000000.

When downloading large files, the buffer thinks it stores 65536 bytes but actually only stores 65535, so the 65536th byte is lost from each buffer. Correcting the default to 64 * 1024 - 1 (65535) makes sure that the requested number of bytes matches what the buffer can actually store.
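For illustration, the boundary values involved can be checked directly in Node.js (this snippet is illustrative, not part of the original report):

```js
// 65535 fits in an unsigned 16-bit field; 65536 needs a 17th bit.
console.log((65535).toString(2)); // '1111111111111111'  (sixteen 1s)
console.log((65536).toString(2)); // '10000000000000000' (a 1 followed by sixteen 0s)
console.log(64 * 1024 - 1);       // 65535
```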
I'm skeptical. Can you provide a way to reproduce the problem?
Thanks @mscdex. I'm afraid I'm not sure of the simplest way to give a clear-cut reproduction, but I've tried my best below; let me know if you have better suggestions. The way we've been reproducing the issue and verifying the fix:
See here for example results: http://imgur.com/a/UGisA

If you'd like to compare the first parts of either of the files, just use: …

Here is example text for lines 2184-2187 of the correct file:

…

Here are the same lines from a file retrieved via the download:

…

The simplest way to test this, without putting in a large set of code, is through the … comparison on the file. Let me know if you need a more specific set of code.

Changing the default used for the highWaterMark from 65536 to 65535 resolves the issue.
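The snippets themselves were not preserved above; as a rough illustration of this style of reproduction, here is a minimal sketch assuming the ssh2 Client API. The host, credentials, and paths are placeholders, not details from the thread:

```js
const fs = require('fs');
const { Client } = require('ssh2');

const conn = new Client();
conn.on('ready', () => {
  conn.sftp((err, sftp) => {
    if (err) throw err;
    // Download a large remote file over SFTP.
    const remote = sftp.createReadStream('/remote/path/large-file.zip');
    const local = fs.createWriteStream('./downloaded.zip');
    remote.pipe(local);
    local.on('close', () => {
      // With the reported bug, this size comes up short of the remote
      // file's size, and the two files differ when compared.
      console.log('downloaded size:', fs.statSync('./downloaded.zip').size);
      conn.end();
    });
  });
});
conn.connect({ host: 'example.com', username: 'user', password: 'secret' });
```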
Can you give an exact file size that is causing you issues? Does this happen every time?
Yes, this was happening every time.
I just tested with a randomly generated file of exactly that size. Transferring the file directly with … worked fine for me.
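A test file of an exact size can be generated along these lines (the filename and size here are illustrative, not values from the thread):

```js
const crypto = require('crypto');
const fs = require('fs');

// A few 64KB chunks plus a remainder, to straddle the chunk boundary.
const SIZE = 3 * 65536 + 123;
fs.writeFileSync('testfile.bin', crypto.randomBytes(SIZE));
console.log('wrote', SIZE, 'bytes');
```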
It'd also be interesting to see if you have the same problem when using something like …
As mentioned in the issue related to this PR, I experienced the same. Here is the code that I used and some logs of the behaviour before and after applying the fix. It happened every time for me, for files of various sizes. The client OS is Linux (Ubuntu 14.04, kernel 3.16.0-77) running Node 4.4.0; I'm not sure what is running on the server (not one of our boxes).
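The referenced gist and logs are not reproduced above; this is a sketch of the kind of log placement being discussed, assuming an already-opened ssh2 SFTP session (the helper name and remote path are hypothetical):

```js
// Hypothetical helper: log each chunk as it arrives so that dropped bytes
// show up as a shortfall in the running total.
function logChunks(sftp, remotePath) {
  let total = 0;
  const stream = sftp.createReadStream(remotePath);
  stream.on('data', (chunk) => {
    total += chunk.length;
    console.log('chunk:', chunk.length, 'bytes; running total:', total);
  });
  stream.on('end', () => {
    // On affected servers, `total` comes up short of the remote file size.
    console.log('done; total bytes received:', total);
  });
}
```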
I'm still not able to reproduce the problem. Here is what I get with the same log statement placement (and same sized file): …
@BorePlusPlus If you set …, what is the remote server's ident?
This is the reported ident: …
@BorePlusPlus Looks like some commercial SFTP server that does not have a readily downloadable trial/demo version. Am I correct in assuming that …?

@paulroth3d Do you happen to know what the remote server ident was in your case?
It all works tip-top as long as a lower highWaterMark is used.
Yeah, I'm not sure there's much that can be done by this module to avoid that, since it's ultimately a server-side issue. We could check the server ident, but I don't really feel comfortable magically changing the highWaterMark based on that. If we default to a lower value, it would slow down transfers for all of the servers that handle the full 64KB just fine.
I also ran into this issue while downloading large XML files. The dropped bytes meant that the downloaded file could no longer be parsed as valid XML. Changing the highWaterMark value fixed the issue.
@evansnicholas Out of curiosity, what is the remote server's ident if you enable logging?
This is the remote ident: …
@evansnicholas Well that's interesting… that's the example ident used in the SSH Transport RFC :-)
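For reference, a server's ident string can be captured by enabling ssh2's debug callback; a minimal sketch with placeholder connection details:

```js
const { Client } = require('ssh2');

const conn = new Client();
conn.on('ready', () => conn.end());
conn.connect({
  host: 'example.com',
  username: 'user',
  password: 'secret',
  // Prints protocol-level details; the remote server's ident string
  // appears in this output early in the handshake.
  debug: (msg) => console.log(msg),
});
```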
Any updates on this? We had issues loading big files, and the correction proposed by @paulroth3d fixed the problem.
@cherchyk Are you using the latest version?
I use 0.0.23, but the problem exists in the current version too. Please review @paulroth3d's explanation.
Unfortunately, until I can reproduce the issue myself, there's not much I can do to pinpoint the actual problem.
Will it help if I show you the solution I have? I can show the code and we can organize a video call so I can explain. What time zone are you in? I'm in EST (UTC-5).
@cherchyk The workaround for servers having issues with the 64KB high water mark is to just use a different (lower) value, e.g.:

```js
var stream = sftp.createReadStream('directory/file.zip', { highWaterMark: 32 * 1024 });
```

as per https://gist.github.com/BorePlusPlus/c9186c8feb9902f84da3e35f313a43a8#file-sftp-download-js-L18
Thanks @BorePlusPlus, why is this not the default setting?
@cherchyk Well, from the conversation above you can read that this is not the default setting because it is a bit arbitrary; there may well be servers out there where a 32KB high water mark wouldn't be the right setting either. Essentially it is a server-side issue. Regarding file size: if your server works with the lower setting, then it doesn't matter how big your files are.
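One application-side way to live with this, sketched here as an assumption rather than anything proposed in the thread, is to keep the value configurable per host (the host names and values are hypothetical):

```js
// Hypothetical per-host overrides for servers known to mishandle 64KB reads.
const HIGH_WATER_MARK_BY_HOST = {
  'picky-server.example.com': 32 * 1024,
};

function openDownload(sftp, host, remotePath) {
  // Fall back to the library's usual 64KB default for well-behaved servers.
  const highWaterMark = HIGH_WATER_MARK_BY_HOST[host] || 64 * 1024;
  return sftp.createReadStream(remotePath, { highWaterMark });
}
```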
Closing this for now. If someone is able to create a setup that I can use to easily reproduce the problem locally (e.g. a Docker setup or otherwise), then I am willing to give that a shot. Otherwise there is not much that I can do at this point. The highWaterMark can always be overridden manually in the meantime.
@mscdex I am having the same issue. @BorePlusPlus I tried setting highWaterMark to 32 * 1024; still the same, the last bit is always dropped. I am really stuck. Do you have any suggestions?
High level, this fix addresses an issue where, when downloading a large file (greater than 65535 bytes), the 65536th byte is dropped from each buffer, meaning zips etc. cannot be opened. As explained in the description above, the highest number should be 65535, not 65536, and correcting the default to 64 * 1024 - 1 (65535) makes sure the requested number of bytes matches what the buffer can actually store.
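For concreteness, the proposed change boils down to this one-value adjustment (a sketch of the shape of the change with hypothetical constant names, not the actual patched source):

```js
// Before: the default read-stream chunk size.
const OLD_DEFAULT_HIGHWATERMARK = 64 * 1024;     // 65536

// After (this PR): the largest value an unsigned 16-bit length can hold.
const NEW_DEFAULT_HIGHWATERMARK = 64 * 1024 - 1; // 65535

console.log(OLD_DEFAULT_HIGHWATERMARK, NEW_DEFAULT_HIGHWATERMARK);
```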