FIX: correct highWaterMark default issue for large sftp downloads. #42

paulroth3d · 2016-07-14T14:14:30Z

High level, this fix addresses an issue where when downloading a large file
(greater than 65535 bytes), the 65536th byte is dropped on each buffer)
Meaning zips - etc cannot be opened.

The highest number should be 65535 not 65536.
The reason is that we are dealing with unsigned digits
and 65535 in binary is 0b1111111111111111
(this is the full integer with all 1s)

65536 in binary is just one more 0b10000000000000000

When downloading large files, what happens is that the buffer
THINKs it stores 65536 bytes, but actually only stores 65535
(and the 65536th byte is lost for each buffer)

Correcting the issue to 64*1024-1 (65535) makes sure that the
highest number of bytes actually meets what it can store.

The highest number should be 65535 not 65536. The reason is that we are dealing with unsigned digits and 65535 in binary is 0b1111111111111111 (this is the full integer with all 1s) 65536 in binary is just one more 0b10000000000000000 When downloading large files, what happens is that the buffer THINKs it stores 65536 bytes, but actually only stores 65535 (and the 65536th byte is lost for each buffer) Correcting the issue to 64*1024-1 (65535) makes sure that the highest number of bytes actually meets what it can store.

mscdex · 2016-07-14T15:12:29Z

I'm skeptical. Can you provide a way to reproduce the problem?

paulroth3d · 2016-07-14T18:55:00Z

thanks @mscdex - I'm afraid I'm not sure what would be the simplest way to give a clear cut way to reproduce the issue. Although I've tried my best below unless you have any better suggestions.

The way we've been reproducing the issue, and verifying the fix:

find / generate a zip file that is more than 2 Mb
Place the zip up on a server accessible through ssh/sftp to download.
convert the generated zip file to hex (to help verify the bytes)
- in linux: xxd -p test_bytes.zip | tr -d '\n' > orig_bytes_hex.zip
download the zip file through the stream with a highWaterMark of 64 * 1024.
convert the retrieved zip file to hex
- in linux: xxd -p test_bytes.zip | tr -d '\n' > orig_bytes_hex.zip
  perform a text compare / diff compare between the two files.

See this guy for example results - http://imgur.com/a/UGisA

(If you'd like to compare the first parts of either of the files, just use:
head -c NumBytes FileToCheck.zip)

Here is an example text for line 2184 - 2187 of the correct text
17372b1443db27bfb4a6ebb6c5d016036d8ba14d76d8a232cd7d372957d3
9fdbdef49081b6e4d0a61cda9e5fd4d643fb0740c6178166c1b6d326076d
b68cb65b469b1cb4a588b614d11e2da036e73629689382b603b484d29adb

Here is the same lines for a file retrieved from the download
17372b1443db27bfb4a6ebb6c5d016036d8ba14d76d8a232cd7d372957d3
9fdbdef49081b6e4d0a61cda9e5fd443fb0740c6178166c1b6d326076db6
8cb65b469b1cb4a588b614d11e2da036e73629689382b603b484d29adbe6

The simplest way to test this, without putting in a large set of code is through the
https://www.npmjs.com/package/scp2
which references
https://www.npmjs.com/package/ssh2
which in turn references this module
https://www.npmjs.com/package/ssh2-streams

Let me know if you need a more specific set of code
and I'll try to provide it (but it might take a bit without any help)

"use strict";
var Q = require('q');
var config = require('config');
var client = require('scp2');

var credentials = { host: 'ftp.example.com', username: 'lordDarkHelmet', password: '1234' };
fileFrom = '/path/to/file.zip';
fileTo = './file.zip';

function downloadFile(credentials, fileFrom, fileTo) {
    var deferred = Q.defer();
    var request = credentials;
    request.path = fileFrom;
    client.scp(request, fileTo, function (err) {
        if (err) {
            deferred.reject(err);
        }
        else {
            //-- always succeeds, but the file is bad.
            deferred.resolve();
        }
    });
    return deferred.promise;
}
downloadFile( credentials, fileFrom, fileTo );

comparison on file:
See this guy for example results - http://imgur.com/a/UGisA

The default used for the highWaterMark from 65536 to 65535 resolves the issue
because once the buffer doesn't overflow.

mscdex · 2016-07-14T19:29:58Z

Can you give an exact file size that is causing you issues? Does this happen every time?

paulroth3d · 2016-07-15T06:30:23Z

Yes this was happening every time.
The test file we were having the issue with was
27,103,645 bytes (27.1 MB on disk)

mscdex · 2016-07-28T06:55:30Z

I just tested with a randomly generated file with exactly that size. Transferring the file directly with ssh2 and using sftp.createReadStream() resulted in matching files every time that I tried. I even tried forcing the internal buffer to fill up to the high water mark and simulating a slow link, but the files still match.

mscdex · 2016-07-31T13:31:10Z

It'd also be interesting to see if you have the same problem when using something like sftp.fastGet() instead of sftp.createReadStream().

BorePlusPlus · 2016-09-23T16:49:58Z

As mentioned in the issue related to this PR I experienced the same. Here is code that I used and some logs of behaviour before and after applying the highWaterMark "fix".
To produce logs I added a cheeky console.log at this location.

It happened every time for me for files of various sizes.

The client OS is Linux (ubuntu 14.04 3.16.0-77) running node 4.4.0, not sure what is running on server (not one of our boxes).

mscdex · 2016-09-24T06:17:00Z

I'm still not able to reproduce the problem, here is what I get with the same log statement placement (and same sized file):

Ready!
bytesRead: 65536 thisPool.length: 65536
bytesRead: 65536 thisPool.length: 65536
bytesRead: 65536 thisPool.length: 65536
bytesRead: 65536 thisPool.length: 65536
bytesRead: 65536 thisPool.length: 65536
bytesRead: 65536 thisPool.length: 65536
bytesRead: 65536 thisPool.length: 65536
bytesRead: 65536 thisPool.length: 65536
bytesRead: 65536 thisPool.length: 65536
bytesRead: 65536 thisPool.length: 65536
bytesRead: 65536 thisPool.length: 65536
bytesRead: 65536 thisPool.length: 65536
bytesRead: 65536 thisPool.length: 65536
bytesRead: 65536 thisPool.length: 65536
bytesRead: 65536 thisPool.length: 65536
bytesRead: 65536 thisPool.length: 65536
bytesRead: 65536 thisPool.length: 65536
bytesRead: 58831 thisPool.length: 65536
bytesRead: 0 thisPool.length: 65536

@BorePlusPlus If you set debug: console.log in your connection config, what server ident is reported near the beginning of the output?

BorePlusPlus · 2016-09-26T08:25:36Z

This is the reported ident:

DEBUG: Remote ident: 'SSH-2.0-1.82_sshlib GlobalSCAPE'

mscdex · 2016-09-26T14:14:27Z

@BorePlusPlus Looks like some commercial SFTP server that does not have a readily downloadable trial/demo version. Am I correct in assuming that 32 * 1024 also doesn't work for highWaterMark? Either way it definitely seems like a bug with the remote SFTP server implementation.

@paulroth3d Do you happen to know what the remote server ident was in your case?

BorePlusPlus · 2016-09-26T16:16:27Z

It all works tip-top as long as highWaterMark is lower than 64 * 1024 as per logs.

mscdex · 2016-09-26T16:25:35Z

Yeah I'm not sure there's much that can be done by this module to avoid that since it's ultimately a server-side issue. We could check the server ident, but I don't really feel comfortable magically changing the highWaterMark out from under the end user if they end up specifying 64 * 1024.

If we default to 64 * 1024 - 1 instead, then who's to say some other server implementation somewhere else would have a problem with that kind of value? Compatibility is tricky and I think at some point a line has to be drawn.

evansnicholas · 2016-10-10T14:46:04Z

I also ran into this issue while downloading large XML files. The dropped bytes meant that the downloaded file could no longer be parsed as valid XML. Changing the highWaterMark value fixed the issue.

mscdex · 2016-10-10T15:12:23Z

@evansnicholas Out of curiosity, what is the remote server's ident if you enable logging?

evansnicholas · 2016-10-10T15:30:03Z

This is the remote ident:

DEBUG: Remote ident: 'SSH-2.0-billsSSH_3.6.3q3<CR><LF> COMPANY_NAME'

mscdex · 2016-10-10T15:38:40Z

@evansnicholas Well that's interesting .... that's the example ident used in the SSH Transport RFC :-)

cherchyk · 2016-12-22T06:03:34Z

Any updates on it? We had an issues loading big files and correction proposed by @paulroth3d fixed the problem.

mscdex · 2016-12-22T06:33:03Z

@cherchyk Are you using ssh2-streams v0.1.16? If so, what is the server ident?

cherchyk · 2016-12-26T15:42:06Z

I use 0.0.23

but problem exists in current version too. Please review paulroth3d's explanation.

mscdex · 2016-12-26T18:12:20Z

Unfortunately until I can reproduce the issue myself, there's not much I can do to pinpoint the actual problem.

cherchyk · 2016-12-26T18:41:00Z

Will it help if I show you solution I have? I can show code and we can organize video call so I can explain. What time zone are you in? I'm in EST (+5h) Bohdan Cherchyk

…

On Dec 26, 2016, at 1:12 PM, Brian White ***@***.***> wrote: Unfortunately until I can reproduce the issue myself, there's not much I can do to pinpoint the actual problem. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

BorePlusPlus · 2017-01-26T09:47:42Z

@cherchyk Workaround for servers having issue with 64k high watermark is to just use a different (lower value) e.g.:

var stream = sftp.createReadStream('directory/file.zip', { highWaterMark: 32 * 1024 });

as per

https://gist.github.com/BorePlusPlus/c9186c8feb9902f84da3e35f313a43a8#file-sftp-download-js-L18

cherchyk · 2017-01-26T19:11:46Z

Thanks @BorePlusPlus ,

why this is not default setting?
in most cases we don't know how big is the file. 5 or 500mb

BorePlusPlus · 2017-01-30T16:31:49Z

@cherchyk Well, from conversation above, you can read that this is not the default setting as it is a bit arbitrary. There may as well be servers out there where 32k high water mark wouldn't be a right setting. Essentially it is a server side issue.
On the other hand, it does seem like lowering the high water mark usually solves the issue, so it just might be pragmatic to have lower default, but this is not my call. I am just a user.

Regarding file size. If your server works with lower setting, then it doesn't matter how big your files are.

mscdex · 2017-05-28T23:24:22Z

Closing this for now. If someone is able to create a setup that I can use to easily reproduce the problem locally (e.g. docker setup or otherwise), then I am willing to give that a shot. Otherwise there is not much that I can do at this point. The highWaterMark is already configurable if you want to override the default.

orhankutlu · 2017-10-26T12:01:55Z

@mscdex i am having the same issue.

@BorePlusPlus i tried to set highWaterMark to 32 * 1024, still the same. last bit is always dropped.. I am really stuck. Do you have any suggestion?

paulroth3d mentioned this pull request Jul 14, 2016

When downloading a large file over sftp, the 65536th byte is always dropped. #43

Closed

mscdex closed this May 28, 2017

holsdjakne mentioned this pull request Sep 5, 2018

ReadStream._read (used by createReadStream) ignores number of bytes actually received #109

Open

mofux mentioned this pull request Nov 7, 2018

Read streams appear to drop data mscdex/ssh2#739

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FIX: correct highWaterMark default issue for large sftp downloads. #42

FIX: correct highWaterMark default issue for large sftp downloads. #42

paulroth3d commented Jul 14, 2016

mscdex commented Jul 14, 2016

paulroth3d commented Jul 14, 2016

mscdex commented Jul 14, 2016

paulroth3d commented Jul 15, 2016

mscdex commented Jul 28, 2016

mscdex commented Jul 31, 2016

BorePlusPlus commented Sep 23, 2016 •

edited

Loading

mscdex commented Sep 24, 2016

BorePlusPlus commented Sep 26, 2016

mscdex commented Sep 26, 2016 •

edited

Loading

BorePlusPlus commented Sep 26, 2016 •

edited

Loading

mscdex commented Sep 26, 2016

evansnicholas commented Oct 10, 2016

mscdex commented Oct 10, 2016

evansnicholas commented Oct 10, 2016

mscdex commented Oct 10, 2016

cherchyk commented Dec 22, 2016

mscdex commented Dec 22, 2016 •

edited

Loading

cherchyk commented Dec 26, 2016

mscdex commented Dec 26, 2016

cherchyk commented Dec 26, 2016 via email

BorePlusPlus commented Jan 26, 2017

cherchyk commented Jan 26, 2017 •

edited

Loading

BorePlusPlus commented Jan 30, 2017 •

edited

Loading

mscdex commented May 28, 2017

orhankutlu commented Oct 26, 2017

FIX: correct highWaterMark default issue for large sftp downloads. #42

FIX: correct highWaterMark default issue for large sftp downloads. #42

Conversation

paulroth3d commented Jul 14, 2016

mscdex commented Jul 14, 2016

paulroth3d commented Jul 14, 2016

mscdex commented Jul 14, 2016

paulroth3d commented Jul 15, 2016

mscdex commented Jul 28, 2016

mscdex commented Jul 31, 2016

BorePlusPlus commented Sep 23, 2016 • edited Loading

mscdex commented Sep 24, 2016

BorePlusPlus commented Sep 26, 2016

mscdex commented Sep 26, 2016 • edited Loading

BorePlusPlus commented Sep 26, 2016 • edited Loading

mscdex commented Sep 26, 2016

evansnicholas commented Oct 10, 2016

mscdex commented Oct 10, 2016

evansnicholas commented Oct 10, 2016

mscdex commented Oct 10, 2016

cherchyk commented Dec 22, 2016

mscdex commented Dec 22, 2016 • edited Loading

cherchyk commented Dec 26, 2016

mscdex commented Dec 26, 2016

cherchyk commented Dec 26, 2016 via email

BorePlusPlus commented Jan 26, 2017

cherchyk commented Jan 26, 2017 • edited Loading

BorePlusPlus commented Jan 30, 2017 • edited Loading

mscdex commented May 28, 2017

orhankutlu commented Oct 26, 2017

BorePlusPlus commented Sep 23, 2016 •

edited

Loading

mscdex commented Sep 26, 2016 •

edited

Loading

BorePlusPlus commented Sep 26, 2016 •

edited

Loading

mscdex commented Dec 22, 2016 •

edited

Loading

cherchyk commented Jan 26, 2017 •

edited

Loading

BorePlusPlus commented Jan 30, 2017 •

edited

Loading