Allow configurable read and write packet length for SFTP #1320

Closed
wants to merge 2 commits into from

Conversation

claytongulick

While it's not a complete solution, this helps with the poor throughput seen when using createReadStream()
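The idea is to let callers opt in to larger SFTP read/write requests when they know their server tolerates them, roughly like the sketch below (the option name here is only illustrative of the proposal, not an existing or merged API):

// Illustrative sketch only: `readPacketSize` stands in for whatever the
// proposed read packet length option would end up being called; it is not
// part of the published ssh2 API.
const { readFileSync } = require('fs');
const { Client } = require('ssh2');

const client = new Client();
client.on('ready', () => {
	client.sftp((err, sftp) => {
		if (err) throw err;
		const stream = sftp.createReadStream('/path/to/remote/file', {
			readPacketSize: 256 * 1024 // request larger reads than the built-in cap
		});
		stream.pipe(process.stdout);
		stream.on('close', () => client.end());
	});
});
client.connect({ host: 'example.com', username: 'user', privateKey: readFileSync('/path/to/key') });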

mscdex (Owner) commented Aug 10, 2023

I'm not fond of this kind of change, as it's not safe to just use whatever values you want. The read and write limit behavior is copied from OpenSSH, so if OpenSSH exhibits significantly faster speeds with the same cryptographic algorithms and same protocol (e.g. ensuring SFTP is actually used and not SCP or something), then perhaps the existing logic should be tweaked instead.

claytongulick (Author) commented Aug 10, 2023

I hear you, but I'm not sure what other options there are. The difference between a 0.3 MB/s and a 3 MB/s transfer rate is the difference between a job that takes 2 hours and one that takes 20 hours for me.

Other libraries also expose this as an option, while acknowledging that it's unsafe.

Unless you can think of a better way to address the slow transfer rate problem with a readable stream?

mscdex (Owner) commented Aug 10, 2023

If using .exec() is an option, have you tried simply doing something like .exec('cat /path/to/file', ...) and seeing how performance is with that?
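For reference, that looks something like this with ssh2 (host, user, and paths are placeholders):

const { createWriteStream, readFileSync } = require('fs');
const { Client } = require('ssh2');

const client = new Client();
client.on('ready', () => {
	// Stream the remote file over a plain exec channel instead of SFTP.
	client.exec('cat /path/to/file', (err, stream) => {
		if (err) throw err;
		stream.pipe(createWriteStream('local-copy'))
			.on('finish', () => client.end());
	});
});
client.connect({ host: 'example.com', username: 'user', privateKey: readFileSync('/path/to/key') });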

claytongulick (Author) commented Aug 10, 2023

Unfortunately, the server I'm downloading from is SFTP only. It's about 200 GB of healthcare files that need to be parsed and processed, so the download speed is a big deal.

ssh2/sftp is the perfect solution because I can use streams to parse/filter the files "in-flight" without needing to store them to disk and then re-read them.

Honestly, fastGet would be fine and I could just use the step function if I could have a data delivery order guarantee. The data is in ndjson, so I need to split on lines and then parse JSON. Out-of-order chunks make that impossible when using just the step function and fastXfer.

If fastXfer were able to "slot fill" a contiguous buffer and call "step" once the full buffer was filled, it would make life brilliant. At that point, it could just be used as a really fast readable stream too, I guess.
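Roughly, the "slot fill" idea looks like the sketch below: issue several reads in parallel at fixed offsets, but only hand chunks downstream once everything before them has arrived. It's illustrative only (connection setup omitted, chunk size and concurrency arbitrary, short reads and repeated-error cases not handled):

// Sketch of in-order delivery on top of parallel SFTP reads. `sftp` is an
// already-opened ssh2 SFTP session; the values below are arbitrary, not tuned.
const CHUNK_SIZE = 32 * 1024;
const CONCURRENCY = 16;

function orderedRead(sftp, remotePath, onChunk, onDone) {
	sftp.open(remotePath, 'r', (err, handle) => {
		if (err) return onDone(err);
		sftp.fstat(handle, (err, stats) => {
			if (err) return onDone(err);
			const total = stats.size;
			const pending = new Map(); // slot index -> Buffer, filled but not yet emitted
			let nextToIssue = 0;       // next slot to request
			let nextToEmit = 0;        // next slot to hand downstream
			let inFlight = 0;

			const issue = () => {
				while (inFlight < CONCURRENCY && nextToIssue * CHUNK_SIZE < total) {
					const slot = nextToIssue++;
					const position = slot * CHUNK_SIZE;
					const length = Math.min(CHUNK_SIZE, total - position);
					const buf = Buffer.alloc(length);
					inFlight++;
					sftp.read(handle, buf, 0, length, position, (err, bytesRead) => {
						if (err) return onDone(err);
						inFlight--;
						pending.set(slot, buf.slice(0, bytesRead));
						// Emit every contiguous chunk we now have, in order.
						while (pending.has(nextToEmit)) {
							onChunk(pending.get(nextToEmit));
							pending.delete(nextToEmit);
							nextToEmit++;
						}
						if (nextToEmit * CHUNK_SIZE >= total && inFlight === 0)
							return sftp.close(handle, onDone);
						issue();
					});
				}
			};
			issue();
		});
	});
}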

mscdex (Owner) commented Aug 13, 2023

Have you benchmarked against using OpenSSH's SFTP client to see how it compares to ssh2 without modifying the packet length?

mofux commented Sep 11, 2023

I'm also seeing a very slow transfer speed on SFTP, so I tried benchmarking the speed of ssh2 against the native ssh / sftp commands. These were run against a fairly old target (OpenSSH_7.9p1 Debian-10+deb10u1, OpenSSL 1.1.1d 10 Sep 2019) transferring a 50MB file:

native exec: 2.171s
ssh2 exec: 2.702s
native sftp: 29.818s
ssh2 sftp: 19.076s

So, executing a cat <file> on the target using exec is quick with both the native ssh client and ssh2. Transferring the file over sftp, however, is very slow with both the native client and ssh2, although ssh2 still outperforms the native client, which was a surprise to me.

Here is the script I've used, in case someone else wants to try (adjust the variables at the top to match your target):

// ADJUST
const file = '</path/to/file/on/server>';
const keyFile = '</path/to/local/privatekey>';
const user = '<remote-user>';
const host = '<remote-hostname>';
// END ADJUST

const { spawn } = require('child_process');
const { createWriteStream, readFileSync } = require('fs');
const { join } = require('path');
const { Client } = require('ssh2');
const { pipeline } = require('stream');

const runNativeExec = () => new Promise((resolve) => {
	console.time('native exec');
	const process = spawn('/usr/bin/ssh', [`${user}@${host}`, 'cat', `"${file}"`], { shell: true });
	process.stderr.on('data', (data) => console.log(data.toString()));
	pipeline(process.stdout, createWriteStream(join(__dirname, 'ssh-exec-native')), () => {
		console.timeEnd('native exec');
		resolve();
	});
});

const runNativeSftp = () => new Promise((resolve) => {
	console.time('native sftp');
	const process = spawn('/usr/bin/sftp', ['-q', '-R 1', '-s sftp', `${user}@${host}:${file}`, 'ssh-sftp-native'], { shell: true });
	process.stderr.on('data', (data) => console.log(data.toString()));
	process.on('close', () => {
		console.timeEnd('native sftp');
		resolve();
	});
});

const runSsh2Exec = () => new Promise((resolve) => {
	console.time('ssh2 exec');
	const client = new Client();
	client.on('ready', () => {
		client.exec(`cat "${file}"`, { pty: true }, (err, process) => {
			pipeline(process, createWriteStream(join(__dirname, 'ssh-ssh2-exec')), () => {
				console.timeEnd('ssh2 exec');
				client.end();
				resolve();
			});
		});
	});
	client.connect({
		host,
		username: user,
		privateKey: readFileSync(keyFile)
	});
});

const runSsh2Sftp = () => new Promise((resolve) => {
	console.time('ssh2 sftp');
	const client = new Client();
	client.on('ready', () => {
		client.sftp((err, sftp) => {
			pipeline(sftp.createReadStream(file), createWriteStream(join(__dirname, 'ssh-ssh2-sftp')), () => {
				console.timeEnd('ssh2 sftp');
				client.end();
				resolve();
			});
		});
	});
	client.connect({
		host,
		username: user,
		privateKey: readFileSync(keyFile)
	});
});

(async () => {
	await runNativeExec();
	await runSsh2Exec();
	await runNativeSftp();
	await runSsh2Sftp();
})();

Note: it also writes the transferred files to the folder containing your test script, so you can verify their size.

mofux commented Sep 11, 2023

I found this Stack Overflow answer, which gave me some peace of mind by explaining these huge differences in transfer speeds: https://stackoverflow.com/a/21097076

@claytongulick claytongulick closed this by deleting the head repository Oct 10, 2023