
perf: pipeline SFTP requests for upload/download (~2-3x speedup)#196

Open
Yaminyam wants to merge 2 commits into lablup:main from Yaminyam:feat/sftp-request-pipelining

Conversation


@Yaminyam Yaminyam commented May 6, 2026

Summary

The high-level `AsyncWrite`/`AsyncRead` impls on `File` issue exactly one SFTP `WRITE`/`READ` at a time and `await` its `STATUS`/`DATA` reply before sending the next. Sustained throughput is therefore bounded by `chunk_size / RTT` — at 50 ms RTT with the default 256 KiB chunk that caps a single transfer at ~5 MiB/s no matter how fast the link is. This is `#3` in the SFTP-stack analysis ("the largest unrealized optimization").

This PR adds two pipelined helpers on `File` that keep up to N SFTP requests in flight concurrently, mirroring OpenSSH `sftp(1)`'s default of `-R 64`.

Changes

  • `crates/bssh-russh-sftp/Cargo.toml`: add `futures = "0.3"` (`std` + `async-await` features only) for `FuturesUnordered`.
  • `crates/bssh-russh-sftp/src/client/fs/file.rs`: two new public methods on `File`:
    • `write_all_pipelined<R: AsyncRead>(reader, max_inflight) -> SftpResult`: reads chunks from `reader` and dispatches `session.write(handle, offset, chunk)` futures via `FuturesUnordered`, topping up the pipeline as in-flight writes complete. Memory is bounded by `max_inflight * write_len`.
    • `read_to_writer_pipelined<W: AsyncWrite>(writer, max_inflight) -> SftpResult`: symmetric for downloads. Out-of-order `READ` responses are buffered in a `BTreeMap` keyed by offset and flushed to `writer` once the next-expected chunk arrives. Stops on the first server-signalled `Eof`.
  • `src/ssh/tokio_client/file_transfer.rs`: rewire `upload_file`/`download_file`/`upload_dir_recursive`/`download_dir_recursive` to use the new helpers with `MAX_INFLIGHT_REQUESTS = 64`. Replaces the previous `tokio::fs::read` whole-file load and `read_to_end` pooled-buffer pattern that #195 ("perf: stream SFTP uploads/downloads instead of buffering whole file") also targets.
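The download-side reordering described above can be modelled with std types only (an illustrative sketch — `ReorderBuffer` is not the actual type in `file.rs`, just the same BTreeMap-keyed-by-offset idea):

```rust
use std::collections::BTreeMap;

/// Minimal model of the download reorder buffer: out-of-order READ replies
/// arrive as (offset, data) pairs; chunks are released to the writer strictly
/// in offset order. (Sketch only, not the code in file.rs.)
struct ReorderBuffer {
    pending: BTreeMap<u64, Vec<u8>>, // buffered out-of-order chunks, keyed by offset
    next_offset: u64,                // next byte offset the writer expects
}

impl ReorderBuffer {
    fn new() -> Self {
        Self { pending: BTreeMap::new(), next_offset: 0 }
    }

    /// Accept one (possibly out-of-order) chunk and return every chunk that
    /// is now contiguous with the write position, in order.
    fn push(&mut self, offset: u64, data: Vec<u8>) -> Vec<Vec<u8>> {
        self.pending.insert(offset, data);
        let mut ready = Vec::new();
        // Drain while the smallest buffered offset is exactly next_offset.
        while let Some(data) = self.pending.remove(&self.next_offset) {
            self.next_offset += data.len() as u64;
            ready.push(data);
        }
        ready
    }
}

fn main() {
    let mut buf = ReorderBuffer::new();
    // Chunk at offset 4 arrives first: nothing can be flushed yet.
    assert!(buf.push(4, vec![4, 5, 6, 7]).is_empty());
    // Chunk at offset 0 arrives: both chunks flush, in order.
    let ready = buf.push(0, vec![0, 1, 2, 3]);
    assert_eq!(ready, vec![vec![0, 1, 2, 3], vec![4, 5, 6, 7]]);
    assert_eq!(buf.next_offset, 8);
}
```

The map holds at most `max_inflight` entries, which is what keeps the reorder buffer's memory bounded.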

Measured impact (macOS arm64 → bssh-server v2.1.3, loopback, 1 GiB)

| op       | build                  | real    | RSS      |
|----------|------------------------|---------|----------|
| upload   | vanilla v2.1.3         | 39.30s  | 3.23 GB  |
| upload   | streaming-only (#195)  |  3.47s  |   20 MB  |
| upload   | streaming + pipelined  |  2.27s  |   49 MB  |
| download | vanilla v2.1.3         |  3.93s  | 2.17 GB  |
| download | streaming-only (#195)  |  3.41s  |   16 MB  |
| download | streaming + pipelined  |  1.34s  |  288 MB  |

Pipelining adds +53% upload throughput and +155% download throughput on top of the streaming patch. End-to-end vs. vanilla v2.1.3: upload 17× faster (27 → 451 MiB/s), download 2.9× faster (261 → 764 MiB/s). Peak RSS stays well below the unpatched levels even with 64 in-flight chunks.

Notes

  • This stacks on top of #195 ("perf: stream SFTP uploads/downloads instead of buffering whole file"). If #195 lands first, the streaming changes here become a small rebase; if not, this PR also includes the streaming improvement (dropping `tokio::fs::read` and `read_to_end`).
  • `max_inflight = 64` matches OpenSSH's default. A future enhancement could expose this on the public Client API for users who want to tune for very high-RTT links (would benefit from larger N) or memory-constrained clients (smaller N).
  • Memory cap on the download path: the `BTreeMap` reorder buffer holds at most `max_inflight` chunks (~16 MiB). The 288 MB peak observed in the table includes kernel page cache and tokio fs write buffers; sustained working set during the transfer stays bounded.
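The ~16 MiB reorder-buffer cap mentioned above is just `max_inflight * chunk_size` (a one-line check using the defaults from this PR):

```rust
fn main() {
    let max_inflight: u64 = 64;       // matches OpenSSH sftp(1)'s -R default
    let chunk_size: u64 = 256 * 1024; // 256 KiB default chunk
    let cap = max_inflight * chunk_size;
    assert_eq!(cap, 16 * 1024 * 1024); // 16 MiB worst-case reorder buffer
}
```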

Test plan

  • `cargo check` clean
  • `cargo build --release` clean
  • 1 GiB upload via patched client to vanilla `bssh-server` v2.1.3; file integrity verified (size + md5 match source)
  • 1 GiB download verified
  • CI / cargo test on PR

The high-level `AsyncWrite`/`AsyncRead` impls on `File` issue exactly
one SFTP `WRITE`/`READ` at a time and `await` its `STATUS`/`DATA` reply
before sending the next. Sustained throughput is therefore bounded by
`chunk_size / RTT` — at 50 ms RTT with the default 256 KiB chunk that
caps a single transfer at ~5 MiB/s no matter how fast the link is.

Add two pipelined helpers on `File` that keep up to N SFTP requests in
flight concurrently, mirroring how OpenSSH's `sftp(1)` client behaves
(`-R 64` by default):

* `File::write_all_pipelined<R: AsyncRead>(reader, max_inflight)` —
  reads chunks from `reader` and dispatches `session.write(...)` futures
  via `FuturesUnordered`, refilling the pipeline as in-flight writes
  complete. Memory bounded by `max_inflight * write_len`.
* `File::read_to_writer_pipelined<W: AsyncWrite>(writer, max_inflight)` —
  symmetric for downloads. Out-of-order responses are buffered in a
  `BTreeMap` keyed by offset and flushed to `writer` as soon as the
  next-expected chunk arrives.
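The bounded-window behaviour — keep up to `max_inflight` requests outstanding, issue a new one as each completes — can be modelled synchronously with std types (a sketch; `pipelined_order` is illustrative, while the real code drives `FuturesUnordered` of `session.write` futures):

```rust
use std::collections::VecDeque;

// Synchronous model of the bounded pipeline used by the write path.
// At no point are more than `max_inflight` requests outstanding, and the
// window is topped up each time a completion is consumed.
fn pipelined_order(total_chunks: u32, max_inflight: usize) -> Vec<u32> {
    let mut in_flight: VecDeque<u32> = VecDeque::new();
    let mut next = 0u32;
    let mut completed = Vec::new();
    while completed.len() < total_chunks as usize {
        // Top up the window with new requests.
        while in_flight.len() < max_inflight && next < total_chunks {
            in_flight.push_back(next);
            next += 1;
        }
        // "Await" one completion (FIFO here; real replies may be out of order).
        completed.push(in_flight.pop_front().unwrap());
    }
    completed
}

fn main() {
    // All 10 chunks complete exactly once, in order, with a window of 4.
    assert_eq!(pipelined_order(10, 4), (0..10).collect::<Vec<_>>());
}
```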

Wire `Client::upload_file`/`download_file`/`upload_dir_recursive`/
`download_dir_recursive` to use the new helpers with `MAX_INFLIGHT_REQUESTS = 64`.

Measured on macOS arm64 against `bssh-server` v2.1.3 on loopback with a
1 GiB file:

| op       | build                  | real    | RSS      |
|----------|------------------------|---------|----------|
| upload   | vanilla v2.1.3         | 39.30s  | 3.23 GB  |
| upload   | streaming-only         |  3.47s  |   20 MB  |
| upload   | streaming + pipelined  |  2.27s  |   49 MB  |
| download | vanilla v2.1.3         |  3.93s  | 2.17 GB  |
| download | streaming-only         |  3.41s  |   16 MB  |
| download | streaming + pipelined  |  1.34s  |  288 MB  |

Pipelining adds ~+53% on upload and ~+155% on download throughput on top
of the streaming patch (which already eliminated the whole-file load).
Peak RSS stays well below the unpatched levels: download holds at most
~`max_inflight` chunks pending in the reorder map, and upload caps at
`max_inflight * chunk_size + reader buffer`.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
