Fix ReadAheadRemoteFileInputStream not reading the whole file if a buffer is too big #769
Conversation
Fix ReadAheadRemoteFileInputStream not reading the whole file if a buffer is too big

If an instance of ReadAheadRemoteFileInputStream before this change is wrapped into a BufferedInputStream with a big buffer, the SSH client requests big packets from the server. It turned out that if the server sent a response smaller than requested, the client would not adjust to the decreased window size and would read the file incorrectly.

This change detects cases when the server is not able to fulfil the client's requests. With this change, the client adjusts the maximum request length, sends new read-ahead requests, and ignores all read-ahead requests sent earlier.

Just specifying some allegedly small constant buffer size would not have helped in all possible cases. There is no way for a client to explicitly query the maximum request length, and the limits differ from server to server. For instance, OpenSSH defines SFTP_MAX_MSG_LENGTH as 256 * 1024, while Apache SSHD defines MAX_READDATA_PACKET_LENGTH as 63 * 1024 and allows that size to be redefined.

Interestingly, a similar issue hierynomus#183 was fixed many years ago, but the bug was actually in the code introduced for that fix.
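For context, here is a minimal sketch of the usage pattern that triggers the bug. The host, credentials, path, and buffer sizes are placeholders; the wrapping itself mirrors sshj's public API:

```java
import java.io.BufferedInputStream;
import java.io.InputStream;

import net.schmizz.sshj.SSHClient;
import net.schmizz.sshj.sftp.RemoteFile;
import net.schmizz.sshj.sftp.SFTPClient;

public class ReadAheadRepro {
    public static void main(String[] args) throws Exception {
        try (SSHClient ssh = new SSHClient()) {
            ssh.loadKnownHosts();
            ssh.connect("example.com");    // placeholder host
            ssh.authPublickey("user");     // placeholder credentials
            try (SFTPClient sftp = ssh.newSFTPClient();
                 RemoteFile file = sftp.getSFTPEngine().open("/tmp/big-file")) {
                // A 512 KiB buffer exceeds e.g. OpenSSH's 256 KiB SFTP_MAX_MSG_LENGTH,
                // so each read() asks for more than the server can return in one packet.
                try (InputStream in = new BufferedInputStream(
                        file.new ReadAheadRemoteFileInputStream(16), 512 * 1024)) {
                    byte[] buf = new byte[512 * 1024];
                    while (in.read(buf) != -1) {
                        // before this fix, bytes after the first short response
                        // could come back from the wrong offset
                    }
                }
            }
        }
    }
}
```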
Codecov Report
@@ Coverage Diff @@
## master #769 +/- ##
============================================
+ Coverage 63.93% 64.37% +0.44%
- Complexity 1452 1458 +6
============================================
Files 211 211
Lines 8617 8615 -2
Branches 811 806 -5
============================================
+ Hits 5509 5546 +37
+ Misses 2677 2645 -32
+ Partials 431 424 -7
After writing this simple piece of code, I stumbled upon these questions: why do I remove already received data chunks when the window is adjusted? Why not reuse them later? The answer came after a few hours of hacking. Indeed, it would be cool to build something as sophisticated as an interval tree. However, if we don't remove the data chunks that can be used only after many other smaller requests, the size of the unconfirmed reads will be equal to …
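For illustration, a hypothetical sketch of that simple approach. The field names and the UnconfirmedRead accessors here are made up for the example, not the PR's actual code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ReadAheadWindow {
    // Illustrative stand-in for the PR's UnconfirmedRead.
    static final class UnconfirmedRead {
        final long offset;
        final int length;
        UnconfirmedRead(long offset, int length) { this.offset = offset; this.length = length; }
    }

    private final Deque<UnconfirmedRead> unconfirmedReads = new ArrayDeque<>();
    private int maxReadLength = 1024 * 1024; // shrinks on the first short response
    private long requestOffset;              // next offset to read ahead from

    void onResponse(UnconfirmedRead read, int receivedLength) {
        if (receivedLength < read.length) {
            // The server can't serve requests this large: remember its limit,
            // drop everything requested under the old limit, and re-request
            // from right after the bytes we actually received.
            maxReadLength = receivedLength;
            unconfirmedReads.clear();
            requestOffset = read.offset + receivedLength;
        }
    }
}
```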
private final byte[] b = new byte[1];
  private final int maxUnconfirmedReads;
  private final long readAheadLimit;
- private final Queue<Promise<Response, SFTPException>> unconfirmedReads = new LinkedList<Promise<Response, SFTPException>>();
- private final Queue<Long> unconfirmedReadOffsets = new LinkedList<Long>();
+ private final Queue<UnconfirmedRead> unconfirmedReads = new LinkedList<>();
JdkObsolete: It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not. (details)
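If anyone wants the suggested swap, it is a one-line change. A sketch of the fragment, safe here only because the queue never holds null elements:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Drop-in replacement for the field above; ArrayDeque rejects nulls,
// but no null is ever enqueued in this class.
private final Queue<UnconfirmedRead> unconfirmedReads = new ArrayDeque<>();
```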
this.readAheadLimit = readAheadLimit > 0 ? fileOffset + readAheadLimit : Long.MAX_VALUE;
}
private ByteArrayInputStream pending = new ByteArrayInputStream(new byte[0]);
  private boolean retrieveUnconfirmedRead(boolean blocking) throws IOException {
-     if (unconfirmedReads.size() <= 0) {
+     final UnconfirmedRead unconfirmedRead = unconfirmedReads.peek();
+     if (unconfirmedRead == null || !blocking && !unconfirmedRead.getPromise().isDelivered()) {
OperatorPrecedence: Use grouping parenthesis to make the operator precedence explicit (details)
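The grouping the bot asks for would look like this. Behavior is unchanged, since && already binds tighter than ||; the parentheses just make that explicit:

```java
if (unconfirmedRead == null || (!blocking && !unconfirmedRead.getPromise().isDelivered())) {
    // ... body unchanged
}
```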
Thanks for the PR @vladimirlagunov. High quality as always!
…oteFile Due to a bug in the logic introduced by hierynomus#769, RemoteFile.ReadAheadRemoteFileInputStream started to send new read-ahead requests for file parts that had already been requested. Every call to read() asked the server to send parts of the file starting from the point that had already been downloaded; instead, it should have asked for parts after the last requested part. This commit adds exactly that logic. The bug didn't cause content corruption; it only affected performance, on both servers and clients.
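Conceptually, the fix amounts to tracking the end of the last requested window separately from the current download position. A hypothetical sketch; names like requestOffset, maxReadLength, and sendReadAhead are illustrative, not sshj's actual members:

```java
// Issue read-ahead requests strictly after the last one already in flight.
private void fillReadAheadQueue() {
    while (unconfirmedReads.size() < maxUnconfirmedReads
            && requestOffset < readAheadLimit) {
        // The buggy version restarted from the current download position on
        // every read(), re-requesting byte ranges that were already in flight.
        unconfirmedReads.add(sendReadAhead(requestOffset, maxReadLength));
        requestOffset += maxReadLength; // advance past the new request
    }
}
```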