Fix ReadAheadRemoteFileInputStream not reading the whole file if a buffer is too big #769
Conversation
Fix ReadAheadRemoteFileInputStream not reading the whole file if a buffer is too big

If an instance of ReadAheadRemoteFileInputStream before this change is wrapped into a BufferedInputStream with a big buffer, the SSH client requests big packets from the server. It turned out that if the server sent a response smaller than requested, the client would not adjust to the decreased window size and would read the file incorrectly.

This change detects cases when the server is not able to fulfil the client's requests. With this change, the client adjusts the maximum request length, sends new read-ahead requests, and ignores all read-ahead requests sent earlier.

Just specifying some allegedly small constant buffer size would not have helped in all possible cases. There is no way for a client to explicitly query the maximum request length, and the limits differ from server to server. For instance, OpenSSH defines SFTP_MAX_MSG_LENGTH as 256 * 1024, while Apache SSHD defines MAX_READDATA_PACKET_LENGTH as 63 * 1024 and allows that size to be redefined.

Interestingly, a similar issue hierynomus#183 was fixed many years ago, but the bug was actually in the code introduced for that fix.
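For context, here is a minimal sketch of the usage pattern that triggers the bug. The host, credentials, path, and buffer sizes are placeholders; the wrapping itself mirrors sshj's public API:

```java
import java.io.BufferedInputStream;
import java.io.InputStream;

import net.schmizz.sshj.SSHClient;
import net.schmizz.sshj.sftp.RemoteFile;
import net.schmizz.sshj.sftp.SFTPClient;

public class ReadAheadRepro {
    public static void main(String[] args) throws Exception {
        try (SSHClient ssh = new SSHClient()) {
            ssh.loadKnownHosts();
            ssh.connect("example.com");    // placeholder host
            ssh.authPublickey("user");     // placeholder credentials
            try (SFTPClient sftp = ssh.newSFTPClient();
                 RemoteFile file = sftp.getSFTPEngine().open("/tmp/big-file")) {
                // A 512 KiB buffer exceeds e.g. OpenSSH's 256 KiB SFTP_MAX_MSG_LENGTH,
                // so each read() asks for more than the server can return in one packet.
                try (InputStream in = new BufferedInputStream(
                        file.new ReadAheadRemoteFileInputStream(16), 512 * 1024)) {
                    byte[] buf = new byte[512 * 1024];
                    while (in.read(buf) != -1) {
                        // before this fix, bytes after the first short response
                        // could come back from the wrong offset
                    }
                }
            }
        }
    }
}
```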
Codecov Report
@@ Coverage Diff @@
## master #769 +/- ##
============================================
+ Coverage 63.93% 64.37% +0.44%
- Complexity 1452 1458 +6
============================================
Files 211 211
Lines 8617 8615 -2
Branches 811 806 -5
============================================
+ Hits 5509 5546 +37
+ Misses 2677 2645 -32
+ Partials 431 424 -7
After writing this simple piece of code, I stumbled upon these questions: why do I remove already received data chunks when the window is adjusted? Why not reuse them later? The answer came after a few hours of hacking. Indeed, it would be cool to build something as sophisticated as an interval tree. However, if we don't remove the data chunks that can be used only after many other smaller requests, the size of the unconfirmed reads will be equal to …
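For illustration, a hypothetical sketch of that simple approach. The field names and the UnconfirmedRead accessors here are made up for the example, not the PR's actual code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ReadAheadWindow {
    // Illustrative stand-in for the PR's UnconfirmedRead.
    static final class UnconfirmedRead {
        final long offset;
        final int length;
        UnconfirmedRead(long offset, int length) { this.offset = offset; this.length = length; }
    }

    private final Deque<UnconfirmedRead> unconfirmedReads = new ArrayDeque<>();
    private int maxReadLength = 1024 * 1024; // shrinks on the first short response
    private long requestOffset;              // next offset to read ahead from

    void onResponse(UnconfirmedRead read, int receivedLength) {
        if (receivedLength < read.length) {
            // The server can't serve requests this large: remember its limit,
            // drop everything requested under the old limit, and re-request
            // from right after the bytes we actually received.
            maxReadLength = receivedLength;
            unconfirmedReads.clear();
            requestOffset = read.offset + receivedLength;
        }
    }
}
```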
private final byte[] b = new byte[1];
  private final int maxUnconfirmedReads;
  private final long readAheadLimit;
- private final Queue<Promise<Response, SFTPException>> unconfirmedReads = new LinkedList<Promise<Response, SFTPException>>();
- private final Queue<Long> unconfirmedReadOffsets = new LinkedList<Long>();
+ private final Queue<UnconfirmedRead> unconfirmedReads = new LinkedList<>();
JdkObsolete: It is very rare for LinkedList to out-perform ArrayList or ArrayDeque. Avoid it unless you're willing to invest a lot of time into benchmarking. Caveat: LinkedList supports null elements, but ArrayDeque does not. (details)
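If anyone wants the suggested swap, it is a one-line change. A sketch of the fragment, safe here only because the queue never holds null elements:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Drop-in replacement for the field above; ArrayDeque rejects nulls,
// but no null is ever enqueued in this class.
private final Queue<UnconfirmedRead> unconfirmedReads = new ArrayDeque<>();
```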
this.readAheadLimit = readAheadLimit > 0 ? fileOffset + readAheadLimit : Long.MAX_VALUE;
}
private ByteArrayInputStream pending = new ByteArrayInputStream(new byte[0]);
  private boolean retrieveUnconfirmedRead(boolean blocking) throws IOException {
-     if (unconfirmedReads.size() <= 0) {
+     final UnconfirmedRead unconfirmedRead = unconfirmedReads.peek();
+     if (unconfirmedRead == null || !blocking && !unconfirmedRead.getPromise().isDelivered()) {
OperatorPrecedence: Use grouping parenthesis to make the operator precedence explicit (details)
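The grouping the bot asks for would look like this. Behavior is unchanged, since && already binds tighter than ||; the parentheses just make that explicit:

```java
if (unconfirmedRead == null || (!blocking && !unconfirmedRead.getPromise().isDelivered())) {
    // ... body unchanged
}
```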
Thanks for the PR @vladimirlagunov. High quality as always!
…oteFile Due to a bug in the logic introduced by hierynomus#769, RemoteFile.ReadAheadRemoteFileInputStream started to send new read-ahead requests for file parts that had already been requested. Every call to read() asked the server to send parts of the file starting from the point that had already been downloaded; instead, it should have asked for parts after the last requested part. This commit adds exactly that logic. The bug didn't cause content corruption; it only affected performance, on both servers and clients.
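Conceptually, the fix amounts to tracking the end of the last requested window separately from the current download position. A hypothetical sketch; names like requestOffset, maxReadLength, and sendReadAhead are illustrative, not sshj's actual members:

```java
// Issue read-ahead requests strictly after the last one already in flight.
private void fillReadAheadQueue() {
    while (unconfirmedReads.size() < maxUnconfirmedReads
            && requestOffset < readAheadLimit) {
        // The buggy version restarted from the current download position on
        // every read(), re-requesting byte ranges that were already in flight.
        unconfirmedReads.add(sendReadAhead(requestOffset, maxReadLength));
        requestOffset += maxReadLength; // advance past the new request
    }
}
```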