Implement TCP Window Scaling #253
Conversation
src/socket/tcp.rs
Outdated
```rust
// The window field [...] of every outgoing segment, with the exception of SYN
// segments, is right-shifted by [advertised scale value] bits[...]
reply_repr.window_len =
    cmp::min(self.rx_buffer.window() >> self.remote_win_shift as usize,
```
I think `self.rx_buffer.window() >> self.remote_win_shift as usize` ought to exist as a function. This would also make it easy to verify that the unscaled window is not used anywhere it shouldn't be.
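A standalone sketch of what such a helper might look like; `RxWindow`, its fields, and `scaled_window` are hypothetical stand-ins for the smoltcp internals, not the actual API:

```rust
/// Hypothetical stand-in for the socket's receive-window state.
struct RxWindow {
    window: usize,        // free space in the receive buffer, in bytes
    remote_win_shift: u8, // scale factor advertised in our SYN
}

impl RxWindow {
    /// The window as advertised on the wire: every outgoing segment except
    /// SYNs carries the buffer window right-shifted by the scale factor.
    fn scaled_window(&self) -> usize {
        self.window >> self.remote_win_shift as usize
    }
}

fn main() {
    let w = RxWindow { window: 524_288, remote_win_shift: 4 };
    assert_eq!(w.scaled_window(), 32_768);
}
```

Centralizing the shift would let the rest of the code ask for the scaled value by name, making any remaining use of the unscaled window easy to spot.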
LGTM, looking forward to the unit tests.
I believe I have fixed everything, @whitequark.
Since @whitequark elsewhere said that he actually looks at the full diff, I thought I might as well squash these and skip a step if he approves.
Now that I've finished the unit tests, I'm trying to characterize these timing issues more. They are all reproduced by transferring a 200 MB file into the server example's sink, with a buffer size of 6553500 bytes. I used an old gparted ISO, which was exactly 255,852,544 bytes. Today, I managed to reproduce a too-fast TCP timeout:
Unfortunately, I cannot capture the complete log, because /tmp can't hold both it and the wireshark capture. According to wireshark, that last ACK and the RST are separated by 15 microseconds. Prior to that were the following events:

44.179957  sender sends the first segment of a burst, seq=176882179, len=1446

It seems that if the buffer is slow to be drained by the application, there is a way for the connection to time out too fast. Any idea why this is? Shouldn't that keep-alive packet have kept it alive?
Thanks for this pull request! The changes seem fine to me now. However, I don't think that just one unit test is enough to verify all the added functionality. Can you please add a unit test with a scaled window for every window-size expression that you've changed?
@jhwgh1968 In my tests it works well, except for buffer sizes which lead to a situation where the socket can't connect. Your test value 262143 works, 524288 does not, 1000000 works, 1048576 does not, etc. I haven't looked into the detailed socket state yet, but maybe you already have an idea?

Edit: Only the value for the rx_buffer leads to this problem. Ah, obviously, tx_buffer doesn't matter.
I am able to reproduce the problem, @pothos. It seems the socket does not send data with 524288, but does with 524287. Unfortunately, I cannot tell why. In fact, I cannot get the trace output you got by patching httpclient. What did you change to accomplish that?

@jhwgh1968 I think e.g. for the tap interface setup as mentioned in the README it should be
I think I have added all the unit tests, @whitequark. And sure enough, I think they found your bug, @pothos.
What was the bug?
It was an off-by-one error in the calculation of the correct shift value, something I was suspicious of when the errors seemed to occur near powers of 2. I am not sure why that bug caused the symptoms @pothos reported, but fixing it made their problem impossible to reproduce.

I would also add some other good news: I have concluded that all of my previous issues with timing were mostly an artifact of the way I was testing. Making the assembler track more holes helped as well.

The sink can now handle a 200 MB file in 1.3 seconds with 6 MiB buffers, instead of 1.5 seconds with 64 KiB buffers. And that was with a number of lost segments, which wasted quite a bit of bandwidth, since the Linux remote is quite over-eager with retransmits. Without the change in the assembler, it slowed down a whole lot once smoltcp lost too many frames, but I'm presuming that's Linux's fault (congestion control, maybe?).

I don't expect many users will make their receive buffers that big. I also hope implementing SACK will give smoltcp better behavior with large buffers, since it will allow the remote TCP to patch assembler holes much faster. But that's a separate PR.
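As an illustration of how an off-by-one in the shift calculation can produce exactly this power-of-two symptom, here is a standalone sketch; the function name and logic are hypothetical stand-ins, not the actual smoltcp code:

```rust
/// Illustrative sketch, not smoltcp's code: compute the RFC 1323 window
/// scale shift for an rx buffer of `len` bytes -- the smallest shift such
/// that the scaled window fits in the 16-bit header field.
fn rx_win_shift(len: usize) -> u8 {
    let mut shift = 0u8;
    while (len >> shift) > 0xffff {
        shift += 1;
    }
    // An off-by-one here (e.g. comparing against 0x10000 instead) stops one
    // shift too early for exact powers of two: 524288 >> 3 == 65536, which
    // wraps to 0 in a 16-bit window field -- a permanently closed window,
    // matching "works with 524287 but not 524288".
    shift
}

fn main() {
    assert_eq!(rx_win_shift(65_535), 0);  // no scaling needed
    assert_eq!(rx_win_shift(524_287), 3); // 524287 >> 3 == 65535, fits
    assert_eq!(rx_win_shift(524_288), 4); // needs one more shift
}
```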
@m-labs-homu r+
📌 Commit d23033a has been approved by |
@jhwgh1968 How many holes do you think we should track? |
☀️ Test successful - status-travis |
I got consistently good results with 16, but I didn't spend too much time tuning it. I plan to see how much SACK improves it first, because I presume 4 was picked for a reason, e.g. memory consumption. If that is not the case, then increasing the number of holes would indeed solve the issue, as far as I can tell.
I chose 4 completely arbitrarily. :) |
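To make the trade-off concrete, here is a minimal hypothetical sketch of an out-of-order tracker with a fixed hole budget; none of these names are smoltcp's actual types. When more discontiguous segments arrive than there are slots, insertion fails, which is the kind of limit that surfaces as a panic when the CPU is starved and segments pile up out of order:

```rust
/// Hypothetical sketch: track out-of-order (start, end) byte ranges past
/// the contiguous prefix, with a fixed budget of discontiguous ranges.
struct HoleTracker {
    ranges: Vec<(u32, u32)>, // kept sorted and disjoint
    max_ranges: usize,
}

impl HoleTracker {
    fn new(max_ranges: usize) -> Self {
        HoleTracker { ranges: Vec::new(), max_ranges }
    }

    /// Record a segment [start, end): merge it into an overlapping or
    /// adjacent range if possible, otherwise claim a fresh slot. A full
    /// tracker has to reject the segment outright.
    fn add(&mut self, start: u32, end: u32) -> Result<(), ()> {
        for r in self.ranges.iter_mut() {
            if start <= r.1 && end >= r.0 {
                r.0 = r.0.min(start);
                r.1 = r.1.max(end);
                return Ok(());
            }
        }
        if self.ranges.len() == self.max_ranges {
            return Err(()); // too many discontiguous segments to track
        }
        self.ranges.push((start, end));
        self.ranges.sort_unstable();
        Ok(())
    }
}

fn main() {
    let mut t = HoleTracker::new(4);
    // Four discontiguous segments fill the budget...
    for i in 0..4u32 {
        assert!(t.add(i * 2000, i * 2000 + 1000).is_ok());
    }
    // ...so a fifth discontiguous segment is dropped.
    assert!(t.add(10_000, 11_000).is_err());
}
```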
This should close #106 once I put in a couple unit tests (EDIT: they're done).

I would like an early review and some independent testing, because I suspect this PR may cause future issues.
I did testing with the server example using buffers that were 10, 100, and 1000 times bigger, and did repeated 200 MB file transfers. The expanded windows did seem to behave correctly, and the file was correctly assembled at the other end.
However, my testing also suggested that this increased "speed limit" has made smoltcp quite timing-sensitive, exposing unrelated edge cases and/or timing bugs.
The most annoying was when, on occasion, it would send a duplicate ACK every 20 microseconds between two received segments, for reasons unknown. Another was rarer, but ominous: if segments arrived at a high enough speed while the CPU was starved, they would seem to get lost, and the Assembler would read those that made it as "out of order". If the CPU remained starved for long enough, it was possible to exceed its tracking limit and hit a panic.
Both problems were intermittent, and I could not troubleshoot them because adding debug code made them go away. 😒
Also, changes were needed to allow rx_buffer > TCP window, even before any of the actual window sizing changes. It makes me wonder if there are more rx_buffer == TCP window baked-in assumptions that I missed somewhere. I hope not. 🤞
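A minimal sketch of the kind of decoupling this implies, assuming only that the header's window field is 16 bits (the function name is hypothetical, not the actual change):

```rust
/// Hypothetical sketch: once the receive buffer may exceed 65535 bytes,
/// the advertised window must be clamped to what the 16-bit header field
/// can hold, rather than assumed to fit.
fn window_field(scaled_window: usize) -> u16 {
    scaled_window.min(0xffff) as u16
}

fn main() {
    assert_eq!(window_field(32_768), 32_768);  // fits as-is
    assert_eq!(window_field(100_000), 0xffff); // clamped, not wrapped
}
```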
And finally, it seems no one tried to run `cargo bench` lately, or they would have discovered my last patch broke it. 😊 That's fixed in this patch as well.