
Implement TCP Window Scaling #253

Closed
wants to merge 2 commits into from

Conversation

jhwgh1968
Contributor

@jhwgh1968 jhwgh1968 commented Jul 9, 2018

This should close #106 once I put in a couple unit tests (EDIT: they're done).

I would like an early review and some independent testing, because I suspect this PR may cause future issues.

I did testing with the server example using buffers that were 10, 100, and 1000 times bigger, and did repeated 200 MB file transfers. The expanded windows did seem to behave correctly, and the file was correctly assembled at the other end.

However, my testing also suggested that this increased "speed limit" has made smoltcp quite timing-sensitive, and is exposing unrelated edge cases and/or timing bugs.

The most annoying was that, on occasion, it would send a duplicate ACK every 20 microseconds between two received segments, for reasons unknown. Another was rarer, but more ominous: if segments arrived at a high enough rate while the CPU was starved, they would seem to get lost, and the Assembler would treat those that made it through as "out of order". If the CPU remained starved for long enough, it was possible to exceed the Assembler's tracking limit and hit a panic.

Both problems were intermittent, and I could not troubleshoot them because adding debug code made them go away. 😒

Also, changes were needed to allow rx_buffer > TCP window, even before any of the actual window sizing changes. It makes me wonder if there are more rx_buffer == TCP window baked-in assumptions that I missed somewhere. I hope not. 🤞

And finally, it seems no one has run cargo bench lately, or they would have discovered that my last patch broke it. 😊 That's fixed in this patch as well.

// The window field [...] of every outgoing segment, with the exception of SYN
// segments, is right-shifted by [advertised scale value] bits[...]
reply_repr.window_len =
cmp::min(self.rx_buffer.window() >> self.remote_win_shift as usize,

I think self.rx_buffer.window() >> self.remote_win_shift as usize ought to exist as a function. This would also make it easy to verify that the unscaled window is not used anywhere it shouldn't be.
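For illustration, the suggested helper might look like this minimal Rust sketch. The names `rx_window`, `remote_win_shift`, and `scaled_window` echo the quoted diff, but the free-function form and signature are assumptions, not the actual smoltcp code:

```rust
// Hypothetical helper wrapping the scaling in one place, so the unscaled
// window is never advertised directly. Per RFC 1323, every outgoing
// non-SYN segment carries the receive window right-shifted by the
// negotiated scale value, capped to the 16-bit window field.
fn scaled_window(rx_window: usize, remote_win_shift: u8) -> u16 {
    (rx_window >> remote_win_shift as usize).min(65535) as u16
}

fn main() {
    // A 6553500-byte buffer with a shift of 7, as in the tests below
    println!("{}", scaled_window(6553500, 7)); // advertises 51199
}
```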

@whitequark
Contributor

LGTM, looking forward to the unit tests.

@jhwgh1968 jhwgh1968 changed the title from “[WIP] Implement TCP Window Scaling” to “Implement TCP Window Scaling” Jul 22, 2018
@jhwgh1968
Contributor Author

I believe I have fixed everything, @whitequark.

@jhwgh1968
Contributor Author

Since @whitequark elsewhere said that he actually looks at the full diff, I thought I might as well squash these and skip a step if he approves.

@jhwgh1968
Contributor Author

Now that I've finished the unit tests, I'm trying to characterize these timing issues more.

They are all reproduced by transferring a 200 MB file into the server example's sink, with a buffer size of 6553500 bytes. I used an old gparted ISO, which was exactly 255,852,544 bytes.

Today, I managed to reproduce a too-fast TCP timeout:

[1532887156.197s] (server): tcp:6971 recv 8 octets
[1532887156.744s] <- EthernetII src=8a-0f-8b-8c-c4-97 dst=02-00-00-00-00-01 type=IPv4
                \ IPv4 src=192.168.69.100 dst=192.168.69.1 proto=TCP
                 \ TCP src=50472 dst=6971 seq=3181168926 ack=1290742821 win=229 len=0
[1532887156.744s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:50472: received a keep-alive or window probe packet, will send an ACK
[1532887156.744s] -> EthernetII src=02-00-00-00-00-01 dst=8a-0f-8b-8c-c4-97 type=IPv4
                \ IPv4 src=192.168.69.1 dst=192.168.69.100 proto=TCP
                 \ TCP src=6971 dst=50472 seq=1290742821 ack=3181168927 win=0 len=0
[1532887156.744s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:50472: timeout exceeded
[1532887156.744s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:50472: state=ESTABLISHED=>CLOSED
[1532887156.744s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:50472: outgoing segment will abort connection
[1532887156.744s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:50472: sending RST|ACK

Unfortunately, I cannot capture the complete log, because /tmp can't hold both it and the wireshark capture.

According to wireshark, that last ACK and the RST are separated by 15 microseconds. Prior to that were the following events:

44.179957 sender sends the first segment of a burst, seq=176882179, len=1446
44.180065 sender sends the last segment of a burst, seq=176942911, len=1446
44.180066 sender transfers 94 bytes, seq=176944357, which fills the receiver's window
44.201210 receiver sends ACK 176944451, indicates window length 0
44.747640 sender sends keep-alive (546 ms after zero window ack)
44.748426 receiver sends identical ACK, window length still zero
44.748841 receiver sends RST/ACK (4 ms after last ack)

It seems that if the application drains the buffer slowly, there is a way for the connection to time out too quickly. Any idea why this is? Shouldn't that keep-alive packet have kept it alive?

@whitequark whitequark left a comment

Thanks for this pull request! The changes seem fine to me now. However, I don't think that just one unit test is enough to verify all the added functionality. Could you please add a unit test exercising a scaled window for every window-size expression that you've changed?

@pothos
Contributor

pothos commented Aug 6, 2018

@jhwgh1968 In my tests it works well, except for buffer sizes which lead to a situation where a socket can't connect. Your test value 262143 works, 524288 does not, 1000000 works, 1048576 does not, etc… I haven't looked into the detailed socket state yet, but maybe you already have an idea?

correct behavior for e.g. buffer size 2007152
TRACE 2018-08-06T07:46:44Z: smoltcp::socket::tcp: #0:0.0.0.0:52126:1.0.0.1:80: state=CLOSED=>SYN-SENT
TRACE 2018-08-06T07:46:44Z: smoltcp::socket::tcp: #0:0.0.0.0:52126:1.0.0.1:80: outgoing segment will update window
TRACE 2018-08-06T07:46:44Z: smoltcp::socket::tcp: #0:0.0.0.0:52126:1.0.0.1:80: sending SYN
DEBUG 2018-08-06T07:46:44Z: smoltcp::iface::ethernet: address 192.168.1.1 not in neighbor cache, sending ARP request
TRACE 2018-08-06T07:46:44Z: smoltcp::socket::meta: #0: neighbor 1.0.0.1 missing, silencing until t+3.0s
TRACE 2018-08-06T07:46:44Z: smoltcp::iface::neighbor: filled 192.168.1.1 => xx-…-xx (was empty)
TRACE 2018-08-06T07:46:44Z: smoltcp::socket::meta: #0: neighbor 1.0.0.1 discovered, unsilencing
TRACE 2018-08-06T07:46:44Z: smoltcp::socket::tcp: #0:0.0.0.0:52126:1.0.0.1:80: outgoing segment will update window
TRACE 2018-08-06T07:46:44Z: smoltcp::socket::tcp: #0:0.0.0.0:52126:1.0.0.1:80: sending SYN
TRACE 2018-08-06T07:46:44Z: smoltcp::socket::tcp: #0:0.0.0.0:52126:1.0.0.1:80: received SYN|ACK

misbehavior for e.g. buffer size 2097152:
TRACE 2018-08-06T07:49:27Z: smoltcp::socket::tcp: #0:0.0.0.0:50008:1.0.0.1:80: state=CLOSED=>SYN-SENT
DEBUG 2018-08-06T07:49:30Z: smoltcp::iface::ethernet: address 192.168.1.1 not in neighbor cache, sending ARP request
DEBUG 2018-08-06T07:49:30Z: smoltcp::iface::ethernet: cannot dispatch response packet: unaddressable destination
poll result: Unaddressable

Edit: Only the value for the rx_buffer leads to this problem. Ah, obviously, tx_buffer doesn't matter.

@jhwgh1968
Contributor Author

I am able to reproduce the problem, @pothos. It seems the socket does not send data with 524288, but does with 524287.

Unfortunately, I cannot tell why. In fact, I cannot get the trace output you got by patching httpclient. What did you change to accomplish that?

@pothos
Contributor

pothos commented Aug 13, 2018

@jhwgh1968 I think e.g. for the tap interface setup as mentioned in the README it should be

cargo build --examples
RUST_LOG=trace target/debug/examples/httpclient tap0 1.1.1.1 http://1.1.1.1

@jhwgh1968
Contributor Author

I think I have added all the unit tests, @whitequark. And sure enough, I think they found your bug, @pothos.

@whitequark
Contributor

And sure enough, I think they found your bug, @pothos.

What was the bug?

@jhwgh1968
Contributor Author

What was the bug?

It was an off-by-one error in the calculation of the correct shift value, something I was suspicious of when the errors seemed to occur near powers of 2. I am not sure why that bug caused the symptoms @pothos reported, but fixing that made their problem impossible to reproduce.
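For illustration, here is a hedged sketch of the kind of computation involved (assumed logic and an invented function name, not the actual smoltcp code): pick the smallest shift such that the whole receive buffer fits in the scaled 16-bit window field. The boundary comparison is exactly where an off-by-one bites, which is why 524287 (just under 2^19) worked while 524288 did not:

```rust
// Assumed shift computation: the smallest shift such that the buffer,
// once right-shifted, fits in the 16-bit window field. RFC 1323 caps
// the shift value at 14.
fn win_shift_for(rx_buffer_len: usize) -> u8 {
    let mut shift = 0u8;
    // Writing `>= 65535` instead of `> 65535` here (or the reverse)
    // changes the result only near power-of-two boundaries -- precisely
    // the buffer sizes reported as failing.
    while shift < 14 && (rx_buffer_len >> shift) > 65535 {
        shift += 1;
    }
    shift
}

fn main() {
    println!("{} {}", win_shift_for(524287), win_shift_for(524288)); // 3 4
}
```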

I would also add some other good news: I have concluded that all of my previous issues with timing were mostly an artifact of the way I was testing. Making socket.poll() not panic on failure in the examples and temporarily changing the assembler to track more than four holes made the stack much more solid.

The sink can now handle a 200 MB file in 1.3 seconds with 6 MiB buffers, instead of 1.5 seconds with 64 KiB buffers. And that was with a number of lost segments, which wasted quite a bit of bandwidth since the Linux remote is quite over-eager with retransmits.

Without the change in the assembler, it slowed down a whole lot once smoltcp lost too many frames, but I'm presuming that's Linux' fault (congestion control, maybe?). I don't expect many users will make their receive buffers that big. I also hope implementing sACK will give smoltcp better behavior with large buffers, since it will allow the remote TCP to patch assembler holes much faster. But that's a separate PR.
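To make the hole limit concrete, here is a toy model (not smoltcp's Assembler; all names are invented for illustration): each out-of-order segment that is not adjacent to an already-tracked range opens a new hole, and once the fixed limit is reached the next such segment must be dropped, which is roughly what the starved-CPU scenario runs into.

```rust
const MAX_HOLES: usize = 4; // the historical default discussed above; the experiment raised it to 16

/// Toy tracker of sorted, disjoint (start, end) byte ranges.
struct Ranges(Vec<(usize, usize)>);

impl Ranges {
    fn new() -> Self {
        Ranges(Vec::new())
    }

    /// Tries to record [start, end); fails if the merged result would
    /// contain more than MAX_HOLES gaps (i.e. MAX_HOLES + 1 disjoint ranges).
    fn add(&mut self, start: usize, end: usize) -> Result<(), ()> {
        let mut ranges = self.0.clone();
        ranges.push((start, end));
        ranges.sort();
        // Coalesce overlapping or touching ranges.
        let mut merged: Vec<(usize, usize)> = Vec::new();
        for (s, e) in ranges {
            match merged.last_mut() {
                Some(last) if s <= last.1 => last.1 = last.1.max(e),
                _ => merged.push((s, e)),
            }
        }
        if merged.len() > MAX_HOLES + 1 {
            return Err(()); // too many holes to track; a real stack drops the segment
        }
        self.0 = merged;
        Ok(())
    }
}

fn main() {
    let mut r = Ranges::new();
    for i in 0..5 {
        r.add(i * 10, i * 10 + 1).unwrap(); // five disjoint ranges: four holes, still fine
    }
    assert!(r.add(50, 51).is_err()); // a fifth hole would exceed the limit
    assert!(r.add(1, 10).is_ok()); // filling a hole merges ranges and frees capacity
}
```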

@whitequark
Contributor

@m-labs-homu r+

@m-labs-homu

📌 Commit d23033a has been approved by whitequark

@m-labs-homu

⌛ Testing commit d23033a with merge 92e970b...

@whitequark
Contributor

@jhwgh1968 How many holes do you think we should track?

@m-labs-homu

☀️ Test successful - status-travis
Approved by: whitequark
Pushing 92e970b to master...

@jhwgh1968
Contributor Author

I got consistently good results with 16, but I didn't spend too much time tuning it.

I plan to see how much sACK improves it first, because I presume 4 was picked for a reason, e.g. memory consumption. If that is not the case, then increasing the number of holes would indeed solve the issue, as far as I can tell.

@jhwgh1968 jhwgh1968 deleted the tcp-window-scaling-phase-2 branch August 19, 2018 23:10
@whitequark
Contributor

I plan to see how much sACK improves it first, because I presume 4 was picked for a reason, e.g. memory consumption. If that is not the case, then increasing the number of holes would indeed solve the issue, as far as I can tell.

I chose 4 completely arbitrarily. :)
