
Implement TCP Window Scaling #253

Closed
wants to merge 2 commits into from

Conversation

jhwgh1968
Contributor

@jhwgh1968 jhwgh1968 commented Jul 9, 2018

This should close #106 once I put in a couple unit tests (EDIT: they're done).

I would like an early review and some independent testing, because I suspect this PR may cause future issues.

I did testing with the server example using buffers that were 10, 100, and 1000 times bigger, and did repeated 200 MB file transfers. The expanded windows did seem to behave correctly, and the file was correctly assembled at the other end.

However, my testing also suggested that this increased "speed limit" has made smoltcp quite timing-sensitive, and is exposing unrelated edge cases and/or timing bugs.

The most annoying was that, on occasion, it would send a duplicate ACK every 20 microseconds between two received segments, for reasons unknown. Another was rarer, but more ominous: if segments arrived at a high enough rate while the CPU was starved, they would seem to get lost, and the Assembler would treat those that made it through as "out of order". If the CPU remained starved for long enough, it was possible to exceed the Assembler's tracking limit and hit a panic.

Both problems were intermittent, and I could not troubleshoot them because adding debug code made them go away. 😒

Also, changes were needed to allow rx_buffer > TCP window, even before any of the actual window sizing changes. It makes me wonder if there are more rx_buffer == TCP window baked-in assumptions that I missed somewhere. I hope not. 🤞

And finally, it seems no one has run cargo bench lately, or they would have discovered that my last patch broke it. 😊 That's fixed in this patch as well.

// The window field [...] of every outgoing segment, with the exception of SYN
// segments, is right-shifted by [advertised scale value] bits[...]
reply_repr.window_len =
cmp::min(self.rx_buffer.window() >> self.remote_win_shift as usize,

I think self.rx_buffer.window() >> self.remote_win_shift as usize ought to exist as a function. This would also make it easy to verify that the unscaled window is not used anywhere it shouldn't be.
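For illustration, the suggested helper might look like this minimal Rust sketch. The names `rx_window`, `remote_win_shift`, and `scaled_window` echo the quoted diff, but the free-function form and signature are assumptions, not the actual smoltcp code:

```rust
// Hypothetical helper wrapping the scaling in one place, so the unscaled
// window is never advertised directly. Per RFC 1323, every outgoing
// non-SYN segment carries the receive window right-shifted by the
// negotiated scale value, capped to the 16-bit window field.
fn scaled_window(rx_window: usize, remote_win_shift: u8) -> u16 {
    (rx_window >> remote_win_shift as usize).min(65535) as u16
}

fn main() {
    // A 6553500-byte buffer with a shift of 7, as in the tests below
    println!("{}", scaled_window(6553500, 7)); // advertises 51199
}
```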

@whitequark
Contributor

LGTM, looking forward to the unit tests.

@jhwgh1968 jhwgh1968 changed the title from “[WIP] Implement TCP Window Scaling” to “Implement TCP Window Scaling” Jul 22, 2018
@jhwgh1968
Contributor Author

I believe I have fixed everything, @whitequark.

@jhwgh1968
Contributor Author

Since @whitequark elsewhere said that he actually looks at the full diff, I thought I might as well squash these and skip a step if he approves.

@jhwgh1968
Contributor Author

Now that I've finished the unit tests, I'm trying to characterize these timing issues more.

They are all reproduced by transferring a 200 MB file into the server example's sink, with a buffer size of 6553500 bytes. I used an old gparted ISO, which was exactly 255,852,544 bytes.

Today, I managed to reproduce a too-fast TCP timeout:

[1532887156.197s] (server): tcp:6971 recv 8 octets
[1532887156.744s] <- EthernetII src=8a-0f-8b-8c-c4-97 dst=02-00-00-00-00-01 type=IPv4
                \ IPv4 src=192.168.69.100 dst=192.168.69.1 proto=TCP
                 \ TCP src=50472 dst=6971 seq=3181168926 ack=1290742821 win=229 len=0
[1532887156.744s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:50472: received a keep-alive or window probe packet, will send an ACK
[1532887156.744s] -> EthernetII src=02-00-00-00-00-01 dst=8a-0f-8b-8c-c4-97 type=IPv4
                \ IPv4 src=192.168.69.1 dst=192.168.69.100 proto=TCP
                 \ TCP src=6971 dst=50472 seq=1290742821 ack=3181168927 win=0 len=0
[1532887156.744s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:50472: timeout exceeded
[1532887156.744s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:50472: state=ESTABLISHED=>CLOSED
[1532887156.744s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:50472: outgoing segment will abort connection
[1532887156.744s] (socket::tcp): #3:192.168.69.1:6971:192.168.69.100:50472: sending RST|ACK

Unfortunately, I cannot capture the complete log, because /tmp can't hold both it and the wireshark capture.

According to wireshark, that last ACK and the RST are separated by 15 microseconds. Prior to that were the following events:

44.179957 sender sends the first segment of a burst, seq=176882179, len=1446
44.180065 sender sends the last segment of a burst, seq=176942911, len=1446
44.180066 sender transfers 94 bytes, seq=176944357, which fills the receiver's window
44.201210 receiver sends ACK 176944451, indicates window length 0
44.747640 sender sends keep-alive (546 ms after zero window ack)
44.748426 receiver sends identical ACK, window length still zero
44.748841 receiver sends RST/ACK (4 ms after last ack)

It seems that if the application drains the buffer slowly, there is a way for the connection to time out too quickly. Any idea why this is? Shouldn't that keep-alive packet have kept it alive?

@whitequark whitequark left a comment

Thanks for this pull request! The changes seem fine to me now. However, I don't think that just one unit test is enough to verify all the added functionality. Could you please add a unit test exercising a scaled window for every window-size expression that you've changed?

@pothos
Contributor

pothos commented Aug 6, 2018

@jhwgh1968 In my tests it works well, except for buffer sizes which lead to a situation where a socket can't connect. Your test value 262143 works, 524288 does not, 1000000 works, 1048576 does not, etc… I haven't looked into the detailed socket state yet, but maybe you already have an idea?

correct behavior for e.g. buffer size 2007152
TRACE 2018-08-06T07:46:44Z: smoltcp::socket::tcp: #0:0.0.0.0:52126:1.0.0.1:80: state=CLOSED=>SYN-SENT
TRACE 2018-08-06T07:46:44Z: smoltcp::socket::tcp: #0:0.0.0.0:52126:1.0.0.1:80: outgoing segment will update window
TRACE 2018-08-06T07:46:44Z: smoltcp::socket::tcp: #0:0.0.0.0:52126:1.0.0.1:80: sending SYN
DEBUG 2018-08-06T07:46:44Z: smoltcp::iface::ethernet: address 192.168.1.1 not in neighbor cache, sending ARP request
TRACE 2018-08-06T07:46:44Z: smoltcp::socket::meta: #0: neighbor 1.0.0.1 missing, silencing until t+3.0s
TRACE 2018-08-06T07:46:44Z: smoltcp::iface::neighbor: filled 192.168.1.1 => xx-…-xx (was empty)
TRACE 2018-08-06T07:46:44Z: smoltcp::socket::meta: #0: neighbor 1.0.0.1 discovered, unsilencing
TRACE 2018-08-06T07:46:44Z: smoltcp::socket::tcp: #0:0.0.0.0:52126:1.0.0.1:80: outgoing segment will update window
TRACE 2018-08-06T07:46:44Z: smoltcp::socket::tcp: #0:0.0.0.0:52126:1.0.0.1:80: sending SYN
TRACE 2018-08-06T07:46:44Z: smoltcp::socket::tcp: #0:0.0.0.0:52126:1.0.0.1:80: received SYN|ACK

misbehavior for e.g. buffer size 2097152:
TRACE 2018-08-06T07:49:27Z: smoltcp::socket::tcp: #0:0.0.0.0:50008:1.0.0.1:80: state=CLOSED=>SYN-SENT
DEBUG 2018-08-06T07:49:30Z: smoltcp::iface::ethernet: address 192.168.1.1 not in neighbor cache, sending ARP request
DEBUG 2018-08-06T07:49:30Z: smoltcp::iface::ethernet: cannot dispatch response packet: unaddressable destination
poll result: Unaddressable

Edit: Only the value for the rx_buffer leads to this problem. Ah, obviously, tx_buffer doesn't matter.

@jhwgh1968
Contributor Author

I am able to reproduce the problem, @pothos. It seems the socket does not send data with 524288, but does with 524287.

Unfortunately, I cannot tell why. In fact, I cannot get the trace output you got by patching httpclient. What did you change to accomplish that?

@pothos
Contributor

pothos commented Aug 13, 2018

@jhwgh1968 I think e.g. for the tap interface setup as mentioned in the README it should be

cargo build --examples
RUST_LOG=trace target/debug/examples/httpclient tap0 1.1.1.1 http://1.1.1.1

@jhwgh1968
Contributor Author

I think I have added all the unit tests, @whitequark. And sure enough, I think they found your bug, @pothos.

@whitequark
Contributor

And sure enough, I think they found your bug, @pothos.

What was the bug?

@jhwgh1968
Contributor Author

What was the bug?

It was an off-by-one error in the calculation of the correct shift value, something I was suspicious of when the errors seemed to occur near powers of 2. I am not sure why that bug caused the symptoms @pothos reported, but fixing that made their problem impossible to reproduce.
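For illustration, here is a hedged sketch of the kind of computation involved (assumed logic and an invented function name, not the actual smoltcp code): pick the smallest shift such that the whole receive buffer fits in the scaled 16-bit window field. The boundary comparison is exactly where an off-by-one bites, which is why 524287 (just under 2^19) worked while 524288 did not:

```rust
// Assumed shift computation: the smallest shift such that the buffer,
// once right-shifted, fits in the 16-bit window field. RFC 1323 caps
// the shift value at 14.
fn win_shift_for(rx_buffer_len: usize) -> u8 {
    let mut shift = 0u8;
    // Writing `>= 65535` instead of `> 65535` here (or the reverse)
    // changes the result only near power-of-two boundaries -- precisely
    // the buffer sizes reported as failing.
    while shift < 14 && (rx_buffer_len >> shift) > 65535 {
        shift += 1;
    }
    shift
}

fn main() {
    println!("{} {}", win_shift_for(524287), win_shift_for(524288)); // 3 4
}
```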

I would also add some other good news: I have concluded that all of my previous issues with timing were mostly an artifact of the way I was testing. Making socket.poll() not panic on failure in the examples and temporarily changing the assembler to track more than four holes made the stack much more solid.

The sink can now handle a 200 MB file in 1.3 seconds with 6 MiB buffers, instead of 1.5 seconds with 64 KiB buffers. And that was with a number of lost segments, which wasted quite a bit of bandwidth since the Linux remote is quite over-eager with retransmits.

Without the change in the assembler, it slowed down a whole lot once smoltcp lost too many frames, but I'm presuming that's Linux' fault (congestion control, maybe?). I don't expect many users will make their receive buffers that big. I also hope implementing sACK will give smoltcp better behavior with large buffers, since it will allow the remote TCP to patch assembler holes much faster. But that's a separate PR.
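To make the hole limit concrete, here is a toy model (not smoltcp's Assembler; all names are invented for illustration): each out-of-order segment that is not adjacent to an already-tracked range opens a new hole, and once the fixed limit is reached the next such segment must be dropped, which is roughly what the starved-CPU scenario runs into.

```rust
const MAX_HOLES: usize = 4; // the historical default discussed above; the experiment raised it to 16

/// Toy tracker of sorted, disjoint (start, end) byte ranges.
struct Ranges(Vec<(usize, usize)>);

impl Ranges {
    fn new() -> Self {
        Ranges(Vec::new())
    }

    /// Tries to record [start, end); fails if the merged result would
    /// contain more than MAX_HOLES gaps (i.e. MAX_HOLES + 1 disjoint ranges).
    fn add(&mut self, start: usize, end: usize) -> Result<(), ()> {
        let mut ranges = self.0.clone();
        ranges.push((start, end));
        ranges.sort();
        // Coalesce overlapping or touching ranges.
        let mut merged: Vec<(usize, usize)> = Vec::new();
        for (s, e) in ranges {
            match merged.last_mut() {
                Some(last) if s <= last.1 => last.1 = last.1.max(e),
                _ => merged.push((s, e)),
            }
        }
        if merged.len() > MAX_HOLES + 1 {
            return Err(()); // too many holes to track; a real stack drops the segment
        }
        self.0 = merged;
        Ok(())
    }
}

fn main() {
    let mut r = Ranges::new();
    for i in 0..5 {
        r.add(i * 10, i * 10 + 1).unwrap(); // five disjoint ranges: four holes, still fine
    }
    assert!(r.add(50, 51).is_err()); // a fifth hole would exceed the limit
    assert!(r.add(1, 10).is_ok()); // filling a hole merges ranges and frees capacity
}
```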

@whitequark
Contributor

@m-labs-homu r+

@m-labs-homu

📌 Commit d23033a has been approved by whitequark

@m-labs-homu

⌛ Testing commit d23033a with merge 92e970b...

@whitequark
Contributor

@jhwgh1968 How many holes do you think we should track?

@m-labs-homu

☀️ Test successful - status-travis
Approved by: whitequark
Pushing 92e970b to master...

@jhwgh1968
Contributor Author

I got consistently good results with 16, but I didn't spend too much time tuning it.

I plan to see how much sACK improves it first, because I presume 4 was picked for a reason, e.g. memory consumption. If that is not the case, then increasing the number of holes would indeed solve the issue, as far as I can tell.

@jhwgh1968 jhwgh1968 deleted the tcp-window-scaling-phase-2 branch August 19, 2018 23:10
@whitequark
Contributor

I plan to see how much sACK improves it first, because I presume 4 was picked for a reason, e.g. memory consumption. If that is not the case, then increasing the number of holes would indeed solve the issue, as far as I can tell.

I chose 4 completely arbitrarily. :)
