
TCP flows stop transmitting after a while when RTT is very small #6113

Closed
rip-create-your-account opened this issue Jun 3, 2021 · 22 comments
Labels: area: networking, type: bug

Comments

@rip-create-your-account

With a large number of TCP flows generating lots of traffic, transmission unexpectedly stops after a short while. I'm using the gonet package to establish many TCP connections and transmit bytes over them, communicating with the host Linux stack over a veth link. I have configured the tcpip.Stack with tcpip.TCPSACKEnabled(true) and tcpip.CongestionControlOption("cubic").
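
For context, the stack setup looks roughly like this (a minimal sketch against a recent gVisor API; the real program drives a veth-backed link endpoint and has proper error handling, and newStack is just an illustrative name):

import (
	"gvisor.dev/gvisor/pkg/tcpip"
	"gvisor.dev/gvisor/pkg/tcpip/network/ipv4"
	"gvisor.dev/gvisor/pkg/tcpip/stack"
	"gvisor.dev/gvisor/pkg/tcpip/transport/tcp"
)

// newStack builds a stack with SACK enabled and cubic congestion control,
// matching the options mentioned above.
func newStack() *stack.Stack {
	s := stack.New(stack.Options{
		NetworkProtocols:   []stack.NetworkProtocolFactory{ipv4.NewProtocol},
		TransportProtocols: []stack.TransportProtocolFactory{tcp.NewProtocol},
	})
	sack := tcpip.TCPSACKEnabled(true)
	if err := s.SetTransportProtocolOption(tcp.ProtocolNumber, &sack); err != nil {
		panic(err)
	}
	cc := tcpip.CongestionControlOption("cubic")
	if err := s.SetTransportProtocolOption(tcp.ProtocolNumber, &cc); err != nil {
		panic(err)
	}
	return s
}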

Periodically probing the state of the flows with tcpip.TCPInfoOption revealed that some of the flows report an RTT of 0 after transmission stops. Those flows also have SndCwnd:0, which explains why they stop sending. Strangely, a single flow with an RTT of 0 is enough to make the other flows stop transmitting too.

6 flows
{RTT:78.244µs RTTVar:153.732µs RTO:200ms State:1 CcState:0 SndCwnd:6307101 SndSsthresh:102973 ReorderSeen:false}
{RTT:84.665µs RTTVar:165.746µs RTO:200ms State:1 CcState:0 SndCwnd:1549153 SndSsthresh:847346 ReorderSeen:false}
{RTT:89.611µs RTTVar:172.463µs RTO:200ms State:1 CcState:0 SndCwnd:400761 SndSsthresh:199531 ReorderSeen:false}
{RTT:9.401µs RTTVar:18.799µs RTO:200ms State:1 CcState:0 SndCwnd:7083391 SndSsthresh:167017 ReorderSeen:false}
{RTT:380ns RTTVar:763ns RTO:200ms State:1 CcState:0 SndCwnd:6371002 SndSsthresh:414132 ReorderSeen:false}
{RTT:0s RTTVar:4ns RTO:200ms State:1 CcState:0 SndCwnd:0 SndSsthresh:1051930 ReorderSeen:false}

Before transmitting stops the flows look like this:

6 flows
{RTT:316.378µs RTTVar:561.974µs RTO:200ms State:1 CcState:0 SndCwnd:7791 SndSsthresh:4294967295 ReorderSeen:false}
{RTT:321.958µs RTTVar:470.527µs RTO:200ms State:1 CcState:0 SndCwnd:214193 SndSsthresh:212219 ReorderSeen:false}
{RTT:750.461µs RTTVar:1.185521ms RTO:200ms State:1 CcState:0 SndCwnd:211726 SndSsthresh:210022 ReorderSeen:false}
{RTT:179.79µs RTTVar:311.117µs RTO:200ms State:1 CcState:0 SndCwnd:528803 SndSsthresh:527604 ReorderSeen:false}
{RTT:92.611µs RTTVar:173.498µs RTO:200ms State:1 CcState:0 SndCwnd:1055615 SndSsthresh:2418 ReorderSeen:false}
{RTT:512.153µs RTTVar:577.385µs RTO:200ms State:1 CcState:0 SndCwnd:100934 SndSsthresh:8227 ReorderSeen:false}
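
(For reference, each snapshot above was collected with a periodic probe along these lines; this is a simplified sketch, and probe/eps are just illustrative names for my bookkeeping.)

import (
	"fmt"
	"time"

	"gvisor.dev/gvisor/pkg/tcpip"
)

// probe prints TCPInfo for every tracked flow twice a second; eps holds the
// tcpip.Endpoints backing the gonet connections.
func probe(eps []tcpip.Endpoint) {
	for range time.Tick(500 * time.Millisecond) {
		fmt.Println(len(eps), "flows")
		for _, ep := range eps {
			var info tcpip.TCPInfoOption
			if err := ep.GetSockOpt(&info); err != nil {
				continue // endpoint may already be closed
			}
			fmt.Printf("%+v\n", info)
		}
	}
}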

Tracking down the cause, I found that the RTT of 0 is measured in handleRcvdSegment. The measured RTT can be small enough that s.ep.timestamp() - rcvdSeg.parsedOptions.TSEcr == 0, which means updateRTO is called with an RTT of 0. This seems to confuse Cubic and other parts of the sender. The comment there mentions a clock granularity of one millisecond for timestamps. After changing the elapsed value from 0 to 1ms I have not been able to reproduce the problem. Is this the correct fix?
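
A self-contained sketch of the change I tested (clampRTT and its parameters are names I made up for illustration; in the real code the clamp would sit where handleRcvdSegment derives the RTT from the echoed timestamp):

import "time"

// clampRTT converts the echoed TCP timestamp into an RTT sample and clamps it
// to the 1ms timestamp clock granularity, so a sub-millisecond RTT never
// reaches updateRTO as 0.
func clampRTT(nowTS, echoedTS uint32) time.Duration {
	elapsed := time.Duration(nowTS-echoedTS) * time.Millisecond
	if elapsed == 0 {
		elapsed = time.Millisecond
	}
	return elapsed
}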

rip-create-your-account added the type: bug label on Jun 3, 2021
@hbhasker
Contributor

hbhasker commented Jun 6, 2021

Thanks for the detailed report. We will take a look.

@hbhasker
Contributor

hbhasker commented Jun 6, 2021

@nybidari can you take a look?

ianlewis added the area: networking label on Jun 8, 2021
@nybidari
Contributor

RTT in netstack is not calculated during the connection establishment phase; it is calculated after we receive an ACK for a data packet. But having an RTT of zero should not cause sndCwnd to be zero. It is weird that having sndCwnd of zero in one flow is causing other flows to stop sending. Also, the connection state in TCPInfo still shows as Open even after sndCwnd is zero and transmission has stopped; I would expect the state to be SACK or RTO recovery. I was not able to recreate the issue. Were you able to reproduce it?

@rip-create-your-account
Author

I managed to create a flaky reproducer: https://gist.github.com/rip-create-your-account/ade2eacc4d07f284636a3202f771c861. It creates multiple flows that send data over the loopback link. Roughly 2 out of 3 runs do not exit cleanly because 1-3 flows stop transmitting. The number of parallel flows seems to have the biggest impact on triggering the bug. I also found that the program needs to run for at least a few seconds to give the bug enough time to occur.

When the reproducer triggers the bug, it keeps printing lines like

{RTT:0s RTTVar:3ns RTO:200ms State:1 CcState:0 SndCwnd:1 SndSsthresh:401 ReorderSeen:false}
{RTT:0s RTTVar:0s RTO:1s State:1 CcState:0 SndCwnd:10 SndSsthresh:4294967295 ReorderSeen:false}

because these flows stopped making progress.

> It is weird that having sndCwnd of zero in one flow is causing other flows to stop sending.

This reproducer does not seem to trigger that behavior, but a complex integration test of mine consistently does. For now I choose to believe that it's a bug in my application code that is triggered by the bug reported here.

On closer inspection, TCPInfo.SndCwnd is not really zero. TCPInfo.SndCwnd (a uint32) is populated by converting from TCPSenderState.SndCwnd (an int). For the broken flows, TCPSenderState.SndCwnd is a huge negative value like -9223372036854775808, which becomes 0 when converted to uint32.
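
A standalone illustration of that conversion (plain Go, not gVisor code): the low 32 bits of the most negative int64 are all zero, so the uint32 view comes out as 0.

package main

import (
	"fmt"
	"math"
)

func main() {
	// The broken flows report -9223372036854775808 (math.MinInt64) in
	// TCPSenderState.SndCwnd; its low 32 bits are all zero, so the uint32
	// conversion used to fill TCPInfo.SndCwnd yields 0.
	sndCwnd := int64(math.MinInt64)
	fmt.Println(uint32(sndCwnd)) // 0

	minusOne := int64(-1)
	fmt.Println(uint32(minusOne)) // 4294967295, for comparison
}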

Anyway, I think the issue is caused by the RTT of 0 that sender.handleRcvdSegment sometimes calculates from the timestamp option.

@kylecarbs

@rip-create-your-account I believe I'm currently running into this... I know it's been a few years and the code has changed, but I'm curious if you managed to solve this.

@nybidari
Contributor

nybidari commented May 9, 2023

We were not able to reproduce the bug. Are you also running into the same issue where the RTT is calculated as 0? Can you let us know the steps to recreate the issue? The output of tcpip.TCPInfoOption would also be helpful to see the current state of the TCP connection.

@kylecarbs

I'm unfamiliar with gVisor, so I'm somewhat down a rabbit hole. To see the tip of this, check out the PR here: tailscale/tailscale#8106

That PR reproduces the issue consistently. Don't check out my branch; just check out main, apply the patch, and run the test, and you'll see the hang occur.

@kylecarbs

@nybidari I wish I had a tighter reproduction loop, but that's the best I have so far.

@kylecarbs

@nybidari if I lower the MinRTO in tcpip/transport/tcp to 50 nanoseconds, a hang still occurs, but much less frequently. This obviously isn't a solution, but it might help you debug.

I'm happy to hop on Discord if it'd help. I'm still poking around in the code as well!
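
(For anyone who wants to try the same experiment without patching the tcp package: as far as I can tell, recent gVisor versions expose a stack-wide option for this. A sketch, where setMinRTO is a hypothetical helper and s is the *stack.Stack:)

import (
	"time"

	"gvisor.dev/gvisor/pkg/tcpip"
	"gvisor.dev/gvisor/pkg/tcpip/stack"
	"gvisor.dev/gvisor/pkg/tcpip/transport/tcp"
)

// setMinRTO overrides the stack-wide minimum RTO via tcpip.TCPMinRTOOption
// instead of editing tcpip/transport/tcp. A 50ns floor mirrors the experiment
// above and is not a sensible value for real traffic.
func setMinRTO(s *stack.Stack, d time.Duration) tcpip.Error {
	opt := tcpip.TCPMinRTOOption(d)
	return s.SetTransportProtocolOption(tcp.ProtocolNumber, &opt)
}

For example: setMinRTO(s, 50*time.Nanosecond).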

@kylecarbs

@nybidari this also only happens with TCP SACK enabled, just as reported in this issue.

@nybidari
Contributor

nybidari commented May 9, 2023

Thanks for the info. Let me try to repro with the test. Will get back if I need more details.

@kylecarbs

@nybidari this only happens when using TCPRACKLossDetection. With the other TCP recovery methods it doesn't happen.

@kylecarbs

Hmm, I take that back. I can get it to occur without TCPRACKLossDetection, just extremely rarely.

@mtojek

mtojek commented May 10, 2023

> We were not able to reproduce the bug. Are you also running into the same issue where the RTT is calculated as 0? Can you let us know the steps to recreate the issue? The output of tcpip.TCPInfoOption would also be helpful to see the current state of the TCP connection.

I think I managed to reproduce it with wireguard-go, so it's just gVisor plus a small wrapper to create a TUN device. I used the basic TCP server example, examples_test.go; netstack.CreateNetTUN uses mostly gVisor packages.

The unstable network behavior observed in Tailscale is mimicked with:

// Fail roughly 1% of reads to mimic an unstable link.
if mathrand.Intn(100) > 98 {
   return 0, os.ErrDeadlineExceeded
}

You can run it with the following command, and simply observe the congestion:

go test -timeout 120s -v -run ^TestHanging$ golang.zx2c4.com/wireguard/tun/netstack/examples -count 1

After the test panics, transmission has usually stopped, and you can see the goroutine dump. Many of the goroutines are just waiting in gvisor.dev/gvisor/pkg/sleep.(*Sleeper).nextWaker.

Side question: does wireguard-go configure the TUN device improperly, is it a matter of RTO fine-tuning, or is this indeed a bug? We want to ensure continuous transmission; the problem was spotted while investigating issues with SCP/SSH transfers. With tcpip.TCPSACKEnabled(false) transmission is continuous, but that doesn't solve the original problem.

@nybidari
Contributor

I was able to reproduce the bug with your test. I will debug it further and try to see what is going wrong.

@kylecarbs

@nybidari, any info you'd be willing to share on timeline or priority from the gVisor team?

I'm unsure whether to adjust our network implementation to avoid SACK or wait for a fix.

@nybidari
Contributor

I will look into the issue this week; I do not have a fix yet. If it is a blocking issue, SACK can be disabled for now and re-enabled after the issue is fixed.

@kylecarbs

Appreciate it, thank you!

copybara-service bot pushed a commit that referenced this issue Jun 1, 2023
RTT value should not be zero, set the minimum RTT value to 1ms. This does not
happen often and was identified while investigating
http://gvisor.dev/issues/6113.

Updates #6113

PiperOrigin-RevId: 536885961
@kevinGC
Collaborator

kevinGC commented Jun 27, 2023

Adding this here so we don't forget it: there's a suspicion that when we RTO, we might be sending the wrong packet. That packet gets sent over and over again, halting TCP progress. It could be a SACK bug, but we're not sure.

@kylecarbs

@kevinGC thanks for the update!

@spikecurtis
Contributor

@kevinGC to close the loop here: I've been investigating the stalls @kylecarbs, @mtojek, and I are seeing, and my conclusion is that they are unrelated to the original issue in this thread. See my comment on our repo for details.

One thing I think the gVisor team might be able to help with is the limited buffer for out-of-order packets, but I've raised that as a separate issue: #9153.

@nybidari
Contributor

nybidari commented Oct 6, 2023

The original issue here, where the RTT was zero in some cases, is fixed by commit c77d00a.

Closing this bug.
