
TCP flows stop transmitting after a while when RTT is very small #6113

Closed
rip-create-your-account opened this issue Jun 3, 2021 · 22 comments
Labels: area: networking, type: bug

Comments

@rip-create-your-account

With a large number of TCP flows generating lots of traffic, transmission unexpectedly stops after a short while. I'm using the gonet package to establish many TCP connections and transmit bytes over them, communicating with the host Linux stack over a veth link. I have configured the tcpip.Stack with tcpip.TCPSACKEnabled(true) and tcpip.CongestionControlOption("cubic").
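
For context, the stack setup looks roughly like this (a minimal sketch against a recent gVisor API; the real program drives a veth-backed link endpoint and has proper error handling, and newStack is just an illustrative name):

import (
	"gvisor.dev/gvisor/pkg/tcpip"
	"gvisor.dev/gvisor/pkg/tcpip/network/ipv4"
	"gvisor.dev/gvisor/pkg/tcpip/stack"
	"gvisor.dev/gvisor/pkg/tcpip/transport/tcp"
)

// newStack builds a stack with SACK enabled and cubic congestion control,
// matching the options mentioned above.
func newStack() *stack.Stack {
	s := stack.New(stack.Options{
		NetworkProtocols:   []stack.NetworkProtocolFactory{ipv4.NewProtocol},
		TransportProtocols: []stack.TransportProtocolFactory{tcp.NewProtocol},
	})
	sack := tcpip.TCPSACKEnabled(true)
	if err := s.SetTransportProtocolOption(tcp.ProtocolNumber, &sack); err != nil {
		panic(err)
	}
	cc := tcpip.CongestionControlOption("cubic")
	if err := s.SetTransportProtocolOption(tcp.ProtocolNumber, &cc); err != nil {
		panic(err)
	}
	return s
}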

Periodically probing the state of the flows with tcpip.TCPInfoOption revealed that some of the flows report an RTT of 0 after transmission stops. Those flows also have SndCwnd:0, which explains why they stop sending. Strangely, a single flow with an RTT of 0 is enough to make the other flows stop transmitting too.

6 flows
{RTT:78.244µs RTTVar:153.732µs RTO:200ms State:1 CcState:0 SndCwnd:6307101 SndSsthresh:102973 ReorderSeen:false}
{RTT:84.665µs RTTVar:165.746µs RTO:200ms State:1 CcState:0 SndCwnd:1549153 SndSsthresh:847346 ReorderSeen:false}
{RTT:89.611µs RTTVar:172.463µs RTO:200ms State:1 CcState:0 SndCwnd:400761 SndSsthresh:199531 ReorderSeen:false}
{RTT:9.401µs RTTVar:18.799µs RTO:200ms State:1 CcState:0 SndCwnd:7083391 SndSsthresh:167017 ReorderSeen:false}
{RTT:380ns RTTVar:763ns RTO:200ms State:1 CcState:0 SndCwnd:6371002 SndSsthresh:414132 ReorderSeen:false}
{RTT:0s RTTVar:4ns RTO:200ms State:1 CcState:0 SndCwnd:0 SndSsthresh:1051930 ReorderSeen:false}

Before transmitting stops the flows look like this:

6 flows
{RTT:316.378µs RTTVar:561.974µs RTO:200ms State:1 CcState:0 SndCwnd:7791 SndSsthresh:4294967295 ReorderSeen:false}
{RTT:321.958µs RTTVar:470.527µs RTO:200ms State:1 CcState:0 SndCwnd:214193 SndSsthresh:212219 ReorderSeen:false}
{RTT:750.461µs RTTVar:1.185521ms RTO:200ms State:1 CcState:0 SndCwnd:211726 SndSsthresh:210022 ReorderSeen:false}
{RTT:179.79µs RTTVar:311.117µs RTO:200ms State:1 CcState:0 SndCwnd:528803 SndSsthresh:527604 ReorderSeen:false}
{RTT:92.611µs RTTVar:173.498µs RTO:200ms State:1 CcState:0 SndCwnd:1055615 SndSsthresh:2418 ReorderSeen:false}
{RTT:512.153µs RTTVar:577.385µs RTO:200ms State:1 CcState:0 SndCwnd:100934 SndSsthresh:8227 ReorderSeen:false}
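
(For reference, each snapshot above was collected with a periodic probe along these lines; this is a simplified sketch, and probe/eps are just illustrative names for my bookkeeping.)

import (
	"fmt"
	"time"

	"gvisor.dev/gvisor/pkg/tcpip"
)

// probe prints TCPInfo for every tracked flow twice a second; eps holds the
// tcpip.Endpoints backing the gonet connections.
func probe(eps []tcpip.Endpoint) {
	for range time.Tick(500 * time.Millisecond) {
		fmt.Println(len(eps), "flows")
		for _, ep := range eps {
			var info tcpip.TCPInfoOption
			if err := ep.GetSockOpt(&info); err != nil {
				continue // endpoint may already be closed
			}
			fmt.Printf("%+v\n", info)
		}
	}
}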

Tracking down the cause, I found that the RTT of 0 is measured in handleRcvdSegment. The measured RTT can be small enough that s.ep.timestamp() - rcvdSeg.parsedOptions.TSEcr == 0, which means updateRTO is called with an RTT of 0. This seems to confuse Cubic and other parts of the sender. The comment there mentions a clock granularity of one millisecond for timestamps. After changing the elapsed value from 0 to 1ms I have not been able to reproduce the problem. Is this the correct fix?
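
A self-contained sketch of the change I tested (clampRTT and its parameters are names I made up for illustration; in the real code the clamp would sit where handleRcvdSegment derives the RTT from the echoed timestamp):

import "time"

// clampRTT converts the echoed TCP timestamp into an RTT sample and clamps it
// to the 1ms timestamp clock granularity, so a sub-millisecond RTT never
// reaches updateRTO as 0.
func clampRTT(nowTS, echoedTS uint32) time.Duration {
	elapsed := time.Duration(nowTS-echoedTS) * time.Millisecond
	if elapsed == 0 {
		elapsed = time.Millisecond
	}
	return elapsed
}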

rip-create-your-account added the type: bug label on Jun 3, 2021
@hbhasker
Contributor

hbhasker commented Jun 6, 2021

Thanks for the detailed report. We will take a look.

@hbhasker
Contributor

hbhasker commented Jun 6, 2021

@nybidari can you take a look?

ianlewis added the area: networking label on Jun 8, 2021
@nybidari
Contributor

RTT in netstack is not calculated during the connection establishment phase; it is calculated after we receive an ACK for a data packet. But having an RTT of zero should not cause sndCwnd to be zero. It is weird that having sndCwnd of zero in one flow is causing other flows to stop sending. Also, the connection state in TCPInfo still shows as Open even after sndCwnd is zero and transmission has stopped; I would expect the state to be SACK or RTO recovery. I was not able to recreate the issue. Were you able to reproduce it?

@rip-create-your-account
Author

I managed to create a flaky reproducer: https://gist.github.com/rip-create-your-account/ade2eacc4d07f284636a3202f771c861. It creates multiple flows that send data over the loopback link. Roughly 2 out of 3 runs do not exit cleanly because 1-3 flows stop transmitting. The number of parallel flows seems to have the biggest impact on triggering the bug. I also found that the program needs to run for at least a few seconds to give the bug enough time to occur.

When the reproducer triggers the bug, it keeps printing lines like

{RTT:0s RTTVar:3ns RTO:200ms State:1 CcState:0 SndCwnd:1 SndSsthresh:401 ReorderSeen:false}
{RTT:0s RTTVar:0s RTO:1s State:1 CcState:0 SndCwnd:10 SndSsthresh:4294967295 ReorderSeen:false}

because these flows stopped making progress.

> It is weird that having sndCwnd of zero in one flow is causing other flows to stop sending.

This reproducer does not seem to trigger that behavior, but a complex integration test of mine consistently does. For now I choose to believe that it's a bug in my application code that is triggered by the bug reported here.

On closer inspection, TCPInfo.SndCwnd is not really zero. TCPInfo.SndCwnd (a uint32) is populated by converting from TCPSenderState.SndCwnd (an int). For the broken flows, TCPSenderState.SndCwnd is a huge negative value like -9223372036854775808, which becomes 0 when converted to uint32.
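
A standalone illustration of that conversion (plain Go, not gVisor code): the low 32 bits of the most negative int64 are all zero, so the uint32 view comes out as 0.

package main

import (
	"fmt"
	"math"
)

func main() {
	// The broken flows report -9223372036854775808 (math.MinInt64) in
	// TCPSenderState.SndCwnd; its low 32 bits are all zero, so the uint32
	// conversion used to fill TCPInfo.SndCwnd yields 0.
	sndCwnd := int64(math.MinInt64)
	fmt.Println(uint32(sndCwnd)) // 0

	minusOne := int64(-1)
	fmt.Println(uint32(minusOne)) // 4294967295, for comparison
}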

Anyway, I think the issue is caused by the RTT of 0 that sender.handleRcvdSegment sometimes calculates from the timestamp option.

@kylecarbs

@rip-create-your-account I believe I'm currently running into this... I know it's been a few years and the code has changed, but I'm curious if you managed to solve this.

@nybidari
Contributor

nybidari commented May 9, 2023

We were not able to reproduce the bug. Are you also running into the same issue where the RTT is calculated as 0? Can you let us know the steps to recreate the issue? The output of tcpip.TCPInfoOption would also be helpful to see the current state of the TCP connection.

@kylecarbs

I'm unfamiliar with gVisor, so I'm somewhat down a rabbit hole. To see the tip of this, check out the PR here: tailscale/tailscale#8106

That PR reproduces the issue consistently. Don't check out my branch; just check out main, apply the patch, and run the test, and you'll see the hang occur.

@kylecarbs

@nybidari I wish I had a tighter reproduction loop, but that's the best I have so far.

@kylecarbs

@nybidari if I lower the MinRTO in tcpip/transport/tcp to 50 nanoseconds, a hang still occurs, but much less frequently. This obviously isn't a solution, but it might help you debug.

I'm happy to hop on Discord if it'd help. I'm still poking around in the code as well!
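
(For anyone who wants to try the same experiment without patching the tcp package: as far as I can tell, recent gVisor versions expose a stack-wide option for this. A sketch, where setMinRTO is a hypothetical helper and s is the *stack.Stack:)

import (
	"time"

	"gvisor.dev/gvisor/pkg/tcpip"
	"gvisor.dev/gvisor/pkg/tcpip/stack"
	"gvisor.dev/gvisor/pkg/tcpip/transport/tcp"
)

// setMinRTO overrides the stack-wide minimum RTO via tcpip.TCPMinRTOOption
// instead of editing tcpip/transport/tcp. A 50ns floor mirrors the experiment
// above and is not a sensible value for real traffic.
func setMinRTO(s *stack.Stack, d time.Duration) tcpip.Error {
	opt := tcpip.TCPMinRTOOption(d)
	return s.SetTransportProtocolOption(tcp.ProtocolNumber, &opt)
}

For example: setMinRTO(s, 50*time.Nanosecond).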

@kylecarbs

@nybidari this also only happens with TCP SACK enabled, just as reported in this issue.

@nybidari
Contributor

nybidari commented May 9, 2023

Thanks for the info. Let me try to repro with the test. Will get back if I need more details.

@kylecarbs

@nybidari this only happens when using TCPRACKLossDetection. With the other TCP recovery methods it doesn't happen.

@kylecarbs

Hmm, I take that back. I can get it to occur without TCPRACKLossDetection, just extremely rarely.

@mtojek

mtojek commented May 10, 2023

> We were not able to reproduce the bug. Are you also running into the same issue where the RTT is calculated as 0? Can you let us know the steps to recreate the issue? The output of tcpip.TCPInfoOption would also be helpful to see the current state of the TCP connection.

I think I managed to reproduce it with wireguard-go, so it's just gVisor plus a small wrapper to create a TUN device. I used the basic TCP server example, examples_test.go; netstack.CreateNetTUN uses mostly gVisor packages.

The unstable network behavior observed in Tailscale is mimicked with:

// Fail roughly 1% of reads to mimic an unstable link.
if mathrand.Intn(100) > 98 {
   return 0, os.ErrDeadlineExceeded
}

You can run it with the following command, and simply observe the congestion:

go test -timeout 120s -v -run ^TestHanging$ golang.zx2c4.com/wireguard/tun/netstack/examples -count 1

After the test panics, transmission has usually stopped, and you can see the goroutine dump. Many of the goroutines are just waiting in gvisor.dev/gvisor/pkg/sleep.(*Sleeper).nextWaker.

Side question: does wireguard-go configure the TUN device improperly, is it a matter of RTO fine-tuning, or is this indeed a bug? We want to ensure continuous transmission; the problem was spotted while investigating issues with SCP/SSH transfers. With tcpip.TCPSACKEnabled(false) transmission is continuous, but that doesn't solve the original problem.

@nybidari
Contributor

I was able to reproduce the bug with your test. I will debug it further and try to see what is going wrong.

@kylecarbs

@nybidari, any info you'd be willing to share on timeline or priority from the gVisor team?

I'm unsure whether to adjust our network implementation to avoid SACK or wait for a fix.

@nybidari
Contributor

I will look into the issue this week; I do not have a fix yet. If it is a blocking issue, SACK can be disabled for now and re-enabled after the issue is fixed.

@kylecarbs

Appreciate it, thank you!

copybara-service bot pushed a commit that referenced this issue Jun 1, 2023
RTT value should not be zero, set the minimum RTT value to 1ms. This does not
happen often and was identified while investigating
http://gvisor.dev/issues/6113.

Updates #6113

PiperOrigin-RevId: 536885961
@kevinGC
Collaborator

kevinGC commented Jun 27, 2023

Adding this here so we don't forget it: there's a suspicion that when we RTO, we might be sending the wrong packet. That packet gets sent over and over again, halting TCP progress. It could be a SACK bug, but we're not sure.

@kylecarbs

@kevinGC thanks for the update!

@spikecurtis
Contributor

@kevinGC to close the loop here: I've been investigating the stalls @kylecarbs, @mtojek, and I are seeing, and my conclusion is that they are unrelated to the original issue in this thread. See my comment on our repo for details.

One thing I think the gVisor team might be able to help with is the limited buffer for out-of-order packets, but I've raised that as a separate issue: #9153.

@nybidari
Contributor

nybidari commented Oct 6, 2023

The original issue here, where the RTT was zero in some cases, is fixed by commit c77d00a.

Closing this bug.
