Stuck at retransmitting #224

pothos · 2018-06-01T05:31:35Z

This code here https://github.com/m-labs/smoltcp/blob/master/src/socket/tcp.rs#L913 should either reset local_rx_dup_acks, let it wrap from 255 to 0 with an overflowing add, or keep it at 255 with a saturating add. What is the desired behavior?

                        self.local_rx_dup_acks += 1;
                        if self.local_rx_dup_acks == 3 {
                        }

thread 'main' panicked at 'attempt to add with overflow', smoltcp/src/socket/tcp.rs:913:25
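For reference, the three behaviors asked about above can be sketched with the standard integer methods. This is only an illustration: it assumes the counter is a u8 (as "255 to 0" suggests), and the function names are made up for the example. A plain `+= 1` panics on overflow in debug builds and wraps in default release builds, which is why the panic only shows up in debug.

```rust
// Assumption: the duplicate ACK counter is a u8; these helper names are
// hypothetical and do not exist in smoltcp.

/// Wraps around on overflow: 255 + 1 becomes 0.
pub fn bump_wrapping(dup_acks: u8) -> u8 {
    dup_acks.wrapping_add(1)
}

/// Sticks at the maximum on overflow: 255 + 1 stays 255.
pub fn bump_saturating(dup_acks: u8) -> u8 {
    dup_acks.saturating_add(1)
}

/// Reports overflow to the caller instead of panicking: 255 + 1 is None.
pub fn bump_checked(dup_acks: u8) -> Option<u8> {
    dup_acks.checked_add(1)
}
```

With wrapping, a comparison against 3 would become true again after every 256 duplicate ACKs; with saturation it cannot become true a second time until the counter is reset.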


pothos · 2018-06-01T08:17:11Z

Actually, master stopped working for me. I'm trying to find out whether this is related to the fast retransmit code.

pothos · 2018-06-01T08:24:57Z

Yes, both endpoints are stuck in retransmitting…

whitequark · 2018-06-01T14:43:03Z

cc @barskern

barskern · 2018-06-01T23:41:42Z

@pothos Do you think you would be able to write a test which mimics this behavior? Or explain to me how it happened so that I could write a test?

I would think that the desired behavior is to do a saturating addition, and change the condition from if self.local_rx_dup_acks == 3 to if self.local_rx_dup_acks >= 3. This would trigger a fast retransmit for every segment which is a duplicate ACK once we have received 3 or more duplicate ACKs. However, I did not find documentation indicating that this is appropriate behavior for a TCP implementation.

whitequark · 2018-06-02T00:25:29Z

This would trigger a fast retransmit for every segment which is a duplicate ACK once we have received 3 or more duplicate ACKs.

I'm not sure that this is right. Per my understanding, with the modified code, if e.g. first packet in a burst of 8 gets lost, the other 7 will arrive, and 4 of them will trigger retransmit of the entire burst. Is that correct?
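To make the difference between the two conditions concrete, here is a small sketch that counts how often a fast retransmit would be requested while a run of duplicate ACKs arrives. The helper and its parameters are hypothetical and are not taken from smoltcp; it counts requests only and says nothing about how many segments end up on the wire.

```rust
/// Hypothetical helper: counts fast retransmit requests for `dup_acks`
/// consecutive duplicate ACKs. With `at_least` set, the condition is
/// `counter >= 3`; otherwise it is `counter == 3`.
pub fn retransmit_requests(dup_acks: u32, at_least: bool) -> u32 {
    // Assumption: a u8 counter that saturates instead of overflowing.
    let mut counter: u8 = 0;
    let mut requests = 0;
    for _ in 0..dup_acks {
        counter = counter.saturating_add(1);
        let fire = if at_least { counter >= 3 } else { counter == 3 };
        if fire {
            requests += 1;
        }
    }
    requests
}
```

For six consecutive duplicate ACKs, retransmit_requests(6, false) gives 1 and retransmit_requests(6, true) gives 4.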

barskern · 2018-06-02T09:39:02Z

Per my understanding, with the modified code, if e.g. first packet in a burst of 8 gets lost, the other 7 will arrive, and 4 of them will trigger retransmit of the entire burst. Is that correct?

This is the reason I previously used == 3. However, if we receive a burst of packets and the socket is not polled until all 8 packets have been received, I think there will only be 1 actual retransmit. This is because the socket retransmits by setting an immediate timeout, not by sending a packet directly. Or that's what I think. Perhaps we could find a way to test this theory.

This is such an edge case though. What is the probability that we lose a packet and then lose the exact same packet again in the retransmit, when all the other packets arrived? I think this is why I can't find a document about it online.

barskern · 2018-06-02T10:01:52Z

After some more research, I found that once three duplicate ACKs have been received, the socket should enter "fast recovery", and every following duplicate ACK only increases the congestion window.

After the fast retransmit algorithm sends what appears to be the
missing segment, the "fast recovery" algorithm governs the
transmission of new data until a non-duplicate ACK arrives.
(RFC 5681, page 9)

Hence, since congestion control has not been implemented in this library, I think we just have to "ignore" further duplicate ACKs and wait for the timeout to retransmit again.
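A minimal sketch of that idea, assuming a saturating u8 counter and a simplified tracker type that is purely hypothetical (it is not the smoltcp socket state, and it ignores sequence number wrap-around): the third duplicate ACK requests one fast retransmit, later duplicates are ignored, and a non-duplicate ACK resets the counter.

```rust
/// Hypothetical, simplified tracker; not the actual TcpSocket fields.
#[derive(Debug, Default)]
pub struct DupAckTracker {
    last_ack: u32,
    dup_acks: u8,
}

impl DupAckTracker {
    /// Returns true exactly when a fast retransmit should be requested.
    pub fn on_ack(&mut self, ack_number: u32) -> bool {
        if ack_number == self.last_ack {
            // Duplicate: count it, but never overflow.
            self.dup_acks = self.dup_acks.saturating_add(1);
            // Only the third duplicate triggers; the fourth and later
            // duplicates are ignored and left to the regular timeout.
            self.dup_acks == 3
        } else {
            // A non-duplicate ACK starts a fresh count.
            self.last_ack = ack_number;
            self.dup_acks = 0;
            false
        }
    }
}
```

Because the counter saturates at 255 and is only compared for equality with 3, a long run of duplicates cannot trigger a second fast retransmit until a new ACK number arrives.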

pothos · 2018-06-04T15:01:06Z

Hello,
sorry for the late reply. The overflow was just the first bug I could find while looking into my problem. I have now shortened the logs to a sane size so I can upload them. Basically, the code overreacts and somehow both sides get stuck retransmitting and never leave the poll call again (might be related to #226). I think there is more than one case where the retransmission should not happen.

In these logs, the client sends data.
logclient.txt
logserver.txt
tcpdumpclient.txt
tcpdumpserver.txt

pothos · 2018-06-04T15:02:52Z

Ah, I applied the proposed patch with a saturating add, btw. So the numbers jumping back again should not be caused by overflows.

barskern · 2018-06-04T23:34:40Z

@pothos I found a bug in the fast retransmit code based on your logs!

When a segment carries data, its ACK should not be counted as a duplicate ACK! New PR incoming 👍

pothos · 2018-06-05T04:29:42Z

@barskern Yes, that's one point, but the retransmit should also only be scheduled if there is data to be sent out. At the end of the tcpdump, the packets from both sides just have length 0.
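The two conditions discussed here (the ACK segment carries no data, and the sender actually has data outstanding) match part of the duplicate ACK definition in RFC 5681, section 2. A rough sketch of such a check follows; the struct and field names are invented for illustration and are not smoltcp's, and the RFC's additional condition that the advertised window is unchanged is left out for brevity.

```rust
/// Hypothetical, simplified view of an incoming segment.
pub struct Segment {
    pub ack_number: u32,
    pub payload_len: usize,
    pub syn: bool,
    pub fin: bool,
}

/// Hypothetical, simplified view of the sender side of a socket.
pub struct SenderState {
    /// Highest acknowledgment number received so far.
    pub last_ack: u32,
    /// Bytes sent but not yet acknowledged.
    pub unacked_bytes: usize,
}

/// Returns true if `seg` should count as a duplicate ACK.
pub fn is_duplicate_ack(state: &SenderState, seg: &Segment) -> bool {
    state.unacked_bytes > 0                    // there is data that could be retransmitted
        && seg.payload_len == 0                // the ACK itself carries no data
        && !seg.syn
        && !seg.fin
        && seg.ack_number == state.last_ack    // same acknowledgment number as before
}
```

With a check like this, two endpoints that only exchange empty ACKs, with nothing outstanding on either side, would never count duplicates and so would never schedule a fast retransmit.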

barskern · 2018-06-05T11:23:31Z

@pothos yes that is correct. I will look into that. I think I have to move the detection of duplicate ACKs to a different branch of the bigger match within the process function. Currently it will only look at the ACK field and respond based on that.

whitequark · 2018-06-06T00:58:39Z

Thanks for debugging this!

pothos · 2018-06-07T10:25:10Z

@barskern Maybe a look at this code can help? https://github.com/google/netstack/blob/master/tcpip/transport/tcp/snd.go e.g. see checkDuplicateAck and the variables saved for fast recovery (except sending rate)

barskern · 2018-06-09T10:21:33Z

@pothos Could you run the same test that generated this issue in the first place? I'm excited to see if it works now ⚡

Improve detection of duplicate ACKs by checking that the segment does not contain data. Move implementation from the 'reject unacceptable acknowledgements' to later in the process function, because a duplicate ACK is not an 'unacceptable' acknowledgement but rather a condition to update state. Implement more tests to verify the operation of the fast retransmit implementation. Related issue: #224 Closes: #225 Approved by: whitequark

pothos · 2018-06-12T04:42:59Z

Hi,
sorry, but no, two endpoints can still be stuck sending the same packets.

barskern · 2018-06-12T04:56:18Z

Could you upload debug logs again?

pothos · 2018-06-12T06:29:17Z

I've found the issue and fixed it, but have not written a test yet: #233

This fixes entering a loop when both endpoints are stuck in retransmission. smoltcp-rs#224

pothos · 2018-06-18T03:31:26Z

Hi,
can somebody review the PR to get master working again?
Thanks

This fixes entering a loop when both endpoints are stuck in retransmission. #224 Closes: #233 Approved by: whitequark

whitequark added the kind/bug label Jun 1, 2018

barskern mentioned this issue Jun 2, 2018

Fixes bug regarding dup_acks overflow (TcpSocket) #225

Closed

pothos changed the title ~~Debug build panic local_rx_dup_acks overflow~~ Stuck at retransmitting Jun 12, 2018

pothos added a commit to pothos/smoltcp that referenced this issue Jun 12, 2018

Only trigger fast retransmit for data to send

277e7a3

This fixes entering a loop when both endpoints are stuck in retransmission. smoltcp-rs#224

m-labs-homu pushed a commit that referenced this issue Jun 18, 2018

Only trigger fast retransmit for data to send

8f3b46f

This fixes entering a loop when both endpoints are stuck in retransmission. #224 Closes: #233 Approved by: whitequark

m-labs-homu pushed a commit that referenced this issue Jun 18, 2018

Only trigger fast retransmit for data to send

7699265

This fixes entering a loop when both endpoints are stuck in retransmission. #224 Closes: #233 Approved by: whitequark

pothos closed this as completed Jun 19, 2018

jD91mZM2 mentioned this issue Jun 19, 2018

Ethernet::poll infinite loop #226

Closed
