Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TCP hangs when sending #62

Closed
pothos opened this issue Oct 25, 2017 · 7 comments
Closed

TCP hangs when sending #62

pothos opened this issue Oct 25, 2017 · 7 comments
Labels

Comments

@pothos
Copy link
Contributor

pothos commented Oct 25, 2017

Hello,
actually I wanted to debug it, but need some help because it is disappears e.g. if writing to pcap or extensive logging is used.

The smoltcp server thread on tap0 will send in 64 byte steps, a Linux socket client thread will read only to ~292800 bytes instead of reading all 10000000 bytes. Smoltcp is handing out keep-alives from this point on. The minimal example is here:

https://github.com/pothos/taptest

cargo run --release should reproduce the behavior.

RUST_LOG=trace cargo run --release should let it sometimes complete (same as adding print statements in the sending server).
Uncommenting the pcap writer will let it complete reliably.

I have experienced it with other values as well, but hope that this combination also reproduces it on other machines.

Best regards,
Kai

@whitequark
Copy link
Contributor

whitequark commented Oct 25, 2017

Oh yeah you hit a bug; retransmit timer gets reset by ACKs whereas it should not.

Specifically, by duplicate ACKs that arrive when the window is zero length, and pass through the duplicate ACK checks because they count as keepalives.

@pothos
Copy link
Contributor Author

pothos commented Oct 26, 2017

Thanks for looking at it, now I have to read the RFCs along with the code and find out how the timer is supposed to be set and kept.

@pothos
Copy link
Contributor Author

pothos commented Oct 28, 2017

Ok, finally got it working. Besides keeping the retransmission timer there was an overflow that needed to be fixed. Preparing a PR now.

whitequark added a commit that referenced this issue Dec 21, 2017
This would result in results near usize::MAX, and is indicative of
a bug. A panic is always used instead of a debug_assert!() because
debug builds are easily slow enough so that the underlying bugs
are not tripped.

Related to #62.
@whitequark
Copy link
Contributor

@pothos I'm finally making progress on this—see d1e2292 for one method I'm using to expose underlying bugs. I am afraid that your fix is hiding one of them.

@whitequark
Copy link
Contributor

@pothos I've integrated a test quite like your taptest in the commit 44db954. I've managed to expose one other bug as well.

@whitequark
Copy link
Contributor

Fixed in b247f64.

@pothos
Copy link
Contributor Author

pothos commented Jan 2, 2018

Thanks for including the fix - I am happy that the wrapping can be covered without distinction of cases ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

2 participants