idle_timeout_crazy_rtt can timeout if there are two losses #974

Open
martinthomson opened this issue Oct 1, 2020 · 7 comments
Labels
p3 Backlog

Comments

@martinthomson
Member

martinthomson commented Oct 1, 2020

Run:

SIMULATION_SEED=a5cde2a8492f2e704f54105ed28192deb79005d4f0356236a81550b867471d40 \
  cargo test -p neqo-transport --test network -- idle_timeout_crazy_rtt --nocapture

Fixing this might require some work.

@martinthomson
Member Author

OK, the analysis is that this results from a rare double loss of packets containing HANDSHAKE_DONE. Nothing serious here; we expect that to happen occasionally, because this test is badly exposed to loss: the idle timeout ends up being set to 3 PTOs.
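
A rough sketch of why two losses are enough. It is a simplified model, not neqo code: it assumes the PTO doubles each time it fires (the usual exponential backoff), measures the idle timeout from the first transmission, and ignores propagation delay and any rearming of the idle timer. The names `nth_send_time` and `survives_losses` are hypothetical.

```rust
use std::time::Duration;

/// Time of the n-th transmission (n = 0 is the original packet), relative to
/// the original send, assuming the PTO doubles after every expiry.
/// The intervals are pto, 2*pto, 4*pto, ..., so the sum is pto * (2^n - 1).
/// Intended for small n only; 2^n overflows u32 for n >= 32.
fn nth_send_time(pto: Duration, n: u32) -> Duration {
    pto * (2u32.pow(n) - 1)
}

/// True if the first transmission that is NOT lost is sent strictly before
/// the idle timeout expires. `losses` counts consecutive lost transmissions.
/// Assumption: the idle timeout runs from the original send and is not rearmed.
fn survives_losses(pto: Duration, idle_timeout: Duration, losses: u32) -> bool {
    nth_send_time(pto, losses) < idle_timeout
}
```

With an idle timeout of 3 PTOs, one loss is recovered by the probe sent at 1 PTO. After a second loss, the next probe is only due at 1 + 2 = 3 PTOs, which is no earlier than the idle timeout, so in this model the connection times out first.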

Now, we've discussed sending more aggressively on PTO (by making the first PTO happen at half the nominal time), which might help fix this. Until we have something more stable, I'll keep this open.
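
As a rough illustration of the half-PTO idea, the sketch below lists probe send times for the nominal schedule and for one where the first PTO fires at half the nominal time. It assumes the interval keeps doubling from whatever the first value is, which this thread does not specify; `probe_schedule` and `first_divisor` are hypothetical names, not neqo APIs.

```rust
use std::time::Duration;

/// Send times of successive transmissions, relative to the original send.
/// `first_divisor` scales only the first interval: 1 models the nominal
/// schedule, 2 models "first PTO at half the nominal time".
/// Assumption: every later interval is double the previous one.
/// `first_divisor` must be non-zero.
fn probe_schedule(pto: Duration, first_divisor: u32, transmissions: usize) -> Vec<Duration> {
    let mut times = Vec::with_capacity(transmissions);
    let mut now = Duration::ZERO;
    let mut interval = pto / first_divisor;
    for _ in 0..transmissions {
        times.push(now);
        now += interval;
        interval *= 2; // exponential backoff
    }
    times
}
```

Under these assumptions the third transmission moves from 3 PTOs (nominal) to 1.5 PTOs (halved first PTO), so it would go out well before an idle timeout of 3 PTOs even after two losses.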

@martinthomson changed the title from "idle_timeout_crazy_rtt assertion" to "idle_timeout_crazy_rtt can timeout if there are two losses" on Oct 1, 2020
@martinthomson
Member Author

Another failing seed: 98c5c815e436f9adc043bd0509f752ee967f0502526b6acf1880fc312f0206ce

@martinthomson
Member Author

It took longer to get a failure, but here is another one: ecebadd179ce347280ffcd3e62b5037745f5b588017dace25007bb6914f9d56d

@ddragana added the p3 Backlog label on Nov 24, 2021
@larseggert
Collaborator

This is happening frequently enough in CI that we should really fix this.

@mxinden
Collaborator

mxinden commented Apr 1, 2024

I am unable to reproduce this failure locally, either by running the test repeatedly in a loop (while cargo test -p neqo-transport --test network -- idle_timeout_crazy_rtt --nocapture; do :; done) or with the seeds above.

Are there any additional pointers, e.g. links to CI failures?

Local machine:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 23.10
Release:        23.10
Codename:       mantic

@larseggert
Collaborator

IIRC the CI failures are always on Windows.

@larseggert
Collaborator

This doesn't seem to happen anymore.
