test_icmp_probe_eu_derper flaky on windows #2069
Comments
…#2068)

## Description

test(iroh-net): disable test_icmp_probe_eu_derper as flaky on windows

See for example https://github.com/n0-computer/iroh/actions/runs/8245723076/job/22550253677?pr=2051

Related issue: #2069

## Notes & open questions

<!-- Any notes, remarks or open questions you have to make about the PR. -->

## Change checklist

- [ ] Self-review.
- [ ] Documentation updates if relevant.
- [ ] Tests if relevant.
See also the … The issue here seems to be that DNS resolution sometimes fails completely on Windows. We have no idea why yet. But without DNS resolution nothing is going to work.
## Description

This is extremely likely to be related to #2069, so not filing a new issue.

## Notes & open questions

<!-- Any notes, remarks or open questions you have to make about the PR. -->

## Change checklist

- [x] Self-review.
- [x] Documentation updates if relevant.
- [x] Tests if relevant.
https://github.com/n0-computer/iroh/actions/runs/8245723076/job/22550253677

The problem is that the system config chooses the … On our CI machine it also tries an IPv4 server fairly quickly afterwards, but the whole DNS lookup is limited by a 1s timeout. So the lookup fails before we get a response from a working server, probably because the resolver spends too much time on the broken server.
So it seems
Ah, I can find the resolvers configured on the host when using
## Description

This actively refuses to use the `fec0:0:0:ffff::1`, `fec0:0:0:ffff::2` and `fec0:0:0:ffff::3` DNS servers if the system has them configured.

Windows by default adds 3 IPv6 site-local anycast addresses to the DNS servers: `fec0:0:0:ffff::1`, `fec0:0:0:ffff::2` and `fec0:0:0:ffff::3`. Supposedly Microsoft DNS servers listen on those by default. They seem to be present as soon as an IPv6 interface is configured, even a loopback interface, which is extremely common if not the default.

Our hickory-resolver loads the system configuration, which includes these 3 IPv6 DNS servers. When it needs to make a DNS query it selects a random nameserver and tries it. If that fails it tries another one. The next query is biased: the resolver remembers which servers to avoid or use. So if you get lucky and your first query lands on an actual DNS server, you are good. If you get unlucky, recovering is a bit of a tussle.

Inside netcheck we do DNS queries with a 1s timeout, because all the probes have a 3s timeout. However, hickory-resolver has a 5s timeout configured, so its queries stay alive longer than ours. This means that almost all subsequent DNS queries end up reusing an existing connection to one of those bad servers if you were unlucky enough to land on one.

The interplay of these timeouts and the connection reuse makes recovering to a good DNS server a rather tough prospect for netcheck. It probably would recover eventually, given enough netcheck runs (which run at intervals of ~30s).

The odds of these nameservers being the sole way of having working DNS are basically zero. The odds of these nameservers breaking the resolver are about 50%. So remove these deprecated things.

## Notes & open questions

Unfortunately the resolver returned by `get_resolver()` does not have an API that allows testing it. But the test would basically be the inverse of the logic that removes the bad servers, so it is perhaps not that useful anyway.

Closes #2069
Closes n0-computer/dumbpipe#17

## Change checklist

- [x] Self-review.
- [x] Documentation updates if relevant.
- [x] Tests if relevant.
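For readers who want to see the idea in code, the filtering described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the actual change in iroh-net: the names `WINDOWS_BAD_SITE_LOCAL_DNS`, `is_bad_site_local_dns` and `usable_name_servers` are hypothetical, and the example assumes the configured nameservers are available as plain `SocketAddr` values rather than as hickory-resolver configuration types.

```rust
use std::net::{IpAddr, Ipv6Addr, SocketAddr};

/// The three deprecated IPv6 site-local anycast DNS addresses named in the
/// PR description. (Constant name is hypothetical.)
const WINDOWS_BAD_SITE_LOCAL_DNS: [Ipv6Addr; 3] = [
    Ipv6Addr::new(0xfec0, 0, 0, 0xffff, 0, 0, 0, 1),
    Ipv6Addr::new(0xfec0, 0, 0, 0xffff, 0, 0, 0, 2),
    Ipv6Addr::new(0xfec0, 0, 0, 0xffff, 0, 0, 0, 3),
];

/// Returns true if the nameserver address is one of the three bad servers.
fn is_bad_site_local_dns(addr: &SocketAddr) -> bool {
    match addr.ip() {
        IpAddr::V6(ip) => WINDOWS_BAD_SITE_LOCAL_DNS.contains(&ip),
        IpAddr::V4(_) => false,
    }
}

/// Keeps every configured nameserver except the three bad ones,
/// preserving the original order.
fn usable_name_servers(configured: &[SocketAddr]) -> Vec<SocketAddr> {
    configured
        .iter()
        .copied()
        .filter(|addr| !is_bad_site_local_dns(addr))
        .collect()
}

fn main() {
    // Example system configuration; the non-fec0 addresses are made up
    // (documentation ranges) purely for illustration.
    let configured: Vec<SocketAddr> = vec![
        "[fec0:0:0:ffff::1]:53".parse().unwrap(),
        "[fec0:0:0:ffff::2]:53".parse().unwrap(),
        "[fec0:0:0:ffff::3]:53".parse().unwrap(),
        "192.0.2.53:53".parse().unwrap(),
        "[2001:db8::53]:53".parse().unwrap(),
    ];

    for addr in usable_name_servers(&configured) {
        println!("{}", addr);
    }
}
```

Matching on the exact three addresses, rather than on the whole `fec0::/10` site-local range, mirrors the wording of the description, which only names those three servers.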
It seems that on Windows we sometimes can't do this probe. The derper looks healthy, and if it were down all the other tests would fail as well. We have seen this not just in tests but also in real life.
Possibly related: n0-computer/dumbpipe#17