Stub resolver cannot resolve www.vox.com or bugzilla.redhat.com, missing A records in response from stub resolver #18972
Er, hang on, the responses provided by the stub resolver keep changing, so I can sometimes randomly resolve the address and other times not. Here is www.vox.com using 8.8.8.8:
And using the stub resolver, again no A records:
Seems www.vox.com is reliably busted. The results I get for bugzilla.redhat.com keep changing, so it's down to luck whether that works or not.
Are you sure you are running the right resolved version? I cannot reproduce this here; both the rh and the vox domains resolve completely reliably for me. Your output looks a lot like it comes from a version without the #18819 fix, as if you didn't get that patch?
No, I definitely have the patch from #18819; it is applied here for systemd-248~rc2-3.fc34. @nanonyme thinks it may be semi-random, which feels right to me. bugzilla.redhat.com is resolving for me now, though it was almost always failing a couple of hours ago. I managed to visit www.vox.com once just now, but it still almost always fails. I'm willing to build a modified systemd with extra debug output if that would be helpful. Another thing that would help: if you temporarily disable nss-resolve, you should get the stub resolver via nss-dns and will notice for yourself when it starts acting wonky. It's definitely specific to the stub resolver: anything using nss-resolve is unaffected.
Um, I just realized bugzilla.redhat.com is a somewhat bad example because I am connected to the Red Hat VPN, sorry. :/ Comparing its output to what 8.8.8.8 returns is just confusing, because there are no CNAMEs in the public version. I've edited my posts to remove those distracting examples. www.vox.com is a better example.
I think the problem is here:
The record
Yeah, currently systemd is randomly returning this:
And right now it works for me. So sometimes the A record shows up in the wrong section, and then it breaks; sometimes it shows up in the correct section, and then it works.
@cmurf asked me to try SYSTEMD_LOG_LEVEL=debug. Here is a good case:
Notice that in this case we have a cache miss for www.vox.com. Now here is a bad case:
Notice there it starts with a cache hit for www.vox.com. But not all cache hits are bad. Sometimes it caches a good result:
What's frustrating is that I haven't found any way to reproduce it reliably. Sometimes it works, sometimes it doesn't. Restarting systemd-resolved seems to get it into a good state, though.
Same problem with www.reddit.com:
I am also still seeing issues with the latest Fedora package build (which includes the previous fix). There's a specific test in openQA which resolves
Workaround: Edit: better workaround:
Adam wound up disabling the DNS cache in Fedora 34. Since upgrading to this new version about an hour ago, I have not seen a single DNS failure. I can't be completely certain, but I now strongly suspect the bug only occurs when there is a cache hit (though not on every cache hit).
Yeah, the openQA tests I ran with the cache-disabled build all passed too (or at least didn't fail on DNS resolution). Will report back if we do hit a failure.
@mcatanzaro thanks a lot for the elaborate debug logs. I finally managed to reproduce this with their help. Working on a fix.
The fix is waiting in #19009. It would be great if you could give it some testing.
Previously, by mistake, we'd always match every single reply we get in a CNAME chain against the original question from the stub client. That's broken; we need to test it against the CNAME query we are currently looking at. The effect of this incorrect matching was that we'd assign the RRs to the wrong section, since we'd assume they were auxiliary answers instead of primary answers. Fixes: systemd#18972
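To make the commit message concrete, here is a minimal sketch of the two matching strategies. It is not systemd's actual code. The `RR` struct, `classify_buggy`, `classify_chain`, and the example names are hypothetical, and the sketch assumes the records arrive in chain order. It shows why comparing every record against the client's original question pushes the final A record into the additional section, while tracking the name currently being looked up keeps it in the answer section.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical, simplified record types; not systemd's real data structures. */
typedef enum { RR_A, RR_CNAME } RRType;

typedef struct {
    const char *owner;  /* owner name of the record */
    RRType type;
    const char *target; /* CNAME target; NULL for A records */
} RR;

typedef enum { SECTION_ANSWER, SECTION_ADDITIONAL } Section;

/* Case-insensitive DNS name comparison (simplified: no trailing-dot handling). */
static int name_equal(const char *a, const char *b) {
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Buggy matching: every record is compared with the client's original
 * question, so records for later names in the chain look auxiliary. */
static Section classify_buggy(const RR *rr, const char *original_name) {
    return name_equal(rr->owner, original_name) ? SECTION_ANSWER : SECTION_ADDITIONAL;
}

/* Fixed matching: compare each record with the name currently being
 * looked up, advancing to the CNAME target as the chain is followed.
 * Assumes the records are ordered along the chain. */
static void classify_chain(const RR *rrs, size_t n, const char *original_name, Section *out) {
    assert(n == 0 || (rrs != NULL && out != NULL));
    const char *current = original_name;
    for (size_t i = 0; i < n; i++) {
        if (name_equal(rrs[i].owner, current)) {
            out[i] = SECTION_ANSWER;
            if (rrs[i].type == RR_CNAME && rrs[i].target != NULL)
                current = rrs[i].target; /* the next link is the CNAME target */
        } else {
            out[i] = SECTION_ADDITIONAL;
        }
    }
}
```

With a hypothetical chain of `www.example.com CNAME cdn.example.net` followed by an A record for `cdn.example.net`, `classify_buggy` puts the A record in the additional section, while `classify_chain` keeps both records in the answer section, which is where a stub client looking for A records would presumably expect them.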
A reporter downstream is still reporting issues with systemd-248~rc2-8.fc34, which has #19009 backported to rc2. So either I muffed the backport somehow, the reporter messed something up, or we still have issues here. I did have to rediff one small thing in the patch: changing
Oh, never mind, the reporter just updated with "Scratch that. This seems to be unrelated as when overriding the old systemd into 34-testing the issue persists. I'm going to dig a little deeper." I'll keep an eye open, though.
You did basically the same thing I did when backporting this. I've now tested both your backport and my own, and they both fix this issue for me. Maybe the downstream tester failed to
I am also hitting problems with the -8 build with the backport, BTW.
BTW, rc4 builds for Fedora Rawhide and F34 completed successfully. Please try them.
The above is my new report for the latest issue, as Michael suggested it should be filed separately. |
systemd version: systemd-248~rc2-3.fc34 (note that this version contains the recent fix for #18819)
Distribution: Fedora 34
Kernel: 5.11.5-300.fc34.x86_64
CPU architecture: x86_64
Expected behaviour you didn't see
I should be able to resolve www.vox.com and bugzilla.redhat.com. (Edit: I've removed my examples here because they are not right, see next comment for examples.) A similar problem occurs for www.vox.com (not vox.com). Finally, note this occurs with systemd-248~rc2-3.fc34 which already includes the fix from #18819.