resolved: more CNAME redirect fixes #19009
We nowadays cache full answer RRset combinations instead of just the exactly matching RRset. This means we should not cache RRs that are not immediate answers to our question for longer than their own TTLs. Or in other words: let's determine the shortest TTL of all RRs in the whole answer, and use that as the cache lifetime.
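A minimal sketch of the "shortest TTL of all RRs in the whole answer" rule, assuming a hypothetical, simplified `struct rr` and a hypothetical helper `answer_min_ttl()`. These are not systemd-resolved's actual types or functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified resource record: only the fields needed here. */
struct rr {
        const char *name;   /* owner name, e.g. "www.example.com" */
        uint16_t type;      /* RR type, e.g. 5 = CNAME, 1 = A */
        uint32_t ttl;       /* TTL in seconds, as received from the server */
};

/* Return the shortest TTL of all RRs in the answer, i.e. the lifetime the whole
 * answer may be cached for. Returns 0 for an empty answer (nothing to cache). */
static uint32_t answer_min_ttl(const struct rr *rrs, size_t n) {
        uint32_t min_ttl = UINT32_MAX;

        assert(rrs || n == 0);

        if (n == 0)
                return 0;

        for (size_t i = 0; i < n; i++)
                if (rrs[i].ttl < min_ttl)
                        min_ttl = rrs[i].ttl;

        return min_ttl;
}
```

For example, a CNAME chain whose RRs carry TTLs of 3600, 20 and 300 seconds would be cached as a whole for 20 seconds, not for the 3600 seconds of the first CNAME.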
When responding from the DNS cache, let's slightly tweak how the TTL is lowered: as before, let's round down when converting from our internal µs to the external seconds. (This is preferable, since records are better cached too briefly than too long.) Let's avoid rounding down to zero though, since a TTL of zero has special semantics in many cases (in particular mDNS). Let's just use 1s in that case.
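A sketch of the rounding rule, assuming a hypothetical helper `cached_reply_ttl()` that receives the cache entry's absolute expiry time and the current time in µs. `USEC_PER_SEC` is defined locally here for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC UINT64_C(1000000)   /* defined locally for this sketch */

/* Remaining TTL, in whole seconds, to put into a reply served from the cache.
 *   until_usec: absolute expiry time of the cache entry, in µs
 *   now_usec:   current time, in µs; must be before until_usec (entry still valid) */
static uint32_t cached_reply_ttl(uint64_t until_usec, uint64_t now_usec) {
        uint64_t left_sec;

        assert(until_usec > now_usec);

        /* Round down: better to cache a record slightly too briefly than too long. */
        left_sec = (until_usec - now_usec) / USEC_PER_SEC;

        /* But never report 0: a TTL of 0 has special semantics (in mDNS, for
         * example, it announces that a record is going away), so use 1s instead. */
        if (left_sec == 0)
                return 1;

        return left_sec > UINT32_MAX ? UINT32_MAX : (uint32_t) left_sec;
}
```

With 10.5s left the reply carries a TTL of 10. With 0.999999s left it carries 1 rather than 0.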
Force-pushed 4f8245c to 0d8653e.
Commit 533ec8f references the wrong issue to be fixed, no?
oops, typo
Previously, by mistake, we'd always match every single reply we get in a CNAME chain against the original question from the stub client. That's broken: we need to test it against the (CNAME-redirected) question we are currently looking at. The effect of this incorrect matching was that we'd assign the RRs to the wrong section, since we'd assume they were auxiliary answers instead of primary answers. Fixes: systemd#18972
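A simplified sketch of why the question used for matching matters. The types and the helpers `rr_answers()` and `classify_rr()` are hypothetical, and names are assumed to be already normalized (no trailing-dot handling):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <strings.h>   /* strcasecmp(): DNS names compare case-insensitively */

#define TYPE_A     1
#define TYPE_CNAME 5

/* Hypothetical, simplified types for illustration only. */
struct question { const char *name; uint16_t type; };
struct rr       { const char *name; uint16_t type; uint32_t ttl; };

enum section { SECTION_ANSWER, SECTION_AUXILIARY };

/* Does this RR directly answer the given question? It does if the owner name is the
 * same and either the type matches or it is a CNAME redirecting that name. */
static bool rr_answers(const struct rr *rr, const struct question *q) {
        if (strcasecmp(rr->name, q->name) != 0)
                return false;
        return rr->type == q->type || rr->type == TYPE_CNAME;
}

/* Classify an RR from a reply received for 'current', i.e. the question we are
 * resolving right now (after following CNAME redirects), NOT the stub client's
 * original question. Matching against the original question would push the final
 * A record of a chain into the auxiliary section. */
static enum section classify_rr(const struct rr *rr, const struct question *current) {
        assert(rr && current);
        return rr_answers(rr, current) ? SECTION_ANSWER : SECTION_AUXILIARY;
}
```

For a stub query for `www.example.com` that was redirected to `cdn.example.net`, the A record for `cdn.example.net` is a primary answer when matched against the redirected question, but would be misfiled as auxiliary if matched against the original one.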
When doing a CNAME/DNAME redirect, let's first check whether the answer we already have fully answers the redirected question. If so, let's use that. If not, let's properly restart things. This simply removes one call to dns_answer_reset() that was placed too early: instead of resetting when we detect a CNAME/DNAME redirect, reset only after checking that the answer we already have doesn't answer the redirected question and we decide to *actually* follow the redirect. Or in other words: rely on the dns_answer_reset() call in dns_query_go(), which we'll call to actually begin with the redirected question. This fixes an optimization path that was broken back in 7820b32. (This doesn't really matter as much as one might think, since our cache stepped in anyway and answered the questions before going back to the network. However, it adds noise if RRs with very short TTLs are cached, which some CDNs do, and it is of course relevant when people turn off the local cache.)
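A sketch of the "check first, reset only when actually restarting" order, using hypothetical types and a hypothetical `on_redirect()` helper. Setting `n = 0` here merely stands in for the answer reset done when the redirected lookup begins. "Fully answers" is simplified to "contains an RR of the requested type for the redirected name":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <strings.h>   /* strcasecmp() */

/* Hypothetical, simplified types for illustration only. */
struct question { const char *name; uint16_t type; };
struct rr       { const char *name; uint16_t type; uint32_t ttl; };
struct answer   { struct rr items[16]; size_t n; };

enum redirect_result {
        REDIRECT_ALREADY_ANSWERED,   /* reuse the answer we already have */
        REDIRECT_RESTART,            /* answer reset; start a new lookup */
};

static bool answer_has_rr_for(const struct answer *a, const struct question *q) {
        for (size_t i = 0; i < a->n; i++)
                if (a->items[i].type == q->type &&
                    strcasecmp(a->items[i].name, q->name) == 0)
                        return true;
        return false;
}

/* Called after a CNAME/DNAME redirect produced 'redirected'. Resetting the answer as
 * soon as the redirect is detected would throw away RRs for the target that the
 * server may already have sent along, so only reset once we know we must restart. */
static enum redirect_result on_redirect(struct answer *a, const struct question *redirected) {
        assert(a && redirected);

        if (answer_has_rr_for(a, redirected))
                return REDIRECT_ALREADY_ANSWERED;

        a->n = 0;   /* stands in for the reset when the redirected lookup begins */
        return REDIRECT_RESTART;
}
```

If the reply to `www.example.com` already contained the A record for the CNAME target `cdn.example.net`, the redirect is answered without another round trip. Otherwise the answer is cleared and the lookup restarts with the redirected question.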
Force-pushed 0d8653e to b1eea70.
@eworm-de fixed now
LGTM.
@mcatanzaro any chance you could give this a whirl, so that we can merge this? @yuwata liked it, but given this is so late in the cycle I'd love a test from someone who knows the issue well before we merge it. Thanks!
Sure, I'll test soon.
OK, I've re-enabled my DNS cache and am running this code now. I should notice within a few hours if it doesn't work reliably.
I'm reasonably confident it's fixed. Thanks Lennart!
Thanks for testing and reporting back. Let's merge.
FYI, it seems like systemd-resolved now correctly follows https://www.rfc-editor.org/rfc/rfc2181#section-5.2:
A fix for #18972.
This looks like more than it is. The important fix is a one-liner. The rest is trivialities, plus some TTL clean-ups. There's one more fix that restores an optimization path we accidentally dropped back in 7820b32. (The optimization itself isn't important, since the cache usually makes it unnecessary, but it's nicer to have, since RRs with very low TTLs might otherwise add noise.)