New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
resolved: Fix DoT timeout on multiple answer records (for CloudFlare, Google, etc. DoT servers) #22132
Conversation
Update: Fixed error handling for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for late review.
Reviewed according to comments + added an automated test for DnsStream reproducing the bug. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding test. That's super nice!
Ready for re-review:
PS: Sorry for launching so many CI runs, I kept finding small things to fix. |
if (s->encrypted) { | ||
uint32_t events; | ||
|
||
r = sd_event_source_get_io_events(s->io_event_source, &events); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this is called?? Why do not you reuse revents
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From man 2 poll
, revents
are the returned events (i.e. which events we received to process at the start of the call), events
are the requested events (i.e. which events we could process next).
The point of checking whether FLAGS_SET(events, EPOLLIN)
is to know whether the stream still wants to read more data. It would be wrong to generate our "pseudo-EPOLLIN
event " on buffered TLS data if the stream is not even expecting to receive further data available events.
Note that in this case events
is just what was set by the last call to dns_stream_update_io
. For example, looking at the condition on dns_stream_update_io
, when we already have a full DNS packet stored on s->read_packet
, we don't want to process any further data, even if it's available.
PS: BTW, something unrelated to this issue which I noticed when developing the test, is that dns_stream_take_read_packet
doesn't call dns_stream_update_io
. What this means is that unless you process the packets on the (supposedly optional) on_packet
callback, taking the packet doesn't set the EPOLLIN
flag again, so no further packets will be processed even if they are available... which seems wrong. But all callers use the on_packet
callback, so this is just a theoretical issue for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense for me. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS: BTW, something unrelated to this issue which I noticed when developing the test, is that
dns_stream_take_read_packet
doesn't calldns_stream_update_io
. What this means is that unless you process the packets on the (supposedly optional)on_packet
callback, taking the packet doesn't set theEPOLLIN
flag again, so no further packets will be processed even if they are available... which seems wrong. But all callers use theon_packet
callback, so this is just a theoretical issue for now.
I cannot catch the issue... There is dns_stream_update_io()
after on_packet()
. Is it not enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imagine I had written the loop of line 300 in src/resolve/test-resolved-stream.c
like this:
stream->on_packet = NULL;
while (n_received_packets != 2) {
assert_se(sd_event_run(event, EVENT_TIMEOUT_USEC) >= 1);
DnsPacket *p = dns_stream_take_read_packet(stream);
if (p)
received_packets[n_received_packets++] = p;
}
IMO this ought to work, but it actually fails for both TCP and TLS, reading only a single packet, then timing out.
Why? Because when I call sd_event_run
, stream->read_packet
is filled. When dns_stream_update_io
is called after reading the packet in on_stream_io_impl
, the EPOLLIN
event flag is unset. Later I call dns_stream_take_read_packet
so stream->read_packet
is free again, but no one sets the EPOLLIN
event flag again, so no further packets are read and the test times out.
But all current usages call dns_stream_take_read_packet
inside the on_packet
callback rather than outside, so the EPOLLIN
event flag is never unset, and everything works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the explanation! Let's add a brief comment in dns_stream_take_read_packet()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, while I try to understand your comment previously, I found a potential issue, and filed as #22266. If you'd like, PTAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the explanation! Let's add a brief comment in
dns_stream_take_read_packet()
.
I definitely agree, so using the
Don't worry about it, most of the CIs automagically cancel the previous runs on force push. |
When sending multiple DNS questions to a DNS-over-TLS server (e.g. a question for A and AAAA records, as is typical) on the same session, the server may answer to each question in a separate TLS record, but it may also aggregate multiple answers in a single TLS record. (Some servers do this very often (e.g. Cloudflare 1.0.0.1), some do it sometimes (e.g. Google 8.8.8.8) and some seem to never do it (e.g. Quad9 9.9.9.10)). Both cases should be handled equivalently, as the byte stream is the same, but when multiple answers came in a single TLS record, usually the first answer was processed, but the second answer was entirely ignored, which caused a 10s delay until the resolution timed out and the missing question was retried. This can be reproduced by configuring one of the offending server and running `resolvectl query google.com --cache=no` a few times. To be notified of incoming data, systemd-resolved listens to `EPOLLIN` events on the underlying socket. However, when DNS-over-TLS is used, the TLS library (OpenSSL or GnuTLS) may read and buffer the entire TLS record when reading the first answer, so usually no further `EPOLLIN` events will be generated, and the second answer will never be processed. To avoid this, if there's buffered TLS data, generate a "fake" EPOLLIN event. This is hacky, but it makes this case transparent to the rest of the IO code.
Tests DnsStream event handling, both for plain TCP DNS and DNS over TLS. The DoT test requires the "openssl s_server" command line tool to mock a simple TLS server. Thus the test's TLS part is skipped if openssl it not available. The test works for both DNS_OVER_TLS_USE_GNUTLS and DNS_OVER_TLS_USE_OPENSSL. The DoT case fails due to a bug, which is fixed on the next commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you!
(I dropped doubled empty line. And re-ordered the commits, as the test does not pass without the fix.)
…s_stream_take_read_packet() Based on the analysis by Joan Bruguera <joanbrugueram@gmail.com>. See systemd#22132 (comment).
…s_stream_take_read_packet() Based on the analysis by Joan Bruguera <joanbrugueram@gmail.com>. See systemd#22132 (comment).
Does this fix meet the criteria for a stable backport? Context: I'm using systemd 250.4 and noticed page loads being delayed in Chromium shortly after enabling |
…s_stream_take_read_packet() Based on the analysis by Joan Bruguera <joanbrugueram@gmail.com>. See systemd/systemd#22132 (comment). (cherry picked from commit 4aa6129)
When sending multiple DNS questions to a DNS-over-TLS server (e.g. a question for A and AAAA records, as is typical) on the same session, the server may answer to each question in a separate TLS record, but it may also aggregate multiple answers in a single TLS record.
(Some servers do this very often (e.g. Cloudflare 1.0.0.1), some do it sometimes (e.g. Google 8.8.8.8) and some seem to never do it (e.g. Quad9 9.9.9.10)).
Both cases should be handled equivalently, as the byte stream is the same, but when multiple answers came in a single TLS record, usually the first answer was processed, but the second answer was entirely ignored, which caused a 10s delay until the resolution timed out and the missing question was retried. This can be reproduced by configuring one of the offending server and running
resolvectl query google.com --cache=no
a few times.To be notified of incoming data, systemd-resolved listens to
EPOLLIN
events on the underlying socket. However, when DNS-over-TLS is used, the TLS library (OpenSSL or GnuTLS) may read and buffer the entire TLS record when reading the first answer, so usually no furtherEPOLLIN
events will be generated, and the second answer will never be processed.To avoid this, if there's buffered TLS data, generate a "fake" EPOLLIN event. This is hacky, but it makes this case transparent to the rest of the IO code.
Fixes: https://bbs.archlinux.org/viewtopic.php?id=262926
--
Appendix:
To verify the issue, you can use socat like so to intercept some DoT server:
socat -v -x openssl-listen:853,reuseaddr,fork,cert=127.0.0.1.pem,key=127.0.0.1-key.pem,verify=0 openssl:8.8.8.8:853
(you will need a trusted certificate for localhost).Sample stream that did not cause systemd-resolved to hang:
Sample stream that caused systemd-resolved to hang: