Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resolved: Fix DoT timeout on multiple answer records (for CloudFlare, Google, etc. DoT servers) #22132

Merged
merged 2 commits into from Jan 27, 2022

Conversation

joanbm
Copy link
Contributor

@joanbm joanbm commented Jan 15, 2022

When sending multiple DNS questions to a DNS-over-TLS server (e.g. a question for A and AAAA records, as is typical) on the same session, the server may answer to each question in a separate TLS record, but it may also aggregate multiple answers in a single TLS record.
(Some servers do this very often (e.g. Cloudflare 1.0.0.1), some do it sometimes (e.g. Google 8.8.8.8) and some seem to never do it (e.g. Quad9 9.9.9.10)).

Both cases should be handled equivalently, as the byte stream is the same, but when multiple answers came in a single TLS record, usually the first answer was processed, but the second answer was entirely ignored, which caused a 10s delay until the resolution timed out and the missing question was retried. This can be reproduced by configuring one of the offending server and running resolvectl query google.com --cache=no a few times.

To be notified of incoming data, systemd-resolved listens to EPOLLIN events on the underlying socket. However, when DNS-over-TLS is used, the TLS library (OpenSSL or GnuTLS) may read and buffer the entire TLS record when reading the first answer, so usually no further EPOLLIN events will be generated, and the second answer will never be processed.

To avoid this, if there's buffered TLS data, generate a "fake" EPOLLIN event. This is hacky, but it makes this case transparent to the rest of the IO code.

Fixes: https://bbs.archlinux.org/viewtopic.php?id=262926

--

Appendix:

To verify the issue, you can use socat like so to intercept some DoT server: socat -v -x openssl-listen:853,reuseaddr,fork,cert=127.0.0.1.pem,key=127.0.0.1-key.pem,verify=0 openssl:8.8.8.8:853 (you will need a trusted certificate for localhost).

Sample stream that did not cause systemd-resolved to hang:

> 2022/01/15 22:23:48.849930  length=2 from=820 to=821
 00 27                                            .'
--
> 2022/01/15 22:23:48.850041  length=39 from=822 to=860
 f2 43 01 00 00 01 00 00 00 00 00 01 06 67 6f 6f  .C...........goo
 67 6c 65 03 63 6f 6d 00 00 01 00 01 00 00 29 ff  gle.com.......).
 e4 00 00 00 00 00 00                             .......
--
> 2022/01/15 22:23:48.850178  length=2 from=861 to=862
 00 27                                            .'
--
> 2022/01/15 22:23:48.850230  length=39 from=863 to=901
 b0 30 01 00 00 01 00 00 00 00 00 01 06 67 6f 6f  .0...........goo
 67 6c 65 03 63 6f 6d 00 00 1c 00 01 00 00 29 ff  gle.com.......).
 e4 00 00 00 00 00 00                             .......
--
< 2022/01/15 22:23:48.883489  length=69 from=1260 to=1328
 00 43 b0 30 81 80 00 01 00 01 00 00 00 01 06 67  .C.0...........g
 6f 6f 67 6c 65 03 63 6f 6d 00 00 1c 00 01 c0 0c  oogle.com.......
 00 1c 00 01 00 00 00 87 00 10 2a 00 14 50 40 03  ..........*..P@.
 08 06 00 00 00 00 00 00 20 0e 00 00 29 02 00 00  ........ ...)...
 00 00 00 00 00                                   .....
--
< 2022/01/15 22:23:48.883906  length=57 from=1329 to=1385
 00 37 f2 43 81 80 00 01 00 01 00 00 00 01 06 67  .7.C...........g
 6f 6f 67 6c 65 03 63 6f 6d 00 00 01 00 01 c0 0c  oogle.com.......
 00 01 00 01 00 00 01 12 00 04 8e fa b2 ae 00 00  ................
 29 02 00 00 00 00 00 00 00                       )........

Sample stream that caused systemd-resolved to hang:

> 2022/01/15 22:23:50.113990  length=2 from=902 to=903
 00 27                                            .'
--
> 2022/01/15 22:23:50.114074  length=39 from=904 to=942
 d4 53 01 00 00 01 00 00 00 00 00 01 06 67 6f 6f  .S...........goo
 67 6c 65 03 63 6f 6d 00 00 01 00 01 00 00 29 ff  gle.com.......).
 e4 00 00 00 00 00 00                             .......
--
> 2022/01/15 22:23:50.114174  length=2 from=943 to=944
 00 27                                            .'
--
> 2022/01/15 22:23:50.114216  length=39 from=945 to=983
 35 79 01 00 00 01 00 00 00 00 00 01 06 67 6f 6f  5y...........goo
 67 6c 65 03 63 6f 6d 00 00 1c 00 01 00 00 29 ff  gle.com.......).
 e4 00 00 00 00 00 00                             .......
--
< 2022/01/15 22:23:50.147560  length=126 from=1386 to=1511
 00 37 d4 53 81 80 00 01 00 01 00 00 00 01 06 67  .7.S...........g
 6f 6f 67 6c 65 03 63 6f 6d 00 00 01 00 01 c0 0c  oogle.com.......
 00 01 00 01 00 00 01 10 00 04 8e fa b2 ae 00 00  ................
 29 02 00 00 00 00 00 00 00 00 43 35 79 81 80 00  ).........C5y...
 01 00 01 00 00 00 01 06 67 6f 6f 67 6c 65 03 63  ........google.c
 6f 6d 00 00 1c 00 01 c0 0c 00 1c 00 01 00 00 00  om..............
 85 00 10 2a 00 14 50 40 03 08 06 00 00 00 00 00  ...*..P@........
 00 20 0e 00 00 29 02 00 00 00 00 00 00 00        . ...)........

@joanbm joanbm changed the title resolved: Fix DoT timeout on multiple answer records resolved: Fix DoT timeout on multiple answer records (fixes CloudFlare, Google DoT servers) Jan 15, 2022
@joanbm joanbm changed the title resolved: Fix DoT timeout on multiple answer records (fixes CloudFlare, Google DoT servers) resolved: Fix DoT timeout on multiple answer records (for CloudFlare, Google, etc. DoT servers) Jan 15, 2022
@joanbm
Copy link
Contributor Author

joanbm commented Jan 16, 2022

Update: Fixed error handling for sd_event_source_get_io_events.

Copy link
Member

@yuwata yuwata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for late review.

src/resolve/resolved-dnstls-gnutls.c Outdated Show resolved Hide resolved
src/resolve/resolved-dnstls-openssl.c Outdated Show resolved Hide resolved
src/resolve/resolved-dnstls-openssl.c Outdated Show resolved Hide resolved
src/resolve/resolved-dns-stream.c Outdated Show resolved Hide resolved
src/resolve/resolved-dns-stream.c Outdated Show resolved Hide resolved
@yuwata yuwata added the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Jan 21, 2022
@joanbm
Copy link
Contributor Author

joanbm commented Jan 23, 2022

Reviewed according to comments + added an automated test for DnsStream reproducing the bug.

Copy link
Member

@yuwata yuwata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding test. That's super nice!

src/resolve/test-resolved-stream.c Outdated Show resolved Hide resolved
src/resolve/test-resolved-stream.c Show resolved Hide resolved
src/resolve/test-resolved-stream.c Outdated Show resolved Hide resolved
src/resolve/test-resolved-stream.c Outdated Show resolved Hide resolved
src/resolve/test-resolved-stream.c Outdated Show resolved Hide resolved
src/resolve/test-resolved-stream.c Outdated Show resolved Hide resolved
src/resolve/test-resolved-stream.c Outdated Show resolved Hide resolved
src/resolve/test-resolved-stream.c Outdated Show resolved Hide resolved
src/resolve/test-resolved-stream.c Outdated Show resolved Hide resolved
src/resolve/resolved-dns-stream.c Outdated Show resolved Hide resolved
@yuwata yuwata removed the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Jan 24, 2022
@joanbm
Copy link
Contributor Author

joanbm commented Jan 24, 2022

Ready for re-review:

  • Converted logic to handle TLS buffered data from defer event to loop
  • Fixed style issues in the tests
  • Fix CI: Skip the unit test if the 'openssl' command is not available at runtime. @mrc0mmand not sure if you have any better way to go about this. I don't like to use OpenSSL or GnuTLS as a library in the test, I prefer something like openssl/socat/stunnel/etc. since it's simpler, and harder to get wrong.
  • Fix a minor memory safety (ASAN) issue in the test

PS: Sorry for launching so many CI runs, I kept finding small things to fix.

src/resolve/test-resolved-stream.c Outdated Show resolved Hide resolved
src/resolve/resolved-dns-stream.c Show resolved Hide resolved
src/resolve/resolved-dns-stream.c Show resolved Hide resolved
src/resolve/resolved-dns-stream.c Outdated Show resolved Hide resolved
if (s->encrypted) {
uint32_t events;

r = sd_event_source_get_io_events(s->io_event_source, &events);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this is called?? Why do not you reuse revents here?

Copy link
Contributor Author

@joanbm joanbm Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From man 2 poll, revents are the returned events (i.e. which events we received to process at the start of the call), events are the requested events (i.e. which events we could process next).

The point of checking whether FLAGS_SET(events, EPOLLIN) is to know whether the stream still wants to read more data. It would be wrong to generate our "pseudo-EPOLLIN event " on buffered TLS data if the stream is not even expecting to receive further data available events.

Note that in this case events is just what was set by the last call to dns_stream_update_io. For example, looking at the condition on dns_stream_update_io, when we already have a full DNS packet stored on s->read_packet, we don't want to process any further data, even if it's available.

PS: BTW, something unrelated to this issue which I noticed when developing the test, is that dns_stream_take_read_packetdoesn't call dns_stream_update_io. What this means is that unless you process the packets on the (supposedly optional) on_packet callback, taking the packet doesn't set the EPOLLIN flag again, so no further packets will be processed even if they are available... which seems wrong. But all callers use the on_packet callback, so this is just a theoretical issue for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense for me. Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS: BTW, something unrelated to this issue which I noticed when developing the test, is that dns_stream_take_read_packetdoesn't call dns_stream_update_io. What this means is that unless you process the packets on the (supposedly optional) on_packet callback, taking the packet doesn't set the EPOLLIN flag again, so no further packets will be processed even if they are available... which seems wrong. But all callers use the on_packet callback, so this is just a theoretical issue for now.

I cannot catch the issue... There is dns_stream_update_io() after on_packet(). Is it not enough?

Copy link
Contributor Author

@joanbm joanbm Jan 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imagine I had written the loop of line 300 in src/resolve/test-resolved-stream.c like this:

        stream->on_packet = NULL;
        while (n_received_packets != 2) {
                assert_se(sd_event_run(event, EVENT_TIMEOUT_USEC) >= 1);
                DnsPacket *p = dns_stream_take_read_packet(stream);
                if (p)
                        received_packets[n_received_packets++] = p;
        }

IMO this ought to work, but it actually fails for both TCP and TLS, reading only a single packet, then timing out.

Why? Because when I call sd_event_run, stream->read_packet is filled. When dns_stream_update_io is called after reading the packet in on_stream_io_impl, the EPOLLIN event flag is unset. Later I call dns_stream_take_read_packet so stream->read_packet is free again, but no one sets the EPOLLIN event flag again, so no further packets are read and the test times out.

But all current usages call dns_stream_take_read_packet inside the on_packet callback rather than outside, so the EPOLLIN event flag is never unset, and everything works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation! Let's add a brief comment in dns_stream_take_read_packet().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, while I try to understand your comment previously, I found a potential issue, and filed as #22266. If you'd like, PTAL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation! Let's add a brief comment in dns_stream_take_read_packet().

#22274.

@mrc0mmand
Copy link
Member

  • Fix CI: Skip the unit test if the 'openssl' command is not available at runtime. @mrc0mmand not sure if you have any better way to go about this. I don't like to use OpenSSL or GnuTLS as a library in the test, I prefer something like openssl/socat/stunnel/etc. since it's simpler, and harder to get wrong.

I definitely agree, so using the openssl utility here is fine by me.

PS: Sorry for launching so many CI runs, I kept finding small things to fix.

Don't worry about it, most of the CIs automagically cancel the previous runs on force push.

@yuwata yuwata added this to the v251 milestone Jan 25, 2022
@yuwata yuwata added the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Jan 26, 2022
@joanbm joanbm requested a review from yuwata January 26, 2022 20:33
When sending multiple DNS questions to a DNS-over-TLS server (e.g. a question
for A and AAAA records, as is typical) on the same session, the server may
answer to each question in a separate TLS record, but it may also aggregate
multiple answers in a single TLS record.
(Some servers do this very often (e.g. Cloudflare 1.0.0.1), some do it sometimes
(e.g. Google 8.8.8.8) and some seem to never do it (e.g. Quad9 9.9.9.10)).

Both cases should be handled equivalently, as the byte stream is the same, but
when multiple answers came in a single TLS record, usually the first answer was
processed, but the second answer was entirely ignored, which caused a 10s delay
until the resolution timed out and the missing question was retried.
This can be reproduced by configuring one of the offending server and running
`resolvectl query google.com --cache=no` a few times.

To be notified of incoming data, systemd-resolved listens to `EPOLLIN` events
on the underlying socket. However, when DNS-over-TLS is used, the TLS library
(OpenSSL or GnuTLS) may read and buffer the entire TLS record when reading the
first answer, so usually no further `EPOLLIN` events will be generated, and the
second answer will never be processed.

To avoid this, if there's buffered TLS data, generate a "fake" EPOLLIN event.
This is hacky, but it makes this case transparent to the rest of the IO code.
Tests DnsStream event handling, both for plain TCP DNS and DNS over TLS.

The DoT test requires the "openssl s_server" command line tool to mock a simple
TLS server. Thus the test's TLS part is skipped if openssl it not available.
The test works for both DNS_OVER_TLS_USE_GNUTLS and DNS_OVER_TLS_USE_OPENSSL.

The DoT case fails due to a bug, which is fixed on the next commit.
Copy link
Member

@yuwata yuwata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!

(I dropped doubled empty line. And re-ordered the commits, as the test does not pass without the fix.)

@yuwata yuwata added good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed and removed reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks labels Jan 27, 2022
@yuwata yuwata merged commit 33db66a into systemd:main Jan 27, 2022
yuwata added a commit to yuwata/systemd that referenced this pull request Jan 27, 2022
…s_stream_take_read_packet()

Based on the analysis by Joan Bruguera <joanbrugueram@gmail.com>.
See systemd#22132 (comment).
yuwata added a commit to yuwata/systemd that referenced this pull request Jan 28, 2022
…s_stream_take_read_packet()

Based on the analysis by Joan Bruguera <joanbrugueram@gmail.com>.
See systemd#22132 (comment).
@foutrelis
Copy link
Contributor

Does this fix meet the criteria for a stable backport?

Context: I'm using systemd 250.4 and noticed page loads being delayed in Chromium shortly after enabling DNSOverTLS. After applying 2aaf6bb on top of v250.4, resolvectl query google.com --cache=no no longer hangs. 🐭

yuwata added a commit to systemd/systemd-stable that referenced this pull request Mar 17, 2022
…s_stream_take_read_packet()

Based on the analysis by Joan Bruguera <joanbrugueram@gmail.com>.
See systemd/systemd#22132 (comment).

(cherry picked from commit 4aa6129)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed resolve
Development

Successfully merging this pull request may close these issues.

None yet

4 participants