Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

systemd-resolve general protection fault #18427

Closed
ioerror opened this issue Feb 1, 2021 · 17 comments
Closed

systemd-resolve general protection fault #18427

ioerror opened this issue Feb 1, 2021 · 17 comments
Labels

Comments

@ioerror
Copy link

ioerror commented Feb 1, 2021

systemd version the issue has been seen with

systemd 247 (247.2-5)
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified

Used distribution

Debian GNU/Linux Bullseye

Linux kernel version used (uname -a)
Linux apu4d4 5.10.0-1-amd64 #1 SMP Debian 5.10.4-1 (2020-12-31) x86_64 GNU/Linux

CPU architecture issue was seen on

x86_64

Expected behaviour you didn't see

systemd-resolved should stay running and automatically repair itself so DNS services do not need a manual restart.

Unexpected behaviour you saw

systemd-resolved stopped replying to clients on the local LAN.

The kernel message log had the following entries:

[189196.063051] traps: systemd-resolve[7652] general protection fault ip:55a6914ecd00 sp:7ffca45514f8 error:0 in systemd-resolved[55a6914e3000+44000]
[189606.310988] traps: systemd-resolve[7670] general protection fault ip:561a03b44d00 sp:7ffcb1b46a88 error:0 in systemd-resolved[561a03b3b000+44000]
[189741.060612] systemd-resolve[7679]: segfault at 55bbd216f140 ip 000055bbd216f140 sp 00007ffd4af8aae8 error 15
[189741.060638] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 41 00 00 00 00 00 00 00 50 94 ab d1 bb 55
[189880.036836] systemd-resolve[7686]: segfault at 30 ip 00007f08d14c14a8 sp 00007ffd402e8560 error 4 in libsystemd-shared-247.so[7f08d13f8000+18c000]
[189880.036892] Code: 20 64 48 8b 04 25 28 00 00 00 48 89 44 24 18 31 c0 c7 04 24 fe ff ff ff 48 c7 44 24 08 00 00 00 00 48 85 ff 0f 84 bb 00 00 00 <0f> b6 47 2f 48 89 fd 41 89 f4 a8 04 0f 85 9e 00 00 00 89 c2 c0 ea
[190153.308476] traps: systemd-resolve[7698] general protection fault ip:55b694383d00 sp:7fffc544fe38 error:0 in systemd-resolved[55b69437a000+44000]
[190287.807839] systemd-resolve[7704]: segfault at 2c4e151967 ip 00007f3d54ebfdb9 sp 00007fffa9664c00 error 4 in libsystemd-shared-247.so[7f3d54d61000+18c000]
[190287.807898] Code: 35 b2 66 07 00 e8 a7 44 ea ff b8 f6 ff ff ff e9 c6 fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 85 ff 74 7b 48 83 ec 18 <8b> 47 04 85 c0 74 48 83 e8 01 89 47 04 75 11 f6 47 29 02 74 12 f6

Steps to reproduce the problem

The /etc/systemd/resolved.conf file:

[Resolve]
DNS=46.182.19.48#dns2.digitalcourage.de 1.1.1.1#cloudflare-dns.com 9.9.9.9#dns.quad9.net
DNSSEC=yes
DNSOverTLS=opportunistic
MulticastDNS=no
LLMNR=no
Cache=yes
DNSStubListener=yes
Domains=~.
DNSStubListenerExtra=192.168.1.1
DNSStubListenerExtra=192.168.2.1
DNSStubListenerExtra=192.168.3.1
DNSStubListenerExtra=192.168.4.1
DNSStubListenerExtra=192.168.5.1
DNSStubListenerExtra=192.168.6.1

With this configuration, systemd-resolved requires a near daily restart (systemctl restart systemd-resolved) after it becomes unresponsive. Today was the first time that I observed log entries that appear to have been created around the same time that the resolved daemon became unresponsive to local LAN queries.

@ioerror
Copy link
Author

ioerror commented Feb 1, 2021

Additional error messages generated since the original bug was posted a few hours ago:

[  136.847110] systemd-resolve[477]: segfault at 5627247e23b0 ip 00005627247e23b0 sp 00007ffda68636e8 error 15
[  136.847164] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 61 6b 61 6d 61 69 2e 6e 21 01 00 00 00 00 00 00 <10> 60 43 24 27 56 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  290.219970] systemd-resolve[706]: segfault at 129d72b6 ip 00000000129d72b6 sp 00007ffecd9b3f88 error 14 in systemd-resolved[5611d62ec000+b000]
[  447.651952] traps: systemd-resolve[473] general protection fault ip:562ba5049d00 sp:7ffcc2bc6f48 error:0 in systemd-resolved[562ba5040000+44000]
[  729.761956] systemd-resolve[1268]: segfault at 202d ip 00005627462220d6 sp 00007ffd1c8d4ef0 error 4 in systemd-resolved[56274621d000+44000]
[  729.762010] Code: 75 28 eb 48 48 8b bd 98 00 00 00 48 89 de e8 01 c8 ff ff 48 8b bd a0 00 00 00 48 89 de e8 f2 c7 ff ff 48 89 ef e8 9a 1f 01 00 <48> 8b 7b 20 31 d2 be 01 00 00 00 e8 ca b0 ff ff 48 89 c5 48 85 c0
[  959.268045] systemd-resolve[712]: segfault at 20002f ip 0000562865f000d6 sp 00007ffca2c859b0 error 4 in systemd-resolved[562865efb000+44000]
[  959.268100] Code: 75 28 eb 48 48 8b bd 98 00 00 00 48 89 de e8 01 c8 ff ff 48 8b bd a0 00 00 00 48 89 de e8 f2 c7 ff ff 48 89 ef e8 9a 1f 01 00 <48> 8b 7b 20 31 d2 be 01 00 00 00 e8 ca b0 ff ff 48 89 c5 48 85 c0

@mbiebl
Copy link
Contributor

mbiebl commented Feb 1, 2021

Please install dbgsym package for systemd and then either generate a backtrace by installing systemd-coredump or directly running systemd-resolved under gdb.

https://wiki.debian.org/HowToGetABacktrace has a few hints.

@ioerror
Copy link
Author

ioerror commented Feb 1, 2021

@mbiebl - Sure thing, no problem.

I've installed systemd-coredump and the relevant symbol packages available for the platform. I'll update this bug with more details after the next crash.

@poettering
Copy link
Member

my educated guess is that this is a duplicate of #16168, i.e. fixed by #18290 + #18407. Was that backported to debian yet?

@poettering poettering added needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer resolve labels Feb 1, 2021
@ioerror
Copy link
Author

ioerror commented Feb 1, 2021

@poettering I'm using the systemd package version 247.2-5; according to https://tracker.debian.org/pkg/systemd and https://metadata.ftp-master.debian.org/changelogs//main/s/systemd/systemd_247.2-5_changelog - it looks like 247.2-5 was uploaded on 2021-01-22 and the changelog reports the last updated date was 2021-01-18. I don't see any note in the changelog (apparently written by @mbiebl) that indicates that any of those bugs were patched in that specific release.

Since restarting the service to capture a crash dump, I haven't seen any crashes, yet.

Is there a DNS query test case to reproduce those crash bugs?

@mbiebl
Copy link
Contributor

mbiebl commented Feb 1, 2021

my educated guess is that this is a duplicate of #16168, i.e. fixed by #18290 + #18407. Was that backported to debian yet?

No. Should those commits be added to systemd-stable? @keszybz ^

@keszybz
Copy link
Member

keszybz commented Feb 1, 2021

No. Should those commits be added to systemd-stable? @keszybz ^

Yeah, I was planning to do that today, but I got stuck on some other issue.

systemd-resolved should stay running and automatically repair itself so DNS services do not need a manual restart.

That sounds wrong. We have had this bug for a while in Fedora, with many many people being hit, and generally the result is that resolved crashes and is immediately restarted. Apart from the interrupted query and noise in the logs, this isn't visible to users. The daemon getting "stuck" after a crash is unexpected.

@ioerror
Copy link
Author

ioerror commented Feb 6, 2021

Somewhat surprisingly, I haven't had any systemd-resolved crashes since enabling the systemd-coredump support as requested.

@ioerror
Copy link
Author

ioerror commented Feb 13, 2021

I've now had another set of crashes with an upgraded systemd package (systemd 247 (247.3-1) +PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified) :

coredumpctl 
TIME                            PID   UID   GID SIG COREFILE  EXE
Thu 2021-02-11 13:56:13 CET     447   103   104   6 present   /usr/lib/systemd/systemd-resolved
Thu 2021-02-11 13:58:18 CET    1115   103   104  11 present   /usr/lib/systemd/systemd-resolved
Thu 2021-02-11 14:00:39 CET    1284   103   104   6 present   /usr/lib/systemd/systemd-resolved
Thu 2021-02-11 14:17:11 CET    1402   103   104   6 present   /usr/lib/systemd/systemd-resolved
Thu 2021-02-11 14:19:32 CET    2181   103   104  11 present   /usr/lib/systemd/systemd-resolved
Thu 2021-02-11 14:21:46 CET    2333   103   104   6 present   /usr/lib/systemd/systemd-resolved
Fri 2021-02-12 09:57:22 CET     465   103   104   6 present   /usr/lib/systemd/systemd-resolved
Fri 2021-02-12 11:23:56 CET    4592   103   104   6 present   /usr/lib/systemd/systemd-resolved
Fri 2021-02-12 12:20:44 CET    4779   103   104  11 present   /usr/lib/systemd/systemd-resolved

More of these in dmesg:

[Feb12 12:20] systemd-resolve[4779]: segfault at ffffffff00000004 ip 00007f8cf910c2b9 sp 00007ffc7a750400 error 5 in libsystemd-shared-247.so[7f8cf8fad000+18c000]
[  +0.000119] Code: 35 69 70 07 00 e8 a7 3f ea ff b8 f6 ff ff ff e9 c6 fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 85 ff 74 7b 48 83 ec 18 <8b> 47 04 85 c0 74 48 83 e8 01 89 47 04 75 11 f6 47 29 02 74 12 f6

I've attached the output of coredumpctl info systemd-resolved-coredumps.txt for each pid listed above.

Is there other information that would be useful for me to collect?

@keszybz
Copy link
Member

keszybz commented Feb 13, 2021

I've installed systemd-coredump and the relevant symbol packages available for the platform. I'll update this bug with more details after the next crash.

There are three asserts in dns_query_complete. It's not even clear which one we're hitting.

Your backtrace has only function names and addresses which would require too much work to resolve for non-debian people. For one of the cases with SIGABRT, not SIGSEGV, please attach gdb and print a backtrace:

$ sudo coredumpctl debug <pid> [tab completion of the pid should work here]
...
(gdb) bt full
...
(gdb) up 3
(gdb) print *q

@ioerror
Copy link
Author

ioerror commented Feb 13, 2021

Here's pid 447:

# coredumpctl debug 447
           PID: 447 (systemd-resolve)
           UID: 103 (systemd-resolve)
           GID: 104 (systemd-resolve)
        Signal: 6 (ABRT)
     Timestamp: Thu 2021-02-11 13:56:12 CET (2 days ago)
  Command Line: /lib/systemd/systemd-resolved
    Executable: /usr/lib/systemd/systemd-resolved
 Control Group: /system.slice/systemd-resolved.service
          Unit: systemd-resolved.service
         Slice: system.slice
       Boot ID: 2eb719e5567e44ffaaab9f4aa523af85
    Machine ID: ebb806bf9dda4e9fb40030432029330e
      Hostname: apu4d4
       Storage: /var/lib/systemd/coredump/core.systemd-resolve.103.2eb719e5567e44ffaaab9f4aa523af85.447.1613048172000000.zst
       Message: Process 447 (systemd-resolve) of user 103 dumped core.
                
                Stack trace of thread 447:
                #0  0x00007f2f46d5bce1 __GI_raise (libc.so.6 + 0x3bce1)
                #1  0x00007f2f46d45537 __GI_abort (libc.so.6 + 0x25537)
                #2  0x00007f2f473dba2e log_assert_failed_realm (libsystemd-shared-247.so + 0x60a2e)
                #3  0x00005621f402af29 dns_query_complete (systemd-resolved + 0x14f29)
                #4  0x00005621f402b03b on_query_timeout (systemd-resolved + 0x1503b)
                #5  0x00007f2f475333ee source_dispatch (libsystemd-shared-247.so + 0x1b83ee)
                #6  0x00007f2f47533771 sd_event_dispatch (libsystemd-shared-247.so + 0x1b8771)
                #7  0x00007f2f47534ad8 sd_event_run (libsystemd-shared-247.so + 0x1b9ad8)
                #8  0x00007f2f47534cf7 sd_event_loop (libsystemd-shared-247.so + 0x1b9cf7)
                #9  0x00005621f404ee6b run (systemd-resolved + 0x38e6b)
                #10 0x00005621f4022a66 main (systemd-resolved + 0xca66)
                #11 0x00007f2f46d46d0a __libc_start_main (libc.so.6 + 0x26d0a)
                #12 0x00005621f4022b0a _start (systemd-resolved + 0xcb0a)

GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/systemd/systemd-resolved...
Reading symbols from /usr/lib/debug/.build-id/ef/2c7b57995fb00a6336ca843fc3e191ef0557aa.debug...
[New LWP 447]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/lib/systemd/systemd-resolved'.
--Type <RET> for more, q to quit, c to continue without paging--
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt full
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
        set = {__val = {17179888130, 0, 2443688128, 15, 0, 0, 2, 94703845893112, 18446744073709551615, 0, 94703848892900, 94703845955792, 
            94704028876799, 139841036278093, 0, 139841030193568}}
        pid = <optimized out>
        tid = <optimized out>
        ret = <optimized out>
#1  0x00007f2f46d45537 in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x1, sa_sigaction = 0x1}, sa_mask = {__val = {1, 12, 13508095776426807296, 94703845893040, 
              94703827972032, 2, 94703827966683, 491, 1794, 0, 139841036231490, 0, 0, 0, 139841037717216, 94703846094432}}, 
          sa_flags = -1851278656, sa_restorer = 0x5621f51af440}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007f2f473dba2e in log_assert_failed_realm (realm=realm@entry=LOG_REALM_SYSTEMD, 
    text=text@entry=0x5621f40665f0 "DNS_TRANSACTION_IS_LIVE(q->state)", file=file@entry=0x5621f4065adb "src/resolve/resolved-dns-query.c", 
    line=line@entry=491, func=func@entry=0x5621f4066fc0 <__PRETTY_FUNCTION__.15.lto_priv.2> "dns_query_complete") at ../src/basic/log.c:864
No locals.
#3  0x00005621f402af29 in dns_query_complete (q=0x5621f51af440, state=<optimized out>) at ../src/resolve/resolved-dns-query.c:491
        __PRETTY_FUNCTION__ = "dns_query_complete"
#4  0x00005621f402b03b in on_query_timeout (s=<optimized out>, usec=<optimized out>, userdata=<optimized out>)
    at ../src/resolve/resolved-dns-query.c:509
        q = <optimized out>
--Type <RET> for more, q to quit, c to continue without paging--
        __PRETTY_FUNCTION__ = "on_query_timeout"
#5  0x00007f2f475333ee in source_dispatch (s=s@entry=0x5621f51af660) at ../src/libsystemd/sd-event/sd-event.c:3246
        saved_event = 0x5621f517e3b0
        saved_type = SOURCE_TIME_BOOTTIME
        r = <optimized out>
        __PRETTY_FUNCTION__ = "source_dispatch"
        __func__ = "source_dispatch"
#6  0x00007f2f47533771 in sd_event_dispatch (e=e@entry=0x5621f517e3b0) at ../src/libsystemd/sd-event/sd-event.c:3689
        ref = <optimized out>
        p = <optimized out>
        r = <optimized out>
        __PRETTY_FUNCTION__ = "sd_event_dispatch"
#7  0x00007f2f47534ad8 in sd_event_run (e=e@entry=0x5621f517e3b0, timeout=timeout@entry=18446744073709551615)
    at ../src/libsystemd/sd-event/sd-event.c:3747
        r = 1
        __PRETTY_FUNCTION__ = "sd_event_run"
#8  0x00007f2f47534cf7 in sd_event_loop (e=0x5621f517e3b0) at ../src/libsystemd/sd-event/sd-event.c:3769
        ref = 0x5621f517e3b0
        r = <optimized out>
        __PRETTY_FUNCTION__ = "sd_event_loop"
#9  0x00005621f404ee6b in run (argc=<optimized out>, argv=<optimized out>) at ../src/resolve/resolved.c:92
--Type <RET> for more, q to quit, c to continue without paging--
        m = <optimized out>
        notify_stop = 0x5621f40707f8 "STOPPING=1\nSTATUS=Shutting down..."
        r = <optimized out>
        __func__ = "run"
        __PRETTY_FUNCTION__ = "run"
#10 0x00005621f4022a66 in main (argc=<optimized out>, argv=<optimized out>) at ../src/resolve/resolved.c:99
        r = <optimized out>
(gdb) up 3
#3  0x00005621f402af29 in dns_query_complete (q=0x5621f51af440, state=<optimized out>) at ../src/resolve/resolved-dns-query.c:491
491	../src/resolve/resolved-dns-query.c: No such file or directory.
(gdb) print *q
$1 = {manager = 0x5621f4eac010, auxiliary_for = 0x0, auxiliary_queries = 0x0, n_auxiliary_queries = 0, auxiliary_result = 0, 
  question_idna = 0x5621f52ac410, question_utf8 = 0x5621f52ac410, flags = 287, ifindex = 0, clamp_ttl = true, state = DNS_TRANSACTION_ERRNO, 
  n_cname_redirects = 0, candidates = 0x0, timeout_event_source = 0x0, answer = 0x0, answer_rcode = 0, 
  answer_dnssec_result = _DNSSEC_RESULT_INVALID, answer_authenticated = false, answer_protocol = _DNS_PROTOCOL_INVALID, answer_family = 0, 
  answer_search_domain = 0x0, answer_errno = 0, previous_redirect_unauthenticated = false, bus_request = 0x0, varlink_request = 0x0, 
  request_family = 0, request_address_valid = false, request_address = {in = {s_addr = 0}, in6 = {__in6_u = {
        __u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, 
    bytes = '\000' <repeats 15 times>}, block_all_complete = 0, request_address_string = 0x0, request_dns_packet = 0x5621f53d86e0, 
  request_dns_stream = 0x0, reply_dns_packet = 0x0, stub_listener_extra = 0x5621f4eafe30, complete = 0x5621f4036820 <dns_stub_query_complete>, 
  block_ready = 0, bus_track = 0x0, queries_next = 0x0, queries_prev = 0x0, auxiliary_queries_next = 0x0, auxiliary_queries_prev = 0x220}
(gdb) 

@keszybz keszybz removed the needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer label Feb 16, 2021
@ioerror
Copy link
Author

ioerror commented Feb 16, 2021

For completeness and before I clear the corefiles, I wanted to share at least a small amount of the information contained within.

I think the two SIGSEV corefiles roughly look like they're crashing after a timeout on a query. The query is routed over an upstream TLS resolver as I indicated in my configuration for systemd-resolved.

# coredumpctl debug 4779
           PID: 4779 (systemd-resolve)
           UID: 103 (systemd-resolve)
           GID: 104 (systemd-resolve)
        Signal: 11 (SEGV)
     Timestamp: Fri 2021-02-12 12:20:43 CET (4 days ago)
  Command Line: /lib/systemd/systemd-resolved
    Executable: /usr/lib/systemd/systemd-resolved
 Control Group: /system.slice/systemd-resolved.service
          Unit: systemd-resolved.service
         Slice: system.slice
       Boot ID: c945e57aa61e4c148cc99845c802f724
    Machine ID: ebb806bf9dda4e9fb40030432029330e
      Hostname: apu4d4
       Storage: /var/lib/systemd/coredump/core.systemd-resolve.103.c945e57aa61e4c148cc99845c802f724.4779.1613128843000000.zst
       Message: Process 4779 (systemd-resolve) of user 103 dumped core.
                
                Stack trace of thread 4779:
                #0  0x00007f8cf910c2b9 sd_event_source_unref (libsystemd-shared-247.so + 0x1b62b9)
                #1  0x0000557750ceee52 dns_query_stop (systemd-resolved + 0x14e52)
                #2  0x0000557750ceeec0 dns_query_complete (systemd-resolved + 0x14ec0)
                #3  0x0000557750cef03b on_query_timeout (systemd-resolved + 0x1503b)
                #4  0x00007f8cf910e3ee source_dispatch (libsystemd-shared-247.so + 0x1b83ee)
                #5  0x00007f8cf910e771 sd_event_dispatch (libsystemd-shared-247.so + 0x1b8771)
                #6  0x00007f8cf910fad8 sd_event_run (libsystemd-shared-247.so + 0x1b9ad8)
                #7  0x00007f8cf910fcf7 sd_event_loop (libsystemd-shared-247.so + 0x1b9cf7)
                #8  0x0000557750d12e6b run (systemd-resolved + 0x38e6b)
                #9  0x0000557750ce6a66 main (systemd-resolved + 0xca66)
                #10 0x00007f8cf8921d0a __libc_start_main (libc.so.6 + 0x26d0a)
                #11 0x0000557750ce6b0a _start (systemd-resolved + 0xcb0a)

GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Reading symbols from /usr/lib/systemd/systemd-resolved...
Reading symbols from /usr/lib/debug/.build-id/ef/2c7b57995fb00a6336ca843fc3e191ef0557aa.debug...
[New LWP 4779]
bt full[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/lib/systemd/systemd-resolved'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f8cf910c2b9 in sd_event_source_unref (p=0xffffffff00000000) at ../src/libsystemd/sd-event/sd-event.c:1966
1966	../src/libsystemd/sd-event/sd-event.c: No such file or directory.
(gdb) bt full
#0  0x00007f8cf910c2b9 in sd_event_source_unref (p=0xffffffff00000000) at ../src/libsystemd/sd-event/sd-event.c:1966
        __PRETTY_FUNCTION__ = "sd_event_source_unref"
#1  0x0000557750ceee52 in dns_query_stop (q=q@entry=0x557752a8c190) at ../src/resolve/resolved-dns-query.c:300
        c = <optimized out>
        __PRETTY_FUNCTION__ = "dns_query_stop"
#2  0x0000557750ceeec0 in dns_query_complete (q=0x557752a8c190, state=<optimized out>)
    at ../src/resolve/resolved-dns-query.c:498
        __PRETTY_FUNCTION__ = "dns_query_complete"
#3  0x0000557750cef03b in on_query_timeout (s=<optimized out>, usec=<optimized out>, userdata=<optimized out>)
    at ../src/resolve/resolved-dns-query.c:509
        q = <optimized out>
        __PRETTY_FUNCTION__ = "on_query_timeout"
#4  0x00007f8cf910e3ee in source_dispatch (s=s@entry=0x557752b59b80) at ../src/libsystemd/sd-event/sd-event.c:3246
        saved_event = 0x5577528f3a90
        saved_type = SOURCE_TIME_BOOTTIME
        r = <optimized out>
        __PRETTY_FUNCTION__ = "source_dispatch"
        __func__ = "source_dispatch"
#5  0x00007f8cf910e771 in sd_event_dispatch (e=e@entry=0x5577528f3a90) at ../src/libsystemd/sd-event/sd-event.c:3689
        ref = <optimized out>
        p = <optimized out>
        r = <optimized out>
        __PRETTY_FUNCTION__ = "sd_event_dispatch"
#6  0x00007f8cf910fad8 in sd_event_run (e=e@entry=0x5577528f3a90, timeout=timeout@entry=18446744073709551615)
    at ../src/libsystemd/sd-event/sd-event.c:3747
        r = 1
        __PRETTY_FUNCTION__ = "sd_event_run"
#7  0x00007f8cf910fcf7 in sd_event_loop (e=0x5577528f3a90) at ../src/libsystemd/sd-event/sd-event.c:3769
        ref = 0x5577528f3a90
        r = <optimized out>
        __PRETTY_FUNCTION__ = "sd_event_loop"
#8  0x0000557750d12e6b in run (argc=<optimized out>, argv=<optimized out>) at ../src/resolve/resolved.c:92
        m = <optimized out>
        notify_stop = 0x557750d347f8 "STOPPING=1\nSTATUS=Shutting down..."
        r = <optimized out>
        __func__ = "run"
        __PRETTY_FUNCTION__ = "run"
#9  0x0000557750ce6a66 in main (argc=<optimized out>, argv=<optimized out>) at ../src/resolve/resolved.c:99
        r = <optimized out>
(gdb) up 3
#3  0x0000557750cef03b in on_query_timeout (s=<optimized out>, usec=<optimized out>, userdata=<optimized out>)
    at ../src/resolve/resolved-dns-query.c:509
509	../src/resolve/resolved-dns-query.c: No such file or directory.
(gdb) print *q
value has been optimized out
(gdb) down 3
#0  0x00007f8cf910c2b9 in sd_event_source_unref (p=0xffffffff00000000) at ../src/libsystemd/sd-event/sd-event.c:1966
1966	../src/libsystemd/sd-event/sd-event.c: No such file or directory.
(gdb) print *q
No symbol "q" in current context.
(gdb) print *a
Attempt to take contents of a non-pointer value.
(gdb) print a
$1 = {i = {0, 1045149306}, d = 1.2904777690891933e-08}
# coredumpctl debug 1115
           PID: 1115 (systemd-resolve)
           UID: 103 (systemd-resolve)
           GID: 104 (systemd-resolve)
        Signal: 11 (SEGV)
     Timestamp: Thu 2021-02-11 13:58:17 CET (5 days ago)
  Command Line: /lib/systemd/systemd-resolved
    Executable: /usr/lib/systemd/systemd-resolved
 Control Group: /system.slice/systemd-resolved.service
          Unit: systemd-resolved.service
         Slice: system.slice
       Boot ID: 2eb719e5567e44ffaaab9f4aa523af85
    Machine ID: ebb806bf9dda4e9fb40030432029330e
      Hostname: apu4d4
       Storage: /var/lib/systemd/coredump/core.systemd-resolve.103.2eb719e5567e44ffaaab9f4aa523af85.1115.1613048297000000.zst
       Message: Process 1115 (systemd-resolve) of user 103 dumped core.
                
                Stack trace of thread 1115:
                #0  0x0000562016c17430 n/a (n/a + 0x0)

GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Reading symbols from /usr/lib/systemd/systemd-resolved...
Reading symbols from /usr/lib/debug/.build-id/ef/2c7b57995fb00a6336ca843fc3e191ef0557aa.debug...
[New LWP 1115]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/lib/systemd/systemd-resolved'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000562016c17430 in ?? ()
(gdb) bt full
#0  0x0000562016c17430 in ?? ()
No symbol table info available.
#1  0x0000562014d2f03b in on_query_timeout (s=<optimized out>, usec=<optimized out>, userdata=<optimized out>)
    at ../src/resolve/resolved-dns-query.c:509
        q = <optimized out>
        __PRETTY_FUNCTION__ = "on_query_timeout"
#2  0x00007fa1959c93ee in source_dispatch (s=s@entry=0x562016b9edc0) at ../src/libsystemd/sd-event/sd-event.c:3246
        saved_event = 0x562016b6a3b0
        saved_type = SOURCE_TIME_BOOTTIME
        r = <optimized out>
        __PRETTY_FUNCTION__ = "source_dispatch"
        __func__ = "source_dispatch"
#3  0x00007fa1959c9771 in sd_event_dispatch (e=e@entry=0x562016b6a3b0) at ../src/libsystemd/sd-event/sd-event.c:3689
        ref = <optimized out>
        p = <optimized out>
        r = <optimized out>
        __PRETTY_FUNCTION__ = "sd_event_dispatch"
#4  0x00007fa1959caad8 in sd_event_run (e=e@entry=0x562016b6a3b0, timeout=timeout@entry=18446744073709551615)
    at ../src/libsystemd/sd-event/sd-event.c:3747
        r = 1
        __PRETTY_FUNCTION__ = "sd_event_run"
#5  0x00007fa1959cacf7 in sd_event_loop (e=0x562016b6a3b0) at ../src/libsystemd/sd-event/sd-event.c:3769
        ref = 0x562016b6a3b0
        r = <optimized out>
        __PRETTY_FUNCTION__ = "sd_event_loop"
#6  0x0000562014d52e6b in run (argc=<optimized out>, argv=<optimized out>) at ../src/resolve/resolved.c:92
        m = <optimized out>
        notify_stop = 0x562014d747f8 "STOPPING=1\nSTATUS=Shutting down..."
        r = <optimized out>
        __func__ = "run"
        __PRETTY_FUNCTION__ = "run"
#7  0x0000562014d26a66 in main (argc=<optimized out>, argv=<optimized out>) at ../src/resolve/resolved.c:99
        r = <optimized out>

@ioerror
Copy link
Author

ioerror commented Feb 16, 2021

I was thinking of trying to capture the network traffic that leads to either a SIGABRT or a SIGSEGV for systemd-resolved. However, when looking at another corefile (SIGSEV), I have the impression that it would be non-trivial to capture interesting packets in a pcap as I think both bugs are triggered by a query (or the lack of a response?) inside of a TLS session rather than a query or an answer on the UDP listener on my local network segment. Does that seem to be the case to anyone else?

If it is likely inside of an encrypted connection... is there a systemd way to log the TLS session keys to allow for decryption of the TLS connection? Part of me hopes the answer is no, but if it is possible, I'm willing to try something else to track these issues down.

@ScurvyLouse

This comment has been minimized.

@ioerror

This comment has been minimized.

keszybz added a commit to keszybz/systemd that referenced this issue Mar 1, 2021
We generally operate on the assumption that a source is "gone" as soon
as we unref it. This is generally true because we have the only reference.
But if something else holds the reference, our unref doesn't really stop
the source and it could fire again.

In particular, on_query_timeout() is called with DnsQuery* as userdata, and
it calls dns_query_stop() which invalidates that pointer. If it was ever
called again, we'd be accessing already-freed memory.

I don't see what would hold the reference. sd-event takes a temporary reference,
but on the sd_event object, not on the individual sources. And our sources
are non-floating, so there is no reference from the sd_event object to the
sources.

For systemd#18427.
@keszybz
Copy link
Member

keszybz commented Mar 1, 2021

@ioerror if you could maybe try #18832? (Only the second patch matters.)

@keszybz
Copy link
Member

keszybz commented Mar 23, 2021

I'm pretty sure this is fixed by #18832.

@keszybz keszybz closed this as completed Mar 23, 2021
bluca pushed a commit to bluca/systemd that referenced this issue Apr 1, 2021
We generally operate on the assumption that a source is "gone" as soon
as we unref it. This is generally true because we have the only reference.
But if something else holds the reference, our unref doesn't really stop
the source and it could fire again.

In particular, on_query_timeout() is called with DnsQuery* as userdata, and
it calls dns_query_stop() which invalidates that pointer. If it was ever
called again, we'd be accessing already-freed memory.

I don't see what would hold the reference. sd-event takes a temporary reference,
but on the sd_event object, not on the individual sources. And our sources
are non-floating, so there is no reference from the sd_event object to the
sources.

For systemd#18427.

(cherry picked from commit 9793530)
mikhailnov pushed a commit to mikhailnov/systemd that referenced this issue Apr 23, 2021
We generally operate on the assumption that a source is "gone" as soon
as we unref it. This is generally true because we have the only reference.
But if something else holds the reference, our unref doesn't really stop
the source and it could fire again.

In particular, on_query_timeout() is called with DnsQuery* as userdata, and
it calls dns_query_stop() which invalidates that pointer. If it was ever
called again, we'd be accessing already-freed memory.

I don't see what would hold the reference. sd-event takes a temporary reference,
but on the sd_event object, not on the individual sources. And our sources
are non-floating, so there is no reference from the sd_event object to the
sources.

For systemd#18427.

(cherry picked from commit 9793530)
(cherry picked from commit 78a43c3)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

5 participants