Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Systemd-logind Watchdog timeout / killed debian 9 #6283

Closed
pheanex opened this issue Jul 4, 2017 · 4 comments
Closed

Systemd-logind Watchdog timeout / killed debian 9 #6283

pheanex opened this issue Jul 4, 2017 · 4 comments

Comments

@pheanex
Copy link

pheanex commented Jul 4, 2017

Submission type

  • Bug report

systemd version the issue has been seen with

root@:~# systemd --version
systemd 232
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN

Used distribution

Linux 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) x86_64

In case of bug report: Expected behaviour you didn't see

systemd-logind working as expected

In case of bug report: Unexpected behaviour you saw

Systemd-logind watchdog timeout was killed by kernel

In case of bug report: Steps to reproduce the problem

unknown

Log from journalctl:
Jul 03 17:25:50 <hostname> systemd[1]: systemd-logind.service: Watchdog timeout (limit 3min)!
Jul 03 17:25:50 <hostname> systemd[1]: systemd-logind.service: Killing process 792 (systemd-logind) with signal SIGABRT.
Jul 03 17:25:50 <hostname> kernel: INFO: rcu_sched self-detected stall on CPU
Jul 03 17:25:50 <hostname> kernel: INFO: rcu_sched detected stalls on CPUs/tasks:
Jul 03 17:25:50 <hostname> kernel:         0-...: (1 GPs behind) idle=2f3/140000000000001/0 softirq=4418620/4418622 fqs=0 
Jul 03 17:25:50 <hostname> kernel:         (detected by 1, t=31588 jiffies, g=2903271, c=2903270, q=125)
Jul 03 17:25:50 <hostname> kernel: Task dump for CPU 0:
Jul 03 17:25:50 <hostname> kernel: perl            R  running task        0 12181  30472 0x00000000
Jul 03 17:25:50 <hostname> kernel: Task dump for CPU 0:
Jul 03 17:25:50 <hostname> kernel: perl            R  running task        0 12181  30472 0x00000000
Jul 03 17:25:50 <hostname> kernel:  0000000000000000 00000000024280ca ffffc32845817d78 ffffffffb4185c86
Jul 03 17:25:50 <hostname> kernel:  ffff9d8dffffacc0 ffff9d8dffffacc0 0000000000000000 ffff9d8dffffacc0
Jul 03 17:25:50 <hostname> kernel:  0000000300000001 0000000000000000 495f42ee8a7b80d1 00000000024280ca
Jul 03 17:25:50 <hostname> kernel: Call Trace:
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb4185c86>] ? __alloc_pages_nodemask+0xf6/0x260
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb41d7c3e>] ? alloc_pages_vma+0xae/0x260
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb41b39ba>] ? handle_mm_fault+0x111a/0x1350
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb421a3c8>] ? dput+0xc8/0x250
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb405fd84>] ? __do_page_fault+0x2a4/0x510
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb4607688>] ? async_page_fault+0x28/0x30
Jul 03 17:25:50 <hostname> kernel: rcu_sched kthread starved for 31588 jiffies! g2903271 c2903270 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
Jul 03 17:25:50 <hostname> kernel: rcu_sched       S    0     7      2 0x00000000
Jul 03 17:25:50 <hostname> kernel:  ffff9d8df75f9000 0000000000000000 ffff9d8df81dc040 ffff9d8dffd18240
Jul 03 17:25:50 <hostname> kernel:  ffff9d8df822a000 ffffc32840c87db0 ffffffffb46015d3 ffffc32840c87de0
Jul 03 17:25:50 <hostname> kernel:  00000001011ddcfd ffff9d8dffd18240 0000000000000001 ffff9d8df81dc040
Jul 03 17:25:50 <hostname> kernel: Call Trace:
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb46015d3>] ? __schedule+0x233/0x6d0
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb4601aa2>] ? schedule+0x32/0x80
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb4604dae>] ? schedule_timeout+0x17e/0x310
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb40e3e50>] ? del_timer_sync+0x50/0x50
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb40dd605>] ? rcu_gp_kthread+0x505/0x850
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb40b8799>] ? __wake_up_common+0x49/0x80
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb40dd100>] ? rcu_note_context_switch+0xe0/0xe0
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb40965d7>] ? kthread+0xd7/0xf0
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb4096500>] ? kthread_park+0x60/0x60
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb46064f5>] ? ret_from_fork+0x25/0x30
Jul 03 17:25:50 <hostname> kernel:         0-...: (1 GPs behind) idle=2f3/140000000000001/0 softirq=4418620/4418622 fqs=1 
Jul 03 17:25:50 <hostname> kernel:          (t=31588 jiffies g=2903271 c=2903270 q=159)
Jul 03 17:25:50 <hostname> kernel: Task dump for CPU 0:
Jul 03 17:25:50 <hostname> kernel: perl            R  running task        0 12181  30472 0x00000000
Jul 03 17:25:50 <hostname> kernel:  ffffffffb4d13580 ffffffffb40a3bcb 0000000000000000 ffffffffb4d13580
Jul 03 17:25:50 <hostname> kernel:  ffffffffb417a4b6 ffff9d8dffc18fc0 ffffffffb4c4a6c0 0000000000000000
Jul 03 17:25:50 <hostname> kernel:  ffffffffb4d13580 00000000ffffffff ffffffffb40dee04 00000000003d0900
Jul 03 17:25:50 <hostname> kernel: Call Trace:
Jul 03 17:25:50 <hostname> kernel:  <IRQ> 
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb40a3bcb>] ? sched_show_task+0xcb/0x130
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb417a4b6>] ? rcu_dump_cpu_stacks+0x92/0xb2
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb40dee04>] ? rcu_check_callbacks+0x754/0x8a0
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb40ed0c3>] ? update_wall_time+0x473/0x790
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb40f48c0>] ? tick_sched_handle.isra.12+0x50/0x50
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb40e5718>] ? update_process_times+0x28/0x50
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb40f4890>] ? tick_sched_handle.isra.12+0x20/0x50
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb40f48f8>] ? tick_sched_timer+0x38/0x70
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb40e60fc>] ? __hrtimer_run_queues+0xdc/0x240
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb40e67cc>] ? hrtimer_interrupt+0x9c/0x1a0
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb4608ba9>] ? smp_apic_timer_interrupt+0x39/0x50
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb4607ec2>] ? apic_timer_interrupt+0x82/0x90
Jul 03 17:25:50 <hostname> kernel:  <EOI> 
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb4334f87>] ? clear_page_c_e+0x7/0x10
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb41849f8>] ? get_page_from_freelist+0x908/0xb40
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb420bf29>] ? terminate_walk+0x89/0xf0
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb4210d6b>] ? path_openat+0x76b/0x1440
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb4185c86>] ? __alloc_pages_nodemask+0xf6/0x260
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb41d7c3e>] ? alloc_pages_vma+0xae/0x260
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb41b39ba>] ? handle_mm_fault+0x111a/0x1350
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb421a3c8>] ? dput+0xc8/0x250
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb405fd84>] ? __do_page_fault+0x2a4/0x510
Jul 03 17:25:50 <hostname> kernel:  [<ffffffffb4607688>] ? async_page_fault+0x28/0x30
Jul 03 17:25:50 <hostname> systemd[1]: systemd-logind.service: Main process exited, code=killed, status=6/ABRT
Jul 03 17:25:50 <hostname> systemd[1]: systemd-logind.service: Unit entered failed state.
Jul 03 17:25:50 <hostname> systemd[1]: systemd-logind.service: Failed with result 'watchdog'.
Jul 03 17:25:50 <hostname> systemd[1]: systemd-logind.service: Service has no hold-off time, scheduling restart.
Jul 03 17:25:50 <hostname> systemd[1]: Stopped Login Service.
Jul 03 17:25:50 <hostname> systemd[1]: Starting Login Service...
Jul 03 17:25:50 <hostname> systemd[1]: Started Login Service.
Jul 03 17:25:50 <hostname> systemd-logind[12185]: New seat seat0.
Jul 03 17:25:50 <hostname> systemd-logind[12185]: Watching system buttons on /dev/input/event1 (Power Button)
Jul 03 17:25:50 <hostname> systemd-logind[12185]: New session 101 of user root.
Jul 03 17:25:50 <hostname> systemd-logind[12185]: New session 103 of user root.


root@<hostname>:~# dpkg -l | grep systemd
ii  libpam-systemd:amd64            232-25                         amd64        system and service manager - PAM module
ii  libsystemd0:amd64               232-25                         amd64        systemd utility library
ii  python3-systemd                 233-1                          amd64        Python 3 bindings for systemd
ii  systemd                         232-25                         amd64        system and service manager
ii  systemd-sysv                    232-25                         amd64        system and service manager - SysV links


root@<hostname>:~# systemctl cat systemd-logind.service 
# /lib/systemd/system/systemd-logind.service
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Login Service
Documentation=man:systemd-logind.service(8) man:logind.conf(5)
Documentation=http://www.freedesktop.org/wiki/Software/systemd/logind
Documentation=http://www.freedesktop.org/wiki/Software/systemd/multiseat
Wants=user.slice
After=nss-user-lookup.target user.slice
ConditionPathExists=/lib/systemd/system/dbus.service

# Ask for the dbus socket. If running over kdbus, the socket will
# not be actually used.
Wants=dbus.socket
After=dbus.socket

[Service]
ExecStart=/lib/systemd/systemd-logind
Restart=always
RestartSec=0
BusName=org.freedesktop.login1
WatchdogSec=3min
CapabilityBoundingSet=CAP_SYS_ADMIN CAP_MAC_ADMIN CAP_AUDIT_CONTROL CAP_CHOWN CAP_KILL CAP_DAC_READ_SEARCH CAP_DAC_OVERRIDE CAP_FOWNER CAP_SYS_TTY_CONFIG
MemoryDenyWriteExecute=yes
RestrictRealtime=yes
RestrictAddressFamilies=AF_UNIX AF_NETLINK AF_INET AF_INET6
SystemCallFilter=~@clock @cpu-emulation @debug @keyring @module @obsolete @raw-io

# Increase the default a bit in order to allow many simultaneous
# logins since we keep one fd open per session.
LimitNOFILE=16384

Hope the provided information is enough and helps.
Will provide more if needed.

@GiedriusS
Copy link
Contributor

Are you sure this isn't a kernel bug?

@poettering
Copy link
Member

I am sorry. But this is definitely a kernel bug. Userspace should not be able to trigger such stalls and the kernel's output only happens for kernel hangs. Please report this to your kernel maintainers, there's nothing systemd can do about this from userspace.

Sorry. Closing.

@pheanex
Copy link
Author

pheanex commented Jul 5, 2017

Thank you for the quick feedback. Will do

@ivan
Copy link

ivan commented Jan 1, 2018

3+ minute stalls have been around for a very long time; they happen regularly for me when doing things to millions of XFS inodes, and on Linux, IO stalls on one device seem to spread to almost all processes trying to do a write. I think systemd should be more accommodating for this Linux problem, especially because killing systemd-logind crashes Xorg sessions (I just saw an xscreensaver-locked session get dumped to an unlocked VT - fun stuff.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants