System can become unresponsive during systemd daemon-reload with a group for mirror, as groups cannot be queried at this time #178

@t-m-w

Systems tested

  • Fedora Kinoite, systemd 258.3-2.fc43, bindfs 1.18.3
  • Debian 13.2, systemd 257.9-1~deb13u1, bindfs 1.14.7

Issue description

When a bindfs mount that uses a group in its mirror option (e.g. mirror=@root) is mounted via fstab and its mountpoint has not yet been accessed, an interaction between systemd and bindfs makes the system unresponsive for about 90 seconds when systemctl daemon-reload is run.

Prerequisites

  • /etc/fstab entry for the mount
  • Filesystem must be mounted already
  • The user cache within bindfs must not yet be populated (i.e. the mountpoint hasn't been accessed since it was mounted, or since the cache was last invalidated)

Steps to reproduce (as root)

mkdir /root/src
mkdir /root/dst1
echo '/root/src /root/dst1 fuse.bindfs mirror=@root 0 0' | tee -a /etc/fstab
# Optionally, run an early `systemctl daemon-reload` here.
# It will be quick, unlike the next one after we've mounted this below.
mount /root/dst1
time systemctl daemon-reload # hangs for 90 seconds

# Further checks for demonstration purposes
time systemctl daemon-reload # ok
pkill -USR1 bindfs           # SIGUSR1 invalidates the bindfs user/group cache
time systemctl daemon-reload # hangs for 90 seconds
umount /root/dst1
mount /root/dst1             # new bindfs process, so the cache is empty again
time systemctl daemon-reload # hangs for 90 seconds
pkill -USR1 bindfs
stat /root/dst1              # repopulates the cache outside of a reload
time systemctl daemon-reload # ok

Alternatively, reboot after adding the fstab line and then run time systemctl daemon-reload; it will hang in the same way. (time is only there to show how long the command takes; it plays no part in the hang.)
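
To script the reproduction rather than read the output of time by eye, a small helper along these lines could be used. It is only a sketch: elapsed_seconds and is_slow are hypothetical names, and the 60-second threshold in the example is an arbitrary assumption chosen to sit below the 90-second hang.

```shell
#!/bin/sh
# elapsed_seconds CMD [ARGS...]
# Runs the command with its output discarded and prints the elapsed
# wall-clock time in whole seconds.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# is_slow SECONDS THRESHOLD
# Succeeds when SECONDS is at least THRESHOLD.
is_slow() {
    [ "$1" -ge "$2" ]
}

# Example (as root); the threshold of 60 is an assumption:
#   t=$(elapsed_seconds systemctl daemon-reload)
#   if is_slow "$t" 60; then echo "daemon-reload took ${t}s"; fi
```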

Technical details

When systemctl daemon-reload is called, systemd-fstab-generator hangs and eventually times out when it stats an affected bindfs mountpoint. This seems to happen because bindfs tries to obtain group information in order to answer the stat, but systemd apparently cannot provide this information while it is in the middle of reloading.

It looks like the relevant call chain in bindfs.c is the getattr callbacks -> getattr_common -> is_mirrored_user -> user_belongs_to_group.

Possible solutions

  1. Since user_belongs_to_group appears to use a cache, maybe bindfs could populate that cache at process start, and perhaps also (somehow) ensure a cache refresh is not attempted when systemd is being reloaded. As far as this particular issue is concerned, it might be safer for the signal handler to actually refresh the cache rather than just invalidate it.
  2. See if this can be addressed in systemd somehow instead, rather than here in bindfs.

Workarounds

  1. Use a systemd mount unit file for the affected bindfs mounts instead of fstab, since it is systemd-fstab-generator that attempts to stat the mountpoint.
  2. Access the mountpoint with ls or stat before running systemctl daemon-reload, so that the group information is already cached and bindfs does not need to query it again while systemd is reloading.
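
The second workaround could be automated with something like the sketch below, which stats every fuse.bindfs mountpoint listed in fstab before a reload. The function names are hypothetical, and the parsing is deliberately simple: it assumes the fuse.bindfs type is in the third field (as in the reproduction above), ignores the older bindfs#source syntax, and does not decode escapes such as \040 in paths.

```shell
#!/bin/sh
# bindfs_mountpoints FSTAB
# Prints the mountpoint (field 2) of every non-comment fstab entry whose
# filesystem type (field 3) is fuse.bindfs.
bindfs_mountpoints() {
    awk '$1 !~ /^#/ && $3 == "fuse.bindfs" { print $2 }' "$1"
}

# warm_bindfs_mounts FSTAB
# Stats each listed mountpoint that is currently mounted, so that bindfs
# performs its group lookup now rather than during the reload.
warm_bindfs_mounts() {
    bindfs_mountpoints "$1" | while IFS= read -r mp; do
        if mountpoint -q "$mp"; then
            stat "$mp" >/dev/null
        fi
    done
}

# Example (as root):
#   warm_bindfs_mounts /etc/fstab && systemctl daemon-reload
```

This only helps while the cache stays populated; after pkill -USR1 bindfs or a remount, the mountpoints would need to be accessed again before the next reload.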

Troubleshooting tips

systemctl log-level debug adds extra detail to the log, including for systemctl daemon-reload. The journal then shows that systemd-fstab-generator is one of the processes spawned during the reload, along with its PID. Unlike the other generators, that PID does not complete quickly. Moving /usr/lib/systemd/system-generators/systemd-fstab-generator out of the way and replacing it with a symlink to /bin/true made the hang go away, which confirmed that the generator was implicated.
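
The debug-logging step could be scripted roughly as follows. This is a sketch, assuming a systemd version where systemctl log-level prints the current level when called without an argument. debug_reload is a hypothetical helper name, and the grep pattern simply matches the generator's name rather than any particular log message format.

```shell
#!/bin/sh
# debug_reload
# Temporarily raises the manager log level to debug, runs a daemon-reload,
# restores the previous level, and prints the journal lines written since
# the start that mention systemd-fstab-generator.
debug_reload() {
    prev=$(systemctl log-level)          # remember the current level
    since=$(date '+%Y-%m-%d %H:%M:%S')   # timestamp understood by journalctl --since
    systemctl log-level debug
    systemctl daemon-reload              # may block for ~90 seconds if the hang occurs
    systemctl log-level "$prev"
    journalctl --since "$since" --no-pager | grep -F 'systemd-fstab-generator'
}

# Example (as root):
#   debug_reload
```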

strace -T -t was also helpful. In particular, replacing systemd-fstab-generator with a wrapper script that runs the real generator under strace -t -T and saves a log made it clear that the process hangs on the fstat step for the mountpoint.
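
A wrapper of the kind described above might look like the following sketch. It is not the exact script used for this report: wrapper_script is a hypothetical helper, and all paths in the example are assumptions. The real generator is moved outside the generators directory so that systemd does not also run it directly. The log directory must be writable from the environment in which systemd runs generators, which recent systemd versions may restrict, so the location may need adjusting.

```shell
#!/bin/sh
# wrapper_script REAL_GENERATOR LOG_DIR
# Prints a wrapper script that runs REAL_GENERATOR under strace and writes
# one log file per invocation into LOG_DIR.
wrapper_script() {
    cat <<EOF
#!/bin/sh
# -t prints a timestamp per line, -T the time spent in each syscall,
# -o writes the trace to a file named after the wrapper PID.
exec strace -t -T -o "$2/fstab-generator.\$\$.log" "$1" "\$@"
EOF
}

# Example (as root; every path below is an assumption):
#   gen=/usr/lib/systemd/system-generators/systemd-fstab-generator
#   mkdir -p /root/gen-trace
#   mv "$gen" /root/gen-trace/systemd-fstab-generator.real
#   wrapper_script /root/gen-trace/systemd-fstab-generator.real /root/gen-trace > "$gen"
#   chmod 755 "$gen"
```

Moving the real generator back to its original path afterwards restores normal behaviour.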

Running bindfs -d, also under strace -T -t, was helpful too. Without strace, it would not have been as clear that a group query was happening around the time of the hang.
