[241] "No such process" when starting user instance with hidepid=2 and cgroupv2 #12955

Closed
yadutaf opened this issue Jul 4, 2019 · 14 comments

Comments

@yadutaf

yadutaf commented Jul 4, 2019

systemd version the issue has been seen with

241

Used distribution

Yocto, with an additional "-Ddefault-hierarchy=unified" build flag

Expected behaviour you didn't see

user@xxxx.service should start

Unexpected behaviour you saw

user@xxxx.service is in state failed with the following logs

root@machine:~# systemctl status user@xxxx.service
● user@xxxx.service - User Manager for UID xxxx
   Loaded: loaded (/lib/systemd/system/user@.service; static; vendor preset: enabled)
   Active: failed (Result: protocol) since Thu 2019-07-04 09:00:23 UTC; 2min 28s ago
     Docs: man:user@.service(5)
  Process: 1433 ExecStart=/lib/systemd/systemd --user (code=exited, status=1/FAILURE)
 Main PID: 1433 (code=exited, status=1/FAILURE)
      CPU: 5ms

Jul 04 09:00:23 machine systemd[1]: Starting User Manager for UID xxxx...
Jul 04 09:00:23 machine systemd[1433]: Failed to determine supported controllers: No such process
Jul 04 09:00:23 machine systemd[1433]: Failed to allocate manager object: No such process
Jul 04 09:00:23 machine systemd[1]: user@xxxx.service: Failed with result 'protocol'.
Jul 04 09:00:23 machine systemd[1]: Failed to start User Manager for UID xxxx.
Jul 04 09:00:23 machine systemd[1]: user@xxxx.service: Consumed 5ms CPU time.

Steps to reproduce the problem

  1. mount -oremount,gid=4,hidepid=2 /proc
  2. /bin/loginctl enable-linger user-xxxx

We are using the "unified" hierarchy, not sure whether it makes any difference
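
To confirm that the remount in step 1 took effect, the options of the /proc mount can be read back from /proc/self/mounts. This is only a minimal sketch: the helper name `mount_opt` is made up for illustration, and newer kernels may report the value by name (e.g. `hidepid=invisible`) rather than as a number.

```shell
#!/bin/sh
# mount_opt OPTIONS KEY: print the value of KEY in a comma-separated
# mount option string, or nothing if KEY is absent.
# (Hypothetical helper, not part of systemd or util-linux.)
mount_opt() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n "s/^$2=//p"
}

# Options of the proc mount at /proc as seen by the current process
# (fields in /proc/self/mounts: device, mountpoint, fstype, options, ...).
# Left empty if the file cannot be read.
proc_opts=$(awk '$2 == "/proc" && $3 == "proc" { print $4; exit }' /proc/self/mounts 2>/dev/null || true)

echo "hidepid: $(mount_opt "$proc_opts" hidepid)"
echo "gid:     $(mount_opt "$proc_opts" gid)"
```

An empty value after `hidepid:` means the option is not set on the mount that this process sees.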

Workaround

This can be worked-around by adding SupplementaryGroups=4 to

(Original hint from https://github.com/pld-linux/systemd/blob/master/proc-hidepid.patch)

But, in this case, all user processes created by systemd user instance have the supplementary group 4, which reduces the usefulness of the 'hidepid=2' feature. It does not however affect sessions opened via SSH so that there is still some benefit :)
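
Whether a given process actually carries the supplementary group can be checked from the `Groups:` line in /proc/PID/status. A rough sketch follows; `has_group` is a hypothetical helper, and GID 4 is simply the value used in this report.

```shell
#!/bin/sh
# has_group GROUPS_LINE GID: succeed if GID appears in a "Groups:" line
# as found in /proc/<pid>/status. (Hypothetical helper for illustration.)
has_group() {
    # Strip the "Groups:" prefix, then let the shell split the rest on
    # whitespace so that each GID is compared as a whole word.
    for g in ${1#Groups:}; do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# Check this shell; replace $$ with another PID to inspect a different
# process (it has to be visible under the hidepid= setting in effect).
line=$(grep '^Groups:' "/proc/$$/status" 2>/dev/null || true)
if has_group "$line" 4; then
    echo "supplementary group 4 is set"
else
    echo "supplementary group 4 is not set"
fi
```

Running this from a shell started by the user instance and from an SSH session should show the difference described above.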

Is there any way to avoid propagating the supplementary group to the child processes? Or is there a way to avoid needing the patch, at least on user@.service (which, I believe, would 'fix' the 'leak')? I'd naively imagine that the user instance could query the system instance for the information it needs :)

Thanks,

@poettering
Member

poettering commented Jul 4, 2019

Sorry, but hidepid= is simply not supported on systemd instances. We require the ability to identify remote processes, and most of our services run at minimal privileges, which means this cannot work. Sorry.

There has been work on making hidepid= a true mount option (i.e. an option of the mount itself instead of the pidns), but it stalled. If we ever get that we could have a per-service hidepid= which would make things a lot more useful.

But anyway, sorry. hidepid= the way it is is not compatible with systemd, and that's not going to change.

@poettering
Member

poettering commented Jul 4, 2019

The patch set I was talking of is this:

https://lwn.net/Articles/738597/

If you are interested in this, please work with the kernel folks, resurrect the patch set, and try to make it work.

@yadutaf
Author

yadutaf commented Jul 5, 2019

Thanks for your quick reply!

I understand that systemd needs to see all PIDs on the system, and that's fine. Maybe there could be a way to start the systemd user instance with CAP_SETGID and instruct it to drop the additional group (4 in my case) either when it no longer needs it (if such a point exists) or before exec-ing a non-systemd binary.

Would that make sense?

@poettering
Member

poettering commented Jul 5, 2019

It's not just systemd, it's a lot of other stuff we require. Polkit/dbus for example and so on...

I think the clean fix would be to fix the kernel to make hidepid not a system-wide setting, see above. Instead of patching around in systemd, just fix this properly by patching around in the kernel.

@yadutaf
Author

yadutaf commented Jul 18, 2019

I was able to hack something which works in my use case with:

  1. A hackish patch on the kernel (4.9)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index ae6d4aa65c8f..11163b7bbe4d 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -675,11 +675,14 @@ int proc_setattr(struct dentry *dentry, struct iattr *attr)
 /*
  * May current process learn task's sched/cmdline info (for hide_pid_min=1)
  * or euid/egid (for hide_pid_min=2)?
+ * jean-tiare: Hack for systemd user instance: Always allow PID 1.
  */
 static bool has_pid_permissions(struct pid_namespace *pid,
 				 struct task_struct *task,
 				 int hide_pid_min)
 {
+	if (task->pid == 1)
+		return true;
 	if (pid->hide_pid < hide_pid_min)
 		return true;
 	if (in_group_p(pid->pid_gid))
  2. A drop-in for systemd-logind.service
# Grant full access to all PIDs to systemd-logind
[Service]
SupplementaryGroups=4

With both patches, the full test suite passes on my side. Systemd user instances work fine, and the started processes can only access their own PIDs and PID 1. This may be missing a lot of (corner?) cases, but it suggests that 'hidepid' support in systemd is not very far off and might not require a rework of /proc. I'm not sure whether proposing a new mount flag like "showpid1" for procfs would be acceptable on the kernel side, though.

komachi added a commit to komachi/ansible-decent-desktop that referenced this issue Jan 31, 2021
fix broken privoxy config
fix gsimplecal locale
install gantsign.keyboard from galaxy, remove submodule
disable hidepid=2 as it breaks systemd user units systemd/systemd#12955
rename decent_keyring to remote_keyring
remove empty logrotate role
remove tempdirs after using
update README
@Zocker1999NET
Contributor

Zocker1999NET commented Mar 17, 2021

I fixed it on Linux 5.10 without patching the kernel, just by creating a dedicated group proc whose members can see all process information, and mounting /proc with gid=proc via the following line in /etc/fstab:

proc /proc proc nosuid,nodev,noexec,hidepid=2,gid=proc 0 0

Then, to allow systemd-logind to access all processes, add a drop-in for systemd-logind.service that adds the supplementary group proc:

[Service]
SupplementaryGroups=proc

After restarting the computer (or restarting systemd-logind), everything worked as it did before enabling hidepid=2, as far as I could observe.

@4oo4

4oo4 commented Aug 30, 2021

@Zocker1999NET Thanks for that workaround, could you elaborate a bit on how you did the permissions for the proc group? I tried something like this in addition to the gid=proc mount option, and wasn't able to create my user session afterward:

groupadd proc
usermod -a -G root proc
usermod -a -G proc Debian-gdm

Cheers

@Zocker1999NET
Contributor

Zocker1999NET commented Sep 1, 2021

@4oo4 I just discovered that I also added the supplementary group to user@.service, which might also be important … As I remember it, with SDDM and KDE Plasma I could get into a working X11 user session, but services depending on dbus and other user services were broken. Maybe this solves your problem? Otherwise, could you post some relevant logs from journalctl?
Processes launched as root should still be able to access all processes without this additional group.

To clarify exactly what I did: I only created the group itself, changed/added the proc mount line, and added the supplementary group to both services. I never touched a group/user called Debian-gdm, but I don't use GDM as display manager; I use the KDE Plasma suite with SDDM. I implemented this in Ansible, see here. This should roughly translate to executing the following commands in bash/zsh:

groupadd --system proc # --system primarily influences the range for choosing a GID, this might be however important, idk
echo "proc /proc proc nosuid,nodev,noexec,hidepid=2,gid=proc 0 0" >> /etc/fstab # assuming proc mount line is missing from fstab
for d in /etc/systemd/system/{systemd-logind,user@}.service.d; do
  mkdir -p "$d"
  cat > "$d"/proc_hidepid_whitelist.conf <<EOF
[Service]
SupplementaryGroups=proc
EOF
done

@poettering
Member

poettering commented Sep 1, 2021

Note that recent kernels allow per-namespace hidepid, and we expose this through ProtectProc= and turned it on for all our long-running services.

This is a much nicer solution since turning this on is opt-in per-service and making use of it can be done generically for all systems instead of just doing local hacks. And turning this on per-service won't negatively impact the rest of the system.
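
As a rough illustration of the per-service approach, a drop-in setting `ProtectProc=` could be written the same way as the `SupplementaryGroups=` drop-ins earlier in this thread. This is a sketch under assumptions: `example.service`, the file name `protectproc.conf` and the helper `write_protectproc_dropin` are placeholders, the `invisible` value is the one systemd.exec(5) describes as hiding other users' processes, and the option only works with a systemd and kernel new enough to support it.

```shell
#!/bin/sh
# write_protectproc_dropin UNIT [BASEDIR]
# Write a drop-in enabling ProtectProc=invisible for UNIT.
# BASEDIR defaults to /etc/systemd/system; the helper and the file name
# are hypothetical, chosen for this example only.
write_protectproc_dropin() {
    unit=$1
    dir=${2:-/etc/systemd/system}/$unit.d
    mkdir -p "$dir" || return 1
    cat > "$dir/protectproc.conf" <<EOF
[Service]
ProtectProc=invisible
EOF
}

# Example (as root), followed by a reload and restart so it takes effect:
#   write_protectproc_dropin example.service
#   systemctl daemon-reload
#   systemctl restart example.service
```

The base directory is a parameter so the helper can be tried against a scratch directory before touching /etc.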

@4oo4

4oo4 commented Sep 1, 2021

@Zocker1999NET Even though we're using different desktop environments, that sounds extremely similar to my issue. I couldn't properly start a wayland desktop session because of issues with dbus and the Debian-gdm service not starting properly, so it would instead start a partially-working X11 session.

@poettering Good to know, so that means that the patch set you were referring to finally got merged?

Thanks

@poettering
Member

poettering commented Sep 3, 2021

> @poettering Good to know, so that means that the patch set you were referring to finally got merged?

Yes.

@nyuszika7h

nyuszika7h commented Sep 7, 2021

> Note that recent kernels allow per-namespace hidepid, and we expose this through ProtectProc= and turned it on for all our long-running services.
>
> This is a much nicer solution since turning this on is opt-in per-service and making use of it can be done generically for all systems instead of just doing local hacks. And turning this on per-service won't negatively impact the rest of the system.

This is not a complete solution. Users may run one-off commands that contain credentials in the command line or environment, which would be left unprotected.

@MichaelHierweck

MichaelHierweck commented Oct 6, 2021

@poettering I'm unsure whether

  • to mount proc with hidepid=invisible using Linux 5.10+ and a recent systemd version (which?) or
  • to avoid the hidepid mount option, use Linux 5.10+ and a recent systemd version (which?), and use ProtectProc. How could "ProtectProc" then be applied to all "normal user sessions" (cron, ssh, ...)?

@jplitza

jplitza commented Oct 29, 2021

@poettering How does having this as a service option help? To prevent users from seeing each other's processes, I have to set ProtectProc on user@.service, too. Otherwise, a user could elevate their privileges by starting custom systemd user units.

The only real workaround I've found so far is disabling the unified cgroup hierarchy by setting the boot option systemd.unified_cgroup_hierarchy=0, which starts the user instance successfully but causes this message in the journal:

-.slice: Failed to migrate controller cgroups from /user.slice/user-1000.slice/user@1000.service, ignoring: Permission denied
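
Which cgroup setup a machine is actually running can be checked from the filesystem type mounted at /sys/fs/cgroup. A small sketch follows; `cgroup_mode` is a hypothetical helper, and the label for non-unified setups assumes the usual systemd layout (a tmpfs at /sys/fs/cgroup with per-controller mounts below it).

```shell
#!/bin/sh
# cgroup_mode FSTYPE: map the filesystem type of /sys/fs/cgroup to a label.
# (Hypothetical helper for illustration.)
cgroup_mode() {
    case $1 in
        cgroup2fs) echo "unified" ;;
        tmpfs)     echo "legacy or hybrid" ;;
        *)         echo "unknown" ;;
    esac
}

# stat -f reports on the filesystem rather than the file; %T prints the
# filesystem type in human-readable form (GNU coreutils).
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || true)
echo "cgroup hierarchy: $(cgroup_mode "$fstype")"
```

After booting with systemd.unified_cgroup_hierarchy=0, this would be expected to stop reporting "unified".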
