Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libpam-cgfs breaks systemd user sessions #117

Closed
bigon opened this issue Jun 3, 2016 · 21 comments
Closed

libpam-cgfs breaks systemd user sessions #117

bigon opened this issue Jun 3, 2016 · 21 comments

Comments

@bigon
Copy link

bigon commented Jun 3, 2016

From: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=826260

Hi,

The default configuration of libpam-cgfs breaks systemd user session
management.

All the processes started by systemd --user are not properly put in the
correct systemd cgroup. This can be verified with "loginctl user-status"
command.

This is at least breaking stuffs like policykit.

Removing "name=systemd" from the call to the pam module seems to fix
this. I don't think it's the role of libpam-cgfs to mangle with systemd
cgroups.

@stgraber
Copy link
Member

stgraber commented Jun 3, 2016

Hmm, @hallyn thoughts?

@stgraber
Copy link
Member

stgraber commented Jun 3, 2016

The bug doesn't give much details on exactly what's going on.

libpam-cgfs isn't supposed to mangle pre-configured cgroup controlers, at least not by moving you to another. It should just change cgroup ownership if required.

Could you paste /proc/self/cgroup of a good and of a broken session, that'd help us see what's lxcfs doing exactly and how we can make it smarter...

@bigon
Copy link
Author

bigon commented Jun 3, 2016

Without libpam-cgfs:

10:pids:/user.slice/user-1002.slice
9:cpuset:/
8:freezer:/
7:memory:/user.slice
6:cpu,cpuacct:/user.slice
5:blkio:/user.slice
4:net_cls,net_prio:/
3:devices:/user.slice
2:perf_event:/
1:name=systemd:/user.slice/user-1002.slice/user@1002.service/gnome-terminal-server.service

With libpam-cgfs:

10:pids:/user.slice/user-1002.slice
9:cpuset:/
8:freezer:/user/test/4
7:memory:/user/test/4
6:cpu,cpuacct:/user.slice
5:blkio:/user.slice
4:net_cls,net_prio:/
3:devices:/user.slice
2:perf_event:/
1:name=systemd:/user/test/4/gnome-terminal-server.service

This is with systemd 230

@bigon
Copy link
Author

bigon commented Jun 3, 2016

FTR this is the PAM config in Debian/Ubuntu

optional pam_cgfs.so -c freezer,memory,name=systemd

@hallyn
Copy link
Member

hallyn commented Jun 4, 2016

On Fri, Jun 03, 2016 at 12:12:27PM -0700, Laurent Bigonville wrote:

Without libpam-cgfs:

10:pids:/user.slice/user-1002.slice
9:cpuset:/
8:freezer:/
7:memory:/user.slice
6:cpu,cpuacct:/user.slice
5:blkio:/user.slice
4:net_cls,net_prio:/
3:devices:/user.slice
2:perf_event:/
1:name=systemd:/user.slice/user-1002.slice/user@1002.service/gnome-terminal-server.service

Right, this is the problem: the format of the systemd-generated
cgroup path used to be
/user/$uid.user/$something.session
then it became
user-$uid.slice/session-c$session.scope

Now I'm not quite certain whether it has changed again, or whether this
has something to do with the 'gnome terminal server'.

libpam-cgfs only creates a new cgroup if it detects that the existing
one was not created by systemd. If it was created by systemd, then
it only chowns the cgroup to the user. (Not doing this prevents
unprivileged users from creating containers running systemd)

@bigon
Copy link
Author

bigon commented Jun 4, 2016

It's probably more an issue with systemd --user

If you are using debian/ubuntu this can be tested by installing dbus-user-session

@hallyn
Copy link
Member

hallyn commented Jun 6, 2016

Note this is a touchy issue. When you log in, lxcfs can do one of several things:

  1. It can create and chown you a new cgroup. This means systemd
    loses track of where you are, but it means you can create containers
    there.
  2. It can just chown the cgroup you're in. However that's dangerous
    if systemd did not create your cgroup for you. You may now own a
    cgroup in which more privileged tasks also sit.
  3. It can of course just do nothing. But then you cannot start unpriv
    containers.

So the question is - do we think that, long term, we can keep up with
systemd's cgroup name generation schemes and always know without doubt
whether a group was created for us?

Maybe it's always safe so long as, relative to (X=cgroupof 1) we
are in /user.slice/user-$uid.slice/** ? I think we can safely say
that no more privileged task should ever be in a cgroup under another
uid's cgroup?

Alternatively we could switch lxcfs's strategy: lxcfs always simply
creates a cgroup for the user and chowns it, under /lxcfs/user.$uid,
but does not move the login tasks there. Rather, lxc learns to
check for /lxcfs/user.$uid if current's cgroup is not writeable.

That may be the safest and most resilient way to go here. It also keeps
us from playing in systemd's sandbox.

@stgraber
Copy link
Member

stgraber commented Jun 6, 2016

Hmm, so how about the following:

  • If cgroup path is "/", create a cgroup, chown it and move the task into it
  • If cgroup path is a known user-specific path, then just chown
  • Otherwise do nothing

This would avoid similar breakage in the future as we'd hit the "do nothing case" rather than have libpam-cgfs create a new path and move the task when it was already in a configured cgroup.

We'd still need to update the path matching logic to match any path under the user-specific one but I don't see this as particularly controversial.

@stgraber
Copy link
Member

stgraber commented Jun 6, 2016

As for your proposal to change LXC, the main downside I can see to this (besides having to update LXC everywhere) is that the containers wouldn't be under the user session anymore, so if systemd did setup a controller with some limits, this would effectively bypass those limits.

@hallyn
Copy link
Member

hallyn commented Jun 6, 2016

Quoting Stéphane Graber (notifications@github.com):

As for your proposal to change LXC, the main downside I can see to this (besides having to update LXC everywhere)

That one is not really legitimate - as it would use the current cgroup if it exists. So it
wouldn't break anything that isn't broken now.

is that the containers wouldn't be under the user session anymore, so if systemd did setup a controller with some limits, this would effectively bypass those limits.

I lament that this sounds like conceding that systemd should be the way to set those limits :)

But yes, that would be an issue.

@hallyn
Copy link
Member

hallyn commented Jun 6, 2016

Quoting Stéphane Graber (notifications@github.com):

Hmm, so how about the following:

  • If cgroup path is "/", create a cgroup, chown it and move the task into it
  • If cgroup path is a known user-specific path, then just chown

that was the original idea. The problem is that 'known' keeps changing.

So for the moment, do we consider a valid v3 systemd-generated path
to be anything like /user.slice/user-$uid.slice/** ?

iow ignore the rest of the path in

user.slice/user-1002.slice/user@1002.service/gnome-terminal-server.service

@stgraber
Copy link
Member

stgraber commented Jun 6, 2016

Well, it didn't change so much as when using systemd for the user session, it moves things in paths under the path we were already tracking.

So yeah, I think that considering our current path as a prefix rather than an absolute path to check against should do the trick.

@hallyn
Copy link
Member

hallyn commented Jun 6, 2016 via email

hallyn pushed a commit to hallyn/lxcfs that referenced this issue Jun 12, 2016
…uid.slice

Closes lxc#117

Signed-off-by: Serge Hallyn <serge@fermat.io>
hallyn pushed a commit to hallyn/lxcfs that referenced this issue Jun 12, 2016
…uid.slice

Closes lxc#117

Signed-off-by: Serge Hallyn <serge@fermat.io>
hallyn pushed a commit to hallyn/lxcfs that referenced this issue Jun 12, 2016
…uid.slice

Closes lxc#117

Signed-off-by: Serge Hallyn <serge@fermat.io>
hallyn pushed a commit to hallyn/lxcfs that referenced this issue Jun 12, 2016
…uid.slice

(relative to our init's path)

Closes lxc#117

Signed-off-by: Serge Hallyn <serge@fermat.io>
@hallyn hallyn closed this as completed in 475a859 Jun 12, 2016
@bigon
Copy link
Author

bigon commented Jun 12, 2016

I've tested this patch in the debian package https://anonscm.debian.org/cgit/pkg-lxc/lxcfs.git/commit/?id=73b2f113829c8f6bead764f168df42d80cbc5923 and the cgroups are not the same:

Without libpam-cgfs:

9:perf_event:/
8:cpuset:/
7:pids:/user.slice/user-1002.slice
6:devices:/user.slice
5:cpu,cpuacct:/user.slice
4:freezer:/
3:blkio:/user.slice
2:net_cls,net_prio:/
1:name=systemd:/user.slice/user-1002.slice/user@1002.service/gnome-terminal-server.service

With libpam-cgfs (and the patch):

9:perf_event:/
8:cpuset:/
7:pids:/user.slice/user-1002.slice
6:devices:/user.slice
5:cpu,cpuacct:/user.slice
4:freezer:/user/test/0
3:blkio:/user.slice
2:net_cls,net_prio:/
1:name=systemd:/user/test/0/gnome-terminal-server.service

stgraber pushed a commit that referenced this issue Jun 13, 2016
…uid.slice

(relative to our init's path)

Closes #117

Signed-off-by: Serge Hallyn <serge@hallyn.com>
@hallyn
Copy link
Member

hallyn commented Jun 14, 2016 via email

@hallyn
Copy link
Member

hallyn commented Jun 16, 2016

I've set up a ubuntu xenial vm where I've put the following:

[Unit]
Description=debug

[Service]
ExecStart=/home/ubuntu/dome.sh

into .local/share/systemd/user/debug.service, where ~/dome.sh is:

#!/bin/sh
/bin/cat /proc/self/cgroup >> /tmp/cgroup

i do systemd --user in one terminal, then do

systemctl --user start debug

in another. /tmp/cgroup shows:

ubuntu@lxcfs:~$ cat /tmp/cgroup
11:freezer:/user/ubuntu/1
10:devices:/user.slice
9:cpu,cpuacct:/user.slice
8:net_cls,net_prio:/
7:perf_event:/
6:blkio:/user.slice
5:cpuset:/
4:pids:/user.slice/user-1000.slice
3:hugetlb:/
2:memory:/user/ubuntu/1
1:name=systemd:/user.slice/user-1000.slice/session-1.scope/debug.service

This is both with the xenial package and with a package built from
upstream lxcfs.

I see that unstable is at a much newer systemd version 230-2.

So it appears that a change in systemd likely regressed this behavior, but
I'm not sure how.

@hallyn
Copy link
Member

hallyn commented Jun 17, 2016

Ah - found the problem. When init is in the root cgroup, it is '/',
in all other cases it does not end in '/'. I'll add a check for that.

@hallyn
Copy link
Member

hallyn commented Jun 20, 2016 via email

@evgeni
Copy link
Contributor

evgeni commented Jun 20, 2016

debs are on https://people.debian.org/~evgeni/tmp/lxcfs/, bigon has promised tot test them when he has some time.

@bigon
Copy link
Author

bigon commented Jun 22, 2016

I now get this with the package provided by @evgeni

9:blkio:/user.slice
8:pids:/user.slice/user-1002.slice
7:freezer:/user/test/0
6:cpu,cpuacct:/user.slice
5:devices:/user.slice
4:net_cls,net_prio:/
3:perf_event:/
2:cpuset:/
1:name=systemd:/user.slice/user-1002.slice/user@1002.service/gnome-terminal-server.service

So it looks OK now I think

@hallyn
Copy link
Member

hallyn commented Jun 23, 2016

Great, thanks for the update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

4 participants