Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

devpts improvements #20463

Open
brauner opened this issue Aug 17, 2021 · 13 comments
Open

devpts improvements #20463

brauner opened this issue Aug 17, 2021 · 13 comments
Labels

Comments

@brauner
Copy link
Contributor

brauner commented Aug 17, 2021

In the before-fore times we used to have /dev/ptmx and then through a long and convoluted history we ended up with devpts. That is so long ago that neither Google, Facebook, or Twitter existed. In fact, I couldn't even have used Wikipedia to lookup when devpts was created.
All distros now mount a devpts instance at /dev/pts with the ptmx device located in /dev/pts/ptmx.

But for backward compatibility reasons distros still provide the ptmx device at /dev/ptmx. My ultimate dream had been to just kill /dev/ptmx completely and I once tried to address this in glibc directly but of course that went nowhere https://sourceware.org/legacy-ml/libc-alpha/2018-03/msg00360.html because of reasons I found not convincing.

I can live with the not getting rid of /dev/ptmx completely in favor of /dev/pts/ptmx part. However, the distro landscape is a cornucopia of ways to provide /dev/ptmx. There are three ways to provide /dev/ptmx:

  1. /dev/ptmx is a symlink to /dev/pts/ptmx
    In this case devpts must have either been mounted with ptmxmode=0666 or
    chmod 0666 /dev/pts/ptmx must have been called.
    So any open() on /dev/pts/ptmx will succeed.
  2. /dev/ptmx is a bind-mount of /dev/pts/ptmx
    Analogous to 1. devpts must have either been mounted with ptmxmode=0666
    or chmod 0666 /dev/pts/ptmx must have been called.
    So any open() on /dev/pts/ptmx will succeed.
  3. /dev/ptmx is a separate ptmx device node
    In this case devpts can either be mounted with ptmxmode=0666 or
    ptmxmode=0000. In the latter case privileged opens of /dev/pts/ptmx will
    succeed while unprivileged opens will fail. The unprivileged failure
    case will be unproblematic since we always fallback to opening /dev/ptmx
    which should have permission 0666. If it doesn't then we would fail the
    exact same way we always did.

This is a nightmare. The most annoying case is distros that provide 3., a separate /dev/ptmx device node that has different permissions for no reason whatsoever than the identical device node at /dev/pts/ptmx. This is especially nonsensical since the dependent device received from a newly opened dominating device of /dev/ptmx needs to be looked up or opened (at least if TIOCGPTPEER is not available) by going through /dev/pts. It also technically requires a userspace program to check that /dev/ptmx and /dev/pts/ptmx are the same device node if they want to avoid interacting with the wrong dependent device.

If at all feasible, I would like to discuss the possibility of systemd becoming opinionated and enforcing one of the three cases uniformly or at least enforcing that the permissions between the device node in /dev/pts/ptmx and /dev/ptmx always match if they are the same device node.

@bluca bluca added the udev label Aug 17, 2021
@floppym
Copy link
Contributor

floppym commented Aug 17, 2021

Current state:

systemd implements behavior 3.

systemd-nspawn implements behavior 1.

@poettering
Copy link
Member

so, we kinda just follow here what the kernel does for us, and don't undo kernel preparations. That's why we implement behaviour 3 for the host systemd: we don't actually create ptmx, devtmpfs does that for us.

udev these days is not in the business of creating device nodes, the kernel does that on its own, we just adjust the access modes. (tmpfiles creates some nodes though, based on MODALIAS info exported by kmods that want to be lazy-loaded, but ptmx is not like that)

nspawn otoh doesn't use devtmpfs (because there's only one devtmpfs for the whole system), instead /dev is populated manually there starting from an empty tmpfs. There we set up /dev/ptmx as symlink to /dev/pts/ptmx. It's the only way really, since the way pts namespacing works is that you instantiate a new devptsfs instance for each container, hence we must redirect /dev/ptmx into the mouted devptsfs).

I fully agree that it's kinda moronic to have two fully distinct setups here...

Note that accessing /dev/pts/ptmx directly is never OK under the status quo, because it's explicitly marked 0000 to make it inaccessible if the /dev/ptmx node is an actual device node. You have to go through the /dev/ptmx name, to get to the right node, which glibc so far did.

Maybe a fix would be to change the kernel to pre-mount devpts the same way it premounts devtmpfs for the host, and if so replace /dev/ptmx by a symlink. If that's a kernel option (similar to how devtmpfs already is) then things would just work for current systemd userspace, as we never create /dev/ptmx on the host anyway, and will just use whatever the kernel supports.

(doing this would be slightly icky though: we need to pass gid=5 to the kernel during mounts of devpts, hence the kernel would have to hardcode that during build time)

@poettering
Copy link
Member

to say this differently:

i feel uncomfortable with replacing a device node the kernel put into devtmpfs with a symlink. kernel should not create stuff there we are expected to the delete and replace. it should create things the right way out of the box.

(I'd be a bit more comfortable with automatically bind mounting /dev/pts/ptmx to /dev/ptmx whenever we mount /dev/pts. But it's kind ugly too, since the device node will then change identity in the middle of everything)

@brauner
Copy link
Contributor Author

brauner commented Aug 18, 2021

so, we kinda just follow here what the kernel does for us, and don't undo kernel preparations. That's why we implement behaviour 3 for the host systemd: we don't actually create ptmx, devtmpfs does that for us.

Yes, as I've said in another place I'm aware of that. While I hate this two distinct device nodes model I can sympathize with not changing kernel layout.

udev these days is not in the business of creating device nodes, the kernel does that on its own, we just adjust the access modes. (tmpfiles creates some nodes though, based on MODALIAS info exported by kmods that want to be lazy-loaded, but ptmx is not like that)

Yes, I know.

nspawn otoh doesn't use devtmpfs (because there's only one devtmpfs for the whole system), instead /dev is populated manually there starting from an empty tmpfs. There we set up /dev/ptmx as symlink to /dev/pts/ptmx. It's the only way really, since the way pts namespacing works is that you instantiate a new devptsfs instance for each container, hence we must redirect /dev/ptmx into the mouted devptsfs).

Yes, I know. We spearheaded this model in LXC which has been copied by all other runtimes.

I fully agree that it's kinda moronic to have two fully distinct setups here...

Note that accessing /dev/pts/ptmx directly is never OK under the status quo, because it's explicitly marked 0000 to make it inaccessible if the /dev/ptmx node is an actual device node. You have to go through the /dev/ptmx name, to get to the right node, which glibc so far did.

And this is the problem. The goal of devpts has been to phase out /dev/ptmx in favor /dev/pts/ptmx and to get rid of the setuid nonsense that glibc used to do to allow unprivileged users access to terminal devices.

In addition to this it doesn't make a lot of sense since root can open /dev/pts/ptmx even if ptmxmode=0000 is used.

Maybe a fix would be to change the kernel to pre-mount devpts the same way it premounts devtmpfs for the host, and if so replace /dev/ptmx by a symlink. If that's a kernel option (similar to how devtmpfs already is) then things would just work for current systemd userspace, as we never create /dev/ptmx on the host anyway, and will just use whatever the kernel supports.

I would propose systemd to mount devpts with ptmxmode=0666 and align permissions between /dev/ptmx and /dev/pts/ptmx.

@brauner
Copy link
Contributor Author

brauner commented Aug 18, 2021

The problem with having distinct permissions between /dev/ptmx and /dev/pts/ptmx is that it also weakens security.
Especially in the face of mount namespaces and containers given the fact that each devpts instance is a separate instance, i.e. ptys allocated in one devpts instance are separate from pts in another devpts instance. An unprivileged container can happily mount new devpts instances.

For example, the openpty() helper is inherently an unsafe function, at least on kernels without a feature we added a few years back (I'll get to that.).

The openpty() helper that glibc exposes always tries to open /dev/ptmx and allocate a new pty fd pair from there. But this is problematic, especially in containers. In order to safely guarantee that openpty() allocates devices in the correct devpts instance you need to have a way of verifying that /dev/ptmx actually refers to the correct devpts instance. If it doesn't you have a potential security problem as you might be tricked into operating on the wrong device. Mounting two devpts instances:

sudo mount -t devpts devpts /dev/pts
sudo mount -t devpts devpts /mnt

and now opening /dev/ptmx It isn't easily clear to the caller what devpts instance /dev/ptmx refers to and where it will it make the new dependent device appear. You could even overmount /dev/pts with a new devpts instance and pre-allocate the same number of pty devices as the old devpts instance and play games with watches on either the overmounted devpts instance or open calls on the /dev/ptmx device to be in control of the device allocation. That's not very interesting mostly but it points to a lot of potential for confusion. I'm not an experienced attacker but there is at least some potential here.

All of this doesn't even go into the issue of racy-path lookup for pty devices. There simply is no safe way to call openpty() on /dev/ptmx, then retrieve the number of the dependent device via TIOCGPTN and open the path in /dev/pts/<nr>. It's inherently racy.

The solution to this problem was the addition of a new TIOCGPTPEER ioctl we added that makes it possible to retrieve the dependent device fd solely based on the fd of the dominating device. This is what openpty() will try to use if the kernel supports it. But in order to be really safe you essentially have to hand-roll your own openpty() (Which is recommended anyway since this function is broken in other ways.).

All of these races become less severe once /dev/ptmx and /dev/pts/ptmx always have the same permissions. systemd-nspawn and LXC both get this right. They mount with ptmxmode=0666. At this point I can open /dev/pts and stash that fd.
Then call open on /dev/pts/ptmx and use lookup that is scoped beneath the /dev/pts fd I'm holding and I know that I can't be tricked.

I do think that mounting devpts with ptmxmode=0666 on the host - or making sure that /dev/ptmx and /dev/pts/ptmx have the same permission is a good thing to do. It's consistent and it improves security.

@poettering
Copy link
Member

poettering commented Aug 18, 2021

I would propose systemd to mount devpts with ptmxmode=0666 and align permissions between /dev/ptmx and /dev/pts/ptmx.

That's a bit icky. /dev/ptmx does not linux driver model, so we cannot process it via udev, and adjust perms there. /dev/pts/ptmx is a device node that bypasses the whole driver model, it's just there.

I mean, we can certainly add ptmxmode=0666 to the mount option and make sure that our udev rule for /dev/ptmx says the exact same thing, but that of course would then have to be kept in sync manually, there wouldn't really be any automatic syncing, only our manually coded adjustment of both.

Note that /dev/ptmx is currently owned by the tty group. /dev/pts/ptmx is not. The group of course doesn't really matter given the 0666 access mode, but to keep things fully in sync we should probably adjust that too. Except that there is no mount option for that in devpts afaics? that kinda sucks, as adjusting this a posteriori isn't ideal

@poettering
Copy link
Member

poettering commented Aug 18, 2021

so what about this:

  1. systemd will start setting ptmxmode=0666 when mounting devpts
  2. systemd will start creating /dev/ptmx as symlink to pts/ptmx if /dev/ptmx is missing
  3. kernel gains an option to turn off support for /dev/ptmx. If this option is used both the kernel backing code code is removed and the device node in devtmpfs is no longer created.
  4. profitttttt!

also good:
6. devpts would gain ptmxgid= so that we can adjust gid to tty too, out of the box, when mounting

@brauner
Copy link
Contributor Author

brauner commented Aug 18, 2021

I would propose systemd to mount devpts with ptmxmode=0666 and align permissions between /dev/ptmx and /dev/pts/ptmx.

That's a bit icky. /dev/ptmx follows linux driver model, so we can process it via udev, and adjust perms there. /dev/pts/ptmx is a device node that bypasses the whole driver model, it's just there.

I mean, we can certainly add ptmxmode=0666 to the mount option and make sure that our udev rule for /dev/ptmx says the exact same thing, but that of course would then have to be kept in sync manually, there wouldn't really be any automatic syncing, only our manually coded adjustment of both.

Note that /dev/ptmx is currently owned by the tty group. /dev/pts/ptmx is not. The group of course doesn't really matter given the 0666 access mode, but to keep things fully in sync we should probably adjust that too. Except that there is no mount option for that in devpts afaics? that kinda sucks, as adjusting this a posteriori isn't ideal

There is a mount option for devpts. It's gid=. We set:

		ret = fs_set_property(fd_fs, "gid", "5");
		if (ret < 0)
			SYSTRACE("Failed to set \"gid=5\" on devpts filesystem context %d", fd_fs);

		ret = fs_set_property(fd_fs, "ptmxmode", "0666");
		if (ret < 0)
			return syserror("Failed to set \"ptmxmode=0666\" property on devpts filesystem context %d", fd_fs);

		ret = fs_set_property(fd_fs, "mode", "0620");
		if (ret < 0)
			return syserror("Failed to set \"mode=0620\" property on devpts filesystem context %d", fd_fs);

@floppym
Copy link
Contributor

floppym commented Aug 18, 2021

There is a mount option for devpts. It's gid=. We set:

It looks like the gid option does not apply to the ptmx node within devpts.

% mount | grep devpts
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)

% ls -ln /dev/pts
total 0
crw--w---- 1 10000 5 136, 0 Aug 15 14:01 0
crw--w---- 1 10000 5 136, 1 Aug 15 14:03 1
crw--w---- 1 10000 5 136, 2 Aug 18 10:07 2
crw--w---- 1 10000 5 136, 3 Aug 18 09:55 3
c--------- 1     0 0   5, 2 Aug 14 13:07 ptmx

@brauner
Copy link
Contributor Author

brauner commented Aug 18, 2021

Right, the assumption is that while you may want to delegate /dev/pts/<nr> devices to unprivileged users you don't necessarily want to delegate access to the multiplexing device /dev/pts/ptmx itself. In other words, making ptmx accessible to unprivileged users is a separate delegation step. This is also in line with /dev/ptmx. Its permissions are changed through a udev rule in userspace, not by the kernel.

@floppym
Copy link
Contributor

floppym commented Aug 18, 2021

I suppose we could configure the group and mode on /dev/pts/ptmx using a tmpfiles.d snippet. That would provide a sane default behavior, while also allowing the system admin to override it.

Something like:

z /dev/pts/ptmx 0666 - tty

@poettering
Copy link
Member

I'd really prefer if there was a ptmxgid= as a mount option to devptsfs, so that we can set this right immediately

@brauner
Copy link
Contributor Author

brauner commented Aug 18, 2021

I'd really prefer if there was a ptmxgid= as a mount option to devptsfs, so that we can set this right immediately

This is going to be rejected with "just chown it".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

4 participants