Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Masking files/directories #2282

Open
flx42 opened this issue Apr 18, 2018 · 19 comments
Open

Masking files/directories #2282

flx42 opened this issue Apr 18, 2018 · 19 comments
Labels
Feature New feature, not a bug

Comments

@flx42
Copy link
Contributor

flx42 commented Apr 18, 2018

The Linux section of the OCI runtime spec has an array of strings called maskedPaths:

maskedPaths (array of strings, OPTIONAL) will mask over the provided paths inside the container so that they cannot be read. The values MUST be absolute paths in the container namespace.

Here's an example from the default OCI spec generated by runc:

		"maskedPaths": [
			"/proc/kcore",
			"/proc/latency_stats",
			"/proc/timer_list",
			"/proc/timer_stats",
			"/proc/sched_debug",
			"/sys/firmware",
			"/proc/scsi"
		],

This is how runc handles it:
https://github.com/opencontainers/runc/blob/63e6708c74c1cc46091ec92ea9df5aca4d82e803/libcontainer/rootfs_linux.go#L777-L790

// For files, maskPath bind mounts /dev/null over the top of the specified path.
// For directories, maskPath mounts read-only tmpfs over the top of the specified path.

Straighforward, but in LXC I don't see a true equivalent. If you know in advance if the target is a file or directory, you can do the right lxc.mount.entry. Here, files (/proc/kcore) and directories (/sys/firmware) are in the same list, and also some paths might not exist (/proc/latency_stats for me).
The solution I've used is to add both mounts, but both optional:
brauner@c816bd8

If the target is a file, the first mount should succeed and the second one will fail.
If the target is a directory, the first mount will fail and the second one should succeed.
If the target doesn't exist, both mounts will fail.

There is a gap: if the target exists, but both mounts fail for another reason. We have no way of guaranteeing that at least one mount will succeed if the target exists.

What do you think? Should we add another configuration options to handle this case natively? Any other idea?

@hallyn
Copy link
Member

hallyn commented Apr 19, 2018 via email

@flx42
Copy link
Contributor Author

flx42 commented Apr 19, 2018

If you control the OCI runtime spec, you can do everything you want security-wise.
There are cases where you do want to do privileged operations inside your container, so I guess that's the reason why it's per-container.

It's outside of the scope of the spec, but I understand your point, maybe the runtime should be free to add its own security restrictions.

@hallyn
Copy link
Member

hallyn commented Apr 27, 2018

Ok so in any case the general feature is fine. I think a good way forward would be:

  1. implement lxc.pathmask = which does what you describe
  2. maybe add some default lxc.pathmask's to the common config file

@brauner, @tych0 what do you think?

@tych0
Copy link
Member

tych0 commented Apr 27, 2018

Is there a reason we can't use deny $evil_path rwklx in the apparmor profile? I guess lxc doesn't have a way to append rules to the apparmor profile like lxd does, but perhaps that would be a more flexible alternative?

@tych0
Copy link
Member

tych0 commented Apr 27, 2018

I guess that makes the behavior slightly different than the OCI spec, since you'd get EPERM instead of a file of zero length when you opened it, but since nobody is presumably looking at these files anyway, it's probably ok.

@flx42
Copy link
Contributor Author

flx42 commented Apr 27, 2018

It doesn't look like the spec mandates how it should be implemented. I also found this comment that agrees with you :)
opencontainers/runtime-spec#582 (comment)
I actually prefer EPERM, an empty file looks like you failed your bind-mount ;)

So, you're right, we could use apparmor. However, I don't feel it means we have to discard the idea of adding a new option. Apparmor vs bind-mount would be an implementation detail, with bind-mount being the fallback when apparmor is not available.

@hallyn
Copy link
Member

hallyn commented Apr 28, 2018

So that sounds like we want a generic lxc.pathmask mechanism, which would be implemented so as to use apparmor or selinux when active, or a bind mount otherwise?

How many users are there who actually aren't using apparmor?

(Keeping in mind that the bind mount approach has other implications such as inability to further create nested containers without jumping through hoops)

@flx42
Copy link
Contributor Author

flx42 commented Apr 28, 2018

Hm, good point. Let's start with apparmor then, we will need a way to add those rules to the profile.
OK with that @brauner?

@brauner
Copy link
Member

brauner commented Apr 28, 2018

I'm fine with doing it via AppArmor. I'm against lxc.pathmask for now though.

@flx42
Copy link
Contributor Author

flx42 commented Apr 28, 2018

Yeah, agree, we can revisit at a later time. I'll let you close this issue @brauner if you want.

@brauner brauner closed this as completed Apr 28, 2018
@brauner
Copy link
Member

brauner commented Apr 30, 2018

So, here's my idea. Doing it via AppArmor by default is fine. But I'm against the lxc.pathmask solution for LXC. Plain LXC users that want to hide paths and files can do this right now by using lxc.mount.entry. For the OCI config parser we can simply establish and additional internal protocol. This can be as simple as, while maskedPaths or whatever it's called is parsed we set a boolean on the corresponding mount entry hidden to true. (Might need to change the or add a struct for the mount entries.) Then when we go on to mount them we check whether this boolean is set to true and if so and there's no AppArmor support enabled we will stat the source and determine whether it is a file or not. When it's a file we yank /dev/null over it otherwise we'll simply strap a tmpfs on top of it. Thoughts?
//cc @hallyn, flx42

@brauner brauner reopened this Apr 30, 2018
@flx42
Copy link
Contributor Author

flx42 commented Apr 30, 2018

This can be as simple as, while maskedPaths or whatever it's called is parsed we set a boolean on the corresponding mount entry hidden to true. (Might need to change the or add a struct for the mount entries.)

I was following you until this point. What's the corresponding mount entry?
If I want to hide /sys/firmware and my corresponding mount entry is /sys, how would I do that? Wouldn't I need a new mount entry anyway for applying this hidden flag to a subset of sysfs?

@brauner
Copy link
Member

brauner commented Apr 30, 2018

I need a new mount entry anyway for applying this hidden flag to a subset of sysfs?

Now I'm not following: why is your corresponding mount entry /sys if you want to hide /sys/firmware? Wouldn't maskedPath list /sys/firmware and hence, you could add a mount entry for /sys/firmware to the container's mount list and set the boolean hidden to true?

@flx42
Copy link
Contributor Author

flx42 commented Apr 30, 2018

Ok, I was not sure you were adding a new mount entry.
But so hidden is the same as the proposed lxc.maskpath? Except that you also carry all the other fstab options with you, which could get confusing. You would have to reject bind,hidden, but not hidden,optional, or you would have to reject if the type is not none.

@brauner
Copy link
Member

brauner commented Apr 30, 2018

Nope, you're getting me wrong. :) Right now, we have:

static int set_config_mount(const char *key, const char *value,
			    struct lxc_conf *lxc_conf, void *data)
{
	char *mntelem;
	struct lxc_list *mntlist;

	if (lxc_config_value_empty(value))
		return lxc_clear_mount_entries(lxc_conf);

	mntlist = malloc(sizeof(*mntlist));
	if (!mntlist)
		return -1;

	mntelem = strdup(value);
	if (!mntelem) {
		free(mntlist);
		return -1;
	}
	mntlist->elem = mntelem;

	lxc_list_add_tail(&lxc_conf->mount_list, mntlist);

	return 0;
}

and mntelem is essentially just a simple char *. But we could e.g. do:

struct lxc_mount_entry {
        char *fstab_line;
        bool hidden;
};

@brauner
Copy link
Member

brauner commented Apr 30, 2018

So when you parse maskedPaths you'd do:

struct lxc_mount_entry dummy;
dummy.fstab_line = strdup(whatever);
dummy.hidden = true;

@flx42
Copy link
Contributor Author

flx42 commented Apr 30, 2018

My bad, you did say "internal protocol" above. :) Thanks for the clarification.

Ok, going back to the AppArmor option. I see that LXD embeds the AppArmor profile in the code, appends the lines passed by the user in raw.apparmor then execs apparmor_parser.

Could that work for LXC? If a user specifies lxc.apparmor.profile = my-profile, how would you augment this profile with the deny options for path masking? Do we need to generate my-profile-custom-container1? Can we stack profiles? (that doesn't seem like the right use case...)

Another consideration, if we have 100 applications containers to launch, it seems we would need to generate 100 different profiles then parse them/load them. How bad would be the performance hit? I don't think the cache would help here since it seems name based, but I might be wrong.

@brauner
Copy link
Member

brauner commented May 2, 2018

My bad, you did say "internal protocol" above. :) Thanks for the clarification.

As in "definitely not user visible". :)

Could that work for LXC?

If it works for LXD it should work for LXC so you could just copy that logic into C.

Another consideration, if we have 100 applications containers to launch, it seems we would need to generate 100 different profiles then parse them/load them. How bad would be the performance hit? I don't think the cache would help here since it seems name based, but I might be wrong.

Using the cache should help. There's also this comment in LXD that gives some clarification:

	/* In order to avoid forcing a profile parse (potentially slow) on
	 * every container start, let's use apparmor's binary policy cache,
	 * which checks mtime of the files to figure out if the policy needs to
	 * be regenerated.
	 *
	 * Since it uses mtimes, we shouldn't just always write out our local
	 * apparmor template; instead we should check to see whether the
	 * template is the same as ours. If it isn't we should write our
	 * version out so that the new changes are reflected and we definitely
	 * force a recompile.
	 */

@flx42
Copy link
Contributor Author

flx42 commented May 2, 2018

If it works for LXD it should work for LXC so you could just copy that logic into C.

Do you have an equivalent of lxc.apparmor.profile = my-profile in LXD?
If you control the full apparmor profile in text form, it's simple to assemble it: get the default profile, sprinkle the raw.apparmor bits on top, then parse it/cache it. That's a single profile.

If you must make do with an existing profile (my-profile), isn't the only option to stack my-profile with a new profile for masking the paths? So, we would have my-profile//&container1-masked-paths.

Using the cache should help. There's also this comment in LXD that gives some clarification:

I saw this part, but I guess the caching relies on the name of the file, which is fine for a single container since it's unlikely to change its apparmor profile often.
But for 100 containers, wouldn't you generate 100 files? You would have files container1-masked-paths, container2-masked-paths, container3-masked-paths, ... ,container100-masked-paths.
The cache wouldn't apply. You would have to go one step further: hash the content of the file to check if container1-masked-paths == container2-masked-paths
Let me know if I missed something.

I think it's useful to be able to add custom apparmor bits, but it seems like a complex solution to a simple problem :)

@stgraber stgraber added the Feature New feature, not a bug label Mar 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature New feature, not a bug
Development

No branches or pull requests

5 participants