bpf: update device cgroup semantics
LXC has supported the bpf device controller for a while now. A bpf device
program can be attached to the container's cgroup if this is a pure cgroup2
host.

The format for specifying device rules for the cgroup2 bpf device controller is
the same as for the legacy cgroup device controller; only the configuration key
prefix has to change. Specifically, device rules for the legacy cgroup device
controller are specified via lxc.cgroup.devices.{allow,deny} whereas for the
cgroup2 bpf device controller lxc.cgroup2.devices.{allow,deny} must be used.

The following semantics apply:
1. The device rule "lxc.cgroup2.devices.deny = a" will cause LXC to instruct
   the kernel to block access to all devices by default. To grant access to
   devices, "allow device rules" must be added via the
   "lxc.cgroup2.devices.allow" key. This is referred to as an "allowlist"
   device program.
2. The device rule "lxc.cgroup2.devices.allow = a" will cause LXC to instruct
   the kernel to allow access to all devices by default. To deny access to
   devices, "deny device rules" must be added via the
   "lxc.cgroup2.devices.deny" key. This is referred to as a "denylist" device
   program.
3. Specifying a rule as explained in 1. or 2. will cause all previous rules to
   be cleared, i.e. the device list will be reset.
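
The three semantics above can be modelled as a small state machine. The
following is a minimal sketch and not LXC's implementation: struct device_list,
struct rule, apply_rule() and the fixed-size array are hypothetical stand-ins
chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; LXC's real data structures differ. */
enum list_type { LIST_ALLOWLIST, LIST_DENYLIST };

struct rule {
	char type;      /* 'a' (all devices), 'b' (block) or 'c' (character) */
	int major;      /* -1 stands for the "*" wildcard */
	int minor;      /* -1 stands for the "*" wildcard */
	char access[4]; /* subset of "rwm", empty for the global rule */
	bool allow;     /* true for ".allow" keys, false for ".deny" keys */
};

#define MAX_RULES 16

struct device_list {
	enum list_type type;
	size_t nr_rules;
	struct rule rules[MAX_RULES];
};

/* Returns 1 if the list was altered, -1 if there is no room for the rule. */
static int apply_rule(struct device_list *list, const struct rule *r)
{
	assert(list != NULL && r != NULL);

	if (r->type == 'a') {
		/* Semantics 1 and 2: a global rule selects the program type. */
		list->type = r->allow ? LIST_DENYLIST : LIST_ALLOWLIST;
		/* Semantics 3: it also clears every rule recorded so far. */
		list->nr_rules = 0;
		return 1;
	}

	if (list->nr_rules == MAX_RULES)
		return -1;

	list->rules[list->nr_rules++] = *r;
	return 1;
}
```

In this model, applying "allow = a", then a specific deny rule, then "deny = a"
leaves an empty list whose type is LIST_ALLOWLIST.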

For example the set of rules:

lxc.cgroup2.devices.deny = a
lxc.cgroup2.devices.allow = c *:* m
lxc.cgroup2.devices.allow = b *:* m
lxc.cgroup2.devices.allow = c 1:3 rwm

implements an "allowlist" device program, i.e. the kernel will block access to
all devices not specifically allowed in this list. This particular program
states that all character and block devices may be created but only /dev/null
may be read or written.
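
The rule values used in this message, such as "c 1:3 rwm" or "a", can be split
into their parts with a few lines of C. This is only a sketch under stated
assumptions: struct parsed_rule, parse_number() and parse_rule() are
hypothetical names, not LXC's parser, and validation is deliberately minimal
(for example, trailing text after the access string is not rejected).

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of a rule value; not LXC's struct device_item. */
struct parsed_rule {
	char type;      /* 'a', 'b' or 'c' */
	int major;      /* -1 stands for "*" */
	int minor;      /* -1 stands for "*" */
	char access[4]; /* subset of "rwm", empty for "a" */
};

/* Converts "*" to -1 and a decimal string to a non-negative number. */
static int parse_number(const char *s, int *out)
{
	char *end;
	long val;

	if (strcmp(s, "*") == 0) {
		*out = -1;
		return 0;
	}

	val = strtol(s, &end, 10);
	if (end == s || *end != '\0' || val < 0 || val > INT_MAX)
		return -1;

	*out = (int)val;
	return 0;
}

/* Parses "a" or "<type> <major>:<minor> <access>"; returns 0 or -1. */
static int parse_rule(const char *value, struct parsed_rule *out)
{
	char type, major[16], minor[16], access[8];
	int maj, min;

	assert(value != NULL && out != NULL);

	if (strcmp(value, "a") == 0) {
		*out = (struct parsed_rule){ .type = 'a', .major = -1, .minor = -1 };
		return 0;
	}

	if (sscanf(value, " %c %15[^:]:%15s %7s", &type, major, minor, access) != 4)
		return -1;

	if (type != 'b' && type != 'c')
		return -1;

	/* Only the letters r, w and m are valid, and at most three of them. */
	if (strlen(access) > 3 || strspn(access, "rwm") != strlen(access))
		return -1;

	if (parse_number(major, &maj) < 0 || parse_number(minor, &min) < 0)
		return -1;

	*out = (struct parsed_rule){ .type = type, .major = maj, .minor = min };
	strcpy(out->access, access);
	return 0;
}
```

The wildcard "*" is mapped to -1 here so that a later matching step can treat
any negative major or minor number as "matches everything".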

If we switch the set of rules to:

lxc.cgroup2.devices.allow = a
lxc.cgroup2.devices.deny = c *:* m
lxc.cgroup2.devices.deny = b *:* m
lxc.cgroup2.devices.deny = c 1:3 rwm

then LXC would instruct the kernel to implement a "denylist", i.e. the kernel
will allow access to all devices not specifically denied in this list. This
particular program states that no character devices or block devices may be
created and that /dev/null is not allowed to be read, written, or created.
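
How the two program types decide on an access request can be sketched in plain
C. This is a simplified model, not the bpf program LXC generates: the names
below are hypothetical, and it assumes that the first matching rule decides and
that a rule matches only if it covers every requested access bit. Rules that
merely restate the default are skipped, mirroring the bpf_device_add() helper
touched by this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; LXC's real structures differ. */
enum list_type { LIST_ALLOWLIST, LIST_DENYLIST };
enum { ACC_READ = 1 << 0, ACC_WRITE = 1 << 1, ACC_MKNOD = 1 << 2 };

struct rule {
	char type;           /* 'b' (block) or 'c' (character) */
	int major;           /* negative stands for "*" */
	int minor;           /* negative stands for "*" */
	unsigned int access; /* bitmask of ACC_* values */
	bool allow;          /* true for allow rules, false for deny rules */
};

static bool rule_matches(const struct rule *r, char type, int major, int minor,
			 unsigned int access)
{
	if (r->type != type)
		return false;
	if (r->major >= 0 && r->major != major)
		return false;
	if (r->minor >= 0 && r->minor != minor)
		return false;

	/* Assumption: the rule must cover every requested access bit. */
	return (access & ~r->access) == 0;
}

static bool access_permitted(enum list_type list_type, const struct rule *rules,
			     size_t nr_rules, char type, int major, int minor,
			     unsigned int access)
{
	bool block_all = (list_type == LIST_ALLOWLIST);
	size_t i;

	assert(rules != NULL || nr_rules == 0);

	for (i = 0; i < nr_rules; i++) {
		/* A rule that only restates the default has no effect. */
		if (rules[i].allow != block_all)
			continue;

		if (rule_matches(&rules[i], type, major, minor, access))
			return rules[i].allow;
	}

	/* No rule matched: fall back to the default of the program type. */
	return !block_all;
}
```

With the allowlist rules from this message, the model permits reading and
writing /dev/null (c 1:3) and creating any device node, but refuses to read
/dev/zero (c 1:5). With the denylist rules the outcome is reversed.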

Consider the same program but followed by a rule as explained in 1. or 2.:

lxc.cgroup2.devices.allow = a
lxc.cgroup2.devices.deny = c *:* m
lxc.cgroup2.devices.deny = b *:* m
lxc.cgroup2.devices.deny = c 1:3 rwm
lxc.cgroup2.devices.allow = a

The last line will cause LXC to reset the device list without changing the type
of device program.

lxc.cgroup2.devices.allow = a
lxc.cgroup2.devices.deny = c *:* m
lxc.cgroup2.devices.deny = b *:* m
lxc.cgroup2.devices.deny = c 1:3 rwm
lxc.cgroup2.devices.deny = a

The last line will cause LXC to reset the device list and switch from a
"denylist" program to an "allowlist" program.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Christian Brauner committed Feb 26, 2021
1 parent 25d20b9 commit a0e788a
Showing 2 changed files with 44 additions and 64 deletions.
102 changes: 44 additions & 58 deletions src/lxc/cgroups/cgroup2_devices.c
@@ -430,10 +430,29 @@ void bpf_device_program_free(struct cgroup_ops *ops)
 	}
 }

+static inline bool bpf_device_list_block_all(const struct bpf_devices *bpf_devices)
+{
+	/* LXC_BPF_DEVICE_CGROUP_ALLOWLIST -> block ("allowlist") all devices. */
+	return bpf_devices->list_type == LXC_BPF_DEVICE_CGROUP_ALLOWLIST;
+}
+
+static inline bool bpf_device_add(const struct bpf_devices *bpf_devices,
+				  struct device_item *device)
+{
+	/* We're blocking all devices so skip individual deny rules. */
+	if (bpf_device_list_block_all(bpf_devices) && !device->allow)
+		return log_trace(false, "Device cgroup blocks all devices; skipping specific deny rules");
+
+	/* We're allowing all devices so skip individual allow rules. */
+	if (!bpf_device_list_block_all(bpf_devices) && device->allow)
+		return log_trace(false, "Device cgroup allows all devices; skipping specific allow rules");
+
+	return true;
+}
+
 int bpf_list_add_device(struct bpf_devices *bpf_devices,
 			struct device_item *device)
 {
-	int device_list_altered = 0;
 	__do_free struct lxc_list *list_elem = NULL;
 	__do_free struct device_item *new_device = NULL;
 	struct lxc_list *it;
@@ -447,24 +466,22 @@ int bpf_list_add_device(struct bpf_devices *bpf_devices,
 	    device->minor < 0 &&
 	    is_empty_string(device->access)) {
 		if (device->allow) {
-			if (bpf_devices->list_type != LXC_BPF_DEVICE_CGROUP_DENYLIST) {
-				bpf_devices->list_type = LXC_BPF_DEVICE_CGROUP_DENYLIST;
-				device_list_altered = 1;
-			}
-
+			bpf_devices->list_type = LXC_BPF_DEVICE_CGROUP_DENYLIST;
 			TRACE("Device cgroup will allow (\"denylist\") all devices by default");
 		} else {
-			if (bpf_devices->list_type != LXC_BPF_DEVICE_CGROUP_ALLOWLIST) {
-				bpf_devices->list_type = LXC_BPF_DEVICE_CGROUP_ALLOWLIST;
-				device_list_altered = 1;
-			}
-
+			bpf_devices->list_type = LXC_BPF_DEVICE_CGROUP_ALLOWLIST;
 			TRACE("Device cgroup will block (\"allowlist\") all devices by default");
 		}

-		return device_list_altered;
+		/* Reset the device list. */
+		lxc_clear_cgroup2_devices(bpf_devices);
+		TRACE("Resetting cgroup device list");
+		return 1; /* The device list was altered. */
 	}

+	TRACE("Processing new device rule: type %c, major %d, minor %d, access %s, allow %d",
+	      device->type, device->major, device->minor, device->access, device->allow);
+
 	lxc_list_for_each(it, &bpf_devices->device_item) {
 		struct device_item *cur = it->elem;

@@ -477,25 +494,23 @@ int bpf_list_add_device(struct bpf_devices *bpf_devices,
 		if (!strequal(cur->access, device->access))
 			continue;

+		if (!bpf_device_add(bpf_devices, cur))
+			continue;
+
 		/*
 		 * The rule is switched from allow to deny or vica versa so
 		 * don't bother allocating just flip the existing one.
 		 */
 		if (cur->allow != device->allow) {
 			cur->allow = device->allow;
-			device_list_altered = 1;

-			TRACE("Switched existing rule: type %c, major %d, minor %d, access %s, allow %d",
-			      cur->type, cur->major, cur->minor, cur->access, cur->allow);
-		} else {
-			TRACE("Reusing existing rule: type %c, major %d, minor %d, access %s, allow %d",
-			      cur->type, cur->major, cur->minor, cur->access, cur->allow);
+			return log_trace(1, "Switched existing device rule"); /* The device list was altered. */
 		}

-		return device_list_altered;
+
+		return log_trace(0, "Reused existing device rule"); /* The device list wasn't altered. */
 	}

-	device_list_altered = 1;
 	list_elem = malloc(sizeof(*list_elem));
 	if (!list_elem)
 		return syserrno_set(ENOMEM, "Failed to allocate new device list");
@@ -507,7 +522,7 @@ int bpf_list_add_device(struct bpf_devices *bpf_devices,
 	lxc_list_add_elem(list_elem, move_ptr(new_device));
 	lxc_list_add_tail(&bpf_devices->device_item, move_ptr(list_elem));

-	return device_list_altered;
+	return log_trace(1, "Added new device rule"); /* The device list was altered. */
 }

 bool bpf_devices_cgroup_supported(void)
@@ -542,19 +557,6 @@ bool bpf_devices_cgroup_supported(void)
 	return log_trace(true, "The bpf device cgroup is supported");
 }

-static inline bool bpf_device_add(const struct bpf_program *prog,
-				  struct device_item *device)
-{
-	/* We're blocking all devices so skip individual deny rules. */
-	if (bpf_device_block_all(prog) && !device->allow)
-		return false;
-
-	/* We're allowing all devices so skip individual allow rules. */
-	if (!bpf_device_block_all(prog) && device->allow)
-		return false;
-	return true;
-}
-
 static struct bpf_program *__bpf_cgroup_devices(struct bpf_devices *bpf_devices)
 {
 	__do_bpf_program_free struct bpf_program *prog = NULL;
@@ -571,29 +573,27 @@ static struct bpf_program *__bpf_cgroup_devices(struct bpf_devices *bpf_devices)

 	prog->device_list_type = bpf_devices->list_type;
 	TRACE("Device cgroup %s all devices by default",
-	      bpf_device_block_all(prog) ? "blocks" : "allows");
+	      bpf_device_list_block_all(bpf_devices) ? "blocks" : "allows");

 	lxc_list_for_each(it, &bpf_devices->device_item) {
 		struct device_item *cur = it->elem;

-		if (!bpf_device_add(prog, cur)) {
-			TRACE("Skipping rule: type %c, major %d, minor %d, access %s, allow %d",
-			      cur->type, cur->major, cur->minor, cur->access, cur->allow);
+		TRACE("Processing device rule: type %c, major %d, minor %d, access %s, allow %d",
+		      cur->type, cur->major, cur->minor, cur->access, cur->allow);
+
+		if (!bpf_device_add(bpf_devices, cur))
 			continue;
-		}

 		ret = bpf_program_append_device(prog, cur);
 		if (ret)
-			return syserrno(NULL, "Failed adding rule: type %c, major %d, minor %d, access %s, allow %d",
-					cur->type, cur->major, cur->minor, cur->access, cur->allow);
+			return syserrno(NULL, "Failed adding new device rule");

-		TRACE("Added rule: type %c, major %d, minor %d, access %s, allow %d",
-		      cur->type, cur->major, cur->minor, cur->access, cur->allow);
+		TRACE("Added new device rule");
 	}

 	ret = bpf_program_finalize(prog);
 	if (ret)
-		return syserrno(NULL, "Failed to finalize bpf program");
+		return syserrno(NULL, "Failed to finalize device program");

 	return move_ptr(prog);
 }
@@ -619,20 +619,6 @@ bool bpf_cgroup_devices_attach(struct cgroup_ops *ops,
 	return log_trace(true, "Attached bpf program");
 }

-/*
- * TODO: Clarify semantics.
- * Specifically, when a user switches the type of device program, i.e. switches
- * from blocking all devices by default to allowing all devices by default or
- * vica versa do we reactivate the devices we have recorded so far or not?
- * Specific example: The user configures a device program that blocks all
- * devices by default apart from a small list of devices such as /dev/zero and
- * /dev/null. Now the user switches to a device program that allows all devices
- * by default. Naturally we skip all specific devices since they are
- * encompassed in the global allow rule. But now assume the user switches back
- * to a device program that blocks all devices by default. Do we reactivate the
- * previously specific allowed devices, i.e. do we grant access to /dev/zero
- * and /dev/null? My gut feeling is no, but I'm not sure.
- */
 bool bpf_cgroup_devices_update(struct cgroup_ops *ops,
 			       struct bpf_devices *bpf_devices,
 			       struct device_item *new)
6 changes: 0 additions & 6 deletions src/lxc/cgroups/cgroup2_devices.h
@@ -48,12 +48,6 @@ struct bpf_program {
 	__u32 attached_flags;
 };

-static inline bool bpf_device_block_all(const struct bpf_program *prog)
-{
-	/* LXC_BPF_DEVICE_CGROUP_ALLOWLIST -> allowlist (deny all) */
-	return prog->device_list_type == LXC_BPF_DEVICE_CGROUP_ALLOWLIST;
-}
-
 __hidden extern struct bpf_program *bpf_program_new(__u32 prog_type);
 __hidden extern int bpf_program_init(struct bpf_program *prog);
 __hidden extern int bpf_program_append_device(struct bpf_program *prog, struct device_item *device);
