interfaces: simplify snap-confine by just loading pre-generated bpf code #3431

Merged
mvo5 merged 95 commits into snapcore:master from mvo5:seccomp-bpf on Jun 22, 2017

Collaborator

mvo5 commented Jun 2, 2017

We currently rely on snap-confine to generate the seccomp filter based on a textual filter. This approach has some drawbacks:

  • the parser is written in a suid root C program
  • tricky to add new symbols and new syntax, i.e. we always need to ensure that the snap-confine that runs can parse the input
  • the parser is written in C

This branch changes this so that all the parsing is done in Go, which generates a binary bpf program. snap-confine then only loads this program into the kernel.

Note that the tests will not work just yet. The testing is done by compiling the bpf program and running it in a bpf VM with the inputs that the kernel would feed into the bpf program. However, there is a bug/misfeature in x/net/bpf that hardcodes the byte order to big-endian, which fails on x86 (little-endian). I reported this (with a patch) upstream at golang/go#20556.

@mvo5 mvo5 requested review from zyga and jdstrand Jun 2, 2017
@mvo5 mvo5 force-pushed the mvo5:seccomp-bpf branch from fb5bcbc to 3fd764e Jun 6, 2017
Contributor

jdstrand left a comment

I just started looking at this and I wanted to quickly mention before any other review comments that I noticed that the tests in cmd/snap-confine/tests/* have not been ported. These are run as part of 'make check' and many verify syntax parsing and some make sure the sandbox's profile is loaded and in effect. I suppose the sandbox tests could be moved out to spread but all the parsing tests should be reimplemented in main_test.go.

Contributor

jdstrand left a comment

I've performed a first pass at the snap-confine C changes. I'll take a look at snap-seccomp next.

die("seteuid failed");
if (geteuid() != 0)
die("raising privs before seccomp_load did not work");
}

jdstrand Jun 12, 2017

Contributor

Instead of raising here, can you raise just before the prctl call instead? I don't see a reason to perform the parsing and loading into memory as root.

mvo5 Jun 13, 2017

Author Collaborator

Thanks, indeed! This makes a lot of sense.

filter_profile);
char profile_path[512]; // arbitrary path name limit
sc_must_snprintf(profile_path, sizeof(profile_path), "%s/%s.bpf",
filter_profile_dir, filter_profile);

jdstrand Jun 12, 2017

Contributor

In the current non-bpf caching implementation, filter_profile_dir can be set to SNAPPY_LAUNCHER_SECCOMP_PROFILE_DIR. Was this functionality intentionally dropped from this PR?

mvo5 Jun 13, 2017

Author Collaborator

I will have a look at this, we may not need it anymore with real spread tests and with the bpf VM unit tests. But if it's easier with it I will bring it back.

.filter = (struct sock_filter *)bpf,
};
if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) {
die("prctl(SECCOMP)");

jdstrand Jun 12, 2017

Contributor

We had better error reporting before this PR. Ie, we had:

rc = seccomp_load(ctx);
if (rc != 0) {
        fprintf(stderr, "seccomp_load failed with %i\n", rc);
        die("aborting");
}

Can you do something similar, eg:

if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) {
        perror("prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...) failed");
        die("aborting");
}

@@ -775,126 +107,12 @@ scmp_filter_ctx sc_prepare_seccomp_context(const char *filter_profile)
// - capability sys_admin in AppArmor
// Note that with NO_NEW_PRIVS disabled, CAP_SYS_ADMIN is required to
// change the seccomp sandbox.

jdstrand Jun 12, 2017

Contributor

Please adjust this comment to now be:

// Load filter into the kernel. Importantly we are intentionally *not* setting
// NO_NEW_PRIVS because it interferes with exec transitions in AppArmor with
// certain snappy interfaces. Not setting NO_NEW_PRIVS does mean that
// applications can adjust their sandbox if they have CAP_SYS_ADMIN or, if
// running on < 4.8 kernels, break out of the seccomp via ptrace. Both
// CAP_SYS_ADMIN and 'ptrace (trace)' are blocked by AppArmor with typical
// snappy interfaces.

mvo5 Jun 13, 2017

Author Collaborator

done

.len = num_read / sizeof(struct sock_filter),
.filter = (struct sock_filter *)bpf,
};
if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) {

jdstrand Jun 12, 2017

Contributor

I'm not terribly excited that we are just dumping whatever is on disk into the kernel, since that could expose kernel implementation bugs. Some light fuzzing does show that the kernel verifies that it is a valid bpf, but snap-confine seems like it would happily load an arbitrary non-seccomp bpf. snap-confine should be modified to check that every directory from '/' all the way down to the bpf cache file is root:root owned and writable only by the root user, with no write permission for group or other. In this manner, only root-owned processes (eg, snapd) can write the bpf cache files, thus mitigating issues with loading arbitrary content into the kernel. This would guard against situations where the permissions of these files somehow become unsafe.

The most correct implementation would also have snap-confine perform (as non-root) validation on the input to make sure it is structured correctly.

mvo5 Jun 19, 2017

Author Collaborator

As discussed I added opcode based validation and removed it again because the consensus is that this kind of validation is too brittle. The check for /var/lib/snapd/seccomp/bpf all the way down to "/" is implemented (and uncovered a bug in the current snap-confine code).
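The ownership check discussed above can be sketched in Go as follows. This is only an illustration of the idea, not the C code that landed in snap-confine: the helper names `pathComponents`, `checkOwnership` and `validatePath` are made up for this sketch, and it treats symlinks as failures because `Lstat` reports them with mode 0777.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// pathComponents returns "/" and every ancestor of path, ending with
// the (cleaned) path itself.
func pathComponents(path string) []string {
	path = filepath.Clean(path)
	var parts []string
	for {
		parts = append([]string{path}, parts...)
		parent := filepath.Dir(path)
		if parent == path {
			break
		}
		path = parent
	}
	return parts
}

// checkOwnership rejects anything that is not root:root owned or that
// is writable by group or other.
func checkOwnership(uid, gid uint32, mode os.FileMode) error {
	if uid != 0 || gid != 0 {
		return fmt.Errorf("not owned by root:root (uid=%d gid=%d)", uid, gid)
	}
	if mode.Perm()&0022 != 0 {
		return fmt.Errorf("writable by group or other (mode %04o)", uint32(mode.Perm()))
	}
	return nil
}

// validatePath applies checkOwnership to every component of an
// absolute path, from "/" down to the path itself.
func validatePath(path string) error {
	if !filepath.IsAbs(path) {
		return fmt.Errorf("%s is not an absolute path", path)
	}
	for _, p := range pathComponents(path) {
		fi, err := os.Lstat(p)
		if err != nil {
			return err
		}
		st, ok := fi.Sys().(*syscall.Stat_t)
		if !ok {
			return fmt.Errorf("cannot get ownership of %s", p)
		}
		if err := checkOwnership(st.Uid, st.Gid, fi.Mode()); err != nil {
			return fmt.Errorf("%s: %v", p, err)
		}
	}
	return nil
}

func main() {
	// The directory name is taken from the discussion above.
	if err := validatePath("/var/lib/snapd/seccomp/bpf"); err != nil {
		fmt.Println("unsafe or missing:", err)
		return
	}
	fmt.Println("path is root-owned and not group/other writable")
}
```

Checking every component matters because a writable parent directory would let a non-root user replace the cache file even if the file itself has safe permissions.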

// initialize hsearch map
sc_map_init();
// load bpf
char bpf[32 * 1024];

jdstrand Jun 12, 2017

Contributor

Where did you get these magic values? I looked around and I found:

Looking at that, it seems that the size you picked is rather arbitrary. Is this true? If so, please comment this is an arbitrary upper bound.

mvo5 Jun 13, 2017

Author Collaborator

Sorry, my bad. This is indeed an arbitrary limit in my code. I think we want some limit but the question of course is how big it should be. Maybe something like 640kb? Or multiple megabytes? Suggestions welcome.

jdstrand Jun 14, 2017

Contributor

I mentioned elsewhere, I think 16kb is fine.

if (ctx == NULL) {
errno = ENOMEM;
die("seccomp_init() failed");
ssize_t num_read = read(fd, bpf, sizeof bpf);

jdstrand Jun 12, 2017

Contributor

Note that read() might be interrupted for a number of reasons which could potentially result in a partial bpf cache file read that could be continued later. It's probably fine to just error out here like you are and let systemd/the user retry, but thought I'd mention it. See 'man 2 read' for details.

mvo5 Jun 13, 2017

Author Collaborator

Indeed, I reworked this code a little bit and it errors out for now. I think I need to get back to this though; I don't think we can land it without a proper read() implementation that deals with interruptions :/
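For illustration, a read loop that copes with interruptions and short reads while enforcing an upper bound could look like the sketch below. It is written in Go with raw `syscall.Read` so the retry logic is visible; the actual snap-confine code is C, and the function name `readLimited` is an assumption of this sketch. The 16 KiB limit in `main` is the figure suggested in this review.

```go
package main

import (
	"fmt"
	"syscall"
)

// readLimited reads the whole file at path, retrying on EINTR and
// continuing after short reads. It fails if the file holds more than
// limit bytes.
func readLimited(path string, limit int) ([]byte, error) {
	fd, err := syscall.Open(path, syscall.O_RDONLY|syscall.O_CLOEXEC, 0)
	if err != nil {
		return nil, fmt.Errorf("cannot open %s: %v", path, err)
	}
	defer syscall.Close(fd)

	// One extra byte lets us tell "exactly limit bytes" apart from
	// "more than limit bytes".
	buf := make([]byte, limit+1)
	total := 0
	for total < len(buf) {
		n, err := syscall.Read(fd, buf[total:])
		if err == syscall.EINTR {
			continue // interrupted before any data arrived, retry
		}
		if err != nil {
			return nil, fmt.Errorf("cannot read %s: %v", path, err)
		}
		if n == 0 {
			break // end of file
		}
		total += n
	}
	if total > limit {
		return nil, fmt.Errorf("%s is larger than %d bytes", path, limit)
	}
	return buf[:total], nil
}

func main() {
	data, err := readLimited("/dev/null", 16*1024)
	fmt.Println(len(data), err)
}
```

Reading one byte more than the limit avoids the ambiguity pointed out in this thread, where a file of exactly the buffer size was reported as too big.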

if (num_read < 0) {
die("cannot read bpf %s", profile_path);
} else if (num_read == sizeof bpf) {
die("cannot fit bpf %s into buffer", profile_path);

jdstrand Jun 12, 2017

Contributor

This isn't quite right-- a file of exactly size '32 * 1024' fits into bpf. Since you are reading binary data, you don't need the '\0' at the end. Your code for the size check is fine and safe, it is just a bit confusing. I also don't know if '32 * 1024' is a magic value. Perhaps you want '(32 * 1024) + 1'?

mvo5 Jun 13, 2017

Author Collaborator

Indeed, thank you! I changed the code slightly now to check the limit directly via fstat instead of via this (indirect and confusing) loading.

cmd := os.Args[1]
switch cmd {
case "compile":
content, err := ioutil.ReadFile(os.Args[2])

jdstrand Jun 12, 2017

Contributor

Your use of := here declares err local to this statement such that err is always nil outside of this statement which means that compile(content, os.Args[3]) errors end up being ignored. Eg:

$ mkdir /tmp/dir
$ snap-seccomp compile /var/lib/snapd/seccomp/profiles/snap.hello-world.sh /tmp/dir && echo yes
yes
$

That should have errored out. Please correct this to use = instead of := up above and add a test for os.Args[3] pointing at an existing directory and that it returns an error.

mvo5 Jun 19, 2017

Author Collaborator

Thank you, very good catch. This is fixed now.
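The shadowing problem described in this thread can be reproduced in isolation. The sketch below uses hypothetical stand-ins (`readInput`, `compile`) rather than the real snap-seccomp functions; `compile` always fails so the difference between the two variants is visible.

```go
package main

import (
	"errors"
	"fmt"
)

// readInput and compile are hypothetical stand-ins for reading the
// profile and compiling it; compile always fails here.
func readInput() ([]byte, error) { return []byte("read\n"), nil }

func compile(content []byte, out string) error {
	return errors.New("cannot write " + out)
}

// runShadowed mirrors the bug: ":=" declares a new err that only lives
// inside the case clause, so the outer err is never assigned.
func runShadowed(cmd string) error {
	var err error
	switch cmd {
	case "compile":
		content, err := readInput()
		if err != nil {
			break
		}
		err = compile(content, "/tmp/dir") // assigns the inner err
	}
	return err // outer err, still nil
}

// runFixed declares content separately and uses "=" so that the outer
// err receives both results.
func runFixed(cmd string) error {
	var err error
	switch cmd {
	case "compile":
		var content []byte
		content, err = readInput()
		if err != nil {
			break
		}
		err = compile(content, "/tmp/dir")
	}
	return err
}

func main() {
	fmt.Println("shadowed:", runShadowed("compile"))
	fmt.Println("fixed:   ", runFixed("compile"))
}
```

The compiler accepts the shadowed version because the inner err is read once in the `if`; only a test that forces `compile` to fail (such as pointing the output at an existing directory) exposes the lost error.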

@@ -20,6 +20,7 @@
#include <asm/ioctls.h>
#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/can.h> // needed for search mappings
#include <linux/netlink.h>
#include <sched.h>

jdstrand Jun 12, 2017

Contributor

You can (and should) remove a lot of includes that were only used for search mappings.

mvo5 Jun 13, 2017

Author Collaborator

Thanks, this is done now.

case "s390x":
return main.ScmpArchS390X
case "ppc":
return main.ScmpArchPPC

jdstrand Jun 20, 2017

Contributor

You're missing ppc64.

}
}

func simulateBpf(c *C, seccompWhitelist, bpfInput string, expected int) {

jdstrand Jun 20, 2017

Contributor

Perhaps add a comment here to help explain things. Eg:

// simulateBpf:
//  1. runs main.Compile() which will catch syntax errors and output to a file
//  2. takes the output file from main.Compile and loads it via decodeBpfFromFIle
//  3. parses the decoded bpf
//  4. runs the parsed bpf through a bpf VM
//
// In this manner, in addition to verifying policy syntax we are able to
// unit test the resulting bpf in several rudimentary ways (rudimentary
// because this parser is (intentionally) not a complete seccomp parser.
//
// Full testing of applied policy is done elsewhere via spread tests.

jdstrand Jun 20, 2017

Contributor

In writing the above comment it occurred to me that it would be nice if we could load this policy into the kernel via prctl(). This should be possible with a small loader program that you exec(). Eg, here is a small C program to do that:

/*
 * gcc -Wall ./load-bpf-with-nnp.c -o load-bpf-with-nnp
 * /tmp/snap-seccomp compile ./test-nonet.filter ./test-nonet.filter.bpf
 *
 * Assuming test-nonet.filter doesn't allow 'socket', this should load the
 * bpf into the kernel and run ping under it, which should fail:
 * ./load-bpf-with-nnp ./test-nonet.filter.bpf /bin/ping -c 1 www.ubuntu.com
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <string.h>
#include <inttypes.h>
#include <fcntl.h>
#include <sys/prctl.h>

#include <linux/filter.h>
#include <linux/seccomp.h>

#define MAX_BPF_SIZE (32 * 1024)

int sc_apply_seccomp_bpf(const char *profile_path)
{
	unsigned char bpf[MAX_BPF_SIZE + 1]; // account for EOF
	FILE *fp;
	fp = fopen(profile_path, "rb");
	if (fp == NULL) {
		fprintf(stderr, "cannot read %s\n", profile_path);
		exit(1);
	}

	// set 'size' to '1' to get bytes transferred
	size_t num_read = fread(bpf, 1, sizeof(bpf), fp);

	if (ferror(fp) != 0) {
		perror("fread()");
		exit(1);
	} else if (feof(fp) == 0) {
		fprintf(stderr, "file too big\n");
		exit(1);
	}
	fclose(fp);

	struct sock_fprog prog = {
		.len = num_read / sizeof(struct sock_filter),
		.filter = (struct sock_filter *)bpf,
	};

	// Set NNP to allow loading seccomp policy into the kernel without
	// root
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
		perror("prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)");
		exit(1);
	}

	if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) {
		perror("prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...) failed");
		exit(1);
	}
	return 0;
}

int main(int argc, char *argv[])
{
	int rc = 0;
	if (argc < 2) {
		fprintf(stderr, "Usage: %s <bpf file> [prog ...]\n", argv[0]);
		return 1;
	}

	rc = sc_apply_seccomp_bpf(argv[1]);
	if (rc || argc == 2)
		return rc;

	execv(argv[2], (char *const *)&argv[2]);
	perror("execv failed");
	return 1;
}

Now, you don't have to use C, the important parts are:

	// Set NNP to allow loading seccomp policy into the kernel without
	// root
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
		perror("prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)");
		exit(1);
	}

	if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) {
		perror("prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...) failed");
		exit(1);
	}

so, call prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) then you can load the policy into the kernel with prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) without being root. If that doesn't exit with error then that is an even better test than the bpf VM.

The shell scripts for the C code would take this a small step further and typically simply run something basic like /bin/true with the default policy and any tested rules to see that after the policy was loaded, it was usable to some degree (ie, enough to run /bin/true; spread tests continue to be the best way to test specific rule functionality of course).

mvo5 Jun 20, 2017

Author Collaborator

Thanks a lot! I spent a bit of time trying to do this natively in Go. Loading is fine, however simulating a seccomp-killed app is tricky with Go because of its runtime. I end up with hanging code and zombies when trying to do this with cgo from inside Go. It seems that to do these kinds of tests we will need a C helper (like the one you outlined above).

mvo5 Jun 20, 2017

Author Collaborator

I played with this a bit and got to http://paste.ubuntu.com/24909258/ - it does feel a bit ugly and takes a long time (45sec on my relatively fast workstation) because of intensive tests like snapSeccompSuite.TestRestrictionsWorkingArgsSocket - I can explore further tomorrow if we want to keep this direction.

mvo5 Jun 20, 2017

Author Collaborator

This is a fully working version: http://paste.ubuntu.com/24909351/ using the C helper/in-kernel-execute approach.

mvo5 Jun 20, 2017

Author Collaborator

The helper is only compiled once, but I compile custom test-binaries:

content := fmt.Sprintf(`
#define _GNU_SOURCE
#include<unistd.h>
#include<sys/syscall.h>
int main(int argc, char **argv) {
syscall(SYS_%v, %v, %v, %v, %v, %v, %v);
}
`, syscallname, syscallArgs[0], syscallArgs[1], syscallArgs[2], syscallArgs[3], syscallArgs[4], syscallArgs[5])

like this to test the filtering with both Allow and Kill.

jdstrand Jun 20, 2017

Contributor

That's pretty cool actually and a lot more than what the old shell tests were doing and way more comprehensive. To me, the 45 seconds would be worth it. :)

That said, I think that test binary could be made more generic so that it only would need to compile it once. You shouldn't have to use syscall(SYS_%v, ...); you can just feed ints into syscall() for arg1 and all the other args (man syscall). So compile once, then the tests call your test-binary with different arguments. This should get you the speed you want and give us very comprehensive testing.
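The "compile once, pass the syscall as arguments" idea can be sketched as below. Note that the helper discussed in this PR is a C program, because a seccomp-killed Go process is hard to handle; this Go version only illustrates the argument handling, and `parseSyscallArgs` is a name invented for the sketch. Negative arguments are not handled.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"syscall"
)

// parseSyscallArgs turns "<nr> [arg1 ... arg6]" into a syscall number
// and six arguments; missing arguments default to 0.
func parseSyscallArgs(argv []string) (uintptr, [6]uintptr, error) {
	var args [6]uintptr
	if len(argv) < 1 || len(argv) > 7 {
		return 0, args, fmt.Errorf("expected a syscall number and up to 6 arguments, got %d values", len(argv))
	}
	vals := make([]uintptr, len(argv))
	for i, s := range argv {
		// Base 0 accepts decimal, 0x-prefixed hex and 0-prefixed octal.
		v, err := strconv.ParseUint(s, 0, 64)
		if err != nil {
			return 0, args, fmt.Errorf("cannot parse %q: %v", s, err)
		}
		vals[i] = uintptr(v)
	}
	copy(args[:], vals[1:])
	return vals[0], args, nil
}

func main() {
	argv := os.Args[1:]
	if len(argv) == 0 {
		// Demo default for this sketch: run getpid when called without arguments.
		argv = []string{strconv.Itoa(syscall.SYS_GETPID)}
	}
	nr, a, err := parseSyscallArgs(argv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	ret, _, errno := syscall.Syscall6(nr, a[0], a[1], a[2], a[3], a[4], a[5])
	if errno != 0 {
		fmt.Fprintf(os.Stderr, "syscall %d failed: %v\n", nr, errno)
		os.Exit(1)
	}
	fmt.Println("syscall returned", ret)
}
```

With a runner of this shape, each test only changes the command-line arguments, so nothing needs to be recompiled per test case.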

mvo5 Jun 20, 2017

Author Collaborator

I got this down to ~30s on my system and also to a much narrower set of syscalls: http://paste.ubuntu.com/24911184/ - it still feels strange to have gcc in there but this is really as close as we can get to the real thing when it comes to testing.

jdstrand Jun 20, 2017

Contributor

Personally, I think 30 seconds is acceptable, but note my last comment-- I think we can make it faster.

What is neat about your approach is 'yes' it is very close and we're doing it all in unit tests rather than relying on this only in the integration spread tests. Yay! :)

mvo5 Jun 21, 2017

Author Collaborator

@jdstrand I pushed #3502 with the suggested single syscall-runner binary. I did the original per-test compile mostly because it allowed me to have a super narrow set of syscalls by avoiding linking against most of libc. Once I add argument parsing I need to accept the libc startup files, which means more syscalls (like brk, arch_prctl, access, sysinfo). But given the win in test runtime I think it's better to compile only once (even if it means some more allowed syscalls by default).

panic(fmt.Sprintf("cannot map ubuntu arch %q to a seccomp arch", ubuntuArch))
}

// ScmpArchToSeccompAnativeArch takes a seccomp.ScmpArch and converts

jdstrand Jun 20, 2017

Contributor

s/ScmpArchToSeccompAnativeArch/ScmpArchToSeccompNativeArch/

Contributor

jdstrand left a comment

I went through the PR top to bottom and believe all the most important parts of my comments have been addressed (excepting the pending loading the bpf into the kernel patches). What remains is all comment changes, nit-picky stuff, a small apparmor profile change, some observations and a few easily added extra tests.

@@ -62,11 +63,58 @@ func ubuntuArchFromGoArch(goarch string) string {
"ppc64le": "ppc64el",
"s390x": "s390x",
"ppc": "powerpc",
// available in debian and other distros
"ppc64": "ppc64",

jdstrand Jun 20, 2017

Contributor

I just noticed that arch_test.go is missing s390x and ppc64.

mvo5 Jun 21, 2017

Author Collaborator

Thanks, added.

"s390x": "s390x",
"ppc": "powerpc",
// available in debian and other distros
"ppc64": "ppc64",

jdstrand Jun 20, 2017

Contributor

I think you need to go fmt for the spacing here.

mvo5 Jun 21, 2017

Author Collaborator

The comment is the reason why the spacing here is different, go fmt was applied :)

jdstrand Jun 21, 2017

Contributor

Ah :)

@@ -116,6 +116,18 @@ static void sc_quirk_create_writable_mimic(const char *mimic_dir,
debug("creating writable mimic directory %s based on %s", mimic_dir,
ref_dir);
sc_quirk_setup_tmpfs(mimic_dir);

struct stat stat_buf;

jdstrand Jun 20, 2017

Contributor

Not strictly required cause the code is obvious, but adding a comment to the effect of this might be nice:

// Now copy the ownership and permissions of the mimicked directory

@@ -89,11 +89,14 @@
# change_profile unsafe /** -> **,

# reading seccomp filters
/{tmp/snap.rootfs_*/,}var/lib/snapd/seccomp/profiles/* r,
/{tmp/snap.rootfs_*/,}var/lib/snapd/seccomp/profiles.bpf/* r,

jdstrand Jun 20, 2017

Contributor

Let's add the .bpf extension here so if for some reason snap-confine ends up trying to read the wrong thing, we have a denial to indicate that. Eg, instead of what you have, use:

/{tmp/snap.rootfs_*/,}var/lib/snapd/seccomp/profiles.bpf/*.bpf r,

// sc_maybe_fixup_permissions fixes incorrect permissions
// inside the mount namespace for /var/lib. Before 1ccce4
// this directory was created with permissions 1777.
void sc_maybe_fixup_permissions()

jdstrand Jun 20, 2017

Contributor

I'm 'ok' with this function here since it is a fixup function, but I suspect there might be a better location. @zyga, do you have an opinion?


// simulateBpf:
// 1. runs main.Compile() which will catch syntax errors and output to a file
// 2. takes the output file from main.Compile and loads it via decodeBpfFromFIle

jdstrand Jun 20, 2017

Contributor

s/decodeBpfFromFIle/decodeBpfFromFile/

// 4. runs the parsed bpf through a bpf VM
//
// In this manner, in addition to verifying policy syntax we are able to
// unit test the resulting bpf in several ways approximating the kernels

jdstrand Jun 20, 2017

Contributor

s/kernel/kernel's/

// In this manner, in addition to verifying policy syntax we are able to
// unit test the resulting bpf in several ways approximating the kernels
// behaviour (approximating because this parser is not the kernel seccomp
// parser.

jdstrand Jun 20, 2017

Contributor

s/parser/parser)/

// simulateBpf:
// 1. runs main.Compile() which will catch syntax errors and output to a file
// 2. takes the output file from main.Compile and loads it via decodeBpfFromFIle
// 3. parses the decoded bpf

jdstrand Jun 20, 2017

Contributor

Let's be a little more specific:

//  3. parses the decoded bpf using the seccomp library and various snapd functions

// TestCompile will test the input from our textual seccomp whitelist
// against a kernel syscall input that may contain arguments. The test
// is performed by running the compiled bpf program on a virtual bpf
// machine. Each test needs to declare what output from the VM it expects.

jdstrand Jun 20, 2017

Contributor

Let's make this easier for people adding tests and rephrase to:

// TestCompile iterates over a range of textual seccomp whitelist rules and
// mocked kernel syscall input. For each rule, the test consists of compiling
// the rule into a bpf program and then running that program on a virtual bpf
// machine and comparing the bpf machine output to the specified expected
// output and seccomp operation. Eg:
//    {"<rule>", "<mocked kernel input>", <seccomp result>}
//
// Eg to test that the rule 'read >=2' is allowed with 'read(2)' and 'read(3)'
// and denied with 'read(1)' and 'read(0)', add the following tests:
//    {"read >=2", "read;native;2", main.SeccompRetAllow},
//    {"read >=2", "read;native;3", main.SeccompRetAllow},
//    {"read >=2", "read;native;1", main.SeccompRetKill},
//    {"read >=2", "read;native;0", main.SeccompRetKill},

mvo5 added 2 commits Jun 21, 2017
Contributor

jdstrand commented Jun 21, 2017

Thanks for addressing all the feedback @mvo5! :)

@mvo5 mvo5 added the ⚠ Critical label Jun 21, 2017
Contributor

niemeyer left a comment

Assuming you're happy about these comments, LGTM.

The ".bpf" directory suffix is the only point I'd like to talk about first in case you disagree with the points made.

log.Panicf("cannot get kernel architecture: %v", err)
}

kernelArch := make([]byte, 0, len(utsname.Machine))

niemeyer Jun 21, 2017

Contributor

Isn't the logic below basically

kernelArch := strings.SplitN(utsname.Machine, "\x00", 2)[0]

that's pedantic, but besides being simpler it means no need to convert and allocate back and forth into byte arrays. It's just a string being sliced.

mvo5 Jun 22, 2017

Author Collaborator

utsname.Machine is defined as [65]int8 so doing the above is slightly more involved. The following will work:

	p := (*[len(utsname.Machine)]byte)(unsafe.Pointer(&utsname.Machine))
	kernelArch := bytes.SplitN(p[:], []byte("\x00"), 2)[0]

If you prefer that, I'm happy to use that instead.
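For comparison, the conversion can also be written without unsafe by copying bytes until the first NUL. The sketch below assumes the `[65]int8` layout mentioned above (some architectures use uint8 instead), and `machineToString` is a name made up for this illustration. It fills a sample array by hand instead of calling uname so that it builds regardless of the field type on the build machine.

```go
package main

import "fmt"

// machineToString converts a NUL-terminated [65]int8 field, as found in
// the utsname struct on several architectures, into a Go string.
func machineToString(machine [65]int8) string {
	buf := make([]byte, 0, len(machine))
	for _, c := range machine {
		if c == 0 {
			break // stop at the terminating NUL
		}
		buf = append(buf, byte(c))
	}
	return string(buf)
}

func main() {
	// Sample value standing in for utsname.Machine.
	var machine [65]int8
	for i, c := range []byte("x86_64") {
		machine[i] = int8(c)
	}
	fmt.Println(machineToString(machine))
}
```

This costs one small allocation, which is the trade-off against the unsafe pointer cast shown in the comment above.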

"strings"
"syscall"

// FIXME: we want github.com/mvo5/libseccomp-golang but that

niemeyer Jun 21, 2017

Contributor

Did you mean seccomp instead of mvo5 here?


// used to mock in tests
var (
archUbuntuArchitecture = arch.UbuntuArchitecture

niemeyer Jun 21, 2017

Contributor

All of these "Ubuntu" names should at some point probably be replaced by something that reflects better the cross-distribution nature of this code.

mvo5 Jun 22, 2017

Author Collaborator

Indeed, I added a FIXME for this. It seems like SnapdArchitecture or simply Architecture is reasonable. It is slightly sad that (by now) we have several different ways of describing the architecture. There is the "dpkg" architecture (which snaps also use, e.g. amd64). There is the kernel architecture name (e.g. X86_64) and the Go architecture (e.g. amd64). And there is the seccomp architecture, which is two integers: one in the Go code (iota) and one for the kernel architecture as a uint32 (AUDIT_ARCH_X86_64). But I digress :)

// add a compat architecture for some architectures that
// support it, e.g. on amd64 kernel and userland, we add
// compat i386 syscalls.
if archUbuntuArchitecture() == archUbuntuKernelArchitecture() {

niemeyer Jun 21, 2017

Contributor

Both of these should go into variables so we're not rebuilding them repeatedly below.

}

scanner := bufio.NewScanner(bytes.NewBuffer(content))
for scanner.Scan() {

niemeyer Jun 21, 2017

Contributor

We just took that exact same text and split it out entirely in memory above under hasLine and threw away the content. It feels a bit awkward to then take the care of streaming it line by line once more and over a much more complex machinery. If we don't care, then let's just split it up above once and ahead of time, and use the same slice of lines in hasLine (renaming it to contains or something) and here.

mvo5 Jun 22, 2017

Author Collaborator

Indeed, I modeled this after the C implementation, I reworked it now and it is only doing a single pass over content now.

jdstrand Jun 22, 2017

Contributor

The C implementation algorithm was intentionally 2-pass, with a line-by-line preprocess for @unrestricted/@complain and then another pass to process line by line, because it resulted in simpler C setuid code.

The go code doesn't have to be 2-pass.
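A single-pass variant can be sketched as follows. This is an illustration only, not the code from this branch: `parseProfile` is a hypothetical name, and skipping blank lines and '#' comments is an assumption about the profile format.

```go
package main

import (
	"fmt"
	"strings"
)

// parseProfile walks the profile once. It collects the rule lines and
// notes whether @unrestricted or @complain appears anywhere in the file.
func parseProfile(content string) (rules []string, unrestricted bool) {
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // assumption: blank lines and '#' comments are ignored
		}
		if line == "@unrestricted" || line == "@complain" {
			unrestricted = true
			continue
		}
		rules = append(rules, line)
	}
	return rules, unrestricted
}

func main() {
	rules, unrestricted := parseProfile("read\n@unrestricted\nwrite\n")
	fmt.Println(rules, unrestricted)
}
```

Because the directive is recorded as a flag and acted on only after the loop, its position in the file does not matter.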


// FIXME: right now complain mode is the equivalent to unrestricted.
// We'll want to change this once we seccomp logging is in order.
if hasLine(content, "@unrestricted") || hasLine(content, "@complain") {

niemeyer Jun 21, 2017

Contributor

Is the position of these lines really irrelevant?

jdstrand Jun 21, 2017

Contributor

Yes. Since the seccomp profile is just an unordered list of syscalls, @unrestricted and @complain are treated similarly and may appear anywhere. By convention in snapd we put them at the top of the file, but anywhere in the file is fine and consistent with the C implementation. We could change this of course if it is deemed a problem. I like the property because it allows for these or future directives to potentially be composed (eg, an interface that adds @something).


func showSeccompLibraryVersion() error {
major, minor, micro := seccomp.GetLibraryVersion()
fmt.Fprintf(os.Stdout, "seccomp version: %d.%d.%d\n", major, minor, micro)

niemeyer Jun 21, 2017

Contributor

I think we can print just the version itself here, since the command is already super specific (library-version).

if err != nil {
break
}
err = compile(content, os.Args[3])

niemeyer Jun 21, 2017

Contributor

len(os.Args) == 3 means harsh crash here.
len(os.Args) == 2 means harsh crash four lines above.
len(os.Args) == 1 means harsh crash seven lines above.
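One way to avoid these index-out-of-range crashes is to validate the argument count before touching `os.Args[n]`. The sketch below is an assumption about how that could look, not the fix that landed: `checkArgs` and the `wantArgs` table are invented here, with "compile" and "library-version" taken from the commands mentioned in this review.

```go
package main

import "fmt"

// wantArgs maps each command to the number of arguments it requires.
var wantArgs = map[string]int{
	"compile":         2, // <input profile> <output bpf>
	"library-version": 0,
}

// checkArgs validates an os.Args-style slice and returns the command
// and its arguments, or an error instead of panicking on a short slice.
func checkArgs(argv []string) (string, []string, error) {
	if len(argv) < 2 {
		return "", nil, fmt.Errorf("usage: snap-seccomp <command> [args]")
	}
	cmd, rest := argv[1], argv[2:]
	n, ok := wantArgs[cmd]
	if !ok {
		return "", nil, fmt.Errorf("unsupported command %q", cmd)
	}
	if len(rest) != n {
		return "", nil, fmt.Errorf("%s needs %d argument(s), got %d", cmd, n, len(rest))
	}
	return cmd, rest, nil
}

func main() {
	examples := [][]string{
		{"snap-seccomp"},
		{"snap-seccomp", "compile", "in"},
		{"snap-seccomp", "compile", "in", "out"},
	}
	for _, argv := range examples {
		cmd, rest, err := checkArgs(argv)
		fmt.Println(cmd, rest, err)
	}
}
```

In a real `main` the slice passed in would be `os.Args`, and an error would be printed to stderr followed by a non-zero exit.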


for baseName := range content {
in := filepath.Join(dirs.SnapSeccompDir, baseName)
out := filepath.Join(dirs.SnapSeccompDir, baseName+".bpf")

niemeyer Jun 21, 2017

Contributor

If the file names are being extended with .bpf, is there any benefit in changing SnapSeccompDir to have .bpf in the directory itself, or could we simply keep the same directory and have .bpf in the filenames alone? This has the added benefit that the old profiles will be properly collected once new profiles compile, instead of being left around forever once this patch lands.

mvo5 Jun 21, 2017

Author Collaborator

My original approach was to use the same dir, however there are some complications. We use EnsureDirState() in the code that generates the seccomp profiles. This code will remove all of the old seccomp profiles (written by the old 2.24 code). So when we revert we run into the same problem that we need to solve: there will be new incompatible profiles at startup until 2.24 has run and generated valid profiles for 2.24. The easiest way out of this was to use a new dir. We can add an upcoming task to the forum to ensure we garbage collect the no-longer-used directory. We can of course still keep the /var/lib/snapd/seccomp/profiles/ directory. If we do this, we need to tweak the profile names so that the glob used in EnsureDirState() does not remove the old 2.24 seccomp profiles. If that is more desirable than the new directory I can investigate this in my morning.

niemeyer Jun 22, 2017

Contributor

Thanks for the explanation. I forgot we'll always rebuild the profile on startup due to the binary change. It doesn't seem worth adding extra complexity to preserve the directory when something simpler along the lines of what you've done will sort it out just fine.

Here is an alternative suggestion: seccomp/bpf/*.src for the source files and .../*.bin for the binary ones. The .src is just a trivial hint to suggest that this file won't really be used by the runtime until it is compiled. The .bin is a way to avoid the .../bpf/*.bpf duplication, and only really recommended if there's no standard suffix for the BPF bytecode (I couldn't find .bpf as a traditional suffix.. is it?).

jdstrand Jun 22, 2017

Contributor

I like how /var/lib/snapd/seccomp is preserved in @niemeyer's suggestion to use /var/lib/snapd/seccomp/bpf/... since finding the subdir bpf/ is more discoverable than the sibling dir seccomp.bpf.

As mentioned in previous days, I don't have a strong preference on the naming approach, so what is proposed is fine with me, but I got to wondering: if we have the bpf/ subdir and both the input and output files are there, we don't really need both the input and output files to have a suffix. Eg, we could simply use seccomp/bpf/snap.name.cmd.src (or seccomp/bpf/snap.name.cmd.in) and seccomp/bpf/snap.name.cmd (ie, no extension for the output file). Or flip it and have no extension for the input file and use .cache or .bin for the output. I suspect that the former (have an extension on the input file) is preferred since it retains the hint @niemeyer was after.

Not blocking on this if you're happy with *.src and *.bin, just thought it was potentially cleaner to use an extension on only one of input or output.

niemeyer Jun 22, 2017

Contributor

Problem with not having .bin is that everywhere else those are text files, and tab-completing them will hit the naked file before the .src, so people will tend to have an editor full of garbage. The .src and .bin make it very easy to grasp what's the content in each case without having to open it up.

jdstrand Jun 22, 2017

Contributor

That makes sense especially if the hint is important.

@mvo5 mvo5 force-pushed the mvo5:seccomp-bpf branch from c60e6f2 to 26b2591 Jun 22, 2017
@@ -33,7 +33,7 @@
#include "../libsnap-confine-private/string-utils.h"
#include "../libsnap-confine-private/utils.h"

-static char *filter_profile_dir = "/var/lib/snapd/seccomp/profiles.bpf/";
+static char *filter_profile_dir = "/var/lib/snapd/seccomp/bpf/";

jdstrand Jun 22, 2017

Contributor

Meh, I mentally misread all the other stuff as /var/lib/snapd/seccomp/profiles/bpf when I commented that I liked it. I don't want to belabor this point cause like I said before, I don't have a strong opinion. This is fine.

@mvo5 mvo5 merged commit 3247e95 into snapcore:master Jun 22, 2017
5 of 7 checks passed
xenial-i386 autopkgtest finished (failure)
yakkety-amd64 autopkgtest finished (failure)
artful-amd64 autopkgtest finished (success)
continuous-integration/travis-ci/pr The Travis CI build passed
xenial-amd64 autopkgtest finished (success)
xenial-ppc64el autopkgtest finished (success)
zesty-amd64 autopkgtest finished (success)
mvo5 added a commit that referenced this pull request Jun 26, 2017
many: backport of seccomp-bpf branch (#3431) to the 2.26 release
4 participants