syscall: support 'mode 2 seccomp' on Linux #3405
Comments
Hi Russ, first of all, here is an example of using 'mode 2 seccomp' from Go: https://github.com/krasin/seccomp/blob/master/example/example.go It has been tested on x86-64 Ubuntu 12.04, but it is not guaranteed to work on 32-bit systems and will not work on other or older kernels. The good news is that 'mode 2 seccomp' is going to be included in the Linux 3.5 kernel. See https://lkml.org/lkml/2012/3/25/81 Currently, the example only half works. seccomp applies policies to the current thread, not to the process or to all of its threads. This means that I had to call runtime.LockOSThread() to keep the goroutine from switching system threads while it runs, and on a violation only that thread is killed, which makes the whole program hang (the Go runtime does not handle the situation where one of its threads has received SIGSYS). For 'mode 2 seccomp' to work properly with Go, the runtime should make it possible to apply the policy to all system threads at once and to disable thread spawning by the runtime (since clone(2) will likely be disabled by any self-respecting seccomp policy). |
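For readers who have not used runtime.LockOSThread(), here is a minimal sketch of the pinning step described above. It is not the code from the linked example: `runPinned` is a hypothetical helper name, and the sketch only shows how a goroutine can be kept on one OS thread. It does not install a filter and does not solve the hang.

```go
package main

import (
	"fmt"
	"runtime"
)

// runPinned (hypothetical helper) runs fn on a goroutine that stays locked to
// one OS thread for the whole call, so thread-local kernel state set inside fn
// (such as a per-thread seccomp filter) stays on that thread.
func runPinned(fn func() error) error {
	done := make(chan error, 1)
	go func() {
		// Deliberately no UnlockOSThread: when a goroutine exits while still
		// locked, the runtime terminates the thread instead of reusing it.
		runtime.LockOSThread()
		done <- fn()
	}()
	return <-done
}

func main() {
	err := runPinned(func() error {
		// A real program would install its thread-local filter here.
		fmt.Println("running on a dedicated OS thread")
		return nil
	})
	fmt.Println("worker finished, err =", err)
}
```

Every other goroutine still runs on unfiltered threads, which is exactly the limitation this issue is about.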
Seccomp mode 2 is not in the mainline kernel: https://github.com/torvalds/linux/blob/master/kernel/seccomp.c |
Seccomp mode 2 is now in the mainline kernel: https://github.com/torvalds/linux/blob/master/kernel/seccomp.c |
And, finally, Linux 3.5 is released: http://kernelnewbies.org/Linux_3.5

From the Release Notes:

1.3. Seccomp-based system call filtering

Seccomp (alias for "secure computing") is a simple sandboxing mechanism added back in 2.6.12 that allows to transition to a state where it cannot make any system calls except a very restricted set (exit, sigreturn, read and write to already open file descriptors). Seccomp has now been extended: instead of a fixed and very limited set of system calls, seccomp has evolved into a filtering mechanism that allows processes to specify an arbitrary filter of system calls (expressed as a Berkeley Packet Filter program) that should be forbidden. This can be used to implement different types of security mechanisms; for example, the Linux port of the Chromium web browser supports this feature to run plugins in a sandbox.

The systemd init daemon has added support for this feature. A Unit file can use the SystemCallFilter to specify a list with the syscalls that will be allowed to run, any other syscall will not be allowed:

[Service]
ExecStart=/bin/echo "I am in a sandbox"
SystemCallFilter=brk mmap access open fstat close read fstat mprotect arch_prctl munmap write

Recommended links: Documentation and Samples. Recommended LWN article: Yet another new approach to seccomp. Code: (commit 1, 2, 3, 4, 5) |
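To make "expressed as a Berkeley Packet Filter program" a bit more concrete, here is a rough sketch of how an allowlist filter could be assembled as plain data in Go. The type and function names (`sockFilter`, `buildAllowlist`) are made up for illustration. The numeric constants are copied from the kernel headers for x86-64, and the sketch only builds the instruction list. It does not load it into the kernel.

```go
package main

import "fmt"

// sockFilter mirrors the kernel's struct sock_filter (one classic BPF instruction).
type sockFilter struct {
	Code uint16
	Jt   uint8
	Jf   uint8
	K    uint32
}

// Values from <linux/filter.h>, <linux/seccomp.h> and <linux/audit.h>.
const (
	bpfLdWAbs = 0x00 | 0x00 | 0x20 // BPF_LD | BPF_W | BPF_ABS
	bpfJeqK   = 0x05 | 0x10 | 0x00 // BPF_JMP | BPF_JEQ | BPF_K
	bpfRetK   = 0x06 | 0x00        // BPF_RET | BPF_K

	seccompRetKill  = 0x00000000 // kill the offending thread
	seccompRetAllow = 0x7fff0000
	auditArchX86_64 = 0xc000003e

	offNr   = 0 // offset of nr in struct seccomp_data
	offArch = 4 // offset of arch in struct seccomp_data
)

// buildAllowlist (hypothetical helper) returns a filter that allows only the
// given x86-64 syscall numbers and kills the thread on anything else.
func buildAllowlist(allowed []uint32) []sockFilter {
	prog := []sockFilter{
		{bpfLdWAbs, 0, 0, offArch},
		{bpfJeqK, 1, 0, auditArchX86_64}, // right arch: skip the kill below
		{bpfRetK, 0, 0, seccompRetKill},
		{bpfLdWAbs, 0, 0, offNr},
	}
	for _, nr := range allowed {
		prog = append(prog,
			sockFilter{bpfJeqK, 0, 1, nr}, // no match: skip the allow below
			sockFilter{bpfRetK, 0, 0, seccompRetAllow},
		)
	}
	return append(prog, sockFilter{bpfRetK, 0, 0, seccompRetKill})
}

func main() {
	// 0 = read, 1 = write, 60 = exit on x86-64.
	prog := buildAllowlist([]uint32{0, 1, 60})
	fmt.Println("filter has", len(prog), "instructions")
}
```

The architecture check comes first because syscall numbers differ between ABIs, so a filter written for x86-64 numbers should not be applied to calls made through another ABI.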
For example, if I want to apply the sandbox after initialization has completed but before I start handling untrusted data or code, that makes perfect sense. The Go runtime should support two things to make this happen: 1. Make it possible to apply the rule to all threads (even if not immediately, but provide a blocking call) 2. Exit the program if any thread is killed (for example, because of a sandbox violation) |
Hi All, I did some research on this issue. Here's a doc with some of my thoughts and a rough proposal: https://docs.google.com/document/d/12jRIrlFYKe3EyBLgtrkZelC-aGnLXnsHzx0ZVDF75Go/pub Feedback welcome. Let me know if the basic approach seems workable and acceptable from the Go runtime point of view. |
Thanks for looking at this. There are two different issues to discuss: the API, and the implementation. The API doesn't belong in the runtime package. In the old days it would have been put in the syscall package, but since we've frozen the syscall package it should now go into the go.sys package. It does sound like the implementation needs to involve the runtime package, but that should happen behind the scenes--programs should call something in go.sys, not something in runtime. In go.sys I think it's fine to have an interface that is specific to the Linux kernel. We already have examples of that in syscall--e.g., the Cloneflags field in SysProcAttr. I don't think we should aim for a system-independent interface to system call blocking; I think there will be too much variance between systems. The case of SYS_CLONE is an interesting one. I don't think the Go runtime can function without the ability to clone a new thread. You suggest creating some new threads ahead of time, but it would not be hard for a program to run out of threads. Every blocking file access call uses up a thread. That is independent of GOMAXPROCS, which controls the number of running threads, not the total number of threads. I think that at least initially we should fail the seccomp call if SYS_CLONE is not permitted. We can see if anybody sees this as a problem--it allows an evil program to fork-bomb the system but it doesn't open any other holes that I can see. |
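As an illustration of the "fail the seccomp call if SYS_CLONE is not permitted" rule suggested above, a pre-flight check could look roughly like this. Nothing here is an existing go.sys API: `checkPolicy`, `errCloneBlocked` and the flat list of syscall numbers are assumptions made for the sketch, and the clone number is the x86-64 one.

```go
package main

import (
	"errors"
	"fmt"
)

// sysCloneAMD64 is the clone(2) syscall number on x86-64; other
// architectures use different numbers.
const sysCloneAMD64 = 56

// errCloneBlocked is a hypothetical error for policies the runtime cannot live with.
var errCloneBlocked = errors.New("seccomp policy must allow clone: the Go runtime needs to create threads")

// checkPolicy (hypothetical) rejects an allowlist that does not permit clone,
// before any filter is handed to the kernel.
func checkPolicy(allowed []uint32) error {
	for _, nr := range allowed {
		if nr == sysCloneAMD64 {
			return nil
		}
	}
	return errCloneBlocked
}

func main() {
	fmt.Println(checkPolicy([]uint32{0, 1, 60}))     // read, write, exit only
	fmt.Println(checkPolicy([]uint32{0, 1, 56, 60})) // clone is allowed
}
```

Rejecting the policy up front turns a hard-to-debug runtime failure (a thread killed while the scheduler tries to start another one) into an ordinary error return.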
Thanks for your comments. Your API recommendations make this much simpler. No exported API from package runtime, and a linux (and seccomp) specific API in the go.sys package. I'll update the doc to reflect these changes. I also agree that we should start with an implementation that requires SYS_CLONE to be whitelisted. This will make the runtime support for the implementation much simpler as well. However, for the longer term, and from a sandboxing point of view, we should still consider the option of working with a strictly limited number of threads in the runtime. Quoting from the reporter of this issue above: "clone(2) will likely be disabled by any self-respecting seccomp policy". Moreover, read() and write() will probably need to be whitelisted by most seccomp policies. If excessive file IO from a sandboxed program may result in the creation of an arbitrarily large number of system threads, it is not a very effective scheme for sandboxing. I was under the impression that the runtime already supports this mode (simply block on blocking system calls if no more threads can be created). Now I see that it causes the program to crash if the number of threads exceeds the threshold set by debug.SetMaxThreads() in package runtime/debug. Out of curiosity, is the model of working with a strictly limited number of threads fundamentally incompatible with the Go semantics? My understanding is that it is possible to have a fully compliant Go compiler and runtime system that does not use any multi-threading capabilities of the operating system it is running on (and does not have to resort to interpreting either). In other words, using a different system thread for every blocking system call is an optimization in the runtime, and not a requirement for a correct Go implementation. Please correct me if I'm wrong. |
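For reference, the thread cap mentioned above is set through debug.SetMaxThreads in package runtime/debug, which returns the previous setting. A small sketch of scoping a limit to one piece of work follows. `withMaxThreads` is a hypothetical wrapper, and, as noted above, exceeding the limit crashes the program rather than blocking.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// withMaxThreads (hypothetical wrapper) applies a thread limit while fn runs
// and restores the previous limit afterwards. It returns the limit that was
// in effect before the call.
func withMaxThreads(limit int, fn func()) (previous int) {
	previous = debug.SetMaxThreads(limit)
	defer debug.SetMaxThreads(previous)
	fn()
	return previous
}

func main() {
	prev := withMaxThreads(5000, func() {
		// Work that must not grow past 5000 OS threads. If it does, the
		// runtime aborts the program instead of queueing the work.
	})
	fmt.Println("limit before the call:", prev)
}
```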
Conceptually, you are right: the number of threads used by a Go program is purely an implementation detail, and it should be possible in principle to write a Go implementation that uses a fixed number of threads. This can even be done in practice. When a Go program is about to enter a system call, or call a C/C++ function, we can check whether there are more threads available to run goroutines. If there are not, we can suspend the goroutine until there are threads available. That approach would work for most programs. The problem is that there are valid Go programs that would work normally, but would fail if this policy were adopted, simply because the other suspended threads might be unblocked by the C function that would have been called by the goroutine that the runtime suspended. The Go programmer has to change from a model in which goroutines can be created casually and can run any operation to a different model in which all potentially blocking calls have to be counted, with semaphores or other mechanisms to prevent a deadlock of blocking calls. Perhaps this is acceptable for people who use seccomp, but it hardly seems desirable. And this is not something that can be implemented only in the seccomp support code. It goes against the grain to modify the Go runtime to support a more complex programming model. |
Ah.. I see. I can have a program with two goroutines that communicate using a system pipe (horror). This program would end up in a deadlock if restricted to a single thread. The read/write system calls on the pipe have to be made simultaneously from different threads for the goroutines to make progress. I have updated the draft based on the feedback above. The new version is here: https://docs.google.com/document/d/1nh1hub2wJdYzoLVUkCPS7MoUIFRZRP5PbrtOO0WajII/pub Please take another look. |
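The pipe scenario can be written down in a few lines. This is only a sketch of the shape of such a program (`pingOverPipe` is a made-up name): the reader cannot finish until the writer has run, so if each blocking call had to wait for a single shared thread, neither side would make progress. Depending on the Go version, the runtime may service pipe I/O through its poller rather than a blocked thread, so treat this as an illustration of the dependency, not a reproduction of the deadlock.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// pingOverPipe (hypothetical) sends msg from one goroutine to another through
// an OS pipe and returns what the reading side received.
func pingOverPipe(msg string) (string, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer r.Close()

	writeErr := make(chan error, 1)
	go func() {
		// Writer side: runs concurrently with the blocked reader below.
		_, werr := w.Write([]byte(msg))
		w.Close() // signals EOF to the reader
		writeErr <- werr
	}()

	// Reader side: blocks until the writer has written and closed the pipe.
	data, err := io.ReadAll(r)
	if err != nil {
		return "", err
	}
	if werr := <-writeErr; werr != nil {
		return "", werr
	}
	return string(data), nil
}

func main() {
	got, err := pingOverPipe("ping")
	fmt.Println(got, err)
}
```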
Linux allows seccomp mode 2 to be called multiple times, with each filter stacked. This restriction is unnecessary. |
This is basically the plan for bug #1435 too. |
If your kernel has the new seccomp syscall and supports SECCOMP_FILTER_FLAG_TSYNC, a seccomp filter can be installed from a Go program without special help from the runtime. For details on this flag: https://git.kernel.org/linus/c2e1f2e30daa551db3c670c0ccfeab20a540b9e1 seccomp-tsync support is available in kernels 3.17 and newer. It might also have been backported to the distribution/kernel you're using. One way to find out if your kernel supports this is to test for it (see below). A Go package for seccomp support was recently added to the ChromiumOS tree: https://chromium.googlesource.com/chromiumos/platform/go-seccomp/ The package includes a CheckSupport() function to check for seccomp-tsync support in the kernel. |
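If all you have is a kernel release string, the "3.17 or newer" rule can be approximated as below. This is only a heuristic sketch with made-up names (`mainlineHasTSYNC`, `leadingInt`). Because of backports, an older version number does not prove the feature is missing, so probing the kernel directly, as suggested above, remains the more reliable check.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// leadingInt parses the run of decimal digits at the start of s.
func leadingInt(s string) (int, bool) {
	end := 0
	for end < len(s) && s[end] >= '0' && s[end] <= '9' {
		end++
	}
	if end == 0 {
		return 0, false
	}
	n, err := strconv.Atoi(s[:end])
	return n, err == nil
}

// mainlineHasTSYNC (hypothetical) reports whether a release string such as
// "5.4.0-42-generic" names a mainline version of 3.17 or later.
// Unparseable strings are reported as false.
func mainlineHasTSYNC(release string) bool {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return false
	}
	major, okMajor := leadingInt(parts[0])
	minor, okMinor := leadingInt(parts[1])
	if !okMajor || !okMinor {
		return false
	}
	return major > 3 || (major == 3 && minor >= 17)
}

func main() {
	for _, rel := range []string{"3.16.7-generic", "3.17.0", "5.4.0-42-generic"} {
		fmt.Println(rel, mainlineHasTSYNC(rel))
	}
}
```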
There are a few packages that allow for setting seccomp filters already: the one from chromium linked above and https://github.com/seccomp/libseccomp-golang which is used by https://github.com/opencontainers/runc . Is this something that needs to be a part of the standard library? There was even a pure Go implementation of a seccomp package (https://github.com/docker/libcontainer/pull/613/files), but the problem with that is constantly having to stay in sync with libseccomp and the kernel. So I guess I'm wondering what the goal is here? Because I'm willing to help. |
@AndrewGMorgan perhaps this might interest you :) |
Yes, it seems to me that the syscall.AllThreadsSyscall*() functions added in: https://go-review.googlesource.com/c/go/+/210639/ should address this request too. I'd certainly like to help fix that patch if it doesn't. |
Tl;dr Go can support what this bug is requesting, we just have to write the sample code differently and adjust expectations for the way Go actually works. I've looked into this. Specifically, I looked at what is up with #3405 (comment) . That comment links to https://github.com/krasin/seccomp/blob/master/example/example.go which, as of the date of this bug update, does not work as described. It has some constant parameter problems on my The first one is Next, and I think this is the nub of the reason this bug report was filed, an important fact to keep in mind is that the Go runtime doesn't treat kernel threads with quite the elevated status that the sample code appears to assume. In general, trying to reason about the behavior of kernel threads in a Go program can tie you up in knots... The Go runtime collectively does everything it can to keep its pool of kernel threads busy making progress on the unblocked code sequences in the program. In the patched code (which dates from before #1435 was fixed), there is some effort to pin a thread down with In the spirit of the above explanation, in general, if a thread (unexpectedly and) silently dies, the collective runtime assumes that thread is actually still working on something important and waits for it. This is the "hang" the above source code refers to. All the other threads are waiting for the SECCOMP killed thread to complete So, given that explanation, what to do? The first thing is to give up any notion that the runtime cares about individual kernel threads. The code uses
#3405 (comment) introduces the flag So, we can rewrite the above example with this in mind... As a bonus, now that #1435 is fixed, it is possible to simultaneously PR_SET_NO_NEW_PRIVS on all of the Go runtime threads, with For similar all threads syscall support, that works with earlier go versions, via cgo, you might also like to try: https://pkg.go.dev/kernel.org/pub/linux/libs/security/libcap/psx . The long and the short of it being the following rewritten example:
|
FWIW you can get the program to crash in a more informative way by adding this code:
adding the following to the list of accepted system calls:
and replacing This will cause the program to abort with a crash dump pointing to the offending code. Something like this:
Where 0x101 = 257 and
|
Digging deeper still, there is actually no need to use "psx" at all with seccomp when using the
Given all the above detail, I believe this issue should be closed. I'm not sure there is anything needed in the golang sources since the above-mentioned change to the Linux kernel. |
This addresses golandlock issue #5. The same issue has been previously discussed in the context of `seccomp(2)` at golang/go#3405. Both `prctl()` and `landlock_restrict_self()` need to be invoked with `syscall.AllThreadsSyscall()`, so that their thread-local effects get applied to all OS threads managed by the Go runtime.
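As a rough illustration of the all-threads pattern, the sketch below sets PR_SET_NO_NEW_PRIVS on every runtime thread with syscall.AllThreadsSyscall6 and then reads the flag back on the calling thread. It is Linux-only, the helper names are made up, and the prctl constants are copied from <linux/prctl.h>. In programs linked with cgo, the AllThreadsSyscall functions return ENOTSUP, which the sketch simply reports as an error.

```go
package main

import (
	"fmt"
	"syscall"
)

// Values from <linux/prctl.h>.
const (
	prSetNoNewPrivs = 38 // PR_SET_NO_NEW_PRIVS
	prGetNoNewPrivs = 39 // PR_GET_NO_NEW_PRIVS
)

// setNoNewPrivsAllThreads (hypothetical helper) runs the prctl on every OS
// thread managed by the Go runtime. The unused prctl arguments must be zero.
func setNoNewPrivsAllThreads() error {
	_, _, errno := syscall.AllThreadsSyscall6(syscall.SYS_PRCTL, prSetNoNewPrivs, 1, 0, 0, 0, 0)
	if errno != 0 {
		return errno
	}
	return nil
}

// noNewPrivs (hypothetical helper) reads the flag back on the calling thread.
func noNewPrivs() (bool, error) {
	r, _, errno := syscall.RawSyscall6(syscall.SYS_PRCTL, prGetNoNewPrivs, 0, 0, 0, 0, 0)
	if errno != 0 {
		return false, errno
	}
	return r == 1, nil
}

func main() {
	if err := setNoNewPrivsAllThreads(); err != nil {
		fmt.Println("could not set no_new_privs on all threads:", err)
		return
	}
	on, err := noNewPrivs()
	fmt.Println("no_new_privs on this thread:", on, err)
}
```

Note that no_new_privs cannot be cleared once set, so a program should only do this when it really intends to drop that ability for the rest of its lifetime.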