syscall: support 'mode 2 seccomp' on Linux #3405

krasin · 2012-03-27T03:14:12Z

Ubuntu 12.04 LTS comes with "mode 2 seccomp" and the mainline kernel is
currently in the process of accepting seccomp patches.

In short, "mode 2 seccomp" adds an ability to apply syscall filters to the
current process.

A good tutorial is http://outflux.net/teach-seccomp/
I have tested it with the daily build of Ubuntu 12.04 LTS,
$ uname -a
Linux krasin-seccomp 3.2.0-20-generic #32-Ubuntu SMP Thu Mar 22 02:22:46 UTC 2012 x86_64
x86_64 x86_64 GNU/Linux

I understand that this feature is not a top priority, but once Go 1 is out, it
would probably be time to add this support.

rsc · 2012-03-27T11:19:31Z

Comment 1:

What does this mean?  Are you asking for a system call or two?

Labels changed: added priority-later, removed priority-triage.

Status changed to WaitingForReply.

krasin · 2012-03-28T00:23:59Z

Comment 2:

Hi Russ,
first of all, here is the example of using 'mode 2 seccomp' from Go:
https://github.com/krasin/seccomp/blob/master/example/example.go
It has been tested on x86-64 Ubuntu 12.04, but is not guaranteed to work on 32-bit and
will not work on other / older kernels. The good news is that 'mode 2 seccomp' is going to
be included in the Linux 3.5 kernel. See https://lkml.org/lkml/2012/3/25/81
Currently, the example only half works. seccomp applies policies to the current
thread, not to the process or all of its threads. This means that I had to call
runtime.LockOSThread() to avoid switching the system thread while executing the
goroutine, and that only that thread is killed, which makes the whole program hang
(the Go runtime does not handle the situation where one of its threads has received SIGSYS).
To get 'mode 2 seccomp' to work properly with Go, the runtime should make it possible to
apply the policy to all system threads at once and to disable thread
spawning by the runtime (since clone(2) will likely be disabled by any self-respecting
seccomp policy).

krasin · 2012-03-28T00:30:27Z

Comment 3:

The output I see on Ubuntu 12.04:
$ go run example.go 
Applying syscall policy...
And now, let's make a 'bad' syscall
Note: due to lack of seccomp support from the Go runtime,the example will stuck instead
of crashing. Use Ctrl+C to exit.

krasin · 2012-03-28T00:32:29Z

Comment 4:

The output on Ubuntu 11.10 (which does not support 'mode 2 seccomp'):
$ go get github.com/krasin/seccomp/example
$ example
Applying syscall policy...
2012/03/28 00:31:45 Prctl(PR_SET_NO_NEW_PRIVS): invalid argument

bradfitz · 2012-03-28T01:13:15Z

Comment 5:

There have been a dozen "seccomp v2" proposals over the past many years.
Until one of them is finally upstream (and not in a vendor kernel) and trickles down to
a few notable distros, *then* this gets more interesting.

krasin · 2012-03-28T04:21:44Z

Comment 6:

ok, makes sense. Anyway, it does not make sense to do anything before the Ubuntu 12.04
release, and I have a strong feeling that Will Drewry's 'mode 2 seccomp' will be integrated
into the mainline -next branch by that time.

rsc · 2012-03-28T11:50:36Z

Comment 7:

Labels changed: added priority-someday, removed priority-later.

Status changed to LongTerm.

gopherbot · 2012-03-30T17:23:51Z

Comment 8 by ColinTrexob:

Simply use the command flag --enable-seccomp for what is effectively the mode 2. What
mode 2 adds is the ability to do more than just read, write, end - this is already
what's supported by the Chrome seccomp sandbox.

krasin · 2012-03-30T17:37:13Z

Comment 9:

Hi Colin,
I'm not sure I understand your suggestion. Would you like to elaborate?

gopherbot · 2012-03-31T02:38:11Z

Comment 10 by ColinTrexob:

To elaborate: I had a code.google.com for Chromium (filing a bug) and got very very
confused lol

krasin · 2012-07-15T09:39:46Z

Comment 11:

Seccomp mode 2 is not in the mainline kernel:
https://github.com/torvalds/linux/blob/master/kernel/seccomp.c

krasin · 2012-07-15T09:40:09Z

Comment 12:

Seccomp mode 2 is now in the mainline kernel:
https://github.com/torvalds/linux/blob/master/kernel/seccomp.c

krasin · 2012-07-22T03:28:06Z

Comment 13:

And, finally, Linux 3.5 is released: http://kernelnewbies.org/Linux_3.5
From the Release Notes:
1.3. Seccomp-based system call filtering
Seccomp (alias for "secure computing") is a simple sandboxing mechanism added back in
2.6.12 that allows to transition to a state where it cannot make any system calls except
a very restricted set (exit, sigreturn, read and write to already open file
descriptors). Seccomp has now been extended: instead of a fixed and very limited set of
system calls, seccomp has evolved into a filtering mechanism that allows processes to
specify an arbitrary filter of system calls (expressed as a Berkeley Packet Filter
program) that should be forbidden. This can be used to implement different types of
security mechanisms; for example, the Linux port of the Chromium web browser supports
this feature to run plugins in a sandbox.
The systemd init daemon has added support for this feature. A Unit file can use the
SystemCallFilter to specify a list with the syscalls that will be allowed to run, any
other syscall will not be allowed:
 [Service]
 ExecStart=/bin/echo "I am in a sandbox"
 SystemCallFilter=brk mmap access open fstat close read fstat mprotect arch_prctl munmap write
Recommended links: Documentation and Samples.
Recommended LWN article: Yet another new approach to seccomp
Code: (commit 1, 2, 3, 4, 5)

alberts · 2012-07-22T06:19:54Z

Comment 14:

As with unshare, it might only make sense to call this system call before the Go runtime
starts making threads, which is what systemd is very good at.

krasin · 2012-07-22T08:30:00Z

Comment 15:

why not at any point later?

krasin · 2012-07-22T08:41:35Z

Comment 16:

For example, if I want to apply the sandbox after the initialization has been completed,
but before I have started to handle the untrusted data or code, it perfectly makes
sense. Go runtime should support two things in order to make it happen:
1. Make it possible to apply the rule to all threads (even if not immediately, but
provide a blocking call)
2. Exit the program if any thread was killed (for example, because of sandbox violation)

alberts · 2012-07-22T15:36:04Z

Comment 17:

It's not impossible, it's just harder. I don't speak for the Go team, but I'm guessing:
"patches welcome".

rsc · 2013-12-04T01:51:54Z

Comment 18:

Labels changed: added repo-main.

rsc · 2014-03-03T20:46:13Z

Comment 19:

Adding Release=None to all Priority=Someday bugs.

Labels changed: added release-none.

gopherbot · 2014-09-08T07:17:08Z

Comment 20 by rahul.chaudhry:

Hi All,
I did some research on this issue. Here's a doc with some of my thoughts and a rough
proposal:
https://docs.google.com/document/d/12jRIrlFYKe3EyBLgtrkZelC-aGnLXnsHzx0ZVDF75Go/pub
Feedback welcome. Let me know if the basic approach seems workable and acceptable from
the Go runtime point of view.

ianlancetaylor · 2014-09-08T15:41:16Z

Comment 21:

Thanks for looking at this.
There are two different issues to discuss: the API, and the implementation.  The API
doesn't belong in the runtime package.  In the old days it would have been put in the
syscall package, but since we've frozen the syscall package it should now go into the
go.sys package.  It does sound like the implementation needs to involve the runtime
package, but that should happen behind the scenes--programs should call something in
go.sys, not something in runtime.
In go.sys I think it's fine to have an interface that is specific to the Linux kernel. 
We already have examples of that in syscall--e.g., the Cloneflags field in SysProcAttr. 
I don't think we should aim for a system-independent interface to system call blocking;
I think there will be too much variance between systems.
The case of SYS_CLONE is an interesting one.  I don't think the Go runtime can function
without the ability to clone a new thread.  You suggest creating some new threads ahead
of time, but it would not be hard for a program to run out of threads.  Every blocking
file access call uses up a thread.  That is independent of GOMAXPROCS, which controls
the number of running threads, not the total number of threads.  I think that at least
initially we should fail the seccomp call if SYS_CLONE is not permitted.  We can see if
anybody sees this as a problem--it allows an evil program to fork-bomb the system but it
doesn't open any other holes that I can see.
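
A minimal sketch of the "fail the seccomp call if SYS_CLONE is not permitted" idea, assuming the policy is expressed as a plain allow-list of syscall numbers. The names checkPolicy and errCloneBlocked are hypothetical and not part of any proposed API; the syscall numbers are the linux/x86_64 ones, hard-coded so the sketch builds anywhere.

```go
package main

import (
	"errors"
	"fmt"
)

// sysClone is the clone(2) syscall number on linux/x86_64.
const sysClone = 56

// errCloneBlocked is a hypothetical error value used only in this sketch.
var errCloneBlocked = errors.New("seccomp: policy must allow clone")

// checkPolicy rejects an allow-list that omits clone(2), because the Go
// runtime needs to be able to start new threads after the filter is loaded.
func checkPolicy(allowed []uint32) error {
	for _, nr := range allowed {
		if nr == sysClone {
			return nil
		}
	}
	return errCloneBlocked
}

func main() {
	// read(0), write(1), exit(60): no clone, so the policy is refused.
	fmt.Println(checkPolicy([]uint32{0, 1, 60}))
	// Same list with clone(56) added: accepted.
	fmt.Println(checkPolicy([]uint32{0, 1, 56, 60}))
}
```

A real implementation would run this kind of check before handing the filter to the kernel, so that a policy the runtime cannot live with is refused up front instead of killing the program later.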

gopherbot · 2014-09-09T19:20:57Z

Comment 22 by rahul.chaudhry:

Thanks for your comments.
Your API recommendations make this much simpler. No exported API from package runtime,
and a linux (and seccomp) specific API in go.sys package. I'll update the doc to reflect
these changes.
I also agree that we should start with an implementation that requires SYS_CLONE to be
whitelisted. This will make the runtime support for the implementation much simpler as
well.
However, for the longer term, and from sandboxing point of view, we should still
consider the option of working with a strictly limited number of threads in the runtime.
Quoting from the reporter of this issue above: "clone(2) will likely be disabled by any
self-respecting seccomp policy". Moreover, read() and write() will probably need to be
whitelisted by most seccomp policies. If excessive file IO from a sandboxed program may
result in the creation of an arbitrarily large number of system threads, it is not a
very effective scheme for sandboxing.
I was under the impression that the runtime already supports this mode (simply block on
blocking system calls if no more threads can be created). Now I see that it causes the
program to crash if the number of threads exceeds the threshold set by
runtime/debug/SetMaxThreads().
For my curiosity, is the model of working with a strictly limited number of threads
fundamentally incompatible with the Go semantics? My understanding is that it is
possible to have a fully compliant Go compiler and runtime system that does not use any
multi-threading capabilities of the operating system it is running on (and does not have
to resort to interpreting either). In other words, using a different system thread for
every blocking system call is an optimization in the runtime, and not a requirement for
a correct Go implementation. Please correct me if I'm wrong.

ianlancetaylor · 2014-09-09T20:45:57Z

Comment 23:

Conceptually, you are right: the number of threads used by a Go program is purely an
implementation detail, and it should be possible in principle to write a Go
implementation that uses a fixed number of threads.
This can even be done in practice.  When a Go program is about to enter a system call,
or call a C/C++ function, we can check whether there are more threads available to run
goroutines.  If there are not, we can suspend the goroutine until there are threads
available.
That approach would work for most programs.  The problem is that there are valid Go
programs that would work normally, but would fail if this policy were adopted, simply
because the other suspended threads might be unblocked by the C function that would have
been called by the goroutine that the runtime suspended.  The Go programmer has to
change from a model in which goroutines can be created casually and can run any
operation to a different model in which all potentially blocking calls have to be
counted, with semaphores or other mechanisms to prevent a deadlock of blocking calls. 
Perhaps this is acceptable for people who use seccomp, but it hardly seems desirable. 
And this is not something that can be implemented only in the seccomp support code.  It
goes against the grain to modify the Go runtime to support a more complex programming
model.
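
To make the "count all potentially blocking calls" model concrete, here is a rough sketch of what a program would have to do by hand: route every blocking call through a counting semaphore. The limiter type and peakConcurrency helper are hypothetical, and time.Sleep stands in for a blocking system call.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// limiter is a counting semaphore built on a buffered channel.
type limiter chan struct{}

// do runs f while holding one of the limiter's slots.
func (l limiter) do(f func()) {
	l <- struct{}{}
	defer func() { <-l }()
	f()
}

// peakConcurrency starts n goroutines that each make one simulated
// blocking call through a limiter with the given number of slots, and
// returns the largest number of calls that were in flight at once.
func peakConcurrency(n, slots int) int32 {
	lim := make(limiter, slots)
	var inFlight, peak int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			lim.do(func() {
				cur := atomic.AddInt32(&inFlight, 1)
				// Record a new peak if this call raised it.
				for {
					old := atomic.LoadInt32(&peak)
					if cur <= old || atomic.CompareAndSwapInt32(&peak, old, cur) {
						break
					}
				}
				time.Sleep(time.Millisecond) // stand-in for a blocking syscall
				atomic.AddInt32(&inFlight, -1)
			})
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	fmt.Println("peak in-flight calls:", peakConcurrency(20, 3))
}
```

The peak never exceeds the number of slots, which is the property a fixed thread budget would need. The cost is the one described above: every blocking call site has to opt in, and a call that can only be unblocked by another queued call still deadlocks.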

gopherbot · 2014-09-18T06:57:02Z

Comment 24 by rahul.chaudhry:

Ah.. I see. I can have a program with two goroutines that communicate using a system
pipe (horror). This program would end up in a deadlock if restricted to a single thread.
The read/write system calls on the pipe have to be made simultaneously from different
threads for the goroutines to make progress.
I have updated the draft based on the feedback above. The new version is here:
https://docs.google.com/document/d/1nh1hub2wJdYzoLVUkCPS7MoUIFRZRP5PbrtOO0WajII/pub
Please take another look.
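
For illustration, the pipe scenario described above looks roughly like this. It is only a sketch: roundTrip is a hypothetical helper, and on current Go toolchains pipe I/O may go through the runtime poller rather than tying up a thread, so the program shows the shape of the dependency, not a guaranteed deadlock.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// roundTrip sends msg through an OS pipe from one goroutine and reads
// it back in the caller. The read cannot finish until the writer runs.
func roundTrip(msg string) (string, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer r.Close()

	errc := make(chan error, 1)
	go func() {
		_, werr := io.WriteString(w, msg)
		if cerr := w.Close(); werr == nil {
			werr = cerr
		}
		errc <- werr
	}()

	data, err := io.ReadAll(r) // returns once the writer closes its end
	if err != nil {
		return "", err
	}
	if err := <-errc; err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	got, err := roundTrip("ping")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

If the read were a plain blocking system call and the runtime had only one thread to offer, the writer goroutine would never get to run and the read would never return.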

ianlancetaylor · 2014-09-19T15:17:06Z

Comment 25:

I'm sure some details will arise, but this plan looks basically sound to me.  Thanks.

shawnl · 2015-03-22T17:11:46Z

SecComp can only be called once.

Linux allows seccomp mode 2 to be called multiple times, with each filter stacked. This restriction is unnecessary.
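
As a rough model of what stacking means: the kernel runs every installed filter and keeps the result with the highest precedence, so a later filter can tighten a policy but not loosen it. The sketch below simulates that selection in user space. The combine and actionOnly names are made up for this example, and the signed comparison is my reading of how the kernel orders the actions.

```go
package main

import "fmt"

// seccomp return actions, as defined in the kernel's seccomp UAPI header.
const (
	retKillProcess = 0x80000000
	retKillThread  = 0x00000000
	retTrap        = 0x00030000
	retErrno       = 0x00050000
	retAllow       = 0x7fff0000
	retActionFull  = 0xffff0000
)

// actionOnly extracts the action bits as a signed value, so that
// retKillProcess (top bit set) sorts below every other action.
func actionOnly(ret uint32) int32 { return int32(ret & retActionFull) }

// combine returns the result that wins when each stacked filter has
// produced one of the given return values: the lowest action wins.
func combine(results ...uint32) uint32 {
	ret := uint32(retAllow)
	for _, cur := range results {
		if actionOnly(cur) < actionOnly(ret) {
			ret = cur
		}
	}
	return ret
}

func main() {
	// One filter allows, a second returns ERRNO with errno 1: ERRNO wins.
	fmt.Printf("%#x\n", combine(retAllow, retErrno|1)) // 0x50001
	// KILL_PROCESS beats everything else.
	fmt.Printf("%#x\n", combine(retTrap, retKillProcess, retAllow)) // 0x80000000
}
```

Under this model, calling the install routine a second time is safe from a security point of view, which supports lifting the "only once" restriction.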

shawnl · 2015-03-22T17:21:59Z

This is basically the plan for bug #1435 too.

rahulchaudhry · 2015-06-05T20:11:08Z

If your kernel has the new seccomp syscall and supports SECCOMP_FILTER_FLAG_TSYNC, a seccomp filter can be installed from a Go program without special help from the runtime.

For details on this flag: https://git.kernel.org/linus/c2e1f2e30daa551db3c670c0ccfeab20a540b9e1

seccomp-tsync support is available in 3.17 or newer kernels. It might also have been backported to the distribution/kernel you're using. One way to find out if your kernel supports this is to test for it (see below).

A Go package for seccomp support was added recently to ChromiumOS tree: https://chromium.googlesource.com/chromiumos/platform/go-seccomp/

The package includes a CheckSupport() function to check for seccomp-tsync support in the kernel.
See the main package file for details: https://chromium.googlesource.com/chromiumos/platform/go-seccomp/+/master/src/chromiumos/seccomp/seccomp.go

jessfraz · 2016-06-22T22:40:40Z

There are a few packages that allow for setting seccomp filters already, the one from chromium linked above and https://github.com/seccomp/libseccomp-golang which is used by https://github.com/opencontainers/runc . Is this something that needs to be a part of the standard library? There was even a pure go implementation of a seccomp package (https://github.com/docker/libcontainer/pull/613/files), but the problem with that is constantly having to stay in sync with libseccomp and the kernel. So I guess I'm wondering what the goal is here? Because I'm willing to help.

odeke-em · 2020-06-06T20:51:20Z

@AndrewGMorgan perhaps this might interest you :)

AndrewGMorgan · 2020-06-07T02:52:22Z

Yes, it seems to me that the syscall.AllThreadsSyscall*() functions added in:

https://go-review.googlesource.com/c/go/+/210639/

should address this request too. I'd certainly like to help fix that patch if it doesn't.

AndrewGMorgan · 2020-12-24T09:11:23Z

Tl;dr Go can support what this bug is requesting, we just have to write the sample code differently and adjust expectations for the way Go actually works.

I've looked into this. Specifically, I looked at what is up with #3405 (comment) . That comment links to https://github.com/krasin/seccomp/blob/master/example/example.go which, as of the date of this bug update, does not work as described. It has some constant parameter problems on my x86_64 machine.

The first is that PR_SET_NO_NEW_PRIVS should be 38 (the code at the GitHub link defines it as 36), which causes the PR_SET_SECCOMP prctl to fail outright since privilege has not been limited. So, for the rest of the discussion, let's assume that is fixed (I've included a full patch below) and move on.

Next, and I think this is the nub of the reason this bug report was filed, an important fact to keep in mind is that the Go runtime doesn't treat kernel threads with quite the elevated status that the sample code appears to assume. In general, trying to reason about the behavior of kernel threads in a Go program can tie you up in knots... The Go runtime collectively does everything it can to keep its pool of kernel threads busy making progress on the unblocked code sequences in the program.

In the patched code (which dates from before #1435 was fixed), there is some effort to pin a thread down with runtime.LockOSThread(). This pinned thread loads the seccomp filter rule, then executes something that is illegal (i.e., forbidden by the rule), and the kernel quietly terminates that pinned thread.

In the spirit of the above explanation, in general, if a thread (unexpectedly and) silently dies, the collective runtime assumes that thread is actually still working on something important and waits for it. This is the "hang" the above source code refers to. All the other threads are waiting for the SECCOMP killed thread to complete main.main() so the program can exit. Since that thread is not making any progress because it is dead, the rest of the runtime threads patiently wait forever. This is the hang...

So, given that explanation, what to do? The first thing is to give up any notion that the runtime cares about individual kernel threads. The code uses SECCOMP_RET_KILL = 0 to signal its displeasure at the individual thread breaking its rule (the kernel sources refer to this as SECCOMP_RET_KILL_THREAD) and it only terminates the offending thread. This does not play well with the Go runtime - it leads to a hang. A better return code, the only one really suitable for Go's runtime model, is: SECCOMP_RET_KILL_PROCESS = 0x80000000, which terminates the whole program. So, here is a patch to get the sample to work as I think was intended:

$ git diff
diff --git a/example/example.go b/example/example.go
index 47d0231..98a2784 100644
--- a/example/example.go
+++ b/example/example.go
@@ -33,11 +33,11 @@ func Prctl(option int, arg2, arg3, arg4, arg5 uint64) (err error) {
 const (
        PR_GET_NAME         = 16
        PR_SET_SECCOMP      = 22
-       PR_SET_NO_NEW_PRIVS = 36
+       PR_SET_NO_NEW_PRIVS = 38
 
-       SECCOMP_MODE_FILTER = 2 /* uses user-supplied filter. */
-       SECCOMP_RET_KILL    = 0 /* kill the task immediately */
-       SECCOMP_RET_ALLOW   = 0x7fff0000
+       SECCOMP_MODE_FILTER      = 2          /* uses user-supplied filter. */
+       SECCOMP_RET_KILL_PROCESS = 0x80000000 /* kill the whole process immediately */
+       SECCOMP_RET_ALLOW        = 0x7fff0000
 
        BPF_LD  = 0x00
        BPF_JMP = 0x05
@@ -77,7 +77,7 @@ func ValidateArchitecture() []SockFilter {
        return []SockFilter{
                BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 4), // HACK: I don't understand this 4.
                BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ARCH_NR, 1, 0),
-               BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
+               BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL_PROCESS),
        }
 }
 
@@ -96,7 +96,7 @@ func AllowSyscall(syscallNum uint32) []SockFilter {
 
 func KillProcess() []SockFilter {
        return []SockFilter{
-               BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
+               BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL_PROCESS),
        }
 }
 
@@ -143,14 +143,9 @@ func main() {
        }
 
        fmt.Printf("And now, let's make a 'bad' syscall\n")
-       fmt.Printf("Note: due to lack of seccomp support from the Go runtime," +
-               "the example will stuck instead of crashing. Use Ctrl+C to exit.\n")
+       fmt.Println("...which will crash the program...")
        _, _ = os.Open("nonexistent_file")
 
-       // Actually, the line below will never be printed.
-       // The quirk is that instead of crashing the whole process,
-       // the system kills just the thread that has violated the policy.
-       // Currently, it means that the Go runtime thread gets stuck and
-       // you will have to Ctrl+C to exit from this example.
+       // The line below will never be printed.
        fmt.Printf("How come, I'm alive?\n")
 }

#3405 (comment) introduces the flag SECCOMP_FILTER_FLAG_TSYNC, which is the right way to install a filter for a Go program, but this flag is not supported through the prctl SECCOMP ABI.

So, we can rewrite the above example with this in mind...

As a bonus, now that #1435 is fixed, it is possible to set PR_SET_NO_NEW_PRIVS on all of the Go runtime threads simultaneously with syscall.AllThreadsSyscall6(), when compiling the code with CGO_ENABLED=0. Combined, this makes it possible to remove the runtime.LockOSThread() call, since the new syscall.AllThreadsSyscall() support mirrors the prctl call on all of the known runtime kernel threads. That new linux-specific syscall support requires a go1.16 compiler toolchain (currently in pre-release state).

For similar all threads syscall support, that works with earlier go versions, via cgo, you might also like to try: https://pkg.go.dev/kernel.org/pub/linux/libs/security/libcap/psx . The long and the short of it being the following rewritten example:

// Program example is a refactoring of the code referenced in:
//
//   https://github.com/golang/go/issues/3405#issuecomment-66065503
//
// The main goal of the refactoring being to remove the need for
// locking an OS thread, and extending the applicability of the
// filtering to all of the Go program threads. This code uses
// constants for linux-x86_64 architectures only.
//
// Supported Go toolchains are after go1.10. Those prior to go1.15
// require this environment variable to be set to build successfully:
//
//   export CGO_LDFLAGS_ALLOW="-Wl,-?-wrap[=,][^-.@][^,]*"
//
// Go toolchains go1.16+ can be compiled CGO_ENABLED=0 too,
// demonstrating native nocgo support for seccomp features.
package main

import (
	"fmt"
	"log"
	"os"
	"syscall"
	"unsafe"

	"kernel.org/pub/linux/libs/security/libcap/psx"
)

const (
	PR_SET_NO_NEW_PRIVS = 38

	SYS_SECCOMP               = 317        // x86_64 syscall number
	SECCOMP_SET_MODE_FILTER   = 1          // uses user-supplied filter.
	SECCOMP_FILTER_FLAG_TSYNC = (1 << 0)   // mirror filtering on all threads.
	SECCOMP_RET_KILL_PROCESS  = 0x80000000 // kill the whole process immediately
	SECCOMP_RET_ALLOW         = 0x7fff0000

	BPF_LD  = 0x00
	BPF_JMP = 0x05
	BPF_RET = 0x06

	BPF_W = 0x00

	BPF_ABS = 0x20
	BPF_JEQ = 0x10

	BPF_K = 0x00

	AUDIT_ARCH_X86_64 = 3221225534 // 0xC000003E: EM_X86_64 (62) | __AUDIT_ARCH_64BIT | __AUDIT_ARCH_LE
	ARCH_NR           = AUDIT_ARCH_X86_64

	syscall_nr = 0
)

// SockFilter is a single filter block.
type SockFilter struct {
	// Code is the filter code instruction.
	Code uint16
	// Jt is the target for a true result from the code execution.
	Jt uint8
	// Jf is the target for a false result from the code execution.
	Jf uint8
	// K is a generic multiuse field
	K uint32
}

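// SockFProg is a BPF filter program: a length plus a pointer to its
// first instruction, mirroring the kernel's struct sock_fprog.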
type SockFProg struct {
	// Len is the number of contiguous SockFilter blocks that can
	// be found at *Filter.
	Len uint16
	// Filter is the address of the first SockFilter block of a
	// program sequence.
	Filter *SockFilter
}

type SockFilterSlice []SockFilter

func BPF_STMT(code uint16, k uint32) SockFilter {
	return SockFilter{code, 0, 0, k}
}

func BPF_JUMP(code uint16, k uint32, jt uint8, jf uint8) SockFilter {
	return SockFilter{code, jt, jf, k}
}

func ValidateArchitecture() []SockFilter {
	return []SockFilter{
		BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 4), // 4 is the offset of the arch field in struct seccomp_data.
		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ARCH_NR, 1, 0),
		BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL_PROCESS),
	}
}

func ExamineSyscall() []SockFilter {
	return []SockFilter{
		BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_nr),
	}
}

func AllowSyscall(syscallNum uint32) []SockFilter {
	return []SockFilter{
		BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, syscallNum, 0, 1),
		BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
	}
}

func KillProcess() []SockFilter {
	return []SockFilter{
		BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL_PROCESS),
	}
}

// prctl is our wrapper to perform a prctl function syscall
// simultaneously on all linux threads.
//go:uintptrescapes
func prctl(option, arg1, arg2, arg3, arg4, arg5 uintptr) error {
	_, _, e := psx.Syscall6(syscall.SYS_PRCTL, option, arg1, arg2, arg3, arg4, arg5)
	if e != 0 {
		return e
	}
	return nil
}

// seccomp_set_mode_filter is our wrapper for performing our seccomp system call.
//go:uintptrescapes
func seccomp_set_mode_filter(prog *SockFProg) error {
	if _, _, e := syscall.RawSyscall(SYS_SECCOMP, SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, uintptr(unsafe.Pointer(prog))); e != 0 {
		return e
	}
	return nil
}

func main() {
	var filter []SockFilter
	filter = append(filter, ValidateArchitecture()...)

	// Grab the system call number.
	filter = append(filter, ExamineSyscall()...)

	// List allowed syscalls.
	for _, x := range []uint32{
		syscall.SYS_EXIT_GROUP,
		syscall.SYS_EXIT,
		syscall.SYS_MMAP,
		syscall.SYS_READ,
		syscall.SYS_WRITE,
		syscall.SYS_GETTIMEOFDAY,
		syscall.SYS_FUTEX,
		syscall.SYS_SIGALTSTACK,
		syscall.SYS_RT_SIGPROCMASK,
		syscall.SYS_SCHED_YIELD,
		syscall.SYS_GETPID,
		syscall.SYS_TGKILL,
	} {
		filter = append(filter, AllowSyscall(x)...)
	}

	filter = append(filter, KillProcess()...)

	prog := &SockFProg{
		Len:    uint16(len(filter)),
		Filter: &filter[0],
	}

	// This is required to load a filter without privilege.
	if err := prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0, 0); err != nil {
		log.Fatalf("Prctl(PR_SET_NO_NEW_PRIVS): %v", err)
	}

	fmt.Println("Applying syscall policy...")
	if err := seccomp_set_mode_filter(prog); err != nil {
		log.Fatalf("seccomp_set_mode_filter: %v", err)
	}
	fmt.Println("...Policy applied")

	fmt.Println("Let's make a 'bad' syscall - to crash the program")
	_, _ = os.Open("nonexistent_file")

	log.Fatal("This line should not be reached")
}
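
To see what the filter built above actually does, here is a small user-space sketch that evaluates the same instruction sequence against a (syscall number, architecture) pair. It only models the three opcodes the example uses, and the insn, policy, and run names are hypothetical. The loads at offsets 0 and 4 correspond to the nr and arch fields at the start of struct seccomp_data.

```go
package main

import "fmt"

// insn mirrors the SockFilter layout used in the program above.
type insn struct {
	Code   uint16
	Jt, Jf uint8
	K      uint32
}

const (
	opLoadAbs = 0x00 + 0x00 + 0x20 // BPF_LD+BPF_W+BPF_ABS
	opJeq     = 0x05 + 0x10 + 0x00 // BPF_JMP+BPF_JEQ+BPF_K
	opRet     = 0x06 + 0x00        // BPF_RET+BPF_K

	retAllow       = 0x7fff0000
	retKillProcess = 0x80000000
	archX8664      = 0xC000003E // AUDIT_ARCH_X86_64
)

// policy builds the same sequence as the example: check the
// architecture, load the syscall number, allow listed calls, else kill.
func policy(allowed ...uint32) []insn {
	prog := []insn{
		{opLoadAbs, 0, 0, 4},          // load seccomp_data.arch
		{opJeq, 1, 0, archX8664},      // matching arch: skip the kill
		{opRet, 0, 0, retKillProcess}, // wrong arch
		{opLoadAbs, 0, 0, 0},          // load seccomp_data.nr
	}
	for _, nr := range allowed {
		prog = append(prog, insn{opJeq, 0, 1, nr}, insn{opRet, 0, 0, retAllow})
	}
	return append(prog, insn{opRet, 0, 0, retKillProcess})
}

// run interprets prog for one system call. Jump offsets are relative
// to the instruction after the jump, as in classic BPF.
func run(prog []insn, nr, arch uint32) uint32 {
	var acc uint32
	for pc := 0; pc < len(prog); pc++ {
		in := prog[pc]
		switch in.Code {
		case opLoadAbs:
			if in.K == 4 {
				acc = arch
			} else {
				acc = nr
			}
		case opJeq:
			if acc == in.K {
				pc += int(in.Jt)
			} else {
				pc += int(in.Jf)
			}
		case opRet:
			return in.K
		}
	}
	return retKillProcess // sketch choice: fail closed if no RET is reached
}

func main() {
	prog := policy(0, 1, 60) // read, write, exit on x86_64
	fmt.Printf("write:      %#x\n", run(prog, 1, archX8664))   // 0x7fff0000
	fmt.Printf("openat:     %#x\n", run(prog, 257, archX8664)) // 0x80000000
	fmt.Printf("wrong arch: %#x\n", run(prog, 1, 0x40000003))  // 0x80000000
}
```

Walking a system call through run shows why os.Open ends the program in the example: openat (257) matches none of the allow pairs, so evaluation falls through to the final kill instruction.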

AndrewGMorgan · 2020-12-30T05:05:00Z

FWIW you can get the program to crash in a more informative way by adding this code:

const SECCOMP_RET_TRAP = 0x00030000 // disallow and force a SIGSYS

func NotifyProcessAndDie() []SockFilter {
	return []SockFilter{
		BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_TRAP),
	}
}

adding the following to the list of accepted system calls:

		syscall.SYS_NANOSLEEP,
		syscall.SYS_RT_SIGRETURN,

and replacing filter = append(filter, KillProcess()...) with filter = append(filter, NotifyProcessAndDie()...).

This will cause the program to abort with a crash dump pointing to the offending code. Something like this:

Let's make a 'bad' syscall - to crash the program
SIGSYS: bad system call
PC=0x48da8a m=0 sigcode=1

goroutine 1 [syscall]:
syscall.Syscall6(0x101, 0xffffffffffffff9c, 0xc00001a180, 0x80000, 0x0, 0x0, 0x0, 0xc0000640c0, 0xc000020100, 0x474d65)
[...]

Where 0x101 = 257 and

$ grep 257 /usr/include/asm/unistd_64.h 
#define __NR_openat 257

AndrewGMorgan · 2021-01-04T03:29:49Z

Digging deeper still, there is actually no need to use "psx" at all with seccomp when using the SECCOMP_FILTER_FLAG_TSYNC option. The kernel appears to force the sharing of the no-new-privs bit when applying the bpf filter program if the thread that attempts it has its own no new privs bit set. That is, since 2015-12-26, kernel/seccomp.c:seccomp_sync_threads():

		/*
		 * Don't let an unprivileged task work around
		 * the no_new_privs restriction by creating
		 * a thread that sets it up, enters seccomp,
		 * then dies.
		 */
		if (task_no_new_privs(caller))
			task_set_no_new_privs(thread);

Given all the above detail, I believe this issue should be closed. I'm not sure there is anything needed in the golang sources since the above dated change to the linux kernel.

This addresses golandlock issue #5. The same issue has been previously discussed in the context of `seccomp(2)` at golang/go#3405. Both `prctl()` and `landlock_restrict_self()` need to be invoked with `syscall.AllThreadsSyscall()`, so that their thread-local effects get applied to all OS threads managed by the Go runtime.

krasin added longterm labels Sep 19, 2014

rsc added this to the Unplanned milestone Apr 10, 2015

rsc removed priority-someday labels Apr 10, 2015

ianlancetaylor removed the LongTerm label May 17, 2018

rumpelsepp mentioned this issue Dec 24, 2018

Add pledge(2) support (OpenBSD only) via a 'protector' package. junegunn/fzf#1297

Merged

gnoack mentioned this issue Jul 24, 2021

RestrictPaths needs to apply to all OS threads and Goroutines at once landlock-lsm/go-landlock#5

Closed

seankhliao added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 13, 2024
