cmd/link: panic: operation not permitted #41356

Open
fogfish opened this issue Sep 12, 2020 · 23 comments

@fogfish fogfish commented Sep 12, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15.2 linux/amd64

Does this issue reproduce with the latest release?

The issue exists across the 1.15.x releases; most probably it relates to the linker changes described at https://golang.org/doc/go1.15#linker

The previous 1.14.x releases are not affected by the issue.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/tmp"
GOENV=""
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/tmp/8c7ec6d123a7abf0769333196e0d26f470afdc59/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/tmp/8c7ec6d123a7abf0769333196e0d26f470afdc59"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/tmp/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/tmp/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build935346230=/tmp/go-build -gno-record-gcc-switches"

What did you do?

go run hw.go
// hw.go
package main

func main(){}

What did you expect to see?

No error

What did you see instead?

The command above fails only on Amazon Linux, and only while running in the cloud.

It fails with

# command-line-arguments
panic: operation not permitted
goroutine 1 [running]:
cmd/link/internal/ld.Main(0x870840, 0x20, 0x20, 0x1, 0x7, 0x10, 0x0, 0x0, 0x6da8ff, 0x1b, ...)
/usr/local/go/src/cmd/link/internal/ld/main.go:320 +0x21bd
main.main()
/usr/local/go/src/cmd/link/main.go:68 +0x1dc
...
go run -x hw.go


WORK=/tmp/go-build825787747
mkdir -p $WORK/b001/
cat >$WORK/b001/importcfg << 'EOF' # internal
# import config
packagefile runtime=/tmp/go/pkg/linux_amd64/runtime.a
EOF
cd /tmp
./go/pkg/tool/linux_amd64/compile -o ./go-build825787747/b001/_pkg_.a -trimpath "$WORK/b001=>" -p main -complete -buildid ehL3WBFzOeSpryH7yP9P/ehL3WBFzOeSpryH7yP9P -dwarf=false -goversion go1.15.2 -D _/tmp -importcfg ./go-build825787747/b001/importcfg -pack -c=2 ./test.go
/tmp/go/pkg/tool/linux_amd64/buildid -w $WORK/b001/_pkg_.a # internal
cp $WORK/b001/_pkg_.a /tmp/f8/f806946fd9e5455a99ccbdeaa0e34798d70aed6edb248123b8fd2df38d8bbda3-d # internal
cat >$WORK/b001/importcfg.link << 'EOF' # internal
packagefile command-line-arguments=$WORK/b001/_pkg_.a
packagefile runtime=/tmp/go/pkg/linux_amd64/runtime.a
packagefile internal/bytealg=/tmp/go/pkg/linux_amd64/internal/bytealg.a
packagefile internal/cpu=/tmp/go/pkg/linux_amd64/internal/cpu.a
packagefile runtime/internal/atomic=/tmp/go/pkg/linux_amd64/runtime/internal/atomic.a
packagefile runtime/internal/math=/tmp/go/pkg/linux_amd64/runtime/internal/math.a
packagefile runtime/internal/sys=/tmp/go/pkg/linux_amd64/runtime/internal/sys.a
EOF
mkdir -p $WORK/b001/exe/
cd .
/tmp/go/pkg/tool/linux_amd64/link -o $WORK/b001/exe/test -importcfg $WORK/b001/importcfg.link -s -w -buildmode=exe -buildid=Juw7E6kz4zUijtlWIqs0/ehL3WBFzOeSpryH7yP9P/3cKdpLiR9cWyq_AaprQT/Juw7E6kz4zUijtlWIqs0 -extld=gcc $WORK/b001/_pkg_.a
Contributor

@cherrymui cherrymui commented Sep 12, 2020

This line is the error path taken when mmap fails. Does the machine you are using not support mmap?

We dropped the fallback path for when mmap fails. Maybe we should add it back? @jeremyfaller

@cherrymui cherrymui changed the title from "internal/ld/main.go panic: operation not permitted" to "cmd/link: panic: operation not permitted" Sep 12, 2020
Contributor

@jeremyfaller jeremyfaller commented Sep 14, 2020

It seems the OS is returning a different error than the ones we've seen before. We check for "operation not supported", not "operation not permitted". This error seems related to using CONFIG_STRICT_DEVMEM in the kernel. Although this could result in a game of whack-a-mole with error codes, since this is the only currently reported error, I'll send a CL to also fail gracefully with that error. If any more come up, I'll come up with a better pattern.

** Having thought about this more and dug further into the code, I think I will default to not failing. Reading about that flag makes me doubt that I have every failure case in mind.

Contributor

@cherrymui cherrymui commented Sep 14, 2020

The old linker uses a fallback path if mmap fails for any reason. Maybe we want to add that back?
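
As a rough illustration of the fallback idea, here is a minimal Go sketch; it is not the actual cmd/link code. It tries to mmap the output file and, if that fails for any reason, builds the output in a heap buffer and writes it with an ordinary write. The names `mapFunc`, `sysMmap`, `deniedMmap`, `writeOutput`, and `roundTrip` are hypothetical, and `deniedMmap` only simulates a sandbox that answers with EPERM.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// mapFunc maps size bytes of f for writing. It is a parameter only so that
// the failure path can be exercised without a real sandbox (an assumption of
// this sketch, not something the linker does).
type mapFunc func(f *os.File, size int) ([]byte, error)

// sysMmap sizes the file and attempts a real shared, writable mapping.
func sysMmap(f *os.File, size int) ([]byte, error) {
	if err := f.Truncate(int64(size)); err != nil {
		return nil, err
	}
	return syscall.Mmap(int(f.Fd()), 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
}

// deniedMmap simulates an environment that rejects the call with EPERM.
func deniedMmap(*os.File, int) ([]byte, error) { return nil, syscall.EPERM }

// writeOutput fills path with payload, preferring mmap and falling back to a
// heap buffer plus an ordinary write when the mapping fails for any reason.
func writeOutput(path string, payload []byte, mapper mapFunc) (mapped bool, err error) {
	f, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0o755)
	if err != nil {
		return false, err
	}
	defer f.Close() // close errors are ignored in this sketch

	buf, mapErr := mapper(f, len(payload))
	if mapErr != nil {
		// Fallback path: assemble the image in memory, then write it out.
		heap := make([]byte, len(payload))
		copy(heap, payload)
		_, err = f.WriteAt(heap, 0)
		return false, err
	}
	copy(buf, payload)
	return true, syscall.Munmap(buf)
}

// roundTrip writes payload to a temporary file and reads it back.
func roundTrip(payload string, mapper mapFunc) (got string, mapped bool, err error) {
	dir, err := os.MkdirTemp("", "outbuf")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(dir)

	path := dir + "/out"
	mapped, err = writeOutput(path, []byte(payload), mapper)
	if err != nil {
		return "", mapped, err
	}
	data, err := os.ReadFile(path)
	return string(data), mapped, err
}

func main() {
	for _, m := range []mapFunc{sysMmap, deniedMmap} {
		got, mapped, err := roundTrip("hello, linker", m)
		fmt.Printf("mapped=%v got=%q err=%v\n", mapped, got, err)
	}
}
```

With `deniedMmap` the file should still end up holding the payload; it is simply written through the fallback path instead of the mapping.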

Contributor

@jeremyfaller jeremyfaller commented Sep 14, 2020

Yes, that's what I'm gonna do.

Contributor

@cherrymui cherrymui commented Sep 14, 2020

SGTM. Thanks.


@gopherbot gopherbot commented Sep 14, 2020

Change https://golang.org/cl/254777 mentions this issue: cmd/link: ignore mmap failures in the linker

Author

@fogfish fogfish commented Sep 14, 2020

Thank you!

Member

@aclements aclements commented Sep 15, 2020

Wait, backing up a moment, I don't see at all what this would have to do with CONFIG_STRICT_DEVMEM. We're definitely not mmaping /dev/mem.

Member

@aclements aclements commented Sep 15, 2020

@fogfish, it would be helpful to see an strace of the failing link. Could you run go build -work -x hw.go (-work will let you rerun build commands by hand), then copy-paste the WORK=... line at the top into your shell, then finally run strace -f -e mmap,open,openat <link command> with the failing link command and paste the output?

Author

@fogfish fogfish commented Sep 15, 2020

Thank you! I'll try. This is a CI/CD instance that is fully managed by AWS. There is no shell access to it.

Contributor

@cherrymui cherrymui commented Sep 15, 2020

Yeah, taking another look, it seems this fails with EPERM, which is rather weird. Looking at the manpages, I don't think any of the EPERM cases of mmap or fallocate can happen in our case. Maybe ftruncate?

Member

@aclements aclements commented Sep 15, 2020

@fogfish, thanks. You don't happen to know what file system it's running on, do you (or what your test instance from your initial report was running on)?

Author

@fogfish fogfish commented Sep 15, 2020

It runs on top of Linux Kernel 4.14.171-105.231.amzn1.x86_64

Author

@fogfish fogfish commented Sep 15, 2020

Let me try to reproduce it with the given kernel on EC2.

Member

@prattmic prattmic commented Sep 15, 2020

If you can run mount, we can also see any special mount options that may be in use.

w.r.t. strace, if you can't trace the WORK= command directly, strace -f -e mmap,open,openat,fallocate,ftruncate go build hw.go should work too, it will just give us a much longer trace to comb through.

Author

@fogfish fogfish commented Sep 15, 2020

The environment is hardened and I cannot use sudo, etc. I need more time to find a way around these limits.

strace: exit status 1
strace: test_ptrace_get_syscall_info: PTRACE_TRACEME: Operation not permitted
strace: ptrace(PTRACE_TRACEME, ...): Operation not permitted
+++ exited with 1 +++
@toothrot toothrot added this to the Backlog milestone Sep 15, 2020
Contributor

@ianlancetaylor ianlancetaylor commented Sep 15, 2020

Various sandbox environments that restrict the set of permitted system calls can cause a result of EPERM for any system call that is not permitted. Those environments often don't bother to figure out the "right" error to return.

Member

@aclements aclements commented Sep 15, 2020

@ianlancetaylor, that's a good point. It wouldn't surprise me at all if fallocate in particular simply wasn't on the allow list.

Author

@fogfish fogfish commented Sep 17, 2020

Indeed, fallocate fails with "Operation not permitted". It also fails in Docker on a mounted filesystem, but the linker works in Docker.

Author

@fogfish fogfish commented Sep 18, 2020

Here is all the info I've managed to collect about the environment:

Linux version 4.14.177-104.253.amzn2.x86_64 (gcc version 7.3.1 20180712 (Red Hat 7.3.1-6) (GCC)) #1 SMP Fri May 1 02:01:13 UTC 2020

/dev/vdd /tmp ext4 rw,relatime,data=writeback 0 0

sh-4.2$ fallocate -l 1K a
fallocate: fallocate failed: Operation not permitted
sh-4.2$ fallocate -x -l 1K a
sh-4.2$


strace: exit status 1
strace: test_ptrace_get_syscall_info: PTRACE_TRACEME: Operation not permitted
strace: ptrace(PTRACE_TRACEME, ...): Operation not permitted
+++ exited with 1 +++

I can mock up a simple program that uses some of the system calls, or build cmd/link with extensive debug logging, if you can advise which direction to look in order to isolate and understand the linker issue.
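
A small probe along those lines might look like the following Go sketch. It is only an assumption about which calls are worth checking (fallocate, ftruncate, and mmap, the ones named earlier in this thread), and `probe` and `probeResult` are hypothetical names. It issues each call once on a temporary file and reports the error, which is useful where strace itself is blocked. `syscall.Fallocate` is Linux-only.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// probeResult records the outcome of one system call.
type probeResult struct {
	name string
	err  error
}

// probe creates a temporary file in dir (os.TempDir() if dir is empty) and
// tries fallocate, ftruncate, and mmap on it, collecting each call's error.
func probe(dir string) ([]probeResult, error) {
	f, err := os.CreateTemp(dir, "probe")
	if err != nil {
		return nil, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	fd := int(f.Fd())
	const size = 1 << 10 // 1 KiB, matching the `fallocate -l 1K` test above

	var results []probeResult
	results = append(results, probeResult{"fallocate", syscall.Fallocate(fd, 0, 0, size)})
	results = append(results, probeResult{"ftruncate", syscall.Ftruncate(fd, size)})

	data, mapErr := syscall.Mmap(fd, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_SHARED)
	if mapErr == nil {
		_ = syscall.Munmap(data) // the mapping is never touched, only released
	}
	results = append(results, probeResult{"mmap", mapErr})
	return results, nil
}

func main() {
	dir := "" // pass a directory as the first argument to test another mount
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	results, err := probe(dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, "probe setup failed:", err)
		os.Exit(1)
	}
	for _, r := range results {
		if r.err != nil {
			fmt.Printf("%-10s FAILED: %v\n", r.name, r.err)
		} else {
			fmt.Printf("%-10s ok\n", r.name)
		}
	}
}
```

Running it once against /tmp and once against another mount would show whether the EPERM follows the filesystem or the environment as a whole.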

Member

@prattmic prattmic commented Sep 18, 2020

I tend to agree with @ianlancetaylor that this looks like a sandboxed environment. Is this AWS CodePipeline? I've been trying to find docs about its execution environment, though I haven't found much.

Contributor

@cherrymui cherrymui commented Sep 18, 2020

Maybe the path forward here is to add EPERM to the allowed errors of fallocate (alongside ENOTSUP/EOPNOTSUPP, which we already allow), without accepting arbitrary errors: we do want to catch the "not enough space" error, which is the whole purpose of fallocate. We also should not accept errors from Mmap.

Member

@prattmic prattmic commented Sep 18, 2020

That seems fine; the fallocate is just a hint and if there really is a permission error it should come up again on mmap.
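
The proposal above can be sketched in Go as a small error classifier. This is an illustration under assumptions, not the change that landed in cmd/link: `ignorableFallocateError`, `checkPreallocate`, and `demoErrors` are hypothetical names. EPERM and ENOTSUP/EOPNOTSUPP are treated as "preallocation unavailable, carry on", while everything else, including ENOSPC, stays fatal.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// demoErrors holds sample errno values used by main below.
var demoErrors = map[string]error{
	"EPERM":      syscall.EPERM,
	"ENOTSUP":    syscall.ENOTSUP,
	"EOPNOTSUPP": syscall.EOPNOTSUPP,
	"ENOSPC":     syscall.ENOSPC,
}

// ignorableFallocateError reports whether a fallocate failure only means
// that preallocation is unavailable here (unsupported, or blocked by a
// sandbox), as opposed to a real problem such as a full disk.
// ENOTSUP and EOPNOTSUPP share a value on Linux, so a chain of errors.Is
// calls is used instead of a switch with duplicate cases.
func ignorableFallocateError(err error) bool {
	return errors.Is(err, syscall.EPERM) ||
		errors.Is(err, syscall.ENOTSUP) ||
		errors.Is(err, syscall.EOPNOTSUPP)
}

// checkPreallocate turns a fallocate result into the error the caller
// should act on: nil to continue, non-nil to abort.
func checkPreallocate(err error) error {
	if err == nil || ignorableFallocateError(err) {
		return nil
	}
	return fmt.Errorf("preallocating output file: %w", err)
}

func main() {
	for _, name := range []string{"EPERM", "ENOTSUP", "EOPNOTSUPP", "ENOSPC"} {
		fatal := checkPreallocate(demoErrors[name]) != nil
		fmt.Printf("%-10s fatal=%v\n", name, fatal)
	}
}
```

Because fallocate is treated only as a hint here, a genuine permission problem would still be expected to surface later, when the file is mapped or written.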
