runtime: gdb tests hang on go1.8beta2 gentoo ebuild #18442

Open
williamh opened this Issue Dec 28, 2016 · 33 comments

7 participants
@williamh

williamh commented Dec 28, 2016

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go1.8beta2 linux/amd64

What operating system and processor architecture are you using (go env)?

linux/amd64

What did you do?

I attempted to install go1.8beta2 using our package manager and an ebuild I'm about to add to Gentoo. Tests are enabled, which causes the package manager to run the Go test suite.

What did you expect to see?

A successful installation of Go with all tests passing.

What did you see instead?

The tests failed, which aborted the installation.

The build log is attached. Any guidance you can provide for troubleshooting this will be greatly appreciated. :-)

build-log.txt

@davecheney

Contributor

davecheney commented Dec 28, 2016

@bradfitz

Member

bradfitz commented Dec 28, 2016

Looks like your host is just slower than the tests assume.

Try setting GO_TEST_TIMEOUT_SCALE=2 or GO_TEST_TIMEOUT_SCALE=3 in your environment.

But really, the Gentoo package manager should probably just run make.bash instead of all.bash. The tests are mostly interesting for us and others working on Go itself.

@davecheney

Contributor

davecheney commented Dec 28, 2016

@williamh

williamh commented Dec 28, 2016

Portage does not run the tests unless a user requests that it do so.
We run make.bash to build Go, then optionally "run.bash --no-rebuild" to run the tests.

If the tests pass outside portage, I can make portage block users from running them, but I thought we should offer the option, since your build runs them automatically. :-)

I will run a quick build outside of portage and let you know how it goes.

@williamh

williamh commented Dec 28, 2016

Hi all,

The tests pass if the build is run without portage in the mix.

If you want, I am willing to help troubleshoot. Otherwise, I can just block the tests so that portage does not run them.

Which do you prefer?

@williamh

williamh commented Dec 28, 2016

Hi all,

I ran the build including tests inside portage again, setting GO_TEST_TIMEOUT_SCALE=3 in the environment as suggested.

It failed again, the log is attached.

build-log1.txt

One thing I can think of: do the tests write files somewhere on the filesystem? If they do, where, and is that path configurable?

@davecheney

Contributor

davecheney commented Dec 28, 2016

@williamh

williamh commented Dec 28, 2016

I pointed TMPDIR to the appropriate location and still got a failure.
The log is attached.
It definitely looks like some kind of timeout, but I'm not sure what is causing it.

build-log2.txt

@davecheney

Contributor

davecheney commented Dec 28, 2016

@williamh

williamh commented Dec 28, 2016

I cleared dmesg so I wouldn't see any old messages then re-ran the build with tests enabled. Nothing appeared in dmesg.

@rsc

Contributor

rsc commented Jan 4, 2017

The runtime test is hung waiting for gdb in TestGdbBacktrace. It runs:

gdb -nx -batch \
	-ex 'set startup-with-shell off' \
	-ex 'break main.eee' \
	-ex 'run' \
	-ex 'backtrace' \
	-ex 'continue' \
	a.exe

where a.exe is the result of compiling:

package main

//go:noinline
func aaa() bool { return bbb() }

//go:noinline
func bbb() bool { return ccc() }

//go:noinline
func ccc() bool { return ddd() }

//go:noinline
func ddd() bool { return f() }

//go:noinline
func eee() bool { return true }

var f = eee

func main() {
	_ = aaa()
}

Gdb is not exiting. Can you try running that command by hand and see why?

It is also hung waiting for gdb in TestGdbAutotmpTypes, which does a similar thing.
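To reproduce the hang without blocking the whole build, the same gdb invocation can be wrapped with a deadline. A minimal sketch, assuming gdb is on PATH and `a.exe` has been built from the program above; the wrapper is hypothetical and not part of the Go test suite, and the 30-second budget is an arbitrary choice.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// gdbArgs returns the batch gdb arguments quoted above for the given binary.
func gdbArgs(exe string) []string {
	return []string{
		"-nx", "-batch",
		"-ex", "set startup-with-shell off",
		"-ex", "break main.eee",
		"-ex", "run",
		"-ex", "backtrace",
		"-ex", "continue",
		exe,
	}
}

func main() {
	exe := "./a.exe"
	if len(os.Args) > 1 {
		exe = os.Args[1]
	}

	// gdb normally finishes this script in well under 30s.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	cmd := exec.CommandContext(ctx, "gdb", gdbArgs(exe)...)
	// Hand gdb our own stdout/stderr (no pipes), so Run returns once gdb
	// itself exits, even if a stuck child still holds the descriptors.
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	err := cmd.Run()
	if ctx.Err() == context.DeadlineExceeded {
		fmt.Fprintln(os.Stderr, "gdb did not exit before the deadline and was killed")
		os.Exit(2)
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, "gdb failed:", err)
		os.Exit(1)
	}
}
```

Whatever gdb printed before the deadline (for example, only the "Breakpoint 1 at ..." line) indicates how far it got.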

@rsc rsc added this to the Go1.8Maybe milestone Jan 4, 2017

@rsc rsc changed the title from go-1.8beta2 tests are failing on linux/amd64 to runtime: gdb tests hang on go1.8beta2 gentoo ebuild Jan 4, 2017

@rsc rsc modified the milestones: Go1.9Maybe, Go1.8Maybe Feb 7, 2017

@rsc

Contributor

rsc commented Feb 7, 2017

Still waiting for a reply.

@williamh

williamh commented Feb 7, 2017

I will look at this in the next 48 hours. :-)

@williamh

williamh commented Feb 8, 2017

I am now working with go1.8rc3.
When I ran this by hand, it exited normally. However, when I ran the test inside our package manager environment (by copying a.go into the build environment and modifying the test to compile it and run this gdb command), it still hung.
The only relevant output appears to be the following showing that a breakpoint was set:

Breakpoint 1 at 0x44d680: file /usr/lib/go/src/a/a.go, line 16.

After this output, it hangs. The only option I have is to kill the build.

@rsc

Contributor

rsc commented Feb 8, 2017

It sounds like portage is breaking things somehow? Can you attach to gdb and see if it is stuck on something interesting?

@williamh

williamh commented Feb 8, 2017

I am attaching output from strace, which will show all of the system calls.
strace.txt

@rsc

Contributor

rsc commented Feb 9, 2017

The last few lines of the trace are

stat("/var/tmp/portage/dev-lang/go-1.8_rc3/work/go/a.exe", {st_mode=S_IFREG|0755, st_size=960330, ...}) = 0
close(7)                                = 0
stat("/var/tmp/portage/dev-lang/go-1.8_rc3/work/go/a.exe", {st_mode=S_IFREG|0755, st_size=960330, ...}) = 0
personality(0xffffffff)                 = 0 (PER_LINUX)
personality(PER_LINUX|ADDR_NO_RANDOMIZE) = 0 (PER_LINUX)
personality(0xffffffff)                 = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE)
vfork(

It looks like vfork starts and just hangs? That's kind of strange. Or maybe you're running strace without -f and it got very confused?

The truncation in the strace log does match the behavior you see without strace: it tells about the breakpoint but then when it's time to start the program, bad things.

I don't know much about portage. Is there some reason it might block vfork?

@minux

Member

minux commented Feb 9, 2017

@rsc

Contributor

rsc commented Feb 9, 2017

Is there some environment variable we can use to detect that we're running as part of portage? It would be easy enough to disable the test once we believe we understand the root cause and that it's not worth fixing.

@williamh

williamh commented Feb 9, 2017

You are correct that our sandbox is the culprit. I ran the same test a little while ago with the sandbox disabled, and it worked fine. I do not know the answers to either of your questions myself, but I will find out first thing tomorrow morning (it is 21:48 here now).

@rsc rsc modified the milestones: Go1.8Maybe, Go1.9Maybe Feb 9, 2017

@minux

Member

minux commented Feb 9, 2017

@rsc

Contributor

rsc commented Feb 9, 2017

@minux, the net/http tests are passing (see original build log). Maybe strace is not available in the portage chroot but gdb is?

@minux

Member

minux commented Feb 9, 2017

@bradfitz

Member

bradfitz commented Feb 9, 2017

FWIW, net/http does:

        if _, err := exec.LookPath("strace"); err != nil {
                t.Skip("skipping; strace not found in path")
        }
...
        if err := child.Start(); err != nil {
                t.Skipf("skipping; failed to start straced child: %v", err)
        }
@tw4452852

Contributor

tw4452852 commented Feb 9, 2017

Just did a test. Gdb breakpoints don't work inside the sandbox with a Go program.

$ gdb -nx /tmp/go-build089897835/a.exe
GNU gdb (Gentoo 7.10.1 vanilla) 7.10.1
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /tmp/go-build089897835/a.exe...done.
(gdb) b main.eee
Breakpoint 1 at 0x44dc60: file /tmp/go-build089897835/main.go, line 17.
(gdb) r
Starting program: /tmp/go-build089897835/a.exe 
During startup program exited normally.
(gdb) info br
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x000000000044dc60 in main.eee at /tmp/go-build089897835/main.go:17
(gdb) bt
No stack.

But it works with a C program.

// main.c

#include <stdio.h>

int
main(int argc, const char *argv[])
{
        printf("hello world\n");
        return 0;
}
$ gdb -nx ~/tmp/test
GNU gdb (Gentoo 7.10.1 vanilla) 7.10.1
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/tw/tmp/test...done.
(gdb) b main
Breakpoint 1 at 0x400565: file main.c, line 6.
(gdb) r
Starting program: /home/tw/tmp/test 

Breakpoint 1, main (argc=1, argv=0x7fffffffd668) at main.c:6
6               printf("hello world\n");
(gdb) bt
#0  main (argc=1, argv=0x7fffffffd668) at main.c:6
(gdb) 
@minux

Member

minux commented Feb 9, 2017

@williamh

williamh commented Feb 9, 2017

I just looked at the sandbox sources, and we are using ptrace.

@williamh

williamh commented Feb 9, 2017

@minux There is not a way for an ebuild to disable the sandbox. The only option I would have is to disable the test phase for dev-lang/go.

@bradfitz

Member

bradfitz commented Feb 9, 2017

@williamh, did you see @rsc's comment above? #18442 (comment)

Is there a way to detect when we're running in your specific sandbox? Is there some environment variable set like HEY_YOURE_BUILDING_IN_GENTOO_SANDBOX=1?

@williamh

williamh commented Feb 9, 2017

@bradfitz I don't know of a specific variable you can use for that, but I'm researching it. I'll give you a definite answer asap. :-)

@williamh

williamh commented Feb 9, 2017

@bradfitz @minux @rsc I was just advised that I may be wrong about disabling the sandbox for the tests.
SANDBOX_ON=1 in the environment means the sandbox is active, and I may be able to turn it off with SANDBOX_ON=0 only for the tests. I am testing this now and I will update you.

@williamh

williamh commented Feb 9, 2017

It looks like SANDBOX_ON=0 does not currently disable the sandbox. I verified this and also saw a comment in the source that says once the sandbox is active it can't be deactivated.
If you want to skip the tests in our sandbox, the best way is to skip them only when SANDBOX_ON=1 is in the environment, so that they will still run if we decide to honor SANDBOX_ON=0 in the future.
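A minimal sketch of the check described in this comment, assuming only the `SANDBOX_ON=1` convention stated here; the `inGentooSandbox` helper is hypothetical and is not something the Go tree is known to contain. Taking `getenv` as a parameter keeps it testable without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// inGentooSandbox reports whether SANDBOX_ON=1 is set, the signal suggested
// in this thread for an active Portage sandbox. Any other value, including
// SANDBOX_ON=0 or an unset variable, is treated as "not sandboxed".
func inGentooSandbox(getenv func(string) string) bool {
	return getenv("SANDBOX_ON") == "1"
}

func main() {
	if inGentooSandbox(os.Getenv) {
		fmt.Println("skipping gdb tests: Gentoo sandbox is active (SANDBOX_ON=1)")
		return
	}
	fmt.Println("Gentoo sandbox not detected; gdb tests would run")
}
```

In a test this would be used as `if inGentooSandbox(os.Getenv) { t.Skip("skipping gdb test inside the Gentoo sandbox") }`. Comparing against "1", rather than checking only that the variable is set, is what lets the tests run again if SANDBOX_ON=0 is ever honored.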

@ALTree

Member

ALTree commented Sep 22, 2018

Is this still a problem on 1.11?

@rsc rsc added the WaitingForInfo label Nov 14, 2018
