Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: TestGdbPythonCgo fails on MIPS32 #18784

Closed
cherrymui opened this issue Jan 25, 2017 · 22 comments

Comments

Projects
None yet
9 participants
@cherrymui
Copy link
Contributor

commented Jan 25, 2017

--- FAIL: TestGdbPythonCgo (9.68s)
	runtime-gdb_test.go:55: gdb version 7.7
	runtime-gdb_test.go:189: info goroutines failed: END
		No stack.
		BEGIN print mapvar
		No symbol "mapvar" in current context.
FAIL
FAIL	runtime	673.361s

Looking at the build dashboard, this test fails with a large chance, but not always. It happens on both BE and LE. For example,
https://build.golang.org/log/1b824c580a7c5767ef45a34e29dd6c5c8c49e792
https://build.golang.org/log/5f62e2e14098562cbfcd839333a2a22a4c373c4a

I will take a look.
cc @ianlancetaylor @vstefanovic

@cherrymui

This comment has been minimized.

Copy link
Contributor Author

commented Jan 25, 2017

I run the gdb command in runtime/runtime-gdb_test.go manually, the failure actually happens much rarer, once per 10-20 runs. Here is the output on failure (I added an "echo START" before "run"):

Loading Go Runtime support.
Loaded  Script                                                                 
Yes     /data/mipsle/go/src/runtime/runtime-gdb.py                             
Breakpoint 1 at 0x47e6e8: file /data/mipsle/go/src/fmt/print.go, line 257.
START
Failed to read a valid object file image from memory.

Program received signal SIGTRAP, Trace/breakpoint trap.
dl_main (phdr=0x400034, phnum=13, user_entry=0x7fff7988, auxv=0x7fff7db4) at rtld.c:1594
1594    rtld.c: No such file or directory.
BEGIN info goroutines
END
#1  0x77fdddcc in _dl_sysdep_start (start_argptr=<optimized out>, dl_main=0x77fc6748 <dl_main>) at ../elf/dl-sysdep.c:249
249     ../elf/dl-sysdep.c: No such file or directory.
BEGIN print mapvar
No symbol "mapvar" in current context.
END
BEGIN print strvar
No symbol "strvar" in current context.
END
BEGIN info locals
phdr = <optimized out>
phnum = <optimized out>
user_entry = 4197952
av = <optimized out>
END
#0  dl_main (phdr=0x400034, phnum=13, user_entry=0x7fff7988, auxv=0x7fff7db4) at rtld.c:1594
1594    rtld.c: No such file or directory.
BEGIN goroutine 2 bt
No such goroutine:  2
END

On a success run:

Loading Go Runtime support.
Loaded  Script                                                                 
Yes     /data/mipsle/go/src/runtime/runtime-gdb.py                             
Breakpoint 1 at 0x47e6e8: file /data/mipsle/go/src/fmt/print.go, line 257.
START
Failed to read a valid object file image from memory.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/mipsel-linux-gnu/libthread_db.so.1".
[New Thread 0x77db6490 (LWP 3009)]
[New Thread 0x775b6490 (LWP 3010)]
[New Thread 0x763ff490 (LWP 3012)]
[New Thread 0x76db6490 (LWP 3011)]

Breakpoint 1, fmt.Println (a= []interface {} = {...}, n=277274832, err=...) at /data/mipsle/go/src/fmt/print.go:257
257             return Fprintln(os.Stdout, a...)
BEGIN info goroutines
* 1 running  runtime.systemstack_switch
* 17 syscall  runtime.goexit
  2 waiting  runtime.gopark
  18 waiting  runtime.gopark
  19 runnable runtime.runfinq
END
#1  0x00485f30 in main.main () at /data/mipsle/gdbcgotest.go:14
14              fmt.Println("hi") // line 12
BEGIN print mapvar
$1 = map[string]string = {["abc"] = "def", ["ghi"] = "jkl"}
END
BEGIN print strvar
$2 = "abc"
END
BEGIN info locals
slicevar.len = 1
slicevar.cap = 16
slicevar.ptr = "def"
mapvar = map[string]string = {["abc"] = "def", ["ghi"] = "jkl"}
strvar = "abc"
END
#0  fmt.Println (a= []interface {} = {...}, n=277274832, err=...) at /data/mipsle/go/src/fmt/print.go:257
257             return Fprintln(os.Stdout, a...)
BEGIN goroutine 2 bt
#0  runtime.gopark (unlockf={void (struct runtime.g *, void *, bool *)} 0x108227bc, lock=0x4fddd0 <runtime.forcegc>, reason="force gc (idle)", traceEv=20 '\024', traceskip=1) at /data/mipsle/go/src/runtime/proc.go:272
#1  0x0042e464 in runtime.goparkunlock (lock=0x4fddd0 <runtime.forcegc>, reason="force gc (idle)", traceEv=20 '\024', traceskip=1) at /data/mipsle/go/src/runtime/proc.go:277
#2  0x0042e174 in runtime.forcegchelper () at /data/mipsle/go/src/runtime/proc.go:226
#3  0x00458344 in runtime.goexit () at /data/mipsle/go/src/runtime/asm_mipsx.s:944
Backtrace stopped: frame did not save the PC
END

It seems the problem might be gdb loading libthread_db?

@cherrymui

This comment has been minimized.

Copy link
Contributor Author

commented Jan 25, 2017

I added "-ex" "set debug libthread-db 1" in gdb command. It seems indeed related to libthread:
On failure:

Loading Go Runtime support.
Loaded  Script                                                                 
Yes     /data/mipsle/go/src/runtime/runtime-gdb.py                             
libthread-db debugging is 1.
Breakpoint 1 at 0x47e6e8: file /data/mipsle/go/src/fmt/print.go, line 257.
START
Trying host libthread_db library: libthread_db.so.1.
Host libthread_db.so.1 resolved to: /lib/mipsel-linux-gnu/libthread_db.so.1.
td_ta_new failed: application not linked with libthread
thread_db_load_search returning 0
Failed to read a valid object file image from memory.

Program received signal SIGTRAP, Trace/breakpoint trap.
dl_main (phdr=0x400034, phnum=13, user_entry=0x7fff7988, auxv=0x7fff7db4) at rtld.c:1594
...

On success:

Loading Go Runtime support.
Loaded  Script                                                                 
Yes     /data/mipsle/go/src/runtime/runtime-gdb.py                             
libthread-db debugging is 1.
Breakpoint 1 at 0x47e6e8: file /data/mipsle/go/src/fmt/print.go, line 257.
START
Trying host libthread_db library: libthread_db.so.1.
Host libthread_db.so.1 resolved to: /lib/mipsel-linux-gnu/libthread_db.so.1.
td_ta_new failed: application not linked with libthread
thread_db_load_search returning 0
Failed to read a valid object file image from memory.
Trying host libthread_db library: libthread_db.so.1.
Host libthread_db.so.1 resolved to: /lib/mipsel-linux-gnu/libthread_db.so.1.
Found 0 new threads in iteration 0.
Found 0 new threads in iteration 1.
Found 0 new threads in iteration 2.
Found 0 new threads in iteration 3.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/mipsel-linux-gnu/libthread_db.so.1".
thread_db_load_search returning 1
...
@dhananjay92

This comment has been minimized.

Copy link
Member

commented Jan 25, 2017

td_ta_new failed: application not linked with libthread

Perhaps it just needs -pthread gcc flag, no?

@cherrymui

This comment has been minimized.

Copy link
Contributor Author

commented Jan 25, 2017

It actually has pthread:

/usr/bin/readelf -d ./gdbcgotest 

Dynamic section at offset 0x2ac contains 27 entries:
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [libpthread.so.0]
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
 0x0000000c (INIT)                       0x400e04
 0x0000000d (FINI)                       0x486bd0
 0x00000019 (INIT_ARRAY)                 0x4f8ba4
 0x0000001b (INIT_ARRAYSZ)               4 (bytes)
 0x0000001a (FINI_ARRAY)                 0x4f8ba8
 0x0000001c (FINI_ARRAYSZ)               4 (bytes)
 0x00000004 (HASH)                       0x4003ac
 0x00000005 (STRTAB)                     0x400984
 0x00000006 (SYMTAB)                     0x400554
 0x0000000a (STRSZ)                      922 (bytes)
 0x0000000b (SYMENT)                     16 (bytes)
 0x70000016 (MIPS_RLD_MAP)               0x4fbcd0
 0x00000015 (DEBUG)                      0x0
 0x00000003 (PLTGOT)                     0x4fdb50
 0x70000001 (MIPS_RLD_VERSION)           1
 0x70000005 (MIPS_FLAGS)                 NOTPOT
 0x70000006 (MIPS_BASE_ADDRESS)          0x400000
 0x7000000a (MIPS_LOCAL_GOTNO)           20
 0x70000011 (MIPS_SYMTABNO)              67
 0x70000012 (MIPS_UNREFEXTNO)            49
 0x70000013 (MIPS_GOTSYM)                0x28
 0x6ffffffe (VERNEED)                    0x400da4
 0x6fffffff (VERNEEDNUM)                 2
 0x6ffffff0 (VERSYM)                     0x400d1e
 0x00000000 (NULL)                       0x0

The mystery part is why it fails sometimes, with the same binary...

@thanm

This comment has been minimized.

Copy link
Member

commented Jan 25, 2017

Would it make sense just to turn off libthread_db, if it will fix the test flakiness? Doesn't help with the underlying problem, though. I'll send an experimental CL.

@gopherbot

This comment has been minimized.

Copy link

commented Jan 25, 2017

CL https://golang.org/cl/35734 mentions this issue.

@dhananjay92

This comment has been minimized.

Copy link
Member

commented Jan 25, 2017

By removing undoing this line from aad06da, I can reliably reproduce the failure on linux-amd64. I get,

runtime-gdb_test.go:55: gdb version 7.9
runtime-gdb_test.go:194: print mapvar failed: No symbol "mapvar" in current context.

So, I suspect that #18745 didn't get fixed for MIPS. It should be easy to confirm if that's the case.

@cherrymui, since you have access to a MIPS machine, can you run objdump --debugging <binary> | grep "DW_AT_low_pc" -B 2 -A 4? If it prints 0x0 for failure cases, then it's the same issue as #18745 (ELF relocations not emitted when doing an external link). However, that would still not explain why it happens only occasionally though.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Jan 25, 2017

I don't think the MIPS problem can be related to the fix for #18745, because the MIPS problem is sporadic, and the fix for #18745 is deterministic. If the MIPS problem were directly related to that, it would always fail. Since it mostly succeeds, I think the problem must be something else. But I don't know what.

If we don't figure it out in a couple of hours, we should skip the test on MIPS.

@cherrymui

This comment has been minimized.

Copy link
Contributor Author

commented Jan 25, 2017

@thanm disabling auto-loading libthread_db doesn't seem to help. I still see occasional failure running the gdb command manually.

@dhananjay92 objdump --debugging <binary> | grep "DW_AT_low_pc" -B 2 -A 4 prints quite a lot (presumably one entry for each symbol?) but all are not 0x0.

@cherrymui

This comment has been minimized.

Copy link
Contributor Author

commented Jan 25, 2017

Maybe libthread is not the problem?

Program received signal SIGTRAP, Trace/breakpoint trap.
dl_main (phdr=0x400034, phnum=13, user_entry=0x7fff7988, auxv=0x7fff7db4) at rtld.c:1594
1594    rtld.c: No such file or directory.

I think this is the problem, but I don't know if it is related to libthread.

@minux

This comment has been minimized.

Copy link
Member

commented Jan 25, 2017

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Jan 25, 2017

As far as I know, an assertion failure in the dynamic linker does not normally present itself as a SIGTRAP signal. That sounds like something that would happen only when running the program under gdb, when gdb is trying to instrument the dynamic linker to detect the set of shared libraries and is somehow messing up.

@cherrymui

This comment has been minimized.

Copy link
Contributor Author

commented Jan 26, 2017

@minux

/lib/mipsel-linux-gnu/libc.so.6
GNU C Library (Debian GLIBC 2.19-18+deb8u6) stable release version 2.19, by Roland McGrath et al.
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.8.4.
Compiled on a Linux 3.16.7 system on 2016-09-05.
Available extensions:
	crypt add-on version 2.1 by Michael Glad and others
	GNU Libidn by Simon Josefsson
	Native POSIX Threads Library by Ulrich Drepper et al
	BIND-8.2.3-T5B
libc ABIs: MIPS_PLT UNIQUE
For bug reporting instructions, please see:
<http://www.debian.org/Bugs/>.

/bin/cat /proc/sys/kernel/randomize_va_space
1
@minux

This comment has been minimized.

Copy link
Member

commented Jan 26, 2017

@vstefanovic

This comment has been minimized.

Copy link
Member

commented Jan 30, 2017

Running just this command:

gdb --batch -ex "b main" -ex run -ex bt hello-c-world

from runtime-gdb_test.go prints:

#0  0x77fc6880 in ?? () from /lib/ld.so.1

once in every dozen times (on average).

hello-c-world is a plain C exe.

Running the same gdb command in a loop (outside of runtime-gdb_test.go) failed once in about 10,000 runs.
(By 'failed' I mean - printed backtrace from /lib/ld.so.1 instead of main.)

The board is mips64be (but with mips32be rootfs), libc: 2.13-38+deb7u8, gdb 7.9.

@gopherbot

This comment has been minimized.

Copy link

commented Feb 2, 2017

CL https://golang.org/cl/36250 mentions this issue.

gopherbot pushed a commit that referenced this issue Feb 2, 2017

runtime: skip flaky TestGdbPythonCgo on MIPS
It seems the problem is on gdb and the dynamic linker. Skip the
test for now until we figure out what's going on with the system.

Updates #18784.

Change-Id: Ic9320ffd463f6c231b2c4192652263b1cf7f4231
Reviewed-on: https://go-review.googlesource.com/36250
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

@bradfitz bradfitz added this to the Go1.9 milestone Mar 21, 2017

@aclements

This comment has been minimized.

Copy link
Member

commented Jun 7, 2017

Is gdb itself just flaky on MIPS? It's not obvious to me that there's anything we can do about this on the Go side.

@cherrymui

This comment has been minimized.

Copy link
Contributor Author

commented Jun 7, 2017

Probably some bad interaction between gdb and the dynamic loader. I still don't understand what is happening. As @vstefanovic can reproduce it with a C program, it is probably not much related to Go.

@aclements

This comment has been minimized.

Copy link
Member

commented Jun 7, 2017

Thanks. Okay with closing the issue?

@vstefanovic, would you mind filing an issue with GDB with your test from #18784 (comment)?

@cherrymui

This comment has been minimized.

Copy link
Contributor Author

commented Jun 7, 2017

I'm ok with closing. We can reopen if we find anything related to Go.

@aclements aclements closed this Jun 7, 2017

gopherbot pushed a commit that referenced this issue Jun 15, 2017

runtime: enable GDB tests on mips64 (except TestGdbPythonCgo)
They were failing when run on 32bit RFS, with 32bit gdb.
(mips64 builder now has 64bit RFS, with gdb 7.9.)
Leaving TestGdbPythonCgo disabled, it behaves as described in #18784.

Fixes #18173

Change-Id: I3c438cd5850b7bfd118ac6396f40c1208bac8c2d
Reviewed-on: https://go-review.googlesource.com/45874
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@vstefanovic

This comment has been minimized.

Copy link
Member

commented Jul 25, 2017

Even

gdb -ex r -ex q hello-c-world

raises

Program received signal SIGTRAP, Trace/breakpoint trap.
0x77fc7830 in ?? () from /lib/ld.so.1

quite often, as long as the machine is doing some other work at the same time (running gdb's configure was sufficient in my case).
So it's clearly unrelated to Go.

@aclemens, I will, probably, but I'd like to have a look at it myself first.

@aclements

This comment has been minimized.

Copy link
Member

commented Jul 25, 2017

@vstefanovic, thanks!

@golang golang locked and limited conversation to collaborators Jul 25, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.