Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: TestLldbPython fails for lldb 7.0.0 #29244

Open
jimrobinson opened this issue Dec 14, 2018 · 11 comments
Assignees
Milestone

Comments

@jimrobinson
Copy link

@jimrobinson jimrobinson commented Dec 14, 2018

What version of Go are you using (go version)?

$ go version
go version devel +a6293995c5 Thu Dec 13 23:58:06 2018 +0000 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/jimr/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/jimr/lib/go:/home/jimr/proj/github:/usr/local/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build101053857=/tmp/go-build -gno-record-gcc-switches"

This is a Fedora 29 machine, it has lddb 7.0.0-1 and python2 2.7.15 installed:

$ dnf list --installed python2 lldb python2-lldb
Installed Packages
lldb.x86_64                           7.0.0-1.fc29                      @fedora 
python2.x86_64                        2.7.15-11.fc29                    @updates
python2-lldb.x86_64                   7.0.0-1.fc29                      @fedora 

What did you do?

$ cd /usr/local/go/src
$ git rev-parse HEAD
a629399
$ export GOROOT_BOOTSTRAP=/usr/local/lib/go1.11.2
$ ./all.bash

What did you expect to see?

...
ALL TESTS PASSED
---
Installed Go for linux/amd64 in /usr/local/go
Installed commands in /usr/local/go/bin

What did you see instead?

...
--- FAIL: TestLldbPython (2.33s)
    runtime-lldb_test.go:180: Unexpected lldb output:
        Created target
        Created breakpoint
        Process launched
FAIL
FAIL	runtime	20.873s
...
2018/12/13 16:38:53 Failed: exit status 1

It appears that the lldb script.py is not returning the expected output:

$ /usr/local/go/bin/go build "-gcflags=all=-N -l" "-ldflags=-compressdwarf=false" -o a.exe

$ /usr/bin/python2.7 script.py /usr/lib64/python2.7/site-packages
Created target
Created breakpoint
Process launched

Digging it, this section of script.py is failing because the stop reason is not the expected value:

32     if state == lldb.eStateStopped:
33       for t in process.threads:
34         if t.GetStopReason() == lldb.eStopReasonBreakpoint:

I added some debugging:

 diff -u script.py script.debug.py 
--- script.py	2018-12-13 16:12:27.539701062 -0800
+++ script.debug.py	2018-12-13 16:33:50.672528818 -0800
@@ -5,6 +5,19 @@
 
 TIMEOUT_SECS = 5
 
+eStopReason = {
+    lldb.eStopReasonNone: "lldb.eStopReasonNone",
+    lldb.eStopReasonTrace: "lldb.eStopReasonTrace",
+    lldb.eStopReasonBreakpoint: "lldb.eStopReasonBreakpoint",
+    lldb.eStopReasonWatchpoint: "lldb.eStopReasonWatchpoint",
+    lldb.eStopReasonSignal: "lldb.eStopReasonSignal",
+    lldb.eStopReasonException: "lldb.eStopReasonException",
+    lldb.eStopReasonExec: "lldb.eStopReasonExec",
+    lldb.eStopReasonPlanComplete: "lldb.eStopReasonPlanComplete",
+    lldb.eStopReasonThreadExiting: "lldb.eStopReasonThreadExiting",
+    lldb.eStopReasonInstrumentation: "lldb.eStopReasonInstrumentation"
+}
+
 debugger = lldb.SBDebugger.Create()
 debugger.SetAsync(True)
 target = debugger.CreateTargetWithFileAndArch("a.exe", None)
@@ -44,6 +57,8 @@
               print "intvar = %s" % (var.GetValue(),)
             else:
               print "no intvar"
+        else:
+          print t
+          print eStopReason[t.GetStopReason()]
     else:
       print "Process state", state
     process.Destroy()

and this produces output indicating "Signal" and "None" stopped reasons:

$ /usr/bin/python2.7 script.debug.py /usr/lib64/python2.7/site-packages
Created target
Created breakpoint
Process launched
thread #2: tid = 4968, 0x00000000004592b3 a.exe`runtime.futex at sys_linux_amd64.s:536, name = 'a.exe', stop reason = signal SIGSTOP
lldb.eStopReasonSignal
thread #3: tid = 4967, 0x0000000000458d1d a.exe`runtime.usleep at sys_linux_amd64.s:131, name = 'a.exe', stop reason = signal SIGSTOP
lldb.eStopReasonSignal
thread #4: tid = 4969, 0x00000000004592b3 a.exe`runtime.futex at sys_linux_amd64.s:536, name = 'a.exe', stop reason = signal SIGSTOP
lldb.eStopReasonSignal
thread #5: tid = 4970, 0x00000000004592b3 a.exe`runtime.futex at sys_linux_amd64.s:536, name = 'a.exe', stop reason = signal SIGSTOP
lldb.eStopReasonSignal
thread #6: tid = 1, 0x00000000004a0070 a.exe`main.main at main.go:11
lldb.eStopReasonNone
thread #7: tid = 2, 0x000000000042dbc4 a.exe`runtime.gopark(unlockf=, lock=, reason=, traceEv=, traceskip=) at proc.go:302
lldb.eStopReasonNone
thread #8: tid = 3, 0x000000000042dbc4 a.exe`runtime.gopark(unlockf=, lock=, reason=, traceEv=, traceskip=) at proc.go:302
lldb.eStopReasonNone
thread #9: tid = 4, 0x000000000042dbc4 a.exe`runtime.gopark(unlockf=, lock=, reason=, traceEv=, traceskip=) at proc.go:302
lldb.eStopReasonNone

TestLldbPython.debug.tar.gz

@agnivade agnivade changed the title TestLldbPython - runtime-lldb_test.go:180: Unexpected lldb output runtime: TestLldbPython fails for lldb 7.0.0 Dec 14, 2018
@agnivade

This comment has been minimized.

Copy link
Contributor

@agnivade agnivade commented Dec 14, 2018

Related to #22299 which fails for older lldb versions. But this one seems to fail even for new ones.

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Dec 14, 2018

@thanm

This comment has been minimized.

Copy link
Member

@thanm thanm commented Dec 14, 2018

Hmm. I poked at this for a while but never go to the point where the test could run on my machine (some bad combination of python, lldb, and python-lldb bindings config I assume). I get:

File "/usr/lib/python2.7/dist-packages/lldb/__init__.py", line 45, in swig_import_helper return importlib.import_module('_lldb') File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module __import__(name) ImportError: No module named _lldb

This is with "sudo apt-get install lldb-7 python-lldb-7" to install the basics. Is there a trick that I am missing?

@jimrobinson

This comment has been minimized.

Copy link
Author

@jimrobinson jimrobinson commented Dec 17, 2018

@thanm Given the name apt-get, I assume you're on a Debian or Ubuntu system? Which revision? I'm on Fedora 29 so I'll need to spin up a VM if I am to try and replicate the issue on another distro.

@thanm

This comment has been minimized.

Copy link
Member

@thanm thanm commented Dec 18, 2018

Yes, I am using a Debian variant. Thanks for the offer of additional testing, but I'd like to discuss the test with a couple of my team-mates first. Looking at the test it's not clear to me exactly what it is supposed to be verifying. If the goal is only to check whether lldb can set a breakpoint on the toy program, then it might better just to run lldb in batch mode as opposed to via the python interface. Stay tuned.

@thanm thanm self-assigned this Dec 18, 2018
@thanm

This comment has been minimized.

Copy link
Member

@thanm thanm commented Dec 19, 2018

@jimrobinson I did manage to get my version 7 lldb + python-lldb installation issues resolved, so I can run test locally, including your debug script -- however I can't seem to reproduce the same bad behavior on my end (I do see the correct stop reason). I'll poke at it some more later today.

@jimrobinson

This comment has been minimized.

Copy link
Author

@jimrobinson jimrobinson commented Dec 19, 2018

@thanm thank you. So can we verify our respective setups?

Linux 4.20.0-0.rc6
Python 2.7.15
lldb version 7.0.0

The libraries python2 links against:

glibc-2.28-23.fc29.x86_64
python2-libs-2.7.15-11.fc29.x86_64

The libraries lldb links against:

clang-libs-7.0.0-2.fc29.x86_64
glibc-2.28-23.fc29.x86_64
libedit-3.1-24.20170329cvs.fc29.x86_64
libffi-3.1-18.fc29.x86_64
libgcc-8.2.1-5.fc29.x86_64
libstdc++-8.2.1-5.fc29.x86_64
libxml2-2.9.8-4.fc29.x86_64
lldb-7.0.0-1.fc29.x86_64
llvm-libs-7.0.0-2.fc29.x86_64
ncurses-libs-6.1-8.20180923.fc29.x86_64
python2-libs-2.7.15-11.fc29.x86_64
xz-libs-5.2.4-3.fc29.x86_64
zlib-1.2.11-14.fc29.x86_64
@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Dec 19, 2018

@thanm, should this be milestoned to 1.12, 1.13, or Unplanned?

@thanm

This comment has been minimized.

Copy link
Member

@thanm thanm commented Dec 19, 2018

At this point I am 99% sure it is a problem with the test, not with the Go compiler or DWARF. I'll put it in 1.13

@thanm thanm added this to the Go1.13 milestone Dec 19, 2018
@siebenmann

This comment has been minimized.

Copy link

@siebenmann siebenmann commented Jan 25, 2019

This didn't used to happen to me in the same environment (Fedora 29 with lldb available), but something changed recently and I'm now getting this exact failure. A potentially relevant piece of information is that I recently applied a Fedora update that updated llvm but not lldb. It was apparently a minor rebuild that only changed the RPM version number (7.0.1-1 to 7.0.1-2), but maybe something gets perturbed even then.

@siebenmann

This comment has been minimized.

Copy link

@siebenmann siebenmann commented Jan 31, 2019

After some more investigation, I've established that what causes this test to start failing for me is switching from Fedora' earlier kernels (up through 4.19.15) to their 4.20.x kernel (either 4.20.3 or 4.20.4 so far). In 4.20.x, the Python script sees lldb stopping at the correct line but with reason eStopReasonNone. This also happens for a version of the test program built with Go 1.11, not just the latest go tip version, and also with Go 1.10 if I remove the go build option to disable compressed DWARF. So I think this is a general issue on modern Linux kernels, possibly a general one that lldb has with anything that uses threads (I haven't tested a threaded C program). Delve appears to be able to do breakpoints properly even on 4.20.x kernels on this test program.

Running lldb by hand on a.exe with a breakpoint set at the same place (with list main.main and then break set -l 10) may also be failing. When I do this, lldb stops, but it only displays four threads that have stopped with SIGSTOP, all of which appear to be internal Go threads:

* thread #2, name = 'a.exe', stop reason = signal SIGSTOP
    frame #0: 0x0000000000459743 a.exe`runtime.futex at sys_linux_amd64.s:536
[...]
  thread #3, name = 'a.exe', stop reason = signal SIGSTOP
    frame #0: 0x00000000004591ad a.exe`runtime.usleep at sys_linux_amd64.s:131
[...]
  thread #4, name = 'a.exe', stop reason = signal SIGSTOP
    frame #0: 0x0000000000459743 a.exe`runtime.futex at sys_linux_amd64.s:536
[...]
  thread #5, name = 'a.exe', stop reason = signal SIGSTOP
    frame #0: 0x00000000004591ad a.exe`runtime.usleep at sys_linux_amd64.s:131

The actual hit breakpoint only shows up if I list all threads:

(lldb) thread list
Process 30484 stopped
* thread #2: tid = 30488, 0x0000000000459743 a.exe`runtime.futex at sys_linux_amd64.s:536, name = 'a.exe', stop reason = signal SIGSTOP
  thread #3: tid = 30487, 0x00000000004591ad a.exe`runtime.usleep at sys_linux_amd64.s:131, name = 'a.exe', stop reason = signal SIGSTOP
  thread #4: tid = 30489, 0x0000000000459743 a.exe`runtime.futex at sys_linux_amd64.s:536, name = 'a.exe', stop reason = signal SIGSTOP
  thread #5: tid = 30490, 0x00000000004591ad a.exe`runtime.usleep at sys_linux_amd64.s:131, name = 'a.exe', stop reason = signal SIGSTOP
  thread #6: tid = 1, 0x00000000004a0b28 a.exe`main.main at main.go:10
  thread #7: tid = 2, 0x000000000042dc34 a.exe`runtime.gopark(unlockf=<unavailable>, lock=<unavailable>, reason=<unavailable>, traceEv=<unavailable>, traceskip=<unavailable>) at proc.go:302
  thread #8: tid = 3, 0x000000000042dc34 a.exe`runtime.gopark(unlockf=<unavailable>, lock=<unavailable>, reason=<unavailable>, traceEv=<unavailable>, traceskip=<unavailable>) at proc.go:302
  thread #9: tid = 17, 0x0000000000417d10 a.exe`runtime.runfinq at mfinal.go:161

If I cont this in lldb, it pauses twice more before printing the hi, but at no point does lldb think that we've stopped in main(), and after the first time thread list says that nothing is even running in main() any more.

Using the same compiled a.exe binary on 4.19.15 with lldb, everything works exactly as it should; lldb immediately stops at the right point and shows what you'd want:

(lldb) break set -l 10
Breakpoint 1: where = a.exe`main.main + 392 at main.go:10, address = 0x00000000004a0b28
(lldb) run
Process 3678 launched: '/tmp/t/a.exe' (x86_64)
Process 3678 stopped
* thread #6, stop reason = breakpoint 1.1
    frame #0: 0x00000000004a0b28 a.exe`main.main at main.go:10
   7            mapvar["ghi"] = "jkl"
   8            intvar := 42
   9            ptrvar := &intvar
-> 10           fmt.Println("hi") // line 10
   11           _ = ptrvar
   12   }

(Since I ran into this myself, a note for people trying to reproduce this: if you extract the main.go program by hand from runtime/runtime-lldb_test.go, note that it needs to have a blank line at the top or all line numbers will be off by one.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.