Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GAPIT trace sometimes fails ApplicationLoaders.GetClassLoader #1116

Closed
baldwinn860 opened this issue Sep 18, 2017 · 12 comments
Closed

GAPIT trace sometimes fails ApplicationLoaders.GetClassLoader #1116

baldwinn860 opened this issue Sep 18, 2017 · 12 comments

Comments

@baldwinn860
Copy link
Contributor

Seeing this in Robot, there are a few instances (it seems to be random) where gapit is waiting for ApplicationLoaders.getClassLoader() but never moves on from it, leaving the trace to run forever. Typically afterwards gapit will wait for Application.onCreate() but that never happens. The application has already started on the device, and so it just runs infinitely.
Sample failing gapit log:

18:12:28.716 I: [start] <gapit> Turning device screen on
18:12:28.747 I: [start] <gapit> Checking for lockscreen
18:12:28.780 I: [start] <gapit> Checking gapid.apk is installed
18:12:28.780 I: [gapidapk.EnsureInstalled⇒start] <gapit> Examining gapid.apk on host...
18:12:28.784 I: [gapidapk.EnsureInstalled⇒start] <gapit> Looking at installed packages...
18:12:29.953 I: [gapidapk.EnsureInstalled⇒start] <gapit> Found gapid package...
18:12:29.953 I: [start] <gapit> Forwarding
18:12:29.981 I: [start] <gapit> Starting activity in debug mode
18:12:30.729 I: [start] <gapit> Forwarding TCP port 45583 -> JDWP pid 7181
18:12:30.734 I: [start] <gapit> Connecting to JDWP
18:12:30.831 I: [start] <gapit> Waiting for ApplicationLoaders.getClassLoader()

I used SIGKILL to terminate the process, so I didn't get the stack trace. Next time I see the issue I will be sure to use SIGABRT instead.

@baldwinn860
Copy link
Contributor Author

baldwinn860 commented Sep 21, 2017

SIGABRT: abort
PC=0x4b96f3 m=2 sigcode=0

goroutine 0 [idle]:
runtime.futex(0x123a3f8, 0x0, 0x7f1d0139cd28, 0x0, 0x7f1d00000000, 0x4b9386, 0x3c, 0x0, 0x7f1d0139cd70, 0x46ff90, ...)
	/tmpfs/src/go/src/runtime/sys_linux_amd64.s:423 +0x23
runtime.futexsleep(0x123a3f8, 0x7f1d00000000, 0xdf8475800)
	/tmpfs/src/go/src/runtime/os_linux.go:62 +0xd7
runtime.notetsleep_internal(0x123a3f8, 0xdf8475800, 0x0)
	/tmpfs/src/go/src/runtime/lock_futex.go:174 +0xd0
runtime.notetsleep(0x123a3f8, 0xdf8475800, 0x489e1b01f901)
	/tmpfs/src/go/src/runtime/lock_futex.go:194 +0x56
runtime.sysmon()
	/tmpfs/src/go/src/runtime/proc.go:3805 +0x135
runtime.mstart1()
	/tmpfs/src/go/src/runtime/proc.go:1179 +0x11e
runtime.mstart()
	/tmpfs/src/go/src/runtime/proc.go:1149 +0x64

goroutine 1 [select, 45 minutes]:
github.com/google/gapid/core/java/jdwp.(*Connection).WatchEvents(0xc420022bd0, 0x120c340, 0xc420b68780, 0xc42021b298)
	/tmpfs/src/github/src/github.com/google/gapid/core/java/jdwp/event.go:34 +0x1fc
github.com/google/gapid/core/java/jdwp.(*Connection).WaitForMethodEntry(0xc420022bd0, 0x120c340, 0xc420b68780, 0x2, 0x6facdf4c, 0x0, 0x0, 0x0, 0x0)
	/tmpfs/src/github/src/github.com/google/gapid/core/java/jdwp/helpers.go:110 +0x1bf
github.com/google/gapid/gapii/client.waitForVulkanLoad(0x120c340, 0xc420b68780, 0xc420022bd0, 0x2f, 0x0, 0x0)
	/tmpfs/src/github/src/github.com/google/gapid/gapii/client/jdwp_loader.go:78 +0x1dd
github.com/google/gapid/gapii/client.(*Process).loadAndConnectViaJDWP(0xc420b68240, 0x120c400, 0xc420d62ab0, 0xc42017b360, 0x1d49, 0x1212a00, 0xc4202275a0, 0x0, 0x0)
	/tmpfs/src/github/src/github.com/google/gapid/gapii/client/jdwp_loader.go:151 +0x745
github.com/google/gapid/gapii/client.StartOrAttach(0x120c400, 0xc4201e4960, 0xc420199cb0, 0xc42021ba78, 0x5, 0x0, 0x10000001ffffffff, 0x7ffd1636df97, 0x40, 0x0, ...)
	/tmpfs/src/github/src/github.com/google/gapid/gapii/client/adb.go:145 +0xea9
main.(*traceVerb).captureADB(0xc4201a3c20, 0x120c400, 0xc4201e4960, 0xc4201d1ba0, 0x0, 0x0, 0x1, 0xc4201e4a20, 0xc4201e4510, 0xc4200101b0, ...)
	/tmpfs/src/github/src/github.com/google/gapid/cmd/gapit/trace.go:293 +0x30c
main.(*traceVerb).Run(0xc4201a3c20, 0x120c400, 0xc4201e4960, 0xc4201d1ba0, 0x0, 0x0, 0x1, 0xc4201e4a20, 0xc4201e4510, 0xc4200101b0, ...)
	/tmpfs/src/github/src/github.com/google/gapid/cmd/gapit/trace.go:111 +0x2e7
github.com/google/gapid/core/app.(*Verb).Invoke(0x123a260, 0x120c400, 0xc4201e4960, 0xc4200100f0, 0xd, 0xd, 0x1206180, 0x11f90b0)
	/tmpfs/src/github/src/github.com/google/gapid/core/app/verbs.go:89 +0x36c
github.com/google/gapid/core/app.VerbMain(0x120c400, 0xc4201e4960, 0xc420081748, 0xc4201d1b80)
	/tmpfs/src/github/src/github.com/google/gapid/core/app/verbs.go:120 +0x7b
github.com/google/gapid/core/app.Run(0xc84dd8)
	/tmpfs/src/github/src/github.com/google/gapid/core/app/run.go:159 +0x42e
main.main()
	/tmpfs/src/github/src/github.com/google/gapid/cmd/gapit/main.go:24 +0x77

goroutine 17 [syscall, 45 minutes, locked to thread]:
runtime.goexit()
	/tmpfs/src/go/src/runtime/asm_amd64.s:2197 +0x1

goroutine 5 [syscall, 45 minutes]:
os/signal.signal_recv(0x0)
	/tmpfs/src/go/src/runtime/sigqueue.go:116 +0x104
os/signal.loop()
	/tmpfs/src/go/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.1
	/tmpfs/src/go/src/os/signal/signal_unix.go:28 +0x41

goroutine 8 [select, 45 minutes, locked to thread]:
runtime.gopark(0xc862d0, 0x0, 0xc5f691, 0x6, 0x18, 0x2)
	/tmpfs/src/go/src/runtime/proc.go:271 +0x13a
runtime.selectgoImpl(0xc42004af50, 0x0, 0x18)
	/tmpfs/src/go/src/runtime/select.go:423 +0x1364
runtime.selectgo(0xc42004af50)
	/tmpfs/src/go/src/runtime/select.go:238 +0x1c
runtime.ensureSigM.func1()
	/tmpfs/src/go/src/runtime/signal_unix.go:434 +0x2dd
runtime.goexit()
	/tmpfs/src/go/src/runtime/asm_amd64.s:2197 +0x1

goroutine 7 [chan receive, 45 minutes]:
github.com/google/gapid/core/log.Channel.func1(0xc4200819e0, 0xc420081980, 0x1205700, 0xc4201f6510)
	/tmpfs/src/github/src/github.com/google/gapid/core/log/channel.go:25 +0x78
created by github.com/google/gapid/core/log.Channel
	/tmpfs/src/github/src/github.com/google/gapid/core/log/channel.go:31 +0xa3

goroutine 9 [chan receive, 45 minutes]:
github.com/google/gapid/core/app.handleAbortSignals.func1(0xc420081a40, 0xc4201f64b0)
	/tmpfs/src/github/src/github.com/google/gapid/core/app/atexit.go:80 +0x40
created by github.com/google/gapid/core/app.handleAbortSignals
	/tmpfs/src/github/src/github.com/google/gapid/core/app/atexit.go:82 +0xe9

goroutine 87 [chan receive, 45 minutes]:
github.com/google/gapid/core/app.AddCleanup.func1(0xc420180570, 0x120c400, 0xc420d82ae0, 0xc420d83080)
	/tmpfs/src/github/src/github.com/google/gapid/core/app/atexit.go:54 +0x8a
created by github.com/google/gapid/core/app.AddCleanup
	/tmpfs/src/github/src/github.com/google/gapid/core/app/atexit.go:56 +0x6f

goroutine 106 [IO wait, 45 minutes]:
net.runtime_pollWait(0x7f1d03838f00, 0x72, 0x3)
	/tmpfs/src/go/src/runtime/netpoll.go:164 +0x59
net.(*pollDesc).wait(0xc420022bc8, 0x72, 0x1202bc0, 0x11f9160)
	/tmpfs/src/go/src/net/fd_poll_runtime.go:75 +0x38
net.(*pollDesc).waitRead(0xc420022bc8, 0xc420b68cd0, 0x4)
	/tmpfs/src/go/src/net/fd_poll_runtime.go:80 +0x34
net.(*netFD).Read(0xc420022b60, 0xc420b68cd0, 0x4, 0x8, 0x0, 0x1202bc0, 0x11f9160)
	/tmpfs/src/go/src/net/fd_unix.go:250 +0x1b7
net.(*conn).Read(0xc420d664c8, 0xc420b68cd0, 0x4, 0x8, 0x0, 0x0, 0x0)
	/tmpfs/src/go/src/net/net.go:181 +0x70
io.ReadAtLeast(0x7f1cfd0110b0, 0xc420d664c8, 0xc420b68cd0, 0x4, 0x8, 0x4, 0xc420180ef8, 0xc42004bc88, 0x4725ac)
	/tmpfs/src/go/src/io/io.go:307 +0xa9
io.ReadFull(0x7f1cfd0110b0, 0xc420d664c8, 0xc420b68cd0, 0x4, 0x8, 0xc4200001a0, 0x4, 0xc42004bcd8)
	/tmpfs/src/go/src/io/io.go:325 +0x58
github.com/google/gapid/core/data/endian.(*reader).Uint32(0xc420b68cc0, 0xc420026050)
	/tmpfs/src/github/src/github.com/google/gapid/core/data/endian/endian.go:178 +0x6a
github.com/google/gapid/core/java/jdwp.(*Connection).readPacket(0xc420022bd0, 0xc420b68780, 0xc42004bf00, 0xc420022d48, 0x1)
	/tmpfs/src/github/src/github.com/google/gapid/core/java/jdwp/packet.go:75 +0x4f
github.com/google/gapid/core/java/jdwp.(*Connection).recv(0xc420022bd0, 0x120c340, 0xc420b68780)
	/tmpfs/src/github/src/github.com/google/gapid/core/java/jdwp/recv.go:35 +0x81
created by github.com/google/gapid/core/java/jdwp.Open
	/tmpfs/src/github/src/github.com/google/gapid/core/java/jdwp/jdwp.go:82 +0x4a0

rax    0xfffffffffffffffc
rbx    0x0
rcx    0x4b96f3
rdx    0x0
rdi    0x123a3f8
rsi    0x0
rbp    0x7f1d0139cd38
rsp    0x7f1d0139ccf0
r8     0x0
r9     0x0
r10    0x7f1d0139cd28
r11    0x246
r12    0x0
r13    0x0
r14    0x7f1d0139d9c0
r15    0x7f1d0139d700
rip    0x4b96f3
rflags 0x246
cs     0x33
fs     0x0
gs     0x0

@baldwinn860
Copy link
Contributor Author

baldwinn860 commented Sep 21, 2017

Things that jump out to me (not knowing about the GAPII code at all) are:

  • waitForVulkanLoad when the titles in question don't use vulkan
  • This happens more often than not on one device/APK combo (contact me for details)
  • It has occurred on other device/APK's but less frequently
  • The APK was clearly waiting for the Debugger to attach on the device.

@AWoloszyn
Copy link
Contributor

waitForVulkanLoad is probably a red-herring. We use this method to wait for ApplicationLoaders.getClassLoader(). It will only get called if that method exists. So since we are hitting this code-path, then it exists.

Things that it could be:

  1. ApplicationLoaders.getClassLoader() is called before we even manage to attach to jdwp
  2. We fail to resume the thread in JDWP, so we basically wait forever.
  3. ApplicationLoaders.getClassLoader() is not actually called always?

@ben-clayton ben-clayton added the P1 label Oct 2, 2017
@ben-clayton ben-clayton added this to the V1.0 milestone Oct 2, 2017
@tombondee
Copy link

Hi,

I'm having the same issue with the latest release. Getting stuck on "Waiting for ApplicationLoaders.getClassLoader()". Any ideas and I'm using a Vivo phone. Would you recommend trying another phone? Also what are the trace commands I should be using to capture everything I need to debug opengles 3.2 offscreen fbo black screen issues. Thanks.

Here is my session info:

./gapit.exe trace -android-package com.eecolor.sRBGSimulatorForDCIP3Display -android-at
tach -record-inputs -api gles
Press enter to stop capturing...
04:17:19.672 I: [gapidapk.EnsureInstalled] Examining gapid.apk on host...
04:17:19.676 I: [gapidapk.EnsureInstalled] Looking for gapid.apk...
04:17:20.316 I: [gapidapk.EnsureInstalled] Found gapid package...
04:17:22.030 I: Adding new device
04:17:22.030 I: Device list:
04:17:22.030 I: 8f9eb4c5
04:17:22.648 I: Package is debuggable
04:17:22.804 I: [start] Turning device screen on
04:17:22.902 I: [start] Checking for lockscreen
04:17:23.007 I: [start] Checking gapid.apk is installed
04:17:23.007 I: [start] Forwarding
04:17:23.102 I: [start] No start activity selected - trying to attach...
04:17:25.459 I: [start] Forwarding TCP port 60078 -> JDWP pid 3944
04:17:25.523 I: [start] Connecting to JDWP
04:17:26.552 I: [start] Waiting for ApplicationLoaders.getClassLoader()

@ben-clayton
Copy link
Contributor

Hi @tombondee, thank you for reporting this.

@baldwinn860 - please can you see if #1225 has fixed this problem?

@dsrbecky
Copy link
Contributor

@tombondee The command line you are using is suspicious. --android-attach is supposed to be used to attach to a running process which is probably not what you want. It has very specific use cases like tracing CTS tests.

Can you try just "./gapit.exe trace sRBGSimulatorForDCIP3Display"?

@baldwinn860
Copy link
Contributor Author

It seems #1225 has not fixed this issue yet. The latest test on robot, taken after this change, still reproduces the problem.

@ben-clayton
Copy link
Contributor

I suspect this might have been fixed by 4e24571.
Nick can you please check whether this is still happening?

@baldwinn860
Copy link
Contributor Author

This is still happening after 4e24571.

@ben-clayton
Copy link
Contributor

So I have a theory to what's happening here. Let's go through the events of jdwp_loader.go:

  1. The application is launched with wait-for-debugger. This means the application will not progress Java execution until the debugger has attached and has then stopped issuing commands for a short duration of time (see logcat's "waiting for debugger to settle..." messages).
  2. loadAndConnectViaJDWP connects to the application and immediately sets a breakpoint on android.app.ApplicationLoaders.getClassLoader. GAPIT will then block waiting for this breakpoint to hit. As GAPIT is blocked, no more JDWP commands are sent to the app, and so it 'settles'. The app starts running.
  3. The breakpoint for getClassLoader is reached. We specify that the thread that calls this function should be suspended.
  4. With the thread suspended, we fiddle with librarySearchPaths to add GAPII's vulkan layers to the search paths. loadAndConnectViaJDWP's execution then goes on to set a breakpoint in Application.onCreate. The suspended thread is automatically resumed by WatchEvents so the new breakpoint can be hit.
  5. Once we've reached onCreate we do a bunch'o'stuff to load the interceptor and get things going.
  6. When our JDWP connection is cut, the app resumes any suspended threads.

So - my theory is that the thread that calls getClassLoader is not the same thread that calls onCreate. Because they're not the same thread, the application is free to execute onCreate before we've reached 4.

To test this theory we could simply print the thread ID that each of these functions were called on:

	getClassLoader, err := waitForVulkanLoad(ctx, conn)
	log.I(ctx, "getClassLoader was called on thread %v", getClassLoader.Thread)

...

	onCreate, err := waitForOnCreate(ctx, conn, classLoaderThread)
	log.I(ctx, "onCreate was called on thread %v", onCreate.Thread)

My guess is that on the devices that have this issue, these two threads are typically different.

@baldwinn860 - I'd be interested to know what this log lines say for your troublesome devices.

@pmuetschard
Copy link
Member

I would be surprised if there's more than one thread, though, if I recall correctly, when registering the event, you can ask the JVM to suspend all threads when the event is hit. We could try that and see if it fixes it.

BTW, onCreate and getClassLoader both get called from LoadedApk

@ben-clayton ben-clayton added P2 and removed P1 labels Oct 30, 2017
ben-clayton added a commit to ben-clayton/gapid that referenced this issue Nov 1, 2017
Add flag to print device logcat.
Print message if JDWP fails to connect.

These changes are to try to identify the cause of google#1116.
ben-clayton added a commit that referenced this issue Nov 1, 2017
Add flag to print device logcat.
Print message if JDWP fails to connect.

These changes are to try to identify the cause of #1116.
@ben-clayton
Copy link
Contributor

We concluded that this was a screen-configuration change issue, where apps was being restarted by the OS.
Let's re-open if we discover that this is not the case.

purvisa-at-google-com pushed a commit that referenced this issue Sep 29, 2022
Fix segfault in validation failures, fix case where validation sometimes always passes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants