Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Android NDK build #851

Closed
twaik opened this issue Aug 26, 2019 · 28 comments
Closed

Android NDK build #851

twaik opened this issue Aug 26, 2019 · 28 comments

Comments

@twaik
Copy link

twaik commented Aug 26, 2019

Hi. Can you please make uftrace to be able to be built with Android NDK? NDK has no mcount support, but there is a need to write function execution time trace log. I think there is no need to interpret results on the target device, it can be done on PC using uftrace replay. NDK misses posix_spawn implementation until Android 9, so you can get it here.

@namhyung
Copy link
Owner

Thanks for using uftrace. Unfortunately it doesn't support Android for now. I'm not familiar with the build system and there's some piece of code needs update to be built on non-glibc systems.

And I don't know why you mentioned posix_spawn as it's not used in uftrace (but supported if user application is using it).

@twaik
Copy link
Author

twaik commented Aug 27, 2019

Oh, I am sorry. Termux build was failing because of missing posix_spawn implementation in this file: https://github.com/namhyung/uftrace/blob/master/tests/s-posix_spawn.c .

@twaik
Copy link
Author

twaik commented Aug 27, 2019

Can I ask you to fix uftrace in Termux? I tried to cross-compile uftrace but everything I get is

$ uftrace tool
child terminated by signal: 11: Segmentation fault
WARN: cannot open record data: /data/data/com.termux/files/usr/tmp/uftrace-live-enMZF9: m

@namhyung
Copy link
Owner

No need to be sorry. The posix_spawn is in the test and it should move on even if it fails to build. I'm not aware of the Termux so it's unlikely I can work on it in near future. But can you show me the backtrace of the segfault? You can simply add -v option to uftrace as well.

@twaik
Copy link
Author

twaik commented Aug 29, 2019

$ tool
Hello!
$ uftrace -v tool
uftrace: checking binary tool
uftrace: using /data/data/com.termux/files/usr/lib/libmcount.so library for tracing
uftrace: creating 1 thread(s) for recording
mcount: initializing mcount library
demangle: demangle failed: _ZTSDu
demangle: demangle failed: _ZTIPKDu
demangle: demangle failed: _ZNSt6__ndk110__find_endIPDoFbwwEPKwS4_EET0_S5_S5_T1_S6_T_NS_26random_access_iterator_tagES8_
demangle: demangle failed: _ZTIPDu
demangle: demangle failed: _ZTSPDu
demangle: demangle failed: _ZTSPKDu
demangle: demangle failed: _ZNSt6__ndk110__find_endIPDoFbccEPKcS4_EET0_S5_S5_T1_S6_T_NS_26random_access_iterator_tagES8_
demangle: demangle failed: _ZTIDu
demangle: demangle failed: _ZNSt6__ndk110__find_endIPDoFbccEPKcS4_EET0_S5_S5_T1_S6_T_NS_26random_access_iterator_tagES8_
demangle: demangle failed: _ZNSt6__ndk110__find_endIPDoFbwwEPKwS4_EET0_S5_S5_T1_S6_T_NS_26random_access_iterator_tagES8_
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/u:object_r:default_prop:s0: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/u:object_r:default_prop:s0: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/properties_serial: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/properties_serial: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/u:object_r:debug_prop:s0: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/u:object_r:debug_prop:s0: m
symbol: ELF error when loading symbols: invalid `Elf' handle
plthook: setup PLT hooking "/data/data/com.termux/files/home/tool"
child terminated by signal: 11: Segmentation fault
uftrace: cannot find build-id section
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/u:object_r:default_prop:s0: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/u:object_r:default_prop:s0: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/properties_serial: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/properties_serial: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/u:object_r:debug_prop:s0: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/u:object_r:debug_prop:s0: m
symbol: ELF error when loading symbols: invalid `Elf' handle
uftrace: live-record finished.. 
uftrace: start live-replaying...
WARN: cannot open record data: /data/data/com.termux/files/usr/tmp/uftrace-live-PgVBqm: No data available
uftrace: removing /data/data/com.termux/files/usr/tmp/uftrace-live-PgVBqm directory
$ 

Also I need to notice a bit different structure of bionic binary, e.g. Unwind_Resume() function is placed in libgcc_real.a in NDK and compiled statically to the binary.
Patches for uftrace to be built in Termux: https://github.com/termux/termux-packages/tree/master/packages/uftrace
Build instructions: https://github.com/termux/termux-packages/wiki

@twaik
Copy link
Author

twaik commented Sep 3, 2019

I think you need to skip /dev/* files when you are scanning process maps files.

@twaik
Copy link
Author

twaik commented Sep 3, 2019

I have a gdb log, but I do not understan what I see.

$ gdb -q --args uftrace record tool
Reading symbols from uftrace...
(No debugging symbols found in uftrace)
(gdb) b main
Function "main" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (main) pending.
(gdb) set follow-fork-mode child
(gdb) run
Starting program: /data/data/com.termux/files/usr/bin/uftrace record tool
[Attaching after process 9824 fork to child process 9828]
[New inferior 2 (process 9828)]
[Detaching after fork from parent process 9824]
[Inferior 1 (process 9824) detached]
process 9828 is executing new program: /data/data/com.termux/files/home/tool

Thread 2.1 "tool" received signal SIGSEGV, Segmentation fault.
[Switching to process 9828]
0xb6f355f8 in __dl__ZN12LinkerLogger10ResetStateEv () from /system/bin/linker
(gdb) step
Single stepping until exit from function __dl__ZN12LinkerLogger10ResetStateEv,
which has no line number information.

Thread 2.1 "tool" received signal SIGSEGV, Segmentation fault.
0xb6f353e8 in __dl__ZZL29__linker_init_post_relocationR19KernelArgumentBlockEN3$_28__invokeEv () from /system/bin/linker
(gdb) bt
#0  0xb6f353e8 in __dl__ZZL29__linker_init_post_relocationR19KernelArgumentBlockEN3$_28__invokeEv () from /system/bin/linker
#1  0xb6f3dc3c in __dl__ZL24debuggerd_signal_handleriP7siginfoPv () from /system/bin/linker
#2  <signal handler called>
#3  0xb6f355f8 in __dl__ZN12LinkerLogger10ResetStateEv () from /system/bin/linker
#4  0xb6f2847a in __dl__Z10dlsym_implPvPKcS1_PKv () from /system/bin/linker
#5  0xb6d4e0c6 in dlsym () from /system/lib/libdl.so
#6  0xb6cd5fd4 in ?? () from /data/data/com.termux/files/usr/lib/libmcount-fast.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) quit
A debugging session is active.

	Inferior 2 [process 9828] will be killed.

Quit anyway? (y or n) y
child terminated by signal: 9: Killed

@twaik
Copy link
Author

twaik commented Sep 3, 2019

The error occures in linker_init_post_relocation, but it is not because of dlopen/execve/etc hooks, and not because of wrong mcount name. I can not trace a segfault. Can you please help me?

@namhyung
Copy link
Owner

namhyung commented Sep 8, 2019

Sorry for the late reply. I don't know what's the exact reason of the segfault, but dlsym() was called during PLT hooking in the initialization phase. Not sure why it's a problem, maybe need to look at the implementation of linker.

@twaik
Copy link
Author

twaik commented Sep 8, 2019

Looks like Ubuntu arm segfaults too. 0.9 works fine, but the latest segfaults.

@namhyung
Copy link
Owner

namhyung commented Sep 8, 2019

Do you mean the packaged version or latest master?

@twaik
Copy link
Author

twaik commented Sep 8, 2019

Latest master

@honggyukim
Copy link
Collaborator

Hi @twaik. Thanks for the work. If 0.9 works fine, then could you please bisect from 0.9 to latest master to find the problematic commit?

@twaik
Copy link
Author

twaik commented Sep 9, 2019

I am sorry, but I did not test. @xeffyr did.

@honggyukim
Copy link
Collaborator

I see. Thanks anyway.

@ghost
Copy link

ghost commented Sep 9, 2019

Latest commit on master seems work fine.

root@localhost:~# uftrace record main
func1
func2
func3
root@localhost:~# uftrace replay
# DURATION     TID     FUNCTION
   6.346 us [ 27379] | __monstartup();
   4.385 us [ 27379] | __cxa_atexit();
            [ 27379] | main() {
            [ 27379] |   func1() {
 122.807 us [ 27379] |     puts();
            [ 27379] |     func2() {
  56.692 us [ 27379] |       puts();
  65.692 us [ 27379] |     } /* func2 */
 201.616 us [ 27379] |   } /* func1 */
            [ 27379] |   func3() {
  54.692 us [ 27379] |     puts();
  63.538 us [ 27379] |   } /* func3 */
 277.500 us [ 27379] | } /* main */

The segfault was on tag v0.9.3 and happened only when traced program compiled with gcc -pg flag:

root@localhost:~# gdb -q --args uftrace record main
Reading symbols from uftrace...
(gdb) run     
Starting program: /usr/local/bin/uftrace record main

Program received signal SIGSEGV, Segmentation fault.
0xf77c9ee0 in ?? () from /lib/ld-linux-armhf.so.3

I can do a git bisect, but since on latest master this issue is not reproducible I think no point to do it.


Regarding issue on Android ARM, I can't reproduce segfault but I'm getting a bit different problem when traced program built with clang -pg:

~ $ uftrace record a.out 
func1
func2
func3
~ $ uftrace replay
WARN: cannot open record data: uftrace.data: No data available

This doesn't happen when program is compiled with -finstrument-functions.
Seems more clang-related issue.

@namhyung
Copy link
Owner

namhyung commented Sep 9, 2019

@xeffyr I agree you don't need to bisect if latest master works well.

Could you please run uftrace record with -v option and show me the disassembly of few functions like main()?

@ghost
Copy link

ghost commented Sep 9, 2019

uftrace record -v on Android:

~ $ uftrace record -v a.out 
uftrace: checking binary a.out
uftrace: removing uftrace.data.old directory
uftrace: skipping perf event due to error: m
uftrace: please check /proc/sys/kernel/perf_event_paranoid
uftrace: creating 2 thread(s) for recording
uftrace: using /data/data/com.termux/files/usr/lib/libmcount.so library for tracing
mcount: initializing mcount library
demangle: demangle failed: _ZTSDu
demangle: demangle failed: _ZTIPKDu
demangle: demangle failed: _ZNSt6__ndk110__find_endIPDoFbwwEPKwS4_EET0_S5_S5_T1_S6_T_NS_26random_access_iterator_tagES8_
demangle: demangle failed: _ZTIPDu
demangle: demangle failed: _ZTSPDu
demangle: demangle failed: _ZTSPKDu
demangle: demangle failed: _ZNSt6__ndk110__find_endIPDoFbccEPKcS4_EET0_S5_S5_T1_S6_T_NS_26random_access_iterator_tagES8_
demangle: demangle failed: _ZTIDu
demangle: demangle failed: _ZNSt6__ndk110__find_endIPDoFbccEPKcS4_EET0_S5_S5_T1_S6_T_NS_26random_access_iterator_tagES8_
demangle: demangle failed: _ZNSt6__ndk110__find_endIPDoFbwwEPKwS4_EET0_S5_S5_T1_S6_T_NS_26random_access_iterator_tagES8_
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/u:object_r:default_prop:s0: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/u:object_r:default_prop:s0: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/properties_serial: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/properties_serial: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/property_info: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/property_info: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/u:object_r:debug_prop:s0: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/u:object_r:debug_prop:s0: m
symbol: ELF error when loading symbols: invalid `Elf' handle
plthook: setup PLT hooking "/data/data/com.termux/files/home/a.out"
mcount: mcount setup done
func1
func2
func3
uftrace: child terminated with exit code: 0
uftrace: cannot find build-id section
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/u:object_r:default_prop:s0: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/u:object_r:default_prop:s0: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/properties_serial: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/properties_serial: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/property_info: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/property_info: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/u:object_r:debug_prop:s0: m
symbol: ELF error when loading symbols: invalid `Elf' handle
symbol: error during open symbol file: /dev/__properties__/u:object_r:debug_prop:s0: m
symbol: ELF error when loading symbols: invalid `Elf' handle

Disassembly of a.out. Only for main() and func1:

00000580 <func1>:
 580:	e92d4800 	push	{fp, lr}
 584:	e1a0b00d 	mov	fp, sp
 588:	e24dd008 	sub	sp, sp, #8
 58c:	ebffffae 	bl	44c <mcount@plt>
 590:	e59f0018 	ldr	r0, [pc, #24]	; 5b0 <func1+0x30>
 594:	e08f0000 	add	r0, pc, r0
 598:	ebffffb1 	bl	464 <puts@plt>
 59c:	e58d0004 	str	r0, [sp, #4]
 5a0:	ebffffea 	bl	550 <func2>
 5a4:	e1a0d00b 	mov	sp, fp
 5a8:	e8bd4800 	pop	{fp, lr}
 5ac:	e12fff1e 	bx	lr
 5b0:	00000058 	.word	0x00000058

000005b4 <main>:
 5b4:	e92d4800 	push	{fp, lr}
 5b8:	e1a0b00d 	mov	fp, sp
 5bc:	e24dd008 	sub	sp, sp, #8
 5c0:	ebffffa1 	bl	44c <mcount@plt>
 5c4:	e3a0e000 	mov	lr, #0
 5c8:	e58de004 	str	lr, [sp, #4]
 5cc:	e58de000 	str	lr, [sp]
 5d0:	ebffffea 	bl	580 <func1>
 5d4:	ebffffd1 	bl	520 <func3>
 5d8:	e59d0000 	ldr	r0, [sp]
 5dc:	e1a0d00b 	mov	sp, fp
 5e0:	e8bd4800 	pop	{fp, lr}
 5e4:	e12fff1e 	bx	lr

Complete at: https://pastebin.com/ViTLxcXJ

@namhyung
Copy link
Owner

namhyung commented Sep 9, 2019

Thanks for doing that. I found that your clang generate incompatible binary pattern than we expect from gcc. Please see the comments in arch/arm/mcount.S.

@namhyung
Copy link
Owner

namhyung commented Sep 9, 2019

Could you please test the completed untested patch?

diff --git a/arch/arm/mcount.S b/arch/arm/mcount.S
index bbfc597..eb28831 100644
--- a/arch/arm/mcount.S
+++ b/arch/arm/mcount.S
@@ -60,6 +60,26 @@ GLOBAL(__gnu_mcount_nc)
        bx      ip
 END(__gnu_mcount_nc)
 
+/* for clang */
+GLOBAL(mcount)
+       push    {r0-r3, lr}
+       ands    r3, lr, #1  /* check lr for ARM/THUMB detection */
+       add     r0, fp, #4  /* r0 points to pushed LR  */
+       bne     1f
+       ldr     r1, [fp]    /* fp (=r11) might point to return address on ARM */
+       ldr     r2, [r0]
+       cmp     r1, r2
+       moveq   r0, lr
+1:
+       mov     r1, lr      /* child ip */
+       mov     r2, sp      /* mcount_args */
+
+       bl      mcount_entry
+
+       pop     {r0-r3, lr}
+       bx      lr
+END(mcount)
+
 
 ENTRY(mcount_return)
        push    {r0-r3, lr, pc}  /* ensure 8-byte alignment */

@ghost
Copy link

ghost commented Sep 10, 2019

@namhyung After applying this patch, it works.

~ $ file main
main: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /system/bin/linker, not stripped
~ $ uftrace record main
func1
func2
func3
~ $ uftrace replay
# DURATION     TID     FUNCTION
            [ 21950] | main() {
            [ 21950] |   func1() {
  57.500 us [ 21950] |     func2();
 221.039 us [ 21950] |   } /* func1 */
  58.500 us [ 21950] |   func3();
 299.000 us [ 21950] | } /* main */

ghost pushed a commit to termux/termux-packages that referenced this issue Sep 10, 2019
@honggyukim
Copy link
Collaborator

That's great! @namhyung Is there any reason to make GLOBAL(mcount) slightly different from the original GLOBAL(__gnu_mcount_nc)? If there's no reason, we can just make it as an alias.

@twaik
Copy link
Author

twaik commented Sep 11, 2019

You can add #ifdef __clang__ to make sure it builds only for clang.

@namhyung
Copy link
Owner

@honggyukim there's a difference how lr register was saved before calling mcount. So they cannot be merged.

@xeffyr good to see it works, thanks for the testing!

@twaik it's not a problem of building uftrace. It cares the target binary which could be built by gcc or clang. So the marco guard won't work IMO.

@twaik
Copy link
Author

twaik commented Sep 11, 2019

I mean you can change mcount asm code correspond the target compiler. Patched mcount code for clang or original __gnu_mcount_nc code for gcc.

@namhyung
Copy link
Owner

Are you saying that it should do it runtime? We cannot (or might not) know which compiler was used for the target binary when running it with uftrace.

@twaik
Copy link
Author

twaik commented Sep 11, 2019

So the patch effectiveness depends on the target binary compiler?

@honggyukim
Copy link
Collaborator

If uftrace is built with gcc, but tries to run it with the target binary, which is compile with clang, then we should still need GLOBAL(mcount) to support both gcc compile binary and clang compile binary.

hqsz pushed a commit to hqsz/uftrace that referenced this issue Sep 17, 2019
It was reported that clang generates different code pattern for
calling mcount().  Unlike gcc uses __gnu_mcount_nc() for compatibility
it just use the old mcount() name and saves fp and lr registers before
calling it.  It seems to be the case in Android NDK build (or termux?)

Fixed: namhyung#851

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
amuramatsu pushed a commit to amuramatsu/termux-packages that referenced this issue Dec 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants