Dynamic/Full Dynamic tracing error on ubuntu 20.04 #1231

Closed
Rexyyj opened this issue Jan 20, 2021 · 42 comments

Comments

@Rexyyj

Rexyyj commented Jan 20, 2021

Hello, when I tested full dynamic tracing on Ubuntu 20.04, I got the following error output:

rex@rex-ubuntu:~/test_ws/dynamic$ gcc -o test test.c
rex@rex-ubuntu:~/test_ws/dynamic$ uftrace -P . test
WARN: Segmentation fault: invalid permission (addr: 0x557f610d017a)
WARN:  if this happens only with uftrace, please consider -e/--estimate-return option.
WARN: Backtrace from uftrace v0.9.4-193-g64947 ( dwarf python luajit tui perf sched dynamic )
WARN: =====================================
WARN: [0] (main[557f610d017f] <= __libc_start_main[7f7ae02680b3])
Please report this bug to https://github.com/namhyung/uftrace/issues.
WARN: child terminated by signal: 11: Segmentation fault
# DURATION     TID     FUNCTION
            [  9433] | main() {
uftrace stopped tracing with remaining functions
================================================
task: 9433
[0] main

When I ran the same test on Ubuntu 18.04, the output was as expected, with no errors during tracing.
The code in test.c is:

#include <stdio.h>
void b(){
    printf("Hellow");
}

void a(){
    b();
}

int main(){
    a();
    return 0;
}

When testing dynamic tracing, gcc fails at compile time with the command from the tutorial:

rex@rex-ubuntu:~/test_ws/dynamic$ gcc -pg -mfentry -mnop-mcount  -o test2 test.c
cc1: error: ‘-mnop-mcount’ is not implemented for ‘-fPIC’

This can be worked around by adding -fno-pic -no-pie:

rex@rex-ubuntu:~/test_ws/dynamic$ gcc -pg -mfentry -mnop-mcount -fno-pic -no-pie -o test2 test.c

But tracing the resulting binary then fails with the following error:

rex@rex-ubuntu:~/test_ws/dynamic$ uftrace -P . test2
WARN: Segmentation fault: invalid permission (addr: 0x4010c0)
WARN:  if this happens only with uftrace, please consider -e/--estimate-return option.

WARN: Backtrace from uftrace v0.9.4-193-g64947 ( dwarf python luajit tui perf sched dynamic )
WARN: =====================================
WARN: [0] (__gmon_start__[4010c5] <= <401016>[401016])

Please report this bug to https://github.com/namhyung/uftrace/issues.

WARN: child terminated by signal: 11: Segmentation fault
# DURATION     TID     FUNCTION
            [ 10325] | __gmon_start__() {

uftrace stopped tracing with remaining functions
================================================
task: 10325
[0] __gmon_start__

@honggyukim
Collaborator

honggyukim commented Jan 22, 2021

Thanks very much for reporting this bug. I can reproduce the problem on Ubuntu 20.04.

$ uftrace -P main --debug-domain dynamic:3 a.out
dynamic: dynamic patch type: a.out: 0 (none)
dynamic: patch normal func: main (patch size: 5)
dynamic: patched all (1) functions in 'a.out'
WARN: Segmentation fault: invalid permission (addr: 0x7ffb299bc17a)
WARN:  if this happens only with uftrace, please consider -e/--estimate-return option.

WARN: Backtrace from uftrace v0.9.4-220-g42fc ( x86_64 dwarf python luajit tui perf sched dynamic )
WARN: =====================================
WARN: [0] (main[7ffb299bc17f] <= __libc_start_main[7ffb297670b3])

Please report this bug to https://github.com/namhyung/uftrace/issues.

WARN: child terminated by signal: 11: Segmentation fault
# DURATION     TID     FUNCTION
            [  1470] | main() {

uftrace stopped tracing with remaining functions
================================================
task: 1470
[0] main

I will take a look when I have some time.

@honggyukim
Collaborator

Hmm.. It shows endbr64 instructions at the entry of each function.

0000000000001149 <b>:
    1149:       f3 0f 1e fa             endbr64
    114d:       55                      push   %rbp
    114e:       48 89 e5                mov    %rsp,%rbp
    1151:       48 8d 3d ac 0e 00 00    lea    0xeac(%rip),%rdi        # 2004 <_IO_stdin_used+0x4>
    1158:       b8 00 00 00 00          mov    $0x0,%eax
    115d:       e8 ee fe ff ff          callq  1050 <printf@plt>
    1162:       90                      nop
    1163:       5d                      pop    %rbp
    1164:       c3                      retq

0000000000001165 <a>:
    1165:       f3 0f 1e fa             endbr64
    1169:       55                      push   %rbp
    116a:       48 89 e5                mov    %rsp,%rbp
    116d:       b8 00 00 00 00          mov    $0x0,%eax
    1172:       e8 d2 ff ff ff          callq  1149 <b>
    1177:       90                      nop
    1178:       5d                      pop    %rbp
    1179:       c3                      retq

000000000000117a <main>:
    117a:       f3 0f 1e fa             endbr64
    117e:       55                      push   %rbp
    117f:       48 89 e5                mov    %rsp,%rbp
    1182:       b8 00 00 00 00          mov    $0x0,%eax
    1187:       e8 d9 ff ff ff          callq  1165 <a>
    118c:       b8 00 00 00 00          mov    $0x0,%eax
    1191:       5d                      pop    %rbp
    1192:       c3                      retq
    1193:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
    119a:       00 00 00
    119d:       0f 1f 00                nopl   (%rax)

@honggyukim
Collaborator

We've already seen endbr64 in #738.
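
For reference, the endbr64 prefix in the listing above is the 4-byte sequence f3 0f 1e fa. Below is a minimal sketch in C of how a patcher could detect that prefix before rewriting a function entry. The helper names are hypothetical, and this is not taken from uftrace's code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encoding of endbr64 as shown in the disassembly: f3 0f 1e fa. */
static const uint8_t ENDBR64[] = { 0xf3, 0x0f, 0x1e, 0xfa };

/* Return true if the bytes at `insn` begin with endbr64. */
static bool starts_with_endbr64(const uint8_t *insn, size_t len)
{
    return len >= sizeof(ENDBR64) &&
           memcmp(insn, ENDBR64, sizeof(ENDBR64)) == 0;
}

/*
 * Hypothetical helper: offset of the first instruction after an optional
 * endbr64 prefix, i.e. where the regular prologue (push %rbp ...) starts.
 */
static size_t prologue_offset(const uint8_t *insn, size_t len)
{
    return starts_with_endbr64(insn, len) ? sizeof(ENDBR64) : 0;
}
```

With the bytes of main from the listing (f3 0f 1e fa 55 48 89 e5), prologue_offset() returns 4, while a binary built without the prefix gives 0.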

@honggyukim
Collaborator

honggyukim commented Jan 22, 2021

The gcc info is as follows.

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: \
        ../src/configure \
        -v \
        --with-pkgversion='Ubuntu 9.3.0-17ubuntu1~20.04' \
        --with-bugurl=file:///usr/share/doc/gcc-9/README.Bugs \
        --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,gm2 \
        --prefix=/usr \
        --with-gcc-major-version-only \
        --program-suffix=-9 \
        --program-prefix=x86_64-linux-gnu- \
        --enable-shared \
        --enable-linker-build-id \
        --libexecdir=/usr/lib \
        --without-included-gettext \
        --enable-threads=posix \
        --libdir=/usr/lib \
        --enable-nls \
        --enable-clocale=gnu \
        --enable-libstdcxx-debug \
        --enable-libstdcxx-time=yes \
        --with-default-libstdcxx-abi=new \
        --enable-gnu-unique-object \
        --disable-vtable-verify \
        --enable-plugin \
        --enable-default-pie \
        --with-system-zlib \
        --with-target-system-zlib=auto \
        --enable-objc-gc=auto \
        --enable-multiarch \
        --disable-werror \
        --with-arch-32=i686 \
        --with-abi=m64 \
        --with-multilib-list=m32,m64,mx32 \
        --enable-multilib \
        --with-tune=generic \
        --enable-offload-targets=nvptx-none=/build/gcc-9-HskZEa/gcc-9-9.3.0/debian/tmp-nvptx/usr,hsa \
        --without-cuda-driver \
        --enable-checking=release \
        --build=x86_64-linux-gnu \
        --host=x86_64-linux-gnu \
        --target=x86_64-linux-gnu
Thread model: posix
gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)

@honggyukim
Collaborator

It can be reproduced with any test program.

@namhyung
Owner

The actual error message was 'invalid permission' for the code address. Can you please test this?

diff --git a/libmcount/dynamic.c b/libmcount/dynamic.c
index f228aa9b..6eaf51db 100644
--- a/libmcount/dynamic.c
+++ b/libmcount/dynamic.c
@@ -164,7 +164,8 @@ void mcount_freeze_code(void)
                if (cp->frozen)
                        continue;
 
-               mprotect(cp->page, CODE_CHUNK, PROT_READ|PROT_EXEC);
+               if (mprotect(cp->page, CODE_CHUNK, PROT_READ|PROT_EXEC) < 0)
+                       pr_err("mprotect to freeze code page failed");
                cp->frozen = true;
        }
 }
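
The idea behind the patch, checking the mprotect() result instead of ignoring it, can be tried in isolation. This is a standalone sketch, not uftrace code: the function name is made up, and it maps its own anonymous page rather than using the real code chunk.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one writable page, fill it, then "freeze" it to read+exec and
 * report a failure instead of silently ignoring it.
 * Returns 0 on success, -1 on error.
 */
static int freeze_page_checked(void)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    void *page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int ret = 0;

    if (page == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    /* Placeholder contents standing in for generated trampoline code. */
    memset(page, 0xc3, page_size);

    if (mprotect(page, page_size, PROT_READ | PROT_EXEC) < 0) {
        fprintf(stderr, "mprotect to freeze code page failed: %s\n",
                strerror(errno));
        ret = -1;
    }

    munmap(page, page_size);
    return ret;
}
```

If the call returns -1, the message shows which errno the kernel reported. That is the information the original unchecked call threw away.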

@Rexyyj
Author

Rexyyj commented Jan 23, 2021

Hello, thanks for the reply! I tested this modification, but the error output is the same as before, and I didn't get the newly added error message:

rex@rex-ubuntu:~/test_ws/dynamic$ gcc test.c
rex@rex-ubuntu:~/test_ws/dynamic$ uftrace -P . ./a.out 
WARN: Segmentation fault: invalid permission (addr: 0x55fc8960b17a)
WARN:  if this happens only with uftrace, please consider -e/--estimate-return option.

WARN: Backtrace from uftrace v0.9.4-193-g64947 ( dwarf python luajit tui perf sched dynamic )
WARN: =====================================
WARN: [0] (main[55fc8960b17f] <= __libc_start_main[7f0d7e0f50b3])

Please report this bug to https://github.com/namhyung/uftrace/issues.

WARN: child terminated by signal: 11: Segmentation fault
# DURATION     TID     FUNCTION
            [ 13254] | main() {

uftrace stopped tracing with remaining functions
================================================
task: 13254
[0] main

@namhyung
Owner

Thanks for the test. Regardless of the result, I think we should keep the change.

Can you try building your test program with the -fcf-protection=none switch?

@honggyukim
Collaborator

Hmm.. It works fine.

$ gcc tests/s-abc.c -fcf-protection=none

$ uftrace -P. a.out
# DURATION     TID     FUNCTION
            [   397] | main() {
            [   397] |   a() {
            [   397] |     b() {
            [   397] |       c() {
   2.600 us [   397] |         getpid();
  14.900 us [   397] |       } /* c */
  17.900 us [   397] |     } /* b */
  20.900 us [   397] |   } /* a */
  40.000 us [   397] | } /* main */
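
To check which way a given binary was built, one option is the __CET__ macro, which GCC documents as being defined when -fcf-protection is in effect (bit 0 for branch protection, bit 1 for return protection). A small sketch with made-up function names:

```c
#include <stdio.h>

/*
 * Return the compile-time CET flags of this translation unit:
 * bit 0 = branch protection (IBT, the compiler emits endbr64),
 * bit 1 = return protection (shadow stack).
 * 0 means it was built as with -fcf-protection=none.
 */
static int cet_build_flags(void)
{
#ifdef __CET__
    return __CET__;
#else
    return 0;
#endif
}

/* Print the flags in a human-readable form. */
static void print_cet_build_flags(void)
{
    int flags = cet_build_flags();

    printf("IBT (endbr64): %s\n", (flags & 1) ? "on" : "off");
    printf("SHSTK:         %s\n", (flags & 2) ? "on" : "off");
}
```

Compiled with the Ubuntu 20.04 default flags, this would be expected to report IBT as on, and with -fcf-protection=none both as off. Note it only describes the file it is compiled in, not libc or other libraries.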

@honggyukim added this to the v0.9.5 milestone on Jan 24, 2021
@namhyung
Owner

OK, then can you try this with the CF protection enabled binary?

diff --git a/cmds/record.c b/cmds/record.c
index e750f053..c70da5f8 100644
--- a/cmds/record.c
+++ b/cmds/record.c
@@ -356,7 +356,7 @@ static void setup_child_environ(struct opts *opts, int argc, char *argv[])
 
        put_libmcount_path(libpath);
        setenv("XRAY_OPTIONS", "patch_premain=false", 1);
-       setenv("GLIBC_TUNABLES", "glibc.cpu.hwcaps=-IBT,-SHSTK", 1);
+       setenv("GLIBC_TUNABLES", "glibc.cpu.x86_ibt=off:glibc.cpu.x86_shstk=off", 1);
 }
 
 static uint64_t calc_feat_mask(struct opts *opts)
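
To experiment with different GLIBC_TUNABLES strings without rebuilding uftrace, a small launcher can set the variable only for a child process. This is a sketch with a hypothetical helper name; the tunable strings to try are the ones from the diff above.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with GLIBC_TUNABLES set in the child only.
 * Returns the child's exit status, or -1 if it could not be started
 * or did not exit normally (e.g. it was killed by SIGSEGV).
 */
static int run_with_tunables(const char *tunables, char *const argv[])
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: set the variable, then replace the process image. */
        if (setenv("GLIBC_TUNABLES", tunables, 1) != 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127);  /* only reached if execvp() failed */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_with_tunables("glibc.cpu.x86_ibt=off:glibc.cpu.x86_shstk=off", argv) starts the target with the newer tunable names. A return value of -1 would indicate the child still crashed.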

@honggyukim
Collaborator

Hmm.. It doesn't work. It crashes as well.

@honggyukim
Collaborator

honggyukim commented Jan 30, 2021

The following glibc commits may be related to this problem.

@honggyukim
Collaborator

Hmm.. Ubuntu 20.04 uses glibc-2.31, but it includes neither of the commits above.

@namhyung
Owner

Hmm.. can you test this again and show me the error message?

diff --git a/libmcount/mcount.c b/libmcount/mcount.c
index 8b7d3a82..2e534e00 100644
--- a/libmcount/mcount.c
+++ b/libmcount/mcount.c
@@ -704,8 +704,8 @@ static void segv_handler(int sig, siginfo_t *si, void *ctx)
                        break;
 
                if (si->si_code == sigsegv_codes[i].code) {
-                       pr_warn("Segmentation fault: %s (addr: %p)\n",
-                              sigsegv_codes[i].msg, si->si_addr);
+                       pr_warn("Segmentation fault: %s (addr: %p, error: %x)\n",
+                               sigsegv_codes[i].msg, si->si_addr, si->si_errno);
                        break;
                }
        }

@honggyukim
Collaborator

Hmm.. si->si_errno shows 0.

@honggyukim
Collaborator

It shows Success because the si->si_errno is 0. Let's add the code anyway for later usage.

diff --git a/libmcount/mcount.c b/libmcount/mcount.c
index 8b7d3a82..06e48a55 100644
--- a/libmcount/mcount.c
+++ b/libmcount/mcount.c
@@ -704,8 +704,8 @@ static void segv_handler(int sig, siginfo_t *si, void *ctx)
                        break;

                if (si->si_code == sigsegv_codes[i].code) {
-                       pr_warn("Segmentation fault: %s (addr: %p)\n",
-                              sigsegv_codes[i].msg, si->si_addr);
+                       pr_warn("Segmentation fault: %s (addr: %p, error: %s)\n",
+                              sigsegv_codes[i].msg, si->si_addr, strerror(si->si_errno));
                        break;
                }
        }

@namhyung
Owner

I'm not sure si->si_errno is the same as errno, so I'm not sure we can use strerror() on it. Actually, I expected something else (like an arch- or kernel-specific code).
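
For context, sigaction(2) describes si_errno as an errno value that is generally unused on Linux, which fits the 0 seen here. The field that distinguishes the fault type is si_code. A minimal standalone sketch of that mapping follows. The table and function names are illustrative, and the message strings are assumptions modeled on the log output in this thread, not copied from libmcount:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Illustrative si_code -> message table (assumed strings). */
struct segv_code_msg {
    int code;
    const char *msg;
};

static const struct segv_code_msg segv_msgs[] = {
    { SEGV_MAPERR, "address not mapped" },
    { SEGV_ACCERR, "invalid permission" },
};

/* Look up a message for a SIGSEGV si_code value. */
static const char *segv_code_to_msg(int code)
{
    for (size_t i = 0; i < sizeof(segv_msgs) / sizeof(segv_msgs[0]); i++) {
        if (segv_msgs[i].code == code)
            return segv_msgs[i].msg;
    }
    return "unknown";
}

/* Handler: report the fault type using only write(), then exit. */
static void segv_handler(int sig, siginfo_t *si, void *ctx)
{
    const char *msg = segv_code_to_msg(si->si_code);

    (void)sig;
    (void)ctx;
    if (write(STDERR_FILENO, msg, strlen(msg)) < 0 ||
        write(STDERR_FILENO, "\n", 1) < 0)
        _exit(2);
    _exit(1);
}

/* Install the handler with SA_SIGINFO so that siginfo_t is passed in. */
static int install_segv_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

In this sketch, "invalid permission" corresponds to SEGV_ACCERR, meaning the address is mapped but the access type was not allowed.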

@honggyukim
Collaborator

It might be a bug in glibc-2.31, so I reported it to the glibc Bugzilla at https://sourceware.org/bugzilla/show_bug.cgi?id=27300. Let's see if there is a way to avoid this problem.

@honggyukim
Collaborator

@namhyung You can reproduce this problem in WSL2 Ubuntu 20.04 if you have a Windows PC.

@honggyukim
Collaborator

This problem also happens on Ubuntu 20.10, which uses libc-2.32.so.

@honggyukim changed the title from "Dynamic/Full Dynamic tracing error on ubuntu 20.04" to "Dynamic/Full Dynamic tracing error on ubuntu 20.04 and 20.10" on Feb 5, 2021
@namhyung
Owner

I've checked Ubuntu's glibc source, and it seems to have an out-of-tree kernel patch to control CET (though I'm still not sure why the glibc tunables don't work).

Anyway, can you please test the patch below?

diff --git a/cmds/record.c b/cmds/record.c
index e750f053..42225023 100644
--- a/cmds/record.c
+++ b/cmds/record.c
@@ -357,6 +357,11 @@ static void setup_child_environ(struct opts *opts, int argc, char *argv[])
        put_libmcount_path(libpath);
        setenv("XRAY_OPTIONS", "patch_premain=false", 1);
        setenv("GLIBC_TUNABLES", "glibc.cpu.hwcaps=-IBT,-SHSTK", 1);
+
+#ifdef ARCH_CET_DISABLE
+       /* 3 = GNU_PROPERTY_X86_FEATURE_1_(IBT|SHSTK) */
+       prctl(ARCH_CET_DISABLE, 3, 0, 0, 0);
+#endif
 }
 
 static uint64_t calc_feat_mask(struct opts *opts)

@honggyukim
Collaborator

Hmm.. I tested it on Ubuntu 20.04, but ARCH_CET_DISABLE is not defined so it shows the same crash.

@namhyung
Owner

Maybe I need to include the headers. Can you please try again?

diff --git a/cmds/record.c b/cmds/record.c
index 77887c26..d259923f 100644
--- a/cmds/record.c
+++ b/cmds/record.c
@@ -18,6 +18,8 @@
 #include <sys/eventfd.h>
 #include <sys/resource.h>
 #include <sys/epoll.h>
+#include <sys/prctl.h>
+#include <asm/prctl.h>
 #include <sys/personality.h>
 
 #include "uftrace.h"
@@ -360,7 +362,10 @@ static void setup_child_environ(struct opts *opts, int argc, char *argv[])
 
 #ifdef ARCH_CET_DISABLE
        /* HACK: for Ubuntu 20.04  (3 = GNU_PROPERTY_X86_FEATURE_1_(IBT|SHSTK)) */
-       prctl(ARCH_CET_DISABLE, 3, 0, 0, 0);
+       if (prctl(ARCH_CET_DISABLE, 3, 0, 0, 0))
+               pr_dbg("prctl(ARCH_CET_DISABLE) failed: %m\n");
+       else
+               pr_dbg("prctl(ARCH_CET_DISABLE): ok\n");
 #endif
 }
 

@honggyukim
Collaborator

honggyukim commented Feb 14, 2021

I mean ARCH_CET_DISABLE is not defined anywhere, so the code under #ifdef ARCH_CET_DISABLE is not compiled at all and the diff doesn't change anything. Neither sys/prctl.h nor asm/prctl.h defines ARCH_CET_DISABLE.

@namhyung
Owner

Hmm.. ok. Can you strace your binary (without uftrace) and capture the output?

@honggyukim
Collaborator

Here is the strace output.

$ strace ./t-abc
execve("./t-abc", ["./t-abc"], 0x7fffe80b6130 /* 36 vars */) = 0
brk(NULL)                               = 0x7fffcc8ed000
arch_prctl(0x3001 /* ARCH_??? */, 0x7fffd3f37090) = -1 EINVAL (Invalid argument)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/honggyu/usr/lib/tls/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/home/honggyu/usr/lib/tls/haswell/x86_64", 0x7fffd3f362e0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/honggyu/usr/lib/tls/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/home/honggyu/usr/lib/tls/haswell", 0x7fffd3f362e0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/honggyu/usr/lib/tls/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/home/honggyu/usr/lib/tls/x86_64", 0x7fffd3f362e0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/honggyu/usr/lib/tls/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/home/honggyu/usr/lib/tls", 0x7fffd3f362e0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/honggyu/usr/lib/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/home/honggyu/usr/lib/haswell/x86_64", 0x7fffd3f362e0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/honggyu/usr/lib/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/home/honggyu/usr/lib/haswell", 0x7fffd3f362e0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/honggyu/usr/lib/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/home/honggyu/usr/lib/x86_64", 0x7fffd3f362e0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/honggyu/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/home/honggyu/usr/lib", {st_mode=S_IFDIR|0755, st_size=512, ...}) = 0
openat(AT_FDCWD, "/home/honggyu/usr/release/lib/tls/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/home/honggyu/usr/release/lib/tls/haswell/x86_64", 0x7fffd3f362e0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/honggyu/usr/release/lib/tls/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/home/honggyu/usr/release/lib/tls/haswell", 0x7fffd3f362e0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/honggyu/usr/release/lib/tls/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/home/honggyu/usr/release/lib/tls/x86_64", 0x7fffd3f362e0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/honggyu/usr/release/lib/tls/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/home/honggyu/usr/release/lib/tls", 0x7fffd3f362e0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/honggyu/usr/release/lib/haswell/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/home/honggyu/usr/release/lib/haswell/x86_64", 0x7fffd3f362e0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/honggyu/usr/release/lib/haswell/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/home/honggyu/usr/release/lib/haswell", 0x7fffd3f362e0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/honggyu/usr/release/lib/x86_64/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/home/honggyu/usr/release/lib/x86_64", 0x7fffd3f362e0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/honggyu/usr/release/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/home/honggyu/usr/release/lib", 0x7fffd3f362e0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=33872, ...}) = 0
mmap(NULL, 33872, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f6029b67000
close(3)                                = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360q\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
pread64(3, "\4\0\0\0\20\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0", 32, 848) = 32
pread64(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0\t\233\222%\274\260\320\31\331\326\10\204\276X>\263"..., 68, 880) = 68
fstat(3, {st_mode=S_IFREG|0755, st_size=2029224, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6029ba0000
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
pread64(3, "\4\0\0\0\20\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0", 32, 848) = 32
pread64(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0\t\233\222%\274\260\320\31\331\326\10\204\276X>\263"..., 68, 880) = 68
mmap(NULL, 2036952, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f6029970000
mprotect(0x7f6029995000, 1847296, PROT_NONE) = 0
mmap(0x7f6029995000, 1540096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7f6029995000
mmap(0x7f6029b0d000, 303104, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19d000) = 0x7f6029b0d000
mmap(0x7f6029b58000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f6029b58000
mmap(0x7f6029b5e000, 13528, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f6029b5e000
close(3)                                = 0
arch_prctl(ARCH_SET_FS, 0x7f6029ba1480) = 0
mprotect(0x7f6029b58000, 12288, PROT_READ) = 0
mprotect(0x7f6029bab000, 4096, PROT_READ) = 0
mprotect(0x7f6029b9d000, 4096, PROT_READ) = 0
munmap(0x7f6029b67000, 33872)           = 0
getpid()                                = 29527
exit_group(0)                           = ?
+++ exited with 0 +++

@namhyung
Owner

arch_prctl(0x3001 /* ARCH_??? */, 0x7fffd3f37090) = -1 EINVAL (Invalid argument)

This is ARCH_CET_STATUS according to https://lore.kernel.org/lkml/20180710222639.8241-28-yu-cheng.yu@intel.com/.

But the kernel returns -EINVAL, so it might not have CET support. Hmm..
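
The same probe can be issued directly to see what a given kernel answers. This is a sketch only: the constant is defined locally because it comes from the out-of-tree patch series, and the three-word output buffer is an assumption based on that series rather than on any released header.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value seen in the strace output. It is named ARCH_CET_STATUS only in
 * the out-of-tree patches, so it is defined locally here. */
#define PROBE_ARCH_CET_STATUS 0x3001

/*
 * Ask the kernel for the CET status.
 * Returns 0 if the request was accepted, otherwise the errno value
 * (EINVAL on kernels that do not know this arch_prctl code).
 * The size of `out` is an assumption taken from the patch series.
 */
static int probe_cet_status(unsigned long long out[3])
{
#if defined(__x86_64__) && defined(SYS_arch_prctl)
    memset(out, 0, 3 * sizeof(out[0]));
    if (syscall(SYS_arch_prctl, PROBE_ARCH_CET_STATUS, out) == -1)
        return errno;
    return 0;
#else
    (void)out;
    return ENOSYS;  /* arch_prctl is x86-specific */
#endif
}
```

A return value of EINVAL matches the strace line above and suggests the running kernel does not implement this interface.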

@honggyukim
Collaborator

Hmm.. It looks like a mismatch in Ubuntu: the kernel doesn't support arch_prctl(ARCH_CET_STATUS, ...), and glibc also doesn't honor glibc.cpu.hwcaps=-IBT,-SHSTK in GLIBC_TUNABLES. That looks bad.

@wangmingyu84

This error happens when using uftrace with binutils 2.35.1.
With binutils 2.35, the result is OK.

@namhyung
Owner

namhyung commented Mar 1, 2021

Thanks for the hint! I'll check what changed in the binutils recently.

@wangmingyu84

wangmingyu84 commented Apr 6, 2021

After some simple investigation, I found that even the following code cannot be traced correctly in an aarch64 environment:

int main() {
return 0;
}

WARN: child terminated by signal: 7: Bus error
WARN: cannot open record data: /tmp/uftrace-live-Grb68N: No data available

Looking at the log, the function mcount_init_file is not executed. Is something going wrong when the mcount library is initialized?

I haven't traced through the code yet, so the information above is all I know.

In addition, I have multiple platforms available here. If you need testing after modifying the code, let me know and I can help.

@namhyung
Owner

namhyung commented Apr 8, 2021

Thanks for the investigation. I'm busy with other stuff these days, so I haven't had time to look at it.

mcount_init_file() is the first function called when uftrace starts tracing, so it seems to fail before the actual tracing begins.. hmm.

@honggyukim
Collaborator

@wangmingyu84 Thanks very much for your help. Could you please show us the full log of running the following command for the example above?

$ uftrace record -vvv ./test_program

@namhyung
Owner

Can you please test check/dynamic-fix?

@honggyukim
Collaborator

honggyukim commented Apr 12, 2021

Can you please test check/dynamic-fix?

@namhyung Thanks very much for the fix. It works fine now in Ubuntu 20.04 (x86_64).

After some simple investigation, I found that even the following code cannot be traced correctly in an aarch64 environment:

@wangmingyu84 I don't see an error when testing in aarch64. Could you test it again in your aarch64 environment with @namhyung's fix in check/dynamic-fix?

@leimaohui
Contributor

leimaohui commented Apr 14, 2021

The bugs in Ubuntu 20.04 (binutils 2.34) and Ubuntu 20.10 (binutils 2.35.1) may not be the same.
@honggyukim Hi, have you tested on Ubuntu 20.10 of aarch64?

@namhyung
Owner

OK, so we have another problem in Ubuntu 20.10... hmm.

@namhyung
Owner

Let's discuss the 20.10 issue in a separate thread and close this.

@namhyung changed the title from "Dynamic/Full Dynamic tracing error on ubuntu 20.04 and 20.10" to "Dynamic/Full Dynamic tracing error on ubuntu 20.04" on Apr 15, 2021
@honggyukim
Collaborator

Hi, have you tested on Ubuntu 20.10 of aarch64?

Hi @leimaohui, I tested it on Ubuntu 20.04 in aarch64 only. But it didn't have any problem even without this fix.

@honggyukim
Collaborator

honggyukim commented Apr 18, 2021

@honggyukim Hi, have you tested on Ubuntu 20.10 of aarch64?

Hi @leimaohui, it works fine on Ubuntu 20.10 in x86_64 as follows.

$ cat /etc/os-release | grep PRETTY_NAME
PRETTY_NAME="Ubuntu 20.10"

$ ./uftrace --version
uftrace v0.9.4-236-g1355 ( x86_64 dwarf python luajit tui perf sched dynamic )

$ gcc -o t-abc tests/s-abc.c 

$ ./uftrace -L. -P. t-abc 
# DURATION     TID     FUNCTION
            [  3028] | main() {
            [  3028] |   a() {
            [  3028] |     b() {
            [  3028] |       c() {
   1.238 us [  3028] |         getpid();
   3.496 us [  3028] |       } /* c */
   4.253 us [  3028] |     } /* b */
   5.117 us [  3028] |   } /* a */
   6.848 us [  3028] | } /* main */

I haven't tested it on Ubuntu 20.10 in aarch64 though.

@honggyukim
Collaborator

honggyukim commented Apr 18, 2021

The bugs in Ubuntu 20.04 (binutils 2.34) and Ubuntu 20.10 (binutils 2.35.1) may not be the same.

The Ubuntu 20.10 in x86_64 uses binutils 2.35.1 but works fine now.

$ ld -v
GNU ld (GNU Binutils for Ubuntu) 2.35.1

@leimaohui
Contributor

Hi @leimaohui, it works fine on Ubuntu 20.10 in x86_64 as follows.

Yes, this issue only happens on aarch64. We have no problems on arm and x86-64.
So I guess Ubuntu 20.10 on aarch64 also has this issue.
