None of the sanitizers work in Clang 3.9.1 #769

Closed
tony2001 opened this issue Feb 14, 2017 · 17 comments

@tony2001

I get the following errors when trying to use any of the sanitizers:

# /home/tony/clang/bin/clang -fsanitize=address -o foo foo.c
/usr/lib64/gcc/x86_64-suse-linux/4.9/../../../../x86_64-suse-linux/bin/ld: cannot find /home/tony/clang/bin/../lib/clang/3.9.1/lib/linux/libclang_rt.asan-x86_64.a: No such file or directory
clang-3.9: error: linker command failed with exit code 1 (use -v to see invocation)

# /home/tony/clang/bin/clang -fsanitize=memory -o foo foo.c
/usr/lib64/gcc/x86_64-suse-linux/4.9/../../../../x86_64-suse-linux/bin/ld: cannot find /home/tony/clang/bin/../lib/clang/3.9.1/lib/linux/libclang_rt.msan-x86_64.a: No such file or directory
clang-3.9: error: linker command failed with exit code 1 (use -v to see invocation)

# /home/tony/clang/bin/clang -fsanitize=undefined -o foo foo.c
/usr/lib64/gcc/x86_64-suse-linux/4.9/../../../../x86_64-suse-linux/bin/ld: cannot find /home/tony/clang/bin/../lib/clang/3.9.1/lib/linux/libclang_rt.ubsan_standalone-x86_64.a: No such file or directory
clang-3.9: error: linker command failed with exit code 1 (use -v to see invocation)

Clang/LLVM version: 3.9.1
OS: SUSE Linux Enterprise Server 11 SP3 x86-64
Compiler used to build Clang: gcc-6.3.0

How to reproduce:

# wget http://releases.llvm.org/3.9.1/cfe-3.9.1.src.tar.xz
# wget http://releases.llvm.org/3.9.1/llvm-3.9.1.src.tar.xz
# tar -xJf llvm-3.9.1.src.tar.xz
# tar -xJf cfe-3.9.1.src.tar.xz
# mv cfe-3.9.1.src llvm-3.9.1.src/tools/clang
# mkdir llvm-3.9.1.src/build
# cd llvm-3.9.1.src/build
# cmake -DCMAKE_INSTALL_PREFIX=/home/tony/clang -DCMAKE_BUILD_TYPE=MinSizeRel ..
# make install

I saw many similar bug reports, but all of them were related to the autotools build and were fixed several years ago. The current build doesn't even mention autotools anymore, yet the problem persists.
Any idea what I'm doing wrong?

@chefmax

chefmax commented Feb 14, 2017

Hm, it seems that you forgot to check out the compiler-rt sources (http://releases.llvm.org/3.9.1/compiler-rt-3.9.1.src.tar.xz):

# wget http://releases.llvm.org/3.9.1/compiler-rt-3.9.1.src.tar.xz
# tar -xJf compiler-rt-3.9.1.src.tar.xz
# mv compiler-rt-3.9.1.src llvm-3.9.1.src/projects/compiler-rt
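
After rebuilding and reinstalling, a quick way to confirm that the runtimes were actually produced is to look for the archives the linker complained about. The sketch below assumes the install prefix and version from the original report; `check_sanitizer_rts` is a made-up helper name for illustration, not something shipped with Clang or compiler-rt.

```shell
#!/bin/sh
# Sketch: report which sanitizer runtime archives are present in a Clang install.
# check_sanitizer_rts is a hypothetical helper, not part of Clang or compiler-rt.
check_sanitizer_rts() {
    prefix=$1    # install prefix, e.g. /home/tony/clang
    version=$2   # Clang version, e.g. 3.9.1
    dir="$prefix/lib/clang/$version/lib/linux"
    missing=0
    # Archive names are taken from the linker errors in the original report.
    for rt in asan msan ubsan_standalone; do
        lib="$dir/libclang_rt.$rt-x86_64.a"
        if [ -f "$lib" ]; then
            echo "found:   $lib"
        else
            echo "missing: $lib"
            missing=1
        fi
    done
    # Non-zero exit status if at least one archive is absent.
    return "$missing"
}
```

Calling it as `check_sanitizer_rts /home/tony/clang 3.9.1` should list all three archives as found once compiler-rt has been built as part of the tree.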

@tony2001
Author

Thank you. That helped indeed.
Unfortunately, ASAN crashes right after the start:

ASAN:DEADLYSIGNAL
=================================================================
==5074==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000008 (pc 0x7ffff00e4db2 bp 0x7f0000b000010000 sp 0x7fffffff1630 T0)
==5074==The signal is caused by a READ memory access.
==5074==Hint: address points to the zero page.
    #0 0x7ffff00e4db1  (/lib64/libc.so.6+0x7adb1)
    #1 0x7ffff00e810b  (/lib64/libc.so.6+0x7e10b)
    #2 0x7fffcc5d4e88  (/lib64/libnss_dns.so.2+0x2e88)
    #3 0x7ffff013180a  (/lib64/libc.so.6+0xc780a)
    #4 0x7ffff0134645  (/lib64/libc.so.6+0xca645)

It looks like free() in a shared library (libcurl in this case) causes it:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff00e4db2 in _int_free () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff00e4db2 in _int_free () from /lib64/libc.so.6
#1  0x00007ffff00e810c in free () from /lib64/libc.so.6
#2  0x00007fffcc470e89 in _nss_dns_gethostbyname4_r () from /lib64/libnss_dns.so.2
#3  0x00007ffff013180b in gaih_inet () from /lib64/libc.so.6
#4  0x00007ffff0134646 in getaddrinfo () from /lib64/libc.so.6
#5  0x000000000056d797 in __interceptor_getaddrinfo (node=0x6120007ec940 "ip2geo.d3", service=0x7fffffff3170 "80", hints=0x7fffffff3190, out=0x7fffffff3088)
    at /home/tony/llvm-3.9.1.src/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:2169
#6  0x00007ffff384dd30 in Curl_getaddrinfo_ex () from /local/lib/php7/lib/libcurl.so.4
#7  0x00007ffff3848051 in Curl_getaddrinfo () from /local/lib/php7/lib/libcurl.so.4
#8  0x00007ffff3809025 in Curl_resolv () from /local/lib/php7/lib/libcurl.so.4
#9  0x00007ffff380930b in Curl_resolv_timeout () from /local/lib/php7/lib/libcurl.so.4
#10 0x00007ffff382cb97 in resolve_server () from /local/lib/php7/lib/libcurl.so.4
#11 0x00007ffff382dfb3 in create_conn () from /local/lib/php7/lib/libcurl.so.4
#12 0x00007ffff382e232 in Curl_connect () from /local/lib/php7/lib/libcurl.so.4
#13 0x00007ffff3842672 in multi_runsingle () from /local/lib/php7/lib/libcurl.so.4
#14 0x00007ffff3843b7d in curl_multi_perform () from /local/lib/php7/lib/libcurl.so.4
#15 0x00007ffff38395b7 in easy_transfer () from /local/lib/php7/lib/libcurl.so.4
#16 0x00007ffff3839785 in easy_perform () from /local/lib/php7/lib/libcurl.so.4
#17 0x00007ffff38397d6 in curl_easy_perform () from /local/lib/php7/lib/libcurl.so.4

Any idea what could be wrong this time?

@chefmax

chefmax commented Feb 14, 2017

It's hard to tell without more context. Could you provide a disassembly around the failing instruction?

@tony2001
Author

Sure, here it is:

Dump of assembler code for function _int_free:
   0x00007ffff00e4550 <+0>:     push   %r15
   0x00007ffff00e4552 <+2>:     push   %r14
   0x00007ffff00e4554 <+4>:     push   %r13
   0x00007ffff00e4556 <+6>:     mov    %rdi,%r13
   0x00007ffff00e4559 <+9>:     push   %r12
   0x00007ffff00e455b <+11>:    push   %rbp
   0x00007ffff00e455c <+12>:    push   %rbx
   0x00007ffff00e455d <+13>:    mov    %rsi,%rbx
   0x00007ffff00e4560 <+16>:    sub    $0x38,%rsp
   0x00007ffff00e4564 <+20>:    mov    0x8(%rsi),%rdx
   0x00007ffff00e4568 <+24>:    mov    %rdx,%rbp
   0x00007ffff00e456b <+27>:    and    $0xfffffffffffffff8,%rbp
   0x00007ffff00e456f <+31>:    mov    %rbp,%rax
   0x00007ffff00e4572 <+34>:    neg    %rax
   0x00007ffff00e4575 <+37>:    cmp    %rax,%rsi
   0x00007ffff00e4578 <+40>:    ja     0x7ffff00e4d80 <_int_free+2096>
   0x00007ffff00e457e <+46>:    test   $0xf,%sil
   0x00007ffff00e4582 <+50>:    jne    0x7ffff00e4d80 <_int_free+2096>
   0x00007ffff00e4588 <+56>:    cmp    $0x1f,%rbp
   0x00007ffff00e458c <+60>:    lea    0xc4d51(%rip),%rsi        # 0x7ffff01a92e4
   0x00007ffff00e4593 <+67>:    jbe    0x7ffff00e4b98 <_int_free+1608>
   0x00007ffff00e4599 <+73>:    cmp    0x2fe1b8(%rip),%rbp        # 0x7ffff03e2758 <global_max_fast>
   0x00007ffff00e45a0 <+80>:    jbe    0x7ffff00e48c0 <_int_free+880>
   0x00007ffff00e45a6 <+86>:    test   $0x2,%dl
   0x00007ffff00e45a9 <+89>:    jne    0x7ffff00e48a8 <_int_free+856>
   0x00007ffff00e45af <+95>:    mov    0x58(%rdi),%rcx
   0x00007ffff00e45b3 <+99>:    lea    0xc830e(%rip),%rsi        # 0x7ffff01ac8c8
   0x00007ffff00e45ba <+106>:   cmp    %rbx,%rcx
   0x00007ffff00e45bd <+109>:   je     0x7ffff00e4b98 <_int_free+1608>
   0x00007ffff00e45c3 <+115>:   testb  $0x2,0x4(%rdi)
   0x00007ffff00e45c7 <+119>:   lea    (%rbx,%rbp,1),%r12
   0x00007ffff00e45cb <+123>:   je     0x7ffff00e4db2 <_int_free+2146>
   0x00007ffff00e45d1 <+129>:   mov    0x8(%r12),%rax
   0x00007ffff00e45d6 <+134>:   lea    0xc832b(%rip),%rsi        # 0x7ffff01ac908
   0x00007ffff00e45dd <+141>:   test   $0x1,%al
   0x00007ffff00e45df <+143>:   je     0x7ffff00e4b98 <_int_free+1608>
   0x00007ffff00e45e5 <+149>:   cmp    $0x10,%rax

@yugr

yugr commented Feb 14, 2017

Hm, Glibc's free has been called instead of ASan's interceptor. Did you build the main executable with -fsanitize=address, or just some shared library?

@chefmax

chefmax commented Feb 14, 2017

This is the internal free... perhaps it's OK to call it if the corresponding malloc was also internal.

@ramosian-glider
Member

ramosian-glider commented Feb 14, 2017 via email

@tony2001
Author

I built the libraries as usual (only adding -fno-omit-frame-pointer), and the main binary was then built with -fsanitize=address.
The error above is reproducible with both Clang 3.9.1 and GCC 6.3.0.
I also tried LD_PRELOAD'ing ASAN after building the binary with GCC, the result was the same.

@chefmax

chefmax commented Feb 14, 2017

Well, glibc has mechanisms that bypass ASan interceptors (it can call malloc/free directly, without going through the PLT). But I haven't seen problems with that, because Glibc tends to call the internal free only on pointers that were allocated by the internal malloc.

@chefmax

chefmax commented Feb 14, 2017

@tony2001 Could you provide disasm around 0x00007ffff00e4db2?

@tony2001
Author

Here it is:

   0x00007ffff00e4d50 <+2048>:  pop    %r14
   0x00007ffff00e4d52 <+2050>:  pop    %r15
   0x00007ffff00e4d54 <+2052>:  jmpq   0x7ffff00e2c60 <sYSTRIm>
   0x00007ffff00e4d59 <+2057>:  nopl   0x0(%rax)
   0x00007ffff00e4d60 <+2064>:  lea    (%r12,%rbp,1),%rdi
   0x00007ffff00e4d64 <+2068>:  mov    $0x4,%edx
   0x00007ffff00e4d69 <+2073>:  mov    %rbx,%rsi
   0x00007ffff00e4d6c <+2076>:  callq  0x7ffff0145650 <madvise>
   0x00007ffff00e4d71 <+2081>:  jmpq   0x7ffff00e4b16 <_int_free+1478>
   0x00007ffff00e4d76 <+2086>:  nopw   %cs:0x0(%rax,%rax,1)
   0x00007ffff00e4d80 <+2096>:  lea    0xc4545(%rip),%rsi        # 0x7ffff01a92cc
   0x00007ffff00e4d87 <+2103>:  jmpq   0x7ffff00e4b98 <_int_free+1608>
   0x00007ffff00e4d8c <+2108>:  mov    0x2fb376(%rip),%edi        # 0x7ffff03e0108 <check_action>
   0x00007ffff00e4d92 <+2114>:  lea    0xc448a(%rip),%rsi        # 0x7ffff01a9223
   0x00007ffff00e4d99 <+2121>:  mov    %r15,%rdx
   0x00007ffff00e4d9c <+2124>:  callq  0x7ffff00e3010 <malloc_printerr>
   0x00007ffff00e4da1 <+2129>:  jmpq   0x7ffff00e47c7 <_int_free+631>
   0x00007ffff00e4da6 <+2134>:  lea    0xc7aa3(%rip),%rsi        # 0x7ffff01ac850
   0x00007ffff00e4dad <+2141>:  jmpq   0x7ffff00e4b98 <_int_free+1608>
=> 0x00007ffff00e4db2 <+2146>:  mov    0x8(%rcx),%rax
   0x00007ffff00e4db6 <+2150>:  lea    0xc7b2b(%rip),%rsi        # 0x7ffff01ac8e8
   0x00007ffff00e4dbd <+2157>:  and    $0xfffffffffffffff8,%rax
   0x00007ffff00e4dc1 <+2161>:  lea    (%rcx,%rax,1),%rax
   0x00007ffff00e4dc5 <+2165>:  cmp    %rax,%r12
   0x00007ffff00e4dc8 <+2168>:  jb     0x7ffff00e45d1 <_int_free+129>
   0x00007ffff00e4dce <+2174>:  jmpq   0x7ffff00e4b98 <_int_free+1608>
   0x00007ffff00e4dd3 <+2179>:  nopl   0x0(%rax,%rax,1)
   0x00007ffff00e4dd8 <+2184>:  lea    -0x10(%rbp),%rdx
   0x00007ffff00e4ddc <+2188>:  lea    0x10(%rbx),%rdi
   0x00007ffff00e4de0 <+2192>:  movzbl %al,%esi
   0x00007ffff00e4de3 <+2195>:  mov    %r8,0x8(%rsp)
   0x00007ffff00e4de8 <+2200>:  callq  0x7ffff00eff20 <__memset_sse2>
   0x00007ffff00e4ded <+2205>:  mov    0x8(%rbx),%rdx
   0x00007ffff00e4df1 <+2209>:  mov    0x8(%rsp),%r8
   0x00007ffff00e4df6 <+2214>:  jmpq   0x7ffff00e4611 <_int_free+193>
   0x00007ffff00e4dfb <+2219>:  lea    -0x10(%rbp),%rdx
   0x00007ffff00e4dff <+2223>:  lea    0x10(%rbx),%rdi
   0x00007ffff00e4e03 <+2227>:  movzbl %al,%esi
   0x00007ffff00e4e06 <+2230>:  callq  0x7ffff00eff20 <__memset_sse2>

@ramosian-glider
Member

ramosian-glider commented Feb 14, 2017 via email

@tony2001
Author

Which symbol are you looking for in the output? free()? Maybe I can cut the output a bit..

@ramosian-glider
Member

ramosian-glider commented Feb 14, 2017 via email

@mrichtarsky

I also ran into this issue a while back; it is specific to SLES 11. Upgrading to SLES 12 solved it for me, as did downgrading glibc. It first happened after upgrading to the glibc with the fix for CVE-2015-7547, around version 2.11.3-17.95.2. To avoid downgrading the system glibc, I had a workaround in place that put the old versions of libnss_dns.so.2, libnss_files.so.2 and libresolv.so.2 in the LD_LIBRARY_PATH of our installation for our internal ASan tests.
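
A minimal sketch of that workaround, assuming the older copies of the three libraries have been collected in one directory. The helper name `run_with_old_nss` and any directory passed to it are hypothetical; only the use of LD_LIBRARY_PATH comes from the description above.

```shell
#!/bin/sh
# Sketch: run a command with a directory of older NSS libraries searched first.
# run_with_old_nss and the directory argument are hypothetical names.
run_with_old_nss() {
    # Directory holding the older libnss_dns.so.2, libnss_files.so.2 and libresolv.so.2.
    old_nss_dir=$1
    shift
    # Prepend instead of overwriting, so any existing search path is preserved.
    env LD_LIBRARY_PATH="$old_nss_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}
```

For example, `run_with_old_nss /path/to/old-nss ./foo` would start the ASan-instrumented binary with the older modules taking precedence, leaving the system glibc untouched.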

I filed a bug with SuSE, but since I didn't have a reasonably small reproducer and already had a workaround in place, it didn't get fixed.
https://bugzilla.suse.com/show_bug.cgi?id=969241 (probably not public but you could refer to this bug)

The root cause is somewhere in the loading of the NSS libs with DEEPBIND, which prevents interception of some relevant allocation functions. I hit a related issue much earlier, which I could work around in ASan. Some related links for that are:
https://bugzilla.novell.com/show_bug.cgi?id=157078
https://sourceware.org/bugzilla/show_bug.cgi?id=6610

Best regards,
Martin

@yugr

yugr commented Feb 15, 2017

From http://linux.unige.ch/install/suse/opensuse/11.4/i386/ChangeLog:

Remove the NSS hack of opening modules using RTLD_DEEPBIND.
This was useful for nss_ldap, since some applications used a different
LDAP library with clashing symbol names. However, it also created
many headaches, especially with the NSS modules not respecting
malloc() overrides. Now, sssd is used by default for LDAP resolutions
and we can therefore safely get rid of the hack. [bnc#477061]

So this is probably a dup of #611 indeed.
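
One way to check whether an NSS module is bypassing malloc() overrides on a given system is to look at the dynamic loader's binding trace. The sketch below assumes glibc's `LD_DEBUG=bindings` facility and its usual "binding file ... to ...: normal symbol" line format, which may differ between glibc versions; `show_nss_alloc_bindings` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: filter glibc's LD_DEBUG=bindings output for allocator symbols
# resolved on behalf of NSS modules.
# show_nss_alloc_bindings is a hypothetical helper; the expected line shape is
#   binding file <lib> [0] to <target> [0]: normal symbol `free' [GLIBC_2.2.5]
show_nss_alloc_bindings() {
    # Reads loader debug output on stdin and keeps only libnss_* bindings
    # for malloc, calloc, realloc and free.
    grep 'binding file .*libnss_' |
        grep -E "symbol \`(malloc|calloc|realloc|free)'"
}
```

It could be used as `LD_DEBUG=bindings ./foo 2>&1 | show_nss_alloc_bindings`. If the reported target for `free` is libc.so.6 rather than the instrumented executable or the ASan runtime library, the module is not going through the interceptors.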

@tony2001
Author

Indeed, it looks like RTLD_DEEPBIND is the culprit.
I'm still figuring out how to avoid it, though. It looks like one version of the SuSE patch that enables RTLD_DEEPBIND offers a way to disable it using an environment variable with the same name. Unfortunately, the version I see at build.opensuse.org doesn't have that option.

Anyway, I agree that it's not an ASAN problem, so you can safely close this issue as a dup.
Thanks a lot for your help.
