Recent kernel causes -fPIE ASan executables to abort on x86_64 #837
Comments
kcc
Jul 19, 2017
Contributor
It is possible, but at a code size and execution time cost that we are not willing to pay.
Any chance to get the kernel to cooperate?
kcc
Jul 19, 2017
Contributor
This would not be the first time a kernel change breaks the sanitizers.
The last significant one was by H.J. Lu, when he changed the base from 0x7.... to 0x555....
It caused lots of trouble for us in msan and tsan.
What we really need here is to tell at link time where the shadow is.
AFAICT, there is no such capability currently.
rnk
Jul 19, 2017
Contributor
I always wondered if it would be possible to express the shadow mapping as an ELF program header. That would be the ultimate way to communicate shadow memory needs to the kernel.
jcowgill
Jul 19, 2017
I'm not sure - I'm just a user who happened to stumble across the bug. You might be able to get them to change where the executable gets mapped, but they could argue that PIE executables should be prepared to be loaded at any address.
What we really need here is to tell at link time where the shadow is.
I don't see how that is possible with PIE / ASLR. The entire point is that you don't know where the executable will be loaded, so you can't know what bits of memory will be free until runtime.
pcc
Jul 19, 2017
Contributor
We could have a program header that means "please reserve the first N bytes of the address space for the application". Then the kernel can use that as a minimum for ELF_ET_DYN_BASE.
@dvyukov can you confirm that the fresh kernel breaks the sanitizers?
bennofs
Jul 30, 2017
I think I am hitting this bug:
$ ./loadaddr
==16572==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
==16572==ASan shadow was supposed to be located in the [0x00007fff7000-0x10007fff7fff] range.
==16572==Process memory map follows:
0x04daa6e91000-0x04daa6fc6000 /tmp/loadaddr
0x04daa71c6000-0x04daa71c7000 /tmp/loadaddr
0x04daa71c7000-0x04daa71ca000 /tmp/loadaddr
0x04daa71ca000-0x04daa7e2f000
0x7b742c072000-0x7b742c3c4000
0x7b742c3c4000-0x7b742c559000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libc-2.25.so
0x7b742c559000-0x7b742c759000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libc-2.25.so
0x7b742c759000-0x7b742c75d000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libc-2.25.so
0x7b742c75d000-0x7b742c75f000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libc-2.25.so
0x7b742c75f000-0x7b742c763000
0x7b742c763000-0x7b742c779000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libgcc_s.so.1
0x7b742c779000-0x7b742c978000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libgcc_s.so.1
0x7b742c978000-0x7b742c979000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libgcc_s.so.1
0x7b742c979000-0x7b742c97c000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libdl-2.25.so
0x7b742c97c000-0x7b742cb7b000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libdl-2.25.so
0x7b742cb7b000-0x7b742cb7c000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libdl-2.25.so
0x7b742cb7c000-0x7b742cb7d000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libdl-2.25.so
0x7b742cb7d000-0x7b742cc8e000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libm-2.25.so
0x7b742cc8e000-0x7b742ce8e000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libm-2.25.so
0x7b742ce8e000-0x7b742ce8f000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libm-2.25.so
0x7b742ce8f000-0x7b742ce90000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libm-2.25.so
0x7b742ce90000-0x7b742ce97000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/librt-2.25.so
0x7b742ce97000-0x7b742d096000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/librt-2.25.so
0x7b742d096000-0x7b742d097000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/librt-2.25.so
0x7b742d097000-0x7b742d098000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/librt-2.25.so
0x7b742d098000-0x7b742d0b1000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libpthread-2.25.so
0x7b742d0b1000-0x7b742d2b0000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libpthread-2.25.so
0x7b742d2b0000-0x7b742d2b1000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libpthread-2.25.so
0x7b742d2b1000-0x7b742d2b2000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/libpthread-2.25.so
0x7b742d2b2000-0x7b742d2b6000
0x7b742d2b6000-0x7b742d2d9000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/ld-2.25.so
0x7b742d4a8000-0x7b742d4bc000
0x7b742d4c0000-0x7b742d4d9000
0x7b742d4d9000-0x7b742d4da000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/ld-2.25.so
0x7b742d4da000-0x7b742d4db000 /nix/store/l48biijfr1j6d5kdg911051x2phfjrz7-glibc-2.25/lib/ld-2.25.so
0x7b742d4db000-0x7b742d4dc000
0x7fff06e24000-0x7fff06e46000 [stack]
0x7fff06ef0000-0x7fff06ef2000 [vvar]
0x7fff06ef2000-0x7fff06ef4000 [vdso]
0xffffffffff600000-0xffffffffff601000 [vsyscall]
==16572==End of process memory map.
c-cube:/tmp uname -a
Linux c-cube 4.9.39 #1-NixOS SMP Fri Jul 21 05:42:36 UTC 2017 x86_64 GNU/Linux
(Just compiled a trivial Hello world with -fsanitize=address)
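The abort in this log can be checked by hand: ASan wants the range [0x00007fff7000, 0x10007fff7fff] for its shadow, and the PIE binary was mapped at 0x04daa6e91000, which falls inside it. A minimal sketch of that check follows, using three ranges copied from the map above. The helper name `overlaps` is made up for illustration and is not part of the ASan runtime.

```python
# Shadow range reported by ASan in the log above (inclusive bounds).
SHADOW_LO = 0x00007FFF7000
SHADOW_HI = 0x10007FFF7FFF

# A few mappings copied from the process memory map above: (start, end, name).
MAPPINGS = [
    (0x04DAA6E91000, 0x04DAA6FC6000, "/tmp/loadaddr"),
    (0x7B742C3C4000, 0x7B742C559000, "libc-2.25.so"),
    (0x7FFF06E24000, 0x7FFF06E46000, "[stack]"),
]


def overlaps(start, end, lo=SHADOW_LO, hi=SHADOW_HI):
    """Return True if the half-open mapping [start, end) intersects [lo, hi]."""
    return start <= hi and end > lo


# Names of the mappings that collide with the wanted shadow range.
conflicts = [name for start, end, name in MAPPINGS if overlaps(start, end)]
print(conflicts)
```

Only the executable itself collides here; the libraries and the stack sit far above the shadow range.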
bennofs
Jul 30, 2017
A possible workaround seems to be the following:
$ .../ld-2.25.so ./loadaddr
That way, loadaddr will be loaded by ld.so, which uses mmap so loadaddr ends up in the mmap region which is way higher than the PIE base.
(Yes, my ld.so is in a weird path, that's just NixOS things :)
FSMaxB
Aug 1, 2017
I independently bisected this in the kernel and opened a bug there: https://bugzilla.kernel.org/show_bug.cgi?id=196537 but didn't have a lot of knowledge about the underlying issues.
richfelker
Aug 7, 2017
Bringing this over from twitter (https://twitter.com/kayseesee/status/894594085608013825), my basic view is that this is a bug in the ASAN library code. Assuming you can use a particular virtual address range is not valid (it could already be in use for some reason, as you're now seeing), and even if it were valid, it's not safe for something that can be used in deployment; it exposes potentially sensitive information at an attacker-known address. ASAN simply needs to pay the cost of using a variable address chosen at runtime.
kcc
Aug 7, 2017
Contributor
@richfelker ASAN has been using fixed addresses since 2011.
I know the kernel does not guarantee anything like this, but it worked, and it provided performance and code size benefits over using a dynamic shadow base (which we also have now, as an option, off by default on Linux).
ASAN simply needs to pay the cost of using a variable address chosen at runtime.
That's one way to look at it. But a much better resolution would be to have a kernel<=>userspace interface that allows using a fixed address. And in the meantime, revert the change that broke ASAN.
safe for something that can be used in deployment
If you want to discuss this topic, please open a separate issue, let's not mix too many things in a single place.
richfelker
Aug 7, 2017
Like I said on the initial Twitter thread, I don't think I have much of value to say beyond "I think what you're doing is badly wrong" and "it happened to work before is not a good argument to do it (or for changing the kernel)". If we disagree then we disagree...
FSMaxB
Aug 7, 2017
@kcc: You mentioned a dynamic shadow base. Could you please elaborate on that?
Is that available in the current stable release of LLVM? And if yes, can you point me to some documentation, please?
I think that information would be useful for downstream projects that find the runtime overhead of a dynamic shadow base is acceptable.
bennofs
Aug 7, 2017
And in the meantime, revert the change that broke ASAN.
@kcc I don't think this is good advice. Pretty sure that the change fixes some security issue, so you shouldn't revert that.
richfelker
Aug 7, 2017
I agree strongly with @bennofs. Address assignment/ASLR for production systems should not be tiptoeing around (and possibly impacting security) for the sake of a tool that's only suitable in debugging situations and not production. I'd like ASAN to be usable in production (which is why I mentioned that above) but at present it's not.
kcc
Aug 7, 2017
Contributor
One more discussion thread is here: http://marc.info/?t=149973272100048&r=1&w=2
kcc
Aug 7, 2017
Contributor
@kcc: You mentioned a dynamic shadow base. Could you please elaborate on that.
In clang there is -mllvm -asan-force-dynamic-shadow=1, which is the default on Windows.
I don't think this has been implemented in GCC.
This is currently an implementation detail (on Windows), not documented.
should not be tiptoeing around
All these arguments are perfectly valid, but who is going to pay for the increased CPU usage and code size? Or, if we end up supporting both configurations on Linux (dynamic and static), who is going to pay for the extra maintenance overhead?
We really need to come up with a solution where the application requests a fixed address range at startup and the kernel can't refuse.
FSMaxB
Aug 7, 2017
@kcc: Forcing the dynamic shadow doesn't work on my system! (Archlinux x86_64 with clang 4.0.1)
kcc
Aug 7, 2017
Contributor
@FSMaxB please open a separate bug with details.
But please note: this flag is not officially supported.
richfelker
Aug 7, 2017
Requesting a fixed address range at startup is non-PIE. Normal non-PIE ELF already has a way to do that: PT_LOAD segments (e.g. with PROT_NONE or just BSS you can MAP_FIXED over later). The whole point of an executable being PIE is that it doesn't demand specific addresses.
Being that current kernels don't, and future kernels probably won't, support the invalid usage of assuming a particular fixed address range is free, the fixed address mode should just be removed and dynamic always used. This will simplify the amount of code that needs to be maintained anyway (since Windows already needs dynamic). Performance is not likely to be significantly worse, but ASAN already performs badly and is intended and understood as a costly (but less so than some other approaches) tool for debugging (and possibly in the future, for hardening).
kcc
Aug 7, 2017
Contributor
Asan's shadow being at a fixed offset does not really contradict PIE -- the rest of the addresses could be anywhere they want to (except for the shadow region).
BTW, I am trying to get the fresh perf numbers on spec for static vs dynamic shadow.
richfelker
Aug 7, 2017
The view I'm putting forward, which you're free to disagree with but I think is worthwhile, is that the definition of PIE is "no fixed mappings", not "some non-fixed mappings". In this definition, PIE ELF programs can even be loaded in rather esoteric environments like a shared address space (multiple programs in the same process) or a nommu system (where all processes share an address space). There are very good reasons to consider any fixed mappings a design bug; in places where they've been used recently, they've repeatedly come back to bite the designers and users. The Linux/glibc x86_64 "vsyscall" mess, ARM kuserhelper page, etc. come to mind.
richfelker
Aug 7, 2017
BTW my view of these matters is somewhat broader than "Linux" because I'm thinking of/interested in the usage case of non-Linux implementations loading and executing programs using the Linux user-kernel ABI. This sort of generality is part of why I disagree with the view that the kernel is obligated to lay out memory the same way past versions did.
yugr
Aug 8, 2017
BTW, I am trying to get the fresh perf numbers on spec for static vs dynamic shadow.
May make sense to measure sanitized DSOs (where __asan_shadow_memory_dynamic_address is GOT-relocated), rather than sanitized executables.
yugr
Aug 8, 2017
I'd like ASAN to be usable in production (which is why I mentioned that above) but at present it's not.
Relevant discussion in oss-security
kcc
Aug 8, 2017
Contributor
I've done an overnight run of SPEC2006 on my machine.
The results are surprisingly close.
But the run-to-run variation is too high, I'll need to find a less noisy machine.
benchmark, static, dynamic, dynamic/static
400.perlbench, 1605.00, 1647.00, 1.03 << dynamic is 3% slower
401.bzip2, 779.00, 797.00, 1.02
403.gcc, 660.00, 686.00, 1.04
429.mcf, 593.00, 503.00, 0.85 << very noisy test
445.gobmk, 960.00, 956.00, 1.00
456.hmmer, 809.00, 812.00, 1.00
458.sjeng, 1214.00, 1227.00, 1.01
462.libquantum, 435.00, 442.00, 1.02
464.h264ref, 1193.00, 1207.00, 1.01
471.omnetpp, 881.00, 904.00, 1.03
473.astar, 704.00, 672.00, 0.95 << dynamic is 5% faster!
483.xalancbmk, 1252.00, 1216.00, 0.97
433.milc, 860.00, 837.00, 0.97
444.namd, 583.00, 590.00, 1.01
447.dealII, 1659.00, 1627.00, 0.98
450.soplex, 454.00, 476.00, 1.05
453.povray, 648.00, 630.00, 0.97
470.lbm, 478.00, 460.00, 0.96
482.sphinx3, 811.00, 798.00, 0.98
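To summarize the table in one number, the per-benchmark ratios can be folded into a geometric mean. The sketch below only re-derives the dynamic/static column from the figures copied above. Given the run-to-run noise already mentioned, the result should be read as a rough indication, not a measurement.

```python
import math

# (benchmark, static, dynamic) copied from the table above.
RESULTS = [
    ("400.perlbench", 1605.0, 1647.0),
    ("401.bzip2", 779.0, 797.0),
    ("403.gcc", 660.0, 686.0),
    ("429.mcf", 593.0, 503.0),
    ("445.gobmk", 960.0, 956.0),
    ("456.hmmer", 809.0, 812.0),
    ("458.sjeng", 1214.0, 1227.0),
    ("462.libquantum", 435.0, 442.0),
    ("464.h264ref", 1193.0, 1207.0),
    ("471.omnetpp", 881.0, 904.0),
    ("473.astar", 704.0, 672.0),
    ("483.xalancbmk", 1252.0, 1216.0),
    ("433.milc", 860.0, 837.0),
    ("444.namd", 583.0, 590.0),
    ("447.dealII", 1659.0, 1627.0),
    ("450.soplex", 454.0, 476.0),
    ("453.povray", 648.0, 630.0),
    ("470.lbm", 478.0, 460.0),
    ("482.sphinx3", 811.0, 798.0),
]

# Ratio above 1.0 means the dynamic-shadow build had the larger figure.
ratios = {name: dyn / stat for name, stat, dyn in RESULTS}

# Geometric mean is the usual way to average ratios across benchmarks.
geomean = math.exp(sum(math.log(r) for r in ratios.values()) / len(ratios))

for name, ratio in ratios.items():
    print(f"{name:16s} {ratio:.2f}")
print(f"geomean dynamic/static = {geomean:.3f}")
```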
I was also surprised to see that the code size with dynamic shadow is actually better (~0.3%).
Well, looking at the objdump it makes sense:
Dynamic:
9a8a66: 80 3c 01 00 cmpb $0x0,(%rcx,%rax,1)
Static:
41fd36: 80 b8 00 80 ff 7f 00 cmpb $0x0,0x7fff8000(%rax)
Next steps:
- find a proper noise-free machine for benchmarking
- check what happens with PIC/PIE, where loading the shadow base is more expensive
- check what's going on on ARM (I'll certainly need help with that)
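The code size observation follows directly from the two encodings quoted in the objdump output: the static form carries the 32-bit constant 0x7fff8000 inside every check, while the dynamic form addresses through a register that already holds the shadow base. A small sketch that counts the bytes, with `encoded_len` as an illustrative helper name:

```python
# Instruction encodings copied from the objdump output above.
DYNAMIC = "80 3c 01 00"           # cmpb $0x0,(%rcx,%rax,1)
STATIC = "80 b8 00 80 ff 7f 00"   # cmpb $0x0,0x7fff8000(%rax)


def encoded_len(hex_bytes):
    """Number of bytes in a space-separated hex dump of one instruction."""
    return len(bytes.fromhex(hex_bytes))


# The four displacement bytes of the static form, stored little-endian.
static_disp = int.from_bytes(bytes.fromhex("00 80 ff 7f"), "little")

print(encoded_len(DYNAMIC), encoded_len(STATIC), hex(static_disp))
```

The per-check saving has to be weighed against the extra load of the shadow base that the dynamic form needs elsewhere in the function.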
kcc
Aug 8, 2017
Contributor
The difference between regular executables and PIE:
Regular:
4e7f74: 4c 8b 35 9d 2c 44 00 mov 0x442c9d(%rip),%r14 # 92ac18 <__asan_shadow_memory_dynamic_address>
PIE (or -shared-libasan):
e9504: 48 8d 05 0d 27 44 00 lea 0x44270d(%rip),%rax # 52bc18 <__asan_shadow_memory_dynamic_address>
e950b: 4c 8b 30 mov (%rax),%r14
pcc
Aug 8, 2017
Contributor
It looks like the linker is applying relocation relaxation in the PIE/-shared-libasan case, so we end up with a single indirection in the final executable. If you look at the object files you should see two mov instructions.
Are you sure you are linking against the libasan DSO when you build with -shared-libasan? I'd expect to see two movs in the executable unless libasan is being linked statically.
kcc
Aug 8, 2017
Contributor
% clang++ -fsanitize=address -O1 a.cc -mllvm -asan-force-dynamic-shadow=1 && objdump -d a.out | grep "<main>:" -A 6
4e7f74: 4c 8b 35 9d 2c 44 00 mov 0x442c9d(%rip),%r14 # 92ac18 <__asan_shadow_memory_dynamic_address>
% clang++ -fsanitize=address -O1 a.cc -mllvm -asan-force-dynamic-shadow=1 -shared-libasan && objdump -d a.out | grep "<main>:" -A 6
4007a4: 4c 8b 35 b5 08 20 00 mov 0x2008b5(%rip),%r14 # 601060 <__TMC_END__>
% clang++ -fsanitize=address -O1 a.cc -mllvm -asan-force-dynamic-shadow=1 -fPIE -pie && objdump -d a.out | grep "<main>:" -A 6
e9504: 48 8d 05 0d 27 44 00 lea 0x44270d(%rip),%rax # 52bc18 <__asan_shadow_memory_dynamic_address>
e950b: 4c 8b 30 mov (%rax),%r14
% clang++ -fsanitize=address -O1 a.cc -mllvm -asan-force-dynamic-shadow=1 -fPIE -pie -shared-libasan && objdump -d a.out | grep "<main>:" -A 6
984: 48 8b 05 6d 06 20 00 mov 0x20066d(%rip),%rax # 200ff8 <_DYNAMIC+0x258>
98b: 4c 8b 30 mov (%rax),%r14
% ldd a.out | grep asan
libclang_rt.asan-x86_64.so => not found
So, -fPIE -pie -shared-libasan gives us two loads.
kcc
Aug 9, 2017
Contributor
% clang++ -fsanitize=address -O1 a.cc -mllvm -asan-force-dynamic-shadow=1 -fPIC -shared && objdump -d a.out | grep "<main>:" -A 6
874: 48 8b 05 7d 07 20 00 mov 0x20077d(%rip),%rax # 200ff8 <_DYNAMIC+0x218>
87b: 4c 8b 30 mov (%rax),%r14
rnk
Aug 9, 2017
Contributor
If we care about ELF + dynamic shadow base, we should duplicate the shadow base global into every DSO. We could add a hidden visibility comdat global with the shadow base to every object file and let the linker merge them. A high priority initializer would set it. This is similar to what we do on Windows.
richfelker
Aug 9, 2017
That seems workable, but before pulling in heavy machinery like that there should be some justification, i.e. a measurement that shows it makes a significant difference. The whole reason we have this problem to begin with is because somebody decided to do a premature optimization with a fixed shadow base address that apparently made virtually no performance difference...
kcc
Aug 9, 2017
Contributor
somebody decided
That was me in 2011, and I've made measurements at that time and they were in favor of my decision. Looks like not any more (not 100% confident though, independent evaluation is welcome)
dvyukov
Aug 9, 2017
Contributor
Do we/does it make sense/possible to mark the global with some special attributes so that compiler knows that it never changes in generated code under any circumstances, so that it can freely cache it in a register across functions/calls/loops?
rnk
Aug 9, 2017
Contributor
@dvyukov Right now the dynamic shadow base is only loaded once per function call. The load (or two loads for DSOs) happens in the prologue, and that value is typically allocated to a register live across the whole function. Unfortunately, I think LLVM's rematerialization is primitive. It mostly rematerializes constants.
dtzWill
Aug 9, 2017
Do we/does it make sense/possible to mark the global with some special attributes so that compiler knows that it never changes in generated code under any circumstances, so that it can freely cache it in a register across functions/calls/loops?
At least in LLVM you can-- global declarations can be marked const for pretty much this purpose, excerpt from the LLVM LangRef
LLVM explicitly allows declarations of global variables to be marked constant, even if the final definition of the global is not. This capability can be used to enable slightly better optimization of the program, but requires the language definition to guarantee that optimizations based on the ‘constantness’ are valid for the translation units that do not include the definition.
dtzWill
Aug 9, 2017
That was me in 2011, and I've made measurements at that time and they were in favor of my decision. Looks like not any more (not 100% confident though, independent evaluation is welcome)
I've never benchmarked ASAN but I've benchmarked thoroughly various shadow-memory systems (taint tracking, etc.) and I can confirm that a constant shadow location is a small but significant optimization. I can search to see if I have any charts handy.
The biggest win IIRC was that a constant address let you be clever about selecting your shadow memory range such that mapping program pointers to their shadow location could be done in fewer instructions (how many depended on the "density" of the mapping, 1:1 or are you bit-packing?).
I was also inlining the runtime, not sure what ASAN does in this regard.
(there are multiple papers about the efficient engineering of these things, FWIW)
kcc
Aug 9, 2017
Contributor
ASan's mapping is 8=>1 (no bit packing though; details here: https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#mapping).
When I last checked a few years ago, the big difference was between using 0, 0x7fff8000 and something like (1ULL << 43).
'0' is the fastest and produces the smallest code, but does not work with non-PIE binaries on Linux (we use a 0 base on Android).
(1ULL << 43) or some such was used for a while, but then Jakub Jelinek suggested 0x7fff8000 as a compromise between 0 and (1ULL << 43). On x86_64, 0x7fff8000 gave us most of the code-size and performance benefit of 0 with much greater compatibility.
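For readers who want the arithmetic spelled out, here is a minimal sketch of the 8=>1 mapping with the three bases mentioned above. The names `mem_to_shadow`, `SHADOW_SCALE` and `check_mapping_examples` are made up for this illustration and are not the runtime's actual identifiers. One commonly given reason why 0x7fff8000 is nearly as cheap as 0 on x86_64 is that it still fits in a signed 32-bit displacement, whereas (1ULL << 43) does not; treat that as background, not as something stated in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* 8 application bytes map to 1 shadow byte, hence a shift by 3. */
#define SHADOW_SCALE 3

/* Illustrative helper (hypothetical name): shadow = (addr >> 3) + base. */
static inline uint64_t mem_to_shadow(uint64_t addr, uint64_t shadow_base) {
    return (addr >> SHADOW_SCALE) + shadow_base;
}

/* A few worked values for the three bases discussed in the comment. */
static void check_mapping_examples(void) {
    /* Base 0: the mapping is a single shift. */
    assert(mem_to_shadow(0x1000, 0) == 0x200);
    /* Base 0 on a 32-bit address space: shadow stays below 0x20000000. */
    assert(mem_to_shadow(0xffffffffULL, 0) == 0x1fffffffULL);
    /* Base 0x7fff8000: shadow of address 0 starts at the base itself. */
    assert(mem_to_shadow(0, 0x7fff8000ULL) == 0x7fff8000ULL);
    /* Top of a 47-bit user address space with base 0x7fff8000. */
    assert(mem_to_shadow(0x7fffffffffffULL, 0x7fff8000ULL) == 0x10007fff7fffULL);
    /* Base (1ULL << 43): shadow of address 0. */
    assert(mem_to_shadow(0, 1ULL << 43) == 0x80000000000ULL);
}
```

With a compile-time constant base the compiler can fold the addition into the shadow load; with a dynamic base the value has to be loaded from memory or kept in a register first.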
chefmax
Aug 11, 2017
Collaborator
> Forcing the dynamic shadow doesn't work on my system! (Archlinux x86_64 with clang 4.0.1)
AFAIK dynamic shadow isn't supported in the ASan runtime for Linux (FindAvailableMemoryRange contains UNREACHABLE), so that's expected. A possible implementation would be to just mmap a large chunk for the shadow in this routine, probably with some hint.
> check what's going on on ARM (I'll certainly need help with that)
FYI I'm trying to get numbers on my ARM Linux board, but I'll only have results by the middle of next week (SPEC2006 is very time-consuming on my weak ARM board).
> In clang there is -mllvm -asan-force-dynamic-shadow=1, which is the default on Windows. I don't think this has been implemented in GCC.
Yes, this is not implemented in GCC, but I don't think it's hard to do (I have a patch that passes GCC ASan bootstrap, but it needs some polishing).
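To make the "mmap a large chunk with some hint" idea concrete, here is a rough sketch under stated assumptions. `reserve_shadow` is a hypothetical helper, not the runtime's FindAvailableMemoryRange, and the flags are one plausible choice, not necessarily what a real implementation would use. Without MAP_FIXED the kernel treats the address only as a hint, so the returned base may differ from it, which is exactly why instrumented code would then need to read the base from a variable.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reserve `size` bytes for shadow, preferably at `hint`.
 * Returns the base the kernel actually chose, or NULL on failure. */
static void *reserve_shadow(uintptr_t hint, size_t size) {
    long page = sysconf(_SC_PAGESIZE);
    /* The sketch only handles page-aligned sizes. */
    assert(page > 0 && size % (size_t)page == 0);
    /* MAP_NORESERVE: ask the kernel not to reserve swap up front, so a
     * large, mostly untouched region stays cheap. No MAP_FIXED: an existing
     * mapping at `hint` is never clobbered, the kernel picks another spot. */
    void *p = mmap((void *)hint, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

A caller would store the returned pointer as the shadow base and must not assume it equals the hint.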
chefmax
Aug 15, 2017
Collaborator
> FYI I'm trying to get numbers on my ARM Linux board, but I'll get some results only till the mid of next week (SPEC2006 is very time consuming on my weak ARM board).
So, I've got some numbers on my ARM Linux board. I used the SPEC2006 train input size (the board almost died under ref), but even with train, the noise between test runs was quite low (~1%) for most tests (except perl and hmmer, where the noise was ~3%):
Static CFLAGS=-O2 -fPIC -pie -shared-libasan
Dynamic CFLAGS=-O2 -fPIC -mllvm -asan-force-dynamic-shadow=1 -pie -shared-libasan
Processor:
processor : 0
model name : ARMv7 Processor rev 4 (v7l)
Features : swp half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xc0f
CPU revision : 4
| Test | static | dynamic | dynamic slowdown (less is better) |
|---|---|---|---|
| 400.perlbench | 401 | 413 | 2.9% |
| 401.bzip2 | 232 | 238 | 2.5% |
| 429.mcf | 99.5 | 101 | 1.5% |
| 445.gobmk | 918 | 921 | 0.3% |
| 456.hmmer | 292 | 300 | 2.7% |
| 458.sjeng | 1610 | 1622 | 0.7% |
| 471.omnetpp | 850 | 852 | 0.2% |
| 473.astar | 415 | 425 | 2.4% |
| 483.xalancbmk | 774 | 777 | 0.4% |
| 433.milc | 78.4 | 79.9 | 1.9% |
| 444.namd | 52.7 | 54.4 | 3.2% |
| 447.dealII | 175 | 192 | 9.7% |
| 450.soplex | 38.2 | 38.7 | 1.3% |
| 453.povray | 78.2 | 81.6 | 4.3% |
| 470.lbm | 197 | 197 | 0.0% |
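The last column appears to be the relative increase of the dynamic time over the static time; that formula is my assumption, it is not stated in the comment. A small sketch (`slowdown_percent` is a made-up name):

```c
#include <assert.h>

/* Assumed formula: (dynamic - static) / static, expressed in percent. */
static double slowdown_percent(double static_time, double dynamic_time) {
    assert(static_time > 0.0);
    return (dynamic_time - static_time) / static_time * 100.0;
}
```

For 447.dealII this gives (192 - 175) / 175, roughly 9.7%, matching the table. A few rows (400.perlbench, for example) come out about 0.1 point off, presumably because the displayed times are rounded.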
4144 referenced this issue on Aug 16, 2017: Issue in kernel what prevent using asan flags with gcc or clang #2105 (Closed)
kcc referenced this issue on Sep 6, 2017: Workarounds for #837 (Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.) #856 (Open)
alexcrichton referenced this issue on Sep 7, 2017: travis: Downgrade to previous images temporarily #44399 (Merged)
Commits referencing this issue were added to alexcrichton/rust (Sep 8, 2017), varnishcache/varnish-cache (Sep 10, 2017), rust-lang/rust (Sep 11, 2017), and varnishcache/varnish-cache (Sep 15, 2017).
eugenis
Oct 4, 2017
Btw, ASan on 32-bit Android maps shadow at 0000 0000 .. 2000 0000, because all executables are PIE, and it is slightly faster that way (and requires less code). This is now broken.
eugenis
Oct 17, 2017
> If we care about ELF + dynamic shadow base, we should duplicate the shadow base global into every DSO. We could add a hidden visibility comdat global with the shadow base to every object file and let the linker merge them. A high priority initializer would set it. This is similar to what we do on Windows.
This will not always work. If library A depends on library B, then a constructor of B may call A before A's constructors have run.
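A rough single-file sketch of the quoted proposal and of the ordering hazard, using GCC/Clang attributes. All names here (`local_shadow_base`, `runtime_query_shadow_base`, `mem_to_shadow_dynamic`) are hypothetical, the weak hidden symbol is only a stand-in for a comdat global, and the constant returned by the "runtime" is a placeholder.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for asking the runtime where the shadow was mapped. */
static uintptr_t runtime_query_shadow_base(void) {
    return (uintptr_t)0x7fff8000u; /* placeholder value */
}

/* One copy per DSO: hidden keeps it out of the dynamic symbol table, weak
 * lets copies from several object files merge at link time. */
__attribute__((visibility("hidden"), weak)) uintptr_t local_shadow_base;

/* 101 is the highest priority available to user code (0-100 are reserved). */
__attribute__((constructor(101))) static void init_local_shadow_base(void) {
    local_shadow_base = runtime_query_shadow_base();
}

static inline uintptr_t mem_to_shadow_dynamic(uintptr_t addr) {
    /* If another library's constructor calls into this DSO before the
     * constructor above has run, the base is still 0. The assert makes
     * that ordering problem visible instead of silently using base 0. */
    assert(local_shadow_base != 0);
    return (addr >> 3) + local_shadow_base;
}
```

Constructor priorities only order initializers within one DSO; they do not help when a dependency's constructor calls into a library whose own constructors have not started yet, which is the case described in the reply.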
A commit referencing this issue was added to cheshire/swift (Oct 31, 2017).
cheshire referenced this issue on Oct 31, 2017: Temporarily disable sanitizers tests on Linux, as recent kernel updat… #12681 (Merged)
Commits referencing this issue were added to cheshire/swift (Oct 31, 2017), apple/swift (Oct 31, 2017), and cheshire/swift (Nov 1, 2017).
cheshire referenced this issue on Nov 1, 2017: Temporarily disable sanitizers tests on Linux on swift-4.0 #12693 (Closed)
evverx referenced this issue on Nov 10, 2017: AddressSanitizer has found 1876 bytes leaked in 25 allocations #7283 (Closed)
Commits referencing this issue were added to google/fruit (Nov 18, 2017) and hukeyue/libch (Mar 18, 2018).
izbyshev referenced this issue on May 21, 2018: ASAN aborts on x86_64 in children of 32-bit processes from Linux 4.11-rc3 until 4.17-rc6 #960 (Open)
morehouse
Jun 8, 2018
Member
The kernel commit was ultimately reverted. Do we want to keep this issue open?
eugenis
Jun 8, 2018
I don't think it was reverted.
eugenis
Jun 8, 2018
Oh, I think it was reverted in the Ubuntu kernel, but not upstream.
jcowgill commented Jul 19, 2017
This is just a heads-up about this Linux kernel commit, recently committed and pending on a number of stable queues:
torvalds/linux@eab0953
It seems to move the default load address for -fPIE executables into the location ASan uses for its shadow memory map (on x86_64). This then causes ASan to abort on startup, reporting that the shadow memory range interleaves with an existing memory mapping.
With ASLR enabled, you can sometimes get lucky with the load address and the program runs, but most of the time ASan aborts with this error.
Is it possible for ASan to be a bit more flexible about where it places the shadow map on startup to fix this?
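To illustrate the kind of conflict described here, the sketch below checks whether any mapping of the current process intersects the shadow range. The range constants are an assumption based on ASan's usual x86_64 Linux layout (base 0x7fff8000, 47-bit address space), and the function names are made up. This is not how the runtime itself performs the check.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed x86_64 Linux shadow range: [SHADOW_BEG, SHADOW_END). */
#define SHADOW_BEG 0x00007fff8000UL
#define SHADOW_END 0x10007fff8000UL

/* True if the half-open range [beg, end) intersects the shadow range. */
static bool overlaps_shadow(unsigned long beg, unsigned long end) {
    assert(beg <= end);
    return beg < SHADOW_END && end > SHADOW_BEG;
}

/* Counts mappings in /proc/self/maps that intersect the shadow range.
 * Returns -1 if the file cannot be opened (e.g. not on Linux). */
static int count_shadow_conflicts(void) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f) return -1;
    char line[8192];
    int conflicts = 0;
    while (fgets(line, sizeof line, f)) {
        unsigned long beg, end;
        /* Each line starts with "<start>-<end>" in hexadecimal. */
        if (sscanf(line, "%lx-%lx", &beg, &end) == 2 && overlaps_shadow(beg, end))
            conflicts++;
    }
    fclose(f);
    return conflicts;
}
```

For example, an image loaded a little above 4 GiB (a hypothetical address such as 0x100000000) falls inside this range, while one at 0x555555554000 does not.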