-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First version of AddressSanitizer #32
Conversation
a0e0546
to
78c285a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 24 files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL)
a discussion (no related file):
I did not review this PR (since it's WIP), but please post bug fixes as a separate PR (once you're done).
c2af8d4
to
78269e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 10 of 24 files at r1, 18 of 18 files at r2, all commit messages.
Reviewable status: all files reviewed, 25 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @pwmarcz)
a discussion (no related file):
Tracking issue: #19 (but I don't consider this done yet, see "What's next")
FYI: I tweaked the checklist there to reflect the actual implementation order.
a discussion (no related file):
My best idea is to fork and exec add2line. This is messy, but I tried it out and it works: we can report source file, line, and function. We're doing it just before aborting the whole process anyway.
Quite hacky, but fine with me, it's just test code. And if in the future it turns out that we really need the full traceback then we can implement the GDB version.
a discussion (no related file):
However, this also requires cleaning (unpoisoning) all stack memory that we recycle or unmap. Two problems I see with that:
- There is the
shim_entry_api
for running LibOS functions from application (used mostly for glibc integration). These functions run on the user stack. I guess we should switch stacks for them as well.- I'm not sure which stack(s) do we use when handling signals?
This is something for @boryspoplawski to answer.
meson.build, line 110 at r2 (raw file):
if asan # TODO: When we start compiling C++ code with Meson, check for C++ compiler as well
It seems that we'll merge that change before this one, so I'm blocking to not forget about this while rebasing.
meson.build, line 120 at r2 (raw file):
'-mllvm', '-asan-use-after-return=0', '-mllvm', '-asan-stack=0', '-mllvm', '-asan-globals=0',
What about globals? Is it also hard to support? You didn't mention them in the PR description.
.ci/linux-sgx-sanitizers.jenkinsfile, line 16 at r2 (raw file):
load '.ci/lib/config-debug.jenkinsfile' load '.ci/lib/config-ubsan.jenkinsfile' load '.ci/lib/config-asan.jenkinsfile'
What about linux-direct-sanitizers.jenkinsfile
? Doesn't it need a similar change?
common/include/api.h, line 233 at r2 (raw file):
void* __memset_chk(void* dest, int ch, size_t count, size_t dest_count); /* Original version of functions that ASan overrides */
version
-> versions
common/include/asan.h, line 9 at r2 (raw file):
* This file defines functions for address sanitization (ASan). * * Normally, code compiled with ASAN is linked against a special library (libasan), but that library
ASAN
-> ASan
common/src/asan.c, line 1 at r2 (raw file):
missing license and copyright info
common/src/asan.c, line 10 at r2 (raw file):
#endif #define RETURN_ADDR __builtin_extract_return_addr(__builtin_return_address(0))
This should probably be parenthesized? Also, I'd prefer to add empty parenthesis to the macro so that it doesn't look like a constant.
common/src/asan.c, line 64 at r2 (raw file):
if (asan_check(bad_addr)) break;
Just for sanity, could you assert asan_check(bad_addr)
here? It's on the slow path, so it won't matter performance-wise and we unfortunately don't have for-else in C.
common/src/asan.c, line 125 at r2 (raw file):
__attribute_no_sanitize_address static void asan_report(void* ip_addr, uintptr_t addr, size_t size, bool is_load) {
no need for this empty line
common/src/asan.c, line 131 at r2 (raw file):
log_error("asan: %s while trying to %s %lu bytes at 0x%lx", bug_type, is_load ? "load" : "store", size, addr);
Could we replace "bytes" with "%s" and use singular version for single-byte stores? Not blocking, but the reports look really nice and this was the only thing which didn't look so nicely :)
common/src/asan.c, line 132 at r2 (raw file):
from beginning
This was a bit confusing to me, I had to look at the code to understand beginning of what this is. Maybe something like "beginning of access" or "beginning of memory access" would be better?
common/src/asan.c, line 168 at r2 (raw file):
} #define ASAN_LOAD(addr, size) \
please align this backslash
common/src/asan.c, line 171 at r2 (raw file):
do { \ if (asan_check_region(addr, size)) { \ asan_report(RETURN_ADDR, addr, size, true); \
please use named arg for the bool one
common/src/asan.c, line 176 at r2 (raw file):
} while(0) #define ASAN_STORE(addr, size) \
ditto
common/src/asan.c, line 179 at r2 (raw file):
do { \ if (asan_check_region(addr, size)) { \ asan_report(RETURN_ADDR, addr, size, false); \
ditto (named arg)
common/src/asan.c, line 192 at r2 (raw file):
} \ void __asan_report_load##size(uintptr_t addr) { \ asan_report(RETURN_ADDR, addr, size, true); \
ditto (named arg)
common/src/asan.c, line 196 at r2 (raw file):
} \ void __asan_report_store##size(uintptr_t addr) { \ asan_report(RETURN_ADDR, addr, size, false); \
ditto (named arg)
common/src/asan.c, line 215 at r2 (raw file):
void __asan_report_load_n(uintptr_t addr, size_t size) { asan_report(RETURN_ADDR, addr, size, true);
ditto (named arg)
common/src/asan.c, line 220 at r2 (raw file):
void __asan_report_store_n(uintptr_t addr, size_t size) { asan_report(RETURN_ADDR, addr, size, false);
ditto (named arg)
common/src/asan.c, line 259 at r2 (raw file):
__attribute__((alias("__asan_memmove"))) void* memmove(void*, const void*, uintptr_t);
uintptr_t
? not size_t
?
common/src/asan.c, line 269 at r2 (raw file):
int memcmp(const void* lhs, const void* rhs, size_t count) { ASAN_LOAD((uintptr_t)lhs, count); ASAN_LOAD((uintptr_t)lhs, count);
Gotcha! Should be rhs
;)
LibOS/shim/test/regression/test_libos.py, line 31 at r2 (raw file):
@unittest.skipUnless(os.environ.get('ASAN') == '1', 'test only enabled with ASAN=1') def test_020_asan(self):
test_020
-> test_021
Pal/src/slab.c, line 130 at r2 (raw file):
INIT_FAIL(PAL_ERROR_NOMEM, "cannot initialize slab manager"); #if STATIC_SLAB == 1 #ifdef ASAN
I think you can mix this into just one #if
using defined()
.
Pal/src/host/Linux-SGX/Makefile, line 136 at r2 (raw file):
$(call cmd,cpp_s_S) # Add `-Wunused-command-line-argument`, because with ASAN, clang complains that LLVM flags
ASAN
-> ASan
clang
-> Clang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 10 of 24 files at r1, 14 of 18 files at r2, all commit messages.
Reviewable status: all files reviewed, 35 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @pwmarcz)
a discussion (no related file):
One of the most impressive PRs I've seen!
meson.build, line 112 at r2 (raw file):
# TODO: When we start compiling C++ code with Meson, check for C++ compiler as well if meson.get_compiler('c').get_id() != 'clang' error('ASan is currently supported only for Clang (CC=clang)')
What is the exact reason we only support ASan on Clang?
common/include/asan.h, line 95 at r2 (raw file):
#define ASAN_SHADOW_START 0x7fff8000 #define ASAN_SHADOW_SHIFT 3 #define ASAN_SHADOW_LENGTH (1ULL << 44)
How does this work? We start our shadow memory at 2GB (0x7fff8000
) and this shadow extends over 16TBs (1ULL << 44)??
I guess the idea is that it works for the first 2GB of code/data (because executable is typically placed there) and also for libs/stack (because they are typically allocated from bits 47-48)? But how can this work for e.g. Linux-SGX?
common/include/asan.h, line 123 at r2 (raw file):
* Load/store callbacks: * * - `load` / `store`: check if an memory under given address is accessible; if not, report the
an memory
-> memory
common/src/asan.c, line 40 at r2 (raw file):
_real_memset(shadow_ptr, 0, shadow_size); if (right_part_size) *(shadow_ptr + shadow_size) = right_part_size;
What is this magic with the right part of the memory-region-to-unpoison?
UPDATE: Looks like this shows us how many bytes are allocated in this 8-byte "end of object" (for non-8-16-24-32-...-byte-sized objects).
common/src/asan.c, line 68 at r2 (raw file):
/* If this is a partial right redzone, check the next byte */ if (*shadow_ptr < 0x80) shadow_ptr++;
I don't get it. Why is it guaranteed that after an 8-byte partially allocated object, there will be a redzone (== shadow memory contains the description of what kind of redzone)?
Is this because you added that padding[8]
to any allocated memory object (memobj
), so there is always at least 8 bytes in-between allocations? And these 8 bytes are filled with ASAN_POISON_HEAP_LEFT_REDZONE
at the host-allocation PAL level?
common/src/asan.c, line 267 at r2 (raw file):
} int memcmp(const void* lhs, const void* rhs, size_t count) {
What's the deal with memcmp()
? Why is it not __asan_memcmp
?
Pal/src/slab.c, line 81 at r2 (raw file):
} #ifdef ASAN asan_poison_region((uintptr_t)addr, ALLOC_ALIGN_UP(size), ASAN_POISON_HEAP_LEFT_REDZONE);
This is PAL-internal memory allocation. It is different from the app+LibOS memory allocation. I wonder if we could have different ASAN_POISON_...
macro for it? Maybe not in this PR, but in the future?
Pal/src/slab.c, line 106 at r2 (raw file):
} /* not a last object from low/high addresses, can't do anything about this case */ /* ASAN: keep the memory poisoned, because we're not returning it to the system */
I don't understand this comment. (This case is the counterpart to the above malloc case: this is memory allocated previously from the g_mem_pool
. Why is memory in g_mem_pool
always poisoned?)
Pal/src/host/Linux-SGX/sgx_main.c, line 1083 at r2 (raw file):
int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED_NOREPLACE; void* addr = (void*)DO_SYSCALL(mmap, (void*)ASAN_SHADOW_START, ASAN_SHADOW_LENGTH, prot, flags, /*fd=*/-1, /*offset=*/0);
This carves out a memory region (enclave or untrusted) starting from 2GB and till 16TB. This means that e.g. enclaves with sgx.non_pie = true
and sgx.enclave_size = 4G
will fail to start -- because they overlap with the shadow memory region of ASan.
Are there any other constants we could use? For example, starting shadow memory from e.g. 64GB or 1TB would be no problem at all (most enclaves I've seen have enclave size of max 32GB).
Otherwise, we have to put some explanations in the docs, that some combinations of SGX enclaves and ASan cannot work together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 35 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @dimakuv and @pwmarcz)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
One of the most impressive PRs I've seen!
+1, great work!
common/include/asan.h, line 95 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
How does this work? We start our shadow memory at 2GB (
0x7fff8000
) and this shadow extends over 16TBs (1ULL << 44)??I guess the idea is that it works for the first 2GB of code/data (because executable is typically placed there) and also for libs/stack (because they are typically allocated from bits 47-48)? But how can this work for e.g. Linux-SGX?
This relies on Linux' lazy allocation of physical memory. Note that on SGX we allocate this memory outside the enclave and then just use it from inside - this is only a debugging feature, so this is perfectly fine.
common/src/asan.c, line 40 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
What is this magic with the right part of the memory-region-to-unpoison?
UPDATE: Looks like this shows us how many bytes are allocated in this 8-byte "end of object" (for non-8-16-24-32-...-byte-sized objects).
Yup. There's a nice explanation in asan.h
for this - "A low value (01..07) means only the first N bytes are accessible.".
common/src/asan.c, line 68 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
I don't get it. Why is it guaranteed that after an 8-byte partially allocated object, there will be a redzone (== shadow memory contains the description of what kind of redzone)?
Is this because you added that
padding[8]
to any allocated memory object (memobj
), so there is always at least 8 bytes in-between allocations? And these 8 bytes are filled withASAN_POISON_HEAP_LEFT_REDZONE
at the host-allocation PAL level?
I think this is just a heuristic - we hit an invalid access for a byte we don't have info about, so we just try to take it from the next one, hoping that it's the same. But let's wait for Paweł's answer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 22 of 29 files reviewed, 34 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @pwmarcz)
a discussion (no related file):
Previously, mkow (Michał Kowalczyk) wrote…
My best idea is to fork and exec add2line. This is messy, but I tried it out and it works: we can report source file, line, and function. We're doing it just before aborting the whole process anyway.
Quite hacky, but fine with me, it's just test code. And if in the future it turns out that we really need the full traceback then we can implement the GDB version.
Let's do it after this PR then; I have some proof-of-concept lying around but there might be some discussions about how to make it less messy.
a discussion (no related file):
Previously, mkow (Michał Kowalczyk) wrote…
+1, great work!
Thanks!
meson.build, line 112 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
What is the exact reason we only support ASan on Clang?
This is from memory, but:
- Obviously the
-mllvm ...
flags are Clang-specific - Not sure if it actually supports stack sanitization, etc.
- ASan is only on the newer GCC (the one in Ubuntu 20.04). That means at least making a more specific
#ifdef
for theno_sanitize("undefined")
attribute - I think it uses only the
__asan_*
etc. callbacks; doesn't generate inline code (so it should be slower, but actually easier to support)
I think it wouldn't be too hard to support GCC ASan, but without a Jenkins build it's likely to break (same as our Clang support broke multiple times before we added a Jenkins build). Since it would require at least some branches (ifdefs, conditions in Meson, etc), I don't think it's worth it, because GCC sanitizers are usually weaker anyway.
meson.build, line 120 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
What about globals? Is it also hard to support? You didn't mention them in the PR description.
From what I understand from libasan
code, tracking globals is (1) being able to track origin of memory (which our integration doesn't do), and (2) setting up a red zone around globals (which might be useful to catch buffer overflows).
I think the main obstacle is supporting .init
sections, because that's where the globals are registered (we would have to make sure the .init
code appears in our binaries and is being run by ELF loading code). I can try after this PR.
.ci/linux-sgx-sanitizers.jenkinsfile, line 16 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
What about
linux-direct-sanitizers.jenkinsfile
? Doesn't it need a similar change?
Good catch! Added, I hope this doesn't break Jenkins.
common/include/api.h, line 233 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
version
->versions
Done.
common/include/asan.h, line 9 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
ASAN
->ASan
Done.
common/include/asan.h, line 95 at r2 (raw file):
How does this work? We start our shadow memory at 2GB (
0x7fff8000
) and this shadow extends over 16TBs (1ULL << 44)??
Yes.
I guess the idea is that it works for the first 2GB of code/data (because executable is typically placed there) and also for libs/stack (because they are typically allocated from bits 47-48)?
Yeah, looks like LLVM is betting on the fact that this area (2GB +16TB) is usually unmapped... I don't know how reliable that assumption is.
But how can this work for e.g. Linux-SGX?
Good point, it's going to break for larger enclaves. Let's continue in the thread for sgx_main.c
.
common/include/asan.h, line 123 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
an memory
->memory
Fixed.
common/src/asan.c, line 1 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
missing license and copyright info
Added.
common/src/asan.c, line 10 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
This should probably be parenthesized? Also, I'd prefer to add empty parenthesis to the macro so that it doesn't look like a constant.
Done.
common/src/asan.c, line 64 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Just for sanity, could you assert
asan_check(bad_addr)
here? It's on the slow path, so it won't matter performance-wise and we unfortunately don't have for-else in C.
Done.
common/src/asan.c, line 68 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
I think this is just a heuristic - we hit an invalid access for a byte we don't have info about, so we just try to take it from the next one, hoping that it's the same. But let's wait for Paweł's answer.
Yes, this is just a heuristic, taken from LLVM. I think in rare cases it might fail (e.g. we're right at the end of a heap page, so the next shadow byte might be 0).
common/src/asan.c, line 125 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
no need for this empty line
Fixed.
common/src/asan.c, line 131 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Could we replace "bytes" with "%s" and use singular version for single-byte stores? Not blocking, but the reports look really nice and this was the only thing which didn't look so nicely :)
Done.
common/src/asan.c, line 132 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
from beginning
This was a bit confusing to me, I had to look at the code to understand beginning of what this is. Maybe something like "beginning of access" or "beginning of memory access" would be better?
Changed to "from beginning of access".
common/src/asan.c, line 168 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
please align this backslash
Fixed.
common/src/asan.c, line 171 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
please use named arg for the bool one
Done.
common/src/asan.c, line 176 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
ditto
Done.
common/src/asan.c, line 179 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
ditto (named arg)
Done.
common/src/asan.c, line 192 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
ditto (named arg)
Done.
common/src/asan.c, line 196 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
ditto (named arg)
Done.
common/src/asan.c, line 215 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
ditto (named arg)
Done.
common/src/asan.c, line 220 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
ditto (named arg)
Done.
common/src/asan.c, line 259 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
uintptr_t
? notsize_t
?
Right, fixed.
(LLVM uses the same type, uptr
, for addresses and sizes...)
common/src/asan.c, line 267 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
What's the deal with
memcmp()
? Why is it not__asan_memcmp
?
The ASAN interface requires just these three.
Actually, I'll reorganize it so that __asan_*
are aliases, and keep them separate. That should make the code clearer.
common/src/asan.c, line 269 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Gotcha! Should be
rhs
;)
Fixed. I wonder if we'll find any more bugs with that.
LibOS/shim/test/regression/test_libos.py, line 31 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
test_020
->test_021
Done.
Pal/src/slab.c, line 81 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
This is PAL-internal memory allocation. It is different from the app+LibOS memory allocation. I wonder if we could have different
ASAN_POISON_...
macro for it? Maybe not in this PR, but in the future?
Probably we could, but sometimes LLVM generates inline poisoning code (mostly for stack). So I think it's best to treat these constants as part of ASan's interface, however badly defined it is.
Pal/src/slab.c, line 106 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
I don't understand this comment. (This case is the counterpart to the above malloc case: this is memory allocated previously from the
g_mem_pool
. Why is memory ing_mem_pool
always poisoned?)
At this point all of the memory should be poisoned by init_slab_mgr
, or by slab_free
(after freeing individual objects). But now that I look at it, it's confusing - better to have an explicit poison command here.
As for why I want unused parts of g_mem_pool
poisoned (unlike memory returned with DkVirtualMemoryFree
) instead of unpoisoning it, it's because I can: nothing other than the allocator will use it, so any access to the unused parts is a bug.
Changed to an explicit asan_poison
with ASAN_POISON_HEAP_LEFT_REDZONE
.
Pal/src/slab.c, line 130 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
I think you can mix this into just one
#if
usingdefined()
.
OK.
Pal/src/host/Linux-SGX/Makefile, line 136 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
ASAN
->ASan
clang
->Clang
Done.
Pal/src/host/Linux-SGX/sgx_main.c, line 1083 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
This carves out a memory region (enclave or untrusted) starting from 2GB and till 16TB. This means that e.g. enclaves with
sgx.non_pie = true
andsgx.enclave_size = 4G
will fail to start -- because they overlap with the shadow memory region of ASan.Are there any other constants we could use? For example, starting shadow memory from e.g. 64GB or 1TB would be no problem at all (most enclaves I've seen have enclave size of max 32GB).
Otherwise, we have to put some explanations in the docs, that some combinations of SGX enclaves and ASan cannot work together.
Yeah, I tried out sgx.enclave_size = "4G"
and it's even worse: it silently maps the enclave over the first 2GB of the shadow map.
Shouldn't we fail setting up the enclave if there is something already mapped there? I'm looking at add_pages_to_enclave
, and could use MAP_FIXED_NOREPLACE
instead of MAP_FIXED
. Although there is also the "older drivers" section which maps pages through ioctl
to the driver, not sure how it behaves...
I will try using a starting address of 1TB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 20 of 29 files reviewed, 35 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @pwmarcz)
common/include/asan.h, line 91 at r5 (raw file):
* * The address has to be provided to the compiler using `-mllvm -asan-mapping-offset=0x...`. * BEWARE: You cannot use a power of 2, because LLVM special-cases such starting addresses and emits
Ugly, isn't it? :)
Initially, I wanted this to be configurable (compile with -DASAN_SHADOW_START=0x...
), but because of this caveat, I think it's better to just hardcode the address here.
Pal/src/host/Linux-SGX/sgx_main.c, line 1083 at r2 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
Yeah, I tried out
sgx.enclave_size = "4G"
and it's even worse: it silently maps the enclave over the first 2GB of the shadow map.Shouldn't we fail setting up the enclave if there is something already mapped there? I'm looking at
add_pages_to_enclave
, and could useMAP_FIXED_NOREPLACE
instead ofMAP_FIXED
. Although there is also the "older drivers" section which maps pages throughioctl
to the driver, not sure how it behaves...I will try using a starting address of 1TB.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3, 5 of 6 files at r4, 3 of 3 files at r5, all commit messages.
Reviewable status: all files reviewed, 12 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @pwmarcz)
meson.build, line 112 at r2 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
This is from memory, but:
- Obviously the
-mllvm ...
flags are Clang-specific- Not sure if it actually supports stack sanitization, etc.
- ASan is only on the newer GCC (the one in Ubuntu 20.04). That means at least making a more specific
#ifdef
for theno_sanitize("undefined")
attribute- I think it uses only the
__asan_*
etc. callbacks; doesn't generate inline code (so it should be slower, but actually easier to support)I think it wouldn't be too hard to support GCC ASan, but without a Jenkins build it's likely to break (same as our Clang support broke multiple times before we added a Jenkins build). Since it would require at least some branches (ifdefs, conditions in Meson, etc), I don't think it's worth it, because GCC sanitizers are usually weaker anyway.
+1, doesn't seem to be worth the effort
meson.build, line 120 at r2 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
From what I understand from
libasan
code, tracking globals is (1) being able to track origin of memory (which our integration doesn't do), and (2) setting up a red zone around globals (which might be useful to catch buffer overflows).I think the main obstacle is supporting
.init
sections, because that's where the globals are registered (we would have to make sure the.init
code appears in our binaries and is being run by ELF loading code). I can try after this PR.
I see, thanks! It's definitely less important than adding support for stack buffers, but still nice to have.
common/include/asan.h, line 91 at r5 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
Ugly, isn't it? :)
Initially, I wanted this to be configurable (compile with
-DASAN_SHADOW_START=0x...
), but because of this caveat, I think it's better to just hardcode the address here.
Hmm, why OR doesn't work in our case?
Pal/src/host/Linux-SGX/sgx_main.c, line 1083 at r2 (raw file):
Shouldn't we fail setting up the enclave if there is something already mapped there?
Uh, that sounds really bad. I guess the reason to not use it was that we dropped Ubuntu 16.04 only recently, and AFAIR it was too old for it. Would you have a moment to submit a PR fixing this? You already have a test which for it :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 12 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @pwmarcz)
common/include/asan.h, line 91 at r5 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Hmm, why OR doesn't work in our case?
The formula is shadow_start | (addr >> shift)
, so shadow_start
must be high enough so that (addr >> 3)
doesn't ever have that bit set. Otherwise the result will not only be different than with addition, but also it will conflate unrelated locations.
(I actually saw this break with shadow_start == 1 TB
: ASan didn't always trigger for a buffer overflow.)
But something like shadow_start == (1 << 44)
would work... I'll try it. LLVM code says they want to use OR whenever possible because it's faster.
Pal/src/host/Linux-SGX/sgx_main.c, line 1083 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
Shouldn't we fail setting up the enclave if there is something already mapped there?
Uh, that sounds really bad. I guess the reason to not use it was that we dropped Ubuntu 16.04 only recently, and AFAIR it was too old for it. Would you have a moment to submit a PR fixing this? You already have a test which for it :)
OK, I'll submit a PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r3, 5 of 6 files at r4, 3 of 3 files at r5, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @pwmarcz)
a discussion (no related file):
@pwmarcz Could you please check why the Docs Jenkins failed?
common/include/asan.h, line 91 at r5 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
The formula is
shadow_start | (addr >> shift)
, soshadow_start
must be high enough so that(addr >> 3)
doesn't ever have that bit set. Otherwise the result will not only be different than with addition, but also it will conflate unrelated locations.(I actually saw this break with
shadow_start == 1 TB
: ASan didn't always trigger for a buffer overflow.)But something like
shadow_start == (1 << 44)
would work... I'll try it. LLVM code says they want to use OR whenever possible because it's faster.
We don't really care about performance of ASan, right? I am perfectly fine with the current 1.5TB shadow start.
Pal/src/slab.c, line 130 at r2 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
OK.
BTW, we can just remove this STATIC_SLAB
macro completely. Gramine doesn't work anyway with STATIC_SLAB = 0
.
Pal/src/host/Linux-SGX/sgx_main.c, line 1083 at r2 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
OK, I'll submit a PR.
@mkow is exactly correct, we couldn't use MAP_FIXED_NOREPLACE
on Ubuntu 16.04. @pwmarcz Please submit a PR.
fa94137
to
7e8c344
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 16 of 29 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @dimakuv, @mkow, and @pwmarcz)
a discussion (no related file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
@pwmarcz Could you please check why the Docs Jenkins failed?
It's a problem with Python module, already resolved on master. I'll rebase the branch, we had significant changes since.
meson.build, line 110 at r2 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
It seems that we'll merge that change before this one, so I'm blocking to not forget about this while rebasing.
Rebased and updated.
common/include/asan.h, line 91 at r5 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
We don't really care about performance of ASan, right? I am perfectly fine with the current 1.5TB shadow start.
It would be nice to have the tests run faster... but I just checked that, and using a power of 2 (1 << 44
) makes no significant difference for LibOS make regression
CPU time. (Either way, the "user" time is about 2x the non-ASan case).
So I think keeping 1.5 TB is better. Since LLVM happily did the wrong thing for a low power of 2, I think it's better to stay clear of that case.
I updated the comment to explain the situation better.
Pal/src/slab.c, line 130 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
BTW, we can just remove this
STATIC_SLAB
macro completely. Gramine doesn't work anyway withSTATIC_SLAB = 0
.
Good to know. Submitted in #95, and I can rebase either branch once the other one is merged.
Pal/src/host/Linux-SGX/sgx_main.c, line 1083 at r2 (raw file):
Previously, dimakuv (Dmitrii Kuvaiskii) wrote…
@mkow is exactly correct, we couldn't use
MAP_FIXED_NOREPLACE
on Ubuntu 16.04. @pwmarcz Please submit a PR.
Actually MAP_FIXED_NOREPLACE
is not available on Ubuntu 18.04 either... It uses glibc 2.27, which doesn't define it, and Linux 4.15, which doesn't have it yet.
I'll drop it for now, and also remove it from the ASan code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 13 of 13 files at r6, 1 of 1 files at r7, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @mkow and @pwmarcz)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 13 of 13 files at r6, 1 of 1 files at r7, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, "fixup! " found in commit messages' one-liners (waiting on @pwmarcz)
common/include/asan.h, line 91 at r5 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
It would be nice to have the tests run faster... but I just checked that, and using a power of 2 (
1 << 44
) makes no significant difference for LibOSmake regression
CPU time. (Either way, the "user" time is about 2x the non-ASan case).So I think keeping 1.5 TB is better. Since LLVM happily did the wrong thing for a low power of 2, I think it's better to stay clear of that case.
I updated the comment to explain the situation better.
Ah, I see. Weird and risky optimization...
+1 to the current version then.
Pal/src/host/Linux-SGX/sgx_main.c, line 1083 at r2 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
Actually
MAP_FIXED_NOREPLACE
is not available on Ubuntu 18.04 either... It uses glibc 2.27, which doesn't define it, and Linux 4.15, which doesn't have it yet.I'll drop it for now, and also remove it from the ASan code.
Too bad :(
As with UBSan, this adds our own handlers to Graphene's common library. The current version of ASan integration supports heap allocation only. Signed-off-by: Paweł Marczewski <pawel@invisiblethingslab.com>
3bb2f56
to
de42e84
Compare
Jenkins, retest Jenkins-Direct-Sanitizers please (timeout on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r8, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @pwmarcz)
meson.build, line 109 at r8 (raw file):
if asan if meson.get_compiler('c').get_id() != 'clang' or meson.get_compiler('cpp').get_id() != 'clang'
I just noticed this, shouldn't this be clang++
for C++?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @mkow)
meson.build, line 109 at r8 (raw file):
Previously, mkow (Michał Kowalczyk) wrote…
I just noticed this, shouldn't this be
clang++
for C++?
No. The "compiler ID" returned by Meson is clang
. I checked that manually, but also otherwise it would've failed Jenkins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from different teams (1 more required, approved so far: ITL)
meson.build, line 109 at r8 (raw file):
Previously, pwmarcz (Paweł Marczewski) wrote…
No. The "compiler ID" returned by Meson is
clang
. I checked that manually, but also otherwise it would've failed Jenkins.
Huh, weird. And yes, you are right, it would have failed on Jenkins :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r8, all commit messages.
Reviewable status:complete! all files reviewed, all discussions resolved
Description of the changes
As with UBSan, this adds our own handlers to Gramine's common library. The current version of ASan integration supports heap allocation only.
Tracking issue: #19 (but I don't consider this done yet, see "What's next")
How to test this PR?
Compile with Clang (
CC=clang CXX=clang++ AS=clang
) and ASan (ASAN=1
andmeson -Dasan=enabled
).Run the regression test:
graphene-direct run_test asan_buffer_overflow
This is the test code:
It should abort with the following output:
Why implement our own hooks?
An alternative was to take LLVM's libasan and try to adapt it. I decided against it because a lot of patching would be necessary to make it work under no-stdlib, and we do a lot of nonstandard things (our own malloc, thread management, etc.) I believe I would spend a lot more time for not necessarily more functionality and a pretty messy result.
A better template for what I'm doing is Linux KASAN (I also saw some attempts for FreeBSD), which uses a similar compiler mode, but implements its own hooks. We're not implementing a kernel, but we also have a no-stdlib setting.
What's next
Better source information
We have the offending location (IP register), but unfortunately nothing else (no function, file, line, etc.), that's why I included the advice to use GDB. But it would be useful if, for example, Jenkins output included at least the source location.
(Note that we have the
debug_map
structure, so we can also recover ELF file + offset).My best idea is to fork and exec
add2line
. This is messy, but I tried it out and it works: we can report source file, line, and function. We're doing it just before aborting the whole process anyway.(LLVM's libasan actually uses llvm-symbolizer which is similar to addr2line).
I guess we could also dump core and exec
gdb
? This would allow us to display the full traceback. But it's not as easy as dumping core, we need to also include information about how the ELF files are mapped.Sanitize stack
ASan can be also turned on for stack: it will detect bugs like use-after-scope, use-after-return, and overflows for buffers allocated with
alloca
. This is very helpful! I fixed a few bugs found that way (see #18).However, this also requires cleaning (unpoisoning) all stack memory that we recycle or unmap. Two problems I see with that:
There is the
shim_entry_api
for running LibOS functions from application (used mostly for glibc integration). These functions run on the user stack. I guess we should switch stacks for them as well.I'm not sure which stack(s) do we use when handling signals?
This change is![Reviewable](https://camo.githubusercontent.com/23b05f5fb48215c989e92cc44cf6512512d083132bd3daf689867c8d9d386888/68747470733a2f2f72657669657761626c652e696f2f7265766965775f627574746f6e2e737667)