Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First version of AddressSanitizer #32

Merged
merged 1 commit into from
Sep 28, 2021
Merged

First version of AddressSanitizer #32

merged 1 commit into from
Sep 28, 2021

Conversation

pwmarcz
Copy link
Contributor

@pwmarcz pwmarcz commented Sep 13, 2021

Description of the changes

As with UBSan, this adds our own handlers to Gramine's common library. The current version of ASan integration supports heap allocation only.

Tracking issue: #19 (but I don't consider this done yet, see "What's next")

How to test this PR?

Compile with Clang (CC=clang CXX=clang++ AS=clang) and ASan (ASAN=1 and meson -Dasan=enabled).

Run the regression test: graphene-direct run_test asan_buffer_overflow

This is the test code:

    uint8_t* buf = malloc(30);
    buf[30] = 1;
    free(buf);
    return 0;

It should abort with the following output:

$ graphene-direct run_test asan_buffer_overflow
error: Using insecure argv source. Gramine will continue application execution, but this configuration must not be used in production!
[P1:T1:run_test] run_test("asan_buffer_overflow") ...
[P1:T1:run_test] error: asan: heap-buffer-overflow while trying to store 1 bytes at 0x7ad88073014e
[P1:T1:run_test] error: asan: the bad address is 0x7ad88073014e (0 from beginning)
[P1:T1:run_test] error: asan: IP = 0x7ad880a0bce3 (for a full traceback, use GDB with a breakpoint at "shim_abort")
[P1:T1:run_test] error: asan:
[P1:T1:run_test] error: asan: 0x7ad88072ff00  fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[P1:T1:run_test] error: asan: 0x7ad88072ff80  fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[P1:T1:run_test] error: asan: 0x7ad880730000  fa fa fa fa fa fa fd fd fd fd fa fa fa fa fd fd
[P1:T1:run_test] error: asan: 0x7ad880730080  fd fd fa fa fa fa fd fd fd fd fa fa fa fa 00 00
[P1:T1:run_test] error: asan: 0x7ad880730100  00 00 fa fa fa fa 00 00 00[06]fa fa fa fa fa fa
[P1:T1:run_test] error: asan: 0x7ad880730180  fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[P1:T1:run_test] error: asan: 0x7ad880730200  fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[P1:T1:run_test] error: asan: 0x7ad880730280  fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[P1:T1:run_test] error: asan: 0x7ad880730300  fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
[P1:T1:run_test] error: asan:
[P1:T1:run_test] error: asan: shadow byte legend (1 shadow byte = 8 application bytes):
[P1:T1:run_test] error: asan:           addressable: 00
[P1:T1:run_test] error: asan: partially addressable: 01..07
[P1:T1:run_test] error: asan:     heap left redzone: fa
[P1:T1:run_test] error: asan:     freed heap region: fd

Why implement our own hooks?

An alternative was to take LLVM's libasan and try to adapt it. I decided against it because a lot of patching would be necessary to make it work under no-stdlib, and we do a lot of nonstandard things (our own malloc, thread management, etc.) I believe I would spend a lot more time for not necessarily more functionality and a pretty messy result.

A better template for what I'm doing is Linux KASAN (I also saw some attempts for FreeBSD), which uses a similar compiler mode, but implements its own hooks. We're not implementing a kernel, but we also have a no-stdlib setting.

What's next

Better source information

We have the offending location (IP register), but unfortunately nothing else (no function, file, line, etc.), that's why I included the advice to use GDB. But it would be useful if, for example, Jenkins output included at least the source location.

(Note that we have the debug_map structure, so we can also recover ELF file + offset).

My best idea is to fork and exec add2line. This is messy, but I tried it out and it works: we can report source file, line, and function. We're doing it just before aborting the whole process anyway.

(LLVM's libasan actually uses llvm-symbolizer which is similar to addr2line).

I guess we could also dump core and exec gdb? This would allow us to display the full traceback. But it's not as easy as dumping core, we need to also include information about how the ELF files are mapped.

Sanitize stack

ASan can be also turned on for stack: it will detect bugs like use-after-scope, use-after-return, and overflows for buffers allocated with alloca. This is very helpful! I fixed a few bugs found that way (see #18).

However, this also requires cleaning (unpoisoning) all stack memory that we recycle or unmap. Two problems I see with that:

  • There is the shim_entry_api for running LibOS functions from application (used mostly for glibc integration). These functions run on the user stack. I guess we should switch stacks for them as well.

  • I'm not sure which stack(s) do we use when handling signals?


This change is Reviewable

Copy link
Contributor

@boryspoplawski boryspoplawski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 24 files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: ITL)

a discussion (no related file):
I did not review this PR (since it's WIP), but please post bug fixes as a separate PR (once you're done).


@pwmarcz pwmarcz force-pushed the pawel/asan-start branch 4 times, most recently from c2af8d4 to 78269e5 Compare September 14, 2021 17:36
@pwmarcz pwmarcz changed the title [WIP] ASan First version of AddressSanitizer Sep 14, 2021
@pwmarcz pwmarcz marked this pull request as ready for review September 14, 2021 17:36
@mkow mkow mentioned this pull request Sep 20, 2021
7 tasks
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 24 files at r1, 18 of 18 files at r2, all commit messages.
Reviewable status: all files reviewed, 25 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @pwmarcz)

a discussion (no related file):

Tracking issue: #19 (but I don't consider this done yet, see "What's next")

FYI: I tweaked the checklist there to reflect the actual implementation order.


a discussion (no related file):

My best idea is to fork and exec add2line. This is messy, but I tried it out and it works: we can report source file, line, and function. We're doing it just before aborting the whole process anyway.

Quite hacky, but fine with me, it's just test code. And if in the future it turns out that we really need the full traceback then we can implement the GDB version.


a discussion (no related file):

However, this also requires cleaning (unpoisoning) all stack memory that we recycle or unmap. Two problems I see with that:

  • There is the shim_entry_api for running LibOS functions from application (used mostly for glibc integration). These functions run on the user stack. I guess we should switch stacks for them as well.
  • I'm not sure which stack(s) do we use when handling signals?

This is something for @boryspoplawski to answer.



meson.build, line 110 at r2 (raw file):

if asan
    # TODO: When we start compiling C++ code with Meson, check for C++ compiler as well

It seems that we'll merge that change before this one, so I'm blocking to not forget about this while rebasing.


meson.build, line 120 at r2 (raw file):

        '-mllvm', '-asan-use-after-return=0',
        '-mllvm', '-asan-stack=0',
        '-mllvm', '-asan-globals=0',

What about globals? Is it also hard to support? You didn't mention them in the PR description.


.ci/linux-sgx-sanitizers.jenkinsfile, line 16 at r2 (raw file):

        load '.ci/lib/config-debug.jenkinsfile'
        load '.ci/lib/config-ubsan.jenkinsfile'
        load '.ci/lib/config-asan.jenkinsfile'

What about linux-direct-sanitizers.jenkinsfile? Doesn't it need a similar change?


common/include/api.h, line 233 at r2 (raw file):

void* __memset_chk(void* dest, int ch, size_t count, size_t dest_count);

/* Original version of functions that ASan overrides */

version -> versions


common/include/asan.h, line 9 at r2 (raw file):

 * This file defines functions for address sanitization (ASan).
 *
 * Normally, code compiled with ASAN is linked against a special library (libasan), but that library

ASAN -> ASan


common/src/asan.c, line 1 at r2 (raw file):

missing license and copyright info


common/src/asan.c, line 10 at r2 (raw file):

#endif

#define RETURN_ADDR __builtin_extract_return_addr(__builtin_return_address(0))

This should probably be parenthesized? Also, I'd prefer to add empty parenthesis to the macro so that it doesn't look like a constant.


common/src/asan.c, line 64 at r2 (raw file):

        if (asan_check(bad_addr))
            break;

Just for sanity, could you assert asan_check(bad_addr) here? It's on the slow path, so it won't matter performance-wise and we unfortunately don't have for-else in C.


common/src/asan.c, line 125 at r2 (raw file):

__attribute_no_sanitize_address
static void asan_report(void* ip_addr, uintptr_t addr, size_t size, bool is_load) {

no need for this empty line


common/src/asan.c, line 131 at r2 (raw file):

    log_error("asan: %s while trying to %s %lu bytes at 0x%lx", bug_type,
              is_load ? "load" : "store", size, addr);

Could we replace "bytes" with "%s" and use singular version for single-byte stores? Not blocking, but the reports look really nice and this was the only thing which didn't look so nicely :)


common/src/asan.c, line 132 at r2 (raw file):

from beginning

This was a bit confusing to me, I had to look at the code to understand beginning of what this is. Maybe something like "beginning of access" or "beginning of memory access" would be better?


common/src/asan.c, line 168 at r2 (raw file):

}

#define ASAN_LOAD(addr, size)          \

please align this backslash


common/src/asan.c, line 171 at r2 (raw file):

    do {                                                \
        if (asan_check_region(addr, size)) {            \
            asan_report(RETURN_ADDR, addr, size, true); \

please use named arg for the bool one


common/src/asan.c, line 176 at r2 (raw file):

    } while(0)

#define ASAN_STORE(addr, size)          \

ditto


common/src/asan.c, line 179 at r2 (raw file):

    do {                                                 \
        if (asan_check_region(addr, size)) {             \
            asan_report(RETURN_ADDR, addr, size, false); \

ditto (named arg)


common/src/asan.c, line 192 at r2 (raw file):

    }                                                          \
    void __asan_report_load##size(uintptr_t addr) {            \
        asan_report(RETURN_ADDR, addr, size, true);            \

ditto (named arg)


common/src/asan.c, line 196 at r2 (raw file):

    }                                                          \
    void __asan_report_store##size(uintptr_t addr) {           \
        asan_report(RETURN_ADDR, addr, size, false);           \

ditto (named arg)


common/src/asan.c, line 215 at r2 (raw file):

void __asan_report_load_n(uintptr_t addr, size_t size) {
    asan_report(RETURN_ADDR, addr, size, true);

ditto (named arg)


common/src/asan.c, line 220 at r2 (raw file):

void __asan_report_store_n(uintptr_t addr, size_t size) {
    asan_report(RETURN_ADDR, addr, size, false);

ditto (named arg)


common/src/asan.c, line 259 at r2 (raw file):

__attribute__((alias("__asan_memmove")))
void* memmove(void*, const void*, uintptr_t);

uintptr_t? not size_t?


common/src/asan.c, line 269 at r2 (raw file):

int memcmp(const void* lhs, const void* rhs, size_t count) {
    ASAN_LOAD((uintptr_t)lhs, count);
    ASAN_LOAD((uintptr_t)lhs, count);

Gotcha! Should be rhs ;)


LibOS/shim/test/regression/test_libos.py, line 31 at r2 (raw file):

    @unittest.skipUnless(os.environ.get('ASAN') == '1', 'test only enabled with ASAN=1')
    def test_020_asan(self):

test_020 -> test_021


Pal/src/slab.c, line 130 at r2 (raw file):

        INIT_FAIL(PAL_ERROR_NOMEM, "cannot initialize slab manager");
#if STATIC_SLAB == 1
#ifdef ASAN

I think you can mix this into just one #if using defined().


Pal/src/host/Linux-SGX/Makefile, line 136 at r2 (raw file):

	$(call cmd,cpp_s_S)

# Add `-Wunused-command-line-argument`, because with ASAN, clang complains that LLVM flags

ASAN -> ASan
clang -> Clang

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 24 files at r1, 14 of 18 files at r2, all commit messages.
Reviewable status: all files reviewed, 35 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @pwmarcz)

a discussion (no related file):
One of the most impressive PRs I've seen!



meson.build, line 112 at r2 (raw file):

    # TODO: When we start compiling C++ code with Meson, check for C++ compiler as well
    if meson.get_compiler('c').get_id() != 'clang'
        error('ASan is currently supported only for Clang (CC=clang)')

What is the exact reason we only support ASan on Clang?


common/include/asan.h, line 95 at r2 (raw file):

#define ASAN_SHADOW_START 0x7fff8000
#define ASAN_SHADOW_SHIFT 3
#define ASAN_SHADOW_LENGTH (1ULL << 44)

How does this work? We start our shadow memory at 2GB (0x7fff8000) and this shadow extends over 16TBs (1ULL << 44)??

I guess the idea is that it works for the first 2GB of code/data (because executable is typically placed there) and also for libs/stack (because they are typically allocated from bits 47-48)? But how can this work for e.g. Linux-SGX?


common/include/asan.h, line 123 at r2 (raw file):

 * Load/store callbacks:
 *
 * - `load` / `store`: check if an memory under given address is accessible; if not, report the

an memory -> memory


common/src/asan.c, line 40 at r2 (raw file):

    _real_memset(shadow_ptr, 0, shadow_size);
    if (right_part_size)
        *(shadow_ptr + shadow_size) = right_part_size;

What is this magic with the right part of the memory-region-to-unpoison?

UPDATE: Looks like this shows us how many bytes are allocated in this 8-byte "end of object" (for non-8-16-24-32-...-byte-sized objects).


common/src/asan.c, line 68 at r2 (raw file):

    /* If this is a partial right redzone, check the next byte */
    if (*shadow_ptr < 0x80)
        shadow_ptr++;

I don't get it. Why is it guaranteed that after an 8-byte partially allocated object, there will be a redzone (== shadow memory contains the description of what kind of redzone)?

Is this because you added that padding[8] to any allocated memory object (memobj), so there is always at least 8 bytes in-between allocations? And these 8 bytes are filled with ASAN_POISON_HEAP_LEFT_REDZONE at the host-allocation PAL level?


common/src/asan.c, line 267 at r2 (raw file):

}

int memcmp(const void* lhs, const void* rhs, size_t count) {

What's the deal with memcmp()? Why is it not __asan_memcmp?


Pal/src/slab.c, line 81 at r2 (raw file):

    }
#ifdef ASAN
    asan_poison_region((uintptr_t)addr, ALLOC_ALIGN_UP(size), ASAN_POISON_HEAP_LEFT_REDZONE);

This is PAL-internal memory allocation. It is different from the app+LibOS memory allocation. I wonder if we could have different ASAN_POISON_... macro for it? Maybe not in this PR, but in the future?


Pal/src/slab.c, line 106 at r2 (raw file):

        }
        /* not a last object from low/high addresses, can't do anything about this case */
        /* ASAN: keep the memory poisoned, because we're not returning it to the system */

I don't understand this comment. (This case is the counterpart to the above malloc case: this is memory allocated previously from the g_mem_pool. Why is memory in g_mem_pool always poisoned?)


Pal/src/host/Linux-SGX/sgx_main.c, line 1083 at r2 (raw file):

    int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED_NOREPLACE;
    void* addr = (void*)DO_SYSCALL(mmap, (void*)ASAN_SHADOW_START, ASAN_SHADOW_LENGTH, prot, flags,
                                   /*fd=*/-1, /*offset=*/0);

This carves out a memory region (enclave or untrusted) starting from 2GB and till 16TB. This means that e.g. enclaves with sgx.non_pie = true and sgx.enclave_size = 4G will fail to start -- because they overlap with the shadow memory region of ASan.

Are there any other constants we could use? For example, starting shadow memory from e.g. 64GB or 1TB would be no problem at all (most enclaves I've seen have enclave size of max 32GB).

Otherwise, we have to put some explanations in the docs, that some combinations of SGX enclaves and ASan cannot work together.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 35 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @dimakuv and @pwmarcz)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

One of the most impressive PRs I've seen!

+1, great work!



common/include/asan.h, line 95 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

How does this work? We start our shadow memory at 2GB (0x7fff8000) and this shadow extends over 16TBs (1ULL << 44)??

I guess the idea is that it works for the first 2GB of code/data (because executable is typically placed there) and also for libs/stack (because they are typically allocated from bits 47-48)? But how can this work for e.g. Linux-SGX?

This relies on Linux' lazy allocation of physical memory. Note that on SGX we allocate this memory outside the enclave and then just use it from inside - this is only a debugging feature, so this is perfectly fine.


common/src/asan.c, line 40 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

What is this magic with the right part of the memory-region-to-unpoison?

UPDATE: Looks like this shows us how many bytes are allocated in this 8-byte "end of object" (for non-8-16-24-32-...-byte-sized objects).

Yup. There's a nice explanation in asan.h for this - "A low value (01..07) means only the first N bytes are accessible.".


common/src/asan.c, line 68 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I don't get it. Why is it guaranteed that after an 8-byte partially allocated object, there will be a redzone (== shadow memory contains the description of what kind of redzone)?

Is this because you added that padding[8] to any allocated memory object (memobj), so there is always at least 8 bytes in-between allocations? And these 8 bytes are filled with ASAN_POISON_HEAP_LEFT_REDZONE at the host-allocation PAL level?

I think this is just a heuristic - we hit an invalid access for a byte we don't have info about, so we just try to take it from the next one, hoping that it's the same. But let's wait for Paweł's answer.

Copy link
Contributor Author

@pwmarcz pwmarcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 22 of 29 files reviewed, 34 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @pwmarcz)

a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

My best idea is to fork and exec add2line. This is messy, but I tried it out and it works: we can report source file, line, and function. We're doing it just before aborting the whole process anyway.

Quite hacky, but fine with me, it's just test code. And if in the future it turns out that we really need the full traceback then we can implement the GDB version.

Let's do it after this PR then; I have some proof-of-concept lying around but there might be some discussions about how to make it less messy.


a discussion (no related file):

Previously, mkow (Michał Kowalczyk) wrote…

+1, great work!

Thanks!



meson.build, line 112 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

What is the exact reason we only support ASan on Clang?

This is from memory, but:

  • Obviously the -mllvm ... flags are Clang-specific
  • Not sure if it actually supports stack sanitization, etc.
  • ASan is only on the newer GCC (the one in Ubuntu 20.04). That means at least making a more specific #ifdef for the no_sanitize("undefined") attribute
  • I think it uses only the __asan_* etc. callbacks; doesn't generate inline code (so it should be slower, but actually easier to support)

I think it wouldn't be too hard to support GCC ASan, but without a Jenkins build it's likely to break (same as our Clang support broke multiple times before we added a Jenkins build). Since it would require at least some branches (ifdefs, conditions in Meson, etc), I don't think it's worth it, because GCC sanitizers are usually weaker anyway.


meson.build, line 120 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

What about globals? Is it also hard to support? You didn't mention them in the PR description.

From what I understand from libasan code, tracking globals is (1) being able to track origin of memory (which our integration doesn't do), and (2) setting up a red zone around globals (which might be useful to catch buffer overflows).

I think the main obstacle is supporting .init sections, because that's where the globals are registered (we would have to make sure the .init code appears in our binaries and is being run by ELF loading code). I can try after this PR.


.ci/linux-sgx-sanitizers.jenkinsfile, line 16 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

What about linux-direct-sanitizers.jenkinsfile? Doesn't it need a similar change?

Good catch! Added, I hope this doesn't break Jenkins.


common/include/api.h, line 233 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

version -> versions

Done.


common/include/asan.h, line 9 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ASAN -> ASan

Done.


common/include/asan.h, line 95 at r2 (raw file):

How does this work? We start our shadow memory at 2GB (0x7fff8000) and this shadow extends over 16TBs (1ULL << 44)??

Yes.

I guess the idea is that it works for the first 2GB of code/data (because executable is typically placed there) and also for libs/stack (because they are typically allocated from bits 47-48)?

Yeah, looks like LLVM is betting on the fact that this area (2GB +16TB) is usually unmapped... I don't know how reliable that assumption is.

But how can this work for e.g. Linux-SGX?

Good point, it's going to break for larger enclaves. Let's continue in the thread for sgx_main.c.


common/include/asan.h, line 123 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

an memory -> memory

Fixed.


common/src/asan.c, line 1 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

missing license and copyright info

Added.


common/src/asan.c, line 10 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

This should probably be parenthesized? Also, I'd prefer to add empty parenthesis to the macro so that it doesn't look like a constant.

Done.


common/src/asan.c, line 64 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Just for sanity, could you assert asan_check(bad_addr) here? It's on the slow path, so it won't matter performance-wise and we unfortunately don't have for-else in C.

Done.


common/src/asan.c, line 68 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I think this is just a heuristic - we hit an invalid access for a byte we don't have info about, so we just try to take it from the next one, hoping that it's the same. But let's wait for Paweł's answer.

Yes, this is just a heuristic, taken from LLVM. I think in rare cases it might fail (e.g. we're right at the end of a heap page, so the next shadow byte might be 0).


common/src/asan.c, line 125 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

no need for this empty line

Fixed.


common/src/asan.c, line 131 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Could we replace "bytes" with "%s" and use singular version for single-byte stores? Not blocking, but the reports look really nice and this was the only thing which didn't look so nicely :)

Done.


common/src/asan.c, line 132 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

from beginning

This was a bit confusing to me, I had to look at the code to understand beginning of what this is. Maybe something like "beginning of access" or "beginning of memory access" would be better?

Changed to "from beginning of access".


common/src/asan.c, line 168 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

please align this backslash

Fixed.


common/src/asan.c, line 171 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

please use named arg for the bool one

Done.


common/src/asan.c, line 176 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto

Done.


common/src/asan.c, line 179 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto (named arg)

Done.


common/src/asan.c, line 192 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto (named arg)

Done.


common/src/asan.c, line 196 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto (named arg)

Done.


common/src/asan.c, line 215 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto (named arg)

Done.


common/src/asan.c, line 220 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ditto (named arg)

Done.


common/src/asan.c, line 259 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

uintptr_t? not size_t?

Right, fixed.

(LLVM uses the same type, uptr, for addresses and sizes...)


common/src/asan.c, line 267 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

What's the deal with memcmp()? Why is it not __asan_memcmp?

The ASAN interface requires just these three.

Actually, I'll reorganize it so that __asan_* are aliases, and keep them separate. That should make the code clearer.


common/src/asan.c, line 269 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Gotcha! Should be rhs ;)

Fixed. I wonder if we'll find any more bugs with that.


LibOS/shim/test/regression/test_libos.py, line 31 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

test_020 -> test_021

Done.


Pal/src/slab.c, line 81 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This is PAL-internal memory allocation. It is different from the app+LibOS memory allocation. I wonder if we could have different ASAN_POISON_... macro for it? Maybe not in this PR, but in the future?

Probably we could, but sometimes LLVM generates inline poisoning code (mostly for stack). So I think it's best to treat these constants as part of ASan's interface, however badly defined it is.


Pal/src/slab.c, line 106 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

I don't understand this comment. (This case is the counterpart to the above malloc case: this is memory allocated previously from the g_mem_pool. Why is memory in g_mem_pool always poisoned?)

At this point all of the memory should be poisoned by init_slab_mgr, or by slab_free (after freeing individual objects). But now that I look at it, it's confusing - better to have an explicit poison command here.

As for why I want unused parts of g_mem_pool poisoned (unlike memory returned with DkVirtualMemoryFree) instead of unpoisoning it, it's because I can: nothing other than the allocator will use it, so any access to the unused parts is a bug.

Changed to an explicit asan_poison with ASAN_POISON_HEAP_LEFT_REDZONE.


Pal/src/slab.c, line 130 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I think you can mix this into just one #if using defined().

OK.


Pal/src/host/Linux-SGX/Makefile, line 136 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

ASAN -> ASan
clang -> Clang

Done.


Pal/src/host/Linux-SGX/sgx_main.c, line 1083 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This carves out a memory region (enclave or untrusted) starting from 2GB and till 16TB. This means that e.g. enclaves with sgx.non_pie = true and sgx.enclave_size = 4G will fail to start -- because they overlap with the shadow memory region of ASan.

Are there any other constants we could use? For example, starting shadow memory from e.g. 64GB or 1TB would be no problem at all (most enclaves I've seen have enclave size of max 32GB).

Otherwise, we have to put some explanations in the docs, that some combinations of SGX enclaves and ASan cannot work together.

Yeah, I tried out sgx.enclave_size = "4G" and it's even worse: it silently maps the enclave over the first 2GB of the shadow map.

Shouldn't we fail setting up the enclave if there is something already mapped there? I'm looking at add_pages_to_enclave, and could use MAP_FIXED_NOREPLACE instead of MAP_FIXED. Although there is also the "older drivers" section which maps pages through ioctl to the driver, not sure how it behaves...

I will try using a starting address of 1TB.

Copy link
Contributor Author

@pwmarcz pwmarcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 20 of 29 files reviewed, 35 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv, @mkow, and @pwmarcz)


common/include/asan.h, line 91 at r5 (raw file):

 *
 * The address has to be provided to the compiler using `-mllvm -asan-mapping-offset=0x...`.
 * BEWARE: You cannot use a power of 2, because LLVM special-cases such starting addresses and emits

Ugly, isn't it? :)

Initially, I wanted this to be configurable (compile with -DASAN_SHADOW_START=0x...), but because of this caveat, I think it's better to just hardcode the address here.


Pal/src/host/Linux-SGX/sgx_main.c, line 1083 at r2 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

Yeah, I tried out sgx.enclave_size = "4G" and it's even worse: it silently maps the enclave over the first 2GB of the shadow map.

Shouldn't we fail setting up the enclave if there is something already mapped there? I'm looking at add_pages_to_enclave, and could use MAP_FIXED_NOREPLACE instead of MAP_FIXED. Although there is also the "older drivers" section which maps pages through ioctl to the driver, not sure how it behaves...

I will try using a starting address of 1TB.

Done.

mkow
mkow previously approved these changes Sep 25, 2021
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3, 5 of 6 files at r4, 3 of 3 files at r5, all commit messages.
Reviewable status: all files reviewed, 12 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @pwmarcz)


meson.build, line 112 at r2 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

This is from memory, but:

  • Obviously the -mllvm ... flags are Clang-specific
  • Not sure if it actually supports stack sanitization, etc.
  • ASan is only on the newer GCC (the one in Ubuntu 20.04). That means at least making a more specific #ifdef for the no_sanitize("undefined") attribute
  • I think it uses only the __asan_* etc. callbacks; doesn't generate inline code (so it should be slower, but actually easier to support)

I think it wouldn't be too hard to support GCC ASan, but without a Jenkins build it's likely to break (same as our Clang support broke multiple times before we added a Jenkins build). Since it would require at least some branches (ifdefs, conditions in Meson, etc), I don't think it's worth it, because GCC sanitizers are usually weaker anyway.

+1, doesn't seem to be worth the effort


meson.build, line 120 at r2 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

From what I understand from libasan code, tracking globals is (1) being able to track origin of memory (which our integration doesn't do), and (2) setting up a red zone around globals (which might be useful to catch buffer overflows).

I think the main obstacle is supporting .init sections, because that's where the globals are registered (we would have to make sure the .init code appears in our binaries and is being run by ELF loading code). I can try after this PR.

I see, thanks! It's definitely less important than adding support for stack buffers, but still nice to have.


common/include/asan.h, line 91 at r5 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

Ugly, isn't it? :)

Initially, I wanted this to be configurable (compile with -DASAN_SHADOW_START=0x...), but because of this caveat, I think it's better to just hardcode the address here.

Hmm, why OR doesn't work in our case?


Pal/src/host/Linux-SGX/sgx_main.c, line 1083 at r2 (raw file):

Shouldn't we fail setting up the enclave if there is something already mapped there?

Uh, that sounds really bad. I guess the reason to not use it was that we dropped Ubuntu 16.04 only recently, and AFAIR it was too old for it. Would you have a moment to submit a PR fixing this? You already have a test which for it :)

Copy link
Contributor Author

@pwmarcz pwmarcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 12 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @pwmarcz)


common/include/asan.h, line 91 at r5 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Hmm, why OR doesn't work in our case?

The formula is shadow_start | (addr >> shift), so shadow_start must be high enough so that (addr >> 3) doesn't ever have that bit set. Otherwise the result will not only be different than with addition, but also it will conflate unrelated locations.

(I actually saw this break with shadow_start == 1 TB: ASan didn't always trigger for a buffer overflow.)

But something like shadow_start == (1 << 44) would work... I'll try it. LLVM code says they want to use OR whenever possible because it's faster.


Pal/src/host/Linux-SGX/sgx_main.c, line 1083 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Shouldn't we fail setting up the enclave if there is something already mapped there?

Uh, that sounds really bad. I guess the reason to not use it was that we dropped Ubuntu 16.04 only recently, and AFAIR it was too old for it. Would you have a moment to submit a PR fixing this? You already have a test which for it :)

OK, I'll submit a PR.

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r3, 5 of 6 files at r4, 3 of 3 files at r5, all commit messages.
Reviewable status: all files reviewed, 5 unresolved discussions, not enough approvals from different teams (1 more required, approved so far: ITL), "fixup! " found in commit messages' one-liners (waiting on @pwmarcz)

a discussion (no related file):
@pwmarcz Could you please check why the Docs Jenkins failed?



common/include/asan.h, line 91 at r5 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

The formula is shadow_start | (addr >> shift), so shadow_start must be high enough so that (addr >> 3) doesn't ever have that bit set. Otherwise the result will not only be different than with addition, but also it will conflate unrelated locations.

(I actually saw this break with shadow_start == 1 TB: ASan didn't always trigger for a buffer overflow.)

But something like shadow_start == (1 << 44) would work... I'll try it. LLVM code says they want to use OR whenever possible because it's faster.

We don't really care about performance of ASan, right? I am perfectly fine with the current 1.5TB shadow start.


Pal/src/slab.c, line 130 at r2 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

OK.

BTW, we can just remove this STATIC_SLAB macro completely. Gramine doesn't work anyway with STATIC_SLAB = 0.


Pal/src/host/Linux-SGX/sgx_main.c, line 1083 at r2 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

OK, I'll submit a PR.

@mkow is exactly correct, we couldn't use MAP_FIXED_NOREPLACE on Ubuntu 16.04. @pwmarcz Please submit a PR.

Copy link
Contributor Author

@pwmarcz pwmarcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 16 of 29 files reviewed, 3 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @dimakuv, @mkow, and @pwmarcz)

a discussion (no related file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

@pwmarcz Could you please check why the Docs Jenkins failed?

It's a problem with Python module, already resolved on master. I'll rebase the branch, we had significant changes since.



meson.build, line 110 at r2 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

It seems that we'll merge that change before this one, so I'm blocking to not forget about this while rebasing.

Rebased and updated.


common/include/asan.h, line 91 at r5 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

We don't really care about performance of ASan, right? I am perfectly fine with the current 1.5TB shadow start.

It would be nice to have the tests run faster... but I just checked that, and using a power of 2 (1 << 44) makes no significant difference for LibOS make regression CPU time. (Either way, the "user" time is about 2x the non-ASan case).

So I think keeping 1.5 TB is better. Since LLVM happily did the wrong thing for a low power of 2, I think it's better to stay clear of that case.

I updated the comment to explain the situation better.


Pal/src/slab.c, line 130 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

BTW, we can just remove this STATIC_SLAB macro completely. Gramine doesn't work anyway with STATIC_SLAB = 0.

Good to know. Submitted in #95, and I can rebase either branch once the other one is merged.


Pal/src/host/Linux-SGX/sgx_main.c, line 1083 at r2 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

@mkow is exactly correct, we couldn't use MAP_FIXED_NOREPLACE on Ubuntu 16.04. @pwmarcz Please submit a PR.

Actually MAP_FIXED_NOREPLACE is not available on Ubuntu 18.04 either... It uses glibc 2.27, which doesn't define it, and Linux 4.15, which doesn't have it yet.

I'll drop it for now, and also remove it from the ASan code.

dimakuv
dimakuv previously approved these changes Sep 27, 2021
Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 13 of 13 files at r6, 1 of 1 files at r7, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @mkow and @pwmarcz)

mkow
mkow previously approved these changes Sep 28, 2021
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 13 of 13 files at r6, 1 of 1 files at r7, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, "fixup! " found in commit messages' one-liners (waiting on @pwmarcz)


common/include/asan.h, line 91 at r5 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

It would be nice to have the tests run faster... but I just checked that, and using a power of 2 (1 << 44) makes no significant difference for LibOS make regression CPU time. (Either way, the "user" time is about 2x the non-ASan case).

So I think keeping 1.5 TB is better. Since LLVM happily did the wrong thing for a low power of 2, I think it's better to stay clear of that case.

I updated the comment to explain the situation better.

Ah, I see. Weird and risky optimization...
+1 to the current version then.


Pal/src/host/Linux-SGX/sgx_main.c, line 1083 at r2 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

Actually MAP_FIXED_NOREPLACE is not available on Ubuntu 18.04 either... It uses glibc 2.27, which doesn't define it, and Linux 4.15, which doesn't have it yet.

I'll drop it for now, and also remove it from the ASan code.

Too bad :(

As with UBSan, this adds our own handlers to Graphene's common library.
The current version of ASan integration supports heap allocation only.

Signed-off-by: Paweł Marczewski <pawel@invisiblethingslab.com>
@pwmarcz
Copy link
Contributor Author

pwmarcz commented Sep 28, 2021

Jenkins, retest Jenkins-Direct-Sanitizers please (timeout on fcntl14 and sendfile09 LTP tests)

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r8, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @pwmarcz)


meson.build, line 109 at r8 (raw file):

if asan
    if meson.get_compiler('c').get_id() != 'clang' or meson.get_compiler('cpp').get_id() != 'clang'

I just noticed this, shouldn't this be clang++ for C++?

Copy link
Contributor Author

@pwmarcz pwmarcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: ITL) (waiting on @mkow)


meson.build, line 109 at r8 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

I just noticed this, shouldn't this be clang++ for C++?

No. The "compiler ID" returned by Meson is clang. I checked that manually, but also otherwise it would've failed Jenkins.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, all discussions resolved, not enough approvals from different teams (1 more required, approved so far: ITL)


meson.build, line 109 at r8 (raw file):

Previously, pwmarcz (Paweł Marczewski) wrote…

No. The "compiler ID" returned by Meson is clang. I checked that manually, but also otherwise it would've failed Jenkins.

Huh, weird. And yes, you are right, it would have failed on Jenkins :)

Copy link
Contributor

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r8, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants