Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack misalignment on AVX-512 with asan + detect_stack_use_after_return=0 #91565

Open
glandium opened this issue May 9, 2024 · 2 comments
Open
Labels

Comments

@glandium
Copy link
Contributor

glandium commented May 9, 2024

Self-sufficient reproducer:

void qux(int& y);
struct alignas(64) Foo {
    int a[16] = {};
};
void hoge(Foo x);
void foo() {
    int x, y, z;
    qux(x);
    qux(y);
    qux(z);
    hoge(Foo());
}

Call the above foo.cc, and add the following as main.cc for an executable reproducer:

extern "C" const char *__asan_default_options() {
    return "detect_stack_use_after_return=0";
}

void qux(int& y) {}

struct alignas(64) Foo {
    int a[16] = {};
};

void hoge(Foo x) {}

void foo();

int main() {
    foo();
    return 0;
}

Compile with clang++ -o foo foo.cc main.cc -fsanitize=address -O2 -mavx512f -g. Running the program crashes:

Program received signal SIGSEGV, Segmentation fault.
0x0000559c458fd95c in foo () at foo.cc:14
14          hoge(Foo());
(gdb) disassemble $rip,$rip+1
Dump of assembler code from 0x559c458fd95c to 0x559c458fd95d:
=> 0x0000559c458fd95c <_Z3foov+236>:    vmovaps %zmm0,(%rsp)
End of assembler dump.
(gdb) p/x $rsp
$1 = 0x7ffc7c389ce0

vmovaps with zmm registers requires 64-bytes alignment but rsp does not have enough alignment.

The relevant parts of the assembly code path:

        and     rsp, -64    // function prologue aligns the stack to 64-bytes and
        sub     rsp, 192    // resizes it in a 64-bytes-aligned way
(...)
        mov     r14, rsp
        add     r14, -96    // this is an alloca in the function prologue, triggered
                            // when __asan_stack_malloc_1 returns 0, which it does when
                            // detect_stack_use_after_return=0 or when it's out of fake
                            // stack.
        and     r14, -32    // align the stack to... 32-bytes.
        mov     rsp, r14
(...)
        sub     rsp, 64     // move the stack for argument passing. The stack is still
                            // 32-bytes aligned at this point.
(...)
        vmovaps zmmword ptr [rsp], zmm0  // copy argument to the stack, aka CRASH.

This started with 51fbab1 (#77210) and -mllvm -asan-use-stack-safety=0 fixes it.

This all comes down to the asan pass setting the alloca to 32-bytes alignment:

  %MyAlloca = alloca i8, i64 96, align 32

With -mllvm -asan-use-stack-safety=0, the alloca becomes:

  %MyAlloca = alloca i8, i64 224, align 64

It's worth noting that it's only 32 because that's the default value of asan-realign-stack. With -mllvm -asan-realign-stack=1 it gets down to 16.

Essentially, this kind of worked by chance before because the stack poisoner was doing the argument passing alloca too, so MyAlloca was sized and aligned for it. But now that's not the case anymore with stack-safety, MyAlloca doesn't contain and is not aligned for that alloca.

The function call to hoge has a proper align 64 annotation on the byval. I guess one way to look at the problem is that the function call should take that into account and realign the stack pointer. In theory, I guess this could happen in other scenarios than ASan, but I couldn't get it to happen using an actual alloca() call in C/C++. The alloca for the call to hoge always ends up before the manual alloca() in the IR in my attempts.

Another way to look at it is that MyAlloca should have extra alignment based on all the allocas, including the "non interesting" ones, but that sounds hackish.

Cc: @MaskRay

@glandium
Copy link
Contributor Author

glandium commented May 9, 2024

Actually, I just had an epiphany and ended up reproducing with alloca():
foo.cc:

#include <alloca.h>
void qux(void*);
struct alignas(64) Foo {
    int a[16] = {};
};
void hoge(Foo x);
void foo(int a) {
    void *x = a ? alloca(12) : nullptr;
    qux(x);
    hoge(Foo());
}

main.c:

void qux(void*) {}
struct alignas(64) Foo {
    int a[16] = {};
};
void hoge(Foo x) {}
void foo(int);
int main() {
    foo(1);
    return 0;
}

clang++ -o foo foo.cc main.cc -O2 -mavx512f

Program received signal SIGSEGV, Segmentation fault.
0x000055a0d0965168 in foo (a=a@entry=1) at foo.cc:14
14          hoge(Foo());
(gdb) disassemble $rip,$rip+1
Dump of assembler code from 0x55a0d0965168 to 0x55a0d0965169:
=> 0x000055a0d0965168 <_Z3fooi+56>:     vmovaps %zmm0,(%rsp)
End of assembler dump.
(gdb) p/x $rsp
$1 = 0x7fffdf60c7f0

GCC properly aligns the stack.

@llvmbot
Copy link
Collaborator

llvmbot commented May 9, 2024

@llvm/issue-subscribers-backend-x86

Author: Mike Hommey (glandium)

Self-sufficient reproducer: ``` void qux(int& y); struct alignas(64) Foo { int a[16] = {}; }; void hoge(Foo x); void foo() { int x, y, z; qux(x); qux(y); qux(z); hoge(Foo()); } ``` Call the above `foo.cc`, and add the following as `main.cc` for an executable reproducer: ``` extern "C" const char *__asan_default_options() { return "detect_stack_use_after_return=0"; }

void qux(int& y) {}

struct alignas(64) Foo {
int a[16] = {};
};

void hoge(Foo x) {}

void foo();

int main() {
foo();
return 0;
}

Compile with `clang++ -o foo foo.cc main.cc -fsanitize=address -O2 -mavx512f -g`. Running the program crashes:

Program received signal SIGSEGV, Segmentation fault.
0x0000559c458fd95c in foo () at foo.cc:14
14 hoge(Foo());
(gdb) disassemble $rip,$rip+1
Dump of assembler code from 0x559c458fd95c to 0x559c458fd95d:
=> 0x0000559c458fd95c <_Z3foov+236>: vmovaps %zmm0,(%rsp)
End of assembler dump.
(gdb) p/x $rsp
$1 = 0x7ffc7c389ce0


vmovaps with zmm registers requires 64-bytes alignment but rsp does not have enough alignment.

The relevant parts of the assembly code path:
    and     rsp, -64    // function prologue aligns the stack to 64-bytes and
    sub     rsp, 192    // resizes it in a 64-bytes-aligned way

(...)
mov r14, rsp
add r14, -96 // this is an alloca in the function prologue, triggered
// when __asan_stack_malloc_1 returns 0, which it does when
// detect_stack_use_after_return=0 or when it's out of fake
// stack.
and r14, -32 // align the stack to... 32-bytes.
mov rsp, r14
(...)
sub rsp, 64 // move the stack for argument passing. The stack is still
// 32-bytes aligned at this point.
(...)
vmovaps zmmword ptr [rsp], zmm0 // copy argument to the stack, aka CRASH.


This started with 51fbab134560ece663517bf1e8c2a30300d08f1a (#<!-- -->77210) and `-mllvm -asan-use-stack-safety=0` fixes it.

This all comes down to the asan pass setting the alloca to 32-bytes alignment:

%MyAlloca = alloca i8, i64 96, align 32


With `-mllvm -asan-use-stack-safety=0`, the alloca becomes:

%MyAlloca = alloca i8, i64 224, align 64


It's worth noting that it's only 32 because that's the default value of asan-realign-stack. With `-mllvm -asan-realign-stack=1` it gets down to 16.

Essentially, this kind of worked by chance before because the stack poisoner was doing the argument passing alloca too, so MyAlloca was sized and aligned for it. But now that's not the case anymore with stack-safety, MyAlloca doesn't contain and is not aligned for that alloca.

The function call to `hoge` has a proper `align 64` annotation on the `byval`. I guess one way to look at the problem is that the function call should take that into account and realign the stack pointer. In theory, I guess this could happen in other scenarios than ASan, but I couldn't get it to happen using an actual `alloca()` call in C/C++. The alloca for the call to `hoge` always ends up before the manual `alloca()` in the IR in my attempts.

Another way to look at it is that MyAlloca should have extra alignment based on all the allocas, including the "non interesting" ones, but that sounds hackish.

Cc: @<!-- -->MaskRay 
</details>

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue May 9, 2024
Whatever issues there were with detect_stack_use_after_return during the
clang trunk cycle for clang 15 seem to be gone.

On the other hand, changes in clang 18 trigger a bug[1] that causes stack
misalignment in AVX-512 code when detect_stack_use_after_return is
disabled.

1. llvm/llvm-project#91565

Differential Revision: https://phabricator.services.mozilla.com/D209907
ErichDonGubler pushed a commit to ErichDonGubler/firefox that referenced this issue May 10, 2024
Whatever issues there were with detect_stack_use_after_return during the
clang trunk cycle for clang 15 seem to be gone.

On the other hand, changes in clang 18 trigger a bug[1] that causes stack
misalignment in AVX-512 code when detect_stack_use_after_return is
disabled.

1. llvm/llvm-project#91565

Differential Revision: https://phabricator.services.mozilla.com/D209907
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue May 14, 2024
Whatever issues there were with detect_stack_use_after_return during the
clang trunk cycle for clang 15 seem to be gone.

On the other hand, changes in clang 18 trigger a bug[1] that causes stack
misalignment in AVX-512 code when detect_stack_use_after_return is
disabled.

1. llvm/llvm-project#91565

Differential Revision: https://phabricator.services.mozilla.com/D209907
jamienicol pushed a commit to jamienicol/gecko that referenced this issue May 14, 2024
Whatever issues there were with detect_stack_use_after_return during the
clang trunk cycle for clang 15 seem to be gone.

On the other hand, changes in clang 18 trigger a bug[1] that causes stack
misalignment in AVX-512 code when detect_stack_use_after_return is
disabled.

1. llvm/llvm-project#91565

Differential Revision: https://phabricator.services.mozilla.com/D209907
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants