-fsanitize=address causes huge runtime slowdown from std::rethrow_exception not called #64190

Open
ecatmur opened this issue Jul 28, 2023 · 12 comments

@ecatmur

ecatmur commented Jul 28, 2023

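#include <exception>
std::exception_ptr p;  // stays null; it is never actually rethrown
// Must be a separate, non-static function for the slowdown to appear.
void f() {
  // An int is thrown, so the catch(char) handler never matches and
  // std::rethrow_exception is never executed.
  try { throw 1; } catch(char) { std::rethrow_exception(p); }
}
int main() {
  for (int i = 0; i != 100000; ++i)
    try { f(); } catch (...) { }
}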

Compiled with -fsanitize=address (and at -O0 through -O3, for gcc), this is roughly 30x slower under gcc 13 than under gcc 12 (4.7s vs 0.15s on my Core i7 3 GHz).

On godbolt, with both clang/libc++ and clang/libstdc++, the problem appears between clang 14.0.0 and clang 15.0.0. At -O1 clang appears capable of enough elision to reduce the impact somewhat (so it is only ~15x slower) and at -O3 to eliminate the bug entirely.

Initially reported to gcc at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110835 but they pointed out that it appears to be a bug in compiler-rt/asan, not in gcc/clang codegen.

Note that std::rethrow_exception() is never called, but its presence is still essential to exhibit the bug. Also, f needs to be a separate function (and not static). At low optimization levels it can be an IIFE (an immediately invoked lambda).
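
For illustration, here is a rough sketch of what the IIFE form mentioned above might look like. The wrapper name `count_caught_iife` and the returned counter are hypothetical additions so that the loop has an observable result; whether this form actually shows the slowdown depends on the compiler and optimization level, as noted above.

```cpp
#include <exception>

std::exception_ptr p;  // stays null; rethrow_exception(p) is never reached

// Hypothetical wrapper: runs the throw/catch loop with the body of f
// written as an immediately invoked lambda, and returns how many
// exceptions were caught.
int count_caught_iife(int iterations) {
  int caught = 0;
  for (int i = 0; i != iterations; ++i) {
    try {
      // The int thrown here does not match catch(char), so it escapes
      // the lambda and is handled by the outer catch(...).
      [] {
        try { throw 1; } catch (char) { std::rethrow_exception(p); }
      }();
    } catch (...) {
      ++caught;
    }
  }
  return caught;
}
```

Calling `count_caught_iife(100000)` from `main` corresponds to the loop in the original reproducer.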

@ecatmur
Author

ecatmur commented Jul 28, 2023

Motivation is https://github.com/boostorg/exception/blob/b039b4ea18ef752d0c1684b3f715ce493b778060/include/boost/exception/detail/exception_ptr.hpp#L550 ; the half-reduced code is:

#include <boost/exception_ptr.hpp>
struct S {};
int main() {
    auto ep = boost::copy_exception(S());
    for (int i = 0; i != 100000; ++i)
        try { boost::rethrow_exception(ep); } catch (...) {}
}

@13steinj

Bisected to 4b4437c (found initial bad commit from bisecting gcc)

[redacted]:~/src/llvm-project ((4b4437c084e2...)|BISECTING)$ git bisect log
git bisect start
# bad: [0a1bcab9f3bf75c4c5d3e53bafb3eeb80320af46] tsan: fix deadlock in libbacktrace
git bisect bad 0a1bcab9f3bf75c4c5d3e53bafb3eeb80320af46
# good: [329fda39c507e8740978d10458451dcdb21563be] NFC: Mention auto-vec support for SVE in release notes.
git bisect good 329fda39c507e8740978d10458451dcdb21563be
# good: [db01b123d012df2f0e6acf7e90bf4ba63382587c] [flang] Lower PAUSE statement
git bisect good db01b123d012df2f0e6acf7e90bf4ba63382587c
# good: [75c1d9155472e343034da88e35e1c29f8142adc7] [mlir][SCF] Implement RegionBranchOpInterface on ExecuteRegionOp
git bisect good 75c1d9155472e343034da88e35e1c29f8142adc7
# good: [a88e8374db3d6a0ede8a456cbfe6d5ffdc5ae8f9] [SVE] Add more gather/scatter tests to highlight bugs in their generated code.
git bisect good a88e8374db3d6a0ede8a456cbfe6d5ffdc5ae8f9
# good: [1dfe0273fda3972662bd979de3c216155b18f6ed] [OpenMP] Add explicit triple to linker wrapper test
git bisect good 1dfe0273fda3972662bd979de3c216155b18f6ed
# bad: [957ada4164ddb5110a0a5b231fb8e4ac494f5d39] [AArch64][NFC] Deleted llvm/test/Analysis/CostModel/AArch64/splat-load.ll test
git bisect bad 957ada4164ddb5110a0a5b231fb8e4ac494f5d39
# good: [8dbc6b560055ff5068ff45b550482ba62c36b5a5] Revert "[randstruct] Check final randomized layout ordering"
git bisect good 8dbc6b560055ff5068ff45b550482ba62c36b5a5
# bad: [34312f1f0c4f56ae78577783ec62bee3fb5dab90] [mlir][LLVM] Support opaque pointers in data layout entries
git bisect bad 34312f1f0c4f56ae78577783ec62bee3fb5dab90
# bad: [f1dbf8e4ada7761ba296400acbf0190e6e203dc6] [flang][runtime] Fix edge-case FP input bugs
git bisect bad f1dbf8e4ada7761ba296400acbf0190e6e203dc6
# good: [5dd99f71aa733387ccfc43298b18ef1b20613f55] [RISCV] transform MI to W variant to remove sext.w
git bisect good 5dd99f71aa733387ccfc43298b18ef1b20613f55
# good: [ce3bb82e4503fca385824f309f72444f6a960f37] [LICM] Add test for writeonly fn with noalias call.
git bisect good ce3bb82e4503fca385824f309f72444f6a960f37
# good: [20a9fb953e46b1d97aaee7b182b0f3d48f340bd1] [Clang][OpenMP] Fix the issue that temp cubin files are not removed after compilation when using new OpenMP driver
git bisect good 20a9fb953e46b1d97aaee7b182b0f3d48f340bd1
# bad: [4b4437c084e2b8a2643e97e7aef125c438635a4d] [asan] Enable detect_stack_use_after_return=1 by default
git bisect bad 4b4437c084e2b8a2643e97e7aef125c438635a4d
# good: [8ed2bd1e746567ab82e138896db340a6a6781511] [mlir][LLVM] Fix `DataLayoutTypeInterface` for opqaue pointers with non-default address space
git bisect good 8ed2bd1e746567ab82e138896db340a6a6781511
# good: [debfb96be62b43a9373b6a7478b5c4a87f8f7af4] llvm-reduce: Fix cloning unset maxCallFrameSize
git bisect good debfb96be62b43a9373b6a7478b5c4a87f8f7af4
# first bad commit: [4b4437c084e2b8a2643e97e7aef125c438635a4d] [asan] Enable detect_stack_use_after_return=1 by default

ecatmur referenced this issue Dec 12, 2023
By default -fsanitize=address already compiles with this check,
why not use it.
For compatibility it can be disabled with env ASAN_OPTIONS=detect_stack_use_after_return=0.

Reviewed By: eugenis, kda, #sanitizers, hans

Differential Revision: https://reviews.llvm.org/D124057
@vitalybuka
Collaborator

vitalybuka commented Dec 12, 2023

Could you try the HEAD of LLVM?
More precisely 9be8892 and 515c435 can improve exceptions performance.

@ecatmur
Author

ecatmur commented Dec 18, 2023

> Could you try the HEAD of LLVM? More precisely 9be8892 and 515c435 can improve exceptions performance.

I'm not seeing any improvement; rather the opposite:
15.0.0 6.382s
16.0.0 7.742s
18.0.0git 5ac1295 9.014s

@vitalybuka
Collaborator

Assuming the example is quite artificial, fixing it has less value for most users than detect_stack_use_after_return.
So I don't think we need to change the default.

However, it is worth investigating; maybe we can improve performance here easily.
I will assign this to myself, but can't promise to work on it soon.

vitalybuka self-assigned this Dec 18, 2023
@Mistuke

Mistuke commented Dec 18, 2023

In case you need a bigger non-artificial example perhaps https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112981 is better?

@taalhaataahir0102
Contributor

Hi @vitalybuka! Did you work on this? I'm new to LLVM and planning to work on it, and I was hoping you could give me some general advice or guidance on how to get started on solving this problem. I'd be grateful for the assistance. Thanks in advance!

@vitalybuka
Collaborator

I didn't look into that.

I guess you need to start by building LLVM and reproducing the issue with the freshly built LLVM.
After that, either profiling or plain printf debugging can be used to identify what takes so long.

@taalhaataahir0102
Contributor

Thanks. I'll let you know if I get somewhere or run into any issues :)

@taalhaataahir0102
Contributor

@vitalybuka Hi! I've tried reproducing the issue but am having some trouble.
I'm using callgrind to measure cycles and time to measure wall-clock time. Here are the results at -O0:
Commands:

valgrind --tool=callgrind ./clang++ -O0 -fsanitize=address -o flag example.cpp
time ./clang++ -O0 -fsanitize=address -o flag example.cpp

Screenshot from 2024-02-19 14-07-42

I used two programs: the example given above and an unrelated example. For both I see nearly the same results: the same cycle counts reported by callgrind and a similar slowdown from -fsanitize=address, around 2x. I built the latest clang version, 19. My configuration command was:
cmake -DLLVM_ENABLE_PROJECTS="clang;llvm" -DLLVM_ENABLE_RUNTIMES=compiler-rt -DCMAKE_BUILD_TYPE=Release -G Ninja ../llvm-project-main/llvm
I see the same slowdown with clang-14 (installed using apt install). Maybe I'm misunderstanding the issue, since it reports a 30x/15x slowdown at -O0 on different clang versions. Could you please guide me? 😕

@13steinj

@vitalybuka the issue isn't the compile time to generate the object file / executable, but the runtime of the executable.
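
To make that distinction concrete, one option is to time the loop from inside the program, so that compile time cannot leak into the measurement. The following is only a sketch: `time_throw_loop` and its `caught` out-parameter are hypothetical names added for illustration, and the rest mirrors the reproducer from the first comment.

```cpp
#include <chrono>
#include <exception>

std::exception_ptr p;  // stays null; rethrow_exception(p) is never reached

// Same shape as the reproducer: a separate, non-static function.
void f() {
  try { throw 1; } catch (char) { std::rethrow_exception(p); }
}

// Hypothetical helper: runs the throw/catch loop and returns the elapsed
// wall-clock time in milliseconds. `caught` receives the number of
// exceptions that were caught.
double time_throw_loop(int iterations, int& caught) {
  caught = 0;
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i != iterations; ++i) {
    try { f(); } catch (...) { ++caught; }
  }
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Building this once with and once without -fsanitize=address, then printing the result of `time_throw_loop(100000, caught)` from `main` in each binary, compares the runtime of the executables rather than the time clang++ takes to produce them.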

@knjmooney

> In case you need a bigger non-artificial example perhaps https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112981 is better?

This slowdown disappears when I export ASAN_OPTIONS=detect_stack_use_after_return=0
