Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot catch exceptions on Apple M1 #49036

Open
hahnjo opened this issue Mar 23, 2021 · 29 comments
Open

Cannot catch exceptions on Apple M1 #49036

hahnjo opened this issue Mar 23, 2021 · 29 comments
Labels

Comments

@hahnjo
Copy link
Member

hahnjo commented Mar 23, 2021

Bugzilla Link 49692
Version trunk
OS MacOS X
CC @AlexDenisov,@Axel-Naumann,@lhames,@vgvassilev

Extended Description

On arm64-apple-darwin20.3.0, consider the following code:

 % cat test.cc
int main() {
  try {
    throw 1;
  } catch (...) { }
}

When compiled into an executable (by either Apple clang or Clang trunk), the program exits normally:

 % clang++ -O2 throw.cc -o throw
 % ./throw
 % echo $?
0

However, the same code crashes when compiling into LLVM IR and executing via lli:

 % clang++ -O2 -S -emit-llvm throw.cc
 % lli throw.ll
libc++abi.dylib: terminating with uncaught exception of type int

I've tried all combinations of --jit-kind and --jit-link, none of them worked.

@hahnjo
Copy link
Member Author

hahnjo commented Mar 23, 2021

Part of the problem may be related to "T.isOSDarwin() && .getArch() == Triple::aarch64" defaulting to emit only compact unwinding info without __eh_frame. I tried forcing LLVM to generate that section with

diff --git a/llvm/lib/MC/MCObjectFileInfo.cpp b/llvm/lib/MC/MCObjectFileInfo.cpp
index 155104cddda2..6b045bafdf59 100644
--- a/llvm/lib/MC/MCObjectFileInfo.cpp
+++ b/llvm/lib/MC/MCObjectFileInfo.cpp
@@ -56,9 +56,11 @@ void MCObjectFileInfo::initMachOMCObjectFileInfo(const Triple &T) {
           MachO::S_ATTR_STRIP_STATIC_SYMS | MachO::S_ATTR_LIVE_SUPPORT,
       SectionKind::getReadOnly());

+#if 0
   if (T.isOSDarwin() &&
       (T.getArch() == Triple::aarch64 || T.getArch() == Triple::aarch64_32))
     SupportsCompactUnwindWithoutEHFrame = true;
+#endif

   if (T.isWatchABI())
     OmitDwarfIfHaveCompactUnwind = true;

and afterwards I see the section being handed over to RTDyldMemoryManager::registerEHFramesInProcess, but it still fails to catch the exception.

Unfortunately, I haven't succeeded in building libunwind with debug symbols and trace through the unwinding to see what's going wrong: It didn't even work to inject a custom libunwind (neither from LLVM trunk nor the patched version from https://opensource.apple.com/source/libunwind/) into a compiled executable, that results in no exceptions being caught either. It appears Apple did some changes to the exception handling ABI compared to x86_64, we probably need some of their engineers to help us out here...

@Axel-Naumann
Copy link
Member

@​lhames any recommendation whom to contact here? The fact that providing a custom-built libunwind is a dead end and that lli doesn't get this right either means we really don't know what to do here... Do you know of EH ABI specialists for Apple's AArch64?

@lhames
Copy link
Contributor

lhames commented Mar 25, 2021

Hi Axel,

This one is on me -- I'll look into supporting compact-unwind, but won't have time to get to it for a couple of weeks.

How urgent is this for you, and are you using ORCv2? Ideally I'll just implement this in JITLink, but that won't help if you're on MCJIT or ORCv1.

@Axel-Naumann
Copy link
Member

Hey Lang,

See all of that is already a wealth of information! :-) Tells us that here we don't need to hunt our own bug, for once.

We're currently on llvm9's ORCv1; I hope we will be on ORCv2 by the end of the year.

The current state is pretty painful for M1 users... but it's not mission critical, our particles will continue to collide. And I can totally relate if you do this for ORCv2 only. If all fails on our side we might still be able to learn from what you do for JITLink and try to hack this into llvm9's ORCv1.

Thank you so much for your reply! Looking forward to your fixes in a couple of weeks, even if in JITLink!

Axel.

@hahnjo
Copy link
Member Author

hahnjo commented Mar 26, 2021

Hi Lang,

This one is on me -- I'll look into supporting compact-unwind, but won't have time to get to it for a couple of weeks.

So darwinarm64 is compact unwinding only? Or would there be a chance to get __eh_frame working, even if not the default for the platform? Do you have technical information with respect to the "new" unwinding ABI?

Thanks,
Jonas

@lhames
Copy link
Contributor

lhames commented Mar 26, 2021

Hi Jonas,

So darwinarm64 is compact unwinding only? Or would there be a chance to get __eh_frame working, even if not the default for the platform? Do you have technical information with respect to the "new" unwinding ABI?

I haven't had time to dig in to the details yet. My understanding is that arm64 prefers to emit compact unwind records (they're smaller than eh_frame records), but can not do so in all cases and falls back to eh_frame records where necessary. There is probably an option to force the compiler to emit eh_frame records for all functions, but I have not looked for it yet. It's also possible that JITLink's current eh-frame support is not sufficient for arm64 yet -- I'm not sure it uses the same pointer encodings as x86-64. In any case using compact_unwind is the eventual goal, because we would like to be able to load objects and archives that weren't compiled specifically for the JIT.

The best sources for compact_unwind that I'm aware of are the apple compact_unwind_encoding.h header [1], the source for libunwind [2], the source for ld64 [3].

-- Lang.

[1] https://opensource.apple.com/source/libunwind/libunwind-35.1/include/mach-o/compact_unwind_encoding.h.auto.html
[2] https://github.com/llvm/llvm-project/tree/main/libunwind
[3] https://opensource.apple.com/source/ld64/

@hahnjo
Copy link
Member Author

hahnjo commented Mar 26, 2021

Ok, thanks for the pointers. As written in comment 1, forcing emission of __eh_frame doesn't help, and I haven't succeeded in catching exceptions using a custom version of libunwind either (even not the one from Apple). So there must have been some changes, and neither JITlink nor RuntimeDyld are currently prepared to handle them.
Of course, we need support for compact unwinding in the long run to load arbitrary libraries, but being at least able to catch exceptions of our own libraries (where we can control the compilation options) would be great in the short term. I might have another look into the pages you linked, but I'm not super confident.

Jonas

@hahnjo
Copy link
Member Author

hahnjo commented Mar 31, 2021

hacky patch for libunwind
Some updates on this: I finally succeeded in catching exceptions (of a compiled program) with a custom libunwind. The missing piece was pointer authentication, which is (in my limited understanding) a tag added into the higher bits of a pointer value, for example in the link register and return addresses. For unknown reasons, this is not dealt with in available source code of libunwind (including the one from opensource.apple.com), but fairly easy to hack together for experimentation - I'm attaching a patch for future reference.

With that in place, I was able to run through the unwinding with lldb and found that the program addresses don't match the registered FDE. Disabling the call to RuntimeDyldMachOCRTPBase::processFDE in RuntimeDyldMachOCRTPBase::registerEHFrames fixes that and I'm able to catch exceptions thrown from JITed code - hooray! At least with my self-compiled libunwind, the system-provided library crashes somewhere in _Unwind_SetIP -> unw_set_reg before execution of the landing pad :-(

@hahnjo
Copy link
Member Author

hahnjo commented Apr 12, 2021

I poked some more and I think I see where the crash in the system libunwind comes from: From the accesses I see it appears that unw_set_reg loads a flag or something from the beginning of the MachO image that contains the exception handler. This works fine for compiled executables, but is set to the NULL pointer for dynamically registered FDEs:

if (getInfoFromFdeCie(fdeInfo, cieInfo, pc, 0))
(the last parameter is dso_base and written into the extra field, which is documented as "mach_header of mach-o image containing func").

As this doesn't happen in LLVM's libunwind nor in the version available on opensource.apple.com, I guess there are more modifications Apple did and there's not much we can do as "users" of the system. Lang, could you help us get this message to the right team at Apple so they can fix their library?

Thanks,
Jonas

@hahnjo
Copy link
Member Author

hahnjo commented Jun 24, 2021

Quick reminder about this issue: I can still reproduce the original problem ("libc++abi: terminating with uncaught exception of type int") with latest main and some hacky changes make "lli --jit-kind=mcjit throw.ll" crash in libunwind as described before. I didn't test my hacky patch to libunwind again, but I would expect it to "work" as before.

diff --git a/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldMachO.cpp b/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldMachO.cpp
index 9ca76602ea18..50a3add0c207 100644
--- a/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldMachO.cpp
+++ b/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldMachO.cpp
@@ -324,6 +324,7 @@ void RuntimeDyldMachOCRTPBase<Impl>::registerEHFrames() {
       continue;
     SectionEntry *Text = &Sections[SectionInfo.TextSID];
     SectionEntry *EHFrame = &Sections[SectionInfo.EHFrameSID];
+#if 0
     SectionEntry *ExceptTab = nullptr;
     if (SectionInfo.ExceptTabSID != RTDYLD_INVALID_SECTION_ID)
       ExceptTab = &Sections[SectionInfo.ExceptTabSID];
@@ -338,6 +339,7 @@ void RuntimeDyldMachOCRTPBase<Impl>::registerEHFrames() {
     while (P != End) {
       P = processFDE(P, DeltaForText, DeltaForEH);
     }
+#endif
 
     MemMgr.registerEHFrames(EHFrame->getAddress(), EHFrame->getLoadAddress(),
                             EHFrame->getSize());
diff --git a/llvm/lib/MC/MCObjectFileInfo.cpp b/llvm/lib/MC/MCObjectFileInfo.cpp
index 1a448f040b3b..60f27aae36c6 100644
--- a/llvm/lib/MC/MCObjectFileInfo.cpp
+++ b/llvm/lib/MC/MCObjectFileInfo.cpp
@@ -57,9 +57,11 @@ void MCObjectFileInfo::initMachOMCObjectFileInfo(const Triple &T) {
           MachO::S_ATTR_STRIP_STATIC_SYMS | MachO::S_ATTR_LIVE_SUPPORT,
       SectionKind::getReadOnly());
 
+#if 0
   if (T.isOSDarwin() &&
       (T.getArch() == Triple::aarch64 || T.getArch() == Triple::aarch64_32))
     SupportsCompactUnwindWithoutEHFrame = true;
+#endif
 
   if (T.isWatchABI())
     OmitDwarfIfHaveCompactUnwind = true;

@lhames
Copy link
Contributor

lhames commented Jun 30, 2021

Hi Jonas,

I recently ok'd a patch that seems like it matches your suggested fix: https://reviews.llvm.org/D103052

It hasn't landed yet, but I expect to commit it in the next couple of days.

I'll be focusing more attention on arm64 support over the next few months, so I'm hoping we'll see significant improvements in exception handling on that architecture.

@hahnjo
Copy link
Member Author

hahnjo commented Jun 30, 2021

Hi Lang,

yes, that patch fixes one third of the problem, namely not processing FDEs on AArch64. https://reviews.llvm.org/D103052#2778324 mentions the second problem of forcing emission of .eh_frame sections that I hacked via

diff --git a/llvm/lib/MC/MCObjectFileInfo.cpp b/llvm/lib/MC/MCObjectFileInfo.cpp
index 1a448f040b3b..60f27aae36c6 100644
--- a/llvm/lib/MC/MCObjectFileInfo.cpp
+++ b/llvm/lib/MC/MCObjectFileInfo.cpp
@@ -57,9 +57,11 @@ void MCObjectFileInfo::initMachOMCObjectFileInfo(const Triple &T) {
           MachO::S_ATTR_STRIP_STATIC_SYMS | MachO::S_ATTR_LIVE_SUPPORT,
       SectionKind::getReadOnly());
 
+#if 0
   if (T.isOSDarwin() &&
       (T.getArch() == Triple::aarch64 || T.getArch() == Triple::aarch64_32))
     SupportsCompactUnwindWithoutEHFrame = true;
+#endif
 
   if (T.isWatchABI())
     OmitDwarfIfHaveCompactUnwind = true;

However, the real blocker (as far as I can tell) is that libunwind distributed with macOS keeps crashing once you have all that in place. See comments 8 and 9 for more details, and as mentioned in comment 10 I'm still able to reproduce the issue. Unfortunately, as mentioned before, this is not something we can fix since Apple didn't even open-source the modifications to libunwind that provoke the crash...

Jonas

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
@hahnjo
Copy link
Member Author

hahnjo commented Mar 11, 2022

Still broken in the same way, the hacks mentioned in #49036 (comment) still provoke the crash in libunwind's unw_set_reg.

@lhames any updates on this?

@inumanag
Copy link

Hi @hahnjo,

It seems that Apple has released libunwind source code (https://github.com/apple-oss-distributions/libunwind/). Would this codebase help with this issue?

Thanks!

@hahnjo
Copy link
Member Author

hahnjo commented May 27, 2022

Hi @inumanag, thanks for sharing. Unfortunately, these are the same versions as already available from https://opensource.apple.com/source/libunwind/, which, as mentioned in a previous comment, does not have all modifications that Apple actually ships. Most importantly, it doesn't have the code in unw_set_reg that crashes.

@arshajii
Copy link

Thanks for your work on this @hahnjo -- I was wondering if you ever found a satisfactory solution with a modified libunwind? I tried combining your patches from #49036 (comment) and #49036 (comment), but to no avail, at least on LLVM 15. Were there any other modifications that had to be made?

@hahnjo
Copy link
Member Author

hahnjo commented Sep 27, 2022

Hi @arshajii, never "satisfactory" but I remember that at some point I managed to catch an exception in lli (#49036 (comment)). This requires both the "hacky" patch to libunwind and disabling some parts in LLVM. I haven't tested this recently, so you're saying it doesn't work anymore? If so, I may take a look but probably not this week...

@arshajii
Copy link

Hi @hahnjo: I managed to get it working with JITLink (whereas before I was testing with RuntimeDyld), although I'm still testing to make sure everything works. I couldn't get it working with RuntimeDyld. The steps for JITLink, at least with LLVM 15, are:

@vgvassilev
Copy link
Contributor

@msneubauer ran some tests on OSX 13 and it seems this issue is fixed. Thanks a lot, Mark! Here is what he ran:

cat test_exceptions.C
void test_exceptions() {
  try {
    std::cout << "got here\n";
    throw 1;
  } catch (...) { }
}
wirelessprv-10-193-242-21:tmp msn$ root.exe -l -b -q -e '.x test_exceptions.C'

got here

This is based on a source build of the root_v6.26.06.source.tar.gz tarball.

cc: @hahnjo, @lhames

@arshajii
Copy link

arshajii commented Nov 3, 2022

I can confirm on my end too that OSX 13 is working (on LLVM 15 w/ JIT Link). 🥳

@lhames
Copy link
Contributor

lhames commented Nov 4, 2022

Before we break out the party hats I should warn that the current "everything just works" state is temporary: We plan to introduce a new registration function in libunwind that will be required in future updates in order for JIT'd exception handling to work. This registration function will be used by libunwind to determine the MachO subarchitecture of JIT'd frames. I'll aim to provide an update soon on the details.

@lhames
Copy link
Contributor

lhames commented Jan 20, 2023

It's taken a while but I have a review for a proposed long-term solution up at https://reviews.llvm.org/D142176.

If/when that change is accepted I'll update this issue with instructions on how to add registration API calls to your app to ensure that JIT'd exceptions work in future releases.

@hahnjo
Copy link
Member Author

hahnjo commented Jan 20, 2023

If/when that change is accepted I'll update this issue with instructions on how to add registration API calls to your app to ensure that JIT'd exceptions work in future releases.

Just to be sure: The callback should be implemented by the JIT infrastructure, which should also call the registration API, right? Why would we do that from the app?

@lhames
Copy link
Contributor

lhames commented Jan 20, 2023

Just to be sure: The callback should be implemented by the JIT infrastructure, which should also call the registration API, right? Why would we do that from the app?

I'll include exact details once the libunwind patch is accepted, but in short: The callback and registration support will be added to the ORC runtime (a prototype is already in the LLVM mainline), but clients will need to manually register a callback until/unless they pick up a new ORC runtime.

Registration will be required in the near future, so the safe path will be to implement the manual callback immediately, and then remove it once you're able to pick up an ORC runtime that includes the fix.

@lhames
Copy link
Contributor

lhames commented Feb 20, 2023

Ok https://reviews.llvm.org/D142176 has landed. If you're using LLVM top-of-tree (LLVM 17) and the ORC runtime then DWARF exceptions should Just Work, and compact-unwind is on the way. In that case you should not use the code below, as it will shadow registration code that's already in the ORC runtime.

If you're using MCJIT, ORC without the ORC runtime, or an earlier LLVM version then you'll want to add the following code to
your app/library.

// Define a minimal mach header for JIT'd code.
static MachO::mach_header_64 fake_mach_header = {
  .magic = MachO::MH_MAGIC_64,
  .cputype = MachO::CPU_TYPE_ARM64,
  .cpusubtype = MachO::CPU_SUBTYPE_ARM64_ALL,
  .filetype = MachO::MH_DYLIB,
  .ncmds = 0,
  .sizeofcmds = 0,
  .flags = 0,
  .reserved = 0
};

// Declare libunwind SPI types and functions.
struct unw_dynamic_unwind_sections {
  uintptr_t dso_base;
  uintptr_t dwarf_section;
  size_t     dwarf_section_length;
  uintptr_t compact_unwind_section;
  size_t     compact_unwind_section_length;
};

int find_dynamic_unwind_sections(
    uintptr_t addr, unw_dynamic_unwind_sections *info) {
  info->dso_base = (uintptr_t)&fake_mach_header;
  info->dwarf_section = 0;
  info->dwarf_section_length = 0;
  info->compact_unwind_section = 0;
  info->compact_unwind_section_length = 0;
  return 1;
}

// Typedef for callback above.
typedef int (*unw_find_dynamic_unwind_sections)(
    uintptr_t addr, struct unw_dynamic_unwind_sections *info);

with those declarations in place you can use the following code to check for and use the new registration functions:

// At program / library load time:
if (auto *unw_add_find_dynamic_unwind_sections =
      (int (*)(unw_find_dynamic_unwind_sections find_dynamic_unwind_sections))
      dlsym(RTLD_DEFAULT, "__unw_add_find_dynamic_unwind_sections"))
  unw_add_find_dynamic_unwind_sections(find_dynamic_unwind_sections);

// At program / library unload time:
if (auto *unw_remove_find_dynamic_unwind_sections =
      (int (*)(unw_find_dynamic_unwind_sections find_dynamic_unwind_sections))
      dlsym(RTLD_DEFAULT, "__unw_remove_find_dynamic_unwind_sections"))
  unw_remove_find_dynamic_unwind_sections(find_dynamic_unwind_sections);

Libunwind will call the find_dynamic_unwind_sections callback, which unconditionally returns fake_mach_header as the header for all JIT'd code. Libunwind will query the cpu subtype of field of the header to determine how pointer-signing should be handled for JIT'd frames.

@lhames
Copy link
Contributor

lhames commented Apr 30, 2024

@williamhCode This bug is about JIT'd exception handling only.

@vgvassilev
Copy link
Contributor

Ok https://reviews.llvm.org/D142176 has landed. If you're using LLVM top-of-tree (LLVM 17) and the ORC runtime then DWARF exceptions should Just Work, and compact-unwind is on the way. In that case you should not use the code below, as it will shadow registration code that's already in the ORC runtime.

If you're using MCJIT, ORC without the ORC runtime, or an earlier LLVM version then you'll want to add the following code to your app/library.

// Define a minimal mach header for JIT'd code.
static MachO::mach_header_64 fake_mach_header = {
  .magic = MachO::MH_MAGIC_64,
  .cputype = MachO::CPU_TYPE_ARM64,
  .cpusubtype = MachO::CPU_SUBTYPE_ARM64_ALL,
  .filetype = MachO::MH_DYLIB,
  .ncmds = 0,
  .sizeofcmds = 0,
  .flags = 0,
  .reserved = 0
};

// Declare libunwind SPI types and functions.
struct unw_dynamic_unwind_sections {
  uintptr_t dso_base;
  uintptr_t dwarf_section;
  size_t     dwarf_section_length;
  uintptr_t compact_unwind_section;
  size_t     compact_unwind_section_length;
};

int find_dynamic_unwind_sections(
    uintptr_t addr, unw_dynamic_unwind_sections *info) {
  info->dso_base = (uintptr_t)&fake_mach_header;
  info->dwarf_section = 0;
  info->dwarf_section_length = 0;
  info->compact_unwind_section = 0;
  info->compact_unwind_section_length = 0;
  return 1;
}

// Typedef for callback above.
typedef int (*unw_find_dynamic_unwind_sections)(
    uintptr_t addr, struct unw_dynamic_unwind_sections *info);

with those declarations in place you can use the following code to check for and use the new registration functions:

// At program / library load time:
if (auto *unw_add_find_dynamic_unwind_sections =
      (int (*)(unw_find_dynamic_unwind_sections find_dynamic_unwind_sections))
      dlsym(RTLD_DEFAULT, "__unw_add_find_dynamic_unwind_sections"))
  unw_add_find_dynamic_unwind_sections(find_dynamic_unwind_sections);

// At program / library unload time:
if (auto *unw_remove_find_dynamic_unwind_sections =
      (int (*)(unw_find_dynamic_unwind_sections find_dynamic_unwind_sections))
      dlsym(RTLD_DEFAULT, "__unw_remove_find_dynamic_unwind_sections"))
  unw_remove_find_dynamic_unwind_sections(find_dynamic_unwind_sections);

Libunwind will call the find_dynamic_unwind_sections callback, which unconditionally returns fake_mach_header as the header for all JIT'd code. Libunwind will query the cpu subtype of field of the header to determine how pointer-signing should be handled for JIT'd frames.

@lhames, would this be appropriate to live in clang/lib/Interpreter, for example? Or generally in the OrcJit infrastructure so that we do not need to put the burden on all clients?

@llvmbot
Copy link

llvmbot commented May 1, 2024

@llvm/issue-subscribers-orcjit

Author: Jonas Hahnfeld (hahnjo)

| | | | --- | --- | | Bugzilla Link | [49692](https://llvm.org/bz49692) | | Version | trunk | | OS | MacOS X | | CC | @AlexDenisov,@Axel-Naumann,@lhames,@vgvassilev |

Extended Description

On arm64-apple-darwin20.3.0, consider the following code:

 % cat test.cc
int main() {
  try {
    throw 1;
  } catch (...) { }
}

When compiled into an executable (by either Apple clang or Clang trunk), the program exits normally:

 % clang++ -O2 throw.cc -o throw
 % ./throw
 % echo $?
0

However, the same code crashes when compiling into LLVM IR and executing via lli:

 % clang++ -O2 -S -emit-llvm throw.cc
 % lli throw.ll
libc++abi.dylib: terminating with uncaught exception of type int

I've tried all combinations of --jit-kind and --jit-link, none of them worked.

@lhames
Copy link
Contributor

lhames commented May 8, 2024

@lhames, would this be appropriate to live in clang/lib/Interpreter, for example? Or generally in the OrcJit infrastructure so that we do not need to put the burden on all clients?

It depends on registration order -- if we embed a copy in clang/lib/Interpreter we want to make sure that it's visited after any user-installed find_dynamic_unwind_sections callbacks, otherwise it will shadow them.

Maybe we could have it run on first construction of an Interpreter instance?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants