
bjorn3 (Member) commented Nov 20, 2025

This way, the codegen backends don't have to support unwinding LLVM intrinsics.

rustbot added the S-waiting-on-review and T-compiler labels Nov 20, 2025

rustbot (Collaborator) commented Nov 20, 2025

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

bjorn3 (Member Author) commented Nov 20, 2025

Helps with #148533 (comment). cc @purplesyringa

purplesyringa (Contributor) commented Nov 20, 2025

Hmm. Doesn't inline asm need to be accounted for in unwind_ffi_calls the same way extern "C-unwind" is handled? This feels like it opens up the same potential for UB.

Otherwise, this looks like the best way forward.

bjorn3 (Member Author) commented Nov 20, 2025

Indeed. It seems we forgot to handle inline asm in that function. I pushed a commit to fix that.
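A minimal sketch of the check being discussed, written as a toy model: the Terminator and Abi types and the has_ffi_unwind_calls function below are hypothetical simplifications, not rustc's actual MIR types or the real unwind_ffi_calls code. It only illustrates why an asm block with options(may_unwind) has to be treated like a call to an extern "C-unwind" function: in both cases a foreign exception can unwind into Rust code.

```rust
/// Hypothetical, simplified calling conventions (not rustc's real ABI enum).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Abi {
    Rust,
    C,
    CUnwind,
}

/// Hypothetical stand-in for the few MIR terminators this check cares about.
#[derive(Clone, Copy, Debug)]
enum Terminator {
    /// A call to a function with the given ABI.
    Call { abi: Abi },
    /// An inline asm block; `may_unwind` mirrors `options(may_unwind)`.
    InlineAsm { may_unwind: bool },
    Return,
}

/// Returns true if, in this model, a foreign (non-Rust) unwind can enter
/// the body through any of its terminators.
fn has_ffi_unwind_calls(body: &[Terminator]) -> bool {
    body.iter().any(|t| match *t {
        // Only calls through an unwinding foreign ABI count here; extern "C"
        // calls cannot unwind and Rust-ABI calls only propagate Rust panics.
        Terminator::Call { abi } => abi == Abi::CUnwind,
        // The case raised above: asm that may unwind is just as foreign.
        Terminator::InlineAsm { may_unwind } => may_unwind,
        Terminator::Return => false,
    })
}
```

Without the InlineAsm arm, a body whose only unwinding operation is an asm block would slip through this check, which is the gap the comment above describes.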

purplesyringa (Contributor)

That looks good, thanks!

rustbot (Collaborator) commented Nov 20, 2025

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

This is required for the soundness of options(may_unwind)

#[any(not(target_os = "emscripten"), emscripten_wasm_eh)]
Contributor

Isn't Emscripten considered unix? cfg_select! in the root of the unwind crate prefers a libunwind-based implementation over wasm for unix targets, so this check doesn't look like it can ever fail.
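The reasoning can be modeled as a first-match selection. This is a hypothetical sketch: the Target fields, the Backend names and the arm order are simplified assumptions, not the actual cfg_select! arms in the unwind crate. It assumes, as the comment says, that Emscripten counts as a unix target, so it never reaches the wasm backend and the Emscripten-specific guard there is always true.

```rust
/// Hypothetical, simplified view of a compilation target.
struct Target {
    os: &'static str,
    family_unix: bool,
    family_wasm: bool,
}

/// Illustrative names for the implementations the crate root can pick.
#[derive(Debug, PartialEq)]
enum Backend {
    Libunwind,
    Wasm,
    Other,
}

/// Mimics the first-match semantics of `cfg_select!`: arms are tried top to
/// bottom and the first matching one wins. The arm order is an assumption.
fn select_backend(t: &Target) -> Backend {
    if t.family_unix {
        Backend::Libunwind
    } else if t.family_wasm {
        Backend::Wasm
    } else {
        Backend::Other
    }
}

/// The guard `any(not(target_os = "emscripten"), emscripten_wasm_eh)` can only
/// be false on Emscripten, so it can fail only if an Emscripten target ever
/// selects the wasm backend.
fn guard_can_fail(targets: &[Target]) -> bool {
    targets
        .iter()
        .any(|t| select_backend(t) == Backend::Wasm && t.os == "emscripten")
}
```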

Member Author

Indeed. Removed this commit again.

purplesyringa added 3 commits to iex-rs/lithium that referenced this pull request Nov 20, 2025
purplesyringa (Contributor) commented Nov 20, 2025

I copied the asm snippet to Lithium and had a test failure on wasip1. Given that it worked fine with the previous implementation, it seems like either I'm stumbling on new UB or LLVM handles inline asm differently in some way, and we should know what we're getting ourselves into. Not sure what the issue is yet.

purplesyringa (Contributor) commented Nov 20, 2025

Ah, I see. It seems like .tagtype defines a new tag, and the resulting wasm module includes two tags: one used by catch and one used by throw. So obviously exceptions are never caught. We should be treating __cpp_exception as an external symbol somehow.

This bug also only reproduces with a high -C codegen-units value. I guess that's because with a single codegen unit, all users end up using the locally defined tag?
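A toy model of the suspected failure, assuming (as the comment does) that .tagtype in inline asm gives each object file its own local tag, while a global tag definition is shared after linking. Object, Binding, TagId and the helper functions are hypothetical; the model ignores everything about wasm except which tag a throw and a catch end up referring to.

```rust
/// How an object file defines the `__cpp_exception` tag symbol.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Binding {
    /// Private to the object file, never merged by the linker.
    Local,
    /// Visible to the linker and merged into one symbol (strong or weak).
    Global,
}

/// Hypothetical object file produced by one codegen unit.
struct Object {
    name: &'static str,
    binding: Binding,
}

/// Which tag a `throw` or `catch` in an object refers to after linking.
#[derive(Debug, PartialEq)]
enum TagId {
    LocalTo(&'static str),
    Global,
}

fn resolve_tag(obj: &Object) -> TagId {
    match obj.binding {
        Binding::Local => TagId::LocalTo(obj.name),
        Binding::Global => TagId::Global,
    }
}

/// A tagged wasm `catch` only matches exceptions thrown with the same tag.
fn exception_is_caught(thrower: &Object, catcher: &Object) -> bool {
    resolve_tag(thrower) == resolve_tag(catcher)
}
```

With one codegen unit, thrower and catcher live in the same object and agree on the tag; splitting them across codegen units makes them disagree, and making the asm-side tag global (as in the .globl change proposed later in the thread) makes them agree again.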

Also, this only fails on wasip1 for some reason, but not Emscripten. (Yes, I know EH on wasip1 is not supported, but that shouldn't affect codegen at this point.) Not sure if that's due to random chance or something else.

bjorn3 (Member Author) commented Nov 20, 2025

This seems to work for me:

diff --git a/src/backend/itanium.rs b/src/backend/itanium.rs
index 8a20236..40ed67f 100644
--- a/src/backend/itanium.rs
+++ b/src/backend/itanium.rs
@@ -173,6 +173,7 @@ unsafe fn _Unwind_RaiseException(ex: *mut u8) -> ! {
     unsafe {
         core::arch::asm!(
             ".tagtype __cpp_exception i32",
+            ".globl __cpp_exception",
             "local.get {ex}",
             "throw __cpp_exception",
             ex = in(local) ex,
diff --git a/src/backend/wasm.rs b/src/backend/wasm.rs
index 698af54..dfea843 100644
--- a/src/backend/wasm.rs
+++ b/src/backend/wasm.rs
@@ -55,6 +55,7 @@ unsafe fn throw(ex: *mut u8) -> ! {
     unsafe {
         core::arch::asm!(
             ".tagtype __cpp_exception i32",
+            ".globl __cpp_exception",
             "local.get {ex}",
             "throw __cpp_exception",
             ex = in(local) ex,

bjorn3 (Member Author) commented Nov 20, 2025

But if I also do that in libunwind, I get

  = note: rust-lld: error: duplicate symbol: __cpp_exception
          >>> defined in /home/gh-bjorn3/lithium/target/wasm32-wasip1/debug/deps/lithium-78ca5961541269ed.0d0bfo19cty4ehby6hltmh4f5.1oiejt7.rcgu.o
          >>> defined in /home/gh-bjorn3/lithium/target/wasm32-wasip1/debug/deps/libunwind-a912e649dab03540.rlib(unwind-a912e649dab03540.unwind.54e4b316f502e26a-cgu.0.rcgu.o)

purplesyringa (Contributor)

LLVM has this code:

void WasmException::endModule() {
  // These are symbols used to throw/catch C++ exceptions and C longjmps. These
  // symbols have to be emitted somewhere once in the module. Check if each of
  // the symbols has already been created, i.e., we have at least one 'throw' or
  // 'catch' instruction with the symbol in the module, and emit the symbol only
  // if so.
  //
  // But in dynamic linking, it is in general not possible to come up with a
  // module instantiating order in which tag-defining modules are loaded before
  // the importing modules. So we make them undefined symbols here, define tags
  // in the JS side, and feed them to each importing module.
  if (!Asm->isPositionIndependent()) {
    for (const char *SymName : {"__cpp_exception", "__c_longjmp"}) {
      SmallString<60> NameStr;
      Mangler::getNameWithPrefix(NameStr, SymName, Asm->getDataLayout());
      if (Asm->OutContext.lookupSymbol(NameStr)) {
        MCSymbol *ExceptionSym = Asm->GetExternalSymbolSymbol(SymName);
        Asm->OutStreamer->emitLabel(ExceptionSym);
      }
    }
  }
}

...so I'm assuming the difference can be chalked up to the default relocation-model, which is static only for WASI. I've verified that using -C relocation-model=static on Emscripten brings the bug onto that target too. Not sure what that even means for wasm targets, but I wouldn't want to rely on this interaction.
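The decision in the quoted endModule can be restated as a small function. This is a hypothetical Rust transliteration for illustration only: it ignores name mangling and reduces "the symbol was already created" to membership in a list of referenced symbols. It shows why a non-PIC build (the static relocation model, which the comment says is the default only for WASI) gets tag definitions emitted by LLVM, while a position-independent build leaves them undefined.

```rust
/// The two tag symbols LLVM's `WasmException::endModule` considers.
const EH_SYMBOLS: [&str; 2] = ["__cpp_exception", "__c_longjmp"];

/// Returns the tag symbols LLVM would define at the end of a module.
/// `referenced` stands in for "the symbol was already created by a throw or
/// catch in this module"; name mangling is ignored in this sketch.
fn symbols_defined_at_end_of_module(
    position_independent: bool,
    referenced: &[&str],
) -> Vec<&'static str> {
    if position_independent {
        // Dynamic linking: leave the tags undefined; the JS side defines them.
        return Vec::new();
    }
    EH_SYMBOLS
        .iter()
        .copied()
        .filter(|sym| referenced.contains(sym))
        .collect()
}
```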

bjorn3 (Member Author) commented Nov 20, 2025

Looking at the wasm object generated with llvm.wasm.throw, it looks like __cpp_exception is a weak definition. If I add .weak __cpp_exception it works fine and all non-doctests pass (I didn't have rustdoc built for my stage1 toolchain).
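The linker behavior behind the duplicate symbol error above and the .weak fix can be summarized with the usual ELF-style resolution rules for a single global symbol. This is a simplified model that assumes wasm-ld follows those rules for tag symbols; Def, LinkResult and resolve are hypothetical names.

```rust
/// How one object file defines a given global symbol.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Def {
    Strong,
    Weak,
}

#[derive(Debug, PartialEq)]
enum LinkResult {
    /// One definition was chosen and every reference uses it.
    Resolved,
    /// Two or more strong definitions of the same global symbol.
    DuplicateSymbol,
    /// No definition at all.
    Undefined,
}

/// Simplified static-linker resolution of one global symbol name, given the
/// definitions contributed by each object file.
fn resolve(defs: &[Def]) -> LinkResult {
    let strong = defs.iter().filter(|d| **d == Def::Strong).count();
    match (strong, defs.len()) {
        (0, 0) => LinkResult::Undefined,
        // Only weak definitions: the linker keeps one of them.
        (0, _) => LinkResult::Resolved,
        // A single strong definition overrides all weak ones.
        (1, _) => LinkResult::Resolved,
        _ => LinkResult::DuplicateSymbol,
    }
}
```

In this model, two strong definitions (.globl in both Lithium and the unwind crate) reproduce the duplicate symbol error, while weak definitions, like the one observed for llvm.wasm.throw, can coexist.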

purplesyringa added a commit to iex-rs/lithium that referenced this pull request Nov 20, 2025
purplesyringa (Contributor)

Seems to work on my side on the current nightly too, fingers crossed it stays that way. (Also, just making sure: you made a typo and added .globl, and the test should probably be updated too)

purplesyringa added a commit to iex-rs/lithium that referenced this pull request Nov 20, 2025
bjorn3 (Member Author) commented Nov 20, 2025

I meant to mark it as both .globl and .weak but forgot to commit the latter change. I think using .globl too ensures the tag is actually defined rather than being a weak import. In any case, it looks like you are having some issues on Emscripten in CI.

rustc-LLVM ERROR: undefined tag symbol cannot be weak

For emscripten dynamic linking we should be emitting a true import and not a weak definition I think.

purplesyringa (Contributor) commented Nov 20, 2025

Yeah, I forgot how symbols worked and thought .weak doesn't need .globl 😅 Thanks.

I tried to confirm that dynamic linking requires a true import, but I'm not sure how to do that. I mean that makes sense, but the error looks odd: I am defining the symbol, after all, even if weak, so it's not clear why it's considered undefined. I feel like I'm missing something.

Is this something I need to handle in Lithium alone, or should this be taken into consideration for unwind too? I thought it might affect cdylib crates, but I'm not sure if wasm32-unknown-emscripten can even build them.

purplesyringa (Contributor)

Is this connected to mangling in some way? As far as I can see, .tagtype mangles the symbol name, while .globl/.weak don't. Since Emscripten uses ELF mangling, this would explain why I got the undefined symbol error: there wasn't actually any symbol called __cpp_exception, only ___cpp_exception. When I added the underscore, Emscripten started working, but WASI broke. This would make sense if WASI didn't prefix names with _, but I'm pretty sure it does; in fact, even if it didn't, I should've gotten linking errors due to exposing an undefined symbol as weak at that point.


Labels

S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.


4 participants