Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WebAssembly] Don't return multivalue when Emscripten EH/SjLj is used #86048

Closed
wants to merge 3 commits into from

Conversation

aheejin
Copy link
Member

@aheejin aheejin commented Mar 21, 2024

Emscripten EH/SjLj uses invoke wrappers in JS, such as invoke_vi, from which user functions are called indirectly, to emulate exceptions and setjmp/longjmp.

But in case the invoked function returns multiple values or a 128-bit value, its the JS invoke wrappers cannot return multivalue because JS doesn't support that. So we should not enable multivalue returns for the JS invoke wrappers and also the functions called by those JS wrappers because their signature has to match with the JS wrapper. For example, if func returns {i32, i32} and we have

  invoke {i32, i32} @func() ...

while LowerEmscriptenEHSjLj will lower it down to something like

  %0 = call { i32, i32 } @"__invoke_{i32.i32}_void"(ptr @func)
  ...

we should eventually lower both the invoke wrapper (whose name will be changed later to invoke_vi) and func down to a signature that indirectly returns multiple values by memory parameter, because JS invoke wrappers do support multiple return values.

So we need to disable multivalue returns for JS invoke wrappers and functions called by them. I think we have three ways to do that:

  1. Make a set and add all functions that are invoked by JS invoke wrappers in LowerEmscriptenEHSjLj and pass it to the backend using an auxiliary data structure. We have a precedent of this kind of structure already, which is used for Wasm EH:
    https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/WasmEHFuncInfo.h
    We can even add this set to WasmEHFuncInfo maybe, given that this is also a way of handling exceptions in Wasm.

    • Pros: Most precise
    • Cons: Auxiliary structure needed
  2. Unless we record the invoked functions in LowerEmscriptenEHSjLj like 1, we don't have a way of precisely knowing the set of invoked functions. But they are all indirectly invoked, so in the backend we can check whether a given function is ever indirectly used (i.e., its pointer taken) by traversing its users() in the IR and if it is, we don't allow multivalue returns for them.

    • Pros: No auxiliary structure
    • Cons: IR checking overhead. More conservative than 1.
  3. Disallow all multivalue returns when Emscripten EH or SjLj is enabled.

    • Pros: Simplest
    • Cons: Most conservative.

This PR is doing 3. While it is the most conservative and possibly disallowing multivalue returns from more functions than needed, I chose this because it is the simplest, and given that hopefully more people will adopt Wasm EH going forward, I don't think there will be many people who would use multivalue and Emscripten EH/SjLj together and want the whatever performance benefit that multivalue return can bring, given that Emscripten EH/SjLj has already a huge performance cost.

This is separate from whether we should make the multivalue return dependent on the multivalue feature or something else like a clang flag, which is being partly discussed in WebAssembly/tool-conventions#223.
Whichever way we decide on that front, we still need to disable multivalue returns in case Emscripten EH/SjLj is used.

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 21, 2024

@llvm/pr-subscribers-backend-webassembly

Author: Heejin Ahn (aheejin)

Changes

Emscripten EH/SjLj uses invoke wrappers in JS, such as invoke_vi, from which user functions are called indirectly, to emulate exceptions and setjmp/longjmp.

But in case the invoked function returns multiple values or a 128-bit value, its the JS invoke wrappers cannot return multivalue because JS doesn't support that. So we should not enable multivalue returns for the JS invoke wrappers and also the functions called by those JS wrappers because their signature has to match with the JS wrapper. For example, if func returns {i32, i32} and we have
``ll
invoke {i32, i32} @func() ...

while LowerEmscriptenEHSjLj will lower it down to something like
```ll
  %0 = call { i32, i32 } @"__invoke_{i32.i32}_void"(ptr @<!-- -->func)
  ...

we should eventually lower both the invoke wrapper (whose name will be
changed later to invoke_vi) and func down to a signature that
indirectly returns multiple values by memory parameter, because JS
invoke wrappers do support multiple return values.

So we need to disable multivalue returns for JS invoke wrappers and
functions called by them. I think we have three ways to do that:

  1. Make a set and add all functions that are invoked by JS invoke
    wrappers in LowerEmscriptenEHSjLj and pass it to the backend using a
    auxiliary data structure. We have a precedent of this kind of
    structure already, which is used for Wasm EH:
    https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/WasmEHFuncInfo.h
    We can even add this set to WasmEHFuncInfo maybe, given that this
    is also a way of handling exceptions in Wasm.

    • Pros: Most precise
    • Cons: Auxiliary structure needed
  2. Unless we record the invoked functions in LowerEmscriptenEHSjLj like
    1, we don't have a way of precisely knowing the set of invoked
    functions. But they are all indirectly invoked, so in the backend we
    can check whether a given function is ever indirectly used (i.e., its
    pointer taken) by traversing its users() in the IR and if it is,
    we don't allow multivalue returns for them.

    • Pros: No auxiliary structure
    • Cons: IR checking overhead. More conservative than 1.
  3. Disallow all multivalue returns when Emscripten EH or SjLj is
    enabled.

    • Pros: Simplest
    • Cons: Most conservative.

This PR is doing 3. While it is the most conservative and possibly
disallowing multivalue returns from more functions than needed, I chose
this because it is the simplest, and given that hopefully more people
will adopt Wasm EH going forward, I don't think there will be many
people who would use multivalue and Emscripten EH/SjLj together and want
the whatever performance benefit that multivalue return can bring, given
that Emscripten EH/SjLj has already a huge performance cost.

This is separate from whether we should make the multivalue return
dependent on the multivalue feature or something else like a clang flag,
which is being partly discussed in
WebAssembly/tool-conventions#223.
Whichever way we decide on that front, we still need to disable
multivalue returns in case Emscripten EH/SjLj is used.


Full diff: https://github.com/llvm/llvm-project/pull/86048.diff

7 Files Affected:

  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp (+4-9)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.h (+1-1)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp (+2-2)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyMachineFunctionInfo.cpp (+6-5)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyUtilities.cpp (+14)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyUtilities.h (+5)
  • (modified) llvm/test/CodeGen/WebAssembly/lower-em-ehsjlj-multi-return.ll (+29-4)
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
index 03897b551ecaee..c6c7bf1cacda40 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
@@ -151,19 +151,14 @@ static std::string getEmscriptenInvokeSymbolName(wasm::WasmSignature *Sig) {
 //===----------------------------------------------------------------------===//
 
 MCSymbolWasm *WebAssemblyAsmPrinter::getMCSymbolForFunction(
-    const Function *F, bool EnableEmEH, wasm::WasmSignature *Sig,
+    const Function *F, bool EnableEmEHSjLj, wasm::WasmSignature *Sig,
     bool &InvokeDetected) {
   MCSymbolWasm *WasmSym = nullptr;
-  if (EnableEmEH && isEmscriptenInvokeName(F->getName())) {
+  if (EnableEmEHSjLj && isEmscriptenInvokeName(F->getName())) {
     assert(Sig);
     InvokeDetected = true;
-    if (Sig->Returns.size() > 1) {
-      std::string Msg =
-          "Emscripten EH/SjLj does not support multivalue returns: " +
-          std::string(F->getName()) + ": " +
-          WebAssembly::signatureToString(Sig);
-      report_fatal_error(Twine(Msg));
-    }
+    // When Emscripten EH/SjLj is enabled, we don't enable multivalue returns
+    assert(Sig->Returns.size() <= 1);
     WasmSym = cast<MCSymbolWasm>(
         GetExternalSymbolSymbol(getEmscriptenInvokeSymbolName(Sig)));
   } else {
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.h b/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.h
index c30e0155c81e72..4892df1f22485d 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.h
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.h
@@ -81,7 +81,7 @@ class LLVM_LIBRARY_VISIBILITY WebAssemblyAsmPrinter final : public AsmPrinter {
   MVT getRegType(unsigned RegNo) const;
   std::string regToString(const MachineOperand &MO);
   WebAssemblyTargetStreamer *getTargetStreamer();
-  MCSymbolWasm *getMCSymbolForFunction(const Function *F, bool EnableEmEH,
+  MCSymbolWasm *getMCSymbolForFunction(const Function *F, bool EnableEmEHSjLj,
                                        wasm::WasmSignature *Sig,
                                        bool &InvokeDetected);
   MCSymbol *getOrCreateWasmSymbol(StringRef Name);
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
index 905ff3b9018428..64bcadf3f5677c 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
@@ -1288,7 +1288,7 @@ bool WebAssemblyTargetLowering::CanLowerReturn(
     const SmallVectorImpl<ISD::OutputArg> &Outs,
     LLVMContext & /*Context*/) const {
   // WebAssembly can only handle returning tuples with multivalue enabled
-  return Subtarget->hasMultivalue() || Outs.size() <= 1;
+  return WebAssembly::canLowerReturn(Outs.size(), Subtarget);
 }
 
 SDValue WebAssemblyTargetLowering::LowerReturn(
@@ -1296,7 +1296,7 @@ SDValue WebAssemblyTargetLowering::LowerReturn(
     const SmallVectorImpl<ISD::OutputArg> &Outs,
     const SmallVectorImpl<SDValue> &OutVals, const SDLoc &DL,
     SelectionDAG &DAG) const {
-  assert((Subtarget->hasMultivalue() || Outs.size() <= 1) &&
+  assert(WebAssembly::canLowerReturn(Outs.size(), Subtarget) &&
          "MVP WebAssembly can only return up to one value");
   if (!callingConvSupported(CallConv))
     fail(DL, DAG, "WebAssembly doesn't support non-C calling conventions");
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyMachineFunctionInfo.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyMachineFunctionInfo.cpp
index 1e959111a4dbcb..2c104382aa65ca 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyMachineFunctionInfo.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyMachineFunctionInfo.cpp
@@ -17,6 +17,7 @@
 #include "Utils/WebAssemblyTypeUtilities.h"
 #include "WebAssemblyISelLowering.h"
 #include "WebAssemblySubtarget.h"
+#include "WebAssemblyUtilities.h"
 #include "llvm/CodeGen/Analysis.h"
 #include "llvm/CodeGen/WasmEHFuncInfo.h"
 #include "llvm/Target/TargetMachine.h"
@@ -70,12 +71,12 @@ void llvm::computeSignatureVTs(const FunctionType *Ty,
   computeLegalValueVTs(ContextFunc, TM, Ty->getReturnType(), Results);
 
   MVT PtrVT = MVT::getIntegerVT(TM.createDataLayout().getPointerSizeInBits());
-  if (Results.size() > 1 &&
-      !TM.getSubtarget<WebAssemblySubtarget>(ContextFunc).hasMultivalue()) {
+  if (!WebAssembly::canLowerReturn(
+          Results.size(),
+          &TM.getSubtarget<WebAssemblySubtarget>(ContextFunc))) {
     // WebAssembly can't lower returns of multiple values without demoting to
-    // sret unless multivalue is enabled (see
-    // WebAssemblyTargetLowering::CanLowerReturn). So replace multiple return
-    // values with a poitner parameter.
+    // sret unless multivalue is enabled (see WebAssembly::canLowerReturn). So
+    // replace multiple return values with a poitner parameter.
     Results.clear();
     Params.push_back(PtrVT);
   }
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyUtilities.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyUtilities.cpp
index ac7cf5b37fcaa4..f09700ec7c0527 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyUtilities.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyUtilities.cpp
@@ -179,3 +179,17 @@ unsigned WebAssembly::getCopyOpcodeForRegClass(const TargetRegisterClass *RC) {
     llvm_unreachable("Unexpected register class");
   }
 }
+
+bool WebAssembly::canLowerReturn(size_t ResultSize,
+                                 const WebAssemblySubtarget *Subtarget) {
+  // We don't return multivalues when either of Emscripten EH or Emscripten SjLj
+  // is used. JS invoke wrapper functions, which are used in Emscripten EH/SjLj,
+  // don't support multiple return values, and the functions indirectly called
+  // from those JS invoke wrapper functions can't be lowered using multivalue
+  // results because they have to match the signature of the JS invoke wrappers.
+  // We can come up with a precise set of functions that are called from the JS
+  // wrappers if we do some bookeeping, but given that Emscripten EH/SjLj is an
+  // old feature whose use is expected to decline, we don't think it's worth it.
+  return ResultSize <= 1 || (Subtarget->hasMultivalue() &&
+                             !WebAssembly::WasmEnableEmEH && !WasmEnableEmSjLj);
+}
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyUtilities.h b/llvm/lib/Target/WebAssembly/WebAssemblyUtilities.h
index 7f28fb1858a690..b430edba4627ae 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyUtilities.h
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyUtilities.h
@@ -63,6 +63,11 @@ MachineInstr *findCatch(MachineBasicBlock *EHPad);
 /// Returns the appropriate copy opcode for the given register class.
 unsigned getCopyOpcodeForRegClass(const TargetRegisterClass *RC);
 
+/// Returns true if the function's return value(s) can be lowered directly,
+/// i.e., not indirectly via a pointer parameter that points to the value in
+/// memory.
+bool canLowerReturn(size_t ResultSize, const WebAssemblySubtarget *Subtarget);
+
 } // end namespace WebAssembly
 
 } // end namespace llvm
diff --git a/llvm/test/CodeGen/WebAssembly/lower-em-ehsjlj-multi-return.ll b/llvm/test/CodeGen/WebAssembly/lower-em-ehsjlj-multi-return.ll
index 4f33439db770dc..a4d0908299f4dc 100644
--- a/llvm/test/CodeGen/WebAssembly/lower-em-ehsjlj-multi-return.ll
+++ b/llvm/test/CodeGen/WebAssembly/lower-em-ehsjlj-multi-return.ll
@@ -1,8 +1,8 @@
-; RUN: not --crash llc < %s -enable-emscripten-cxx-exceptions -mattr=+multivalue 2>&1 | FileCheck %s --check-prefix=EH
-; RUN: not --crash llc < %s -enable-emscripten-sjlj -mattr=+multivalue 2>&1 | FileCheck %s --check-prefix=SJLJ
+; RUN: llc < %s -enable-emscripten-cxx-exceptions -enable-emscripten-sjlj -mattr=+multivalue | FileCheck %s
 
-; Currently multivalue returning functions are not supported in Emscripten EH /
-; SjLj. Make sure they error out.
+; Even with multivalue feature enabled, we don't enable multivalue returns when
+; Emscripten EH/SjLj is used. Also the invoke wrappers' (e.g. invoke_vi)
+; signature should not change.
 
 target triple = "wasm32-unknown-unknown"
 
@@ -35,7 +35,32 @@ entry:
   unreachable
 }
 
+define void @invoke_i128() personality ptr @__gxx_personality_v0 {
+entry:
+  invoke i128 @get_i128()
+          to label %try.cont unwind label %lpad
+
+lpad:                                             ; preds = %entry
+  %1 = landingpad { ptr, i32 }
+          catch ptr null
+  %2 = extractvalue { ptr, i32 } %1, 0
+  %3 = extractvalue { ptr, i32 } %1, 1
+  %4 = call ptr @__cxa_begin_catch(ptr %2)
+  call void @__cxa_end_catch()
+  br label %try.cont
+
+try.cont:                                         ; preds = %entry, %lpad
+  ret void
+}
+
 declare {i32, i32} @foo(i32)
+; CHECK-DAG: .functype foo (i32, i32) -> ()
+; CHECK-DAG: .functype invoke_vi (i32, i32) -> ()
+
+declare i128 @get_i128()
+; CHECK-DAG: .functype get_i128 (i32) -> ()
+; CHECK-DAG: .functype invoke_vii (i32, i32, i32) -> ()
+
 declare i32 @__gxx_personality_v0(...)
 declare ptr @__cxa_begin_catch(ptr)
 declare void @__cxa_end_catch()

Emscripten EH/SjLj uses invoke wrappers in JS, such as `invoke_vi`, from
which user functions are called indirectly, to emulate exceptions and
setjmp/longjmp.

But in case the invoked function returns multiple values or a 128-bit
value, its the JS invoke wrappers cannot return multivalue because JS
doesn't support that. So we should not enable multivalue returns for the
JS invoke wrappers and also the functions called by those JS wrappers
because their signature has to match with the JS wrapper. For example,
if `func` returns `{i32, i32}` and we have
```ll
  invoke {i32, i32} @func() ...
```
while LowerEmscriptenEHSjLj will lower it down to something like
```ll
  %0 = call { i32, i32 } @"__invoke_{i32.i32}_void"(ptr @func)
  ...
```
we should eventually lower both the invoke wrapper (whose name will be
changed later to `invoke_vi`) and `func` down to a signature that
indirectly returns multiple values by memory parameter, because JS
invoke wrappers do support multiple return values.

So we need to disable multivalue returns for JS invoke wrappers and
functions called by them. I think we have three ways to do that:

1. Make a set and add all functions that are invoked by JS invoke
   wrappers in LowerEmscriptenEHSjLj and pass it to the backend using an
   auxiliary data structure. We have a precedent of this kind of
   structure already, which is used for Wasm EH:
   https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/WasmEHFuncInfo.h
   We can even add this set to `WasmEHFuncInfo` maybe, given that this
   is also a way of handling exceptions in Wasm.
   - Pros: Most precise
   - Cons: Auxiliary structure needed

2. Unless we record the invoked functions in LowerEmscriptenEHSjLj like
   1, we don't have a way of precisely knowing the set of invoked
   functions. But they are all indirectly invoked, so in the backend we
   can check whether a given function is ever indirectly used (i.e., its
   pointer taken) by traversing its `users()` in the IR and if it is,
   we don't allow multivalue returns for them.
   - Pros: No auxiliary structure
   - Cons: IR checking overhead. More conservative than 1.

3. Disallow all multivalue returns when Emscripten EH or SjLj is
   enabled.
   - Pros: Simplest
   - Cons: Most conservative.

This PR is doing 3. While it is the most conservative and possibly
disallowing multivalue returns from more functions than needed, I chose
this because it is the simplest, and given that hopefully more people
will adopt Wasm EH going forward, I don't think there will be many
people who would use multivalue and Emscripten EH/SjLj together and want
the whatever performance benefit that multivalue return can bring, given
that Emscripten EH/SjLj has already a huge performance cost.

This is separate from whether we should make the multivalue return
dependent on the multivalue feature or something else like a clang flag,
which is being partly discussed in
WebAssembly/tool-conventions#223.
Whichever way we decide on that front, we still need to disable
multivalue returns in case Emscripten EH/SjLj is used.
@aheejin aheejin changed the title [WebAssembly] Don't return multivalue for invokes [WebAssembly] Don't return multivalue when Emscripten EH/SjLj is used Mar 21, 2024
@aheejin
Copy link
Member Author

aheejin commented Mar 21, 2024

cc @TerrorJack in case you are interested. If you are using emscripten this may affect your results, but I don't think you are..?

@tlively
Copy link
Collaborator

tlively commented Mar 21, 2024

the JS invoke wrappers cannot return multivalue because JS doesn't support that

Doesn't it? I thought multivalue interacts with JS by having multivalue function called from JS return JS arrays of values.

@aheejin
Copy link
Member Author

aheejin commented Mar 21, 2024

One thing to think about is, given that Emscripten EH is off by default but Emscripten SjLj is on by default unless you use Wasm EH, this may turn off multivalue returns for more cases than we intend to. So to sum up,

  • Enabling Emscripten EH automatically means enabling Emscripten SjLj
  • Enabling Wasm EH automatically means enabling Wasm SjLj
  • Enabling neither EH means enabling Emscripten SjLJ
    • This can change later when Wasm EH is supported universally, but likely not in the near future

@tlively
Copy link
Collaborator

tlively commented Mar 21, 2024

One thing to think about is, given that Emscripten EH is off by default but Emscripten SjLj is on by default unless you use Wasm EH, this may turn off multivalue returns for more cases than we intend to. So to sum up,

  • Enabling Emscripten EH automatically means enabling Emscripten SjLj

  • Enabling Wasm EH automatically means enabling Wasm SjLj

  • Enabling neither EH means enabling Emscripten SjLJ

    • This can change later when Wasm EH is supported universally, but likely not in the near future

So if you really want multivalue returns, the only way to get them after this change is by also enabling Wasm EH?

@TerrorJack
Copy link

@aheejin Thanks for the ping! Our toolchain does not use emscripten sjlj mechanism fyi.

@aheejin
Copy link
Member Author

aheejin commented Mar 21, 2024

the JS invoke wrappers cannot return multivalue because JS doesn't support that

Doesn't it? I thought multivalue interacts with JS by having multivalue function called from JS return JS arrays of values.

Hmm, yeah, you're right. It's not an inherent limitation of JS but rather our invoke wrapper-specific issue...

@aheejin
Copy link
Member Author

aheejin commented Mar 21, 2024

One thing to think about is, given that Emscripten EH is off by default but Emscripten SjLj is on by default unless you use Wasm EH, this may turn off multivalue returns for more cases than we intend to. So to sum up,

  • Enabling Emscripten EH automatically means enabling Emscripten SjLj

  • Enabling Wasm EH automatically means enabling Wasm SjLj

  • Enabling neither EH means enabling Emscripten SjLJ

    • This can change later when Wasm EH is supported universally, but likely not in the near future

So if you really want multivalue returns, the only way to get them after this change is by also enabling Wasm EH?

Either that or explicitly changing longjmp support via -sSUPPORT_LONGJMP=wasm or -sSUPPORT_LONGJMP=0. But yeah I don't think people would do that. I was mostly thinking about Emscripten EH when I was working on this and disabling multivalue whenever Emscripten SjLj is enabled can be too much. I guess I need to rethink; will make the PR a draft in the meantime.

@aheejin
Copy link
Member Author

aheejin commented Apr 23, 2024

Will abandon this given that Emscripten does not need this for now after #88492 lands.

@aheejin aheejin closed this Apr 23, 2024
@aheejin aheejin deleted the invoke_no_multi_return branch April 23, 2024 09:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants