[WebAssembly] Fold constant i8x16.swizzle and i8x16.relaxed.swizzle to shufflevector
#169110
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-webassembly
Author: None (valadaptive)
Changes
Resolves #169058.
This adds an InstCombine pass to the WebAssembly backend that folds i8x16.swizzle and i8x16.relaxed.swizzle operations to shufflevector operations if their mask operands are constant.
This is mainly useful for abstractions over the raw intrinsics--for instance, in architecture-generic SIMD code that may not be able to expose the constant shuffles due to type system limitations.
I took most of this from the x86 backend (in particular, simplifyX86vpermilvar in X86InstCombineIntrinsic), and adapted it for the WebAssembly backend. There wasn't any previous instCombineIntrinsic method on the WebAssemblyTargetTransformInfo, so I added it. Right now, this swizzle optimization is the only one it performs.
As I noted in the transform itself, the "relaxed" swizzle actually has stricter preconditions than the non-relaxed one. If a non-negative but still out-of-bounds index is provided, the "relaxed" swizzle can choose between returning 0 and the lane at the index modulo 16. However, it must make the same choice every time, and we don't know which choice the runtime will make, so we can't constant-fold it.
The regression tests were mostly generated by Claude and adapted a bit by me (I tried to follow the InstCombine contributor guide). There was previously no WebAssembly subdirectory within the InstCombine tests, so I created that too; as of now, the swizzle fold test is the only file in it. Everything else was written by myself (well, partly copy-pasted from the x86 backend).
I'm not sure how to write an Alive2 test for this; I can't find any examples where the input is an arbitrary constant.
Full diff: https://github.com/llvm/llvm-project/pull/169110.diff
5 Files Affected:
diff --git a/llvm/lib/Target/WebAssembly/CMakeLists.txt b/llvm/lib/Target/WebAssembly/CMakeLists.txt
index 17df119d62709..13fff96fc6a33 100644
--- a/llvm/lib/Target/WebAssembly/CMakeLists.txt
+++ b/llvm/lib/Target/WebAssembly/CMakeLists.txt
@@ -32,6 +32,7 @@ add_llvm_target(WebAssemblyCodeGen
WebAssemblyFixIrreducibleControlFlow.cpp
WebAssemblyFixFunctionBitcasts.cpp
WebAssemblyFrameLowering.cpp
+ WebAssemblyInstCombineIntrinsic.cpp
WebAssemblyISelDAGToDAG.cpp
WebAssemblyISelLowering.cpp
WebAssemblyInstrInfo.cpp
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyInstCombineIntrinsic.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyInstCombineIntrinsic.cpp
new file mode 100644
index 0000000000000..2fa00b3c5d50d
--- /dev/null
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyInstCombineIntrinsic.cpp
@@ -0,0 +1,107 @@
+//=== WebAssemblyInstCombineIntrinsic.cpp -
+// WebAssembly specific InstCombine pass ---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+/// \file
+/// This file implements a TargetTransformInfo analysis pass specific to
+/// WebAssembly. It uses the target's detailed information to provide more
+/// precise answers to certain TTI queries, while letting the target independent
+/// and default TTI implementations handle the rest.
+///
+//===----------------------------------------------------------------------===//
+
+#include "WebAssemblyTargetTransformInfo.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/IntrinsicsWebAssembly.h"
+#include "llvm/Transforms/InstCombine/InstCombiner.h"
+#include <optional>
+
+using namespace llvm;
+using namespace llvm::PatternMatch;
+
+/// Attempt to convert [relaxed_]swizzle to shufflevector if the mask is
+/// constant.
+static Value *simplifyWasmSwizzle(const IntrinsicInst &II,
+ InstCombiner::BuilderTy &Builder,
+ bool IsRelaxed) {
+ auto *V = dyn_cast<Constant>(II.getArgOperand(1));
+ if (!V)
+ return nullptr;
+
+ auto *VecTy = cast<FixedVectorType>(II.getType());
+ unsigned NumElts = VecTy->getNumElements();
+ assert(NumElts == 16);
+
+ // Construct a shuffle mask from constant integers or UNDEFs.
+ int Indexes[16];
+ bool AnyOutOfBounds = false;
+
+ for (unsigned I = 0; I < NumElts; ++I) {
+ Constant *COp = V->getAggregateElement(I);
+ if (!COp || (!isa<UndefValue>(COp) && !isa<ConstantInt>(COp)))
+ return nullptr;
+
+ if (isa<UndefValue>(COp)) {
+ Indexes[I] = -1;
+ continue;
+ }
+
+ int64_t Index = cast<ConstantInt>(COp)->getSExtValue();
+
+ if (Index >= NumElts && IsRelaxed) {
+ // For lane indices above 15, the relaxed_swizzle operation can choose
+ // between returning 0 or the lane at `Index % 16`. However, the choice
+ // must be made consistently. As the WebAssembly spec states:
+ //
+ // "The result of relaxed operators are implementation-dependent, because
+ // the set of possible results may depend on properties of the host
+ // environment, such as its hardware. Technically, their behaviour is
+ // controlled by a set of global parameters to the semantics that an
+ // implementation can instantiate in different ways. These choices are
+ // fixed, that is, parameters are constant during the execution of any
+ // given program."
+ //
+ // The WebAssembly runtime may choose differently from us, so we can't
+ // optimize a relaxed swizzle with lane indices above 15.
+ return nullptr;
+ }
+
+ if (Index >= NumElts || Index < 0) {
+ AnyOutOfBounds = true;
+ // If there are out-of-bounds indices, the swizzle instruction returns
+ // zeroes in those lanes. We'll provide an all-zeroes vector as the
+ // second argument to shufflevector and read the first element from it.
+ Indexes[I] = NumElts;
+ continue;
+ }
+
+ Indexes[I] = Index;
+ }
+
+ auto *V1 = II.getArgOperand(0);
+ auto *V2 =
+ AnyOutOfBounds ? Constant::getNullValue(VecTy) : PoisonValue::get(VecTy);
+
+ return Builder.CreateShuffleVector(V1, V2, ArrayRef(Indexes, NumElts));
+}
+
+std::optional<Instruction *>
+WebAssemblyTTIImpl::instCombineIntrinsic(InstCombiner &IC,
+ IntrinsicInst &II) const {
+ Intrinsic::ID IID = II.getIntrinsicID();
+ switch (IID) {
+ case Intrinsic::wasm_swizzle:
+ case Intrinsic::wasm_relaxed_swizzle:
+ if (Value *V = simplifyWasmSwizzle(
+ II, IC.Builder, IID == Intrinsic::wasm_relaxed_swizzle)) {
+ return IC.replaceInstUsesWith(II, V);
+ }
+ break;
+ }
+
+ return std::nullopt;
+}
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h b/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h
index 4146c0ec6ab07..11f7efc625399 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h
@@ -90,6 +90,8 @@ class WebAssemblyTTIImpl final : public BasicTTIImplBase<WebAssemblyTTIImpl> {
TTI::TargetCostKind CostKind,
unsigned Index, const Value *Op0,
const Value *Op1) const override;
+ std::optional<Instruction *>
+ instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const override;
InstructionCost getPartialReductionCost(
unsigned Opcode, Type *InputTypeA, Type *InputTypeB, Type *AccumType,
ElementCount VF, TTI::PartialReductionExtendKind OpAExtend,
diff --git a/llvm/test/Transforms/InstCombine/WebAssembly/fold-swizzle.ll b/llvm/test/Transforms/InstCombine/WebAssembly/fold-swizzle.ll
new file mode 100644
index 0000000000000..ba251929c3739
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/WebAssembly/fold-swizzle.ll
@@ -0,0 +1,116 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
+; RUN: opt < %s -passes=instcombine -mtriple=wasm32-unknown-unknown -S | FileCheck %s
+
+; swizzle with a constant operand should be optimized to a shufflevector.
+
+declare <16 x i8> @llvm.wasm.swizzle(<16 x i8>, <16 x i8>)
+declare <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8>, <16 x i8>)
+
+; Identity swizzle pattern
+define <16 x i8> @swizzle_identity(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @swizzle_identity(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT: ret <16 x i8> [[V]]
+;
+ %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>)
+ ret <16 x i8> %result
+}
+
+; Reverse swizzle pattern
+define <16 x i8> @swizzle_reverse(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @swizzle_reverse(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT: [[RESULT:%.*]] = shufflevector <16 x i8> [[V]], <16 x i8> poison, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT: ret <16 x i8> [[RESULT]]
+;
+ %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> <i8 15, i8 14, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+ ret <16 x i8> %result
+}
+
+; undef elements
+define <16 x i8> @swizzle_with_undef(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @swizzle_with_undef(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT: [[RESULT:%.*]] = shufflevector <16 x i8> [[V]], <16 x i8> poison, <16 x i32> <i32 0, i32 poison, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT: ret <16 x i8> [[RESULT]]
+;
+ %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> <i8 0, i8 undef, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>)
+ ret <16 x i8> %result
+}
+
+; Negative test: non-constant operand
+define <16 x i8> @swizzle_non_constant(<16 x i8> %v, <16 x i8> %mask) {
+; CHECK-LABEL: define <16 x i8> @swizzle_non_constant(
+; CHECK-SAME: <16 x i8> [[V:%.*]], <16 x i8> [[MASK:%.*]]) {
+; CHECK-NEXT: [[RESULT:%.*]] = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> [[V]], <16 x i8> [[MASK]])
+; CHECK-NEXT: ret <16 x i8> [[RESULT]]
+;
+ %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> %mask)
+ ret <16 x i8> %result
+}
+
+; Out-of-bounds index, otherwise identity pattern
+define <16 x i8> @swizzle_out_of_bounds_1(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @swizzle_out_of_bounds_1(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT: [[RESULT:%.*]] = insertelement <16 x i8> [[V]], i8 0, i64 15
+; CHECK-NEXT: ret <16 x i8> [[RESULT]]
+;
+ %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 16>)
+ ret <16 x i8> %result
+}
+
+; Out-of-bounds indices, both negative and positive
+define <16 x i8> @swizzle_out_of_bounds_2(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @swizzle_out_of_bounds_2(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT: [[RESULT:%.*]] = shufflevector <16 x i8> [[V]], <16 x i8> <i8 0, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison>, <16 x i32> <i32 16, i32 16, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT: ret <16 x i8> [[RESULT]]
+;
+ %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> <i8 99, i8 -1, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+ ret <16 x i8> %result
+}
+
+; Identity swizzle pattern (relaxed_swizzle)
+define <16 x i8> @relaxed_swizzle_identity(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @relaxed_swizzle_identity(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT: ret <16 x i8> [[V]]
+;
+ %result = tail call <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8> %v, <16 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>)
+ ret <16 x i8> %result
+}
+
+; Reverse swizzle pattern (relaxed_swizzle)
+define <16 x i8> @relaxed_swizzle_reverse(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @relaxed_swizzle_reverse(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT: [[RESULT:%.*]] = shufflevector <16 x i8> [[V]], <16 x i8> poison, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT: ret <16 x i8> [[RESULT]]
+;
+ %result = tail call <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8> %v, <16 x i8> <i8 15, i8 14, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+ ret <16 x i8> %result
+}
+
+; Out-of-bounds index, only negative (relaxed_swizzle)
+define <16 x i8> @relaxed_swizzle_out_of_bounds(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @relaxed_swizzle_out_of_bounds(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT: [[RESULT:%.*]] = shufflevector <16 x i8> [[V]], <16 x i8> <i8 0, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison>, <16 x i32> <i32 16, i32 16, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT: ret <16 x i8> [[RESULT]]
+;
+ %result = tail call <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8> %v, <16 x i8> <i8 -99, i8 -1, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+ ret <16 x i8> %result
+}
+
+; Negative test: out-of-bounds index, both positive and negative (relaxed_swizzle)
+; The choice between different relaxed semantics can only be made at runtime, since it must be consistent.
+define <16 x i8> @relaxed_swizzle_out_of_bounds_positive(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @relaxed_swizzle_out_of_bounds_positive(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT: [[RESULT:%.*]] = tail call <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8> [[V]], <16 x i8> <i8 99, i8 -1, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+; CHECK-NEXT: ret <16 x i8> [[RESULT]]
+;
+ %result = tail call <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8> %v, <16 x i8> <i8 99, i8 -1, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+ ret <16 x i8> %result
+}
diff --git a/llvm/utils/gn/secondary/llvm/lib/Target/WebAssembly/BUILD.gn b/llvm/utils/gn/secondary/llvm/lib/Target/WebAssembly/BUILD.gn
index 11a57fcb008cd..8d976a33ce9db 100644
--- a/llvm/utils/gn/secondary/llvm/lib/Target/WebAssembly/BUILD.gn
+++ b/llvm/utils/gn/secondary/llvm/lib/Target/WebAssembly/BUILD.gn
@@ -54,6 +54,7 @@ static_library("LLVMWebAssemblyCodeGen") {
"WebAssemblyFixFunctionBitcasts.cpp",
"WebAssemblyFixIrreducibleControlFlow.cpp",
"WebAssemblyFrameLowering.cpp",
+ "WebAssemblyInstCombineIntrinsic.cpp",
"WebAssemblyISelDAGToDAG.cpp",
"WebAssemblyISelLowering.cpp",
"WebAssemblyInstrInfo.cpp",
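As a reading aid for the diff above, the index-to-mask mapping can be sketched outside of LLVM. This is a minimal standalone model, not the PR's code: `buildShuffleMask`, `ShuffleMask`, and `kNumLanes` are hypothetical names, the mask is taken as plain signed bytes, and undef lanes are left out for brevity. Index 16 stands for "lane 0 of an all-zero second operand", as in the transform.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Hypothetical standalone model of the fold; none of these names are LLVM API.
constexpr int kNumLanes = 16;
using ShuffleMask = std::array<int, kNumLanes>;

// Maps a constant swizzle mask to shufflevector indices.
// Returns std::nullopt when the fold must be skipped.
std::optional<ShuffleMask>
buildShuffleMask(const std::array<std::int8_t, kNumLanes> &swizzleMask,
                 bool isRelaxed) {
  ShuffleMask out{};
  for (int i = 0; i < kNumLanes; ++i) {
    int index = swizzleMask[i]; // sign-extended, as in the diff above

    // Relaxed swizzle: a non-negative out-of-bounds index has an
    // implementation-dependent result, so give up on the whole fold.
    if (isRelaxed && index >= kNumLanes)
      return std::nullopt;

    // Out-of-bounds lanes read element 0 of the zero vector (index 16).
    if (index < 0 || index >= kNumLanes) {
      out[i] = kNumLanes;
      continue;
    }
    out[i] = index;
  }
  return out;
}
```

With the mask from `swizzle_out_of_bounds_2` (99, -1, 13, ..., 0) this sketch yields 16, 16, 13, ..., 0 for the non-relaxed case and no result for the relaxed case, which is consistent with the CHECK lines in the test file.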
✅ With the latest revision this PR passed the undef deprecator.
Thanks for the PR, it looks really good to me. I think the code formatter is complaining about undef usages in one of the tests; we can change those to poison. Let's wait for another reviewer and for all the CI to pass to see if we missed anything. I'll hop in and out to enable CI/CD if there are any pushes.
Looking at the LangRef, I noticed:
and
The swizzle optimization implemented here checks
ppenzin left a comment:
The InstCombine guide also asks for an Alive2 proof, I believe. @topperc, you know more about InstCombine than I do, is this the right structure for a new InstCombine pass?
I'm not sure what our position on Claude is, though in this case it is just for the test.
Is there a good guide for writing Alive2 proofs? I'm not sure how to write a "for all arbitrary constants" constraint.
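This is not an Alive2 proof, but for the non-relaxed swizzle there is a cheaper sanity check: each output lane depends on a single mask byte, so 256 cases per lane cover every possible constant. The sketch below assumes the semantics discussed in this thread (the reference swizzle reads the index as an unsigned byte and returns 0 for anything outside 0-15), and all names in it are hypothetical. The cast from `std::uint8_t` to `std::int8_t` is assumed to wrap modulo 256, which is guaranteed from C++20 onward.

```cpp
#include <array>
#include <cstdint>

using Vec = std::array<std::uint8_t, 16>;

// Reference semantics assumed here: unsigned index, out-of-range lanes are 0.
std::uint8_t swizzleLane(const Vec &v, std::uint8_t idx) {
  return idx < 16 ? v[idx] : 0;
}

// Folded form: the byte is read as signed, out-of-range indices are
// redirected to shuffle index 16, which selects lane 0 of a zero vector.
std::uint8_t foldedLane(const Vec &v, std::uint8_t idxByte) {
  int idx = static_cast<std::int8_t>(idxByte);
  int shuffleIdx = (idx < 0 || idx >= 16) ? 16 : idx;
  return shuffleIdx < 16 ? v[shuffleIdx] : 0;
}

// Checks every possible mask byte against the reference for one input vector.
bool foldMatchesReference(const Vec &v) {
  for (int b = 0; b < 256; ++b) {
    auto byte = static_cast<std::uint8_t>(b);
    if (swizzleLane(v, byte) != foldedLane(v, byte))
      return false;
  }
  return true;
}
```

An input vector with sixteen distinct non-zero lanes makes any wrong lane selection visible. This only exercises the model, not the actual InstCombine output, so it complements the regression tests rather than replacing a formal proof.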
lukel97 left a comment:
Is it possible to do this as a DAG combine and avoid adding a new pass?
My understanding is that DAG combine passes run much later in the pipeline, so we'd be losing out on a lot of optimizations if this was a DAG combine.
DAGCombiner has plenty of combines on shuffles so I don't think you'd miss much doing it there. But I'm just noticing now that this isn't actually a new pass, it's just implementing a TTI hook. Can you update the PR description to reflect that? I'm aware that the original comment in X86InstCombineIntrinsic.cpp says it's a pass but that probably needs to be updated.
I've updated the PR description. I also noticed that for most other targets (except AMDGPU and x86), the
I think that would be fine. Both WebAssemblyTargetTransformInfo.cpp and your addition are pretty small, whereas for e.g. X86 they are pretty big.
General LLVM position is that it's fine, but that contributors are responsible for their contributions just as if they wrote it themselves. I think this PR is getting plenty of review so I'm not worried.
I've moved the new TTI hook into I've decided to keep treating the swizzle indices as signed--the code is a lot cleaner that way, it's functionally equivalent since all valid indices are between 0 and 15, and finally I can't actually tell whether the spec treats them as signed or unsigned. I'm still unsure about the following:
      return nullptr;
    }

    if (Index >= NumElts || Index < 0) {
Non-relaxed swizzle treats indices as unsigned, so I don't think you can apply the same logic to both.
Continuing from #169110 (comment)
I've changed the code to use getSExtValue for the relaxed swizzle check, and getZExtValue for everything after that. The outcome should be the same whether we treat them as signed or unsigned, though--the only valid indices are 0-15, which are well in range of a signed 8-bit int.
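The claim that the signedness does not matter for the bounds check can be confirmed by enumerating all 256 byte values. This is a small standalone sketch with hypothetical helper names, not code from the PR; it only compares the two "is this index in 0-15" predicates and assumes the unsigned-to-signed cast wraps modulo 256 (guaranteed from C++20 onward).

```cpp
#include <cstdint>

// In-bounds test with the mask byte read as a signed 8-bit value.
bool inBoundsSigned(std::uint8_t raw) {
  auto s = static_cast<std::int8_t>(raw);
  return s >= 0 && s < 16;
}

// In-bounds test with the mask byte read as an unsigned 8-bit value.
bool inBoundsUnsigned(std::uint8_t raw) { return raw < 16; }

// Counts the byte values on which the two readings disagree.
int countDisagreements() {
  int disagreements = 0;
  for (int b = 0; b < 256; ++b) {
    auto raw = static_cast<std::uint8_t>(b);
    if (inBoundsSigned(raw) != inBoundsUnsigned(raw))
      ++disagreements;
  }
  return disagreements;
}
```

Bytes 128-255 are negative when read as signed and at least 128 when read as unsigned, so both predicates reject them; the signedness only matters for the separate relaxed-swizzle bail-out, which has to tell 16-127 apart from 128-255.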
Resolves #169058.
This adds a TTI hook to the WebAssembly backend that folds i8x16.swizzle and i8x16.relaxed.swizzle operations to shufflevector operations if their mask operands are constant.
This is mainly useful for abstractions over the raw intrinsics--for instance, in architecture-generic SIMD code that may not be able to expose the constant shuffles due to type system limitations.
I took most of this from the x86 backend (in particular, simplifyX86vpermilvar in X86InstCombineIntrinsic), and adapted it for the WebAssembly backend. There wasn't any previous instCombineIntrinsic method on the WebAssemblyTargetTransformInfo, so I added it. Right now, this swizzle optimization is the only one it performs.
As I noted in the transform itself, the "relaxed" swizzle actually has stricter preconditions than the non-relaxed one. If a non-negative but still out-of-bounds index is provided, the "relaxed" swizzle can choose between returning 0 and the lane at the index modulo 16. However, it must make the same choice every time, and we don't know which choice the runtime will make, so we can't constant-fold it.
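To make the relaxed-swizzle restriction concrete, here is a minimal sketch of the two behaviors described above, modeled as two hypothetical runtimes. The function names are made up for illustration, and the treatment of bytes 128-255 as always zero is an assumption taken from the negative-index case that this PR does fold.

```cpp
#include <array>
#include <cstdint>

using Vec = std::array<std::uint8_t, 16>;

// Hypothetical runtime A: every out-of-bounds index yields 0.
std::uint8_t relaxedLaneZeroing(const Vec &v, std::uint8_t idx) {
  return idx < 16 ? v[idx] : 0;
}

// Hypothetical runtime B: indices 16-127 wrap modulo 16, while bytes
// 128-255 (negative when read as signed) still yield 0.
std::uint8_t relaxedLaneWrapping(const Vec &v, std::uint8_t idx) {
  if (idx >= 128)
    return 0;
  return v[idx % 16];
}

// True when both hypothetical runtimes produce the same lane value.
bool runtimesAgree(const Vec &v, std::uint8_t idx) {
  return relaxedLaneZeroing(v, idx) == relaxedLaneWrapping(v, idx);
}
```

For a vector with non-zero lanes, the two models agree for indices 0-15 and 128-255 but differ for 16-127, which is the range where the compiler cannot pick a result on the runtime's behalf.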
The regression tests were mostly generated by Claude and adapted a bit by me (I tried to follow the InstCombine contributor guide). There was previously no WebAssembly subdirectory within the InstCombine tests, so I created that too; as of now, the swizzle fold test is the only file in it. Everything else was written by myself (well, partly copy-pasted from the x86 backend).
I'm not sure how to write an Alive2 test for this; I can't find any examples where the input is an arbitrary constant.