[WebAssembly] Fold constant `i8x16.swizzle` and `i8x16.relaxed.swizzle` to `shufflevector` #169110

valadaptive · 2025-11-21T22:18:36Z

This adds ~~an InstCombine pass~~ a TTI hook to the WebAssembly backend that folds i8x16.swizzle and i8x16.relaxed.swizzle operations to shufflevector operations if their mask operands are constant.

This is mainly useful for abstractions over the raw intrinsics--for instance, in architecture-generic SIMD code that may not be able to expose the constant shuffles due to type system limitations.

I took most of this from the x86 backend (in particular, simplifyX86vpermilvar in X86InstCombineIntrinsic), and adapted it for the WebAssembly backend. There wasn't any previous instCombineIntrinsic method on the WebAssembly TargetTransformInfo, so I added it. Right now, this swizzle optimization is the only one it performs.
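The index mapping at the heart of the fold can be sketched outside LLVM. The following is a minimal standalone model, not the code in this PR: `MaskLane` and `foldSwizzleMask` are hypothetical names, an empty `std::optional` stands in for an undef/poison mask element, and the result is the list of shuffle indices (with -1 for a poison lane and 16 for "lane 0 of an all-zero second operand").

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for one constant mask lane: a known byte, or an
// undef/poison element when empty.
using MaskLane = std::optional<uint8_t>;

// Returns the 16 shuffle indices for a constant swizzle mask, or
// std::nullopt when the fold has to be skipped.
//   -1 marks a poison result lane.
//   16 selects lane 0 of an all-zero second shuffle operand.
std::optional<std::array<int, 16>>
foldSwizzleMask(const std::array<MaskLane, 16> &Mask, bool IsRelaxed) {
  std::array<int, 16> Indexes{};
  for (int I = 0; I < 16; ++I) {
    if (!Mask[I]) {
      Indexes[I] = -1;
      continue;
    }
    // Sign-extend the byte, mirroring the getSExtValue call in the diff.
    int Index = static_cast<int8_t>(*Mask[I]);
    if (Index >= 16 && IsRelaxed)
      return std::nullopt; // implementation-defined lane: do not fold
    Indexes[I] = (Index < 0 || Index >= 16) ? 16 : Index;
  }
  return Indexes;
}
```

In this model, a reversed mask maps straight to indices 15 down to 0, while a mask containing 99 folds for the plain swizzle (index 16) but is rejected when `IsRelaxed` is true.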

As I noted in the transform itself, the "relaxed" swizzle actually has stricter preconditions than the non-relaxed one. If a non-negative but still out-of-bounds index is provided, the "relaxed" swizzle can choose between returning 0 and the lane at the index modulo 16. However, it must make the same choice every time, and we don't know which choice the runtime will make, so we can't constant-fold it.
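To make the difference concrete, here is a small reference model of the two operations as described above. It is a sketch under stated assumptions: `RelaxedChoice` is a hypothetical knob standing in for the host's fixed choice, and lanes whose index is negative as an i8 (128 to 255 unsigned) are taken to be zero for both operations, which is what the tests in this PR expect.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V128 = std::array<uint8_t, 16>;

// Hypothetical knob standing in for the host's fixed relaxed-SIMD choice.
enum class RelaxedChoice { Zero, Modulo };

// i8x16.swizzle: any index outside 0..15 yields a zero lane.
V128 swizzle(const V128 &V, const V128 &Idx) {
  V128 Out{};
  for (int I = 0; I < 16; ++I)
    Out[I] = Idx[I] < 16 ? V[Idx[I]] : 0;
  return Out;
}

// i8x16.relaxed_swizzle: indices 16..127 are implementation-defined
// (zero, or the lane at index % 16); everything else matches swizzle.
V128 relaxedSwizzle(const V128 &V, const V128 &Idx, RelaxedChoice Choice) {
  V128 Out{};
  for (int I = 0; I < 16; ++I) {
    uint8_t J = Idx[I];
    if (J < 16)
      Out[I] = V[J];
    else if (J >= 128)
      Out[I] = 0; // negative as i8: always zero
    else
      Out[I] = Choice == RelaxedChoice::Modulo ? V[J % 16] : 0;
  }
  return Out;
}
```

With an index of 99, the `Zero` choice produces 0 and the `Modulo` choice produces lane 3 of the input, so a compile-time fold would have to guess which one the runtime picks.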

The regression tests were mostly generated by Claude and adapted a bit by me (I tried to follow the InstCombine contributor guide). There was previously no WebAssembly subdirectory within the InstCombine tests, so I created that too; as of now, the swizzle fold test is the only file in it. Everything else was written by me (well, partly copy-pasted from the x86 backend).

I'm not sure how to write an Alive2 test for this; I can't find any examples where the input is an arbitrary constant.

llvmbot · 2025-11-21T22:19:25Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-webassembly

Author: None (valadaptive)

Changes

Resolves #169058.

Full diff: https://github.com/llvm/llvm-project/pull/169110.diff

5 Files Affected:

(modified) llvm/lib/Target/WebAssembly/CMakeLists.txt (+1)
(added) llvm/lib/Target/WebAssembly/WebAssemblyInstCombineIntrinsic.cpp (+107)
(modified) llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h (+2)
(added) llvm/test/Transforms/InstCombine/WebAssembly/fold-swizzle.ll (+116)
(modified) llvm/utils/gn/secondary/llvm/lib/Target/WebAssembly/BUILD.gn (+1)

diff --git a/llvm/lib/Target/WebAssembly/CMakeLists.txt b/llvm/lib/Target/WebAssembly/CMakeLists.txt
index 17df119d62709..13fff96fc6a33 100644
--- a/llvm/lib/Target/WebAssembly/CMakeLists.txt
+++ b/llvm/lib/Target/WebAssembly/CMakeLists.txt
@@ -32,6 +32,7 @@ add_llvm_target(WebAssemblyCodeGen
   WebAssemblyFixIrreducibleControlFlow.cpp
   WebAssemblyFixFunctionBitcasts.cpp
   WebAssemblyFrameLowering.cpp
+  WebAssemblyInstCombineIntrinsic.cpp
   WebAssemblyISelDAGToDAG.cpp
   WebAssemblyISelLowering.cpp
   WebAssemblyInstrInfo.cpp
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyInstCombineIntrinsic.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyInstCombineIntrinsic.cpp
new file mode 100644
index 0000000000000..2fa00b3c5d50d
--- /dev/null
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyInstCombineIntrinsic.cpp
@@ -0,0 +1,107 @@
+//=== WebAssemblyInstCombineIntrinsic.cpp -
+//                                WebAssembly specific InstCombine pass ---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+/// \file
+/// This file implements a TargetTransformInfo analysis pass specific to
+/// WebAssembly. It uses the target's detailed information to provide more
+/// precise answers to certain TTI queries, while letting the target independent
+/// and default TTI implementations handle the rest.
+///
+//===----------------------------------------------------------------------===//
+
+#include "WebAssemblyTargetTransformInfo.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/IntrinsicsWebAssembly.h"
+#include "llvm/Transforms/InstCombine/InstCombiner.h"
+#include <optional>
+
+using namespace llvm;
+using namespace llvm::PatternMatch;
+
+/// Attempt to convert [relaxed_]swizzle to shufflevector if the mask is
+/// constant.
+static Value *simplifyWasmSwizzle(const IntrinsicInst &II,
+                                  InstCombiner::BuilderTy &Builder,
+                                  bool IsRelaxed) {
+  auto *V = dyn_cast<Constant>(II.getArgOperand(1));
+  if (!V)
+    return nullptr;
+
+  auto *VecTy = cast<FixedVectorType>(II.getType());
+  unsigned NumElts = VecTy->getNumElements();
+  assert(NumElts == 16);
+
+  // Construct a shuffle mask from constant integers or UNDEFs.
+  int Indexes[16];
+  bool AnyOutOfBounds = false;
+
+  for (unsigned I = 0; I < NumElts; ++I) {
+    Constant *COp = V->getAggregateElement(I);
+    if (!COp || (!isa<UndefValue>(COp) && !isa<ConstantInt>(COp)))
+      return nullptr;
+
+    if (isa<UndefValue>(COp)) {
+      Indexes[I] = -1;
+      continue;
+    }
+
+    int64_t Index = cast<ConstantInt>(COp)->getSExtValue();
+
+    if (Index >= NumElts && IsRelaxed) {
+      // For lane indices above 15, the relaxed_swizzle operation can choose
+      // between returning 0 or the lane at `Index % 16`. However, the choice
+      // must be made consistently. As the WebAssembly spec states:
+      //
+      // "The result of relaxed operators are implementation-dependent, because
+      // the set of possible results may depend on properties of the host
+      // environment, such as its hardware. Technically, their behaviour is
+      // controlled by a set of global parameters to the semantics that an
+      // implementation can instantiate in different ways. These choices are
+      // fixed, that is, parameters are constant during the execution of any
+      // given program."
+      //
+      // The WebAssembly runtime may choose differently from us, so we can't
+      // optimize a relaxed swizzle with lane indices above 15.
+      return nullptr;
+    }
+
+    if (Index >= NumElts || Index < 0) {
+      AnyOutOfBounds = true;
+      // If there are out-of-bounds indices, the swizzle instruction returns
+      // zeroes in those lanes. We'll provide an all-zeroes vector as the
+      // second argument to shufflevector and read the first element from it.
+      Indexes[I] = NumElts;
+      continue;
+    }
+
+    Indexes[I] = Index;
+  }
+
+  auto *V1 = II.getArgOperand(0);
+  auto *V2 =
+      AnyOutOfBounds ? Constant::getNullValue(VecTy) : PoisonValue::get(VecTy);
+
+  return Builder.CreateShuffleVector(V1, V2, ArrayRef(Indexes, NumElts));
+}
+
+std::optional<Instruction *>
+WebAssemblyTTIImpl::instCombineIntrinsic(InstCombiner &IC,
+                                         IntrinsicInst &II) const {
+  Intrinsic::ID IID = II.getIntrinsicID();
+  switch (IID) {
+  case Intrinsic::wasm_swizzle:
+  case Intrinsic::wasm_relaxed_swizzle:
+    if (Value *V = simplifyWasmSwizzle(
+            II, IC.Builder, IID == Intrinsic::wasm_relaxed_swizzle)) {
+      return IC.replaceInstUsesWith(II, V);
+    }
+    break;
+  }
+
+  return std::nullopt;
+}
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h b/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h
index 4146c0ec6ab07..11f7efc625399 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h
@@ -90,6 +90,8 @@ class WebAssemblyTTIImpl final : public BasicTTIImplBase<WebAssemblyTTIImpl> {
                                      TTI::TargetCostKind CostKind,
                                      unsigned Index, const Value *Op0,
                                      const Value *Op1) const override;
+  std::optional<Instruction *>
+  instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const override;
   InstructionCost getPartialReductionCost(
       unsigned Opcode, Type *InputTypeA, Type *InputTypeB, Type *AccumType,
       ElementCount VF, TTI::PartialReductionExtendKind OpAExtend,
diff --git a/llvm/test/Transforms/InstCombine/WebAssembly/fold-swizzle.ll b/llvm/test/Transforms/InstCombine/WebAssembly/fold-swizzle.ll
new file mode 100644
index 0000000000000..ba251929c3739
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/WebAssembly/fold-swizzle.ll
@@ -0,0 +1,116 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
+; RUN: opt < %s -passes=instcombine -mtriple=wasm32-unknown-unknown -S | FileCheck %s
+
+; swizzle with a constant operand should be optimized to a shufflevector.
+
+declare <16 x i8> @llvm.wasm.swizzle(<16 x i8>, <16 x i8>)
+declare <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8>, <16 x i8>)
+
+; Identity swizzle pattern
+define <16 x i8> @swizzle_identity(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @swizzle_identity(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT:    ret <16 x i8> [[V]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>)
+  ret <16 x i8> %result
+}
+
+; Reverse swizzle pattern
+define <16 x i8> @swizzle_reverse(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @swizzle_reverse(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT:    [[RESULT:%.*]] = shufflevector <16 x i8> [[V]], <16 x i8> poison, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT:    ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> <i8 15, i8 14, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+  ret <16 x i8> %result
+}
+
+; undef elements
+define <16 x i8> @swizzle_with_undef(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @swizzle_with_undef(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT:    [[RESULT:%.*]] = shufflevector <16 x i8> [[V]], <16 x i8> poison, <16 x i32> <i32 0, i32 poison, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT:    ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> <i8 0, i8 undef, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>)
+  ret <16 x i8> %result
+}
+
+; Negative test: non-constant operand
+define <16 x i8> @swizzle_non_constant(<16 x i8> %v, <16 x i8> %mask) {
+; CHECK-LABEL: define <16 x i8> @swizzle_non_constant(
+; CHECK-SAME: <16 x i8> [[V:%.*]], <16 x i8> [[MASK:%.*]]) {
+; CHECK-NEXT:    [[RESULT:%.*]] = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> [[V]], <16 x i8> [[MASK]])
+; CHECK-NEXT:    ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> %mask)
+  ret <16 x i8> %result
+}
+
+; Out-of-bounds index, otherwise identity pattern
+define <16 x i8> @swizzle_out_of_bounds_1(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @swizzle_out_of_bounds_1(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT:    [[RESULT:%.*]] = insertelement <16 x i8> [[V]], i8 0, i64 15
+; CHECK-NEXT:    ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 16>)
+  ret <16 x i8> %result
+}
+
+; Out-of-bounds indices, both negative and positive
+define <16 x i8> @swizzle_out_of_bounds_2(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @swizzle_out_of_bounds_2(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT:    [[RESULT:%.*]] = shufflevector <16 x i8> [[V]], <16 x i8> <i8 0, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison>, <16 x i32> <i32 16, i32 16, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT:    ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> <i8 99, i8 -1, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+  ret <16 x i8> %result
+}
+
+; Identity swizzle pattern (relaxed_swizzle)
+define <16 x i8> @relaxed_swizzle_identity(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @relaxed_swizzle_identity(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT:    ret <16 x i8> [[V]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8> %v, <16 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>)
+  ret <16 x i8> %result
+}
+
+; Reverse swizzle pattern (relaxed_swizzle)
+define <16 x i8> @relaxed_swizzle_reverse(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @relaxed_swizzle_reverse(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT:    [[RESULT:%.*]] = shufflevector <16 x i8> [[V]], <16 x i8> poison, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT:    ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8> %v, <16 x i8> <i8 15, i8 14, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+  ret <16 x i8> %result
+}
+
+; Out-of-bounds index, only negative (relaxed_swizzle)
+define <16 x i8> @relaxed_swizzle_out_of_bounds(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @relaxed_swizzle_out_of_bounds(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT:    [[RESULT:%.*]] = shufflevector <16 x i8> [[V]], <16 x i8> <i8 0, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison>, <16 x i32> <i32 16, i32 16, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT:    ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8> %v, <16 x i8> <i8 -99, i8 -1, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+  ret <16 x i8> %result
+}
+
+; Negative test: out-of-bounds index, both positive and negative (relaxed_swizzle)
+; The choice between different relaxed semantics can only be made at runtime, since it must be consistent.
+define <16 x i8> @relaxed_swizzle_out_of_bounds_positive(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @relaxed_swizzle_out_of_bounds_positive(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT:    [[RESULT:%.*]] = tail call <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8> [[V]], <16 x i8> <i8 99, i8 -1, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+; CHECK-NEXT:    ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8> %v, <16 x i8> <i8 99, i8 -1, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+  ret <16 x i8> %result
+}
diff --git a/llvm/utils/gn/secondary/llvm/lib/Target/WebAssembly/BUILD.gn b/llvm/utils/gn/secondary/llvm/lib/Target/WebAssembly/BUILD.gn
index 11a57fcb008cd..8d976a33ce9db 100644
--- a/llvm/utils/gn/secondary/llvm/lib/Target/WebAssembly/BUILD.gn
+++ b/llvm/utils/gn/secondary/llvm/lib/Target/WebAssembly/BUILD.gn
@@ -54,6 +54,7 @@ static_library("LLVMWebAssemblyCodeGen") {
     "WebAssemblyFixFunctionBitcasts.cpp",
     "WebAssemblyFixIrreducibleControlFlow.cpp",
     "WebAssemblyFrameLowering.cpp",
+    "WebAssemblyInstCombineIntrinsic.cpp",
     "WebAssemblyISelDAGToDAG.cpp",
     "WebAssemblyISelLowering.cpp",
     "WebAssemblyInstrInfo.cpp",

github-actions · 2025-11-22T01:04:12Z

✅ With the latest revision this PR passed the undef deprecator.

badumbatish · 2025-11-22T01:20:42Z

Thanks for the PR, it looks really good to me. I think the code formatter is complaining about undef usages in one of the tests; we can change those to poison.

Let's wait for another reviewer and for all the CI to pass to see if we missed anything. I'll hop in and out to enable CI/CD if there are any pushes.

github-actions · 2025-11-22T01:47:25Z

🐧 Linux x64 Test Results

186450 tests passed
4876 tests skipped

valadaptive · 2025-11-22T07:56:46Z

Looking at the LangRef, I noticed:

> A ‘poison’ value (described in the next section) should be used instead of ‘undef’ whenever possible. Poison values are stronger than undef, and enable more optimizations.

and

> It is correct to replace a poison value with an undef value or any value of the type.

The swizzle optimization implemented here checks isa<UndefValue> for each element, which seems to also work for poison, and sets the corresponding shuffle index to -1 if so. I believe out-of-bounds swizzle elements become poison. A couple questions then:

1. Is isa<UndefValue> the correct thing to check? There are a lot more uses of isa<UndefValue> than isa<PoisonValue> in the codebase, and it's what the original x86 optimization uses.
2. Given that the LangRef says that poison is "stronger than undef", and that it is correct to replace a poison value with undef but not vice versa, is it correct to optimize an undef mask element in the input to a poison element in the output? The LangRef says:

   > A poison element in the mask vector specifies that the resulting element is poison. For backwards-compatibility reasons, LLVM temporarily also accepts undef mask elements, which will be interpreted the same way as poison elements.

   But it's unclear if this means "an undef element specifies that the resulting element is undef", or "an undef element specifies that the resulting element is poison". (EDIT: Whoops, I read the literal next sentence. It's the former.)

Again, this is the same behavior as the existing x86 permute optimization.

ppenzin

The InstCombine guide also asks for an Alive2 proof, I believe. @topperc, you know more about InstCombine than I do; is this the right structure for a new InstCombine pass?

Not sure what our position on Claude is, though in this case it is just for the test.

valadaptive · 2025-11-22T09:04:39Z

> The InstCombine guide also asks for an Alive2 proof, I believe.

Is there a good guide for writing alive2 proofs? I'm not sure how to write a "for all arbitrary constants" constraint.

lukel97

Is it possible to do this as a DAG combine and avoid adding a new pass?

valadaptive · 2025-11-24T12:18:08Z

> Is it possible to do this as a DAG combine and avoid adding a new pass?

My understanding is that DAG combine passes run much later in the pipeline, so we'd be losing out on a lot of optimizations if this was a DAG combine.

lukel97 · 2025-11-24T14:20:21Z

> > Is it possible to do this as a DAG combine and avoid adding a new pass?
>
> My understanding is that DAG combine passes run much later in the pipeline, so we'd be losing out on a lot of optimizations if this was a DAG combine.

DAGCombiner has plenty of combines on shuffles so I don't think you'd miss much doing it there. But I'm just noticing now that this isn't actually a new pass, it's just implementing a TTI hook. Can you update the PR description to reflect that? I'm aware that the original comment in X86InstCombineIntrinsic.cpp says it's a pass but that probably needs updating.

valadaptive · 2025-11-24T14:26:08Z

I've updated the PR description. I also noticed that for most other targets (except AMDGPU and x86), the instCombineIntrinsic hook is not a separate file, and is instead part of [target]TargetTransformInfo.cpp. Should I just move the new hook into WebAssemblyTargetTransformInfo.cpp?

dschuff · 2025-11-24T17:09:53Z

> I've updated the PR description. I also noticed that for most other targets (except AMDGPU and x86), the instCombineIntrinsic hook is not a separate file, and is instead part of [target]TargetTransformInfo.cpp. Should I just move the new hook into WebAssemblyTargetTransformInfo.cpp?

I think that would be fine. Both WebAssemblyTargetTransformInfo.cpp and your addition are pretty small, whereas for e.g. X86 they are pretty big.

dschuff · 2025-11-24T17:32:47Z

> Not sure what our position on Claude is, though in this case it is just for the test.

General LLVM position is that it's fine, but that contributors are responsible for their contributions just as if they wrote it themselves. I think this PR is getting plenty of review so I'm not worried.

valadaptive · 2025-11-24T17:45:25Z

I've moved the new TTI hook into WebAssemblyTargetTransformInfo.cpp.

I've decided to keep treating the swizzle indices as signed--the code is a lot cleaner that way, it's functionally equivalent since all valid indices are between 0 and 15, and finally I can't actually tell whether the spec treats them as signed or unsigned.

I'm still unsure about the following:

- I'm handling undef values in this transform the same way that the x86 version does, but I don't know if that's correct.
- If I can, I want to write an Alive2 proof for this transform, but I can't find any good guides on how to do so. In particular, I don't know if it supports "for all arbitrary constants" constraints.
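Short of an Alive2 proof, the non-relaxed fold can at least be checked by brute force against a model. The sketch below is not Alive2 and proves nothing about LLVM itself: `swizzle`, `shuffleVector`, and `foldedIndices` are hypothetical standalone models, undef/poison mask elements are left out, and the check relies on the assumption that each output lane depends only on its own mask byte, so trying all 256 byte values in each of the 16 lanes covers every per-lane case.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V128 = std::array<uint8_t, 16>;

// Reference model of i8x16.swizzle: out-of-range indices give zero.
V128 swizzle(const V128 &V, const V128 &Idx) {
  V128 Out{};
  for (int I = 0; I < 16; ++I)
    Out[I] = Idx[I] < 16 ? V[Idx[I]] : 0;
  return Out;
}

// Model of a two-input shuffle with fully defined indices in 0..31.
V128 shuffleVector(const V128 &A, const V128 &B,
                   const std::array<int, 16> &Mask) {
  V128 Out{};
  for (int I = 0; I < 16; ++I)
    Out[I] = Mask[I] < 16 ? A[Mask[I]] : B[Mask[I] - 16];
  return Out;
}

// Index mapping of the non-relaxed fold: out-of-range bytes read lane 0 of
// the second (all-zero) operand.
std::array<int, 16> foldedIndices(const V128 &Idx) {
  std::array<int, 16> Out{};
  for (int I = 0; I < 16; ++I)
    Out[I] = Idx[I] < 16 ? Idx[I] : 16;
  return Out;
}

// Tries every byte value in every lane of an otherwise-identity mask.
bool foldMatchesSwizzleForEveryLaneValue() {
  V128 V{};
  for (int I = 0; I < 16; ++I)
    V[I] = static_cast<uint8_t>(0xA0 + I); // nonzero, distinct lanes
  const V128 Zero{};
  for (int Lane = 0; Lane < 16; ++Lane) {
    for (int Byte = 0; Byte < 256; ++Byte) {
      V128 Idx{};
      for (int I = 0; I < 16; ++I)
        Idx[I] = static_cast<uint8_t>(I);
      Idx[Lane] = static_cast<uint8_t>(Byte);
      if (swizzle(V, Idx) != shuffleVector(V, Zero, foldedIndices(Idx)))
        return false;
    }
  }
  return true;
}
```

A check like this only validates the model of the transform, so it complements rather than replaces a proof over the actual IR.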

ppenzin · 2025-11-28T06:29:34Z

llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp

+      return nullptr;
+    }
+
+    if (Index >= NumElts || Index < 0) {


Non-relaxed swizzle treats indices as unsigned, I don't think you can apply same logic to both

valadaptive · 2025-11-28

Continuing from #169110 (comment)

I've changed the code to use getSExtValue for the relaxed swizzle check, and getZExtValue for everything after that. The outcome should be the same whether we treat them as signed or unsigned, though--the only valid indices are 0-15, which are well in range of a signed 8-bit int.

valadaptive added 2 commits November 21, 2025 16:57

[WebAssembly] Add InstCombine test for constant swizzles

9746078

[WebAssembly] Fold constant i8x16.swizzle to shufflevector

ffc2951

llvmbot added the backend:WebAssembly, llvm:instcombine, and llvm:transforms labels Nov 21, 2025

valadaptive mentioned this pull request Nov 21, 2025

wasm: i8x16.swizzle with a constant value should be optimized to i8x16.shuffle #169058

Open

[WebAssembly] Use poison instead of undef for swizzle test

41edf7d

ppenzin reviewed Nov 22, 2025

ppenzin requested a review from topperc November 22, 2025 08:46

lukel97 reviewed Nov 24, 2025

[WebAssembly] Rearrange new TTI hook code

07ed82b

This was referenced Nov 25, 2025

Shuffle/swizzle operations linebender/fearless_simd#29

Open

[AArch64][ARM] Move ARM-specific InstCombine transforms to new module #169589

Open

ppenzin self-requested a review November 26, 2025 01:01

This was referenced Nov 26, 2025

Remove or update the overview? WebAssembly/relaxed-simd#163

Open

[AArch64][ARM] Optimize more tbl/tbx calls into shufflevector #169748

Open

ppenzin reviewed Nov 28, 2025

[WebAssembly] Treat swizzle indices as unsigned more

14300e4

6 participants
