@valadaptive valadaptive commented Nov 21, 2025

Resolves #169058.

This adds a TTI hook to the WebAssembly backend that folds i8x16.swizzle and i8x16.relaxed_swizzle operations to shufflevector operations if their mask operands are constant.

This is mainly useful for abstractions over the raw intrinsics--for instance, in architecture-generic SIMD code that may not be able to expose the constant shuffles due to type system limitations.

I took most of this from the x86 backend (in particular, simplifyX86vpermilvar in X86InstCombineIntrinsic), and adapted it for the WebAssembly backend. There wasn't any previous instCombineIntrinsic method on the WebAssembly TargetTransformInfo, so I added it. Right now, this swizzle optimization is the only one it performs.

As I noted in the transform itself, the "relaxed" swizzle actually has stricter preconditions than the non-relaxed one. If a non-negative but still out-of-bounds index is provided, the "relaxed" swizzle can choose between returning 0 and the lane at the index modulo 16. However, it must make the same choice every time, and we don't know which choice the runtime will make, so we can't constant-fold it.
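
To make the relaxed-swizzle constraint concrete, here is a minimal scalar sketch of the per-lane semantics described above. It is a plain C++ model, not code from the patch: the names `Vec`, `swizzleLane`, `relaxedSwizzleLane`, and `relaxedLaneIsDeterministic` are hypothetical, and the `wrapsModulo` flag is an assumption standing in for whichever fixed choice the host makes. It also assumes that mask bytes with the high bit set (negative when read as signed) always produce 0 in the relaxed form, which matches the negative-index relaxed test in this PR being folded.

```cpp
#include <array>
#include <cstdint>

using Vec = std::array<std::uint8_t, 16>;

// Non-relaxed i8x16.swizzle, one lane: any index outside 0..15 selects zero.
std::uint8_t swizzleLane(const Vec &v, std::uint8_t idx) {
  return idx < 16 ? v[idx] : 0;
}

// Hypothetical model of the relaxed swizzle, one lane. `wrapsModulo` stands in
// for the host's fixed, implementation-defined choice for indices 16..127.
std::uint8_t relaxedSwizzleLane(const Vec &v, std::uint8_t idx,
                                bool wrapsModulo) {
  if (idx < 16)
    return v[idx];
  if (idx >= 128) // negative when read as a signed byte: assumed always zero
    return 0;
  return wrapsModulo ? v[idx % 16] : 0;
}

// True when both permitted host behaviours give the same lane value for this
// particular input vector and index.
bool relaxedLaneIsDeterministic(const Vec &v, std::uint8_t idx) {
  return relaxedSwizzleLane(v, idx, false) == relaxedSwizzleLane(v, idx, true);
}
```

With an input whose lanes are all non-zero, an index such as 99 gives different results under the two host choices, so a compile-time fold would have to guess; indices 0..15 and 128..255 give the same result either way.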

The regression tests were mostly generated by Claude and adapted a bit by me (I tried to follow the InstCombine contributor guide). There was previously no WebAssembly subdirectory within the InstCombine tests, so I created that too; as of now, the swizzle fold test is the only file in it. Everything else was written by myself (well, partly copy-pasted from the x86 backend).

I'm not sure how to write an Alive2 test for this; I can't find any examples where the input is an arbitrary constant.

@llvmbot llvmbot added backend:WebAssembly llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms labels Nov 21, 2025
@llvmbot
Member

llvmbot commented Nov 21, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-webassembly

Author: None (valadaptive)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/169110.diff

5 Files Affected:

  • (modified) llvm/lib/Target/WebAssembly/CMakeLists.txt (+1)
  • (added) llvm/lib/Target/WebAssembly/WebAssemblyInstCombineIntrinsic.cpp (+107)
  • (modified) llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h (+2)
  • (added) llvm/test/Transforms/InstCombine/WebAssembly/fold-swizzle.ll (+116)
  • (modified) llvm/utils/gn/secondary/llvm/lib/Target/WebAssembly/BUILD.gn (+1)
diff --git a/llvm/lib/Target/WebAssembly/CMakeLists.txt b/llvm/lib/Target/WebAssembly/CMakeLists.txt
index 17df119d62709..13fff96fc6a33 100644
--- a/llvm/lib/Target/WebAssembly/CMakeLists.txt
+++ b/llvm/lib/Target/WebAssembly/CMakeLists.txt
@@ -32,6 +32,7 @@ add_llvm_target(WebAssemblyCodeGen
   WebAssemblyFixIrreducibleControlFlow.cpp
   WebAssemblyFixFunctionBitcasts.cpp
   WebAssemblyFrameLowering.cpp
+  WebAssemblyInstCombineIntrinsic.cpp
   WebAssemblyISelDAGToDAG.cpp
   WebAssemblyISelLowering.cpp
   WebAssemblyInstrInfo.cpp
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyInstCombineIntrinsic.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyInstCombineIntrinsic.cpp
new file mode 100644
index 0000000000000..2fa00b3c5d50d
--- /dev/null
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyInstCombineIntrinsic.cpp
@@ -0,0 +1,107 @@
+//=== WebAssemblyInstCombineIntrinsic.cpp -
+//                                WebAssembly specific InstCombine pass ---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+/// \file
+/// This file implements a TargetTransformInfo analysis pass specific to
+/// WebAssembly. It uses the target's detailed information to provide more
+/// precise answers to certain TTI queries, while letting the target independent
+/// and default TTI implementations handle the rest.
+///
+//===----------------------------------------------------------------------===//
+
+#include "WebAssemblyTargetTransformInfo.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/IntrinsicsWebAssembly.h"
+#include "llvm/Transforms/InstCombine/InstCombiner.h"
+#include <optional>
+
+using namespace llvm;
+using namespace llvm::PatternMatch;
+
+/// Attempt to convert [relaxed_]swizzle to shufflevector if the mask is
+/// constant.
+static Value *simplifyWasmSwizzle(const IntrinsicInst &II,
+                                  InstCombiner::BuilderTy &Builder,
+                                  bool IsRelaxed) {
+  auto *V = dyn_cast<Constant>(II.getArgOperand(1));
+  if (!V)
+    return nullptr;
+
+  auto *VecTy = cast<FixedVectorType>(II.getType());
+  unsigned NumElts = VecTy->getNumElements();
+  assert(NumElts == 16);
+
+  // Construct a shuffle mask from constant integers or UNDEFs.
+  int Indexes[16];
+  bool AnyOutOfBounds = false;
+
+  for (unsigned I = 0; I < NumElts; ++I) {
+    Constant *COp = V->getAggregateElement(I);
+    if (!COp || (!isa<UndefValue>(COp) && !isa<ConstantInt>(COp)))
+      return nullptr;
+
+    if (isa<UndefValue>(COp)) {
+      Indexes[I] = -1;
+      continue;
+    }
+
+    int64_t Index = cast<ConstantInt>(COp)->getSExtValue();
+
+    if (Index >= NumElts && IsRelaxed) {
+      // For lane indices above 15, the relaxed_swizzle operation can choose
+      // between returning 0 or the lane at `Index % 16`. However, the choice
+      // must be made consistently. As the WebAssembly spec states:
+      //
+      // "The result of relaxed operators are implementation-dependent, because
+      // the set of possible results may depend on properties of the host
+      // environment, such as its hardware. Technically, their behaviour is
+      // controlled by a set of global parameters to the semantics that an
+      // implementation can instantiate in different ways. These choices are
+      // fixed, that is, parameters are constant during the execution of any
+      // given program."
+      //
+      // The WebAssembly runtime may choose differently from us, so we can't
+      // optimize a relaxed swizzle with lane indices above 15.
+      return nullptr;
+    }
+
+    if (Index >= NumElts || Index < 0) {
+      AnyOutOfBounds = true;
+      // If there are out-of-bounds indices, the swizzle instruction returns
+      // zeroes in those lanes. We'll provide an all-zeroes vector as the
+      // second argument to shufflevector and read the first element from it.
+      Indexes[I] = NumElts;
+      continue;
+    }
+
+    Indexes[I] = Index;
+  }
+
+  auto *V1 = II.getArgOperand(0);
+  auto *V2 =
+      AnyOutOfBounds ? Constant::getNullValue(VecTy) : PoisonValue::get(VecTy);
+
+  return Builder.CreateShuffleVector(V1, V2, ArrayRef(Indexes, NumElts));
+}
+
+std::optional<Instruction *>
+WebAssemblyTTIImpl::instCombineIntrinsic(InstCombiner &IC,
+                                         IntrinsicInst &II) const {
+  Intrinsic::ID IID = II.getIntrinsicID();
+  switch (IID) {
+  case Intrinsic::wasm_swizzle:
+  case Intrinsic::wasm_relaxed_swizzle:
+    if (Value *V = simplifyWasmSwizzle(
+            II, IC.Builder, IID == Intrinsic::wasm_relaxed_swizzle)) {
+      return IC.replaceInstUsesWith(II, V);
+    }
+    break;
+  }
+
+  return std::nullopt;
+}
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h b/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h
index 4146c0ec6ab07..11f7efc625399 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h
@@ -90,6 +90,8 @@ class WebAssemblyTTIImpl final : public BasicTTIImplBase<WebAssemblyTTIImpl> {
                                      TTI::TargetCostKind CostKind,
                                      unsigned Index, const Value *Op0,
                                      const Value *Op1) const override;
+  std::optional<Instruction *>
+  instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const override;
   InstructionCost getPartialReductionCost(
       unsigned Opcode, Type *InputTypeA, Type *InputTypeB, Type *AccumType,
       ElementCount VF, TTI::PartialReductionExtendKind OpAExtend,
diff --git a/llvm/test/Transforms/InstCombine/WebAssembly/fold-swizzle.ll b/llvm/test/Transforms/InstCombine/WebAssembly/fold-swizzle.ll
new file mode 100644
index 0000000000000..ba251929c3739
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/WebAssembly/fold-swizzle.ll
@@ -0,0 +1,116 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
+; RUN: opt < %s -passes=instcombine -mtriple=wasm32-unknown-unknown -S | FileCheck %s
+
+; swizzle with a constant operand should be optimized to a shufflevector.
+
+declare <16 x i8> @llvm.wasm.swizzle(<16 x i8>, <16 x i8>)
+declare <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8>, <16 x i8>)
+
+; Identity swizzle pattern
+define <16 x i8> @swizzle_identity(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @swizzle_identity(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT:    ret <16 x i8> [[V]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>)
+  ret <16 x i8> %result
+}
+
+; Reverse swizzle pattern
+define <16 x i8> @swizzle_reverse(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @swizzle_reverse(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT:    [[RESULT:%.*]] = shufflevector <16 x i8> [[V]], <16 x i8> poison, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT:    ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> <i8 15, i8 14, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+  ret <16 x i8> %result
+}
+
+; undef elements
+define <16 x i8> @swizzle_with_undef(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @swizzle_with_undef(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT:    [[RESULT:%.*]] = shufflevector <16 x i8> [[V]], <16 x i8> poison, <16 x i32> <i32 0, i32 poison, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
+; CHECK-NEXT:    ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> <i8 0, i8 undef, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>)
+  ret <16 x i8> %result
+}
+
+; Negative test: non-constant operand
+define <16 x i8> @swizzle_non_constant(<16 x i8> %v, <16 x i8> %mask) {
+; CHECK-LABEL: define <16 x i8> @swizzle_non_constant(
+; CHECK-SAME: <16 x i8> [[V:%.*]], <16 x i8> [[MASK:%.*]]) {
+; CHECK-NEXT:    [[RESULT:%.*]] = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> [[V]], <16 x i8> [[MASK]])
+; CHECK-NEXT:    ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> %mask)
+  ret <16 x i8> %result
+}
+
+; Out-of-bounds index, otherwise identity pattern
+define <16 x i8> @swizzle_out_of_bounds_1(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @swizzle_out_of_bounds_1(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT:    [[RESULT:%.*]] = insertelement <16 x i8> [[V]], i8 0, i64 15
+; CHECK-NEXT:    ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 16>)
+  ret <16 x i8> %result
+}
+
+; Out-of-bounds indices, both negative and positive
+define <16 x i8> @swizzle_out_of_bounds_2(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @swizzle_out_of_bounds_2(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT:    [[RESULT:%.*]] = shufflevector <16 x i8> [[V]], <16 x i8> <i8 0, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison>, <16 x i32> <i32 16, i32 16, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT:    ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %v, <16 x i8> <i8 99, i8 -1, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+  ret <16 x i8> %result
+}
+
+; Identity swizzle pattern (relaxed_swizzle)
+define <16 x i8> @relaxed_swizzle_identity(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @relaxed_swizzle_identity(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT:    ret <16 x i8> [[V]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8> %v, <16 x i8> <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>)
+  ret <16 x i8> %result
+}
+
+; Reverse swizzle pattern (relaxed_swizzle)
+define <16 x i8> @relaxed_swizzle_reverse(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @relaxed_swizzle_reverse(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT:    [[RESULT:%.*]] = shufflevector <16 x i8> [[V]], <16 x i8> poison, <16 x i32> <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT:    ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8> %v, <16 x i8> <i8 15, i8 14, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+  ret <16 x i8> %result
+}
+
+; Out-of-bounds index, only negative (relaxed_swizzle)
+define <16 x i8> @relaxed_swizzle_out_of_bounds(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @relaxed_swizzle_out_of_bounds(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT:    [[RESULT:%.*]] = shufflevector <16 x i8> [[V]], <16 x i8> <i8 0, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison, i8 poison>, <16 x i32> <i32 16, i32 16, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 0>
+; CHECK-NEXT:    ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8> %v, <16 x i8> <i8 -99, i8 -1, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+  ret <16 x i8> %result
+}
+
+; Negative test: out-of-bounds index, both positive and negative (relaxed_swizzle)
+; The choice between different relaxed semantics can only be made at runtime, since it must be consistent.
+define <16 x i8> @relaxed_swizzle_out_of_bounds_positive(<16 x i8> %v) {
+; CHECK-LABEL: define <16 x i8> @relaxed_swizzle_out_of_bounds_positive(
+; CHECK-SAME: <16 x i8> [[V:%.*]]) {
+; CHECK-NEXT:    [[RESULT:%.*]] = tail call <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8> [[V]], <16 x i8> <i8 99, i8 -1, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+; CHECK-NEXT:    ret <16 x i8> [[RESULT]]
+;
+  %result = tail call <16 x i8> @llvm.wasm.relaxed.swizzle(<16 x i8> %v, <16 x i8> <i8 99, i8 -1, i8 13, i8 12, i8 11, i8 10, i8 9, i8 8, i8 7, i8 6, i8 5, i8 4, i8 3, i8 2, i8 1, i8 0>)
+  ret <16 x i8> %result
+}
diff --git a/llvm/utils/gn/secondary/llvm/lib/Target/WebAssembly/BUILD.gn b/llvm/utils/gn/secondary/llvm/lib/Target/WebAssembly/BUILD.gn
index 11a57fcb008cd..8d976a33ce9db 100644
--- a/llvm/utils/gn/secondary/llvm/lib/Target/WebAssembly/BUILD.gn
+++ b/llvm/utils/gn/secondary/llvm/lib/Target/WebAssembly/BUILD.gn
@@ -54,6 +54,7 @@ static_library("LLVMWebAssemblyCodeGen") {
     "WebAssemblyFixFunctionBitcasts.cpp",
     "WebAssemblyFixIrreducibleControlFlow.cpp",
     "WebAssemblyFrameLowering.cpp",
+    "WebAssemblyInstCombineIntrinsic.cpp",
     "WebAssemblyISelDAGToDAG.cpp",
     "WebAssemblyISelLowering.cpp",
     "WebAssemblyInstrInfo.cpp",

@github-actions

github-actions bot commented Nov 22, 2025

✅ With the latest revision this PR passed the undef deprecator.

@badumbatish
Contributor

badumbatish commented Nov 22, 2025

Thanks for the PR, it looks really good to me. I think the code formatter is complaining about undef usages in one of the tests; we can change those to poison.

Let's wait for another reviewer and for all the CI to pass to see if we missed anything. I'll hop in and out to enable CI/CD if there are any pushes.

@github-actions

github-actions bot commented Nov 22, 2025

🐧 Linux x64 Test Results

  • 186450 tests passed
  • 4876 tests skipped

@valadaptive
Author

valadaptive commented Nov 22, 2025

Looking at the LangRef, I noticed:

> A ‘poison’ value (described in the next section) should be used instead of ‘undef’ whenever possible. Poison values are stronger than undef, and enable more optimizations.

and

> It is correct to replace a poison value with an undef value or any value of the type.

The swizzle optimization implemented here checks isa<UndefValue> for each element, which seems to also work for poison, and sets the corresponding shuffle index to -1 if so. I believe out-of-bounds swizzle elements become poison. A couple questions then:

  • Is isa<UndefValue> the correct thing to check? There are a lot more uses of isa<UndefValue> than isa<PoisonValue> in the codebase, and it's what the original x86 optimization uses.

  • Given that the LangRef says that poison is "stronger than undef", and that it is correct to replace a poison value with undef but not vice versa, is it correct to optimize an undef mask element in the input to a poison element in the output? The LangRef says:

    > A poison element in the mask vector specifies that the resulting element is poison. For backwards-compatibility reasons, LLVM temporarily also accepts undef mask elements, which will be interpreted the same way as poison elements.

    But it's unclear if this means "an undef element specifies that the resulting element is undef", or "an undef element specifies that the resulting element is poison". (EDIT: Whoops, I read the literal next sentence. It's the former.)

    Again, this is the same behavior as the existing x86 permute optimization.

Contributor

@ppenzin ppenzin left a comment

The InstCombine guide also asks for an Alive2 proof, I believe. @topperc, you know more about InstCombine than I do; is this the right structure for a new InstCombine pass?

Not sure what our position on Claude is, though in this case it is just for the test.

@ppenzin ppenzin requested a review from topperc November 22, 2025 08:46
@valadaptive
Author

> The InstCombine guide also asks for an Alive2 proof, I believe.

Is there a good guide for writing alive2 proofs? I'm not sure how to write a "for all arbitrary constants" constraint.

Contributor

@lukel97 lukel97 left a comment

Is it possible to do this as a DAG combine and avoid adding a new pass?

@valadaptive
Author

> Is it possible to do this as a DAG combine and avoid adding a new pass?

My understanding is that DAG combine passes run much later in the pipeline, so we'd be losing out on a lot of optimizations if this was a DAG combine.

@lukel97
Contributor

lukel97 commented Nov 24, 2025

> > Is it possible to do this as a DAG combine and avoid adding a new pass?
>
> My understanding is that DAG combine passes run much later in the pipeline, so we'd be losing out on a lot of optimizations if this was a DAG combine.

DAGCombiner has plenty of combines on shuffles, so I don't think you'd miss much doing it there. But I'm just noticing now that this isn't actually a new pass; it's just implementing a TTI hook. Can you update the PR description to reflect that? I'm aware that the original comment in X86InstCombineIntrinsic.cpp says it's a pass, but that probably needs to be updated.

@valadaptive
Author

I've updated the PR description. I also noticed that for most other targets (except AMDGPU and x86), the instCombineIntrinsic hook is not a separate file, and is instead part of [target]TargetTransformInfo.cpp. Should I just move the new hook into WebAssemblyTargetTransformInfo.cpp?

@dschuff
Member

dschuff commented Nov 24, 2025

> I've updated the PR description. I also noticed that for most other targets (except AMDGPU and x86), the instCombineIntrinsic hook is not a separate file, and is instead part of [target]TargetTransformInfo.cpp. Should I just move the new hook into WebAssemblyTargetTransformInfo.cpp?

I think that would be fine. Both WebAssemblyTargetTransformInfo.cpp and your addition are pretty small, whereas for e.g. X86 they are pretty big.

@dschuff
Member

dschuff commented Nov 24, 2025

> Not sure what our position on Claude is, though in this case it is just for the test.

General LLVM position is that it's fine, but that contributors are responsible for their contributions just as if they wrote it themselves. I think this PR is getting plenty of review so I'm not worried.

@valadaptive
Author

I've moved the new TTI hook into WebAssemblyTargetTransformInfo.cpp.

I've decided to keep treating the swizzle indices as signed--the code is a lot cleaner that way, it's functionally equivalent since all valid indices are between 0 and 15, and finally I can't actually tell whether the spec treats them as signed or unsigned.

I'm still unsure about the following:

  • I'm handling undef values in this transform the same way that the x86 version does, but I don't know if that's correct.

  • If I can, I want to write an Alive2 proof for this transform, but I can't find any good guides on how to do so. In particular, I don't know if it supports "for all arbitrary constants" constraints.
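
Short of an Alive2 proof, the per-lane mapping used by the non-relaxed fold can at least be checked by brute force over all 256 possible mask bytes. The sketch below is a scalar C++ model under stated assumptions, not part of the patch and not a substitute for a formal proof: `foldedShuffleIndex` mirrors the index selection in the diff (in-range indices kept, everything else redirected to element 0 of an all-zero second operand), and `shuffleLane`, `swizzleLane`, and `foldMatchesSwizzle` are hypothetical helper names.

```cpp
#include <array>
#include <cstdint>

using Vec = std::array<std::uint8_t, 16>;

// Reference semantics of the non-relaxed swizzle for one lane.
std::uint8_t swizzleLane(const Vec &v, std::uint8_t idx) {
  return idx < 16 ? v[idx] : 0;
}

// Shuffle index chosen for a constant mask byte: in-range indices are kept,
// anything else points at element 0 of the second (all-zero) operand.
int foldedShuffleIndex(std::int8_t maskByte) {
  int idx = maskByte; // sign-extended, like getSExtValue() in the patch
  return (idx < 0 || idx >= 16) ? 16 : idx;
}

// shufflevector semantics for one lane: 0..15 read v1, 16..31 read v2.
std::uint8_t shuffleLane(const Vec &v1, const Vec &v2, int shuffleIdx) {
  return shuffleIdx < 16 ? v1[shuffleIdx] : v2[shuffleIdx - 16];
}

// Brute-force comparison over every possible mask byte for one input vector.
bool foldMatchesSwizzle(const Vec &v) {
  const Vec zeros{};
  for (int raw = 0; raw < 256; ++raw) {
    auto byte = static_cast<std::uint8_t>(raw);
    auto asSigned = static_cast<std::int8_t>(byte);
    if (shuffleLane(v, zeros, foldedShuffleIndex(asSigned)) !=
        swizzleLane(v, byte))
      return false;
  }
  return true;
}
```

This only covers the lane-by-lane index arithmetic for concrete inputs; it says nothing about undef or poison mask elements, which is where a tool like Alive2 would add the most value.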

return nullptr;
}

if (Index >= NumElts || Index < 0) {
Contributor

The non-relaxed swizzle treats indices as unsigned; I don't think you can apply the same logic to both.

Continuing from #169110 (comment)

Author

I've changed the code to use getSExtValue for the relaxed swizzle check, and getZExtValue for everything after that. The outcome should be the same whether we treat them as signed or unsigned, though--the only valid indices are 0-15, which are well in range of a signed 8-bit int.
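
The equivalence claimed here is small enough to check exhaustively. The following is a minimal sketch in plain C++, with hypothetical helper names and no LLVM types; the casts stand in for reading the same mask byte via a sign-extending or zero-extending accessor.

```cpp
#include <cstdint>

// Out-of-range test when the mask byte is read as signed.
bool outOfRangeSigned(std::uint8_t raw) {
  int idx = static_cast<std::int8_t>(raw);
  return idx < 0 || idx >= 16;
}

// Out-of-range test when the mask byte is read as unsigned.
bool outOfRangeUnsigned(std::uint8_t raw) {
  unsigned idx = raw;
  return idx >= 16;
}

// Bytes for which the relaxed swizzle is implementation-defined: positive and
// out of range under the signed reading, i.e. 16..127.
bool relaxedAmbiguous(std::uint8_t raw) {
  int idx = static_cast<std::int8_t>(raw);
  return idx >= 16;
}

// Both readings classify every one of the 256 byte values identically.
bool interpretationsAgree() {
  for (int raw = 0; raw < 256; ++raw) {
    auto byte = static_cast<std::uint8_t>(raw);
    if (outOfRangeSigned(byte) != outOfRangeUnsigned(byte))
      return false;
  }
  return true;
}
```

The signed reading is still the convenient one for the relaxed check, because it separates 16..127 (ambiguous) from 128..255 (negative, always zero) with a single comparison.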

Development

Successfully merging this pull request may close these issues.

wasm: i8x16.swizzle with a constant value should be optimized to i8x16.shuffle
