Skip to content

Conversation

@valadaptive
Copy link

Back when TargetTransformInfo::instCombineIntrinsic was added in https://reviews.llvm.org/D81728, several transforms common to both ARM and AArch64 were kept in the non-target-specific InstCombineCalls.cpp so they could be shared between the two targets.

I want to extend the transform of the tbl intrinsics into static shufflevectors in a similar manner to #169110 (right now it only works with a 64-bit tbl1, but shufflevector should allow it to work with up to 2 operands, and it can definitely work with 128-bit vectors). I think separating out the transform into a TTI hook is a prerequisite.

I'm not happy about creating an entirely new module for this and having to wire it up through CMake and everything, but I'm not sure about the alternatives. If any maintainers can think of a cleaner way of doing this, I'm very open to it.

@valadaptive valadaptive requested a review from nikic as a code owner November 26, 2025 00:56
@github-actions
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added backend:ARM backend:AArch64 llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms labels Nov 26, 2025
@llvmbot
Copy link
Member

llvmbot commented Nov 26, 2025

@llvm/pr-subscribers-backend-arm
@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-transforms

Author: None (valadaptive)

Changes

Back when TargetTransformInfo::instCombineIntrinsic was added in https://reviews.llvm.org/D81728, several transforms common to both ARM and AArch64 were kept in the non-target-specific InstCombineCalls.cpp so they could be shared between the two targets.

I want to extend the transform of the tbl intrinsics into static shufflevectors in a similar manner to #169110 (right now it only works with a 64-bit tbl1, but shufflevector should allow it to work with up to 2 operands, and it can definitely work with 128-bit vectors). I think separating out the transform into a TTI hook is a prerequisite.

I'm not happy about creating an entirely new module for this and having to wire it up through CMake and everything, but I'm not sure about the alternatives. If any maintainers can think of a cleaner way of doing this, I'm very open to it.


Full diff: https://github.com/llvm/llvm-project/pull/169589.diff

13 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp (+13)
  • (modified) llvm/lib/Target/AArch64/CMakeLists.txt (+1)
  • (modified) llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp (+14)
  • (modified) llvm/lib/Target/ARM/CMakeLists.txt (+1)
  • (added) llvm/lib/Target/ARMCommon/ARMCommonInstCombineIntrinsic.cpp (+136)
  • (added) llvm/lib/Target/ARMCommon/ARMCommonInstCombineIntrinsic.h (+57)
  • (added) llvm/lib/Target/ARMCommon/CMakeLists.txt (+8)
  • (modified) llvm/lib/Target/CMakeLists.txt (+5)
  • (modified) llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp (-104)
  • (modified) llvm/test/Transforms/InstCombine/AArch64/aes-intrinsics.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/ARM/2012-04-23-Neon-Intrinsics.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/ARM/aes-intrinsics.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/ARM/tbl1.ll (+1-1)
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 0bae00bafee3c..8acad13c494d4 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -7,6 +7,7 @@
 //===----------------------------------------------------------------------===//
 
 #include "AArch64TargetTransformInfo.h"
+#include "../ARMCommon/ARMCommonInstCombineIntrinsic.h"
 #include "AArch64ExpandImm.h"
 #include "AArch64PerfectShuffle.h"
 #include "AArch64SMEAttributes.h"
@@ -2856,6 +2857,18 @@ AArch64TTIImpl::instCombineIntrinsic(InstCombiner &IC,
   case Intrinsic::aarch64_neon_fmaxnm:
   case Intrinsic::aarch64_neon_fminnm:
     return instCombineMaxMinNM(IC, II);
+  case Intrinsic::aarch64_neon_tbl1:
+    return ARMCommon::simplifyNeonTbl1(II, IC);
+  case Intrinsic::aarch64_neon_smull:
+  case Intrinsic::aarch64_neon_umull: {
+    bool IsSigned = IID == Intrinsic::aarch64_neon_smull;
+    return ARMCommon::simplifyNeonMultiply(II, IC, IsSigned);
+  }
+  case Intrinsic::aarch64_crypto_aesd:
+  case Intrinsic::aarch64_crypto_aese:
+  case Intrinsic::aarch64_sve_aesd:
+  case Intrinsic::aarch64_sve_aese:
+    return ARMCommon::simplifyAES(II, IC);
   case Intrinsic::aarch64_sve_convert_from_svbool:
     return instCombineConvertFromSVBool(IC, II);
   case Intrinsic::aarch64_sve_dup:
diff --git a/llvm/lib/Target/AArch64/CMakeLists.txt b/llvm/lib/Target/AArch64/CMakeLists.txt
index 285d646293eb7..d27a698ee9e4a 100644
--- a/llvm/lib/Target/AArch64/CMakeLists.txt
+++ b/llvm/lib/Target/AArch64/CMakeLists.txt
@@ -101,6 +101,7 @@ add_llvm_target(AArch64CodeGen
   AArch64Desc
   AArch64Info
   AArch64Utils
+  ARMCommon
   Analysis
   AsmPrinter
   CFGuard
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
index fdb0ec40cb41f..3dd3ae65321f8 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
@@ -7,6 +7,7 @@
 //===----------------------------------------------------------------------===//
 
 #include "ARMTargetTransformInfo.h"
+#include "../ARMCommon/ARMCommonInstCombineIntrinsic.h"
 #include "ARMSubtarget.h"
 #include "MCTargetDesc/ARMAddressingModes.h"
 #include "llvm/ADT/APInt.h"
@@ -182,6 +183,19 @@ ARMTTIImpl::instCombineIntrinsic(InstCombiner &IC, IntrinsicInst &II) const {
     break;
   }
 
+  case Intrinsic::arm_neon_vtbl1:
+    return ARMCommon::simplifyNeonTbl1(II, IC);
+
+  case Intrinsic::arm_neon_vmulls:
+  case Intrinsic::arm_neon_vmullu: {
+    bool IsSigned = IID == Intrinsic::arm_neon_vmulls;
+    return ARMCommon::simplifyNeonMultiply(II, IC, IsSigned);
+  }
+
+  case Intrinsic::arm_neon_aesd:
+  case Intrinsic::arm_neon_aese:
+    return ARMCommon::simplifyAES(II, IC);
+
   case Intrinsic::arm_mve_pred_i2v: {
     Value *Arg = II.getArgOperand(0);
     Value *ArgArg;
diff --git a/llvm/lib/Target/ARM/CMakeLists.txt b/llvm/lib/Target/ARM/CMakeLists.txt
index eb3ad01a54fb2..9fc9bc134e5cc 100644
--- a/llvm/lib/Target/ARM/CMakeLists.txt
+++ b/llvm/lib/Target/ARM/CMakeLists.txt
@@ -73,6 +73,7 @@ add_llvm_target(ARMCodeGen
   Thumb2SizeReduction.cpp
 
   LINK_COMPONENTS
+  ARMCommon
   ARMDesc
   ARMInfo
   ARMUtils
diff --git a/llvm/lib/Target/ARMCommon/ARMCommonInstCombineIntrinsic.cpp b/llvm/lib/Target/ARMCommon/ARMCommonInstCombineIntrinsic.cpp
new file mode 100644
index 0000000000000..6e5711a0272e3
--- /dev/null
+++ b/llvm/lib/Target/ARMCommon/ARMCommonInstCombineIntrinsic.cpp
@@ -0,0 +1,136 @@
+//===- ARMCommonInstCombineIntrinsic.cpp -
+//                  instCombineIntrinsic opts for both ARM and AArch64  ---===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file contains optimizations for ARM and AArch64 intrinsics that
+/// are shared between both architectures. These functions can be called from:
+/// - ARM TTI's instCombineIntrinsic (for arm_neon_* intrinsics)
+/// - AArch64 TTI's instCombineIntrinsic (for aarch64_neon_* and aarch64_sve_*
+///   intrinsics)
+///
+//===----------------------------------------------------------------------===//
+
+#include "ARMCommonInstCombineIntrinsic.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Value.h"
+#include "llvm/Transforms/InstCombine/InstCombiner.h"
+
+using namespace llvm;
+using namespace llvm::PatternMatch;
+
+namespace llvm {
+namespace ARMCommon {
+
+/// Convert a table lookup to shufflevector if the mask is constant.
+/// This could benefit tbl1 if the mask is { 7,6,5,4,3,2,1,0 }, in
+/// which case we could lower the shufflevector with rev64 instructions
+/// as it's actually a byte reverse.
+Instruction *simplifyNeonTbl1(IntrinsicInst &II, InstCombiner &IC) {
+  // Bail out if the mask is not a constant.
+  auto *C = dyn_cast<Constant>(II.getArgOperand(1));
+  if (!C)
+    return nullptr;
+
+  auto *VecTy = cast<FixedVectorType>(II.getType());
+  unsigned NumElts = VecTy->getNumElements();
+
+  // Only perform this transformation for <8 x i8> vector types.
+  if (!VecTy->getElementType()->isIntegerTy(8) || NumElts != 8)
+    return nullptr;
+
+  int Indexes[8];
+
+  for (unsigned I = 0; I < NumElts; ++I) {
+    Constant *COp = C->getAggregateElement(I);
+
+    if (!COp || !isa<ConstantInt>(COp))
+      return nullptr;
+
+    Indexes[I] = cast<ConstantInt>(COp)->getLimitedValue();
+
+    // Make sure the mask indices are in range.
+    if ((unsigned)Indexes[I] >= NumElts)
+      return nullptr;
+  }
+
+  auto *V1 = II.getArgOperand(0);
+  auto *V2 = Constant::getNullValue(V1->getType());
+  Value *Shuf = IC.Builder.CreateShuffleVector(V1, V2, ArrayRef(Indexes));
+  return IC.replaceInstUsesWith(II, Shuf);
+}
+
+/// Simplify NEON multiply-long intrinsics (smull, umull).
+/// These intrinsics perform widening multiplies: they multiply two vectors of
+/// narrow integers and produce a vector of wider integers. This function
+/// performs algebraic simplifications:
+/// 1. Multiply by zero => zero vector
+/// 2. Multiply by one => zero/sign-extend the non-one operand
+/// 3. Both operands constant => regular multiply that can be constant-folded
+///    later
+Instruction *simplifyNeonMultiply(IntrinsicInst &II, InstCombiner &IC,
+                                  bool IsSigned) {
+  Value *Arg0 = II.getArgOperand(0);
+  Value *Arg1 = II.getArgOperand(1);
+
+  // Handle mul by zero first:
+  if (isa<ConstantAggregateZero>(Arg0) || isa<ConstantAggregateZero>(Arg1)) {
+    return IC.replaceInstUsesWith(II, ConstantAggregateZero::get(II.getType()));
+  }
+
+  // Check for constant LHS & RHS - in this case we just simplify.
+  VectorType *NewVT = cast<VectorType>(II.getType());
+  if (Constant *CV0 = dyn_cast<Constant>(Arg0)) {
+    if (Constant *CV1 = dyn_cast<Constant>(Arg1)) {
+      Value *V0 = IC.Builder.CreateIntCast(CV0, NewVT, IsSigned);
+      Value *V1 = IC.Builder.CreateIntCast(CV1, NewVT, IsSigned);
+      return IC.replaceInstUsesWith(II, IC.Builder.CreateMul(V0, V1));
+    }
+
+    // Couldn't simplify - canonicalize constant to the RHS.
+    std::swap(Arg0, Arg1);
+  }
+
+  // Handle mul by one:
+  if (Constant *CV1 = dyn_cast<Constant>(Arg1))
+    if (ConstantInt *Splat =
+            dyn_cast_or_null<ConstantInt>(CV1->getSplatValue()))
+      if (Splat->isOne())
+        return CastInst::CreateIntegerCast(Arg0, II.getType(), IsSigned);
+
+  return nullptr;
+}
+
+/// Simplify AES encryption/decryption intrinsics (AESE, AESD).
+///
+/// ARM's AES instructions (AESE/AESD) XOR the data and the key, provided as
+/// separate arguments, before performing the encryption/decryption operation.
+/// We can fold that "internal" XOR with a previous one.
+Instruction *simplifyAES(IntrinsicInst &II, InstCombiner &IC) {
+  Value *DataArg = II.getArgOperand(0);
+  Value *KeyArg = II.getArgOperand(1);
+
+  // Accept zero on either operand.
+  if (!match(KeyArg, m_ZeroInt()))
+    std::swap(KeyArg, DataArg);
+
+  // Try to use the builtin XOR in AESE and AESD to eliminate a prior XOR
+  Value *Data, *Key;
+  if (match(KeyArg, m_ZeroInt()) &&
+      match(DataArg, m_Xor(m_Value(Data), m_Value(Key)))) {
+    IC.replaceOperand(II, 0, Data);
+    IC.replaceOperand(II, 1, Key);
+    return &II;
+  }
+
+  return nullptr;
+}
+
+} // namespace ARMCommon
+} // namespace llvm
diff --git a/llvm/lib/Target/ARMCommon/ARMCommonInstCombineIntrinsic.h b/llvm/lib/Target/ARMCommon/ARMCommonInstCombineIntrinsic.h
new file mode 100644
index 0000000000000..541fb6a57f558
--- /dev/null
+++ b/llvm/lib/Target/ARMCommon/ARMCommonInstCombineIntrinsic.h
@@ -0,0 +1,57 @@
+//===- ARMCommonInstCombineIntrinsic.h -
+// instCombineIntrinsic opts for both ARM and AArch64 -----------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file contains optimizations for ARM and AArch64 intrinsics that
+/// are shared between both architectures. These functions can be called from:
+/// - ARM TTI's instCombineIntrinsic (for arm_neon_* intrinsics)
+/// - AArch64 TTI's instCombineIntrinsic (for aarch64_neon_* and aarch64_sve_*
+///   intrinsics)
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIB_TARGET_ARMCOMMON_ARMCOMMONINSTCOMBINEINTRINSIC_H
+#define LLVM_LIB_TARGET_ARMCOMMON_ARMCOMMONINSTCOMBINEINTRINSIC_H
+
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Value.h"
+#include "llvm/Transforms/InstCombine/InstCombiner.h"
+
+namespace llvm {
+
+namespace ARMCommon {
+
+/// Convert a table lookup to shufflevector if the mask is constant.
+/// This could benefit tbl1 if the mask is { 7,6,5,4,3,2,1,0 }, in
+/// which case we could lower the shufflevector with rev64 instructions
+/// as it's actually a byte reverse.
+Instruction *simplifyNeonTbl1(IntrinsicInst &II, InstCombiner &IC);
+
+/// Simplify NEON multiply-long intrinsics (smull, umull).
+/// These intrinsics perform widening multiplies: they multiply two vectors of
+/// narrow integers and produce a vector of wider integers. This function
+/// performs algebraic simplifications:
+/// 1. Multiply by zero => zero vector
+/// 2. Multiply by one => zero/sign-extend the non-one operand
+/// 3. Both operands constant => regular multiply that can be constant-folded
+///    later
+Instruction *simplifyNeonMultiply(IntrinsicInst &II, InstCombiner &IC,
+                                  bool IsSigned);
+
+/// Simplify AES encryption/decryption intrinsics (AESE, AESD).
+///
+/// ARM's AES instructions (AESE/AESD) XOR the data and the key, provided as
+/// separate arguments, before performing the encryption/decryption operation.
+/// We can fold that "internal" XOR with a previous one.
+Instruction *simplifyAES(IntrinsicInst &II, InstCombiner &IC);
+
+} // namespace ARMCommon
+} // namespace llvm
+
+#endif // LLVM_LIB_TARGET_ARMCOMMON_ARMCOMMONINSTCOMBINEINTRINSIC_H
diff --git a/llvm/lib/Target/ARMCommon/CMakeLists.txt b/llvm/lib/Target/ARMCommon/CMakeLists.txt
new file mode 100644
index 0000000000000..1805a5df2f053
--- /dev/null
+++ b/llvm/lib/Target/ARMCommon/CMakeLists.txt
@@ -0,0 +1,8 @@
+add_llvm_component_library(LLVMARMCommon
+  ARMCommonInstCombineIntrinsic.cpp
+
+  LINK_COMPONENTS
+  Core
+  Support
+  TransformUtils
+  )
diff --git a/llvm/lib/Target/CMakeLists.txt b/llvm/lib/Target/CMakeLists.txt
index bcc13f942bf96..e3528014a4be2 100644
--- a/llvm/lib/Target/CMakeLists.txt
+++ b/llvm/lib/Target/CMakeLists.txt
@@ -31,6 +31,11 @@ if (NOT BUILD_SHARED_LIBS AND NOT APPLE AND
   set(CMAKE_CXX_VISIBILITY_PRESET hidden)
 endif()
 
+# Add shared ARM/AArch64 utilities if either target is being built
+if("ARM" IN_LIST LLVM_TARGETS_TO_BUILD OR "AArch64" IN_LIST LLVM_TARGETS_TO_BUILD)
+  add_subdirectory(ARMCommon)
+endif()
+
 foreach(t ${LLVM_TARGETS_TO_BUILD})
   message(STATUS "Targeting ${t}")
   add_subdirectory(${t})
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
index 8e4edefec42fd..8a54c0dde6be6 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -737,44 +737,6 @@ static Instruction *foldCtpop(IntrinsicInst &II, InstCombinerImpl &IC) {
   return nullptr;
 }
 
-/// Convert a table lookup to shufflevector if the mask is constant.
-/// This could benefit tbl1 if the mask is { 7,6,5,4,3,2,1,0 }, in
-/// which case we could lower the shufflevector with rev64 instructions
-/// as it's actually a byte reverse.
-static Value *simplifyNeonTbl1(const IntrinsicInst &II,
-                               InstCombiner::BuilderTy &Builder) {
-  // Bail out if the mask is not a constant.
-  auto *C = dyn_cast<Constant>(II.getArgOperand(1));
-  if (!C)
-    return nullptr;
-
-  auto *VecTy = cast<FixedVectorType>(II.getType());
-  unsigned NumElts = VecTy->getNumElements();
-
-  // Only perform this transformation for <8 x i8> vector types.
-  if (!VecTy->getElementType()->isIntegerTy(8) || NumElts != 8)
-    return nullptr;
-
-  int Indexes[8];
-
-  for (unsigned I = 0; I < NumElts; ++I) {
-    Constant *COp = C->getAggregateElement(I);
-
-    if (!COp || !isa<ConstantInt>(COp))
-      return nullptr;
-
-    Indexes[I] = cast<ConstantInt>(COp)->getLimitedValue();
-
-    // Make sure the mask indices are in range.
-    if ((unsigned)Indexes[I] >= NumElts)
-      return nullptr;
-  }
-
-  auto *V1 = II.getArgOperand(0);
-  auto *V2 = Constant::getNullValue(V1->getType());
-  return Builder.CreateShuffleVector(V1, V2, ArrayRef(Indexes));
-}
-
 // Returns true iff the 2 intrinsics have the same operands, limiting the
 // comparison to the first NumOperands.
 static bool haveSameOperands(const IntrinsicInst &I, const IntrinsicInst &E,
@@ -3155,72 +3117,6 @@ Instruction *InstCombinerImpl::visitCallInst(CallInst &CI) {
         Intrinsic::getOrInsertDeclaration(II->getModule(), NewIntrin);
     return CallInst::Create(NewFn, CallArgs);
   }
-  case Intrinsic::arm_neon_vtbl1:
-  case Intrinsic::aarch64_neon_tbl1:
-    if (Value *V = simplifyNeonTbl1(*II, Builder))
-      return replaceInstUsesWith(*II, V);
-    break;
-
-  case Intrinsic::arm_neon_vmulls:
-  case Intrinsic::arm_neon_vmullu:
-  case Intrinsic::aarch64_neon_smull:
-  case Intrinsic::aarch64_neon_umull: {
-    Value *Arg0 = II->getArgOperand(0);
-    Value *Arg1 = II->getArgOperand(1);
-
-    // Handle mul by zero first:
-    if (isa<ConstantAggregateZero>(Arg0) || isa<ConstantAggregateZero>(Arg1)) {
-      return replaceInstUsesWith(CI, ConstantAggregateZero::get(II->getType()));
-    }
-
-    // Check for constant LHS & RHS - in this case we just simplify.
-    bool Zext = (IID == Intrinsic::arm_neon_vmullu ||
-                 IID == Intrinsic::aarch64_neon_umull);
-    VectorType *NewVT = cast<VectorType>(II->getType());
-    if (Constant *CV0 = dyn_cast<Constant>(Arg0)) {
-      if (Constant *CV1 = dyn_cast<Constant>(Arg1)) {
-        Value *V0 = Builder.CreateIntCast(CV0, NewVT, /*isSigned=*/!Zext);
-        Value *V1 = Builder.CreateIntCast(CV1, NewVT, /*isSigned=*/!Zext);
-        return replaceInstUsesWith(CI, Builder.CreateMul(V0, V1));
-      }
-
-      // Couldn't simplify - canonicalize constant to the RHS.
-      std::swap(Arg0, Arg1);
-    }
-
-    // Handle mul by one:
-    if (Constant *CV1 = dyn_cast<Constant>(Arg1))
-      if (ConstantInt *Splat =
-              dyn_cast_or_null<ConstantInt>(CV1->getSplatValue()))
-        if (Splat->isOne())
-          return CastInst::CreateIntegerCast(Arg0, II->getType(),
-                                             /*isSigned=*/!Zext);
-
-    break;
-  }
-  case Intrinsic::arm_neon_aesd:
-  case Intrinsic::arm_neon_aese:
-  case Intrinsic::aarch64_crypto_aesd:
-  case Intrinsic::aarch64_crypto_aese:
-  case Intrinsic::aarch64_sve_aesd:
-  case Intrinsic::aarch64_sve_aese: {
-    Value *DataArg = II->getArgOperand(0);
-    Value *KeyArg  = II->getArgOperand(1);
-
-    // Accept zero on either operand.
-    if (!match(KeyArg, m_ZeroInt()))
-      std::swap(KeyArg, DataArg);
-
-    // Try to use the builtin XOR in AESE and AESD to eliminate a prior XOR
-    Value *Data, *Key;
-    if (match(KeyArg, m_ZeroInt()) &&
-        match(DataArg, m_Xor(m_Value(Data), m_Value(Key)))) {
-      replaceOperand(*II, 0, Data);
-      replaceOperand(*II, 1, Key);
-      return II;
-    }
-    break;
-  }
   case Intrinsic::hexagon_V6_vandvrt:
   case Intrinsic::hexagon_V6_vandvrt_128B: {
     // Simplify Q -> V -> Q conversion.
diff --git a/llvm/test/Transforms/InstCombine/AArch64/aes-intrinsics.ll b/llvm/test/Transforms/InstCombine/AArch64/aes-intrinsics.ll
index 8c69d0721b738..fdc628bb59cb0 100644
--- a/llvm/test/Transforms/InstCombine/AArch64/aes-intrinsics.ll
+++ b/llvm/test/Transforms/InstCombine/AArch64/aes-intrinsics.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
-; RUN: opt -S -passes=instcombine < %s | FileCheck %s
+; RUN: opt --mtriple=aarch64 -S -passes=instcombine < %s | FileCheck %s
 ; ARM64 AES intrinsic variants
 
 define <16 x i8> @combineXorAeseZeroARM64(<16 x i8> %data, <16 x i8> %key) {
diff --git a/llvm/test/Transforms/InstCombine/ARM/2012-04-23-Neon-Intrinsics.ll b/llvm/test/Transforms/InstCombine/ARM/2012-04-23-Neon-Intrinsics.ll
index 5fc5709ff8897..9ba4b418cb8e5 100644
--- a/llvm/test/Transforms/InstCombine/ARM/2012-04-23-Neon-Intrinsics.ll
+++ b/llvm/test/Transforms/InstCombine/ARM/2012-04-23-Neon-Intrinsics.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
-; RUN: opt -S -passes=instcombine < %s | FileCheck %s
+; RUN: opt -mtriple=arm -S -passes=instcombine < %s | FileCheck %s
 
 define <4 x i32> @mulByZero(<4 x i16> %x) nounwind readnone ssp {
 ; CHECK-LABEL: define <4 x i32> @mulByZero(
diff --git a/llvm/test/Transforms/InstCombine/ARM/aes-intrinsics.ll b/llvm/test/Transforms/InstCombine/ARM/aes-intrinsics.ll
index 0056d872ff9e3..10175096035ec 100644
--- a/llvm/test/Transforms/InstCombine/ARM/aes-intrinsics.ll
+++ b/llvm/test/Transforms/InstCombine/ARM/aes-intrinsics.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
-; RUN: opt -S -passes=instcombine < %s | FileCheck %s
+; RUN: opt -mtriple=arm -S -passes=instcombine < %s | FileCheck %s
 ; ARM AES intrinsic variants
 
 define <16 x i8> @combineXorAeseZeroARM(<16 x i8> %data, <16 x i8> %key) {
diff --git a/llvm/test/Transforms/InstCombine/ARM/tbl1.ll b/llvm/test/Transforms/InstCombine/ARM/tbl1.ll
index fbec1a2bb7a07..ceeac8648ec51 100644
--- a/llvm/test/Transforms/InstCombine/ARM/tbl1.ll
+++ b/llvm/test/Transforms/InstCombine/ARM/tbl1.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt < %s -passes=instcombine -S | FileCheck %s
+; RUN: opt < %s -mtriple=arm -passes=instcombine -S | FileCheck %s
 
 target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
 target triple = "armv8-arm-none-eabi"

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi. I just read https://discourse.llvm.org/t/should-there-be-an-architecture-agnostic-variable-vector-shuffle-operation/88936. (I like the idea). If that goes forward would this still be required?

Can the new file go into Transforms/Utils or somewhere like it? That would avoid the need for a new module, and the code in it doesn't look like it is using anything specific to the architectures (it doesn't use intrinsic definitions, etc, just checks for constants).

@valadaptive
Copy link
Author

Putting them in Transforms/Utils is a lot simpler, so I've moved them there. What's the policy for stacked PRs when an earlier PR in the "stack" is modified; do I merge the changes into the later one?

A target-agnostic variable vector shuffle may obviate this transform, depending on how it's implemented. If there's a variable vector shuffle that accepts up to four source operands, the same as the tbl intrinsics, all the target-specific "convert my bespoke shuffle intrinsic into a shufflevector" optimizations could be replaced.

If the hypothetical variable vector shuffle accepts fewer than four source operands, you could convert some subset of tbl intrinsics to variable vector shuffles, but that would miss out on optimizations where e.g. multiple source operands are the same, and you can adjust the statically-known shuffle indices to point to at most two sources.

Either way, I think coming up with a satisfactory design for such an operation will take a while, and implementing it for all targets + ensuring it can be optimized will take longer still. I think #169748 could be landed much earlier, and the regression tests added there are a useful way to ensure that whatever "variable shufflevector" we come up with has equivalent optimization potential.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:AArch64 backend:ARM llvm:instcombine Covers the InstCombine, InstSimplify and AggressiveInstCombine passes llvm:transforms

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants