X86: make VBMI2 funnel shifts use VSHLD/VSHRD for const splats #169401

ArnavM3434 · 2025-11-24T20:17:54Z

github-actions · 2025-11-24T20:18:12Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2025-11-24T20:18:40Z

@llvm/pr-subscribers-backend-x86

Author: None (ArnavM3434)

Changes

Legalize ISD::FSHL/ISD::FSHR for VBMI2 targets (v32i16 / v16i32 / v8i64 types) and VBMI2 + VLX targets (v8i16 / v16i16 / v4i32 / v8i32 / v2i64 / v4i64 types).

Convert uniform constant shifts to X86ISD::VSHLD/X86ISD::VSHRD by adding a combineFunnelShift.

Keeps LowerFunnelShift only for 512-bit widening (VBMI2 without VLX).

Testing:

Added a small unit test just to check legality of instructions. For lowering testing used existing regression tests.

Fixes [#166949].

Patch is 470.00 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169401.diff

3 Files Affected:

(modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+2674-2535)
(modified) llvm/unittests/Target/X86/CMakeLists.txt (+1)
(added) llvm/unittests/Target/X86/FunnelShiftCombineTest.cpp (+163)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index dc84025c166a3..5714211dba11d 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -189,10 +189,10 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
   // We don't accept any truncstore of integer registers.
   setTruncStoreAction(MVT::i64, MVT::i32, Expand);
   setTruncStoreAction(MVT::i64, MVT::i16, Expand);
-  setTruncStoreAction(MVT::i64, MVT::i8 , Expand);
+  setTruncStoreAction(MVT::i64, MVT::i8, Expand);
   setTruncStoreAction(MVT::i32, MVT::i16, Expand);
-  setTruncStoreAction(MVT::i32, MVT::i8 , Expand);
-  setTruncStoreAction(MVT::i16, MVT::i8,  Expand);
+  setTruncStoreAction(MVT::i32, MVT::i8, Expand);
+  setTruncStoreAction(MVT::i16, MVT::i8, Expand);
 
   setTruncStoreAction(MVT::f64, MVT::f32, Expand);
 
@@ -204,106 +204,106 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
 
   // Integer absolute.
   if (Subtarget.canUseCMOV()) {
-    setOperationAction(ISD::ABS            , MVT::i16  , Custom);
-    setOperationAction(ISD::ABS            , MVT::i32  , Custom);
+    setOperationAction(ISD::ABS, MVT::i16, Custom);
+    setOperationAction(ISD::ABS, MVT::i32, Custom);
     if (Subtarget.is64Bit())
-      setOperationAction(ISD::ABS          , MVT::i64  , Custom);
+      setOperationAction(ISD::ABS, MVT::i64, Custom);
   }
 
   // Absolute difference.
   for (auto Op : {ISD::ABDS, ISD::ABDU}) {
-    setOperationAction(Op                  , MVT::i8   , Custom);
-    setOperationAction(Op                  , MVT::i16  , Custom);
-    setOperationAction(Op                  , MVT::i32  , Custom);
+    setOperationAction(Op, MVT::i8, Custom);
+    setOperationAction(Op, MVT::i16, Custom);
+    setOperationAction(Op, MVT::i32, Custom);
     if (Subtarget.is64Bit())
-     setOperationAction(Op                 , MVT::i64  , Custom);
+      setOperationAction(Op, MVT::i64, Custom);
   }
 
   // Signed saturation subtraction.
-  setOperationAction(ISD::SSUBSAT          , MVT::i8   , Custom);
-  setOperationAction(ISD::SSUBSAT          , MVT::i16  , Custom);
-  setOperationAction(ISD::SSUBSAT          , MVT::i32  , Custom);
+  setOperationAction(ISD::SSUBSAT, MVT::i8, Custom);
+  setOperationAction(ISD::SSUBSAT, MVT::i16, Custom);
+  setOperationAction(ISD::SSUBSAT, MVT::i32, Custom);
   if (Subtarget.is64Bit())
-    setOperationAction(ISD::SSUBSAT        , MVT::i64  , Custom);
+    setOperationAction(ISD::SSUBSAT, MVT::i64, Custom);
 
   // Funnel shifts.
   for (auto ShiftOp : {ISD::FSHL, ISD::FSHR}) {
     // For slow shld targets we only lower for code size.
     LegalizeAction ShiftDoubleAction = Subtarget.isSHLDSlow() ? Custom : Legal;
 
-    setOperationAction(ShiftOp             , MVT::i8   , Custom);
-    setOperationAction(ShiftOp             , MVT::i16  , Custom);
-    setOperationAction(ShiftOp             , MVT::i32  , ShiftDoubleAction);
+    setOperationAction(ShiftOp, MVT::i8, Custom);
+    setOperationAction(ShiftOp, MVT::i16, Custom);
+    setOperationAction(ShiftOp, MVT::i32, ShiftDoubleAction);
     if (Subtarget.is64Bit())
-      setOperationAction(ShiftOp           , MVT::i64  , ShiftDoubleAction);
+      setOperationAction(ShiftOp, MVT::i64, ShiftDoubleAction);
   }
 
   if (!Subtarget.useSoftFloat()) {
     // Promote all UINT_TO_FP to larger SINT_TO_FP's, as X86 doesn't have this
     // operation.
-    setOperationAction(ISD::UINT_TO_FP,        MVT::i8, Promote);
+    setOperationAction(ISD::UINT_TO_FP, MVT::i8, Promote);
     setOperationAction(ISD::STRICT_UINT_TO_FP, MVT::i8, Promote);
-    setOperationAction(ISD::UINT_TO_FP,        MVT::i16, Promote);
+    setOperationAction(ISD::UINT_TO_FP, MVT::i16, Promote);
     setOperationAction(ISD::STRICT_UINT_TO_FP, MVT::i16, Promote);
     // We have an algorithm for SSE2, and we turn this into a 64-bit
     // FILD or VCVTUSI2SS/SD for other targets.
-    setOperationAction(ISD::UINT_TO_FP,        MVT::i32, Custom);
+    setOperationAction(ISD::UINT_TO_FP, MVT::i32, Custom);
     setOperationAction(ISD::STRICT_UINT_TO_FP, MVT::i32, Custom);
     // We have an algorithm for SSE2->double, and we turn this into a
     // 64-bit FILD followed by conditional FADD for other targets.
-    setOperationAction(ISD::UINT_TO_FP,        MVT::i64, Custom);
+    setOperationAction(ISD::UINT_TO_FP, MVT::i64, Custom);
     setOperationAction(ISD::STRICT_UINT_TO_FP, MVT::i64, Custom);
 
     // Promote i8 SINT_TO_FP to larger SINT_TO_FP's, as X86 doesn't have
     // this operation.
-    setOperationAction(ISD::SINT_TO_FP,        MVT::i8, Promote);
+    setOperationAction(ISD::SINT_TO_FP, MVT::i8, Promote);
     setOperationAction(ISD::STRICT_SINT_TO_FP, MVT::i8, Promote);
     // SSE has no i16 to fp conversion, only i32. We promote in the handler
     // to allow f80 to use i16 and f64 to use i16 with sse1 only
-    setOperationAction(ISD::SINT_TO_FP,        MVT::i16, Custom);
+    setOperationAction(ISD::SINT_TO_FP, MVT::i16, Custom);
     setOperationAction(ISD::STRICT_SINT_TO_FP, MVT::i16, Custom);
     // f32 and f64 cases are Legal with SSE1/SSE2, f80 case is not
-    setOperationAction(ISD::SINT_TO_FP,        MVT::i32, Custom);
+    setOperationAction(ISD::SINT_TO_FP, MVT::i32, Custom);
     setOperationAction(ISD::STRICT_SINT_TO_FP, MVT::i32, Custom);
     // In 32-bit mode these are custom lowered.  In 64-bit mode F32 and F64
     // are Legal, f80 is custom lowered.
-    setOperationAction(ISD::SINT_TO_FP,        MVT::i64, Custom);
+    setOperationAction(ISD::SINT_TO_FP, MVT::i64, Custom);
     setOperationAction(ISD::STRICT_SINT_TO_FP, MVT::i64, Custom);
 
     // Promote i8 FP_TO_SINT to larger FP_TO_SINTS's, as X86 doesn't have
     // this operation.
-    setOperationAction(ISD::FP_TO_SINT,        MVT::i8,  Promote);
+    setOperationAction(ISD::FP_TO_SINT, MVT::i8, Promote);
     // FIXME: This doesn't generate invalid exception when it should. PR44019.
-    setOperationAction(ISD::STRICT_FP_TO_SINT, MVT::i8,  Promote);
-    setOperationAction(ISD::FP_TO_SINT,        MVT::i16, Custom);
+    setOperationAction(ISD::STRICT_FP_TO_SINT, MVT::i8, Promote);
+    setOperationAction(ISD::FP_TO_SINT, MVT::i16, Custom);
     setOperationAction(ISD::STRICT_FP_TO_SINT, MVT::i16, Custom);
-    setOperationAction(ISD::FP_TO_SINT,        MVT::i32, Custom);
+    setOperationAction(ISD::FP_TO_SINT, MVT::i32, Custom);
     setOperationAction(ISD::STRICT_FP_TO_SINT, MVT::i32, Custom);
     // In 32-bit mode these are custom lowered.  In 64-bit mode F32 and F64
     // are Legal, f80 is custom lowered.
-    setOperationAction(ISD::FP_TO_SINT,        MVT::i64, Custom);
+    setOperationAction(ISD::FP_TO_SINT, MVT::i64, Custom);
     setOperationAction(ISD::STRICT_FP_TO_SINT, MVT::i64, Custom);
 
     // Handle FP_TO_UINT by promoting the destination to a larger signed
     // conversion.
-    setOperationAction(ISD::FP_TO_UINT,        MVT::i8,  Promote);
+    setOperationAction(ISD::FP_TO_UINT, MVT::i8, Promote);
     // FIXME: This doesn't generate invalid exception when it should. PR44019.
-    setOperationAction(ISD::STRICT_FP_TO_UINT, MVT::i8,  Promote);
-    setOperationAction(ISD::FP_TO_UINT,        MVT::i16, Promote);
+    setOperationAction(ISD::STRICT_FP_TO_UINT, MVT::i8, Promote);
+    setOperationAction(ISD::FP_TO_UINT, MVT::i16, Promote);
     // FIXME: This doesn't generate invalid exception when it should. PR44019.
     setOperationAction(ISD::STRICT_FP_TO_UINT, MVT::i16, Promote);
-    setOperationAction(ISD::FP_TO_UINT,        MVT::i32, Custom);
+    setOperationAction(ISD::FP_TO_UINT, MVT::i32, Custom);
     setOperationAction(ISD::STRICT_FP_TO_UINT, MVT::i32, Custom);
-    setOperationAction(ISD::FP_TO_UINT,        MVT::i64, Custom);
+    setOperationAction(ISD::FP_TO_UINT, MVT::i64, Custom);
     setOperationAction(ISD::STRICT_FP_TO_UINT, MVT::i64, Custom);
 
-    setOperationAction(ISD::LRINT,             MVT::f32, Custom);
-    setOperationAction(ISD::LRINT,             MVT::f64, Custom);
-    setOperationAction(ISD::LLRINT,            MVT::f32, Custom);
-    setOperationAction(ISD::LLRINT,            MVT::f64, Custom);
+    setOperationAction(ISD::LRINT, MVT::f32, Custom);
+    setOperationAction(ISD::LRINT, MVT::f64, Custom);
+    setOperationAction(ISD::LLRINT, MVT::f32, Custom);
+    setOperationAction(ISD::LLRINT, MVT::f64, Custom);
 
     if (!Subtarget.is64Bit()) {
-      setOperationAction(ISD::LRINT,  MVT::i64, Custom);
+      setOperationAction(ISD::LRINT, MVT::i64, Custom);
       setOperationAction(ISD::LLRINT, MVT::i64, Custom);
     }
   }
@@ -311,7 +311,7 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
   if (Subtarget.hasSSE2()) {
     // Custom lowering for saturating float to int conversions.
     // We handle promotion to larger result types manually.
-    for (MVT VT : { MVT::i8, MVT::i16, MVT::i32 }) {
+    for (MVT VT : {MVT::i8, MVT::i16, MVT::i32}) {
       setOperationAction(ISD::FP_TO_UINT_SAT, VT, Custom);
       setOperationAction(ISD::FP_TO_SINT_SAT, VT, Custom);
     }
@@ -344,17 +344,17 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
 
   // TODO: when we have SSE, these could be more efficient, by using movd/movq.
   if (!Subtarget.hasSSE2()) {
-    setOperationAction(ISD::BITCAST        , MVT::f32  , Expand);
-    setOperationAction(ISD::BITCAST        , MVT::i32  , Expand);
+    setOperationAction(ISD::BITCAST, MVT::f32, Expand);
+    setOperationAction(ISD::BITCAST, MVT::i32, Expand);
     setOperationAction(ISD::FCANONICALIZE, MVT::f32, Custom);
     setOperationAction(ISD::FCANONICALIZE, MVT::f64, Custom);
     if (Subtarget.is64Bit()) {
-      setOperationAction(ISD::BITCAST      , MVT::f64  , Expand);
+      setOperationAction(ISD::BITCAST, MVT::f64, Expand);
       // Without SSE, i64->f64 goes through memory.
-      setOperationAction(ISD::BITCAST      , MVT::i64  , Expand);
+      setOperationAction(ISD::BITCAST, MVT::i64, Expand);
     }
   } else if (!Subtarget.is64Bit())
-    setOperationAction(ISD::BITCAST      , MVT::i64  , Custom);
+    setOperationAction(ISD::BITCAST, MVT::i64, Custom);
 
   // Scalar integer divide and remainder are lowered to use operations that
   // produce two results, to match the available instructions. This exposes
@@ -366,7 +366,7 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
   // (low) operations are left as Legal, as there are single-result
   // instructions for this in x86. Using the two-result multiply instructions
   // when both high and low results are needed must be arranged by dagcombine.
-  for (auto VT : { MVT::i8, MVT::i16, MVT::i32, MVT::i64 }) {
+  for (auto VT : {MVT::i8, MVT::i16, MVT::i32, MVT::i64}) {
     setOperationAction(ISD::MULHS, VT, Expand);
     setOperationAction(ISD::MULHU, VT, Expand);
     setOperationAction(ISD::SDIV, VT, Expand);
@@ -375,47 +375,47 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
     setOperationAction(ISD::UREM, VT, Expand);
   }
 
-  setOperationAction(ISD::BR_JT            , MVT::Other, Expand);
-  setOperationAction(ISD::BRCOND           , MVT::Other, Custom);
-  for (auto VT : { MVT::f32, MVT::f64, MVT::f80, MVT::f128,
-                   MVT::i8,  MVT::i16, MVT::i32, MVT::i64 }) {
-    setOperationAction(ISD::BR_CC,     VT, Expand);
+  setOperationAction(ISD::BR_JT, MVT::Other, Expand);
+  setOperationAction(ISD::BRCOND, MVT::Other, Custom);
+  for (auto VT : {MVT::f32, MVT::f64, MVT::f80, MVT::f128, MVT::i8, MVT::i16,
+                  MVT::i32, MVT::i64}) {
+    setOperationAction(ISD::BR_CC, VT, Expand);
     setOperationAction(ISD::SELECT_CC, VT, Expand);
   }
   if (Subtarget.is64Bit())
     setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i32, Legal);
-  setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i16  , Legal);
-  setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i8   , Legal);
-  setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i1   , Expand);
+  setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i16, Legal);
+  setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i8, Legal);
+  setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i1, Expand);
 
-  setOperationAction(ISD::FREM             , MVT::f32  , Expand);
-  setOperationAction(ISD::FREM             , MVT::f64  , Expand);
-  setOperationAction(ISD::FREM             , MVT::f80  , Expand);
-  setOperationAction(ISD::FREM             , MVT::f128 , Expand);
+  setOperationAction(ISD::FREM, MVT::f32, Expand);
+  setOperationAction(ISD::FREM, MVT::f64, Expand);
+  setOperationAction(ISD::FREM, MVT::f80, Expand);
+  setOperationAction(ISD::FREM, MVT::f128, Expand);
 
   if (!Subtarget.useSoftFloat() && Subtarget.hasX87()) {
-    setOperationAction(ISD::GET_ROUNDING   , MVT::i32  , Custom);
-    setOperationAction(ISD::SET_ROUNDING   , MVT::Other, Custom);
-    setOperationAction(ISD::GET_FPENV_MEM  , MVT::Other, Custom);
-    setOperationAction(ISD::SET_FPENV_MEM  , MVT::Other, Custom);
-    setOperationAction(ISD::RESET_FPENV    , MVT::Other, Custom);
+    setOperationAction(ISD::GET_ROUNDING, MVT::i32, Custom);
+    setOperationAction(ISD::SET_ROUNDING, MVT::Other, Custom);
+    setOperationAction(ISD::GET_FPENV_MEM, MVT::Other, Custom);
+    setOperationAction(ISD::SET_FPENV_MEM, MVT::Other, Custom);
+    setOperationAction(ISD::RESET_FPENV, MVT::Other, Custom);
   }
 
   // Promote the i8 variants and force them on up to i32 which has a shorter
   // encoding.
-  setOperationPromotedToType(ISD::CTTZ           , MVT::i8   , MVT::i32);
-  setOperationPromotedToType(ISD::CTTZ_ZERO_UNDEF, MVT::i8   , MVT::i32);
+  setOperationPromotedToType(ISD::CTTZ, MVT::i8, MVT::i32);
+  setOperationPromotedToType(ISD::CTTZ_ZERO_UNDEF, MVT::i8, MVT::i32);
   // Promoted i16. tzcntw has a false dependency on Intel CPUs. For BSF, we emit
   // a REP prefix to encode it as TZCNT for modern CPUs so it makes sense to
   // promote that too.
-  setOperationPromotedToType(ISD::CTTZ           , MVT::i16  , MVT::i32);
-  setOperationPromotedToType(ISD::CTTZ_ZERO_UNDEF, MVT::i16  , MVT::i32);
+  setOperationPromotedToType(ISD::CTTZ, MVT::i16, MVT::i32);
+  setOperationPromotedToType(ISD::CTTZ_ZERO_UNDEF, MVT::i16, MVT::i32);
 
   if (!Subtarget.hasBMI()) {
-    setOperationAction(ISD::CTTZ           , MVT::i32  , Custom);
-    setOperationAction(ISD::CTTZ_ZERO_UNDEF, MVT::i32  , Legal);
+    setOperationAction(ISD::CTTZ, MVT::i32, Custom);
+    setOperationAction(ISD::CTTZ_ZERO_UNDEF, MVT::i32, Legal);
     if (Subtarget.is64Bit()) {
-      setOperationAction(ISD::CTTZ         , MVT::i64  , Custom);
+      setOperationAction(ISD::CTTZ, MVT::i64, Custom);
       setOperationAction(ISD::CTTZ_ZERO_UNDEF, MVT::i64, Legal);
     }
   }
@@ -423,13 +423,13 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
   if (Subtarget.hasLZCNT()) {
     // When promoting the i8 variants, force them to i32 for a shorter
     // encoding.
-    setOperationPromotedToType(ISD::CTLZ           , MVT::i8   , MVT::i32);
-    setOperationPromotedToType(ISD::CTLZ_ZERO_UNDEF, MVT::i8   , MVT::i32);
+    setOperationPromotedToType(ISD::CTLZ, MVT::i8, MVT::i32);
+    setOperationPromotedToType(ISD::CTLZ_ZERO_UNDEF, MVT::i8, MVT::i32);
   } else {
     for (auto VT : {MVT::i8, MVT::i16, MVT::i32, MVT::i64}) {
       if (VT == MVT::i64 && !Subtarget.is64Bit())
         continue;
-      setOperationAction(ISD::CTLZ           , VT, Custom);
+      setOperationAction(ISD::CTLZ, VT, Custom);
       setOperationAction(ISD::CTLZ_ZERO_UNDEF, VT, Custom);
     }
   }
@@ -474,36 +474,36 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
     // on the dest that popcntl hasn't had since Cannon Lake.
     setOperationPromotedToType(ISD::CTPOP, MVT::i16, MVT::i32);
   } else {
-    setOperationAction(ISD::CTPOP          , MVT::i8   , Custom);
-    setOperationAction(ISD::CTPOP          , MVT::i16  , Custom);
-    setOperationAction(ISD::CTPOP          , MVT::i32  , Custom);
-    setOperationAction(ISD::CTPOP          , MVT::i64  , Custom);
+    setOperationAction(ISD::CTPOP, MVT::i8, Custom);
+    setOperationAction(ISD::CTPOP, MVT::i16, Custom);
+    setOperationAction(ISD::CTPOP, MVT::i32, Custom);
+    setOperationAction(ISD::CTPOP, MVT::i64, Custom);
   }
 
-  setOperationAction(ISD::READCYCLECOUNTER , MVT::i64  , Custom);
+  setOperationAction(ISD::READCYCLECOUNTER, MVT::i64, Custom);
 
   if (!Subtarget.hasMOVBE())
-    setOperationAction(ISD::BSWAP          , MVT::i16  , Expand);
+    setOperationAction(ISD::BSWAP, MVT::i16, Expand);
 
   // X86 wants to expand cmov itself.
-  for (auto VT : { MVT::f32, MVT::f64, MVT::f80, MVT::f128 }) {
+  for (auto VT : {MVT::f32, MVT::f64, MVT::f80, MVT::f128}) {
     setOperationAction(ISD::SELECT, VT, Custom);
     setOperationAction(ISD::SETCC, VT, Custom);
     setOperationAction(ISD::STRICT_FSETCC, VT, Custom);
     setOperationAction(ISD::STRICT_FSETCCS, VT, Custom);
   }
-  for (auto VT : { MVT::i8, MVT::i16, MVT::i32, MVT::i64 }) {
+  for (auto VT : {MVT::i8, MVT::i16, MVT::i32, MVT::i64}) {
     if (VT == MVT::i64 && !Subtarget.is64Bit())
       continue;
     setOperationAction(ISD::SELECT, VT, Custom);
-    setOperationAction(ISD::SETCC,  VT, Custom);
+    setOperationAction(ISD::SETCC, VT, Custom);
   }
 
   // Custom action for SELECT MMX and expand action for SELECT_CC MMX
   setOperationAction(ISD::SELECT, MVT::x86mmx, Custom);
   setOperationAction(ISD::SELECT_CC, MVT::x86mmx, Expand);
 
-  setOperationAction(ISD::EH_RETURN       , MVT::Other, Custom);
+  setOperationAction(ISD::EH_RETURN, MVT::Other, Custom);
   // NOTE: EH_SJLJ_SETJMP/_LONGJMP are not recommended, since
   // LLVM/Clang supports zero-cost DWARF and SEH exception handling.
   setOperationAction(ISD::EH_SJLJ_SETJMP, MVT::i32, Custom);
@@ -511,19 +511,19 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
   setOperationAction(ISD::EH_SJLJ_SETUP_DISPATCH, MVT::Other, Custom);
 
   // Darwin ABI issue.
-  for (auto VT : { MVT::i32, MVT::i64 }) {
+  for (auto VT : {MVT::i32, MVT::i64}) {
     if (VT == MVT::i64 && !Subtarget.is64Bit())
       continue;
-    setOperationAction(ISD::ConstantPool    , VT, Custom);
-    setOperationAction(ISD::JumpTable       , VT, Custom);
-    setOperationAction(ISD::GlobalAddress   , VT, Custom);
+    setOperationAction(ISD::ConstantPool, VT, Custom);
+    setOperationAction(ISD::JumpTable, VT, Custom);
+    setOperationAction(ISD::GlobalAddress, VT, Custom);
     setOperationAction(ISD::GlobalTLSAddress, VT, Custom);
-    setOperationAction(ISD::ExternalSymbol  , VT, Custom);
-    setOperationAction(ISD::BlockAddress    , VT, Custom);
+    setOperationAction(ISD::ExternalSymbol, VT, Custom);
+    setOperationAction(ISD::BlockAddress, VT, Custom);
   }
 
   // 64-bit shl, sra, srl (iff 32-bit x86)
-  for (auto VT : { MVT::i32, MVT::i64 }) {
+  for (auto VT : {MVT::i32, MVT::i64}) {
     if (VT == MVT::i64 && !Subtarget.is64Bit())
       continue;
     setOperationAction(ISD::SHL_PARTS, VT, Custom);
@@ -532,12 +532,12 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
   }
 
   if (Subtarget.hasSSEPrefetch())
-    setOperationAction(ISD::PREFETCH      , MVT::Other, Custom);
+    setOperationAction(ISD::PREFETCH, MVT::Other, Custom);
 
-  setOperationAction(ISD::ATOMIC_FENCE  , MVT::Other, Custom);
+  setOperationAction(ISD::ATOMIC_FENCE, MVT::Other, Custom);
 
   // Expand certain atomics
-  for (auto VT : { MVT::i8, MVT::i16, MVT::i32, MVT::i64 }) {
+  for (auto VT : {MVT::i8, MVT::i16, MVT::i32, MVT::i64}) {
     setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, VT, Custom);
     setOperationAction(ISD::ATOMIC_LOAD_SUB, VT, Custom);
     setOperationAction(ISD::ATOMIC_LOAD_ADD, VT, Custom);
@@ -581,14 +581,14 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
     setOperationAction(ISD::UBSANTRAP, MVT::Other, Legal);
 
   // VASTART needs to be custom lowered to use the VarArgsFrameIndex
-  setOperationAction(ISD::VASTART           , MVT::Other, Custom);
-  setOperationAction(ISD::VAEND             , MVT::Other, Expand);
+  setOperationAction(ISD::VASTART, MVT::Other, Custom);
+  setOperationAction(ISD::VAEND, MVT::Other, Expand);
   bool Is64Bit = Subtarget.is64Bit();
-  setOperationAction(ISD::VAARG,  MVT::Other, Is64Bit ? Custom : Expand);
+  setOperationAction(ISD::VAARG, MVT::Other, Is64Bit ? Cu...
[truncated]

RKSimon · 2025-11-24T22:13:12Z

llvm/lib/Target/X86/X86ISelLowering.cpp

@@ -189,10 +189,10 @@ X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
  // We don't accept any truncstore of integer registers.
  setTruncStoreAction(MVT::i64, MVT::i32, Expand);
  setTruncStoreAction(MVT::i64, MVT::i16, Expand);
-  setTruncStoreAction(MVT::i64, MVT::i8 , Expand);
+  setTruncStoreAction(MVT::i64, MVT::i8, Expand);


(style) don't clang-format the entire file - only touch lines that you are changing - unfortunately it makes the patch un-reviewable :(

Oh whoops will fix this.

llvm/unittests/Target/X86/X86SelectionDAGTest.cpp

zwuis · 2025-11-25T15:43:20Z

Fixes [#166949].

You can use this syntax.

llvm/lib/Target/X86/X86ISelLowering.cpp

RKSimon · 2025-11-25T16:12:15Z

llvm/lib/Target/X86/X86ISelLowering.cpp

+  SDValue Amt = N->getOperand(2);
+  EVT VT = Op0.getValueType();
+
+  if (!VT.isVector() || !Subtarget.hasVBMI2())


Remove hasVBMI2 check - should be handled by isOperationLegal?

True, removed it

llvm/lib/Target/X86/X86ISelLowering.cpp

RKSimon · 2025-11-25T16:16:40Z

llvm/unittests/Target/X86/X86SelectionDAGTest.cpp

@@ -0,0 +1,163 @@
+//===- FunnelShiftCombineTest.cpp - X86 Funnel Shift Combine Tests --------===//


FunnelShiftCombineTest.cpp ?

Whoops I can update this if we keep the file

RKSimon · 2025-11-25T16:19:05Z

llvm/unittests/Target/X86/X86SelectionDAGTest.cpp

+
+namespace {
+
+class X86FunnelShiftCombineTest : public testing::Test {


Still not sure this is necessary - we already have pretty thorough test coverage inside llvm\test\CodeGen\X86 - @phoebewang WDYT?

Yeah I just added it because the llvm contributing guide said to add a small unit test, but the test itself isn't really critical and seems out of place.

I agree. unittests are less straightforward than the assemble. And here checking type legality doesn't have much value.

phoebewang · 2025-11-26T02:06:01Z

llvm/lib/Target/X86/X86ISelLowering.cpp

+  case ISD::FSHL: return combineFunnelShift(N, DAG, DCI, Subtarget);
+  case ISD::FSHR: return combineFunnelShift(N, DAG, DCI, Subtarget);


Suggested change

case ISD::FSHL: return combineFunnelShift(N, DAG, DCI, Subtarget);

case ISD::FSHR: return combineFunnelShift(N, DAG, DCI, Subtarget);

case ISD::FSHL:

case ISD::FSHR: return combineFunnelShift(N, DAG, DCI, Subtarget);

phoebewang · 2025-11-26T02:13:55Z

llvm/lib/Target/X86/X86ISelLowering.cpp

@@ -57622,6 +57626,49 @@ static SDValue combineFP_TO_xINT_SAT(SDNode *N, SelectionDAG &DAG,
  return SDValue();
 }

+// Combiner: turn uniform-constant splat funnel shifts into VSHLD/VSHRD


Why do we need the combine? Can we use patten match instead?

I believe it is because in the combine we can easily check for constant splats and also type legality. I'm not sure how we'd do this with TableGen patterns. @RKSimon do you agree?

we can't currently easily match constant splats inside td patterns - it'd be a nice to have someday, but beyond the scope of this patch - plus we'd hit the problem that the X86ISD::VSHRD node has an implicit operand swap that is going to be messy to remove.

@phoebewang There is the option of moving this inside X86ISelDAGToDAG though.

I see, thanks!

Move constant splat handling for vector funnel shifts into a DAG combiner so that VBMI2 legal widths emit VSHLD/VSHRD directly (fixes llvm#166949). Signed-off-by: Arnav Mehta <arnavnmehta1@gmail.com>

llvm/lib/Target/X86/X86ISelLowering.cpp

github-actions · 2025-11-27T08:42:58Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff origin/main HEAD --extensions cpp -- llvm/lib/Target/X86/X86ISelLowering.cpp --diff_from_common_commit

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 0913c119f..96bf45088 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -57655,7 +57655,7 @@ static SDValue combineFunnelShift(SDNode *N, SelectionDAG &DAG,
   if (IsFSHR)
     std::swap(Op0, Op1);
   unsigned Opcode = IsFSHR ? X86ISD::VSHRD : X86ISD::VSHLD;
-  
+
   return DAG.getNode(Opcode, DL, VT, {Op0, Op1, Imm});
 }

RKSimon

Now that its legal, much of the VBMI2 path can be removed from LowerFunnelShift - we just need to perform the widening for non-VLX VBMI2 targets.

llvmbot added the backend:X86 label Nov 24, 2025

RKSimon requested review from RKSimon and phoebewang November 24, 2025 21:52

RKSimon requested changes Nov 24, 2025

View reviewed changes

ArnavM3434 force-pushed the x86-funnelshift-fix branch from b752661 to a8b8325 Compare November 25, 2025 14:39

RKSimon requested changes Nov 25, 2025

View reviewed changes

phoebewang reviewed Nov 26, 2025

View reviewed changes

X86: make VBMI2 funnel shifts use VSHLD/VSHRD for const splats

4dd87cd

Move constant splat handling for vector funnel shifts into a DAG combiner so that VBMI2 legal widths emit VSHLD/VSHRD directly (fixes llvm#166949). Signed-off-by: Arnav Mehta <arnavnmehta1@gmail.com>

ArnavM3434 force-pushed the x86-funnelshift-fix branch from a8b8325 to 4dd87cd Compare November 26, 2025 04:06

phoebewang reviewed Nov 26, 2025

View reviewed changes

llvm/lib/Target/X86/X86ISelLowering.cpp Show resolved Hide resolved

RKSimon self-requested a review November 27, 2025 08:45

RKSimon requested changes Nov 27, 2025

View reviewed changes

		@@ -0,0 +1,163 @@
		//===- FunnelShiftCombineTest.cpp - X86 Funnel Shift Combine Tests --------===//


		namespace {

		class X86FunnelShiftCombineTest : public testing::Test {

		case ISD::FSHL: return combineFunnelShift(N, DAG, DCI, Subtarget);
		case ISD::FSHR: return combineFunnelShift(N, DAG, DCI, Subtarget);

X86: make VBMI2 funnel shifts use VSHLD/VSHRD for const splats #169401

Are you sure you want to change the base?

X86: make VBMI2 funnel shifts use VSHLD/VSHRD for const splats #169401

Uh oh!

Conversation

ArnavM3434 commented Nov 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Nov 24, 2025

Uh oh!

llvmbot commented Nov 24, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

zwuis commented Nov 25, 2025

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Nov 27, 2025

Uh oh!

RKSimon left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

ArnavM3434 commented Nov 24, 2025 •

edited

Loading