Optimized Constant Xor And And Not Operation #161784

manik-muk · 2025-10-03T05:21:45Z

github-actions · 2025-10-03T05:22:07Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2025-10-03T05:22:39Z

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-x86

Author: None (manik-muk)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/161784.diff

3 Files Affected:

(modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+63)
(added) llvm/test/CodeGen/X86/constant-xor-and-andnot.ll (+63)
(modified) llvm/test/CodeGen/X86/pr108731.ll (+6-6)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index cdc97faf394ca..6c562ccf5b363 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -51541,6 +51541,64 @@ static SDValue combineBMILogicOp(SDNode *N, SelectionDAG &DAG,
   return SDValue();
 }
 
+/// Optimize (Constant XOR a) & b & ~c -> (Constant XOR a) & (b & ~c)
+/// This allows the andn operation to be done in parallel with the xor
+static SDValue combineConstantXorAndAndNot(SDNode *N, const SDLoc &DL,
+                                           SelectionDAG &DAG,
+                                           const X86Subtarget &Subtarget) {
+  using namespace llvm::SDPatternMatch;
+
+  EVT VT = N->getValueType(0);
+  // Only handle scalar integer types that support BMI instructions
+  if (!Subtarget.hasBMI() || (VT != MVT::i32 && VT != MVT::i64))
+    return SDValue();
+
+  SDValue N0 = N->getOperand(0);
+  SDValue N1 = N->getOperand(1);
+
+  // Check if N0 is AND(XOR(Constant, a), b)
+  if (N0.getOpcode() != ISD::AND)
+    return SDValue();
+
+  SDValue AndLHS = N0.getOperand(0);
+  SDValue AndRHS = N0.getOperand(1);
+
+  // Check if one operand is XOR(Constant, a)
+  SDValue XorOp, OtherOp;
+  if (AndLHS.getOpcode() == ISD::XOR) {
+    XorOp = AndLHS;
+    OtherOp = AndRHS;
+  } else if (AndRHS.getOpcode() == ISD::XOR) {
+    XorOp = AndRHS;
+    OtherOp = AndLHS;
+  } else {
+    return SDValue();
+  }
+
+  // Check if XOR has a constant operand
+  if (!isa<ConstantSDNode>(XorOp.getOperand(0)) &&
+      !isa<ConstantSDNode>(XorOp.getOperand(1))) {
+    return SDValue();
+  }
+
+  // Check if N1 is NOT(c) - i.e., XOR(c, -1)
+  SDValue NotOp;
+  if (N1.getOpcode() == ISD::XOR && isAllOnesConstant(N1.getOperand(1))) {
+    NotOp = N1.getOperand(0);
+  } else {
+    return SDValue();
+  }
+
+  // Transform: AND(AND(XOR(Constant, a), b), NOT(c))
+  // To: AND(XOR(Constant, a), AND(b, NOT(c)))
+  // This allows the andn (b & ~c) to be done in parallel with the xor
+
+  // Create AND(b, NOT(c)) - this will become andn
+  SDValue NewAnd = DAG.getNode(ISD::AND, DL, VT, OtherOp, N1);
+  // Create final AND(XOR(Constant, a), AND(b, NOT(c)))
+  return DAG.getNode(ISD::AND, DL, VT, XorOp, NewAnd);
+}
+
 /// Fold AND(Y, XOR(X, NEG(X))) -> ANDN(Y, BLSMSK(X)) if BMI is available.
 static SDValue combineAndXorSubWithBMI(SDNode *And, const SDLoc &DL,
                                        SelectionDAG &DAG,
@@ -51833,6 +51891,11 @@ static SDValue combineAnd(SDNode *N, SelectionDAG &DAG,
   if (SDValue R = combineAndNotOrIntoAndNotAnd(N, dl, DAG))
     return R;
 
+  // Optimize (Constant XOR a) & b & ~c -> (Constant XOR a) & (b & ~c)
+  // This allows the andn operation to be done in parallel with the xor
+  if (SDValue R = combineConstantXorAndAndNot(N, dl, DAG, Subtarget))
+    return R;
+
   // fold (and (mul x, c1), c2) -> (mul x, (and c1, c2))
   // iff c2 is all/no bits mask - i.e. a select-with-zero mask.
   // TODO: Handle PMULDQ/PMULUDQ/VPMADDWD/VPMADDUBSW?
diff --git a/llvm/test/CodeGen/X86/constant-xor-and-andnot.ll b/llvm/test/CodeGen/X86/constant-xor-and-andnot.ll
new file mode 100644
index 0000000000000..5a4d931d29896
--- /dev/null
+++ b/llvm/test/CodeGen/X86/constant-xor-and-andnot.ll
@@ -0,0 +1,63 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=+bmi < %s | FileCheck %s
+
+; Test the optimization described in issue #161630:
+; (Constant XOR a) & b & ~c should compile to allow andn to be done in parallel with xor
+
+define i64 @test_constant_xor_and_andnot(i64 %a, i64 %b, i64 %c) {
+; CHECK-LABEL: test_constant_xor_and_andnot:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorq $1234, %rdi # imm = 0x4D2
+; CHECK-NEXT:    andnq %rsi, %rdx, %rax
+; CHECK-NEXT:    andq %rdi, %rax
+; CHECK-NEXT:    retq
+  %xor = xor i64 %a, 1234
+  %and1 = and i64 %xor, %b
+  %not_c = xor i64 %c, -1
+  %result = and i64 %and1, %not_c
+  ret i64 %result
+}
+
+define i32 @test_constant_xor_and_andnot_32(i32 %a, i32 %b, i32 %c) {
+; CHECK-LABEL: test_constant_xor_and_andnot_32:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorl $5678, %edi # imm = 0x162E
+; CHECK-NEXT:    andnl %esi, %edx, %eax
+; CHECK-NEXT:    andl %edi, %eax
+; CHECK-NEXT:    retq
+  %xor = xor i32 %a, 5678
+  %and1 = and i32 %xor, %b
+  %not_c = xor i32 %c, -1
+  %result = and i32 %and1, %not_c
+  ret i32 %result
+}
+
+; Test with different operand order
+define i64 @test_constant_xor_and_andnot_swapped(i64 %a, i64 %b, i64 %c) {
+; CHECK-LABEL: test_constant_xor_and_andnot_swapped:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorq $1234, %rdi # imm = 0x4D2
+; CHECK-NEXT:    andnq %rsi, %rdx, %rax
+; CHECK-NEXT:    andq %rdi, %rax
+; CHECK-NEXT:    retq
+  %xor = xor i64 %a, 1234
+  %and1 = and i64 %b, %xor
+  %not_c = xor i64 %c, -1
+  %result = and i64 %and1, %not_c
+  ret i64 %result
+}
+
+; Test with different operand order for the final AND
+define i64 @test_constant_xor_and_andnot_final_swapped(i64 %a, i64 %b, i64 %c) {
+; CHECK-LABEL: test_constant_xor_and_andnot_final_swapped:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorq $1234, %rdi # imm = 0x4D2
+; CHECK-NEXT:    andq %rsi, %rdi
+; CHECK-NEXT:    andnq %rdi, %rdx, %rax
+; CHECK-NEXT:    retq
+  %xor = xor i64 %a, 1234
+  %and1 = and i64 %xor, %b
+  %not_c = xor i64 %c, -1
+  %result = and i64 %not_c, %and1
+  ret i64 %result
+}
\ No newline at end of file
diff --git a/llvm/test/CodeGen/X86/pr108731.ll b/llvm/test/CodeGen/X86/pr108731.ll
index 2983d108eaedd..bda90117a1be4 100644
--- a/llvm/test/CodeGen/X86/pr108731.ll
+++ b/llvm/test/CodeGen/X86/pr108731.ll
@@ -17,9 +17,9 @@ define i64 @test_i64(i64 %w, i64 %x, i64 %y, i64 %z) {
 ; BMI-LABEL: test_i64:
 ; BMI:       # %bb.0: # %Entry
 ; BMI-NEXT:    andq %rdx, %rsi
-; BMI-NEXT:    andnq %rdi, %rsi, %rax
-; BMI-NEXT:    andnq %rcx, %rdx, %rcx
-; BMI-NEXT:    andnq %rax, %rcx, %rax
+; BMI-NEXT:    andnq %rcx, %rdx, %rax
+; BMI-NEXT:    andnq %rdi, %rax, %rax
+; BMI-NEXT:    andnq %rax, %rsi, %rax
 ; BMI-NEXT:    retq
 Entry:
   %and1 = and i64 %y, %x
@@ -46,9 +46,9 @@ define i32 @test_i32(i32 %w, i32 %x, i32 %y, i32 %z) {
 ; BMI-LABEL: test_i32:
 ; BMI:       # %bb.0: # %Entry
 ; BMI-NEXT:    andl %edx, %esi
-; BMI-NEXT:    andnl %edi, %esi, %eax
-; BMI-NEXT:    andnl %ecx, %edx, %ecx
-; BMI-NEXT:    andnl %eax, %ecx, %eax
+; BMI-NEXT:    andnl %ecx, %edx, %eax
+; BMI-NEXT:    andnl %edi, %eax, %eax
+; BMI-NEXT:    andnl %eax, %esi, %eax
 ; BMI-NEXT:    retq
 Entry:
   %and1 = and i32 %y, %x

RKSimon

This looks like a general DAG reassociation combine, do other targets with ANDNOT see the same missed optimization?

manik-muk · 2025-10-03T06:46:31Z

That's right! I just tested some other architectures and found that they also miss the optimization. Would you like a more generic solution? I could modify DAGCombiner.cpp so that this generalizes.

RKSimon · 2025-10-03T08:15:10Z

Yes, most likely we can instead put something similar in DAGCombiner::reassociateOpsCommutative with a TLI.hasAndNot check
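To make the suggested shape concrete, here is a minimal toy sketch of a reassociation guarded by a target capability flag. It is not LLVM code: `Node`, `Op`, `reassociate`, and the `targetHasAndNot` parameter are hypothetical names invented for illustration, with the flag standing in for the `TLI.hasAndNot` check mentioned above. Like the X86 patch in this PR, it only matches the inner AND on the left and the NOT on the right.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical toy expression node; this is NOT LLVM's SDNode.
enum class Op { Leaf, Const, And, Xor, Not };

struct Node {
  Op op;
  uint64_t value;            // used by Leaf and Const
  std::shared_ptr<Node> lhs; // used by And, Xor, Not
  std::shared_ptr<Node> rhs; // used by And, Xor
};
using NodeP = std::shared_ptr<Node>;

NodeP leaf(uint64_t v) { return std::make_shared<Node>(Node{Op::Leaf, v, nullptr, nullptr}); }
NodeP constant(uint64_t v) { return std::make_shared<Node>(Node{Op::Const, v, nullptr, nullptr}); }
NodeP binary(Op op, NodeP l, NodeP r) { return std::make_shared<Node>(Node{op, 0, l, r}); }
NodeP notOf(NodeP x) { return std::make_shared<Node>(Node{Op::Not, 0, x, nullptr}); }

// Evaluate the toy tree on 64-bit values.
uint64_t eval(const NodeP &n) {
  switch (n->op) {
  case Op::Leaf:
  case Op::Const:
    return n->value;
  case Op::And:
    return eval(n->lhs) & eval(n->rhs);
  case Op::Xor:
    return eval(n->lhs) ^ eval(n->rhs);
  case Op::Not:
    return ~eval(n->lhs);
  }
  return 0;
}

// True for Xor nodes with a constant on either side.
bool isXorWithConst(const NodeP &n) {
  return n->op == Op::Xor &&
         (n->lhs->op == Op::Const || n->rhs->op == Op::Const);
}

// Rewrite And(And(Xor(K, a), b), Not(c)) into And(Xor(K, a), And(b, Not(c))),
// but only when the (hypothetical) target flag says an and-not exists.
// Returns the input node unchanged when the pattern does not match.
NodeP reassociate(const NodeP &n, bool targetHasAndNot) {
  if (!targetHasAndNot || n->op != Op::And)
    return n;
  const NodeP &inner = n->lhs;
  const NodeP &notOp = n->rhs;
  if (inner->op != Op::And || notOp->op != Op::Not)
    return n;

  NodeP xorOp, other;
  if (isXorWithConst(inner->lhs)) {
    xorOp = inner->lhs;
    other = inner->rhs;
  } else if (isXorWithConst(inner->rhs)) {
    xorOp = inner->rhs;
    other = inner->lhs;
  } else {
    return n;
  }
  return binary(Op::And, xorOp, binary(Op::And, other, notOp));
}
```

Calling `reassociate(tree, true)` on a matching tree returns a new root whose right operand is the `b & ~c` node, while `reassociate(tree, false)` hands back the same node. A real DAGCombiner change would also need to avoid undoing or re-triggering itself, which this sketch does not model.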

This moves the optimization from X86-specific code to the generic reassociateOpsCommutative function in DAGCombiner.cpp. The optimization transforms (Constant XOR a) & b & ~c -> (Constant XOR a) & (b & ~c) to allow ANDNOT operations to be done in parallel with XOR operations. This benefits all targets that have ANDNOT instructions (X86 BMI, ARM BIC, RISC-V, etc.) rather than being limited to X86 only.

- Remove X86-specific combineConstantXorAndAndNot function
- Add generic optimization to reassociateOpsCommutative with TLI.hasAndNot check
- Update test expectations for the new optimized output
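The rewrite relies only on AND being associative, so both forms compute the same value. What changes is the dependency chain: in the original form the and-not has to wait for the result of the XOR and the first AND, whereas in the reassociated form `b & ~c` does not depend on the XOR at all. The following is a small sanity-check sketch of that equivalence in plain C++. The function names are made up for illustration and nothing here uses LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Association as written in the new test file: ((K ^ a) & b) & ~c.
constexpr uint64_t originalForm(uint64_t K, uint64_t a, uint64_t b, uint64_t c) {
  return ((K ^ a) & b) & ~c;
}

// Association produced by the combine: (K ^ a) & (b & ~c).
constexpr uint64_t reassociatedForm(uint64_t K, uint64_t a, uint64_t b, uint64_t c) {
  return (K ^ a) & (b & ~c);
}

// Compile-time spot check using the constant from the i64 test (1234).
static_assert(originalForm(1234, 0xF0F0, 0xFFFF, 0x00FF) ==
                  reassociatedForm(1234, 0xF0F0, 0xFFFF, 0x00FF),
              "both associations must agree");

// Run-time check over every combination of 4-bit inputs.
bool formsAgreeOnSmallValues() {
  for (uint64_t K = 0; K < 16; ++K)
    for (uint64_t a = 0; a < 16; ++a)
      for (uint64_t b = 0; b < 16; ++b)
        for (uint64_t c = 0; c < 16; ++c)
          if (originalForm(K, a, b, c) != reassociatedForm(K, a, b, c))
            return false;
  return true;
}
```

Since all three operations act on each bit position independently, agreement on small inputs is a reasonable sanity check here, though it is no substitute for the FileCheck tests in the patch.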

arsenm · 2025-10-05T00:07:37Z

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Suggested change

isAllOnesConstant(N0.getOperand(1))) {

Redundant

arsenm · 2025-10-05T00:07:47Z

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

added optimization and tests

af5dcb5

manik-muk changed the title ~~added optimization and tests~~ Optimized Constant Xor And And Not Operation Oct 3, 2025

llvmbot added the backend:X86 label Oct 3, 2025

RKSimon requested review from RKSimon and phoebewang October 3, 2025 06:34

RKSimon reviewed Oct 3, 2025


manik-muk added 2 commits October 4, 2025 16:32

changed combiner logic to account for infinite loops

4a2e546

llvmbot added the llvm:SelectionDAG label Oct 4, 2025

manik-muk requested a review from RKSimon October 4, 2025 23:31

arsenm reviewed Oct 5, 2025


refactored to remove goto

7fb0e39

manik-muk requested a review from arsenm October 5, 2025 00:16
