@manik-muk manik-muk commented Oct 3, 2025

resolves #161630


github-actions bot commented Oct 3, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@manik-muk manik-muk changed the title from "added optimization and tests" to "Optimized Constant Xor And And Not Operation" Oct 3, 2025
@llvmbot
Member

llvmbot commented Oct 3, 2025

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-x86

Author: None (manik-muk)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/161784.diff

3 Files Affected:

  • (modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+63)
  • (added) llvm/test/CodeGen/X86/constant-xor-and-andnot.ll (+63)
  • (modified) llvm/test/CodeGen/X86/pr108731.ll (+6-6)
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index cdc97faf394ca..6c562ccf5b363 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -51541,6 +51541,64 @@ static SDValue combineBMILogicOp(SDNode *N, SelectionDAG &DAG,
   return SDValue();
 }
 
+/// Optimize (Constant XOR a) & b & ~c -> (Constant XOR a) & (b & ~c)
+/// This allows the andn operation to be done in parallel with the xor
+static SDValue combineConstantXorAndAndNot(SDNode *N, const SDLoc &DL,
+                                           SelectionDAG &DAG,
+                                           const X86Subtarget &Subtarget) {
+  using namespace llvm::SDPatternMatch;
+
+  EVT VT = N->getValueType(0);
+  // Only handle scalar integer types that support BMI instructions
+  if (!Subtarget.hasBMI() || (VT != MVT::i32 && VT != MVT::i64))
+    return SDValue();
+
+  SDValue N0 = N->getOperand(0);
+  SDValue N1 = N->getOperand(1);
+
+  // Check if N0 is AND(XOR(Constant, a), b)
+  if (N0.getOpcode() != ISD::AND)
+    return SDValue();
+
+  SDValue AndLHS = N0.getOperand(0);
+  SDValue AndRHS = N0.getOperand(1);
+
+  // Check if one operand is XOR(Constant, a)
+  SDValue XorOp, OtherOp;
+  if (AndLHS.getOpcode() == ISD::XOR) {
+    XorOp = AndLHS;
+    OtherOp = AndRHS;
+  } else if (AndRHS.getOpcode() == ISD::XOR) {
+    XorOp = AndRHS;
+    OtherOp = AndLHS;
+  } else {
+    return SDValue();
+  }
+
+  // Check if XOR has a constant operand
+  if (!isa<ConstantSDNode>(XorOp.getOperand(0)) &&
+      !isa<ConstantSDNode>(XorOp.getOperand(1))) {
+    return SDValue();
+  }
+
+  // Check if N1 is NOT(c) - i.e., XOR(c, -1)
+  SDValue NotOp;
+  if (N1.getOpcode() == ISD::XOR && isAllOnesConstant(N1.getOperand(1))) {
+    NotOp = N1.getOperand(0);
+  } else {
+    return SDValue();
+  }
+
+  // Transform: AND(AND(XOR(Constant, a), b), NOT(c))
+  // To: AND(XOR(Constant, a), AND(b, NOT(c)))
+  // This allows the andn (b & ~c) to be done in parallel with the xor
+
+  // Create AND(b, NOT(c)) - this will become andn
+  SDValue NewAnd = DAG.getNode(ISD::AND, DL, VT, OtherOp, N1);
+  // Create final AND(XOR(Constant, a), AND(b, NOT(c)))
+  return DAG.getNode(ISD::AND, DL, VT, XorOp, NewAnd);
+}
+
 /// Fold AND(Y, XOR(X, NEG(X))) -> ANDN(Y, BLSMSK(X)) if BMI is available.
 static SDValue combineAndXorSubWithBMI(SDNode *And, const SDLoc &DL,
                                        SelectionDAG &DAG,
@@ -51833,6 +51891,11 @@ static SDValue combineAnd(SDNode *N, SelectionDAG &DAG,
   if (SDValue R = combineAndNotOrIntoAndNotAnd(N, dl, DAG))
     return R;
 
+  // Optimize (Constant XOR a) & b & ~c -> (Constant XOR a) & (b & ~c)
+  // This allows the andn operation to be done in parallel with the xor
+  if (SDValue R = combineConstantXorAndAndNot(N, dl, DAG, Subtarget))
+    return R;
+
   // fold (and (mul x, c1), c2) -> (mul x, (and c1, c2))
   // iff c2 is all/no bits mask - i.e. a select-with-zero mask.
   // TODO: Handle PMULDQ/PMULUDQ/VPMADDWD/VPMADDUBSW?
diff --git a/llvm/test/CodeGen/X86/constant-xor-and-andnot.ll b/llvm/test/CodeGen/X86/constant-xor-and-andnot.ll
new file mode 100644
index 0000000000000..5a4d931d29896
--- /dev/null
+++ b/llvm/test/CodeGen/X86/constant-xor-and-andnot.ll
@@ -0,0 +1,63 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=x86_64-unknown-unknown -mattr=+bmi < %s | FileCheck %s
+
+; Test the optimization described in issue #161630:
+; (Constant XOR a) & b & ~c should compile to allow andn to be done in parallel with xor
+
+define i64 @test_constant_xor_and_andnot(i64 %a, i64 %b, i64 %c) {
+; CHECK-LABEL: test_constant_xor_and_andnot:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorq $1234, %rdi # imm = 0x4D2
+; CHECK-NEXT:    andnq %rsi, %rdx, %rax
+; CHECK-NEXT:    andq %rdi, %rax
+; CHECK-NEXT:    retq
+  %xor = xor i64 %a, 1234
+  %and1 = and i64 %xor, %b
+  %not_c = xor i64 %c, -1
+  %result = and i64 %and1, %not_c
+  ret i64 %result
+}
+
+define i32 @test_constant_xor_and_andnot_32(i32 %a, i32 %b, i32 %c) {
+; CHECK-LABEL: test_constant_xor_and_andnot_32:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorl $5678, %edi # imm = 0x162E
+; CHECK-NEXT:    andnl %esi, %edx, %eax
+; CHECK-NEXT:    andl %edi, %eax
+; CHECK-NEXT:    retq
+  %xor = xor i32 %a, 5678
+  %and1 = and i32 %xor, %b
+  %not_c = xor i32 %c, -1
+  %result = and i32 %and1, %not_c
+  ret i32 %result
+}
+
+; Test with different operand order
+define i64 @test_constant_xor_and_andnot_swapped(i64 %a, i64 %b, i64 %c) {
+; CHECK-LABEL: test_constant_xor_and_andnot_swapped:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorq $1234, %rdi # imm = 0x4D2
+; CHECK-NEXT:    andnq %rsi, %rdx, %rax
+; CHECK-NEXT:    andq %rdi, %rax
+; CHECK-NEXT:    retq
+  %xor = xor i64 %a, 1234
+  %and1 = and i64 %b, %xor
+  %not_c = xor i64 %c, -1
+  %result = and i64 %and1, %not_c
+  ret i64 %result
+}
+
+; Test with different operand order for the final AND
+define i64 @test_constant_xor_and_andnot_final_swapped(i64 %a, i64 %b, i64 %c) {
+; CHECK-LABEL: test_constant_xor_and_andnot_final_swapped:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    xorq $1234, %rdi # imm = 0x4D2
+; CHECK-NEXT:    andq %rsi, %rdi
+; CHECK-NEXT:    andnq %rdi, %rdx, %rax
+; CHECK-NEXT:    retq
+  %xor = xor i64 %a, 1234
+  %and1 = and i64 %xor, %b
+  %not_c = xor i64 %c, -1
+  %result = and i64 %not_c, %and1
+  ret i64 %result
+}
\ No newline at end of file
diff --git a/llvm/test/CodeGen/X86/pr108731.ll b/llvm/test/CodeGen/X86/pr108731.ll
index 2983d108eaedd..bda90117a1be4 100644
--- a/llvm/test/CodeGen/X86/pr108731.ll
+++ b/llvm/test/CodeGen/X86/pr108731.ll
@@ -17,9 +17,9 @@ define i64 @test_i64(i64 %w, i64 %x, i64 %y, i64 %z) {
 ; BMI-LABEL: test_i64:
 ; BMI:       # %bb.0: # %Entry
 ; BMI-NEXT:    andq %rdx, %rsi
-; BMI-NEXT:    andnq %rdi, %rsi, %rax
-; BMI-NEXT:    andnq %rcx, %rdx, %rcx
-; BMI-NEXT:    andnq %rax, %rcx, %rax
+; BMI-NEXT:    andnq %rcx, %rdx, %rax
+; BMI-NEXT:    andnq %rdi, %rax, %rax
+; BMI-NEXT:    andnq %rax, %rsi, %rax
 ; BMI-NEXT:    retq
 Entry:
   %and1 = and i64 %y, %x
@@ -46,9 +46,9 @@ define i32 @test_i32(i32 %w, i32 %x, i32 %y, i32 %z) {
 ; BMI-LABEL: test_i32:
 ; BMI:       # %bb.0: # %Entry
 ; BMI-NEXT:    andl %edx, %esi
-; BMI-NEXT:    andnl %edi, %esi, %eax
-; BMI-NEXT:    andnl %ecx, %edx, %ecx
-; BMI-NEXT:    andnl %eax, %ecx, %eax
+; BMI-NEXT:    andnl %ecx, %edx, %eax
+; BMI-NEXT:    andnl %edi, %eax, %eax
+; BMI-NEXT:    andnl %eax, %esi, %eax
 ; BMI-NEXT:    retq
 Entry:
   %and1 = and i32 %y, %x

@RKSimon RKSimon requested review from RKSimon and phoebewang October 3, 2025 06:34
Collaborator

@RKSimon RKSimon left a comment

This looks like a general DAG reassociation combine. Do other targets with ANDNOT see the same missed optimization?

@manik-muk
Author

That's right! I just tested some other architectures and found that they also miss the optimization. Would you like a more general solution? I could modify DAGCombiner.cpp so that the combine applies to every target.

@RKSimon
Collaborator

RKSimon commented Oct 3, 2025

Yes, most likely we can instead put something similar in DAGCombiner::reassociateOpsCommutative with a TLI.hasAndNot check
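
To make the suggestion concrete, here is a minimal sketch of a target-gated reassociation on a toy expression tree. Everything in it is hypothetical: `Node`, `make`, `reassociateAndNot`, and `depth` are illustrative stand-ins rather than LLVM APIs, and the `hasAndNot` flag only mimics the role a `TLI.hasAndNot` check would play. It is not the code from this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>
#include <memory>
#include <utility>

// Toy stand-ins for DAG nodes; none of these types come from LLVM.
enum class Op { Leaf, Const, And, Xor, Not };

struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
  Op op;
  NodePtr lhs, rhs;
};

NodePtr make(Op op, NodePtr l = nullptr, NodePtr r = nullptr) {
  return std::make_shared<Node>(Node{op, std::move(l), std::move(r)});
}

// True for XOR(Constant, x) or XOR(x, Constant).
bool isXorWithConst(const NodePtr &n) {
  return n->op == Op::Xor &&
         (n->lhs->op == Op::Const || n->rhs->op == Op::Const);
}

// Rewrites AND(AND(XOR(K, a), b), NOT(c)) into AND(XOR(K, a), AND(b, NOT(c))).
// Both operand orders of the outer and inner AND are accepted.
// Returns nullptr when the pattern does not match or the (hypothetical)
// target has no and-not instruction.
NodePtr reassociateAndNot(const NodePtr &n, bool hasAndNot) {
  if (!hasAndNot || n->op != Op::And)
    return nullptr;
  for (bool outerSwapped : {false, true}) {
    const NodePtr &inner = outerSwapped ? n->rhs : n->lhs;
    const NodePtr &notOp = outerSwapped ? n->lhs : n->rhs;
    if (inner->op != Op::And || notOp->op != Op::Not)
      continue;
    for (bool innerSwapped : {false, true}) {
      const NodePtr &xorOp = innerSwapped ? inner->rhs : inner->lhs;
      const NodePtr &other = innerSwapped ? inner->lhs : inner->rhs;
      if (isXorWithConst(xorOp))
        return make(Op::And, xorOp, make(Op::And, other, notOp));
    }
  }
  return nullptr;
}

// Length of the longest dependency chain, in operations.
// Assumption: a NOT feeding an AND folds into one and-not instruction,
// so the NOT node itself is counted as free.
int depth(const NodePtr &n) {
  if (!n || n->op == Op::Leaf || n->op == Op::Const)
    return 0;
  int self = (n->op == Op::Not) ? 0 : 1;
  return self + std::max(depth(n->lhs), depth(n->rhs));
}
```

Under the cost assumption stated in `depth`, the tree for `((K ^ a) & b) & ~c` has a chain of three operations, and the rewritten tree has a chain of two, because the XOR and the and-not no longer depend on each other.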

This moves the optimization from X86-specific code to the generic
reassociateOpsCommutative function in DAGCombiner.cpp. The optimization
transforms (Constant XOR a) & b & ~c -> (Constant XOR a) & (b & ~c)
to allow ANDNOT operations to be done in parallel with XOR operations.

This benefits all targets that have ANDNOT instructions (X86 BMI, ARM BIC,
RISC-V, etc.) rather than being limited to X86 only.

- Remove X86-specific combineConstantXorAndAndNot function
- Add generic optimization to reassociateOpsCommutative with TLI.hasAndNot check
- Update test expectations for the new optimized output
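
The rewrite described in this commit message relies only on AND being associative, so both forms compute the same value. The sketch below checks that on plain 64-bit integers; the constant `K` and the function names are illustrative assumptions (1234 is borrowed from the test file above), and nothing here is taken from DAGCombiner.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constant standing in for "Constant" in the description.
constexpr std::uint64_t K = 1234;

// Original association: the outer AND waits on the inner AND, which in turn
// waits on the XOR, giving a serial chain XOR -> AND -> AND(NOT).
constexpr std::uint64_t leftAssociated(std::uint64_t a, std::uint64_t b,
                                       std::uint64_t c) {
  return ((K ^ a) & b) & ~c;
}

// Reassociated form: (K ^ a) and (b & ~c) are independent of each other,
// so a target with an and-not instruction can compute them in parallel
// before the final AND.
constexpr std::uint64_t reassociated(std::uint64_t a, std::uint64_t b,
                                     std::uint64_t c) {
  return (K ^ a) & (b & ~c);
}

// Compile-time spot check that the two forms agree.
static_assert(leftAssociated(0xF0F0, 0xFFFF, 0x00FF) ==
                  reassociated(0xF0F0, 0xFFFF, 0x00FF),
              "reassociation must not change the result");
```

The values are identical for every input; only the shape of the dependency graph changes, which is what lets the and-not overlap with the XOR.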
@llvmbot llvmbot added the llvm:SelectionDAG (SelectionDAGISel as well) label Oct 4, 2025
@manik-muk manik-muk requested a review from RKSimon October 4, 2025 23:31
Comment on lines 7517 to 7518
Contributor

Suggested change
isAllOnesConstant(N0.getOperand(1))) {

Redundant

Contributor

no goto

@manik-muk manik-muk requested a review from arsenm October 5, 2025 00:16
Labels
backend:X86 llvm:SelectionDAG SelectionDAGISel as well
Development

Successfully merging this pull request may close these issues.

(Constant <OP> a) & b & ~c should compile to (Constant <OP> a) & (b & ~c)
4 participants