[X86] combineTruncate - trunc(srl(load(p),amt)) -> load(p+amt/8) - ensure we merge the full / truncated load chains #166160

RKSimon · 2025-11-03T13:01:53Z

The full load might persist, so ensure that the chains are merged into a token factor instead of just transferring the chain to the new load.

Noticed while trying to fix the regression reported from #165540.
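
To illustrate why merging matters, here is a toy model of chain ordering. It is a sketch only: `Node`, `dependsOn`, `replaceChainUses` and `laterStoreStaysOrderedAfterWideLoad` are hypothetical names invented for this example and are not part of the SelectionDAG API. It assumes a later store was chained after the wide load, and that the wide load stays alive after the fold.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a chain-producing DAG node (NOT LLVM's SDNode).
struct Node {
  enum Kind { Entry, Load, Store, TokenFactor } kind;
  std::vector<Node *> chainOperands; // nodes this one must be ordered after
};

// True if `from` is ordered after `target` through chain operands.
bool dependsOn(const Node *from, const Node *target) {
  if (from == target)
    return true;
  for (const Node *op : from->chainOperands)
    if (dependsOn(op, target))
      return true;
  return false;
}

// Redirect every chain use of `oldChain` among `users` to `newChain`.
void replaceChainUses(std::vector<Node *> &users, Node *oldChain,
                      Node *newChain) {
  for (Node *user : users)
    for (Node *&op : user->chainOperands)
      if (op == oldChain)
        op = newChain;
}

// Builds entry -> wideLoad -> store, adds a narrow load on the same input
// chain, then rewires the store using one of the two strategies.
bool laterStoreStaysOrderedAfterWideLoad(bool mergeWithTokenFactor) {
  Node entry{Node::Entry, {}};
  Node wideLoad{Node::Load, {&entry}};
  Node narrowLoad{Node::Load, {&entry}}; // reuses the wide load's input chain
  Node store{Node::Store, {&wideLoad}};  // must not move above the wide load
  Node tokenFactor{Node::TokenFactor, {}};
  std::vector<Node *> users{&store};

  if (mergeWithTokenFactor) {
    // Modelled on the new behaviour: users depend on both loads.
    tokenFactor.chainOperands = {&wideLoad, &narrowLoad};
    replaceChainUses(users, &wideLoad, &tokenFactor);
  } else {
    // Modelled on the old behaviour: users depend on the narrow load only.
    replaceChainUses(users, &wideLoad, &narrowLoad);
  }
  return dependsOn(&store, &wideLoad);
}
```

In this model, `laterStoreStaysOrderedAfterWideLoad(false)` loses the ordering edge between the store and the surviving wide load, while `laterStoreStaysOrderedAfterWideLoad(true)` keeps it.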

llvmbot · 2025-11-03T13:02:25Z

@llvm/pr-subscribers-backend-x86

Author: Simon Pilgrim (RKSimon)

Changes

The full load might persist so ensure that the chains are merged into a token factor instead of just transferring the chain to the new load

Noticed while trying to fix the regression reported from #165540

Full diff: https://github.com/llvm/llvm-project/pull/166160.diff

1 Files Affected:

(modified) llvm/lib/Target/X86/X86ISelLowering.cpp (+1-2)

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 133406bd8e0d7..e5b2743f602da 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -54529,8 +54529,7 @@ static SDValue combineTruncate(SDNode *N, SelectionDAG &DAG,
         SDValue NewLoad =
             DAG.getLoad(VT, DL, Ld->getChain(), NewPtr, Ld->getPointerInfo(),
                         Align(), Ld->getMemOperand()->getFlags());
-        DAG.ReplaceAllUsesOfValueWith(Src.getOperand(0).getValue(1),
-                                      NewLoad.getValue(1));
+        DAG.makeEquivalentMemoryOrdering(Ld, NewLoad);
         return NewLoad;
       }
     }
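
For reference, the fold named in the title, trunc(srl(load(p),amt)) -> load(p+amt/8), relies on little-endian byte order. The standalone sketch below checks the equivalence for a 64-bit wide load narrowed to 32 bits. The helper names are hypothetical, and the preconditions (amt is a multiple of 8 and the narrow load stays inside the wide one) are assumptions for this example; the exact checks in combineTruncate are not visible in this diff.

```cpp
#include <cassert>
#include <cstdint>

// Assemble a 64-bit value from 8 bytes in little-endian order, as x86 does.
uint64_t loadLE64(const uint8_t *p) {
  uint64_t v = 0;
  for (int i = 0; i < 8; ++i)
    v |= static_cast<uint64_t>(p[i]) << (8 * i);
  return v;
}

// Assemble a 32-bit value from 4 bytes in little-endian order.
uint32_t loadLE32(const uint8_t *p) {
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i)
    v |= static_cast<uint32_t>(p[i]) << (8 * i);
  return v;
}

// The unfolded form: trunc(srl(load(p), amt)).
uint32_t truncShiftedWideLoad(const uint8_t *p, unsigned amt) {
  assert(amt < 64 && "shift amount must be smaller than the wide type");
  return static_cast<uint32_t>(loadLE64(p) >> amt);
}

// The folded form: load(p + amt/8).
uint32_t narrowLoadAtOffset(const uint8_t *p, unsigned amt) {
  assert(amt % 8 == 0 && amt <= 32 &&
         "assumed: byte-aligned shift, narrow load within the wide load");
  return loadLE32(p + amt / 8);
}
```

With the bytes 0x01 through 0x08 in memory and amt = 16, both forms read bytes 2 to 5 and yield 0x06050403.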

…gers (REAPPLIED) (#166176)

This patch allows us to narrow single bit-test/twiddle operations for larger than legal scalar integers to efficiently operate just on the i32 sub-integer block actually affected.

The BITOP(X,SHL(1,IDX)) patterns are split, with the IDX used to access the specific i32 block as well as the specific bit within that block.

BT comparisons are relatively simple, and build on the truncated shifted loads fold from #165266.

BTC/BTR/BTS bit twiddling patterns need to match the entire RMW pattern to safely confirm only one block is affected, but a similar approach is taken and creates codegen that should allow us to further merge with matching BT opcodes in a future patch (see #165291).

The resulting codegen is notably more efficient than the heavily micro-coded memory folded variants of BT/BTC/BTR/BTS.

There is still some work to improve the bit insert 'init' patterns included in bittest-big-integer.ll but I'm expecting this to be a straightforward future extension.

REAPPLIED from #165540 which was reverted due to a sanitizer regression that should have been fixed by #166160

Fixes #164225
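
The block/bit split described in that commit message can be sketched at the source level. This is an illustration of the arithmetic only, not the DAG combine itself: it assumes the large integer is stored as little-endian 32-bit words, and `testBit`, `twiddleBit` and `BitOp` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Which single-bit operation to apply (mirrors BTC / BTR / BTS).
enum class BitOp { Complement, Reset, Set };

// BT-style test: only the i32 block containing bit `idx` is read.
// words[0] holds bits 0..31 of the large integer.
bool testBit(const std::vector<uint32_t> &words, unsigned idx) {
  assert(idx / 32 < words.size() && "bit index out of range");
  uint32_t block = words[idx / 32];  // IDX selects the i32 block...
  return (block >> (idx % 32)) & 1u; // ...and the bit within that block
}

// BTC/BTR/BTS-style read-modify-write on the single affected block.
// Returns the previous value of the bit.
bool twiddleBit(std::vector<uint32_t> &words, unsigned idx, BitOp op) {
  assert(idx / 32 < words.size() && "bit index out of range");
  uint32_t &block = words[idx / 32];
  const uint32_t mask = 1u << (idx % 32);
  const bool old = (block & mask) != 0;
  switch (op) {
  case BitOp::Complement: block ^= mask; break;
  case BitOp::Reset:      block &= ~mask; break;
  case BitOp::Set:        block |= mask; break;
  }
  return old;
}
```

For a 128-bit value held in four words, setting bit 70 touches only `words[2]` (70 / 32 = 2) at bit position 6 (70 % 32 = 6); the other three words are neither read nor written.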

… (REAPPLIED) (#166337)

Insertion of a single bit into a large integer is typically canonicalized to "(X & ~(1 << ShAmt)) | (InsertBit << ShAmt)", which can be simplified to modify the i32 block as a BTR followed by an OR((i32)InsertBit << (ShAmt % 32)).

We must ensure that the InsertBit is zero apart from the LSB so we can cheaply truncate it to work with the i32 block like the simpler BT patterns.

REAPPLIED from #165742 which was reverted as part of a chain of commits due to a sanitizer regression that should have been fixed by #166160
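
The single-bit insertion can be checked against the canonical wide form with a small sketch. For portability it uses a 64-bit value split into two i32 blocks, whereas the patch targets larger-than-legal integers; all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Canonical wide form: (X & ~(1 << ShAmt)) | (InsertBit << ShAmt).
uint64_t insertBitWide(uint64_t x, unsigned shAmt, uint64_t insertBit) {
  assert(shAmt < 64 && insertBit <= 1);
  return (x & ~(uint64_t{1} << shAmt)) | (insertBit << shAmt);
}

// Narrowed form: clear the bit in the affected i32 block (the BTR step),
// then OR in the truncated InsertBit shifted by ShAmt % 32.
void insertBitInBlock(uint32_t *blocks, unsigned shAmt, uint32_t insertBit) {
  assert(shAmt < 64 && "two blocks assumed in this sketch");
  assert((insertBit & ~1u) == 0 && "InsertBit must be zero apart from the LSB");
  uint32_t &block = blocks[shAmt / 32];
  const unsigned bit = shAmt % 32;
  block &= ~(1u << bit);     // BTR
  block |= insertBit << bit; // OR
}

// Reassemble two little-endian i32 blocks into the 64-bit value.
uint64_t joinBlocks(const uint32_t *blocks) {
  return blocks[0] | (static_cast<uint64_t>(blocks[1]) << 32);
}
```

Starting from blocks {0x00000000, 0xFFFFFFFF}, inserting 0 at ShAmt = 40 clears bit 8 of the upper block (giving 0xFFFFFEFF) and leaves the lower block untouched, matching the wide form.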

llvmbot added the backend:X86 label Nov 3, 2025

RKSimon enabled auto-merge (squash) November 3, 2025 13:02

RKSimon merged commit 8395343 into llvm:main Nov 3, 2025
11 of 12 checks passed

RKSimon deleted the x86-trunc-load-merge-chain branch November 3, 2025 13:57

RKSimon mentioned this pull request Nov 3, 2025

[X86] Narrow BT/BTC/BTR/BTS compare + RMW patterns on very large integers (REAPPLIED) #166176

Merged

RKSimon mentioned this pull request Nov 3, 2025

Revert "[X86] Narrow BT/BTC/BTR/BTS compare + RMW patterns on very large integers (#165540)" #165979

Merged

RKSimon mentioned this pull request Nov 4, 2025

[X86] narrowBitOpRMW - add handling for single bit insertion patterns (REAPPLIED) #166337

Merged
