[ARM] Fix `ARMTTIImpl::getNumMemOps` destination alignment calculation by osa1 · Pull Request #199649 · llvm/llvm-project

osa1 · 2026-05-26T10:18:33Z

ARMTTIImpl::getNumMemOps's destination alignment calculation was out of sync with the calculation in SelectionDAG and GlobalISel, so it attributed the wrong cost to inlined memcpys and memmoves in some cases.

Relevant isel code:

LegalizerHelper::lowerMemmove (line 11,088) and lowerMemcpy (line 10,979) have the same code: Align Alignment = std::min(DstAlign, SrcAlign);
SelectionDAGBuilder::visitIntrinsicCall, memmove and memcpy cases (lines 6,407 and 6,745), have the same: Align Alignment = std::min(DstAlign, SrcAlign);
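As a minimal sketch of what that shared line does, the snippet below merges the two alignments into a single value. `AlignBytes` and `mergedAlignment` are hypothetical stand-ins introduced for illustration (plain byte counts instead of `llvm::Align`), not names from the LLVM tree.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for llvm::Align: a power-of-two alignment in bytes.
using AlignBytes = std::uint64_t;

// Mirrors `std::min(DstAlign, SrcAlign)` from the isel code quoted above:
// the copy is lowered as if both pointers had the weaker alignment.
constexpr AlignBytes mergedAlignment(AlignBytes DstAlign, AlignBytes SrcAlign) {
  return std::min(DstAlign, SrcAlign);
}

// Destination align 4, source align 1: the merged alignment is 1.
static_assert(mergedAlignment(4, 1) == 1, "weaker alignment wins");
// Equal alignments are unchanged by the merge.
static_assert(mergedAlignment(8, 8) == 8, "equal alignments are kept");
```

With the merged value, the destination alignment can never exceed the source alignment.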

memcpy cost model test updated with the new cost. The test was actually incorrect before: cost 4 is attributed to libcalls by

InstructionCost ARMTTIImpl::getMemcpyCost(const Instruction *I) const {
  int NumOps = getNumMemOps(cast<IntrinsicInst>(I));

  // To model the cost of a library call, we assume 1 for the call, and
  // 3 for the argument setup.
  if (NumOps == -1)
    return 4;
  return NumOps;
}

The code above the test expectations shows the compiled code, which is 2 operations:

ldrb  r1, [r1]
strb  r1, [r0]

So the cost should've been 2 instead of 4, which is fixed with this commit.
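The mapping from operation count to cost can be sketched in isolation. This is a standalone model of the quoted `getMemcpyCost` logic, not the LLVM function itself; `kLibcall` and `memcpyCost` are hypothetical names used only here.

```cpp
// Hypothetical name for the -1 sentinel that getNumMemOps returns when the
// copy is not expected to be inlined.
constexpr int kLibcall = -1;

// Standalone model of the quoted getMemcpyCost: a libcall costs 4
// (1 for the call plus 3 for argument setup), otherwise the cost is the
// number of memory operations.
constexpr int memcpyCost(int NumOps) {
  return NumOps == kLibcall ? 4 : NumOps;
}

// Treated as a libcall: cost 4 (the old test expectation).
static_assert(memcpyCost(kLibcall) == 4, "libcall cost");
// Inlined as ldrb + strb: cost 2 (the new test expectation).
static_assert(memcpyCost(2) == 2, "inline cost equals op count");
```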

AI was used to navigate the code base. All code changes are done by the author.

github-actions · 2026-05-26T10:18:55Z

Hello @osa1 👋

Thank you for submitting a Pull Request (PR) to the LLVM Project. Since this is your first PR, here are a few useful links covering our main contribution policies and review practices.

All contributions to LLVM must follow our LLVM AI Tool Use Policy. In particular, if you used AI while working on this PR, remember to add a note to the PR description.
The LLVM Code-Review Policy and Practices document contains practical information about the PR process, including how patches are reviewed and accepted, and who can review a PR.
Our LLVM Developer Policy describes our expectations for code quality, commit summaries and contains notes on our CI system.

Please reply to this message to confirm that you have read these policies, especially the LLVM AI Tool Use Policy, and that any AI tool usage has been noted in the PR description.

Frequently asked questions

How do I add reviewers?

This PR will be automatically labeled, and the relevant teams will be notified. For some parts of the project, reviewers may also be added automatically.

You can also add reviewers manually using the Reviewers section on this page. If you cannot use that section, it is probably because you do not have write permissions for the repository. In that case, you can request a review by tagging reviewers in a comment using @ followed by their GitHub username.

What if there are no comments?

If you have not received any comments on your PR after a week, you can request a review by pinging the PR with a comment such as “Ping”. The common courtesy ping rate is once a week. Please remember that you are asking for volunteer time from other developers.

Are any special GitHub settings required to contribute to LLVM?

We only require contributors to have a public email address associated with their GitHub commits, see this section of LLVM Developer Policy for details.

If you have questions, feel free to leave a comment on this PR, or ask on LLVM Discord or LLVM Discourse.

Thank you,
The LLVM Community

llvmorg-github-actions · 2026-05-26T10:19:31Z

@llvm/pr-subscribers-backend-arm

@llvm/pr-subscribers-llvm-analysis

Author: Ömer Sinan Ağacan (osa1)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/199649.diff

2 Files Affected:

(modified) llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp (+1-1)
(modified) llvm/test/Analysis/CostModel/ARM/memcpy.ll (+1-1)

diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
index ab48a0c4ba39c..7b13ac5693762 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
@@ -1226,8 +1226,8 @@ int ARMTTIImpl::getNumMemOps(const IntrinsicInst *I) const {
       return -1;
 
     const unsigned Size = C->getValue().getZExtValue();
-    const Align DstAlign = MC->getDestAlign().valueOrOne();
     const Align SrcAlign = MC->getSourceAlign().valueOrOne();
+    const Align DstAlign = std::min(MC->getDestAlign().valueOrOne(), SrcAlign);
 
     MOp = MemOp::Copy(Size, /*DstAlignCanChange*/ false, DstAlign, SrcAlign,
                       /*IsVolatile*/ false);
diff --git a/llvm/test/Analysis/CostModel/ARM/memcpy.ll b/llvm/test/Analysis/CostModel/ARM/memcpy.ll
index f397397125c05..690d6b5a1bf8a 100644
--- a/llvm/test/Analysis/CostModel/ARM/memcpy.ll
+++ b/llvm/test/Analysis/CostModel/ARM/memcpy.ll
@@ -742,7 +742,7 @@ define void @memcpy_1_al41(ptr %d, ptr %s) {
 ; strb  r1, [r0]
 ;
 ; COMMON-LABEL: 'memcpy_1_al41'
-; COMMON-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: call void @llvm.memcpy.p0.p0.i32(ptr align 4 %d, ptr align 1 %s, i32 1, i1 false)
+; COMMON-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: call void @llvm.memcpy.p0.p0.i32(ptr align 4 %d, ptr align 1 %s, i32 1, i1 false)
 ; COMMON-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret void
 ;
 entry:

osa1 · 2026-05-26T10:29:14Z

> Please reply to this message to confirm that you have read these policies, especially the LLVM AI Tool Use Policy, and that any AI tool usage has been noted in the PR description.

I've read the AI tool use policy and added a line to the PR description re: how I used AI tools for this change.

efriedma-quic

This seems like it makes the code even more confusing... and it's moving in the wrong direction.

Why are we estimating 4 in the first place? Increasing the alignment of the destination shouldn't make the copy more expensive.

davemgreen · 2026-05-27T07:51:02Z

I assume the problem is coming from this?

bool TargetLowering::findOptimalMemOpLowering(
    LLVMContext &Context, std::vector<EVT> &MemOps, unsigned Limit,
    const MemOp &Op, unsigned DstAS, unsigned SrcAS,
    const AttributeList &FuncAttributes, EVT *LargestVT) const {
  if (Limit != ~unsigned(0) && Op.isMemcpyWithFixedDstAlign() &&
      Op.getSrcAlign() < Op.getDstAlign())
    return false;

I didn't look very deeply, but I wasn't sure why it was limited to Op.getSrcAlign() < Op.getDstAlign(), other than there being some code below it that assumed that was the case. For the exact test case that you are changing, I believe it should not be useful for align to be more than Size under Arm (providing it is a power of 2).
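One possible reading of that last remark, sketched under assumptions: when Size is a power of two, alignment beyond Size buys nothing, so the alignment could be capped at Size. `isPowerOfTwo` and `usefulAlignment` are hypothetical helpers written for this illustration; nothing like them is proposed in the patch.

```cpp
#include <algorithm>

// True when V has exactly one bit set.
constexpr bool isPowerOfTwo(unsigned V) { return V != 0 && (V & (V - 1)) == 0; }

// Hypothetical helper: cap the alignment at Size when Size is a power of
// two, and leave it untouched otherwise. This encodes one interpretation of
// the comment above, not existing LLVM behaviour.
constexpr unsigned usefulAlignment(unsigned Align, unsigned Size) {
  return isPowerOfTwo(Size) ? std::min(Align, Size) : Align;
}

// The test case in this PR: 1-byte copy with destination align 4.
static_assert(usefulAlignment(4, 1) == 1, "align capped at Size");
// Size 6 is not a power of two, so the alignment is left alone.
static_assert(usefulAlignment(4, 6) == 4, "non-power-of-two Size");
```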

osa1 · 2026-05-27T10:11:28Z

Thanks for the reviews.

@efriedma-quic

> This seems like it makes the code even more confusing... and it's moving in the wrong direction.

Please elaborate. What's more confusing about this two line change? Do you agree that this fixes a bug? What's the right direction we should be taking here?

> Why are we estimating 4 in the first place?

As explained in the PR description, this code attributes a cost of 4 to libcalls:

llvm-project/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp

Lines 1283 to 1292 in a975b7c

InstructionCost ARMTTIImpl::getMemcpyCost(const Instruction *I) const {
  int NumOps = getNumMemOps(cast<IntrinsicInst>(I));

  // To model the cost of a library call, we assume 1 for the call, and
  // 3 for the argument setup.
  if (NumOps == -1)
    return 4;
  return NumOps;
}

The comments in the code explain why it's 4.

NumOps is -1 in this function because the memcpy call in the test has different source and destination alignments:

llvm-project/llvm/test/Analysis/CostModel/ARM/memcpy.ll

Line 749 in a975b7c

call void @llvm.memcpy.p0.p0.i32(ptr align 4 %d, ptr align 1 %s, i32 1, i1 false)

Which makes findOptimalMemOpLowering return false:

llvm-project/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

Lines 217 to 223 in a975b7c

bool TargetLowering::findOptimalMemOpLowering(
    LLVMContext &Context, std::vector<EVT> &MemOps, unsigned Limit,
    const MemOp &Op, unsigned DstAS, unsigned SrcAS,
    const AttributeList &FuncAttributes, EVT *LargestVT) const {
  if (Limit != ~unsigned(0) && Op.isMemcpyWithFixedDstAlign() &&
      Op.getSrcAlign() < Op.getDstAlign())
    return false;

But in reality the op is lowered as inline instructions, so the cost model is out of sync with instruction selection.

This patch syncs them by using the same destination alignment values used by the isels.
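The effect on the test case can be traced with a heavily simplified model. Everything below is a hypothetical sketch: `modelNumMemOps` and `modelMemcpyCost` are illustrative names, the model ignores the Limit threshold, and it assumes accesses of at most 4 bytes with one load plus one store per chunk.

```cpp
#include <algorithm>

constexpr int kLibcall = -1; // hypothetical name for the -1 sentinel

// Simplified model of getNumMemOps for a fixed-size memcpy.
// MergeAlignments == true models the one-line change in this PR.
constexpr int modelNumMemOps(unsigned Size, unsigned DstAlign,
                             unsigned SrcAlign, bool MergeAlignments) {
  if (MergeAlignments)
    DstAlign = std::min(DstAlign, SrcAlign);
  // Early return quoted above from findOptimalMemOpLowering, assuming a
  // finite Limit and a fixed destination alignment.
  if (SrcAlign < DstAlign)
    return kLibcall;
  // Assumption: cover Size greedily with the widest power-of-two access
  // (at most 4 bytes) that the alignment and remaining size allow.
  unsigned Align = std::min(DstAlign, SrcAlign);
  unsigned Chunks = 0;
  unsigned Remaining = Size;
  while (Remaining > 0) {
    unsigned Width = std::min(4u, Align);
    while (Width > Remaining)
      Width /= 2;
    Remaining -= Width;
    ++Chunks;
  }
  // Assumption: each chunk is one load plus one store.
  return static_cast<int>(Chunks * 2);
}

constexpr int modelMemcpyCost(unsigned Size, unsigned DstAlign,
                              unsigned SrcAlign, bool MergeAlignments) {
  const int NumOps = modelNumMemOps(Size, DstAlign, SrcAlign, MergeAlignments);
  return NumOps == kLibcall ? 4 : NumOps;
}

// memcpy_1_al41: size 1, destination align 4, source align 1.
static_assert(modelMemcpyCost(1, 4, 1, false) == 4, "before: libcall cost");
static_assert(modelMemcpyCost(1, 4, 1, true) == 2, "after: ldrb + strb");
```

In this model the merge only changes the result when the source alignment is smaller than the destination alignment; otherwise the early return is not taken either way.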

> Increasing the alignment of the destination shouldn't make the copy more expensive.

It's making it more expensive indirectly: findOptimalMemOpLowering treats the operation as a libcall when the source alignment is smaller than the destination alignment. I'm not sure why it does that, but see below for more details on findOptimalMemOpLowering. (In short: it's the default implementation shared by multiple backends, so I think it has to be conservative.)

@davemgreen that's the code path that's taken here, yes.

Note that this method is virtual:

llvm-project/llvm/include/llvm/CodeGen/TargetLowering.h

Lines 4250 to 4263 in a975b7c

/// Determines the optimal series of memory ops to replace the memset /
/// memcpy. Return true if the number of memory ops is below the threshold
/// (Limit). Note that this is always the case when Limit is ~0. It returns
/// the types of the sequence of memory ops to perform memset / memcpy by
/// reference. If LargestVT is non-null, the target may set it to the largest
/// EVT that should be used for generating the memset value (e.g., for vector
/// splats). If LargestVT is null or left unchanged, the caller will compute
/// it from MemOps.
virtual bool findOptimalMemOpLowering(LLVMContext &Context,
                                      std::vector<EVT> &MemOps,
                                      unsigned Limit, const MemOp &Op,
                                      unsigned DstAS, unsigned SrcAS,
                                      const AttributeList &FuncAttributes,
                                      EVT *LargestVT = nullptr) const;

Only AArch64 and SystemZ override it:

llvm-project/llvm/lib/Target/AArch64/AArch64ISelLowering.h

Lines 263 to 267 in a975b7c

bool findOptimalMemOpLowering(LLVMContext &Context, std::vector<EVT> &MemOps,
                              unsigned Limit, const MemOp &Op, unsigned DstAS,
                              unsigned SrcAS,
                              const AttributeList &FuncAttributes,
                              EVT *LargestVT = nullptr) const override;

So my understanding is that TargetLowering::findOptimalMemOpLowering (the default implementation) needs to cover all of the remaining architectures, and because of that it'll be too conservative for some of them.

Note also that GlobalISel does not use findOptimalMemOpLowering, but it has a function (not a virtual method) that's very similar: findGISelOptimalMemOpLowering:

llvm-project/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp

Lines 10712 to 10720 in a975b7c

// Returns a list of types to use for memory op lowering in MemOps. A partial
// port of findOptimalMemOpLowering in TargetLowering.
static bool findGISelOptimalMemOpLowering(std::vector<LLT> &MemOps,
                                          unsigned Limit, const MemOp &Op,
                                          unsigned DstAS, unsigned SrcAS,
                                          const AttributeList &FuncAttributes,
                                          const TargetLowering &TLI) {
  if (Op.isMemcpyWithFixedDstAlign() && Op.getSrcAlign() < Op.getDstAlign())
    return false;

For some reason the GlobalISel version does not have the limit check, but other than that the check is the same.
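The difference between the two early returns can be shown side by side. This is a sketch of just the two conditions quoted above, with plain integers in place of `MemOp`; `sdagBailsOut`, `giselBailsOut`, and `kNoLimit` are hypothetical names.

```cpp
// Stand-in for the `~unsigned(0)` "no limit" value in the quoted code.
constexpr unsigned kNoLimit = ~0u;

// Condition from TargetLowering::findOptimalMemOpLowering as quoted above.
constexpr bool sdagBailsOut(unsigned Limit, bool FixedDstAlign,
                            unsigned SrcAlign, unsigned DstAlign) {
  return Limit != kNoLimit && FixedDstAlign && SrcAlign < DstAlign;
}

// Condition from findGISelOptimalMemOpLowering as quoted above: the same
// check without the Limit test.
constexpr bool giselBailsOut(bool FixedDstAlign, unsigned SrcAlign,
                             unsigned DstAlign) {
  return FixedDstAlign && SrcAlign < DstAlign;
}

// With a finite limit both versions bail out for dst align 4, src align 1.
static_assert(sdagBailsOut(8, true, 1, 4), "SelectionDAG, finite limit");
static_assert(giselBailsOut(true, 1, 4), "GlobalISel");
// With no limit only the GlobalISel condition still bails out.
static_assert(!sdagBailsOut(kNoLimit, true, 1, 4), "SelectionDAG, no limit");
```

Neither condition fires once the destination alignment is no larger than the source alignment, which is the situation the merged alignment creates.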

The point of this PR is that, as explained in the description, getNumMemOps tries to count the number of instructions generated by the isels, but it does not compute the alignment values the same way, which causes a discrepancy. I'm updating the destination alignment the same way the isels do. With that, getNumMemOps generates the correct value in more cases, as demonstrated by the updated test.

> For the exact test case that you are changing, I believe it should not be useful for align to be more than Size under Arm (providing it is a power of 2).

I'm not sure what to do with this information yet, I'll have to think more... In the meantime if you have concrete change suggestions I'd be happy to incorporate them into this PR.

efriedma-quic · 2026-05-27T19:58:39Z

> > This seems like it makes the code even more confusing... and it's moving in the wrong direction.
>
> Please elaborate. What's more confusing about this two line change? Do you agree that this fixes a bug? What's the right direction we should be taking here?

We don't want to merge the source and destination alignments at all. The only reason SelectionDAG does it is because nobody has taken the time to change the code to pass down the separate alignment markings through the memcpy lowering code. (LLVM IR was changed to allow representing separate alignments relatively recently.) Doing alignment merging in more places is the opposite of the direction we want to move.

llvmorg-github-actions Bot added the backend:ARM and llvm:analysis (includes value tracking, cost tables and constant folding) labels May 26, 2026

fhahn requested a review from davemgreen May 26, 2026 10:25

fhahn requested review from efriedma-quic and jroelofs May 26, 2026 10:48

osa1 changed the title from "Fix ARMTTIImpl::getNumMemOps destination alignment calculation" to "[ARM] Fix ARMTTIImpl::getNumMemOps destination alignment calculation" May 26, 2026

Merge branch 'main' into getNumMemOps

9190c05

efriedma-quic reviewed May 26, 2026

osa1 wants to merge 2 commits into llvm:main from osa1:getNumMemOps
