
[ARM] Fix ARMTTIImpl::getNumMemOps destination alignment calculation#199649

Open
osa1 wants to merge 2 commits into llvm:main from osa1:getNumMemOps


@osa1
Contributor

@osa1 osa1 commented May 26, 2026

ARMTTIImpl::getNumMemOps's destination alignment calculation was out of sync with the calculations in SelectionDAG and GlobalISel, so it attributed the wrong cost to inlined memcpys and memmoves in some cases.

Relevant isel code:

  • LegalizerHelper::lowerMemmove (line 11,088) and lowerMemcpy (line 10,979) have the same code: Align Alignment = std::min(DstAlign, SrcAlign);

  • SelectionDAGBuilder::visitIntrinsicCall, memmove and memcpy cases (lines 6,407 and 6,745), have the same: Align Alignment = std::min(DstAlign, SrcAlign);

memcpy cost model test updated with the new cost. The test was actually incorrect before: cost 4 is attributed to libcalls by

InstructionCost ARMTTIImpl::getMemcpyCost(const Instruction *I) const {
  int NumOps = getNumMemOps(cast<IntrinsicInst>(I));

  // To model the cost of a library call, we assume 1 for the call, and
  // 3 for the argument setup.
  if (NumOps == -1)
    return 4;
  return NumOps;
}

The comments above the test expectations show the compiled code, which is 2 operations:

ldrb  r1, [r1]
strb  r1, [r0]

So the cost should've been 2 instead of 4, which is fixed with this commit.


AI was used to navigate the code base. All code changes are done by the author.

@github-actions

Hello @osa1 👋

Thank you for submitting a Pull Request (PR) to the LLVM Project. Since this is your first PR, here are a few useful links covering our main contribution policies and review practices.

  • All contributions to LLVM must follow our LLVM AI Tool Use Policy. In particular, if you used AI while working on this PR, remember to add a note to the PR description.
  • The LLVM Code-Review Policy and Practices document contains practical information about the PR process, including how patches are reviewed and accepted, and who can review a PR.
  • Our LLVM Developer Policy describes our expectations for code quality, commit summaries and contains notes on our CI system.

Please reply to this message to confirm that you have read these policies, especially the LLVM AI Tool Use Policy, and that any AI tool usage has been noted in the PR description.


Frequently asked questions

How do I add reviewers?

This PR will be automatically labeled, and the relevant teams will be notified. For some parts of the project, reviewers may also be added automatically.

You can also add reviewers manually using the Reviewers section on this page. If you cannot use that section, it is probably because you do not have write permissions for the repository. In that case, you can request a review by tagging reviewers in a comment using @ followed by their GitHub username.

What if there are no comments?

If you have not received any comments on your PR after a week, you can request a review by pinging the PR with a comment such as “Ping”. The common courtesy ping rate is once a week. Please remember that you are asking for volunteer time from other developers.

Are any special GitHub settings required to contribute to LLVM?

We only require contributors to have a public email address associated with their GitHub commits, see this section of LLVM Developer Policy for details.


If you have questions, feel free to leave a comment on this PR, or ask on LLVM Discord or LLVM Discourse.

Thank you,
The LLVM Community

@llvmorg-github-actions llvmorg-github-actions Bot added backend:ARM llvm:analysis Includes value tracking, cost tables and constant folding labels May 26, 2026
@llvmorg-github-actions

llvmorg-github-actions Bot commented May 26, 2026

@llvm/pr-subscribers-backend-arm

@llvm/pr-subscribers-llvm-analysis

Author: Ömer Sinan Ağacan (osa1)

Full diff: https://github.com/llvm/llvm-project/pull/199649.diff

2 Files Affected:

  • (modified) llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp (+1-1)
  • (modified) llvm/test/Analysis/CostModel/ARM/memcpy.ll (+1-1)
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
index ab48a0c4ba39c..7b13ac5693762 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
@@ -1226,8 +1226,8 @@ int ARMTTIImpl::getNumMemOps(const IntrinsicInst *I) const {
       return -1;
 
     const unsigned Size = C->getValue().getZExtValue();
-    const Align DstAlign = MC->getDestAlign().valueOrOne();
     const Align SrcAlign = MC->getSourceAlign().valueOrOne();
+    const Align DstAlign = std::min(MC->getDestAlign().valueOrOne(), SrcAlign);
 
     MOp = MemOp::Copy(Size, /*DstAlignCanChange*/ false, DstAlign, SrcAlign,
                       /*IsVolatile*/ false);
diff --git a/llvm/test/Analysis/CostModel/ARM/memcpy.ll b/llvm/test/Analysis/CostModel/ARM/memcpy.ll
index f397397125c05..690d6b5a1bf8a 100644
--- a/llvm/test/Analysis/CostModel/ARM/memcpy.ll
+++ b/llvm/test/Analysis/CostModel/ARM/memcpy.ll
@@ -742,7 +742,7 @@ define void @memcpy_1_al41(ptr %d, ptr %s) {
 ; strb  r1, [r0]
 ;
 ; COMMON-LABEL: 'memcpy_1_al41'
-; COMMON-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: call void @llvm.memcpy.p0.p0.i32(ptr align 4 %d, ptr align 1 %s, i32 1, i1 false)
+; COMMON-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: call void @llvm.memcpy.p0.p0.i32(ptr align 4 %d, ptr align 1 %s, i32 1, i1 false)
 ; COMMON-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret void
 ;
 entry:

@fhahn fhahn requested a review from davemgreen May 26, 2026 10:25
@osa1
Contributor Author

osa1 commented May 26, 2026

Please reply to this message to confirm that you have read these policies, especially the LLVM AI Tool Use Policy, and that any AI tool usage has been noted in the PR description.

I've read the AI tool use policy and added a line to the PR description re: how I used AI tools for this change.

@fhahn fhahn requested review from efriedma-quic and jroelofs May 26, 2026 10:48
@osa1 osa1 changed the title Fix ARMTTIImpl::getNumMemOps destination alignment calculation [ARM] Fix ARMTTIImpl::getNumMemOps destination alignment calculation May 26, 2026
Contributor

@efriedma-quic efriedma-quic left a comment

This seems like it makes the code even more confusing... and it's moving in the wrong direction.

Why are we estimating 4 in the first place? Increasing the alignment of the destination shouldn't make the copy more expensive.

@davemgreen
Contributor

I assume the problem is coming from this?

bool TargetLowering::findOptimalMemOpLowering(
    LLVMContext &Context, std::vector<EVT> &MemOps, unsigned Limit,
    const MemOp &Op, unsigned DstAS, unsigned SrcAS,
    const AttributeList &FuncAttributes, EVT *LargestVT) const {
  if (Limit != ~unsigned(0) && Op.isMemcpyWithFixedDstAlign() &&
      Op.getSrcAlign() < Op.getDstAlign())
    return false;

I didn't look very deeply, but I wasn't sure why it was limited to Op.getSrcAlign() < Op.getDstAlign(), other than there being some code below it that assumed it was the case. For the exact test case that you are changing, I believe it should not be useful for align to be more than Size under Arm (provided it is a power of 2).
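
One possible reading of the remark about alignment and Size can be sketched as follows. This is only an illustration, not a proposed change and not LLVM code: `isPowerOf2` and `capAlignAtSize` are hypothetical helpers, and alignments are modelled as plain unsigned byte counts instead of llvm::Align.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: true when V is a non-zero power of two.
bool isPowerOf2(unsigned V) { return V != 0 && (V & (V - 1)) == 0; }

// Hypothetical helper: treat any alignment above the copy size as no more
// useful than the size itself, but only when the size is a power of two.
unsigned capAlignAtSize(unsigned AlignBytes, unsigned SizeBytes) {
  return isPowerOf2(SizeBytes) ? std::min(AlignBytes, SizeBytes) : AlignBytes;
}

// For the test under discussion (size 1, dst align 4, src align 1), both
// capped alignments are 1, so the source is no longer less aligned than
// the destination.
void checkCappedAlignments() {
  assert(capAlignAtSize(4, 1) == 1);
  assert(capAlignAtSize(1, 1) == 1);
}
```

Under this assumption the `Op.getSrcAlign() < Op.getDstAlign()` condition would not fire for the 1-byte copy, without merging the two alignments into one value.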

@osa1
Contributor Author

osa1 commented May 27, 2026

Thanks for the reviews.


@efriedma-quic

This seems like it makes the code even more confusing... and it's moving in the wrong direction.

Please elaborate. What's more confusing about this two line change? Do you agree that this fixes a bug? What's the right direction we should be taking here?

Why are we estimating 4 in the first place?

It's explained in the PR description, this code is attributing 4 to libcalls:

InstructionCost ARMTTIImpl::getMemcpyCost(const Instruction *I) const {
  int NumOps = getNumMemOps(cast<IntrinsicInst>(I));

  // To model the cost of a library call, we assume 1 for the call, and
  // 3 for the argument setup.
  if (NumOps == -1)
    return 4;
  return NumOps;
}

The comments in the code are explaining why it's 4.

NumOps is -1 in this function because the memcpy call in the test has a source alignment smaller than its destination alignment:

call void @llvm.memcpy.p0.p0.i32(ptr align 4 %d, ptr align 1 %s, i32 1, i1 false)

Which makes findOptimalMemOpLowering return false:

bool TargetLowering::findOptimalMemOpLowering(
    LLVMContext &Context, std::vector<EVT> &MemOps, unsigned Limit,
    const MemOp &Op, unsigned DstAS, unsigned SrcAS,
    const AttributeList &FuncAttributes, EVT *LargestVT) const {
  if (Limit != ~unsigned(0) && Op.isMemcpyWithFixedDstAlign() &&
      Op.getSrcAlign() < Op.getDstAlign())
    return false;

But in reality the op is actually lowered as inline instructions, so it's out of sync with the instruction selection.

This patch syncs them by using the same destination alignment values used by the isels.
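
The chain described above can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the LLVM implementation: `CopyOp`, `rejectedByEarlyExit`, `modelCost`, `beforePatch` and `afterPatch` are hypothetical names, alignments are plain unsigned byte counts, the early exit is assumed to apply (fixed destination alignment, Limit not ~0), and the inline cost is hard-coded to the 2 operations shown in the test.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified stand-in for llvm::MemOp.
struct CopyOp {
  unsigned Size;
  unsigned DstAlign;
  unsigned SrcAlign;
};

// Models only the early exit quoted above.
bool rejectedByEarlyExit(const CopyOp &Op) {
  return Op.SrcAlign < Op.DstAlign;
}

// Models the mapping in getMemcpyCost: 4 for a libcall, otherwise the
// number of inline operations (assumed to be 2: one load, one store).
int modelCost(const CopyOp &Op) {
  const int InlineOps = 2;
  return rejectedByEarlyExit(Op) ? 4 : InlineOps;
}

// Before the patch: both alignments are passed through unchanged.
CopyOp beforePatch(unsigned Size, unsigned Dst, unsigned Src) {
  return {Size, Dst, Src};
}

// After the patch: the destination alignment is capped at the source
// alignment, mirroring std::min(DstAlign, SrcAlign) in the isels.
CopyOp afterPatch(unsigned Size, unsigned Dst, unsigned Src) {
  return {Size, std::min(Dst, Src), Src};
}

// memcpy_1_al41: size 1, destination align 4, source align 1.
void checkMemcpy1Al41() {
  assert(modelCost(beforePatch(1, 4, 1)) == 4);
  assert(modelCost(afterPatch(1, 4, 1)) == 2);
}
```

In this model the only thing that changes between the two cases is whether the early exit fires, which matches the cost moving from 4 to 2 in the updated test expectation.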

Increasing the alignment of the destination shouldn't make the copy more expensive.

It's making it more expensive indirectly: findOptimalMemOpLowering is considering the operation as a libcall when the source alignment is smaller than destination alignment. I'm not sure why it's doing that, but see more details below on findOptimalMemOpLowering. (in short: it's the default implementation that's shared by multiple backends, so I think it has to be conservative)


@davemgreen that's the code path that's taken here, yes.

Note that this method is virtual:

/// Determines the optimal series of memory ops to replace the memset /
/// memcpy. Return true if the number of memory ops is below the threshold
/// (Limit). Note that this is always the case when Limit is ~0. It returns
/// the types of the sequence of memory ops to perform memset / memcpy by
/// reference. If LargestVT is non-null, the target may set it to the largest
/// EVT that should be used for generating the memset value (e.g., for vector
/// splats). If LargestVT is null or left unchanged, the caller will compute
/// it from MemOps.
virtual bool findOptimalMemOpLowering(LLVMContext &Context,
                                      std::vector<EVT> &MemOps,
                                      unsigned Limit, const MemOp &Op,
                                      unsigned DstAS, unsigned SrcAS,
                                      const AttributeList &FuncAttributes,
                                      EVT *LargestVT = nullptr) const;

Only AArch64 and "System Z" override it:

bool findOptimalMemOpLowering(LLVMContext &Context, std::vector<EVT> &MemOps,
                              unsigned Limit, const MemOp &Op, unsigned DstAS,
                              unsigned SrcAS,
                              const AttributeList &FuncAttributes,
                              EVT *LargestVT = nullptr) const override;

So my understanding is that TargetLowering::findOptimalMemOpLowering (the default implementation) needs to cover all of the remaining architectures, and it'll be too conservative for some of them because of that.

Note also that GlobalISel does not use findOptimalMemOpLowering, but it has a function (not a virtual method) that's very similar: findGISelOptimalMemOpLowering:

// Returns a list of types to use for memory op lowering in MemOps. A partial
// port of findOptimalMemOpLowering in TargetLowering.
static bool findGISelOptimalMemOpLowering(std::vector<LLT> &MemOps,
                                          unsigned Limit, const MemOp &Op,
                                          unsigned DstAS, unsigned SrcAS,
                                          const AttributeList &FuncAttributes,
                                          const TargetLowering &TLI) {
  if (Op.isMemcpyWithFixedDstAlign() && Op.getSrcAlign() < Op.getDstAlign())
    return false;

For some reason the GlobalISel version does not have the limit check, but other than that the check is the same.
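
The difference between the two guards can be made concrete with a small model. This is a sketch only: `CopyDesc`, `sdagEarlyExit`, `giselEarlyExit` and `kNoLimit` are hypothetical names, and `FixedDstAlignMemcpy` stands in for the result of `Op.isMemcpyWithFixedDstAlign()`.

```cpp
#include <cassert>

// Stand-in for the "no limit" value ~unsigned(0) used in the quoted code.
constexpr unsigned kNoLimit = ~0u;

// Hypothetical, simplified description of a copy.
struct CopyDesc {
  bool FixedDstAlignMemcpy; // models Op.isMemcpyWithFixedDstAlign()
  unsigned DstAlign;
  unsigned SrcAlign;
};

// Models the SelectionDAG guard: skipped entirely when there is no limit.
bool sdagEarlyExit(unsigned Limit, const CopyDesc &Op) {
  return Limit != kNoLimit && Op.FixedDstAlignMemcpy &&
         Op.SrcAlign < Op.DstAlign;
}

// Models the GlobalISel guard: same condition, without the limit check.
bool giselEarlyExit(const CopyDesc &Op) {
  return Op.FixedDstAlignMemcpy && Op.SrcAlign < Op.DstAlign;
}

// With destination align 4 and source align 1, the two guards disagree
// only when the limit is ~0.
void checkGuardsDiffer() {
  const CopyDesc Op{true, 4, 1};
  assert(!sdagEarlyExit(kNoLimit, Op));
  assert(sdagEarlyExit(8, Op));
  assert(giselEarlyExit(Op));
}
```

For any finite limit the two models return the same answer, so the limit check is the only point where they diverge.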

The point of this PR is that, as explained in the description, getNumMemOps is trying to count the number of instructions generated by the isels, but it's not computing the alignment values the same way, causing a discrepancy. I'm updating the destination alignment the same way as the isels. With that, getNumMemOps generates the correct value in more cases, as demonstrated by the updated test.

For the exact test case that you are changing, I believe it should not be useful for align to be more than Size under Arm (provided it is a power of 2).

I'm not sure what to do with this information yet, I'll have to think more... In the meantime if you have concrete change suggestions I'd be happy to incorporate them into this PR.

@efriedma-quic
Contributor

This seems like it makes the code even more confusing... and it's moving in the wrong direction.

Please elaborate. What's more confusing about this two line change? Do you agree that this fixes a bug? What's the right direction we should be taking here?

We don't want to merge the source and destination alignments at all. The only reason SelectionDAG does it is because nobody has taken the time to change the code to pass down the separate alignment markings through the memcpy lowering code. (LLVM IR was changed to allow representing separate alignments relatively recently.) Doing alignment merging in more places is the opposite of the direction we want to move.

3 participants