[ARM] Fix ARMTTIImpl::getNumMemOps destination alignment calculation#199649
[ARM] Fix ARMTTIImpl::getNumMemOps destination alignment calculation#199649osa1 wants to merge 2 commits into
ARMTTIImpl::getNumMemOps destination alignment calculation#199649Conversation
`ARMTTIImpl::getNumMemOps`'s destination alignment calculation was out
of sync with the `SelectionDAG` and `GlobalISel`'s calculation, and it
attributed wrong cost to inlined `memcpy`s and `memmove`s in some cases.
Relevant isel code:
- `LegalizeHelper::lowerMemmove` (line 11,088) and `lowerMemcpy` (line
10,979) have the same code:
```
Align Alignment = std::min(DstAlign, SrcAlign);
```
- `SelectionDAGBuilder::visitIntrinsicCall`, `memove` and `memcpy` cases
have the same: (lines 6,407 and 6,745)
```
Align Alignment = std::min(DstAlign, SrcAlign);
```
`memcpy` cost model test updated with the new cost. The test was
actually incorrect before: cost 4 is attributed to libcalls by
```
InstructionCost ARMTTIImpl::getMemcpyCost(const Instruction *I) const {
int NumOps = getNumMemOps(cast<IntrinsicInst>(I));
// To model the cost of a library call, we assume 1 for the call, and
// 3 for the argument setup.
if (NumOps == -1)
return 4;
return NumOps;
}
```
The code above the test expectations show the compiled code which is 2
operations. So the cost should've been 2 instead of 4, which is fixed
with this commit.
|
Hello @osa1 👋 Thank you for submitting a Pull Request (PR) to the LLVM Project. Since this is your first PR, here are a few useful links covering our main contribution policies and review practices.
Please reply to this message to confirm that you have read these policies, especially the LLVM AI Tool Use Policy, and that any AI tool usage has been noted in the PR description. Frequently asked questionsHow do I add reviewers? This PR will be automatically labeled, and the relevant teams will be notified. For some parts of the project, reviewers may also be added automatically. You can also add reviewers manually using the Reviewers section on this page. If you cannot use that section, it is probably because you do not have write permissions for the repository. In that case, you can request a review by tagging reviewers in a comment using What if there are no comments? If you have not received any comments on your PR after a week, you can request a review by pinging the PR with a comment such as “Ping”. The common courtesy ping rate is once a week. Please remember that you are asking for volunteer time from other developers. Are any special GitHub settings required to contribute to LLVM? We only require contributors to have a public email address associated with their GitHub commits, see this section of LLVM Developer Policy for details. If you have questions, feel free to leave a comment on this PR, or ask on LLVM Discord or LLVM Discourse. Thank you, |
|
@llvm/pr-subscribers-backend-arm @llvm/pr-subscribers-llvm-analysis Author: Ömer Sinan Ağacan (osa1) Changes
Relevant isel code:
The code above the test expectations show the compiled code which is 2 operations. So the cost should've been 2 instead of 4, which is fixed with this commit. Full diff: https://github.com/llvm/llvm-project/pull/199649.diff 2 Files Affected:
diff --git a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
index ab48a0c4ba39c..7b13ac5693762 100644
--- a/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
+++ b/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
@@ -1226,8 +1226,8 @@ int ARMTTIImpl::getNumMemOps(const IntrinsicInst *I) const {
return -1;
const unsigned Size = C->getValue().getZExtValue();
- const Align DstAlign = MC->getDestAlign().valueOrOne();
const Align SrcAlign = MC->getSourceAlign().valueOrOne();
+ const Align DstAlign = std::min(MC->getDestAlign().valueOrOne(), SrcAlign);
MOp = MemOp::Copy(Size, /*DstAlignCanChange*/ false, DstAlign, SrcAlign,
/*IsVolatile*/ false);
diff --git a/llvm/test/Analysis/CostModel/ARM/memcpy.ll b/llvm/test/Analysis/CostModel/ARM/memcpy.ll
index f397397125c05..690d6b5a1bf8a 100644
--- a/llvm/test/Analysis/CostModel/ARM/memcpy.ll
+++ b/llvm/test/Analysis/CostModel/ARM/memcpy.ll
@@ -742,7 +742,7 @@ define void @memcpy_1_al41(ptr %d, ptr %s) {
; strb r1, [r0]
;
; COMMON-LABEL: 'memcpy_1_al41'
-; COMMON-NEXT: Cost Model: Found an estimated cost of 4 for instruction: call void @llvm.memcpy.p0.p0.i32(ptr align 4 %d, ptr align 1 %s, i32 1, i1 false)
+; COMMON-NEXT: Cost Model: Found an estimated cost of 2 for instruction: call void @llvm.memcpy.p0.p0.i32(ptr align 4 %d, ptr align 1 %s, i32 1, i1 false)
; COMMON-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
;
entry:
|
I've read the AI tool use policy and added a line to the PR description re: how I used AI tools for this change. |
ARMTTIImpl::getNumMemOps destination alignment calculationARMTTIImpl::getNumMemOps destination alignment calculation
efriedma-quic
left a comment
There was a problem hiding this comment.
This seems like it makes the code even more confusing... and it's moving in the wrong direction.
Why are we estimating 4 in the first place? Increasing the alignment of the destination shouldn't make the copy more expensive.
|
I assume the problem is coming from this? I didn't look very deeply, but I wasn't sue why it was limited to Op.getSrcAlign() < Op.getDstAlign(), other than there being some code below it that assumed it was the case. For the exact test case that you are changing, I believe it should not be useful for align to be more than Size under Arm (providing it is a power2). |
|
Thanks for the reviews.
Please elaborate. What's more confusing about this two line change? Do you agree that this fixes a bug? What's the right direction we should be taking here?
It's explained in the PR description, this code is attributing 4 to libcalls: llvm-project/llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp Lines 1283 to 1292 in a975b7c The comments in the code are explaining why it's 4. The reason why Which makes llvm-project/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp Lines 217 to 223 in a975b7c But in reality the op is actually lowered as inline instructions, so it's out of sync with the instruction selection. This patch syncs them by using the same destination alignment values used by the isels.
It's making it more expensive indirectly: @davemgreen that's the code path that's taken here, yes. Note that this method is virtual: llvm-project/llvm/include/llvm/CodeGen/TargetLowering.h Lines 4250 to 4263 in a975b7c Only AArch64 and "System Z" overrides it: llvm-project/llvm/lib/Target/AArch64/AArch64ISelLowering.h Lines 263 to 267 in a975b7c So my understand is that Note also that llvm-project/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp Lines 10712 to 10720 in a975b7c For some reason the The point of this PR is that, as explained in the description,
I'm not sure what to do with this information yet, I'll have to think more... In the meantime if you have concrete change suggestions I'd be happy to incorporate them into this PR. |
We don't want to merge the source and destination alignments at all. The only reason SelectionDAG does it is because nobody has taken the time to change the code to pass down the separate alignment markings through the memcpy lowering code. (LLVM IR was changed to allow representing separate alignments relatively recently.) Doing alignment merging in more places is the opposite of the direction we want to move. |
ARMTTIImpl::getNumMemOps's destination alignment calculation was out of sync with theSelectionDAGandGlobalISel's calculation, and it attributed wrong cost to inlinedmemcpys andmemmoves in some cases.Relevant isel code:
LegalizeHelper::lowerMemmove(line 11,088) andlowerMemcpy(line 10,979) have the same code:Align Alignment = std::min(DstAlign, SrcAlign);SelectionDAGBuilder::visitIntrinsicCall,memoveandmemcpycases have the same: (lines 6,407 and 6,745)Align Alignment = std::min(DstAlign, SrcAlign);memcpycost model test updated with the new cost. The test was actually incorrect before: cost 4 is attributed to libcalls byThe code above the test expectations show the compiled code which is 2 operations:
So the cost should've been 2 instead of 4, which is fixed with this commit.
AI was used to navigate the code base. All code changes are done by the author.