-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[BOLT][AArch64] Fix search to proceed upwards from memcpy call #166182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BOLT][AArch64] Fix search to proceed upwards from memcpy call #166182
Conversation
Search should procced from CallInst to the beginning of BB Patch by Yafet Beyene alulayafet@gmail.com
|
@llvm/pr-subscribers-bolt Author: Elvina Yakubova (ElvinaYakubova) ChangesSearch should procced from CallInst to the beginning of BB Patch by Yafet Beyene alulayafet@gmail.com Full diff: https://github.com/llvm/llvm-project/pull/166182.diff 2 Files Affected:
diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
index 6954cb295e86a..c96f3d6a06f39 100644
--- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
@@ -2758,7 +2758,7 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
BitVector WrittenRegs(RegInfo->getNumRegs());
const BitVector &SizeRegAliases = getAliases(SizeReg);
- for (auto InstIt = BB.begin(); InstIt != CallInst; ++InstIt) {
+ for (auto InstIt = CallInst; InstIt != BB.begin(); --InstIt) {
const MCInst &Inst = *InstIt;
WrittenRegs.reset();
getWrittenRegs(Inst, WrittenRegs);
diff --git a/bolt/test/runtime/AArch64/inline-memcpy.s b/bolt/test/runtime/AArch64/inline-memcpy.s
index dc59a08b889a7..badff299603a0 100644
--- a/bolt/test/runtime/AArch64/inline-memcpy.s
+++ b/bolt/test/runtime/AArch64/inline-memcpy.s
@@ -7,7 +7,7 @@
# RUN: llvm-bolt %t.exe --inline-memcpy -o %t.bolt 2>&1 | FileCheck %s --check-prefix=CHECK-INLINE
# RUN: llvm-objdump -d %t.bolt | FileCheck %s --check-prefix=CHECK-ASM
-# Verify BOLT reports that it inlined memcpy calls (11 successful inlines out of 16 total calls)
+# Verify BOLT reports that it inlined memcpy calls (11 successful inlines out of 17 total calls)
# CHECK-INLINE: BOLT-INFO: inlined 11 memcpy() calls
# Each function should use optimal size-specific instructions and NO memcpy calls
@@ -84,6 +84,9 @@
# CHECK-ASM-LABEL: <test_register_move_negative>:
# CHECK-ASM: bl{{.*}}<memcpy
+# CHECK-ASM-LABEL: <test_x2_rewrite_negative>:
+# CHECK-ASM: bl{{.*}}<memcpy
+
# Live-in parameter should NOT be inlined (size unknown at compile time)
# CHECK-ASM-LABEL: <test_live_in_negative>:
# CHECK-ASM: bl{{.*}}<memcpy
@@ -273,6 +276,15 @@ test_register_move_negative:
ret
.size test_register_move_negative, .-test_register_move_negative
+ .globl test_x2_rewrite_negative
+ .type test_x2_rewrite_negative,@function
+test_x2_rewrite_negative:
+ mov x2, #8
+ ldr x2, [sp, #24]
+ bl memcpy
+ ret
+ .size test_x2_rewrite_negative, .-test_x2_rewrite_negative
+
.globl test_live_in_negative
.type test_live_in_negative,@function
test_live_in_negative:
|
|
I just skimmed through the original memcpy work to get the context, so bear with me. Do I understand correctly, that this is only making it faster to find the correct size? Or was the previous way (searching from BB start to the Call) incorrect sometimes? |
|
@bgergely0 sorry for not explaining this in the description, it was incorrect sometimes, it could be seens from |
paschalis-mpeis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix Elvina and Yafet, looks good!
Apart from correctness, it should also be slightly faster; same complexities though.
Note: If the _negative suffix is confusing (ie, a negative test), we could rename the relevant functions to _uknown for unknown size; keeping only test_negative_size as is, for negative size.
maksfb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch looks good to me - thanks for the fix!
Please update the title to include "[BOLT][AArch64] Fix ..." and include details of what is being fixed in the summary (you've already explained how).
|
thanks for the reviews! |
The search should proceed from CallInst to the beginning of BB since X2 can be rewritten and we need to catch the most recent write before the call.
Patch by Yafet Beyene alulayafet@gmail.com