@ElvinaYakubova ElvinaYakubova commented Nov 3, 2025

The search should proceed from CallInst to the beginning of BB since X2 can be rewritten and we need to catch the most recent write before the call.

Patch by Yafet Beyene alulayafet@gmail.com


llvmbot commented Nov 3, 2025

@llvm/pr-subscribers-bolt

Author: Elvina Yakubova (ElvinaYakubova)

Changes

Search should proceed from CallInst to the beginning of BB

Patch by Yafet Beyene alulayafet@gmail.com


Full diff: https://github.com/llvm/llvm-project/pull/166182.diff

2 Files Affected:

  • (modified) bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp (+1-1)
  • (modified) bolt/test/runtime/AArch64/inline-memcpy.s (+13-1)
diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
index 6954cb295e86a..c96f3d6a06f39 100644
--- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
@@ -2758,7 +2758,7 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
     BitVector WrittenRegs(RegInfo->getNumRegs());
     const BitVector &SizeRegAliases = getAliases(SizeReg);
 
-    for (auto InstIt = BB.begin(); InstIt != CallInst; ++InstIt) {
+    for (auto InstIt = CallInst; InstIt != BB.begin(); --InstIt) {
       const MCInst &Inst = *InstIt;
       WrittenRegs.reset();
       getWrittenRegs(Inst, WrittenRegs);
diff --git a/bolt/test/runtime/AArch64/inline-memcpy.s b/bolt/test/runtime/AArch64/inline-memcpy.s
index dc59a08b889a7..badff299603a0 100644
--- a/bolt/test/runtime/AArch64/inline-memcpy.s
+++ b/bolt/test/runtime/AArch64/inline-memcpy.s
@@ -7,7 +7,7 @@
 # RUN: llvm-bolt %t.exe --inline-memcpy -o %t.bolt 2>&1 | FileCheck %s --check-prefix=CHECK-INLINE
 # RUN: llvm-objdump -d %t.bolt | FileCheck %s --check-prefix=CHECK-ASM
 
-# Verify BOLT reports that it inlined memcpy calls (11 successful inlines out of 16 total calls)
+# Verify BOLT reports that it inlined memcpy calls (11 successful inlines out of 17 total calls)
 # CHECK-INLINE: BOLT-INFO: inlined 11 memcpy() calls
 
 # Each function should use optimal size-specific instructions and NO memcpy calls
@@ -84,6 +84,9 @@
 # CHECK-ASM-LABEL: <test_register_move_negative>:
 # CHECK-ASM: bl{{.*}}<memcpy
 
+# CHECK-ASM-LABEL: <test_x2_rewrite_negative>:
+# CHECK-ASM: bl{{.*}}<memcpy
+
 # Live-in parameter should NOT be inlined (size unknown at compile time)
 # CHECK-ASM-LABEL: <test_live_in_negative>:
 # CHECK-ASM: bl{{.*}}<memcpy
@@ -273,6 +276,15 @@ test_register_move_negative:
 	ret
 	.size	test_register_move_negative, .-test_register_move_negative
 
+	.globl  test_x2_rewrite_negative
+	.type   test_x2_rewrite_negative,@function
+test_x2_rewrite_negative:
+	mov     x2, #8
+	ldr     x2, [sp, #24]
+	bl      memcpy
+	ret
+	.size   test_x2_rewrite_negative, .-test_x2_rewrite_negative
+
 	.globl	test_live_in_negative
 	.type	test_live_in_negative,@function
 test_live_in_negative:

@bgergely0

I just skimmed through the original memcpy work to get the context, so bear with me.

Do I understand correctly that this only makes finding the correct size faster? Or was the previous approach (searching from the start of the BB to the call) sometimes incorrect?

@ElvinaYakubova

@bgergely0 sorry for not explaining this in the description; it was sometimes incorrect, as can be seen from the test_x2_rewrite_negative test


@paschalis-mpeis paschalis-mpeis left a comment


Thanks for the fix Elvina and Yafet, looks good!
Apart from correctness, it should also be slightly faster; the asymptotic complexity is the same, though.

Note: If the _negative suffix is confusing (i.e., suggesting a negative test), we could rename the relevant functions to _unknown for unknown size, keeping only test_negative_size as is, for negative size.


@maksfb maksfb left a comment


The patch looks good to me - thanks for the fix!

Please update the title to include "[BOLT][AArch64] Fix ..." and include details of what is being fixed in the summary (you've already explained how).

@ElvinaYakubova
Copy link
Contributor Author

thanks for the reviews!
@paschalis-mpeis I'll update function names in a separate MR

@ElvinaYakubova ElvinaYakubova changed the title [BOLT] Update search to proceed upwards from memcpy call [BOLT][AArch64] Fix search to proceed upwards from memcpy call Nov 5, 2025
@ElvinaYakubova ElvinaYakubova merged commit a65867a into llvm:main Nov 5, 2025
12 checks passed
