Skip to content

Conversation

maksfb
Copy link
Contributor

@maksfb maksfb commented Sep 22, 2025

If an address has both, a data marker "$d" and a function symbol associated with it, treat it as code.

@llvmbot
Copy link
Member

llvmbot commented Sep 22, 2025

@llvm/pr-subscribers-bolt

Author: Maksim Panchenko (maksfb)

Changes

If an address has both, a data marker "$d" and a function entry point associated with it, treat it as code.


Full diff: https://github.com/llvm/llvm-project/pull/160161.diff

2 Files Affected:

  • (modified) bolt/lib/Rewrite/RewriteInstance.cpp (-3)
  • (added) bolt/test/AArch64/function-data-marker.s (+25)
diff --git a/bolt/lib/Rewrite/RewriteInstance.cpp b/bolt/lib/Rewrite/RewriteInstance.cpp
index a6e4dbc9c192f..014fa43f82867 100644
--- a/bolt/lib/Rewrite/RewriteInstance.cpp
+++ b/bolt/lib/Rewrite/RewriteInstance.cpp
@@ -917,9 +917,6 @@ void RewriteInstance::discoverFileObjects() {
     bool IsData = false;
     uint64_t LastAddr = 0;
     for (const auto &SymInfo : SortedSymbols) {
-      if (LastAddr == SymInfo.Address) // don't repeat markers
-        continue;
-
       MarkerSymType MarkerType = BC->getMarkerType(SymInfo.Symbol);
 
       // Treat ST_Function as code.
diff --git a/bolt/test/AArch64/function-data-marker.s b/bolt/test/AArch64/function-data-marker.s
new file mode 100644
index 0000000000000..35e31e9fc1731
--- /dev/null
+++ b/bolt/test/AArch64/function-data-marker.s
@@ -0,0 +1,25 @@
+## TODO
+
+# RUN: %clang %cflags %s -o %t.exe -nostdlib -fuse-ld=lld -Wl,-q
+# RUN: llvm-bolt %t.exe -o %t.bolt --print-cfg --print-only=foo 2>&1 | FileCheck %s
+
+# BOLT-WARNING: function symbol foo lacks code marker
+
+.text
+.balign 4
+
+.global _start
+.type _start, %function
+_start:
+  bl foo
+  ret
+.size _start, .-_start
+
+## Data marker is emitted immediately before the function.
+.global foo
+.type foo, %function
+$d:
+foo:
+  ret
+.size foo, .-foo
+

@yozhu
Copy link
Contributor

yozhu commented Sep 22, 2025

For the test, maybe we can also add a case where $x label is added to some constant island, to make sure that case is also covered as expected.

@yavtuk
Copy link
Contributor

yavtuk commented Sep 23, 2025

@maksfb as workaround we can try the following

     cl::Hidden, cl::cat(BoltCategory));

+static cl::opt<bool> ForceFuncEntryPoint(
+    "force-func-entry-point",
+    cl::desc("force entry point for function at 0 offset"),
+    cl::init(false),
+    cl::Hidden, cl::cat(BoltCategory));
+
+
 static cl::opt<std::string>
     BoltID("bolt-id",
            cl::desc("add any string to tag this execution in the "
@@ -1319,6 +1326,10 @@ void RewriteInstance::discoverFileObjects() {
       continue;
     }
     const auto EntryOffset = Address - BF->getAddress();
+    if (opts::ForceFuncEntryPoint && !EntryOffset) {
+      BF->markCodeAtOffset(EntryOffset);
+      continue;
+    }
     if (Type == MarkerSymType::CODE) {
       BF->markCodeAtOffset(EntryOffset);
       continue;
# | Binary Function "foo" after building cfg {
# |   Number      : 2
# |   State       : CFG constructed
# |   Address     : 0x10260
# |   Size        : 0x4
# |   MaxSize     : 0x4
# |   Offset      : 0x260
# |   Section     : .text
# |   Orc Section : .local.text.foo
# |   LSDA        : 0x0
# |   IsSimple    : 1
# |   IsMultiEntry: 0
# |   IsSplit     : 0
# |   BB Count    : 1
# |   Hash        : 8aa3f4604e30044c
# |   BB Layout   : .LBB01
# | }
# | .LBB01 (1 instructions, align : 1)
# |   Entry Point
# |     00000000:       ret
# |
# | DWARF CFI Instructions:
# |     <empty>
# | End of Function "foo"

@maksfb
Copy link
Contributor Author

maksfb commented Oct 2, 2025

@yavtuk do you have an example where we have data at the start of the function? I'd like to consider us treating such cases differently, i.e. shift the function entry point to an actual code location.

@yavtuk
Copy link
Contributor

yavtuk commented Oct 3, 2025

@yavtuk do you have an example where we have data at the start of the function? I'd like to consider us treating such cases differently, i.e. shift the function entry point to an actual code location.

Now I found only couple commits which can be related to this

yavtuk/openssl@28b80f2

.type	_armv8_sm4_probe,%function
_armv8_sm4_probe:
	AARCH64_VALID_CALL_TARGET
	.long	0xcec08400	// sm4e	v0.4s, v0.4s
	ret
.size	_armv8_sm4_probe,.-_armv8_sm4_probe

also you can take a look at this commit
yavtuk/openssl@e0be006

I'll add more specific details if I find them

@yavtuk
Copy link
Contributor

yavtuk commented Oct 3, 2025

@yavtuk do you have an example where we have data at the start of the function? I'd like to consider us treating such cases differently, i.e. shift the function entry point to an actual code location.

@maksfb I can't find the exactly function but for x86 I remember we faced with function smth like that

void extra_space() {
  asm volatile(".rept 256\n"
               "    .byte 0xff\n"
               ".endr\n");
  ....
  ....
  return;
}

entry point for the one was with 256 offset, in openssl it was a data which is used for algorithm

I think we can re-check openssl 3.x and 1.1.x versions and take a look is this still exist

@maksfb
Copy link
Contributor Author

maksfb commented Oct 3, 2025

@yavtuk do you have an example where we have data at the start of the function? I'd like to consider us treating such cases differently, i.e. shift the function entry point to an actual code location.

Now I found only couple commits which can be related to this

yavtuk/openssl@28b80f2

.type	_armv8_sm4_probe,%function
_armv8_sm4_probe:
	AARCH64_VALID_CALL_TARGET
	.long	0xcec08400	// sm4e	v0.4s, v0.4s
	ret
.size	_armv8_sm4_probe,.-_armv8_sm4_probe

also you can take a look at this commit yavtuk/openssl@e0be006

I'll add more specific details if I find them

This one makes sense. Even though it's better be using .inst directive, it's still code.

@maksfb
Copy link
Contributor Author

maksfb commented Oct 3, 2025

@yavtuk do you have an example where we have data at the start of the function? I'd like to consider us treating such cases differently, i.e. shift the function entry point to an actual code location.

@maksfb I can't find the exactly function but for x86 I remember we faced with function smth like that

void extra_space() {
  asm volatile(".rept 256\n"
               "    .byte 0xff\n"
               ".endr\n");
  ....
  ....
  return;
}

entry point for the one was with 256 offset, in openssl it was a data which is used for algorithm

I think we can re-check openssl 3.x and 1.1.x versions and take a look is this still exist

Thanks. I'll take a look at the library.

@maksfb
Copy link
Contributor Author

maksfb commented Oct 4, 2025

@yavtuk Have you seen the following commit to openssl? It basically duplicates your change, but for a different reasons:

openssl/openssl@c6e65c1

@maksfb
Copy link
Contributor Author

maksfb commented Oct 4, 2025

I think we can ignore x86 in the context of constant islands, since we don't create them for x86.

For aarch64, I checked openssl 1.1.1 and 3.x. Only in libcrypto.so from 1.1.1 there's a "CI" at the start of a "function" (as identified by BOLT):

_vpaes_consts
_armv8_sha512_probe
ecp_nistz256_precomputed
iotas

Of those four, only _armv8_sha512_probe is marked with ST_FUNC and it is in fact code, just incorrectly specified with .long instead of .inst. You've fixed that in your local branch yavtuk/openssl@28b80f2

To summarize, it seems to be safe to resolve a conflict between ST_FUNC and $d in favor of code.

@maksfb maksfb changed the title [BOLT][WIP] Always treat function entry as code [BOLT] Always treat function entry as code Oct 4, 2025
Copy link

github-actions bot commented Oct 4, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

If an address has both, a data marker "$d" and a function symbol
associated with it, treat it as code.
Copy link
Contributor

@aaupov aaupov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for investigating

Copy link
Member

@paschalis-mpeis paschalis-mpeis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Check that if a data marker is present at the start of a function, the underlying bytes are still treated as code.

It looks like this covers the case I listed here for PR #160143?

@maksfb
Copy link
Contributor Author

maksfb commented Oct 6, 2025

It looks like this covers the case I listed here for PR #160143?

Good question. foo has to be marked as a function (ST_FUNC) for it to take precedence over $d, otherwise, the bytes at the location are still treated as data.

Thanks for reviews.

@maksfb maksfb merged commit de9b3ca into llvm:main Oct 6, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants