[llvm-symbolizer] restore --[no-]use-symbol-table option #71008
@llvm/pr-subscribers-llvm-binary-utilities
Author: None (quic-likaid)
Changes
Linux kernel modules don't have section addresses set statically. During kernel fuzzing, we may get a function address, but llvm-symbolizer ends up with a static variable because, in the symbolizer's view, the bss and text sections both start from 0 and thus overlap.
The option was unintentionally removed by 593e196, and remained as a no-op since 3d54976. Adding back the option allows us to prevent the undesired behaviour.
Full diff: https://github.com/llvm/llvm-project/pull/71008.diff
3 Files Affected:
diff --git a/llvm/docs/CommandGuide/llvm-symbolizer.rst b/llvm/docs/CommandGuide/llvm-symbolizer.rst
index fe5df077b45664d..a85dbdfef47d408 100644
--- a/llvm/docs/CommandGuide/llvm-symbolizer.rst
+++ b/llvm/docs/CommandGuide/llvm-symbolizer.rst
@@ -303,6 +303,11 @@ OPTIONS
Don't print demangled function names.
+.. option:: --no-use-symbol-table
+
+ Don't prefer function names stored in symbol table to function names in debug
+ info sections.
+
.. option:: --obj <path>, --exe, -e
Path to object file to be symbolized. If ``-`` is specified, read the object
@@ -447,6 +452,11 @@ OPTIONS
of the absolute path. If the command-line to the compiler included
the full path, this will be the same as the default.
+.. option:: --use-symbol-table
+
+ Prefer function names stored in symbol table to function names in debug info
+ sections. This is the default.
+
.. option:: --verbose
Print verbose address, line and column information.
diff --git a/llvm/tools/llvm-symbolizer/Opts.td b/llvm/tools/llvm-symbolizer/Opts.td
index 6742e086d6ff954..29d376457a929b0 100644
--- a/llvm/tools/llvm-symbolizer/Opts.td
+++ b/llvm/tools/llvm-symbolizer/Opts.td
@@ -57,6 +57,8 @@ def relative_address : F<"relative-address", "Interpret addresses as addresses r
def relativenames : F<"relativenames", "Strip the compilation directory from paths">;
defm untag_addresses : B<"untag-addresses", "", "Remove memory tags from addresses before symbolization">;
def use_dia: F<"dia", "Use the DIA library to access symbols (Windows only)">;
+defm use_symbol_table : B<"use-symbol-table", "Prefer function names stored in symbol table",
+ "Don't prefer function names stored in symbol table">;
def verbose : F<"verbose", "Print verbose line info">;
def version : F<"version", "Display the version">;
diff --git a/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp b/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
index 78a0e6772f3fb36..646bcd163e93c32 100644
--- a/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
+++ b/llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
@@ -469,7 +469,8 @@ int llvm_symbolizer_main(int argc, char **argv, const llvm::ToolContext &) {
Opts.UseDIA = false;
}
#endif
- Opts.UseSymbolTable = true;
+ Opts.UseSymbolTable =
+ Args.hasFlag(OPT_use_symbol_table, OPT_no_use_symbol_table, true);
if (Args.hasArg(OPT_cache_size_EQ))
parseIntArg(Args, OPT_cache_size_EQ, Opts.MaxCacheSize);
Config.PrintAddress = Args.hasArg(OPT_addresses);
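For readers unfamiliar with `Args.hasFlag`, the following is a minimal, self-contained sketch of the semantics the change above relies on: the last occurrence of either spelling wins, and the default applies when neither is given. `resolveFlag` is a hypothetical stand-in written for illustration; it is not the LLVM implementation and does not use the LLVM option library.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a paired --foo / --no-foo boolean option.
// The last occurrence of either spelling decides the result; if neither
// spelling appears, defaultValue is returned.
bool resolveFlag(const std::vector<std::string> &args,
                 const std::string &pos, const std::string &neg,
                 bool defaultValue) {
  assert(pos != neg && "positive and negative spellings must differ");
  bool value = defaultValue;
  for (const std::string &arg : args) {
    if (arg == pos)
      value = true;   // e.g. --use-symbol-table
    else if (arg == neg)
      value = false;  // e.g. --no-use-symbol-table
  }
  return value;
}
```

With a default of `true`, an empty command line keeps the current behaviour, and only an explicit `--no-use-symbol-table` (not followed by a later `--use-symbol-table`) turns the symbol table preference off.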
Test case?
@MaskRay, any thoughts?
Got links to these commits? I can't find them at first glance, at least... So the issue is that if we use the ELF symbol table we can't differentiate functions from variables, but if we use the DWARF we can, and so we don't mistakenly symbolize unrelocated addresses as referring to variables? |
The loadable kernel module files (.ko) are relocatable object files. Most functions have If the .ko files are compiled with -ffunction-sections (not popular as |
unintentional removal: https://reviews.llvm.org/D83530#inline-790089
Yes. It's the case we observed with Dynamically Loadable Kernel Modules. |
https://android-review.googlesource.com/c/platform/ndk/+/1419436 landed 3 years ago. I think we can remove Do you have an example how a false |
128db1a to 3e88cda
I can remove it if nobody objects
I just added a test to the commit. It prints The test is somewhat artificial though, here is an output from a real-world .ko, which motivated this PR (the module is too large to be shared here):
And which behavior is it that the Dynamically Loadable Kernel Modules need? (I guess if you're restoring the behavior of the Or is @MaskRay's comments about changes addressed your needs & the |
For DLKM, I'm not restoring
I believe he is suggesting removing the |
OK, so we're reintroducing the same functionality we had before (I assume we had it under |
This PR doesn't create new breakage by itself. Since https://reviews.llvm.org/D83530, we are using |
Yeah, just seems like a weird situation overall. No particular notes now, though. I think we usually test from assembly (have the test assemble it with |
Yes, prefer to avoid canned object files and use YAML/assembly/IR (roughly from most to least preferred) instead. This allows you to annotate the bits of the input that are specifically important for the test, so that future maintainers know what they should/should not change if updating tests, and also to see changes over time. It also avoids permanent binary blobs in the git history, which helps with the repo size (the first binary might sometimes be smaller than the input asm etc, but the cumulative effect of updates won't be). |
Sections in relocatable ELFs have their `sh_addr` set to 0. This can confuse llvm-symbolizer when it tries to use the symbol table to get the function name: it may end up with a global variable in the bss section. This is observed when the symbolizer is used for Linux's dynamically loadable kernel modules. The option was unintentionally removed by 593e196, and remained as a no-op since 3d54976. Adding back the option allows us to prevent the undesired behaviour.
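To make the overlap concrete, here is a small, simplified model of an address lookup that only considers section base plus symbol offset. It is a sketch for illustration, not the symbolizer's actual lookup code; the symbol names `my_func` and `my_counter`, their offsets and sizes, and the load addresses used below are all made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Section {
  std::string name;
  std::uint64_t addr;  // models sh_addr; 0 for every section in a relocatable file
};

struct Symbol {
  std::string name;
  std::size_t sectionIndex;  // index into the section list
  std::uint64_t offset;      // offset of the symbol within its section
  std::uint64_t size;
};

// Returns the names of all symbols whose [start, start + size) range
// contains addr, where start = section base + symbol offset.
std::vector<std::string> symbolsCovering(const std::vector<Section> &sections,
                                         const std::vector<Symbol> &symbols,
                                         std::uint64_t addr) {
  std::vector<std::string> hits;
  for (const Symbol &s : symbols) {
    assert(s.sectionIndex < sections.size());
    std::uint64_t start = sections[s.sectionIndex].addr + s.offset;
    if (addr >= start && addr - start < s.size)
      hits.push_back(s.name);
  }
  return hits;
}
```

With `.text` and `.bss` both based at 0, a function at offset 0x10 and a variable at offset 0x10 both cover address 0x14, so the address alone cannot tell them apart. Once the sections are given distinct bases (as they would have after loading), the same lookup returns only the function.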
3e88cda to 65e1664
@quic-likaid, please avoid force pushes unless you really need them, as it makes it harder to track the changes between versions of the PR. Instead, use fixup commits (they'll be squashed in when landing the change).
Just a few remaining minor points from me.
@MaskRay, anything to add?
LGTM, in that I think this patch does what it sets out to do and conforms to LLVM standards, but please wait for further feedback from @MaskRay, as I have no opinions on whether we actually want this change.
This feature is low overhead and we used to have it, so LG. It at least gives us a way to test the symbolization behavior, though I am skeptical how much this can help symbolizing a relocatable object file.