Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SymbolIndex and kallsyms helpers #388

Merged
merged 6 commits into from
Oct 12, 2024
Merged

Conversation

brenns10
Copy link
Contributor

This is the next step of the DWARFless debugging / CTF feature, #176. Unlike the symbol finder, there aren't too many questions of API, instead we're just implementing the API. This PR adds:

  • The SymbolIndex type, which holds an immutable list of symbols and indexes them for efficient name and address based lookup. Some interesting items here:
    • It uses a hash for name lookup, and an interesting binary search approach for address lookup, in order to support duplicated names and overlapping address ranges.
    • I chose SymbolIndex over SymbolTable since I didn't want to overload the common usage of that term applying to ELF symbol tables. That said, SymbolIndex is a bit strange so I'm open to suggestions.
  • Two C helpers (load_proc_kallsyms() and load_builtin_kallsyms()) enumerate all kallsyms symbols and return a SymbolIndex object. The first uses /proc/kallsyms and the second uses information from VMCOREINFO to find the kallsyms tables.
    • The proc version lets you specify a filename, in case you happen to have a copy of /proc/kallsyms from a vmcore.
    • /proc/kallsyms includes module symbols, while the built-in vmlinux kallsyms of course does not. I've included an option to parse these symbols, however by default, we do not. This is because you can actually get the ELF symbol objects, which have more information, directly from the struct module.
  • A Python helper (load_module_kallsyms()) for loading module symbols from the kernel image and creating a symbol index from that. This does require debuginfo for vmlinux.

I also of course have tests for the above.

Below are two issues you'll probably notice and want to discuss:

  1. The guess_long_names() function works around torvalds/linux@73bbb94466fd ("kallsyms: support "big" kernel symbols") which was merged in 6.1. I didn't catch it immediately so I'm not sure whether there's any use in sending a commit to include a version number for kallsyms... Especially since I believe it's unlikely to be backported, but I could be wrong. It has certainly not been backported to stable kernels.
  2. The final commit of the branch adds an optimization where we can mark symbols as having a lifetime. Symbols with static lifetimes are not freed by drgn_symbol_destroy(). This enables the SymbolIndex to provides symbols with zero copying, even all the way out to the Python layer! However, I'm not certain it's worth the complexity it adds. That's why I kept the change as a second commit; it's easy to drop if you don't like it. I haven't measured any performance impact and admittedly, I don't know if it would be noticeable.

@brenns10
Copy link
Contributor Author

Alright, sorry for the failure (it was an old commit changing how ELF symbols are stored in struct mod_kallsyms). I had been hoping for a clean bill of health on the first try :D
But it's ready now, excited to take the new symbol finder API out for a spin :D

@brenns10
Copy link
Contributor Author

These latest pushes rebase on the current main branch, with the updated symbol finder API.

Copy link
Owner

@osandov osandov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only looked at the first two commits so far, so feel free to ignore these comments until I'm done or address them now, up to you. I'll continue reviewing tomorrow.

libdrgn/symbol.c Outdated Show resolved Hide resolved
libdrgn/symbol.c Outdated Show resolved Hide resolved
libdrgn/python/symbol_index.c Outdated Show resolved Hide resolved
libdrgn/python/symbol_index.c Outdated Show resolved Hide resolved
libdrgn/python/symbol_index.c Outdated Show resolved Hide resolved
libdrgn/python/symbol_index.c Outdated Show resolved Hide resolved
libdrgn/python/symbol_index.c Outdated Show resolved Hide resolved
libdrgn/python/symbol_index.c Outdated Show resolved Hide resolved
libdrgn/python/symbol_index.c Show resolved Hide resolved
libdrgn/symbol.c Outdated Show resolved Hide resolved
Copy link
Owner

@osandov osandov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I finally made it through the whole series. Good stuff! I can't claim that I fully understood the kallsyms format this time around, but I at least tried to look for coding errors.

The guts of the implementation look good, but the drgn.helpers.linux.kallsyms interface is a bit overwhelming. There are lots of similar-looking helpers that I wouldn't know how to make use as an end user. I wonder if we can pare it down a bit. What are the most important use cases that you're targeting?

tests/linux_kernel/helpers/test_kallsyms.py Show resolved Hide resolved
libdrgn/kallsyms.c Outdated Show resolved Hide resolved
libdrgn/kallsyms.c Outdated Show resolved Hide resolved
libdrgn/kallsyms.c Outdated Show resolved Hide resolved
libdrgn/kallsyms.c Outdated Show resolved Hide resolved
libdrgn/kallsyms.c Outdated Show resolved Hide resolved
libdrgn/kallsyms.c Outdated Show resolved Hide resolved
libdrgn/kallsyms.c Outdated Show resolved Hide resolved
libdrgn/kallsyms.c Outdated Show resolved Hide resolved
}
} else {
if (kind_ret)
*kind_ret = *token_ptr;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we return without ever setting this, leaving the caller with uninitialized garbage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that's worth guarding against. I think the easiest place to do it is actually in drgn_load_builtin_kallsyms(), where we can simply check whether the string builder length is zero and raise an error.

There's a lot of stuff in the kallsyms tables that, if fuzzed, could cause these parsers to behave weirdly or maybe crash. I'm not certain how paranoid we want to be here. At this point, if the correct set of symbols are provided, with real values, and we've managed to read the token and name table successfully (without faulting or mallocing a buffer that's way too large), then it seems likely that the data will match the kallsyms format. So in practical terms, these sorts of things shouldn't happen. What are your opinions here? I can add paranoia to this, but it does feel like it will take rather unclear code and make it even worse...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of stuff in the kallsyms tables that, if fuzzed, could cause these parsers to behave weirdly or maybe crash.

The general philosophy in drgn is that bogus input can produce bogus results, but it should never cause drgn to crash or invoke undefined behavior. That does tend to require lots of extra bounds checking and overflow checking. I'll keep an eye out for that as I review the kallsyms code.

@brenns10
Copy link
Contributor Author

Thanks for the review! I'll take a look through it in detail and try to get an update ASAP, rebased on main as well. It looks like you made a lot of good catches, so thanks for that.

The guts of the implementation look good, but the drgn.helpers.linux.kallsyms interface is a bit overwhelming. There are lots of similar-looking helpers that I wouldn't know how to make use as an end user. I wonder if we can pare it down a bit. What are the most important use cases that you're targeting?

Yeah, I have even confused myself... I've occasionally imported from there and not known which function I want to use. So it does need to be simplified.

The primary use cases I can think of are:

  1. The full non-DWARF approach where you use kallsyms, CTF || BTF, and ORC or frame pointers for unwinding. In this case, you need to do a delicate dance: first load vmlinux kallsyms, then the types (you need the symbol table to find the BTF, though CTF doesn't require the symbol table first), then the module kallsyms (since module kallsyms requires vmlinux types). In the final CTF branch, there's a simple drgn.helpers.linux.ctf:load_ctf() function that just does everything for you in the right order, so most of the time nobody would need to think about it.
  2. The partial DWARF approach, where you have vmlinux DWARF loaded, but you didn't extract all the modules. This actually happens all the time for us internally, because we typically extract only the debuginfo we need. In that case, it's very nice to be able to add a symbol finder for the kernel modules, so that symbolization works as expected without loading the module debuginfo.

That would mean at a minimum, we'd really just need load_vmlinux_kallsyms() and load_module_kallsyms(). I don't think the load_builtin_kallsyms() and load_proc_kallsyms() functions are really necessary to be part of the API (though keeping them as private/internal functions would be nice, because I could see myself wanting to use them hackily).

The module_kallsyms() function is exceptionally poorly named. I included this as a public function because I could imagine a world where a user may have some kernel modules' DWARF debuginfo loaded, but they want to load the other modules' symbols. They could use this function to get that subset of symbols. Or, alternatively, I could see it being useful for a user to simply ask: what symbols are part of this kernel module? But to sum up, I don't have a single great use case for this function. I wouldn't be disappointed to scrap it from the API, or at a minimum, it should be renamed to something clearer (e.g. get_module_symbols()).

@brenns10 brenns10 force-pushed the kallsyms_finder branch 3 times, most recently from e268fbd to 8a9c865 Compare August 21, 2024 02:47
@brenns10
Copy link
Contributor Author

Ok after a few botched pushes I think I've addressed everything here. I've also rebased on main to drop my drgn.helpers.linux.module changes which conflicted with #411. I did leave open the comments with more substantial replies in case you want to continue those discussions, though I think I addressed those too. Thanks!

In commit c8ff872 ("Support systems without qsort_r"), usage of qsort_r
was eliminated because it is a glibc extension. There was discussion of
creating a utility function that implements qsort_r(), but the approach
described uses thread local variables, so it is not actually reentrant,
and it was dropped to avoid confusion.

However, upcoming commits will also prefer a comparator function which
takes an argument, and they also won't require a reentrant
implementation. Add this helper in with a name that shouldn't spark
confusion: qsort_arg().

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
This will be reused in an upcoming commit.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
@brenns10
Copy link
Contributor Author

Noticed that the branch had become stale due to some merge conflicts.

This first push here has just resolved the conflicts. Got a lint error (git rebase does not run pre-commit :/). I still need to incorporate your new binary search macro, so I'll resolve both shortly.

@brenns10
Copy link
Contributor Author

I'm loving the binary search helpers, they simplified things a lot here:

diff --git a/libdrgn/symbol.c b/libdrgn/symbol.c
index 46e21a6b..92bcfdb9 100644
--- a/libdrgn/symbol.c
+++ b/libdrgn/symbol.c
@@ -6,6 +6,7 @@
 #include <stdlib.h>
 #include <string.h>
 
+#include "binary_search.h"
 #include "drgn_internal.h"
 #include "string_builder.h"
 #include "symbol.h"
@@ -300,38 +301,21 @@ drgn_symbol_index_deinit(struct drgn_symbol_index *index)
 static void address_search_range(struct drgn_symbol_index *index, uint64_t address,
 				 uint32_t *start_ret, uint32_t *end_ret)
 {
-	// bsearch() only handles exact matches, and we need to find what is
-	// essentially the correct insertion point, even if there's not an exact
-	// match. So bsearch() can't be used here. It's helpful to think of this
-	// code in terms of the Python library functions bisect_right().
-	uint32_t lo = 0, hi = index->num_syms;
-	uint32_t mid;
-
 	// First, identify the maximum symbol index which could possibly contain
 	// this address. Think of this as:
 	//   end_ret = bisect_right([s.address for s in symbols], address)
-	while (lo < hi) {
-		mid = lo + (hi - lo) / 2;
-		if (address < index->symbols[mid].address)
-			hi = mid;
-		else
-			lo = mid + 1;
-	}
-	*end_ret = lo;
+	#define less_than_start(a, b) (*(a) < (b)->address)
+	*end_ret = binary_search_gt(index->symbols, index->num_syms, &address,
+				    less_than_start);
+	#undef less_than_start
 
 	// Second, identify first symbol index which could possibly contain this
 	// address. We need to use "max_addrs" for this task:
 	//    bisect_right(max_addrs, address)
-	hi  = lo;
-	lo = 0;
-	while (lo < hi) {
-		mid = lo + (hi - lo) / 2;
-		if (address < index->max_addrs[mid])
-			hi = mid;
-		else
-			lo = mid + 1;
-	}
-	*start_ret = lo;
+	#define less_than_end(a, b) (*(a) < *(b))
+	*start_ret = binary_search_gt(index->max_addrs, index->num_syms, &address,
+				    less_than_end);
+	#undef less_than_end
 }
 
 struct drgn_error *

osandov added a commit that referenced this pull request Oct 3, 2024
Clarification for
#388 (comment).

Signed-off-by: Omar Sandoval <osandov@osandov.com>
Copy link
Owner

@osandov osandov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signing off for the day after looking at commits 1-3. Will continue tomorrow, thanks!

libdrgn/util.h Show resolved Hide resolved
libdrgn/kallsyms.c Outdated Show resolved Hide resolved
}
} else {
if (kind_ret)
*kind_ret = *token_ptr;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of stuff in the kallsyms tables that, if fuzzed, could cause these parsers to behave weirdly or maybe crash.

The general philosophy in drgn is that bogus input can produce bogus results, but it should never cause drgn to crash or invoke undefined behavior. That does tend to require lots of extra bounds checking and overflow checking. I'll keep an eye out for that as I review the kallsyms code.

libdrgn/python/program.c Outdated Show resolved Hide resolved
libdrgn/python/symbol_index.c Outdated Show resolved Hide resolved
libdrgn/python/symbol_index.c Outdated Show resolved Hide resolved
libdrgn/python/symbol_index.c Outdated Show resolved Hide resolved
libdrgn/symbol.c Show resolved Hide resolved
libdrgn/symbol.c Outdated Show resolved Hide resolved
Copy link
Owner

@osandov osandov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that's the rest of it, looking good overall. I gave the kallsyms code a much closer read than I had before, so there are some new comments there that I missed in the last round. Have a good weekend!

libdrgn/python/helpers.c Outdated Show resolved Hide resolved
libdrgn/python/helpers.c Outdated Show resolved Hide resolved
libdrgn/python/helpers.c Outdated Show resolved Hide resolved
libdrgn/python/helpers.c Outdated Show resolved Hide resolved
libdrgn/kallsyms.c Show resolved Hide resolved
libdrgn/kallsyms.c Show resolved Hide resolved
libdrgn/kallsyms.c Outdated Show resolved Hide resolved
libdrgn/kallsyms.c Outdated Show resolved Hide resolved
drgn/helpers/linux/kallsyms.py Show resolved Hide resolved
drgn/helpers/linux/kallsyms.py Outdated Show resolved Hide resolved
The Symbol Finder API gives us the ability to register a dynamic
callback for symbol lookup. However, many common use cases are satisfied
by a simple static list of symbols. Correct and efficient lookup in this
simple case is rather tricky. Implement a new type, SymbolIndex, which
can take a list of symbols and index them for efficient lookup by name
or address.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
@brenns10
Copy link
Contributor Author

I double checked to ensure the s390x failure was transient, so we've got a clean test run :)

I believe I have addressed all your comments, and I avoided rebasing on the main so the diff between versions stays clean. Thanks for the review again!

@osandov
Copy link
Owner

osandov commented Oct 10, 2024

Yeah Packit has been flaky today. I'll give these another look today!

Copy link
Owner

@osandov osandov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, just need a few final easy fixes.

libdrgn/python/helpers.c Outdated Show resolved Hide resolved
libdrgn/kallsyms.c Outdated Show resolved Hide resolved
libdrgn/kallsyms.c Outdated Show resolved Hide resolved
libdrgn/kallsyms.c Outdated Show resolved Hide resolved
libdrgn/kallsyms.c Outdated Show resolved Hide resolved
libdrgn/python/util.c Outdated Show resolved Hide resolved
@osandov
Copy link
Owner

osandov commented Oct 11, 2024

(LGTM other than the u64_converter() comment)

When properly configured, Linux contains its own symbol table within
kernel memory at runtime. It is exposed as the /proc/kallsyms file,
which is the easiest way to consume it, for live kernels.

However, with recent changes to the Linux kernel in 6.0, necessary
symbols are exposed within VMCOREINFO that allow us to interpret the
data structures inside kernel memory, without needing debuginfo. This
allows us to write helpers that can load kallsyms on vmcores, or on live
systems.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Add Python helpers which load module kallsyms and return a symbol index
for them. Unlike the /proc/kallsyms and built-in kallsyms, these are
quite easy to handle using regular Python & drgn code, so implement them
as Python helpers.

There are (at least) two use cases for these helpers:
1. After loading CTF and built-in vmlinux kallsyms, support for module
   kallsyms is still necessary.
2. Sometimes, people only extract vmlinux DWARF debuginfo. Adding module
   symbols can allow stack traces and other symbolization to work even
   without module debuginfo.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
The SymbolIndex structure owns the memory for symbols and names, and
once added to the Program, it cannot be removed. Making copies of any of
these symbols is purely a waste: we should be able to treat them as
static. So add a lifetime and allow the SymbolIndex to specify static,
avoiding unnecessary copies and frees.

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Copy link
Owner

@osandov osandov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all of your work on this, it looks ready to go 🎉

@osandov osandov merged commit b380087 into osandov:main Oct 12, 2024
6 of 38 checks passed
@brenns10
Copy link
Contributor Author

Yay, that's great! Thanks for all the reviewing work on this!

@brenns10 brenns10 deleted the kallsyms_finder branch October 12, 2024 00:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants