Add BTF type finder #162

Closed
wants to merge 9 commits into from

@brenns10 brenns10 commented Mar 8, 2022

This is an early (ugly) draft of a type finder based on BTF! I'm hoping to get some advice on direction if possible, and once we have an idea of where to take this, I can clean up the branch into logical changes and we can move it out of the draft state.

Currently, this supports the basics:

  • typedefs, integers, enums, arrays, structs, unions, function signatures.

It is missing:

  • float, mostly because my vmcore doesn't have any floating point values to test on, but it should be trivial to implement.
  • tags - e.g. attribute tags. I haven't run into these yet but I'm sure it will need implementing before this gets merged.
  • function / variable declarations. This isn't strictly type name information, but instead it's the type information associated with the symbol table. It will need to be there for BTF to be really useful, but it's not necessarily within scope for the "type finder" API.
  • Support for module BTF. This should be 100% feasible, but I need to get a more mature implementation before I start dabbling there.

To exercise the functionality, I added prog.add_btf_info(address, bytes), which reads that many bytes from program memory at the given address, interprets them as BTF, and registers the type finder. So right now, I can load a normal vmcore with debuginfo, get the addresses of the __start_BTF and __stop_BTF symbols, and then run drgn again without the debuginfo, calling prog.add_btf_info(__start_BTF, __stop_BTF - __start_BTF).

I've tested it on some large/interesting structs within the kernel (task_struct, dentry, dentry_operations, inode), by running print(prog.type("...")). They all look correct to me.

I have a few specific questions (corresponding with TODOs or other outlined issues in comments):

  1. Licensing: we shouldn't rely on the system-provided <linux/btf.h> because it
    may not be fully up-to-date. So I grabbed a copy from linux 5.17. The license
    identifier on <linux/btf.h> is GPLv2. Is this an issue for inclusion?
  2. BTF does not encode the architecture pointer size. Is there a good way to get
    that in drgn?
  3. BTF seems to specify that all enums are encoded as 4-byte signed integers. Is
    there any good way to create such a type without relying on the language's
    name? This is for inclusion in the "compatible_type" field of the enum.

However, the biggest question I have is if you can recommend a direction for
me to go next? For the most part, this has been an "automatic" process of
reading and implementing the BTF spec, while also learning the drgn type system.
Now that a large chunk of this is complete, I'm unsure how to integrate this
module with the rest of drgn. Should the BTF registry have a pointer in
prog.dbinfo?

Using the BTF integration automatically seems to be a ways off. After all, we
still need kallsyms support, and at least a few new symbols in the vmcoreinfo
note to help us find the kallsyms in the vmcore. Without both pieces, the BTF
code is pretty unhelpful. To find the BTF information, you need a symbol table,
and if you have that you probably have DWARF info too. Plus, even if you find
the BTF information, you'll still want the symbol table so you can find data
structures to analyze with the BTF types. So would the best way forward right
now be to expose this as some sort of Python interface (cleaned up from what I
have here)?

Add btf.h copied from linux 5.17. Also, add btf.c which contains my own
implementation of indexing BTF types using the btf.h definitions. Add a
drgn type finder which currently has only a definition for integer
types. Integrate it all with a Python binding to add BTF definitions
given their address (__start_BTF, __stop_BTF).

There's a lot here. If I was really composing a nice branch this would
be several commits: refactor the type handler to use integers, add
qualifiers, etc. But for now, this is good enough.

It seems bit_field_size should only be set for integers. Further, types
like const, restrict, volatile, and typedef should be skipped past to
get to their concrete integer type. Create an improved function for
determining bit_field_size and fix up the compound type creation
routine.
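
The modifier-skipping step described in the last commit message can be sketched as follows. This is only an illustration in Python, not the C code from the branch: the type table (a dict mapping a type ID to a (kind, referenced type ID) pair) and the helper names are hypothetical, and returning 0 to mean "not a bit field" is an assumption of the sketch. The kind numbers are the BTF_KIND_* values from btf.h.

```python
# BTF_KIND_* values from <linux/btf.h>
BTF_KIND_INT = 1
BTF_KIND_STRUCT = 4
BTF_KIND_TYPEDEF = 8
BTF_KIND_VOLATILE = 9
BTF_KIND_CONST = 10
BTF_KIND_RESTRICT = 11

# Kinds that only wrap another type and should be skipped past.
_SKIPPED_KINDS = {
    BTF_KIND_TYPEDEF,
    BTF_KIND_VOLATILE,
    BTF_KIND_CONST,
    BTF_KIND_RESTRICT,
}


def resolve_concrete(types, type_id):
    """Follow typedefs and qualifiers until a concrete type ID is reached.

    `types` is a hypothetical table: {type_id: (kind, referenced_type_id)}.
    """
    seen = set()
    while True:
        if type_id in seen:
            raise ValueError(f"cycle while resolving type {type_id}")
        seen.add(type_id)
        kind, referenced = types[type_id]
        if kind not in _SKIPPED_KINDS:
            return type_id
        type_id = referenced


def bit_field_size(types, member_type_id, encoded_bits):
    """Report a bit field size only when the member resolves to an integer.

    Assumption of this sketch: 0 stands for "not a bit field".
    """
    kind, _ = types[resolve_concrete(types, member_type_id)]
    return encoded_bits if kind == BTF_KIND_INT else 0


# Usage: "const my_int_t" where my_int_t is a typedef of an int.
example_types = {
    1: (BTF_KIND_INT, 0),
    2: (BTF_KIND_TYPEDEF, 1),
    3: (BTF_KIND_CONST, 2),
    4: (BTF_KIND_STRUCT, 0),
}
int_member_bits = bit_field_size(example_types, 3, 5)
struct_member_bits = bit_field_size(example_types, 4, 5)
```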

brenns10 commented Mar 9, 2022

As part of this development process I've learned that, although BTF specifies a type kind for variable declarations, the kernel's BTF only includes information for percpu variables. This restriction seems to be hardcoded into the BTF encoder of pahole. I'm not sure why percpu variables are special to the BPF use case. But my guess is that one motivation is to reduce the BTF size, since each variable declaration would contribute 16 bytes of descriptor, plus its string length as well as any additional type information it depends on.
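
To put a rough number on that size argument, here is a back-of-the-envelope sketch in Python. It assumes the 16 bytes are the 12-byte struct btf_type plus the 4-byte struct btf_var that follows it, and that each name costs its length plus a NUL terminator in the string section. It ignores string deduplication and any type information the variable pulls in, so it is only a lower-bound style estimate; the function name is hypothetical.

```python
BTF_TYPE_SIZE = 12  # struct btf_type: name_off, info, size/type (three u32)
BTF_VAR_SIZE = 4    # struct btf_var: linkage (one u32)


def var_decl_overhead(names):
    """Estimate the bytes added by one BTF_KIND_VAR entry per name.

    Counts only the fixed descriptor and the NUL-terminated name string.
    """
    descriptors = len(names) * (BTF_TYPE_SIZE + BTF_VAR_SIZE)
    strings = sum(len(name.encode()) + 1 for name in names)
    return descriptors + strings


# Usage: one variable named "jiffies" costs 16 bytes of descriptor
# plus 8 bytes of string data.
single = var_decl_overhead(["jiffies"])
```

Multiplying an estimate like this by the number of global variables in a kernel build gives a sense of how much the BTF section would grow if every variable were described.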

For the purpose of using BTF as a fallback source of type information in a debugger, this seriously limits its utility.

@brenns10

I'm closing this out -- the amount of functionality I want to add (BTF, kallsyms, and some odds and ends) is too big for a single pull request, and the work in this draft is outdated. I'll be sending PRs for individual parts of this.

@brenns10 brenns10 closed this May 23, 2022
@brenns10 brenns10 mentioned this pull request Mar 9, 2023