Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API improvements for third-party tools #89

Closed
landaire opened this issue Oct 27, 2020 · 4 comments
Closed

API improvements for third-party tools #89

landaire opened this issue Oct 27, 2020 · 4 comments

Comments

@landaire
Copy link
Contributor

landaire commented Oct 27, 2020

Sorry for the "mega issue" but I had a couple of questions/comments regarding the current API design and if certain changes would be welcome. I wrote a tool this weekend that I intend to improve upon for exposing some of the interesting PDB metadata that this library exposes: https://github.com/landaire/pdbview

There were two changes/workarounds I had to make in my own local branch for this tool:

  1. CompileFlags fields are not public, essentially making the struct useless: landaire@62aedc0
  2. Some of the data types in the constants module are exposed through public APIs but the module itself is not (e.g. CPUType). This is somewhat awkward as you cannot explicitly import this type into your own code and cannot read documentation for these types (see: https://docs.rs/pdb/0.6.0/pdb/struct.CompileFlagsSymbol.html#structfield.cpu_type). My workaround here was to just format these as Strings since that was all I need anyways.

These two issues I'd be happy to make PRs for if the intent is for these items to be public.

The next issue I encountered was that the ProcedureReferenceSymbol is sort of useless if you actually want to do anything with it. In order to look up the symbol you need to look up the correct module -- the ID of which can be obtained from the module field. From here I'm stumped. There's no way to get a Module from a module index. Other similar scenarios use custom types, such as SymbolIndex, and allow you to use methods such as ModuleInfo::symbols_at(index) to grab the symbol or SymbolIter::seek. As far as I could tell no such APIs exist for modules and none of the examples or internal code reference these fields.

@jan-auer
Copy link
Member

Thanks @landaire for opening this issue! Improvements and contributions are much welcome.

There were two changes/workarounds I had to make in my own local branch for this tool [...]

You're right; both of these are oversights and should definitely be updated. If you like, feel free to create a PR, or let me know if you'd like me to fix this.

The next issue I encountered was that the ProcedureReferenceSymbol is sort of useless if you actually want to do anything with it. In order to look up the symbol you need to look up the correct module -- the ID of which can be obtained from the module field. From here I'm stumped. There's no way to get a Module from a module index.

You're right, this API is currently missing. The issue with adding such an API was that there is no efficient way to provide random access to a module by index, that I'm currently aware of. Ultimately, it looks like this:

  1. The DebugInformation stream (DBI) contains a stream of module entries. These are variable-length encoded, so you have to iterate to get the full list.
  2. With the module entry, you can load the actual ModuleInfo, which is an expensive process. It loads sections from the PDB into memory, which causes a lot of allocations.

The "solution" to this was to pass the burden of loading and indexing modules to the user. For instance, we've built a type that collects all module entries ahead of time into a vector and then lazy loads the module on first access. At the time, I believed this is a bit too high level for the PDB crate, but I'm happy to reconsider and add a convenience type that takes care of this.

This gets even worse when you look at the ItemIndex trait, where you have to look up a module by name. Building such maps incurs overhead, and as such I believe offering a utility type that indicates this overhead is a reasonable approach.

Other similar scenarios use custom types, such as SymbolIndex, and allow you to use methods such as ModuleInfo::symbols_at(index) to grab the symbol or SymbolIter::seek. As far as I could tell no such APIs exist for modules and none of the examples or internal code reference these fields.

You're right. We were able to implement these efficiently due to a difference in what the indexes actually are. While the module index is in fact the numeric index of the module in the stream, the SymbolIndex is a byte offset into the symbol stream. For that reason, we were able to implement "random access".

@landaire
Copy link
Contributor Author

Thanks for the feedback!

The "solution" to this was to pass the burden of loading and indexing modules to the user. For instance, we've built a type that collects all module entries ahead of time into a vector and then lazy loads the module on first access. At the time, I believed this is a bit too high level for the PDB crate, but I'm happy to reconsider and add a convenience type that takes care of this.

Are the modules indexed in order of how they're processed by the iterator? If so, that's trivial for me to handle in my own code and don't mind doing it myself -- I just didn't know if that was guaranteed to match.

@jan-auer
Copy link
Member

This is my understanding, yes. To compare with how Microsoft's PDB code works:

  • Code for resolving a module reference can be seen here.
  • This subtracts 1 via imodForXimod to get a 0-based index. I'm assuming a literal 0 means no reference.
  • openModByImod then runs pmodiForImod.
  • pmodiForImod indexes into rgpmodi.
  • rgpmodi is built sequentially here.

So I think it should be safe to assume that you can take dbi.modules().nth(proc_ref.module - 1). Looking at this, I think we can improve the API by making module an option, but I'd like to verify that first. Can you confirm that the above yields the correct results for you?

@jan-auer
Copy link
Member

jan-auer commented Jun 3, 2022

All points in this PR are now addressed, with the exception of constants being private which is now tracked in #120. Thanks for opening this issue and suggesting improvements!

@jan-auer jan-auer closed this as completed Jun 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants