API improvements for third-party tools #89

landaire · 2020-10-27T00:01:28Z

Sorry for the "mega issue" but I had a couple of questions/comments regarding the current API design and if certain changes would be welcome. I wrote a tool this weekend that I intend to improve upon for exposing some of the interesting PDB metadata that this library exposes: https://github.com/landaire/pdbview

There were two changes/workarounds I had to make in my own local branch for this tool:

CompileFlags fields are not public, essentially making the struct useless: landaire@62aedc0
Some of the data types in the constants module are exposed through public APIs but the module itself is not (e.g. CPUType). This is somewhat awkward as you cannot explicitly import this type into your own code and cannot read documentation for these types (see: https://docs.rs/pdb/0.6.0/pdb/struct.CompileFlagsSymbol.html#structfield.cpu_type). My workaround here was to just format these as Strings since that was all I need anyways.

These two issues I'd be happy to make PRs for if the intent is for these items to be public.

The next issue I encountered was that the ProcedureReferenceSymbol is sort of useless if you actually want to do anything with it. In order to look up the symbol you need to look up the correct module -- the ID of which can be obtained from the module field. From here I'm stumped. There's no way to get a Module from a module index. Other similar scenarios use custom types, such as SymbolIndex, and allow you to use methods such as ModuleInfo::symbols_at(index) to grab the symbol or SymbolIter::seek. As far as I could tell no such APIs exist for modules and none of the examples or internal code reference these fields.

The text was updated successfully, but these errors were encountered:

jan-auer · 2020-10-28T09:28:41Z

Thanks @landaire for opening this issue! Improvements and contributions are much welcome.

There were two changes/workarounds I had to make in my own local branch for this tool [...]

You're right; both of these are oversights and should definitely be updated. If you like, feel free to create a PR, or let me know if you'd like me to fix this.

The next issue I encountered was that the ProcedureReferenceSymbol is sort of useless if you actually want to do anything with it. In order to look up the symbol you need to look up the correct module -- the ID of which can be obtained from the module field. From here I'm stumped. There's no way to get a Module from a module index.

You're right, this API is currently missing. The issue with adding such an API was that there is no efficient way to provide random access to a module by index, that I'm currently aware of. Ultimately, it looks like this:

The DebugInformation stream (DBI) contains a stream of module entries. These are variable-length encoded, so you have to iterate to get the full list.
With the module entry, you can load the actual ModuleInfo, which is an expensive process. It loads sections from the PDB into memory, which causes a lot of allocations.

The "solution" to this was to pass the burden of loading and indexing modules to the user. For instance, we've built a type that collects all module entries ahead of time into a vector and then lazy loads the module on first access. At the time, I believed this is a bit too high level for the PDB crate, but I'm happy to reconsider and add a convenience type that takes care of this.

This gets even worse when you look at the ItemIndex trait, where you have to look up a module by name. Building such maps incurs overhead, and as such I believe offering a utility type that indicates this overhead is a reasonable approach.

Other similar scenarios use custom types, such as SymbolIndex, and allow you to use methods such as ModuleInfo::symbols_at(index) to grab the symbol or SymbolIter::seek. As far as I could tell no such APIs exist for modules and none of the examples or internal code reference these fields.

You're right. We were able to implement these efficiently due to a difference in what the indexes actually are. While the module index is in fact the numeric index of the module in the stream, the SymbolIndex is a byte offset into the symbol stream. For that reason, we were able to implement "random access".

landaire · 2020-10-30T04:18:42Z

Thanks for the feedback!

The "solution" to this was to pass the burden of loading and indexing modules to the user. For instance, we've built a type that collects all module entries ahead of time into a vector and then lazy loads the module on first access. At the time, I believed this is a bit too high level for the PDB crate, but I'm happy to reconsider and add a convenience type that takes care of this.

Are the modules indexed in order of how they're processed by the iterator? If so, that's trivial for me to handle in my own code and don't mind doing it myself -- I just didn't know if that was guaranteed to match.

jan-auer · 2020-10-30T12:40:01Z

This is my understanding, yes. To compare with how Microsoft's PDB code works:

Code for resolving a module reference can be seen here.
This subtracts 1 via imodForXimod to get a 0-based index. I'm assuming a literal 0 means no reference.
openModByImod then runs pmodiForImod.
pmodiForImod indexes into rgpmodi.
rgpmodi is built sequentially here.

So I think it should be safe to assume that you can take dbi.modules().nth(proc_ref.module - 1). Looking at this, I think we can improve the API by making module an option, but I'd like to verify that first. Can you confirm that the above yields the correct results for you?

jan-auer · 2022-06-03T06:36:15Z

All points in this PR are now addressed, with the exception of constants being private which is now tracked in #120. Thanks for opening this issue and suggesting improvements!

This was referenced Oct 30, 2020

Re-export public enums located in private constants crate #90

Merged

Make CompileFlags fields public #91

Merged

jan-auer mentioned this issue Oct 30, 2020

Parse module references in symbol data #93

Merged

jan-auer closed this as completed Jun 3, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API improvements for third-party tools #89

API improvements for third-party tools #89

landaire commented Oct 27, 2020 •

edited

Loading

jan-auer commented Oct 28, 2020

landaire commented Oct 30, 2020

jan-auer commented Oct 30, 2020

jan-auer commented Jun 3, 2022

API improvements for third-party tools #89

API improvements for third-party tools #89

Comments

landaire commented Oct 27, 2020 • edited Loading

jan-auer commented Oct 28, 2020

landaire commented Oct 30, 2020

jan-auer commented Oct 30, 2020

jan-auer commented Jun 3, 2022

landaire commented Oct 27, 2020 •

edited

Loading