Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propose new Portable PDB related protocol fields #13

Merged
merged 6 commits into from
Oct 17, 2022
Merged

Conversation

Swatinem
Copy link
Member

This RFC proposes to add new fields to the event protocol to support symbolication of .NET stack frames that use Portable PDB debug files.

Rendered RFC

@mitsuhiko
Copy link
Member

We have discussed this previously but the entries in debug_meta today are a bit of a weird hybrid. An entry usually refers to a single module and then describes both the object file and the debug companion. However the type is almost exclusively referring to the file type of the object file. So we talk about pe, macho, elf and so forth. The exception to this rule today I believe is only the proguard entry which describes the debug companion only.

With the portable PDB situation it's odd now because the pe (which already is portable in the name) is more or less the same for .NET and native. While technically true that there is a difference as the .NET PE really is just a dummy, it does carry a lot of useful PE information still.

The proposal here to introduce a portable-pe (portable, portable executable :P) is understandable and probably not a terrible idea, but I wonder if that is actually how we should think about these debug files. An alternative world would be to allow a debug_meta entry to just refer to a debug file instead. That way you could also potentially associate more than one debug companion with a module.

@Swatinem
Copy link
Member Author

I would also prefer to describe the debug companion (portable-pdb in this case), that would make a lot more sense to me.

@Swatinem
Copy link
Member Author

Fun fact: PE files can indeed have more than one CodeView record, for example here is a hex view of System.Reflection.Metadata.dll:

grafik

That file does not seem to have a PDB checksum however.

It seems like a lot of the System.XXX files have two codeview records.

text/0013-portable-pdb.md Outdated Show resolved Hide resolved
text/0013-portable-pdb.md Outdated Show resolved Hide resolved
text/0013-portable-pdb.md Outdated Show resolved Hide resolved
text/0013-portable-pdb.md Outdated Show resolved Hide resolved

We propose to add the following to the [Stack Trace Interface Frame Attributes](https://develop.sentry.dev/sdk/event-payloads/stacktrace/#frame-attributes):

- `instruction_offset: Number`, as it is currently used by the `sentry-dotnet` SDK.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instruction_offset is a good generic name for such a field in principle. Given that the old fields for this have been long removed and were inactive, there is no risk of collisions.

If we want to be more specific, we could also name this something like function_offset, which would further differentiate the semantics of this field to WASM's addressing mode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this:

  • there is a field instruction_addr.
  • there is a field addr_mode which may be abs or rel:n.
  • there is an optional field function_index (name subject to change).

instruction_addr is then interpreted as follows:

  • addr_mode = abs: instruction_addr is an absolute address
  • addr_mode = rel:n and function_index is missing: instruction_addr is relative to module n (the WASM case)
  • addr_mode = rel:n and function_index = k: instruction_addr is relative to function k in module n (in other words, an IL offset)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't hate this but I wonder if we then don't want to have a funcrel:n:m where n is the module index and m is the function index?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were also talking about that last week, and did not really like it because it was too "stringly typed". And parsing things from a specially formatted string is strictly worse than just having proper JSON properties for it.

However I do recognize that the whole problem boils down to having a properly discriminated union:

enum Addr {
  Absolute(u64),
  ModuleRelative { module_idx: u32, addr: u64 },
  FunctionRelative { module_idx: u32, function_idx: u32, addr: u64 },
}

text/0013-portable-pdb.md Outdated Show resolved Hide resolved
text/0013-portable-pdb.md Show resolved Hide resolved
text/0013-portable-pdb.md Outdated Show resolved Hide resolved
text/0013-portable-pdb.md Outdated Show resolved Hide resolved
text/0013-portable-pdb.md Outdated Show resolved Hide resolved
@Swatinem
Copy link
Member Author

Swatinem commented Oct 5, 2022

I have researched the System.Reflection.Metadata.dll (Windows version, as the macOS version is different and does not have an associated .ni.pdb):

Here are the headers:

CoffHeader.TimeDateStamp: 858257FE
PEHeader.SizeOfImage: 114000

DebugDirectoryEntry.Type: CodeView
DebugDirectoryEntry.MajorVersion: 100
DebugDirectoryEntry.MinorVersion: 0
DebugDirectoryEntry.Stamp: 858257fe
DebugDirectoryEntry.IsPortableCodeView: False
CodeViewDebugDirectoryData.Guid: d9f6618d-d934-6123-c7c2-459c9645cf84
CodeViewDebugDirectoryData.Age: 1
CodeViewDebugDirectoryData.Path: System.Reflection.Metadata.ni.pdb

DebugDirectoryEntry.Type: CodeView
DebugDirectoryEntry.MajorVersion: 100
DebugDirectoryEntry.MinorVersion: 504d
DebugDirectoryEntry.Stamp: 858257fe
DebugDirectoryEntry.IsPortableCodeView: True
CodeViewDebugDirectoryData.Guid: 3183ede4-e1eb-4d0c-a6a0-2937d1f72463
CodeViewDebugDirectoryData.Age: 1
CodeViewDebugDirectoryData.Path: D:\a\_work\1\s\artifacts\obj\System.Reflection.Metadata\net6.0-Release\System.Reflection.Metadata.pdb

DebugDirectoryEntry.Type: Reproducible
DebugDirectoryEntry.MajorVersion: 0
DebugDirectoryEntry.MinorVersion: 0
DebugDirectoryEntry.Stamp: 0
DebugDirectoryEntry.IsPortableCodeView: False

I can also fetch all three files from the Microsoft symbol server using the PE-timestamp-filesize, PDB-Signature-Age and Portable-Pdb-Signature patterns respectively:

The .pdb file is a Portable PDB, the .ni.pdb file is an "old-style" MSF-based PDB. I haven’t looked into the files yet though.

Looks like the ni suffix stands for Native Image and are generated by ngen.exe, the Native Image Generator: https://devblogs.microsoft.com/devops/creating-ngen-pdbs-for-profiling-reports/

@Swatinem
Copy link
Member Author

Swatinem commented Oct 5, 2022

The .dll has a text section and unwind info which we can dump via dump_cfi, but it does not seem to have a symbol table.
For the .ni.pdb, our object_debug reports a symbol table and debug info, but creating a symcache for it via symcache_debug leaves us empty.

Maybe it is a a different kind of MSF PDB? Either way, it does not look like we can use it at all.

Swatinem added a commit to getsentry/relay that referenced this pull request Oct 6, 2022
This adds new fields to the Stack Frame and Debug Image protocol types, and expends the documentation on others.

The fields are specific to .NET symbolication using Portable PDB debug files.

[RFC0013](getsentry/rfcs#13) introduces and explains these new fields.
@Swatinem
Copy link
Member Author

Summary

  • I settled for a new "pe_dotnet" type now. There is enough evidence that we do need to somehow flag these files differently from normal pe files so a new type it is.
  • I went with instruction_addr + addr_mode: "rel:X" + function_id: HexValue. We discussed putting the function index into the addr_mode, but I believe having a separate field is better than putting more information into an opaque string that needs to be parsed.
  • I verified that the chain from SDK through relay to sentry and symbolicator works fine and correctly symbolicates events, using the WIP PRs linked from the RFC.

Next Steps

I believe we can merge the RFC and the PR to relay right away. The PR to sentry can also be merged, but will only take effect once the symbolic dependency is also updated to correctly recognize the new Portable PDB file type.
Support has already landed in symbolic master. We need a new release and can then merge the PRs to sentry-cli and symbolicator, and update the dependency in sentry as well.

That leaves the dotnet SDK as the last piece of the puzzle. My PR works, but does need some work. Also the current SDK architecture is not well suited to write out stack frames and debug images "atomically" at the same time.

Swatinem added a commit to getsentry/relay that referenced this pull request Oct 17, 2022
This adds new fields to the Stack Frame and Debug Image protocol types, and expends the documentation on others.

The fields are specific to .NET symbolication using Portable PDB debug files.

[RFC0013](getsentry/rfcs#13) introduces and explains these new fields.
@Swatinem Swatinem merged commit 2f6763a into main Oct 17, 2022
@Swatinem Swatinem deleted the portable-pdb branch October 17, 2022 10:08
Swatinem added a commit to getsentry/sentry that referenced this pull request Oct 17, 2022
…#39610)

- This will allow `sentry-cli` to upload Portable PDB debug files. The
companion PR is getsentry/sentry-cli#1345.
- It also allows symbolicator to query/download Portable PDB files.
- The PR also adds support for the protocol types specified in [RFC
0013](getsentry/rfcs#13)

Co-authored-by: Arpad Borsos <arpad.borsos@sentry.io>
Swatinem added a commit to getsentry/relay that referenced this pull request Oct 17, 2022
This adds new fields to the Stack Frame and Debug Image protocol types,
and expends the documentation on others.

The fields are specific to .NET symbolication using Portable PDB debug
files.

[RFC0013](getsentry/rfcs#13) introduces and
explains these new fields.
barkbarkimashark pushed a commit to getsentry/sentry that referenced this pull request Oct 18, 2022
…#39610)

- This will allow `sentry-cli` to upload Portable PDB debug files. The
companion PR is getsentry/sentry-cli#1345.
- It also allows symbolicator to query/download Portable PDB files.
- The PR also adds support for the protocol types specified in [RFC
0013](getsentry/rfcs#13)

Co-authored-by: Arpad Borsos <arpad.borsos@sentry.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants