Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(debuginfo): don't strip leading underscores from PDB symbols #642

Merged
merged 1 commit into from Jul 29, 2022

Conversation

mstange
Copy link
Contributor

@mstange mstange commented Jul 29, 2022

Leaving the symbol name in its original form allows consumers to detect
"C decorated" functions correctly.

According to the Microsoft documentation [1], there are four forms of
"C decorated" names:

_<name> (cdecl)
_<name>@<paramsize> (stdcall)
@<name>@<paramsize> (fastcall)
<name>@@<paramsize> (vectorcall)

According to the same docs, the first three are only used on 32 bit. [2]

dump_syms parses these forms and decodes the parameter size from them.
It needs the parameter size on 32 bit for correct stack unwinding in
some cases.

Before this patch, symbolic-debuginfo was stripping leading underscores
from these symbols. This made it impossible for dump_syms to detect the
second form (stdcall, _<name>@<paramsize>), so it couldn't decode the
parameter size from it and would leave the @ appendix on the
function name.

Here's an example from basic-opt32.pdb in the dump_syms repo. The
original symbol names here are "_MultiByteToWideChar@24" and
"_WideCharToMultiByte@32".

Bad:

PUBLIC 53011 0 MultiByteToWideChar@24
PUBLIC 53017 0 WideCharToMultiByte@32

Good:

PUBLIC 53011 18 MultiByteToWideChar
PUBLIC 53017 20 WideCharToMultiByte

This patch preserves the underscore. This allows dump_syms to detect the
parameter size correctly. It may lead to more underscores appearing in
stack traces, though. If these extra underscores are deemed undesirable,
removing them during demangling is probably a better approach.

At the moment, demangling does not seem to strip the underscore in this
case. Example:

$ cargo run --release -p symcache_debug -- -d ./symbolic-testutils/fixtures/windows/crash.pdb --lookup 0x3756
_malloc
  at <unknown file >

[1] https://docs.microsoft.com/en-us/cpp/build/reference/decorated-names?view=msvc-170#FormatC
[2] I don't completely understand this; I can see plenty of functions
with leading underscores in 64 bit PDBs, too. If those underscores
are not cdecl decorations, what are they?

Leaving the symbol name in its original form allows consumers to detect
"C decorated" functions correctly.

According to the Microsoft documentation [1], there are four forms of
"C decorated" names:

```
_<name> (cdecl)
_<name>@<paramsize> (stdcall)
@<name>@<paramsize> (fastcall)
<name>@@<paramsize> (vectorcall)
```

According to the same docs, the first three are only used on 32 bit. [2]

dump_syms parses these forms and decodes the parameter size from them.
It needs the parameter size on 32 bit for correct stack unwinding in
some cases.

Before this patch, symbolic-debuginfo was stripping leading underscores
from these symbols. This made it impossible for dump_syms to detect the
second form (stdcall, `_<name>@<paramsize>`), so it couldn't decode the
parameter size from it and would leave the @<paramsize> appendix on the
function name.

Here's an example from basic-opt32.pdb in the dump_syms repo. The
original symbol names here are "_MultiByteToWideChar@24" and
"_WideCharToMultiByte@32".

Bad:

```
PUBLIC 53011 0 MultiByteToWideChar@24
PUBLIC 53017 0 WideCharToMultiByte@32
```

Good:

```
PUBLIC 53011 18 MultiByteToWideChar
PUBLIC 53017 20 WideCharToMultiByte
```

This patch preserves the underscore. This allows dump_syms to detect the
parameter size correctly. It may lead to more underscores appearing in
stack traces, though. If these extra underscores are deemed undesirable,
removing them during demangling is probably a better approach.

At the moment, demangling does not seem to strip the underscore in this
case. Example:

```
$ cargo run --release -p symcache_debug -- -d ./symbolic-testutils/fixtures/windows/crash.pdb --lookup 0x3756
_malloc
  at <unknown file >
```

[1] https://docs.microsoft.com/en-us/cpp/build/reference/decorated-names?view=msvc-170#FormatC
[2] I don't completely understand this; I can see plenty of functions
    with leading underscores in 64 bit PDBs, too. If those underscores
    are not cdecl decorations, what are they?
@mstange mstange requested a review from a team July 29, 2022 00:32
@mstange
Copy link
Contributor Author

mstange commented Jul 29, 2022

Here's one way to strip the underscore during demangling, if desired:

diff --git a/symbolic-demangle/src/lib.rs b/symbolic-demangle/src/lib.rs
index a1a4275..2e4ff24 100644
--- a/symbolic-demangle/src/lib.rs
+++ b/symbolic-demangle/src/lib.rs
@@ -446,7 +446,10 @@ impl<'a> Demangle for Name<'a> {
             Language::Rust => try_demangle_rust(self.as_str(), opts),
             Language::Cpp => try_demangle_cpp(self.as_str(), opts),
             Language::Swift => try_demangle_swift(self.as_str(), opts),
-            _ => None,
+            _ => {
+                // Opportunistically strip a single leading underscore.
+                self.as_str().strip_prefix('_').map(ToOwned::to_owned)
+            }
         }
     }
 

@codecov-commenter
Copy link

Codecov Report

Merging #642 (d7ff319) into master (a0360d3) will increase coverage by 0.11%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #642      +/-   ##
==========================================
+ Coverage   71.20%   71.32%   +0.11%     
==========================================
  Files          96       96              
  Lines       18408    18405       -3     
==========================================
+ Hits        13108    13127      +19     
+ Misses       5300     5278      -22     

Copy link
Member

@Swatinem Swatinem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably do this in demangling indeed. We now have symbols like 3738 __CxxThrowException@8 shortening that to _CxxThrowException in demangling may be a good idea.

@Swatinem Swatinem merged commit 59fccd0 into getsentry:master Jul 29, 2022
Swatinem added a commit that referenced this pull request Aug 4, 2022
#642 removed stripping the underscores from C-decorated Windows symbols.

The symbolic-debuginfo abstraction now outputs raw names (except for mach-o, where it still strips underscores).

As a followup to that change, this will now undecorate these symbols when writing a symcache.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants