Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A variety of improvements #102

Merged
merged 8 commits into from
Nov 6, 2017
Merged

A variety of improvements #102

merged 8 commits into from
Nov 6, 2017

Conversation

khuey
Copy link
Collaborator

@khuey khuey commented Nov 6, 2017

These patches combined get the number of symbols in all of Firefox's debug info that fail to parse down from ~175k to a little over 2k. I think they're all self-explanatory or well documented enough in the commit message that I don't need to explain them in detail here.

@fitzgen
Copy link
Member

fitzgen commented Nov 6, 2017 via email

@khuey
Copy link
Collaborator Author

khuey commented Nov 6, 2017

If you want to talk through this in person I could bike down there today.

Copy link
Member

@fitzgen fitzgen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In progress comments.

src/ast.rs Outdated

impl TaggedName {
// Not a `Demangle` impl because the 'me reference to self may not live long
// enough.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this surprising -- the only time I had to avoid a Demangle impl because of this was with handles that got copied around, and ultimately weren't references into the substitutions table. But that doesn't seem like it should be the case here. Is this code just cribbed or did you actually run into that problem? If the latter, I'd be interested in learning how it manifested.

tests/tests.rs Outdated
@@ -101,6 +101,10 @@ demangles!(
"void Ty::method<Ty>(void (Ty::*)(char const*), void (Ty::*)(char const*))"
);
demangles!(_ZNK1fB5cxx11Ev,"f[abi:cxx11]() const");
demangles!(
_ZN4base8internal14CheckedSubImplIlEENSt9enable_ifIXsrSt14numeric_limitsIT_E10is_integerEbE4typeES4_S4_PS4_,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Truly, a beautiful symbol.

Because closure-type-name starts with two characters, rework the starts_with handling by getting rid of the trait (which is never used as a generic) and doing some one-off handling for closure-type-name.
Consider the mangled symbol `_ZSt4copyIPKcPcET0_`. This is correctly demangled to `char* std::copy<char const*, char*>()`. BecauseEncoding::Function places the function itself on the inner stack so that it can later be pulled off to decorate the full function name with arguments, if the template arguments happen to go looking on the inner stack they will find the function and pull in the function arguments as well. This leads to the nonsensical demangling `char* std::copy<char const*(), char*>`.

To prevent these sorts of problems, we introduce "inner barriers". Conceptually, these are nodes that a descendant looking back up the tree (via the inner stack) cannot see past. This is implemented through an RAII helper that sets aside the existing inner stack for an empty one at an inner barrier and replaces the previous inner stack when the helper goes out of scope. There's also an `inner_barrier` macro to make this transparent to `Demangle` implementations.

Similar to the template-args production, the encoding production can also contain complex elements within it that should not look back up the inner stack. There are probably others too ... a formal analysis of the grammar would help here.
The demangling recursion checking is probably unnecessary (as mentioned in the comments, and because consumers can set their own recursion limits to avoid ilooping). It's also overly conservative. The substitution index based accounting does not consider that one of the substitutions can be a template parameter, whose value may be different at the time of the two substitutions. For a real symbol that illustrates this behavior, consider `_ZN13nsTArray_ImplI6RefPtrIZN7mozilla22MediaConstraintsHelper17FindBadConstraintINS1_28MediaEngineRemoteVideoSourceEEEPKcRKNS1_21NormalizedConstraintsERKT_RK9nsTStringIDsEE10MockDeviceE27nsTArrayInfallibleAllocatorE13AppendElementIPSH_SJ_EEPSI_OSA_`.
This allows a "successful" demangling of `_ZN19do_QueryFrameHelperI8nsIFrameEcvPT_IS0_EEv`, but it is still not correct. The output is now `do_QueryFrameHelper<nsIFrame>::operator nsIFrame<nsIFrame>*()` (compare to the c++filt `do_QueryFrameHelper<nsIFrame>::operator nsIFrame*<nsIFrame>()`). libiberty forces the '*' to appear before any template parameters.
@fitzgen
Copy link
Member

fitzgen commented Nov 6, 2017

LGTM, we will need a follow up issue for trying to construct a cycle in the substitutions, and whether that is a real problem.

@fitzgen fitzgen merged commit f98ecbb into gimli-rs:master Nov 6, 2017
khuey added a commit to khuey/cpp_demangle that referenced this pull request Mar 24, 2019
…rn encoding.

How we got here: In late 2017 (PR gimli-rs#102) I wrote f6bcf4c. The example symbol for that change was `_ZN4base8internal14CheckedSubImplIlEENSt9enable_ifIXsrSt14numeric_limitsIT_E10is_integerEbE4typeES4_S4_PS4_`. I now believe that symbol is invalid and that its production is either a GNU bug or a nonstandard extension.

The latest cxxabi defines `expression` at http://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangle.expression. It does not have any special casing of `sr` outside of the `unresolved-name` production. These changes were made in 2010 (http://sourcery.mentor.com/archives/cxx-abi-dev/msg02279.html) but GCC and binutils don't seem to have caught up.

If we remove our handling of GCC's outdated behavior, then the symbol `_Z6IsNullIiEN1EIXsr1FIT_EE1nEE4typeES1_` from our tests now demangles to `E<F<int>::n>::type IsNull<int>(int)` which is both more sensible and matches llvm-cxxfilt.

The symbol `_ZN4base8internal14CheckedSubImplIlEENSt9enable_ifIXsrSt14numeric_limitsIT_E10is_integerEbE4typeES4_S4_PS4_` is no longer valid. The expression (starting with the 'X' after enable_if) needs to be read as

`sr <unresolved-qualifier-level>+ E <base-unresolved-name>`

Unlike the `type` production which allows (among others a well-known) substitution, `unresolved-qualifier-level` is `simple-id`, so `srSt` must become `sr3std`. The `unresolved-qualifier-level` list must be terminated with an 'E', which must be inserted before '10is_integer'. This also drops std::numeric_limits from the substitution table (it was previously at slot 3), so we need to adjust the substitution indexes at the end. That produces a corrected symbol: `_ZN4base8internal14CheckedSubImplIlEENSt9enable_ifIXsr3std14numeric_limitsIT_EE10is_integerEbE4typeES3_S3_PS3_`, which both llvm-cxxfilt and us agree is demangled to what we were previously demangling it to.

roc added `_ZN7mozilla6detail12ListenerImplINS_14AbstractThreadEZNS_20MediaEventSourceImplILNS_14ListenerPolicyE0EJNS_13TimedMetadataEEE15ConnectInternalIS2_NS_12MediaDecoderEMS8_FvOS5_EEENS_8EnableIfIXsr8TakeArgsIT1_EE5valueENS_18MediaEventListenerEE4TypeEPT_PT0_SD_EUlS9_E_JS5_EE17ApplyWithArgsImplISL_EENSC_IXsr8TakeArgsISH_EE5valueEvE4TypeERKSH_S9_` as a symbol that loops infinitely, but it now demangles. It also demangled previously in llvm-cxxfilt. Our demanglings don't match, but I don't propose to solve that today.

Finally, we come to the symbol that actually motivated this: `_ZSt3powIdiEN9__gnu_cxx11__promote_2IT_T0_NS0_9__promoteIS2_Xsr3std12__is_integerIS2_EE7__valueEE6__typeENS4_IS3_Xsr3std12__is_integerIS3_EE7__valueEE6__typeEE6__typeES2_S3_`. This is pretty similar to the first symbol, after fixing it up. Here, after the 'X' `expression` production, there is another template argument. The second 'E' (which is required to terminate the `unresolved-qualifier-level` list) gets misinterpreted by the existing code as terminating the containing template production without the next template argument. Parsing then goes off the rails from there. Thus, the correct standards-compliant behavior is incompatible with the outdated output of gcc.
khuey added a commit to khuey/cpp_demangle that referenced this pull request Apr 5, 2019
…rn encoding.

How we got here: In late 2017 (PR gimli-rs#102) I wrote f6bcf4c. The example symbol for that change was `_ZN4base8internal14CheckedSubImplIlEENSt9enable_ifIXsrSt14numeric_limitsIT_E10is_integerEbE4typeES4_S4_PS4_`. I now believe that symbol is invalid and that its production is either a GNU bug or a nonstandard extension.

The latest cxxabi defines `expression` at http://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangle.expression. It does not have any special casing of `sr` outside of the `unresolved-name` production. These changes were made in 2010 (http://sourcery.mentor.com/archives/cxx-abi-dev/msg02279.html) but GCC and binutils don't seem to have caught up.

If we remove our handling of GCC's outdated behavior, then the symbol `_Z6IsNullIiEN1EIXsr1FIT_EE1nEE4typeES1_` from our tests now demangles to `E<F<int>::n>::type IsNull<int>(int)` which is both more sensible and matches llvm-cxxfilt.

The symbol `_ZN4base8internal14CheckedSubImplIlEENSt9enable_ifIXsrSt14numeric_limitsIT_E10is_integerEbE4typeES4_S4_PS4_` is no longer valid. The expression (starting with the 'X' after enable_if) needs to be read as

`sr <unresolved-qualifier-level>+ E <base-unresolved-name>`

Unlike the `type` production which allows (among others a well-known) substitution, `unresolved-qualifier-level` is `simple-id`, so `srSt` must become `sr3std`. The `unresolved-qualifier-level` list must be terminated with an 'E', which must be inserted before '10is_integer'. This also drops std::numeric_limits from the substitution table (it was previously at slot 3), so we need to adjust the substitution indexes at the end. That produces a corrected symbol: `_ZN4base8internal14CheckedSubImplIlEENSt9enable_ifIXsr3std14numeric_limitsIT_EE10is_integerEbE4typeES3_S3_PS3_`, which both llvm-cxxfilt and us agree is demangled to what we were previously demangling it to.

roc added `_ZN7mozilla6detail12ListenerImplINS_14AbstractThreadEZNS_20MediaEventSourceImplILNS_14ListenerPolicyE0EJNS_13TimedMetadataEEE15ConnectInternalIS2_NS_12MediaDecoderEMS8_FvOS5_EEENS_8EnableIfIXsr8TakeArgsIT1_EE5valueENS_18MediaEventListenerEE4TypeEPT_PT0_SD_EUlS9_E_JS5_EE17ApplyWithArgsImplISL_EENSC_IXsr8TakeArgsISH_EE5valueEvE4TypeERKSH_S9_` as a symbol that loops infinitely, but it now demangles. It also demangled previously in llvm-cxxfilt. Our demanglings don't match, but I don't propose to solve that today.

Finally, we come to the symbol that actually motivated this: `_ZSt3powIdiEN9__gnu_cxx11__promote_2IT_T0_NS0_9__promoteIS2_Xsr3std12__is_integerIS2_EE7__valueEE6__typeENS4_IS3_Xsr3std12__is_integerIS3_EE7__valueEE6__typeEE6__typeES2_S3_`. This is pretty similar to the first symbol, after fixing it up. Here, after the 'X' `expression` production, there is another template argument. The second 'E' (which is required to terminate the `unresolved-qualifier-level` list) gets misinterpreted by the existing code as terminating the containing template production without the next template argument. Parsing then goes off the rails from there. Thus, the correct standards-compliant behavior is incompatible with the outdated output of gcc.
khuey added a commit that referenced this pull request Apr 5, 2019
…rn encoding.

How we got here: In late 2017 (PR #102) I wrote f6bcf4c. The example symbol for that change was `_ZN4base8internal14CheckedSubImplIlEENSt9enable_ifIXsrSt14numeric_limitsIT_E10is_integerEbE4typeES4_S4_PS4_`. I now believe that symbol is invalid and that its production is either a GNU bug or a nonstandard extension.

The latest cxxabi defines `expression` at http://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangle.expression. It does not have any special casing of `sr` outside of the `unresolved-name` production. These changes were made in 2010 (http://sourcery.mentor.com/archives/cxx-abi-dev/msg02279.html) but GCC and binutils don't seem to have caught up.

If we remove our handling of GCC's outdated behavior, then the symbol `_Z6IsNullIiEN1EIXsr1FIT_EE1nEE4typeES1_` from our tests now demangles to `E<F<int>::n>::type IsNull<int>(int)` which is both more sensible and matches llvm-cxxfilt.

The symbol `_ZN4base8internal14CheckedSubImplIlEENSt9enable_ifIXsrSt14numeric_limitsIT_E10is_integerEbE4typeES4_S4_PS4_` is no longer valid. The expression (starting with the 'X' after enable_if) needs to be read as

`sr <unresolved-qualifier-level>+ E <base-unresolved-name>`

Unlike the `type` production which allows (among others a well-known) substitution, `unresolved-qualifier-level` is `simple-id`, so `srSt` must become `sr3std`. The `unresolved-qualifier-level` list must be terminated with an 'E', which must be inserted before '10is_integer'. This also drops std::numeric_limits from the substitution table (it was previously at slot 3), so we need to adjust the substitution indexes at the end. That produces a corrected symbol: `_ZN4base8internal14CheckedSubImplIlEENSt9enable_ifIXsr3std14numeric_limitsIT_EE10is_integerEbE4typeES3_S3_PS3_`, which both llvm-cxxfilt and us agree is demangled to what we were previously demangling it to.

roc added `_ZN7mozilla6detail12ListenerImplINS_14AbstractThreadEZNS_20MediaEventSourceImplILNS_14ListenerPolicyE0EJNS_13TimedMetadataEEE15ConnectInternalIS2_NS_12MediaDecoderEMS8_FvOS5_EEENS_8EnableIfIXsr8TakeArgsIT1_EE5valueENS_18MediaEventListenerEE4TypeEPT_PT0_SD_EUlS9_E_JS5_EE17ApplyWithArgsImplISL_EENSC_IXsr8TakeArgsISH_EE5valueEvE4TypeERKSH_S9_` as a symbol that loops infinitely, but it now demangles. It also demangled previously in llvm-cxxfilt. Our demanglings don't match, but I don't propose to solve that today.

Finally, we come to the symbol that actually motivated this: `_ZSt3powIdiEN9__gnu_cxx11__promote_2IT_T0_NS0_9__promoteIS2_Xsr3std12__is_integerIS2_EE7__valueEE6__typeENS4_IS3_Xsr3std12__is_integerIS3_EE7__valueEE6__typeEE6__typeES2_S3_`. This is pretty similar to the first symbol, after fixing it up. Here, after the 'X' `expression` production, there is another template argument. The second 'E' (which is required to terminate the `unresolved-qualifier-level` list) gets misinterpreted by the existing code as terminating the containing template production without the next template argument. Parsing then goes off the rails from there. Thus, the correct standards-compliant behavior is incompatible with the outdated output of gcc.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants