-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A variety of improvements #102
Conversation
Reviewing this now.
BTW, I'm in the SF office today and part of Wednesday, if you're around.
_Nick_
…On Sun, Nov 5, 2017 at 8:29 PM, Kyle Huey ***@***.***> wrote:
These patches combined get the number of symbols in all of Firefox's debug
info that fail to parse down from ~175k to a little over 2k. I think
they're all self-explanatory or well documented enough in the commit
message that I don't need to explain them in detail here.
------------------------------
You can view, comment on, or merge this pull request online at:
#102
Commit Summary
- Handle ABI tags.
- Handle `sr <type> <unqualified-name>` expressions.
- Accept closure-type-name productions in unqualified-name.
- Introduce the concept of "inner barriers".
- Implement vector-type.
- Remove overly conservative recursion checking.
- Reach through pointers and references to get_template_args.
- Bump libiberty threshold for improvements.
File Changes
- *M* Cargo.toml
<https://github.com/fitzgen/cpp_demangle/pull/102/files#diff-0> (3)
- *M* build.rs
<https://github.com/fitzgen/cpp_demangle/pull/102/files#diff-1> (2)
- *M* src/ast.rs
<https://github.com/fitzgen/cpp_demangle/pull/102/files#diff-2> (489)
- *M* src/error.rs
<https://github.com/fitzgen/cpp_demangle/pull/102/files#diff-3> (11)
- *M* src/index_str.rs
<https://github.com/fitzgen/cpp_demangle/pull/102/files#diff-4> (6)
- *M* tests/tests.rs
<https://github.com/fitzgen/cpp_demangle/pull/102/files#diff-5> (13)
Patch Links:
- https://github.com/fitzgen/cpp_demangle/pull/102.patch
- https://github.com/fitzgen/cpp_demangle/pull/102.diff
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#102>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEjS7t8rdD8-vbJelPTCfJ2qQAn03bAks5szosfgaJpZM4QSuQ0>
.
|
If you want to talk through this in person I could bike down there today. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In progress comments.
src/ast.rs
Outdated
|
||
impl TaggedName { | ||
// Not a `Demangle` impl because the 'me reference to self may not live long | ||
// enough. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this surprising -- the only time I had to avoid a Demangle
impl because of this was with handles that got copied around, and ultimately weren't references into the substitutions table. But that doesn't seem like it should be the case here. Is this code just cribbed or did you actually run into that problem? If the latter, I'd be interested in learning how it manifested.
tests/tests.rs
Outdated
@@ -101,6 +101,10 @@ demangles!( | |||
"void Ty::method<Ty>(void (Ty::*)(char const*), void (Ty::*)(char const*))" | |||
); | |||
demangles!(_ZNK1fB5cxx11Ev,"f[abi:cxx11]() const"); | |||
demangles!( | |||
_ZN4base8internal14CheckedSubImplIlEENSt9enable_ifIXsrSt14numeric_limitsIT_E10is_integerEbE4typeES4_S4_PS4_, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Truly, a beautiful symbol.
Because closure-type-name starts with two characters, rework the starts_with handling by getting rid of the trait (which is never used as a generic) and doing some one-off handling for closure-type-name.
Consider the mangled symbol `_ZSt4copyIPKcPcET0_`. This is correctly demangled to `char* std::copy<char const*, char*>()`. BecauseEncoding::Function places the function itself on the inner stack so that it can later be pulled off to decorate the full function name with arguments, if the template arguments happen to go looking on the inner stack they will find the function and pull in the function arguments as well. This leads to the nonsensical demangling `char* std::copy<char const*(), char*>`. To prevent these sorts of problems, we introduce "inner barriers". Conceptually, these are nodes that a descendant looking back up the tree (via the inner stack) cannot see past. This is implemented through an RAII helper that sets aside the existing inner stack for an empty one at an inner barrier and replaces the previous inner stack when the helper goes out of scope. There's also an `inner_barrier` macro to make this transparent to `Demangle` implementations. Similar to the template-args production, the encoding production can also contain complex elements within it that should not look back up the inner stack. There are probably others too ... a formal analysis of the grammar would help here.
The demangling recursion checking is probably unnecessary (as mentioned in the comments, and because consumers can set their own recursion limits to avoid ilooping). It's also overly conservative. The substitution index based accounting does not consider that one of the substitutions can be a template parameter, whose value may be different at the time of the two substitutions. For a real symbol that illustrates this behavior, consider `_ZN13nsTArray_ImplI6RefPtrIZN7mozilla22MediaConstraintsHelper17FindBadConstraintINS1_28MediaEngineRemoteVideoSourceEEEPKcRKNS1_21NormalizedConstraintsERKT_RK9nsTStringIDsEE10MockDeviceE27nsTArrayInfallibleAllocatorE13AppendElementIPSH_SJ_EEPSI_OSA_`.
This allows a "successful" demangling of `_ZN19do_QueryFrameHelperI8nsIFrameEcvPT_IS0_EEv`, but it is still not correct. The output is now `do_QueryFrameHelper<nsIFrame>::operator nsIFrame<nsIFrame>*()` (compare to the c++filt `do_QueryFrameHelper<nsIFrame>::operator nsIFrame*<nsIFrame>()`). libiberty forces the '*' to appear before any template parameters.
LGTM, we will need a follow up issue for trying to construct a cycle in the substitutions, and whether that is a real problem. |
…rn encoding. How we got here: In late 2017 (PR gimli-rs#102) I wrote f6bcf4c. The example symbol for that change was `_ZN4base8internal14CheckedSubImplIlEENSt9enable_ifIXsrSt14numeric_limitsIT_E10is_integerEbE4typeES4_S4_PS4_`. I now believe that symbol is invalid and that its production is either a GNU bug or a nonstandard extension. The latest cxxabi defines `expression` at http://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangle.expression. It does not have any special casing of `sr` outside of the `unresolved-name` production. These changes were made in 2010 (http://sourcery.mentor.com/archives/cxx-abi-dev/msg02279.html) but GCC and binutils don't seem to have caught up. If we remove our handling of GCC's outdated behavior, then the symbol `_Z6IsNullIiEN1EIXsr1FIT_EE1nEE4typeES1_` from our tests now demangles to `E<F<int>::n>::type IsNull<int>(int)` which is both more sensible and matches llvm-cxxfilt. The symbol `_ZN4base8internal14CheckedSubImplIlEENSt9enable_ifIXsrSt14numeric_limitsIT_E10is_integerEbE4typeES4_S4_PS4_` is no longer valid. The expression (starting with the 'X' after enable_if) needs to be read as `sr <unresolved-qualifier-level>+ E <base-unresolved-name>` Unlike the `type` production which allows (among others a well-known) substitution, `unresolved-qualifier-level` is `simple-id`, so `srSt` must become `sr3std`. The `unresolved-qualifier-level` list must be terminated with an 'E', which must be inserted before '10is_integer'. This also drops std::numeric_limits from the substitution table (it was previously at slot 3), so we need to adjust the substitution indexes at the end. That produces a corrected symbol: `_ZN4base8internal14CheckedSubImplIlEENSt9enable_ifIXsr3std14numeric_limitsIT_EE10is_integerEbE4typeES3_S3_PS3_`, which both llvm-cxxfilt and us agree is demangled to what we were previously demangling it to. roc added `_ZN7mozilla6detail12ListenerImplINS_14AbstractThreadEZNS_20MediaEventSourceImplILNS_14ListenerPolicyE0EJNS_13TimedMetadataEEE15ConnectInternalIS2_NS_12MediaDecoderEMS8_FvOS5_EEENS_8EnableIfIXsr8TakeArgsIT1_EE5valueENS_18MediaEventListenerEE4TypeEPT_PT0_SD_EUlS9_E_JS5_EE17ApplyWithArgsImplISL_EENSC_IXsr8TakeArgsISH_EE5valueEvE4TypeERKSH_S9_` as a symbol that loops infinitely, but it now demangles. It also demangled previously in llvm-cxxfilt. Our demanglings don't match, but I don't propose to solve that today. Finally, we come to the symbol that actually motivated this: `_ZSt3powIdiEN9__gnu_cxx11__promote_2IT_T0_NS0_9__promoteIS2_Xsr3std12__is_integerIS2_EE7__valueEE6__typeENS4_IS3_Xsr3std12__is_integerIS3_EE7__valueEE6__typeEE6__typeES2_S3_`. This is pretty similar to the first symbol, after fixing it up. Here, after the 'X' `expression` production, there is another template argument. The second 'E' (which is required to terminate the `unresolved-qualifier-level` list) gets misinterpreted by the existing code as terminating the containing template production without the next template argument. Parsing then goes off the rails from there. Thus, the correct standards-compliant behavior is incompatible with the outdated output of gcc.
…rn encoding. How we got here: In late 2017 (PR gimli-rs#102) I wrote f6bcf4c. The example symbol for that change was `_ZN4base8internal14CheckedSubImplIlEENSt9enable_ifIXsrSt14numeric_limitsIT_E10is_integerEbE4typeES4_S4_PS4_`. I now believe that symbol is invalid and that its production is either a GNU bug or a nonstandard extension. The latest cxxabi defines `expression` at http://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangle.expression. It does not have any special casing of `sr` outside of the `unresolved-name` production. These changes were made in 2010 (http://sourcery.mentor.com/archives/cxx-abi-dev/msg02279.html) but GCC and binutils don't seem to have caught up. If we remove our handling of GCC's outdated behavior, then the symbol `_Z6IsNullIiEN1EIXsr1FIT_EE1nEE4typeES1_` from our tests now demangles to `E<F<int>::n>::type IsNull<int>(int)` which is both more sensible and matches llvm-cxxfilt. The symbol `_ZN4base8internal14CheckedSubImplIlEENSt9enable_ifIXsrSt14numeric_limitsIT_E10is_integerEbE4typeES4_S4_PS4_` is no longer valid. The expression (starting with the 'X' after enable_if) needs to be read as `sr <unresolved-qualifier-level>+ E <base-unresolved-name>` Unlike the `type` production which allows (among others a well-known) substitution, `unresolved-qualifier-level` is `simple-id`, so `srSt` must become `sr3std`. The `unresolved-qualifier-level` list must be terminated with an 'E', which must be inserted before '10is_integer'. This also drops std::numeric_limits from the substitution table (it was previously at slot 3), so we need to adjust the substitution indexes at the end. That produces a corrected symbol: `_ZN4base8internal14CheckedSubImplIlEENSt9enable_ifIXsr3std14numeric_limitsIT_EE10is_integerEbE4typeES3_S3_PS3_`, which both llvm-cxxfilt and us agree is demangled to what we were previously demangling it to. roc added `_ZN7mozilla6detail12ListenerImplINS_14AbstractThreadEZNS_20MediaEventSourceImplILNS_14ListenerPolicyE0EJNS_13TimedMetadataEEE15ConnectInternalIS2_NS_12MediaDecoderEMS8_FvOS5_EEENS_8EnableIfIXsr8TakeArgsIT1_EE5valueENS_18MediaEventListenerEE4TypeEPT_PT0_SD_EUlS9_E_JS5_EE17ApplyWithArgsImplISL_EENSC_IXsr8TakeArgsISH_EE5valueEvE4TypeERKSH_S9_` as a symbol that loops infinitely, but it now demangles. It also demangled previously in llvm-cxxfilt. Our demanglings don't match, but I don't propose to solve that today. Finally, we come to the symbol that actually motivated this: `_ZSt3powIdiEN9__gnu_cxx11__promote_2IT_T0_NS0_9__promoteIS2_Xsr3std12__is_integerIS2_EE7__valueEE6__typeENS4_IS3_Xsr3std12__is_integerIS3_EE7__valueEE6__typeEE6__typeES2_S3_`. This is pretty similar to the first symbol, after fixing it up. Here, after the 'X' `expression` production, there is another template argument. The second 'E' (which is required to terminate the `unresolved-qualifier-level` list) gets misinterpreted by the existing code as terminating the containing template production without the next template argument. Parsing then goes off the rails from there. Thus, the correct standards-compliant behavior is incompatible with the outdated output of gcc.
…rn encoding. How we got here: In late 2017 (PR #102) I wrote f6bcf4c. The example symbol for that change was `_ZN4base8internal14CheckedSubImplIlEENSt9enable_ifIXsrSt14numeric_limitsIT_E10is_integerEbE4typeES4_S4_PS4_`. I now believe that symbol is invalid and that its production is either a GNU bug or a nonstandard extension. The latest cxxabi defines `expression` at http://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangle.expression. It does not have any special casing of `sr` outside of the `unresolved-name` production. These changes were made in 2010 (http://sourcery.mentor.com/archives/cxx-abi-dev/msg02279.html) but GCC and binutils don't seem to have caught up. If we remove our handling of GCC's outdated behavior, then the symbol `_Z6IsNullIiEN1EIXsr1FIT_EE1nEE4typeES1_` from our tests now demangles to `E<F<int>::n>::type IsNull<int>(int)` which is both more sensible and matches llvm-cxxfilt. The symbol `_ZN4base8internal14CheckedSubImplIlEENSt9enable_ifIXsrSt14numeric_limitsIT_E10is_integerEbE4typeES4_S4_PS4_` is no longer valid. The expression (starting with the 'X' after enable_if) needs to be read as `sr <unresolved-qualifier-level>+ E <base-unresolved-name>` Unlike the `type` production which allows (among others a well-known) substitution, `unresolved-qualifier-level` is `simple-id`, so `srSt` must become `sr3std`. The `unresolved-qualifier-level` list must be terminated with an 'E', which must be inserted before '10is_integer'. This also drops std::numeric_limits from the substitution table (it was previously at slot 3), so we need to adjust the substitution indexes at the end. That produces a corrected symbol: `_ZN4base8internal14CheckedSubImplIlEENSt9enable_ifIXsr3std14numeric_limitsIT_EE10is_integerEbE4typeES3_S3_PS3_`, which both llvm-cxxfilt and us agree is demangled to what we were previously demangling it to. roc added `_ZN7mozilla6detail12ListenerImplINS_14AbstractThreadEZNS_20MediaEventSourceImplILNS_14ListenerPolicyE0EJNS_13TimedMetadataEEE15ConnectInternalIS2_NS_12MediaDecoderEMS8_FvOS5_EEENS_8EnableIfIXsr8TakeArgsIT1_EE5valueENS_18MediaEventListenerEE4TypeEPT_PT0_SD_EUlS9_E_JS5_EE17ApplyWithArgsImplISL_EENSC_IXsr8TakeArgsISH_EE5valueEvE4TypeERKSH_S9_` as a symbol that loops infinitely, but it now demangles. It also demangled previously in llvm-cxxfilt. Our demanglings don't match, but I don't propose to solve that today. Finally, we come to the symbol that actually motivated this: `_ZSt3powIdiEN9__gnu_cxx11__promote_2IT_T0_NS0_9__promoteIS2_Xsr3std12__is_integerIS2_EE7__valueEE6__typeENS4_IS3_Xsr3std12__is_integerIS3_EE7__valueEE6__typeEE6__typeES2_S3_`. This is pretty similar to the first symbol, after fixing it up. Here, after the 'X' `expression` production, there is another template argument. The second 'E' (which is required to terminate the `unresolved-qualifier-level` list) gets misinterpreted by the existing code as terminating the containing template production without the next template argument. Parsing then goes off the rails from there. Thus, the correct standards-compliant behavior is incompatible with the outdated output of gcc.
These patches combined get the number of symbols in all of Firefox's debug info that fail to parse down from ~175k to a little over 2k. I think they're all self-explanatory or well documented enough in the commit message that I don't need to explain them in detail here.