Add compression mangling for versioning namespaces in std #69

Closed

Conversation

ldionne

@ldionne ldionne commented Nov 16, 2018

This is a strawman proposal to add substitutions for inline namespaces (fixing #42). I've never touched the Itanium ABI before so this is most likely wrong and/or too naive, but this is at least something concrete to get us started with.

A couple of notes on my approach:

  • We don't compress std::__1, because that's already in use.
  • I delimit with v on both sides to avoid clashing with things like St3. There may be a better way of doing this.

@zygoloid In #42, you say:

I don't have a concrete suggestion yet; whatever we pick, we'll presumably want the std::<inline namespace> part to itself be substitutable, which makes this a bit awkward to fit into the existing scheme.

Can you explain what you mean by that? I suspect this will throw my approach down the drain (but that's fine).

Fixes #42

The intent is to shorten mangled names for common types shipped by libc++.

@christinaa christinaa left a comment


That seems fine, @rjmccall mentioned something about even and odd numbering, though I'm not sure that's necessary since as far as I'm aware only libc++ uses versioned namespaces (do correct me if I'm wrong, I have little experience with GCC's libstdcxx).

This would obviously be ABI breaking which is fine since ABI version 2 is already ABI breaking (I think that part goes without saying, enabling unstable ABI which basically switches to v2, without specifying the ABI version will still use the v1 namespace completely breaking any application dynamically linking against it early on).

Enabling the use of said ABI (and if agreed upon, shorthand mangling) does require explicitly opting for a different ABI version via LIBCXX_ABI_VERSION=2 (for libc++).

@rjmccall
Collaborator

That wasn't me, that was JF. And no, we shouldn't be making an assumption that there are only two standard library implementations or requiring that the implementations coordinate to keep versioning unique.

@christinaa

Would you suggest explicitly requesting short mangling via some commandline option to -cc1 instead (which means the Clang driver also has the freedom to enable it) and it could be an opt-in via something like -fcxx-short-mangling? As opposed to implicitly assuming anything about it based on the ABI version of the symbols being fed through the mangler/demangler?

::std::__[0-9]+::allocator<char> >
<substitution> ::= Siv<inline-ns>v # ::std::__[0-9]+::basic_istream<char, std::__[0-9]+::char_traits<char> >
<substitution> ::= Sov<inline-ns>v # ::std::__[0-9]+::basic_ostream<char, std::__[0-9]+::char_traits<char> >
<substitution> ::= Sdv<inline-ns>v # ::std::__[0-9]+::basic_iostream<char, std::__[0-9]+::char_traits<char> >
Collaborator


Optional suffixes like turning Ss into Ssv0v don't work in the mangling grammar; types have to be self-limiting or else you get ambiguities. For example, you want _Z3fooISsv1vEv to demangle to foo<std::__1::string>(), but it already demangles to foo<std::string,void,v>(). You have to move this earlier in the production, like Sv0_s.
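To make the ambiguity concrete, here is a minimal sketch of a reader for a tiny subset of the template-argument grammar. It is a toy, not a real demangler: the function name `read_template_args` and the two small lookup tables are assumptions made for illustration, and only two standard substitutions, three builtin types, and length-prefixed source names are handled.

```python
# Toy reader for a tiny subset of Itanium <template-args>; not a real demangler.
BUILTINS = {"v": "void", "i": "int", "c": "char"}
STD_SUBSTITUTIONS = {"Ss": "std::string", "Sa": "std::allocator"}


def read_template_args(s):
    """Read types from s until the closing 'E' and return their names."""
    args, i = [], 0
    while s[i] != "E":
        if s[i:i + 2] in STD_SUBSTITUTIONS:
            # A two-character standard substitution is already a complete type.
            args.append(STD_SUBSTITUTIONS[s[i:i + 2]])
            i += 2
        elif s[i].isdigit():
            # <source-name> ::= <length> <identifier>
            j = i
            while s[j].isdigit():
                j += 1
            n = int(s[i:j])
            args.append(s[j:j + n])
            i = j + n
        else:
            # One-letter builtin type.
            args.append(BUILTINS[s[i]])
            i += 1
    return args


# The argument list of _Z3fooISsv1vEv, i.e. the text between 'I' and 'E'.
print(read_template_args("Ssv1vE"))
```

Because `Ss` is complete on its own, the reader never gets a chance to treat the following `v1v` as a suffix: it sees a `void` and then a one-character source name `v`, which is the existing reading described above.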

@rjmccall
Collaborator

rjmccall commented Nov 17, 2018

As a general matter, I'm comfortable with opening up the substitution namespace fairly liberally. What I think we can do here is this:

  • We can reserve 'Sv' <vendor-substitution> to the general space of library-vendor substitutions.
  • Library vendors who want to participate in this can propose a namespace schema that "belongs" to them, and in return we'll give them a unique substitution prefix. What they do after their prefix is basically their business.

The big question here is what counts as a "library". If we reserve this for just standard libraries, with the expectation that those libraries will just have a handful of substitutions each, then it's not out of the question for manglers and demanglers alike to continue to hard-code all those substitutions indefinitely. On the other hand:

  • I would expect a modern standard library to want many more substitutions than just the seven described in the current ABI. [footnote] Hard-coding them may not be reasonable forever.

  • The standard library is far from the only C++ library in very common use. It's at least arguable that libraries like Boost ought to get some special attention here. (But this is a very slippery slope: I wouldn't want the ABI to be allocating vendor namespaces to literally every major C++ project.)

For the time being, I think we should constrain this to major standard library implementations.

[footnote] I'm sure maintainers have a better idea of this than I do, but off the top of my head:

  • Third-party code makes heavy use of unique_ptr, shared_ptr, function, vector, and the various ranges and views. We may also want to explore ways of encoding that these templates are being used with their default non-initial arguments instead of having to explicitly represent those arguments every time.
  • libc++ explicitly instantiates wide strings and streams and would presumably love to have those symbols shortened even if they're not widely used externally.
  • Metaprogramming names are often huge and frequently appear in the names of instantiated function templates. Those function templates are usually inlined away, but it can make a big difference for unoptimized code.

@christinaa

christinaa commented Nov 17, 2018

For the time being, I think we should constrain this to major standard library implementations.

Yes this is fine from libc++ side since ABI breakage is acceptable for ABIv2 and up (ignoring the non-canonical (ie. Facebook and something like __fb) vendor specific "arbitrary" inline namespaces which do not use the numeric scheme for versioning, as that has been explicitly reserved within libc++).

It's at least arguable that libraries like Boost ought to get some special attention here. (But this is a very slippery slope: I wouldn't want the ABI to be allocating vendor namespaces to literally every major C++ project.)

I don't suggest any library aside from a C++ standard library (aka, ::std) should get any special attention since as you said giving Boost (::boost) special attention is a slippery slope, what about Abseil (::absl) or libbase (::base)? What about LLVMSupport? So yes I don't advocate doing that since it would open the floodgates to the point where we may need a separate committee for assigning reserved identifiers which seems silly.

This proposal is merely to tackle the loss of short mangling schemes caused by inline namespace-based versioning being introduced.

Metaprogramming names are often huge and frequently appear in the names of instantiated function templates. Those function templates are usually inlined away, but it can make a big difference for unoptimized code.

Don't forget debug data especially on embedded devices where because of the naming, with std::__2::string being a major culprit, shipping uncompressed DWARF debug information causes a massive increase in object file sizes (an argument would be to just not ship it at all, similar to dSYM but that's not the point I'm trying to make).

@rjmccall
Collaborator

Right, I was just thinking it through, not trying to say that you'd suggested any of that.

@christinaa

christinaa commented Nov 17, 2018

There may be some room to add more shorthand manglings if we're going for an ABIv2 change, though personally for me it's difficult to think of possible combinations, I think std::function<void()>, std::vector<std::string> are two extremely common ones that may warrant their own shorthand schemes (that was entirely off the top of my head).

I guess @ldionne would be best regarding advice on that because of his experience with idiomatic C++ and familiarity with how frequently certain standard library constructs are used.

@rjmccall
Collaborator

Remember that substitutions don't have to be fully applied: it might not be useful to have substitutions for any exact specializations of std::initializer_list, but you can still have a substitution for the template itself.

@christinaa

Also to clarify, would you not be okay with mangler/demangler drawing assumptions about ABI versions from symbols and using compressed mangling by default for ABIv2 and above? This is not a breaking change for anyone aside from Fuchsia but they are happy with breaking ABI and they don't even mind testing those new breaking changes out in practice. This would only apply to ABIv2 and up and only when libc++ is used implicitly. As far as other stdlibs go, this can be an opt-in flag, and for libc++ an opt-out in a similar fashion.

But I personally don't see a reason to not make this the default for libc++, I think Eric and Louis are both on board with this idea (though it would still be great to introduce new shorthand forms since I'd rather break the unstable ABI once, so it would be a good idea to have all of that ready prior to actually breaking it). Would be nice to hear any suggestion from C++ experts like Louis who are familiar with what would be of most benefit or whoever else we can CC on this.

Obviously not everything needs a short mangling but I think now, as C++ standards advance and bring more into the stdlibs, there's no better time for extending the short mangling forms. And ultimately it's a win-win situation for pretty much everyone, and it's one of the reasons I'm trying to push for those changes as they're of massive help for semi-embedded systems that may still have a full libc++ but be constrained on overall disk space for example.

I do apologize for it being libc++ centric, hence my suggestion to default to opt-out for it and for the rest of the libraries default to opt-in, unless I misunderstood your concern? And to make this 100% clear, std::__1 is specifically excluded from this as this is a major breaking change along with many others brought on by ABIv2.

@rjmccall
Collaborator

rjmccall commented Nov 17, 2018

I'm still not entirely sure what you're proposing.

The ABI is allowed to add abbreviations for declarations which it can assume haven't previously existed. That applies regardless of any notion of ABI versioning: if the committee adds a std::widget class, we can safely add a Sw abbreviation for it because we can assume that there has never previously been a std::widget entity that we'd be changing the mangling of (because ordinary programs aren't allowed to declare things in the std namespace).

That means it's fine for us to add new abbreviations for entities in std::__2, if that's what you're asking. They're not going to be as compact as, say, Ss, but they're going to be much more compact than NSt3__212basic_stringI...EE, and that should be good enough.

I would strongly object to the mangling rule being target-dependent. Fuchsia is of course welcome to say that it uses a specific ABI version of libc++, just like it's welcome to tailor all the other symbols in its ABI in a myriad of other ways. That decision should just mean that its ABI is expressed using different entities — which happen to take advantage of some new target-independent abbreviations — not that Fuchsia's compilers actually change the standard ABI rules.

For example, Fuchsia is welcome to make std::string resolve to std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char>>. We on the Itanium ABI can make that much more efficient by adding a standard abbreviation to mangle it as something like Svfs. Those abbreviations would not be target-dependent: programs on other targets that start using that same ABI version of libc++ (which may be difficult, but put that aside for now) would also benefit from them. But it would be inappropriate for Fuchsia to try to take the compression a step further and apply a target-dependent ABI rule to mangle that as Ss.
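As a rough illustration of the sizes being compared, the sketch below assembles the uncompressed mangling of the versioned string type by hand and compares its length with `Ss` and with the illustrative `Svfs` spelling from the comment above. The helper names are made up for this example, `Svfs` is not a registered substitution, and the script only concatenates strings; it does not implement the mangling rules in general.

```python
def source_name(identifier):
    # <source-name> ::= <length> <identifier>
    return f"{len(identifier)}{identifier}"


def mangle_versioned_string(inline_ns):
    """Mangle std::<inline_ns>::basic_string<char, char_traits<char>, allocator<char>>."""
    ns = "St" + source_name(inline_ns)  # std::<inline_ns>, later referenced as S_
    char_traits = "NS_" + source_name("char_traits") + "IcEE"
    allocator = "NS_" + source_name("allocator") + "IcEE"
    return "N" + ns + source_name("basic_string") + "Ic" + char_traits + allocator + "EE"


full = mangle_versioned_string("__2")
print(full)

# "Ss" is the existing abbreviation for the unversioned type; "Svfs" is only
# the example spelling floated in this thread, not an agreed encoding.
for label, encoding in [("unversioned Ss", "Ss"), ("illustrative Svfs", "Svfs"), ("uncompressed", full)]:
    print(f"{label}: {len(encoding)} characters")
```

The uncompressed form comes to 62 characters for every occurrence of the type in a symbol, which is why even a three- or four-character abbreviation recovers most of the benefit.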

@jwakely
Contributor

jwakely commented Nov 17, 2018

@christinaa

though I'm not sure that's necessary since as far as I'm aware only libc++ uses versioned namespaces (do correct me if I'm wrong, I have little experience with GCC's libstdcxx).

Libstdc++ has a non-default (and not widely used) configuration that uses versioned namespaces. Past versions used std::__7:: and current releases use std::__8::.

@christinaa

christinaa commented Nov 18, 2018

@rjmccall Well, I was proposing changing the rules for ABIv2 and up, which would imply compiler changes. As @jwakely said, this isn't going to clash with libstdcxx since they use a different subset of the versioning namespace. I was just wondering how you felt about making the shorthand mangling the default for ABIv2 (i.e. if libc++ is compiled for ABIv2, the mangler should use the new manglings by default unless there's an opt-out). I'm not sure if that should apply to Clang+libstdcxx since we have far less control over it, so I was suggesting it be an opt-in flag for any library that isn't upstream libc++ with ABIv2+; things are a lot more controlled with respect to libc++, so it would be a start. Once other vendors adopt that into their unstable versions of ABIs, the opt-in could be changed to an opt-out by default for everyone aside from ABIv1 users.

So this should not affect libstdcxx until they're ready to roll out support for new ABI at which point we can swap opt-in for opt-out once they're happy and we know which ABI version we should assume is going to support these proposed new mangling schemes (since it looks like it's going to be different for libstdcxx). Same goes for all other stdlib vendors, I wouldn't suggest enabling it by default without getting a blessing from the stdlib vendor. However as far as libc++ goes this is much easier as it's a part of the LLVM project as is, which means it's possible to roll out such changes with ABIv2 and have them be implicitly enabled if that ABI version is selected and libc++ is used.

Does that make sense? Again if you see issues with that I won't push for implicitly enabling this for anything yet, especially considering we still need to have a concrete proposal up. I agree with your changes with regarding to abbreviations that you mentioned in review, that's not the issue I was raising.

Anyway, before discussing all this, I think it's best to come up with a proposal first and see what could be added to the new scheme. A few extra characters because of the versioned namespaces aren't a problem, I wasn't implying it is (sorry if it seemed like I had an issue with that). I understand the necessity of it, and I definitely wasn't trying to suggest having this be in the root, unversioned namespace. With regards to Fuchsia all I was trying to say is that they don't mind testing breaking ABI changes, not that the project needs special ways of mangling that would deviate from the to-be ABIv2 short manglings, they're happy with just using ABIv2/Unstable ABI as is for now, regardless of what direction it goes.

Thank you.

@rjmccall
Collaborator

"ABIv2" is a collection of ideas, not a concrete variant ABI. Up to now, none of the "v2" ideas have introduced semantic inconsistencies with the "v1" mangling rules. There are two major reasons that I know of.

The first reason is that such changes would subtly break all the existing tooling which assumes that the mangling rules are the same across all platforms. Consider something like c++filt; when it sees Sa, should it demangle it to std::allocator or to std::__2::allocator? Most of those tools operate by calling __cxa_demangle, but that would be incorrect since it would presumably apply the rules for the host OS rather than the right rules for the target. At Apple, we considered changing the interpretation of the standard abbreviations when we had the ABI break moving to AArch64, but ultimately we decided against it for exactly this reason.

The second reason is that there are several other major flaws in the mangling grammar, so if you're truly interested in getting optimal manglings on a new target, you shouldn't stop at changing the standard substitutions. Decreasing the size of an individual symbol is much less important than decreasing the overall size of the symbol table, and the best way to do that is to take advantage of common substrings between symbols, and particularly common prefixes. A prefix-tree symbol table is a good idea just given common C idioms (consider pthread_*, pthread_cond_*, etc.), and it does work tolerably well for C++ manglings, but it would be significantly more effective with a handful of structural changes to the Itanium grammar (e.g. TI, TS, TV, and TT should all be suffixes on the type name instead of prefixes). That kind of change would require moving to a different top-level prefix from _Z.

So if you're really interested in using a "v2" mangling scheme that abandons consistency with the old scheme, I would recommend deeper changes than just tinkering with substitutions. If you're interested in something that maintains consistency with the old scheme, you should add new substitutions, not repurpose the existing ones.

If you're dead-set on getting 2-byte substitutions instead of 3/4-byte substitutions, and you don't care about supporting existing standard libraries on your target, you should just have your standard library declare its entities directly in namespace std. libc++ did not do this originally because we cared about not interfering with libstdc++. You can always introduce an inline namespace later for versioning if you want to make a library ABI break. But honestly, I think you'll get far better symbol compression overall if you accept that you'll have 3/4-byte substitutions but then actually add substitutions in your namespace for all the common templates like std::unique_ptr that aren't included in the existing substitution list.
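A rough way to see the prefix-sharing argument from this comment is to count the nodes of a character trie built from a few symbols. In the sketch below, the "current" symbols use the real `_ZTI`/`_ZTS`/`_ZTV` prefixes, while the "suffix" spelling (`_Z<type>TI` and so on) is invented purely for illustration and is not valid Itanium mangling; the type names `Foo` and `Widget` are likewise made up.

```python
def trie_node_count(symbols):
    """Count the nodes of a character trie holding all symbols (root excluded)."""
    root = {}
    count = 0
    for sym in symbols:
        node = root
        for ch in sym:
            if ch not in node:
                node[ch] = {}
                count += 1
            node = node[ch]
    return count


types = ["3Foo", "6Widget"]  # <source-name> encodings of two made-up classes

# Current scheme: typeinfo (TI), typeinfo name (TS), vtable (TV) as prefixes.
current = [f"_ZT{kind}{t}" for t in types for kind in "ISV"]

# Hypothetical scheme: the same markers moved after the type name.
suffixed = [f"_Z{t}T{kind}" for t in types for kind in "ISV"]

print("prefix markers:", trie_node_count(current), "nodes")
print("suffix markers:", trie_node_count(suffixed), "nodes")
```

With prefixes, each type name is stored three times, once under each marker; with suffixes, the three symbols for a type share the whole name and differ only in the last character. The gap grows with the length of the type names, which for template-heavy code can be hundreds of characters.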

@christinaa

christinaa commented Nov 18, 2018

No no I wasn't implying that I was dead set on that aspect, in fact I said if it's just a few characters I don't see the problem. Sorry I think I'm hard to understand sometimes. I absolutely do care about compatibility and am happy with having slightly longer "short" mangling because of the namespaces, I just want to eliminate major sources of cruft in debug info like what std::string currently expands to.

Again big sorry if I gave the wrong impression, I absolutely support the idea of having slightly longer manglings to not violate compatibility, regardless of whether it's 3-4 or 5 or 6 extra characters it's still a huge win as far as I'm concerned since at the moment std::string mangling (which is my main motivation for pushing this forward) tends to make templated code generate massive debugging information unless it's compressed.

I still think we're at a disagreement regarding the ABI break as far as libc++ goes, I don't personally see the issue with the ABI break, it's an unstable ABI for a reason, ABI breaks are to be expected as part of ABI evolution and I don't think there is a good reason for freezing ABIv2 yet either. If you really are against the scenario where libc++ with ABIv2 will cause the mangler to behave differently I can gate it behind a driver flag.

I don't want to spend too much time discussing this aspect yet so I'm happy to make it strictly opt-in for now since I think we're going a bit off track here and I think we should focus on the actual scheme and possible inclusions as well. This will be a break in unstable ABI, though I'd like to keep it to a single break, hence me wanting to move on to the actual proposal, and I don't really want to waste anyone's time. And the more I talk the more confusion there seems to be :(

Thank you and big apologies for any misunderstandings, I'm generally not great with RFCs so please excuse any ambiguities that may have caused confusion. Also I think as far as demanglers go, it should be possible to apply that logic in reverse providing the mangling scheme is unambiguous enough and extrapolate the ABI version from symbol names.

@rjmccall
Collaborator

Okay, I think we're on the same page here. When you're talking about "ABIv2", you're talking about proposals to revamp the ABI of libc++. That's generally up to libc++ as a project and is ultimately off-topic here.

If you're okay with slightly longer abbreviations, then I think the path forward here is quite straightforward:

  • You should propose something like the "vendor substitution" mechanism I described above, and as part of that you should ask for a vendor substitution prefix for libc++. I don't think this will be controversial.

  • You should start a conversation within the libc++ project about what substitutions you actually want. I strongly recommend that you not just default to asking for the 7 substitutions from 15+ years ago; this is data that ought to be easy to collect just by looking at a ton of different classes.

  • When you have a full list of the substitutions you want, you can come back to the Itanium ABI and tell us what substitutions you'd like to register.

@christinaa

I think libc++ is on board with this. As I said, I discussed it with both Louis (who opened the issue) and Eric, as they are mostly in charge of it, and this was preceded by a fairly long thread on libcxx-dev and IRC conversations where we concluded that this would require getting an extension to the IA64 ABI (essentially a more refined form of what Louis proposed). I don't think it's off-topic because ideally this would be standardized at ABI level, as an optional breaking change (as far as generic IA64 ABI goes, without mentioning specific stdlib implementations). We can then take advantage of that, instead of coming up with our own standards (especially since things like 3rd party demanglers exist, often within crash dumpers for example).

And while Clang has reserved manglings for certain very niche things like SEH on IA64 filters, as one example, something more trivial and common like standards for mangling std::string in an stdlib with versioned namespaces would probably be best off documented here, since that's mostly the point of having these standards in the first place. Because stdlib and compiler vendors aren't the only consumers of those standards.

Since you have a better understanding of mangling schemes, what prefix would be safe (to avoid conflicts) and yet most compact to use for ABI version N (in its mangled form, i.e. __Z...) for versioned std::__N (N being 2 for libc++ as it stands)?

@rjmccall
Collaborator

Okay. Let me try to be very concrete, because apparently we are not communicating well.

What I would like to do is add a prefix, let's say Sc, and "allocate" that to the libc++ project. The libc++ project can then propose a list of substitutions, all of which start with Sc, and we will register those in the Itanium ABI. They will technically be library- and target-independent substitutions, but since they'll be substitutions within a namespace that only libc++ will be using, they will in fact be libc++-specific. You are free to add to this list in the future, e.g. if the C++ committee adds a new entity to the standard library. Your proposed substitutions can be longer than three bytes, and in fact most of them probably should be, because the Itanium ABI will not be granting you another two-byte prefix if you run out of room in Sc.

Separately, the Itanium ABI will remember that you are using the namespaces std::__1 and std::__2, and that libstdc++ is using the namespaces std::__7 and std::__8. Please let us know if you start using another versioning namespace so that we can discourage other libraries from using it. We will expect that all of your Sc substitutions will be within a namespace that you've claimed in this way.

I'll talk to the rest of the Itanium ABI project to summarize that idea and hopefully get consensus on it.
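The bookkeeping described in this comment could be sketched roughly as follows. Everything beyond the names taken from the comment is an assumption: `Sc` is only the example prefix floated above, the code `Scu` and the function names are invented for illustration, and no such registry exists in this form.

```python
# Namespaces each library is said to use in this thread.
CLAIMED_NAMESPACES = {
    "libc++": {"std::__1", "std::__2"},
    "libstdc++": {"std::__7", "std::__8"},
}

# Two-character vendor prefixes; "Sc" is the example from the comment above.
VENDOR_PREFIXES = {"Sc": "libc++"}


def owner_of(namespace):
    """Return the vendor that has claimed the namespace, or None."""
    for vendor, spaces in CLAIMED_NAMESPACES.items():
        if namespace in spaces:
            return vendor
    return None


def can_register(code, entity):
    """Accept a substitution only if its prefix belongs to the vendor
    that claimed the namespace the entity lives in."""
    vendor = VENDOR_PREFIXES.get(code[:2])
    namespace = "::".join(entity.split("::")[:2])
    return vendor is not None and owner_of(namespace) == vendor


# "Scu" is a hypothetical code, used here only to exercise the check.
print(can_register("Scu", "std::__2::unique_ptr"))  # libc++ prefix, libc++ namespace
print(can_register("Scu", "std::__8::unique_ptr"))  # libc++ prefix, libstdc++ namespace
```

The check mirrors the two rules stated above: a vendor's substitutions all start with its prefix, and each one names an entity inside a namespace that vendor has claimed.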

@rjmccall
Collaborator

Actually, I now have a somewhat more systematized idea for what the suffix following Sc should look like, but we need to talk about it to get consensus on what to do.

@orivej
Contributor

orivej commented Nov 19, 2018

One option that has not been considered yet is to counteract the std inline namespace with an std inline namespace context. For example, if _Z1fSs is f(std::basic_string<char, std::char_traits<char>, std::allocator<char>>) then something like _ZSN3_ns1fSs could mean f(std::_ns::basic_string<char, std::_ns::char_traits<char>, std::_ns::allocator<char>>), where SN3_ns right after _Z makes all standard std:: substitutions correspond to std::_ns::.

This seems ideal when different std:: inline namespaces do not appear in the same symbol. (Of course, the latter has to be defined, but it does not have to be efficient.)
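A minimal sketch of how such a context prefix might be read, assuming the `SN<length><namespace>` spelling from the example above. This is a toy that handles only a function name followed by one standard substitution; the function names are invented, and the sketch does not check whether the `SN` spelling is unambiguous against the existing grammar.

```python
# Standard substitutions, with a slot for an optional inline namespace.
STD_SUBSTITUTIONS = {
    "Ss": "std::{ns}basic_string<char, std::{ns}char_traits<char>, std::{ns}allocator<char>>",
    "Sa": "std::{ns}allocator",
}


def read_source_name(s):
    """Split '<length><identifier>rest' into (identifier, rest)."""
    j = 0
    while s[j].isdigit():
        j += 1
    n = int(s[:j])
    return s[j:j + n], s[j + n:]


def demangle_toy(symbol):
    """Demangle _Z[SN<source-name>]<source-name><substitution>; toy subset only."""
    if not symbol.startswith("_Z"):
        raise ValueError("not a mangled name")
    rest = symbol[2:]
    ns = ""
    if rest.startswith("SN"):
        # Hypothetical context prefix: names the inline namespace that all
        # standard substitutions in this symbol should be read inside.
        name, rest = read_source_name(rest[2:])
        ns = name + "::"
    func, rest = read_source_name(rest)
    return f"{func}({STD_SUBSTITUTIONS[rest].format(ns=ns)})"


print(demangle_toy("_Z1fSs"))
print(demangle_toy("_ZSN3_ns1fSs"))
```

The single `ns` variable shows why this works best when only one inline namespace appears in a symbol: every standard substitution is reinterpreted the same way.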

@rjmccall
Collaborator

rjmccall commented Nov 19, 2018

This PR was prompted by a thread on libcxx-dev where I did actually bring up something like that as a "mode prefix". There are two basic problems with it:

  • you have to unambiguously decide whether to use it for any particular symbol, and
  • it fundamentally limits you to the existing catalog of standard substitutions.

@christinaa

Okay. Let me try to be very concrete, because apparently we are not communicating well.

What I would like to do is add a prefix, let's say Sc, and "allocate" that to the libc++ project. The libc++ project can then propose a list of substitutions, all of which start with Sc, and we will register those in the Itanium ABI. They will technically be library- and target-independent substitutions, but since they'll be substitutions within a namespace that only libc++ will be using, they will in fact be libc++-specific. You are free to add to this list in the future, e.g. if the C++ committee adds a new entity to the standard library. Your proposed substitutions can be longer than three bytes, and in fact most of them probably should be, because the Itanium ABI will not be granting you another two-byte prefix if you run out of room in Sc.

Separately, the Itanium ABI will remember that you are using the namespaces std::__1 and std::__2, and that libstdc++ is using the namespaces std::__7 and std::__8. Please let us know if you start using another versioning namespace so that we can discourage other libraries from using it. We will expect that all of your Sc substitutions will be within a namespace that you've claimed in this way.

Alright, understood, that sounds reasonable if the rest of the IA64 ABI committee can reach a consensus on it and regarding the part that follows the suffix (which you said is currently being discussed).

Thank you for the clear explanation and apologies for dragging this out due to various misunderstandings.

@rjmccall rjmccall deleted the branch itanium-cxx-abi:master September 15, 2021 03:33
@rjmccall rjmccall closed this Sep 15, 2021
@joshua-arch1

How is this work going now? Can we have some option to disable inline namespace in libcxx?

Development

Successfully merging this pull request may close these issues.

mangling substitutions for std inline namespaces
6 participants