The ABI says that string literals in instantiation-dependent expressions are mangled thusly:
<expr-primary> ::= L <string type> E # string literal
... presumably because, in C++98, the type of the literal was the only property that could affect the validity of the instantiation-dependent expression. That is no longer the case; a C++11 program can inspect the contents of such a string literal in an instantiation-dependent expression, so we need to mangle said contents.
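To make the problem concrete, here is a minimal sketch of a C++11 program where the contents of a string literal, and not just its type, decide whether substitution into a function template signature succeeds. The names `first_char` and `pick` are hypothetical and chosen only for illustration; both literals have the same type, so the existing mangling cannot tell the two signatures apart.

```cpp
#include <type_traits>

// Hypothetical helper: reads the first character of a string literal at
// compile time (a valid single-return C++11 constexpr function).
constexpr char first_char(const char *s) { return s[0]; }

// Two distinct function templates whose signatures differ only in the
// contents of a string literal; both literals have type const char[4].
template <typename T>
constexpr typename std::enable_if<first_char("abc") == T::value, int>::type
pick(T) { return 1; }

template <typename T>
constexpr typename std::enable_if<first_char("xbc") == T::value, int>::type
pick(T) { return 2; }

// Which overload survives substitution depends on what the literal contains.
constexpr int picked_for_a = pick(std::integral_constant<char, 'a'>{});
constexpr int picked_for_x = pick(std::integral_constant<char, 'x'>{});
static_assert(picked_for_a == 1, "\"abc\" starts with 'a'");
static_assert(picked_for_x == 2, "\"xbc\" starts with 'x'");
```

The sketch only evaluates `pick` in constant expressions, so it does not depend on how any particular compiler currently mangles these signatures.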
Proposal:
... where the first N (say, 16) characters of the string are encoded directly, followed by a 4M-bit hash of the entire string (algorithm TBD, but following target endianness) if its length is greater than N (where for all purposes other than determining the type, the terminating nul character is ignored).
The idea here is to preserve the string literal contents (at least the start of it) so that demanglers can display it, while avoiding mangling the entire contents of very long strings.
As an example, if we take N = 16, M = 8, and use MD5 as our hashing algorithm (taking the high-order 32 bits of its output), "Hello, world!" would mangle as LA14_cHello_2c_20world_21E, and U"this is a very long string indeed" would mangle as LA34_Dithis_20is_20a_20very_20l1cf8df38.
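The character encoding used in those two examples can be sketched as follows. This is a guess inferred from the examples alone, not a rule taken from the ABI: letters pass through unchanged and every other character becomes `_` followed by two lowercase hex digits. The helpers `encode_char`, `encode_prefix`, `needs_hash` and `check_issue_examples` are hypothetical names, passing digits through unescaped is an assumption (neither example contains one), and only single-byte characters are handled. The hash suffix is omitted because the algorithm is still TBD.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the ABI text): encodes one single-byte
// character. Letters and digits pass through; anything else becomes '_'
// followed by two lowercase hex digits. Passing digits through is a guess.
inline std::string encode_char(unsigned char c) {
  const bool alnum = (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
                     (c >= 'a' && c <= 'z');
  if (alnum) return std::string(1, static_cast<char>(c));
  static const char hex[] = "0123456789abcdef";
  std::string out = "_";
  out += hex[c >> 4];
  out += hex[c & 0xF];
  return out;
}

// Encodes at most the first n characters of the literal's contents
// (terminating nul excluded). No hash suffix is produced here.
inline std::string encode_prefix(const std::string &contents, std::size_t n) {
  std::string out;
  for (std::size_t i = 0; i < contents.size() && i < n; ++i)
    out += encode_char(static_cast<unsigned char>(contents[i]));
  return out;
}

// Under the proposal, a hash is appended only when the contents are longer
// than n characters.
inline bool needs_hash(const std::string &contents, std::size_t n) {
  return contents.size() > n;
}

// Usage: reproduces the contents portion of the two manglings quoted above
// (the part between the type, A14_c or A34_Di, and the hash or terminator).
inline void check_issue_examples() {
  assert(encode_prefix("Hello, world!", 16) == "Hello_2c_20world_21");
  assert(encode_prefix("this is a very long string indeed", 16) ==
         "this_20is_20a_20very_20l");
  assert(!needs_hash("Hello, world!", 16));
  assert(needs_hash("this is a very long string indeed", 16));
}
```

With N = 16, the 13-character "Hello, world!" fits entirely in the prefix, while the 33-character second string is cut after "this is a very l" and would carry a hash.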
If we like this direction, there are a few open questions:
Should we encode the remainder of the string if that would be shorter than the hash?
What hash algorithm should we use (and what values of N and M)? How much do we care about collision-resistance, given that almost any choice will shield us from accidental collisions? It seems plausible that someone will use a pair of strings with known-colliding MD5 sums as template arguments in (e.g.) test code for an MD5 algorithm, and at least one common way of generating such a pair produces two strings with the same prefix. How much should we care about that? (It'd be easy to "fix" such cases by applying some simple invertible transform to the string data first, such that the colliding pairs that people are likely to want to use in practice are different from the colliding pairs for our hash.)