Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mangled Union Initializer in a Template parameter fails if the 'field' is blank. #139

Open
erichkeane opened this issue Mar 29, 2022 · 15 comments

Comments

@erichkeane
Copy link

I'm reasonably convinced this is an issue in the mangling scheme with a union initializer. I see that GCC has the same problem as Clang does in; llvm/llvm-project#54588

The issue, in summary, is that trying to mangle the initializer expression to:
dummy<wrapper<int> { 42 }>();
which initializes the 'unnamed struct' field of the union.

The appropriate mangling section (https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangle.braced-expression) has us using this production:
::= di <field [source-name])> <[braced-expression]> # .name = expr

However, in this case, we cannot mangle a source-name, as one does not exist!

For reference, ote that if I change the struct to have a field-name (FOO), we get:
_Z5dummyIXtl7wrapperIiEtlNS1_Ut_Edi3FOOtlNS2_Ut_ELi42EEEEEEiv
which demangles to :
int dummy<wrapper{wrapper::'unnamed'{.FOO = wrapper::'unnamed'::'unnamed'{42}}}>()

So, what are we expected to do here? Should we omit the name entirely and just use the <expression> version here? Or is this a case where we should still do the di and use a different production for the field name? I note that we have an 'unnamed-type-name' production, but nothing like that for a 'source-name'.

@rjmccall
Copy link
Collaborator

Is there a reason not to just look through anonymous structs and unions and treat the inner member as if it were a direct member of the containing class?

@erichkeane
Copy link
Author

I responded on the Clang bug (llvm/llvm-project#54588) with what I believe is your suggestion. I think that is alright, though I presume we would want some level of ABI-mangling-guarantee that this is how you guys want to handle this.

@jicama
Copy link
Contributor

jicama commented Mar 30, 2022

Is there a reason not to just look through anonymous structs and unions and treat the inner member as if it were a direct member of the containing class?

This makes sense to me.

@erichkeane
Copy link
Author

The only concern I have to that approach (assuming I understand it properly?) is that we might end up with duplicate field names that way. So something like:

union {
  struct {
    T val;
    struct {
      T val;
      T val2;
    };
};

then: wrapper<int>{42,44,45}
Ends up getting mangled with val=int{42},val=int{44},val=int{45}. Presuming we mandate the order be declaration order (and perhaps depth-first descent?) in this way though, I don't suspect it would be a problem.

I have no idea how to write a production to say that however.

@rjmccall
Copy link
Collaborator

rjmccall commented Mar 30, 2022

Is there a situation where that's legal? Members of anonymous unions are required to be tested for redeclaration in the enclosing scope ([class.union.anon]p1), and I would hope that compilers all use the same rule for anonymous structs.

@erichkeane
Copy link
Author

Ah! Seemingly not. I ran an experiment at work that made me think this was going to be a problem (which had surprised me at the time), but I must have made a mistake when putting it together around the crashing compilers.

SO I guess I have no complaints about the proposal. How do we formalize it in the ABI?

@rjmccall
Copy link
Collaborator

rjmccall commented Mar 30, 2022

Oh wait, you know, I think I was answering a different question.

The ABI does seem to be missing wording about how to mangle e.g. member expressions that resolve to an anonymous union member. One could argue that it's covered by the rule to traverse the syntactic expression tree and not expand e.g. implicit conversions, and indeed that's what compilers all seem to do, but still, it's better to be precise. To do that, we'd just add a sentence or two to the section about anonymous entities.

But you're asking about how to mangle an initializer which has resolved against the anonymous struct itself. I think there's some confusion here, because the ABI does not say that you're supposed to mangle the name of the initialized member when mangling a braced-init-list. The general rule is that the mangling grammar follows the syntactic tree, and that's the intent for braced-expression. di is for when the source includes a member designator (an extension prior to C++20, and still an extension if nested). So wrapper { 42 } should be mangled as tl7wrapperLi42EE, with no attempt to mangle the path down to the member being initialized. And of course that means we don't have to worry about designators that name anonymous aggregates. (A designator that names an anonymous aggregate member would be mangled as the simple name of that member, just as written in source.)

To be clear, that is true in general, not just for initializers that happen to resolve against anonymous aggregates.

@erichkeane
Copy link
Author

I see, ok then. It seems that the implementations are both doing it wrong by assuming that the union-case should always include the member initialize implicitly in this case. Though, I guess I don't see why a more simple case of otherwise equivalent instantiations that are otherwise equivalent (besides the syntax used to initialize it) should have different symbols.

@rjmccall
Copy link
Collaborator

Well, in general, the mangling distinguishes a large number of semantically-equivalent expressions. So e.g. we mangle (*p).member) differently from p->membereven when we know thatp` has pointer type. There's only so much we can do.

But you're right, I am once again doing this wrong. The mangling for non-type template arguments does need to be independent of spelling and is controlled by #63, which has unfortunately not yet been written up as a PR for inclusion in the ABI document. That rule says that we have to mangle the active member of a union, and that's what's blowing up here, where the active member of a union is an anonymous aggregate.

The section on Anonymous Entities says that:

For the purposes of mangling, the name of an anonymous union is considered to be the name of the first named data member found by a pre-order, depth-first, declaration-order walk of the data members of the anonymous union.

So I think that's the correct rule to use here. In the original example from the Clang bug, the name for mangling purposes of the active member of the anonymous union would be val, since that is the first named data member found by a walk of the active member (the anonymous struct).

Since technically this is already covered by the ABI, there's no need for a change. However, I'm going to go ahead and write up a PR to integrate Issue #63 into the document, and I'll cover this explicitly there.

@erichkeane
Copy link
Author

erichkeane commented Mar 30, 2022

Ok, thank you for the clarification, I think I have enough to fix this in Clang then. I'll make sure to add you to the review when I get around to implementing it.

EDIT: Actually, what about this case:

template <typename T>
class wrapper {
public:
	union {
		struct {
			T val;
                        T val2;
		};
	};
};

template <auto tparam> int dummy() { return 0; }

int main() {
	dummy<wrapper<int> {}>(); // succeeds due to "isZeroInitialized()"
	dummy<wrapper<int> { 42,43 }>();
	dummy<wrapper<int> { 42,44 }>();
	return 0;
}

Those two should be mangled differently, right? But the "For the purposes of mangling, the name of an anonymous union is considered to be the name of the first named data member found by a pre-order, depth-first, declaration-order walk of the data members of the anonymous union." rule you quoted above would ahve them both be val=42, right?

@rjmccall
Copy link
Collaborator

The active member of the union is the same in both of those cases: it's the member that contains the anonymous struct. The name of that member for mangling purposes is val.

I believe the way this is supposed to be mangled according to #63 is something like val = <anonymous struct for val> {42, x}, where x is the second element. I'm not sure what the mangling for an anonymous type is meant to be; if I understand correctly, the rule in Anonymous Entities is for the mangling for the declared object, not of the anonymous type. In any case, arguably the direct-initialization is unnecessary and it ought to just be val = {42, x}.

@erichkeane
Copy link
Author

Ah! Thanks and sorry for my confusion. I missed that the DFS was ONLY for the name, not the entire initialization.

Thanks!

@erichkeane
Copy link
Author

erichkeane commented Mar 30, 2022

Sorry, 1 more question! In this case, do we want to enumerate the bases? So given:

struct base {
  int FIRST;
};

....
union {
  struct : base {
    int SECOND;
  };
};

Do we name it "FIRST" or "SECOND"? Do we consider VIRTUAL bases, or just 'normal' bases?

One thing to note: I don't think the Clang AST keeps the order of virtual and normal bases, so any sort of 'declaration order' on bases is likely a more sizeable change for us in this regard.

@rjmccall
Copy link
Collaborator

I don't think anonymous structs are supposed to be able to have base classes; that seems like an oversight on Clang's behalf. GCC does not allow that. Clang also does not implement it "correctly"; the members of the base are not injected into the enclosing scope.

I don't think that question can come up otherwise; the traversal should never descend into a named class because it would have to be a named member.

@erichkeane
Copy link
Author

I'm not sure about the standard, I haven't really looked into it very closely in this regard, however I note that GCC is the only of the 'big 4' that enforce this : https://godbolt.org/z/EG3nEnMrn

That said, I think you've answered the question that we don't need to look into bases.

erichkeane pushed a commit to llvm/llvm-project that referenced this issue Apr 1, 2022
As reported in #54588
and discussed in itanium-cxx-abi/cxx-abi#139

We are supposed to do a DFS, pre-order, decl-order search for a name for
the union in this case. Prevoiusly we crashed because the IdentiferInfo
pointer was nullptr, so this makes sure we have a name in the cases
described by the ABI.

I added an llvm-unreachable to cover an unexpected case at the end of
the new function with information/reference to the ABI in case we come
up with some way to get back to here.

Differential Revision: https://reviews.llvm.org/D122820
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this issue Oct 7, 2022
As reported in llvm/llvm-project#54588
and discussed in itanium-cxx-abi/cxx-abi#139

We are supposed to do a DFS, pre-order, decl-order search for a name for
the union in this case. Prevoiusly we crashed because the IdentiferInfo
pointer was nullptr, so this makes sure we have a name in the cases
described by the ABI.

I added an llvm-unreachable to cover an unexpected case at the end of
the new function with information/reference to the ABI in case we come
up with some way to get back to here.

Differential Revision: https://reviews.llvm.org/D122820
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants