Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parser: fix parsing of anonymous entities with clang 16+ #190

Merged
merged 1 commit into from Oct 21, 2023

Conversation

jnikula
Copy link
Owner

@jnikula jnikula commented Oct 8, 2023

Clang 16 started returning names for anonymous entities [1], of the form:

(unnamed at /path/to/file::)

Additionally check for is_anonymous().

[1] llvm/llvm-project@19e984e


I'm not sure if this actually fixes anything, because I didn't try to install clang 16 yet. But at least this doesn't break anything on my clang 14.

Maybe a fix to #189.

@jnikula jnikula marked this pull request as draft October 8, 2023 20:12
@jnikula
Copy link
Owner Author

jnikula commented Oct 8, 2023

This is not enough.

@jnikula
Copy link
Owner Author

jnikula commented Oct 9, 2023

This is not enough.

Not enough, but it does make progress. It fixes all actually anonymous cases, where clang now returns the "helpful" (unnamed struct at /src/test/cpp/../c/struct.c:49:1) kind of names in cursor.spelling.

What remains are typedefs of anonymous structs/unions/enums, where the typedef name is now returned as the struct/union/enum name, and is_anonymous() returns false.

This might actually be a useful thing... except would have to make it work on pre-16 clang too.

Starting from libclang 16, typedefs of anonymous entities such as

	typedef struct {
		/* ... */
	} foo;

will no longer have an empty name for cursor.spelling. Instead, the
returned name will be cursor.type.spelling, matching the name of the
typedef, i.e. "foo" in the above case.

cursor.is_anonymous() will also be false for such entities, regardless
of libclang version.

Rather than try to coerce libclang 16 to the @anonymous_<sha1> naming
convention we have, manually use the same naming for libclang 15 and
earlier. It's arguably a more useful name anyway.

We first have to distinguish regular anonymous entities using
cursor.is_anonymous(). For libclang 15 and earlier, to distinguish from
typedef of anonyous entities. For libclang 16 and later, the name will
be of the form "unnamed ... at /path/to/file".
@jnikula jnikula marked this pull request as ready for review October 9, 2023 20:26
@jnikula
Copy link
Owner Author

jnikula commented Oct 9, 2023

Updated with a proper fix across clang versions.

@BrunoMSantos
Copy link
Collaborator

BrunoMSantos commented Oct 12, 2023

So it turns out this was a big part of why my tests were broken too, and it does improve things substantially (I'm on v16). That said, I still have plenty of other test issues. I still haven't gone through it all, but doesn't seem related to anonymous stuff, so I don't think they matter here, but, for the record, most errors seem to come from bool showing up as int.

Anyway, regarding these changes, the code looks fine. I'm not sure the new behaviour is better than the anonymous_<hash> though. Those were unique before and were obviously marked as anonymous type names, we can't say the same thing about the new ones. There's potential for someone referencing one of these unaware that it's not actually a named type they're referencing, maybe even clashing with a type elsewhere that actually bears the same name and that whole mess. Honestly, I'd rather coerce things to the original format across all versions.

edit: I'll investigate the other errors I'm getting later, I just wanted you to be aware. I think no one else is complaining so just ignore it for now.

edit2: Ok, the bool thing seems to have been uncovered already, it's #193 for sure. I still have other problems though with C++ access specifiers at least... Anyway, need to look at it later with more time.

@BrunoMSantos
Copy link
Collaborator

Honestly, I'd rather coerce things to the original format across all versions.

Or replicate the new behaviour across all versions... That would be even better I guess, though a bit more difficult perhaps.

@jnikula
Copy link
Owner Author

jnikula commented Oct 12, 2023

Or replicate the new behaviour across all versions... That would be even better I guess, though a bit more difficult perhaps.

That's what the commit is supposed to do, though!

is_anonymous() covers all "regular" anonymous entities. Before clang 16, they had empty string for cursor.spelling, and since clang 16, there's this "unnamed at" debug style thing.

After that, what remains are the typedeffed anonymous entities. Before clang 16, they also had empty string for cursor.spelling, but since clang 16, it's cursor.type.spelling - which is what I'm now using for before clang 16 too.

@jnikula
Copy link
Owner Author

jnikula commented Oct 12, 2023

Also, it seems actually harder to unify on the old behaviour. With old clang, you can tell it's a typedeffed anonymous entity from the empty cursor.spelling... but I don't know how to detect that case with clang 16!

@BrunoMSantos
Copy link
Collaborator

Sorry, I meant printing for all versions (e.g.):

c:enum:: enum (unnamed at /builddir/build/BUILD/hawkmoth-0.15.0/test/c/enum.c:18:1)

instead of

c:enum:: @anonymous_ef849cb791c3c921354e4b05dcefedfa

@BrunoMSantos
Copy link
Collaborator

Also, it seems actually harder to unify on the old behaviour. With old clang, you can tell it's a typedeffed anonymous entity from the empty cursor.spelling... but I don't know how to detect that case with clang 16!

I'm a bit rusty on this topic relative to you, but there's the 'unnamed at ...' somewhere, right? type.spelling I'm guessing? It shows up on the output without this patch at least.

@BrunoMSantos
Copy link
Collaborator

Ah wait, I think I get it. Never mind me, I'm doing too many things at once... I'll have a closer look later.

# empty string. Match the behaviour across libclang versions.
if cursor.spelling == '':
return cursor.type.spelling

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is indeed tricky. The only way I can see around it is to look into the tokens, which is not great. At the same time I'm not happy with the proposed solution where the typedefs are erroneously documented as structures. It's unexpected and may lead to confusion...

With a quick test (v16), this should work as an alternative with no changes to the unit test results:

    # Short cut for anonymous symbols. (Not sure we need both conditions)
    if cursor.is_anonymous() or cursor.spelling == '':
        return None

    # libclang 16 and later have cursor.spelling == cursor.type.spelling for
    # typedefs of anonymous entities. Confirm that it's an anonymous
    # declaration by ensuring the 2nd token is a '{', e.g. 'struct {...'.
    tokens = cursor.get_tokens()
    next(tokens)
    if next(tokens).spelling == '{':
        return None

Not sure how brittle it is (it's quite a shallow approach), but it passes all tests so far.

Regardless of whether the token parsing needs to be more robust, we need to decide whether the output consistency is worth this sort of thing. I'm leaning yes: it solves a real issue and we have several precedents for looking into the tokens when the AST doesn't give us a digested answer by itself. What do you think?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the same time I'm not happy with the proposed solution where the typedefs are erroneously documented as structures. It's unexpected and may lead to confusion...

As a practical example, when converting Mesa to use Hawkmoth, I've had to give names to a bunch of typedeffed anonymous structs just because of documentation. It's not great, because it's a common pattern, and converting typedef struct { ... } foo to typedef struct foo { ... } foo feels silly. They are both typedefs, yet they both produce struct documentation, and you still can't reference the typedef in Sphinx in any way. Unless you separate the named struct and the typedef and document them separately.

I think the expectation actually is typedef struct { ... } foo documents a struct and lets you reference it as foo.

As a data point, Doxygen+Breathe also does this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a practical example, when converting Mesa to use Hawkmoth, I've had to give names to a bunch of typedeffed anonymous structs just because of documentation.

Hmm, that was a conscious decision we took a while back and I still believe it was the right call. There is a difference:

struct some_name {
        ...
};

/** This will be documented with `c:type`, not `c:struct`. */
typedef struct some_name new_name;

In the proposed solution, we document something as a struct which isn't. E.g., taking the example from the unit test, one cannot do struct typedef_struct var; but can certainly do struct named var;, yet the documentation would hint otherwise.

I agree it's a common pattern, but the real solution there would be to make the parser special case these scenarios and use c:type instead. I think that's a bit trickier though and I was happy to leave that in the wish list as long as we had expectable and simple behaviour. In fact we did try to do it the smart way before and it was the ensuing discussion over edge cases that led to the decision of sticking to the current behaviour. At the time we actually had opposite roles in the discussion if I remember correctly, but you thoroughly convinced me 😅

As for Doxygen, does it special case it as a type documentation or does it document the typedef as a struct? Just curiosity... I wouldn't really condition any decision here on what Doxygen does.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doxygen+Breathe documents the typedeffed structs as structs, not as types, regardless of whether it's anonymous or not. But hey, so do we! The question here is, what name to give that.

If we were to document such things as types, the user would be forced to separate the struct and typedef both in code and documentation, because you can't have member documentation within a type documentation.

Arguably defaulting to the typedef name is more flexible, because you can use that as the easy default, but you can also separate them and document the struct and type separately, but you're not forced to do that for the common case.

See also https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24507/diffs?commit_id=fa3684a2caa3a250b1c3fbfaedcc6cdfc7f1ec5e

That's not required for Doxygen+Breathe.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time we actually had opposite roles in the discussion if I remember correctly, but you thoroughly convinced me

Heh. In my defense, you've fixed the type fixups and it's all much clearer now. :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were to document such things as types, the user would be forced to separate the struct and typedef both in code and documentation, because you can't have member documentation within a type documentation

Damn, that's a good point... Have you tried it though? Sphinx seems to eat this up just fine and renders it beautifully as the typedef it is. Did I miss anything?

.. c:type:: @anonymous_hash some_struct

   Some typedefed struct.

   .. c:member:: int foo

      Foo.

:c:type:`some_struct`

:c:member:`some_struct.foo`

Arguably defaulting to the typedef name is more flexible, because you can use that as the easy default, but you can also separate them and document the struct and type separately, but you're not forced to do that for the common case.

Very true, and I don't want to fight it too much. I said my piece, and this is certainly not about the code, which looks fine. I am not thoroughly convinced this time around though :P

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're going to have to let this one simmer for a bit anyway, as I'm taking some time off. :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doxygen+Breathe documents the typedeffed structs as structs, not as types, regardless of whether it's anonymous or not. But hey, so do we! The question here is, what name to give that.

Forgot to reply to this, but for completeness, no we don't! We will do that once we merge this, but currently we document the structure being typedefed as a structure. A structure that happens to be anonymous and so gets an anonymous name.

What you did in Mesa is frankly a poor workaround, even if I get where you're coming from. You're naming a structure because hawkmoth can't contextually document it as a typedef... It works better sure, but you never actually document the typedef which is what you wanted.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in case it wasn't clear, I'm conceding my position. I just wanted to note my arguments against as best I could.

@jnikula jnikula merged commit a1544cf into master Oct 21, 2023
5 checks passed
@jnikula jnikula deleted the anonymous-fix branch October 21, 2023 18:38
@jnikula
Copy link
Owner Author

jnikula commented Oct 21, 2023

So I decided to merge this in the end. This is what clang gives us as the name from v16 on, so let's roll with that. If folks want to document typedefs and structs separately, they still can by splitting their definitions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants