parser: fix parsing of anonymous entities with clang 16+ #190

Merged
merged 1 commit into from
Oct 21, 2023
8 changes: 7 additions & 1 deletion src/hawkmoth/parser.py
@@ -507,9 +507,15 @@ def _type_definition_fixup(cursor):
     type_elem = []

     # Short cut for anonymous symbols.
-    if cursor.spelling == '':
+    if cursor.is_anonymous():
         return None

+    # libclang 16 and later have cursor.spelling == cursor.type.spelling for
+    # typedefs of anonymous entities, while libclang 15 and earlier have an
+    # empty string. Match the behaviour across libclang versions.
+    if cursor.spelling == '':
+        return cursor.type.spelling

Collaborator

So this is indeed tricky. The only way I can see around it is to look into the tokens, which is not great. At the same time I'm not happy with the proposed solution where the typedefs are erroneously documented as structures. It's unexpected and may lead to confusion...

With a quick test (v16), this should work as an alternative with no changes to the unit test results:

    # Short cut for anonymous symbols. (Not sure we need both conditions)
    if cursor.is_anonymous() or cursor.spelling == '':
        return None

    # libclang 16 and later have cursor.spelling == cursor.type.spelling for
    # typedefs of anonymous entities. Confirm that it's an anonymous
    # declaration by ensuring the 2nd token is a '{', e.g. 'struct {...'.
    tokens = cursor.get_tokens()
    next(tokens)
    if next(tokens).spelling == '{':
        return None

Not sure how brittle it is (it's quite a shallow approach), but it passes all tests so far.

Regardless of whether the token parsing needs to be more robust, we need to decide whether the output consistency is worth this sort of thing. I'm leaning yes: it solves a real issue and we have several precedents for looking into the tokens when the AST doesn't give us a digested answer by itself. What do you think?
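
For illustration, the token-based check proposed above can be exercised without libclang installed. This is a minimal sketch, not hawkmoth code: `FakeCursor` and `FakeToken` are hypothetical stand-ins that mimic only `spelling`, `is_anonymous()` and `get_tokens()` from `clang.cindex`, and `looks_anonymous` is an assumed helper name. Unlike the snippet above, it passes a default to `next()` so that a cursor with fewer than two tokens does not raise `StopIteration`.

```python
from dataclasses import dataclass, field


@dataclass
class FakeToken:
    """Hypothetical stand-in for clang.cindex.Token (spelling only)."""
    spelling: str


@dataclass
class FakeCursor:
    """Hypothetical stand-in for clang.cindex.Cursor (only what the check uses)."""
    spelling: str
    anonymous: bool = False
    token_spellings: list = field(default_factory=list)

    def is_anonymous(self):
        return self.anonymous

    def get_tokens(self):
        # Like the real API, hand back an iterator of token objects.
        return (FakeToken(s) for s in self.token_spellings)


def looks_anonymous(cursor):
    """Return True if the declaration should be treated as anonymous."""
    if cursor.is_anonymous() or cursor.spelling == '':
        return True

    # Skip the first token (e.g. 'struct') and inspect the second one.
    tokens = cursor.get_tokens()
    next(tokens, None)
    second = next(tokens, None)
    return second is not None and second.spelling == '{'


# 'struct named { int x; }' keeps its own name.
named = FakeCursor('named',
                   token_spellings=['struct', 'named', '{', 'int', 'x', ';', '}'])

# 'struct { int x; }' inside a typedef; the spelling mimics libclang 16+.
typedeffed = FakeCursor('typedef_struct',
                        token_spellings=['struct', '{', 'int', 'x', ';', '}'])

print(looks_anonymous(named))       # False
print(looks_anonymous(typedeffed))  # True
```

The check stays shallow on purpose: it only distinguishes `struct name {` from `struct {`, which is the brittleness the comment above worries about.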

Owner Author

> At the same time I'm not happy with the proposed solution where the typedefs are erroneously documented as structures. It's unexpected and may lead to confusion...

As a practical example, when converting Mesa to use Hawkmoth, I've had to give names to a bunch of typedeffed anonymous structs just because of documentation. It's not great, because it's a common pattern, and converting `typedef struct { ... } foo` to `typedef struct foo { ... } foo` feels silly. They are both typedefs, yet they both produce struct documentation, and you still can't reference the typedef in Sphinx in any way, unless you separate the named struct and the typedef and document them separately.

I think the expectation actually is that `typedef struct { ... } foo` documents a struct and lets you reference it as `foo`.

As a data point, Doxygen+Breathe also does this.

Collaborator

> As a practical example, when converting Mesa to use Hawkmoth, I've had to give names to a bunch of typedeffed anonymous structs just because of documentation.

Hmm, that was a conscious decision we took a while back and I still believe it was the right call. There is a difference:

    struct some_name {
            ...
    };

    /** This will be documented with `c:type`, not `c:struct`. */
    typedef struct some_name new_name;

In the proposed solution, we document something as a struct that isn't one. E.g., taking the example from the unit test, one cannot do `struct typedef_struct var;` but can certainly do `struct named var;`, yet the documentation would hint otherwise.

I agree it's a common pattern, but the real solution there would be to make the parser special-case these scenarios and use `c:type` instead. I think that's a bit trickier though, and I was happy to leave that on the wish list as long as we had predictable and simple behaviour. In fact we did try to do it the smart way before, and it was the ensuing discussion over edge cases that led to the decision to stick with the current behaviour. At the time we actually had opposite roles in the discussion if I remember correctly, but you thoroughly convinced me 😅

As for Doxygen, does it special-case it as type documentation, or does it document the typedef as a struct? Just curious... I wouldn't really condition any decision here on what Doxygen does.

Owner Author

Doxygen+Breathe documents the typedeffed structs as structs, not as types, regardless of whether it's anonymous or not. But hey, so do we! The question here is, what name to give that.

If we were to document such things as types, the user would be forced to separate the struct and typedef both in code and documentation, because you can't have member documentation within a type documentation.

Arguably defaulting to the typedef name is more flexible, because you can use that as the easy default, but you can also separate them and document the struct and type separately, but you're not forced to do that for the common case.

See also https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24507/diffs?commit_id=fa3684a2caa3a250b1c3fbfaedcc6cdfc7f1ec5e

That's not required for Doxygen+Breathe.

Owner Author

> At the time we actually had opposite roles in the discussion if I remember correctly, but you thoroughly convinced me

Heh. In my defense, you've fixed the type fixups and it's all much clearer now. :)

Collaborator

> If we were to document such things as types, the user would be forced to separate the struct and typedef both in code and documentation, because you can't have member documentation within a type documentation

Damn, that's a good point... Have you tried it though? Sphinx seems to eat this up just fine and renders it beautifully as the typedef it is. Did I miss anything?

    .. c:type:: @anonymous_hash some_struct

       Some typedefed struct.

       .. c:member:: int foo

          Foo.

    :c:type:`some_struct`

    :c:member:`some_struct.foo`

> Arguably defaulting to the typedef name is more flexible, because you can use that as the easy default, but you can also separate them and document the struct and type separately, but you're not forced to do that for the common case.

Very true, and I don't want to fight it too much. I said my piece, and this is certainly not about the code, which looks fine. I am not thoroughly convinced this time around though :P
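
As a rough illustration of the alternative raised in this comment, the sketch below renders a `c:type` directive with nested `c:member` directives in the same shape as the reStructuredText above. The function name `render_typedef_with_members` and its parameters are hypothetical, `@anonymous_hash` is the placeholder used in the comment rather than a real hash, and nothing here reflects how hawkmoth actually generates its output.

```python
def render_typedef_with_members(underlying, name, doc, members):
    """Render a c:type directive with nested c:member directives.

    `members` is a list of (declaration, description) pairs.
    All names here are illustrative assumptions, not hawkmoth API.
    """
    lines = [f".. c:type:: {underlying} {name}", "", f"   {doc}"]
    for decl, text in members:
        # Members are nested one level (3 spaces) inside the type directive,
        # and their descriptions one level further (6 spaces).
        lines += ["", f"   .. c:member:: {decl}", "", f"      {text}"]
    return "\n".join(lines) + "\n"


rst = render_typedef_with_members(
    "@anonymous_hash",
    "some_struct",
    "Some typedefed struct.",
    [("int foo", "Foo.")],
)
print(rst)
```

Whether Sphinx accepts nested members under `c:type` in all cases is exactly the open question in this thread; the sketch only shows the text that would be fed to it.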

Owner Author

We're going to have to let this one simmer for a bit anyway, as I'm taking some time off. :)

Collaborator

> Doxygen+Breathe documents the typedeffed structs as structs, not as types, regardless of whether it's anonymous or not. But hey, so do we! The question here is, what name to give that.

Forgot to reply to this, but for completeness: no, we don't! We will once we merge this, but currently we document the structure being typedefed as a structure, one that happens to be anonymous and so gets an anonymous name.

What you did in Mesa is frankly a poor workaround, even if I get where you're coming from. You're naming a structure because hawkmoth can't contextually document it as a typedef... It works better, sure, but you never actually document the typedef, which is what you wanted.

Collaborator

And in case it wasn't clear, I'm conceding my position. I just wanted to note my arguments against as best I could.

     type_elem.extend(_specifiers_fixup(cursor, cursor.type))

     colon_suffix = ''
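
The two checks added in this hunk can be sketched in isolation to show how they even out the libclang 15 and 16 behaviours. This is a simplified illustration under stated assumptions: `make_cursor` builds a hypothetical stand-in for `clang.cindex.Cursor`, `definition_name` is an invented name, and the final `return cursor.spelling` is an assumption added for the demo (the real function goes on to build `type_elem`). It also assumes, as the updated test expectations suggest, that `is_anonymous()` is false for an unnamed struct that has a typedef name.

```python
from types import SimpleNamespace


def make_cursor(spelling, type_spelling, anonymous=False):
    """Build a hypothetical stand-in for clang.cindex.Cursor."""
    return SimpleNamespace(
        spelling=spelling,
        type=SimpleNamespace(spelling=type_spelling),
        is_anonymous=lambda: anonymous,
    )


def definition_name(cursor):
    """Mirror the two early returns from the hunk above."""
    # Truly anonymous entities get no name at all.
    if cursor.is_anonymous():
        return None

    # libclang 15 and earlier: empty spelling for typedefs of anonymous
    # entities, so fall back to the type spelling (the typedef name).
    if cursor.spelling == '':
        return cursor.type.spelling

    # Assumption for this sketch only: otherwise use the cursor's own name.
    return cursor.spelling


# 'typedef struct { ... } typedef_struct;' as seen by each libclang version.
libclang15 = make_cursor('', 'typedef_struct')
libclang16 = make_cursor('typedef_struct', 'typedef_struct')

# A struct with neither a tag nor a typedef name.
unnamed = make_cursor('', '', anonymous=True)

print(definition_name(libclang15))  # typedef_struct
print(definition_name(libclang16))  # typedef_struct
print(definition_name(unnamed))     # None
```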

2 changes: 1 addition & 1 deletion test/c/typedef-enum.rst
@@ -9,7 +9,7 @@
named enumeration


-.. c:enum:: @anonymous_90372874a3c8c25dccf983612f39e93f
+.. c:enum:: unnamed_t

unnamed typedeffed enum

2 changes: 1 addition & 1 deletion test/c/typedef-struct.rst
@@ -9,7 +9,7 @@
named member


-.. c:struct:: @anonymous_7f9a1d628cd33f3227f3fcdc3a405aa6
+.. c:struct:: typedef_struct

unnamed typedeffed struct

2 changes: 1 addition & 1 deletion test/cpp/typedef-enum.rst
@@ -9,7 +9,7 @@
named enumeration


-.. cpp:enum:: @anonymous_90372874a3c8c25dccf983612f39e93f
+.. cpp:enum:: unnamed_t

unnamed typedeffed enum

2 changes: 1 addition & 1 deletion test/cpp/typedef-struct.rst
@@ -9,7 +9,7 @@
named member


-.. cpp:struct:: @anonymous_7f9a1d628cd33f3227f3fcdc3a405aa6
+.. cpp:struct:: typedef_struct

unnamed typedeffed struct
