Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No custom C++11 attributes are parsed #57748

Open
JhnW opened this issue Sep 14, 2022 · 6 comments
Open

No custom C++11 attributes are parsed #57748

JhnW opened this issue Sep 14, 2022 · 6 comments
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:tooling LibTooling enhancement Improving things as opposed to bug fixing, e.g. new or missing feature

Comments

@JhnW
Copy link

JhnW commented Sep 14, 2022

Hi.
The C ++ 17 standard specifies that attributes not defined in the compiler implementation will be ignored without error. I am using the C ++ 17 clang parser module to extract information (being strict, python bins but it doesn't matter now).
Clang ignores unfamiliar attributes at the stage of parsing the code.
This is quite a problem for my project, libclang is a great parser, and information about unfamiliar attributes (even as UNEXPOSED_ATTR) along with parsing their arguments would make it easier to use it as part of an advanced processor (something I do in my own project).
Is it possible to expose this information in clang lib and ignore it at another stage?

@EugeneZelenko EugeneZelenko added clang:tooling LibTooling and removed new issue labels Sep 14, 2022
@AaronBallman AaronBallman added enhancement Improving things as opposed to bug fixing, e.g. new or missing feature clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Sep 15, 2022
@AaronBallman
Copy link
Collaborator

The C ++ 17 standard specifies that attributes not defined in the compiler implementation will be ignored without error.

The C++ standard doesn't specify anything about diagnostic strength. What it does say in [dcl.attr]p6 (http://eel.is/c++draft/dcl.attr#grammar-6): For an attribute-token (including an attribute-scoped-token) not specified in this document, the behavior is implementation-defined. Any attribute-token that is not recognized by the implementation is ignored.

"Ignored" doesn't mean "not diagnosed" (our implementation-defined behavior is to diagnose unknown attributes by default because of the potential for hard-to-debug problems stemming from vendor-specified attributes that we don't support).

Clang ignores unfamiliar attributes at the stage of parsing the code.

Correct.

This is quite a problem for my project, libclang is a great parser, and information about unfamiliar attributes (even as UNEXPOSED_ATTR) along with parsing their arguments would make it easier to use it as part of an advanced processor (something I do in my own project).
Is it possible to expose this information in clang lib and ignore it at another stage?

Yes, but it's a qualified yes. What we'd like to do is introduce a new UnknownAttr semantic attribute class. When the parser finds an attribute it doesn't recognize, it would create one of these so that the information is retained in the AST. This is helpful for all sorts of use cases, such as pretty printing the AST back to source or doing AST matchers from clang-tidy. However, the challenge is with attribute arguments. Because those can be arbitrary token soup (with balanced delimiters), we can't make AST nodes for any of the arguments, or really do anything more than store the tokens that comprise the arguments. We had this same issue when dealing with plugin attributes and I don't know that we've found a better approach there.

That said, I'm not aware of anyone actively working on this.

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 15, 2022

@llvm/issue-subscribers-clang-frontend

@JhnW
Copy link
Author

JhnW commented Sep 15, 2022

The C++ standard doesn't specify anything about diagnostic strength. What it does say in [dcl.attr]p6 (http://eel.is/c++draft/dcl.attr#grammar-6): For an attribute-token (including an attribute-scoped-token) not specified in this document, the behavior is implementation-defined. Any attribute-token that is not recognized by the implementation is ignored.

"Ignored" doesn't mean "not diagnosed" (our implementation-defined behavior is to diagnose unknown attributes by default because of the potential for hard-to-debug problems stemming from vendor-specified attributes that we don't support).

I didn't mean no diagnosis but no error. From what I can see, most interpretations of the C ++ standard (and implementation) agree with this concept. Before C ++ 17, there was no ignore condition, making the whole thing dependent on the implementation (allowing you to throw a hard bug in any case, for example).
But I don't think that should matter to a clang parser. While the compiler is "implementation-defined", a parser is a much more universal tool used for code that gets compiled with other tools.

Thank you very much for a comprehensive answer. From my point of view, the list of tokens would be sufficient (almost even the mere presence of the element in children along with the location in the code would be amazing).

That said, I'm not aware of anyone actively working on this.

Pity. It seems to me that this is not a good issue to get into a clang (PR) project on the other hand as I think about parsing that regex (not for the first time) I get seizures. While external parsing things in a "preamble" of type is okay for arguments etc. is monstrous.

@AaronBallman
Copy link
Collaborator

I didn't mean no diagnosis but no error.

We don't diagnose as an error (unless you opt-in to doing that with -Werror, so I might be misunderstanding you: https://godbolt.org/z/brr7xfhhe

But I don't think that should matter to a clang parser. While the compiler is "implementation-defined", a parser is a much more universal tool used for code that gets compiled with other tools.

As far as the standard is concerned, the implementation is... everything. It's the compiler, the linker, and the standard library (everything needed to meet the requirements of the abstract machine). That said, I think it's quite reasonable for this to not be handled at the parsing level for [[]] style attributes only, because we know how to parse unknown attributes without causing problems. The same is not true of the other attribute styles, but we might (hopefully) still be able to support them the same way as well. They're a bit trickier though because some attributes (like GNU-style attributes) can change parsing behavior in terms of whether something is parsed as a type or not.

Thank you very much for a comprehensive answer. From my point of view, the list of tokens would be sufficient (almost even the mere presence of the element in children along with the location in the code would be amazing).

That's good to know, thanks! If we're storing actual Token objects, then we'd also have all the source location information, but I'm not even certain we'll be able to do that much. For example, consider a hypothetical vendor attribute of [[vendor::foo(identifiers are easy, but what about § which is a punctuator but not a token and " which is not terminated and also isn't a token? Oops, there is also an unterminated single quote in there too and these commas do not separate arguments.)]]

Pity. It seems to me that this is not a good issue to get into a clang (PR) project on the other hand as I think about parsing that regex (not for the first time) I get seizures. While external parsing things in a "preamble" of type is okay for arguments etc. is monstrous.

I think I agree that this isn't a good first issue for getting into Clang. But I also think it wouldn't be the hardest thing to support either. For example, an initial pass at this could ignore the problem with the attribute arguments by instead exposing the left paren and right paren location for the argument list. At least with that amount of information, an enterprising person can go back through the source manager to get the source text for the arguments (so you could still pretty print from the AST back to source, for example), but we've made a good step towards at least retaining the unknown attribute in the AST.

@JhnW
Copy link
Author

JhnW commented Sep 15, 2022

We don't diagnose as an error (unless you opt-in to doing that with -Werror, so I might be misunderstanding you:

I know. Maybe I didn't express myself clearly / I fell into digressions too much;)

They're a bit trickier though because some attributes (like GNU-style attributes) can change parsing behavior in terms of whether something is parsed as a type or not.

Good point. I haven't thought of any attributes that might change the parsing context.

My biggest disappointment was the complete absence of unknown attributes in the element's children's list. With this, perhaps without even a token list, just knowing the loops in the code, the use of attribute information is now within reach.
Seems to me a bare minimum option?

@AaronBallman
Copy link
Collaborator

My biggest disappointment was the complete absence of unknown attributes in the element's children's list. With this, perhaps without even a token list, just knowing the loops in the code, the use of attribute information is now within reach. Seems to me a bare minimum option?

That's my thinking as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:tooling LibTooling enhancement Improving things as opposed to bug fixing, e.g. new or missing feature
Projects
None yet
Development

No branches or pull requests

4 participants