-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[llvm][Tablegen][llvm-tblgen] Added keyword #undef to llvm-tblgen #69135
base: main
Are you sure you want to change the base?
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter. |
19fc7b3
to
4400cd2
Compare
llvm/lib/TableGen/TGLexer.cpp
Outdated
@@ -664,7 +660,7 @@ tgtok::TokKind TGLexer::prepIsDirective() const { | |||
// It looks like TableGen does not support '\r' as the actual | |||
// carriage return, e.g. getNextChar() treats a single '\r' | |||
// as '\n'. So we do the same here. | |||
NextChar == '\r') | |||
NextChar == '\r' || NextChar == '\0') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems unrelated to a new keyword?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right. It is to solve small bugs in llvm-tblgen.
Reason for doing this
When I was implementing undef, I found that if the td file ended with #endif, a compilation error would occur.
Take the following program as an example. For TGlexer, if it finds that this macro instruction is ifdef/ifndef and the macro is not defined, it will execute TGLexer::prepSkipRegion. This function will skip some characters until it encounters # , execute TGLexer::prepIsDirective to determine the type of macro, but if #endif happens to be the end of the file, the match will fail, so NextChar == '\0' is added for this.
#ifdef MACRO
#endif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that make any case fail? I guess if a reproducible case is given, this deserves a dedicated PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My fault, I have never been able to express myself clearly. If you open some td test cases (including correct test cases of #ifdef and #endif) under this path (llvm-project/llvm/test/TableGen), the next line after these td will be spaces.
Bug occurrence scenarios
td file ending with #endif (no characters after #endif, including newlines and comments), When you write a td file in vim, after wq saves it, the td file automatically uses the newline character as the last character, so you need to manually delete the last newline character. Afterwards, compilation errors will occur when using llvm-tblgen to compile.
Let me give you an example: If you open an IDE (vscode or the like) and create a new file test.td, enter the following content(Please make sure there are no characters after #endif):
#ifdef asdasdsasd
#endif
I can confirm that a compilation error will appear in this case:
tdtest.td:2:7: error: Reached EOF without matching #endif
#endif
^
tdtest.td:1:8: error: The latest preprocessor control is here
#ifdef asdasdsasd
^
tdtest.td:2:7: error: Unexpected token at top level
#endif
However, if Tablegen does not allow #endif as the end of the file, this is not a bug, so I checked the official documentation and found the following rules:
LineEnd ::= newline | return | EOF
PreEndif ::= LineBegin (WhiteSpaceOrCComment)*
"#endif" (WhiteSpaceOrAnyComment)* LineEnd
I'm not sure what return means, but I think it's a weird setting not to end with #endif.
Adding nextChar == '\0' can solve this compile, but there may be a more correct way to solve this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the question I mentioned at the end of #endif, I will open a PR to explain this issue clearly.
I submitted the relevant test case in this PR( #69411 ). I think it is a reasonable and correct td (grammatical level), but there is no way for buildkite to pass the test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add test cases for the new behavior. You can find some examples under llvm/test/TableGen
.
llvm/lib/TableGen/TGLexer.cpp
Outdated
} else if (Kind == tgtok::Undef) { | ||
StringRef MacroName = prepLexMacroName(); | ||
if (MacroName.empty()) | ||
return ReturnError(TokStart, "Expected macor name after #undef"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return ReturnError(TokStart, "Expected macor name after #undef"); | |
return ReturnError(TokStart, "Expected macro name after #undef"); |
llvm/lib/TableGen/TGLexer.cpp
Outdated
return ReturnError(TokStart, "Expected macor name after #undef"); | ||
|
||
if (!DefinedMacros.erase(MacroName)) | ||
return ReturnError(TokStart, "undefine(#undef) an undefined macro"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the C preprocessor, this case isn't an error. That may come as a surprise to folks trying to use this.
4400cd2
to
9f5c83b
Compare
I don't see any tests that #undef actually undefs, just checking error cases? |
In addition, it would also be good to have a test that re- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some concerns right now because this would technically allow to include the same TableGen file multiple times, which is something we've been recently discussing of taking advantage of forbidding, and making the #include
basically #include_once
.
I would rather make sure we close to the door to multiple include of the same file before accidentally opening it!
That's already true if you don't have an include guard?
I would have concerns about that; including a file multiple times can be useful, and diverging from C's preprocessor sounds like a bad idea and will lead to confusion. But regardless, I do not see why that is relevant here. #undef is only relevant in the sense that #ifdef can be used to write an include guard, but not always. |
(if you want that semantics, call it include[_-]once or import, not include, which normally means the does-exactly-what-you say "paste contents of file here") |
Technically true of course, and it would work with an empty file. But practically speaking without
TableGen is not C, and lucky for us... It's pretty rare that people actually want the C preprocessor in a language! Not a model of design in general...
I don't quite get what you mean about relevance considering what I wrote above.
(I didn't understand this sentence) |
…ug for llvm-tblgen
9f5c83b
to
d5e185c
Compare
I am so sorry, I don't quite understand what you mean. |
I actually suspect that what you're trying to enable would be exactly something I don't quite see as desirable right now. |
My usage scenario for #undef is this. I am doing some automated compilation work, so I need to inject some code into td. If there is no undef, I need to generate a lot of source code files and continuously include these files.
If I have undef, I can write the codes related to fileA, B, and C into one file, and then use different define to reference different code segments.
Without #undef, the difference for me is that I need to create more source code files, for this example it is fileABC/ fileA, fileB, fileC Thank for your time |
I kind of follow what you're saying, but it's not clear to me why this is a good structure or the problem it really solves (that is I see the "made up" structure you described, but haven't seen a strong need for this in the real world so far). |
For a beginner like me, i may think that '#define' and '#undef' are a symmetrical design. That is one of the my starting points, another starting point is that using '#undef' will bring covenience to my project(even if without 'undef', my project is still be achievable, It's just that it may not be "beautiful" in design. ) I think maybe i should close this PR and thank all reviewers again for their review. |
I agree with you and would prefer you keep it open rather than close it because of one objection; the way forward is to have discussion and reach consensus, not abandon anything that has an initially-dissenting view. I disagree with Mehdi's view of how include should work, and that we must forbid people from doing potentially-useful things. |
ok. Got you |
Motivation introduction
I want a td file to contain part of other td files instead of the entire file, so when using macros to implement it, I found that llvm-tblgen provides #define, #ifdef, #ifndef, #endif but does not provide #undef, so this PR The main purpose is to provide #undef support in llvm-tblgen, and when I was debugging, I found that if a td file ends with #endif, a td syntax error will occur, so I added a line to correct this bug. If there are any deficiencies, please point out
bug in llvm-tblgen
If you remove a lot of spaces in the last line of the test file for testing the llvm-tblgen macro, you will find that a compilation error will occur.
ex: llvm/test/TableGen/prep-ifndef.td