Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[llvm][Tablegen][llvm-tblgen] Added keyword #undef to llvm-tblgen #69135

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

whousemyname
Copy link

Motivation introduction

I want a td file to contain part of other td files instead of the entire file, so when using macros to implement it, I found that llvm-tblgen provides #define, #ifdef, #ifndef, #endif but does not provide #undef, so this PR The main purpose is to provide #undef support in llvm-tblgen, and when I was debugging, I found that if a td file ends with #endif, a td syntax error will occur, so I added a line to correct this bug. If there are any deficiencies, please point out

bug in llvm-tblgen

If you remove a lot of spaces in the last line of the test file for testing the llvm-tblgen macro, you will find that a compilation error will occur.
ex: llvm/test/TableGen/prep-ifndef.td

// RUN: llvm-tblgen %s -DMACRO | FileCheck %s
// RUN: llvm-tblgen %s | FileCheck %s --check-prefix=CHECK-NOMACRO

#ifndef MACRO
// CHECK-NOMACRO: def nomacro
def nomacro;
#else
// CHECK: def macro
def macro;
#endif

@github-actions
Copy link

github-actions bot commented Oct 16, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@whousemyname whousemyname changed the title [llvm-tblgen] Added keyword #undef to llvm-tblgen and fixed a small b… [llvm][Tablegen][llvm-tblgen] Added keyword #undef to llvm-tblgen and fixed a small b… Oct 16, 2023
@@ -664,7 +660,7 @@ tgtok::TokKind TGLexer::prepIsDirective() const {
// It looks like TableGen does not support '\r' as the actual
// carriage return, e.g. getNextChar() treats a single '\r'
// as '\n'. So we do the same here.
NextChar == '\r')
NextChar == '\r' || NextChar == '\0')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems unrelated to a new keyword?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right. It is to solve small bugs in llvm-tblgen.

Reason for doing this

When I was implementing undef, I found that if the td file ended with #endif, a compilation error would occur.

Take the following program as an example. For TGlexer, if it finds that this macro instruction is ifdef/ifndef and the macro is not defined, it will execute TGLexer::prepSkipRegion. This function will skip some characters until it encounters # , execute TGLexer::prepIsDirective to determine the type of macro, but if #endif happens to be the end of the file, the match will fail, so NextChar == '\0' is added for this.

#ifdef MACRO
#endif

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that make any case fail? I guess if a reproducible case is given, this deserves a dedicated PR

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My fault, I have never been able to express myself clearly. If you open some td test cases (including correct test cases of #ifdef and #endif) under this path (llvm-project/llvm/test/TableGen), the next line after these td will be spaces.

Bug occurrence scenarios

td file ending with #endif (no characters after #endif, including newlines and comments), When you write a td file in vim, after wq saves it, the td file automatically uses the newline character as the last character, so you need to manually delete the last newline character. Afterwards, compilation errors will occur when using llvm-tblgen to compile.
Let me give you an example: If you open an IDE (vscode or the like) and create a new file test.td, enter the following content(Please make sure there are no characters after #endif):

#ifdef asdasdsasd
#endif

I can confirm that a compilation error will appear in this case:

tdtest.td:2:7: error: Reached EOF without matching #endif
#endif
      ^
tdtest.td:1:8: error: The latest preprocessor control is here
#ifdef asdasdsasd
       ^
tdtest.td:2:7: error: Unexpected token at top level
#endif

However, if Tablegen does not allow #endif as the end of the file, this is not a bug, so I checked the official documentation and found the following rules:

LineEnd                ::=  newline | return | EOF
PreEndif               ::=  LineBegin (WhiteSpaceOrCComment)*
                            "#endif" (WhiteSpaceOrAnyComment)* LineEnd

I'm not sure what return means, but I think it's a weird setting not to end with #endif.
Adding nextChar == '\0' can solve this compile, but there may be a more correct way to solve this problem.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the question I mentioned at the end of #endif, I will open a PR to explain this issue clearly.
I submitted the relevant test case in this PR( #69411 ). I think it is a reasonable and correct td (grammatical level), but there is no way for buildkite to pass the test case.

Copy link
Contributor

@jroelofs jroelofs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add test cases for the new behavior. You can find some examples under llvm/test/TableGen.

} else if (Kind == tgtok::Undef) {
StringRef MacroName = prepLexMacroName();
if (MacroName.empty())
return ReturnError(TokStart, "Expected macor name after #undef");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return ReturnError(TokStart, "Expected macor name after #undef");
return ReturnError(TokStart, "Expected macro name after #undef");

return ReturnError(TokStart, "Expected macor name after #undef");

if (!DefinedMacros.erase(MacroName))
return ReturnError(TokStart, "undefine(#undef) an undefined macro");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the C preprocessor, this case isn't an error. That may come as a surprise to folks trying to use this.

@whousemyname whousemyname changed the title [llvm][Tablegen][llvm-tblgen] Added keyword #undef to llvm-tblgen and fixed a small b… [llvm][Tablegen][llvm-tblgen] Added keyword #undef to llvm-tblgen Oct 18, 2023
@jrtc27
Copy link
Collaborator

jrtc27 commented Oct 18, 2023

I don't see any tests that #undef actually undefs, just checking error cases?

@jroelofs
Copy link
Contributor

In addition, it would also be good to have a test that re-#defines after #undef-ing.

Copy link
Collaborator

@joker-eph joker-eph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some concerns right now because this would technically allow to include the same TableGen file multiple times, which is something we've been recently discussing of taking advantage of forbidding, and making the #include basically #include_once.

I would rather make sure we close to the door to multiple include of the same file before accidentally opening it!

@jrtc27
Copy link
Collaborator

jrtc27 commented Oct 19, 2023

I have some concerns right now because this would technically allow to include the same TableGen file multiple times

That's already true if you don't have an include guard?

which is something we've been recently discussing of taking advantage of forbidding, and making the #include basically #include_once.

I would have concerns about that; including a file multiple times can be useful, and diverging from C's preprocessor sounds like a bad idea and will lead to confusion.

But regardless, I do not see why that is relevant here. #undef is only relevant in the sense that #ifdef can be used to write an include guard, but not always.

@jrtc27
Copy link
Collaborator

jrtc27 commented Oct 19, 2023

I have some concerns right now because this would technically allow to include the same TableGen file multiple times

That's already true if you don't have an include guard?

which is something we've been recently discussing of taking advantage of forbidding, and making the #include basically #include_once.

I would have concerns about that; including a file multiple times can be useful, and diverging from C's preprocessor sounds like a bad idea and will lead to confusion.

But regardless, I do not see why that is relevant here. #undef is only relevant in the sense that #ifdef can be used to write an include guard, but not always.

(if you want that semantics, call it include[_-]once or import, not include, which normally means the does-exactly-what-you say "paste contents of file here")

@joker-eph
Copy link
Collaborator

That's already true if you don't have an include guard?

Technically true of course, and it would work with an empty file. But practically speaking without undef it's not clear to me how you can actually include multiple times a file to do anything useful?

I would have concerns about that; including a file multiple times can be useful, and diverging from C's preprocessor sounds like a bad idea and will lead to confusion.

TableGen is not C, and lucky for us... It's pretty rare that people actually want the C preprocessor in a language! Not a model of design in general...

But regardless, I do not see why that is relevant here.

I don't quite get what you mean about relevance considering what I wrote above.

#undef is only relevant in the sense that #ifdef can be used to write an include guard, but not always.

(I didn't understand this sentence)

@whousemyname
Copy link
Author

I have some concerns right now because this would technically allow to include the same TableGen file multiple times, which is something we've been recently discussing of taking advantage of forbidding, and making the #include basically #include_once.

I would rather make sure we close to the door to multiple include of the same file before accidentally opening it!

I am so sorry, I don't quite understand what you mean.
So is it still necessary to add the ‘#undef’ keyword?

@joker-eph
Copy link
Collaborator

I am so sorry, I don't quite understand what you mean. So is it still necessary to add the ‘#undef’ keyword?

I actually suspect that what you're trying to enable would be exactly something I don't quite see as desirable right now.
I can be wrong though, but the need for doing this hasn't arised in all the TableGen use I've seen, so I'm not able to weigh our conflicting goals against each other right now.
I read your Discourse thread, but don't find this very motivated right now: do you have real world use-cases that could be improved by this?

@whousemyname
Copy link
Author

I am so sorry, I don't quite understand what you mean. So is it still necessary to add the ‘#undef’ keyword?

I actually suspect that what you're trying to enable would be exactly something I don't quite see as desirable right now. I can be wrong though, but the need for doing this hasn't arised in all the TableGen use I've seen, so I'm not able to weigh our conflicting goals against each other right now. I read your Discourse thread, but don't find this very motivated right now: do you have real world use-cases that could be improved by this?

My usage scenario for #undef is this. I am doing some automated compilation work, so I need to inject some code into td. If there is no undef, I need to generate a lot of source code files and continuously include these files.
TDSourceCode:

include fileA
include fileB
include fileC

If I have undef, I can write the codes related to fileA, B, and C into one file, and then use different define to reference different code segments.
TDSourceCode:

#define  INCLUDE_fileA
include fileABC

#define  INCLUDE_fileB
include fileABC

#define  INCLUDE_fileC
include fileABC

Without #undef, the difference for me is that I need to create more source code files, for this example it is fileABC/ fileA, fileB, fileC

Thank for your time

@joker-eph
Copy link
Collaborator

I kind of follow what you're saying, but it's not clear to me why this is a good structure or the problem it really solves (that is I see the "made up" structure you described, but haven't seen a strong need for this in the real world so far).
Having three different files for the conceptually three different thing to include should be fine here.

@whousemyname
Copy link
Author

whousemyname commented Oct 20, 2023

I kind of follow what you're saying, but it's not clear to me why this is a good structure or the problem it really solves (that is I see the "made up" structure you described, but haven't seen a strong need for this in the real world so far). Having three different files for the conceptually three different thing to include should be fine here.

For a beginner like me, i may think that '#define' and '#undef' are a symmetrical design. That is one of the my starting points, another starting point is that using '#undef' will bring covenience to my project(even if without 'undef', my project is still be achievable, It's just that it may not be "beautiful" in design. )

I think maybe i should close this PR and thank all reviewers again for their review.

@jrtc27
Copy link
Collaborator

jrtc27 commented Oct 20, 2023

I agree with you and would prefer you keep it open rather than close it because of one objection; the way forward is to have discussion and reach consensus, not abandon anything that has an initially-dissenting view. I disagree with Mehdi's view of how include should work, and that we must forbid people from doing potentially-useful things.

@whousemyname
Copy link
Author

I agree with you and would prefer you keep it open rather than close it because of one objection; the way forward is to have discussion and reach consensus, not abandon anything that has an initially-dissenting view. I disagree with Mehdi's view of how include should work, and that we must forbid people from doing potentially-useful things.

ok. Got you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants