[llvm][Tablegen][llvm-tblgen] Added keyword #undef to llvm-tblgen #69135

whousemyname · 2023-10-16T02:20:22Z

Motivation introduction

I want a td file to contain part of other td files instead of the entire file, so when using macros to implement it, I found that llvm-tblgen provides #define, #ifdef, #ifndef, #endif but does not provide #undef, so this PR The main purpose is to provide #undef support in llvm-tblgen, and when I was debugging, I found that if a td file ends with #endif, a td syntax error will occur, so I added a line to correct this bug. If there are any deficiencies, please point out

bug in llvm-tblgen

If you remove a lot of spaces in the last line of the test file for testing the llvm-tblgen macro, you will find that a compilation error will occur.
ex: llvm/test/TableGen/prep-ifndef.td

// RUN: llvm-tblgen %s -DMACRO | FileCheck %s
// RUN: llvm-tblgen %s | FileCheck %s --check-prefix=CHECK-NOMACRO

#ifndef MACRO
// CHECK-NOMACRO: def nomacro
def nomacro;
#else
// CHECK: def macro
def macro;
#endif

github-actions · 2023-10-16T02:34:31Z

✅ With the latest revision this PR passed the C/C++ code formatter.

ecnelises · 2023-10-16T08:49:02Z

llvm/lib/TableGen/TGLexer.cpp

@@ -664,7 +660,7 @@ tgtok::TokKind TGLexer::prepIsDirective() const {
          // It looks like TableGen does not support '\r' as the actual
          // carriage return, e.g. getNextChar() treats a single '\r'
          // as '\n'.  So we do the same here.
-          NextChar == '\r')
+          NextChar == '\r' || NextChar == '\0')


Seems unrelated to a new keyword?

you are right. It is to solve small bugs in llvm-tblgen.

Reason for doing this

When I was implementing undef, I found that if the td file ended with #endif, a compilation error would occur.

Take the following program as an example. For TGlexer, if it finds that this macro instruction is ifdef/ifndef and the macro is not defined, it will execute TGLexer::prepSkipRegion. This function will skip some characters until it encounters # , execute TGLexer::prepIsDirective to determine the type of macro, but if #endif happens to be the end of the file, the match will fail, so NextChar == '\0' is added for this.

#ifdef MACRO #endif

Wouldn't that make any case fail? I guess if a reproducible case is given, this deserves a dedicated PR

My fault, I have never been able to express myself clearly. If you open some td test cases (including correct test cases of #ifdef and #endif) under this path (llvm-project/llvm/test/TableGen), the next line after these td will be spaces.

Bug occurrence scenarios

td file ending with #endif (no characters after #endif, including newlines and comments), When you write a td file in vim, after wq saves it, the td file automatically uses the newline character as the last character, so you need to manually delete the last newline character. Afterwards, compilation errors will occur when using llvm-tblgen to compile.
Let me give you an example: If you open an IDE (vscode or the like) and create a new file test.td, enter the following content(Please make sure there are no characters after #endif):

#ifdef asdasdsasd #endif

I can confirm that a compilation error will appear in this case:

tdtest.td:2:7: error: Reached EOF without matching #endif #endif ^ tdtest.td:1:8: error: The latest preprocessor control is here #ifdef asdasdsasd ^ tdtest.td:2:7: error: Unexpected token at top level #endif

However, if Tablegen does not allow #endif as the end of the file, this is not a bug, so I checked the official documentation and found the following rules:

LineEnd ::= newline | return | EOF PreEndif ::= LineBegin (WhiteSpaceOrCComment)* "#endif" (WhiteSpaceOrAnyComment)* LineEnd

I'm not sure what return means, but I think it's a weird setting not to end with #endif.
Adding nextChar == '\0' can solve this compile, but there may be a more correct way to solve this problem.

Regarding the question I mentioned at the end of #endif, I will open a PR to explain this issue clearly.
I submitted the relevant test case in this PR( #69411 ). I think it is a reasonable and correct td (grammatical level), but there is no way for buildkite to pass the test case.

jroelofs

Please add test cases for the new behavior. You can find some examples under llvm/test/TableGen.

jroelofs · 2023-10-17T16:43:49Z

llvm/lib/TableGen/TGLexer.cpp

+  } else if (Kind == tgtok::Undef) {
+    StringRef MacroName = prepLexMacroName();
+    if (MacroName.empty())
+      return ReturnError(TokStart, "Expected macor name after #undef");


Suggested change

return ReturnError(TokStart, "Expected macor name after #undef");

return ReturnError(TokStart, "Expected macro name after #undef");

jroelofs · 2023-10-17T17:03:45Z

llvm/lib/TableGen/TGLexer.cpp

+      return ReturnError(TokStart, "Expected macor name after #undef");
+
+    if (!DefinedMacros.erase(MacroName))
+      return ReturnError(TokStart, "undefine(#undef) an undefined macro");


In the C preprocessor, this case isn't an error. That may come as a surprise to folks trying to use this.

jrtc27 · 2023-10-18T16:27:08Z

I don't see any tests that #undef actually undefs, just checking error cases?

jroelofs · 2023-10-18T16:51:15Z

In addition, it would also be good to have a test that re-#defines after #undef-ing.

joker-eph

I have some concerns right now because this would technically allow to include the same TableGen file multiple times, which is something we've been recently discussing of taking advantage of forbidding, and making the #include basically #include_once.

I would rather make sure we close to the door to multiple include of the same file before accidentally opening it!

jrtc27 · 2023-10-19T03:43:42Z

I have some concerns right now because this would technically allow to include the same TableGen file multiple times

That's already true if you don't have an include guard?

which is something we've been recently discussing of taking advantage of forbidding, and making the #include basically #include_once.

I would have concerns about that; including a file multiple times can be useful, and diverging from C's preprocessor sounds like a bad idea and will lead to confusion.

But regardless, I do not see why that is relevant here. #undef is only relevant in the sense that #ifdef can be used to write an include guard, but not always.

jrtc27 · 2023-10-19T03:57:05Z

I have some concerns right now because this would technically allow to include the same TableGen file multiple times

That's already true if you don't have an include guard?

which is something we've been recently discussing of taking advantage of forbidding, and making the #include basically #include_once.

I would have concerns about that; including a file multiple times can be useful, and diverging from C's preprocessor sounds like a bad idea and will lead to confusion.

But regardless, I do not see why that is relevant here. #undef is only relevant in the sense that #ifdef can be used to write an include guard, but not always.

(if you want that semantics, call it include[_-]once or import, not include, which normally means the does-exactly-what-you say "paste contents of file here")

joker-eph · 2023-10-19T03:59:48Z

That's already true if you don't have an include guard?

Technically true of course, and it would work with an empty file. But practically speaking without undef it's not clear to me how you can actually include multiple times a file to do anything useful?

I would have concerns about that; including a file multiple times can be useful, and diverging from C's preprocessor sounds like a bad idea and will lead to confusion.

TableGen is not C, and lucky for us... It's pretty rare that people actually want the C preprocessor in a language! Not a model of design in general...

But regardless, I do not see why that is relevant here.

I don't quite get what you mean about relevance considering what I wrote above.

#undef is only relevant in the sense that #ifdef can be used to write an include guard, but not always.

(I didn't understand this sentence)

…ug for llvm-tblgen

whousemyname · 2023-10-19T07:34:21Z

I have some concerns right now because this would technically allow to include the same TableGen file multiple times, which is something we've been recently discussing of taking advantage of forbidding, and making the #include basically #include_once.

I would rather make sure we close to the door to multiple include of the same file before accidentally opening it!

I am so sorry, I don't quite understand what you mean.
So is it still necessary to add the ‘#undef’ keyword?

joker-eph · 2023-10-19T07:54:56Z

I am so sorry, I don't quite understand what you mean. So is it still necessary to add the ‘#undef’ keyword?

I actually suspect that what you're trying to enable would be exactly something I don't quite see as desirable right now.
I can be wrong though, but the need for doing this hasn't arised in all the TableGen use I've seen, so I'm not able to weigh our conflicting goals against each other right now.
I read your Discourse thread, but don't find this very motivated right now: do you have real world use-cases that could be improved by this?

whousemyname · 2023-10-19T08:27:59Z

I am so sorry, I don't quite understand what you mean. So is it still necessary to add the ‘#undef’ keyword?

I actually suspect that what you're trying to enable would be exactly something I don't quite see as desirable right now. I can be wrong though, but the need for doing this hasn't arised in all the TableGen use I've seen, so I'm not able to weigh our conflicting goals against each other right now. I read your Discourse thread, but don't find this very motivated right now: do you have real world use-cases that could be improved by this?

My usage scenario for #undef is this. I am doing some automated compilation work, so I need to inject some code into td. If there is no undef, I need to generate a lot of source code files and continuously include these files.
TDSourceCode：

include fileA
include fileB
include fileC

If I have undef, I can write the codes related to fileA, B, and C into one file, and then use different define to reference different code segments.
TDSourceCode：

#define  INCLUDE_fileA
include fileABC

#define  INCLUDE_fileB
include fileABC

#define  INCLUDE_fileC
include fileABC

Without #undef, the difference for me is that I need to create more source code files, for this example it is fileABC/ fileA, fileB, fileC

Thank for your time

joker-eph · 2023-10-19T21:14:54Z

I kind of follow what you're saying, but it's not clear to me why this is a good structure or the problem it really solves (that is I see the "made up" structure you described, but haven't seen a strong need for this in the real world so far).
Having three different files for the conceptually three different thing to include should be fine here.

whousemyname · 2023-10-20T01:47:56Z

I kind of follow what you're saying, but it's not clear to me why this is a good structure or the problem it really solves (that is I see the "made up" structure you described, but haven't seen a strong need for this in the real world so far). Having three different files for the conceptually three different thing to include should be fine here.

For a beginner like me, i may think that '#define' and '#undef' are a symmetrical design. That is one of the my starting points, another starting point is that using '#undef' will bring covenience to my project(even if without 'undef', my project is still be achievable, It's just that it may not be "beautiful" in design. )

I think maybe i should close this PR and thank all reviewers again for their review.

jrtc27 · 2023-10-20T01:58:39Z

I agree with you and would prefer you keep it open rather than close it because of one objection; the way forward is to have discussion and reach consensus, not abandon anything that has an initially-dissenting view. I disagree with Mehdi's view of how include should work, and that we must forbid people from doing potentially-useful things.

whousemyname · 2023-10-20T02:09:11Z

I agree with you and would prefer you keep it open rather than close it because of one objection; the way forward is to have discussion and reach consensus, not abandon anything that has an initially-dissenting view. I disagree with Mehdi's view of how include should work, and that we must forbid people from doing potentially-useful things.

ok. Got you

whousemyname force-pushed the patch-tblgen branch from 19fc7b3 to 4400cd2 Compare October 16, 2023 03:53

whousemyname changed the title ~~[llvm-tblgen] Added keyword #undef to llvm-tblgen and fixed a small b…~~ [llvm][Tablegen][llvm-tblgen] Added keyword #undef to llvm-tblgen and fixed a small b… Oct 16, 2023

ecnelises reviewed Oct 16, 2023

View reviewed changes

jroelofs requested changes Oct 17, 2023

View reviewed changes

whousemyname force-pushed the patch-tblgen branch from 4400cd2 to 9f5c83b Compare October 18, 2023 02:08

whousemyname requested review from ecnelises and jroelofs October 18, 2023 02:44

whousemyname changed the title ~~[llvm][Tablegen][llvm-tblgen] Added keyword #undef to llvm-tblgen and fixed a small b…~~ [llvm][Tablegen][llvm-tblgen] Added keyword #undef to llvm-tblgen Oct 18, 2023

joker-eph requested changes Oct 19, 2023

View reviewed changes

[llvm-tblgen] Added keyword #undef to llvm-tblgen and fixed a small b…

d5e185c

…ug for llvm-tblgen

whousemyname force-pushed the patch-tblgen branch from 9f5c83b to d5e185c Compare October 19, 2023 07:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[llvm][Tablegen][llvm-tblgen] Added keyword #undef to llvm-tblgen #69135

[llvm][Tablegen][llvm-tblgen] Added keyword #undef to llvm-tblgen #69135

whousemyname commented Oct 16, 2023

github-actions bot commented Oct 16, 2023 •

edited

ecnelises Oct 16, 2023

whousemyname Oct 16, 2023

ecnelises Oct 17, 2023

whousemyname Oct 17, 2023

whousemyname Oct 18, 2023 •

edited

jroelofs left a comment

jroelofs Oct 17, 2023

jroelofs Oct 17, 2023

jrtc27 commented Oct 18, 2023

jroelofs commented Oct 18, 2023

joker-eph left a comment

jrtc27 commented Oct 19, 2023

jrtc27 commented Oct 19, 2023

joker-eph commented Oct 19, 2023

whousemyname commented Oct 19, 2023

joker-eph commented Oct 19, 2023

whousemyname commented Oct 19, 2023

joker-eph commented Oct 19, 2023

whousemyname commented Oct 20, 2023 •

edited

jrtc27 commented Oct 20, 2023 •

edited

whousemyname commented Oct 20, 2023

	return ReturnError(TokStart, "Expected macor name after #undef");
	return ReturnError(TokStart, "Expected macro name after #undef");

[llvm][Tablegen][llvm-tblgen] Added keyword #undef to llvm-tblgen #69135

Are you sure you want to change the base?

[llvm][Tablegen][llvm-tblgen] Added keyword #undef to llvm-tblgen #69135

Conversation

whousemyname commented Oct 16, 2023

Motivation introduction

bug in llvm-tblgen

github-actions bot commented Oct 16, 2023 • edited

ecnelises Oct 16, 2023

Choose a reason for hiding this comment

whousemyname Oct 16, 2023

Choose a reason for hiding this comment

Reason for doing this

ecnelises Oct 17, 2023

Choose a reason for hiding this comment

whousemyname Oct 17, 2023

Choose a reason for hiding this comment

Bug occurrence scenarios

whousemyname Oct 18, 2023 • edited

Choose a reason for hiding this comment

jroelofs left a comment

Choose a reason for hiding this comment

jroelofs Oct 17, 2023

Choose a reason for hiding this comment

jroelofs Oct 17, 2023

Choose a reason for hiding this comment

jrtc27 commented Oct 18, 2023

jroelofs commented Oct 18, 2023

joker-eph left a comment

Choose a reason for hiding this comment

jrtc27 commented Oct 19, 2023

jrtc27 commented Oct 19, 2023

joker-eph commented Oct 19, 2023

whousemyname commented Oct 19, 2023

joker-eph commented Oct 19, 2023

whousemyname commented Oct 19, 2023

Thank for your time

joker-eph commented Oct 19, 2023

whousemyname commented Oct 20, 2023 • edited

I think maybe i should close this PR and thank all reviewers again for their review.

jrtc27 commented Oct 20, 2023 • edited

whousemyname commented Oct 20, 2023

github-actions bot commented Oct 16, 2023 •

edited

whousemyname Oct 18, 2023 •

edited

whousemyname commented Oct 20, 2023 •

edited

jrtc27 commented Oct 20, 2023 •

edited