[clang-format] Support of TableGen statements in unwrapped line parser #78846

Merged
merged 2 commits into from
Jan 22, 2024

Conversation

hnakamura5
Contributor

@hnakamura5 hnakamura5 commented Jan 20, 2024

Parse TableGen statements according to their structure.

  • Avoid parsing labels (TableGen has none)
  • Avoid parsing class as a C++ class
  • Support the if statement of the form if <cond> then { ... }
  • Support the defset statement of the form defset <type> <name> = { ... }
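
The four statement shapes above can be illustrated with a small standalone sketch. This is not clang-format's implementation: `tokenize`, `BraceInfo`, and `firstBrace` are hypothetical helpers invented for illustration, and the toy tokenizer only splits on whitespace and a few punctuation characters. The sketch only shows how the leading keyword determines how the first opening brace is treated, using the annotation names that appear in the PR's unit test.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Toy tokenizer (an assumption for this sketch): splits on whitespace and
// treats '{', '}', ':' and '=' as single-character tokens.
std::vector<std::string> tokenize(const std::string &Text) {
  std::vector<std::string> Tokens;
  std::string Current;
  auto Flush = [&] {
    if (!Current.empty()) {
      Tokens.push_back(Current);
      Current.clear();
    }
  };
  for (char C : Text) {
    if (std::isspace(static_cast<unsigned char>(C))) {
      Flush();
    } else if (C == '{' || C == '}' || C == ':' || C == '=') {
      Flush();
      Tokens.push_back(std::string(1, C));
    } else {
      Current += C;
    }
  }
  Flush();
  return Tokens;
}

// Hypothetical result type: position and kind of the first opening brace.
struct BraceInfo {
  int Index;        // -1 if the statement has no opening brace
  std::string Kind; // annotation name as used in the PR's unit test
};

// Decide the kind of the first '{' from the statement's leading keyword only.
BraceInfo firstBrace(const std::vector<std::string> &Tokens) {
  for (int I = 0; I < static_cast<int>(Tokens.size()); ++I) {
    if (Tokens[I] != "{")
      continue;
    const std::string &Head = Tokens.front();
    if (Head == "if")
      return {I, "ControlStatementLBrace"};
    if (Head == "class" || Head == "def" || Head == "defset")
      return {I, "FunctionLBrace"};
    return {I, "Unknown"};
  }
  return {-1, "none"};
}
```

With this toy tokenizer, `firstBrace(tokenize("def Def: Foo {}"))` reports the brace at index 4 and `firstBrace(tokenize("if cond then {} else {}"))` reports it at index 3, the same token positions the PR's test checks.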

@llvmbot
Collaborator

llvmbot commented Jan 20, 2024

@llvm/pr-subscribers-clang-format

Author: Hirofumi Nakamura (hnakamura5)

Changes

Parse TableGen statements according to their structure.

  • Avoid parsing labels (TableGen has none)
  • Avoid parsing class as a C++ class
  • Support the if statement of the form if <cond> then { ... }
  • Support the defset statement of the form defset <type> <name> = { ... }

Full diff: https://github.com/llvm/llvm-project/pull/78846.diff

2 Files Affected:

  • (modified) clang/lib/Format/UnwrappedLineParser.cpp (+26-1)
  • (modified) clang/unittests/Format/TokenAnnotatorTest.cpp (+12)
diff --git a/clang/lib/Format/UnwrappedLineParser.cpp b/clang/lib/Format/UnwrappedLineParser.cpp
index c08ce86449b6ea6..a81c8e2971e2af9 100644
--- a/clang/lib/Format/UnwrappedLineParser.cpp
+++ b/clang/lib/Format/UnwrappedLineParser.cpp
@@ -1661,7 +1661,8 @@ void UnwrappedLineParser::parseStructuralElement(
     // In Verilog labels can be any expression, so we don't do them here.
     // JS doesn't have macros, and within classes colons indicate fields, not
     // labels.
-    if (!Style.isJavaScript() && !Style.isVerilog() &&
+    // TableGen doesn't have labels.
+    if (!Style.isJavaScript() && !Style.isVerilog() && !Style.isTableGen() &&
         Tokens->peekNextToken()->is(tok::colon) && !Line->MustBeDeclaration) {
       nextToken();
       Line->Tokens.begin()->Tok->MustBreakBefore = true;
@@ -1790,6 +1791,12 @@ void UnwrappedLineParser::parseStructuralElement(
         addUnwrappedLine();
         return;
       }
+      if (Style.isTableGen()) {
+        // Do nothing special. In this case the l_brace becomes FunctionLBrace.
+        // This is same as def and so on.
+        nextToken();
+        break;
+      }
       [[fallthrough]];
     case tok::kw_struct:
     case tok::kw_union:
@@ -2028,6 +2035,16 @@ void UnwrappedLineParser::parseStructuralElement(
         // initialisers are indented the same way.
         if (Style.isCSharp())
           FormatTok->setBlockKind(BK_BracedInit);
+        // TableGen's defset statement has syntax of the form,
+        // `defset <type> <name> = { <statement>... }`
+        if (Style.isTableGen() &&
+            Line->Tokens.begin()->Tok->is(Keywords.kw_defset)) {
+          FormatTok->setFinalizedType(TT_FunctionLBrace);
+          parseBlock(/*MustBeDeclaration=*/false, /*AddLevels=*/1u,
+                     /*MunchSemi=*/false);
+          addUnwrappedLine();
+          break;
+        }
         nextToken();
         parseBracedList();
       } else if (Style.Language == FormatStyle::LK_Proto &&
@@ -2743,6 +2760,14 @@ FormatToken *UnwrappedLineParser::parseIfThenElse(IfStmtKind *IfKind,
     }
   }
 
+  // TableGen's if statement has the form of `if <cond> then { ... }`.
+  if (Style.isTableGen()) {
+    while (!eof() && !(FormatTok->is(Keywords.kw_then))) {
+      // Simply skip until then. This range only contains a value.
+      nextToken();
+    }
+  }
+
   // Handle `if !consteval`.
   if (FormatTok->is(tok::exclaim))
     nextToken();
diff --git a/clang/unittests/Format/TokenAnnotatorTest.cpp b/clang/unittests/Format/TokenAnnotatorTest.cpp
index 64b2abac5cce531..6c065817892b543 100644
--- a/clang/unittests/Format/TokenAnnotatorTest.cpp
+++ b/clang/unittests/Format/TokenAnnotatorTest.cpp
@@ -2232,6 +2232,18 @@ TEST_F(TokenAnnotatorTest, UnderstandTableGenTokens) {
   EXPECT_TOKEN(Tokens[0], tok::identifier, TT_Unknown);
   Tokens = Annotate("01234Vector");
   EXPECT_TOKEN(Tokens[0], tok::identifier, TT_Unknown);
+
+  // Structured statements.
+  Tokens = Annotate("class Foo {}");
+  EXPECT_TOKEN(Tokens[2], tok::l_brace, TT_FunctionLBrace);
+  Tokens = Annotate("def Def: Foo {}");
+  EXPECT_TOKEN(Tokens[2], tok::colon, TT_InheritanceColon);
+  EXPECT_TOKEN(Tokens[4], tok::l_brace, TT_FunctionLBrace);
+  Tokens = Annotate("if cond then {} else {}");
+  EXPECT_TOKEN(Tokens[3], tok::l_brace, TT_ControlStatementLBrace);
+  EXPECT_TOKEN(Tokens[6], tok::l_brace, TT_ElseLBrace);
+  Tokens = Annotate("defset Foo Def2 = {}");
+  EXPECT_TOKEN(Tokens[4], tok::l_brace, TT_FunctionLBrace);
 }
 
 TEST_F(TokenAnnotatorTest, UnderstandConstructors) {

Co-authored-by: Björn Schäpers <github@hazardy.de>
Contributor

@HazardyKnusperkeks HazardyKnusperkeks left a comment


I'm actually fascinated by how small the diff is; the description led me to think we'd have a +/-200 line change.

@hnakamura5 hnakamura5 merged commit df4ba00 into llvm:main Jan 22, 2024
3 of 4 checks passed
@hnakamura5
Contributor Author

Thank you very much!

> how small the diff is

Maybe that is because of TableGen's simple syntax, and because here we are parsing only the structure of the statements (e.g. ignoring the <cond> part of if for now).
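
The "ignore the <cond> part" approach can be sketched in isolation. The function below is a hypothetical stand-in, not the parser's real interface: it models the token stream as a vector of strings and the cursor as an index, and advances until it reaches `then` or runs out of tokens, which is the same shape as the loop added to `parseIfThenElse` in this PR.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the parser's token cursor: returns the index of
// the first "then" at or after Pos, or Tokens.size() if the input ends first
// (the analogue of hitting eof in the real loop).
std::size_t skipToThen(const std::vector<std::string> &Tokens,
                       std::size_t Pos) {
  // The condition is not analyzed at all; every token before "then" is
  // simply stepped over.
  while (Pos < Tokens.size() && Tokens[Pos] != "then")
    ++Pos;
  return Pos;
}
```

Because nothing inside the condition is inspected, the skip costs one loop, but it also means a malformed condition goes unnoticed at this stage; the end-of-input check keeps the loop from running past the last token when `then` is missing.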
