Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang-format] Support of TableGen formatting. #76059

Closed
wants to merge 2 commits into from

Conversation

hnakamura5
Copy link
Contributor

Currently, TableGen has its language style but the it does not works well. This patch adds total support of TableGen formatting including the support for the code (multi line string), DAG args, bang operators, the cond operator, and the paste operators.

Options

TableGenAllowBreakBeforeInheritColon (Boolean)

Allows break before the colon of inheritance.

def Def
    : Parent {};

By default this is false.

TableGenAllowBreakAfterInheritColon (Boolean)

Allows break after the colon of inheritance.

def Def :
    Parent {};

By default this is true.

TableGenBreakInsideCondOperator (Boolean)

Insert the line break after each case of !cond operator.

  let CondOpe1 = !cond(!eq(size, 1): 1,
                       !eq(size, 16): 1,
                       true: 0);

By default this is true.

TableGenBreakInsideDAGArgList (Boolean)

Insert the line break after each element of DAGArg list.

  let DAGArgIns = (ins
      i32:$src1,
      i32:$src2
  );

By default this is false.

TableGenPreferBreakInsideSquareBracket (Boolean)

For whom likes such a style.

def Def : Parent<"Def",[
    a, b, c
]> {
  ...
}

By default this is false.

TableGenSpaceAroundDAGArgColon (Boolean)

Insert the space around the colon inside a DAGArg list.

  let DAGArgIns = (ins i32 : $src1, i32 : $src2);

By default this is false.

TableGenBreakingDAGArgOperators (List of Strings)

Works only when TableGenBreakInsideDAGArgList is true.
The list needs to be consists of identifiers.
If any identifier is specified, this limits the effect of TableGenBreakInsideDAGArgList only on DAGArgs beginning with the specified identifiers.

For example the configuration,

TableGenBreakingDAGArgOperators: ['ins', 'outs']

makes the line break only occurs inside DAGArgs beginning with the specified identifiers 'ins' and 'outs'.

let DAGArgIns = (ins
    i32:$src1,
    i32:$src2
);

let DAGArgOthers = (others i32:$other1, i32:$other2);

let DAGArgBang = (!cast<SomeType>("Some") i32:$src1, i32:$src2)

AlignConsecutiveTableGenCondOperatorColons

Supports AlignConsecutiveStyle configuration.
Align the colons inside !cond operators.

let CondOpe1 = !cond(!eq(size, 1) : 1,
                     !eq(size, 16): 1,
                     true         : 0);

AlignConsecutiveTableGenBreakingDAGArgColons

Supports AlignConsecutiveStyle configuration.
This works only when TableGenBreakInsideDAGArgList is true and the DAGArg is not excepted by TableGenBreakingDAGArgOperators's effect.
Align the colon inside DAGArg which have line break inside.

let dagarg = (ins
    a  :$src1```
    aa :$src2
    aaa:$src3
)

AlignConsecutiveTableGenDefinitions

Supports AlignConsecutiveStyle configuration.
This aligns the inherits colons of consecutive definitions.

def Def       : Parent {}
def DefDef    : Parent {}
def DefDefDef : Parent {}

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Collaborator

llvmbot commented Dec 20, 2023

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-format

Author: H.Nakamura (hnakamura5)

Changes

Currently, TableGen has its language style but the it does not works well. This patch adds total support of TableGen formatting including the support for the code (multi line string), DAG args, bang operators, the cond operator, and the paste operators.

Options

TableGenAllowBreakBeforeInheritColon (Boolean)

Allows break before the colon of inheritance.

def Def
    : Parent {};

By default this is false.

TableGenAllowBreakAfterInheritColon (Boolean)

Allows break after the colon of inheritance.

def Def :
    Parent {};

By default this is true.

TableGenBreakInsideCondOperator (Boolean)

Insert the line break after each case of !cond operator.

  let CondOpe1 = !cond(!eq(size, 1): 1,
                       !eq(size, 16): 1,
                       true: 0);

By default this is true.

TableGenBreakInsideDAGArgList (Boolean)

Insert the line break after each element of DAGArg list.

  let DAGArgIns = (ins
      i32:$src1,
      i32:$src2
  );

By default this is false.

TableGenPreferBreakInsideSquareBracket (Boolean)

For whom likes such a style.

def Def : Parent&lt;"Def",[
    a, b, c
]&gt; {
  ...
}

By default this is false.

TableGenSpaceAroundDAGArgColon (Boolean)

Insert the space around the colon inside a DAGArg list.

  let DAGArgIns = (ins i32 : $src1, i32 : $src2);

By default this is false.

TableGenBreakingDAGArgOperators (List of Strings)

Works only when TableGenBreakInsideDAGArgList is true.
The list needs to be consists of identifiers.
If any identifier is specified, this limits the effect of TableGenBreakInsideDAGArgList only on DAGArgs beginning with the specified identifiers.

For example the configuration,

TableGenBreakingDAGArgOperators: ['ins', 'outs']

makes the line break only occurs inside DAGArgs beginning with the specified identifiers 'ins' and 'outs'.

let DAGArgIns = (ins
    i32:$src1,
    i32:$src2
);

let DAGArgOthers = (others i32:$other1, i32:$other2);

let DAGArgBang = (!cast&lt;SomeType&gt;("Some") i32:$src1, i32:$src2)

AlignConsecutiveTableGenCondOperatorColons

Supports AlignConsecutiveStyle configuration.
Align the colons inside !cond operators.

let CondOpe1 = !cond(!eq(size, 1) : 1,
                     !eq(size, 16): 1,
                     true         : 0);

AlignConsecutiveTableGenBreakingDAGArgColons

Supports AlignConsecutiveStyle configuration.
This works only when TableGenBreakInsideDAGArgList is true and the DAGArg is not excepted by TableGenBreakingDAGArgOperators's effect.
Align the colon inside DAGArg which have line break inside.

let dagarg = (ins
    a  :$src1```
    aa :$src2
    aaa:$src3
)

AlignConsecutiveTableGenDefinitions

Supports AlignConsecutiveStyle configuration.
This aligns the inherits colons of consecutive definitions.

def Def       : Parent {}
def DefDef    : Parent {}
def DefDefDef : Parent {}

Patch is 68.04 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/76059.diff

12 Files Affected:

  • (modified) clang/include/clang/Format/Format.h (+47)
  • (modified) clang/lib/Format/ContinuationIndenter.cpp (+16-2)
  • (modified) clang/lib/Format/Format.cpp (+29)
  • (modified) clang/lib/Format/FormatToken.h (+98)
  • (modified) clang/lib/Format/FormatTokenLexer.cpp (+141)
  • (modified) clang/lib/Format/FormatTokenLexer.h (+6)
  • (modified) clang/lib/Format/TokenAnnotator.cpp (+432-4)
  • (modified) clang/lib/Format/UnwrappedLineParser.cpp (+40-16)
  • (modified) clang/lib/Format/WhitespaceManager.cpp (+27-4)
  • (modified) clang/lib/Format/WhitespaceManager.h (+14)
  • (modified) clang/unittests/Format/FormatTestTableGen.cpp (+307)
  • (modified) clang/unittests/Format/TokenAnnotatorTest.cpp (+50)
diff --git a/clang/include/clang/Format/Format.h b/clang/include/clang/Format/Format.h
index 8604dea689f937..30a38aed99866e 100644
--- a/clang/include/clang/Format/Format.h
+++ b/clang/include/clang/Format/Format.h
@@ -396,6 +396,36 @@ struct FormatStyle {
   /// \version 17
   ShortCaseStatementsAlignmentStyle AlignConsecutiveShortCaseStatements;
 
+  /// Style of aligning consecutive TableGen cond operator colons.
+  /// \code
+  ///   !cond(!eq(size, 1) : 1,
+  ///         !eq(size, 16): 1,
+  ///         true         : 0)
+  /// \endcode
+  /// \version 18
+  AlignConsecutiveStyle AlignConsecutiveTableGenCondOperatorColons;
+
+  /// Style of aligning consecutive TableGen DAGArg operator colons.
+  /// Intended to be used with TableGenBreakInsideDAGArgList
+  /// \code
+  ///   let dagarg = (ins
+  ///       a  :$src1,
+  ///       aa :$src2,
+  ///       aaa:$src3
+  ///   )
+  /// \endcode
+  /// \version 18
+  AlignConsecutiveStyle AlignConsecutiveTableGenBreakingDAGArgColons;
+
+  /// Style of aligning consecutive TableGen def colons.
+  /// \code
+  ///   def Def       : Parent {}
+  ///   def DefDef    : Parent {}
+  ///   def DefDefDef : Parent {}
+  /// \endcode
+  /// \version 18
+  AlignConsecutiveStyle AlignConsecutiveTableGenDefinitions;
+
   /// Different styles for aligning escaped newlines.
   enum EscapedNewlineAlignmentStyle : int8_t {
     /// Don't align escaped newlines.
@@ -3037,6 +3067,7 @@ struct FormatStyle {
   bool isProto() const {
     return Language == LK_Proto || Language == LK_TextProto;
   }
+  bool isTableGen() const { return Language == LK_TableGen; }
 
   /// Language, this format style is targeted at.
   /// \version 3.5
@@ -4656,6 +4687,15 @@ struct FormatStyle {
   /// \version 8
   std::vector<std::string> StatementMacros;
 
+  /// Tablegen
+  bool TableGenAllowBreakBeforeInheritColon;
+  bool TableGenAllowBreakAfterInheritColon;
+  bool TableGenBreakInsideCondOperator;
+  bool TableGenBreakInsideDAGArgList;
+  bool TableGenPreferBreakInsideSquareBracket;
+  bool TableGenSpaceAroundDAGArgColon;
+  std::vector<std::string> TableGenBreakingDAGArgOperators;
+
   /// The number of columns used for tab stops.
   /// \version 3.7
   unsigned TabWidth;
@@ -4753,6 +4793,13 @@ struct FormatStyle {
            AlignConsecutiveMacros == R.AlignConsecutiveMacros &&
            AlignConsecutiveShortCaseStatements ==
                R.AlignConsecutiveShortCaseStatements &&
+           AlignConsecutiveTableGenCondOperatorColons ==
+               R.AlignConsecutiveTableGenCondOperatorColons &&
+           AlignConsecutiveTableGenBreakingDAGArgColons ==
+               R.AlignConsecutiveTableGenBreakingDAGArgColons &&
+           AlignConsecutiveTableGenDefinitions ==
+               R.AlignConsecutiveTableGenDefinitions &&
+           AlignConsecutiveMacros == R.AlignConsecutiveMacros &&
            AlignEscapedNewlines == R.AlignEscapedNewlines &&
            AlignOperands == R.AlignOperands &&
            AlignTrailingComments == R.AlignTrailingComments &&
diff --git a/clang/lib/Format/ContinuationIndenter.cpp b/clang/lib/Format/ContinuationIndenter.cpp
index bd319f21b05f86..176dc7744a5576 100644
--- a/clang/lib/Format/ContinuationIndenter.cpp
+++ b/clang/lib/Format/ContinuationIndenter.cpp
@@ -800,6 +800,7 @@ void ContinuationIndenter::addTokenOnCurrentLine(LineState &State, bool DryRun,
   if (Style.AlignAfterOpenBracket != FormatStyle::BAS_DontAlign &&
       !CurrentState.IsCSharpGenericTypeConstraint && Previous.opensScope() &&
       Previous.isNot(TT_ObjCMethodExpr) && Previous.isNot(TT_RequiresClause) &&
+      Previous.isNot(TT_TableGenDAGArgOpener) &&
       !(Current.MacroParent && Previous.MacroParent) &&
       (Current.isNot(TT_LineComment) ||
        Previous.isOneOf(BK_BracedInit, TT_VerilogMultiLineListLParen))) {
@@ -1229,7 +1230,7 @@ unsigned ContinuationIndenter::getNewLineColumn(const LineState &State) {
     return CurrentState.Indent;
   }
   if ((Current.isOneOf(tok::r_brace, tok::r_square) ||
-       (Current.is(tok::greater) && Style.isProto())) &&
+       (Current.is(tok::greater) && (Style.isProto() || Style.isTableGen()))) &&
       State.Stack.size() > 1) {
     if (Current.closesBlockOrBlockTypeList(Style))
       return State.Stack[State.Stack.size() - 2].NestedBlockIndent;
@@ -1257,6 +1258,12 @@ unsigned ContinuationIndenter::getNewLineColumn(const LineState &State) {
        Current.Next->isOneOf(tok::semi, tok::kw_const, tok::l_brace))) {
     return State.Stack[State.Stack.size() - 2].LastSpace;
   }
+  // When DAGArg closer exists top of line, it must be aligned in the similar
+  // way as function call above.
+  if (Current.is(tok::r_paren) && State.Stack.size() > 1 &&
+      (Current.isOneOf(TT_TableGenDAGArgCloser, TT_TableGenParamAngleCloser))) {
+    return State.Stack[State.Stack.size() - 2].LastSpace;
+  }
   if (Style.AlignAfterOpenBracket == FormatStyle::BAS_BlockIndent &&
       (Current.is(tok::r_paren) ||
        (Current.is(tok::r_brace) && Current.MatchingParen &&
@@ -1593,6 +1600,9 @@ unsigned ContinuationIndenter::moveStateToNextToken(LineState &State,
     State.StartOfStringLiteral = State.Column + 1;
   if (Current.is(TT_CSharpStringLiteral) && State.StartOfStringLiteral == 0) {
     State.StartOfStringLiteral = State.Column + 1;
+  } else if (Current.is(TT_TableGenMultiLineString) &&
+             State.StartOfStringLiteral == 0) {
+    State.StartOfStringLiteral = State.Column + 1;
   } else if (Current.isStringLiteral() && State.StartOfStringLiteral == 0) {
     State.StartOfStringLiteral = State.Column;
   } else if (!Current.isOneOf(tok::comment, tok::identifier, tok::hash) &&
@@ -1672,7 +1682,9 @@ void ContinuationIndenter::moveStatePastFakeLParens(LineState &State,
         (!Previous || Previous->isNot(tok::kw_return) ||
          (Style.Language != FormatStyle::LK_Java && PrecedenceLevel > 0)) &&
         (Style.AlignAfterOpenBracket != FormatStyle::BAS_DontAlign ||
-         PrecedenceLevel != prec::Comma || Current.NestingLevel == 0)) {
+         PrecedenceLevel != prec::Comma || Current.NestingLevel == 0) &&
+        (!Style.isTableGen() || !Previous ||
+         !Previous->isNot(TT_TableGenDAGArgListComma))) {
       NewParenState.Indent = std::max(
           std::max(State.Column, NewParenState.Indent), CurrentState.LastSpace);
     }
@@ -1915,6 +1927,8 @@ void ContinuationIndenter::moveStatePastScopeCloser(LineState &State) {
       (Current.isOneOf(tok::r_paren, tok::r_square, TT_TemplateString) ||
        (Current.is(tok::r_brace) && State.NextToken != State.Line->First) ||
        State.NextToken->is(TT_TemplateCloser) ||
+       State.NextToken->is(TT_TableGenParamAngleCloser) ||
+       State.NextToken->is(TT_TableGenListCloser) ||
        (Current.is(tok::greater) && Current.is(TT_DictLiteral)))) {
     State.Stack.pop_back();
   }
diff --git a/clang/lib/Format/Format.cpp b/clang/lib/Format/Format.cpp
index 28271181e07d0c..9ba3d5dbd530e7 100644
--- a/clang/lib/Format/Format.cpp
+++ b/clang/lib/Format/Format.cpp
@@ -926,6 +926,12 @@ template <> struct MappingTraits<FormatStyle> {
     IO.mapOptional("AlignConsecutiveMacros", Style.AlignConsecutiveMacros);
     IO.mapOptional("AlignConsecutiveShortCaseStatements",
                    Style.AlignConsecutiveShortCaseStatements);
+    IO.mapOptional("AlignConsecutiveTableGenCondOperatorColons",
+                   Style.AlignConsecutiveTableGenCondOperatorColons);
+    IO.mapOptional("AlignConsecutiveTableGenBreakingDAGArgColons",
+                   Style.AlignConsecutiveTableGenBreakingDAGArgColons);
+    IO.mapOptional("AlignConsecutiveTableGenDefinitions",
+                   Style.AlignConsecutiveTableGenDefinitions);
     IO.mapOptional("AlignEscapedNewlines", Style.AlignEscapedNewlines);
     IO.mapOptional("AlignOperands", Style.AlignOperands);
     IO.mapOptional("AlignTrailingComments", Style.AlignTrailingComments);
@@ -1124,6 +1130,20 @@ template <> struct MappingTraits<FormatStyle> {
     IO.mapOptional("StatementAttributeLikeMacros",
                    Style.StatementAttributeLikeMacros);
     IO.mapOptional("StatementMacros", Style.StatementMacros);
+    IO.mapOptional("TableGenAllowBreakAfterInheritColon",
+                   Style.TableGenAllowBreakAfterInheritColon);
+    IO.mapOptional("TableGenAllowBreakBeforeInheritColon",
+                   Style.TableGenAllowBreakBeforeInheritColon);
+    IO.mapOptional("TableGenBreakInsideCondOperator",
+                   Style.TableGenBreakInsideCondOperator);
+    IO.mapOptional("TableGenBreakInsideDAGArgList",
+                   Style.TableGenBreakInsideDAGArgList);
+    IO.mapOptional("TableGenPreferBreakInsideSquareBracket",
+                   Style.TableGenPreferBreakInsideSquareBracket);
+    IO.mapOptional("TableGenSpaceAroundDAGArgColon",
+                   Style.TableGenSpaceAroundDAGArgColon);
+    IO.mapOptional("TableGenBreakingDAGArgOperators",
+                   Style.TableGenBreakingDAGArgOperators);
     IO.mapOptional("TabWidth", Style.TabWidth);
     IO.mapOptional("TypeNames", Style.TypeNames);
     IO.mapOptional("TypenameMacros", Style.TypenameMacros);
@@ -1437,6 +1457,9 @@ FormatStyle getLLVMStyle(FormatStyle::LanguageKind Language) {
   LLVMStyle.AlignConsecutiveDeclarations = {};
   LLVMStyle.AlignConsecutiveMacros = {};
   LLVMStyle.AlignConsecutiveShortCaseStatements = {};
+  LLVMStyle.AlignConsecutiveTableGenBreakingDAGArgColons = {};
+  LLVMStyle.AlignConsecutiveTableGenCondOperatorColons = {};
+  LLVMStyle.AlignConsecutiveTableGenDefinitions = {};
   LLVMStyle.AlignTrailingComments = {};
   LLVMStyle.AlignTrailingComments.Kind = FormatStyle::TCAS_Always;
   LLVMStyle.AlignTrailingComments.OverEmptyLines = 0;
@@ -1585,6 +1608,12 @@ FormatStyle getLLVMStyle(FormatStyle::LanguageKind Language) {
   LLVMStyle.StatementAttributeLikeMacros.push_back("Q_EMIT");
   LLVMStyle.StatementMacros.push_back("Q_UNUSED");
   LLVMStyle.StatementMacros.push_back("QT_REQUIRE_VERSION");
+  LLVMStyle.TableGenAllowBreakAfterInheritColon = true;
+  LLVMStyle.TableGenAllowBreakBeforeInheritColon = false;
+  LLVMStyle.TableGenBreakInsideCondOperator = true;
+  LLVMStyle.TableGenBreakInsideDAGArgList = false;
+  LLVMStyle.TableGenPreferBreakInsideSquareBracket = false;
+  LLVMStyle.TableGenSpaceAroundDAGArgColon = false;
   LLVMStyle.TabWidth = 8;
   LLVMStyle.UseTab = FormatStyle::UT_Never;
   LLVMStyle.VerilogBreakBetweenInstancePorts = true;
diff --git a/clang/lib/Format/FormatToken.h b/clang/lib/Format/FormatToken.h
index 3f9664f8f78a3e..bd16567233bb22 100644
--- a/clang/lib/Format/FormatToken.h
+++ b/clang/lib/Format/FormatToken.h
@@ -148,6 +148,23 @@ namespace format {
   TYPE(StructLBrace)                                                           \
   TYPE(StructRBrace)                                                           \
   TYPE(StructuredBindingLSquare)                                               \
+  TYPE(TableGenBangOperator)                                                   \
+  TYPE(TableGenCondOperator)                                                   \
+  TYPE(TableGenCondOperatorColon)                                              \
+  TYPE(TableGenCondOperatorComma)                                              \
+  TYPE(TableGenDAGArgCloser)                                                   \
+  TYPE(TableGenDAGArgListColon)                                                \
+  TYPE(TableGenDAGArgListColonToAlign)                                         \
+  TYPE(TableGenDAGArgListComma)                                                \
+  TYPE(TableGenDAGArgOpener)                                                   \
+  TYPE(TableGenDAGArgOperatorID)                                               \
+  TYPE(TableGenListCloser)                                                     \
+  TYPE(TableGenListOpener)                                                     \
+  TYPE(TableGenMultiLineString)                                                \
+  TYPE(TableGenParamAngleOpener)                                               \
+  TYPE(TableGenParamAngleCloser)                                               \
+  TYPE(TableGenTrailingPasteOperator)                                          \
+  TYPE(TableGenValueSuffix)                                                    \
   TYPE(TemplateCloser)                                                         \
   TYPE(TemplateOpener)                                                         \
   TYPE(TemplateString)                                                         \
@@ -671,6 +688,8 @@ struct FormatToken {
       return true;
     if (is(TT_DictLiteral) && is(tok::less))
       return true;
+    if (is(TT_TableGenParamAngleOpener))
+      return true;
     return isOneOf(tok::l_paren, tok::l_brace, tok::l_square,
                    TT_TemplateOpener);
   }
@@ -681,6 +700,10 @@ struct FormatToken {
       return true;
     if (is(TT_DictLiteral) && is(tok::greater))
       return true;
+    if (is(TT_TableGenParamAngleCloser))
+      return true;
+    if (is(TT_TableGenListCloser))
+      return true;
     return isOneOf(tok::r_paren, tok::r_brace, tok::r_square,
                    TT_TemplateCloser);
   }
@@ -1202,6 +1225,21 @@ struct AdditionalKeywords {
     kw_verilogHashHash = &IdentTable.get("##");
     kw_apostrophe = &IdentTable.get("\'");
 
+    // TableGen keywords
+    kw_bit = &IdentTable.get("bit");
+    kw_bits = &IdentTable.get("bits");
+    kw_code = &IdentTable.get("code");
+    kw_dag = &IdentTable.get("dag");
+    kw_def = &IdentTable.get("def");
+    kw_defm = &IdentTable.get("defm");
+    kw_defset = &IdentTable.get("defset");
+    kw_defvar = &IdentTable.get("defvar");
+    kw_dump = &IdentTable.get("dump");
+    kw_include = &IdentTable.get("include");
+    kw_list = &IdentTable.get("list");
+    kw_multiclass = &IdentTable.get("multiclass");
+    kw_then = &IdentTable.get("then");
+
     // Keep this at the end of the constructor to make sure everything here
     // is
     // already initialized.
@@ -1294,6 +1332,27 @@ struct AdditionalKeywords {
          kw_wildcard,     kw_wire,
          kw_with,         kw_wor,
          kw_verilogHash,  kw_verilogHashHash});
+
+    TableGenExtraKeywords = std::unordered_set<IdentifierInfo *>({
+        kw_assert,
+        kw_bit,
+        kw_bits,
+        kw_code,
+        kw_dag,
+        kw_def,
+        kw_defm,
+        kw_defset,
+        kw_defvar,
+        kw_dump,
+        kw_foreach,
+        kw_in,
+        kw_include,
+        kw_let,
+        kw_list,
+        kw_multiclass,
+        kw_string,
+        kw_then,
+    });
   }
 
   // Context sensitive keywords.
@@ -1539,6 +1598,21 @@ struct AdditionalKeywords {
   // Symbols in Verilog that don't exist in C++.
   IdentifierInfo *kw_apostrophe;
 
+  // TableGen keywords
+  IdentifierInfo *kw_bit;
+  IdentifierInfo *kw_bits;
+  IdentifierInfo *kw_code;
+  IdentifierInfo *kw_dag;
+  IdentifierInfo *kw_def;
+  IdentifierInfo *kw_defm;
+  IdentifierInfo *kw_defset;
+  IdentifierInfo *kw_defvar;
+  IdentifierInfo *kw_dump;
+  IdentifierInfo *kw_include;
+  IdentifierInfo *kw_list;
+  IdentifierInfo *kw_multiclass;
+  IdentifierInfo *kw_then;
+
   /// Returns \c true if \p Tok is a keyword or an identifier.
   bool isWordLike(const FormatToken &Tok) const {
     // getIdentifierinfo returns non-null for keywords as well as identifiers.
@@ -1811,6 +1885,27 @@ struct AdditionalKeywords {
     }
   }
 
+  bool isTableGenDefinition(const FormatToken &Tok) const {
+    return Tok.isOneOf(kw_def, kw_defm, kw_defset, kw_defvar, kw_multiclass,
+                       kw_let, tok::kw_class);
+  }
+
+  bool isTableGenKeyword(const FormatToken &Tok) const {
+    switch (Tok.Tok.getKind()) {
+    case tok::kw_class:
+    case tok::kw_else:
+    case tok::kw_false:
+    case tok::kw_if:
+    case tok::kw_int:
+    case tok::kw_true:
+      return true;
+    default:
+      return Tok.is(tok::identifier) &&
+             TableGenExtraKeywords.find(Tok.Tok.getIdentifierInfo()) !=
+                 TableGenExtraKeywords.end();
+    }
+  }
+
 private:
   /// The JavaScript keywords beyond the C++ keyword set.
   std::unordered_set<IdentifierInfo *> JsExtraKeywords;
@@ -1820,6 +1915,9 @@ struct AdditionalKeywords {
 
   /// The Verilog keywords beyond the C++ keyword set.
   std::unordered_set<IdentifierInfo *> VerilogExtraKeywords;
+
+  /// The TableGen keywords beyond the C++ keyword set.
+  std::unordered_set<IdentifierInfo *> TableGenExtraKeywords;
 };
 
 inline bool isLineComment(const FormatToken &FormatTok) {
diff --git a/clang/lib/Format/FormatTokenLexer.cpp b/clang/lib/Format/FormatTokenLexer.cpp
index 61430282c6f88c..8c2961ff1cbefa 100644
--- a/clang/lib/Format/FormatTokenLexer.cpp
+++ b/clang/lib/Format/FormatTokenLexer.cpp
@@ -93,6 +93,10 @@ ArrayRef<FormatToken *> FormatTokenLexer::lex() {
       // string literals are correctly identified.
       handleCSharpVerbatimAndInterpolatedStrings();
     }
+    if (Style.isTableGen()) {
+      handleTableGenMultilineString();
+      handleTableGenNumericLikeIdentifier();
+    }
     if (Tokens.back()->NewlinesBefore > 0 || Tokens.back()->IsMultiline)
       FirstInLineIndex = Tokens.size() - 1;
   } while (Tokens.back()->isNot(tok::eof));
@@ -272,6 +276,38 @@ void FormatTokenLexer::tryMergePreviousTokens() {
       return;
     }
   }
+  if (Style.isTableGen()) {
+    if (tryMergeTokens({tok::l_square, tok::l_brace},
+                       TT_TableGenMultiLineString)) {
+      Tokens.back()->Tok.setKind(tok::string_literal);
+      return;
+    }
+    if (Tokens.size() > 1 && Tokens.end()[-2]->is(tok::exclaim)) {
+      if (Tokens.back()->is(tok::identifier) ||
+          (Tokens.back()->is(tok::kw_if) && Tokens.back())) {
+      }
+    }
+    if (tryMergeTokens({tok::exclaim, tok::identifier},
+                       TT_TableGenBangOperator)) {
+      Tokens.back()->Tok.setKind(tok::identifier);
+      if (Tokens.back()->TokenText == "!cond")
+        Tokens.back()->setType(TT_TableGenCondOperator);
+      return;
+    }
+    if (tryMergeTokens({tok::exclaim, tok::kw_if}, TT_TableGenBangOperator)) {
+      // Here, "! if" becomes "!if". ! captures if even if the space exists.
+      // That is surely is a only one possibility in TableGen's syntax.
+      return;
+    }
+    if (tryMergeTokens({tok::plus, tok::numeric_constant}, TT_Unknown)) {
+      Tokens.back()->Tok.setKind(tok::numeric_constant);
+      return;
+    }
+    if (tryMergeTokens({tok::minus, tok::numeric_constant}, TT_Unknown)) {
+      Tokens.back()->Tok.setKind(tok::numeric_constant);
+      return;
+    }
+  }
 }
 
 bool FormatTokenLexer::tryMergeNSStringLiteral() {
@@ -763,6 +799,108 @@ void FormatTokenLexer::handleCSharpVerbatimAndInterpolatedStrings() {
   resetLexer(SourceMgr.getFileOffset(Lex->getSourceLocation(Offset + 1)));
 }
 
+void FormatTokenLexer::handleTableGenMultilineString() {
+  FormatToken *MultiLineString = Tokens.back();
+  if (MultiLineString->isNot(TT_TableGenMultiLineString)) {
+    // Multi line string starts with [{
+    return;
+  }
+  bool PrevIsRBrace = false;
+  const char *FirstBreak = nullptr;
+  const char *LastBreak = nullptr;
+  const char *Begin = MultiLineString->TokenText.begin();
+  for (const char *Current = Begin, *End = Lex->getBuffer().end();
+       Current != End; ++Current) {
+    if (*Current == ']' && PrevIsRBrace) {
+      // }] is the end of multi line string.
+      if (!FirstBreak)
+        FirstBreak = Current;
+      MultiLineString->TokenText = StringRef(Begin, Current - Begin + 1);
+      MultiLineString->ColumnWidth = encoding::columnWidthWithTabs(
+          StringRef(Begin, FirstBreak - Begin + 1),
+          MultiLineString->OriginalColumn, Style.TabWidth, Encoding);
+      if (LastBreak) {
+        MultiLineString->LastLineColumnWidth = encoding::columnWidthWithTabs(
+            StringRef(LastBreak + 1, Current - LastBreak),
+            MultiLineString->OriginalColumn, Style.TabWidth, Encoding);
+      }
+      resetLexer(SourceMgr.getFileOffset(Lex->getSourceLocation(Current + 1)));
+      return;
+    }
+    PrevIsRBrace = false;
+    if (*Current == '\n') {
+      MultiLineString->IsMultiline = true;
+      // Assure LastBreak is not equal to FirstBreak.
+      if (!FirstBreak)
+        FirstBreak = Current;
+      LastBreak = Current;
+      continue;
+    }
+    if (*Current == '}') {
+      // Memorize '}'. If ne...
[truncated]

@hnakamura5
Copy link
Contributor Author

The failure in Check code formatting seems to be caused by the commit (replacing to clang-format style)
5c60e2c

I avoid this by using clang-format binary from main branch in my local.
What should I do with this failure?

@owenca
Copy link
Contributor

owenca commented Dec 21, 2023

I avoid this by using clang-format binary from main branch in my local. What should I do with this failure?

You can ignore it if running the in-tree clang-format is clean, e.g.:

$ clang/tools/clang-format/git-clang-format --binary build/bin/clang-format HEAD~
clang-format did not modify any files
$ 

@hnakamura5
Copy link
Contributor Author

Thank you for your advice. I have checked the message "clang-format did not modify any files" in my local.

Currently, TableGen has its language style but the it does not works
well. This patch adds total support of TableGen formatting including
the support for the code (multi line string), DAG args, bang operators,
the cond operator, and the paste operators.
Copy link
Member

@rymiel rymiel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, this is quite a big patch. I had a very quick look and noticed a few nit-picky thingy, mostly about formatting or naming and whatnot, mostly minor things.
However since I've never used tablegen, I'm not sure if I can actually review any of the logic in here, but I'll try to have another look later

R.AlignConsecutiveTableGenBreakingDAGArgColons &&
AlignConsecutiveTableGenDefinitions ==
R.AlignConsecutiveTableGenDefinitions &&
AlignConsecutiveMacros == R.AlignConsecutiveMacros &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small copy paste error, I'm assuming, AlignConsecutiveMacros has already been checked.
Also, these options should be alphabetical in order

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your edits in Format.h won't show up in the documentation until after you've run the script in clang/docs/tools/dump_format_style.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the information. I added the document.

Comment on lines 1133 to 1146
IO.mapOptional("TableGenAllowBreakAfterInheritColon",
Style.TableGenAllowBreakAfterInheritColon);
IO.mapOptional("TableGenAllowBreakBeforeInheritColon",
Style.TableGenAllowBreakBeforeInheritColon);
IO.mapOptional("TableGenBreakInsideCondOperator",
Style.TableGenBreakInsideCondOperator);
IO.mapOptional("TableGenBreakInsideDAGArgList",
Style.TableGenBreakInsideDAGArgList);
IO.mapOptional("TableGenPreferBreakInsideSquareBracket",
Style.TableGenPreferBreakInsideSquareBracket);
IO.mapOptional("TableGenSpaceAroundDAGArgColon",
Style.TableGenSpaceAroundDAGArgColon);
IO.mapOptional("TableGenBreakingDAGArgOperators",
Style.TableGenBreakingDAGArgOperators);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused by these undocumented style options?

Comment on lines 285 to 289
if (Tokens.size() > 1 && Tokens.end()[-2]->is(tok::exclaim)) {
if (Tokens.back()->is(tok::identifier) ||
(Tokens.back()->is(tok::kw_if) && Tokens.back())) {
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if statement does nothing, or am I misunderstanding it?

@@ -111,6 +111,9 @@ const tooling::Replacements &WhitespaceManager::generateReplacements() {
alignConsecutiveDeclarations();
alignConsecutiveBitFields();
alignConsecutiveAssignments();
alignConsecutiveTableGenCondOperatorColons();
AlignConsecutiveTableGenBreakingDAGArgColons();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one has different capitalization from all the others

Comment on lines 114 to 116
alignConsecutiveTableGenCondOperatorColons();
AlignConsecutiveTableGenBreakingDAGArgColons();
alignConsecutiveTableGenDefinition();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't these replacements be guarded behind an isTableGen? At the very least, alignConsecutiveTableGenDefinition aligns on TT_InheritanceColon and that can appear in non-tablegen code too

@@ -40,6 +40,13 @@ class FormatTestTableGen : public ::testing::Test {
EXPECT_EQ(Code.str(), format(Code)) << "Expected code is not stable";
EXPECT_EQ(Code.str(), format(test::messUp(Code)));
}

static void verifyFormat(llvm::StringRef Code, const FormatStyle &Style) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this file was never brought up to date with all the other tests when f8d10d5 happened

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not noticed that. Thank you. I can also fix this. But now I am planing whether (and how) to split this growing pull request. A kind of refactoring may be better done after that.

Copy link
Contributor

@HazardyKnusperkeks HazardyKnusperkeks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we all would gain a lot, if you could split this into multiple pull requests. The discussion can be more focused and you have a greater chance to get something merged. Similar to the Verilog changes.

/// def DefDefDef : Parent {}
/// \endcode
/// \version 18
AlignConsecutiveStyle AlignConsecutiveTableGenDefinitions;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you reuse AlignConsecutiveDeclarations and handle table gen there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The option AlignConsecutiveDeclarations aligns the identifier that is declared.
This AlignConsecutiveTableGenDefinitions option aligns the inheritance colon of def, that is the style often seen in TableGen file formatted by hand.
The name seems confusing. I changed the name to clarify it aligns colon.

/// )
/// \endcode
/// \version 18
AlignConsecutiveStyle AlignConsecutiveTableGenBreakingDAGArgColons;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please sort alphabetically.

@@ -4656,6 +4687,15 @@ struct FormatStyle {
/// \version 8
std::vector<std::string> StatementMacros;

/// Tablegen
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a bit more documentation here.

Comment on lines 4691 to 4692
bool TableGenAllowBreakBeforeInheritColon;
bool TableGenAllowBreakAfterInheritColon;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need both options to be enabled at the same time? Otherwise BreakInheritanceList should already be the option to check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the information. I removed these options. Instead BreakInheritanceList works well. There happens some changes, but seems acceptable.

Comment on lines 703 to 708
if (is(TT_TableGenParamAngleCloser))
return true;
if (is(TT_TableGenListCloser))
return true;
return isOneOf(tok::r_paren, tok::r_brace, tok::r_square,
TT_TemplateCloser);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (is(TT_TableGenParamAngleCloser))
return true;
if (is(TT_TableGenListCloser))
return true;
return isOneOf(tok::r_paren, tok::r_brace, tok::r_square,
TT_TemplateCloser);
return isOneOf(tok::r_paren, tok::r_brace, tok::r_square,
TT_TemplateCloser, TT_TableGenParamAngleCloser, TT_TableGenListCloser);

auto Result = annotate(("def X { let V = " + Code + "; }").str(), Style);
return decltype(Result){Result.begin() + 6, Result.end() - 3};
};
// Both of bang/cond operators
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Both of bang/cond operators
// Both of bang/cond operators.

@llvmbot llvmbot added the clang Clang issues not falling into any other category label Dec 23, 2023
- NOT fixed: Make FormatTestTableGen.cpp up to date to llvm@f8d10d5. I am planning to fix it in other pull request.
@hnakamura5
Copy link
Contributor Author

@rymiel @HazardyKnusperkeks
Thank you for your review!
I have fixed the points.
But for refactoring of the test base class in f8d10d5 .
I'm not sure I should do here, and if I should, I should do it in splitted pull request.

Now I really understand I should split this pull request into some parts. At first it is large and continue growing by adding documents.
I'm wondering how and current plan is separating semantically,

  • Handling multi line string (~100 lines).
  • Handling numeric like identifier (~100 lines).
  • Handling TableGen specific keywords (~100 lines)
  • Unwrapped line parsing(~100 lines).
  • Parse TableGen values (about 500+ lines including unittest).
  • Basic options (but for aligning ones) (about 500+ lines including the document).
  • Aligning options (about 100 lines including document).
  • Refactor unittests.

I'm not sure this is good plan. They may be complicated.
Could you help me to plan if you have some idea?

In addition, I do not know the appropriate way to split pull request after I made one. Is it enough to refer each other, and abort this at last?

@HazardyKnusperkeks
Copy link
Contributor

@rymiel @HazardyKnusperkeks Thank you for your review! I have fixed the points. But for refactoring of the test base class in f8d10d5 . I'm not sure I should do here, and if I should, I should do it in splitted pull request.

Now I really understand I should split this pull request into some parts. At first it is large and continue growing by adding documents. I'm wondering how and current plan is separating semantically,

* Handling multi line string (~100 lines).

* Handling numeric like identifier (~100 lines).

* Handling TableGen specific keywords (~100 lines)

* Unwrapped line parsing(~100 lines).

* Parse TableGen values (about 500+ lines including unittest).

* Basic options (but for aligning ones) (about 500+ lines including the document).

* Aligning options (about 100 lines including document).

* Refactor unittests.

I'm not sure this is good plan. They may be complicated. Could you help me to plan if you have some idea?

In addition, I do not know the appropriate way to split pull request after I made one. Is it enough to refer each other, and abort this at last?

Just create a pull request per feature/bug. You can sill use this for one of them, or abandon this. If some of them build on each other there is, as far as I know, no good way.

The few times I came to this I created multiple PRs and the top most commit(s) is/are part of multiple PRs and I mentioned this. Then we have to make sure that they are merged correctly. With multiple local branches you can still work on all of them in parallel.

My experience (here and at work) is, that smaller changes are easier to check and you get a faster response. Because 50~150 lines changes I can just review and move on. A 500+ change review will more likely be postponed.

@hnakamura5
Copy link
Contributor Author

Thanks to the advises, I begin to split this into several parts.

Made the keywords part in #77477 .

HazardyKnusperkeks pushed a commit that referenced this pull request Jan 11, 2024
Add TableGen keywords to the additional keyword list of the formatter.

This pull request is the splited part from
#76059 .
@hnakamura5
Copy link
Contributor Author

Multi line string part #78032

@hnakamura5
Copy link
Contributor Author

Numeric like identifiers part #78571

@hnakamura5
Copy link
Contributor Author

unwrapped line parser for statements part #78846 .

@hnakamura5
Copy link
Contributor Author

Numeric like identifiers part #78571

@hnakamura5
Copy link
Contributor Author

bang operators and numeric literals part #78996 .

justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
Add TableGen keywords to the additional keyword list of the formatter.

This pull request is the splited part from
llvm#76059 .
@hnakamura5
Copy link
Contributor Author

Alignment option for cond operator: #82878.

@hnakamura5
Copy link
Contributor Author

Alignment option for definitions: #83008.

@hnakamura5
Copy link
Contributor Author

Break options for DAGArg: #83149.

@hnakamura5
Copy link
Contributor Author

Alignment option for DAGArg: #86150

@hnakamura5
Copy link
Contributor Author

All the split parts of this PR is merged. Thank you!

@hnakamura5 hnakamura5 closed this Mar 22, 2024
@owenca owenca removed the clang Clang issues not falling into any other category label May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants