Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Clang][Comments] Support for parsing headers in Doxygen \par commands #91100

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

hdoc
Copy link

@hdoc hdoc commented May 5, 2024

Background

Doxygen's \par command (link) has an optional argument, which denotes the header of the paragraph started by a given \par command.

In short, the paragraph command can be used with a heading, or without one. The code block below shows both forms and how the current version of LLVM/Clang parses this code:

$ cat test.cpp
/// \par User defined paragraph:
/// Contents of the paragraph.
///
/// \par
/// New paragraph under the same heading.
///
/// \par
/// A second paragraph.
class A {};

$ clang++ -cc1 -ast-dump -fcolor-diagnostics -std=c++20 test.cpp
`-CXXRecordDecl 0x1530f3a78 <test.cpp:11:1, col:10> col:7 class A definition
  |-FullComment 0x1530fea38 <line:2:4, line:9:23>
  | |-ParagraphComment 0x1530fe7e0 <line:2:4>
  | | `-TextComment 0x1530fe7b8 <col:4> Text=" "
  | |-BlockCommandComment 0x1530fe800 <col:5, line:3:30> Name="par"
  | | `-ParagraphComment 0x1530fe878 <line:2:9, line:3:30>
  | |   |-TextComment 0x1530fe828 <line:2:9, col:32> Text=" User defined paragraph:"
  | |   `-TextComment 0x1530fe848 <line:3:4, col:30> Text=" Contents of the paragraph."
  | |-ParagraphComment 0x1530fe8c0 <line:5:4>
  | | `-TextComment 0x1530fe898 <col:4> Text=" "
  | |-BlockCommandComment 0x1530fe8e0 <col:5, line:6:41> Name="par"
  | | `-ParagraphComment 0x1530fe930 <col:4, col:41>
  | |   `-TextComment 0x1530fe908 <col:4, col:41> Text=" New paragraph under the same heading."
  | |-ParagraphComment 0x1530fe978 <line:8:4>
  | | `-TextComment 0x1530fe950 <col:4> Text=" "
  | `-BlockCommandComment 0x1530fe998 <col:5, line:9:23> Name="par"
  |   `-ParagraphComment 0x1530fe9e8 <col:4, col:23>
  |     `-TextComment 0x1530fe9c0 <col:4, col:23> Text=" A second paragraph."
  `-CXXRecordDecl 0x1530f3bb0 <line:11:1, col:7> col:7 implicit class A

As we can see above, the optional paragraph heading ("User defined paragraph") is not an argument of the \par BlockCommandComment, but instead a child TextComment.

For documentation generators like hdoc, it would be ideal if we could parse Doxygen documentation comments with these semantics in mind. Currently that's not possible.

Change

This change parses \par command according to how Doxygen parses them, making an optional header available as a an argument if it is present. In addition:

  • AST unit tests are defined to test this functionality when an argument is present, isn't present, with additional spacing, etc.
  • TableGen is updated with an IsParCommand to support this functionality
  • lit tests are updated where needed

Copy link

github-actions bot commented May 5, 2024

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels May 5, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented May 5, 2024

@llvm/pr-subscribers-clang

Author: hdoc (hdoc)

Changes

Background

Doxygen's \par command (link) has an optional argument, which denotes the header of the paragraph started by a given \par command.

In short, the paragraph command can be used with a heading, or without one. The code block below shows both forms and how the current version of LLVM/Clang parses this code:

$ cat test.cpp
/// \par User defined paragraph:
/// Contents of the paragraph.
///
/// \par
/// New paragraph under the same heading.
///
/// \par
/// A second paragraph.
class A {};
$ clang++ -cc1 -ast-dump -fcolor-diagnostics -std=c++20 test.cpp `-CXXRecordDecl 0x1530f3a78 &lt;test.cpp:11:1, col:10&gt; col:7 class A definition
|-FullComment 0x1530fea38 &lt;line:2:4, line:9:23&gt; | |-ParagraphComment 0x1530fe7e0 &lt;line:2:4&gt;
| | `-TextComment 0x1530fe7b8 &lt;col:4&gt; Text=" "
| |-BlockCommandComment 0x1530fe800 &lt;col:5, line:3:30&gt; Name="par"
| | `-ParagraphComment 0x1530fe878 &lt;line:2:9, line:3:30&gt;
| | |-TextComment 0x1530fe828 &lt;line:2:9, col:32&gt; Text=" User defined paragraph:"
| | `-TextComment 0x1530fe848 &lt;line:3:4, col:30&gt; Text=" Contents of the paragraph." | |-ParagraphComment 0x1530fe8c0 &lt;line:5:4&gt;
| | `-TextComment 0x1530fe898 &lt;col:4&gt; Text=" "
| |-BlockCommandComment 0x1530fe8e0 &lt;col:5, line:6:41&gt; Name="par"
| | `-ParagraphComment 0x1530fe930 &lt;col:4, col:41&gt;
| | `-TextComment 0x1530fe908 &lt;col:4, col:41&gt; Text=" New paragraph under the same heading." | |-ParagraphComment 0x1530fe978 &lt;line:8:4&gt;
| | `-TextComment 0x1530fe950 &lt;col:4&gt; Text=" "
|
|
|
`-CXXRecordDecl 0x1530f3bb0 &lt;line:11:1, col:7&gt; col:7 implicit class A
`-BlockCommandComment 0x1530fe998 &lt;col:5, line:9:23&gt; Name="par" `-ParagraphComment 0x1530fe9e8 &lt;col:4, col:23&gt;
`-TextComment 0x1530fe9c0 &lt;col:4, col:23&gt; Text=" A second paragraph."

As we can see above, the optional paragraph heading ("User defined paragraph") is not an argument of the \par BlockCommandComment, but instead a child TextComment.

For documentation generators like hdoc, it would be ideal if we could parse Doxygen documentation comments with these semantics in mind. Currently that's not possible.

Change

This change parses \par command according to how Doxygen parses them, making an optional header available as a an argument if it is present. In addition:

  • AST unit tests are defined to test this functionality when an argument is present, isn't present, with additional spacing, etc.
  • TableGen is updated with an IsParCommand to support this functionality
  • lit tests are updated where needed

Full diff: https://github.com/llvm/llvm-project/pull/91100.diff

7 Files Affected:

  • (modified) clang/include/clang/AST/CommentCommandTraits.h (+4)
  • (modified) clang/include/clang/AST/CommentCommands.td (+2-1)
  • (modified) clang/include/clang/AST/CommentParser.h (+3-1)
  • (modified) clang/lib/AST/CommentParser.cpp (+77)
  • (modified) clang/test/Index/comment-misc-tags.m (+3-5)
  • (modified) clang/unittests/AST/CommentParser.cpp (+138-1)
  • (modified) clang/utils/TableGen/ClangCommentCommandInfoEmitter.cpp (+1)
diff --git a/clang/include/clang/AST/CommentCommandTraits.h b/clang/include/clang/AST/CommentCommandTraits.h
index 0c3254d84eb000..78c484fff3aede 100644
--- a/clang/include/clang/AST/CommentCommandTraits.h
+++ b/clang/include/clang/AST/CommentCommandTraits.h
@@ -88,6 +88,10 @@ struct CommandInfo {
   LLVM_PREFERRED_TYPE(bool)
   unsigned IsHeaderfileCommand : 1;
 
+  /// True if this is a \\par command.
+  LLVM_PREFERRED_TYPE(bool)
+  unsigned IsParCommand : 1;
+
   /// True if we don't want to warn about this command being passed an empty
   /// paragraph.  Meaningful only for block commands.
   LLVM_PREFERRED_TYPE(bool)
diff --git a/clang/include/clang/AST/CommentCommands.td b/clang/include/clang/AST/CommentCommands.td
index e839031752cdd8..5fd687b0d8991f 100644
--- a/clang/include/clang/AST/CommentCommands.td
+++ b/clang/include/clang/AST/CommentCommands.td
@@ -18,6 +18,7 @@ class Command<string name> {
   bit IsThrowsCommand = 0;
   bit IsDeprecatedCommand = 0;
   bit IsHeaderfileCommand = 0;
+  bit IsParCommand = 0;
 
   bit IsEmptyParagraphAllowed = 0;
 
@@ -156,7 +157,7 @@ def Date       : BlockCommand<"date">;
 def Invariant  : BlockCommand<"invariant">;
 def Li         : BlockCommand<"li">;
 def Note       : BlockCommand<"note">;
-def Par        : BlockCommand<"par">;
+def Par        : BlockCommand<"par"> { let IsParCommand = 1; let NumArgs = 1; }
 def Post       : BlockCommand<"post">;
 def Pre        : BlockCommand<"pre">;
 def Remark     : BlockCommand<"remark">;
diff --git a/clang/include/clang/AST/CommentParser.h b/clang/include/clang/AST/CommentParser.h
index e11e818b1af0a1..b5f1c6c19f0ce7 100644
--- a/clang/include/clang/AST/CommentParser.h
+++ b/clang/include/clang/AST/CommentParser.h
@@ -100,6 +100,9 @@ class Parser {
   ArrayRef<Comment::Argument>
   parseCommandArgs(TextTokenRetokenizer &Retokenizer, unsigned NumArgs);
 
+  ArrayRef<Comment::Argument>
+  parseParCommandArgs(TextTokenRetokenizer &Retokenizer, unsigned NumArgs);
+
   BlockCommandComment *parseBlockCommand();
   InlineCommandComment *parseInlineCommand();
 
@@ -118,4 +121,3 @@ class Parser {
 } // end namespace clang
 
 #endif
-
diff --git a/clang/lib/AST/CommentParser.cpp b/clang/lib/AST/CommentParser.cpp
index 8adfd85d0160c3..54760f5ba932eb 100644
--- a/clang/lib/AST/CommentParser.cpp
+++ b/clang/lib/AST/CommentParser.cpp
@@ -149,6 +149,63 @@ class TextTokenRetokenizer {
     addToken();
   }
 
+  /// Check if this line starts with @par or \par
+  bool startsWithParCommand() {
+    unsigned Offset = 1;
+
+    /// Skip all whitespace characters at the beginning.
+    /// This needs to backtrack because Pos has already advanced past the
+    /// actual \par or @par command by the time this function is called.
+    while (isWhitespace(*(Pos.BufferPtr - Offset)))
+      Offset++;
+
+    /// Check if next four characters are \par or @par
+    llvm::StringRef LineStart(Pos.BufferPtr - 5, 4);
+    return LineStart.starts_with("\\par") || LineStart.starts_with("@par");
+  }
+
+  /// Extract a par command argument-header.
+  bool lexParHeading(Token &Tok) {
+    if (isEnd())
+      return false;
+
+    Position SavedPos = Pos;
+
+    consumeWhitespace();
+    SmallString<32> WordText;
+    const char *WordBegin = Pos.BufferPtr;
+    SourceLocation Loc = getSourceLocation();
+
+    if (!startsWithParCommand())
+      return false;
+
+    // Read until the end of this token, which is effectively the end of the
+    // line This gets us the content of the par header, if there is one.
+    while (!isEnd()) {
+      WordText.push_back(peek());
+      if (Pos.BufferPtr + 1 == Pos.BufferEnd) {
+        consumeChar();
+        break;
+      } else {
+        consumeChar();
+      }
+    }
+
+    const unsigned Length = WordText.size();
+    if (Length == 0) {
+      Pos = SavedPos;
+      return false;
+    }
+
+    char *TextPtr = Allocator.Allocate<char>(Length + 1);
+
+    memcpy(TextPtr, WordText.c_str(), Length + 1);
+    StringRef Text = StringRef(TextPtr, Length);
+
+    formTokenWithChars(Tok, Loc, WordBegin, Length, Text);
+    return true;
+  }
+
   /// Extract a word -- sequence of non-whitespace characters.
   bool lexWord(Token &Tok) {
     if (isEnd())
@@ -304,6 +361,23 @@ Parser::parseCommandArgs(TextTokenRetokenizer &Retokenizer, unsigned NumArgs) {
   return llvm::ArrayRef(Args, ParsedArgs);
 }
 
+ArrayRef<Comment::Argument>
+Parser::parseParCommandArgs(TextTokenRetokenizer &Retokenizer,
+                            unsigned NumArgs) {
+  auto *Args = new (Allocator.Allocate<Comment::Argument>(NumArgs))
+      Comment::Argument[NumArgs];
+  unsigned ParsedArgs = 0;
+  Token Arg;
+
+  while (ParsedArgs < NumArgs && Retokenizer.lexParHeading(Arg)) {
+    Args[ParsedArgs] = Comment::Argument{
+        SourceRange(Arg.getLocation(), Arg.getEndLocation()), Arg.getText()};
+    ParsedArgs++;
+  }
+
+  return llvm::ArrayRef(Args, ParsedArgs);
+}
+
 BlockCommandComment *Parser::parseBlockCommand() {
   assert(Tok.is(tok::backslash_command) || Tok.is(tok::at_command));
 
@@ -356,6 +430,9 @@ BlockCommandComment *Parser::parseBlockCommand() {
       parseParamCommandArgs(PC, Retokenizer);
     else if (TPC)
       parseTParamCommandArgs(TPC, Retokenizer);
+    else if (Info->IsParCommand)
+      S.actOnBlockCommandArgs(BC,
+                              parseParCommandArgs(Retokenizer, Info->NumArgs));
     else
       S.actOnBlockCommandArgs(BC, parseCommandArgs(Retokenizer, Info->NumArgs));
 
diff --git a/clang/test/Index/comment-misc-tags.m b/clang/test/Index/comment-misc-tags.m
index 47ee9d9aa392ab..6d018dbfcf193d 100644
--- a/clang/test/Index/comment-misc-tags.m
+++ b/clang/test/Index/comment-misc-tags.m
@@ -91,18 +91,16 @@ @interface IOCommandGate
   
 struct Test {int filler;};
 
-// CHECK:       (CXComment_BlockCommand CommandName=[par]
+// CHECK:       (CXComment_BlockCommand CommandName=[par] Arg[0]=User defined paragraph:
 // CHECK-NEXT:     (CXComment_Paragraph
-// CHECK-NEXT:        (CXComment_Text Text=[ User defined paragraph:] HasTrailingNewline)
 // CHECK-NEXT:        (CXComment_Text Text=[ Contents of the paragraph.])))
 // CHECK:       (CXComment_BlockCommand CommandName=[par]
 // CHECK-NEXT:     (CXComment_Paragraph
-// CHECK-NEXT:        (CXComment_Text Text=[ New paragraph under the same heading.])))
+// CHECK-NEXT:        (CXComment_Text Text=[New paragraph under the same heading.])))
 // CHECK:       (CXComment_BlockCommand CommandName=[note]
 // CHECK-NEXT:     (CXComment_Paragraph
 // CHECK-NEXT:        (CXComment_Text Text=[ This note consists of two paragraphs.] HasTrailingNewline)
 // CHECK-NEXT:        (CXComment_Text Text=[ This is the first paragraph.])))
 // CHECK:       (CXComment_BlockCommand CommandName=[par]
 // CHECK-NEXT:     (CXComment_Paragraph
-// CHECK-NEXT:     (CXComment_Text Text=[ And this is the second paragraph.])))
-
+// CHECK-NEXT:     (CXComment_Text Text=[And this is the second paragraph.])))
diff --git a/clang/unittests/AST/CommentParser.cpp b/clang/unittests/AST/CommentParser.cpp
index c3479672ae2a3c..d9150a848fcaf3 100644
--- a/clang/unittests/AST/CommentParser.cpp
+++ b/clang/unittests/AST/CommentParser.cpp
@@ -1427,8 +1427,145 @@ TEST_F(CommentParserTest, Deprecated) {
   }
 }
 
+TEST_F(CommentParserTest, ParCommandHasArg1) {
+  const char *Sources[] = {
+      "/// @par Paragraph header:",     "/// @par Paragraph header:\n",
+      "/// @par Paragraph header:\r\n", "/// @par Paragraph header:\n\r",
+      "/** @par Paragraph header:*/",
+  };
+
+  for (size_t i = 0, e = std::size(Sources); i != e; i++) {
+    FullComment *FC = parseString(Sources[i]);
+    ASSERT_TRUE(HasChildCount(FC, 2));
+
+    ASSERT_TRUE(HasParagraphCommentAt(FC, 0, " "));
+    {
+      BlockCommandComment *BCC;
+      ParagraphComment *PC;
+      ASSERT_TRUE(HasBlockCommandAt(FC, Traits, 1, BCC, "par", PC));
+      ASSERT_TRUE(HasChildCount(PC, 0));
+      ASSERT_TRUE(BCC->getNumArgs() == 1);
+      ASSERT_TRUE(BCC->getArgText(0) == "Paragraph header:");
+    }
+  }
+}
+
+TEST_F(CommentParserTest, ParCommandHasArg2) {
+  const char *Sources[] = {
+      "/// @par Paragraph header: ",     "/// @par Paragraph header: \n",
+      "/// @par Paragraph header: \r\n", "/// @par Paragraph header: \n\r",
+      "/** @par Paragraph header: */",
+  };
+
+  for (size_t i = 0, e = std::size(Sources); i != e; i++) {
+    FullComment *FC = parseString(Sources[i]);
+    ASSERT_TRUE(HasChildCount(FC, 2));
+
+    ASSERT_TRUE(HasParagraphCommentAt(FC, 0, " "));
+    {
+      BlockCommandComment *BCC;
+      ParagraphComment *PC;
+      ASSERT_TRUE(HasBlockCommandAt(FC, Traits, 1, BCC, "par", PC));
+      ASSERT_TRUE(HasChildCount(PC, 0));
+      ASSERT_TRUE(BCC->getNumArgs() == 1);
+      ASSERT_TRUE(BCC->getArgText(0) == "Paragraph header: ");
+    }
+  }
+}
+
+TEST_F(CommentParserTest, ParCommandHasArg3) {
+  const char *Sources[] = {
+      ("/// @par Paragraph header:\n"
+       "/// Paragraph body"),
+      ("/// @par Paragraph header:\r\n"
+       "/// Paragraph body"),
+      ("/// @par Paragraph header:\n\r"
+       "/// Paragraph body"),
+  };
+
+  for (size_t i = 0, e = std::size(Sources); i != e; i++) {
+    FullComment *FC = parseString(Sources[i]);
+    ASSERT_TRUE(HasChildCount(FC, 2));
+
+    ASSERT_TRUE(HasParagraphCommentAt(FC, 0, " "));
+    {
+      BlockCommandComment *BCC;
+      ParagraphComment *PC;
+      TextComment *TC;
+      ASSERT_TRUE(HasBlockCommandAt(FC, Traits, 1, BCC, "par", PC));
+      ASSERT_TRUE(HasChildCount(PC, 1));
+      ASSERT_TRUE(BCC->getNumArgs() == 1);
+      ASSERT_TRUE(BCC->getArgText(0) == "Paragraph header:");
+      ASSERT_TRUE(GetChildAt(PC, 0, TC));
+      ASSERT_TRUE(TC->getText() == " Paragraph body");
+    }
+  }
+}
+
+TEST_F(CommentParserTest, ParCommandHasArg4) {
+  const char *Sources[] = {
+      ("/// @par Paragraph header:\n"
+       "/// Paragraph body1\n"
+       "/// Paragraph body2"),
+      ("/// @par Paragraph header:\r\n"
+       "/// Paragraph body1\n"
+       "/// Paragraph body2"),
+      ("/// @par Paragraph header:\n\r"
+       "/// Paragraph body1\n"
+       "/// Paragraph body2"),
+  };
+
+  for (size_t i = 0, e = std::size(Sources); i != e; i++) {
+    FullComment *FC = parseString(Sources[i]);
+    ASSERT_TRUE(HasChildCount(FC, 2));
+
+    ASSERT_TRUE(HasParagraphCommentAt(FC, 0, " "));
+    {
+      BlockCommandComment *BCC;
+      ParagraphComment *PC;
+      TextComment *TC;
+      ASSERT_TRUE(HasBlockCommandAt(FC, Traits, 1, BCC, "par", PC));
+      ASSERT_TRUE(HasChildCount(PC, 2));
+      ASSERT_TRUE(BCC->getNumArgs() == 1);
+      ASSERT_TRUE(BCC->getArgText(0) == "Paragraph header:");
+      ASSERT_TRUE(GetChildAt(PC, 0, TC));
+      ASSERT_TRUE(TC->getText() == " Paragraph body1");
+      ASSERT_TRUE(GetChildAt(PC, 1, TC));
+      ASSERT_TRUE(TC->getText() == " Paragraph body2");
+    }
+  }
+}
+
+TEST_F(CommentParserTest, ParCommandHasArg5) {
+  const char *Sources[] = {
+      ("/// @par \n"
+       "/// Paragraphs with no text before newline have no heading"),
+      ("/// @par \r\n"
+       "/// Paragraphs with no text before newline have no heading"),
+      ("/// @par \n\r"
+       "/// Paragraphs with no text before newline have no heading"),
+  };
+
+  for (size_t i = 0, e = std::size(Sources); i != e; i++) {
+    FullComment *FC = parseString(Sources[i]);
+    ASSERT_TRUE(HasChildCount(FC, 2));
+
+    ASSERT_TRUE(HasParagraphCommentAt(FC, 0, " "));
+    {
+      BlockCommandComment *BCC;
+      ParagraphComment *PC;
+      TextComment *TC;
+      ASSERT_TRUE(HasBlockCommandAt(FC, Traits, 1, BCC, "par", PC));
+      ASSERT_TRUE(HasChildCount(PC, 1));
+      ASSERT_TRUE(BCC->getNumArgs() == 0);
+      ASSERT_TRUE(GetChildAt(PC, 0, TC));
+      ASSERT_TRUE(TC->getText() ==
+                  "Paragraphs with no text before newline have no heading");
+    }
+  }
+}
+
 } // unnamed namespace
 
 } // end namespace comments
 } // end namespace clang
-
diff --git a/clang/utils/TableGen/ClangCommentCommandInfoEmitter.cpp b/clang/utils/TableGen/ClangCommentCommandInfoEmitter.cpp
index a113b02e19995d..07b26dc2f6b8be 100644
--- a/clang/utils/TableGen/ClangCommentCommandInfoEmitter.cpp
+++ b/clang/utils/TableGen/ClangCommentCommandInfoEmitter.cpp
@@ -44,6 +44,7 @@ void clang::EmitClangCommentCommandInfo(RecordKeeper &Records,
        << Tag.getValueAsBit("IsThrowsCommand") << ", "
        << Tag.getValueAsBit("IsDeprecatedCommand") << ", "
        << Tag.getValueAsBit("IsHeaderfileCommand") << ", "
+       << Tag.getValueAsBit("IsParCommand") << ", "
        << Tag.getValueAsBit("IsEmptyParagraphAllowed") << ", "
        << Tag.getValueAsBit("IsVerbatimBlockCommand") << ", "
        << Tag.getValueAsBit("IsVerbatimBlockEndCommand") << ", "

@hdoc hdoc force-pushed the dev/par-command-argument-support branch from ff4e6f3 to 2b18453 Compare May 5, 2024 02:51
Comment on lines 154 to 160
unsigned Offset = 1;

/// Skip all whitespace characters at the beginning.
/// This needs to backtrack because Pos has already advanced past the
/// actual \par or @par command by the time this function is called.
while (isWhitespace(*(Pos.BufferPtr - Offset)))
Offset++;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use StringRef::ltrim() to handle this instead of doing it manually.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case we actually have to read the buffer backwards to find the @par, past Pos.BufferStart so it's a bit inconvenient to construct the StringRef which starts at some unknown position before Pos.BufferStart. In theory someone could have put 400 spaces between @par and the comment text, and we'd have to accommodate that when constructing the StringRef.

I think the solution as it's written right now handles that well enough. I could make it count spaces or look up the previous token containing the @par and compute the correct StringRef for it, but I think this is good enough.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think what's got me confused is that you calculate Offset but never actually use it (or modify Pos.BufferPtr). I was thinking you would have been able to use ltrim().starts_with(...) but I see now that the issue is we're at the start of the comment, beyond seeing par, and we want to backtrack.

But the logic below seems to be that we already know we're parsing a par command, so why do we need to do this check in the first place? (e.g., we get into this code path via:

 else if (Info->IsParCommand)
      S.actOnBlockCommandArgs(BC,
                              parseParCommandArgs(Retokenizer, Info->NumArgs));

and eventually land here, so don't we already know this is a par command?)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check is needed because Doxygen interprets the @par command differently if there is text on the same line or not.

If there is text on the same line, such as in Example 1 below, then the text is interpreted to be the paragraph heading, and that heading is stored as an argument for the comment. If there is no text on the same line, i.e. @par is alone on the line as in Example 2, then the paragraph is interpreted to have no heading.

/// @par I am a paragraph heading
void example1() {}

/// @par 
/// I am a paragraph with no heading
void example2() {}

To be able to correctly parse the paragraph heading, the lexer needs to go and check where the @par actually is, because if it's on the same line as some text we need to slurp that text and put it into the Arg for the comment. This code implements the functionality for this check by backtracking until it sees whitespace, and then checking whether the text immediately preceding that whitespace is a @par command or not.

All that being said, your feedback is helpful and I did find a small bug while re-checking this code. Offset should be used, otherwise the code has an implicit assumption that there is only one char of whitespace between the text and the @par. I've fixed that in the latest commit. Thank you for pointing that out :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than backtracking, can't we still do a peek ahead? e.g., once we've seen @par, we can do a peek to see if the next token is a newline or not (or do a scan through the buffer to see if there is a non-whitespace character or only whitespace ending with a newline/eof)?

clang/lib/AST/CommentParser.cpp Outdated Show resolved Hide resolved
clang/lib/AST/CommentParser.cpp Outdated Show resolved Hide resolved
clang/lib/AST/CommentParser.cpp Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants