[clang][test] add testing for the AST matcher reference #94248

Open
wants to merge 10 commits into
base: users/5chmidti/rm_not_needed_run_overload_in_BoundNodesCallback
@5chmidti 5chmidti commented Jun 3, 2024

Problem Statement

Previously, the examples in the AST matcher reference, which is generated from the doxygen comments in ASTMatchers.h, were untested and written on a best-effort basis.
Some matchers had no usage example, and others had an incorrect one.

Solution

This patch introduces a simple DSL around doxygen commands that makes the AST matcher documentation testable while remaining relatively easy to write.
In ASTMatchers.h, most matchers are documented with a doxygen comment, and most of these comments include a code example that shows what the matcher matches, with the matcher itself given somewhere in the documentation text. The documentation is tested by using doxygen's alias feature to declare custom aliases. These aliases forward to <tt>text</tt> (which is what doxygen's \c does, but for multiple words). Doxygen aliases are the obvious choice because there are (now) four consumers:

  • people reading the header/using signature help
  • the doxygen generated documentation
  • the generated html AST matcher reference
  • (new) the generated matcher tests

This patch rewrites/extends the documentation such that all matchers have a documented example.
The new generate_ast_matcher_doc_tests.py script warns about any matcher whose doxygen comment lacks a tested example (matchers without a doxygen comment are not diagnosed) and provides diagnostics and statistics about the matchers.

The current statistics emitted by the parser are:

Statistics:
        doxygen_blocks                :   519
        missing_tests                 :    10
        skipped_objc                  :    42
        code_snippets                 :   503
        matches                       :   820
        matchers                      :   580
        tested_matchers               :   574
        none_type_matchers            :     6

The tests are generated during the build, and the script only prints something if it finds an issue (a compile failure, a parsing issue, or a difference between the expected and actual number of failures).

Description

DSL for generating the tests from documentation.

TLDR:

  \header{a.h}
  \endheader     <- zero or more headers

  \code
    int a = 42;
  \endcode
  \compile_args{-std=c++,c23-or-later} <- optional, the std flag supports std ranges and
                                          whole languages

  \matcher{expr()} <- one or more matchers in succession
  \match{42}   <- one or more matches in succession

  \matcher{varDecl()} <- new matcher resets the context, the above
                         \match will not count for this new
                         matcher(-group)
  \match{int a = 42} <- only applies to the previous matcher (not to the
                        previous case)

The above block can be repeated inside a doxygen command for multiple code examples for a single matcher.
The test generation script looks only for these annotations and ignores everything else, such as \c or the sentences in which the annotations are embedded, e.g.: The matcher \matcher{expr()} matches the number \match{42}.
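
To illustrate how the annotations group into cases, here is a minimal Python sketch. It is not taken from generate_ast_matcher_doc_tests.py: the regular expression and the function name group_cases are hypothetical, and the pattern assumes that annotation bodies contain no braces.

```python
import re

# Hypothetical pattern: finds \matcher{...} and \match{...} whose bodies
# contain no nested braces. The real script may parse this differently.
ANNOTATION = re.compile(r"\\(matcher|match)\{([^{}]*)\}")


def group_cases(comment):
    """Group matchers with the matches that follow them, one tuple per case."""
    cases = []
    matchers, matches = [], []
    for kind, body in ANNOTATION.findall(comment):
        if kind == "matcher":
            if matches:
                # A matcher that follows at least one match starts a new case.
                cases.append((matchers, matches))
                matchers, matches = [], []
            matchers.append(body)
        else:
            matches.append(body)
    if matchers or matches:
        cases.append((matchers, matches))
    return cases


doc = r"The matcher \matcher{expr()} matches \match{42}. \matcher{varDecl()} matches \match{int a = 42}."
print(group_cases(doc))
```

The surrounding sentence text is skipped because only the two annotation kinds are searched for, which mirrors the behaviour described above.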

Language Grammar

[] denotes an optional element, and <> denotes user input

  compile_args ::= \compile_args{[<compile_arg>;]<compile_arg>}
  matcher_tag_key ::= type
  match_tag_key ::= type || std || count || sub
  matcher_tags ::= [matcher_tag_key=<value>;]matcher_tag_key=<value>
  match_tags ::= [match_tag_key=<value>;]match_tag_key=<value>
  matcher ::= \matcher{[matcher_tags$]<matcher>}
  matchers ::= [matcher] matcher
  match ::= \match{[match_tags$]<match>}
  matches ::= [match] match
  case ::= matchers matches
  cases ::= [case] case
  header-block ::= \header{<name>} <code> \endheader
  code-block ::= \code <code> \endcode
  testcase ::= code-block [compile_args] cases
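
As a rough illustration of the tag syntax in this grammar, the following Python sketch splits an annotation body of the form key=value;key=value$content into its tags and its content. The function name split_tags and the fallback for bodies that contain a '$' but no valid tag list are assumptions for this example, not the script's actual behaviour.

```python
MATCH_TAG_KEYS = {"type", "std", "count", "sub"}
MATCHER_TAG_KEYS = {"type"}


def split_tags(body, allowed_keys):
    """Split 'key=value;key=value$content' into (tags, content)."""
    head, sep, rest = body.partition("$")
    if not sep:
        # No '$' at all: the whole body is content.
        return {}, body
    tags = {}
    for item in head.split(";"):
        key, eq, value = item.partition("=")
        if not eq or key not in allowed_keys:
            # Assumption: if the prefix is not a valid tag list,
            # treat the entire body as content.
            return {}, body
        tags[key] = value
    return tags, rest


print(split_tags("type=name;count=2$foo", MATCH_TAG_KEYS))
print(split_tags("42", MATCH_TAG_KEYS))
```

Passing the allowed key set separately keeps the difference between matcher tags (only type) and match tags (type, std, count, sub) explicit.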

Language Standard Versions

The 'std' tag and '\compile_args' accept a specific language version, a whole language with all of its versions, and thresholds (which imply ranges). Multiple arguments are separated by ','. For a tested matcher to run with a given language and version, that version has to satisfy both the '\compile_args' specified for the code and the 'std' tag for the matcher. Predicates for the 'std' compiler flag are combined by disjunction across languages (e.g. 'c || c++') and by conjunction within each language (e.g. 'c++11-or-later && c++23-or-earlier').

Examples:

  • c: all available versions of C
  • c++11: only C++11
  • c++11-or-later: C++11 or later
  • c++11-or-earlier: C++11 or earlier
  • c++11-or-later,c++23-or-earlier,c: all of C, and C++ between 11 and 23 (inclusive)
  • c++11-23,c: same as above
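
The combination rule (conjunction within a language, disjunction across languages) can be sketched in Python as follows. This is an illustration only: the VERSIONS table, its chronological ordering, and the function names are assumptions, and the actual script may support different versions and parse the specification differently. The spec string is given without the -std= prefix.

```python
# Assumed chronological version lists; not taken from the actual script.
VERSIONS = {
    "c": ["99", "11", "17", "23"],
    "c++": ["98", "11", "14", "17", "20", "23"],
}


def parse_term(term):
    """Return (language, set of allowed versions) for one comma-separated term."""
    lang = "c++" if term.startswith("c++") else "c"
    rest = term[len(lang):]
    known = VERSIONS[lang]
    if not rest:
        return lang, set(known)  # whole language
    if rest.endswith("-or-later"):
        start = known.index(rest[: -len("-or-later")])
        return lang, set(known[start:])
    if rest.endswith("-or-earlier"):
        end = known.index(rest[: -len("-or-earlier")])
        return lang, set(known[: end + 1])
    if "-" in rest:
        low, high = rest.split("-")
        return lang, set(known[known.index(low): known.index(high) + 1])
    return lang, {rest}  # single version


def allowed_standards(spec):
    """Intersect predicates per language; keep languages as separate alternatives."""
    result = {}
    for term in spec.split(","):
        lang, versions = parse_term(term.strip())
        result[lang] = result.get(lang, set(VERSIONS[lang])) & versions
    return result


print(allowed_standards("c++11-or-later,c++23-or-earlier,c"))
```

Versions are looked up by position in an ordered list rather than compared numerically, because a numeric comparison would place C++98 after C++23.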

Tags

type:

Match types are used to select where the string that is used to check if a node matches comes from.
Available: code, name, typestr, typeofstr. The default is code.

  • code: Forwards to tooling::fixit::getText(...) and should be the preferred way to show what matches.
  • name: Casts the match to a NamedDecl and returns the result of getNameAsString. Useful when the matched AST node is not easy to spell out (code type), e.g., namespaces or classes with many members.
  • typestr: Returns the result of QualType::getAsString for the type derived from Type (otherwise, if it is derived from Decl, recurses with Node->getTypeForDecl())

Matcher types are used to mark matchers as sub-matchers with 'sub' or as deactivated with 'none'. Testing sub-matchers is not implemented.

count:

Specifying 'count=n' on a match produces a test that requires the specified match to be found n times. The default is 1.
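
A count expectation of this kind could be checked as in the Python sketch below. It assumes that the count must be met exactly and that the matched nodes are available as a list of strings. The function name unmet_expectations is hypothetical, and the generated C++ tests may work differently.

```python
from collections import Counter


def unmet_expectations(expected, found):
    """Return {match text: (expected count, actual count)} for every mismatch."""
    actual = Counter(found)  # missing keys count as 0
    return {
        text: (count, actual[text])
        for text, count in expected.items()
        if actual[text] != count
    }


# 'x' is expected twice (count=2) but was only found once.
print(unmet_expectations({"42": 1, "x": 2}, ["42", "x"]))
```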

std:

A match can specify that it applies only in specific language versions. This may be needed when the AST differs between language versions.

sub:

The sub tag on a \match will indicate that the match is for a node of a bound sub-matcher.
E.g., \matcher{expr(expr().bind("inner"))} has a sub-matcher that binds to inner, which is the value for the sub tag of the expected match for the sub-matcher \match{sub=inner$...}. Currently, sub-matchers are not tested in any way.

What if ...?

... I want to add a matcher?

Add a doxygen comment to the matcher with a code example, corresponding matchers and matches, that shows what the matcher is supposed to do. Specify the compile arguments/supported languages if required, and run ninja check-clang-unit to test the documentation.

... the example I wrote is wrong?

The test-generation script will try to compile your example code before it continues. This makes finding issues with your example code easier because the test-failures are much more verbose.

The test-failure output of the generated test file will provide information about

  • where the generated test file is located
  • which line in ASTMatchers.h the example is from
  • which matches were: found, not-(yet)-found, expected
  • in case of an unexpected match: what the node looks like using the different types
  • the language version and if the test ran with a windows -target flag (also in failure summary)

... I don't adhere to the required order of the syntax?

The script will diagnose any issues it finds, such as "matcher is missing an example", with a file:line: prefix,
which should provide enough information to locate the issue.

... the script diagnoses a false-positive issue with a doxygen comment?

It hopefully shouldn't, but if you, e.g., added some non-matcher code and documented it with doxygen, then the script will consider that as a matcher documentation. As a result, the script will print that it detected a mismatch between the actual and the expected number of failures. If the diagnostic truly is a false-positive, change the expected_failure_statistics at the top of the generate_ast_matcher_doc_tests.py file.
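
The comparison between expected and actual failure counts can be pictured with the Python sketch below. The name expected_failure_statistics comes from the description above, but its structure is not shown in this PR text. The sketch assumes a plain dictionary from statistic name to count, and the function name statistic_mismatches is hypothetical.

```python
def statistic_mismatches(expected, actual):
    """List a message for every statistic whose expected and actual counts differ."""
    messages = []
    for key in sorted(set(expected) | set(actual)):
        want = expected.get(key, 0)
        got = actual.get(key, 0)
        if want != got:
            messages.append(f"{key}: expected {want}, got {got}")
    return messages


# Assumed shape: one extra doxygen comment without tests raises missing_tests.
print(statistic_mismatches({"missing_tests": 10}, {"missing_tests": 11}))
```

Under this assumption, accepting a true false positive amounts to raising the corresponding expected count so that the two dictionaries agree again.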

Fixes #57607
Fixes #63748

@llvmbot llvmbot added the clang Clang issues not falling into any other category label Jun 3, 2024

llvmbot commented Jun 3, 2024

@llvm/pr-subscribers-clang

Author: Julian Schmidt (5chmidti)

Changes

Previously, the examples in the AST matcher reference, which is generated from the doxygen comments in ASTMatchers.h, were untested and written on a best-effort basis.
Some matchers had no usage example, and others had an incorrect one.

This patch introduces a simple DSL around doxygen commands to enable testing the AST matcher documentation in a way that should be relatively easy.
In ASTMatchers.h, most matchers are documented with a doxygen comment, and most of these comments include a code example that shows what the matcher matches, with the matcher itself given somewhere in the documentation text. The documentation is tested by using doxygen's alias feature to declare custom aliases. These aliases forward to <tt>text</tt> (which is what doxygen's \c does, but for multiple words). Doxygen aliases were the obvious choice because there are (now) four consumers:

  • people reading the header/using signature help
  • the doxygen generated documentation
  • the generated html AST matcher reference
  • (new) the generated matcher tests

This patch rewrites/extends the documentation such that all matchers have a documented example.
The new generate_ast_matcher_doc_tests.py script warns about any matcher whose doxygen comment lacks a tested example (matchers without a doxygen comment are not diagnosed) and provides diagnostics and statistics about the matchers. Below is a file-level comment from the test generation script that describes, on a slightly more technical level, how to document matchers so that they are tested. In general, the new comments can be used as a reference for how to write tested documentation.

The current statistics emitted by the parser are:

Statistics:
        doxygen_blocks                :   519
        missing_tests                 :    10
        skipped_objc                  :    42
        code_snippets                 :   503
        matches                       :   820
        matchers                      :   580
        tested_matchers               :   574
        none_type_matchers            :     6

The tests are generated during the build, and the script only prints something if it finds an issue (a compile failure, a parsing issue, or a difference between the expected and actual number of failures).

DSL for generating the tests from documentation.

TLDR:
The order for a single code snippet example is:

  \header{a.h}
  \endheader     <- zero or more headers

  \code
    int a = 42;
  \endcode
  \compile_args{-std=c++,c23-or-later} <- optional, supports std ranges and
                                          whole languages

  \matcher{expr()} <- one or more matchers in succession
  \match{42}   <- one or more matches in succession

  \matcher{varDecl()} <- new matcher resets the context, the above
                         \match will not count for this new
                         matcher(-group)
  \match{int a = 42} <- only applies to the previous matcher (not to the
                        previous case)

The above block can be repeated inside a doxygen command for multiple code examples.

Language Grammar:
[] denotes an optional element, and <> denotes user input

compile_args ::= \compile_args{[<compile_arg>;]<compile_arg>}
matcher_tag_key ::= type
match_tag_key ::= type || std || count
matcher_tags ::= [matcher_tag_key=<value>;]matcher_tag_key=<value>
match_tags ::= [match_tag_key=<value>;]match_tag_key=<value>
matcher ::= \matcher{[matcher_tags$]<matcher>}
matchers ::= [matcher] matcher
match ::= \match{[match_tags$]<match>}
matches ::= [match] match
case ::= matchers matches
cases ::= [case] case
header-block ::= \header{<name>} <code> \endheader
code-block ::= \code <code> \endcode
testcase ::= code-block [compile_args] cases

The 'std' tag and '\compile_args' support specifying a specific language version, a whole language and all of its versions, and thresholds (implies ranges). Multiple arguments are passed with a ',' separator. For a language and version to execute a tested matcher, it has to match the specified '\compile_args' for the code, and the 'std' tag for the matcher. Predicates for the 'std' compiler flag are used with disjunction between languages (e.g. 'c || c++') and conjunction for all predicates specific to each language (e.g. 'c++11-or-later && c++23-or-earlier').

Examples:

  • c: all available versions of C
  • c++11: only C++11
  • c++11-or-later: C++11 or later
  • c++11-or-earlier: C++11 or earlier
  • c++11-or-later,c++23-or-earlier,c: all of C, and C++ between 11 and 23 (inclusive)
  • c++11-23,c: same as above

Tags:

Type:
Match types are used to select where the string that is used to check if
a node matches comes from.
Available: code, name, typestr, typeofstr.
The default is 'code'.

Matcher types are used to mark matchers as submatchers with 'sub' or as
deactivated using 'none'. Testing submatchers is not implemented.

Count:
Specifying a 'count=n' on a match will result in a test that requires that
the specified match will be matched n times. Default is 1.

Std:
A match allows specifying if it matches only in specific language versions.
This may be needed when the AST differs between language versions.

Fixes #57607
Fixes #63748


Patch is 899.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/94248.diff

8 Files Affected:

  • (modified) clang/docs/LibASTMatchersReference.html (+5679-2272)
  • (modified) clang/docs/ReleaseNotes.rst (+2)
  • (modified) clang/docs/doxygen.cfg.in (+8-1)
  • (modified) clang/docs/tools/dump_ast_matchers.py (+63-5)
  • (modified) clang/include/clang/ASTMatchers/ASTMatchers.h (+3893-1632)
  • (modified) clang/unittests/ASTMatchers/ASTMatchersTest.h (+443-3)
  • (modified) clang/unittests/ASTMatchers/CMakeLists.txt (+15)
  • (added) clang/utils/generate_ast_matcher_doc_tests.py (+1160)
diff --git a/clang/docs/LibASTMatchersReference.html b/clang/docs/LibASTMatchersReference.html
index a16b9c44ef0ea..baf39befd796a 100644
--- a/clang/docs/LibASTMatchersReference.html
+++ b/clang/docs/LibASTMatchersReference.html
@@ -586,28 +586,36 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
   #pragma omp declare simd
   int min();
-attr()
-  matches "nodiscard", "nonnull", "noinline", and the whole "#pragma" line.
+
+The matcher attr()
+matches nodiscard, nonnull, noinline, and
+declare simd.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXBaseSpecifier.html">CXXBaseSpecifier</a>&gt;</td><td class="name" onclick="toggle('cxxBaseSpecifier0')"><a name="cxxBaseSpecifier0Anchor">cxxBaseSpecifier</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXBaseSpecifier.html">CXXBaseSpecifier</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxBaseSpecifier0"><pre>Matches class bases.
 
-Examples matches public virtual B.
+Given
   class B {};
   class C : public virtual B {};
+
+The matcher cxxRecordDecl(hasDirectBase(cxxBaseSpecifier()))
+matches C.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXCtorInitializer.html">CXXCtorInitializer</a>&gt;</td><td class="name" onclick="toggle('cxxCtorInitializer0')"><a name="cxxCtorInitializer0Anchor">cxxCtorInitializer</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXCtorInitializer.html">CXXCtorInitializer</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxCtorInitializer0"><pre>Matches constructor initializers.
 
-Examples matches i(42).
+Given
   class C {
     C() : i(42) {}
     int i;
   };
+
+The matcher cxxCtorInitializer()
+matches i(42).
 </pre></td></tr>
 
 
@@ -619,17 +627,22 @@ <h2 id="decl-matchers">Node Matchers</h2>
   public:
     int a;
   };
-accessSpecDecl()
-  matches 'public:'
+
+The matcher accessSpecDecl()
+matches public:.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('bindingDecl0')"><a name="bindingDecl0Anchor">bindingDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1BindingDecl.html">BindingDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="bindingDecl0"><pre>Matches binding declarations
-Example matches foo and bar
-(matcher = bindingDecl()
 
-  auto [foo, bar] = std::make_pair{42, 42};
+Given
+  struct pair { int x; int y; };
+  pair make(int, int);
+  auto [foo, bar] = make(42, 42);
+
+The matcher bindingDecl()
+matches foo and bar.
 </pre></td></tr>
 
 
@@ -642,14 +655,18 @@ <h2 id="decl-matchers">Node Matchers</h2>
   myFunc(^(int p) {
     printf("%d", p);
   })
+
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('classTemplateDecl0')"><a name="classTemplateDecl0Anchor">classTemplateDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1ClassTemplateDecl.html">ClassTemplateDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="classTemplateDecl0"><pre>Matches C++ class template declarations.
 
-Example matches Z
+Given
   template&lt;class T&gt; class Z {};
+
+The matcher classTemplateDecl()
+matches Z.
 </pre></td></tr>
 
 
@@ -660,13 +677,14 @@ <h2 id="decl-matchers">Node Matchers</h2>
   template&lt;class T1, class T2, int I&gt;
   class A {};
 
-  template&lt;class T, int I&gt;
-  class A&lt;T, T*, I&gt; {};
+  template&lt;class T, int I&gt; class A&lt;T, T*, I&gt; {};
 
   template&lt;&gt;
   class A&lt;int, int, 1&gt; {};
-classTemplatePartialSpecializationDecl()
-  matches the specialization A&lt;T,T*,I&gt; but not A&lt;int,int,1&gt;
+
+The matcher classTemplatePartialSpecializationDecl()
+matches template&lt;class T, int I&gt; class A&lt;T, T*, I&gt; {},
+but does not match A&lt;int, int, 1&gt;.
 </pre></td></tr>
 
 
@@ -677,87 +695,128 @@ <h2 id="decl-matchers">Node Matchers</h2>
   template&lt;typename T&gt; class A {};
   template&lt;&gt; class A&lt;double&gt; {};
   A&lt;int&gt; a;
-classTemplateSpecializationDecl()
-  matches the specializations A&lt;int&gt; and A&lt;double&gt;
+
+The matcher classTemplateSpecializationDecl()
+matches class A&lt;int&gt;
+and class A&lt;double&gt;.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('conceptDecl0')"><a name="conceptDecl0Anchor">conceptDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1ConceptDecl.html">ConceptDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="conceptDecl0"><pre>Matches concept declarations.
 
-Example matches integral
-  template&lt;typename T&gt;
-  concept integral = std::is_integral_v&lt;T&gt;;
+Given
+  template&lt;typename T&gt; concept my_concept = true;
+
+
+The matcher conceptDecl()
+matches template&lt;typename T&gt;
+concept my_concept = true.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('cxxConstructorDecl0')"><a name="cxxConstructorDecl0Anchor">cxxConstructorDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXConstructorDecl.html">CXXConstructorDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxConstructorDecl0"><pre>Matches C++ constructor declarations.
 
-Example matches Foo::Foo() and Foo::Foo(int)
+Given
   class Foo {
    public:
     Foo();
     Foo(int);
     int DoSomething();
   };
+
+  struct Bar {};
+
+
+The matcher cxxConstructorDecl()
+matches Foo() and Foo(int).
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('cxxConversionDecl0')"><a name="cxxConversionDecl0Anchor">cxxConversionDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXConversionDecl.html">CXXConversionDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxConversionDecl0"><pre>Matches conversion operator declarations.
 
-Example matches the operator.
+Given
   class X { operator int() const; };
+
+
+The matcher cxxConversionDecl()
+matches operator int() const.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('cxxDeductionGuideDecl0')"><a name="cxxDeductionGuideDecl0Anchor">cxxDeductionGuideDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXDeductionGuideDecl.html">CXXDeductionGuideDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxDeductionGuideDecl0"><pre>Matches user-defined and implicitly generated deduction guide.
 
-Example matches the deduction guide.
+Given
   template&lt;typename T&gt;
-  class X { X(int) };
+  class X { X(int); };
   X(int) -&gt; X&lt;int&gt;;
+
+
+The matcher cxxDeductionGuideDecl()
+matches the written deduction guide
+auto (int) -&gt; X&lt;int&gt;,
+the implicit copy deduction guide auto (int) -&gt; X&lt;T&gt;
+and the implicitly declared deduction guide
+auto (X&lt;T&gt;) -&gt; X&lt;T&gt;.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('cxxDestructorDecl0')"><a name="cxxDestructorDecl0Anchor">cxxDestructorDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXDestructorDecl.html">CXXDestructorDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxDestructorDecl0"><pre>Matches explicit C++ destructor declarations.
 
-Example matches Foo::~Foo()
+Given
   class Foo {
    public:
     virtual ~Foo();
   };
+
+  struct Bar {};
+
+
+The matcher cxxDestructorDecl()
+matches virtual ~Foo().
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('cxxMethodDecl0')"><a name="cxxMethodDecl0Anchor">cxxMethodDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXMethodDecl.html">CXXMethodDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxMethodDecl0"><pre>Matches method declarations.
 
-Example matches y
+Given
   class X { void y(); };
+
+
+The matcher cxxMethodDecl()
+matches void y().
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('cxxRecordDecl0')"><a name="cxxRecordDecl0Anchor">cxxRecordDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1CXXRecordDecl.html">CXXRecordDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="cxxRecordDecl0"><pre>Matches C++ class declarations.
 
-Example matches X, Z
+Given
   class X;
   template&lt;class T&gt; class Z {};
+
+The matcher cxxRecordDecl()
+matches X and Z.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('decl0')"><a name="decl0Anchor">decl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="decl0"><pre>Matches declarations.
 
-Examples matches X, C, and the friend declaration inside C;
+Given
   void X();
   class C {
-    friend X;
+    friend void X();
   };
+
+The matcher decl()
+matches void X(), C
+and friend void X().
 </pre></td></tr>
 
 
@@ -767,40 +826,49 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   class X { int y; };
-declaratorDecl()
-  matches int y.
+
+The matcher declaratorDecl()
+matches int y.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('decompositionDecl0')"><a name="decompositionDecl0Anchor">decompositionDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1DecompositionDecl.html">DecompositionDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="decompositionDecl0"><pre>Matches decomposition-declarations.
 
-Examples matches the declaration node with foo and bar, but not
-number.
-(matcher = declStmt(has(decompositionDecl())))
-
+Given
+  struct pair { int x; int y; };
+  pair make(int, int);
   int number = 42;
-  auto [foo, bar] = std::make_pair{42, 42};
+  auto [foo, bar] = make(42, 42);
+
+The matcher decompositionDecl()
+matches auto [foo, bar] = make(42, 42),
+but does not match number.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('enumConstantDecl0')"><a name="enumConstantDecl0Anchor">enumConstantDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1EnumConstantDecl.html">EnumConstantDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="enumConstantDecl0"><pre>Matches enum constants.
 
-Example matches A, B, C
+Given
   enum X {
     A, B, C
   };
+The matcher enumConstantDecl()
+matches A, B and C.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('enumDecl0')"><a name="enumDecl0Anchor">enumDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1EnumDecl.html">EnumDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="enumDecl0"><pre>Matches enum declarations.
 
-Example matches X
+Given
   enum X {
     A, B, C
   };
+
+The matcher enumDecl()
+matches the enum X.
 </pre></td></tr>
 
 
@@ -808,9 +876,14 @@ <h2 id="decl-matchers">Node Matchers</h2>
 <tr><td colspan="4" class="doc" id="fieldDecl0"><pre>Matches field declarations.
 
 Given
-  class X { int m; };
-fieldDecl()
-  matches 'm'.
+  int a;
+  struct Foo {
+    int x;
+  };
+  void bar(int val);
+
+The matcher fieldDecl()
+matches int x.
 </pre></td></tr>
 
 
@@ -819,16 +892,20 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   class X { friend void foo(); };
-friendDecl()
-  matches 'friend void foo()'.
+
+The matcher friendDecl()
+matches friend void foo().
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('functionDecl0')"><a name="functionDecl0Anchor">functionDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1FunctionDecl.html">FunctionDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="functionDecl0"><pre>Matches function declarations.
 
-Example matches f
+Given
   void f();
+
+The matcher functionDecl()
+matches void f().
 </pre></td></tr>
 
 
@@ -837,6 +914,10 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Example matches f
   template&lt;class T&gt; void f(T t) {}
+
+
+The matcher functionTemplateDecl()
+matches template&lt;class T&gt; void f(T t) {}.
 </pre></td></tr>
 
 
@@ -845,8 +926,8 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   struct X { struct { int a; }; };
-indirectFieldDecl()
-  matches 'a'.
+The matcher indirectFieldDecl()
+matches a.
 </pre></td></tr>
 
 
@@ -854,10 +935,13 @@ <h2 id="decl-matchers">Node Matchers</h2>
 <tr><td colspan="4" class="doc" id="labelDecl0"><pre>Matches a declaration of label.
 
 Given
-  goto FOO;
-  FOO: bar();
-labelDecl()
-  matches 'FOO:'
+  void bar();
+  void foo() {
+    goto FOO;
+    FOO: bar();
+  }
+The matcher labelDecl()
+matches FOO: bar().
 </pre></td></tr>
 
 
@@ -866,8 +950,9 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   extern "C" {}
-linkageSpecDecl()
-  matches "extern "C" {}"
+
+The matcher linkageSpecDecl()
+matches extern "C" {}.
 </pre></td></tr>
 
 
@@ -875,12 +960,18 @@ <h2 id="decl-matchers">Node Matchers</h2>
 <tr><td colspan="4" class="doc" id="namedDecl0"><pre>Matches a declaration of anything that could have a name.
 
 Example matches X, S, the anonymous union type, i, and U;
+Given
   typedef int X;
   struct S {
     union {
       int i;
     } U;
   };
+The matcher namedDecl()
+matches typedef int X, S, int i
+ and U,
+with S matching twice in C++.
+Once for the injected class name and once for the declaration itself.
 </pre></td></tr>
 
 
@@ -890,8 +981,10 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Given
   namespace test {}
   namespace alias = ::test;
-namespaceAliasDecl()
-  matches "namespace alias" but not "namespace test"
+
+The matcher namespaceAliasDecl()
+matches alias,
+but does not match test.
 </pre></td></tr>
 
 
@@ -901,8 +994,9 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Given
   namespace {}
   namespace test {}
-namespaceDecl()
-  matches "namespace {}" and "namespace test {}"
+
+The matcher namespaceDecl()
+matches namespace {} and namespace test {}.
 </pre></td></tr>
 
 
@@ -911,8 +1005,10 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   template &lt;typename T, int N&gt; struct C {};
-nonTypeTemplateParmDecl()
-  matches 'N', but not 'T'.
+
+The matcher nonTypeTemplateParmDecl()
+matches int N,
+but does not match typename T.
 </pre></td></tr>
 
 
@@ -922,6 +1018,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Example matches Foo (Additions)
   @interface Foo (Additions)
   @end
+
 </pre></td></tr>
 
 
@@ -931,6 +1028,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Example matches Foo (Additions)
   @implementation Foo (Additions)
   @end
+
 </pre></td></tr>
 
 
@@ -940,6 +1038,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Example matches Foo
   @implementation Foo
   @end
+
 </pre></td></tr>
 
 
@@ -949,6 +1048,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Example matches Foo
   @interface Foo
   @end
+
 </pre></td></tr>
 
 
@@ -960,6 +1060,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
     BOOL _enabled;
   }
   @end
+
 </pre></td></tr>
 
 
@@ -974,6 +1075,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
   @implementation Foo
   - (void)method {}
   @end
+
 </pre></td></tr>
 
 
@@ -984,6 +1086,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
   @interface Foo
   @property BOOL enabled;
   @end
+
 </pre></td></tr>
 
 
@@ -993,6 +1096,7 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Example matches FooDelegate
   @protocol FooDelegate
   @end
+
 </pre></td></tr>
 
 
@@ -1001,48 +1105,58 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   void f(int x);
-parmVarDecl()
-  matches int x.
+The matcher parmVarDecl()
+matches int x.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('recordDecl0')"><a name="recordDecl0Anchor">recordDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1RecordDecl.html">RecordDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="recordDecl0"><pre>Matches class, struct, and union declarations.
 
-Example matches X, Z, U, and S
+Given
   class X;
   template&lt;class T&gt; class Z {};
   struct S {};
   union U {};
+
+The matcher recordDecl()
+matches X, Z,
+S and U.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('staticAssertDecl0')"><a name="staticAssertDecl0Anchor">staticAssertDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1StaticAssertDecl.html">StaticAssertDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="staticAssertDecl0"><pre>Matches a C++ static_assert declaration.
 
-Example:
-  staticAssertDecl()
-matches
-  static_assert(sizeof(S) == sizeof(int))
-in
+Given
   struct S {
     int x;
   };
   static_assert(sizeof(S) == sizeof(int));
+
+
+The matcher staticAssertDecl()
+matches static_assert(sizeof(S) == sizeof(int)).
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('tagDecl0')"><a name="tagDecl0Anchor">tagDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1TagDecl.html">TagDecl</a>&gt;...</td></tr>
 <tr><td colspan="4" class="doc" id="tagDecl0"><pre>Matches tag declarations.
 
-Example matches X, Z, U, S, E
+Given
   class X;
   template&lt;class T&gt; class Z {};
   struct S {};
   union U {};
-  enum E {
-    A, B, C
-  };
+  enum E { A, B, C };
+
+
+The matcher tagDecl()
+matches class X, class Z {}, the injected class name
+class Z, struct S {},
+the injected class name struct S, union U {},
+the injected class name union U
+and enum E { A, B, C }.
 </pre></td></tr>
 
 
@@ -1051,8 +1165,10 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   template &lt;template &lt;typename&gt; class Z, int N&gt; struct C {};
-templateTypeParmDecl()
-  matches 'Z', but not 'N'.
+
+The matcher templateTemplateParmDecl()
+matches template &lt;typename&gt; class Z,
+but does not match int N.
 </pre></td></tr>
 
 
@@ -1061,8 +1177,10 @@ <h2 id="decl-matchers">Node Matchers</h2>
 
 Given
   template &lt;typename T, int N&gt; struct C {};
-templateTypeParmDecl()
-  matches 'T', but not 'N'.
+
+The matcher templateTypeParmDecl()
+matches typename T,
+but does not match int N.
 </pre></td></tr>
 
 
@@ -1072,10 +1190,12 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Given
   int X;
   namespace NS {
-  int Y;
+    int Y;
   }  // namespace NS
-decl(hasDeclContext(translationUnitDecl()))
-  matches "int X", but not "int Y".
+
+The matcher namedDecl(hasDeclContext(translationUnitDecl()))
+matches X and NS,
+but does not match Y.
 </pre></td></tr>
 
 
@@ -1085,17 +1205,22 @@ <h2 id="decl-matchers">Node Matchers</h2>
 Given
   typedef int X;
   using Y = int;
-typeAliasDecl()
-  matches "using Y = int", but not "typedef int X"
+
+The matcher typeAliasDecl()
+matches using Y = int,
+but does not match typedef int X.
 </pre></td></tr>
 
 
 <tr><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1Decl.html">Decl</a>&gt;</td><td class="name" onclick="toggle('typeAliasTemplateDecl0')"><a name="typeAliasTemplateDecl0Anchor">typeAliasTemplateDecl</a></td><td>Matcher&lt;<a href="https://clang.llvm.org/doxygen/classclang_1_1TypeAliasTemplateDecl.html">TypeAliasTemplateDecl</a>&gt;...</td></tr>
 <tr><td ...
[truncated]

@5chmidti
Contributor Author

5chmidti commented Jun 3, 2024

CC @llvm/pr-subscribers-clang-tidy as stake-holders in matchers

@5chmidti 5chmidti force-pushed the users/5chmidti/rm_not_needed_run_overload_in_BoundNodesCallback branch from 0c53f15 to 615f30b Compare June 3, 2024 16:37
@5chmidti 5chmidti force-pushed the users/5chmidti/add_testing_for_the_AST_matcher_reference branch from 1b4b4e4 to 2e90b54 Compare June 3, 2024 16:37
@AaronBallman AaronBallman self-requested a review June 4, 2024 11:55
Collaborator

@AaronBallman AaronBallman left a comment

It looks like precommit CI found relevant failures:

FAILED: tools/clang/unittests/ASTMatchers/ASTMatchersDocTests.cpp
cmd.exe /C "cd /D C:\ws\src\build\tools\clang\unittests\ASTMatchers && C:\ws\src\clang\utils\generate_ast_matcher_doc_tests.py --input-file C:/ws/src/clang/include/clang/ASTMatchers/ASTMatchers.h --output-file C:/ws/src/build/tools/clang/unittests/ASTMatchers/ASTMatchersDocTests.cpp"
  File "C:\ws\src\clang\utils\generate_ast_matcher_doc_tests.py", line 613
    const StringRef Code = R"cpp(\n{"\t#include \"cuda.h\"\n" if has_cuda else ""}{self.code})cpp";\n"""
                                                                                                        ^
SyntaxError: f-string expression part cannot include a backslash
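
This error is raised by Python versions before 3.12, which reject a backslash inside the `{...}` expression part of an f-string. One common workaround is to compute the escaped string first and interpolate only the resulting name. The sketch below assumes the intent is to prepend the CUDA include conditionally; `render_code` is a hypothetical helper, not a function taken from the script.

```python
def render_code(code: str, has_cuda: bool) -> str:
    """Build the C++ raw-string declaration without a backslash inside '{...}'.

    Hypothetical helper for illustration; the real script may be structured
    differently.
    """
    # Backslashes live in an ordinary string literal, outside the f-string braces.
    cuda_include = '\t#include "cuda.h"\n' if has_cuda else ""
    # Backslashes in the literal part of an f-string are fine on every version.
    return f'const StringRef Code = R"cpp(\n{cuda_include}{code})cpp";\n'


if __name__ == "__main__":
    print(render_code("int a = 42;", has_cuda=True), end="")
```

Keeping the conditional outside the f-string also keeps the template line shorter and works on older interpreters such as the one used by the Windows CI bot.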

Collaborator

@AaronBallman AaronBallman left a comment

I wonder if we should add some documentation to ASTMatchers.h about the new special syntax for comments so that users who hit test failures with the new automatic tests have some more help getting to a solution.

/// matches "int X", but not "int Y".
/// \compile_args{-std=c++}
/// The matcher \matcher{namedDecl(hasDeclContext(translationUnitDecl()))}
/// matches \match{type=name$X} and \match{type=name$NS},
Collaborator

Under what circumstances do you need to use this special type=name$foo syntax?

Contributor Author

Using type=name can be generally considered to be a style/readability/expressiveness choice if the AST node supports it. The X example would probably be better spelling the declaration out, the same goes for Y (probably remnants of the early days). There may be other trivial examples that could be spelled out.

There are for sure some more trivial cases which could be spelled out. I'll check on the documentation again tomorrow and provide some updates (also w.r.t. your other comment).

If we wanted to spell out the namespace, we could, but that would require writing the NS in a single line. It's an artificial limitation in the script that can probably be implemented if we want to have the option.

Comment on lines 389 to 390
/// matches \match{void X()}, \match{type=name;count=2$C}
/// and \match{count=2$friend void X()}.
Collaborator

Can you explain \match{type=name;count=2$C}? I can see it matching class C, but I'm wondering what the second match is (and should we add a comment explaining that other match?).

Contributor Author

|-FunctionDecl <line:1:1, col:8> col:6 X 'void ()'
`-CXXRecordDecl <line:2:1, line:4:1> line:2:7 class C definition
  |-DefinitionData pass_in_registers empty aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor can_const_default_init
  | |-DefaultConstructor exists trivial constexpr needs_implicit defaulted_is_constexpr
  | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
  | |-MoveConstructor exists simple trivial needs_implicit
  | |-CopyAssignment simple trivial has_const_param needs_implicit implicit_has_const_param
  | |-MoveAssignment exists simple trivial needs_implicit
  | `-Destructor simple irrelevant trivial needs_implicit
  |-CXXRecordDecl <col:1, col:7> col:7 implicit class C
  `-FriendDecl <line:3:5, col:19> col:17
    `-FunctionDecl parent 0xf23a388 prev 0xf284370 <col:5, col:19> col:17 friend X 'void ()'

Can you explain \match{type=name;count=2$C}?

That is the implicit class C in the AST above. I couldn't access it from the top-level C and I couldn't find a way from the implicit class C back to the top-level one, so I don't know how to call it. I thought it would be a decl but not a definition, however, getDefinition returns a nullptr for the implicit class C.

should we add a comment explaining that other match?

Certainly. I'll read the documentation again to see if there are more cases like this that could be improved as well.

Collaborator

okay, thank you! I kind of figured it was the implicit class declaration.

Contributor Author

@5chmidti 5chmidti left a comment

I wonder if we should add some documentation to ASTMatchers.h

I'll add one in a day or so. I updated the description with some more information, and I'll probably take parts of that as a basis for the comment in the header (and update the script comment as well).

so that users who hit test failures with the new automatic tests have some more help getting to a solution.

There is now a "What if ...?" section in the PR description, which I will put into the header comment as well.

@5chmidti
Contributor Author

5chmidti commented Jun 8, 2024

  • added a file-level comment in the ASTMatchers.h file on how the syntax works (basically the PR description)
  • replaced some type=name matches with explicit code matches where applicable, to be more expressive
  • added comments to count= matches when they didn't explain why or if there were multiple matches

Collaborator

@AaronBallman AaronBallman left a comment

The changes generally LGTM, though I would appreciate a second set of eyes on the CMake and Python changes because I have a bit less confidence in my review abilities there.

Thank you for adding the documentation to the header file, I think that will help folks when working on their own matchers.

One question I have is: do you happen to know how this impacts build times for Clang itself? I'm assuming that if ASTMatchers.h isn't modified, CMake won't re-run generate_ast_matcher_doc_tests.py and so the compile time performance hit is only on full rebuilds or when changing the header?

@5chmidti
Contributor Author

5chmidti commented Jul 2, 2024

Thanks.

... how this impacts build times for Clang itself? I'm assuming that if ASTMatchers.h isn't modified, CMake won't re-run generate_ast_matcher_doc_tests.py and so the compile time performance hit is only on full rebuilds or when changing the header?

The 'state' of the generated file is only checked when the ASTMatchersTests target is built, because it's the only thing that depends on the generated file. And the file is only generated when: the file does not exist or ASTMatchers.h has changed (excluding transitive changes in includes) or generate_ast_matcher_doc_tests.py has changed.

@AaronBallman
Collaborator

AaronBallman commented Jul 2, 2024

Thanks.

... how this impacts build times for Clang itself? I'm assuming that if ASTMatchers.h isn't modified, CMake won't re-run generate_ast_matcher_doc_tests.py and so the compile time performance hit is only on full rebuilds or when changing the header?

The 'state' of the generated file is only checked when the ASTMatchersTests target is built, because it's the only thing that depends on the generated file. And the file is only generated when: the file does not exist or ASTMatchers.h has changed (excluding transitive changes in includes) or generate_ast_matcher_doc_tests.py has changed.

Excellent, thank you for the confirmation! That sounds reasonable to me.

I think we've waited long enough for feedback on the cmake bits, so this is ready to land. We can address concerns post-commit. Do you need me to land the changes on your behalf?

@5chmidti
Contributor Author

5chmidti commented Jul 3, 2024

Do you need me to land the changes on your behalf?

No need, I can do that. However, this PR is in a stack and depends on two small PRs: #94244 and #94243 that need to be reviewed before this PR can be merged

@AaronBallman
Collaborator

Do you need me to land the changes on your behalf?

No need, I can do that. However, this PR is in a stack and depends on two small PRs: #94244 and #94243 that need to be reviewed before this PR can be merged

Ah thank you for pointing this out, I hadn't realized this was a patch stack.

@5chmidti 5chmidti force-pushed the users/5chmidti/rm_not_needed_run_overload_in_BoundNodesCallback branch from 615f30b to 51de627 Compare July 12, 2024 22:00
@5chmidti 5chmidti force-pushed the users/5chmidti/add_testing_for_the_AST_matcher_reference branch from c4a014e to fe4328c Compare July 12, 2024 22:01
@5chmidti
Contributor Author

rebase on trunk + rebased stack

@5chmidti 5chmidti force-pushed the users/5chmidti/rm_not_needed_run_overload_in_BoundNodesCallback branch from 51de627 to 26d5b03 Compare July 12, 2024 23:13
Previously, the examples in the AST matcher reference, which gets
generated by the doxygen comments in `ASTMatchers.h`, were untested
and best effort.
Some of the matchers had no or wrong examples of how to use the matcher.

This patch introduces a simple DSL around doxygen commands to enable
testing the AST matcher documentation in a way that should be relatively
easy.
In `ASTMatchers.h`, most matchers are documented with a doxygen comment.
Most of these also have a code example that aims to show what the
matcher will match, given a matcher somewhere in the documentation text.
The documentation is tested by using doxygen's
alias feature to declare custom aliases. These aliases forward to
`<tt>text</tt>` (which is what doxygen's \c does, but for multiple words).
Using the doxygen aliases was the obvious choice, because there are
(now) four consumers:
 - people reading the header/using signature help
 - the doxygen generated documentation
 - the generated html AST matcher reference
 - (new) the generated matcher tests

This patch rewrites/extends the documentation such that all matchers
have a documented example.
The new `generate_ast_matcher_doc_tests.py` script will warn on any
undocumented matchers (but not on matchers without a doxygen comment)
and provides diagnostics and statistics about the matchers.
Below is a file-level comment from the test generation script that
describes how documenting matchers to be tested works on a slightly more
technical level. In general, the new comments can be used as a reference
for how to write tested documentation.

The current statistics emitted by the parser are:

```text
Statistics:
        doxygen_blocks                :   519
        missing_tests                 :    10
        skipped_objc                  :    42
        code_snippets                 :   503
        matches                       :   820
        matchers                      :   580
        tested_matchers               :   574
        none_type_matchers            :     6
```

The tests are generated during the build, and the script only prints
something if it finds an issue (a compile failure, parsing issues, or
a difference between the expected and actual number of failures).

DSL for generating the tests from documentation.

TLDR:
The order for a single code snippet example is:

  \header{a.h}
  \endheader     <- zero or more headers

  \code
    int a = 42;
  \endcode
  \compile_args{-std=c++,c23-or-later} <- optional, supports std ranges and
                                          whole languages

  \matcher{expr()} <- one or more matchers in succession
  \match{42}   <- one or more matches in succession

  \matcher{varDecl()} <- new matcher resets the context, the above
                         \match will not count for this new
                         matcher(-group)
  \match{int a = 42}  <- only applies to the previous matcher (not to the
                         previous case)

The above block can be repeated inside of a doxygen command for multiple
code examples.

Language Grammar:
  [] denotes an optional, and <> denotes user-input

  compile_args ::= \compile_args{[<compile_arg>;]<compile_arg>}
  matcher_tag_key ::= type
  match_tag_key ::= type || std || count
  matcher_tags ::= [matcher_tag_key=<value>;]matcher_tag_key=<value>
  match_tags ::= [match_tag_key=<value>;]match_tag_key=<value>
  matcher ::= \matcher{[matcher_tags$]<matcher>}
  matchers ::= [matcher] matcher
  match ::= \match{[match_tags$]<match>}
  matches ::= [match] match
  case ::= matchers matches
  cases ::= [case] case
  header-block ::= \header{<name>} <code> \endheader
  code-block ::= \code <code> \endcode
  testcase ::= code-block [compile_args] cases
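
As an illustration of the `[tags$]<content>` shape used by `\matcher{...}` and `\match{...}` in the grammar above, here is a minimal Python sketch that splits the body of a command into its tags and its content. The names `ParsedBody`, `KNOWN_TAG_KEYS`, and `parse_tagged` are hypothetical and are not taken from `generate_ast_matcher_doc_tests.py`; the real parser may handle edge cases differently.

```python
from dataclasses import dataclass, field

# Assumption: the tag keys listed in the grammar (type, std, count).
KNOWN_TAG_KEYS = {"type", "std", "count"}


@dataclass
class ParsedBody:
    content: str
    tags: dict = field(default_factory=dict)


def parse_tagged(body: str) -> ParsedBody:
    """Split '[key=value;...$]content' into a tag dict and the content."""
    head, sep, rest = body.partition("$")
    pieces = head.split(";")
    # Only treat the prefix as tags when every piece is a known 'key=value'.
    looks_like_tags = sep == "$" and all(
        "=" in piece and piece.split("=", 1)[0] in KNOWN_TAG_KEYS
        for piece in pieces
    )
    if not looks_like_tags:
        return ParsedBody(content=body)
    tags = dict(piece.split("=", 1) for piece in pieces)
    return ParsedBody(content=rest, tags=tags)


if __name__ == "__main__":
    print(parse_tagged("type=name;count=2$C"))
    print(parse_tagged("void X()"))
```

With this shape, `\match{type=name;count=2$C}` carries the tags `type=name` and `count=2` with content `C`, while `\match{void X()}` has no tags and falls back to the defaults.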

The 'std' tag and '\compile_args' support specifying a specific
language version, a whole language and all of its versions, and thresholds
(implies ranges). Multiple arguments are passed with a ',' separator.
For a language and version to execute a tested matcher, it has to match
the specified '\compile_args' for the code, and the 'std' tag for the matcher.
Predicates for the 'std' compiler flag are used with disjunction between
languages (e.g. 'c || c++') and conjunction for all predicates specific
to each language (e.g. 'c++11-or-later && c++23-or-earlier').

Examples:
 - c                                    all available versions of C
 - c++11                                only C++11
 - c++11-or-later                       C++11 or later
 - c++11-or-earlier                     C++11 or earlier
 - c++11-or-later,c++23-or-earlier,c    all of C and C++ between 11 and
                                        23 (inclusive)
 - c++11-23,c                           same as above
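
The predicate semantics described above (disjunction between languages, conjunction within one language) can be sketched in Python as follows. This is an illustration only: the version lists, the regular expression, and the function name `accepts` are assumptions, and the spec is passed without the `-std=` prefix that `\compile_args` uses.

```python
import re

# Assumption: illustrative, chronologically ordered version lists.
VERSIONS = {
    "c++": ["98", "03", "11", "14", "17", "20", "23", "26"],
    "c": ["99", "11", "17", "23"],
}

# <lang>[<version>[-or-later | -or-earlier | -<version>]]
ITEM = re.compile(r"^(c\+\+|c)(?:(\d+)(?:-(or-later|or-earlier|\d+))?)?$")


def accepts(spec: str, lang: str, version: str) -> bool:
    """Return True if (lang, version) satisfies a ','-separated std spec."""
    order = VERSIONS[lang]
    idx = order.index(version)
    mentioned = False
    for item in spec.split(","):
        match = ITEM.match(item.strip())
        if match is None:
            raise ValueError(f"unrecognized std item: {item!r}")
        item_lang, base, suffix = match.groups()
        if item_lang != lang:
            continue  # predicates of other languages do not constrain this one
        mentioned = True
        if base is None:
            continue  # whole language, no version constraint
        low = order.index(base)
        if suffix is None:
            ok = idx == low
        elif suffix == "or-later":
            ok = idx >= low
        elif suffix == "or-earlier":
            ok = idx <= low
        else:
            ok = low <= idx <= order.index(suffix)
        if not ok:
            return False  # conjunction within a language
    return mentioned  # disjunction between languages


if __name__ == "__main__":
    spec = "c++11-or-later,c++23-or-earlier,c"
    for lang, version in [("c++", "17"), ("c++", "98"), ("c", "99")]:
        print(lang, version, accepts(spec, lang, version))
```

Comparing list indices rather than the numeric value of the version keeps `98` and `03` ordered before `11`, which a plain integer comparison would get wrong.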

Tags:

  Type:
  Match types are used to select where the string that is used to check if
  a node matches comes from.
  Available: code, name, typestr, typeofstr.
  The default is 'code'.

  Matcher types are used to mark matchers as submatchers with 'sub' or as
  deactivated using 'none'. Testing submatchers is not implemented.

  Count:
  Specifying a 'count=n' on a match will result in a test that requires that
  the specified match will be matched n times. Default is 1.

  Std:
  A match allows specifying if it matches only in specific language versions.
  This may be needed when the AST differs between language versions.
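
To make the 'count' tag concrete, the following Python sketch compares the expected number of occurrences per match against the strings actually produced by a matcher. It is a simplified model, not the generated test code: `check_counts` is a hypothetical name, and it only verifies the expected entries (whether the real tests also reject unexpected matches is not covered here).

```python
from collections import Counter


def check_counts(expected: dict, actual: list) -> list:
    """Return a list of mismatch messages; an empty list means success.

    expected maps a match string to its required count (the 'count=n' tag,
    defaulting to 1); actual is the list of strings the matcher produced.
    """
    seen = Counter(actual)
    problems = []
    for text, want in expected.items():
        got = seen[text]  # Counter yields 0 for strings that never matched
        if got != want:
            problems.append(f"{text!r}: expected {want}, got {got}")
    return problems


if __name__ == "__main__":
    # Mirrors the reviewed example: one 'void X()', two 'C', two friend decls.
    expected = {"void X()": 1, "C": 2, "friend void X()": 2}
    actual = ["void X()", "C", "C", "friend void X()", "friend void X()"]
    print(check_counts(expected, actual))
```

In the reviewed example, `count=2` on `C` accounts for both the class definition and the implicit class declaration nested inside it.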

Fixes #57607
Fixes #63748
@5chmidti 5chmidti force-pushed the users/5chmidti/add_testing_for_the_AST_matcher_reference branch from fe4328c to ad11a89 Compare July 12, 2024 23:13
Labels
clang Clang issues not falling into any other category
Development

Successfully merging this pull request may close these issues.

Errors in AST matcher documentation/examples? classTemplateSpecializationDecl does not work as described
3 participants