Some cpp comment stripping parsing #119

Merged
jbms merged 22 commits into cpp-apigen from some-cpp-comment-stripping-parsing on Aug 12, 2022

Conversation

Collaborator

@2bndy5 2bndy5 commented Jul 10, 2022

Includes

  • fix for Windows paths
  • fix for unresolved cross-refs in the new docs
  • support for \ and @ prefixed Doxygen commands
  • support for @retval command
  • support for param direction(s)
  • add new strip_comment() to accommodate for all forms of C++ comment syntax
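
As a rough illustration of the prefix and direction support listed above, here is a minimal sketch. `PARAM_CMD` and `parse_param()` are hypothetical names, not the extension's actual code, and the sketch assumes a direction is written as `[in]`, `[out]`, or `[in,out]`.

```python
import re

# Hypothetical pattern: accepts either "\" or "@" as the command prefix and an
# optional direction such as [in], [out], or [in,out].
PARAM_CMD = re.compile(
    r"^\s*[\\@]param(?:\[(?P<direction>in|out|in,\s*out)\])?"
    r"\s+(?P<name>\w+)\s*(?P<desc>.*)$"
)


def parse_param(line: str):
    """Return (name, direction, description), or None if not a param command."""
    m = PARAM_CMD.match(line)
    if m is None:
        return None
    return m.group("name"), m.group("direction"), m.group("desc")


print(parse_param(r"\param[in] x The input."))
print(parse_param("@param y No direction given."))
```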

I also enabled verbosity in the cpp.apigen demo and added a line that shows the number of declarations that will be parsed.

@2bndy5 2bndy5 requested a review from jbms July 10, 2022 00:27
Collaborator Author

2bndy5 commented Jul 10, 2022

The fix for cross-references didn't work. I had to resort to using an explicit role for the cross-refs to actually get resolved. Otherwise they all just show as italics (using a <cite> tag in HTML).

I'm not sure what is going on here, but I don't like having to explicitly use a role name for simple cross-refs (especially when there aren't duplicates and no warnings).

@2bndy5 2bndy5 force-pushed the some-cpp-comment-stripping-parsing branch from 4c24ed0 to be91a02 Compare July 10, 2022 03:06
Owner

jbms commented Jul 11, 2022

The fix for cross-references didn't work. I had to resort to using an explicit role for the cross-refs to actually get resolved. Otherwise they all just show as italics (using a <cite> tag in HTML).

I'm not sure what is going on here, but I don't like having to explicitly use a role name for simple cross-refs (especially when there aren't duplicates and no warnings).

Turns out the issue is that the C++ apigen extension inserts default-role directives to ensure the default role is interpreted as cpp:expr within doc comments, but then resets it improperly, so that it becomes the docutils default (title-reference, I believe) rather than any, the default we set in our conf.py. I think title-reference does not result in nitpick warnings since it is implemented by docutils and not sphinx. The issue occurred on this page because of the prior apigen example on the same page (the roles are reset with each document).

I'm working on a fix for this and will send out a PR.

Separately, though, the .member syntax only works for py:obj references, not for any references. So regardless we should probably insert .. default-role:: py:obj before those autodoc directives.

Owner

jbms commented Jul 11, 2022

Thanks!

How about adding some unit tests of the new comment parsing, and other syntax improvements?

Owner

jbms commented Jul 11, 2022

Regarding retval, it sounds like the first argument is intended for documenting specific values (e.g. error codes), rather than the return type itself (which is already known from the signature):

https://stackoverflow.com/questions/60120282/is-there-a-retval-equivalent-for-out-parameters-in-doxygen-c-c
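
To make the distinction concrete, the sketch below treats the first argument of `retval` as a specific return value rather than a type. The `:retval <value>:` field name and the `convert_retval()` helper are assumptions for illustration only; the extension may emit something different.

```python
import re

# Hypothetical conversion: "@retval <value> <text>" -> ":retval <value>: <text>".
# [ \t] is used instead of \s so a match never swallows a newline.
RETVAL_CMD = re.compile(r"^([ \t]*)[\\@]retval[ \t]+(\S+)[ \t]*", re.MULTILINE)


def convert_retval(text: str) -> str:
    """Rewrite every retval command, keeping the line's indentation."""
    return RETVAL_CMD.sub(r"\1:retval \2: ", text)


print(convert_retval("@retval 0 Success.\n@retval -1 Failure."))
```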

Collaborator Author

2bndy5 commented Jul 11, 2022

How about adding some unit tests of the new comment parsing, and other syntax improvements?

I'm working on this now... I think I'll use several tests: 1 for each supported Doxygen cmd and another to test various comment syntaxes.

Collaborator Author

2bndy5 commented Jul 11, 2022

I added a couple unit tests. These can be refined as we go, but I just wanted to test the additions in this PR.

BTW, I had to alter array.h (see ebfe2ba) because the subsequent line's indentation was getting misinterpreted within the :param x: field. This is mostly due to how the strip_comment() function was designed. I'd be more comfortable addressing that after we start fetching entire comment blocks (instead of 1 line at a time) because the function is meant to accept an entire comment (be it a block or just 1 line) from Cursor.raw_comment; meaning the first line's indenting whitespace is not stripped.

The strip_comment() also doesn't take into account non-docstring comments like so:

/** A short description. */
// This is not a docstring, but might get lumped into `raw_comment`'s value
auto func();

// This non-docstring comment will cause the regex check for comment prefix to
// skip the following docstr (when using `raw_comment`)
/*! This is a docstring. */
auto func(int);

It would be easy to adjust the new DOC_COMMENT_PREFIX pattern to capture the docstring comments and pass the matches to strip_comment()
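
One way this could look, as a sketch only: split the raw comment text into individual comment tokens, then keep those that start with a doc-comment prefix. `COMMENT_TOKEN`, `DOC_PREFIX`, and `doc_comments()` are hypothetical stand-ins, not the PR's actual `DOC_COMMENT_PREFIX` pattern.

```python
import re
from typing import List

# One token per comment: a /* ... */ block or a // line.
COMMENT_TOKEN = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)
# Doc comments start with "/**", "/*!", "///", or "//!".
DOC_PREFIX = re.compile(r"/\*[*!]|//[/!]")


def doc_comments(raw_comment: str) -> List[str]:
    """Return only the doc-comment tokens found in raw_comment."""
    return [
        tok for tok in COMMENT_TOKEN.findall(raw_comment) if DOC_PREFIX.match(tok)
    ]


raw = "// This is not a docstring\n/*! This is a docstring. */"
print(doc_comments(raw))
```

Each surviving token could then be handed to a stripping function on its own.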

)

output = api_parser.generate_output(config)
doc_strings = [
Owner

I think it would be clearer to test each of these individually, so that there is just a single entity and you can just exactly check it.

Collaborator Author

If I hadn't done it this way, then I never would've guessed that the re.sub(brief|details, "", txt) call needed the re.MULTILINE flag.

Are you ok with testing each regex pattern per test, instead of testing each supported command per test?

Owner

I'm not sure I understand why this particular test structure helped spot that issue. But I just meant that it would be clearer to have the test just generate one entity at a time, so that each test case is more isolated, at least for these comment syntax tests.

Collaborator Author

I noticed that the re.sub() call was only replacing the first \\brief on the first line but not \\details on the third line of the same comment... The multiline flag fixed that.
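
The effect of the flag can be reproduced in isolation. The pattern below is a hypothetical stand-in for the one in the PR; what matters is that `^` only anchors at the start of every line when `re.MULTILINE` is passed.

```python
import re

comment = "\\brief Short summary.\n\n\\details Longer explanation."

# Hypothetical pattern; [ \t] keeps the match from crossing line boundaries.
pattern = r"^[ \t]*[\\@](?:brief|details)[ \t]*"

# Without the flag, "^" matches only at the very start of the string.
first_only = re.sub(pattern, "", comment)
# With the flag, "^" matches at the start of each line.
every_line = re.sub(pattern, "", comment, flags=re.MULTILINE)

print(repr(first_only))
print(repr(every_line))
```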

I could split up this test into per comment syntax form. Initially, I thought this comment was for the other test_function_fields() test.

Looking at the regex patterns, I don't know why you're looking for some fields like checks, dchecks, schecks (not really sure what they are supposed to represent). I get the pre, post, and invariant commands because those actually exist for Doxygen. The error (which isn't treated like an admonition) and requires (which seems equivalent to @concept) commands seem to be something you added (they don't exist for Doxygen).

Owner

Yes, checks, error, and requires are "custom" commands I added for tensorstore, along with corresponding custom sphinx docfields that I left out. I agree they shouldn't be provided by default --- it would be good instead to have a mechanism for custom doxygen commands though.

Collaborator Author

We can look into implementing something similar to Doxygen's ALIASES setting. That is how breathe supports raw RST in Doxygen's XML output. Theoretically it can be used to add custom commands like the ones you're adding.
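
A minimal sketch of what an ALIASES-like mechanism might look like, assuming a user-supplied mapping from command name to rST field name. `CUSTOM_COMMANDS` and `apply_aliases()` are hypothetical; no such option exists in the extension as far as this thread shows.

```python
import re

# Hypothetical user-supplied table: Doxygen-style command -> rST field name.
CUSTOM_COMMANDS = {"checks": "checks", "error": "error", "requires": "requires"}


def apply_aliases(text: str, aliases: dict) -> str:
    """Rewrite "@name text" or "\\name text" into ":field: text"."""
    if not aliases:
        return text
    names = "|".join(re.escape(name) for name in aliases)
    pattern = re.compile(rf"^([ \t]*)[\\@]({names})\b[ \t]*", re.MULTILINE)
    return pattern.sub(lambda m: f"{m.group(1)}:{aliases[m.group(2)]}: ", text)


print(apply_aliases("@requires T is copyable.", CUSTOM_COMMANDS))
```

Commands that are not in the table are left untouched, so built-in handling can run before or after this step.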

Collaborator Author

I split up the comment stripping tests into a class of tests. With using raw_comment, the test for trailing docstring comment is passing. 🎉

I could split up the test about function fields, but I think that can be done after the _normalize_doc_comment() function matures more (it is currently pretty limited).

for line in body
]
body[-1] = body[-1].rstrip("*/").rstrip()
body = dedent("\n".join(body)).splitlines()
Owner

Why join with "\n" and then split again?

Owner

Never mind, I see there is a dedent in there. But I think we should only dedent multiline comments.

Collaborator Author

This function is designed to handle both single line comments and multi-lined comment blocks (provided by raw_comment).

Stripping the leading whitespace from each line after the comment syntax is stripped away isn't great for code blocks and MD style blockquotes.

/**
 * @code{.cpp}
 * if (debugLevel) {
 *     printf("some output"); // the leading whitespace needs to be preserved here
 * }
 * @endcode
 */

///     This is a blockquote.
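
The difference between the two approaches can be shown with `textwrap.dedent`. The `stripped` list below is an assumed intermediate state (the lines after the leading ` *` decoration is removed), not output captured from the extension.

```python
from textwrap import dedent

# Assumed state after removing the leading " *" from each line of the block.
stripped = [
    " @code{.cpp}",
    " if (debugLevel) {",
    '     printf("some output");',
    " }",
    " @endcode",
]

# Per-line stripping flattens the code block.
flattened = [line.lstrip() for line in stripped]

# Joining, dedenting, and splitting removes only the common indentation.
body = dedent("\n".join(stripped)).splitlines()

print(flattened[2])
print(body[2])
```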

Collaborator Author

But I think we should only dedent multiline comments.

I added a condition to skip dedenting single line comments, but the doc builds fail because the leading whitespace is now being interpreted as content to the ext's generated .. highlight:: cpp directive.

<cpp_apigen_rst_prolog>:6: ERROR: Error in "highlight" directive:
no content permitted.
.. highlight:: cpp
 Returns the data order.

Kinda glad I did the rebase on main now (this would've gone unnoticed until much later).

Collaborator Author

@2bndy5 2bndy5 Jul 16, 2022

To be clear, a single line comment (with respect to raw_comment) is a block that has 1 line

/** A single line comment */

/// Not a single line comment because
/// it has multiple lines in sequence

So, I'm not sure what the advantage is for not dedenting a single line comment. Maybe we could delegate this to regex with re.sub(r"^(\s{0, 3})\S", "", text) (notice there's no multiline flag). Any other concerns I can think of all require multiple lines (eg. lists)
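
A runnable variant of that idea, offered as a sketch. Two details of the pattern as written above would need care: as far as I can tell, Python's `re` does not treat `{0, 3}` (with a space) as a quantifier, and the trailing `\S` would be consumed by the substitution along with the whitespace. The hypothetical `strip_small_indent()` below uses a lookahead instead.

```python
import re


def strip_small_indent(text: str) -> str:
    """Drop up to three leading spaces from the first line only."""
    # No re.MULTILINE, so "^" anchors at the start of the string only. The
    # lookahead keeps the first non-space character in place, and an indent
    # of four or more spaces is left alone.
    return re.sub(r"^ {0,3}(?=\S)", "", text)


print(repr(strip_small_indent(" A single line comment")))
print(repr(strip_small_indent("     This is a blockquote.")))
```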

Owner

jbms commented Jul 11, 2022

I added a couple unit tests. These can be refined as we go, but I just wanted to test the additions in this PR.

Thanks

BTW, I had to alter array.h (see ebfe2ba) because the subsequent line's indentation was getting misinterpreted within the :param x: field. This is mostly due to how the strip_comment() function was designed. I'd be more comfortable addressing that after we start fetching entire comment blocks (instead of 1 line at a time) because the function is meant to accept an entire comment (be it a block or just 1 line) from Cursor.raw_comment; meaning the first line's indenting whitespace is not stripped.

Is the change to array.h just due to a present bug in the code, or is the issue that doxygen does not allow a continuation line of \param to be indented (or rather, interprets the indentation as a blockquote in that case)?

The strip_comment() also doesn't take into account non-docstring comments like so:

/** A short description. */
// This is not a docstring, but might get lumped into `raw_comment`'s value
auto func();

// This non-docstring comment will cause the regex check for comment prefix to
// skip the following docstr (when using `raw_comment`)
/*! This is a docstring. */
auto func(int);

Does doxygen allow non-doc comments to be interspersed with doc comments?

I guess it could be reasonable to allow, not sure how common it would be.

It would be easy to adjust the new DOC_COMMENT_PREFIX pattern to capture the docstring comments and pass the matches to strip_comment()

It sounds like the current approach, of processing each comment token individually, works fine. What do you see as the advantage to using raw_comment? It sounds like that doesn't capture all comments, and would need more parsing, compared to the current approach of extracting each comment token separately.

I suppose one issue with the current approach is how to handle stripping of a single leading space on single-line doc comments:

///Is this a doc comment?
/// And is this indented relative to it?

Collaborator Author

2bndy5 commented Jul 11, 2022

Is the change to array.h just due to a present bug in the code, or is the issue that doxygen does not allow a continuation line of \param to be indented (or rather, interprets the indentation as a blockquote in that case)?

It's mostly because Doxygen does not allow the continuation of a @param to be indented. Think of the Doxygen paragraphs like Markdown syntax, not rST. There's also a slight misconception in the function that splits the comment into lines. For example, raw_comment of a class member function is written like

class DummyClass
{
    /**
     * Some docstring.
     * @param arg This is 1 sentence of a paragraph.
     * This is the second sentence of the same paragraph.
     * @note This is a new paragraph because Doxygen doesn't allow
     * nesting certain commands that expect a single paragraph
     * @par
     * This is a second paragraph in the note because we explicitly use the `par` command
     * @warning This is a new paragraph
     * @parblock
     * This is a second paragraph because we use the `parblock` command.
     * 
     * This is a third paragraph in the warning.
     * @endparblock
     * This is a stray paragraph not in the warning.
     */
    void fun(bool arg);
}

The raw_comment value will be:

"""/**
     * Some docstring.
     * @param arg This is 1 sentence of a paragraph.
     * This is the second sentence of the same paragraph.
     * @note This is a new paragraph because Doxygen doesn't allow
     * nesting certain commands that expect a single paragraph
     * @par
     * This is a second paragraph in the note because we explicitly use the `par` command
     * @warning This is a new paragraph
     * @parblock
     * This is a second paragraph because we use the `parblock` command.
     * 
     * This is a third paragraph in the warning.
     * @endparblock
     * This is a stray paragraph not in the warning.
     */"""

with the line endings obviously.


Does doxygen allow non-doc comments to be interspersed with doc comments?

I guess it could be reasonable to allow, not sure how common it would be.

Doxygen does allow this, and I agree that it isn't all that common. I'm just trying to think of ways that might break our approach. Compensating for this wouldn't be hard (using regex). It's just something on the radar, not a really high priority.


I suppose one issue with the current approach is how to handle stripping of a single leading space on single-line doc comments:

///Is this a doc comment?
/// And is this indented relative to it?

This is why I join the stripped comment, dedent it, and split it back up again. Generally, Doxygen doesn't have to worry if the whitespace after the comment syntax isn't uniform because Markdown indents are at least 3 spaces or more.

I should note that Doxygen does mandate that comments using the /// prefix begin and end with a blank line. So, really that would be written like

///
///This is a doc comment
/// This is another doc comment within the same paragraph.
///

It sounds like the current approach, of processing each comment token individually, works fine.

We need to improve the traversal. Currently, it only returns a comment token if the comment is a single line. It is much more common to use /** ... multi-lined block ... */. The token approach also doesn't capture trailing comments used as docstrings

struct dummy
{
   int var; ///< This is the docstring.
}

However, raw_comment will easily capture the comment (whether a leading block or a trailing single line). I also did a quick test, and raw_comment does work for macros -- I'm not sure what that other extension's src comment was talking about anymore.

I just think it's overcomplicated to ignore clang's functionality and parse the source in Python (even though clang has already done it for us).

Owner

jbms commented Jul 11, 2022

Is the change to array.h just due to a present bug in the code, or is the issue that doxygen does not allow a continuation line of \param to be indented (or rather, interprets the indentation as a blockquote in that case)?

It's mostly because Doxygen does not allow the continuation of a @param to be indented. Think of the Doxygen paragraphs like Markdown syntax, not rST.

I was attempting to support both rST syntax and a subset of doxygen syntax at the same time. That also would make it relatively easy to migrate incrementally. Do you think that is still feasible, or do we have to have a configuration option to interpret as either doxygen syntax or rST syntax, which would make it significantly more difficult to migrate?

Maybe we can just convert the continuation lines after these doxygen commands appropriately.

Does doxygen allow non-doc comments to be interspersed with doc comments?
I guess it could be reasonable to allow, not sure how common it would be.

Doxygen does allow this, and I agree that it isn't all that common. I'm just trying to think of ways that might break our approach. Compensating for this wouldn't be hard (using regex). It's just something on the radar, not a really high priority.

I suppose one issue with the current approach is how to handle stripping of a single leading space on single-line doc comments:

///Is this a doc comment?
/// And is this indented relative to it?

This is why I join the stripped comment, dedent it, and split it back up again. Generally, Doxygen doesn't have to worry if the whitespace after the comment syntax isn't uniform because Markdown indents are at least 3 spaces or more.

Okay, I guess in any case when using raw_comment it won't be an issue.

I should note that Doxygen does mandate that comments using the /// prefix begin and end with a blank line. So, really that would be written like

///
///This is a doc comment
/// This is another doc comment within the same paragraph.
///

I would prefer to relax this blank line restriction --- unless it serves some important purpose.

It sounds like the current approach, of processing each comment token individually, works fine.

We need to improve the traversal. Currently, it only returns a comment token if the comment is a single line. It is much more common to use /** ... multi-lined block ... */. The token approach also doesn't capture trailing comments used as docstrings

struct dummy
{
   int var; ///< This is the docstring.
}

However, raw_comment will easily capture the comment (whether a leading block or a trailing single line). I also did a quick test, and raw_comment does work for macros -- I'm not sure what that other extension's src comment was talking about anymore.

I just think it's overcomplicated to ignore clang's functionality and parse the source in Python (even though clang has already done it for us).

If raw_comment handles all of the cases that we care about, then it sounds like that is the way to go. I wasn't aware of it when I wrote this code. It looks like it does handle multiple consecutive comments, and also looks like we can get the source range via clang_Cursor_getCommentRange C function (that we can call from Python).

Collaborator Author

2bndy5 commented Jul 12, 2022

Do you think that is still feasible, or do we have to have a configuration option to interpret as either doxygen syntax or rST syntax

We can provide a config option; that was my original intention. However, people using breathe may already have their custom doxygen aliases set up to inject actual RST in their doc strings. This often looks like

@rst
.. note:: 
    This is an admonition.

    This is a second paragraph in the note.
@endrst

Where the @rst and @endrst commands are ALIASES created in their doxygen config to pass raw RST as verbatim blocks in XML.
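
For illustration, such blocks could be unwrapped with a single regex pass. This is a sketch under assumptions: `RST_BLOCK` and `unwrap_rst_blocks()` are hypothetical names, the commands are assumed to sit on lines of their own, and the inline variant is not handled.

```python
import re
from textwrap import dedent

# Matches "@rst" / "\rst" on its own line through the matching "@endrst" line.
RST_BLOCK = re.compile(
    r"^[ \t]*[\\@]rst[ \t]*\n(.*?)^[ \t]*[\\@]endrst[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def unwrap_rst_blocks(text: str) -> str:
    """Replace each @rst ... @endrst block with its dedented contents."""
    return RST_BLOCK.sub(lambda m: dedent(m.group(1)).rstrip("\n"), text)


doc = "Intro.\n@rst\n.. note::\n    This is an admonition.\n@endrst\nOutro."
print(unwrap_rst_blocks(doc))
```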

Collaborator Author

2bndy5 commented Jul 12, 2022

There's a huge difference between Doxygen syntax (which is mostly markdown) and RST syntax. I don't think expecting to parse a mix of the 2 is sustainable.

Owner

jbms commented Jul 12, 2022

There's a huge difference between Doxygen syntax (which is mostly markdown) and RST syntax. I don't think expecting to parse a mix of the 2 is sustainable.

What would the migration path be, then?

My own opinion is that rST syntax is better than doxygen syntax, and I imagine that may naturally be a common opinion among those using Sphinx for documentation --- with doxygen, rST syntax is not an option, but here it is.

But if it has to be toggled via a global option, then it would be difficult for users to gradually make use of more rST functionality. Instead they would have to convert all of their comments at once.

I suppose the user could have two entries in cpp_apigen_configs, one that filters by filename or symbol to include only entities documented using doxygen syntax, and another entry for entities documented using rST syntax. But not sure how convenient that will be. And currently with the way separate apigen configs are handled, you can't make an entity from one config a related entity of one defined in the other config. Perhaps that could be fixed, though.

Alternatively we could have a separate regular expression to indicate which symbols are documented using doxygen syntax.
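
As a sketch of that last idea only: a per-symbol switch driven by one regular expression. The `doxygen_syntax_pattern` option, the `legacy::` namespace, and `comment_syntax()` are all hypothetical; nothing like this is defined in the extension in this thread.

```python
import re

# Hypothetical option: symbols whose qualified name matches this pattern are
# treated as documented with Doxygen syntax; everything else is rST.
doxygen_syntax_pattern = re.compile(r"^legacy::")


def comment_syntax(qualified_name: str) -> str:
    """Pick the doc-comment syntax for one entity by its qualified name."""
    if doxygen_syntax_pattern.search(qualified_name):
        return "doxygen"
    return "rst"


print(comment_syntax("legacy::Array"))
print(comment_syntax("mylib::Index"))
```

A project could then narrow the pattern over time as comments are converted, which is the incremental migration path discussed above.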

Collaborator Author

2bndy5 commented Jul 12, 2022

If they're migrating from Doxygen, then there will inevitably need to be changes made to the comments. If they're migrating from breathe, then it should be less painful because they're likely already writing their comments in rst (albeit encapsulated in @rst ... @endrst blocks). I'm not against supporting @rst/@endrst command blocks out-of-the-box because it's a widely preferred approach for documenting C++ with Doxygen+breathe[+exhale]. Inline RST has a slightly different beginning command (@rst-inline ... @endrst).

Am I understanding your use of "migrating" correctly?

I agree in that RST is the superior documenting syntax. I often look at motivations for using mkdocs or MyST parser as seriously flawed opinions. My grievances against Doxygen are endless...

Owner

jbms commented Jul 12, 2022

If they're migrating from Doxygen, then there will inevitably need to be changes made to the comments. If they're migrating from breathe, then it should be less painful because they're likely already writing their comments in rst (albeit encapsulated in @rst ... @endrst blocks). I'm not against supporting @rst/@endrst command blocks out-of-the-box because it's a widely preferred approach for documenting C++ with Doxygen+breathe[+exhale]. Inline RST has a slightly different beginning command (@rst-inline ... @endrst).

Yes, perhaps we should support them out of the box for compatibility with breathe. Does breathe somehow define these automatically, or is it up to the user to define these as aliases in their doxygen config, and these are just the recommended names? If they are just the recommended names, perhaps we should just allow the user to specify them like any other alias.

Am I understanding your use of "migrating" correctly?

I'm thinking of the case where someone has a large existing codebase that they have been using doxygen or doxygen+breathe on, and they wish to switch to this extension, and also as they write new code avoid having to litter every doc comment with @rst and @endrst.

If there is just a global option, then they would have to convert all of their existing doc comments all at once.

Owner

jbms commented Jul 12, 2022

Are you going to change this to use raw_comment ?

Collaborator Author

2bndy5 commented Jul 12, 2022

Does breathe somehow define these automatically, or is it up to the user to define these as aliases in their doxygen config, and these are just the recommended names?

Breathe recommends these names but does not automatically define them. Some people use exhale, which automatically defines these commands (based on the breathe recommendations). FYI, exhale is just a wrapper for breathe that aims to be more automated and to onboard breathe into projects more quickly; it also includes some patches for breathe...

Are you going to change this to use raw_comment?

yes, I've already got a stash for this. Currently, it will default to the line-by-line search if raw_comment is null.
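
The fallback described here might be structured roughly as follows. This is a sketch, not the stashed code: `get_doc_comment()` and the `fallback` callable are hypothetical, and a `SimpleNamespace` stands in for a `clang.cindex.Cursor` so the example runs without libclang.

```python
from types import SimpleNamespace
from typing import Callable, Optional


def get_doc_comment(
    cursor, fallback: Callable[[object], Optional[str]]
) -> Optional[str]:
    """Prefer cursor.raw_comment; use the line-by-line search otherwise."""
    raw = cursor.raw_comment
    if raw:
        return raw
    return fallback(cursor)


# Stand-ins for real cursors, one with a comment and one without.
documented = SimpleNamespace(raw_comment="/** A short description. */")
undocumented = SimpleNamespace(raw_comment=None)

print(get_doc_comment(documented, lambda c: None))
print(get_doc_comment(undocumented, lambda c: "/// found by token search"))
```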

The indentation removed here (on a subsequent
line) is interpreted as a blockquote while still
considered within the :param: field.

This change also satisfies what the Doxygen
parser expects; meaning unexpected indentation
is prohibited amid a single multi-line paragraph.
@2bndy5 2bndy5 force-pushed the some-cpp-comment-stripping-parsing branch from 722bdad to 81ffc73 Compare July 16, 2022 08:49
adjust tests about leading whitespace for single line comments
@2bndy5 2bndy5 force-pushed the some-cpp-comment-stripping-parsing branch from 7ca7c5c to 808efa8 Compare July 16, 2022 09:41
Owner

jbms commented Jul 17, 2022

I think this currently doesn't handle the case where you have multiple consecutive comments of different types, e.g. a /* comment followed by a // comment.

Collaborator Author

2bndy5 commented Jul 17, 2022

I think this currently doesn't handle the case where you have multiple consecutive comments of different types, e.g. a /* comment followed by a // comment.

Correct. I think that's a rare scenario but I could do some extra handling in split_comment_into_line().

- reverted xrefs in Config docs (now that roles are fixed)
- removed old algorithm for parsing docstring line-by-line
- added regex to remove non-docstring comments from a raw_comment
- revised docstring parsing test with pytest.mark.parametrize
- added 2 tests to check for proper removal of non-docstring comments
Owner

jbms commented Aug 12, 2022

Thanks!

This looks ready to merge!

Collaborator Author

2bndy5 commented Aug 12, 2022

Sorry about the messy git history, I was kinda hoping this would get squashed.

Also sorry for the delay, that other project that needed my attention has turned into a "whack-a-mole" carnival game. Ironically, it also uses the clang project... I'm going to need a vacation from clang when we get this feature finished. 😂

@jbms jbms merged commit c0675e9 into cpp-apigen Aug 12, 2022
@jbms jbms deleted the some-cpp-comment-stripping-parsing branch August 12, 2022 19:49
2bndy5 added a commit that referenced this pull request Aug 30, 2022
* fix cross-refs in docs

* enable cpp.apigen verbosity and show number of found decls

* adjust clang imports for quicker edits

* fix windows path separator compensation

* support `\` and `@` as cmd prefixes

- support for param direction
- support for retval cmd
- add new strip_comment() to strip all forms of C++ comment syntax from comment tokens' text
- use new strip_comment() instead of text.lstrip()
  This will also preserve consistent indentation that is needed for
  code-blocks (which isn't supported yet)
- modify index_interva.h to test some of the new features

* use explicit role for cross-refs

* admonition importance of Linux path separators

* change admonished text (per request)

* change erroneous admonition text

* allow blank docstr lines to get normalized

- add multiline flag for re.sub(brief/details) call
- ran black on api_parser.py

* add some unit tests

* add a blank line to array.h docstr

* change array.h until we support multiline comments

The indentation removed here (on a subsequent
line) is interpreted as a blockquote while still
considered within the :param: field.

This change also satisfies what the Doxygen
parser expects; meaning unexpected indentation
is prohibited amid a single multi-line paragraph.

* requested changes

* try to get raw_comment first

* update tests about comment stripping

* only dedent multiline comments during stripping

adjust tests about leading whitespace for single line comments

* [no ci] remove outdated conditional statement

* latest review requests

- reverted xrefs in Config docs (now that roles are fixed)
- removed old algorithm for parsing docstring line-by-line
- added regex to remove non-docstring comments from a raw_comment
- revised docstring parsing test with pytest.mark.parametrize
- added 2 tests to check for proper removal of non-docstring comments

* return None when no raw_comment exists

* replace non-doc comments with blank lines

also added IDs to the growing parametrized test_comment_styles()

* satisfy review request about demo src (`\returns`)
jbms pushed a commit that referenced this pull request Sep 1, 2022
jbms added a commit that referenced this pull request Sep 1, 2022
* Add default_literal_role

* Add C++ apigen extension

* Add default-literal-role and highlight-{push,pop} directives

Also adds facilities for saving/restoring default role state.

* Add {python_apigen,cpp_apigen,json_schema}_rst_{prolog,epilog} config options

This also ensures that the default roles and highlight language are
restored after generated JSON/C++/Python object descriptions are
inserted.

* Apply suggestions from code review

Co-authored-by: Brendan <2bndy5@gmail.com>

* Some cpp comment stripping parsing (#119)

* Add warning that C++ apigen is experimental

* Add missing types-clang dependency

* Add pydantic.mypy plugin

* Change "libclang" extra -> "cpp"

Co-authored-by: Brendan <2bndy5@gmail.com>