Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDK-8288699: cleanup HTML tree in HtmlDocletWriter.commentTagsToContent #9210

Conversation

jonathan-gibbons
Copy link
Contributor

@jonathan-gibbons jonathan-gibbons commented Jun 19, 2022

Please review a particularly satisfying review of a cleanup for
HtmlDocletWriter.commentTagsToContent, and the use of RawHtml.

The background for this is a separate cleanup for CommentHelper.getText,
during which it was observed that the code could be eliminated, with some
uses being replaced by HtmlDocletWriter.commentTagsToContent. That led
to more tags being processed, which in turn led to the observation that
Content.charCount was often giving incorrect results.

The primary root cause of the problem with Content.charCount was the
handling of StartElementTree in the visitor in commentTagsToContent.
The problem is that code there creates a sequence of text nodes for the
attributes, preceded by a RawHtml object containing <TAGNAME and followed
by another separate RawHtml node containing just >. The net result
is that charCount is mistaken into including the size of the intervening
text nodes in the total, because it does not recognize the "unbalanced" <
and > in the surrounding nodes.

The general system fix for commentTagsToContent is to change the visit...
methods to always append to the Content object supplied as a parameter, instead
of to a fixed local variable (result) in the enclosing scope. This allows
visitStartElement to accumulate the attributes in a temporary buffer,
and then create a single RawHtml node as the translation of the
StartElementTree. Changing the methodology of the visitors also
required disentangling the processing of {@docRoot}, which generally
always appears inside the href attribute, typically followed by
the rest of the URL. The complication there is that since forever there
has been special handling of {@docRoot}/.. which can be replaced by
the value of the --docrootparent option. In reviewing that part of the code,
it may help to just ignore the old code and just review the new replacement
code.

Starting with the code in commentTagsToContent, it became worthwhile to
use factory methods on RawHtml for stylized uses of the class, for start
element
, end element, and comment. These all have a charCount of zero,
so it only becomes necessary to compute the character count when given
arbitrary text.

Related, the Text and TextBuilder classes are changed so that they
contain unescaped content, and only escape the HTML characters when they
are written out. This allows their charCount to simply return the length()
of the string content, instead of having to parse the content to account
of any entities that may be present. This also means that newline characters
will be counted; previously, they were not.

Another noteworthy change. Somewhat usually, some resource strings for the
description of mandated methods, such as for enum and record classes, contain
simple HTML (<i>...</i>), which is modelled in a single TextTree and
subsequently converted to a non-standard use of RawHtml. To avoid
significantly changing the resource strings, the solution is to parse the
resources into a suitable List<DocTree>, using a relatively simple regular
expression. The goal is to not extend the support any further than that
provided.

Finally, one more change/bug fix. We do not claim to support HTML CDATA
sections, but there is one in the JDK API docs (in coll-index.html).
It "mostly worked" by accident, but dropped the leading <, which "broke"
the section in the generated output. I've put code into DocCommentParser
to handle CDATA sections, representing them (for now) as a Text node
containing just the section itself. This is then handled specially up in
commentTagsToContent, where it is translated to a new RawHtml node.

For reviewing, I suggest starting with RawHtml and then HtmlDocletWriter.
Many of the other changes are derivative changes, such as changing
new RawHtml(...) to RawHtml.of(...) and similar other changes.

All tests continue to pass without change.

There are minor changes to the generated docs, in two flavors.

  1. There is one instance, in Collectors.html, where the fixes to charCount
    are enough to change the separator after the type parameters in a
    method summary table from &nbsp; to <br>.
    See AbstractMemberWriter line 207. The current threshold is 10, and
    some of the methods in Collectors.html were less and are now more.

  2. A number of files use an unescaped > in doc comments. These are now
    translated to &gt; in the output. The change in behavior is because of
    changing to use html.Text nodes instead of html.RawHtml nodes to represent
    user-provided text in TextTree nodes.
    Previously, the generated output was inconsistent in this area: most
    instances used &gt;, but user-provided > was passed through.
    Now, the output is consistent.
    FWIW, tidy prefers the use of &gt; as well.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8288699: cleanup HTML tree in HtmlDocletWriter.commentTagsToContent

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9210/head:pull/9210
$ git checkout pull/9210

Update a local copy of the PR:
$ git checkout pull/9210
$ git pull https://git.openjdk.org/jdk pull/9210/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 9210

View PR using the GUI difftool:
$ git pr show -t 9210

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9210.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 19, 2022

👋 Welcome back jjg! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 20, 2022
@openjdk
Copy link

openjdk bot commented Jun 20, 2022

@jonathan-gibbons The following labels will be automatically applied to this pull request:

  • compiler
  • javadoc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added javadoc javadoc-dev@openjdk.org compiler compiler-dev@openjdk.org labels Jun 20, 2022
@mlbridge
Copy link

mlbridge bot commented Jun 20, 2022

Webrevs

Copy link
Member

@hns hns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good, apart from a few minor nitpicks in the attribute handling code.

The only remaining issue I have is that I don't (yet) fully convinced the old {@docRoot} handling code does the same thing as the new one, so I'm still working on that.

Copy link
Member

@hns hns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had another look at the old code. I think it's safe to say that the new code is much better and does the same thing as the old code.

@openjdk
Copy link

openjdk bot commented Jun 30, 2022

@jonathan-gibbons This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8288699: cleanup HTML tree in HtmlDocletWriter.commentTagsToContent

Reviewed-by: hannesw

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 1 new commit pushed to the master branch:

  • e779585: 8271707: migrate tests to use jdk.test.whitebox.WhiteBox

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 30, 2022
Copy link
Member

@hns hns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in the additional commit c312880 look good to me.

@openjdk
Copy link

openjdk bot commented Jul 8, 2022

@jonathan-gibbons this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout 8288699.commentTagsToContent
git fetch https://git.openjdk.org/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch and removed ready Pull request is ready to be integrated labels Jul 8, 2022
@openjdk openjdk bot added ready Pull request is ready to be integrated and removed merge-conflict Pull request has merge conflict with target branch labels Jul 8, 2022
@jonathan-gibbons
Copy link
Contributor Author

/integrate

@openjdk
Copy link

openjdk bot commented Jul 8, 2022

Going to push as commit 54b4576.
Since your change was applied there have been 4 commits pushed to the master branch:

  • 6aaf141: 8289984: Files:isDirectory and isRegularFile methods not throwing SecurityException
  • 1877533: 6522064: Aliases from Microsoft CryptoAPI has bad character encoding
  • 9c86c82: 8282714: synthetic arguments are being added to the constructors of static local classes
  • e779585: 8271707: migrate tests to use jdk.test.whitebox.WhiteBox

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jul 8, 2022
@openjdk openjdk bot closed this Jul 8, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jul 8, 2022
@openjdk
Copy link

openjdk bot commented Jul 8, 2022

@jonathan-gibbons Pushed as commit 54b4576.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@jonathan-gibbons jonathan-gibbons deleted the 8288699.commentTagsToContent branch November 18, 2022 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler compiler-dev@openjdk.org integrated Pull request has been integrated javadoc javadoc-dev@openjdk.org
2 participants