8248001: javadoc generates invalid HTML pages whose ftp:// links are broken #5198

Closed
masyano wants to merge 9 commits

masyano commented Aug 20, 2021

Could you please review this fix for JDK-8248001?

The problem is that javadoc generates invalid HTML pages in which ftp:// links are broken. The fix stops checking for specific protocol names and uses a regular expression instead.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8248001: javadoc generates invalid HTML pages whose ftp:// links are broken

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5198/head:pull/5198
$ git checkout pull/5198

Update a local copy of the PR:
$ git checkout pull/5198
$ git pull https://git.openjdk.java.net/jdk pull/5198/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 5198

View PR using the GUI difftool:
$ git pr show -t 5198

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5198.diff

bridgekeeper bot commented Aug 20, 2021

👋 Welcome back myano! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Aug 20, 2021

@masyano The following label will be automatically applied to this pull request:

  • javadoc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the javadoc javadoc-dev@openjdk.org label Aug 20, 2021
@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 20, 2021
mlbridge bot commented Aug 20, 2021

Webrevs

- || lower.startsWith("http:")
- || lower.startsWith("https:")
- || lower.startsWith("file:")) {
+ if (text.matches("^[^:/?#]+:.+$")) {
Member

Why not use java.net.URI API to determine if the link contains a scheme?

Member

dfuch commented Aug 20, 2021

I assume the link is in an HTML document and goes into an HTML document. If you wanted to use java.net.URI, depending on where text comes from and where it goes, you might first need to decode it using URLDecoder, and then you might need to re-encode it before spitting it out... That's a lot of operations where things could go wrong, especially if the link contains a query string.

Member

dfuch commented Aug 20, 2021

That said, a stricter regexp (unless I'm mistaken) could be: ^[a-zA-Z][a-zA-Z0-9+\-\.]*:.+$
[ from RFC 2396: scheme = alpha *( alpha | digit | "+" | "-" | "." ) ]
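To make the difference between the two patterns concrete, here is a minimal sketch that compares the loose pattern from the diff with the stricter one proposed above. The class and method names are hypothetical and are not taken from the javadoc sources. The patterns are held in constants only so that they are compiled once.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class SchemePatterns {
    // Loose pattern from the diff under review: anything before the first ':' counts as a scheme.
    static final Pattern LOOSE = Pattern.compile("^[^:/?#]+:.+$");

    // Stricter pattern following the RFC 2396 scheme grammar quoted above.
    static final Pattern STRICT = Pattern.compile("^[a-zA-Z][a-zA-Z0-9+\\-.]*:.+$");

    static boolean looseMatch(String text) {
        return LOOSE.matcher(text).matches();
    }

    static boolean strictMatch(String text) {
        return STRICT.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"ftp://example.com/pub", "pkg/J1.html", "1 bad:value"};
        for (String s : samples) {
            System.out.println(s + " loose=" + looseMatch(s) + " strict=" + strictMatch(s));
        }
    }
}
```

Both patterns accept ftp://example.com/pub and reject the relative path pkg/J1.html. Only the loose pattern accepts "1 bad:value", whose prefix is not a valid scheme. Note that even the strict pattern accepts any well-formed scheme, including javascript:.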

Member

My only concerns were correctness and code reuse. Using an API doesn't require one to read through the RFC.

Contributor

I'd argue for simply adding ftp: as an additional condition (unless there are other interesting URI schemes I'm not thinking of?). If a regex is to be used, I agree it should be much stricter (and defined in a constant, so that the Pattern doesn't need to be compiled on each invocation?).

Member

I would normally opt for a generic regexp-based solution such as the one proposed by @dfuch, but there is a security aspect to this as well (e.g. script invocation), so I'd go with the more conservative approach here and just add the ftp: protocol to the list.
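As a rough illustration of the conservative approach, the sketch below checks a link against a fixed list of scheme prefixes. The class name, method name, and list constant are hypothetical. The list contains only the prefixes visible in this thread, and the real code may check more.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class SchemeAllowList {
    // Prefixes visible in this review thread; the actual list may contain more entries.
    static final List<String> KNOWN_PREFIXES = List.of("http:", "https:", "file:", "ftp:");

    // Returns true only if the link starts with one of the explicitly listed schemes.
    static boolean hasKnownScheme(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        for (String prefix : KNOWN_PREFIXES) {
            if (lower.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasKnownScheme("FTP://example.com/pub"));
        System.out.println(hasKnownScheme("javascript:alert(1)"));
        System.out.println(hasKnownScheme("pkg/J1.html"));
    }
}
```

With an explicit list, a scheme such as javascript: is never recognized unless someone deliberately adds it, which is the security trade-off described above. The cost is that every newly supported scheme needs a code change.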

Author

I derived the regex ^[^:/?#]+:.+$ from the following description in RFC 2396:

B. Parsing a URI Reference with a Regular Expression

   The following line is the regular expression for breaking-down a URI
   reference into its components.

      ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
       12            3  4          5       6  7        8 9
  ...
   Therefore, we can determine the value of the four components and fragment as
  ...
      scheme    = $2

I agree that adding ftp: is better from a security viewpoint. However, in addition to ftp, schemes such as javascript and git may be specified, so it's difficult to cover all commonly used schemes.

Member

That regexp will correctly break the URI into its different components, but it doesn't guarantee that each component is syntactically correct, as further syntax restrictions may apply to each of the components.
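The point about splitting versus validating can be seen in a small sketch. It applies the Appendix B expression quoted above, reads group 2 as the scheme candidate, and then checks that candidate separately against the RFC 2396 scheme grammar. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class UriSplitSketch {
    // Regular expression from RFC 2396, Appendix B, as quoted above.
    static final Pattern SPLIT = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Scheme grammar from RFC 2396: alpha *( alpha | digit | "+" | "-" | "." )
    static final Pattern SCHEME = Pattern.compile("[a-zA-Z][a-zA-Z0-9+\\-.]*");

    // Returns group 2 (the scheme candidate), or null if the text has no "xxx:" prefix.
    static String schemeCandidate(String text) {
        Matcher m = SPLIT.matcher(text);
        return m.lookingAt() ? m.group(2) : null;
    }

    // Checks the candidate against the scheme grammar; splitting alone does not do this.
    static boolean isValidScheme(String candidate) {
        return candidate != null && SCHEME.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"ftp://example.com/pub/file.txt", "pkg/J1.html", "not a scheme:rest"};
        for (String s : samples) {
            String candidate = schemeCandidate(s);
            System.out.println(s + " -> " + candidate + " valid=" + isValidScheme(candidate));
        }
    }
}
```

For "not a scheme:rest" the splitting expression reports "not a scheme" as group 2, even though that string does not satisfy the scheme grammar.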

Author

masyano commented Sep 7, 2021

Thank you for your comments. I would argue for simply adding ftp:.

Author

masyano commented Sep 13, 2021

I pushed the fix. Can someone review it?

Member

hns left a comment

Thank you for choosing the conservative approach! The fix looks good.

Using actual domain names in the test is problematic. We should use example.com, which has been reserved explicitly for this kind of purpose.

Also, I'm not convinced we have to introduce a new test for this, especially with the rather generic name of "TestHtmlDocletWriter". It seems there is an existing test for the feature in test/langtools/jdk/javadoc/doclet/testHrefInDocComment/TestHrefInDocComment.java. Would it be possible to update/enhance the existing test? If there is a good reason to introduce a new test, it should have a more telling name.

Comment on lines +1705 to +1706
|| lower.startsWith("file:")
|| lower.startsWith("ftp:")) {
Contributor

Adding ftp: is OK, but given that the method is about modifying relative URLs, a reasonable/preferable alternative would be to use URI.isAbsolute.

If a URISyntaxException occurs while creating the URI, I would suggest it should simply return text and not try and modify the text.
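The alternative described here could look roughly like the following sketch. The class name, the method name redirect, and the pathToRoot parameter are hypothetical stand-ins for whatever the real method does to relative links. Only java.net.URI and URISyntaxException are actual JDK APIs.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, for illustration only.
final class RelativeLinkSketch {
    // Prefixes relative links with pathToRoot; absolute or unparsable links are returned as-is.
    static String redirect(String text, String pathToRoot) {
        try {
            if (new URI(text).isAbsolute()) {
                return text; // has a scheme, so leave it alone
            }
        } catch (URISyntaxException e) {
            return text; // not a parsable URI: do not try to modify it
        }
        return pathToRoot + text;
    }

    public static void main(String[] args) {
        System.out.println(redirect("ftp://example.com/pub", "../"));
        System.out.println(redirect("pkg/J1.html", "../"));
        System.out.println(redirect("has space.html", "../"));
    }
}
```

URI.isAbsolute returns true for any URI with a scheme, so this variant also leaves links such as javascript:alert(1) untouched. That is the same breadth that prompted the security concern raised earlier in the thread.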

Author

Using the java.net.URI API has already been discussed above: #5198 (comment)

The regex-based approach has a security problem because it matches schemes other than ftp, so I adopted @hns's suggestion.

Author

@jonathan-gibbons Could you reply to the above comment?

Member

@masyano I think it's ok this way. After all, this is a simple bug and we have spent more than enough time on bike-shedding. If we want to improve the mechanism by using java.net.URI we can file a separate issue for it.

Author

masyano commented Sep 24, 2021

@hns As you suggested, I added the test code to test/langtools/jdk/javadoc/doclet/testHrefInDocComment instead of introducing a new test. I also changed the test to use www.example.com instead of www.domain.com.

Member

hns commented Sep 27, 2021

@masyano thanks, this helps to not inflate the size of our test suite more than necessary.

What is missing from the updated test is something like the checkHtml method in your original test that verifies the correct links in the generated files. Also, the copyright date in the TestHrefInDocComment.java should be updated to 2021.

Author

masyano commented Sep 28, 2021

I updated the copyright date to 2021.

I think checkHtml is not necessary because JavadocTester already checks links with checkLinks(). The following is the log from a run without the fix applied:

  Checked 11 files.
  Found 189 references to 54 anchors in 16 files and 6 other URIs.
*      1 missing files
       0 duplicate ids
       0 missing ids
  Schemes
       1 file
       1 ftp
       1 http
       2 https
       1 mailto
  Hosts
       1 www.example.com
       2 docs.oracle.com
       1 www.exsample.com
FAILED: 1 errors found when checking links
        at javadoc.tester.JavadocTester.checkLinks(JavadocTester.java:565)
        at javadoc.tester.JavadocTester.javadoc(JavadocTester.java:384)
        at TestHrefInDocComment.test(TestHrefInDocComment.java:46)

Member

hns left a comment

Looks good!

openjdk bot commented Sep 30, 2021

@masyano This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8248001: javadoc generates invalid HTML pages whose ftp:// links are broken

Reviewed-by: hannesw

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 315 new commits pushed to the master branch:

  • 2f955d6: 8273142: Remove dependancy of TestHttpServer, HttpTransaction, HttpCallback from open/test/jdk/sun/net/www/protocol/http/ tests
  • 94e31e5: 8274506: TestPids.java and TestPidsLimit.java fail with podman run as root
  • a8210c5: 8274401: C2: GraphKit::load_array_element bypasses Access API
  • dfc557c: 8274406: RunThese30M.java failed "assert(!LCA_orig->dominates(pred_block) || early->dominates(pred_block)) failed: early is high enough"
  • c0533ef: 8274522: java/lang/management/ManagementFactory/MXBeanException.java test fails with Shenandoah
  • f8415a9: 8274523: java/lang/management/MemoryMXBean/MemoryTest.java test should handle Shenandoah
  • 355356c: 8273435: Remove redundant zero-length check in ClassDesc.of
  • 97385d4: 8274405: Suppress warnings on non-serializable non-transient instance fields in javac and javadoc
  • 79cebe2: 8274050: Unnecessary Vector usage in javax.crypto
  • 97b2874: 8274509: Remove stray * and stylistic . from doc comments
  • ... and 305 more: https://git.openjdk.java.net/jdk/compare/a522d6b53cd841b4bfe87eac5778c9e5cdf5e90f...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@hns) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 30, 2021
Member

hns commented Sep 30, 2021

@masyano thanks for the fix, and your patience in revising it. I can sponsor the PR for you.

Author

masyano commented Sep 30, 2021

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Sep 30, 2021
openjdk bot commented Sep 30, 2021

@masyano
Your change (at version b68c760) is now ready to be sponsored by a Committer.

Author

masyano commented Sep 30, 2021

@hns Thank you for approving and sponsoring.

Member

hns left a comment

Sorry, I have to revoke my approval. I just saw there are several unclosed tags in the new test file, and the test fails.

@openjdk openjdk bot removed sponsor Pull request is ready to be sponsored ready Pull request is ready to be integrated labels Sep 30, 2021
package pkg;

/**
*This class has <a href="{@docRoot}/pkg/J1.html#functions">various functions</
Member

Unclosed a element

Author

Sorry, the last few characters were trimmed. I fixed it.

protected Object field1;

/**
*<a href="{@docRoot}/pkg/J1.html#functions">Creates an instance which has
Member

Another unclosed a element

Author

Sorry, the last few characters were trimmed. I fixed it.

Member

hns left a comment

Thanks, looks good now!

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 30, 2021
Author

masyano commented Sep 30, 2021

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Sep 30, 2021
openjdk bot commented Sep 30, 2021

@masyano
Your change (at version c9d6489) is now ready to be sponsored by a Committer.

Member

hns commented Sep 30, 2021

/sponsor

openjdk bot commented Sep 30, 2021

Going to push as commit bb95dda.
Since your change was applied there have been 315 commits pushed to the master branch:

  • 2f955d6: 8273142: Remove dependancy of TestHttpServer, HttpTransaction, HttpCallback from open/test/jdk/sun/net/www/protocol/http/ tests
  • 94e31e5: 8274506: TestPids.java and TestPidsLimit.java fail with podman run as root
  • a8210c5: 8274401: C2: GraphKit::load_array_element bypasses Access API
  • dfc557c: 8274406: RunThese30M.java failed "assert(!LCA_orig->dominates(pred_block) || early->dominates(pred_block)) failed: early is high enough"
  • c0533ef: 8274522: java/lang/management/ManagementFactory/MXBeanException.java test fails with Shenandoah
  • f8415a9: 8274523: java/lang/management/MemoryMXBean/MemoryTest.java test should handle Shenandoah
  • 355356c: 8273435: Remove redundant zero-length check in ClassDesc.of
  • 97385d4: 8274405: Suppress warnings on non-serializable non-transient instance fields in javac and javadoc
  • 79cebe2: 8274050: Unnecessary Vector usage in javax.crypto
  • 97b2874: 8274509: Remove stray * and stylistic . from doc comments
  • ... and 305 more: https://git.openjdk.java.net/jdk/compare/a522d6b53cd841b4bfe87eac5778c9e5cdf5e90f...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Sep 30, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Sep 30, 2021
openjdk bot commented Sep 30, 2021

@hns @masyano Pushed as commit bb95dda.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
