New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8248001: javadoc generates invalid HTML pages whose ftp:// links are broken #5198
Conversation
👋 Welcome back myano! A progress list of the required criteria for merging this PR into |
Webrevs
|
|| lower.startsWith("http:") | ||
|| lower.startsWith("https:") | ||
|| lower.startsWith("file:")) { | ||
if (text.matches("^[^:/?#]+:.+$")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use java.net.URI
API to determine if the link contains a scheme?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the link is in an HTML document and goes in an HTML document. If you wanted to use java.net.URI, depending on where from text
comes from and whereto it goes, you might need first to decode it using URLDecoder, and then you might need to re-encode it before spitting it out... That's a lot of operations where things could go wrong, especially if the link contains a query string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said a stricter regexp (unless I'm mistaken) could be: ^[a-zA-Z][a-zA-Z0-9+\-\.]*:.+$
[ from RFC 2396: scheme = alpha *( alpha | digit | "+" | "-" | "." ) ]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concerns were correctness and code reuse. Using an API doesn't require one to read through RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue for simply adding ftp:
as an additional condition (unless there's other interesting URI schemes I'm not thinking of?). If a regex is to be used, I agree it should be much stricter (and defined in a constant, so that the Pattern
doesn't need to be compiled on each invocation?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would normally opt for a generic regexp-based solution such as proposed by @dfuch, but there is a security aspect to this as well (e.g. script invocation), so I'd go with the more conservative approach here to just add ftp:
protocol to the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided the regex ^[^:/?#]+:.+$
from the description in RFC 2396.
B. Parsing a URI Reference with a Regular Expression
The following line is the regular expression for breaking-down a URI
reference into its components.
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
12 3 4 5 6 7 8 9
...
Therefore, we can determine the value of the four components and fragment as
...
scheme = $2
I agree that adding ftp:
is better for the viewpoint of security. However, in addition to ftp, schemes such as javascript and git may be specified, so it's difficult to cover all commonly used schemes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That regexp will correctly break the URI into its different components but it doesn't guarantee that each of the component is syntactically correct - as further syntax restriction may apply on each of the components.
Thank you for your commnets. I would argue for simply adding |
I pushed the fix, can someone review it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for choosing the conservative approach! The fix looks good.
Using actual domain names in the test is problematic. We should use example.com
, which has been reserved explicitly for this kind of purpose.
Also, I'm not convinced we have to introduced a new test for this, especially with the rather generic name of "TestHtmlDocletWriter". It seems there is an existing test for the feature in test/langtools/jdk/javadoc/doclet/testHrefInDocComment/TestHrefInDocComment.java
. Would it be possible to update/enhance the existing test? If there is a good reason to introduce a new test, it should have a more telling name.
|| lower.startsWith("file:") | ||
|| lower.startsWith("ftp:")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding ftp:
is OK, but given that the method is about modifying relative URLs, a reasonable/preferable alternative would be to use URI.isAbsolute
If a URISyntaxException
occurs while creating the URI, I would suggest it should simply return text
and not try and modify the text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way to use java.net.URI API has already been discussed above. #5198 (comment)
The method using regex has a security problem because it matches patterns other than ftp. So I adopted @hns 's suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonathan-gibbons Could you reply to the above comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@masyano I think it's ok this way. After all, this is a simple bug and we have spent more than enough time on bike-shedding. If we want to improve the mechanism by using java.net.URI
we can file a separate issue for it.
@hns As you commented, I changed to add test codes in test/langtools/jdk/javadoc/doclet/testHrefInDocComment, |
@masyano thanks, this helps to not inflate the size of our test suite more than necessary. What is missing from the updated test is something like the
|
I updated the copyright date to 2021. I think
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
@masyano This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 315 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@hns) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
@masyano thanks for the fix, and your patience in revising it. I can sponsor the PR for you. |
/integrate |
@hns Thank you for approving and sponsoring. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I have to revoke my approval. I just saw there are several unclosed tags in the new test file, and the test fails.
package pkg; | ||
|
||
/** | ||
*This class has <a href="{@docRoot}/pkg/J1.html#functions">various functions</ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclosed a
element
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, the last few characters were trimmed. I fixed it.
protected Object field1; | ||
|
||
/** | ||
*<a href="{@docRoot}/pkg/J1.html#functions">Creates an instance which has |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another unclosed a
element
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, the last few characters were trimmed. I fixed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good now!
/integrate |
/sponsor |
Going to push as commit bb95dda.
Your commit was automatically rebased without conflicts. |
Could you please review the 8248001 bug fixes?
The problem is that javadoc generates invalid HTML pages whose ftp:// links are broken. The fix changes not to use protocol names directly, but to use regular expression.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5198/head:pull/5198
$ git checkout pull/5198
Update a local copy of the PR:
$ git checkout pull/5198
$ git pull https://git.openjdk.java.net/jdk pull/5198/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 5198
View PR using the GUI difftool:
$ git pr show -t 5198
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5198.diff