8316734: URLEncoder should specify that replacement bytes will be used in case of coding error#16709
DarraghClarke wants to merge 4 commits into openjdk:master
 * @throws IllegalArgumentException if the implementation encounters illegal
 * characters
 * @throws IllegalArgumentException if the implementation encounters malformed
 * escape sequences
The method specifies that it throws IAE, the implNote seems to be saying the same thing, do I read this correctly? I'm wondering if the implNote can be removed.
That's a good point. I see that there's another decode(String, String) method above in this file that has the same old @implNote but not @throws. Maybe the implNote should be removed there too and the @throws added.
Not sure it's worth touching the first @Deprecated decode(String) method though. Opinions?
That's a good point. I see that there's another decode(String, String) method above in this file that has the same old @implNote but not @throws. Maybe the implNote should be removed there too and the @throws added. Not sure it's worth touching the first @Deprecated decode(String) method though. Opinions?
Since we are editing these method descriptions, it's probably best to add the @throws IAE to the other 2-arg decode method. I suppose the 1-arg/deprecated decode method should document the exception too, though that doesn't need to be done in this PR of course.
I'd be happy to change all in this PR if there are no objections
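For readers following the @throws discussion above, here is a minimal sketch of the two malformed-escape cases that make URLDecoder.decode throw IllegalArgumentException. The class name MalformedEscapeDemo and the helper throwsIae are hypothetical names used only for illustration; they are not part of the PR.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class MalformedEscapeDemo {
    // Hypothetical helper: returns true if decoding the input throws
    // IllegalArgumentException, false if decoding succeeds.
    static boolean throwsIae(String input) {
        try {
            URLDecoder.decode(input, StandardCharsets.UTF_8);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsIae("%zz"));   // non-hex characters after '%'
        System.out.println(throwsIae("abc%4")); // incomplete trailing escape
        System.out.println(throwsIae("a%20b")); // well-formed escape sequence
    }
}
```

Both malformed inputs are rejected with an exception, while the well-formed input decodes normally. This is distinct from the replacement behavior discussed later in the thread, which applies to well-formed escapes whose bytes are not valid in the charset.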
 * <p>
 * If any consecutive well-formed escape sequences cannot
 * be decoded as a sequence of characters in the supplied {@code Charset}
 * {@linkplain java.nio.charset.CharsetDecoder##cae the replacement character} will be used.
I think it would be a bit clearer to say that erroneous bytes are replaced with the Charset's replacement value.
Thanks for the suggestion, I just wanted to make sure I was understanding you correctly before committing the change.
Would it be something like this?
* Erroneous bytes are replaced with the supplied {@code Charset}'s
* {@linkplain java.nio.charset.CharsetDecoder##cae replacement value}.
I am OK with new text on the condition it is moved inside the paragraph that talks about decoding (appended to lines 147-148 above):
* The supplied charset is used to determine
* what characters are represented by any consecutive escape sequences of
* the form "<i>{@code %xy}</i>". Erroneous bytes are replaced with the
* supplied {@code Charset}'s {@linkplain java.nio.charset.CharsetDecoder##cae
* replacement value}.
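A small sketch of the decoding behavior this wording describes, assuming UTF-8 as the supplied charset. The byte 0xFF is never valid in UTF-8, so the well-formed escape %FF cannot be decoded and is replaced rather than rejected. The class name DecodeReplacementDemo and the helper decodeUtf8 are illustrative only.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DecodeReplacementDemo {
    // Hypothetical helper: decodes with UTF-8 as the supplied charset.
    static String decodeUtf8(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "%FF" is a well-formed escape, but 0xFF is not a valid UTF-8 byte,
        // so the decoder substitutes the replacement character U+FFFD.
        String decoded = decodeUtf8("a%FFb");
        System.out.println(decoded.equals("a\uFFFDb")); // prints "true"
    }
}
```

No exception is thrown in this case, which is why the reviewers wanted the replacement behavior stated in the paragraph that describes decoding.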
dfuch left a comment
LGTM. Maybe wait until @AlanBateman has had a chance to re-review before integrating.
 * UTF-8 should be used. Not doing so may introduce incompatibilities.</em>
 * <p>
 * If a character needs encoding but cannot be encoded, the
 * {@linkplain CharsetEncoder##cae replacement bytes} will be used.
I think this text will appear in the "Note" section of the method description. We are adding normative text, so I think it would be better if the new text went into the first paragraph, or if a new paragraph were introduced before the "Note". We could replace the "Note" heading with @apiNote if you want to clean this up.
As regards the text, I think it would be more correct to say that if the input string is malformed, or if the input cannot be mapped to a valid byte sequence in the given charset, then the erroneous input will be replaced with the charset's replacement value.
Thanks for the input Alan, I pushed a commit that makes use of @apiNote and changed the wording of the text. Let me know if there is anything else that could be improved
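To illustrate the encoding side of the wording discussed above, here is a minimal sketch assuming ISO-8859-1 as the given charset. The euro sign (U+20AC) has no mapping in ISO-8859-1, so its encoder's replacement byte, '?' (0x3F), is what gets percent-encoded. The class name EncodeReplacementDemo and the helper encodeLatin1 are illustrative only.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeReplacementDemo {
    // Hypothetical helper: encodes with ISO-8859-1 as the given charset.
    static String encodeLatin1(String s) {
        return URLEncoder.encode(s, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) maps to the single byte 0xE9.
        System.out.println(encodeLatin1("\u00E9")); // prints "%E9"
        // U+20AC (euro sign) is unmappable, so the replacement byte '?'
        // (0x3F) is emitted and percent-encoded instead.
        System.out.println(encodeLatin1("\u20AC")); // prints "%3F"
    }
}
```

The result for the euro sign is indistinguishable from encoding a literal question mark, which is one reason the Javadoc recommends UTF-8.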
AlanBateman left a comment
Thanks for taking the feedback on this, I think both classes look much better now.
@DarraghClarke This change now passes all automated pre-integration checks.
/integrate
Going to push as commit 48960df.
Your commit was automatically rebased without conflicts.
@DarraghClarke Pushed as commit 48960df.
Currently the descriptions of URLEncoder.encode and URLDecoder.decode don't specify their use of replacement bytes or replacement character when they cannot handle a character or sequence of bytes. This is longstanding behavior but needs to be documented.
Solution
Added a new line to the URLEncoder.encode API documentation to document that the charset's replacement bytes are used.
Also changed the URLDecoder.decode API documentation to document its use of the charset's replacement character, and changed some wording.
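Because replacement happens silently, a caller who needs to detect lossy encoding has to check separately. The following is one possible sketch, not something proposed in this PR: it uses CharsetEncoder.canEncode to test the input before calling URLEncoder.encode. The class name StrictEncodeCheck and the helper isLossless are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StrictEncodeCheck {
    // Hypothetical helper: returns true if every character of the input can
    // be encoded in the charset, i.e. URLEncoder.encode would not need to
    // substitute replacement bytes.
    static boolean isLossless(String input, Charset charset) {
        // A fresh encoder is created per call because encoders are stateful.
        return charset.newEncoder().canEncode(input);
    }

    public static void main(String[] args) {
        System.out.println(isLossless("caf\u00E9", StandardCharsets.ISO_8859_1)); // mappable
        System.out.println(isLossless("\u20AC", StandardCharsets.ISO_8859_1));    // unmappable
        System.out.println(isLossless("\uD800", StandardCharsets.UTF_8));         // lone surrogate
    }
}
```

The first call reports that the input is encodable; the other two report that it is not, covering both the unmappable-character case and the malformed-input case mentioned in the review.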
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16709/head:pull/16709
$ git checkout pull/16709
Update a local copy of the PR:
$ git checkout pull/16709
$ git pull https://git.openjdk.org/jdk.git pull/16709/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 16709
View PR using the GUI difftool:
$ git pr show -t 16709
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16709.diff