8316734: URLEncoder should specify that replacement bytes will be used in case of coding error #16709

Closed
wants to merge 4 commits into from

Conversation

DarraghClarke
Contributor

@DarraghClarke DarraghClarke commented Nov 17, 2023

Currently the descriptions of URLEncoder.encode and URLDecoder.decode don't specify their use of replacement bytes or replacement character when they cannot handle a character or sequence of bytes. This is longstanding behavior but needs to be documented.

Solution

  • Added a new line to URLEncoder.encode API documentation to document that the charset's replacement bytes are used.

  • Changed the URLDecoder.decode API documentation to document its use of the charset's replacement character, and adjusted some wording.
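
As a rough illustration of the longstanding behavior this PR documents (a sketch, not code from the PR): when a character has no mapping in the target charset, `URLEncoder.encode(String, Charset)` percent-encodes the charset encoder's replacement bytes instead of throwing. The class name `EncoderReplacementDemo`, its helper methods, and the sample input are made up for the example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncoderReplacementDemo {

    // Hypothetical helper: encodes with US-ASCII, which cannot represent
    // characters such as the euro sign (U+20AC).
    static String encodeAscii(String s) {
        return URLEncoder.encode(s, StandardCharsets.US_ASCII);
    }

    // Hypothetical helper: the replacement bytes the US-ASCII encoder
    // substitutes for input it cannot encode.
    static byte[] asciiReplacement() {
        return StandardCharsets.US_ASCII.newEncoder().replacement();
    }

    public static void main(String[] args) {
        // '=' is percent-encoded as usual; the euro sign is unmappable,
        // so the replacement byte '?' (0x3F) is encoded in its place.
        System.out.println(encodeAscii("price=10\u20AC")); // price%3D10%3F
        System.out.println((char) asciiReplacement()[0]);  // ?
    }
}
```

Note that the substitution is silent: the caller cannot tell from the result alone whether `%3F` came from a literal `?` or from a replaced character.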


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Change requires CSR request JDK-8318004 to be approved
  • Commit message must refer to an issue

Issues

  • JDK-8316734: URLEncoder should specify that replacement bytes will be used in case of coding error (Bug - P4)
  • JDK-8318004: URLEncoder should specify that replacement bytes will be used in case of coding error (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16709/head:pull/16709
$ git checkout pull/16709

Update a local copy of the PR:
$ git checkout pull/16709
$ git pull https://git.openjdk.org/jdk.git pull/16709/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16709

View PR using the GUI difftool:
$ git pr show -t 16709

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16709.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Nov 17, 2023

👋 Welcome back dclarke! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the csr (Pull request needs approved CSR before integration) and rfr (Pull request is ready for review) labels Nov 17, 2023
@openjdk

openjdk bot commented Nov 17, 2023

@DarraghClarke The following label will be automatically applied to this pull request:

  • net

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the net (net-dev@openjdk.org) label Nov 17, 2023
@mlbridge

mlbridge bot commented Nov 17, 2023

Webrevs

* @throws IllegalArgumentException if the implementation encounters illegal
* characters
* @throws IllegalArgumentException if the implementation encounters malformed
* escape sequences
Contributor

The method specifies that it throws IAE, and the implNote seems to say the same thing. Do I read this correctly? I'm wondering if the implNote can be removed.

Member

That's a good point. I see that there's another decode(String, String) method above in this file that has the same old @implNote but not @throws. Maybe the implNote should be removed there too and the @throws added.
Not sure it's worth touching the first @Deprecated decode(String) method though. Opinions?

Contributor

> That's a good point. I see that there's another decode(String, String) method above in this file that has the same old @implNote but not @throws. Maybe the implNote should be removed there too and the @throws added. Not sure it's worth touching the first @Deprecated decode(String) method though. Opinions?

Since we are editing these method descriptions, it's probably best to add the throws IAE to the other 2-arg decode method. I suppose the 1-arg/deprecated decode method should document the exception too, though that doesn't need to be done in this PR of course.

Contributor Author

I'd be happy to change all in this PR if there are no objections

Member

Yes - let's fix it all here.
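
For readers unfamiliar with the exception discussed in this thread, the following sketch shows the kind of input that makes `URLDecoder.decode` throw `IllegalArgumentException`. It is an illustration only, not code from the PR; the class name `MalformedEscapeDemo` and the helper `isRejected` are hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class MalformedEscapeDemo {

    // Hypothetical helper: returns true if decoding the input is rejected
    // with IllegalArgumentException, false if it decodes normally.
    static boolean isRejected(String encoded) {
        try {
            URLDecoder.decode(encoded, StandardCharsets.UTF_8);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRejected("a%zzb")); // non-hex digits after '%'
        System.out.println(isRejected("a%2"));   // incomplete trailing escape
        System.out.println(isRejected("a%20b")); // well-formed escape
    }
}
```

The first two inputs are malformed escape sequences and are rejected; the third decodes to "a b".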

* <p>
* If any consecutive well-formed escape sequences cannot
* be decoded as a sequence of characters in the supplied {@code Charset}
* {@linkplain java.nio.charset.CharsetDecoder##cae the replacement character} will be used.
Contributor

I think it would be a bit clearer to say that erroneous bytes are replaced with the Charset's replacement value.

Contributor Author

Thanks for the suggestion, I just wanted to make sure I was understanding you correctly before committing the change.

Would it be something like this?

     * Erroneous bytes are replaced with the supplied {@code Charset}'s
     * {@linkplain java.nio.charset.CharsetDecoder##cae replacement value}.

Member

@dfuch dfuch Nov 21, 2023

I am OK with new text on the condition it is moved inside the paragraph that talks about decoding (appended to lines 147-148 above):

     * The supplied charset is used to determine
     * what characters are represented by any consecutive escape sequences of
     * the form "<i>{@code %xy}</i>". Erroneous bytes are replaced with the 
     * supplied {@code Charset}'s {@linkplain java.nio.charset.CharsetDecoder##cae
     * replacement value}.
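
The distinction in this thread is between escape sequences that are malformed (rejected with an exception) and ones that are well-formed but carry bytes that are invalid in the supplied charset (replaced). A minimal sketch of the second case, assuming UTF-8 as the charset; the class name `DecoderReplacementDemo` and its helpers are hypothetical and not part of the PR.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DecoderReplacementDemo {

    // Hypothetical helper: decodes percent-escapes as UTF-8.
    static String decodeUtf8(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: the replacement value of the UTF-8 decoder.
    static String utf8Replacement() {
        return StandardCharsets.UTF_8.newDecoder().replacement();
    }

    public static void main(String[] args) {
        // "%FF" is a well-formed escape, but the byte 0xFF never occurs in
        // valid UTF-8, so it is replaced rather than rejected.
        String decoded = decodeUtf8("a%FFb");
        System.out.println(decoded.equals("a\uFFFDb"));          // true
        System.out.println(utf8Replacement().equals("\uFFFD"));  // true
    }
}
```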

Member

@dfuch dfuch left a comment

LGTM. Maybe wait until @AlanBateman has had a chance to re-review before integrating.

@@ -204,6 +204,9 @@ public static String encode(String s, String enc)
* "http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars">
* World Wide Web Consortium Recommendation</a> states that
* UTF-8 should be used. Not doing so may introduce incompatibilities.</em>
* <p>
* If a character needs encoding but cannot be encoded, the
* {@linkplain CharsetEncoder##cae replacement bytes} will be used.
Contributor

I think this text will appear in the "Note" section of the method description. We are adding normative text, so I think it would be better if the new text went into the first paragraph, or we could introduce a new paragraph before the "Note". We could replace the "Note" heading with @apiNote if you want to clean this up.

As regards the text, I think it would be more correct to say that if the input string is malformed, or if the input cannot be mapped to a valid byte sequence in the given charset, then the erroneous input will be replaced with the charset's replacement value.

Contributor Author

Thanks for the input, Alan. I pushed a commit that makes use of @apiNote and changed the wording of the text. Let me know if there is anything else that could be improved.
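
The "malformed input" case mentioned in the review can be illustrated with an unpaired surrogate, which is not a valid character on its own. This is a sketch under the assumption that UTF-8 is the target charset; the class name `MalformedInputDemo` and the helper `encodeUtf8` are hypothetical and do not come from the PR.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class MalformedInputDemo {

    // Hypothetical helper: percent-encodes the input using UTF-8.
    static String encodeUtf8(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \uD800 is a high surrogate with no following low surrogate, so the
        // string is malformed. The encoder substitutes its replacement byte
        // '?' (0x3F) instead of throwing.
        System.out.println(encodeUtf8("a\uD800b")); // a%3Fb
    }
}
```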

Contributor

@AlanBateman AlanBateman left a comment

Thanks for taking the feedback on this, I think both classes look much better now.

@openjdk

openjdk bot commented Nov 28, 2023

@DarraghClarke This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8316734: URLEncoder should specify that replacement bytes will be used in case of coding error

Reviewed-by: dfuchs, alanb

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 162 new commits pushed to the master branch:

  • 69c0b24: 8320714: java/util/Locale/LocaleProvidersRun.java and java/util/ResourceBundle/modules/visibility/VisibilityTest.java timeout after passing
  • 66ae6d5: 8320899: Select the correct Makefile when running make in build directory
  • ebbef62: 8320769: Remove ill-adviced "make install" target
  • 86bb804: 8320863: dsymutil command leaves around temporary directories
  • db7fedf: 8320358: GHA: ignore jdk* branches
  • e33b6c1: 8319437: NMT should show library names in call stacks
  • 2fae07f: 8319311: JShell Process Builder should be configurable
  • 63ad868: 8319668: Fixup of jar filename typo in BadFactoryTest.sh
  • 4bcda60: 8319713: Parallel: Remove PSAdaptiveSizePolicy::should_full_GC
  • 99f870c: 8320781: Fix whitespace in j.l.Double and j.u.z.ZipInputStream @snippets
  • ... and 152 more: https://git.openjdk.org/jdk/compare/2e34a2ebf0f14043b129461b0397495e7e75a38b...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready (Pull request is ready to be integrated) label and removed the csr (Pull request needs approved CSR before integration) label Nov 28, 2023
@DarraghClarke
Contributor Author

/integrate

@openjdk

openjdk bot commented Nov 29, 2023

Going to push as commit 48960df.
Since your change was applied there have been 180 commits pushed to the master branch:

  • 1594653: 8310644: Make panama memory segment close use async handshakes
  • 65dfcae: 8308399: Recommend --release when -source and -target are misused
  • 335f5db: 8320911: RISC-V: Enable hotspot/jtreg/compiler/intrinsics/chacha/TestChaCha20.java
  • 77d604a: 8319373: Serial: Refactor dirty cards scanning during Young GC
  • 38cfb22: 8318706: Implement JEP 423: Region Pinning for G1
  • e44d4b2: 8320858: Move jpackage tests to tier3
  • 5dcf3a5: 8320715: Improve the tests of test/hotspot/jtreg/compiler/intrinsics/float16
  • 78b6c2b: 8320898: exclude compiler/vectorapi/reshape/TestVectorReinterpret.java on ppc64(le) platforms
  • 9a6ca23: 8320918: Fix errors in the built-in Catalog implementation
  • 5e1b771: 8316422: TestIntegerUnsignedDivMod.java triggers "invalid layout" assert in FrameValues::validate
  • ... and 170 more: https://git.openjdk.org/jdk/compare/2e34a2ebf0f14043b129461b0397495e7e75a38b...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated (Pull request has been integrated) label Nov 29, 2023
@openjdk openjdk bot closed this Nov 29, 2023
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Nov 29, 2023
@openjdk

openjdk bot commented Nov 29, 2023

@DarraghClarke Pushed as commit 48960df.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
integrated (Pull request has been integrated), net (net-dev@openjdk.org)
3 participants