8314774: Optimize URLEncoder #15354

Glavo · 2023-08-19T20:57:27Z

I mainly made these optimizations:

Avoid allocating StringBuilder when there are no characters in the URL that need to be encoded;
~~Implement a fast path for UTF-8.~~ (Has been removed from this PR)

In addition to improving performance, these optimizations also reduce temporary objects:

It no longer allocates any object when there are no characters in the URL that need to be encoded;
The initial size of StringBuilder is larger to avoid expansion as much as possible;
~~For UTF-8, the temporary CharArrayWriter, strings and byte arrays are no longer needed.~~ (Has been removed from this PR)

I also updated the tests to add more test cases.
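
To illustrate the first optimization, here is a minimal sketch of a no-allocation fast path. It is not the code merged in this PR: the class name `FastPathSketch` and the `SAFE` table are hypothetical, and the slow path simply delegates to the existing `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class FastPathSketch {
    // Hypothetical lookup table: true for the ASCII characters that
    // URLEncoder copies to the output unchanged (letters, digits, - _ . *).
    private static final boolean[] SAFE = new boolean[128];
    static {
        for (char c = 'a'; c <= 'z'; c++) SAFE[c] = true;
        for (char c = 'A'; c <= 'Z'; c++) SAFE[c] = true;
        for (char c = '0'; c <= '9'; c++) SAFE[c] = true;
        SAFE['-'] = SAFE['_'] = SAFE['.'] = SAFE['*'] = true;
    }

    static String encode(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Space is not in SAFE because it is rewritten to '+'.
            if (c >= 128 || !SAFE[c]) {
                // Something needs encoding: fall back to the regular path.
                return URLEncoder.encode(s, StandardCharsets.UTF_8);
            }
        }
        // Nothing to encode: return the input itself, allocating nothing.
        return s;
    }

    public static void main(String[] args) {
        String plain = "abc-XYZ_0.9*";
        System.out.println(encode(plain) == plain);
        System.out.println(encode("a b&c"));
    }
}
```

The point of the scan is that the common case (nothing to encode) never creates a StringBuilder or a new String.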

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8314774: Optimize URLEncoder (Enhancement - P4)

Reviewers

Claes Redestad (@cl4es - Reviewer) ⚠️ Review applies to a2cb7b30
Daniel Fuchs (@dfuch - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/15354/head:pull/15354
$ git checkout pull/15354

Update a local copy of the PR:
$ git checkout pull/15354
$ git pull https://git.openjdk.org/jdk.git pull/15354/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 15354

View PR using the GUI difftool:
$ git pr show -t 15354

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/15354.diff

Webrev

Link to Webrev Comment

bridgekeeper · 2023-08-19T20:58:41Z

👋 Welcome back Glavo! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2023-08-19T20:59:45Z

@Glavo The following label will be automatically applied to this pull request:

net

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

caizixian · 2023-08-22T14:38:57Z

Please use https://bugs.openjdk.org/browse/JDK-8314774

mlbridge · 2023-08-22T20:02:49Z

Webrevs

jaikiran · 2023-08-23T07:22:59Z

Hello Glavo, for changes like these, I think it would be more productive and useful to start a mailing list discussion first, to provide some context on why this change is needed and to gather input from people familiar with this code on whether the change is necessary and worth it. Such a discussion will then give the Reviewers some context and input on what needs to be considered in these changes and how far the changes should go in the code.

With the proposed changes in this PR, which touch the character encoding handling and such, I think this will need a very thorough review, performance aspects aside. I don't have enough experience with this class to know whether it's worth making this amount of change for performance improvements that may not be visible outside of microbenchmarks.

Glavo · 2023-08-23T17:03:55Z

> Hello Glavo, for changes like these, I think it would be more productive and useful to start a mailing list discussion first, to provide some context on why this change is needed and to gather input from people familiar with this code on whether the change is necessary and worth it. Such a discussion will then give the Reviewers some context and input on what needs to be considered in these changes and how far the changes should go in the code.

I see. Thank you for your suggestion.

> I don't have enough experience with this class to know whether it's worth making this amount of change for performance improvements that may not be visible outside of microbenchmarks.

I know it's usually not a performance bottleneck, so the main goal of this PR is to reduce temporary object allocations.

I noticed that this method is called quite frequently in our code, and it is also used in popular frameworks such as Spring. I want to reduce GC pressure by eliminating unnecessary temporary objects.

Since that method almost always uses UTF-8, I think it's worth providing a fast path for UTF-8. If it is too difficult to review, then I will try to optimize it in other ways.

dfuch · 2023-08-23T18:51:37Z

The fast path that just returns the given string if ASCII-only and no encoding looks simple enough. I don't particularly like the idea of embedding the logic of encoding UTF-8 into that class though, that increases the complexity significantly, and Charset encoders are there for that. Also I don't understand the reason for changing BitSet into a boolean array - that seems gratuitous?

Glavo · 2023-08-23T23:55:21Z

> I don't particularly like the idea of embedding the logic of encoding UTF-8 into that class though, that increases the complexity significantly, and Charset encoders are there for that.

Unfortunately, CharsetEncoder is too generic. Because we know the target encoding is UTF-8, implementing the encoding inline eliminates unnecessary temporary objects. Some places already do this, such as String.

I'm thinking we might be able to extract this logic into a static helper class.

public class UTF8EncodeUtils {
    // True for chars encoded as a single UTF-8 byte (U+0000..U+007F).
    public static boolean isSingleByte(char c) { return c < 0x80; }
    // True for chars below U+0800; check isSingleByte first to isolate the two-byte range.
    public static boolean isDoubleBytes(char c) { return c < 0x800; }

    // Two-byte form, for U+0080..U+07FF.
    public static byte[] encodeDoubleBytes(char c) {
        byte b0 = (byte) (0xc0 | (c >> 6));
        byte b1 = (byte) (0x80 | (c & 0x3f));
        return new byte[]{b0, b1};
    }

    // Three-byte form, for U+0800..U+FFFF; the caller must handle surrogates separately.
    public static byte[] encodeThreeBytes(char c) {
        byte b0 = (byte) (0xe0 | (c >> 12));
        byte b1 = (byte) (0x80 | ((c >> 6) & 0x3f));
        byte b2 = (byte) (0x80 | (c & 0x3f));
        return new byte[]{b0, b1, b2};
    }

    // Four-byte form, for supplementary code points (U+10000..U+10FFFF).
    public static byte[] encodeCodePoint(int uc) {
        byte b0 = (byte) (0xf0 | ((uc >> 18)));
        byte b1 = (byte) (0x80 | ((uc >> 12) & 0x3f));
        byte b2 = (byte) (0x80 | ((uc >> 6) & 0x3f));
        byte b3 = (byte) (0x80 | (uc & 0x3f));
        return new byte[]{b0, b1, b2, b3};
    }
}

We can use this helper class to reimplement String and the UTF-8 CharsetEncoder (after we make sure it has no overhead), then use it to implement more UTF-8 fast paths.
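
To make the idea concrete, here is a minimal sketch of how per-character UTF-8 encoding could feed a percent-encoder without going through a CharsetEncoder. It is not the code from this PR: `InlineUtf8Sketch` and its methods are hypothetical, and it writes straight into a StringBuilder instead of returning byte arrays. As an assumption of the sketch, an unpaired surrogate is emitted as `?`, mirroring the replacement that `String.getBytes(UTF_8)` applies.

```java
final class InlineUtf8Sketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Appends one byte value (0..255) as %XX with uppercase hex digits.
    private static void percent(StringBuilder out, int b) {
        out.append('%').append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
    }

    // Characters that URLEncoder copies unchanged.
    private static boolean isUnreserved(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || c == '-' || c == '_' || c == '.' || c == '*';
    }

    static String encode(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isUnreserved(c)) {
                out.append(c);
            } else if (c == ' ') {
                out.append('+');
            } else if (c < 0x80) {                       // one UTF-8 byte
                percent(out, c);
            } else if (c < 0x800) {                      // two UTF-8 bytes
                percent(out, 0xC0 | (c >> 6));
                percent(out, 0x80 | (c & 0x3F));
            } else if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                // Valid surrogate pair: four UTF-8 bytes.
                int cp = Character.toCodePoint(c, s.charAt(++i));
                percent(out, 0xF0 | (cp >> 18));
                percent(out, 0x80 | ((cp >> 12) & 0x3F));
                percent(out, 0x80 | ((cp >> 6) & 0x3F));
                percent(out, 0x80 | (cp & 0x3F));
            } else if (Character.isSurrogate(c)) {
                // Unpaired surrogate: replaced with '?' (assumption, see above).
                percent(out, '?');
            } else {                                     // three UTF-8 bytes
                percent(out, 0xE0 | (c >> 12));
                percent(out, 0x80 | ((c >> 6) & 0x3F));
                percent(out, 0x80 | (c & 0x3F));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("caf\u00e9 \u20ac \uD83D\uDE00"));
    }
}
```

The sketch trades generality for fewer temporaries: no intermediate byte arrays, char buffers, or substrings are created per encoded character.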

I've also been doing some work on OutputStreamWriter recently. By implementing a fast path for UTF-8, I see speedups of over 20x in some cases. I think we may be able to get exciting improvements in more places.

Glavo · 2023-08-24T00:01:44Z

> Also I don't understand the reason for changing BitSet into a boolean array - that seems gratuitous?

I observed a throughput improvement of 7% to 10% after switching from BitSet to boolean[].
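
For context, the two lookup styles being compared can be sketched as follows. This is only an illustration with hypothetical names (`LookupSketch`, `SAFE_BITS`, `SAFE_TABLE`); it does not measure anything, and the throughput figure above comes from the benchmark run, not from this snippet.

```java
import java.util.BitSet;

final class LookupSketch {
    static final BitSet SAFE_BITS = new BitSet(128);
    static final boolean[] SAFE_TABLE = new boolean[128];
    static {
        // Characters copied unchanged by the encoder (space is left out here).
        String safe = "abcdefghijklmnopqrstuvwxyz"
                + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                + "0123456789-_.*";
        for (int i = 0; i < safe.length(); i++) {
            SAFE_BITS.set(safe.charAt(i));
            SAFE_TABLE[safe.charAt(i)] = true;
        }
    }

    // BitSet lookup: a method call that locates a long word and masks one bit.
    static boolean safeViaBitSet(char c) {
        return SAFE_BITS.get(c);
    }

    // Array lookup: an explicit range check followed by a single array load.
    static boolean safeViaArray(char c) {
        return c < 128 && SAFE_TABLE[c];
    }

    public static void main(String[] args) {
        System.out.println(safeViaBitSet('a') + " " + safeViaArray('a'));
        System.out.println(safeViaBitSet('%') + " " + safeViaArray('%'));
    }
}
```

Both methods answer the same question; the array version uses one byte per entry instead of one bit, which is a negligible cost for a 128-entry table.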

Glavo · 2023-08-24T02:22:46Z

I extracted the UTF-8 encoding logic into UTF8EncodeUtils and reran the benchmark:

Baseline:
Benchmark                                         (count)  (maxLength)  (mySeed)  Mode  Cnt        Score   Error   Units
URLEncodeDecode.testEncodeUTF8                        1024         1024         3  avgt   15        5.582 ± 0.009   ms/op
URLEncodeDecode.testEncodeUTF8:gc.alloc.rate          1024         1024         3  avgt   15     1439.974 ± 2.386  MB/sec
URLEncodeDecode.testEncodeUTF8:gc.alloc.rate.norm     1024         1024         3  avgt   15  8429374.434 ± 0.239    B/op
URLEncodeDecode.testEncodeUTF8:gc.count               1024         1024         3  avgt   15        6.000          counts
URLEncodeDecode.testEncodeUTF8:gc.time                1024         1024         3  avgt   15        9.000              ms

Inline UTF-8 encoding:
Benchmark                                         (count)  (maxLength)  (mySeed)  Mode  Cnt        Score       Error   Units
URLEncodeDecode.testEncodeUTF8                        1024         1024         3  avgt   15        3.681 ±     0.156   ms/op
URLEncodeDecode.testEncodeUTF8:gc.alloc.rate          1024         1024         3  avgt   15      519.050 ±    23.530  MB/sec
URLEncodeDecode.testEncodeUTF8:gc.alloc.rate.norm     1024         1024         3  avgt   15  2000689.365 ± 12769.291    B/op
URLEncodeDecode.testEncodeUTF8:gc.count               1024         1024         3  avgt   15        3.000              counts
URLEncodeDecode.testEncodeUTF8:gc.time                1024         1024         3  avgt   15        3.000                  ms

Use UTF8EncodeUtils:
Benchmark                                         (count)  (maxLength)  (mySeed)  Mode  Cnt        Score       Error   Units
URLEncodeDecode.testEncodeUTF8                        1024         1024         3  avgt   15        3.753 ±     0.169   ms/op
URLEncodeDecode.testEncodeUTF8:gc.alloc.rate          1024         1024         3  avgt   15      507.190 ±    24.402  MB/sec
URLEncodeDecode.testEncodeUTF8:gc.alloc.rate.norm     1024         1024         3  avgt   15  1992529.825 ± 12769.347    B/op
URLEncodeDecode.testEncodeUTF8:gc.count               1024         1024         3  avgt   15        3.000              counts
URLEncodeDecode.testEncodeUTF8:gc.time                1024         1024         3  avgt   15        3.000                  ms

Using UTF8EncodeUtils is approximately 2% slower, which is acceptable as it does not increase the object allocation rate.

Compared to baseline, this PR reduces memory allocation by 76%.
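
Both percentages can be rechecked from the ms/op and gc.alloc.rate.norm scores in the tables above. The short program below only redoes that arithmetic; the class and method names are hypothetical.

```java
final class BenchmarkArithmetic {
    // Relative change from one score to another, in percent.
    static double percentChange(double from, double to) {
        return (to - from) / from * 100.0;
    }

    public static void main(String[] args) {
        // gc.alloc.rate.norm (B/op): baseline vs. UTF8EncodeUtils
        double allocBaseline = 8429374.434;
        double allocUtils = 1992529.825;
        // Average time (ms/op): inline UTF-8 encoding vs. UTF8EncodeUtils
        double timeInline = 3.681;
        double timeUtils = 3.753;

        System.out.printf("allocation change vs. baseline: %.1f%%%n",
                percentChange(allocBaseline, allocUtils));
        System.out.printf("time change vs. inline encoding: %.1f%%%n",
                percentChange(timeInline, timeUtils));
    }
}
```

The results come out at roughly a 76% drop in bytes allocated per operation and a slowdown of about 2%, matching the figures quoted above. Note that the 2% difference is smaller than the reported error of either run.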

dfuch · 2023-08-24T09:44:21Z

I am not sure the added complexity is worth the gain. It's fine for String to have special knowledge of UTF-8 but I don't think we want that to bleed all over the place.

Glavo · 2023-09-18T23:17:32Z

/integrate

openjdk · 2023-09-18T23:19:15Z

@Glavo
Your change (at version a2cb7b3) is now ready to be sponsored by a Committer.

AlanBateman · 2023-09-19T06:46:32Z

> I ran the tier1 test with no failures.

It's very important to run the tier2 tests as that is where the jdk_net test group runs.

Glavo · 2023-09-19T06:56:48Z

> > I ran the tier1 test with no failures.
>
> It's very important to run the tier2 tests as that is where the jdk_net test group runs.

I see. I ran tier2 and the only failure seemed unrelated (runtime/Thread/ThreadCountLimit.java).

src/java.base/share/classes/java/net/URLEncoder.java

Co-authored-by: Claes Redestad <claes.redestad@oracle.com>

dfuch

Nice work @Glavo, @cl4es. I am happy with where this pull request eventually ended. Thanks for your patience and for taking on so much feedback!
Please make sure tier2 tests are still passing before integrating.

cl4es · 2023-09-19T11:50:04Z

You need to issue /integrate again since there have been changes since the last time.

Glavo · 2023-09-19T11:53:17Z

> You need to issue /integrate again since there have been changes since the last time.

I thought I had to wait for a re-review after the modification before integrating.

Glavo · 2023-09-19T11:53:23Z

/integrate

Glavo · 2023-09-19T11:54:14Z

> Nice work @Glavo, @cl4es. I am happy with where this pull request eventually ended. Thanks for your patience and for taking on so much feedback! Please make sure tier2 tests are still passing before integrating.

I'm re-running the tier2 tests. I'll reply here when it's done.

cl4es · 2023-09-19T11:55:40Z

Re-approval isn't actually required but perhaps it would be good form to pick up that habit.

Glavo · 2023-09-19T13:38:07Z

I ran the tier1 and tier2 tests and there were no new failures.

Glavo · 2023-09-19T13:38:59Z

/integrate

openjdk · 2023-09-19T13:41:25Z

@Glavo
Your change (at version 9eb12c8) is now ready to be sponsored by a Committer.

openjdk · 2023-09-19T13:43:00Z

@Glavo
Your change (at version 9eb12c8) is now ready to be sponsored by a Committer.

cl4es · 2023-09-19T13:47:27Z

/sponsor

openjdk · 2023-09-19T13:50:02Z

Going to push as commit f25c920.
Since your change was applied there have been 13 commits pushed to the master branch:

7c5f2a2: 8315669: Open source several Swing PopupMenu related tests
cf74b8c: 8316337: (bf) Concurrency issue in DirectByteBuffer.Deallocator
4461eeb: 8312498: Thread::getState and JVM TI GetThreadState should return TIMED_WAITING virtual thread is timed parked
670b456: 8315038: Capstone disassembler stops when it sees a bad instruction
fab372d: 8316428: G1: Nmethod count statistics only count last code root set iterated
283c360: 8314877: Make fields final in 'java.net' package
86115c2: 8316420: Serial: Remove unused GenCollectedHeap::oop_iterate
d038571: 8030815: Code roots are not accounted for in region prediction
138542d: 8316061: Open source several Swing RootPane and Slider related tests
f52e500: 8316104: Open source several Swing SplitPane and RadioButton related tests
... and 3 more: https://git.openjdk.org/jdk/compare/373e37bf13df654ba40c0bd9fcf345215be4eafb...master

Your commit was automatically rebased without conflicts.

RogerRiggs · 2023-09-19T13:48:27Z

src/java.base/share/classes/java/net/URLEncoder.java

                if (c == ' ') {
                    c = '+';


The extra test (on every regular character) for space could be moved to a separate if at line 255 (and remove space from DONT_NEED_ENCODING). The performance improvement might not be noticeable but it would remove an anomaly from the algorithm.
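
One way to read this suggestion is sketched below with hypothetical names. It is not the merged code, and the reviewed revision of URLEncoder.java that "line 255" refers to is not reproduced here. The table of characters copied unchanged leaves out space, and space gets its own branch, so an ordinary character costs a single table check.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SpaceBranchSketch {
    // Characters copied through unchanged; ' ' is deliberately absent.
    private static final boolean[] UNCHANGED = new boolean[128];
    static {
        String safe = "abcdefghijklmnopqrstuvwxyz"
                + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                + "0123456789-_.*";
        for (int i = 0; i < safe.length(); i++) {
            UNCHANGED[safe.charAt(i)] = true;
        }
    }

    private static boolean isUnchanged(char c) {
        return c < 128 && UNCHANGED[c];
    }

    static String encode(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (isUnchanged(c)) {
                // Regular character: one check, no extra comparison with ' '.
                out.append(c);
                i++;
            } else if (c == ' ') {
                // Space handled in its own branch.
                out.append('+');
                i++;
            } else {
                // Collect the run of characters that need percent-encoding and
                // delegate it to the existing encoder (keeps the sketch short).
                int start = i;
                do {
                    i++;
                } while (i < s.length()
                        && !isUnchanged(s.charAt(i)) && s.charAt(i) != ' ');
                out.append(URLEncoder.encode(s.substring(start, i), StandardCharsets.UTF_8));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("a b&c=d"));
    }
}
```

Surrogate pairs stay intact in this sketch because neither half is in the table or equal to space, so both halves always land in the same delegated run.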

openjdk · 2023-09-19T13:50:10Z

@cl4es @Glavo Pushed as commit f25c920.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Glavo · 2023-09-19T14:09:03Z

@cl4es I also have a PR (#15353) to remove DEFAULT_ENCODING_NAME from URLEncoder and URLDecoder, can you take a look at it?

Glavo added 8 commits August 20, 2023 00:51

Optimize URLEncoder

cee7c9b

fix comment

703d026

fix encode surrogate pair

2ec1924

fix Decoder test

bb1221e

update SurrogatePairs test

9ec8fdd

fix SurrogatePairs test

e7d7f62

update SurrogatePairs test

9c9c3bc

Remove CASE_DIFF

00733ae

openjdk bot added the net net-dev@openjdk.org label Aug 19, 2023

fix checkstyle

6eea3ea

Glavo changed the title ~~Optimize URLEncoder~~ 8314774: Optimize URLEncoder Aug 22, 2023

openjdk bot added the rfr Pull request is ready for review label Aug 22, 2023

UTF8EncodeUtils

03b1127

openjdk bot removed the rfr Pull request is ready for review label Aug 24, 2023

Glavo added 2 commits August 24, 2023 09:28

fix test

5385a1a

Use byte[] in UTF8EncodeUtils

d8b94dd

openjdk bot added the rfr Pull request is ready for review label Aug 24, 2023

Glavo added 2 commits August 24, 2023 09:59

Add @ForceInline

92d93c0

Add final modifier

82626c9

openjdk bot added the sponsor Pull request is ready to be sponsored label Sep 18, 2023

cl4es reviewed Sep 19, 2023

src/java.base/share/classes/java/net/URLEncoder.java (outdated)

Update src/java.base/share/classes/java/net/URLEncoder.java

9eb12c8

Co-authored-by: Claes Redestad <claes.redestad@oracle.com>

openjdk bot removed the sponsor Pull request is ready to be sponsored label Sep 19, 2023

dfuch approved these changes Sep 19, 2023

openjdk bot added the sponsor Pull request is ready to be sponsored label Sep 19, 2023

RogerRiggs reviewed Sep 19, 2023

openjdk bot added the integrated Pull request has been integrated label Sep 19, 2023

openjdk bot closed this Sep 19, 2023

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Sep 19, 2023

Glavo deleted the url-encoder branch September 19, 2023 14:03
