Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8259842: Remove Result cache from StringCoding #2102

Closed
wants to merge 29 commits into from
Closed

Conversation

@cl4es
Copy link
Member

@cl4es cl4es commented Jan 15, 2021

The StringCoding.resultCached mechanism is used to remove the allocation of a StringCoding.Result object on potentially hot paths used in some String constructors. Using a ThreadLocal has overheads though, and the overhead got a bit worse after JDK-8258596 which addresses a leak by adding a SoftReference.

This patch refactors much of the decode logic back into String and gets rid of not only the Result cache, but the Result class itself along with the StringDecoder class and cache.

Microbenchmark results:

Baseline

Benchmark                                           (charsetName)  Mode  Cnt    Score    Error   Units
decodeCharset                                            US-ASCII  avgt   15  193.043 ±  8.207   ns/op
decodeCharset:·gc.alloc.rate.norm                        US-ASCII  avgt   15  112.009 ±  0.001    B/op
decodeCharset                                          ISO-8859-1  avgt   15  164.580 ±  6.514   ns/op
decodeCharset:·gc.alloc.rate.norm                      ISO-8859-1  avgt   15  112.009 ±  0.001    B/op
decodeCharset                                               UTF-8  avgt   15  328.370 ± 18.420   ns/op
decodeCharset:·gc.alloc.rate.norm                           UTF-8  avgt   15  224.019 ±  0.002    B/op
decodeCharset                                               MS932  avgt   15  328.870 ± 20.056   ns/op
decodeCharset:·gc.alloc.rate.norm                           MS932  avgt   15  232.020 ±  0.002    B/op
decodeCharset                                          ISO-8859-6  avgt   15  193.603 ± 12.305   ns/op
decodeCharset:·gc.alloc.rate.norm                      ISO-8859-6  avgt   15  112.010 ±  0.001    B/op
decodeCharsetName                                        US-ASCII  avgt   15  209.454 ±  9.040   ns/op
decodeCharsetName:·gc.alloc.rate.norm                    US-ASCII  avgt   15  112.009 ±  0.001    B/op
decodeCharsetName                                      ISO-8859-1  avgt   15  188.234 ±  7.533   ns/op
decodeCharsetName:·gc.alloc.rate.norm                  ISO-8859-1  avgt   15  112.009 ±  0.001    B/op
decodeCharsetName                                           UTF-8  avgt   15  399.463 ± 12.437   ns/op
decodeCharsetName:·gc.alloc.rate.norm                       UTF-8  avgt   15  224.019 ±  0.003    B/op
decodeCharsetName                                           MS932  avgt   15  358.839 ± 15.385   ns/op
decodeCharsetName:·gc.alloc.rate.norm                       MS932  avgt   15  208.017 ±  0.003    B/op
decodeCharsetName                                      ISO-8859-6  avgt   15  162.570 ±  7.090   ns/op
decodeCharsetName:·gc.alloc.rate.norm                  ISO-8859-6  avgt   15  112.009 ±  0.001    B/op
decodeDefault                                                 N/A  avgt   15  316.081 ± 11.101   ns/op
decodeDefault:·gc.alloc.rate.norm                             N/A  avgt   15  224.019 ±  0.002    B/op

Patched:
Benchmark                                           (charsetName)  Mode  Cnt    Score    Error   Units
decodeCharset                                            US-ASCII  avgt   15  159.153 ±  6.082   ns/op
decodeCharset:·gc.alloc.rate.norm                        US-ASCII  avgt   15  112.009 ±  0.001    B/op
decodeCharset                                          ISO-8859-1  avgt   15  134.433 ±  6.203   ns/op
decodeCharset:·gc.alloc.rate.norm                      ISO-8859-1  avgt   15  112.009 ±  0.001    B/op
decodeCharset                                               UTF-8  avgt   15  297.234 ± 21.654   ns/op
decodeCharset:·gc.alloc.rate.norm                           UTF-8  avgt   15  224.019 ±  0.002    B/op
decodeCharset                                               MS932  avgt   15  311.583 ± 16.445   ns/op
decodeCharset:·gc.alloc.rate.norm                           MS932  avgt   15  208.018 ±  0.002    B/op
decodeCharset                                          ISO-8859-6  avgt   15  156.216 ±  6.522   ns/op
decodeCharset:·gc.alloc.rate.norm                      ISO-8859-6  avgt   15  112.010 ±  0.001    B/op
decodeCharsetName                                        US-ASCII  avgt   15  180.752 ±  9.411   ns/op
decodeCharsetName:·gc.alloc.rate.norm                    US-ASCII  avgt   15  112.010 ±  0.001    B/op
decodeCharsetName                                      ISO-8859-1  avgt   15  156.170 ±  8.003   ns/op
decodeCharsetName:·gc.alloc.rate.norm                  ISO-8859-1  avgt   15  112.010 ±  0.001    B/op
decodeCharsetName                                           UTF-8  avgt   15  370.337 ± 22.199   ns/op
decodeCharsetName:·gc.alloc.rate.norm                       UTF-8  avgt   15  224.019 ±  0.001    B/op
decodeCharsetName                                           MS932  avgt   15  312.589 ± 15.067   ns/op
decodeCharsetName:·gc.alloc.rate.norm                       MS932  avgt   15  208.018 ±  0.002    B/op
decodeCharsetName                                      ISO-8859-6  avgt   15  173.205 ±  9.647   ns/op
decodeCharsetName:·gc.alloc.rate.norm                  ISO-8859-6  avgt   15  112.009 ±  0.001    B/op
decodeDefault                                                 N/A  avgt   15  303.459 ± 16.452   ns/op
decodeDefault:·gc.alloc.rate.norm                             N/A  avgt   15  224.019 ±  0.001    B/op

Most variants improve. There's a small added overhead in String charsetName variants for some charsets such as ISO-8859-6 that benefited slightly from the StringDecoder cache due avoiding a lookup, but most variants are not helped by this cache and instead see a significant gain from skipping that step. Charset variants don't need a lookup and improve across the board.

Another drawback is that we need to cram more logic into String to bypass the StringCoding.Result indirection - but getting rid of two commonly used ThreadLocal caches and most cases getting a bit better raw throughput in the process I think more than enough makes up for that.

Testing: tier1-4


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/2102/head:pull/2102
$ git checkout pull/2102

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Jan 15, 2021

👋 Welcome back redestad! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

@openjdk openjdk bot commented Jan 15, 2021

@cl4es The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs label Jan 15, 2021
@cl4es cl4es marked this pull request as ready for review Jan 15, 2021
@openjdk openjdk bot added the rfr label Jan 15, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Jan 15, 2021

@RogerRiggs
Copy link
Contributor

@RogerRiggs RogerRiggs commented Jan 15, 2021

Interesting, I was/am in the middle of converting Result to be a Valhalla primitive class to reduce the memory pressure and had written some new jmh tests too.
And to reduce the dependency on ThreadLocals.

@cl4es
Copy link
Member Author

@cl4es cl4es commented Jan 15, 2021

Interesting, I was/am in the middle of converting Result to be a Valhalla primitive class to reduce the memory pressure and had written some new jmh tests too.
And to reduce the dependency on ThreadLocals.

Ok, I expect that would end up at similar performance while retaining the separation of concerns. But this way we're not dependent on valhalla to get rid of the TLs.

I'd be happy to add more JMH tests here. I expected this area to already have some, but it seems all the micros added during the work on compact strings work in JDK 9 are unaccounted for.

Copy link
Member

@naotoj naotoj left a comment

Nit: copyright year in System.java is 2021.

src/java.base/share/classes/java/lang/String.java Outdated Show resolved Hide resolved
@plevart
Copy link
Contributor

@plevart plevart commented Jan 15, 2021

Hi Claes,
This is quite an undertaking in re-factoring!
I think that decoding logic that you produced can still be kept in StringCoding class though. The private constructor that you created for UTF-8 decoding is unnecessary. The logic can be kept in a static method and then the String instance constructed in it from the value and coder. The only public constructor that remains is then the one taking a Charset parameter. I took your code and moved it back to StringCoding while for the mentioned constructor I tried the following trick which could work (I haven't tested yet with JMH): if you pass an instance of newly constructed object down to a method that is inlined into the caller and that method does not pass that object to any other methods, JIT will eliminate the heap allocation, so I suspect that you can pass results from the called method back in that object and still avoid allocation...

plevart@691600e

WDYT?

@cl4es
Copy link
Member Author

@cl4es cl4es commented Jan 15, 2021

WDYT?

I get that the approach I took got a bit messy, but I've just spent some time cleaning it up. Please have a look at the latest, which outlines much of the logic and consolidates the replace/throw logic in the UTF8 decode paths. I've checked it does not regress on the micro, and I think the overall state of the code now isn't much messier than the original code.

@plevart
Copy link
Contributor

@plevart plevart commented Jan 16, 2021

Some common logic is now extracted into methods. That's good. But much of the decoding logic still remains in String. I still think all of static methods can be moved to StringCoding directly as they are now while the private UTF-8 decoding constructor can be replaced with static method and moved to StringCoding. The only problem might be the public String constructor taking Charset parameter. But even here, passing a newly constructed mutable object to the method can be used to return multiple values from the method while relying on JIT to eliminate object allocation.
Do you think this code belongs more to String than to StringCoding?

@cl4es
Copy link
Member Author

@cl4es cl4es commented Jan 16, 2021

Do you think this code belongs more to String than to StringCoding?

I consider StringCoding an implementation detail of String, so I'm not sure there's (much) value in maintaining the separation of concern if it is cause for a performance loss. While encapsulating and separating concerns is a fine purpose I think the main purpose served by StringCoding is to resolve some bootstrap issues: String is one of the first classes to be loaded and eagerly pulling in dependencies like ThreadLocal and Charsets before bootstrapping leads to all manner of hard to decipher issues (yes - I've tried :-)).

@plevart
Copy link
Contributor

@plevart plevart commented Jan 17, 2021

This looks much better now. Maybe just one small suggestion: java.lang.StringCoding#lookupCharset is used in two places and in both places the same exception handling/rethrowing logic is wrapped around the invocation. So you could move that logic into the lookupCharset method which would simplify call sites. You could even get rid of String.lookupCharset method that way.

@plevart
Copy link
Contributor

@plevart plevart commented Jan 17, 2021

When you combine the logic of String.lookupCharset:

    private static Charset lookupCharset(String charsetName)
            throws UnsupportedEncodingException {
        Objects.requireNonNull(charsetName);
        try {
            Charset cs = StringCoding.lookupCharset(charsetName);
            if (cs == null) {
                throw new UnsupportedEncodingException(charsetName);
            }
            return cs;
        } catch (IllegalCharsetNameException ics) {
            throw new UnsupportedEncodingException(charsetName);
        }
    }

... and StringCoding.lookupCharset:

    static Charset lookupCharset(String csn) {
        if (Charset.isSupported(csn)) {
            try {
                return Charset.forName(csn);
            } catch (UnsupportedCharsetException x) {
                throw new Error(x);
            }
        }
        return null;
    }

...you get this:

    private static Charset lookupCharset(String charsetName)
            throws UnsupportedEncodingException {
        Objects.requireNonNull(charsetName);
        try {
            Charset cs;
            inner: {
                if (Charset.isSupported(charsetName)) {
                    try {
                        cs = Charset.forName(charsetName);
                        break inner;
                    } catch (UnsupportedCharsetException x) {
                        throw new Error(x);
                    }
                }
                cs = null;
            }
            if (cs == null) {
                throw new UnsupportedEncodingException(charsetName);
            }
            return cs;
        } catch (IllegalCharsetNameException ics) {
            throw new UnsupportedEncodingException(charsetName);
        }
    }

...and that can be simplified to this:

    static Charset lookupCharset(String csn) throws UnsupportedEncodingException {
        Objects.requireNonNull(csn);
        try {
            return Charset.forName(csn);
        } catch (UnsupportedCharsetException | IllegalCharsetNameException x) {
            throw new UnsupportedEncodingException(csn);
        }
    }

which has an additional benefit that it only performs one lookup (Charset.forName) instead of two (Charset.isSupported & Charset.forName)...

@cl4es
Copy link
Member Author

@cl4es cl4es commented Jan 17, 2021

@plevart: I simplified the lookup logic based on your suggestion. Removed some unreachable paths in and simplified StringCoding.encode(String, byte, byte[]) as a result.

Simplifying to one lookup speeds up decodeCharsetName cases a notch:

before:
decodeCharsetName          UTF-8  avgt   15  370.337 ± 22.199  ns/op
after:
decodeCharsetName          UTF-8  avgt   15  335.971 ± 15.894  ns/op
Copy link
Contributor

@plevart plevart left a comment

This looks good.
Are you planning to do similar things for encoding too?

@openjdk
Copy link

@openjdk openjdk bot commented Jan 17, 2021

@cl4es This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8259842: Remove Result cache from StringCoding

Reviewed-by: naoto, plevart, rriggs

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 8 new commits pushed to the master branch:

  • 2f47c39: 8259943: FileDescriptor.close0 does not handle EINTR
  • a8073ef: 8253478: (se) epoll Selector should use eventfd for wakeup instead of pipe
  • 34eb8b3: 8255765: Shenandoah: Isolate concurrent, degenerated and full GC
  • c3c6662: 8259954: gc/shenandoah/mxbeans tests fail with -Xcomp
  • 6ce0799: 8259851: Use boolean type for tasks in SubTasksDone
  • 4bcffeb: 8260029: aarch64: fix typo in verify_oop_array
  • e1de0bf: 8260043: Reduce allocation in sun.net.www.protocol.jar.Handler.parseURL
  • 4dfd8cc: 8259897: gtest os.dll_address_to_function_and_library_name_vm fails on AIX

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Jan 17, 2021
@plevart
Copy link
Contributor

@plevart plevart commented Jan 17, 2021

I already approved the changes and they are OK. Maybe for a followup: just noticing after the fact that logic for newStringUTF8NoRepl(....) vs. new String(...., StringCoding.UTF_8) differ in handling unmappabale characters, which is by the spec, but also the constructor contains special handling of input that only contains non-negative bytes:

        if (charset == UTF_8) {
            if (COMPACT_STRINGS && !StringCoding.hasNegatives(bytes, offset, length)) {
                this.value = Arrays.copyOfRange(bytes, offset, offset + length);
                this.coder = LATIN1;
                return;

...while newStringUTF8NoRepl(....) does not contain this optimization. I guess ZipCoder could benefit from that optimization too since paths are mostly ASCII only. So WDYT of this additional simplification/consolidation of UTF-8 decoding:

plevart@0b8b12c

@cl4es
Copy link
Member Author

@cl4es cl4es commented Jan 17, 2021

newStringUTF8NoRepl(....) does not contain this optimization.

Good catch: this optimization was in the original code for newStringNoRepl but not newStringUTF8NoRepl. I'll put it back to avoid a regression, but this time into newStringUTF8NoRepl so that both paths get the optimization.

Copy link
Contributor

@plevart plevart left a comment

This looks good, Claes.

Copy link
Contributor

@RogerRiggs RogerRiggs left a comment

The large number of package exposed methods in StringCoding is ugly and makes the code harder to maintain.
Can the code duplication of UTF8 in the String constructors be reduced?
It would be cleaner to move all of the remaining StringCoding methods into String and make them private again. Reading the code now requires quite a bit of cross referencing and the invariants are hard to verify.

while (sp < sl) {
int b1 = src[sp++];
static int decodeUTF8_UTF16(byte[] bytes, int offset, int sl, byte[] dst, int dp, boolean doReplace) {
while (offset < sl) {

This comment has been minimized.

@RogerRiggs

RogerRiggs Jan 19, 2021
Contributor

The renaming of sp -> offset seems unmotivated and breaks the symmetry with dp.

Copy link
Contributor

@RogerRiggs RogerRiggs left a comment

This looks very good. Thanks for the extra refactoring and consolidating of code.

}
static final Charset ISO_8859_1 = sun.nio.cs.ISO_8859_1.INSTANCE;
static final Charset US_ASCII = sun.nio.cs.US_ASCII.INSTANCE;
static final Charset UTF_8 = sun.nio.cs.UTF_8.INSTANCE;

This comment has been minimized.

@RogerRiggs

RogerRiggs Jan 21, 2021
Contributor

These could move to String also, if there is a benefit them being local.
Otherwise, the uses in String could refer directly to the INSTANCEs in the sun.nio.cs... classes.

@@ -522,85 +48,18 @@ final String requestedCharsetName() {
*/
private static native void err(String msg);

This comment has been minimized.

@RogerRiggs

RogerRiggs Jan 21, 2021
Contributor

A separate cleanup could remove this unused method and the corresponding native code in StringCoding.c.

}

@IntrinsicCandidate
private static int implEncodeISOArray(byte[] sa, int sp,
public static int implEncodeISOArray(byte[] sa, int sp,
byte[] da, int dp, int len) {
int i = 0;

This comment has been minimized.

@RogerRiggs

RogerRiggs Jan 21, 2021
Contributor

As a separate cleanup...
If there isn't any value to these intrinsics being in StringCoder, they could also move to String
with the corresponding intrinsic references.

@naotoj
naotoj approved these changes Jan 21, 2021
@cl4es
Copy link
Member Author

@cl4es cl4es commented Jan 22, 2021

Passed testing tiers 1-4

/integrate

@openjdk openjdk bot closed this Jan 22, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Jan 22, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Jan 22, 2021

@cl4es Since your change was applied there have been 14 commits pushed to the master branch:

  • d066f2b: 8260030: Improve stringStream buffer handling
  • 1452280: 8164484: Unity, JTable cell editor, javax/swing/JComboBox/6559152/bug6559152.java
  • a70acf2: 8259928: compiler/jvmci tests fail with -Xint
  • ba38661: 8259882: Reduce the inclusion of perfData.hpp
  • 92c2f08: 8259869: [macOS] Remove desktop module dependencies on JNF Reference APIs
  • a7c2ebc: 8239894: Xserver crashes when the wrong high refresh rate is used
  • 2f47c39: 8259943: FileDescriptor.close0 does not handle EINTR
  • a8073ef: 8253478: (se) epoll Selector should use eventfd for wakeup instead of pipe
  • 34eb8b3: 8255765: Shenandoah: Isolate concurrent, degenerated and full GC
  • c3c6662: 8259954: gc/shenandoah/mxbeans tests fail with -Xcomp
  • ... and 4 more: https://git.openjdk.java.net/jdk/compare/7f7166dbc8822afaa39382cf9cb42be8458b0e4b...master

Your commit was automatically rebased without conflicts.

Pushed as commit 58ceb25.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
4 participants