8259842: Remove Result cache from StringCoding #2102
…ringCoding rather than eagerly initialize on String clinit
Interesting, I was/am in the middle of converting Result to be a Valhalla primitive class to reduce the memory pressure and had written some new jmh tests too.
Ok, I expect that would end up at similar performance while retaining the separation of concerns. But this way we're not dependent on Valhalla to get rid of the TLs. I'd be happy to add more JMH tests here. I expected this area to already have some, but it seems all the micros added during the compact strings work in JDK 9 are unaccounted for.
Hi Claes, WDYT?
I get that the approach I took got a bit messy, but I've just spent some time cleaning it up. Please have a look at the latest, which outlines much of the logic and consolidates the replace/throw logic in the UTF8 decode paths. I've checked it does not regress on the micro, and I think the overall state of the code now isn't much messier than the original code.
Some common logic is now extracted into methods. That's good. But much of the decoding logic still remains in String. I still think all of the static methods can be moved to StringCoding directly as they are now, while the private UTF-8 decoding constructor can be replaced with a static method and moved to StringCoding. The only problem might be the public String constructor taking a Charset parameter. But even here, passing a newly constructed mutable object to the method can be used to return multiple values from the method while relying on the JIT to eliminate the object allocation.
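For readers unfamiliar with the pattern suggested above, here is a minimal sketch of returning two values through a caller-allocated mutable holder. The names `DecodeResultSketch`, `Result` and `decodeAscii` are hypothetical and are not the JDK's code. Whether the JIT actually removes the allocation depends on inlining and escape analysis, so treat that part as an assumption rather than a guarantee.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: hypothetical names, not the actual String/StringCoding code.
final class DecodeResultSketch {

    // Mutable holder that lets one call hand back two values.
    static final class Result {
        byte[] value; // decoded bytes
        byte coder;   // 0 stands for a one-byte-per-char coder in this sketch
    }

    // Decodes ASCII into a fresh array and reports both outputs through 'out'.
    // Bytes outside the ASCII range are replaced with '?'.
    static void decodeAscii(byte[] bytes, int offset, int length, Result out) {
        byte[] dst = new byte[length];
        for (int i = 0; i < length; i++) {
            byte b = bytes[offset + i];
            dst[i] = (b >= 0) ? b : (byte) '?';
        }
        out.value = dst;
        out.coder = 0;
    }

    public static void main(String[] args) {
        // The holder is created by the caller and never stored anywhere else,
        // which is the shape escape analysis needs if it is to elide the allocation.
        Result r = new Result();
        byte[] input = "hi".getBytes(StandardCharsets.US_ASCII);
        decodeAscii(input, 0, input.length, r);
        System.out.println(r.value.length + " bytes, coder " + r.coder);
    }
}
```

The trade-off is that the allocation only disappears when the callee is inlined into the caller; otherwise this costs one short-lived object per call.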
I consider StringCoding an implementation detail of String, so I'm not sure there's (much) value in maintaining the separation of concerns if it is cause for a performance loss. While encapsulating and separating concerns is a fine purpose, I think the main purpose served by StringCoding is to resolve some bootstrap issues: String is one of the first classes to be loaded, and eagerly pulling in dependencies like ThreadLocal and Charsets before bootstrapping leads to all manner of hard to decipher issues (yes - I've tried :-)).
When you combine the logic of String.lookupCharset:
... and StringCoding.lookupCharset:
...you get this:
...and that can be simplified to this:
which has an additional benefit that it only performs one lookup (Charset.forName) instead of two (Charset.isSupported & Charset.forName)...
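The code snippets from this comment did not survive on this page, so the following is only a sketch of what a single-lookup helper could look like, not the code that was proposed or merged. The class name `CharsetLookupSketch` is hypothetical; `Charset.forName` and the two exception types it throws are standard `java.nio.charset` API.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Illustrative only: a hypothetical helper, not the JDK's String.lookupCharset.
final class CharsetLookupSketch {

    // Resolves a charset name with one Charset.forName call instead of
    // calling Charset.isSupported first and Charset.forName second.
    static Charset lookupCharset(String charsetName) throws UnsupportedEncodingException {
        if (charsetName == null) {
            throw new NullPointerException("charsetName");
        }
        try {
            return Charset.forName(charsetName);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // Map both failure modes to the checked exception that the
            // String(byte[], String) style of API declares.
            throw new UnsupportedEncodingException(charsetName);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lookupCharset("UTF-8"));
        try {
            lookupCharset("no-such-charset");
        } catch (UnsupportedEncodingException e) {
            System.out.println("unsupported: " + e.getMessage());
        }
    }
}
```

Catching the exceptions moves the cost of an unsupported name onto the failure path, which is usually the rare one.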
@plevart: I simplified the lookup logic based on your suggestion. Removed some unreachable paths in and simplified Simplifying to one lookup speeds up
This looks good.
Are you planning to do similar things for encoding too?
@cl4es This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 8 new commits pushed to the
Please see this link for an up-to-date comparison between the source branch of this pull request and the
I already approved the changes and they are OK. Maybe for a followup: just noticing after the fact that logic for
...while
Good catch: this optimization was in the original code for
The large number of package exposed methods in StringCoding is ugly and makes the code harder to maintain.
Can the code duplication of UTF8 in the String constructors be reduced?
It would be cleaner to move all of the remaining StringCoding methods into String and make them private again. Reading the code now requires quite a bit of cross referencing and the invariants are hard to verify.
while (sp < sl) {
int b1 = src[sp++];
static int decodeUTF8_UTF16(byte[] bytes, int offset, int sl, byte[] dst, int dp, boolean doReplace) {
while (offset < sl) {
The renaming of sp -> offset seems unmotivated and breaks the symmetry with dp.
…nd the ThreadLocal encoder facility.
This looks very good. Thanks for the extra refactoring and consolidating of code.
}
static final Charset ISO_8859_1 = sun.nio.cs.ISO_8859_1.INSTANCE;
static final Charset US_ASCII = sun.nio.cs.US_ASCII.INSTANCE;
static final Charset UTF_8 = sun.nio.cs.UTF_8.INSTANCE;
These could move to String also, if there is a benefit to them being local.
Otherwise, the uses in String could refer directly to the INSTANCEs in the sun.nio.cs... classes.
@@ -522,85 +48,18 @@ final String requestedCharsetName() {
*/
private static native void err(String msg);
A separate cleanup could remove this unused method and the corresponding native code in StringCoding.c.
}

@IntrinsicCandidate
private static int implEncodeISOArray(byte[] sa, int sp,
public static int implEncodeISOArray(byte[] sa, int sp,
byte[] da, int dp, int len) {
int i = 0;
As a separate cleanup...
If there isn't any value to these intrinsics being in StringCoding, they could also move to String
with the corresponding intrinsic references.
Passed testing tiers 1-4
/integrate
@cl4es Since your change was applied there have been 14 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 58ceb25.
The StringCoding.resultCached mechanism is used to remove the allocation of a StringCoding.Result object on potentially hot paths used in some String constructors. Using a ThreadLocal has overheads though, and the overhead got a bit worse after JDK-8258596 which addresses a leak by adding a SoftReference.

This patch refactors much of the decode logic back into String and gets rid of not only the Result cache, but the Result class itself along with the StringDecoder class and cache.

Microbenchmark results:

Most variants improve. There's a small added overhead in String charsetName variants for some charsets such as ISO-8859-6 that benefited slightly from the StringDecoder cache due to avoiding a lookup, but most variants are not helped by this cache and instead see a significant gain from skipping that step. Charset variants don't need a lookup and improve across the board.

Another drawback is that we need to cram more logic into String to bypass the StringCoding.Result indirection - but getting rid of two commonly used ThreadLocal caches, and in most cases getting slightly better raw throughput in the process, I think more than makes up for that.

Testing: tier1-4
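As background for the description above, this is a rough sketch of the general shape of a per-thread cache held through a SoftReference, the kind of mechanism this patch removes. It is not the removed JDK code; `ResultCacheSketch`, `Result` and `cachedResult` are hypothetical names.

```java
import java.lang.ref.SoftReference;

// Illustrative only: the general pattern, not the removed StringCoding code.
final class ResultCacheSketch {

    // Stand-in for a small reusable result holder.
    static final class Result {
        byte[] value;
        byte coder;
    }

    // One softly referenced holder per thread, so the GC may reclaim it under memory pressure.
    private static final ThreadLocal<SoftReference<Result>> CACHE = new ThreadLocal<>();

    // Every call pays for a ThreadLocal lookup plus a SoftReference dereference
    // in order to avoid allocating a fresh Result.
    static Result cachedResult() {
        SoftReference<Result> ref = CACHE.get();
        Result r = (ref == null) ? null : ref.get();
        if (r == null) {
            r = new Result();
            CACHE.set(new SoftReference<>(r));
        }
        return r;
    }

    public static void main(String[] args) {
        Result first = cachedResult();
        Result second = cachedResult();
        // Same thread and 'first' is still strongly reachable, so the holder is reused.
        System.out.println(first == second);
    }
}
```

The two levels of indirection on every call are the overhead the description refers to; decoding directly in the String constructors avoids both the lookup and the holder.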
Download
$ git fetch https://git.openjdk.java.net/jdk pull/2102/head:pull/2102
$ git checkout pull/2102