-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8289220: Locale.forLanguageTag throws NPE due to soft ref used in locale cache being cleared #14211
Conversation
👋 Welcome back sunny868! A progress list of the required criteria for merging this PR into |
Webrevs
|
I don't think Locale.forLanguageTag is specified to return null so you might have to do a bit more analysis to see what this issue is about. There are soft refs usages in the locale and it could be that there is some that doesn't handle cases where the ref is cleared. |
This also likely has little to do with Shenandoah itself, and it "only" reproduces because Shenandoah has the aggressively cleaning mode. So the synopsis should reflect the exact issue in |
as @AlanBateman said, this patch may not be the radical solution. But once executed here, the 62 if (key == null || newVal == null) { -XX:ShenandoahGCHeuristics=aggressive -XX:+ShenandoahOOMDuringEvacALot can trigger frequent GC without caring if the memory is really low. So the problem of soft references can be exposed. I suspect that this code is not executed under normal circumstances, so there is no exposure problem. |
No, we can't return null if the API isn't already specified to return null. I think you'll need to dig into soft ref usages in this code as I assume there is clearing happening at a time that the code doesn't expect. It's unlikely to be specific to ShenandoahGC. |
Jtreg tier1 can trigger the same error with vmoptions:"-Xcomp -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=aggressive -XX:+ShenandoahOOMDuringEvacALot
The following modification verifies this.
But this should not be a reasonable solution. I think I should to find a better solution. |
Yes, this seems to be a bug here so I think move the issue to core-libs/java.util:i18n as it looks like the caching done by BaseLocale needs to be re-examined. |
I've moved the issue to core-libs/java.util:i18n and change the title of the JBS issue to make it clearer what this is about. Can you adjust the PR description too as this is not ShenandoahGC issue. @naotoj We might want to think about adding more tests in this area to ensure that the locale cache is stress tested. |
…ale cache being cleared
@sunny868 Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information. |
I've done basic testing jtreg tier1-3 and partial shenandoah testing with vmoptions |
Those SoftReference caches are introduced with this change: https://bugs.openjdk.org/browse/JDK-8196869, and there is a stress test added for it: test/jdk/java/util/Locale/SoftKeys.java |
As to the patch, would you please elaborate on your changes more? To me, it is simply inlining |
assert (key.holder != null && key.holderRef == null); | ||
BaseLocale l = key.holder; | ||
BaseLocale locale = new BaseLocale(l.getLanguage(), l.getScript(), l.getRegion(), l.getVariant(), true); | ||
return (new Key(locale)).getBaseLocale(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a more rigorous approach would look like this:
BaseLocale locale = new BaseLocale(l.getLanguage(), l.getScript(), l.getRegion(), l.getVariant(), true);
BaseLocal value = (new Key(locale)).getBaseLocale();
Reference.reachabilityFence(locale);
return value;
But the current patch has passed the tests with -XX:ShenandoahGCHeuristics=aggressive -XX:+ShenandoahOOMDuringEvacALot
, so I think the current patch is OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't see a clear explanation of how the proposed patch fixes the problem. Also, I would appreciate the reasoning in the comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For hotspot, when GC occurs, it causes all threads to run to the nearest safepoint and then freeze. Generally, safepoints are generated at branch jumps, method ends(ret instructions), loops instructions, and so on. Therefore, the purpose of this patch is to make the creation and use of a softReferences in the same method without branch, jumps and loops in between, that is ensure that GC will not occur in the process of the sofeReferences be created and used.
That's why I didn't use Reference.reachabilityFence(locale)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reasoning seems invalid. There are method calls in there, and you rely on inlining heuristics for this not to break. Please use reachabilityFence instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears you are assuming that some combination of bytecodes constitutes a critical section that excludes the GC. But the JVMS makes no guarantees about GC exclusion across bytecodes.
A thread can be pre-empted at any bytecode (even in the middle of a bytecode). Another thread can trigger a GC. Maybe the first thread will be rolled forward to a place convenient to the JVM, but you cannot predict what that will be, because the JVMS does not give you any contract about that matter.
For example, some JITs deoptimize (branch to the interpreter) at unpredictable points for reasons no Java programmer should ever think about (because it’s not in the JVMS contract). Deoptimizing can allocate storage (for example, to materialize objects whose allocation was deferred by escape analysis). Thus, it is not safe to assume that any particular bytecode is immune from GC.
Also, some JITs (like C2) inject synthetic safepoints injected as part of arbitrarily complex loop transformations. A GC at such a safepoint might possibly appear to be tied to a bytecode which is simply a fall-through from a previous bytecode. This can happen if the loop is rotated, and a fallthrough point begins to function as a back-branch in the IR.
The rule of thumb for non-JIT engineers is, if you find yourself trying to predict what how “the JIT must work”, stop.
The net of this is that, if you need to preserve an object across a critical section, don’t try to read the JIT’s mind or expect it to read yours. Put in a reachability fence that spans that critical section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what the right fix for this is. If it involves reachabilityFence, then a try-finally statement should be used, with the "critical section" being within the try-clause and the reachabilityFence within the finally-clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @fisk @rose00 for the explanation. This patch is indeed based on the idea that small functions are inlined, cause -Xcomp
be used, and I have seen with parameter -XX:+PrintCompilation -XX:+PrintInlining
that functions Key::<init>
and getBaseLocale
are inlined.
But as @rose00 say, it's not safe, so reachabilityFence should be used. please review again.
All JMH tests or some of them? I had trigger JMH test, waiting for run results. |
Existing JMH Local test results be tested on LOONGARCH64 show no performance fallback.
|
…ale cache being cleared
…ale cache being cleared
…ale cache being cleared
BaseLocale l = key.holder; | ||
BaseLocale locale = new BaseLocale(l.getLanguage(), l.getScript(), l.getRegion(), l.getVariant(), true); | ||
try { | ||
BaseLocale value = (new Key(locale)).getBaseLocale(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. Why do we need to create thrown-away new Key
here?
Can we just use locale?
In its current form the
That being said, the number of long-lived |
// subclass must return non-null key/value object | ||
return null; | ||
throw new NullPointerException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this caught somewhere, it seem strange to throw NPE here.
The motivation seems to be startup (JDK-8196869). It looks like it needs to be re-evaluated and if we continue with caching in this area then I think we'll need some stress tests to go with it. |
Good point Daniel. I created an issue for re-examining the cache mechanism in |
command: make test CONF=fastdebug JTREG="VM_OPTIONS=-Xcomp" TEST=gc/TestAllocHumongousFragment.java
error info:
Caused by: java.lang.NullPointerException: Cannot invoke "sun.util.locale.BaseLocale.getVariant()" because "base" is null
at java.base/java.util.Locale.forLanguageTag(Locale.java:1802)
at java.base/sun.util.cldr.CLDRBaseLocaleDataMetaInfo.(CLDRBaseLocaleDataMetaInfo.java:41)
... 24 more
Note that the test runs with -XX:ShenandoahGCHeuristics=aggressive -XX:+ShenandoahOOMDuringEvacALot and SoftReferences are involved (LocaleObjectCache uses SoftReferences, used by printf method called in getRandomInstance(Utils.java:511)).
Maybe we have to deal with the case where the getBaseLocale() return value is null. the call stack is:
in LocaleObjectCache.java:64
Progress
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14211/head:pull/14211
$ git checkout pull/14211
Update a local copy of the PR:
$ git checkout pull/14211
$ git pull https://git.openjdk.org/jdk.git pull/14211/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 14211
View PR using the GUI difftool:
$ git pr show -t 14211
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14211.diff
Webrev
Link to Webrev Comment