
8289220: Locale.forLanguageTag throws NPE due to soft ref used in locale cache being cleared #14211

Closed
wants to merge 5 commits into from


sunny868
Contributor

@sunny868 sunny868 commented May 30, 2023

command: make test CONF=fastdebug JTREG="VM_OPTIONS=-Xcomp" TEST=gc/TestAllocHumongousFragment.java
error info:

Caused by: java.lang.NullPointerException: Cannot invoke "sun.util.locale.BaseLocale.getVariant()" because "base" is null
at java.base/java.util.Locale.forLanguageTag(Locale.java:1802)
at java.base/sun.util.cldr.CLDRBaseLocaleDataMetaInfo.<init>(CLDRBaseLocaleDataMetaInfo.java:41)
... 24 more

Note that the test runs with -XX:ShenandoahGCHeuristics=aggressive -XX:+ShenandoahOOMDuringEvacALot, and SoftReferences are involved (LocaleObjectCache uses SoftReferences and is used by the printf call in getRandomInstance(Utils.java:511)).

Maybe we need to handle the case where getBaseLocale() returns null. The call stack is:

at java.base/sun.util.locale.LocaleObjectCache.get(LocaleObjectCache.java:64)
at java.base/sun.util.locale.BaseLocale.getInstance(BaseLocale.java:169)
at java.base/sun.util.locale.InternalLocaleBuilder.getBaseLocale(InternalLocaleBuilder.java:524)
at java.base/java.util.Locale.forLanguageTag(Locale.java:1874)

In LocaleObjectCache.java:64:

 62             if (key == null || newVal == null) {                                
 63                 // subclass must return non-null key/value object               
 64                 return null; // run here
 65             }
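To make the failure mode concrete, here is a minimal sketch of a soft-value cache with the same "give up and return null if the new value is null" contract. All names (`SoftValueCacheSketch`, `createObject`, `get`) are hypothetical simplifications, not the JDK's `LocaleObjectCache`; `SoftReference.clear()` stands in for the GC clearing the reference so the outcome is deterministic.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified soft-value cache; NOT the JDK's LocaleObjectCache.
public class SoftValueCacheSketch {
    private final Map<String, SoftReference<String>> map = new HashMap<>();
    private final boolean simulateGc;

    SoftValueCacheSketch(boolean simulateGc) {
        this.simulateGc = simulateGc;
    }

    // The new value is reachable only through a SoftReference before it is
    // returned, so ref.get() is not guaranteed to be non-null.
    private String createObject(String key) {
        SoftReference<String> ref = new SoftReference<>(new String(key.toLowerCase())); // fresh object
        if (simulateGc) {
            ref.clear(); // stands in for a GC between creation and use
        }
        return ref.get();
    }

    String get(String key) {
        SoftReference<String> cached = map.get(key);
        String value = (cached == null) ? null : cached.get();
        if (value == null) {
            String newVal = createObject(key);
            if (newVal == null) {
                // "subclass must return non-null key/value object": give up
                return null;
            }
            map.put(key, new SoftReference<>(newVal));
            value = newVal;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(new SoftValueCacheSketch(false).get("EN"));
        try {
            // A caller that assumes a non-null result fails the same way
            // Locale.forLanguageTag did.
            new SoftValueCacheSketch(true).get("EN").length();
        } catch (NullPointerException e) {
            System.out.println("NPE: cached value was cleared");
        }
    }
}
```

The point of the sketch is that returning null from the cache only moves the problem to the caller.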

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8289220: Locale.forLanguageTag throws NPE due to soft ref used in locale cache being cleared (Bug - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14211/head:pull/14211
$ git checkout pull/14211

Update a local copy of the PR:
$ git checkout pull/14211
$ git pull https://git.openjdk.org/jdk.git pull/14211/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14211

View PR using the GUI difftool:
$ git pr show -t 14211

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14211.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented May 30, 2023

👋 Welcome back sunny868! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label May 30, 2023
@openjdk

openjdk bot commented May 30, 2023

@sunny868 The following labels will be automatically applied to this pull request:

  • core-libs
  • i18n

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added core-libs core-libs-dev@openjdk.org i18n i18n-dev@openjdk.org labels May 30, 2023
@mlbridge

mlbridge bot commented May 30, 2023

Webrevs

@AlanBateman
Contributor

AlanBateman commented May 30, 2023

I don't think Locale.forLanguageTag is specified to return null, so you might have to do a bit more analysis to see what this issue is about. There are soft ref usages in the locale code, and it could be that one of them doesn't handle the case where the ref is cleared.

@shipilev
Member

This also likely has little to do with Shenandoah itself; it "only" reproduces because Shenandoah has the aggressive cleaning mode. So the synopsis should reflect the exact issue in Locale, once found.

@sunny868
Contributor Author

As @AlanBateman said, this patch may not be the root-cause fix. But once execution reaches this point, the null return value should be handled.

62 if (key == null || newVal == null) {
63 // subclass must return non-null key/value object
64 return null; // then what we can do?
65 }

-XX:ShenandoahGCHeuristics=aggressive -XX:+ShenandoahOOMDuringEvacALot triggers frequent GCs regardless of whether memory is actually low, which exposes problems with soft references. I suspect this code path is not reached under normal circumstances, which is why the problem has not surfaced before.

@AlanBateman
Contributor

as @AlanBateman said, this patch may not be the radical solution. But once executed here, the return null should be handled.

No, we can't return null if the API isn't already specified to return null. I think you'll need to dig into soft ref usages in this code as I assume there is clearing happening at a time that the code doesn't expect. It's unlikely to be specific to ShenandoahGC.

@sunny868
Contributor Author

sunny868 commented May 31, 2023

Jtreg tier1 can trigger the same error with VM options "-Xcomp -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=aggressive -XX:+ShenandoahOOMDuringEvacALot".
I found that the GC occurs between the time the soft reference is assigned and the time it is used.


         private BaseLocale getBaseLocale() {
-            return (holder == null) ? holderRef.get() : holder;
+//            return (holder == null) ? holderRef.get() : holder;
+            if (holder == null) {
+              System.out.println("getBaseLocale this=" + this + "  SoftReference=" + holderRef.get()); // null
+              return holderRef.get();
+            } else {
+              System.out.println("getBaseLocale return holder");
+              return holder;
+            }
         }

The following modification verifies this.


--- a/src/java.base/share/classes/sun/util/locale/BaseLocale.java
+++ b/src/java.base/share/classes/sun/util/locale/BaseLocale.java

@@ -257,19 +261,21 @@ public final class BaseLocale {

         private final boolean normalized;
         private final int hash;
-
+        private BaseLocale locale;    // make locale to a member variable
         private Key(String language, String script, String region,
                     String variant, boolean normalize) {
-            BaseLocale locale = new BaseLocale(language, script, region, variant, normalize);
+            locale = new BaseLocale(language, script, region, variant, normalize);
             this.normalized = normalize;

But this is not a reasonable solution; I think I should find a better one.
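The `holder`/`holderRef` split seen in `getBaseLocale()` can be modeled in isolation. This is a hypothetical, simplified sketch (`KeyHolderSketch`, `Value`, `Key`, and `simulateGc` are illustrative names, not JDK code): a lookup key holds its value strongly, while a key destined for the cache holds it only through a `SoftReference`, so reading it back can yield null if the referent is cleared in between. `clear()` again stands in for the GC.

```java
import java.lang.ref.SoftReference;

// Hypothetical model of the holder/holderRef pattern; not the JDK's BaseLocale.Key.
public class KeyHolderSketch {
    static final class Value {
        final String tag;

        Value(String tag) {
            this.tag = tag;
        }
    }

    static final class Key {
        private final Value holder;                   // strong: used for lookup keys
        private final SoftReference<Value> holderRef; // soft: used for cached keys

        Key(Value v, boolean soft) {
            this.holder = soft ? null : v;
            this.holderRef = soft ? new SoftReference<>(v) : null;
        }

        Value getValue() {
            // Same shape as getBaseLocale(): null once a soft referent is cleared.
            return (holder == null) ? holderRef.get() : holder;
        }

        void simulateGc() {
            if (holderRef != null) {
                holderRef.clear(); // the GC may do this at any point before getValue()
            }
        }
    }

    public static void main(String[] args) {
        Key strong = new Key(new Value("en"), false);
        Key soft = new Key(new Value("en"), true);
        soft.simulateGc();
        System.out.println(strong.getValue().tag);   // en
        System.out.println(soft.getValue() == null); // true
    }
}
```

Making the value a strong field, as in the experiment above, removes the null case but also defeats the reason the cached key used a soft reference in the first place.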

@AlanBateman
Contributor

Jtreg tier1 can trigger the same error with vmoptions:"-Xcomp -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=aggressive -XX:+ShenandoahOOMDuringEvacALot I found the GC occurs between when the soft reference is assigned and when it is used.

Yes, this seems to be a bug, so I think we should move the issue to core-libs/java.util:i18n, as it looks like the caching done by BaseLocale needs to be re-examined.

@AlanBateman
Contributor

AlanBateman commented Jun 1, 2023

I've moved the issue to core-libs/java.util:i18n and changed the title of the JBS issue to make it clearer what this is about. Can you adjust the PR description too, as this is not a ShenandoahGC issue?

@naotoj We might want to think about adding more tests in this area to ensure that the locale cache is stress tested.

@openjdk

openjdk bot commented Jun 1, 2023

@sunny868 Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

@sunny868
Contributor Author

sunny868 commented Jun 1, 2023

I've run jtreg tier1-3 and part of the Shenandoah tests with VM options -Xcomp -XX:+UseShenandoahGC -XX:ShenandoahGCHeuristics=aggressive -XX:+ShenandoahOOMDuringEvacALot; all results are OK.

@naotoj
Member

naotoj commented Jun 1, 2023

Those SoftReference caches were introduced with this change: https://bugs.openjdk.org/browse/JDK-8196869, and a stress test was added for it: test/jdk/java/util/Locale/SoftKeys.java
@sunny868 Would you be able to add a JMH test to make sure that your change would not affect the startup time?
@cl4es would you take a look at this change?
Thanks!

@naotoj
Member

naotoj commented Jun 1, 2023

As to the patch, would you please elaborate on your changes? To me, it simply inlines the content of Key.normalize(key) into Cache.createObject(), and I'm not sure how it prevents the issue in which the referent gets GC'ed between the reference's creation and its use.

assert (key.holder != null && key.holderRef == null);
BaseLocale l = key.holder;
BaseLocale locale = new BaseLocale(l.getLanguage(), l.getScript(), l.getRegion(), l.getVariant(), true);
return (new Key(locale)).getBaseLocale();
Contributor Author


Perhaps a more rigorous approach would look like this:


BaseLocale locale = new BaseLocale(l.getLanguage(), l.getScript(), l.getRegion(), l.getVariant(), true);
BaseLocale value = (new Key(locale)).getBaseLocale();
Reference.reachabilityFence(locale); // keep locale strongly reachable until value has been read
return value;

But the current patch has passed the tests with -XX:ShenandoahGCHeuristics=aggressive -XX:+ShenandoahOOMDuringEvacALot, so I think the current patch is OK.

Member


I still don't see a clear explanation of how the proposed patch fixes the problem. I would also appreciate having the reasoning in code comments.

Contributor Author


In HotSpot, when a GC occurs, all threads run to the nearest safepoint and then stop. Generally, safepoints are placed at branches, method returns (ret instructions), loops, and so on. Therefore, the purpose of this patch is to create and use the soft reference within the same method, with no branches, jumps, or loops in between, so that no GC can occur between the soft reference's creation and its use.

That's why I didn't use Reference.reachabilityFence(locale).

Contributor


This reasoning seems invalid. There are method calls in there, and you rely on inlining heuristics for this not to break. Please use reachabilityFence instead.

Contributor


It appears you are assuming that some combination of bytecodes constitutes a critical section that excludes the GC. But the JVMS makes no guarantees about GC exclusion across bytecodes.

A thread can be pre-empted at any bytecode (even in the middle of a bytecode). Another thread can trigger a GC. Maybe the first thread will be rolled forward to a place convenient to the JVM, but you cannot predict what that will be, because the JVMS does not give you any contract about that matter.

For example, some JITs deoptimize (branch to the interpreter) at unpredictable points for reasons no Java programmer should ever think about (because it’s not in the JVMS contract). Deoptimizing can allocate storage (for example, to materialize objects whose allocation was deferred by escape analysis). Thus, it is not safe to assume that any particular bytecode is immune from GC.

Also, some JITs (like C2) inject synthetic safepoints as part of arbitrarily complex loop transformations. A GC at such a safepoint might appear to be tied to a bytecode that is simply a fall-through from a previous bytecode. This can happen if the loop is rotated and a fall-through point begins to function as a back-branch in the IR.

The rule of thumb for non-JIT engineers is: if you find yourself trying to predict how “the JIT must work”, stop.

The net of this is that, if you need to preserve an object across a critical section, don’t try to read the JIT’s mind or expect it to read yours. Put in a reachability fence that spans that critical section.

Member


I don't know what the right fix for this is. If it involves reachabilityFence, then a try-finally statement should be used, with the "critical section" being within the try-clause and the reachabilityFence within the finally-clause.
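A minimal sketch of the try/finally shape suggested here, assuming a simplified model: `ReachabilityFenceSketch`, `Value`, and `createViaSoftRef` are hypothetical stand-ins for `BaseLocale` and the `createObject` path, while `java.lang.ref.Reference.reachabilityFence` (Java 9+) is the real API. A soft reference is only cleared once its referent is no longer strongly reachable, and the fence in the `finally` clause keeps the local strongly reachable for the whole `try` block, regardless of inlining or safepoint placement.

```java
import java.lang.ref.Reference;
import java.lang.ref.SoftReference;

// Hypothetical sketch, not the actual patch: keep a freshly created value
// strongly reachable until it has been read back through a SoftReference.
public class ReachabilityFenceSketch {
    static final class Value {
        final String tag;

        Value(String tag) {
            this.tag = tag;
        }
    }

    static Value createViaSoftRef(String tag) {
        Value locale = new Value(tag); // strong local reference
        try {
            SoftReference<Value> ref = new SoftReference<>(locale);
            // ref cannot be cleared while locale is strongly reachable, and the
            // fence below keeps it strongly reachable until the try block ends.
            return ref.get();
        } finally {
            Reference.reachabilityFence(locale);
        }
    }

    public static void main(String[] args) {
        System.out.println(createViaSoftRef("en").tag); // en
    }
}
```

In this stripped-down form, returning `locale` directly would be equivalent and simpler; the fence only matters when the value has to be obtained through an object that refers to it softly.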

Contributor Author


Thanks @fisk @rose00 for the explanation. This patch is indeed based on the assumption that small methods get inlined, because -Xcomp is used, and with -XX:+PrintCompilation -XX:+PrintInlining I have seen that Key::<init> and getBaseLocale are inlined.
But as @rose00 says, that is not safe, so reachabilityFence should be used. Please review again.

@sunny868 sunny868 changed the title 8289220: [Shenandoah] TestAllocObjectArrays fails intermittently 8289220: Locale.forLanguageTag throws NPE due to soft ref used in locale cache being cleared Jun 2, 2023
@sunny868
Contributor Author

sunny868 commented Jun 2, 2023

@sunny868 Would you be able to add a JMH test to make sure that your change would not affect the startup time?

All JMH tests, or only some of them? I have triggered the JMH tests and am waiting for the results.

@sunny868
Contributor Author

sunny868 commented Jun 5, 2023

Existing JMH Locale benchmark results, run on LOONGARCH64, show no performance regression.


// parent
Benchmark                         Mode  Cnt   Score   Error  Units
LocaleDefaults.getDefault         avgt    3  35.731 ± 0.805  ns/op
LocaleDefaults.getDefaultDisplay  avgt    3  36.099 ± 1.037  ns/op
LocaleDefaults.getDefaultFormat   avgt    3  36.123 ± 1.718  ns/op

// current
Benchmark                         Mode  Cnt   Score   Error  Units
LocaleDefaults.getDefault         avgt    3  35.649 ± 0.264  ns/op
LocaleDefaults.getDefaultDisplay  avgt    3  36.141 ± 1.283  ns/op
LocaleDefaults.getDefaultFormat   avgt    3  36.092 ± 1.371  ns/op

BaseLocale l = key.holder;
BaseLocale locale = new BaseLocale(l.getLanguage(), l.getScript(), l.getRegion(), l.getVariant(), true);
try {
BaseLocale value = (new Key(locale)).getBaseLocale();
Member

@turbanoff turbanoff Jun 7, 2023


I don't understand. Why do we need to create a throwaway new Key here?
Can't we just use locale?

@djelinski
Member

djelinski commented Jun 7, 2023

In its current form the BaseLocale.Cache.createObject function can return null; the object returned by BaseLocale.Key.normalizeKey(key) holds a soft reference to locale, and by the time getBaseLocale is called, the reference may be cleared.

Key only uses SoftReferences because it must not hold a reference to the object returned by createObject. We could refactor createObject to return a clone of the BaseLocale referenced by Key, and then Key would not need to use SoftReferences.

That being said, the number of long-lived BaseLocale objects is very limited; we only keep them in Locale, LocaleKey, and Locale.CONSTANT_LOCALES. Unless I'm missing something, we could solve this problem by removing BaseLocale caching, and improve performance at the same time.
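The refactoring idea from this comment (have createObject return a copy, so that Key no longer needs a SoftReference) can be sketched as follows. This is a hypothetical model under stated assumptions, not JDK code: `CloneValueSketch`, `Base`, `Key`, and `copy()` are illustrative names, and a real cache would also purge stale entries through a `ReferenceQueue`. Because the key strongly holds a different object from the one handed out, the cache can still drop values softly, while `createObject` can never return null.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the "return a clone" idea; not the JDK's BaseLocale cache.
public class CloneValueSketch {
    static final class Base {
        final String language;
        final String region;

        Base(String language, String region) {
            this.language = language;
            this.region = region;
        }

        Base copy() {
            return new Base(language, region);
        }
    }

    static final class Key {
        private final Base base; // strong reference, never handed out by get()

        Key(Base base) {
            this.base = base;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return base.language.equals(k.base.language) && base.region.equals(k.base.region);
        }

        @Override
        public int hashCode() {
            return Objects.hash(base.language, base.region);
        }
    }

    // Values are still held softly (stale-entry purging omitted for brevity).
    private final Map<Key, SoftReference<Base>> map = new HashMap<>();

    private Base createObject(Key key) {
        return key.base.copy(); // a new object, distinct from the one the key holds
    }

    Base get(Base lookup) {
        Key key = new Key(lookup);
        SoftReference<Base> ref = map.get(key);
        Base value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = createObject(key); // strong local: cannot be cleared here
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }

    public static void main(String[] args) {
        CloneValueSketch cache = new CloneValueSketch();
        Base lookup = new Base("en", "US");
        Base a = cache.get(lookup);
        Base b = cache.get(new Base("en", "US"));
        System.out.println(a != lookup); // true: key and value are distinct objects
        System.out.println(a == b);      // true: a is strongly reachable, so the entry survives
    }
}
```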

 // subclass must return non-null key/value object
-return null;
+throw new NullPointerException();
Contributor


Is this caught somewhere? It seems strange to throw NPE here.

@AlanBateman
Contributor

That being said, the number of long-lived BaseLocale objects is very limited; we only keep them in Locale, LocaleKey, and Locale.CONSTANT_LOCALES. Unless I'm missing something, we could solve this problem by removing BaseLocale caching, and improve performance at the same time.

The motivation seems to be startup (JDK-8196869). It looks like it needs to be re-evaluated and if we continue with caching in this area then I think we'll need some stress tests to go with it.

@naotoj
Member

naotoj commented Jun 7, 2023

Good point Daniel. I created an issue for re-examining the cache mechanism in BaseLocale: https://bugs.openjdk.org/browse/JDK-8309622


9 participants