New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8286066: assert(k != __null) failed: klass not loaded caused by FillerObject_klass #8519
Conversation
|
@DamonFool The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
/label hotspot-gc |
/label add hotspot-gc |
@DamonFool |
@DamonFool The |
Webrevs
|
/* GC support, should be loaded as early as possible */ \ | ||
do_klass(FillerObject_klass, jdk_internal_vm_FillerObject ) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems just a touch too early for my liking, Even though it is an empty class it is in a different package and that might perturb a few things in an unexpected way.
IIUC the class is going to be needed when the initial TLAB is expanded. Did you determine exactly which class was causing that? I'd like to see this loading delayed until after the core system classes, as much as possible.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems just a touch too early for my liking, Even though it is an empty class it is in a different package and that might perturb a few things in an unexpected way.
IIUC the class is going to be needed when the initial TLAB is expanded. Did you determine exactly which class was causing that? I'd like to see this loading delayed until after the core system classes, as much as possible.
Thanks.
Thanks @dholmes-ora for the review.
I added a jtreg to this bug.
It can also be reproduced with java -XX:-UseCompressedClassPointers -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xshare:dump
on Linux/x64.
I saw the crash after java.lang.ThreadDeath
was loaded on x86_32.
Maybe, we can load FillerObject
after java.lang.Error
.
STDERR:
stdout: [[0.004s][info][class,path] bootstrap loader class path=/home/jdk/build/linux-x86-server-fastdebug/images/jdk/lib/modules
[0.006s][warning][gc,init ] Consider enabling -XX:+AlwaysPreTouch to avoid memory commit hiccups
[0.006s][info ][cds ] Core region alignment: 4096
[0.006s][info ][class,path] app loader class path=/home/jdk/build/linux-x86-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_runtime_cds_appcds_TestEpsilonGCWithCDS_java/scratch/0/hello.jar
[0.006s][info ][class,path] opened: /home/jdk/build/linux-x86-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_runtime_cds_appcds_TestEpsilonGCWithCDS_java/scratch/0/hello.jar
[0.006s][info ][class,load] opened: /home/jdk/build/linux-x86-server-fastdebug/test-support/jtreg_test_hotspot_jtreg_runtime_cds_appcds_TestEpsilonGCWithCDS_java/scratch/0/hello.jar
[0.010s][info ][class,load] java.lang.Object source: jrt:/java.base
[0.010s][info ][class,load] java.io.Serializable source: jrt:/java.base
[0.010s][info ][class,load] java.lang.Comparable source: jrt:/java.base
[0.010s][info ][class,load] java.lang.CharSequence source: jrt:/java.base
[0.010s][info ][class,load] java.lang.constant.Constable source: jrt:/java.base
[0.010s][info ][class,load] java.lang.constant.ConstantDesc source: jrt:/java.base
[0.011s][info ][class,load] java.lang.String source: jrt:/java.base
[0.011s][info ][class,load] java.lang.reflect.AnnotatedElement source: jrt:/java.base
[0.012s][info ][class,load] java.lang.reflect.GenericDeclaration source: jrt:/java.base
[0.012s][info ][class,load] java.lang.reflect.Type source: jrt:/java.base
[0.012s][info ][class,load] java.lang.invoke.TypeDescriptor source: jrt:/java.base
[0.012s][info ][class,load] java.lang.invoke.TypeDescriptor$OfField source: jrt:/java.base
[0.012s][info ][class,load] java.lang.Class source: jrt:/java.base
[0.012s][info ][class,load] java.lang.Cloneable source: jrt:/java.base
[0.013s][info ][class,load] java.lang.ClassLoader source: jrt:/java.base
[0.013s][info ][class,load] java.lang.System source: jrt:/java.base
[0.013s][info ][class,load] java.lang.Throwable source: jrt:/java.base
[0.013s][info ][class,load] java.lang.Error source: jrt:/java.base
[0.013s][info ][class,load] java.lang.ThreadDeath source: jrt:/java.base
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc: SuppressErrorAt=/vmClasses.hpp:55
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/home/jdk/src/hotspot/share/classfile/vmClasses.hpp:55), pid=78368, tid=78371
# assert(k != __null) failed: klass not loaded
#
# JRE version: (19.0) (fastdebug build )
# Java VM: OpenJDK Server VM (fastdebug 19-internal-adhoc..jdk, interpreted mode, tiered, epsilon gc, linux-x86)
# Problematic frame:
# V [libjvm.so+0x85f54d] CollectedHeap::fill_with_dummy_object(HeapWordImpl**, HeapWordImpl**, bool)+0x42d
#
What do you think?
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, we can load
FillerObject
afterjava.lang.Error
.
But I'm not sure if it's always safe to do so on any platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the x64 case (thanks for that reproducer!) it crashes after loading java.lang.Thread$UncaughtExceptionHandler
- so somewhat later than the x86 case.
I thought the underlying problem was the TLAB size but at least with EpsilonGC it is acting very counterintuitively - any size > 6K crashes, while less does not. But logging also seems to show that the requested size is not being honoured in those cases but the Ergo size of 2K is being selected! This is all very odd, but perhaps specific to Epsilon.
With G1 I can crash with a TLABSize of 4K (or less), but it occurs much later after loading jdk.internal.reflect.UnsafeStaticFieldAccessorImpl
(for 4K). With 2K minimum TLABSize it is after java.lang.Boolean
.
So I guess after Error is a reasonable compromise.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I guess after Error is a reasonable compromise.
Updated.
runtime/cds/appcds/TestEpsilonGCWithCDS.java runtime/cds/FillerObjectLoadTest.java
passed on Linux/x86_32.
More testing is in progress.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With G1 I can crash with a TLABSize of 4K (or less), but it occurs much later after loading
The jtreg test had been updated to also test the default GC.
Thanks.
I tested tier1~tier3 on Linux/x64, no regression. |
@DamonFool This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 25 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
|
An alternative fix which seems more comprehensive could be to use E.g. in
instead of direct use of I see that Use of the small filler objects is really rare, additionally it only happens at most once per TLAB/PLAB fill, and is mostly done by the application anyway, so I do not see an issue with such an additional flag. Another option is to have some global variable containing that klass (either in This would completely avoid the somewhat brittle guessing about the initialization order of the klasses, and avoids any runtime overhead by checking whether the klass has already been loaded during runtime at the cost of a single global variable. At this time I kind of prefer this second option. |
Changing my approval after thinking it through a bit more to discuss the options we have.
Thanks @tschatzl for your review and suggestion. Then part of the unused memory would be filled with |
The gc filler objects only provide additional verification and easier debugging (i.e. any reference to such a filler object is simply an error somewhere). We don't (and can't) verify that every non-FillerObject is being referenced at all times. Also, the "wrong" filler objects would also only persist until the first garbage collection, as
This can certainly be combined with early as possible initialization of that global variable with the correct So I believe this is safe (and actually necessary to avoid specific TLAB sizes to fail anyway) to do. Thanks, |
Sounds good! |
If we decide to go with the latest implementation, the bug title should be updated. |
String java_home_src = System.getProperty("java.home"); | ||
String java_home_dst = CDSTestUtils.getOutputDir() + File.separator + "moved_jdk"; | ||
CDSTestUtils.clone(new File(java_home_src), new File(java_home_dst)); | ||
String dstJava = java_home_dst + File.separator + "bin" + File.separator + "java"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you avoid all this by specifying the name of the shared archive file for the dump?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you avoid all this by specifying the name of the shared archive file for the dump?
It works!
Updated.
Thanks.
Updated. |
ProcessBuilder pb = ProcessTools.createJavaProcessBuilder( | ||
"-XX:+IgnoreUnrecognizedVMOptions", "-XX:-UseCompressedClassPointers", | ||
"-XX:+UnlockExperimentalVMOptions", "-XX:+UseEpsilonGC", "-Xshare:dump", | ||
"-XX:SharedArchiveFile=./hello.jsa"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use TestCommon.getCurrentArchiveName()
to get a unique name to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually that might be overkill for this simple case ... I wonder if @iklam could comment?
hello.jsa is a bit of a weird name to use though :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use
TestCommon.getCurrentArchiveName()
to get a unique name to use.
Hi @dholmes-ora , I didn't get the point why we need to get a unique name.
What's the problem if we always use the same name?
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had issues in the past with tests running concurrently IIRC . I'm not sure if this is still an issue if the test working directory is already unique. @iklam should remember the details. We put in a lot of support code to make sure things always worked smoothly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had problems on Windows in the past. If two tests generate an archive with the same name, after the first test finishes, jtreg deletes the first version of this file. However, Windows virus scanners will keep the file alive for a little longer. When the second tests tries to create the file again, it gets a file permission error. This would lead to infrequent random failures in the CDS tests.
That's the reason we use TestCommon.getCurrentArchiveName()
in the CDS tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had issues in the past with tests running concurrently IIRC . I'm not sure if this is still an issue if the test working directory is already unique.
The archive file is generated in the test's "working scratch" directory, which is reused. For example, if you have 3 test cases and a jtreg concurrency setting of 2: 0/scratch/
could be used twice whereas 1/scratch/
is used once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had problems on Windows in the past. If two tests generate an archive with the same name, after the first test finishes, jtreg deletes the first version of this file. However, Windows virus scanners will keep the file alive for a little longer. When the second tests tries to create the file again, it gets a file permission error. This would lead to infrequent random failures in the CDS tests.
That's the reason we use
TestCommon.getCurrentArchiveName()
in the CDS tests.
Thanks @iklam and @dholmes-ora for your kind explanation.
Updated.
TestCommon.getCurrentArchiveName()
returns null
so I use TestCommon.getNewArchiveName()
instead.
Thanks.
@@ -60,6 +60,7 @@ | |||
|
|||
class ClassLoaderData; | |||
|
|||
Klass* CollectedHeap::_filler_klass = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please name this member _filler_object_klass
to better distinguish it from _filler_array*
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please name this member
_filler_object_klass
to better distinguish it from_filler_array*
.
Updated.
Thanks.
Looks good! But I'd like to test it in our CI before integration please.
Thanks,
David
I'll push it through our tier1-3 and report back |
I had a clear run on tier1-3 so all good to integrate |
Thank you all for the review and testing. |
Going to push as commit 7ebc4bc.
Your commit was automatically rebased without conflicts. |
@DamonFool Pushed as commit 7ebc4bc. |
Hi all,
FillerObject_klass
was loaded too late, which may lead to VM crash due toFillerObject_klass
is unloaded.We found this bug by running
runtime/cds/appcds/TestEpsilonGCWithCDS.java
on x86_32.From the following gdb backtrace,
FillerObject_klass
may be used duringvmClasses::resolve_all
(see frame #27 resolvingException_klass
).So
FillerObject_klass
should be loaded as soon as possible.The fix just loads
FillerObject
afterObject_klass
.Thanks.
Best regards,
Jie
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/8519/head:pull/8519
$ git checkout pull/8519
Update a local copy of the PR:
$ git checkout pull/8519
$ git pull https://git.openjdk.java.net/jdk pull/8519/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 8519
View PR using the GUI difftool:
$ git pr show -t 8519
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/8519.diff