Transient fatal error when building spring-graalvm-native with GraalVM 20.2.0 #2748
Thanks for the report @sdeleuze.
Unlikely: the deadlock in #2732 should never result in a crash.
Despite the description above, I think the crashes are only happening in Java 8. The crash in https://ci.spring.io/teams/spring-graalvm-native/pipelines/spring-graalvm-native/jobs/java11-key-samples-build/builds/420 is:
This is the first OpenJDK 8 update release to include a backport of JFR. It's possible that the JFR bug fixes mentioned above were not included in this backport (@adinn @zakkak).
@dougxc JDK-8245283 has not been backported to jdk11u, so that might cause a failure on jdk11u. However, it is not going to be the cause of the problem on jdk8u: the patch for that bug skips over a JVM_CONSTANT_Dynamic constant pool entry, and that entry type was only added to the class file format in release 11. This may perhaps be the same error as was seen in JDK-8232997. That error was resolved by changes made for a related issue, JDK-8230400 (of which JDK-8232997 was marked as a duplicate). The latter had a two-part fix. The first part avoids a potential error when comparing old and new methods in the case where the method arrays may not have the same size. That change was backported to jdk11u, but the original method comparison is still present in jdk8u. This may well be what is causing the SEGV. However, for this to be the cause of the crash, I think the user has to have installed their own agent in the runtime. Is that the case here? The second part of the fix was pushed as a fix for JDK-8233111. It does not appear to have been backported to either jdk11u or jdk8u. I'm not sure whether it could be the cause of the SEGV, but I suppose it is possible. It certainly seems to relate to running a Java JIT in the JVM.
There is an SVM agent in play, although I don't know what it does (@peter-hofer?):
Ah yes. That agent is responsible for adding code to manage class and instance initialization, tracking initialization at build time and maybe deferring it to runtime. It can add a new <clinit> method. That might be enough to cause the problem.
The agent that I think you're referring to (
As @adinn said, the agent is there to add the I would advise removing
@vjovanov I reproduced the issue locally (without
As pointed out by @vjovanov, it seems to be JFR-related, but we don't enable anything JFR-related on our side; we just ship JFR support with https://github.com/spring-projects/spring-framework/blob/master/spring-core/src/main/java/org/springframework/core/metrics/jfr/FlightRecorderApplicationStartup.java, which is not referenced anywhere. But in our builds I noticed Is there a workaround?
That's a very bizarre SEGV. The hs_err file shows the crash point is early in method position_stream_after_cp(). This is JFR code which modifies loaded class bytecode for class Event or one of its subclasses (I suspect that in this case it is actually the latter, given that the call to position_stream_after_cp happens 0x491 bytes into method JfrEventClassTransformer::on_klass_creation()). In any case, the problem is not in the transformation code itself, because it happens before any bytes get modified. The problem lies with the bytecode stream for the class to be transformed that is passed to position_stream_after_cp(). The offending routine is passed a ClassFileStream, i.e. a stream over a byte array in classfile format that is handed over by the class loader. The code works its way through the mapped byte array, skipping over the header and constant pool data. Its purpose is to leave the stream pointing at the next block of data, which identifies class fields (methods come after that).
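To make the skipping logic concrete, here is a minimal Java sketch of what a routine like position_stream_after_cp() has to do: read the class file header, then walk the constant pool slot by slot, reading each tag byte and skipping that tag's payload. It is an illustration under stated assumptions, not the HotSpot C++ code: the class ConstantPoolSkipper and its method offsetAfterConstantPool are hypothetical names, the sketch only recognizes tags up to 18 (0x12), mirroring the range check discussed in this thread, and unlike the native code it bounds-checks every read, so a truncated or corrupt buffer raises an exception instead of touching unmapped memory.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Hypothetical illustration, not HotSpot's position_stream_after_cp():
// walks a class file's constant pool and returns the offset just past it.
public final class ConstantPoolSkipper {

    private ConstantPoolSkipper() {}

    public static int offsetAfterConstantPool(byte[] classBytes) {
        // ByteBuffer is big-endian by default, matching the class file format.
        ByteBuffer buf = ByteBuffer.wrap(classBytes);
        try {
            if (buf.getInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("bad magic number");
            }
            buf.getShort(); // minor_version
            buf.getShort(); // major_version
            int count = Short.toUnsignedInt(buf.getShort()); // constant_pool_count
            // Valid constant pool indices run from 1 to count - 1.
            for (int index = 1; index < count; index++) {
                int tag = Byte.toUnsignedInt(buf.get());
                switch (tag) {
                    case 1: { // CONSTANT_Utf8: u2 length, then that many bytes
                        int length = Short.toUnsignedInt(buf.getShort());
                        skip(buf, length);
                        break;
                    }
                    case 7:  // CONSTANT_Class
                    case 8:  // CONSTANT_String
                    case 16: // CONSTANT_MethodType
                        skip(buf, 2);
                        break;
                    case 15: // CONSTANT_MethodHandle: u1 kind + u2 index
                        skip(buf, 3);
                        break;
                    case 3:  // CONSTANT_Integer
                    case 4:  // CONSTANT_Float
                    case 9:  // CONSTANT_Fieldref
                    case 10: // CONSTANT_Methodref
                    case 11: // CONSTANT_InterfaceMethodref
                    case 12: // CONSTANT_NameAndType
                    case 17: // CONSTANT_Dynamic (class file version 55+)
                    case 18: // CONSTANT_InvokeDynamic
                        skip(buf, 4);
                        break;
                    case 5:  // CONSTANT_Long
                    case 6:  // CONSTANT_Double
                        skip(buf, 8);
                        index++; // 8-byte constants occupy two slots
                        break;
                    default:
                        // Tags above 18 (0x12) or unused values are rejected.
                        // (Java 9+ Module/Package tags 19 and 20 are not handled here.)
                        throw new IllegalArgumentException("invalid constant pool tag " + tag);
                }
            }
            return buf.position();
        } catch (BufferUnderflowException e) {
            throw new IllegalArgumentException("truncated class file", e);
        }
    }

    // Advances the buffer by n bytes, refusing to run past the end of the data.
    private static void skip(ByteBuffer buf, int n) {
        if (n > buf.remaining()) {
            throw new BufferUnderflowException();
        }
        buf.position(buf.position() + n);
    }

    public static void main(String[] args) {
        byte[] minimal = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // magic
            0x00, 0x00, 0x00, 0x34,                             // minor 0, major 52 (Java 8)
            0x00, 0x03,                                         // constant_pool_count = 3
            0x01, 0x00, 0x01, 0x41,                             // #1 Utf8 "A"
            0x07, 0x00, 0x01                                    // #2 Class -> #1
        };
        System.out.println(offsetAfterConstantPool(minimal)); // prints 17
    }
}
```

Long and Double entries occupy two constant pool slots, which is why the loop index advances an extra step for tags 5 and 6; any walker that gets a length or slot count wrong will desynchronize and start reading payload bytes as tags.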
The machine code listed in the hs_err file immediately around the crash pc (0x7f0fb20eaa34) disassembles as follows:
This is 0x44 bytes into the method. So it appears to be at the start of the loop that reads a tag byte from the constant pool and then skips the relevant number of data bytes depending on the tag. The faulting instruction appears to be at the start of the loop, where the tag byte is range-checked to see if it is above 0x12, the highest possible tag. Looking at related info in the hs_err file, the address in rdx from which the byte is being read is 0x7f0ec02f736a
and the SEGV is an access error
So, although 0x00007f0ec02f736a does not look like a wildly wrong address for the classfile byte array, it clearly must be, since it is not in a segment mapped for READ access. Now, this ClassFileStream pointer has come out of core JVM code (the entry to the JFR on_klass_creation method comes under java.lang.ClassLoader.defineClass1 via SystemDictionary::resolve_from_stream). It is hard to see how that might relate to any class file transformer employed by GraalVM. To borrow from Hamlet, something is rotten in the state of Denmark. Diagnosing this will require a reliable reproducer on a debuginfo build of OpenJDK so that the JVM can be debugged.
Would it be possible to share JVMCI builds without JFR enabled, in order to check whether that fixes the issue?
You may not need to do that. Your suggestion of disabling JFR is worth trying first. See what happens when you run the native image build with --disable-jfr (you will probably need to pass -J--disable-jfr). That should stop the JFR class transformation code from being invoked.
I am not sure, when I specify
Oops, apologies. Of course the correct JVM option is -XX:-FlightRecorder.
Thanks for the tip, I am starting a few full builds on our CI with this option to see if that solves the issue. I will keep you informed.
Same error with
Do we have a way to remove
I saw a similar error in #2164 and the related JDK-8232997, cc @gilles-duboscq. Related?
Trying a new build with
Using
@sdeleuze you could take the
To help tracking oracle/graal#2748.
@dougxc We can choose which LabsJDK version we use, so I have switched to the debug variant. The bug is transient, so by nature we can't reproduce it for every
@dougxc Could you please share some guidance on what we could do to give you more insight now that we use the debug builds?
Do you have a new hs_err file from the debug build? Maybe it provides more detail that @adinn might be able to decipher. If not, then the next step is to add assertions to the C++ code to try to work out what's going wrong. My suspicion is still that there's something wrong in the JFR backport, but I do not know that code very well.
Let me reproduce the bug locally and provide you with the hs_err file.
Hi @sdeleuze @dougxc I think the culprit is probably this issue just reported in the JDK 8 JVM: https://bugs.openjdk.java.net/browse/JDK-8252904 It certainly looks like the likely cause of this issue. Andrey Petrushkov has proposed a patch for it on the jdk8u mailing list, which I am reviewing now: https://mail.openjdk.java.net/pipermail/jdk8u-dev/2020-September/012642.html It looks like his patch will fix the problem, but I'm not sure it is acceptable because it uses a ResourceMark in a very obscure way.
The related JDK issue is expected to be fixed in openjdk8u282. Is it something we could benefit from in GraalVM 20.3?
Sorry for not confirming earlier, but yes, you are right that a fix has already been pushed and will appear in openjdk8u282 (I reviewed the fix and the jdk8u maintainers approved it). I am sorry I cannot help you much regarding pushing the patch to the GraalVM Oracle 'labs' jdk8 releases. I assume the fix will eventually go into a release automatically: Oracle jdk8u mostly keeps parity with openjdk8u when it comes to bug fixes, and 'labs' jdk8 is derived from Oracle jdk8. Which GraalVM release it goes into is not something I am qualified to say. Perhaps an Oracle GraalVM dev could answer this.
I will apply https://bugs.openjdk.java.net/browse/JDK-8252904 to graal-jvmci-8. If all goes well, it should make it into GraalVM CE 20.3 but if not, definitely in 21.0. |
Thanks @dougxc please keep us informed. |
The JDK-8252904 patch has been applied (graalvm/graal-jvmci-8@0284e18), so this will make it into 20.3.
Please re-open if the issue still shows up in the 20.3 release. |
Not sure if that's due to
I confirm the bug seems to be fixed, thanks a lot! In practice, Java 11 does not seem stable for us, and Java 8 will require a fix for #2965.
Could you please add this one in the 20.3 milestone? |
We sometimes see this kind of transient error on our CI, on various samples, with GraalVM 20.2.0-dev:
It happens with Java 8 (https://ci.spring.io/teams/spring-graalvm-native/pipelines/spring-graalvm-native/jobs/java8-key-samples-build/builds/476 or https://ci.spring.io/teams/spring-graalvm-native/pipelines/spring-graalvm-native/jobs/java8-key-samples-build/builds/478) and also on Java 11 (https://ci.spring.io/teams/spring-graalvm-native/pipelines/spring-graalvm-native/jobs/java11-key-samples-build/builds/420).
We use up-to-date builds of release/graal-vm/20.2 with openjdk-8u262+10-jvmci-20.2-b03 and labsjdk-ce-11.0.8+10-jvmci-20.2-b03.
Despite the message, I don't see any hs_err_pid1073.log error file on our CI machine. I have enabled core dumps on the CI in order to try to provide more information.
Please tag this issue with the spring label.