Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8310829: guarantee(!HAS_PENDING_EXCEPTION) failed in ExceptionTranslation::doit #14641

Closed
wants to merge 8 commits into from

Conversation

dougxc
Copy link
Member

@dougxc dougxc commented Jun 25, 2023

The VMSupport class is required for translating an exception between the HotSpot and libgraal heaps.
Loading it lazily can result in a loading exception, obscuring the exception being translated.
To avoid this, VMSupport is loaded eagerly along with the other vmClasses.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8310829: guarantee(!HAS_PENDING_EXCEPTION) failed in ExceptionTranslation::doit (Bug - P2)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14641/head:pull/14641
$ git checkout pull/14641

Update a local copy of the PR:
$ git checkout pull/14641
$ git pull https://git.openjdk.org/jdk.git pull/14641/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14641

View PR using the GUI difftool:
$ git pr show -t 14641

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14641.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 25, 2023

👋 Welcome back dnsimon! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 25, 2023
@openjdk
Copy link

openjdk bot commented Jun 25, 2023

@dougxc The following labels will be automatically applied to this pull request:

  • graal
  • hotspot
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added graal graal-dev@openjdk.org serviceability serviceability-dev@openjdk.org hotspot hotspot-dev@openjdk.org labels Jun 25, 2023
@mlbridge
Copy link

mlbridge bot commented Jun 25, 2023

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The eager loading seems reasonable, but I do not understand the details here. In what way was loading failing? You still have to initialize VMSupport before you can call methods on it, so that could also fail - though the code does not seem to notice/handle this. ??

@@ -582,7 +582,7 @@ C2V_VMENTRY_NULL(jobject, lookupType, (JNIEnv* env, jobject, jstring jname, ARGU
TempNewSymbol class_name = SymbolTable::new_symbol(str);

if (class_name->utf8_length() <= 1) {
JVMCI_THROW_MSG_0(InternalError, err_msg("Primitive type %s should be handled in Java code", class_name->as_C_string()));
JVMCI_THROW_MSG_0(InternalError, err_msg("Primitive type %s should be handled in Java code", str));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems unrelated to the fix at hand.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's a minor fix up I noticed while making changes a few lines below. It just avoids a conversion of a Symbol back to a C string when the original C string is available.

@dougxc
Copy link
Member Author

dougxc commented Jun 26, 2023

The eager loading seems reasonable, but I do not understand the details here. In what way was loading failing? You still have to initialize VMSupport before you can call methods on it, so that could also fail - though the code does not seem to notice/handle this. ??

The failure was due to OutOfMemoryError when running a GC test that stresses the heap (TestPLABEvacuationFailure.java).
The usages of vmSupport below all use JavaCalls:call_static which takes care of linking and initializing the class.

@dholmes-ora
Copy link
Member

dholmes-ora commented Jun 26, 2023

The usages of vmSupport below all use JavaCalls:call_static which takes care of linking and initializing the class.

Sure but my point is that initialization can fail and the code does not seem to directly take that into account - if the first decode encounters a class initialization error then it will just return with the exception pending and no decoding will actually have occurred. If we get to the encode first and it encounters an initialization error it will still attempt to do a decode that must in turn fail and again we return with an exception pending and no encoding/decoding occurring. This code does not seem to be written in a way that allows for an error initializing VMSupport. So while the fix side-steps the problematic guarantee, I think it is just highlighting that this code is not prepared for "system" errors like this.

@dholmes-ora
Copy link
Member

dholmes-ora commented Jun 26, 2023

Edited above to correct: I think it is just highlighting that this code is not prepared for "system" errors like this.

@dougxc
Copy link
Member Author

dougxc commented Jun 26, 2023

if the first decode encounters a class initialization error then it will just return with the exception pending and no decoding will actually have occurred

That's fine - if VMSupport fails class initialization, then no "rich" decoding can occur (by definition) and so throwing the clinit error is the best we can do.

If we get to the encode first and it encounters an initialization error it will still attempt to do a decode that must in turn fail

You have to keep in mind that encode and decode are called in different runtimes. If encoding an exception in the HotSpot runtime fails (e.g. due to an OOME calling VMSupport.<clinit> in the HotSpot heap), then it's still fine to try call VMSupport.decode in the libgraal runtime. If VMSupport.decode in the libgraal runtime also fails, then it throw the exception describing the secondary failure. There's simply no way to guarantee all info is shown when secondary issues occur during exception handling.

@dougxc
Copy link
Member Author

dougxc commented Jun 26, 2023

BTW, I've manually tested this with libgraal. It's not possible to add a jtreg test until libgraal in part of OpenJDK.

Copy link
Contributor

@tkrodriguez tkrodriguez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ok to me.

@openjdk
Copy link

openjdk bot commented Jun 26, 2023

@dougxc This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8310829: guarantee(!HAS_PENDING_EXCEPTION) failed in ExceptionTranslation::doit

Reviewed-by: never, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 11 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 26, 2023
@dholmes-ora
Copy link
Member

That's fine - if VMSupport fails class initialization, then no "rich" decoding can occur (by definition) and so throwing the clinit error is the best we can do.

Then why did you not simply handle the exception from the loading of VMSupport the same way?

@dougxc
Copy link
Member Author

dougxc commented Jun 26, 2023

Then why did you not simply handle the exception from the loading of VMSupport the same way?

The only actual case seen of the original way failing is due to OOME in GC stress tests. If loading of VMSupport is done during VM startup, then no stress test code can cause an OOME while loading VMSupport. I doubt VMSupport.<clinit> would ever fail in practice.

@dholmes-ora
Copy link
Member

I doubt VMSupport. would ever fail in practice.

I would think it is likely to fail with OOME under the same GC stress test conditions.

But if this is just a stress test issue then bailing out when the loading fails would be far simpler than pre-loading the class. You've added to the VM startup cost just to avoid a stress test failure.

@dougxc
Copy link
Member Author

dougxc commented Jun 26, 2023

I would think it is likely to fail with OOME under the same GC stress test conditions.

But if this is just a stress test issue then bailing out when the loading fails would be far simpler than pre-loading the class. You've added to the VM startup cost just to avoid a stress test failure.

Any app running near the GC limit can cause this and we've seen this with libgraal in practice. When the VM is near the GC limit, any compilation can cause an OOME and then when that OOME wants to be translated back to libgraal, it can be the first time VMSupport is used.
The time for loading VMSupport eagerly is in the noise.
For example, -verbose:class shows that about 30 classes (including VMSupport) are loaded in 1ms:

[0.011s][info][class,load] java.lang.reflect.Executable source: shared objects file
[0.012s][info][class,load] java.lang.reflect.Method source: shared objects file
[0.012s][info][class,load] java.lang.reflect.Constructor source: shared objects file
[0.012s][info][class,load] jdk.internal.vm.ContinuationScope source: shared objects file
[0.012s][info][class,load] jdk.internal.vm.Continuation source: shared objects file
[0.012s][info][class,load] jdk.internal.vm.StackChunk source: shared objects file
[0.012s][info][class,load] jdk.internal.vm.VMSupport source: shared objects file
[0.012s][info][class,load] jdk.internal.reflect.MagicAccessorImpl source: shared objects file
[0.012s][info][class,load] jdk.internal.reflect.MethodAccessor source: shared objects file
[0.012s][info][class,load] jdk.internal.reflect.MethodAccessorImpl source: shared objects file
[0.012s][info][class,load] jdk.internal.reflect.DelegatingClassLoader source: shared objects file
[0.012s][info][class,load] jdk.internal.reflect.ConstantPool source: shared objects file
[0.012s][info][class,load] java.lang.annotation.Annotation source: shared objects file
[0.012s][info][class,load] jdk.internal.reflect.CallerSensitive source: shared objects file
[0.012s][info][class,load] jdk.internal.reflect.ConstructorAccessor source: shared objects file
[0.012s][info][class,load] jdk.internal.reflect.ConstructorAccessorImpl source: shared objects file
[0.012s][info][class,load] jdk.internal.reflect.DirectConstructorHandleAccessor$NativeAccessor source: shared objects file
[0.012s][info][class,load] jdk.internal.reflect.SerializationConstructorAccessorImpl source: shared objects file
[0.012s][info][class,load] java.lang.invoke.MethodHandle source: shared objects file
[0.012s][info][class,load] java.lang.invoke.DirectMethodHandle source: shared objects file
[0.012s][info][class,load] java.lang.invoke.VarHandle source: shared objects file
[0.012s][info][class,load] java.lang.invoke.MemberName source: shared objects file
[0.012s][info][class,load] java.lang.invoke.ResolvedMethodName source: shared objects file
[0.012s][info][class,load] java.lang.invoke.MethodHandleNatives source: shared objects file
[0.012s][info][class,load] java.lang.invoke.LambdaForm source: shared objects file
[0.012s][info][class,load] java.lang.invoke.TypeDescriptor$OfMethod source: shared objects file
[0.012s][info][class,load] java.lang.invoke.MethodType source: shared objects file
[0.012s][info][class,load] java.lang.BootstrapMethodError source: shared objects file
[0.012s][info][class,load] java.lang.invoke.CallSite source: shared objects file
[0.012s][info][class,load] jdk.internal.foreign.abi.NativeEntryPoint source: shared objects file
[0.012s][info][class,load] jdk.internal.foreign.abi.ABIDescriptor source: shared objects file
[0.012s][info][class,load] jdk.internal.foreign.abi.VMStorage source: shared objects file
[0.013s][info][class,load] jdk.internal.foreign.abi.UpcallLinker$CallRegs source: shared objects file

@dholmes-ora
Copy link
Member

It may be in the noise but noise adds up over time.

It just seems to me that the simplest fix here would have been to convert

Klass* vmSupport = SystemDictionary::resolve_or_fail(vmSymbols::jdk_internal_vm_VMSupport(), true, THREAD);
guarantee(!HAS_PENDING_EXCEPTION, "");

to

Klass* vmSupport = SystemDictionary::resolve_or_fail(vmSymbols::jdk_internal_vm_VMSupport(), true, CHECK);

and just return on exception. A very isolated change with zero impact on anything else.

@dougxc
Copy link
Member Author

dougxc commented Jun 27, 2023

I've pushed the loading of VMSupport down into the 2 cases where it's actually needed and made it lazy again which should address all concerns about extra noise at VM startup. Thanks for the push back on this - I agree that it's the best solution.

@tkrodriguez
Copy link
Contributor

I don't think pushing it down that far improves things. encode always precedes decode so why not resolve it before the encode call.

      // Resolve VMSupport class explicitly as HotSpotJVMCI::compute_offsets
      // may not have been called.
      Klass* vmSupport = SystemDictionary::resolve_or_fail(vmSymbols::jdk_internal_vm_VMSupport(), true, THREAD);
      int res = 0;
      if (!HAS_PENDING_EXCEPTION) {
        res = encode(THREAD, vmSupport, buffer, buffer_size);
      }

return;
}
int res = encode(THREAD, vmSupport, buffer, buffer_size);
int res = encode(THREAD, buffer, buffer_size);
if (_from_env != nullptr && !_from_env->is_hotspot() && _from_env->has_pending_exception()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this check before the HAS_PENDING_EXCEPTION check? Couldn't you end up returning with a pending exception in this path?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If encode is SharedLibraryToHotSpotExceptionTranslation::encode there is no possibility of a pending HotSpot exception upon it returning. If it's HotSpotToSharedLibraryExceptionTranslation::encode then _from_env->is_hotspot() is true and so this if branch is not taken.

@dougxc
Copy link
Member Author

dougxc commented Jun 28, 2023

I don't think pushing it down that far improves things. encode always precedes decode so why not resolve it before the encode call.

Because the VMSupport klass is only needed if calling into HotSpot so it's better to push it down into HotSpotToSharedLibraryExceptionTranslation::encode. Also, if the VMSupport klass is used for encoding, it's not needed for decoding (the libgraal VMSupport jclass value is used instead).

@tkrodriguez
Copy link
Contributor

Right, I was forgetting about the virtual nature of this code. Why can't the pending exception handling be in the respective virtual? The partition between virtual and shared is kind of ugly.

@dougxc
Copy link
Member Author

dougxc commented Jun 28, 2023

Why can't the pending exception handling be in the respective virtual?

Done: 9236128

Copy link
Contributor

@tkrodriguez tkrodriguez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's more clear to me. Thanks!

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now isolated to JVMCI code that I'm not familiar with, so don't feel competent to review. The general idea seems okay but I'm not familiar enough with the details. Someone from the compiler team should review this now.

Thanks.

@dougxc
Copy link
Member Author

dougxc commented Jun 29, 2023

Someone from the compiler team should review this now.

@vnkozlov could you please review this or nominate someone else from the compiler team to look at it. Thanks.

@vnkozlov
Copy link
Contributor

I am fins with idea of changes. But, please, activate GHA testing for this branch.
And there is build error on Windows:

c:\workspace\open\src\hotspot\share\jvmci\jvmciEnv.cpp(449): error C2220: the following warning is treated as an error
c:\workspace\open\src\hotspot\share\jvmci\jvmciEnv.cpp(449): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data

@dougxc
Copy link
Member Author

dougxc commented Jun 30, 2023

I have fixed the warning on Windows: 5bb3b52

But, please, activate GHA testing for this branch.

Isn't GHA a strict subset of or equal to tier1 mach5 testing? If so, what's the point of doing redundant testing?

@vnkozlov
Copy link
Contributor

But, please, activate GHA testing for this branch.

Isn't GHA a strict subset of or equal to tier1 mach5 testing? If so, what's the point of doing redundant testing?

It builds and tests configurations (32-bit) we don't have in our testing.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good.

@dougxc
Copy link
Member Author

dougxc commented Jun 30, 2023

But, please, activate GHA testing for this branch.

Isn't GHA a strict subset of or equal to tier1 mach5 testing? If so, what's the point of doing redundant testing?

It builds and tests configurations (32-bit) we don't have in our testing.

Good to know - thanks!

@dougxc
Copy link
Member Author

dougxc commented Jun 30, 2023

Thanks for the reviews.

/integrate

@openjdk
Copy link

openjdk bot commented Jun 30, 2023

Going to push as commit f6bdccb.
Since your change was applied there have been 13 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 30, 2023
@openjdk openjdk bot closed this Jun 30, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 30, 2023
@openjdk
Copy link

openjdk bot commented Jun 30, 2023

@dougxc Pushed as commit f6bdccb.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
graal graal-dev@openjdk.org hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated serviceability serviceability-dev@openjdk.org
Development

Successfully merging this pull request may close these issues.

4 participants