-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8318586: Explicitly handle upcall stub allocation failure #16311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Welcome back jvernee! A progress list of the required criteria for merging this PR into |
|
@JornVernee The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
|
/solves JDK-8318653 |
|
@JornVernee |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it pretty weird to terminate the VM if we cannot allocate upcall stub. Does this mean the user code could actually terminate the VM on this fatal? Unit test suggests so.
Can the VM code actually handle things without upcall stub present, if e.g. memory is exhausted?
src/hotspot/share/code/codeBlob.cpp
Outdated
| { | ||
| MutexLocker mu(CodeCache_lock, Mutex::_no_safepoint_check_flag); | ||
| blob = new (size) UpcallStub(name, cb, size, receiver, frame_data_offset); | ||
| if (blob == nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be safer to call into fatal without having CodeCache_lock held. Move it out of MutexLocker scope?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern follows what is done in RuntimeStub::new_runtime_stub, which also calls fatal under the lock.
I agree it's probably better to call outside of the lock (and that is something I noticed in the original change for RuntimeStub as well). I'm happy to fix it for both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is okay to handle RuntimeStub in a separate (and more cleanly backportable) PR. Let's make the new code do the right thing from the start.
I think the question is whether the user can do anything reasonable if the allocation fails. Upcall stubs are allocated as a result of a call to But it sounds like you're saying that plain user code should never result in a VM error (if we can help it). That is something I agree with. We'd have to throw some exception from |
Yes, exactly. Granted, there are resource exhaustion situations where the overall progress could be sluggish (e.g. if we near the Java heap OOME), but we don't usually elevate that to globally shutting down the JVM, unless user explicitly requests this, e.g. with I think the situation for It is not the problem with this concrete PR, which I think is fine, but it exposes the larger, more important architectural question. |
FWIW, we use RuntimeStub for downcall stubs allocated by the linker. So there is also an unbounded number of those. (Well technically bounded by the 255 argument limit * all argument type combinations, but that number is very large).
Ok, I'll discuss with the others in the FFM team. I think if we turn this failure into an OOME (leaning on that side at the moment), then we also need a spec change + CSR. I'll wait with this PR until we reach some conclusion. |
I think we can proceed with this PR. The explicit failure is still better than a failure somewhere downstream. That is, this PR does not change the failure mode substantially, right? If something else is doable, like throwing the actual exception, we can replace this |
That's right. |
|
@JornVernee This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 11 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
|
Is stub allocation the same as other VM C heap exhaustion cases? They will terminate the VM too. Otherwise it would be better report an error and have the Java code thrown OOME. |
I think it depends on the particular case. e.g. I think an allocation failure in the particular case of the FFM Linker allocating stubs, is something that we can reasonably report back to the user. Looking at this again, I realize that we're also allocating a I agree that bubbling up the allocation failures as OOME would be better. |
|
I agree that avoiding a VM fatal error is preferable, like a recent change to make JVMCI RuntimeStub creation failure result in a BailoutException instead of a fatal error. |
|
I've uploaded another version that throws a OOME when the allocation of a downcall or upcall stub fails. (on x64 only for now, I'll look at the other platforms as well). Let me know if that seems better. |
| * @throws IllegalArgumentException if {@code !address.isNative()}, or if {@code address.equals(MemorySegment.NULL)}. | ||
| * @throws IllegalArgumentException if an invalid combination of linker options is given. | ||
| * @throws IllegalCallerException If the caller is in a module that does not have native access enabled. | ||
| * @throws OutOfMemoryError if the runtime does not have the memory needed to create the downcall handle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestions for the phrasing here are welcome. I think we should use something that works for both downcall handles and upcall stubs though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OOME is pretty much understood to be possible anywhere, given it is a VirtualMachineError. We often do not document it explicitly. The risk with documenting it is that it gives the impression that other methods, which don't document it, can never throw it. A rough grep for @throws OutOfMemoryError reveals only 15 classes in java.base that explicitly document this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking inspiration from other methods that throw this exception, maybe something like this might work:
if the downcall method handle cannot be allocated by the Linker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I also agree with @dholmes-ora - e.g. I'm not sure how much OOME is important to document here, since it reflects an internal state of the JVM, rather than something the client can do something about.
E.g. if you create an allocator with SegmentAllocator::slicingAllocator, at some point you are going to run out of space in the underlying segment, so it makes sense to report failures (and to document why that happens). But in this case the documentation is going to be very vague, and I don't think it provides a lot of value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I figured it was similar to Unsafe::allocateMemory, which also documents to OOME. But then again, the user is not directly interested in memory in this case.
I"ll remove these @throws tags then
|
I've also removed the test that tries to trigger an OOME when allocating downcall stubs. It seems not really possible to isolate that particular code path (unless a direct whitebox API is added maybe, but that also kinda defeats the purpose of testing), leading to a flaky test. I've left the test for upcall stubs, as that seems to work well enough (but, might need to drop that as well). |
|
Okay, I have (finally) updated all the other platforms. Please take another look. Thanks. |
mcimadamore
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest changes look good
|
@JornVernee this pull request can not be integrated into git checkout UpcallStubAllocFailure
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push |
|
/integrate |
|
Going to push as commit e96e191.
Your commit was automatically rebased without conflicts. |
|
@JornVernee Pushed as commit e96e191. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Explicitly handle UpcallStub allocation failures by terminating. We currently might try to use the returned
nullptrwhich would fail sooner or later. This patch just makes the termination explicit.Progress
Issues
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16311/head:pull/16311$ git checkout pull/16311Update a local copy of the PR:
$ git checkout pull/16311$ git pull https://git.openjdk.org/jdk.git pull/16311/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 16311View PR using the GUI difftool:
$ git pr show -t 16311Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16311.diff
Webrev
Link to Webrev Comment