Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8297389: resexhausted003 fails with assert(!thread->owns_locks()) failed: must release all locks when leaving VM #11316

Closed
wants to merge 3 commits into from

Conversation

TobiHartmann
Copy link
Member

@TobiHartmann TobiHartmann commented Nov 23, 2022

Method::build_profiling_method_data acquires the MethodData_lock when initializing Method::_method_data to prevent multiple allocations by different threads. The problem is that when metaspace allocation fails and JvmtiExport::should_post_resource_exhausted() is set, we assert during the ThreadToNativeFromVM transition in JVMTI code.

Since concurrent initialization is a rare event, I suggest to get rid of the lock and perform the initialization with a cmpxchg, similar to how method counters are initialized:

if (!mh->init_method_counters(counters)) {
MetadataFactory::free_metadata(mh->method_holder()->class_loader_data(), counters);
}

Since current code in Method::set_method_data uses a Atomic::release_store, I added a OrderAccess::release().

Thanks,
Tobias


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8297389: resexhausted003 fails with assert(!thread->owns_locks()) failed: must release all locks when leaving VM

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/11316/head:pull/11316
$ git checkout pull/11316

Update a local copy of the PR:
$ git checkout pull/11316
$ git pull https://git.openjdk.org/jdk pull/11316/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 11316

View PR using the GUI difftool:
$ git pr show -t 11316

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11316.diff

…iled: must release all locks when leaving VM
@TobiHartmann
Copy link
Member Author

/label hotspot

@bridgekeeper
Copy link

bridgekeeper bot commented Nov 23, 2022

👋 Welcome back thartmann! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added rfr Pull request is ready for review hotspot hotspot-dev@openjdk.org labels Nov 23, 2022
@openjdk
Copy link

openjdk bot commented Nov 23, 2022

@TobiHartmann
The hotspot label was successfully added.

@mlbridge
Copy link

mlbridge bot commented Nov 23, 2022

Webrevs

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If MethodData::allocate cannot be called whilst holding a lock then perhaps we can assert that in there?

I think there is a broader problem here that metaspace allocation can occur in numerous places and any failure could post the resource-exhausted event and potentially lead to problems like this.

This fix side-steps one problematic call-site, but going lock-free has its own concerns.

// The store into method must be released. On platforms without
// total store order (TSO) the reference may become visible before
// the initialization of data otherwise.
OrderAccess::release();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A release here is not necessary as Atomic::replace_if_null will by default have memory_order_conservative which maintains the full bi-directional fence of the internal cmpxchg.

return; // return the exception (which is cleared)
}
ClassLoaderData* loader_data = method->method_holder()->class_loader_data();
MethodData* method_data = MethodData::allocate(loader_data, method, THREAD);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the downside of lock-free here is that we have to pre-allocate and then later free. How expensive is that? How likely are we to get multiple threads attempting this at the same time? We might trigger a resource-exhausted event unnecessarily due to the temporary use of metaspace.

@TobiHartmann
Copy link
Member Author

Thanks for the review, David!

I added an assert to MethodData::allocate and fixed the ciReplay code which only ever gets executed by a single thread by removing the lock. I also removed the OrderAccess::release().

Regarding the overhead of going lock-free: I temporarily added code that counts the number of times that multiple threads attempt initialization and asserts if it's more that 50x. This triggers in 25 out of 203.772 runs (tests from tier1 to tier3). The average size of these allocations is around 88 words (704 bytes). I therefore think it's fine to avoid a lock in this case. Also, we already use a lock-free mechanism for method::build_method_counters.

What do you think?

Copy link
Member

@reinrich reinrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking care of the issue. The fix looks ok to me. Seems like you've removed all uses of MethodData_lock so you could remove the declaration and definition too.

Thanks, Richard.

@openjdk
Copy link

openjdk bot commented Nov 28, 2022

@TobiHartmann This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8297389: resexhausted003 fails with assert(!thread->owns_locks()) failed: must release all locks when leaving VM

Reviewed-by: dholmes, rrich, dlong

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 151 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 28, 2022
@TobiHartmann
Copy link
Member Author

Thanks for the review, Richard. Good catch, I removed the MethodData_lock.

@dean-long
Copy link
Member

I added an assert to MethodData::allocate and fixed the ciReplay code which only ever gets executed by a single thread by removing the lock. I also removed the OrderAccess::release().

I'm not sure if it's guaranteed that replay runs single-threaded. I haven't looked into the details, but I think ReplayInline can run in any compiler thread after startup, and then of course there is JDK-8254110.

@dholmes-ora
Copy link
Member

I therefore think it's fine to avoid a lock in this case. Also, we already use a lock-free mechanism for Method::build_method_counters.

Method::build_method_counters doesn't appear to be lock-free, in the sense there are no atomic operations; rather concurrency just doesn't seem to be a concern with that code. ??

But as long as the overhead is low and interference unlikely, then we can see how this works out in practice. Thanks.

@TobiHartmann
Copy link
Member Author

I'm not sure if it's guaranteed that replay runs single-threaded. I haven't looked into the details, but I think ReplayInline can run in any compiler thread after startup, and then of course there is JDK-8254110.

But ReplayInline does not load ciMethodData, i.e., does not call process_ciMethodData, right?

The only way this code can get executed is through JNI_CreateJavaVM_inner -> ciReplay::replay -> ciReplay::replay_impl -> process-> process_command -> process_ciMethodData and that only ever happens single-threaded.

Of course, if we are ever going to implement JDK-8254110, we need to revisit that code, but in that case the assert that I added will trigger.

What do you think?

Method::build_method_counters doesn't appear to be lock-free, in the sense there are no atomic operations; rather concurrency just doesn't seem to be a concern with that code. ??

It uses the exact same mechanism that I now added to Method::build_profiling_method_data, including atomic operations to initialize _method_counters:

bool Method::init_method_counters(MethodCounters* counters) {
// Try to install a pointer to MethodCounters, return true on success.
return Atomic::replace_if_null(&_method_counters, counters);
}

And I verified that multiple threads attempt to initialize the counters by adding verification code to:

if (!mh->init_method_counters(counters)) {
MetadataFactory::free_metadata(mh->method_holder()->class_loader_data(), counters);
}

@dholmes-ora
Copy link
Member

It uses the exact same mechanism that I now added to Method::build_profiling_method_data, including atomic operations to initialize _method_counters

Sorry I somehow missed the init_method_counters logic. :(

@dean-long
Copy link
Member

Of course, if we are ever going to implement JDK-8254110, we need to revisit that code, but in that case the assert that I added will trigger.

What do you think?

OK, please add a TODO in JDK-8254110 explaining this so it doesn't get lost.

@TobiHartmann
Copy link
Member Author

OK, please add a TODO in JDK-8254110 explaining this so it doesn't get lost.

Thanks, done.

@TobiHartmann
Copy link
Member Author

@dholmes-ora, @dean-long are you okay with the latest version?

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes seem okay but I'm not that familiar with the actual code here.

Thanks.

@TobiHartmann
Copy link
Member Author

Thanks, David!

@TobiHartmann
Copy link
Member Author

Thanks, Dean!

@TobiHartmann
Copy link
Member Author

/integrate

@openjdk
Copy link

openjdk bot commented Dec 1, 2022

Going to push as commit 9f24a6f.
Since your change was applied there have been 152 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Dec 1, 2022
@openjdk openjdk bot closed this Dec 1, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Dec 1, 2022
@openjdk
Copy link

openjdk bot commented Dec 1, 2022

@TobiHartmann Pushed as commit 9f24a6f.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

4 participants