Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDK-8298084: Memory leak in Method::build_profiling_method_data #11526

Closed
wants to merge 6 commits into from

Conversation

jcking
Copy link
Contributor

@jcking jcking commented Dec 6, 2022

Resolve memory leak related to MethodData::_extra_data_lock in Method::build_profiling_method_data by calling MethodData::~MethodData().


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8298084: Memory leak in Method::build_profiling_method_data

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/11526/head:pull/11526
$ git checkout pull/11526

Update a local copy of the PR:
$ git checkout pull/11526
$ git pull https://git.openjdk.org/jdk pull/11526/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 11526

View PR using the GUI difftool:
$ git pr show -t 11526

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11526.diff

Signed-off-by: Justin King <jcking@google.com>
@bridgekeeper
Copy link

bridgekeeper bot commented Dec 6, 2022

👋 Welcome back jcking! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Dec 6, 2022

@jcking The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot hotspot-dev@openjdk.org rfr Pull request is ready for review labels Dec 6, 2022
@mlbridge
Copy link

mlbridge bot commented Dec 6, 2022

Webrevs

@dholmes-ora
Copy link
Member

I'm somewhat confused by the existing code. We have:

// Destroy MethodData
method_data()->~MethodData();

which doesn't actually delete the MethodData it just runs the destructor. So where is the MethodData object actually allocated, and when is that freed? I would expect the lock to disappear at the same time if it is embedded in the MethodData object.

@jcking
Copy link
Contributor Author

jcking commented Dec 6, 2022

I'm somewhat confused by the existing code. We have:

// Destroy MethodData
method_data()->~MethodData();

which doesn't actually delete the MethodData it just runs the destructor. So where is the MethodData object actually allocated, and when is that freed? I would expect the lock to disappear at the same time if it is embedded in the MethodData object.

Mutex used to be a value member (this change switches it to be a pointer), so ~MethodData actually deleted the lock and its associated data. However calling the destructor like that, and then accessing fields later is really sketchy and I am pretty sure undefined behavior to some extent.

Many of the other Metadata derived classes use deallocate_contents() to release other metadata objects and release_C_heap_structures() to cleanup non-metadata memory. So I was just attempting to be consistent.

86fdbc0 addressed the original leak and called the destructor directly, but a more recent change re-introduced another leak.

@jcking
Copy link
Contributor Author

jcking commented Dec 6, 2022

The Metadata derived classes also do not really use destructors at all. None of them are called directly under normal circumstances.

Copy link
Member

@TobiHartmann TobiHartmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR, I changed this code with JDK-8297389 and thereby accidentally introduced the leak. Thanks for fixing!

The fix looks good to me, but I'm not an expert in that area.

@TobiHartmann
Copy link
Member

/reviewers 2

@openjdk
Copy link

openjdk bot commented Dec 6, 2022

@jcking This change is no longer ready for integration - check the PR body for details.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 6, 2022
@openjdk
Copy link

openjdk bot commented Dec 6, 2022

@TobiHartmann
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Dec 6, 2022
@dholmes-ora
Copy link
Member

I'm struggling to see how JDK-8297389 affected this. ??

@dholmes-ora
Copy link
Member

Also see:

https://mail.openjdk.org/pipermail/hotspot-dev/2019-October/039783.html

The use of the Mutex* causes a performance issue.

@dholmes-ora
Copy link
Member

Reading:

https://mail.openjdk.org/pipermail/hotspot-dev/2019-October/039783.html

also refreshed me on the whole delete vs destructor issue.

Copy link
Contributor

@coleenp coleenp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct.

src/hotspot/share/oops/methodData.cpp Outdated Show resolved Hide resolved
@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 7, 2022
@jcking
Copy link
Contributor Author

jcking commented Dec 7, 2022

Also see:

https://mail.openjdk.org/pipermail/hotspot-dev/2019-October/039783.html

The use of the Mutex* causes a performance issue.

@dholmes-ora Should we just have MetadataFactory::free_metadata call the destructor for T right after deallocate_contents and immediately before deallocating? The compiler will be smart enough to omit the call for types that are trivially destructible. That would also solve this bug, preserve any performance, and ensure it cannot happen again easily. It is fairly easy to forget to call release_c_heap_structures.

@TobiHartmann
Copy link
Member

I'm struggling to see how JDK-8297389 affected this. ??

JDK-8297389 added the call to MetadataFactory::free_metadata in Method::build_profiling_method_data which does not free MethodData::_extra_data_lock. As Coleen explained: "Methods are attached to InstanceKlass so let InstanceKlass call release_C_heap_structures for them, except for this case that was added for MethodData here."

@jcking
Copy link
Contributor Author

jcking commented Dec 7, 2022

Also see:
https://mail.openjdk.org/pipermail/hotspot-dev/2019-October/039783.html
The use of the Mutex* causes a performance issue.

@dholmes-ora Should we just have MetadataFactory::free_metadata call the destructor for T right after deallocate_contents and immediately before deallocating? The compiler will be smart enough to omit the call for types that are trivially destructible. That would also solve this bug, preserve any performance, and ensure it cannot happen again easily. It is fairly easy to forget to call release_c_heap_structures.

This would also assume MetadataFactory::free_metadata is always called with the most derived class, since I believe Metadata objects don't have virtual destructors.

@dholmes-ora
Copy link
Member

I see how JDK-8297389 caused this now but am unclear how to truly fix it. The memory management aspects of all this are extremely convoluted. In the context of JDK-8297389 if it needs to free a MethodData that it allocated, then I would expect that to be a complete freeing - not sure why it doesn't call the destructor to handle things as that is what the destructor is for.

@jcking
Copy link
Contributor Author

jcking commented Dec 7, 2022

I see how JDK-8297389 caused this now but am unclear how to truly fix it. The memory management aspects of all this are extremely convoluted. In the context of JDK-8297389 if it needs to free a MethodData that it allocated, then I would expect that to be a complete freeing - not sure why it doesn't call the destructor to handle things as that is what the destructor is for.

IMO the destructor should just be called from MetadataFactory::free_metadata before returning the memory to the underlying arena. That way the destructor is the last thing called and not accessed again. Destructors mark the end of the objects lifetime.

@TobiHartmann
Copy link
Member

IMO the destructor should just be called from MetadataFactory::free_metadata before returning the memory to the underlying arena. That way the destructor is the last thing called and not accessed again. Destructors mark the end of the objects lifetime.

Yes, that does sound like the right thing to do.

@dholmes-ora
Copy link
Member

@dholmes-ora Is that what you were suggesting when you said "not sure why it doesn't call the destructor to handle things as that is what the destructor is for"?

Yes - though as I said the memory management here is so convoluted that it is hard for me to see in detail if that is correct.

@coleenp
Copy link
Contributor

coleenp commented Dec 7, 2022

No, do not make MetadataFactory::free_metadata call the destructor. It might break other metadata that is passed to it. It would require special consideration and redo the whole design of the thing. We really don't want metadata objects to have destructors because they shouldn't be normally deleted individually. Yes, the memory management is convoluted here.
The release_C_heap_structures() call is the right thing to do here.
I see that there's concern about performance of the pointer to Mutex. I measured this at one point but can't explain why it would make a difference.

@jcking jcking requested review from coleenp and TobiHartmann and removed request for TobiHartmann and coleenp December 8, 2022 05:01
@dholmes-ora
Copy link
Member

Sorry, the destructor has to, of course, be used in a safe manner - I didn't mean to imply otherwise. Given what free_metadata does with the md we can't call the destructor before calling it. Perhaps we can add a default parameter bool invoke_destructor = false to free_metadata, to then call the destructor at the right point (which would be before or after md->deallocate_contents).

@jcking
Copy link
Contributor Author

jcking commented Dec 8, 2022

Sorry, the destructor has to, of course, be used in a safe manner - I didn't mean to imply otherwise. Given what free_metadata does with the md we can't call the destructor before calling it. Perhaps we can add a default parameter bool invoke_destructor = false to free_metadata, to then call the destructor at the right point (which would be before or after md->deallocate_contents).

Yeah, that is what I was thinking. jcking@5f1aad3 is an example of this approach. Though it requires making Metadata hierarchy have virtual destructors to work as expected. Which should be fine, as it already has virtual functions. Happy to move that commit here if that would be the preferred approach.

@dholmes-ora
Copy link
Member

Yeah, that is what I was thinking. jcking@5f1aad3 is an example of this approach.

That is somewhat more elaborate/complex than I envisaged - and I'm not sure all those C++ features are usable in hotspot at present. Can we just add a specialized version of free_metadata for MethodData which takes the extra parameter?

…tructor should be invoked

Signed-off-by: Justin King <jcking@google.com>
@jcking
Copy link
Contributor Author

jcking commented Dec 8, 2022

Yeah, that is what I was thinking. jcking@5f1aad3 is an example of this approach.

That is somewhat more elaborate/complex than I envisaged - and I'm not sure all those C++ features are usable in hotspot at present. Can we just add a specialized version of free_metadata for MethodData which takes the extra parameter?

That works for me, though all those features should be C++11 and C++14. Added a default argument to MetadataFactory::free_metadata that controls whether the destructor is invoked. If is false by default.

Signed-off-by: Justin King <jcking@google.com>
Comment on lines +76 to +77
using U = typename RemoveCV<typename RemovePointer<T>::type>::type;
md->~U();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually thinking of a template specialization for MethodData, but I guess this works. Lets see what others think.

Copy link
Contributor

@coleenp coleenp Dec 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you can do this here. The preceding deallocate_contents() function will deallocate the metadata so this might crash. It's like calling the destructor after the delete operator.
Maybe before it the deallocate contents call?
Specializing for MethodData might work if you add code to tell InstanceKlass::deallocate_contents() not to call release_C_heap_structures on methods.

Edit: I'm wrong, it's the deallocate function call below does the deallocate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also don't you have to free this?
FailedSpeculation::free_failed_speculations(method_data()->get_failed_speculations_address());

Copy link
Member

@TobiHartmann TobiHartmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me (took me a bit to understand the template metaprogamming involved). A comment wouldn't hurt.

Signed-off-by: Justin King <jcking@google.com>
Signed-off-by: Justin King <jcking@google.com>
@jcking
Copy link
Contributor Author

jcking commented Dec 8, 2022

Looks reasonable to me (took me a bit to understand the template metaprogamming involved). A comment wouldn't hurt.

Fair, added some comments.

Copy link
Member

@TobiHartmann TobiHartmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Looks good to me.

Copy link
Contributor

@coleenp coleenp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the additional c++ code for the exceptional case of MethodData. The lifetime of Metadata and it will leak for class unloading. I think fixing this to be more rational for the general case would be better via the RFE that you submitted.

@@ -144,8 +144,6 @@ void Method::release_C_heap_structures() {
#if INCLUDE_JVMCI
FailedSpeculation::free_failed_speculations(method_data()->get_failed_speculations_address());
#endif
// Destroy MethodData
method_data()->~MethodData();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still need this here. This is for the class unloading path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this needs to remain else we are removing other destructor calls, not just adding back the one currently missing. But it raises the question for me as to why we have two distinct code paths to clean up the MethodData: one from Method::release_C_heap_structures and the other from Method::deallocate_contents? Aren't there code paths where both of those would be executed and we would then call the destructor twice?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just saw the other comments point to the comment in JDK-8298346 so no need to respond to this.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Dec 8, 2022
Copy link
Contributor

@coleenp coleenp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first commit made the most sense to me in the context of this code.

Comment on lines +76 to +77
using U = typename RemoveCV<typename RemovePointer<T>::type>::type;
md->~U();
Copy link
Contributor

@coleenp coleenp Dec 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you can do this here. The preceding deallocate_contents() function will deallocate the metadata so this might crash. It's like calling the destructor after the delete operator.
Maybe before it the deallocate contents call?
Specializing for MethodData might work if you add code to tell InstanceKlass::deallocate_contents() not to call release_C_heap_structures on methods.

Edit: I'm wrong, it's the deallocate function call below does the deallocate.

@coleenp
Copy link
Contributor

coleenp commented Dec 8, 2022

I can't find why we thought it was more performant to embed the mutex in MethodData.

Though it requires making Metadata hierarchy have virtual destructors to work as expected. Which should be fine, as it already has virtual functions.

Not everything that calls free_metadata is a Metadata object. Some are MetaspaceObj and don't have virtual pointers.

@coleenp
Copy link
Contributor

coleenp commented Dec 8, 2022

See: #11601 Was there a test for this?

I think I have an alternative and I posted how it's supposed to work in https://bugs.openjdk.org/browse/JDK-8298346.

@jcking
Copy link
Contributor Author

jcking commented Dec 9, 2022

I can't find why we thought it was more performant to embed the mutex in MethodData.

If I had to guess, its because it removes an allocation. Mutex currently performs strdup on the name. So creating a Mutex via new requires 2 allocations. Embedding it requires 1.

@jcking
Copy link
Contributor Author

jcking commented Dec 9, 2022

See: #11601 Was there a test for this?

Not directly, with the proposed approach (I left a comment as I was still a little confused), it might actually be possible. However it will be much easier to catch once I upstream LSan. Speaking of that, I filed https://bugs.openjdk.org/browse/JDK-8298445 as I have been meaning to.

Copy link
Member

@TobiHartmann TobiHartmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaving this to the runtime experts to review but the PR needs to be rebased for JDK 20 in any case.

@coleenp
Copy link
Contributor

coleenp commented Dec 9, 2022

I want to take over this fix and I will rebase it to JDK 20. Sorry for being so late to remembering how this code works. The LSan tool sounds great. I'll pay attention to that now.

@jcking
Copy link
Contributor Author

jcking commented Dec 9, 2022 via email

@jcking
Copy link
Contributor Author

jcking commented Dec 12, 2022

Withdrawing in favor of @coleenp pull requests.

@jcking jcking closed this Dec 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org rfr Pull request is ready for review
4 participants