Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8279949: JavaThread::_free_handle_block leaks native memory #7453

Closed
wants to merge 1 commit into from

Conversation

lmesnik
Copy link
Member

@lmesnik lmesnik commented Feb 12, 2022

Please review following fix which delete whole list of JNIHandle blocks in JNIHandleBlock::release_block(...).
Also, I added sanity verification of _pop_frame_link to ensure that there are no leaks there.

Fix verified with tier1-6. Also, verified that memory leak is not reproduced anymore.

Thanks to Vladimir I. for finding exact root cause of problem.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8279949: JavaThread::_free_handle_block leaks native memory

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7453/head:pull/7453
$ git checkout pull/7453

Update a local copy of the PR:
$ git checkout pull/7453
$ git pull https://git.openjdk.java.net/jdk pull/7453/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 7453

View PR using the GUI difftool:
$ git pr show -t 7453

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7453.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 12, 2022

👋 Welcome back lmesnik! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 12, 2022
@openjdk
Copy link

openjdk bot commented Feb 12, 2022

@lmesnik The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Feb 12, 2022
@mlbridge
Copy link

mlbridge bot commented Feb 12, 2022

Webrevs

@dholmes-ora
Copy link
Member

Hi Leonid,

I can't help but think the real problem here is that some code is not managing the active handles correctly. AFAICS we should not end up with a chain of handle blocks when a thread exits! The JVMTI push_frame/pop_frame logic looks a bit messy in that regard - it isn't clear if these are allowed to be unbalanced.

Cheers,
David

@dholmes-ora
Copy link
Member

Correction JNI pushLocalFrame/popLocalFrame, not JVM TI.

@dholmes-ora
Copy link
Member

On further examination it seems we can't guarantee balanced use so cleaning up at thread exit does seem the right approach.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry Leonid,

I think we need to understand exactly how we end up with a chain of blocks at thread exit - see comment below.

David

assert(block->pop_frame_link() == NULL, "pop_frame_link should be NULL");
delete block;
block = next;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the comment below about the pop_frame_link, I'm concerned that it may be possible that the loop above has deleted the block that pop_frame_link refers to. AFAICS we should only have a chain above if we have unbalanced pushLocalFrame/popLocalFrame, but that is the case the following code is trying to deal with too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the bug report indicates the problem is with a chain of blocks in the free-list - something which again seems to indicate unbalanced use. But the fix doesn't just change the free-list handling, it also changes the active-list handling.

@iwanowww
Copy link

think we need to understand exactly how we end up with a chain of blocks at thread exit

My understanding is JNIHandleBlock is allocated on every upcall from VM to Java code (it's part of JavaCallWrapper ctor). Since it is possible to have "nested" upcalls (Java -> VM -> Java -> VM), there are multiple thread-local blocks allocated and they eventually end up on the free list.

@dholmes-ora
Copy link
Member

@iwanowww Right for the free-list case this is expected if there are nested upcalls. So that is normal - my mistake for thinking it needed unbalanced use. So happy to see that free-list get recursively deleted.

But still concerned there may be some unexpected interaction with pop_frame_link when the active-list is cleared.

Simplest solution would be to introduce a new method that clears recursively and use that for the free-list case. Still leaves unanswered questions about the active-list, but wouldn't change any behaviour there.

@lmesnik
Copy link
Member Author

lmesnik commented Feb 13, 2022

I agree that JNIHandleBlocks are a little bit unclear and might need some explanation.

The active_list is organized as list of lists of JNIHandleBlocks:
block->_next->_next....
|
_pop_frame_link
|
block->_next->_next...

so the only fist block could have non-NULL _pop_frame_link and refer to the first block in the list. See class JNIHandleBlock

  // The following instance variables are only used by the first block in a chain.
  // Having two types of blocks complicates the code and the space overhead in negligible.
  JNIHandleBlock* _last;                        // Last block in use
  JNIHandleBlock* _pop_frame_link;              // Block to restore on PopLocalFrame call
  uintptr_t*      _free_list;                   // Handle free list

The function release_block() deletes (or return to _free_handle_block) the list block->_next->_next.. and recursively calls itself for next frame.

The JNIHandleBlock referred with _next never might be referred with _pop_frame_link and cloud be safely deleted. My assertion just check that only first block refer to other frame.

It might be simpler to have
JNIHandleBlockFrame which is list of JNIHandleBlock and also have a pointer _pop_frame_link to other chain to avoid mess with _next and _pop_frame_link pointers.

@dholmes-ora
Copy link
Member

Leonid has pointed out how the bug was introduced by JDK-8276658. Previously release_block would either add a chain of blocks to the thread-local freelist, or the global freelist. After JDK-8276658 it would either add a chain of blocks to the thread-local freelist or delete the first block in the chain - hence the current bug.

IIUC now, _pop_frame_link if non-null, points to a completely distinct chain of blocks that was saved as part of pushLocalFrame and so can never point to a block which has just been deleted.

@openjdk
Copy link

openjdk bot commented Feb 14, 2022

@lmesnik This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8279949: JavaThread::_free_handle_block leaks native memory

Reviewed-by: dholmes, coleenp

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 16 new commits pushed to the master branch:

  • 9d0a4c3: 8274238: Inconsistent type for young_list_target_length()
  • 2604a88: 8281585: Remove unused imports under test/lib and jtreg/gc
  • 534e557: 8256368: Avoid repeated upcalls into Java to re-resolve MH/VH linkers/invokers
  • 95f198b: 8274980: Improve adhoc build version strings
  • c61d629: 8281553: Ensure we only require liveness from mach-nodes with barriers
  • 2597206: 8280783: Parallel: Refactor PSCardTable::scavenge_contents_parallel
  • 2632d40: 8281637: Remove unused VerifyOption_G1UseNextMarking
  • 46f5229: 8281539: IGV: schedule approximation computes immediate dominators wrongly
  • 1ef45c5: 8280799: С2: assert(false) failed: cyclic dependency prevents range check elimination
  • 483d4b9: 8281505: Add CompileCommand PrintIdealPhase
  • ... and 6 more: https://git.openjdk.java.net/jdk/compare/6fdfe0458df989a7946b4f52a3023e8a39fb3bbb...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 14, 2022
Copy link
Contributor

@coleenp coleenp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Thank you for finding and fixing this!

Atomic::dec(&_blocks_allocated);
delete block;
} else {
DEBUG_ONLY(block->set_pop_frame_link(NULL));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this also be an assert that the pop_frame_link is null ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a first block which can have non-NULL set_pop_frame_link. So I set it to NULL so assert in the while loop doesn't fail.

@lmesnik
Copy link
Member Author

lmesnik commented Feb 15, 2022

/integrate

@openjdk
Copy link

openjdk bot commented Feb 15, 2022

Going to push as commit 1aff44b.
Since your change was applied there have been 34 commits pushed to the master branch:

  • 394ce5f: 8280825: Modules that "provide" ToolProvider should document the name that can be used
  • 745f7e7: 8281186: runtime/cds/appcds/DumpingWithNoCoops.java fails
  • 1870465: 8281744: x86: Use short jumps in TIG::set_vtos_entry_points
  • 2fe0bf6: 8281748: runtime/logging/RedefineClasses.java failed "assert(addr != __null) failed: invariant"
  • bc61484: 8280136: Serial: Remove unnecessary use of ExpandHeap_lock
  • 2112a9d: 8246033: bin/print_config.js script uses nashorn jjs tool
  • 1c12b15: 8281741: [testbug] PrintIdealPhaseTest fails with -Xcomp
  • f82866b: 8281555: [macos] Get rid of deprecated Style Masks constants
  • 8819f45: 8281722: Removal of PrintIdealLevel
  • 622970e: 8281728: Redundant null check in LineNumberInputStream.read
  • ... and 24 more: https://git.openjdk.java.net/jdk/compare/6fdfe0458df989a7946b4f52a3023e8a39fb3bbb...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Feb 15, 2022
@openjdk openjdk bot closed this Feb 15, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 15, 2022
@openjdk
Copy link

openjdk bot commented Feb 15, 2022

@lmesnik Pushed as commit 1aff44b.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
4 participants