8317809: Insertion of free code blobs into code cache can be very slow during class unloading #16759

@tschatzl tschatzl commented Nov 21, 2023

Insert code blobs in sorted order to exploit the finger optimization when adding them, making this procedure O(n) instead of O(n^2).
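To illustrate why sorted insertion helps, here is a minimal sketch of an address-ordered free list with a "finger" that remembers the last insertion point. It is a simplified model, not HotSpot's CodeHeap; `FreeBlock`, `FreeList`, `insert_all` and the step counter are hypothetical names introduced only for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of an address-ordered free list; not HotSpot's CodeHeap.
struct FreeBlock {
  std::size_t addr = 0;
  FreeBlock* next = nullptr;
};

class FreeList {
  FreeBlock* _head = nullptr;
  FreeBlock* _finger = nullptr;  // most recently inserted block
  std::size_t _steps = 0;        // list links walked while searching

public:
  void insert(FreeBlock* b) {
    // Resume at the finger when it precedes the new block,
    // otherwise restart the search at the head of the list.
    FreeBlock* prev = (_finger != nullptr && _finger->addr < b->addr) ? _finger : nullptr;
    FreeBlock* cur = (prev != nullptr) ? prev->next : _head;
    while (cur != nullptr && cur->addr < b->addr) {
      prev = cur;
      cur = cur->next;
      _steps++;
    }
    b->next = cur;
    if (prev == nullptr) { _head = b; } else { prev->next = b; }
    _finger = b;
  }

  std::size_t steps() const { return _steps; }

  bool sorted() const {
    for (FreeBlock* c = _head; c != nullptr && c->next != nullptr; c = c->next) {
      if (c->addr > c->next->addr) return false;
    }
    return true;
  }
};

struct InsertResult { std::size_t steps; bool sorted; };

// Inserts the given addresses in the given order and reports the search cost.
InsertResult insert_all(const std::vector<std::size_t>& addrs) {
  std::vector<FreeBlock> blocks(addrs.size());
  FreeList list;
  for (std::size_t i = 0; i < addrs.size(); i++) {
    blocks[i].addr = addrs[i];
    list.insert(&blocks[i]);
  }
  return InsertResult{list.steps(), list.sorted()};
}
```

In this model, inserting the addresses 0..7 in ascending order walks no links at all, while the order 1, 0, 3, 2, 5, 4, 7, 6 keeps falling back to the head and walks 15 links; the gap grows quadratically with the number of blocks.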

Introduces a globally available ClassUnloadingContext that contains common methods pertaining to class and code unloading. GCs may use it to efficiently manage unlinked class loader datas and nmethods to allow use of common methods (unlink/merge).

The typical steps are to register a CLD/nmethod that is to be unlinked, and then to purge its memory later. STW collectors perform this work in one big chunk, holding the CodeCache_lock for the entire duration, while concurrent collectors lock and unlock for every insertion so that concurrent users of the lock can make progress.
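The two locking strategies can be sketched as follows. This is only a model under stated assumptions: `CodeCacheModel`, `Blob` and the acquisition counter are hypothetical, and a plain `std::mutex` stands in for the CodeCache_lock.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-ins; none of these names are HotSpot's.
struct Blob { int id; };

class CodeCacheModel {
  std::mutex _lock;               // plays the role of CodeCache_lock
  std::vector<int> _free_list;    // ids of blobs that have been freed
  std::size_t _acquisitions = 0;  // how often the lock was taken

  // Caller must hold _lock.
  void add_to_free_list(const Blob& b) { _free_list.push_back(b.id); }

public:
  // STW style: take the lock once and free the whole batch.
  void free_all_stw(const std::vector<Blob>& unlinked) {
    std::lock_guard<std::mutex> guard(_lock);
    _acquisitions++;
    for (const Blob& b : unlinked) add_to_free_list(b);
  }

  // Concurrent style: lock and unlock around every insertion so that
  // other users of the lock can get in between two insertions.
  void free_all_concurrent(const std::vector<Blob>& unlinked) {
    for (const Blob& b : unlinked) {
      std::lock_guard<std::mutex> guard(_lock);
      _acquisitions++;
      add_to_free_list(b);
    }
  }

  std::size_t acquisitions() const { return _acquisitions; }
  std::size_t freed() const { return _free_list.size(); }
};
```

The STW variant pays for the lock once per batch; the concurrent variant pays once per blob in exchange for shorter hold times.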

Some care has been taken to stay consistent with an "unloading = unlinking + purge" scheme; however, the existing CLD handling API in particular (still) mixes unlinking and purging in its CLD::unload() call. This was kept to simplify this change, which is mostly geared towards separating nmethod unlinking from purging in order to make code blob freeing O(n) instead of O(n^2).

Upcoming changes will

  • separate nmethod unregistering from nmethod purging to allow doing that in bulk (for the STW collectors); that can significantly reduce code purging time for the STW collectors.
  • better name the second stage of unlinking (called "cleaning" throughout, e.g. the work done in G1CollectedHeap::complete_cleaning)
  • untangle CLD unlinking and what's called "cleaning" now to allow moving more stuff into the second unlinking stage for better parallelism
  • G1: move some significant tasks from the remark pause to concurrent (unregistering nmethods, freeing code blobs and cld/metaspace purging)
  • Maybe move Serial/Parallel GC metaspace purging closer to other unlinking/purging code to keep things local and allow easier logging.

Please also first look into the (small) PR this depends on.

The crash on linux-x86 is fixed by PR#16766 which I split out for quicker reviews.

Testing: tier1-7

Thanks,
Thomas


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8317809: Insertion of free code blobs into code cache can be very slow during class unloading (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16759/head:pull/16759
$ git checkout pull/16759

Update a local copy of the PR:
$ git checkout pull/16759
$ git pull https://git.openjdk.org/jdk.git pull/16759/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16759

View PR using the GUI difftool:
$ git pr show -t 16759

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16759.diff

Webrev

Link to Webrev Comment

Thomas Schatzl added 3 commits November 20, 2023 12:20

bridgekeeper bot commented Nov 21, 2023

👋 Welcome back tschatzl! A progress list of the required criteria for merging this PR into pr/16733 will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot changed the title 8317809 8317809: Insertion of free code blobs into code cache can be very slow during class unloading Nov 21, 2023

openjdk bot commented Nov 21, 2023

@tschatzl The following labels will be automatically applied to this pull request:

  • hotspot
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot hotspot-dev@openjdk.org shenandoah shenandoah-dev@openjdk.org labels Nov 21, 2023
@tschatzl

/label add hotspot-gc

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Nov 21, 2023

openjdk bot commented Nov 21, 2023

@tschatzl
The hotspot-gc label was successfully added.

@tschatzl tschatzl marked this pull request as ready for review November 21, 2023 16:03
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 21, 2023

mlbridge bot commented Nov 21, 2023

Webrevs

@tschatzl

I added some explanation of why there is a class hierarchy for ClassUnloadingContext in the description (and some further background).

@openjdk-notifier openjdk-notifier bot changed the base branch from pr/16733 to master November 22, 2023 17:21
@openjdk-notifier

The parent pull request that this pull request depends on has now been integrated and the target branch of this pull request has been updated. This means that changes from the dependent pull request can start to show up as belonging to this pull request, which may be confusing for reviewers. To remedy this situation, simply merge the latest changes from the new target branch into this pull request by running commands similar to these in the local repository for your personal fork:

git checkout submit/8317809-sorted-insertion-of-free-blobs
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# if there are conflicts, follow the instructions given by git merge
git commit -m "Merge master"
git push


openjdk bot commented Nov 22, 2023

@tschatzl this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout submit/8317809-sorted-insertion-of-free-blobs
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Nov 22, 2023
@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Nov 24, 2023
…loadingContext for now as unnecessary for this change, use iterators, other review comments
@openjdk openjdk bot removed the rfr Pull request is ready for review label Nov 30, 2023
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 30, 2023

@albertnetymk albertnetymk left a comment

Only one minor & subjective comment.

Comment on lines +68 to +69
void purge_nmethods();
void free_code_blobs();
Member

I feel this is exposing too much detail, especially when the adjacent API just combines them.


openjdk bot commented Dec 4, 2023

@tschatzl This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8317809: Insertion of free code blobs into code cache can be very slow during class unloading

Reviewed-by: iwalulya, ayang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 22 new commits pushed to the master branch:

  • 1cf7ef5: 8321273: Parallel: Remove unused UpdateOnlyClosure::_space_id
  • 517b178: 8306914: Implement JEP 458: Launch Multi-File Source-Code Programs
  • aec3865: 8320697: RISC-V: Small refactoring for runtime calls
  • 50d1839: 8318809: java/util/concurrent/ConcurrentLinkedQueue/WhiteBox.java shows intermittent failures on linux ppc64le and aarch64
  • 81484d8: 8320687: sun.jvmstat.monitor.MonitoredHost.getMonitoredHost() throws unexpected exceptions when invoked concurrently
  • 30b5d42: 8321069: JvmtiThreadState::state_for_while_locked() returns nullptr for an attached JNI thread with a java.lang.Thread object after JDK-8319935
  • bd04f91: 8321131: Console read line with zero out should zero out underlying buffer in JLine
  • 155abc5: 8311906: Improve robustness of String constructors with mutable array inputs
  • 316b783: 8321276: runtime/cds/appcds/dynamicArchive/DynamicSharedSymbols.java failed with "'17 2: jdk/test/lib/apps ' missing from stdout/stderr"
  • 65be5e0: 8305931: jdk/jfr/jcmd/TestJcmdDumpPathToGCRoots.java failed with "Expected chains but found none"
  • ... and 12 more: https://git.openjdk.org/jdk/compare/0d0a657414563a2211bcc3474aa7e4317307f98b...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 4, 2023

tschatzl commented Dec 4, 2023

Fwiw, to put this change in a bit more context: it is part of a series of changes to improve class unloading performance back to pre-jdk21 levels (and better).

The basic plan:

  • this change, JDK-8317809, that improves nmethod sorting/free list handling (and introduces the ClassUnloadingContext)

  • JDK-8317007 that allows bulk unregistering of nmethods instead of (slow) per-nmethod unregistering (also out for review)

With the above two changes, Remark pause time should be no higher than it was before the removal of the code root sweeper (lots of changes that improved the time taken by various parts of class/code unloading have already gone in).
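As a generic illustration of why bulk unregistering tends to beat per-item unregistering, consider the sketch below. It is not how JDK-8317007 is implemented; the registry is modeled as a plain vector of ids, and `unregister_one` and `unregister_bulk` are hypothetical helpers.

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Per-item removal: every call scans the registry again, so removing
// m entries from n registered ones costs O(n * m) overall.
void unregister_one(std::vector<int>& registry, int id) {
  auto it = std::find(registry.begin(), registry.end(), id);
  if (it != registry.end()) registry.erase(it);
}

// Bulk removal: build a lookup set once, then filter the registry in a
// single pass, which costs O(n + m) on average.
void unregister_bulk(std::vector<int>& registry, const std::vector<int>& dead) {
  std::unordered_set<int> dead_set(dead.begin(), dead.end());
  registry.erase(std::remove_if(registry.begin(), registry.end(),
                                [&](int id) { return dead_set.count(id) != 0; }),
                 registry.end());
}
```

Both variants leave the same survivors in the same order; only the amount of repeated scanning differs.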

I am planning the following follow-ups in the next few months (after FC time will be spent on bugfixing, and holidays coming up):

  • (for G1) move out several parts of class unloading into the concurrent phase, at least this will include
    • bulk nmethod unregistering (JDK-8317007)
    • nmethod code blob freeing (this change)
    • metaspace unloading

Not necessarily in a single change; this basically halves G1 remark pause times again in my testing.

  • split up and parallelize ClassLoaderData unloading; currently, with this change, CLD->unload() is still called immediately when a CLD is registered, as before. However, this is wasteful, as most of that method can either be "obviously" parallelized or changed so that other tasks can run in parallel.
    So the plan is to split class unloading (SystemDictionary::do_unloading) into a part that only iterates over the CLD list to determine the dead ones, and a parallel part.
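A rough sketch of that split, assuming a simplified CLD model: one serial walk over the singly linked list only collects the dead entries, and the result is then divided into slices that worker threads could process independently. `Cld`, `collect_dead` and `split_for_workers` are hypothetical names, and the actual threading is left out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified ClassLoaderData stand-in.
struct Cld {
  int id;
  bool alive;
  Cld* next;
};

// Phase 1 (serial): walk the list once and only record which CLDs are dead.
std::vector<Cld*> collect_dead(Cld* head) {
  std::vector<Cld*> dead;
  for (Cld* c = head; c != nullptr; c = c->next) {
    if (!c->alive) dead.push_back(c);
  }
  return dead;
}

// Phase 2 (parallelizable): hand out the dead CLDs round-robin so that each
// worker gets its own slice to unload. Requires workers > 0.
std::vector<std::vector<Cld*>> split_for_workers(const std::vector<Cld*>& dead,
                                                 std::size_t workers) {
  std::vector<std::vector<Cld*>> slices(workers);
  for (std::size_t i = 0; i < dead.size(); i++) {
    slices[i % workers].push_back(dead[i]);
  }
  return slices;
}
```

The point of the design is that the serial part touches each list node only briefly, while the expensive per-CLD work ends up in slices that no longer depend on the list structure.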

There are no CR/PRs out for these latter two items, but hopefully this will, short of making everything concurrent, keep class/code unloading times low enough for some time.


tschatzl commented Dec 5, 2023

Thanks @albertnetymk @walulyai for your reviews
/integrate


openjdk bot commented Dec 5, 2023

Going to push as commit 30817b7.
Since your change was applied there have been 23 commits pushed to the master branch:

  • a56286f: 8321269: Require platforms to define DEFAULT_CACHE_LINE_SIZE
  • 1cf7ef5: 8321273: Parallel: Remove unused UpdateOnlyClosure::_space_id
  • 517b178: 8306914: Implement JEP 458: Launch Multi-File Source-Code Programs
  • aec3865: 8320697: RISC-V: Small refactoring for runtime calls
  • 50d1839: 8318809: java/util/concurrent/ConcurrentLinkedQueue/WhiteBox.java shows intermittent failures on linux ppc64le and aarch64
  • 81484d8: 8320687: sun.jvmstat.monitor.MonitoredHost.getMonitoredHost() throws unexpected exceptions when invoked concurrently
  • 30b5d42: 8321069: JvmtiThreadState::state_for_while_locked() returns nullptr for an attached JNI thread with a java.lang.Thread object after JDK-8319935
  • bd04f91: 8321131: Console read line with zero out should zero out underlying buffer in JLine
  • 155abc5: 8311906: Improve robustness of String constructors with mutable array inputs
  • 316b783: 8321276: runtime/cds/appcds/dynamicArchive/DynamicSharedSymbols.java failed with "'17 2: jdk/test/lib/apps ' missing from stdout/stderr"
  • ... and 13 more: https://git.openjdk.org/jdk/compare/0d0a657414563a2211bcc3474aa7e4317307f98b...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Dec 5, 2023
@openjdk openjdk bot closed this Dec 5, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Dec 5, 2023

openjdk bot commented Dec 5, 2023

@tschatzl Pushed as commit 30817b7.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@tschatzl tschatzl deleted the submit/8317809-sorted-insertion-of-free-blobs branch December 5, 2023 10:38
assert(cld->is_unloading(), "invariant");
cld->classes_do(f);
}
}
Contributor

I don't understand why CLDG specific methods were moved here. They should be unaware of nmethod purging, and these 4 methods don't have any nmethod purging in them either and are specific to the CLDG implementation.

Contributor Author

The idea is to have the GC take control of how the unloading CLDs are stored, i.e. which data structure it uses to manage them, to ultimately allow more control over class unloading for parallelization.

This on the one hand makes pauses shorter (for STW collectors), and on the other hand decreases the time the CLDG_lock is held (I am not sure it is nice that the concurrent collectors currently may hold that one for ~100ms in my test...).

I believe having the linked list of unloading CLDs embedded in the CLDs for use by the GC not only seems wrong (i.e. it is a GC data structure located in runtime code) but is also very limiting (there has to be one list for all, as a fixed singly linked list).

This change moves the knowledge of how unloading CLDs are managed into the GC area; runtime code just tells the GC that a particular CLD is unloading.
(Currently the ClassUnloadingContext also calls the unload method during registration to keep the current functionality, but the plan is to separate registration from the actual unloading to allow custom handling of the second part; the registering, although it still walks a singly linked list, is comparatively fast.)

These four methods provide a thin abstraction over the CLDs that are unloading (that runtime doesn't need and should not worry about imo).
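The shape of such a thin abstraction might look like the sketch below: the runtime side only reports that a CLD is unloading, while the container and the iteration over it live on the GC side. This is a model with hypothetical names (`Cld`, `UnloadingContextModel`, `register_unloading`, `unloading_do`), not the actual ClassUnloadingContext interface.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical, heavily simplified ClassLoaderData stand-in.
struct Cld {
  int id;
  bool unloading;
};

class UnloadingContextModel {
  // Storage is an implementation detail of the GC side; a growable array is
  // used here instead of a list embedded in the CLDs themselves.
  std::vector<Cld*> _unloading;

public:
  // Runtime code only reports that a particular CLD is unloading.
  void register_unloading(Cld* cld) {
    cld->unloading = true;
    _unloading.push_back(cld);
  }

  // Thin iteration abstraction over the unloading CLDs.
  void unloading_do(const std::function<void(Cld*)>& f) const {
    for (Cld* cld : _unloading) f(cld);
  }

  std::size_t count() const { return _unloading.size(); }
};
```

Because callers only see `register_unloading` and `unloading_do`, the backing container could later be swapped, for example for per-worker arrays, without touching runtime code.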

With that in place it is possible to slice the actual unloading work into phases according to dependencies (depending on the GC, if desired), potentially overlapping them with existing phases in collectors that already allow that (e.g. the parallel code unloading, but that is only an implementation detail to reduce the overall number of parallel phases), or even moving some of that work to some other time (the CLD::unload() method unfortunately currently may do some memory freeing too).

However most time is spent in notifying various components which can be parallelized (at least parallelize the different types of notifications).

Labels
hotspot hotspot-dev@openjdk.org hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated shenandoah shenandoah-dev@openjdk.org
5 participants