8318706: Implement JEP 423: Region Pinning for G1 #16342
Conversation
👋 Welcome back tschatzl! A progress list of the required criteria for merging this PR into `master` will be added to the body of your pull request.
`/label add hotspot-gc`
@tschatzl |
The JEP covers the idea very well, so I'm only covering some implementation details here:
* Regions get a "pin count" (a reference count). As long as it is non-zero, we conservatively never reclaim that region, even if there are no references in there: JNI code might have references to it.
* The JNI spec only requires us to provide pinning support for typeArrays, nothing else. This implementation uses this in various ways:
  * When evacuating from a pinned region, we evacuate everything live but the typeArrays, to get more empty regions to clean up later.
  * When formatting dead space within pinned regions we use filler objects. Pinned regions may be referenced by JNI code only, so we can't overwrite the contents of any dead typeArray either.
    Fortunately, these dead but referenced typeArrays have the same header size as our filler objects, so we can use their headers for our fillers. The problem is that previously there was a restriction that filler objects are at most half a region in size, so we could end up needing to place a filler object header inside a typeArray.
    The code could be clever and handle this situation by splitting the area to be filled so that this can't happen, but the solution taken here is to allow filler arrays to cover a whole region. They are not referenced by Java code anyway, so there is no harm in doing so (i.e. GC code never touches them anyway).
* G1 currently only ever actually evacuates young pinned regions. Old pinned regions of any kind are never put into the collection set and are automatically skipped. However, assuming that pinning is short-lived, we put them into the candidates when we can.
* There is the problem that if an application pins a region for a long time, G1 will skip evacuating that region over and over. That may lead to issues with the current policy on marking regions (only exit the mixed phase when there are no marking candidates) and simply waste processing time (when the candidate stays in the retained candidates).
  The cop-out chosen here is to "age out" the regions from the candidates and wait until the next marking happens.
  I.e. pinned marking candidates are immediately moved to the retained candidates, and if in total the region has been pinned for `G1NumCollectionsKeepUnreclaimable` collections it is dropped from the candidates. Its current value is fairly random.
* G1 pauses got a new tag if there were pinned regions in the collection set. I.e. in addition to something like
  `GC(6) Pause Young (Normal) (Evacuation Failure) 1M->1M(22M) 36.16ms`
  there is the new tag `(Pinned)` that indicates that one or more pinned regions were encountered during GC. E.g.
  `GC(6) Pause Young (Normal) (Pinned) (Evacuation Failure) 1M->1M(22M) 36.16ms`
  The `Pinned` and `Evacuation Failure` tags are not exclusive: GC might have encountered both pinned regions and evacuation-failed regions in the same collection, or even in the same region.
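The per-region "pin count" described in the first bullet can be sketched as a plain atomic reference count. This is a minimal illustration with assumed names (`Region`, `pin`, `unpin`), not HotSpot's actual classes:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Sketch of a per-region pin count: a reference count bumped while JNI
// code holds a critical section on an object in the region.
class Region {
  std::atomic<size_t> _pin_count{0};

public:
  void pin() { _pin_count.fetch_add(1, std::memory_order_relaxed); }

  void unpin() {
    size_t old_count = _pin_count.fetch_sub(1, std::memory_order_relaxed);
    assert(old_count > 0 && "unbalanced unpin");
    (void)old_count;
  }

  // While non-zero, the collector conservatively never reclaims or moves
  // this region, even if it finds no live references into it.
  bool is_pinned() const {
    return _pin_count.load(std::memory_order_relaxed) > 0;
  }
};
```

Because it is a count rather than a flag, nested or overlapping JNI critical sections on objects in the same region compose naturally: the region stays pinned until the last one is released.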
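The filler-size change in the bullet on dead-space formatting can be illustrated numerically. The constants below are made up for the sketch, not the JVM's real region or filler limits:

```cpp
#include <cstddef>
#include <vector>

// Illustrative constant only; real values come from the JVM.
constexpr size_t kRegionWords = 1024;

// Split a gap of dead space into filler-object sizes given a maximum
// filler size. With the old cap of half a region, a region-sized gap
// needed two fillers, and the second filler's header could land inside
// a dead (but still JNI-referenced) typeArray. Raising the cap to a
// whole region lets a single filler cover the entire gap.
std::vector<size_t> filler_sizes(size_t gap_words, size_t max_filler_words) {
  std::vector<size_t> sizes;
  while (gap_words > 0) {
    size_t chunk = gap_words < max_filler_words ? gap_words : max_filler_words;
    sizes.push_back(chunk);
    gap_words -= chunk;
  }
  return sizes;
}
```

With `max_filler_words` at half a region, `filler_sizes(kRegionWords, kRegionWords / 2)` yields two fillers; with the new whole-region cap it yields one, so no filler header ever needs to be placed inside a dead typeArray.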
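The "age out" policy from the bullet on long-pinned regions can be sketched as a small state machine. Names and bookkeeping here are simplified and illustrative; the threshold mirrors the `G1NumCollectionsKeepUnreclaimable` flag mentioned above (later renamed `G1NumCollectionsKeepPinned`):

```cpp
// Sketch of "aging out" long-pinned regions from the candidate sets.
// Simplified, illustrative bookkeeping only.
constexpr unsigned kNumCollectionsKeepPinned = 8;

enum class CandidateState { Marking, Retained, Dropped };

struct Candidate {
  CandidateState state = CandidateState::Marking;
  unsigned pinned_collections = 0;  // total collections spent pinned

  // Called once per collection while this region is a candidate.
  void on_collection(bool currently_pinned) {
    if (state == CandidateState::Dropped || !currently_pinned) {
      // Dropped regions wait for the next marking; unpinned candidates
      // are evacuated normally.
      return;
    }
    // A pinned marking candidate is immediately moved to the retained set.
    if (state == CandidateState::Marking) {
      state = CandidateState::Retained;
    }
    // Pinned for too many collections in total: drop it from the
    // candidates until the next marking rebuilds them.
    if (++pinned_collections >= kNumCollectionsKeepPinned) {
      state = CandidateState::Dropped;
    }
  }
};
```

The point of the policy is to stop retrying hopeless candidates: a region pinned across many collections costs processing time every pause, so it is cheaper to forget it and let the next marking cycle rediscover it if it becomes reclaimable.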
whitespace fixes
…/HeapRegion.java so that resourcehogs/serviceability/sa/ClhsdbRegionDetailsScanOopsForG1.java does not fail
The new TestPinnedOldObjectsEvacuation.java test isn't stable yet; otherwise this passes tier1-8. No perf changes. I'm opening this PR for review even so: this is not a blocker for review, and I will fix it later.
Webrevs
Had a discussion with @albertnetymk and we came to the following agreement about naming: "allocation failure" means that allocation failed in the to-space due to memory exhaustion. I will apply this new naming asap.
… evacuation failure and types of it:
* evacuation failure is the general concept. It includes
  * pinned regions
  * allocation failure

One region can both be pinned and experience an allocation failure. G1 GC messages now use the tags "(Pinned)" and "(Allocation Failure)" instead of "(Evacuation Failure)". Did not rename the G1EvacFailureInjector since this adds a lot of noise.
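The tag taxonomy above can be illustrated with a tiny helper that composes the pause log line. `pause_message` and its parameters are made up for this sketch, not G1's actual logging code:

```cpp
#include <string>

// Compose a pause log line from the two independent evacuation-failure
// causes. Both tags may appear together, since one collection (or even
// one region) can see pinned regions and allocation failures.
std::string pause_message(bool saw_pinned, bool saw_alloc_failure) {
  std::string msg = "Pause Young (Normal)";
  if (saw_pinned)        msg += " (Pinned)";
  if (saw_alloc_failure) msg += " (Allocation Failure)";
  return msg;
}
```

For example, a pause that saw both kinds of regions would log something like `Pause Young (Normal) (Pinned) (Allocation Failure)`.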
Done. Tier1 seems to pass, will redo upper tiers again.
…oung/old generation.
```
          "retained region restore purposes.")                              \
          range(1, 256)                                                     \
                                                                            \
  product(uint, G1NumCollectionsKeepPinned, 8, DIAGNOSTIC,                  \
```
Any particular reason this is not EXPERIMENTAL?
Changing this does not in any way enable risky/experimental code not fit for production. This knob is for helping diagnose performance issues.
G1 does have its fair share of experimental options, but all/most of these were from the initial import where G1 as a whole had been experimental (unstable) for some time.
This flag is conceptually related (or similar) to G1RetainRegionLiveThresholdPercent, which is experimental, so I thought they should be in the same category.
@tschatzl This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be shown above. You can use pull request commands such as `/summary`, `/contributor` and `/issue` to adjust it as needed. At the time when this comment was updated there had been 57 new commits pushed to the `master` branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the `/integrate` command for further details. ➡️ To integrate this PR with the above commit message to the `master` branch, type `/integrate` in a new comment.
walulyai
left a comment
LGTM!
Nits:
@tschatzl this pull request can not be integrated into `master` due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands:

```
git checkout submit/8318706-implementation-of-region-pinning-in-g1
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
```
…Evacuation Failure" with a cause description (either "Allocation" or "Pinned")
kstefanj
left a comment
Looks good. Just a few small things.
- fix counting of pinned/allocation failed regions in log
- some cleanup of evacuation failure code, removing unnecessary members
- comments
Thanks @albertnetymk @kstefanj @walulyai for your reviews! Given that the JEP is now targeted, I will integrate. This has been a fairly long journey until today... :)

/integrate
Going to push as commit 38cfb22.
Your commit was automatically rebased without conflicts.
@tschatzl thanks for your excellent work! I know that JEP 423 targets JDK 22, but I wonder if this can be backported to JDK 21. For big data workloads like Apache Spark, we do see that G1 generally performs better than other GC algorithms, but one major issue is that Spark heavily uses JNI for compression/decompression (e.g. zstd-jni) and is thus prone to OOM. I have tested some internal Spark jobs that were easy to OOM on JDK 21 but work well on JDK 22; given that JDK 22 has reached EOL, I would appreciate it if this could be landed on JDK 21.
I am open to a better name for the `(Pinned)` tag.

Testing: tier1-8
Reviewing

Using git

Checkout this PR locally:
`$ git fetch https://git.openjdk.org/jdk.git pull/16342/head:pull/16342`
`$ git checkout pull/16342`

Update a local copy of the PR:
`$ git checkout pull/16342`
`$ git pull https://git.openjdk.org/jdk.git pull/16342/head`

Using Skara CLI tools

Checkout this PR locally:
`$ git pr checkout 16342`

View PR using the GUI difftool:
`$ git pr show -t 16342`

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16342.diff
Webrev
Link to Webrev Comment