8335356: Shenandoah: Improve concurrent cleanup locking #20086
pengxiaolong wants to merge 19 commits into openjdk:master from
👋 Welcome back xpeng! A progress list of the required criteria for merging this PR into
@pengxiaolong This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 24 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project, an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@ysramakrishna, @shipilev), but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type
@pengxiaolong The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.
ysramakrishna left a comment:
Could you share any visible differences between the three schemes (and the current baseline) on, say, SPECjbb or similar? Ideally, this affects some user-visible score or latency that we can use as a goodness metric. I am a bit leery of why exactly 30 µs, and not, say, 100 µs. Also, I suspect that a straight count might perform just as well, and the time-based solution seems almost overengineered to me; at least, I'd like to see evidence that the engineering effort is worth the resulting bang in a service-level metric such as latency or throughput.
I suggested the time-based approach to Xiaolong to side-step the discussion about a "reasonable" batch size. A good batch size would vary across machines, heap sizes, and region counts. Since we are doing this whole dance to avoid hoarding the lock for a long time, which increases tail latencies for allocators waiting on the same lock, it is also more reasonable to track the time directly here. Not to mention that fastdebug builds zap the unused heap, which makes cleanup orders of magnitude slower, so large batch sizes would hoard the lock far too long and deviate from the "normal" release behavior. The time-based approach accommodates this as well.
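A minimal C++ sketch of the time-budgeted locking discussed above, for illustration only. `Region` and `recycle_with_time_budget` are hypothetical names, and `std::mutex` plus `std::chrono::steady_clock` stand in for HotSpot's heap lock and timing code. This is not the PR's actual implementation. On each lock acquisition, the function recycles regions until the budget (for example, the 30 µs mentioned above) runs out. It then releases the lock so that waiting allocators can proceed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a heap region (not HotSpot's ShenandoahHeapRegion).
struct Region {
  bool trash = false;
  void recycle() { trash = false; }  // placeholder for the real cleanup work
};

// Recycles all regions in `trash`, holding `heap_lock` for roughly at most
// `max_hold` per acquisition. The deadline is checked after each region, so
// every acquisition recycles at least one region and may overshoot slightly.
// Returns the number of times the lock was acquired.
std::size_t recycle_with_time_budget(const std::vector<Region*>& trash,
                                     std::mutex& heap_lock,
                                     std::chrono::nanoseconds max_hold) {
  assert(max_hold.count() > 0);
  using Clock = std::chrono::steady_clock;
  std::size_t acquisitions = 0;
  std::size_t next = 0;
  while (next < trash.size()) {
    std::lock_guard<std::mutex> guard(heap_lock);
    ++acquisitions;
    const auto deadline = Clock::now() + max_hold;
    do {
      trash[next++]->recycle();
    } while (next < trash.size() && Clock::now() < deadline);
    // `guard` is destroyed at the end of this iteration, releasing the lock
    // so that waiting allocators have a chance to take it between batches.
  }
  return acquisitions;
}
```

The deadline is checked only between regions. As a result, a slow region (for example, in a fastdebug build that zaps memory) can overshoot the budget, but the lock is never held for many regions beyond it. Also note that `std::mutex` makes no fairness guarantee, so the releasing thread may immediately win the lock again.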
Found a simple workload that demonstrates the impact on maximum latencies on the allocation path.
Impressive, and a nice demonstration of the improvements! Benchmarking with HyperAlloc may also be useful, and even plain SPECjbb may show some non-linear improvements, who knows? It may be worth measuring. Running the count-based and time-based versions on (slow, fast) × (arm, x86) systems to fill the matrix would be great, but may be more effort than it's worth; I'm just putting it out there. Good data on actual measured improvements always makes me happy, though! :-) Thanks for the extra effort in collecting the data and sharing it. Reviewed and approved, thank you!
Thanks a lot @shipilev @ysramakrishna! I'll attach more benchmark results if I get any.
/integrate
@pengxiaolong |
Based on Aleksey's benchmark, I wrote a very simple benchmark that records its metrics in an HdrHistogram. I ran it with a command like the one below to generate the HdrHistogram metrics (hardware: AWS EC2 r7g.4xlarge):
Here is the HdrHistogram:
/sponsor |
Going to push as commit b32e4a6.
Your commit was automatically rebased without conflicts.
@shipilev @pengxiaolong Pushed as commit b32e4a6. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Hi all,
This PR improves how ShenandoahFreeSet::recycle_trash uses the heap lock. The original observation described in the bug was most likely caused by an uncommitted (later reverted) change I made while Aleksey and I were working on JDK-8331411. Even though that change was never committed, the way ShenandoahFreeSet::recycle_trash uses the heap lock is still inefficient, and we think it should be improved.
With the logs added in commit 5688ee2, I collected some key metrics. The average time to acquire the heap lock is about 450 to 900 ns, and the average time to recycle one trash region is about 600 ns (without batching it is 1000+ ns, which might be related to branch prediction). The current implementation takes the heap lock once per trash region. With 1000 regions to recycle, roughly 0.45 to 0.9 ms is therefore spent just acquiring the heap lock.
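To make the arithmetic concrete, here is a small hypothetical helper (not part of the PR) that models only the lock-acquisition cost. It ignores contention and the recycling work itself. With per-region locking, the cost is N × t_lock. With batches of size B, it is ceil(N / B) × t_lock.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cost model: total nanoseconds spent acquiring the lock when
// recycling `regions` regions in batches of `batch_size`, given an average
// acquisition cost `acquire_ns`. batch_size == 1 models per-region locking.
std::uint64_t lock_overhead_ns(std::uint64_t regions, std::uint64_t batch_size,
                               std::uint64_t acquire_ns) {
  assert(batch_size > 0);
  // Number of lock acquisitions is ceil(regions / batch_size).
  const std::uint64_t acquisitions = (regions + batch_size - 1) / batch_size;
  return acquisitions * acquire_ns;
}
```

For example, `lock_overhead_ns(1000, 1, 450)` is 450000 ns (0.45 ms). By contrast, `lock_overhead_ns(1000, 128, 900)` is only 7200 ns, because ceil(1000 / 128) = 8 acquisitions.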
This PR splits the recycling process into two steps: (1) collect all the trash regions; (2) recycle the collected trash regions in batches. I see several benefits that improve performance:
1. Less time spent acquiring the heap lock, and less contention with mutators/allocators.
2. Simpler loops for filtering and batch recycling, which presumably benefit CPU branch prediction.
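The two-step structure can be sketched as follows. This is a hypothetical C++ illustration, not HotSpot code. `Region`, `collect_trash`, and `recycle_in_batches` are made-up names, and `std::mutex` stands in for the heap lock. The batch boundary here is a fixed count (like the batch-size-128 variant measured below) rather than the time budget the PR settled on.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical region type for illustration only.
struct Region {
  bool trash = false;
  void recycle() { trash = false; }  // placeholder for the real cleanup work
};

// Step 1: collect pointers to all trash regions in one simple pass.
std::vector<Region*> collect_trash(std::vector<Region>& regions) {
  std::vector<Region*> trash;
  for (Region& r : regions) {
    if (r.trash) trash.push_back(&r);
  }
  return trash;
}

// Step 2: recycle the collected regions in fixed-size batches, taking the
// lock once per batch instead of once per region. Returns the number of
// lock acquisitions.
std::size_t recycle_in_batches(const std::vector<Region*>& trash,
                               std::mutex& heap_lock, std::size_t batch_size) {
  assert(batch_size > 0);
  std::size_t acquisitions = 0;
  for (std::size_t start = 0; start < trash.size(); start += batch_size) {
    const std::size_t end = std::min(start + batch_size, trash.size());
    std::lock_guard<std::mutex> guard(heap_lock);  // released after each batch
    ++acquisitions;
    for (std::size_t i = start; i < end; ++i) trash[i]->recycle();
  }
  return acquisitions;
}
```

Both inner loops are short and branch-light, which is the property the second point above relies on. For example, with 500 trash regions and a batch size of 128, the lock is taken 4 times instead of 500.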
Here are some results from running the h2 benchmark:
TIP with debug logging: average time per region 2312 ns
Optimized, but without the batching optimization (all trash recycled under a single lock acquisition): average time per region 560 ns
Batch size of 128: average time per region 533 ns
Batches holding the lock for up to 30 µs (PR version): average time per region 1118 ns
We decided on time-bounded batching for the following reasons:
Additional testing:
make clean test TEST=hotspot_gc_shenandoah
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20086/head:pull/20086
$ git checkout pull/20086
Update a local copy of the PR:
$ git checkout pull/20086
$ git pull https://git.openjdk.org/jdk.git pull/20086/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 20086
View PR using the GUI difftool:
$ git pr show -t 20086
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20086.diff