-
Notifications
You must be signed in to change notification settings - Fork 5.8k
8272609: Add string deduplication support to SerialGC #5153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back ddong! A progress list of the required criteria for merging this PR into |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes mostly look okay.
But it's not obvious that string deduplication support is useful for SerialGC. The kinds of applications SerialGC is often used for don't really benefit from string deduplication. Can someone suggest a good use-case?
One possible benefit is that if Epsilon is the only collector not supporting string deduplication then we might be able to simplify the string deduplication tests, replacing the per-GC test descriptions with just one to require !Epsilon and otherwise use whatever the current GC happens to be.
…rs or destructors
In some serverless applications (such as AWS Lambda) scenarios, SerialGC is generally used as the default GC algorithm. If there are many string constructions in the application, I think string deduplication may be beneficial. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more tidying up. Sorry for not noticing a couple of these earlier.
#ifndef SHARE_GC_SERIAL_STRINGDEDUP_HPP | ||
#define SHARE_GC_SERIAL_STRINGDEDUP_HPP | ||
|
||
#include "gc/shared/stringdedup/stringDedup.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is nothing here that requires this header. Rather, it is needed in the inline.hpp file and should be moved there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
// Candidate selection policy for young during evacuation. | ||
// If to is young then age should be the new (survivor's) age. | ||
// if to is old then age should be the age of the copied from object. | ||
static inline bool is_candidate_from_evacuation(oop java_string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argument is poorly named. The value might not be, and indeed is unlikely to be, a String. Part of this function's job is to check whether the argument is a String.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed. Use obj as instead.
// Candidate if string is being evacuated from young to old but has not | ||
// reached the deduplication age threshold, i.e. has not previously been a | ||
// candidate during its life in the young generation. | ||
return SerialHeap::heap()->young_gen()->is_in_reserved(java_string) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This depends on implicit includes since there's nothing here to ensure the type of young_gen()
is in scope. I think it's coming from serialHeap.hpp, but that could be changed in the future to forward-declare the generation class rather than include the associated header. It might be better to have this function in a .cpp file (as it was in an earlier iteration) to limit header exposure in clients. This function isn't so performance critical, since before calling it we've already checked that deduplication is enabled and the argument is a String, so being inline isn't as important here as for is_candidate_from_evacuation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@kimbarrett |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
@D-D-H This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 4 new commits pushed to the
Please see this link for an up-to-date comparison between the source branch of this pull request and the As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@kimbarrett, @walulyai) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
@@ -601,6 +603,8 @@ void DefNewGeneration::collect(bool full, | |||
// Verify that the usage of keep_alive didn't copy any objects. | |||
assert(heap->no_allocs_since_save_marks(), "save marks have not been newly set."); | |||
|
|||
_string_dedup_requests.flush(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if placing _string_dedup_requests.flush();
before weak roots processing (WeakProcessor::weak_oops_do
) makes more senses. The documentation says flush
should be called when marking is done; for Serial young GC, it's equivalent to when evacuation is done, which is right after ref-processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That documentation is mistaken; that's a left-over from an earlier development version that I forgot to update. Without special additional care, the lifetime also needs to cover final-reference processing. I have a todo item to file a bug against that comment.
@@ -214,4 +216,5 @@ void MarkSweep::KeepAliveClosure::do_oop(narrowOop* p) { MarkSweep::KeepAliveClo | |||
void MarkSweep::initialize() { | |||
MarkSweep::_gc_timer = new (ResourceObj::C_HEAP, mtGC) STWGCTimer(); | |||
MarkSweep::_gc_tracer = new (ResourceObj::C_HEAP, mtGC) SerialOldTracer(); | |||
MarkSweep::_string_dedup_requests = new StringDedup::Requests(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me why _string_dedup_requests
is a pointer. Is it possible to define static StringDedup::Requests _string_dedup_requests;
directly? That way, Requests
doesn't need to use the inheritance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a non-local variable with a non-trivial destructor. See
https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2021-August/036412.html
@@ -112,6 +112,8 @@ void GenMarkSweep::invoke_at_safepoint(ReferenceProcessor* rp, bool clear_all_so | |||
|
|||
deallocate_stacks(); | |||
|
|||
MarkSweep::_string_dedup_requests->flush(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, flush
should be called inside mark_sweep_phase1
, right after ref-processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review and Kim's detailed explanation.
I think putting this invocation here or the position you suggest should have no effect on the result, so I tend not to make this modification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's desirable to narrow the request window (btw first add
and flush
), which offers more prompt string-dedup and memory release (in flush
). Additionally, it makes more sense to call flush
right after the complete live objects traversal (strong-marking + final-marking as part of ref-processing), since we will not visit any new objects then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer it where it is. Rather than putting something between phantom
reference processing an oopstorage-based weak processing, I would prefer to
merge those two. I don't remember if there's an RFE for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think flush
should be called sooner, e.g. inside mark_sweep_phase1
, which is not in conflict with merging phantom-processing and weak root processing. Anyway, this is subjective; feel free to integrate the current version.
@kimbarrett @walulyai @albertnetymk thanks for the review! |
/integrate |
/sponsor |
Going to push as commit e8a289e.
Your commit was automatically rebased without conflicts. |
Hi team,
Please help review this change to add string deduplication support to SerialGC.
Thanks,
Denghui
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5153/head:pull/5153
$ git checkout pull/5153
Update a local copy of the PR:
$ git checkout pull/5153
$ git pull https://git.openjdk.java.net/jdk pull/5153/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 5153
View PR using the GUI difftool:
$ git pr show -t 5153
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5153.diff