8290025: Remove the Sweeper #9741
/label add hotspot |
@fisk |
Webrevs
Looks good. Reviewed most code, but left JVMCI, JVMTI, and tests, for others to review.
@fisk This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 14 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
Thanks for the review, @stefank |
Hi @fisk, thanks for the amazing cleanup! We've had a bunch of issues with the code sweeper and AsyncGetCallTrace with trying to unwind from dead code blobs (or what looked like it at least). Have you run some stress tests for AsyncGetCallTrace and this change? It may be something we can look into to make sure it doesn't increase the chances of crashes, or even reduces the chances of crashes. CC @jbachorik @parttimenerd |
Hi Erik, The change breaks the ARM32 port as the nmethod entry barriers are not implemented there yet. We need a way to work without nmethod entry barriers for the ARM32 platform.
I would certainly appreciate your help in stress testing that. |
Is there a plan for the arm32 port to support nmethod entry barriers? I would really appreciate if the solution to this problem is that arm32 implements nmethod entry barriers. You only need to support the STW data and code patching variation, and it should do pretty much exactly what said AArch64 port does. Arm32 is the only platform I know of that doesn't support nmethod entry barriers, and IMO the JVM should be able to assume this is a feature that we can rely on. |
I would be interested in seeing what the graph looks like with my current proposed change. I don't know what state my prototype was in when you performed measurements, and the heuristics have been tweaked. Having said that, if half the code cache was filled up, it sounds like it's getting to a point where you do want to flush things that haven't been used in a long while, to avoid getting into the "red zone" of aggressive sweeping at 90%, in order to free up space for currently hot code. To me it looks like it is working kind of as expected. Was there a regression due to unloading perceivably cold code? You can get more information about the heuristic decisions with -Xlog:codecache with my change. |
@luhenry @fisk I'm going to stress test it with my jdk-profiling-tester on M1 and x86 over night (comparing it with the current master). |
Thank you! |
There are still 15 comments that refer to zombie. There is also some printing and some non-product flags that need an overhaul in what terminology to use, and with what meaning, in a world where the sweeper and zombie nmethods are gone. The second part is just an observation of future work, but it would be nice to have this patch at least deliver with no stale zombie comments.
Below is a CodeCache chart (numbers from -Xlog:codecache this time) on the current proposed change. App gets 250MB of CodeCache, and executes 23 independent benchmarks.
No. What is specific about my run is that it is a set of independent benchmarks (warmup, iteration, warmup, iteration), so it does not regress unless the CodeHeap is completely exhausted. The sweeper itself does not affect performance; I checked that. My concern is about user applications.
In my picture, we can see that methods are swept out when the CodeCache is 70% free. That is not what I would expect. Even a super-cold method can come back. We should not flush it when we are far from memory starvation.
I created a JBS record for this task: JDK-8291302 thank you! |
if (nmethod_needs_unregister) {
  Universe::heap()->unregister_nmethod(this);
}
flush_dependencies(/*delete_immediately*/true);
flush_dependencies is now only ever called with false
Good catch. I removed it, and some remove_dependent methods that were only used in the true version that is no longer called.
I fixed all mentions of zombie I could find. Notably, I left ZombieALot alone, as it has never created zombies, so the name was already wrong and misleading. It should arguably be called DeoptimizeALot or something... but that's obviously already taken. I'd prefer not to discuss its naming in this PR, as it was already wrong and there are enough things to consider as it is in here.
Okay. I put in a special ARM32 mode for you for now. It will work but it won't remove cold nmethods. I really hope we can remove the special mode soon and assume all platforms have nmethod entry barriers. I don't like having special modes. |
Initial review of a few code/ files. Will do more later.
I am not comfortable with the name unloading for this process because historically, for me, it is associated with class unloading and "unloading" the corresponding nmethods. cleaning is more similar to sweeping. But it could be only me.
src/hotspot/share/code/codeCache.cpp
Outdated
  return _cold_gc_count;
}

void CodeCache::on_allocation() {
How about CodeCache::gc_on_allocation()? Otherwise it is not clear what it does in places where it is called.
Fixed.
src/hotspot/share/ci/ciEnv.cpp
Outdated
// Notify code cache unloading that we are about to allocate, which may
// or may not require freeing up memory first.
Confusing comment. As I understand it, it could be: Check if memory should be freed before allocation.
Fixed.
// Calculate the number of GCs after which an nmethod is expected to have been
// used in order to not be classed as cold.
void CodeCache::update_cold_gc_count() {
  if (!MethodFlushing || !UseCodeCacheFlushing || NmethodSweepActivity == 0) {
Did you test these cases?
Yes.
@@ -1,5 +1,5 @@
 /*
Is this a Git hiccup with the file renaming?
Yes, haha.
Git automagically computes renaming based on how similar the files are. I just moved the CompilationLog class, and removed the file it came from, so it thinks it's a "rename". One of the pitfalls of git.
NoSafepointVerifier nsv; |
There is a window between releasing all previous locks and this point in which the nmethod could become dead. I don't see post_compiled_method() checking for that.
Also, I don't see such a NoSafepointVerifier in JVMCIRuntime::register_method().
The nmethod can only get unloaded at safepoint polls. There aren't any when unlocking. The reason the NSV isn't just crossing the entire scope, is because some locks like Compile_lock will block when we take the lock, just not when we unlock it.
Unfortunately I can't put a similar NSV in the JVMCI installer, because when you run JVMCI with native image, there is a setter that uses JNI, which will check for safepoints. However, that only happens if the nmethod failed to install, which doesn't really matter, but it does prevent the NSV from being in the same corresponding location there.
I am talking about a concurrent change of the nmethod's state. When you publish the nmethod (set Method::_code) and it is not locked, it could be marked non-entrant. It depends on how the OS scheduler runs or suspends this compiler thread.
But you are right, unloading only happens at a safepoint. So the code should still be there, and it is still alright to initialize the task's fields.
nmethod::post_compiled_method_load_event() already has a NoSafepointVerifier. Only the setting of the task's fields is not covered. Do you think we still need it here?
Thanks for reviewing this change! I'm using the term "code cache unloading", because that is what we have always called unloading of nmethods triggered by the GC. And now it is the GC that owns this completely, so code cache unloading is the way in which nmethods are freed. At least that was my reasoning. I'd like to split the "unloading" parts of the CodeCache to a separate file, but I decided it's better to do that in a separate patch, as this patch is large enough and I don't want to move around code as well in it. Hope this makes sense. |
Ah. Good find. I can add the null check. |
I just noticed that there is already a bug open: https://bugs.openjdk.org/browse/JDK-8292368 |
Created #9968 for the missing null check.
I haven't reviewed the whole PR (there were already enough reviews), but I like the sweeper removal and what I've seen looks good. Impressive how strongly the sweeper is interwoven with the rest of hotspot! I'll be glad to see it go away.
Please note that there's a little more merging required after recent changes.
I'd like to rerun it in our nightly testing and discuss with Thomas. Heuristics fine tuning could still be done as follow-up, but we should try to avoid major drawbacks.
I fixed the merge conflict. Please let me know when you feel ready on your end. |
I think the term "full GC" has led to some confusion. You mean that we now need a complete marking cycle before we can flush any unused nmethods, right? |
Yes that's right. By "full GC", I meant a GC that traverses live objects in the entire heap, as opposed to a subset of it. For me, whether that operation is performed STW, or concurrently doesn't make it more or less "full". But I realize that many people associate the term with a full [implied STW] GC. |
Thanks for the confirmation. Our CI found one issue in a test with small code cache which was already reported by Thomas above: One more thing: You are using
--- a/src/hotspot/cpu/ppc/gc/shared/barrierSetNMethod_ppc.cpp
+++ b/src/hotspot/cpu/ppc/gc/shared/barrierSetNMethod_ppc.cpp
@@ -122,7 +122,12 @@ void BarrierSetNMethod::disarm(nmethod* nm) {
 }

 void BarrierSetNMethod::arm(nmethod* nm, int arm_value) {
-  Unimplemented();
+  if (!supports_entry_barrier(nm)) {
+    return;
+  }
+
+  NativeNMethodBarrier* barrier = get_nmethod_barrier(nm);
+  barrier->release_set_guard_value(arm_value);
 }

 bool BarrierSetNMethod::is_armed(nmethod* nm) {
I added the missing PPC code, thank you. As far as adapters go, I don't believe we have ever removed any. We have relied on caching/sharing of adapters of the same signature to keep the footprint down. |
Thanks for adding it! Looks good from our side. Our CI didn't find any other problems and we can still look into handling of adapters after this change. |
/integrate |
Thank you Martin! |
Going to push as commit 054c23f.
Your commit was automatically rebased without conflicts. |
Some more information about the "Out of space in CodeCache for adapters" issue: The test was running with less than 240MB ReservedCodeCacheSize. The VM doesn't enable SegmentedCodeCache in this case. It seems the new implementation doesn't make sure we have enough space for adapters without SegmentedCodeCache, but we're not using this case in production. It may still be a good enhancement.
Hmm. Good observation. Without segmented code cache, these issues are indeed more prevalent by nature. I'm not sure what the use case is for running without segmented code cache. |
I guess <240MB ReservedCodeCacheSize is relevant for people who run tiny Java apps in small containers and try to minimize all memory sizes. Some of them might observe this regression. We could reserve more space for adapters or we could try to enable SegmentedCodeCache for smaller code cache sizes as well. |
Enabling SegmentedCodeCache for smaller code cache sizes sounds like a good idea. |
Moving to SegmentedCodeCache for all platforms results in a performance regression for small platforms, but I remember there were plans (@eastig?) to halve the (ReservedCodeCacheSize >= 240*M) limit. |
When the world was still young, the sweeper was built to unload bad smelling nmethods. While it went through various revisions, the GCs gained support for class unloading, and with it the need for the GCs to get rid of nmethods with a different unpleasant scent.
The two systems would now compete for unloading nmethods, and the responsibility of throwing away nmethods would blur. The sweeper was still good at throwing away nmethods faster as it only needs to scan stacks, and not do a full GC.
With the advent of Loom, the situation has gotten even worse. The stacks are now also in the Java heap. The sweeper is unable to throw away nmethods without the liveness analysis of a full GC, which also performs code cache unloading, but isn't allowed to actually delete nmethods due to races with the sweeper. In a way we have the worst of both worlds, where both the sweeper and GC are crippled, unable to unload nmethods without the help of the other. And there is a very large number of complicated races that the JVM needs to deal with, especially to keep concurrent code cache unloading from interfering with concurrent sweeping, and concurrent sweeping from interfering with the application.
The sweeper cycle exposes 2 flavours of nmethods that are "dead" to the system. So whenever nmethods are used, we have to know they are not dead. But we typically don't have the tools to really know they are not dead. For example, one might think grabbing the CodeCache_lock and using an iterator that only walks is_alive() nmethods would help make sure you don't get dead nmethods in your iterator. However, that is not the case, because the CodeCache_lock can't be held across the entire zombie transition due to "reasons" that are not trivial to actually change. Because of this, code has to deal with nmethods flipping around randomly to a dead state.
I propose to get out of this sad situation, by removing the sweeper. If we need a full GC anyway to remove nmethods, we might as well let the GC do everything. This removes the notion of is_zombie(), is_unloaded() and hence is_alive() from the JVM. It also removes the notion of the orthogonal but related nmethodLocker to keep nmethods around, without preventing them from dying. We instead throw away nmethods the way we throw away pretty much anything else in the unloading GC code:
This way, if you get a reference to an nmethod, it won't go away until the next safepoint poll, and will not flip around liveness due to concurrent transitions.
In the new model, we use nmethod entry barriers to keep track of the last time an nmethod was on-stack. This is then used to 1) prove that not_entrant nmethods that haven't been on-stack for an entire GC can be removed, and 2) heuristically remove nmethods that have never been called for N full GCs, where N is calculated based on code cache allocation rate, GC frequency, remaining free memory until "trouble", etc. Similar to metaspace, there is also some threshold GC trigger to start GC when the code cache is filling up, and nothing else is triggering full GCs. The threshold gets smaller as we approach a point of being uncomfortably close to code cache exhaustion. Past said point, we GC very aggressively, and you probably want a larger code cache.
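To make the two removal rules above concrete, here is a minimal toy sketch in C++. It is not HotSpot code: ToyNMethod, ToyCodeCache, on_entry and full_gc are hypothetical names invented for illustration, the entry barrier is modelled as a plain flag, and N is passed in as a fixed cold_gc_count rather than computed from allocation rate, GC frequency and free space.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Toy model only: none of these names are HotSpot identifiers.
struct ToyNMethod {
    bool not_entrant = false;      // no new calls may enter this compiled method
    bool seen_this_cycle = true;   // set by the modelled entry barrier; fresh code counts as used
    std::uint32_t idle_gcs = 0;    // completed full marking cycles without any recorded use
};

class ToyCodeCache {
public:
    // cold_gc_count stands in for N; here it is a fixed assumption.
    explicit ToyCodeCache(std::uint32_t cold_gc_count)
        : cold_gc_count_(cold_gc_count) {}

    // Install a new method and return its id.
    int add() {
        methods_[next_id_] = ToyNMethod{};
        return next_id_++;
    }

    // Stand-in for an armed entry barrier firing, or the method being found on a stack.
    void on_entry(int id) { methods_.at(id).seen_this_cycle = true; }

    void make_not_entrant(int id) { methods_.at(id).not_entrant = true; }

    bool contains(int id) const { return methods_.count(id) != 0; }

    // Stand-in for one complete marking cycle; returns how many methods were freed.
    std::size_t full_gc() {
        std::size_t freed = 0;
        for (auto it = methods_.begin(); it != methods_.end();) {
            ToyNMethod& nm = it->second;
            if (nm.seen_this_cycle) {
                nm.idle_gcs = 0;
            } else {
                ++nm.idle_gcs;
            }
            nm.seen_this_cycle = false;  // "re-arm" for the next cycle

            // Rule 1: not_entrant and unused for an entire GC.
            const bool provably_dead = nm.not_entrant && nm.idle_gcs >= 1;
            // Rule 2: unused for N full GCs, so heuristically cold.
            const bool cold = nm.idle_gcs >= cold_gc_count_;
            if (provably_dead || cold) {
                it = methods_.erase(it);
                ++freed;
            } else {
                ++it;
            }
        }
        return freed;
    }

private:
    std::uint32_t cold_gc_count_;
    int next_id_ = 0;
    std::map<int, ToyNMethod> methods_;
};
```

In this sketch, a not_entrant method goes away after the first cycle in which it was not seen, while an ordinary method survives until it has been idle for cold_gc_count consecutive cycles. The threshold-based GC trigger described above is deliberately left out.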
I have tested this in mach5 tier1-7, I have run through perf aurora with no regressions, and also run an "internal large application" to see how it scales, also with no regressions. Since testing tier1-7 a few small tweaks have been made so I am running some extra testing.
I have tried to be as compatible as possible with previous sweeping-related JVM flags, arguing that nothing in the flags implies whether the implementation is using a GC or a separate sweeper thread. However, I have obsoleted the UseCodeAging flag, as UseCodeCacheFlushing is the flag for deciding whether cold nmethods should be removed, and with the new mechanism for doing that, there is no need for the UseCodeAging flag as well.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9741/head:pull/9741
$ git checkout pull/9741
Update a local copy of the PR:
$ git checkout pull/9741
$ git pull https://git.openjdk.org/jdk pull/9741/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 9741
View PR using the GUI difftool:
$ git pr show -t 9741
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9741.diff