8320649: C2: Optimize scoped values #16966
No review yet, I just performed some quick testing.
The optimized build fails:
[2023-12-05T16:26:12,957Z] open/src/hotspot/share/opto/loopnode.cpp:4745: error: undefined reference to 'ScopedValueGetHitsInCacheNode::verify() const'
[2023-12-05T16:26:12,960Z] open/src/hotspot/share/opto/loopnode.cpp:4761: error: undefined reference to 'ScopedValueGetLoadFromCacheNode::verify() const'
[2023-12-05T16:26:12,964Z] open/src/hotspot/share/opto/loopnode.cpp:4908: error: undefined reference to 'ScopedValueGetHitsInCacheNode::verify() const'
[2023-12-05T16:26:12,967Z] open/src/hotspot/share/opto/loopnode.cpp:4911: error: undefined reference to 'ScopedValueGetLoadFromCacheNode::verify() const'
[2023-12-05T16:26:12,976Z] open/src/hotspot/share/opto/loopopts.cpp:3935: error: undefined reference to 'ScopedValueGetHitsInCacheNode::verify() const'
[2023-12-05T16:26:15,455Z] collect2: error: ld returned 1 exit status
compiler/c2/irTests/TestScopedValue.java fails with -Xcomp on Linux x64:
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f8de687d1b9, pid=270115, tid=270131
#
# JRE version: Java(TM) SE Runtime Environment (22.0) (fastdebug build 22-internal-2023-12-05-1616186.tobias.hartmann.jdk2)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 22-internal-2023-12-05-1616186.tobias.hartmann.jdk2, compiled mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x12931b9] PhaseIdealLoop::get_early_ctrl(Node*)+0x4c9
Current CompileTask:
C2:30390 8110 b 4 compiler.c2.irTests.TestScopedValue::testFastPath13 (28 bytes)
Stack: [0x00007f8dc4353000,0x00007f8dc4453000], sp=0x00007f8dc444d6c0, free space=1001k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x12931b9] PhaseIdealLoop::get_early_ctrl(Node*)+0x4c9 (loopnode.hpp:1139)
V [libjvm.so+0x1293d95] PhaseIdealLoop::set_subtree_ctrl(Node*, bool) [clone .part.0]+0x75 (loopnode.cpp:251)
V [libjvm.so+0x1293df6] PhaseIdealLoop::set_subtree_ctrl(Node*, bool) [clone .part.0]+0xd6 (node.hpp:399)
V [libjvm.so+0x1293df6] PhaseIdealLoop::set_subtree_ctrl(Node*, bool) [clone .part.0]+0xd6 (node.hpp:399)
V [libjvm.so+0x1293df6] PhaseIdealLoop::set_subtree_ctrl(Node*, bool) [clone .part.0]+0xd6 (node.hpp:399)
V [libjvm.so+0x1293df6] PhaseIdealLoop::set_subtree_ctrl(Node*, bool) [clone .part.0]+0xd6 (node.hpp:399)
V [libjvm.so+0x1293df6] PhaseIdealLoop::set_subtree_ctrl(Node*, bool) [clone .part.0]+0xd6 (node.hpp:399)
V [libjvm.so+0x1293df6] PhaseIdealLoop::set_subtree_ctrl(Node*, bool) [clone .part.0]+0xd6 (node.hpp:399)
V [libjvm.so+0x1293df6] PhaseIdealLoop::set_subtree_ctrl(Node*, bool) [clone .part.0]+0xd6 (node.hpp:399)
V [libjvm.so+0x12960e0] PhaseIdealLoop::test_and_load_from_cache(Node*, Node*, Node*, Node*, float, float, Node*, Node*&, Node*&, Node*&)+0x820 (loopnode.cpp:4900)
V [libjvm.so+0x1296b6a] PhaseIdealLoop::expand_get_from_sv_cache(ScopedValueGetHitsInCacheNode*)+0x82a (loopnode.cpp:4822)
V [libjvm.so+0x1297473] PhaseIdealLoop::expand_scoped_value_get_nodes()+0x243 (loopnode.cpp:4737)
V [libjvm.so+0x12a45ed] PhaseIdealLoop::build_and_optimize()+0xf0d (loopnode.cpp:4672)
V [libjvm.so+0x9f4ea2] PhaseIdealLoop::optimize(PhaseIterGVN&, LoopOptsMode)+0x432 (loopnode.hpp:1113)
V [libjvm.so+0x9ed945] Compile::optimize_loops(PhaseIterGVN&, LoopOptsMode)+0x75 (compile.cpp:2248)
V [libjvm.so+0x9f0253] Compile::Optimize()+0xfd3 (compile.cpp:2500)
V [libjvm.so+0x9f37e1] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1c21 (compile.cpp:860)
V [libjvm.so+0x83eca7] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1e7 (c2compiler.cpp:134)
V [libjvm.so+0x9ff17c] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x92c (compileBroker.cpp:2299)
V [libjvm.so+0x9ffe08] CompileBroker::compiler_thread_loop()+0x468 (compileBroker.cpp:1958)
V [libjvm.so+0xeb93bc] JavaThread::thread_main_inner()+0xcc (javaThread.cpp:720)
V [libjvm.so+0x17992c6] Thread::call_run()+0xb6 (thread.cpp:220)
V [libjvm.so+0x14a30f7] thread_native_entry(Thread*)+0x127 (os_linux.cpp:787)
compiler/c2/irTests/TestScopedValue.java fails with -ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation on Linux x64:
Failed IR Rules (1) of Methods (1)
----------------------------------
1) Method "public static void compiler.c2.irTests.TestScopedValue.testFastPath7()" - [Failed IR rules: 1]:
* @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={}, failOn={"_#C#CALL_OF_METHOD#_", "slowGet"}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
> Phase "PrintIdeal":
- failOn: Graph contains forbidden nodes:
* Constraint 1: "(\\d+(\\s){2}(Call.*Java.*)+(\\s){2}===.*slowGet )"
- Matched forbidden node:
* 501 CallStaticJava === 370 6 7 8 1 (648 1 1 1 1 1 ) [[ 502 503 504 ]] # Static java.lang.ScopedValue::slowGet
compiler/c2/irTests/TestScopedValue.java fails with -XX:TypeProfileLevel=222 on AArch64:
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/System/Volumes/Data/mesos/work_dir/slaves/0db9c48f-6638-40d0-9a4b-bd9cc7533eb8-S29331/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/e2db05c4-923c-4a63-923b-5f9870681cc5/runs/c74e986d-15f1-46dc-822b-a41d12c079e0/workspace/open/src/hotspot/share/opto/callGenerator.cpp:929), pid=44590, tid=26115
# Error: assert(in->Opcode() == Op_LoadP || in->Opcode() == Op_LoadN) failed
Current CompileTask:
C2:766 689 b 4 compiler.c2.irTests.TestScopedValue::testFastPath1 (30 bytes)
Stack: [0x00000001719ec000,0x0000000171bef000], sp=0x0000000171beb0c0, free space=2044k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.dylib+0x1130268] VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x564 (callGenerator.cpp:929)
V [libjvm.dylib+0x1130a88] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x0
V [libjvm.dylib+0x5618b0] print_error_for_unit_test(char const*, char const*, char*)+0x0
V [libjvm.dylib+0x396ea0] LateInlineScopedValueCallGenerator::process_result(GraphKit&)+0x2534
V [libjvm.dylib+0x38f8dc] CallGenerator::do_late_inline_helper()+0x660
V [libjvm.dylib+0x4cd2bc] Compile::inline_scoped_value_calls(PhaseIterGVN&)+0x570
V [libjvm.dylib+0x4c6944] Compile::Optimize()+0x210
V [libjvm.dylib+0x4c54bc] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1228
V [libjvm.dylib+0x38a590] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1e0
V [libjvm.dylib+0x4e2f48] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x854
V [libjvm.dylib+0x4e238c] CompileBroker::compiler_thread_loop()+0x348
V [libjvm.dylib+0x8bb170] JavaThread::thread_main_inner()+0x1dc
V [libjvm.dylib+0x1076548] Thread::call_run()+0xf4
V [libjvm.dylib+0xe39138] thread_native_entry(Thread*)+0x138
C [libsystem_pthread.dylib+0x726c] _pthread_start+0x94
compiler/c2/irTests/TestScopedValue.java fails with -XX:+UnlockDiagnosticVMOptions -XX:TieredStopAtLevel=3 -XX:+StressLoopInvariantCodeMotion -XX:+StressRangeCheckElimination -XX:+StressLinearScan on AArch64:
compiler.lib.ir_framework.shared.TestRunException: There was an error while invoking @Run method private void compiler.c2.irTests.TestScopedValue.testFastPath1Runner() throws java.lang.Exception
at compiler.lib.ir_framework.test.CustomRunTest.invokeTest(CustomRunTest.java:162)
at compiler.lib.ir_framework.test.CustomRunTest.run(CustomRunTest.java:87)
at compiler.lib.ir_framework.test.TestVM.runTests(TestVM.java:822)
at compiler.lib.ir_framework.test.TestVM.start(TestVM.java:249)
at compiler.lib.ir_framework.test.TestVM.main(TestVM.java:164)
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:118)
at java.base/java.lang.reflect.Method.invoke(Method.java:580)
at compiler.lib.ir_framework.test.CustomRunTest.invokeTest(CustomRunTest.java:159)
... 4 more
Caused by: java.lang.RuntimeException: should be compiled
at compiler.c2.irTests.TestScopedValue.testFastPath1Runner(TestScopedValue.java:87)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
... 6 more
compiler/c2/TestUnsignedByteCompare.java and compiler/codegen/TestSignedMultiplyLong.java fail intermittently with -Duse.JTREG_TEST_THREAD_FACTORY=Virtual -XX:-VerifyContinuations on Windows x64:
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (/System/Volumes/Data/mesos/work_dir/slaves/0db9c48f-6638-40d0-9a4b-bd9cc7533eb8-S29331/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/e2db05c4-923c-4a63-923b-5f9870681cc5/runs/c74e986d-15f1-46dc-822b-a41d12c079e0/workspace/open/src/hotspot/share/opto/compile.cpp:813), pid=29127, tid=26371
# assert(IncrementalInline || (_late_inlines.length() == 0 && !has_mh_late_inlines())) failed: incremental inlining is off
Current CompileTask:
C2:5797 3627 b java.lang.System$2::scopedValueCache (4 bytes)
Stack: [0x0000000171694000,0x0000000171897000], sp=0x0000000171894bc0, free space=2050k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.dylib+0x1130268] VMError::report_and_die(int, char const*, char const*, char*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x564 (compile.cpp:813)
V [libjvm.dylib+0x1130a88] VMError::report_and_die(Thread*, unsigned int, unsigned char*, void*, void*)+0x0
V [libjvm.dylib+0x5618b0] print_error_for_unit_test(char const*, char const*, char*)+0x0
V [libjvm.dylib+0x4c5794] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1500
V [libjvm.dylib+0x38a590] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1e0
V [libjvm.dylib+0x4e2f48] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x854
V [libjvm.dylib+0x4e238c] CompileBroker::compiler_thread_loop()+0x348
V [libjvm.dylib+0x8bb170] JavaThread::thread_main_inner()+0x1dc
V [libjvm.dylib+0x1076548] Thread::call_run()+0xf4
V [libjvm.dylib+0xe39138] thread_native_entry(Thread*)+0x138
C [libsystem_pthread.dylib+0x726c] _pthread_start+0x94
Just let me know if you need any more information.
Thanks for doing that. All issues should be fixed now (I couldn't reproduce the last one, so I'm not 100% sure about that one).
I'm not a C2 expert, so my high-level comments might not all make sense, but here goes.
A couple of answers:
Binding and get() are usually separated by a long way. It's a common pattern to use get() inside a loop when a ScopedValue is used to hold a capability object which is private within a library context.
Maybe I'm misunderstanding this question, but that's what the scoped value cache does.
That's a fair reaction.
Initially, I thought about delaying the inlining of The other thing about optimizing It felt to me that it would be fairly common for the slow path to not be needed and given the shape without the slow path is much easier to optimize, it was important to be able to expose early on if the slow path was there or not.
The thing about
Eliminating
I think my comments above cover that one.
Thanks @rwestrel, that helps. I have no objections to this change, but I don't understand C2 enough to do a deeper review.
@theRealAph I guess it boils down to whether the hash value can be treated as a compile-time constant, which seems possible because it's marked final.
It always has been in the tests I've done. One of the interesting challenges with this work has been to make sure scoped value performance doesn't regress. A great advantage of this PR is that a dedicated scoped value optimization helps to make such regressions less likely.
Tests all pass now.
I just studied the background and hope to look into this in the next few days.
Personal wishlist: can you add a case where this optimization enables vectorization? Or do your optimizations happen too late for that?
// }
// MyLong long2 = (MyLong)scopedValue.get();
// return long1.getValue() + long2.getValue();
// }
Are you still working on this?
No. I couldn't make the test work unfortunately, so I wasn't sure whether to leave the test commented out (in case someone revisits that later) or not.
Maybe have the body of the test put in, and the IR-rules commented out, with a follow-up RFE for investigation, if you think there is something one can do about it?
Nice work @rwestrel
I'm sending out a first batch of comments, more coming later.
@@ -105,7 +105,8 @@ Node* BarrierSetC2::store_at_resolved(C2Access& access, C2AccessValue& val) cons
assert(access.is_opt_access(), "either parse or opt access");
C2OptAccess& opt_access = static_cast<C2OptAccess&>(access);
Node* ctl = opt_access.ctl();
MergeMemNode* mm = opt_access.mem();
assert(opt_access.mem()->is_MergeMem(), "");
MergeMemNode* mm = opt_access.mem()->as_MergeMem();
Does as_MergeMem not assert is_MergeMem?
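For readers unfamiliar with the pattern behind this question, here is a minimal sketch of a checked downcast, assuming the `as_X()` accessor itself asserts `is_X()` before casting (which is how HotSpot's node class queries are understood to behave in debug builds). `NodeLike` and `MergeMemLike` are hypothetical stand-ins, not HotSpot classes.

```cpp
#include <cassert>

struct MergeMemLike;  // hypothetical subclass, defined below

// Hypothetical stand-in for a node base class.
struct NodeLike {
  virtual ~NodeLike() = default;
  virtual bool is_MergeMem() const { return false; }
  // Checked downcast: the class test is asserted inside the accessor,
  // so a caller does not need a separate assert of its own.
  MergeMemLike* as_MergeMem();
};

struct MergeMemLike : NodeLike {
  bool is_MergeMem() const override { return true; }
};

inline MergeMemLike* NodeLike::as_MergeMem() {
  assert(is_MergeMem() && "invalid node class");
  return static_cast<MergeMemLike*>(this);
}
```

If the accessor already checks the class, an explicit `assert(x->is_MergeMem(), "")` right before `x->as_MergeMem()` is redundant, which seems to be the point of the question.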
Thanks for the update ✔️
// }
// continue:
//
// slow_call:
This makes it look like continue falls through to slow_call. That is not what you want, right?
Thanks for the update ✔️
// the result of the parsing of the java code where the success path ends with an Halt node. The reason for that is
// that some paths may end with an uncommon trap and if one traps, we want the trap to be recorded for the right bci.
// When the ScopedValueGetHitsInCache/ScopedValueGetLoadFromCache pair is expanded, split if finds the duplicate
// logic and cleans it up.
I would prefer the comment section at the beginning of the method. Otherwise I may start reading down linearly, reverse-engineer the code, and only discover this afterwards... 😅
Thanks for the updates ✔️
_process_result = v;
}

virtual void process_result(GraphKit& kit) {
It would be really nice if you refactored this huge method (6+ pages of code) into smaller units.
It would for example make separation of pattern-matching and transformation easier to see.
slow_call = c->as_CallStaticJava();
assert(slow_call->method()->intrinsic_id() == vmIntrinsics::_SVslowGet, "");
} else {
assert(c->is_Proj() || c->is_Catch(), "");
Suggested change:
- assert(c->is_Proj() || c->is_Catch(), "");
+ assert(c->is_Proj() || c->is_Catch(), "unexpected node in pattern matching");
Unique_Node_List wq;
wq.push(kit.control());
for (uint i = 0; i < wq.size(); ++i) {
Node* c = wq.at(i);
Could we assert that these are all CFG nodes? And give it a more expressive name?
}
}
// get_first_iff/get_second_iff contain the first/second check we ran into during the graph traversal but they may
// not be the first/second one in execution order. Perform another traversal to figure out which is first.
Why can this not be done in the first traversal, and why does this (down) traversal do the right thing?
Can we assert that c is always CFG?
Please mention in a comment that we are only traversing the same CFG nodes from the first traversal.
Why can this not be done in the first traversal, and why does this (down) traversal do the right thing?
The first traversal starts from the end of the method and follows control paths until it reaches the Thread.scopedValueCache() call. Given the shape of the method, and given that some paths may have been trimmed and end with an uncommon trap, it could encounter either the first or the second of the ifs that probe the cache first.
// get_first_iff/get_second_iff contain the first/second check we ran into during the graph traversal but they may
// not be the first/second one in execution order. Perform another traversal to figure out which is first.
if (get_second_iff != nullptr) {
Node_Stack stack(0);
No visited set. Can this trigger an exponential explosion with if/region diamonds?
No visited set. Can this trigger an exponential explosion with if/region diamonds?
It only follows the control subgraph for ScopedValue.get(), which is fairly simple.
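To make the reviewer's concern concrete, here is a small sketch comparing a stack-based walk without a visited set against a worklist that rejects duplicates (the behaviour the hunk's `Unique_Node_List` name suggests). The `Node` struct and helper names are hypothetical and not HotSpot's. On a chain of if/region diamonds the first walk visits every path, so its cost doubles with each diamond, while the second is linear in the number of nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a CFG node; not HotSpot's Node class.
struct Node {
  std::vector<Node*> outs;  // control successors
};

// Walks every path: a node reachable along k paths is visited k times.
std::size_t count_visits_no_visited_set(Node* start) {
  std::size_t visits = 0;
  std::vector<Node*> stack{start};
  while (!stack.empty()) {
    Node* n = stack.back();
    stack.pop_back();
    ++visits;
    for (Node* out : n->outs) stack.push_back(out);
  }
  return visits;
}

// Worklist that ignores nodes it has already seen.
std::size_t count_visits_unique_worklist(Node* start) {
  std::vector<Node*> worklist{start};
  std::unordered_set<Node*> seen{start};
  for (std::size_t i = 0; i < worklist.size(); ++i) {
    for (Node* out : worklist[i]->outs) {
      if (seen.insert(out).second) worklist.push_back(out);
    }
  }
  return worklist.size();
}

// Builds `diamonds` consecutive diamonds: head -> {a, b} -> merge, where each
// merge is the head of the next diamond. Nodes are owned by `storage`.
Node* build_diamond_chain(std::size_t diamonds, std::vector<Node>& storage) {
  storage.clear();
  storage.resize(diamonds * 3 + 1);
  for (std::size_t d = 0; d < diamonds; ++d) {
    Node* head = &storage[d * 3];
    Node* a = &storage[d * 3 + 1];
    Node* b = &storage[d * 3 + 2];
    Node* merge = &storage[d * 3 + 3];
    head->outs = {a, b};
    a->outs = {merge};
    b->outs = {merge};
  }
  return &storage[0];
}
```

With only the two cache probes of a single `get()` in play, as the reply states, the path count stays tiny, so the missing visited set is a latent rather than a practical cost in this sketch's terms.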
get_first_iff->in(1)->as_Bool()->_test._test == BoolTest::ne ? 0 : 1);
CallStaticJavaNode* get_first_iff_unc = get_first_iff_failure->is_uncommon_trap_proj(Deoptimization::Reason_none);
if (get_first_iff_unc != nullptr) {
// first cache check never hits, keep only the second.
I'm struggling to understand:
We still have an unc-trap for the first. So we never failed so far, right? So we always found it in the cache, or am I wrong?
We are not removing this unc-trap though, right?
The ScopedValue.get() code probes 2 cache locations. When pattern matching the get() subgraph:
- If we find only a single if that probes the cache, then, according to profile data, there was always a hit at the first cache location.
- If we find 2 ifs, then both the first and second locations were probed. If the first if's other branch goes to an uncommon trap, then that location never saw a cache hit. In that case, when the ScopedValueGetHitsInCacheNode is expanded, only the code that probes the second location is added back to the IR.
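The two-probe lookup described in this reply can be modelled roughly as follows. This is only a sketch: the cache layout (key/value pairs), the slot constants and all names are assumptions for illustration and are not taken from the JDK sources.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative constants; the real cache geometry is defined by the JDK.
constexpr std::uint32_t kSlotBits = 4;
constexpr std::uint32_t kSlots = 1u << kSlotBits;
constexpr std::uint32_t kSlotMask = kSlots - 1;

// Hypothetical stand-in for a ScopedValue: identity plus a fixed hash.
struct Key {
  std::uint32_t hash;
};

// Cache stored as [key0, value0, key1, value1, ...].
struct Cache {
  std::array<const void*, kSlots * 2> entries{};
};

enum class Outcome { HitFirst, HitSecond, Miss };

struct Lookup {
  Outcome outcome;
  const void* value;
};

// Probes two locations derived from the key's hash, first one then the other.
Lookup probe(const Cache& cache, const Key& key) {
  const std::uint32_t first = (key.hash & kSlotMask) * 2;
  if (cache.entries[first] == &key) {
    return {Outcome::HitFirst, cache.entries[first + 1]};
  }
  const std::uint32_t second = ((key.hash >> kSlotBits) & kSlotMask) * 2;
  if (cache.entries[second] == &key) {
    return {Outcome::HitSecond, cache.entries[second + 1]};
  }
  return {Outcome::Miss, nullptr};  // the real code would take the slow path here
}
```

In this model, a profile in which `HitFirst` is always taken corresponds to the "single if" case, and one in which `HitFirst` is never taken corresponds to the case where only the second probe is re-emitted.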
Second batch of comments.
kit.set_i_o(io);

// remove the scopedValueCache() call
CallProjections scoped_value_cache_projs = CallProjections();
Suggested change:
- CallProjections scoped_value_cache_projs = CallProjections();
+ CallProjections scoped_value_cache_projs;
Is the assignment really necessary, or preferable style-wise? I see you declare it without the assignment elsewhere.
// Now move right above the scopedValueCache() call
Node* mem = scoped_value_cache->in(TypeFunc::Memory);
Node* c = scoped_value_cache->in(TypeFunc::Control);
Node* io = scoped_value_cache->in(TypeFunc::I_O);
I would use input_mem, input_ctrl, input_io. Then the replacements below would read more intuitively.
first_index == nullptr ? C->top() : first_index,
second_index == nullptr ? C->top() : second_index);

// It will later be expanded back to all the checks so record profile data
Should we also copy the node info (e.g. line number etc)?
sv_hits_in_cache->set_profile_data(2, get_second_iff->_fcnt, get_second_prob);
} else {
sv_hits_in_cache->set_profile_data(2, 0, 0);
}
Another case of code duplication. Why not write a method that extracts cnt, prob for an iff? Or maybe that already exists?
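One possible shape for such a helper, as a sketch only: `IfLike` and `Profile` are hypothetical stand-ins (the hunk above shows only that the if carries an `_fcnt` count), and the rule used here to orient the probability toward the "hit" branch is an assumption, not the PR's logic.

```cpp
#include <cassert>

// Hypothetical stand-in for the profile fields of an If node.
struct IfLike {
  float fcnt;  // execution count
  float prob;  // probability of the true branch
};

// The (count, probability) pair to record on the new node.
struct Profile {
  float cnt;
  float prob;
};

// Single place that maps an optional If to the data to record.
// A missing If yields (0, 0), mirroring the else branch in the hunk above.
Profile profile_of(const IfLike* iff, bool hit_is_true_branch) {
  if (iff == nullptr) {
    return {0.0f, 0.0f};
  }
  const float hit_prob = hit_is_true_branch ? iff->prob : 1.0f - iff->prob;
  return {iff->fcnt, hit_prob};
}
```

Both the first and the second probe could then call the same function instead of repeating the null check and the probability computation.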
assert(sv_hits_in_cachex == sv_hits_in_cache, "");

// And compute the probability of a miss in the cache
float prob;
Suggested change:
- float prob;
+ float cache_miss_prob;
r->init_req(2, in_cache);

// ScopedValueGetLoadFromCache is a single that represents the result of a hit in the cache
Node* cache_value = kit.gvn().transform(new ScopedValueGetLoadFromCacheNode(C, in_cache, sv_hits_in_cache));
Suggested change:
- Node* cache_value = kit.gvn().transform(new ScopedValueGetLoadFromCacheNode(C, in_cache, sv_hits_in_cache));
+ Node* sv_load_from_cache = kit.gvn().transform(new ScopedValueGetLoadFromCacheNode(C, in_cache, sv_hits_in_cache));
For consistency with sv_hits_in_cache and the node class name.
do_intrinsic(_SVslowGet, java_lang_ScopedValue, slowGet_name, void_object_signature, F_R) \
do_name( slowGet_name, "slowGet") \
do_intrinsic(_SVCacheInvalidate, java_lang_ScopedValue_Cache, invalidate_name, int_void_signature, F_S) \
do_name( invalidate_name, "invalidate") \
What prevents us from writing out _scopedValueGet etc., just like the other names here?
Thanks for the update ✔️
src/hotspot/share/opto/cfgnode.hpp
Outdated
ProjNode* result_out() {
return proj_out_or_null(Result);
}
Either verify that the result is not null, or else rename to result_out_or_null.
Thanks for the update ✔️
src/hotspot/share/opto/compile.cpp
Outdated
@@ -461,6 +462,7 @@ void Compile::disconnect_useless_nodes(Unique_Node_List& useful, Unique_Node_Lis
remove_useless_late_inlines( &_string_late_inlines, useful);
remove_useless_late_inlines( &_boxing_late_inlines, useful);
remove_useless_late_inlines(&_vector_reboxing_late_inlines, useful);
remove_useless_late_inlines( &_scoped_value_late_inlines, useful);
Suggested change:
- remove_useless_late_inlines( &_scoped_value_late_inlines, useful);
+ remove_useless_late_inlines( &_scoped_value_late_inlines, useful);
Make it align with the code above
And:
We keep adding more and more of these...
Soon we will need to refactor this to a list of lists 😅
src/hotspot/share/opto/compile.cpp
Outdated
@@ -2025,6 +2030,73 @@ void Compile::inline_boxing_calls(PhaseIterGVN& igvn) {
}
}

void Compile::inline_scoped_value_calls(PhaseIterGVN& igvn) {
if (_scoped_value_late_inlines.length() > 0) {
Rather than indenting everything, I would just check _scoped_value_late_inlines.is_empty() and return.
Third comment batch.
I still have not looked at some parts, but I think this is enough detail for today ;)
src/hotspot/share/opto/compile.cpp
Outdated
}
C->set_has_scoped_value_get_nodes(true);
CallNode* call = cg->call_node();
CallProjections projs;
Suggested change:
- CallProjections projs;
+ CallProjections call_projs;
src/hotspot/share/opto/compile.cpp
Outdated
call->extract_projections(&projs, true);
Node* sv = call->in(TypeFunc::Parms);
Node* control_out = projs.fallthrough_catchproj;
Node* res = projs.resproj;
Can we have longer and more descriptive names for sv and res, please 🙏 😃
src/hotspot/share/opto/compile.cpp
Outdated
gvn->record_for_igvn(control_out);
res = res->clone();
gvn->set_type_bottom(res);
gvn->record_for_igvn(res);
Why do you clone these? (maybe add a comment)
src/hotspot/share/opto/compile.cpp
Outdated
case Op_ScopedValueGetResult:
case Op_ScopedValueGetHitsInCache:
case Op_ScopedValueGetLoadFromCache: {
ShouldNotReachHere();
Why? Add a comment!
Node* ScopedValueGetLoadFromCacheNode::scoped_value() const {
Node* hits_in_cache = in(1);
assert(hits_in_cache->Opcode() == Op_ScopedValueGetHitsInCache, "");
return ((ScopedValueGetHitsInCacheNode*)hits_in_cache)->scoped_value();
Why not add the necessary bits to the class so you can use as_ScopedValueGetLoadFromCache()?
src/hotspot/share/opto/multnode.cpp
Outdated
wq.push(u);
}
}
} else if (n->Opcode() != Op_Halt) {
So a path without a call and with only a Halt node is also an uncommon trap?
}
return res;
}
Code duplication warning 😉
Not sure what is the best solution though.
I think you could remove duplication with a simple "implement" function, that takes a parameter "want_unique". Then if you find the element, and don't want unique, you return. If you are looking for unique, you just continue and check that you don't find it again.
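The suggestion can be sketched generically as follows. All names (`find_impl`, `find_first`, `find_unique`) are hypothetical, and returning `nullptr` when a second match is found is just one possible reading of "check that you don't find it again"; an assert would be another.

```cpp
#include <cassert>
#include <vector>

// Shared search routine for both "find any" and "find unique".
// Returns a pointer to the match, or nullptr if there is none (or, when
// want_unique is true, if more than one element matches).
template <typename T, typename Pred>
const T* find_impl(const std::vector<T>& items, Pred matches, bool want_unique) {
  const T* found = nullptr;
  for (const T& item : items) {
    if (!matches(item)) continue;
    if (!want_unique) return &item;        // first match is good enough
    if (found != nullptr) return nullptr;  // second match: not unique
    found = &item;
  }
  return found;
}

// Thin wrappers keep the call sites readable.
template <typename T, typename Pred>
const T* find_first(const std::vector<T>& items, Pred matches) {
  return find_impl(items, matches, /*want_unique=*/false);
}

template <typename T, typename Pred>
const T* find_unique(const std::vector<T>& items, Pred matches) {
  return find_impl(items, matches, /*want_unique=*/true);
}
```

The two public entry points stay distinct, but the traversal itself exists only once.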
src/hotspot/share/opto/type.cpp
Outdated
@@ -614,6 +614,16 @@ void Type::Initialize_shared(Compile* current) {
TypeInstKlassPtr::OBJECT = TypeInstKlassPtr::make(TypePtr::NotNull, current->env()->Object_klass(), 0);
TypeInstKlassPtr::OBJECT_OR_NULL = TypeInstKlassPtr::make(TypePtr::BotPTR, current->env()->Object_klass(), 0);

const Type **fgetfromcache =(const Type**)shared_type_arena->AmallocWords(3*sizeof(Type*));
Suggested change:
- const Type **fgetfromcache =(const Type**)shared_type_arena->AmallocWords(3*sizeof(Type*));
+ const Type** fgetfromcache =(const Type**)shared_type_arena->AmallocWords(3*sizeof(Type*));
src/hotspot/share/opto/type.cpp
Outdated
fgetfromcache[1] = TypeInstPtr::BOTTOM;
fgetfromcache[2] = TypeAryPtr::OOPS;
TypeTuple::make(3, fgetfromcache);
const Type **fsvgetresult =(const Type**)shared_type_arena->AmallocWords(2*sizeof(Type*));
Suggested change:
- const Type **fsvgetresult =(const Type**)shared_type_arena->AmallocWords(2*sizeof(Type*));
+ const Type** fsvgetresult =(const Type**)shared_type_arena->AmallocWords(2*sizeof(Type*));
src/hotspot/share/opto/type.hpp
Outdated
@@ -746,6 +746,7 @@ class TypeTuple : public Type {
static const TypeTuple *LONG_PAIR;
static const TypeTuple *INT_CC_PAIR;
static const TypeTuple *LONG_CC_PAIR;
static const TypeTuple *SV_GET_RESULT;
static const TypeTuple *SV_GET_RESULT; | |
static const TypeTuple* SV_GET_RESULT; |
Path 1 of today.
Node* first_index = nullptr; // index in the cache for first hash | ||
Node* second_index = nullptr; // index in the cache for second hash | ||
CallStaticJavaNode* slow_call = nullptr; // slowGet() call if any | ||
{ |
This setup really looks like it should be a class, maybe called ScopedValueGetPatternMatcher? All your variables here could be fields, and the scope below a method, or even split into multiple methods.
} | ||
|
||
|
||
bool PhaseIdealLoop::loop_predication_for_scoped_value_get(IdealLoopTree* loop, IfProjNode* if_success_proj, |
Add a short comment above, that we are trying to hoist the If for a ScopedValueGetHitsInCache out of the loop, if possible.
You have one on the first line, but I think generally we place them at the top, right?
Well, now looking at the method... maybe a longer comment with a picture or pseudocode would be helpful.
It would greatly help me in reviewing the code - otherwise I basically have to draw the picture on a piece of paper myself before understanding it ;)
I like it the most when there is ascii art that uses the variable names in the code below.
I'll read the code in a later review.
// A ScopedValueGetHitsInCache check is loop invariant if the scoped value object it is applied to is loop invariant | ||
BoolNode* bol = iff->in(1)->as_Bool(); | ||
if (bol->in(1)->Opcode() == Op_ScopedValueGetHitsInCache && invar.is_invariant(((ScopedValueGetHitsInCacheNode*)bol->in(1))->scoped_value()) && | ||
invar.is_invariant(((ScopedValueGetHitsInCacheNode*)bol->in(1))->index1()) && invar.is_invariant(((ScopedValueGetHitsInCacheNode*)bol->in(1))->index2())) { |
Please refactor this if:
- if we don't take it, you return false, so make it a bailout. This allows you to already bailout if bol->in(1)->Opcode() == Op_ScopedValueGetHitsInCache fails.
- After that check, you can already have this line: ScopedValueGetHitsInCacheNode* hits_in_the_cache = (ScopedValueGetHitsInCacheNode*) bol->in(1); and simplify the other 3 conditions of the if, make it more readable.
- But make them bailouts as well, instead of indenting the rest of the function.
I don't really know how is_invar works. But why is a use-node not automatically variant if a def-node is variant? Or stated in other terms: why not just check invar.is_invariant(hits_in_the_cache)?
bool PhaseIdealLoop::loop_predication_for_scoped_value_get(IdealLoopTree* loop, IfProjNode* if_success_proj, | ||
ParsePredicateSuccessProj* parse_predicate_proj, | ||
Invariance &invar, Deoptimization::DeoptReason reason, | ||
IfNode* iff, IfProjNode*&new_predicate_proj) { |
IfNode* iff, IfProjNode*&new_predicate_proj) { | |
IfNode* iff, IfProjNode* &new_predicate_proj) { |
if (TraceLoopPredicate) { | ||
tty->print("Predicate invariant if: %d ", new_predicate_iff->_idx); | ||
loop->dump_head(); | ||
} else if (TraceLoopOpts) { |
Why not have them as separate ifs? What if someone enables both, will they not miss a line?
That's the code pattern used elsewhere in PhaseIdealLoop::loop_predication_impl_helper().
src/hotspot/share/opto/loopnode.cpp
Outdated
return progress; | ||
} | ||
|
||
void PhaseIdealLoop::expand_get_from_sv_cache(ScopedValueGetHitsInCacheNode* get_from_cache) { |
void PhaseIdealLoop::expand_get_from_sv_cache(ScopedValueGetHitsInCacheNode* get_from_cache) { | |
void PhaseIdealLoop::expand_sv_get_hits_in_cache_and_load_from_cache(ScopedValueGetHitsInCacheNode* get_from_cache) { |
And again, a picture / pseudocode of the transformation would help immensely.
src/hotspot/share/opto/loopnode.cpp
Outdated
set_ctrl(zero, C->root()); | ||
_igvn.replace_input_of(iff, 1, zero); | ||
_igvn.replace_node(get_from_cache, C->top()); | ||
|
src/hotspot/share/opto/loopnode.cpp
Outdated
return; | ||
} | ||
|
||
Node* load_of_cache = get_from_cache->in(1); |
Node* load_of_cache = get_from_cache->in(1); | |
Node* cache_adr = get_from_cache->in(1); |
src/hotspot/share/opto/subnode.hpp
Outdated
} | ||
|
||
Node* mem() const { | ||
return in(Memory); |
Why not verify that this is a MemNode?
MemNodes are not the only ones carrying memory state (projection for a call, membar etc).
src/hotspot/share/opto/loopnode.cpp
Outdated
Node* first_index = get_from_cache->index1(); | ||
Node* second_index = get_from_cache->index2(); | ||
|
||
if (first_index == C->top() && second_index == C->top()) { |
could this not be done during igvn?
What happens here is that the cache was always seen to be null so no code to probe the cache was added to the IR. When that happens, the optimizations still apply (i.e. there could be a dominated ScopedValue.get() that can be replaced by this one or a dominating one that can replace this one). This should also be very uncommon. So the nodes that are put in place to enable optimizations should be left in until expansion to not miss optimization opportunities.
Co-authored-by: Andrey Turbanov <turbanoff@gmail.com>
In its simplest form, a ScopedValue get will have a test (to check it's indeed in the cache) and a load so I don't think vectorization is possible. |
So you think this could not be predicated and hoisted out of the loop? That would also be a sad limitation 😞 |
Co-authored-by: Emanuel Peter <emanuel.peter@oracle.com>
Even if that was possible, ScopedValue get loads are from the cache indexed by a hash stored as a field in the ScopedValue object. I'm not sure how you would be able to tell which of the loads from several get() calls are contiguous in memory. |
I moved most of the scoped value specific code to |
I also removed |
@eme64 change is ready for another review |
@rwestrel I feel like I am heavily stepping on your toes now.... I'm thinking in particular about your most recent changes with:
Don't get me wrong: I like those refactorings, but they should be done separately. If you can find anything else that could be done separately, that would help greatly. I have been painstakingly separating my SuperWord PR's into more reviewable patches, and I do get quicker reviews that way. My concern: I think the code is now in a state that can be understood (if one spends a day reading it all), but it is hard for me to say that it is correct. If I now approve this patch, then a subsequent reviewer will pay less attention, hence, I feel like I cannot just approve it too quickly now. If I am too annoying, feel free to ask someone else to review and I will just step back. Maybe @theRealAph wants to review for a while? |
@rwestrel one idea to split things here:
This way, I can spend only a few hours on one at a time, and we can get this done. |
The problem I see is that they have little value unless this patch is integrated as it is. What if another reviewer thinks it's better to keep everything related to loop predication together? There's no need to change the class |
Would one commit per line above work? Or do you think it needs to be different PRs? |
@rwestrel Honestly, I would like to take a break from this for now. I think the code is significantly better/readable than when we first started. So if someone like @vnkozlov simply scans and approves it, and as such takes the responsibility of "first reviewer", then I'm totally fine with that. |
I want to see performance numbers on x64 and aarch64 before I start looking at it. It would be nice to have data for all micros. Put the results into JBS and post a short summary here. You can compare by disabling/enabling the new intrinsics.
I'm on it. |
I've spoken to some senior JDK developers and their feeling is that this patch is too specific to the current scoped value implementation and too complex to go into HotSpot, especially for a feature like scoped values that is not yet out of preview. |
I know this is "parked" now, but there are some internal conversations happening. One question that @dean-long and @rose00 had: |
There are 3 optimizations that this patch performs: 1- replace a There are 2 1- when profile reports that the slow path that updates the cache is not taken: Obviously, before The patch performs optimization 1-, 2- and 3- for patterns 1- and 2- but, it does it better for pattern 1- than 2-. If the slow path is included in compiled code, then only a I thought about always speculating initially that the slow path is not taken when compiling |
What's a good benchmark to run to show the benefit of this change, or to show the effect of different cache sizes and/or Java implementation changes? I tried running micro:ScopedValue benchmarks with -Djava.lang.ScopedValue.cacheSize=2 and didn't see a difference. But the new compiler/scoped_value/TestScopedValue.java test fails in compiler.c2.irTests.TestScopedValue.testFastPath16 with the cache size set to 2. Given the right benchmark, there are some experiments I'd like to try, related to the ScopedValue Java implementation:
|
I think it might make sense to split 1 and 2, which are independent of the details of get() and put(), from 3. Then we can consider if there are other optimizations we can do around opaque() get() and put(). For example, why can't we replace a get() with the value from a dominating put()? Why can't we eliminate both the put() and get() completely, as long as the value can't "escape" or we deoptimize? |
With Benchmark Mode Cnt Score Error Units before this patch and
after.
Looks pretty deterministic to me. Every value has two hash codes, primary and secondary, and they are different.
Put another conditional load in the control flow? I'm not sure that would do much, but OK. I guess I don't know how this would work.
I guess that might help.
Interesting. I did a version of the code that used bytecode generation to produce a new accessor method for each scoped value a year or two ago, for that same reason. It did work, but was rather heavyweight. Re benchmarks: the benchmarks are all there, but the current design is based on principles, as well as benchmark results.
|
Ah, I just realized what you meant. Without random replacement in the cache, there is a high probability that scoped values will collide, because the cache is small. Even with only two scoped values accessed alternately, each will repeatedly kick out the other, leading to a linear probe every time. The only way to avoid this without random replacement would be to make the cache considerably larger, and even then it would still occasionally happen. |
I guess I don't understand how random replacement is supposed to help. Do you have a pointer to where I can read up on the topic? |
On 5/30/24 21:54, Dean Long wrote:
https://en.wikipedia.org/wiki/Cache_replacement_policies#Random_replacement_(RR) The maths is pretty simple: if you have only one slot for each entry, one Random replacement is good for software implementation because it doesn't
In practice, today's processors speculate both primary and secondary loads Random replacement is a low-cost way to increase the hit ratio of a cache |
One other thing. Let's say you always check the slot before overwriting it, and only then go to the secondary slot. You find the secondary slot is occupied. The best thing to do then is random replacement. Given that the end effect of just doing random replacement is the same, there's nothing to be gained from the added complexity. |
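The thrashing argument can be made concrete with a toy simulation. This is a sketch under stated assumptions, not the java.lang.ScopedValue implementation: the class TwoSlotCacheSketch, the slot numbers, the seed and the "always overwrite the primary slot" baseline are all hypothetical choices made for illustration.

```java
// Toy model of a small cache in which each key maps to a primary and a secondary slot.
final class TwoSlotCacheSketch {
    private final Object[] slots;
    private final java.util.Random random;
    private final boolean randomReplacement;
    int misses;

    TwoSlotCacheSketch(int size, boolean randomReplacement, long seed) {
        this.slots = new Object[size];
        this.randomReplacement = randomReplacement;
        this.random = new java.util.Random(seed);
    }

    // Probe both candidate slots; on a miss, install the key in one of them.
    void access(Object key, int primary, int secondary) {
        if (slots[primary] == key || slots[secondary] == key) {
            return; // hit
        }
        misses++;
        // Baseline policy (assumed): always overwrite the primary slot.
        // Random policy: overwrite the primary or the secondary slot with equal probability.
        int victim = (randomReplacement && random.nextBoolean()) ? secondary : primary;
        slots[victim] = key;
    }

    // Two keys sharing primary slot 0, with different secondary slots, accessed alternately.
    static int missesForAlternatingKeys(boolean randomReplacement, int rounds) {
        TwoSlotCacheSketch cache = new TwoSlotCacheSketch(4, randomReplacement, 42L);
        Object a = new Object();
        Object b = new Object();
        for (int i = 0; i < rounds; i++) {
            cache.access(a, 0, 1);
            cache.access(b, 0, 2);
        }
        return cache.misses;
    }
}
```

With the baseline policy the two keys evict each other on every access, so every access is a miss. With random replacement one of the keys soon lands in its secondary slot, after which both keys hit on every access.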
@rwestrel This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
This change implements C2 optimizations for calls to ScopedValue.get(). Indeed, when scopedValue.get() is called twice, with results v1 and v2, v2 can be replaced by v1 and the second call to get() can be optimized out. That's true whatever is between the 2 calls unless a new mapping for scopedValue is created in between (when that happens, no optimization is performed for the method being compiled). Hoisting a get() call out of a loop for a loop invariant scopedValue should also be legal in most cases.
ScopedValue.get() is implemented in Java code as a 2 step process. A cache is attached to the current thread object. If the ScopedValue object is in the cache then the result from get() is read from there. Otherwise a slow call is performed that also inserts the mapping in the cache. The cache itself is lazily allocated. One ScopedValue can be hashed to 2 different indexes in the cache. On a cache probe, both indexes are checked. As a consequence, the process of probing the cache is a multi step process (check if the cache is present, check first index, check second index if first index failed). If the cache is populated early on, then when the method that calls ScopedValue.get() is compiled, profile reports the slow path as never taken and only the read from the cache is compiled.
To perform the optimizations, I added 3 new node types to C2:
- the pair ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for the cache probe
- a cfg node ScopedValueGetResultNode to help locate the result of the get() call in the IR graph.
In pseudo code, once the nodes are inserted, the code of a get() is:
In the snippet:
Replacing v2 by v1 is then done by starting from the ScopedValueGetResult node for the second get() and looking for a dominating ScopedValueGetResult for the same ScopedValue object. When one is found, it is used as a replacement. Eliminating the second get() call is achieved by making ScopedValueGetHitsInCache always successful if there's a dominating ScopedValueGetResult and replacing its companion ScopedValueGetLoadFromCache by the dominating ScopedValueGetResult.
Hoisting a get() out of a loop is achieved by peeling one iteration of the loop. The optimization above then finds a dominating get() and removes the get() from the loop body.
An important case, I think, is when profile predicts the slow case to be never taken. Then the code of get() is:
The ScopedValueGetResult doesn't help and is removed early on. The optimization process then looks for a pair of ScopedValueGetHitsInCache/ScopedValueGetLoadFromCache that dominates the current pair of ScopedValueGetHitsInCache/ScopedValueGetLoadFromCache and can replace them. In that case, hoisting a ScopedValue.get() can be done by predication and I added special logic in predication for that.
Adding the new nodes to the graph when a ScopedValue.get() call is encountered is done in several steps:
1- inlining of ScopedValue.get() is delayed and the call is enqueued for late inlining.
2- Once the graph is fully constructed, for each call to ScopedValue.get(), a ScopedValueGetResult is added between the result of the call and its uses.
3- the call is then inlined by parsing the ScopedValue.get() method
4- finally the subgraph that results is pattern matched and the pieces required to perform the cache probe are extracted and attached to new ScopedValueGetHitsInCache/ScopedValueGetLoadFromCache nodes
There are a couple of reasons for steps 3 and 4:
- As mentioned above probing the cache is a multi step process. Having only 2 nodes in a simple graph shape to represent it makes it easier to write robust optimizations
- the subgraph for the method after parsing contains valuable pieces of information: profile data that captures which of the 2 locations in the cache is the most likely to cause a hit. Profile data is attached to the nodes.
Removal of redundant nodes is done during loop opts. The ScopedValue nodes are then expanded. That also happens during loop opts because once expansion is over, there are opportunities for further optimizations/clean up that can only happen during loop opts. During expansion, ScopedValueGetResult nodes are removed and ScopedValueGetHitsInCache/ScopedValueGetLoadFromCache are expanded to the multi step process of probing the cache. Profile data attached to the nodes is used to assign correct frequencies/counts to the If nodes. Of the 2 locations in the cache that are tested, the one that's the most likely to see a hit (from profile data) is done first.
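The ordering decision in the last sentence can be sketched in a few lines. This is an illustration only: ProbeOrderSketch and its parameters are hypothetical, and how C2 actually reads the counts from the profile is not shown.

```java
// Hypothetical sketch: choose which of the two cache locations to test first from profile counts.
final class ProbeOrderSketch {
    // Returns the probe order as location numbers, 0 for the first index and 1 for the second.
    // The location with more recorded hits is tested first; ties keep the original order.
    static int[] probeOrder(long hitsAtFirstIndex, long hitsAtSecondIndex) {
        if (hitsAtSecondIndex > hitsAtFirstIndex) {
            return new int[] {1, 0};
        }
        return new int[] {0, 1};
    }
}
```

Testing the likelier location first means the common case takes a single compare and branch before the load.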
/cc hotspot-compiler
Progress
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16966/head:pull/16966
$ git checkout pull/16966
Update a local copy of the PR:
$ git checkout pull/16966
$ git pull https://git.openjdk.org/jdk.git pull/16966/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 16966
View PR using the GUI difftool:
$ git pr show -t 16966
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16966.diff
Webrev
Link to Webrev Comment