-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8290965: PPC64: Implement post-call NOPs #17171
Conversation
👋 Welcome back rrich! A progress list of the required criteria for merging this PR into |
/label remove hotspot |
@reinrich |
@reinrich |
Co-authored-by: Andrew Haley <aph-open@littlepinkcloud.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usage of CMPI/CMPLI looks great. Assuming kind::nmethod
by default will likely work, but I wonder if we could avoid that without measurable performance loss (see comments below).
if (_pc == nullptr) { | ||
_pc = (address)own_abi()->lr; | ||
assert(_pc != nullptr, "must have PC"); | ||
} | ||
|
||
if (_cb == nullptr) { | ||
_cb = CodeCache::find_blob(_pc); | ||
if (_cb == nullptr ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the whitespace!
if (_cb == nullptr) { | ||
_cb = CodeCache::find_blob(_pc); | ||
if (_cb == nullptr ) { | ||
_cb = knd == kind::nmethod ? CodeCache::find_blob_fast(_pc) : CodeCache::find_blob(_pc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(knd == kind::nmethod)
would look better.
@@ -393,16 +393,26 @@ | |||
inline common_abi* own_abi() const { return (common_abi*) _sp; } | |||
inline common_abi* callers_abi() const { return (common_abi*) _fp; } | |||
|
|||
enum class kind { | |||
native, // The frame's pc is not necessarily in the CodeCache. | |||
// CodeCache::find_blob_fast(void* pc) can yield wrong results in this case and must not be used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably call it unknown
.
inline frame::frame(intptr_t* sp, address pc, intptr_t* unextended_sp, intptr_t* fp, CodeBlob* cb) | ||
: _sp(sp), _pc(pc), _cb(cb), _oop_map(nullptr), | ||
_on_heap(false), DEBUG_ONLY(_frame_index(-1) COMMA) _unextended_sp(unextended_sp), _fp(fp) { | ||
setup(); | ||
setup(kind::nmethod); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think kind::nmethod
should only be used if cb != nullptr which is not checked, here. Is this one performance critical?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think
kind::nmethod
should only be used if cb != nullptr which is not checked, here. Is this one performance critical?
I don't quite understand: the purpose of using kind::nmethod
is to allow for a fast lookup of the cb which is only done if cb != nullptr.
See also my other response where kind::nmethod
is default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this one performance critical?
This is a good question. Honestly I have difficulties understanding why PCNs should be performance critical at all. AFAIK frames are only iterated on the slow path when freezing/thawing. Maybe the slow path is not that uncommen, e.g. if StackChunks are visited by GC.
I wanted to use kind::nmethod
as default whenever possible in order not to miss a place that actually is performance critical.
See also #8955 (comment)
@@ -1187,8 +1187,12 @@ void MacroAssembler::post_call_nop() { | |||
if (!Continuations::enabled()) { | |||
return; | |||
} | |||
// We use CMPI/CMPLI instructions to encode post call nops. | |||
// We set bit 9 to distinguish post call nops from real CMPI/CMPI instructions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be CMPI/CMPLI. Maybe add that CMPI and CMPLI opcodes only differ in one bit which we use to encode data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// |0 0 1 0 1|DATA HI| 1| DATA LO | | ||
// | |4 bits | | 22 bits | | ||
// | ||
// Bit 9 is alwys 1 for PCNs to distinguish them from CMPI/CMPLI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
always
, maybe distinguish from "regular CMPI/CMPLI".
|
||
public: | ||
|
||
// Constructors | ||
inline frame(intptr_t* sp, intptr_t* fp, address pc); | ||
inline frame(intptr_t* sp, address pc, intptr_t* unextended_sp = nullptr, intptr_t* fp = nullptr, CodeBlob* cb = nullptr); | ||
inline frame(intptr_t* sp, address pc, kind knd = kind::nmethod); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using kind::nmethod
by default is potentially dangerous. The pc may be outside of the code cache and calling find_blob_fast would be unreliable. It's used by pns for debugging code. It doesn't look performance critical and we could use a conservative default.
I guess that we don't see issues because native code doesn't set bit 9 in CMPI/CMPLI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pns
does not use this constructor. It uses frame::frame(void* sp, void* fp, void* pc) : frame((intptr_t*)sp, (address)pc, kind::code_blob)
. So there's no problem. pns
seems to be the only user of this one. It might good to use kind::native
there.
Using kind::native
(or kind::unknow
) as default instead of kind::nmethod
is potentially problematic since there might be locations in shared code that should set kind::nmethod
. I think this requires a clean-up of the shared frame api. Note also that using the wrong kind (wrong constructor on other platfroms) hit the assertion in CodeCache::find_blob_and_oopmap
(that's how I noticed that the distinction is actually needed :))
@@ -89,21 +89,27 @@ inline void frame::setup() { | |||
inline frame::frame() : _sp(nullptr), _pc(nullptr), _cb(nullptr), _oop_map(nullptr), _deopt_state(unknown), | |||
_on_heap(false), DEBUG_ONLY(_frame_index(-1) COMMA) _unextended_sp(nullptr), _fp(nullptr) {} | |||
|
|||
inline frame::frame(intptr_t* sp) : frame(sp, nullptr) {} | |||
inline frame::frame(intptr_t* sp) : frame(sp, nullptr, kind::nmethod) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. Potentially dangerous default value. Not performance critical AFAICS.
Mailing list message from Andrew Haley on hotspot-compiler-dev: On 12/20/23 20:36, Richard Reingruber wrote:
These data are really interesting. How did you gather them? Thanks. |
This is the code for the stats based on master: c376fcc |
The parent pull request that this pull request depends on has now been integrated and the target branch of this pull request has been updated. This means that changes from the dependent pull request can start to show up as belonging to this pull request, which may be confusing for reviewers. To remedy this situation, simply merge the latest changes from the new target branch into this pull request by running commands similar to these in the local repository for your personal fork: git checkout ppc_post_call_nop
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# if there are conflicts, follow the instructions given by git merge
git commit -m "Merge master"
git push |
@reinrich this pull request can not be integrated into git checkout ppc_post_call_nop
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates! The constructors should still be used with care, but I think your code is at least as good as other platforms (rather better IMHO).
@reinrich This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been no new commits pushed to the ➡️ To integrate this PR with the above commit message to the |
I intend to ship this ppc only pr tomorrow if the tests pass after merging master. I don't expect another review. |
/integrate |
Going to push as commit de97c0e.
Your commit was automatically rebased without conflicts. |
Implementation of post call nops (PCNs) on ppc64.
Depends on #17150
About post call nops:
Background:
Instead all PCNs of the compiled method are patched to trigger deoptimization when control returns to such frames.
Post call nops on ppc64
1 instruction, i.e. 4 bytes (either CMPI or CMPLI[1])
x86_64: 1 instruction, 8 bytes
aarch64: 3 instruction, 12 bytes
[1] 3.1.10 Fixed Point Compare Instructions in Power ISA 3.1B
https://openpowerfoundation.org/specifications/isa/
26 bits data payload
x86_64: 32 bits; aarch64: 32 bits
9 bits dedicated to oop map slot. With 8 bits there where cases with SPECjvm2008 where the slot could not be encoded (on ppc64 and x86_64).
x86_64: 8 bits; aarch64: 8 bits
17 bits dedicated to cb offset. Effectively 19 bits due to instruction alignment.
x86_64: 24 bits; aarch64: 24 bits
Also used when reconstructing the back chain after thawing continuation frames (see
Thaw<ConfigT>::patch_caller_links
)Refactored frame constructors to make use of fast CodeBlob lookup based on PCNs.
The fast lookup may only be used if the pc is known to be in the code cache because
CodeCache::find_blob_fast
can yield wrong results if it finds instructions outside the code cache that look just like PCNs. Callers of the frame class constructors need to passframe::kind::native
in that case to avoid errors. Other platforms don't make this explicit which is a problem in my eyes. Picking the wrong constructor can cause errors when porting and in future development.Currently only the PCNs in nmethods are initialized. Therefore we don't even try to make a fast lookup based on PCNs if we know the CodeBlob is, e.g., a RuntimeStub. To achieve this we call the frame constructor passing
frame::kind::code_blob
.Statistics
Comments
C1: We get decode failures even if patching always succeeded because not all PCNs are patched. Only PCNs in nmethods are actually patched. E.g. C2 runtime stubs like
_new_array_nozero_Java
have PCNs that are not patched.C2: With Skynet.java there are 100x more PCN lookups. This is because it stresses virtual threads.
C2: With Skynet.java there are more PCN lookups on ppc64le. They originate from
Thaw<ConfigT>::patch_caller_links
.Testing
The change passed our CI testing. JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. SPECjvm2008, SPECjbb2015, Renaissance Suite, and SAP specific tests.
All testing was done with fastdebug and release builds on the main platforms and also on Linux/PPC64le and AIX.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17171/head:pull/17171
$ git checkout pull/17171
Update a local copy of the PR:
$ git checkout pull/17171
$ git pull https://git.openjdk.org/jdk.git pull/17171/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 17171
View PR using the GUI difftool:
$ git pr show -t 17171
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17171.diff
Webrev
Link to Webrev Comment