8343789: Move mutable nmethod data out of CodeCache #21276
Conversation
👋 Welcome back bulasevich! A progress list of the required criteria for merging this PR into the target branch will be added to the body of your pull request.
@bulasevich This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be: … You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 41 new commits pushed to the target branch. As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the target branch, use the /integrate command.
@bulasevich The following label will be automatically applied to this pull request: …

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
@vnkozlov Hi Vladimir, …
Force-pushed from b15f504 to 531b630
Force-pushed from 9444108 to c90f5b7
vnkozlov left a comment
First, thank you for doing this work.
My main question is: why did you do it only for nmethod?
Second question: do you see any performance effects with this change?
My concern is that we now iterate relocation info data from a different memory space when patching code (sketched below).
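For illustration, the pattern behind this concern can be sketched as follows; RelocRecord, apply_relocation, and patch_all are hypothetical stand-ins for HotSpot's relocation machinery, not actual declarations:

#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for one relocation entry (HotSpot's format differs).
struct RelocRecord {
  uint32_t offset; // position of the patch site within the code blob
  uint16_t type;   // relocation kind
};

void apply_relocation(unsigned char* site, uint16_t type); // assumed helper

// Patching now reads records from a C-heap buffer while writing into the
// CodeCache, so every iteration touches two unrelated memory regions.
void patch_all(const RelocRecord* relocs, std::size_t count, // C-heap buffer
               unsigned char* code_begin) {                  // CodeCache memory
  for (std::size_t i = 0; i < count; i++) {
    apply_relocation(code_begin + relocs[i].offset, relocs[i].type);
  }
}

Whether the extra pointer chase matters in practice is exactly the performance question raised above.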
  } else {
    address dummy = address(uintptr_t(pc()) & -wordSize); // A nearby aligned address
-   ldr_constant(dst, Address(dummy, rspec));
+   mov(dst, Address(dummy, rspec));
Why is this needed?
- it is not a load from a Constant Pool, so calling ldr_constant here seems incorrect
- the ldr_constant function uses either ldr (with a range limit of ±1MB) or, when NearCpool is disabled with -XX:-NearCpool, adrp (range limit of ±4GB) followed by ldr; both of these may fall short when mutable data is allocated on the C heap (see the sketch below)
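For reference, a standalone sketch of the two reach limits cited above (an ldr literal encodes a 19-bit signed word offset, adrp a 21-bit signed page offset); the function names are illustrative:

#include <cstdint>

// PC-relative "ldr literal": 19-bit signed immediate, scaled by 4 -> +/-1MB.
bool reachable_by_ldr_literal(uintptr_t pc, uintptr_t target) {
  intptr_t dist = (intptr_t)(target - pc);
  return dist >= -(intptr_t(1) << 20) && dist < (intptr_t(1) << 20);
}

// adrp: 21-bit signed immediate, scaled by 4KB pages -> +/-4GB.
bool reachable_by_adrp(uintptr_t pc, uintptr_t target) {
  intptr_t page_dist = (intptr_t)((target >> 12) - (pc >> 12));
  return page_dist >= -(intptr_t(1) << 20) && page_dist < (intptr_t(1) << 20);
}

A buffer returned by os::malloc can be arbitrarily far from the CodeCache, so neither check is guaranteed to hold for the new mutable data.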
This change looks wrong, for a number of reasons. First, the dummy address would no longer be needed, and we could just use the same mov as the supports_instruction_patching() case. However, if supports_instruction_patching() is false, I think we are not allowed to generate a multi-instruction movz/movk sequence. We really need something like ldr_constant for this case, so that we load from memory.
However, as you point out, this is tied to NearCpool. For a far metadata slot access, ADR+LDR is the right answer. After this change, will there be any metadata left that could still benefit from NearCpool? If not, then it might make sense to turn it off permanently. Instead of choosing between PC-relative "ldr literal" and far ADR+LDR based on NearCpool, we could decide based on the distance to the metadata table. I believe "ldr literal" only has a 1MB range.
CC @theRealAph
That's right. Thanks for pointing that out to me.
I have a fix for the movoop issue in the supports_instruction_patching=false case. It should probably be considered as a separate change: #22448
src/hotspot/share/code/codeBlob.hpp
Outdated
  address _mutable_data;
  int     _mutable_data_size;
Should we add a special CodeBlob subclass for nmethod, to avoid increasing the size of all blobs and stubs?
I am not sure. All CodeBlobs with relocation info need mutable data. Let me know if you think it must be a separate subclass.
Please move _mutable_data after _oop_maps and _mutable_data_size after _size to avoid padding (see the layout sketch below).
Also update the comment at line 70 to describe the new CodeBlob layout.
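The effect of that reordering can be seen in a standalone sketch; the field names mirror CodeBlob's, but the structs and the sizes are illustrative for a typical LP64 ABI:

// Interleaving 4-byte and 8-byte fields creates alignment holes:
struct Interleaved {           // 32 bytes on a typical LP64 ABI
  void* _oop_maps;             // 8
  int   _size;                 // 4 (+4 padding before the next pointer)
  void* _mutable_data;         // 8
  int   _mutable_data_size;    // 4 (+4 tail padding)
};

// Grouping pointers with pointers and ints with ints removes the holes:
struct Grouped {               // 24 bytes on a typical LP64 ABI
  void* _oop_maps;             // 8
  void* _mutable_data;         // 8
  int   _size;                 // 4
  int   _mutable_data_size;    // 4
};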
Note, with https://bugs.openjdk.org/browse/JDK-8334691 and other changes I am moving in the direction of making relocation info data immutable. It is already "immutable" in mainline JDK after https://bugs.openjdk.org/browse/JDK-8333819. But it is still mutable in Leyden because we have to patch indexes when publishing an nmethod. My idea was to move relocation info data (which is large) into _immutable_data.

The mutable size percentages do not add up:
Yes. I did it symmetrically to the separate immutable data storage for nmethod. Now I see I do not like an implementation where relocation info is local for some blobs but stays aside for others. I am going to rework that.

On an AArch64 machine I see a slight improvement on the big benchmark, caused by the reduced code sparsity. Though I need to do more benchmarking to make sure I am not making things worse for others.
Thanks. The correct sizes:

Hmm. If relocation info goes to an immutable blob, oops+metadata hardly deserves a separate blob.
Force-pushed from c90f5b7 to 52dd527
Performance update. On an AArch64 machine the CodeCacheStress benchmark shows a 1-2% performance improvement with this change. Statistics on the CodeCacheStress benchmark with high numberOfClasses-instanceCount-rangeOfClasses parameter values:
Force-pushed from 88e3106 to a358c6b
-XX:+PrintNMethodStatistics
Webrevs
It would be nice to make relocations immutable, but one roadblock is the use of relocInfo::change_reloc_info_for_address() by C1 patching. We would need to separate mutable and immutable relocations, or replace C1 patching with deoptimization, as aarch64 does with DEOPTIMIZE_WHEN_PATCHING.
src/hotspot/share/code/codeBlob.cpp
Outdated
  // The mutable_data_size is either calculated by the nmethod constructor to account
  // for reloc_info and additional data, or it is set here to accommodate only the relocation data.
  _mutable_data_size = (mutable_data_size == 0) ? cb->total_relocation_size() : mutable_data_size;
It seems strange to treat relocations as special. Wouldn't it be better to have the caller always pass in the correct value?
Or compute using something like required_mutable_data_space()?
Alright, thank you. I moved the mutable_data_size calculation out of CodeBlob.
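A sketch of what such a caller-side helper might look like, following the required_mutable_data_space() idea above; it assumes HotSpot's align_up, oopSize, and CodeBuffer accessors, and is not necessarily the exact code that was integrated:

// Sum the mutable parts the same way the nmethod constructor would,
// so CodeBlob itself no longer needs a relocation-size fallback.
static int required_mutable_data_size(CodeBuffer* code_buffer,
                                      int jvmci_data_size = 0) {
  return align_up(code_buffer->total_relocation_size(), oopSize) +
         align_up(code_buffer->total_metadata_size(), oopSize) +
         align_up(jvmci_data_size, oopSize);
}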
src/hotspot/share/code/codeBlob.hpp
Outdated
  CodeBlob(const char* name, CodeBlobKind kind, CodeBuffer* cb, int size, uint16_t header_size,
-          int16_t frame_complete_offset, int frame_size, OopMapSet* oop_maps, bool caller_must_gc_arguments);
+          int16_t frame_complete_offset, int frame_size, OopMapSet* oop_maps, bool caller_must_gc_arguments,
+          int mutable_data_size = 0);
If we want to allow the default for the mutable data size to be the relocation size, then instead of using = 0 here, you could do this:
CodeBlob(const char* name, CodeBlobKind kind, CodeBuffer* cb, int size, uint16_t header_size,
         int16_t frame_complete_offset, int frame_size, OopMapSet* oop_maps, bool caller_must_gc_arguments,
         int mutable_data_size);

CodeBlob(const char* name, CodeBlobKind kind, CodeBuffer* cb, int size, uint16_t header_size,
         int16_t frame_complete_offset, int frame_size, OopMapSet* oop_maps, bool caller_must_gc_arguments) :
  CodeBlob(name, kind, cb, size, header_size,
           frame_complete_offset, frame_size, oop_maps, caller_must_gc_arguments,
           cb->total_relocation_size())
{
}
but I would prefer not to treat relocations as special, and have the caller always pass the correct value.
Agree.
…de option: _relocation_size can exceed 64KB; in this case _metadata_offset does not fit into int16. Fix: use the _oops_size int16 field to calculate the metadata offset
…o address scenarios where os::malloc allocates buffers beyond the typical ±4GB range accessible with adrp
…odeBlob purge to call os::free, fix nmethod::print, update Layout description
@bulasevich Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

Yes. Thanks!
@bulasevich is it ready for testing now?

@vnkozlov yes, it's ready for testing. Thanks!
vnkozlov left a comment
My testing (tier1-7, stress, comp) passed with one new failure, JDK-8351457, which I think is not related.
Hi @dean-long, would you mind re-reviewing this PR? I have reverted the movement of oops into a separate buffer, as it caused issues on AArch64. All platform-specific details are now removed, making the change much simpler.
dean-long left a comment
Still looks good.
Let me integrate. Many thanks to the reviewers!

/integrate
Going to push as commit 83de340.
Your commit was automatically rebased without conflicts.

@bulasevich Pushed as commit 83de340. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
               (data_offset() + data_end_offset), nmethod_size);
  CHECKED_CAST(_oops_size, uint16_t, align_up(code_buffer->total_oop_size(), oopSize));
  uint16_t metadata_size = (uint16_t)align_up(code_buffer->total_metadata_size(), wordSize);
  JVMCI_ONLY(CHECKED_CAST(_jvmci_data_size, uint16_t, align_up(compiler->is_jvmci() ? jvmci_data->size() : 0, oopSize)));
This cast is lossy in that jvmci_data->size() returns an int. It caused a "double free or corruption (out)" crash in Graal in a case where a JVMCINMethodData had a very long name. We've fixed this by limiting the length of the name, but I'm wondering if there was some special reason for this cast. If so, can you please add extra logic preventing this code from running off the end of allocated memory:
#if INCLUDE_JVMCI
  if (compiler->is_jvmci()) {
    // Initialize the JVMCINMethodData object inlined into nm
    jvmci_nmethod_data()->copy(jvmci_data);
  }
#endif
If not, please remove the cast.
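For reference, a standalone sketch of such a guard (checked_u16 is hypothetical); note that an assert-based check like HotSpot's CHECKED_CAST compiles away in product builds, so a product-safe version would need a real runtime check:

#include <cassert>
#include <cstdint>
#include <limits>

// Narrow an int to uint16_t, failing loudly instead of silently wrapping.
inline uint16_t checked_u16(int value) {
  assert(value >= 0 && value <= std::numeric_limits<uint16_t>::max() &&
         "size does not fit in uint16_t");
  return static_cast<uint16_t>(value);
}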
The cast was added by 8331087, which reduced the supported JVMCI data size to uint16_t. I don't remember this issue with long names coming up during that review, so I guess we all missed it. @dougxc please file a bug so we can track this. It seems like JVMCINMethodData::copy should do something like truncate long names rather than blindly assuming it has enough space. If uint16_t is unreasonably small for JVMCI nmethod data we could revert that change and make it 32 bits again.
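A minimal sketch of the truncating copy suggested there (the function and parameters are hypothetical, not the JVMCINMethodData API):

#include <cstring>

// Copy a name into a fixed-size slot (capacity >= 1), truncating instead of
// trusting the caller-supplied string to fit.
void copy_name_clamped(char* dest, std::size_t capacity, const char* name) {
  std::size_t n = std::strlen(name);
  if (n >= capacity) n = capacity - 1;  // clamp to the available space
  std::memcpy(dest, name, n);
  dest[n] = '\0';
}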
In 8331087, I think only _jvmci_data_offset was subject to the narrowing cast.
I've opened https://bugs.openjdk.org/browse/JDK-8355896.
Yes, my mistake. I was thinking _jvmci_data_offset was used to compute jvmci_data_end(), not jvmci_data_begin().
Yes, we should use 32 bits. Even if we revert back to using _jvmci_data_offset, we can NOT use uint16_t, because the relocation data (after which the JVMCI data is placed) can be bigger than that.
Thanks for the report. Yes, the cast to uint16_t is wrong. I am going to fix the issue here: #24965
This change moves mutable data (relocations, metadata, JVMCI data) out of the nmethod. It follows the recent PR #18984, which moved immutable nmethod data out of the CodeCache.
Oops were initially moved to a new mutable data blob, but then moved back into the nmethod due to performance issues in DaCapo benchmarks on AArch64 with ShenandoahGC (why Shenandoah: it is the only GC with supports_instruction_patching=false, so it requires loading from the oops table in compiled code, which takes three instructions for remote data).
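Roughly the code-size cost being described, sketched as AArch64 sequences in comments (illustrative, not HotSpot's exact emission):

// Near oops table, one PC-relative instruction:
//   ldr  x0, <oop_slot>          // +/-1MB literal reach
//
// Remote oops table, three instructions per oop load:
//   adrp x0, <oop_slot_page>     // reach the slot's 4KB page (+/-4GB)
//   add  x0, x0, #page_offset    // address of the slot within the page
//   ldr  x0, [x0]                // load the oop itself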
Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1–2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark.
The numbers: immutable data constitutes ~30% of the nmethod; mutable data constitutes ~8%. Example (statistics collected on the CodeCacheStress benchmark):
Functional testing: jtreg on arm/aarch64/x86.
Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks.
Alternative solution (see comments): In the future, relocations can be moved to _immutable_data.
Progress
Issue
Reviewers
Reviewing
Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21276/head:pull/21276
$ git checkout pull/21276

Update a local copy of the PR:
$ git checkout pull/21276
$ git pull https://git.openjdk.org/jdk.git pull/21276/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21276

View PR using the GUI difftool:
$ git pr show -t 21276

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21276.diff

Using Webrev

Link to Webrev Comment