@bulasevich
Contributor

@bulasevich bulasevich commented Oct 1, 2024

This change moves mutable data (such as relocations, metadata, and JVMCI data) out of the nmethod, and therefore out of the CodeCache. It follows the recent PR #18984, which moved immutable nmethod data out of the CodeCache.

Oops were initially moved to the new mutable data blob as well, but were moved back into the nmethod because of performance issues on DaCapo benchmarks on AArch64 with ShenandoahGC. (Why Shenandoah: it is the only GC with supports_instruction_patching=false, so compiled code must load from the oops table, which takes three instructions for remote data.)

Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1–2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark.

The numbers: immutable data constitutes ~30% of the nmethod, and mutable data constitutes ~8%. Example (statistics collected on the CodeCacheStress benchmark):

  • nmethod_count:134000, total_compilation_time: 510460ms
  • total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms,
  • total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB
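
As a rough cross-check, the allocation sizes listed above can be turned into shares of the combined total for this particular run. This is only a minimal sketch: `share_percent` and `print_shares` are hypothetical helpers written for illustration, not HotSpot functions, and the shares from one run need not match the overall ~30% / ~8% estimates exactly.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper (not part of HotSpot): share of `part` in `total`, in percent.
double share_percent(double part, double total) {
  return total > 0.0 ? 100.0 * part / total : 0.0;
}

// Allocation sizes in MB, copied from the CodeCacheStress statistics above.
constexpr double kMutableMB   = 64.0;
constexpr double kImmutableMB = 192.0;
constexpr double kNMethodMB   = 488.0;
constexpr double kTotalMB     = kMutableMB + kImmutableMB + kNMethodMB;

// Prints each region's share of the combined allocation size.
void print_shares() {
  std::printf("mutable:   %.1f%%\n", share_percent(kMutableMB, kTotalMB));
  std::printf("immutable: %.1f%%\n", share_percent(kImmutableMB, kTotalMB));
  std::printf("nmethod:   %.1f%%\n", share_percent(kNMethodMB, kTotalMB));
}
```

For this run the mutable share comes out near 8.6% and the immutable share near 25.8% of the 744MB total.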

Functional testing: jtreg on arm/aarch/x86.
Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks.

Alternative solution (see comments): In the future, relocations can be moved to _immutable_data.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8343789: Move mutable nmethod data out of CodeCache (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21276/head:pull/21276
$ git checkout pull/21276

Update a local copy of the PR:
$ git checkout pull/21276
$ git pull https://git.openjdk.org/jdk.git pull/21276/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21276

View PR using the GUI difftool:
$ git pr show -t 21276

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21276.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Oct 1, 2024

👋 Welcome back bulasevich! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Oct 1, 2024

@bulasevich This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8343789: Move mutable nmethod data out of CodeCache

Reviewed-by: kvn, dlong

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 41 new commits pushed to the master branch:

  • ffa6340: 8351567: Jar Manifest test ValueUtf8Coding produces misleading diagnostic output
  • 8d8bd0c: 8349492: Update sun/security/pkcs12/KeytoolOpensslInteropTest.java to use a recent Openssl version
  • 73465b9: 8160327: Support for thumbnails present in APP1 marker for JPEG
  • dbdbbd4: 8348597: Update HarfBuzz to 10.4.0
  • 7999091: 8351555: Help section added in JDK-8350638 uses invalid HTML
  • 8450ae9: 8351440: Link with -reproducible on macOS
  • b40be22: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges
  • 6b84bde: 8350007: Add usage message to the javadoc executable
  • 32f2c2d: 8351017: ChronoUnit.MONTHS.between() not giving correct result when date is in February
  • d90b79a: 8351046: Rename ObjectMonitor functions
  • ... and 31 more: https://git.openjdk.org/jdk/compare/cfab88b1a2351a187bc1be153be96ca983a7776c...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk

openjdk bot commented Oct 1, 2024

@bulasevich The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Oct 1, 2024
@bulasevich
Contributor Author

@vnkozlov Hi Vladimir,
What do you think about the idea of moving relocInfo data out of the nmethod, in addition to the recent "Move immutable nmethod data from CodeCache" change? It would reduce the CodeHeap fill by 5%.

@bulasevich bulasevich force-pushed the mutable_data branch 2 times, most recently from 9444108 to c90f5b7 Compare November 7, 2024 19:03
@bulasevich bulasevich changed the title move nmethods relocInfo data to C heap 8343789: Move mutable nmethod data out of CodeCache Nov 7, 2024
Contributor

@vnkozlov vnkozlov left a comment

First, thank you for doing this work.

Main question is: why you did it only for nmethod?

Second question: do you see any performance effects with this change?
My concern is that we iterate relocation info data from different memory space to patch code.

} else {
address dummy = address(uintptr_t(pc()) & -wordSize); // A nearby aligned address
ldr_constant(dst, Address(dummy, rspec));
mov(dst, Address(dummy, rspec));
Contributor

Why this is needed?

Contributor Author

  • it is not a load from a Constant Pool, so calling ldr_constant here seems incorrect
  • the ldr_constant function uses either ldr (range limit of ±1MB) or, when NearCpool is disabled (-XX:-NearCpool), adrp (range limit of ±4GB) followed by ldr; both may fall short when mutable data is allocated on the C heap.

Member

This change looks wrong, for a number of reasons. First, the dummy address would no longer be needed, and we could just use the same mov as the supports_instruction_patching() case. However, if supports_instruction_patching() is false, I think we are not allowed to generate a multi-instruction movz/movk sequence. We really need something like ldr_constant for this case, so that we load from memory.
However, as you point out, this is tied to NearCpool. For a far metadata slot access, ADR+LDR is the right answer. After this change, will there be any metadata left that could still benefit from NearCpool? If not, then it might make sense to turn it off permanently. Instead of choosing between PC-relative "ldr literal" and far ADR+LDR based on NearCpool, we could decide based on the distance to the metadata table. I believe "ldr literal" only has a 1MB range.
CC @theRealAph
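
The idea of deciding by distance rather than by the NearCpool flag could be sketched roughly as below. This is only an illustration under stated assumptions: `LoadForm` and `choose_load_form` are hypothetical names, not HotSpot identifiers. The reach values are the architectural AArch64 limits: "ldr (literal)" encodes a signed 19-bit word offset (±1MB), and adrp encodes a signed 21-bit offset in 4KB pages (±4GB).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical classification of how a metadata slot could be reached from `pc`.
enum class LoadForm { LdrLiteral, AdrpLdr, OutOfRange };

constexpr int64_t kLdrLiteralReach = int64_t(1) << 20;  // bytes: +/-1MB
constexpr int64_t kAdrpReachPages  = int64_t(1) << 20;  // 4KB pages: +/-4GB

// Assumes both addresses fit in the positive int64_t range.
LoadForm choose_load_form(uint64_t pc, uint64_t target) {
  const int64_t byte_delta = static_cast<int64_t>(target) - static_cast<int64_t>(pc);
  // "ldr (literal)" can only encode offsets that are a multiple of 4 bytes.
  const bool word_aligned = (byte_delta & 3) == 0;
  if (word_aligned && byte_delta >= -kLdrLiteralReach && byte_delta < kLdrLiteralReach) {
    return LoadForm::LdrLiteral;
  }
  // adrp works on page numbers; the low 12 bits go into the following ldr.
  const int64_t page_delta =
      static_cast<int64_t>(target >> 12) - static_cast<int64_t>(pc >> 12);
  if (page_delta >= -kAdrpReachPages && page_delta < kAdrpReachPages) {
    return LoadForm::AdrpLdr;
  }
  // Neither PC-relative form reaches; the caller needs some other strategy.
  return LoadForm::OutOfRange;
}
```

A table allocated with malloc can land in the `OutOfRange` case, which is the situation the earlier reply describes.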

Contributor Author

That's right. Thanks for pointing that out to me.
I have a fix for movoop issue on supports_instruction_patching=false case. Probably it should be considered as a separate change: #22448

Comment on lines 127 to 129
address _mutable_data;
int _mutable_data_size;

Contributor

Should we add special CodeBlob subclass for nmethod to avoid increase of size for all blobs and stubs?

Contributor Author

I am not sure. All CodeBlobs with relocation info need mutable data. Let me know if you think it must be a separate subclass.

Contributor

Please, move _mutable_data after _oop_maps and _mutable_data_size after _size to avoid padding.

Also update comment at line 70 to describe new CodeBlob layout.
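
The padding concern can be illustrated with two simplified structs. This is a minimal sketch: the structs below are stand-ins with only four fields, not the real CodeBlob layout, and the exact sizes depend on the ABI.

```cpp
#include <cassert>
#include <cstddef>

// Pointers and ints interleaved: on a typical 64-bit ABI each int is
// followed by 4 bytes of padding so that the next pointer is 8-byte aligned.
struct InterleavedLayout {
  void* oop_maps;
  int   size;
  void* mutable_data;
  int   mutable_data_size;
};

// Pointers grouped together, then the ints: the two ints share one
// 8-byte slot and no padding is needed on a typical 64-bit ABI.
struct GroupedLayout {
  void* oop_maps;
  void* mutable_data;
  int   size;
  int   mutable_data_size;
};

static_assert(sizeof(GroupedLayout) <= sizeof(InterleavedLayout),
              "grouping same-sized fields should never increase the size");
```

On a typical LP64 target this gives 24 bytes for the grouped layout versus 32 bytes for the interleaved one.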

@vnkozlov
Contributor

vnkozlov commented Nov 7, 2024

Note, with https://bugs.openjdk.org/browse/JDK-8334691 and other changes I am moving in the direction of making relocation info data immutable. It is already "immutable" in mainline JDK after https://bugs.openjdk.org/browse/JDK-8333819. But it is still mutable in Leyden because we have to patch indexes when publishing an nmethod.

My idea was to move relocation info data (which has big size) into immutable data section of nmethod. And leave mutable _oops and _metadata together with code since they are relatively small and we need to patch them together with code.

@vnkozlov
Contributor

vnkozlov commented Nov 7, 2024

Mutable sizes % do not add up:

mutable data    = 6071648 (9.396180%)
   relocation    = 3437176 (12.846409%)
   oops          = 239488 (0.895084%)
   metadata      = 2394984 (8.951227%)

@bulasevich
Contributor Author

bulasevich commented Nov 9, 2024

Main question is: why you did it only for nmethod?

Yes. I did it symmetrically to the separate immutable data storage for nmethod. Now I see that I do not like an implementation where relocation info is stored locally for some blobs but separately for others. I am going to rework that.

any performance effects with this change?

On the aarch machine I see a slight improvement on the big benchmark caused by the code sparsity improvement. Though I need to do more benchmarking to make sure I am not making things worse for others.

Benchmark              Mode  Cnt    Score   Error  Units  |  Benchmark              Mode  Cnt    Score   Error  Units
JmhDotty.runOperation    ss  999  861.717 ± 1.543  ms/op  |  JmhDotty.runOperation    ss  999  840.959 ± 1.473  ms/op
                                                          |
       34555411781      cache-misses:u                    |         34343012187      cache-misses:u
     2913869717708      cpu-cycles:u                      |       2863838151745      cpu-cycles:u
     4185324759051      instructions:u                    |       4209616523046      instructions:u            
     1460914744576      L1-icache-loads:u                 |       1452066316397      L1-icache-loads:u
       97806845375      L1-icache-load-misses:u           |         93815390496      L1-icache-load-misses:u   
     1191854820746      iTLB-loads:u                      |       1169231847276      iTLB-loads:u
       10591067761      iTLB-load-misses:u                |         10134696419      iTLB-load-misses:u        
      838964735227      branch-loads:u                    |        838353168582      branch-loads:u
       25829615231      branch-load-misses:u              |         24361474411      branch-load-misses:u
      836291984964      br_pred:u                         |        838153583659      br_pred:u
       25733552818      br_mis_pred:u                     |         24353396612      br_mis_pred:u
         562168308      group0-code_sparsity:u            |           449848707      group0-code_sparsity:u

@bulasevich
Contributor Author

bulasevich commented Nov 9, 2024

Mutable sizes % do not add up:

Thanks. The correct sizes:

Statistics for 21032 bytecoded nmethods for C1:
 mutable data    = 10488856 (8.688409%)
   relocation    = 6573064 (62.667118%)
   oops          = 515680 (4.916456%)
   metadata      = 3400112 (32.416424%)
Statistics for 8171 bytecoded nmethods for C2:
 mutable data    = 6572064 (10.118859%)
   relocation    = 3406216 (51.828709%)
   oops          = 305912 (4.654733%)
   metadata      = 2859936 (43.516556%)
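
A quick way to see that these corrected numbers are consistent: the three components should sum to the mutable data total, and each sub-percentage is relative to that total. A small sketch follows; the struct and helper names are made up for illustration and are not HotSpot code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical container for one "mutable data" block of the statistics (bytes).
struct MutableStats {
  int64_t total;
  int64_t relocation;
  int64_t oops;
  int64_t metadata;
};

// True when the listed components account for the whole mutable data size.
bool components_add_up(const MutableStats& s) {
  return s.relocation + s.oops + s.metadata == s.total;
}

// Percentage of `part` relative to `whole`.
double percent_of(int64_t part, int64_t whole) {
  return whole > 0 ? 100.0 * static_cast<double>(part) / static_cast<double>(whole) : 0.0;
}

// Values copied from the corrected statistics above.
constexpr MutableStats kC1 = {10488856, 6573064, 515680, 3400112};
constexpr MutableStats kC2 = {6572064, 3406216, 305912, 2859936};
```

With these inputs `percent_of(kC1.relocation, kC1.total)` is about 62.67, matching the relocation line for C1.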

@bulasevich
Contributor Author

My idea was to move relocation info data (which has big size) into immutable data section of nmethod. And leave mutable _oops and _metadata together with code since they are relatively small and we need to patch them together with code.

Hmm. If relocation info goes to an immutable blob, oops+metadata hardly deserves a separate blob.

@bulasevich
Contributor Author

Performance update. On an aarch machine the CodeCacheStress benchmark shows a 1-2% performance improvement with this change.

Statistics on the CodeCacheStress benchmark with high numberOfClasses-instanceCount-rangeOfClasses parameter values:

  • nmethod_count:134000, total_compilation_time: 510460ms
  • total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms,
  • total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB

@bulasevich bulasevich force-pushed the mutable_data branch 3 times, most recently from 88e3106 to a358c6b Compare November 19, 2024 13:26
@bulasevich
Contributor Author

-XX:+PrintNMethodStatistics

Statistics for 21032 bytecoded nmethods for C1:
 total size      = 120722408 (100%)
 in CodeCache    = 79358808 (65.736603%)
   header        = 5215936 (6.572599%)
   constants     = 320 (0.000403%)
   main code     = 69017912 (86.969444%)
   stub code     = 5124640 (6.457557%)
 mutable data    = 10488856 (8.688409%)
   relocation    = 6573064 (62.667118%)
   oops          = 515680 (4.916456%)
   metadata      = 3400112 (32.416424%)
 immutable data  = 30874744 (25.574991%)
   dependencies  = 636240 (2.060714%)
   nul chk table = 756920 (2.451583%)
   handler table = 180456 (0.584478%)
   scopes pcs    = 16052608 (51.992683%)
   scopes data   = 13248520 (42.910542%)
Statistics for 8171 bytecoded nmethods for C2:
 total size      = 64948664 (100%)
 in CodeCache    = 25580504 (39.385727%)
   header        = 2026408 (7.921689%)
   constants     = 448 (0.001751%)
   main code     = 20925472 (81.802422%)
   stub code     = 2628176 (10.274137%)
 mutable data    = 6572064 (10.118859%)
   relocation    = 3406216 (51.828709%)
   oops          = 305912 (4.654733%)
   metadata      = 2859936 (43.516556%)
 immutable data  = 32796096 (50.495411%)
   dependencies  = 926992 (2.826532%)
   nul chk table = 537024 (1.637463%)
   handler table = 1695568 (5.170030%)
   scopes pcs    = 15451968 (47.115265%)
   scopes data   = 14184544 (43.250710%)

@bulasevich bulasevich marked this pull request as ready for review November 21, 2024 14:13
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 21, 2024
@mlbridge

mlbridge bot commented Nov 21, 2024

@dean-long
Member

It would be nice to make relocations immutable, but one roadblock is the use of relocInfo::change_reloc_info_for_address() by C1 patching. We would need to separate mutable and immutable relocations, or replace C1 patching with deoptimization, as aarch64 does with DEOPTIMIZE_WHEN_PATCHING.

Comment on lines 101 to 103
// The mutable_data_size is either calculated by the nmethod constructor to account
// for reloc_info and additional data, or it is set here to accommodate only the relocation data.
_mutable_data_size = (mutable_data_size == 0) ? cb->total_relocation_size() : mutable_data_size;
Member

This seems strange to treat relocations as special. Wouldn't it be better to have the caller always pass in the correct value?

Member

Or compute using something like required_mutable_data_space()?

Contributor Author

Alright. Thank you. I moved mutable_data_size calculation out of CodeBlob.
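
One possible shape for computing the size on the caller side, along the lines of the suggested required_mutable_data_space(), is sketched below. Everything here is an assumption made for illustration: `BufferSizes` is a made-up summary struct rather than the real CodeBuffer API, and the word alignment of each section is a guess, not something taken from the patch.

```cpp
#include <cassert>
#include <cstddef>

// Rounds `value` up to a multiple of `alignment` (which must be a power of two).
constexpr size_t align_up_to(size_t value, size_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical summary of the section sizes a caller already knows (bytes).
struct BufferSizes {
  size_t relocation_bytes;
  size_t metadata_bytes;
  size_t jvmci_bytes;
};

// Sketch: the caller sums every mutable section itself, so the blob
// constructor does not need to treat relocations as a special case.
size_t required_mutable_data_space(const BufferSizes& sizes) {
  const size_t word = sizeof(void*);
  return align_up_to(sizes.relocation_bytes, word) +
         align_up_to(sizes.metadata_bytes, word) +
         align_up_to(sizes.jvmci_bytes, word);
}
```

A blob that carries only relocations would pass zero for the other two fields and get just the aligned relocation size back.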

CodeBlob(const char* name, CodeBlobKind kind, CodeBuffer* cb, int size, uint16_t header_size,
int16_t frame_complete_offset, int frame_size, OopMapSet* oop_maps, bool caller_must_gc_arguments);
int16_t frame_complete_offset, int frame_size, OopMapSet* oop_maps, bool caller_must_gc_arguments,
int mutable_data_size = 0);
Member

If we want to allow the default for mutable data size to be the relocations size, then instead of using = 0 here, you could do this instead:

 CodeBlob(const char* name, CodeBlobKind kind, CodeBuffer* cb, int size, uint16_t header_size,
           int16_t frame_complete_offset, int frame_size, OopMapSet* oop_maps, bool caller_must_gc_arguments,
           int mutable_data_size);

 CodeBlob(const char* name, CodeBlobKind kind, CodeBuffer* cb, int size, uint16_t header_size,
           int16_t frame_complete_offset, int frame_size, OopMapSet* oop_maps, bool caller_must_gc_arguments) :
    CodeBlob(name, kind, cb, size, header_size,
           frame_complete_offset, frame_size, oop_maps, caller_must_gc_arguments,
           cb->total_relocation_size())
{
}

but I would prefer not to treat relocations as special, and have the caller always pass the correct value.
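
The delegating-constructor pattern in that suggestion can be tried out in isolation. The following is a self-contained sketch with made-up stand-in types (`FakeCodeBuffer`, `Blob`); it mirrors only the shape of the suggestion, not the real CodeBlob class.

```cpp
#include <cassert>

// Stand-in for CodeBuffer, exposing only the one query used here.
struct FakeCodeBuffer {
  int reloc_bytes;
  int total_relocation_size() const { return reloc_bytes; }
};

class Blob {
 public:
  // Primary constructor: the caller states the mutable data size explicitly.
  Blob(const FakeCodeBuffer* cb, int size, int mutable_data_size)
      : _size(size), _mutable_data_size(mutable_data_size) {
    (void)cb;  // unused in this sketch
  }

  // Convenience overload: delegates to the primary constructor and
  // defaults the mutable data size to the relocation size.
  Blob(const FakeCodeBuffer* cb, int size)
      : Blob(cb, size, cb->total_relocation_size()) {}

  int size() const { return _size; }
  int mutable_data_size() const { return _mutable_data_size; }

 private:
  int _size;
  int _mutable_data_size;
};
```

Compared with a `= 0` default argument, the overload makes the default explicit in one place and avoids a magic zero that means "use the relocation size".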

Contributor Author

Agree.

…de option:

_relocation_size can exceed 64KB; in this case _metadata_offset does not fit into int16.
Fix: use _oops_size int16 field to calculate metadata offset
…o address scenarios

where os::malloc allocates buffers beyond the typical ±4GB range accessible with adrp
…odeBlob purge to call os::free, fix nmethod::print, update Layout description
@openjdk

openjdk bot commented Mar 6, 2025

@bulasevich Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

@bulasevich
Contributor Author

Please swap metadata and jvmci data in outputs ...

Also please merge latest JDK which has SA cleanup related to compilers: #23782

Yes. Thanks!

@vnkozlov
Contributor

vnkozlov commented Mar 6, 2025

@bulasevich is it ready for testing now?

@bulasevich
Contributor Author

@bulasevich is it ready for testing now?

@vnkozlov yes, it's ready for testing. Thanks!

Contributor

@vnkozlov vnkozlov left a comment

My testing tier1-7, stress, comp passed with one new failure JDK-8351457 which is not related I think.

@bulasevich
Contributor Author

Hi @dean-long,

Would you mind doing a re-review of this PR? I have reverted the movement of oops into a separate buffer, as it caused issues on AArch. All platform-specific details are now removed, making the change much simpler.

Member

@dean-long dean-long left a comment

Still looks good.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 10, 2025
@bulasevich
Contributor Author

Let me integrate. Many thanks to the reviewers!

@bulasevich
Contributor Author

/integrate

@openjdk

openjdk bot commented Mar 11, 2025

Going to push as commit 83de340.
Since your change was applied there have been 47 commits pushed to the master branch:

  • 0de2cdd: 8351458: (ch) Move preClose to UnixDispatcher
  • cd9f1d3: 8286204: [Accessibility,macOS,VoiceOver] VoiceOver reads the spinner value 10 as 1 when user iterates to 10 for the first time on macOS
  • 4cf6316: 8351414: C2: MergeStores must happen after RangeCheck smearing
  • 8a5ed47: 8350148: Native stack overflow when writing Java heap objects into AOT cache
  • 5928209: 8347405: MergeStores with reverse bytes order value
  • f984c2b: 8351505: (fs) Typo in the documentation of java.nio.file.spi.FileSystemProvider.getFileSystem()
  • ffa6340: 8351567: Jar Manifest test ValueUtf8Coding produces misleading diagnostic output
  • 8d8bd0c: 8349492: Update sun/security/pkcs12/KeytoolOpensslInteropTest.java to use a recent Openssl version
  • 73465b9: 8160327: Support for thumbnails present in APP1 marker for JPEG
  • dbdbbd4: 8348597: Update HarfBuzz to 10.4.0
  • ... and 37 more: https://git.openjdk.org/jdk/compare/cfab88b1a2351a187bc1be153be96ca983a7776c...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Mar 11, 2025
@openjdk openjdk bot closed this Mar 11, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Mar 11, 2025
@openjdk

openjdk bot commented Mar 11, 2025

@bulasevich Pushed as commit 83de340.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

(data_offset() + data_end_offset), nmethod_size);
CHECKED_CAST(_oops_size, uint16_t, align_up(code_buffer->total_oop_size(), oopSize));
uint16_t metadata_size = (uint16_t)align_up(code_buffer->total_metadata_size(), wordSize);
JVMCI_ONLY(CHECKED_CAST(_jvmci_data_size, uint16_t, align_up(compiler->is_jvmci() ? jvmci_data->size() : 0, oopSize)));
Member

@dougxc dougxc Apr 25, 2025

This cast is lossy in that jvmci_data->size() returns an int. It caused a double free or corruption (out) crash in Graal in the case where a JVMCINMethodData had a very long name. We've fixed this by limiting the length of the name but I'm wondering if there was some special reason for this cast? If so, can you please add extra logic preventing this code from running off the end of allocated memory:

#if INCLUDE_JVMCI
    if (compiler->is_jvmci()) {
      // Initialize the JVMCINMethodData object inlined into nm
      jvmci_nmethod_data()->copy(jvmci_data);
    }
#endif

If not, please remove the cast.
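
To make the failure mode concrete, here is a standalone sketch of a lossy narrowing next to a checked one. The function names are hypothetical and this says nothing about how HotSpot's CHECKED_CAST macro itself behaves; it only shows why an int size that exceeds 65535 cannot safely be stored in a uint16_t.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Lossy: conversion to an unsigned type keeps only the low 16 bits,
// so an oversized value silently becomes a much smaller one.
uint16_t lossy_narrow(int value) {
  return static_cast<uint16_t>(value);
}

// Checked: refuses values that do not fit and leaves *out untouched.
bool try_narrow_to_u16(int value, uint16_t* out) {
  if (value < 0 || value > std::numeric_limits<uint16_t>::max()) {
    return false;
  }
  *out = static_cast<uint16_t>(value);
  return true;
}
```

If a size of 70000 bytes is narrowed with `lossy_narrow`, the stored size is 4464, so a later copy of the full 70000 bytes would run past the space that was reserved.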

Member

The cast was added by 8331087, which reduced the supported JVMCI data size to uint16_t. I don't remember this issue with long names coming up during that review, so I guess we all missed it. @dougxc please file a bug so we can track this. It seems like JVMCINMethodData::copy should do something like truncate long names rather than blindly assuming it has enough space. If uint16_t is unreasonably small for JVMCI nmethod data we could revert that change and make it 32 bits again.

Member

I think in 8331087 only _jvmci_data_offset was subject to the narrowing cast.
I've opened https://bugs.openjdk.org/browse/JDK-8355896.

Member

Yes, my mistake. I was thinking _jvmci_data_offset was used to compute jvmci_data_end(), not jvmci_data_begin().

Contributor

@vnkozlov vnkozlov Apr 29, 2025

Yes, we should use 32 bits. Even if we revert to using _jvmci_data_offset, we can NOT use uint16_t, because the relocation data (after which the JVMCI data is placed) can be bigger than that.

Contributor Author

Thanks for the report. Yes, cast to uint16 is wrong. I am going to fix the issue here: #24965

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

8 participants