Skip to content

8321137: Relax ICStub alignment #16911

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

shipilev
Copy link
Member

@shipilev shipilev commented Nov 30, 2023

WIP, submitting for others to poke holes in it.

Similarly to JDK-8284578, we would like to handle ICStub alignment. Currently, the small stub that takes only 24 bytes of code is covered by 128 bytes on AArch64. This is due to the same thing fixed by JDK-8284578 for interpreter codelets: aligning twice the CodeEntryAlignment.

128 bytes per ICStub means we deplete 10K ICBuffer with only 79 stubs. This actually happens multiple times even on a simple HelloWorld.java invocation that invokes some javac code, causing ICBufferFull safepoints. We can increase ICBuffer size, especially after JDK-8314220, but we cannot do this without limits, since it eats up code cache.

But if we assume that code entry alignment is not a strict requirement, and used to improve performance for frequently used code, then maybe we do not have to over-align the IC stub, given it is probably only used during IC transitions? It would significantly improve ICStub footprint and require smaller ICBuffer.

Current patch affects ICStub size in different ways on different platforms, since current size is effectively 2xCodeEntryAlignment, and new size is cache line size:

  • AArch64: 128 -> 64 bytes
  • x86_64: 64 -> 64 bytes
  • PPC64: 512 -> 128 bytes
  • S390X: 128 -> 256 bytes (!)
  • ARM: 32 -> 64 bytes (!)
  • Zero: <not applicable, Zero does not use IC stubs>

Additional testing:

  • Linux x86_64 server fastdebug tier1 tier2 tier3
  • Linux AArch64 server fastdebug tier1 tier2 tier3

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8321137: Relax ICStub alignment (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16911/head:pull/16911
$ git checkout pull/16911

Update a local copy of the PR:
$ git checkout pull/16911
$ git pull https://git.openjdk.org/jdk.git pull/16911/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16911

View PR using the GUI difftool:
$ git pr show -t 16911

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16911.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Nov 30, 2023

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Nov 30, 2023

@shipilev The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Nov 30, 2023
@shipilev shipilev marked this pull request as ready for review November 30, 2023 21:31
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 30, 2023
@mlbridge
Copy link

mlbridge bot commented Nov 30, 2023

Webrevs

@shipilev shipilev marked this pull request as draft November 30, 2023 21:35
@openjdk openjdk bot removed the rfr Pull request is ready for review label Nov 30, 2023
@dean-long
Copy link
Member

This is a good idea. I was working on something similar, making the logic smarter so code start could still align with CodeEntryAlignment while still allowing the header to use a smaller alignment. But that might be overkill, because as you say, this code is transient. And some day we might even be able to get rid of ICStub.

@fisk
Copy link
Contributor

fisk commented Dec 1, 2023

Driveby comment to consider...
One thing that sometimes keeps me up at night is that when ICStubs are small enough that a newly alocated one shares s cache line with a previously allocated one, then we start to rely on instruction cache coherency working properly for correctness. In particular a CPU might have both the destination of an inline cache and the contents of the ICStub cache line in its instruction caches. Yet when we first update the ICStub and then subsequently update the destination to point at the stub, we are relying on instruction fetching being ordered.

We have a similar situation with static call stubs. But there we have an isb instruction in the static call stub which is there from when an nmethod is compiled, which deals with this exact problem. Now thatstub is used to transfer control to the interpreter and we can do inefficient things. As for ICStubs, it would probably not be very pretty.

So instead, as I see it, we have kind of relied on ICStubs being spaced out in memory enough that the instruction fetcher would just not have it cached since the ICStub finalization which happens in a safepoint and is followed by cross modifying code fences on all threads.

Because of this, I think I would sleep a bit worse at night, especially regarding AArch64, if these stubs start to get too cozy with each other in memory. I sleep better with the social distancing policy of today, where the stubs are aware there is a dangerous disease to be worried about from their friends and not to catch it.

Having said that, I am currently removing ICStubs alltogether and hope to integrate in a not too distant future.

Just my 50 I$...

@dean-long
Copy link
Member

To address @fisk concerns, how about we reduce the alignment of ICStub code_begin to cache line size, and reduce the alignment of the ICStub header to alignof(ICStub)? The remaining problem, allowing ICStub_from_destination_address() to correctly map from code_begin back to the header, is still possible and is something I have been working on. The solution is to align code_begin first and compute the header location from that, instead of the other way around.

@fisk
Copy link
Contributor

fisk commented Dec 1, 2023

That sounds like a good idea, @dean-long

@shipilev
Copy link
Member Author

shipilev commented Dec 4, 2023

All right, keeping ICStub per cache line then! I think the easier way to achieve this is to align the entire stub at cache line size, and then align code section at smaller alignment. This assumes instruction and data cache line sizes are the same and covered by default. This also still assumes that calling into ICStub entry at more lax alignment is fine. The new patch still reduces the ICStub size from 128 bytes to 64 bytes on AArch64.

(If I have time, I would look into actually aligning code_begin first, as @dean-long suggested, but I expect it to be more intrusive.)

@shipilev
Copy link
Member Author

shipilev commented Dec 4, 2023

I think the easier way to achieve this is to align the entire stub at cache line size, and then align code section at smaller alignment.

Grrr, this does not work, because StubQueue computes the stub size delta code_end and code_begin and checks it to be the unit of alignment. This only works if code section alignment is at equal or larger of the stub alignment. So my current patch that does stub-align=CL-size; code-align=HW-size fails.

Another attempt is align both stub and code section to CL-size/2, which gives us the size and alignments we want without messing anything else.

@dean-long
Copy link
Member

This version looks good too.

@shipilev
Copy link
Member Author

shipilev commented Dec 5, 2023

Depends on #16973 to avoid footprint regression on x86_64.

@shipilev
Copy link
Member Author

shipilev commented Jan 4, 2024

This apparently morphed to "Let's allocate ICStub per cache line" instead of relaxing the alignment. The footprint improvement on most important architectures would be a side effect of this. I am going to restart this work in a separate PR to cleanly capture this development.

@shipilev
Copy link
Member Author

shipilev commented Jan 5, 2024

Closed in favor of #17277.

@shipilev shipilev closed this Jan 5, 2024
@shipilev shipilev deleted the JDK-8321137-relax-icstub-align branch January 8, 2025 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org
Development

Successfully merging this pull request may close these issues.

4 participants