8350488: [leyden] Experimental AOT-only mode#44

Closed
shipilev wants to merge 2 commits into openjdk:premain from shipilev:JDK-8350488-aot-only

Conversation

@shipilev

@shipilev shipilev commented Feb 21, 2025

Member

There are interesting use cases where we want an AOT-only mode. We can emulate this in the current Leyden prototype by relying on preload code and stopping all profiling, which naturally leads to no JIT compilations. This also makes interpreter code a bit faster in case we need to fall back to it. The mode also looks helpful for studying compiler dynamics.

Additional testing:

  • Eyeballing compilation logs with -XX:+PreloadOnly
  • Linux x86_64 server fastdebug, runtime/cds
  • Linux x86_64 server fastdebug, runtime/cds with -XX:+PreloadOnly

Progress

  • Change must not contain extraneous whitespace
  • Change must be properly reviewed (1 review required, with at least 1 Committer)

Issue

  • JDK-8350488: [leyden] Experimental AOT-only mode (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/leyden.git pull/44/head:pull/44
$ git checkout pull/44

Update a local copy of the PR:
$ git checkout pull/44
$ git pull https://git.openjdk.org/leyden.git pull/44/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 44

View PR using the GUI difftool:
$ git pr show -t 44

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/leyden/pull/44.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper Bot commented Feb 21, 2025


👋 Welcome back shade! A progress list of the required criteria for merging this PR into premain will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk Bot commented Feb 21, 2025


@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8350488: [leyden] Experimental AOT-only mode

Reviewed-by: kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 107 new commits pushed to the premain branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@vnkozlov) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk Bot added the rfr Pull request is ready for review label Feb 21, 2025
@mlbridge

mlbridge Bot commented Feb 21, 2025


Webrevs

@shipilev

shipilev commented Feb 21, 2025

Member Author

Sample performance results, after pre-training JavacBenchApp with 200x iterations:

# === 32 cores

# Default
  Time (mean ± σ):     403.4 ms ±   6.5 ms    [User: 1082.1 ms, System: 167.6 ms]
  Range (min … max):   387.6 ms … 417.7 ms    30 runs
 
# -XX:+PreloadOnly
  Time (mean ± σ):     439.1 ms ±   5.3 ms    [User: 544.2 ms, System: 84.2 ms]
  Range (min … max):   430.3 ms … 456.2 ms    30 runs
 
# === 2 cores

# Default
  Time (mean ± σ):     531.8 ms ±  30.1 ms    [User: 870.9 ms, System: 114.3 ms]
  Range (min … max):   479.6 ms … 606.8 ms    30 runs
 
# -XX:+PreloadOnly
  Time (mean ± σ):     425.8 ms ±   6.4 ms    [User: 530.2 ms, System: 76.4 ms]
  Range (min … max):   418.9 ms … 451.4 ms    30 runs

In both cases, "user" time goes down because there is no additional code loading or JIT compilation. In the 32-core case, peak performance suffers a bit, since preload code is not 100% efficient. The combination of these two factors is a net benefit in the 2-core case: not doing JIT compilations more than pays for the preload code inefficiency. This tradeoff, of course, depends on how well-trained the scenario is.
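
The tradeoff described above can be checked against the reported means. This is a minimal sketch that only redoes the arithmetic on the numbers quoted in this comment; the dictionary layout and the helper names `relative_change` and `summarize` are illustrative choices, not part of any benchmark tooling.

```python
# Mean (wall-clock, user) times in ms, copied from the results above.
results = {
    "32 cores": {"default": (403.4, 1082.1), "preload_only": (439.1, 544.2)},
    "2 cores": {"default": (531.8, 870.9), "preload_only": (425.8, 530.2)},
}


def relative_change(baseline: float, value: float) -> float:
    """Percentage change of `value` relative to `baseline`."""
    return (value - baseline) / baseline * 100.0


def summarize(data: dict) -> dict:
    """Map each configuration to (wall change %, user change %) for PreloadOnly."""
    summary = {}
    for config, modes in data.items():
        wall_default, user_default = modes["default"]
        wall_preload, user_preload = modes["preload_only"]
        summary[config] = (
            relative_change(wall_default, wall_preload),
            relative_change(user_default, user_preload),
        )
    return summary


for config, (wall, user) in summarize(results).items():
    print(f"{config}: wall {wall:+.1f}%, user {user:+.1f}%")
```

On these means, wall time is roughly 9% worse on 32 cores and roughly 20% better on 2 cores, while user time drops by roughly 40% to 50% in both configurations.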

-XX:+PreloadBlocking shows that the "performance floor" for this workload is about 300 ms, which means we run with only +33..80% "overhead" on this short workload. The bulk of this overhead is caused by how fast we can load the C2 SC code.
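
For reference, the overhead range can be reproduced from the mean wall times above. This is a sketch assuming the floor is exactly 300 ms (the comment only says "about 300ms"); the names `FLOOR_MS` and `overhead_percent` are hypothetical.

```python
# Assumed floor in ms; the source gives only an approximate value.
FLOOR_MS = 300.0

# Mean wall-clock times in ms, copied from the results above.
mean_wall_ms = {
    "32 cores, default": 403.4,
    "32 cores, -XX:+PreloadOnly": 439.1,
    "2 cores, default": 531.8,
    "2 cores, -XX:+PreloadOnly": 425.8,
}


def overhead_percent(wall_ms: float, floor_ms: float = FLOOR_MS) -> float:
    """Overhead of a run relative to the floor, in percent."""
    return (wall_ms - floor_ms) / floor_ms * 100.0


overheads = {name: overhead_percent(ms) for name, ms in mean_wall_ms.items()}

for name, pct in overheads.items():
    print(f"{name}: +{pct:.1f}% over the floor")
```

With that assumed floor, the four configurations land between roughly 34% and 77% over it, which is consistent with the quoted range.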

@shipilev

Member Author

Plus, preload code needs to be a bit more resilient to invalidations, which would narrow the perf gap here: #38. With that POC patch applied, the peak performance gap narrows considerably:

# === 32 cores

# Default
  Time (mean ± σ):     404.0 ms ±  17.0 ms    [User: 1070.5 ms, System: 168.6 ms]
  Range (min … max):   381.0 ms … 464.8 ms    30 runs
 
# -XX:+PreloadOnly
  Time (mean ± σ):     415.1 ms ±   5.8 ms    [User: 525.3 ms, System: 82.5 ms]
  Range (min … max):   403.8 ms … 428.4 ms    30 runs

# === 2 cores

# Default
  Time (mean ± σ):     533.0 ms ±  29.3 ms    [User: 867.6 ms, System: 114.9 ms]
  Range (min … max):   463.6 ms … 587.6 ms    30 runs

# -XX:+PreloadOnly
  Time (mean ± σ):     404.5 ms ±   4.1 ms    [User: 508.8 ms, System: 78.4 ms]
  Range (min … max):   399.6 ms … 419.4 ms    30 runs
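
To put a number on how much the gap narrows, the sketch below compares the PreloadOnly-versus-Default wall-time gap with and without the POC patch, using only the means reported in these two comments. The variable and function names are illustrative.

```python
# (default, preload_only) mean wall-clock times in ms from the two result sets.
without_patch = {"32 cores": (403.4, 439.1), "2 cores": (531.8, 425.8)}
with_patch = {"32 cores": (404.0, 415.1), "2 cores": (533.0, 404.5)}


def gap_percent(default_ms: float, preload_ms: float) -> float:
    """Wall-time gap of PreloadOnly relative to Default; positive means slower."""
    return (preload_ms - default_ms) / default_ms * 100.0


def gaps(runs: dict) -> dict:
    """Compute the gap for every configuration in a result set."""
    return {config: gap_percent(*pair) for config, pair in runs.items()}


before, after = gaps(without_patch), gaps(with_patch)
for config in before:
    print(f"{config}: {before[config]:+.1f}% -> {after[config]:+.1f}%")
```

On 32 cores the gap shrinks from roughly 9% to under 3%; the remaining difference of about 11 ms is smaller than the 17 ms standard deviation of the patched Default run, so it may be within run-to-run noise.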

@vnkozlov vnkozlov left a comment

Collaborator

Looks good. A next step could be to cache only preload code during training, to reduce the size of the cached data and the time to load it. You don't need to save profiling data either.

@openjdk openjdk Bot added the ready Pull request is ready to be integrated label Feb 21, 2025
@shipilev

Member Author

> A next step could be to cache only preload code during training, to reduce the size of the cached data and the time to load it. You don't need to save profiling data either.

Yes, that would be a good next step if we find this thing useful.

Thanks for the review!

/integrate

@openjdk openjdk Bot added the sponsor Pull request is ready to be sponsored label Feb 21, 2025
@openjdk

openjdk Bot commented Feb 21, 2025


@shipilev
Your change (at version 3635e59) is now ready to be sponsored by a Committer.

@vnkozlov

Collaborator

/sponsor

@openjdk

openjdk Bot commented Feb 21, 2025


Going to push as commit aa4e947.
Since your change was applied there have been 107 commits pushed to the premain branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk Bot added the integrated Pull request has been integrated label Feb 21, 2025
@openjdk openjdk Bot closed this Feb 21, 2025
@openjdk openjdk Bot removed the ready (Pull request is ready to be integrated), rfr (Pull request is ready for review), and sponsor (Pull request is ready to be sponsored) labels Feb 21, 2025
@openjdk

openjdk Bot commented Feb 21, 2025


@vnkozlov @shipilev Pushed as commit aa4e947.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@iwanowww

Collaborator

The proposed solution looks way too specific and niche to me.

Instead, I'd prefer to see a way to combine different JIT-compilation modes (-XX:TieredStopAtLevel=01234) with the ability to limit the usage of AOTed code (e.g., -XX:DisableAOTCodeLevels=P124). PreloadOnly is equivalent to -XX:TieredStopAtLevel=0 and -XX:DisableAOTCodeLevels=124.
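
To make the proposed equivalence concrete, here is a small model of the two knobs. It is only a sketch of the idea in this comment: `DisableAOTCodeLevels` is a proposed flag, not an existing one, and the `CodeConfig` class, its methods, and `parse_disabled_levels` are hypothetical names invented for illustration, not HotSpot code.

```python
from dataclasses import dataclass

# AOT code levels named in the comment: P (preload), 1, 2, 4.
AOT_LEVELS = frozenset("P124")


def parse_disabled_levels(value: str) -> frozenset:
    """Parse a hypothetical DisableAOTCodeLevels value such as '124'."""
    levels = frozenset(value)
    unknown = levels - AOT_LEVELS
    if unknown:
        raise ValueError(f"unknown AOT code levels: {sorted(unknown)}")
    return levels


@dataclass(frozen=True)
class CodeConfig:
    tiered_stop_at_level: int       # 0 means interpreter only, no JIT tiers
    disabled_aot_levels: frozenset  # AOT code levels that must not be used

    def jit_enabled(self) -> bool:
        return self.tiered_stop_at_level > 0

    def usable_aot_levels(self) -> frozenset:
        return AOT_LEVELS - self.disabled_aot_levels


# The combination the comment describes as equivalent to PreloadOnly.
preload_only = CodeConfig(
    tiered_stop_at_level=0,
    disabled_aot_levels=parse_disabled_levels("124"),
)

print(preload_only.jit_enabled(), sorted(preload_only.usable_aot_levels()))
```

Under this model, the PreloadOnly combination leaves JIT compilation off and only the preload level usable, while other combinations of the two settings would express modes that a single boolean flag cannot.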

Labels

integrated Pull request has been integrated

3 participants