8325469: Freeze/Thaw code can crash in the presence of OSR frames #18637

pchilano · 2024-04-04T19:52:18Z

Freeze/thaw code assumes that a compiled frame for a method where num_stack_arg_slots() > 0 will always have the arguments set up above the metadata at the bottom of the frame. But when converting an interpreter frame to a compiled frame during OSR, we don't explicitly leave room for the stack arguments after popping the interpreter frame. All parameters needed will be read from the "buf" array and stored inside the frame before calling OSR_migration_end().

This mismatch between how the stack looks and what we assume can lead to different crashes. In particular, the issue happens when the OSR conversion happens for the bottom-most frame in the stack. If the OSR frame has a caller in the stack, then there is no issue when freezing/thawing. I added more details about this in the bug comments.

When the OSR conversion happens for the bottom-most frame, a future freeze/thaw can lead to crashes in all cases: freeze_fast/thaw_fast, freeze_fast/thaw_slow, freeze_slow/thaw_slow. When freezing fast, thawing either fast or slow can lead to trying to read past the bottom of the stackChunk or writing below the allocated space in the stack. The freeze slow case is almost okay, except that it uncovered an invalid assert that is triggered if the size of the OSR frame plus all the other frames we freeze takes less space than the size of the locals minus the parameters of the interpreter frame that underwent OSR. I also added more details about these in the bug comments.

I tested different fixes, but I think the most straightforward one is to add _num_stack_arg_slots in the nmethod class and initialize it accordingly depending on whether the nmethod is an OSR one or not.

The patch includes a new test that exercises all the possible combinations of OSR frame at the bottom of the stack or not, and then freezing fast/slow and thawing fast/slow. The bottom case where we freeze fast and thaw slow reproduces the originally reported crash. There are actually two different failure modes depending on whether this is a thaw top or return barrier case. The other bottom cases lead to the other crashes described in the bug comments.
The new test uncovers another bug besides the OSR issues, but since it's a different one I filed a separate JBS issue (JDK-8329665) and made this a dependent PR.

I tested the current patch with the new test and also ran it through mach5 tiers1-6.

Thanks,
Patricio

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8325469: Freeze/Thaw code can crash in the presence of OSR frames (Bug - P3)

Reviewers

Ron Pressler (@pron - Committer) ⚠️ Review applies to ab275358
Dean Long (@dean-long - Reviewer) ⚠️ Review applies to ab275358

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18637/head:pull/18637
$ git checkout pull/18637

Update a local copy of the PR:
$ git checkout pull/18637
$ git pull https://git.openjdk.org/jdk.git pull/18637/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 18637

View PR using the GUI difftool:
$ git pr show -t 18637

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18637.diff

Webrev

Link to Webrev Comment

bridgekeeper · 2024-04-04T19:52:43Z

👋 Welcome back pchilanomate! A progress list of the required criteria for merging this PR into pr/18632 will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2024-04-04T19:53:25Z

@pchilano This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8325469: Freeze/Thaw code can crash in the presence of OSR frames

Reviewed-by: rpressler, dlong

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 5 new commits pushed to the master branch:

76cbe48: 8329430: MetaspaceShared::preload_and_dump should clear pending exception
f7c8413: 8326116: JFR: Add help option to -XX:StartFlightRecording
941bee1: 8327640: Allow NumberFormat strict parsing
2ede143: 8330279: Typo in java.text.Bidi class description
90df3b7: 8329190: (ch) DatagramChannel.receive should throw ClosedChannelException when called on closed channel

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk · 2024-04-04T19:53:52Z

@pchilano The following labels will be automatically applied to this pull request:

core-libs
hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

pchilano · 2024-04-04T19:56:09Z

/label remove core-libs

openjdk · 2024-04-04T19:57:49Z

@pchilano
The core-libs label was successfully removed.

mlbridge · 2024-04-04T21:10:21Z

Webrevs

dean-long · 2024-04-05T23:43:27Z

This looks good, but have you considered computing the value every time instead of caching it in _num_stack_arg_slots and increasing the size of every nmethod?

pron · 2024-04-05T20:39:55Z

src/hotspot/share/code/nmethod.cpp

@@ -801,6 +802,7 @@ nmethod::nmethod(

    init_defaults();
    _entry_bci               = entry_bci;
+    _num_stack_arg_slots     = entry_bci != InvocationEntryBci ? 0 : _method->constMethod()->num_stack_arg_slots();


If I understand correctly, is the condition on this line the actual fix?

pchilano · 2024-04-08

Yes. The point is that _num_stack_arg_slots should not be fixed for a given Method, as it is now, but should depend on the actual nmethod.
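To make the condition in the diff concrete, here is a minimal sketch of the idea using toy types. `ToyMethod` and `ToyNMethod` are hypothetical stand-ins rather than HotSpot classes, and the value given to `InvocationEntryBci` is an assumption for illustration. The sketch only shows that an nmethod created for an OSR entry records zero stack argument slots, while a normal-entry nmethod takes the value from its method.

```cpp
#include <cassert>

// Toy stand-ins for HotSpot types. Names echo the diff, but nothing here is
// real HotSpot code; the value of InvocationEntryBci is an assumption.
constexpr int InvocationEntryBci = -1;

struct ToyMethod {
    int num_stack_arg_slots;  // stack slots the calling convention uses for arguments
};

struct ToyNMethod {
    int entry_bci;
    int num_stack_arg_slots;

    ToyNMethod(const ToyMethod& m, int bci)
        : entry_bci(bci),
          // Same shape as the condition in the diff: an OSR entry gets 0,
          // a normal entry copies the per-method value.
          num_stack_arg_slots(bci != InvocationEntryBci ? 0 : m.num_stack_arg_slots) {
        assert(m.num_stack_arg_slots >= 0);
    }

    bool is_osr() const { return entry_bci != InvocationEntryBci; }
};
```

With a method that declares 4 stack argument slots, a `ToyNMethod` built with `InvocationEntryBci` reports 4, while one built with any other entry bci reports 0, so code that sizes the argument area from the nmethod no longer assumes space that an OSR frame never reserved.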

pron · 2024-04-05T20:42:47Z

test/jdk/jdk/internal/vm/Continuation/OSRTest.java

+        while (!cont.isDone()) {
+            cont.run();
+            if (freezeFast && !thawFast && fooCallCount == 2) {
+                // All frames freezed in last yield should be compiled


freezed -> frozen

pron · 2024-04-05T20:46:37Z

test/jdk/jdk/internal/vm/Continuation/OSRTest.java

+        // provoke OSR compilation
+        for (int i = 0; i < 500_000 * fooCallCount; i++) {
+        }
+        fooCallCount++;


Perhaps use WhiteBox to check if we're OSRed?

pchilano · 2024-04-08

I'll test using isMethodCompiled(m, true) as another condition to break the loop.

pron · 2024-04-05T20:53:04Z

test/jdk/jdk/internal/vm/Continuation/OSRTest.java

+        // provoke OSR compilation
+        for (int i = 0; i < 5_000_000 * fooCallCount; i++) {
+        }
+        fooCallCount++;


Ditto. Perhaps use WhiteBox to check if we're OSRed?

pchilano · 2024-04-08T14:12:45Z

> This looks good, but have you considered computing the value every time instead of caching it in _num_stack_arg_slots and increasing the size of every nmethod?

Since this is used in the thaw fast path too, I wanted to avoid the extra load of constMethod if possible, but I think either approach is fine. Moving _is_unlinked to where the other booleans are defined actually keeps the size of the nmethod the same as before (368 bytes). What do you think?
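The size argument can be illustrated with a small layout sketch. The two structs below are hypothetical and heavily simplified (the real nmethod has many more fields and is 368 bytes per the comment above); they only show how a new int can reuse the space freed by moving an isolated bool next to other bools. The equal-size result assumes a typical ABI with a 4-byte, 4-aligned int and a 1-byte bool.

```cpp
#include <cassert>

// Hypothetical layout before the change: is_unlinked sits alone between two
// ints, so the compiler pads after it to align comp_level.
struct ToyBefore {
    void* code;
    int   entry_bci;
    bool  is_unlinked;
    int   comp_level;
    bool  flag_a;
    bool  flag_b;
};

// Hypothetical layout after the change: the new int occupies the slot that
// held the lone bool plus its padding, and is_unlinked joins the other bools,
// using tail padding that already existed.
struct ToyAfter {
    void* code;
    int   entry_bci;
    int   num_stack_arg_slots;
    int   comp_level;
    bool  flag_a;
    bool  flag_b;
    bool  is_unlinked;
};

// Runtime check usable from a test or a debug build.
inline void check_same_size() {
    assert(sizeof(ToyAfter) == sizeof(ToyBefore));
}
```

Whether the same holds for the real class depends on its full field list, which is why the later discussion about nmethod field reordering matters.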

dean-long · 2024-04-10T06:31:35Z

> Since this is used in the thaw fast path too, I wanted to avoid the extra load of constMethod if possible, but I think either approach is fine. Moving _is_unlinked to where the other booleans are defined actually keeps the size of the nmethod the same as before (368 bytes). What do you think?

Can you do a performance measurement to see if the extra load actually makes a difference? I think @vnkozlov is also doing nmethod field reordering/compaction, so the relative overhead of an extra field might not remain 0.

pron · 2024-04-10T18:06:33Z

It may be hard to do a proper measurement because the number of methods in our microbenchmarks is small. We're also talking about an extra branch, I think. This is code that can be called a million times per second per core. It's very performance sensitive. So I would prefer to first see if there's an impact on nmethod size, and only if there is, consider whether the speed implications are acceptable.

dean-long · 2024-04-10T19:47:22Z

OK, let's go with the new nmethod field.

openjdk-notifier · 2024-04-16T14:13:57Z

The parent pull request that this pull request depends on has now been integrated and the target branch of this pull request has been updated. This means that changes from the dependent pull request can start to show up as belonging to this pull request, which may be confusing for reviewers. To remedy this situation, simply merge the latest changes from the new target branch into this pull request by running commands similar to these in the local repository for your personal fork:

git checkout JDK-8325469
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# if there are conflicts, follow the instructions given by git merge
git commit -m "Merge master"
git push

openjdk · 2024-04-16T14:15:49Z

@pchilano this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8325469
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

openjdk · 2024-04-16T14:56:10Z

⚠️ @pchilano This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request is integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

pchilano · 2024-04-17T16:17:39Z

Thanks for the reviews @pron and @dean-long!

pchilano · 2024-04-17T16:17:44Z

/integrate

openjdk · 2024-04-17T16:18:57Z

Going to push as commit fd331ff.
Since your change was applied there have been 30 commits pushed to the master branch:

9fd7802: 8325494: C2: Broken graph after not skipping CastII node anymore for Assertion Predicates after JDK-8309902
192ec38: 8329595: spurious variable "might not have been initialized" on static final field
03e8417: 8329948: Remove string template feature
ff3e76f: 8330053: JFR: Use LocalDateTime instead ZonedDateTime
811aadd: 8324683: Unify AttachListener code for Posix platforms
5841cb3: 8330107: Separate out "awt" libraries from Awt2dLibraries.gmk
89129e3: 8212895: ChronoField.INSTANT_SECONDS's range doesn't match the range of Instant
9445047: 8330163: C2: improve CMoveNode::Value() when condition is always true or false
d2f9a1e: Merge
33d7127: 8322122: Enhance generation of addresses
... and 20 more: https://git.openjdk.org/jdk/compare/f11a496de61d800a680517457eb43b078a633953...master

Your commit was automatically rebased without conflicts.

openjdk · 2024-04-17T16:19:03Z

@pchilano Pushed as commit fd331ff.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

vnkozlov · 2024-04-17T20:19:00Z

Hi @pchilano

This change did affect my PR, which tries to reduce the nmethod header size (#18768).

I am fine with caching the value in nmethod, but why did you use an int field for it? It is u2 in constMethod.hpp#L209.

I am currently resolving conflict in my PR with your changes and I am planning to use u2 for it in nmethod too. Are you okay with that?

pchilano · 2024-04-17T20:32:37Z

> Hi @pchilano
>
> This change did affect my PR, which tries to reduce the nmethod header size (#18768).
>
> I am fine with caching the value in nmethod, but why did you use an int field for it? It is u2 in constMethod.hpp#L209.
>
> I am currently resolving conflict in my PR with your changes and I am planning to use u2 for it in nmethod too. Are you okay with that?

Yes. I just used int because that was the return value of num_stack_arg_slots() that I moved from method.hpp, but I missed that the field can just be defined as a u2 instead.
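A minimal sketch of the narrower field, assuming a 16-bit unsigned alias named `u2` as a stand-in for the HotSpot typedef. `ToyNMethodU2` is hypothetical; it only shows that the stored field can shrink to two bytes while the accessor keeps returning an int.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

using u2 = std::uint16_t;  // stand-in for HotSpot's 16-bit unsigned u2 typedef

struct ToyNMethodU2 {
    u2 _num_stack_arg_slots;

    explicit ToyNMethodU2(int slots)
        : _num_stack_arg_slots(static_cast<u2>(slots)) {
        // Guard the narrowing conversion: the value must fit in 16 bits.
        assert(slots >= 0 && slots <= std::numeric_limits<u2>::max());
    }

    // Callers keep seeing an int, matching the return type mentioned above.
    int num_stack_arg_slots() const { return _num_stack_arg_slots; }
};
```

Since the source value is already a u2 in constMethod, the narrowing loses nothing; the guard is there only to document that assumption.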

vnkozlov · 2024-04-17T20:57:22Z

> Yes. I just used int because that was the return value of num_stack_arg_slots() that I moved from method.hpp, but I missed that the field can just be defined as a u2 instead.

Okay. Thanks!

pchilano added 2 commits April 4, 2024 11:49

v1

33354c7

v1

07a9cb5

openjdk bot added hotspot hotspot-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Apr 4, 2024

openjdk bot removed the core-libs core-libs-dev@openjdk.org label Apr 4, 2024

pchilano marked this pull request as ready for review April 4, 2024 21:06

openjdk bot added the rfr Pull request is ready for review label Apr 4, 2024

pron reviewed Apr 8, 2024


pchilano added 2 commits April 8, 2024 08:59

take ResourceMark out of debug only

1636b16

fix comment

b35306f

use WhiteBox to verify OSR compilation

ab27535

pron approved these changes Apr 9, 2024


dean-long approved these changes Apr 10, 2024


openjdk-notifier bot changed the base branch from pr/18632 to master April 16, 2024 14:13

openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Apr 16, 2024

Merge branch 'JDK-8329665' into JDK-8325469

dd2a1da

Merge branch 'master' into JDK-8325469

e614e73

openjdk bot added ready Pull request is ready to be integrated and removed merge-conflict Pull request has merge conflict with target branch labels Apr 16, 2024

openjdk bot added the integrated Pull request has been integrated label Apr 17, 2024

openjdk bot closed this Apr 17, 2024

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 17, 2024

vnkozlov mentioned this pull request Apr 18, 2024

8329433: Reduce nmethod header size #18768

Closed

3 tasks
