Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8320275: assert(_chunk->bitmap().at(index)) failed: Bit not set at index #16837

Closed
wants to merge 5 commits into from

Conversation

pchilano
Copy link
Contributor

@pchilano pchilano commented Nov 28, 2023

Please review the following fix. The assert fails while verifying the top frame of the stackChunk before returning from a thaw call. The stackChunk is in gc mode but we found a narrow oop for this c2 compiled frame that doesn't have its corresponding bit set. This is because while thawing its callee we cleared the bitmap range associated with the argument area, but this narrow oop happens to land at the very last stack slot of that region.
Loom code assumes the size of the argument area is always a multiple of 2 stack slots, as SharedRuntime::java_calling_convention() shows. But c2 doesn't seem to follow this convention and, knowing the last passed argument only takes one stack slot, it's using the remaining space to store a narrow oop for the caller. There are more details about the specific crash in JBS.

The initial proposed fix is to just restrict the range of the bitmap we clear by excluding the last stack slot of the argument area, since passed oops are always word aligned. I've also experimented with a patch where I changed SharedRuntime::java_calling_convention() and Fingerprinter::do_type_calling_convention() to not round up the number of stack slots used, and then changed the callers to use a round up value or not depending on the needs [1]. I wasn't convinced it was worthy given we only care about this difference in this Loom code, but I don't mind going with that fix instead. The 3rd alternative would be to just change c2 to not use this stack slot and start spilling at a word aligned offset from the sp.

I run the patch with the failing test and verified the crash doesn't reproduce anymore. I've also run this patch through loom tiers1-5.

Thanks,
Patricio

[1] pchilano@42ae926


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8320275: assert(_chunk->bitmap().at(index)) failed: Bit not set at index (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16837/head:pull/16837
$ git checkout pull/16837

Update a local copy of the PR:
$ git checkout pull/16837
$ git pull https://git.openjdk.org/jdk.git pull/16837/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16837

View PR using the GUI difftool:
$ git pr show -t 16837

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16837.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Nov 28, 2023

👋 Welcome back pchilanomate! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Nov 28, 2023

@pchilano The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Nov 28, 2023
@pchilano pchilano marked this pull request as ready for review November 28, 2023 00:33
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 28, 2023
@mlbridge
Copy link

mlbridge bot commented Nov 28, 2023

Webrevs

@pchilano
Copy link
Contributor Author

Running some extra tests I see the callee can use the argument area to store data that is different from the one passed. This is actually something @fparain told me some time ago. So this simpler solution won't do. Before applying pchilano@42ae926 instead, @dean-long how about if we just prevent c2 from using this stack slot for the caller?

@dean-long
Copy link
Member

I don't really like the use of address for the pointer that might be either oop or narrowOop, because the bitmap can't really support an arbitrary char* address. I think it would be better to cleanup methods that take intptr_t* and instead use template <typename OopT> like bit_index_for.
Also, I would prefer removing the round up in java_calling_convention, because the current patch fixes a subtle bug with an even subtler work-around, but let's see what other reviews think.

@pchilano
Copy link
Contributor Author

Also, I would prefer removing the round up in java_calling_convention, because the current patch fixes a subtle bug with an even subtler work-around, but let's see what other reviews think.

I removed the round up in java_calling_convention and do_type_calling_convention. The simpler approach wasn't going to work anyways. I think the callee can use the argument area to store data of a different type than the one passed as argument. So the last stack slot might not contain a narrow oop initially but could later on.

@pchilano
Copy link
Contributor Author

I don't really like the use of address for the pointer that might be either oop or narrowOop, because the bitmap can't really support an arbitrary char* address. I think it would be better to cleanup methods that take intptr_t* and instead use template like bit_index_for.

The thing is that we would need to check before calling clear_bitmap_bits() if UseCompressedOops is set to use oop or narrowOop, which makes the code uglier in my opinion. This is a private method anyways and is not meant to be used outside this two callers in thaw code. But I added asserts in clear_bitmap_bits() to verify start and end have the expected alignment. I also added an is_aligned() assert in stackChunkOopDesc::bit_index_for() just for extra caution. While adding this last assert I actually realized I had a bug in my original experiment branch because I missed to align_down end when UseCompressedOops is not set, so thanks for pointing this out.

@pchilano
Copy link
Contributor Author

I tested the last version in mach5 loom-tiers[1-5] and with the failing test. I'll keep running more rounds though since issues with this code are highly intermittent.

// we need to clear the bits that correspond to arguments as they reside in the caller frame
// or they will keep objects that are otherwise unreachable alive
log_develop_trace(continuations)("clearing bitmap for " INTPTR_FORMAT " - " INTPTR_FORMAT, p2i(start), p2i(start+range));
address effective_end = UseCompressedOops ? end : align_down(end, wordSize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the align_down for correctness, or just for the benefit of the new assert at line 2179? Since it's not immediately obvious, I think it deserves a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because end is not necessarily word aligned anymore the pointer arithmetic we do in bit_index_for() would be UB, since p can point to the middle of an oop (in practice we would probably not see any issue because that's implemented as a substraction and then an arithmetic shift right which will round down the result). So we need to align end down if UseCompressedOops is not set. That last half word part should not contain an oop anyways so the assert is to verify that. I added a comment, please take a look.

@dean-long
Copy link
Member

OK, the use of address still seems misleading, but I can't think of anything much better, except perhaps void*.

Do we really need a version of num_stack_arg_slots() that rounds up?

I wish we didn't have duplicate code between java_calling_convention() and Fingerprinter, and unnecessarily different calling conventions between platforms, but those issues could be cleaned up in a separate RFE.

@pchilano
Copy link
Contributor Author

pchilano commented Nov 30, 2023

OK, the use of address still seems misleading, but I can't think of anything much better, except perhaps void*.

I think of it as just a range of memory we are passing. I don't immediately see void* as better since that could also point anywhere and not be aligned.

Do we really need a version of num_stack_arg_slots() that rounds up?

All the other callers in freeze/thaw calculate the size of the argument area in words based on this number (e.g. frame::compiled_frame_stack_argsize()). So if we have an odd number of stack slots we would end up calculating the wrong size.

Copy link
Member

@dean-long dean-long left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this looks good to me. Please get a 2nd review.

@openjdk
Copy link

openjdk bot commented Nov 30, 2023

@pchilano This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8320275: assert(_chunk->bitmap().at(index)) failed: Bit not set at index

Reviewed-by: dlong, fparain

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 30, 2023
@dean-long
Copy link
Member

I would be tempted to put the round up in compiled_frame_stack_argsize, but it's not a big deal.

@@ -2298,7 +2309,10 @@ void ThawBase::recurse_thaw_compiled_frame(const frame& hf, frame& caller, int n
// can only fix caller once this frame is thawed (due to callee saved regs); this happens on the stack
_cont.tail()->fix_thawed_frame(caller, SmallRegisterMap::instance);
} else if (_cont.tail()->has_bitmap() && added_argsize > 0) {
clear_bitmap_bits(heap_frame_top + ContinuationHelper::CompiledFrame::size(hf) + frame::metadata_words_at_top, added_argsize);
address start = (address)(heap_frame_top + ContinuationHelper::CompiledFrame::size(hf) + frame::metadata_words_at_top);
int stack_args_slots = f.cb()->as_compiled_method()->method()->num_stack_arg_slots(false /* rounded */);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we could trust the added_argsize value here, but that would require more changes to where rounding is done.

Copy link
Contributor

@fparain fparain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Thank you for having explored the different options to fix this bug.

@pchilano
Copy link
Contributor Author

pchilano commented Dec 1, 2023

Thanks for the reviews @dean-long and @fparain!
I want to give it a couple more rounds of testing in the upper tiers. Since I'll be out on vacation after today that means I'll integrate this once I'm back.

@pchilano
Copy link
Contributor Author

pchilano commented Jan 2, 2024

/integrate

@openjdk
Copy link

openjdk bot commented Jan 2, 2024

Going to push as commit e9e694f.
Since your change was applied there have been 4 commits pushed to the master branch:

  • da1091e: 8322807: Eliminate -Wparentheses warnings in gc code
  • c2477a5: 8322868: java/io/BufferedInputStream/TransferToTrusted.java has bad copyright header
  • 2cf5f01: 8322802: Add testing for ZipFile.getEntry respecting the 'Language encoding' flag
  • 38042ad: 8320971: Use BufferedInputStream.buf directly when param of implTransferTo() is trusted

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jan 2, 2024
@openjdk openjdk bot closed this Jan 2, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jan 2, 2024
@openjdk
Copy link

openjdk bot commented Jan 2, 2024

@pchilano Pushed as commit e9e694f.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@pchilano
Copy link
Contributor Author

pchilano commented Jan 4, 2024

/backport jdk22

@openjdk
Copy link

openjdk bot commented Jan 4, 2024

@pchilano the backport was successfully created on the branch backport-pchilano-e9e694f4 in my personal fork of openjdk/jdk22. To create a pull request with this backport targeting openjdk/jdk22:master, just click the following link:

➡️ Create pull request

The title of the pull request is automatically filled in correctly and below you find a suggestion for the pull request body:

Hi all,

This pull request contains a backport of commit e9e694f4 from the openjdk/jdk repository.

The commit being backported was authored by Patricio Chilano Mateo on 2 Jan 2024 and was reviewed by Dean Long and Frederic Parain.

Thanks!

If you need to update the source branch of the pull then run the following commands in a local clone of your personal fork of openjdk/jdk22:

$ git fetch https://github.com/openjdk-bots/jdk22.git backport-pchilano-e9e694f4:backport-pchilano-e9e694f4
$ git checkout backport-pchilano-e9e694f4
# make changes
$ git add paths/to/changed/files
$ git commit --message 'Describe additional changes made'
$ git push https://github.com/openjdk-bots/jdk22.git backport-pchilano-e9e694f4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

3 participants