8320275: assert(_chunk->bitmap().at(index)) failed: Bit not set at index #16837
Running some extra tests I see the callee can use the argument area to store data that is different from the one passed. This is actually something @fparain told me some time ago. So this simpler solution won't do. Before applying pchilano@42ae926 instead, @dean-long how about if we just prevent c2 from using this stack slot for the caller?
I don't really like the use of |
I removed the round up in java_calling_convention and do_type_calling_convention. The simpler approach wasn't going to work anyway. I think the callee can use the argument area to store data of a different type than the one passed as argument. So the last stack slot might not contain a narrow oop initially but could later on.
The thing is that we would need to check before calling |
I tested the last version in mach5 loom-tiers[1-5] and with the failing test. I'll keep running more rounds though since issues with this code are highly intermittent.
// we need to clear the bits that correspond to arguments as they reside in the caller frame
// or they will keep objects that are otherwise unreachable alive
log_develop_trace(continuations)("clearing bitmap for " INTPTR_FORMAT " - " INTPTR_FORMAT, p2i(start), p2i(start+range));
address effective_end = UseCompressedOops ? end : align_down(end, wordSize);
Is the align_down for correctness, or just for the benefit of the new assert at line 2179? Since it's not immediately obvious, I think it deserves a comment.
Because end is not necessarily word aligned anymore, the pointer arithmetic we do in bit_index_for() would be UB, since p can point to the middle of an oop (in practice we would probably not see any issue, because that's implemented as a subtraction followed by an arithmetic shift right, which rounds down the result). So we need to align end down if UseCompressedOops is not set. That last half-word part should not contain an oop anyway, so the assert is there to verify that. I added a comment, please take a look.
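The alignment argument above can be illustrated with a small standalone sketch. This is not the HotSpot code: align_down_to, effective_end and this bit_index_for are hypothetical stand-ins written for illustration, addresses are modeled as plain integers, and the 8-byte word and 4-byte narrow oop sizes assume a 64-bit platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed sizes for a 64-bit platform (illustrative constants only).
constexpr std::uintptr_t kWordSize = 8;       // size of a full oop
constexpr std::uintptr_t kNarrowOopSize = 4;  // size of a narrow oop / stack slot

// Round addr down to a multiple of alignment (alignment must be a power of two).
inline std::uintptr_t align_down_to(std::uintptr_t addr, std::uintptr_t alignment) {
  return addr & ~(alignment - 1);
}

// End of the range whose bits get cleared. Without compressed oops the bitmap
// tracks whole words, so a trailing half word is dropped from the range.
inline std::uintptr_t effective_end(std::uintptr_t end, bool compressed_oops) {
  return compressed_oops ? end : align_down_to(end, kWordSize);
}

// Hypothetical stand-in for a bitmap index computation: one bit per oop-sized
// unit counted from base. The assert rejects addresses that point into the
// middle of a unit, which is the case the align-down avoids.
inline std::size_t bit_index_for(std::uintptr_t base, std::uintptr_t p, bool compressed_oops) {
  const std::uintptr_t unit = compressed_oops ? kNarrowOopSize : kWordSize;
  assert(p >= base);
  assert((p - base) % unit == 0);
  return static_cast<std::size_t>((p - base) / unit);
}
```

With a base of 0x1000 and an argument area of three 4-byte slots, end is 0x100C. With compressed oops that address is a valid unit boundary; without them it falls in the middle of a word, so the sketch aligns it down to 0x1008 before computing an index.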
OK, the use of Do we really need a version of num_stack_arg_slots() that rounds up? I wish we didn't have duplicate code between java_calling_convention() and Fingerprinter, and unnecessarily different calling conventions between platforms, but those issues could be cleaned up in a separate RFE. |
I think of it as just a range of memory we are passing. I don't immediately see void* as better since that could also point anywhere and not be aligned.
All the other callers in freeze/thaw calculate the size of the argument area in words based on this number (e.g. |
OK, this looks good to me. Please get a 2nd review.
@pchilano This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
I would be tempted to put the round up in |
@@ -2298,7 +2309,10 @@ void ThawBase::recurse_thaw_compiled_frame(const frame& hf, frame& caller, int n
// can only fix caller once this frame is thawed (due to callee saved regs); this happens on the stack
_cont.tail()->fix_thawed_frame(caller, SmallRegisterMap::instance);
} else if (_cont.tail()->has_bitmap() && added_argsize > 0) {
clear_bitmap_bits(heap_frame_top + ContinuationHelper::CompiledFrame::size(hf) + frame::metadata_words_at_top, added_argsize);
address start = (address)(heap_frame_top + ContinuationHelper::CompiledFrame::size(hf) + frame::metadata_words_at_top);
int stack_args_slots = f.cb()->as_compiled_method()->method()->num_stack_arg_slots(false /* rounded */); |
It would be nice if we could trust the added_argsize value here, but that would require more changes to where rounding is done.
LGTM
Thank you for having explored the different options to fix this bug.
Thanks for the reviews @dean-long and @fparain!
/integrate
Going to push as commit e9e694f.
Your commit was automatically rebased without conflicts.
/backport jdk22
@pchilano the backport was successfully created on the branch backport-pchilano-e9e694f4 in my personal fork of openjdk/jdk22. To create a pull request with this backport targeting openjdk/jdk22:master, just click the following link: The title of the pull request is automatically filled in correctly and below you find a suggestion for the pull request body:
Please review the following fix. The assert fails while verifying the top frame of the stackChunk before returning from a thaw call. The stackChunk is in gc mode but we found a narrow oop for this c2 compiled frame that doesn't have its corresponding bit set. This is because while thawing its callee we cleared the bitmap range associated with the argument area, but this narrow oop happens to land at the very last stack slot of that region.
Loom code assumes the size of the argument area is always a multiple of 2 stack slots, as SharedRuntime::java_calling_convention() shows. But c2 doesn't seem to follow this convention and, knowing the last passed argument only takes one stack slot, it's using the remaining space to store a narrow oop for the caller. There are more details about the specific crash in JBS.
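The rounding assumption described above can be sketched as plain arithmetic. The helper names rounded_arg_slots and spare_slots are hypothetical and only for illustration, and the sketch assumes 4-byte stack slots with 8-byte words, so two slots per word.

```cpp
#include <cassert>

// Assumption: 4-byte stack slots and 8-byte words, i.e. two slots per word.
constexpr int kSlotsPerWord = 2;

// Size of the argument area, in slots, when rounded up to a whole word.
inline int rounded_arg_slots(int used_slots) {
  assert(used_slots >= 0);
  return (used_slots + kSlotsPerWord - 1) / kSlotsPerWord * kSlotsPerWord;
}

// Slots inside the rounded area that no passed argument occupies.
// A value of 1 corresponds to the leftover slot discussed above.
inline int spare_slots(int used_slots) {
  return rounded_arg_slots(used_slots) - used_slots;
}
```

For a callee whose arguments use 3 stack slots, the rounded area is 4 slots and one slot is spare. If the caller keeps a narrow oop of its own in that spare slot, treating all 4 slots as argument area covers a slot that does not hold an argument.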
The initial proposed fix is to just restrict the range of the bitmap we clear by excluding the last stack slot of the argument area, since passed oops are always word aligned. I've also experimented with a patch where I changed SharedRuntime::java_calling_convention() and Fingerprinter::do_type_calling_convention() to not round up the number of stack slots used, and then changed the callers to use a rounded-up value or not depending on their needs [1]. I wasn't convinced it was worth it, given that we only care about this difference in this Loom code, but I don't mind going with that fix instead. The 3rd alternative would be to just change c2 to not use this stack slot and start spilling at a word-aligned offset from the sp.
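The effect of narrowing the cleared range can be shown with a toy model. This is only a sketch under stated assumptions: the bitmap is a std::vector<bool> with one bit per 4-byte stack slot (the compressed-oops case), the argument area uses an odd number of slots, and clear_range and caller_oop_bit_survives are hypothetical names, not HotSpot functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Clear bits [first, first + count) of a bitmap holding one bit per stack slot.
inline void clear_range(std::vector<bool>& bitmap, std::size_t first, std::size_t count) {
  assert(first + count <= bitmap.size());
  for (std::size_t i = first; i < first + count; ++i) {
    bitmap[i] = false;
  }
}

// Models an argument area that uses an odd number of slots, with a caller-owned
// narrow oop in the one spare slot at the end of the rounded-up area. Returns
// whether that oop's bit is still set after clearing slots_to_clear slots.
inline bool caller_oop_bit_survives(std::size_t used_arg_slots, std::size_t slots_to_clear) {
  assert(used_arg_slots % 2 == 1);                 // exactly one spare slot
  const std::size_t rounded = used_arg_slots + 1;  // area rounded up to a whole word
  std::vector<bool> bitmap(rounded, false);
  bitmap[rounded - 1] = true;                      // the caller's narrow oop
  clear_range(bitmap, 0, slots_to_clear);
  return bitmap[rounded - 1];
}
```

With 3 used slots, clearing the rounded-up 4 slots wipes the caller's bit, which matches the failing assert, while clearing only 3 slots leaves it set.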
I ran the patch with the failing test and verified the crash no longer reproduces. I've also run this patch through loom tiers1-5.
Thanks,
Patricio
[1] pchilano@42ae926
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16837/head:pull/16837
$ git checkout pull/16837
Update a local copy of the PR:
$ git checkout pull/16837
$ git pull https://git.openjdk.org/jdk.git pull/16837/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 16837
View PR using the GUI difftool:
$ git pr show -t 16837
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16837.diff