8269240: java/foreign/stackwalk/TestAsyncStackWalk.java test failed with concurrent GC #149
Conversation
👋 Welcome back jvernee! A progress list of the required criteria for merging this PR into |
@JornVernee The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
/help label |
@JornVernee Available commands:
|
OptimizedEntryBlob::FrameData* OptimizedEntryBlob::frame_data_for_frame(const frame& frame) const { | ||
ShouldNotCallThis(); | ||
return nullptr; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, stubs were missing on some platforms, and this seems to have been fine before, but now started causing compilation errors, so I've added them here.
@@ -928,6 +928,7 @@ void ThreadSafepointState::handle_polling_page_exception() { | |||
if( nm->is_at_poll_return(real_return_addr) ) { | |||
// See if return type is an oop. | |||
bool return_oop = nm->method()->is_returning_oop(); | |||
HandleMark hm(self); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was seeing an assert(_handle_mark_nesting > 1) failed: memory leak: allocating handle outside HandleMark
when the existing code allocates the Handle for return_value
in the code down below. It seems to be a simple case of a missing handle mark, so I've added it here. (looking at the stack trace in the error log, the caller frame is native code, so I don't think we can expect the caller to have a HandleMark).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does native code reach a safepoint polling point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stack trace looks like this:
Current thread (0x000002a2489f0b50): JavaThread "Thread-4551" [_thread_in_Java, id=24920, stack(0x000000d9e0500000,0x000000d9e0600000)]
Stack: [0x000000d9e0500000,0x000000d9e0600000]
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [jvm.dll+0xae6651] os::platform_print_native_stack+0xf1 (os_windows_x86.cpp:235)
V [jvm.dll+0xda3f25] VMError::report+0x1005 (vmError.cpp:739)
V [jvm.dll+0xda58ae] VMError::report_and_die+0x7fe (vmError.cpp:1549)
V [jvm.dll+0xda5fe4] VMError::report_and_die+0x64 (vmError.cpp:1330)
V [jvm.dll+0x4ceca7] report_vm_error+0xb7 (debug.cpp:282)
V [jvm.dll+0x6511be] HandleArea::allocate_handle+0x3e (handles.cpp:35)
V [jvm.dll+0xb8e334] ThreadSafepointState::handle_polling_page_exception+0x314 (safepoint.cpp:939)
C 0x000002a035d8caa7
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
v ~SafepointBlob
J 639 c1 java.lang.invoke.LambdaForm$MH+0x000000080101b800.invoke(Ljava/lang/Object;)V java.base@18-internal (87 bytes) @ 0x000002a03635f7b4 [0x000002a03635f4a0+0x0000000000000314]
J 620 c1 java.lang.invoke.LambdaForm$MH+0x0000000801018c00.invoke(Ljava/lang/Object;)V java.base@18-internal (37 bytes) @ 0x000002a036353e0c [0x000002a036353720+0x00000000000006ec]
v ~BufferBlob::�mэYсссс��ЫGў�
So I think the 'native code' is something being called by the safepoint blob, but I'm not sure why it's marked with a C
instead of V
in the stack trace (maybe just a stack unwind failure?).
FWIW, this part has already been fixed as part of: #173 (not sure why it still shows up in the diff)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay so we're not actually _thread_in_native it is just a chunk of VM generated code. Something still seems off to me about the need for the HandleMark but that isn't your problem.
src/hotspot/share/runtime/thread.cpp
Outdated
#ifdef ASSERT | ||
void JavaThread::verify_frame_info() { | ||
assert((!has_last_Java_frame() && java_call_counter() == 0) || | ||
(has_last_Java_frame() && java_call_counter() > 0), | ||
"unexpected frame info: has_last_frame=%d, java_call_counter=%d", | ||
has_last_Java_frame(), java_call_counter()); | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reduced some duplication here while debugging, but this change is not strictly needed (though a nice cleanup, IMO). Let me know, and I could remove this part if preferred.
e446a4f
to
f8b2a92
Compare
/label remove hotspot |
@JornVernee |
@JornVernee |
@JornVernee |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
@@ -1125,6 +1125,8 @@ class JavaThread: public Thread { | |||
void set_requires_cross_modify_fence(bool val) PRODUCT_RETURN NOT_PRODUCT({ _requires_cross_modify_fence = val; }) | |||
|
|||
private: | |||
DEBUG_ONLY(void verify_frame_info();) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you declare verify_frame_info
as returning a bool
(and just put a return true;
at the end of JavaThread::verify_frame_info()
), you can call it as:
assert(verify_frame_info(), "unexpected frame info");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. I'd like to keep it the way it is though, so that the assert message contains the has_last_frame
& java_call_counter
values. (this was one of the reason I did this refactor as well, since the assert I was hitting out of those 3 didn't contain that info).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can keep the assert inside verify_frame_info()
which dumps additional data. (Yes, it's a bit confusing: assert inside an assert :-) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTR I'm fine with it either way.
@JornVernee This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 106 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Jorn,
I can't comment on all the details here - especially in the x86_64 upcall handler code. The mapping to JavaCallWrapper seems reasonable but there are a few differences that I'm now assuming stem from the fact upcalls start _thread_in_native while JCW starts from _thread_in_vm?
Some minor comments and queries below (mostly issues with existing code that you copied).
Thanks,
David
@@ -71,6 +72,93 @@ void ProgrammableUpcallHandler::detach_thread(Thread* thread) { | |||
vm->functions->DetachCurrentThread(vm); | |||
} | |||
|
|||
// modelled after JavaCallWrapper::JavaCallWrapper | |||
Thread* ProgrammableUpcallHandler::on_entry(OptimizedEntryBlob::FrameData* context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should return JavaThread not Thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
I've been careful here to return a Thread*
since the result is stored in r15_thread
and I thought converting between sub and super types could potentially result in different pointers due to the way super types are laid out within a subtype. I thought it worked like this:
Subclass
+---
| {Subclass vtable pointer}
| +--- (base class Super)
| | {Super vtable pointer}
| +---
+---
So, I thought plainly using a JavaThread*
in generated machine code where a Thread*
was expected could cause trouble, since the pointer needs to be offset for the type conversion.
But now that I'm looking at some cases with compiler explorer, the pointer offset only seems to be needed when using multiple inheritance, for instance:
class SuperA {
public:
virtual void foo();
};
class SuperB {
public:
virtual void bar();
};
class Sub : public SuperA, public SuperB {
public:
virtual void baz();
};
Results in:
class Sub size(16):
+---
0 | +--- (base class SuperA)
0 | | {vfptr}
| +---
8 | +--- (base class SuperB)
8 | | {vfptr}
| +---
+---
Sub::$vftable@SuperA@:
| &Sub_meta
| 0
0 | &SuperA::foo
1 | &Sub::baz
Sub::$vftable@SuperB@:
| -8
0 | &SuperB::bar
Sub::baz this adjustor: 0
(https://godbolt.org/z/1665fWzff)
It seems that the sub type just reuses the vtable pointer of the first super type (probably to avoid having to do this pointer offsetting). Though, converting between SuperB*
and Sub*
would require offsetting the pointer. I'm still not sure this is guaranteed to work like this with all compilers though (the example is with MSVC, which has options to dump class layouts).
The result of on_entry
is stored in r15_thread
, so I guess I'm wondering if it's safe to store a JavaThread*
instead of a Thread*
in r15
, and other code, which may expect r15
to hold a Thread*
, is guaranteed to keep working? (FWIW, after changing the return type to JavaThread*
the tests that exercise this code still pass on Windows with MSVC, and on WSL Linux with GCC).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I sent the wrong godbolt link: https://godbolt.org/z/1665fWzff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JornVernee I have small correct to your comment. We use simple inheritance for Thread subclasses. Their instances have one vtbl pointer at the same offset as in base class. But this pointer will point to separate vtable for each subclass (and base class). The layout (sequence) of methods pointers in vtable is the same in base class and subclasses. But subclass specific methods pointers will be different.
The only issue is that you have to make sure to cast passed object pointer to correct subclass (or base class). Otherwise you will get incorrect vtable and incorrect virtual methods pointers.
R15 is used by our JIT compiled code and Interpreter code which are executed only in JavaThread so the pinter it contains is JavaThread*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Vladimir. This also matches what I was seeing in compiler explorer, but I wasn't sure if we could assume it always worked like that with every C++ compiler.
It sound like R15 is expected to hold a JavaThread*
though, so making the return type of on_entry
be JavaThread*
as David suggested seems correct.
Thanks
@@ -928,6 +928,7 @@ void ThreadSafepointState::handle_polling_page_exception() { | |||
if( nm->is_at_poll_return(real_return_addr) ) { | |||
// See if return type is an oop. | |||
bool return_oop = nm->method()->is_returning_oop(); | |||
HandleMark hm(self); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does native code reach a safepoint polling point?
The main difference stems from the fact that, since these upcalls can come from non-JNI native code, we can not assume that the thread is already attached to the VM, so we do that on the fly instead, and we detach the thread again after the upcall (if needed). Those are what the call to The other thing is that there is no stack watermark check at the end of The last thing is that we transition directly from Other than that, I've tried to mimic what Is there anything else that looks different? |
Mailing list message from David Holmes on hotspot-runtime-dev: On 15/07/2021 10:29 pm, Jorn Vernee wrote:
Wow! Okay I've never seen anyone query this before. AFAIK whatever we But I don't deal with the under-the-covers parts of the C++ compiler. David |
1 similar comment
Mailing list message from David Holmes on hotspot-runtime-dev: On 15/07/2021 10:29 pm, Jorn Vernee wrote:
Wow! Okay I've never seen anyone query this before. AFAIK whatever we But I don't deal with the under-the-covers parts of the C++ compiler. David |
David, I've addressed your review comments. For now I went with changing the return type of I tried for a while to implement a static assert to check that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates.
Minor comments below.
David
I've addressed the new review comments, changing some |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updates look good.
Thanks,
David
// Returns the current thread as a JavaThread, or NULL if not attached | ||
static inline JavaThread* current_or_null(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't intended to suggest the introduction of this method, but I see now how my comment led to it. It is fine.
Vladimir's review covers that part of the patch (confirmed with him), so I will go ahead and integrate this. |
/integrate |
Going to push as commit 845c31d.
Your commit was automatically rebased without conflicts. |
@JornVernee Pushed as commit 845c31d. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
This patch rewrites the prologue and epilogue of panama upcalls, in order to fix the test failure from the title.
Previously, we did a call to potentially attach the current thread to the VM, and then afterwards did the same suspend and stack reguard checks that we do on the back-edge of a native downcall. Then, on the back edge of the upcall we did another conditional call to detach the thread.
The suspend and reguard checks on the front-edge are incorrect, so I've changed the 2 calls to mimic what is done by JavaCallWrapper instead (with attach and detach included), and removed the old suspend and stack reguard checks.
FWIW, this removes the JavaFrameAnchor save/restore MacroAssembler code. This is now written in C++. Also, MacroAssembler code was added to save/restore the result of the upcall around the call on the back-edge, which was previously missing. Since the new code allocates a handle block as well, I've added handling for those oops to frame & OptimizedUpcallBlob.
Testing: local running of
jdk_foreign
on Windows and Linux (WSL). Tier 1-3Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk17 pull/149/head:pull/149
$ git checkout pull/149
Update a local copy of the PR:
$ git checkout pull/149
$ git pull https://git.openjdk.java.net/jdk17 pull/149/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 149
View PR using the GUI difftool:
$ git pr show -t 149
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk17/pull/149.diff