Skip to content

8282379: [LOOM] vmTestbase/nsk/jdi/ClassType/invokeMethod/invokemethod011 sometimes fails#12420

Closed
plummercj wants to merge 5 commits intoopenjdk:masterfrom
plummercj:8282379_invoke_method
Closed

8282379: [LOOM] vmTestbase/nsk/jdi/ClassType/invokeMethod/invokemethod011 sometimes fails#12420
plummercj wants to merge 5 commits intoopenjdk:masterfrom
plummercj:8282379_invoke_method

Conversation

@plummercj
Copy link
Contributor

@plummercj plummercj commented Feb 3, 2023

A number of tests that use JDI invokeMethod() support occasionally fail when run using virtual threads. The tests fail with:

# ERROR: TEST FAILED: wrong invocation:
# ERROR: invoking debuggee thread instance of java.lang.VirtualThread(name='invokemethod010tThr', id=272)
# ERROR: is not suspended again after the invocation 

Although as explained later, this is a misleading message, and does not accurately reflect why the test is failing. More on that below.

The root cause of the failure is due to these tests using JDI invokeMethod support with the INVOKE_SINGLE_THREADED flag. This is not always going to work with virtual threads because the invoked method is doing a Thread.sleep(400), and that is at risk of blocking indefinitely if all other threads are blocked form making progress. Note that technically platform threads could fail in the same manner. However, the reason the failure only happens now with virtual threads is because the implementation of Thread.sleep() differs for virtual threads, and may require ownership of a monitor that sometimes is held by another thread.

Another issue is that the tests do not do a very good job of error handling when this happens, and give the misleading failure reason of the invoked thread not being suspended after the invoke completed. The reason it is not suspended is because the invoke has actually not completed. There was a timeout that the test did not properly note as the cause of the failure. The test (debugger side) spawns a thread to do the JDI invokeMethod with, and then waits for it with:

            invThr.join(argHandler.getWaitTime()*60000);

This join() times out, but the test assumes once it returns the invoke is complete, even though the invoked thread is actually still in the middle of the invoke. So that is the reason debuggee invokemethod thread is not currently suspended. I've fixed this by having a test check if invThr is still alive after the join. If it is, then the test is made to fail at that point, rather than continuing on and checking the debuggee threads status. The failure then becomes:

nsk.share.TestFailure: TEST FAILED: invoke never completed

At that point a vm.resume() is done to allow the invoke to complete, and the test will exit with this failure.

As for avoiding the failure in the first place (the deadlock in the debuggee during the invoke), this is really a test bug for relying on INVOKE_SINGLE_THREADED and assuming that the invoked thread won't become deadlocked. Since there is a Thread.sleep() call in the invoked method, it can't make this assumption. From the ObjectReference.invoke() spec:

"By default, all threads in the target VM are resumed while the method is being invoked if they were previously suspended by an event or by VirtualMachine.suspend() or ThreadReference.suspend(). This is done to prevent the deadlocks that will occur if any of the threads own monitors that will be needed by the invoked method."

"The resumption of other threads during the invocation can be prevented by specifying the INVOKE_SINGLE_THREADED bit flag in the options argument; however, there is no protection against or recovery from the deadlocks described above, so this option should be used with great caution."

For platform threads, sleep() doesn't require any monitors, so these tests never ran into problems before. For virtual threads however there is some synchronization done, and potential reliance on other threads not being suspended. A way around this is to always use sleep(0), which will at least attempt to yield the thread. For platform threads an actual yield is likely. For a virtual thread it will not yield in this particular case because the virtual thread is pinned to the carrier thread due to the jvmti breakpoint callback that is currently in the call chain of the invoked thread. So for virtual threads this effectively the same as sitting in a spin loop with no yielding. This is ok. CPU wasting is not a concern.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8282379: [LOOM] vmTestbase/nsk/jdi/ClassType/invokeMethod/invokemethod011 sometimes fails

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/12420/head:pull/12420
$ git checkout pull/12420

Update a local copy of the PR:
$ git checkout pull/12420
$ git pull https://git.openjdk.org/jdk pull/12420/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 12420

View PR using the GUI difftool:
$ git pr show -t 12420

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12420.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 3, 2023

👋 Welcome back cjplummer! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot changed the title 8282379 8282379: [LOOM] vmTestbase/nsk/jdi/ClassType/invokeMethod/invokemethod011 sometimes fails Feb 3, 2023
@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 3, 2023
@openjdk
Copy link

openjdk bot commented Feb 3, 2023

@plummercj The following label will be automatically applied to this pull request:

  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the serviceability serviceability-dev@openjdk.org label Feb 3, 2023
@mlbridge
Copy link

mlbridge bot commented Feb 3, 2023

Webrevs

@dholmes-ora
Copy link
Member

Relying on sleep(0) to avoid the problem with virtual threads seems fragile - what if the action of that changes for VTs in the future? In any case the use of sleep(0) needs to be documented so it doesn't get changed back, or converted to the logically equivalent Thread.yield.

@plummercj
Copy link
Contributor Author

Relying on sleep(0) to avoid the problem with virtual threads seems fragile - what if the action of that changes for VTs in the future? In any case the use of sleep(0) needs to be documented so it doesn't get changed back, or converted to the logically equivalent Thread.yield.

I could just remove the sleep(0) call altogether, or I could make the original sleep(400) call conditional on not being a virtual thread, although I don't like that solution so much. In either case is sounds like you want a comment. A more general one would simply state not to call anything that might block since other threads are suspended. Now that I think of it, it seems like the two display() calls could block since they do I/O, although the way this test is run it seems they never do. I bet if you had some other thread(s) doing display() calls at the same time, and one of them got suspended in the middle of the display, then you might see issues.

@plummercj
Copy link
Contributor Author

I updated one of the tests to get rid of the sleep and added a comment. Let me know what you think. Once it's ok'd I'll apply the same change to the rest of the tests:

c1bcaef

while(!doExit) {
l--; l++;
Thread.currentThread().sleep(400);
Thread.currentThread().sleep(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW this is an anti-pattern - use Thread.sleep(n) - it always applies to the current thread.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about replacing sleep(0) with yield()?
Do we expect it also causing deadlocks? Was it considered?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sleep(0) on a virtual thread ends up doing a tryYield(), which is the same as what is done when calling sleep(0), so it should work, although I'm not so sure it is any better.

@dholmes-ora
Copy link
Member

Okay so these tests are incredibly fragile - as you note display() may be a problem in theory, as could any potential class-loading or initialization. But without the sleep you have a busy-wait loop that may cause problems of its own (e.g. it might trigger on-stack-replacement but the compiler thread may require a resource held by one of the suspended threads!). It is impossible to know that something is 100% safe to do in this kind of situation.

@plummercj
Copy link
Contributor Author

Okay so these tests are incredibly fragile - as you note display() may be a problem in theory, as could any potential class-loading or initialization.

Yes, the more that is done int the called method the more fragile it becomes. The javadoc is pretty clear about the risks of using INVOKE_SINGLE_THREADED, and I think most users (such as IntelliJ) will quickly timeout the invoke, do a vm.resume() to unstick it, and then try the invoke again. The common use of JDI invoke by IDEs is to call toString() on local variables that are references to objects, which means even during common usage the invoke has the potential to do just about anything.

But without the sleep you have a busy-wait loop that may cause problems of its own (e.g. it might trigger on-stack-replacement but the compiler thread may require a resource held by one of the suspended threads!). It is impossible to know that something is 100% safe to do in this kind of situation.

That's not possible since the thread will be in interpOnly mode (because invokes can only be done when suspended at an event).

@dholmes-ora
Copy link
Member

Good to know interpOnly mode at least places some limits on what may happen. The busy-loop is not likely to trigger any deadlock in that case.

I will leave it to you whether to select sleep(0) or no sleep at all.

@plummercj
Copy link
Contributor Author

All the tests now have the sleep() call removed and I added the warning comment.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay by me. Thanks.

@openjdk
Copy link

openjdk bot commented Feb 9, 2023

@plummercj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8282379: [LOOM] vmTestbase/nsk/jdi/ClassType/invokeMethod/invokemethod011 sometimes fails

Reviewed-by: dholmes, sspitsyn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 66 new commits pushed to the master branch:

  • c72f951: 8301853: C4819 warnings were reported in HotSpot on Windows
  • 36478ee: 8288783: Error messages are confusing when options conflict in -XX:StartFlightRecording
  • 70f3150: 8301443: Clean broken comments from Windows code
  • 5561c39: 8294484: MetalBorder's FrameBorder & DialogBorder have border rendering issues when scaled
  • c8cc7b6: 8301704: Shorten the number of GCs in UnloadingTest.java to verify a class loader not being unloaded
  • dc6d52c: 8301876: Crash in DumpTimeClassInfo::add_verification_constraint
  • 873558e: 8300914: Allow @ as an escape in documentation comments
  • 8a9e383: 8301717: Remove obsolete jib profiles
  • 631a279: 8301567: The test/jdk/java/awt/AppContext/ApplicationThreadsStop/java.policy is unused
  • 638d612: 8298478: (fs) Path.of should allow input to include long path prefix
  • ... and 56 more: https://git.openjdk.org/jdk/compare/5962226cc33de047946aca6522f020c97d663d2f...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 9, 2023
@sspitsyn
Copy link
Contributor

sspitsyn commented Feb 9, 2023

Looks good.
Thanks,
Serguei

@plummercj
Copy link
Contributor Author

Thanks for the reviews Serguei and David!

/integrate

@openjdk
Copy link

openjdk bot commented Feb 9, 2023

Going to push as commit f4b72df.
Since your change was applied there have been 80 commits pushed to the master branch:

  • 7fd440d: 8298868: Update EngineCloseOnAlert.java for changes to TLS implementation
  • 7901f45: 8301260: Add system property to toggle XML Signature secure validation mode
  • 597a9a4: 8301822: BasicLookAndFeel does not need to check for null after checking for type
  • 3b05a94: 8301858: Verification error when compiling switch with record patterns
  • e4d1cff: 8300268: ServerImpl allows too many idle connections when using sun.net.httpserver.maxIdleConnections
  • af8973d: Merge
  • e81f20b: 8301989: new javax.swing.text.DefaultCaret().setBlinkRate(N) results in NPE
  • 6f460e4: 8301863: ObjectInputFilter example incorrectly calls rejectUndecidedClass
  • 2caa56a: 8301843: Remove dummy region allocation
  • d401982: 8302121: Parallel: Remove unused arg in PSCardTable::inline_write_ref_field_gc
  • ... and 70 more: https://git.openjdk.org/jdk/compare/5962226cc33de047946aca6522f020c97d663d2f...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Feb 9, 2023
@openjdk openjdk bot closed this Feb 9, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 9, 2023
@openjdk
Copy link

openjdk bot commented Feb 9, 2023

@plummercj Pushed as commit f4b72df.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

integrated Pull request has been integrated serviceability serviceability-dev@openjdk.org

Development

Successfully merging this pull request may close these issues.

3 participants