8285416: [LOOM] Some nsk/jdi tests fail due to needing too many virtual threads #11735

Closed
wants to merge 3 commits into from

Conversation

plummercj
Contributor

@plummercj plummercj commented Dec 20, 2022

There are a few nsk debugger tests that pin multiple virtual threads to carrier threads when synchronizing. Sometimes the default number of carrier threads (which equals the number of CPUs) is not enough, and the test deadlocks because virtual threads wait forever for an available carrier thread. This PR fixes the problem by using the jdk.virtualThreadScheduler.parallelism property to change the default number of carrier threads. I believe the largest number of carrier threads any test needs is 11, so I chose 15 just to be safe.

I had initially tried to fix each individual test by using the test support in VThreadRunner.setParallism(). The advantage of this was limiting the scope of the change to just a few tests, and also being able to specify the exact number of needed carrier threads. The disadvantage was having to make quite a few changes to quite a few tests, plus I had one troublesome test that was still failing, I believe because I didn't fully understand how many carrier threads it needed. Just giving every test 15 carrier threads in the end was a lot easier.
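As an illustration of the approach described above, the following is a minimal sketch of adding the jdk.virtualThreadScheduler.parallelism property to a debuggee launch command. The class name, the helper withParallelism(), and the MyDebuggee class are hypothetical; the actual launcher/binder changes in this PR are not shown in the conversation and may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the PR's actual launcher/binder code.
final class DebuggeeCommandSketch {
    // Property name as given in the PR description.
    static final String PARALLELISM_PROP = "jdk.virtualThreadScheduler.parallelism";

    // Returns a copy of the command with the system property inserted
    // directly after the launcher, so it is parsed as a VM option.
    static List<String> withParallelism(List<String> javaCommand, int carriers) {
        if (javaCommand.isEmpty()) {
            throw new IllegalArgumentException("command must start with the java launcher");
        }
        if (carriers < 1) {
            throw new IllegalArgumentException("carriers must be positive");
        }
        List<String> result = new ArrayList<>(javaCommand);
        result.add(1, "-D" + PARALLELISM_PROP + "=" + carriers);
        return result;
    }

    public static void main(String[] args) {
        // 15 is the value chosen in this PR; "MyDebuggee" is a placeholder.
        System.out.println(withParallelism(List.of("java", "MyDebuggee"), 15));
    }
}
```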


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issues

  • JDK-8285416: [LOOM] Some nsk/jdi tests fail due to needing too many virtual threads
  • JDK-8282383: [LOOM] 6 nsk JDI and JDB tests sometimes failing with vthread wrapper due to running out of carrier threads

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/11735/head:pull/11735
$ git checkout pull/11735

Update a local copy of the PR:
$ git checkout pull/11735
$ git pull https://git.openjdk.org/jdk pull/11735/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 11735

View PR using the GUI difftool:
$ git pr show -t 11735

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11735.diff

…ugh carrier threads if the test pins a lot of threads.
@plummercj
Contributor Author

/issue JDK-8282383

@bridgekeeper

bridgekeeper bot commented Dec 20, 2022

👋 Welcome back cjplummer! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot changed the title 8285416 8285416: [LOOM] Some nsk/jdi tests fail due to needing too many virtual threads Dec 20, 2022
@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 20, 2022
@openjdk

openjdk bot commented Dec 20, 2022

@plummercj
Adding additional issue to issue list: 8282383: [LOOM] 6 nsk JDI and JDB tests sometimes failing with vthread wrapper due to running out of carrier threads.

@openjdk

openjdk bot commented Dec 20, 2022

@plummercj The following label will be automatically applied to this pull request:

  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the serviceability serviceability-dev@openjdk.org label Dec 20, 2022
@mlbridge

mlbridge bot commented Dec 20, 2022

Webrevs

@dholmes-ora
Member

This sounds like a bug with the underlying executor. If the VT's pin their carrier threads then the executor should increase its parallelism level automatically to compensate for that.

@plummercj
Contributor Author

This sounds like a bug with the underlying executor. If the VT's pin their carrier threads then the executor should increase its parallelism level automatically to compensate for that.

It's probably best if @AlanBateman explains this. In the meantime, you might find the following code in VirtualThread.createDefaultScheduler() useful:

        String parallelismValue = System.getProperty("jdk.virtualThreadScheduler.parallelism");
        String maxPoolSizeValue = System.getProperty("jdk.virtualThreadScheduler.maxPoolSize");
        String minRunnableValue = System.getProperty("jdk.virtualThreadScheduler.minRunnable");
        if (parallelismValue != null) {
            parallelism = Integer.parseInt(parallelismValue);
        } else {
            parallelism = Runtime.getRuntime().availableProcessors();
        }
        if (maxPoolSizeValue != null) {
            maxPoolSize = Integer.parseInt(maxPoolSizeValue);
            parallelism = Integer.min(parallelism, maxPoolSize);
        } else {
            maxPoolSize = Integer.max(parallelism, 256);
        }

Also, Alan mentioned the following to me:

"There are two configuration knobs. One is parallelism, the other is maxPoolSize. maxPoolSize is the maximum number of carrier threads. parallelism is really the target parallelism. Its value is the number of hardware threads but it might be increased temporarily during operations that pin threads. So if you are monitoring the number of carriers on an 8 core system then you might see 9 or 10 threads periodically, only to compensate for threads that are pinned."

What's not clear to me is whether the "pinning" that happens during synchronization is taken into account with this strategy. I think it might not actually be considered pinning (from an implementation point of view), but it does have the same effect of occupying the carrier thread.
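To make the interaction of the two knobs concrete, here is a small sketch that restates the quoted createDefaultScheduler() excerpt as a pure function. The class and method names are hypothetical, the minRunnable property is ignored, and the real JDK method may contain validation or other logic that the excerpt does not show.

```java
// Hypothetical restatement of the quoted excerpt; not the JDK source.
final class SchedulerSettingsSketch {
    final int parallelism;
    final int maxPoolSize;

    private SchedulerSettingsSketch(int parallelism, int maxPoolSize) {
        this.parallelism = parallelism;
        this.maxPoolSize = maxPoolSize;
    }

    // Property values are passed in as strings (null when unset) so the
    // logic can be exercised without touching real system properties.
    static SchedulerSettingsSketch resolve(String parallelismValue,
                                           String maxPoolSizeValue,
                                           int availableProcessors) {
        int parallelism = (parallelismValue != null)
                ? Integer.parseInt(parallelismValue)
                : availableProcessors;
        int maxPoolSize;
        if (maxPoolSizeValue != null) {
            maxPoolSize = Integer.parseInt(maxPoolSizeValue);
            // An explicit maxPoolSize caps the requested parallelism.
            parallelism = Integer.min(parallelism, maxPoolSize);
        } else {
            // Otherwise the pool may grow to at least 256 threads.
            maxPoolSize = Integer.max(parallelism, 256);
        }
        return new SchedulerSettingsSketch(parallelism, maxPoolSize);
    }

    public static void main(String[] args) {
        SchedulerSettingsSketch s = resolve(
                System.getProperty("jdk.virtualThreadScheduler.parallelism"),
                System.getProperty("jdk.virtualThreadScheduler.maxPoolSize"),
                Runtime.getRuntime().availableProcessors());
        System.out.println("parallelism=" + s.parallelism + " maxPoolSize=" + s.maxPoolSize);
    }
}
```

Under this reading, setting only parallelism to 15 on an 8 CPU machine gives a parallelism of 15 and a maxPoolSize of 256, while also setting maxPoolSize to 10 would cap the parallelism at 10.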

@dholmes-ora
Member

The JEP defines pinning as expected:

There are two scenarios in which a virtual thread cannot be unmounted during blocking operations because it is pinned to its carrier:

When it executes code inside a synchronized block or method, or
When it executes a native method or a foreign function.

Pinning does not make an application incorrect, but it might hinder its scalability. If a virtual thread performs a blocking operation such as I/O or BlockingQueue.take() while it is pinned, then its carrier and the underlying OS thread are blocked for the duration of the operation. Frequent pinning for long durations can harm the scalability of an application by capturing carriers.

But then also says:

The scheduler does not compensate for pinning by expanding its parallelism.

which contradicts what you quoted from Alan above - though I prefer that behaviour as the JEP's behaviour seems a design flaw to me.
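The deadlock under discussion can be imitated without virtual threads at all. The sketch below is only an analogy: a fixed pool of platform threads stands in for the carriers, and each task blocks its pool thread until every task has started, much like pinned virtual threads that wait on one another. All names are hypothetical, and this is not the scheduler's real behaviour.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Analogy only: a fixed thread pool plays the role of the carrier threads.
final class CarrierStarvationSketch {
    // Returns true if all tasks manage to start and meet within the timeout.
    static boolean allTasksRendezvous(int carriers, int tasks, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(carriers);
        CountDownLatch started = new CountDownLatch(tasks);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.submit(() -> {
                    started.countDown();
                    try {
                        // Holds the pool thread until every task has started,
                        // like a pinned virtual thread holding its carrier.
                        started.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return started.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow(); // interrupt any tasks that are still stuck
        }
    }

    public static void main(String[] args) {
        System.out.println("2 carriers, 3 tasks: " + allTasksRendezvous(2, 3, 500));
        System.out.println("3 carriers, 3 tasks: " + allTasksRendezvous(3, 3, 5000));
    }
}
```

With fewer pool threads than tasks, the last task never gets a thread and the rendezvous times out; giving the pool at least as many threads as tasks lets it complete, which is the effect the PR aims for by raising the parallelism.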

Member

@dholmes-ora dholmes-ora left a comment

In terms of the actual changes what you have seems fine. I wonder whether the tests themselves should be checking the parallelism value is large enough? At the moment there is a disconnect between the number of vthreads a test may create versus the setting of the parallelism level in the launcher/binder code.

@plummercj
Contributor Author

@dholmes-ora I initially took the approach of making sure each test that was ever running short of carrier threads explicitly requested the minimum it needed. It resulted in quite a few changes, and I still had one test that was occasionally failing. Possibly I just came up one short in the carrier thread calculation. I abandoned it for this PR because it's much simpler. Here's a PR I just created for those changes in case you want to have a look at what was involved: #11762.

@dholmes-ora
Member

in case you want to have a look at what was involved: #11762.

Yeah that isn't nice. I was thinking of a simpler check of the property value against the required number of threads. Though based on your comment "nthreads+1" is not always enough so the check would not be sufficient.

@plummercj
Contributor Author

Yeah that isn't nice. I was thinking of a simpler check of the property value against the required number of threads. Though based on your comment "nthreads+1" is not always enough so the check would not be sufficient.

I think I can get past the nthreads + 1 issue. I believe in one test there is another vthread that I was not accounting for, so probably nthreads + 2 would work for that test. However, I don't quite understand your suggestion. Are you suggesting that after checking the property, if too small then I don't run the test and just have it pass?

@dholmes-ora
Member

dholmes-ora commented Dec 22, 2022

I'm suggesting the test reports failure if the parallelism level is too low - with a message to go and change the launcher/binder code.
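One way such a check might look is sketched below. The class and method names are hypothetical, the fallback to the CPU count follows the createDefaultScheduler() excerpt quoted earlier in this conversation, and the sketch deliberately ignores maxPoolSize. The suggestion was later withdrawn, so nothing like this is part of the PR.

```java
// Hypothetical check a test could run before creating its virtual threads.
final class ParallelismCheckSketch {
    static final String PARALLELISM_PROP = "jdk.virtualThreadScheduler.parallelism";

    // Parallelism the scheduler would use: the property value if set,
    // otherwise the number of available processors.
    static int effectiveParallelism(String propertyValue, int availableProcessors) {
        return (propertyValue != null) ? Integer.parseInt(propertyValue) : availableProcessors;
    }

    // Fails with a pointer to the launcher/binder code when the setting is too low.
    static void requireParallelism(String propertyValue, int availableProcessors, int required) {
        int actual = effectiveParallelism(propertyValue, availableProcessors);
        if (actual < required) {
            throw new IllegalStateException(
                    "Test needs " + required + " carrier threads but " + PARALLELISM_PROP
                    + " resolves to " + actual + "; raise it in the launcher/binder code");
        }
    }

    public static void main(String[] args) {
        // 11 is the largest per-test requirement mentioned in the PR description.
        requireParallelism(System.getProperty(PARALLELISM_PROP),
                Runtime.getRuntime().availableProcessors(), 11);
        System.out.println("parallelism is sufficient");
    }
}
```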

@plummercj
Contributor Author

plummercj commented Dec 22, 2022

I'm suggesting the test reports failure if the parallelism level is too low - with a message to go and change the launcher/binder code.

Are you saying that in addition to the changes in this PR I should also change each of the tests to add a check to make sure parallelism is set high enough?

Contributor

@sspitsyn sspitsyn left a comment

This looks good besides what David is requesting.
Thanks,
Serguei

@openjdk

openjdk bot commented Dec 22, 2022

@plummercj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8285416: [LOOM] Some nsk/jdi tests fail due to needing too many virtual threads
8282383: [LOOM] 6 nsk JDI and JDB tests sometimes failing with vthread wrapper due to running out of carrier threads

Reviewed-by: dholmes, sspitsyn, alanb, lmesnik

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 102 new commits pushed to the master branch:

  • ea25a56: 8299520: TestPrintXML.java output error messages in case compare fails
  • 92dfc73: 8294526: sun/security/provider/SubjectCodeSource.java no longer referenced
  • 3d0db02: Merge
  • a6a903d: 8288204: GVN Crash: assert() failed: correct memory chain
  • 37f8b05: 8298592: Add java man page documentation for ChaCha20 and Poly1305 intrinsics
  • 245f0cf: 8291302: ARM32: nmethod entry barriers support
  • a9ce772: 8299437: Make InetSocketAddressHolder shallowly immutable
  • 8afd665: 8299395: Remove metaprogramming/removeCV.hpp
  • 3757433: 8295974: jni_FatalError and Xcheck:jni warnings should print the native stack when there are no Java frames
  • 417d01e: 8299441: Fix typos in some test files under core-libs component
  • ... and 92 more: https://git.openjdk.org/jdk/compare/9194e915495434c154ff4cf142d527b163026b3c...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 22, 2022
Member

@dholmes-ora dholmes-ora left a comment

Are you saying that in addition to the changes in this PR I should also change each of the tests to add a check to make sure parallelism is set high enough?

I was, but now that I see the tests involved and the fact that this problem is just an artifact of running those tests in virtual threads, I really don't want to see those tests polluted with VT-specific code. I know more now about the issue with pinning on monitor entry and not being able to increase parallelism for that case, but ideally that would indeed be the fix and I'll look into that some more.

But this fix is approved. Thanks.

@AlanBateman
Contributor

The proposed change looks okay and a lot more maintainable than adjusting specific tests.

@plummercj
Contributor Author

/integrate

@openjdk

openjdk bot commented Jan 6, 2023

Going to push as commit d6e9f01.
Since your change was applied there have been 147 commits pushed to the master branch:

  • ba03f42: 8299746: Accept unknown signatureAlgorithm in PKCS7 SignerInfo
  • 3dcf700: 8299336: InputStream::DEFAULT_BUFFER_SIZE should be 16384
  • 1e99729: 8299274: Add elements to resolved_references consistently
  • 8cc1669: 8299721: [Vector API] assert in switch-default of LibraryCallKit::arch_supports_vector_rotate is too weak to catch bugs
  • 5598acc: 8294403: [REDO] make test should report only on executed tests
  • 88f0ea7: 8299726: [cleanup] Some code cleanup in opto/compile.hpp
  • 0234f81: 8298447: Unnecessary Vector usage in DocPrintJob implementations
  • cc4936a: 8298720: Insufficient error handling when CodeBuffer is exhausted
  • b5b7948: 8298240: Replace the usage of ImageLayoutException by the CMMException
  • 99be740: 8299306: Test "javax/swing/JFileChooser/FileSystemView/CustomFSVLinkTest.java" fails on Windows 10 x64 because there are some buttons did not display button name
  • ... and 137 more: https://git.openjdk.org/jdk/compare/9194e915495434c154ff4cf142d527b163026b3c...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jan 6, 2023
@openjdk openjdk bot closed this Jan 6, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jan 6, 2023
@openjdk

openjdk bot commented Jan 6, 2023

@plummercj Pushed as commit d6e9f01.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
integrated Pull request has been integrated serviceability serviceability-dev@openjdk.org
5 participants