Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8273315: Parallelize and increase timeouts for java/foreign/TestMatrix.java test #5358

Closed

Conversation

shipilev
Copy link
Contributor

@shipilev shipilev commented Sep 3, 2021

This test runs a lot of configurations, and spends a lot of time serially. This is especially pronounced when run in prospective tier4 runs (JDK-8273314). There are reports of multi-hour runs (see JDK-8271613). We can parallelize the test configurations for this test to make it hurt less. Also, timeouts need to be increased for TestUpcall test configs, because some of them are very slow in fastdebug mode.

Sample run:

$ time CONF=linux-x86_64-server-fastdebug make run-test TEST=java/foreign/TestMatrix.java | ts -s
00:00:00 Building target 'run-test' in configuration 'linux-x86_64-server-fastdebug'
00:00:03 Test selection 'java/foreign/TestMatrix.java', will run:
00:00:03 * jtreg:test/jdk/java/foreign/TestMatrix.java
00:00:03 
00:00:03 Running test 'jtreg:test/jdk/java/foreign/TestMatrix.java'
00:00:31 Passed: java/foreign/TestMatrix.java#UpcallHighArity-TTFT
00:00:31 Passed: java/foreign/TestMatrix.java#UpcallHighArity-FFFF
00:00:31 Passed: java/foreign/TestMatrix.java#UpcallHighArity-FFTF
00:00:31 Passed: java/foreign/TestMatrix.java#UpcallHighArity-TTTT
00:00:31 Passed: java/foreign/TestMatrix.java#UpcallHighArity-FTTT
00:00:31 Passed: java/foreign/TestMatrix.java#UpcallHighArity-FFTT
00:00:31 Passed: java/foreign/TestMatrix.java#UpcallHighArity-FFFT
00:00:31 Passed: java/foreign/TestMatrix.java#UpcallHighArity-TTTF
00:00:31 Passed: java/foreign/TestMatrix.java#UpcallHighArity-FTTF
00:00:31 Passed: java/foreign/TestMatrix.java#UpcallHighArity-TFFT
00:00:31 Passed: java/foreign/TestMatrix.java#UpcallHighArity-TFTF
00:00:32 Passed: java/foreign/TestMatrix.java#UpcallHighArity-TFFF
00:00:32 Passed: java/foreign/TestMatrix.java#UpcallHighArity-FTFF
00:00:35 Passed: java/foreign/TestMatrix.java#UpcallHighArity-TFTT
00:00:35 Passed: java/foreign/TestMatrix.java#UpcallHighArity-TTFF
00:00:38 Passed: java/foreign/TestMatrix.java#UpcallHighArity-FTFT
00:01:50 Passed: java/foreign/TestMatrix.java#Downcall-FF
00:02:27 Passed: java/foreign/TestMatrix.java#Downcall-TF
00:03:03 Passed: java/foreign/TestMatrix.java#Downcall-FT
00:03:47 Passed: java/foreign/TestMatrix.java#Downcall-TT
00:04:17 Passed: java/foreign/TestMatrix.java#Upcall-FTFF
00:04:23 Passed: java/foreign/TestMatrix.java#Upcall-TFFF
00:05:46 Passed: java/foreign/TestMatrix.java#Upcall-TTFF
00:06:03 Passed: java/foreign/TestMatrix.java#Upcall-TFFT
00:06:44 Passed: java/foreign/TestMatrix.java#Upcall-FTFT
00:08:24 Passed: java/foreign/TestMatrix.java#Upcall-TFTF
00:08:39 Passed: java/foreign/TestMatrix.java#Upcall-TTFT
00:09:16 Passed: java/foreign/TestMatrix.java#Upcall-FTTF
00:09:19 Passed: java/foreign/TestMatrix.java#Upcall-TFTT
00:10:01 Passed: java/foreign/TestMatrix.java#Upcall-FTTT
00:10:37 Passed: java/foreign/TestMatrix.java#Upcall-TTTF
00:12:16 Passed: java/foreign/TestMatrix.java#Upcall-TTTT
00:12:17 Test results: passed: 32
00:12:21 
00:12:21 ==============================
00:12:21 Test summary
00:12:21 ==============================
00:12:21    TEST                                              TOTAL  PASS  FAIL ERROR   
00:12:21    jtreg:test/jdk/java/foreign/TestMatrix.java          32    32     0     0   
00:12:21 ==============================

real	12m20.538s
user	131m54.043s
sys	0m59.944s

If we don't parallelize, then those 130 minutes are spent serially.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8273315: Parallelize and increase timeouts for java/foreign/TestMatrix.java test

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5358/head:pull/5358
$ git checkout pull/5358

Update a local copy of the PR:
$ git checkout pull/5358
$ git pull https://git.openjdk.java.net/jdk pull/5358/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 5358

View PR using the GUI difftool:
$ git pr show -t 5358

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5358.diff

@shipilev shipilev changed the title Jdk 8273315 parallel testmatrix 8273315: Parallelize and increase timeouts for java/foreign/TestMatrix.java test Sep 3, 2021
@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Sep 3, 2021

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Sep 3, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Sep 3, 2021

@shipilev The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs label Sep 3, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Sep 3, 2021

Webrevs

@mcimadamore
Copy link
Contributor

@mcimadamore mcimadamore commented Sep 7, 2021

How is this test executed? I think when this was added the idea was that this had to be an advanced test which had to be run explicitly by users, but would not be picked up in the various test groups/tiers. I'm defo not seeing it running in the jdk_foreign group, so it shouldn't run on tier4. I'm not sure what's happening?

@shipilev
Copy link
Contributor Author

@shipilev shipilev commented Sep 7, 2021

How is this test executed? I think when this was added the idea was that this had to be an advanced test which had to be run explicitly by users, but would not be picked up in the various test groups/tiers. I'm defo not seeing it running in the jdk_foreign group, so it shouldn't run on tier4. I'm not sure what's happening?

As I wrote in PR, "This is especially pronounced when run in prospective tier4 runs (JDK-8273314)." -- that group was supposed to run all tests not caught by other tiers. That work seems to be stalled a bit. See also JDK-8271613 that is submitted by SAP folks who run it as part of all JDK tests, I guess.

But nevertheless, I think speeding up the test would be even more useful for those who run this test by hand: waiting for 12 minutes instead of 1.5 hours ;)

@mcimadamore
Copy link
Contributor

@mcimadamore mcimadamore commented Sep 7, 2021

Ok, I see - there's a new configuration for tier4 which runs all tests excluded from tier1/2/3. To me that seems to be the issue. This particular test was never meant to be executed as part of automated build infra. So I'd suggest we disable the test from tier4 first, and then, if you still want to pursue the improvement, we can do that too. On my machine the test doesn't take 1.5 hours.

@shipilev
Copy link
Contributor Author

@shipilev shipilev commented Sep 7, 2021

if you still want to pursue the improvement, we can do that too.

Yes, I still want to pursue this test improvement.

@mcimadamore
Copy link
Contributor

@mcimadamore mcimadamore commented Sep 7, 2021

So, what is the policy for defining developers tests that are not meant to be ran on every build infra, but are meant to be run on a more casual basis by developers working in a particular area? When we added TestMatrix we made sure to exclude it from the relevant groups. I suspect other tests might have followed the same approach. But if we have a test that automatically catches all excluded tests, and folks start running this "excluded tests" group by default, what are developers supposed to do?

I guess there's a reason why tests might not have been included in a tier. Defining a blanket rule which re-adds all excluded tests seems like a questionable move to me. Surely in the future, keeping this in mind, developers will probably refrain from pushing these tests to OpenJDK altogether, and store them somewhere private - which doesn't seem great.

So, I understand you have a fix which parallelize the test execution (great!), but it seems like we're talking past each other a bit, in the sense that you (or any other) should have never picked up this test in an automatic test run in the first place.

Also, execution time is part of the picture, albeit the most visible one, since it causes timeouts and failures. But what about CPU cycles? Sure, if we parallelize, we can get better execution time - but you still end up wasting CPU cycles on a test which you are not meant to run in the first place. Is this the right thing to do?

I believe that, ultimately, this is a (test) policy issue, which should be discussed accordingly in an OpenJDK mailing list.

@shipilev
Copy link
Contributor Author

@shipilev shipilev commented Sep 7, 2021

So, what is the policy for defining developers tests that are not meant to be ran on every build infra, but are meant to be run on a more casual basis by developers working in a particular area? When we added TestMatrix we made sure to exclude it from the relevant groups.

Honestly, I don't know. Maybe there is some jtreg keyword that gates the automatic execution? From the jtreg perspective, either automatic system or human invocation must look the same, so it has to be some external input.

Personally, I was always under impression that if I add a regression test to the source tree, somebody would eventually run it. Hotspot even has the catch-all hotspot_all test group for this. JDK does not have a catch-all test group, but I would not be surprised if such test group existed.

So, I understand you have a fix which parallelize the test execution (great!), but it seems like we're talking past each other a bit, in the sense that you (or any other) should have never picked up this test in an automatic test run in the first place.

Right. But I don't think the test execution policy discussion is that relevant for the test improvement in question. Do we want those who do run this test manually/occasionally enjoy the improved run time due to better parallelism? If so, please consider this PR on this merit alone.

@shipilev
Copy link
Contributor Author

@shipilev shipilev commented Sep 8, 2021

New commit: marked tests as manual, as per Maurizio's request. This forced me to drop the timeout= clauses, as those are incompatible with manual (jtreg plainly refuses to run these). Since OpenJDK build always passes -a to tests, one would need to run the jtreg separately to get tests to run, which also includes setting up nativepath appropriately. I have put some instructions at the top of the file.

$ time ~/Install/jtreg-6+1/bin/jtreg -jdk:/home/shade/trunks/jdk/build/linux-x86_64-server-fastdebug/images/jdk/ -nativepath:/home/shade/trunks/jdk/build/linux-x86_64-server-fastdebug/support/test/jdk/jtreg/native/lib/ -concurrency:auto ./test/jdk/java/foreign/TestMatrix.java
Test results: passed: 32
Report written to /home/shade/trunks/jdk/JTreport/html/report.html
Results written to /home/shade/trunks/jdk/JTwork

real	12m36.699s
user	127m9.995s
sys	1m1.343s

@ArnoZeller
Copy link
Contributor

@ArnoZeller ArnoZeller commented Sep 8, 2021

New commit: marked tests as manual, as per Maurizio's request. This forced me to drop the timeout= clauses, as those are incompatible with manual (jtreg plainly refuses to run these).

Ok, I am too late, but just for the record: I let your old patch (including timeout adjustments) run in our test infrastructure and it fixed the timeout issue from JDK-8271613.

Probably it could be a better solution to add the stress keyword for slow running or high load tests - like it is done in the hotspot part of the tests. Then automated test run can exclude or select these test by keyword.

Copy link
Contributor

@mcimadamore mcimadamore left a comment

Changes looks good. Whether we want to use manual or @stress I'm not sure. I guess it depends a lot on which parameters are typically used by CI to run those tests.

@mcimadamore
Copy link
Contributor

@mcimadamore mcimadamore commented Sep 8, 2021

Changes looks good. Whether we want to use manual or @stress I'm not sure. I guess it depends a lot on which parameters are typically used by CI to run those tests.

I note that, for instance, the makefile make/RunTests.gmk does pass the -automatic flag to jtreg, but I don't see any keyword-based exclusion set up there - so I think manual would work well there.

@openjdk
Copy link

@openjdk openjdk bot commented Sep 8, 2021

@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8273315: Parallelize and increase timeouts for java/foreign/TestMatrix.java test

Reviewed-by: mcimadamore

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 52 new commits pushed to the master branch:

  • faa942c: 8273000: Remove WeakReference-based class initialisation barrier implementation
  • 21012f2: 8078641: MethodHandle.asTypeCache can retain classes from unloading
  • 1855574: 8273038: ChannelInputStream.transferTo() uses FileChannel.transferTo(FileChannel)
  • 6750c34: 8270533: AArch64: size_fits_all_mem_uses should return false if its output is a CAS
  • a66629a: 8254167: G1: Record regions where evacuation failed to provide targeted iteration
  • 286a1f6: 8273440: Zero: Disable runtime/Unsafe/InternalErrorTest.java
  • 7d24a33: 8273318: Some containers/docker/TestJFREvents.java configs are running out of memory
  • 1513dc7: 8271603: Unnecessary Vector usage in java.desktop
  • ea4907a: 8273047: test jfr/api/consumer/TestRecordedFrame.java timing out
  • 4eacdb3: 8273104: Refactoring option parser for UL
  • ... and 42 more: https://git.openjdk.java.net/jdk/compare/aaa6f696b06b335f81efccf0966612b086dd2e73...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Sep 8, 2021
@shipilev
Copy link
Contributor Author

@shipilev shipilev commented Sep 8, 2021

Probably it could be a better solution to add the stress keyword for slow running or high load tests - like it is done in the hotspot part of the tests. Then automated test run can exclude or select these test by keyword.

Thought so too, but I think Maurizio wants this test to run only explicitly by those who want to run it. So there needs to be something that disallows automatic runs completely. "manual" seems to be good at this. Do you agree? If you do, I'll integrate.

@ArnoZeller
Copy link
Contributor

@ArnoZeller ArnoZeller commented Sep 8, 2021

I am fine with this solution - it solves my issue.

@shipilev
Copy link
Contributor Author

@shipilev shipilev commented Sep 9, 2021

/integrate

@openjdk
Copy link

@openjdk openjdk bot commented Sep 9, 2021

Going to push as commit dc33bd8.
Since your change was applied there have been 70 commits pushed to the master branch:

  • 5b1dfe4: 8273439: Fix G1CollectedHeap includes and forward declarations
  • 6eba443: 8273387: remove some unreferenced gtk-related functions
  • 5df2648: 8273218: G1: Rename g1EvacuationInfo to g1EvacInfo
  • 12f0b77: 8273516: ProblemList compiler/c2/Test7179138_1.java in -Xcomp mode on win-X64
  • 7fd6b0b: 8273242: (test) Refactor to use TestNG for RuntimeTests ExecCommand tests
  • a5e4def: 8265489: Stress test times out because of long ObjectSynchronizer::monitors_iterate(...) operation
  • 9b5991e: 8273451: Remove unreachable return in mutexLocker::wait
  • e680503: 8273185: Rename the term "atomic" in ReferenceProcessor
  • ba31eee: 8273109: runtime/cds/appcds/loaderConstraints/DynamicLoaderConstraintsTest times out
  • 4d5e1ed: 8272375: Improve phrasing of synthesized descriptions in JavaFX docs
  • ... and 60 more: https://git.openjdk.java.net/jdk/compare/aaa6f696b06b335f81efccf0966612b086dd2e73...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Sep 9, 2021
@openjdk openjdk bot added integrated and removed ready labels Sep 9, 2021
@openjdk openjdk bot removed the rfr label Sep 9, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Sep 9, 2021

@shipilev Pushed as commit dc33bd8.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@mcimadamore
Copy link
Contributor

@mcimadamore mcimadamore commented Sep 9, 2021

Thanks for taking care of this!

@shipilev shipilev deleted the JDK-8273315-parallel-testmatrix branch Sep 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs integrated
4 participants