Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8277893: Arraycopy stress tests #6594

Closed
wants to merge 15 commits into from

Conversation

shipilev
Copy link
Contributor

@shipilev shipilev commented Nov 29, 2021

I would like to fork the new tests off the JDK-8150730. These tests were instrumental in capturing many bugs in my arraycopy work, and I think they are good on their own merit, because they provide a test for the current baseline and on-going minor improvements in arraycopy on all platforms, not only x86_64, and they might be cleanly backportable.

A brief tour of these tests:

  • Tests all data types;
  • Tests small arrays exhaustively, which captures conjoint/disjoint cases, errors near the edges, etc;
  • Tests large arrays with fuzzing around powers of two and powers of ten, both conjoint and disjoint cases;
  • Tests all available compilation modes for arraycopy stubs; for example, running on AVX-512 enabled machine runs all versions down to -XX:UseAVX=0 -XX:UseSSE=0 cases;
  • Tests with/without compressed oops mode -- theoretically only needed for Object copies, but Hotspot cobbles together int+coops and long+no-coops loops, so I decided to alternate coops mode for all data types;

My previous version used individual @run clauses for all configurations, but I think the Java driver is cleaner and easier to maintain.

Test times:

# x86_64 (TR 3970X)
real	4m6.192s
user	52m50.523s
sys	0m13.755s

# x86_64 (TR 3970X) -XX:+UseZGC
real	6m2.573s
user	72m43.541s
sys	0m25.697s

# x86_32 (TR 3970X)
real	6m56.405s
user	92m56.377s
sys	0m6.677s

# x86_64 (i5-11500)
real	29m19.024s
user	103m52.925s
sys	1m7.175s

# AArch64 (ThunderX2)
real	2m59.623s
user	26m14.624s
sys	0m9.771s

Since these tests are quite long, especially on small machines, I hooked them up to hotspot:tier3.

Additional testing:

  • Linux x86_64 fastdebug compiler/stress/arraycopy
  • Linux x86_32 fastdebug compiler/stress/arraycopy
  • Linux AArch64 fastdebug compiler/stress/arraycopy

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6594/head:pull/6594
$ git checkout pull/6594

Update a local copy of the PR:
$ git checkout pull/6594
$ git pull https://git.openjdk.java.net/jdk pull/6594/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6594

View PR using the GUI difftool:
$ git pr show -t 6594

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6594.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Nov 29, 2021

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

@openjdk openjdk bot commented Nov 29, 2021

@shipilev The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot label Nov 29, 2021
@shipilev shipilev marked this pull request as ready for review Nov 29, 2021
@openjdk openjdk bot added the rfr label Nov 29, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Nov 29, 2021

Webrevs

@neliasso
Copy link

@neliasso neliasso commented Nov 29, 2021

/label hotspot-compiler

@openjdk openjdk bot added the hotspot-compiler label Nov 29, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Nov 29, 2021

@neliasso
The hotspot-compiler label was successfully added.

@stooart-mon
Copy link
Contributor

@stooart-mon stooart-mon commented Nov 30, 2021

This looks great, thanks Aleksey. This covers all of the cases I'd reasonably expect to see covered.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

I assume that test/micro/org/openjdk/bench/java/lang micros cover all these cases. Otherwise you may need to add some.

test/hotspot/jtreg/TEST.groups Outdated Show resolved Hide resolved
@shipilev
Copy link
Contributor Author

@shipilev shipilev commented Nov 30, 2021

I assume that test/micro/org/openjdk/bench/java/lang micros cover all these cases. Otherwise you may need to add some.

Yes. Performance tests will come separately. This PR covers purely functional tests that verify arraycopies are not foobar-ing array contents, not hitting any asserts, or otherwise crash VMs. Performance tests would run on a limited set of inputs and in release bits, so they are bad for verification like this :)

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Good. Let me test it before approval.

@vnkozlov
Copy link
Contributor

@vnkozlov vnkozlov commented Dec 2, 2021

Most testing passed fine. I am still waiting results on linux-aarch64.

But I got 1 timeout failure running next test on Windows-x64-debug with -XX:+UseZGC:

compiler/arraycopy/stress/TestStressObjectArrayCopy.java

reason: User specified action: run main/othervm/timeout=960 -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI StressArrayCopyDriver TestStressObjectArrayCopy 
Timeout information:
elapsed time (seconds): 3920.744

One flags combination run can take up to 8 min and you have 20 of them:

[2021-12-02T16:33:27.024914600Z] Waiting for completion for process 9764
[2021-12-02T16:41:27.136342300Z] Waiting for completion finished for process 9764

Windows VM image has 12 cores, 46Gb memory and AMD latest CPU. I don't think it HW is issue. But it could be something with OS at that time.

On linux-x64-debug it took: main: 1411.585 seconds running with ZGC. Still more then your specified timeout timeout=960

Please, check it. You can increase timeout or split testing.

@shipilev
Copy link
Contributor Author

@shipilev shipilev commented Dec 2, 2021

Please, check it. You can increase timeout or split testing.

Yes, thank you, that's exactly why I wanted to do these tests ahead of the actual arraycopy changes. I'll take a look at what can be done. FWIW, my Windows VM running on TR 3970X passed these tests in reasonable time, maybe it is Windows+ZGC-specific problem here.

@vnkozlov
Copy link
Contributor

@vnkozlov vnkozlov commented Dec 2, 2021

Yes, ZGC is definitely affecting it. With ParallelGC on linux-x64 the time was: main: 551.014 seconds

@shipilev
Copy link
Contributor Author

@shipilev shipilev commented Dec 2, 2021

Yes, ZGC is definitely affecting it. With ParallelGC on linux-x64 the time was: main: 551.014 seconds

I suspect arraycopy GC barriers, because Shenandoah is also quite a bit slow. In exhaustive tests for small arrays, runtime calls probably dominate. I'll figure it out.

# UseParallelGC
real	6m32.100s
user	54m19.617s
sys	0m36.336s

# UseG1GC
real	6m31.220s
user	55m50.315s
sys	0m19.156s

# UseSerialGC
real	5m53.627s
user	52m59.540s
sys	0m29.012s

# UseShenandoahGC
real	11m1.101s
user	65m26.868s
sys	0m34.472s

# UseZGC
real	15m15.289s
user	73m15.533s
sys	0m31.396s

@vnkozlov
Copy link
Contributor

@vnkozlov vnkozlov commented Dec 2, 2021

Just let you know that testing (tier1,2,3) on linux-aarch64 passed clean. All testing now finished.

@shipilev
Copy link
Contributor Author

@shipilev shipilev commented Dec 6, 2021

Yes, definitely GC arraycopy barriers in object array copy cases. Shenandoah cuts the overhead in half by emitting the runtime check on GC state, so it can skip calling to runtime in most cases. ZGC calls to runtime for all object copies. This adds up quite a bit for small object arrays.

I looked whether to trim down the array sizes we feed into these tests, but that does not look compelling to me, as these tests were useful with current settings in arraycopy improvements work. The GC specific overheads also make object array tests take a disproportionate amount of time, wrecking parallelism that might have helped to subsume these penalties.

So, new work:
a) Sets up larger timeouts to cater for slow machines;
b) Rewrites the per-type tests to use a single driver, so it can balance over single-type jobs;
c) Defaults the stress parallelism to N_CPU / 4

New test times are updated in PR body. @vnkozlov, would you like to try this in Oracle infra again?

@vnkozlov
Copy link
Contributor

@vnkozlov vnkozlov commented Dec 6, 2021

I started testing.

@vnkozlov
Copy link
Contributor

@vnkozlov vnkozlov commented Dec 7, 2021

All passed but I still got big time (bigger than you specified in test) with ZGC on Windows again but it did not failed (may be it passed by very small marging):

TEST: compiler/arraycopy/stress/TestStressArrayCopy.java
  build: 3.814 seconds
  compile: 1.518 seconds
  compile: 2.296 seconds
  build: 0.281 seconds
  compile: 0.281 seconds
  driver: 1.432 seconds
  build: 0.822 seconds
  compile: 0.822 seconds
  main: 4105.356 seconds
TEST RESULT: Passed. Execution successful

On linux-x64 time was : main: 1877.843 seconds

Both machines were AMD so testing was done upto AVX2 only.

Test time on Intel CPU with AVX512 and Windows has much smaller time: main: 1427.786 seconds

@shipilev
Copy link
Contributor Author

@shipilev shipilev commented Dec 7, 2021

All passed but I still got big time (bigger than you specified in test) with ZGC on Windows again but it did not failed (may be it passed by very small marging):

I believe hotspot tests run with timeoutFactor=4.0, so this has some leeway still. Want me to bump the timeout to 7200 to fit the "real" time then?

On linux-x64 time was : main: 1877.843 seconds
Both machines were AMD so testing was done upto AVX2 only.
Test time on Intel CPU with AVX512 and Windows has much smaller time: main: 1427.786 seconds

Yeah, as I said above, part of that is ZGC calling into runtime for object arraycopy barriers. On Windows, this also includes saving callee args required for barriers a bit awkwardly. So the combination of ZGC+Windows would be slow. I don't think we can do much about it, except skipping ZGC configs.

The tests themselves default to NCPU/4 parallelism, maybe the AMD-Windows node used a very low parallelism. If you want, try to do /3 or /2 here: https://github.com/openjdk/jdk/pull/6594/files#diff-232c02a1271aca771d1a81615066af6ed4ad8ab6ab55d71ec9ff3e2ec0011582R68.

@vnkozlov
Copy link
Contributor

@vnkozlov vnkozlov commented Dec 9, 2021

@shipilev I missed your answer. Yes, bump the timeout to 7200. And I think we are done after that.

@shipilev
Copy link
Contributor Author

@shipilev shipilev commented Dec 9, 2021

@shipilev I missed your answer. Yes, bump the timeout to 7200. And I think we are done after that.

Done in new commit. Thanks!

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Good.
Something happened with notifications. I also did not get your response.

@openjdk
Copy link

@openjdk openjdk bot commented Dec 14, 2021

@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8277893: Arraycopy stress tests

Reviewed-by: kvn, mli

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 2 new commits pushed to the master branch:

  • 5179672: 8278953: Clarify Class.getDeclaredConstructor specification
  • 8549d8b: 8277100: Dynamic dump can inadvertently overwrite default CDS archive

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Dec 14, 2021
@shipilev
Copy link
Contributor Author

@shipilev shipilev commented Dec 20, 2021

Rebased to current master. Tests still pass. I think I need a second (R)eviewer to push this, please.

Copy link

@Hamlin-Li Hamlin-Li left a comment

Lgtm. Just some minor comment in test code.

@Hamlin-Li
Copy link

@Hamlin-Li Hamlin-Li commented Dec 21, 2021

Thanks for updating, looks good.

@shipilev
Copy link
Contributor Author

@shipilev shipilev commented Dec 21, 2021

GHA are clean for the latest revision. This is test-only PR for JDK 19, so any future problems with it can be resolved next year. Meanwhile, we can get more testing over the end-of-the-year break. I am integrating.

/integrate

@openjdk
Copy link

@openjdk openjdk bot commented Dec 21, 2021

Going to push as commit 29bd736.
Since your change was applied there have been 3 commits pushed to the master branch:

  • ff5d417: 8278893: Parallel: Remove GCWorkerDelayMillis
  • 5179672: 8278953: Clarify Class.getDeclaredConstructor specification
  • 8549d8b: 8277100: Dynamic dump can inadvertently overwrite default CDS archive

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated label Dec 21, 2021
@openjdk openjdk bot closed this Dec 21, 2021
@openjdk openjdk bot removed ready rfr labels Dec 21, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Dec 21, 2021

@shipilev Pushed as commit 29bd736.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@shipilev shipilev deleted the JDK-8277893-arraycopy-tests branch Jan 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-compiler integrated
5 participants