Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8244490: [vector] Move Vector API micro benchmarks under test/micro #77

Closed

Conversation

@mgkwill
Copy link
Contributor

@mgkwill mgkwill commented May 3, 2021

Vector API micro benchmarks are currently located under test/jdk/jdk/incubator/vector/benchmark which makes them rather as "dead" code without possibility to built and run.

The proper location for micro benchmarks is actually test/micro/ directory.
It would be nice to move Vector API benchmarks there so they can be built automatically as part 'test-image' make target.
Once they are built they can be run as
make run-test TEST=micro:BENCHMARK_TEST_NAME


Progress

  • Change must not contain extraneous whitespace
  • Change must be properly reviewed

Issue

  • JDK-8244490: [vector] Move Vector API micro benchmarks under test/micro

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/panama-vector pull/77/head:pull/77
$ git checkout pull/77

Update a local copy of the PR:
$ git checkout pull/77
$ git pull https://git.openjdk.java.net/panama-vector pull/77/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 77

View PR using the GUI difftool:
$ git pr show -t 77

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/panama-vector/pull/77.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented May 3, 2021

👋 Welcome back mgkwill! A progress list of the required criteria for merging this PR into vectorIntrinsics will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label May 3, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented May 3, 2021

@mgkwill
Copy link
Contributor Author

@mgkwill mgkwill commented May 3, 2021

@PaulSandoz @katyapav Tried my hand at a draft of https://bugs.openjdk.java.net/browse/JDK-8244490.

If I am on the right track, any feedback or suggestions would be helpful.

@mgkwill
Copy link
Contributor Author

@mgkwill mgkwill commented May 3, 2021

@PaulSandoz @katyapav Tried my hand at a draft of https://bugs.openjdk.java.net/browse/JDK-8244490.

If I am on the right track, any feedback or suggestions would be helpful.

I was able to run using make test TEST="micro:Int64Vector" MICRO="FORK=2;WARMUP_ITER=5;" CONF=linux-x86_64-server-release with this patch.

@PaulSandoz
Copy link
Member

@PaulSandoz PaulSandoz commented May 3, 2021

@mgkwill thank you for doing this. I need to check out the PR locally and give it a test drive, but eyeballing it looks like you are on track. It unfortunately means there is some duplication in generation logic, but that seems unavoidable. Moving to this area is a prerequisit for integration into the mainline.

Separate from moving the location, but related to mainline integration, I have concerns as to the number of benchmarks generated. It may be we have to curate a default smaller set, but enable the ability for someone to generate more for local testing. We could consider that later on.

@mgkwill
Copy link
Contributor Author

@mgkwill mgkwill commented May 3, 2021

@mgkwill thank you for doing this. I need to check out the PR locally and give it a test drive, but eyeballing it looks like you are on track. It unfortunately means there is some duplication in generation logic, but that seems unavoidable. Moving to this area is a prerequisit for integration into the mainline.

Happy to do it. I stumbled upon the issue when I needed to run some of the tests. It seemed like something helpful and I thought it would be nice to get benchmarks into mainline. Let me know how the local test works.

Separate from moving the location, but related to mainline integration, I have concerns as to the number of benchmarks generated. It may be we have to curate a default smaller set, but enable the ability for someone to generate more for local testing. We could consider that later on.

Sounds reasonable to me. More guidance here and I'm happy to help there also.

@PaulSandoz
Copy link
Member

@PaulSandoz PaulSandoz commented May 6, 2021

I looked more in detail, there are some immediate issues, and i think an architectural issue with the split.

The more immediate issues are:

  1. Some benchmark tests fail to build via make. Some are related to bad ASCII characters in comments, some due to compiler warnings, and others due to external dependencies in the pom.xml (namely org.junit.jupiter:junit-jupiter-api). I think all those are fixable. (See https://openjdk.java.net/groups/build/doc/testing.html for configuring, compiling, and executing JMH tests.)
  2. The structure under test/micro is misleading, and does not follow directory to package name convention. This is also fixable. I think we need to remove the maven project and merge in under test/micro/org/openjdk/bench/jdk/incubator/vector. We remove the maven project and place the non-generated benchmark files directly under the aforementioned directory (and fix the ones that fail to compile). Generated benchmarks could be placed under, say, operation.

Initially I thought it was fine that generation scripts were split between unit tests and performance tests, but architecturally, i now think it better to keep one set of bash scripts for generated code, maintained under the vector module. These are complex and splitting will cause divergence. These scripts can generate the operation performance tests under the test/micro directory at the appropriate location.

@mgkwill
Copy link
Contributor Author

@mgkwill mgkwill commented May 6, 2021

I looked more in detail, there are some immediate issues, and i think an architectural issue with the split.

Thanks for the taking a look @PaulSandoz.

The more immediate issues are:

  1. Some benchmark tests fail to build via make. Some are related to bad ASCII characters in comments, some due to compiler warnings, and others due to external dependencies in the pom.xml (namely org.junit.jupiter:junit-jupiter-api). I think all those are fixable. (See https://openjdk.java.net/groups/build/doc/testing.html for configuring, compiling, and executing JMH tests.)

Noted. I'll take a look and fix.

  1. The structure under test/micro is misleading, and does not follow directory to package name convention. This is also fixable. I think we need to remove the maven project and merge in under test/micro/org/openjdk/bench/jdk/incubator/vector. We remove the maven project and place the non-generated benchmark files directly under the aforementioned directory (and fix the ones that fail to compile). Generated benchmarks could be placed under, say, operation.

+1

Initially I thought it was fine that generation scripts were split between unit tests and performance tests, but architecturally, i now think it better to keep one set of bash scripts for generated code, maintained under the vector module. These are complex and splitting will cause divergence. These scripts can generate the operation performance tests under the test/micro directory at the appropriate location.

Agreed. I was a bit worried about maintenance and divergence also. I'll revert that portion and instead generate under test/micro ... operation.

Updated patch coming in a few days.

mgkwill added 6 commits Jun 7, 2021
Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
This reverts commit ef323d7.
Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
@mgkwill mgkwill force-pushed the vector_microtest_update branch from ef323d7 to e7c9a1e Jun 8, 2021
@openjdk openjdk bot removed the rfr label Jun 8, 2021
Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
@openjdk openjdk bot added the rfr label Jun 8, 2021
@mgkwill
Copy link
Contributor Author

@mgkwill mgkwill commented Jun 8, 2021

Thanks again for your review @PaulSandoz. I set this aside until now due to priority work for jdk17 freeze.

Updated per review.

  1. Reverted original draft refactor, to keep perf. test generation code in original location.
  2. Changed location of generated perf. tests to test/micro/org/openjdk/bench/jdk/incubator/vector/operation
  3. Moved non-generated tests and respective directories to test/micro/org/openjdk/bench/jdk/incubator/vector/.

I.E.:
test/micro/org/openjdk/bench/jdk/incubator/vector/bigdata
test/micro/org/openjdk/bench/jdk/incubator/vector/crypto
test/micro/org/openjdk/bench/jdk/incubator/vector/utf8

  1. Fixed broken tests, now all tests build.
    4a. Fixed comments that used invalid characters
    4b. Refactored to remove junit dependency
    4c. Removed static from some @benchmark methods to fix static method should be qualified by type name, VectorDistance, instead of by an expression compile warning.

  2. Updated package location of all moved files and generated files to org.openjdk.bench.jdk.incubator.vector.<dir name>

  3. Removed pom.xml and other no longer needed files.

I can now build and run all tests using:
make test TEST="micro:org.openjdk.bench.jdk.incubator.vector" MICRO="FORK=2;WARMUP_ITER=5;" CONF=linux-x86_64-server-release

# Run complete. Total time: 04:40:06

Benchmark                                                               (ARRAY_LENGTH)  (dataSize)  (maxBytes)  (size)   Mode  Cnt       Score        Error   Units
o.o.b.j.i.v.bigdata.BooleanArrayCheck.filterAll                                   1024         N/A         N/A     N/A  thrpt    5    3884.266 ?    110.491  ops/ms
o.o.b.j.i.v.bigdata.BooleanArrayCheck.filterAll_vec                               1024         N/A         N/A     N/A  thrpt    5   16060.256 ?   5485.055  ops/ms
o.o.b.j.i.v.bigdata.ValueRangeCheckAndCastL2I.castL2I                             1024         N/A         N/A     N/A  thrpt    5    1470.185 ?     87.374  ops/ms
o.o.b.j.i.v.bigdata.ValueRangeCheckAndCastL2I.castL2I_vec                         1024         N/A         N/A     N/A  thrpt    5    1725.998 ?    763.099  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilOptimizedScalarDouble                N/A         N/A         N/A     N/A  thrpt    5    3466.435 ?     61.524  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilOptimizedScalarFloat                 N/A         N/A         N/A     N/A  thrpt    5    2949.337 ?    167.843  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilOptimizedVectorDouble128             N/A         N/A         N/A     N/A  thrpt    5    3831.298 ?    670.538  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilOptimizedVectorDouble256             N/A         N/A         N/A     N/A  thrpt    5    6916.338 ?   1240.536  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilOptimizedVectorDoubleMax             N/A         N/A         N/A     N/A  thrpt    5    6922.271 ?   1193.232  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilOptimizedVectorFloat128              N/A         N/A         N/A     N/A  thrpt    5    7667.293 ?   1627.940  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilOptimizedVectorFloat256              N/A         N/A         N/A     N/A  thrpt    5   12881.137 ?   2561.933  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilOptimizedVectorFloatMax              N/A         N/A         N/A     N/A  thrpt    5   12870.951 ?   2536.147  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilScalarDouble                         N/A         N/A         N/A     N/A  thrpt    5    3357.433 ?     77.093  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilScalarFloat                          N/A         N/A         N/A     N/A  thrpt    5    2812.819 ?    321.380  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilVectorDouble128                      N/A         N/A         N/A     N/A  thrpt    5    3834.837 ?    865.640  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilVectorDouble256                      N/A         N/A         N/A     N/A  thrpt    5    6894.900 ?   1743.191  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilVectorDoubleMax                      N/A         N/A         N/A     N/A  thrpt    5    6906.721 ?   1718.390  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilVectorFloat128                       N/A         N/A         N/A     N/A  thrpt    5    7443.532 ?   1897.726  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilVectorFloat256                       N/A         N/A         N/A     N/A  thrpt    5   12136.630 ?   2997.062  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.cosinesimilVectorFloatMax                       N/A         N/A         N/A     N/A  thrpt    5   12155.037 ?   2943.135  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.l2SquaredScalar                                 N/A         N/A         N/A     N/A  thrpt    5    3615.867 ?    188.036  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.l2SquaredVectorDouble128                        N/A         N/A         N/A     N/A  thrpt    5    4036.363 ?   1023.672  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.l2SquaredVectorDouble256                        N/A         N/A         N/A     N/A  thrpt    5    7766.613 ?   2090.510  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.l2SquaredVectorDoubleMax                        N/A         N/A         N/A     N/A  thrpt    5    7758.038 ?   2196.819  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.l2SquaredVectorFloat128                         N/A         N/A         N/A     N/A  thrpt    5    8603.290 ?   2372.100  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.l2SquaredVectorFloat256                         N/A         N/A         N/A     N/A  thrpt    5   17002.021 ?   5948.044  ops/ms
o.o.b.j.i.v.bigdata.VectorDistance.l2SquaredVectorFloatMax                         N/A         N/A         N/A     N/A  thrpt    5   17027.964 ?   5759.068  ops/ms
o.o.b.j.i.v.crypto.ChaChaBench.encrypt128                                          N/A       16384         N/A     N/A  thrpt    8   21730.541 ?   5043.050   ops/s
o.o.b.j.i.v.crypto.ChaChaBench.encrypt128                                          N/A       65536         N/A     N/A  thrpt    8    5496.309 ?   1245.802   ops/s
o.o.b.j.i.v.crypto.ChaChaBench.encrypt256                                          N/A       16384         N/A     N/A  thrpt    8   40332.908 ?  12166.064   ops/s
o.o.b.j.i.v.crypto.ChaChaBench.encrypt256                                          N/A       65536         N/A     N/A  thrpt    8   10304.809 ?   2968.075   ops/s
o.o.b.j.i.v.crypto.ChaChaBench.encrypt512                                          N/A       16384         N/A     N/A  thrpt    8     285.604 ?     82.992   ops/s
o.o.b.j.i.v.crypto.ChaChaBench.encrypt512                                          N/A       65536         N/A     N/A  thrpt    8      58.590 ?     16.202   ops/s
o.o.b.j.i.v.crypto.Poly1305Bench.auth128                                           N/A       16384         N/A     N/A  thrpt    8     201.353 ?     67.948   ops/s
o.o.b.j.i.v.crypto.Poly1305Bench.auth128                                           N/A       65536         N/A     N/A  thrpt    8      49.908 ?     15.856   ops/s
o.o.b.j.i.v.crypto.Poly1305Bench.auth256                                           N/A       16384         N/A     N/A  thrpt    8     364.051 ?    108.472   ops/s
o.o.b.j.i.v.crypto.Poly1305Bench.auth256                                           N/A       65536         N/A     N/A  thrpt    8      99.809 ?     33.255   ops/s
o.o.b.j.i.v.crypto.Poly1305Bench.auth512                                           N/A       16384         N/A     N/A  thrpt    8     472.031 ?    149.266   ops/s
o.o.b.j.i.v.crypto.Poly1305Bench.auth512                                           N/A       65536         N/A     N/A  thrpt    8     120.617 ?     35.399   ops/s
o.o.b.j.i.v.operation.Byte128Vector.ABS                                            N/A         N/A         N/A    1024  thrpt    5   16498.980 ?   3687.013  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ABSMasked                                      N/A         N/A         N/A    1024  thrpt    5   13420.584 ?   4553.559  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ADD                                            N/A         N/A         N/A    1024  thrpt    5   14389.853 ?   2559.982  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ADDLanes                                       N/A         N/A         N/A    1024  thrpt    5    5408.304 ?   1057.194  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ADDMasked                                      N/A         N/A         N/A    1024  thrpt    5   12571.391 ?   3170.842  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ADDMaskedLanes                                 N/A         N/A         N/A    1024  thrpt    5    4300.352 ?    911.484  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.AND                                            N/A         N/A         N/A    1024  thrpt    5   14197.257 ?   2830.585  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ANDLanes                                       N/A         N/A         N/A    1024  thrpt    5    5437.373 ?    607.144  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ANDMasked                                      N/A         N/A         N/A    1024  thrpt    5   12535.445 ?   3085.079  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ANDMaskedLanes                                 N/A         N/A         N/A    1024  thrpt    5    4251.096 ?   1278.371  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.AND_NOT                                        N/A         N/A         N/A    1024  thrpt    5   11770.249 ?   2634.626  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.AND_NOTMasked                                  N/A         N/A         N/A    1024  thrpt    5   10638.213 ?   3156.234  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ASHR                                           N/A         N/A         N/A    1024  thrpt    5    2698.630 ?    866.569  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ASHRMasked                                     N/A         N/A         N/A    1024  thrpt    5    2438.774 ?    716.473  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ASHRMaskedShift                                N/A         N/A         N/A    1024  thrpt    5    4884.817 ?   1014.613  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.ASHRShift                                      N/A         N/A         N/A    1024  thrpt    5    6436.859 ?    990.331  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.BITWISE_BLEND                                  N/A         N/A         N/A    1024  thrpt    5   11348.187 ?   3386.170  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.BITWISE_BLENDMasked                            N/A         N/A         N/A    1024  thrpt    5    9926.890 ?   4263.756  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.DIV                                            N/A         N/A         N/A    1024  thrpt    5     169.706 ?    165.121  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.DIVMasked                                      N/A         N/A         N/A    1024  thrpt    5      97.473 ?     77.981  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.EQ                                             N/A         N/A         N/A    1024  thrpt    5    1486.251 ?   2582.369  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.FIRST_NONZERO                                  N/A         N/A         N/A    1024  thrpt    5     491.076 ?    582.534  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.FIRST_NONZEROMasked                            N/A         N/A         N/A    1024  thrpt    5     396.484 ?    535.547  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.GE                                             N/A         N/A         N/A    1024  thrpt    5    1455.821 ?   2516.670  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.GT                                             N/A         N/A         N/A    1024  thrpt    5    1496.344 ?   2613.716  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.IS_DEFAULT                                     N/A         N/A         N/A    1024  thrpt    5    1447.973 ?   2375.492  ops/ms
o.o.b.j.i.v.operation.Byte128Vector.IS_NEGATIVE                                    N/A         N/A         N/A    1024  thrpt    5    1429.487 ?   2438.604  ops/ms

...

o.o.b.j.i.v.operation.ShortScalar.scatter256                                       N/A         N/A         N/A    1024  thrpt    5    1348.385 ?    140.870  ops/ms
o.o.b.j.i.v.operation.ShortScalar.scatter512                                       N/A         N/A         N/A    1024  thrpt    5    1444.297 ?    101.135  ops/ms
o.o.b.j.i.v.operation.ShortScalar.scatterBase0                                     N/A         N/A         N/A    1024  thrpt    5    1682.814 ?     62.974  ops/ms
o.o.b.j.i.v.operation.ShortScalar.zero                                             N/A         N/A         N/A    1024  thrpt    5   24059.679 ?   2452.079  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI128                                    N/A         N/A         N/A      64  thrpt    5     124.008 ?    167.291  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI128                                    N/A         N/A         N/A    1024  thrpt    5       8.291 ?     11.679  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI128                                    N/A         N/A         N/A   65536  thrpt    5       0.132 ?      0.194  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI256                                    N/A         N/A         N/A      64  thrpt    5     104.170 ?    149.013  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI256                                    N/A         N/A         N/A    1024  thrpt    5       5.604 ?      9.367  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI256                                    N/A         N/A         N/A   65536  thrpt    5       0.094 ?      0.132  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI512                                    N/A         N/A         N/A      64  thrpt    5      12.467 ?     14.546  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI512                                    N/A         N/A         N/A    1024  thrpt    5       0.781 ?      0.915  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI512                                    N/A         N/A         N/A   65536  thrpt    5       0.012 ?      0.014  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI64                                     N/A         N/A         N/A      64  thrpt    5      41.097 ?     48.712  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI64                                     N/A         N/A         N/A    1024  thrpt    5       2.555 ?      3.112  ops/ms
o.o.b.j.i.v.operation.SortVector.sortVectorI64                                     N/A         N/A         N/A   65536  thrpt    5       0.041 ?      0.048  ops/ms
o.o.b.j.i.v.operation.SumOfUnsignedBytes.scalar                                    N/A         N/A         N/A      64  thrpt    5   34764.547 ?   2124.872  ops/ms
o.o.b.j.i.v.operation.SumOfUnsignedBytes.scalar                                    N/A         N/A         N/A    1024  thrpt    5    2604.449 ?     63.086  ops/ms
o.o.b.j.i.v.operation.SumOfUnsignedBytes.scalar                                    N/A         N/A         N/A    4096  thrpt    5     653.640 ?     24.208  ops/ms
o.o.b.j.i.v.operation.SumOfUnsignedBytes.vectorInt                                 N/A         N/A         N/A      64  thrpt    5   68950.388 ?  27430.053  ops/ms
o.o.b.j.i.v.operation.SumOfUnsignedBytes.vectorInt                                 N/A         N/A         N/A    1024  thrpt    5   15432.548 ?   7166.145  ops/ms
o.o.b.j.i.v.operation.SumOfUnsignedBytes.vectorInt                                 N/A         N/A         N/A    4096  thrpt    5    4068.423 ?   1865.103  ops/ms
o.o.b.j.i.v.operation.SumOfUnsignedBytes.vectorShort                               N/A         N/A         N/A      64  thrpt    5   52088.764 ?  56731.840  ops/ms
o.o.b.j.i.v.operation.SumOfUnsignedBytes.vectorShort                               N/A         N/A         N/A    1024  thrpt    5    4369.735 ?   7629.283  ops/ms
o.o.b.j.i.v.operation.SumOfUnsignedBytes.vectorShort                               N/A         N/A         N/A    4096  thrpt    5    2692.982 ?   3791.073  ops/ms
o.o.b.j.i.v.utf8.DecodeBench.decodeScalar                                          N/A       32768           1     N/A  thrpt    8   54369.096 ?   1063.656   ops/s
o.o.b.j.i.v.utf8.DecodeBench.decodeScalar                                          N/A     8388608           1     N/A  thrpt    8     151.256 ?      6.312   ops/s
o.o.b.j.i.v.utf8.DecodeBench.decodeVector                                          N/A       32768           1     N/A  thrpt    8    6452.803 ?   2950.714   ops/s
o.o.b.j.i.v.utf8.DecodeBench.decodeVector                                          N/A     8388608           1     N/A  thrpt    8      24.918 ?     12.696   ops/s
o.o.b.j.i.v.utf8.DecodeBench.decodeVectorASCII                                     N/A       32768           1     N/A  thrpt    8  261340.791 ? 102913.900   ops/s
o.o.b.j.i.v.utf8.DecodeBench.decodeVectorASCII                                     N/A     8388608           1     N/A  thrpt    8     293.033 ?     74.431   ops/s

I am getting java.lang.RuntimeException: Incorrect result for DecodeBench.java test run:

java.lang.RuntimeException: Incorrect result
        at org.openjdk.bench.jdk.incubator.vector.utf8.DecodeBench.tearDownInvocation(DecodeBench.java:204)
        at org.openjdk.bench.jdk.incubator.vector.utf8.jmh_generated.DecodeBench_decodeVectorASCII_jmhTest.decodeVectorASCII_thrpt_jmhStub(DecodeBench_decodeVectorASCII_jmhTest.java:127)
        at org.openjdk.bench.jdk.incubator.vector.utf8.jmh_generated.DecodeBench_decodeVectorASCII_jmhTest.decodeVectorASCII_Throughput(DecodeBench_decodeVectorASCII_jmhTest.java:85)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:453)
        at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:437)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:831)

I'm looking into it, I'm not sure if this is a lack of hardware issue, test design, or bug. But it should not be failing https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/utf8/DecodeBench.java#L201

@TearDown(Level.Invocation)
    public void tearDownInvocation() {
        out = new String(dst.array());
        if (!in.equals(out)) {
            System.out.println("in  = (" + in.length() + ") \"" + arrayToString(in.getBytes()) + "\"");
            System.out.println("out = (" + out.length() + ") \"" + arrayToString(out.getBytes()) + "\"");
            throw new RuntimeException("Incorrect result");
        }
    }

Previously I was only running Int64Vector perf test, which hid build issues with other tests. Any more feedback is appreciated.

Separate from moving the location, but related to mainline integration, I have concerns as to the number of benchmarks generated. It may be we have to curate a default smaller set, but enable the ability for someone to generate more for local testing. We could consider that later on.

Do you have more thoughts here? Which tests should be included versus excluded?

mgkwill added 2 commits Jun 8, 2021
Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
Copy link
Member

@PaulSandoz PaulSandoz left a comment

Thank you. Looks good. Just fix the license headers before integrating.

Now we have the benchmarks integrated into the build system we can consider what it means to "curate".

I am wondering if we should package the vector benchmarks as a separate jar to the other benchmarks. It might sound strange to say that given they were separate to begin with! but they were not integrated with the build system. That likely requires changes to the build system and discussion with the maintainers of the build system. We can start that conversation after integration.

We can follow up with the DecodeBench.java issue after integration.

test/jdk/jdk/incubator/vector/BENCHMARKS.md Outdated Show resolved Hide resolved
@openjdk
Copy link

@openjdk openjdk bot commented Jun 9, 2021

@mgkwill This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8244490: [vector] Move Vector API micro benchmarks under test/micro

Reviewed-by: psandoz

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the vectorIntrinsics branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@PaulSandoz) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready label Jun 9, 2021
Signed-off-by: Marcus G K Williams <marcus.williams@intel.com>
@mgkwill
Copy link
Contributor Author

@mgkwill mgkwill commented Jun 10, 2021

Thank you. Looks good. Just fix the license headers before integrating.

Now we have the benchmarks integrated into the build system we can consider what it means to "curate".

Agreed. Perhaps a conversation for the mailing list.

I am wondering if we should package the vector benchmarks as a separate jar to the other benchmarks. It might sound strange to say that given they were separate to begin with! but they were not integrated with the build system. That likely requires changes to the build system and discussion with the maintainers of the build system. We can start that conversation after integration.

+1

We can follow up with the DecodeBench.java issue after integration.

+1

@mgkwill
Copy link
Contributor Author

@mgkwill mgkwill commented Jun 10, 2021

/integrate

@openjdk openjdk bot added the sponsor label Jun 10, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Jun 10, 2021

@mgkwill
Your change (at version a2c7ed8) is now ready to be sponsored by a Committer.

@PaulSandoz
Copy link
Member

@PaulSandoz PaulSandoz commented Jun 10, 2021

/sponsor

@openjdk
Copy link

@openjdk openjdk bot commented Jun 10, 2021

Going to push as commit 86a8d38.

@openjdk
Copy link

@openjdk openjdk bot commented Jun 10, 2021

@PaulSandoz @mgkwill Pushed as commit 86a8d38.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@mgkwill mgkwill deleted the vector_microtest_update branch Jun 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
2 participants