@TheRealMDoerr
Contributor

@TheRealMDoerr TheRealMDoerr commented May 23, 2024

PPC64 implementation of JDK-8180450. Please review!
I noticed that r_array_length is sometimes 0 and I don't see code for that on x86. Any idea? (This has been addressed in the discussion.)
How can we verify it? By comparing the performance using the micro benchmarks?

Micro benchmark results without patch (measured on Power10 with 2*8 hardware threads):

Original 
SecondarySuperCacheHits: 13.033 ±(99.9%) 0.058 ns/op [Average]
SecondarySuperCacheInterContention.test     avgt   15  432.366 ±  8.364  ns/op
SecondarySuperCacheInterContention.test:t1  avgt   15  432.310 ±  8.460  ns/op
SecondarySuperCacheInterContention.test:t2  avgt   15  432.422 ± 10.819  ns/op
SecondarySuperCacheIntraContention.test  avgt   15  355.192 ± 3.597  ns/op
SecondarySupersLookup.testNegative00  avgt   15  12.274 ± 0.026  ns/op
SecondarySupersLookup.testNegative01  avgt   15  12.300 ± 0.039  ns/op
SecondarySupersLookup.testNegative02  avgt   15  12.304 ± 0.034  ns/op
SecondarySupersLookup.testNegative03  avgt   15  12.276 ± 0.050  ns/op
SecondarySupersLookup.testNegative04  avgt   15  12.235 ± 0.044  ns/op
SecondarySupersLookup.testNegative05  avgt   15  12.308 ± 0.156  ns/op
SecondarySupersLookup.testNegative06  avgt   15  12.291 ± 0.048  ns/op
SecondarySupersLookup.testNegative07  avgt   15  12.307 ± 0.052  ns/op
SecondarySupersLookup.testNegative08  avgt   15  12.398 ± 0.075  ns/op
SecondarySupersLookup.testNegative09  avgt   15  12.552 ± 0.122  ns/op
SecondarySupersLookup.testNegative10  avgt   15  12.490 ± 0.083  ns/op
SecondarySupersLookup.testNegative16  avgt   15  12.565 ± 0.092  ns/op
SecondarySupersLookup.testNegative20  avgt   15  19.059 ± 0.958  ns/op
SecondarySupersLookup.testNegative30  avgt   15  19.268 ± 0.124  ns/op
SecondarySupersLookup.testNegative32  avgt   15  20.059 ± 0.114  ns/op
SecondarySupersLookup.testNegative40  avgt   15  25.117 ± 0.368  ns/op
SecondarySupersLookup.testNegative50  avgt   15  32.735 ± 0.359  ns/op
SecondarySupersLookup.testNegative55  avgt   15  34.866 ± 0.152  ns/op
SecondarySupersLookup.testNegative56  avgt   15  35.492 ± 0.276  ns/op
SecondarySupersLookup.testNegative57  avgt   15  36.620 ± 0.334  ns/op
SecondarySupersLookup.testNegative58  avgt   15  37.226 ± 0.180  ns/op
SecondarySupersLookup.testNegative59  avgt   15  37.774 ± 0.241  ns/op
SecondarySupersLookup.testNegative60  avgt   15  38.627 ± 1.451  ns/op
SecondarySupersLookup.testNegative61  avgt   15  39.395 ± 0.249  ns/op
SecondarySupersLookup.testNegative62  avgt   15  40.047 ± 0.377  ns/op
SecondarySupersLookup.testNegative63  avgt   15  40.703 ± 0.416  ns/op
SecondarySupersLookup.testNegative64  avgt   15  41.067 ± 0.216  ns/op
SecondarySupersLookup.testPositive01  avgt   15  12.333 ± 0.037  ns/op
SecondarySupersLookup.testPositive02  avgt   15  12.345 ± 0.042  ns/op
SecondarySupersLookup.testPositive03  avgt   15  12.353 ± 0.075  ns/op
SecondarySupersLookup.testPositive04  avgt   15  12.320 ± 0.021  ns/op
SecondarySupersLookup.testPositive05  avgt   15  12.326 ± 0.046  ns/op
SecondarySupersLookup.testPositive06  avgt   15  12.327 ± 0.063  ns/op
SecondarySupersLookup.testPositive07  avgt   15  12.317 ± 0.040  ns/op
SecondarySupersLookup.testPositive08  avgt   15  12.356 ± 0.042  ns/op
SecondarySupersLookup.testPositive09  avgt   15  12.346 ± 0.044  ns/op
SecondarySupersLookup.testPositive10  avgt   15  12.371 ± 0.040  ns/op
SecondarySupersLookup.testPositive16  avgt   15  12.371 ± 0.078  ns/op
SecondarySupersLookup.testPositive20  avgt   15  12.295 ± 0.048  ns/op
SecondarySupersLookup.testPositive30  avgt   15  12.249 ± 0.170  ns/op
SecondarySupersLookup.testPositive32  avgt   15  12.158 ± 0.026  ns/op
SecondarySupersLookup.testPositive40  avgt   15  12.173 ± 0.049  ns/op
SecondarySupersLookup.testPositive50  avgt   15  12.194 ± 0.053  ns/op
SecondarySupersLookup.testPositive60  avgt   15  12.153 ± 0.032  ns/op
SecondarySupersLookup.testPositive63  avgt   15  12.218 ± 0.088  ns/op
SecondarySupersLookup.testPositive64  avgt   15  12.208 ± 0.038  ns/op
TypePollution.instanceOfInterfaceSwitchLinearNoSCC          avgt   12  11470.410 ± 81.462  ns/op
TypePollution.instanceOfInterfaceSwitchLinearSCC            avgt   12  10784.154 ± 31.282  ns/op
TypePollution.instanceOfInterfaceSwitchTableNoSCC           avgt   12  11455.959 ± 76.254  ns/op
TypePollution.instanceOfInterfaceSwitchTableSCC             avgt   12  10723.693 ± 32.825  ns/op
TypePollution.parallelInstanceOfInterfaceSwitchLinearNoSCC  avgt   12    152.832 ±  1.474  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchLinearSCC    avgt   12    132.049 ±  4.752  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchTableNoSCC   avgt   12    150.466 ±  7.390  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchTableSCC     avgt   12    134.714 ±  1.128  ms/op

With patch:

SecondarySuperCacheHits.test  avgt   15  13.020 ± 0.006  ns/op
SecondarySuperCacheInterContention.test     avgt   15  423.927 ± 11.187  ns/op
SecondarySuperCacheInterContention.test:t1  avgt   15  427.683 ± 12.014  ns/op
SecondarySuperCacheInterContention.test:t2  avgt   15  420.171 ± 14.093  ns/op
SecondarySuperCacheIntraContention.test  avgt   15  355.758 ± 6.324  ns/op
SecondarySupersLookup.testNegative00  avgt   15  12.192 ± 0.035  ns/op
SecondarySupersLookup.testNegative01  avgt   15  12.218 ± 0.067  ns/op
SecondarySupersLookup.testNegative02  avgt   15  12.191 ± 0.030  ns/op
SecondarySupersLookup.testNegative03  avgt   15  12.196 ± 0.033  ns/op
SecondarySupersLookup.testNegative04  avgt   15  12.213 ± 0.042  ns/op
SecondarySupersLookup.testNegative05  avgt   15  12.173 ± 0.018  ns/op
SecondarySupersLookup.testNegative06  avgt   15  12.197 ± 0.040  ns/op
SecondarySupersLookup.testNegative07  avgt   15  12.178 ± 0.030  ns/op
SecondarySupersLookup.testNegative08  avgt   15  12.197 ± 0.059  ns/op
SecondarySupersLookup.testNegative09  avgt   15  12.159 ± 0.047  ns/op
SecondarySupersLookup.testNegative10  avgt   15  12.157 ± 0.038  ns/op
SecondarySupersLookup.testNegative16  avgt   15  12.190 ± 0.050  ns/op
SecondarySupersLookup.testNegative20  avgt   15  12.178 ± 0.101  ns/op
SecondarySupersLookup.testNegative30  avgt   15  12.228 ± 0.062  ns/op
SecondarySupersLookup.testNegative32  avgt   15  12.196 ± 0.063  ns/op
SecondarySupersLookup.testNegative40  avgt   15  12.203 ± 0.074  ns/op
SecondarySupersLookup.testNegative50  avgt   15  12.177 ± 0.056  ns/op
SecondarySupersLookup.testNegative55  avgt   15  12.212 ± 0.065  ns/op
SecondarySupersLookup.testNegative56  avgt   15  12.205 ± 0.063  ns/op
SecondarySupersLookup.testNegative57  avgt   15  12.215 ± 0.061  ns/op
SecondarySupersLookup.testNegative58  avgt   15  12.196 ± 0.042  ns/op
SecondarySupersLookup.testNegative59  avgt   15  12.211 ± 0.055  ns/op
SecondarySupersLookup.testNegative60  avgt   15  12.409 ± 0.181  ns/op
SecondarySupersLookup.testNegative61  avgt   15  12.445 ± 0.184  ns/op
SecondarySupersLookup.testNegative62  avgt   15  12.461 ± 0.207  ns/op
SecondarySupersLookup.testNegative63  avgt   15  59.441 ± 0.549  ns/op
SecondarySupersLookup.testNegative64  avgt   15  59.834 ± 0.312  ns/op
SecondarySupersLookup.testPositive01  avgt   15  12.176 ± 0.020  ns/op
SecondarySupersLookup.testPositive02  avgt   15  12.188 ± 0.036  ns/op
SecondarySupersLookup.testPositive03  avgt   15  12.207 ± 0.042  ns/op
SecondarySupersLookup.testPositive04  avgt   15  12.214 ± 0.041  ns/op
SecondarySupersLookup.testPositive05  avgt   15  12.239 ± 0.032  ns/op
SecondarySupersLookup.testPositive06  avgt   15  12.193 ± 0.054  ns/op
SecondarySupersLookup.testPositive07  avgt   15  12.182 ± 0.039  ns/op
SecondarySupersLookup.testPositive08  avgt   15  12.190 ± 0.020  ns/op
SecondarySupersLookup.testPositive09  avgt   15  12.195 ± 0.065  ns/op
SecondarySupersLookup.testPositive10  avgt   15  12.178 ± 0.029  ns/op
SecondarySupersLookup.testPositive16  avgt   15  12.202 ± 0.033  ns/op
SecondarySupersLookup.testPositive20  avgt   15  12.192 ± 0.048  ns/op
SecondarySupersLookup.testPositive30  avgt   15  12.150 ± 0.034  ns/op
SecondarySupersLookup.testPositive32  avgt   15  12.141 ± 0.030  ns/op
SecondarySupersLookup.testPositive40  avgt   15  12.159 ± 0.063  ns/op
SecondarySupersLookup.testPositive50  avgt   15  12.174 ± 0.052  ns/op
SecondarySupersLookup.testPositive60  avgt   15  12.178 ± 0.046  ns/op
SecondarySupersLookup.testPositive63  avgt   15  12.200 ± 0.060  ns/op
SecondarySupersLookup.testPositive64  avgt   15  12.179 ± 0.041  ns/op
TypePollution.instanceOfInterfaceSwitchLinearNoSCC          avgt   12  11458.792 ±  31.853  ns/op
TypePollution.instanceOfInterfaceSwitchLinearSCC            avgt   12  10816.996 ± 117.688  ns/op
TypePollution.instanceOfInterfaceSwitchTableNoSCC           avgt   12   7409.874 ±  10.493  ns/op
TypePollution.instanceOfInterfaceSwitchTableSCC             avgt   12   7378.197 ±  10.471  ns/op
TypePollution.parallelInstanceOfInterfaceSwitchLinearNoSCC  avgt   12    153.648 ±   2.088  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchLinearSCC    avgt    9    136.728 ±   1.059  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchTableNoSCC   avgt    8     74.106 ±   0.272  ms/op
TypePollution.parallelInstanceOfInterfaceSwitchTableSCC     avgt    5     73.969 ±   0.528  ms/op

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8331117: [PPC64] secondary_super_cache does not scale well (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19368/head:pull/19368
$ git checkout pull/19368

Update a local copy of the PR:
$ git checkout pull/19368
$ git pull https://git.openjdk.org/jdk.git pull/19368/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 19368

View PR using the GUI difftool:
$ git pr show -t 19368

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19368.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented May 23, 2024

👋 Welcome back mdoerr! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented May 23, 2024

@TheRealMDoerr This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8331117: [PPC64] secondary_super_cache does not scale well

Reviewed-by: rrich, amitkumar

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 332 new commits pushed to the master branch:

  • 8464ce6: 8332113: Update nsk.share.Log to be always verbose
  • 548e95a: 8330702: Update failure handler to don't generate Error message if cores actions are empty
  • dae0bda: 8334252: Verifier error for lambda declared in early construction context
  • b5212d7: 8328107: Shenandoah/C2: TestVerifyLoopOptimizations test failure
  • efab48c: 8333714: Cleanup the usages of CHECK_EXCEPTION_NULL_FAIL macro in java launcher
  • cc64aea: 8332400: isspace argument should be a valid unsigned char
  • 9b0a5c5: 8333248: VectorGatherMaskFoldingTest.java failed when maximum vector bits is 64
  • 6861766: 8332818: ubsan: archiveHeapLoader.cpp:70:27: runtime error: applying non-zero offset 18446744073707454464 to null pointer
  • b818679: 8293980: Resolve CONSTANT_FieldRef at CDS dump time
  • eb2488f: 8330198: Add some class loading related perf counters to measure VM startup
  • ... and 322 more: https://git.openjdk.org/jdk/compare/e19a421c30534566ba0dea0fa84f812ebeecfc87...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the rfr Pull request is ready for review label May 23, 2024
@openjdk

openjdk bot commented May 23, 2024

@TheRealMDoerr The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label May 23, 2024
@mlbridge

mlbridge bot commented May 23, 2024

@TheRealMDoerr
Contributor Author

@theRealAph: It would be great if you could take a look and see if you can spot any bug. Especially, I wonder why r_array_length happens to be 0 in some cases, but x86 doesn't check.

@theRealAph
Contributor

> @theRealAph: It would be great if you could take a look and see if you can spot any bug. Especially, I wonder why r_array_length happens to be 0 in some cases, but x86 doesn't check.

Why would it not be zero? Some classes don't have secondary super types.
In addition, 12ns is very slow. I don't understand that.

@theRealAph
Contributor

> PPC64 implementation of JDK-8180450. Please review! I noticed that r_array_length is sometimes 0 and I don't see code for that on x86. Any idea? How can we verify it? By comparing the performance using the micro benchmarks?

Run all of tier1 with -XX:+VerifySecondarySupers

@TheRealMDoerr
Contributor Author

TheRealMDoerr commented May 27, 2024

> Why would it not be zero? Some classes don't have secondary super types.

I had to check for r_array_length >= 0 here: https://github.com/openjdk/jdk/pull/19368/files#diff-0f708565c9e138b8013165540634368334f5d1df2ba437e39696e9791440050dR2312
The x86 implementation doesn't do that and I wonder why. Doesn't it access stale memory, here?

cmpq(r_super_klass, Address(r_array_base, r_array_index, Address::times_8));

@TheRealMDoerr
Contributor Author

> 12ns is very slow. I don't understand that.

Right, it's surprisingly slow regardless of whether the patch is applied. My x86 machine is about 8x faster. The PPC64 machine isn't optimized for single-thread performance: it's configured to use SMT8 (8 threads per core). I guess s390 will achieve better single-thread performance.

@theRealAph
Contributor

> Why would it not be zero? Some classes don't have secondary super types.
>
> I had to check for r_array_length >= 0 here: https://github.com/openjdk/jdk/pull/19368/files#diff-0f708565c9e138b8013165540634368334f5d1df2ba437e39696e9791440050dR2312 The x86 implementation doesn't do that and I wonder why. Doesn't it access stale memory, here?

No, because we already checked that there must be something to look at before calling the slow path.

Invariant: array_length == popcount(bitmap)

> cmpq(r_super_klass, Address(r_array_base, r_array_index, Address::times_8));

If there's a bit set in the bitmap, then there must be a corresponding entry in the array. If we get to the slow path there must be at least two bits set in the bitmap. Therefore, at this point, array_length >= 2.
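
The invariant above can be illustrated with a small stand-alone sketch. This is not the HotSpot code: the struct, its member names, the use of `int` ids in place of `Klass*`, and the lack of collision handling are all assumptions made for illustration. It only shows why a set bit in the bitmap guarantees a matching entry in the packed array, and how a population count turns a slot number into an array index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable population count (number of set bits).
inline int popcount64(uint64_t x) {
  int n = 0;
  for (; x != 0; x &= x - 1) ++n;  // clears the lowest set bit each round
  return n;
}

// Hypothetical stand-in for a class's secondary supers: a 64-bit occupancy
// bitmap plus a packed array with exactly one entry per set bit.
struct SupersTableSketch {
  uint64_t bitmap = 0;
  std::vector<int> packed;  // entries ordered by ascending bit position

  // Place `id` at a free slot (this sketch does not handle collisions).
  void insert(int id, unsigned slot) {
    assert(slot < 64 && ((bitmap >> slot) & 1) == 0);
    uint64_t below = bitmap & ((uint64_t{1} << slot) - 1);
    packed.insert(packed.begin() + popcount64(below), id);
    bitmap |= uint64_t{1} << slot;
  }

  // First probe only: index into `packed` for `slot`, or -1 if the bitmap
  // proves that the slot is empty.
  int first_probe_index(unsigned slot) const {
    if (((bitmap >> slot) & 1) == 0) return -1;
    uint64_t at_or_below = bitmap << (63 - slot);  // drops bits above `slot`
    return popcount64(at_or_below) - 1;
  }

  // Invariant from the discussion: array_length == popcount(bitmap).
  bool invariant_holds() const {
    return packed.size() == static_cast<std::size_t>(popcount64(bitmap));
  }
};
```

In this sketch, any index returned by `first_probe_index` is smaller than `popcount64(bitmap)`, so it is always within the packed array as long as the invariant holds.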

@theRealAph
Contributor

cmpdi(CCR0, r_bitmap, (bit + 1) & Klass::SECONDARY_SUPERS_TABLE_MASK);

Why is this a compare, not a bit test?
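
To make the distinction concrete, here is a hedged sketch in plain C++ rather than PPC64 assembly. The helper names and the constant `kTableMask` (assumed to be 63 for a 64-slot table) are hypothetical. A bit test asks whether the slot following `bit` is occupied, whereas a value comparison asks whether the whole bitmap equals the slot number, which is a different question.

```cpp
#include <cstdint>

// Assumed table mask for a 64-slot table (illustrative constant).
constexpr unsigned kTableMask = 63;

// Bit test: is the slot after `bit` occupied (wrapping around at 64)?
inline bool next_slot_occupied(uint64_t bitmap, unsigned bit) {
  return ((bitmap >> ((bit + 1) & kTableMask)) & 1) != 0;
}

// Value compare: treats the slot number as a value, not as a bit position.
inline bool bitmap_equals_slot_number(uint64_t bitmap, unsigned bit) {
  return bitmap == ((bit + 1) & kTableMask);
}
```

For a bitmap with bits 5 and 6 set and `bit == 5`, the bit test reports an occupied next slot while the value comparison does not.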

@TheRealMDoerr
Contributor Author

Thank you so much for finding my bug! This explains why I got array_length 0. Fixed and added an assertion (see 2nd commit).

@TheRealMDoerr
Contributor Author

Performance seems to be not affected by that bug. Note that I have used #19427 to run TypePollution micro benchmarks.

@TheRealMDoerr
Contributor Author

/label add hotspot-compiler

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label May 28, 2024
@openjdk

openjdk bot commented May 28, 2024

@TheRealMDoerr
The hotspot-compiler label was successfully added.

@theRealAph
Contributor

> Performance seems to be not affected by that bug.

That is extremely suspicious.


li(result, 1); // failure
// We test the MSB of r_array_index, i.e. its sign bit
bgt(CCR0, L_fallthrough);

Contributor

This looks wrong. Should be greater or equal.

Contributor Author

Right. Fixed. Thank you!
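
A hedged illustration of why "greater" is too narrow when testing a sign bit, written in plain C++ rather than PPC64 assembly. The helper names are hypothetical, and the sketch assumes the slot's bit has been shifted into bit 63 before the test. A clear sign bit means the value is greater than or equal to zero; a strict "greater than zero" test misses the case where the shifted value is exactly zero.

```cpp
#include <cstdint>

// Move the bit for `slot` into bit 63 and reinterpret the result as signed
// (assumes the usual two's-complement conversion).
inline int64_t shift_slot_to_sign_bit(uint64_t bitmap, unsigned slot) {
  return static_cast<int64_t>(bitmap << (63 - slot));
}

// Correct miss test: sign bit clear, i.e. value >= 0.
inline bool miss_ge(uint64_t bitmap, unsigned slot) {
  return shift_slot_to_sign_bit(bitmap, slot) >= 0;
}

// Too narrow: value > 0 does not cover the all-zero case.
inline bool miss_gt(uint64_t bitmap, unsigned slot) {
  return shift_slot_to_sign_bit(bitmap, slot) > 0;
}
```

The two tests disagree only when every bit at or below the slot is zero, for example a bitmap with just bit 10 set and `slot == 3`.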

@offamitkumar
Member

@TheRealMDoerr I got one test failure on PPC with these changes:

diff --git a/src/hotspot/share/runtime/globals.hpp b/src/hotspot/share/runtime/globals.hpp
index 6bfb260606b..70897a1066e 100644
--- a/src/hotspot/share/runtime/globals.hpp
+++ b/src/hotspot/share/runtime/globals.hpp
@@ -1988,13 +1988,13 @@ const int ObjectAlignmentInBytes = 8;
                 "rewriting/transformation independently of the JVMTI "      \
                 "can_{retransform/redefine}_classes capabilities.")         \
                                                                             \
-  product(bool, UseSecondarySupersCache, true, DIAGNOSTIC,                  \
+  product(bool, UseSecondarySupersCache, false, DIAGNOSTIC,                  \
                 "Use secondary supers cache during subtype checks.")        \
                                                                             \
-  product(bool, UseSecondarySupersTable, false, DIAGNOSTIC,                 \
+  product(bool, UseSecondarySupersTable, true, DIAGNOSTIC,                 \
                 "Use hash table to lookup secondary supers.")               \
                                                                             \
-  product(bool, VerifySecondarySupers, false, DIAGNOSTIC,                   \
+  product(bool, VerifySecondarySupers, true, DIAGNOSTIC,                   \
           "Check that linear and hashed secondary lookups return the same result.") \
                                                                             \
   product(bool, StressSecondarySupers, false, DIAGNOSTIC,                   \
==============================
Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR   
   jtreg:./test/hotspot/jtreg/compiler/c2/irTests/ProfileAtTypeCheck.java
>>                                                       1     0     1     0 <<
==============================
TEST FAILURE

But if I revert the changes I made, it passes. I'm facing the same situation on s390x. Is this expected?

failure log:
type_profile_failure.log

@TheRealMDoerr
Contributor Author

That doesn't look like a platform specific thing. I'm getting the same result on x86_64. @theRealAph: Is that a known limitation or is it worth a new JBS issue?

// points to the length, we don't need to adjust it to point to the
// data.
assert(Array<Klass*>::base_offset_in_bytes() == wordSize, "Adjust this code");
assert(Array<Klass*>::length_offset_in_bytes() == 0, "Adjust this code");

Member

I don't understand why the assertion for Array<Klass*>::length_offset_in_bytes() is needed.
Isn't it sufficient to assert Array<Klass*>::base_offset_in_bytes() == wordSize?
What would break if Array<Klass*>::length_offset_in_bytes() == 4?

Contributor Author

The assertion is pointless. I've removed it.

Member

@reinrich reinrich left a comment

Hi Martin,
thanks for the port. It looks good. I've only got a few minor comments.
Cheers, Richard.

u1 super_klass_slot) {
assert_different_registers(r_sub_klass, r_super_klass, temp1, temp2, temp3, temp4, result);

Label L_fallthrough;

Member

L_done would be a better name.

beq(CCR0, L_fallthrough); // (result != 0)

// Linear probe. Rotate the bitmap so that the next bit to test is
// in Bit 1.

Member

It's bit 2 that's tested next after the rotation, isn't it? See L2331 in lookup_secondary_supers_table_slow_path

Suggested change
// in Bit 1.
// in Bit 2.

Contributor

Yes, it's rather confusing language. In fact, the bit we just tested is in Bit 1.
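
The bit positions discussed here can be checked with a small sketch. It is an illustration under assumptions, not the PPC64 code: the helper names are hypothetical and the table is assumed to have 64 slots. After rotating the bitmap right by the home slot number, the home slot sits in bit 0, the slot after it (already tested before the rotation, per the discussion) in bit 1, and the following one in bit 2.

```cpp
#include <cstdint>

// Rotate a 64-bit value right by n bits.
inline uint64_t rotate_right(uint64_t x, unsigned n) {
  n &= 63;
  return n == 0 ? x : (x >> n) | (x << (64 - n));
}

// After rotating by `slot`, bit k of the result is original bit (slot + k) % 64.
inline bool bit_after_rotation(uint64_t bitmap, unsigned slot, unsigned k) {
  return ((rotate_right(bitmap, slot) >> k) & 1) != 0;
}

// Length of the run of occupied slots starting at `slot` (wrapping around),
// i.e. the number of entries a linear probe may have to visit.
inline unsigned probe_run_length(uint64_t bitmap, unsigned slot) {
  uint64_t r = rotate_right(bitmap, slot);  // home slot is now bit 0
  unsigned run = 0;
  while (run < 64 && (r & 1) != 0) {
    ++run;
    r >>= 1;
  }
  return run;
}
```

With slots 62, 63 and 0 occupied and a home slot of 62, the run wraps around the end of the bitmap and has length 3.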


LOOKUP_SECONDARY_SUPERS_TABLE_REGISTERS;

Label L_fallthrough;

Member

L_done would be a better name.


ldx(result, r_array_base, r_array_index);
xor_(result, result, r_super_klass);
beq(CCR0, L_fallthrough);

Member

You might add a comment success (result == 0).

bool success = false;
u1 super_klass_slot = ((Klass*)$super_con$$constant)->hash_slot();
if (InlineSecondarySupersTest) {
success = __ lookup_secondary_supers_table($sub$$Register, $super_reg$$Register,

Member

success is always true. Can it be removed?

} while(0)

// Return true: we succeeded in generating this code
bool MacroAssembler::lookup_secondary_supers_table(Register r_sub_klass,

Member

The method always returns true. Should it even return a value?

Contributor

It's there to communicate failure, if there was any. Some ports can fail to generate code because of space exhaustion, and we need to communicate this to the caller.

@TheRealMDoerr
Contributor Author

Thanks for reviewing! Your suggestions are all good (see my update).

Member

@reinrich reinrich left a comment

Looks good!
Thanks, Richard.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 14, 2024
@TheRealMDoerr
Contributor Author

Thanks for the review!
I've tried some very simple statistics to check that the first lookup directly hits as expected: java -Xcomp -XX:-TieredCompilation -version
secondary supers direct hit ratio: 537 / 539 (the other 2 were hit by the first check in the first slow-path loop)

Comment on lines 2156 to 2160
assert(r_super_klass == R3_ARG1 && \
r_array_base == R4_ARG2 && \
r_array_length == R5_ARG3 && \
(r_array_index == R6_ARG4 || r_array_index == noreg) && \
(r_sub_klass == R7_ARG5 || r_sub_klass == noreg) && \

Member

Maybe we can set r_super_klass = R5 and r_sub_klass = R7 to keep consistency in c1_Runtime1_ppc.cpp:

    case slow_subtype_check_id:
      { // Support for uint StubRoutine::partial_subtype_check( Klass sub, Klass super );
        const Register sub_klass = R5,
                       super_klass = R4,
                       temp1_reg = R6,
                       temp2_reg = R0;
        __ check_klass_subtype_slow_path(sub_klass, super_klass, temp1_reg, temp2_reg); // returns with CR0.eq if successful
        __ crandc(CCR0, Assembler::equal, CCR0, Assembler::equal); // failed: CR0.ne
        __ blr();
      }
      break;

I can see this being done for aarch64, x86 and risc-v as well.

Contributor Author

Ok, using the same registers for sub_klass and super_klass as C1 should do no harm. See latest commit.

@TheRealMDoerr
Contributor Author

I've added a check for Power7 because JDK-8331859 is not yet implemented. It should get removed by that one again. I think we should have this check in case this PR gets backported.

@theRealAph
Contributor

> That doesn't look like a platform specific thing. I'm getting the same result on x86_64. @theRealAph: Is that a known limitation or is it worth a new JBS issue?

I've never seen this. It must be a regression. I'll have a look.

@theRealAph
Contributor

> That doesn't look like a platform specific thing. I'm getting the same result on x86_64. @theRealAph: Is that a known limitation or is it worth a new JBS issue?

Ah, I see. The test is doing some IR node counts for Klass loads, and -UseSecondarySupersCache deletes one of those loads. So it's not actually a behavioural change. I'm not sure what to do about this.

Member

@offamitkumar offamitkumar left a comment

I have used it as a base to implement the s390x changes, so I have gone through it multiple times. I also ran the tier1 tests (on a Power8 machine) with -XX:+UseSecondarySupersTable -XX:+VerifySecondarySupers -XX:+StressSecondarySupers.

LGTM.

@TheRealMDoerr
Contributor Author

Thanks for all reviews, testing and comments!
/integrate

@openjdk

openjdk bot commented Jun 17, 2024

Going to push as commit 0d1080d.
Since your change was applied there have been 338 commits pushed to the master branch:

  • 113a2c0: 8332903: ubsan: opto/output.cpp:1002:18: runtime error: load of value 171, which is not a valid value for type 'bool'
  • d751441: 8330586: GHA: Drop additional gcc/glibc packages installation for x86_32
  • 5e09397: 8334222: exclude containers/cgroup/PlainRead.java
  • 7b38bfe: 8333729: C2 SuperWord: remove some @requires usages in test/hotspot/jtreg/compiler/loopopts/superword
  • 29b6392: 8334228: C2 SuperWord: fix JDK-24 regression in VPointer::cmp_for_sort after JDK-8325155
  • 31e8deb: 8324781: runtime/Thread/TestAlwaysPreTouchStacks.java failed with Expected a higher ratio between stack committed and reserved
  • 8464ce6: 8332113: Update nsk.share.Log to be always verbose
  • 548e95a: 8330702: Update failure handler to don't generate Error message if cores actions are empty
  • dae0bda: 8334252: Verifier error for lambda declared in early construction context
  • b5212d7: 8328107: Shenandoah/C2: TestVerifyLoopOptimizations test failure
  • ... and 328 more: https://git.openjdk.org/jdk/compare/e19a421c30534566ba0dea0fa84f812ebeecfc87...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 17, 2024
@openjdk openjdk bot closed this Jun 17, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 17, 2024
@openjdk

openjdk bot commented Jun 17, 2024

@TheRealMDoerr Pushed as commit 0d1080d.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@TheRealMDoerr TheRealMDoerr deleted the 8331117_PPC64_secondary_super_cache branch June 17, 2024 09:30
@TheRealMDoerr
Contributor Author

> That doesn't look like a platform specific thing. I'm getting the same result on x86_64. @theRealAph: Is that a known limitation or is it worth a new JBS issue?
>
> Ah, I see. The test is doing some IR node counts for Klass loads, and -UseSecondarySupersCache deletes one of those loads. So it's not actually a behavioural change. I'm not sure what to do about this.

Maybe file an issue and ask the IR test folks to take a look? I don't think this PR is a good place to discuss it.

@reinrich
Member

> That doesn't look like a platform specific thing. I'm getting the same result on x86_64. @theRealAph: Is that a known limitation or is it worth a new JBS issue?
>
> Ah, I see. The test is doing some IR node counts for Klass loads, and -UseSecondarySupersCache deletes one of those loads. So it's not actually a behavioural change. I'm not sure what to do about this.

I think the @IR rule can be duplicated using applyIf to distinguish between -XX:-UseSecondarySupersCache and -XX:+UseSecondarySupersCache

@theRealAph
Contributor

> That doesn't look like a platform specific thing. I'm getting the same result on x86_64. @theRealAph: Is that a known limitation or is it worth a new JBS issue?
>
> Ah, I see. The test is doing some IR node counts for Klass loads, and -UseSecondarySupersCache deletes one of those loads. So it's not actually a behavioural change. I'm not sure what to do about this.
>
> Maybe file an issue and ask the IR test folks to take a look? I don't think this PR is a good place to discuss it.

The tests are not all expected to work with arbitrary combinations of -XX arguments. In many cases, they will surely fail. I'll be working on a followup patch that will remove all uses of secondary_super_cache, and at that point the test will need to be fixed.

(result == R8_ARG6 || result == noreg), "registers must match ppc64.ad"); \
} while(0)

// Return true: we succeeded in generating this code

Contributor

You seem to have removed the return value but not its comment. :-)

Contributor Author

Oh, right. I think we can live with it :-)

@theRealAph
Contributor

> But if I revert the changes I had done, then it passes. Same situation I'm facing on s390x. Is this expected?
>
> failure log: type_profile_failure.log

Sorry for necro-posting, but I saw that there had never been a reply to this one.

The IR tests that are failing count the number of CMP nodes in a type check. When we disable the use of secondary_super_cache in C2, we reduce the number of CMP nodes, because we are no longer checking the secondary_super_cache. This failure is OK for now, because it never triggers without diagnostic VM options, but when we remove the secondary_super_cache altogether this test will have to be revised.


Labels

hotspot hotspot-dev@openjdk.org hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated


4 participants