8180450: secondary_super_cache does not scale well #18309

theRealAph · 2024-03-14T18:24:11Z

This PR is a redesign of subtype checking.

The implementation of subtype checking in the HotSpot JVM is now twenty years old. There have been some performance-related bugs reported, and the only way to fix them is a redesign of the way it works.

So what's changed, so that the old design should be replaced?

Firstly, the computers of today aren't the computers of twenty years ago. It's not merely a matter of speed: the systems are much more parallel, both in the sense of having more cores and each core can run many instructions in parallel. Because of this, the speed ratio between memory accesses and the rate at which we can execute instructions has become wider and wider.

The most severe reported problem is to do with the "secondary supers cache". This is a 1-element per-class cache for interfaces (and arrays of interfaces). Unfortunately, if two threads repeatedly update this cache, the result is that a cache line ping-pongs between cores, causing a severe slowdown.

Also, the linear search for an interface that is absent means that the entire list of interfaces has to be scanned. This plays badly with newer language features such as JEP 406, pattern matching for switch.

However, the computers of today can help us. The very high instruction-per-cycle rate of a Great Big Out-Of-Order (GBOOO) processor allows us to execute many of the instructions of a hash table lookup in parallel, as long as we avoid dependencies between instructions.

The solution

We use a hashed lookup of secondary supers. This is a 64-way hash table, with linear probing for collisions. The table is compressed, in that null entries are removed, and the resulting hash table fits into the same secondary supers array as today's unsorted array of secondary supers. This means that existing code in HotSpot that simply does a linear scan of the secondary supers array does not need to be altered.

We add a bitmap field to each Klass object. This bitmap contains an occupancy bit corresponding to each element of the hash table, with a 1 indicating element presence. As well as allowing the hash table to be decompressed, this bimap is used as a simple kind of Bloom Filter. To determine whether a superclass is present, we simply have to check a single bit in the bitmap. If the bit is clear, we know that the superclass is not present. If the bit is set, we have to do a little arithmetic and then consult the hash table.

It works like this:

       mov    sub_klass, [& sub_klass->_bitmap]
       mov    array_index, sub_klass
       shl    array_index, <hash code of class we're looking for>
  ╭    jns    not_found 
  │    popcnt array_index,array_index
  │    mov    array_base, [& sub_klass->_secondary_supers]
  │    cmp    super_klass, [array_base+array_index*8]
  │    je     found

The popcount instruction returns the cardinality of a bitset. By shifting out the bits higher in number than the hash code of the element we're looking for, we leave only the lower bits. popcount, then, gives us the index of the element we're looking for.

If we don't get a match at the first attempt, we test the next bit in the bitset and jump to a fallback stub:

  │    bt     sub_klass, <next hash code>
  │╭   jae    not_found
  ││   ror    sub_klass, <hash code>
  ││   call   Stub::klass_subtype_fallback
  ││   je     found
  ↘↘

Collisions are rare. Vladimir Ivanov did a survey of Java benchmark code, and it is very rare to see more than about 20 super-interfaces, and even that gives us a collision rate of only about 0.25.

The time taken for a positive lookup is somewhere between 3 - 6 cycles, or about 0.9 - 1.8 ns. This is a robust figure, confirmed across current AArch64 and x86 designs, and this rate can be sustained indefinitely. Negative lookups are slightly faster because there's usually no need to consult the secondary super cache, at about 3 - 5 cycles.

The current secondary super cache lookup is usually slightly faster than a hash table for positive lookups, at about 3 - 4 cycles, but it performs badly with negative lookups, and unless the class you're looking for is in the secondary super cache it performs badly as well. For example, a negative lookup in a class with 4 interfaces takes 10 - 47 cycles.

Limitations and disadvantages

This PR is a subset of what is possible. It is only implemented for C2, in the cases where the superclass is a constant, known at compile time. Given that this is almost all uses of subtype checking, it's not really a restriction.

There is less sharing of interface arrays between classes than there was before. In some cases, I have to make copies of the interfaces array and sort the copies, rather than just using the same array at runtime. This is fixable, and will be fixed in a soon-to-come patch.

I haven't removed the secondary super cache. It's still used by the interpreter and C1.

In the future I'd like to delete the secondary super cache, but it is used in many places across the VM. Today is not that day.

Performance testing

Hashing the secondary supers arrays takes a little time. I've added a perf counter for this, so you can see. It's only really a few milliseconds added to startup.

I have run Renaissance and SPECjvm, and whatever speed differences there may be are immeasurably small, down in the noise. The SecondarySupersLookup benchmark in this PR allows you to isolate single instanceof invocations with various sets of secondary superclasses.

Finally

Vladimir Ivanov was very generous with his time and his advice. He explained the problem and went into detail about his experiments, and shared with me his experimental code. This saved me a great deal of time in this work.

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8180450: secondary_super_cache does not scale well (Bug - P3)

Reviewers

Vladimir Kozlov (@vnkozlov - Reviewer) ⚠️ Review applies to c43e9c6a
Vladimir Ivanov (@iwanowww - Reviewer) ⚠️ Review applies to c43e9c6a
Dean Long (@dean-long - Reviewer) ⚠️ Review applies to c43e9c6a

Contributors

Vladimir Ivanov <vlivanov@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18309/head:pull/18309
$ git checkout pull/18309

Update a local copy of the PR:
$ git checkout pull/18309
$ git pull https://git.openjdk.org/jdk.git pull/18309/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 18309

View PR using the GUI difftool:
$ git pr show -t 18309

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18309.diff

Webrev

Link to Webrev Comment

…JDK-8180450

…DK-8180450

…JDK-8180450

theRealAph · 2024-04-15T12:34:48Z

So it turns out that even without a POPCNT intstruction, this algorithm is still faster than the current linear search in all reasonable cases. I've pushed a change that uses a hand-coded population count.

                           soft popcount      current linear search
Lookup.testPositive01      1.537 ± 0.151      11.955 ±  0.011 ns/op
Lookup.testPositive02      1.535 ± 0.251      12.499 ±  0.029 ns/op
Lookup.testPositive03      2.279 ± 0.141      13.041 ±  0.010 ns/op
Lookup.testPositive04      1.819 ± 0.036      13.579 ±  0.051 ns/op
Lookup.testPositive05      1.537 ± 0.187      15.755 ±  0.079 ns/op
Lookup.testPositive06      3.121 ± 0.015      14.672 ±  0.016 ns/op
Lookup.testPositive07      2.270 ± 0.004      15.215 ±  0.006 ns/op
Lookup.testPositive08      4.445 ± 0.225      15.849 ±  2.889 ns/op
Lookup.testPositive09      3.122 ± 0.001      16.286 ±  8.472 ns/op
Lookup.testPositive10      1.820 ± 0.048      17.030 ± 14.481 ns/op
Lookup.testPositive16      7.228 ± 0.176      19.606 ±  1.130 ns/op
Lookup.testPositive20      3.503 ± 0.016      22.186 ±  4.257 ns/op
Lookup.testPositive30      6.672 ± 0.175      27.960 ±  0.381 ns/op
Lookup.testPositive32     12.031 ± 0.619      28.794 ±  0.107 ns/op
Lookup.testPositive40     12.502 ± 0.242      33.415 ±  0.024 ns/op
Lookup.testPositive50     24.355 ± 2.444      39.612 ± 15.678 ns/op
Lookup.testPositive60     41.386 ± 1.143      44.529 ±  1.298 ns/op
Lookup.testPositive63     68.823 ± 0.992      46.187 ±  0.042 ns/op
Lookup.testPositive64     48.325 ± 2.477      46.721 ±  0.198 ns/op

vnkozlov

Seems fine now.

iwanowww · 2024-04-15T20:58:44Z

Performance testing results look fine.

theRealAph · 2024-04-16T10:54:52Z

Performance testing results look fine.

I wonder, could you do me a little favour? Please run the performance tests with -XX:-UseSecondarySuperCache. Thanks.

theRealAph · 2024-04-16T14:21:06Z

/integrate

openjdk · 2024-04-16T14:21:53Z

Going to push as commit f11a496.
Since your change was applied there have been 79 commits pushed to the master branch:

bfff02e: 8330165: C2: make superword consistently use PhaseIdealLoop::register_new_node()
e073d5b: 8329665: fatal error: memory leak: allocating without ResourceMark
6e77d91: 8330261: Clean up linking steps
61fa4d4: 8330351: Remove the SHARED_LIBRARY and STATIC_LIBRARY macros
56ff87a: 8330359: G1: Remove unused forward declaration in g1BlockOffsetTable.hpp
8a5b86c: 8196106: Support nested infinite or recursive flat mapped streams
58911cc: 8188784: javax/management/notification/BroadcasterSupportDeadlockTest.java - TEST FAILED: deadlock
97c1808: 8329099: Undocumented exception thrown by Instruction factory methods accepting Opcode
def2577: 8330215: Trim working set for OldObjectSamples
2f11afd: 8330172: G1: Consolidate update_bot_for_block and update_bot_for_obj in HeapRegion
... and 69 more: https://git.openjdk.org/jdk/compare/b80ba0851841a8490e61371ac4ef3514dc6eddf5...master

Your commit was automatically rebased without conflicts.

openjdk · 2024-04-16T14:22:04Z

@theRealAph Pushed as commit f11a496.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

iwanowww · 2024-04-16T19:03:29Z

I wonder, could you do me a little favour? Please run the performance tests with -XX:-UseSecondarySuperCache. Thanks.

Sure, I'll let you know once the testing is over.

theRealAph · 2024-04-25T08:56:06Z

/backport jdk22u

openjdk · 2024-04-25T08:57:44Z

@theRealAph Could not automatically backport f11a496d to openjdk/jdk22u due to conflicts in the following files:

src/hotspot/cpu/aarch64/aarch64.ad
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp
src/hotspot/share/memory/universe.cpp
src/hotspot/share/memory/universe.hpp
src/hotspot/share/runtime/abstract_vm_version.hpp
src/hotspot/share/runtime/globals.hpp

Please fetch the appropriate branch/commit and manually resolve these conflicts by using the following commands in your personal fork of openjdk/jdk22u. Note: these commands are just some suggestions and you can use other equivalent commands you know.

# Fetch the up-to-date version of the target branch
$ git fetch --no-tags https://git.openjdk.org/jdk22u.git master:master

# Check out the target branch and create your own branch to backport
$ git checkout master
$ git checkout -b backport-theRealAph-f11a496d

# Fetch the commit you want to backport
$ git fetch --no-tags https://git.openjdk.org/jdk.git f11a496de61d800a680517457eb43b078a633953

# Backport the commit
$ git cherry-pick --no-commit f11a496de61d800a680517457eb43b078a633953
# Resolve conflicts now

# Commit the files you have modified
$ git add files/with/resolved/conflicts
$ git commit -m 'Backport f11a496de61d800a680517457eb43b078a633953'

Once you have resolved the conflicts as explained above continue with creating a pull request towards the openjdk/jdk22u with the title Backport f11a496de61d800a680517457eb43b078a633953.

Below you can find a suggestion for the pull request body:

Hi all,

This pull request contains a backport of commit f11a496d from the openjdk/jdk repository.

The commit being backported was authored by Andrew Haley on 16 Apr 2024 and was reviewed by Vladimir Kozlov, Vladimir Ivanov and Dean Long.

Thanks!

TheRealMDoerr · 2024-04-25T09:48:02Z

I've filed https://bugs.openjdk.org/browse/JDK-8331117 for PPC64. @bulasevich, @fyang, @amitkumar: You may want to check if it makes sense for your platforms.

offamitkumar · 2024-04-25T11:29:59Z

@TheRealMDoerr I guess you pinged wrong Amit 🙂

JBS Issue for s390x: https://bugs.openjdk.org/browse/JDK-8331126

theRealAph · 2024-04-25T12:11:23Z

I've filed https://bugs.openjdk.org/browse/JDK-8331117 for PPC64. @bulasevich, @fyang, @amitkumar: You may want to check if it makes sense for your platforms.

I think it makes sense everywhere. It's even a win on machines without POPCOUNT, which surprised me. Once you have hashed lookups the secondary supers cache doesn't help at all.

I want to delete the secondary supers cache soon, because it's an additional unnecessary step. @iwanowww did some measurements (DaCapo, Renaissance, SPECjbb2005, SPECjvm2008 on linux-x64/macos-aarch64), and he saw no significant regressions without secondary supers cache.

TheRealMDoerr · 2024-04-25T12:17:27Z

I want to delete the secondary supers cache soon, because it's an additional unnecessary step. @iwanowww did some measurements (DaCapo, Renaissance, SPECjbb2005, SPECjvm2008 on linux-x64/macos-aarch64), and he saw no significant regressions without secondary supers cache.

Thanks for the information. So, we should probably wait for that. Is there a JBS issue already?

theRealAph · 2024-04-25T12:20:32Z

I want to delete the secondary supers cache soon, because it's an additional unnecessary step. @iwanowww did some measurements (DaCapo, Renaissance, SPECjbb2005, SPECjvm2008 on linux-x64/macos-aarch64), and he saw no significant regressions without secondary supers cache.

Thanks for the information. So, we should probably wait for that. Is there a JBS issue already?

No, don't wait! Every port will benefit from this change, now.

theRealAph · 2024-04-29T09:13:15Z

/backport jdk21u

openjdk · 2024-04-29T09:14:55Z

@theRealAph Could not automatically backport f11a496d to openjdk/jdk21u due to conflicts in the following files:

src/hotspot/cpu/aarch64/aarch64.ad
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp
src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp
src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp
src/hotspot/cpu/x86/stubGenerator_x86_64.hpp
src/hotspot/share/memory/universe.cpp
src/hotspot/share/memory/universe.hpp
src/hotspot/share/opto/c2_globals.hpp
src/hotspot/share/runtime/abstract_vm_version.hpp
src/hotspot/share/runtime/globals.hpp
src/hotspot/share/runtime/stubRoutines.cpp
src/hotspot/share/runtime/stubRoutines.hpp

Please fetch the appropriate branch/commit and manually resolve these conflicts by using the following commands in your personal fork of openjdk/jdk21u. Note: these commands are just some suggestions and you can use other equivalent commands you know.

# Fetch the up-to-date version of the target branch
$ git fetch --no-tags https://git.openjdk.org/jdk21u.git master:master

# Check out the target branch and create your own branch to backport
$ git checkout master
$ git checkout -b backport-theRealAph-f11a496d

# Fetch the commit you want to backport
$ git fetch --no-tags https://git.openjdk.org/jdk.git f11a496de61d800a680517457eb43b078a633953

# Backport the commit
$ git cherry-pick --no-commit f11a496de61d800a680517457eb43b078a633953
# Resolve conflicts now

# Commit the files you have modified
$ git add files/with/resolved/conflicts
$ git commit -m 'Backport f11a496de61d800a680517457eb43b078a633953'

Once you have resolved the conflicts as explained above continue with creating a pull request towards the openjdk/jdk21u with the title Backport f11a496de61d800a680517457eb43b078a633953.

Below you can find a suggestion for the pull request body:

Hi all,

This pull request contains a backport of commit f11a496d from the openjdk/jdk repository.

The commit being backported was authored by Andrew Haley on 16 Apr 2024 and was reviewed by Vladimir Kozlov, Vladimir Ivanov and Dean Long.

Thanks!

Andrew Haley added 30 commits February 8, 2024 10:00

Test

d6d9811

SecondarySuperMissBackoff.patch

40edcb3

Basically works

9181166

Inter

ad533bc

Inter

7b76caf

Inter

c4980df

Back out fallback counter

e9aea16

Cleanup

05e2ed1

Cleanup

0369693

Register FUBAR

d5b11bd

Seems to work on x86

aa2e26a

Snapshot

d052cb7

Cleanup

13a0610

Minor x86 improvements.

6d10dc1

Minor details

3cf5c0b

Thinko

f67745a

Robin Hood hash

76417ac

Merge branch 'JDK-8180450' of https://github.com/theRealAph/jdk into …

0c23ea5

…JDK-8180450

Cleanup

da342a8

Testing

6d404f4

Testing

f37e768

Testing

2bbd8e5

Stuff

8d267fc

Merge remote-tracking branch 'refs/remotes/origin/JDK-8180450' into J…

2b7a0f2

…DK-8180450

cleanup

6dc4721

JDK-8180450: secondary_super_cache does not scale well

86b7789

Merge branch 'JDK-8180450' of https://github.com/theRealAph/jdk into …

75cb1f9

…JDK-8180450

JDK-8180450: secondary_super_cache does not scale well

5953212

JDK-8180450: secondary_super_cache does not scale well

9d833e2

JDK-8180450: secondary_super_cache does not scale well

fa7f280

JDK-8180450: secondary_super_cache does not scale well

c43e9c6

vnkozlov approved these changes Apr 15, 2024

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label Apr 15, 2024

iwanowww approved these changes Apr 15, 2024

View reviewed changes

dean-long approved these changes Apr 16, 2024

View reviewed changes

JDK-8180450: secondary_super_cache does not scale well

a622c7d

openjdk bot added the integrated Pull request has been integrated label Apr 16, 2024

openjdk bot closed this Apr 16, 2024

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 16, 2024

graalvmbot mentioned this pull request May 16, 2024

[GR-53920] Use hashed lookup in the secondary supers array. oracle/graal#8934

Merged

This was referenced Nov 25, 2024

[Backport] 18309: fix of jdk-8180450 dragonwell-project/dragonwell11#892

Open

[Backport] 18309: type pollution fix dragonwell-project/dragonwell11#893

Open

theRealAph mentioned this pull request Dec 11, 2024

8180450: secondary_super_cache does not scale well openjdk/jdk21u-dev#1090

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

8180450: secondary_super_cache does not scale well #18309

8180450: secondary_super_cache does not scale well #18309

theRealAph commented Mar 14, 2024 •

edited by openjdk bot

Loading

theRealAph commented Apr 15, 2024

vnkozlov left a comment

iwanowww commented Apr 15, 2024

theRealAph commented Apr 16, 2024

theRealAph commented Apr 16, 2024

openjdk bot commented Apr 16, 2024

openjdk bot commented Apr 16, 2024

iwanowww commented Apr 16, 2024

theRealAph commented Apr 25, 2024

openjdk bot commented Apr 25, 2024

TheRealMDoerr commented Apr 25, 2024

offamitkumar commented Apr 25, 2024

theRealAph commented Apr 25, 2024

TheRealMDoerr commented Apr 25, 2024

theRealAph commented Apr 25, 2024

theRealAph commented Apr 29, 2024

openjdk bot commented Apr 29, 2024

8180450: secondary_super_cache does not scale well #18309

8180450: secondary_super_cache does not scale well #18309

Conversation

theRealAph commented Mar 14, 2024 • edited by openjdk bot Loading

The solution

Limitations and disadvantages

Performance testing

Finally

Progress

Issue

Reviewers

Contributors

Reviewing

Webrev

theRealAph commented Apr 15, 2024

vnkozlov left a comment

Choose a reason for hiding this comment

iwanowww commented Apr 15, 2024

theRealAph commented Apr 16, 2024

theRealAph commented Apr 16, 2024

openjdk bot commented Apr 16, 2024

openjdk bot commented Apr 16, 2024

iwanowww commented Apr 16, 2024

theRealAph commented Apr 25, 2024

openjdk bot commented Apr 25, 2024

TheRealMDoerr commented Apr 25, 2024

offamitkumar commented Apr 25, 2024

theRealAph commented Apr 25, 2024

TheRealMDoerr commented Apr 25, 2024

theRealAph commented Apr 25, 2024

theRealAph commented Apr 29, 2024

openjdk bot commented Apr 29, 2024

theRealAph commented Mar 14, 2024 •

edited by openjdk bot

Loading