Read class from object header #12

Closed
wants to merge 55 commits into from

@rkennke rkennke commented Jul 13, 2021

This changes the Hotspot runtime to load the Klass* from the header instead of the dedicated Klass* word. The dedicated word is only still used for verification and for access by generated code (the former will eventually go away, the latter will be implemented separately).

Currently, this means we need to coordinate with the ObjectSynchronizer: when we encounter a header that is a stack lock or a monitor, the real header is displaced. Worse, if the object is stack-locked by a thread other than the calling thread, we must first inflate the lock to a full monitor. This is particularly bad for GCs. Luckily, most paths only do this at a safepoint, where we can safely access foreign stack locks and don't need to worry about inflation. A notable exception is concurrent marking in G1, which can cause inflation of locks, but that doesn't hurt very much.
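
To make the displacement problem above concrete, here is a minimal sketch of resolving the class bits through a displaced header. It is a simplified model, not the code in this PR: only the two low lock bits follow HotSpot's markWord encoding, while `LockRecordModel`, `MonitorModel`, `resolve_header`, `klass_bits` and the 32-bit `kKlassShift` are hypothetical names and values chosen for illustration. It also assumes that reading the lock record is safe, which for a foreign thread's stack lock only holds at a safepoint.

```cpp
#include <cstdint>

// Low two bits of the header encode the lock state (as in HotSpot's markWord).
enum class LockState : uint64_t {
  StackLocked = 0b00,  // header is a pointer to a lock record on the owner's stack
  Unlocked    = 0b01,  // header holds the real bits
  Monitor     = 0b10,  // header is a tagged pointer to an inflated monitor
  Marked      = 0b11   // used by GC for marking/forwarding
};

constexpr uint64_t kLockMask   = 0b11;
constexpr int      kKlassShift = 32;  // assumption: class bits live in the upper half

// Hypothetical stand-ins for the stack lock record and the inflated monitor.
struct LockRecordModel { uint64_t displaced_header; };
struct MonitorModel    { uint64_t displaced_header; };

inline LockState lock_state(uint64_t header) {
  return static_cast<LockState>(header & kLockMask);
}

// Follow the displacement, if any, to the header that carries the class bits.
// Marked (forwarded) headers are returned unchanged; callers must handle them.
inline uint64_t resolve_header(uint64_t header) {
  switch (lock_state(header)) {
    case LockState::StackLocked:
      return reinterpret_cast<const LockRecordModel*>(
                 static_cast<uintptr_t>(header))->displaced_header;
    case LockState::Monitor:
      return reinterpret_cast<const MonitorModel*>(
                 static_cast<uintptr_t>(header & ~kLockMask))->displaced_header;
    default:
      return header;
  }
}

inline uint32_t klass_bits(uint64_t header) {
  return static_cast<uint32_t>(resolve_header(header) >> kKlassShift);
}
```

The extra pointer chase in the two locked cases is what forces the coordination with the ObjectSynchronizer; without displaced headers `resolve_header` would reduce to the identity.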

It's really bad for Shenandoah and ZGC, though: when relocating objects, the GC needs to know the size of the from-space copy. However, determining that size can cause inflation, inflation creates a new WeakHandle in the resulting monitor, and that handle would be initialized with the from-space copy, which is a no-go during evacuation/relocation.

That said, I have been told that work is under way to get rid of displaced headers altogether, which would neatly solve all those problems. I have no desire to make complicated workarounds for Shenandoah GC and ZGC. I disabled both in my own builds for now, and will implement them as soon as the monitor changes arrive.

In a couple of places in the GC we need to access the header carefully: when parallel GC threads forward objects concurrently, we need to ensure that we read the Klass* from an unforwarded header, and we must also avoid re-loading the Klass* once we have the good header (that is why so many asserts have been removed: they would potentially re-load the Klass* from a header that may now be forwarded).
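
The "load once, never re-load" rule above can be sketched as follows. This is an illustrative model under stated assumptions, not the GC code touched by this PR: `ObjModel`, `ForwardResult`, `try_forward` and the header layout (class bits in the upper 32 bits, low bits `0b11` meaning forwarded) are hypothetical, and the sketch assumes that forwarding is the only concurrent writer of the header.

```cpp
#include <atomic>
#include <cstdint>

constexpr uint64_t kLockMask   = 0b11;
constexpr uint64_t kMarked     = 0b11;  // low bits of a forwarded header
constexpr int      kKlassShift = 32;    // assumption: class bits in the upper half

struct ObjModel { std::atomic<uint64_t> header; };

struct ForwardResult {
  bool     won;         // true if this thread installed the forwarding pointer
  uint32_t klass_bits;  // only meaningful when won is true
  uint64_t forwardee;   // address the object is forwarded to
};

inline bool is_forwarded(uint64_t h) { return (h & kLockMask) == kMarked; }

// new_addr must have its two low bits clear.
inline ForwardResult try_forward(ObjModel& obj, uint64_t new_addr) {
  // Single load: every decision below uses this snapshot, never a second read.
  uint64_t snapshot = obj.header.load(std::memory_order_acquire);
  if (is_forwarded(snapshot)) {
    return {false, 0, snapshot & ~kLockMask};
  }
  // Class bits come from the unforwarded snapshot, before the CAS can clobber them.
  uint32_t klass = static_cast<uint32_t>(snapshot >> kKlassShift);
  if (obj.header.compare_exchange_strong(snapshot, new_addr | kMarked,
                                         std::memory_order_acq_rel,
                                         std::memory_order_acquire)) {
    return {true, klass, new_addr};
  }
  // Lost the race: snapshot now holds the winner's forwarded header.
  return {false, 0, snapshot & ~kLockMask};
}
```

An assert that called something like `obj->klass()` after the CAS would read the header again and could see the forwarding pointer instead of the class bits, which is the failure mode the removed asserts would have hit.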

Testing:

  • tier1
  • tier2
  • hotspot_gc
    (all without Shenandoah and ZGC, see above)

Progress

  • Change must not contain extraneous whitespace
  • Change must be properly reviewed

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/lilliput pull/12/head:pull/12
$ git checkout pull/12

Update a local copy of the PR:
$ git checkout pull/12
$ git pull https://git.openjdk.java.net/lilliput pull/12/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 12

View PR using the GUI difftool:
$ git pr show -t 12

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/lilliput/pull/12.diff


@bridgekeeper bridgekeeper bot commented Jul 13, 2021

👋 Welcome back rkennke! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@rkennke rkennke marked this pull request as ready for review Jul 27, 2021
@openjdk openjdk bot added the rfr label Jul 27, 2021

@mlbridge mlbridge bot commented Jul 27, 2021

@rkennke rkennke requested a review from shipilev Jul 29, 2021

@shipilev shipilev left a comment

All right, it looks fine for the experimental code. A few questions/comments:

src/hotspot/share/runtime/synchronizer.cpp (outdated, resolved)
src/hotspot/share/runtime/synchronizer.cpp (outdated, resolved)
src/hotspot/share/oops/oop.inline.hpp (outdated, resolved)
src/hotspot/share/oops/markWord.inline.hpp (outdated, resolved)
src/hotspot/share/oops/oop.hpp (outdated, resolved)

@openjdk openjdk bot commented Jul 30, 2021

@rkennke This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

Read class from object header

Reviewed-by: shade

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Jul 30, 2021

@mlbridge mlbridge bot commented Aug 5, 2021

Mailing list message from Dave Dice on lilliput-dev:

On 2021-7-28, at 9:22 AM, Roman Kennke <rkennke at openjdk.java.net> wrote:

This changes the Hotspot runtime to load the Klass* from the header instead of the dedicated Klass* word. The dedicated word is only still used for verification and for access by generated code (the former will eventually go away, the latter will be implemented separately).

Currently, this means we need to coordinate with the ObjectSynchronizer: when we encounter a header that is a stack lock or a monitor, the real header is displaced. Worse, if the object is stack-locked by a thread other than the calling thread, we must first inflate the lock to a full monitor. This is particularly bad for GCs. Luckily, most paths only do this at a safepoint, where we can safely access foreign stack locks and don't need to worry about inflation. A notable exception is concurrent marking in G1, which can cause inflation of locks, but that doesn't hurt very much.

It's really bad for Shenandoah and ZGC, though: when relocating objects, the GC needs to know the size of the from-space copy. However, determining that size can cause inflation, inflation creates a new WeakHandle in the resulting monitor, and that handle would be initialized with the from-space copy, which is a no-go during evacuation/relocation.

The following -- "Compact Java Monitors" -- might provide some relief: https://arxiv.org/pdf/2102.04188.pdf.

As described, it handles just the identity hashCode value, but it's trivial to extend the idea to displacing the whole lilliput header word.

-Dave

That said, I have been told that work is under way to get rid of displaced headers altogether, which would neatly solve all those problems. I have no desire to make complicated workarounds for Shenandoah GC and ZGC. I disabled both in my own builds for now, and will implement them as soon as the monitor changes arrive.

In a couple of places in the GC we need to access the header carefully: when parallel GC threads forward objects concurrently, we need to ensure that we read the Klass* from an unforwarded header, and we must also avoid re-loading the Klass* once we have the good header (that is why so many asserts have been removed: they would potentially re-load the Klass* from a header that may now be forwarded).


@rkennke rkennke commented Sep 16, 2021

The following -- "Compact Java Monitors" -- might provide some relief: https://arxiv.org/pdf/2102.04188.pdf.

As described, it handles just the identity hashCode value, but it's trivial to extend the idea to displacing the whole lilliput header word.

Thanks, Dave! I will study it!
My current thinking wrt the identity hash code is to not store it in the header at all, but instead append it to the object on demand. That is, when identityHashCode() is called, the hash would be re-computed as long as the object does not move; as soon as the object moves, an extra field would be appended to it (or rather, the hash often fits in the alignment gap at the end), and the hash code is installed there. A prototype is implemented here: https://github.com/rkennke/lilliput/tree/compact-hashcode
But, as far as I can see, it doesn't help with the problem that concurrent GCs have with the thread-locks.
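
For readers following along, the on-demand hash code idea described above can be sketched as a three-state scheme. This is a rough illustration, not the code in the compact-hashcode branch: `HashState`, `HashedObject`, `hash_from_address`, `identity_hash` and `move_object` are hypothetical names, and the address-based hash function is an arbitrary placeholder.

```cpp
#include <cstdint>

// Assumed states: never hashed, hashed but not yet moved, hashed and moved.
enum class HashState : uint8_t { NotHashed, Hashed, HashedMoved };

struct HashedObject {
  HashState state         = HashState::NotHashed;
  uint64_t  address       = 0;  // stands in for the object's current location
  uint32_t  appended_hash = 0;  // models the appended field; valid only in HashedMoved
};

// Placeholder hash derived from the address; any stable function would do.
inline uint32_t hash_from_address(uint64_t addr) {
  return static_cast<uint32_t>((addr >> 3) * 2654435761u);
}

inline uint32_t identity_hash(HashedObject& obj) {
  if (obj.state == HashState::HashedMoved) {
    return obj.appended_hash;  // the object has moved: read the stored value
  }
  obj.state = HashState::Hashed;           // remember that a hash was handed out
  return hash_from_address(obj.address);   // re-computable while the object stays put
}

// Models what a moving GC would do when it relocates the object.
inline void move_object(HashedObject& obj, uint64_t new_address) {
  if (obj.state == HashState::Hashed) {
    // Preserve the hash that was based on the old address in the appended field.
    obj.appended_hash = hash_from_address(obj.address);
    obj.state = HashState::HashedMoved;
  }
  obj.address = new_address;
}
```

Objects that were never hashed pay nothing when they move; only objects in the `Hashed` state grow by one field, and only on their first move.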


@rkennke rkennke commented Sep 16, 2021

/integrate


@openjdk openjdk bot commented Sep 16, 2021

Going to push as commit 02606f2.

@openjdk openjdk bot closed this Sep 16, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Sep 16, 2021

@openjdk openjdk bot commented Sep 16, 2021

@rkennke Pushed as commit 02606f2.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.


@mlbridge mlbridge bot commented Sep 16, 2021

Mailing list message from Dave Dice on lilliput-dev:

On 2021-9-16, at 6:52 AM, Roman Kennke <rkennke at openjdk.java.net> wrote:

On Thu, 5 Aug 2021 19:42:23 GMT, Dave Dice <dave.dice at oracle.com> wrote:

The following -- "Compact Java Monitors" -- might provide some relief: https://arxiv.org/pdf/2102.04188.pdf.

As described, it handles just the identity hashCode value, but it's trivial to extend the idea to displacing the whole lilliput header word.

Thanks, Dave! I will study it!
My current thinking wrt the identity hash code is to not store it in the header at all, but instead append it to the object on demand. That is, when identityHashCode() is called, the hash would be re-computed as long as the object does not move; as soon as the object moves, an extra field would be appended to it (or rather, the hash often fits in the alignment gap at the end), and the hash code is installed there. A prototype is implemented here: https://github.com/rkennke/lilliput/tree/compact-hashcode

Hi Roman,

I looked through compact-hashcode and, if I'm reading the definitions of "hashctrl" correctly, this appears to be the tri-state (2-bit) hashCode algorithm from Bacon et al.: https://doi.org/10.1007/3-540-47993-7_5. If that's actually the case, it'd likely be good to include a citation in the code. It was a good idea 20 years ago, and remains a good idea. The only downside I know of is that you can exhaust memory extending objects in a moving GC, but there are ways to guard against that condition.

The Compact Java Monitor approach is rather agnostic concerning the hashCode (and for that matter, anything else in the header, such as the class/klass information, and age bits). If you use the IBM 2-bit scheme, that's fine, and if you need to displace it, that's fine as well.

I hope to send out some results next week comparing the relative performance of a few potential "synchronized" implementations.

Regards
Dave

But, as far as I can see, it doesn't help with the problem that concurrent GCs have with the thread-locks.

-------------

PR: https://git.openjdk.java.net/lilliput/pull/12


@rkennke rkennke commented Sep 16, 2021

I looked through compact-hashcode and, if I'm reading the definitions of "hashctrl" correctly, this appears to be the tri-state (2-bit) hashCode algorithm from Bacon et al.: https://doi.org/10.1007/3-540-47993-7_5. If that's actually the case, it'd likely be good to include a citation in the code.

That is likely true. I can't access the paper, though, it asks me to pay 26€ ;-) I haven't read the paper, I've adopted the algorithm from talking to an OpenJ9 guy - OJ9 apparently uses a similar algorithm (haven't looked at their code, either).

It was a good idea 20 years ago, and remains a good idea. The only downside I know of is that you can exhaust memory extending objects in a moving GC, but there are ways to guard against that condition.

Right. It can't happen with sliding GCs, but copying GCs could theoretically run into this problem. It seems very very unlikely, but not impossible.

The Compact Java Monitor approach is rather agnostic concerning the hashCode (and for that matter, anything else in the header, such as the class/klass information, and age bits). If you use the IBM 2-bit scheme, that's fine, and if you need to displace it, that's fine as well.

I hope to send out some results next week comparing the relative performance of a few potential "synchronized" implementations.

I am a bit unsure about which direction to go. I also heard that there is work under way to remove JVM-side locking altogether, and use j.u.c instead, which would make the whole displaced-header problem go away. Getting rid of displaced headers would be a huge win. Otherwise we'll have to come up with a way to deal with it in concurrent GCs.

Thanks,
Roman


@mlbridge mlbridge bot commented Sep 16, 2021

Mailing list message from Dave Dice on lilliput-dev:

On 2021-9-16, at 1:01 PM, Roman Kennke <rkennke at openjdk.java.net> wrote:

On Thu, 16 Sep 2021 16:43:30 GMT, Dave Dice <dave.dice at oracle.com> wrote:

I looked through compact-hashcode and, if I'm reading the definitions of "hashctrl" correctly, this appears to be the tri-state (2-bit) hashCode algorithm from Bacon et al.: https://doi.org/10.1007/3-540-47993-7_5. If that's actually the case, it'd likely be good to include a citation in the code.

That is likely true. I can't access the paper, though, it asks me to pay 26€ ;-) I haven't read the paper, I've adopted the algorithm from talking to an OpenJ9 guy - OJ9 apparently uses a similar algorithm (haven't looked at their code, either).

Here's a non-paywall version of the paper hosted by IBM:

https://researcher.watson.ibm.com/researcher/files/us-bacon/Bacon02Space.pdf

It remains a good read.

...

It was a good idea 20 years ago, and remains a good idea. The only downside I know of is that you can exhaust memory extending objects in a moving GC, but there are ways to guard against that condition.

Right. It can't happen with sliding GCs, but copying GCs could theoretically run into this problem. It seems very very unlikely, but not impossible.

Agreed ...

The Compact Java Monitor approach is rather agnostic concerning the hashCode (and for that matter, anything else in the header, such as the class/klass information, and age bits). If you use the IBM 2-bit scheme, that's fine, and if you need to displace it, that's fine as well.

I hope to send out some results next week comparing the relative performance of a few potential "synchronized" implementations.

I am a bit unsure about which direction to go. I also heard that there is work under way to remove JVM-side locking altogether, and use j.u.c instead, which would make the whole displaced-header problem go away. Getting rid of displaced headers would be a huge win. Otherwise we'll have to come up with a way to deal with it in concurrent GCs.

I think the Loom folks are certainly interested in replacing synchronized with ReentrantLock-like constructs to avoid the current pinning of virtual threads to threads.

I've experimented (mostly outside the JVM but in a fairly faithful C++ simulacrum) with a number of ideas, all the way from approaches that don't touch the header at all (aesthetically desirable, but costly) to ones that borrow just a few bits of the header, to ones that still use displacement, but make accessing the displaced value much more sane (CJM). As noted, I hope to send out a rough paper with some data next week.

Regards
Dave

Thanks,
Roman

-------------

PR: https://git.openjdk.java.net/lilliput/pull/12

@rkennke rkennke mentioned this pull request Apr 13, 2022