Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8255397: x86: coalesce reference and int entry points into vtos bytecodes #865

Closed
wants to merge 5 commits into from

Conversation

@cl4es
Copy link
Member

@cl4es cl4es commented Oct 26, 2020

On x86 - both 32- and 64-bit - the code laid out for transitionining into a vtos bytecode when having a reference and int top-of-stack state is semantically identical, and can be coalesced.

This patch removes a short jump on some cases which is marginally beneficial when interpreting, while measurably reducing overhead of generating the interpreter itself.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Testing

Linux x64 Linux x86 Windows x64 macOS x64
Build ✔️ (5/5 passed) ✔️ (2/2 passed) ✔️ (2/2 passed) ✔️ (2/2 passed)
Test (tier1) ✔️ (9/9 passed) ✔️ (9/9 passed) (7/9 running) (3/9 running)

Issue

  • JDK-8255397: x86: coalesce reference and int entry points into vtos bytecodes

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/865/head:pull/865
$ git checkout pull/865

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Oct 26, 2020

👋 Welcome back redestad! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

Loading

@openjdk openjdk bot added the rfr label Oct 26, 2020
@openjdk
Copy link

@openjdk openjdk bot commented Oct 26, 2020

@cl4es The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

Loading

@openjdk openjdk bot added the hotspot label Oct 26, 2020
@mlbridge
Copy link

@mlbridge mlbridge bot commented Oct 26, 2020

Webrevs

Loading

Copy link
Contributor

@shipilev shipilev left a comment

It rubs me the wrong way that we are effectively changing push_ptr to push_i for aep. While it is implemented in the same manner in interp_masm_x86.cpp -- delegating to push, it still means if push_i implementation changes, aep would do the push_i as if it is integer, not pointer. Ditto a change in push_ptr (adding verification, maybe?) would miss this code.

So, how much of the improvement we are talking about to sacrifice this?

Loading

@coleenp
Copy link
Contributor

@coleenp coleenp commented Oct 27, 2020

Yes, I had the same queasy feeling looking at this. It's just unclear at this callsite that push_i doesn't do the wrong thing for 64 bit.

Loading

@cl4es
Copy link
Member Author

@cl4es cl4es commented Oct 27, 2020

It rubs me the wrong way that we are effectively changing push_ptr to push_i for aep. While it is implemented in the same manner in interp_masm_x86.cpp -- delegating to push, it still means if push_i implementation changes, aep would do the push_i as if it is integer, not pointer. Ditto a change in push_ptr (adding verification, maybe?) would miss this code.

Verification is done explicitly with __ verify_oop(..) and friends, so it seems unlikely we'll overload push_ptr any time soon (and they have been semantically identical for many years, even before the merging of 32- and 64-bit interp_masm_x86...). But I acknowledge this adds a fragility here, but perhaps there are some assertions we can add to put a check that push_ptr and push_i stays semantically the same?

So, how much of the improvement we are talking about to sacrifice this?

A few hundred thousand instructions and branches on Hello World (seems unconditional jumps are logged as branches by perf?):

Baseline:

       103,795,433      instructions              #    0.59  insn per cycle           ( +-  0.07% )
        20,263,519      branches                  #  200.867 M/sec                    ( +-  0.08% )
           731,187      branch-misses             #    3.61% of all branches          ( +-  0.15% )       0.067306367 seconds time elapsed                                          ( +-  0.24% )

Patch:

       103,466,523      instructions              #    0.59  insn per cycle           ( +-  0.07% )
        20,068,162      branches                  #  201.935 M/sec                    ( +-  0.08% )
           727,575      branch-misses             #    3.63% of all branches          ( +-  0.13% )       0.066568115 seconds time elapsed                                          ( +-  0.27% )

For Hello World maybe half of that comes from reduced overhead of generating, the rest from quickening quite a few bytecode transitions. There's a scaling component (seen a few million instruction gains on slightly larger apps), but it's nothing huge.

Loading

@shipilev
Copy link
Contributor

@shipilev shipilev commented Oct 28, 2020

Okay, so that is 0.3% less instructions and ~1% less branches on Hello World. That's interesting.

Would rebalancing the entry points order give the similar improvement without messing up the code? For example, what happens if we move aep to be the last entry point, and set up [bcsi]ep for a short jump?

There is a middle-ground, I think: introduce push_i_or_ptr and delegate it to push. That would make it clear what usages expect push_i and push_ptr shapes to match, and if later it proves to be a problem, we could easily revert all new usages to the old form.

Loading

@@ -605,6 +605,10 @@ void InterpreterMacroAssembler::push_i(Register r) {
push(r);
}

void InterpreterMacroAssembler::push_i_or_ptr(Register r) {
push_i(r);
Copy link
Contributor

@shipilev shipilev Oct 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be push(r): it is both cleaner and avoids a middle call to push_i(r).

Loading

// ptr might be pushed to the stack. This method will never do any
// verification of the oop.
Copy link
Contributor

@shipilev shipilev Oct 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to mention verification here.

Loading

Copy link
Member Author

@cl4es cl4es Oct 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll drop that.

Loading

@cl4es
Copy link
Member Author

@cl4es cl4es commented Oct 28, 2020

Would rebalancing the entry points order give the similar improvement without messing up the code? For example, what happens if we move aep to be the last entry point, and set up [bcsi]ep for a short jump?

Looks like a wash, at least on Hello World

There is a middle-ground, I think: introduce push_i_or_ptr and delegate it to push. That would make it clear what usages expect push_i and push_ptr shapes to match, and if later it proves to be a problem, we could easily revert all new usages to the old form.

Good suggestion.

Loading

Copy link
Contributor

@shipilev shipilev left a comment

This looks good to me. Coleen needs to ack too.

Loading

@openjdk
Copy link

@openjdk openjdk bot commented Oct 28, 2020

@cl4es This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8255397: x86: coalesce reference and int entry points into vtos bytecodes

Reviewed-by: shade, coleenp

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 5 new commits pushed to the master branch:

  • 3bd5b80: 8243583: Change 'final' error checks to throw ICCE
  • 1f00c3b: 8255527: Shenandoah: Let ShenadoahGCStateResetter disable barriers
  • 3c4fc79: 8255299: Drop explicit zeroing at instantiation of Atomic* objects
  • 6b2d11b: 8255246: AArch64: Implement BigInteger shiftRight and shiftLeft accelerator/intrinsic
  • 591e7e2: 8255378: [Vector API] Remove redundant vector length check after JDK-8254814 and JDK-8255210

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

Loading

@openjdk openjdk bot added the ready label Oct 28, 2020
@openjdk
Copy link

@openjdk openjdk bot commented Oct 28, 2020

@cl4es This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8255397: x86: coalesce reference and int entry points into vtos bytecodes

Reviewed-by: shade

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 5 new commits pushed to the master branch:

  • 3bd5b80: 8243583: Change 'final' error checks to throw ICCE
  • 1f00c3b: 8255527: Shenandoah: Let ShenadoahGCStateResetter disable barriers
  • 3c4fc79: 8255299: Drop explicit zeroing at instantiation of Atomic* objects
  • 6b2d11b: 8255246: AArch64: Implement BigInteger shiftRight and shiftLeft accelerator/intrinsic
  • 591e7e2: 8255378: [Vector API] Remove redundant vector length check after JDK-8254814 and JDK-8255210

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

Loading

Copy link
Contributor

@coleenp coleenp left a comment

Yes that was a good suggestion @shipilev and the change looks much better to me now. Thanks!

Loading

@cl4es
Copy link
Member Author

@cl4es cl4es commented Oct 28, 2020

@shipilev @coleenp - thank you for reviewing!

/integrate

Loading

@openjdk openjdk bot closed this Oct 28, 2020
@openjdk openjdk bot added integrated and removed ready rfr labels Oct 28, 2020
@openjdk
Copy link

@openjdk openjdk bot commented Oct 28, 2020

@cl4es Since your change was applied there have been 5 commits pushed to the master branch:

  • 3bd5b80: 8243583: Change 'final' error checks to throw ICCE
  • 1f00c3b: 8255527: Shenandoah: Let ShenadoahGCStateResetter disable barriers
  • 3c4fc79: 8255299: Drop explicit zeroing at instantiation of Atomic* objects
  • 6b2d11b: 8255246: AArch64: Implement BigInteger shiftRight and shiftLeft accelerator/intrinsic
  • 591e7e2: 8255378: [Vector API] Remove redundant vector length check after JDK-8254814 and JDK-8255210

Your commit was automatically rebased without conflicts.

Pushed as commit bbf0a31.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
3 participants