8283726: x86_64 intrinsics for compareUnsigned method in Integer and Long #9068

Closed
wants to merge 5 commits into from

Conversation

merykitty
Member

@merykitty merykitty commented Jun 7, 2022

Hi,

This patch implements intrinsics for Integer/Long::compareUnsigned using the same approach the JVM uses for long and floating-point comparisons. This allows efficient and reliable use of unsigned comparison in Java, which is a basic operation and is important for range checks such as those discussed in #8620.

Thank you very much.
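
For readers less familiar with the method, here is a minimal sketch of what compareUnsigned computes and how it folds a two-sided range check into one comparison. The class and helper names are illustrative only and are not part of the patch; the range-check helper assumes a non-negative length.

```java
final class UnsignedCompareDemo {
    // Reference formulation: adding Integer.MIN_VALUE flips the sign bit,
    // which maps unsigned ordering onto signed ordering.
    static int compareUnsignedReference(int x, int y) {
        return Integer.compare(x + Integer.MIN_VALUE, y + Integer.MIN_VALUE);
    }

    // 0 <= index < length as a single unsigned comparison.
    // Assumes length >= 0; a negative index reads as a huge unsigned value.
    static boolean inRange(int index, int length) {
        return Integer.compareUnsigned(index, length) < 0;
    }

    public static void main(String[] args) {
        // Positive: -1 is 0xFFFFFFFF when read as unsigned.
        System.out.println(Integer.compareUnsigned(-1, 1));
        // Negative: 1 is below 0xFFFFFFFFFFFFFFFF when read as unsigned.
        System.out.println(Long.compareUnsigned(1L, -1L));
        System.out.println(inRange(3, 10));   // true
        System.out.println(inRange(-1, 10));  // false
    }
}
```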


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8283726: x86_64 intrinsics for compareUnsigned method in Integer and Long

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9068/head:pull/9068
$ git checkout pull/9068

Update a local copy of the PR:
$ git checkout pull/9068
$ git pull https://git.openjdk.org/jdk pull/9068/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 9068

View PR using the GUI difftool:
$ git pr show -t 9068

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9068.diff

@bridgekeeper

bridgekeeper bot commented Jun 7, 2022

👋 Welcome back merykitty! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 7, 2022
@openjdk

openjdk bot commented Jun 7, 2022

@merykitty The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot hotspot-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Jun 7, 2022
@merykitty merykitty changed the title from "8283726: x86_64 intrinsics for compare method in Integer and Long" to "8283726: x86_64 intrinsics for compareUnsigned method in Integer and Long" Jun 7, 2022
@merykitty
Member Author

/label hotspot-compiler

@mlbridge

mlbridge bot commented Jun 7, 2022

Webrevs

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Jun 7, 2022
@openjdk

openjdk bot commented Jun 7, 2022

@merykitty
The hotspot-compiler label was successfully added.

Contributor

@vnkozlov vnkozlov left a comment

Please add a microbenchmark and show its results.

src/hotspot/share/opto/subnode.hpp Outdated Show resolved Hide resolved
@merykitty
Member Author

I have added a benchmark for the intrinsic. The result is as follows, thanks a lot:

                                                Before          After
Benchmark                 (size)  Mode  Cnt  Score   Error  Score   Error  Units
Integers.compareUnsigned     500  avgt   15  0.527 ± 0.002  0.498 ± 0.011  us/op
Longs.compareUnsigned        500  avgt   15  0.677 ± 0.014  0.561 ± 0.006  us/op

Contributor

@vnkozlov vnkozlov left a comment

Good. I submitted testing.
You need a second review.

@openjdk

openjdk bot commented Jun 8, 2022

@merykitty This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8283726: x86_64 intrinsics for compareUnsigned method in Integer and Long

Reviewed-by: kvn, jbhateja

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 321 new commits pushed to the master branch:

  • b96ba19: 8289182: NMT: MemTracker::baseline should return void
  • 779b4e1: 8287001: Add warning message when fail to load hsdis libraries
  • 910053b: 8280235: Deprecated flag FlightRecorder missing from VMDeprecatedOptions test
  • 7b3bf97: 8289401: Add dump output to TestRawRSACipher.java
  • 86dc760: Merge
  • 1504804: 8289398: ProblemList jdk/jfr/api/consumer/recordingstream/TestOnEvent.java on linux-x64 again
  • 9b7805e: 8289069: Very slow C1 arraycopy jcstress tests after JDK-8279886
  • c42b796: 8288058: Broken links on constant-values page
  • a814293: 8275784: Bogus warning generated for record with compact constructor
  • 6f9717b: 8288836: (fs) Files.writeString spec for IOException has "specified charset" when no charset is provided
  • ... and 311 more: https://git.openjdk.org/jdk/compare/645be42f76b8983a9096ed90caa70b5c59dd822c...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@vnkozlov, @jatin-bhateja) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 8, 2022
@vnkozlov
Contributor

vnkozlov commented Jun 8, 2022

Tier1-4 testing passed with no new failures. I suggest pushing it into JDK 20 after the fork, once you have a second review.

@sviswa7

sviswa7 commented Jun 10, 2022

@merykitty Could you please also add a microbenchmark where the compareUnsigned result is stored directly in an integer, and show its performance?

@merykitty
Member Author

Thanks @sviswa7 for the suggestion. The results of getting the value of compareUnsigned directly are as follows:

                                                       Before          After
Benchmark                       (size)  Mode  Cnt   Score   Error  Score   Error  Units
Integers.compareUnsignedDirect     500  avgt   15   0.639 ± 0.022  0.626 ± 0.002  us/op
Longs.compareUnsignedDirect        500  avgt   15   0.672 ± 0.011  0.609 ± 0.004  us/op
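
To make the difference between the two benchmark shapes concrete, here is a rough sketch of the two usage patterns: one where only the sign of the result feeds a branch, and one where the int result itself is stored. This is not the JMH code from the patch; the class name, method names, seed, and plain-loop structure are assumptions for illustration, with the array size of 500 taken from the tables.

```java
final class CompareUnsignedKernels {
    // Branch-consumed shape: the comparison result only decides a condition.
    static int countGreater(int[] a, int[] b) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            if (Integer.compareUnsigned(a[i], b[i]) > 0) {
                count++;
            }
        }
        return count;
    }

    // "Direct" shape: the three-way int result is materialized and stored.
    static void storeResults(int[] a, int[] b, int[] out) {
        for (int i = 0; i < a.length; i++) {
            out[i] = Integer.compareUnsigned(a[i], b[i]);
        }
    }

    public static void main(String[] args) {
        final int size = 500; // matches the (size) column above
        java.util.Random random = new java.util.Random(42); // arbitrary seed
        int[] a = random.ints(size).toArray();
        int[] b = random.ints(size).toArray();
        int[] out = new int[size];

        storeResults(a, b, out);
        System.out.println("greater: " + countGreater(a, b));
        System.out.println("first result: " + out[0]);
    }
}
```

A real measurement would still need a harness such as JMH to handle warm-up and dead-code elimination; the loops only show which form of the result each benchmark exercises.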

Comment on lines +13041 to +13043
__ cmpl($src1$$Register, $src2$$Register);
__ movl($dst$$Register, -1);
__ jccb(Assembler::below, done);
Member

Placing the compare directly before the conditional jump lets the in-order front end macro-fuse the pair.
Please refer to section 3.4.2.2 of Intel's optimization manual.

Member Author

I realised that by swapping the mov and the cmp instructions, the rule needs to have dst different from src1 and src2, which increases register pressure.

Member

I do not follow your comment. Allocation decisions are based purely on live-range (LRG) interference and the data-flow attributes attached to operands; they are agnostic to the contents of the encoding block.

Member Author

Your suggestion requires an additional TEMP dst effect on the match rule. Thanks.

Member

Yes. Macro-fusion is a fine microarchitectural optimization that can reduce the load on the entire execution pipeline, and it is deterministic for specific cmp + jump pairs. You have grouped the destination's definitions and uses toward the tail, which avoids a TEMP attribute on the destination operand and may save a redundant spill, though only in blocks with high register pressure. I am OK with the existing handling.

Thanks for your explanations.
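
The ordering constraint discussed in this thread can be modelled in a few lines. The sketch below treats registers as slots in an int array and compares the two instruction orders when dst shares a register with src1. It is a toy model, not HotSpot code: the class and method names are hypothetical, and the tail of the sequence (producing 0 or 1 from the not-equal flag) is an assumption, since only the first three instructions appear in the quoted diff.

```java
final class InstructionOrderModel {
    // Order in the patch: cmp src1, src2; mov dst, -1; jb done; <tail>.
    // mov does not modify flags, so the jump still sees the cmp result.
    static int cmpThenMov(int[] regs, int dst, int src1, int src2) {
        boolean below = Integer.toUnsignedLong(regs[src1]) < Integer.toUnsignedLong(regs[src2]);
        boolean notEqual = regs[src1] != regs[src2];
        regs[dst] = -1;                    // mov dst, -1
        if (!below) {                      // jb done
            regs[dst] = notEqual ? 1 : 0;  // assumed tail of the sequence
        }
        return regs[dst];
    }

    // Swapped order: mov dst, -1; cmp src1, src2; jb done; <tail>.
    // cmp now sits next to the jump, but it reads src1 after dst was written.
    static int movThenCmp(int[] regs, int dst, int src1, int src2) {
        regs[dst] = -1;                    // mov dst, -1 (may clobber a source)
        boolean below = Integer.toUnsignedLong(regs[src1]) < Integer.toUnsignedLong(regs[src2]);
        boolean notEqual = regs[src1] != regs[src2];
        if (!below) {                      // jb done
            regs[dst] = notEqual ? 1 : 0;  // assumed tail of the sequence
        }
        return regs[dst];
    }

    public static void main(String[] args) {
        // dst aliases src1 (both are slot 0): 3 is below 5, so -1 is expected.
        System.out.println(cmpThenMov(new int[] {3, 5}, 0, 0, 1));    // -1
        System.out.println(movThenCmp(new int[] {3, 5}, 0, 0, 1));    // 1 (wrong)
        // With a separate dst slot the swapped order is correct again.
        System.out.println(movThenCmp(new int[] {3, 5, 0}, 2, 0, 1)); // -1
    }
}
```

In this model the swapped order is only safe when dst is kept apart from both sources, which is what a TEMP effect on dst would express to the register allocator.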

Comment on lines +13093 to +13095
__ cmpq($src1$$Register, $src2$$Register);
__ movl($dst$$Register, -1);
__ jccb(Assembler::below, done);
Member

Same as above.

@@ -13022,6 +13022,32 @@ instruct testL_reg_mem2(rFlagsReg cr, rRegP src, memory mem, immL0 zero)
ins_pipe(ialu_cr_reg_mem);
%}

// Manifest a CmpU result in an integer register. Very painful.
// This is the test to avoid.
instruct cmpU3_reg_reg(rRegI dst, rRegI src1, rRegI src2, rFlagsReg flags)
Member

Do you plan to add 32-bit support?
The integer pattern can be moved to the common file x86.ad, and the 64-bit pattern can be handled in the 32/64-bit AD files.

Member Author

Yes, I will add 32-bit support after this patch. Basic rules are usually put in the bit-specific AD files, so I think it is preferable to follow that convention here.

// Since it is not consumed by Bools, it is not really a Cmp.
init_class_id(Class_Sub);
}
virtual int Opcode() const;
Member

Inlining may connect the inputs to constants, so a Value routine may be useful here.

Member Author

CmpU3 inherits the Value method from its superclass CmpU.

Member

It's fine then.
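
As a rough illustration of what constant folding means for a three-way unsigned compare, the toy model below decides the result from the unsigned value ranges of the two inputs when those ranges make it unambiguous. It is not C2's Value implementation; the class name, the method, and the range representation are all assumptions made for the sketch.

```java
import java.util.OptionalInt;

final class CmpU3FoldSketch {
    // Each input is known to lie in an unsigned range [lo, hi], with the
    // bounds stored in ints and interpreted as unsigned values.
    static OptionalInt fold(int lo1, int hi1, int lo2, int hi2) {
        if (Integer.compareUnsigned(hi1, lo2) < 0) {
            return OptionalInt.of(-1);  // every value of input 1 is below input 2
        }
        if (Integer.compareUnsigned(lo1, hi2) > 0) {
            return OptionalInt.of(1);   // every value of input 1 is above input 2
        }
        if (lo1 == hi1 && lo2 == hi2 && lo1 == lo2) {
            return OptionalInt.of(0);   // both inputs are the same constant
        }
        return OptionalInt.empty();     // ranges overlap: nothing to fold
    }

    public static void main(String[] args) {
        System.out.println(fold(7, 7, 7, 7));    // equal constants fold to 0
        System.out.println(fold(1, 3, 5, 9));    // disjoint ranges fold to -1
        System.out.println(fold(-1, -1, 1, 1));  // 0xFFFFFFFF is above 1, folds to 1
        System.out.println(fold(1, 6, 5, 9));    // overlapping ranges, no fold
    }
}
```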

init_class_id(Class_Sub);
}
virtual int Opcode() const;
virtual uint ideal_reg() const { return Op_RegI; }
Member

Value routine to handle constant folding.

@merykitty
Member Author

@jatin-bhateja Thanks a lot for your reviews and suggestions, I have answered your comments.

@merykitty
Member Author

Thank you very much for your reviews
/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Jun 29, 2022
@openjdk

openjdk bot commented Jun 29, 2022

@merykitty
Your change (at version 0ab881a) is now ready to be sponsored by a Committer.

@jatin-bhateja
Member

/sponsor

@openjdk

openjdk bot commented Jun 29, 2022

Going to push as commit 108cd69.
Since your change was applied there have been 321 commits pushed to the master branch:

  • All 321 commits: https://git.openjdk.org/jdk/compare/645be42f76b8983a9096ed90caa70b5c59dd822c...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 29, 2022
@openjdk openjdk bot closed this Jun 29, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Jun 29, 2022
@openjdk

openjdk bot commented Jun 29, 2022

@jatin-bhateja @merykitty Pushed as commit 108cd69.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
core-libs core-libs-dev@openjdk.org hotspot hotspot-dev@openjdk.org hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
4 participants