Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8283726: x86 intrinsics for compare method in Integer and Long #7975

Closed
wants to merge 6 commits into from

Conversation

vamsi-parasa
Copy link
Contributor

@vamsi-parasa vamsi-parasa commented Mar 27, 2022

Implements x86 intrinsics for compare() method in java.lang.Integer and java.lang.Long.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8283726: x86 intrinsics for compare method in Integer and Long

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7975/head:pull/7975
$ git checkout pull/7975

Update a local copy of the PR:
$ git checkout pull/7975
$ git pull https://git.openjdk.java.net/jdk pull/7975/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 7975

View PR using the GUI difftool:
$ git pr show -t 7975

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7975.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 27, 2022

👋 Welcome back vamsi-parasa! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Mar 27, 2022

@vamsi-parasa The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot hotspot-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Mar 27, 2022
@merykitty
Copy link
Member

This is both complicated and inefficient, I would suggest building the intrinsic in the IR graph so that the compiler can simplify Integer.compareUnsigned(x, y) < 0 into x u< y. Thanks.

@vamsi-parasa vamsi-parasa marked this pull request as ready for review March 28, 2022 05:26
@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 28, 2022
@mlbridge
Copy link

mlbridge bot commented Mar 28, 2022

Webrevs

Comment on lines +235 to +239
do_intrinsic(_compare_i, java_lang_Integer, compare_name, int2_int_signature, F_S) \
do_intrinsic(_compare_l, java_lang_Long, compare_name, long2_int_signature, F_S) \
do_intrinsic(_compareUnsigned_i, java_lang_Integer, compare_unsigned_name, int2_int_signature, F_S) \
do_name( compare_unsigned_name, "compareUnsigned") \
do_intrinsic(_compareUnsigned_l, java_lang_Long, compare_unsigned_name, long2_int_signature, F_S) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating these methods as intrinsic will create a box around the underneath comparison logic, this shall prevent any regular constant folding which could have optimized out certain control paths, I would suggest to to handle constant folding for newly added nodes in associated Value routines.

Comment on lines +42 to +67
class CompareSignedINode : public CompareNode {
public:
CompareSignedINode(Node* in1, Node* in2) : CompareNode(in1, in2) {}
virtual int Opcode() const;
};

//---------- CompareSignedLNode -----------------------------------------------------
class CompareSignedLNode : public CompareNode {
public:
CompareSignedLNode(Node* in1, Node* in2) : CompareNode(in1, in2) {}
virtual int Opcode() const;
};

//---------- CompareUnsignedINode -----------------------------------------------------
class CompareUnsignedINode : public CompareNode {
public:
CompareUnsignedINode(Node* in1, Node* in2) : CompareNode(in1, in2) {}
virtual int Opcode() const;
};

//---------- CompareUnsignedLNode -----------------------------------------------------
class CompareUnsignedLNode : public CompareNode {
public:
CompareUnsignedLNode(Node* in1, Node* in2) : CompareNode(in1, in2) {}
virtual int Opcode() const;
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intent here seems to be to enable further auto-vectorization of newly create IR nodes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is the intention.

src/hotspot/cpu/x86/x86_64.ad Outdated Show resolved Hide resolved
src/hotspot/cpu/x86/x86_64.ad Outdated Show resolved Hide resolved
Comment on lines +62 to +78
for (int i = 0; i < BUFFER_SIZE; i++) {
input1[i] = rng.nextInt();
if (mode.equals("equal")) {
input2[i] = input1[i];
continue;
}
else input2[i] = rng.nextInt();

if (!mode.equals("mixed")) {
boolean doSwap = (mode.equals("lessThanEqual") && input1[i] > input2[i]) ||
(mode.equals("greaterThanEqual") && input1[i] < input2[i]);
if (doSwap) {
int tmp = input1[i];
input1[i] = input2[i];
input2[i] = tmp;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic re-organization suggestion:-

 for (int i = 0 ; i < BUFFER_SIZE; i++) {
    input1[i] = rng.nextLong();
 }

 if (mode.equals("equals") {
    GRADIANT = 0;
 } else if (mode.equals("greaterThanEquals")) {
    GRADIANT = 1;
 } else {
    assert mode.equals("lessThanEqual");
    GRADIANT = -1;
 }

 for(int i = 0 ; i < BUFFER_SIZE; i++) {
    input2[i] = input1[i] + i*GRADIANT;
 }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggested refactoring is definitely elegant but one rare possibility is overflow due to the addition/subtraction. The swap logic doesn't have that problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggested refactoring is definitely elegant but one rare possibility is overflow due to the addition/subtraction. The swap logic doesn't have that problem.

Given that BUFFER_SIZE is 1024 can we not circumvent this problem by using rng.nextLong(4096) in initialization loop.

Comment on lines +1 to +5
/*
* Copyright (c) 2022, Oracle and/or its affiliates. All rights reserved.
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This code is free software; you can redistribute it and/or modify it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can unify this benchmark along with integer compare micro.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do the unification.

Comment on lines +12103 to +12107
//----------Intrinsics for Compare function------------------------------------

instruct compareSignedI_rReg(rRegI dst, rRegI op1, rRegI op2, rRegI tmp, rFlagsReg cr)
%{
match(Set dst (CompareSignedI op1 op2));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also include these patterns in x86_32.ad

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update x86_32.ad as well.

src/hotspot/cpu/x86/x86_64.ad Outdated Show resolved Hide resolved
@vamsi-parasa
Copy link
Contributor Author

vamsi-parasa commented Mar 29, 2022

This is both complicated and inefficient, I would suggest building the intrinsic in the IR graph so that the compiler can simplify Integer.compareUnsigned(x, y) < 0 into x u< y. Thanks.

Thank you for the suggestion! This intrinsic uses 1 cmp instruction instead of two and shows ~10% improvement due to better branch prediction. Even without the intrinsic, the compiler is currently able to reduce it to x u< y but is still generating two cmp (unsigned) instructions as Integer.compareUnsigned(x, y) is implemented as x u< y? -1 : (x ==y ? 0 : 1).

@merykitty
Copy link
Member

merykitty commented Mar 29, 2022

@vamsi-parasa But normally the result of the compare methods is not used as a raw integer (it is useless to do so since the methods do not have any promise regarding the value of the result, only its sign). The idiom is to compare the result with 0, such as Integer.compare(x, y) > 0, the compiler can reduce this to x > y (last time I checked it does not do so but in principle this is possible). Your intrinsic prevents the compiler from doing such transformations, hurting the performance of real programs.

Even without the intrinsic, the compiler is currently able to reduce it to x u< y

It is because the compiler can recognise the pattern x + MIN_VALUE < y + MIN_VALUE and transforms it into x u< y. This transformation is fragile however if one of the arguments is in the form x + con, in such cases constant propagation may lead to slight deviations from recognised patterns, defeat the transformations. As a result, it may be justifiable to have a dedicated intrinsic for that since unsigned comparisons are pretty basic operations that are needed for optimal range check performance.

Thanks.

@jatin-bhateja
Copy link
Member

jatin-bhateja commented Mar 31, 2022

@vamsi-parasa But normally the result of the compare methods is not used as a raw integer (it is useless to do so since the methods do not have any promise regarding the value of the result, only its sign). The idiom is to compare the result with 0, such as Integer.compare(x, y) > 0, the compiler can reduce this to x > y (last time I checked it does not do so but in principle this is possible). Your intrinsic prevents the compiler from doing such transformations, hurting the performance of real programs.

Even without the intrinsic, the compiler is currently able to reduce it to x u< y

It is because the compiler can recognise the pattern x + MIN_VALUE < y + MIN_VALUE and transforms it into x u< y. This transformation is fragile however if one of the arguments is in the form x + con, in such cases constant propagation may lead to slight deviations from recognised patterns, defeat the transformations. As a result, it may be justifiable to have a dedicated intrinsic for that since unsigned comparisons are pretty basic operations that are needed for optimal range check performance.

Thanks.

I agree, what @merykitty is proposing is to add another transformation on top of newly generated IR nodes which can reduce the emitted JIT sequence, along with this constant operand handling for newly created IR nodes though Value routines will also be useful.

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 28, 2022

@vamsi-parasa This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper
Copy link

bridgekeeper bot commented May 26, 2022

@vamsi-parasa This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org hotspot hotspot-dev@openjdk.org rfr Pull request is ready for review
3 participants