Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8265128: [REDO] Optimize Vector API slice and unslice operations #3804

Closed
wants to merge 4 commits into from

Conversation

@sviswa7
Copy link

@sviswa7 sviswa7 commented Apr 29, 2021

All the slice and unslice variants that take more than one argument can benefit from already intrinsic methods on similar lines as slice(origin) and unslice(origin).

Changes include:

  • Rewrite Vector API slice/unslice using already intrinsic methods
  • Fix in library_call.cpp:inline_preconditions_checkIndex() to not modify control if intrinsification fails
  • Vector API conversion tests thresholds adjustment

Base Performance:
Benchmark (size) Mode Cnt Score Error Units
TestSlice.vectorSliceOrigin 1024 thrpt 5 11763.372 ± 254.580 ops/ms
TestSlice.vectorSliceOriginVector 1024 thrpt 5 599.286 ± 326.770 ops/ms
TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6627.601 ± 22.060 ops/ms
TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 401.858 ± 220.340 ops/ms
TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 421.993 ± 231.703 ops/ms

Performance with patch:
Benchmark (size) Mode Cnt Score Error Units
TestSlice.vectorSliceOrigin 1024 thrpt 5 11792.091 ± 37.296 ops/ms
TestSlice.vectorSliceOriginVector 1024 thrpt 5 8388.174 ± 115.886 ops/ms
TestSlice.vectorSliceUnsliceOrigin 1024 thrpt 5 6662.159 ± 8.203 ops/ms
TestSlice.vectorSliceUnsliceOriginVector 1024 thrpt 5 5206.300 ± 43.637 ops/ms
TestSlice.vectorSliceUnsliceOriginVectorPart 1024 thrpt 5 5194.278 ± 13.376 ops/ms


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8265128: [REDO] Optimize Vector API slice and unslice operations

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3804/head:pull/3804
$ git checkout pull/3804

Update a local copy of the PR:
$ git checkout pull/3804
$ git pull https://git.openjdk.java.net/jdk pull/3804/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3804

View PR using the GUI difftool:
$ git pr show -t 3804

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3804.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Apr 29, 2021

👋 Welcome back sviswanathan! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

Loading

@openjdk
Copy link

@openjdk openjdk bot commented Apr 29, 2021

@sviswa7 The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

Loading

@sviswa7
Copy link
Author

@sviswa7 sviswa7 commented Apr 29, 2021

/label core-libs

Loading

@openjdk openjdk bot added the core-libs label Apr 29, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Apr 29, 2021

@sviswa7
The core-libs label was successfully added.

Loading

@sviswa7 sviswa7 marked this pull request as ready for review Apr 29, 2021
@openjdk openjdk bot added the rfr label Apr 29, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Apr 29, 2021

Loading

final
@ForceInline
$abstractvectortype$ sliceTemplate(int origin) {
Objects.checkIndex(origin, length());
Copy link
Member

@PaulSandoz PaulSandoz Apr 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Objects.checkIndex the upper bound is exclusive, where as i think in this case it needs to be inclusive.

The original bound check was:

       if ((origin < 0) || (origin >= VLENGTH)) {
         throw new ArrayIndexOutOfBoundsException("Index " + origin + " out of bounds for vector length " + VLENGTH);
       }
...

in accordance with the spec in the JavaDoc.

Loading

Copy link
Author

@sviswa7 sviswa7 Apr 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I will update this to Objects.checkIndex(origin, length()+1);

Loading

@ForceInline
$abstractvectortype$ sliceTemplate(int origin) {
Objects.checkIndex(origin, length());
VectorShuffle<$Boxtype$> Iota = iotaShuffle();
Copy link
Member

@PaulSandoz PaulSandoz Apr 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VectorShuffle<$Boxtype$> Iota = iotaShuffle();
VectorShuffle<$Boxtype$> iota = iotaShuffle();

use lower case first letter for variable names.

Loading

Copy link
Author

@sviswa7 sviswa7 Apr 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will make this change.

Loading

VectorShuffle<$Boxtype$> Iota = iotaShuffle();
VectorMask<$Boxtype$> BlendMask = Iota.toVector().compare(VectorOperators.LT, (broadcast(($type$)(length() - origin))));
Iota = iotaShuffle(origin, 1, true);
return vspecies().zero().blend(this.rearrange(Iota), BlendMask);
Copy link
Member

@PaulSandoz PaulSandoz Apr 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observation: this is like this.rearrange(iota, blendMask) but we avoid the checking for exceptional shuffle indexes, since we know there are no such exceptional indexes.

And, of course,this method is like slice(origin vspecies().zero()). I wonder if this template should defer to the vector accepting tempting? Or the method on vector itself?

Loading

Copy link
Author

@sviswa7 sviswa7 Apr 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to optimize as much as possible:
There is overhead in checking for exception shuffle indices.
The slice(origin, vector) version has one extra rearrange call.

Loading

@@ -48,7 +48,7 @@ public void getRunTime(ITestResult tr) {
System.out.println(tr.getName() + " took " + time + " ms");
}

static final int INVOC_COUNT = Integer.getInteger("jdk.incubator.vector.test.loop-iterations", 1000);
static final int INVOC_COUNT = Integer.getInteger("jdk.incubator.vector.test.loop-iterations", 100);
Copy link
Member

@PaulSandoz PaulSandoz Apr 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you change this?

Loading

Copy link
Author

@sviswa7 sviswa7 Apr 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following reasons:
The INVOC_COUNT is set as 100 for all other tests.
The conversion tests take longer time to complete than other tests.
For the conversion tests it looks like the intrinsics don't trigger sometimes and tests take longer to execute and timeout. This was one of the reasons for backout last time.

Loading

Copy link
Member

@PaulSandoz PaulSandoz Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Loading

@sviswa7
Copy link
Author

@sviswa7 sviswa7 commented Apr 30, 2021

@PaulSandoz I have implemented your review comments. Please take a look.

Loading

@@ -48,7 +48,7 @@ public void getRunTime(ITestResult tr) {
System.out.println(tr.getName() + " took " + time + " ms");
}

static final int INVOC_COUNT = Integer.getInteger("jdk.incubator.vector.test.loop-iterations", 1000);
static final int INVOC_COUNT = Integer.getInteger("jdk.incubator.vector.test.loop-iterations", 100);
Copy link
Member

@PaulSandoz PaulSandoz Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Loading

VectorShuffle<$Boxtype$> iota = iotaShuffle();
VectorMask<$Boxtype$> BlendMask = iota.toVector().compare(VectorOperators.LT, (broadcast(($type$)(length() - origin))));
iota = iotaShuffle(origin, 1, true);
return (($abstractvectortype$)v1).rearrange(iota).blend(this.rearrange(iota), BlendMask);
Copy link
Member

@PaulSandoz PaulSandoz Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return (($abstractvectortype$)v1).rearrange(iota).blend(this.rearrange(iota), BlendMask);
return that.rearrange(iota).blend(this.rearrange(iota), blendMask);

Loading

return vectorFactory(res);
Objects.checkIndex(origin, length() + 1);
VectorShuffle<$Boxtype$> iota = iotaShuffle();
VectorMask<$Boxtype$> BlendMask = iota.toVector().compare(VectorOperators.LT, (broadcast(($type$)(length() - origin))));
Copy link
Member

@PaulSandoz PaulSandoz Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VectorMask<$Boxtype$> BlendMask = iota.toVector().compare(VectorOperators.LT, (broadcast(($type$)(length() - origin))));
VectorMask<$Boxtype$> blendMask = iota.toVector().compare(VectorOperators.LT, (broadcast(($type$)(length() - origin))));

Loading

$abstractvectortype$ sliceTemplate(int origin) {
Objects.checkIndex(origin, length() + 1);
VectorShuffle<$Boxtype$> iota = iotaShuffle();
VectorMask<$Boxtype$> BlendMask = iota.toVector().compare(VectorOperators.LT, (broadcast(($type$)(length() - origin))));
Copy link
Member

@PaulSandoz PaulSandoz Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VectorMask<$Boxtype$> BlendMask = iota.toVector().compare(VectorOperators.LT, (broadcast(($type$)(length() - origin))));
VectorMask<$Boxtype$> blendMask = iota.toVector().compare(VectorOperators.LT, (broadcast(($type$)(length() - origin))));

Loading

return vectorFactory(res);
Objects.checkIndex(origin, length() + 1);
VectorShuffle<$Boxtype$> iota = iotaShuffle();
VectorMask<$Boxtype$> BlendMask = iota.toVector().compare((part == 0) ? VectorOperators.GE : VectorOperators.LT,
Copy link
Member

@PaulSandoz PaulSandoz Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VectorMask<$Boxtype$> BlendMask = iota.toVector().compare((part == 0) ? VectorOperators.GE : VectorOperators.LT,
VectorMask<$Boxtype$> blendMask = iota.toVector().compare((part == 0) ? VectorOperators.GE : VectorOperators.LT,

Loading

VectorMask<$Boxtype$> BlendMask = iota.toVector().compare((part == 0) ? VectorOperators.GE : VectorOperators.LT,
(broadcast(($type$)(origin))));
iota = iotaShuffle(-origin, 1, true);
return (($abstractvectortype$)w).blend(this.rearrange(iota), BlendMask);
Copy link
Member

@PaulSandoz PaulSandoz Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return (($abstractvectortype$)w).blend(this.rearrange(iota), BlendMask);
return that.blend(this.rearrange(iota), blendMask);

Loading

VectorShuffle<$Boxtype$> iota = iotaShuffle();
VectorMask<$Boxtype$> BlendMask = iota.toVector().compare(VectorOperators.LT, (broadcast(($type$)(length() - origin))));
iota = iotaShuffle(origin, 1, true);
return vspecies().zero().blend(this.rearrange(iota), BlendMask);
Copy link
Member

@PaulSandoz PaulSandoz Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return vspecies().zero().blend(this.rearrange(iota), BlendMask);
return vspecies().zero().blend(this.rearrange(iota), blendMask);

Loading

unsliceTemplate(int origin) {
Objects.checkIndex(origin, length() + 1);
VectorShuffle<$Boxtype$> iota = iotaShuffle();
VectorMask<$Boxtype$> BlendMask = iota.toVector().compare(VectorOperators.GE,
Copy link
Member

@PaulSandoz PaulSandoz Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VectorMask<$Boxtype$> BlendMask = iota.toVector().compare(VectorOperators.GE,
VectorMask<$Boxtype$> blendMask = iota.toVector().compare(VectorOperators.GE,

Loading

VectorMask<$Boxtype$> BlendMask = iota.toVector().compare(VectorOperators.GE,
(broadcast(($type$)(origin))));
iota = iotaShuffle(-origin, 1, true);
return vspecies().zero().blend(this.rearrange(iota), BlendMask);
Copy link
Member

@PaulSandoz PaulSandoz Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return vspecies().zero().blend(this.rearrange(iota), BlendMask);
return vspecies().zero().blend(this.rearrange(iota), blendMask);

Loading

@sviswa7
Copy link
Author

@sviswa7 sviswa7 commented Apr 30, 2021

@PaulSandoz Thanks a lot for these suggestions. I have included all of them in this updated commit.

Loading

@openjdk
Copy link

@openjdk openjdk bot commented Apr 30, 2021

@sviswa7 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8265128: [REDO] Optimize Vector API slice and unslice operations

Reviewed-by: psandoz, vlivanov

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 174 new commits pushed to the master branch:

  • e5d3ee3: 8266802: Shenandoah: Round up region size to page size unconditionally
  • 8851cb6: 8266779: Use instead of ZERO_WIDTH_SPACE
  • 0cc7833: 8265208: [JEP-356] : SplittableRandom and SplittableGenerators - splits() methods does not throw NullPointerException when source is null
  • f78440a: 8266440: Shenandoah: TestReferenceShortcutCycle.java test failed on AArch64
  • de78431: 8241502: C2: Migrate x86_64.ad to MacroAssembler
  • c8b7447: 8266603: jpackage: Add missing copyright file in Java runtime .deb installers
  • c494efc: 8266774: System property values for stdout/err on Windows UTF-8
  • 25d99e5: 8266330: itableMethodEntry::initialize() asserts with archived old classes
  • 5d761fc: 8266796: Clean up the unnecessary code in the method UnsharedNameTable#fromUtf
  • e41fd73: 8266252: Streamline AbstractInterpreter::method_kind
  • ... and 164 more: https://git.openjdk.java.net/jdk/compare/9df6cc7cc2633e4231b9b69bed8a0f9e13ec74a7...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

Loading

@openjdk openjdk bot added the ready label Apr 30, 2021
@sviswa7
Copy link
Author

@sviswa7 sviswa7 commented Apr 30, 2021

@PaulSandoz would it be possible for you to run this through your testing?

Loading

@PaulSandoz
Copy link
Member

@PaulSandoz PaulSandoz commented Apr 30, 2021

@PaulSandoz would it be possible for you to run this through your testing?

Started, will report back when done.

Loading

@PaulSandoz
Copy link
Member

@PaulSandoz PaulSandoz commented Apr 30, 2021

@PaulSandoz would it be possible for you to run this through your testing?

Started, will report back when done.

Tier 1 to 3 tests all pass on build profiles linux-x64 linux-aarch64 macosx-x64 windows-x64

Loading

@sviswa7
Copy link
Author

@sviswa7 sviswa7 commented Apr 30, 2021

@PaulSandoz would it be possible for you to run this through your testing?

Started, will report back when done.

Tier 1 to 3 tests all pass on build profiles linux-x64 linux-aarch64 macosx-x64 windows-x64

@PaulSandoz Thanks a lot!

Loading

@sviswa7
Copy link
Author

@sviswa7 sviswa7 commented May 3, 2021

@iwanowww Could you please review if the change in library_call.cpp looks ok to you?

Loading

@sviswa7
Copy link
Author

@sviswa7 sviswa7 commented May 10, 2021

@PaulSandoz After we fixed the Objects.checkIndex arguments per your review comment, the changes in library_call.cpp are no more needed so I have backed them out. That leaves only changes in vector api code which you have reviewed. Please let me know if it is ok to push.

Loading

Copy link
Member

@PaulSandoz PaulSandoz left a comment

I don't claim to understand the HotSpot details, but Java changes still look good.

Loading

Copy link

@iwanowww iwanowww left a comment

Looks good.

PS: I still think there's a problem with LibraryCallKit::inline_preconditions_checkIndex: it shouldn't bail out intrinsification in the middle of the process leaving a partially constructed graph behind. I don't see why short-circuiting the logic once the path is dead (if (stopped()) return true;) won't work. But that's a topic for another fix.

Loading

@sviswa7
Copy link
Author

@sviswa7 sviswa7 commented May 10, 2021

Thanks a lot for the review Paul and Vladimir.
Vladimir, I will submit a separate patch for LibraryCallKit::inline_preconditions_checkIndex fix.

Loading

@sviswa7
Copy link
Author

@sviswa7 sviswa7 commented May 10, 2021

/integrate

Loading

@openjdk openjdk bot closed this May 10, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels May 10, 2021
@openjdk
Copy link

@openjdk openjdk bot commented May 10, 2021

@sviswa7 Since your change was applied there have been 174 commits pushed to the master branch:

  • e5d3ee3: 8266802: Shenandoah: Round up region size to page size unconditionally
  • 8851cb6: 8266779: Use instead of ZERO_WIDTH_SPACE
  • 0cc7833: 8265208: [JEP-356] : SplittableRandom and SplittableGenerators - splits() methods does not throw NullPointerException when source is null
  • f78440a: 8266440: Shenandoah: TestReferenceShortcutCycle.java test failed on AArch64
  • de78431: 8241502: C2: Migrate x86_64.ad to MacroAssembler
  • c8b7447: 8266603: jpackage: Add missing copyright file in Java runtime .deb installers
  • c494efc: 8266774: System property values for stdout/err on Windows UTF-8
  • 25d99e5: 8266330: itableMethodEntry::initialize() asserts with archived old classes
  • 5d761fc: 8266796: Clean up the unnecessary code in the method UnsharedNameTable#fromUtf
  • e41fd73: 8266252: Streamline AbstractInterpreter::method_kind
  • ... and 164 more: https://git.openjdk.java.net/jdk/compare/9df6cc7cc2633e4231b9b69bed8a0f9e13ec74a7...master

Your commit was automatically rebased without conflicts.

Pushed as commit 23446f1.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
3 participants