Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8266332: Adler32 intrinsic for x86 64-bit platforms #3806

Closed
wants to merge 12 commits into from

Conversation

xbzhang99
Copy link
Contributor

@xbzhang99 xbzhang99 commented Apr 29, 2021

Implement Adler32 intrinsic for x86 64-bit platform using vector instructions.

The benchmark test/micro/org/openjdk/bench/java/util/TestAdler32.java is contributed by Pengfei Li (pli, Pengfei.Li@arm.com).

For this benchmark, the optimization shows ~5x improvement.

Base:
Benchmark (count) Mode Cnt Score Error Units
TestAdler32Perf.testAdler32Update 64 avgt 25 0.084 ± 0.001 us/op
TestAdler32Perf.testAdler32Update 128 avgt 25 0.104 ± 0.001 us/op
TestAdler32Perf.testAdler32Update 256 avgt 25 0.146 ± 0.002 us/op
TestAdler32Perf.testAdler32Update 512 avgt 25 0.226 ± 0.002 us/op
TestAdler32Perf.testAdler32Update 1024 avgt 25 0.390 ± 0.005 us/op
TestAdler32Perf.testAdler32Update 2048 avgt 25 0.714 ± 0.007 us/op
TestAdler32Perf.testAdler32Update 4096 avgt 25 1.359 ± 0.014 us/op
TestAdler32Perf.testAdler32Update 8192 avgt 25 2.751 ± 0.023 us/op
TestAdler32Perf.testAdler32Update 16384 avgt 25 5.494 ± 0.077 us/op
TestAdler32Perf.testAdler32Update 32768 avgt 25 11.058 ± 0.160 us/op
TestAdler32Perf.testAdler32Update 65536 avgt 25 22.198 ± 0.319 us/op

With patch:
Benchmark (count) Mode Cnt Score Error Units
TestAdler32Perf.testAdler32Update 64 avgt 25 0.020 ± 0.001 us/op
TestAdler32Perf.testAdler32Update 128 avgt 25 0.025 ± 0.001 us/op
TestAdler32Perf.testAdler32Update 256 avgt 25 0.031 ± 0.001 us/op
TestAdler32Perf.testAdler32Update 512 avgt 25 0.048 ± 0.001 us/op
TestAdler32Perf.testAdler32Update 1024 avgt 25 0.078 ± 0.001 us/op
TestAdler32Perf.testAdler32Update 2048 avgt 25 0.139 ± 0.002 us/op
TestAdler32Perf.testAdler32Update 4096 avgt 25 0.262 ± 0.004 us/op
TestAdler32Perf.testAdler32Update 8192 avgt 25 0.524 ± 0.010 us/op
TestAdler32Perf.testAdler32Update 16384 avgt 25 1.017 ± 0.022 us/op
TestAdler32Perf.testAdler32Update 32768 avgt 25 2.058 ± 0.052 us/op
TestAdler32Perf.testAdler32Update 65536 avgt 25 3.994 ± 0.013 us/op


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8266332: Adler32 intrinsic for x86 64-bit platforms

Reviewers

Contributors

  • Xubo Zhang <xubo.zhang@intel.com>
  • Greg B Tucker <greg.b.tucker@intel.com>
  • Pengfei Li <pli@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3806/head:pull/3806
$ git checkout pull/3806

Update a local copy of the PR:
$ git checkout pull/3806
$ git pull https://git.openjdk.java.net/jdk pull/3806/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3806

View PR using the GUI difftool:
$ git pr show -t 3806

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3806.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 29, 2021

👋 Welcome back xbzhang99! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Apr 29, 2021

@xbzhang99 The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Apr 29, 2021
@xbzhang99 xbzhang99 changed the title 8266332: Implement Adler32 intrinsic for x86 64-bit platforms 8266332: Adler32 intrinsic for x86 64-bit platforms Apr 29, 2021
@xbzhang99
Copy link
Contributor Author

/check

@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 29, 2021
@openjdk
Copy link

openjdk bot commented Apr 29, 2021

@xbzhang99 Unknown command check - for a list of valid commands use /help.

@xbzhang99
Copy link
Contributor Author

Currently, only 64-bit Linux is supported

@mlbridge
Copy link

mlbridge bot commented Apr 29, 2021

@@ -3244,6 +3244,18 @@ void MacroAssembler::vpmullw(XMMRegister dst, XMMRegister nds, Address src, int
Assembler::vpmullw(dst, nds, src, vector_len);
}

void MacroAssembler::vpmulld(XMMRegister dst, XMMRegister nds, AddressLiteral src, int vector_len) {
// Used in sign-bit flipping with aligned address.
bool aligned_adr = (((intptr_t)src.target() & 15) == 0);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an AVX instruction. So alignment is not required. The assert need to only check for UseAVX>0.

__ align(CodeEntryAlignment);
StubCodeMark mark(this, "StubRoutines", "updateBytesAdler32");

address start = __ pc();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The algorithm part can go into macroAssembler_x86_adler.cpp with Intel copyright (see macroAssembler_x86_sha.cpp).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

macroAssembler_x86_adler.cpp added

__ cmpl(s, size);
__ cmovl(Assembler::above, s, size); // s = min(size, LIMIT)
__ lea(end, Address(s, data, Address::times_1, -CHUNKSIZE_M1));
__ cmpq(data, end);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be cmpptr here.


// reduce
__ vpslld(yb, yb, 3, Assembler::AVX_256bit); //b is scaled by 8
__ vpmulld(ysa, ya, ExternalAddress((address) StubRoutines::x86::_adler32_ascale_table), Assembler::AVX_256bit); //need scratch register??
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the instructions with ExternalAddress can modify rscratch1 which is r10. It is good to pass an explicit scratch register to these as last argument.

….cpp; added a scratch reg to vpmulld, and some other issues
@@ -3244,6 +3244,17 @@ void MacroAssembler::vpmullw(XMMRegister dst, XMMRegister nds, Address src, int
Assembler::vpmullw(dst, nds, src, vector_len);
}

void MacroAssembler::vpmulld(XMMRegister dst, XMMRegister nds, AddressLiteral src, int vector_len, Register scratch_reg) {
// Used in sign-bit flipping with aligned address.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could remove the spurious comment here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

BLOCK_COMMENT("Entry:");
__ enter(); // required for proper stackwalking of RuntimeStub frame

__ vmovdqu(yshuf0, ExternalAddress((address) StubRoutines::x86::_adler32_shuf0_table));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For vmovdqu also it is good to be explicit with scratch register.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added scratch register to vmovdqu

#include "runtime/stubRoutines.hpp"
#include "macroAssembler_x86.hpp"


Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The updateBytesAdler32 should be under #ifdef _LP64, #endif.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#ifdef _LP64 ... #endif added

@@ -898,6 +898,16 @@ void VM_Version::get_processor_features() {
FLAG_SET_DEFAULT(UseCRC32Intrinsics, false);
}

if (supports_avx2() && UseAdler32Intrinsics) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be under #ifdef _LP64.
For 32-bit, UseAdler32Intrinsics should be set to false.

@@ -0,0 +1,209 @@
/*
* Copyright (c) 2016, Intel Corporation.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copyright year should be 2021.

@xbzhang99
Copy link
Contributor Author

/contributor add @xbzhang99
/contributor add Lauren Greg B Tucker greg.b.tucker@intel.com

@openjdk
Copy link

openjdk bot commented May 1, 2021

@xbzhang99 Could not parse @xbzhang99 as a valid contributor.
Syntax: /contributor (add|remove) [@user | openjdk-user | Full Name <email@address>]. For example:

  • /contributor add @openjdk-bot
  • /contributor add duke
  • /contributor add J. Duke <duke@openjdk.org>

@openjdk
Copy link

openjdk bot commented May 1, 2021

@xbzhang99
Contributor Lauren Greg B Tucker <greg.b.tucker@intel.com> successfully added.

@xbzhang99
Copy link
Contributor Author

/contributor remove Lauren Greg B Tucker greg.b.tucker@intel.com
/contributor add @gbtucker

@openjdk
Copy link

openjdk bot commented May 1, 2021

@xbzhang99 Could not parse Lauren Greg B Tucker greg.b.tucker@intel.com as a valid contributor.
Syntax: /contributor (add|remove) [@user | openjdk-user | Full Name <email@address>]. For example:

  • /contributor add @openjdk-bot
  • /contributor add duke
  • /contributor add J. Duke <duke@openjdk.org>

@openjdk
Copy link

openjdk bot commented May 1, 2021

@xbzhang99 Could not parse @gbtucker as a valid contributor.
Syntax: /contributor (add|remove) [@user | openjdk-user | Full Name <email@address>]. For example:

  • /contributor add @openjdk-bot
  • /contributor add duke
  • /contributor add J. Duke <duke@openjdk.org>

@xbzhang99
Copy link
Contributor Author

/contributor add @gbtucker

@openjdk
Copy link

openjdk bot commented May 1, 2021

@xbzhang99 Could not parse @gbtucker as a valid contributor.
Syntax: /contributor (add|remove) [@user | openjdk-user | Full Name <email@address>]. For example:

  • /contributor add @openjdk-bot
  • /contributor add duke
  • /contributor add J. Duke <duke@openjdk.org>

@xbzhang99
Copy link
Contributor Author

/contributor add xbzhang99

@openjdk
Copy link

openjdk bot commented May 1, 2021

@xbzhang99 Could not parse xbzhang99 as a valid contributor.
Syntax: /contributor (add|remove) [@user | openjdk-user | Full Name <email@address>]. For example:

  • /contributor add @openjdk-bot
  • /contributor add duke
  • /contributor add J. Duke <duke@openjdk.org>

@xbzhang99
Copy link
Contributor Author

/contributor remove Lauren Greg B Tucker greg.b.tucker@intel.com

@openjdk
Copy link

openjdk bot commented May 1, 2021

@xbzhang99
Contributor Lauren Greg B Tucker <greg.b.tucker@intel.com> successfully removed.

@xbzhang99
Copy link
Contributor Author

/contributor add Xubo Zhang xubo.zhang@intel.com

@openjdk
Copy link

openjdk bot commented May 17, 2021

@xbzhang99
Contributor Pengfei Li <pli@openjdk.org> successfully added.

@xbzhang99
Copy link
Contributor Author

@vnkozlov I implemented your review comments. Could you please take a look.

@vnkozlov
Copy link
Contributor

I'm not a lawyer, but Pengfei, please contribute this benchmark. All you have to do is copy it into cr.openjdk.java.net. That should be enough for someone else to take it from there. And AFAICR files should have a copyright header, which you should do too.

@theRealAph micro is already there for long time: https://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java
It missed copyright header which is added in these changes.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have 2 comments.

UseAdler32Intrinsics = true;
}
} else if (UseAdler32Intrinsics) {
if (!FLAG_IS_DEFAULT(UseAdler32Intrinsics))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add {}.

void vpmulld(XMMRegister dst, XMMRegister nds, Address src, int vector_len) {
Assembler::vpmulld(dst, nds, src, vector_len);
};
void vpmulld(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) {
Assembler::vpmulld(dst, nds, src, vector_len);
}
void vpmulld(XMMRegister dst, XMMRegister nds, AddressLiteral src, int vector_len, Register scratch_reg = rscratch1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like my comment was lost.
I see only last version of method is used in stub. Why you need additional 2 wrapper methods?
Also the code always pass scratch_reg - you don't need to set default value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the first two were introduced by other patches
will remove the scratch_reg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I added first two.
The vpmulld is overloaded in base Assembler class. If I override one method in MacroAssembler class, the C++ compiler doesn’t seem to find the other overloaded functions, they somehow become hidden.
So, I need to override those as well in macroAssembler, otherwise I get the following error:
./src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp: In member function 'void C2_MacroAssembler::reduce_operation_256(BasicType, int, XMMRegister, XMMRegister, XMMRegister)':
./src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp:1573:64: error: no matching function for call to 'C2_MacroAssembler::vpmulld(XMMRegisterImpl*&, XMMRegisterImpl*&, XMMRegisterImpl*&, int&)'

@pfustc
Copy link
Member

pfustc commented May 18, 2021

I’ve copied @xbzhang99 's modified test case into http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java The original one w/o copyright header is backed up at http://cr.openjdk.java.net/~pli/rfr/8216259/TestAdler32.java.old Please let me know if I should do anything else.

@@ -7854,6 +7854,18 @@ void Assembler::vbroadcastsd(XMMRegister dst, Address src, int vector_len) {
emit_operand(dst, src);
}

void Assembler::vbroadcastf128(XMMRegister dst, Address src, int vector_len) {
assert(VM_Version::supports_avx(), "");
assert(vector_len == AVX_256bit, "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like "vector_len" can only be AVX_256bit. Do we really need a parameter then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your are right, for now it can only be AVX_256bit. But I think in the future other lengths will be used too. So we should have a more generic signature.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good.

@openjdk
Copy link

openjdk bot commented May 18, 2021

@xbzhang99 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8266332: Adler32 intrinsic for x86 64-bit platforms

Co-authored-by: Xubo Zhang <xubo.zhang@intel.com>
Co-authored-by: Greg B Tucker <greg.b.tucker@intel.com>
Co-authored-by: Pengfei Li <pli@openjdk.org>
Reviewed-by: sviswanathan, jbhateja, kvn, neliasso

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 273 new commits pushed to the master branch:

  • 64e2479: 8267407: ProblemList vmTestbase/vm/mlvm/anonloader/stress/oome/metaspace/Test.java on linux-aarch64
  • 9760dba: 8267321: Use switch expression for VarHandle$AccessMode lookup
  • fdd0352: 8267338: [JVMCI] revive JVMCI API removed by JDK-8243287
  • 0b49f5a: 8267257: Shenandoah: Always deduplicate strings when it is enabled during full gc
  • 12050f0: 8266651: Convert Table method parameters from String to Content
  • e749f75: 8267304: Bump global JTReg memory limit to 768m
  • e858dd6: 8267361: JavaTokenizer reads octal numbers mistakenly
  • 1b93b81: 8267133: jdk/jfr/event/gc/collection/TestG1ParallelPhases.java fails with Not expected phases: RestorePreservedMarks, RemoveSelfForwardingPtr: expected true, was false
  • 88b1142: 8267357: build breaks with -Werror option on micro benchmark added for JDK-8256973
  • 6ef46ce: 8231672: Simplify the reference processing parallelization framework
  • ... and 263 more: https://git.openjdk.java.net/jdk/compare/2c381e0f8d5777d289a6eb410d66d8221c7d9d1b...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@sviswa7, @jatin-bhateja, @vnkozlov, @neliasso) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 18, 2021
Copy link

@neliasso neliasso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@xbzhang99
Copy link
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label May 19, 2021
@openjdk
Copy link

openjdk bot commented May 19, 2021

@xbzhang99
Your change (at version 0583b2c) is now ready to be sponsored by a Committer.

@sviswa7
Copy link

sviswa7 commented May 19, 2021

/test tier1

@openjdk
Copy link

openjdk bot commented May 19, 2021

@sviswa7 you need to get approval to run the tests in tier1 for commits up until 0583b2c

@openjdk openjdk bot added the test-request label May 19, 2021
@sviswa7
Copy link

sviswa7 commented May 19, 2021

@vnkozlov Looks like automated tests are not run. Could you please help?

@vnkozlov
Copy link
Contributor

I started our internal testing.

@vnkozlov
Copy link
Contributor

tier1-3 testing is clean. It ran compiler/intrinsics/zip/TestAdler32.java test.

@vnkozlov
Copy link
Contributor

/sponsor

@openjdk openjdk bot closed this May 19, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed sponsor Pull request is ready to be sponsored ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 19, 2021
@openjdk
Copy link

openjdk bot commented May 19, 2021

@vnkozlov @xbzhang99 Since your change was applied there have been 280 commits pushed to the master branch:

  • b961f25: 8267191: Avoid repeated SystemDictionaryShared::should_be_excluded calls
  • 74f30ad: 8263684: Avoid wrapping into BufferedWriter twice
  • 9820f3d: 8267371: Concurrent gtests take too long
  • 38d690b: 8265262: CITime - 'other' incorrectly calculated
  • 66ab6d8: 8264181: javadoc tool Incorrect error message about malformed link
  • 99fcc41: 8234532: Remove ThreadLocalAllocBuffer::_fast_refill_waste since it is never set
  • 237fee8: 8267339: Temporarily disable os.release_multi_mappings_vm on macOS x64
  • 64e2479: 8267407: ProblemList vmTestbase/vm/mlvm/anonloader/stress/oome/metaspace/Test.java on linux-aarch64
  • 9760dba: 8267321: Use switch expression for VarHandle$AccessMode lookup
  • fdd0352: 8267338: [JVMCI] revive JVMCI API removed by JDK-8243287
  • ... and 270 more: https://git.openjdk.java.net/jdk/compare/2c381e0f8d5777d289a6eb410d66d8221c7d9d1b...master

Your commit was automatically rebased without conflicts.

Pushed as commit 8e3549f.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@xbzhang99
Copy link
Contributor Author

/backport jdk11u-dev

@openjdk
Copy link

openjdk bot commented Jul 20, 2021

@xbzhang99 Unknown command backport - for a list of valid commands use /help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated test-request
8 participants