8294588: Auto vectorize half precision floating point conversion APIs #11471
smita-kamath wants to merge 9 commits into openjdk:master from JDK-8294588
Conversation
👋 Welcome back svkamath! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request.

/label hotspot

@smita-kamath The following label will be automatically applied to this pull request: hotspot
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Webrevs
    System.out.println("PASSED");
}

@Test
New IR node checking annotations are missing.
virtual int Opcode() const;
};

class HF2FVNode : public VectorNode {
You may use the same naming convention as used for other vector casting IR nodes, e.g. VectorCastHF2F and VectorCastF2HF.
  ins_pipe( pipe_slow );
%}

instruct vconvF2HF(vec dst, vec src) %{
We do have a destination-memory flavour of VCVTPS2PH; adding a memory pattern will fold the subsequent store into one instruction.
src/hotspot/cpu/x86/x86.ad
Outdated
case Op_F2HFV:
  if (!VM_Version::supports_f16c() && !VM_Version::supports_avx512vl()) {
We need a different check for the vector flavors (HF2FV/F2HFV) vs. the scalar flavors (ConvF2HF/ConvHF2F).
The check needed for the vector flavors is:
if (!VM_Version::supports_f16c() && !VM_Version::supports_avx512()) { return false; }
Also, in vm_version_x86.cpp the F16C feature should be disabled when UseAVX is set to 0, i.e. the following:
if (UseAVX < 1) {
  _features &= ~CPU_AVX;
  _features &= ~CPU_VZEROUPPER;
}
should be updated to:
if (UseAVX < 1) {
  _features &= ~CPU_AVX;
  _features &= ~CPU_VZEROUPPER;
  _features &= ~CPU_F16C;
}
src/hotspot/cpu/x86/x86.ad
Outdated
case Op_HF2FV:
case Op_F2HFV:
  if (!VM_Version::supports_f16c() && !VM_Version::supports_avx512vl()) {
    return false;
  }
  break;
This can be removed, as the match_rule_supported() check has already happened.
src/hotspot/cpu/x86/x86.ad
Outdated
int src_size = Matcher::vector_length_in_bytes(this, $src);
int dst_size = src_size * 2;
int vlen_enc = vector_length_encoding(dst_size);
This could now be changed to:
int vlen_enc = Matcher::vector_length_encoding(this);
@smita-kamath this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:
git checkout JDK-8294588
git fetch https://git.openjdk.org/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
void Assembler::vcvtph2ps(XMMRegister dst, XMMRegister src, int vector_len) {
  assert(VM_Version::supports_avx512vl() || VM_Version::supports_f16c(), "");
This should be VM_Version::supports_evex(). The same applies to vcvtps2ph.
@smita-kamath The patch looks good to me. You will need another review.
@smita-kamath This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 42 new commits pushed to the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@sviswa7, @vnkozlov, @jatin-bhateja, @XiaohongGong) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment.
vnkozlov
left a comment
Changes are straightforward, but I have a few comments. And we need to test it again.
if (UseAVX < 1) {
  _features &= ~CPU_AVX;
  _features &= ~CPU_VZEROUPPER;
  _features &= ~CPU_F16C;
Does is_knights_family() support f16c? We switch off some avx512 features for it. But it looks like f16c is not connected to avx512.

Hi Vladimir, you're correct that f16c is not connected to avx512.
assert(VM_Version::supports_evex() || VM_Version::supports_f16c(), "");
InstructionMark im(this);
InstructionAttr attributes(vector_len, /* rex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true);
attributes.set_address_attributes(/* tuple_type */ EVEX_HVM, /* input_size_in_bits */ EVEX_NObit);
Is it correct to set EVEX_* attributes in case EVEX is switched off (by the UseAVX flag)? Or if a CPU supports F16C but not EVEX (avx512f)?

Hi Vladimir, we have a prior example in the vpaddb instruction where these attributes are set. The assembler will ignore these attributes if UseAVX < 3.

Good. Thank you for answering my questions.
}

@Test
@IR(counts = {IRNode.VECTOR_CAST_F2H, "> 0"}, applyIfCPUFeature = {"avx512f", "true"})
You can also add "f16c" to the feature list and use applyIfCPUFeatureOr.
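As a hedged sketch of that suggestion (the annotation names follow the HotSpot IR test framework in test/hotspot/jtreg/compiler/lib/ir_framework; the method signature and body are illustrative, not taken from the PR), the combined annotation could look like:

```java
@Test
@IR(counts = {IRNode.VECTOR_CAST_F2H, "> 0"},
    applyIfCPUFeatureOr = {"f16c", "true", "avx512f", "true"})
public void testFloatToFloat16(float[] fin, short[] sout) {
    // Auto-vectorizable conversion loop: each float is narrowed to a
    // 16-bit half-precision value (VCVTPS2PH on x86 when available).
    for (int i = 0; i < fin.length; i++) {
        sout[i] = Float.floatToFloat16(fin[i]);
    }
}
```

With applyIfCPUFeatureOr, the IR check is applied if either the f16c or the avx512f feature is present, instead of requiring avx512f alone.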
* @bug 8294588
* @summary Auto-vectorize Float.floatToFloat16, Float.float16ToFloat API's
* @requires vm.compiler2.enabled
* @requires vm.cpu.features ~= ".*avx.*"
The test may also execute on a target that only has F16C, so you can remove this CPU feature check; the IR annotations already have a feature check.
jatin-bhateja
left a comment
Verified my comments are addressed. The IR test is enabled for AVX, but it can also be enabled for F16C, since some VM features can be selectively enabled in some instances.
@vnkozlov I have addressed comments from Fei Gao and Xiaohong Gong. I have limited vectorization to avx2 and higher. If the changes look good to you, could you kindly run the tests? Thanks for all your help.
@smita-kamath, can you explain why it does not work with AVX1? If it really requires AVX2 then you should just disable F16C for UseAVX < 2.
@vnkozlov you are right. It should work with AVX=1. I will make the changes. Thank you for your comment.
@vnkozlov I have updated the test case to work with AVX=1.
@vnkozlov The test was failing earlier with -XX:UseAVX=1 because the right implemented() check was not happening, as Fei Gao explained. In vectornode.cpp, the method VectorCastNode::implemented() was not getting the right vopc (VectorCastF2X, VectorCastS2X instead of VectorCastF2HF and VectorCastHF2F) after the call to VectorCastNode::opcode(), and so Matcher::match_rule_supported_superword() was called with the wrong vopc. This is now fixed, as Smita has corrected VectorCastNode::opcode() and VectorCastNode::implemented().
Thank you @sviswa7 for the explanation! Good.
I started new testing after verifying locally that the test passed with -XX:UseAVX=1.
Also, the IR test was only enabled for avx512f earlier, which somehow overshadowed the problem. Since VM features are queried using CPUID, the matcher will give up if neither F16C nor AVX512F is present. Hi @smita-kamath, we should not explicitly disable F16C in vm_version.
@jatin-bhateja When the user sets -XX:UseAVX=0 on the command line, F16C needs to be disabled explicitly (in vm_version), as it needs AVX support.
Thank you @sviswa7 for the explanation!
Unfortunately I have to restart testing because the JTREG version was updated but I did not update my local repo, which caused half of the tests to fail with a "harness" error :^(
The good news is that the test passed in this round (hotspot vector testing passed as a whole).
@vnkozlov, Thanks so much for running the tests. I really appreciate your help.
fg1417
left a comment
Thanks for your update. The changes involving the superword and vectornode parts look good to me now.
vnkozlov
left a comment
Latest testing results are good.
@vnkozlov Thanks a lot for your review comments and for testing this patch.
/integrate

@smita-kamath Your change is now ready to be sponsored by a Committer.

/sponsor
Going to push as commit 073897c.
Your commit was automatically rebased without conflicts.

@jatin-bhateja @smita-kamath Pushed as commit 073897c. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
Hi All,
I have added changes for auto-vectorizing the Float.float16ToFloat and Float.floatToFloat16 APIs.
Following are the performance numbers of the JMH micro-benchmark Fp16ConversionBenchmark:
Before code changes:
Benchmark                                    | (size) | Mode  | Cnt | Score       | Error       | Units
Fp16ConversionBenchmark.float16ToFloat       | 2048   | thrpt | 3   | 1044.653    | ± 0.041     | ops/ms
Fp16ConversionBenchmark.float16ToFloatMemory | 2048   | thrpt | 3   | 2341529.9   | ± 11765.453 | ops/ms
Fp16ConversionBenchmark.floatToFloat16       | 2048   | thrpt | 3   | 2156.662    | ± 0.653     | ops/ms
Fp16ConversionBenchmark.floatToFloat16Memory | 2048   | thrpt | 3   | 2007988.1   | ± 361.696   | ops/ms

After:
Benchmark                                    | (size) | Mode  | Cnt | Score       | Error       | Units
Fp16ConversionBenchmark.float16ToFloat       | 2048   | thrpt | 3   | 20460.349   | ± 372.327   | ops/ms
Fp16ConversionBenchmark.float16ToFloatMemory | 2048   | thrpt | 3   | 2342125.200 | ± 9250.899  | ops/ms
Fp16ConversionBenchmark.floatToFloat16       | 2048   | thrpt | 3   | 22553.977   | ± 483.034   | ops/ms
Fp16ConversionBenchmark.floatToFloat16Memory | 2048   | thrpt | 3   | 2007899.797 | ± 150.296   | ops/ms
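For context, the kernels being measured are simple loops over the two conversion APIs (Float.floatToFloat16 and Float.float16ToFloat, available since JDK 20). A minimal standalone sketch of that loop shape, using my own class and method names rather than the actual JMH benchmark source:

```java
public class Fp16RoundTrip {
    // Narrow each float to an IEEE 754 binary16 value; this loop shape is
    // what C2's superword pass can now turn into VCVTPS2PH on x86.
    static short[] toHalf(float[] fin) {
        short[] out = new short[fin.length];
        for (int i = 0; i < fin.length; i++) {
            out[i] = Float.floatToFloat16(fin[i]);
        }
        return out;
    }

    // Widen each binary16 value back to float (VCVTPH2PS when vectorized).
    static float[] toFloat(short[] sin) {
        float[] out = new float[sin.length];
        for (int i = 0; i < sin.length; i++) {
            out[i] = Float.float16ToFloat(sin[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // These values are all exactly representable in binary16,
        // so the round trip must reproduce them bit-for-bit.
        float[] in = {0.0f, 1.0f, -2.5f, 65504.0f};
        float[] round = toFloat(toHalf(in));
        for (int i = 0; i < in.length; i++) {
            if (in[i] != round[i]) throw new AssertionError("mismatch at " + i);
        }
        System.out.println("round-trip OK");
    }
}
```

Running this requires JDK 20 or later; with -XX:+PrintIdeal or the IR test framework one can observe the VectorCastF2HF/VectorCastHF2F nodes on capable hardware.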
Kindly review and share your feedback.
Thanks.
Smita
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/11471/head:pull/11471
$ git checkout pull/11471
Update a local copy of the PR:
$ git checkout pull/11471
$ git pull https://git.openjdk.org/jdk pull/11471/head

Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 11471
View PR using the GUI difftool:
$ git pr show -t 11471

Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11471.diff