8319429: Resetting MXCSR flags degrades ecore #16504

Closed
wants to merge 3 commits into from

Conversation

vpaprotsk
Contributor

@vpaprotsk vpaprotsk commented Nov 3, 2023

Improves vector rounding throughput on ECore by about 10x.

(BEFORE) FpRoundingBenchmark.test_round_float        2048  thrpt    3  40.912 ± 0.044  ops/ms
(AFTER ) FpRoundingBenchmark.test_round_float        2048  thrpt    3  431.682 ± 0.727  ops/ms
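
As a quick check of the "about 10x" figure, the ratio of the two scores can be computed directly. This is only an arithmetic sketch; the constant and function names are made up for illustration, and the two numbers are copied from the benchmark output above.

```cpp
#include <cassert>

// Throughput scores copied from the benchmark output above (ops/ms).
constexpr double kBeforeOpsPerMs = 40.912;
constexpr double kAfterOpsPerMs  = 431.682;

// Speedup expressed as the ratio of the "after" score to the "before" score.
constexpr double speedup(double before_ops, double after_ops) {
    return after_ops / before_ops;
}

// The ratio lands between 10x and 11x, consistent with "about 10x".
static_assert(speedup(kBeforeOpsPerMs, kAfterOpsPerMs) > 10.0, "at least 10x");
static_assert(speedup(kBeforeOpsPerMs, kAfterOpsPerMs) < 11.0, "below 11x");
```

With these inputs the ratio is roughly 10.55.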

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8319429: Resetting MXCSR flags degrades ecore (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16504/head:pull/16504
$ git checkout pull/16504

Update a local copy of the PR:
$ git checkout pull/16504
$ git pull https://git.openjdk.org/jdk.git pull/16504/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16504

View PR using the GUI difftool:
$ git pr show -t 16504

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16504.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Nov 3, 2023

👋 Welcome back vpaprotsk! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Nov 3, 2023

@vpaprotsk The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot (hotspot-dev@openjdk.org) label Nov 3, 2023
@vpaprotsk vpaprotsk changed the title from "[JDK-8319429] Don't zero out mxcsr flag bits on ECore" to "8319429: Don't zero out mxcsr flag bits on ECore" Nov 3, 2023
@openjdk openjdk bot added the rfr (Pull request is ready for review) label Nov 3, 2023
@mlbridge

mlbridge bot commented Nov 3, 2023

Webrevs

Comment on lines 572 to 574
product(bool, DoEcoreOpt, false, DIAGNOSTIC, \
"Perform Ecore Optimization") \
\
Member

I think this should be a CPU specific flag in ./cpu/x86/globals_x86.hpp (similar to how we have linux specific flags in ./os/linux/globals_linux.hpp). Also the description should clarify that the default is actually true for Ecore systems, and false elsewhere.

Contributor Author

Thanks, didn't know about that file, will try.

Comment on lines 860 to 864
// Check if processor has Intel Ecore
if (FLAG_IS_DEFAULT(DoEcoreOpt) && is_intel() && cpu_family() == 6 &&
(_model == 0x97 || _model == 0xAC || _model == 0xAF)) {
FLAG_SET_DEFAULT(DoEcoreOpt, true);
}
Member

And what should happen if the flag is set to true by the user and there is no Ecore? What effect will that have? Should it be allowed?

Contributor Author

From an ISA point of view they are the same, so if the user flips the flag on purpose, the code will still be correct. It's also helpful for testing Ecore-optimized code on a Pcore (I have some more patches coming in under this option soon).
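
The defaulting behaviour discussed in this comment thread can be sketched outside HotSpot roughly as follows. `CpuInfo`, `BoolFlag`, `is_listed_model` and `apply_ecore_default` are hypothetical stand-ins invented for this illustration; only the family and model numbers come from the diff above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the CPU identification the VM reads at startup.
struct CpuInfo {
    bool          is_intel;
    std::uint32_t family;
    std::uint32_t model;
};

// Hypothetical stand-in for a VM flag that remembers whether the user set it.
struct BoolFlag {
    bool value;
    bool set_by_user;
};

// Model numbers taken from the reviewed diff.
constexpr bool is_listed_model(std::uint32_t model) {
    return model == 0x97 || model == 0xAC || model == 0xAF;
}

// Only change the flag while it is still at its default, so an explicit
// user setting (true or false) is always respected.
inline void apply_ecore_default(BoolFlag& flag, const CpuInfo& cpu) {
    if (!flag.set_by_user && cpu.is_intel && cpu.family == 6 &&
        is_listed_model(cpu.model)) {
        flag.value = true;
    }
}
```

In this sketch a user who sets the flag explicitly on a CPU outside the list keeps that setting, which matches the answer given above: the generated code stays correct either way.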

@vpaprotsk vpaprotsk changed the title from "8319429: Don't zero out mxcsr flag bits on ECore" to "8319429: Resetting MXCSR flags degrades ecore" Nov 6, 2023
@jatin-bhateja
Member

/label add hotspot-compiler-dev

@openjdk openjdk bot added the hotspot-compiler (hotspot-compiler-dev@openjdk.org) label Nov 7, 2023
@openjdk

openjdk bot commented Nov 7, 2023

@jatin-bhateja
The hotspot-compiler label was successfully added.

@@ -857,6 +857,12 @@ void VM_Version::get_processor_features() {
}
#endif

// Check if processor has Intel Ecore
if (FLAG_IS_DEFAULT(DoEcoreOpt) && is_intel() && cpu_family() == 6 &&
(_model == 0x97 || _model == 0xAC || _model == 0xAF)) {
Contributor Author

On hybrid platforms, it will be enabled on both Pcore and Ecore. Performance is essentially unchanged on Pcore.

@@ -7440,7 +7440,7 @@ instruct vround_float_evex(vec dst, vec src, rRegP tmp, vec xtmp1, vec xtmp2, kR
format %{ "vector_round_float $dst,$src\t! using $tmp, $xtmp1, $xtmp2, $ktmp1, $ktmp2 as TEMP" %}
ins_encode %{
int vlen_enc = vector_length_encoding(this);
-InternalAddress new_mxcsr = $constantaddress((jint)0x3F80);
+InternalAddress new_mxcsr = $constantaddress((jint)(DoEcoreOpt ? 0x3FBF : 0x3F80));
Member

You can define a preprocessor macro for this conditional selection pattern.

Contributor Author

I'm not convinced that it makes the code cleaner: while reading, it means one extra file lookup to see what the macro does, versus being immediately clear as it is now (ctags works only half the time). Macros here seem to be paired with conditional compilation; I don't see them paired with dev options.

I could perhaps be convinced, since the pattern is repeated 6 times in this PR. globals_x86.hpp might be an acceptable place, but there doesn't appear to be a precedent there.
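
For readers weighing the two positions, here is a minimal sketch of the alternatives. `MXCSR_ROUNDING_VALUE` and `mxcsr_rounding_value` are names invented for this illustration and do not appear in the patch; the `DoEcoreOpt` variable below is a plain global standing in for the real VM flag.

```cpp
#include <cassert>
#include <cstdint>

// Plain global standing in for the diagnostic VM flag from the diff.
static bool DoEcoreOpt = false;

// Reviewer's suggestion: hide the repeated selection behind a macro
// (hypothetical name).
#define MXCSR_ROUNDING_VALUE (DoEcoreOpt ? 0x3FBF : 0x3F80)

// A type-checked alternative with the same behaviour (hypothetical name).
inline std::int32_t mxcsr_rounding_value() {
    return DoEcoreOpt ? 0x3FBF : 0x3F80;
}
```

Either form removes the six repetitions but adds one level of indirection for the reader, which is the trade-off the author describes.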

@@ -214,6 +214,10 @@ define_pd_global(intx, InitArrayShortSize, 8*BytesPerLong);
product(bool, UseLibmIntrinsic, true, DIAGNOSTIC, \
"Use Libm Intrinsics") \
\
/* Autodetected, see vm_version_x86.cpp */ \
product(bool, DoEcoreOpt, false, DIAGNOSTIC, \
Member

Change the name: DoEcoreOpt -> DoEcoreOpts or EnableX86ECoreOpts.

Member

This comment has not been addressed yet.

Contributor Author

Done

@jatin-bhateja
Member

jatin-bhateja commented Nov 7, 2023

As per JVM specification section 2.8, "The floating-point instructions of the Java Virtual Machine do not throw exceptions, trap, or otherwise signal the IEEE 754 exceptional conditions of invalid operation, division by zero, overflow, underflow, or inexact." Thus the JVM does not check these exceptions.

Your patch always sets the lower 6 bits of MXCSR on hybrid CPUs, which have both E and P cores. Do you see any concerns if these bits are ON by default for other server targets with just P-cores?
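
To make the "lower 6 bits" concrete, the two constants from the diff can be decoded against the MXCSR field layout in the Intel SDM (bits 0-5 sticky exception flags, bit 6 denormals-are-zero, bits 7-12 exception masks, bits 13-14 rounding control). The helper names below are invented for this sketch and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// MXCSR bit fields, following the layout documented in the Intel SDM.
constexpr std::uint32_t kFlagBits     = 0x003F;  // bits 0-5: IE, DE, ZE, OE, UE, PE
constexpr std::uint32_t kDazBit       = 0x0040;  // bit 6: denormals-are-zero
constexpr std::uint32_t kMaskBits     = 0x1F80;  // bits 7-12: exception masks
constexpr std::uint32_t kRoundingBits = 0x6000;  // bits 13-14: rounding control

constexpr std::uint32_t exception_flags(std::uint32_t mxcsr) {
    return mxcsr & kFlagBits;
}
constexpr std::uint32_t exception_masks(std::uint32_t mxcsr) {
    return (mxcsr & kMaskBits) >> 7;
}
constexpr std::uint32_t rounding_control(std::uint32_t mxcsr) {
    return (mxcsr & kRoundingBits) >> 13;
}
constexpr bool denormals_are_zero(std::uint32_t mxcsr) {
    return (mxcsr & kDazBit) != 0;
}

// Both values mask all six exceptions and select rounding mode 01
// (round down, toward negative infinity).
static_assert(exception_masks(0x3F80) == 0x3F && exception_masks(0x3FBF) == 0x3F, "masks");
static_assert(rounding_control(0x3F80) == 1 && rounding_control(0x3FBF) == 1, "rounding");
// They differ only in the sticky exception flags: all clear vs. all set.
static_assert(exception_flags(0x3F80) == 0x00 && exception_flags(0x3FBF) == 0x3F, "flags");
static_assert((0x3F80 ^ 0x3FBF) == 0x3F, "only bits 0-5 differ");
```

So the patch leaves the rounding mode and masks untouched; only the six status flags are written as already set instead of cleared.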

@vpaprotsk
Contributor Author

> As per JVM specification section 2.8, "The floating-point instructions of the Java Virtual Machine do not throw exceptions, trap, or otherwise signal the IEEE 754 exceptional conditions of invalid operation, division by zero, overflow, underflow, or inexact." Thus the JVM does not check these exceptions.
>
> Your patch always sets the lower 6 bits of MXCSR on hybrid CPUs, which have both E and P cores. Do you see any concerns if these bits are ON by default for other server targets with just P-cores?

I considered it. There don't appear to be any functional correctness issues: since Java does not support the signaling part of IEEE BFP anyway, those flags are essentially no-ops. I also measured on some PCore systems, and the performance is unaffected. I mostly went with this fix to be conservative, since doing otherwise would probably call for more performance testing. It might be cleaner to have it set to just one value.
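
The "essentially no-ops" point can be illustrated with the portable `<cfenv>` interface rather than MXCSR itself. This is an analogy under stated assumptions, not HotSpot code: `sticky_flags_do_not_change_results` is a hypothetical helper, and the sketch assumes the default floating-point environment in which exceptions are masked.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Runs the same computation once with all status flags cleared and once with
// all of them raised, and reports whether the numeric result was identical
// while the flags stayed set. Assumes exceptions are masked (the default).
inline bool sticky_flags_do_not_change_results(float x) {
    volatile float input = x;  // volatile keeps the compiler from folding both runs

    std::feclearexcept(FE_ALL_EXCEPT);
    const float with_flags_clear = std::floor(input + 0.5f);

    std::feraiseexcept(FE_ALL_EXCEPT);  // sets the sticky flags, no trap when masked
    const float with_flags_set = std::floor(input + 0.5f);
    const bool flags_still_set = std::fetestexcept(FE_INEXACT) != 0;

    std::feclearexcept(FE_ALL_EXCEPT);  // leave the environment clean for the caller
    return with_flags_clear == with_flags_set && flags_still_set;
}
```

The flags only record that a condition occurred; code that never reads them, as the JVM specification quoted above implies for Java, observes the same results either way.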

@jatin-bhateja
Member

jatin-bhateja commented Nov 7, 2023

> > As per JVM specification section 2.8, "The floating-point instructions of the Java Virtual Machine do not throw exceptions, trap, or otherwise signal the IEEE 754 exceptional conditions of invalid operation, division by zero, overflow, underflow, or inexact." Thus the JVM does not check these exceptions.
> > Your patch always sets the lower 6 bits of MXCSR on hybrid CPUs, which have both E and P cores. Do you see any concerns if these bits are ON by default for other server targets with just P-cores?
>
> I considered it. There don't appear to be any functional correctness issues: since Java does not support the signaling part of IEEE BFP anyway, those flags are essentially no-ops. I also measured on some PCore systems, and the performance is unaffected. I mostly went with this fix to be conservative, since doing otherwise would probably call for more performance testing. It might be cleaner to have it set to just one value.

Do you have any idea why these settings give a performance bump on E-core? Are these the suggested settings in the x86 manuals?
It would be good if you could add a reference to justify these settings.


@sviswa7 sviswa7 left a comment

The PR looks good to me.

@openjdk

openjdk bot commented Nov 7, 2023

@vpaprotsk This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8319429: Resetting MXCSR flags degrades ecore

Reviewed-by: sviswanathan, thartmann

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 71 new commits pushed to the master branch:

  • d7b0ba9: 8319554: Select LogOutput* directly for stdout and stderr
  • 68110b7: 8319574: Exec/process tests should be marked as flagless
  • 7b971c1: 8319705: RISC-V: signumF/D intrinsics fails compiler/intrinsics/math/TestSignumIntrinsic.java
  • f939542: 8319324: FFM: Reformat javadocs
  • a3f1b33: 8319664: IGV always output on PhaseRemoveUseless
  • f57b78c: 8319726: Parallel GC: Re-use object in object-iterator
  • 4451a92: 8319748: [JVMCI] TestUseCompressedOopsFlagsWithUlimit.java crashes on libgraal
  • 7d8adfa: 8316746: Top of lock-stack does not match the unlocked object
  • dd9eab1: 8310886: C2 SuperWord: Two nodes should be isomorphic if they are loop invariant but pinned at different nodes outside the loop
  • 7e4cb2f: 8318962: Update ProcessTools javadoc with suggestions in 8315097
  • ... and 61 more: https://git.openjdk.org/jdk/compare/ea6a88a0aa4e8a365a94e71078e67a4452f40945...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@dholmes-ora, @sviswa7, @TobiHartmann) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready (Pull request is ready to be integrated) label Nov 7, 2023
@vpaprotsk
Contributor Author

@dholmes-ora did you have any further concerns? Would you mind running any extra tests you have?

@dholmes-ora
Member

> @dholmes-ora did you have any further concerns? Would you mind running any extra tests you have?

I have no further concerns - thanks for moving the flag. Someone from our compiler team should review this and run it through our CI. I don't know if we have any machines that will be affected by this.

@sviswa7

sviswa7 commented Nov 8, 2023

@TobiHartmann @vnkozlov Could we please get one more review on this small PR?

Member

@TobiHartmann TobiHartmann left a comment

Looks reasonable to me. I'll run it through our testing and report back.

@TobiHartmann
Member

All tests passed.

@sviswa7

sviswa7 commented Nov 9, 2023

Thanks a lot Tobias!

@vpaprotsk
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor (Pull request is ready to be sponsored) label Nov 9, 2023
@openjdk

openjdk bot commented Nov 9, 2023

@vpaprotsk
Your change (at version fd95663) is now ready to be sponsored by a Committer.

@sviswa7

sviswa7 commented Nov 9, 2023

/sponsor

@openjdk

openjdk bot commented Nov 9, 2023

Going to push as commit 636a351.
Since your change was applied there have been 71 commits pushed to the master branch:

  • d7b0ba9: 8319554: Select LogOutput* directly for stdout and stderr
  • 68110b7: 8319574: Exec/process tests should be marked as flagless
  • 7b971c1: 8319705: RISC-V: signumF/D intrinsics fails compiler/intrinsics/math/TestSignumIntrinsic.java
  • f939542: 8319324: FFM: Reformat javadocs
  • a3f1b33: 8319664: IGV always output on PhaseRemoveUseless
  • f57b78c: 8319726: Parallel GC: Re-use object in object-iterator
  • 4451a92: 8319748: [JVMCI] TestUseCompressedOopsFlagsWithUlimit.java crashes on libgraal
  • 7d8adfa: 8316746: Top of lock-stack does not match the unlocked object
  • dd9eab1: 8310886: C2 SuperWord: Two nodes should be isomorphic if they are loop invariant but pinned at different nodes outside the loop
  • 7e4cb2f: 8318962: Update ProcessTools javadoc with suggestions in 8315097
  • ... and 61 more: https://git.openjdk.org/jdk/compare/ea6a88a0aa4e8a365a94e71078e67a4452f40945...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated (Pull request has been integrated) label Nov 9, 2023
@openjdk openjdk bot closed this Nov 9, 2023
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated), rfr (Pull request is ready for review), and sponsor (Pull request is ready to be sponsored) labels Nov 9, 2023
@openjdk

openjdk bot commented Nov 9, 2023

@sviswa7 @vpaprotsk Pushed as commit 636a351.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
  • hotspot (hotspot-dev@openjdk.org)
  • hotspot-compiler (hotspot-compiler-dev@openjdk.org)
  • integrated (Pull request has been integrated)
5 participants