-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8330266: RISC-V: Restore frm to RoundingMode::rne after JNI #18785
Conversation
👋 Welcome back mli! A progress list of the required criteria for merging this PR into |
@Hamlin-Li This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 94 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
@Hamlin-Li The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
mv(tmp2, RoundingMode::rne); | ||
// Set FRM to the state we need. We do want Round to Nearest. We | ||
// don't want non-IEEE rounding modes. | ||
beq(tmp1, tmp2, skip_fsrmi); // Only reset FRM if it's wrong |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as #18758 (comment), you can take advantage of RoundingMode::rne == 0
by doing beq(tmp1, zr, skip_fsrmi);
and adding a guarantee(RoundingMode::rne == 0);
right above for self-documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed.
// Set FRM to the state we need. We do want Round to Nearest. We | ||
// don't want non-IEEE rounding modes. | ||
guarantee(RoundingMode::rne == 0, "must be"); | ||
beq(tmp, zr, skip_fsrmi); // Only reset FRM if it's wrong |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it really better (performance wise) than doing it always, unconditionaly (so minus frrm, minus beq) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's done by following reasons:
- by the optimization guide, https://riscv-optimization-guide.riseproject.dev/#_controlling_rounding_behavior_scalar.
- aarch64 apply the similar optimization.
Please also check discussion at: #18758
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the links, however there are no performance claims in that discussion, only few "maybes" for "some hardware", can we check on existing h/w ( c910, u74) ? with jmh test doing dummy jni calls ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggesion.
c910 might be suitable for this kind test (u74 is in order, so not?), but I don't have the hardware now.
If anyone happens to have it and can help test it that would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@VladimirKempik https://riscv-optimization-guide.riseproject.dev/#_controlling_rounding_behavior_scalar is written by multiple industry player who know how their hardware is going to behave, and it is confirmed that it will be more performant to have this frrm
+beq
than always doing fsrm
. If you disagree with the wording in that guide, you're welcome to open a discussion on https://gitlab.com/riseproject/riscv-optimization-guide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see there is no performance degradation on the current generation of hardware. That's a great motivator to get this change in with the branch as we already know that the branch is going to bring a performance improvement on next/other generations of hardware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even when showing middle finger to branch predictor:
unsigned char prbarray[512];
static int setupme()
{
srand(time(NULL));
for(int i = 0; i < 512; i++)
{
int r = rand()%15;
prbarray[i]= (r > 10) ? (r-10) : 0;
}
return 1;
}
EXPORT void func() {
static int setup = 0;
static unsigned long counter = 0;
if(!setup) setup = setupme();
int x = prbarray[counter++%512];
asm
("csrw fcsr,%0\n\t"
:
: "r" (x)
);
}
results are still equal between branchless and branchful versions
I'm ok with this change, buts lets wait for yet another opinion: do we accept such "future proof" changes or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion on this. This code will only be enabled by the user on command line in case we are calling some buggy external libraries which may corrupt the FP control register. It won't make a difference for the most normal cases. My local tests show that frrm
is 3x faster than fsrmi
on sifive/u74. So it does make sense to have the branch check on platforms like this. It should not be a big issue to other more advanced platforms with a good BranchPredictor.
Suggesion: beqz(tmp, skip_fsrmi); // Only reset FRM if it's wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggesion: beqz(tmp, skip_fsrmi); // Only reset FRM if it's wrong
fixed, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks everyone for testing and discussion!
Thanks @luhenry @VladimirKempik @RealFYang for your reviewing! |
Hi,
Can you help to review this patch?
As discussed at: #18758 (review), we'd better to do it.
Similar thing is done on aarch64, https://bugs.openjdk.org/browse/JDK-8320892
Thanks
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18785/head:pull/18785
$ git checkout pull/18785
Update a local copy of the PR:
$ git checkout pull/18785
$ git pull https://git.openjdk.org/jdk.git pull/18785/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 18785
View PR using the GUI difftool:
$ git pr show -t 18785
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18785.diff
Webrev
Link to Webrev Comment