-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8299546: C2: MulLNode::mul_ring() wrongly returns bottom type due to casting errors with large numbers #11907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…casting errors with large numbers
|
👋 Welcome back chagedorn! A progress list of the required criteria for merging this PR into |
|
@chhagedorn The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
src/hotspot/share/opto/mulnode.cpp
Outdated
| // x / a does not underflow/overflow since |x / a| <= |x|. If a * b does underflow/overflow, then we'll get a different | ||
| // result compared to a * b. Special case MIN_VALUE * -1 whose result is MIN_VALUE. | ||
| bool does_overflow(const NativeType a, const NativeType b) const { | ||
| NativeType x = java_multiply(a, b); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check for 0 values to return false; immediately before all calculations?
Also may be path results of java_multiply() to does_overflow(). Otherwise you execute it twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated my patch with your suggestion.
|
@chhagedorn This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 15 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
|
I would suggest we do a 128-bit multiplication or a 64-bit high multiplication instead. This will help constant folding of Thanks. |
Thanks for your suggestion. I'm not sure if I understand this correctly, though. How would this help for But if we are just talking about the implementation, how could we do a 128-bit multiplication inside the VM? Using |
|
@chhagedorn I think we can do similar to what our core library is doing
This helps in the sense that we currently do not have a 64-bit high multiplication inside the VM, so having one makes adding a more effective Thanks. |
|
I think I understand now what you mean, thanks for the explanation. That's a very interesting idea. I will try it out and get back with an updated patch proposal and some tests where this can be beneficial. Thanks, |
|
Today's processors support 64-to-128-bit multiply in just a few cycles. This is a useful operation here and elsewhere. We should bite the bullet and arrange to make it available in HotSpot. It would make this particular problem a little simpler to solve. Here is a good external example of it being done well in fast, portable code: https://github.com/Cyan4973/xxHash/blob/8e5fdcbe70687573265b7154515567ee7ca0645c/xxh3.h#L294 Here is an example (not to be followed literally) of code that has recently worked well for me on both x86 and ARM: There should be both signed and unsigned versions of this or a similar primitive. I suggest making the two-output thing (full-multiply, signed or unsigned) to be the primitive for HotSpot, not the mul-hi intrinsic that Java has (high-half of full-multiply, signed or unsigned). Java tilts strongly towards one-output primitives because we don't have Project Valhalla yet, but C++ has no such restriction. |
|
That would indeed make things easier and faster. We would probably also need such a portable solution to provide 128-bit integers for all architectures with a struct that stores the lo and high part. Would be great to get these in at some point, so I'd suggest to file an RFE for it. For this bug here, I'm not sure if we should wait until the 128-bit integer support makes it in. I think it is okay to spend some more cycles (compared to using 128-bit integers) during the compilation when using @merykitty's proposal with a high multiplication for now. We could still come back to this code again and update it with 128-bit integers later. What do you think? |
vnkozlov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
… have the same number of overflows/underflows as suggested by Quan
|
I've updated the algorithm to implement @merykitty's idea to get a more precise type if all the cross products overflowed/underflowed the same number of times. I've included a lot of tests to make sure it works properly. For the implementation, I added As the C++ standard does not define the behavior of shifting a signed integer to the right and leaves it up to the compiler to decide, I've used a portable version. Even though, I think most compilers will use an arithmetic shift but can we be sure about that? There is this comment in the code that assumes that this is always the case: jdk/src/hotspot/share/utilities/globalDefinitions.hpp Lines 1230 to 1232 in b2d3622
Thanks, |
vnkozlov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest update looks reasonable.
|
Thanks Vladimir for reviewing it again! |
|
Looks good to me too. |
|
Thanks Igor for reviewing it again! |
|
@merykitty As you've initially suggested this improved approach, do you agree with the updated version? |
merykitty
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise looks good to me, thanks a lot
| // Right shifts with signed integers are compiler implementation specific according to the C++ standard. | ||
| // Use a portable version instead. | ||
| inline int64_t shift_right_arithmetic(int64_t value, uint8_t shift_amount) { | ||
| return value < 0 ? (int64_t)(~(~(uint64_t)value >> shift_amount)) : (int64_t)((uint64_t)value >> shift_amount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we have java_shift_right already, it assumes signed extension, however.
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/globalDefinitions.hpp#L1240
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I've seen that - was just not too sure about using it as the C++ standard leaves it up to the compiler to decide (most compilers will probably just sign extend). However, I see that we have already been using java_shift_right. So it might be fine to just use that as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts about that @vnkozlov?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need @kbarrett opinion on C++ code for this case.
I prefer to use already defined functions if they work to avoid duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we've initially pinged the wrong Kim Barrett :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C++14 5.8/3 In the description of "E1 >> E2" it says "If E1 has a signed type
and a negative value, the resulting value is implementation-defined."
However, C++20 7.6.7/3 further defines integral arithmetic, as part of
requiring two's-complement behavior.
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r3.html
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1236r1.html
The corresponding C++20 text is "Right-shift on signed integral types is an
arithmetic right shift, which performs sign-extension."
As discussed in the two's complement proposal, all known modern C++ compilers
already behave that way. And it is unlikely any would go off and do something
different now, with C++20 tightening things up.
So I think relying on sign extension by right shift is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That comment quoted earlier from globalDefinitions.hpp could be expanded to include the above analysis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot Kim for your input and the detailed comments! I've included it as a comment at the java_shift_right() method and updated the code to directly use java_shift_right() instead of shift_right_arithmetic().
|
Thank you Quan for reviewing it again! |
vnkozlov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good.
|
Thanks Vladimir for approving it again. I've re-run some testing with latest master which looked good. |
|
/integrate |
|
Going to push as commit c466cdf.
Your commit was automatically rebased without conflicts. |
|
@chhagedorn Pushed as commit c466cdf. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
The current logic in
MulLNode::mul_ring()casts alljlongvalues of the involved type ranges of a multiplication todoublein order to catch overflows when multiplying the two type ranges. This works fine for values in thejlongrange that are not larger than 253 or lower than -253. For numbers outside that range, we could experience precision errors because these numbers cannot be represented precisely due to the nature of how doubles are represented with a 52 bit mantissa. For example, the number 253 and 253 + 1 both have the samedoublerepresentation of 253.In
MulLNode::mul_ring(), we could do a multiplication with aloorhivalue of a type that is larger than 253 (or smaller than -253). In this case, we might get a different result compared to doing the same multiplication withjlongvalues (even though there is no overflow/underflow). As a result, we returnTypeLong::LONG(bottom type) and missed an optimization opportunity (e.g. folding anIfwhen using theMulLnode in the condition etc.).This was caught by the new verification code added to CCP in JDK-8257197 which checks that after CCP, we should not get a different type anymore when calling
Value()on a node. In the found fuzzer testcase, we run into the precision problem described above for aMulLnode and set the type to bottom during CCP (even though there was no actual overflow). Since the type is bottom, we do not re-add the node to the CCP worklist because the premise is that types only go from top to bottom during CCP. Afterwards, an input type of theMulLnode is updated again in such a way that the previously imprecisedoublemultiplication inmul_ring()is now exact (by coincidence). We then hit the "missed optimization opportunity" assert added by JDK-8257197.To fix this problem, I suggest to switch from a
jlong- >doublemultiplication overflow check to an overflow check without casting. I've used the idea thatx = a * bis the same asb = x / a(fora != 0and!(a = -1 && b = MIN_VALUE)) which is also applied inMath.multiplyExact():jdk/src/java.base/share/classes/java/lang/Math.java
Lines 1022 to 1036 in 66db0bb
The code of
MulLNode::mul_ring()is almost identical toMulINode::mul_ring(). I've refactored that into a template class in order to share the code and simplified the overflow checking by usingMIN/MAX4instead of using nestedif/elsestatements.Thanks,
Christian
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/11907/head:pull/11907$ git checkout pull/11907Update a local copy of the PR:
$ git checkout pull/11907$ git pull https://git.openjdk.org/jdk pull/11907/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 11907View PR using the GUI difftool:
$ git pr show -t 11907Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11907.diff