@chhagedorn
Member

@chhagedorn chhagedorn commented Jan 9, 2023

The current logic in MulLNode::mul_ring() casts all jlong values of the involved type ranges of a multiplication to double in order to catch overflows when multiplying the two type ranges. This works fine for values in the jlong range that are not larger than 2^53 or lower than -2^53. For numbers outside that range, we could experience precision errors because they cannot be represented exactly with the 52-bit mantissa of a double. For example, 2^53 and 2^53 + 1 both have the same double representation of 2^53.
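
To make the precision loss concrete, here is a minimal C++ sketch. Both helper names are hypothetical, and the second function only mirrors the general shape of a cast-to-double overflow check; it is not the HotSpot code.

```cpp
#include <cstdint>

// True if both 64-bit values are mapped to the same double.
bool same_double(int64_t a, int64_t b) {
  return static_cast<double>(a) == static_cast<double>(b);
}

// Hypothetical cast-based check: compare the wrapped 64-bit product with the
// product computed in double arithmetic. Returns true if they disagree.
bool double_check_reports_overflow(int64_t a, int64_t b) {
  // Multiply as unsigned to get a well-defined wrapping product.
  const int64_t product =
      static_cast<int64_t>(static_cast<uint64_t>(a) * static_cast<uint64_t>(b));
  return static_cast<double>(product) !=
         static_cast<double>(a) * static_cast<double>(b);
}
```

With a = 2^53 + 1 and b = 3, the exact product 3 * 2^53 + 3 fits easily into a jlong, yet this check reports an overflow: a is rounded to 2^53 before the multiplication, while the exact product is rounded to 3 * 2^53 + 4.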

In MulLNode::mul_ring(), we could do a multiplication with a lo or hi value of a type that is larger than 2^53 (or smaller than -2^53). In this case, we might get a different result compared to doing the same multiplication with jlong values (even though there is no overflow/underflow). As a result, we return TypeLong::LONG (bottom type) and miss an optimization opportunity (e.g. folding an If when using the MulL node in the condition).

This was caught by the new verification code added to CCP in JDK-8257197 which checks that after CCP, we should not get a different type anymore when calling Value() on a node. In the found fuzzer testcase, we run into the precision problem described above for a MulL node and set the type to bottom during CCP (even though there was no actual overflow). Since the type is bottom, we do not re-add the node to the CCP worklist because the premise is that types only go from top to bottom during CCP. Afterwards, an input type of the MulL node is updated again in such a way that the previously imprecise double multiplication in mul_ring() is now exact (by coincidence). We then hit the "missed optimization opportunity" assert added by JDK-8257197.

To fix this problem, I suggest switching from a jlong -> double multiplication overflow check to an overflow check without casting. I've used the idea that x = a * b is the same as b = x / a (for a != 0 and !(a == -1 && b == MIN_VALUE)), which is also applied in Math.multiplyExact():

public static long multiplyExact(long x, long y) {
    long r = x * y;
    long ax = Math.abs(x);
    long ay = Math.abs(y);
    if (((ax | ay) >>> 31 != 0)) {
        // Some bits greater than 2^31 that might cause overflow
        // Check the result using the divide operator
        // and check for the special case of Long.MIN_VALUE * -1
        if (((y != 0) && (r / y != x)) ||
            (x == Long.MIN_VALUE && y == -1)) {
            throw new ArithmeticException("long overflow");
        }
    }
    return r;
}
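
The same division-based idea can be sketched in C++ for 64-bit values. This is an illustration under my own naming (multiply_overflows is hypothetical), not the code from the patch.

```cpp
#include <cstdint>

// Returns true if a * b does not fit into int64_t.
bool multiply_overflows(int64_t a, int64_t b) {
  if (a == 0 || b == 0) {
    return false;  // a zero factor can never overflow
  }
  // MIN_VALUE * -1 wraps back to MIN_VALUE, and MIN_VALUE / -1 would itself
  // overflow, so handle this case before dividing.
  if ((a == -1 && b == INT64_MIN) || (b == -1 && a == INT64_MIN)) {
    return true;
  }
  // Wrapping product, computed in unsigned arithmetic to avoid undefined behavior.
  const int64_t x =
      static_cast<int64_t>(static_cast<uint64_t>(a) * static_cast<uint64_t>(b));
  // Without overflow, x / a gives back b exactly. With overflow, x differs from
  // the true product by a multiple of 2^64, so the quotient cannot equal b.
  return x / a != b;
}
```

Unlike a check based on double, this stays exact for operands beyond 2^53: multiply_overflows(2^53 + 1, 3) is false.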

The code of MulLNode::mul_ring() is almost identical to MulINode::mul_ring(). I've refactored that into a template class in order to share the code and simplified the overflow checking by using MIN/MAX4 instead of using nested if/else statements.
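
A rough sketch of how such shared code could look, assuming a template over the native integer type. Range, wrapping_mul, mul_overflows and mul_range are hypothetical names, and std::min/std::max over an initializer list stand in for the MIN4/MAX4 macros.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <type_traits>

template <typename T>
struct Range { T lo; T hi; };

// Wrapping multiply via the unsigned counterpart (signed overflow is undefined).
template <typename T>
T wrapping_mul(T a, T b) {
  using U = typename std::make_unsigned<T>::type;
  return static_cast<T>(static_cast<U>(a) * static_cast<U>(b));
}

// Division-based overflow check, shared by the 32-bit and 64-bit case.
template <typename T>
bool mul_overflows(T a, T b) {
  const T min_value = std::numeric_limits<T>::min();
  if (a == 0 || b == 0) return false;
  if ((a == -1 && b == min_value) || (b == -1 && a == min_value)) return true;
  return wrapping_mul(a, b) / a != b;
}

// Returns false (think "bottom type") if any cross product overflows;
// otherwise stores the min and max of the four cross products in out.
template <typename T>
bool mul_range(Range<T> l, Range<T> r, Range<T>& out) {
  if (mul_overflows(l.lo, r.lo) || mul_overflows(l.lo, r.hi) ||
      mul_overflows(l.hi, r.lo) || mul_overflows(l.hi, r.hi)) {
    return false;
  }
  const T p0 = wrapping_mul(l.lo, r.lo);
  const T p1 = wrapping_mul(l.lo, r.hi);
  const T p2 = wrapping_mul(l.hi, r.lo);
  const T p3 = wrapping_mul(l.hi, r.hi);
  out.lo = std::min({p0, p1, p2, p3});
  out.hi = std::max({p0, p1, p2, p3});
  return true;
}
```

Taking the minimum and maximum over all four corners covers every sign combination of the two input ranges, which is what replaces the nested if/else cases.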

Thanks,
Christian


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8299546: C2: MulLNode::mul_ring() wrongly returns bottom type due to casting errors with large numbers

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/11907/head:pull/11907
$ git checkout pull/11907

Update a local copy of the PR:
$ git checkout pull/11907
$ git pull https://git.openjdk.org/jdk pull/11907/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 11907

View PR using the GUI difftool:
$ git pr show -t 11907

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11907.diff

@bridgekeeper

bridgekeeper bot commented Jan 9, 2023

👋 Welcome back chagedorn! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jan 9, 2023
@openjdk

openjdk bot commented Jan 9, 2023

@chhagedorn The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Jan 9, 2023
@mlbridge

mlbridge bot commented Jan 9, 2023

Webrevs

// x / a does not underflow/overflow since |x / a| <= |x|. If a * b does underflow/overflow, then we'll get a different
// result compared to a * b. Special case MIN_VALUE * -1 whose result is MIN_VALUE.
bool does_overflow(const NativeType a, const NativeType b) const {
  NativeType x = java_multiply(a, b);
Contributor

Should we check for 0 values and return false immediately, before all calculations?
Also, maybe pass the result of java_multiply() to does_overflow(). Otherwise you execute it twice.

Member Author

I've updated my patch with your suggestion.

@openjdk

openjdk bot commented Jan 9, 2023

@chhagedorn This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8299546: C2: MulLNode::mul_ring() wrongly returns bottom type due to casting errors with large numbers

Reviewed-by: iveresov, kvn, qamai

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 15 new commits pushed to the master branch:

  • 5685107: 8302491: NoClassDefFoundError omits the original cause of an error
  • 671a452: 8303606: Memory leaks in Arguments::parse_each_vm_init_arg
  • a95bc7a: 8294974: Convert jdk.jshell/jdk.jshell.execution.LocalExecutionControl to use the Classfile API to instrument classes
  • f835aaa: 8300727: java/awt/List/ListGarbageCollectionTest/AwtListGarbageCollectionTest.java failed with "List wasn't garbage collected"
  • 466ffeb: 8303965: java.net.http.HttpClient should reset the stream if response headers contain malformed header fields
  • 431e702: 8303213: Avoid AtomicReference in TextComponentPrintable
  • 4cf4c59: 8303824: Parallel: Use more strict card table API
  • 8e41bf2: 8303238: Create generalizations for existing LShift ideal transforms
  • 805a4e6: 8303883: Confusing parameter name in G1UpdateRemSetTrackingBeforeRebuild::distribute_marked_bytes
  • 25e7ac2: 8294966: Convert jdk.jartool/sun.tools.jar.FingerPrint to use the ClassFile API to parse JAR entries
  • ... and 5 more: https://git.openjdk.org/jdk/compare/d20bde29f2c0162ea62b42d0b618566cf5d9678a...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 9, 2023
@merykitty
Member

I would suggest we do a 128-bit multiplication or a 64-bit high multiplication instead. This will help constant folding of MulHiNodes and also MulLNode even if the multiplication overflows. Also, a high multiplication is likely to be cheaper than 4 divisions.

Thanks.

@chhagedorn
Member Author

> I would suggest we do a 128-bit multiplication or a 64-bit high multiplication instead. This will help constant folding of MulHiNodes and also MulLNode even if the multiplication overflows. Also, a high multiplication is likely to be cheaper than 4 divisions.

Thanks for your suggestion. I'm not sure if I understand this correctly, though. How would this help for MulHiNode? And if we have an overflow, I'm not sure how we can get a better type than bottom for the Mul node since we cannot have multiple ranges for a type. Could you explain your idea in more detail?

But if we are just talking about the implementation, how could we do a 128-bit multiplication inside the VM? Using __int128_t (I haven't seen any usage of that though - I guess it is not supported)? However, I think we should not worry too much about the performance given that these are only 4 divisions and we are in the context of C2 compiling a method.

@merykitty
Member

@chhagedorn I think we can do similar to what our core library is doing

public static long multiplyHigh(long x, long y) {

This helps in the sense that we currently do not have a 64-bit high multiplication inside the VM, so having one makes adding a more effective MulHiNode::Value trivial. Regarding MulLNode::Value, even if the multiplication overflows, I think we can still reason about the value of the product if all the high parts are the same. The same goes for MulINode::Value, where we can cast the operands to jlong instead of double.
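
For illustration, a 64-bit high multiplication can be written in plain C++ without 128-bit types. This is a sketch with hypothetical names (multiply_high_u64, multiply_high_s64, product_overflows), not the implementation that ended up in the VM. It assumes that converting an out-of-range uint64_t to int64_t wraps modulo 2^64, which is implementation-defined before C++20 but is what mainstream compilers do.

```cpp
#include <cstdint>

// High 64 bits of the unsigned 128-bit product, built from 32-bit halves.
uint64_t multiply_high_u64(uint64_t a, uint64_t b) {
  const uint64_t mask = 0xFFFFFFFFu;
  const uint64_t a_lo = a & mask, a_hi = a >> 32;
  const uint64_t b_lo = b & mask, b_hi = b >> 32;
  const uint64_t lo_lo = a_lo * b_lo;
  const uint64_t hi_lo = a_hi * b_lo;
  const uint64_t lo_hi = a_lo * b_hi;
  const uint64_t hi_hi = a_hi * b_hi;
  // Sum of the middle terms; this cannot exceed 2^64 - 1.
  const uint64_t cross = (lo_lo >> 32) + (hi_lo & mask) + lo_hi;
  return hi_hi + (hi_lo >> 32) + (cross >> 32);
}

// High 64 bits of the signed 128-bit product. A negative operand, read as
// unsigned, is 2^64 too large, so subtract the other operand once for each.
int64_t multiply_high_s64(int64_t x, int64_t y) {
  uint64_t hi = multiply_high_u64(static_cast<uint64_t>(x), static_cast<uint64_t>(y));
  if (x < 0) hi -= static_cast<uint64_t>(y);
  if (y < 0) hi -= static_cast<uint64_t>(x);
  return static_cast<int64_t>(hi);
}

// The 64-bit product is exact only if the high half is the sign extension
// of the low half.
bool product_overflows(int64_t x, int64_t y) {
  const int64_t lo =
      static_cast<int64_t>(static_cast<uint64_t>(x) * static_cast<uint64_t>(y));
  const int64_t hi = multiply_high_s64(x, y);
  return hi != (lo < 0 ? -1 : 0);
}
```

The high half carries more information than a yes/no overflow answer, which is what makes it possible to compare how far different cross products went out of range.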

Thanks.

@chhagedorn
Member Author

I think I understand now what you mean, thanks for the explanation. That's a very interesting idea. I will try it out and get back with an updated patch proposal and some tests where this can be beneficial.

Thanks,
Christian

@rose00
Contributor

rose00 commented Jan 10, 2023

Today's processors support 64-to-128-bit multiply in just a few cycles. This is a useful operation here and elsewhere. We should bite the bullet and arrange to make it available in HotSpot. It would make this particular problem a little simpler to solve.

Here is a good external example of it being done well in fast, portable code:

https://github.com/Cyan4973/xxHash/blob/8e5fdcbe70687573265b7154515567ee7ca0645c/xxh3.h#L294

Here is an example (not to be followed literally) of code that has recently worked well for me on both x86 and ARM:

#include <cstdint>
#include <cstdlib>

// unsigned __int128 is a GCC/Clang extension; a signed __int128 could overflow in the multiply below
typedef unsigned __int128 uint128_t;
inline uint128_t make_uint128(uint64_t hi, uint64_t lo) {
  uint128_t hi128 = hi, lo128 = lo;
  if ((hi128 | lo128) >> 64)  abort();  // please no sign extension bugs
  return (hi128 << 64) | lo128;
}
inline void take_uint128(uint64_t& hi_out, uint64_t& lo_out, uint128_t x) {
  hi_out = (uint64_t)(x >> 64);
  lo_out = (uint64_t)(x >> 0);
}
inline void full_multiply(uint64_t& hi_out, uint64_t& lo_out, uint64_t x, uint64_t y) {
  take_uint128(hi_out, lo_out, make_uint128(0, x) * make_uint128(0, y));
}

There should be both signed and unsigned versions of this or a similar primitive.

I suggest making the two-output thing (full-multiply, signed or unsigned) to be the primitive for HotSpot, not the mul-hi intrinsic that Java has (high-half of full-multiply, signed or unsigned). Java tilts strongly towards one-output primitives because we don't have Project Valhalla yet, but C++ has no such restriction.

@chhagedorn
Member Author

That would indeed make things easier and faster. We would probably also need such a portable solution to provide 128-bit integers for all architectures with a struct that stores the lo and high part. Would be great to get these in at some point, so I'd suggest filing an RFE for it.

For this bug here, I'm not sure if we should wait until the 128-bit integer support makes it in. I think it is okay to spend some more cycles (compared to using 128-bit integers) during the compilation when using @merykitty's proposal with a high multiplication for now. We could still come back to this code again and update it with 128-bit integers later. What do you think?

Contributor

@vnkozlov vnkozlov left a comment

Looks good.

@chhagedorn
Member Author

I've updated the algorithm to implement @merykitty's idea to get a more precise type if all the cross products overflowed/underflowed the same number of times. I've included a lot of tests to make sure it works properly. For the implementation, I added multiply_high_signed() to globalDefinitions.hpp. I had first tried to make it work with an unsigned version, but that somehow did not work as some tests kept failing. I've left the unsigned multiply_high_unsigned() in globalDefinitions.hpp for future use. But I could also remove it again as it is currently unused.
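
The "same number of overflows" idea is easiest to see in the 32-bit case, where the exact products fit into 64 bits. The following is a sketch of the reasoning only, with hypothetical names (IntRange, wrap_count, mul_int_ranges); it is not the code from the patch.

```cpp
#include <algorithm>
#include <cstdint>

struct IntRange {
  bool precise;  // false means "give up and use the full int range"
  int32_t lo;
  int32_t hi;
};

// floor(p / 2^32), written without >> on a possibly negative value.
static int64_t floor_div_2_32(int64_t p) {
  const int64_t d = INT64_C(1) << 32;
  int64_t q = p / d;
  if (p % d < 0) --q;
  return q;
}

// How often the exact product p wraps around the 32-bit range:
// p = wrapped + k * 2^32 with wrapped in [-2^31, 2^31).
static int64_t wrap_count(int64_t p) {
  return floor_div_2_32(p + (INT64_C(1) << 31));
}

IntRange mul_int_ranges(int32_t lo0, int32_t hi0, int32_t lo1, int32_t hi1) {
  // Exact cross products; 32-bit operands cannot overflow 64 bits.
  const int64_t p[4] = {int64_t(lo0) * lo1, int64_t(lo0) * hi1,
                        int64_t(hi0) * lo1, int64_t(hi0) * hi1};
  const int64_t k = wrap_count(p[0]);
  for (int i = 1; i < 4; i++) {
    if (wrap_count(p[i]) != k) return {false, INT32_MIN, INT32_MAX};
  }
  // Every product in the rectangle lies between the smallest and largest
  // corner, so all of them wrap k times and the wrapped values keep their order.
  const int64_t shift = k * (INT64_C(1) << 32);
  const int64_t min_p = *std::min_element(p, p + 4);
  const int64_t max_p = *std::max_element(p, p + 4);
  return {true, int32_t(min_p - shift), int32_t(max_p - shift)};
}
```

For example, [2^30, 2^30 + 1] times [2, 2] overflows once at both ends, so the result is still a tight range starting at INT32_MIN, whereas [2^30 - 1, 2^30] times [2, 2] mixes zero and one overflow and has to fall back to the full range.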

As the C++ standard does not define the behavior of shifting a negative signed integer to the right and leaves it up to the compiler to decide, I've used a portable version. I think most compilers will use an arithmetic shift, but can we be sure about that? There is this comment in the code that assumes that this is always the case:

// For signed shift right, assume C++ implementation >> sign extends.
JAVA_INTEGER_SHIFT_OP(>>, java_shift_right, jint, jint)
JAVA_INTEGER_SHIFT_OP(>>, java_shift_right, jlong, jlong)

Thanks,
Christian

Contributor

@vnkozlov vnkozlov left a comment

Latest update looks reasonable.

@chhagedorn
Member Author

Thanks Vladimir for reviewing it again!

@veresov
Contributor

veresov commented Feb 9, 2023

Looks good to me too.

@chhagedorn
Member Author

Thanks Igor for reviewing it again!

@chhagedorn
Member Author

@merykitty As you've initially suggested this improved approach, do you agree with the updated version?

Member

@merykitty merykitty left a comment

Otherwise looks good to me, thanks a lot

// Right shifts with signed integers are compiler implementation specific according to the C++ standard.
// Use a portable version instead.
inline int64_t shift_right_arithmetic(int64_t value, uint8_t shift_amount) {
  return value < 0 ? (int64_t)(~(~(uint64_t)value >> shift_amount)) : (int64_t)((uint64_t)value >> shift_amount);
Member

I believe we have java_shift_right already; it assumes sign extension, however.

https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/globalDefinitions.hpp#L1240

Member Author

Yes, I've seen that - was just not too sure about using it as the C++ standard leaves it up to the compiler to decide (most compilers will probably just sign extend). However, I see that we have already been using java_shift_right. So it might be fine to just use that as well.

Member Author

What are your thoughts about that @vnkozlov?

Contributor

We need @kbarrett opinion on C++ code for this case.
I prefer to use already defined functions if they work to avoid duplication.

Member Author

@kimbarrett

Looks like we've initially pinged the wrong Kim Barrett :-)

C++14 5.8/3 In the description of "E1 >> E2" it says "If E1 has a signed type
and a negative value, the resulting value is implementation-defined."

However, C++20 7.6.7/3 further defines integral arithmetic, as part of
requiring two's-complement behavior.
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r3.html
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1236r1.html
The corresponding C++20 text is "Right-shift on signed integral types is an
arithmetic right shift, which performs sign-extension."

As discussed in the two's complement proposal, all known modern C++ compilers
already behave that way. And it is unlikely any would go off and do something
different now, with C++20 tightening things up.

So I think relying on sign extension by right shift is fine.
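
As a small sanity check of that conclusion, the built-in shift can be compared against a version that never shifts a negative signed value. Both function names are hypothetical. Masking the shift count to 0..63 mirrors Java's semantics for long shifts and is my assumption here, not something taken from the patch.

```cpp
#include <cstdint>

// Reference version: only unsigned values are shifted. The final conversion
// back to int64_t is assumed to wrap modulo 2^64 (guaranteed since C++20).
int64_t shift_right_reference(int64_t value, unsigned shift) {
  shift &= 63;
  const uint64_t bits = static_cast<uint64_t>(value);
  return value < 0 ? static_cast<int64_t>(~(~bits >> shift))
                   : static_cast<int64_t>(bits >> shift);
}

// Relies on >> sign-extending negative values: implementation-defined before
// C++20, required behavior from C++20 on.
int64_t shift_right_builtin(int64_t value, unsigned shift) {
  return value >> (shift & 63);
}
```

On a compiler that sign-extends, the two functions agree for every input, so the simpler built-in form is sufficient.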

That comment quoted earlier from globalDefinitions.hpp could be expanded to include the above analysis.

Member Author

Thanks a lot Kim for your input and the detailed comments! I've included it as a comment at the java_shift_right() method and updated the code to directly use java_shift_right() instead of shift_right_arithmetic().

@chhagedorn
Member Author

Thank you Quan for reviewing it again!

Contributor

@vnkozlov vnkozlov left a comment

Good.

@chhagedorn
Member Author

Thanks Vladimir for approving it again. I've re-run some testing with latest master which looked good.

@chhagedorn
Member Author

/integrate

@openjdk

openjdk bot commented Mar 14, 2023

Going to push as commit c466cdf.
Since your change was applied there have been 26 commits pushed to the master branch:

  • 55aa122: 8304059: Use InstanceKlass in dependencies
  • ec1eb00: 8303415: Add VM_Version::is_intrinsic_supported(id)
  • 31680b2: 8303410: Remove ContentSigner APIs and jarsigner -altsigner and -altsignerpath options
  • 0cc0f06: 8304015: G1: Metaspace-induced GCs should not trigger maximal compaction
  • 43eca1d: 8303910: jdk/classfile/CorpusTest.java failed 1 of 6754 tests
  • b6d70f2: 8303973: Library detection in runtime/ErrorHandling/TestDwarf.java fails on ppc64le RHEL8.5 for libpthread-2.28.so
  • 2bb990e: 8301244: Tidy up compiler specific warnings files
  • c073ef2: 8303482: Update LCMS to 2.15
  • 49181b8: 8303955: RISC-V: Factor out the tmp parameter from copy_memory and copy_memory_v
  • 7bbc5e0: 8300517: Refactor VisibleMemberTable (method members)
  • ... and 16 more: https://git.openjdk.org/jdk/compare/d20bde29f2c0162ea62b42d0b618566cf5d9678a...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Mar 14, 2023
@openjdk openjdk bot closed this Mar 14, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Mar 14, 2023
@openjdk

openjdk bot commented Mar 14, 2023

@chhagedorn Pushed as commit c466cdf.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

6 participants