8299546: C2: MulLNode::mul_ring() wrongly returns bottom type due to casting errors with large numbers #11907

chhagedorn · 2023-01-09T16:19:46Z

The current logic in MulLNode::mul_ring() casts all jlong values of the involved type ranges of a multiplication to double in order to catch overflows when multiplying the two type ranges. This works fine for values in the jlong range that are not larger than 2⁵³ or lower than -2⁵³. Numbers outside that range can suffer precision errors because a double, with its 52-bit mantissa, cannot represent all of them exactly. For example, 2⁵³ and 2⁵³ + 1 both have the same double representation of 2⁵³.

In MulLNode::mul_ring(), we could do a multiplication with a lo or hi value of a type that is larger than 2⁵³ (or smaller than -2⁵³). In this case, we might get a different result compared to doing the same multiplication with jlong values (even though there is no overflow/underflow). As a result, we return TypeLong::LONG (bottom type) and miss an optimization opportunity (e.g. folding an If when using the MulL node in the condition).
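The precision problem described above can be reproduced in a few lines. This is a minimal sketch, not the original mul_ring() code: the helper names `wrapping_mul` and `double_check_reports_overflow` are made up for illustration, and the behavior shown assumes an IEEE 754 double with round-to-nearest-even conversion.

```cpp
#include <cassert>
#include <cstdint>

// Two's-complement wrapping multiply (hypothetical stand-in for a
// java_multiply-style helper; unsigned arithmetic avoids signed-overflow UB).
inline int64_t wrapping_mul(int64_t a, int64_t b) {
  return static_cast<int64_t>(static_cast<uint64_t>(a) * static_cast<uint64_t>(b));
}

// Double-based check in the spirit of the old logic: report an "overflow"
// whenever the product computed in doubles differs from the 64-bit product.
inline bool double_check_reports_overflow(int64_t a, int64_t b) {
  return static_cast<double>(wrapping_mul(a, b)) !=
         static_cast<double>(a) * static_cast<double>(b);
}

inline void example_false_overflow() {
  const int64_t two_53 = INT64_C(1) << 53;
  // 2^53 and 2^53 + 1 collapse to the same double.
  assert(static_cast<double>(two_53) == static_cast<double>(two_53 + 1));
  // (2^53 + 1) * 3 fits easily in 64 bits, yet the double-based check
  // reports an overflow because both sides were rounded differently.
  assert(double_check_reports_overflow(two_53 + 1, 3));
  // Small operands are represented exactly and are unaffected.
  assert(!double_check_reports_overflow(1000, 3));
}
```

With operands this large, the two rounded values differ by 4 even though the exact 64-bit product is perfectly representable as a jlong.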

This was caught by the new verification code added to CCP in JDK-8257197 which checks that after CCP, we should not get a different type anymore when calling Value() on a node. In the found fuzzer testcase, we run into the precision problem described above for a MulL node and set the type to bottom during CCP (even though there was no actual overflow). Since the type is bottom, we do not re-add the node to the CCP worklist because the premise is that types only go from top to bottom during CCP. Afterwards, an input type of the MulL node is updated again in such a way that the previously imprecise double multiplication in mul_ring() is now exact (by coincidence). We then hit the "missed optimization opportunity" assert added by JDK-8257197.

To fix this problem, I suggest switching from a jlong -> double multiplication overflow check to an overflow check without casting. I've used the idea that x = a * b is the same as b = x / a (for a != 0 and !(a == -1 && b == MIN_VALUE)), which is also applied in Math.multiplyExact():

jdk/src/java.base/share/classes/java/lang/Math.java

Lines 1022 to 1036 in 66db0bb

    public static long multiplyExact(long x, long y) {
        long r = x * y;
        long ax = Math.abs(x);
        long ay = Math.abs(y);
        if (((ax | ay) >>> 31 != 0)) {
            // Some bits greater than 2^31 that might cause overflow
            // Check the result using the divide operator
            // and check for the special case of Long.MIN_VALUE * -1
            if (((y != 0) && (r / y != x)) ||
                (x == Long.MIN_VALUE && y == -1)) {
                throw new ArithmeticException("long overflow");
            }
        }
        return r;
    }

The code of MulLNode::mul_ring() is almost identical to MulINode::mul_ring(). I've refactored that into a template class in order to share the code and simplified the overflow checking by using MIN/MAX4 instead of using nested if/else statements.
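The shared-template idea can be sketched roughly as follows. This is an illustration only, not the code from the patch: `Range`, `wrapping_mul`, `does_overflow` (here taking the precomputed product) and `mul_range` are hypothetical names, and the conversion from unsigned back to signed assumes two's-complement wrapping, which is implementation-defined before C++20.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

template <typename T>
struct Range { T lo; T hi; };

// Wrapping multiply for int32_t/int64_t via the matching unsigned type.
template <typename T>
T wrapping_mul(T a, T b) {
  typedef typename std::make_unsigned<T>::type U;
  return static_cast<T>(static_cast<U>(a) * static_cast<U>(b));
}

// Division-based overflow check: if a * b wrapped, product / b != a.
// MIN_VALUE * -1 is handled separately because it wraps back to MIN_VALUE.
template <typename T>
bool does_overflow(T a, T b, T product) {
  if (a == 0 || b == 0) return false;
  const T min_value = std::numeric_limits<T>::min();
  if ((a == -1 && b == min_value) || (b == -1 && a == min_value)) return true;
  return product / b != a;
}

// Multiply two ranges: take min/max of the four cross products, or fall back
// to the full range ("bottom") as soon as one cross product overflows.
template <typename T>
Range<T> mul_range(Range<T> r0, Range<T> r1) {
  const T a[4] = {r0.lo, r0.lo, r0.hi, r0.hi};
  const T b[4] = {r1.lo, r1.hi, r1.lo, r1.hi};
  T p[4];
  for (int i = 0; i < 4; i++) {
    p[i] = wrapping_mul(a[i], b[i]);
    if (does_overflow(a[i], b[i], p[i])) {
      return Range<T>{std::numeric_limits<T>::min(), std::numeric_limits<T>::max()};
    }
  }
  return Range<T>{*std::min_element(p, p + 4), *std::max_element(p, p + 4)};
}

inline void example_mul_range() {
  // [2, 3] * [-4, 5]: cross products are -8, 10, -12, 15.
  const Range<int64_t> r = mul_range(Range<int64_t>{2, 3}, Range<int64_t>{-4, 5});
  assert(r.lo == -12 && r.hi == 15);
}
```

The same template serves both widths, e.g. `mul_range(Range<int32_t>{...}, ...)` for the int case and `Range<int64_t>` for the long case.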

Thanks,
Christian

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8299546: C2: MulLNode::mul_ring() wrongly returns bottom type due to casting errors with large numbers

Reviewers

Igor Veresov (@veresov - Reviewer) ⚠️ Review applies to 0453789f
Vladimir Kozlov (@vnkozlov - Reviewer)
Quan Anh Mai (@merykitty - Committer) ⚠️ Review applies to 0453789f

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/11907/head:pull/11907
$ git checkout pull/11907

Update a local copy of the PR:
$ git checkout pull/11907
$ git pull https://git.openjdk.org/jdk pull/11907/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 11907

View PR using the GUI difftool:
$ git pr show -t 11907

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11907.diff

bridgekeeper · 2023-01-09T16:21:02Z

👋 Welcome back chagedorn! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2023-01-09T16:24:40Z

@chhagedorn The following label will be automatically applied to this pull request:

hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2023-01-09T16:28:48Z

Webrevs

vnkozlov · 2023-01-09T20:19:49Z

src/hotspot/share/opto/mulnode.cpp

+  // x / a does not underflow/overflow since |x / a| <= |x|. If a * b does underflow/overflow, then we'll get a different
+  // result compared to a * b. Special case MIN_VALUE * -1 whose result is MIN_VALUE.
+  bool does_overflow(const NativeType a, const NativeType b) const {
+    NativeType x = java_multiply(a, b);


Should we check for 0 values and return false immediately, before all calculations?
Also, maybe pass the result of java_multiply() to does_overflow(). Otherwise you execute it twice.

I've updated my patch with your suggestion.

openjdk · 2023-01-09T22:42:22Z

@chhagedorn This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8299546: C2: MulLNode::mul_ring() wrongly returns bottom type due to casting errors with large numbers

Reviewed-by: iveresov, kvn, qamai

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 15 new commits pushed to the master branch:

5685107: 8302491: NoClassDefFoundError omits the original cause of an error
671a452: 8303606: Memory leaks in Arguments::parse_each_vm_init_arg
a95bc7a: 8294974: Convert jdk.jshell/jdk.jshell.execution.LocalExecutionControl to use the Classfile API to instrument classes
f835aaa: 8300727: java/awt/List/ListGarbageCollectionTest/AwtListGarbageCollectionTest.java failed with "List wasn't garbage collected"
466ffeb: 8303965: java.net.http.HttpClient should reset the stream if response headers contain malformed header fields
431e702: 8303213: Avoid AtomicReference in TextComponentPrintable
4cf4c59: 8303824: Parallel: Use more strict card table API
8e41bf2: 8303238: Create generalizations for existing LShift ideal transforms
805a4e6: 8303883: Confusing parameter name in G1UpdateRemSetTrackingBeforeRebuild::distribute_marked_bytes
25e7ac2: 8294966: Convert jdk.jartool/sun.tools.jar.FingerPrint to use the ClassFile API to parse JAR entries
... and 5 more: https://git.openjdk.org/jdk/compare/d20bde29f2c0162ea62b42d0b618566cf5d9678a...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

merykitty · 2023-01-10T02:57:55Z

I would suggest we do a 128-bit multiplication or a 64-bit high multiplication instead. This will help constant folding of MulHiNodes and also MulLNode even if the multiplication overflows. Also, a high multiplication is likely to be cheaper than 4 divisions.

Thanks.

chhagedorn · 2023-01-10T14:16:52Z

I would suggest we do a 128-bit multiplication or a 64-bit high multiplication instead. This will help constant folding of MulHiNodes and also MulLNode even if the multiplication overflows. Also, a high multiplication is likely to be cheaper than 4 divisions.

Thanks for your suggestion. I'm not sure if I understand this correctly, though. How would this help for MulHiNode? And if we have an overflow, I'm not sure how we can get a better type than bottom for the Mul node since we cannot have multiple ranges for a type. Could you explain your idea in more detail?

But if we are just talking about the implementation, how could we do a 128-bit multiplication inside the VM? Using __int128_t (I haven't seen any usage of that though - I guess it is not supported)? However, I think we should not worry too much about the performance given that these are only 4 divisions and we are in the context of C2 compiling a method.

merykitty · 2023-01-10T14:47:03Z

@chhagedorn I think we can do similar to what our core library is doing

jdk/src/java.base/share/classes/java/lang/Math.java

Line 1399 in 8b0133f

public static long multiplyHigh(long x, long y) {

This helps in the sense that we currently do not have a 64-bit high multiplication inside the VM, so having one makes adding a more effective MulHiNode::Value trivial. Regarding MulLNode::Value, even if the multiplication overflows, I think we can still reason about the value of the product if all the high parts are the same. The same goes for MulINode::Value, where we can cast the values to jlong instead of double.
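A signed high multiplication along the lines of Math.multiplyHigh() could look roughly like the sketch below. It is an illustration under stated assumptions, not code from the patch: the names `multiply_high_signed_sketch` and `product_fits_in_64_bits` are hypothetical, right shifts of negative values are assumed to sign-extend, and the low partial product is computed in unsigned arithmetic because it can exceed the signed 64-bit range.

```cpp
#include <cassert>
#include <cstdint>

// High 64 bits of the signed 128-bit product x * y, using 32-bit halves.
inline int64_t multiply_high_signed_sketch(int64_t x, int64_t y) {
  const int64_t  x1 = x >> 32;  // signed high half (assumes arithmetic shift)
  const uint64_t x2 = static_cast<uint64_t>(x) & UINT64_C(0xFFFFFFFF);
  const int64_t  y1 = y >> 32;
  const uint64_t y2 = static_cast<uint64_t>(y) & UINT64_C(0xFFFFFFFF);

  const uint64_t z2 = x2 * y2;  // unsigned: may be larger than INT64_MAX
  const int64_t  t  = x1 * static_cast<int64_t>(y2) + static_cast<int64_t>(z2 >> 32);
  int64_t z1 = t & INT64_C(0xFFFFFFFF);
  const int64_t z0 = t >> 32;
  z1 += static_cast<int64_t>(x2) * y1;
  return x1 * y1 + z0 + (z1 >> 32);
}

// The product fits in 64 bits exactly when the high word is nothing but the
// sign extension of the low word.
inline bool product_fits_in_64_bits(int64_t x, int64_t y) {
  const int64_t hi = multiply_high_signed_sketch(x, y);
  const int64_t lo = static_cast<int64_t>(static_cast<uint64_t>(x) * static_cast<uint64_t>(y));
  return hi == (lo >> 63);
}

inline void example_multiply_high() {
  assert(multiply_high_signed_sketch(INT64_MIN, INT64_MIN) == (INT64_C(1) << 62));  // 2^126
  assert(multiply_high_signed_sketch(-1, 1) == -1);
  assert(!product_fits_in_64_bits(INT64_MAX, 2));
  assert(product_fits_in_64_bits((INT64_C(1) << 53) + 1, 3));
}
```

Unlike the division-based check, this needs no special case for MIN_VALUE * -1: the high word is 0 while the low word is negative, so the comparison fails on its own.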

Thanks.

chhagedorn · 2023-01-10T16:40:39Z

I think I understand now what you mean, thanks for the explanation. That's a very interesting idea. I will try it out and get back with an updated patch proposal and some tests where this can be beneficial.

Thanks,
Christian

rose00 · 2023-01-10T16:54:40Z

Today's processors support 64-to-128-bit multiply in just a few cycles. This is a useful operation here and elsewhere. We should bite the bullet and arrange to make it available in HotSpot. It would make this particular problem a little simpler to solve.

Here is a good external example of it being done well in fast, portable code:

https://github.com/Cyan4973/xxHash/blob/8e5fdcbe70687573265b7154515567ee7ca0645c/xxh3.h#L294

Here is an example (not to be followed literally) of code that has recently worked well for me on both x86 and ARM:

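#include <cstdint>
#include <cstdlib>

// __int128 is a GCC/Clang extension. The unsigned variant is used here so that
// the full 64x64 product cannot overflow a signed 128-bit value.
typedef unsigned __int128 uint128_t;
inline uint128_t make_uint128(uint64_t hi, uint64_t lo) {
  uint128_t hi128 = hi, lo128 = lo;
  if ((hi128 | lo128) >> 64)  abort();  // please no sign extension bugs
  return (hi128 << 64) | lo128;
}
// Split a 128-bit value into its high and low 64-bit halves.
inline void take_uint128(uint64_t& hi_out, uint64_t& lo_out, uint128_t x) {
  hi_out = (uint64_t)(x >> 64);
  lo_out = (uint64_t)(x >> 0);
}
// Unsigned 64x64 -> 128-bit multiply with two outputs.
inline void full_multiply(uint64_t& hi_out, uint64_t& lo_out, uint64_t x, uint64_t y) {
  take_uint128(hi_out, lo_out, make_uint128(0, x) * make_uint128(0, y));
}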

There should be both signed and unsigned versions of this or a similar primitive.

I suggest making the two-output thing (full-multiply, signed or unsigned) to be the primitive for HotSpot, not the mul-hi intrinsic that Java has (high-half of full-multiply, signed or unsigned). Java tilts strongly towards one-output primitives because we don't have Project Valhalla yet, but C++ has no such restriction.

chhagedorn · 2023-01-11T13:44:28Z

That would indeed make things easier and faster. We would probably also need such a portable solution to provide 128-bit integers for all architectures with a struct that stores the lo and high part. Would be great to get these in at some point, so I'd suggest to file an RFE for it.

For this bug here, I'm not sure if we should wait until the 128-bit integer support makes it in. I think it is okay to spend some more cycles (compared to using 128-bit integers) during the compilation when using @merykitty's proposal with a high multiplication for now. We could still come back to this code again and update it with 128-bit integers later. What do you think?
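A portable two-output multiply with a struct holding the lo and hi parts, as mentioned above, might be sketched like this. Everything here is hypothetical (`U128`, `full_multiply_unsigned` are made-up names, not HotSpot APIs); it only shows the unsigned variant, built from four 32x32 partial products so that no compiler extension is needed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit result type: high and low 64-bit halves.
struct U128 {
  uint64_t hi;
  uint64_t lo;
};

// Unsigned 64x64 -> 128-bit multiply using only 64-bit arithmetic.
inline U128 full_multiply_unsigned(uint64_t x, uint64_t y) {
  const uint64_t mask = UINT64_C(0xFFFFFFFF);
  const uint64_t x0 = x & mask, x1 = x >> 32;
  const uint64_t y0 = y & mask, y1 = y >> 32;

  const uint64_t p00 = x0 * y0;  // contributes to bits 0..63
  const uint64_t p01 = x0 * y1;  // contributes to bits 32..95
  const uint64_t p10 = x1 * y0;  // contributes to bits 32..95
  const uint64_t p11 = x1 * y1;  // contributes to bits 64..127

  // Sum of the three terms that land in bits 32..63; at most 3 * (2^32 - 1),
  // so it cannot overflow. Its upper part is the carry into the high word.
  const uint64_t mid = (p00 >> 32) + (p01 & mask) + (p10 & mask);

  U128 result;
  result.lo = (mid << 32) | (p00 & mask);
  result.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
  return result;
}

inline void example_full_multiply() {
  // (2^64 - 1)^2 = 2^128 - 2^65 + 1
  const U128 r = full_multiply_unsigned(UINT64_MAX, UINT64_MAX);
  assert(r.hi == UINT64_C(0xFFFFFFFFFFFFFFFE));
  assert(r.lo == 1);
}
```

On compilers that provide a native 128-bit integer, the body could instead delegate to it, keeping the struct as the common interface.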

vnkozlov

Looks good.

chhagedorn · 2023-01-20T15:37:17Z

I've updated the algorithm to implement @merykitty's idea to get a more precise type if all the cross products overflowed/underflowed the same number of times. I've included a lot of tests to make sure it works properly. For the implementation, I added multiply_high_signed() to globalDefinitions.hpp. I had first tried to make it work with an unsigned version, but that somehow did not work, as some tests kept failing. I've left the unsigned multiply_high_unsigned() in globalDefinitions.hpp for future use. But I could also remove it again as it is currently unused.
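The "same number of overflows" idea is easiest to see in the 32-bit case, where the exact product fits in 64 bits. The sketch below is an illustration with hypothetical names (`IntRange`, `overflow_count`, `mul_range_same_overflow`), not the patch itself. It writes each exact cross product as k * 2^32 + wrapped, with wrapped in [-2^31, 2^31). If all four corners share the same k, every product in between lies in the same window, so the wrapped corner values still bound the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct IntRange { int32_t lo; int32_t hi; };

// Returns k such that exact(a * b) == k * 2^32 + wrapped, and stores the
// wrapped 32-bit product. The narrowing conversion assumes two's complement.
inline int64_t overflow_count(int32_t a, int32_t b, int32_t& wrapped) {
  const int64_t exact = static_cast<int64_t>(a) * static_cast<int64_t>(b);
  wrapped = static_cast<int32_t>(static_cast<uint32_t>(exact));
  return (exact - wrapped) / (INT64_C(1) << 32);  // exact division
}

inline IntRange mul_range_same_overflow(IntRange r0, IntRange r1) {
  const int32_t a[4] = {r0.lo, r0.lo, r0.hi, r0.hi};
  const int32_t b[4] = {r1.lo, r1.hi, r1.lo, r1.hi};
  int32_t w[4];
  int64_t k[4];
  for (int i = 0; i < 4; i++) {
    k[i] = overflow_count(a[i], b[i], w[i]);
  }
  if (k[0] != k[1] || k[0] != k[2] || k[0] != k[3]) {
    return IntRange{INT32_MIN, INT32_MAX};  // inconsistent wrapping: bottom
  }
  return IntRange{*std::min_element(w, w + 4), *std::max_element(w, w + 4)};
}

inline void example_same_overflow() {
  // [2^30, 2^30 + 1] * [4, 4]: both products wrap exactly once -> [0, 4].
  const IntRange r = mul_range_same_overflow(IntRange{1 << 30, (1 << 30) + 1}, IntRange{4, 4});
  assert(r.lo == 0 && r.hi == 4);
  // [2^29, 2^30] * [2, 2]: 2^30 does not wrap but 2^31 does -> bottom.
  const IntRange b = mul_range_same_overflow(IntRange{1 << 29, 1 << 30}, IntRange{2, 2});
  assert(b.lo == INT32_MIN && b.hi == INT32_MAX);
}
```

For 64-bit operands the same count can be derived from a high multiplication, as hi plus the top bit of the low word, instead of from a wider integer type.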

As the C++ standard does not define the behavior of shifting a negative signed integer to the right and leaves it up to the compiler to decide, I've used a portable version. I think most compilers will use an arithmetic shift, but can we be sure about that? There is this comment in the code that assumes that this is always the case:

jdk/src/hotspot/share/utilities/globalDefinitions.hpp

Lines 1230 to 1232 in b2d3622

    // For signed shift right, assume C++ implementation >> sign extends.
    JAVA_INTEGER_SHIFT_OP(>>, java_shift_right, jint, jint)
    JAVA_INTEGER_SHIFT_OP(>>, java_shift_right, jlong, jlong)

Thanks,
Christian

vnkozlov

Latest update looks reasonable.

chhagedorn · 2023-02-09T07:01:16Z

Thanks Vladimir for reviewing it again!

veresov · 2023-02-09T14:46:31Z

Looks good to me too.

chhagedorn · 2023-02-10T08:23:08Z

Thanks Igor for reviewing it again!

chhagedorn · 2023-02-13T07:03:39Z

@merykitty As you've initially suggested this improved approach, do you agree with the updated version?

merykitty

Otherwise looks good to me, thanks a lot

merykitty · 2023-02-13T16:40:03Z

src/hotspot/share/utilities/globalDefinitions.hpp

+// Right shifts with signed integers are compiler implementation specific according to the C++ standard.
+// Use a portable version instead.
+inline int64_t shift_right_arithmetic(int64_t value, uint8_t shift_amount) {
+  return value < 0 ? (int64_t)(~(~(uint64_t)value >> shift_amount)) : (int64_t)((uint64_t)value >> shift_amount);


I believe we have java_shift_right already; it assumes sign extension, however.

https://github.com/openjdk/jdk/blob/master/src/hotspot/share/utilities/globalDefinitions.hpp#L1240

Yes, I've seen that - was just not too sure about using it as the C++ standard leaves it up to the compiler to decide (most compilers will probably just sign extend). However, I see that we have already been using java_shift_right. So it might be fine to just use that as well.

What are your thoughts about that @vnkozlov?

We need @kbarrett's opinion on the C++ code for this case.
I prefer to use already defined functions if they work to avoid duplication.

@kimbarrett

Looks like we've initially pinged the wrong Kim Barrett :-)

C++14 5.8/3 In the description of "E1 >> E2" it says "If E1 has a signed type
and a negative value, the resulting value is implementation-defined."

However, C++20 7.6.7/3 further defines integral arithmetic, as part of
requiring two's-complement behavior.
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r3.html
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1236r1.html
The corresponding C++20 text is "Right-shift on signed integral types is an
arithmetic right shift, which performs sign-extension."

As discussed in the two's complement proposal, all known modern C++ compilers
already behave that way. And it is unlikely any would go off and do something
different now, with C++20 tightening things up.

So I think relying on sign extension by right shift is fine.

That comment quoted earlier from globalDefinitions.hpp could be expanded to include the above analysis.

Thanks a lot Kim for your input and the detailed comments! I've included it as a comment at the java_shift_right() method and updated the code to directly use java_shift_right() instead of shift_right_arithmetic().
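The equivalence discussed in this thread can be checked with a small sketch. The portable function below mirrors the expression from the review diff. The comparison helper `agrees_with_builtin_shift` is a hypothetical name, and the check only demonstrates behavior on compilers whose `>>` sign-extends, which the discussion above expects of all known modern compilers.

```cpp
#include <cassert>
#include <cstdint>

// Portable arithmetic right shift: for negative values, shift the bitwise
// complement logically and complement again, so the vacated bits become 1s.
inline int64_t shift_right_arithmetic_portable(int64_t value, uint8_t shift_amount) {
  return value < 0
      ? static_cast<int64_t>(~(~static_cast<uint64_t>(value) >> shift_amount))
      : static_cast<int64_t>(static_cast<uint64_t>(value) >> shift_amount);
}

// True if the built-in >> on a signed value matches the portable version.
inline bool agrees_with_builtin_shift(int64_t value, uint8_t shift_amount) {
  return shift_right_arithmetic_portable(value, shift_amount) == (value >> shift_amount);
}

inline void example_shift() {
  assert(shift_right_arithmetic_portable(-8, 1) == -4);
  assert(shift_right_arithmetic_portable(INT64_MIN, 63) == -1);
  assert(agrees_with_builtin_shift(-8, 1));
  assert(agrees_with_builtin_shift(12345, 3));
}
```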

chhagedorn · 2023-02-13T18:57:45Z

Thank you Quan for reviewing it again!

vnkozlov

Good.

chhagedorn · 2023-03-14T14:56:05Z

Thanks Vladimir for approving it again. I've re-run some testing with latest master which looked good.

chhagedorn · 2023-03-14T14:56:13Z

/integrate

openjdk · 2023-03-14T14:57:57Z

Going to push as commit c466cdf.
Since your change was applied there have been 26 commits pushed to the master branch:

55aa122: 8304059: Use InstanceKlass in dependencies
ec1eb00: 8303415: Add VM_Version::is_intrinsic_supported(id)
31680b2: 8303410: Remove ContentSigner APIs and jarsigner -altsigner and -altsignerpath options
0cc0f06: 8304015: G1: Metaspace-induced GCs should not trigger maximal compaction
43eca1d: 8303910: jdk/classfile/CorpusTest.java failed 1 of 6754 tests
b6d70f2: 8303973: Library detection in runtime/ErrorHandling/TestDwarf.java fails on ppc64le RHEL8.5 for libpthread-2.28.so
2bb990e: 8301244: Tidy up compiler specific warnings files
c073ef2: 8303482: Update LCMS to 2.15
49181b8: 8303955: RISC-V: Factor out the tmp parameter from copy_memory and copy_memory_v
7bbc5e0: 8300517: Refactor VisibleMemberTable (method members)
... and 16 more: https://git.openjdk.org/jdk/compare/d20bde29f2c0162ea62b42d0b618566cf5d9678a...master

Your commit was automatically rebased without conflicts.

openjdk · 2023-03-14T14:58:24Z

@chhagedorn Pushed as commit c466cdf.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

8299546: C2: MulLNode::mul_ring() wrongly returns bottom type due to casting errors with large numbers
393dcdf

openjdk bot added the rfr (Pull request is ready for review) label Jan 9, 2023

openjdk bot added the hotspot-compiler (hotspot-compiler-dev@openjdk.org) label Jan 9, 2023

vnkozlov reviewed Jan 9, 2023

veresov approved these changes Jan 9, 2023

openjdk bot added the ready (Pull request is ready to be integrated) label Jan 9, 2023

review
0d438ca

vnkozlov approved these changes Jan 13, 2023

chhagedorn added 2 commits January 18, 2023 13:23

Merge branch 'master' into JDK-8299546
7c71687

Change algorithm to handle overflows/underflows if the cross products have the same number of overflows/underflows as suggested by Quan
0453789

vnkozlov approved these changes Feb 8, 2023

veresov approved these changes Feb 9, 2023

merykitty approved these changes Feb 13, 2023

Merge branch 'master' into JDK-8299546
18428a6

chhagedorn added 3 commits March 13, 2023 15:43

Merge branch 'master' into JDK-8299546
8458395

Use java_shift_right instead of portable version with updated comment from Kim
dc6896e

Fix build failures on Mac
391260d

vnkozlov approved these changes Mar 13, 2023

openjdk bot added the integrated (Pull request has been integrated) label Mar 14, 2023

openjdk bot closed this Mar 14, 2023

openjdk bot removed the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Mar 14, 2023
