@dcubed-ojdk
Member

@dcubed-ojdk dcubed-ojdk commented Jan 28, 2022

A trivial fix to solve undefined behavior in src/hotspot/cpu/aarch64/immediate_aarch64.cpp:
replicate().

I was not able to reproduce the reported failure using:

Xcode: Version 13.2.1 (13C100) which includes clang Apple LLVM 13.0.0 (clang-1300.0.29.30)

so I'm moving forward with the proposed fix from a code inspection
point of view.

I've tested this fix with Mach5 Tier[1-6]. Tier1 and Tier2 have completed with
no failures. Tier[3-6] are still running.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8280476: [macOS] : hotspot arm64 bug exposed by latest clang

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7270/head:pull/7270
$ git checkout pull/7270

Update a local copy of the PR:
$ git checkout pull/7270
$ git pull https://git.openjdk.java.net/jdk pull/7270/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 7270

View PR using the GUI difftool:
$ git pr show -t 7270

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7270.diff

@bridgekeeper

bridgekeeper bot commented Jan 28, 2022

👋 Welcome back dcubed! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@dcubed-ojdk
Member Author

/label add hotspot-runtime
/label add hotspot-compiler

It would be great to hear from @theRealAph on this review thread.

@openjdk

openjdk bot commented Jan 28, 2022

@dcubed-ojdk The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot-compiler hotspot-compiler-dev@openjdk.org hotspot-runtime hotspot-runtime-dev@openjdk.org labels Jan 28, 2022
@openjdk

openjdk bot commented Jan 28, 2022

@dcubed-ojdk
The hotspot-runtime label was successfully added.

@dcubed-ojdk
Member Author

Since this fix involved undefined behavior, I'd also like to hear
from @kimbarrett on this review thread. Thanks!

@openjdk

openjdk bot commented Jan 28, 2022

@dcubed-ojdk The hotspot-compiler label was already applied.

@dcubed-ojdk dcubed-ojdk marked this pull request as ready for review January 28, 2022 16:54
@openjdk openjdk bot added the rfr Pull request is ready for review label Jan 28, 2022
@mlbridge

mlbridge bot commented Jan 28, 2022

Webrevs

Contributor

@theRealAph theRealAph left a comment

Weird. We can't ever have hit it, or chaos would have ensued.

I'm going to leave it to Andrew Dinn to review this one.

@dcubed-ojdk
Member Author

@theRealAph - Thanks for chiming in on the review thread. This is definitely
weird and I have not been able to reproduce the failure mode. The sighting came
indirectly from someone at Apple via @prrace so we haven't been able to
get better version information (yet).

I look forward to getting a review from @adinn.

// would result in undefined behavior.
if (nbits == 64) {
  assert(count <= 1, "must be");
  return bits;

For consistency with the < 64 case, shouldn't this be return (count == 0) ? 0 : bits; ?

Contributor

Probably, but that's an algorithmic question we'll need @adinn to answer.

Member Author

@kimbarrett - Thanks for the review! Good catch! I believe you are correct,
but let's wait for @adinn to chime in here.

@kimbarrett kimbarrett Feb 1, 2022

Returning 0 if count == 0 and bits otherwise is consistent with the old code,
assuming the problematic shift doesn't cause UB data corruption or something
otherwise strange. It might do the "mathematically correct" thing of shifting
out all the bits (so zeroing the value). Another possibility is that the
implemented shift amount for N is N mod word-size (some platforms do that, but
I haven't looked up aarch64), which in this case is 64 mod 64, or 0. I don't
think there are any other non-strange non-UB-data-corruption possibilities.

result is initially 0.
with the old code:
- if count == 0, there are no iterations, and final result is still 0.
- if count == 1, there is one iteration.
  result <<= nbits
    -- if UB corruption, all bets are off
    -- if shifts by 64, result is unchanged (== 0)
    -- if shifts by 64 mod 64 (== 0), result is unchanged (== 0)
  result |= (bits & mask)
    -- result == bits
- if count > 1, for additional iterations
  result <<= nbits
    -- if UB corruption, all bets are off
    -- if shifts by 64, result == 0
    -- if shifts by 64 mod 64 (== 0), result is unchanged (== bits)
  result |= (bits & mask)
    -- result == bits

So with old code, for count > 0, result == bits (or UB corrupted data).
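
The case analysis above can be checked with a small model. This is only a sketch: `ShiftModel`, `model_shl`, and `replicate_old_model` are hypothetical names, not HotSpot code, and the two modeled behaviors are just the non-corrupting outcomes listed above. The real shift by 64 is undefined behavior and is deliberately never executed here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the two "non-strange" outcomes discussed above
// for a 64-bit shift whose amount equals the word size.
enum class ShiftModel { ShiftOutAll, ModWordSize };

// Left shift that never performs an out-of-range shift itself.
static uint64_t model_shl(uint64_t value, int amount, ShiftModel model) {
  if (amount < 64) {
    return value << amount;  // well-defined for 0 <= amount < 64
  }
  // amount == 64: either all bits are shifted out, or the shift
  // amount is reduced to 64 mod 64 == 0 and the value is unchanged.
  return model == ShiftModel::ShiftOutAll ? 0 : value;
}

// Same loop shape as the old code described above, with the shift
// replaced by the model. Assumes 1 <= nbits <= 64.
static uint64_t replicate_old_model(uint64_t bits, int nbits, int count,
                                    ShiftModel model) {
  uint64_t mask = (nbits == 64) ? ~uint64_t(0)
                                : ((uint64_t(1) << nbits) - 1);
  uint64_t result = 0;
  for (int i = 0; i < count; i++) {
    result = model_shl(result, nbits, model);
    result |= (bits & mask);
  }
  return result;
}
```

Under either model, `replicate_old_model(bits, 64, count, model)` gives 0 for `count == 0` and `bits` for any `count > 0`, which matches the conclusion above.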

Contributor

First thing to note is that replicate is never actually called with count == 0. It's questionable whether it ever should be (especially as this function is really only meant to be used locally to this code) but logically I think the result ought to be 0 if it ever does get called that way.

What replicate is meant to do is replicate the bottom nbits bits of the uint64_t passed via argument bits across successive subfields of result up to (count * nbits) worth of bits (sorry ... the names are somewhat inconveniently chosen as far as this discussion is concerned). So, anyway, a count of zero ought to insert 0 bits giving a zero result.

As to how it is used, replicate is only called from method expandLogicalImmediate with two alternative calling patterns. It is sometimes called like this

for (int i = 0; i < 6; i++) {
  int nbits = 1 << i;
  . . .
    replicate(and_bit, 1, nbits)

n.b. argument count is being supplied as nbits and argument nbits is supplied as 1 (yeah, naming is hard). These calls serve to insert anywhere between 1 and 32 copies of a single bit into the rightmost portion of the result.
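
For illustration, the single-bit calling pattern can be sketched as below. `replicate_bit` is a hypothetical stand-in for `replicate(bit, 1, count)`, written only from the description above. It is not the HotSpot function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for replicate(bit, 1, count): places `count`
// copies of the lowest bit of `bit` in the rightmost `count` positions.
// Assumes 1 <= count <= 32, as in the calls described above.
static uint64_t replicate_bit(uint64_t bit, int count) {
  uint64_t result = 0;
  for (int i = 0; i < count; i++) {
    result <<= 1;         // shifting by 1 is always in range for uint64_t
    result |= (bit & 1);  // append one copy of the bit
  }
  return result;
}
```

With a set bit and a count of 32 this fills the low 32 bits with ones. With a clear bit the result stays 0.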

It is also called like this:

for (int i = 0; i < 6; i++) {
  int nbits = 1 << i;
  . . .
      replicate(and_bits_top, 2 * nbits, 32 / nbits)

This is used to replicate a bit pattern taken from the lower end of the first input across the whole of a uint64_t result. For these calls the input arguments count and nbits satisfy the following invariants:

nbits | 64
count | 64
nbits * count == 64
2 <= nbits <= 32
32 >= count >= 2
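
A minimal sketch of this second pattern, assuming exactly the invariants listed above. `replicate_field` is a hypothetical name, and the checks use the standard single-argument `assert` rather than HotSpot's two-argument form.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: tile the low `nbits` bits of `bits` across a
// full 64-bit result, under the invariants stated above.
static uint64_t replicate_field(uint64_t bits, int nbits, int count) {
  assert(2 <= nbits && nbits <= 32);
  assert(2 <= count && count <= 32);
  assert(nbits * count == 64);  // implies both values divide 64

  uint64_t mask = (uint64_t(1) << nbits) - 1;  // nbits <= 32, so in range
  uint64_t result = 0;
  for (int i = 0; i < count; i++) {
    result <<= nbits;         // nbits < 64, so the shift is well-defined
    result |= (bits & mask);  // append the next copy of the field
  }
  return result;
}
```

For example, a 4-bit field `0x3` repeated 16 times gives `0x3333333333333333`.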

I am not actually clear why the caller is calling replicate in the way it does. The algorithm is way more complicated than it actually needs to be. The basic idea is to insert imms 1s into a 2^k bit field, right rotate them by immr and then replicate the result across the full 64 bits. There's a few wrinkles which are explained here for those who are interested:

expandLogicalImmediate(uint32_t immN, uint32_t immr, uint32_t imms, uint64_t& bimm)

  • outputs a bit pattern of width 64 by replicating a bit field of width 2^k and returns 1 or fails and returns 0

  • when immN is 1 then k is 6 and immr/imms are masked to 6 bit integers.

  • when immN is 0 then k is the count of the first 0 bit in imms and immr/imms are masked to k-bit integers (i.e. leading 1s in imms determine dead bits of imms and immr)

A k value of 0 (i.e. immN == 0 and imms = all 1s) fails and returns 0

After masking

An imms value of all 1s (i.e. 2^k - 1) fails and returns 0 (note given the previous check we only get here when immN == 1)

The bit field can now be constructed as follows:

  • imms + 1 specifies the number of 1s to insert into the 2^k-field (note that imms + 1 < 2^k)
  • immr counts how far to right rotate the 2^k-field

The resulting 2^k-field is then replicated across the full 64-bit result.
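
The steps above can be sketched as a simpler decoder. This follows my reading of the Arm architecture's bitmask-immediate decoding, not the HotSpot implementation; `decode_logical_immediate` and `low_ones` are hypothetical names, and k is derived here as the highest set bit of immN:NOT(imms), which is an assumption about how the description above maps to code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: value with the low n bits set, 1 <= n <= 64.
static uint64_t low_ones(int n) {
  return n == 64 ? ~uint64_t(0) : (uint64_t(1) << n) - 1;
}

// Returns true and stores the 64-bit pattern in `out` on success,
// false for the encodings described above as failing.
static bool decode_logical_immediate(uint32_t immN, uint32_t immr,
                                     uint32_t imms, uint64_t& out) {
  // k is the index of the highest set bit of immN:NOT(imms).
  uint32_t combined = ((immN & 1) << 6) | (~imms & 0x3f);
  int k = -1;
  for (int i = 6; i >= 0; i--) {
    if (combined & (1u << i)) { k = i; break; }
  }
  if (k < 1) return false;  // no usable field width

  uint32_t field_mask = (1u << k) - 1;
  uint32_t s = imms & field_mask;  // imms masked to k bits
  uint32_t r = immr & field_mask;  // immr masked to k bits
  if (s == field_mask) return false;  // all 1s after masking

  int width = 1 << k;                  // the 2^k-bit field
  uint64_t elem = low_ones(s + 1);     // imms + 1 ones, s + 1 < width
  if (r != 0) {
    // Rotate right by r within the field; both shifts are < 64.
    elem = ((elem >> r) | (elem << (width - r))) & low_ones(width);
  }
  // Replicate the field across 64 bits by repeated doubling.
  uint64_t result = elem;
  for (int w = width; w < 64; w *= 2) {
    result |= result << w;
  }
  out = result;
  return true;
}
```

In this form the replication step never shifts by 64, so no special case is needed for the full-width field.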

Contributor

I do agree with one of your points:

replicate is never actually called with count == 0

so that means that this:

assert(count <= 1, "must be");

is a little too loose and should change to:

assert(count == 1, "must be");

if we're going to keep it at all. I think we should keep it, but
I can be convinced otherwise since the original code had no similar assert().

That is entirely reasonable given that replicate is only meant to be used locally and none of these local use cases will ever try to replicate 0 copies of a bit field.

We could actually add one or both of these more general asserts at entry to replicate since they apply to all cases, not just when nbits == 64:

assert(count > 0, "must be")
assert(count * nbits <= 64, "must be")

In other words it is an error to try to replicate zero copies of a bit field and it is an error to try to replicate more copies than can fit into 64 bits.

Note that the second assert removes the need for the current assert and combining it with the first assert also enforces your recommended alternative.

We could also add another general assert

assert(nbits > 0, "must be")

i.e. it is an error to try to replicate an empty bit field.
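
Put together, the proposed entry checks and the nbits == 64 special case might look like the sketch below. `replicate_checked` is a hypothetical name and this is not the committed code; it uses the standard single-argument `assert`, whereas HotSpot's `assert` also takes a message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of replicate with the general asserts suggested
// above plus the special case that avoids a shift by 64.
static uint64_t replicate_checked(uint64_t bits, int nbits, int count) {
  assert(count > 0);            // zero copies is an error
  assert(nbits > 0);            // an empty bit field is an error
  assert(count * nbits <= 64);  // all copies must fit in 64 bits

  if (nbits == 64) {
    // The asserts above force count == 1 here, so the answer is the
    // input itself and the out-of-range shift is never reached.
    return bits;
  }

  uint64_t mask = (uint64_t(1) << nbits) - 1;  // nbits < 64
  uint64_t result = 0;
  for (int i = 0; i < count; i++) {
    result <<= nbits;
    result |= (bits & mask);
  }
  return result;
}
```

As noted above, the third assert combined with the first implies `count == 1` whenever `nbits == 64`, so no separate assert is needed inside the special case.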

Member Author

@kimbarrett - Thanks for confirming that we have UB here.

@adinn - I'll take a look at updating the assert() calls and retest.

Contributor

First thing to note is that replicate is never actually called with count == 0.

This is crucial: it means we have no UB.

Could I ask you to take the opportunity to add a little of this commentary to the rather opaque code? Pretty please? :-)

Contributor

Could I ask you to take the opportunity to add a little of this commentary to the rather opaque code? Pretty please? :-)

Sure, this stuff really needs to be better documented in the code itself. I'll do that as a follow-up to this PR.

Member Author

First thing to note is that replicate is never actually called with count == 0.

This is crucial: it means we have no UB.

The UB occurs when count == 1 and nbits == 64.

@dcubed-ojdk
Member Author

dcubed-ojdk commented Feb 3, 2022

The v01 changes have passed Mach5 Tier[126].
Mach5 Tier[345] are still running, only windows-client tests remain.
macosx-aarch64 and linux-aarch64 testing in Tier[1-6] all passed.

@kimbarrett kimbarrett left a comment

Looks good.

@openjdk

openjdk bot commented Feb 4, 2022

@dcubed-ojdk This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8280476: [macOS] : hotspot arm64 bug exposed by latest clang

Reviewed-by: kbarrett, adinn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 94 new commits pushed to the master branch:

  • d4b99bc: 8281120: G1: Rename G1BlockOffsetTablePart::alloc_block to update_for_block
  • 66b2c3b: 8280948: [TESTBUG] Write a regression test for JDK-4659800
  • 7207f2a: Merge
  • 01f93dd: 8279385: [test] Adjust sun/security/pkcs12/KeytoolOpensslInteropTest.java after 8278344
  • 3d926dd: 8277795: ldap connection timeout not honoured under contention
  • 51b53a8: 8280913: Create a regression test for JRootPane.setDefaultButton() method
  • 46c6c6f: 8281043: Intrinsify recursive ObjectMonitor locking for PPC64
  • c936e70: 8280593: [PPC64, S390] redundant allocation of MacroAssembler in StubGenerator ctor
  • 63e11cf: 8280970: Cleanup dead code in java.security.Provider
  • e44dc63: 8271055: Crash during deoptimization with "assert(bb->is_reachable()) failed: getting result from unreachable basicblock" with -XX:+VerifyStack
  • ... and 84 more: https://git.openjdk.java.net/jdk/compare/cab590517bf705418c7376edd5d7066b13b6dde8...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 4, 2022
@dcubed-ojdk
Member Author

@adinn or @theRealAph - I need one more reviewer to approve so I can integrate this.

@dcubed-ojdk
Member Author

@adinn - Thanks for the review. As you mentioned above, I'll leave it up to
you to improve the comment in a new PR.

/integrate

@openjdk

openjdk bot commented Feb 4, 2022

Going to push as commit f5d6fdd.
Since your change was applied there have been 94 commits pushed to the master branch:

  • d4b99bc: 8281120: G1: Rename G1BlockOffsetTablePart::alloc_block to update_for_block
  • 66b2c3b: 8280948: [TESTBUG] Write a regression test for JDK-4659800
  • 7207f2a: Merge
  • 01f93dd: 8279385: [test] Adjust sun/security/pkcs12/KeytoolOpensslInteropTest.java after 8278344
  • 3d926dd: 8277795: ldap connection timeout not honoured under contention
  • 51b53a8: 8280913: Create a regression test for JRootPane.setDefaultButton() method
  • 46c6c6f: 8281043: Intrinsify recursive ObjectMonitor locking for PPC64
  • c936e70: 8280593: [PPC64, S390] redundant allocation of MacroAssembler in StubGenerator ctor
  • 63e11cf: 8280970: Cleanup dead code in java.security.Provider
  • e44dc63: 8271055: Crash during deoptimization with "assert(bb->is_reachable()) failed: getting result from unreachable basicblock" with -XX:+VerifyStack
  • ... and 84 more: https://git.openjdk.java.net/jdk/compare/cab590517bf705418c7376edd5d7066b13b6dde8...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Feb 4, 2022
@openjdk openjdk bot closed this Feb 4, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 4, 2022
@openjdk

openjdk bot commented Feb 4, 2022

@dcubed-ojdk Pushed as commit f5d6fdd.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@dcubed-ojdk dcubed-ojdk deleted the JDK-8280476 branch February 4, 2022 17:50

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated

4 participants