
8330275: Crash in XMark::follow_array #18941

Closed
wants to merge 8 commits into from


ashu-mehra
Contributor

@ashu-mehra commented Apr 24, 2024

This PR addresses an issue in ZGC where the number of address offset bits can exceed the limit imposed by the mark stack encoding scheme, causing the encoding to fail.
Encoding a partial array offset in the mark stack requires that the max address bit be no more than 46. However, the current mechanism for probing the maximum address offset bits on aarch64, riscv and ppc can return a value larger than 44 bits. This patch caps the maximum address offset bits at 44.
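To see where a 44-bit ceiling can come from, here is a minimal sketch of packing a partial array offset into a 64-bit mark stack entry. It assumes a 32-bit offset field that stores the offset pre-shifted right by a 12-bit granule shift (standing in for MarkPartialArrayMinSizeShift). The field width, the shift value and all function names are assumptions for illustration and may not match the real XMarkStackEntry layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: the upper 32 bits of the entry hold the partial array offset.
constexpr size_t kOffsetFieldBits = 32;
// Assumption: stands in for MarkPartialArrayMinSizeShift (4K granule).
constexpr size_t kMinSizeShift = 12;

// Largest number of address offset bits the field can hold losslessly.
constexpr size_t max_encodable_offset_bits() {
  return kOffsetFieldBits + kMinSizeShift;  // 32 + 12 = 44
}

// Hypothetical encoder: drop the granule bits, then move into the upper half.
// Bits pushed past bit 63 are silently discarded (unsigned shift).
constexpr uint64_t encode_offset(uint64_t offset) {
  return (offset >> kMinSizeShift) << (64 - kOffsetFieldBits);
}

// Hypothetical decoder: the inverse of encode_offset for offsets that fit.
constexpr uint64_t decode_offset(uint64_t entry) {
  return (entry >> (64 - kOffsetFieldBits)) << kMinSizeShift;
}

constexpr bool round_trips(uint64_t offset) {
  return decode_offset(encode_offset(offset)) == offset;
}

static_assert(max_encodable_offset_bits() == 44, "32 + 12 bits");
// The highest granule-aligned offset below 2^44 survives the round trip...
static_assert(round_trips((uint64_t(1) << 44) - (uint64_t(1) << kMinSizeShift)), "fits");
// ...but a 45-bit offset loses its top bit and decodes to 0.
static_assert(!round_trips(uint64_t(1) << 44), "overflows the field");
```

Under these assumptions, any configuration that hands out 45 or more address offset bits produces offsets that silently decode to the wrong value, which is the kind of corruption that surfaces later as a crash while following an array.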

I have updated generational mode so that it no longer subtracts 3 bits from the maximum address offset bits probed from the system, since generational mode does not use multi-mapping.

I have also updated the code to set MarkPartialArrayMinSizeShift dynamically, depending on the number of address offset bits in use. This would prevent the same problem from recurring if the maximum number of address offset bits is ever increased beyond 44.
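The dynamic-shift idea in the paragraph above can be sketched as follows, again assuming a 32-bit offset field: choose the smallest shift that lets the configured number of address offset bits fit, never going below the 12-bit default. The constants and the function name are hypothetical and not taken from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

constexpr size_t kOffsetFieldBits = 32;      // assumed width of the offset field
constexpr size_t kDefaultMinSizeShift = 12;  // assumed default shift (4K granule)

// Hypothetical helper: smallest shift that makes 'address_offset_bits' fit
// into the offset field, but never smaller than the default shift.
constexpr size_t partial_array_min_size_shift(size_t address_offset_bits) {
  return std::max(kDefaultMinSizeShift,
                  address_offset_bits > kOffsetFieldBits
                      ? address_offset_bits - kOffsetFieldBits
                      : size_t(0));
}

static_assert(partial_array_min_size_shift(42) == 12, "default shift suffices");
static_assert(partial_array_min_size_shift(44) == 12, "exactly at the limit");
static_assert(partial_array_min_size_shift(45) == 13, "one extra bit needs a larger shift");
```

The trade-off in this sketch is that a larger shift forces partial array offsets onto a coarser 2^shift granule, so the minimum partial array chunk grows along with the address space.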

For some reason (which I can't work out from the code), the existing implementation for probing the max addressable bit on ppc in non-generational ZGC is very different from the other platforms, and from generational mode as well. I have kept the existing implementation as is and only fixed it so that it does not return a value greater than 44 bits.

Testing: test/hotspot/jtreg/gc/z and test/hotspot/jtreg/gc/x on x86; tier1, tier2 and tier3 on aarch64, using a fastdebug build with the options JTREG="EXTRA_PROBLEM_LISTS=ProblemList-zgc.txt;JAVA_OPTIONS=-XX:+UseZGC -XX:+ZVerifyOops;JOBS=4" (as suggested in JDK-8330275)

Update: Struck out the changes that are no longer relevant now that the PR only does a point fix for aarch64.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18941/head:pull/18941
$ git checkout pull/18941

Update a local copy of the PR:
$ git checkout pull/18941
$ git pull https://git.openjdk.org/jdk.git pull/18941/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 18941

View PR using the GUI difftool:
$ git pr show -t 18941

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18941.diff

Webrev

Link to Webrev Comment

Signed-off-by: Ashutosh Mehra <asmehra@redhat.com>
@bridgekeeper

bridgekeeper bot commented Apr 24, 2024

👋 Welcome back asmehra! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Apr 24, 2024

@ashu-mehra This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8330275: Crash in XMark::follow_array

Reviewed-by: stefank, stuefe

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 275 new commits pushed to the master branch:

  • ad78b7f: 8331185: Enable compiler memory limits in debug builds
  • aafa15f: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails
  • edd47c1: 8308033: The jcmd thread dump related tests should test virtual threads
  • 1aebab7: 8320995: RISC-V: C2 PopCountVI
  • 0eff492: 8330278: Have SSLSocketTemplate.doClientSide use loopback address
  • c6f611c: 8331798: Remove unused arg of checkErgonomics() in TestMaxHeapSizeTools.java
  • 0e1dca7: 8331715: Remove virtual specifiers in ContiguousSpace
  • 7f29904: 8330005: RandomGeneratorFactory.getDefault() throws exception when the runtime image only has java.base module
  • 2baacfc: 8331789: ubsan: deoptimization.cpp:403:29: runtime error: load of value 208, which is not a valid value for type 'bool'
  • 7b79426: 8278353: Provide Duke as default favicon in Simple Web Server
  • ... and 265 more: https://git.openjdk.org/jdk/compare/064628471b83616b4463baa78618d1b7a66d0c7c...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@ashu-mehra ashu-mehra marked this pull request as draft April 24, 2024 20:24
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 24, 2024
@openjdk

openjdk bot commented Apr 24, 2024

@ashu-mehra The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot hotspot-dev@openjdk.org and removed rfr Pull request is ready for review labels Apr 24, 2024
Signed-off-by: Ashutosh Mehra <asmehra@redhat.com>
@ashu-mehra ashu-mehra marked this pull request as ready for review April 24, 2024 20:43
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 24, 2024
@mlbridge

mlbridge bot commented Apr 24, 2024

Webrevs

@ashu-mehra
Contributor Author

I am currently trying to get access to an aarch64 system to run the tests test/hotspot/jtreg/gc/z and test/hotspot/jtreg/gc/x.
I would appreciate it if someone could also test the ppc and riscv changes, as I don't have access to such systems.

@ashu-mehra
Contributor Author

/label add hotspot-gc

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Apr 25, 2024
@openjdk

openjdk bot commented Apr 25, 2024

@ashu-mehra
The hotspot-gc label was successfully added.

Member

@stefank stefank left a comment

Hi @ashu-mehra,

Thanks for fixing this issue.

There are a number of style changes I would like to make so that the code is more in line with the rest of the ZGC code. But before we start on that, I would like to request that we skip the changes to the marking stack code and limit the changes to the probing code only. Doing so will make it easier to get this fix reviewed and delivered.

Signed-off-by: Ashutosh Mehra <asmehra@redhat.com>
Signed-off-by: Ashutosh Mehra <asmehra@redhat.com>
@ashu-mehra
Contributor Author

@stefank I am trying to understand the reason behind your suggestion to remove the changes in marking stack code. Are they not correct or is it that they don't belong to this PR?
Anyway I have removed them from this PR.

@stefank
Member

stefank commented Apr 26, 2024

To me, it was not bleeding obvious that they were the right thing to do, and given other changes that don't follow the established ZGC coding style, I wanted to suggest a way forward for you to get this bug fixed with less resistance from us ZGC developers/maintainers. That was the reasoning.

@tstuefe
Member

tstuefe commented Apr 26, 2024

I agree with Stefan. I would keep the patch as minimal as possible to make it easier to follow the actual error that has been fixed, and to make it easier for backporters to decide what to downport.

Code cleanups can happen in a separate RFE.

Ashu, are the other platforms actually broken? If yes, which ones? If a platform is not broken, I would defer touching it up to a separate cleanup RFE. Again because of patch clarity.

@stefank
Member

stefank commented Apr 26, 2024

So, the absolute minimal point-fix would be to change the value 47 to 46, which would be very easy to backport, right?
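A quick way to see why lowering the default from 47 to 46 is enough is to redo the bit arithmetic. This sketch assumes that the non-generational code turns the probed max address bit into a bit count (+1) and then reserves 3 bits for multi-mapping (the subtraction mentioned in the PR description), and that the mark stack encoding tolerates at most 44 offset bits; the helper name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr size_t kMultiMappingBits = 3;         // assumed bits reserved for multi-mapping
constexpr size_t kMaxEncodableOffsetBits = 44;  // limit stated in the PR description

// Hypothetical helper: bit position -> bit count -> usable offset bits.
constexpr size_t offset_bits_for(size_t max_address_bit) {
  return max_address_bit + 1 - kMultiMappingBits;
}

constexpr size_t OLD_DEFAULT_MAX_ADDRESS_BIT = 47;
constexpr size_t NEW_DEFAULT_MAX_ADDRESS_BIT = 46;

static_assert(offset_bits_for(OLD_DEFAULT_MAX_ADDRESS_BIT) == 45, "one bit too many");
static_assert(offset_bits_for(NEW_DEFAULT_MAX_ADDRESS_BIT) == kMaxEncodableOffsetBits, "fits exactly");
```

If the probe never returns a bit above the default (an assumption about how the probing loop is bounded), changing this one constant caps the derived offset bits at 44.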

If we still want to make the change that is currently in the PR, I would like to tweak the code along the lines of what I have in my branch here:
master...stefank:jdk:pr_18941

The extra patch:

  • Moves the global constants to the file I think they belong in
  • Moves all the probe bit handling into ZPlatformAddressOffsetBits
  • Extracts some of the "bit-to-bits" calculations into intermediate constants

The last two points were done so that (at least for me) it is easier to see and understand why the various pluses and minuses are performed.

I didn't touch the PPC code, since it's quite different and I don't want to risk messing it up.

@ashu-mehra
Contributor Author

I agree from the point of view of backporting, point-fix is all we need in this PR.

@tstuefe As for the other platforms (riscv and ppc), looking at their code, they seem to be broken in the same way as aarch64, but the problem only happens if the user runs with a heap larger than 1TB on a system with more than 48 addressable bits.
Again, in the spirit of "do not touch it if it is not broken", I am fine with restricting the change to just aarch64.

@tstuefe @stefank please let me know if you agree with just doing the point-fix to aarch64.
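The "heap larger than 1TB" threshold can be reproduced with a little arithmetic. This sketch assumes that non-generational ZGC sizes the address offset as the max heap size times a virtual-to-physical ratio of 16, rounds that up to a power of two, and clamps the resulting bit count between a platform minimum and maximum; the ratio, the clamp bounds used below and all names are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint64_t kVirtualToPhysicalRatio = 16;  // assumption

// Bits needed to address round_up_power_of_2(value); valid for value <= 2^63.
constexpr size_t bits_for(uint64_t value) {
  size_t bits = 0;
  while ((uint64_t(1) << bits) < value) {
    bits++;
  }
  return bits;
}

// Hypothetical helper: heap-size-driven offset bits, clamped to platform limits.
constexpr size_t address_offset_bits(uint64_t max_heap_size, size_t min_bits, size_t max_bits) {
  return std::clamp(bits_for(max_heap_size * kVirtualToPhysicalRatio), min_bits, max_bits);
}

constexpr uint64_t TB = uint64_t(1) << 40;

// Assumed: with 48 addressable bits the platform maximum works out to 45 offset bits.
static_assert(address_offset_bits(1 * TB, 42, 45) == 44, "1TB * 16 = 2^44, still encodable");
static_assert(address_offset_bits(2 * TB, 42, 45) == 45, "2TB * 16 = 2^45, too many bits");
```

Under these assumptions, small heaps never reach the platform maximum, which would explain why the bug stays hidden unless both the heap and the address space are large.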

@tstuefe
Member

tstuefe commented Apr 27, 2024

Absolutely. We can do any platform testing on other platforms and cleanups in subsequent RFEs.

Signed-off-by: Ashutosh Mehra <asmehra@redhat.com>
Signed-off-by: Ashutosh Mehra <asmehra@redhat.com>
@ashu-mehra
Contributor Author

Sorry for the long absence on this PR. I have updated the PR to just do a point fix for aarch64. I have also done tier1, tier2 and tier3 tests on aarch64.

Member

@stefank stefank left a comment

Looks good.

FWIW, the 128TB and 64GB numbers are just confusing when we are talking about a bit position. If the 46th bit succeeds, the usable address range is 128TB, and the 46th bit accounts for 64TB of those 128TB. I wouldn't mind at all if we just ripped out the mentions of 64TB and 64GB here.
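The distinction between a bit position and a size can be checked with compile-time arithmetic: if bit N is the highest usable address bit, addresses span 2^(N+1) bytes, while bit N on its own contributes 2^N bytes. The helper names below are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t GB = uint64_t(1) << 30;
constexpr uint64_t TB = uint64_t(1) << 40;

// Size of the address range when 'bit' is the highest usable address bit.
constexpr uint64_t range_for_max_bit(unsigned bit) { return uint64_t(1) << (bit + 1); }

// Contribution of 'bit' on its own.
constexpr uint64_t value_of_bit(unsigned bit) { return uint64_t(1) << bit; }

static_assert(range_for_max_bit(46) == 128 * TB, "max bit 46 -> 128TB of address space");
static_assert(value_of_bit(46) == 64 * TB, "bit 46 alone accounts for 64TB");
static_assert(value_of_bit(47) == 128 * TB, "2^47 is the 128TB in the old comment");
```

Attaching a size to a bit constant is ambiguous between these two readings, which is a reasonable argument for dropping the numbers from the comments.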

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 8, 2024
Signed-off-by: Ashutosh Mehra <asmehra@redhat.com>
@ashu-mehra
Contributor Author

I wouldn't mind at all if we just ripped out the mentions of 64TB and 64GB here.

Done

@ashu-mehra
Contributor Author

@tstuefe does this look ok?

Member

@tstuefe tstuefe left a comment

Looks good, question inline.

@@ -142,9 +142,9 @@
// * 63-48 Fixed (16-bits, always zero)
//

// Default value if probing is not implemented for a certain platform: 128TB
static const size_t DEFAULT_MAX_ADDRESS_BIT = 47;
// Minimum value returned, if probing fails: 64GB
Member

Any reason you removed the comment for MINIMUM_MAX_ADDRESS_BIT?

Contributor Author

Oh! I think I misunderstood stefank's suggestion. I should have only removed the 64GB and 128TB values mentioned in the comments. Let me restore the rest.

…l numbers that can be confusing

Signed-off-by: Ashutosh Mehra <asmehra@redhat.com>
@ashu-mehra
Contributor Author

As the last commit is a trivial change to add the comments back, I am not requesting a new review and am integrating it as is.

@ashu-mehra
Contributor Author

/integrate

@openjdk

openjdk bot commented May 8, 2024

Going to push as commit 42b1d85.
Since your change was applied there have been 277 commits pushed to the master branch:

  • 230fac8: 8331941: Make CollectedHeap::parallel_object_iterator public
  • c845261: 8331924: Parallel: Remove unused MutableSpace::mangle_unused_area_complete
  • ad78b7f: 8331185: Enable compiler memory limits in debug builds
  • aafa15f: 8331208: Memory stress test that checks OutOfMemoryError stack trace fails
  • edd47c1: 8308033: The jcmd thread dump related tests should test virtual threads
  • 1aebab7: 8320995: RISC-V: C2 PopCountVI
  • 0eff492: 8330278: Have SSLSocketTemplate.doClientSide use loopback address
  • c6f611c: 8331798: Remove unused arg of checkErgonomics() in TestMaxHeapSizeTools.java
  • 0e1dca7: 8331715: Remove virtual specifiers in ContiguousSpace
  • 7f29904: 8330005: RandomGeneratorFactory.getDefault() throws exception when the runtime image only has java.base module
  • ... and 267 more: https://git.openjdk.org/jdk/compare/064628471b83616b4463baa78618d1b7a66d0c7c...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 8, 2024
@openjdk openjdk bot closed this May 8, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 8, 2024
@openjdk

openjdk bot commented May 8, 2024

@ashu-mehra Pushed as commit 42b1d85.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

// Minimum value returned, if probing fails: 64GB
// Default value if probing is not implemented for a certain platform
// Max address bit is restricted by implicit assumptions in the code, for instance
// the bit layout of XForwardingEntry or Partial array entry (see XMarkStackEntry) in mark stack
Member

This comment was copy-n-pasted without updating the names to ZForwardingEntry and ZMarkStackEntry.

@ashu-mehra
Contributor Author

/backport jdk21u-dev

@openjdk

openjdk bot commented May 28, 2024

@ashu-mehra the backport was successfully created on the branch backport-ashu-mehra-42b1d858 in my personal fork of openjdk/jdk21u-dev. To create a pull request with this backport targeting openjdk/jdk21u-dev:master, just click the following link:

➡️ Create pull request

The title of the pull request is automatically filled in correctly and below you find a suggestion for the pull request body:

Hi all,

This pull request contains a backport of commit 42b1d858 from the openjdk/jdk repository.

The commit being backported was authored by Ashutosh Mehra on 8 May 2024 and was reviewed by Stefan Karlsson and Thomas Stuefe.

Thanks!

If you need to update the source branch of the pull request, then run the following commands in a local clone of your personal fork of openjdk/jdk21u-dev:

$ git fetch https://github.com/openjdk-bots/jdk21u-dev.git backport-ashu-mehra-42b1d858:backport-ashu-mehra-42b1d858
$ git checkout backport-ashu-mehra-42b1d858
# make changes
$ git add paths/to/changed/files
$ git commit --message 'Describe additional changes made'
$ git push https://github.com/openjdk-bots/jdk21u-dev.git backport-ashu-mehra-42b1d858

⚠️ @ashu-mehra You are not yet a collaborator in my fork openjdk-bots/jdk21u-dev. An invite will be sent out and you need to accept it before you can proceed.

@ashu-mehra
Contributor Author

/backport jdk22u

@openjdk

openjdk bot commented May 28, 2024

@ashu-mehra the backport was successfully created on the branch backport-ashu-mehra-42b1d858 in my personal fork of openjdk/jdk22u. To create a pull request with this backport targeting openjdk/jdk22u:master, just click the following link:

➡️ Create pull request

The title of the pull request is automatically filled in correctly and below you find a suggestion for the pull request body:

Hi all,

This pull request contains a backport of commit 42b1d858 from the openjdk/jdk repository.

The commit being backported was authored by Ashutosh Mehra on 8 May 2024 and was reviewed by Stefan Karlsson and Thomas Stuefe.

Thanks!

If you need to update the source branch of the pull request, then run the following commands in a local clone of your personal fork of openjdk/jdk22u:

$ git fetch https://github.com/openjdk-bots/jdk22u.git backport-ashu-mehra-42b1d858:backport-ashu-mehra-42b1d858
$ git checkout backport-ashu-mehra-42b1d858
# make changes
$ git add paths/to/changed/files
$ git commit --message 'Describe additional changes made'
$ git push https://github.com/openjdk-bots/jdk22u.git backport-ashu-mehra-42b1d858

⚠️ @ashu-mehra You are not yet a collaborator in my fork openjdk-bots/jdk22u. An invite will be sent out and you need to accept it before you can proceed.

Labels
hotspot hotspot-dev@openjdk.org hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated

3 participants