Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8254125: Assertion in cppVtables.cpp during builds on 32bit Windows #591

Closed

Conversation

iklam
Copy link
Member

@iklam iklam commented Oct 11, 2020

Problem: when iterating over the cloned vtables, the original code assumes that they are laid out consecutively in memory. However, since JDK-8224509, the memory allocated for each of the the cloned vtables is now 8-byte aligned. This introduces gaps between the cloned vtables, and causes the assert to fail.

Fix: the fix is to no longer assume the consecutive memory layout. Instead, use the CppVtables::_index array to access each individual cloned vtable.

Note: I also cleaned up the code significantly. I feel the original code is pretty hard to understand, so if I just do the bare minimum to fix the bug, it will be pretty hard to review.

I would suggest that the reviewers look at just the new version of the code and see if it's working as described (instead of looking at the diff to understand what the bug was and how it has been fixed).

This version still uses the x-macro CPP_VTABLE_TYPES_DO to enumerate over the classes whose vtables need to be cloned. I plan to change that into templates in a future RFE.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8254125: Assertion in cppVtables.cpp during builds on 32bit Windows

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/591/head:pull/591
$ git checkout pull/591

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 11, 2020

👋 Welcome back iklam! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Oct 11, 2020

@iklam The following labels will be automatically applied to this pull request:

  • hotspot
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added serviceability serviceability-dev@openjdk.org hotspot hotspot-dev@openjdk.org labels Oct 11, 2020
@iklam
Copy link
Member Author

iklam commented Oct 11, 2020

/label remove hotspot

@openjdk openjdk bot removed the hotspot hotspot-dev@openjdk.org label Oct 11, 2020
@openjdk
Copy link

openjdk bot commented Oct 11, 2020

@iklam
The hotspot label was successfully removed.

@iklam
Copy link
Member Author

iklam commented Oct 11, 2020

/label add hotspot-runtime

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Oct 11, 2020
@openjdk
Copy link

openjdk bot commented Oct 11, 2020

@iklam
The hotspot-runtime label was successfully added.

@iklam iklam marked this pull request as ready for review October 11, 2020 04:32
@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 11, 2020
@mlbridge
Copy link

mlbridge bot commented Oct 11, 2020

Webrevs

Copy link
Member

@shipilev shipilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me, but after testing it on x86_32, I wonder if https://bugs.openjdk.java.net/browse/JDK-8254606 is the problem with this patch, or a different issue.

@iklam
Copy link
Member Author

iklam commented Oct 13, 2020

This looks reasonable to me, but after testing it on x86_32, I wonder if https://bugs.openjdk.java.net/browse/JDK-8254606 is the problem with this patch, or a different issue.

JDK-8254606 is also caused by JDK-8224509, which has missed some cases where align_up(xxx, SharedSpaceObjectAlignment) is necessary.

Since the cppVtables assertion fix cannot be easily validated by itself, I've decided to also include the align_up fixes in this PR.

I have tested the fix on linux-x86 (32-bit) but the latest repo seems a bit flaky. Even if I disable CDS, sometime I get failures in (non-CDS) test cases like There cannot be a NullPointerException at bci 14 of method byte java.lang.String.coder().

I was able to run all 200+ CDS tests cases, and I would get about 5 failures due with the above error message. If I ran the failed tests again, they passed. So I think I have fixed the 32-bit issues with CDS, but there are probably other problems with the 32-bit build.

Note that Oracle no longer supports the 32-bit x86 VMs. So my fix is intended to ensure that the CDS code is using SharedSpaceObjectAlignment properly, and I can only test thoroughly on our supported platforms.

@shipilev
Copy link
Member

I have tested the fix on linux-x86 (32-bit) but the latest repo seems a bit flaky. Even if I disable CDS, sometime I get failures in (non-CDS) test cases like There cannot be a NullPointerException at bci 14 of method byte java.lang.String.coder().

That's https://bugs.openjdk.java.net/browse/JDK-8254611 -- I integrated the fix, please merge from master. That said, there are a few failing tests in tier1 anyway, you can ignore them:

  • gc/metaspace/TestCapacityUntilGCWrapAround.java
  • java/foreign/TestMismatch.java
  • java/util/stream/test/org/openjdk/tests/java/util/stream/SpliteratorTest.java
  • tools/javac/lambda/T8031967.java

@iklam
Copy link
Member Author

iklam commented Oct 13, 2020

I have tested the fix on linux-x86 (32-bit) but the latest repo seems a bit flaky. Even if I disable CDS, sometime I get failures in (non-CDS) test cases like There cannot be a NullPointerException at bci 14 of method byte java.lang.String.coder().

That's https://bugs.openjdk.java.net/browse/JDK-8254611 -- I integrated the fix, please merge from master.

Thanks for the info. I merged and all CDS tests passed.

Copy link
Member

@shipilev shipilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the new patch, and it passes tier1 on Linux x86_32 (well, with some known failures).

@openjdk
Copy link

openjdk bot commented Oct 13, 2020

@iklam This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8254125: Assertion in cppVtables.cpp during builds on 32bit Windows

Reviewed-by: shade, ccheung

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 1 new commit pushed to the master branch:

  • bdda205: 8254369: Node::disconnect_inputs may skip precedences

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 13, 2020
@iklam
Copy link
Member Author

iklam commented Oct 14, 2020

In the function ArchiveBuilder::gather_klass_and_symbol(MetaspaceClosure::Ref* ref, bool read_only)
Instead of BytesPerWord, is BytesPerInt more appropriate since the ref->size() returns an int?

Hi Calvin, thanks for the review.

For the line int bytes = ref->size() * BytesPerWord;, ref->size() has the same API as all the size() methods of the MetaspaceObj classes. I.e., "the number of words needed to hold this object". That's why we need to multiple by BytesPerWord to calculate the number of bytes needed.

@iklam
Copy link
Member Author

iklam commented Oct 16, 2020

/integrate

@openjdk openjdk bot closed this Oct 16, 2020
@openjdk openjdk bot added the integrated Pull request has been integrated label Oct 16, 2020
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Oct 16, 2020
@openjdk
Copy link

openjdk bot commented Oct 16, 2020

@iklam Since your change was applied there has been 1 commit pushed to the master branch:

  • bdda205: 8254369: Node::disconnect_inputs may skip precedences

Your commit was automatically rebased without conflicts.

Pushed as commit 5145bed.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated serviceability serviceability-dev@openjdk.org
3 participants