8320525: G1: G1UpdateRemSetTrackingBeforeRebuild::distribute_marked_bytes accesses partially unloaded klass#16766

Closed
tschatzl wants to merge 2 commits intoopenjdk:masterfrom
tschatzl:submit/8320525-distribute-marked-bytes

Conversation

@tschatzl
Contributor

@tschatzl tschatzl commented Nov 21, 2023

Hi all,

please review this fix that removes an access to a partially unloaded (i.e. only unlinked) Klass from debug code in G1UpdateRemSetTrackingBeforeRebuild::distribute_marked_bytes.

This code starts to fail if metadata purging happens before the call to this method (as https://bugs.openjdk.org/browse/JDK-8317809 proposes). The test gc/g1/humongousObjects/TestHumongousClassLoader.java then crashes on linux-x86 with 100% reproduction, because on that platform memory is uncommitted more aggressively when purging metaspace.

The fix changes the asserts so that they only access the klass when it cannot have been unloaded yet, i.e. only when marked_bytes != 0 (the object is live).
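A minimal, self-contained C++ model of why moving the size computation behind the `marked_bytes == 0 ||` check matters. This is a sketch, not the HotSpot code: `Klass`, `Object`, `unloaded_klass_reads`, `check_eager` and `check_short_circuit` are hypothetical names, and reading an unloaded klass is modelled as a counter instead of a crash. The eager variant computes the object size up front, like the factored-out `obj_size_in_words` did, so it touches the klass even when `marked_bytes == 0`; the short-circuit variant only calls `size()` when the object is live.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for HotSpot's Klass and oop; not the real types.
struct Klass {
  bool unloaded;         // metadata already purged (in the VM: possibly uncommitted)
  size_t size_in_words;  // what the object's klass would report as its size
};

// Counts reads of an unloaded klass; in the real VM such a read may crash.
static int unloaded_klass_reads = 0;

struct Object {
  const Klass* klass;
  size_t size() const {
    if (klass->unloaded) {
      ++unloaded_klass_reads;
    }
    return klass->size_in_words;
  }
};

constexpr size_t HeapWordSize = 8;  // assumption: 64-bit heap words

// Eager shape: the size is computed unconditionally, even for a dead object.
bool check_eager(const Object& obj, size_t marked_bytes) {
  size_t obj_size_in_words = obj.size();  // always touches the klass
  return marked_bytes == 0 || obj_size_in_words * HeapWordSize == marked_bytes;
}

// Short-circuit shape: size() is only evaluated when marked_bytes != 0,
// i.e. when the object, and therefore its klass, is still live.
bool check_short_circuit(const Object& obj, size_t marked_bytes) {
  return marked_bytes == 0 || obj.size() * HeapWordSize == marked_bytes;
}
```

With a dead object and `marked_bytes == 0`, only the eager variant reads the unloaded klass. Since this is assert-only code, the regular code path does not depend on the klass either way.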

Testing: the previously failing test case no longer fails; GHA

Thanks,
Thomas


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8320525: G1: G1UpdateRemSetTrackingBeforeRebuild::distribute_marked_bytes accesses partially unloaded klass (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16766/head:pull/16766
$ git checkout pull/16766

Update a local copy of the PR:
$ git checkout pull/16766
$ git pull https://git.openjdk.org/jdk.git pull/16766/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16766

View PR using the GUI difftool:
$ git pr show -t 16766

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16766.diff

@bridgekeeper

bridgekeeper bot commented Nov 21, 2023

👋 Welcome back tschatzl! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot changed the title 8320525 8320525: G1: G1UpdateRemSetTrackingBeforeRebuild::distribute_marked_bytes accesses partially unloaded klass Nov 21, 2023
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 21, 2023
@openjdk

openjdk bot commented Nov 21, 2023

@tschatzl The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Nov 21, 2023
@mlbridge

mlbridge bot commented Nov 21, 2023

Webrevs

assert(marked_bytes == 0 || obj_size_in_words * HeapWordSize == marked_bytes,
// regions; also, we should not access their header any more them as their
// klass may have been unloaded.
assert(marked_bytes == 0 || cast_to_oop(hr->bottom())->size() * HeapWordSize == marked_bytes,
Member

Is it possible to separate these two cases into two methods, for instance? Taking a step back, why do we even need to call note_end_of_marking on these effectively empty regions?

Contributor Author

@tschatzl tschatzl Nov 24, 2023

We need to call HeapRegion::note_end_of_marking on them for completeness, even if they are empty. It's not strictly necessary, because reclamation will probably reset them correctly, but to me it's easier to reason about if empty and non-empty regions are handled the same way.

I.e. so all regions have note_start/end_of_marking called.

Contributor Author

@tschatzl tschatzl Nov 24, 2023

I do not think separating the two cases (empty and non-empty regions) here makes the code easier to understand. Distributing bytes (distributing 0 bytes is fine) and consistently calling the start/end notifications for all regions (note_end works as expected on empty regions too) is easier for me to follow than having an unnecessary exception.
Because then the question is: why have that exception?

That holds even if, for these specially handled empty regions, the calls do nothing "meaningful" other than resetting some internal state that is overwritten later anyway.
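To illustrate the uniform handling argued for above, here is a hypothetical sketch. The `Region` type, its fields and `note_end_for_all` are invented simplifications, not `HeapRegion`'s actual interface: every region gets the end-of-marking notification, and an empty or dead region simply receives 0 live bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified region; not HotSpot's HeapRegion.
struct Region {
  size_t used_bytes = 0;     // bytes allocated in the region
  size_t marked_bytes = 0;   // live bytes reported by marking
  size_t garbage_bytes = 0;  // derived at end of marking
  bool end_of_marking_noted = false;

  // Also works for empty or dead regions: with live == 0 it simply records
  // that everything in the region is garbage. Assumes live <= used_bytes.
  void note_end_of_marking(size_t live) {
    marked_bytes = live;
    garbage_bytes = used_bytes - live;
    end_of_marking_noted = true;
  }
};

// Uniform handling: no special case for empty regions; distributing
// 0 live bytes to them is harmless.
void note_end_for_all(std::vector<Region>& regions, const std::vector<size_t>& live) {
  for (size_t i = 0; i < regions.size(); i++) {
    regions[i].note_end_of_marking(live[i]);
  }
}
```

The appeal of this shape is that a reader never has to ask why some regions skip the notification.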

Member

so all regions have note_start/end_of_marking called.

Currently, the pair is invoked on all regions and the filtering (skipping young-region for example) is done inside these methods. However, the logic why a particular kind of region can (should) be skipped really belongs to the caller. The region itself doesn't know how to react to marking-start/end. (This is kind of tied to the ticket of moving marking-related fields outside region.)

why have that exception?

Because live-region and effective-region are diff, and mixing them causes confusion. I think the existence of the new comment "we should not access their header any more them..." demonstrates that it's not super obvious why the current code (before this PR) is problematic.
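For comparison, a hypothetical sketch of the alternative raised in review, where the caller splits live and effectively empty regions into two separate handlers. All names are invented for illustration, and this is not what the PR ended up doing. The effectively empty path never needs to inspect object headers, which keeps it away from possibly unloaded klasses by construction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified region; not HotSpot's HeapRegion.
struct Region {
  size_t used_bytes = 0;
  size_t garbage_bytes = 0;
};

// Live region: objects in it are reachable, so their klasses are still loaded.
// Assumes live <= used_bytes.
void handle_live_region(Region& r, size_t live) {
  r.garbage_bytes = r.used_bytes - live;
}

// Effectively empty region: nothing is live, so this path has no reason
// to look at any object header.
void handle_effectively_empty_region(Region& r) {
  r.garbage_bytes = r.used_bytes;
}

// The caller, not the region, decides which handler applies.
void process(std::vector<Region>& regions, const std::vector<size_t>& live) {
  for (size_t i = 0; i < regions.size(); i++) {
    if (live[i] == 0) {
      handle_effectively_empty_region(regions[i]);
    } else {
      handle_live_region(regions[i], live[i]);
    }
  }
}
```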

Contributor Author

so all regions have note_start/end_of_marking called.

Currently, the pair is invoked on all regions and the filtering (skipping young-region for example) is done inside these methods. However, the logic why a particular kind of region can (should) be skipped really belongs to the caller. The region itself doesn't know how to react to marking-start/end. (This is kind of tied to the ticket of moving marking-related fields outside region.)

We already agreed that these note* methods are basically part of the caller, placed in the wrong location because the members they access are in the wrong location. I do not think messing with this here in this CR half-heartedly is a good idea. As soon as the work to move TAMS and PB starts, this is going to change anyway and is imho a more appropriate time to reconsider this (and will probably naturally fix itself).

The problematic code is assertion code, which quite often accesses internals one normally would not. The regular code is independent of whether the region's klass is live or not after all.

Deleting this assert would fix the issue at hand as well. Another option would be to just not do class unloading this early; there is no particular reason to do it right after marking completed.

why have that exception?

Because live-region and effective-region are diff, and mixing them causes confusion.

What is an "effective-region" in this context? I do not understand this sentence.

I think the existence of the new comment "we should not access their header any more them..." demonstrates that it's not super obvious why the current code (before this PR) is problematic.

To me this change indicates that the sanity check code (the problematic statement is part of the sanity checks; regular code does not use it) is doing things it should not. The original author (me, I guess) correctly considered that we already unload classes super-early for some reason and wanted some extra check there, but then botched the refactoring (factoring out the obj_size_in_words calculation for the two uses).
That particular new comment is only meant to make it abundantly clear not to factor out any kind of obj_size calculation again (I removed the second use in this change).
Maybe not adding the comment would have been better, as the marked_bytes == 0 predicate already indicates just that (and the second use of obj_size_in_words is gone).

Looking at the code again, another possible source of confusion is misplaced comments. I will improve these.

Member

@albertnetymk albertnetymk left a comment

this is going to change anyway and is imho a more appropriate time to reconsider this (and will probably naturally fix itself).

OK.

What is an "effective-region" in this context?

I meant effectively-empty region.

@openjdk

openjdk bot commented Nov 24, 2023

@tschatzl This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8320525: G1: G1UpdateRemSetTrackingBeforeRebuild::distribute_marked_bytes accesses partially unloaded klass

Reviewed-by: ayang, iwalulya

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 83 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 24, 2023
@tschatzl
Contributor Author

Thanks @albertnetymk @walulyai for your reviews
/integrate

@openjdk

openjdk bot commented Nov 28, 2023

Going to push as commit 21d361e.
Since your change was applied there have been 88 commits pushed to the master branch:

  • dc256fb: 8320061: [nmt] Multiple issues with peak accounting
  • adad132: 8320767: Use := wherever possible in spec.gmk.in
  • 69c0b24: 8320714: java/util/Locale/LocaleProvidersRun.java and java/util/ResourceBundle/modules/visibility/VisibilityTest.java timeout after passing
  • 66ae6d5: 8320899: Select the correct Makefile when running make in build directory
  • ebbef62: 8320769: Remove ill-adviced "make install" target
  • 86bb804: 8320863: dsymutil command leaves around temporary directories
  • db7fedf: 8320358: GHA: ignore jdk* branches
  • e33b6c1: 8319437: NMT should show library names in call stacks
  • 2fae07f: 8319311: JShell Process Builder should be configurable
  • 63ad868: 8319668: Fixup of jar filename typo in BadFactoryTest.sh
  • ... and 78 more: https://git.openjdk.org/jdk/compare/9598ff83860235281a08091128b5df90a4a76916...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Nov 28, 2023
@openjdk openjdk bot closed this Nov 28, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 28, 2023
@openjdk

openjdk bot commented Nov 28, 2023

@tschatzl Pushed as commit 21d361e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@tschatzl tschatzl deleted the submit/8320525-distribute-marked-bytes branch January 16, 2024 14:48
Labels

hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated

3 participants