
8272170: Missing memory barrier when checking active state for regions #6324


@tschatzl tschatzl commented Nov 10, 2021

Hi all,

can I have reviews for this fix to a memory visibility race, where we fail to make sure that the HeapRegion pointer in HeapRegionManager::_regions is visible before the "active" check for that HeapRegion can return true?

The problem, as I understand it, is the following:

void HeapRegionManager::expand(uint start, uint num_regions, WorkerThreads* pretouch_workers) {
  commit_regions(start, num_regions, pretouch_workers);
  for (uint i = start; i < start + num_regions; i++) {
    HeapRegion* hr = _regions.get_by_index(i);
    if (hr == NULL) {
      hr = new_heap_region(i);
      OrderAccess::storestore();
      _regions.set_by_index(i, hr);
      _allocated_heapregions_length = MAX2(_allocated_heapregions_length, i + 1);
    }
    G1CollectedHeap::heap()->hr_printer()->commit(hr);
  }
  activate_regions(start, num_regions);
}

That is, we first commit the memory, then create a HeapRegion instance, which we properly guard with a StoreStore barrier before storing it into the _regions array. Finally, we activate the regions (via activate_regions), which marks the new HeapRegion* entries in the region table as valid.
The corresponding bits in the _active_regions bitmap are set with memory barriers (via par_set_range).
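As a rough illustration of the writer side described above, the following C++ sketch sets a range of bits in a shared bitmap so that stores made before the call become visible no later than the bits themselves. ActiveBitmap and its methods are hypothetical; HotSpot uses its own Atomic and OrderAccess primitives rather than std::atomic, and the description above only says that par_set_range uses memory barriers, so the release ordering here is an assumption about the minimum this pairing needs, not a description of the real code.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the _active_regions bitmap (not HotSpot's BitMap):
// one bit per region, packed into 64-bit words that threads access atomically.
class ActiveBitmap {
 public:
  explicit ActiveBitmap(size_t num_bits)
      : _num_bits(num_bits), _words((num_bits + 63) / 64) {
    for (auto& w : _words) w.store(0, std::memory_order_relaxed);
  }

  // Sets bits [beg, end), one fetch_or per affected word. Each fetch_or uses
  // release ordering, so stores this thread made before the call (such as
  // publishing region pointers) are visible to any thread that later
  // observes one of these bits with an acquire load.
  void par_set_range(size_t beg, size_t end) {
    assert(beg <= end && end <= _num_bits);
    size_t i = beg;
    while (i < end) {
      const size_t lo = i % 64;                                // first bit in this word
      const size_t hi = std::min<size_t>(64, lo + (end - i));  // one past the last bit
      const uint64_t upper = (hi == 64) ? ~uint64_t{0} : ((uint64_t{1} << hi) - 1);
      const uint64_t mask = upper & (~uint64_t{0} << lo);
      _words[i / 64].fetch_or(mask, std::memory_order_release);
      i += hi - lo;
    }
  }

  // Acquire load, pairing with the release in par_set_range.
  bool at(size_t i) const {
    assert(i < _num_bits);
    return (_words[i / 64].load(std::memory_order_acquire) >> (i % 64)) & 1;
  }

 private:
  size_t _num_bits;
  std::vector<std::atomic<uint64_t>> _words;
};
```

The release on the bits only gives a reader a guarantee if the reader's load of the bit is ordered as well, which is why at() uses an acquire load.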

In HeapRegionManager::par_iterate, when iterating over the region map, we first check whether the region is available in the _active_regions bitmap, and then load the corresponding HeapRegion* and pass it on to the given closure.

However, there is no memory ordering between reading the available bit in the _active_regions bitmap and the subsequent read of the HeapRegion* from the _regions map. The bit can therefore become visible to the thread iterating over the regions before the contents of the _regions map itself, in effect passing a nullptr to the closure, which isn't allowed. (Some details about the debugging session are in the CR.)

void HeapRegionManager::par_iterate(HeapRegionClosure* blk, HeapRegionClaimer* hrclaimer, const uint start_index) const {
  // Every worker will actually look at all regions, skipping over regions that
  // are currently not committed.
  // This also (potentially) iterates over regions newly allocated during GC. This
  // is no problem except for some extra work.
  const uint n_regions = hrclaimer->n_regions();
  for (uint count = 0; count < n_regions; count++) {
    const uint index = (start_index + count) % n_regions;
    assert(index < n_regions, "sanity");
    // Skip over unavailable regions
    if (!is_available(index)) {
      continue;
    }
    HeapRegion* r = _regions.get_by_index(index);
[...]
    bool res = blk->do_heap_region(r);
    if (res) {
      return;
    }
  }
}
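The loop shape above can be modeled in isolation: each worker starts at its own start_index, wraps around modulo n_regions so that it still visits every index once, skips unavailable indices, and stops early when the closure returns true. This is a hypothetical single-threaded sketch (iterate_from and visited are not HotSpot functions); it passes indices rather than HeapRegion pointers and does not reproduce the part of the loop body elided above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical single-threaded model of the par_iterate loop shape.
// is_available(index) decides whether an index is visited; do_region(index)
// returns true to abort the iteration, like do_heap_region in the closure.
template <typename IsAvailable, typename DoRegion>
void iterate_from(uint32_t n_regions, uint32_t start_index,
                  IsAvailable is_available, DoRegion do_region) {
  for (uint32_t count = 0; count < n_regions; count++) {
    // Start at start_index and wrap around, so every index is seen once.
    const uint32_t index = (start_index + count) % n_regions;
    assert(index < n_regions && "sanity");
    if (!is_available(index)) {
      continue;  // skip regions that are not currently active
    }
    if (do_region(index)) {
      return;  // the closure asked to stop
    }
  }
}

// Collects the visited indices in order, never aborting early.
template <typename IsAvailable>
std::vector<uint32_t> visited(uint32_t n_regions, uint32_t start_index,
                              IsAvailable is_available) {
  std::vector<uint32_t> out;
  iterate_from(n_regions, start_index, is_available,
               [&out](uint32_t index) { out.push_back(index); return false; });
  return out;
}
```

For example, with n_regions = 5, start_index = 2 and index 3 unavailable, visited returns {2, 4, 0, 1}. Note that in this single-threaded model the availability check is trivially safe; the bug only appears when another thread is activating regions concurrently.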

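The suggested fix is to ensure proper memory ordering in is_available, i.e. the load of the active bit must be ordered before the subsequent load of the HeapRegion* from _regions (in effect, giving the bitmap read acquire semantics).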
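A minimal C++ model of the publish/check pairing, assuming std::atomic in place of HotSpot's own primitives. Region and RegionTable are hypothetical stand-ins, not HotSpot classes, and all active bits fit in a single 64-bit word. The writer stores the pointer and then sets the bit with release ordering; the reader's is_available uses an acquire load, so a reader that sees the bit set is guaranteed to also see the pointer. If is_available used a relaxed load instead, the C++ memory model would allow the reader to see the bit but still load nullptr, which is the failure described above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for HeapRegion (not a HotSpot class).
struct Region {
  unsigned index;
};

constexpr unsigned kNumRegions = 64;  // one 64-bit word of active bits

// Hypothetical stand-in for the _regions table plus the _active_regions bitmap.
class RegionTable {
 public:
  RegionTable() {
    for (auto& slot : _regions) slot.store(nullptr, std::memory_order_relaxed);
  }

  // Writer side, in the order used by expand(): publish the pointer, then
  // mark the region active. The release on the bit orders the pointer store
  // (and the region's initialization) before the bit becomes visible.
  void expand(unsigned i, Region* r) {
    assert(i < kNumRegions && r != nullptr);
    _regions[i].store(r, std::memory_order_relaxed);
    _active.fetch_or(uint64_t{1} << i, std::memory_order_release);
  }

  // Reader side with the fix: the acquire load synchronizes with the release
  // in expand(), so a later load of _regions[i] cannot see a stale nullptr.
  bool is_available(unsigned i) const {
    assert(i < kNumRegions);
    return (_active.load(std::memory_order_acquire) >> i) & 1;
  }

  Region* get_by_index(unsigned i) const {
    return _regions[i].load(std::memory_order_relaxed);
  }

 private:
  std::atomic<Region*> _regions[kNumRegions];
  std::atomic<uint64_t> _active{0};
};

// Usage sketch: one thread publishes all regions while another polls them.
// Returns true if every region seen as available had the expected pointer.
bool run_demo() {
  RegionTable table;
  Region storage[kNumRegions] = {};
  std::atomic<bool> ok{true};
  std::thread writer([&] {
    for (unsigned i = 0; i < kNumRegions; i++) {
      storage[i].index = i;          // initialize the region first...
      table.expand(i, &storage[i]);  // ...then publish and activate it
    }
  });
  std::thread reader([&] {
    for (int round = 0; round < 1000; round++) {
      for (unsigned i = 0; i < kNumRegions; i++) {
        if (!table.is_available(i)) continue;  // skip inactive regions
        Region* r = table.get_by_index(i);
        if (r == nullptr || r->index != i) ok.store(false);
      }
    }
  });
  writer.join();
  reader.join();
  return ok.load() && table.is_available(kNumRegions - 1);
}
```

A passing run of run_demo does not prove the ordering is correct; a race like this one can be rare, so the guarantee rests on the acquire/release pairing rather than on testing.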

I also checked the other uses of this method and of is_unavailable, and I do not think they have a similar issue.

Testing: the test previously crashed in roughly 1 out of 500 runs; with the fix it passed 5,000 runs.

Thanks,
Thomas


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8272170: Missing memory barrier when checking active state for regions

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6324/head:pull/6324
$ git checkout pull/6324

Update a local copy of the PR:
$ git checkout pull/6324
$ git pull https://git.openjdk.java.net/jdk pull/6324/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6324

View PR using the GUI difftool:
$ git pr show -t 6324

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6324.diff


@bridgekeeper bridgekeeper bot commented Nov 10, 2021

👋 Welcome back tschatzl! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Nov 10, 2021

@openjdk openjdk bot commented Nov 10, 2021

@tschatzl The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-gc label Nov 10, 2021

@mlbridge mlbridge bot commented Nov 10, 2021

Webrevs


@kstefanj kstefanj left a comment

Looks good, nice debugging!


@openjdk openjdk bot commented Nov 10, 2021

@tschatzl This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8272170: Missing memory barrier when checking active state for regions

Reviewed-by: sjohanss, ayang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 51 new commits pushed to the master branch:

  • 02f7900: 8276932: G1: Annotate methods with override explicitly in g1CollectedHeap.hpp
  • fdcd16a: 8277048: Tiny improvements to the specification text for java.util.Properties.load
  • b231f5b: 8276921: G1: Remove redundant failed evacuation regions calculation in RemoveSelfForwardPtrHRClosure
  • ca2efb7: 8274687: JDWP deadlocks if some Java thread reaches wait in blockOnDebuggerSuspend
  • 296780c: 8276983: Small fixes to DumpAllocStat::print_stats
  • 8c5f030: 8276453: Undefined behavior in C1 LIR_OprDesc causes SEGV in fastdebug build
  • 176d21d: 8276824: refactor Thread::is_JavaThread_protected
  • 74f3e69: 8277071: [BACKOUT] JDK-8276743 Make openjdk build Zip Archive generation "reproducible"
  • b85500e: 8276123: ZipFile::getEntry will not return a file entry when there is a directory entry of the same name within a Zip File
  • 0d2980c: 8258192: Obsolete the CriticalJNINatives flag
  • ... and 41 more: https://git.openjdk.java.net/jdk/compare/fd0a25e62b2c8abc3a419c2e80abbcf11c9e882f...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Nov 10, 2021

@tschatzl tschatzl commented Nov 15, 2021

Thanks @albertnetymk @kstefanj for your reviews.
/integrate


@openjdk openjdk bot commented Nov 15, 2021

Going to push as commit 35a831d.
Since your change was applied there have been 51 commits pushed to the master branch (the same commits listed in the earlier pre-integration check comment).

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Nov 15, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Nov 15, 2021

@openjdk openjdk bot commented Nov 15, 2021

@tschatzl Pushed as commit 35a831d.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@tschatzl tschatzl deleted the submit/8272170-memory-barrier-checking-active branch Nov 16, 2021