JDK-8255661: TestHeapDumpOnOutOfMemoryError fails with EOFException #3628

Closed
wants to merge 6 commits into from

schmelter-sap
Contributor

@schmelter-sap schmelter-sap commented Apr 22, 2021

This fixes a race condition in the CompressionBackend class of the heap dump code.

The race happens when the thread iterating the heap wants to write the data it has collected. If the compression backend has worker threads, the buffer to write would just be added to a queue and the worker threads would then compress (if needed) and write the buffer. But if no worker threads are present, the thread doing the iteration must do this itself.

The iterating thread checks the _nr_of_threads member under lock protection, and if it is 0, it assumes it has to do the work itself. It then releases the lock and enters the worker thread loop for one round. But after the lock has been released, a worker thread could register and handle the buffer itself. The iterating thread would then wait for another buffer to become available, which never happens.

The fix is to take the buffer to write out of the queue in the iterating thread under lock protection and then do the unlocking.
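
The idea behind the fix can be sketched outside HotSpot. The following is a minimal, portable model and not the actual CompressionBackend code: ToyBackend, submit, register_worker, queued and written_count are hypothetical names, and std::mutex stands in for the HotSpot monitor. It only shows the ordering that matters, which is that the iterating thread removes its buffer from the queue before it releases the lock.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the backend; every name here is hypothetical.
class ToyBackend {
 public:
  // Called by the thread that iterates the heap.
  void submit(const std::string& buffer) {
    std::unique_lock<std::mutex> guard(lock_);
    queue_.push_back(buffer);
    if (nr_of_threads_ > 0) {
      return;  // A registered worker will pick the buffer up.
    }
    // No worker is registered: claim the buffer while still holding the
    // lock, so a worker that registers later cannot take it instead.
    assert(!queue_.empty());
    std::string mine = std::move(queue_.front());
    queue_.pop_front();
    guard.unlock();
    write(mine);  // The slow part runs without the queue lock.
  }

  void register_worker() {
    std::lock_guard<std::mutex> guard(lock_);
    ++nr_of_threads_;
  }

  std::size_t queued() {
    std::lock_guard<std::mutex> guard(lock_);
    return queue_.size();
  }

  std::size_t written_count() {
    std::lock_guard<std::mutex> guard(out_lock_);
    return written_.size();
  }

 private:
  void write(const std::string& buffer) {
    std::lock_guard<std::mutex> guard(out_lock_);
    written_.push_back(buffer);
  }

  std::mutex lock_;      // Protects queue_ and nr_of_threads_.
  std::mutex out_lock_;  // Protects written_.
  std::deque<std::string> queue_;
  std::vector<std::string> written_;
  int nr_of_threads_ = 0;
};
```

In this model, a submit() call made while no worker is registered writes the buffer itself and leaves the queue empty. Once register_worker() has been called, submit() only enqueues the buffer.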


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8255661: TestHeapDumpOnOutOfMemoryError fails with EOFException

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3628/head:pull/3628
$ git checkout pull/3628

Update a local copy of the PR:
$ git checkout pull/3628
$ git pull https://git.openjdk.java.net/jdk pull/3628/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3628

View PR using the GUI difftool:
$ git pr show -t 3628

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3628.diff

@bridgekeeper

bridgekeeper bot commented Apr 22, 2021

👋 Welcome back rschmelter! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Apr 22, 2021

@schmelter-sap The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Apr 22, 2021
@schmelter-sap schmelter-sap marked this pull request as ready for review April 27, 2021 14:39
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 27, 2021
@mlbridge

mlbridge bot commented Apr 27, 2021

Webrevs

@schmelter-sap
Contributor Author

/label serviceability

@openjdk openjdk bot added the serviceability serviceability-dev@openjdk.org label Apr 28, 2021
@openjdk

openjdk bot commented Apr 28, 2021

@schmelter-sap
The serviceability label was successfully added.

@linzang
Contributor

linzang commented May 12, 2021

Dear Ralf(@schmelter-sap),
I am not a reviewer; I just want to state that this change looks good to me and is also helpful.
While testing #2261, I hit an issue that occurred 2 times in 100 runs. After applying this change, another 100 runs passed without problems. Just FYI.

BRs,
Lin

Member

@reinrich reinrich left a comment

Hi Ralf,

your change looks good to me.

Thanks for fixing,
Richard.

-void CompressionBackend::thread_loop(bool single_run) {
-  // Register if this is a worker thread.
-  if (!single_run) {
+void CompressionBackend::thread_loop() {
Member

You could simplify CompressionBackend::thread_loop() further:

void CompressionBackend::thread_loop() {
  {
    MonitorLocker ml(_lock, Mutex::_no_safepoint_check_flag);
    _nr_of_threads++;
  }

  WriteWork* work = get_work();
  while (work != NULL) {
    do_compress(work);
    finish_work(work);
    work = get_work();
  }

  MonitorLocker ml(_lock, Mutex::_no_safepoint_check_flag);
  _nr_of_threads--;
  assert(_nr_of_threads >= 0, "Too many threads finished");
  ml.notify_all();
}
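
For readers without a HotSpot build, the shape of this loop can be tried with standard C++ threads. This is only a sketch under stated assumptions: ToyWorkQueue, add, close and run_toy_dump are hypothetical names, std::thread::join stands in for whatever waits for the workers in the real code, and the actual get_work()/finish_work() logic is reduced to counting items.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Toy model of a worker loop: register, drain work, deregister.
class ToyWorkQueue {
 public:
  void add(int item) {
    {
      std::lock_guard<std::mutex> guard(lock_);
      items_.push_back(item);
    }
    cv_.notify_one();
  }

  // Signals that no more work will arrive.
  void close() {
    {
      std::lock_guard<std::mutex> guard(lock_);
      closed_ = true;
    }
    cv_.notify_all();
  }

  // Returns false once the queue is closed and drained; this plays the
  // role of get_work() returning NULL in the suggestion above.
  bool get_work(int* out) {
    std::unique_lock<std::mutex> guard(lock_);
    cv_.wait(guard, [this] { return closed_ || !items_.empty(); });
    if (items_.empty()) {
      return false;
    }
    *out = items_.front();
    items_.pop_front();
    return true;
  }

  void thread_loop() {
    {
      std::lock_guard<std::mutex> guard(lock_);
      ++nr_of_threads_;
    }
    int item = 0;
    while (get_work(&item)) {
      processed_.fetch_add(1);  // Stand-in for compressing and writing.
    }
    std::lock_guard<std::mutex> guard(lock_);
    --nr_of_threads_;
    assert(nr_of_threads_ >= 0);
  }

  int processed() const { return processed_.load(); }

 private:
  std::mutex lock_;  // Protects items_, closed_ and nr_of_threads_.
  std::condition_variable cv_;
  std::deque<int> items_;
  bool closed_ = false;
  int nr_of_threads_ = 0;
  std::atomic<int> processed_{0};
};

// Starts the workers, feeds the items, and returns how many were handled.
int run_toy_dump(int workers, int items) {
  ToyWorkQueue queue;
  std::vector<std::thread> pool;
  for (int i = 0; i < workers; ++i) {
    pool.emplace_back([&queue] { queue.thread_loop(); });
  }
  for (int i = 0; i < items; ++i) {
    queue.add(i);
  }
  queue.close();
  for (std::thread& t : pool) {
    t.join();
  }
  return queue.processed();
}
```

With at least one worker, run_toy_dump(3, 100) handles all 100 items. The model does not cover the zero-worker case, which is the one the PR itself addresses.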

Member

BTW: why is ml.notify_all() in line 275 needed at all?

Contributor Author

Hi,

thanks for the review, Lin and Richard.

The notify_all() is indeed not needed anymore. It was originally needed when the worker threads were newly created threads and we had to wait for them to finish at the end of the dump operation. But since we now use the GC work gang, this can be removed.

I will update the PR with your suggestions.

Best regards,
Ralf

@openjdk

openjdk bot commented May 12, 2021

@schmelter-sap This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8255661: TestHeapDumpOnOutOfMemoryError fails with EOFException

Reviewed-by: rrich, cjplummer

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 10 new commits pushed to the master branch:

  • 360928d: 8260046: Assert left >= right in pointer_delta() methods
  • 5eda812: 8267180: Typo in copyright header for HashesTest
  • e90388b: 8266461: tools/jmod/hashes/HashesTest.java fails: static @test methods
  • 599d07c: 8263006: Add optimization for Max()Node and Min()Node
  • 16ca370: 8265694: Investigate test StressHiddenClasses.java
  • af4cd04: 8266291: (jrtfs) Calling Files.exists may break the JRT filesystem
  • ebcf399: 8266622: Optimize Class.descriptorString() and Class.getCanonicalName0()
  • 644f28c: 8266810: Move trivial Matcher code to cpu-specific header files
  • 88907bb: 8266904: Use function pointer typedefs in OopOopIterateDispatch
  • 301095c: 8266795: Remove dead code LowMemoryDetectorDisabler

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 12, 2021
Member

@reinrich reinrich left a comment

LGTM 👍

Contributor

@plummercj plummercj left a comment

Changes look good.

@schmelter-sap
Contributor Author

/integrate

@openjdk openjdk bot closed this May 17, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 17, 2021
@openjdk

openjdk bot commented May 17, 2021

@schmelter-sap Since your change was applied there have been 22 commits pushed to the master branch:

  • a555fd8: 8264734: Some SA classes could use better hashCode() implementation
  • 2313a21: 8266637: CHT: Add insert_and_get method
  • 7b736ec: 8266489: Enable G1 to use large pages on Windows when region size is larger than 2m
  • f422787: 8266073: Regression ~2% in Derby after 8261804
  • 02f895c: 8252685: APIs that require JavaThread should take JavaThread arguments
  • 2066f49: 8266764: [REDO] JDK-8255493 Support for pre-generated java.lang.invoke classes in CDS dynamic archive
  • 8c71144: 8265153: add time based test for ThreadMXBean.getThreadInfo() and ThreadInfo.getLockOwnerName()
  • 10cafd2: 8267153: Problemlist jdk/jfr/event/gc/collection/TestG1ParallelPhases.java to remove the noise from CI
  • f3fb5a4: 8266942: gtest/GTestWrapper.java os.iso8601_time_vm failed
  • 7ab6dc8: 6676643: Improve current C_GetAttributeValue native implementation
  • ... and 12 more: https://git.openjdk.java.net/jdk/compare/1e0ecd6d56541c948e0d120295f5008d3248598f...master

Your commit was automatically rebased without conflicts.

Pushed as commit a29612e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@schmelter-sap schmelter-sap deleted the JDK-8255661 branch July 27, 2022 07:17
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated serviceability serviceability-dev@openjdk.org
4 participants