
JDK-8255661: TestHeapDumpOnOutOfMemoryError fails with EOFException #3628

Closed
wants to merge 6 commits into from

Conversation

@schmelter-sap
Contributor

@schmelter-sap schmelter-sap commented Apr 22, 2021

This fixes a race condition in the CompressionBackend class of the heap dump code.

The race happens when the thread iterating the heap wants to write the data it has collected. If the compression backend has worker threads, the buffer to write would just be added to a queue and the worker threads would then compress (if needed) and write the buffer. But if no worker threads are present, the thread doing the iteration must do this itself.

The iterating thread checks the _nr_of_threads member under lock protection and, if it is 0, assumes it has to do the work itself. It then releases the lock and runs one round of the worker threads' loop. But after the lock has been released, a worker thread could register and handle the buffer itself. The iterating thread would then wait for another buffer to become available, which never happens.

The fix is to have the iterating thread take the buffer to write out of the queue while it still holds the lock, and only then release the lock.
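The race and the fix can be replayed deterministically in a single thread. This is a hypothetical model in standard C++, not the HotSpot code: `Backend`, `submit_buggy`, `submit_fixed`, `try_take` and `worker_registers_and_takes` are illustrative names, `std::mutex` stands in for the backend's monitor, buffers are plain strings, and `try_take` returns `std::nullopt` at the point where the real iterating thread would block.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical model of the state guarded by the backend's lock (not HotSpot
// code). Buffers are plain strings; std::mutex stands in for the monitor.
struct Backend {
  std::mutex lock;
  std::deque<std::string> to_write;  // buffers waiting to be compressed/written
  int nr_of_threads = 0;             // number of registered worker threads
};

// Replays the other side of the race: a worker registers and immediately
// takes the oldest queued buffer, if there is one.
void worker_registers_and_takes(Backend& b, std::string& taken_by_worker) {
  std::lock_guard<std::mutex> guard(b.lock);
  b.nr_of_threads++;
  if (!b.to_write.empty()) {
    taken_by_worker = std::move(b.to_write.front());
    b.to_write.pop_front();
  }
}

// Non-blocking stand-in for "wait until a buffer is available": returns
// std::nullopt where the real iterating thread would block.
std::optional<std::string> try_take(Backend& b) {
  std::lock_guard<std::mutex> guard(b.lock);
  if (b.to_write.empty()) return std::nullopt;
  std::string buf = std::move(b.to_write.front());
  b.to_write.pop_front();
  return buf;
}

// Racy ordering: decide under the lock, but dequeue only after the lock has
// been released. Returns the buffer the iterating thread got to process.
std::optional<std::string> submit_buggy(Backend& b, std::string buf,
                                        std::string& taken_by_worker) {
  bool do_it_myself;
  {
    std::lock_guard<std::mutex> guard(b.lock);
    b.to_write.push_back(std::move(buf));
    do_it_myself = (b.nr_of_threads == 0);
  }  // lock released: the race window opens here
  worker_registers_and_takes(b, taken_by_worker);
  if (!do_it_myself) return std::nullopt;  // workers handle the buffer
  return try_take(b);  // queue is empty now: the real thread waits forever
}

// Fixed ordering: the iterating thread dequeues its buffer before unlocking.
std::optional<std::string> submit_fixed(Backend& b, std::string buf,
                                        std::string& taken_by_worker) {
  std::optional<std::string> mine;
  {
    std::lock_guard<std::mutex> guard(b.lock);
    b.to_write.push_back(std::move(buf));
    if (b.nr_of_threads == 0) {
      mine = std::move(b.to_write.front());
      b.to_write.pop_front();
    }
  }  // lock released: a late worker finds nothing to take
  worker_registers_and_takes(b, taken_by_worker);
  return mine;  // processed outside the lock
}
```

In this replay, `submit_buggy` returns an empty optional while the late worker holds the buffer, so the iterating thread would be left waiting; `submit_fixed` hands the buffer to the iterating thread and the late worker finds the queue empty.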


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8255661: TestHeapDumpOnOutOfMemoryError fails with EOFException

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3628/head:pull/3628
$ git checkout pull/3628

Update a local copy of the PR:
$ git checkout pull/3628
$ git pull https://git.openjdk.java.net/jdk pull/3628/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3628

View PR using the GUI difftool:
$ git pr show -t 3628

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3628.diff

@bridgekeeper

@bridgekeeper bridgekeeper bot commented Apr 22, 2021

👋 Welcome back rschmelter! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

@openjdk openjdk bot commented Apr 22, 2021

@schmelter-sap The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@schmelter-sap schmelter-sap marked this pull request as ready for review Apr 27, 2021
@openjdk openjdk bot added the rfr label Apr 27, 2021
@mlbridge

@mlbridge mlbridge bot commented Apr 27, 2021

Webrevs

@schmelter-sap
Contributor Author

@schmelter-sap schmelter-sap commented Apr 28, 2021

/label serviceability

@openjdk

@openjdk openjdk bot commented Apr 28, 2021

@schmelter-sap
The serviceability label was successfully added.

@linzang
Contributor

@linzang linzang commented May 12, 2021

Dear Ralf (@schmelter-sap),
I am not a reviewer, but I want to say that this change looks good to me and is also helpful.
While testing #2261, I encountered an issue that occurred in 2 out of 100 runs. After applying this change, another 100 runs all passed. Just FYI.

BRs,
Lin

Contributor

@reinrich reinrich left a comment

Hi Ralf,

your change looks good to me.

Thanks for fixing,
Richard.

-void CompressionBackend::thread_loop(bool single_run) {
-  // Register if this is a worker thread.
-  if (!single_run) {
+void CompressionBackend::thread_loop() {
Contributor

@reinrich reinrich May 12, 2021


You could simplify CompressionBackend::thread_loop() further:

void CompressionBackend::thread_loop() {
  {
    MonitorLocker ml(_lock, Mutex::_no_safepoint_check_flag);
    _nr_of_threads++;
  }

  WriteWork* work = get_work();
  while (work != NULL) {
      do_compress(work);
      finish_work(work);
      work = get_work();
  }

  MonitorLocker ml(_lock, Mutex::_no_safepoint_check_flag);
  _nr_of_threads--;
  assert(_nr_of_threads >= 0, "Too many threads finished");
  ml.notify_all();
}
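The suggested loop can be exercised outside HotSpot with a small standard-C++ model. This is a hypothetical sketch, not the HotSpot implementation: `CompressionBackendModel`, `add_work()` and `deactivate()` are illustrative names, `std::mutex` and `std::condition_variable` stand in for the monitor, and `do_compress()`/`finish_work()` are placeholders (here "compressing" just doubles a number and finishing only counts the item).

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical unit of work; "compressing" just doubles a number here.
struct WriteWork {
  int in;
  int out;
};

// Hypothetical standard-C++ model of the backend (not HotSpot code).
class CompressionBackendModel {
 public:
  // Producer side: queue a work item and wake one waiting worker.
  void add_work(WriteWork* work) {
    std::lock_guard<std::mutex> guard(lock_);
    queue_.push_back(work);
    cv_.notify_one();
  }

  // No more work will be added: wake all workers so they can drain and exit.
  void deactivate() {
    std::lock_guard<std::mutex> guard(lock_);
    active_ = false;
    cv_.notify_all();
  }

  // Same shape as the suggested thread_loop(): register, process work until
  // get_work() returns nullptr, then unregister.
  void thread_loop() {
    {
      std::lock_guard<std::mutex> guard(lock_);
      nr_of_threads_++;
    }

    WriteWork* work = get_work();
    while (work != nullptr) {
      do_compress(work);
      finish_work(work);
      work = get_work();
    }

    std::lock_guard<std::mutex> guard(lock_);
    nr_of_threads_--;
    assert(nr_of_threads_ >= 0 && "Too many threads finished");
  }

  int nr_of_threads() {
    std::lock_guard<std::mutex> guard(lock_);
    return nr_of_threads_;
  }

  int finished() {
    std::lock_guard<std::mutex> guard(lock_);
    return finished_;
  }

 private:
  // Blocks until work is queued or the backend is deactivated; returns
  // nullptr only when deactivated and the queue is drained.
  WriteWork* get_work() {
    std::unique_lock<std::mutex> ml(lock_);
    cv_.wait(ml, [this] { return !queue_.empty() || !active_; });
    if (queue_.empty()) return nullptr;
    WriteWork* work = queue_.front();
    queue_.pop_front();
    return work;
  }

  static void do_compress(WriteWork* work) { work->out = work->in * 2; }

  void finish_work(WriteWork*) {
    std::lock_guard<std::mutex> guard(lock_);
    finished_++;
  }

  std::mutex lock_;
  std::condition_variable cv_;
  std::deque<WriteWork*> queue_;
  bool active_ = true;
  int nr_of_threads_ = 0;
  int finished_ = 0;
};
```

A caller starts a few threads running `thread_loop()`, queues work with `add_work()`, calls `deactivate()` and joins the threads; afterwards every item has been processed and the thread count is back to 0. Because the caller joins the threads, this model has no final `notify_all()`.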

Contributor

@reinrich reinrich May 12, 2021


BTW: why is ml.notify_all() in line 275 needed at all?

Contributor Author

@schmelter-sap schmelter-sap May 12, 2021


Hi,

thanks for the reviews, Lin and Richard.

The notify_all() is indeed no longer needed. It was originally required when the worker threads were created specifically for the dump and we had to wait for them to finish at the end of the dump operation. Since we now use the GC work gang, it can be removed.

I will update the PR with your suggestions.

Best regards,
Ralf
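The reasoning in this reply can be illustrated with two hypothetical models in standard C++ (not HotSpot code). In the earlier scheme the dump waits on the thread counter, so unregistering must notify the waiter; in the current scheme a stand-in for the work gang joins its workers itself, so nobody sleeps on the counter. `ThreadCounter`, `dump_with_dedicated_threads`, `run_task_on_gang` and `dump_with_gang` are illustrative names, and `run_task_on_gang` assumes, as the reply implies, that a task handed to the GC work gang completes only after every worker has finished it.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical counter of registered worker threads (not HotSpot code).
class ThreadCounter {
 public:
  void register_thread() {
    std::lock_guard<std::mutex> guard(lock_);
    nr_of_threads_++;
  }

  // 'notify' models the ml.notify_all() under discussion.
  void unregister_thread(bool notify) {
    std::lock_guard<std::mutex> guard(lock_);
    nr_of_threads_--;
    assert(nr_of_threads_ >= 0 && "Too many threads finished");
    if (notify) cv_.notify_all();
  }

  // Sleeps until every registered thread has unregistered. Only correct if
  // unregister_thread() notifies; otherwise it could sleep forever.
  void wait_for_all() {
    std::unique_lock<std::mutex> ml(lock_);
    cv_.wait(ml, [this] { return nr_of_threads_ == 0; });
  }

  int nr_of_threads() {
    std::lock_guard<std::mutex> guard(lock_);
    return nr_of_threads_;
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  int nr_of_threads_ = 0;
};

// Earlier scheme (modelled): the end of the dump is detected by waiting on the
// counter, so the notify is required. The joins only clean up this example;
// the dump itself does not rely on them.
int dump_with_dedicated_threads(int n) {
  ThreadCounter counter;
  std::vector<std::thread> threads;
  for (int i = 0; i < n; i++) {
    counter.register_thread();  // count before start so the wait cannot miss it
    threads.emplace_back([&counter] {
      // ... compress and write buffers ...
      counter.unregister_thread(/*notify=*/true);
    });
  }
  counter.wait_for_all();
  for (std::thread& t : threads) t.join();
  return counter.nr_of_threads();
}

// Stand-in for handing a task to the GC work gang: runs 'task' on n threads
// and returns only after all of them have finished.
template <typename Task>
void run_task_on_gang(int n, Task task) {
  std::vector<std::thread> gang;
  for (int i = 0; i < n; i++) gang.emplace_back(task);
  for (std::thread& t : gang) t.join();
}

// Current scheme (modelled): the gang already waits for its workers, so no one
// sleeps on the counter and a notify would have nobody to wake.
int dump_with_gang(int n) {
  ThreadCounter counter;
  run_task_on_gang(n, [&counter] {
    counter.register_thread();
    // ... compress and write buffers ...
    counter.unregister_thread(/*notify=*/false);
  });
  return counter.nr_of_threads();
}
```

Both functions end with a thread count of 0, but only the earlier scheme depends on the notify to get there.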

@openjdk

@openjdk openjdk bot commented May 12, 2021

@schmelter-sap This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8255661: TestHeapDumpOnOutOfMemoryError fails with EOFException

Reviewed-by: rrich, cjplummer

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 10 new commits pushed to the master branch:

  • 360928d: 8260046: Assert left >= right in pointer_delta() methods
  • 5eda812: 8267180: Typo in copyright header for HashesTest
  • e90388b: 8266461: tools/jmod/hashes/HashesTest.java fails: static @test methods
  • 599d07c: 8263006: Add optimization for Max()Node and Min()Node
  • 16ca370: 8265694: Investigate test StressHiddenClasses.java
  • af4cd04: 8266291: (jrtfs) Calling Files.exists may break the JRT filesystem
  • ebcf399: 8266622: Optimize Class.descriptorString() and Class.getCanonicalName0()
  • 644f28c: 8266810: Move trivial Matcher code to cpu-specific header files
  • 88907bb: 8266904: Use function pointer typedefs in OopOopIterateDispatch
  • 301095c: 8266795: Remove dead code LowMemoryDetectorDisabler

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label May 12, 2021
Contributor

@reinrich reinrich left a comment

LGTM 👍

Contributor

@plummercj plummercj left a comment

Changes look good.

@schmelter-sap
Contributor Author

@schmelter-sap schmelter-sap commented May 17, 2021

/integrate

@openjdk openjdk bot closed this May 17, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels May 17, 2021
@openjdk

@openjdk openjdk bot commented May 17, 2021

@schmelter-sap Since your change was applied there have been 22 commits pushed to the master branch:

  • a555fd8: 8264734: Some SA classes could use better hashCode() implementation
  • 2313a21: 8266637: CHT: Add insert_and_get method
  • 7b736ec: 8266489: Enable G1 to use large pages on Windows when region size is larger than 2m
  • f422787: 8266073: Regression ~2% in Derby after 8261804
  • 02f895c: 8252685: APIs that require JavaThread should take JavaThread arguments
  • 2066f49: 8266764: [REDO] JDK-8255493 Support for pre-generated java.lang.invoke classes in CDS dynamic archive
  • 8c71144: 8265153: add time based test for ThreadMXBean.getThreadInfo() and ThreadInfo.getLockOwnerName()
  • 10cafd2: 8267153: Problemlist jdk/jfr/event/gc/collection/TestG1ParallelPhases.java to remove the noise from CI
  • f3fb5a4: 8266942: gtest/GTestWrapper.java os.iso8601_time_vm failed
  • 7ab6dc8: 6676643: Improve current C_GetAttributeValue native implementation
  • ... and 12 more: https://git.openjdk.java.net/jdk/compare/1e0ecd6d56541c948e0d120295f5008d3248598f...master

Your commit was automatically rebased without conflicts.

Pushed as commit a29612e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

4 participants