8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device. #7227

Closed
tkiriyama wants to merge 6 commits

tkiriyama (Contributor) commented Jan 26, 2022

I think JFR should report an error message and the JVM should shut down safely instead of hitting a guarantee failure.

For instance, jdk.jfr.internal.Repository#newChunk() reports an appropriate message and stops the JVM, as shown below,
by using JfrJavaSupport::abort().

[0.673s][error][jfr] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp)
[0.673s][error][jfr,system] Could not create chunk in repository /tmp/2022_01_12_22_32_42_18030, class java.io.IOException: Unable to create JFR repository directory using base location (/tmp)
[0.673s][error][jfr,system] An irrecoverable error in Jfr. Shutting down VM...

I modified StreamWriterHost to call JfrJavaSupport::abort() instead of triggering a guarantee failure.
I added an argument to JfrJavaSupport::abort() that tells os::abort() not to produce a core dump,
because there is no space left on the device.
Could you please review the fix?
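
As a rough illustration of the idea described above (not the code in this change), a write loop can tell a full device apart from other failures and let the caller report it cleanly. The names `WriteResult` and `write_fully` are hypothetical and exist only for this sketch, which uses plain POSIX `write()` rather than HotSpot's `os::write()`.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <unistd.h>

// Hypothetical result type for this sketch; it is not part of HotSpot.
enum class WriteResult { Ok, DiskFull, OtherError };

// Writes all len bytes from buf to fd, retrying after partial writes.
// errno is inspected only after write() has reported a failure, because
// its value is not meaningful after a successful call.
inline WriteResult write_fully(int fd, const std::uint8_t* buf, std::size_t len) {
  assert(buf != nullptr || len == 0);
  while (len > 0) {
    const ssize_t n = ::write(fd, buf, len);
    if (n < 0) {
      if (errno == EINTR) {
        continue;  // interrupted before any byte was written; retry
      }
      return errno == ENOSPC ? WriteResult::DiskFull : WriteResult::OtherError;
    }
    if (n == 0) {
      return WriteResult::OtherError;  // no progress; avoid looping forever
    }
    buf += n;
    len -= static_cast<std::size_t>(n);
  }
  return WriteResult::Ok;
}
```

A caller could map `WriteResult::DiskFull` to an error log entry followed by an orderly shutdown, and keep a hard failure for the remaining cases.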


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device.

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7227/head:pull/7227
$ git checkout pull/7227

Update a local copy of the PR:
$ git checkout pull/7227
$ git pull https://git.openjdk.java.net/jdk pull/7227/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 7227

View PR using the GUI difftool:
$ git pr show -t 7227

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7227.diff


bridgekeeper bot commented Jan 26, 2022

👋 Welcome back tkiriyama! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.


openjdk bot commented Jan 26, 2022

@tkiriyama The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Jan 26, 2022
@openjdk openjdk bot added the rfr Pull request is ready for review label Jan 26, 2022

mlbridge bot commented Jan 26, 2022

Webrevs

@dholmes-ora
Member

/label add hotspot-jfr

@openjdk openjdk bot added the hotspot-jfr hotspot-jfr-dev@openjdk.org label Jan 28, 2022

openjdk bot commented Jan 28, 2022

@dholmes-ora
The hotspot-jfr label was successfully added.

@dholmes-ora
Member

The JFR team needs to review this.

@tkiriyama
Contributor Author

Hi, JFR team

Could somebody please review this fix for 8280684?

@mgronlun mgronlun left a comment

Hi Takuya, thanks for your contribution.

@@ -100,7 +100,7 @@ class JfrJavaSupport : public AllStatic {
   static bool set_handler(jobject clazz, jobject handler, TRAPS);
 
   // critical
-  static void abort(jstring errorMsg, TRAPS);
+  static void abort(jstring errorMsg, TRAPS, bool dump_core=true);

@mgronlun mgronlun Feb 10, 2022

Not sure this is necessary. The existing core dump logic already handles the case where a core file cannot be generated due to disk full.

Contributor Author

Thank you for your review.

Whether or not HotSpot generates a core file is determined by the argument of vm_abort(bool dump_core). If the argument is "true", vm_abort(bool dump_core) calls os::abort(bool dump_core) to generate a core file.
See the following code:

void vm_abort(bool dump_core) {

I think JfrJavaSupport::abort() should pass "false" as an argument to vm_abort(bool dump_core).
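
The flow of the flag can be sketched in isolation as follows. This is only an illustration under stated assumptions: `fake_vm_abort`, `report_and_abort`, `AbortRequest` and `last_abort` are hypothetical stand-ins that record the request instead of terminating the process, and the log prefix merely imitates the JFR messages quoted earlier.

```cpp
#include <cstdio>

// Hypothetical stand-in for vm_abort(bool dump_core): it records the request
// instead of terminating the process, so the sketch can be exercised safely.
struct AbortRequest {
  bool requested = false;
  bool dump_core = false;
};

inline AbortRequest& last_abort() {
  static AbortRequest request;
  return request;
}

inline void fake_vm_abort(bool dump_core) {
  last_abort().requested = true;
  last_abort().dump_core = dump_core;
}

// Hypothetical abort helper with a defaulted dump_core flag. Existing callers
// keep the default (a core dump is requested); a caller that knows the device
// is full passes false explicitly.
inline void report_and_abort(const char* error_msg, bool dump_core = true) {
  if (error_msg != nullptr) {
    std::fprintf(stderr, "[error][jfr,system] %s\n", error_msg);
  }
  fake_vm_abort(dump_core);
}
```

Calling `report_and_abort("no space left on device", false)` forwards `false` to the lower-level abort, while `report_and_abort(msg)` keeps the previous behaviour.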


Ok. My point was that the os won't be able to create a core file if there is no available space.

But this is indeed more succinct, if we don't want to create a core categorically from this location.

Member

Just an observation but the filesystem that is full, and the filesystem where a core would be written, need not be the same file system. That said, a core dump in this case seems unwarranted.

System.out.printf(" Wrote large file in %d ns (%d ms) %n", t1 - t0, TimeUnit.NANOSECONDS.toMillis(t1 - t0));
raf.close();
}
}

@mgronlun mgronlun Feb 10, 2022

I appreciate the effort, but we can't have a test that intentionally provokes a disk full situation. Instead, the updated error message will have to be manually verified.

Contributor Author

I use @run main/manual in TestJFRDiskFull.java. I think this tag marks the test as a manual test.
I manually confirmed that this test passes with jtreg after this fix.

@mgronlun mgronlun Feb 16, 2022

My apologies, I missed the @run main/manual decoration. I don't think we have any JFR tests that use it.

If you can ensure this test is excluded for automatic runs, then perhaps...but then I don't know who will get to run it, so the value of the test is questionable.

Member

Manual tests are excluded if the jtreg test run specifies to run automatic tests only (as we do in our CI). So this really only serves as a validation of the fix, with no real expectation that anyone will necessarily ever run it again. Even as a locally run test, filling the disk can easily lead to unexpected problems for other processes - including the swap/paging file on Windows - so this is a risky test to run.

/*
* @test
* @bug 8280684
* @summary JfrRecorderService failes with guarantee(num_written > 0) when no space left on device.

Member

typo: failes

Member

Actually "summary" is meant to describe what the test does, not what the original bug was about

@mgronlun

Takuya, can I suggest keeping your proposed changes but excluding the test?

@tkiriyama
Contributor Author

> Takuya, can I suggest keeping your proposed changes but excluding the test?

OK. This test is certainly risky. I will remove it.

ThreadInVMfromNative transition(jt);
JfrJavaSupport::abort(JfrJavaSupport::new_string(msg, jt), jt, false);
}
else {


The else block can be removed. Just put the guarantee inline with the other code.

Contributor Author

Thank you so much. You're right. I removed the else block.

log_error(jfr, system)("%s", msg);
JavaThread* jt = JavaThread::current();
ThreadInVMfromNative transition(jt);
JfrJavaSupport::abort(JfrJavaSupport::new_string(msg, jt), jt, false);


Hi again Takuya, I'm sorry, but I should have noticed this earlier:

I now see that the code needs to allocate a Java string oop to conform to the existing abort function signature, which caters to invocations from Java. Then abort() immediately strips out the c-string from the oop. To be correct, the headers logging/log.hpp and runtime/thread.inline.hpp would also need to be included.

I believe we can simplify this by updating the abort() signature so that we don't need to drag in those extra dependencies. Please see my following comment where I suggest a way to do this.

Thanks for your patience
Markus

Contributor Author

Thank you for your valuable comments. I agree with you. I have updated the fix in accordance with your suggestions.

@mgronlun

diff --git a/src/hotspot/share/jfr/jni/jfrJavaSupport.cpp b/src/hotspot/share/jfr/jni/jfrJavaSupport.cpp
index 95b96e02c06..015d4ebe065 100644
--- a/src/hotspot/share/jfr/jni/jfrJavaSupport.cpp
+++ b/src/hotspot/share/jfr/jni/jfrJavaSupport.cpp
@@ -563,14 +563,16 @@ void JfrJavaSupport::throw_runtime_exception(const char* message, TRAPS) {
 
 void JfrJavaSupport::abort(jstring errorMsg, JavaThread* t) {
   DEBUG_ONLY(check_java_thread_in_vm(t));
-
   ResourceMark rm(t);
-  const char* const error_msg = c_str(errorMsg, t);
-  if (error_msg != NULL) {
-    log_error(jfr, system)("%s",error_msg);
+  abort(c_str(errorMsg, t));
+}
+
+void JfrJavaSupport::abort(const char* error_msg, bool dump_core /* true */) {
+  if (error_msg != nullptr) {
+    log_error(jfr, system)("%s", error_msg);
   }
   log_error(jfr, system)("%s", "An irrecoverable error in Jfr. Shutting down VM...");
-  vm_abort();
+  vm_abort(dump_core);
 }
 
 JfrJavaSupport::CAUSE JfrJavaSupport::_cause = JfrJavaSupport::VM_ERROR;
diff --git a/src/hotspot/share/jfr/jni/jfrJavaSupport.hpp b/src/hotspot/share/jfr/jni/jfrJavaSupport.hpp
index 53d6eed68a8..1ec5a884b4b 100644
--- a/src/hotspot/share/jfr/jni/jfrJavaSupport.hpp
+++ b/src/hotspot/share/jfr/jni/jfrJavaSupport.hpp
@@ -112,6 +112,7 @@ class JfrJavaSupport : public AllStatic {
 
   // critical
   static void abort(jstring errorMsg, TRAPS);
+  static void abort(const char* error_msg, bool dump_core = true);
   static void uncaught_exception(jthrowable throwable, JavaThread* t);
 
   // asserts
diff --git a/src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp b/src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp
index 3a7ec286381..73404a1aede 100644
--- a/src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp
+++ b/src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp
@@ -25,8 +25,8 @@
 #ifndef SHARE_JFR_WRITERS_JFRSTREAMWRITERHOST_INLINE_HPP
 #define SHARE_JFR_WRITERS_JFRSTREAMWRITERHOST_INLINE_HPP
 
+#include "jfr/jni/jfrJavaSupport.hpp"
 #include "jfr/writers/jfrStreamWriterHost.hpp"
-
 #include "runtime/os.hpp"
 
 template <typename Adapter, typename AP>
@@ -77,6 +77,9 @@ inline void StreamWriterHost<Adapter, AP>::write_bytes(const u1* buf, intptr_t l
   while (len > 0) {
     const unsigned int nBytes = len > INT_MAX ? INT_MAX : (unsigned int)len;
     const ssize_t num_written = os::write(_fd, buf, nBytes);
+    if (errno == ENOSPC) {
+      JfrJavaSupport::abort("Failed to write to jfr stream because no space left on device", false);
+    }
     guarantee(num_written > 0, "Nothing got written, or os::write() failed");
     _stream_pos += num_written;
     len -= num_written;

@mgronlun mgronlun left a comment

Looks good, thank you.


openjdk bot commented Feb 22, 2022

@tkiriyama This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8280684: JfrRecorderService failes with guarantee(num_written > 0) when no space left on device.

Reviewed-by: mgronlun

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 335 new commits pushed to the master branch:

  • bf19fc6: 8280357: user.home = "?" when running with systemd DynamicUser=true
  • b6843a1: 8005885: enhance PrintCodeCache to print more data
  • 23995f8: 8281525: Enable Zc:strictStrings flag in Visual Studio build
  • 20e78f7: 8282307: Parallel: Incorrect discovery mode in PCReferenceProcessor
  • 0b6862e: 8282348: Remove unused CardTable::dirty_card_iterate
  • 6fab8a2: 8277204: Implement PAC-RET branch protection on Linux/AArch64
  • abc0ce1: 8282316: Operation before String case conversion
  • 0796620: 8281944: JavaDoc throws java.lang.IllegalStateException: ERRONEOUS
  • 231e48f: 8282200: ShouldNotReachHere() reached by AsyncGetCallTrace after JDK-8280422
  • f4486a1: 8262400: runtime/exceptionMsgs/AbstractMethodError/AbstractMethodErrorTest.java fails in test_ame5_compiled_vtable_stub with wrapper
  • ... and 325 more: https://git.openjdk.java.net/jdk/compare/2eab86b513a9e4566b3f5989f899ae44280d3834...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@mgronlun) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 22, 2022
@tkiriyama
Contributor Author

I hope this change is integrated.

@tkiriyama
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Feb 25, 2022

openjdk bot commented Feb 25, 2022

@tkiriyama
Your change (at version a6958ad) is now ready to be sponsored by a Committer.

@mgronlun

/sponsor


openjdk bot commented Feb 25, 2022

Going to push as commit 9471f24.
Since your change was applied there have been 337 commits pushed to the master branch:

  • 3efd6aa: 8282347: AARCH64: Untaken branch in has_negatives stub
  • cd36be4: 8206187: javax/management/remote/mandatory/connection/DefaultAgentFilterTest.java fails with Port already in use
  • bf19fc6: 8280357: user.home = "?" when running with systemd DynamicUser=true
  • b6843a1: 8005885: enhance PrintCodeCache to print more data
  • 23995f8: 8281525: Enable Zc:strictStrings flag in Visual Studio build
  • 20e78f7: 8282307: Parallel: Incorrect discovery mode in PCReferenceProcessor
  • 0b6862e: 8282348: Remove unused CardTable::dirty_card_iterate
  • 6fab8a2: 8277204: Implement PAC-RET branch protection on Linux/AArch64
  • abc0ce1: 8282316: Operation before String case conversion
  • 0796620: 8281944: JavaDoc throws java.lang.IllegalStateException: ERRONEOUS
  • ... and 327 more: https://git.openjdk.java.net/jdk/compare/2eab86b513a9e4566b3f5989f899ae44280d3834...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Feb 25, 2022
@openjdk openjdk bot closed this Feb 25, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Feb 25, 2022

openjdk bot commented Feb 25, 2022

@mgronlun @tkiriyama Pushed as commit 9471f24.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
hotspot hotspot-dev@openjdk.org hotspot-jfr hotspot-jfr-dev@openjdk.org integrated Pull request has been integrated
3 participants