Skip to content
This repository was archived by the owner on Sep 2, 2022. It is now read-only.

8267952: async logging supports to dynamically change tags and decorators #130

Closed
wants to merge 4 commits into from

Conversation

navyxliu
Copy link
Member

@navyxliu navyxliu commented Jun 23, 2021

This patch fixed intermittent crashes of gtest/AsyncLogGtest.java. The root cause
is that previous flush() can't guarantee flush all pending messages in AsyncLog
buffer. This patch implements flush() using a synchronizaton token and waits for
completion. This approach isn't MT-safe but it can serialize all flush() calls in
a thread.

Two concurrent tests are appended to cover the hazard cases.
This patch also comments LogConfiguration::configure_output() MT-safety.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8267952: async logging supports to dynamically change tags and decorators

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk17 pull/130/head:pull/130
$ git checkout pull/130

Update a local copy of the PR:
$ git checkout pull/130
$ git pull https://git.openjdk.java.net/jdk17 pull/130/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 130

View PR using the GUI difftool:
$ git pr show -t 130

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk17/pull/130.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 23, 2021

👋 Welcome back xliu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 23, 2021
@openjdk
Copy link

openjdk bot commented Jun 23, 2021

@navyxliu The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.java.net label Jun 23, 2021
@mlbridge
Copy link

mlbridge bot commented Jun 23, 2021

Webrevs

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Xin,

Thank you for your patience and perseverance with this.

This looks good to me now. The flush token seems to work well and is preferable to potentially blocking the async log thread.

I have a number of comments below, but they are all about comments.

I'm running this through our tier 1-7 testing again.

I have one remaining concern, which is just something we need to keep an eye on, and that is the potential for delaying VM termination if the async log writer gets blocked/delayed.

Thanks,
David

@mlbridge
Copy link

mlbridge bot commented Jun 24, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

On 24/06/2021 10:55 am, David Holmes wrote:

On Wed, 23 Jun 2021 20:18:32 GMT, Xin Liu <xliu at openjdk.org> wrote:

This patch fixed intermittent crashes of gtest/AsyncLogGtest.java. The root cause
is that previous flush() can't guarantee flush all pending messages in AsyncLog
buffer. This patch implements flush() using a synchronizaton token and waits for
completion. This approach isn't MT-safe but it can serialize all flush() calls in
a thread.

Two concurrent tests are appended to cover the hazard cases.
This patch also comments LogConfiguration::configure_output() MT-safety.

Hi Xin,

Thank you for your patience and perseverance with this.

This looks good to me now. The flush token seems to work well and is preferable to potentially blocking the async log thread.

I have a number of comments below, but they are all about comments.

I'm running this through our tier 1-7 testing again.

The test passed in all tiers - which is good.

One oddity is that the test log shows at the end:

Warning: asynclog is OFF.
Warning: asynclog is OFF.
Warning: asynclog is OFF.

and I have grepped the source code and the tests and cannot find where
this message is coming from. So it is unclear to me if this is
normal/expected or whether there is something that needs attention here?

Thanks,
David
-----

1 similar comment
@mlbridge
Copy link

mlbridge bot commented Jun 24, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

On 24/06/2021 10:55 am, David Holmes wrote:

On Wed, 23 Jun 2021 20:18:32 GMT, Xin Liu <xliu at openjdk.org> wrote:

This patch fixed intermittent crashes of gtest/AsyncLogGtest.java. The root cause
is that previous flush() can't guarantee flush all pending messages in AsyncLog
buffer. This patch implements flush() using a synchronizaton token and waits for
completion. This approach isn't MT-safe but it can serialize all flush() calls in
a thread.

Two concurrent tests are appended to cover the hazard cases.
This patch also comments LogConfiguration::configure_output() MT-safety.

Hi Xin,

Thank you for your patience and perseverance with this.

This looks good to me now. The flush token seems to work well and is preferable to potentially blocking the async log thread.

I have a number of comments below, but they are all about comments.

I'm running this through our tier 1-7 testing again.

The test passed in all tiers - which is good.

One oddity is that the test log shows at the end:

Warning: asynclog is OFF.
Warning: asynclog is OFF.
Warning: asynclog is OFF.

and I have grepped the source code and the tests and cannot find where
this message is coming from. So it is unclear to me if this is
normal/expected or whether there is something that needs attention here?

Thanks,
David
-----

@navyxliu
Copy link
Member Author

navyxliu commented Jun 24, 2021

hi, David,

I have one remaining concern, which is just something we need to keep an eye on, and that is the potential for delaying VM termination if the async log writer gets blocked/delayed.

We discussed this before. Yes, if "write_blocking()" does block on I/O part, flush() is blocked. We have this issue before, and synchronous logging has this issue as well. I think Async logging is better than sync logging on this because at least JVM got stuck at termination instead of any safepoint if it did happen.

It isn't perfect. Ideally, we can use flush_sem.wait(timeout) if Semaphore::wait supports timeout for all platforms.

I'm curious now about the fact the buffer size seems only a logical size constraint not an actual out-of-space constraint.

We can have an assertion like this. It's because flush() must be synchronized. The reason I don't enforce it using a mutex or yet another semaphore is that current use cases have already been protected. If all flush() are synchronized, it means that there's one or none token at any time. therefore, the following assertion is right. Shall I put it back? If there's more than one token appears in the buffer, it will hit the assertion I place in write().

assert(_buffer.size() < _buffer_max_size+1, "_buffer is over-sized.");

Warning: asynclog is OFF.
It is from gtest fixture AsyncLogTest, which is expected. We run those tests in two modes.
Normally, gtest run this test in synchronous mode. It doesn't test much.

https://github.com/openjdk/jdk/blob/master/test/hotspot/gtest/logging/test_asynclog.cpp#L37

Thank you for you patience. TBH, I coundn't wrap around my head if you did tell me that handshake problem.
I will post a new revision to include your feedbacks.

thanks,
--lx

@mlbridge
Copy link

mlbridge bot commented Jun 24, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

On 24/06/2021 2:33 pm, Xin Liu wrote:

On Wed, 23 Jun 2021 20:18:32 GMT, Xin Liu <xliu at openjdk.org> wrote:

This patch fixed intermittent crashes of gtest/AsyncLogGtest.java. The root cause
is that previous flush() can't guarantee flush all pending messages in AsyncLog
buffer. This patch implements flush() using a synchronizaton token and waits for
completion. This approach isn't MT-safe but it can serialize all flush() calls in
a thread.

Two concurrent tests are appended to cover the hazard cases.
This patch also comments LogConfiguration::configure_output() MT-safety.

hi, David,

I have one remaining concern, which is just something we need to keep an eye on, and that is the potential for delaying VM termination if the async log writer gets blocked/delayed.

We discussed this before. Yes, if "write_blocking()" does block on I/O part, flush() is blocked. We have this issue before, and synchronous logging has this issue as well. I think Async logging is better than sync logging on this because at least JVM got stuck at termination instead of any safepoint if it did happen.

It isn't perfect. Ideally, we can use flush_sem.wait(timeout) if Semaphore::wait supports timeout for all platforms.

I agree existing synchronous logging also has a (worse?) problem here.
Not sure there is a suitable fix as any timeout will race with the
asyncLog thread again potentially causing the crashes you just fixed.

I'm curious now about the fact the buffer size seems only a logical size constraint not an actual out-of-space constraint.

We can have an assertion like this. It's because flush() must be thread-safe. The reason I don't enforce it using mutex or yet another semaphore is that current use cases have already been protected. If all flush() are serialized, it means that there's one or none token at any time. therefore, the following assertion is right. Shall I put it back?

`
assert(_buffer.size() < _buffer_max_size+1, "_buffer is over-sized.");
`

Not sure where this assertion was.

But that wasn't what I was asking about. I thought we had
AsyncLogBufferSize because we had a fixed size buffer and hence if the
buffer were full you couldn't insert the token. BUt I was misremembering
- the buffer size is not fixed (it is just a linked list) it is the
number of buffer entries that is limited by AsyncLogBufferSize, to
ensure we don't consume too much memory. So all is fine here.

Warning: asynclog is OFF.
It is from gtest fixture AsyncLogTest, which is expected. We run those tests in two modes.
Normally, gtest run this test in synchronous mode. It doesn't test much.

https://github.com/openjdk/jdk/blob/master/test/hotspot/gtest/logging/test_asynclog.cpp#L37

Doh! I grepped in hotspot/jtreg by mistake. Thanks for clarifying.

Cheers,
David
-----

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor comment nits remain but otherwise this is fine.

Thanks,
David

@openjdk
Copy link

openjdk bot commented Jun 24, 2021

@navyxliu This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8267952: async logging supports to dynamically change tags and decorators

Reviewed-by: dholmes, phh

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 22 new commits pushed to the master branch:

  • e4c5446: 8268236: The documentation of the String.regionMatches method contains error
  • d9cb068: 8258746: illegal access to global field _jvmci_old_thread_counters by terminated thread causes crash
  • 6eb734a: 8266269: Lookup::accessClass fails with IAE when accessing an arrayClass with a protected inner class as component class
  • 3d0d27c: 8269351: Proxy::newProxyInstance and MethodHandleProxies::asInterfaceInstance should reject sealed interfaces
  • 824a516: 8269260: Add AVX512 and other SSE + AVX combinations testing for tests which generate vector instructions
  • 1404e4b: 8269302: serviceability/dcmd/framework/InvalidCommandTest.java still fails after JDK-8268433
  • fb0a95f: 8269036: tools/jpackage/share/AppImagePackageTest.java failed with "hdiutil: create failed - Resource busy"
  • 5ebed06: 8269074: (fs) Files.copy fails to copy from /proc on some linux kernel versions
  • d799563: 8256919: BCEL: Utility.encode forget to close
  • 1e3b418: 8269335: Unable to load svml library
  • ... and 12 more: https://git.openjdk.java.net/jdk17/compare/b4743143428a3e0c9a6d1d7dcaf73f7a06882e84...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 24, 2021
@navyxliu
Copy link
Member Author

hi, @dcubed-ojdk
Could you try to PR and see if it can solve the two crash sightings?

@mlbridge
Copy link

mlbridge bot commented Jun 25, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

On 25/06/2021 10:37 am, Xin Liu wrote:

On Thu, 24 Jun 2021 05:58:54 GMT, Xin Liu <xliu at openjdk.org> wrote:

This patch fixed intermittent crashes of gtest/AsyncLogGtest.java. The root cause
is that previous flush() can't guarantee flush all pending messages in AsyncLog
buffer. This patch implements flush() using a synchronizaton token and waits for
completion. This approach isn't MT-safe but it can serialize all flush() calls in
a thread.

Two concurrent tests are appended to cover the hazard cases.
This patch also comments LogConfiguration::configure_output() MT-safety.

Xin Liu has updated the pull request incrementally with one additional commit since the last revision:

Update comments according to reviewers's feedback.

hi, @dcubed-ojdk
Could you try to PR and see if it can solve the two crash sightings?

I already re-ran this through our tier 1-7 testing, running the gtests
10 times in each configuration they normally run. No failures observed.

David

1 similar comment
@mlbridge
Copy link

mlbridge bot commented Jun 25, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

On 25/06/2021 10:37 am, Xin Liu wrote:

On Thu, 24 Jun 2021 05:58:54 GMT, Xin Liu <xliu at openjdk.org> wrote:

This patch fixed intermittent crashes of gtest/AsyncLogGtest.java. The root cause
is that previous flush() can't guarantee flush all pending messages in AsyncLog
buffer. This patch implements flush() using a synchronizaton token and waits for
completion. This approach isn't MT-safe but it can serialize all flush() calls in
a thread.

Two concurrent tests are appended to cover the hazard cases.
This patch also comments LogConfiguration::configure_output() MT-safety.

Xin Liu has updated the pull request incrementally with one additional commit since the last revision:

Update comments according to reviewers's feedback.

hi, @dcubed-ojdk
Could you try to PR and see if it can solve the two crash sightings?

I already re-ran this through our tier 1-7 testing, running the gtests
10 times in each configuration they normally run. No failures observed.

David

@navyxliu
Copy link
Member Author

navyxliu commented Jun 28, 2021

Hi, Reviewers,

May I have another reviewer to take a look at this? This patch can fix the intermittent crashes for jdk17.
By the rules, we need 2 reviewers to approve it before we can integrate it. David helped me verify it.

thanks,
--lx

Copy link
Member

@phohensee phohensee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@navyxliu
Copy link
Member Author

/integrate

@openjdk
Copy link

openjdk bot commented Jun 28, 2021

Going to push as commit 09bb846.
Since your change was applied there have been 25 commits pushed to the master branch:

  • 20640a5: 8269534: Remove java/util/concurrent/locks/Lock/TimedAcquireLeak.java from ProblemList.txt
  • efe8423: 8269403: Fix jpackage tests to gracefully handle jpackage app launcher crashes
  • 8d09596: 8269304: Regression ~5% in 2005 in b27
  • e4c5446: 8268236: The documentation of the String.regionMatches method contains error
  • d9cb068: 8258746: illegal access to global field _jvmci_old_thread_counters by terminated thread causes crash
  • 6eb734a: 8266269: Lookup::accessClass fails with IAE when accessing an arrayClass with a protected inner class as component class
  • 3d0d27c: 8269351: Proxy::newProxyInstance and MethodHandleProxies::asInterfaceInstance should reject sealed interfaces
  • 824a516: 8269260: Add AVX512 and other SSE + AVX combinations testing for tests which generate vector instructions
  • 1404e4b: 8269302: serviceability/dcmd/framework/InvalidCommandTest.java still fails after JDK-8268433
  • fb0a95f: 8269036: tools/jpackage/share/AppImagePackageTest.java failed with "hdiutil: create failed - Resource busy"
  • ... and 15 more: https://git.openjdk.java.net/jdk17/compare/b4743143428a3e0c9a6d1d7dcaf73f7a06882e84...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Jun 28, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated labels Jun 28, 2021
@openjdk
Copy link

openjdk bot commented Jun 28, 2021

@navyxliu Pushed as commit 09bb846.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@openjdk openjdk bot removed the rfr Pull request is ready for review label Jun 28, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.java.net integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

3 participants