Skip to content

8267952: async logging supports to dynamically change tags and decorators #4408

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

navyxliu
Copy link
Member

@navyxliu navyxliu commented Jun 8, 2021

Support dynamic reconfiguration for async logging. 2 unittests are provided.
The regression test discovers a race condition in LogTagSet::log() even with
synchronous logging. It's not MT-safe if context switch happens between the
creation of LogDecorations and LogOutputList::Iterator. fixed.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8267952: async logging supports to dynamically change tags and decorators

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/4408/head:pull/4408
$ git checkout pull/4408

Update a local copy of the PR:
$ git checkout pull/4408
$ git pull https://git.openjdk.java.net/jdk pull/4408/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 4408

View PR using the GUI difftool:
$ git pr show -t 4408

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/4408.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 8, 2021

👋 Welcome back xliu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 8, 2021
@openjdk
Copy link

openjdk bot commented Jun 8, 2021

@navyxliu The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Jun 8, 2021
@mlbridge
Copy link

mlbridge bot commented Jun 8, 2021

Webrevs

@dholmes-ora
Copy link
Member

dholmes-ora commented Jun 11, 2021

Hi Xin,

Unfortunately the new test is crashing:

[ RUN ] LogConfigurationTest.reconfigure_decorators_MT_vm

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fb565530ee0, pid=24293, tid=24303
#
# JRE version: Java(TM) SE Runtime Environment (17.0) (fastdebug build 17-internal+0-LTS-2021-06-09-0450595.david.holmes.jdk-dev.git)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 17-internal+0-LTS-2021-06-09-0450595.david.holmes.jdk-dev.git, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libc.so.6+0x6bee0]  flockfile+0x0
#
# Core dump will be written. Default location: Core dumps may be processed with "/opt/core.sh %p" (or dumping to  testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_misc/scratch/1/core.24293)
#
# An error report file with more information is saved as:
# testoutput/test-support/jtreg_open_test_hotspot_jtreg_hotspot_misc/scratch/1/hs_err_pid24293.log
[       OK ] LogConfigurationTest.reconfigure_decorators_MT_vm (1018 ms)
[ RUN      ] LogConfigurationTest.reconfigure_tags_MT_vm
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#

Current thread (0x000055cff4e59eb0):  Thread "AsyncLog Thread" [stack: 0x00007fb530d63000,0x00007fb530e63000] [id=24303]

Stack: [0x00007fb530d63000,0x00007fb530e63000],  sp=0x00007fb530e61ae8,  free space=1018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libc.so.6+0x6bee0]  flockfile+0x0
V  [libjvm.so+0x15fb542]  LogFileOutput::write_blocking(LogDecorations const&, char const*)+0x32
V  [libjvm.so+0x15f5b6d]  AsyncLogWriter::run()+0x35d
V  [libjvm.so+0x1c0c290]  Thread::call_run()+0x100
V  [libjvm.so+0x18d98ae]  thread_native_entry(Thread*)+0x10e

I'll try to investigate more on Monday

@dholmes-ora
Copy link
Member

Unfortunately the new test is crashing:
...
I'll try to investigate more on Monday

I think the fix for 8268165 is missing. That would explain the crashes.
David

@navyxliu
Copy link
Member Author

hi, Kim,

Thanks for the head-up.

I did prepare a patch to support Atomic::store. I think I followed what you guys did.
https://cr.openjdk.java.net/~xliu/8267952/atomic_store_LogDecorators.diff

Because this patch uses some C++ tricks(well, advanced techniques), I have to convince myself that it's necessary. so far, I found that it always generates the same store instructions in x86_64.

@dholmes-ora
The biggest concern is the MT-safety. Users can potentially change unified logging configurations anytime in the presence of dcmd. That's why I added 2 concurrent gtests to demonstrate it's MT-safe no matter it's synchronous logging or async logging. We must be sure that SEGV fault never happen.

As far as I known, there are 2 reasons why we can see that crash sight.

  1. LogConfiguration::configure_output may be still lack of a memory barrier somewhere.
    Some logs directed to output[idx] can still enqueue after AsyncLogWriter::flush(). delete_output(idx) is the hazard.

If so, synchronous logging should observe the same problem. The fact is we only observed the crash in AsyncLog Thread.
Further, I ran LogConfigurationTest.reconfigure_tags_MT a couple of hours on x86_64 and aarch64 individually. I still can't capture the crash.

  1. LogFileOutput::rotate() may nullify _stream.
    yes, I missed bugfix JDK-8268165, but this patch uses "filecount=0" to prevent log from rotating in LogConfigurationTest.reconfigure_tags_MT. In my understanding, rotate() should not be invoked by this test.

I have merged the master branch, which includes 8268165. Could you try it again and see if it will still crash at the same place?

@mlbridge
Copy link

mlbridge bot commented Jun 14, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

On 14/06/2021 4:59 pm, Xin Liu wrote:

On Mon, 14 Jun 2021 04:51:28 GMT, Xin Liu <xliu at openjdk.org> wrote:

Support dynamic reconfiguration for async logging. 2 unittests are provided.
The regression test discovers a race condition in LogTagSet::log() even with
synchronous logging. It's not MT-safe if context switch happens between the
creation of LogDecorations and LogOutputList::Iterator. fixed.

Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

- Merge branch 'master' into JDK-8267952
- Add comments in LogTagSet::log().
- 8267952: async logging supports to dynamically change tags and decorators

hi, Kim,

Thanks for the head-up.

I did prepare a patch to support Atomic::store. I think I followed what you guys did.
https://cr.openjdk.java.net/~xliu/8267952/atomic_store_LogDecorators.diff

Because this patch uses some C++ tricks(well, advanced techniques), I have to convince myself that it's necessary. so far, I found that it always generates the same store instructions in x86_64.

@dholmes-ora
The biggest concern is the MT-safety. Users can potentially change unified logging configurations anytime in the presence of dcmd. That's why I added 2 concurrent gtests to demonstrate it's MT-safe no matter it's synchronous logging or async logging. We must be sure that SEGV fault never happen.

As far as I known, there are 2 reasons why we can see that crash sight.

1. `LogConfiguration::configure_output` may be still lack of a memory barrier somewhere.
Some logs directed to output[idx] can still enqueue after `AsyncLogWriter::flush()`. delete_output(idx) is the hazard.

If so, synchronous logging should observe the same problem. The fact is we only observed the crash in AsyncLog Thread.
Further, I ran `LogConfigurationTest.reconfigure_tags_MT` a couple of hours on x86_64 and aarch64 individually. I still can't capture the crash.

2. LogFileOutput::rotate() may nullify _stream.
yes, I missed bugfix JDK-8268165, but this patch uses "filecount=0" to prevent log from rotating in `LogConfigurationTest.reconfigure_tags_MT`. In my understanding, rotate() should not be invoked by this test.

I have merged the master branch, which includes 8268165. Could you try it again and see if it will still crash at the same place?

The updated branch passes the tier6 testing that previously failed. If
rotate() should not be being used by the test, yet the rotate() fix has
addressed the crashes, then apparently something else must be happening
here that is not fully understood.

David
-----

@navyxliu
Copy link
Member Author

The updated branch passes the tier6 testing that previously failed. If
rotate() should not be being used by the test, yet the rotate() fix has
addressed the crashes, then apparently something else must be happening
here that is not fully understood.

Thanks for the feedback. To be honest, I am not sure what the root cause is.
I ran those two concurrent tests in hours as follows on different platforms, I have never seen a single crash. If I can have a reproducible, I will certainly figure out why and fix it.

Now I intend to believe it's an external factor which caused it fails. Those two tests don't trigger log rotation, but other gtests may do. Perhaps Oracle Linux mounts a small partition on /tmp.

My comment is like "a theory of operation". I try to document the original author's intention. IMHO, it's thread-safe.
Do you mind trying that patch again? Maybe we were just lucky.

[ RUN      ] LogConfigurationTest.reconfigure_decorators_MT_vm
[       OK ] LogConfigurationTest.reconfigure_decorators_MT_vm (3601037 ms)
[ RUN      ] LogConfigurationTest.reconfigure_tags_MT_vm
[       OK ] LogConfigurationTest.reconfigure_tags_MT_vm (3600112 ms)

thanks,
--lx

@mlbridge
Copy link

mlbridge bot commented Jun 15, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

On 15/06/2021 3:40 pm, Xin Liu wrote:

On Mon, 14 Jun 2021 22:21:13 GMT, David Holmes <david.holmes at oracle.com> wrote:

The updated branch passes the tier6 testing that previously failed. If
rotate() should not be being used by the test, yet the rotate() fix has
addressed the crashes, then apparently something else must be happening
here that is not fully understood.

Thanks for the feedback. To be honest, I am not sure what the root cause is.
I ran those two concurrent tests in hours as follows on different platforms, I have never seen a single crash. If I can have a reproducible, I will certainly figure out why and fix it.

Now I intend to believe it's an external factor which caused it fails. Those two tests don't trigger log rotation, but other gtests may do. Perhaps Oracle Linux mounts a small partition on /tmp.

My comment is like "a theory of operation". I try to document the original author's intention. IMHO, it's thread-safe.
Do you mind trying that patch again? Maybe we were just lucky.

This time I got the old crash again in tier7:

[ RUN ] LogConfigurationTest.reconfigure_decorators_MT_vm
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error
(/opt/mach5/mesos/work_dir/slaves/35546316-ed70-4084-8244-d672e7734fd1-S56997/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/3e3c1142-1595-4682-b00e-27397233f191/runs/5367405b-c511-4cff-962e-c538d3a40ccb/workspace/open/src/hotspot/share/logging/logDecorations.cpp:78),
pid=23019, tid=23025
# assert(_decorators.is_decorator(decorator)) failed: decorator was not
part of the decorator set specified at creation.
#
# JRE version: Java(TM) SE Runtime Environment (18.0) (fastdebug build
18-internal+0-2021-06-15-0623001.david.holmes.jdk-dev3.git)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug
18-internal+0-2021-06-15-0623001.david.holmes.jdk-dev3.git, mixed mode,
tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x15fa3e4]
LogDecorations::decoration(LogDecorators::Decorator, char*, unsigned
long) const+0x94
#

Stack: [0x00007f8834f2a000,0x00007f883502a000], sp=0x00007f8835028600,
free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code,
C=native code)
V [libjvm.so+0x15fa3e4]
LogDecorations::decoration(LogDecorators::Decorator, char*, unsigned
long) const+0x94
V [libjvm.so+0x15fd207] LogFileStreamOutput::write(LogDecorations
const&, char const*)+0x77
V [libjvm.so+0x15fc4f2] LogFileOutput::write_blocking(LogDecorations
const&, char const*)+0x42
V [libjvm.so+0x15f6b0d] AsyncLogWriter::run()+0x35d
V [libjvm.so+0x1c0d0c0] Thread::call_run()+0x100
V [libjvm.so+0x18d9f64] thread_native_entry(Thread*)+0x10

BTW: I noticed a separate bug in the crash log. You've defined:

bool is_Named_thread() const override { return true; }

for the AsyncLogWriter, but as you pointed out when I asked, this thread
can't be a NamedThread because that is a special kind of GC thread. So
that method should not be there.

David
-----

1 similar comment
@mlbridge
Copy link

mlbridge bot commented Jun 15, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

On 15/06/2021 3:40 pm, Xin Liu wrote:

On Mon, 14 Jun 2021 22:21:13 GMT, David Holmes <david.holmes at oracle.com> wrote:

The updated branch passes the tier6 testing that previously failed. If
rotate() should not be being used by the test, yet the rotate() fix has
addressed the crashes, then apparently something else must be happening
here that is not fully understood.

Thanks for the feedback. To be honest, I am not sure what the root cause is.
I ran those two concurrent tests in hours as follows on different platforms, I have never seen a single crash. If I can have a reproducible, I will certainly figure out why and fix it.

Now I intend to believe it's an external factor which caused it fails. Those two tests don't trigger log rotation, but other gtests may do. Perhaps Oracle Linux mounts a small partition on /tmp.

My comment is like "a theory of operation". I try to document the original author's intention. IMHO, it's thread-safe.
Do you mind trying that patch again? Maybe we were just lucky.

This time I got the old crash again in tier7:

[ RUN ] LogConfigurationTest.reconfigure_decorators_MT_vm
#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error
(/opt/mach5/mesos/work_dir/slaves/35546316-ed70-4084-8244-d672e7734fd1-S56997/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/3e3c1142-1595-4682-b00e-27397233f191/runs/5367405b-c511-4cff-962e-c538d3a40ccb/workspace/open/src/hotspot/share/logging/logDecorations.cpp:78),
pid=23019, tid=23025
# assert(_decorators.is_decorator(decorator)) failed: decorator was not
part of the decorator set specified at creation.
#
# JRE version: Java(TM) SE Runtime Environment (18.0) (fastdebug build
18-internal+0-2021-06-15-0623001.david.holmes.jdk-dev3.git)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug
18-internal+0-2021-06-15-0623001.david.holmes.jdk-dev3.git, mixed mode,
tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V [libjvm.so+0x15fa3e4]
LogDecorations::decoration(LogDecorators::Decorator, char*, unsigned
long) const+0x94
#

Stack: [0x00007f8834f2a000,0x00007f883502a000], sp=0x00007f8835028600,
free space=1017k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code,
C=native code)
V [libjvm.so+0x15fa3e4]
LogDecorations::decoration(LogDecorators::Decorator, char*, unsigned
long) const+0x94
V [libjvm.so+0x15fd207] LogFileStreamOutput::write(LogDecorations
const&, char const*)+0x77
V [libjvm.so+0x15fc4f2] LogFileOutput::write_blocking(LogDecorations
const&, char const*)+0x42
V [libjvm.so+0x15f6b0d] AsyncLogWriter::run()+0x35d
V [libjvm.so+0x1c0d0c0] Thread::call_run()+0x100
V [libjvm.so+0x18d9f64] thread_native_entry(Thread*)+0x10

BTW: I noticed a separate bug in the crash log. You've defined:

bool is_Named_thread() const override { return true; }

for the AsyncLogWriter, but as you pointed out when I asked, this thread
can't be a NamedThread because that is a special kind of GC thread. So
that method should not be there.

David
-----

@navyxliu
Copy link
Member Author

for the AsyncLogWriter, but as you pointed out when I asked, this thread
can't be a NamedThread because that is a special kind of GC thread. So
that method should not be there.

Here is original answer: "NamedThread() is not just NonJavaThread with a name. It has some gc-specific fields. it seams NamedThread is for GC threads. Further, I don't need to support dynamic names."

Yes, you're right. Sidetrack it here JDK-8268852.
I misinterpret the member function is_Named_Thread().

Xin Liu added 2 commits June 17, 2021 23:19
This patch also installed a sanity check which will assert that
LogOutput::_decorators is subset of LogDecocrations. It will disclose
more information if the assertion fails.
@mlbridge
Copy link

mlbridge bot commented Jun 18, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

Hi Xin,

On 18/06/2021 9:29 am, Xin Liu wrote:

Support dynamic reconfiguration for async logging. 2 unittests are provided.
The regression test discovers a race condition in LogTagSet::log() even with
synchronous logging. It's not MT-safe if context switch happens between the
creation of LogDecorations and LogOutputList::Iterator. fixed.

Xin Liu has updated the pull request incrementally with one additional commit since the last revision:

Use Atomic::load/store for decorators.

This patch also installed a sanity check which will assert that
LogOutput::_decorators is subset of LogDecocrations. It will disclose
more information if the assertion fails.

I'm putting these latest changes through tier6 and 7 testing - but there
is a big backlog at the moment so I don't know when I will get results.

Looking at the code I still find it very difficult to see what actions
can happen concurrently and what mechanism ensure correct concurrent
actions. The wait_until_no_readers() logic doesn't make sense to me as I
can't see what stops the number of readers from increasing the moment
after it was read as zero! The relationships between all the different
objects is far from clear to me.

David
-----

1 similar comment
@mlbridge
Copy link

mlbridge bot commented Jun 18, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

Hi Xin,

On 18/06/2021 9:29 am, Xin Liu wrote:

Support dynamic reconfiguration for async logging. 2 unittests are provided.
The regression test discovers a race condition in LogTagSet::log() even with
synchronous logging. It's not MT-safe if context switch happens between the
creation of LogDecorations and LogOutputList::Iterator. fixed.

Xin Liu has updated the pull request incrementally with one additional commit since the last revision:

Use Atomic::load/store for decorators.

This patch also installed a sanity check which will assert that
LogOutput::_decorators is subset of LogDecocrations. It will disclose
more information if the assertion fails.

I'm putting these latest changes through tier6 and 7 testing - but there
is a big backlog at the moment so I don't know when I will get results.

Looking at the code I still find it very difficult to see what actions
can happen concurrently and what mechanism ensure correct concurrent
actions. The wait_until_no_readers() logic doesn't make sense to me as I
can't see what stops the number of readers from increasing the moment
after it was read as zero! The relationships between all the different
objects is far from clear to me.

David
-----

@mlbridge
Copy link

mlbridge bot commented Jun 18, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

On 18/06/2021 12:00 pm, David Holmes wrote:

Hi Xin,

On 18/06/2021 9:29 am, Xin Liu wrote:

Support dynamic reconfiguration for async logging. 2 unittests are
provided.
The regression test discovers a race condition in LogTagSet::log()
even with
synchronous logging. It's not MT-safe if context switch happens
between the
creation of LogDecorations? and LogOutputList::Iterator. fixed.

Xin Liu has updated the pull request incrementally with one additional
commit since the last revision:

?? Use Atomic::load/store for decorators.
?? This patch also installed a sanity check which will assert that
?? LogOutput::_decorators is subset of LogDecocrations. It will disclose
?? more information if the assertion fails.

I'm putting these latest changes through tier6 and 7 testing - but there
is a big backlog at the moment so I don't know when I will get results.

Not complete yet but the asynclogTest failed again:

- one crash with the SEGV in flockfile
- one crash with assert(bitmask == _decorators._decorators) failed:
LogOutput: 4095 decorations: 0 message = ConcurrentLogsite 0 emits a log

David
-----

Looking at the code I still find it very difficult to see what actions
can happen concurrently and what mechanism ensure correct concurrent
actions. The wait_until_no_readers() logic doesn't make sense to me as I
can't see what stops the number of readers from increasing the moment
after it was read as zero! The relationships between all the different
objects is far from clear to me.

David
-----

@navyxliu
Copy link
Member Author

Mailing list message from David Holmes on hotspot-runtime-dev:

On 18/06/2021 12:00 pm, David Holmes wrote:

Hi Xin,
On 18/06/2021 9:29 am, Xin Liu wrote:

Support dynamic reconfiguration for async logging. 2 unittests are
provided.
The regression test discovers a race condition in LogTagSet::log()
even with
synchronous logging. It's not MT-safe if context switch happens
between the
creation of LogDecorations? and LogOutputList::Iterator. fixed.

Xin Liu has updated the pull request incrementally with one additional
commit since the last revision:
?? Use Atomic::load/store for decorators.
?? This patch also installed a sanity check which will assert that
?? LogOutput::_decorators is subset of LogDecocrations. It will disclose
?? more information if the assertion fails.

I'm putting these latest changes through tier6 and 7 testing - but there
is a big backlog at the moment so I don't know when I will get results.

Not complete yet but the asynclogTest failed again:

  • one crash with the SEGV in flockfile
  • one crash with assert(bitmask == _decorators._decorators) failed:
    LogOutput: 4095 decorations: 0 message = ConcurrentLogsite 0 emits a log

David

Looking at the code I still find it very difficult to see what actions
can happen concurrently and what mechanism ensure correct concurrent
actions. The wait_until_no_readers() logic doesn't make sense to me as I
can't see what stops the number of readers from increasing the moment
after it was read as zero! The relationships between all the different
objects is far from clear to me.
David

Thank you. I see. we haven't solved any of two crash sights.
Current synchronization mechanism doesn't work for both tags or decorators.
Let me think of it.

@navyxliu navyxliu marked this pull request as draft June 22, 2021 07:08
@openjdk openjdk bot removed the rfr Pull request is ready for review label Jun 22, 2021
…letion.

This patch serializes all flush() calls in one thread.
@mlbridge
Copy link

mlbridge bot commented Jun 23, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

Hi Xin,

On 23/06/2021 2:50 am, Xin Liu wrote:

Support dynamic reconfiguration for async logging. 2 unittests are provided.
The regression test discovers a race condition in LogTagSet::log() even with
synchronous logging. It's not MT-safe if context switch happens between the
creation of LogDecorations and LogOutputList::Iterator. fixed.

Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision:

- AsyncLogWrite::flush() uses a synchronization token and wait for completion.

This patch serializes all flush\(\) calls in one thread\.

I put this through our test system and tiers 6 and 7 also passed this way.

So to walk through the fix ...

When changing the configuration:
- first update the values so that future log entries will use the new
ones (this includes the RCU check for "no readers")
- insert a token item into the enqueued messages for the AsyncLog thread
to see
- wait on the flush_sem

In this way when the asyncLog thread finds the token it can signal the
thread blocked in the flush() call and that thread can then proceed to
change the configuration knowing no enqueued messages can be dependent
on anything being changed.

I think that seem okay. Only ever doing the writing in the asyncLog
thread seems cleaner too.

The only thing I need to double-check is whether there is any potential
termination race.

In the meantime, I would suggest closing this PR and opening a new one
for JDK 17 as we need to get this fixed there ASAP. And a fresh PR will
make things a lot clearer for other reviewers.

Thanks,
David

1 similar comment
@mlbridge
Copy link

mlbridge bot commented Jun 23, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

Hi Xin,

On 23/06/2021 2:50 am, Xin Liu wrote:

Support dynamic reconfiguration for async logging. 2 unittests are provided.
The regression test discovers a race condition in LogTagSet::log() even with
synchronous logging. It's not MT-safe if context switch happens between the
creation of LogDecorations and LogOutputList::Iterator. fixed.

Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision:

- AsyncLogWrite::flush() uses a synchronization token and wait for completion.

This patch serializes all flush\(\) calls in one thread\.

I put this through our test system and tiers 6 and 7 also passed this way.

So to walk through the fix ...

When changing the configuration:
- first update the values so that future log entries will use the new
ones (this includes the RCU check for "no readers")
- insert a token item into the enqueued messages for the AsyncLog thread
to see
- wait on the flush_sem

In this way when the asyncLog thread finds the token it can signal the
thread blocked in the flush() call and that thread can then proceed to
change the configuration knowing no enqueued messages can be dependent
on anything being changed.

I think that seem okay. Only ever doing the writing in the asyncLog
thread seems cleaner too.

The only thing I need to double-check is whether there is any potential
termination race.

In the meantime, I would suggest closing this PR and opening a new one
for JDK 17 as we need to get this fixed there ASAP. And a fresh PR will
make things a lot clearer for other reviewers.

Thanks,
David

@navyxliu
Copy link
Member Author

I understand your fix. You mark write() with keyword "synchonized in java" . One bonus is flush() is MT-safe as well.

Yes, the benefit of token is that only one thread calls write(). It doesn't pay "synchronized" method cost if no flush() is invoked. I will pick one solution and remake the PR. thanks!

@navyxliu navyxliu closed this Jun 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org
Development

Successfully merging this pull request may close these issues.

3 participants