Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8229517: Support for optional asynchronous/buffered logging #3135

Closed
wants to merge 39 commits into from
Closed
Changes from 28 commits
Commits
Show all changes
39 commits
Select commit Hold shift + click to select a range
0697587
8229517: Support for optional asynchronous/buffered logging
Mar 22, 2021
a3945d0
8229517: Support for optional asynchronous/buffered logging
Mar 22, 2021
e23657f
8229517: Support for optional asynchronous/buffered logging
Mar 22, 2021
2871894
8229517: Support for optional asynchronous/buffered logging
Mar 22, 2021
760ec76
8229517: Support for optional asynchronous/buffered logging
Mar 28, 2021
bcefbec
8229517: Support for optional asynchronous/buffered logging
Mar 28, 2021
09c058d
8229517: Support for optional asynchronous/buffered logging
Apr 1, 2021
9211121
8229517: Support for optional asynchronous/buffered logging
Apr 9, 2021
81b2a0c
fix runtime/logging/RedefineClasses.java crashed with -XX:+AsyncLogging
Apr 9, 2021
54aadfa
Revert "fix runtime/logging/RedefineClasses.java crashed with -XX:+As…
Apr 12, 2021
f453d03
Inject the number of dropped messages since last dumpping.
Apr 15, 2021
3d74a32
Fix build issue with `--disable-precompiled-headers`
Apr 15, 2021
2bd104f
Support Jcmd pid VM.log disable
Apr 16, 2021
edb15c6
Remove LogAsyncInterval completely
Apr 20, 2021
11c2b7a
Resolve rank conflict between tty_lock and LogAsyncFlusher's _lock.
Apr 21, 2021
b2ad511
Fix a race condition bug on LogAsyncFlusher termination.
Apr 22, 2021
13c1c2b
Merge branch 'master' into JDK-8229517
Apr 22, 2021
4cf4a57
Refactor LogAsyncFlusher::abort()
Apr 23, 2021
1f06be3
Accurate Decorations for AsyncLogging.
Apr 23, 2021
e2d6f9e
Fix bugs/style/typo based on reviewers' feedbacks.
Apr 23, 2021
78d8f2c
Revert "Accurate Decorations for AsyncLogging."
Apr 29, 2021
e035ee3
Reimplement Accurate Decorations for AsyncLogging.
Apr 29, 2021
98ae33f
Implement the global option -Xlog:async per CSR.
May 3, 2021
1d5e540
Use LogTagSetMapping<LogTag::__NO_TAG>::tagset() instead of NULL poin…
May 5, 2021
c67d74e
Change option AsyncLogBufferEntries to AsyncLogBufferSize.
May 7, 2021
e3ee430
Merge branch 'master' into JDK-8229517
May 8, 2021
908d2a3
Update based on the feedbacks from reviwers.
May 11, 2021
7df8b1e
Fix build on Windows.
May 12, 2021
8f7b3b1
Refactor Initialize() and Termiante().
May 13, 2021
cade304
Replace Mutex with Semaphore.
May 14, 2021
e19e90a
refactor code and update per reviewers' feedback.
May 16, 2021
be83aaf
Implement the new discard policy: drop the incoming message.
May 17, 2021
348edfc
Update according to reviewer's feedback.
May 20, 2021
2381ba1
Add threadump support for AsyncLog Thread.
May 22, 2021
cad42d7
Update according to reviewers' feedback (Part-2).
May 22, 2021
9800a8c
flush() waits until all pending logging IO operations are done.
May 24, 2021
a25b8d6
Biased to the thread which is taking buffer lock.
May 24, 2021
2ffc33f
Remove AsyncLogWriter::flush() from os::shutdown().
May 25, 2021
b3d36cf
Correct typos and add a comment for a corner case.
May 27, 2021
File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
@@ -29,6 +29,7 @@
#endif
#include "jvmtifiles/jvmti.h"
#include "logging/log.hpp"
#include "logging/logAsyncFlusher.hpp"
#include "memory/allocation.inline.hpp"
#include "os_posix.inline.hpp"
#include "runtime/globals_extension.hpp"
@@ -1946,6 +1947,7 @@ void os::shutdown() {
// needs to remove object in file system
AttachListener::abort();

LogAsyncFlusher::abort();
// flush buffered output, finish log files
ostream_abort();

@@ -38,6 +38,7 @@
#include "jvmtifiles/jvmti.h"
#include "logging/log.hpp"
#include "logging/logStream.hpp"
#include "logging/logAsyncFlusher.hpp"
#include "memory/allocation.inline.hpp"
#include "oops/oop.inline.hpp"
#include "os_share_windows.hpp"
@@ -714,6 +715,7 @@ bool os::create_thread(Thread* thread, ThreadType thr_type,
case os::vm_thread:
case os::pgc_thread:
case os::cgc_thread:
case os::asynclog_thread:
case os::watcher_thread:
if (VMThreadStackSize > 0) stack_size = (size_t)(VMThreadStackSize * K);
break;
@@ -1094,6 +1096,7 @@ void os::shutdown() {
// allow PerfMemory to attempt cleanup of any persistent resources
perfMemory_exit();

LogAsyncFlusher::abort();
// flush buffered output, finish log files
ostream_abort();

@@ -0,0 +1,233 @@
/*
* Copyright Amazon.com Inc. or its affiliates. All Rights Reserved.
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This code is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 only, as
* published by the Free Software Foundation.
*
* This code is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
* version 2 for more details (a copy is included in the LICENSE file that
* accompanied this code).
*
* You should have received a copy of the GNU General Public License version
* 2 along with this work; if not, write to the Free Software Foundation,
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
*
* Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
* or visit www.oracle.com if you need additional information or have any
* questions.
*
*/
#include "precompiled.hpp"
#include "jvm.h"
#include "logging/logAsyncFlusher.hpp"
#include "logging/logConfiguration.hpp"
#include "logging/logFileOutput.hpp"
#include "logging/logHandle.hpp"
#include "runtime/atomic.hpp"

void AsyncLogMessage::writeback() {
if (_message != NULL) {
_output.write_blocking(_decorations, _message);
os::free(_message);
}
}

void LogAsyncFlusher::enqueue_impl(const AsyncLogMessage& msg) {

This comment has been minimized.

@tstuefe

tstuefe May 13, 2021
Member Outdated

Could you rename this to enqueue_locked? We use this name at other places for functions which factor out locking to the caller.

This comment has been minimized.

@navyxliu

navyxliu May 13, 2021
Author Member Outdated

okay.

assert_lock_strong(&_lock);

This comment has been minimized.

@tstuefe

tstuefe May 13, 2021
Member Outdated

Can we enqueue (so, log) from within the async logger itself? If not, add an assert here that this should not be called from the logger thread?

This comment has been minimized.

@navyxliu

navyxliu May 13, 2021
Author Member Outdated

enqueue_impl() is private. it's not accessible from outside.
only public member functions can be seen by clients.

This comment has been minimized.

@tstuefe

tstuefe May 14, 2021
Member Outdated

enqueue_impl() is private. it's not accessible from outside.
only public member functions can be seen by clients.

What I meant is, can logging - hence enqueing - be done from the async thread itself? Can it happen that the logger thread itself logs, would that be a problem, and if yes, should we guard against it?

This comment has been minimized.

@navyxliu

navyxliu May 21, 2021
Author Member Outdated

Actually, it could happen. This line is the very first 'log' from AsyncLog Thread.

  void pre_run() override {
    NonJavaThread::pre_run();
    log_debug(logging, thread)("starting AsyncLog Thread tid = " INTX_FORMAT, os::current_thread_id());
  }

This logsite is handled exactly same as other logsites. grab the lock, enqueue the cloned message and return -1.


if (_buffer.size() >= _buffer_max_size) {

This comment has been minimized.

@tstuefe

tstuefe May 13, 2021
Member Outdated

As mentioned in the CSR, I prefer dropping the newest message, not the oldest one. So there would be no reason to pop anything from the buffer.

This comment has been minimized.

@navyxliu

navyxliu May 13, 2021
Author Member Outdated

copy that. I will update it.

This comment has been minimized.

@navyxliu

navyxliu May 13, 2021
Author Member Outdated

One thing occurring to me was that a global buffer for all outputs is maybe not the best choice. One buffer per LogFileOutput seems like a cleaner, more natural fit:

That's another reason I introduce a hashmap for dropped counters.

I read you in the RFC thread about the "Global Big Lock" for all filed-based log outputs. I know it's not scalable.I proposed to Volker splitting buffers for different outputs. Then the cost of synchronization would be effectively same as current UL. Volker countered that it wasn't an issue until users deploys many log files. it seems rare. We can leave it for future improvement.

This comment has been minimized.

@navyxliu

navyxliu May 13, 2021
Author Member Outdated

I think what Volker said makes sense, then KVHashtable<LogFileOutput*, AsyncLogBuffer&> downsized to
KVHashtable<LogFileOutput*, uint32_t>...

This comment has been minimized.

@tstuefe

tstuefe May 14, 2021
Member Outdated

One thing occurring to me was that a global buffer for all outputs is maybe not the best choice. One buffer per LogFileOutput seems like a cleaner, more natural fit:

That's another reason I introduce a hashmap for dropped counters.

I read you in the RFC thread about the "Global Big Lock" for all filed-based log outputs. I know it's not scalable.I proposed to Volker splitting buffers for different outputs. Then the cost of synchronization would be effectively same as current UL. Volker countered that it wasn't an issue until users deploys many log files. it seems rare. We can leave it for future improvement.

Performance was not my main reason for suggesting this, but code simplicity.

Im fine with leaving this for a future RFE

Fior the record, here is the idea:

  • keep async buffer, drop counter and async buffer lock as members in LogFileOutput
  • have a global counter "enqueued-items"
  • in LogFileOutput::write:
    • if enqueued-items=max, increase the drop counter in LogFileOutput
    • otherwise create buffer if needed, then enqueue into this buffer, increase global enqueued-items counter
  • LogAsyncFlusher::flush becomes LogFileOutput::flush_async_buffer() or similar and:
    • prints its drop counter, if > 0
    • flushes its own buffer, same logic as today, but locks over its local _async_buffer_log
    • and decreases global enqueued-items counter
  • flushing means iterating over all LogFileOutput objects and calling flush on them.

I think what Volker said makes sense, then KVHashtable<LogFileOutput*, AsyncLogBuffer&> downsized to
KVHashtable<LogFileOutput*, uint32_t>...

I'm loosing track here, where is Volker's remark and where is the first variant used?

const AsyncLogMessage* h = _buffer.front();
assert(h != NULL, "sanity check");

if (h->message() != nullptr) {
bool p_created;
uint32_t* counter = _stats.add_if_absent(h->output(), 0, &p_created);
*counter = *counter + 1;

if (Verbose) {
// Temporarily turn off SerializeVMOutput so defaultStream will not
// invoke set_owner(self) for tty_lock.
FlagSetting t(SerializeVMOutput, false);
// The writing below can not guarantee non-blocking because tty may be piped by the filesystems
// or throttled by XOFF, so only dump the dropping message in Verbose mode.
tty->print_cr("asynclog dropping message: %s", h->message());
}
This conversation was marked as resolved by navyxliu

This comment has been minimized.

@tstuefe

tstuefe May 13, 2021
Member Outdated

I'm not sure this is needed or sensible. If we are flooded, we would then proceed to print every dropped message to tty? So now we flood tty?

I'd hope exhausting the output buffer is rare, and easily remedied by telling the user to increase the buffer or stop using async mode.

}

_buffer.pop_front();
}
assert(_buffer.size() < _buffer_max_size, "_buffer is over-sized.");
_buffer.push_back(msg);

// notify asynclog thread if occupancy is over 3/4
size_t sz = _buffer.size();
if (sz > (_buffer_max_size >> 2) * 3 ) {

This comment has been minimized.

@tstuefe

tstuefe May 13, 2021
Member Outdated

Can you reform this to use normal division? Ideally with "0.75" appearing somewhere as a constant which is easily changeable?

_lock.notify();
}
}

void LogAsyncFlusher::enqueue(LogFileOutput& output, const LogDecorations& decorations, const char* msg) {
AsyncLogMessage m(output, decorations, os::strdup(msg));

{ // critical area
// The rank of _lock is same as _tty_lock on purpuse.
// if logging thread is holding _tty_lock now, temporarily yield to _lock.
ttyUnlocker ttyul;
MutexLocker ml(&_lock, Mutex::_no_safepoint_check_flag);
enqueue_impl(m);
}
}

// LogMessageBuffer consists of a multiple-part/multiple-line messsages.
// the mutex here gurantees its integrity.
void LogAsyncFlusher::enqueue(LogFileOutput& output, LogMessageBuffer::Iterator msg_iterator) {
ttyUnlocker ttyul;
MutexLocker ml(&_lock, Mutex::_no_safepoint_check_flag);

for (; !msg_iterator.is_at_end(); msg_iterator++) {
AsyncLogMessage m(output, msg_iterator.decorations(), os::strdup(msg_iterator.message()));
enqueue_impl(m);
}
}

LogAsyncFlusher::LogAsyncFlusher()
: _state(ThreadState::Running),
_lock(Mutex::tty, "async-log-monitor", true /* allow_vm_block */, Mutex::_safepoint_check_never),
_stats(17 /*table_size*/) {
if (os::create_thread(this, os::asynclog_thread)) {
os::start_thread(this);
This conversation was marked as resolved by navyxliu

This comment has been minimized.

@tstuefe

tstuefe May 13, 2021
Member Outdated

Here is some error handling needed if thread creation failed. UL should drop to sync mode then.

This comment has been minimized.

@navyxliu

navyxliu May 13, 2021
Author Member Outdated

make sense! I will take a look at the error handling.

}

log_info(logging)("The maximum entries of AsyncLogBuffer: " SIZE_FORMAT ", estimated memory use: " SIZE_FORMAT " bytes",
_buffer_max_size, AsyncLogBufferSize);

This comment has been minimized.

@tstuefe

tstuefe May 13, 2021
Member Outdated

nit: how about "Asynchronous logging enabled. Buffer size: xxx entries"?

This comment has been minimized.

@navyxliu

navyxliu May 19, 2021
Author Member Outdated

I change the logic a little bit.
at this point we can't say "Asynchronous logging enabled", we need to make sure:

  1. asynclog Thread is ready.
  2. _instance is non-null.
  3. all active readers(synchronous logsites) are completed.

I emit a sentence later.
[0.036s][27407][debug][logging,thread ] AsyncLogging starts working.

}

bool AsyncLogMapIterator::do_entry(LogFileOutput* output, uint32_t* counter) {
using dummy = LogTagSetMapping<LogTag::__NO_TAG>;
LogDecorations decorations(LogLevel::Warning, dummy::tagset(), output->decorators());
const int sz = 128;
char out_of_band[sz];

if (*counter > 0) {
jio_snprintf(out_of_band, sz, UINT32_FORMAT_W(6) " messages dropped due to async logging", *counter);
output->write_blocking(decorations, out_of_band);
*counter = 0;
}

return true;
}

void LogAsyncFlusher::writeback(const LinkedList<AsyncLogMessage>& logs) {
LinkedListIterator<AsyncLogMessage> it(logs.head());
while (!it.is_empty()) {
AsyncLogMessage* e = it.next();
e->writeback();
}
}

void LogAsyncFlusher::flush(bool with_lock) {
LinkedListImpl<AsyncLogMessage, ResourceObj::C_HEAP, mtLogging> logs;

if (with_lock) { // critical area
This conversation was marked as resolved by navyxliu

This comment has been minimized.

@tstuefe

tstuefe May 13, 2021
Member Outdated

I think the trick to get conditional Mutexes without having separate branches is to make the MutexLocker argument itself conditional (I think you can pass NULL, in which case no locking) - then you could merge these two branches.

This comment has been minimized.

@navyxliu

navyxliu May 13, 2021
Author Member Outdated

thank you for the head-up!
MutexLocker can take NULL and drive through! definitely worth a shot.
Interestingly, MonitorLocker can't do that.

// Caveat: current thread must not hold _tty_lock or other lower rank lockers.
// Cannot install ttyUnlocker here because flush() may be invoked before defaultStream
// initialization.
MutexLocker ml(&_lock, Mutex::_no_safepoint_check_flag);
_buffer.pop_all(&logs);
AsyncLogMapIterator iter;
_stats.iterate(&iter);
This conversation was marked as resolved by navyxliu

This comment has been minimized.

@tstuefe

tstuefe May 13, 2021
Member Outdated

All this is really difficult to understand for the casual reader. Clearer naming would help:

  • something like "move-list-content" instead of pop_all
  • the whole point of this iteration thing is to print the dropped counters, right? A function "Print Drop Counters" would be clearer. Unless you go the one-async-buffer-per-logfileoutput-route I mentioned earlier.

This comment has been minimized.

@navyxliu

navyxliu May 17, 2021
Author Member Outdated

I will send out a patch to fix this along with the new discard policy.
There's another issue. _stats.iterate(&iter) here invokes output->write_blocking while holding lock!
This has compromised non-blocking protocol.

} else {
// C++ lambda can simplify the code snippet.
_buffer.pop_all(&logs);
AsyncLogMapIterator iter;
_stats.iterate(&iter);
}

writeback(logs);
}

void LogAsyncFlusher::run() {
while (_state == ThreadState::Running) {
{
MonitorLocker m(&_lock, Mutex::_no_safepoint_check_flag);
m.wait(ASYNCLOG_WAIT_TIMEOUT);
}
flush();
}

// Signal thread has terminated
MonitorLocker ml(Terminator_lock);
_state = ThreadState::Terminated;
ml.notify_all();
}

LogAsyncFlusher* LogAsyncFlusher::_instance = nullptr;

void LogAsyncFlusher::initialize() {
This conversation was marked as resolved by navyxliu

This comment has been minimized.

@simonis

simonis Mar 26, 2021
Member Outdated

I don't think you need this. Please merge the initialization into LogAsyncFlusher::instance() (see my comments on your changes in init.cpp).

This comment has been minimized.

@tstuefe

tstuefe May 7, 2021
Member Outdated

In contrast to Volker I think an explicit initialization is fine and follows the established pattern for other subsystems. However, I would:

  • assert for _instance == NULL and remove the condition. This should only be called once.
  • no need for Atomics here since we do this during VM init.

And while writing this, I realize this would conflict with the gtest fixture where you switch this on or off several time in one process. I propose you switch those gtests - only those testing the LogAsyncFlusher, not those testing the Deque etc - as TEST_OTHER_VM.

Which arguably we should do for all logging gtests, not only yours, if they fiddle with global VM settings. In other words, it should be possible to run the gtests with -Xlog:... options and the gtests should still pass, including all log tests. Today, this is not the case.

if (!LogConfiguration::is_async_mode()) return;

if (_instance == NULL) {
Atomic::release_store(&LogAsyncFlusher::_instance, new LogAsyncFlusher());
}
}

// Termination
// 1. issue an atomic release_store to close the logging window.
// 2. flush itself in-place.
// 3. signal asynclog thread to exit.
// 4. wait until asynclog thread exits.
// 5. (optional) delete this in post_run().
void LogAsyncFlusher::terminate() {
if (_instance != NULL) {
LogAsyncFlusher* self = _instance;

Atomic::release_store<LogAsyncFlusher*, LogAsyncFlusher*>(&_instance, nullptr);

This comment has been minimized.

@tstuefe

tstuefe May 7, 2021
Member Outdated

IIUC the mechanism you signal "async"-ness to the logging callsites with is via this pointer being NULL or not, right?

I am not sure this is safe wrt to lost log output on termination. This is not a big deal, but it would be nice if we lost no output from concurrent logging when terminating and switching over to non-async mode.

Proposal:

in LogFileOutput::write, instead of querying this pointer for NULL, just call LogAsyncFlusher::enqueue() unconditionally and let enqueue() return a state whether write succeeded or not. In that function, return false if flusher is terminated or not active at all - in which case you do the normal synchronized writing.

Thinking this through, one could even scratch the LogAsyncBuffer dynamic allocation and make it either a static global object or even an AllStatic, with all static methods. As a bonus, you can replace the instance variable, as well as the NULL tests and possibly the Thread::current(), with access to either static LogAsyncBuffer functions or to a global static LogAsyncBuffer singleton.

This comment has been minimized.

@navyxliu

navyxliu May 12, 2021
Author Member Outdated

IIUC the mechanism you signal "async"-ness to the logging callsites with is via this pointer being NULL or not, right?

yes.

I am not sure this is safe wrt to lost log output on termination.

It's not safe! now it may lose a coupe of messages in race condition. I have a patch to fix that.

Thinking this through, one could even scratch the LogAsyncBuffer dynamic allocation and make it either a static
global object or even an AllStatic, with all static methods.

I tried this as well. I found it's hard. Logging sites spread the entire lifecycle of JVM. Unfortunately, async logging can't be in function without some runtime supports, such as Thread::current(), threading. One lesson I learned from this patch is that DON'T use any runtime in such low-level module.

Now my target is to define a clear interval. Between 2 synchronization points, async logging takes over synchronous logging. I think it's acceptable because it covers the entire "java" runtime which is concerned by Java applications.

|JVM begin---LogAsyncFlusher::initialize()---...---LogAsyncFlusher::terminate()--JVM End|

Another bonus for this 2-point-design is that we can assume only AsyncLog Thread is doing logging I/O when async logging is in charge. it means I can skip the FileLocker imposed by unified logging framework.

This comment has been minimized.

@tstuefe

tstuefe May 13, 2021
Member Outdated

IIUC the mechanism you signal "async"-ness to the logging callsites with is via this pointer being NULL or not, right?

yes.

I am not sure this is safe wrt to lost log output on termination.

It's not safe! now it may lose a coupe of messages in race condition. I have a patch to fix that.

Thinking this through, one could even scratch the LogAsyncBuffer dynamic allocation and make it either a static
global object or even an AllStatic, with all static methods.

I tried this as well. I found it's hard. Logging sites spread the entire lifecycle of JVM. Unfortunately, async logging can't be in function without some runtime supports, such as Thread::current(), threading. One lesson I learned from this patch is that DON'T use any runtime in such low-level module.

Now my target is to define a clear interval. Between 2 synchronization points, async logging takes over synchronous logging. I think it's acceptable because it covers the entire "java" runtime which is concerned by Java applications.

|JVM begin---LogAsyncFlusher::initialize()---...---LogAsyncFlusher::terminate()--JVM End|

That's fine.

Another bonus for this 2-point-design is that we can assume only AsyncLog Thread is doing logging I/O when async logging is in charge. it means I can skip the FileLocker imposed by unified logging framework.

Which is good, but you have to find a solution for log sites which do not fulfill conditions to be logged asynchronously, if they fall into the live phase, see my comment above.

self->flush();
{
MonitorLocker ml(&self->_lock, Mutex::_no_safepoint_check_flag);
self->_state = ThreadState::Terminating;
ml.notify();
}
{
MonitorLocker ml(Terminator_lock, Mutex::_no_safepoint_check_flag);
while (self->_state != ThreadState::Terminated) {
ml.wait();
}
}
}
}

LogAsyncFlusher* LogAsyncFlusher::instance() {
// thread may has been detached, then Thread::current() used by Mutex is inavailable.
// eg. ThreadsSMRSupport::smr_delete() uses log_debug() after delete thread.
if (Thread::current_or_null() != nullptr) {
This conversation was marked as resolved by navyxliu

This comment has been minimized.

@tstuefe

tstuefe May 7, 2021
Member Outdated

I don't understand why this depends on the existence of a current thread? Ideally we should be able to log just fine in corner cases, e.g. when thread is detached or if we have an error (during signal handling e..g.)

This comment has been minimized.

@navyxliu

navyxliu May 10, 2021
Author Member Outdated

LogAsyncFlusher::enqueue needs Thread::current() to use _lock.

void Mutex::lock_without_safepoint_check() {
  lock_without_safepoint_check(Thread::current());
}

This check is for a very rare corner case. This can be trigger using the following cmdline.
java -Xlog:'thread+smr=debug:file=/tmp/t.log' -Xlog:async -version
the root cause is the following code in thread

void ThreadsSMRSupport::smr_delete(JavaThread *thread) {
  elapsedTimer timer;
  if (EnableThreadSMRStatistics) {
    timer.start();
  }

  wait_until_not_protected(thread);

  delete thread; // Thread::current() will trigger assert(current != __null) failed: Thread::current() called on detached thread after ~Thread()
  if (EnableThreadSMRStatistics) {
    timer.stop();
    uint millis = (uint)timer.milliseconds();
    ThreadsSMRSupport::inc_deleted_thread_cnt();
    ThreadsSMRSupport::add_deleted_thread_times(millis);
    ThreadsSMRSupport::update_deleted_thread_time_max(millis);
  }
  // logsite will crash  in async mode because we can't use Thread::current() here. 
  log_debug(thread, smr)("tid=" UINTX_FORMAT ": ThreadsSMRSupport::smr_delete: thread=" INTPTR_FORMAT " is deleted.", os::current_thread_id(), p2i(thread));
}

This comment has been minimized.

@tstuefe

tstuefe May 13, 2021
Member Outdated

Sigh. I get it. Another example that a good logging system should be independent from the infrastructure of the surrounding system. Any chance this patch could use raw locking primitives, pthread_mutex resp. CreateMutex, without having to rewrite everything?

Ironically I though async logging would reduce the number of log call sites where the conditions for logging are not met, by moving most of the stack out of the caller into a separate thread. But the Thread::current condition is a new limitation then compared to sync logging.

If you don't want to change the Mutex:

So you have some rare log sites which won't allow you to use asynchronous logging since you cannot use locks, and you solve this by returning NULL, which then falls back to synchronous logging? What happens if you mix synchronous and asynchronous logging like this? Does this work?

From a code point of view your reasoning is very difficult to deduce. Can you please change this: remove the THread::current condition from the singleton getter, and add an explicit check for Thread::current at its caller site. In async mode either just omit printing any output needing Thread::current, or relegate to synchronous printing. And add a good comment there.

This comment has been minimized.

@navyxliu

navyxliu May 13, 2021
Author Member Outdated

Thank you for reviewing. you raise the exact problem which keep me from sleeping...

From a code point of view your reasoning is very difficult to deduce. Can you please change this: remove the THread::current condition from the singleton getter, and add an explicit check for Thread::current at its caller site. In async mode either just omit printing any output needing Thread::current, or relegate to synchronous printing. And add a good comment there.

That's exactly the current approach. It works but async logging and synchronous logging may overlap in some rare windows such as my example of smr_delete.

LogAsyncFlusher* LogAsyncFlusher::instance() {
  // thread may has been detached, then Thread::current() used by Mutex is inavailable.
  // eg. ThreadsSMRSupport::smr_delete() uses log_debug() after delete thread.
  if (Thread::current_or_null() != nullptr) {
    return _instance;
  } else {
    return nullptr;
  }
}

One drawback is that I have to give up the FileLocker optimization for the entire interval async logging(99.99% time)
for those "rare" windows :( Once I remove FileLocker for FILE*, it may get stuck when two logging writers overlap!

I am evaluating different approaches. eg. the 2-synchronization-points design or even replace Mutex with a semaphore-based lock like ConfigurationLock

return _instance;
} else {
return nullptr;
}
}

// Different from terminate(), abort is invoked by os::abort().
// There are 2 constraints:
// 1. must be async-safe because os::abort() may be invoked by a signal handler while other
// threads are executing.
// 2. must not obtain _lock. eg. gtest.MutexRank.mutex_lock_access_leaf(test_mutex_rank.cpp)
// holds a 'assess' lock and then traps SIGSEGV on purpose.
//
// Unlike terminate, abort() just ensures all pending log messages are flushed. It doesnot
// exit asynclog thread.
void LogAsyncFlusher::abort() {
if (_instance != nullptr) {
// To meet prior constraints, I borrow the idea in LogConfiguration::disable_outputs(),
// the following code shut down all outputs for all tagsets with a RCU synchroniziation.
// After then, I can flush pending queue without a lock.
for (LogTagSet* ts = LogTagSet::first(); ts != NULL; ts = ts->next()) {
ts->disable_outputs();
}
_instance->flush(false /*with_lock*/);
}
}