Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8229517: Support for optional asynchronous/buffered logging #3135

Closed
wants to merge 39 commits into from
Closed
Changes from 20 commits
Commits
Show all changes
39 commits
Select commit Hold shift + click to select a range
0697587
8229517: Support for optional asynchronous/buffered logging
Mar 22, 2021
a3945d0
8229517: Support for optional asynchronous/buffered logging
Mar 22, 2021
e23657f
8229517: Support for optional asynchronous/buffered logging
Mar 22, 2021
2871894
8229517: Support for optional asynchronous/buffered logging
Mar 22, 2021
760ec76
8229517: Support for optional asynchronous/buffered logging
Mar 28, 2021
bcefbec
8229517: Support for optional asynchronous/buffered logging
Mar 28, 2021
09c058d
8229517: Support for optional asynchronous/buffered logging
Apr 1, 2021
9211121
8229517: Support for optional asynchronous/buffered logging
Apr 9, 2021
81b2a0c
fix runtime/logging/RedefineClasses.java crashed with -XX:+AsyncLogging
Apr 9, 2021
54aadfa
Revert "fix runtime/logging/RedefineClasses.java crashed with -XX:+As…
Apr 12, 2021
f453d03
Inject the number of dropped messages since last dumpping.
Apr 15, 2021
3d74a32
Fix build issue with `--disable-precompiled-headers`
Apr 15, 2021
2bd104f
Support Jcmd pid VM.log disable
Apr 16, 2021
edb15c6
Remove LogAsyncInterval completely
Apr 20, 2021
11c2b7a
Resolve rank conflict between tty_lock and LogAsyncFlusher's _lock.
Apr 21, 2021
b2ad511
Fix a race condition bug on LogAsyncFlusher termination.
Apr 22, 2021
13c1c2b
Merge branch 'master' into JDK-8229517
Apr 22, 2021
4cf4a57
Refactor LogAsyncFlusher::abort()
Apr 23, 2021
1f06be3
Accurate Decorations for AsyncLogging.
Apr 23, 2021
e2d6f9e
Fix bugs/style/typo based on reviewers' feedbacks.
Apr 23, 2021
78d8f2c
Revert "Accurate Decorations for AsyncLogging."
Apr 29, 2021
e035ee3
Reimplement Accurate Decorations for AsyncLogging.
Apr 29, 2021
98ae33f
Implement the global option -Xlog:async per CSR.
May 3, 2021
1d5e540
Use LogTagSetMapping<LogTag::__NO_TAG>::tagset() instead of NULL poin…
May 5, 2021
c67d74e
Change option AsyncLogBufferEntries to AsyncLogBufferSize.
May 7, 2021
e3ee430
Merge branch 'master' into JDK-8229517
May 8, 2021
908d2a3
Update based on the feedbacks from reviwers.
May 11, 2021
7df8b1e
Fix build on Windows.
May 12, 2021
8f7b3b1
Refactor Initialize() and Termiante().
May 13, 2021
cade304
Replace Mutex with Semaphore.
May 14, 2021
e19e90a
refactor code and update per reviewers' feedback.
May 16, 2021
be83aaf
Implement the new discard policy: drop the incoming message.
May 17, 2021
348edfc
Update according to reviewer's feedback.
May 20, 2021
2381ba1
Add threadump support for AsyncLog Thread.
May 22, 2021
cad42d7
Update according to reviewers' feedback (Part-2).
May 22, 2021
9800a8c
flush() waits until all pending logging IO operations are done.
May 24, 2021
a25b8d6
Biased to the thread which is taking buffer lock.
May 24, 2021
2ffc33f
Remove AsyncLogWriter::flush() from os::shutdown().
May 25, 2021
b3d36cf
Correct typos and add a comment for a corner case.
May 27, 2021
File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
@@ -29,6 +29,7 @@
#endif
#include "jvmtifiles/jvmti.h"
#include "logging/log.hpp"
#include "logging/logAsyncFlusher.hpp"
#include "memory/allocation.inline.hpp"
#include "os_posix.inline.hpp"
#include "runtime/globals_extension.hpp"
@@ -1945,6 +1946,7 @@ void os::shutdown() {
// needs to remove object in file system
AttachListener::abort();

LogAsyncFlusher::abort();
// flush buffered output, finish log files
ostream_abort();

@@ -38,6 +38,7 @@
#include "jvmtifiles/jvmti.h"
#include "logging/log.hpp"
#include "logging/logStream.hpp"
#include "logging/logAsyncFlusher.hpp"
#include "memory/allocation.inline.hpp"
#include "memory/filemap.hpp"
#include "oops/oop.inline.hpp"
@@ -717,6 +718,7 @@ bool os::create_thread(Thread* thread, ThreadType thr_type,
case os::vm_thread:
case os::pgc_thread:
case os::cgc_thread:
case os::asynclog_thread:
case os::watcher_thread:
if (VMThreadStackSize > 0) stack_size = (size_t)(VMThreadStackSize * K);
break;
@@ -1097,6 +1099,7 @@ void os::shutdown() {
// allow PerfMemory to attempt cleanup of any persistent resources
perfMemory_exit();

LogAsyncFlusher::abort();
// flush buffered output, finish log files
ostream_abort();

@@ -0,0 +1,227 @@
/*
* Copyright Amazon.com Inc. or its affiliates. All Rights Reserved.
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
*
* This code is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 only, as
* published by the Free Software Foundation.
*
* This code is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
* version 2 for more details (a copy is included in the LICENSE file that
* accompanied this code).
*
* You should have received a copy of the GNU General Public License version
* 2 along with this work; if not, write to the Free Software Foundation,
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
*
* Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
* or visit www.oracle.com if you need additional information or have any
* questions.
*
*/
#include "precompiled.hpp"
#include "jvm.h"
#include "logging/logAsyncFlusher.hpp"
#include "logging/logFileOutput.hpp"
#include "logging/logHandle.hpp"
#include "runtime/atomic.hpp"

void AsyncLogMessage::writeback() {
if (_message != NULL) {
LogDecorations decorations(_level, _tagset, _decorations_ref);
_output.write_blocking(decorations, _message);
This conversation was marked as resolved by navyxliu

This comment has been minimized.

@tstuefe

tstuefe Mar 27, 2021
Member Outdated

Would this mean that time dependent decorators get resolved at print time, not when the log happen?

This comment has been minimized.

@navyxliu

navyxliu Mar 28, 2021
Author Member Outdated

yes, it is. I just realize that this does distort the timestamps of relevant decorators.

I explain to Volker why I choose to compress LogDecorations to LogDecorators. #3135 (comment)

This issue is similar to "safepoint-bias" in Java profiler. Those timestamps "bias" to the flusher, but the error is limited to AsyncLogInterval. We could spend extra efforts to keep the original timestamps. Is it worth it?

}
}

void LogAsyncFlusher::enqueue_impl(const AsyncLogMessage& msg) {

This comment has been minimized.

@tstuefe

tstuefe May 13, 2021
Member Outdated

Could you rename this to enqueue_locked? We use this name at other places for functions which factor out locking to the caller.

This comment has been minimized.

@navyxliu

navyxliu May 13, 2021
Author Member Outdated

okay.

assert_lock_strong(&_lock);

This comment has been minimized.

@tstuefe

tstuefe May 13, 2021
Member Outdated

Can we enqueue (so, log) from within the async logger itself? If not, add an assert here that this should not be called from the logger thread?

This comment has been minimized.

@navyxliu

navyxliu May 13, 2021
Author Member Outdated

enqueue_impl() is private. it's not accessible from outside.
only public member functions can be seen by clients.

This comment has been minimized.

@tstuefe

tstuefe May 14, 2021
Member Outdated

enqueue_impl() is private. it's not accessible from outside.
only public member functions can be seen by clients.

What I meant is, can logging - hence enqueing - be done from the async thread itself? Can it happen that the logger thread itself logs, would that be a problem, and if yes, should we guard against it?

This comment has been minimized.

@navyxliu

navyxliu May 21, 2021
Author Member Outdated

Actually, it could happen. This line is the very first 'log' from AsyncLog Thread.

  void pre_run() override {
    NonJavaThread::pre_run();
    log_debug(logging, thread)("starting AsyncLog Thread tid = " INTX_FORMAT, os::current_thread_id());
  }

This logsite is handled exactly same as other logsites. grab the lock, enqueue the cloned message and return -1.


if (_buffer.size() >= AsyncLogBufferSize) {
const AsyncLogMessage* h = _buffer.front();
assert(h != NULL, "sanity check");

if (h->message() != nullptr) {
bool p_created;
uintx* counter = _stats.add_if_absent(h->output(), 0, &p_created);
*counter = *counter + 1;

if (Verbose) {
// Temporarily turn off SerializeVMOutput so defaultStream will not
// invoke set_owner(self) for tty_lock.
FlagSetting t(SerializeVMOutput, false);
// The writing below can not guarantee non-blocking because tty may be piped by the filesystems
// or throttled by XOFF, so only dump the dropping message in Verbose mode.
tty->print_cr("asynclog dropping message: %s", h->message());
}
This conversation was marked as resolved by navyxliu

This comment has been minimized.

@tstuefe

tstuefe May 13, 2021
Member Outdated

I'm not sure this is needed or sensible. If we are flooded, we would then proceed to print every dropped message to tty? So now we flood tty?

I'd hope exhausting the output buffer is rare, and easily remedied by telling the user to increase the buffer or stop using async mode.

}

_buffer.pop_front();
}
assert(_buffer.size() < AsyncLogBufferSize, "_buffer is over-sized.");
_buffer.push_back(msg);

// notify asynclog thread if occupancy is over 3/4
size_t sz = _buffer.size();
if (sz > (AsyncLogBufferSize >> 2) * 3 ) {
_lock.notify();
}
}

void LogAsyncFlusher::enqueue(LogFileOutput& output, const LogDecorations& decorations, const char* msg) {
AsyncLogMessage m(output, decorations, msg);

{ // critical area
// The rank of _lock is same as _tty_lock on purpuse.
// if logging thread is holding _tty_lock now, temporarily yield to _lock.
ttyUnlocker ttyul;
MutexLocker ml(&_lock, Mutex::_no_safepoint_check_flag);
enqueue_impl(m);
}
}

// LogMessageBuffer consists of a multiple-part/multiple-line messsages.
// the mutex here gurantees its integrity.
void LogAsyncFlusher::enqueue(LogFileOutput& output, LogMessageBuffer::Iterator msg_iterator) {
ttyUnlocker ttyul;
MutexLocker ml(&_lock, Mutex::_no_safepoint_check_flag);

for (; !msg_iterator.is_at_end(); msg_iterator++) {
AsyncLogMessage m(output, msg_iterator.decorations(), msg_iterator.message());
enqueue_impl(m);
}
}

LogAsyncFlusher::LogAsyncFlusher()
: _state(ThreadState::Running),
_lock(Mutex::tty, "async-log-monitor", true /* allow_vm_block */, Mutex::_safepoint_check_never),
_stats(17 /*table_size*/) {
if (os::create_thread(this, os::asynclog_thread)) {
os::start_thread(this);
This conversation was marked as resolved by navyxliu

This comment has been minimized.

@tstuefe

tstuefe May 13, 2021
Member Outdated

Here is some error handling needed if thread creation failed. UL should drop to sync mode then.

This comment has been minimized.

@navyxliu

navyxliu May 13, 2021
Author Member Outdated

make sense! I will take a look at the error handling.

}
}

bool AsyncLogMapIterator::do_entry(LogFileOutput* output, uintx* counter) {
LogDecorators decorators = output->decorators();
decorators.without(LogDecorators::tags_decorator);
LogDecorations decorations(LogLevel::Warning, decorators);
const int sz = 128;
char out_of_band[sz];

if (*counter > 0) {
jio_snprintf(out_of_band, sz, UINTX_FORMAT_W(6) " messages dropped...", *counter);
output->write_blocking(decorations, out_of_band);
*counter = 0;
}

return true;
}

void LogAsyncFlusher::writeback(const LinkedList<AsyncLogMessage>& logs) {
LinkedListIterator<AsyncLogMessage> it(logs.head());
while (!it.is_empty()) {
AsyncLogMessage* e = it.next();
e->writeback();
}
}

void LogAsyncFlusher::flush(bool with_lock) {
LinkedListImpl<AsyncLogMessage, ResourceObj::C_HEAP, mtLogging> logs;

if (with_lock) { // critical area
This conversation was marked as resolved by navyxliu

This comment has been minimized.

@tstuefe

tstuefe May 13, 2021
Member Outdated

I think the trick to get conditional Mutexes without having separate branches is to make the MutexLocker argument itself conditional (I think you can pass NULL, in which case no locking) - then you could merge these two branches.

This comment has been minimized.

@navyxliu

navyxliu May 13, 2021
Author Member Outdated

thank you for the head-up!
MutexLocker can take NULL and drive through! definitely worth a shot.
Interestingly, MonitorLocker can't do that.

// Caveat: current thread must not hold _tty_lock or other lower rank lockers.
// Cannot install ttyUnlocker here because flush() may be invoked before defaultStream
// initialization.
MutexLocker ml(&_lock, Mutex::_no_safepoint_check_flag);
_buffer.pop_all(&logs);
AsyncLogMapIterator iter;
_stats.iterate(&iter);
This conversation was marked as resolved by navyxliu

This comment has been minimized.

@tstuefe

tstuefe May 13, 2021
Member Outdated

All this is really difficult to understand for the casual reader. Clearer naming would help:

  • something like "move-list-content" instead of pop_all
  • the whole point of this iteration thing is to print the dropped counters, right? A function "Print Drop Counters" would be clearer. Unless you go the one-async-buffer-per-logfileoutput-route I mentioned earlier.

This comment has been minimized.

@navyxliu

navyxliu May 17, 2021
Author Member Outdated

I will send out a patch to fix this along with the new discard policy.
There's another issue. _stats.iterate(&iter) here invokes output->write_blocking while holding lock!
This has compromised non-blocking protocol.

} else {
// C++ lambda can simplify the code snippet.
_buffer.pop_all(&logs);
AsyncLogMapIterator iter;
_stats.iterate(&iter);
}

writeback(logs);
}

void LogAsyncFlusher::run() {
while (_state == ThreadState::Running) {
{
MonitorLocker m(&_lock, Mutex::_no_safepoint_check_flag);
m.wait(500 /* ms, timeout*/);
}
flush();
}

// Signal thread has terminated
MonitorLocker ml(Terminator_lock);
Atomic::release_store(&_state, ThreadState::Terminated);
ml.notify_all();
}

LogAsyncFlusher* LogAsyncFlusher::_instance = nullptr;

void LogAsyncFlusher::initialize() {
This conversation was marked as resolved by navyxliu

This comment has been minimized.

@simonis

simonis Mar 26, 2021
Member Outdated

I don't think you need this. Please merge the initialization into LogAsyncFlusher::instance() (see my comments on your changes in init.cpp).

This comment has been minimized.

@tstuefe

tstuefe May 7, 2021
Member Outdated

In contrast to Volker I think an explicit initialization is fine and follows the established pattern for other subsystems. However, I would:

  • assert for _instance == NULL and remove the condition. This should only be called once.
  • no need for Atomics here since we do this during VM init.

And while writing this, I realize this would conflict with the gtest fixture where you switch this on or off several time in one process. I propose you switch those gtests - only those testing the LogAsyncFlusher, not those testing the Deque etc - as TEST_OTHER_VM.

Which arguably we should do for all logging gtests, not only yours, if they fiddle with global VM settings. In other words, it should be possible to run the gtests with -Xlog:... options and the gtests should still pass, including all log tests. Today, this is not the case.

if (!_instance) {
_instance = new LogAsyncFlusher();
}
}

// Termination
// 1. issue an atomic release_store&fence to close the logging window.
// 2. flush itself in-place.
// 3. signal asynclog thread to exit.
// 4. wait until asynclog thread exits.
// 5. (optional) delete this in post_run().
void LogAsyncFlusher::terminate() {
if (_instance != NULL) {
LogAsyncFlusher* self = _instance;

Atomic::release_store_fence<LogAsyncFlusher*, LogAsyncFlusher*>(&_instance, nullptr);
self->flush();
{
MonitorLocker ml(&self->_lock, Mutex::_no_safepoint_check_flag);
Atomic::release_store(&self->_state, ThreadState::Terminating);
ml.notify();
}
{
MonitorLocker ml(Terminator_lock, Mutex::_no_safepoint_check_flag);
while (self->_state != ThreadState::Terminated) {
ml.wait();
}
}
}
}

LogAsyncFlusher* LogAsyncFlusher::instance() {
if (Thread::current_or_null() != nullptr) {
This conversation was marked as resolved by navyxliu

This comment has been minimized.

@tstuefe

tstuefe May 7, 2021
Member Outdated

I don't understand why this depends on the existence of a current thread? Ideally we should be able to log just fine in corner cases, e.g. when thread is detached or if we have an error (during signal handling e..g.)

This comment has been minimized.

@navyxliu

navyxliu May 10, 2021
Author Member Outdated

LogAsyncFlusher::enqueue needs Thread::current() to use _lock.

void Mutex::lock_without_safepoint_check() {
  lock_without_safepoint_check(Thread::current());
}

This check is for a very rare corner case. This can be trigger using the following cmdline.
java -Xlog:'thread+smr=debug:file=/tmp/t.log' -Xlog:async -version
the root cause is the following code in thread

void ThreadsSMRSupport::smr_delete(JavaThread *thread) {
  elapsedTimer timer;
  if (EnableThreadSMRStatistics) {
    timer.start();
  }

  wait_until_not_protected(thread);

  delete thread; // Thread::current() will trigger assert(current != __null) failed: Thread::current() called on detached thread after ~Thread()
  if (EnableThreadSMRStatistics) {
    timer.stop();
    uint millis = (uint)timer.milliseconds();
    ThreadsSMRSupport::inc_deleted_thread_cnt();
    ThreadsSMRSupport::add_deleted_thread_times(millis);
    ThreadsSMRSupport::update_deleted_thread_time_max(millis);
  }
  // logsite will crash  in async mode because we can't use Thread::current() here. 
  log_debug(thread, smr)("tid=" UINTX_FORMAT ": ThreadsSMRSupport::smr_delete: thread=" INTPTR_FORMAT " is deleted.", os::current_thread_id(), p2i(thread));
}

This comment has been minimized.

@tstuefe

tstuefe May 13, 2021
Member Outdated

Sigh. I get it. Another example that a good logging system should be independent from the infrastructure of the surrounding system. Any chance this patch could use raw locking primitives, pthread_mutex resp. CreateMutex, without having to rewrite everything?

Ironically I though async logging would reduce the number of log call sites where the conditions for logging are not met, by moving most of the stack out of the caller into a separate thread. But the Thread::current condition is a new limitation then compared to sync logging.

If you don't want to change the Mutex:

So you have some rare log sites which won't allow you to use asynchronous logging since you cannot use locks, and you solve this by returning NULL, which then falls back to synchronous logging? What happens if you mix synchronous and asynchronous logging like this? Does this work?

From a code point of view your reasoning is very difficult to deduce. Can you please change this: remove the THread::current condition from the singleton getter, and add an explicit check for Thread::current at its caller site. In async mode either just omit printing any output needing Thread::current, or relegate to synchronous printing. And add a good comment there.

This comment has been minimized.

@navyxliu

navyxliu May 13, 2021
Author Member Outdated

Thank you for reviewing. you raise the exact problem which keep me from sleeping...

From a code point of view your reasoning is very difficult to deduce. Can you please change this: remove the THread::current condition from the singleton getter, and add an explicit check for Thread::current at its caller site. In async mode either just omit printing any output needing Thread::current, or relegate to synchronous printing. And add a good comment there.

That's exactly the current approach. It works but async logging and synchronous logging may overlap in some rare windows such as my example of smr_delete.

LogAsyncFlusher* LogAsyncFlusher::instance() {
  // thread may has been detached, then Thread::current() used by Mutex is inavailable.
  // eg. ThreadsSMRSupport::smr_delete() uses log_debug() after delete thread.
  if (Thread::current_or_null() != nullptr) {
    return _instance;
  } else {
    return nullptr;
  }
}

One drawback is that I have to give up the FileLocker optimization for the entire interval async logging(99.99% time)
for those "rare" windows :( Once I remove FileLocker for FILE*, it may get stuck when two logging writers overlap!

I am evaluating different approaches. eg. the 2-synchronization-points design or even replace Mutex with a semaphore-based lock like ConfigurationLock

return _instance;
} else {
// current thread may has been detached.
return nullptr;
}
}

// Different from terminate(), abort is invoked by os::abort().
// There are 2 constraints:
// 1. must be async-safe because os::abort() may be invoked by a signal handler while other
// threads are executing.
// 2. must not obtain _lock. eg. gtest.MutexRank.mutex_lock_access_leaf(test_mutex_rank.cpp)
// holds a 'assess' lock and then traps SIGSEGV on purpose.
//
// Unlike terminate, abort() just ensures all pending log messages are flushed. It doesnot
// exit asynclog thread.
void LogAsyncFlusher::abort() {
if (_instance != nullptr) {
// To meet prior constraints, I borrow the idea in LogConfiguration::disable_outputs(),
// the following code shut down all outputs for all tagsets with a RCU synchroniziation.
// After then, I can flush pending queue without a lock.
for (LogTagSet* ts = LogTagSet::first(); ts != NULL; ts = ts->next()) {
ts->disable_outputs();
}
_instance->flush(false /*with_lock*/);
}
}