8305819: LogConfigurationTest intermittently fails on AArch64 #13421
Conversation
Hi @gaogao-mem, welcome to this OpenJDK project and thanks for contributing! We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user gaogao-mem" as summary for the issue. If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing
@gaogao-mem The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
/covered
Thank you! Please allow for a few business days to verify that your employer has signed the OCA. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!
Hi, please send an e-mail to Dalibor.Topic@oracle.com so that I can mark your account as verified.
Hi, I have sent the e-mail, please check it.
@gaogao-mem Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See the OpenJDK Developers' Guide for more information.
Hi,
Thank you for this fix. At first glance this looks correct to me. I'll do some further analysis soon. You will need approval from a Reviewer also, I'll ask someone to have a look.
Performance wise this is OK.
@@ -127,6 +127,8 @@ void LogOutputList::add_output(LogOutput* output, LogLevelType level) {
        node->_next = node->_next->_next) {
   }
+  OrderAccess::storestore();
Please add a short comment explaining why this is needed, similar to what's done in the PR.
Okay, I've added it.
@@ -127,6 +127,8 @@ void LogOutputList::add_output(LogOutput* output, LogLevelType level) {
        node->_next = node->_next->_next) {
   }
+  OrderAccess::storestore();
Suggested change:
-  OrderAccess::storestore();
+  OrderAccess::release();
I don't think it would hurt performance significantly, and it's safer and easier to reason about.
Thanks for your suggestion! The performance concern I mentioned earlier came from adding OrderAccess::loadload() while reading the list, but after reading up on memory barriers I found that to be unnecessary.
Atomic access is unnecessary when reading the list? I'd like to know the reasoning behind that.
I guess you could argue that dependency ordering would save you on the read side, but I'd like to see and understand the code. It's very risky to depend on that.
The list is read in src/hotspot/share/logging/logTagSet.cpp by iterator in src/hotspot/share/logging/logOutputList.hpp.
For example, in void LogTagSet::log(LogLevelType level, const char* msg), the list is read as follows:
src/hotspot/share/logging/logTagSet.cpp
LogOutputList::Iterator it = _output_list.iterator(level);
LogDecorations decorations(level, *this, _decorators);
for (; it != _output_list.end(); it++) {
(*it)->write(decorations, msg);
}
And I intended to add a loadload barrier in the Iterator as follows:
src/hotspot/share/logging/logOutputList.hpp
void operator++(int) {
_current = _current->_next;
+ OrderAccess::loadload();
}
Iterator iterator(LogLevelType level = LogLevel::Last) {
increase_readers();
- return Iterator(this, _level_start[level]);
+ LogOutputNode* start = _level_start[level];
+ OrderAccess::loadload();
+ return Iterator(this, start);
}
I don't understand why some sort of atomic operation isn't required on the reading side. I'd think something like Atomic::load_acquire() would be appropriate here, along with Atomic::release_store() on the writing side. I think I need you to explain, with references, why you don't think so.
In the test, two threads read the list while another thread inserts into it.
The writing thread inserts into the list as follows:
void LogOutputList::add_output(LogOutput* output, LogLevelType level) {
LogOutputNode* node = new LogOutputNode();
node->_value = output;
node->_level = level;
// Set the next pointer to the first node of a lower level
for (node->_next = _level_start[level];
node->_next != nullptr && node->_next->_level == level;
node->_next = node->_next->_next) {
}
// Update the _level_start index
for (int l = LogLevel::Last; l >= level; l--) {
if (_level_start[l] == nullptr || _level_start[l]->_level < level) {
_level_start[l] = node;
}
}
  // Add the node to the list
for (LogOutputNode* cur = _level_start[LogLevel::Last]; cur != nullptr; cur = cur->_next) {
if (cur != node && cur->_next == node->_next) {
cur->_next = node;
break;
}
}
}
On the writing side, it sets the value, level and next pointer of the node, then updates the _level_start array (_level_start[i] stores the head node of the sublist whose level is i) and adds the node to the list.
And on the reading side, it reads as follows:
void LogTagSet::log(LogLevelType level, const char* msg) {
// Increasing the atomic reader counter in iterator(level) must
// happen before the creation of LogDecorations instance so
// wait_until_no_readers() in LogConfiguration::configure_output()
// synchronizes _decorations as well. The order is guaranteed by
// the implied memory order of Atomic::add().
LogOutputList::Iterator it = _output_list.iterator(level); // obtain the head node of the list from _level_start[level]
LogDecorations decorations(level, *this, _decorators);
for (; it != _output_list.end(); it++) {
(*it)->write(decorations, msg); //crash
}
}
it++ acquires the next node of the list, and then (*it) reads the value of that node.
In the test, it crashes at (*it)->write(decorations, msg), which may be due to a faulty load on the reading side. On the writing side, setting the value of the node and adding the node to the list may be reordered, therefore a memory barrier is necessary. On the reading side, however, obtaining the pointer to the node and reading its value are dependent, so I think a memory barrier is unnecessary there.
By the way, I have tested this millions of times and it hasn't failed when adding a memory barrier only on the writing side.
I see. I think you're wrong, and I'm going to explain why.
I don't think we've met before, and I don't know your level of knowledge and experience, so I'm going to go through the problem one step at a time. Please forgive me if I appear to be stating the obvious.
Firstly, testing doesn't provide any evidence that code is reliable. We've had bugs in hotspot for decades, and the code has appeared to work until one day it mysteriously "stopped working". Some of them involve races just like this one, and I recently had to fix a really nasty one.
Some definitions
Undefined behavior (UB): Renders the entire program meaningless if certain rules of the language are violated. There are no restrictions on the behavior of the program. Examples of undefined behavior are data races... the compiled program is not required to do anything meaningful.
Threads and data races: When an evaluation of an expression writes to a memory location and another evaluation reads or modifies the same memory location, the expressions are said to conflict. A program that has two conflicting evaluations has a data race unless [certain conditions]
If a data race occurs, the behavior of the program is undefined.
While there are many instances of what is technically UB in hotspot, this is an acknowledged problem. We try to get rid of it whenever we can, and the UB that remains has to be carefully analysed and justified. We may be able in many cases to use known properties of a system to argue that while something is technically UB according to the language standard, it's effectively implementation defined.
About this patch
This patch certainly has a data race, and therefore is undefined. There is nothing on the reading side which helps: while dependency ordering works under certain circumstances on the underlying machine, this is shared code, not AArch64-specific, and in any case a C++ compiler isn't required to respect dependencies, so any attempt to use dependencies at this level is invalid.
It doesn't look to me like this is the only data race in LogOutputList. In my opinion, the whole thing really needs more analysis. I think we'd find more bugs.
As a minimum, I think you need to use a releasing store on the writer side and an acquiring load on the reader side.
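In standard C++ terms, the releasing-store / acquiring-load pairing suggested here can be sketched roughly as follows. This is a generic, self-contained illustration using std::atomic, not the HotSpot OrderAccess/Atomic API; the Node type and function names are hypothetical:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Illustrative node for a published singly-linked structure.
struct Node {
    int value;
    Node* next;
};

std::atomic<Node*> head{nullptr};

// Writer side: the node is fully initialized first, then published with a
// releasing store, so the initialization cannot appear to readers after
// the publication.
void publish(int v) {
    Node* n = new Node{v, head.load(std::memory_order_relaxed)};
    head.store(n, std::memory_order_release);  // releasing store
}

// Reader side: the acquiring load pairs with the release store above, so a
// reader that sees the new head also sees its initialized fields.
int read_first_or(int fallback) {
    Node* n = head.load(std::memory_order_acquire);  // acquiring load
    return n != nullptr ? n->value : fallback;
}
```

Under this pairing the C++ memory model guarantees the visibility of the node's contents without relying on dependency ordering at all.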
Thank you for your guidance and advice. I appreciate the time and effort you took to share your knowledge with me. I have modified the code according to your suggestion, please take a look.
So far I have only spotted this data race, and atomic operations on the writing and reading sides seem to provide a fix. Moreover, data races related to LogOutputList are likely to cause more potential bugs, which deserves deeper analysis. We will try to do more analysis on it later.
modify membarrier Co-authored-by: Andrew Haley <aph-open@littlepinkcloud.com>
@gaogao-mem This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 496 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@jdksjolen, @theRealAph, @dholmes-ora, @navyxliu) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type
Webrevs
@jdksjolen Any comment or suggestion?
Thank you @theRealAph for the in-depth review of this.
I agree. @gaogao-mem, I'll run this through our test suite and assuming everything looks OK I'll give you the thumbs up. You'll also need a sponsor, which I think Andrew will be OK with being.
Sorry but I don't like this fix. Not because it is incorrect, but because it is far too low-level for this kind of shared code. The UL code should be correctly synchronized, using locking where needed to ensure full thread-safety. It should not be racy with memory barriers being inserted judiciously as per this PR. If we need a lock-free design for this for performance (do we?) then we should have a lock-free design, where the synchronization points are clear in the API.
Is the solution as simple as the test needing to use the ConfigurationLock?
@dholmes-ora Thank you for your comment. I agree with your opinion that the API here needs better design and implementation. In my humble opinion, if we need to use locks, I would prefer a read-write lock. Personally, I lean towards a lock-free implementation, but as a beginner I'm not fully familiar with the overall design and implementation of the UL, so it may require more time. The current crash can be resolved with this patch, so I think we could first apply it to solve the problem we are facing now. If possible, I'm willing to propose a more thorough solution in a follow-up patch, but it may take some time.
I've had an idea. How about you use a load-acquire on the reader side, making the code correct, with a FIXME comment. Something like "// FIXME: memory_order_consume could be used here." |
That's OK.
This is UB, but the reason isn't the data race. In the RCU technique, a load of an aligned pointer yields either the old or the new value. With a storestore barrier, a reader is supposed to see the new memory state if it has seen the new pointer. It's UB because compiler optimizations may hoist the load of node->_value before the program evaluates _level_start[level]. In your diagram, it's something like this.
In my understanding, no compiler would conduct an optimization like that, because evaluating node->_value prematurely may have a side effect (a segfault), even though such an optimization is indeed legal. @theRealAph I'm also okay with your load-acquire solution.
@@ -48,11 +49,11 @@ class LogOutputList {
  private:
   struct LogOutputNode : public CHeapObj<mtLogging> {
     LogOutput* _value;
-    LogOutputNode* _next;
+    LogOutputNode* volatile _next;
Atomic::load_acquire will cast its argument to const volatile T* no matter what.
Another reason is that there's no other load of the _next field, so I don't see any chance the compiler elides the load instruction for it.
So I don't think we need the volatile qualifier for it. Same reasons for _level_start[LogLevel::Count].
I have looked at how Atomic::load_acquire is used in HotSpot, and found that most call sites use the volatile qualifier even when the field is not loaded multiple times. Furthermore, adding volatile will not cause a performance or correctness issue (it is harmless), so I tend to add it.
Using a volatile qualifier for atomically-accessed fields is standard practice in HotSpot. Apart from anything else, it's good for readability.
Hi, is the current solution okay with you? @dholmes-ora
I disagree. The data race is the reason. A compiler is allowed to move loads because a data race is UB.
...do absolutely anything. Once you have undefined behaviour, the entire program is undefined. It's not possible to negotiate with a compiler around UB. @gaogao-mem is correct.
If the iteration of the output list falls outside of the RCU reader protection then I think we need to treat it like any other lock-free code. That will at least repair the currently detected bug.
After that we should look at replacing this "ad-hoc" synchronization (lock + RCU + barriers) with a more direct synchronization mechanism (an rwlock?), which I think is what @theRealAph is advocating to get rid of any UB associated with RCU.
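As a rough illustration of the "more direct synchronization mechanism" mentioned here, a reader-writer lock around the list could look something like the following. This is a hypothetical std::shared_mutex sketch with illustrative names, not the HotSpot ConfigurationLock or UL code:

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical rwlock-guarded variant: readers take the lock in shared
// mode, the (rare) configuration writer takes it exclusively, so no
// barrier-level reasoning is needed at all.
class GuardedOutputList {
    mutable std::shared_mutex _lock;
    std::vector<int> _outputs;  // stand-in for the linked list of outputs
public:
    void add_output(int output) {
        std::unique_lock<std::shared_mutex> guard(_lock);  // exclusive
        _outputs.push_back(output);
    }
    int count() const {
        std::shared_lock<std::shared_mutex> guard(_lock);  // shared
        return static_cast<int>(_outputs.size());
    }
};
```

The trade-off is that every log call would then pay for a lock acquisition, which is presumably why the current code uses an RCU-style scheme in the first place.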
@@ -127,6 +127,10 @@ void LogOutputList::add_output(LogOutput* output, LogLevelType level) {
        node->_next = node->_next->_next) {
   }
+  // Prevent the reordering of adding node to list and setting the value of node.
+  // Such a reordering would lead to reading incorrect values.
+  OrderAccess::release();
I think if we were to treat this code as the usual pattern of allowing lock-free iteration of a linked-list then we would do a release_store on line 137 and a release_store on line 144. We could then factor the release out ahead of the two loops and place it in the current position. However we would also normally use Atomic::store for the actual stores in the loops, and we would use Atomic::load for the loads in the loops. So:
// To allow lock-free iteration of the output list the updates in the loops
// below require release semantics.
OrderAccess::release();
// Update the _level_start index
for (int l = LogLevel::Last; l >= level; l--) {
LogOutputNode* lnode = Atomic::load(&_level_start[l]);
  if (lnode == nullptr || lnode->_level < level) {
Atomic::store(&_level_start[l], node);
}
}
// Add the node to the list
for (LogOutputNode* cur = Atomic::load(&_level_start[LogLevel::Last]);
cur != nullptr;
cur = Atomic::load(&cur->_next)) {
if (cur != node && Atomic::load(&cur->_next) == node->_next) {
Atomic::store(&cur->_next, node);
break;
}
}
Logically the intent is that the _level_start array and the node _next field are treated as special "atomic" types and so all accesses must be Atomic::xx. But I suspect in this case that would involve quite a lot of changes.
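A standard-C++ analogue of this pattern (one release fence factored out ahead of the publishing stores, with the publishing stores and the readers' loads done atomically) might look roughly like this. It is a hedged sketch with hypothetical names, not the HotSpot code:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Illustrative node: the plain field is written before the fence, the
// link field is treated as an "atomic" type for all accesses.
struct Node {
    int value = 0;
    std::atomic<Node*> next{nullptr};
};

std::atomic<Node*> list_head{nullptr};

void add_front(Node* n) {
    n->next.store(list_head.load(std::memory_order_relaxed),
                  std::memory_order_relaxed);
    // Everything stored above is ordered before the publishing store
    // below; this plays the role of the factored-out OrderAccess::release().
    std::atomic_thread_fence(std::memory_order_release);
    list_head.store(n, std::memory_order_relaxed);
}

// Reader: acquire loads pair with the release fence on the writer side,
// so every node reached through the head is seen fully initialized.
int sum_list() {
    int sum = 0;
    for (Node* cur = list_head.load(std::memory_order_acquire);
         cur != nullptr;
         cur = cur->next.load(std::memory_order_acquire)) {
        sum += cur->value;
    }
    return sum;
}
```

Per the C++ memory model, a release fence sequenced before an atomic store synchronizes with an acquire load that reads that store, which is what makes the relaxed publishing store safe here.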
I agree with you. In the usual pattern of allowing lock-free iteration of a linked list, atomic access is needed, especially when storing _level_start and _next. However, it would require many changes to make all accesses to them atomic. Therefore, I only made the changes you suggested.
LGTM. I am not a reviewer.
To be clear, I'm happy with any legal C++ code.
Hi, I have modified the code based on your suggestion. Is it okay now? @dholmes-ora
I am okay with the changes in this form. Thanks.
/integrate
@gaogao-mem
/sponsor
Going to push as commit 911cc7c.
Your commit was automatically rebased without conflicts.
@y1yang0 @gaogao-mem Pushed as commit 911cc7c. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
/backport jdk17u-dev |
@mmyxym Could not automatically backport
Please fetch the appropriate branch/commit and manually resolve these conflicts by using the following commands in your personal fork of openjdk/jdk17u-dev. Note: these commands are just some suggestions and you can use other equivalent commands you know.
Once you have resolved the conflicts as explained above continue with creating a pull request towards the openjdk/jdk17u-dev with the title Below you can find a suggestion for the pull request body:
LogConfigurationTest*reconfigure*MT* crash intermittently on AArch64.
According to the crash log and coredump, we found it crash as follows:
In the test, two threads write to the log while another thread dynamically changes the decorators and tags. During this time, the _output_list is modified. Because of the relaxed memory model of AArch64, while adding a LogOutputNode to the LogOutputList, adding the node to the list and setting the contents of the node may be reordered, so the reading thread may not read the correct contents of the node. Consequently, a storestore memory barrier is needed to ensure the order of the writes.
By the way, applying this patch may affect performance.
How to reproduce on Linux aarch64:
test case
A crash may occasionally occur after about 5000 consecutive runs.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/13421/head:pull/13421
$ git checkout pull/13421
Update a local copy of the PR:
$ git checkout pull/13421
$ git pull https://git.openjdk.org/jdk.git pull/13421/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 13421
View PR using the GUI difftool:
$ git pr show -t 13421
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/13421.diff
Webrev
Link to Webrev Comment