-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8288904: Incorrect memory ordering in UL #9225
Conversation
👋 Welcome back jdksjolen! A progress list of the required criteria for merging this PR into |
@jdksjolen The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to add a comment explaining the reason for the fence to be there.
@fisk , sure, something like this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good.
@jdksjolen This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 179 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@fisk, @robehn, @dholmes-ora) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fix in itself looks good.
But it raises a lot of questions for me about how concurrent access is actual controlled here. I can't see what stops a new reader from becoming active after we see _active_readers go to zero?
Thanks.
The implementation is lock-free and uses RCU. This is approximately how it works in pseudo-code:
There are some details here which I have omitted, but this is the gist of it. |
Please sponsor this, thank you :-). |
/integrate Ooops, gotta integrate before someone sponsors. |
@jdksjolen |
// Busy wait | ||
} | ||
// Prevent mutations to the output list to float above the active reader check. | ||
// Such a reordering would lead to readers loading faulty data. | ||
OrderAccess::loadstore(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just got curious about this memory ordering problem. How can a write after the while loop be executed(and be seen by other threads) without even knowing if we can bail out of the while loop. So I'm thinking that otherwise this program would crash:
volatile int never_touched_variable = 0;
int* address = (int*)0xBADC0FEE;
int main() {
while(never_touched_variable == 0) {}
*address = 1;
return 0;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going by ARM's memory model, which is understood by me as something like: "Any nondependent loads and stores may be observed in a different order than program order". Dependency here means register or address dependency.
I think your program is allowed to crash, because I think you're invoking UB by making and dereferencing a pointer to random memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose the memory state change due to calling free on a node can reorder to before the load checking there are no readers. The. The memory can appear to be free memory and get allocated to random other memory that changes state arbitrarily, while there is still a concurrent reader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right but isn't there a control dependency that the hardware will still obey? Or can the hardware write to memory even if that path is never taken?
Regarding UB, I could paste the assembly of that instead (maybe I should have done that). My question was whether the cpu can execute that write instruction before even knowing if that branch will be taken.
Note: I found this interesting article about control dependencies (https://lwn.net/Articles/860037/). It mentions that the hardware will respect that dependency but there could be some aggressive compiler optimizations on some cases. I don't think that applies here though.
@fisk sorry, not sure I understood the example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pchilano, it seems that it is true that a control dependency establishes that writes are not moved above the read of a control branch (as we do not know which, if any, branch is taken before the read is done). read-on-read allows for moving it up however.
For example:
// OK
x = true;
while(x == true) { }
y = a[0];
~>
x = true;
y = a[0];
while(x == true) {}
// NOT OK
x = true;
while(x == true) { }
a[0] = 5;
~>
x = true;
a[0] = 5;
while(x == true) {}
Source:
https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf section 4.2 and 4.4
I believe that that means that this barrier is unnecessary, but it's good manners to do the Atomic::load
.
Nice catch :-).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hold your horses guys. I think the logic in the cited paper is flawed. It essentially says that surely the hardware couldn't perform stores before knowing what control flow is taken, at which point the value of the load must be known. Therefore the hardware can't reorder the load and store.
Well, the hardware can speculate one branch is taken, defer the load, buffer the stores without committing them to caches, and then commit them once the load is executed, and the speculation is proven right, committing the branch. Then the already buffered stores will be published. This will yield a result equivalent to the load and store appearing to have reordered across the control dependency. It would never break a sequential program, but would reorder a LoadStore pairbseparated by a control dependency, requiring barriers.
In fact, the official ARMv8 memory model document linked here https://documentation-service.arm.com/static/6048f1aaee937942ba302655 (end of section 6.2) says explicitly that the control dependency is not enough to prevent ordering between a load and a store, and that you need explicit barriers if you don't want reordering.
So basically I would rather trust the official ARM documentation that explicitly says it may reorder, than an academic paper saying it isn't possible because building such hardware would be tricky. It's essentially proof by lack of imagination IMO.
It would be interesting I guess, to run the "S" litmus test on the failing machine. Although I wouldn't want to rely on control dependency tricks for ordering regardless of the outcome. I would follow the spec and assume reordering can happen, as it clearly states.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jdksjolen, @fisk thanks for the investigation and linked documents! Mmm there seems to be contradictory definitions then. I also found [1] which as I understand it defines there should be an order between those instructions (sections B2.3.2 and B2.3.3; there is even an intent Note about that). But the fact that we have to look this up and check the fine print is already surprising for me. I thought any cpu would provide that guarantee.
[1] Arm Architecture Reference Manual: https://documentation-service.arm.com/static/623b2de33b9f553dde8fd3b0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting find! So we have 2 ARMv8 memory model documents from ARM, one explicitly saying it may reorder and one explicitly saying it may not.
Since the document saying it won't reorder is more recent, I guess they might have completely changed their minds about this. Has happened before, and seems fine.
While similarly to how we don't exploit data dependency without language level guarantees (we are writing C++, not assembly), I would not exploit control dependencies either, without a C++ level contract saying that's fine, defined in a C++ context. So we should have fences anyway.
However, I would have to assume that ARM would only really be able to update the spec to say this won't reorder, if they knew there are no implementations where it does. So that might at least provide a clue that maybe the reason this code crashed, is due to something different, like maybe the lack of release when allocating new list entries that are injected into the list. So while we might want some fence here anyway, it seems likely it won't stop this code from crashing.
Is it time to wrap the whole thing in a readers writer lock? We already increment and decrement a global variable on the reader side, and writes barely ever happen, but are marked with a lock for writers. It would seemingly fix the problems.
Talked to Johan and StefanK earlier today. In general this data structure is very poorly synchronized. Nodes are injected without release store even though there are concurrent readers, there is no fencing or atomics around basically any of the heads or next pointer accesses, despite reading and writing to them concurrently. Not sure how much of it we want to rewrite in this particular patch, but it does need major reworking I think. Maybe the most reasonable option is writing a new data structure entirely. Or maybe this is one of those times where having any form of readers writer lock would make this code way easier to reason about. We already mark where the reader vs writer sections are, but with unnecessarily relaxed and fragile synchronization for what it is. |
@jdksjolen - it is the actual details I have concerns about. More seems to be done than just a pointer switch to unlink, and the unlink itself is not an atomic operation - are we guaranteed only a single writer is possible? And even so, as others have noted, without atomics or barriers there is no guarantee readers will see the unlink has happened. I'm a bit surprised as this code has been extensively examined in the past. |
If I remember correctly I suggested this code should use the global counter instead. On control flow dependencies, even compiler can change the control flow so this is an issue on x86 also. (in this case the variable is volatile, so it eliminates most issue). According to "Who's afraid of a big bad optimizing compiler?" even this is buggy:
This particular compiler re-write seems to be very, very uncommon, but legal. |
@jdksjolen This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
I've now re-run tier1-3 tests and they are fully passing (excluding an issue fixed in 8291289). I propose that this patch is merged as is and a new ticket is added to evaluate and perhaps rewrite this implementation. @dholmes-ora , I believe that it is meant to be multiple reader, single writer. It seems that this is only altered through |
This PR is still looking for a sponsor. @dholmes-ora, would you mind sponsoring this? |
/sponsor |
Going to push as commit 3579024.
Your commit was automatically rebased without conflicts. |
@dholmes-ora @jdksjolen Pushed as commit 3579024. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Please review this fix for JDK-8288904, thank you.
Passed tier1-tier3 tests, excluding faulty tests.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9225/head:pull/9225
$ git checkout pull/9225
Update a local copy of the PR:
$ git checkout pull/9225
$ git pull https://git.openjdk.org/jdk pull/9225/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 9225
View PR using the GUI difftool:
$ git pr show -t 9225
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9225.diff