Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8288904: Incorrect memory ordering in UL #9225

Closed
wants to merge 2 commits into from

Conversation

jdksjolen
Copy link
Contributor

@jdksjolen jdksjolen commented Jun 21, 2022

Please review this fix for JDK-8288904, thank you.

Passed tier1-tier3 tests, excluding faulty tests.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9225/head:pull/9225
$ git checkout pull/9225

Update a local copy of the PR:
$ git checkout pull/9225
$ git pull https://git.openjdk.org/jdk pull/9225/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 9225

View PR using the GUI difftool:
$ git pr show -t 9225

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9225.diff

@jdksjolen jdksjolen changed the title JDK-8288904: Incorrect memory ordering in UL 8288904: Incorrect memory ordering in UL Jun 21, 2022
@bridgekeeper
Copy link

bridgekeeper bot commented Jun 21, 2022

👋 Welcome back jdksjolen! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 21, 2022
@openjdk
Copy link

openjdk bot commented Jun 21, 2022

@jdksjolen The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Jun 21, 2022
@mlbridge
Copy link

mlbridge bot commented Jun 21, 2022

Webrevs

Copy link
Contributor

@fisk fisk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to add a comment explaining the reason for the fence to be there.

@jdksjolen
Copy link
Contributor Author

@fisk , sure, something like this?

Copy link
Contributor

@robehn robehn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good.

@openjdk
Copy link

openjdk bot commented Jun 22, 2022

@jdksjolen This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8288904: Incorrect memory ordering in UL

Reviewed-by: rehn, dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 179 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@fisk, @robehn, @dholmes-ora) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jun 22, 2022
Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fix in itself looks good.

But it raises a lot of questions for me about how concurrent access is actual controlled here. I can't see what stops a new reader from becoming active after we see _active_readers go to zero?

Thanks.

@jdksjolen
Copy link
Contributor Author

@dholmes-ora,

The implementation is lock-free and uses RCU.

This is approximately how it works in pseudo-code:

class LogOutputList {
  LogOutputNode* entry; // All threads gain entry to the linked list through this pointer
}

void LogOutputList::mutate_output_list() {
  // Save the entry pointer
  LogOutputNode* cur = entry;
  // Remove the entry pointer, now no new readers can access this list
  entry = NULL;
  // OK, we can now wait for all readers to exit
  wait_until_no_readers();
  // ... Do things we need to do to cur ...
  // Restore pointer, new readers can occur.
  entry = cur;
} 

There are some details here which I have omitted, but this is the gist of it.

@jdksjolen
Copy link
Contributor Author

jdksjolen commented Jun 22, 2022

Please sponsor this, thank you :-).

@jdksjolen
Copy link
Contributor Author

/integrate

Ooops, gotta integrate before someone sponsors.

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Jun 22, 2022
@openjdk
Copy link

openjdk bot commented Jun 22, 2022

@jdksjolen
Your change (at version 3141766) is now ready to be sponsored by a Committer.

// Busy wait
}
// Prevent mutations to the output list to float above the active reader check.
// Such a reordering would lead to readers loading faulty data.
OrderAccess::loadstore();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just got curious about this memory ordering problem. How can a write after the while loop be executed(and be seen by other threads) without even knowing if we can bail out of the while loop. So I'm thinking that otherwise this program would crash:

volatile int never_touched_variable = 0;
int* address = (int*)0xBADC0FEE;

int main() {
	while(never_touched_variable == 0) {}
	*address = 1;
	return 0;

}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going by ARM's memory model, which is understood by me as something like: "Any nondependent loads and stores may be observed in a different order than program order". Dependency here means register or address dependency.

I think your program is allowed to crash, because I think you're invoking UB by making and dereferencing a pointer to random memory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the memory state change due to calling free on a node can reorder to before the load checking there are no readers. The. The memory can appear to be free memory and get allocated to random other memory that changes state arbitrarily, while there is still a concurrent reader.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right but isn't there a control dependency that the hardware will still obey? Or can the hardware write to memory even if that path is never taken?
Regarding UB, I could paste the assembly of that instead (maybe I should have done that). My question was whether the cpu can execute that write instruction before even knowing if that branch will be taken.
Note: I found this interesting article about control dependencies (https://lwn.net/Articles/860037/). It mentions that the hardware will respect that dependency but there could be some aggressive compiler optimizations on some cases. I don't think that applies here though.
@fisk sorry, not sure I understood the example.

Copy link
Contributor Author

@jdksjolen jdksjolen Jun 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pchilano, it seems that it is true that a control dependency establishes that writes are not moved above the read of a control branch (as we do not know which, if any, branch is taken before the read is done). read-on-read allows for moving it up however.

For example:

// OK
x = true;
while(x == true) { }
y = a[0];
~>
x = true;
y = a[0];
while(x == true) {}
// NOT OK
x = true;
while(x == true) { }
a[0] = 5;
~>
x = true;
a[0] = 5;
while(x == true) {}

Source:

https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf section 4.2 and 4.4

I believe that that means that this barrier is unnecessary, but it's good manners to do the Atomic::load.

Nice catch :-).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hold your horses guys. I think the logic in the cited paper is flawed. It essentially says that surely the hardware couldn't perform stores before knowing what control flow is taken, at which point the value of the load must be known. Therefore the hardware can't reorder the load and store.

Well, the hardware can speculate one branch is taken, defer the load, buffer the stores without committing them to caches, and then commit them once the load is executed, and the speculation is proven right, committing the branch. Then the already buffered stores will be published. This will yield a result equivalent to the load and store appearing to have reordered across the control dependency. It would never break a sequential program, but would reorder a LoadStore pairbseparated by a control dependency, requiring barriers.

In fact, the official ARMv8 memory model document linked here https://documentation-service.arm.com/static/6048f1aaee937942ba302655 (end of section 6.2) says explicitly that the control dependency is not enough to prevent ordering between a load and a store, and that you need explicit barriers if you don't want reordering.

So basically I would rather trust the official ARM documentation that explicitly says it may reorder, than an academic paper saying it isn't possible because building such hardware would be tricky. It's essentially proof by lack of imagination IMO.

It would be interesting I guess, to run the "S" litmus test on the failing machine. Although I wouldn't want to rely on control dependency tricks for ordering regardless of the outcome. I would follow the spec and assume reordering can happen, as it clearly states.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jdksjolen, @fisk thanks for the investigation and linked documents! Mmm there seems to be contradictory definitions then. I also found [1] which as I understand it defines there should be an order between those instructions (sections B2.3.2 and B2.3.3; there is even an intent Note about that). But the fact that we have to look this up and check the fine print is already surprising for me. I thought any cpu would provide that guarantee.

[1] Arm Architecture Reference Manual: https://documentation-service.arm.com/static/623b2de33b9f553dde8fd3b0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting find! So we have 2 ARMv8 memory model documents from ARM, one explicitly saying it may reorder and one explicitly saying it may not.
Since the document saying it won't reorder is more recent, I guess they might have completely changed their minds about this. Has happened before, and seems fine.

While similarly to how we don't exploit data dependency without language level guarantees (we are writing C++, not assembly), I would not exploit control dependencies either, without a C++ level contract saying that's fine, defined in a C++ context. So we should have fences anyway.

However, I would have to assume that ARM would only really be able to update the spec to say this won't reorder, if they knew there are no implementations where it does. So that might at least provide a clue that maybe the reason this code crashed, is due to something different, like maybe the lack of release when allocating new list entries that are injected into the list. So while we might want some fence here anyway, it seems likely it won't stop this code from crashing.

Is it time to wrap the whole thing in a readers writer lock? We already increment and decrement a global variable on the reader side, and writes barely ever happen, but are marked with a lock for writers. It would seemingly fix the problems.

@fisk
Copy link
Contributor

fisk commented Jun 22, 2022

Talked to Johan and StefanK earlier today. In general this data structure is very poorly synchronized. Nodes are injected without release store even though there are concurrent readers, there is no fencing or atomics around basically any of the heads or next pointer accesses, despite reading and writing to them concurrently. Not sure how much of it we want to rewrite in this particular patch, but it does need major reworking I think. Maybe the most reasonable option is writing a new data structure entirely. Or maybe this is one of those times where having any form of readers writer lock would make this code way easier to reason about. We already mark where the reader vs writer sections are, but with unnecessarily relaxed and fragile synchronization for what it is.
We have other places where we really want a readers writer lock and end up inventing the wheel again.
Not sure if fixing this properly should be done in this patch or a follow up patch.

@dholmes-ora
Copy link
Member

There are some details here which I have omitted, but this is the gist of it.

@jdksjolen - it is the actual details I have concerns about. More seems to be done than just a pointer switch to unlink, and the unlink itself is not an atomic operation - are we guaranteed only a single writer is possible? And even so, as others have noted, without atomics or barriers there is no guarantee readers will see the unlink has happened. I'm a bit surprised as this code has been extensively examined in the past.

@robehn
Copy link
Contributor

robehn commented Jun 23, 2022

Talked to Johan and StefanK earlier today. In general this data structure is very poorly synchronized. Nodes are injected without release store even though there are concurrent readers, there is no fencing or atomics around basically any of the heads or next pointer accesses, despite reading and writing to them concurrently. Not sure how much of it we want to rewrite in this particular patch, but it does need major reworking I think. Maybe the most reasonable option is writing a new data structure entirely. Or maybe this is one of those times where having any form of readers writer lock would make this code way easier to reason about. We already mark where the reader vs writer sections are, but with unnecessarily relaxed and fragile synchronization for what it is. We have other places where we really want a readers writer lock and end up inventing the wheel again. Not sure if fixing this properly should be done in this patch or a follow up patch.

If I remember correctly I suggested this code should use the global counter instead.

On control flow dependencies, even compiler can change the control flow so this is an issue on x86 also. (in this case the variable is volatile, so it eliminates most issue).

According to "Who's afraid of a big bad optimizing compiler?" even this is buggy:

// Writer, stores are ordered
y = 1;
OrderAccess::storestore();
Atomic::store(&x, 1);

// Reader
int xr = Atomic::load(&x);
OrderAccess::loadstore();
if (xr == 1)
  y = 0;
 
// Compiler can re-write the reader as:
int xr = Atomic::load(&x);
OrderAccess::loadstore();
if (xr == 1)
   if (y != 0) // <--- can be reordered with load of x
     y = 0;

This particular compiler re-write seems to be very, very uncommon, but legal.

@bridgekeeper
Copy link

bridgekeeper bot commented Jul 21, 2022

@jdksjolen This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@jdksjolen
Copy link
Contributor Author

I've now re-run tier1-3 tests and they are fully passing (excluding an issue fixed in 8291289). I propose that this patch is merged as is and a new ticket is added to evaluate and perhaps rewrite this implementation.

@dholmes-ora , I believe that it is meant to be multiple reader, single writer. It seems that this is only altered through LogConfiguration, which holds the ConfigurationLock during reading and writing. I assume that people have accepted this code as correct as empirically it works.

@jdksjolen
Copy link
Contributor Author

This PR is still looking for a sponsor. @dholmes-ora, would you mind sponsoring this?

@dholmes-ora
Copy link
Member

/sponsor

@openjdk
Copy link

openjdk bot commented Jul 30, 2022

Going to push as commit 3579024.
Since your change was applied there have been 662 commits pushed to the master branch:

  • 470c0eb: 8290740: Catalog not used when the handler is null
  • dd9bd31: 8289688: jfr command hangs when it processes invalid file
  • 8179a19: 8290243: move seeTagToContent from HtmlDocletWriter to TagletWriterImpl
  • 15943e4: 8282666: nsk/jvmti/PopFrame/popframe004 failed with: TEST FAILED: 30 JVMTI events were generated by the function PopFrame()
  • 0bcf176: 6227536: KeyGenerator.init() methods do not throw IllegalArgumentException for keysize == 0
  • cc2861a: 8290901: Reduce use of -source in langtools tests
  • 64a1a08: 8289647: AssertionError during annotation processing of record related tests
  • 95fc16b: 8290525: Move HeapRegion::_compaction_top to G1FullCollector
  • f58e08e: 8290715: Fix incorrect uses of G1CollectedHeap::heap_region_containing()
  • 18cd16d: 8291003: ARM32: constant_table.size assertion
  • ... and 652 more: https://git.openjdk.org/jdk/compare/5ad6286b73889e47f40d0051a96ef91137faa25c...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jul 30, 2022
@openjdk openjdk bot closed this Jul 30, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Jul 30, 2022
@openjdk
Copy link

openjdk bot commented Jul 30, 2022

@dholmes-ora @jdksjolen Pushed as commit 3579024.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

5 participants