Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upstyle: Tweak rule tree memory ordering. #15562
Conversation
highfive
commented
Feb 15, 2017
|
Heads up! This PR modifies the following files:
|
highfive
commented
Feb 15, 2017
|
r? @bholley |
|
@bors-servo try |
style: Tweak rule tree memory ordering. I've commented on the ones that I think are the most tricky. Note that this code is stress-tested in the style tests (tests/unit/style/rule_tree/bench.rs). <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/15562) <!-- Reviewable:end -->
|
|
|
Those are a few unrelated intermittents: #14338, #15444, etc., and a frameborder_a.html that seems also intermittent, let's check: @bors-servo retry |
style: Tweak rule tree memory ordering. I've commented on the ones that I think are the most tricky. Note that this code is stress-tested in the style tests (tests/unit/style/rule_tree/bench.rs). <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/15562) <!-- Reviewable:end -->
|
|
| // messing things up. | ||
| let ptr = self.get().next_sibling.load(Ordering::SeqCst); | ||
| // Note that relaxed is fine because we take care of racing if needed | ||
| // with the correct memory ordering at the end of our search. |
This comment has been minimized.
This comment has been minimized.
bholley
Feb 22, 2017
Contributor
Hm, what do you mean by "with the correct memory ordering" here? It seems like the reads below are relaxed as well.
This comment has been minimized.
This comment has been minimized.
emilio
Mar 1, 2017
Author
Member
Sorry for the lag here, I've reworded this comment to be clear. What I meant is that we'll eventually arrive into the CAS-loop safely in the only case the ordering may actually impact anything as far as I know.
| @@ -580,13 +581,13 @@ impl StrongRuleNode { | |||
| let existing = | |||
| next_sibling_ptr.compare_and_swap(ptr::null_mut(), | |||
| new_ptr, | |||
| Ordering::SeqCst); | |||
| Ordering::Relaxed); | |||
This comment has been minimized.
This comment has been minimized.
bholley
Feb 22, 2017
Contributor
Hm, at first glance it would seem that we either need to manage next_sibling_ptr (and probably first_child_ptr) with acquire/release semantics, or put a barrier between the initialization of the new RuleNode and CAS-ing it into the rule tree. Otherwise, what stops the compiler from reordering the initializing writes in RuleNode::new to occur after the CAS?
This comment has been minimized.
This comment has been minimized.
emilio
Mar 1, 2017
Author
Member
I'm not completely sure how that could happen, tbh. I can't imagine a reordering that would make sense there, given it's across a loop. Anyway, using Aquire seems safer, so let's just use that.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Sorry for the terrible delay here. I'll try to be quick on this one going forward. |
| @@ -554,6 +559,7 @@ impl StrongRuleNode { | |||
| source: StyleSource, | |||
| level: CascadeLevel) -> StrongRuleNode { | |||
| let mut last = None; | |||
| // TODO(emilio): We could avoid all the refcount churn here. | |||
This comment has been minimized.
This comment has been minimized.
bholley
Mar 21, 2017
Contributor
Given that the refcounts are atomic, I think this is worth doing. Can you file a bug against stylo-perf?
This comment has been minimized.
This comment has been minimized.
| let mut old_head = free_list.load(Ordering::SeqCst); | ||
| // | ||
| // Note that we can use relaxed operations, we'll fail the | ||
| // compare_exchange_weak anyway if we fail. |
This comment has been minimized.
This comment has been minimized.
bholley
Mar 21, 2017
Contributor
This comment needs updating. Worth explaining in some detail why (Acquired, Relaxed) is the right ordering here.
This comment has been minimized.
This comment has been minimized.
| // | ||
| // Note that we can use relaxed operations here since we're effectively | ||
| // locking the free list, and the memory ordering is already guaranteed | ||
| // by that locking/unlocking. |
This comment has been minimized.
This comment has been minimized.
| // mutated for search/insertion. In that case, the only thing that may | ||
| // happen is that we find a null sibling pointer when other thread has | ||
| // written already to that memory location. | ||
| // |
This comment has been minimized.
This comment has been minimized.
bholley
Mar 21, 2017
Contributor
This still needs to be Acquire in order to synchronize properly with the Release when CAS-ing in new siblings, right?
This comment has been minimized.
This comment has been minimized.
emilio
Apr 3, 2017
Author
Member
Why? The point of this being relaxed is that even if we read when another thread has written into the field (and it's not visible), the worst case is that we'd get a null pointer when someone really wrote into it, and in that case, we'd enter the cas loop anyway, so this can be faster when there are actual siblings, which is the more common case.
This comment has been minimized.
This comment has been minimized.
bholley
Apr 3, 2017
Contributor
I think this basically hinges on the outcome of the discussion below. If that write needs to be Release, then this read needs to be Acquire, otherwise the synchronization doesn't take effect.
| @@ -580,13 +586,13 @@ impl StrongRuleNode { | |||
| let existing = | |||
| next_sibling_ptr.compare_and_swap(ptr::null_mut(), | |||
| new_ptr, | |||
| Ordering::SeqCst); | |||
| Ordering::Acquire); | |||
This comment has been minimized.
This comment has been minimized.
bholley
Mar 21, 2017
Contributor
Hm, doesn't this need to be AcqRel? It needs to be Release because of the issue I pointed out last time (same reason why next_sibling() needs to use Acquire()). I think it needs to be Acquire if something reads fields on |existing|, which happens in the upgrade() call below.
This comment has been minimized.
This comment has been minimized.
emilio
Apr 3, 2017
Author
Member
Why? Upon reflection, I think even relaxed may be ok given we're looping. Aren't you mixing reads of the fields on the nodes vs reads on the atomic pointers? How's the field accesses on existing relevant to the loading order of next_sibling_ptr? (It's late and I may need re-paging all the memory orderings back in, but...)
This comment has been minimized.
This comment has been minimized.
bholley
Apr 3, 2017
•
Contributor
Aren't you mixing reads of the fields on the nodes vs reads on the atomic pointers?
Unless I'm very confused here (which is a real possibility), I don't think so.
From my understanding, memory ordering isn't about ensuring any sort of consistency with regards to the actual atomic being loaded/stored (those operations are already atomic). Rather, memory ordering is about consistency in how other nearby, potentially-non-atomic operations are observed by various threads, in particularly how they may be reordered by the compiler and by the CPU. Sequential Consistency means that no other operations are observably reordered across the Sequentially-Consistent atomic operation, from the perspective of any thread. Release/Acquire means that no write operations performed by a thread manipulating field X with |Release| will be observed to be reordered across the change to X from the perspective of a thread reading field X with |Acquire|. Other threads can observe entirely different global orderings, because they're not "synchronizing" on the same memory address. [1]
So in this case, I believe we need Release when CAS-ing in the new entries, because otherwise the initialization of the new entry may not be "flushed" before the memory is accessed by other threads. And those threads need to obtain the pointers with Acquire, so that they are guaranteed to view the "flushed" writes.
If I'm wrong here, please tell me, because I would be very interested to know that. :-)
[1] http://en.cppreference.com/w/cpp/atomic/memory_order#Release-Acquire_ordering
This comment has been minimized.
This comment has been minimized.
emilio
Apr 3, 2017
•
Author
Member
Yeah, so there are two different things that this controls:
- Visibility of non-atomic/relaxed writes, and.
- Compiler instruction re-ordering.
I'm still not seeing how some re-ordering gone wrong could affect this, and wrt:
So in this case, I believe we need Release when CAS-ing in the new entries, because otherwise the initialization of the new entry may not be "flushed" before the memory is accessed by other threads.
In that case, the relaxed CAS itself wouldn't be visible either, so there's no way any other thread would be able to access that pointer, right?
This comment has been minimized.
This comment has been minimized.
emilio
Apr 3, 2017
Author
Member
Anyway, let's get down to what we agree with, and perhaps file followups as needed
This comment has been minimized.
This comment has been minimized.
bholley
Apr 3, 2017
Contributor
In that case, the relaxed CAS itself wouldn't be visible either,
so there's no way any other thread would be able to access that pointer, right?
This is the heart of my whole concern here. There are 3 operations that happen here:
(1) The allocator selects an address in memory and returns it as a pointer.
(2) The caller writes various initialization patterns to the region in memory referenced by the pointer.
(3) The caller CASes the pointer into the rule tree, expose it to other threads.
With Relaxed semantics, what prevents another thread observing the ordering (1) (3) (2)? If nothing does, then the thread may see garbage memory.
|
Just a bump on this one - we're real close here so would be good to get it over the line to prevent us both from having to page everything back in all the time. :-) |
|
Pushed the new changes as a fixup commit, in order for them to be easily reviewed. I'll squash them later. |
| // FIXME(emilio): Fiddle with memory orderings. | ||
| let first_child = self.first_child.load(Ordering::SeqCst); | ||
| // See next_sibling to see why relaxed is ok. | ||
| let first_child = self.first_child.load(Ordering::Relaxed); |
This comment has been minimized.
This comment has been minimized.
bholley
Apr 3, 2017
Contributor
Given that next_sibling is currently |Acquire|, this needs to be |Acquire| as well.
| // succeeds, the writes from the last thread that wrote into | ||
| // next_sibling_ptr are visible in this one. If we have the | ||
| // guarantee that _all_ memory written before that write is | ||
| // visible (which I think we do), that means that their |
This comment has been minimized.
This comment has been minimized.
bholley
Apr 3, 2017
Contributor
We don't have that guarantee with Relaxed, since IIUC that's what memory ordering is all about.
| // too, so we shouldn't need this `AcqRel` semantics. | ||
| // | ||
| // If this is true, the next_sibling() function should also be | ||
| // changed to `Relaxed`. |
This comment has been minimized.
This comment has been minimized.
| // guarantee that _all_ memory written before that write is | ||
| // visible (which I think we do), that means that their | ||
| // initialization of the new node is visible into our thread | ||
| // too, so we shouldn't need this `AcqRel` semantics. |
This comment has been minimized.
This comment has been minimized.
bholley
Apr 3, 2017
Contributor
If you do end up landing with this comment, please reference the discussion in this PR.
| /// Returns whether this node has any child, only intended for testing | ||
| /// purposes. | ||
| pub unsafe fn has_children_for_testing(&self) -> bool { | ||
| !self.get().first_child.load(Ordering::Relaxed).is_null() |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
emilio
Apr 3, 2017
Author
Member
This is only for testing, and there are barriers around (this is only executed in the "sequential" mode).
| }; | ||
|
|
||
| if !should_drop { | ||
| return | ||
| } | ||
|
|
||
| debug_assert_eq!(node.first_child.load(Ordering::SeqCst), | ||
| debug_assert_eq!(node.first_child.load(Ordering::Relaxed), |
This comment has been minimized.
This comment has been minimized.
|
To summarize IRC discussion: All first_child and next_siblings should have:
(excluding main-thread-only functions). r=me with that changed and the comments fixed up. Thanks for doing this! @bors-servo delegate+ |
|
|
d1d1036
to
5102d15
I've commented on the ones that I think are the most tricky. Note that this code is stress-tested in the style tests (tests/unit/style/rule_tree/bench.rs).
|
@bors-servo r=bholley
|
|
|
style: Tweak rule tree memory ordering. I've commented on the ones that I think are the most tricky. Note that this code is stress-tested in the style tests (tests/unit/style/rule_tree/bench.rs). <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/15562) <!-- Reviewable:end -->
|
|
emilio commentedFeb 15, 2017
•
edited by larsbergstrom
I've commented on the ones that I think are the most tricky. Note that this code
is stress-tested in the style tests (tests/unit/style/rule_tree/bench.rs).
This change is