Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

style: Tweak rule tree memory ordering. #15562

Merged
merged 1 commit into from Apr 3, 2017
Merged

Conversation

@emilio
Copy link
Member

emilio commented Feb 15, 2017

I've commented on the ones that I think are the most tricky. Note that this code
is stress-tested in the style tests (tests/unit/style/rule_tree/bench.rs).


This change is Reviewable

@highfive
Copy link

highfive commented Feb 15, 2017

Heads up! This PR modifies the following files:

  • @bholley: components/style/rule_tree/mod.rs
@highfive
Copy link

highfive commented Feb 15, 2017

warning Warning warning

  • These commits modify unsafe code. Please review it carefully!
@emilio
Copy link
Member Author

emilio commented Feb 15, 2017

@highfive highfive assigned bholley and unassigned cbrewster Feb 15, 2017
@emilio
Copy link
Member Author

emilio commented Feb 15, 2017

@bors-servo
Copy link
Contributor

bors-servo commented Feb 15, 2017

Trying commit e45e3bf with merge a279331...

bors-servo added a commit that referenced this pull request Feb 15, 2017
style: Tweak rule tree memory ordering.

I've commented on the ones that I think are the most tricky. Note that this code
is stress-tested in the style tests (tests/unit/style/rule_tree/bench.rs).

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/15562)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

bors-servo commented Feb 15, 2017

💔 Test failed - linux-rel-wpt

@emilio
Copy link
Member Author

emilio commented Feb 15, 2017

Those are a few unrelated intermittents: #14338, #15444, etc., and a frameborder_a.html that seems also intermittent, let's check:

@bors-servo retry

@bors-servo
Copy link
Contributor

bors-servo commented Feb 15, 2017

Trying commit e45e3bf with merge c294699...

bors-servo added a commit that referenced this pull request Feb 15, 2017
style: Tweak rule tree memory ordering.

I've commented on the ones that I think are the most tricky. Note that this code
is stress-tested in the style tests (tests/unit/style/rule_tree/bench.rs).

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/15562)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

bors-servo commented Feb 15, 2017

// messing things up.
let ptr = self.get().next_sibling.load(Ordering::SeqCst);
// Note that relaxed is fine because we take care of racing if needed
// with the correct memory ordering at the end of our search.

This comment has been minimized.

@bholley

bholley Feb 22, 2017

Contributor

Hm, what do you mean by "with the correct memory ordering" here? It seems like the reads below are relaxed as well.

This comment has been minimized.

@emilio

emilio Mar 1, 2017

Author Member

Sorry for the lag here, I've reworded this comment to be clear. What I meant is that we'll eventually arrive into the CAS-loop safely in the only case the ordering may actually impact anything as far as I know.

@@ -580,13 +581,13 @@ impl StrongRuleNode {
let existing =
next_sibling_ptr.compare_and_swap(ptr::null_mut(),
new_ptr,
Ordering::SeqCst);
Ordering::Relaxed);

This comment has been minimized.

@bholley

bholley Feb 22, 2017

Contributor

Hm, at first glance it would seem that we either need to manage next_sibling_ptr (and probably first_child_ptr) with acquire/release semantics, or put a barrier between the initialization of the new RuleNode and CAS-ing it into the rule tree. Otherwise, what stops the compiler from reordering the initializing writes in RuleNode::new to occur after the CAS?

This comment has been minimized.

@emilio

emilio Mar 1, 2017

Author Member

I'm not completely sure how that could happen, tbh. I can't imagine a reordering that would make sense there, given it's across a loop. Anyway, using Aquire seems safer, so let's just use that.

This comment has been minimized.

@bholley

bholley Mar 4, 2017

Contributor

(Did you mean to push an update to this?)

This comment has been minimized.

@emilio

emilio Mar 10, 2017

Author Member

I meant to, whoops. It's up now

@emilio emilio force-pushed the emilio:rule-mem-order branch from e45e3bf to 56a10e9 Mar 10, 2017
Copy link
Contributor

bholley left a comment

Sorry for the terrible delay here. I'll try to be quick on this one going forward.

@@ -554,6 +559,7 @@ impl StrongRuleNode {
source: StyleSource,
level: CascadeLevel) -> StrongRuleNode {
let mut last = None;
// TODO(emilio): We could avoid all the refcount churn here.

This comment has been minimized.

@bholley

bholley Mar 21, 2017

Contributor

Given that the refcounts are atomic, I think this is worth doing. Can you file a bug against stylo-perf?

This comment has been minimized.

@emilio

emilio Apr 3, 2017

Author Member

Sure, will do.

let mut old_head = free_list.load(Ordering::SeqCst);
//
// Note that we can use relaxed operations, we'll fail the
// compare_exchange_weak anyway if we fail.

This comment has been minimized.

@bholley

bholley Mar 21, 2017

Contributor

This comment needs updating. Worth explaining in some detail why (Acquired, Relaxed) is the right ordering here.

This comment has been minimized.

@emilio

emilio Apr 3, 2017

Author Member

True that.

//
// Note that we can use relaxed operations here since we're effectively
// locking the free list, and the memory ordering is already guaranteed
// by that locking/unlocking.

This comment has been minimized.

@bholley

bholley Mar 21, 2017

Contributor

This comment needs updating as well.

// mutated for search/insertion. In that case, the only thing that may
// happen is that we find a null sibling pointer when other thread has
// written already to that memory location.
//

This comment has been minimized.

@bholley

bholley Mar 21, 2017

Contributor

This still needs to be Acquire in order to synchronize properly with the Release when CAS-ing in new siblings, right?

This comment has been minimized.

@emilio

emilio Apr 3, 2017

Author Member

Why? The point of this being relaxed is that even if we read when another thread has written into the field (and it's not visible), the worst case is that we'd get a null pointer when someone really wrote into it, and in that case, we'd enter the cas loop anyway, so this can be faster when there are actual siblings, which is the more common case.

This comment has been minimized.

@bholley

bholley Apr 3, 2017

Contributor

I think this basically hinges on the outcome of the discussion below. If that write needs to be Release, then this read needs to be Acquire, otherwise the synchronization doesn't take effect.

@@ -580,13 +586,13 @@ impl StrongRuleNode {
let existing =
next_sibling_ptr.compare_and_swap(ptr::null_mut(),
new_ptr,
Ordering::SeqCst);
Ordering::Acquire);

This comment has been minimized.

@bholley

bholley Mar 21, 2017

Contributor

Hm, doesn't this need to be AcqRel? It needs to be Release because of the issue I pointed out last time (same reason why next_sibling() needs to use Acquire()). I think it needs to be Acquire if something reads fields on |existing|, which happens in the upgrade() call below.

This comment has been minimized.

@emilio

emilio Apr 3, 2017

Author Member

Why? Upon reflection, I think even relaxed may be ok given we're looping. Aren't you mixing reads of the fields on the nodes vs reads on the atomic pointers? How's the field accesses on existing relevant to the loading order of next_sibling_ptr? (It's late and I may need re-paging all the memory orderings back in, but...)

This comment has been minimized.

@bholley

bholley Apr 3, 2017

Contributor

Aren't you mixing reads of the fields on the nodes vs reads on the atomic pointers?

Unless I'm very confused here (which is a real possibility), I don't think so.

From my understanding, memory ordering isn't about ensuring any sort of consistency with regards to the actual atomic being loaded/stored (those operations are already atomic). Rather, memory ordering is about consistency in how other nearby, potentially-non-atomic operations are observed by various threads, in particularly how they may be reordered by the compiler and by the CPU. Sequential Consistency means that no other operations are observably reordered across the Sequentially-Consistent atomic operation, from the perspective of any thread. Release/Acquire means that no write operations performed by a thread manipulating field X with |Release| will be observed to be reordered across the change to X from the perspective of a thread reading field X with |Acquire|. Other threads can observe entirely different global orderings, because they're not "synchronizing" on the same memory address. [1]

So in this case, I believe we need Release when CAS-ing in the new entries, because otherwise the initialization of the new entry may not be "flushed" before the memory is accessed by other threads. And those threads need to obtain the pointers with Acquire, so that they are guaranteed to view the "flushed" writes.

If I'm wrong here, please tell me, because I would be very interested to know that. :-)

[1] http://en.cppreference.com/w/cpp/atomic/memory_order#Release-Acquire_ordering

This comment has been minimized.

@emilio

emilio Apr 3, 2017

Author Member

Yeah, so there are two different things that this controls:

  • Visibility of non-atomic/relaxed writes, and.
  • Compiler instruction re-ordering.

I'm still not seeing how some re-ordering gone wrong could affect this, and wrt:

So in this case, I believe we need Release when CAS-ing in the new entries, because otherwise the initialization of the new entry may not be "flushed" before the memory is accessed by other threads.

In that case, the relaxed CAS itself wouldn't be visible either, so there's no way any other thread would be able to access that pointer, right?

This comment has been minimized.

@emilio

emilio Apr 3, 2017

Author Member

Anyway, let's get down to what we agree with, and perhaps file followups as needed

This comment has been minimized.

@bholley

bholley Apr 3, 2017

Contributor

In that case, the relaxed CAS itself wouldn't be visible either,
so there's no way any other thread would be able to access that pointer, right?

This is the heart of my whole concern here. There are 3 operations that happen here:
(1) The allocator selects an address in memory and returns it as a pointer.
(2) The caller writes various initialization patterns to the region in memory referenced by the pointer.
(3) The caller CASes the pointer into the rule tree, expose it to other threads.

With Relaxed semantics, what prevents another thread observing the ordering (1) (3) (2)? If nothing does, then the thread may see garbage memory.

@bholley
Copy link
Contributor

bholley commented Apr 2, 2017

Just a bump on this one - we're real close here so would be good to get it over the line to prevent us both from having to page everything back in all the time. :-)

@emilio emilio force-pushed the emilio:rule-mem-order branch from 56a10e9 to 6636d4b Apr 3, 2017
@emilio
Copy link
Member Author

emilio commented Apr 3, 2017

Pushed the new changes as a fixup commit, in order for them to be easily reviewed. I'll squash them later.

@emilio emilio force-pushed the emilio:rule-mem-order branch from 6636d4b to a4a6e90 Apr 3, 2017
// FIXME(emilio): Fiddle with memory orderings.
let first_child = self.first_child.load(Ordering::SeqCst);
// See next_sibling to see why relaxed is ok.
let first_child = self.first_child.load(Ordering::Relaxed);

This comment has been minimized.

@bholley

bholley Apr 3, 2017

Contributor

Given that next_sibling is currently |Acquire|, this needs to be |Acquire| as well.

// succeeds, the writes from the last thread that wrote into
// next_sibling_ptr are visible in this one. If we have the
// guarantee that _all_ memory written before that write is
// visible (which I think we do), that means that their

This comment has been minimized.

@bholley

bholley Apr 3, 2017

Contributor

We don't have that guarantee with Relaxed, since IIUC that's what memory ordering is all about.

// too, so we shouldn't need this `AcqRel` semantics.
//
// If this is true, the next_sibling() function should also be
// changed to `Relaxed`.

This comment has been minimized.

@bholley

bholley Apr 3, 2017

Contributor

Also reference iter_children() here.

// guarantee that _all_ memory written before that write is
// visible (which I think we do), that means that their
// initialization of the new node is visible into our thread
// too, so we shouldn't need this `AcqRel` semantics.

This comment has been minimized.

@bholley

bholley Apr 3, 2017

Contributor

If you do end up landing with this comment, please reference the discussion in this PR.

/// Returns whether this node has any child, only intended for testing
/// purposes.
pub unsafe fn has_children_for_testing(&self) -> bool {
!self.get().first_child.load(Ordering::Relaxed).is_null()

This comment has been minimized.

@bholley

bholley Apr 3, 2017

Contributor

This should be Acquire.

This comment has been minimized.

@emilio

emilio Apr 3, 2017

Author Member

This is only for testing, and there are barriers around (this is only executed in the "sequential" mode).

};

if !should_drop {
return
}

debug_assert_eq!(node.first_child.load(Ordering::SeqCst),
debug_assert_eq!(node.first_child.load(Ordering::Relaxed),

This comment has been minimized.

@bholley

bholley Apr 3, 2017

Contributor

It probably doesn't matter, but this should be Acquire to be safe.

@bholley
Copy link
Contributor

bholley commented Apr 3, 2017

To summarize IRC discussion: All first_child and next_siblings should have:

  • Release stores
  • Acquire loads
  • AcqRel CASes

(excluding main-thread-only functions).

r=me with that changed and the comments fixed up. Thanks for doing this!

@bors-servo delegate+

@bors-servo
Copy link
Contributor

bors-servo commented Apr 3, 2017

✌️ @emilio can now approve this pull request

@emilio emilio force-pushed the emilio:rule-mem-order branch 2 times, most recently from d1d1036 to 5102d15 Apr 3, 2017
I've commented on the ones that I think are the most tricky. Note that this code
is stress-tested in the style tests (tests/unit/style/rule_tree/bench.rs).
@emilio emilio force-pushed the emilio:rule-mem-order branch from 5102d15 to 1d94a68 Apr 3, 2017
@emilio
Copy link
Member Author

emilio commented Apr 3, 2017

@bors-servo r=bholley

  • Sorry for being wrong on this one after all! :/
@bors-servo
Copy link
Contributor

bors-servo commented Apr 3, 2017

📌 Commit 1d94a68 has been approved by bholley

@bors-servo
Copy link
Contributor

bors-servo commented Apr 3, 2017

Testing commit 1d94a68 with merge dc8ed0d...

bors-servo added a commit that referenced this pull request Apr 3, 2017
style: Tweak rule tree memory ordering.

I've commented on the ones that I think are the most tricky. Note that this code
is stress-tested in the style tests (tests/unit/style/rule_tree/bench.rs).

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/15562)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

bors-servo commented Apr 3, 2017

☀️ Test successful - android, arm32, arm64, linux-dev, linux-rel-css, linux-rel-wpt, mac-dev-unit, mac-rel-css, mac-rel-wpt1, mac-rel-wpt2, windows-msvc-dev
Approved by: bholley
Pushing dc8ed0d to master...

@bors-servo bors-servo merged commit 1d94a68 into servo:master Apr 3, 2017
2 of 3 checks passed
2 of 3 checks passed
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants
You can’t perform that action at this time.