Introduce atomic closures implemented using flat-combining #141
Conversation
nikomatsakis force-pushed the flat-combine branch from ca4bf73 to 82ad491 on Nov 15, 2016
It may of course make sense to factor this outside of rayon altogether; the main difference would be that we would have to assign each thread that touches it an index, instead of being able to leverage rayon's pre-existing indices. You might also want to revisit the "won't be used by an open-ended set of threads over a long period of time" assumption in that case, which would in general make the management of the active thread list more complicated. (The flat-combining paper talks about ways to manage it.)
cuviper reviewed Nov 15, 2016
// The actual `ID` value is irrelevant. We're just using its TLS
// address as a unique thread key, faster than a real thread-id call.
thread_local!{ static ID: bool = false }
ID.with(|id| ThreadId { addr: id as *const bool as usize })
cuviper (Member), Nov 15, 2016
- Is this refactoring relevant to the PR? I don't see any new callers here.
- You did not keep it `#[inline]`, which means it will probably turn into a cross-crate call from the generic bridge functions.
- The performance claim that I made about this should probably be examined, if this ever gets used in a true hotspot. E.g. I'm not sure if Rust ever uses static TLS, in which case there's probably a call for dynamic TLS anyway. (Your OS may vary, no warranty is expressed or implied, yada yada.)
- But anyway, it never showed up as a hotspot in my testing, and AFAIK there's no thread id available in the standard library, so this invented id is probably still good enough.
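The TLS-address trick from the snippet under review can be sketched as a self-contained program. This is a minimal reconstruction, not the PR's exact code: the `ThreadId` struct and `current_thread_id` wrapper are assumed names for illustration. Note the caveat in the comments: uniqueness only holds among concurrently live threads, since an exited thread's TLS slot may be reused.

```rust
use std::thread;

// The address of a thread-local slot is unique per live thread, so it
// can stand in for a thread id without an OS call. (Sketch only;
// `ThreadId` here is an illustrative name, not the PR's type.)
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct ThreadId {
    addr: usize,
}

fn current_thread_id() -> ThreadId {
    // The actual value stored is irrelevant; only the address of the
    // TLS slot matters.
    thread_local! { static ID: bool = false }
    ID.with(|id| ThreadId { addr: id as *const bool as usize })
}

fn main() {
    let main_id = current_thread_id();
    // The id is stable within a thread...
    assert_eq!(main_id, current_thread_id());
    // ...and distinct from any thread that is alive at the same time.
    let other = thread::spawn(current_thread_id).join().unwrap();
    assert_ne!(main_id, other);
    // Caveat: once a thread exits, its TLS slot (and hence its id) may
    // be reused by a later thread.
    println!("main={:?} other={:?}", main_id, other);
}
```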
nikomatsakis (Author, Member), Nov 15, 2016
Yeah, not really relevant -- I originally planned to write a more general-purpose thing that made use of ThreadId. I'll just drop this commit, I think.
cuviper reviewed Nov 15, 2016
@@ -27,5 +31,8 @@ pub use api::dump_stats;
pub use api::initialize;
pub use api::join;
pub use api::ThreadPool;
pub use atomic::Atomic;
#[cfg(feature = "unstable")]
pub use atomic::Atomic;
nikomatsakis force-pushed the flat-combine branch 3 times, most recently from 744d0fa to b75e8e0 on Nov 15, 2016
Did some measurements on my linux box with this branch:
nikomatsakis force-pushed the flat-combine branch 4 times, most recently from 8158168 to b245f4a on Nov 18, 2016
I encountered an interesting problem trying to port the TSP solver to use this (actually, I encountered it when doing filtering too). The current
This was referenced Dec 20, 2016
nikomatsakis added some commits Nov 14, 2016
nikomatsakis force-pushed the flat-combine branch from b245f4a to affb3dd on Dec 30, 2016
Rebased. This is working now, but I was trying to integrate it into the TSP benchmark and I found one problem with the API: it assumes you want to own the argument, but this isn't always true; sometimes you'd prefer a closure of type
Where do you stand on this? It will at least have to be updated for the
stjepang reviewed Mar 18, 2017
Orderings in … If in doubt, I'd suggest using …
head: AtomicPtr<Cell<T>>, // if not null, a unique, transmuted ptr to a Box<T>
}

struct Cell<T> {
stjepang (Contributor), Mar 18, 2017
I'd prefer to name this `Node<T>` in order to avoid confusion with `std::cell::Cell`.
loop {
    let head = self.head.load(Ordering::Relaxed);
    (*cell).next = head;
    if self.head.compare_and_swap(head, cell, Ordering::Release) == head {
stjepang (Contributor), Mar 18, 2017
You can make this loop faster under contention, like this:

let mut head = self.head.load(Ordering::Acquire);
loop {
    (*cell).next = head;
    let previous = self.head.compare_and_swap(head, cell, Ordering::AcqRel);
    if previous == head {
        break;
    } else {
        head = previous;
    }
}

Also, your orderings are incorrect -- you need Acquire and AcqRel. :)
The way to think about this is the following...
Suppose Bob wants to prepend a new value to the list. He loads the current head with Relaxed ordering (like you did). Then he sets the next field of the new cell and installs a new cell with Release ordering.
Now Alice comes and loads the head with Acquire ordering. She gets value h, which is the head Bob just installed. Because she used Acquire ordering, she will see all writes to memory that happened before h was installed. This is good.
However, will she see writes to memory that happened before (*h).next was installed? Unfortunately, no, and that is because Bob used Relaxed load! Had he used Acquire load, then Alice would see even those writes.
With correct ordering, this is what would happen...
Think of this as a long synchronization chain. Alice's load (Acquire) synchronizes with Bob's store (Release), which happened before he wrote to (*cell).next, and that load (Acquire) synchronized with the previous store (Release) to the head (whoever did it) and so forth...
So the key takeaway is: make sure to chain these operations! Relaxed is often fishy. :)
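The synchronization chain described above can be shown as a complete, runnable Treiber-style stack. This is a sketch, not the PR's code: it adopts the suggested `Node` name and uses `compare_exchange_weak`, the modern replacement for the `compare_and_swap` call in the snippet under review. The Acquire load joins the chain of earlier Release stores, and the AcqRel CAS both publishes the write to `next` and keeps the chain intact for the next reader.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Node<T> {
    value: T,
    next: *mut Node<T>,
}

struct Stack<T> {
    head: AtomicPtr<Node<T>>,
}

impl<T> Stack<T> {
    fn new() -> Self {
        Stack { head: AtomicPtr::new(ptr::null_mut()) }
    }

    fn push(&self, value: T) {
        let node = Box::into_raw(Box::new(Node { value, next: ptr::null_mut() }));
        // Acquire: synchronize with the Release store of whoever
        // installed the current head ("Bob" in the explanation above).
        let mut head = self.head.load(Ordering::Acquire);
        loop {
            unsafe { (*node).next = head };
            // AcqRel on success: the Release half publishes our write to
            // `next`; the Acquire half continues the chain. On failure we
            // Acquire the head some other thread just installed and retry.
            match self.head.compare_exchange_weak(
                head, node, Ordering::AcqRel, Ordering::Acquire,
            ) {
                Ok(_) => break,
                Err(actual) => head = actual,
            }
        }
    }

    fn pop(&self) -> Option<T> {
        let mut head = self.head.load(Ordering::Acquire);
        loop {
            if head.is_null() {
                return None;
            }
            // Safe to read `next` because the Acquire load above saw the
            // Release store that published it. (Note: a general-purpose
            // stack would also need to handle the ABA problem.)
            let next = unsafe { (*head).next };
            match self.head.compare_exchange_weak(
                head, next, Ordering::AcqRel, Ordering::Acquire,
            ) {
                Ok(_) => return Some(unsafe { Box::from_raw(head) }.value),
                Err(actual) => head = actual,
            }
        }
    }
}

fn main() {
    let s = Stack::new();
    s.push(1);
    s.push(2);
    assert_eq!(s.pop(), Some(2));
    assert_eq!(s.pop(), Some(1));
    assert_eq!(s.pop(), None);
}
```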
I think I'll probably wind up closing this, at least for now, but I'm going to keep it open until I have time to read @stjepang's comments and give them some thought! @stjepang, I would encourage you to take a look at the orderings in the sleep module and give me your feedback on that as well! There is some subtle logic there and I suspect there are better ways of doing things. (The README for that module contains a lot of details.)
This was referenced Apr 14, 2017
cuviper referenced this pull request on May 15, 2017: Add an `unordered_map` iterator adapter. #338 (Closed)
Gonna close this for now.
nikomatsakis commented Nov 15, 2016 (edited)
This is a second stab at flat-combine. This time, the construct is intended for end-user use. The easiest way to use it is with the `atomically!` macro, which lets you wrap a closure to indicate that it should execute atomically. But more generally you can create an atomic closure and then call it with `x.invoke(data)`.

The current code is intended for use with Rayon; it won't really provide a benefit outside of Rayon, where it just acts more-or-less like a regular lock. It is also not intended to be super long-lived, so e.g. we never prune the lists of registries that touch it and so forth, which could lead to a (very minor) memory leak if you had a long-lived atomic and created a lot of thread pools that used it independently. But both of those things are anti-patterns, so I'm not worried about that.
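The observable behavior described above (a closure that always executes atomically, degrading to "more-or-less a regular lock" outside Rayon) can be modeled with a plain mutex. This is a hypothetical stand-in, not the PR's implementation: `AtomicClosure` and the `invoke` method signature here are illustrative names modeled on the `x.invoke(data)` call in the description, and no flat combining is performed.

```rust
use std::sync::Mutex;

// Hypothetical lock-based stand-in for the PR's atomic closure: it
// models only the contract (the closure runs with exclusive access to
// the shared state), not the flat-combining implementation.
struct AtomicClosure<S, F: Fn(&mut S, i32)> {
    state: Mutex<S>,
    f: F,
}

impl<S, F: Fn(&mut S, i32)> AtomicClosure<S, F> {
    fn new(state: S, f: F) -> Self {
        AtomicClosure { state: Mutex::new(state), f }
    }

    // Mirrors `x.invoke(data)` from the description: run the wrapped
    // closure atomically on the shared state.
    fn invoke(&self, data: i32) {
        let mut state = self.state.lock().unwrap();
        (self.f)(&mut state, data);
    }
}

fn main() {
    // Accumulate values atomically; each invoke runs to completion
    // before the next can observe the state.
    let x = AtomicClosure::new(0i32, |sum, v| *sum += v);
    for v in 1..=4 {
        x.invoke(v);
    }
    assert_eq!(*x.state.lock().unwrap(), 10);
}
```

The point of the PR, of course, is that under Rayon a flat-combining implementation can batch pending invocations on one combiner thread instead of bouncing a lock between threads; the mutex version above only matches the semantics.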
Some things remain undone:
cc @aturon -- I'd love a detailed look at the acquire/release orderings. I think they're correct. =)