Add a generic CAS loop to std::sync::Atomic* #48658

llogiq · 2018-03-02T06:36:19Z

This adds two new methods to both AtomicIsize and AtomicUsize with optimized safe compare-and-set loops, so users will no longer need to write their own, except in very strange circumstances.

update_and_fetch will apply the function and return its result, whereas fetch_and_update will apply the function and return the previous value.

This solves #48384 with x.update_and_fetch(|x| x.max(y)). It also relates to #48655 (which I am misusing as a tracking issue for now).
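To illustrate the difference between the two proposed methods, here is a minimal sketch written as free functions over `AtomicIsize`. The free-function form is an assumption made for the sake of a self-contained example; the PR adds methods inside the `atomic_int!` macro, and the orderings below simply mirror the ones in the reviewed snippet.

```rust
use std::sync::atomic::{AtomicIsize, Ordering};

/// Hypothetical free-function form: applies `f` and returns the value
/// that was stored *before* the update.
fn fetch_and_update(a: &AtomicIsize, f: impl Fn(isize) -> isize) -> isize {
    let mut prev = a.load(Ordering::Relaxed);
    loop {
        let next = f(prev);
        match a.compare_exchange_weak(prev, next, Ordering::SeqCst, Ordering::Relaxed) {
            Ok(_) => return prev,
            // The value changed in the meantime (or the weak CAS failed
            // spuriously): retry with the freshly observed value.
            Err(current) => prev = current,
        }
    }
}

/// Hypothetical free-function form: applies `f` and returns the value
/// that was stored *by* the update.
fn update_and_fetch(a: &AtomicIsize, f: impl Fn(isize) -> isize) -> isize {
    let mut prev = a.load(Ordering::Relaxed);
    loop {
        let next = f(prev);
        match a.compare_exchange_weak(prev, next, Ordering::SeqCst, Ordering::Relaxed) {
            Ok(_) => return next,
            Err(current) => prev = current,
        }
    }
}

fn main() {
    let x = AtomicIsize::new(7);
    let y = 42;
    // Atomic maximum, as in the description above.
    assert_eq!(update_and_fetch(&x, |v| v.max(y)), 42);
    // Returns the previous value (42) and stores 43.
    assert_eq!(fetch_and_update(&x, |v| v + 1), 42);
    assert_eq!(x.load(Ordering::SeqCst), 43);
}
```

The two loops differ only in which value they return on success.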

Note: This might need a crater run because the functions could clash with third-party extension traits.

rust-highfive · 2018-03-02T06:36:24Z

r? @kennytm

(rust_highfive has picked a reviewer for you, use r? to override)

ranma42 · 2018-03-02T08:05:08Z

src/libcore/sync/atomic.rs

+            /// Examples:
+            ///
+            /// ```rust
+            /// use std::sync::atömic::{AtomicIsize, Ordering};


This is a heavy metal atom 😆

I swear it was a typo.

kennytm · 2018-03-02T08:55:47Z

src/libcore/sync/atomic.rs

@@ -1268,6 +1268,70 @@ macro_rules! atomic_int {
                }
            }

+            /// fetch the value, apply a function to it and return the result


Capitalize the F, and also use the singular form.

Fetches the value, applies a function to it and returns the result.

kennytm · 2018-03-02T08:59:16Z

src/libcore/sync/atomic.rs

+            /// Note: This may call the function multiple times if the value has been
+            /// changed from other threads in the meantime.
+            ///
+            /// Examples:


Examples should be a section header, followed by a summary of what the example does.

/// # Examples
///
/// Compute the maximum of a slice, where each update step is done atomically.

kennytm · 2018-03-02T09:02:25Z

src/libcore/sync/atomic.rs

+                let mut prev = self.get(Ordering::Relaxed);
+                loop {
+                    let next = f(prev);
+                    match compare_exchange_weak(self, prev, next, Ordering::SeqCst, Ordering::Relaxed) {


Is there any chance the user wants to use ordering other than SeqCst/Relaxed? (Similar question for the self.get(Relaxed) above)

Definitely yes. Current implementation seems to be always right, however it might end up doing more loops than strictly necessary if some stricter ordering was used for get… I think. And SeqCst isn’t always necessary for CEW, but once you go for a weaker ordering there, it is important to be able to specify a stronger one for get and failure cases.

@nagisa Benchmarks in other languages show that using the most relaxed ordering and allowing for more loops achieves better throughput. However, since Rust is not one of those languages, I'll see if I can come up with some benchmarks for it.

@llogiq it is difficult to generalise the benchmarks like this. The article you’ve linked to is benchmarking on x86, which will have different performance characteristics compared to an architecture where the memory model is weaker. This is exactly why we provide means to specify the orderings everywhere.

nagisa · 2018-03-03T12:43:31Z

Furthermore, I couldn’t get LLVM to generate code for a manual CAS loop that is as good as what it generates for the atomic_max intrinsic.

Here’s the code for the atomic_max intrinsic for the ARMv7 architecture:

	dmb	ish
.LBB1_1:
	ldrexd	r4, r5, [r0]
	subs	r1, r2, r4
	sbcs	r1, r3, r5
	mov	r1, #0
	movwlt	r1, #1
	cmp	r1, #0
	mov	r1, r3
	moveq	r4, r2
	movne	r1, r5
	mov	r5, r1
	strexd	r1, r4, r5, [r0]
	cmp	r1, #0
	bne	.LBB1_1
	dmb	ish

and something that gets generated when implementing the same-ish functionality using the CAS loop:

	push	{r4, r5, r6, r7, r8, r9, r10, r11, lr}
	ldrd	r8, r9, [r0]
	ldrexd	r10, r11, [r0]
	eor	r1, r11, r9
	eor	r7, r10, r8
	orr	r5, r7, r1
	subs	r7, r2, r8
	sbcs	r7, r3, r9
	mov	r1, #0
	mov	r7, #0
	movwlo	r7, #1
	cmp	r7, #0
	mov	r7, r3
	movne	r7, r9
	mov	r6, r2
	movne	r6, r8
	cmp	r5, #0
	bne	.LBB2_4
	dmb	ish
.LBB2_2:
	strexd	r5, r6, r7, [r0]
	cmp	r5, #0
	beq	.LBB2_6
	ldrexd	r10, r11, [r0]
	eor	r5, r10, r8
	eor	r4, r11, r9
	orrs	r5, r5, r4
	beq	.LBB2_2
.LBB2_4:
	clrex
	cmp	r1, #0
	beq	.LBB2_8
.LBB2_5:
	pop	{r4, r5, r6, r7, r8, r9, r10, r11, pc}
.LBB2_6:
	dmb	ish
	mov	r1, #1
	cmp	r1, #0
	beq	.LBB2_8
	b	.LBB2_5
.LBB2_7:
	mov	r10, r6
	mov	r11, r7
	cmp	r1, #0
	bne	.LBB2_5
.LBB2_8:
	ldrexd	r6, r7, [r0]
	mov	r9, r3
	eor	r1, r6, r10
	eor	r5, r7, r11
	orr	r1, r1, r5
	subs	r5, r2, r10
	sbcs	r5, r3, r11
	mov	r5, #0
	movwlo	r5, #1
	cmp	r5, #0
	movne	r9, r11
	mov	r8, r2
	movne	r8, r10
	cmp	r1, #0
	bne	.LBB2_12
	dmb	ish
.LBB2_10:
	strexd	r1, r8, r9, [r0]
	cmp	r1, #0
	beq	.LBB2_13
	ldrexd	r6, r7, [r0]
	eor	r1, r6, r10
	eor	r4, r7, r11
	orrs	r1, r1, r4
	beq	.LBB2_10
.LBB2_12:
	mov	r1, #0
	clrex
	b	.LBB2_7
.LBB2_13:
	dmb	ish
	mov	r1, #1
	b	.LBB2_7

So it seems to me that it would be worthwhile to add a binding for such a function anyway.
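For reference, the two source-level variants being compared could look roughly like the sketch below. It assumes a 64-bit signed atomic (`AtomicI64`) and uses the `fetch_max` method available on the integer atomics in current stable Rust; the helper function names are hypothetical, and whether `fetch_max` lowers to the tighter loop shown above on a given target is not verified here.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Maximum via a hand-written CAS loop; returns the previous value.
fn max_with_cas_loop(a: &AtomicI64, val: i64) -> i64 {
    let mut prev = a.load(Ordering::Relaxed);
    loop {
        let next = prev.max(val);
        match a.compare_exchange_weak(prev, next, Ordering::SeqCst, Ordering::Relaxed) {
            Ok(_) => return prev,
            // Retry with the value another thread stored (or after a
            // spurious failure of the weak CAS).
            Err(current) => prev = current,
        }
    }
}

/// Maximum via the dedicated method; also returns the previous value.
fn max_with_fetch_max(a: &AtomicI64, val: i64) -> i64 {
    a.fetch_max(val, Ordering::SeqCst)
}

fn main() {
    let a = AtomicI64::new(3);
    assert_eq!(max_with_cas_loop(&a, 10), 3);
    assert_eq!(max_with_fetch_max(&a, 99), 10);
    assert_eq!(a.load(Ordering::SeqCst), 99);
}
```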

0xAX · 2018-03-05T09:05:14Z

src/libcore/sync/atomic.rs

+                }
+            }
+
+            /// Fetches the value, applies a function to it and returns the prvious value


s/prvious/previous

pietroalbini · 2018-03-12T10:26:11Z

Ping from triage @llogiq! Could you finish this PR so it can be merged?

llogiq · 2018-03-12T11:15:19Z

I'm a bit unsure how to progress:

Should I add the Orderings for each operation as arguments? Or unify the load orderings?
Should I add the atomic_min / atomic_max intrinsics, too?

nagisa · 2018-03-12T12:19:28Z

I think we should add two ordering arguments -- one for the CAS success case and the other for the failure case (also used for the initial load). We should expose the min and max methods separately as well IMO.

Also cc @Amanieu, who may have opinions about this stuff.

kennytm · 2018-03-12T12:26:04Z

At the very least one should be able to choose an ordering other than SeqCst. There are three orderings involved here (load, compare_exchange_weak's success + failure). If one can prove that any one of them can always be Relaxed or that any two orderings must be equivalent, then we don't need to expose that ordering in the API.

Amanieu · 2018-03-12T14:11:05Z

@kennytm The ordering for the initial load and for the CAS failure are the same: in both cases this is used to load a value for the next loop iteration.

For the API itself, I think something like this would be better:

fn transaction<F: FnMut($int_type) -> Option<$int_type>>(&self, f: F, success: Ordering, failure: Ordering) -> bool;

The closure can return None, which aborts the operation: the value of the atomic is not changed.
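A minimal sketch of how such a `transaction` could behave, written as a free function over `AtomicUsize` (an assumption for the sake of a self-contained example; the proposal is a method generated per integer type). The initial load reuses the failure ordering, as described above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical free-function form of the proposed `transaction`.
/// Returns `true` if a new value was stored and `false` if `f` aborted
/// by returning `None`, in which case the atomic is left unchanged.
fn transaction<F>(a: &AtomicUsize, mut f: F, success: Ordering, failure: Ordering) -> bool
where
    F: FnMut(usize) -> Option<usize>,
{
    // The initial load and the CAS failure both feed the next iteration,
    // so they share one ordering.
    let mut prev = a.load(failure);
    while let Some(next) = f(prev) {
        match a.compare_exchange_weak(prev, next, success, failure) {
            Ok(_) => return true,
            Err(current) => prev = current,
        }
    }
    false
}

fn main() {
    // Increment, but never past 10.
    let bounded_inc = |v: usize| if v < 10 { Some(v + 1) } else { None };

    let a = AtomicUsize::new(5);
    assert!(transaction(&a, bounded_inc, Ordering::AcqRel, Ordering::Acquire));
    assert_eq!(a.load(Ordering::Relaxed), 6);

    let b = AtomicUsize::new(10);
    assert!(!transaction(&b, bounded_inc, Ordering::AcqRel, Ordering::Acquire));
    assert_eq!(b.load(Ordering::Relaxed), 10);
}
```

Because the failure ordering is also used for a plain load, it cannot be `Release` or `AcqRel` in this sketch.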

Amanieu · 2018-03-12T14:14:09Z

I'll also leave this as an example of prior work (in C++).

Amanieu · 2018-03-17T01:33:31Z

@llogiq I'm not convinced that we need both fetch_and_update and update_and_fetch, hence my suggestion above to only have a single method (transaction). In both cases the old/new value can easily be extracted from the closure, so I think that the previous value should be returned for consistency with the other atomic operations.

llogiq · 2018-03-17T06:28:24Z

@Amanieu in that case I think it'd be prudent to have an example of extracting the new value from the closure. How would you do that? Setting a mutable binding outside? We also should look at the generated assembly for that case.

Amanieu · 2018-03-17T06:34:01Z

Yes, basically something like this:

let mut new = 0;
ATOMIC.fetch_and_update(|x| {new = x + 1; new}, Ordering::Relaxed, Ordering::Relaxed);

You need to replace Fn with FnMut in the function definition; there's no reason not to support FnMut here.
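The same idea can be tried against the `fetch_update` method as it exists in current stable Rust, which takes the two orderings first and an `FnMut` closure returning `Option`. That signature differs from the draft discussed in this thread, and the helper name `increment` below is hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Increments `a` and returns `(previous, new)`. The new value is
/// captured through a mutable binding outside the closure.
fn increment(a: &AtomicUsize) -> (usize, usize) {
    let mut new = 0;
    let prev = a
        .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |x| {
            // If the closure runs more than once, `new` is overwritten
            // each time, so it ends up holding the value that was stored.
            new = x + 1;
            Some(new)
        })
        .expect("the closure always returns Some");
    (prev, new)
}

fn main() {
    let a = AtomicUsize::new(0);
    assert_eq!(increment(&a), (0, 1));
    assert_eq!(increment(&a), (1, 2));
}
```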

(As a side note, I'm still not a big fan of the fetch_and_update name, have you considered transaction like I suggested above?)

llogiq · 2018-03-17T07:22:22Z

I'll change the name when I remove update_and_fetch. But I'll be offline for a few hours first.

kennytm · 2018-03-17T07:38:21Z

I strongly oppose the name transaction; it tells nothing about the intended behavior of the function without looking up the doc. Please keep the name fetch_and_update (or shorten it to fetch_update), it is totally fine.

llogiq · 2018-03-17T10:10:01Z

I disagree that transaction doesn't tell the intended behavior of the function (though the interface of the C++ method of the same name returns bool for some reason), but to keep with the naming scheme of the Atomic* type methods, fetch_update would be preferable (fetch_with would be another option, but that's not as telling. Perhaps fetch_modify or fetch_change?). I'm open to suggestions.

Amanieu · 2018-03-17T17:24:14Z

After some consideration, I'm OK with the name fetch_update, since it is consistent with the other method names.

I went through a few of the CAS loops in my various crates, and the majority of them are of this form:

loop {
    if some_condition {
        return false;
    }
    match atomic.compare_exchange_weak(
        oldval,
        newval,
        Ordering::Relaxed,
        Ordering::Relaxed,
    ) {
        Ok(_) => return true,
        Err(x) => oldval = x,
    }
}

I therefore think that we should make the closure return an Option<$int_type> where a None will simply break out of the loop without changing the underlying value (this is very similar to the behavior of compare_exchange).

llogiq · 2018-03-17T17:40:32Z

So the fndecl would be pub fn fetch_update<F>(&self, f: F) -> $int_type where F: FnMut($int_type) -> Option<$int_type>;, right? Returning the previous value in all cases & breaking the loop if f returns None?

Amanieu · 2018-03-17T17:43:29Z

@llogiq That's correct.

I'm somewhat torn on the return type of fetch_update itself. Should we unconditionally return the previous value or should we wrap it in a Result like the return value of compare_exchange?

llogiq · 2018-03-17T18:49:52Z

So you'd return either Ok(old_value) or Err(old_value)?

Amanieu · 2018-03-17T18:50:57Z

Yes
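As a sketch of what the `Ok(old_value)` / `Err(old_value)` convention looks like from the caller's side, the example below uses the `fetch_update` method from current stable Rust, which returns `Result` in this way. The permit-counter scenario and the name `try_acquire` are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Takes one permit if any are left.
/// `Ok(old)` means the counter was decremented from `old`;
/// `Err(old)` means the closure returned `None` and nothing was stored.
fn try_acquire(permits: &AtomicUsize) -> Result<usize, usize> {
    // `checked_sub` yields `None` at zero, which aborts the update.
    permits.fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
}

fn main() {
    let permits = AtomicUsize::new(1);
    assert_eq!(try_acquire(&permits), Ok(1));
    assert_eq!(try_acquire(&permits), Err(0));
    assert_eq!(permits.load(Ordering::Relaxed), 0);
}
```

Either way the caller learns the previously observed value; the variant only says whether a store happened.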

llogiq · 2018-03-27T16:17:07Z

Rebased and squashed.

@kennytm r?

kennytm · 2018-03-27T17:45:23Z

@llogiq The fetch_update method is gone?

llogiq · 2018-03-27T18:15:12Z

It appears I lost it during the rebase. I'll re-add it.

llogiq · 2018-03-27T19:49:56Z

Done. Let's see if Travis likes it.

llogiq · 2018-03-28T05:31:32Z

Besides fixing some typos, I had to remove the random number generator, as it doesn't work with types below 32 bits.

llogiq · 2018-03-28T07:36:18Z

Now with the fetch_update method back, @kennytm r? (also thanks for bearing with me for so long 🙂)

kennytm · 2018-03-28T07:46:02Z

src/libcore/sync/atomic.rs

+
+Note: This may call the function multiple times if the value has been changed from other threads in
+the meantime, as long as the function returns `Some(_)`. It will however only apply the function
+once to the value stored within.


It will however only apply the function once to the value stored within.

I don't get what this means. Could we clarify this in the code?

I wanted to say that the function will be applied only once per observed value. I've changed the text accordingly.

Since you are using compare_exchange_weak, it could actually be applied more than once even if the value has not changed. For example on ARM, a compare_exchange_weak may spuriously fail due to an interrupt.

That is true. Ok, I've changed the text once more to say that the function will only have been applied once to the stored value. Otherwise people may fear they need to make the function idempotent.
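The point about repeated calls can be made observable with a small experiment. The sketch below uses the stable `fetch_update` method and a hypothetical helper, `contended_increments`, that counts how often the closure runs under contention. Each increment is stored exactly once, while the number of closure calls may be higher because of retries.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Performs `threads * per_thread` increments from several threads and
/// returns `(final value, number of closure invocations)`.
fn contended_increments(threads: usize, per_thread: usize) -> (usize, usize) {
    let value = Arc::new(AtomicUsize::new(0));
    let calls = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let value = Arc::clone(&value);
            let calls = Arc::clone(&calls);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    let _ = value.fetch_update(Ordering::SeqCst, Ordering::SeqCst, |v| {
                        // Counts every invocation, including retries.
                        calls.fetch_add(1, Ordering::Relaxed);
                        Some(v + 1)
                    });
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    (value.load(Ordering::SeqCst), calls.load(Ordering::SeqCst))
}

fn main() {
    let (value, calls) = contended_increments(4, 1000);
    // Every increment took effect exactly once...
    assert_eq!(value, 4000);
    // ...but the closure may have run more often than that.
    assert!(calls >= 4000);
    println!("stored {} updates with {} closure calls", value, calls);
}
```

The side effect inside the closure (the call counter) is the kind of thing that gets repeated on retry, so closures with side effects need care even though the stored value only reflects one application.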

kennytm · 2018-03-28T07:48:22Z

src/libcore/sync/atomic.rs

+assert!(max_foo == 42);
+```"),
+                #[inline]
+                #[unstable(feature = "no_more_cas",


I suggest we pick a different feature name for fetch_{max, min} (e.g. atomic_min_max). We may have different stabilization schedules for fetch_{min, max} and fetch_update.

This adds a new method to all numeric `Atomic*` types with a safe compare-and-set loop, so users will no longer need to write their own, except in *very* strange circumstances. This solves rust-lang#48384 with `x.fetch_max(_)`/`x.fetch_min(_)`. It also relates to rust-lang#48655 (which I misuse as tracking issue for now). *note* This *might* need a crater run because the functions could clash with third party extension traits.

llogiq · 2018-03-31T19:25:10Z

Now we have the split feature gate and a better wording for the comments. Travis also seems to like it. r?

kennytm · 2018-04-05T08:35:41Z

@bors r+

bors · 2018-04-05T08:35:42Z

📌 Commit 0f5e419 has been approved by kennytm

Add a generic CAS loop to std::sync::Atomic* This adds two new methods to both `AtomicIsize` and `AtomicUsize` with optimized safe compare-and-set loops, so users will no longer need to write their own, except in *very* strange circumstances. `update_and_fetch` will apply the function and return its result, whereas `fetch_and_update` will apply the function and return the previous value. This solves rust-lang#48384 with `x.update_and_fetch(|x| x.max(y))`. It also relates to rust-lang#48655 (which I misuse as tracking issue for now).. *note* This *might* need a crater run because the functions could clash with third party extension traits.

Rollup of 9 pull requests Successful merges: - #48658 (Add a generic CAS loop to std::sync::Atomic*) - #49253 (Take the original extra-filename passed to a crate into account when resolving it as a dependency) - #49345 (RFC 2008: Finishing Touches) - #49432 (Flush executables to disk after linkage) - #49496 (Add more vec![... ; n] optimizations) - #49563 (add a dist builder to build rust-std components for the THUMB targets) - #49654 (Host compiler documentation: Include private items) - #49667 (Add more features to rust_2018_preview) - #49674 (ci: Remove x86_64-gnu-incremental builder) Failed merges:

rust-highfive assigned kennytm Mar 2, 2018

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 2, 2018

llogiq changed the title ~~Add a gerneic CAS loop to std::sync::Atomic?size~~ Add a generic CAS loop to std::sync::Atomic?size Mar 2, 2018

ranma42 reviewed Mar 2, 2018

kennytm added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Mar 2, 2018

kennytm reviewed Mar 2, 2018

0xAX reviewed Mar 5, 2018

kennytm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 5, 2018

llogiq changed the title ~~Add a generic CAS loop to std::sync::Atomic?size~~ Add a generic CAS loop to std::sync::Atomic* Mar 17, 2018

llogiq force-pushed the no-more-cas branch 3 times, most recently from 3589b35 to b87c545 Compare March 27, 2018 12:57

llogiq force-pushed the no-more-cas branch from b87c545 to c419dc5 Compare March 27, 2018 19:49

llogiq force-pushed the no-more-cas branch 4 times, most recently from dde148c to 2bc4859 Compare March 28, 2018 05:27

kennytm reviewed Mar 28, 2018

llogiq force-pushed the no-more-cas branch from 2bc4859 to 91e52b0 Compare March 30, 2018 10:16

llogiq force-pushed the no-more-cas branch from 91e52b0 to 0f5e419 Compare March 30, 2018 10:27

Amanieu mentioned this pull request Mar 31, 2018

Add AtomicCell crossbeam-rs/crossbeam-utils#13

Merged

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 5, 2018

kennytm mentioned this pull request Apr 5, 2018

Rollup of 9 pull requests #49684

Merged

bors merged commit 0f5e419 into rust-lang:master Apr 5, 2018

llogiq deleted the no-more-cas branch April 6, 2018 01:35
