Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a generic CAS loop to std::sync::Atomic* #48658

Merged
merged 1 commit into from
Apr 5, 2018

Conversation

llogiq
Copy link
Contributor

@llogiq llogiq commented Mar 2, 2018

This adds two new methods to both AtomicIsize and AtomicUsize with optimized safe compare-and-set loops, so users will no longer need to write their own, except in very strange circumstances.

update_and_fetch will apply the function and return its result, whereas fetch_and_update will apply the function and return the previous value.

This solves #48384 with x.update_and_fetch(|x| x.max(y)). It also relates to #48655 (which I misuse as tracking issue for now)..

note This might need a crater run because the functions could clash with third party extension traits.

@rust-highfive
Copy link
Collaborator

r? @kennytm

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 2, 2018
@llogiq llogiq changed the title Add a gerneic CAS loop to std::sync::Atomic?size Add a generic CAS loop to std::sync::Atomic?size Mar 2, 2018
/// Examples:
///
/// ```rust
/// use std::sync::atömic::{AtomicIsize, Ordering};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

atömic?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a heavy metal atom 😆

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I swear it was a typo.

/// Examples:
///
/// ```rust
/// use std::sync::atömic::{AtomicIsize, Ordering};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

atömic?

@kennytm kennytm added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Mar 2, 2018
@@ -1268,6 +1268,70 @@ macro_rules! atomic_int {
}
}

/// fetch the value, apply a function to it and return the result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalize the F, and also use the singular form.

Fetches the value, applies a function to it and returns the result.

/// Note: This may call the function multiple times if the value has been
/// changed from other threads in the meantime.
///
/// Examples:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Examples should be a section header, followed by a summary what the example does.

/// # Examples
///
/// Compute the maximum of a slice, where each update step is done atomically.

let mut prev = self.get(Ordering::Relaxed);
loop {
let next = f(prev);
match compare_exchange_weak(self, prev, next, Ordering::SeqCst, Ordering::Relaxed) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any chance the user wants to use ordering other than SeqCst/Relaxed? (Similar question for the self.get(Relaxed) above)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely yes. Current implementation seems to be always right, however it might end up doing more loops than strictly necessary if some stricter ordering was used for get… I think. And SeqCst isn’t always necessary for CEW, but once you go for a weaker ordering there, it is important to be able to specify a stronger one for get and failure cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nagisa Benchmarks in other languages show that using the most relaxed ordering and allowing for more loops achieves for better throughput. However, as this is not other languages, I'll see if I can come up with some benchmarks for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@llogiq it is difficult to generalise the benchmarks like this. The article you’ve linked to is benchmarking on x86, which will have different performance characteristics compared to an architecture where the memory model is weaker. This is exactly why we provide means to specify the orderings everywhere.

@nagisa
Copy link
Member

nagisa commented Mar 3, 2018

Furthermore, I couldn’t manage to convince LLVM to generate as good a code with manual CAS loop compared to just calling the atomic_max intrinsic.

Here’s the code for the atomic_max intrinsic for the ARMv7 architecture:

	dmb	ish
.LBB1_1:
	ldrexd	r4, r5, [r0]
	subs	r1, r2, r4
	sbcs	r1, r3, r5
	mov	r1, #0
	movwlt	r1, #1
	cmp	r1, #0
	mov	r1, r3
	moveq	r4, r2
	movne	r1, r5
	mov	r5, r1
	strexd	r1, r4, r5, [r0]
	cmp	r1, #0
	bne	.LBB1_1
	dmb	ish

and something that gets generated when implementing the same-ish functionality using the CAS loop:

	push	{r4, r5, r6, r7, r8, r9, r10, r11, lr}
	ldrd	r8, r9, [r0]
	ldrexd	r10, r11, [r0]
	eor	r1, r11, r9
	eor	r7, r10, r8
	orr	r5, r7, r1
	subs	r7, r2, r8
	sbcs	r7, r3, r9
	mov	r1, #0
	mov	r7, #0
	movwlo	r7, #1
	cmp	r7, #0
	mov	r7, r3
	movne	r7, r9
	mov	r6, r2
	movne	r6, r8
	cmp	r5, #0
	bne	.LBB2_4
	dmb	ish
.LBB2_2:
	strexd	r5, r6, r7, [r0]
	cmp	r5, #0
	beq	.LBB2_6
	ldrexd	r10, r11, [r0]
	eor	r5, r10, r8
	eor	r4, r11, r9
	orrs	r5, r5, r4
	beq	.LBB2_2
.LBB2_4:
	clrex
	cmp	r1, #0
	beq	.LBB2_8
.LBB2_5:
	pop	{r4, r5, r6, r7, r8, r9, r10, r11, pc}
.LBB2_6:
	dmb	ish
	mov	r1, #1
	cmp	r1, #0
	beq	.LBB2_8
	b	.LBB2_5
.LBB2_7:
	mov	r10, r6
	mov	r11, r7
	cmp	r1, #0
	bne	.LBB2_5
.LBB2_8:
	ldrexd	r6, r7, [r0]
	mov	r9, r3
	eor	r1, r6, r10
	eor	r5, r7, r11
	orr	r1, r1, r5
	subs	r5, r2, r10
	sbcs	r5, r3, r11
	mov	r5, #0
	movwlo	r5, #1
	cmp	r5, #0
	movne	r9, r11
	mov	r8, r2
	movne	r8, r10
	cmp	r1, #0
	bne	.LBB2_12
	dmb	ish
.LBB2_10:
	strexd	r1, r8, r9, [r0]
	cmp	r1, #0
	beq	.LBB2_13
	ldrexd	r6, r7, [r0]
	eor	r1, r6, r10
	eor	r4, r7, r11
	orrs	r1, r1, r4
	beq	.LBB2_10
.LBB2_12:
	mov	r1, #0
	clrex
	b	.LBB2_7
.LBB2_13:
	dmb	ish
	mov	r1, #1
	b	.LBB2_7

So it seems to me that it would be worthwhile to add a binding for such a function anyway.

}
}

/// Fetches the value, applies a function to it and returns the prvious value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/prvious/previous

@kennytm kennytm added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 5, 2018
@pietroalbini
Copy link
Member

Ping from triage @llogiq! Could you finish this PR so it can be merged?

@llogiq
Copy link
Contributor Author

llogiq commented Mar 12, 2018

I'm a bit unsure how to progress:

  • Should I add the Orderings for each operation as arguments? Or unify the load orderings?
  • Should I add the atomic_min / atomic_max intrinsics, too?

@nagisa
Copy link
Member

nagisa commented Mar 12, 2018

I think we should add two ordering arguments -- one for CAS success case and the other for for failure case (also used for initial load). We should expose the min and max methods separately as well IMO.

Also cc @Amanieu, who may have opinions about this stuff.

@kennytm
Copy link
Member

kennytm commented Mar 12, 2018

At the very least one should be able to choose an ordering other than SeqCst. There are three orderings involved here (load, compare_exchange_weak's success + failure). If one can prove that any one can be always Relaxed or any two ordering must be equivalent, then we don't need to expose that ordering to the API.

@Amanieu
Copy link
Member

Amanieu commented Mar 12, 2018

@kennytm The ordering for the initial load and for the CAS failure are the same: in both cases this is used to load a value for the next loop iteration.

For the API itself, I think something like this would be better:

fn transaction<F: FnMut($int_type) -> Option<$int_type>>(&self, f: F, success: Ordering, failure: Ordering) -> bool;

The closure can return None, which aborts the operation: the value of the atomic is not changed.

@Amanieu
Copy link
Member

Amanieu commented Mar 12, 2018

I'll also leave this as an example of prior work (in C++).

@Amanieu
Copy link
Member

Amanieu commented Mar 17, 2018

@llogiq I'm not convinced that we need both fetch_and_update and update_and_fetch, hence my suggestion above to only have a single method (transaction). In both cases the old/new value can easily be extracted from the closure, so I think that the previous value should be returned for consistency with the other atomic operations.

@llogiq
Copy link
Contributor Author

llogiq commented Mar 17, 2018

@Amanieu in that case I think it'd be prudent to have an example of extracting the new value from the closure. How would you do that? Setting a mutable binding outside? We also should look at the generated assembly for that case.

@Amanieu
Copy link
Member

Amanieu commented Mar 17, 2018

Yes, basically something like this:

let mut new = 0;
ATOMIC.fetch_and_update(|x| {new = x + 1; new}, Ordering::Relaxed, Ordering::Relaxed);

You need to replace Fn with FnMut in the function definition, there's no reason not to support FnMut here.

(As a side note, I'm still not a big fan of the fetch_and_update name, have you considered transaction like I suggested above?)

@llogiq
Copy link
Contributor Author

llogiq commented Mar 17, 2018

I'll change the name when I remove update_and_fetch. But I'll be offline for a few hours first.

@kennytm
Copy link
Member

kennytm commented Mar 17, 2018

I strongly oppose to the name transaction, it tells nothing about the intended behavior of the function without looking up the doc. Please keep the name fetch_and_update (or shorten it to fetch_update), it is totally fine.

@llogiq
Copy link
Contributor Author

llogiq commented Mar 17, 2018

I disagree that transaction doesn't tell the intended behavior of the function (though the interface of the C++ method of the same name returns bool for some reason), but to keep with the naming scheme of the Atomic* type methods, fetch_update would be preferrable (fetch_with would be another option, but that's not as telling. Perhaps fetch_modifyor fetch_change?). I'm open for suggestions.

@llogiq llogiq changed the title Add a generic CAS loop to std::sync::Atomic?size Add a generic CAS loop to std::sync::Atomic* Mar 17, 2018
@Amanieu
Copy link
Member

Amanieu commented Mar 17, 2018

After some consideration, I'm OK with the name fetch_update, since it is consistent with the other method names.

I went through a few of the CAS loops in my various crates, and the majority of them are of this form:

loop {
    if some_condition {
        return false;
    }
    match atomic.compare_exchange_weak(
        oldval,
        newval,
        Ordering::Relaxed,
        Ordering::Relaxed,
    ) {
        Ok(_) => return true,
        Err(x) => oldval = x,
    }
}

I therefore think that we should make the closure return an Option<$int_type> where a None will simply break out of the loop without changing the underlying value (this is very similar to the behavior of compare_exchange).

@llogiq
Copy link
Contributor Author

llogiq commented Mar 17, 2018

So the fndecl would be pub fn fetch_update<F>(&self, f: F) ->$int_type where F: FnMut($int_type) -> Option<$int_type>;, right? Returning the previous value in all cases & breaking the loop if f returns None?

@Amanieu
Copy link
Member

Amanieu commented Mar 17, 2018

@llogiq That's correct.

I'm somewhat torn on the return type of fetch_update itself. Should we unconditionally return the previous value or should we wrap it in a Result like the return value of compare_exchange?

@llogiq
Copy link
Contributor Author

llogiq commented Mar 17, 2018

So you'd return either Ok(old_value) or Err(old_value)?

@Amanieu
Copy link
Member

Amanieu commented Mar 17, 2018

Yes

@llogiq llogiq force-pushed the no-more-cas branch 3 times, most recently from 3589b35 to b87c545 Compare March 27, 2018 12:57
@llogiq
Copy link
Contributor Author

llogiq commented Mar 27, 2018

Rebased and squashed.

@kennytm r?

@kennytm
Copy link
Member

kennytm commented Mar 27, 2018

@llogiq The fetch_update method is gone?

@llogiq
Copy link
Contributor Author

llogiq commented Mar 27, 2018

It appears I lost it during the rebase. I'll re-add it.

@llogiq
Copy link
Contributor Author

llogiq commented Mar 27, 2018

Done. Let's see if Travis likes it.

@llogiq llogiq force-pushed the no-more-cas branch 4 times, most recently from dde148c to 2bc4859 Compare March 28, 2018 05:27
@llogiq
Copy link
Contributor Author

llogiq commented Mar 28, 2018

Besides fixing some typos, I had to remove the random number generator, as it doesn't work with types below 32 bits.

@llogiq
Copy link
Contributor Author

llogiq commented Mar 28, 2018

Now with the fetch_update method back, @kennytm r? (also thanks for bearing with me for so long :slight_smile:)


Note: This may call the function multiple times if the value has been changed from other threads in
the meantime, as long as the function returns `Some(_)`. It will however only apply the function
once to the value stored within.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will however only apply the function once to the value stored within.

I don't get what does this mean. Could be clarify this in the code?

Copy link
Contributor Author

@llogiq llogiq Mar 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to say that the function will be applied only once per observed value. I've changed the text accordingly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are using compare_exchange_weak, it could actually be applied more than once even if the value has not changed. For example on ARM, a compare_exchange_weak may spuriously fail due to an interrupt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true. Ok, I've changed the text once more to say that the function will only have been applied once to the stored value. Otherwise people may fear they need to make the function idempotent.

assert!(max_foo == 42);
```"),
#[inline]
#[unstable(feature = "no_more_cas",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we pick a different feature name to fetch_{max, min} (e.g. atomic_min_max). We may have different stabilization schedule between fetch_{min, max} and fetch_update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

This adds a new method to all numeric `Atomic*` types with a
safe compare-and-set loop, so users will no longer need to write
their own, except in *very* strange circumstances.

This solves rust-lang#48384 with `x.fetch_max(_)`/`x.fetch_min(_)`. It
also relates to rust-lang#48655 (which I misuse as tracking issue for now).

*note* This *might* need a crater run because the functions could
clash with third party extension traits.
@llogiq
Copy link
Contributor Author

llogiq commented Mar 31, 2018

Now we have the split feature gate and a better wording for the comments. Travis also seems to like it. r?

@kennytm
Copy link
Member

kennytm commented Apr 5, 2018

@bors r+

@bors
Copy link
Contributor

bors commented Apr 5, 2018

📌 Commit 0f5e419 has been approved by kennytm

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 5, 2018
kennytm added a commit to kennytm/rust that referenced this pull request Apr 5, 2018
Add a generic CAS loop to std::sync::Atomic*

This adds two new methods to both `AtomicIsize` and `AtomicUsize` with optimized safe compare-and-set loops, so users will no longer need to write their own, except in *very* strange circumstances.

`update_and_fetch` will apply the function and return its result, whereas `fetch_and_update` will apply the function and return the previous value.

This solves rust-lang#48384 with `x.update_and_fetch(|x| x.max(y))`. It also relates to rust-lang#48655 (which I misuse as tracking issue for now)..

*note* This *might* need a crater run because the functions could clash with third party extension traits.
bors added a commit that referenced this pull request Apr 5, 2018
Rollup of 9 pull requests

Successful merges:

 - #48658 (Add a generic CAS loop to std::sync::Atomic*)
 - #49253 (Take the original extra-filename passed to a crate into account when resolving it as a dependency)
 - #49345 (RFC 2008: Finishing Touches)
 - #49432 (Flush executables to disk after linkage)
 - #49496 (Add more vec![... ; n] optimizations)
 - #49563 (add a dist builder to build rust-std components for the THUMB targets)
 - #49654 (Host compiler documentation: Include private items)
 - #49667 (Add more features to rust_2018_preview)
 - #49674 (ci: Remove x86_64-gnu-incremental builder)

Failed merges:
@bors bors merged commit 0f5e419 into rust-lang:master Apr 5, 2018
@llogiq llogiq deleted the no-more-cas branch April 6, 2018 01:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants