Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for Cell::update #50186

Open
ghost opened this issue Apr 23, 2018 · 51 comments
Open

Tracking issue for Cell::update #50186

ghost opened this issue Apr 23, 2018 · 51 comments
Labels
B-unstable Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. Libs-Small Libs issues that are considered "small" or self-contained Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@ghost
Copy link

ghost commented Apr 23, 2018

This issue tracks the cell_update feature.

The feature adds Cell::update, which allows one to more easily modify the inner value.
For example, instead of c.set(c.get() + 1) now you can just do c.update(|x| x + 1):

let c = Cell::new(5);
let new = c.update(|x| x + 1);

assert_eq!(new, 6);
assert_eq!(c.get(), 6);
@ghost ghost mentioned this issue Apr 23, 2018
@sfackler sfackler added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. B-unstable Implemented in the nightly compiler and unstable. labels Apr 24, 2018
@oli-obk
Copy link
Contributor

oli-obk commented Apr 24, 2018

Not a blocking issue for merging this as an unstable feature, so I'm registering my concern here:

The update function returning its new value feels very ++i-ish. Similar to += returning the result of the operation (which rust luckily does not!). I don't think we should be doing this. Options I see:

  • return () (that was the suggestion in the PR)
  • make the closure argument a &mut T, and allow the closure to return any value.
    • if users want the "return new/old value" behaviour, they can now encode it themselves via |x| {*x += 1; *y }

Maybe there are other options?

Data point: the volatile crate's update method returns () (https://github.com/embed-rs/volatile/blob/master/src/lib.rs#L134)

@ghost
Copy link
Author

ghost commented Apr 24, 2018

@oli-obk

I like the second option (taking a &mut T in the closure). There's an interesting variation on this option - instead of requiring T: Copy we could require T: Default to support updates like this one:

let c = Cell::new(Vec::new());
c.update(|v| v.push("foo"));

Perhaps we could have two methods with different names that clearly reflect how they internally work?

fn get_update<F>(&self, f: F) where T: Copy, F: FnMut(&mut T);
fn take_update<F>(&self, f: F) where T: Default, F: FnMut(&mut T);

@ghost ghost mentioned this issue May 2, 2018
@ghost
Copy link
Author

ghost commented May 2, 2018

What do you think, @SimonSapin? Would two such methods, get_update and take_update, be a more appropriate interface?

@SimonSapin
Copy link
Contributor

The T: Default flavor feels kinda awkward but I don’t know how to put that in more concrete terms, sorry :/

It looks like there is a number of slightly different possible APIs for this. I don’t have a strong opinion on which is (or are) preferable.

bors added a commit that referenced this issue Jul 23, 2018
Implement rfc 1789: Conversions from `&mut T` to `&Cell<T>`

I'm surprised that RFC 1789 has not been implemented for several months. Tracking issue: #43038

Please note: when I was writing tests for `&Cell<[i32]>`, I found it is not easy to get the length of the contained slice. So I designed a `get_with` method which might be useful for similar cases. This method is not designed in the RFC, and it certainly needs to be reviewed by core team. I think it has some connections with `Cell::update` #50186 , which is also in design phase.
@CodeSandwich
Copy link

CodeSandwich commented Aug 6, 2018

Taking &mut in this method would be awesome, I would love to use it.

Unfortunately it seems that it would have to be unsafe, because AFAIK there is no way in Rust to restrict usage like this:

let c = Cell::new(vec![123]);
c.update(|vec| {
    let x = &mut vec[0];            // x references 123
    c.update(|vec| vec.clear());    // x references uninitialised memory
    *x = 456;                       // BOOM! undefined behavior
})

I hope, I'm wrong.

@SimonSapin
Copy link
Contributor

@CodeSandwich it is a fundamental restriction of Cell<T> that a reference &T or &mut T (or anything else inside of T) must not be given to the value inside the cell. It’s what makes it sound. (This is the difference with RefCell.)

We could make an API that takes a closure that takes &mut T, but that reference would not be to the inside of the cell. It would be to value that has been copied (with T: Copy) or moved (and temporarily replaced with some other value, with T: Default) onto the stack and will be copied/moved back when the closure returns. So code like your example would be arguably buggy but still sound: the inner update() would operate on a temporary value that would be overwritten at the end of the outer one.

@CodeSandwich
Copy link

CodeSandwich commented Aug 13, 2018

I think, that a modifier method might be safe if we just forbid usage of reference to Cell's self inside. Rust does not have mechanism for forbidding usage of a SPECIFIC reference, but it does have mechanism for forbidding usage of ALL references: 'static closures.

I've created a simple implementation of such method and I couldn't find any hole in it. It's a limited solution, but it's enough for many use cases including simple number and collections modifications.

@SimonSapin
Copy link
Contributor

A 'static closure could still reach the Cell, for example by owning a Rc.

@CodeSandwich
Copy link

That's absolutely right :(

@XAMPPRocky XAMPPRocky added the C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. label Oct 2, 2018
@earthengine
Copy link

earthengine commented Oct 3, 2018

It seems like, the semantic are a bit different between Default and Copy: the later is more light weighted, as it does not have to fill the hole behind with some temporary rubbish.

What shall we name those variants? update_move for the Default case?

Also there is another variation: update_unsafe which will be unsafe and does not require a trait bound: it only moves the value and leave uninitialized value behind, before the method returns. This will make any nested calls UB and so have to be marked as unsafe.

pub unsafe fn update_unsafe<F>(&self, f: F) -> T where F: FnOnce(T) -> T;

@SimonSapin
Copy link
Contributor

if you’re gonna use unsafe anyway you can use .get() and manipulate the raw pointer. I don’t think we should facilitate it more than that.

@yuriks
Copy link
Contributor

yuriks commented Oct 15, 2018

IMO, closure receiving a &mut to a stack copy isn't that advantageous ergonomically in terms of offering flexibility. You still need to jump through hoops to get the old value if you want to. Consider:

let id = next_id.update(|x| { let y = *x; *x += 1; y });

I think the most universally applicable thing to do would be to offer all of update, get_and_update and update_and_get:

things.update(|v| v.push(4)); // returns ()
// Alternative with v by-move. I'd guess this one isn't as useful, and you can easily use
// mem::swap or mem::replace in the by-ref version instead if you need it.
things.update(|v| { v.push(4); v });
let id = next_id.get_and_update(|x| x + 1);
let now = current_time.update_and_get(|x| x + 50);

These are 3 additional names, but I think they'll make the function more useful. A by-mut-ref update would only be useful for complex types where you're likely to do in-place updates, but I find myself doing the .get() -> update/keep value -> .set() pattern a lot with Cell.

@peterjoel
Copy link
Contributor

Related conversation on a similar PR (for RefCell): #38741

@matthew-mcallister
Copy link
Contributor

matthew-mcallister commented Jan 6, 2020

Has it been considered to use an arbitrary return type on top of updating the value? That is (naming aside),

fn update1<F>(&self, f: F) where F: FnOnce(T) -> T;
fn update2<F, R>(&self, f: F) -> R where F: FnOnce(T) -> (T, R);

The second method has a very functional feel to it and appears both flexible and sound (i.e. you can't return &T). You could get either ++i or i++ behavior out of it. Also it is the same whether T: Copy or not. Though ergonomics are of course debatable.

EDIT: Something to chew on:

fn dup<T: Copy>(x: T) -> (T, T) { (x, x) }
let i = i.update2(|i| (i + 1, i)); // i++
let i = i.update2(|i| dup(i + 1)); // ++i

@Lucretiel
Copy link
Contributor

+1 for a T version, as opposed to &mut T. It's easy to construct demo code that is more expressive with either version (ie, x + 1 vs vec.push(..)), but unless we add both, I think the more general version is preferable, since under the hood they both require a swap-out swap-in for soundness. Also +1 for an extra version with a return value.

Finally, I think Default is preferable to Copy. My gut is that it covers a wider range of cases; that is, it's more likely for a type to implement Default but not Copy than the other way around, and hopefully for simple cases the compiler should be able to optimize out the extra writes anyway.

@camsteffen
Copy link
Contributor

It sounds like accepting &mut T isn't a viable option.

I'm sort of combining previous ideas here. How about this?

fn update<F>(&self, f: F) where T: Copy, F: FnOnce(T) -> T;
fn update_map<F, U>(&self, f: F) -> U where T: Copy, F: FnOnce(T) -> (T, U);
fn take_update<F>(&self, f: F) where T: Default, F: FnOnce(T) -> T;
fn take_update_map<F, U>(&self, f: F) -> U where T: Default, F: FnOnce(T) -> (T, U);

@m-ou-se
Copy link
Member

m-ou-se commented Feb 29, 2020

A completely different option could be to not take a function as argument, but return a 'smart pointer' object that contains the taken/copied value and writes it back to the cell when it is destructed.

Then you could write

    a.take_update().push(1);

    *b.copy_update() += 1;

[Demo implementation]

@fogti
Copy link
Contributor

fogti commented Mar 2, 2020

@m-ou-se I think that would be unsafe suprising because multiple instances of the 'smart pointer' / guard could exist simultaneously.

        let mut cu1 = b.copy_update();
        let mut cu2 = b.copy_update();
        *cu1 += 1;
        *cu2 += 1;
        *cu1 += *cu2;

https://play.rust-lang.org/?version=beta&mode=debug&edition=2018&gist=f3fc67425f1b7f1dba1220fb961ad7d9

@Lucretiel
Copy link
Contributor

Lucretiel commented Mar 2, 2020

The proposed implementation appears to move the value out of the cell and into the smart pointer; pointer in this case is a misnomer. This will lead to the old c++ auto_ptr problem: not unsound, but surprising in nontrivial use cases (ie, if two exist at the same time, one will receive a garbage value)

@m-ou-se
Copy link
Member

m-ou-se commented Mar 2, 2020

None of the proposed solutions (with a T -> T lambda, with a &mut T -> () lambda, or with a 'smart pointer'/RaiiGuard/younameit, etc.) are unsafe. All of them can be implemented using only safe code.

All of them have this 'update while updating' problem. That's not inherent to a specific solution, but inherent to Cell. There's just no way to 'lock' access to a Cell. That's the entire point of a Cell. Rust has a built-in solution for this problem in general: only allowing mutation through a provably uniquely borrowed &mut. Cell is the way to explicitly disable that feature.

With the current (unstable) update() function: [Demo]

    let a = Cell::new(0);
    a.update(|x| {
        a.update(|x| x + 1);
        x + 1
    });
    assert_eq!(a.get(), 1); // not 2

With a &mut:

    let a = Cell::new(0);
    a.update(|x| {
        a.update(|x| *x += 1);
        *x += 1;
    });
    assert_eq!(a.get(), 1); // not 2

With a copy-containing 'smart pointer':

    let a = Cell::new(0);
    {
        let mut x = a.copy_update();
        *a.copy_update() += 1;
        *x += 1;
    }
    assert_eq!(a.get(), 1); // not 2

There is no way to avoid this problem. Every possible solution we come up with will have this problem. All we can do is try to make it easier/less verbose to write the commonly used .get() + .set() (or .take() + .set()) combination.

@camsteffen
Copy link
Contributor

camsteffen commented Mar 2, 2020

Then I suppose the behavior of nested updates needs to at least be well defined.

I would think that the update method opens the door for compiler optimizations, and that could be part of the motivation for adding the method. For example, shouldn't we be able to optimize away Default::default() in some cases?

@m-ou-se
Copy link
Member

m-ou-se commented Mar 2, 2020

With all proposed solutions, 'nested' updates are well defined.

It isn't needed for optimization. The optimizer will understand a .get()+.set() (or .take()+.set()) just as well. An update function is just to make it shorter to write, and possibly to make it easier to write correct code.

In most cases, the optimizer will be able optimize the Default::default() value away, sure. However, if you're calling a (non-inline) function from another crate, it can not always know whether the function can panic or read from the Cell, so it can't optimize away writing the temporary placeholder.

[Demo]

@camsteffen
Copy link
Contributor

The "smart pointer" idea is really neat. It just seems a little too obfuscated since the write back to the Cell is hidden. As soon as somebody decides to bind the "smart pointer" to a variable, it gets more confusing and the user probably should have used get and set instead. With a lambda, the timing of the update to the Cell is clear.

@camsteffen
Copy link
Contributor

camsteffen commented Mar 2, 2020

I second the motion that we replace update with get_update and take_update, but with lambda type T -> T. The need for having both has been demonstrated. The value must be copied or taken to supply the lambda. I think the API should just be upfront about that fact. This will alleviate the risk that "nested" updates cause surprising behavior. It's also very consistent with the current API.

The smart pointer idea could be considered as a separate enhancement, unless people feel that that should be the main implementation of update.

@Lucretiel
Copy link
Contributor

What if we created a way to tell the borrow checker that a borrow is "exclusive", similar to a mutable borrow?

By definition, a mutable borrow is an exclusive borrow.

@rrevenantt
Copy link

Can't we require closure to be Send to make variant with &mut T sound?

fn update<F>(&self, f: F) where F: Send + FnOnce(&mut T);

Since Cell is !Sync anything that can contain reference to Cell must be !Send which prevents any kind of reference to Cell from appearing inside the closure, thus trivially preventing any recursive calls on the same Cell. Here is a playground that fails to compile on "update while updating". Error message is confusing, though, because threads are not involved here, but with proper documentation it shouldn't be a problem.

@RustyYato
Copy link
Contributor

RustyYato commented Jun 9, 2020

This is an abuse of Send to get desired results. Send should only be used in the context of threading, and so it doesn't fit here. Secondly, it would prevent many valid use-cases for example, you can't capture a Rc<_> (which you would probably do if you are using a Cell).

@glaebhoerl
Copy link
Contributor

@rrevenantt I brought up the same idea several years ago; unfortunately it's still not sufficient for soundness. The closure could still refer to a Cell through a thread-local variable, which is the only specific example I remember (but one is sufficient to demonstrate unsoundness), but IIRC, there were other ways to circumvent the restriction as well.

@RustyYato Please try to keep an open mind. I'm sure there have been other cases in Rust's history where something originally intended for one purpose turned out to be useful for another one as well, and in general it can be a fruitful and enlightening direction to explore. This also would not have removed any of Cell's existing methods, only added a new one with a different tradeoff.

you can't capture a Rc<_> (which you would probably do if you are using a Cell)

(And furthermore, if you are using a Cell then you couldn't call the method, so this logic is backwards.)

@KodrAus KodrAus added Libs-Small Libs issues that are considered "small" or self-contained Libs-Tracked Libs issues that are tracked on the team's project board. labels Jul 29, 2020
@Kevan0v
Copy link

Kevan0v commented Aug 23, 2020

Any reason why the API for RefCell and Cell is so inconsistent?

RefCell has a method replace_with similar to the method update for Cell but return the old value instead.

Shouldn't it be named update_with and be available on both or at least have the methods update and replace_with available for both?

@SimonSapin
Copy link
Contributor

Today I’m writing a helper method (local to my crate) that does the same as Cell::update. Having it available on Stable directly on Cell would be nice.

There are a few possible variations of the API, but at this point more time on Nightly is not gonna bring much new information about which is better.


What if we created a way to tell the borrow checker that a borrow is "exclusive", similar to a mutable borrow?

It’s not just similar, an exclusive borrow is exactly what &mut T is. Before 1.0 there was a proposal to rename &mut to &uniq: https://smallcultfollowing.com/babysteps/blog/2014/05/13/focusing-on-ownership/ . I tend to call &T shared borrow and &mut T exclusive borrow even though the syntax still uses mut.

Exclusive borrows don’t help with this issue, though. The whole point of Cell and friends is to allow mutation through shared borrows. If you have &mut Cell<T> you can call .get_mut() on it to get a &mut T.

@danielhenrymantilla
Copy link
Contributor

danielhenrymantilla commented Sep 23, 2021

Regarding naming, what about .set_with() (and a copy / get counterpart)? These would be T -> T closures, which, thanks to the given naming, would be less "C-ish" than the initial concern pointed out regarding the update name: cell.set_with(|x| x + 1);

Also, the fact that we get a T -> T closure is way more honest / less 🦶🔫y w.r.t. the semantics of the function under the hood:

let counter: Cell<i32> = Cell::new(0);
counter.set_with(|c: i32| {
    counter.set_with(|c| c + 1);
    c + 1 // <- clearer that this is the *original* value + 1
});
assert_eq!(counter.get(), 1);

The only drawback of this approach is that since it uses the return slot for the update, extracting an arbitrary value from the closure is a bit more cumbersome. I think this is an acceptable tradeoff, given the name .…set… which emphasizes the aspect of being an overwrite which aims at overwriting the cell innards; for those wanting not only to mutate that value but also to compute something off either the old or the new value, they can either use an "Option second return slot", or they'll have to use an "extra" .get() (e.g. keep using .get()/.take() with .set(), or using .get() after the set_with), which, for the task in question, seems like a very legitimate thing to do (and nothing worse than what currently needs to be done).

Finally, regarding those who like the ergonomics of a &mut T update operation over a "mut" T -> T one, that definitely belongs to a more general "irk" in Rust which is currently tackled by things like .tap_mut() (while I'm not fond of that name and/or of the fact the more useful &mut based API is longer to type than the occasional &-based API, the API itself, that is, the (Self, FnOnce(&mut Self)) -> Self operation, is incredibly handy): that is an addition that ought to eventually be considered / discussed for the stdlib, but one which belongs to another thread and whatnot.

TL,DR

If C++ is any example, the footgun that a &mut T-based API provides is not worth, imho, the minor ergonomic benefits, so we should stick to the clearer T -> T API, and amend the name accordingly.

@m-ou-se
Copy link
Member

m-ou-se commented Sep 29, 2021

The 'copy vs take' issue was discussed in the library api team meeting recently. We concluded that having this functionality for only for Copy types or only for Default types would be too restrictive, but that having two versions of this function would also not be the best option. The difference between e.g. copy_update and take_update would be very subtle and in many cases not matter at all to the user, making them more confusing than necessary.

We'd like to explore the possibility of having a single function that works on both Copy types and Default types, perhaps with some marker trait with overlapping implementations. For types that implement both of these traits, the function would use the copying behaviour, as that seems to be the least surprising. It'd only .take() the value if it can't be copied.

@danielhenrymantilla
Copy link
Contributor

Oh yeah, that would be wonderful, and I agree that Copying is a more sensible default than take()ing

@camsteffen
Copy link
Contributor

camsteffen commented Sep 29, 2021

So, just to see what that might look like...

/// Automatically implemented for types that implement either `Copy` or `Default`
trait CopyOrDefault {
    /// Copies the value if `Self: Copy`, otherwise replaces with `default()` and returns the previous value
    fn copy_or_take(&mut self) -> Self;
}

impl Cell<T> {
    fn update(&self, f: impl Fn(T) -> T)
    where T: CopyOrDefault {..}
}

@m-ou-se
Copy link
Member

m-ou-se commented Sep 29, 2021

Here's an implementation of this idea: #89357

@m-ou-se
Copy link
Member

m-ou-se commented Sep 29, 2021

So, just to see what that might look like...

@camsteffen The -> T will have to be removed from update, because it can no longer return a copy of the new value if the value is not necessarily Copy anymore.

@camsteffen
Copy link
Contributor

I kinda like the "unstable trickery" rather than exposing a new marker trait that seems really niche. We could always expose the marker trait in the future if wanted.

@comex
Copy link
Contributor

comex commented Sep 30, 2021

But it does commit Rust to having a form of specialization capable of expressing "A or B" for unrelated A and B… are there any other standard library APIs that do this?

@camsteffen
Copy link
Contributor

I opened a related discussion at https://internals.rust-lang.org/t/pre-rfc-assign-copy-types-to-cell-without-set/16081 - maybe we can enable the assignment operator for Copy types, and update can just work for Default types.

@psionic12
Copy link

psionic12 commented May 2, 2022

Currently, the implementation use Cell::get() to get the original value, which makes a copy, is it reasonable to use MaybeUninit to replace the old value, since a new value will be set back after all?

@danielhenrymantilla
Copy link
Contributor

@psionic12 perf-wise, a mem::replace cannot be less expensive than a Copy (the original copy is still there; and there is potentially a second write unless elided out). API-wise, remember that .set_with() can be re-entrant, so the nested call still needs to see a well-formed value 😬. So, a MaybeUninit-based approach, on top of requiring a change to all Cells so that they wrap a MaybeUninit for soundness, would lead to a problematic situation (need to panic?) when facing re-entrancy like this, which seems quite at odds with Cell's design philosophy 🙂

@psionic12
Copy link

psionic12 commented May 2, 2022

@danielhenrymantilla I'm sorry I forgot the "re-entrant" thing, if someone somehow needs to access the cell again in f(), bad things happens :(, thanks for pointing out. Design a stable API is really harder than I think.

@matklad
Copy link
Member

matklad commented May 20, 2023

To add one more color:

fn swap_with<F>(&self, f: F) where T: Default, F: FnMut(T) -> T;

"swap" in the name gives intuition about what happens under the hood

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-unstable Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. Libs-Small Libs issues that are considered "small" or self-contained Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests