Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for Cell::update #50186

Open
stjepang opened this issue Apr 23, 2018 · 37 comments
Open

Tracking issue for Cell::update #50186

stjepang opened this issue Apr 23, 2018 · 37 comments

Comments

@stjepang
Copy link
Contributor

@stjepang stjepang commented Apr 23, 2018

This issue tracks the cell_update feature.

The feature adds Cell::update, which allows one to more easily modify the inner value.
For example, instead of c.set(c.get() + 1) now you can just do c.update(|x| x + 1):

let c = Cell::new(5);
let new = c.update(|x| x + 1);

assert_eq!(new, 6);
assert_eq!(c.get(), 6);
@oli-obk
Copy link
Contributor

@oli-obk oli-obk commented Apr 24, 2018

Not a blocking issue for merging this as an unstable feature, so I'm registering my concern here:

The update function returning its new value feels very ++i-ish. Similar to += returning the result of the operation (which rust luckily does not!). I don't think we should be doing this. Options I see:

  • return () (that was the suggestion in the PR)
  • make the closure argument a &mut T, and allow the closure to return any value.
    • if users want the "return new/old value" behaviour, they can now encode it themselves via |x| {*x += 1; *y }

Maybe there are other options?

Data point: the volatile crate's update method returns () (https://github.com/embed-rs/volatile/blob/master/src/lib.rs#L134)

@stjepang
Copy link
Contributor Author

@stjepang stjepang commented Apr 24, 2018

@oli-obk

I like the second option (taking a &mut T in the closure). There's an interesting variation on this option - instead of requiring T: Copy we could require T: Default to support updates like this one:

let c = Cell::new(Vec::new());
c.update(|v| v.push("foo"));

Perhaps we could have two methods with different names that clearly reflect how they internally work?

fn get_update<F>(&self, f: F) where T: Copy, F: FnMut(&mut T);
fn take_update<F>(&self, f: F) where T: Default, F: FnMut(&mut T);
@stjepang
Copy link
Contributor Author

@stjepang stjepang commented May 2, 2018

What do you think, @SimonSapin? Would two such methods, get_update and take_update, be a more appropriate interface?

@SimonSapin
Copy link
Contributor

@SimonSapin SimonSapin commented May 14, 2018

The T: Default flavor feels kinda awkward but I don’t know how to put that in more concrete terms, sorry :/

It looks like there is a number of slightly different possible APIs for this. I don’t have a strong opinion on which is (or are) preferable.

bors added a commit that referenced this issue Jul 23, 2018
Implement rfc 1789: Conversions from `&mut T` to `&Cell<T>`

I'm surprised that RFC 1789 has not been implemented for several months. Tracking issue: #43038

Please note: when I was writing tests for `&Cell<[i32]>`, I found it is not easy to get the length of the contained slice. So I designed a `get_with` method which might be useful for similar cases. This method is not designed in the RFC, and it certainly needs to be reviewed by core team. I think it has some connections with `Cell::update` #50186 , which is also in design phase.
@CodeSandwich
Copy link

@CodeSandwich CodeSandwich commented Aug 6, 2018

Taking &mut in this method would be awesome, I would love to use it.

Unfortunately it seems that it would have to be unsafe, because AFAIK there is no way in Rust to restrict usage like this:

let c = Cell::new(vec![123]);
c.update(|vec| {
    let x = &mut vec[0];            // x references 123
    c.update(|vec| vec.clear());    // x references uninitialised memory
    *x = 456;                       // BOOM! undefined behavior
})

I hope, I'm wrong.

@SimonSapin
Copy link
Contributor

@SimonSapin SimonSapin commented Aug 7, 2018

@CodeSandwich it is a fundamental restriction of Cell<T> that a reference &T or &mut T (or anything else inside of T) must not be given to the value inside the cell. It’s what makes it sound. (This is the difference with RefCell.)

We could make an API that takes a closure that takes &mut T, but that reference would not be to the inside of the cell. It would be to value that has been copied (with T: Copy) or moved (and temporarily replaced with some other value, with T: Default) onto the stack and will be copied/moved back when the closure returns. So code like your example would be arguably buggy but still sound: the inner update() would operate on a temporary value that would be overwritten at the end of the outer one.

@CodeSandwich
Copy link

@CodeSandwich CodeSandwich commented Aug 13, 2018

I think, that a modifier method might be safe if we just forbid usage of reference to Cell's self inside. Rust does not have mechanism for forbidding usage of a SPECIFIC reference, but it does have mechanism for forbidding usage of ALL references: 'static closures.

I've created a simple implementation of such method and I couldn't find any hole in it. It's a limited solution, but it's enough for many use cases including simple number and collections modifications.

@SimonSapin
Copy link
Contributor

@SimonSapin SimonSapin commented Aug 13, 2018

A 'static closure could still reach the Cell, for example by owning a Rc.

@CodeSandwich
Copy link

@CodeSandwich CodeSandwich commented Aug 13, 2018

That's absolutely right :(

@earthengine
Copy link

@earthengine earthengine commented Oct 3, 2018

It seems like, the semantic are a bit different between Default and Copy: the later is more light weighted, as it does not have to fill the hole behind with some temporary rubbish.

What shall we name those variants? update_move for the Default case?

Also there is another variation: update_unsafe which will be unsafe and does not require a trait bound: it only moves the value and leave uninitialized value behind, before the method returns. This will make any nested calls UB and so have to be marked as unsafe.

pub unsafe fn update_unsafe<F>(&self, f: F) -> T where F: FnOnce(T) -> T;
@SimonSapin
Copy link
Contributor

@SimonSapin SimonSapin commented Oct 3, 2018

if you’re gonna use unsafe anyway you can use .get() and manipulate the raw pointer. I don’t think we should facilitate it more than that.

@yuriks
Copy link
Contributor

@yuriks yuriks commented Oct 15, 2018

IMO, closure receiving a &mut to a stack copy isn't that advantageous ergonomically in terms of offering flexibility. You still need to jump through hoops to get the old value if you want to. Consider:

let id = next_id.update(|x| { let y = *x; *x += 1; y });

I think the most universally applicable thing to do would be to offer all of update, get_and_update and update_and_get:

things.update(|v| v.push(4)); // returns ()
// Alternative with v by-move. I'd guess this one isn't as useful, and you can easily use
// mem::swap or mem::replace in the by-ref version instead if you need it.
things.update(|v| { v.push(4); v });
let id = next_id.get_and_update(|x| x + 1);
let now = current_time.update_and_get(|x| x + 50);

These are 3 additional names, but I think they'll make the function more useful. A by-mut-ref update would only be useful for complex types where you're likely to do in-place updates, but I find myself doing the .get() -> update/keep value -> .set() pattern a lot with Cell.

@peterjoel
Copy link
Contributor

@peterjoel peterjoel commented May 13, 2019

Related conversation on a similar PR (for RefCell): #38741

@matthew-mcallister
Copy link
Contributor

@matthew-mcallister matthew-mcallister commented Jan 6, 2020

Has it been considered to use an arbitrary return type on top of updating the value? That is (naming aside),

fn update1<F>(&self, f: F) where F: FnOnce(T) -> T;
fn update2<F, R>(&self, f: F) -> R where F: FnOnce(T) -> (T, R);

The second method has a very functional feel to it and appears both flexible and sound (i.e. you can't return &T). You could get either ++i or i++ behavior out of it. Also it is the same whether T: Copy or not. Though ergonomics are of course debatable.

EDIT: Something to chew on:

fn dup<T: Copy>(x: T) -> (T, T) { (x, x) }
let i = i.update2(|i| (i + 1, i)); // i++
let i = i.update2(|i| dup(i + 1)); // ++i
@Lucretiel
Copy link
Contributor

@Lucretiel Lucretiel commented Jan 17, 2020

+1 for a T version, as opposed to &mut T. It's easy to construct demo code that is more expressive with either version (ie, x + 1 vs vec.push(..)), but unless we add both, I think the more general version is preferable, since under the hood they both require a swap-out swap-in for soundness. Also +1 for an extra version with a return value.

Finally, I think Default is preferable to Copy. My gut is that it covers a wider range of cases; that is, it's more likely for a type to implement Default but not Copy than the other way around, and hopefully for simple cases the compiler should be able to optimize out the extra writes anyway.

@tomaka tomaka mentioned this issue Feb 6, 2020
70 of 81 tasks complete
@camsteffen
Copy link
Contributor

@camsteffen camsteffen commented Feb 24, 2020

It sounds like accepting &mut T isn't a viable option.

I'm sort of combining previous ideas here. How about this?

fn update<F>(&self, f: F) where T: Copy, F: FnOnce(T) -> T;
fn update_map<F, U>(&self, f: F) -> U where T: Copy, F: FnOnce(T) -> (T, U);
fn take_update<F>(&self, f: F) where T: Default, F: FnOnce(T) -> T;
fn take_update_map<F, U>(&self, f: F) -> U where T: Default, F: FnOnce(T) -> (T, U);
@m-ou-se
Copy link
Member

@m-ou-se m-ou-se commented Feb 29, 2020

A completely different option could be to not take a function as argument, but return a 'smart pointer' object that contains the taken/copied value and writes it back to the cell when it is destructed.

Then you could write

    a.take_update().push(1);

    *b.copy_update() += 1;

[Demo implementation]

@zseri
Copy link

@zseri zseri commented Mar 2, 2020

@m-ou-se I think that would be unsafe suprising because multiple instances of the 'smart pointer' / guard could exist simultaneously.

        let mut cu1 = b.copy_update();
        let mut cu2 = b.copy_update();
        *cu1 += 1;
        *cu2 += 1;
        *cu1 += *cu2;

https://play.rust-lang.org/?version=beta&mode=debug&edition=2018&gist=f3fc67425f1b7f1dba1220fb961ad7d9

@Lucretiel
Copy link
Contributor

@Lucretiel Lucretiel commented Mar 2, 2020

The proposed implementation appears to move the value out of the cell and into the smart pointer; pointer in this case is a misnomer. This will lead to the old c++ auto_ptr problem: not unsound, but surprising in nontrivial use cases (ie, if two exist at the same time, one will receive a garbage value)

@m-ou-se
Copy link
Member

@m-ou-se m-ou-se commented Mar 2, 2020

None of the proposed solutions (with a T -> T lambda, with a &mut T -> () lambda, or with a 'smart pointer'/RaiiGuard/younameit, etc.) are unsafe. All of them can be implemented using only safe code.

All of them have this 'update while updating' problem. That's not inherent to a specific solution, but inherent to Cell. There's just no way to 'lock' access to a Cell. That's the entire point of a Cell. Rust has a built-in solution for this problem in general: only allowing mutation through a provably uniquely borrowed &mut. Cell is the way to explicitly disable that feature.

With the current (unstable) update() function: [Demo]

    let a = Cell::new(0);
    a.update(|x| {
        a.update(|x| x + 1);
        x + 1
    });
    assert_eq!(a.get(), 1); // not 2

With a &mut:

    let a = Cell::new(0);
    a.update(|x| {
        a.update(|x| *x += 1);
        *x += 1;
    });
    assert_eq!(a.get(), 1); // not 2

With a copy-containing 'smart pointer':

    let a = Cell::new(0);
    {
        let mut x = a.copy_update();
        *a.copy_update() += 1;
        *x += 1;
    }
    assert_eq!(a.get(), 1); // not 2

There is no way to avoid this problem. Every possible solution we come up with will have this problem. All we can do is try to make it easier/less verbose to write the commonly used .get() + .set() (or .take() + .set()) combination.

@zseri
Copy link

@zseri zseri commented Mar 2, 2020

I think Cell::update should be marked as unsafe (I know this idea was already discarded, but I currently don't know about alternatives, or do we have any other "fitting" attribute which marks functions as potentially suprising/non-trivial with edge cases, e.g. something that marks a function call as "review me more intense, I may have suprising side-effects, but I'm not unsafe"?) even though it doesn't contain unsafe code, because of this non-trivial/possible suprising behavoir, which (I think) doesn't happen when using normal .get/.set methods (at least not "hidden").

// e.g.:
  // current Cell::update function, edge-case behavoir isn't obvious
  let a = Cell::new(0);
  a.update(|x| {
    a.update(|x| *x += 1);
    *x += 1;
  });
  assert_eq!(a.get(), 1);

// versus
  // same workflow without the "blackbox" Cell::update, edge-case behavoir is obvious
  let a = Cell::new(0);
  {
    let old = a.get();
    let new2 = {
      let new1 = old + 1;
      a.set(new1);
      old + 1
    };
    a.set(new2);
  }
  assert_eq!(a.get(), 1);

Do we have some existing use case which would hugely benefit from the current Cell::update?

@m-ou-se
Copy link
Member

@m-ou-se m-ou-se commented Mar 2, 2020

My gut is that it covers a wider range of cases; that is, it's more likely for a type to implement Default but not Copy than the other way around

There are quite a few types that implement Copy but not Default. For example: &T, Instant, SystemTime, IpAddr, NonNull<T>, NonZero*, MaybeUninit<T>, pointers, ThreadId, ...

(And of course, there are a lot of types that implement Default that don't implement Copy, such as Vec<T> and many more.)

Temporarily placing the Default value in the Cell (e.g. 0 for i32) while updating might lead to harder to debug issues: if a Cell is accessed while it is being updated, it then contains an unexpected value (0) instead of just the old value.

So I think we really need both options.

@m-ou-se
Copy link
Member

@m-ou-se m-ou-se commented Mar 2, 2020

Do we have some existing use case which would hugely benefit from the current Cell::update?

I was working on a variant on Rc<T> recently, and the source contains lines like:

if self.inner.counter().strong.update(|x| x - 1) == 0 {
    ...
}

That'd be a lot more verbose to write with separate .get() and .set().

@m-ou-se
Copy link
Member

@m-ou-se m-ou-se commented Mar 2, 2020

The only way to make absolutely sure 'update-while-updating' doesn't happen, is to not allow the user to provide arbitrary code to update the value. Then we would basically end up with the same restrictive interface as the atomics: fetch_add(1), fetch_sub(1), etc. for a few 'approved' operations like |x| x + y and |x| x - y, etc. But that would be way less flexible. (Atomics are usually integers, but Cells are used for all kind of types.)

@camsteffen
Copy link
Contributor

@camsteffen camsteffen commented Mar 2, 2020

Edit: Nevermind, this doesn't work.

To put the problem in other words, what we need is for the borrow checker to treat &self as &mut self so that the lambda is not allowed to borrow self.

Okay, I'm just trying to think outside the box here...

What if we created a way to tell the borrow checker that a borrow is "exclusive", similar to a mutable borrow?

fn update<F>(#[exclusive_borrow] &self, f: F) -> T where F: FnOnce(T) -> T
cell.update(|x| {
    cell.update(|x| x + 1); // Error: cell is exclusively borrowed above
    x + 1
});
@camsteffen
Copy link
Contributor

@camsteffen camsteffen commented Mar 2, 2020

Is it an option to leave the update-while-updating problem to the user? We could put a clear warning in the documentation.

@m-ou-se
Copy link
Member

@m-ou-se m-ou-se commented Mar 2, 2020

To put the problem in other words, what we need is for the borrow checker to treat &self as &mut self so that the lambda is not allowed to borrow self.

That's not really possible. It could be some function deep within some function call that also happens to have a reference to the cell.

The entire point of the Cell is to disable this part of the borrow checker. If there was a way to prove to the borrow checker you're only modifying it from one place, you didn't need a Cell<T> to begin with. Then you could just have used a T instead.

@m-ou-se
Copy link
Member

@m-ou-se m-ou-se commented Mar 2, 2020

Is it an option to leave the update-while-updating problem to the user? We could put a clear warning in the documentation.

I think that's fine. It's also our only option, other than not providing any update mechanism at all.

@camsteffen
Copy link
Contributor

@camsteffen camsteffen commented Mar 2, 2020

Then I suppose the behavior of nested updates needs to at least be well defined.

I would think that the update method opens the door for compiler optimizations, and that could be part of the motivation for adding the method. For example, shouldn't we be able to optimize away Default::default() in some cases?

@m-ou-se
Copy link
Member

@m-ou-se m-ou-se commented Mar 2, 2020

With all proposed solutions, 'nested' updates are well defined.

It isn't needed for optimization. The optimizer will understand a .get()+.set() (or .take()+.set()) just as well. An update function is just to make it shorter to write, and possibly to make it easier to write correct code.

In most cases, the optimizer will be able optimize the Default::default() value away, sure. However, if you're calling a (non-inline) function from another crate, it can not always know whether the function can panic or read from the Cell, so it can't optimize away writing the temporary placeholder.

[Demo]

@camsteffen
Copy link
Contributor

@camsteffen camsteffen commented Mar 2, 2020

The "smart pointer" idea is really neat. It just seems a little too obfuscated since the write back to the Cell is hidden. As soon as somebody decides to bind the "smart pointer" to a variable, it gets more confusing and the user probably should have used get and set instead. With a lambda, the timing of the update to the Cell is clear.

@camsteffen
Copy link
Contributor

@camsteffen camsteffen commented Mar 2, 2020

I second the motion that we replace update with get_update and take_update, but with lambda type T -> T. The need for having both has been demonstrated. The value must be copied or taken to supply the lambda. I think the API should just be upfront about that fact. This will alleviate the risk that "nested" updates cause surprising behavior. It's also very consistent with the current API.

The smart pointer idea could be considered as a separate enhancement, unless people feel that that should be the main implementation of update.

@Lucretiel
Copy link
Contributor

@Lucretiel Lucretiel commented Mar 5, 2020

What if we created a way to tell the borrow checker that a borrow is "exclusive", similar to a mutable borrow?

By definition, a mutable borrow is an exclusive borrow.

@rrevenantt
Copy link

@rrevenantt rrevenantt commented Jun 9, 2020

Can't we require closure to be Send to make variant with &mut T sound?

fn update<F>(&self, f: F) where F: Send + FnOnce(&mut T);

Since Cell is !Sync anything that can contain reference to Cell must be !Send which prevents any kind of reference to Cell from appearing inside the closure, thus trivially preventing any recursive calls on the same Cell. Here is a playground that fails to compile on "update while updating". Error message is confusing, though, because threads are not involved here, but with proper documentation it shouldn't be a problem.

@RustyYato
Copy link
Contributor

@RustyYato RustyYato commented Jun 9, 2020

This is an abuse of Send to get desired results. Send should only be used in the context of threading, and so it doesn't fit here. Secondly, it would prevent many valid use-cases for example, you can't capture a Rc<_> (which you would probably do if you are using a Cell).

@glaebhoerl
Copy link
Contributor

@glaebhoerl glaebhoerl commented Jun 10, 2020

@rrevenantt I brought up the same idea several years ago; unfortunately it's still not sufficient for soundness. The closure could still refer to a Cell through a thread-local variable, which is the only specific example I remember (but one is sufficient to demonstrate unsoundness), but IIRC, there were other ways to circumvent the restriction as well.

@RustyYato Please try to keep an open mind. I'm sure there have been other cases in Rust's history where something originally intended for one purpose turned out to be useful for another one as well, and in general it can be a fruitful and enlightening direction to explore. This also would not have removed any of Cell's existing methods, only added a new one with a different tradeoff.

you can't capture a Rc<_> (which you would probably do if you are using a Cell)

(And furthermore, if you are using a Cell then you couldn't call the method, so this logic is backwards.)

@Kevan0v
Copy link

@Kevan0v Kevan0v commented Aug 23, 2020

Any reason why the API for RefCell and Cell is so inconsistent?

RefCell has a method replace_with similar to the method update for Cell but return the old value instead.

Shouldn't it be named update_with and be available on both or at least have the methods update and replace_with available for both?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet